Hyperspectral Mixed Denoising via Spectral Difference-Induced Total Variation and Low-Rank Approximation

Abstract: Exploring multiple priors on observed signals has been demonstrated to be an effective way of recovering the underlying signals. In this paper, a new spectral difference-induced total variation and low-rank approximation (termed SDTVLA) method is proposed for hyperspectral mixed denoising. The spectral difference transform, which projects data into the spectral difference space (SDS), has been proven to be powerful at changing the structures of noises (especially sparse noise with a specific pattern, e.g., stripes or dead lines present at the same position in a series of bands) in an original hyperspectral image (HSI), thus allowing low-rank techniques to remove mixed noises more efficiently without treating them as low-rank features. In addition, because neighboring pixels are highly correlated and the spectra of homogeneous objects in a hyperspectral scene always lie in the same low-dimensional manifold, we combine total variation and the nuclear norm to simultaneously exploit the local piecewise smoothness and global low rankness in SDS for mixed noise reduction of HSI. Finally, the alternating direction method of multipliers (ADMM) is employed to effectively solve the SDTVLA model. Extensive experiments on three simulated and two real HSI datasets demonstrate that, in terms of quantitative metrics (i.e., the mean peak signal-to-noise ratio (MPSNR), the mean structural similarity index (MSSIM) and the mean spectral angle (MSA)), the proposed SDTVLA method achieves, on average, MPSNR values 1.5 dB higher than those of the competitive methods, and it also performs better in terms of visual effect.


Introduction
Hyperspectral images (HSIs) contain a broad range of spectral information from 400 nm to 2500 nm, which provides a rich observation capability beyond human vision. This enables HSIs to be widely used in many applications [1], e.g., precision agriculture, pharmaceuticals, medical diagnosis, and food security. However, in the course of data acquisition, HSIs are often degraded by multiple noises, such as Gaussian noise and sparse noise (including impulse noise, stripe noise and dead lines). This degradation significantly decreases the accuracy of subsequent applications, e.g., classification [2][3][4], target detection [5] and unmixing [6,7]. Therefore, it is necessary to develop more effective noise reduction techniques for HSIs.
To date, numerous approaches have been developed for HSI denoising. For instance, in the framework of Bayesian inference, noise-adjusted principal component (NAPC) analysis [8] and minimum noise fraction (MNF) [9] are representative denoising methods for HSIs, and both have been successfully integrated into the ENVI and ERDAS software. Besides, by treating an HSI as a stack of gray images, many traditional methods, e.g., bilateral filtering [10], total variation [11], nonlocal means [12] and convolutional networks [13], can be directly applied to recover HSI data. However, because they do not consider spectral correlations, these methods often result in unsatisfactory performance.
Being good at learning and representing multi-scale information of signals with few atoms, wavelet methods [14] have become another powerful instrument for HSI denoising. Sparse regularizations, e.g., the Lasso penalty [15] or $\ell_1$ sparsity [16], are often combined with the wavelet transform for image denoising. In [17,18], principal component analysis (PCA) was combined with 3D or 4D wavelet filtering to remove the Gaussian noise in the low-energy channels, which improved the denoising performance. In [19], a wavelet-based sparse reduced-rank regression (WSRRR) method, in which the tuning parameters were adaptively calculated based on Stein's unbiased risk estimation, was introduced for Gaussian noise reduction of HSI. To further reduce the computational complexity, a parameter-free method for the restoration of HSIs termed HyRes [20] was proposed. It introduced a sparse low-rank model in which the orthogonal projection is fixed instead of being updated iteratively, so that the unknown signal is estimated in the subspace and computation time is saved. In addition to the wavelet technique, sparse representation (SR) is also popular for HSI denoising, due to its powerful capacity for linearly representing signals with few basis vectors [21]. This is highly consistent with the linear mixture assumption that the spectra in HSIs all lie in a subspace linearly spanned by the spectra of endmembers. In [22], an HSI denoising method was proposed to use the global and local redundancy and correlation (RAC) in the spectral and spatial domains. In [23], a spectral-spatial distributed SR method was put forward for HSI denoising; it exploited both intraband and interband structures in the course of learning. To fully use the highly correlated spectral information and similar spatial information, a spatial and spectral adaptive SR (SSASR) method [24] was introduced to further improve the estimation performance.
More literature on HSI denoising related to SR or dictionary learning can be found in [25][26][27] and the references therein. Moreover, deep learning, as one of the powerful nonlinear feature extraction and signal representation techniques [28][29][30], has gradually emerged in HSI denoising and achieved considerable results [31,32].
Recently, low-rank techniques have drawn increased attention and become one of the powerful tools for exploiting the intrinsic properties of HSIs. A multitude of remarkable methods have been proposed for HSI denoising in the framework of low-rank approximation. For example, a destriping method via low-rank representation (LRR) was introduced for HSI [33]. It employed global LRR to explore the highly correlated spectral information between bands and enforced a graph constraint to preserve the local details. Later, Zhang et al. [34] proposed to employ low-rank matrix recovery (LRMR) for HSI mixed noise removal via the "GoDec" algorithm [35] and achieved significant success. However, LRMR has one obvious weakness: it only considers the local similarity within local patches and ignores the unbiased noise intensity in each patch. To alleviate this problem, the spectrally nonlocal LRR [36], group LRR [37], noise adaptive estimation [38] and subspace LRR [39] were put forward to explore more useful information of HSI and improve the performance. Meanwhile, several tensor low-rank (TLR)-based methods [40][41][42] have also been proposed to exploit the spatial-spectral structures in overlapped cubic tensors. Compared to LR methods, those TLR methods exploit the low-rank property in a 3-D tensor manner rather than a 2-D matrix manner, thus further improving the denoising performance to an extent. However, there are still two fatal flaws in those pure LR or TLR-based methods. First, because of the independent distribution of Gaussian noise, LR or TLR methods cannot remove it completely. Second, neither LR nor TLR methods can completely remove structured stripes and dead lines; that is, when these are present at the same position in a series of bands, pure LR or TLR approaches treat them as low-rank features and retain them.
These two shortcomings severely limit the denoising accuracy of LR and TLR methods. To alleviate these limitations, other techniques that exploit additional information should be involved. Total variation (TV), which has been demonstrated to be an effective tool for Gaussian noise removal, is one of the best candidates. Recently, band-by-band TV [43], 3-dimensional TV (3DTV) [44,45], and tensor TV (TenTV) [46] have been incorporated into the LR and TLR frameworks for HSI mixed denoising. Although these three types of algorithms can significantly improve the precision of HSI mixed denoising, they still cannot completely remove sparse noise with a specific pattern. The reason may be that the TV regularizer penalizes the pixel differences along the horizontal and vertical directions, and one of the two directions coincides with the direction of the stripes or dead lines. This means that the TV defined in each band (e.g., band-by-band TV and TenTV) would maintain the structure of stripes or dead lines along the horizontal or vertical direction. Benefitting from the spectral difference constraint, 3DTV can assist LRR [45] or TLR [47] techniques in removing the structured sparse noise to an extent; however, it leads, more or less, to spectral distortion. More recently, an HSI denoising method that enforces LRR on the spectral difference space (LRRSDS) [48] has attracted great attention for suppressing structured sparse noise. It changes the structures of noises by projecting them into the SDS, and then removes them effectively with LR techniques. However, one shortcoming of LRRSDS is that it does not take the spatially local correlations into consideration, thus failing to remove heavy Gaussian noise completely.
By taking full consideration of the above three techniques (i.e., TV, LRR and SDS), in this paper we propose a novel spectral difference-induced TV and low-rank approximation method, termed SDTVLA, for HSI mixed denoising; its flowchart is illustrated in Figure 1. Note that this manuscript is extended from our conference paper [49], and the contributions are summarized as follows.

1. The proposed method takes full consideration of the three kinds of noise that exist in HSI, i.e., random sparse noise, Gaussian noise, and structured sparse noise. To completely remove all of them, multiple priors (TV, LRR and SDS) are fused into a unified framework to accurately reconstruct the underlying clean HSI.
2. The combination of TV and SDS can be treated as a novel cross TV (CTV), which is defined as the conventional 2-D TV across the one-dimensional spectral TV, and CTV has been validated to be effective for dealing with both Gaussian noise and structured stripes.
3. The SDTVLA model, with all convex terms, can be readily split into subproblems and solved by the alternating direction method of multipliers (ADMM).
4. Extensive experiments on three simulated and two real HSI datasets demonstrate the superiority of the SDTVLA algorithm in terms of visual effect and quantitative assessment.
The remainder of this paper is organized as follows. Section 2 recalls the related works of two state-of-the-art LR-based methods. Section 3 formulates the newly developed denoising method via spectral difference-induced TV and low-rank approximation. Section 4 presents the experimental results and discussions on simulated and real HSI datasets as well as the parameters analysis. Conclusions are drawn in Section 5.

Observation Model
Before formulating the HSI denoising problem, we introduce some notation. Let $Y \in \mathbb{R}^{(m \times n) \times l}$ (the 2-D matrix reshaped from the 3-D tensor $\mathcal{Y} \in \mathbb{R}^{m \times n \times l}$) be the observed noisy HSI with $m \times n$ pixels and $l$ bands, and let $X \in \mathbb{R}^{(m \times n) \times l}$ represent the latent noise-free HSI. In the mixed noise scenario, the observation model for HSI can be formulated in matrix form as
$$Y = X + S + N, \tag{1}$$
where $S \in \mathbb{R}^{(m \times n) \times l}$ denotes the sparse noise in the scene and $N \in \mathbb{R}^{(m \times n) \times l}$ represents the Gaussian noise generated by the sensor or atmospheric effect.
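The reshaping and the additive model above can be illustrated with a minimal sketch. It uses plain nested lists and assumes a row-major pixel ordering when flattening the cube; the ordering, the helper names (`cube_to_matrix`, `matrix_to_cube`, `add_matrices`) and the toy values are illustrative assumptions, not part of the original method description.

```python
def cube_to_matrix(cube):
    """Flatten an m x n x l cube (cube[r][c][b]) into an (m*n) x l matrix.

    Pixels are taken in row-major order (an assumption made for this sketch);
    column b of the result is the vectorized image of band b.
    """
    return [list(pixel) for row in cube for pixel in row]


def matrix_to_cube(matrix, m, n):
    """Inverse of cube_to_matrix: rebuild the m x n x l cube."""
    assert len(matrix) == m * n, "matrix must have m*n rows"
    return [[list(matrix[r * n + c]) for c in range(n)] for r in range(m)]


def add_matrices(*mats):
    """Element-wise sum of equally sized matrices, e.g. Y = X + S + N."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]


# Toy example: 2 x 2 pixels, 3 bands.
cube = [[[1, 2, 3], [4, 5, 6]],
        [[7, 8, 9], [10, 11, 12]]]
X = cube_to_matrix(cube)                            # 4 pixels x 3 bands
S = [[0, 0, 0], [0, 9, 0], [0, 0, 0], [0, 0, 0]]    # a single sparse outlier
N = [[0.1, -0.1, 0.0]] * 4                          # small dense perturbation
Y = add_matrices(X, S, N)                           # observation Y = X + S + N
```

Only one entry of `S` is nonzero, whereas `N` perturbs every pixel slightly, which mirrors the distinction between sparse and dense noise in the model.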

LRMR Model
Having (1) in mind, the LRMR model for HSI mixed denoising can be expressed as [34]:
$$\min_{X,S} \|X\|_* + \lambda \|S\|_1, \quad \text{s.t.} \quad \|Y - X - S\|_F^2 \le \varepsilon, \tag{2}$$
where $\|S\|_1 = \sum_i |s_i|$ is the $\ell_1$ norm of the sparse noise matrix $S$ and $s_i$ denotes the $i$-th element of $S$, $\lambda$ is the regularization parameter, and $\varepsilon$ bounds the energy of the residual. $\|X\|_*$ denotes the well-known nuclear norm, defined as the sum of all singular values, i.e., $\|X\|_* = \sum_i \sigma_i$. Owing to its simplicity and good performance for mixed noise reduction, the model (2) and its variants have evolved into one of the fundamental models. In [34], the model (2) was equivalently rewritten as the following formulation.
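$$\min_{X,S} \|Y - X - S\|_F^2, \quad \text{s.t.} \quad \mathrm{rank}(X) \le r, \ \mathrm{card}(S) \le k, \tag{3}$$
where card(·) denotes the cardinality of $S$, and rank(·) represents the low-rank constraint. The model (3) can be effectively and easily solved by the "GoDec" algorithm [35].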
As analyzed in Section 1, LRMR cannot successfully recover data with severe Gaussian noise and structured stripes. Taking Figure 2 as an example, LRMR produces a satisfactory result for band 103 with a moderate noise level (see Figure 2a), while it fails to remove the heavy Gaussian noise and structured stripes in band 108 (see Figure 2b).

LRTV Model
To alleviate the shortcoming of LRMR, the LRTV model [43], which combines band-by-band TV regularization with a low-rank constraint, was put forward to further improve the denoising accuracy. It aims at solving the optimization problem
$$\min_{X,S} \|X\|_* + \tau \|X\|_{\mathrm{HTV}} + \lambda \|S\|_1, \quad \text{s.t.} \quad \|Y - X - S\|_F^2 \le \varepsilon, \ \mathrm{rank}(X) \le r, \tag{4}$$
where $\|X\|_{\mathrm{HTV}}$ is the so-called 'hyperspectral total variation', which enforces the conventional TV constraint in each band and is defined as
$$\|X\|_{\mathrm{HTV}} = \sum_{i=1}^{l} \left( \|\nabla_h \mathcal{H} X_i\|_1 + \|\nabla_v \mathcal{H} X_i\|_1 \right), \tag{5}$$
where $X_i$ is the vectorization of the $i$-th band image, $\mathcal{H} : \mathbb{R}^{(m \times n) \times 1} \to \mathbb{R}^{m \times n}$ is an operator that reshapes the one-dimensional vector into a 2-D image, and $\nabla_h$ and $\nabla_v$ can be seen as two convolution operators in the horizontal and vertical directions, respectively. Thanks to the HTV term, LRTV yields much better performance than LRMR, especially in removing severe Gaussian noise. However, neither LRTV nor LRMR can completely remove the structured stripes. Taking Figure 2 as an example, LRTV produces a cleaner result than LRMR in the moderate noise case (see Figure 2c). However, for band 108 with severe Gaussian noise and structured stripes, LRTV fails to remove them (see Figure 2d).
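As a rough illustration of why a band-by-band TV term struggles with stripes, the sketch below computes a hyperspectral TV value for tiny band images. It assumes the anisotropic form (absolute forward differences in the horizontal and vertical directions, with no wrap-around at the borders); the function names `band_tv` and `htv` and the toy images are hypothetical.

```python
def band_tv(img):
    """Anisotropic TV of one 2-D band: sum of |horizontal| + |vertical|
    forward differences (no wrap-around at the image border)."""
    rows, cols = len(img), len(img[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:                       # horizontal neighbour
                tv += abs(img[r][c + 1] - img[r][c])
            if r + 1 < rows:                       # vertical neighbour
                tv += abs(img[r + 1][c] - img[r][c])
    return tv


def htv(bands):
    """Band-by-band TV: add up the TV of every band image."""
    return sum(band_tv(b) for b in bands)


flat = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]      # constant band
stripe = [[0.5, 1.0, 0.5], [0.5, 1.0, 0.5]]    # vertical stripe in column 1
print(htv([flat]), htv([stripe]))              # prints: 0.0 2.0
```

In the striped band, all of the penalty comes from the horizontal differences; the vertical differences along the stripe are zero, so the TV term sees the stripe as perfectly smooth in that direction.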

Spectral Difference Transformation
First, we give the definition of the spectral difference transformation [48]. It aims at projecting the original data into the spectral difference image and then recovering the image from the residual space.
$$\nabla_z X = \left[ X(:,2) - X(:,1),\ X(:,3) - X(:,2),\ \ldots,\ X(:,l) - X(:,l-1) \right], \tag{6}$$
where $X(:, i)$ denotes the $i$-th column of $X$, and it is also the vectorization of the $i$-th band image of the HSI.
According to the above definition, the spectral difference transform is a linear projection. It does not change the distribution of Gaussian noise and impulse noise, apart from slightly changing their intensities. These changes can be compensated for by adjusting the related parameters in the proposed model. As analyzed in [48], the advantage of SDS is that it changes the patterns of the structured stripes and dead lines, so that the LR technique can be used to effectively remove them.
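The effect on structured stripes can be seen in a small sketch. It assumes the transform is a forward difference between adjacent bands and idealizes a stripe as a pixel that carries the same offset in every band; both are simplifying assumptions for illustration, and `spectral_difference` is a hypothetical helper name.

```python
def spectral_difference(X):
    """Forward difference along the band axis of a (pixels x bands) matrix:
    D[p][i] = X[p][i + 1] - X[p][i]."""
    return [[row[i + 1] - row[i] for i in range(len(row) - 1)] for row in X]


# 3 pixels x 4 bands with smoothly varying spectra.
clean = [[10, 11, 12, 13],
         [20, 22, 24, 26],
         [30, 30, 31, 31]]

# Idealized structured stripe: pixel 1 is offset by -5 in every band.
noisy = [row[:] for row in clean]
noisy[1] = [v - 5 for v in noisy[1]]

# The band-independent offset cancels in the difference domain.
print(spectral_difference(clean) == spectral_difference(noisy))   # prints: True
```

A real stripe is rarely identical across bands, so in practice its structure is changed and weakened rather than cancelled exactly.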
Moreover, by comparing the low rankness and total variation properties in SDS with those in the original HSI space, we find that the TV in SDS promotes much stronger sparsity than that in the original HSI (see Figure 3), while the low rankness in SDS behaves similarly to that in the original HSI space. This inspires us to employ both in SDS to further improve the performance of HSI mixed denoising.

SDTVLA Model
Based on the above analysis, a novel spectral difference-induced TV regularization and low-rank approximation method is put forward for HSI mixed denoising. It aims at solving the optimization problem
$$\min_{X,S} \frac{1}{2}\|Y - X - S\|_F^2 + \lambda_1 \|S\|_1 + \lambda_2 \|\nabla_z X\|_{*,\mathrm{TV}}, \tag{7}$$
where $\|\nabla_z X\|_{*,\mathrm{TV}}$ is the novel combination term, which simultaneously exploits the local piecewise smoothness by the TV constraint and the global low rankness by the nuclear norm in SDS. It is defined as
$$\|\nabla_z X\|_{*,\mathrm{TV}} = \|\nabla_z X\|_* + \rho \|\nabla_z X\|_{\mathrm{HTV}}, \tag{8}$$
where $\rho$ is a parameter that keeps the balance between the TV regularization and the low-rank constraint in SDS.
Parameters $\lambda_1$ and $\lambda_2$ control the tradeoff between the sparse noise term and the combination term. The advantages of model (7) can be summarized as follows.
• Different from the conventional TV constraint in each band, here the TV regularization is defined in SDS, and it can be seen as a cross TV that explores both spatial and spectral information. Meanwhile, it could effectively help low-rank tools to reduce the severe Gaussian noise and structured stripes.
• Spectral difference transform can effectively change the structures of the noises in the original HSI, thus enabling the TVLA regularization to further improve the denoising accuracy of the structured stripes.
• With all convex regularizations, the model (7) can be effectively and easily solved by ADMM.

Optimization
In this subsection, ADMM is used to effectively solve the optimization problem (7) by splitting it into several simpler subproblems. By introducing the auxiliary variables $Q_1 = \nabla_z X$, $Q_2 = \nabla_z X$, $Q_3 = \nabla_h Q_2$ and $Q_4 = \nabla_v Q_2$ (so that the nuclear norm acts on $Q_1$ and the TV terms act on $Q_3$ and $Q_4$), the augmented Lagrangian function of problem (7) can be expressed as (9), where $B_1$, $B_2$, $B_3$, $B_4$ are the four Lagrangian multipliers and $\mu > 0$ is the penalty parameter. Generally, we minimize the Lagrangian function (9) iteratively over one variable while fixing the others. Algorithm 1 summarizes the main steps for solving the proposed SDTVLA model using ADMM.

Algorithm 1. The pseudo-code for the SDTVLA solver via ADMM.
7: Update Lagrangian multipliers
Line 4 in Algorithm 1 solves the X subproblem as follows.
Optimization problem (10) is a quadratic programming problem and has an analytical solution by the n-dimensional fast Fourier transform (nFFT).
where $\mathcal{F}$ is the nFFT operator, $\mathcal{F}^{-1}$ is the inverse nFFT operator, and $H$ represents the complex conjugate.
Line 5 in Algorithm 1 is to solve the S subproblem.
Optimization problem (12) can be effectively solved by the well-known soft-thresholding function.
where $W = Y - X^{(k+1)}$, and sign(·) is an odd function that extracts the sign of a real number. Line 6 in Algorithm 1 solves the subproblems regarding the auxiliary variables $Q_i$, $i = 1, 2, 3, 4$ [50,51]. In the following, we show the details for solving the corresponding subproblems.
The optimization subproblem related to variable $Q_1$, problem (14), is a nuclear norm minimization problem, which can be easily solved by the well-known singular-value thresholding operator. In its solution, $Z = U \Sigma V^T$ is the singular-value decomposition, and $\{\sigma_i\}_{1 \le i \le r}$ are the $r$ largest singular values of matrix $Z$. The optimization problem related to variable $Q_2$ is similar to that of $X$, and it is also a quadratic programming problem.
The model (17) can be expressed as an equivalent linear system. Regarding $\nabla_h$ and $\nabla_v$ as two convolutions along the spatial directions, problem (17) has a closed-form solution obtained with the fast nFFT operator.
For the remaining two variables, i.e., $Q_3$ and $Q_4$, the optimization subproblems related to them have the same mathematical form. That is, both can be effectively solved by the soft-thresholding function which has been given in (13).
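The soft-thresholding function used for the $S$, $Q_3$ and $Q_4$ subproblems can be sketched as follows. The threshold `tau` is a placeholder: in the solver it would be derived from the regularization and penalty parameters of the respective subproblem, which this sketch does not attempt to reproduce.

```python
import math


def soft_threshold(W, tau):
    """Element-wise soft-thresholding: sign(w) * max(|w| - tau, 0)."""
    def shrink(w):
        magnitude = max(abs(w) - tau, 0.0)
        # copysign restores the sign of w; exact zeros are returned as +0.0
        return math.copysign(magnitude, w) if magnitude > 0 else 0.0
    return [[shrink(w) for w in row] for row in W]


W = [[0.5, -0.05, 0.0],
     [-1.25, 0.25, 0.1]]
S = soft_threshold(W, tau=0.25)   # tau = 0.25 is an arbitrary illustrative value
print(S)                          # prints: [[0.25, 0.0, 0.0], [-1.0, 0.0, 0.0]]
```

Entries whose magnitude does not exceed the threshold are set to zero, and the remaining ones are shrunk toward zero by `tau`, which is what makes the estimate sparse.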

Parameters Determination and Convergence Analysis
There are four parameters in total in the SDTVLA solver, i.e., $\lambda_1$, $\lambda_2$, $\rho$ and the latent rank $r$. As analyzed above, parameter $\lambda_1$ is strongly related to the intensity of sparse noise in the HSI. Empirically, we set it as $\lambda_1 = 10/\sqrt{m \times n}$ by default in the following experiments, where $m$ and $n$ are the dimensions of the image in each band of the HSI. $\lambda_2$ is the parameter of the combined regularizer, which controls the contribution of the low-rank and total variation constraints in SDS, while $\rho$ is the proportion parameter related to the cross TV regularizer. Later, we will systematically discuss and analyze the impact of all four parameters on the experimental results.
Because all its terms are convex, the convergence of the SDTVLA solver is theoretically guaranteed [52]. In practice, the parameter $\mu$ has a great influence on the convergence rate of the SDTVLA algorithm. To achieve rapid convergence, we empirically set $\mu = 1.0$ in the experiments; with this value, the SDTVLA solver converges quickly in practice. In the discussion section, we will give a detailed analysis of the convergence associated with the parameter $\mu$ and plot the curves.

Experimental Results and Discussions
In this section, three simulated datasets and two real HSI datasets are used to assess the performance of SDTVLA algorithm in the experiments. All data are normalized into the interval [0, 1] before the experiment, and after the denoising, they will be scaled back to the original range.
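The normalization step can be sketched as below. The sketch assumes a single global minimum and maximum over the whole dataset (the text does not say whether scaling is global or per band), and the function names are hypothetical.

```python
def normalize(data):
    """Linearly map all values of a matrix to [0, 1].

    Returns the scaled matrix and the (lo, hi) bounds needed to undo it.
    """
    flat = [v for row in data for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0            # guard against a constant image
    return [[(v - lo) / span for v in row] for row in data], (lo, hi)


def denormalize(data, bounds):
    """Map values from [0, 1] back to the original range."""
    lo, hi = bounds
    span = (hi - lo) or 1.0
    return [[v * span + lo for v in row] for row in data]


raw = [[100, 300], [200, 500]]          # toy digital-number values
scaled, bounds = normalize(raw)         # [[0.0, 0.5], [0.25, 1.0]]
restored = denormalize(scaled, bounds)  # back to the original range
```

Keeping the bounds returned by `normalize` is what allows the denoised result to be scaled back to the original range afterwards.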

Datasets Description
Five datasets are employed to validate the effectiveness of the proposed SDTVLA solver. The details of the datasets are described as follows, and all datasets can be downloaded from the website: http://lesun.weebly.com/hyperspectral-data-set.html.

Competitive Methods and Assessment Indexes
To fully verify the superiority of the SDTVLA algorithm, the following state-of-the-art denoising methods were used as benchmarks.
• BM4D [53]: one of the representative wavelet denoisers, which explores the nonlocal self-similarities in a tensor manner and achieves great success in natural image denoising.
• LRMR [34]: one of the outstanding HSI mixed denoising methods, which uses the so-called "GoDec" algorithm to solve the patch-based low-rank matrix recovery problem.
• LRTV [43]: a band-by-band TV regularized LRR method for mixed noise reduction of HSI.
• 3DTVLR [45]: a mixed noise removal method combining three-dimensional TV (spatial TV and spectral TV) and LRR for HSI.
• LRRSDS [48]: one of the state-of-the-art mixed noise reduction methods, which enforces the low-rank constraint in the SDS.
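Two of the assessment indexes named in this paper, MPSNR and MSA, can be sketched as follows for data stored as a (pixels × bands) matrix. The sketch assumes data normalized to [0, 1] (hence `peak=1.0`), nonzero spectra, and at least some error in every band; it is an illustrative implementation of the standard definitions, not the authors' evaluation code.

```python
import math


def mpsnr(X, X_hat, peak=1.0):
    """Mean PSNR (dB) over bands; X and X_hat are (pixels x bands) matrices."""
    values = []
    for ref, est in zip(zip(*X), zip(*X_hat)):      # iterate over bands (columns)
        mse = sum((a - b) ** 2 for a, b in zip(ref, est)) / len(ref)
        values.append(10 * math.log10(peak ** 2 / mse))
    return sum(values) / len(values)


def msa(X, X_hat):
    """Mean spectral angle (radians) over pixels (rows)."""
    angles = []
    for ref, est in zip(X, X_hat):
        dot = sum(a * b for a, b in zip(ref, est))
        norm = math.sqrt(sum(a * a for a in ref)) * math.sqrt(sum(b * b for b in est))
        cos = max(-1.0, min(1.0, dot / norm))       # clamp rounding errors
        angles.append(math.acos(cos))
    return sum(angles) / len(angles)


X = [[0.5, 0.5], [0.5, 0.5]]          # 2 pixels x 2 bands, reference
X_hat = [[0.6, 0.4], [0.4, 0.6]]      # estimate with error 0.1 everywhere
quality = mpsnr(X, X_hat)             # about 20 dB, since the MSE is 0.01 per band
angle = msa(X, X_hat)                 # spectral distortion in radians
```

A higher MPSNR indicates lower per-band error, whereas a lower MSA indicates that the shapes of the restored spectra are closer to the reference spectra.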

Experiments on Simulated Datasets
The simulation experiments are conducted on the WDC, Pavia and Gulf datasets, because all three have high image quality in each band and can be treated as clean HSIs. To accurately simulate the noises in a real HSI, several types of noise were added to each dataset according to the following criteria.

2. Since the impulse noise is usually generated by the water absorption or atmosphere effect and is often present in continuous bands, we add the impulse noise to bands 90 to 110, with 20% of the pixels being contaminated;
3. Caused by the sensors, dead lines or stripes always exist in the HSI. Therefore, we add dead lines to 10 bands, and the width of each dead line is randomly set from 1 to 3.
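The impulse-noise and dead-line criteria above can be mimicked with a short sketch. Several details are assumptions made only for illustration: salt-and-pepper values of 0 or 1, one vertical dead line per band, the choice of which 10 bands receive dead lines, the cube size, and the fixed random seed.

```python
import random


def add_impulse_noise(cube, bands, density, rng):
    """Set a fraction `density` of the pixels in each listed band to 0 or 1."""
    m, n = len(cube), len(cube[0])
    count = round(density * m * n)
    for b in bands:
        for idx in rng.sample(range(m * n), count):   # distinct pixel positions
            cube[idx // n][idx % n][b] = rng.choice((0.0, 1.0))


def add_dead_lines(cube, bands, rng, max_width=3):
    """Zero out one vertical line of random width 1..max_width in each band."""
    m, n = len(cube), len(cube[0])
    for b in bands:
        width = rng.randint(1, max_width)
        start = rng.randrange(0, n - width + 1)
        for r in range(m):
            for c in range(start, start + width):
                cube[r][c][b] = 0.0


rng = random.Random(0)                                   # fixed seed for repeatability
m, n, l = 20, 20, 120
cube = [[[0.5] * l for _ in range(n)] for _ in range(m)] # flat gray stand-in for a clean HSI
add_impulse_noise(cube, bands=range(89, 110), density=0.20, rng=rng)  # bands 90..110 (1-based)
add_dead_lines(cube, bands=range(10), rng=rng)           # 10 arbitrarily chosen bands
```

The Gaussian noise component is left out because its settings are not given in the criteria listed here.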
The parameters of the comparison algorithms are tuned slightly around the default values in the corresponding literature. In addition, the results presented in the following experiments are those with the highest MPSNR value. For the SDTVLA solver, the four parameters are set as $\lambda_1 = 0.01$, $\lambda_2 = 0.2$, $\rho = 0.1$ and $r = 2$. To make the algorithm easy to reproduce, Table 1 lists the optimal values of the parameters for each competitive method. Moreover, the MATLAB source code of the proposed algorithm as well as the competitive methods is released at the author's homepage (see Supplementary Materials for details).
Table 1. Optimal parameters of the comparison algorithms on the simulated HSI datasets.
BM4D [53]: -; -; -
LRMR [34]: $q = 26$, $r = 4$, $k_{\mathrm{card}} = 6000$; $q = 26$, $r = 9$, $k_{\mathrm{card}} = 4000$; $q = 26$, $r = 4$, $k_{\mathrm{card}} = 6000$
LRTV [43]: $r = 8$, $\lambda = 10/\sqrt{m \times n}$, $\tau = 0.004$
Figure 5 illustrates the denoising results of band 59 obtained by the competitive methods on the simulated WDC dataset. In the simulation, band 59 is corrupted by severe impulse noise and Gaussian noise simultaneously. It is clear that BM4D is not good at removing the impulse noise because it lacks a sparse noise term. All low-rank-based denoisers (e.g., LRMR, LRTV, 3DTVLR, LRRSDS and SDTVLA) can successfully remove both impulse noise and Gaussian noise. The difference is that the methods with local spatial constraints (e.g., LRTV, 3DTVLR, SDTVLA) produce more precise results than their purely low-rank-based counterparts (e.g., LRMR and LRRSDS). Among all competitive methods, the proposed SDTVLA delivers the best result in removing all noises and preserving the fine details in both spatial and spectral domains.
Figure 6 presents the denoising results of band 84 for the simulated WDC dataset. This band is the most heavily contaminated band in the dataset; it is simultaneously corrupted with impulse noise, Gaussian noise, and dead lines. From the illustration, it is obvious that the SDTVLA solver removes all types of mixed noise and produces the best visual result. Lacking sparse noise modeling, BM4D is almost powerless against the impulse noise and dead lines. For the LRMR denoiser, most of the Gaussian noise and all the impulse noise are successfully removed, but some of the dead lines are still retained in the results (see the details in the purple ellipse). LRTV removes almost all the impulse and Gaussian noise. However, LRTV cannot remove any of the dead lines and behaves even worse than LRMR. This phenomenon validates our analysis above, that is, when stripes or dead lines exist in the band, the TV regularizer has a negative effect and prevents the low-rank term from removing them. Because the 3DTVLR method assigns the same weights to each pixel (especially the weights of the spectral TV), some dead lines are successfully removed but others are still partially retained (see the details in the blue ellipse of Figure 6f). Without enforcing the spatial local correlation, LRRSDS can only remove the severe Gaussian noise to an extent (compare the zoomed-in portions in the red rectangles of Figure 6g,h, respectively).
Figure 7 plots the spectrum of the pixel located at (110, 206) in the simulated WDC dataset before and after restoration. It is clear that the SDTVLA solver yields the best estimate of the original spectrum. This conclusion can also be drawn from the difference spectrum in Figure 7h, which shows that SDTVLA gives the minimum residual spectrum. The BM4D and LRTV methods both produce several fluctuations in the spectrum (see Figure 7c,e), indicating that these two methods cannot remove the stripes or dead lines completely. The LRMR denoiser can remove all kinds of noise to an extent. However, there are still some small fluctuations in the restored spectrum (see the details in the purple ellipse and green rectangle). The 3DTVLR denoiser produces better results than the above three denoisers in removing the mixed noise. One of its main drawbacks is that it sometimes does not precisely recover the first or last few bands due to the unbalanced spectral continuity enforced by the spectral TV regularizer. For instance, as shown in the difference spectrum of Figure 7f, the digital number (DN) values of the first few bands deviate considerably (i.e., by more than 1000) from the DN values in the original spectrum. For the LRRSDS denoiser, we observe that it does not restore the spectrum as precisely as the SDTVLA method, especially in the parts highlighted by the purple ellipse and green rectangle in Figure 7g.
To further verify the denoising performance of the SDTVLA solver on each band, Figure 8 presents the PSNR and SSIM values as a function of the band number for the different methods on the WDC, Pavia University and Gulf datasets. It shows that the SDTVLA solver achieves higher PSNR and SSIM values than the other algorithms in almost all bands, which is consistent with the visual results. In addition, there are many serious fluctuations in the BM4D and LRTV results, especially for those bands with stripes or dead lines, indicating that these two methods fail to remove the stripes or dead lines completely. For the LRMR method, there are also several fluctuations in the result for the Pavia University dataset. This reveals that LRMR sometimes also produces inaccurate results due to the correlated structures in the data. The 3DTVLR and LRRSDS methods can successfully remove all kinds of noise; however, they do not preserve the details of the data as well as the SDTVLA method does.
Table 2 lists the assessment indexes of the denoising results of the different algorithms on the three simulated datasets. It leads to a conclusion similar to that drawn from the above illustrations, that is, the proposed SDTVLA solver delivers the best values in terms of MPSNR, MSSIM, FSSIM, ERGAS and MSA. Specifically, the proposed SDTVLA method produces about 1.5 dB higher PSNR values and 0.01 rad lower MSA values than the second-best method (i.e., 3DTVLR or LRRSDS). In addition, the average runtime of each method on these three simulated datasets is also listed in Table 2. All algorithms are implemented in MATLAB 2017a on a Windows system with an Intel Core i7 3.6 GHz CPU and 8 GB RAM. Compared with the 3DTVLR and LRRSDS methods, the proposed SDTVLA solver explores both local smoothness and global low rankness in SDS. Therefore, it costs more computational time than the other state-of-the-art methods.
In summary, the proposed SDTVLA method produces the best results in both visual effect and quantitative assessment. Moreover, it shows clear superiority in removing complex mixed noise as well as preserving the spatial-spectral details when compared with the other state-of-the-art denoisers.

Experiments on Real Datasets
The second experiment was conducted in Urban and Indian Pines datasets. Both scenes are seriously contaminated by the Gaussian noise, stripes, and dead lines as well as atmosphere effect and water absorption. It is worth noting that the stripes existing in Urban are strongly structured, which means that they have very strongly correlated patterns. In addition, some continuous bands (i.e., band 105 to band 150) are affected by bias illumination, see the details in Figure 9c-e. Band   In the experiment for the two real HSI datasets, BM4D, LRMR, LRTV, 3DTVLR and LRRSDS methods were implemented by slightly adjusting the parameters to achieve the optimal visual results according to the rules for setting the parameters in the simulated datasets. In addition, a new denoising method, i.e., 3-dimensional cross TV method (3DCrTV) [51], is added for comparison. The details of the parameters setting for the comparison methods are listed in Table 3. Table 3. Parameters setting of competitive algorithms for the real HSI datasets.

BM4D [53]
--LRMR [34] q = 20, r = 7, k card = 4000 q = 20, r = 4, k card = 4000 LRTV [43] r = 3, λ = 10 √ m × n, τ = 0.001 Figure 9 illustrates the images of band 2, 104, 107, 141, 150, 208 and the denoising results of competitive methods. Among them, the observed image of band 2 is slightly contaminated by the Gaussian noise and stripes and band 104 is contaminated by severe Gaussian noise, impulse noise, and stripes. The observed images of band 107, 141, 150 and 208 are heavily corrupted by severe mixed noises and bias illumination. It is obvious that the SDTVLA solver delivers the optimal results than the other comparison denoisers in visual effect. It completely gets rid of the complex mixed noises and bias illumination due to its full exploration of useful structures in the SDS. BM4D behaves badly in getting rid of the impulse noise and stripes in Urban dataset (see the second row of Figure 9), because it follows the assumption that the noise is mainly Gaussian. Besides, BM4D oversmoothes the portions heavily corrupted by the Gaussian noise. LRMR and LRTV denoisers can get rid of the Gaussian noise and impulse noise completely (see the third and fourth rows in Figure 9a), but neither of them can successfully get rid of the structured stripes, for instance, many stripes are maintained in the restored results in the third and fourth rows of Figure 9b,c,e. Because the stripes existing in Urban dataset all locate at the same place from band 100 to 145. In addition, from Figure 9d,e, it clearly shows that LRMR and LRTV are not good at alleviating the impact of bias illumination. The 3DTVLR method can successfully remove most of the structured stripes by strengthening the weights of the spectral TV and get rid of the Gaussian noise and impulse noise. However, due to the severe mixed noise and bias illumination in the continuous bands, lots of details in the restored results of 3DTVLR are lost, see the fifth row of Figure 9c-e. 
The 3DCrTV and LRRSDS methods both explore the information in the SDS and can get rid of the Gaussian noise and impulse noise successfully. The difference is that 3DCrTV, by enforcing the local TV constraint, successfully suppresses the structured stripes (see the zoom-in portions in the sixth row of Figure 9b,c) but fails to get rid of the impact of bias illumination (see the sixth row of Figure 9e,f), while LRRSDS, by exploiting the low-rank properties of the whole scene, is good at removing the bias illumination (see the seventh row of Figure 9e,f) but fails to remove the structured stripes completely (see the seventh row of Figure 9b,c). Figure 10 presents the horizontal mean profiles of band 109 in the Urban dataset before and after denoising by the different algorithms. As plotted in Figure 10a, there are rapid fluctuations in the observed curve due to the existence of stripes and dead lines. After denoising, the fluctuations are reduced to varying degrees by the competitive algorithms. However, many rapid fluctuations still exist in the outputs of BM4D, LRMR, LRTV and LRRSDS, because the structured stripes are treated as low-rank features by the LR-based algorithms. This also indicates that the HTV cannot successfully assist the LR technique in completely removing the structured stripes or dead lines. The 3DTVLR denoiser successfully gets rid of the mixed noises, especially the stripes and dead lines. However, due to its larger weight in the spectral direction, 3DTVLR tends to force the reconstructed band to be as close as possible to its upper and lower bands, thus losing many fine details in the spatial-spectral domains (see the parts in the blue and brown rectangles of Figure 10e). The 3DCrTV method fails to suppress the bias illumination in the middle of the image (see the part in the blue rectangle of Figure 10f), and also loses some fine details (see the part in the red ellipse of Figure 10f).
The proposed SDTVLA completely removes the complex mixed noises as well as the bias illumination, and produces the best mean profile curve with more details. To further validate the performance of the SDTVLA solver, two typical bands of the Indian Pines dataset, i.e., band 107 and band 220, are illustrated in Figures 11 and 12 to show the performance of all competitive algorithms. These results lead to the same conclusion: the SDTVLA solver achieves the best performance in removing the complex mixed noises. Moreover, as shown by the zoom-in portions within the red and green rectangles, SDTVLA preserves more fine details than the other comparison denoisers. The BM4D denoiser still fails to remove the impulse noise and stripes. The LRMR denoiser leaves a lot of Gaussian noise in the results. LRTV and 3DTVLR distort many fine details by exploiting the spatial smoothness with improper strength. 3DCrTV and LRRSDS cannot reconstruct the images with as many fine details as SDTVLA does, because they lack the combination of global and local priors.
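The mean profiles used in the Figure 10 comparison can be computed in a few lines. The following is a minimal sketch, not the authors' code: it assumes that a "horizontal" mean profile is the mean value of each column of a band (the axis is left configurable because this convention is an assumption), and `profile_roughness` is a hypothetical helper introduced here only to quantify the "rapid fluctuations" on synthetic data.

```python
import numpy as np


def mean_profile(band: np.ndarray, axis: int = 0) -> np.ndarray:
    """Return the mean value along `axis` of a single 2-D band.

    axis=0 averages over rows and yields one value per column (the
    convention assumed here for a "horizontal" profile); axis=1 yields
    one value per row instead.
    """
    if band.ndim != 2:
        raise ValueError("expected a single 2-D band")
    return band.mean(axis=axis)


def profile_roughness(profile: np.ndarray) -> float:
    """Mean absolute first difference: a crude measure of rapid fluctuation."""
    return float(np.abs(np.diff(profile)).mean())


# Synthetic data (hypothetical): a smooth band plus vertical stripes.
rng = np.random.default_rng(0)
rows, cols = 64, 64
smooth = np.tile(np.linspace(0.2, 0.8, cols), (rows, 1))
stripes = np.zeros(cols)
stripes[rng.choice(cols, size=8, replace=False)] = 0.3
striped = smooth + stripes  # the same offset is added to a whole column

rough_clean = profile_roughness(mean_profile(smooth))
rough_striped = profile_roughness(mean_profile(striped))
print("roughness of clean profile  :", rough_clean)
print("roughness of striped profile:", rough_striped)
```

On such data the striped band gives a visibly rougher profile than the smooth one, which is the effect the observed curve in Figure 10a exhibits.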

Discussion
In this work, we introduced a new spectral difference-induced TV and low-rank approximation method for mixed denoising of HSI. Different from the existing low-rank-based methods, our method mainly exploits the useful information in the SDS. First, the spectral difference projection can effectively change the intrinsic structures of the noises, especially the structured stripes and dead lines, so that the LR technique can successfully get rid of them. Besides, the total variation in the SDS can be treated as a local correlation of the residual HSI, and it is much sparser than the TV in the original HSI. Moreover, low rankness can be treated as one of the intrinsic properties of the whole HSI data. Therefore, the proposed SDTVLA simultaneously explores the local piecewise correlation and the global low rankness of the HSI cube in the SDS. The SDTVLA model has four parameters in total that need to be carefully chosen. In the following, we analyze the impact of each parameter on the restoration results of the SDTVLA algorithm and discuss systematically how these parameters were chosen in our experiments. All the results are based on the simulated experiment on the WDC dataset.
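The effect of the spectral difference projection on structured noise can be illustrated on synthetic data. The sketch below is an illustration under stated assumptions rather than the SDTVLA implementation: it assumes a first-order forward difference along the band axis as the spectral difference operator, a random linear mixture with four hypothetical endmembers, and a stripe that takes the same value at the same pixels in every band.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_bands, n_endmembers = 200, 30, 4

# Hypothetical linear mixture: pixels-by-bands matrix of rank n_endmembers.
abundances = rng.random((n_pixels, n_endmembers))
spectra = rng.random((n_endmembers, n_bands))
clean = abundances @ spectra

# Structured sparse noise: the same pixels are offset in every band,
# as for a stripe located at a fixed position across bands.
stripe = np.zeros((n_pixels, 1))
stripe[rng.choice(n_pixels, size=20, replace=False)] = 0.4
noisy = clean + stripe  # broadcast over all bands


def spectral_difference(x: np.ndarray) -> np.ndarray:
    """First-order forward difference along the band (last) axis."""
    return np.diff(x, axis=-1)


def rank_of(x: np.ndarray) -> int:
    """Numerical rank, with an explicit tolerance to ignore rounding error."""
    return int(np.linalg.matrix_rank(x, tol=1e-8))


# In the original domain the stripe behaves like an extra low-rank component.
print("rank of clean data        :", rank_of(clean))
print("rank of striped data      :", rank_of(noisy))

# In the SDS the band-constant stripe cancels out.
sds_noisy = spectral_difference(noisy)
sds_clean = spectral_difference(clean)
print("stripe cancelled in SDS   :", np.allclose(sds_noisy, sds_clean))
print("rank of striped data, SDS :", rank_of(sds_noisy))
```

Because the stripe is constant along the spectral axis, differencing neighboring bands cancels it, and the rank of the differenced matrix cannot exceed that of the clean mixture. Real stripes are rarely perfectly constant across bands, so in practice the cancellation is only approximate.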
(1) The impact of parameters $\lambda_1$ and $\lambda_2$. The two parameters are related to the density of sparse noise (i.e., impulse noise, stripes, and dead lines) and the TV-regularized low-rank regularizer, respectively.
(2) The impact of the desired rank $r$. In the optimization of SDTVLA, we use the nuclear norm to encode the low-rank prior. Under the linear mixture model of HSI, as analyzed for the LRMR solver, the value of $r$ should be equal to the number of endmembers in the HSI. However, in the SDTVLA model, we exploit the low-rank property in the SDS. According to $\|LX\|_* \le \|X\|_*$ (here, $L$ is a linear transformation), the value of $r$ in the SDTVLA solver should be smaller than or equal to the number of endmembers. Figure 13c,d plot the MPSNR and MSSIM values as a function of $\lambda_1$ and $r$, respectively. From these plots, we can see that the parameter $r$ indeed has a strong impact on the result. However, when $r$ lies in the range [1, 4], SDTVLA can produce an optimal result.
(3) The impact of the spectral proportion ρ. The parameter ρ plays an important role in balancing the TV regularization and the low-rank constraint. Figure 13e,f plot the MPSNR and MSSIM values as a function of the parameter ρ. The plots clearly show the benefit of tuning this parameter.
(4) The convergence of the SDTVLA solver. Figure 14 presents the MPSNR and MSSIM gains as a function of the iteration number of the SDTVLA solver. The MPSNR and MSSIM values rapidly converge to stable values as the number of iterations increases. This behavior provides empirical evidence of the convergence of the SDTVLA solver.
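The parameter and convergence curves above are expressed in terms of MPSNR. As a hedged sketch of how such a curve could be produced and checked for a plateau, the code below computes the per-band PSNR average and tests whether the last few values have stabilized. It assumes cubes shaped (rows, cols, bands) and scaled to [0, 1]; `has_plateaued`, its window and its tolerance are hypothetical choices, and the iterates are simulated rather than produced by the ADMM solver.

```python
import numpy as np


def mpsnr(reference: np.ndarray, estimate: np.ndarray, peak: float = 1.0) -> float:
    """Mean PSNR in dB over the bands of cubes shaped (rows, cols, bands)."""
    mse = ((reference - estimate) ** 2).mean(axis=(0, 1))  # one MSE per band
    mse = np.maximum(mse, np.finfo(float).tiny)  # guard against log of zero
    return float((10.0 * np.log10(peak ** 2 / mse)).mean())


def has_plateaued(history, window: int = 5, tol: float = 1e-2) -> bool:
    """True if the last `window` values differ by less than `tol` dB."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) < tol


# Simulated iterates (not the real solver): the error shrinks geometrically
# toward a fixed residual level, so the metric should level off.
rng = np.random.default_rng(0)
reference = rng.random((16, 16, 8))
noise = 0.1 * rng.standard_normal(reference.shape)

history = []
for k in range(30):
    estimate = reference + noise * (0.05 + 0.5 ** k)
    history.append(mpsnr(reference, estimate))

print("first MPSNR values:", [round(v, 2) for v in history[:3]])
print("plateau reached   :", has_plateaued(history))
```

A flat metric curve of this kind is empirical evidence that the iterates have stabilized; it does not by itself establish convergence of the underlying optimization.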

Conclusions
In this paper, we proposed a new HSI mixed denoising method that combines TV regularization and low-rank approximation in the SDS. Specifically, the low-rank approximation in the SDS is employed to explore the global spectral correlation across all bands, and a TV regularizer in the SDS is adopted to describe the piecewise smoothness in the spatial-spectral domains of the HSI. Therefore, the proposed SDTVLA method exploits both spectrally global low rankness and spatially local smoothness in the spectral difference space. The results show that the TVLA term in the SDS can effectively remove complex mixed noise. Extensive experiments on three simulated HSI datasets and two real HSI datasets validate the superiority of the proposed SDTVLA solver over the other comparison denoisers in removing severe mixed noises, especially heavy Gaussian noise and structured sparse noise. Specifically, in the simulated experiments, the MPSNR values achieved by the proposed SDTVLA are about 1.5 dB higher than those of the second-best denoiser. In the experiment on the real Urban dataset, the proposed SDTVLA shows clear advantages in removing the complex mixed noises and the bias illumination.
Next, we are planning to incorporate the noise-adjusting modeling into our SDTVLA solver to further improve its capability of dealing with more complex noises. Meanwhile, we will also consider implementing the proposed algorithm on a GPU or cloud platform to reduce runtime for more real-time applications.