Hyperspectral Image Denoising Using Global Weighted Tensor Norm Minimum and Nonlocal Low-Rank Approximation

: A hyperspectral image (HSI) contains abundant spatial and spectral information, but it is always corrupted by various noises, especially Gaussian noise. Global correlation (GC) across spectral domain and nonlocal self-similarity (NSS) across spatial domain are two important characteristics for an HSI. To keep the integrity of the global structure and improve the details of the restored HSI, we propose a global and nonlocal weighted tensor norm minimum denoising method which jointly utilizes GC and NSS. The weighted multilinear rank is utilized to depict the GC information. To preserve structural information with NSS, a patch-group-based low-rank-tensor-approximation (LRTA) model is designed. The LRTA makes use of Tucker decompositions of 4D patches, which are composed of a similar 3D patch group of HSI. The alternating direction method of multipliers (ADMM) is adapted to solve the proposed models. Experimental results show that the proposed algorithm can preserve the structural information and outperforms several state-of-the-art denoising methods.


Introduction
A hyperspectral image (HSI) consists of hundreds of contiguous bands at specific wavelengths.They can deliver rich spectral information for real scenes.They have been widely used in many fields, such as urban planning, mapping, agriculture monitoring, forestry, and mineral resources exploration [1,2].But HSI is unavoidably corrupted by various noises, e.g., Gaussian noise, mixed Poisson-Gaussian noise, dead-lines, and stripes.The noise will influence the subsequent processing, such as classification [3][4][5][6][7], segmentation [8], unmixing [9,10], object detection [11,12], background subtraction [13], and super-resolution [14].The central limit theorem establishes that the composite effect of many independent noise sources (e.g., thermal noise, shot noise, etc.) should approach a Gaussian distribution.From a practical point of view, current imaging systems designed based on the assumption of additive Gaussian noise perform quite well.As a kind of signal independent noise, the Gaussian assumption has been broadly used in HSI denoising [4][5][6][7][8][9][10][11][12][13][14][15][16][17][18].From a theoretical point of view, Gaussian noise is the worst-case scenario for additive noise as the Gaussian distribution maximizes the entropy subject to a variance constraint [15].Therefore, reducing the noise of HSI, especially the Gaussian noise, has been an important preprocessing step in practical applications.
When we add noise to the image to simulate a degraded image, the noise is usually chosen from N(0, Σ), and Σ is the noise covariance matrix which may contain off-diagonal elements.When Σ = σ 2 I b , b is the number of bands, and σ 2 is the variance of the independent and identically distributed noise.In this case, N(0, Σ) is always written as N(0, σ 2 ) for the sake of simplicity.This represents spectrally uncorrelated noise.In the case of spectrally correlated noise, the noise covariance matrix Σ was generated by: where c i is the correlation coefficient between the noise in bands i and (i−1).This is the simple case where the noise in each band is correlated with that of its neighbors.Σ can be adapted to account for more complex interactions.In the literature, additive Gaussian noise is assumed to be spectrally uncorrelated, and the spectrally correlated noise can be transformed to spectrally uncorrelated noise by decorrelation methods.In this paper, we mainly consider reducing the Gaussian noise, which is spectrally uncorrelated.HSI denoising methods can be classified into different categories from different perspectives.In this work, we classify the HSI denoising methods from the mathematical view.It can be divided into three categories: vectorization, matricization, and tensor (here tensor is especially referred to whose order is more than two).An HSI contains hundreds of spectral channels, and each channel can be regarded as a grayscale image.In the tensor framework, HSI has three modes, one mode is the spectral dimension, the other two are spatial dimensions.Vectorization methods represent the HSI as a vector, such as Spatio-Spectral Total Variation (SSTV) [18], which may result in very high dimensionality and computation complexity.Matricization methods can be divided into two categories.One is band-wise (or band-by-band) processing.Each band of HSI is a gray image, and it can be denoised band-wise by traditional gray-level image denoising methods, such as the nonlocal-based algorithm [19], K-singular value decomposition (K-SVD) [20], and block-matching 3-D filtering (BM3D) [21].The band-wise methods ignore the correlation in the spectral domain, leading to relatively poor performance.The second category, [10] build the HSI denoising methods under a matrix framework by reshaping an HSI into a matrix, which evidently is more reasonable than band-wise methods.By contrast, though HSI denoising methods based on matrix always outperform the band-wise method, the intrinsic spatial structure of HSI is also inevitably destroyed compared with the methods with the tensor model [22].
Spectral information in an HSI is of great importance in HSI analysis.Therefore, it is essential for HSI denoising techniques to preserve the spectral information.Low rank (LR) prior can reveal the low-dimensional structure in high-dimensional data.It has always been used to regularize the denoising problem [22][23][24].The tensor-based approach is feasible and effective for HSI processing from a physical point of view.Tensor-based approaches have achieved promising results in HSI denoising [25][26][27][28][29][30][31][32][33][34][35][36] as they can process the spatial and spectral information jointly.Renard et al. [25] proposed a low-rank-tensor-approximation (LRTA) method by employing the Tucker factorization method to obtain the LR approximation of an input HSI.Because LRTA depends on the estimation of the rank of tensor, it may result in unstable results.Liu et al. [27] designed the parallel factor analysis (PARAFAC) method by utilizing the parallel factor analysis [28] to denoise HSI.It regards the two spatial dimensions as two modes of tensor, this will lead to vertical and horizontal artifacts.Xue et al. [29] used the rank-1 tensor decomposition for HSI denoising.However, the number of components of this decomposition cannot be estimated precisely, and the accurate estimation of rank is difficult to guarantee.By jointly considering Tucker and Canonical Polyadic(CP) factorization, a new sparsity measure is presented to characterize tensor data with physical meaning, and it can achieve better recovery from a corrupted multispectral image (MSI) [30].
Considering the nonlocal self-similarity across space in an MSI, a tensor dictionary learning-based (TDL) model [31] is proposed for denoising by enforcing hard constraints on the rank of the core tensor.In real data, this constraint is not guaranteed as the real rank of the core tensor is hard to determine.Yan et al. [34] considered the low-rankness of patches and Hao et al. [35] considered the unfolding matrices low-rankness of HSI.But the former method ignored the correlation of different modes and the latter ignored the global low-rankness of the whole HSI (see Figure 1).Figure 1 illustrates the low-rankness of the unfolding (matricization) of HSI along three modes.The blue curves in Figure 1b,c are the singular values of matrices unfolding along mode-1 and mode-2 (spatial dimension).As both of them show almost the same tendency, it implies they have the same property in the spatial domain along mode-1 and mode-2.The fast decaying trend of the blue curve in Figure 1d indicates the strong correlation along mode-3 (spectral dimension).The yellow squares in Figure 1b-d are examples of similar matrix patches in the unfolding matrices along three modes which imply global similarity of HSI.Similar cube patches in the HSI can be used to enhance denoising performance.Xue et al. [36] have shown good results with nonlocal similarity consideration.However, the singular value of global HSI, or the weighted nuclear norm introduced in [37], should be treated differently to further improve performance.How to fully explore the high correlation and nonlocal similarity is still a challenge for HSI denoising algorithms.By taking into account of the global correlation and nonlocal similarities jointly, we proposed a novel denoising approach.
Remote Sens. 2018, 10, x FOR PEER REVIEW 3 of 23 unfolding matrices low-rankness of HSI.But the former method ignored the correlation of different modes and the latter ignored the global low-rankness of the whole HSI (see Figure 1).Figure 1 illustrates the low-rankness of the unfolding (matricization) of HSI along three modes.The blue curves in Figure 1b  The main contributions of this paper are three-fold.First, instead of only exploring mode-3 LR prior knowledge of the clean HSI [35], we also consider the LR property of mode-1 and mode-2, which is called global low-rankness.This idea makes full use of the global information in both the spectral and spatial domains.
Second, we use k-nearest neighbor (k-NN) to cluster the nonlocal similar patches (3-order tensor) to construct a 4-order tensor, in which the third dimension is to keep spectral consistent with the HSI itself.These constructed tensors are LR.This is different from the method in [35], which reshapes these similar patches to a new 3-order tensor.The main contributions of this paper are three-fold.First, instead of only exploring mode-3 LR prior knowledge of the clean HSI [35], we also consider the LR property of mode-1 and mode-2, which is called global low-rankness.This idea makes full use of the global information in both the spectral and spatial domains.
Second, we use k-nearest neighbor (k-NN) to cluster the nonlocal similar patches (3-order tensor) to construct a 4-order tensor, in which the third dimension is to keep spectral consistent with the HSI itself.These constructed tensors are LR.This is different from the method in [35], which reshapes these similar patches to a new 3-order tensor.
Third, the joint global correlation (GC) and nonlocal low-rankness regularization are integrated into a single scheme, in which the alternative direction minimization method (ADMM) [38,39] is utilized and extended in the proposed model.Extensive experimental testing shows that the proposed method can effectively remove the Gaussian noises and recover details of the original scene.When compared with state-of-the-art methods, the proposed method has done well in both the quantitative evaluation and the visual comparison.
The remainder of this paper is organized as follows: Section 2 reviews the related work and details the proposed method for HSI denoising.Then, the denoising mathematical algorithm and optimization procedure are presented in Section 3. Experimental results and discussions about the experimental results are reported in Sections 4 and 5, respectively, and conclusions are drawn in Section 5.

Motivation
We use boldface Euler script letter (e.g., X), boldface capital letter (e.g., X), boldface lowercase letter (e.g., x), lowercase letter (e.g., x) to denote, respectively, tensor, matrix, vector, and scalar.For an N order tensor where R is the real manifold.Its fiber is a vector defined by fixing all indices but one [13].Its L 1 norm and Frobenius norm are defined as , which is obtained by taking all the mode-n fibers to be the columns of the resulting matrix.For a tensor X, its n-mode product with a matrix The Tucker decomposition of tensor X is defined as a core tensor multiplied by a matrix along each mode [17]: where G is the core tensor of the same size as X, and U i ∈ R I i ×I i is the orthogonal factor matrix which can be regarded as the principal component in mode i.For more details of tensor formulation, the reader is referred to [13].
The degradation model with additive Gaussian noise of HSI can be represented by Y = X + N , where Y, X, N ∈ R d×h×b represent the noisy HSI, clean HSI, and Gaussian noise, respectively, and d, h, b denote the width, height, and number of bands of the HSI, respectively.In this paper, we assume the additive white Gaussian noise N is with zero mean and variance σ 2 .
With the knowledge of LR prior, the denoising problem of matrix X can described as [37] argmin µ is the tradeoff parameter.Similarly, the denoising problem of HSI in tensor format can be described as In the low-rank approximation problem, the nuclear norm is usually introduced as the surrogate functional of LR constraint.The matrix unfolding along mode-3 is LR [22].Through unfolding along mode-3 with the weighted nuclear norm minimum (WNNM) described in [37], excellent performance in noise removal has been shown in [35].However, the method in [31] ignores the fact that the unfolding matrix along each mode is coded information which means that both matrices unfolding along mode-1 and mode-2 have the same LR property.
For each matrix, the nuclear norm minimum method treats each singular value equally, and it will lead to a sub-optimum estimation.To overcome this problem, the denoising problem based on the global correlation can be described as X w, * = N i=1 α i X (i) w, * and X w, * = i w i σ i (X) 1 is the weight nuclear norm of matrix X [37], and w = [w 1 ,w 2 , . . .,w n ] (w i ≥ 0) is the non-negative weight of σ i (X), where w k i = C/ σ i (X) + ε , and C=0.05 is a constant, ε > 0 is a small number to avoid zero numerator.σ i (X) is the i-th largest singular value of X.In this paper, the original HSI is treated as a 3-order tensor, so N=3, and through some experiments, we find that α 1 , α 2 should be equal and they should be smaller than α 3 when the denoising result is good.We have conducted various experiments and have empirically chosen

The Low-Rankness Approximation of Nonlocal Similar Patches Groups
The HSI represents continuous imaging of the same object across the spectral domain.Spectral measurements taken from the same objects or materials in different locations are similar, as they exhibit almost the same spectral reflectance.The patches we call in the following are referred to as 3-D full-band patches (FBPs).As these patches contain the nonlocal similarity information in the spatial domain and the global correlation information across all spectral bands, they are usually used as prior knowledge in denoising methods [22,31].However, the matricization method usually concatenates the patches to a matrix, this procession always results in spectral distortion.Based on GC prior, noise in the HSI will be removed globally, but local spatial and spectral distortion will appear, and there will be much residual noise.The distortion and residual noise can be removed efficiently based on nonlocal low-rank regularization [40], which uses nonlocal self-similarity (NSS) to characterize HSI [41].The similarity is evaluated by the Frobenius norm of the Euclidean distance of given two patches.Smaller norms represent a higher similarity [42].
Motivated by [30], we separate noisy-free HSI X into many overlapped 3D patches of the size t × t × b, where t × t is the spatial dimension, and b is the number of bands of HSI.These patches are collected in a patch set S : S = P i ∈ R t×t×b , i ∈ Γ , where Γ represents the index set and P i is the i-th 3D patch in this set.For each exemplar patch P i , we search for its similar patches by k-nearest neighbor (k-NN) [43,44] method in the spatial domain.Here, two patches are considered similar if the Euclidean distance between two patch vectors is smaller than a given threshold.For each 3D patch of S, we search for similar 3D patches from a big window around this 3D patch.Then S can be grouped into J clusters and these similar 3D patches in each cluster are reshaped into a 4-order tensor R p (X) of size t × t × b × N 0 , where p is the index of the p-th cluster, as shown in Figure 2, and N 0 is the number of patches.The selection of J is illustrated in Section 4.4.It is more reasonable than reshaping each cluster into a 3-order tensor, as described in [30].Because the patches in each cluster have very similar structures, R p (X) can be approximated by a LR tensor L p , i.e., R p (X) ≈ L p .The corresponding optimization problem is As an HSI has strong correlation across the spectrum, L p (3) is LR.The patches in each cluster have a very similar structure which implies that L p (4) is also LR.So L p can be represented by Tucker , where G p is core tensor, and U 1p , U 2p , U 3 , U 4p are factor matrices orthogonal in columns, e.g., U T ip U jp = I, U T 3 U 3=I.Note that the factor matrices in the spectral mode for all p are set as a shared matrix U 3 , insuring that L p is low-rank in the spectral mode on the whole.Similar to the global denoising problem, the ideal HSI's patches can be estimated by solving the following optimization problem: where • w, * is tensor norm described in (1).As an HSI has strong correlation across the spectrum, (  ) ( 3) is LR.The patches in each cluster have a very similar structure which implies that (  ) ( 4) is also LR.So   can be represented by Tucker decomposition: , where   is core tensor, and  1 ,  2 ,  3 ,  4 are factor matrices orthogonal in columns, e.g.,   T   = ,  3 T  3 =I.Note that the factor matrices in the spectral mode for all p are set as a shared matrix  3 , insuring that  p is lowrank in the spectral mode on the whole.Similar to the global denoising problem, the ideal HSI's patches can be estimated by solving the following optimization problem： where ‖•‖ w, * is tensor norm described in (1).
By replacing   in (6) with the corresponding Tucker decomposition, we obtain the following problem: where   is core tensor.

Proposed method
Considering both global and nonlocal low-rankness, we propose the following regularized optimization problem： The unconstrained version of ( 8) is

Optimization procedure and algorithm Results
The scheme of the proposed HSI denoising method is summarized in Figure 3.To solve the proposed denoising model, we apply the variable splitting technique [39], and introduce new By replacing L p in ( 6) with the corresponding Tucker decomposition, we obtain the following problem: min where G p is core tensor.

Proposed Method
Considering both global and nonlocal low-rankness, we propose the following regularized optimization problem: The unconstrained version of ( 8) is

Optimization Procedure and Algorithm Results
The scheme of the proposed HSI denoising method is summarized in Figure 3.To solve the proposed denoising model, we apply the variable splitting technique [39], and introduce new auxiliary variables Q and Z p (p = 1,2,...,K).Replacing variable X in the second term and G p in the third term, problem (8) can be reformulated as By introducing two proper parameters, α and β, (10) can be changed to the unconstrained version: The proposed algorithm for denoising can now be summarized in Algorithm 1. Please refer to the Appendix A for the detail of the optimization process with this algorithm.

Experimental results and discussion
Experiments on both simulated and real HSI data were executed to qualitatively and quantitatively assess the denoising performance of the proposed approach.We implemented five state-of-art HSI denoising methods for comparison, namely HyRes [22], LRTA [25], PARAFAC [27], tensor dictionary learning (TDL) [30] , Intrinsic Tensor Sparsity (ITSReg) [31], and, tensor singular value decomposition (t-SVD) [32].Here, the first four compared algorithms were based on tensor, although the last one was based on matrix, but it can deal with 3-D HSI data such as tensor, so we choose it to compare with the proposed method.Their implementation codes can be directly obtained from the authors' websites.In our experiments, the parameter settings of the compared methods were the default setting provided in the reference papers.

Experiment on Simulated Noisy Data
The first simulated experiments were conducted with the Washington D.C. dataset (WDC for short) (https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html).This image comprised 1280 lines and 307 columns, with a spatial resolution of 3m and 210 bands.We extracted a 341×307

Experimental Results and Discussion
Experiments on both simulated and real HSI data were executed to qualitatively and quantitatively assess the denoising performance of the proposed approach.We implemented five state-of-art HSI denoising methods for comparison, namely HyRes [22], LRTA [25], PARAFAC [27], tensor dictionary learning (TDL) [30], Intrinsic Tensor Sparsity (ITSReg) [31], and, tensor singular value decomposition (t-SVD) [32].Here, the first four compared algorithms were based on tensor, although the last one was based on matrix, but it can deal with 3-D HSI data such as tensor, so we choose it to compare with the proposed method.Their implementation codes can be directly obtained from the authors' websites.In our experiments, the parameter settings of the compared methods were the default setting provided in the reference papers.

Experiment on Simulated Noisy Data
The first simulated experiments were conducted with the Washington D.C. dataset (WDC for short) (https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html).This image comprised 1280 lines and 307 columns, with a spatial resolution of 3m and 210 bands.We extracted a 341 × 307 sub-image, and the bands which were seriously degraded were removed.The data were normalized to the range of [0, 1], and the variance of added noise varied from 0.1 to 0.3.
Five quantitative image quality assessment indices (IQAs) were employed for performance evaluation, including peak signal-to-noise ratio (PSNR) [45], structure similarity (SSIM) [45], erreur relative globale adimensionnelle de synthese (ERGAS) [46],feature similarity (FSIM) [47], and spectral angle mapper (SAM) [48].PSNR and SSIM were utilized to evaluate the similarity between the target image and the reference image based on mean squared error (MSE) and structural consistency, respectively.The unit of PSNR was dB.The FSIM was utilized to evaluate the perceptual consistency with the reference image.Larger values of these three measures represent better results.ERGAS measures fidelity of the restored image based on the weighted sum of MSE in each band, and SAM calculates the average angle between spectrum vectors of the target HSI and the reference one across all spatial positions, so it fully reflects the fidelity of the spectral reflectance of the target HSI.However, different from the former three measures, smaller values of these two measures represent better denoising results of the HSI.In this paper, the denoising IQAs (MPSNR, MSSIM, MFSIM, MERGAS, or MSAM) for each HSI were calculated as the average of all the bands.
The performances of all methods on WDC dataset at different noise intensity levels are listed in Table 1.It can be found that the indices of the proposed method with ERGAS and SAM were lower than the compared methods, while the other three indices of the proposed method were higher than these methods.The visual superiority of our method with the results of compared methods is also obvious.To further depict the visual denoising performance of our method, the denoising results under variance 0.3 are shown in Figure 4 with a false-color image.The red, green, and blue channels are the composition of bands 60, 27, and 17 as described in https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html.It is observed from the enlarged part of the denoising results that there were obvious visual artifacts in the results of the PARAFAC and LRTA.The visual artifacts of the PARAFAC came from the fact that the two spatial dimensions should not be considered as two modes of tensor.With such decomposition, the outer products will lead to vertical and horizontal artifacts(see Figure 4d, Figure 5d, and Figure 6d).The details were over-smoothed with t-SVD and ITSReg, as shown in the zoomed square.The TDL considers only the nonlocal similarity but ignores the global correlation, so the reconstructed HSI by TDL was not so completed compared with the proposed method.Even though the false-color image of HyRes looks similar to the proposed method, blurred parts are still observed from the enlarged part.To verify the effectiveness of the algorithm in different datasets and at different noise intensity levels, the Urban dataset with size 301 × 201 × 162 was used.Figures 5 and 6 show the denoising results with noise variance 0.2 of band 42 and 74, respectively.A similar conclusion can be drawn in the experiment on this dataset (see Table 2 and Figures 5 and 6).Our algorithm not only suppresses noise but also preserves image details and texture.4 with a false-color image.The red, green, and blue channels are the composition of bands 60, 27, and 17 as described in https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html.It is observed from the enlarged part of the denoising results that there were obvious visual artifacts in the results of the PARAFAC and LRTA.The visual artifacts of the PARAFAC came from the fact that the two spatial dimensions should not be considered as two modes of tensor.With such decomposition, the outer products will lead to vertical and horizontal artifacts(see Figures 4d, 5d, and 6d).The details were over-smoothed with t-SVD and ITSReg, as shown in the zoomed square.The TDL considers only the nonlocal similarity but ignores the global correlation, so the reconstructed HSI by TDL was not so completed compared with the proposed method.Even though the false-color image of HyRes looks similar to the proposed method, blurred parts are still observed from the enlarged part.To verify the effectiveness of the algorithm in different datasets and at different noise intensity levels, the Urban dataset with size 301×201×162 was used.Figures 5 and 6 show the denoising results with noise varian        As shown by the enlarged part of the denoising results, PARAFAC fails to maintain the structural integrity and generated obvious artifacts because it lacks accurate estimation for the rank of HSI.The result of LRTA still has much residual noise.As an extension of matrix singular value decomposition, the t-SVD still treats each singular value equally in the diagonal.It usually results in over-smooth in some regions and leaves residual noise in others.Although TDL considers nonlocal similarity in the spectral domain, it has higher spectral distortion and cannot preserve the details.While ITSReg has considered the intrinsic structure, it ignores the nonlocal similarity.The HyRes shows almost the same performance as the proposed method.By comparing with the LRTA, PARAFAC, TDL, t-SVD, and ITSReg, the proposed method produces better results on Gaussian noise removal.As we consider the global correlation and nonlocal similarity, it achieves the better visual outcome and spectral fidelity.Figure 7 shows the PSNR, SSIM, and FSIM values of each band of the Urban dataset with noise variance at 0.2.The PSNR, SSIM, and FSIM values of the proposed method are higher than those of the compared methods, indicating better noise removal.
To further investigate the performance of preserving useful spectral information while removing noise, we plotted the spectral values curve at spatial locations (100,100) and (150,150) in Figure 8a,b, respectively.The results are contrast-stretched to 0-255 for better visualization.The closer to the original spectrum means the spectral information preservation is better.Though the HyRes showed almost the same performance with the proposed method, it showed severe distortion in various bands.The same results have been observed with the other five compared methods.This result is in consistency with the IQAs and visual inspection mentioned above.
To give some qualitative comparison, the qualitative index of the mean cross-track profile was adopted.We show the horizontal mean profiles of 35th band in Indian Pine set before and after denoising in Figure 9.The horizontal axis in Figure 9 represents the row number, and the vertical axis represents the mean digital number values of each row.As shown in Figure 9a, the curve has sharp fluctuation because of the noise.The fluctuation is more or less suppressed by all the methods.The proposed method provides evidently smoother curves while preserving the spatial information.To further investigate the performance of preserving useful spectral information while removing noise, we plotted the spectral values curve at spatial locations (100,100) and (150,150) in Figures 8a     and 8b, respectively.The results are contrast-stretched to 0-255 for better visualization.The closer to the original spectrum means the spectral information preservation is better.Though the HyRes showed almost the same performance with the proposed method, it showed severe distortion in various bands.The same results have been observed with the other five compared methods.This result is in consistency with the IQAs and visual inspection mentioned above.

Mean DN
Original Image Denoised Image

Real HSI Denoising
In this section, we present the results of the Indian Pine dataset (https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html).Its size is 145 × 145 × 220, and spatial resolution is 20 m.The implementation strategies and parameter settings for all competing methods were the same as the above-simulated experiments.Since the noise level was unknown, we used the no-reference HSI quality assessment (NHQA for short) [49] method, which is based on quality-sensitive features.The results are listed in Table 3, from which we see that the value of the proposed method was lower than others, which indicates better performance.By analyzing Tables 1-3, we can conclude that the proposed method works better than compared methods, no matter what the noise level and testing data (synthetic data or real data) are.It demonstrates the potentials of the global and nonlocal low-rank constraints in our algorithm.The proposed algorithm provides competitive results to the state-of-the-art algorithms.
Figure 10 shows the 107 th band of the restored images obtained by all competing methods.Figure 10a shows that the original HSI bands were corrupted by heavy noise.Although the restored results from Figure 10b to Figure 10f look satisfactory, in Figure 10b-d, there still exists residual noise, and Figure 10e,f show over-smooth with missing details.Comparatively, albeit less prior knowledge was considered, our method can better restore textures and edge details, and manage to remove structural noises, as shown in Figure 10h.Some stripes and random noises were removed.This further substantiates the robustness of the proposed method.3, from which we see that the value of the 347 proposed method was lower than others, which indicates better performance. 348

356
proposed algorithm provides competitive results to the state-of-the-art algorithms.

357
Figure 10 shows the 107 th band of the restored images obtained by all competing methods.Figure 358 10a shows that the original HSI bands were corrupted by heavy noise.Although the restored results

359
from Figure 10b to Figure 10f

Compare of Computational Costs
In this section, we compared the computational costs of different models for these datasets: the WDC (size 341 × 307 × 160), URBAN (size 145 × 145 × 220), and Indian Pine (size 145 × 145 × 220).The running time (in seconds) for the denoising task by LRTA, PARAFAC, TDL, t-SVD, ITSReg, and the proposed method are listed in Table 4.All the experiments were implemented on Windows 7, the Core i5-7300HQ, CPU@2.5 GHz, and the 8 G memory platform by MATLAB R2014a.The LRTA method is the fastest.But its denoising performance is lower compared to the other methods.The proposed method needs to search for similar 3D patches and the optimization procedure with Tucker decomposition.Both make it relatively slow.For the optimization procedure, though each sub-problem has the closed-form solution in the ADMM framework, the operation time is still long.For future work, we intend to investigate how to reduce the processing time.Although the HyRes obtains similar performance both qualitatively and quantitatively, it can be observed from Figures 8  and 9 that the denoised HSI suffers from spectral distortion.

Parameter Selection and Analysis of Convergence
In Algorithm 1, there are eight parameters, i.e., α, β, λ, µ, η, R 1 , R 2 , and R 3 .Since the auxiliary variable Q should be close to the estimated X.The regularization parameter α will gradually increase, where the error between the two variables will decrease with increasing iterations.In all experiments, we initialize d α as 10 and updated it by α = 1.05 * α.Similarly, we initialized β=10 and updated it by new β = 1.05*β.The R 1 and R 2 for factor matrices U 1 and U 2 control the complexity of spatial redundancy.They were empirically set as R 1 = ceil(h × 0.6) and R 2 =ceil(d × 0.6) in all conducted experiments as this setting works fairly well.The R 3 , which controls the complexity of temporal redundancy, should be carefully tuned with each dataset.We set R 3 as 1 for the real-world datasets.
The parameter λ controls the error of LR approximation.In our experiments with simulated data, we initialized λ = 40/σ and λ = 10/σ for WDC dataset and Indian Pine dataset, respectively.The σ here is the variance of Gaussian noise.For the real data experiments, we set λ = 1.5 in the first iteration.Then, we gradually increased the value of λ with error decreasing.
The parameters of µ and η balanced the regularizations from the spectral domain and spatial domain, respectively.To determine the optimal values of these two parameters, we conducted simulated experiments.Figure 11 shows the example using mean PSNR (MPSNR) as the selection criterion, then a greedy strategy was applied to select the parameter values one by one.Although the parameters obtained by this method are not global optimum, it has achieved favorable denoising performance.
The function curve of MPSNR values with the regularization parameters µ and η are plotted in Figure 11a,b.MPSNR is insensitive to different values of µ and η.Therefore, we can conclude that the proposed method is robust with any µ and η.For convenience, we set µ and η to 1 in all the experiments with both simulated and real datasets.

408
of MPSNR versus the J are displayed in Figure 12.When J=45, it achieved the best MPSNR.We also 409 observe that MPSNR declined slightly when J was increased from 45 to 60.This is mainly because the 410 similarity within a cluster cannot be guaranteed when J is too large.In other words, the parameter J 411 is optimal when the denoising performance has reached a plateau.Additionally, we conducted experiments to understand the best number of clusters J.The results of MPSNR versus the J are displayed in Figure 12.When J = 45, it achieved the best MPSNR.We also observe that MPSNR declined slightly when J was increased from 45 to 60.This is mainly because the similarity within a cluster cannot be guaranteed when J is too large.In other words, the parameter J is optimal when the denoising performance has reached a plateau.Additionally, we conducted experiments to understand the best number of clusters J.The results of MPSNR versus the J are displayed in Figure 12.When J=45, it achieved the best MPSNR.We also observe that MPSNR declined slightly when J was increased from 45 to 60.This is mainly because the similarity within a cluster cannot be guaranteed when J is too large.In other words, the parameter J is optimal when the denoising performance has reached a plateau.

A comparison of state-of-the-art clustering methods
In Table 5, the proposed tensor-based learning, traditional deep learning method, and the clustering approach in [7] are compared.Both of the clustering approach in [7] and our proposed method can be regarded as tensor-based learning.However, there are still some differences.The main difference is that the clustering approach in [7] is CP (rank-1) decomposition (CPD), while our method is built on Tucker decomposition (TD).For each constructed 4-D tensor, which possesses the two local spatial modes, one spectral mode, and one nonlocal spatial mode, each mode has a specific physical meaning.Compared to CPD without reasonable interpretation to descript information prior to each mode, TD has a stronger ability to characterize the low-rank property of HSI.For other differences, please refer to Table 5.

A comparison of state-of-the-art clustering methods
In Table 5, the proposed tensor-based learning, traditional deep learning method, and the clustering approach in [7] are compared.Both of the clustering approach in [7] and our proposed method can be regarded as tensor-based learning.However, there are still some differences.The main difference is that the clustering approach in [7] is CP (rank-1) decomposition (CPD), while our method is built on Tucker decomposition (TD).For each constructed 4-D tensor, which possesses the two local spatial modes, one spectral mode, and one nonlocal spatial mode, each mode has a specific physical meaning.Compared to CPD without reasonable interpretation to descript information prior to each mode, TD has a stronger ability to characterize the low-rank property of HSI.For other differences, please refer to Table 5.

A comparison of State-of-the-Art Clustering Methods
In Table 5, the proposed tensor-based learning, traditional deep learning method, and the clustering approach in [7] are compared.Both of the clustering approach in [7] and our proposed method can be regarded as tensor-based learning.However, there are still some differences.The main difference is that the clustering approach in [7] is CP (rank-1) decomposition (CPD), while our method is built on Tucker decomposition (TD).For each constructed 4-D tensor, which possesses the two local spatial modes, one spectral mode, and one nonlocal spatial mode, each mode has a specific physical meaning.Compared to CPD without reasonable interpretation to descript information prior to each mode, TD has a stronger ability to characterize the low-rank property of HSI.For other differences, please refer to Table 5.

Conclusions
In this paper, we proposed an HSI denoising method by jointly utilizing nonlocal and global low-rankness of HSI.Global low-rankness was exploited via three modes unfolding matrices.This approach exploited the structural information of the original HSI.To take advantage of nonlocal similarity, an LR constraint was added as regularization.This approach was also utilized to exploit the original structural information of image patches and structured sparsity of similar patches.We also split the variables and designed an efficient ADMM algorithm to solve the model.From the experiments with both simulated and real datasets, we conclude that the proposed method based on the intrinsic features of GC and NSS is more efficient.The values of ERGAS and SAM were lower while the FSIM, SSIM, and PSNR were higher (Tables 1 and 2).Figures 8 and 9 illustrate the efficiency of the proposed method where spectral features and spatial features were more consistent with the references.The experimental results demonstrate that our method can achieve competitive performance compared with other state-of-the-art methods.However, the parameters have to be selected experimentally, and the time cost is considered high.This is a disadvantage.In the future, we will concentrate on accelerating the speed of the algorithm to improve its practical significance.

2 (Figure 1 .
Figure 1.Illustration of global spatial-and-spectral correlation in hyperspectral images (HSI).(a) Original HSI and (b) the matrix unfolding by mode-1 and its singular value curve; (c) the matrix unfolding by mode-2 and its singular values curve, and (d) the matrix unfolding by mode-3 and its singular values curve.The x-label is the length of the unfolding matrix.

Figure 1 .
Figure 1.Illustration of global spatial-and-spectral correlation in hyperspectral images (HSI).(a) Original HSI and (b) the matrix unfolding by mode-1 and its singular value curve; (c) the matrix unfolding by mode-2 and its singular values curve, and (d) the matrix unfolding by mode-3 and its singular values curve.The x-label is the length of the unfolding matrix.

Figure 2 .
Figure 2. The 4-order tensor grouped in each cluster can be reconstructed by low-rank Tucker decomposition.

Figure 2 .
Figure 2. The 4-order tensor grouped in each cluster can be reconstructed by low-rank Tucker decomposition.

Figure 3 .
Figure 3. Scheme of the proposed HSI denoising method using global and nonlocal low-rank (LR) regularizations.

Figure 3 .
Figure 3. Scheme of the proposed HSI denoising method using global and nonlocal low-rank (LR) regularizations.

Figure 7 .
Figure 7.Comparison of IQAs of the denoising results with variance 0.20 Gaussian noise: including

Figure 9
Figure 9 Horizontal mean profiles of 35 th band in WDC data, the horizontal axis represents the row number, and the vertical axis represents the mean digital number values of each row.(a) LRTA, (b) PARAFAC, (c) TDL, (d) tSVD, (e) ITSReg, (f) HyRes, (g) Proposed

Figure 9 .
Figure 9. Horizontal mean profiles of 35 th band in WDC data, the horizontal axis represents the row number, and the vertical axis represents the mean digital number values of each row.(a) LRTA, (b) PARAFAC, (c) TDL, (d) tSVD, (e) ITSReg, (f) HyRes, (g) Proposed.
Remote Sens. 2018, 10, x FOR PEER REVIEW 16 of 23 spatial resolution is 20 m.The implementation strategies and parameter settings for all competing 344 methods were the same as the above-simulated experiments.Since the noise level was unknown, we 345 used the no-reference HSI quality assessment (NHQA for short) [49] method, which is based on 346 quality-sensitive features.The results are listed in Table

Figure 10
Figure 10 Visual quality comparison of the denoising results of all methods on band 107 in Indian

Figure 10 .
Figures 10e,f show over-smooth with missing details.Comparatively, albeit less prior knowledge was 361

Figure 12 .Figure 13 .Figure 14 .
Figure 12.Relationship between MPSNR and the cluster number J.4.5.Analysis of ConvergenceTo illustrate the convergence of the proposed algorithm, we provide the convergence curves with MPSNR and MSSIM of the Urban dataset.The curves under different noise levels are shown in Figure13a,b.It can be seen that the values of MPSNR and MSSIM do not change after about 20 iterations.In addition, we also provide the convergence curves of the synthetic WDC dataset and Indian Pine dataset in Figure14a,b.The objective function values were used as the assessment index of algorithm convergence.The steepest decline happened in the first 20 iterations.

Figure 14 .
Figure 14.Convergence curves for (a) WDC dataset under noise level 0.2 and (b) Indian Pine dataset.

Table 1 .
Average performance of six competing methods with respect to five IQAs on the WDC dataset.
under variance 0.3 are shown in Figure

Table 2 .
Average performance of six competing methods with respect to five IQAs on the URBAN dataset.

Table 2 .
Average performance of six competing methods with respect to five IQAs on the URBAN dataset.

Table 4 .
Comparisons of computational time for the denoising methods (in second).

Table 5 .
Comparisons of differences between tensor-based learning and traditional deep learning.

Table 5 .
Comparisons of differences between tensor-based learning and traditional deep learning.

Table 5 .
Comparisons of differences between tensor-based learning and traditional deep learning.