Hyperspectral Image Denoising Based on Nonlocal Low-rank and TV Regularization

: Hyperspectral image (HSI) acquisitions are degraded by various noises, among which additive Gaussian noise may be the worst-case, as suggested by information theory. In this paper, we present a novel tensor-based HSI denoising approach by fully identifying the intrinsic structures of the clean HSI and the noise. Specifically, the HSI is first divided into local overlapping full-band patches (FBPs), then the nonlocal similar patches in each group are unfolded and stacked into a new third order tensor. As this tensor shows a stronger low-rank property than the original degraded HSI, the tensor weighted nuclear norm minimization (TWNNM) on the constructed tensor can effectively separate the low-rank clean HSI patches. In addition, a regularization strategy with spatial–spectral total variation (SSTV) is utilized to ensure the global spatial–spectral smoothness in both spatial and spectral domains. Our method is designed to model the spatial–spectral non-local self-similarity and global spatial–spectral smoothness simultaneously. Experiments conducted on simulated and real datasets show the superiority of the proposed method.


Introduction
For various hyperspectral image (HSI) applications, it is important to fully exploit useful spatialspectral features of HSI. However, because of the limitations of the hyperspectral imaging system and the influence of the atmospheric environment and other transmission factors, the captured HSIs are always contaminated by various noises during image acquisition, among those the Gaussian noise is the most common and most challenging [1]. This makes HSI denoising a necessary preprocessing step for HSI applications, including classification [2], super-resolution [3,4], compressive sensing [5,6], and so forth.
Traditionally, HSI can be denoised by a vector method [7] or matrix method [8][9][10][11][12]. Unfolding all of the bands in HSI to a long vector is done by the vector method [7]. This kind of method has a high processing speed at the cost of destroying the spatial structure and spectral correlation. The matrix method can be divided into the following two categories: band-by-band method and tensormatrixing method [8]. The former is a natural generalization of the procession of a gray-level image. However, as it ignores the correlation between the adjacent spectral bands, such a method cannot provide satisfactory results. Tensor-matrixing is conducted by unfolding each band into a vector, and all of the vectors are cascaded into a matrix. Although this kind of method considers spectral correlation, the spatial structure could still be destroyed. Given the shortcomings of such methods, more effective strategies and methods have been proposed targeting the correlation in both the spatial and spectral domains. For example, a spatial-spectral wavelet shrinkage method has been proposed by the authors of [9] in order to utilize the difference in both the spatial and spectral domains of HSI. To simultaneously utilize the spatial and spectral dependences in a unified probabilistic framework, Yuan et al. [10] proposed a spectral-spatial adaptive total variation model. Some advanced techniques in traditional image processing have also been adopted for HSI denoising, such as nonlocal similarity [11] and anisotropic diffusion [12].
Low-rank (LR) is an important property and common characteristic of HSI, and various approaches based on the LR constraint have been proposed for HSI denoising [13][14][15][16][17][18]. One of the popular approaches for the LR constraint is rank minimization [19,20], in which a nuclear norm [21] is applied in order to estimate the rank of a matrix. However, shrinking the singular values of the nuclear norm (NN) equally will lead to over-estimating or under-estimating the matrix rank. To overcome this problem, Gu et al. [19,20] proposed a weighted nuclear norm minimization (WNNM) model. From a physical point of view, each singular value has a special physical meaning-WNNM considers that larger singular values signify more physical information. Therefore, each singular value should be treated differently. In particular, the large singular values of a clean image carry more physical information, they should be assigned larger weights, and small singular values should be assigned smaller weights. For better denoising results, WNNM is always combined with total variation (TV); the advantage of TV regularization is that it removes noise while keeping the edge texture of HSI. In the literature [22] and [23], TV-regularized WNNM was proposed for HSI denoising combined with spatial low-rankness and spectral piecewise smoothness. Although these methods improve the denoising performance, they still have some disadvantages. Firstly, they deal with spatial domain and spectral domain separately, which may have adverse effects on noise removal. Secondly, these methods fail to fully exploit the prior knowledge on the intrinsic structures of HSI. The recent development of tensor technologies can tackle the aforementioned problem. For example, our previous work [24] enhanced the denoising performance by considering the global and nonlocal low-rank property. The method in the literature [25] integrated the structure tensor TV [26] into the WNNM model, and outperformed the band-by-band TV-regularized WNMM method.
To overcome the drawbacks, we present a novel model to jointly consider the spatial nonlocal similarity and high spectral correlation. To summarize, our contributions are as follows.
First, each group of full band patches (FBPs) is collected by nearest neighbor search (NNS) [27], and the matrix-based WNNM is extended to tensor-based WNNM (TWNNM) so as to keep the multidimensional structure.
Second, to reserve a more refined structure, we use 3D weighted total variation regularization to exploit the prior local smoothness in spatial-spectral domain.
Third, we propose a novel HSI denoising model by combining low-rank and TV, and the alternating direction method of multipliers (ADMM) is designed to solve the proposed model. We conduct experiments on both synthetic and real datasets so as to illustrate the validity and efficiency of the proposed method. Figure 1 shows the flowchart of our method. It is noteworthy that the proposed TWNNM-TV is applied to the constructed HSI from the group of similar patches, not the original degraded HSI. For the sake of brevity and readability, we omit the continuous summation symbol Σ in the model.

Notations and Preliminaries
The mathematical symbols and explanations used in this paper are listed in Table 1. For further information on the tensor algebra, interested readers are referred to [28,29] for more details. For the three-order tensor ∈ ℝ × × , its block circulant matrix is defined as is the k-th front slice of .

From WNNM to TWNNM
We consider the extension of NNM to WNNM as follows [19,20] min ‖Y − X‖ + ‖X‖ , * where ‖X‖ , * = ∑ w X represents the WNN of matrix X, w = w , w , ⋯ , w (w ≥ 0) denotes the weight vector, and X is the i-th singular value of matrix X. The Problem (1) has a closed-form solution of X = US ⁄ ∑ V , where Y = UΣV T is the singular value decomposition (SVD) of matrix Y and S ⁄ • is a soft thresholding operator, which is defined as The HSI collected by optical imaging system are always contaminated by Gaussian noise [24]. The HSI data are a three dimensional cube, which can be denoted as a three-order tensor = X , X , ⋯ , X ∈ ℝ × × , where each matrix X ∈ ℝ × = 1,2, ⋯ , represents the i-th band of his; h and v represent the height and width in each band, respectively; and the HSI has b spectral bands. As our degradation model considers only Gaussian noise, the additive degradation model is where , , ∈ ℝ × × denotes the underlying clean HSI, observed degraded his, and the Gaussian noise, respectively. According to our previous work [24], the tensor weighted nuclear norm minimization (TWNNM) model can be formulated as follows:

Weighted Tensor Total Variation regularization
Even though the method in the literature [24] can remove most of the noise, there is still room for improvement. As 2D total variation (TV) has been shown to preserve the local spatial piecewise smooth structure and suppress noise, it is widely applied to visual processing tasks [8,10]. HSI has a spatial dimension and spectral dimension. The clean spectral band should be smooth, so it is natural to use 3D weighted TV to preserve both the spatial and spectral smooth structure. It is defined as where , , and are three weight parameters, and , and are the differential operators along the spatial horizontal direction, spatial vertical direction, and spectral direction, respectively. Based on the notations in Section 2, , , and at location (i, j, and k) are given by

Nonlocal Low-rank Tensor Construction
When constructing the nonlocal low-rank tensor, we use the traditional nearest neighbor search (NNS). For an individual reference FBP with size m × m × b, we use NNS to find its k similar patches, Then, each FBP is unfolded into a matrix with size m 2 ×1× b; all the k + 1 FBPs (including the reference one) are stacked into a three-order tensor with size m 2 × (k + 1) × b. This operation corresponds to the unfolding and stacking stages in Figure 1. Note that the constructed three order tensor jointly utilizes the spatial local sparsity, the non-local similarity in the spectral and spatial domains, and spectral high correlation. All of the FBPs denoised by the proposed TWNNM-TV are split into matrices with size m 2 ×b; each matrix is folded as a FBP with size m × m × b, and all of the FBPs are aggregated into final denoised HSI. This operation corresponds to the splitting and folding stage in Figure 1.
To illustrate that the patch groups have a stronger low-rank property than the original HSI, we plot the first 40 singular values of the patch groups (blue curve) and the original patch (red curve) in Figure 2a. For a closer observation, we show the zoomed-in part of the singular value numbers between 10 and 20, as shown in Figure 2b. It can be seen from Figure 2a that the singular values of the patch groups are lower than those of the original HSI, and they decrease rapidly. This phenomenon demonstrates that the rank of the patch group is absolutely lower than that of the original HSI, and the same conclusion can be drawn from Figure 2b. Therefore, we apply the LR constraints on the patch group instead of on the original HSI.

Model Proposal and Optimization
Combined with the low-rank prior (TWNNM) and spatial-spectral smooth prior (TV) of the image component, the final optimization model for denoising HSI is as follows: The ADMM method [30] is used to solve the proposed model in Model (6). To this end, we introduce four auxiliary variables, , , , to Model (6), and it is equivalent to the following problem: (7) can be rewritten as its augmented Lagrangian form, as follows: 3,4) are Lagrange multipliers, and μ represents the positive penalty parameter. For the multivariable optimization problem, the usual way is to fix other variables and optimize them alternately one by one. The optimization process is collected in Algorithm 1.
2: Initialize: Let = , : Update the penalty parameter μ = min{ρμ, μmax}. 10: end while Output: The restoration result . By fixing the other variables, each of them can be optimized as following: The subproblems P1, P2, and P3 are of the same form, and by using the soft-threshold shrinkage operator in the literature [31], they can be updated by The subproblem P5 can be solved by the following linear system: , and the ℐ denotes unit tensor, , , and represent the transaction of , , and , respectively. Here, it takes the periodic boundary condition for into consideration, and the in above linear system can be efficiently updated via 3D fast Fourier transform (FFT), as follows: , fftn and ifftn represent the 3D FFT and its inverse operation, respectively. For the subproblem P4, with Problem (1) in mind and according to the authors of [32], its closedform solution is = fold , ( + Λ ⁄ ) , where = 1⁄ .

Experimental Results and Analysis
To evaluate our method for HSI denoising, we perform experiments on simulated and real-world data. The compared state-of-the-art denoising methods include the TV-regularized low-rank matrix factorization (LRTV) [8], the low-rank matrix recovery (LRMR) [18], the automatic hyperspectral image restoration (HyRes) [33], noise-adjusted iterative low-rank matrix approximation (NAILRMA) [34], and total variation regularized low-rank tensor decomposition (LRTDTV) [35]. The codes of these methods are downloaded from the authors' homepages. For the parameters in the compared methods, they are manually adjusted to get the best results. For the weight parameters (i=1,2,3) in TV regularization, considering that and both control the spatial dimension of HSI, they should be assigned to the same weights. For simplicity, we set = = 1, and then we tune according to reconstruction performance; it is found that the result is better when =0.4.

Experiment with Simulated Data
In the experiment with the simulated data, we add different intensity Gaussian noise to the Indian Pines dataset [36], whose size is 145 × 145 × 224.
1) Visual effectiveness comparison: For visual comparison, the denoising results of different methods with the 11 th band are presented in Figure 3 and Figure 4, and the corresponding variances of Gaussian noise are 20 and 60, respectively. It can be seen from Figure 3b and Figure 4b that the clean HSIs suffer degradation to a different degree. When the noise variance is 20, the compared methods could remove most of the Gaussian noise, but there is still obvious residual noise in LRMR and NAIRLMA. When the variance is 60, there is obvious residual noise in all of the compared methods. It can be observed from the enlarged yellow squares in the top left corner of Figure 3 and the top right corner of Figure 4 that the results obtained by our method preserve clearer and sharper edges, but the results obtained by HyRes and LRTV have blurred edges or an over-smooth phenomenon. In general, our method outperforms all of the compared methods at different noise levels. (a) (2) Quantitative comparison: Some frequently-used objective evaluation indexes are adopted, including the mean peak signal-to-noise ratio (MPSNR) [37], the mean structural similarity index (MSSIM) [37], the mean spectral angle mapper (MSAM) [38], and the Erreur Relative Globale Adimensionnelle de Synthese (ERGAS; relative dimensionless global error in synthesis in English) [39]. PSNR (its unit is dB.) and SSIM are utilized to assess the similarity between the denoised image and the original image based on mean square error (MSE) and structural consistency, respectively. Larger values of MPSNR and MSSIM indicate that the results are better. ERGAS is used to measure the fidelity of the denoised image by calculating the weighted sums of the MSE of all the bands, while SAM denotes the average angle of spectrum vectors between the denoised HSI and its corresponding original image across all spatial positions. SAM fully reflects the spectral consistency of the denoised HSI with the original image. Smaller values of these two indexes represent better denoised results. The definitions of these indexes are as follows:

ERGAS = 100 ℎ 1 ( ) ( )
where h, v and b are defined above. RMSE (xi) denotes the root-mean-square error (RMSE) for image xi, and μ(i) denotes the mean of image yi. It is easy to see from Table 2 that the MPSNR values of our method are 3-7 dB higher than the maximum PSNR values of the compared methods. For MSSIM, the TV-regularization methods (LRTV, LRTDTV, and our method) achieve better results than the other methods, but the MSSIM values of our method are still higher than those of LRTV and LRTDTV. This indicates that the denoising results of our method have a better visual effect, and this is consistent with what we see in Figure 3 and Figure 4. To take a closer look at the SSIM and PSNR values of all the bands, we use noise variance 60 (see Figure 5) as an example. The results from Figure 5 show that our method outperforms almost all of the compared methods for each band, except that the SSIM values of our method in some bands are lower than those of LRTV.
(a) (b) Figure 5. The peak signal-to-noise ratio (PSNR) value (a) and structural similarity index (SSIM) value (b) of each band when the noise variance is 60.
On the account of nonlocal similarity, it can be seen from Table 2 that the MSAM and ERGAS values of our method are much lower than the other five methods. This can be interpreted as our method beng able to better maintain spectral information. To demonstrate the spectral fidelity achieved by our method, Figure 6 shows the spectral reflectance spectrum at location (55, 55) from all of the compared methods. In Figure 6, the blue curve represents the spectral reflectance values of the original image, and the orange curve represents the denoised spectral reflectance. It is not difficult to see that the spectral curve obtained by the proposed method achieves less spectral distortion and fits better to the original spectral curve than the compared methods.

Real-world Data Experiments
The performance on the simulated dataset is evaluated in Section 4.1. In this section, we choose two widely used real-world HSI datasets to verify the denoising performance. The first one is the Hyperspectral Digital Imagery Collection Experiment (HYDICE) urban dataset [40] and the second one is the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) Salinas dataset [36].

HYDICE Urban Dataset
The original dataset has 210 bands, and each band is a 307 pixel × 307 pixel grayscale image. In the experiment, we manually adjust the parameters of the compared methods accordingly to achieve the best results. Figure 7 and Figure 8 display the denoising results of band 138 and band 206, respectively, with different methods. There is still plenty of residual noise in HyRes, LRMR, NAILRMA, and LRTDTV. For the LRTV, most noise is removed, but the result is over smooth. By considering the nonlocal lowrank property and spectral-spatial TV regularization, our proposed method shows superior performance on removing the Gaussian noise and preserving the spatial texture information and spectral information.
Based on the above analysis, we go further to evaluate all of the denoising algorithms with the mean cross-track profile (MCTP) [24]. All the MTCPs of 206 th spectral band after denoising, along with the original MTCP, are presented in Figure 9. The horizontal axis and vertical axis in Figure 9 represent the column number and the mean digital number (MDN) values of each column, respectively. The existing noise leads to severe disturbances in the profile of the original image. After denoising, the disturbances are suppressed by the compared methods with different levels of success. In particular, the dead lines in Figure 7a are also eliminated, and the corresponding MTCPs are smoothed, as shown in the red circles in Figure 9. Evidently, our method provides a smoother mean profile, which is consistent with the results shown in Figure 7.

AVIRIS Salinas Dataset
This data contains 224 bands, and each band is a 512 × 217 pixel grayscale image. The original image is too large for display, so we extract a subimage of 300 × 217 pixels to show the denoising results. The second band in this dataset is contaminated by heavy Gaussian noise (Figure 10a), so we select this band to evaluate the denoising performance, the results are presented in Figure 10b-g. We can observe that the LRMR method completely failed to denoise this band. The HyRes and NAILRMA can remove some noise, but there is still obvious noise remaining. As for LRTV and LRTDTV, they have over-smoothed the image and distorted the structure, as presented in Figure 10c and Figure 10f, and thus fail to give satisfactory results. Figure 10g indicates that the proposed method can still keep sharp edge when removing heavy noise. All of the curves of MDN with band 2 before and after denoising are presented in Figure 11. By comparison, our method achieves a better restoration result than the compared methods.

Discussion
(1) Parameters selection The regularization parameters are discussed in Section 4.1. The other two parameters are related to the patch (size m × m × b), the parameters involved are m and k, and k is the number of similar patches in each cluster.

Mean DN
To determine the optimal values of m and k, the MPSNR index in the simulated experiments is used as the criterion. The curves of the MPSNR values with m under three different noise variance cases are shown in Figure 12a, in which the represents noise variance. It can be seen that in the interval from 5 to 10, the MPSNR value increases with the increase of m. The most likely reason for this is that the damaged structure can be restored with the increase of m. As the value of m continues to increase, the MPSNR value gradually decreases, which indicates that under the optimal selection of m, a reasonably satisfactory denoising effect can be obtained. When the variance is 50, the MPSNR has a higher value when m = 11 than when m = 10. As the improvement is negligible, the patch size is set as m = 10 for all the experiments. Moreover, the number of similar FBPs (k) is evaluated with the parameter m being fixed. The result in Figure 12b shows that the MPSNR index increases gradually until k = 40, thereafter it drops slowly. It can be explained that too many patches will destroy the low-rank property. Furthermore, more FBPs means a higher time cost, and we finally set the number of similar FBPs to 40.
(2) Convergence Analysis To illustrate the convergence of the proposed method, the relative changes and MPSNR values versus the iteration number of our method are presented in Figure 13. It can be seen that the values of these two indexes tend to be stable after about 35 iterations, which clearly shows the convergence of our method. (3) Operation time analysis To test the computational complexity with compared algorithms, we select the running time of the Salinas dataset for comparison. The running times (in seconds) are shown in the Table 3. LRMR is the fastest, but its performances are the worst. Our algorithm is not the fastest, but also not the slowest.

Conclusion
To remove Gaussian noise, we propose a TV-regularization TWNNM model. In this model, we apply TWNNM on the HSI group clustered by NNS to characterize the LR property with similar patches. Moreover, TV regularization is utilized to not only suppress noise, but also keep the local smoothness in both the spatial and spectral domain. Experiments on both simulated and real HSI datasets indicate that our method can retain detailed information of the image better, while noise points are removed, which can be explained as by the fact that the combined LR and smooth prior information of the image component has the ability to accurately suppress noise and keep the smooth structure. Our method outperforms the state-of-the-art methods both in visual quality and evaluation criteria. Furthrmore, in further works, we will extend our method to other restoration tasks, such as magnetic resonance imaging (MRI) [41] and optical coherence tomography (OCT) images [42][43][44].