Hyperspectral Image Denoising Based on Spectral Dictionary Learning and Sparse Coding

: Processing and applications of hyperspectral images (HSI) are limited by the noise component. This paper establishes an HSI denoising algorithm by applying dictionary learning and sparse coding theory, which is extended into the spectral domain. First, the HSI noise model under additive noise assumption was studied. Considering the spectral information of HSI data, a novel dictionary learning method based on an online method is proposed to train the spectral dictionary for denoising. With the spatial–contextual information in the noisy HSI exploited as a priori knowledge, the total variation regularizer is introduced to perform the sparse coding. Finally, sparse reconstruction is implemented to produce the denoised HSI. The performance of the proposed approach is better than the existing algorithms. The experiments illustrate that the denoising result obtained by the proposed algorithm is at least 1 dB better than that of the comparison algorithms. The intrinsic details of both spatial and spectral structures can be preserved after significant denoising.


Introduction
Hyperspectral images (HSIs) are widely used in military, geological exploration, forestry, and agriculture domains as entry data [1,2]. Information contained in HSI can be decomposed into either unidimensional data, representing the spectral information, or bi-dimensional data, representing the spatial information. In real-world applications, the processing of HSI often includes unmixing [3,4], classification [5,6], and target detection [7,8]. However, the nature of HSI acquisition inevitably results in the blending of noise in HSI data. The noise information not only reduces the visual quality of images, but also complicates the processing of HSI, so the results of processing are less accurate [9]. Thus, HSI denoising is the first and a crucial phase of HSI processing, necessitating research for more effective and economic denoising methods. This domain has gained popularity, attracting many researchers [10].
In the research of HSI denoising methods, various theories have been proposed and tested. Initially, since each spectral channel in the HSI data cube can be treated as a grey-level image, typical two-dimensional (2D) image denoising algorithms, such as block-matching three-dimensional (3D) filtering (BM3D) [11], total variation (TV) [12], and the nonlocal-based algorithm [13], were applied to denoise HSIs band-by-band. Then, some denoising methods conceived for 3D data, such as video denoising by sparse 3D transform-domain collaborative filtering (VBM3D) [14] and block-matching four-dimensional (4D) filtering (BM4D) [15], were applied to HSIs. However, the above denoising methods fail to consider the high correlation between spectral bands and always produce low-quality images, they similarly manage the hyperspectral data. After expanding the noisy hyperspectral data cube into a pixel spectral matrix, dictionary learning algorithms select blocks in the matrix as the training set to obtain dictionary atoms. However, the spectral information of hyperspectral data cannot be fully utilized with this method. Therefore, we use the pixel spectral vectors instead of the traditional image blocks as the training data to perform dictionary learning. Obviously, the learned dictionary dimension is the same as the HSI spectral dimension. Under the linear mixture model (LMM), the dictionary atoms are regarded as spectral curves constituting the noisy HSI. Thereby, the dictionary atoms can better reflect the details of spectral features, and more precise HSI sparse reconstruction optimization can be realized, which means better denoising results.
The HSI denoising algorithm proposed in this study is based on sparse coding and adaptive dictionary learning, which is termed HyDeSpDLS. In the denoising algorithm, we propose a novel approach for directly constructing a dictionary from hyperspectral data by progressively using the pixel spectral vectors as the training set. Compared with overall loading methods, such as the batch method, the proposed dictionary learning approach can significantly improve the training efficiency, and the learned dictionary can adaptively represent HSIs. With the learned dictionary, the sparse coding is performed using a variable splitting and augmented Lagrangian and total variation method. The total variation regularization is spatially homogenous, which means that nearby pixels have similar coefficients for the same endmember. As a priori knowledge, the total variation regularizer improves the sparse reconstruction accuracy. These improvements make HyDeSpDLS a competitive HSI denoiser. This paper has the following contributions: (1) The proposed novel dictionary learning method can construct a dictionary directly from noisy HSIs. To adopt the characteristics of HSI data, we use an online method to improve the training efficiency and use the pixel spectral vectors instead of the traditional image blocks as the training dataset to fully use the spectral information in noisy HSIs. (2) Considering the spatial-contextual information in HSI data, the TV regularizer is introduced into the sparse coding after the dictionary is obtained to guarantee that neighboring pixels have similar coefficients for one same endmember. With this prior information, the accuracy of sparse reconstruction improves.
This work is an extension of a conference paper [28]. The new material is as follows: (1) The online spectral dictionary learning algorithm is introduced and characterized in detail.
(2) Considering the spatial-contextual information, a new algorithm is introduced to perform the sparse representation, termed sparse regression by variable splitting and augmented Lagrangian and total variation (SpaRSAL-TV). The contents of this paper are divided into five parts. The first part briefly introduces the background and current status of the research topic. The second part presents the mathematical noise model and the theory of HSI denoising. In the third part, the sparse coding method and the dictionary learning algorithm are explained. The fourth part formally outlines HyDeSpDLS, a denoising approach based on online dictionary learning and sparse representation in the spectral domain. The fifth part presents the results of the proposed method when applied to real and synthetic data, which are compared with the denoising methods produced by other authors. The final part concludes this paper.

HSI Noise Model
Under the additive noise assumption, the HSI noise model is written as below: f = s + n (1)

Denoising Process
HSI obtained in the real world is usually noisy, which means that the image can be considered decomposition of a random noise component and a clean image, which can be further decomposed using a dictionary. A clean image is a small number of basis vectors extracted from a dictionary, called a linear combination of atoms. In this assumption, noise is random and cannot be represented by any basic vector in the dictionary. Therefore, noise can be efficiently reduced by reconstructing the image with a dictionary and the corresponding sparse codes.
Considering the special structure of HSI data, constructing a dictionary in the spectral domain to sparsely represent images is more consistent with the imaging mode and physical meaning of HSIs. Since the number of bands contained in an image is usually as high as several hundreds, it is more efficient and time-efficient to perform HSI denoising with a pixel spectral matrix expanded from the data cube.
Therefore, we expand the noisy HSI by scan lines into a pixel matrix, which is used as a training set so that a spectral dictionary can be obtained. The noisy image can be reconstructed with the dictionary and the corresponding sparse codes. The denoising process is shown in Figure 1.

Denoising Mechanism
According to the above analysis of the HSI noise model and the denoising process, HSI denoising can be regarded as a sparse signal recovery task supported by a specific dictionary.
Considering the sparse property of HSIs, the clean HSI may be represented as: represents the spectral dictionary, k is the number of dictionary atoms, kn   α R denotes the sparse codes, and only a few elements in each column are nonzero.
Then, Equation (2) can be written as: Therefore, the HSI denoising problem is formulated as: where, where 0 i α is the nonzero element number of vector i α , which is termed as a 0 l norm, and the regularization parameter  (>0) is used to establish the relative weight between the two terms in the objective function.
Since the optimization problem of the 0 l norm is non-convex, the problem is hard to precisely and easily solve. The 0 l norm can be replaced with the 1 l norm as a convex approximation to deal with the optimization in Equation (7). Then, Equation (7) can be written as a constrained sparse regression: where 0   is the parameter controlling the reconstruction error. To obtain the sparse codes ˆi α by solving the optimization in Equation (8), we can sparsely reconstruct the pixel spectral vector i as: ˆˆ, 1,..., ii in  xD α (9) Thereby, the denoised HSI In the above sparse representation-based HSI denoising mechanism, the two key steps are the acquisition of the spectral dictionary D and the solution of sparse codes ˆi α .

Analysis of Noise Reduction
The pixel spectral vector estimation error With the assumption that i x is in the range of S D , we have: where S i i  D γ y represents the error between the observed pixel spectral vector and the reconstructed one, that is, the components of the observed pixel spectral vector which fail to be projected into the space expanded by S D . Thus, the error is equivalent to Then, the minimum residual can be calculated as: Due to ˆi S i  xD γ , the estimation error of the pixel spectral vector is ii   Pn . The error is caused by projecting noise components into the signal space. With the assumption that the mean of the noise is zero and the variance is 2  I , we have: The noise attenuation is: (13) where m is the degree of freedom of noise.
Therefore, we conclude that the sparse representation estimation error is proportional to the level of signal sparsity. The more sparse the codes, the better the denoising results. However, in reality, the ratio shown in Equation (13) is hard to attain due to the errors in ˆi α , since the encoding is always a non-deterministic polynomial hard (NP-hard) problem in sparse representation.
To improve the accuracy of ˆi α , on the basis of the spectral dictionary in which the atoms can be regarded as spectral curves, the TV regularizer is introduced into the sparse coding. Since neighboring pixel spectral vectors having similar codes for the same atom in the dictionary, the TV regularizer imposes spatial consistency in the encoding results. Concretely, the TV regularizer fully uses the spatial-contextual information of the HSI data when doing the sparse coding, acting as prior information to improve the conditioning for solving the codes. Therefore, the noise attenuation ratio can be closer to Equation (13) when the accuracy of ˆi α is improved by using the TV regularizer in this paper. The dictionary training process is aimed at finding the basis vectors to present the information in a noisy HSI. In order to obtain better denoising results, according to the ratio in Equation (13), it is necessary to ensure that the coding of the information contained in each pixel spectral vector has a high sparsity level. Therefore, we use an adaptive spectral dictionary to support the sparse coding. When training the dictionary, we use the pixel spectral vectors instead of the traditional image blocks as the training dataset to fully use the spectral information in the noisy HSI. Thus, the sparsity p decreases when completing the sparse coding by using the spectral dictionary trained following our method. Then, better denoising results can be produced.

Online Spectral Dictionary Learning
In this paper, the spectral dictionary D used to solve the sparse coding ˆi α and finally reconstruct the denoised HSI X is obtained by adaptive training-the dictionary learning process.
Considering the large scale of HSIs, the overall loading methods, such as the batch method, usually lead to excessive calculation and a low training efficiency. We use the online method to train the dictionary and select only one subset or element from the training dataset in each iteration in dictionary learning. Due to the special structure of HSI data, we propose an online spectral dictionary learning method, termed OSDL, to train the spectral dictionary for denoising. Compared with the existing ODL, the two major improvements in OSDL are as follows. First, in order to fully use the spectral information in a noisy HSI, the observed pixel spectral vectors are used instead of the traditional image blocks as the training data. Second, we use the alternating direction method of multipliers (ADMM) [29] to replace the least angle regression (LARS) [30] applied in existing algorithms when completing the sparse coding in each iteration, since ADMM is more suitable for large-scale problems [30].
Compared with other currently applied algorithms that directly construct a dictionary from HSI data, such as K-SVD and BPFA, OSDL has a lower computational complexity and improves the efficiency of dictionary learning. Since the existing dictionary learning algorithms do not well-utilize the spectral information, OSDL overcomes this disadvantage by using pixel spectral vectors as the training data. The dictionary atoms reflect the details of spectral features.
Given a set of pixel spectral vectors from HSI data, dictionary learning is formulated in the regularization framework: The objective function of the optimization problem in Equation (14) is the sum of the representation error quadratic norm plus a sparsity promoting term. The 1 l norm represents the linear regression coefficients. The regularization parameter  (>0) is used to establish the relative weight between the two terms and p N is the number of pixel spectral vectors. To prevent the dictionary atoms of D tending to infinity due to the role of the 1 Though it is non-convex to simultaneously optimize all variables, the optimizations of the coefficients p 1 , ... , and the dictionary D are convex. Therefore, one variable is kept constant when the other is minimized, which is the direct method to perform the optimization problem in Equation (14).
Due to the large scale of the HSI data set, in a typical small image scenario consisting of 100 × 100 pixels and 100 bands, we have = 10,000 p N and 100 L  . Therefore, the optimization problem of Equation (14) is relatively light with respect to D , but extremely heavy with respect to i α . In order to efficiently obtain the spectral dictionary, an online method is introduced into dictionary learning. First, a random sequence of pixel spectral vectors is selected from the pixel spectral matrix as the training set in the current cycle and processed sequentially. For each new element in the current training set, sparse coding is calculated first, and the current dictionary is then updated. The optimization of Equation (14) is a basis pursuit denoising (BPDN) problem with respect to the optimization of i α . With a known dictionary, the sparse coding can manage the sparse-  (15) where 0   is the parameter controlling the reconstruction error.
The most widely used algorithm used to implement the sparse regression is LARS. Considering the large scale of the HSI dataset, ADMM is applied instead. The ability to decompose a large problem into several smaller pieces is the major advantage of ADMM [30].
When the sparse coding is finished, the goal of the optimization of D is to minimize the function as: where t is the current iteration number, j y is the training set consisting of j spectral vectors that are randomly obtained from the pixel spectral matrix, and the sparse codes j α are already known.
Then, the dictionary columns are updated through a projected block-coordinate descent method in the optimization of Equation (16). The online dictionary learning principle is analyzed to determine the OSDL algorithm flow. First, the pixel spectral matrix, expanded from the noisy hyperspectral image, is used as the original training set. Then, a random sequence of pixel spectral vectors is selected from the pixel spectral matrix as the training set in the current cycle, which is processed sequentially. For each new element in the current training set, the sparse coding is calculated by solving the BPDN problem, and the current dictionary is then updated. Algorithm 1 shows the pseudo code for OSDL.

Algorithm 1
Online spectral dictionary learning (OSDL) [8]. In Algorithm 1, α denotes the sparse codes matrix, and matrixes Α and B represent the accumulated "past" information. Thus, the value of both 0 A and 0 B is zero.
The algorithm output cannot be influenced by the initial dictionary. Generally, the first k pixel spectral vectors in the pixel spectral matrix are taken as the initial dictionary atoms. Since the dictionary columns are updated through a projected block-coordinate descent method with the high level of sparsity of codes i α , just one iteration per column is enough to finish the update. Therefore, to improve the rate of the convergence, the current iteration is implemented by using the dictionary in the previous iteration as a warm restart.
Since "new" information can be more accurate, a parameter is introduced into the dictionary learning algorithm to gradually decrease the accumulated information weight in Α and B over time.
t  is defined here as:

Sparse Coding
The learned spectral dictionary can be used to calculate sparse codes of the noisy HSI. Due to the correlation between pixel vectors and the neighbors, the spatial-contextual information in noisy HSIs is exploited as a priori knowledge when completing the sparse coding. Hence, sparse regression by variable splitting and augmented Lagrangian and total variation (SpaRSAL-TV) is applied to perform the sparse coding, and this optimization problem can be written as: is a vector extension of the non-isotropic total variation (TV), which allows the abundance coefficients of the same end-member among neighboring pixels to change smoothly.  represents the neighborhood subsets in the image horizontally and vertically, and the ith column of the matrix α . The regularization parameters c  and TV  are both non-negative. If TV 0   , the optimization in Equation (18) is simplified to a BPDN problem without considering spatial information, which is the optimization in Equation (5).
Due to the non-smooth terms and the large dimensionality, even though the optimization in Equation (18) is convex, it is hard to solve. Therefore, according to the methodology proposed in Reference [30], the variable splitting and augmented Lagrangian (SUnSAL) algorithm is used to introduce new variables for regularization into the sparse unmixing [31], so that the initial problem is converted into simpler problems. Essentially, the ADMM method is used here to solve the optimization in Equation (18) Then, the optimization in Equation (18) can be written as:   (23) The optimization in Equation (22) We can rewrite the optimization in Equation (24) in an augmented Lagrangian form as: where  is a positive constant and /  Q is the Lagrange multiplier of the constraint 0 

GU PV
. Therefore, the sparse coding algorithm begins with the application of the augmented Lagrangian in Equation (26), and the flow is shown in Algorithm 2.

HSI Denoising Algorithm Outline
By analyzing the HSI denoising process and the HSI noise model in Section 2 and studying the spectral dictionary learning method and the sparse representation approach in Section 3, an HSI denoising algorithm on the basis of spectral dictionary learning and sparse coding, HyDeSpDLS, is proposed. Algorithm 3 shows the pseudo code of the HyDeSpDLS algorithm.

Experiments and Results
In this section, the competitiveness and effectiveness of the proposed HyDeSpDLS are verified according to real and synthetic HSI data in exhaustive experiments. The proposed algorithm is comprehensively evaluated with both qualitative observation and quantitative indexes compared with the existing HSI denoising methods. MATLAB R2014a on a laptop that is equipped with eight Intel Core i7-7700HQ CPU with 16 GB RAM, is used to implement the algorithms.

Data Description and Experimental Condition
The synthetic data is generated from a Washington DC Mall scene. The original data has 256 × 256 pixels and 191 spectral channels with atmospheric correction, which is regarded as a clean image for our experiments. Each band is normalized to [0,1] in advance before adding simulated noise.
Three The optimization parameter of the proposed HyDeSpDLS is the regularization parameter  in the dictionary learning process. The value of  is related to the HSI noise level. When the image contains more noise, it is suitable to select a larger  . Otherwise, a smaller one is selected. Generally, the value of  is set to 1. In addtion, the memory required for the proposed algorithm is 96 Mb.

Evaluation Indexes
The performances of different denoising approaches are quantitatively examined by calculating the peak signal-to-noise (PSNR) index and the structural similarity (SSIM) index.
The PSNR index is defined as:  (27) where s is the clean HSI, and ŝ is the reconstructed image. s has a size of M N L , the indexes of which are ,, i j l . The PSNR is the overall approximation degree of the denoised image to the clean image.
The SSIM index is calculated as: where  s and  s represent the mean and the variance of the clean HSI, respectively; similarly,  s and  s denote the mean and the variance of the denoised HSI, respectively. Due to the weak denominator, we define two constants, 1 C and 2 C , to make the division stabilized. The denoising performance of the spatial dimension of HSI can be evaluated through the SSIM index focusing on structure information.

Experimental Results
Figures 2-4 present the denoising results from different denoising methods with different kinds of noise. Under the Poissonian noise assumption, the Poissonian noise is first transformed into approximate additive Gaussian noise with roughly equal variance by the Anscombe transform [32], and then denoising processing is performed. From the figures, we qualitatively find that the performance of the proposed HyDeSpDLS is better than that of the other denoising algorithms. It can be concluded that all the algorithms are able to denoise the image under all the noise conditions. BM3D, BM4D, and KSVD smooth the details of the denoising results to varying degrees. From  Figures 2d,e,g-4d,e,g, we can find that the details of the buildings and green belts are smoothed and blurred. In contrast, PCA + BM4D, LRMR, NAILRMA, and BPFA are better able to preserve local details, as shown in Figures 2f,h-j-4f,h-j. However, these four approaches still leave small amounts of noise in the denoising results. From Figures 2-4k, we can find that the proposed HyDeSpDLS is best able to significantly preserve the intrinsic details of the spatial structure when removing the noise.  (a) Figures 5-7 present the denoised spectral signature results produced by different denoising methods and with different kinds of noise. The quality of the spectral signatures in HSIs is important for material recognition. Due to inadequate use of the spectral information in noisy HSIs, the reconstructed spectra obtained by BM3D, BM4D, and KSVD are quite low, as shown in Figures 5-7b,c,e. Considering the low rank property of HSIs, the accuracy of reconstructed spectra obtained by PCA + BM4D, LRMR, NAILRMA, and BPFA improved, as shown in Figures 5-7d,f-h. Since the proposed HyDeSpDLS uses a spectral dictionary in which atoms better reflect the details of spectral features in the noisy HSI, this algorithm yields the best performance in restoring the spectral signatures, as shown in Figures 5-7i.   Table 1 that the performance of the proposed HyDeSpDLS is the best in both indexes under all the noise conditions. With increasing noise, the improvements produced by HyDeSpDLS also increase, compared with other denoisers. Compared with the dictionary learningbased denoisers, namely BPFA and KSVD, the denoising performance of the proposed HyDeSpDLS is significantly better.  1 The second highest value in each row is underlined. 2 The highest value in each row is shown in bold.
The running times of different denoising algorithms are shown in Table 2. We can find that, compared with the dictionary-learning-based denoisers, namely BPFA and KSVD, the running time of the proposed HyDeSpDLS is much shorter. It can be concluded that the dictionary learning method applied in HyDeSpDLS has less computational complexity. However, the proposed HyDeSpDLS needs more time than low-rank based algorithms, namely LRMR and NAILRMA, and it may be improved in future research.  HyDeSpDLS is applied to the Indian Pine data acquired by the AVIRIS (airborne visible/infrared imaging spectrometer) hyperspectral sensor over Northwestern Indiana, USA in June 1992. There are 145 × 145 pixels in the image, with a spatial resolution of 20 m per pixel, and 220 bands with atmospheric correction. It is assumed that the noise in the dataset is non-i.i.d and displays a strong effect in a number of bands.

Experimental Results
Since there are mass bands in the dataset, only the 61st and 110th bands are selected to illustrate the denoising performance of different algorithms. The image displays strong noise and high brightness in the 61st band, as shown in Figure 8a. Qualitatively, the proposed HyDeSpDLS produces the best denoising result and the spatial structure details, such as edges of the image, are well preserved. In contrast, the image displays weak noise and low brightness in the 110th band, as shown in Figure 9a. BM3D, BM4D, and KSVD can more or less remove the noise, as shown in Figure 9b,c,e, respectively. PCA + BM4D, LRMR, and BPFA remove the noise moderately, as presented in Figure  9d,f,h, respectively. However, the three approaches also smooth the details. NAILRMA and the proposed HyDeSpDLS produce the best denoising results and HyDeSpDLS better preserves the details of spatial structures compared to NAILRMA, as shown in Figure 9g,i, respectively. This further illustrates the robustness of the proposed method with different noise intensities and image brightness.

Conclusions
In this paper, a novel denoising method for HSI, called HyDeSpDLS, is proposed based on dictionary learning and sparse coding and extended to the spectral domain. Firstly, the noisy HSI data cube is expanded into a pixel spectral matrix by the scan lines first. A training set consisting of pixel spectral vectors from the matrix is used to train the spectral dictionary for image sparse representation. The spatial-contextual information present in the noisy HSI is exploited as a priori knowledge when completing the sparse coding. Compared with the existing algorithms, including BPFA, NAILRMA, PCA+BM4D, BM4D, BM3D, KSVD, and LRMR, the performance of the proposed method is much better. The intrinsic details of both spatial and spectral structures are well preserved with significant denoising, according to qualitative observations and quantitative indexes. However, the proposed algorithm still has a time consumption problem. In future research, we will focus on a faster implementation of the algorithm.