Hyperspectral Image Mixed Noise Removal Using Subspace Representation and Deep CNN Image Prior

Abstract: The ever-increasing spectral resolution of hyperspectral images (HSIs) is often obtained at the cost of a decrease in the signal-to-noise ratio (SNR) of the measurements. The decreased SNR reduces the reliability of measured features or information extracted from HSIs, thus calling for effective denoising techniques. This work aims to estimate clean HSIs from observations corrupted by mixed noise (containing Gaussian noise, impulse noise, and dead-lines/stripes) by exploiting two main characteristics of hyperspectral data, namely low-rankness in the spectral domain and high correlation in the spatial domain. We take advantage of the spectral low-rankness of HSIs by representing spectral vectors in an orthogonal subspace, which is learned from observed images by a new method. Subspace representation coefficients of HSIs are learned by solving an optimization problem plugged with an image prior extracted from a neural denoising network. The proposed method is evaluated on simulated and real HSIs. An exhaustive array of experiments and comparisons with state-of-the-art denoisers were carried out.


Introduction
Hyperspectral cameras measure the radiation arriving at a sensor with high spectral resolution over a sufficiently broad spectral band such that the acquired spectrum can be used to uniquely characterize and identify any given material [1]. Hyperspectral imaging plays an important role in remote sensing and has been used in a wide array of applications, such as the identification of various minerals in mining and oil industries [2], monitoring the development and health of crops in agriculture [3], detecting the development of cracks in pavements in civil engineering [4], and so on.
A hyperspectral image (HSI) is a three-dimensional data cube, where the first two dimensions represent the spatial domain and the third dimension represents the spectral domain. In contrast to multispectral imaging, hyperspectral cameras capture electromagnetic information in hundreds of narrow spectral bands, instead of multiple wide spectral bands. However, due to the decrease of the width of spectral bands, hyperspectral cameras receive fewer photons per band and tend to acquire images with a lower signal-to-noise ratio (SNR). The decreased SNR reduces the reliability of measured features or information extracted from HSIs [1]. Therefore, hyperspectral image denoising is a fundamental inverse problem before further applications.
The degradations linked with various mechanisms also result in different types of noise, such as Gaussian noise, Poissonian noise, impulse noise, dead-lines/stripes, and

Related Work
The subspace representation of spectral vectors in HSIs has been successfully used to remove Gaussian noise by regularizing the representation coefficients of HSIs. We refer to representative work on global and nonlocal low-rank factorizations (GLF) [17], fast hyperspectral denoising (FastHyDe) [8], and non-local meets global (NGmeet) [18]. The idea of regularizing the subspace representation coefficients of HSIs underpins state-of-the-art Gaussian denoisers and has been extended to address mixed noise. The challenge of this extension lies in the estimation of the subspace.
A spectral subspace can be estimated from an observed HSI by simply carrying out the singular value decomposition (SVD) of the observed image matrix when the noise is i.i.d. Gaussian. When the noise is a mixture of Gaussian noise, stripes, and impulse noise, the spectral subspace is usually estimated jointly with the subspace coefficients of the HSI; for example, in double-factor-regularized low-rank tensor factorization (LRTF-DFR) [21] and subspace-based nonlocal low-rank and sparse factorization (SNLRSF) [22]. The joint estimation of the subspace and the corresponding coefficients of the HSI usually produces poor estimates of the subspace when the HSI is affected by severe mixed noise. To sidestep the joint estimation, one strategy is to estimate the subspace and the corresponding coefficients of the HSI separately, which motivated us to develop the L1HyMixDe [32] method.
The proposed method in this paper is also based on two strategies: (a) learning the subspace and the coefficients of the HSI separately and (b) regularizing the representation coefficients. However, the new work is distinct from L1HyMixDe in both the method of learning the subspace and the regularization imposed on the subspace representation coefficients. The main difference in subspace learning is that L1HyMixDe performs median filtering band by band, exploiting spatial correlation, whereas the new work estimates a coarse image using Hampel filtering pixel by pixel, exploiting spectral correlation. Hampel filtering is more effective at removing outliers. Furthermore, as anomalous target detection is an important task in hyperspectral imaging, it is worth noting that both anomalous targets and sparse noise are sparsely distributed in the spatial domain. If only spatial information is considered (e.g., L1HyMixDe performs median filtering band by band, exploiting spatial correlation), then anomalous targets may be wrongly detected as sparse noise and would not be represented in the estimated subspace. This motivated us to develop a new method exploiting spectral information: spectral signatures of anomalous targets are smooth, while spectral signatures of materials corrupted by sparse noise have unusual jumps in pixel values. Moreover, L1HyMixDe regularizes the subspace representation coefficients with a non-local image prior, while this work adopts a more powerful deep CNN image prior. Compared with non-local patch-based image priors, a deep-learning-based image prior is much faster to apply once the network has been trained. A computationally efficient HSI mixed noise removal method is of importance in practice.

Contributions
This work aims to estimate a clean HSI from observations corrupted by mixed noise (containing Gaussian noise, impulse noise, and dead-lines/stripes) by exploiting two main characteristics of hyperspectral data, namely low-rankness in the spectral domain and high correlation in the spatial domain. The contributions of this work are summarized as follows:
• Instead of estimating the subspace basis and the corresponding coefficients of HSIs jointly and iteratively, we decouple the two estimations. A new subspace learning method, which works independently of coefficient estimation and is robust to mixed noise, is proposed.
• An image prior extracted from a state-of-the-art neural denoising network, FFDNet, is seamlessly embedded within our HSI mixed noise removal framework, combining a traditional machine learning technique with a deep learning technique.
This paper is organized as follows. Section 2 formally introduces the proposed mixed noise removal method, HySuDeep. Section 3 extends the proposed method to address mixed noise containing non-i.i.d. Gaussian noise. Sections 4 and 5 present and analyze the experimental results of the proposed method and the compared methods. Finally, we draw conclusions in Section 6.

Problem Formulation
Some notations and tensor operations used in this paper and their definitions are provided in Table 1.

Table 1. Notations and definitions.
• X(:, :, i): the ith mode-3 slice of X, a matrix obtained by fixing the mode-3 index of X to i.
• X_(3): the mode-3 unfolding of X. A tensor can be unfolded into a matrix by rearranging its mode-3 vectors, which become the column vectors of X_(3).
• X ×_3 E: tensor-matrix multiplication. The mode-3 product of a tensor X ∈ R^(I_1×I_2×I_3) by a matrix E ∈ R^(J×I_3) is a tensor Y ∈ R^(I_1×I_2×J), denoted Y = X ×_3 E, which corresponds to the matrix multiplication Y_(3) = E X_(3).
• ‖X‖_F: the Frobenius norm of a matrix, extended to a tensor as ‖X‖_F = (Σ_{i_1,i_2,i_3} X(i_1, i_2, i_3)²)^(1/2).
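The mode-3 operations above can be made concrete with a short NumPy sketch (illustrative only; the helper names and array shapes are our own, not taken from the paper's MATLAB code):

```python
import numpy as np

def unfold3(X):
    """Mode-3 unfolding X_(3): each column is one mode-3 (spectral) vector."""
    r, c, n = X.shape
    return X.reshape(r * c, n).T            # shape (n, r*c)

def fold3(M, r, c):
    """Inverse of unfold3: rebuild the r x c x n tensor from its unfolding."""
    return M.T.reshape(r, c, M.shape[0])

def mode3_product(X, E):
    """Y = X x_3 E, equivalent to the matrix product Y_(3) = E X_(3)."""
    r, c, _ = X.shape
    return fold3(E @ unfold3(X), r, c)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))
E = rng.standard_normal((3, 6))
Y = mode3_product(X, E)
assert np.allclose(unfold3(Y), E @ unfold3(X))                    # Y_(3) = E X_(3)
assert np.isclose(np.linalg.norm(X), np.linalg.norm(unfold3(X)))  # Frobenius norm
```

The unfolding convention (spectral vectors as columns of X_(3)) matches the definitions in Table 1.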

Observation Model
Let X ∈ R^(r×c×n_b) denote an underlying clean HSI with r × c pixels and n_b bands. Assuming that noise is additive, we can write an observation model as

Y = X + S + G,  (1)

where Y, S, G ∈ R^(r×c×n_b) are the observed HSI data, sparse noise, and Gaussian noise, respectively. Elements in G are assumed to be i.i.d. Gaussian with zero mean and variance σ². Non-i.i.d. noise is discussed in Section 3.

Subspace Representation of HSIs
The high correlation in the spectral domain of HSIs implies that spectral vectors (i.e., mode-3 vectors) of an HSI X can be well represented in a low-dimensional subspace [1,8,19]. We take a cropped Washington DC Mall image (shown in Figure 1a) as an example. To analyze the correlation of pixel values in the spectral domain, we computed the autocorrelation matrix of its spectral vectors (i.e., of the columns of X_(3)). From Figure 1b, we can see that the first 100 bands are highly correlated. Due to the high spectral correlation, the autocorrelation matrix can be approximated by a low-rank matrix. The low-rankness may also be inferred from Figure 1c, where the first two singular values (the largest) dominate all other singular values, which decrease to near zero rapidly. From Figure 1d,e, we can see that two spectral vectors of size 191 can be well approximated by vectors in a subspace of dimension 10. Based on the subspace representation of spectral vectors, we write

X = Z ×_3 E + Ξ,  (2)

with E ∈ R^(n_b×p) (p ≪ n_b), Z ∈ R^(r×c×p), and Ξ ∈ R^(r×c×n_b). Matrix E holds an orthogonal basis for the signal subspace, and the entries of Z are the representation coefficients of X with respect to E. Hereafter, the mode-3 slices of Z are termed the eigenimages. Tensor Ξ represents the residual of the low-dimensional representation of X. When the SVD is adopted, most of the image variation can be compactly represented in a low-dimensional subspace, and the residual tensor contains a small amount of image energy. Thus, we omit Ξ when reconstructing the image; that is, X = Z ×_3 E. Figure 2 illustrates the subspace representation of a hyperspectral image. The observation model (1) can then be written as

Y = Z ×_3 E + S + G.  (3)

The subspace representation of spectral vectors has been demonstrated to be a powerful tool for solving hyperspectral inverse problems, such as hyperspectral image super-resolution [33], hyperspectral image segmentation [34], hyperspectral image classification [35], and hyperspectral unmixing [36].
In this paper, we propose a hyperspectral image mixed noise removal method using subspace representation and a deep CNN image prior (HySuDeep), the flowchart of which is illustrated in Figure 3. For the hyperspectral mixed noise removal problem, the challenge of exploiting the subspace representation lies in estimating the subspace from observations corrupted not only by Gaussian noise but also by sparse noise.

Subspace Learning against Mixed Noise
When the noise in an observed HSI is approximately Gaussian distributed, the subspace of the underlying clean image can be estimated easily from the noisy image by performing the SVD on the noisy image matrix; that is, {U, Σ, V} = SVD(Y_(3)). The estimated subspace is spanned by the k left-singular vectors corresponding to the k largest singular values; that is, E = [U]_(:,1:k), where the subscript (:, 1:k) means extracting the first k columns of a matrix.
However, when noise is a mixture of Gaussian noise and stripe noise, performing SVD on a noisy image matrix is not an optimal way to learn the spectral subspace. SVD aims to find the most important dimensions in the data by finding the directions of maximum variance, and the addition of i.i.d. Gaussian noise increases the variance of each dimension uniformly (thus, it does not change the order of singular values and consequently does not change the estimation of subspace), whereas the addition of stripes increases the variance of each dimension non-uniformly (thus, it does change the order of singular values). Experimental analysis is given in Section 4.3.
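The variance argument can be checked numerically. The following hypothetical NumPy experiment contrasts the per-band variance added by i.i.d. Gaussian noise with that added by stripes confined to a few bands (the noise levels, stripe amplitude, and band indices are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n_pix, n_bands = 2000, 50
# i.i.d. Gaussian noise: every band receives the same noise variance.
g = 0.1 * rng.standard_normal((n_pix, n_bands))
# Stripe-like noise: a constant offset on 10% of the pixels of 3 bands only.
s = np.zeros((n_pix, n_bands))
for b in (5, 17, 33):
    idx = rng.choice(n_pix, n_pix // 10, replace=False)
    s[idx, b] = 1.0
var_g = g.var(axis=0)   # nearly flat across bands
var_s = s.var(axis=0)   # sharply peaked at the striped bands
# Uniform inflation leaves the ordering of singular values intact;
# non-uniform inflation can reorder them and mislead an SVD-based subspace.
assert var_g.max() / var_g.min() < 2.0
assert var_s[5] > 10 * var_s.mean()
```

The striped bands dominate the variance profile, which is exactly the mechanism by which stripes corrupt an SVD-learned subspace.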
One of our contributions is to estimate the spectral subspace from observations corrupted by mixed noise and thus propose an efficient denoising method, which does not need to estimate spectral subspace iteratively.

Outlier Removal Using Hampel Filtering
Since the existence of sparse noise significantly disturbs the identification of signal subspace, we may estimate a coarse image by removing the sparse noise from observations. An outlier removal step applying Hampel filtering to spectral vectors of the observed image is introduced below.
Given a spectral vector of the observed HSI, denoted y = [y_1, y_2, ..., y_(n_b)]^T ∈ R^(n_b), and a sliding window of length q centered at the ith band, W_i, we define the point-to-point median and median absolute deviation (MAD) estimates for the samples in W_i as

m_i = median({y_j : j ∈ W_i})  (4)

and

MAD_i = median({|y_j − m_i| : j ∈ W_i}),  (5)

respectively. The standard deviation (denoted σ_(i,MAD)) is estimated by scaling the local median absolute deviation (MAD) by a factor of κ = 1/(√2 erf^(−1)(1/2)) ≈ 1.4826:

σ_(i,MAD) = κ MAD_i.  (6)

Hampel filtering is a variation of the three-sigma rule of statistics that is robust against outliers. If a sample y_i is such that

|y_i − m_i| > n_σ σ_(i,MAD)  (7)

for a given threshold n_σ, then the Hampel filter declares y_i an outlier and replaces it with m_i; that is,

ỹ_i = y_i, if |y_i − m_i| ≤ n_σ σ_(i,MAD); m_i, otherwise.  (8)

Here, we simply set n_σ to 3. The coarse image Ỹ is obtained by applying the Hampel filter (8) to each spectral vector of the observed image. We remark that although Hampel filtering can detect the positions of impulse noise and stripes well, it may not produce exact spectral estimates for the polluted pixels. Thus, the goal of performing Hampel filtering on the observed image is to obtain a coarse image for subspace learning.
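A minimal Python sketch of the per-vector Hampel filter described above (our own NumPy implementation, not the paper's MATLAB code; here q is treated as a half-window length):

```python
import numpy as np

def hampel(y, q=7, n_sigma=3.0):
    """Hampel filter along a 1-D spectral vector.

    q is the half-window length; kappa rescales the MAD to a
    standard-deviation estimate under Gaussian statistics.
    """
    kappa = 1.4826
    y = np.asarray(y, dtype=float)
    out = y.copy()
    n = len(y)
    for i in range(n):
        lo, hi = max(0, i - q), min(n, i + q + 1)
        window = y[lo:hi]
        m = np.median(window)
        sigma = kappa * np.median(np.abs(window - m))
        if np.abs(y[i] - m) > n_sigma * sigma:
            out[i] = m  # declared an outlier: replace with the local median
    return out

# A smooth spectrum with one impulse: the spike is removed, the rest is kept.
spec = np.sin(np.linspace(0, 3, 50))
spec[20] += 5.0
clean = hampel(spec)
assert abs(clean[20] - np.sin(np.linspace(0, 3, 50))[20]) < 0.3
assert np.count_nonzero(clean != spec) <= 2
```

Applying this filter to every spectral vector yields the coarse image used for subspace learning; smooth spectral signatures (including those of anomalous targets) pass through essentially unchanged.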

Subspace Identification
Given the coarse image Ỹ, which mainly contains Gaussian noise (independent and identically distributed) rather than sparse noise, the subspace basis can be obtained directly by performing the SVD on Ỹ. That is, the signal subspace is approximately spanned by the first p left-singular vectors of Ỹ:

E = [U]_(:,1:p),  (9)

where U ∈ R^(n_b×n_b) is an orthogonal matrix with {U, Σ, V} = SVD(Ỹ_(3)) and the singular values in Σ ordered by non-increasing magnitude. The dimension of the subspace, p, can be estimated automatically by the HySime method [37].
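Subspace identification from the coarse image thus reduces to a truncated SVD. A hypothetical NumPy sketch on synthetic low-rank data (HySime is not reproduced here; the subspace dimension p is assumed known):

```python
import numpy as np

def learn_subspace(Y3, p):
    """Estimate an orthonormal spectral basis from an image matrix whose
    columns are spectral vectors: the first p left-singular vectors."""
    U, _, _ = np.linalg.svd(Y3, full_matrices=False)
    return U[:, :p]

rng = np.random.default_rng(2)
n_bands, n_pix, p = 30, 500, 4
# Synthetic rank-p signal plus light Gaussian noise.
X3 = rng.standard_normal((n_bands, p)) @ rng.standard_normal((p, n_pix))
Y3 = X3 + 0.01 * rng.standard_normal((n_bands, n_pix))
E = learn_subspace(Y3, p)
# Projecting onto span(E) retains almost all of the signal energy.
ratio = np.linalg.norm(E @ (E.T @ X3)) ** 2 / np.linalg.norm(X3) ** 2
assert ratio > 0.999
```

In the full method, `Y3` would be the mode-3 unfolding of the Hampel-filtered coarse image rather than synthetic data.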

Objective Function
Given the estimate of the spectral subspace and the noise assumption (i.e., a mixture of i.i.d. Gaussian noise and sparse noise), we estimate the unknown variables Z and S in (3) by solving

min_{Z,S} (1/2)‖Y − Z ×_3 E − S‖_F² + λ_1 φ(Z) + λ_2 ‖S‖_1.  (10)

The first term of the above optimization problem is derived from the observation model (3). The third term is a regularizer with respect to S; the ℓ1 norm of S, given by ‖S‖_1 = Σ_{i,j,k} |S(i, j, k)|, promotes the sparsity of S. Finally, λ_1, λ_2 ≥ 0 are regularization parameters that trade off the importance of the respective regularizers. If φ is a convex function, then optimization (10) is a convex problem.
The second term in the optimization problem (10) is a regularizer expressing prior information tailored to spatially correlated eigenimages [8,17]. We use the plug-and-play technique [29,30] to assign a prior for the eigenimages; rather than investing effort in designing a new regularizer for eigenimages, we can plug in an image prior extracted from an off-the-shelf denoiser conceived for natural images. A deep image prior extracted from a CNN-based denoising network, FFDNet, is employed, and thus an explicit definition of the function φ(·) is not given in optimization (10). A more detailed analysis of the deep prior φ(·) can be found in Section 2.4.3.
Let A = concat([Z, S], 3) be an r × c × (p + n_b) tensor that concatenates the eigenimages Z and the sparse noise S along the third dimension. The optimization problem (10) can be rewritten as

min_A (1/2)‖Y − A ×_3 [E, I_(n_b)]‖_F² + λ_1 φ(A ×_3 [I_p, 0_(p×n_b)]) + λ_2 ‖A ×_3 [0_(n_b×p), I_(n_b)]‖_1,  (11)

where I_a denotes an identity matrix of size a and 0_(a×b) is a zero matrix of size a × b.

Solver
The optimization problem (11) can be solved by the constrained split augmented Lagrangian shrinkage algorithm (C-SALSA) [38,39]. C-SALSA is an instance of the alternating direction method of multipliers (ADMM) [40] that was developed to solve the constrained optimization formulation of regularized image restoration. By using variable splitting, we can convert the original optimization into a constrained one:

min_{A,V_1,V_2,V_3} (1/2)‖Y − V_1‖_F² + λ_1 φ(V_2) + λ_2 ‖V_3‖_1
subject to V_1 = A ×_3 B_1, V_2 = A ×_3 B_2, V_3 = A ×_3 B_3,  (12)

where we denote B_1 = [E, I_(n_b)], B_2 = [I_p, 0_(p×n_b)], and B_3 = [0_(n_b×p), I_(n_b)]. The augmented Lagrangian function of the above optimization is

L(A, V_1, V_2, V_3, D_1, D_2, D_3) = (1/2)‖Y − V_1‖_F² + λ_1 φ(V_2) + λ_2 ‖V_3‖_1 + (μ_1/2)‖A ×_3 B_1 − V_1 − D_1‖_F² + (μ_2/2)‖A ×_3 B_2 − V_2 − D_2‖_F² + (μ_3/2)‖A ×_3 B_3 − V_3 − D_3‖_F²,  (13)

where μ_1, μ_2, μ_3 > 0 are C-SALSA penalty parameters and D_1, D_2, D_3 are scaled Lagrange multipliers.
The optimizations with regard to A and V_1 (on lines 3 and 4 of Algorithm 1) are quadratic problems, whose solutions are given by

A_(3),k+1 = (Σ_{j=1}^{3} μ_j B_j^T B_j)^(−1) Σ_{j=1}^{3} μ_j B_j^T (V_{j,k} + D_{j,k})_(3)  (14)

(for line 3, with B_1 = [E, I_(n_b)], B_2 = [I_p, 0_(p×n_b)], B_3 = [0_(n_b×p), I_(n_b)]) and

V_{1,k+1} = (Y + μ_1 (A_{k+1} ×_3 B_1 − D_{1,k})) / (1 + μ_1)  (15)

(for line 4). Line 6 of Algorithm 1 is a proximity operator of the ℓ1 norm applied to V_{3,k} = A_{k+1} ×_3 B_3 + D_{3,k}. It can be solved by an element-wise soft-threshold function; that is,

V_{3,k+1} = threshold(V_{3,k}, λ_2/μ_3),  (16)

where threshold(x, τ) returns the soft thresholding of x (with τ the threshold value) [41]:

threshold(x, τ) = sign(x) max(|x| − τ, 0).  (17)

Line 5 of Algorithm 1 is a proximity operator of φ applied to V_{2,k} = A_{k+1} ×_3 [I_p, 0_(p×n_b)] + D_{2,k}. As the scaled Lagrange multiplier D_{2,k} tends to 0 as the C-SALSA iterations proceed, V_{2,k} tends to be close to Z_{k+1}. Considering that the orthogonal projection is a decorrelation transformation and the mode-3 slices of Z tend to be decorrelated, it is reasonable to decouple φ(·) with respect to the mode-3 slices; that is, φ(V_{2,k}) = Σ_{i=1}^{p} φ_i(V_{2,k}(:, :, i)), where V_{2,k}(:, :, i) denotes the ith mode-3 slice. The solution of the subproblem on line 5 can thus be decoupled with regard to the mode-3 slices of V_{2,k} and written as

V_{2,k+1}(:, :, i) = ψ_{λ_1 φ_i/μ_2}(V_{2,k}(:, :, i)), i = 1, ..., p,  (18)

where

ψ_{λφ}(v) = arg min_x (1/2)‖x − v‖² + λ φ(x)  (19)

is the so-called Moreau proximity operator (MPO) [42] of φ, or the denoising operator.
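Two of the sub-steps above have simple closed forms: the V_1 update is a penalty-weighted average of the data and the splitting target, and the ℓ1 proximity operator is the element-wise soft threshold. A hypothetical NumPy sketch (helper names are our own):

```python
import numpy as np

def soft_threshold(x, tau):
    """Element-wise soft threshold: the proximity operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def v1_update(Y, AB1, D1, mu1):
    """Minimizer of 0.5*||Y - V1||^2 + 0.5*mu1*||AB1 - V1 - D1||^2,
    with AB1 standing for A x_3 B1; setting the gradient to zero gives a
    weighted average controlled by the penalty parameter mu1."""
    return (Y + mu1 * (AB1 - D1)) / (1.0 + mu1)

x = np.array([-2.0, -0.3, 0.0, 0.4, 3.0])
# Entries with magnitude below the threshold are set to zero;
# larger entries are shrunk toward zero by the threshold value.
assert np.allclose(soft_threshold(x, 0.5), [-1.5, 0.0, 0.0, 0.0, 2.5])
assert np.isclose(v1_update(1.0, 2.0, 0.0, 1.0), 1.5)
```

Both operations are applied element-wise, so they extend unchanged from scalars to the full tensors in the iteration.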

Plug-and-Play Prior, φ(·)
Instead of investing great effort in conceiving a new eigenimage regularizer φ_i(·), we make use of the plug-and-play (PnP) prior [8,29,30]. The PnP tool is a powerful framework that solves an inverse imaging problem by directly using an existing regularizer from a state-of-the-art image denoiser. The PnP framework has been applied to hyperspectral image fusion, spectral unmixing, hyperspectral image inpainting, and anomaly detection, and produces surprisingly good image recovery results. Inspired by these successful applications of PnP, we plug the prior of a deep denoising network, FFDNet [24], into (18); that is, the denoising operator is realized by the network:

V_{2,k+1}(:, :, i) = FFDNet(V_{2,k}(:, :, i)), i = 1, ..., p,

where FFDNet(·) represents the network, which outputs a denoised image. It is worth noting that other state-of-the-art image denoisers can also be plugged into (18). We chose FFDNet in this work based on the following considerations: (a) the deep-learning-based FFDNet is much faster than other machine-learning-based denoisers once it has been trained, and (b) FFDNet is flexible and able to address images with various noise levels without retraining the network.
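The PnP step itself is a one-line substitution: wherever the proximity operator of φ would be evaluated, a denoiser is called on each eigenimage. The sketch below uses a Gaussian smoother from SciPy as a stand-in for FFDNet (the mapping from noise level to filter width is an arbitrary illustrative choice); any grayscale denoiser accepting a noise-level argument could be dropped in:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pnp_prox(V2, noise_sigma, denoiser=None):
    """Plug-and-play replacement of the proximity operator of phi.

    Each mode-3 slice (eigenimage) is denoised independently.  The paper
    plugs in FFDNet; a Gaussian smoother stands in here so the sketch is
    self-contained."""
    if denoiser is None:
        denoiser = lambda img, s: gaussian_filter(img, sigma=10.0 * s)
    out = np.empty_like(V2)
    for i in range(V2.shape[2]):
        out[:, :, i] = denoiser(V2[:, :, i], noise_sigma)
    return out

rng = np.random.default_rng(3)
Z = np.zeros((32, 32, 3))                  # smooth (constant) eigenimages
V2 = Z + 0.1 * rng.standard_normal(Z.shape)
Z_hat = pnp_prox(V2, 0.1)
# The denoised slices are closer to the clean eigenimages than the input.
assert np.linalg.norm(Z_hat - Z) < np.linalg.norm(V2 - Z)
```

Swapping `denoiser` for a trained network is the only change needed to recover the FFDNet-based variant.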
Since the denoiser FFDNet, plugged into the subproblem with regard to V_2, is not a proximity operator, we do not have a theoretical convergence guarantee for the implemented variant of C-SALSA. However, impressive performances using plugged non-convex regularizers have been observed in a number of inverse problems [19,43,44]. The numerical convergence of the proposed HySuDeep is analyzed in Section 4.5.

HSI Recovery
After estimating Ẑ, the denoised image is obtained as follows:

X̂ = Ẑ ×_3 E.  (20)

Model Extension to Non-i.i.d. Gaussian Noise
Gaussian noise in the observation model (1) is assumed to be i.i.d., a simplification that lets us focus the discussion on the image structure. Let C_λ = E[n_i n_i^T] (where n_i is a generic column of the mode-3 unfolding matrix N_(3) of the noise) define the covariance matrix of the spectral noise. We have C_λ = σ² I_(n_b) when the noise is i.i.d.
However, we remark that Gaussian noise in real HSIs tends to be non-i.i.d.; that is, it is pixel-wise independent but band-wise dependent. Before denoising according to model (1), we need to convert the non-i.i.d. scenario into an i.i.d. one by data whitening:

Ȳ = Y ×_3 (√C_λ)^(−1),  (21)

where √C_λ and (·)^(−1) denote the square root of C_λ and the matrix inverse, respectively. The estimation of the covariance matrix C_λ is challenging when the noise in the observations is a mixture of Gaussian and sparse noise. If the HSI is corrupted only by Gaussian noise, then the noise covariance matrix can be estimated by exploiting the spectral correlation of the HSI: the spectrally uncorrelated components extracted from the HSI are considered to be Gaussian noise, whose covariance matrix can be computed easily. This problem has been studied in depth, and we refer readers to representative works [37,45]. However, when the noise in the image is a mixture of Gaussian noise and sparse noise, the spectrally uncorrelated components extracted from the HSI are mixed noise instead of pure Gaussian noise. To solve this problem, we propose to split the two kinds of noise and estimate C_λ in two steps, as listed below:
• Estimate a coarse image, Ỹ, by removing the sparse noise from the observation. An outlier removal step applying Hampel filtering to the spectral vectors of the observed image is given in (8).
• We apply linear regression to each band of the image Ỹ; i.e., each band is represented as a linear combination of the remaining bands [37]. That is,

[Ỹ_(3)^T]_(:,i) = [Ỹ_(3)^T]_(α_i) β_i + ξ_i,  (22)

where Ỹ_(3)^T denotes the transpose of the mode-3 unfolding matrix Ỹ_(3), the subscript [·]_(:,i) means extracting the ith column from a matrix, the subscript [·]_(α_i) means keeping all columns except the ith one, β_i ∈ R^(n_b−1) denotes the regression coefficients, and ξ_i ∈ R^n denotes the regression error. The regression coefficients β_i can be estimated by the least squares method; i.e.,

β̂_i = ([Ỹ_(3)^T]_(α_i)^T [Ỹ_(3)^T]_(α_i))^(−1) [Ỹ_(3)^T]_(α_i)^T [Ỹ_(3)^T]_(:,i).  (23)

Given β̂_i, the regression error ξ̂_i is computed by

ξ̂_i = [Ỹ_(3)^T]_(:,i) − [Ỹ_(3)^T]_(α_i) β̂_i.  (24)

The regression errors are taken as a coarse estimate of the Gaussian noise; thus, its covariance matrix can be estimated by

σ̂_i² = var(ξ̂_i), i = 1, ..., n_b,  (25)

Ĉ_λ = diag([σ̂_1², σ̂_2², ..., σ̂_(n_b)²]),  (26)

where var(·) is a function computing the variance of the elements of the input vector.
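The regression-based noise estimate above can be sketched as follows (a hypothetical NumPy illustration on synthetic data; `estimate_noise_cov` is our own helper name):

```python
import numpy as np

def estimate_noise_cov(Y3):
    """Regress each band on the remaining bands; the residual variances give
    a band-wise (diagonal) estimate of the Gaussian noise covariance.
    Y3: bands x pixels matrix with sparse noise already removed."""
    n_b = Y3.shape[0]
    variances = np.empty(n_b)
    for i in range(n_b):
        others = np.delete(Y3, i, axis=0)              # all bands except i
        beta, *_ = np.linalg.lstsq(others.T, Y3[i], rcond=None)
        resid = Y3[i] - others.T @ beta
        variances[i] = resid.var()
    return np.diag(variances)

rng = np.random.default_rng(4)
# Rank-3 clean signal over 20 bands, plus band-dependent Gaussian noise.
clean = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 5000))
true_sigma = np.linspace(0.05, 0.4, 20)
noisy = clean + true_sigma[:, None] * rng.standard_normal((20, 5000))
C = estimate_noise_cov(noisy)
# The estimated band variances track the true noise variances.
corr = np.corrcoef(np.diag(C), true_sigma ** 2)[0, 1]
assert corr > 0.9
```

Because the clean spectra are highly correlated across bands, the regression explains the signal part of each band almost entirely, leaving mostly noise in the residual.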
Given the estimate of C_λ, we can convert the band-dependent Gaussian noise to i.i.d. Gaussian noise via (21). Here, we analyze the impact of the image conversion on the Gaussian noise. Given the image conversion, the spectral covariance of the Gaussian noise in the converted image is

C̄_λ = (√C_λ)^(−1) C_λ (√C_λ)^(−T) = I_(n_b).  (27)

From (27), we can see that after the image conversion, the Gaussian noise is i.i.d. with unit variance, which meets the noise assumption in model (1); thus, the converted image Ȳ can be denoised as discussed in Section 2. Finally, we reconstruct the clean image as follows:

X̂ = X̄ ×_3 √C_λ,  (28)

where X̄ is the estimate of the clean version of the whitened image Ȳ.
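The whitening and its inverse form a lossless pair, and after whitening the empirical noise covariance is close to the identity, as the analysis above states. A small NumPy check (using a Cholesky factor as the square root of C_λ, an illustrative choice of matrix square root):

```python
import numpy as np

def whiten(Y3, C):
    """Ybar_(3) = (sqrt(C))^{-1} Y_(3): makes band-dependent noise i.i.d."""
    L = np.linalg.cholesky(C)        # one valid square root of C
    return np.linalg.solve(L, Y3), L

def unwhiten(X3, L):
    """X_(3) = sqrt(C) Xbar_(3): maps the denoised image back."""
    return L @ X3

rng = np.random.default_rng(5)
sigmas = np.array([0.1, 0.3, 0.5])
C = np.diag(sigmas ** 2)
noise = sigmas[:, None] * rng.standard_normal((3, 100000))
white, L = whiten(noise, C)
# After whitening, the empirical spectral covariance is close to identity.
assert np.allclose(white @ white.T / 100000, np.eye(3), atol=0.02)
# Whitening followed by unwhitening is lossless.
assert np.allclose(unwhiten(white, L), noise)
```

In the full pipeline, the denoiser operates between `whiten` and `unwhiten`, exactly as in lines 4-8 of Algorithm 2.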
The pseudocode in Algorithm 2 shows how HySuDeep is implemented to reduce mixed noise in an HSI. Given an HSI of size r (rows) × c (columns) × n_b (bands) with subspace dimension p (p ≪ n_b), the computational complexity of obtaining the coarse image Ỹ in line 2 and Ĉ_λ in line 3 is O(r c n_b q log q) and O(r c n_b³), respectively. The Gaussian noise whitening in line 4, its inverse transformation in line 8, and the image reconstruction step in line 7 have the same computational complexity, namely O(r c n_b²). The estimation of the spectral subspace in line 5 and of the eigenimages in line 6 cost O(r² c² n_b) and O(r c (p + n_b)² + (p + n_b)³ + p d), respectively. Here, d represents the computational complexity of denoising one eigenimage using FFDNet. Consequently, the overall computational complexity of HySuDeep is O(r c n_b³ + r² c² n_b + p d).
Obtain a whitened image Ȳ using (21).
5: Denoise the whitened image Ȳ using Algorithm 1, and obtain X̄.
6: Compute X̂ = X̄ ×_3 √C_λ, an estimate of the clean HSI.

Experiments with Simulated Images
Experiments were carried out on three simulated datasets to evaluate the performance of the proposed HySuDeep method compared with six state-of-the-art denoising methods for HSI mixed noise removal.

Simulation of Noisy Images
Three public hyperspectral datasets (shown in Figure 4a-c) were employed to simulate the noisy images. Following the same procedure as in [32], we generated noisy HSIs from the Pavia University data (of size 310 (rows) × 250 (columns) × 87 (bands)), a subregion of the Washington DC Mall data (of size 150 (rows) × 200 (columns) × 191 (bands)), and the Terrain data (of size 500 (rows) × 307 (columns) × 166 (bands)). For the first two datasets, we removed bands that were severely corrupted by water vapor in the atmosphere. To obtain relatively clean images, we projected the spectral vectors of each image onto a subspace spanned by eight principal eigenvectors of that image; the projection was considered to be the clean image. To simulate noisy HSIs, we added four kinds of additive noise to the images as follows:

Case 1 (Gaussian non-i.i.d. noise): Synthetic data with Gaussian noise. The noise in the ith pixel vector is g_i ∼ N(0, D²), where D is a diagonal matrix with diagonal elements sampled from the uniform distribution U(0, 0.01).

Case 2 (Gaussian noise + stripes):
Synthetic data with Gaussian noise (described in case 1) and oblique stripe noise randomly affecting 30% of the bands and, for each band, about 10% of the pixels, at random.

Case 3 (Gaussian noise + "Salt and Pepper" noise):
Synthetic data with Gaussian noise (described in case 1) and "Salt and Pepper" noise with a noise density of 0.5%, meaning approximately 0.5% of elements in X are affected.
Case 4 (Gaussian noise + stripes + "Salt and Pepper" noise): Synthetic data with Gaussian noise (described in case 1), random oblique stripes (described in case 2) and "Salt and Pepper" noise (described in case 3).
To evaluate the impact of denoising methods on the hyperspectral unmixing task, we simulated a clean semi-real HSI based on the publicly available Terrain image (Figure 4c) following the generation steps in [36]. A MATLAB demo used to generate a semi-real HSI is available at https://github.com/LinaZhuang/NMF-QMV_demo. The original Terrain image has a size of 500 (rows) × 307 (columns) × 166 (bands) and is mainly composed of soil, tree, grass, and shadow. The number of endmembers was empirically set to 5, as in [36,46,47]. Briefly, a clean Terrain image was synthesized based on a linear mixing model [1]; i.e., X = A ×_3 M, where M and A are endmembers and abundances, respectively, estimated from the original Terrain image. Next, we generated a noisy Terrain HSI by adding Gaussian noise and oblique stripe noise (as described in case 2), i.e., Y = A ×_3 M + S + G, yielding an MPSNR of 28.90 dB.
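For readers who wish to reproduce a similar setting, the noise cases can be simulated along the following lines (a hypothetical NumPy sketch; the image size, stripe amplitude, and affected fractions are our own illustrative choices, not the exact values of the paper's simulation code):

```python
import numpy as np

rng = np.random.default_rng(6)
r, c, n_b = 64, 64, 30
# Low-rank "clean" cube: 3 endmember-like spectra mixed at every pixel.
X = rng.random((r, c, 3)) @ rng.random((3, n_b))
# Case 1 ingredient: non-i.i.d. Gaussian noise, per-band std from U(0, 0.01).
std = rng.uniform(0.0, 0.01, n_b)
G = rng.standard_normal((r, c, n_b)) * std
# Case 2 ingredient: stripes in 30% of the bands, ~10% of the columns each.
S = np.zeros((r, c, n_b))
for b in rng.choice(n_b, int(0.3 * n_b), replace=False):
    cols = rng.choice(c, c // 10, replace=False)
    S[:, cols, b] += 0.2            # illustrative stripe amplitude
# Case 3 ingredient: "Salt and Pepper" noise on 0.5% of all elements.
Y = X + G + S
mask = rng.random((r, c, n_b)) < 0.005
Y[mask] = rng.choice([0.0, 1.0], size=int(mask.sum()))
```

Combining all three ingredients reproduces the structure of case 4; dropping ingredients recovers the simpler cases.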

Comparisons
To thoroughly evaluate the performance of the proposed method, six state-of-the-art HSI denoising methods were selected for comparison: the fast hyperspectral denoising method (FastHyDe) [8], the noise-adjusted iterative low-rank matrix approximation method (NAILRMA) [48], the spatio-spectral total variation based method (SSTV) [49], the low-rank matrix recovery method (LRMR) [15], double-factor-regularized low-rank tensor factorization (LRTF-DFR) [21], and the ℓ1-norm-based hyperspectral mixed noise denoising method (L1HyMixDe) [32]. These methods were carefully selected: FastHyDe serves as a benchmark to test whether mixed noise can simply be addressed by a Gaussian denoiser; NAILRMA, SSTV, and LRMR are from highly cited papers on hyperspectral mixed noise removal based on low-rank and sparse representations; and LRTF-DFR and L1HyMixDe are subspace representation methods that remove noise by filtering the subspace coefficients of HSIs. All experiments were implemented in MATLAB (R2016a) on Windows 10 with an Intel Core i7-7700HQ 2.8-GHz processor and 24 GB RAM. A MATLAB demo of this work will be made available at https://github.com/LinaZhuang for the sake of reproducibility.
The size of the sliding window in the Hampel filter, q, was fixed at 7. The dimension of the subspace, p, for the FastHyDe, LRTF-DFR, L1HyMixDe, and HySuDeep methods was estimated automatically by HySime [37]. The other parameters of FastHyDe, NAILRMA, LRTF-DFR, and L1HyMixDe were set as suggested in their original references. We fine-tuned the parameters λ and μ in SSTV and the parameters r and s in LRMR to obtain optimal results. For quantitative assessment, the peak signal-to-noise ratio (PSNR) index, the structural similarity (SSIM) index, and the feature similarity (FSIM) index of each band were calculated. The mean PSNR (MPSNR), mean SSIM (MSSIM), and mean FSIM (MFSIM) of the proposed and compared methods on the Pavia University data and the Washington DC Mall data are presented in Table 2, where the best results are highlighted in bold.

Mixed Noise Removal
We compare the proposed HySuDeep with the FastHyDe [8], NAILRMA [48], SSTV [49], LRMR [15], LRTF-DFR [21], and L1HyMixDe [32] methods in terms of MPSNR, MSSIM, and MFSIM (provided in Tables 2 and 4). The band-wise PSNR is depicted in Figure 5 for quantitative assessment. Among the compared methods, only FastHyDe is designed for pure Gaussian noise rather than mixed noise; we include it to evaluate whether mixed noise can be reduced by a simple Gaussian denoiser. As shown in Table 2, FastHyDe outperformed the other methods in Case 1 (where the HSIs are corrupted only by Gaussian noise). However, when the images were contaminated by mixed noise in Cases 2-4, the results of FastHyDe show that heavy mixed noise cannot be reduced well by a Gaussian denoiser. The existence of mixed noise in HSIs thus calls for effective denoising techniques.
To address mixed noise, an additive term is usually introduced to model sparse noise so that the entries of the data fidelity term, (Y − X − S), follow Gaussian distributions. The proposed HySuDeep, NAILRMA, SSTV, LRMR, and LRTF-DFR methods fall into this line of research. The critical differences among these methods are the regularizations imposed on the HSI X. SSTV minimizes the total variation of the HSI in the spectral-spatial domain. NAILRMA and LRMR impose spectral low-rankness on spatial patches of the image. LRTF-DFR and the proposed HySuDeep both enforce the low-rankness of spectral vectors by subspace representation. To exploit the spatial correlation of eigenimages, LRTF-DFR minimizes the spatial difference (i.e., total variation) of the eigenimages, while HySuDeep uses a CNN regularization (i.e., a deep image prior extracted from a CNN network).
The proposed method uniformly yields the best results in Cases 2-4, as shown in Tables 2 and 4. The main reason is that, compared with the SSTV, NAILRMA, LRMR, LRTF-DFR, and L1HyMixDe methods, our method uses a more effective spatial regularization term. Although the CNN regularization works like a black box, its superiority over TV regularization (used in SSTV and LRTF-DFR) and patch-based regularization (used in NAILRMA, LRMR, and L1HyMixDe) can be seen in Tables 2 and 4. Furthermore, HySuDeep is similar to the L1HyMixDe method in that both use subspace representations of spectral vectors and impose a regularization on the representation coefficients. Tables 2 and 4 show that HySuDeep achieves better results than L1HyMixDe. The difference is caused by the accuracy of the spectral subspace learning: compared with L1HyMixDe, HySuDeep employs a more elaborate outlier removal operator, which improves the estimate of the subspace E (see the discussion in Section 4.3).
For visual comparison, we display the 37th band of the Pavia University data and the 126th band of the Washington DC Mall data in Figures 6 and 7, respectively. For Case 1 (Gaussian noise), all methods reduce the noise significantly. As shown in Figures 6 and 7, SSTV and LRMR are able to remove light stripes but still leave some wide stripes, and heavy noise remains in the results of NAILRMA and FastHyDe. The LRTF-DFR, L1HyMixDe, and HySuDeep methods yield visually comparable results in Figures 6 and 7. To show their differences, we generated the residual images presented in Figure 8a,b, where we can see that the results of the proposed HySuDeep contain smaller residuals than those of LRTF-DFR and L1HyMixDe.

Subspace Learning against Mixed Noise
One of the contributions of this paper is the introduction of an orthogonal subspace identification method that is robust to mixed noise. We conducted experiments using simulated images to compare the proposed subspace learning method with representative subspace learning methods, namely, SVD, RPCA, L1HyMixDe [32], and HySime [37]. Results are reported in Table 3, where we show two metrics, namely

γ_1 = ‖X ×_3 E^T ×_3 E‖_F² / ‖X‖_F² and γ_2 = ‖N ×_3 E^T ×_3 E‖_F² / ‖N‖_F²,

measuring the relative power of the clean spectral vectors and of the noise lying in the estimated subspace, respectively.
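The two metrics can be computed directly from the estimated basis: for a tensor with mode-3 unfolding T_(3), the projection energy ratio is ‖E E^T T_(3)‖_F² / ‖T_(3)‖_F². A hypothetical NumPy sketch on synthetic data:

```python
import numpy as np

def energy_ratio(T3, E):
    """Fraction of the tensor's energy lying in span(E), computed on the
    mode-3 unfolding T3 (bands x pixels)."""
    return np.linalg.norm(E @ (E.T @ T3)) ** 2 / np.linalg.norm(T3) ** 2

rng = np.random.default_rng(7)
E, _ = np.linalg.qr(rng.standard_normal((30, 5)))   # orthonormal basis, p = 5
X3 = E @ rng.standard_normal((5, 400))              # clean signal inside span(E)
N3 = rng.standard_normal((30, 400))                 # white noise
g1 = energy_ratio(X3, E)   # close to 1: the subspace captures the signal
g2 = energy_ratio(N3, E)   # for white noise, close to p/n_b = 5/30
assert g1 > 0.999 and g2 < 0.35
```

A good subspace estimate drives γ_1 toward 1 while keeping γ_2 near the white-noise floor p/n_b.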
In Case 1, the two images were corrupted only by Gaussian noise, and the values of $\gamma_1$ for all methods are higher than 0.9993, implying that all methods can estimate a subspace representing the image signal very well. Case 3 shows results similar to Case 1; that is, all methods achieve relatively high $\gamma_1$. We conclude that when the noise is Gaussian distributed or a mixture of Gaussian and "Salt and Pepper" noise, signal subspace learning is not a challenging task and we can simply use SVD. The reason is that Gaussian noise and "Salt and Pepper" noise are randomly and uniformly distributed in each channel; thus, the noise increases the singular values in the direction of each eigenvector almost uniformly and does not change the order of the singular values of the clean image. The signal subspace is then approximately spanned by the p singular vectors of the noisy image corresponding to its p largest singular values.
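The plain SVD baseline described above can be sketched in a few lines: unfold the noisy cube into a band-by-pixel matrix and keep the p leading left singular vectors. A minimal sketch (function name and interface are our own, not from the paper):

```python
import numpy as np

def learn_subspace_svd(cube, p):
    """SVD subspace learning: span the signal subspace with the p left
    singular vectors of the band-by-pixel unfolding. Adequate when noise
    is Gaussian or salt-and-pepper, per the discussion above; it is NOT
    the proposed robust learning method."""
    bands = cube.shape[-1]
    Y = cube.reshape(-1, bands).T                 # bands x pixels
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U[:, :p]                               # orthonormal basis E
```

Under stripe noise this estimate degrades, which motivates the robust variant discussed next.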
However, subspace learning from noisy images in Cases 2 and 4 is challenging, and we cannot simply use SVD, because stripe noise usually exists in specific channels instead of being uniformly distributed over all channels. The stripe noise significantly increases the variances of the affected channels. Since SVD tends to learn a subspace representing the information in channels with high variances, SVD is not suitable for Cases 2 and 4. This inference is consistent with the results in Table 3, where SVD obtains the worst results in terms of $\gamma_1$ and $\gamma_2$ in Cases 2 and 4. If we focus only on the table rows highlighted in gray, the proposed learning method performs better than the others. Among the compared methods, RPCA learns a subspace representing the low-rank signal while excluding sparse noise. However, stripe noise is also low-rank [50], and RPCA cannot separate the image signal from stripes using only a low-rank regularization. HySime is conceived under an assumption of Gaussian noise and is clearly not suitable for mixed noise. L1HyMixDe performs median filtering band by band, exploiting spatial correlation, whereas HySuDeep estimates a coarse image using Hampel filtering pixel by pixel, exploiting spectral correlation. As shown in the gray rows of Table 3, HySuDeep uniformly provides the best performance.
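The Hampel filter applied pixel by pixel along the spectral dimension replaces samples that deviate from a local median by more than a few scaled median absolute deviations (MADs). The sketch below shows this kind of outlier removal for a single spectral vector; the window size and threshold are illustrative choices of ours, not the paper's settings:

```python
import numpy as np

def hampel_filter_1d(x, half_window=5, n_sigmas=3.0):
    """Hampel outlier filter along one spectral vector: a sample is
    flagged if it deviates from the median of its local window by more
    than n_sigmas scaled MADs, and is then replaced by that median."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    k = 1.4826  # MAD-to-standard-deviation scale for Gaussian data
    for i in range(len(x)):
        lo, hi = max(0, i - half_window), min(len(x), i + half_window + 1)
        window = x[lo:hi]
        med = np.median(window)
        mad = k * np.median(np.abs(window - med))
        if np.abs(x[i] - med) > n_sigmas * mad:
            out[i] = med                 # replace the flagged outlier
    return out
```

Because spectra are smooth along the band index, isolated impulses and dead-line values stand out sharply against the local median, which is why a spectral (rather than spatial) Hampel pass yields a useful coarse image for subspace estimation.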

Analysis of Regularization Parameters
There are two regularization parameters, namely $\lambda_1$ and $\lambda_2$, in the objective function of the proposed HySuDeep method. The parameter $\lambda_1$ controls the trade-off between Gaussian noise reduction and detail preservation; it is related to the standard deviation of the Gaussian noise, which can be estimated via (25).
The second parameter $\lambda_2$ controls the sparsity of the estimate of the sparse noise S, so its setting should depend on the intensity of the sparse noise. Figure 9 depicts the denoising performance of HySuDeep on the Pavia University image and the Washington DC Mall image as a function of $\lambda_2$. When $\lambda_2$ is set to any value in {1, 3, 5, 7, 9}, the denoising results are acceptable for both images; therefore, we simply set $\lambda_2 = 3$ in all experiments in this paper.
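In l1-regularized mixed-noise models of this kind, the update of the sparse component in alternating solvers typically reduces to element-wise soft-thresholding, with a threshold that grows with $\lambda_2$: a larger threshold keeps fewer entries, hence a sparser S. A generic sketch of that operator, assuming the standard l1 proximal update rather than the paper's exact derivation:

```python
import numpy as np

def soft_threshold(residual, tau):
    """Element-wise soft-thresholding, the proximal operator of
    tau * ||.||_1. In l1-regularized models the sparse-noise update takes
    this form, with tau proportional to lambda_2: larger lambda_2 gives a
    sparser estimate of S. A generic sketch, not the exact HySuDeep update."""
    return np.sign(residual) * np.maximum(np.abs(residual) - tau, 0.0)
```

This makes the role of $\lambda_2$ concrete: entries of the residual smaller than the threshold are attributed to Gaussian noise and zeroed out, while larger entries are shrunk and kept in S.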

Numerical Convergence of HySuDeep
Since CNN regularization is incorporated into (10), the proposed HySuDeep cannot be considered a convex optimization problem; thus, its theoretical convergence is not guaranteed. However, its numerical convergence is systematically observed when the augmented Lagrangian parameters are set to $\mu_i = 1$ ($i = 1, 2, 3$). As shown in Figure 10, the relative change converges to near zero after 15 iterations, indicating that the proposed method converges numerically.
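The relative change plotted in Figure 10 is the standard stopping quantity for such iterative schemes: the Frobenius-norm distance between consecutive iterates, normalized by the previous iterate. A minimal monitoring sketch (the tolerance value is an illustrative assumption):

```python
import numpy as np

def relative_change(Z_new, Z_old, eps=1e-12):
    """||Z_new - Z_old||_F / ||Z_old||_F, the usual numerical stopping
    criterion: iterate until this falls below a tolerance (e.g. 1e-4).
    eps guards against division by zero."""
    return np.linalg.norm(Z_new - Z_old) / (np.linalg.norm(Z_old) + eps)
```

In practice one stops either when the relative change drops below the tolerance or after a fixed maximum number of iterations, whichever comes first.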

Application in Hyperspectral Unmixing
Hyperspectral denoising is usually implemented as a preprocessing step for subsequent applications. This subsection takes hyperspectral unmixing as an example to evaluate whether the image denoising step has a positive impact on the performance of subsequent applications. Spectral unmixing mainly involves two stages: (i) identifying materials present in the scene (termed endmembers) and (ii) estimating the fraction (or abundance) of each material in each pixel. The hyperspectral unmixing task was chosen considering that the endmember extraction step can be very sensitive to sparse noise.
We first performed denoising on the Terrain image using FastHyDe, NAILRMA, SSTV, LRMR, LRTF-DFR, L1HyMixDe, and HySuDeep. The mean PSNR (MPSNR), mean SSIM (MSSIM), and mean FSIM (MFSIM) of the proposed and compared methods on the Terrain image are presented in Table 4, where the best results are highlighted in bold. Then, we unmixed the spectra of the denoised images, using vertex component analysis (VCA) [53] to estimate the endmembers and fully constrained least squares (FCLS) [54] to estimate the abundances. Two metrics were computed: the normalized mean square error (NMSE) of the endmembers A and of the abundances S, denoted as $\mathrm{NMSE}_{\mathbf{A}}$ and $\mathrm{NMSE}_{\mathbf{S}}$, respectively. As reported in Table 4, LRMR, LRTF-DFR, L1HyMixDe, and HySuDeep obtain a lower NMSE of endmembers than the counterpart image without denoising. For the images denoised by FastHyDe, NAILRMA, and SSTV, the errors in the endmember estimates directly result in high errors in the abundance estimates. Among LRMR, LRTF-DFR, L1HyMixDe, and HySuDeep, the proposed HySuDeep leads to the lowest NMSE of abundances.
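The NMSE metric used above is simply the error energy normalized by the reference energy. A one-line sketch of the squared-norm form (whether the paper additionally takes a square root is not stated, so this is an assumption):

```python
import numpy as np

def nmse(estimate, reference):
    """Normalized mean square error ||est - ref||_F^2 / ||ref||_F^2,
    the metric reported for endmembers (NMSE_A) and abundances (NMSE_S).
    The squared form is an assumption; some works report its square root."""
    return (np.linalg.norm(estimate - reference) ** 2
            / np.linalg.norm(reference) ** 2)
```

The same function applies to both the endmember matrix and the abundance maps, since both are compared element-wise against their ground-truth counterparts.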

Experimental Results for Real Images
We evaluate the performance of the proposed method on two real HSI datasets, as shown in Figure 4d,e.

Hyperion Cuprite Dataset
The Cuprite HSI was captured at Cuprite, NV, USA, by the Hyperion sensor, which divides the spectrum from 355 nm to 2577 nm into 242 channels with a spectral resolution of 10 nm. The spatial resolution of the image is 30 m. A subregion of size 240 × 178 pixels with 177 spectral channels (after removing very low SNR channels) was cropped for the test. Four bands are shown in Figure 11, where bands 93, 133, and 134 were corrupted by severe stripes and band 156 was affected by a dead-line. The full-resolution images of Figure 11 will be available at https://github.com/LinaZhuang (accessed on Nov. 1, 2021). All methods except NAILRMA are able to remove the dead-line in band 156. For the severe stripes in the other three bands, FastHyDe, LRMR, L1HyMixDe, and HySuDeep achieved good restoration results, while obvious stripes remained in the results of the other methods.

Tiangong-1 Dataset
The Tiangong-1 dataset was acquired over an area of Qinghai Province, China, in May 2013, by the Tiangong-1 hyperspectral imager, a 75-band push-broom scanner with a nominal bandwidth of 23 nm in the short-wave infrared (SWIR), covering from 800 nm to 2500 nm. A subregion of size 351 × 253 pixels was tested. Three bands displaying strong noise are shown in Figure 12, where band 1 has a severe cross-track illumination error and bands 36 and 65 contain obvious stripes. The full-resolution images of Figure 12 will be available at https://github.com/LinaZhuang (accessed on Nov. 1, 2021). Comparing the images before and after denoising, we can see that FastHyDe, L1HyMixDe, and HySuDeep correct the illumination error in band 1. In bands 36 and 65, FastHyDe, LRMR, L1HyMixDe, and HySuDeep alleviate the stripe noise. Focusing on band 65, our HySuDeep method obtains the visually best result, considering the restoration of the areas inside the red circles.

Conclusions
This paper introduces a hyperspectral mixed noise removal method, HySuDeep, which exploits two important characteristics of HSIs. HySuDeep takes advantage of the spectral low-rankness of HSIs by representing clean spectral vectors in a low-dimensional subspace, which significantly improves the conditioning of the denoising problem. The spatial correlation of HSIs is exploited by adding CNN regularization on the eigenimages. Although the network was trained using grayscale images acquired with commercial cameras, without any remote sensing images, it still achieves impressive performance for HSI denoising. The reason is that both kinds of images are natural images sharing the same properties, such as local and non-local similarity and piece-wise smoothness; therefore, the image prior learned by the network from grayscale images is also applicable to HSIs. Experimental results on both simulated and real HSIs show the superiority of the proposed method for mixed noise removal in HSIs.