Hyperspectral Image Recovery Using Non-Convex Low-Rank Tensor Approximation

Low-rank tensor models have received increasing attention in hyperspectral image (HSI) recovery. Minimizing the tensor nuclear norm, as a low-rank approximation method, often leads to modeling bias. To achieve an unbiased approximation and improve robustness, this paper develops a non-convex relaxation approach for low-rank tensor approximation. Firstly, a non-convex approximation of the tensor nuclear norm (NCTNN) is introduced for low-rank tensor completion. Secondly, a non-convex tensor robust principal component analysis (NCTRPCA) method is proposed, which aims at exactly recovering a low-rank tensor corrupted by mixed noise. The two proposed models are solved efficiently by the alternating direction method of multipliers (ADMM). Three HSI datasets are employed to demonstrate the superiority of the proposed models over low-rank penalization methods in terms of accuracy and robustness.


Introduction
Hyperspectral image (HSI) has drawn a lot of attention on account of its rich spectral information. However, HSI can be easily disturbed by many external factors, such as missing entries and noise, which not only degrade the visual quality of the image but also limit the precision of subsequent image interpretation and analysis. Therefore, HSI restoration has attracted increasing interest in recent years, and various HSI restoration models have been developed [1][2][3]. Among them, considering the spatial geometric similarity and spectral correlation of HSI, low-rank prior modeling has become one of the most popular techniques [4]. Besides, to enhance structural details, a total variation regularized low-rank matrix factorization (LRTV) restoration model was proposed in Ref. [5], which obtains satisfying restoration results in the case of Gaussian noise. For removing mixed noise while enhancing edges and spectral signatures, a weighted total variation regularized low-rank (LRWTV) model was proposed [6]. Candes et al. employed robust principal component analysis (RPCA) to separate a clean HSI from corrupting sparse noise in polynomial time [7]. Chang et al. established a low-rank-based single-image decomposition model (LRSID) to remove stripe noise in HSI [1].
With the development of low-rank approaches, non-convex low-rank approximation approaches, including weighted nuclear norm minimization (WNNM) [8], Schatten p-norm minimization [9], weighted Schatten p-norm minimization (WSNM) [10] and the minimax concave penalty (MCP) function [11], etc., have also been studied for their robustness and unbiasedness. Wang et al. [12] developed the MCP function for the rank minimization problem, obtaining a so-called γ-norm as a non-convex relaxation of the matrix rank. A non-convex relaxation HSI low-rank

Notations and Preliminaries
Throughout this paper, we use $\mathcal{A}$, $A$, $a$ and $x$ to denote a tensor, a matrix, a vector and a scalar, respectively. For a 3-order tensor $\mathcal{A}\in\mathbb{R}^{n_1\times n_2\times n_3}$, the $(i,j,k)$-th entry is denoted as $a_{ijk}$ and $A_{(i)}$ is the mode-$i$ matricization. We use the Matlab notation $\mathcal{A}(i,:,:)$, $\mathcal{A}(:,j,:)$ and $\mathcal{A}(:,:,k)$ to denote, respectively, the $i$-th horizontal, $j$-th lateral and $k$-th frontal slice. More often, the frontal slice $\mathcal{A}(:,:,k)$ is denoted as $A^{(k)}$. The tube is denoted as $\mathcal{A}(i,j,:)$. The inner product of matrices $A$ and $B$ in $\mathbb{R}^{n_1\times n_2}$ is defined as $\langle A,B\rangle=\operatorname{Tr}(A^{*}B)$, where $A^{*}$ denotes the conjugate transpose of $A$ and $\operatorname{Tr}(\cdot)$ denotes the matrix trace. The inner product of two tensors $\mathcal{A}$ and $\mathcal{B}$ in $\mathbb{R}^{n_1\times n_2\times n_3}$ is defined as $\langle\mathcal{A},\mathcal{B}\rangle=\sum_{k=1}^{n_3}\langle A^{(k)},B^{(k)}\rangle$. Some norms of vectors, matrices and tensors are used. We denote the $\ell_1$-norm, the Frobenius norm and the nuclear norm as $\|\mathcal{A}\|_1=\sum_{ijk}|a_{ijk}|$, $\|\mathcal{A}\|_F=\sqrt{\sum_{ijk}|a_{ijk}|^{2}}$ and $\|\mathcal{A}\|_*=\sum_{k}\|A^{(k)}\|_*$, respectively.
The above norms reduce to the corresponding vector or matrix norms if $\mathcal{A}$ is a vector or a matrix. Specifically, the nuclear norm of a matrix $A$ is defined as $\|A\|_*=\sum_{i=1}^{r}\sigma_i$, in which $\sigma_1\ge\sigma_2\ge\cdots\ge\sigma_r>0$ are the singular values of $A$ and $r$ is the rank of $A$.
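As a numerical illustration (not part of the original derivation), the matrix nuclear norm above is simply the sum of singular values; the helper name `nuclear_norm` is ours. A rank-1 matrix $uv^{T}$ has exactly one nonzero singular value, $\|u\|\,\|v\|$:

```python
import numpy as np

def nuclear_norm(A):
    """Nuclear norm of a matrix: the sum of its singular values."""
    return np.linalg.svd(A, compute_uv=False).sum()

# A rank-1 matrix u @ v.T has a single nonzero singular value ||u||*||v||.
u = np.array([[1.0], [2.0]])       # ||u|| = sqrt(5)
v = np.array([[3.0, 4.0]])         # ||v|| = 5
A = u @ v                          # rank 1
```

For this example, `nuclear_norm(A)` equals $5\sqrt{5}$, matching the single nonzero singular value.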
The Fourier transform of a tensor $\mathcal{A}\in\mathbb{R}^{n_1\times n_2\times n_3}$ along the third mode is denoted by $\bar{\mathcal{A}}$. In particular, we denote $\bar{A}$ as the block diagonal matrix whose diagonal blocks are the frontal slices of $\bar{\mathcal{A}}$, i.e.,

$$\bar{A}=\operatorname{bdiag}(\bar{\mathcal{A}})=\begin{bmatrix}\bar{A}^{(1)} & & \\ & \ddots & \\ & & \bar{A}^{(n_3)}\end{bmatrix}.$$

The block circulant matrix of the tensor $\mathcal{A}\in\mathbb{R}^{n_1\times n_2\times n_3}$ has size $n_1 n_3\times n_2 n_3$, i.e.,

$$\operatorname{bcirc}(\mathcal{A})=\begin{bmatrix}A^{(1)} & A^{(n_3)} & \cdots & A^{(2)}\\ A^{(2)} & A^{(1)} & \cdots & A^{(3)}\\ \vdots & \vdots & \ddots & \vdots\\ A^{(n_3)} & A^{(n_3-1)} & \cdots & A^{(1)}\end{bmatrix}.$$

We also define the operators

$$\operatorname{unfold}(\mathcal{A})=\begin{bmatrix}A^{(1)}\\ A^{(2)}\\ \vdots\\ A^{(n_3)}\end{bmatrix},\qquad \operatorname{fold}(\operatorname{unfold}(\mathcal{A}))=\mathcal{A}.$$

Then, the t-product between two 3-order tensors can be defined as follows.

Definition 1. (t-product) [19]: The t-product $\mathcal{A}*\mathcal{B}$ of two tensors $\mathcal{A}\in\mathbb{R}^{n_1\times n_2\times n_3}$ and $\mathcal{B}\in\mathbb{R}^{n_2\times n_4\times n_3}$ is a tensor of size $n_1\times n_4\times n_3$, and $\mathcal{A}*\mathcal{B}=\operatorname{fold}(\operatorname{bcirc}(\mathcal{A})\cdot\operatorname{unfold}(\mathcal{B}))$.

Definition 2. (f-diagonal tensor) [19]: A tensor is called f-diagonal if each of its frontal slices is a diagonal matrix.

Theorem 1. (t-SVD) [19]: For a tensor $\mathcal{A}\in\mathbb{R}^{n_1\times n_2\times n_3}$, the t-SVD of $\mathcal{A}$ is given by $\mathcal{A}=\mathcal{U}*\mathcal{S}*\mathcal{V}^{*}$, where $\mathcal{U}\in\mathbb{R}^{n_1\times n_1\times n_3}$ and $\mathcal{V}\in\mathbb{R}^{n_2\times n_2\times n_3}$ are orthogonal, and $\mathcal{S}$ is a rectangular f-diagonal tensor of size $n_1\times n_2\times n_3$.

Definition 3. (Tensor multi rank and tubal rank) [33]: The tensor multi rank of $\mathcal{A}\in\mathbb{R}^{n_1\times n_2\times n_3}$ is a vector $r\in\mathbb{R}^{n_3}$ with its $i$-th entry equal to the rank of the $i$-th frontal slice of $\bar{\mathcal{A}}$, i.e., $r_i=\operatorname{rank}(\bar{A}^{(i)})$. The tensor tubal rank, denoted as $\operatorname{rank}_t(\mathcal{A})$, is defined as the number of nonzero singular tubes of $\mathcal{S}$, where $\mathcal{S}$ is from the t-SVD $\mathcal{A}=\mathcal{U}*\mathcal{S}*\mathcal{V}^{*}$.
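As an illustrative sketch (function names `t_product` and `tubal_rank` are ours), the t-product and tubal rank can be computed via slice-wise operations in the Fourier domain, which is equivalent to the block-circulant definition above:

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x n2 x n3) and B (n2 x n4 x n3): FFT along the
    third mode, slice-wise matrix products, then the inverse FFT."""
    n3 = A.shape[2]
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.empty((A.shape[0], B.shape[1], n3), dtype=complex)
    for k in range(n3):
        Cf[:, :, k] = Af[:, :, k] @ Bf[:, :, k]
    return np.real(np.fft.ifft(Cf, axis=2))

def tubal_rank(A, tol=1e-10):
    """Tubal rank: the number of nonzero singular tubes, computed from
    the singular values of each frontal slice in the Fourier domain."""
    Af = np.fft.fft(A, axis=2)
    s = np.stack([np.linalg.svd(Af[:, :, k], compute_uv=False)
                  for k in range(A.shape[2])], axis=1)
    return int(np.sum(s.max(axis=1) > tol))
```

As a sanity check, the identity tensor (identity matrix in the first frontal slice, zeros elsewhere) acts as the identity element of the t-product.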

Proposed Methods
Minimizing the rank of a matrix is equivalent to minimizing the $\ell_0$-norm of its singular values; this is an NP-hard problem that is solved by convex or non-convex relaxation methods. The nuclear norm and the weighted nuclear norm are common convex approximations, but these convex surrogates over-penalize large singular values, resulting in approximation bias. In contrast, a non-convex surrogate function, which can keep large singular values nearly unchanged, provides a nearly unbiased approximation to the $\ell_0$-norm.

Non-Convex Low-Rank Relaxation
As it has been investigated in [10], the $\gamma$-norm $\|X\|_\gamma$ ($\gamma$ is a positive factor) of a matrix $X$ with rank $r$ is defined as the sum of the MCP functions of the matrix singular values:

$$\|X\|_\gamma=\sum_{i=1}^{r}\phi_{\gamma,\lambda}(\sigma_i(X)),$$

where $\phi_{\gamma,\lambda}(\cdot)$ is the so-called MCP function and $\lambda$ is a threshold parameter.
It can be derived that, as $\gamma\to\infty$, the non-convex MCP function tends to the $\ell_1$-norm penalty, while for finite $\gamma$ it saturates at a constant for large inputs and satisfies $\phi_{\gamma,\lambda}(t)\le\lambda|t|$, which leads to a nearly optimal non-convex approximation to the $\ell_0$-norm.
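The saturation behavior can be seen numerically. The sketch below uses the standard closed form of the MCP (an assumption on our part, since the paper's formula was not reproduced here): linear-minus-quadratic up to $\gamma\lambda$, then constant at $\gamma\lambda^2/2$, so large inputs incur the same finite penalty and no extra bias:

```python
import numpy as np

def mcp(t, lam, gamma):
    """Standard minimax concave penalty (assumed form):
    phi(t) = lam*t - t^2/(2*gamma)  for 0 <= t < gamma*lam,
             gamma*lam^2/2          for t >= gamma*lam."""
    t = np.asarray(t, dtype=float)
    return np.where(t < gamma * lam,
                    lam * t - t**2 / (2 * gamma),
                    gamma * lam**2 / 2)
```

With `lam=1, gamma=2`, the penalty grows for small `t` but is identical for `t = 100` and `t = 200`, unlike the $\ell_1$ penalty, which keeps growing.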
As such, the derived matrix $\gamma$-norm defined by the MCP function of the singular values generates a nearly optimal non-convex approximation to the low rank (NRLR). The solution of $\gamma$-norm minimization results in singular value soft thresholding (SVST). Compared with the popular singular value thresholding (SVT) of the matrix nuclear norm, SVST keeps the large singular values of a matrix unchanged, which has the advantages of near unbiasedness and higher accuracy, as illustrated in Figure 2. Inspired by the matrix $\gamma$-norm, for a third-order tensor $\mathcal{A}\in\mathbb{R}^{n_1\times n_2\times n_3}$, we define the tensor $\gamma$-norm as follows:

$$\|\mathcal{A}\|_\gamma=\frac{1}{n_3}\sum_{k=1}^{n_3}\|\bar{A}^{(k)}\|_\gamma=\frac{1}{n_3}\sum_{k=1}^{n_3}\sum_{r}\phi_{\gamma,\lambda}\big(\sigma_r(\bar{A}^{(k)})\big),$$

where $\sigma_r(\bar{A}^{(k)})$ stands for the $r$-th singular value of the matrix $\bar{A}^{(k)}$. The defined tensor $\gamma$-norm is the average of the $\gamma$-norms of all the frontal slices of $\bar{\mathcal{A}}$, and can be computed as the sum of MCP functions with the singular values of $\bar{A}^{(k)}$ as variables, which is a non-convex relaxation of the matrix nuclear norm. Therefore, the defined tensor $\gamma$-norm can be regarded as a nearly unbiased non-convex relaxation of the tensor rank, which leads to a better approximation of the tensor rank than the TNN norm or other convex relaxation methods. Moreover, the proposed tensor $\gamma$-norm has a useful property, summarized in Theorem 2.

Theorem 2 (orthogonal invariance). For orthogonal tensors $\mathcal{U}$ and $\mathcal{V}$ of compatible sizes, $\|\mathcal{U}*\mathcal{A}*\mathcal{V}\|_\gamma=\|\mathcal{A}\|_\gamma$.

Proof. Multiplication by orthogonal tensors leaves the singular values of each frontal slice of $\bar{\mathcal{A}}$ unchanged, so the sum of the MCP functions of those singular values is unchanged. Hence $\|\cdot\|_\gamma$ is orthogonally invariant. □

Non-Convex Soft Shrinkage Operator of Tensor
As stated for the matrix $\gamma$-norm, minimization of the $\gamma$-norm yields a solution in the form of SVST. For the proposed tensor $\gamma$-norm minimization, we can infer the same result, as the following theorem presents.

In the matrix case, $\Psi$ can be represented as $\Psi=U\Sigma V^{*}$, where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix with diagonal elements $\sigma_1,\dots,\sigma_r$. According to the matrix-based non-convex soft shrinkage operator in [44], the solution is obtained by applying the non-convex shrinkage to these singular values. In the following, we show how to solve the tensor problem through the expression in Equation (6).

Assuming $\Psi$ has t-SVD $\Psi=\mathcal{U}*\mathcal{S}*\mathcal{V}^{*}$ and Fourier transform $\bar{\Psi}$, the elements in the singular tubes of $\mathcal{S}$ are shrunk slice by slice in the Fourier domain. In the spatial domain, an equivalent formulation can be represented by $\mathcal{X}=\mathcal{U}*\mathcal{M}*\mathcal{V}^{*}$, where $\mathcal{M}$ is an f-diagonal tensor whose Fourier-domain diagonal tubes $\bar{\mathcal{M}}(i,i,:)$ hold the shrunk singular values of the corresponding tubes of $\bar{\mathcal{S}}$. This theorem shows that the optimization of the proposed tensor $\gamma$-norm leads to SVST. As illustrated in Figure 2, SVST retains more of the large singular values of each frontal slice than SVT does, which in turn captures the most important information of HSI. Taking this advantage into consideration, we utilize the proposed tensor $\gamma$-norm as a non-convex low-rank regularization in the following HSI restoration models.
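The slice-wise procedure above can be sketched in code. This is an illustrative reimplementation under our own naming, assuming the MCP proximal step takes the standard firm-thresholding form (zero out small singular values, shrink mid-range ones, keep values above $\gamma\lambda$ untouched); the exact shrinkage rule in [44] may differ in details:

```python
import numpy as np

def firm_threshold(sigma, lam, gamma):
    """Firm thresholding (assumed MCP proximal form, gamma > 1):
    0 for sigma <= lam; scaled shrinkage up to gamma*lam; identity above."""
    return np.where(sigma <= lam, 0.0,
                    np.where(sigma <= gamma * lam,
                             (sigma - lam) * gamma / (gamma - 1.0),
                             sigma))

def tensor_svst(Psi, lam, gamma):
    """Slice-wise SVST in the Fourier domain: FFT along mode 3, shrink the
    singular values of every frontal slice, then invert the FFT."""
    Pf = np.fft.fft(Psi, axis=2)
    Xf = np.empty_like(Pf)
    for k in range(Psi.shape[2]):
        U, s, Vh = np.linalg.svd(Pf[:, :, k], full_matrices=False)
        Xf[:, :, k] = (U * firm_threshold(s, lam, gamma)) @ Vh
    return np.real(np.fft.ifft(Xf, axis=2))
```

Note that singular values above $\gamma\lambda$ pass through unchanged, which is exactly the "large singular values untouched" property discussed above; with $\lambda=0$ the operator reduces to the identity.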

Non-Convex Low-Rank Tensor Completion
Assume $\mathcal{Y}$ of size $n_1\times n_2\times n_3$ is the observed HSI, which has missing values compared with the original HSI $\mathcal{X}$ of size $n_1\times n_2\times n_3$, and $\Omega\subset[1,n_1]\times[1,n_2]\times[1,n_3]$ denotes the so-called sampling set. The objective of HSI completion is to estimate the original HSI $\mathcal{X}$ from the observed HSI $\mathcal{Y}$. The HSI completion problem can be formulated as follows:

$$\min_{\mathcal{X}}\ \operatorname{rank}(\mathcal{X})\quad \text{s.t.}\quad \mathcal{P}_\Omega(\mathcal{X})=\mathcal{P}_\Omega(\mathcal{Y}),$$

where $\mathcal{P}_\Omega$ keeps the entries (pixels) in $\Omega$ and zeros out the others. By incorporating the tensor $\gamma$-norm into the TNN minimization model [33] as a non-convex relaxation of the low rank, a novel low-rank tensor completion method named non-convex TNN (NCTNN) minimization is developed as follows:

$$\min_{\mathcal{X}}\ \|\mathcal{X}\|_\gamma\quad \text{s.t.}\quad \mathcal{P}_\Omega(\mathcal{X})=\mathcal{P}_\Omega(\mathcal{Y}).\qquad(7)$$

According to Theorem 3, the tensor $\gamma$-norm in problem (7) has a solution with the SVST operator, as Equation (3) has shown. As is well known, the low-rank prior represents the inherent spectral-spatial correlation of HSI, and large singular values carry the most useful information. In SVST, the large singular values are left nearly untouched while the tiny ones are shrunk, which provides a nearly unbiased and more accurate low-rank approximation. Moreover, owing to the low-rank relaxation term, the model can capture the spatial and spectral information of HSI simultaneously.
The optimization problem (7) is equivalent to an unconstrained formulation, denoted as problem (8), which can be solved by various methods. Considering that the t-SVD converts TNN minimization into matrix rank minimization in the Fourier domain, the augmented Lagrangian formulation of Equation (8) in the Fourier domain is written as follows:

$$L(\mathcal{X},\mathcal{Z},\mathcal{L})=\|\mathcal{Z}\|_\gamma+\iota_\Omega(\mathcal{X})+\langle\mathcal{L},\mathcal{X}-\mathcal{Z}\rangle+\frac{\mu}{2}\|\mathcal{X}-\mathcal{Z}\|_F^{2},$$

where $\mathcal{Z}$ is the auxiliary variable which satisfies $\mathcal{Z}=\mathcal{X}$, $\iota_\Omega$ is the indicator function of the sampling constraint, $\mathcal{L}$ is the Lagrange multiplier, and $\mu$ is the penalty parameter.
Then we incorporate the t-SVD with the alternating direction method of multipliers (ADMM) [30][31][36][37][45] to solve the proposed model, as shown in Algorithm 1. The iteration ends when the error between two successive iterations is smaller than a constant ε, which is usually sufficiently small. The smaller ε is, the more time the algorithm takes. In Algorithm 1, we empirically set ε to a small negative power of 10.
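The overall ADMM loop can be sketched as follows. This is an illustrative reimplementation, not Algorithm 1 itself: the function names, the firm-thresholding form of the SVST step, and the hyperparameters (`lam`, `gamma`, `mu`) are our placeholders rather than the paper's settings:

```python
import numpy as np

def firm_threshold(s, lam, gamma):
    # Assumed MCP proximal operator (firm thresholding), gamma > 1.
    return np.where(s <= lam, 0.0,
                    np.where(s <= gamma * lam,
                             (s - lam) * gamma / (gamma - 1.0), s))

def svst_slices(T, lam, gamma):
    # Shrink singular values of every frontal slice in the Fourier domain.
    Tf = np.fft.fft(T, axis=2)
    for k in range(T.shape[2]):
        U, s, Vh = np.linalg.svd(Tf[:, :, k], full_matrices=False)
        Tf[:, :, k] = (U * firm_threshold(s, lam, gamma)) @ Vh
    return np.real(np.fft.ifft(Tf, axis=2))

def nctnn_complete(Y, mask, lam=0.1, gamma=2.0, mu=1.0, iters=100, eps=1e-6):
    """ADMM sketch for min ||X||_gamma s.t. P_Omega(X) = P_Omega(Y).
    X: estimate with observed entries enforced; Z: low-rank auxiliary
    variable; L: (scaled) Lagrange multiplier."""
    X = Y * mask
    Z = X.copy()
    L = np.zeros_like(X)
    for _ in range(iters):
        Z = svst_slices(X + L / mu, lam / mu, gamma)   # low-rank update
        X_new = Z - L / mu
        X_new[mask] = Y[mask]                          # data-fit update
        L = L + mu * (X_new - Z)                       # multiplier update
        if np.linalg.norm(X_new - X) / max(np.linalg.norm(X), 1e-12) < eps:
            X = X_new
            break
        X = X_new
    return X
```

The stopping rule mirrors the text: iterate until the relative change between successive estimates falls below ε. Observed entries are re-imposed exactly at every step.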

Non-Convex Tensor Robust Principle Component Analysis
Although the aforementioned NCTNN model can recover degraded HSI well, it may be weak at removing mixed noise, such as sparse noise, efficiently. Robust principal component analysis (RPCA) [7] fully considers the sparse noise in HSI by separating the image into signal and sparse noise (or outliers), and gives low-rank and sparsity priors to the two parts, respectively. Unfortunately, RPCA in [7] regards the HSI as a matrix, which destroys the intrinsic structure of HSI, while tensor robust principal component analysis (TRPCA) associated with TNN in [33] may produce an image with residual noise on account of the limited low-rank approximation of TNN. Hence, to remove the sparse noise effectively, we introduce the proposed tensor $\gamma$-norm into the RPCA model, resulting in the second proposed method, non-convex tensor robust principal component analysis (NCTRPCA):

$$\min_{\mathcal{L},\mathcal{S}}\ \|\mathcal{L}\|_\gamma+\tau\|\mathcal{S}\|_{1,1,2}\quad \text{s.t.}\quad \mathcal{Y}=\mathcal{L}+\mathcal{S},\qquad(10)$$

where $\mathcal{Y}\in\mathbb{R}^{n_1\times n_2\times n_3}$ is the observed image, $n_1$ and $n_2$ are the spatial ways and $n_3$ is the spectral way, $\mathcal{L}$ is the original image, $\mathcal{S}$ is the sparse noise, $\tau$ is a parameter, and $\|\cdot\|_{1,1,2}$ is defined as $\|\mathcal{S}\|_{1,1,2}=\sum_{i,j}\|\mathcal{S}(i,j,:)\|_F$.
The non-convex optimization problem in Equation (10) is solved by the same strategy as Algorithm 1, in which ADMM and the Fourier transform are also employed.
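Within that ADMM loop, the update of the sparse term $\mathcal{S}$ implied by the $\|\cdot\|_{1,1,2}$ norm is group soft-thresholding on the mode-3 tubes. The sketch below assumes the standard proximal form of such a group norm (the function name is ours):

```python
import numpy as np

def tube_soft_threshold(R, tau):
    """Proximal operator of tau * sum_{i,j} ||S(i,j,:)||_F: shrink each
    mode-3 tube toward zero by tau in Frobenius norm, zeroing tubes
    whose norm falls below tau."""
    norms = np.linalg.norm(R, axis=2, keepdims=True)          # ||R(i,j,:)||_F
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * R
```

A tube of norm 5 with `tau = 1` is scaled by 4/5, while a tube of norm 0.5 is set entirely to zero, which is the sparsity-inducing behavior intended for the noise term.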

Experimental Results and Analysis
The parameters in the compared methods are manually tuned to their optimal values, while in our method, the rank r, γ and τ are adjusted according to the parameter analysis given later on.
The quantitative assessment indices, including the relative square error (RSE), the mean peak signal-to-noise ratio (MPSNR), the mean structural similarity (MSSIM) and the mean spectral angle distance (MSAD) are adopted.
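For reference, MPSNR and MSAD can be computed as below. This is an illustrative sketch with our own function names, assuming the common definitions (band-wise PSNR averaged over the spectral dimension; per-pixel spectral angle in degrees averaged over all pixels); the paper may use slightly different normalizations:

```python
import numpy as np

def mpsnr(X, Y, peak=1.0):
    """Mean PSNR over the spectral bands of H x W x B cubes X and Y."""
    psnrs = []
    for b in range(X.shape[2]):
        mse = np.mean((X[:, :, b] - Y[:, :, b]) ** 2)
        psnrs.append(10 * np.log10(peak ** 2 / max(mse, 1e-12)))
    return float(np.mean(psnrs))

def msad(X, Y):
    """Mean spectral angle distance (degrees) between per-pixel spectra."""
    Xv = X.reshape(-1, X.shape[2])
    Yv = Y.reshape(-1, Y.shape[2])
    cos = np.sum(Xv * Yv, axis=1) / (
        np.linalg.norm(Xv, axis=1) * np.linalg.norm(Yv, axis=1) + 1e-12)
    return float(np.degrees(np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))))
```

Identical cubes give an MSAD near zero and a very large MPSNR; any perturbation lowers MPSNR, as expected of a fidelity index.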

Experiment 1: Comparative Experiments for NCTNN
In order to verify the effectiveness of NCTNN in the restoration of HSI, we applied several matrix and tensor completion methods, covering both non-convex and convex relaxations, to HSI recovery. The observed HSI is obtained by randomly choosing a certain percentage of elements from the original HSI tensor; a larger percentage denotes a higher sampling rate (SR) and vice versa. The Washington DC and Pavia University datasets are used in the experiments.
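The sampling protocol described above can be reproduced with a few lines; the helper name `sample_hsi` and the seeded generator are our illustrative choices:

```python
import numpy as np

def sample_hsi(X, sr, seed=0):
    """Keep a fraction `sr` of the entries (the sampling rate): return the
    observed cube (zeros at missing entries) and the boolean mask."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < sr
    return X * mask, mask
```

The empirical fraction of observed entries concentrates around `sr`, and the observed cube matches the original exactly on the mask.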

Non-Convex Methods Comparative Experiments
Firstly, the proposed NCTNN is compared with two non-convex methods, NMC [14] and NORT [41]; among them, NMC is a matrix completion method, while NORT is a tensor completion method.
The rank of the matrix in NMC is 15, the tensor rank of NORT is [10,10,10], and in the proposed NCTNN method, r = [25,25,25] and γ = 2, respectively. Table 1 summarizes the MPSNR and MSSIM values obtained by the three methods for Pavia University at different sampling rates (SR); the best index is in boldface. It is obvious that NORT and NCTNN, the two tensor completion methods, perform better than the matrix-based NMC method in dealing with HSI recovery. NCTNN improves MPSNR and MSSIM by nearly 0.7535 dB and 0.0122 over NORT on average, respectively. In addition, the MSAD values show that NCTNN performs best on spectral maintenance. Furthermore, the advantage of NCTNN has a rising trend as the sampling rate decreases. Figure 3 shows the PSNR and SSIM curves of the three methods at a 50% sampling rate. We can find that NORT has more abnormal bands than the other two methods, while the curves of NMC and NCTNN are much more stable, and NCTNN performs best.
The recovered images of the three methods on band 100 at a 50% sampling rate are shown in Figure 4. NMC leaves more missing pixels, while the other two tensor-based methods can recover most of the information in the image, and the result of NCTNN shows more details in the red box.

Convex Methods Comparative Experiments
The state-of-the-art low-rank tensor convex relaxation methods, including LRTC [17], TMac [23] and TNN [30], are employed in this section. TNN uses the tensor tubal rank for tensor completion, while LRTC and TMac utilize the same tensor rank, which is defined by matrix factorization of each mode unfolding of the tensor. Moreover, LRTC provides algorithms to obtain efficient or highly accurate estimations, while TMac solves tensor completion with faster algorithms as well as rank-adjusting strategies.
The parameters are set empirically. In TMac, a smaller rank is given due to its adaptation in the algorithm, e.g., [10,10,10], while for the other three methods the same rank [20,20,20] is chosen to give a relatively fair comparison. The two parameters of NCTNN are set as $1/\sqrt{\cdot}$ and γ = 2. Table 2 summarizes the RSE, MPSNR, MSSIM and MSAD values obtained by the four tensor completion methods on Washington DC Mall at different sampling ratios. Obviously, the indices of TNN and NCTNN are superior to those of LRTC and TMac, which is mainly due to the different definitions of tensor rank. The proposed NCTNN method achieves the best performance, and TNN ranks second. Compared with TNN, NCTNN improves MPSNR and MSSIM by nearly 0.01 dB and 0.002, respectively. Meanwhile, NCTNN also achieves the lowest MSAD value, which means an improvement in spectral fidelity. In order to highlight the advantages of NCTNN over TNN minimization, the PSNR and SSIM curves for Washington DC Mall at a 30% sampling rate are given in Figure 5. The indices of NCTNN are greater than those of TNN, and the curve of NCTNN is more stable, which further demonstrates the outstanding performance of our proposed method in data completion in terms of unbiasedness as well as robustness. In Figure 5, the abnormal band 75 has a lower value than the other, normal bands. Figure 6 gives the restoration results. It can be clearly observed that TMac and LRTC perform worse in retaining edges and textures. In comparison to NCTNN, TNN deletes some important textural information. In general, due to the non-convex relaxation of the tensor nuclear norm, NCTNN best preserves the spatial geometric structures, which further illustrates that NCTNN has the highest recovery ability and the best stability compared with the other three methods.
To further verify the advantages of the proposed model, Figure 7 shows the spectral signatures before and after restoration of the Washington DC dataset with a 50% sampling rate. We choose the spectral signature of pixel (68, 42); it is obvious that the curve of NCTNN is the most similar to the original one, and the amplitude of vibration is the smallest, which shows that NCTNN has an improvement in spectral preservation as well as robustness.

Parameter Analysis
In order to illustrate the effect of rank selection on image restoration, we selected the Washington DC Mall dataset (SR = 10%) to carry out image restoration simulation experiments with different r, where r is taken from [20,20,20] to [40,40,40]; the results are shown in Figure 8a,b. It can be seen that the overall trend of MPSNR and MSSIM increases with the increase in r, and when r > [25,25,25] the curves begin to decline. Therefore, r = [25,25,25] may serve as a reference for other datasets.
In order to study the approximation ability of the proposed tensor γ-norm on the tensor rank, we chose the Washington DC Mall dataset (SR = 10%) to carry out image restoration simulation experiments with different γ, where the value of γ is from 0.5 to 6; the results are shown in Figure 8c,d. It can be found that when γ > 1, the effect of γ on the curves changes slowly, and when γ = 2, the curves reach their peak. Therefore, in experiment 1, we fixed γ to 2 empirically.

Experiment 2: Comparative Experiments for NCTRPCA
In this experiment, the proposed NCTRPCA is compared with four methods: two convex methods, RPCA [7] and TRPCA [31], and two non-convex methods, IRNN [15] and NRLRWTV [13]. RPCA separates the clean HSI from corrupting sparse noise with low-rank and sparsity regularization of the matrix, TRPCA is a convex relaxation based on the tensor TNN norm, IRNN is a non-convex nonsmooth minimization denoising method using an iteratively reweighted nuclear norm, and NRLRWTV is a non-convex low-rank approximation combined with a weighted total variation.

Comparative Experiments
For the simulated experiments, the Washington DC Mall and Pavia University datasets are corrupted by mixed Gaussian and salt-and-pepper noise. The same noise intensity is added to each band.
Several different noise intensities are tested; specifically, for zero-mean Gaussian noise, the variances G are 0.025, 0.05, 0.075 and 0.1, while for salt-and-pepper noise, the percentages P are 0.05, 0.1, 0.15 and 0.2, respectively. Table 3 summarizes the RSE, MPSNR, MSSIM and MSAD values after the recovery of the Washington DC Mall and Pavia University datasets by the five methods. It can be seen that TRPCA and NCTRPCA are superior to the matrix-based methods in denoising performance, which benefits from the structural properties of the tensor. At the same time, compared with the convex method TRPCA, the MPSNR and MSSIM indices of NCTRPCA have relatively large values, which indicates a better performance in maintaining the spatial structure and spectral information. For the Washington DC Mall dataset, Figure 9 shows the PSNR and SSIM curves of the denoising methods with the noise intensity of G = 0.05 and P = 0.1. The curves of NRLRWTV and NCTRPCA are more stable. Meanwhile, the index values of NCTRPCA and TRPCA are higher than those of IRNN and NRLRWTV. This is because the IRNN, NRLRWTV and RPCA methods decompose the HSI into matrices in the process of model construction, while NCTRPCA and TRPCA handle the HSI as a third-order tensor. The tensor representation makes better use of the structural information contained in the spatial similarity and spectral correlation of HSI. Figure 10 shows the recovered images of the abnormal band 100 for the five denoising methods, among which the images of IRNN and RPCA are very blurry, while the edges of NRLRWTV, TRPCA and NCTRPCA are clearer. However, the recovered images of NRLRWTV contain more noise, and TRPCA is somewhat blurry, as shown in the enlarged parts. The proposed NCTRPCA exhibits the restored image with the highest quality.
For the Pavia University dataset, the advantages of NCTRPCA and TRPCA over the matrix-based methods are illustrated in Figure 11 by the PSNR and SSIM curves with G = 0.05 and P = 0.1. As shown previously, the tensor-based methods are more robust than the matrix-based methods. At the same time, the NCTRPCA curve has the highest values with the smallest fluctuation, which means it has the best denoising ability and achieves the best unbiasedness. The visual comparison of band 87 is shown in Figure 12, along with the enlarged red box. Compared with the other four methods, NCTRPCA also obtains a noise-free image while retaining more geometrical details.
For the real experiments, performed on the HYDICE Urban dataset to compare the recovery results, the parameters are set the same as in the simulated experiments.
In the real experiments, the RSE, PSNR and SSIM indices are inapplicable on account of the missing ground truth. However, the MSAD between the noisy and denoised images can be computed to give a measure of spectral similarity. Therefore, Table 4 gives the MSAD values after recovery of the HYDICE Urban dataset by the five methods. It can be seen that, compared with the convex methods, the non-convex methods perform better in spectral preservation, and TRPCA and NCTRPCA are superior to the matrix-based methods. Furthermore, NCTRPCA has larger values than TRPCA, which shows a better performance in spectral preservation.
A visual comparison of spectral similarity is shown by plotting the spectral signatures of the noisy and resulting images in Figure 13; the closer a curve is to the original one, the higher the similarity. As an illustration, pixel (60, 50) of HYDICE Urban is selected randomly. We can find that IRNN has the biggest spectral distortion, while the curves of the other non-convex methods are closer to the original. In addition, most of the curves are stable except at some abnormal bands. The results of RPCA and TRPCA, and of NRLRWTV and NCTRPCA, are comparable, due to the same low-rank relaxations they employ. On the whole, the results of NCTRPCA are the best, both in visual and quantitative assessment, which further demonstrates the effectiveness of the proposed method in reducing spectral distortion.

Parameter Analysis
In order to illustrate the influence of rank selection on image restoration, as an example, we chose the Washington DC Mall dataset (G = 0.025, P = 0.05) to conduct noise removal simulation experiments with different r, where r is taken from [5,5,5] to [25,25,25], and the results are shown in Figure 14a,b. On the whole, the values of MPSNR and MSSIM first increase then decrease with the increase in r. When r = [15,15,15], the curves of MPSNR and MSSIM can reach an approximate peak value, and the model has the best denoising effect.
In order to study the approximation effect of the tensor γ-norm on the tensor rank, we chose the Washington DC Mall dataset (G = 0.025, P = 0.05) to carry out noise removal simulation experiments under different γ, where the value of γ is from 0.5 to 2.5, as shown in Figure 14c,d. It can be found that when γ < 1.5, the curves change slowly, and when γ = 1.5, both MPSNR and MSSIM reach their maximum values. Therefore, we fixed γ to 1.5 in the experimental parameter selection.

The parameter τ adjusts the regularization of the sparse term. We chose the Washington DC Mall dataset (G = 0.025, P = 0.05) to conduct noise removal simulation experiments under different τ, where τ = p/√(n₁n₃) and the value of p is from 0.5 to 15, as shown in Figure 14e,f. It is easy to see that when p ≥ 1, the MPSNR and MSSIM curves are relatively stable; that is, NCTRPCA is robust in the selection of the parameter τ. When p = 1, the curves reach their peak values, and the best denoising performance is also obtained in visual effects.

Finally, the time complexity is also analyzed. The proposed methods, along with the other tensor-based methods, take more time than the matrix-based methods. In addition, LRTC and TMac are faster than the other methods: in LRTC, the Nesterov strategy is used to implement the nonsmooth optimization, while in TMac, low-rank matrix factorization is applied to speed up the algorithm. The other tensor-based methods, namely TNN, TRPCA, NCTNN and NCTRPCA, are mainly slowed by the high complexity of the SVD. More efficient algorithms for computing the SVD, such as randomized or parallel algorithms, are encouraged for further investigation.

Conclusions
In this paper, to obtain an unbiased HSI restoration, we introduced the MCP penalty as a new non-convex relaxation in tensor TNN minimization and TRPCA. On the whole, convex low-rank approximation methods are inferior to non-convex methods, and matrix-based methods also perform worse than tensor-based methods. Compared with the state-of-the-art non-convex and convex completion and denoising methods, the proposed NCTNN and NCTRPCA perform best in both visual and quantitative assessments, and achieve stable and robust results, especially in retaining spectral structure.
Although the proposed methods achieve satisfying restoration results, some aspects need to be improved further. The low-rank-based methods focus on the global correlation while ignoring the nonlocal spatial similarity; nonlocal tensor patches, which have been proven effective in image modeling, will be introduced into the proposed models in our future work. Besides, as mentioned above, the tensor-based methods are quite time-consuming; efficient processing techniques, e.g., parallel and randomized algorithms, are our ongoing work.