Nonlocal Tensor Sparse Representation and Low-Rank Regularization for Hyperspectral Image Compressive Sensing Reconstruction

: Hyperspectral image compressive sensing reconstruction (HSI-CSR) is an important issue in remote sensing, and has recently been investigated increasingly by the sparsity prior based approaches. However, most of the available HSI-CSR methods consider the sparsity prior in spatial and spectral vector domains via vectorizing hyperspectral cubes along a certain dimension. Besides, in most previous works, little attention has been paid to exploiting the underlying nonlocal structure in spatial domain of the HSI. In this paper, we propose a nonlocal tensor sparse and low-rank regularization (NTSRLR) approach, which can encode essential structured sparsity of an HSI and explore its advantages for HSI-CSR task. Speciﬁcally, we study how to utilize reasonably the l 1 -based sparsity of core tensor and tensor nuclear norm function as tensor sparse and low-rank regularization, respectively, to describe the nonlocal spatial-spectral correlation hidden in an HSI. To study the minimization problem of the proposed algorithm, we design a fast implementation strategy based on the alternative direction multiplier method (ADMM) technique. Experimental results on various HSI datasets verify that the proposed HSI-CSR algorithm can signiﬁcantly outperform existing state-of-the-art CSR techniques for HSI recovery.


Introduction
Hyperspectral image (HSI) is a three-dimension data cube by simultaneously capturing the information over two spatial and one spectral dimensions.The abundant spatial-spectral information is able to provide more accurate and reliable signature features on distinct materials, which contributes to various applications such as scene classification [1], object detection [2], environmental monitoring [3], etc.However, due to the large data sizes of HSI, the storage and transmission on limited resource platform become a challenge problem.Although various methods, mainly including wavelet transform [4][5][6], TDLT + KLT [7], DPCM [8] and JPEG2000 [9,10], have been proposed to compress HSI effectively, they treat the HSI as a collection of single band images and neglect the spatial-spectral knowledge redundancy.Thus, how to build rational and powerful HSI compressive reconstruction models is still a worthy research issue.
Recently, the compressive sensing (CS) [11][12][13] theory offers a brand-new field for HSI acquisition or compression, which only needs to capture a small number of incoherent measurements in the imaging stage.Then, the acquired measurements can be employed to reconstruct the whole HSI.For convenient application of CS on HSI, many well-known techniques  have been presented to convert an HSI into a sparse signal.Although HSI CS can greatly reduce the resource consumption on imaging, storage and transmission compared with those conventional compression methods, how to reconstruct precisely the HSI from fewer measurements is still a challenging problem.
One of the main concerns to the ill-posed reconstruction problem is to convert HSI into sparse description form via imposing some proper sparsity priors.For example, some effective sparsity terms with l 0 , l 1 and l p (0 < p < 1) norms [13][14][15][16] have been presented to characterize the sparsity for signal recovery, but those methods neglect the underlying structure information.Regularization-based approaches usually incorporate the prior knowledge into the observation model and develop a united framework [17][18][19][20].For those methods, one key issue is how to design a proper regularization term to characterize the sparsity of HSI.The works in [21][22][23] mainly consider the sparsity of abundance matrix by the linear unmixing of an HSI, and then HSI CS models are built using spectral unmixing procedures.By introducing structured sparsity across spatial or spectral dimension, Zhang et al. [24][25][26][27][28] extended the compression method based sparse representation/dictionary learning to HSI compression.More recently, Meza et al. [29,30] explored the group sparsity based spatial/spectral redundancy structure to achieve HSI compressive sensing reconstruction (HSI-CSR).The HSI CS model proposed by Golbabaee et al. [31][32][33][34] utilized the piecewise smooth structure to explain the underlying gradient sparsity of an HSI.However, as those techniques depict the HSI sparsity in vector space, the description form of sparsity is treated as one vector without considering its multidimensional structure.It will inevitably induce losses and distortions of useful structure information.
Tensor-based HSI-CSR approaches can improve remarkably the HSI recovery quality, since the existing methods jointly take into account the spatial-spectral information, and reduce the losses and distortions caused by HSI reshaping [35][36][37][38][39][40][41][42][43][44].Karami et al. [35,36] exploited discrete wavelet transform and Tucker decomposition (DWT-TD) to encode the spatial-spectral information of HSI.The core idea behind those techniques is first to use DWT to effectively separate an HSI into different sub-images, and then to apply TD on the DWT coefficients of HSI bands to compact the energy of sub-images.Zhang et al. [37,38] compressed an HSI to the core tensor and the HSI could be reconstructed by the multi-linear projection of the factor matrices.Those methods only consider an HSI as a whole 3D tensor while they are short of more potent constraints on spatial-spectral structure of an HSI.Yang [39] employed nonlinear tensor sparse representation to recover an HSI from small number of measurements, and some training examples are required.Wang [40] used the global spatial-spectral correlation and local smoothness properties underlying in an HSI to enhance the HSI-CSR task, in which the tensor Tucker decomposition and 3-D total variation jointly characterize the sparsity of an HSI.Du [41] proposed a patch-based low-rank tensor decomposition for HSI-CSR algorithm that combined the nonlocal similarity across the spatial domain and the low-rank property over spectral domain in a united framework.
Although methods reported in [37,38,40,41] are considerably effective for HSI-CSR compared with vector based approaches, it is difficult to estimate the accurate rank under tensor decomposition and further acquire unique decomposition.Thus, the methods based on tensor decomposition cannot provide an elaborate characterization on spatial-spectral information in HSI-CSR problem.In [42,43], this reasonable usage of the global correlation across spectrum (GCS) and nonlocal self-similarity over space (NSS) prior knowledge have led to quite powerful HSI denoising algorithms, and the effectiveness of GCS and NSS for HSI-CSR has not been reported in the public literature.Such facts inspire us to solve the challenging HSI-CSR problem by the structured sparsity based on GCS and NSS in this paper, and a unified framework combining nonlocal tensor sparse representation and low-rank regularization is proposed for HSI-CSR, as shown in Figure 1.The main contributions of this paper are listed as follows.1.To the best of our knowledge, we are the first to exploit GCS and NSS to construct the nonlocal structure sparsity of HSI that is a faithfully structured sparsity representation form for HSI-CSR task.2. For each cube that is formed by grouping nonlocal similar cubes, the tensor representation based on tensor sparse and low-rank approximation is introduced to encode the intrinsic spatial-spectral correlation.3. The HSI-CSR task is treated as an optimization problem; we resort to alternative direction multiplier method (ADMM) [44] to solve it.
A preliminary version of this work has appeared in [45], which presents the basic approach.In [45], we established the nonlocal structured sparsity from the perspective of the tensor low-rank property, which adopts the two most commonly used tensor low-rank representation forms: tensor low-rank approximation and tensor low-rank decomposition.In this paper, we depict the nonlocal structured sparsity via the tensor low-rank approximation and sparse representation.Although the tensor low-rank decomposition and sparse representation are derived from the Tucker decomposition model, the former needs to preset the ranks along all dimension while the latter introduces an l 1 -based sparse term on core tensor.In practical application, the latter possesses the reliable capability to represent the high-dimension data by mitigating the tensor rank overfitting or underfitting.In addition, this paper adds: (1) the detailed background of HSI-CSR; (2) the theoretical analysis of NTSRLR; and (3) additional HSI-CSR experiments.
The remainder of this paper is organized as follows.Section 2 introduces the tensor notations and operations commonly used in this paper, and background of CS.In Section 3, a novel algorithm for HSI-CSR based on the NTSRLR model is proposed.Section 4 demonstrates the results of extensive experiments and Section 5 draws the conclusion.

Notations
Throughout the paper, we denote scalars, vectors, matrices and tensors by non-bold letters, bold lower case letters, bold upper case letters and calligraphic upper case letters, respectively.Besides, we introduce some necessary notations and preliminaries about tensor as follows.A tensor of order N, which corresponds to a N-dimensional data array, is denoted as Definitions of tensor terminologies in the paper follow exactly the same description in [46] ..,i N and X 0 as the F-norm, l 1 norm and l 0 norm of a tensor X , respectively.X 0 ≤ K means that K is the number of non-zero entries of X .It is convenient to unfold a tensor into a matrix during the algorithm.The "unfold" operation along the mode-n on a tensor X is defined as unfold n (X ) , and its opposite operation "fold" is defined as fold n (X (n) ) := X .The Kronecker product of matrices A ∈ R I×J and B ∈ R K×L is a matrix of size IK×JL, denoted by A ⊗ B. The multiplication of a tensor X with a matrix Y ∈ R I k ×J k on mode-k is denoted by X × k Y = Z, which also can be defined in terms of mode-k unfolding as Z k = YX k .Definition 1. (Tucker decomposition) [46]: The Tucker decomposition form of a tensor X is: where G ∈ R J 1 ×J 2 ×•••×J N is the core tensor and it reflects the interaction between components along different modes, and Un ∈ R I n ×J n is the orthogonal factor matrix in each mode.Thus, we can achieve the k-unfolding form of Tucker decomposition in Equation ( 1)

Background of HSI-CS
For a given HSI X ∈ R W×H×S (W × H spatial resolution and S spectral bands), x ∈ R W HS denotes the vector form of X .Let N = W HS, then the compressive measurement y ∈ R M can be obtained from the following CS model: y=Φx where Φ ∈ R M×N (M < N) denotes the compressive operator.The CS theory indicates that a sufficiently sparse signal x can be exactly reconstructed from only a few observation y when the compressive operator Φ satisfies the restricted isometry property (RIP) [11].Under the RIP, the ill-posed recovery problem can be formulated into following form by pursuing the sparsest signal x, i.e., where • 0 denotes l 0 norm as a sparsity constraint.However, the l 0 norm minimization in Equation ( 4) is combinatorially NP-hard and unstable with the noise.For this reason, a feasible strategy is to replace nonconvex l 0 norm as a convex l 1 counterpart [15,47] as follows: The optimization for above l 1 -minimization CS problem can resort to iterative shrinkage algorithm [48] and Bregman Split algorithm [49].
Since an HSI can be sparsely represented in a certain domain, many CS models have been proposed for an HSI.Zhang et al. [21][22][23] unmixed the HSI into a spatially sparse abundance matrix with an endmember matrix.Meza et al. [29][30][31] extracted the spatial/spectral redundancy structure and then applied the group sparsity constraint.Golbabaee [34] used a wavelet basis to transform the HSI into a sparse matrix, and then adopted the low-rankness and l 1 norm to jointly encode sparsity of the matrix.Zhang et al. [37,38] depicted the sparsity of an HSI in the core tensor domain, instead of reshaped vector domain.Further works [39][40][41] employ the sparse tensor decomposition to characterize sparsity of an HSI.However, those sparsity constraint terms are incapable of capturing the underlying structure in an HSI or handling the unwanted noise and artifacts in the CSR procedure.In our method, we try to cope with those problems by introducing more refined prior knowledge of an HSI to perfectly promote HSI-CSR performance.

The Proposed HSI-CSR via NTSRLR
Structured sparsity is of great importance to the HSI-CSR model that often reveals the rich self-repetitive structures over spatial domain and the highly correlated bands across the spectral domain.Several previous works exploiting nonlocal prior have indicated that the structured sparsity based on nonlocal self-similarity is fairly effective for image restoration [18,19].However, the research works in HSI-CSR fields have not been documented.In this paper, we present a unified framework for HSI-CSR using the structured sparsity via nonlocal tensor sparse representation and low-rank approximation.

Non-Local Tensor Formula for Structure Sparsity
The proposed regularization model for structured sparsity consists of two steps: cube grouping for characterizing GCS and NSS and tensor formulation for sparsity enforcement.

Non-Local Structure Sparsity Analysis
Concerning the GCS and NNS underlying an HSI, we provide an analysis for nonlocal tensor sparsity and low-rankness, as illustrated in Figure 2. To begin with, for an initial third-order tensor HSI X ∈ R W×H×S (e.g., PaviaU dataset), we divide the HSI into a group of 3D full-band cubes (FBC) {P i,j } 1≤i≤W−w+1,1≤j≤H−h+1 ∈ R w×h×S (w < W, h < H) with overlaps.For the exemplar cube P i,j of size 8 × 8 × 60 located at spatial position (i, j) in Figure 2a marked in red, we first search K-1 (here, we set K = 80) similar cubes by k-NN within a local window (e.g., 70 × 70), shown as k-NN clustering in Figure 2b.Then, to avoid destroying the high spectral correlation, we unfold a series of 3D cubes into corresponding 2D matrices along the spectral modes (Figure 2c), and obtain a new third-order tensor Y p of size 64 × 80 × 60 by stacking a series of similar items (Figure 2d), where p = 1, . . .P, and P denotes the group number.Such constructed third-order tensor simultaneously employ the spatial local sparsity (mode-1), the non-local similarity between cubes (mode-2) and strong spectral correlation (mode-3).The outcome of such arrangement maximizes the benefit from nonlocal tensor representation form.Next, we give a visual interpretation for the nonlocal tensor sparsity and low-rank property.
First, by Tucker decomposition for a nonlocal similar cube group from PaviaU dataset, Figure 2e shows the location of singular values in the core tensor, where redder and bluer colors of elements represent large values and smaller values, respectively.To further understand the sparsity of tensor core, Figure 2(e2)-(e4) present three typical slices of core tensor.It is easy to find that the core tensor satisfies sparse property, with 82.59% of its elements being zeroes.Second, the low-rank analysis is performed along its local spatial, nonlocal spatial, and global spectral modes, as shown in Figure 2f.Evidently, the decaying trends of singular values on three curves (pink, blue and green curves correspond to local spatial, nonlocal spatial, and global spectral modes, respectively) indicate there are strong correlations in the three modes.Comparatively, the decaying trend of the curve in mode-2 is most drastic, which is consistent with the nonlocal spatial low-rank theory of an HSI given in [50].According to the definition of the accumulation energy ratio (Aer) of top k singular values in [50], we calculate Top 10 singular values of three modes and attain the Aers of 0.8029, 0.9031 and 0.8186.The quantitative values (i.e., Aers) also indicate that each cube by grouping nonlocal similar cubes can possess strong low-rank correlation along the mode-2.

Non-Local Structure Sparsity Modeling
In Figure 2f, we can observe that the formed FBCs possess the low-rank property, and a tractable strategy is to use the mode-n rank(r 1 , . . ., r n ) to estimate tensor rank by Tucker decomposition [46].For an Nth-order tensor X , the Tucker rank is defined as rank(X ): = [rank(X (1) ), rank(X (2) ), . . ., rank(X (N) )], where X (i) is the mode-i unfolding of X [51].Motivated by the practical applications that the nuclear norm is the convex envelope of the matrix rank within the unit ball of the spectral norm, further tensor nuclear norm, X * = ∑ N n=1 α n X (n) * is defined as weighting the unfolding matrix nuclear norm along each mode.Thus, we resort to the following relaxation form for each X p to characterize the low-rank property based on GCS and NSS: where denotes the nuclear norm of matrix X p(i) of size m × n.In practice, {Y p } P p=1 may contain some noise, the data Y p can be modeled as: Y p = X p + W p , where X p and W p denote the low-rank component and the noise component, respectively.Hence, we can estimate the low-rank tensor X p via the following optimization problem: where ε is associated with the noise level.The model in Equation ( 7) is similar to the matrix cases in [18], the difference primarily reflected in that we consider the combination with the correlations along local-nonlocal spatial modes and spectral mode, and measure the low-rankness of a third-order tensor X p by a weighted sum of the rank along each unfolding.Besides, considering the strong nonlocal spatial low-rankness along mode-2 than two other modes, we set a larger weight for mode-2 in our experiments.
In addition, as shown in Figure 2e, we give a detailed analysis for another notable representation form for the sparsity prior based on tensor sparse decomposition, which suggests that we can depict the structured sparsity of an HSI from the perspective of core tensor.Some pioneering works are presented in [42,43,[52][53][54].Here, we draw attention to the structured sparsity formulation of an HSI under tensor sparse representation framework, thus each third-order tensor X p can be approximated by following problem: where U1p, U2p, and U3p are factor matrices and S(G p ) is sparse constraint term, and we assume S(G p ) = G p 0 as suggested in [42,43,52].However, the optimization problem based on l 0 constraint deduced by Equation ( 8) is non-convex, the research in [53,54] further relaxes the l 0 -based core sparsity to l 1 case as S(G p ) = G p 1 .The convex optimization problem corresponding to l 1 case can be represented in Lagrangian form as following: where λ 1 and λ 2 are the trade-off parameters.Essentially, all factor matrices are orthogonal dictionaries along local-nonlocal spatial modes and spectral mode.It can be seen that the tensor sparse representation model explores the GCS and NSS of HSIs in different dimensions by adaptive multi-dictionaries learning.Compared with the matrix sparse representation technique [19,20], the advantage of tensor modeling is that it not only characterizes the spatial-spectral correlation but also the correlation over nonlocal similar cubes in an HSI.

Proposed Model
Based on the previous analysis, we now derive the following model for solving the HSI-CSR problem: where λ 3 is the regularization parameter.It is worth noting that the proposed model can fully exploit the underlying prior over spatial-spectral domain in an HSI, and thus is expected to have a strong ability to enhance HSI-CRS task.

Optimization Algorithm
For the proposed HSI-CSR model, we apply the ADMM [44], an effective strategy for solving large scale optimization problems, to solve Equation (10).Firstly, we replace S(G p ) and L(X p ) with the G p 1 and X p * , respectively, and introduce P auxiliary tensors {M p } P p=1 and equivalently reformulate Equation (10) as follows: Then, its augmented Lagrangian function is: where {Z p } P p=1 and Λ are the Lagrange multipliers, λ 4 is the positive scalars.We shall break Equation ( 12) into five sub-problems and iteratively update each variable via fixing the other ones.
(a) U1p, U2p, U3p problem: which is equivalent to the following sub-problem: where can be easily solved by the method as suggested in [53,54].
It can be rewritten as min It can be solved by the Tensor-based Iterative Shrinkage Thresholding Algorithm (TISTA) in [53,54].(c) M p sub-problem: min It can be briefly reformulated as: min where As suggested in [51], its close-form solution is expressed as: For a given matrix X, the singular value shrinkage operator S τ (X) is defined as S τ (X): = U X D τ (Σ X )V T X , and where It is easy to observe that optimizing L with respect to x can be treated as solving the following linear system: where denotes the vectorization operator for a matrix or tensor, and Φ * indicates the adjoint of Φ. Obviously, this linear system can be solved by well-known preconditioned conjugate gradient technique.(e) Update the multipliers where ρ is a parameter associated with the convergence rate at values of, e.g., [1.05-1.1].The whole optimization procedure for the proposed HSI-CSR model can be summarized as Algorithm 1, and we abbreviate the proposed method as NTSRLR.

Input:
The compressive measurements y, measurement operator Φ, and the parameters of the algorithm.1: Initialization: Initializing an HSI x (0) via a standard CSR method (e.g., DCT based CSR).
2: For l = 1 : L do 3: Extract the set of tensor {X p } P p=1 from x (0) via k-NN search the each exemplar cube; 4: For p = 1 : P do 5: Solve the problem (12) by ADMM;

9:
Updating the multipliers Z p via Equation ( 23

Experimential Results and Analysis
In this section, various experiments on real HSI datasets are executed to assess the performance of the proposed NTSRLR method.We chose eight popular methods for comparisons, namely the three classic CS methods including StOMP [55], BCS [56] and multidimensional signal based KCS [57]; total variation based methods with LRTV [34] and TVAL3 [58]; structured sparsity based HSI-CSR methods with RLPHCS [24], SRPREC [25] and CSFHR [28]; and the recent joint tensor decomposition regularization and total variation based method (JTRTV) [40].These methods represent state-of-the-art HSI-CSR, especially LRTV and JTRTV, which fully consider the HSI sparsity priors.In comparison experiments, we used the default parameter settings of those compared methods described in the reference papers.We adopted random measurement matrix as the sampling operator for all methods.

Quantitative Metrics
To evaluate the HSI-CSR performances of all methods, five quantitative picture quality indices (PQIs) were employed in experiments.The first index is mean peak signal-to-noise ratio (MPSNR), which is defined as the average PSNR of all bands for HSI, e.g.
where X s and X s denote sth band images of ground truth X ∈ R W×H×S reconstructed HSI X ∈ R W×H×S , respectively, and both of them are scaled to the range [0; 255].The second index, mean structure similarity (MSSIM), was used to evaluate the similarity between the reconstructed HSI and the original HSI based on structural consistency, which is defined as average SSIM [59] of all bands for HSI, The third index, mean feature similarity (MFSIM), emphasizes the perceptual consistency with the original image, which is defined as average FSIM [60] of all bands for HSI, High values of these three measures MPSNR, MSSIM and MFSIM represent better reconstructed results.
The fourth index is the spectral angle mapper (SAM) [61], which calculates the average angle between spectrum vectors of the CS reconstructed HSI and the reference one across all spatial positions; its definition is as follows: where x and x denote vector form of the ground truth X reconstructed HSI X , respectively.The fifth index is the Erreur relative globale adimensionnelle desynth èse (ERGAS) [62], which measures fidelity of the CS reconstructed HSI based on the weighted sum of MSE in each band, defined as follows where MSE(X s , X s ) is the mean square error between X s and X s , and µ 2 X s is the mean value of X s .Different from the former three PQI measures, smaller values of these two measures represent better reconstruction performances.

Experiments on Noiseless HSI Datasets
All methods are evaluated on three HSIs, namely Toy from the CAVE dataset (http://www1.cs.co lumbia.edu/CAVE/databases/multispectral/),PaviaU and corrected Indian Pines from hyperspectral remote sensing scenes (http://www.ehu.eus/ccwintco/index.php?title=Hyperspectra-Remote-Sen sing-Scenes).The Toy is full spectral resolution reflectance data from 400 nm to 700 nm at 10 nm steps (31 bands total), with spatial resolution 512 × 512.The PaviaU dataset contains 103 bands, including 610 × 340 pixels.The Indian Pines is of size 145 × 145 with 10 m spatial resolution and consists of 200 bands via removing 20 noisy bands polluted by water absorption, which covers the wavelength in the range from 400 to 2500 nm by 10 nm spectral resolution.We conducted experiments on the three HSI datasets mainly for the following reasons.(1) The three HSI datasets possess higher spatial-spectral resolutions and richer non-local similarity, which facilitates that the structured sparsity across spatial-spectral domains is employed in our HSI-CSR model.(2) These HSIs are benchmark testing datasets in HSI reconstruction, as presented in [21,22,24,25,40,42,43,45,50,53,54].(3) We selected the dataset with classification label, Indian Pines, which helps to compare all methods in term of classification accuracy.For the experiment, we cropped a sub-region of 300 × 300 for all bands of Toy and PaviaU, as shown in Figure 3.To validate the performance of proposed method, five different sampling rates (SR), namely 0.02, 0.05, 0.10, 0.15 and 0.20, were considered.

Visual Quality Evaluation
To visually demonstrate the HSI-CSR performances of the proposed method, we present the pseudocolor images with bands (25,15, 5), bands (55,30,5), and bands (23, 13, 3) of reconstructed Toy, PaviaU and Indian Pines obtained by all methods under sampling rates of 0.20, 0.10 and 0.15 in Figures 4-6, respectively.We have the following observations.(1) All the competing methods achieved relatively good reconstructed results.(2) The proposed method outperformed the other methods, as shown by the enlarged subregion (delineated in a red box), where the large-scale sharp edges and small-scale fine texture features are reconstructed well, as shown in Figures 4, 5 and 6j.
(3) The method StOMP produced serious noise during reconstruction and the details are blurred in the results of BCS, KCS and CSFHR.Instead of l 1 -based sparsity term, the TVAL3 utilizes the TV regularization based on gradient sparsity to preserve the more accurate edges but many details are lost.Although LRTV simultaneously considers the gradient sparsity and low-rankness of the data, the lack of an effective constraint for nonlocal spatial information will generate blurring artifacts.The JTRTV method is a generalization of LRTV for high-dimensional data, although it can deal with the artifacts problem generated by LRTV, it introduces unwanted noises.The RLPHCS and SRPREC consider the structure sparsity based on the reweighted Laplace prior.Nevertheless, their reconstructed results are unsatisfactory and the two methods appear to be virtually powerless for HSI-CSR.We provide following justifications about poor performance of RLPHCS and SRPREC: (1) The two HSI-CSR models use the maximum a posteriori framework to learn the hyperparameters; the accumulation of estimated bias for parameters may lead to a poor HSI-CSR performance.(2) The collected dictionaries in RLPHCS and SRPREC algorithms may not be overcomplete, which do not fully consider the redundant structure over spatial and spectral domain.This demonstrates the effectiveness of NTSRLR technique for HSI-CSR, greatly preserving the local details and structural information of the HSI.

Quantitative Evaluation
In Tables 1 and 2, we provide the performance of all methods using MPSNR, MSSIM, MFSIM, SAM and ERGAS results, over all the spectral bands in Toy, PaviaU and Indian Pines.We highlight the best results for each case in bold in the current and following tables.The proposed method outperforms the other approaches under all sampling rates and in particular the PQIs are better than the recent JTRTV.At sampling rate ρ = 0.02, NTSRLR improves the MPSNR at least 10 dB more than JTRTV on the Toy, 1.3 dB better on the PaviaU, and 2.7 dB better on the Indian Pines.For ρ = 0.20, the average gain of MPSNR values of NTSRLR are more amplified compared with JTRTV, up to 14 dB on Toy, 8 dB on PaviaU and 7 dB on Indian Pines.MSSIM, MFSIM, SAM and ERGAS values values under three HSI datasets further confirm the robustness of the proposed method at all sampling rates.Although LRTV is second best method, obviously it still is inferior to ours by visual quality evaluation.Since NTSRTR explores the underlying nonlocal structure of an HSI by the tensor sparse representation and low-rank modeling, it gives higher MPNSR, MSSIM, and MFSIM values, and smaller SAM and ERGAS than the other methods, which only consider the local or single sparsity prior.
(a) StOMP [55] (b) BCS [56] (c) KCS [57] (d) LRTV [34] (e) TVAL3 [58] (f) RLPHCS [24] (g) SRPREC [25] (h) JTRTV [40] (i) CSFHR [28] (j) NTSRLR (k) Original (a) StOMP [55] (b) BCS [56] (c) KCS [57] (d) LRTV [34] (e) TVAL3 [58] (f) RLPHCS [24] (g) SRPREC [25] (h) JTRTV [40] (i) CSFHR [28] (j) NTSRLR (k) Original The values of PSNR, SSIM and FSIM across all bands on Indian Pines under sampling rate ρ = 0.10 are presented in Figure 7.The proposed method achieves the best PSNR, SSIM and FSIM values in most bands of the HSI, which also further validates the robustness of the proposed method over all spectral bands.To further illustrate the superiority of proposed NTSRLR on spectrum reconstruction, we chose four regions in Toy and PaviaU datasets shown Figure 8a,d; the average reflectance differences were calculated between reconstructed spectra and original spectra across all bands.The curves of those average reflectance differences are plotted in Figure 8b,c      The classification accuracy of the HSI with different algorithms was employed to further verify the effectiveness of the proposed method.Under the same circumstance, we chose the support vector machine (SVM) [63] and overall accuracy (OA) as the classifier and evaluation index, respectively.During the classification results with SVM algorithm, we used 16 ground-truth classes in Indian Pines and 10% randomly generated training sets from each class to test the classification accuracy.The classification results with different HSI-CSR methods under sampling rate ρ = 0.20 are revealed in Figure 9a-j.The OA are given in Table 3.As shown in Figure 9j, the classification results in original HSI appear continuous, and the OA is 86.37%.As shown in Figure 9i, the classification results of NTSRLR still show a continuous phenomenon, and the OA of NTSRLR is closer to the reference value.However, the classification results of other methods are more fragmentary in most regions of the image, with lower OA values.

Robustness for Noise Suppression during HSI-CSR
To further evaluate the effectiveness and robustness of proposed HSI-CSR method for noise suppression, we chose the Urban dataset (http://www.tec.army.mil/hypercube)contaminated by different degrees of mixture noise, which was with size of 307 × 307 and 4 m spatial resolution, and covers the wavelength in the range from 400 to 2400 nm by 10 nm spectral resolution.Under same competing methods, we removed 24 bands seriously affected by atmospheric attenuations and water absorptions, and finally reserved 186 bands for the dataset.
We present the pseudocolor image with bands (186, 131, 1), in which the input data is polluted by Gaussian noise and stripes, as shown in Figure 10k.The CSR results produced by StOMP, BCS, CSFHR and TVAL3 could neither recover the original HSI nor perform the denoising task well.Instead, the methods RLPHCS and SRPREC amplified the noise.Although the methods KCS, LRTV and JTRTV could suppress the noise to some extent, they lost the edges and textural details when compared to NTSRLR.
Furthermore, we present the quantitative comparisons by showing the horizontal mean profiles of bands 1 and 186 in Urban dataset before and after CSR in Figures 11 and 12.The horizontal axis in the figure denotes the row number, and the vertical axis represents the mean gray value of each row.As shown in Figures 11k and 12k, the profiles have huge fluctuation due to the disturbance of noises.After CSR, the fluctuation has been moderately alleviated.Evidently, the profiles with the proposed NTSRLR method are more natural and smoother.The over-smooth profiles corresponding to BCS are mainly due to the image blurring.This further substantiates the efficiency and robustness of the proposed HSI-CSR method for noise suppression.
Note that we removed all noisy bands and preserved only 171 bands for quantitative assessment.Table 4 presents MPSNR, MSSIM, MFSIM ERGAS and SAM of all methods under sampling rates 0.10, 0.15 and 0.20.It can be seen that our method not only recovered the structural and perceptual feature of Urban dataset, but also preserved better spectral information.

Effectiveness Analysis of Single NTSR or NTLR Constraint
To further demonstrate the effectiveness of nonlocal tensor sparse representation and low-rank regularization in our model, we conducted two more experiments using the PaviaU dataset.The first experiment was to perform CSR without the nonlocal tensor low-rank regularization term, and the reconstructed HSI was achieved solely by nonlocal tensor sparse representation (NTSR).The second experiment was a reconstruction with the nonlocal tensor low-rank regularization method, but without NTSR, which is abbreviated as NTLR.
Figure 13 shows the comparison results of MPSNR, MSSIM and SAM of all methods under sampling rates from 0.05 to 0.20 with interval 0.05.Compared with other methods, the proposed NTSRLR obtained larger MPSNRs and MSSIMs, and smaller errors as measured by SAM under different sampling rates.In particular, when the sampling rate is small, the results from NTSRLR are significantly better than the NTSR and NTLR, which are based on a single constraint.This provides additional evidence for the effectiveness of the proposed method from the perspective of having integrated constraints with both non-local sparse representation and low rankness in our model.

Computational Complexity Analysis
For an input HSI X ∈ R W×H×S , the number of FBCs is P = O(W H), the size of each FBC group is wh × s × S, where s is number of FBCs in each group.The computation cost seems not very small for quite large P.However, CSR on the P FBCs can be processed in parallel, each with relatively small computational complexity.The computational complexity of the proposed algorithm that mainly lies in the update of M p(i) , U ip (i = 1, 2, 3).Updating U ip requires computing an SVD of I i × I i matrix, and updating M p(i) requires computing an SVD of I i × (∏ j =i I j ) matrix.Relatively, the other variables G p , x and multipliers updating will not consume lots of running time.

Convergence Analysis
Lastly, we have conducted experiments to show the convergence of our method using the Toy and Indian Pines dataset as examples under different sampling rates and different initializations.Figure 14 plots the PSNRs versus iteration numbers for the tested HSIs when the sampling rates are at 0.10 for Toy and 0.15 for Indian Pines, when using initialization x = Φ * y and DCT.As can be seen, the different initialization ways can provide quite close solutions, which indicates the performance of proposed algorithm is not sensitive to initialization.However, the two initialization ways possess different rates of convergence, and, by contrast, the initialization via DCT requires only a small number of iterations to get to the final PSNR.Therefore, we adopted the initialization strategy based on DCT to speed up our algorithm.Besides, the value of PSNR will become a constant when the algorithm converges.Thus, in the experiment, we set the maximum number of iterations for termination condition.

Parameters Analysis
There are four parameters {λ i } 4 i=1 in the proposed model.Considering the different roles of nonlocal tensor sparseness and low-rankness terms, we conducted two more experiments on PaviaU dataset in Section 4.4.The results of MPSNR, MSSIM and SAM demonstrate the nonlocal tensor low-rank regularization term plays a more important role in proposed model than nonlocal tensor sparse representation term.It implies that the nonlocal tensor low-rankness term should be assigned a greater weight to balance the two parts.Therefore, we set λ 2 = 1 and λ 3 = 10 in all our experiments.Correspondingly, we can regard the other two parts with λ 1 and λ 4 tradeoff as loyalty terms of the nonlocal tensor sparseness and low-rankness; it is reasonable to obtain a greater value for λ 4 , and we set λ 1 = 0.02 and λ 4 = 250, as suggested in [42].
Besides, the spatial size of cube and the number of non-local similar cubes are two key parameters.Some research [17,18,30,41] reports that the spatial size of cube and the number of non-local similar cubes are dependent on sampling rates.The higher the sampling rate is, the more detailed information of texture and structure the HSI loses.For this reason, the bigger spatial size and more non-local similar cubes are beneficial to provide extra knowledge to further promote the HSI reconstruction performance.Thus, according to the parameter setting principle in [17,18,30,41], we set spatial size to 6 × 6, 7 × 7, 8 × 8, 9 × 9 and 10 × 10 for ρ = 0.20, 0.15, 0.10, 0.05 and 0.02, respectively; and the corresponding number of non-local similar cubes are set to 50, 55, 60, 65 and 70.

Conclusions
In this paper, we propose a novel method for hyperspectral image compressed sensing reconstruction by non-local tensor sparse representation and low-rank regularization.The proposed method considers intrinsic structured sparsity, where the nonlocal similarity between spatial cubes and the global correlation across all bands are considered fully.Each cube group contains similar structures; its tensor-based sparsity and low-rank properties can be regarded as very valuable priors.Experimental results reveal that the proposed methods outperform the state-of-the-art methods in term of visual inspection, quantitative and classification accuracy assessment.The proposed method is also superior in noise suppression.We also conclude that it is advantageous to have integrated constraints using both non-local tensor sparse representation and low-rankness rather than using only one of them in our model.

Figure 1 .
Figure 1.Flowchart of the proposed HSI-CSR algorithm, which consists of two steps: sensing and reconstruction.First, it acquires the compressive measurement y by a random sampling matrix Φ.Second, NTSRLR recovers an HSI from the measurements y = Φx.

Figure 2 .
Figure 2. Nonlocal tensor sparsity and low-rank property analysis in HSI.

Figure 4 .
Figure 4. Compressive sensing reconstructed results on pseudocolor images with bands (25,15, 5) of the Toy image from different methods under sampling rate ρ = 0.20.

Figure 6 .
Figure 6.Compressive sensing reconstructed results on pseudocolor images with bands (23, 13, 3) of the Indian Pines image from different methods under sampling rate ρ = 0.15.
for Toy and Figure8e,f for PaviaU.It is obvious that the reflectance difference between the reference and the reconstruction by NTSRLR is close to zero-much better than the other comparison methods.

Figure 7 .
Figure 7. PSNR, SSIM and FSIM values comparison of different methods for each band on Indian Pines dataset under sampling rate ρ = 0.20.

Figure 8 .
Figure 8.Comparison of spectra difference on Toy and PaviaU datasets: (b,c) the spectra difference curves of different methods corresponding to the region marked by cyan and green rectangles of Toy in (a) under sampling rate ρ = 0.05; and (e,f) the spectra difference curves of different methods corresponding to the region marked by red and blue rectangles of PaviaU in (d) under sampling rate ρ = 0.10.

Figure 9 .
Figure 9. Classification results for the Indian Pines image using SVM before and after CSR under sampling rate ρ = 0.20.

Figure 10 .
Figure 10.Compressive sensing reconstructed results on pseudocolor images with bands (186, 131, 1) of the noisy Urban image from different methods under sampling rate ρ = 0.10.

Figure 11 .
Figure 11.Horizontal mean profiles of compressive sensing reconstructed results on 1st band of real noisy Urban HSI data from different methods under sampling rate ρ = 0.10.

Figure 12 .
Figure 12.Horizontal mean profiles of compressive sensing reconstructed results on 186th band of real noisy Urban HSI data from different methods under sampling rate ρ = 0.10.

Figure 13 .
Figure 13.MPSNR, MSSIM and SAM bars of different methods under sampling rates 0.05 to 0.20 with interval 0.05 on PaviaU dataset.

15 Figure 14 .
Figure 14.Verification of the convergence of the proposed method.Progression of the PSNRs for the Toy and Indian Pines datasets under different sampling rates.

Table 1 .
MPSNRs, MSSIMs, and MFSIMs of different CSR methods on three selected HSIs under different sampling rates.

Table 2 .
SAM and ERGAS comparisons of different CSR methods on three selected HSIs under different sampling rates.

Table 3 .
Classification performance comparison before and after CSR on Indian Pines under different sampling rates.

Table 4 .
MPSNRs, MSSIMs, MFSIMs ERGAS and SAM of different CSR methods on Urban with different sampling rates.