Global and Local Tensor Sparse Approximation Models for Hyperspectral Image Destriping

: This paper presents a global and local tensor sparse approximation (GLTSA) model for removing the stripes in hyperspectral images (HSIs). HSIs can easily be degraded by unwanted stripes. Two intrinsic characteristics of the stripes are (1) global sparse distribution and (2) local smoothness along the stripe direction. Stripe-free hyperspectral images are smooth in spatial domain, with strong spectral correlation. Existing destriping approaches often do not fully investigate such intrinsic characteristics of the stripes in spatial and spectral domains simultaneously. Those methods may generate new artifacts in extreme areas, causing spectral distortion. The proposed GLTSA model applies two (cid:96) 0 -norm regularizers to the stripe components and along-stripe gradient to improve the destriping performance. Two (cid:96) 1 -norm regularizers are applied to the gradients of clean image in spatial and spectral domains. The double non-convex functions in GLTSA are converted to single non-convex function by mathematical program with equilibrium constraints (MPEC). Experiment results demonstrate that GLTSA is e ﬀ ective and outperforms existing competitive matrix-based and tensor-based destriping methods in visual, as well as quantitative, evaluation measures.

In tensor algebra, an HSI dataset is a three-mode tensor, where the first two modes represent spatial information, while the third mode gives spectral information of surface materials. HSI destriping methods can be divided into two categories: matrix-based (band-by-band) methods [13,17] and from existing methods are either over-smoothed or degraded with artifacts see the Figures presented in the experimental part.
Tensor-based destriping methods can fully exploit structural spectral-spatial correlation [35,36]. Low-Rank Tensor Decomposition (LRTD) in [20] separated the stripes from the perspective of lowrank tensor decomposition. In [35], a weighted low-rank tensor recovery (WLRTR) model is extended to WLRTR robust principal component analysis (WLRTR-RPCA) to extract the stripes. In [36], destriping is implemented with combined low-rank approximation and nonlocal total variation. Although tensor-based destriping methods outperform matrix-based methods, the low-rank constraint of stripes is difficult to satisfy in real HSIs, especially when the stripes are discontinuous [34]. Determining an optimal rank of tensor is challenging.
Considering the characteristic of stripes, sparse representation and sparsity promoting priors have been utilized to remove the stripes in HSIs [15,16,32,33]. However, this sparsity characteristic has not been considered under a tensor framework. In this paper, the tensor-based non-convex sparse model utilizes the sparse priors to estimate stripes from the observed images, using two ℓ sparse priors and two ℓ sparse priors. One ℓ sparse prior is related to the global sparse distributional property of stripes, and the other is related to the directional property of stripes (along the gradient stripes). One ℓ sparse prior characterizes the continuity of image (across-stripe direction), and the other denotes the correlation along the spectral dimension. Section 3.2 further explains the reasons for choosing these priors. Figure 1 shows the framework and priors of the proposed method. The ∇1 and ∇2 represent unidirectional TV operators of the along-stripe direction and the perpendicular direction, respectively, and ∇3 represents the difference operator along the spectral direction. To solve the proposed GLTSA, we extended the proximal alternating direction method of multipliers (PADMM)-based algorithm [16] to the tensor framework. The main contributions of the proposed HSI destriping methods are as follows: (1) We transformed an HSI destriping model to tensor framework, to better exploit high correlation between adjacent bands. In the tensor framework, a non-convex optimization model was constructed to estimate the stripes and underlying stripe-free image simultaneously for higher spectral fidelity. (2) The proposed method fully considers the discriminative prior of the stripes and the stripe-free image in tensor framework. We represent and analyze intrinsic characteristics of the stripes and stripe-free images, using ℓ and ℓ , respectively. Experiment results show that the proposed method outperforms existing state-of-theart methods, especially when the stripes are non-periodic. (3) We used the PADMM to solve a non-convex optimization model effectively. Extensive experiments were conducted on simulated data and real-world data. The proposed scheme achieved superior performance in both quantitative evaluation and visual comparison compared with existing approaches. The remainder of this paper is organized as follows. Section 2 gives some preliminary knowledge on the tensor notations and intrinsic statistical characteristics of stripe and stripe-free The main contributions of the proposed HSI destriping methods are as follows: (1) We transformed an HSI destriping model to tensor framework, to better exploit high correlation between adjacent bands. In the tensor framework, a non-convex optimization model was constructed to estimate the stripes and underlying stripe-free image simultaneously for higher spectral fidelity. (2) The proposed method fully considers the discriminative prior of the stripes and the stripe-free image in tensor framework. We represent and analyze intrinsic characteristics of the stripes and stripe-free images, using 0 and 1 , respectively. Experiment results show that the proposed method outperforms existing state-of-the-art methods, especially when the stripes are non-periodic. (3) We used the PADMM to solve a non-convex optimization model effectively. Extensive experiments were conducted on simulated data and real-world data. The proposed scheme achieved superior performance in both quantitative evaluation and visual comparison compared with existing approaches.
The remainder of this paper is organized as follows. Section 2 gives some preliminary knowledge on the tensor notations and intrinsic statistical characteristics of stripe and stripe-free images. Section 3 presents the formulation of our model, along with a PADMM solver. In Section 4, we discuss the experiments and analyze the selection of parameters. Section 5 concludes this paper.

Priors and Regularizers
This subsection analyzes the priors of stripes and stripe-free image and design appropriate regularizers to describe the priors in a destriping model.

Global Sparsity of Stripes
When the stripes are few in an HSI image, the stripes are regarded as a sparse tensor [16]. Naturally we choose 0 -norm, the number of nonzero elements, to measure the sparsity of stripe component S: Based on the sparsity constraint, the stripes are simply separated, while keeping useful information. However, when the stripes are extremely dense, it is difficult to restore HSIs. Heavy stripes, which means that the percentage of stripes in an image is high (usually more than 50%), significantly damage reliable spatial and spectral information. A single 0 -sparsity measurement for stripes can no longer depict reasonably intrinsic characteristics of stripes. Thus, it is necessary to explore other regularizers for stripes and HSIs to deal with heavy stripes.

Local Smoothness along Stripe Direction
Besides the global sparsity of stripes mentioned above, more powerful sparsity is embodied in gradient domain along stripe. Figure 2 represents statistical distribution of non-zero elements in ∇ 2 S, where ∇ 2 is the gradient operator along the stripe, and the stripe S in Figure 2 is extracted from the promising method proposed in [13]. Most gradient along the stripes are so small that they are close to zero. The sparsity in gradient domain essentially reflects local smoothness along stripe direction. The local sparsity measurement for stripes is as follows: images. Section 3 presents the formulation of our model, along with a PADMM solver. In Section 4, we discuss the experiments and analyze the selection of parameters. Section 5 concludes this paper.

Priors and Regularizers
This subsection analyzes the priors of stripes and stripe-free image and design appropriate regularizers to describe the priors in a destriping model.

Global Sparsity of Stripes
When the stripes are few in an HSI image, the stripes are regarded as a sparse tensor [16]. Naturally we choose ℓ -norm, the number of nonzero elements, to measure the sparsity of stripe component : Based on the sparsity constraint, the stripes are simply separated, while keeping useful information. However, when the stripes are extremely dense, it is difficult to restore HSIs. Heavy stripes, which means that the percentage of stripes in an image is high (usually more than 50%), significantly damage reliable spatial and spectral information. A single ℓ -sparsity measurement for stripes can no longer depict reasonably intrinsic characteristics of stripes. Thus, it is necessary to explore other regularizers for stripes and HSIs to deal with heavy stripes.

Local Smoothness along Stripe Direction
Besides the global sparsity of stripes mentioned above, more powerful sparsity is embodied in gradient domain along stripe. Figure 2 represents statistical distribution of non-zero elements in ∇ , where ∇ is the gradient operator along the stripe, and the stripe in Figure 2 is extracted from the promising method proposed in [13]. Most gradient along the stripes are so small that they are close to zero. The sparsity in gradient domain essentially reflects local smoothness along stripe direction. The local sparsity measurement for stripes is as follows:

Continuity of HSIs along Spatial Domain
As suggested in [16], for an HSI U, the spatial information long mode-1 (or spatial horizontal direction) is continuous. From Figure 3b, the stripes severely destroy the local spatial continuity in mode-1, indicating that the horizontal gradient of U should be as small as possible, to keep the continuity. With this observation, the 1 -norm regularization on the gradient of U below is used to describe the local continuity of the image in the spatial domain: where ∇ 1 U represents horizontal gradient of U.

Continuity of HSIs along Spatial Domain
As suggested in [16], for an HSI , the spatial information long mode-1 (or spatial horizontal direction) is continuous. From Figure 3b, the stripes severely destroy the local spatial continuity in mode-1, indicating that the horizontal gradient of should be as small as possible, to keep the continuity. With this observation, the ℓ -norm regularization on the gradient of below is used to describe the local continuity of the image in the spatial domain:   Figure 3. The spectral gradient histogram (Figure 4b) of the stripe-free HSI includes more zeroes and smaller values than that of the stripes (Figure 4c), which illustrates that the underlying HSI possesses strong continuity along the spectral direction. The low-rank prior of stripes will be violated for real HSI, especially when the stripes are fragments [34]. It is natural to minimize the ℓ norm of the spectral gradient of :

Stripe-Degraded Model
The stripes in HSI can be classified into two categories from the mechanism of noise generation, including additive stripes [14] and multiplicative stripes [38]. The multiplicative stripes can be  Figure 4 represents histograms of spectral gradient of degraded HSIs, the stripe-free image, and the stripes of WDCM in Figure 3. The spectral gradient histogram (Figure 4b) of the stripe-free HSI includes more zeroes and smaller values than that of the stripes (Figure 4c), which illustrates that the underlying HSI possesses strong continuity along the spectral direction. The low-rank prior of stripes will be violated for real HSI, especially when the stripes are fragments [34]. It is natural to minimize the 1 norm of the spectral gradient of U: Figure 2. Statistics of non-zero elements in ∇ . Vertical axis indicates the number of non-zero elements, while horizontal axis represents pixel value.

Continuity of HSIs along Spatial Domain
As suggested in [16], for an HSI , the spatial information long mode-1 (or spatial horizontal direction) is continuous. From Figure 3b, the stripes severely destroy the local spatial continuity in mode-1, indicating that the horizontal gradient of should be as small as possible, to keep the continuity. With this observation, the ℓ -norm regularization on the gradient of below is used to describe the local continuity of the image in the spatial domain:   Figure 3. The spectral gradient histogram (Figure 4b) of the stripe-free HSI includes more zeroes and smaller values than that of the stripes (Figure 4c), which illustrates that the underlying HSI possesses strong continuity along the spectral direction. The low-rank prior of stripes will be violated for real HSI, especially when the stripes are fragments [34]. It is natural to minimize the ℓ norm of the spectral gradient of :

Stripe-Degraded Model
The stripes in HSI can be classified into two categories from the mechanism of noise generation, including additive stripes [14] and multiplicative stripes [38]. The multiplicative stripes can be

Stripe-Degraded Model
The stripes in HSI can be classified into two categories from the mechanism of noise generation, including additive stripes [14] and multiplicative stripes [38]. The multiplicative stripes can be transformed to additive stripes by logarithmic transformation [39]. Thus, the observation model of stripe-degraded HSI is formulated as F = U + S [13,14], and the corresponding tensor-based reformulation is as follows: where F , U, S ∈ R h×v×b , and F , U, S indicate the degraded image, the unknown clean image, and stripes, respectively. Moreover, h and v indicate the horizontal and vertical pixel numbers in spatial domain, and b represents the number of spectral bands.

The GLTSA Destriping Model
Based on the discussion for the priors for both the image and stripe components in Section 2.2, the regularizers in the proposed model can be formulated as follows: where α, λ, and γ are weighted parameters. To tackle this double-objective non-convex optimization problem, we can transform it to the following single-objective non-convex optimization problem with constraint U = F − S: Note that the optimization problem in Equation (7) substantially differs from the models proposed in [13,16]. The main difference is that the two models are matrix-based, while our method is tensor-based. The two models in [13,16] have not considered the spectral correlation, which would result in spectral distortion. The proposed method introduces the spectral prior to avoid this issue.

Optimization Procedure and Algorithm
There are two non-convex 0 regularization terms in Equation (7). In order to handle them, we first present Lemma 1, the non-convex 0 optimization problem can be converted into convex optimization with this Lemma. Then the PADMM method is utilized to solve it, and theoretically guarantee the convergence at the same time. [39]). For any given vector w ∈ R n , the following holds:

Lemma 1. (Mathematical Program with Equilibrium Constraints (MPEC) equation
where < > represents the inner product of two vectors, and denotes the element-wise product. The absolute of w stands for the absolute value of each element of w. Then, v * = 1 − sign(|w|) is the unique optimal solution of the minimization problem in Equation (8), and sign( Lemma 1 can be intuitively extended to the tensor version, i.e., Lemma 2: Remote Sens. 2020, 12, 704 7 of 27 Lemma 2. For any given tensor W ∈ R I 1 ×I 2 ×···×I N , the following holds: where V * = 1 − sign(|W|) is the unique optimal solution of the minimization problem in Equation (9).
According to Lemma 2, the objective function (7) can be reformulated as follows: According to ADMM optimization framework, we introduce four auxiliary variables O, P, Q, and R, and then equally transform Equation (10) as follows: The augmented Lagrangian form of (11) is as follows: where the β i s are positive scalar parameters and the Λ i s (i = 1,2,3,4,5) are the scaled Lagrange multipliers. 1. O subproblem: This subproblem can be written as follows.
It has a close-form solution [16]: where T k = β 5 ∇ 2 S k + Λ k 5 I k , I k is the unit tensor of the same size as S k . 2. P subproblem: This subproblem is a non-convex optimization problem.
Based on 3D fast Fourier transform (FFT), the explicit close-form solution is as follows: where which is equivalent to the following: Based on the constraint 0 ≤ V ≤ 1, it still has a closed-form solution: where D k = 1 − Λ k 2 |O k |. 5. R subproblem: when other variables are fixed, variable R can be updated by By using soft-threshold shrinkage operator [42], the solution is given by the following equation: where soft(a, x) = sign(a ) * max(0, |a − x|). 6. Q subproblem: Similar to optimization for variable R, Q can be updated by the following equation: The closed-form solution is given by the following equation: 7. Multiplier subproblems: The update form of Lagrange multipliers Λ i s (i = 1, 2, 3, 4, 5) are as follows: Λ k+1 Remote Sens. 2020, 12, 704 All the subproblems have the closed-form solutions, which guarantee the accuracy of the model. The GLTSA destriping algorithm is summarized in the following Algorithm 1. With regard to computational complexity, for an HSI with dimensions of h×b×v, the time complexity of GLTSA is proportional to O(hbv·log(hbv)).

Experiment Results
To verify the effectiveness of GLTSA on simulated and real data, we compared with six state-of-the-art methods. The first four methods belong to band-by-band restoration matrix-based methods, including the unidirectional total variation model (UTV) [14], Directional L0 Sparse (DL0S) Model [16], the wavelet Fourier adaptive filter (WFAF) [22], and statistical gain estimation (SGE) [23]. The other three methods are tensor-based or tensor-like-based methods, including anisotropic spectral-spatial TV (ASSTV) model [19], Low-Rank Tensor Decomposition (LRTD) [20] model, and Low-Rank Multispectral Image Decomposition (LRMID) model [34]. The codes of WFAF, UTV, ASSTV, and LRMID can be downloaded from the website (http://www.escience.cn/people/changyi/codes.html). The code of DL0S can be downloaded from the website (http://www.escience.cn/people/dengliangjian/codes.html). The code of LRTD is available on the website (https://www.researchgate.net/profile/Yong_Chen96). In the simulated experiments, we add periodic and non-periodic stripes, as suggested in [34]. Before this, it is necessary to determine two indicators of the stripes: (1) the intensity represents the stripes' absolute mean value (denoted by I) and (2) the ratio (or percentage) of the stripe area to the whole image (denoted by r). We selected several typical combinations of I and r in the simulated experiments, that is I ∈ {20,50,100}, r∈{0.2,0.4,0.6,0.8, 0 -1 }, where r = 0 − 1 indicates that the percentage of stripes varies from band to band. The test data are normalized to a range of 0 to 1. In the following experiments, we manually tuned the parameters. For the experiments, we used MATLAB (R2016a) on a desktop with 8Gb RAM and Inter(R) Core(TM) CPU i5-4590: @3.30 GHz.

Simulated Data Experiments
In the simulated experiments, we generated periodic stripes and non-periodic stripes under different intensities and ratios. We used the Hyperspectral Digital Imagery Collection Experiment (HYDICE) image dataset of the Washington DC Mall (WDCM in short) as the ground truth image, which is available on website (https://engineering.purdue.edu/biehl/MultiSpec/hyperspectral.html). There are 191 spectral bands, and its spatial resolution is 1208 × 307 pixels. Because the original image is too large, which is very expensive in terms of computational cost, a sub-image of size 256 × 256 × 10 is extracted for the simulated experiment.
(a) Visual effect comparison: To evaluate the destriping effectiveness of GLTSA, we simulated many degraded cases. We selected one representative band of WDCM to show the destriping results. The original images are presented in Figures 5a and 6a, and the images are degraded by periodic stripes (I = 60, r = 0.8) and non-periodic stripes (I = 60, r = 0.4) are listed in Figures 5b and 6b, respectively. The destriping results obtained by different methods are illustrated in Figures 5c-j and 6c-j, where significant differences in visual quality can be observed.
When the intensity and ratio are low, most of the comparative methods obtain similar visual quality comparable to the proposed method. But when the intensity and ratio are high, most of the comparative methods have poor results, in both cases of the periodic and non-periodic stripes. For example, the result images are distorted, blurred, and over-smoothed in Figure 5c-e. LRTD has obtained satisfactory results due to that it directly processes the data cubes instead of a band-by-band process. There are almost no obvious residual stripes in DL0S, LRMID, and LRTD. Even though the LRMID, DL0S, and LRTD achieve similar visual quality as the proposed method, there are still residual stripes in the images from the enlarged part (yellow box), especially in the non-periodic stripes case (see Figure 6g-i). The results show that GLTSA can effectively remove stripes and retain more details and structure.
respectively. The destriping results obtained by different methods are illustrated in Figures 5c-j and  6c-j, where significant differences in visual quality can be observed.
When the intensity and ratio are low, most of the comparative methods obtain similar visual quality comparable to the proposed method. But when the intensity and ratio are high, most of the comparative methods have poor results, in both cases of the periodic and non-periodic stripes. For example, the result images are distorted, blurred, and over-smoothed in Figure 5c-e. LRTD has obtained satisfactory results due to that it directly processes the data cubes instead of a band-by-band process. There are almost no obvious residual stripes in DL0S, LRMID, and LRTD. Even though the LRMID, DL0S, and LRTD achieve similar visual quality as the proposed method, there are still residual stripes in the images from the enlarged part (yellow box), especially in the non-periodic stripes case (see Figure 6g-i). The results show that GLTSA can effectively remove stripes and retain more details and structure. (h) (i) (j)     (c) Quantitative comparison: To verify the robustness of GLTSA under different noise levels, we adopted five acknowledged quantitative indices, namely the mean peak signal-to-noise ratio (MPSNR) [43], mean structural similarity (MSSIM) [43], feature similarity (MFSIM) [44], mean spectral angle mapper (SAM) [45], and erreur relative globale adimensionnelle de synthese (ERGAS, relative dimensionless global error in synthesis, in English) [46]. The definitions of these indexes are as follows: where b represents the number of spectral bands,û i and u i are the restored image and the original clean image of i-th band (they are of the same size), and n i represents the total number of pixels of image of u i .
where µ x and µ y represent the average value of x and y images; σ x and σ y stand for the variance of x and y images; and σ xy is the covariance of these two images. C 1 and C 2 are constant here.
where Ω means the whole image spatial domain, S L (x) is the similarity of images f 1 and f 2 at each location x, and PC m (x) is the weight of S L (x) in the overall similarity between images f 1 and f 2 .
where h, v, and b are defined above. RMSE(x i ) denotes the root-mean-square error (RMSE) for image x i , and µ(i) denotes the mean of image y i . The MPSNR and MSSIM are used to measure the performance of spatial information, while the global performance of spectral distortion is evaluated by SAM and ERGAS. The larger the MPSNR and MSSIM values, the smaller the SAM and ERGAS values, and both are indicating that the effect of destriping is better. Tables 1 and 2 list the quantitative indices of all the destriping methods with simulated periodic and non-periodic stripes, respectively. The boldface digits in Tables 1 and 2 represent the best results. GLTSA outperforms LRTD, DL0S in these indices except only a few cases. As the intensity and ratio of stripes increases, the values of MPSNR and MSSIM decrease, while the GLTSA shows stable results, indicating robustness. The SAM and ERGAS values of GLTSA are the smallest among all compared methods; this indicates that the spectral distortion of GLTSA is the lowest. The quantitative evaluation results are in line with the visual effects. The boldface digits in Table 2 represent the best results. The boldface digits in Table 2 represent the best results.

Experiments on Real Data
In this experiment, we choose two real image datasets. The first one is Urban data from Hyperspectral Digital Imagery Collection Experiment (HYDICE). Some bands are degraded with non-periodic stripes. The original size is 307 × 307 × 210, and a sub-image with the dimensions 307 × 307 × 8 (from band 1 to band 8) is extracted for the experiment. The original data are available on the website (http://www.erdc.usace.army.mil/). The second data are collected by EO-1 Hyperion L1R-level sensor. These data were captured in Australia on December 4, 2010. The size of original image is 3858 × 256 × 242, and a sub-image of 400 × 225 × 10 (from band 54 to band 63) is used in our experiment, which is available online (http://remote-sensing.nci.org.au/). The data are normalized to [0,1] before processing.
The results of all the methods on the two real data are displayed in Figures 9 and 10. From Figures  9b-d and 10b-d, UTV, WFAF, and SGE result in the distortion of details and structure. Erroneous horizontal stripes are introduced by ASSTV, LRMID, and LRTD, as shown in Figures 9e-h and 10e-h. DL0S can eliminate most of the stripes, but still leave some obvious stripes, as shown in Figures 9h and 10h. Regardless of the level of image quality, GLTSA is able to remove the stripes effectively, while retaining the details.

Experiments on Real Data
In this experiment, we choose two real image datasets. The first one is Urban data from Hyperspectral Digital Imagery Collection Experiment (HYDICE). Some bands are degraded with non-periodic stripes. The original size is 307 × 307 × 210, and a sub-image with the dimensions 307 × 307 × 8 (from band 1 to band 8) is extracted for the experiment. The original data are available on the website (http://www. erdc.usace.army.mil/). The second data are collected by EO-1 Hyperion L1Rlevel sensor. These data were captured in Australia on December 4, 2010. The size of original image is 3858 × 256 × 242, and a sub-image of 400 × 225 × 10 (from band 54 to band 63) is used in our experiment, which is available online (http://remote-sensing.nci.org.au/). The data are normalized to [0,1] before processing.
The results of all the methods on the two real data are displayed in Figures 9 and 10. From  Figures 9b-d and 10b-d, UTV, WFAF, and SGE result in the distortion of details and structure. Erroneous horizontal stripes are introduced by ASSTV, LRMID, and LRTD, as shown in Figures 9eh and 10e-h. DL0S can eliminate most of the stripes, but still leave some obvious stripes, as shown in Figures 9h and 10h. Regardless of the level of image quality, GLTSA is able to remove the stripes effectively, while retaining the details.  (a)  profiles of WFAF and UTV show over-smoothing, which suggests details are lost. Figure 11i indicates that GLTSA can realize the desired result, while keeping more details.
The MCTPs of Figures 9 and 10 are shown in Figures 11 and 12, respectively. The MCTP of the Urban data is shown in Figure 11a, while Figure 11b-i shows the MCTPs of the compared methods. The profiles of LRMID and LRTD have obvious fluctuations, which indicate residual stripes, and the profiles of WFAF and UTV show over-smoothing, which suggests details are lost. Figure 11i indicates that GLTSA can realize the desired result, while keeping more details. The MCTPs of Figures 9 and 10 are shown in Figures 11 and 12, respectively. The MCTP of the Urban data is shown in Figure 11a, while Figure 11b-i shows the MCTPs of the compared methods. The profiles of LRMID and LRTD have obvious fluctuations, which indicate residual stripes, and the profiles of WFAF and UTV show over-smoothing, which suggests details are lost. Figure 11i indicates that GLTSA can realize the desired result, while keeping more details. Since there is a lack of ground-truth images as reference data, we adopt two non-reference destriping indices, including the inverse coefficient of variation (ICV) [38] and mean relative deviation (MRD) [38]. The definitions of these two indices are introduced in [38]: where Rm and Rs represent the mean and standard deviation of pixel values in the same region of the destriping and original images, respectively; and xi and yi are the pixel values in the destriping and original images, respectively. Table 3 lists the results of two non-reference indices. The best results are given in bold. The GLTSA outperforms the compared methods in most cases. The best results of compared methods have only little difference with ours. The GLTSA outperforms on destriping the state-of-the-art methods for comparison.

Analysis for Parameter Settings
At present, the selection of regularization parameters automatically or adaptively of different sensors is still a difficulty for many algorithms, which will be further researched in our future work. In this paper, the distributional properties of sparsity, smoothness, and spatial-spectral continuity are constructed as constraint terms to estimate the stripe component. Our model contains eight Since there is a lack of ground-truth images as reference data, we adopt two non-reference destriping indices, including the inverse coefficient of variation (ICV) [38] and mean relative deviation (MRD) [38]. The definitions of these two indices are introduced in [38]: where R m and R s represent the mean and standard deviation of pixel values in the same region of the destriping and original images, respectively; and x i and y i are the pixel values in the destriping and original images, respectively. Table 3 lists the results of two non-reference indices. The best results are given in bold. The GLTSA outperforms the compared methods in most cases. The best results of compared methods have only little difference with ours. The GLTSA outperforms on destriping the state-of-the-art methods for comparison.

Analysis for Parameter Settings
At present, the selection of regularization parameters automatically or adaptively of different sensors is still a difficulty for many algorithms, which will be further researched in our future work. In this paper, the distributional properties of sparsity, smoothness, and spatial-spectral continuity are constructed as constraint terms to estimate the stripe component. Our model contains eight regularization parameters (12): α, γ, λ and β i (i = 1, 2, 3, 4, 5). Since the noise intensity and proportion of stripes have many different combinations, we tuned them empirically to obtain a more accurate estimation. To find the appropriate parameter values, we chose the mean PSNR (MPSNR) as the selection criterion to evaluate and analyze the effect. We conducted experiments on different simulated data. To show the influence of parameter tuning, we used the WDCM dataset as an example. It is obvious that the optimal values of eight parameters are in the eight-dimensional parameter space, but it will be time-consuming. To mitigate this problem, we adopt a greedy algorithm to determine the optimal value of parameters in turn. Experiment results show that, although the parameters obtained by this method are not globally optimum, the destriping results are still excellent.
The function curve of MPSNR values and the regularization parameters, λ, is plotted in Figure 13a; the results show that, when λ increases from 0.6 to 1.2, the MPSNR is obviously improved, and then the MPSNR reduces slightly when λ is further increased. That is to say, the highest MPSNR value of λ is reached near 1.2. The function curve of MPSNR values with parameter γ is listed in Figure 13b. With the increase of γ, the highest MPSNR value is reached near 1. It shows similar trends as λ. Since there are many degradation scenarios in our implementation, the parameters are not fixed; they are varied in the [1.1, 1.3] range for λ and [0.8, 1] for γ. The relationship between MPSNR and the parameter α is depicted in Figure 13c. It is observed that MPSNR does not change much with the change of α. This means that the GLTSA is robust with α. In fact, the image with severe stripes will prefer a larger γ and λ but a lower α. The value of α is fixed within the range of .
The relationship between MPSNR and the parameters β 1 , β 2 , β 3 , β 4 , and β 5 are listed in Figure 13d-h, respectively. From Figure 13d, it is observed that MPSNR changes very little with parameter β 1 in the range of [20,100]. Similar to the analysis of parameter α, it can be seen from Figure 13e,f that our method is robust with β 2 and β 3 . It can be seen from the Figure 13g,h that the GLTSA obtains the best result when β 4 = 16 and β 5 = 0.05. Based on the above evaluation results, the parameter values in this paper are empirically set as β 1 = 10, β 2 = 1000, β 3 = 10, β 4 = 16, and β 5 = 0.05. regularization parameters (12): α, γ, λ and β (i = 1, 2, 3, 4, 5). Since the noise intensity and proportion of stripes have many different combinations, we tuned them empirically to obtain a more accurate estimation. To find the appropriate parameter values, we chose the mean PSNR (MPSNR) as the selection criterion to evaluate and analyze the effect. We conducted experiments on different simulated data. To show the influence of parameter tuning, we used the WDCM dataset as an example. It is obvious that the optimal values of eight parameters are in the eight-dimensional parameter space, but it will be time-consuming. To mitigate this problem, we adopt a greedy algorithm to determine the optimal value of parameters in turn. Experiment results show that, although the parameters obtained by this method are not globally optimum, the destriping results are still excellent.
The function curve of MPSNR values and the regularization parameters, λ, is plotted in Figure  13a; the results show that, when λ increases from 0.6 to 1.2, the MPSNR is obviously improved, and then the MPSNR reduces slightly when λ is further increased. That is to say, the highest MPSNR value of λ is reached near 1.2. The function curve of MPSNR values with parameter γ is listed in Figure  13b. With the increase of γ , the highest MPSNR value is reached near 1. It shows similar trends as λ . Since there are many degradation scenarios in our implementation, the parameters are not fixed; they are varied in the [1.1, 1.3] range for λ and [0.8, 1] for γ. The relationship between MPSNR and the parameter α is depicted in Figure 13c. It is observed that MPSNR does not change much with the change of α . This means that the GLTSA is robust with α. In fact, the image with severe stripes will prefer a larger γ and λ but a lower α . The value of α is fixed within the range of .
The relationship between MPSNR and the parameters , , , , and are listed in Figure  13d-h, respectively. From Figure 13d, it is observed that MPSNR changes very little with parameter in the range of [20,100]. Similar to the analysis of parameter α , it can be seen from Figure 13e,f that our method is robust with and . It can be seen from the Figure 13g,h that the GLTSA obtains the best result when = 16 and = 0.05. Based on the above evaluation results, the parameter values in this paper are empirically set as = 10, = 1000, = 10, = 16, and = 0.05.   Figure 14 shows the relationship between the MPSNR and MSSIM values and the number of iterations of the GLTSA solver. Figure 14a,b is the periodic situations with (r = 0.5, I = 100), Figure  14c,d is the non-periodic situations with (r = 0.9, I = 60). As the iterations go on, GLTSA converges as relative changes of MPSNR and MSSIM converge to zero.

Convergence Analysis and Comparison of Running Time
For time complexity, we show the running times of all the methods in Table 4. It is easy to see that, for 10 bands of WDCM data, the running speeds of methods UTV, WFAF, and SGE are obviously faster than other methods. However, according to the visual effect ( Figures 5 and 6) and index analysis (Tables 1 and 2), their results are worse. In fact, it is hard to find the best trade-off between good results and time complexity. Compared with other methods, the running speed of our method is not the fastest, but our results are the best.   Figure 14 shows the relationship between the MPSNR and MSSIM values and the number of iterations of the GLTSA solver. Figure 14a,b is the periodic situations with (r = 0.5, I = 100), Figure 14c,d is the non-periodic situations with (r = 0.9, I = 60). As the iterations go on, GLTSA converges as relative changes of MPSNR and MSSIM converge to zero.

Convergence Analysis and Comparison of Running Time
For time complexity, we show the running times of all the methods in Table 4. It is easy to see that, for 10 bands of WDCM data, the running speeds of methods UTV, WFAF, and SGE are obviously faster than other methods. However, according to the visual effect ( Figures 5 and 6) and index analysis (Tables 1 and 2), their results are worse. In fact, it is hard to find the best trade-off between good results and time complexity. Compared with other methods, the running speed of our method is not the fastest, but our results are the best.  Figure 14 shows the relationship between the MPSNR and MSSIM values and the number of iterations of the GLTSA solver. Figure 14a,b is the periodic situations with (r = 0.5, I = 100), Figure  14c,d is the non-periodic situations with (r = 0.9, I = 60). As the iterations go on, GLTSA converges as relative changes of MPSNR and MSSIM converge to zero.

Convergence Analysis and Comparison of Running Time
For time complexity, we show the running times of all the methods in Table 4. It is easy to see that, for 10 bands of WDCM data, the running speeds of methods UTV, WFAF, and SGE are obviously faster than other methods. However, according to the visual effect ( Figures 5 and 6) and index analysis (Tables 1 and 2), their results are worse. In fact, it is hard to find the best trade-off between good results and time complexity. Compared with other methods, the running speed of our method is not the fastest, but our results are the best.

Conclusions
This paper presents a tensor sparse approximation model for HSI destriping. The characteristics of image and stripe are simultaneously modeled under tensor-based sparse approximation framework. More specially, the 0 -norm of stripes and its gradient are used to characterize the global sparsity of the stripes and local smoothness along stripes direction. The 1 -norm of the gradients of underlying images in spatial and spectral are utilized to depict the continuity. The tensor-based PADMM algorithm is adopted to handle the non-convex tensor optimization model. Experiment results have demonstrated that GLTSA has better striping performance and lower spectral distortion compared with the existing methods.