Remote Sensing Image Stripe Detecting and Destriping Using the Joint Sparsity Constraint with Iterative Support Detection

Remote sensing images have been applied to a wide range of fields, but they are often degraded by various types of stripes, which affect the image visual quality and limit the subsequent processing tasks. Most existing destriping methods fail to exploit the stripe properties adequately, leading to suboptimal performance. Based on a full consideration of the stripe properties, we propose a new destriping model to achieve stripe detection and stripe removal simultaneously. In this model, we adopt the unidirectional total variation regularization to depict the directional property of stripes and the weighted `2,1-norm regularization to depict the joint sparsity of stripes. Then, we combine the alternating direction method of multipliers and iterative support detection to solve the proposed model effectively. Comparison results on simulated and real data suggest that the proposed method can remove and detect stripes effectively while preserving image edges and details.


Introduction
Remote sensing images have been widely used in many fields, such as urban planning, environment monitoring, and precision farming [1].Unfortunately, remote sensing images are often contaminated by noises.One common and important noise is the stripe noise, which is mainly due to lines being dropped during scanning, differences between forward and reverse scanning [2], and variations in calibrations across sensor arrays in multisensor instruments.In real applications, stripe noise in remote sensing images not only reduces the image visual quality, but also lowers the accuracy of further processing tasks, such as classification [3], superresolution [4], and unmixing [5,6].Therefore, the task of destriping is an important problem before further applications.Destriping aims at removing the stripes while maintaining image features, such as edges and details.

Existing Methods
Existing destriping methods can be categorized into three classes: filter-based, statistics-based, and optimization-based.
Filtering-based methods assume that stripes are periodic and can be distinguished from the power spectrum.Therefore, they suppress the specific frequencies caused by stripe noise in the transform domain, such as low-pass filter [2], wavelet analysis [7][8][9][10], and the Fourier transform [2,11].Although these methods are easy to implement and perform well in specific cases, they generally lead to blurry artifacts in the resulting images due to the shrinkage of the frequencies of image details.To overcome this drawback, Munch et al. [12] proposed a combined Fourier and wavelet filter-based method to exploit the directional property of stripes via wavelet decomposition, achieving results with less blurry artifacts.
Statistics-based methods depend on the statistical characteristic of the digital numbers of each sensor [13][14][15][16][17][18], such as moment matching [14,17] and histogram matching [15,18].The histogram matching assumes that the probability distribution of scene radiances seen by each sensor is the same and performs well only for single-type sensors competitively.To solve this problem, Wegner [18] proposed calculating the statistics only over homogeneous image regions; Gorsini et al. [19] applied the radiometric equalization to remove non-periodic stripes.In general, the main problem of statistics-based methods is that these methods greatly depend on the pre-established reference moment or histogram.
The optimization-based methods formulate destriping as an ill-posed problem [20][21][22][23], in which the true image and stripes are characterized by prior information [24,25].For example, Shen and Zhang [22] proposed a maximum a posteriori method based on Huber-Markov regularization for image destriping and inpainting.Bouali and Ladjal [20] used the unidirectional total variation (UTV) to exploit the direction features of stripes in Moderate-Resolution Imaging Spectroradiometer (MODIS) images.Subsequently, UTV [26,27] and its variants have been widely used for destriping [21,23,[28][29][30][31].For example, Dou et al. [28] proposed a non-convex destriping model by employing the 0 -norm variant of UTV and the global sparsity regularization, which is depicted by the 1 -norm to constrain the stripe noise.The method performs well, but it fails to consider the inner structure among the stripes.Compared with Dou's model, we excavate the directional property of strips more deeply and use a joint sparsity regularization, which is depicted by the weighted 2,1 -norm ( w,2,1 -norm), to constrain the stripe component.Recently, some researchers addressed multispectral and hyperspectral image [32][33][34] destriping by exploiting the high coherence between different image bands adequately [35,36].There are several commonly-used sparsity metrics [37] to describe the sparsity of stripes, e.g., 0 -norm, 1 -norm, 2,0 -norm, and 2,1 -norm.Here, 0 -norm and 1 -norm penalize, respectively, the number and the magnitude of non-zero elements, which describes the global sparsity.On the other hand, 2,0 -norm and 2,1 -norm penalize, respectively, the number and the magnitude of non-zero columns, which describes the group sparsity to explore the structural sparsity of stripes.As summarized in Table 1, we discuss the differences and relations among several state-of-the-art optimization-based methods.

Motivation
Most existing destriping methods concentrate on estimating the clean image directly from the observed image, while paying little attention to the stripe component.Besides the directional and structural properties of the stripes, the stripe position is also an important feature, and effective exploitation of this feature can significantly improve the destriping performance.
In this paper, we propose a novel destriping model that can detect the stripe position and remove the stripes simultaneously, by fully exploiting the properties of the stripes.Our model exploits both the structural and the directional properties of stripes.On the one hand, we use the w,2,1 -norm to characterize the joint sparsity (structural property) of stripes.In addition, we use the iterative support detection (ISD) [41,42] to calculate the weight vector in the w,2,1 -norm, which shows the stripe position in the observed image.On the other hand, we use UTV to characterize the directional property of stripes.We apply the alternating direction method of multipliers (ADMM) [6,[43][44][45] algorithm to solve the proposed model efficiently.In summary, our contributions are as follows: (1) We propose a non-convex optimization model to characterize the position properties and the joint sparsity of stripes using the w,2,1 -norm, which can significantly improve the destriping performance.(2) We combine ADMM and ISD to solve the proposed model effectively.Under specified conditions, the convergence of the proposed method can be guaranteed.In the solving process, we achieve stripe detection by calculating the weight vector.Meanwhile, we design new indices to analyze the accuracy of detection.
The remainder of this paper is organized as follows.Section 2 gives the proposed model with analyses of the stripe properties.Section 3 presents the optimization algorithm.Section 4 shows the experimental results and provides the comparisons with other methods.Section 5 concludes this paper.

The Proposed Model
Without loss of generality, we consider each stripe as a column, since we can rotate the image to make the stripe vertical if the stripe is horizontal.We consider the following additive degradation model [20,38]: where F ∈ R m×n , U ∈ R m×n , and S ∈ R m×n represent the observed image, the ground-truth image, and the stripe noise, respectively.This degradation process is shown in Figure 1.
The degradation process (a The directional property of stripes The joint sparsity of stripes  From a degraded image F, we detect the stripe position and estimate the stripe S simultaneously by solving the following model: where D y S 1,1 , D x F − D x S 1,1 , and S ω,2,1 are denoted as: where the subscript : j denotes the j th column; D y and D x are linear second-order difference operators along the vertical direction and the horizontal direction, respectively; λ 1 > 0 and λ 2 > 0 are regularization parameters balancing the three terms; ω = [ω 1 , • • • , ω n ] is a weight vector.After estimating the stripe S from (2), the ground-truth image U is obtained by F − S.
We explain in detail the motivation of each term in our model.To illustrate the stripe properties appropriately, we use GSUTV [40], which achieves the destriping by estimating the stripe component, to remove stripes in the Terra MODIS image and explore the stripe properties in results.Figure 1a-c presents the degraded image, the destriping result, and the stripe component, respectively.Figure 1d-i shows the gradients of the observed image, the destriping image, and the stripe image in two directions: horizontal (across the stripes) and vertical (along the stripes), respectively.
We analyze the gradient statistical distribution before and after the degradation process firstly.It is easy to see that the horizontal gradient mapping between Figure 1d,f differs greatly in the image domain, while the vertical gradient mapping between Figure 1e,g is similar.In addition, we plot their corresponding gradient histograms in Figure 1j,k.We observe that the gradient distributions of Figure 1d,f are absolutely different, while the gradient distributions of Figure 1e,g are almost the same.Such an observation is not surprising, since the stripe noise has significantly directional feature.As shown in Figure 1c, we observe that the stripe noise possesses a significant structural property.Furthermore, we plot 2 -norm values of the stripe component (shown in Figure 1l).
The smoothness along the vertical direction of the stripe: The first term D y S 1,1 penalizes the gradient of the stripe image along the vertical direction.Figure 1c shows the great smoothness along the vertical direction of the stripe image, and Figure 1i shows that the vertical gradient of the stripe image has significant sparsity.Thus, we chose the 1 -norm of the vertical gradient of stripe image to keep the stripe component smoothness well.Although the p -norm (0 < p < 1) [39,46] can depict sparsity more accurately than the 1 -norm [28], here we did not adopt them because they are nonconvex and have a high possibility of affecting the convergence of the algorithm.
Joint sparsity of the stripe: The second term S ω,2,1 characterizes the joint sparsity of stripes.As shown in Figure 1c, the stripe component has a special column structure.As Chen et al. [40] pointed out, the stripe component is composed by column vectors, and each column can be considered as a group.Moreover, we observe that the structure of the stripe component exhibited joint sparsity in the sense that the non-zeros of S 1: , • • • , S m: absolutely appeared in the same position.This motivated us to use a joint sparsity regularization.As shown in Figure 1l, most of the 2 -norm values of columns are close to zero, while the other values are much bigger than them.This explains again that the joint sparsity regularization is more appropriate to depict the structural property of stripes than the normal group sparsity [47] regularization.
Compared with GSUTV, the main difference of the proposed model is the weight parameter vector ω.We can remove some columns from the regularization by setting the weighted vector ω, if we regard them as true stripe columns.Therefore, the ω,2,1 -norm can depict joint sparsity more appropriately than the 2,1 -norm.It is worth mentioning that unlike most of the present weighted models, which acquire ω :j from the set of positive real numbers, we adopt ω :j from {0, 1} due to the specific property of the stripes.This means that if we regard S :j as the true non-zero column (true stripe column), we can forbid it from moving nearer to zero, and set ω :j as zero.Because the position information about non-zero is unknown, ω j is not given beforehand.Therefore, we used ISD [42], as a self-learning scheme, to explore partial support information gradually.
The smoothness along the horizontal direction of the ground-truth image: The third term D x F − D x S 1,1 , in the proposed model, penalizes the difference between the gradient of the ground-truth image and the stripe image along the horizontal direction, which enhances the constraint on the smoothness of the ground-truth image along the horizontal direction.Generally speaking, the ground-truth image exhibits smoothness along both horizontal and vertical directions, resulting in sparsity in the gradient domain.However, the stripe noise mainly damages the sparsity along the horizontal direction (Figure 1d), rather than the vertical direction (Figure 1e).Therefore, we chose the 1 -norm of the horizontal gradient of the ground-truth image to make the desired ground-truth image smoother along the horizontal direction.
The framework of the proposed method is illustrated in Figure 2, and the details of the optimization method are shown in the next section.

The Optimization Algorithm
To solve the proposed model, we alternately used ISD [42], a self-learning scheme to explore true non-zeros gradually, and ADMM [43,48,49], a useful approach to deal with a non-smooth, but convex optimization model.The optimization method that combines ISD and ADMM is reasonable.The model is hard to solve due to the non-convex regularization term S ω,2,1 and the nature of the remainder non-smooth regularization terms.To overcome the solving difficulty caused by S ω,2,1 , we extended the idea of ISD in our method from normal sparsity to joint sparsity.We can detect the position of non-zeros and update the value of the weight vector based on ISD.It is easy to see that when the weight parameter vector is a constant, the proposed model degrades to an ordinary convex model in terms of S, and we call it the joint sparsity model.The joint sparsity model, which is non-smooth, but convex, is simple to deal with by ADMM.Therefore, we used ADMM to handle the problem caused by the non-smooth regularizer.Therefore, we solved this model by executing the following two steps: The details of the two steps are shown as follows.

Solving the Joint Sparsity Model
The main idea of ADMM is, by introducing auxiliary variables, to transform the unconstrained optimization model into a constrained optimization model with the splitting structure of variables.By introducing three auxiliary variables X, Y, and Z, we rewrite (2) as follows: The augmented Lagrangian function of ( 6) is as follows: where A 1 , A 2 , and A 3 are the Lagrange multipliers and β > 0 is a penalty scalar.ADMM iterates as follows: 3 ), 3 ), 3 ), 3 ), The details of solving the four subproblems are shown as follows: (1) The X-subproblem can be rewritten by completing the square as follows: whose solution can be given by the following column-wise vector-soft threshold function [50,51]: Considering the matrix X ∈ R m×n , the cost of computing X is O(mn).
(2) The same as the X-subproblem, the Z-subproblem can be solved by the shrinkage operator: The same as the X-subproblem, the cost of computing Z is also O(mn).
(3) The Y-subproblem is: which also can be solved by a shrinkage operator [49][50][51]: The cost of computing Y is also O(mn).
(4) The S-subproblem: is a quadratic optimization, which has the solution: We assume that the boundary of S is periodic in this paper; thus, D y D y , D x D x are both block cyclic system matrices.Therefore, ( 16) can be efficiently solved by fast Fourier transforms [42]. where: F (•) and F (•) −1 denote the fast Fourier transform and its reverse transform, respectively.The symbol ⊗ means component-wise multiplication, the the same as the division, and the symbol (•) means component-wise complex conjugate.The cost of computing S is O(mn log mn) by using fast Fourier transform.At each iteration, the cost of computing all the variables X, Y, Z, and S is O(mn log mn).

Update the Weight Vector ω via ISD
In this step, we used ISD to detect the position of stripes and update the weight vector ω based on the current intermediate solution S.
Firstly, we give the definition of the set whose elements are indices of detected nonzero columns as follows: where Secondly, we chose threshold (v) by the "first significant jump".In detail, after sorting sequence s (v) from small to large, the "first significant jump" scheme finds the smallest j satisfying the inequality: where s k j is the j th number of s (v) , which is sorted.There are several easy methods [42] to define τ (v) ; in this case, we set it as the average value of It is easy to see that the true stripe entries of |s (v) | are large in magnitude and small in number, and the false ones are large in number and small in magnitude.Therefore, the "first significant jump" scheme spread out the magnitudes of the true stripe entries and clustered those of the false ones.
Finally, we can obtain the index set I (v+1) and give the definition of the set whose elements are indices of detected nonzero columns as follows: We update the weight vector ω (v+1) according to the two index sets as follows: The main computational cost of the weight vector ω stems from computing the S in the inner loop.Moreover, the iteration number is small empirically, as per the following discussion in Section 4.1.Therefore, the total complexity of the proposed algorithm is O(mn log mn).
The pseudocode of the algorithm combining ADMM and ISD is summarized in Algorithm 1.

Algorithm 1
The optimization algorithm combining ADMM and ISD.

Input:
The observed image F, the parameters λ 1 , λ 2 , the penalty parameter β, the outer iteration number V, and the inner iteration number K max .

7:
Check the convergence condition: end for 9: Estimate the intermediate ground-truth image U (v) = F − S (k) .10: Calculate the index set I (v+1) by the method of threshold choice for S = S (k) .11: Update the weight vector ω (v+1) by (21).12: end for Output: The final weight vector ω (V) and the estimation of the ground-truth image U.
Remark 2. Although we set ω from the set of {0, 1}, we can consider ω from real continuous values as an alternative.

Experiments
In this section, we conduct experiments on both simulated and real data to test the efficiency of the proposed method.Five state-of-the-art destriping methods were chosen for comparison: the filtering-based method [12] (WAFT), the statistics-based method [13] (SLD), and optimization-based methods such as the weighted UTV-based method [29] (WDSUV), the low-rank decomposition-based method [38] (LRSID), and the group sparsity regularization-based method [40] (GSUTV).In the simulated experiments, the original MODIS Image Band 32 of size 400 × 400, which is available at https:// ladsweb.nascom.nasa.gov/,and the IKONOS image of size 377 × 331, which can be downloaded from https://openremotesensing.net/, were used by adding synthetic stripe noises to the clean image.In the real experiments, we used two real MODIS images, which are available at https://ladsweb.nascom.nasa.gov/,degraded by different types of stripes.For the convenience of quantitative evaluation and parameter selection, the gray value of the test images was normalized to [0,1].We used a desktop of 16 GB RAM, Intel (R) Core (TM) i5-4590U CPU, @3.30GHz to run all codes in MATLAB (R2017a).
To evaluate the results of both simulated and real experiments, we chose several qualitative and quantitative assessments.The qualitative assessments include the visual effect of destriping image and the mean cross-track profile [38].In the simulated experiments, we chose full-reference evaluation indices: the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) [52].Moreover, to evaluate the correctness of stripe detection, we defined two indices called the detection error rate (DER) and the detection missing rate (DMR) as follows: where S is the synthetic stripe component, W is the last weight vector, a is the number of error detected stripe columns, and b is the number of missing detected stripe columns.In real experiments, we selected two no-reference indices including noise reduction (NR) [16,22,36] and mean relative deviation (MRD) [16,36], since the ground-truth image was unavailable.NR depicts the ratio of stripes reduction in the frequency domain; MRD evaluates the ability of the method to retain the pixel value in stripe-free regions.
In short, larger values of PSNR, SSIM, and NR indicate better destriping performance, and smaller values of MRD indicate better destriping performance.
In the outer loop, the key index is the iterative number that decides the threshold.Fan et al. [41] pointed out that the threshold value will not change much, even if the iterative number goes to infinity.
Empirically, we set the iterative number as five to stabilize the threshold.For the compared methods, we set the parameters as the suggested values in the references.

Simulated Experiments
In the simulated experiments, we added both periodical and non-periodical types of synthetic stripe based on the observed model ( 1) to two different remote images.We denoted the periodic stripe level by using a vector with three elements, e.g., (period, intensity, and rate), where "period" denotes the number of columns between each stripe column (denoted p), "intensity" denotes the absolution value of pixels of each stripe column (denoted I), and "rate" denotes the rate of the stripe area within the ground-truth image (denoted r).Similar to the periodic stripe, we denoted the non-periodic stripe level by using a vector with two elements, e.g., (intensity and rate), where "intensity" and "rate" have the same definition as the elements listed above.In our simulation experiments, p is in the set of {10, 15}, I is in the set of {50, 100}, and r is in the set of {0.2, 0.5, 0.8}.For both periodic and nonperiodic cases, the stripe locations were randomly distributed on the image, and the absolute value of the stripe column pixels was the same.From Case 1-3, we gradually increased the rate of the stripes with p = 10, I = 50, leading to some different striping effects.From Case 4-6, we gradually increased the rate of the stripes with p = 15, I = 50.From Case 7-9, the rate also has been increased with I = 50.From Case 10-12, the rate also has been increased with I = 100.
As shown in Figures 4 and 5, the visual results of Case 9 and Case 3 are displayed as examples.In Figure 4, we observe that the WAFT method blurred the boundaries and many residual stripes still existed; the IRUV method lost the detailed information; the LRSID method blurred the whole image.Although the stripes were well removed by the SLD and GSUTV methods, some blurred regions are shown in the red circle.As a comparison, the proposed method achieved better visual performance.
Figures 6 and 7 show the estimated stripe images of different methods and the corresponding error images.The error images of the five comparative methods show that they failed to estimate the stripe component precisely.In contrast, the error image of the proposed method had fewer stripe detail information, which suggests that the proposed method had a better destriping performance.
The quantitative results of all the cases are listed in Table 2.Moreover, to test the effect of stripe position detecting, we show the results of DER and DMR in Table 3. From Table 2, we observe that the proposed method obtained better PSNR and SSIM values in the simulation experiments.From Table 3, with the striping effect increasing, the DER values increased gradually, and most of the DMR values were zero.In general, it is clear that the proposed method could detect the stripe position almost precisely.In summary, the quantitative comparison conformed with the above visual results presented.
In addition, Figures 8 and 9 show the mean cross-track profiles of Figures 4 and 5, respectively.Figure 8 shows that the proposed method kept a right curve tendency, and the column mean values of the destriping image were almost as the same as the true image.This illustrates that the proposed method can preserve the edges and details of the image, while it can accurately estimate the stripe component.

Real Experiments
In this subsection, Figures 10 and 11 show the results from all six methods on two real remote sensing images.The real images of different stripe noise were selected from Terra MODIS and Aqua MODIS, as shown in Figures 10a and 11a.The Terra MODIS image was highly influenced by non-periodical stripe noise, and the Aqua MODIS image was gravely contaminated by a periodical one.As shown in Figure 10, the WAFT and SLD methods blurred the boundaries, and many residual stripes still existed; the IRUV method removed the stripes completely, while blurring some regions; the LRSID method had a bad performance in preserving the stripe-free information, and some residual stripes still existed.For the GSUTV method, all stripes could be removed completely, but some edge regions are shallower in Figure 10f.The proposed method successfully suppressed the stripe noise with fewer artifacts.Figures 10 and 11 show that the JSUTV method was better than the five compared methods in terms of removing stripes and preserving the stripe-free information.
Moreover, in Table 4, we list the quantitative NR and MRD indices' values of the real experiments.In order to avoid the affect of external factors, five 10 × 10 homogeneous regions were selected to calculate MRD, then to obtain the mean value of MRD.For the NR and MRD indices, we can see that the proposed method achieved better results.In summary, the proposed method can obtain both a better visual effect and quantitative indices.

Numerical Convergence of the Proposed Algorithm
Since our model was convex for a fixed ω, and the convergence of the S subproblem could be guaranteed according to the convergence result of ADMM.We give the following Figure 12 to numerically illustrate the convergence of our algorithm.Figure 12 shows the relative error curves of the successive estimated stripes S (k) and S (k+1) , and the relative error kept decreasing as the iteration number increases, which illustrates that the proposed algorithm was convergent numerically.

Conclusions
In this paper, we have proposed a new destriping model to achieve the stripe detection and stripe removal simultaneously.The proposed model exploits both the stripe directional property and the stripe structural property and utilizes the joint sparsity prior to describe the structural property.We have proposed a multi-stage alternative optimization method, which combines ISD and ADMM, to solve the minimization problem.The proposed method has been compared with WAFT, SLD, IRUV, LRSID, and GSUTV, and experimental results show the superiority of the proposed method in terms of both quantitative and qualitative assessments.Moreover, simulated experimental results show the accuracy of the stripe detection.
Although the proposed method removes both periodic and nonperiodic stripes well in the horizontal or vertical direction, it still has a shortcoming: it cannot remove oblique stripes.Thus, we will focus on improving our method to overcome the mentioned challenge.

Figure 1 .
Figure 1.Destriping results in Terra MODIS by GSUTV.The first line: (a) Degraded image.(b) Destriping image.(c) Stripe component.The second line: (d) Horizontal gradient of the observed image.(e) Vertical gradient of the observed image.(f) Horizontal gradient of the destriping image.(g) Vertical gradient of the destriping image.(h) Horizontal gradient of the stripe component.(i) Vertical gradient of the stripe component.The third line: (j) Histogram of horizontal gradient.(k) Histogram of vertical gradient.The fourth line: (l) 2 -norm values of the stripe component.

Remark 1 .Figure 2 .
Figure 2. The framework of the proposed model.ADMM, alternating direction method of multipliers; ISD, iterative support detection.

( 1 )
Step 1: Keep the weight vector fixed, and solve the joint sparsity model by ADMM.In the first iteration, we set the initial value of ω as [1, • • • , 1] and obtained an estimation S 1 by solving the joint sparsity model.(2)Step 2: Using the estimated S from Step 1, update ω via a support detection operation.

Table 1 .
A comparison of the related optimization-based methods and their properties.UTV, unidirectional total variation.

Table 3 .
The detection error rate (DER) and detection missing rate (DMR) results of simulated experiments.

Table 4 .
Quantitative results on real data using NR and mean relative deviation (MRD).