A New Sparse Collaborative Low-Rank Prior Knowledge Representation for Thick Cloud Removal in Remote Sensing Images

Abstract: Efficiently removing clouds from remote sensing imagery is a significant challenge, yet it is crucial for a variety of applications. This paper introduces a novel sparse function, named the tri-fiber-wise sparse function, engineered specifically for the tasks of cloud detection and removal. This function captures cloud characteristics across three dimensions, leveraging the sparsity of mode-1, -2, and -3 fibers simultaneously to achieve precise cloud detection. By incorporating the concept of tensor multi-rank, which describes the global correlation, we develop a tri-fiber-wise sparse-based model that both detects and eliminates clouds from images. Furthermore, to ensure that the cloud-free information accurately matches the corresponding areas in the observed data, we enhance our model with an extended box-constraint strategy. Experiments demonstrate the notable success of the proposed method in cloud removal, highlighting its potential for enhancing the accuracy of remote sensing imagery.


Introduction
Remote sensing technology has been widely employed across various fields, including unmixing [1,2], fusion [3-6], and classification [7]. However, these images often suffer from inevitable cloud contamination, resulting in significant information loss and constraining the further analysis of remote sensing data [8,9]. Consequently, advancing cloud removal techniques is critical for enhancing the practical utility of remote sensing images.
Mathematically, multitemporal images contaminated by clouds can be represented by a tensor O ∈ R^(a×b×d×t) (a and b denote the spatial dimensions, d represents the spectral dimension, and t is the time dimension); the clean image component is represented by U, the cloud component is denoted by C, and the model noise is signified by E. Then, the degradation process is formulated as follows:

O = U + C + E.

Numerous cloud removal techniques have been developed by researchers [10,11]. These methods are generally classified into two main approaches depending on whether a cloud mask is available, offering distinct strategies for addressing the challenge of cloud removal.
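As a minimal numerical illustration of this additive degradation, the observed tensor can be synthesized in NumPy; all dimensions, values, and the synthetic cloud patch below are illustrative assumptions, not taken from the paper's datasets:

```python
import numpy as np

# Illustrative dimensions: a x b spatial, d spectral bands, t temporal nodes.
a, b, d, t = 64, 64, 4, 3
rng = np.random.default_rng(0)

U = rng.random((a, b, d, t))                   # clean image component
C = np.zeros((a, b, d, t))                     # cloud component (sparse support)
C[10:20, 10:20, :, 0] = 0.8                    # a small synthetic cloud at time 0
E = 0.01 * rng.standard_normal((a, b, d, t))   # model noise

O = U + C + E                                  # observed degraded tensor
```

The sparse support of C reflects the fiber-wise sparsity that the rest of the paper exploits.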
The first approach is called the nonblind method. It uses a given cloud mask as prior knowledge to reconstruct information obscured by clouds. Traditional methods for addressing this problem are spatial-based methods, which only utilize the interrelations between pixels across the spatial dimension [12-15]. To more effectively harness the correlations across spectral bands, researchers [16,17] have developed spectral-based methods aimed at improving the reconstruction of missing data. However, these methods usually fail to produce promising reconstruction results when the remote sensing imagery is obscured by thick clouds. Methods that take advantage of multiple images have therefore been developed, which are classified as either multitemporal [18-22] or multisource [23,24] methods. Wang et al. [20] proposed a method for scene reconstruction that employs a robust matrix completion technique through temporal contiguity, together with an efficient algorithm based on the augmented Lagrangian method (ALM) with inexact proximal gradients (IPGs) to address the optimization problem. Zhang et al. [21] proposed a cloud and shadow removal technique based on the learning of spatial-temporal patches. Li et al. [23] developed an innovative cloud removal methodology employing a CMD network that incorporates optical and SAR imagery. Gao et al. [24] utilized a generative adversarial network to fuse optical and SAR images in order to reconstruct information obscured by clouds. To effectively leverage image prior knowledge, hybrid methods have been proposed that exploit two or all of an image's spatial, spectral, and temporal features. For instance, Chen et al. [25] proposed a Spatially and Temporally Weighted Regression (STWR) method that fully leverages cloud-free information from input Landsat scenes. Melgani [26] introduced contextual multiple linear and nonlinear prediction models, which assume that the image's spectral and spatial characteristics remain relatively consistent across the image sequence. While nonblind methods are effective in cloud removal, their heavy reliance on cloud masks limits their applicability in comprehensively addressing cloud removal challenges.
The second approach, known as the blind method, removes clouds without a given cloud mask. Wen et al. [27] utilized Robust Principal Component Analysis (RPCA) to initially identify cloud cover, followed by reconstructing the missing information using discriminant RPCA to eliminate thick clouds. Chen et al. [28] proposed TVLRSDC, a total-variation-regularized low-rank and sparsity decomposition. A deep residual neural network model created by Meraner et al. [29] focused on the effective removal of clouds from Sentinel-2 multispectral satellite images; by fusing SAR and optical data, the synergistic characteristics of both imaging systems were exploited to guide image reconstruction. Wang et al. [30] developed an unsupervised domain factorization network aimed at eliminating thick clouds from multitemporal remote sensing images. These blind removal methods usually perform cloud detection and removal separately and independently, which often alters information in cloud-free regions.
To remove clouds better, it is necessary to study prior knowledge about the clouds. It is widely recognized that cloud components can be effectively characterized by sparse functions. Recently, the adoption of element-wise sparse functions, such as the l1-norm, has become prevalent [27,28,31] owing to their concise form and convexity. However, an element-wise sparse function ignores the correlations across the spectrum. Therefore, investigating an appropriate sparse function to characterize the cloud properties becomes crucial. Recently, Ji et al. [32] introduced a model that combines box-constrained low-rank and group sparse techniques for the specific purpose of detecting and removing clouds. This approach characterizes cloud properties using group sparsity in the spectral dimension. Nonetheless, it falls short of fully exploring the characteristics of the cloud component.
To further refine the sparsity representation of the cloud component, we introduce the tri-fiber-wise sparse (TriSps) function. This function utilizes the sparsity of mode-1, -2, and -3 fibers to capture more cloud information. Our development of TriSps is inspired by the inherent structural sparsity of cloud components. Specifically, when each tube is considered as a whole, the fibers exhibit sparse characteristics, termed fiber-wise sparsity. Figure 1a illustrates an image affected by cloud cover, and Figure 1b shows the corresponding cloud imagery. Figure 1c-e depict histograms of the l2-norms of mode-1, -2, and -3 fibers extracted from the cloud component. It is evident from Figure 1c-e that the majority of l2-norm values for mode-1, -2, and -3 fibers are zero, indicating significant sparsity. Additionally, we incorporate the global prior of image components, characterized by multi-rank. Leveraging the proposed sparse function and the global low-rank prior, we devise a novel cloud removal model and design an effective proximal alternating minimization (PAM)-based algorithm [33] to solve it efficiently. Experiments with synthetic and real datasets highlight the proposed method's prowess in cloud removal, outperforming other advanced methods currently available in the field.
This paper unfolds as follows. Section 2 introduces fundamental notations and definitions. Section 3 introduces the tri-fiber-wise sparse function to characterize the cloud component's properties and presents our method for detecting and removing clouds. Section 4 validates the effectiveness of our method through synthetic and real experiments. Lastly, Section 5 provides some conclusions.

Notations and Preliminaries
We present key notations and definitions [34] that are fundamental to our study. The primary notations employed in this paper are outlined in Table 1 for clarity and reference.

Notations       Interpretations
a, a, A, A      scalar, vector, matrix, tensor
T(•)            reshaping of a fourth-order tensor into a third-order tensor
D(•)            difference operator in the fourth mode
Furthermore, some definitions are presented below.

Definition 1 (Tensor mode-n product [35]). Given a tensor A ∈ R^(R_1×R_2×···×R_N) and a matrix B ∈ R^(J×R_n), their mode-n product is defined as A ×_n B = Fold_n(B A_(n)). In this context, A_(n) ∈ R^(R_n×∏_{d≠n} R_d) stands for the matricization of tensor A in the nth mode, while Fold_n(•) represents the operator that, in the nth mode, reshapes the matrix back into tensor form.

Definition 2 (TNN [36]). The tensor nuclear norm of a tensor A ∈ R^(R_1×R_2×R_3) is characterized as ∥A∥_TNN = Σ_{k=1}^{R_3} ∥Z_k∥_*, where Z_k represents the kth frontal slice obtained from the Fourier-transformed tensor Z = fft(A, [], 3). Based on the TNN, the tensor completion model can be written as min_A ∥A∥_TNN s.t. A_Ω = B_Ω, where the subscript Ω signifies the tensor obtained by keeping the entries corresponding to the index set Ω while setting all other entries to zero.
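For concreteness, the TNN can be evaluated numerically as the sum of the nuclear norms of the frontal slices in the Fourier domain. Note that some works scale this sum by 1/R_3; the sketch below omits that factor:

```python
import numpy as np

def tensor_nuclear_norm(A):
    """Sum of the nuclear norms of the frontal slices Z_k of the
    mode-3 Fourier transform of A (unnormalized convention)."""
    Z = np.fft.fft(A, axis=2)                      # FFT along the third mode
    return sum(np.linalg.norm(Z[:, :, k], 'nuc')   # nuclear norm of each slice
               for k in range(A.shape[2]))
```

The nuclear norm of each (possibly complex) slice is the sum of its singular values, which `np.linalg.norm(..., 'nuc')` computes directly.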

The Proposed Method
In this section, we present our proposed sparse function, TriSps. Building on the TriSps function, we introduce a cloud removal model that incorporates the tensor multi-rank. We then devise a PAM algorithm to solve the proposed model.

The Tri-Fiber-Wise Sparse Function
Existing sparse functions, such as element-wise and tube-wise sparsity, fail to fully exploit the correlations across the spectrum when characterizing cloud properties in remotely sensed images. To overcome this deficiency, we introduce a novel sparse function designed to efficiently capture cloud characteristics across three dimensions. The presence of clouds in contaminated remotely sensed images is relatively sparse compared to the entire image, as shown in Figure 1b. This observation implies that the cloud component exhibits fiber-wise sparsity, aligning with the fundamental nature of clouds. Figure 1 shows the sparsity within a cloud-contaminated image, where Figure 1c-e highlight that most of the l2-norms of mode-n (n = 1, 2, 3) fibers derived from the cloud component are zero. In other words, the mode-1, -2, and -3 fibers of the cloud component exhibit clear fiber-wise sparsity. This realization allows us to simultaneously consider the sparsity across mode-1, -2, and -3 fibers rather than focusing on a single fiber direction. Leveraging this insight, we introduce the tri-fiber-wise sparse (TriSps) function, which adeptly captures the sparse structures of mode-1, -2, and -3 fibers simultaneously. Utilizing the TriSps function, we can more thoroughly characterize cloud properties and significantly enhance the accuracy of cloud detection in remotely sensed imagery. Mathematically, the TriSps function of C is defined as

TriSps(C) = Σ_{n=1}^{3} λ_n S_n(C),    (5)

with

S_n(C) = Σ_i ∥C^i_(n)∥_2,    (6)

where λ_n (n = 1, 2, 3) denotes the positive weights, and C^i_(n) denotes the ith column of the mode-n unfolding C_(n), i.e., the ith mode-n fiber of C.
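A direct NumPy evaluation of the TriSps value sums, for each mode n, the l2-norms of the columns of the mode-n unfolding; the weights and toy tensor below are illustrative:

```python
import numpy as np

def trisps(C, lambdas=(1.0, 1.0, 1.0)):
    """TriSps(C) = sum_n lambda_n * sum_i ||ith mode-n fiber of C||_2,
    i.e., a weighted sum of the l_{2,1}-norms of the three unfoldings."""
    total = 0.0
    for n, lam in enumerate(lambdas):
        # Mode-n unfolding: mode n becomes the rows, all other modes the columns.
        Cn = np.moveaxis(C, n, 0).reshape(C.shape[n], -1)
        total += lam * np.linalg.norm(Cn, axis=0).sum()
    return total

C = np.zeros((5, 5, 3))
C[2, 2, :] = 1.0     # a single nonzero tube
print(trisps(C))     # three nonzero fibers in modes 1 and 2, one in mode 3
```

For this single-tube example the value is 3 + 3 + sqrt(3), matching the fiber-counting intuition in Remark 1.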
Remark 1. 1. Equation (6) characterizes the sparsity of mode-n (n = 1, 2, 3) fibers, since it is an approximation of the l0-quasi-norm of C with respect to mode-n fibers, #{i : C^i_(n) ≠ 0}, which counts the number of mode-n fibers that are nonzero. Therefore, the proposed TriSps function in Equation (5) can sufficiently take advantage of the sparsity property of the cloud component.
2. The proposed TriSps function reduces to the sparse function proposed by Ji et al. [32] if the weights are λ_1 = λ_2 = 0 and λ_3 = 1, in which case the function only considers the sparsity along the tubes. Unlike the sparse function in [32], our TriSps function is more general and can capture the properties of clouds in all three dimensions.

Proposed Model
Multitemporal images have high correlations among mode-1, -2, and -3 fibers. The tensor rank function [37,38] serves as an effective tool for characterizing the global correlation of the image component U, adeptly capturing the inherent low-rank characteristics of images. In our study, we employ a multi-rank regularization function for U. We transform U into a third-order tensor and establish the regularization function as described below.

∥U∥_mr = Σ_r rank(X_r),

where X_r represents the rth frontal slice of X = T(U) ×_3 Q^T, where T(U) restructures U into a third-order tensor, and Q satisfies Q^T Q = I (I denotes the identity matrix). Additionally, we observe that the images exhibit similarity between adjacent temporal periods, which can be described by ∥D(U)∥_F^2 = ∥D U_(4)∥_F^2. Here, D(•) represents a difference operator, D represents a difference matrix, and U_(4) is the unfolding of U in its fourth mode.
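The temporal similarity term can be sketched as follows, using a first-order difference matrix D acting on the mode-4 unfolding; the exact form of D in the paper may differ, and a forward-difference matrix is assumed here:

```python
import numpy as np

def temporal_smoothness(U):
    """||D U_(4)||_F^2 with a forward-difference matrix D: penalizes
    changes between adjacent temporal slices of U (shape a x b x d x t)."""
    t = U.shape[3]
    U4 = U.reshape(-1, t).T                       # mode-4 unfolding: t x (a*b*d)
    D = np.eye(t - 1, t, k=1) - np.eye(t - 1, t)  # each row is a [-1, +1] pattern
    return np.linalg.norm(D @ U4, 'fro') ** 2
```

The value is zero exactly when every temporal slice is identical, which is the similarity prior being encoded.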
Using the proposed TriSps function and the prior knowledge of the image component, we propose the following cloud removal model:

min_{U,C,X,Q} TriSps(C) + λ Σ_r rank(X_r) + γ∥D(U)∥_F^2
s.t. O = U + C + E, X = T(U) ×_3 Q^T, Q^T Q = I, U = f_bc(O, C, U).

The first term in the objective function represents the prior knowledge derived from the cloud component C, while the last two terms encapsulate the prior knowledge from the image component U; λ and γ denote positive regularization parameters. U = f_bc(O, C, U) represents a box constraint designed to preserve cloud-free details within the image component, ensuring that it remains consistent with the observed data. The adoption of the box-constraint strategy is motivated by the need to maintain the integrity of the cloud-free portions in U: without this constraint, these portions may differ from those in the observed data, degrading the quality of the reconstruction. In this paper, we therefore use the following box-constraint strategy and extend it to the proposed model. The box-constraint function is determined by U = f_bc(O, C, U), whose ith mode-3 fiber is the corresponding fiber of O if avg(C_i) ≤ ϵ, and the fiber of U otherwise, where ϵ ≥ 0 is a given thresholding value, and avg(x) denotes the average value of vector x.
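A hedged sketch of the box-constraint function f_bc follows; the paper's exact fiber-wise rule is not reproduced here, so the rule below is one plausible instantiation of its stated intent, namely that fibers whose average detected cloud value is at most ϵ are copied from the observation:

```python
import numpy as np

def f_bc(O, C, U, eps=1e-2):
    """Sketch of the box constraint: wherever the average of a mode-3
    (spectral) fiber of the cloud component C is <= eps, the fiber of U
    is replaced by the observed fiber of O, so cloud-free regions stay
    consistent with the observation. Illustrative, not the paper's exact rule."""
    U_bc = U.copy()
    fiber_avg = np.abs(C).mean(axis=2, keepdims=True)  # average over spectral mode
    mask = np.broadcast_to(fiber_avg <= eps, U.shape)  # cloud-free fibers
    U_bc[mask] = O[mask]
    return U_bc
```

With no detected cloud anywhere, the output equals the observation; with cloud everywhere, the estimate U is kept unchanged.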

Optimization Algorithm
This subsection outlines the development of the PAM algorithm [33], crafted to solve our proposed model. To facilitate this, we introduce the auxiliary variables M = X, N = C, and S = C and reformulate model (10), where M_r denotes the frontal slice of M corresponding to index r. We denote the resulting objective function by g(U, C, X, M, Q, N, S). Within the PAM-based algorithm framework, we iteratively update the individual variables in an alternating fashion. Here, the superscript s signifies the outcome obtained after the sth iteration, and µ represents a proximal parameter. Each variable can then be updated according to the following procedure.
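The alternating scheme can be sketched generically for two blocks; each block solver is assumed to return the minimizer of the objective in its block plus the proximal term (µ/2)∥x − x^s∥^2. The toy separable objective below is purely illustrative:

```python
def pam(step_u, step_c, u0, c0, mu=1.0, iters=100):
    """Minimal PAM skeleton: alternate proximally regularized block updates.
    step_u / step_c return the minimizer of the objective in one block
    plus (mu/2)*(x - previous)^2."""
    u, c = u0, c0
    for _ in range(iters):
        u = step_u(c, u, mu)   # U-step, anchored at the previous u
        c = step_c(u, c, mu)   # C-step, anchored at the previous c
    return u, c

# Toy objective (u - 1)^2 + (c - 2)^2; each proximal step has a closed form.
step_u = lambda c, u_prev, mu: (2 * 1 + mu * u_prev) / (2 + mu)
step_c = lambda u, c_prev, mu: (2 * 2 + mu * c_prev) / (2 + mu)
u, c = pam(step_u, step_c, 0.0, 0.0)
print(u, c)   # converges to (1, 2)
```

The proximal term damps each update toward the previous iterate, which is what gives PAM its convergence guarantees for nonconvex objectives.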

• Updating the U-subproblem. Since T(•) is a reversible reshape operator, the U-subproblem can be rephrased in third-order form; the third-order tensor is then reformatted to its original four-dimensional form using the inverse operator T^(-1)(•): R^(a×b×dt) → R^(a×b×d×t). The subproblem admits a closed-form solution for U, in which I ∈ R^(t×t) represents the identity matrix. Incorporating the box constraint, we obtain the image component as follows: the matrix is folded back into a tensor by the operator Fold_(4)(•), and the image component U is computed by the box-constraint function U = f_bc(O, C, U).

• Updating the X-subproblem. The X-subproblem is written as Equation (16), which yields a closed-form solution.
• Updating the C-subproblem. The C-subproblem is outlined below.
By integrating the last four terms, the presented problem can be equivalently rewritten with δ = Σ_{i=3}^{5} η_i. Subsequently, the C-subproblem can be dissected into fiber-wise subproblems, and the solution for each tube C^i_(3) is given in closed form, where we define 0/0 = 1.
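Each fiber-wise subproblem here has the form min_x λ∥x∥_2 + (δ/2)∥x − q∥_2^2, whose closed-form solution is group soft-thresholding; the 0/0 := 1 convention corresponds to returning a zero fiber unchanged:

```python
import numpy as np

def fiber_shrink(q, lam, delta):
    """Group soft-thresholding: closed-form minimizer of
    lam*||x||_2 + (delta/2)*||x - q||_2^2 over the fiber x."""
    nrm = np.linalg.norm(q)
    if nrm == 0.0:               # the 0/0 := 1 convention: keep the zero fiber
        return q.copy()
    return max(1.0 - lam / (delta * nrm), 0.0) * q
```

Fibers with small norm are set exactly to zero, which is how the update enforces fiber-wise sparsity of the cloud estimate.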

• Updating the M-subproblem. By including the two final terms, the M-subproblem can be transformed into an equivalent form involving R_r, the frontal slice of R = (η_2 X^(s+1) + µM^s)/(η_2 + µ) corresponding to index r. As a result, the M-subproblem can be effectively addressed through dt subproblems of the form (19), each of which is solved by a singular value thresholding (SVT) operator. Here, σ_i represents the ith singular value on Σ, and the matrices U, Σ, and V^T are obtained from the singular value decomposition of R_r.
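The SVT operator used for each slice can be sketched directly; it also applies to the complex frontal slices that arise in the Fourier domain:

```python
import numpy as np

def svt(R, tau):
    """Singular value thresholding: minimizer of
    tau*||M||_* + (1/2)*||M - R||_F^2, obtained by soft-thresholding
    the singular values of R."""
    Um, s, Vt = np.linalg.svd(R, full_matrices=False)
    return (Um * np.maximum(s - tau, 0.0)) @ Vt
```

Singular values at or below the threshold tau are zeroed, so the output has reduced rank, which is what drives the low-rank estimate of the image component.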

• Updating the N-subproblem. By including the two final terms, the N-subproblem can be written in terms of F = (η_4 C + µN^s)/(η_4 + µ) and divided into fiber-wise subproblems; the solution for each tube N^i_(1) is then obtained in closed form.
• Updating the S-subproblem. Similarly to the N-subproblem, we incorporate the last two terms with B = (η_5 C + µS^s)/(η_5 + µ) and divide the problem into fiber-wise subproblems; the solution for each tube S^i_(2) is then obtained in closed form.
• Updating the Q-subproblem. The Q-subproblem can be addressed by maximizing a trace term, where Tr(•) signifies the trace of a matrix. This subproblem yields a closed-form solution obtained from the matrix formed by the data term plus µQ^s.
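The orthogonality-constrained Q-subproblem is an orthogonal Procrustes problem: maximizing Tr(Q^T W) over Q^T Q = I, for the appropriate matrix W formed from the data term plus µQ^s (its exact composition is not reproduced here), is solved by an SVD:

```python
import numpy as np

def procrustes(W):
    """argmax_{Q : Q^T Q = I} Tr(Q^T W) = U V^T, where W = U S V^T."""
    Uw, _, Vt = np.linalg.svd(W, full_matrices=False)
    return Uw @ Vt
```

The returned Q always has orthonormal columns, regardless of the conditioning of W.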
We outline the algorithm for cloud detection and removal in Algorithm 1.
Algorithm 1 Tri-fiber-wise sparse collaborative low-rank prior knowledge algorithm.

Synthetic Experiments
Initially, we employed multitemporal remote sensing images to showcase the efficiency of our proposed technique in restoring information hidden by cloud cover. We generated four simulated datasets, Munich, Picardie, France, and Beijing, by extracting data from the Sentinel-2 (https://earthexplorer.usgs.gov) and Landsat-8 (https://theia.cnes.fr/atdistrib/rocket/#/home) (accessed on 21 April 2024) collections. The details of these datasets are comprehensively outlined in Table 2 and visually represented in Figures 2-5. We refer to the ground sampling distance as GSD. Various types of cloud masks were applied to these multitemporal images to augment the complexity of the task. To rigorously assess the effectiveness of the proposed method, we employ three quantitative metrics, mean PSNR, mean CC [39], and mean SSIM [40], as established benchmarks for this type of analysis. Superior performance is indicated by higher values across these metrics. In Table 3, we provide a comprehensive quantitative comparison of our method against established methods: TNN, ALM-IPG, TVLRSDC, and BC-SLRpGS. The highest values of PSNR, CC, and SSIM are indicated in bold for clear distinction. Our method consistently surpasses the comparative methods in PSNR across all datasets. Regarding SSIM, our method displays a slight disadvantage compared to the BC-SLRpGS method on the Munich dataset and to the ALM-IPG method on the Picardie dataset, highlighting areas for further improvement. To provide a more intuitive comparison of the methods' performance, we detail the cloud removal results in Figures 6-9. These figures include zoomed-in patches and corresponding residual images for a more nuanced comparison. Our method and BC-SLRpGS successfully reconstructed most cloud-covered information in the Munich dataset, whereas the other methods introduced distortions in image details. In the Picardie dataset, the TVLRSDC method was unable to adequately reconstruct cloud regions, in contrast to TNN, ALM-IPG, BC-SLRpGS, and our proposed method, which yielded satisfactory results. Notably, the TNN and ALM-IPG methods require a predefined cloud mask, potentially incorporating additional information into the reconstruction process. Our method produced darker residual images, as seen at the bottom of Figures 6 and 7, signifying a closer similarity to the cloud-free images and, thus, more effective cloud removal. The exceptional performance of our method can be attributed to the effective utilization of the sparse structure of mode-1, -2, and -3 fibers. For the Beijing dataset, our method, along with ALM-IPG and BC-SLRpGS, achieved promising outcomes, outperforming the TNN and TVLRSDC methods, which were visually inferior. The zoomed-in views in Figure 8 further substantiate our method's ability to reconstruct more detailed and clearer images compared to the other methods. For the France dataset, as illustrated in Figure 9, all tested methods failed to achieve promising results, primarily due to the dataset's intricate details that defy reconstruction through global correlations. Nonetheless, our method yielded notably clearer outcomes than the others and demonstrated superior accuracy in cloud mask detection relative to BC-SLRpGS.
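The "mean" metrics average a per-slice score over spectral bands and temporal nodes; a sketch of mean PSNR under that assumed convention follows (the averaging scheme and the peak value of 1 for normalized data are assumptions, not the paper's stated definition):

```python
import numpy as np

def mean_psnr(ref, est, peak=1.0):
    """Mean PSNR over all (band, time) slices of a x b x d x t tensors.
    The averaging convention and peak value are assumptions here."""
    vals = []
    for k in range(ref.shape[2]):
        for s in range(ref.shape[3]):
            mse = np.mean((ref[:, :, k, s] - est[:, :, k, s]) ** 2)
            vals.append(10.0 * np.log10(peak ** 2 / mse))
    return float(np.mean(vals))
```

A constant offset of 0.1 on data in [0, 1] gives an MSE of 0.01 per slice and hence a mean PSNR of 20 dB, a useful sanity check.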

Real Experiments
Two real datasets, namely, the Eura and Morocco data, as outlined in Table 4 and visualized in Figures 10 and 11, were used to assess the efficacy of our method. The reconstructed results for the Eura and Morocco datasets using various methods are depicted in Figures 12 and 13, respectively. From Figure 12, it is evident that the TNN, ALM-IPG, and TVLRSDC methods fail to effectively remove the clouds and reconstruct the cloud-covered information, whereas both BC-SLRpGS and the proposed method perform well in this respect. From the zoomed-in views in Figure 12, it is evident that our method delivers smoother results and captures more detail than the BC-SLRpGS method. As shown by the visual comparisons in Figure 13, our method demonstrates enhanced performance compared to the other methods. At temporal node 1 in Figure 13, the proposed method effectively eliminates the cloud, while the result of TVLRSDC still exhibits some discrete clouds. Furthermore, as displayed in the zoomed-in view in Figure 13, our method produces a clearer reconstruction than the other compared methods. At temporal node 2 in Figure 13, we find that the proposed method removes the cloud areas well and preserves the cloud-free areas, whereas the TNN and ALM-IPG methods change the cloud-free information during the reconstruction process. In conclusion, the proposed method outperforms the other methods in terms of both quantitative metrics and visual quality.

Conclusions
We have introduced a novel tri-fiber-wise sparse function to characterize the cloud component. By leveraging this sparse prior, our method excels at detecting clouds and effectively reconstructing contaminated values, providing robust results. Moreover, we described the global prior of the image component by the tensor multi-rank. Utilizing the introduced sparse and low-rank functions, we developed a cloud removal model that incorporates a box-constraint strategy. This model not only effectively detects clouds but also simultaneously estimates the missing information. The experiments demonstrate the superior effectiveness of the proposed method compared to other cloud removal techniques.

Figure 1. (a) Observed cloud-contaminated image, (b) cloud component, (c-e) mode-1, -2, and -3 fiber-wise sparsity, i.e., the l2-norms of the mode-1, -2, and -3 fibers from (b), respectively.

The contributions of this paper are threefold:
• To leverage the inherent structural sparsity of cloud components, we propose the TriSps function specifically for cloud removal purposes. Unlike element-wise and tube-wise sparsity, the TriSps function is designed to capture the properties of clouds in three-dimensional directions more effectively.
• Building upon the TriSps function, we propose a novel cloud removal model that simultaneously estimates both the image and cloud components.
• We devised an effective algorithm based on PAM to solve our model. Experiments with synthetic and real datasets highlight the proposed method's prowess in cloud removal, outperforming other advanced methods currently available in the field.

Figure 6. The outcomes of cloud removal using various methods on the Munich dataset. The corresponding zoomed-in patches accompanying each image are depicted at the bottom.
Figure 7. The outcomes of cloud removal using various methods on the Picardie dataset. The corresponding zoomed-in patches accompanying each image are depicted at the bottom.

Figure 8. The outcomes of cloud removal using various methods on the Beijing dataset. The corresponding zoomed-in patches accompanying each image are depicted at the bottom.
Figure 9. The outcomes of cloud removal using various methods on the France dataset (from left to right: Original, Observed, TNN, ALM-IPG, TVLRSDC, BC-SLRpGS, Proposed).

Table 2 .
Multitemporal remote sensing images for synthetic experiments.

Table 3 .
Quantitative metrics for simulated data. The highest values are emphasized in bold.

Table 4 .
Multitemporal remote sensing images for real experiments.