Hyperspectral Anomaly Detection Based on Improved RPCA with Non-Convex Regularization

: The low-rank and sparse decomposition model has been favored by the majority of hyperspectral image anomaly detection personnel, especially the robust principal component analy-sis(RPCA) model, over recent years. However, in the RPCA model, (cid:96) 0 operator minimization is an NP-hard problem, which is applicable in both low-rank and sparse items. A general approach is to relax the (cid:96) 0 operator to (cid:96) 1 -norm in the traditional RPCA model, so as to approximately transform it to the convex optimization ﬁeld. However, the solution obtained by convex optimization approximation often brings the problem of excessive punishment and inaccuracy. On this basis, we propose a non-convex regularized approximation model based on low-rank and sparse matrix decomposition (LRSNCR), which is closer to the original problem than RPCA. The WNNM and Capped (cid:96) 2,1 -norm are used to replace the low-rank item and sparse item of the matrix, respectively. Based on the proposed model, an effective optimization algorithm is then given. Finally, the experimental results on four real hyperspectral image datasets show that the proposed LRSNCR has better detection performance.


Introduction
The Hyperspectral sensing image (HSI) integrates spectrum and spatial information and is a kind of three-dimensional image data [1][2][3]. Compared with single-band images, hyperspectral images contain richer spectral information. Hyperspectral anomaly detection is an important research direction in hyperspectral image processing. It is currently widely used in reconnaissance and environmental monitoring [4][5][6]. The purpose of hyperspectral anomaly detection (HAD) is to extract the target information (anomaly information) from the background from the influence [7]. Different from traditional target detection, no prior information of the target is required [8,9]. Assuming that the main objects in the image scene are background information, the probability of anomaly objects appearing in the whole image is often very low [10][11][12].
In 1990, the linear RX method was proposed by Reed [13] et al., it is a pioneering algorithm for hyperspectral anomaly detection [14]. This algorithm divides the HSI into a background information part and binary classification problem to be detected, which solves the problem of anomaly detection. On this basis, many classic anomaly detection algorithms have been proposed, such as the Local RX(LRX) [15] algorithm, which uses a local sliding window to estimate the reference background, the weighted RX (WRX) [16] method, which aims to reduce the influence of anomalies on the covariance matrix when estimating background statistical data, and the Kernel RX (KRX) [17], which maps the original space to a high-dimensional feature space by using nonlinear kernel functions, so it is easier to distinguish anomalies from background pixels in this feature space. However, attributing to randomness and complexity of background, the established background model is not easy to describe the complex background, resulting in a high false alarm detection rate of RX algorithm [18].
Due to its complete theoretical knowledge and operability, the method derived from low-rank sparse matrix decomposition (LRaSMD) [19][20][21] has attracted increasing attention in the field of HSIs anomaly detection.
LSMAD (the LRaSMD-based Mahalanobis Distance Method) greatly utilizes LRaSMD for hyperspectral anomaly detection, the Godec [22] method is used to separate background and sparse components, and Mahalanobis distance is used to measure similarity [23]. Assuming that the background data is located in Low-Rank and Sparse Representation (LRASR) [24] in multiple low-rank subspaces, a background dictionary training method is proposed to separate outlier pixels through the trained background dictionary.
The LSDM-MoG (low rank and sparse decomposition model via mixture of Gaussian) method proposes to fit the sparse components in the image through a mixture of Gaussian distribution so as to obtain more accurate detection results [21]. At the same time, considering the three-dimensional data characteristics of hyperspectral images, some researchers proposed to use third-order tensors to characterize hyperspectral images, and good detection results have also been achieved [25].
Robust Principal Component Analysis (RPCA) [26] was proposed in 2009 to better solve the problem that background information is easily affected by noise and gross errors in traditional principal component analysis. At present, scholars in the field of hyperspectral image anomaly detection have carried out extensive research on the RPCA model. The original RPCA-RX [27] algorithm treats hyperspectral remote sensing image data as a two-dimensional matrix A and uses matrix decomposition to decompose it as a low-rank item L and a sparse item S, the former of which is background and the latter is a non-zero element that contains the anomaly information of the image. Finally, the classic RX detector is then applied to the sparse item. The following are robust PCA optimization problems: where rank(L) is the rank of L, S 0 is the 0 operator of matrix S, which represents the number of elements in S that are not zero, and λ represents a positive trade-off parameter, due to the goal in (1). The function is non-convex and non-continuous, and solving the 0 operator in Equation (1) is NP-hard. It is the most common and traditional method to relax the 0 operator to the 1 -norm in most academic research. Although 1 -norm is widely studied and applied in sparse learning, it may not be optimal in most sparse items because slack approximation of the 0 operator to 1 -norm often leads to over-penalization. Later, some scholars proposed many non-convex regularizers to solve this problem in order to better approximate the 0 operator, such as Smoothly Clipped Absolute Deviation (SCAD) [28], Minimax Concave Penalty (MCP) [29], p -norm (0 < p < 1) [30], Log-Sum Penalty (LSP) [31], Laplace [32], and Capped 1 penalty [33]. They are defined and visualized in Table 1 and Figure 1. These penalty functions have a common feature: they are all non-convex and monotonically unreducing on (0, +∞). Thus, their gradients are non-negative and monotonically decreasing.
The low-rank is an extension of matrix singular value sparsity. Recently, some scholars have done a lot of work on low-rank approximation methods used in predictions tasks and have made outstanding contributions to complex fluid dynamics problems combined with deep learning methods [34,35]. However direct rank minimization is likewise NP-hard and difficult to solve. The principal method is to apply the nuclear norm [36,37]. The rank function is approximated by this problem normally by minimizing the estimated matrix nuclear norm, that is, by minimizing the matrix rank convex relaxation [38].

Penalty
Formula g(θ) ≥ 0, ω >0 Gradient This relaxes Problem (1) into the following problem: The nuclear norm of a matrix L is defined as the sum of its singular values, i.e., ||L|| * = ∑ i σ i (L), where σ i is the i-th singular value of A. Significant attention has been being given to the nuclear norm minimization (NNM) method due to its rapid development in both matrix decomposition and matrix recovery [39]. However, even NNMs have a few disadvantages. In this method, all singular values are treated equally and shrink at the same threshold. Clearly, this NNM method, as well as its corresponding soft-thresholding solvers, are inflexible. Therefore, instead of the rank norm, we used the weighted nuclear norm (WNN), called weighted nuclear norm minimization (WNNM), which is more flexible than NNM. The WNN of a matrix L is represented by ||L|| w, * = ∑ i w i σ i (L) and w = [w 1 , w 2 , . . . , w n ] T . The representation capability of the original nuclear norm was enhanced by the weight vector.
In addition, we also found that WNNM, IRNN (iteratively reweighted nuclear norm), and LSP had the same display effect in the matrix recovery, decomposition, image denoising, etc. The relevant proof can be found in [40,41]. LSP usually performs better than other nonconvex surrogates. However, LSP still needs to be scaled iteratively, and WNNM can get the optimal solution directly.
As can be seen from Figure 2a, there is a very large difference in each singular values of the low-rank item, which often dramatically decreases at an exponential level, e.g., 10 3 ∼10 −1 . However, the non-zero value of the sparse item of the hyperspectral image indicates anomalies, which tend to be smaller, e.g., 0.6∼0.05, as shown in Figure 2b. The predecessors mainly used the same low-rank or sparse non-convex regularization. In light of the characters of the priors above in the hyperspectral image, two different types of regularization for low-rank and sparse were proposed, respectively. Specifically, we replaced the nuclear norm with a weighted nuclear norm for a low-rank item. We then replaced the 1 -norm with Capped 2,1 -norm for a sparse item. This is due to the weighted nuclear norm being equal to the sum of the logarithm of a singular, which is more suitable for the situation when the singulars change dramatically. Similarly, the Capped 2,1 -norm is more suitable for the situation when the variables become smaller. We also added groupsparseness to our model to incorporate the sparsity into the pixel rather than the spectral intensity of the pixel. The three main contributions of the proposed HSI anomaly detection method are listed as follows.
(1) The rank function of a low-rank item is replaced by a weighted nuclear norm; (2) The Capped 2,1 -norm is used to replace the 0 operator of the sparse item; (3) The proposed method adopts improved RPCA models to detect anomalies, anomalies are modeled by the sparse component, and background is modeled by the low-rank component. The experimental results on four real HSI datasets show that the proposed LRSNCR method has better detection performance than other methods and can better separate the background and anomalies.

Methodology
In this section, LRSNCR is proposed from HAD in the article and its optimization algorithm is introduced in detail. Firstly, the HSI cube data was rearranged as an input to LRSNCR. Secondly, the anomaly was separated from the background by getting the utmost out of the idea of matrix factorization to solve the problem of constrained convex optimization. Meanwhile, the weighted nuclear norm and Capped 2,1 -norm were used to replace the rank function in a low-rank item and the 0 operator in a sparse item, respectively. Thirdly, the sparse and low-rank components were modeled and solved separately. The proposed LRSNCR architecture is shown in Figure 3. Reformulating (1) leads to the following LRSNCR model: In general, if only Equation (3) is transformed into a convex problem, there are a lot of methods that can be used to solve it. Here, we only introduce an augmented Lagrangian multiplier algorithm, namely Alternating Direction Method of Multipliers (ADMMs [42,43]).
For the optimization problem (3), the augmented Lagrangian function is first constructed: where Y is the Lagrangian multiplier, it is the weight of the sparse error term in the cost function, and also is the given parameter, and µ is a positive scaler. • F is the Frobenius norm of the matrix when Y = Y k , µ = µ k . Fixed L,Y, update S: The original problem can then be recast into two sub-problems: q 1 and q 2 . This is equivalent to decomposing a non-convex set into two convex sets to solve. The global minimum point is obtained by comparing the local minimum in two convex sets [40].
The solutions of q 1 and q 2 are e 1 and e 1 , respectively, which are the the local minimum points.
The global minimum point is determined by Equation (10).
and θ is a parameter to be set. For each element of u j , sign(u j ) is the sign function. Fixed S, Y, update L: where ∑ i w i σ i (L) is the WNN of matrix L. These include: w i , the weight (zero or positive number), defined for σ i (L) and σ i (L), which is the i-th singular value of matrix L.
In [44], let A =U ∑ V T be the Singular Value Decomposition (SVD) of the observation , the reweighting formula w i = C σ i (L)+ε , with the initial estimation L 0 = A, and the optimization problem min L A − L 2 F + L w, * (WNNM) has the closed-form solution in a low-rank item, which is Fixed L, S, update Y: update µ: where µ is a positive scaler so that the objective function is only perturbed slightly. Since the LRSNCR model is non-convex for general weight conditions, we used an unbounded µ k to guarantee the convergence of the proposed method. We did not want the iterations to be stopped very quickly in case µ k increased too fast, which would be bad for our model. Therefore, in our proposed method, a small value of ρ was used to constrain the problem of excessive growth of µ k . The LRSNCR is summarized in Algorithm 1 (https://github.com/yoyoath/LRSNCR, accessed on 1 January 2022). Here, the orders of updating L and S can be changed.

Experimentation Results and Discussion
In this section, the availability of the LRSNCR model for HAD is expounded by the analysis and discussion of the experimental results. The algorithms and processes in this paper were implemented using MATLAB language in a PC that was powered by Windows 10 and Core i7-1165 CPU @2.80 GHz by Intel with 16 GB RAM.

Hyperspectral Datasets
In the experiment, we used some real hyperspectral images to verify the effectiveness of this method in anomaly detection.
(1) ABU(Airport-Beach-Urban)-Urban: The first dataset was from HSI in the ABU dataset, which contained 13 different hyperspectral image scenes, of which Urban was one [45]. These images of size 100 × 100 × 207 were collected on the Rexas Coast, had a resolution of 17.2 m, and were extracted from the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) website. The anomaly object in this image is a cluster of buildings. The pseudo-color image and ground-truth(GT) map of this dataset is displayed in Figure 4.
(2) ABU (Airport-Beach-Urban)-Beach: The second HSI dataset was the beach scenes and consisted of 100 × 100 pixels with 180 bands, and like the source of the first dataset, it was also obtained by (AVIRIS) sensor [46]. Anomalies consisted of man-made reefs in this scene. The pseudo-color image and GT map of this dataset is displayed in Figure 5.
(3) SpecTIR: The third experimental data were derived from the SpecTIR hyperspectral airborne Rochester experiment [47]. The detailed experimental information for this dataset was 180 × 180 pixels, 120 bands, 5 nanometer spectral resolution, and 1 meter spatial resolution. In this dataset, noise and useless bands were eliminated in advance where anomalies were forged by artificial-colored textile materials. The pseudo-color image and ground-truth map of this dataset is displayed in Figure 6.
(4) Sandiego: The fourth dataset, the San Diego scene, came from the same source as the first two datasets, also provided by (AVIRIS) [48]. The dataset had a spatial resolution of 3.5 and used a partial scene of the Sandiego airport in California (located in the US) in the experiments. The original image contained a total of 224 bands, covering 3,702,510 nm. In total, 189 bands were used in this article. Its space size was 100 × 100 pixels. The three aircraft above the image were considered anomalies, occupying a total of 58 pixels. The pseudo-color image and ground-truth map of this dataset is displayed in Figure 7. As shown in Figure 7a, we selected three bands from 189 bands in the Sandiego dataset to display as pseudo-color images. As shown in Figure 7b, the white pixel was the target or anomaly information, the exception in this image was the aircraft, and the background was a black pixel. It was used to compare with our test results.

Evaluation Metrics
We introduced the detection map, anomaly background separability map, ROC curve, and AUC value to carry out a qualitative and quantitative analysis of LRSNCR [49]. The anomaly background separability map and detection map can qualitatively evaluate the proposed detection performance of LRSNCR.
The ROC curve is a quantitative HAD and evaluation technical index based on the anomaly target reference information provided in the true value map marked by the anomaly point position in the hyperspectral image, and the detection results obtained by the HAD method were based on the detection value. In general, the false alarm rate (FAR) was used as the abscissa and the probability of detection (PD) was used as the ordinate in the ROC curve, respectively.
The definition of FAR and PD is: 18) where N f d symbolizes the background pixel counts that are incorrectly judged as anomaly pixel counts and N a symbolizes the total counts of pixels in HSI. N cd represents the detected anomaly pixel counts and N t represents the total counts of anomaly target pixel in the HSI. The AUC value is often used as an important performance assessment indicator to estimate the algorithm performance under different parameters. Ideally, this value is 1. The closer the value is to 1, the better the algorithm performance under this parameter. The calculation formula of AUC is: (19) where F ROC represents the ROC curve function.

Parameter Tuning
LRSNCR had some parameters to tune, including the weighted C in WNNM and other parameters (ρ, λ, and θ). With AUC as the evaluation index, we tested the effect of different parameters on the performance of LRSNCR on ABU-Urban, ABU-Beach, and SpecTIR datasets.
In our proposed method, ρ was not a very large value, according to prior knowledge and the previous chapter. Thus, we searched the best ρ, varying from 0.5 to 1.5 at intervals of 0.05 in Figure 8. In all datasets, we found that the detection performance was very poor when ρ was between 0.5 and 1.05 (short of 1.05). When ρ reached 1.05, the AUC value was higher and the detection performance was the best. Therefore, in all subsequent experiments, the ρ of LRSNCR was set to a fixed value of 1.05. We also dynamically adjusted the influence of different C, λ, and θ changes on AUC.
They were fixed on a scale of 10 −4 , 10 −3 , 10 −2 , 10 −1 , 10 0 , 10 1 , 10 2 , 10 3 , 10 4 , respectively. Firstly, C and θ were tuned with other parameters fixed in Figure 9. Then λ and θ were tuned with other parameters fixed in Figure 10. The parameters adjustment process in this part was all carried out in the ABU-Urban dataset.
Aiming to further study the effect of these three parameters for AUC values, the different effects of each parameter for AUC values were explored on three datasets, which are shown in Figures 11-13. It was observed that the weighted C was more sensitive than the λ and θ parameters according to AUC value. Therefore, in the following experiments, in all datasets, λ was 1 and θ was 10, respectively, and the parameter C still needed to be adjusted separately in each datasets to obtain a better good experimental results.

Detection Performance and Discussion
We analyzed the detection performance in our proposed LRSNCR, which done using two sets of comparative experiments. Firstly, the fixed low-rank term remained unchanged using the WNNM method, and some mainstream penalty functions were selected to replace the 0 operator in the sparse item, such as SCAD, MCP, and 2,1 , and were called WNNM-SCAD, WNNM-MCP, and WMMN-L2,1. In addition, the same method (WNNM or Capped 2,1 ) was used to replace the rank function of the low-rank term and the 0 operator in the sparse item, which was named WNNM-WNNM, Capped L2,1-Capped L2,1. Using SpecTIR dataset as an example, we evaluated their detection capabilities based on ROC and AUC. Figures 14a and 15a show that the ROC curve of the LRSNCR algorithm almost wrapped the curves of other methods, indicating that the performance of the LRSNCR algorithm on the ROC curve far exceeded the other methods. In Figures 14b and 15b, AUC values of the proposed LRSNCR were also relatively high. Therefore, whether for the low-rank item using the Capped 2,1 -norm or the sparse item using other penalty functions (or WNNM), the LRSNCR had the better detection performance.  Secondly, to further analyze the detection of performance based on the proposed LRSNCR, we contrasted the detection performance between some classical hyperspectral image anomaly detection methods, such as GRXD [13], LRXD [15], and LRaSMD-based methods, such as LRASR [24], LSMAD [23], and RPCA-RX [27]. Figures 16-19 illustrate the detection diagrams by the different methods on each datasets. The higher the value in the graph, the brighter the pixel, which means the greater the chance of being an anomaly pixel. As shown in Figure 16, our proposed method could detect most of the anomalies in the dataset and could better suppress the background. Other methods can only detect some anomaly, and LRXD has the worst detection performance. In Figures 17 and 18, although all methods could detect anomalies, it was obvious that only our method had the lowest FAR. In Figure 19, only our proposed method could fully detect the three aircraft located in the top right-hand corner, while others could not. Therefore, it can be concluded that the proposed LRSNCR is superior to other methods, with a better capability of separating background from anomaly.    As shown in Figure 20a, the PD of LRSNCR was bigger than others in the beginning. After continuing, LRSNCR still had a high probability detection and low FAR. The curves produced by GRXD, LSMAD, and RPCA-RX were intertwined and mediocre, while LRXD and LRASR tended to be closer to the bottom right and performed less well. As shown in Figure 21a, the curve produced by LRSNCR demonstrated a higher PD and a lower FAR than the other detection methods. LRSNCR had a FAR of approximately 0.001 while achieving a 100% probability of detection. Therefore, it was more convincing than others. In Figure 22a, the curve of LRSNCR had a significant trend, wrapping the other curves, and the rest of the curves were intertwined, except for the LRSNCR. In addition, after FAR = 0.008, the LRSNCR had a larger PD. In comparison with others, LRSNCR was at an obvious advantage. As shown in Figure 23a, the curves of GRXD, LRXD, LRASR, LSMAD, and RPCA-RX were also intertwined, which meant that the effect of LSMAD was better but not as good as that of LRSNCR. These curves were always below that produced by LRSNCR. Almost all of the ROC curves of the LRSNCR method can wrap around those of other methods, with a higher detection rate but a lower FAR. This shows that the performance of LRSNCR method on ROC curve is much better than that of other methods.    In the box-plot, after first processing the data regarding anomaly and background, two rectangles of the separability graph were obtained, in which red rectangle indicated the anomaly and the blue rectangle indicated the background. The separability between the anomaly and the background was determined by the distance between the red and blue rectangles. The larger distance between red and blue rectangles of LRSNCR than that of other comparison methods, as shown in Figure 20b, indicates that LRSNCR can better separate anomalies from the background and is more convincing. The box-plot shown in Figures 21b-23b, also indicates that the proposed LRSNCR can easily identify desired anomalies from background. Table 2 lists the objective index AUC values obtained by anomaly detection of each data set by different methods. The AUC for the proposed LRSNCR on the four datasets were 0.9991, 0.9999, 0.9995, and 0.9903, respectively. Furthermore, the second largest AUC values on the four datasets were 0.9957, 0.9998, 0.9976, and 0.9778, which were not higher than our method. In each hyperspectral data set, the execution time of proposed method was 49.83561, 22.08689, 27.86498, and 19.45832, respectively. Table 3 provides the execution time of various detection methods. We proposed LRSNCR to be in the middle and lower level in execution time and did not have much advantage in computing time, because our method requires iterative optimization.

Conclusions
In this article, an improved RPCA with non-convex regularized was proposed for HAD through the research and improvement of the RPCA problem. The rank norm and the 0 operator were replaced with the WNNM and non-convex Capped 2,1 -norm. Experiments with four hyperspectral datasets demonstrated that, by using the LRSNCR method, discrimination between anomalies and background was enhanced. Compared with classical GRXD, LRXD and LRASR and LSMAD and RPCA-RX detectors in four real hyperspectral datasets, it was still better than others in terms of detection effectiveness and stability by the proposed LRSNCR.
The proposed method, based on low-rank and sparse joint non-convex regularization, was different from many improvements based on low-rank or sparse terms. Experimental results also show that our method achieved good results but still has some limitations. When dealing with large-scale hyperspectral image data, the iterative optimization speed of the algorithm may be less dominant, so reducing the running time or speeding up the convergence speed is an important aspect worth considering. In recent years, tensorbased and deep learning-based methods have been widely used in hyperspectral anomaly detection. In the future, we will consider low-rank approximation and deep learning or methods combined with tensors.