Robust Infrared Small Target Detection via Jointly Sparse Constraint of l 1 / 2 -Metric and Dual-Graph Regularization

: Small target detection is a critical step in remotely infrared searching and guiding applications. However, previously proposed algorithms would exhibit performance deterioration in the presence of complex background. It is attributed to two main reasons. First, some common background interferences are di ﬃ cult to eliminate e ﬀ ectively by using conventional sparse measure. Second, most methods only exploit the spatial information typically, but ignore the structural priors across feature space. To address these issues, this paper gives a novel model combining the spatial-feature graph regularization and l 1 / 2 -norm sparse constraint. In this model, the spatial and feature regularizations are imposed on the sparse component in the form of graph Laplacians, where the sparse component is enforced as the eigenvectors of their graph Laplacian matrices. Such an approach is to explore the geometric information in both data and feature space simultaneously. Moreover, l 1 / 2 -norm acts as a substitute of the traditional l 1 -norm to constrain the sparse component, further reducing the fake targets. Finally, an e ﬃ cient optimization algorithm equipped with linearized alternating direction method with adaptive penalty (LADMAP) is carefully designed for model solution. Comprehensive experiments on di ﬀ erent infrared scenes substantiate the superiority of the proposed method beyond 11 competitive algorithms in subjective and objective evaluation.


Introduction
Small target detection is a pivotal technique in infrared search and tracking applications, such as precise guidance and antimissile systems, maritime target search equipment, and small unmanned aerial vehicle surveillance systems [1][2][3]. The main purpose of small target detection is to search and locate potentially suspicious targets in the distance as early as possible, which facilitates people to take adequate preparations for emergencies. In long-range infrared scenes, projected targets only possess one or few pixels in size let alone other concrete discriminating features, i.e., texture, edge, contour, and might even be buried in diverse interferences or heavy noise. Additionally, their visibility varies greatly depending on target type and background environment. Although great advances have been made for detecting small target in recent decades, it remains a formidable task due to the above challenges.
Generally, relying on whether the prior information like target velocity and trajectory is applied or not, existing detection algorithms can be roughly viewed as sequential detection or single-frame detection. Traditional sequential detecting schemes present acceptable performance under given conditions [4][5][6]. However, they degrade performance possibly in the absence of the priors. Small target The main contributions of this paper are summarized in the following: (1) We propose a novel model based on dual-graph regularization for infrared small target detection, which simultaneously incorporates both the data and feature manifold in the form of graph Laplacian.
(2) To eliminate fake targets effectively, l1/2-norm instead of l1-norm in traditional methods is used to constrain the sparse part. Additionally, a non-negative constraint in sparse component is appended to cater to the fact that targets have higher intensity.
(3) To accelerate the efficiency of the proposed algorithm, we skillfully design an efficient optimization algorithm based on the linearized alternating direction method with adaptive penalty (LADMAP) [31], which uses fewer auxiliary variables with convergence guarantee. Extensive experiments on various scenes demonstrate the superiority of the proposed model compared with 11 competitive baselines.
The remainder of this paper is organized as follows. We have a brief review about graph representation for data as well as about the methods related to infrared small target detection in Section 2. The proposed dual-graph regularized method is given detailed in Section 3. We propose a simple and feasible optimization algorithm to solve the proposed model in Section 4. The performance of the proposed method is evaluated by extensive experiments in Section 5. Finally, our discussion and conclusions are presented in Sections 6 and 7, respectively.

Graph Laplacian
Suppose that  The main contributions of this paper are summarized in the following: (1) We propose a novel model based on dual-graph regularization for infrared small target detection, which simultaneously incorporates both the data and feature manifold in the form of graph Laplacian.
(2) To eliminate fake targets effectively, l 1/2 -norm instead of l 1 -norm in traditional methods is used to constrain the sparse part. Additionally, a non-negative constraint in sparse component is appended to cater to the fact that targets have higher intensity.
(3) To accelerate the efficiency of the proposed algorithm, we skillfully design an efficient optimization algorithm based on the linearized alternating direction method with adaptive penalty (LADMAP) [31], which uses fewer auxiliary variables with convergence guarantee. Extensive experiments on various scenes demonstrate the superiority of the proposed model compared with 11 competitive baselines.
The remainder of this paper is organized as follows. We have a brief review about graph representation for data as well as about the methods related to infrared small target detection in Section 2. The proposed dual-graph regularized method is given detailed in Section 3. We propose a simple and feasible optimization algorithm to solve the proposed model in Section 4. The performance of the proposed method is evaluated by extensive experiments in Section 5. Finally, our discussion and conclusions are presented in Sections 6 and 7, respectively.

Graph Laplacian
Suppose that X ∈ R d×n resides on a potential manifold M, an undirected weighted graph G(X, E, W) with n vertices can be constructed via k-nearest neighboring manner, as shown in Figure 2, where E = e ij is the edge set with each edge e ij connecting vertices x i and x j . W = w ij denotes edge weight set measuring the correlation between vertices, which can be calculated by binary method, heat kernel, or correlation distance. If vertices x i and x j are not in a cluster set, then w ij x i , x j = 0. The unnormalized graph Laplacian matrix L ∈ R n×n corresponding to graph G is presented as where H is the degree matrix with each entry H ii = j W ij . It can be expressed mathematically as follows:

Related Algorithms
Significant advances in single-frame infrared small target detection have been made in recent decades. Existing methods can be roughly classified into background prediction-based, local saliency-based, transform domain-based, dictionary learning, multiple features integration, deep learning, and subspace learning.
Background prediction-based methods model the appearance of each pixel by numerical statistics, such as a filtering manner [8,9,32] or using a parametric probability density function, such as Gaussian mixture function [7,33], and non-parametric algorithms, such as kernel density estimation functions [10]. These methods can locate small targets precisely when the background presents uniformity visually but may fail to deal with abrupt structures in heterogeneous scenes. Local saliency-based methods delineate the difference based on the local region around suspicious targets to enhance the target and suppress background, such as local contrast measure (LCM) [11], novel local contrast method (NLCM) [34], multiscale patch-based measure (MPCM) [12], derivative entropy-based contrast measure (DECM) [35], weighted local difference measure (WLDM) [13], dualwindow local contrast method [14]. Such types of methods successfully enhance the dim small target and neglect the smooth areas of background, increasing the detection rate. However, they are less robust to sun glints and high-contrast edges in intricate sea backgrounds, causing high false alarm rates. Transform domain-based models explore more useful features in different domains, such as Fourier domain [1], fuzzy space [36], gradient vector field [37], to discriminate real target components from such target-like ones. The kind of methods are computational friendly and can complete the task of target extraction in a clean scene with high contrast. However, they are incapable of making a distinction between a dim small target and heavy natural noise. Dictionary learning methods such as those proposed by Li et al. [38] and Wang et al. [39], recognize the real target from several candidates derived from a given dictionary. These methods can deal with different types of small targets well, yet rely heavily on the quality of the given target dictionary. They would obtain unsatisfactory performance when actual targets cannot be represented by a dictionary atom or their combination. Methods based on multiple features integration overcome the drawbacks of the simple feature depicted by raw pixels. For example, Qin et al. [15] and Yao et al. [16] gradually remove clutter

Related Algorithms
Significant advances in single-frame infrared small target detection have been made in recent decades. Existing methods can be roughly classified into background prediction-based, local saliencybased, transform domain-based, dictionary learning, multiple features integration, deep learning, and subspace learning.
Background prediction-based methods model the appearance of each pixel by numerical statistics, such as a filtering manner [8,9,32] or using a parametric probability density function, such as Gaussian mixture function [7,33], and non-parametric algorithms, such as kernel density estimation functions [10]. These methods can locate small targets precisely when the background presents uniformity visually but may fail to deal with abrupt structures in heterogeneous scenes. Local saliency-based methods delineate the difference based on the local region around suspicious targets to enhance the target and suppress background, such as local contrast measure (LCM) [11], novel local contrast method (NLCM) [34], multiscale patch-based measure (MPCM) [12], derivative entropy-based contrast measure (DECM) [35], weighted local difference measure (WLDM) [13], dual-window local contrast method [14]. Such types of methods successfully enhance the dim small target and neglect the smooth areas of background, increasing the detection rate. However, they are less robust to sun glints and high-contrast edges in intricate sea backgrounds, causing high false alarm rates. Transform domain-based models explore more useful features in different domains, such as Fourier domain [1], fuzzy space [36], gradient vector field [37], to discriminate real target components from such target-like ones. The kind of methods are computational friendly and can complete the task of target extraction in a clean scene with high contrast. However, they are incapable of making a distinction between a dim small target and heavy natural noise. Dictionary learning methods such as those proposed by Li et al. [38] and Wang et al. [39], recognize the real target from several candidates derived from a given dictionary. These methods can deal with different types of small targets well, yet rely heavily on the quality of the given target dictionary. They would obtain unsatisfactory performance when actual targets cannot be represented by a dictionary atom or their combination. Methods based on multiple features integration overcome the drawbacks of the simple feature depicted by raw pixels. For example, Qin et al. [15] and Yao et al. [16] gradually remove clutter interferences and highlight target by combining background consistency and target singularity. Additionally, a multiscale adaptive difference and variance difference are jointly used to enhance small targets and alleviate the impact of background fluctuation [40]. Methods of this type effectively eliminate the edge clutters and pixelwise noise with high brightness in heterogeneous scenes. However, they become less effective when encountering dim small target scenes, resulting in missing targets with high probability. Recently, deep convolutional neural networks have been employed for the community of small target detection [41][42][43][44]. Lin et al. [42] designed a seven-layer conventional neural network in an end-to-end way to automatically extract small target features and eliminate clutter. With the help of massive training samples generation, Zhao et al. [43] suggested a simple conventional network for modeling the background patches. Such methods show good robustness even in some complex situations with heavy clutter but they require a great quantity of labeled training data, which may not always be available in practice. In contrast, the proposed method is unsupervised and does not need any labeled training data.
The proposed method falls into the category of a subspace learning model, so we review the precious subspace learning methods based on robust principal component analysis (RPCA) for small target detection. In [19], RPCA is initially applied to separate the outliers in data, which also is employed for infrared small detection in [18], called infrared patch image model (IPI). Many researchers have put forward effective optimization, enhancements, extensions, and ameliorations of the original IPI model. For the model, one of the limitations is long consuming time due to the slow convergence of the optimization based on accelerated proximal gradient (APG). The alternating direction method of multiplier (ADMM) optimization scheme is used in the models proposed recently since it can get same optimal solution under faster convergence [21,22], [45,46]. To mine more nonlocal self-correlation information from patch models, some extended versions of the IPI model have been proposed, including infrared patch-tensor model (IPT) [21], spatial-temporal patch-image model (STPI) [7], spatial-temporal tensor model (STTM) [47], multi-subspace learning model (SMSL) [20]. Among them, IPT and SMSL perform well in some complex scenes, but they may obtain high false alarms in sea backgrounds with heavy waves and sun glint. The models in [7,47] take spatial-temporal information into account and increase the detection probability of dim small targets in slowly changing backgrounds. Some ameliorated versions are proposed to further improve the robustness of the initial IPI model. For instance, Dai et al. [22] proposed a weighted IPI model (WIPI), which used the target likelihood coefficient based on steering kernel instead of the constant weight. In [23,24], the nonconvex and tighter rank surrogate acts as a substitute for the original nuclear norm to achieve better background suppression. Besides, in the enhancement type of the IPI model, Wang et al. [45] used the total variation regularization (TVPCP) to depict the background feature, which aimed to obtain good target-background separation for some mild situations. In [46], we proposed a combination of nonconvex rank approximation and graph regularization (GRLA) to take full use of the intrinsic structure between patch images.
Difference to related existing subspace learning based methods: As a subspace learning method, our proposed model differs from the aforementioned ones in several aspects. (1) Our method incorporates prior information in both spatial and feature spaces of patch images simultaneously, whereas other methods [21][22][23][24] only take the priors within the patch space into account, ignoring the feature space.
(2) Our method employs the sparser regularizer instead of commonly used reweighting manner [22][23][24] to enhance the sparsity of small targets, so as to suppress target-like outlier better. (3) Our method uses LADMAP to give the model solution, while other methods apply the traditional ADMM [45,46], which will introduce too many multipliers, leading to increasing time consumption. The detailed characteristics of the existing methods are summarized in Table 1.

Algorithm Description
In this section, we describe the major steps of the proposed model in detail and give its system diagram in Figure 3. An infrared small target image is transformed into a patch image by using a sliding window as the input matrix. The patch and feature graphs are constructed along the columns and rows of the input matrix, and then, these Laplacian matrices of corresponding graphs are imposed on the sparse component to preserve sparse structures. To better suppress sparse outliers, we employ l 1/2 -norm as the surrogate of l 1 -norm. The novel objective function is formulated by incorporating these sparse regularizations and solved by LADMAP effectively. Finally, a simple thresholding operator is used to extract the real target from the target image reconstructed by the uniform average of estimator re-projection (UAE).

Algorithm Description
In this section, we describe the major steps of the proposed model in detail and give its system diagram in Figure 3. An infrared small target image is transformed into a patch image by using a sliding window as the input matrix. The patch and feature graphs are constructed along the columns and rows of the input matrix, and then, these Laplacian matrices of corresponding graphs are imposed on the sparse component to preserve sparse structures. To better suppress sparse outliers, we employ l1/2-norm as the surrogate of l1-norm. The novel objective function is formulated by incorporating these sparse regularizations and solved by LADMAP effectively. Finally, a simple thresholding operator is used to extract the real target from the target image reconstructed by the uniform average of estimator re-projection (UAE).

Patch and Feature Graph Regularizations
Herein, we present in detail that patch and feature structural regularizations on sparse component are jointly introduced into an objective function. Constructing an infrared patch-image are the column vectors of matrix D . P W is the adjacency matrix, which encodes the edge weight and connectivity of the graph. The graph

Patch and Feature Graph Regularizations
Herein, we present in detail that patch and feature structural regularizations on sparse component are jointly introduced into an objective function. Constructing an infrared patch-image D ∈ R p×n , let G P = (D, E P , W P ) whose vertices {d ·1 , d ·2 , · · · , d ·n } are the column vectors of matrix D. W P is the adjacency matrix, which encodes the edge weight and connectivity of the graph. The graph is constructed based on k-nearest neighboring strategy by using heat kernel, which involves searching for the closest neighbors of all the columns based on the Euclidean distance. In G P , W P contains the edge weight of each node connecting to its k-nearest neighbors, which can be defined as where i, j = 1, . . . , n. N k (d ·i ) represents the set of k-nearest neighbors of d ·i . The patch graph constraint on sparse component is designed as where Tr(•) is the trace function of matrix and h P (i, i) is a diagonal element of the degree matrix H P . t ·i and t ·j are the column vectors of target patch-image T. The patch graph Laplacian matrix L P ∈ R n×n is calculated by Equation (1). Similarly, the feature graph G F = (D, E F , W F ) is built by using the row vectors d i· and d j· of matrix D. W F is formulated as where i, j = 1, . . . , p. Then, the feature graph constraint is denoted as where t i· and t j· are the row vectors of T. The information explored by the feature graph can refine the rare structure in the sparse component. The feature graph Laplacian matrix L F ∈ R p×p is computed by W F similar to L P , as given by Equation (1).

l 1/2 -Norm Regularization with Non-Negative Constraint
In heterogeneous scenes, corrupted components not only contain a small target but also an irregular flash point, which show similar sparsity to small targets under insufficient observation data. These components cannot be constrained equally as l 0 -norm under l 1 -norm measurement, causing biased suppression of sparse components. Then, some rare components will remain in the detection result and increase false alarms, directly undermining the robustness for the practical applications. Albeit employing l q -norm (0 < q < 1) to constrain the sparse component can achieve a detection result with fewer false alarms, manual selection of q is an unwieldy process and reduces the adaptability of the method [48]. Fortunately, Xu et al. [25] validated the representativeness of l 1/2 regularization among Remote Sens. 2020, 12,1963 8 of 24 all l q regularization (0 < q < 1) by a phase diagram study. They creatively pointed that whenever q takes in [1/2, 1), the smaller the q, the sparser the solutions can be generated by l q regularization and for q in (0,1/2), the performance of l q regularization has no significant difference. Herein, we introduce l 1/2 regularization as a surrogate of l 1 -norm to constrain the sparse part. Nevertheless, l 1/2 regularization only enhances the sparsity of target but neglects the basic physical fact that pixel values of target must be non-negative. Therefore, it is more reasonable to add a non-negative constraint on the sparse component. The non-negative sparse constraint can be defined as Finally, integrating the geometric manifold in patch and feature spaces in the form of a graph and l 1/2 regularization with a non-negative constraint into an overall framework, we propose a novel model for small target detection, which is formulated as where λ, γ 1 , and γ 2 are the tradeoffs to control the corresponding weight to each of the terms while optimizing the objective function.

LADMAP for Solving the Proposed Model
In recent years, many categories have been developed for solving low-rank optimization problems [49][50][51]. Especially, ADMM is frequently employed to handle the target-background separation under RPCA framework in infrared small target detection community. It can update the separable variables in convex programming by alternately minimizing, so that the optimization problem can be simplified in this way. However, observing the model in Equation (7), one can find that multiple auxiliary variables should be introduced to realize the separability of the augmented Lagrangian function. The computational complexity of the algorithm will then be correspondingly increased, because a certain amount of complexity is required to minimize each variable, and the number of iterations, as a key factor of influencing the computational efficiency, may be increased. It seriously erodes the algorithm in real time. To tackle this issue, we adopted a well-designed variant of ADMM, called the linearized ADMM with adaptive penalty (LADMAP) [31], to solve the proposed model effectively. For this purpose, the linear equality constraint in Equation (7) is removed by using the following augmented Lagragian function: where Y ∈ R p×n is a Lagrangian multiplier, and µ > 0 assigns the penalty to the violation of the linear constraints. LADMAP is used to directly optimize the primary variables B, T, and Y by solving each variable alternatively while fixing other ones. It involves fewer auxiliary variables and converges faster than the initial ADMM [52]. The detail procedure of solving Equation (8) by LADMAP is provided in the following.

Solution of the Proposed Method
Solving B: Assuming that T and Y are fixed, the solution of B k+1 is provided by minimizing the following objective function: The closed-form solution in Equation (9) can be obtained by singular value thresholding (SVT) operator, which is defined as where U, Σ, V are obtained using singular value decomposition (SVD) of Solving T: Assuming that B and Y are fixed, the solution of T k+1 is provided by minimizing the following objective function: which does not have a closed-form solution. In order to use the closed-form solution to the proximity operator of l 1/2 -norm achieved by the half -thresholding operator [25], we take an ingenious strategy by further linearizing the smooth component of Equation (11) to simplify the subproblem. The smooth component of Equation (11) can be written as Then, motivated by the spirit of LADMAP, solving Equation (11) can be replaced by minimizing the following problem: where s(B k+1 , T, Y k , µ k ) can be approximated by the second-order Taylor expansion of the smooth components at T k . ∇ T s(T k ) is the gradient of s(B k+1 , T, Y k , µ k ) with respect to T. As long as η 1 > 2λ T 2 + µ 1 + Y 2 2 , in which T 2 denotes the spectral norm of a matrix taking the largest singular value, the above replacement is valid. The subproblem (Equation (13)) is transformed into the minimization of l 1/2 -regularization, expressed as where α= 2λ/η 1 . According to [25], the solution of Equation (14) with a non-negative constraint can be computed with the help of half -thresholding operator, detailed as the following: where H α, 1 2 denotes the half -thresholding operator defined by Equations (16)- (19): where with and Updating Y and µ: where µ max is a given positive constant, and Convergence Criteria: According to KKT condition, the stopping criteria is designed as where ε 1 and ε 2 are the tolerance factors. Relying on the stopping criteria defined in Equation (23), the sequences (B, T, Y) yielded by the revised LADMAP converges to an optimal solution of the problem of Equation (7). The key steps of the proposed algorithm are summarized in Algorithm 1.

Complexity Analysis
The computational complexity of the proposed model is majorly dominated by the optimization of the revised LADMAP and the construction of the patch and feature graph. The construction of dual-graph based on nearest neighboring manner needs O p 2 n + pn 2 . Let k be the total number of iterations and r be the lowest rank of B. In each iteration, SVT is employed to compute the low-rank matrices in which its total complexity is O rn 2 under the usage of partial SVD. For half -thresholding operator, the complexity is O(pn) since some productions between matrix and vector are only required. The overall computational complexity in all iterations is O pn 2 + p 2 n + k pn + rn 2 . Hence, the primary computational complexity is O p 2 n in the case of p > n or O pn 2 in the case of n > p.

Experimental Evaluation and Analysis
In this section, extensive experiments are conducted to test the effectiveness and robustness of the proposed model in terms of clutter suppression and sparse point elimination. Specially, we illustrate the validity of the proposed patch and feature sparse regularizations and provide sensitivity analysis of the parameters in the proposed model. After that, we also give the experimental comparisons with 11 competitive works on subjectively visual effect and objective indicators.

Datasets and Baselines
The experiments were carried out on numerous infrared images. Herein, 10 infrared small target sequences are displayed, which cover four typical scenes: deep space, sky-cloudy, sea-sky, terrain. To observe these sequences intuitively, Figure 4 exhibits representative frames randomly picked from each sequence, where the designated target areas are zoomed in for better observation for a dim small target. From the figures, it is easily found that these experimental scenes contain different interference components, which make them more complex than clearly simple background. The detailed features and scene classification of these sequences are presented in Table 2.

Experimental Evaluation and Analysis
In this section, extensive experiments are conducted to test the effectiveness and robustness of the proposed model in terms of clutter suppression and sparse point elimination. Specially, we illustrate the validity of the proposed patch and feature sparse regularizations and provide sensitivity analysis of the parameters in the proposed model. After that, we also give the experimental comparisons with 11 competitive works on subjectively visual effect and objective indicators.

Datasets and Baselines
The experiments were carried out on numerous infrared images. Herein, 10 infrared small target sequences are displayed, which cover four typical scenes: deep space, sky-cloudy, sea-sky, terrain. To observe these sequences intuitively, Figure 4 exhibits representative frames randomly picked from each sequence, where the designated target areas are zoomed in for better observation for a dim small target. From the figures, it is easily found that these experimental scenes contain different interference components, which make them more complex than clearly simple background. The detailed features and scene classification of these sequences are presented in Table 2.  The performance of the proposed model is compared with 11 state-of-the-art methods, comprising TDLMS [32], TopHat [9], MOG [7], MPCM [12], WLDM [13], FKRW [15], IPI [18], TVPCP [45], SMSL [20], GRLA [46], RIPT [21]. Among these methods, TDLMS, TopHat, and MOG belong to the type of background prediction. MPCM and WLDM are the enhanced versions of LCM and achieve leading performance. FKRW is viewed as a model of multiple features integration. TVPCP, SMSL, GRLA, and RIPT are recently proposed methods based on subspace learning. For comparison,  The performance of the proposed model is compared with 11 state-of-the-art methods, comprising TDLMS [32], TopHat [9], MOG [7], MPCM [12], WLDM [13], FKRW [15], IPI [18], TVPCP [45], SMSL [20], GRLA [46], RIPT [21]. Among these methods, TDLMS, TopHat, and MOG belong to the type of background prediction. MPCM and WLDM are the enhanced versions of LCM and achieve leading performance. FKRW is viewed as a model of multiple features integration. TVPCP, SMSL, GRLA, and RIPT are recently proposed methods based on subspace learning. For comparison, we used the original codes of MOG, IPI, TVPCP, SMSL, GRLA and RIPT provided by the authors, while the remaining methods were reimplemented according to their corresponding literature. Moreover, the parameters in these tested methods were adjusted to obtain better performance, as summarized in Table 3.

Evaluation Indicators
The signal-to-clutter ratio gain (SCR G ) gives a measure of how much the complexity of the target area varies, defined as follows: where SCR in and SCR out denote the SCR before and after background suppression separately, and it is defined as SCR = M t − µ b /(σ b + ω). M t is the maximum intensity of target area. µ b and σ b are the average grayscale and standard deviation of the neighboring region around the small target. ω = 0.01 denotes as a smoothing factor to avoid division zero. BSF gives some sense of how much the background suppression effect is presented, which is formulated as where σ in and σ out denote the standard deviation of target neighboring region in original and suppressed images, respectively. Additionally, we introduce another metric to quantify the target contrast enhancement, namely contrast gain (CG) defined as CG = CON out CON in (26) where CON in and CON out denote the target local contrast before and after processing, respectively. The CON is defined as where M t and µ b are same as those in Equation (24). The higher the above three indicators, the better the background suppression performance of the detection method is. Both the original image and the detection results are normalized to [0, 1] when calculating these indicators. The three metrics are calculated in target local region, supposing that the target size is a × b and d is set to 20 as the neighborhood width. Furthermore, the probability of detection (P d ) and false alarm rate (F a ) are also very important indicators for wholly evaluating the detection performance, which are defined as P d = number of true detections number of actual targets (28) F a = number of false detections number of images (29) In experiments, we deem that the detection of small target is correct under this case where there are pixels within a 5 × 5 window centering on the ground truth. A good detector owns high P d under low F a . The receiver operating characteristic (ROC) displays the dynamic relationship between P d and F a .

Validity of the Proposed Patch and Feature Sparse Regularizations
The importance of the patch and feature regularizations are validated with a series of experiments in four different scenarios. For Equation (7), the proposed model degenerates into a simple model with l 1/2 -norm non-negative constraint by setting γ 1 = γ 2 = 0, whereas γ 1 > 0, γ 2 = 0 generates the patch regularized model (PRM), and γ 1 = 0, γ 2 > 0 produces the feature regularized model (FRM). Figure 5 shows the ROC results of four derived models for the tested scenes. From the figures, one can easily discover that the full model achieves the highest performance among other three variants of the proposed model, and the simple model without any regularizations obtains the lowest. For the deep-space scene, PRM performs better than FRM, as shown in Figure 5a. It is because the patch graph regularization can preserve clutter edges and recover them well, but the ability of feature graph is limited resulting in the performance degradation. With a sky-cloudy background (Figure 5b), FRM slightly outperforms PRM since the dim and weak target can be captured effectively by FRM but be ignored by PRM. With a sky-sea background (Figure 5c), PRM achieves higher P d under the same F a compared with FRM. The reason lies in that sea glitters that present similar sparsity to the small target may disrupt the dimension of feature space. FRM cannot effectively restrain them, and they then may be remained in the target images to raise the false alarm rate. For the terrain scenes, there is no significant difference between the performance of PRM and FRM. However, their probability of detection is prominently lower than that of the full model. By the above observations, it readily finds that the integration of patch and feature sparse regularizations contributes to an improvement of the detection performance in the proposed model.

Sensitivity to Parameters
For the proposed model, there are several key parameters that affect its stability. Herein, we discuss the influence of these parameters on the model performance by investigating the variation of the detection probability with varying these parameters. The parameters include patch size, sliding step, sparse penalty λ , patch graph regularization weight 1 γ and feature graph regularization weight 2 γ . In order to better reflect the suitability of model parameters, we construct a tested dataset covering diverse scenes by randomly selecting 10 frames from each exhibited sequence. The variations in detection performance measured by ROC are visualized in Figure 6, where we vary one parameter while fixing others. Among them, patch size and sliding step are closely related to not only

Sensitivity to Parameters
For the proposed model, there are several key parameters that affect its stability. Herein, we discuss the influence of these parameters on the model performance by investigating the variation of the detection probability with varying these parameters. The parameters include patch size, sliding step, sparse penalty λ, patch graph regularization weight γ 1 and feature graph regularization weight γ 2 . In order to better reflect the suitability of model parameters, we construct a tested dataset covering diverse scenes by randomly selecting 10 frames from each exhibited sequence. The variations in detection performance measured by ROC are visualized in Figure 6, where we vary one parameter while fixing others. Among them, patch size and sliding step are closely related to not only detection performance but also computational complexity, as shown in Figure 6a,b and Table 4. From the two subfigures, it is observed that larger patch and step exhibit the degradation of performance. It has largely been caused by a larger patch size and sliding step that undermines the relationship between the nonlocal patches. Then, the patch and feature graph may be inefficient in the preservation of the intrinsic structure of an image. On the other hand, although smaller patch and step may weaken the sparsity of a small target, the incorporation of dual-graph regularization is more conducive to maintaining the manifold structure of rare interferences. Then, the singularity of a small target can be highlighted well. In addition, we give the average execution time of the proposed method with different patch size and sliding step in Table 4. One can see that with a fixed-size patch, the larger the sliding step is, the shorter time the proposed model costs. Observing the two subfigures, the proposed model achieves the best in the tested dataset when the patch size is 20 and sliding step is set as 10. The sparse penalty λ plays a great significant role in controlling the balance between detection probability and false alarm ratio. To adjust this parameter more finely, we use λ = L/min(m, n) 1/2 varying L instead of directly varying λ. Figure 6c shows the impact of the parameter on the performance of the proposed method with changing L from 0.5 to 5. In the illustration, it is clearly observed that a larger penalty can effectively eliminate false alarms, but the detection probability is also decreased dramatically. For example, when L belongs to (0, 2], the proposed method has a higher detection rate compared with the values outside the given interval. Although the ROC results of the proposed method become a straight line under larger penalty such as L in [2.5, 5], meaning that it achieves zero false alarms, the detection rate is reduced seriously. In the experiments, when L takes (0, 2], the proposed method is superior in robustness and effectiveness.
In spite of analyzing the importance of the proposed regularizations, we also performed experiments to evaluate the effect of the regularization parameters on the performance. Figure 6d-f shows the ROC results of the proposed method with regularization parameter variations for the constructed dataset. Among them, Figure 6d shows the result of varying γ 1 while fixing γ 2 , and Figure 6e is obtained by varying γ 2 while fixing γ 1 . For Figure 6f, it is obtained by varying both γ 1 and γ 2 while keeping γ 1 = γ 2 . From these figures, we can see that as the regularized parameters increase, the P d of the proposed method increases first and then decreases. It indicates that the penalty degree of different regularization has a certain impact on the performance of the proposed algorithm. In our test, γ 1 = γ 2 = 3 seems to be a good choice because it is the best in ROC. From these figures, we can see that as the regularized parameters increase, the d P of the proposed method increases first and then decreases. It indicates that the penalty degree of different regularization has a certain impact on the performance of the proposed algorithm. In our test, 1 2 3 γ γ = = seems to be a good choice because it is the best in ROC.

Qualitative Evaluations
To further evaluate the performance of the proposed model, the visual comparisons with 11 state-of-the-art methods for different scenes are shown in Figures 7-10. In these results, only one representative image is selected from each sequence. For better observation, we enlarged the demarcated target area and creatively integrated target images with their 3D stereogram. From the figures, one can find that the typical filtering methods, such as TDLMS, TopHat, can highlight targets to a certain extent while retaining many background clutter residues in the detected results. MOG also belongs to the category of background prediction but different from the filtering methods. It can achieve relatively good performance on dim small target extraction in uniform scenes, especially for the sky-cloud background in Figure 8a-c. Compared with the results obtained by background prediction manners, WLDM and MPCM present marked improvement in enhancing the target, as illustrated in Figures 7-10. However, for the sky-cloud background with sparse noise, such as in Figure 8a, these two methods significantly highlight the small target while enhancing the background noise. Moreover, WLDM does not have enough power to deal with the border of image, as shown in Figures 9a, b and 10a, b. FKRW first uses the filtering manner to eliminate high bright noise, and then employs the local saliency to suppress background clutter and enhance the small target. It is

Qualitative Evaluations
To further evaluate the performance of the proposed model, the visual comparisons with 11 state-of-the-art methods for different scenes are shown in Figures 7-10. In these results, only one representative image is selected from each sequence. For better observation, we enlarged the demarcated target area and creatively integrated target images with their 3D stereogram. From the figures, one can find that the typical filtering methods, such as TDLMS, TopHat, can highlight targets to a certain extent while retaining many background clutter residues in the detected results. MOG also belongs to the category of background prediction but different from the filtering methods. It can achieve relatively good performance on dim small target extraction in uniform scenes, especially for the sky-cloud background in Figure 8a-c. Compared with the results obtained by background prediction manners, WLDM and MPCM present marked improvement in enhancing the target, as illustrated in Figures 7-10. However, for the sky-cloud background with sparse noise, such as in Figure 8a, these two methods significantly highlight the small target while enhancing the background noise. Moreover, WLDM does not have enough power to deal with the border of image, as shown in Figure 9a,b and Figure 10a,b. FKRW first uses the filtering manner to eliminate high bright noise, and then employs the local saliency to suppress background clutter and enhance the small target. It is impressive at suppression of clutter and sparse noise in comparison with WLDM and MPCM. However, FKRW might suffer from incorrect detection, such as in Figure 7b. In addition, it is more susceptible to spot-like sea clutters in sea-sky scenes because the sea glitter has a similar appearance to a small target, as presented in Figure 9c. It is easily found that the methods derived on subspace learning perform better than other comparisons in background clutter removal. Meanwhile, we can clearly discover that the proposed method accurately extracts small targets from different complex scenes without any background residuals, which is attributed to the usage of the proposed patch-feature regularization. For the initial IPI model, it is less robust to complex scenes due to its above deficiencies. For example, in Figure 7a,b and Figure 9c, it incorrectly includes background clutter and sparse glitter into the target image, resulting in substantial false alarms. As extended versions of IPI, SMSL and RIPT improve the performance in eliminating false alarms. However, SMSL uses the manner of non-overlapping patch to construct the infrared patch-image. It will reduce the coherence of the sparse structure in patches. Some like-target points are left in the target image when using multi-subspace learning strategy, as shown in Figure 7a,b, Figures 8a, 9c and 10b. Besides, RIPT shows sensitivity to the scenes with sparse points, such as in Figures 8a, 9c and 10b. TVPCP and GRLA can be regarded as enhancements of the IPI model. They integrate different regularization, such as total variation and graph regularization into the IPI model. TVPCP applies the total variation as the constraint of the background smooth component. From the results, one can obviously see that it successfully extracts the small target but some background interferences still remain, as shown in Figure 7a,b, Figures 8a, 9c and 10b. Among the two enhancements, GRLA can suppress background clutter better than TVPCP. The reason for this is that the graph regularization is more efficient in preserving the intrinsic structure within the background to address non-smooth component in complex scenes. Nevertheless, GRLA only employs one-side graph regularization, say in patch space. It achieves a comparable performance in static scenes, but it will lead to missing detection in dynamic scenes.
However, FKRW might suffer from incorrect detection, such as in Figure 7b. In addition, it is more susceptible to spot-like sea clutters in sea-sky scenes because the sea glitter has a similar appearance to a small target, as presented in Figure 9c. It is easily found that the methods derived on subspace learning perform better than other comparisons in background clutter removal. Meanwhile, we can clearly discover that the proposed method accurately extracts small targets from different complex scenes without any background residuals, which is attributed to the usage of the proposed patchfeature regularization. For the initial IPI model, it is less robust to complex scenes due to its above deficiencies. For example, in Figures 7a, b and 9c, it incorrectly includes background clutter and sparse glitter into the target image, resulting in substantial false alarms. As extended versions of IPI, SMSL and RIPT improve the performance in eliminating false alarms. However, SMSL uses the manner of non-overlapping patch to construct the infrared patch-image. It will reduce the coherence of the sparse structure in patches. Some like-target points are left in the target image when using multi-subspace learning strategy, as shown in Figures 7a, b, 8a, 9c, and 10b. Besides, RIPT shows sensitivity to the scenes with sparse points, such as in Figures 8a, 9c, and 10b. TVPCP and GRLA can be regarded as enhancements of the IPI model. They integrate different regularization, such as total variation and graph regularization into the IPI model. TVPCP applies the total variation as the constraint of the background smooth component. From the results, one can obviously see that it successfully extracts the small target but some background interferences still remain, as shown in Figures 7a, b, 8a, 9c, and 10b. Among the two enhancements, GRLA can suppress background clutter better than TVPCP. The reason for this is that the graph regularization is more efficient in preserving the intrinsic structure within the background to address non-smooth component in complex scenes. Nevertheless, GRLA only employs one-side graph regularization, say in patch space. It achieves a comparable performance in static scenes, but it will lead to missing detection in dynamic scenes.

Quantitative Evaluations
With the exception of the subjective visual evaluations, objective indicators are also important evaluation criteria. In this subsection, we provide the comparisons of the average G SCR , BSF , and CG , which are obtained by all involved methods in the different scenes, as shown in Table 5. In experiments, the 10 infrared sequences are divided into four categories according to the type of scenes. In the table, we mark the best three results by red, blue, and green, respectively. The results present that, for most cases, the proposed method earns the first or second ranking place on these sequences across different indicators. Regarding the average G SCR and BSF , the proposed method gets the highest scores in all tested methods for different scenarios, which means that our method has the best performance in clutter removal and background suppression. Additionally, GRLA is lower than RIPT in the deep-space scenes, but higher for the other three scenes. It is clear that methods based on subspace learning are generally better than other types of methods. This is because the IPI model takes advantage of redundant information of the image patch to improve the robustness for diverse scenes. It is noteworthy that, for CG , the proposed method ranks the second in deep-space and skycloud scenes and the third in sea-sky and terrain-sky scenes. The reason lies in that the proposed method can separate small targets cleanly but does not enhance small targets. In contrast, although WLDM and MPCM do not completely suppress the background, they distinctly strengthen the target intensity. WLDM and MPCM obtain relative superior performance in CG , but their detection results contain numerous false alarms. In addition, the three indicators merely reflect the good performance in a local region, and do not necessarily represent the overall performance of the method.

Quantitative Evaluations
With the exception of the subjective visual evaluations, objective indicators are also important evaluation criteria. In this subsection, we provide the comparisons of the average SCR G , BSF, and CG, which are obtained by all involved methods in the different scenes, as shown in Table 5. In experiments, the 10 infrared sequences are divided into four categories according to the type of scenes. In the table, we mark the best three results by red, blue, and green, respectively. The results present that, for most cases, the proposed method earns the first or second ranking place on these sequences across different indicators. Regarding the average SCR G and BSF, the proposed method gets the highest scores in all tested methods for different scenarios, which means that our method has the best performance in clutter removal and background suppression. Additionally, GRLA is lower than RIPT in the deep-space scenes, but higher for the other three scenes. It is clear that methods based on subspace learning are generally better than other types of methods. This is because the IPI model takes advantage of redundant information of the image patch to improve the robustness for diverse scenes. It is noteworthy that, for CG, the proposed method ranks the second in deep-space and sky-cloud scenes and the third in sea-sky and terrain-sky scenes. The reason lies in that the proposed method can separate small targets cleanly but does not enhance small targets. In contrast, although WLDM and MPCM do not completely suppress the background, they distinctly strengthen the target intensity. WLDM and MPCM obtain relative superior performance in CG, but their detection results contain numerous false alarms. In addition, the three indicators merely reflect the good performance in a local region, and do not necessarily represent the overall performance of the method. To further assess the superiority of the algorithm globally, Figure 11 presents the ROC results obtained by different methods on the 10 displayed sequences. From the figure, we clearly identify that the proposed method achieves the best behaviors with respect to both detection accuracy and stability on different scenes compared with those competitive methods. Especially for the Sequences 2-7 ( Figure 11b-g), our method attains the highest P d at the lowest F a . For Sequences 1 and 10 ( Figure 11a,j), RIPT owns higher P d when the F a is less than 0.4 and 0.45, respectively. Nevertheless, with the increase of F a , the proposed method reaches the highest P d scores faster than RIPT. In addition, the significant changes in the performance of the compared algorithms can be easily discovered. For example, GRLA outperforms the others in Sequences 1, 6, 7, and 9 ( Figure 11a,f,g,i), but is slightly inferior to some methods for other sequences. The P d of RIPT in Sequences 1,5,7,9,10 (Figure 11a,e,g,i,j) arrives 1 within the range of F a less than 1. However, it obtains lower P d in remaining sequences, especially in Sequence 3. Figure 11e,g-i displays that TVPCP shows an impressive performance in these sequences while its performance declines seriously for other remaining sequences. Moreover, due to the limited capability to suppress background clutter, approaches based on background prediction and target saliency, such as TDLMS, TopHat, MOG, WLDM, and MPCM, have low values of P d in the range of low F a . Additionally, FKRW is less robust in Sequences 1, 2, 8, 10, obtaining low P d . The conclusions drawn from objective evaluations demonstrate that our method is superior and more robust in detection performance against different scenes as compared with the baselines. remaining sequences. Moreover, due to the limited capability to suppress background clutter, approaches based on background prediction and target saliency, such as TDLMS, TopHat, MOG, WLDM, and MPCM, have low values of d P in the range of low a F . Additionally, FKRW is less robust in Sequences 1, 2, 8, 10, obtaining low d P . The conclusions drawn from objective evaluations demonstrate that our method is superior and more robust in detection performance against different scenes as compared with the baselines.

Convergence Analysis
The convergence of the proposed method solved by LADMAP was empirically studied on the real sequences, as illustrated in Figure 12. The relative error ( e r ) in objective function served as the iteration stop criterion, which was computed in each iteration by Figure   12a displays the changes of the relative error e r in successive iterations of the objective function.
Observing the figure, we can find that the declining rate of relative error is of great difference among Sequences 3, 5, 7, 9, and 10, and is almost uniform for the remaining sequences. Furthermore, the relative error drops sharply between iterations 1 and 2 that indicates a very fast convergence of the proposed method. Besides, the average number of iterations of the proposed method in all sequences is provided in Figure 12b, from which we can see that the convergence is obtained in less than 12 iterations.

Execution Time Comparison
To evaluate the computational efficiency of the proposed method more intuitively, we provide a comparison of the average time executed by the proposed method and comparative methods on 10 sequences. All of the tested algorithms were implemented on a PC with Intel Core i5 CPU 3.4 GHz and 8GB RAM by MATLAB R2016b. Considering that the running time may be affected by the

Convergence Analysis
The convergence of the proposed method solved by LADMAP was empirically studied on the real sequences, as illustrated in Figure 12. The relative error (r e ) in objective function served as the iteration stop criterion, which was computed in each iteration by D − B k+1 − T k+1 F / D F < ε 2 . Figure 12a displays the changes of the relative error r e in successive iterations of the objective function. Observing the figure, we can find that the declining rate of relative error is of great difference among Sequences 3, 5, 7, 9, and 10, and is almost uniform for the remaining sequences. Furthermore, the relative error drops sharply between iterations 1 and 2 that indicates a very fast convergence of the proposed method. Besides, the average number of iterations of the proposed method in all sequences is provided in Figure 12b, from which we can see that the convergence is obtained in less than 12 iterations.

Convergence Analysis
The convergence of the proposed method solved by LADMAP was empirically studied on the real sequences, as illustrated in Figure 12. The relative error ( e r ) in objective function served as the iteration stop criterion, which was computed in each iteration by Figure   12a displays the changes of the relative error e r in successive iterations of the objective function.
Observing the figure, we can find that the declining rate of relative error is of great difference among Sequences 3, 5, 7, 9, and 10, and is almost uniform for the remaining sequences. Furthermore, the relative error drops sharply between iterations 1 and 2 that indicates a very fast convergence of the proposed method. Besides, the average number of iterations of the proposed method in all sequences is provided in Figure 12b, from which we can see that the convergence is obtained in less than 12 iterations.

Execution Time Comparison
To evaluate the computational efficiency of the proposed method more intuitively, we provide a comparison of the average time executed by the proposed method and comparative methods on 10 sequences. All of the tested algorithms were implemented on a PC with Intel Core i5 CPU 3.4 GHz and 8GB RAM by MATLAB R2016b. Considering that the running time may be affected by the parameter settings of the tested methods, we used the installing parameters obtained under optimal

Execution Time Comparison
To evaluate the computational efficiency of the proposed method more intuitively, we provide a comparison of the average time executed by the proposed method and comparative methods on 10 sequences. All of the tested algorithms were implemented on a PC with Intel Core i5 CPU 3.4 GHz and 8GB RAM by MATLAB R2016b. Considering that the running time may be affected by the parameter settings of the tested methods, we used the installing parameters obtained under optimal performance, as given in Table 2. Table 6 provides the average running time of each tested method. From the table, one can find that the computational cost of MOG and TVPCP is much higher than other algorithms. TopHat takes the shortest time in all test methods, but its stability is the worst. Among subspace learning models, SMSL is the fastest since it employs the block coordinate descent method to avoid SVD in every iteration. As can be seen, although the proposed method is not the fastest, its execution time is at the same level as RIPT and GRLA. Furthermore, the execution time of the proposed method can be accelerated by parallelly implementing the construction of graph.

Discussion
From the above analysis, we can see that the demand of real time detection can be satisfied by the methods based on local prior, e.g., local spatial consistency, local saliency. Additionally, these methods do really highlight a small target in the scenes with high signal-to-clutter ratio. However, they have high false alarm rates for the complex scenes due to the incapability to remove strong edges and bright spots. Methods based on subspace learning, by comparison, show the superiority in background suppression and sparse point elimination. It is mainly attributed to the usage of global prior including the nonlocal self-correlation of background and the sparsity of the small target. Nevertheless, the original IPI model still suffers from the performance degradation when facing extremely complicated scenes due to the aforementioned limitations. To overcome its flaws, some improvements have been developed subsequently along three lines: (1) selecting tighter rank approximation function to reduce the bias of background estimation, recovering background better [23,24]; (2) applying a sparse enhancement strategy such as reweighting to suppress non-target outliers in the background [21,22]; (3) imposing structural regularization to strengthen the correlation of background spatial structure, promoting the recovery of background [45,46]. These methods evidently reduce the false alarm rate and increase the detection probability for some complex scenes. However, these methods only employ the prior in patch space ignoring the one in feature space. Additionally, there is no closed form solution to the subproblems of some rank function surrogates. Hence, some of them may not only perform less robust in complex scenes with a dim small target, but have increased computational cost. Different from the above methods, our proposed method integrates the prior in both data space and feature space in the form of dual graph regularization. This manner effectively constrains the background structures of deviating from low-rank subspace, boosting the suppression of clutters and edges. In addition, we employed l 1/2 -norm instead of l 1 -norm and its reweighting versions to constrain the target component, whose minimization realizes a sparser solution. It leads to a better removal of sparse outliers while preserving the small target. Finally, the solution of the objective function can be obtained by a well-designed optimization framework derived from LADMAP. By reducing the introduction of alternating multipliers, this framework not only reduces the number of iterations but also makes the computation friendly. The above experimental analysis has proved that the proposed method outperforms other tested methods in edge clutter suppression and sparse spot elimination. However, some issues still need to be considered. For example, the sparse constraint potentially assumes that the target elements are mutually independent, neglecting the spatial and pattern relation of some small targets. It will lead to the incompleteness of detection target. Furthermore, how to enhance the energy of small targets in the process of background separation is also a problem worthy of consideration. It will further increase the detection rate of dim and small targets. Following this, the future work can be concentrated on three lines: (1) designing a model to explicitly encode the spatial relation and feature similarity of small targets; (2) formulating a multi-frame detection model to fully explore the temporal cues in infrared sequences; (3) introducing appropriate deep learning framework to extract the target adaptively.

Conclusions
In this study, we gave an improvement on infrared small target detection when encountering complex scenes with cloudy clutter, sea glitter, and man-made interference. This is realized by incorporating the patch-feature sparse regularization into a RPCA framework with l 1/2 -norm constraint. The l 1/2 -norm can approximate a sparser solution than the traditional l 1 -norm, and achieve effective removal of pixel-sized interferences. On the other hand, the knowledge embodied in the patch and feature space is imposed on the sparse component through the patch and feature graph Laplacian, where each graph preserves the corresponding manifold structure. The proposed model is efficiently solved by LADMAP. We demonstrate that the proposed model obtains excellent performance over different real infrared sequences compared with 11 state-of-the-art methods in terms of both objective evaluation and visual effect.
Author Contributions: F.Z. conceived the original idea, conducted the experiments, and wrote the manuscript. Y.D. and K.N. helped with data collection and revised the manuscript. Y.W. contributed to the writing, content, and revised the manuscript. All authors have read and agreed to the published version of the manuscript.