Infrared Small Target Detection via Non-Convex Tensor Rank Surrogate Joint Local Contrast Energy

: Small target detection is a crucial technique that restricts the performance of many infrared imaging systems. In this paper, a novel detection model of infrared small target via non-convex tensor rank surrogate joint local contrast energy (NTRS) is proposed. To improve the latest infrared patch-tensor (IPT) model, a non-convex tensor rank surrogate merging tensor nuclear norm (TNN) and the Laplace function, is utilized for low rank background patch-tensor constraint, which has a useful property of adaptively allocating weight for every singular value and can better approximate (cid:96) 0 -norm. Considering that the local prior map can be equivalent to the saliency map, we introduce a local contrast energy feature into IPT detection framework to weight target tensor, which can e ﬃ ciently suppress the background and preserve the target simultaneously. Besides, to remove the structured edges more thoroughly, we suggest an additional structured sparse regularization term using the (cid:96) 1,1,2 -norm of third-order tensor. To solve the proposed model, a high-e ﬃ ciency optimization way based on alternating direction method of multipliers with the fast computing of tensor singular value decomposition is designed. Finally, an adaptive threshold is utilized to extract real targets of the reconstructed target image. A series of experimental results show that the proposed method has robust detection performance and outperforms the other advanced methods.


Introduction
Infrared imaging as an important means of photoelectric detection has wide applications, such as space surveillance, remote sensing, missile tracking, infrared search and track (IRST), etc. [1][2][3][4]. The performance of these applications depends greatly on robust target detection [5][6][7][8]. However, detection robustness decreases at long distances. On the one hand, the target is small without specific shape and texture, so that no obvious spatial structure information can be used. On the other hand, due to complex backgrounds and heavy imaging noise, infrared images usually have low signal-to-clutter ratio, making target detection exceedingly difficult [8][9][10]. Moreover, various interferences, such as heavy cloud edges, sea clutters, and artificial heat source on the ground, usually cause high false alarm rates and weaken detection performance. Therefore, it is a very valuable and challenging work to study the small infrared target detection in the complex background [9,11].

Related Works
The single-frame small infrared target detection methods based on local prior information have been widely studied, which can be roughly classified into two groups. The first class of methods assumes that local background is uniform and the small target would break the local uniformity. Thus, the background can be estimated, and then the small target can be enhanced via subtracting the estimated background from the raw image. Classical background estimation methods, including max-mean and max-median filters [25], two-dimensional least mean square (TDLMS) filter [26], and Top-hat filter [27], have achieved good performance under uniform background. Nevertheless, it is hard to choose a suitable structural element to match various targets and clutters, thus these filters cannot solve complex and varied real scenes. By introducing the estimation of edge direction information, some improved methods have been proposed [28,29]. Other background estimation techniques, such as wavelet transform [30] and kernel-based nonparametric regression method [31], are much investigated as well [32]. The second category methods focus on the saliency of target and produce a saliency map via calculating local contrast. It has been confirmed that the contrast mechanism plays an important role in human visual system (HVS) [33]. Inspired by HVS, several approaches based on local contrast, such as local contrast measure (LCM) [34], Laplacian of Gaussian (LoG) filter [33], high-boost multi-scale local contrast measure (HBMLCM) [35], multi-scale patch contrast measure (MPCM) [10], homogeneity-weighted local contrast measure (HWLCM) [36], derivative entropy contrast measure (DECM) [37], etc., have been developed progressively.
Remote Sens. 2020, 12, 1520 3 of 31 In the last years, the methods based on nonlocal prior, which mainly utilize the sparsity of small target and the nonlocal self-correlation property of background, have attracted more and more attention. Supposing that background patches belong to the low-rank subspace and the small target is regarded as an outlier in the input image, an innovative infrared patch-image (IPI) model, which converts the small target detection to an optimization problem of separating sparse and low-rank matrices, is first introduced in [38]. Compared with the traditional filtering methods, IPI model shows a superior background suppression performance. However, there are some shortcomings in IPI model, such as target over-shrinking, clutter residuals, and time consuming. To address the above problems and further improve the detection ability, several reformative models, such as column-wise weighted IPI model [39], NIPPS model [40], reweighted IPI model [41], NRAM model via non-convex rank approximation [42], and NOLC model using Lp-norm constraint [5], have been proposed gradually. To dig out more information in patch space, Dai et al. [23] extended IPI model to patch-tensor space and proposed a reweighted infrared patch-tensor (RIPT) model. RIPT with weighted tensor nuclear norm (WNRIPT) was proposed [43]. In RIPT model, a local structure prior was incorporated into infrared patch-tensor (IPT) model to preserve the dim small targets and suppress the residual edges. Subsequently, the small infrared target detection approaches of the combining IPT model with local prior, such as partial sum of tensor nuclear norm (PSTNN) model [8], have emerged and shown the state-of-the-art performance.

Motivation
Normally, infrared background is considered to show a slow transition, which means that there are high correlations between the patches in the image [23]. In other words, low-rank is the inherent characteristic of the background patch-tensor in IPT model. This low-rank property can be described by minimizing the rank of tensor. Thus, it is very important how to define appropriately tensor rank for infrared small target detection. In RIPT model, the sum of nuclear norm (SNN) of the unfolding matrices is used as the definition of the infrared background tensor rank [44]. However, because the unfolding operator may damage the inherent structure of an infrared background patch-tensor, the rank defined by the unfolding matrices cannot accurately describe the low-rank property of the tensor [45]. Meanwhile, SNN is not tight convex relaxation of the rank sum of unfolding matrices [46], which may lead to a suboptimal value. In recent years, the singular value decomposition of matrix has been extended to tensor space and the tensor singular value decomposition (t-SVD) was proposed [46]. The tensor nuclear norm (TNN) derived from t-SVD has been proposed and extensively studied and applied [46][47][48][49][50]. One disadvantage of TNN is that all singular values are treated equally. Nevertheless, for infrared image, each singular value has a different importance and explicit physical meaning, and thus should be treated differentially; for example, a larger singular value conveys image details and should be allocated smaller weight in the background tensor. In PSTNN model [8], a low-rank constraint named partial sum of the tensor nuclear norm, which replaces TNN as the non-convex approximation of tensor multi-rank, has been introduced to the IPT model for small target detection. Unfortunately, PSTNN only truncates some singular values and has the same weight for the retained singular values. In addition, the PSTNN requires setting a ratio parameter for predicted rank of background tensor. Due to that infrared scenes could change from uniform to complex and the target sizes are also variable in practical application, it is unreasonable to employ a fixed ratio parameter for background rank constraint. These two reasons would make PSTNN model easy to produce false alarm in some real scenarios. Recently, a non-convex surrogate via Laplace function has been proposed for low-rank tensor completion [45]. The Laplace function can more tightly approximate to the 0 -norm than nuclear norm. Meanwhile, it has a useful property of adaptively allocating weight for every singular value. Thus, minimizing the sum of the Laplace function of singular values is a better constraint for low-rank infrared background tensor.
Because nonlocal prior and local prior are complementary for small infrared target detection task, simultaneously utilizing nonlocal and local priors can improve the detection performance under complex backgrounds [23]. In RIPT model, a local structure weight map is constructed to measure the saliency of edges and is merged into IPT detection framework. However, the local structure prior only takes into account the prior of background correlation while neglecting the priori of target correlation, and easily suffers from target over-shrinking and corner disappearance [8]. To alleviate the issue of RIPT, PSTNN has proposed an improved local structure descriptor associated to both background and target priors. However, the improved local structure descriptor still retains some edge structures. In fact, the local prior information is used to weight target patch-tensor in IPT detection framework. Thus, a local prior weight map could be equivalent to the saliency map, which can be produced by precious detection methods based on local contrast. To overcome these problems of RIPT and PSTNN, we introduce a local contrast energy feature into IPT detection framework in this paper.
In addition, the common challenge, faced by most advanced methods at present, is that some rare structured interference cannot be completely eliminated, especially the stubborn edge interference [42]. The edge residuals in the target patch-tensor are generally considered to be linearly structured sparse, while the 1,1,2 -norm of third-order tensor is tube-wise sparse [50] and can be related to structured sparse. Thus, to further improve detection performance of IPT model, we introduce an additional structured sparse regularization term utilizing the 1,1,2 -norm into IPT model.
In summary, to overcome some deficiencies of current detection methods based on IPT model, we propose a novel small infrared target detection model via non-convex tensor rank surrogate (NTRS) and local contrast energy. The main contributions for the article are listed below.
(1) First, to more appropriately characterize the low-rank property of background tensor, we apply a non-convex tensor rank surrogate via Laplace function to infrared small target detection. The non-convex surrogate can adaptively assign different weights to singular values and can approximate 0 -norm better. The advantages of the method lead to a more robust target background separation performance. (2) Second, by introducing a novel local contrast energy feature into IPT model, the proposed model, which takes advantages of IPT model and traditional local contrast detection method, can suppress the complex background and preserve the dim small target better. (3) Third, considering that residual strong edge interferences are linearly structured sparse, we add a structured sparse item utilizing the 1,1,2 norm constraint to IPT model, which can reduce false alarm caused by structured sparse interference sources. (4) Fourth, an optimization way via alternating direction method of multipliers (ADMM) is designed to solve the non-convex model accurately and efficiently.
The rest of this paper is arranged as follows. In Section 2, we give some necessary mathematical symbols and definitions and briefly introduce IPT model. In Section 3, the details of the proposed NTRS model and the solution of that are described, and the whole procedure of small target detection is presented. A series of experiments was conducted and experimental results are shown in Section 4. Finally, the conclusion of this paper is presented.

Mathematical Symbols and Definitions
In the following, x, x, X, and X are used to represent a scalar, a vector, a matrix, and a tensor, respectively. For a third-order tensor X, the slice can be acquired via fixing one of its indexes. The frontal, lateral, and horizontal slices are represented as X(:, :, k), X(:, j, :), and X(i, :, :), respectively. Its tube, row, and column are denoted as X(i, j, :), X(i, :, k), and X(:, j, k), respectively. We use X(i, j, k) and x ijk to denote (i, j, k)th element of X. In most cases, X (k) is used to denote kth frontal slice. The Frobenius norm of X is defined as X F = ( i,j,k x ijk 2 ). The 0 -norm X 0 is defined as the non-zero element number of tensor. The 1 -norm of X is defined as X 1 = i,j,k x ijk . The inner product of two third-order tensors Y and X is defined as Y, X = i,j,k y ijk x ijk . X ∈ C m 1 ×m 2 ×m 3 is used to denote the result of the Fast Fourier Transformation (FFT) of X ∈ R m 1 ×m 2 ×m 3 along the tube, as X = f f t(X, [ ], 3). In the same fashion, the inverse FFT (IFFT) operator computes X from X, as X = i f f t(X, [ ], 3).

Definition 1.
(identity tensor) [51]. A tensor I ∈ R m 1 ×m 1 ×m 3 is called the identity tensor whose first frontal slice is a identity matrix with size m 1 × m 1 , while other frontal slices are all zeros.
Definition 2. (f-diagonal tensor) [51]. The f-diagonal tensor is the tensor with every of its frontal slice being diagonal matrix.
The block circulant matrix of X can be formulated as In addition, unfold and fold operators [52,53] can be formulated as Definition 6. (t-product) [53]. Given A ∈ R n 1 ×n 2 ×n 3 and B ∈ R n 2 ×n 4 ×n 3 , the t-product A * B is defined based on the block circulant matrix, as follows Remote Sens. 2020, 12, 1520 6 of 31 The t-product has some properties being alike to the matrix product, such as it satisfies the associative law [54]: A * (B * C) = (A * B) * C. In the Fourier domain, the t-product can be converted to the product of block diagonal matrix. Thus, Equation (6) can be rewritten as Theorem 1. (t-SVD) [51,53,55] For a tensor X ∈ R m 1 ×m 2 ×m 3 , its t-SVD is formulated as where U ∈ R m 1 ×m 1 ×m 3 and V ∈ R m 2 ×m 2 ×m 3 are orthogonal tensors, S is a rectangular f-diagonal tensor with size m 1 × m 2 × m 3 . The entries in S are known as the singular values. In Figure 1, the t-SVD is illustrated. By generalizing Equation (7), X can be formulated as Thus, we can perform the matrix SVD on each frontal slice of X, as X . In other words, t-SVD can be obtained via computing matrix SVDs in the Fourier domain. A most efficient way at present for computing t-SVD is shown in Algorithm 1 [54].

Definition 7.
(tensor multi rank and tubal rank) [55]. The tensor multi rank of a tensor X ∈ R m 1 ×m 2 ×m 3 is a vector r = r 1 , r 2 , · · · , r k , · · · , r m 3 , r k equals to the rank of kth frontal slice of X in the Fourier domain. The tensor tubal rank r is defined as the number of non-zero singular tubes of S, where S comes from the t-SVD of X. An alternative definition of tubal rank is defined to be the largest rank of all the frontal slices of X, which means r = max{r}.
Remote Sens. 2020, 12, x FOR PEER REVIEW 6 of 32 The t-product has some properties being alike to the matrix product, such as it satisfies the associative law [54]: In the Fourier domain, the t-product can be converted to the product of block diagonal matrix. Thus, Equation (6) can be rewritten as Theorem 1. (t-SVD) [51,53,55] For a tensor m m ḿ´. The entries in  are known as the singular values. In Figure 1, the t-SVD is illustrated. By generalizing Equation (7),  can be formulated as Thus, we can perform the matrix SVD on each frontal slice of  , as ( ) In other words, t-SVD can be obtained via computing matrix SVDs in the Fourier domain. A most efficient way at present for computing t-SVD is shown in Algorithm 1 [54].

Definition 7.
(tensor multi rank and tubal rank) [55]. The tensor multi rank of a tensor m m ḿ´, respectively.  is also a f-diagonal tensor.

Infrared Patch-Tensor Model
Generally, an infrared image with small target can be described as the following model [31]: where f D , f B , and f T are raw image, background image, target image, respectively, f N denotes stochastic noise component, (i, j) is the coordinate of the pixel. Considering different assumptions about target and background can results in different small target detection methods. Depending on the assumptions of the target sparsity and the nonlocal background self-correlation, the conventional infrared image model can be extended to IPI model via constructing local patches [38]. Then, the task of target detection was formulated as a problem of separating the sparse and low rank matrices. To more fully exploit spatial correlationships, IPI model is further extended to tensor space and an innovative framework of separating background and target called infrared patch-tensor (IPT) model, which can be described as Equation (11), is proposed.
where D, T , B, N ∈ R m 1 ×m 2 ×m 3 represent the patch-tensor of raw, target, background, and noise, respectively. m 1 and m 2 denote height and width of a local patch and m 3 denotes the number of patches. Similar to IPI model, the local patch-images can be got by sliding a window from upper left corner to lower right corner on the image. Then, the image patch-tensor can be constructed by directly stacking these patch-images into a 3D cube [8], as shown in Figure 2.

Infrared Patch-Tensor Model
Generally, an infrared image with small target can be described as the following model [31]:  [38]. Then, the task of target detection was formulated as a problem of separating the sparse and low rank matrices. To more fully exploit spatial correlationships, IPI model is further extended to tensor space and an innovative framework of separating background and target called infrared patch-tensor (IPT) model, which can be described as Equation (11), is proposed.
where  ,  ,  ,  The background patch-tensor  can be considered as a low rank tensor. There is no doubt that target patch-tensor  belongs to sparse tensor, since the small target merely occupies very small area in entire image. Meanwhile, assuming that noise is additive white Gaussian noise, IPT model formulates small target detection as the decomposition problem of sparse tensor and low rank tensor by solving the following tensor robust principle component analysis (TRPCA) optimization: where l denotes a compromising parameter, 0 . stands for 0  -norm.

The Nonconvex Surrogate of Tensor Rank
In IPT model, one of the critical issues is how to measure the low-rank property of background tensor. However, unlike the rank of matrix, it is not easy to define a good tensor rank [47]. Researchers The background patch-tensor B can be considered as a low rank tensor. There is no doubt that target patch-tensor T belongs to sparse tensor, since the small target merely occupies very small area in entire image. Meanwhile, assuming that noise is additive white Gaussian noise, IPT model formulates small target detection as the decomposition problem of sparse tensor and low rank tensor by solving the following tensor robust principle component analysis (TRPCA) optimization: where λ denotes a compromising parameter, . 0 stands for 0 -norm.

The Nonconvex Surrogate of Tensor Rank
In IPT model, one of the critical issues is how to measure the low-rank property of background tensor. However, unlike the rank of matrix, it is not easy to define a good tensor rank [47]. Researchers have proposed some definitions of tensor rank, for example, CP-rank [56] and Tucker rank [57], but they all have their own limitations [58].
Rooted in tensor singular value decomposition, tensor multi-rank and tubal-rank have been defined. Furthermore, as a convex relaxation of 1 -norm of the tensor multi-rank, tensor nuclear norm (TNN) has been proposed [50]. TNN of a third-order tensor X ∈ R m 1 ×m 2 ×m 3 can be formulated as follows: where · * is nuclear norm of matrix. Although the matrix nuclear norm in the Fourier domain is tractable, it would cause some unavoidable biases. To alleviate these bias phenomena caused by a convex surrogate, the non-convex relaxations of the matrix nuclear norm are reasonable options. On the foundation of the partial sum of singular values (PSSV) [59], PSTNN was proposed as a non-convex approximation 1 -norm of the tensor multi-rank [60]. For a third-order tensor X ∈ R m 1 ×m 2 ×m 3 , PSTNN is written as follows: where . From Equation (14), it can be observed that the PSTNN only truncates some large singular values and has the same weight for the retained small singular values. In addition, the PSTNN requires setting an important parameter N for the predicted rank constraint. Recently, the Laplace function is introduced into TNN to generate another non-convex approximation of tensor multi-rank, which has been used to solve the low rank tensor completion problem [45]. The non-convex tensor rank surrogate on the basis of Laplace function is defined as follows: where m = min(m 1 , m 2 ) and ε is a positive constant. φ(x) = 1 − e −x/ε represents a Laplace function, which can approximate to the 0 -norm better compared with 1 -norm, as shown in Figure 3. Moreover, different from PSTNN, the Laplace function can automatically distribute the appropriate weight to each singular value. Therefore, the sum of the Laplace function is a better surrogate for tensor multi-rank. The Laplace function based non-convex tensor rank surrogate has some useful properties [45].
), which means the rank approximately equal to the sum of all elements of multi rank, when ε is a small value. Secondly, UXV ε = X ε , for any orthonormal tensor U ∈ R m 1 ×m 1 ×m 3 and V ∈ R m 2 ×m 2 ×m 3 . Thirdly, X ε ≥ 0 for any X ∈ R m 1 ×m 2 ×m 3 , and X ε = 0 if and only if X = 0.
Remote Sens. 2020, 12, x FOR PEER REVIEW 9 of 32 Based on the above considerations, the Laplace function based non-convex surrogate is an excellent candidate to represent the rank of a third-order tensor. Thus, we can minimize the non-convex surrogate to measure the low rank property of background patch-tensor  From Equation (15), we can observe that the non-convex tensor rank surrogate is a linear combination of all frontal slices Laplace function in Fourier domain along the tube dimension. Therefore, in the Fourier domain, the optimization problem in Equation (16) can be achieved by solving 3 m matrix optimization problem [45] , which is formulated as follows: where 3 1, , k m =  , and (17) is solved via the generalized weighted singular value thresholding operator [61], shown in Equations (18) and (19).
where , , is the gradient of f at  Based on the above considerations, the Laplace function based non-convex surrogate is an excellent candidate to represent the rank of a third-order tensor. Thus, we can minimize the non-convex surrogate to measure the low rank property of background patch-tensor B ∈ R m 1 ×m 1 ×m 3 . Aiming for the non-convex surrogate, optimization problem of the low rank tensor B is written in Equation (16) argmin From Equation (15), we can observe that the non-convex tensor rank surrogate is a linear combination of all frontal slices Laplace function in Fourier domain along the tube dimension. Therefore, in the Fourier domain, the optimization problem in Equation (16) can be achieved by solving m 3 matrix optimization problem [45], which is formulated as follows: where k = 1, · · · , m 3 , and Z (17) is solved via the generalized weighted singular value thresholding operator [61], shown in Equations (18) and (19).
Remote Sens. 2020, 12, 1520 is the gradient of φ at σ k,l j , and σ k,l j is the jth singular value of the kth frontal slice of B at the lth iteration. Finally, B can be obtained by inverse FFT. Each iteration solution of the optimization problem in Equation (16) is briefly described in Algorithm 2.
Algorithm 2 Each iteration solution of optimization problem in Equation (16).
can be obtained by Equation (18);

Local Prior Weight Map
It has been observed that local prior is different from nonlocal prior, and that local prior is a significant supplement to IPT detection framework [23]. In the RIPT model, the local structure map is used as local prior information to weight the target patch-tensor. The PSTNN model further analyzes the local structure of the infrared image and proposes an improved local structure weight map. Actually, the background should be suppressed and the target should be enhanced as much as possible in the local prior weight map. A saliency map based on local contrast has similar effect. Thus, we can construct the local prior weight map by utilizing local contrast features.
In infrared small target detection community, the concept of local contrast has been extensively applied [33,34]. Considering that the small target is brighter than its adjacent background in the local region, Xia et al. [9] proposed a local contrast element to describe the discontinuity between target and local adjacent background.
For the infrared image f D , sub-block B is defined as a local region in the whole image. In Figure 4a, represents a set of all neighborhood pixels, the Chebyshev distance of which to the center location (x, y) is k, and is defined as follows: Remote Sens. 2020, 12, x FOR PEER REVIEW 11 of 32 k xy B that has the minimum difference with the center pixel, we have where ( , ) c f x y f i j k d (22) Using the minimal operation in Equation (21), the local contrast element is sensitive to the direction information. That is because the small infrared target generally obeys the Gaussian distribution of spot-like; in other words, its information is similar in all directions, while the background edge commonly has a specific direction in the sub-block and the information varies greatly in different directions. As a result, local contrast element can effectively distinguish the small target and the salient clutter edges, such as cloud edge, building edge. By utilizing the local contrast element, a local energy feature which describes the discontinuity structure between a center pixel and its neighboring pixels can be defined as Considering for practical applications, the infrared small target shows a patch brighter than the surrounding background. Thus, pixels with contrast energy below 0 are considered as dark spots and should be excluded, and the saliency map SM can be obtained by Equation (24): Further, the local prior weight map lp W is defined as normalization of SM : The construction of B k (x,y) is shown in Figure 4b; the B k (x,y) corresponding to different k values is represented by dotted lines with different colors. Defining (i k xy , j k xy ) as the pixel in B k (x,y) that has the minimum difference with the center pixel, we have where f (i, j) and f (x, y) are the grayscale of the corresponding pixels and · is the floor function. Then, a local contrast element c k xy can be formulated as Using the minimal operation in Equation (21), the local contrast element is sensitive to the direction information. That is because the small infrared target generally obeys the Gaussian distribution of spot-like; in other words, its information is similar in all directions, while the background edge commonly has a specific direction in the sub-block and the information varies greatly in different directions. As a result, local contrast element can effectively distinguish the small target and the salient clutter edges, such as cloud edge, building edge. By utilizing the local contrast element, a local energy feature which describes the discontinuity structure between a center pixel and its neighboring pixels can be defined as Considering for practical applications, the infrared small target shows a patch brighter than the surrounding background. Thus, pixels with contrast energy below 0 are considered as dark spots and should be excluded, and the saliency map SM can be obtained by Equation (24): Remote Sens. 2020, 12, 1520 12 of 31 Further, the local prior weight map W lp is defined as normalization of SM: (25) where sm max and sm min denote the maximum and minimum of SM. The different local prior weight maps are displayed in Figure 5. By using the local energy feature, it can be observed that the complex background clutters are fairly suppressed and the target is effectively enhanced in the proposed local prior weight map. Compared with the prior weight map of PSTNN, the proposed local energy feature can suppress the strong edge more significantly. Finally, using the approach of the tensor construction in Figure 2, the prior weight map W lp can be converted into a tensor form, W lp .  (25) where max sm and min sm denote the maximum and minimum of SM . The different local prior weight maps are displayed in Figure 5. By using the local energy feature, it can be observed that the complex background clutters are fairly suppressed and the target is effectively enhanced in the proposed local prior weight map. Compared with the prior weight map of PSTNN, the proposed local energy feature can suppress the strong edge more significantly.
Finally, using the approach of the tensor construction in Figure 2, the prior weight map lp W can be converted into a tensor form, lp .

Structured Sparse Regularization
In the process of background-target separation, some rare structures tend to enter the sparse target component. One of the challenges of almost all approaches based on patch image models is rare structure effect [32]. Since many objects have a streamlined appearance in real scenes [62], we can consider that most of rare structures, such as the stubborn edges, remaining in the target image are linear structural sparse relative to the whole data of an input image [42]. Therefore, to minimize these rare structures being transferred to sparse target component, we introduce an extra tube-wise sparse tensor regularization term utilizing 1,1,2 -norm to IPT model [50]. The 1,1,2 -norm of thirdorder tensor is similar to 2,1 -norm of matrix which has the ability to identify the sample outlier, while most outliers can be associated with sparse structure [32,42]. Then, an input infrared patchtensor can be decomposed into three components, which are sparse target tensor , low-rank background tensor , and structured sparse edge tensor , respectively, as follows: where the 1,1,2 -norm of is denoted as 1,1,2 , which can be defined as

The Proposed NTRS Model
According to the above description, a target detection model utilizing the non-convex tensor rank surrogate, the local contrast energy, and the structured sparse regularization, which is named NTRS, is formulated as follows:

Structured Sparse Regularization
In the process of background-target separation, some rare structures tend to enter the sparse target component. One of the challenges of almost all approaches based on patch image models is rare structure effect [32]. Since many objects have a streamlined appearance in real scenes [62], we can consider that most of rare structures, such as the stubborn edges, remaining in the target image are linear structural sparse relative to the whole data of an input image [42]. Therefore, to minimize these rare structures being transferred to sparse target component, we introduce an extra tube-wise sparse tensor regularization term utilizing 1,1,2 -norm to IPT model [50]. The 1,1,2 -norm of third-order tensor is similar to 2,1 -norm of matrix which has the ability to identify the sample outlier, while most outliers can be associated with sparse structure [32,42]. Then, an input infrared patch-tensor D can be decomposed into three components, which are sparse target tensor T , low-rank background tensor B, and structured sparse edge tensor S, respectively, as follows: where the 1,1,2 -norm of S is denoted as S 1,1,2 , which can be defined as i,j S(i, j, :) 2 .

The Proposed NTRS Model
According to the above description, a target detection model utilizing the non-convex tensor rank surrogate, the local contrast energy, and the structured sparse regularization, which is named NTRS, is formulated as follows: min Remote Sens. 2020, 12, 1520 13 of 31 where denotes the Hadamard product; W rec is the tensor, each element of which is the reciprocal of the corresponding element in W lp ; and λ and β are positive trade-off coefficients. Due to the non-convex and non-smooth of 0 -norm, it is difficult to directly solve problems involving that norm. For tractable computation, we characterize the sparsity of the target tensor by employing 1 -norm as other many detection models. A reweighted 1 -norm minimization is proposed to enhance sparsity in [63]. After that, the reweighted scheme, which can speed up the convergence rate and alleviate the imbalance that the larger numerical value suffers the greater penalty than the smaller one [8], is widely studied and applied [64][65][66]. Thus, the reweighted scheme is utilized in this paper, too. Moreover, the gray values of the real small targets are usually greater than those of non-target sparse spots. Thus, the sparse reweight can be given as follows: where c is a non-negative constant, and can be set to 1 for simplicity [23,32]. To prevent division by zero, η represents a tiny positive constant and is set to 10 −8 in this paper. l stands for the lth iteration. The local prior weight and the sparse reweight can be combined to get the final weight tensor, as follows: Finally, the proposed NTRS model (Equation (27)) is rewritten as:

Solution of NTRS Model
In this subsection, an optimization algorithm based on ADMM [67], which has high precision and fast convergence speed, is proposed to solve Equation (30). The augmented Lagrangian function of that can be defined as where Y and µ > 0 are the Lagrange multiplier and the penalty factor, respectively. · is the two third-order tensors inner product. Subsequently, ADMM decomposes minimization of L into several subproblems, and B, T , S can be updated by an iterative approach. Specifically, at the (l + 1)th iteration, B l+1 , T l+1 , and S l+1 are obtained via solving the three subproblems, as follows: where Y and µ can be updated by the Equations (35) and (36), respectively.
where ρ > 1 is a constant. Let γ = µ l , Z = D − T l − S l − Y l µ l , the subproblem in Equation (32) can be solved by Algorithm 2 of Section 3.1. For the subproblem in Equation (33), T l+1 can be calculated via a soft-thresholding operator [68], as shown in Equations (37) and (38): where m = 1, · · · , m 3 . The convergence speed of the proposed model is a dominant factor to computational time of the whole detection algorithm. Generally, the given threshold of reconstruction relative error is used as the stopping criterion for robust principle component analysis (RPCA) problem [69]. However, it has been observed that number of non-zero elements among the sparse target tensor barely change after a few iterations [8,23]. Therefore, besides the reconstruction relative error, we could count the number of non-zero elements of target tensor as the stopping condition, which can be defined as T l 0 = T l+1 0 , to reduce the iteration number.
Finally, we summarize the whole optimization process of NTRS model as Algorithm 3.
Algorithm 3 ADMM for solving the proposed NTRS model.

Input:
Original patch-image D, W lp , λ, β;  In Figure 6, we show the specific implementation procedure of the target detection method based on the proposed NTRS model. Meanwhile, the detailed steps are explained in the following.
(1) Local prior map generation. For an input infrared image f D ∈ R m×n , its local prior weight map W lp ∈ R m×n is calculated via Equation (25).
(2) Patch-tensor construction. Original patch-tensor D ∈ R m 1 ×m 2 ×m 3 can be constructed by stacking image patches which are obtained via sliding a window of size m 1 × m 2 over the input image, as shown in Figure 2. In the same way, the local prior weight patch-tensor W lp ∈ R m 1 ×m 2 ×m 3 can be constructed from local prior weight map. (3) Background-target-edge separation. By Algorithm 3, an original infrared patch-tensor D can be decomposed into three patch-tensor components: the low rank background B, the sparse target T , and the structured sparse edge S. (4) Image reconstruction. The two-dimensional image can be reconstructed by the inverse operation of patch-tensor construction [42]. Considering that structured edges are also background, we first sum B and S as the final background patch-tensor B. Then, the target image f T and the background image f B are reconstructed from target patch-tensor T and background patch-tensor B, respectively. For the overlapped positions, one-dimensional median filter can be used to determine the values. (5) Target detection. Considering that the pixels of the true targets have higher grayscale in the reconstructed target image [70], small targets can be extracted via a simple adaptive threshold segmentation algorithm. The threshold Th is as follows: where f Tstd and f Tmean are the standard deviation and mean of the reconstructed target image f T , respectively. ρ is an empirical coefficient to compromise false alarm rate and detection probability.

Target Detection
In Figure 6, we show the specific implementation procedure of the target detection method based on the proposed NTRS model. Meanwhile, the detailed steps are explained in the following.  Background-target-edge separation. By Algorithm 3, an original infrared patch-tensor can be decomposed into three patch-tensor components: the low rank background , the sparse target , and the structured sparse edge . (4) Image reconstruction. The two-dimensional image can be reconstructed by the inverse operation of patch-tensor construction [42]. Considering that structured edges are also background, we first sum and as the final background patch-tensor . Then, the target image Target detection. Considering that the pixels of the true targets have higher grayscale in the reconstructed target image [70], small targets can be extracted via a simple adaptive threshold segmentation algorithm. The threshold Th is as follows: Tmean Tstd Th f f (40) where Tstd f and is an empirical coefficient to compromise false alarm rate and detection probability.  (40) Detection result Patch including target + Figure 6. The flowchart of the proposed infrared small target detection method. First, local prior weight map is generated by using local contrast energy feature. Second, original patch-tensor and local prior weight patch-tensor are constructed. Third, the low-rank background tensor B, the structured sparse edge tensor S, and the sparse target tensor T can be separated by the proposed model. Then, the target image is reconstructed and small targets can be extracted by an adaptive threshold segmentation algorithm.

Experimental Preparation
To demonstrate the effectiveness of our method in multiple aspects, such as robustness to noise and various background, the capability of small target enhancement and background suppression, detection performance, and computational time, extensive experiments were conducted. In our experiments, fourteen single frame infrared images of different scenes, varying from homogeneous background containing extreme faint small target to complicated background with prominent clutters and interferences, and six real sequences were utilized to verify our method. The fourteen single images (a-n) and the representative images (o-t) of the six real sequences, a total of twenty images with different scenes, are displayed in Figure 7. To facilitate visualization, these images were converted to the identical size and each real target is indicated by a red square box. Figure 8 shows the 3D mesh views corresponding to the images. The six real sequences were captured via an infrared thermal camera (Its infrared sensor is a long wave infrared focal plane arrays, MARS LW K508), which is mounted on aircraft to detect aerial targets and its imaging resolution is 320 × 256. The dynamic range of the original image data is 14bit and we converted that to 8bit (0-255) by using linear mapping. All six sequences contain the complex background and clutter, such as different types of clouds and various ground highlight interferences. Meanwhile, all targets are small. Detailed descriptions are given in Table 1.
Six other classical and advanced methods, namely local contrast measure (LCM) [34], Top-hat filter [27], multi-scale patch contrast measure (MPCM) [10], RIPT model [23], IPI model [38], and PSTNN model [8] were used as the benchmarks for the overall comparison. The parameter settings of the seven approaches including ours are summarized in Table 2.
Remote Sens. 2020, 12, x FOR PEER REVIEW 16 of 32 Figure 6. The flowchart of the proposed infrared small target detection method. First, local prior weight map is generated by using local contrast energy feature. Second, original patch-tensor and local prior weight patch-tensor are constructed. Third, the low-rank background tensor , the structured sparse edge tensor , and the sparse target tensor can be separated by the proposed model. Then, the target image is reconstructed and small targets can be extracted by an adaptive threshold segmentation algorithm.

Experimental Preparation
To demonstrate the effectiveness of our method in multiple aspects, such as robustness to noise and various background, the capability of small target enhancement and background suppression, detection performance, and computational time, extensive experiments were conducted. In our experiments, fourteen single frame infrared images of different scenes, varying from homogeneous background containing extreme faint small target to complicated background with prominent clutters and interferences, and six real sequences were utilized to verify our method. The fourteen single images (a-n) and the representative images (o-t) of the six real sequences, a total of twenty images with different scenes, are displayed in Figure 7. To facilitate visualization, these images were converted to the identical size and each real target is indicated by a red square box. Figure 8 shows the 3D mesh views corresponding to the images. The six real sequences were captured via an infrared thermal camera (Its infrared sensor is a long wave infrared focal plane arrays, MARS LW K508), which is mounted on aircraft to detect aerial targets and its imaging resolution is 320 256  . The dynamic range of the original image data is 14bit and we converted that to 8bit (0-255) by using linear mapping. All six sequences contain the complex background and clutter, such as different types of clouds and various ground highlight interferences. Meanwhile, all targets are small. Detailed descriptions are given in Table 1.
Six other classical and advanced methods, namely local contrast measure (LCM) [34], Top-hat filter [27], multi-scale patch contrast measure (MPCM) [10], RIPT model [23], IPI model [38], and PSTNN model [8] were used as the benchmarks for the overall comparison. The parameter settings of the seven approaches including ours are summarized in Table 2.  Figure 7. The images in (a-t) correspond to Figure 7a-t, respectively. The x-axis and y-axis represent the column coordinate and row coordinate of the pixels of the original image, respectively, while the z-axis represents the gray value of the pixels.     Figure 7o-t, are the representative frames of Sequence 1-6, respectively. Table 2. Detailed parameter settings for the seven methods.

Method Parameters
Top-hat

Evaluation Metrics
In this subsection, several commonly used performance evaluation metrics of infrared small target detection methods are introduced [3]. The signal-to-clutter ratio (SCR) is a basic indicator to measure the target saliency. It is also a measurement of the detection difficulty. Its formula is: where SCR is calculated in local adjacent region of the target, as shown in Figure 9. µ t and µ b denote the gray mean values of the target region and the neighborhood background region surrounding the target, respectively. σ b is the standard deviation of the target adjacent background. We set d = 25 in our experiments. Generally, when the SCR is higher, the detection will be easier. Based on SCR, the gain of SCR (GSCR), which is the most frequently adopted criterion, can be defined: where SCR out and SCR in denote the SCR of the processed image (which is the reconstructed target image for the proposed method in this paper) and the original image, respectively. Higher GSCR means greater ability of target enhancement.

Evaluation Metrics
In this subsection, several commonly used performance evaluation metrics of infrared small target detection methods are introduced [3]. The signal-to-clutter ratio (SCR) is a basic indicator to measure the target saliency. It is also a measurement of the detection difficulty. Its formula is: (41) where SCR is calculated in local adjacent region of the target, as shown in Figure 9. We set 25 d  in our experiments. Generally, when the SCR is higher, the detection will be easier. Based on SCR, the gain of SCR (GSCR), which is the most frequently adopted criterion, can be defined: (42) where SCR out and SCR in denote the SCR of the processed image (which is the reconstructed target image for the proposed method in this paper) and the original image, respectively. Higher GSCR means greater ability of target enhancement.
Local background region Target Figure 9. An infrared small target and its local adjacent area. Yellow rectangle denotes the small target area and red rectangle denotes its local adjacent area. On the right is an enlarged view of the red rectangle.
Background suppression factor (BSF) is another typical evaluation metric, which reflects the ability of background suppression of detection methods. Higher BSF means that the detection algorithm can suppress background clutter more effectively. BSF is defined as Figure 9. An infrared small target and its local adjacent area. Yellow rectangle denotes the small target area and red rectangle denotes its local adjacent area. On the right is an enlarged view of the red rectangle.
Background suppression factor (BSF) is another typical evaluation metric, which reflects the ability of background suppression of detection methods. Higher BSF means that the detection algorithm can suppress background clutter more effectively. BSF is defined as where σ out and σ in stand for the corresponding standard deviation values of the local background in the processed image and the original image, respectively. In addition to the target enhancement and background suppression ability indicator, false positive rate (FPR) and true positive rate (TPR), which represent false alarm rate and detection probability, Remote Sens. 2020, 12, 1520 19 of 31 respectively, were utilized to evaluate comprehensive target detection ability of several methods. FPR and TPR are formulated as follows: FPR = number of nontarget pixels detected total number ofpixels in all images (44) TPR = number of true targets detected number of true targets (45) Usually, the value of FPR is very small. Receive operating characteristic (ROC) curve can be drawn according to TPR and FPR, and then the area under curve (AUC) can be obtained. The ROC curve reveals the trade-off between the false-alarm rate and detection probability: the closer the curve is to the left upper corner, the more robust is the target detection ability. In the same way, the AUC value increases with improving performance of the detection approach.

Parameter Analysis
Several key parameters are included in the proposed NTRS model, for instance sliding step, patch size, compromising parameters λ and β, and penalty factor µ. These parameters would affect detection ability and robustness of the model under different backgrounds. Thus, to achieve excellent performance in most cases, especially for real scenes, it is necessary to select appropriate parameter setting. We selected the parameter values through experiments and analyzed them in detail. In the experiments, other parameters were fixed while one was tuned. In Figure 10, we show the ROC curves of different parameters on six tested infrared sequences (Sequences 1-6).

Patch Size
Patch size not only has the critical influence on detection performance, but also affects computational time of the algorithm [8]. On the one hand, it is obvious that a smaller patch size could result in a lower computational complexity of NTRS model; however, it would weaken the sparsity of the target. On the other hand, a larger patch size is usually adopted to ensure that the target is sufficiently sparse to deal with the changing size target, but it would reduce the correlation between patches and relax the sparsity of patch too much, resulting in some clutter with sparse characteristics. To investigate the influence of patch size on the proposed model, the patch size was varied from 20 to 60 by taking 10 as a length of stride. The ROC curves on six tested sequences are shown in Figure 10a1-a6. From these ROC curves, we can observe that the detection performance of the patch size within 30-50 is generally acceptable and that the model achieves the best performance on the all sequences when 40 is set for the patch size. It is also observed that the detection performance decreases when the patch size is 20 or 60; especially, when the patch size is 60, the detection results are the worst. These demonstrate our previous analysis that both overlarge and too small patch size would result in low detection performance. Hence, we set the patch size to 40 in the experiment.

Sliding Step
The sliding step determines the number of frontal slices that we can obtain for constructing patch-tensor. Similar to the patch size, it would influence the detection performance and computation time simultaneously. A larger sliding step means there would be fewer frontal slices, which would reduce the operation time of Algorithm 1. However, a larger sliding step may degrade the redundancy of the original patch-tensor and undermine the nonlocal correlation of the background [17], which weakens the background clutter separation ability of the proposed model. On the contrary, a smaller sliding step would increase computation time of the model and weaken the sparsity of the target due to more frontal slices containing the target. In addition, the sliding step should not be larger than the patch size to avoid missing the target. To inquire into influence of the sliding step, we changed it from 20 to 40 with five intervals. Corresponding ROC curves for the six sequences are displayed in Figure 10b1-b6. Analyzing the ROC curves, we can conclude that the proposed model is robust to the variation of sliding step between 25 and 40. At the same time, we observe that, when the sliding step is set to 30, the proposed model achieves the optimal detection performance on all sequences. Thus, 30 is the best choice for the sliding step.

Penalty Factor µ
Penalty factor µ plays an important role in the optimization process of the model and controls the compromise between the sparse component and low-rank component. On the one hand, a smaller µ would preserve more details in the low-rank component and prevent more non-target interference from entering the sparse component. As a result, less background clutter remains in the target patch-tensor. However, details of the target may also be kept in the low-rank component, so that the small target suffers from over-shrinking or even is missed. On the other hand, the target details can be preserved by a larger; nevertheless, this may lead to more background clutter appearing in the target patch-tensor. For the best detection performance, it is necessary to select a suitable value of µ to keep a reasonable tradeoff between false-alarm rate and detection probability. Therefore, we varied the penalty factor µ from 0.001 to 0.005 with a step of 0.001. The ROC curves for the six tested sequences are shown in Figure 10c1-c6. It can be observed that the optimal detection performance is obtained in most instances when µ = 0.002. A larger or smaller value is not an optimal choice. Therefore, we used 0.002 as the best penalty factor value for our experiments.

Compromising Parameters λ and β
In NTRS model, the parameters λ and β regulate the compromise among sparse target tensor, structured sparse edge tensor and low-rank background tensor. It is necessary to fine tune them.
Although a large λ value would lead to the purer target image, small targets might be over-shrunk. In contrast, a small λ value can preserve more details of the target. However, background residuals would be kept in target image. Considering λ and β are used to constrain sparse (or structured sparse) components, they have similar characteristics. For simplicity, we first made β = k * λ and empirically chose k = 10.5. Next, to investigate the influence of λ on detection performance, similar to PSTNN model, we let λ = L/ max(m 1 , m 2 ) * m 3 and varied L from 0.35 to 1.15 with an interval of 0.15 instead of varying λ directly. ROC curves for the six tested sequences are illustrated in Figure 11d1-d6. From the illustrations, it can be observed that the detection result is relatively poor when L is too larger (L = 1.15) or too small (L = 0.35). Especially, when L = 1.15, we obtain the worst performance with some targets always being missed in each sequence. Meanwhile, the detection results show that the best performance is achieved for all sequences when L = 0.75. Thus, we used λ = 0.75/ max(m 1 , m 2 ) * m 3 in the following experiments.
background to complex background, from sky background with cloud to sea or ground background, and so on. Because of the changing distance from the target to the imaging sensor, the target size in the image is variable, from one pixel to dozens of pixels. Meanwhile, the brightness of the target in the image may also vary dramatically. Furthermore, the target may appear in different positions of the background. For example, to the sky background with cloud, small target may be above the cloud, in the cloud, or near the cloud edge. The twenty single frame images, which are shown in Figure 7, cover most of the above-mentioned cases and are very representative. Therefore, these images can be used to test the proposed method's robustness.
In Figure 11, we show the target images reconstructed by the proposed method for the twenty different scenes in Figure 7. Corresponding 3D mesh views of the reconstructed target images are shown in Figure 12. Figure 11 shows that the various types of background clutters are completely wiped out and desired targets are well preserved. Figure 12 shows that each small target is transformed into the most prominent spot of the 3D mesh view. Comparing Figures 8 and 12 (3D mesh views of original images), we can see more clearly that all the small targets have been enhanced markedly by the proposed approach. Meanwhile, various background clutters are effectively suppressed. We also show the target images processed by PSTNN model, which is the most advanced method for small infrared target detection in Figure 13. From the processed results, although PSTNN can detect all targets, some background clutters (marked by yellow square boxes) are still retained in the processed target images. Therefore, we conclude that our method can handle different targets and backgrounds, and is quite robust to various scenes. Figure 11. Twenty reconstructed target images by the proposed model. Red Squares indicate the detected true targets. (a-t) The processed results corresponding to the original images in Figure 7a-t, respectively. All the true targets are accurately detected without any false alarm.

Robustness to Various Scenes
Wide adaptability to various scenes is a crucial ability for target detection algorithm. Here, the various scenes mean three points: the diversity of background, the variability of target and uncertainty of target position in background. Infrared backgrounds are diverse, from uniform background to complex background, from sky background with cloud to sea or ground background, and so on. Because of the changing distance from the target to the imaging sensor, the target size in the image is variable, from one pixel to dozens of pixels. Meanwhile, the brightness of the target in the image may also vary dramatically. Furthermore, the target may appear in different positions of the background. For example, to the sky background with cloud, small target may be above the cloud, in the cloud, or near the cloud edge. The twenty single frame images, which are shown in Figure 7, cover most of the above-mentioned cases and are very representative. Therefore, these images can be used to test the proposed method's robustness.
In Figure 11, we show the target images reconstructed by the proposed method for the twenty different scenes in Figure 7. Corresponding 3D mesh views of the reconstructed target images are shown in Figure 12. Figure 11 shows that the various types of background clutters are completely wiped out and desired targets are well preserved. Figure 12 shows that each small target is transformed into the most prominent spot of the 3D mesh view. Comparing Figures 8 and 12 (3D mesh views of original images), we can see more clearly that all the small targets have been enhanced markedly by the proposed approach. Meanwhile, various background clutters are effectively suppressed. We also show the target images processed by PSTNN model, which is the most advanced method for small infrared target detection in Figure 13. From the processed results, although PSTNN can detect all targets, some background clutters (marked by yellow square boxes) are still retained in the processed target images. Therefore, we conclude that our method can handle different targets and backgrounds, and is quite robust to various scenes.  Figure 11. The images in (a-t) correspond to Figure 11a-t, respectively. In each subfigure, the x-axis and y-axis represent the column coordinate and row coordinate of the pixels of the original image, respectively, while the z-axis represents the gray value of the pixels. Figure 12. 3D mesh views of the reconstructed target images in Figure 11. The images in (a-t) correspond to Figure 11a-t, respectively. In each subfigure, the x-axis and y-axis represent the column coordinate and row coordinate of the pixels of the original image, respectively, while the z-axis represents the gray value of the pixels.
x y x y x y x y x y Figure 12. 3D mesh views of the reconstructed target images in Figure 11. The images in (a-t) correspond to Figure 11a-t, respectively. In each subfigure, the x-axis and y-axis represent the column coordinate and row coordinate of the pixels of the original image, respectively, while the z-axis represents the gray value of the pixels. Figure 13. Twenty processed target images by PSTNN model. The images in (a-t) correspond to the original images in Figure 7a-t, respectively. Red Squares denote the detected targets. Yellow rectangles and squares denote the residual clutters that may cause false alarms and their enlarged views are also shown to facilitate observation.

Anti-Noise Performance
In practical application, an infrared image is inevitably contaminated by various noises [1]. Thus, in addition to robustness for various scenes, anti-noise ability is another important indicator for infrared small detection method. The anti-noise performance of our model was evaluated via an experiment.
First, two infrared images (Scenes o and q) from Figure 7 were selected for our experiment and different Gaussian noises with mean 0 were added. We varied the standard deviation of the noise from 4 to 20 by an interval of 4 and obtained five image samples for each scene. Then, these image samples were processed to get the reconstructed target images by the proposed model. Experimental results of anti-noise are presented in Figures 14 and 15 for the selected two scenes, respectively. In Figures 14 and 15, the five images of the first row show the image samples added white Gaussian noise with standard deviation of 4, 8, 12, 16, and 20, respectively; the five images of the second row are the corresponding 3D mesh views of the image samples; the reconstructed target images by the proposed model; and their 3D mesh views are shown in third and fourth rows, respectively. From the experimental results, we can see that the proposed model significantly enhances the targets, and well suppresses the clutters and noise of the image samples simultaneously, if the standard deviation of noise is less than or equal to 16. As the standard deviation of noise increases to 20, it can be observed that our model still can detect the two targets, as shown in Figures 14e3 and 15e3. An interference point (marked by the yellow square box) appears in Figure 14e3, but the little flaw is acceptable, because, in the 3D mesh view (Figure 14e4), we can observe that the grayscale of the true target is far much higher than that of the interference point. In other words, the above-mentioned experiment shows the robustness of our method with respect to the noise. experimental results, we can see that the proposed model significantly enhances the targets, and well suppresses the clutters and noise of the image samples simultaneously, if the standard deviation of noise is less than or equal to 16. As the standard deviation of noise increases to 20, it can be observed that our model still can detect the two targets, as shown in Figures 14(e3) and 15(e3). An interference point (marked by the yellow square box) appears in Figure 14(e3), but the little flaw is acceptable, because, in the 3D mesh view (Figure 14(e4)), we can observe that the grayscale of the true target is far much higher than that of the interference point. In other words, the above-mentioned experiment shows the robustness of our method with respect to the noise.  (a2-e2) 3D mesh views of the image samples; (a3-e3) the reconstructed target images; and (a4-e4) 3D mesh views of the reconstructed target images. Red Squares denote the true targets detected. Yellow square denotes the residual clutter in the reconstructed target image. In (a2-e2,a4-e4), the horizontal plane represents the coordinate location of the image pixels, and the height dimension represents the gray value of the pixels. (a2-e2) 3D mesh views of the image samples; (a3-e3) the reconstructed target images; and (a4-e4) 3D mesh views of the reconstructed target images. Red Squares denote the true targets detected. Yellow square denotes the residual clutter in the reconstructed target image. In (a2-e2,a4-e4), the horizontal plane represents the coordinate location of the image pixels, and the height dimension represents the gray value of the pixels.

Quantitative Evaluation
In this subsection, the small target detection advantages of our method are further evaluated by quantitative indices including BSF, GSCR, ROC curve, computational time, etc. The six actual sequences of Table 1 were used as test set. Meanwhile, the six baseline methods were also tested and

Quantitative Evaluation
In this subsection, the small target detection advantages of our method are further evaluated by quantitative indices including BSF, GSCR, ROC curve, computational time, etc. The six actual sequences of Table 1 were used as test set. Meanwhile, the six baseline methods were also tested and compared with our method.
In Table 3, BSF and GSCR values of the tested seven methods for the representative images of Sequences 1-6 are shown, where INF, namely infinity, indicates that the backgrounds of target neighborhood are completely swept away and shrink to zeros. It should be noted that we do not show GSCR and BSF values of IPI for the 29th frame of Sequence 5 and of RIPT for the 18th frame of Sequence 2, because the gray values of target pixels shrink to zeros in the processed images. In this case, the target cannot be detected, thus it is meaningless to calculate GSCR and BSF. Table 3 shows that our method has achieved the highest GSCR and BSF values on each sequence, showing the excellent ability in terms of target enhancement and background suppression. In general, we observe that the three methods (RIPT, PSTNN, and ours) combining local and nonlocal priors have achieved higher values than the three methods (top-hat, LCM, and MPCM) based on local prior only. IPI method based on non-local prior gets an intermediate value in most sequences. These indicate that combining local and non-local priors can dig out more useful image information to improve detection performance. In addition, RIPT and PSTNN have achieved the best results as the proposed method in most cases, but it can be observed that RIPT would shrink the target pixels to 0 for the 18th frame of Sequence 2 (actually, there are similar results in some other frames of this sequence), resulting in the target not being detected, and that PSTNN compared with the proposed method is slightly worse for Sequence 6, which contains extremely complex ground background. This illustrates the superiority of our method.
BSF and GSCR only reflect the background suppression and target enhancement abilities of detection method in a local region, not global image [23]. Therefore, to further verify the comprehensive detection performance of our algorithm, we obtained the ROC curves of those for the six sequences through experiments and compared them with the six other methods. In Figure 16, all the ROC curves are illustrated, and the corresponding AUC values are shown in Table 4. The ROC curve that is the closer to the upper left corner and has the larger AUC value represents the better detection performance. In Figure 16 and Table 4, it can be observed that the detection performance of Top-hat is the worst in most sequences (except for Sequence 4), due to that the simple structural elements cannot handle the complex background and varying target. LCM based on local contrast performs slightly better than top-hat. MPCM is a derived version of LCM, thus it has great performance improvement; however, its performance is still worse than PSTNN and the proposed method for all sequences. IPI works well, but its performance is fluctuating. RIPT can achieve good results in most cases, but its performance is slightly worse on Sequence 2, because it would over-shrink the target in some cases. Overall, PSTNN and the proposed method have achieved the two best results, but it can be clearly seen that our method almost invariably has maximal TPR value in respect of the identical FPR and has the largest AUC value. All these demonstrate the proposed method has the most robust detection performance.   Notes: Bold and underline represent the highest value and the second highest value in each sequence, respectively.
In addition to good detection ability, computational time is an important factor to be considered for infrared small target detection. Thus, the average single frame computational time of each method on six real sequences was also tested, and the test results are shown in Table 5. We can observe that our method is slightly slower than the filtering methods (Top-hat, LCM and MPCM) utilizing local prior merely, but much faster than the similar optimizing methods (IPI, RIPT, and PSTTN) based on sparse and low-rank decomposition on all sequences, except for PSTNN in Sequence 1. Considering that the detection ability of our method greatly outperforms those filtering methods, it is obviously acceptable. IPI model is the slowest, resulting from high time-consuming accelerated proximal gradient (APG) approach [71], while the other optimizing methods are based on faster ADMM. Moreover, the processing time of RIPT and PSTNN fluctuate greatly for different scenes, while our method is relatively stable. This is also very important for practical application systems.  Notes: Bold and underline represent the highest value and the second highest value in each sequence, respectively.
In addition to good detection ability, computational time is an important factor to be considered for infrared small target detection. Thus, the average single frame computational time of each method on six real sequences was also tested, and the test results are shown in Table 5. We can observe that our method is slightly slower than the filtering methods (Top-hat, LCM and MPCM) utilizing local prior merely, but much faster than the similar optimizing methods (IPI, RIPT, and PSTTN) based on sparse and low-rank decomposition on all sequences, except for PSTNN in Sequence 1. Considering that the detection ability of our method greatly outperforms those filtering methods, it is obviously acceptable. IPI model is the slowest, resulting from high time-consuming accelerated proximal gradient (APG) approach [71], while the other optimizing methods are based on faster ADMM. Moreover, the processing time of RIPT and PSTNN fluctuate greatly for different scenes, while our method is relatively stable. This is also very important for practical application systems.

Conclusions
In this paper, a patch-tensor model with non-convex tensor rank surrogate, local contrast energy, and structured sparse regularization, namely, NTRS, is proposed for small infrared target detection. In the proposed model, the Laplace function based non-convex approximation for 1 -norm of the tensor multi-rank is utilized to measure the low-rank characteristic of the background patch-tensor, so as to separate target and background more robustly. Considering that the local prior map can be equivalent to the saliency map, the local contrast energy feature as prior information is introduced to weight sparse target patch-tensor. To further wipe out those stubborn rare structures, an additional tube-wise sparse tensor regularization term is introduced into our detection model. Finally, target detection task is formulated as a separation problem of low-rank, sparse, and structured sparse tensors, which is solved via an optimization algorithm based on ADMM and the fast t-SVD. In addition, the key parameters of the model can be selected appropriately through experiments. Extensive experimental results of various scenes and real sequences verify that our approach is robust and achieves excellent detection ability, and outperforms the other advanced approaches. Moreover, our approach is competitive in processing time with respect to other widely used methods. In the future, we can further explore detection techniques based on local contrast and merge them into IPT detection framework, such as multi-scale technique.