Dim and Small Target Detection with a Combined New Norm and Self-Attention Mechanism of Low-Rank Sparse Inversion

Methods for detecting small infrared targets in complex scenes are widely utilized across various domains. Traditional methods have drawbacks such as a poor clutter suppression ability and a high number of edge residuals in the detection results in complex scenes. To address these issues, we propose a method based on a joint new norm and self-attention mechanism of low-rank sparse inversion. Firstly, we propose a new tensor nuclear norm based on linear transformation, which globally constrains the low-rank characteristics of the image background and makes full use of the structural information among tensor slices to better approximate the rank of the non-convex tensor, thus achieving effective background suppression. Secondly, we construct a self-attention mechanism in order to constrain the sparse characteristics of the target, which further eliminates any edge residuals in the detection results by transforming the local feature information into a weight matrix to further constrain the target component. Finally, we use the alternating direction multiplier method to decompose the newly reconstructed objective function and introduce a reweighted strategy to accelerate the convergence speed of the model. The average values of the three evaluation metrics, SSIM, BSF, and SNR, for the algorithm proposed in this paper are 0.9997, 467.23, and 11.72, respectively. Meanwhile, the proposed detection method obtains a higher detection rate compared with other algorithms under the same false alarm rate.


Introduction
The weak target detection algorithm in infrared is an important technology for infrared warning and guidance. In real-world scenarios, the imaging distance of the target is far, the imaging area is small, and the radiation energy is low, which makes the target easily affected by noise and clutter interference. Therefore, how to separate weak targets from complex backgrounds is a challenging research topic, and many scholars have conducted in-depth research on this issue [1,2].
Traditional spatial-temporal filtering methods are one of the mainstream weak target detection algorithms in infrared, which separate the target and the background by local feature information of the image in the spatial-temporal domain. There are two methods of temporal filtering: single-frame detection algorithms and multi-frame detection algorithms. Multi-frame detection algorithms are mainly three-dimensional matching filtering algorithm, projection transformation algorithm, dynamic programming algorithm, pipeline filtering algorithm, etc. Common single-frame detection algorithms include top-hat filtering algorithm, gradient reciprocal filtering algorithm, etc. The principle of three-dimensional matching filtering is to set a three-dimensional filter on all possible motion trajectories of the target, and then analyze and judge the collection results of the filter. The three-dimensional filter with the highest signal-to-noise ratio is determined as the weak target and motion trajectory. However, for motion-mismatched targets, multiple velocity filters need to be set to capture the target trajectory, which leads to an increase in the computational complexity of the detection algorithm [3]. The projection transformation projects multidimensional data into a low-dimensional space, and obtain the motion trajectory of the target from the low-dimensional data. Since this algorithm appears to reduce the dimensionality of the data during the calculation, the data of the samples to be processed and the suspicious target signals are greatly reduced, which simplifies the computation. Compared with the multidimensional matching filtering, reduced computation is accompanied by poorer target detection result [4,5]. In the study of small target detection, Wang et al. [6] first used feature triangles to perform registration on infrared images and reduced the computation by using the star point coordinate matrix method. Then, the maximum value projection transformation method was used to process the sequence images and, therefore, obtain the result. The idea based on dynamic programming is to equivalently treat the accumulated energy of the detected small target on a certain trajectory as the decision function in the theory of dynamic programming. The motion range of the target at different stages is regarded as the decision space. Then, by recursive methods, the motion trajectory that can make the decision function achieve global optimization is found [7]. Guo et al. [8] proposed a dynamic programming method based on parallel computing, which performs a parallel search on the suspicious candidate target area to obtain the candidate target points. This method effectively improves the detection efficiency, but its detection performance decreases as the target signal-to-noise ratio gradually decreases. The principle of the pipeline filtering algorithm is that when the target signal is sampled at an appropriate frequency, the motion characteristics of the target signal in space are continuous in time, while random noise does not have continuity. Dong et al. [9] proposed a method that combines a modified visual attention model (VAM) with anti-jitter pipeline filtering algorithm. In this method, the best mode is adaptively selected to calculate the saliency map. Then, a local saliency singularity evaluation strategy is designed to automatically extract suspicious targets and suppress background clutter. Finally, the real target is distinguished by anti-vibration pipe filtering. This method can improve the efficiency of target detection in infrared imaging under different weather conditions. Although many improved pipeline methods have been proposed, it is difficult to achieve good detection performance when the motion of small targets has complex and varied characteristics. The top-hat filtering algorithm models the background of the image by using a structural element as a template for opening and closing operations on the image, and obtains the final difference map. Bai et al. [10] combined multi-scale operations with structural elements, which predict the image background by different scale structural elements. The top-hat filtering algorithm causes a loss of target energy intensity during opening and closing operations, resulting in poor detection in scenes with low target energy. Li et al. [11] established a correlation coefficient function based on the local feature information of the image, and introduced it into the gradient reciprocal filter algorithm, which can obtain an improved background estimation. Due to the complex and variable gray-scale characteristics of the image background, the filtering template of the gradient reciprocal filter algorithm cannot completely match the background features, resulting in a large number of edge residues in the detection results. Most real-world scenarios contain a lot of noise and clutter, and the noise signal is only slightly different from the weak target in terms of texture and energy. It is difficult to fully distinguish noise from the target based on only local feature information in the image. Therefore, traditional spatio-temporal filtering algorithms are easily interfered by noise and clutter, leading to low detection rates.
In the study of infrared faint target detection with deep learning, in order to effectively improve the generalization ability of the deep neural network detection model, Dai et al. proposed a special ACM detection model based on the construction of a high-quality annotation data set. There is an asymmetric upper and lower information modulation model for small infrared target detection [12]. This model not only realizes forward and backward information transfer to fusion, but also supplements a top-down information modulation module based on point-oriented channel attention, which enhances the fusion of target signals. The DNAnet network proposed by Li et al. completes the feature fusion of the target through the repeated use of image features [13], making full use of the weak information of the target. In addition, the model also includes the constructed dense interaction module and channel-attention module, which realizes the fusion of target features and the enhancement of adaptive features, and has achieved remarkable results in the detection of small infrared targets. In order to solve the problems of detection loss (MD) and false alarm (FA) more effectively, Wang et al. proposed an adversarial network consisting of two generators and one discriminator to extract target feature information [14]. This model distributes detection tasks to different training models, which improves the overall detection efficiency of the model. To sum up, the weak and small target detection model of deep learning has achieved remarkable results, but considering the limitations of the generalization ability of the model brought about by the multiple changes in the air and space detection environment, it is necessary to start from the data set construction and model construction. From the perspective of research, increase the practical application of deep learning detection models in the detection of small and weak infrared targets.
In recent years, scholars have paid more attention to the information about the entire image. They have transformed the problem of detecting targets into the matrix reconstruction problem through machine learning theories, and solved this problem using the overlapping direction multiplier method to obtain the detect result. Gao et al. [15] proposed an infrared patch-image (IPI) model that utilizes the correlation characteristics of non-local image background. Wang et al. [16] suppressed strong background edge contours by combining anisotropy and L1 norm on the basis of the IPI model. Xue et al. [17] suppressed strong background edge contours by combining anisotropy and the L1 norm. Xue et al. used the weighted L1 norm to constrain the target and added constraints on the noise using L1,1 and L2,1 norms to eliminate the background residue. However, in a complex background, the simple nuclear norm cannot fully constrain the background. These methods are unable to fully recover the background image, causing unsatisfactory detection results. Compared with two-dimensional data, high-dimensional data have more structural information, making it easier for us to explore hidden relationships between data. Therefore, Dai et al. [18] proposed an infrared patch tensor (IPT) model to extend the two-dimensional image matrix to three-dimensional, fully exploring the spatial correlation of non-local areas of the image, and using the prior information to constrain sparse targets through re-weighting. Zhang et al. [19] further suppressed the background using partial sum of the tensor nuclear norm (PSTNN), and then used the corner descriptor function and the background descriptor function to obtain prior information of the image to jointly constrain the target using the L1 norm. Wang et al. [20] constructed a new spatio-temporal tensor model, non-overlapping patch spatial-temporal tensor (NPSTT) model, based on the IPT model to avoid redundant information. The method also obtains better target constraint results with the help of tensor upper bound nuclear norm. Kong et al. [21] approximated the non-convex tensor fiber rank using Log tensor fibered nuclear norm (LogTFNN) on the basis of the IPT model, and then used hypertotal variation (HTV) to suppress noise. Cao et al. [22] defined a mode-k1k2 extension tensor tubal rank (METTR) and fully preserve the spatial correlation of non-local areas in images by METTR-norm.
During the actual process of target detection, the background is complex, often accompanied by strong edge contours and strong noise. Merely using the information about the entire image cannot completely suppress the strong edges, leading to residual background in the detection results. The IPT model constructs background weight function based on the characteristics of the structural tensor, but ignores the information of the target, leading to excessive contraction of the target. The PSTTN model proposes a structural tensor descriptor function for both the background and the corner points based on the RIPT model, while retaining the prior information of both the background and the target. However, because the prior information related to the background is only based on the maximum value of the eigenvalues, it cannot completely suppress the contours of strong edge contours, resulting in some edge contours being retained in the final local feature prior information.
In summary, in order to solve the limitations of existing detection models, we propose a low-rank sparse inversion method for detecting small targets by combining a new norm with self-attention mechanisms.
1. To better constrain the low-rank tensor, we approximate the tensor rank using a new tensor nuclear norm that is non-convex. The new tensor nuclear norm preserves the structural information between the various slices of the tensor and can accurately recover low-rank backgrounds under certain interference. Moreover, the high-order tensor is reduced during the operation, reducing the computational complexity of the algorithm. 2. We construct a self-attention mechanism model to constrain the sparse characteristics of the target. This model obtains local feature information of the image to suppress strong edge contours and strong noise and transforms the local feature information into a weight matrix to introduce the constraint of the sparse component of the infrared small target detection model. 3. We design an overlapping direction multiplier method to solve the infrared small target detection model and obtain the target image. In the solving process, we use a re-weighting strategy to accelerate the convergence speed and accuracy of the model.

Introduction to the Principle of Tensor Singular Value Decomposition (T-SVD) [23]
We define a invertible linear transformation L : R n 1 ×n 2 ×n 3 → R n 1 ×n 2 ×n 3 , for any invertible transformation L, we have L(0) = L −1 (0) = 0, and obtaining A by linearly transforming A along the third dimension, as shown in: The symbol × 3 in the formula represents the mode-3 product. The linear transformation L ∈ R n 3 ×n 3 is any invertible matrix. Under any invertible linear transformation L, the singular value decomposition of A ∈ R n 1 ×n 2 ×n 3 is given by: In the above equation slice of U , S, V, A ∈ R n 1 ×n 2 ×n 3 , respectively. Where U ∈ R n 2 ×n 2 ×n 3 and V ∈ R n 2 ×n 2 ×n 3 are the orthogonal tensor, S ∈ R n 2 ×n 2 ×n 3 is the diagonal tensor. The inverse mapping of (3) Figure 1 and Algorithm 1 demonstrate the process of tensor singular value decomposition. As S and S are F-diagonal tensors, we can define the tensor tube rank using their non-zero tube counts [24].

Algorithm 1 T-SVD under linear transform L
Input: A ∈ R n 1 ×n 2 ×n 3 and invertible linear transform L; Step 1. Compute A = L(A); Step 2. Frontal slicing of image A and singular value decomposition of each frontal slice combined with Equation (2) to obtain U , S and V Step 3. The inverse linear transformation of the three tensors is calculated by Equation (3) to obtain U , S and V Output: U , S, V

Constructing New Tensor Nuclear Norm Constrained Background Low Rank Properties
In the absence of noise, the infrared weak small target detection model can be transformed into a Tensor Robust Principal Component Analysis (TRPCA) model, which expression is: where D, A, E represent the original image tensor of the sequence, the background image tensor of the sequence, and the target image tensor of the sequence, respectively. λ is a positive parameter. Considering that the TRPCA model mentioned above is disrupted by the strong edge contours and noise in the environment with many strong noise points, causing the bad detection result, we have made improvements to the above model. Firstly, we introduce the tensor nuclear norm based on linear transformation to replace the original tensor nuclear norm, which constrains the image background globally. We use Algorithm 1 based on linear transformations of tensor singular value decomposition and tensor t-product to define tensor spectral norm, and then derive the new tensor nuclear norm by the property of tensor spectral norm as the dual norm of tensor nuclear norm. The definition of the new tensor nuclear norm is as follows [25,26]: where > 0 is a constant. A ∈ R n 1 ×n 2 ×n 3 , A ∈ R n 1 n 3 ×n 2 n 3 are the block diagonal matrix, and its i-th block on the diagonal is the i-th slice A (i) of A. The new tensor nuclear norm is able to approximate non-convex tensor rank well, while preserving the structural information between each tensor slice, and achieving dimensionality reduction for highorder tensors with low time complexity and high computational efficiency. Therefore, this paper adopts the new tensor nuclear norm to build the detection model. The model expression of the new tensor nuclear norm is as follows, arg min

Constructing Self-Attention Mechanisms to Constrain the Sparsity of the Target Feature
The detection model constructed based on non-local feature information of images can well recognize complex edge contours in the background of images. However, it cannot completely suppress some strong edge contours and strong noise points, lead to residual edge contours and noise in the detection results. Therefore, many scholars use traditional filtering or human visual saliency methods to obtain local feature information of images, and then convert the local feature information into a weight matrix and introduce it into the infrared weak small target detection model constructed by nonlocal feature information. Scholars have found that in infrared images containing weak targets, the high-order singular values obtained through SVD decomposition usually reflect the noise in the image, the middle-order singular values reflect the targets in the infrared image, and the low-order singular values reflect the background in the infrared image [27]. Based on these characteristics, the singular value decomposition of the image matrix in this article can achieve background suppression. Assuming that the rank of matrix X ∈ R m×n is r ≤ min(m, n), the singular value decomposition of matrix t can be expressed as follows In the above equation, U ∈ R m×m and V ∈ R n×n are orthogonal matrices, which are, respectively, called left and right singular value vector matrices, and ∑ ∈ R m×n is the singular value matrix. Since the high-order and low-order singular values reflect the noise and background of the infrared image, respectively, the target image can be obtained by directly truncating the high-order and low-order singular values of the singular value matrix through the double threshold method [27]. Assuming that the truncated singular value matrix is ∑ , the SVD filtering result can be calculated by the following equation The result of the singular value filtering obtained through the above thoughts is shown in Figure 2b, and its global three-dimensional diagram is shown in Figure 2c. From Figure 2, we can see that using the double threshold rule alone cannot accurately determine the high-order and low-order singular values of the infrared image matrix, which leads to the retention of a large number of edge contours and noise residues in the detection results. Therefore, we have developed a target sparse constraint algorithm based on self-attention mechanism to further suppress the edge contours and noise residues in the SVD filtering results.
In cognitive science, due to the bottleneck of information processing, humans often focus on important information while ignoring unimportant information, a mechanism called the self-attention mechanism. The Self-attention mechanism usually focuses on important information with high weights and ignores irrelevant information with low weights.
Moreover, the weights can be continuously adjusted to obtain important information under different circumstances [28]. The core formula of the self-attention mechanism is: where Q, K, V represent the Query, Key, and Value matrices, respectively. d k denotes the dimension size of matrix Key, and So f tmax is the normalization function. From analyzing the information of the filtered image, it can be seen that there are only a few edge contours and a small amount of noise residues present in the image after filtering. We replace Q, K, V with non-overlapping infrared block image matrices X t , the transpose of non-overlapping infrared block image matrices X T t , and non-overlapping infrared block image matrices X t , respectively. Since the highest correlation between X t and X T t is in the target area, we can achieve target extraction and suppression of edge contours through the idea of self-attention mechanism. The formula established in this paper based on the self-attention mechanism for non-overlapping infrared image blocks is as follows: where X t denotes the non-overlapping infrared block image matrix, * and denote the multiplication of the matrix and the hadamard product of the matrix, respectively, and So f tmax() is the normalized function. We found that the background residues in the filtered results always appear as stripes, while the targets appear as dots. Using block image matrices X t * X T t , we can enhance the energy of the targets and suppress edge contours. We then normalize the results of the block matrix X t * X T t to use them as weight matrices to constrain the original block image matrix, thereby suppressing the background edge residues. The process of constructing a self-attention mechanism to constrain the sparse characteristics of the target is shown in Algorithm 2.

Algorithm 2 Target sparse constraint algorithm based on self-attention mechanism
Input: The result X ∈ R m×n of SVD filtering; Step 1. The image matrix X is divided into block image matrices X p using a sliding window; Step 2. The operation result X p of the self-attention mechanism is obtained by calculating each block image matrix using Equation (10); Step 3. The final result matrix X is obtained by concatenating the operation result block matrices X p of the self-attention mechanism; Output: The result X of the target sparse constraint algorithm after applying the selfattention mechanism.
We construct the tensor W x ∈ R m×n×t by stacking together the results X of t-frame sequence images after the self-attentive mechanism, and then take the inverse of all elements of the tensor W x to obtain W l . To accelerate the convergence speed and improve the accuracy of the model, paper [29] proposed a re-weighting strategy which adds a coefficient weight in the model. The expression for this coefficient weight is: where c represents a positive constant and vs. represents a very small constant to prevent the denominator from being 0. By combining the re-weighting strategy with local feature tensors, we obtain the final local feature tensor as follows: the objective function of the final model is:

Solving of the Model
For the model represented by Equation (13), we usually use the overlapping direction multiplier method to solve it [30]. The augmented Lagrangian function for Equation (13) is: where Y denotes the Lagrangian multiplier, • denotes the inner product of the tensor, and µ > 0 denotes the penalty factor. We divide the augmented Lagrangian function into several sub-problems for solving. With regard to the sub-problem of A, in k + 1 iterations, For the above sub-problem, we use the Tensor Singular Value Thresholding (T-SVT) operator [25] for solving.
where S 1 = L −1 (L(S) − 1) + , here t + = max(t, 0). With regard to the sub-problem of E , in k + 1 iterations, For the above sub-problem, we use the threshold shrinkage operator for solving: where S τ (X ) = sign(X ) max(|X | − τ, 0), and sign(•) denotes the sign function. With regard to the sub-problem of Y, µ, in k + 1 iterations, where ρ, µ max are given constants. Finally, Algorithm 3 shows the entire process of the ADMM algorithm, and Figure 3 shows the entire process of the model.

NNSAM Model
Construction of joint image SVD and Self-Attention mechanism clutter suppression model t X 

Experiment and Analysis
In this section, we will further showing the advantages of our model from target detection ability, robustness to noise, and adaptability to multiple scenes. We will compare our algorithm with nine advanced algorithms in the weak object detection field on six sequences representing image detection effects [31]. We will also evaluate our algorithm using performance metrics such as SSIM, BSF, SNR, and ROC curves [32]. Figure 4 shows the six sequential images used in the experiments.

Parametric Analysis
In our proposed methods, the size of image patches in the sparse target-constrained algorithm based on the self-attention mechanism, as well as the regularization factor λ and µ in the infrared weak small target detection model, have a significant impact on the algorithm proposed in this paper. Therefore, in order to demonstrate the impact of parameters on the algorithm, we conducted separate analysis on these three parameters. The analysis results are as follows.

Patch Size
In the background-constrained algorithm based on self-attention mechanism, the size of image patches has a significant impact on the constraint effect of edge contours. Moreover, the size of image patches is related to the size of the target. When the target is larger, the size of image patches should also be larger. Figure 5a shows the relationship curve between the size of image patches and the global signal-to-noise ratio under sequence (a). In this paper, the size of image patches is 5.

Regularization Factor λ
λ is a regularization factor used to balance the sparse target components and low-rank background components in the model. When λ is too small, it can lead to incomplete separation of the low-rank components, resulting in more background edge contours in the detection results. When λ is too large, it can cause the targets to shrink excessively, affecting the detection performance of the model. Therefore, the value of λ has a significant impact on the model, and finding the best balance is a challenge. Figure 5b shows the relationship curve between the value of λ and the detection rate under sequence (a). In this paper, the regularization factor λ is 0.1.

Penalty Factor µ
The penalty factor µ is used to balance the T-SVT operator and the threshold shrinkage operator. The penalty factor can affect the convergence speed and solution accuracy of the model. When the penalty factor is too small, it will lead to excessive shrinkage of the target, resulting in a low detection rate. When the penalty factor is too large, it will cause inaccurate solution results, leading to false positives and false negatives. Figure 5c shows the relationship curve between the value of µ and the detection rate under sequence (a). In this paper, the penalty factor µ is 3 × 10 −4 .

Results of Background Suppression Experiments
To demonstrate the sparsity constraint effect of the self-attention mechanism algorithm proposed in this paper, we conducted experiments on the eight scenes and the experiments result are shown in Figure 6. The results suggested that the self-attention mechanismbased sparsity constraint algorithm can effectively suppress edge residual and noise in the background by dividing the image matrix into blocks and performing the operation in Equation (10), achieving good background suppression effect. Figure 6a,c,e, respectively, represent the original image, SVD filtering result and the result after being constrained by the self-attention mechanism, while Figure 6b,d,f, respectively, represent the threedimensional view of the original image, the three-dimensional view of the SVD filtering result, and the three-dimensional view of the result after being constrained by the selfattention mechanism.

Visual Comparison with Baselines
We compare the visual detection results of the algorithm in this paper with nine algorithms in eight different scenarios and the result shown in Figures 7-14. The Aadwcdd [33] algorithm uses the cumulative directional derivative to weight the mean absolute gray difference and eliminate the disadvantage of the decrease in target detection rate when facing strong structured edge backgrounds. However, we can find from Figures 7-14 that this algorithm cannot completely suppress the contours and noise in the complex backgrounds, resulting in more background contours and noise in the detection results. The ASTTV [34] algorithm proposes a non-convex approximation tensor rank method that adaptively distributes singular value weights with the help of the Laplace function, and further constrains the background by combining asymmetric spatial-temporal total variation (ASTTV) as a regularization term. However, we can find from Figures 7-14 that this algo-rithm cannot effectively suppress the background when facing strong edge contours and large areas of strong background only by using the global feature information of images. The DPSRG [35] algorithm extracts targets by peak density search and maximum area growth method and can effectively suppress the background. However, we can find from Figure 11 that the DPSRG algorithm cannot separate the target when facing low-energy backgrounds that are submerged in contours, resulting in poorer detection. The GST [36] algorithm models small targets using the second conjugate symmetric derivative of the two-dimensional Gaussian on the directional field. We can find from Figures 9 and 10 that this algorithm cannot detect target points under low-energy target scenes. The FKRW [37] algorithm uses the local contrast descriptor (NLCD) to achieve clutter suppression and target enhancement. We can see from Figures 7-14 that NLCD cannot completely inhibit the edge contours under strong edge contours and strong noise conditions. This leads to the appearance of edge residues in the detection results. The PSTNN [19] algorithm and the VNTFR [21] algorithm combine the local prior information and global features of the image to achieve weak target detection. We can find from Figures 8-14 that there are a small number of edge contours in the difference map, which is due to the incomplete suppression of the background by the local feature prior information of the PSTNN and VNTFR algorithms, making it impossible to completely constrain the target in the detection model, resulting in a small amount of background residues in the detection results. The SRWS [38] algorithm achieves background suppression by overlapping edge information (OEI). However, we can see from Figure 9 that there are still many edge residues in the detection results under the low-energy target condition. The principle of the top-hat algorithm uses a structural element as a template to perform opening and closing operations on the image to obtain the background model. However, simple opening and closing operations cannot suppress complex backgrounds and noise when facing strong edge contours and noise in complex scenes, leading to more background residues in the detection results. From Figures 7-14, we can find that our proposed algorithm has certain advancements compared with the remaining nine algorithms. From the 3D view, we can find that our algorithm completely suppresses the background, eliminates false alarms caused by noise, indicating that our algorithm has excellent noise robustness and background suppression ability.

Quantitative Evaluation
In this section, we quantitatively evaluated our algorithm and the other nine comparative algorithms using SSIM, BSF, SNR, and ROC curves. The results of the calculation of SSIM, BSF, and SNR for our algorithm and nine classic algorithms under six sequential images are shown in Tables 1 and 2. The bold text in Tables 1 and 2 represents the maximum value. The parameters used in the experiment were the best values in the corresponding literature. From Tables 1 and 2, our method can achieve better background estimation compared with other nine advanced algorithms. Moreover, the SNR of our method is the highest in Scene (d), Scene (e), and Scene (f), while in the other scenes, our algorithm can rank in the top three. From Figures 7-12, the detection results of SRWS, Aadwcdd, and FKRW have strong target energy, so they have larger SNR values. However, our method is superior to these three algorithms in terms of noise suppression.
We also plotted the ROC curves of each algorithm under the six sequential images, as shown in Figure 15. From Figure 15a-e, we can see that the ROC curve of our algorithm is included in the curves of other algorithms, and the area under the curve is the largest. Therefore, compared with other algorithms, our algorithm has the perfect results.

Algorithm Complexity and Computational Time
Real-time performance and accuracy are two important performance indicators for infrared weak small target detection systems. However, for weak small target detection algorithms, balancing real-time performance and accuracy has become a significant challenge. Traditional spatio-temporal filtering methods have a great advantage in real-time performance due to their simple computation process. However, they often need to improve the accuracy of detecting targets in complex environments with strong edge contours and noise. On the other hand, weak small target detection algorithms based on low-rank and sparse theory have better accuracy in different complex environments. However, due to the use of alternating direction method of multipliers to solve the detection model and complex operations such as SVD decomposition in the solving process, these algorithms often perform poorly in terms of real-time performance. In this paper, a new tensor nuclear norm based on linear transformation is proposed. Through linear transformation in the process of solving this nuclear norm, the original three-dimensional tensor is transformed into a two-dimensional matrix, achieving data dimensionality reduction and effectively reducing the running time of the algorithm. For a sequence tensor image of size n 1 × n 2 × n 3 , the algorithm proposed in this paper has a complexity of O n 1 n 2 n 2 3 + n (1) n 2 (2) n 3 , where n (1) = max(n 1 , n 2 ) and n (2) = min(n 1 , n 2 ). The running time of the proposed algorithm and nine other algorithms on six sequences is shown in Table 3.

Conclusions
In response to the shortcomings of existing weak target algorithms in an intricate environment, such as low noise robustness, weak background suppression capability, and a high false alarm rate, we propose a low-rank sparse inversion weak target detection method using a joint new paradigm and attention mechanism. In this method, we use an improved tensor nuclear norm to explore the hidden structural information between various modes of the tensor, which can better approximate the tensor rank while reducing time and computational complexity compared to PSTNN, SNN, TNN, etc. We also establish a target sparse constraint algorithm based on self-attention mechanism as a prior information constraint model for target components to achieve better background and target separation and reduce noise and residual background in detection results. Finally, we use optimization algorithms based on ADMM and linear transformation T-SVD to solve the detection model. Based on extensive experimental results in various scenarios, our method demonstrates robustness and outperforms other algorithms in evaluation metrics such as BSF, SSIM, SNR, ROC, etc.
In the sparse target-constrained algorithm based on self-attention mechanism, the image patch size used in this paper is fixed, which has certain limitations when dealing with targets of different sizes. Therefore, if a method can be established to adaptively adjust the image patch size, it will lead to more exciting results. At the same time, we will explore the application of attention mechanism in the time domain and establish a target sparse constraint method based on image temporal-spatial information, which will be integrated into TRPCA model to further improve target detection performance.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The abbreviations are used in this manuscript as following: