Local Spatial–Temporal Matching Method for Space-Based Infrared Aerial Target Detection

The feature of a space-based infrared signal is that the intensity of clutter is much stronger than that of an aerial target. Such a feature poses a great challenge to aerial target detection since the existing infrared target detection methods are prone to enhance clutter but ignore the real target, which results in missed detection and false alarms. To tackle the challenge, we propose a concise method based on local spatial–temporal matching (LSM). Specifically, LSM mainly consists of local normalization, local direction matching, spatial–temporal joint model, and inverse matching. Local normalization aims to enhance the target to the same strength as the clutter, so that the weak target will not be ignored. After normalization, a direction-matching step is applied to estimate the moving direction of the background between the basic frame and referenced frame. Then the spatial–temporal joint model is constructed to enhance the target and suppress strong clutter. Similarly, inverse matching is conducted to further enhance the target. Finally, a salience map is obtained, on which the aerial target is extracted by the adaptive threshold segmentation. Experiments conducted on four space-based infrared datasets indicate that LSM handles the above challenge and outperforms seven state-of-the-art methods in space-based infrared aerial target detection.


Introduction
The task of aerial target detection is of great importance in many fields, including air traffic surveillance [1] and intelligence reconnaissance [2]. Space-based infrared (IR) imaging technology has the advantages of all-day and wide-area imaging, and both on-orbit experiments [3] and ground-based theoretical research [4] have confirmed that an aerial target can be detected by space-based IR detectors. Therefore, research on space-based IR aerial target detection continues to attract much attention.
The sizes of the aerial targets in space-based IR images range from 5 × 5 to 9 × 9 pixels, which is in accordance with the definition [5] of an IR small target. However, space-based IR aerial target detection is considerably different from ground-based IR small target detection. First, a remote imaging distance (>300 km) weakens the intensity of aerial targets. Second, the complex earth background and frequent human activities generate strong clutter whose spatial characteristics are similar to those of the small target in space-based images. These two factors result in a feature in which the aerial target is much weaker than the clutter in the space-based IR image. As shown in Figure 1, both the aerial target and the clutter cover several pixels in the space-based IR image, while the intensity of the clutter is nearly twenty times stronger than that of the target. During detection, the weak target may lead to missed detections, and the strong clutter could yield false alarms. Third, computational resources are limited on the space-based platform, whereas they are unlimited on the ground-based platform, which means that a space-based detection method must be resource-friendly. Therefore, how to enhance weak targets and suppress strong clutter efficiently is a critical issue and the central challenge for space-based IR aerial target detection. In the past decades, researchers have proposed hundreds of IR small target detection methods, aiming for efficient detection under different scenarios. Nevertheless, most of these methods are proposed for ground-based detection instead of space-based detection.
Filter-based methods, such as TDLMS [6] and Top-hat [7], are easy to implement, but they struggle to enhance the weak target. Recently, some researchers have designed more complicated filters to detect small targets under specific backgrounds; for example, Lu et al. proposed a filter-based method for maritime IR small target detection [8]. The local contrast method (LCM) proposed by Chen et al. [9] has attracted much attention for its concise structure. A great number of LCM-based methods [10][11][12] working on ground-based platforms have subsequently been proposed to detect small targets under complex backgrounds. Moreover, most space-based detection methods, such as the local blob-like contrast map and local gradient map (LBCM-LGM) [13], neighborhood saliency map (NSM) [14], spatial-temporal local contrast method (STLCM) [2], and spatial-temporal local contrast filter (STLCF) [15], are LCM-based. Though these methods perform well on weak target enhancement, they suppress strong clutter inefficiently, leading to false alarms. In recent years, the mainstream detection methods have mostly been based on IR image patch (IPI), low-rank representation (LLR), and deep learning (DL). According to the different spatial correlations of target and background, some methods based on IPI [16][17][18][19] or LLR [20][21][22] have been proposed to extract the small target from the IR image. However, these methods cannot distinguish the real target from the background when the intensity of the target is close to that of its neighboring region. In addition, they show poor real-time performance since they require a great number of iterations during optimization. Other mainstream methods based on DL [23][24][25][26] perform effectively in complex backgrounds. However, the performance of DL-based methods relies heavily on their datasets; thus, they are not suitable for space-based detection because space-based datasets are currently scarce.
As far as we know, although a large number of infrared small target detection methods have been proposed, most are ground-based detection methods; space-based detection methods are far fewer. Most existing space-based detection methods are LCM-based because computational resources on space platforms are limited, while LCM-based methods are easy to implement in hardware and consume fewer computational resources than IPI- or DL-based methods. In 2018, a single-frame method called neighborhood saliency map (NSM) was proposed for space-based detection and detected a dim target with a signal-to-clutter ratio (SCR) of less than 1. The space-based detection methods based on the spatial-temporal local contrast filter (STLCF) [15] and the spatial-temporal local contrast method (STLCM) [2] are both LCM-based methods. Lv et al. proposed a method that detects a space-based weak moving target with an SCR ≈ 1 or even <1 [27]. They further proposed a dim small moving target detection and tracking method based on a spatial-temporal joint processing model (STJP) [28], which also performed well on space-based dim target detection. However, the existing space-based detection methods mainly focus on dim target enhancement but ignore the interference resulting from strong clutter.
Although current methods, both space-based and ground-based, achieve detection in complex backgrounds, they only handle conditions in which the target strength is close to or stronger than that of the clutter or the bright background. Therefore, it is important to overcome the space-based detection challenge posed by clutter that is much stronger than the aerial target.
To conquer the above challenge, we propose a space-based IR aerial target detection method based on local spatial-temporal matching (LSM), which has a concise structure. The contributions of LSM are given as follows.
(1) Local normalization is proposed to reduce the difference between the aerial target and strong clutter, which ensures that the weak target and strong clutter will be processed in subsequent steps within the same value domain.
(2) Local direction matching and a spatial-temporal joint model are constructed to suppress the strong clutter and enhance the aerial target by considering the spatial-temporal difference between the aerial target and the background.
(3) A reverse matching step is leveraged to further enhance the target and eliminate the residual clutter.
(4) Experiments conducted on space-based IR datasets demonstrate that LSM can enhance the weak target and suppress the strong clutter simultaneously and effectively, and that it performs better than the existing methods on space-based IR aerial target detection.

Proposed Methods
The local spatial-temporal matching detection method (LSM) is suitable for the IR image sequence obtained by a space-based platform under staring imaging mode. LSM consists of five steps: local normalization, local direction matching, spatial-temporal joint model, reverse matching, and adaptive threshold segmentation. The details of LSM are elaborated in this section, and an overview is given in Figure 2.

Local Slices Extraction and Normalization
The first step in the proposed method is local normalization, which reduces the sensitivity of the subsequent steps to the strong clutter. As shown in Figure 2, at the local normalization step, a local slice named $R_{11}$ is extracted at the point $(x, y)$ in the base frame $I_b$ (where $b$ represents the frame number in the sequence). The neighboring region of $R_{11}$, denoted $\Omega_{R_{11}}$, is defined in Equation (1), where $s$ is the radius of the target. In the reference frame $I_{b+l}$, the local region of the pixel $I_{b+l}(x, y)$, denoted $\Omega_{local}$, is defined in Equation (2), where $l$ represents the frame interval and $r_{match}$ represents the matching radius determined by practical engineering tasks. In our work, $r_{match}$ is set to 1; the range of $\Omega_{local}$ is illustrated in Figures 2 and 3. Then nine slices with the same dimensions as $R_{11}$ are extracted and named $R_{2m}$, $m = 1, 2, 3, \ldots, 9$. The positions of the slices $R_{2m}$ are further illustrated in Figure 4, where the yellow points are their centers.
The considerable difference between the target and strong clutter causes missed detections and false alarms. Thus, local normalization is designed to map the intensity into the range [0, 1]. In the definition of local normalization, given in Equations (3) and (4), $(g, h)$ represents the position within $R_{11}$ and the slices $R_{2m}$, $R_{nor1}(g, h)$ is the normalized value at the point $(g, h)$ within $R_{11}$, and $R^{m}_{nor}(g, h)$ is the corresponding normalized value within $R_{2m}$. After the local normalization, both the target and the clutter are processed within the value domain [0, 1] during the subsequent steps.
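The slice extraction and normalization described above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: it assumes that the normalization is a min-max scaling (the source only states that intensities are mapped into [0, 1]), that $(x, y)$ indexes (row, column), and that the nine reference slices are ordered row by row; the function names and the small constant `eps` are hypothetical.

```python
import numpy as np

def extract_slice(frame, x, y, s):
    """Return the (2s+1) x (2s+1) slice centred on row x, column y.

    The caller must keep (x, y) far enough from the image border.
    """
    return frame[x - s:x + s + 1, y - s:y + s + 1].astype(float)

def local_normalize(slice_, eps=1e-12):
    """Min-max scale a slice into [0, 1] (assumed form of the normalization)."""
    lo, hi = slice_.min(), slice_.max()
    return (slice_ - lo) / (hi - lo + eps)

def reference_slices(frame, x, y, s, r_match=1):
    """Slices centred on every pixel of the local region, in row-major order.

    With r_match = 1 this yields nine slices; the ordering is an assumption.
    """
    return [extract_slice(frame, x + dx, y + dy, s)
            for dx in range(-r_match, r_match + 1)
            for dy in range(-r_match, r_match + 1)]

# A weak target (amplitude 5) and strong clutter (amplitude 100)
frame = np.zeros((20, 20))
frame[5, 5] = 5.0
frame[14, 14] = 100.0
target_nor = local_normalize(extract_slice(frame, 5, 5, s=3))
clutter_nor = local_normalize(extract_slice(frame, 14, 14, s=3))
print(target_nor.max(), clutter_nor.max())
```

Under this assumed scaling, the weak target and the strong clutter both peak near one after normalization, which is the property the subsequent steps rely on.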

Local Direction Matching
When the space-based imaging system works in staring mode, the background, including the strong clutter, moves within a tiny area in the IR sequence. Therefore, the background can be assumed to move in a straight line over a short frame interval. Local direction matching is designed to determine which local slice $R_{2m}$ in $I_{b+l}$ is the most similar to $R_{11}$ in $I_b$. In this paper, a local matching function is designed to measure the matching degree, and the matching coefficient at point $(x, y)$ is also determined. In these functions, given in Equations (5)-(7), $r_m$ represents the matching degree between $R_{nor1}$ and $R^{m}_{nor}$, $r_1$ is the matching coefficient, and $m_{max}$ identifies the slice among the $R^{m}_{nor}$ that is most similar to $R_{nor1}$. As shown in Figure 3, if $m_{max} = 9$, $R^{9}_{nor}$ is the slice matching $R_{nor1}$, which means that the background moves from $(x, y)$ to $(x + 1, y + 1)$ during $[b, b + l]$, as illustrated by the green arrow at the local direction matching step.
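The selection of the best-matching slice can be illustrated as follows. The matching function used here (one minus the mean absolute difference of two normalized slices) is an assumption chosen only for illustration, since the exact form of Equations (5)-(7) is not reproduced in this text; `matching_degree` and `direction_match` are hypothetical names.

```python
import numpy as np

def matching_degree(a, b):
    """Assumed similarity in [0, 1]: one minus the mean absolute difference."""
    return 1.0 - float(np.mean(np.abs(a - b)))

def direction_match(base_nor, ref_nors):
    """Return (r1, m_max): the best matching degree and its 1-based index."""
    degrees = [matching_degree(base_nor, ref) for ref in ref_nors]
    best = int(np.argmax(degrees))
    return degrees[best], best + 1

# Eight unrelated slices plus a ninth that equals the base slice
rng = np.random.default_rng(0)
base = rng.random((7, 7))
refs = [rng.random((7, 7)) for _ in range(8)] + [base.copy()]
r1, m_max = direction_match(base, refs)
print(r1, m_max)
```

In this toy case the ninth slice is an exact copy of the base slice, so it is selected as the match; any other similarity measure with a clear maximum could be substituted without changing the structure of the step.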

Spatial-Temporal Joint Model
Once local direction matching is performed, strong clutter suppression and aerial target enhancement can be achieved by the spatial-temporal joint model. First, the difference slice $R_{dif}$ is obtained by local slice difference (Equation (8)), after which most of the background, including the clutter in $I_b$, is initially suppressed, even if the clutter is much stronger than the aerial target. The neighboring region of $R_{dif}$, denoted $\Omega_{R_{dif}}$, has the same range as $\Omega_{R_{11}}$ and is divided into an internal region $\Omega_{int}$ and an external region $\Omega_{ext}$, such that $\Omega_{int} \cup \Omega_{ext} = \Omega_{R_{dif}}$ and $\Omega_{int} \cap \Omega_{ext} = \emptyset$, where $\emptyset$ is the null set. The relationship between $\Omega_{int}$ and $\Omega_{ext}$ is illustrated in Figure 3, where the red region represents $\Omega_{int}$, and the rest of the blue rectangle represents the range of $\Omega_{ext}$. Then, the nonuniformity stripes resulting from inadequate preprocessing can be suppressed using $R_{int}$ and $R_{ext}$, the matrices constructed from the pixels in $\Omega_{int}$ and $\Omega_{ext}$, respectively.
If the target appears, a dipole containing positive and negative peaks is left in $R_{int}$. The dipole is highlighted by a pair of red circles in Figure 2. Thus, the dipole value $d_{dipole1}$ at $(x, y)$ is extracted. At this step, the clutter is further suppressed, while the aerial target is significantly enhanced by a quadratic operation. Finally, the value of local spatial-temporal matching between $I_b$ and $I_{b+l}$ is calculated, with $I_{v1}(x, y)$ representing the matching value at the point $(x, y)$.
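A rough sketch of the joint model may help to show why matched clutter vanishes while a moving target leaves a dipole. Every formula in it is an assumption made for illustration, because the source equations are not reproduced here: the difference slice is taken as the base slice minus the matched slice, stripes are suppressed by subtracting the mean of the external region, the dipole value is the product of the positive peak and the magnitude of the negative peak (one possible quadratic operation), and the result is weighted by the matching coefficient. The function name and the parameter `inner_radius` are hypothetical.

```python
import numpy as np

def joint_model_value(base_nor, matched_nor, r1, inner_radius):
    """Illustrative spatial-temporal matching value for one pixel.

    Expects square slices of odd size; inner_radius must be smaller than
    the slice radius so that the external region is not empty.
    """
    r_dif = base_nor - matched_nor                      # assumed difference slice
    c = r_dif.shape[0] // 2
    inner = np.zeros(r_dif.shape, dtype=bool)
    inner[c - inner_radius:c + inner_radius + 1,
          c - inner_radius:c + inner_radius + 1] = True
    # Assumed stripe suppression: remove the external-region mean
    r_int = r_dif[inner] - r_dif[~inner].mean()
    pos, neg = r_int.max(), r_int.min()
    # Assumed dipole value: nonzero only when both peaks are present
    dipole = pos * -neg if pos > 0 and neg < 0 else 0.0
    return float(r1 * dipole)

# A target that moved one pixel leaves a +1/-1 dipole in the difference
base = np.zeros((7, 7)); base[3, 3] = 1.0
moved = np.zeros((7, 7)); moved[3, 4] = 1.0
print(joint_model_value(base, moved, r1=0.9, inner_radius=2))

# Perfectly matched clutter cancels in the difference
clutter = np.zeros((7, 7)); clutter[3, 3] = 1.0
print(joint_model_value(clutter, clutter.copy(), r1=1.0, inner_radius=2))
```

Because the slices are normalized beforehand, the dipole amplitude in this sketch does not depend on whether the original object was weak or strong, only on whether it moved relative to the matched background.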

Reverse Matching
After obtaining $I_{v1}(x, y)$, reverse matching is added to LSM. As shown in Figure 3, at $(x, y)$, if the offset from $I_b$ to $I_{b+l}$ is in the direction indicated by the green arrow, the offset from $I_{b-l}$ to $I_b$ is in the direction indicated by the yellow arrow, where $I_{b-l}$ is another reference frame. In this case, the local backgrounds in $I_{b-l}$ and $I_b$ can be reverse-matched. At $(x, y)$ in $I_b$, the offsets of the local background from $I_{b-l}$ to $I_b$ and the corresponding local slice are determined by Equations (15)-(17), where $dy$ and $dx$ denote the offsets in the horizontal and vertical directions, respectively, and fix(·) denotes rounding to the nearest integer toward zero. As shown in Figure 3, the local slice in $I_{b-l}$, denoted $R_{31}$, is determined in the reverse matching step, and $\Omega_{R_{31}}$ denotes the neighboring region of $R_{31}$. The normalized slice of $R_{31}$ and the matching coefficient between $R_{11}$ and $R_{31}$ are obtained by Equations (3) and (5) and are represented by $R^{31}_{nor}$ and $r_2(x, y)$, respectively. Finally, the spatial-temporal joint model is constructed in the same way: the value of local spatial-temporal matching between $I_b$ and $I_{b-l}$ is calculated by Equations (8)-(14) and represented by $I_{v2}(x, y)$.
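The offset computation in reverse matching can be sketched as below. The mapping from $m_{max}$ to an offset assumes that the nine slices are numbered row by row, which agrees with the stated case $m_{max} = 9 \rightarrow (x + 1, y + 1)$ but is otherwise an assumption. The scaling by a separate backward interval (`l_backward`) is a hypothetical generalization that shows where a truncation like fix(·) would matter; in the paper both reference frames use the same interval $l$.

```python
import math

def forward_offset(m_max, r_match=1):
    """Map the 1-based slice index to (dx, dy), assuming row-major numbering.

    dx is the vertical (row) offset and dy the horizontal (column) offset.
    """
    side = 2 * r_match + 1
    row, col = divmod(m_max - 1, side)
    return row - r_match, col - r_match

def reverse_offset(m_max, l_forward, l_backward, r_match=1):
    """Offset of the matching slice in the earlier frame.

    The background is assumed to move in a straight line, so the forward
    offset is negated and scaled by the ratio of the frame intervals;
    math.trunc plays the role of fix(.), rounding toward zero.
    """
    dx, dy = forward_offset(m_max, r_match)
    scale = l_backward / l_forward
    return math.trunc(-dx * scale), math.trunc(-dy * scale)

print(forward_offset(9))                              # background moved to (x+1, y+1)
print(reverse_offset(9, l_forward=2, l_backward=2))   # same interval: mirrored offset
print(reverse_offset(9, l_forward=2, l_backward=1))   # half interval: truncated to zero
```

With equal intervals the reverse offset is simply the mirrored forward offset, which corresponds to the yellow arrow opposing the green arrow in Figure 3.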

Adaptive Threshold Segmentation
A mean filter is introduced to suppress noise (Equation (18)), where $d_{dif2}(x, y)$ denotes the value after mean filtering at point $(x, y)$. The saliency map $I_{map}$ is then obtained by Equation (19), where $I_{map}(x, y)$ is the map value at $(x, y)$. The results are shown in Figure 5; even though the target is much weaker than the clutter in Figures 1 and 5a, it is enhanced significantly, and the strong clutter is well suppressed. In $I_{map}$, clutter and background are suppressed, but the IR aerial target is retained and enhanced. Finally, the aerial target is detected by adaptive threshold segmentation with the threshold $T$ given in Equation (20), where std(·) represents the standard deviation operation and $k$ is the segmentation parameter. Experiments show that $k \in [10, 30]$ is effective. When the value of an element in $I_{map}$ is greater than $T$, it is set to one; otherwise, it is set to zero. The points set to one are the aerial target. The entire procedure of LSM is given in Algorithm 1.
Algorithm 1: LSM
for x = 1 : row do
    for y = 1 : col do
        Obtain the local slices $R_{11}$ and $R_{2m}$ by Equations (1) and (2);
        Obtain the normalized slices $R_{nor1}$ and $R^{m}_{nor}$ by Equations (3) and (4);
        Calculate the matching coefficient $r_1(x, y)$ and determine $R_{2m_{max}}$ by Equations (5)-(7);
        Construct the spatial-temporal joint model between $I_b$ and $I_{b+l}$ and calculate $I_{v1}(x, y)$ by Equations (8)-(14);
        Conduct reverse matching and obtain $R_{31}$ by Equations (15)-(17);
        Calculate the normalized slice of $R_{31}$ by Equation (3);
        Calculate the matching coefficient $r_2(x, y)$ by Equation (5);
        Construct the spatial-temporal joint model between $I_{b-l}$ and $I_b$ and calculate $I_{v2}(x, y)$ by Equations (8)-(14);
        Calculate the saliency map value $I_{map}(x, y)$ by Equations (18) and (19);
    end for
end for
Obtain the saliency map $I_{map}$;
Calculate the adaptive threshold $T$ by Equation (20);
Output the position of the aerial target.
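The final segmentation step can be illustrated with a short sketch. It assumes the common form $T = \mathrm{mean}(I_{map}) + k \cdot \mathrm{std}(I_{map})$; the source only states that the threshold involves std(·) and the parameter $k$, so the mean term and the function name are assumptions.

```python
import numpy as np

def adaptive_threshold_segment(i_map, k=20.0):
    """Binarize a saliency map with an assumed threshold T = mean + k * std."""
    t = i_map.mean() + k * i_map.std()
    mask = (i_map > t).astype(np.uint8)   # one marks a detected target pixel
    return mask, t

# A well-suppressed saliency map with a single strong response
rng = np.random.default_rng(0)
i_map = rng.normal(0.0, 0.01, size=(100, 100))
i_map[50, 50] = 10.0
mask, t = adaptive_threshold_segment(i_map, k=20.0)
print(t, int(mask.sum()))
```

Such a large $k$ is only workable when the saliency map is nearly flat apart from the target, which is consistent with the recommended range $k \in [10, 30]$ given that the preceding steps suppress clutter strongly.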

Experimental Condition and Evaluation Index
The datasets used for the experiments were four space-based IR sequences with different backgrounds, and the aerial targets were simulated targets with the same intensity distribution proportions as the real targets. The backgrounds and real targets were obtained from the same space-based system working under staring imaging mode. The details of the four sequences are given in Table 1. The speeds of the aerial targets ranged from 1.1 to 2.0 pixels/frame. The value of $l$ was set to two after the parameter analysis conducted in Section 4. To evaluate the detection effectiveness, LSM is compared with seven state-of-the-art detection methods: fusion saliency map (FSM) [10], double-neighborhood gradient method (DNGM) [11], neighborhood saliency map (NSM) [14], spatial-temporal local contrast filter (STLCF) [15], spatial-temporal local contrast method (STLCM) [2], spatial-temporal joint processing model (STJP) [28], and multiscale local target characteristics algorithm (MLTC) [29]. NSM, STLCF, STLCM, and STJP are existing space-based detection methods; FSM is a recently proposed method for low-altitude slow target detection, whose background is similar to that of space-based detection; and DNGM and MLTC are detection methods proposed in 2020 and 2021, respectively.
The evaluation indices are the background suppression factor (BSF), the gain of signal-to-clutter ratio (GSCR), detection rate ($P_d$), false alarm rate ($P_f$), and area under the curve (AUC). BSF is a global index for evaluating the performance of global background suppression and is defined in Equation (21) as $BSF = \sigma_{before}/\sigma_{after}$, where $\sigma_{after}$ and $\sigma_{before}$ are the standard deviations of the processed and raw images, respectively. GSCR is the index used to evaluate the target enhancement performance. In its calculation, $\mu_{tar}$ represents the mean of the target, $\mu_{bk}$ and $\sigma_{bk}$ are the mean and standard deviation of the background, respectively, and $SCR_{after}$ and $SCR_{before}$ are the SCR values of the processed and raw targets, respectively. The indices used to evaluate the detection effectiveness are $P_d$ and $P_f$. In their formulas, $N_{detected}$ is the number of real targets detected by the method, $N_{real}$ is the total number of real targets, $N_{false}$ is the number of falsely detected targets, and $N_{pixel}$ is the number of pixels. To visualize the detection effectiveness, a receiver operating characteristic (ROC) curve was drawn according to the relationship between $P_d$ and $P_f$. The area under the ROC curve is represented by AUC.
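The evaluation indices can be computed along the following lines. Only the BSF formula is stated explicitly in the text; the sketch assumes the usual definitions $SCR = |\mu_{tar} - \mu_{bk}| / \sigma_{bk}$, $GSCR = SCR_{after} / SCR_{before}$, $P_d = N_{detected} / N_{real}$, and $P_f = N_{false} / N_{pixel}$, and it integrates the ROC curve with the trapezoidal rule. All function names and mask arguments are hypothetical.

```python
import numpy as np

def bsf(raw, processed):
    """Background suppression factor: sigma_before / sigma_after."""
    return raw.std() / processed.std()

def scr(image, target_mask, background_mask):
    """Assumed SCR: |mu_tar - mu_bk| / sigma_bk (background must not be flat)."""
    background = image[background_mask]
    return abs(image[target_mask].mean() - background.mean()) / background.std()

def gscr(raw, processed, target_mask, background_mask):
    """Assumed SCR gain: SCR after processing over SCR before."""
    return (scr(processed, target_mask, background_mask)
            / scr(raw, target_mask, background_mask))

def detection_rates(n_detected, n_real, n_false, n_pixel):
    """Assumed P_d and P_f as plain ratios."""
    return n_detected / n_real, n_false / n_pixel

def auc(p_f, p_d):
    """Area under the ROC curve by the trapezoidal rule."""
    x = np.asarray(p_f, dtype=float)
    y = np.asarray(p_d, dtype=float)
    order = np.lexsort((y, x))            # sort by P_f, then by P_d
    x, y = x[order], y[order]
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))

rng = np.random.default_rng(1)
raw = rng.normal(10.0, 1.0, size=(32, 32))
raw[16, 16] = 15.0
target = np.zeros(raw.shape, dtype=bool)
target[16, 16] = True
enhanced = raw.copy()
enhanced[16, 16] = 40.0                   # target enhanced, background unchanged
print(gscr(raw, enhanced, target, ~target))
print(detection_rates(9, 10, 5, 10000))
print(auc([0.0, 0.0, 1.0], [0.0, 1.0, 1.0]))
```

In practice the background mask would usually be a local neighborhood around the target rather than the whole remaining image; the global mask is used here only to keep the example short.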

Experimental Results
The three views of the results corresponding to the different methods are given in Figures 6-9. The experimental results under two situations are exhibited. In the first situation, where the aerial target is much weaker than the clutter, as shown in the images from Seqs. 1, 2, and 4, STLCF, STLCM, FSM, NSM, and DNGM suppress most of the background in the images, but the clutter with strong intensities is still retained. STJP is sensitive to clutter. Only MLTC and our method can enhance the weak targets. However, MLTC also enhances the background component, which results in false alarms. The images in Seq. 3 show the second situation, in which the intensity of the small target is close to that of the strong clutter. Most methods, such as STLCF, STLCM, STJP, MLTC, and the proposed method, perform well on target enhancement, but STLCM, STJP, and MLTC also produce a great number of false alarms. According to the above comparison, our method is suited to suppressing the strong clutter and enhancing the weak target simultaneously. The quantitative comparison and analysis are given in the next part for more compelling results.
The results of BSF are listed in Table 2. LSM achieved the highest BSF value on Seq. 4 but had lower values than FSM or STLCM on the other three sequences because of the zero-setting operation in those two methods. In FSM, when the mean of the variance difference between the internal window and the external window is less than zero, the spatial variance saliency map value of a pixel will be zero, and the output will eventually be zero. In STLCM, the final value of a pixel will be set to zero if this pixel is not a local maximum point. In an IR image, the local maximum points are usually composed of the small target, clutter, and noise. Therefore, because of the zero-setting operation, the pixels around a local maximum point will be assigned a value of zero. It is clear that the more zero points the final saliency map has, the lower the value of $\sigma_{after}$ in Equation (21) will be, and the value of BSF will consequently increase.
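The effect of zero-setting on BSF can be checked numerically. The sketch below uses synthetic data and a hypothetical zero-setting rule (keeping only the top 1% of responses) that merely stands in for the rules of FSM and STLCM; it is not either method.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(100.0, 10.0, size=(128, 128))              # synthetic raw image
saliency = np.abs(rng.normal(0.0, 1.0, size=raw.shape))     # synthetic processed map

# Hypothetical zero-setting step: keep only the strongest 1% of responses
cutoff = np.quantile(saliency, 0.99)
zeroed = np.where(saliency >= cutoff, saliency, 0.0)

bsf_plain = raw.std() / saliency.std()
bsf_zeroed = raw.std() / zeroed.std()
print(bsf_plain, bsf_zeroed)
```

The zeroed map has a smaller standard deviation and therefore a larger BSF, even though the strongest residual responses (which may be clutter) are left untouched. This is why a high BSF alone does not show that strong clutter has been suppressed.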
It is worth noting that BSF is a global index evaluating the background suppression ability of a method over the whole image, but clutter accounts for only a small proportion of the background. Thus, taking the results in Figures 6-9 and Table 2 into consideration, STLCM and FSM fail to suppress the clutter even though they suppress the conventional background better than our method. The other methods received lower values than LSM on all sequences. These results indicate that most methods can suppress most of the background but have poor abilities to suppress the strong clutter in space-based IR images. STLCM and STLCF suppress the background by direct interframe difference, in which the weak target is suppressed but residual clutter still exists. MLTC, NSM, and DNGM fail to suppress clutter since it has a spatial distribution similar to that of the aerial target.
The average GSCR values are listed in Table 3. LSM achieves the best results on Seqs. 1, 3, and 4. STJP and STLCF struggle to enhance the target in the space-based IR image. STLCM, FSM, and NSM fail on Seqs. 1, 2, and 4 because of the considerable intensity difference between target and clutter. Only DNGM shows a better result than LSM, on Seq. 2, but it enhances the clutter even more, as shown in Figure 7. MLTC can enhance aerial targets that are much weaker than the clutter in space-based images, but our method performs better than MLTC, as shown in Table 3.
The results of detection effectiveness are shown by the ROC curves and AUC values in Figure 10 and Table 4. The $P_d$ values of LSM are more than 85% on the four sequences when $P_f$ is $10^{-4}$ and more than 97% when $P_f$ is $10^{-3}$. The AUC values of LSM on the four sequences are 0.9994, 0.9990, 0.9986, and 0.9995, respectively. However, the results of the other methods are unstable across the four sequences.
In Figure 10a,d, most AUC values of the compared methods are less than 0.8 because Seqs. 1 and 4 have backgrounds of sea and land, and the intensity of the clutter is at least 10 times stronger than that of the target. On Seq. 2, DNGM has an AUC value of more than 0.99 because it enhances the target better than LSM. However, the effectiveness of DNGM drops sharply when $P_f < 10^{-4}$ because DNGM enhances the clutter more than the target, and MLTC behaves the same way on all four sequences. On Seq. 3, the compared methods achieved better detection performance; five methods obtained AUC values of more than 0.99 because the intensities of the target are close to those of the clutter. The average AUC values are also calculated and given in Table 4; among the compared methods, MLTC receives the highest average value of 0.9534 because of its strong target enhancement ability, while the minimum value of 0.7345 belongs to the single-frame method NSM. The average AUC value of LSM is 0.9991, higher than those of the seven compared methods, indicating that LSM has the best detection effectiveness.
According to the experimental results, LSM performs better than the seven compared methods and can detect aerial targets more effectively in space-based IR sequences with different backgrounds. The results show that LSM can meet the challenge of enhancing the weak target and suppressing the strong clutter simultaneously.

Analysis and Discussion
Seqs. 1-3 contain 7 × 7 targets, and Seq. 4 contains 5 × 5 targets. According to the results, our method enhanced both kinds of targets significantly. The GSCR values in Table 3 reveal that our method enhanced the targets of different sizes considerably. As for the detection effectiveness, our method obtained $P_d > 98\%$ when $P_f$ reached $10^{-3}$ on all sequences. Meanwhile, the AUC values were no less than 0.9986. The above results indicate that our method maintains its effectiveness when detecting targets of different sizes.
To analyze the influence of the parameter $l$, we selected Seq. 1, in which the target speed is 1.55 pixels/frame, as the example and conducted experiments with different values of $l$. The results for BSF and GSCR are given in Table 5, and the detection effectiveness is shown in Figure 11. The results for $l$ and $-l$ differ little, but BSF and GSCR follow different trends as $|l|$ increases. As $|l|$ increases, the matching coefficient $r_1$ decreases, but the dipole becomes clearer; thus, the target can be further enhanced, but the background cannot be suppressed as well. The detection effectiveness shown in Figure 11 reveals that the detection results are similar when $l = \pm 2$ or $\pm 3$ and are better than the result when $l = \pm 1$. Therefore, the value of $l$ is recommended to be $\pm 2$, and it is set to two in this paper.
The segmentation parameter $k$ directly influences the detection effectiveness; the relationships between $k$ and the detection results are shown in Figure 12. As $k$ increases, both the detection rate and the false alarm rate decrease. The detection rates of the different experimental sequences showed similar trends and could be maintained above 90% when $10 \le k \le 30$. The variation trends of the false alarm rates of the test sequences are nearly the same as well. When $k \ge 10$, the false alarm rates of all the sequences are less than $10^{-3}$. To maintain a detection rate ≥ 90% and a false alarm rate ≤ $0.5 \times 10^{-3}$, the value range of $k$ in this method is recommended to be [10, 30], as given in Section 2.5.

Conclusions
This paper proposes a concise method, based on local spatial-temporal matching, for detecting an aerial target on a space-based IR platform. The experimental results show that, compared with existing methods, LSM exhibits better detection performance when the clutter is much stronger than the aerial target. However, LSM is currently only suitable for the staring imaging mode and still needs to be optimized to adapt to other modes.