Structure-Adaptive Clutter Suppression for Infrared Small Target Detection: Chain-Growth Filtering

Abstract: Robust detection of infrared small targets is an important and challenging task in many photoelectric detection systems. Using the difference in a specific feature between the target and the background, various detection methods have been proposed in recent decades. However, most methods extract the feature in a region with a fixed shape, typically a rectangular region, which causes a problem: when faced with complex-shape clutters, the rectangular region includes pixels both inside and outside the clutters, and the significant grey-level difference among these pixels leads to a relatively large feature value in the clutter area, interfering with target detection. In this paper, we propose a structure-adaptive clutter suppression method, called chain-growth filtering, for robust infrared small target detection. The filtering model can adjust its shape to fit various clutter structures such as lines, curves and irregular edges, and thus suppresses clutters more robustly than the fixed-shape feature extraction strategy. In addition, the proposed method achieves considerable anti-noise ability by employing the guided filter as a preprocessing step, and it can detect multi-scale targets without complex parameter tuning. In the experiments, we evaluate the detection method on 12 typical infrared scenes that contain different types of clutters. Compared with seven state-of-the-art methods, the proposed method shows superior clutter suppression for various types of clutters and excellent detection performance across the scenes.


Introduction
Infrared small target detection plays an important role in many applications such as infrared search and tracking (IRST) systems, automatic target recognition (ATR) systems and early warning systems [1][2][3]. Because of the long imaging distance in these applications, targets are usually small and lack shape and structure information in infrared images, which makes it difficult to extract abundant distinctive features of the targets [4][5][6]. Moreover, in practical applications, the small targets are usually drowned in heavy noise and complicated background clutters, which further interfere with stable detection [7,8]. Therefore, it is a challenging problem to separate small targets from complicated backgrounds without any false alarms in noisy infrared images [9,10]. To solve this problem, many methods were proposed in recent processing a scene with strong edges [38]. Although the subsequent methods [39][40][41][42][43] make remarkable progress in removing the edge residuals by employing a specific sophisticated norm to replace the nuclear norm, they can hardly eliminate the strong local clutters of various shapes completely.
In recent years, the contrast mechanism of the human visual system (HVS) has been extensively introduced into DBT-based methods. In 2013, Chen et al. created a feature called the local contrast measure (LCM) to define the local contrast between the targets and background in infrared images [44]. Since then, plenty of other definitions of the local contrast have been proposed, such as the improved difference of Gabor filter [45], the multiscale patch-based contrast measure (MPCM) [46], the high-boost-based multiscale local contrast measure (HBMLCM) [47], the multiscale weighted local contrast measure (MWLCM) [48], the derivative entropy-based contrast measure (DECM) [49], the relative local contrast measure (RLCM) [50], the Gaussian scale-space enhanced local contrast measure (GSS-ELCM) [51] and the homogeneity-weighted local contrast measure (HWLCM) [52]. These models compute the local feature value at each position by sliding a rectangular window; they are usually concise and therefore fast. However, a rectangular window can hardly match the different clutter shapes, which may degrade the detection performance. For instance, for an irregular clutter region, the rectangular window includes pixels both inside and outside the clutter region, and the wide grey-level gap between these pixels may lead to a large feature value in the clutter region, interfering with the detection.
In addition, some DBT approaches exploit local features of the original image, including the self-information [53], the principal curvature [54], the entropy [49], the shearlet's kurtosis [55] and the multi-order directional derivatives [56], to distinguish the target regions from the background. These methods extract features from inflexible rectangular windows, which can cause high false alarm rates under complex conditions with various irregular clutters. Some anomaly detection methods such as the cluster kernel Reed-Xiaoli (CKRX) algorithm [57] were also proposed for small target detection, yet they are sensitive to abnormal background pixels. Lately, a novel approach via modified random walks (MRW) [58] was proposed to detect small IR targets with low signal-to-noise ratio. However, it is still a challenging task to detect small infrared targets with a high detection rate and a low false alarm rate under complicated backgrounds.
As mentioned before, feature extraction strategies based on rectangular windows do not seem to be the optimal solution for suppressing clutters with various irregular shapes. However, if a feature extraction model only involves the pixels inside the clutter region, the unfavourable influence of irregular clutter shapes on clutter suppression might be weakened. Based on this intuition, we propose a structure-adaptive clutter suppression method for infrared small target detection, which is called chain-growth filtering. Compared with the traditional feature extraction strategy based on rectangular windows, when encountering various types of clutters with irregular shapes, our filtering model can adjust its shape and only involve the pixels inside the clutter region, and the small grey-level difference (difference in grey value) among these pixels leads to a better clutter suppression effect. In addition, the proposed method is capable of multi-scale target detection and achieves considerable anti-noise ability as well. In the experiments, 12 infrared scenes under various conditions (different levels of noise, different target sizes, different types of clutters and so on) are tested, and the diversity of these scenes poses a challenge for infrared small target detection methods. To evaluate the performance of our method, we adopt seven small target detection algorithms as baseline methods for comparison. In the experimental results, our method obtains large values of the evaluation metrics signal-to-clutter ratio gain (SCRg) and background suppression factor (BSF) in the different tested scenes, showing its superior clutter suppression for various types of clutters. Besides, our method gets the best receiver operating characteristic (ROC) curve in each infrared scene, which demonstrates both the excellent detection performance and the robustness of the proposed method.
Figure 1 shows the diagram of the proposed detection method, which is mainly composed of three parts: preprocessing, chain-growth filtering and thresholding. We first preprocess the input images because the random noise that widely exists in infrared images usually interferes with the detection process. Since the structure of clutters and targets is important in the following detection steps, we employ the guided filtering method [59] to keep as many structural details of the original images as possible while denoising (with the default parameters of the MATLAB command "imguidedfilter"). Then, we generate the chains at each pixel and apply the proposed chain-growth filtering model to suppress various clutters according to their structures; this procedure is described in detail in this section. Finally, a classic adaptive thresholding technique is used to produce the final detection results.
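As a rough illustration of the preprocessing step, the sketch below implements a self-guided variant of the guided filter [59] in plain Python, where the image serves as its own guidance. The function names, the window radius of 2 and the regularization value `eps = 0.01` (for intensities scaled to [0, 1]) are assumptions made for illustration; they are not confirmed to reproduce the output of "imguidedfilter".

```python
def box_mean(img, radius):
    """Mean over a (2*radius+1) x (2*radius+1) window, clipped at the border."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [img[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            out[r][c] = sum(window) / len(window)
    return out


def self_guided_filter(img, radius=2, eps=0.01):
    """Edge-preserving smoothing with the image used as its own guide.

    radius and eps are illustrative values (assumptions), not tuned settings.
    """
    rows, cols = len(img), len(img[0])
    mean_i = box_mean(img, radius)
    mean_ii = box_mean([[v * v for v in row] for row in img], radius)
    a = [[0.0] * cols for _ in range(rows)]
    b = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            var = mean_ii[r][c] - mean_i[r][c] ** 2  # local variance
            a[r][c] = var / (var + eps)              # close to 1 near strong edges
            b[r][c] = (1.0 - a[r][c]) * mean_i[r][c]
    mean_a, mean_b = box_mean(a, radius), box_mean(b, radius)
    return [[mean_a[r][c] * img[r][c] + mean_b[r][c] for c in range(cols)]
            for r in range(rows)]
```

In flat areas the local variance is far below `eps`, so the output approaches the local mean and noise is smoothed; near strong edges the variance dominates and the pixel value is largely kept, which is the structure-preserving behaviour the text relies on.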

Chain-Growth Filtering
The chain-growth filtering model is designed based on the following intuition: if a filtering model only involves pixels inside the clutter region, then the influence of the clutter shape on clutter suppression will be weakened. Because the grey-level difference between the pixels inside the clutter is relatively small, the clutter region, like a flat region, also yields a small filtering response, which benefits clutter suppression. Therefore, we propose a chain-growth filtering model that can adjust its shape flexibly and only involve the pixels inside the clutter region in the computation when encountering clutters, resulting in a better clutter suppression effect.

Terminology
To illustrate the concept of chain-growth filtering clearly, we introduce the following terms. A chain is a set of 8-connected pixels that follows a specific region growing criterion. A chain has a starting point and an end point. The initial state of a chain is just a pixel, and at this stage the pixel is both the starting point and the end point of the chain (marked in black in Figure 2). In the following growth procedure, only one pixel adjacent to the end point (8-connected) can be subsumed into the chain at each step, and the newly joined pixel becomes the new end point of the chain. Figure 2 shows several types of chains with the starting point and the end point marked. In this paper, we use numbers to denote the directions. The numbers 0, 1, 2, 3, 4, 5, 6, 7 correspond to the north, the northeast, the east, the southeast, the south, the southwest, the west and the northwest, respectively. An integer greater than 7 or less than 0 represents the same direction as the remainder of this integer divided by 8. Figure 3 marks the 8 directions with numbers. A chain has a search direction and a growth direction. The search direction determines the scope of the candidate pixels that may join the chain in the following growth step. We also define the search direction boundary (SDB) of a chain, which is the set consisting of the 3 neighboring pixels of the end point in the search direction.
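The direction numbering and the SDB can be sketched in a few lines of Python. The helper names are hypothetical, and two points are assumptions: image rows are taken to grow downward (so north is row - 1), and the three SDB pixels are taken to be the neighbours in directions d - 1, d and d + 1 around the search direction d.

```python
# (row, col) offsets for directions 0..7: north, northeast, east, southeast,
# south, southwest, west, northwest. Rows grow downward (assumption).
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def normalize(direction):
    """Map any integer to the equivalent direction in 0..7."""
    return direction % 8  # Python's % is non-negative for a positive modulus


def neighbor(pixel, direction):
    """Return the 8-connected neighbour of `pixel` in the given direction."""
    dr, dc = OFFSETS[normalize(direction)]
    return (pixel[0] + dr, pixel[1] + dc)


def search_direction_boundary(end_point, search_direction):
    """Three neighbours of the end point facing the search direction.

    Assumption: these are the neighbours in directions d - 1, d and d + 1.
    """
    return [neighbor(end_point, search_direction + k) for k in (-1, 0, 1)]
```

For example, with the end point at (5, 5) and search direction 0 (north), this helper returns the three pixels in the row above: (4, 4), (4, 5) and (4, 6).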

The Growth Process
The concept of chain growth, similar to that of other region growing approaches, is to start from a point and to grow the point in a specific direction to extend the chain. The concrete growth process of a chain is as follows. We use $C$ to denote the chain, $d_s$ to denote the search direction of the chain, and $d_g$ to denote the growth direction of the chain. Let us assume that the growth process starts from an arbitrary pixel $p_0$. The pixel $p_0$ is labeled as chain $C$, which then grows according to the growth strategy. We use $C^{(n)}$ to denote the chain $C$ that has grown $n$ steps, and $d_s^{(n)}$ and $d_g^{(n)}$ to represent the search direction and growth direction of $C^{(n)}$, respectively. Consequently, $p_0$ can be denoted as $C^{(0)}$, and the initial search direction as $d_s^{(0)}$. Note that $p_0$ is both the starting point and the end point of $C^{(0)}$. When $C^{(0)}$ and $d_s^{(0)}$ are given, the chain can grow step by step through the following strategy; therefore, the initial growth point and the initial search direction are the two initial growth conditions of a chain.

In the first growth step, the neighboring pixel of $p_0$ in direction $d_s^{(0)}$ (denoted as $p_1$) is added to $C^{(0)}$ and becomes the new end point of $C^{(1)}$; the search direction is unchanged. This procedure can be formulated as

$$C^{(1)} = C^{(0)} \cup \{p_1\}, \qquad d_s^{(1)} = d_s^{(0)}.$$

In each following growth step (taking the $i$th growth step as an example), the pixel with the maximum grey level in the SDB of $C^{(i)}$ (denoted as $p_{i+1}$) is absorbed and becomes the new end point of $C^{(i+1)}$, which is formulated as

$$p_{i+1} = \arg\max_{q \in \mathrm{SDB}(C^{(i)})} g(q), \qquad C^{(i+1)} = C^{(i)} \cup \{p_{i+1}\},$$

where $g(\cdot)$ represents the grey level of a pixel. If multiple pixels share the maximum grey value in the SDB, we select the pixel closest to the current search direction $d_s^{(i)}$. The growth direction $d_g^{(i)}$ (the direction from $p_i$ to $p_{i+1}$) can be written as infinitely many integers, which form a set called $D_g$ here; we choose the integer in $D_g$ closest to $d_s^{(i)}$ and assign it to $d_g^{(i)}$.
The $m$ in Equation (5) is the bending factor that controls the flexibility of the chains. The search direction does not change if $m = +\infty$, and the search direction equals the growth direction if $m = 1$. If the growth direction is not changed from beginning to end, the generated chains cannot match clutter that bends sharply; if the search direction equals the growth direction of the chain, the generated chains might bend greatly and grow around the target, leading to unwanted small outputs in the target region. Therefore, to balance the flexibility and the extensibility of the chains, we set $m = 3$ in this paper. Note that when $d_s^{(i)}$ is not an integer, it represents the same direction as its closest integer when determining the SDB (see this concept in Section 2.1.1).
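The growth of a single chain can be sketched as below. Equation (5) is not reproduced in this text, so the search-direction update used here, d_s ← d_s + (d_g − d_s)/m, is an assumption chosen only because it matches the two limiting cases described above (no change for m = +∞, and d_s = d_g for m = 1). The function name, the stop-at-border behaviour and the choice of d − 1, d, d + 1 as the SDB are likewise assumptions.

```python
import math

# (row, col) offsets for directions 0..7 (north, northeast, ..., northwest);
# rows are assumed to grow downward, so north is row - 1.
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def grow_chain(image, start, d0, n_steps=5, m=3.0):
    """Grow one chain from `start` with initial search direction `d0`.

    Returns the chain as a list of (row, col) pixels, starting point first.
    """
    rows, cols = len(image), len(image[0])

    def step(pixel, direction):
        dr, dc = OFFSETS[direction % 8]
        return (pixel[0] + dr, pixel[1] + dc)

    def inside(pixel):
        return 0 <= pixel[0] < rows and 0 <= pixel[1] < cols

    chain = [start]
    d_s = float(d0)
    # First step: take the neighbour in the initial search direction.
    first = step(start, d0)
    if not inside(first):
        return chain  # assumption: a chain simply stops at the image border
    chain.append(first)

    for _ in range(n_steps - 1):
        end = chain[-1]
        centre = math.floor(d_s + 0.5)  # integer direction closest to d_s
        candidates = []
        for d in (centre - 1, centre, centre + 1):  # assumed SDB directions
            q = step(end, d)
            if inside(q):
                # Sort key: highest grey level first, then closest to d_s.
                candidates.append((-image[q[0]][q[1]], abs(d - d_s), d, q))
        if not candidates:
            break
        _, _, d_g, nxt = min(candidates)
        chain.append(nxt)
        # ASSUMED update rule standing in for Equation (5).
        d_s = d_s + (d_g - d_s) / m
    return chain


# A bright curve on a dark background: east for two pixels, then northeast.
curve = [(3, 0), (3, 1), (3, 2), (2, 3), (1, 4), (0, 5)]
image = [[0] * 6 for _ in range(5)]
for r, c in curve:
    image[r][c] = 9
chain = grow_chain(image, start=(3, 0), d0=2)  # follows the bright curve
```

In this toy image the chain started at the west end of the curve with search direction 2 (east) follows the bend, because the search direction drifts from 2 towards 1 as the growth direction turns northeast.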

Stop Criterion
We use the maximum number of growth steps $N_g$ to stop the growing process. $N_g$ determines the length of the chain, which is closely related to the size of small targets. According to the Society of Photo-Optical Instrumentation Engineers (SPIE), a small target is defined to have a total spatial extent of less than 80 pixels [44,50], which means the target size is less than 9 × 9 pixels and the radius of the target is less than 5 pixels. To guarantee that the chain can exceed the scope of the target region when it grows from the target center, we set $N_g = 5$ in this paper so that the length of the chain is larger than the radius of small targets.

Filtering Model
Based on the chain, we develop an approach called chain-growth filtering to suppress various types of clutters. We apply the proposed filtering model at each pixel, but for illustration, let us assume that we calculate the filtering result of an arbitrary pixel $p$. Taking pixel $p$ as the starting point and directions 0-7 as the initial search directions (the terminology is given in Section 2.1.1), we can generate 8 chains denoted as $C_0, C_1, \cdots, C_7$. In the chain $C_j$, we calculate the difference between the grey value of $p$ and the minimum grey value in $C_j$, and record it as $h_j(p)$:

$$h_j(p) = g(p) - \min_{q \in C_j} g(q), \qquad (7)$$

where $g(\cdot)$ represents the grey value of a pixel, and $q$ represents a pixel in $C_j$. Then we adopt a minimum pooling strategy in the calculation of the final filtering response: we choose the minimum value among $h_0(p), h_1(p), \cdots, h_7(p)$ as the chain-growth filtering result of pixel $p$, which is denoted as $r(p)$:

$$r(p) = \min_{0 \le j \le 7} h_j(p). \qquad (8)$$

For a point in a flat background region, the grey-level difference among the points in the eight chains is small, leading to a slight filtering response. For a central point in a target area, because the target occupies only a small area, a chain exceeds the scope of the small target region after growing, no matter in which way the chain grows. Therefore, in all 8 chains, the grey-level difference between the starting point and the end point is remarkable, resulting in a large final filtering response. For a point in a clutter region, because the clutters (such as lines, curves and edges) usually occupy many more pixels than the small targets (which occupy only several pixels) in infrared images, the flexible growth rules can guarantee that at least one chain ends up inside the clutter region despite the complex clutter shape. In this chain, the grey-level difference among the points is small, leading to a small final filtering response similar to that in a flat region. This is why chain-growth filtering can suppress various types of complex-shape clutters.
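The minimum pooling over the eight chains can be sketched as follows. The function names are hypothetical, each chain is represented simply by the list of grey values of its pixels (the starting pixel included), and the grey values in the three cases are made-up numbers for illustration, not data from the paper.

```python
def chain_response(center_value, chain_values):
    """h_j(p): grey value of p minus the minimum grey value in chain C_j."""
    return center_value - min(chain_values)


def filtering_response(center_value, chains):
    """r(p): minimum of h_j(p) over all chains (minimum pooling)."""
    return min(chain_response(center_value, values) for values in chains)


# Flat background: every chain stays close to the centre value.
flat = filtering_response(77, [[77, 77, 76, 76, 77, 76]] * 8)

# Line-shape clutter: seven chains leave the line, one chain stays on it.
off_line = [100, 95, 88, 84, 81, 80]
on_line = [100, 100, 99, 99, 100, 99]
clutter = filtering_response(100, [off_line] * 7 + [on_line])

# Small target: every chain leaves the target and reaches the dark background.
target = filtering_response(200, [[200, 170, 130, 115, 110, 108]] * 8)
```

With these illustrative values the flat and clutter responses are both 1, while the target response is 92: a single chain that stays inside the clutter is enough to pull the pooled response down, which is the mechanism described above.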
Figure 5 shows the chain-growth filtering results at a pure background region, a line-shape clutter region and a target region. The first subfigure presents an input infrared image containing some clutters and a target. We pick a background region, a line-shape clutter (bridge) region and a target region in this image, which are marked as region 1, region 2 and region 3, respectively. The three selected regions all have a size of 11 × 11 pixels, and they are enlarged (in the order of region 1, 2 and 3) in the second to fourth subfigures to show the chains more clearly. The generated chains in regions 1, 2 and 3 are painted yellow, and the chains that determine the final filtering response are painted orange. In region 1, all the chains have little grey-level difference, so the final filtering response is small too. In region 2, some chains have considerable grey-level difference, but the chains stretching along the bridge (clutter line) have small grey-level difference; the minimum pooling strategy leads to a weak final filtering response. In region 3, because the target has larger grey values than the neighboring pixels around it, the chains growing in all directions have large grey-level difference, resulting in a large final filtering response. In this way, the proposed chain-growth filtering method can distinguish the targets from background and clutters.
Figure 5. Case analysis at pure background region, line-shape clutter region and target region (the chains are painted yellow and the chain to produce the filtering response is painted orange). Panels: an infrared image with a target and line-shape clutters; Region 1: flat background region, chain-growth filtering response: 0; Region 2: line-shape clutter region, chain-growth filtering response: 2; Region 3: target region, chain-growth filtering response: 94.
In summary, the chain-growth filtering model is depicted in Algorithm 1.

Algorithm 1: Chain-growth filtering.
Input: A pixel $p_0$ to be filtered.
Output: The chain-growth filtering response of the pixel $p_0$: $r(p_0)$.
1: for j = 0 to 7 do
2: Initialize the chain C
After the growth procedure of the chains, we have $C_j = C_j^{(N_g)}$.
10: Calculate the difference between the grey value of $p_0$ and the minimum grey value in $C_j$ through Equation (7).
11: end for
12: Calculate the chain-growth filtering response $r(p_0)$ by Equation (8).
13: Replace the grey value of pixel $p_0$ with $r(p_0)$.

Adaptive Threshold for Target Segmentation
In the process of chain-growth filtering, the background region and clutter region are suppressed, and the target region is relatively enhanced. Consequently, the target region is the most salient region after background suppression. Based on this fact, we can apply a segmentation operation to the output image of chain-growth filtering to get the final detection results. The segmentation threshold $T$ is obtained adaptively as

$$T = \mu + k\sigma,$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the chain-growth filtering response values in the output image, and $k$ is a relative decision threshold. In practice, $k$ usually ranges from 15 to 30, and this wide range benefits the robustness of our detection method for various scenes. Finally, any pixel with a chain-growth filtering output value larger than $T$ is regarded as a pixel of the target.
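A minimal sketch of this segmentation step is given below, assuming the threshold takes the common form T = µ + kσ. The function names, the default k = 20 (picked from inside the stated 15 to 30 range) and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def adaptive_threshold(response, k):
    """T = mean + k * std of all filtering responses (assumed form)."""
    values = [v for row in response for v in row]
    return fmean(values) + k * pstdev(values)  # population std (assumption)


def segment(response, k=20):
    """Boolean mask that is True where the response exceeds the threshold."""
    t = adaptive_threshold(response, k)
    return [[v > t for v in row] for row in response]


# Toy filtering output: a fully suppressed background with one target pixel.
response = [[0.0] * 32 for _ in range(32)]
response[10][12] = 94.0
mask = segment(response, k=20)
```

Because a well-suppressed output is almost zero everywhere, σ is small, and even a large k leaves the threshold below the target response; this is one way to read the remark that a wide range of k works in practice.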

Complexity Analysis
In the chain-growth filtering approach, both the chain growth procedure and the filtering computation cost a constant number of operations at each position. Therefore, for an image with $N$ pixels, the chain-growth filtering operation has a complexity of $O(N)$. Since the preprocessing module (guided filtering) can also be computed in $O(N)$ time [59], the complexity of the whole proposed detection method is $O(N)$, which represents a relatively low computational burden. We carry out a time consumption test, and the results are presented in Section 3.6. During this test, we apply the chain-growth filtering model at each pixel. It can be seen that our proposed method currently still needs some time to cope with large images. In practical applications, we can apply the chain-growth filtering only at some candidate target points [2], which reduces the computation significantly; furthermore, parallel computing techniques can also be applied to further accelerate the chain-growth filtering procedure.

Experimental Results
In this section, we carry out extensive experiments to test the performance of the proposed method. We introduce the test data and the baseline methods, and describe the evaluation metrics for infrared small target detection. Then, we present two experiments that test the robustness to noise and the capability of multi-scale target detection. Finally, both qualitative and quantitative experiments are conducted to test the clutter suppression effects and the detection performance of each method. Our proposed method performs well in these experiments, showing its superiority in comparison with the seven state-of-the-art baseline methods. Our experiment platform is MATLAB 2016b running on a laptop with a 2.60-GHz Intel i5-7300U CPU and 8 GB of memory.

Experimental Setup
In the experiment, we test the performance of the detection method through 12 infrared scenes that are under different conditions (different levels of noise, different target sizes, and different types of clutters).
The diversity of these scenes can test the different properties of a detection method: different conditions of noise can test the anti-noise performance, different conditions of target size can test the multi-scale detection ability, and different types of clutters can test the robustness of clutter suppression. Thus, the 12 diverse scenes are exploited to form a challenging test set so that we can evaluate the performance of the detection methods objectively. Figure 6 shows the representative frame of each scene, where the targets are marked by red rectangles. Table 1 presents a brief description of these data.
To demonstrate the effectiveness of our proposed method, we employ seven baseline methods for comparison. Note that most of these baseline methods were proposed in the last two years, and they represent the current state of the art in infrared small target detection. The employed baseline methods are the Min-Local-LoG method [60], the LS-SVM-based method [25], the multiscale patch-based contrast measure (MPCM) [46], the high-boost-based multiscale local contrast measure (HB-MLCM) [47], the multiscale weighted local contrast measure (MWLCM) [48], the derivative entropy-based contrast measure (DECM) [49], and the multiscale relative local contrast measure (RLCM) [50].

Evaluation Metrics
The background suppression factor (BSF) is introduced to evaluate the clutter suppression effects quantitatively, and the signal-to-clutter ratio gain (SCRg) is adopted to evaluate how much the prominence of the target increases relative to the background. The two metrics are defined as

$$\mathrm{BSF} = \frac{\sigma_{in}}{\sigma_{out}}, \qquad \mathrm{SCRg} = \frac{\mathrm{SCR}_{out}}{\mathrm{SCR}_{in}}, \qquad \mathrm{SCR} = \frac{|\mu_t - \mu_b|}{\sigma},$$

where the subscripts in and out represent the original image and the output image of chain-growth filtering, respectively, $\mu_t$ is the average pixel value of the target, and $\mu_b$ and $\sigma$ are the average pixel value and standard deviation of the surrounding local neighboring background. From these definitions, we can see that for both metrics, the larger the better.
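The two metrics can be computed as in the sketch below. It assumes the commonly used forms SCR = |µ_t − µ_b| / σ, SCRg = SCR_out / SCR_in and BSF = σ_in / σ_out; the function names are hypothetical, the pixel lists are made-up numbers, and the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev


def scr(target_pixels, background_pixels):
    """Signal-to-clutter ratio: |mu_t - mu_b| / sigma of the local background."""
    contrast = abs(fmean(target_pixels) - fmean(background_pixels))
    return contrast / pstdev(background_pixels)


def scr_gain(target_in, background_in, target_out, background_out):
    """SCRg: SCR of the output image divided by SCR of the input image."""
    return scr(target_out, background_out) / scr(target_in, background_in)


def bsf(background_in, background_out):
    """BSF: background standard deviation before filtering over after."""
    return pstdev(background_in) / pstdev(background_out)


# Illustrative pixel samples around one target, before and after filtering.
t_in, b_in = [120], [100, 104, 96, 100]
t_out, b_out = [90], [0, 2, 0, 2]
gain = scr_gain(t_in, b_in, t_out, b_out)
suppression = bsf(b_in, b_out)
```

Both functions divide by a background standard deviation, so a perfectly flat background (σ = 0) needs separate handling in practice, for example by reporting the value as infinite.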
In contrast to the BSF and SCRg, the ROC curve evaluates the final detection results directly. The ROC curve is plotted based on the true positive rate (TPR) and false positive rate (FPR):

$$\mathrm{TPR} = \frac{TP}{AP}, \qquad \mathrm{FPR} = \frac{FP}{AN},$$

where TP (true positive) represents the number of detected true targets, and AP (actual positive) represents the total number of targets. FP (false positive) represents the number of detected false targets, and AN (actual negative) is commonly defined as the total number of pixels in one frame in this research field [49,50]. By choosing different segmentation thresholds, we get different points in the TPR-FPR space (also called ROC space). Connecting the points with lines yields a ROC curve. In the ROC space, a curve closer to the top-left corner represents a better performance. To measure the ROC curves quantitatively, we also calculate the area under the curve (AUC): the larger the AUC, the better the detection performance.
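The construction of the ROC points and the AUC can be sketched as follows, using TPR = TP / AP and FPR = FP / AN and the trapezoidal rule for the area. The function names and the counts in the example are hypothetical.

```python
def roc_points(tp_fp_counts, actual_positive, actual_negative):
    """Return (FPR, TPR) points, one per threshold, sorted by FPR."""
    return sorted((fp / actual_negative, tp / actual_positive)
                  for tp, fp in tp_fp_counts)


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# (TP, FP) counts at three thresholds, from strict to loose (made-up numbers).
counts = [(0, 0), (8, 2), (10, 10)]
points = roc_points(counts, actual_positive=10, actual_negative=10)
area = auc(points)
```

The area is only integrated over the FPR range covered by the supplied points, so the thresholds should span from "nothing detected" to "everything detected" if a full AUC between 0 and 1 is intended.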

Anti-Noise Performance
Usually, the infrared images more or less contain some noise, which has some similarity to targets and could degrade the detection performance. Thus, the infrared small target detection methods should have a good anti-noise ability. Here, we evaluate the anti-noise ability of our proposed method through a designed experiment.
In this experiment, we chose two highly noisy scenes (Figure 6a,i) and added different levels of Gaussian noise to them, obtaining image samples with different signal-to-noise ratios (SNR) in each scene. By processing these artificial images, we can find the limit of our method's anti-noise ability. Figures 7 and 8 present the processing results of our method in the two noisy scenes. The first column of Figure 7a shows the original image of the noisy scene in Figure 6a. The second to fourth columns of Figure 7a show the image samples after adding different levels of noise, with SNR values of 4.0, 3.2 and 2.3, respectively. Figure 7b shows the images after denoising, and Figure 7c shows the processing results of our method. In this scene, there is little residual noise in the processing results when the SNR is higher than 3; yet there is considerable residual noise that could exceed the target in intensity when the SNR is lower than 2. The first column of Figure 8a shows the original image of the scene in Figure 6i. The second to fourth columns of Figure 8a show the image samples after adding different levels of noise, with SNR values of 3.4, 2.8 and 2.2, respectively. The denoised images are shown in Figure 8b, and the processing results are shown in Figure 8c. In this scene, the residual noise in the processing results does not exceed the target in intensity when the SNR is higher than 2.2. In the above anti-noise tests, although the quality of the detection result deteriorates as the noise increases, we can still get the correct final detection results when the SNR drops to around 2, which demonstrates that our method has a certain degree of anti-noise ability.

Multi-Scale Target Detection
In practical applications, the small targets in different scenes can vary in size, and even within a scene, the target size can change greatly. Thus, the detection methods should have a good ability for multi-scale target detection. In this subsection, we test the multi-scale target detection ability of our method without any parameter tuning. Figure 9 presents the processing results of our method for four image samples in the scene in Figure 6j, where the target size changes from 8 × 6 to 3 × 2 pixels. Figure 9a shows the four typical image samples, in which the target sizes are 8 × 6, 7 × 5, 5 × 3 and 3 × 2 pixels, respectively. Figure 9b shows the processing results of our method for the four image samples. Figure 9c shows the three-dimensional projections of the processing results in Figure 9b. In the processing results, the targets are enhanced while the clutters are suppressed. Furthermore, the target intensities in the four processing results are roughly the same, showing that the target size has little influence on the processing results of our method. In other words, this test demonstrates that our method adapts well to different small target sizes and is capable of multi-scale target detection.
Figures 10 and 11 show the processing results of various methods for the twelve scenes in Figure 6; Figure 10 shows the processing results for Figure 6a-f. According to Figure 10, our method gives good results for the six different scenes. Despite the various types of clutters such as noise, clouds, buildings and bridges, our method suppresses these typical clutters effectively and obtains clean backgrounds. Meanwhile, the targets are enhanced significantly by our method and become the most salient spots in the processing results.
The other comparison methods get satisfactory processing results in some scenes, but they lose effectiveness in other specific scenes. For example, the scenes shown in the first row of Figure 10b and Figure 10c not only contain the boundaries of clouds and buildings, but also include considerable noise. These two scenes are a challenge for two classic methods: Min-Local-LoG and LS-SVM. In the processing results of the two methods (as shown in the second and third rows of Figure 10b and the second and third rows of Figure 10c), the clutter residuals near boundaries are larger than the targets in intensity, resulting in false alarms in the decision-making stage. The fourth and fifth rows of Figure 10a show the processing results of MPCM and HB-MLCM for the scene shown in the first row of Figure 10a, which is full of heavy noise. There is a certain degree of clutter residuals in the processing results, showing that the anti-noise ability of MPCM and HB-MLCM still needs improvement. As shown in the sixth and eighth rows, though the targets have the largest intensity in the processing results of MWLCM and RLCM, the intensity of the backgrounds fluctuates considerably, attenuating the difference between targets and backgrounds and further reducing the robustness of these methods in various scenes. The seventh row of Figure 10 shows the processing results of DECM. It can be seen that this method gets good clutter suppression effects for most scenes. However, for the scene shown in the first row of Figure 10e, which has complicated ground backgrounds, DECM leaves some remarkable clutter residuals in its detection result (shown in the seventh row of Figure 10e), leading to false alarms in the final decisions.

Qualitative Comparison
For the scenes shown in Figure 11, our method also achieves superior processing results compared to the baseline methods. For example, the first row of Figure 11a shows a scene with cloud clutters and obvious boundaries. What is more, the target is dim and obscure, and drowned in heavy random noise. Such a complicated scene poses challenges to our method. From the detection result shown in the ninth row of Figure 11a, we can see that our method suppresses the cloud clutters, the boundaries and the noise clearly, and the targets are enhanced significantly and become the most salient spots. As for the comparison methods, DECM also gets satisfactory processing results, shown in the seventh row of Figure 11a. Min-Local-LoG leaves some obvious clutter residuals near the boundaries in the processing results, and the noise is not removed. LS-SVM eliminates most clutters effectively, but we can still see some spot-like residuals near the cloud boundaries in the third row of Figure 11a. As shown in the sixth and eighth rows of Figure 11a, the noise is suppressed effectively in the processing results of MWLCM and RLCM, but the backgrounds fluctuate noticeably, which harms robust detection in various scenes. MPCM and HB-MLCM suppress the cloud clutters successfully, but they still retain a high level of noise in their processing results, as shown in the fourth and fifth rows of Figure 11a.
In conclusion, the boundaries and noise are the main challenges to detection. The above qualitative experiment demonstrates that our method overcomes these challenges and obtains the superior clutter-removal effects compared to the other seven baseline methods. Besides, the experiment based on 12 different scenes also validate the robustness of our method.

Quantitative Comparison
We use two metrics, BSF and SCRg, to evaluate the clutter-suppression and target-enhancement effects of our method and the baseline methods. Note that a larger value of BSF or SCRg represents better performance. Table 2 shows the evaluation results of the eight methods for the 12 infrared scenes in Figure 6, with the largest BSF and SCRg values in each scene displayed in bold. Our method obtains the largest SCRg value in most scenes, which shows that it is more effective at separating targets from backgrounds despite the noise and the various types of clutter. Our method obtains the largest BSF value in nine scenes and the second largest in the other three, which demonstrates its superior clutter-suppression effectiveness in comparison with the baseline methods. DECM obtains the largest BSF value in the scenes of Figure 6a,c,l, indicating the best clutter-suppression effect in these three scenes. However, when both clutter suppression and target enhancement are considered, as the metric SCRg reveals, our method produces a better processing result.
We also use ROC curves to evaluate the detection performance of our method and the comparison methods; a curve closer to the top left corner of ROC space represents better detection performance. As shown in Table 1, each of the scenes (a)-(f) contains only one infrared image. In these scenes, a detection method obtains a perfect ROC curve as long as the target has a larger intensity than the background in the processing result of that single image. This condition is quite easy to satisfy, so most methods yield perfect ROC curves, and the performance differences cannot be distinguished from them. Thus, we only draw the ROC curves for the six sequences marked as scenes (g)-(l) in Table 1.
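As a rough illustration of the evaluation quantities discussed above, the following Python sketch computes BSF, SCRg and a threshold-swept ROC curve. It assumes the definitions commonly used in the infrared small target literature, which are not restated in this section: SCR = |μ_t − μ_b| / σ_b, SCRg = SCR_out / SCR_in, and BSF = σ_in / σ_out, where μ_t is the mean target intensity and μ_b, σ_b are the mean and standard deviation of the background. The function names, the boolean masks, and the way detection and false-alarm rates are counted are assumptions made for this example and may differ from the exact protocol behind Table 2 and Figure 12.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero on perfectly flat backgrounds


def scr(image, target_mask, background_mask):
    """Signal-to-clutter ratio: |mean(target) - mean(background)| / std(background)."""
    mu_t = image[target_mask].mean()
    mu_b = image[background_mask].mean()
    sigma_b = image[background_mask].std()
    return abs(mu_t - mu_b) / (sigma_b + EPS)


def scr_gain(raw, processed, target_mask, background_mask):
    """SCRg (assumed definition): SCR of the processed image over SCR of the raw image."""
    scr_out = scr(processed, target_mask, background_mask)
    scr_in = scr(raw, target_mask, background_mask)
    return scr_out / (scr_in + EPS)


def bsf(raw, processed, background_mask):
    """BSF (assumed definition): background std before filtering over std after."""
    sigma_in = raw[background_mask].std()
    sigma_out = processed[background_mask].std()
    return sigma_in / (sigma_out + EPS)


def roc_points(target_scores, background_scores):
    """Sweep a threshold over the filter responses.

    target_scores: response at each true target (e.g. one value per frame).
    background_scores: responses at background pixels.
    Returns (false_alarm_rate, detection_rate), both starting at 0 and ending at 1.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    background_scores = np.asarray(background_scores, dtype=float)
    # Thresholds from highest to lowest, so both rates are non-decreasing.
    thresholds = np.unique(np.concatenate([target_scores, background_scores]))[::-1]
    fa, pd = [0.0], [0.0]
    for t in thresholds:
        pd.append(np.mean(target_scores >= t))
        fa.append(np.mean(background_scores >= t))
    return np.array(fa), np.array(pd)


def area_under_curve(fa, pd):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum(np.diff(fa) * (pd[1:] + pd[:-1]) / 2.0))
```

Under these assumed definitions, a filter that halves the background standard deviation yields a BSF of about 2, and a sequence in which every target response exceeds every background response yields an AUC of 1.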
Figure 12 presents the ROC curves of the different methods for the six scenes; the area under the curve (AUC) of each curve is also calculated and shown in the bottom right corner. In each subgraph, the ROC curve of our method is the closest to the top left corner, and its AUC value is the largest. This illustrates that our method achieves the best detection performance. We also measure the average processing time of each method for a single frame in each scene, and the results are shown in Table 2. Our method is slower than Min-Local-LoG, LS-SVM, MPCM, HB-MLCM, and MWLCM, but much faster than DECM; its computational efficiency is comparable with that of RLCM, which we consider acceptable. Our method does not achieve a superior efficiency because the growth procedure at each position is time-consuming. In practical applications, a simple feature could be used to select candidate targets so that the chain-growth filtering is applied only at the potential target positions, reducing the overall processing time; parallel computing is another helpful measure to reduce the running time.
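The candidate pre-selection idea mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: the "simple feature" is taken to be the positive contrast between a pixel and its local mean (so targets are assumed brighter than their surroundings), the fraction of pixels kept is an arbitrary choice, and `expensive_filter` is a hypothetical placeholder for a per-pixel filter such as chain-growth filtering, whose implementation is not reproduced here.

```python
import numpy as np


def local_mean(image, radius):
    """Mean over a (2*radius+1)^2 window, computed with an integral image.

    The image is edge-padded so that the output has the same shape as the input.
    """
    k = 2 * radius + 1
    padded = np.pad(image.astype(float), radius, mode="edge")
    # integral[i, j] holds the sum of padded[:i, :j].
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = image.shape
    window_sum = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
                  - integral[k:k + h, :w] + integral[:h, :w])
    return window_sum / (k * k)


def select_candidates(image, radius=2, keep_fraction=0.01):
    """Boolean mask of pixels whose contrast to the local mean is in the top fraction.

    Assumed cheap feature: pixel value minus local mean. Ties at the threshold
    (e.g. in perfectly flat regions) may keep more pixels than keep_fraction.
    """
    contrast = image.astype(float) - local_mean(image, radius)
    threshold = np.quantile(contrast, 1.0 - keep_fraction)
    return contrast >= threshold


def filter_at_candidates(image, expensive_filter, candidates):
    """Evaluate expensive_filter(image, y, x) only at candidate pixels.

    expensive_filter is a hypothetical callable standing in for a costly
    per-pixel filter; non-candidate pixels get a response of zero.
    """
    response = np.zeros(image.shape, dtype=float)
    for y, x in zip(*np.nonzero(candidates)):
        response[y, x] = expensive_filter(image, y, x)
    return response
```

With `keep_fraction=0.01`, the costly filter is evaluated at roughly 1% of the pixels. The trade-off is that a target missed by the cheap feature can never be recovered afterwards, so the pre-selection threshold would need to be chosen conservatively.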

Conclusions
In this paper, we presented a novel structure-adaptive clutter suppression method, called chain-growth filtering, for infrared small target detection. Owing to its flexible growth strategy, the chain-growth filtering model involves only the pixels inside the clutter region, even when the clutter has a complex shape. Because the grey-level differences among the involved pixels are relatively small, the proposed filtering model achieves a superior and robust suppression effect for various clutters with irregular shapes. Furthermore, the proposed detection method also achieves multi-scale target detection and a considerable anti-noise ability. Compared with seven state-of-the-art methods, the proposed method shows excellent detection performance on diverse infrared scenes in extensive experiments.
For generality, the filtering model uses the pixels' grey values in Equation (7); using other, more advanced pixel-wise features might yield better results in some scenario-specific applications. Moreover, since the proposed method uses only local image characteristics, additional non-local characteristics could be employed in future work to further improve the detection performance. In addition, the proposed chain-growth filtering model might be applied to other similar tasks; for instance, pulmonary nodule detection in CT images also needs to suppress line-shaped clutter caused by blood vessels. How well the proposed idea works in such applications is left for further study.