Infrared Small Moving Target Detection via Saliency Histogram and Geometrical Invariability

To detect both bright and dark small moving targets effectively in infrared (IR) video sequences, this paper presents a method based on a saliency histogram and geometrical invariability. First, a saliency map that roughly highlights the salient regions of the original image is obtained by tuning its amplitude spectrum in the frequency domain. Then, a saliency histogram is constructed by averaging the accumulated saliency value of each gray level in the map, so that the bins corresponding to bright and dark targets are assigned large values. Next, single-frame detection of candidate targets is accomplished by binarized segmentation with an adaptive threshold, and their centroid coordinates are calculated with sub-pixel accuracy through connected components labeling and a gray-weighted criterion. Finally, considering the motion characteristics in consecutive frames, an inter-frame false alarm suppression method based on geometrical invariability is developed to further improve the precision rate. Quantitative analyses demonstrate that the detection precision of the proposed approach can be up to 97%, and Receiver Operating Characteristic (ROC) curves further verify that our method outperforms other state-of-the-art methods in both detection rate and false alarm rate.


Introduction
IR imaging based small moving target detection is one of the most significant techniques for military, astronautics and aeronautics applications [1]. The performance of an infrared search and track (IRST) system relies heavily on the precision of small target detection. A high-performance IR small target detection algorithm should remove background clutter effectively and identify the real targets not only in a single frame but also in consecutive frames. Moreover, motion trajectories need to be delineated, which makes it easier to monitor and capture the targets of interest. Although modern IR detectors offer fast detection, cheap equipment and simple setup [2], the specific imaging mechanism and detection conditions still result in the following inherent properties of IR images, which complicate target detection [3][4][5]. First, IR imaging is based on IR radiation, so the target/background contrast may be weak if their radiometric quantities are similar. Second, the pixel size of existing IR cameras cannot be made small enough to generate high-resolution images, which means the images to be processed are always blurred. Third, the target size becomes comparatively small (fewer than 20 × 20 pixels) on account of the long military observation distance.
To detect small moving targets efficiently while removing the various sorts of background clutter in IR images, numerous algorithms have been developed so far, including filter based methods, mathematical morphology based methods, wavelet based methods, and so on. Filter based methods, represented by the max-mean/max-median filter [6], the high-pass filter [7] and the two-dimensional least mean square (TDLMS) filter [8], use fixed templates to suppress clutter according to intensity differences. Although they can meet the requirement of real-time processing, the results are always inaccurate [9,10]. Mathematical morphology, such as the Hat transformation (including top-hat and bottom-hat transformation) [11], is another branch in the field of small target detection. These methods aim at enhancing regions of interest via morphological operations, but they usually fail when the target is dim or the clutter is heavy [12,13]. Wavelet based algorithms [14] design a group of filters that are matched to a point spread function (PSF) at different scales by choosing a mother wavelet similar to the PSF. Unfortunately, they are quite time-consuming and their false alarm rates are always high [15].
Although much progress has been achieved in the past decades, some significant problems remain worthy of further investigation. On the one hand, dark targets, whose IR radiation is lower than that of their surroundings, are seldom covered by previous algorithms. On the other hand, using motion features to eliminate the false alarms created in a single frame and forming complete trajectories are still tough tasks.
In this paper, we first present a new saliency histogram, built on a saliency map, to distinguish visually salient regions from the background. Because both bright and dark IR small targets differ noticeably in gray level from the background, they can be regarded as salient regions in IR images, and the gray levels corresponding to targets are assigned large bin values in the saliency histogram. Then, an adaptive threshold is calculated via Otsu's method [16][17][18] to roughly extract IR targets according to the constructed histogram, and sub-pixel-accuracy centroid coordinates of all the candidate target regions are obtained through a connected components labeling algorithm and an intensity-weighted criterion. For consecutive frames, we apply a uniformly accelerated motion model to make track correlations [19] and form complete motion trails for each candidate point. Then, the real small moving targets can be picked out from all the correlated points by use of the potential geometrical invariability existing in the sequences. Figure 1 gives an illustration of the framework of our method.
Appl. Sci. 2017, 7, 569
In conclusion, we argue that the main contribution of our work is to come up with an IR small moving target detection method that is suitable for both bright and dark targets and has a high detection accuracy under different conditions.

Saliency Map
Saliency detection is a popular and efficient tool that roughly presents the saliency distribution and locates the visually salient regions [20,21]. An IR small target, regardless of whether it is bright or dark, is able to attract much more attention from human eyes because of the distinct differences of brightness and shape when compared to its surrounding background. That is to say, IR small targets can be seen as salient regions [22].
In this section, a saliency map is constructed based on the amplitude spectrum in the frequency domain. Suppose that I(x,y) is the input IR image; its Fourier frequency spectrum F(u,v) is obtained by the two-dimensional discrete Fourier transform (DFT) as Equation (1):

$$F(u,v) = \sum_{x=0}^{W-1}\sum_{y=0}^{H-1} I(x,y)\, e^{-j2\pi\left(\frac{ux}{W}+\frac{vy}{H}\right)} = \mathrm{Re}(u,v) + j\,\mathrm{Im}(u,v) \qquad (1)$$

where W and H are the width and height of I(x,y), respectively; Re(u,v) and Im(u,v) are the real part and imaginary part of F(u,v), respectively; and j is the imaginary unit. Then, we can extract the amplitude spectrum A(u,v) and phase spectrum P(u,v) as Equations (2) and (3):

$$A(u,v) = |F(u,v)| = \sqrt{\mathrm{Re}(u,v)^2 + \mathrm{Im}(u,v)^2} \qquad (2)$$

$$P(u,v) = \Theta|F(u,v)| = \mathrm{arctg}\!\left(\frac{\mathrm{Im}(u,v)}{\mathrm{Re}(u,v)}\right) \qquad (3)$$

where |·| and Θ|·| denote calculating the norm and phase angle of an arbitrary complex number, and arctg(·) is the arctangent function. From the perspective of the frequency domain, high-frequency components correspond to the salient target while low-frequency components represent the smooth background. However, the proportion of high-frequency components is much smaller than that of low-frequency components, which is reflected by the fact that the amplitude values of high frequencies are much smaller.
Based on the above analysis, an exponential function is designed to reverse the amplitude distribution, while the phase spectrum P(u,v) remains unchanged. Equation (4) defines this amplitude adjustment, where A′(u,v) is the adjusted amplitude spectrum and γ is a fixed parameter (γ is set to 100 throughout). As illustrated in Figure 2, Equation (4) is a decreasing function, which ensures that a small A(u,v) becomes larger while a large one becomes smaller. That is to say, high frequencies (IR target) are enhanced in the spatial domain while low frequencies (background) are suppressed. Finally, the saliency map S(x,y) is calculated by the inverse discrete Fourier transform (IDFT) and smoothed by Gaussian filtering, as given by Equation (5), where Ga(x,y) is a two-dimensional Gaussian filter; * is the convolution operator; and F⁻¹(·) is the IDFT operator.
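The amplitude-tuning pipeline described in this section can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the exact formulation of the paper: the decreasing exponential is assumed to be A′ = exp(−A/γ), the spatial response is assumed to be the squared magnitude of the IDFT, the Gaussian width `sigma` is a hypothetical parameter, and the smoothing is applied in the frequency domain (circular convolution).

```python
import numpy as np

def saliency_map(image, gamma=100.0, sigma=2.0):
    """Amplitude-tuned saliency map (illustrative sketch).

    Assumptions not taken from the paper: the adjusted amplitude is
    exp(-A / gamma), the spatial response is the squared IDFT magnitude,
    and `sigma` (Gaussian width in pixels) is a hypothetical parameter.
    """
    img = np.asarray(image, dtype=np.float64)
    spectrum = np.fft.fft2(img)            # F(u, v)
    amplitude = np.abs(spectrum)           # A(u, v)
    phase = np.angle(spectrum)             # P(u, v), left unchanged
    tuned = np.exp(-amplitude / gamma)     # assumed decreasing exponential A'(u, v)

    # Back to the spatial domain with the tuned amplitude and original phase.
    recon = np.fft.ifft2(tuned * np.exp(1j * phase))
    raw = np.abs(recon) ** 2

    # Gaussian smoothing as a multiplication in the frequency domain, which
    # corresponds to circular convolution with a Gaussian-shaped kernel.
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    gauss = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(raw) * gauss))
```

On a flat synthetic background with a single bright or dark pixel, the maximum of the returned map falls on that pixel, which matches the claim that both target polarities are treated as salient.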

Saliency Histogram
Since the saliency map S and the input IR image I share the same size, there is a pixel-wise correspondence between S and I in the spatial domain. Based on this, a normalized saliency histogram χ is developed by averaging the accumulated saliency value of each gray level, as given by Equation (6):

$$\chi(k) = \frac{1}{\mathrm{sum}(\chi)} \cdot \frac{S_a(k)}{N(k)} \qquad (6)$$

where χ(k) ∈ [0,1] is the bin value for an arbitrary gray level k (k = 1, 2, 3, …, L, where L is the maximum gray level value) in the saliency histogram; sum(χ) is the sum of all the bin values, which is used for normalization; N(k) = NUM{(x,y) | I(x,y) = k} is the total number of pixels whose gray level is k in I, where NUM{·} counts the number of elements in a set; and S_a(k) is defined as the cumulative sum of the corresponding saliency values in S:

$$S_a(k) = \sum_{x}\sum_{y} S(x,y), \quad (x,y) \in \{(x,y) \mid I(x,y) = k\} \qquad (7)$$

We now discuss the design of the saliency histogram further. Ideally, only those bins whose gray levels correspond to the IR target should be assigned large saliency values. Following this idea, we calculate the saliency map S, which provides a quantitative measurement of saliency for each pixel, and develop Equation (7) to measure the degree of saliency for each gray level. However, S_a(k) is not a rigorous metric, because there is a special case in which a gray level k′ in the background area has a low saliency value but a very large number of pixels, so that S_a(k′) is still large. To address this problem, we take the number of pixels at each gray level into consideration and use the averaging procedure to obtain the final saliency histogram of Equation (6).
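The averaging step can be illustrated with a short NumPy sketch. It assumes integer gray levels indexed from 0 to `levels − 1` (the text indexes them from 1 to L) and assigns a zero bin to gray levels that do not occur in the image, a case the text does not discuss; the function name is hypothetical.

```python
import numpy as np

def saliency_histogram(image, saliency, levels=256):
    """Normalized mean saliency per gray level (illustrative sketch).

    Assumes integer gray levels in [0, levels); absent gray levels get a
    zero bin, which is an assumption made here for robustness.
    """
    gray = np.asarray(image).astype(np.int64).ravel()
    sal = np.asarray(saliency, dtype=np.float64).ravel()

    s_a = np.bincount(gray, weights=sal, minlength=levels)  # accumulated saliency S_a(k)
    n_k = np.bincount(gray, minlength=levels)               # pixel count N(k)

    mean_sal = np.zeros(levels)
    present = n_k > 0
    mean_sal[present] = s_a[present] / n_k[present]         # averaging step

    total = mean_sal.sum()
    return mean_sal / total if total > 0 else mean_sal      # normalization by sum(chi)
```

In a 4 × 4 image with fifteen background pixels of saliency 0.1 and one target pixel of saliency 0.9, the accumulated saliency of the background gray level (1.5) exceeds that of the target (0.9), yet after averaging the target bin dominates. This is the special case the averaging is meant to handle.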

Adaptive Segmentation and Centroid Localization
The saliency histogram χ introduced above reveals that the gray levels with large bin values correspond to the salient regions. To extract the target regions in a single frame as correctly as possible, we remove the bins with remarkably small values through a simple threshold and generate a modified saliency histogram χ′ as follows:

$$\chi'(k) = \begin{cases} \chi(k), & \chi(k) \ge \chi^* \\ 0, & \text{otherwise} \end{cases} \qquad (8)$$

where χ* is an adaptive threshold calculated by Otsu's method. The removed bins stand for the smooth background and parts of the clutter. As a result, we can binarize the input IR image directly on the basis of χ′, and a binarized image BI is obtained from Equation (9):

$$BI(x,y) = \begin{cases} 1, & \chi'(I(x,y)) > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (9)$$

After binarization, it is highly possible that some of the extracted regions are incomplete due to the inhomogeneity of intensity. For BI(x,y), the close operation [23] is used to smooth the contours and fill in the holes existing in target regions as

$$BI_c = BI \bullet M = (BI \oplus M) \ominus M \qquad (10)$$

where BI_c is the resulting image after the close operation; M is a 5 × 5 structuring element; • is the close operator; ⊕ is the dilation operator; and ⊖ is the erosion operator. To avoid forming repeated trajectories for the same target, every region should be represented precisely by a single centroid. A typical fast connected components labeling algorithm [24] is applied to label all of the candidate regions. For each connected region, i.e., all pixels sharing the same label, we use a gray-weighted criterion to calculate its centroid position as follows:

$$P = \frac{\sum_{i=1}^{N} I(p_i)\, p_i}{\sum_{i=1}^{N} I(p_i)} \qquad (11)$$

where P denotes the space coordinate of the centroid; p_i denotes the i-th pixel in the connected region; and N is the total number of pixels in the region. By this procedure, every candidate target region is represented by an isolated point, which is called a candidate point in the following.
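The single-frame stage can be sketched as follows. This is a simplified illustration rather than the implementation used in the paper: Otsu's criterion is applied directly to the set of bin values passed in (the caller is assumed to pass only non-zero bins), the close operation is omitted for brevity, labeling is done with a plain flood fill using 8-connectivity instead of the fast algorithm of [24], and all function names are hypothetical.

```python
import numpy as np

def otsu_threshold(values):
    """Threshold maximizing the between-class variance of 1-D samples."""
    v = np.sort(np.asarray(values, dtype=np.float64))
    best_t, best_var = float(v[0]), -1.0
    for i in range(1, v.size):
        lo, hi = v[:i], v[i:]
        w0, w1 = lo.size / v.size, hi.size / v.size
        between = w0 * w1 * (lo.mean() - hi.mean()) ** 2
        if between > best_var:
            best_var, best_t = between, 0.5 * (v[i - 1] + v[i])
    return best_t

def binarize(image, chi, thresh):
    """Zero the bins below the threshold, then mark pixels whose bin survives."""
    chi_mod = np.where(chi >= thresh, chi, 0.0)
    return chi_mod[np.asarray(image, dtype=np.int64)] > 0

def gray_weighted_centroids(image, mask):
    """One (row, col) centroid per 8-connected region, weighted by gray level."""
    img = np.asarray(image, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros(mask.shape, dtype=bool)
    h, w = mask.shape
    centroids = []
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0, c0] or seen[r0, c0]:
                continue
            stack, pixels = [(r0, c0)], []
            seen[r0, c0] = True
            while stack:                      # flood fill of one region
                r, c = stack.pop()
                pixels.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < h and 0 <= cc < w
                                and mask[rr, cc] and not seen[rr, cc]):
                            seen[rr, cc] = True
                            stack.append((rr, cc))
            pts = np.array(pixels, dtype=np.float64)
            wts = img[tuple(np.array(pixels).T)]
            if wts.sum() == 0:                # fallback added here: uniform weights
                wts = np.ones(len(pixels))
            centroids.append(tuple((pts * wts[:, None]).sum(axis=0) / wts.sum()))
    return centroids
```

For a two-pixel region with gray levels 200 and 100 at columns 3 and 4, the weighted centroid column is (3·200 + 4·100)/300 ≈ 3.33, which shows how the gray weighting yields sub-pixel coordinates.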

Track Correlation
Within a short time, the motion of a small moving target can be described with a uniformly accelerated motion model [25], which is applied to implement track correlation and form complete trajectories in our inter-frame detection.
Assume the space coordinates of all the candidate points in an arbitrary frame compose a set P^ξ = {P_1^ξ, P_2^ξ, P_3^ξ, …, P_n^ξ}, where ξ represents the frame number and n is the total number of candidate points in that frame. Here, we take the m-th candidate point P_m^ξ as an example to explain the process of track correlation.
First, the velocity v_m^ξ and acceleration a_m^ξ of P_m^ξ are computed with respect to the time interval ∆t; for two adjacent frames, ∆t = 1. Next, based on the uniformly accelerated motion model, the estimated position P_m^{ξ+1} of P_m^ξ in the (ξ + 1)-th frame is predicted from P_m^ξ, v_m^ξ, a_m^ξ and ∆P^{ξ+1}, where ∆P^{ξ+1} denotes the displacement of the background between the two adjacent frames. In this paper, we only consider the rotation and translation displacement caused by camera motion, and ∆P^{ξ+1} is calculated by the automatic registration method introduced in Ref. [26]. Furthermore, as shown in Figure 3, a circular gate whose radius is ℜ (we take ℜ = 5 empirically) is set at P_m^{ξ+1}. In the (ξ + 1)-th frame, all of the candidate points located inside the gate (represented by the light green dots) make up another set Q_m = {Q_1, Q_2, Q_3, …, Q_t}, where t is the number of candidate points inside the gate. If more than one point is located inside the gate, the point Q* with the maximum bin value in the saliency histogram is selected as the correlated one for P_m^ξ. Thus, a complete trajectory of P_m^ξ can be established if the above track correlation is repeated in every frame.
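One correlation step can be sketched as below. The sketch assumes that the velocity and acceleration are backward finite differences of the last three positions and that the background displacement is simply added to the prediction; the exact expressions of the paper may differ, and the function names and the tuple-based point representation are hypothetical.

```python
import math

def predict_position(prev2, prev1, current, background_shift=(0.0, 0.0), dt=1.0):
    """Uniformly accelerated prediction of the next position (sketch).

    Assumption: velocity and acceleration are backward finite differences
    of the last three positions; background_shift plays the role of the
    registered background displacement.
    """
    v_now = [(c - p) / dt for c, p in zip(current, prev1)]
    v_prev = [(p - q) / dt for p, q in zip(prev1, prev2)]
    acc = [(a - b) / dt for a, b in zip(v_now, v_prev)]
    return tuple(c + v * dt + 0.5 * a * dt ** 2 + s
                 for c, v, a, s in zip(current, v_now, acc, background_shift))

def correlate(predicted, candidates, bin_values, radius=5.0):
    """Index of the in-gate candidate with the largest bin value, or None."""
    inside = [i for i, q in enumerate(candidates)
              if math.hypot(q[0] - predicted[0], q[1] - predicted[1]) <= radius]
    if not inside:
        return None  # nothing in the gate; the caller counts a missed frame
    return max(inside, key=lambda i: bin_values[i])
```

With positions (0, 0), (1, 0) and (3, 0) in three consecutive frames, the velocity is 2, the acceleration is 1, and the predicted position is (5.5, 0). Among the candidates inside a gate of radius 5, the one with the largest saliency-histogram bin value is kept.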

False Alarm Suppression
The track correlation procedure presented above is implemented for each candidate point detected in the first frame and lasts for L frames. However, a formed trajectory is eliminated if no correlated point can be found for it in ε consecutive frames. In this paper, we uniformly set L = 20 and ε = 3.
For each candidate point in the L-th frame, two maps can be constructed. The first map is composed of the points in the L′-th frame, i.e., the frame in which this candidate point is correlated successfully for the first time, which are seen as dots, and the Euclidean distances between them, which are seen as edges. The second map is composed in the same way from the points of the L-th frame (these points should also exist in the L′-th frame).
Here, we show that the distance between two background points remains constant regardless of the rotation and translation of the background. Assume that the rotation angle and the translation distance between the two maps defined above are ∆Φ and ∆ρ, and that the rotation center is P*. Then, for the n-th point P_n^f in the first map and P_n^s in the second map, the following relation holds:

$$P_n^s - P^* = \aleph(\Delta\Phi)\cdot\left(P_n^f - P^* + \Delta\rho\right) \qquad (15)$$

where ℵ(∆Φ) is the rotation matrix

$$\aleph(\Delta\Phi) = \begin{bmatrix} \cos\Delta\Phi & -\sin\Delta\Phi \\ \sin\Delta\Phi & \cos\Delta\Phi \end{bmatrix} \qquad (16)$$

Based on Equations (15) and (16), the Euclidean distance between two background points, P_m^s and P_n^s in the second map, can be expressed as

$$\left\|P_m^s - P_n^s\right\|_2 = \left\|\aleph(\Delta\Phi)\cdot\left[\left(P_m^f - P^* + \Delta\rho\right) - \left(P_n^f - P^* + \Delta\rho\right)\right]\right\|_2 \qquad (17)$$

where ‖·‖₂ stands for the 2-norm. When P_m^s and P_n^s are both background points, they undergo no displacement other than the background translation ∆ρ, so the P* and ∆ρ terms cancel. Since rotation cannot change the length of a vector, Equation (17) can be further written as

$$\left\|P_m^s - P_n^s\right\|_2 = \left\|P_m^f - P_n^f\right\|_2 \qquad (18)$$

In contrast, if P_m^s is a moving target point while P_n^s is a background point, P_m^s may have an extra displacement ∆ρ_m caused by self-motion, which means Equation (18) needs to be modified as

$$\left\|P_m^s - P_n^s\right\|_2 = \left\|P_m^f - P_n^f + \Delta\rho_m\right\|_2 \qquad (19)$$

Therefore, the distance between two background points is constant, i.e., there is a geometrical invariability between background points. However, if at least one of the two points is a moving target, the distance changes. Motivated by this rule, we propose the following false alarm suppression method based on geometrical invariability. For the n-th candidate point in the second map, an index ψ denoting the difference of relative position is developed in Equation (20) to judge whether P_n^s is a real moving target or a false alarm, where N is the number of trajectories in the second map. Lastly, the judging criterion of Equation (21) compares ψ with a threshold thr, which is set as thr = η·max(ψ). We take η = 0.7 uniformly in our method.
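The suppression rule can be illustrated with the following sketch. Since the exact form of the index ψ is not reproduced here, the code assumes one plausible definition: for each point, the mean absolute change of its distances to all points between the first and the second map. A point is kept as a target when ψ ≥ thr, with thr = η·max(ψ). The function names, the `>=` comparison and the guard for the all-zero case are assumptions of this sketch.

```python
import math

def relative_position_index(first_map, second_map):
    """Assumed psi: mean absolute change of pairwise distances per point."""
    n = len(first_map)
    psi = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            d_first = math.hypot(first_map[j][0] - first_map[i][0],
                                 first_map[j][1] - first_map[i][1])
            d_second = math.hypot(second_map[j][0] - second_map[i][0],
                                  second_map[j][1] - second_map[i][1])
            total += abs(d_second - d_first)
        psi.append(total / n)
    return psi

def is_moving_target(psi, eta=0.7):
    """True for points whose psi reaches thr = eta * max(psi)."""
    peak = max(psi)
    if peak == 0.0:
        # Guard added in this sketch: no distance changed, so nothing moved.
        return [False] * len(psi)
    thr = eta * peak
    return [p >= thr for p in psi]
```

In a toy scene with four background points that are rotated by 90° and translated, plus one point that additionally moves by itself, the background-to-background distances are unchanged, so only the self-moving point reaches the threshold.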

Experiments and Analyses
We briefly introduce the test sequences used in our simulations and then compare the results of both in-frame detection and inter-frame detection with several state-of-the-art algorithms to demonstrate the robustness and precision of our algorithm under different natural backgrounds.

Introduction of Datasets
Four IR video sequences, captured by cooled mid-wave infrared (MWIR) imagers at a frame rate of 25 fps, are selected as our datasets for analysis. Table 1 shows the detailed information of these sequences, and the corresponding first frames are displayed in Figure 4, where the real moving targets that need to be detected are marked with red circles, and the fake targets, as well as the regions that may generate false alarms, are marked with yellow circles.
The sky background of Seq.1 is clear, but the intensity of the cloud regions is highly inhomogeneous, which means that interference from cloud edges is the main source of false alarms. Among the four groups, Seq.2 and Seq.3 contain dark IR small targets, which increases the difficulty and complexity of detection to some extent. The bright but still points around the hills and houses in Seq.2 are fake targets, and it is quite hard to distinguish these kinds of points in a single frame. The sea waves are also very challenging for small target detection because the edges of waves are difficult to suppress with filtering based algorithms, and the motion of waves also causes much difficulty for the inter-frame detection. In addition, a dead point caused by the detector itself exists throughout the whole sequence, which is difficult to address. Seq.4 has a moving background and the cloud layer is quite dense. In addition, there are a certain number of dead pixels due to the poor quality of imaging.

Experimental Results of In-Frame Detection
This section first shows the processing results of our algorithm and four other conventional algorithms. Then, several metrics are applied to make a quantitative comparison.
We choose the four groups of IR images shown in Figure 4 as the test samples. Four state-of-the-art algorithms, namely Max-Mean, Butterworth high-pass (BHP), Hat transformation (including Top-hat and Bottom-hat transformations) and two-dimensional least mean square (TDLMS), are selected for comparison with our method. For Max-Mean, the raw IR image is filtered by a max-mean filter, and the filtered output is subtracted from the original image to enhance the IR small target. In the BHP method, the frequency components belonging to IR targets are extracted by setting a specific cut-off frequency of the Butterworth high-pass filter. Hat transformation denotes the pixel-wise difference between the raw image and the image processed by a morphological opening or closing operation. It should be noted that Top-hat transformation is designed for detecting bright targets while Bottom-hat transformation is designed for dark targets. As a result, Bottom-hat transformation is implemented in Seq.2 while Top-hat transformation is implemented in the other sequences. Lastly, TDLMS detects small targets by calculating the difference between the original image and the background estimation.
Figure 9 presents the original processing results of the state-of-the-art algorithms, and Figure 10 further shows their binarized results using thresholds calculated by Otsu's method. Max-Mean performs poorly on the cloud and sea backgrounds, indicating that it cannot remove the strong interference caused by edges. BHP has the same drawback when addressing edges, as can be seen from Figure 10b, because this algorithm is sensitive to frequency characteristics and large numbers of false alarms appear if the cut-off frequency is chosen inappropriately. In addition, Figure 10c shows that, if both bright and dark targets are present in the image, Hat transformation inevitably loses at least one kind of target because Top-hat and Bottom-hat transformations cannot be used simultaneously. Finally, TDLMS has a stronger ability to suppress edges than the others, but it tends to generate large quantities of candidate points for the same target region, which causes many repeated trajectories in inter-frame detection.
To assess the performance more rigorously, two widely accepted metrics, the precision rate P and the recall rate R [27], are selected to measure the detection results quantitatively. As illustrated in Figure 11, let $N_T$ be the number of true target pixels in the current frame, $N_D$ the number of target pixels detected by the tested algorithm, and $N_C = N_T \cap N_D$ the number of target pixels detected correctly. The precision rate P and the recall rate R are then defined as

$$P = \frac{N_C}{N_D}, \qquad R = \frac{N_C}{N_T}.$$

Furthermore, a comprehensive evaluation index η representing the detecting precision of each algorithm [28] is computed from P and R with a harmonic coefficient λ. We set λ = 1 in this paper, in which case η is the harmonic mean of P and R:

$$\eta = \frac{2PR}{P + R}.$$

η = 1 indicates that there are no false alarms and all of the real targets are discovered, while η = 0 means that none of the real targets are found. Hence, the larger η is, the more satisfactory the result is. Table 2 lists the η values for the in-frame detections. The results in Table 2 match the qualitative analyses made above. Our method achieves the largest η in all sequences, although the overall accuracy of single-frame detection is still not high.
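As a hedged illustration of these pixel-level metrics, the following Python sketch computes P, R and η from two binary masks. The function name `precision_recall_eta`, the mask inputs and the toy data are assumptions made for this example and do not come from the original experiments; η is evaluated only for λ = 1, where it is the harmonic mean of P and R.

```python
import numpy as np

def precision_recall_eta(truth, detected):
    """Pixel-level P, R and eta (lambda = 1) for two boolean masks of equal shape."""
    truth = np.asarray(truth, dtype=bool)
    detected = np.asarray(detected, dtype=bool)
    n_t = int(truth.sum())               # N_T: true target pixels
    n_d = int(detected.sum())            # N_D: detected pixels
    n_c = int((truth & detected).sum())  # N_C: correctly detected pixels
    p = n_c / n_d if n_d else 0.0        # precision rate
    r = n_c / n_t if n_t else 0.0        # recall rate
    eta = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, eta

# Toy example (hypothetical data): a 2 x 2 target and a 2 x 3 detection around it.
truth = np.zeros((8, 8), dtype=bool)
truth[2:4, 2:4] = True       # 4 true pixels
detected = np.zeros((8, 8), dtype=bool)
detected[2:4, 2:5] = True    # 6 detected pixels, 4 of them correct

p, r, eta = precision_recall_eta(truth, detected)
print(p, r, eta)
```

In this toy case every true pixel is found but two extra pixels are flagged, so the recall is perfect while the precision and η are penalized by the false alarms.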

Results of Inter-Frame Detection
In this part, the binarized detection results of the consecutive frames are accumulated in the final frame. Figure 12 presents the cumulative results for each sequence. Overall, our algorithm performs best in all sequences, and nearly all of the false alarms remaining after the single-frame detection are removed by the subsequent inter-frame detection. Moreover, complete trajectories are drawn in the final resulting images, and repeated trajectories for the same target are successfully avoided. By comparison, the other methods perform less satisfactorily in this experiment. As shown in the resulting images of Seq.1, Seq.3 and Seq.4, Max-Mean tends to produce discontinuous trajectories, meaning that real targets are lost in certain frames. The trajectories generated by BHP are relatively complete, but regions with clutter edges contain large numbers of false alarms, which is especially evident in Seq.1 and Seq.3. Hat transformation suffers from serious missed detections, and the trail of the dark IR small target is completely missed in Seq.3. However, this method reduces the number of repeated trails to some extent when compared with the other three algorithms, and it has good anti-interference ability against clutter edges. In contrast, TDLMS suffers severely from repeated trails, and the trajectories it produces are visibly thicker than those of the other methods. Table 3 presents the η values of the five algorithms. These data show that the η values of our method are higher than 97% and are at least twice those of the other methods in all four sequences.
Further analysis of the false alarms produced by the compared algorithms suggests two major sources: (1) repeated trajectories greatly increase the number of redundant and useless points; (2) false detections of edges accumulate over the frames.
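The accumulation step used for the cumulative results can be sketched as follows. This is a minimal illustration that assumes the accumulation is a pixel-wise logical OR of the per-frame binary masks; the function name `accumulate_detections` and the toy frames are hypothetical and are not taken from the original implementation.

```python
import numpy as np

def accumulate_detections(binary_frames):
    """Overlay per-frame binary detection masks into one cumulative mask (logical OR)."""
    frames = [np.asarray(f, dtype=bool) for f in binary_frames]
    accumulated = np.zeros_like(frames[0])
    for frame in frames:
        accumulated |= frame  # a pixel stays set once it is detected in any frame
    return accumulated

# Toy sequence (hypothetical data): one target moving along the diagonal of a 5 x 5 image.
frames = []
for k in range(3):
    mask = np.zeros((5, 5), dtype=bool)
    mask[k, k] = True
    frames.append(mask)

trajectory = accumulate_detections(frames)
print(trajectory.astype(int))
```

Under this assumption, a correctly tracked target leaves a thin continuous trail in the accumulated image, whereas repeated candidate points or edge false alarms show up as thickened trails or scattered pixels.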
The ROC (Receiver Operating Characteristic) curve [29] is an effective tool for describing the quality of detection methods. In an ROC curve, the abscissa and ordinate represent the false alarm rate ($P_f$) and the detection rate ($P_d$), which are expressed as Equations (25) and (26), respectively:

$$P_f = \frac{N_D - N_C}{N_0} \tag{25}$$

$$P_d = \frac{N_C}{N_T} \tag{26}$$

where $N_0$ denotes the total number of pixels in the current frame. A good ROC curve means that the tested method is able to highlight the target and suppress the clutter at the same time. The ROC curves of the four experiments are drawn in Figure 13. In light of the four groups of ROC curves, it is apparent that the area under the ROC curve of our method is always far larger than those of the compared methods. Moreover, the performance of each compared method varies considerably under different backgrounds, further demonstrating that the four methods are less robust than ours.
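To make the construction of an ROC curve concrete, the sketch below sweeps a threshold over a detector output map and records one (Pf, Pd) point per threshold. It assumes that the false alarm rate is the number of falsely detected pixels divided by the total pixel number N_0 and that the detection rate is the number of correctly detected pixels divided by the number of true target pixels; the function name `roc_points`, the threshold list and the toy data are hypothetical.

```python
import numpy as np

def roc_points(score_map, truth, thresholds):
    """Return arrays (pf, pd), one ROC point per threshold applied to score_map."""
    score_map = np.asarray(score_map, dtype=float)
    truth = np.asarray(truth, dtype=bool)
    n_0 = truth.size                # N_0: total pixels in the frame
    n_t = int(truth.sum())          # N_T: true target pixels
    pf, pd = [], []
    for t in thresholds:
        detected = score_map >= t
        n_d = int(detected.sum())              # N_D
        n_c = int((detected & truth).sum())    # N_C
        pf.append((n_d - n_c) / n_0)           # assumed false alarm rate
        pd.append(n_c / n_t if n_t else 0.0)   # assumed detection rate
    return np.array(pf), np.array(pd)

# Toy example (hypothetical data): a 2 x 2 score map with one true target pixel.
scores = np.array([[0.9, 0.2],
                   [0.6, 0.1]])
truth = np.array([[True, False],
                  [False, False]])
pf, pd = roc_points(scores, truth, thresholds=[1.0, 0.8, 0.5, 0.0])

# Area under the sampled curve by the trapezoidal rule (pf is non-decreasing here).
area = float(np.sum(np.diff(pf) * (pd[1:] + pd[:-1]) / 2.0))
print(pf, pd, area)
```

With this definition of Pf the curve cannot reach Pf = 1 whenever a target is present, so areas computed this way are only comparable between methods evaluated on the same frames.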

Conclusions
Conventional IR small target detection methods tend to generate large numbers of false alarms because of the small target size, the lack of color or texture information and the interference of clutter. Furthermore, existing algorithms can rarely detect both bright and dark IR small targets accurately at the same time, and inter-frame motion information is ignored by most researchers.
In this paper, an IR small moving target detection method using saliency histogram and geometrical invariability is proposed. For the in-frame detection part, a saliency histogram is established by averaging the cumulative saliency value of each gray level, so that a single-frame segmentation can be made via an adaptive threshold on the histogram, and the centroid positions of candidate targets are calculated via a connected components labeling algorithm and a gray-weighted criterion. For the inter-frame detection part, false alarms are further removed according to the geometrical invariability existing between two relatively still points. Extensive experiments show that our method is robust and achieves satisfactory precision under various natural backgrounds compared with other state-of-the-art methods.
In future work, we plan to investigate better-performing features of the IR small target for single-frame detection, so as to reduce the computational load of the inter-frame detection and further improve the final detection precision.