Infrared Small-Faint Target Detection Using Non-i.i.d. Mixture of Gaussians and Flux Density

Abstract: The robustness of infrared small-faint target detection methods in noisy conditions remains a challenging and meaningful research topic. The targets are usually spatially small because of the long observation distance. Because the noise-distribution assumption underlying most existing methods is impractical, a state-of-the-art method has been developed that mines information in the temporal domain to separate small-faint targets from background noise. However, it still has two drawbacks: (1) the mixture of Gaussians (MoG) model assumes that the noise in different frames is independent and identically distributed (i.i.d.); (2) the Markov random field (MRF) assumption fails in more complex noise scenarios. In real scenarios, the noise is more complicated than the MoG model allows. To address this problem, this paper proposes a method that uses a non-i.i.d. mixture of Gaussians (NMoG) with modified flux density (MFD). We first construct a novel data structure containing spatial and temporal information from an infrared image sequence. Then, we use an NMoG model to describe the noise, which is separated from the background via the variational Bayes algorithm. Finally, we select the component containing the true targets by exploiting the obvious difference between target and noise in the MFD map. Extensive experiments demonstrate that the proposed method performs better than competing approaches in complicated noisy scenarios.


Introduction
Distant and faint target detection is of great importance to infrared systems such as anti-missile and early-warning systems. Owing to the nature of these military tasks, targets must be detected accurately and as early as possible by infrared search and track systems to leave ample time for deployment and striking back. However, the target usually occupies only a few pixels and lacks texture information because of the very long observation distance. The backgrounds, including sky and sea-sky scenes, are very complex, so the acquired infrared images are usually contaminated by background clutter and varying noise. The contrast between the target and the background or noise may be very poor, and the low signal-to-clutter ratio (SCR) and signal-to-noise ratio (SNR) make the infrared targets very faint. Therefore, robust detection of small and faint infrared targets remains a valuable research topic [1-3].
To achieve satisfactory detection performance, many approaches have been proposed for different scenarios. They fall into two types: track-before-detection (TBD) approaches [4,5] and detection-before-track (DBT) approaches [6-8]. TBD approaches, such as 3D matched filters [9] and their improved versions [10,11], perform well for targets with continuous motion tracks. DBT approaches focus on suppressing the background clutter while enhancing the target in a single frame, and are more efficient than TBD approaches. DBT approaches are therefore widely used in practical engineering. At present, the common types of DBT methods are filtering, human visual system (HVS) and multi-feature based approaches. Filtering methods, such as the max-median filter [12], top-hat filter [13] and 2D least mean square (TDLMS) filter [14], analyze the spatial continuity of an input image and model the target as a break point. HVS based methods [15-17] assume that there is a significant contrast between background and target regions. Multi-feature based methods [18-20] represent the target characteristics and the background region with features that are used to train classifiers.
Moreover, the low-rank and sparse component recovery based approach, a subdiscipline of low-rank representation (LRR) [21], has become very popular in recent years. In this approach, the background regions are assumed to change gradually, so a special low-rank data structure, such as a 2D matrix or a 3D tensor, can be constructed from the original images. Once the low-rank background is recovered, the dim target can be separated from the original image. Gao proposed the infrared patch-image (IPI) model [22], which constructs a low-rank matrix with a sliding window. The IPI model uses vanilla nuclear norm minimization (NNM) [23] and the l1 norm [24] to regularize the background and the target, respectively. However, the performance of NNM in low-rank component estimation degrades in noisy scenarios, and the remedy is to replace NNM with a more suitable regularizer. Thus, Dai proposed a weighted IPI (WIPI) approach [25] and a non-negative IPI approach [26], and Guo proposed a reweighted WIPI model (ReWIPI) based on weighted nuclear norm minimization (WNNM) [27]. From the viewpoint of data dimensionality, Dai proposed a reweighted infrared patch-tensor (RIPT) method, which generalizes the low-rank matrix to a low-rank tensor in order to mine more spatial information [28]. However, the RIPT method unfolds the background patch tensor into three matrices and regularizes it via the sum of nuclear norms (SNN) [29], which is suboptimal and inefficient. To remedy this issue, Sun proposed a weighted tensor nuclear norm with IPT (WNRIPT) method [30].
However, most of the existing low-rank component recovery based approaches [22,25-28,30] use only the Frobenius loss term [31] to constrain the noise, which amounts to modeling the noise as an independent and identically distributed (i.i.d.) Gaussian distribution. In practical applications, infrared images usually include complex instrumental noise that severely degrades target detection performance. A robust method capable of distinguishing different kinds of noise is therefore needed.
To this end, a state-of-the-art method [32] mines valuable information in the time domain and uses a mixture of Gaussians (MoG) noise model [33] to model the target component and the noise component together. The MoG model characterizes each pixel in the image and updates the mixed Gaussian model after each new image is acquired. It matches each pixel in the current image with the MoG model, and the matched pixels are classified as background [34,35]. Finally, the Markov random field (MRF) method [34] is used to detect the target. However, the noise in different frames is essentially modeled as i.i.d. MoG distributions in [32], which is not suitable for complex noisy scenarios. In addition, the MRF model does not provide a robust noise estimate in complex scenarios, since its performance relies on the assumption that the noise component does not arise in the neighborhood of the targets. In practice, the noise permeates the whole image, including the target region.
We propose a small and faint target detection approach based on a non-i.i.d. MoG (NMoG) model [36] and a modified flux density (MFD) map [37]. The noise distributions in different frames of the image sequence are assumed to be non-i.i.d., which improves robustness in real scenarios. The target is treated as a kind of noise and extracted from the background via an NMoG low-rank matrix factorization (NMoG-LRMF) model, which is solved by a variational Bayes (VB) algorithm. In a second step, the MFD map [37] is used to distinguish the true target from the noise, exploiting the fact that the flux density of the target differs from that of the noise in the infrared gradient vector field. This paper is organized as follows. The proposed method is described in Section 2. Section 3 provides experimental results that validate the effectiveness of the proposed method. Finally, we conclude our work in Section 4.

Spatio-Temporal Patch Model
Given an infrared image sequence, we obtain a 3D cube patch tensor by storing each frame as one slice. We then vectorize each slice to obtain a 2D matrix. The procedure is illustrated in Figure 1. Note that the image sequence can be reconstructed from the processed 2D matrix via the inverse operation. Assume that an infrared image sequence $f_1, f_2, \cdots, f_P \in \mathbb{R}^{m \times n}$ is transformed into a matrix $F$ of size $N \times P$, where $N = m \times n$ and $P$ denote the spatial and temporal dimensions, respectively. We divide $F$ into a background component $B$ and a noise component $E$:

$$F = B + E, \tag{1}$$

where the small-faint target component $T$ is treated as a sparse noise component in $E$ [32].
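The construction of F and its inverse can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `frames_to_matrix` and `matrix_to_frames` are hypothetical, and row-major vectorization of each frame is an assumption, since the text does not state the pixel ordering.

```python
from typing import List

Frame = List[List[float]]  # one m x n image stored row by row


def frames_to_matrix(frames: List[Frame]) -> List[List[float]]:
    """Stack P frames of size m x n into an N x P matrix, with N = m * n."""
    m, n = len(frames[0]), len(frames[0][0])
    columns = []
    for frame in frames:
        if len(frame) != m or any(len(row) != n for row in frame):
            raise ValueError("all frames must share the same m x n size")
        # Assumed ordering: row-major vectorization of the slice.
        columns.append([value for row in frame for value in row])
    # Transpose so that row i holds pixel i across all P frames.
    return [list(pixel_over_time) for pixel_over_time in zip(*columns)]


def matrix_to_frames(matrix: List[List[float]], m: int, n: int) -> List[Frame]:
    """Inverse operation: rebuild the P frames from the N x P matrix."""
    if len(matrix) != m * n:
        raise ValueError("matrix must have m * n rows")
    frames = []
    for column in zip(*matrix):
        frames.append([list(column[r * n:(r + 1) * n]) for r in range(m)])
    return frames


# Toy sequence with P = 3 frames of size m = n = 2.
frames = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]
F = frames_to_matrix(frames)
restored = matrix_to_frames(F, 2, 2)
```

Each column of `F` is one frame and each row follows one pixel through time, which is why slowly varying backgrounds give a matrix that is close to low rank.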


Background Component
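In low-rank recovery based methods, background regions are assumed to vary slowly and to contain many repeated regions. The low-rank matrix $B$ [32] is modeled as follows:

$$B = UV^{T}, \tag{2}$$

where $U \in \mathbb{R}^{N \times R}$ and $V \in \mathbb{R}^{P \times R}$, and their $l$-th columns are denoted by $u_{\cdot l}$ and $v_{\cdot l}$. $R$ is the initial rank of $B$. The intrinsic low-rank nature of $B$ is guaranteed by assuming that $u_{\cdot l}$ and $v_{\cdot l}$ are generated from Gaussian distributions:

$$u_{\cdot l} \sim \mathcal{N}\left(u_{\cdot l} \mid 0, \gamma_l^{-1} I_N\right), \quad v_{\cdot l} \sim \mathcal{N}\left(v_{\cdot l} \mid 0, \gamma_l^{-1} I_P\right), \tag{3}$$

where $I_N$ ($I_P$) is the $N \times N$ ($P \times P$) identity matrix. $\gamma_l$ denotes the precision of $u_{\cdot l}$ and $v_{\cdot l}$, which satisfies:

$$\gamma_l \sim \mathrm{Gam}\left(\gamma_l \mid \xi_0, \delta_0\right), \tag{4}$$

where $\mathrm{Gam}(\gamma_l \mid \xi_0, \delta_0)$ represents a gamma distribution with hyper-parameters $\xi_0$ and $\delta_0$. The low-rank component can be estimated accurately by this model [38].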

Noise Component
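In [32], the noise in different frames is assumed to be i.i.d., which is not practical in real and complex scenarios. Thus, we use the NMoG model [36] to model the noise in different frames; that is, the noise distributions of the images in different frames are non-identical. The $ij$-th element of the noise $E$ is modeled as a mixture of $K$ components:

$$e_{ij} \sim \sum_{k=1}^{K} \pi_{jk}\, \mathcal{N}\left(e_{ij} \mid \mu_{jk}, \tau_{jk}^{-1}\right), \tag{5}$$

where $\pi_{jk}$ denotes the non-negative mixing proportion with $\sum_{k=1}^{K} \pi_{jk} = 1$, and $\mu_{jk}$ and $\tau_{jk}$ denote the mean and precision, respectively. Instead of fixing the MoG parameters $\pi_{jk}$, $\mu_{jk}$ and $\tau_{jk}$ of the $k$-th Gaussian component across frames, we let them vary from frame to frame. Equation (5) can be equivalently expressed as a two-level generative model by introducing the indicator variables $z_{ij}$, which are hidden variables generated from a multinomial distribution with parameter $\pi_j$:

$$e_{ij} \sim \prod_{k=1}^{K} \mathcal{N}\left(e_{ij} \mid \mu_{jk}, \tau_{jk}^{-1}\right)^{z_{ijk}}, \quad z_{ij} \sim \mathrm{Multinomial}\left(z_{ij} \mid \pi_j\right), \tag{6}$$

where $z_{ij} = (z_{ij1}, \ldots, z_{ijK}) \in \{0, 1\}^{K}$, $\sum_{k=1}^{K} z_{ijk} = 1$, and $\mathrm{Multinomial}(\cdot)$ represents the multinomial distribution. The conjugate priors of $\mu_{jk}$, $\tau_{jk}$ and the mixing proportions $\pi_j = [\pi_{j1}, \ldots, \pi_{jK}]$ are also defined to complete the Bayesian model:

$$\mu_{jk}, \tau_{jk} \sim \mathcal{N}\left(\mu_{jk} \mid m_0, (\beta_0 \tau_{jk})^{-1}\right) \mathrm{Gam}\left(\tau_{jk} \mid c_0, d\right), \quad d \sim \mathrm{Gam}\left(d \mid \eta_0, \lambda_0\right), \quad \pi_j \sim \mathrm{Dir}\left(\pi_j \mid \alpha_0\right), \tag{7}$$

where $\beta_0$, $m_0$, $c_0$ and $d$ are the hyper-parameters, and $d$ follows a gamma distribution with hyper-parameters $\eta_0$ and $\lambda_0$. $\mathrm{Dir}(\cdot)$ is a Dirichlet distribution parameterized by $\alpha_0 = (\alpha_{01}, \ldots, \alpha_{0K})$. The noise component can then be modeled by Equations (6) and (7).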
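The two-level generative view can be illustrated by drawing synthetic noise in which every frame has its own mixture. The sketch below is only an illustration under stated assumptions: the function name `sample_nmog_noise` and all parameter values are hypothetical, and the parameters are fixed by hand rather than drawn from the priors.

```python
import random
from typing import List, Sequence


def sample_nmog_noise(n_pixels: int,
                      pi: Sequence[Sequence[float]],
                      mu: Sequence[Sequence[float]],
                      tau: Sequence[Sequence[float]],
                      seed: int = 0) -> List[List[float]]:
    """Draw an N x P noise matrix whose j-th column follows its own MoG."""
    rng = random.Random(seed)
    n_frames = len(pi)
    noise = [[0.0] * n_frames for _ in range(n_pixels)]
    for j in range(n_frames):
        if abs(sum(pi[j]) - 1.0) > 1e-9:
            raise ValueError("mixing proportions of each frame must sum to 1")
        for i in range(n_pixels):
            # Level 1: pick the component, i.e. the position of the 1 in z_ij.
            k = rng.choices(range(len(pi[j])), weights=pi[j])[0]
            # Level 2: draw e_ij from that Gaussian; tau is a precision,
            # so the standard deviation is tau ** -0.5.
            noise[i][j] = rng.gauss(mu[j][k], tau[j][k] ** -0.5)
    return noise


# Hypothetical parameters for P = 2 frames and K = 2 components per frame.
pi = [[0.9, 0.1], [0.5, 0.5]]
mu = [[0.0, 0.0], [0.0, 2.0]]
tau = [[100.0, 1.0], [4.0, 0.25]]
E = sample_nmog_noise(1000, pi, mu, tau)
```

An i.i.d. MoG model would force both columns of `E` to share one set of parameters; here the second frame is deliberately given a different mean and spread.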

Variational Inference
In this section, the posterior of the parametric model in Equation (8) is inferred by the VB approach [39]. VB estimates the target parameters $x$ by finding the distribution $q(x)$ that minimizes the Kullback-Leibler (KL) divergence between the approximate distribution $q(x)$ and the true posterior $p(x \mid D)$ given the observation $D$:

$$q^{*}(x) = \arg\min_{q \in \Omega} \mathrm{KL}\left(q(x) \,\|\, p(x \mid D)\right),$$

where $\Omega$ is the constrained set of probability densities used to obtain a feasible solution. By mean field theory, $q(x)$ can be factorized as $q(x) = \prod_i q_i(x_i)$, and the posterior distribution in Equation (8) can be approximated with the following form:

Estimation of Noise Component
For the noise component in the j-th frame, we need to estimate four parameters, µ j , τ j , Z and π j .Firstly we update µ j and τ j in the following way: where where f ij denotes the ij-th element of the F. The variables z ij can be derived in closed form as below: where Finally, we update π j by: where α jk = α 0 + ∑ i z ijk , and the hyper-parameter d is updated by the following equation: where η = η 0 + c 0 KP and λ = λ 0 + ∑ j,k τ jk .

Estimation of Background Component
For the background component, we need to estimate three parameters, including U, V and γ. u i• (i = 1, . . ., N) can be estimated as follows: where Similarly, v j• (j = 1, . . ., P) is estimated by: where , γ l is a decisive factor for guaranteeing low-rank property of B by removing the corresponding rows when its value is very large [38], which can be estimated by: where In the following experiments, we set m 0 = 0, and α 0 , β 0 , c 0 , d 0 , η 0 , λ 0 , ξ 0 , δ 0 are initialized with 10 −6 [36].

Target Extraction
In this section, we firstly select the noise component containing small-faint target.Then, the MFD method [37] is used to extract the target from the noise.

Selecting Noise Component Containing Target
We obtain the noise component $E$ by separating it from the background component, and we divide it into $K$ components $E_1, \ldots, E_K$ according to the maximum probability criterion [32]. The $K$ components are restored to sequences $\bar{E}_1, \ldots, \bar{E}_K$ by the inverse of the operation described in Section 2.1. Note that the intensity of the true target is quite different from that of the noise. Instead of using the variance-guided method in [32], we compute the difference between the maximum and minimum intensity of each component and select the component $\bar{E}_i$ with the largest difference as the one containing the target:

$$i = \arg\max_{k \in \{1, \ldots, K\}} \left[\max\left(\bar{E}_k\right) - \min\left(\bar{E}_k\right)\right]. \tag{22}$$

The following experimental results demonstrate that this method is effective.
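The selection rule can be sketched in a few lines. This is a hedged illustration: the helper names are hypothetical, and taking the intensity range over the whole restored sequence (rather than per frame) is an assumption made for the example.

```python
from typing import Sequence

Image = Sequence[Sequence[float]]


def intensity_range(sequence: Sequence[Image]) -> float:
    """Difference between the maximum and minimum intensity of a sequence."""
    values = [v for frame in sequence for row in frame for v in row]
    return max(values) - min(values)


def select_target_component(components: Sequence[Sequence[Image]]) -> int:
    """Return the index of the component with the largest intensity range."""
    ranges = [intensity_range(seq) for seq in components]
    return max(range(len(ranges)), key=ranges.__getitem__)


# Three toy components (K = 3), each holding a single 2 x 2 frame.
components = [
    [[[0.1, -0.1], [0.0, 0.05]]],   # weak dense noise
    [[[0.0, 0.0], [0.0, 5.0]]],     # sparse, bright outlier (target-like)
    [[[1.0, 1.2], [0.9, 1.1]]],     # smooth residual
]
selected = select_target_component(components)
```

In this toy case the second component is selected because a single bright pixel produces a far larger range than dense low-amplitude noise.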

Extracting Target by MFD
Figure 2 gives the results for a representative noisy infrared image using the NMoG method with K = 3; subfigure (c) is the slice containing the true target. It can be observed from Figure 2c that the restored slice containing the true target is still contaminated by pixel noise. Thus, we use the MFD method [37] to remove the noise and enhance the target. The noise component E containing the target is first transformed into a gradient vector field with components e x (x, y) and e y (x, y), where e (x, y) denotes the value of E at location (x, y), and e x (x, y) and e y (x, y) are the gradient values in the x-direction and y-direction.
From Figure 3b,d, it can be observed that both the true target and the bright noise residuals act as sinks in the gradient vector field. However, the gradient vectors of a noise pixel concentrate in four directions, so the MFD method computes the flux density of each pixel after removing its four largest gradient vectors [37]. Here, MFD s is the s-scale MFD, s denotes the scale variable, O represents the neighborhood region, and Ō denotes the subset of O obtained by removing the four pixels that contain the four largest gradient vectors. Note that the number of pixels on the curve is 8s − 4. The normal vector at the boundary point, n o (x, y), has components n ox (x, y) and n oy (x, y) in the x-direction and y-direction. Figure 3 shows that noisy pixels are wiped out according to their MFD value. This is because the MFD value of a noisy pixel is much smaller than that of the real target and is usually negative. Following this property, the corresponding noise pixels are removed from the target image. Thus, we obtain an initial target image T (x, y) using MFD s (x, y) + , the binary map obtained by setting the positive and negative elements of the original MFD map to 1 and 0, respectively. Finally, we use an adaptive threshold to further separate the target [22], where µ and σ are the mean value and standard deviation of the small target image and k is an empirical value, which we set to 0.05 in our experiments. The framework of our method is shown in Figure 1, and the detection procedure is given in Algorithm 1.
Step 1: Construct the spatio-temporal patch image F with the input infrared image sequence using the method in Section 2.1.
Step 5: Select the true target images by Equation (22).
Step 6: Calculate the original MFD map of the target images by Equations ( 23) and (24).
Step 7: Obtain the separated target images by using both the MFD map and the adaptive threshold, which can be computed by Equation (27).
Output: Separated target image sequence.
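
The final segmentation step can be sketched as below. The threshold formula itself is not reproduced in the text above, so the form µ + kσ used here is an assumption based on the description of µ, σ and k; the function name `adaptive_threshold` is hypothetical, and the population standard deviation is an arbitrary choice for the example.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def adaptive_threshold(image: Sequence[Sequence[float]],
                       k: float = 0.05) -> List[List[int]]:
    """Binarize an image with the assumed threshold mu + k * sigma."""
    values = [v for row in image for v in row]
    # Assumed form of the threshold: mean plus k times the standard deviation.
    threshold = mean(values) + k * pstdev(values)
    return [[1 if v > threshold else 0 for v in row] for row in image]


# Toy target image: a single bright pixel on an empty background.
target_image = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
mask = adaptive_threshold(target_image)
```

With k = 0.05 the threshold stays close to the image mean, which is reasonable only because the MFD step has already removed most noise pixels from the target image.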

Experiments
To validate the effectiveness of the proposed approach, extensive experiments are performed on simulated and real infrared image sequences in this section.

Metrics and Comparative Methods
In this paper, we use the receiver operating characteristic (ROC) curve to show the relationship between the detection probability $P_d$ and the false alarm rate $F_a$, which are defined as follows [22,25-28,32]:

$$P_d = \frac{\text{number of true detections}}{\text{number of actual targets}}, \tag{29}$$

$$F_a = \frac{\text{number of false detections}}{\text{number of images}}. \tag{30}$$

In addition, the local signal-to-noise ratio gain (LSNRG), background suppression factor (BSF), signal-to-clutter ratio gain (SCRG) and contrast gain (CG) metrics are also used in our work; their detailed definitions can be found in [28,32]. We also introduce a local background region for computing the LSNRG and SCRG [22], which is displayed in Figure 4. The width of the neighboring region, d, is set to 20 here. The accuracy of the low-rank background estimation is also an important metric, since a smaller estimation error means better preservation of strong edges in the background component. Thus, we use another metric, namely accuracy of background recovery (ABR), which is computed from $B_{in}$ and $B_{out}$, the background before and after processing. The five baseline methods used for comparison include two classical filtering methods, i.e., top-hat [13] and max-median filtering [12], and three low-rank methods: IPI [22] and RIPT [28], which use spatial information, and MoG-MRF [32], which uses spatio-temporal information and assumes i.i.d. MoG noise. Table 1 gives the detailed parameter settings, where $n_1$, $n_2$, $n_3$ denote the dimensions of the infrared patch tensor [28].
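As a small worked example of Equations (29) and (30), the two ROC quantities can be computed from raw counts. The function names and the counts below are hypothetical and serve only to illustrate the ratios.

```python
def detection_probability(true_detections: int, actual_targets: int) -> float:
    """P_d: fraction of actual targets that were detected."""
    if actual_targets <= 0:
        raise ValueError("actual_targets must be positive")
    return true_detections / actual_targets


def false_alarm_rate(false_detections: int, n_images: int) -> float:
    """F_a: average number of false detections per image."""
    if n_images <= 0:
        raise ValueError("n_images must be positive")
    return false_detections / n_images


# Hypothetical counts: 40 frames with one target each,
# 38 targets found and 6 false detections in total.
p_d = detection_probability(38, 40)
f_a = false_alarm_rate(6, 40)
```

Note that F a is normalized by the number of images, not by the number of pixels, so it can exceed 1 when a method produces more than one false detection per frame.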

Simulated and Real Datasets
The noise in real infrared images usually includes five typical types: Gaussian noise, Poisson noise, impulse noise, dead pixels or lines, and salt and pepper noise. To validate the effectiveness of the proposed approach in complex noisy situations, five consecutive real infrared image sequences, which are approximately noise-free, are used as original images, and a mixture of the above five types of noise is added to them. Additive white Gaussian noise at two SNR levels, in the ranges [10,15] dB and [15,20] dB, is added to each frame of the five sequences. The locations of the pixels corrupted by the different noises are chosen randomly. We choose forty frames of Sequences 1-4 and add various types of noise with different intensities. Finally, we add the mixture of noise to each frame in Sequence 5. The details are described in Table 2, and representative frames are displayed in the first column of Figure 9. SCR is defined as follows [40]:

$$\mathrm{SCR} = \frac{\left|\mu_t - \mu_b\right|}{\sigma_b},$$

where $\mu_t$ is the average pixel value of the target region, and $\mu_b$ and $\sigma_b$ denote the average pixel value and the standard deviation of the neighborhood region. Based on this definition, the average SCR of the targets is used to characterize a noisy sequence [22]:

$$\overline{\mathrm{SCR}} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{SCR}_i,$$

where $N$ denotes the number of targets and $\mathrm{SCR}_i$ denotes the SCR of the $i$-th target. We also carry out comparison experiments on three real infrared image sequences contaminated by heavy noise.
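The SCR computation can be sketched as follows, assuming the common form |µ t − µ b| / σ b. The function names are hypothetical, the pixel lists are toy values, and the use of the population standard deviation is an assumption for the example.

```python
from statistics import mean, pstdev
from typing import Sequence


def scr(target_pixels: Sequence[float], background_pixels: Sequence[float]) -> float:
    """Signal-to-clutter ratio of one target against its neighborhood."""
    sigma_b = pstdev(background_pixels)
    if sigma_b == 0:
        # A perfectly flat neighborhood gives an unbounded ratio.
        return float("inf")
    return abs(mean(target_pixels) - mean(background_pixels)) / sigma_b


def average_scr(scr_values: Sequence[float]) -> float:
    """Mean SCR over the N targets of a sequence."""
    if not scr_values:
        raise ValueError("at least one target is required")
    return sum(scr_values) / len(scr_values)


# Toy example: target mean 11, neighborhood mean 2 with standard deviation 1.
single = scr([10, 12], [1, 3, 1, 3])
sequence_scr = average_scr([single, 3.0])
```

The infinite value for a flat neighborhood mirrors how ratio-based metrics behave when the local background has zero standard deviation.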

Effect of Component Number
Here, we vary K from 2 to 7 to analyze the influence of the number of noise components K on the performance of the proposed model. For quantitative analysis, we fix the false-alarm rate (F a ) by adjusting the segmentation threshold on Sequences 1-5 and report the results in Table 3, where bold numbers indicate the highest score. We also display the ROC curves in Figure 5. The results show no significant difference in performance when K is larger than 2. From Figure 5a,d, it can be seen that F a for K = 2 is higher than for other values of K. This is because the target component may contain sparse noise that cannot be removed by the threshold. However, it is also improper to set K too large. From Figure 5a,c-e, the probability of detection decreases as K increases for K ≥ 4, because the true targets may be lost from the separated target component. In addition, since the computational complexity increases with K, K is set to 3 in the experiments.

Effect of MFD
To demonstrate the superiority of the MFD method over other methods, we perform comparative experiments on a representative image of simulated Sequence 5, including the MRF [32] and the ablated version (NMoG without MFD).From Figure 6, we can observe that the MFD method can effectively wipe out the bright noise, while the other two methods lose the true target and have many residual noise pixels, and these residuals could cause a high false alarm ratio.

Performance of Multiple Targets Scene
Considering that the number of targets may change in different scenes, such as antimissile systems, we test the effectiveness of the proposed method in multi-target scenarios (the number of targets is 3). The method of embedding a synthetic target into the images can be found in [22]. The representative images and the corresponding results are displayed in the first row and second row of Figure 7. All the targets are detected successfully by the proposed method.

Comparisons to Baseline Methods

Experiments on Simulated Data
In this experiment, we analyze and compare the performance of different approaches on real infrared images with synthetic noise. To illustrate the difference between the original and noisy images, we display the gray-level histograms of five representative frames in Figure 8. The representative images are chosen randomly, one from the 40 noisy images of each of Sequences 1-4 and one from Sequence 5. It can be observed from Figure 8 that the distributions of the original and noisy images are quite different. Figure 9 shows the corresponding target images produced by the different approaches. We can observe that neither the max-median filter nor the top-hat filter suppresses the noise pixels cleanly, and these residuals increase F a . In addition, the top-hat filter loses the target in Sequences 2 and 5. The performance of both filters is limited by the filter size, which must be fixed as an input parameter without any knowledge of the target size, and it degrades heavily when the filter size deviates from the target size.
From the comparison between the results of the filtering based and low-rank based approaches, we conclude that the latter achieve better performance than the former. All the targets are detected by the IPI method, but many noise pixels are also retained due to the deficiency effects [28], especially for Sequences 2, 4 and 5. This demonstrates that the IPI method is quite sensitive to salt and pepper noise and impulse noise. The RIPT approach has better background suppression ability than the IPI approach, but the results for Sequences 2 and 5 show that it is also sensitive to salt and pepper noise. Moreover, the RIPT method fails on Sequence 3. MoG-MRF detects the true targets only in Sequences 1 and 4. Its unsatisfactory performance arises because the i.i.d. MoG assumption does not hold when the noise distribution differs between frames. In addition, the segmentation performance of MRF degrades when noise pixels are adjacent to true targets in complex noisy cases. From the last column of Figure 9, it can be observed that all targets are detected correctly by the proposed model, while noise pixels and clutter are suppressed completely. We also use five metrics to analyze the performance of the different approaches quantitatively. The LSNRG, BSF and SCRG values of the different approaches for the representative images are given in Tables 4 and 5. Inf means that the standard deviation of the neighboring background is zero, and high scores in these three metrics only reflect good suppression performance in a local region. Note that the values of the low-rank based methods in these three metrics are often Inf, as in the results of the RIPT method, the MoG-MRF method and the proposed method on Sequences 1 and 4.
Considering the above phenomenon, the average CG and ABR values over all images are also computed for further comparison [32], as listed in Table 6. For quantitative analysis, we fix the false-alarm rate (F a ) by adjusting the segmentation threshold on Sequences 1-5 and report the results in Table 7. In conclusion, the proposed approach achieves the best performance.
Moreover, we show the ROC curves of the different approaches in Figure 10. The results show that the F a of the max-median filter on Sequences 2 and 5 is very high. The proposed approach is superior to the other approaches on Sequences 1-3 and 5, where it achieves the highest P d at a very low F a , because the noise pixels and background residuals are suppressed thoroughly by the proposed method. On Sequence 4, IPI achieves a higher P d than the proposed method when F a ≤ 1.1, whereas the proposed method achieves a higher probability of detection when F a > 1.1. The ROC curves of IPI and RIPT on Sequences 2 and 5 demonstrate that they are sensitive to salt and pepper noise, and the performance of the MoG-MRF method is unsatisfactory because the identical noise distribution assumption fails in complex noise cases.

Experiments on Real Data
We also carry out additional experiments on three real and noisy infrared image sequences, namely Sequences 6-8. For brevity, we compare the six tested methods on the real image sequences using the most important metric, the ROC curve, as shown in Figure 11. In addition, Table 8 shows the quantitative analysis, in which the proposed approach achieves the highest P d at the same F a . The results demonstrate the superiority of the proposed approach over the other competitive methods in target detection and in background clutter and noise suppression, because the NMoG model and the MFD map improve its robustness to different kinds of noise.

Complexity Analysis
Here, we analyze and compare the complexity of the different approaches, as listed in Table 9. (m, n) and L denote the image size and the structure element, respectively. (n 1 , n 2 , n 3 ) represent the dimensions of the tensor in the RIPT model, and the details can be found in [30]. For the proposed method, let $F \in \mathbb{R}^{N \times P}$. We first need to infer the parameters of the NMoG model, with a complexity of $O\left((N+P)R^3 + KNPR\right)$ per iteration. Computing the MFD map of an image of size m × n costs $O\left(mn(2s+1)^2\right)$, and target segmentation costs $O(mn)$. Thus, the overall computational cost of the proposed method is $O\left(t\left((N+P)R^3 + KNPR\right) + mn(2s+1)^2 + mn\right)$, where t is the number of iterations. The MoG-MRF method uses a median operation to reconstruct image sequences, and its complexity is $O(mnw)$, where w denotes the number of overlapped pixels during the transformation from the spatio-temporal patch image to the reconstructed image [32]. In addition, we compare the computational time of the different approaches on the whole of Sequence 6. The results show that the MoG-MRF method is the slowest and the top-hat filter is the fastest. The processing time of the RIPT approach is shorter than that of the IPI approach and the max-median filter. The proposed approach is slower than the RIPT method and the two filtering methods, but its superior performance over the baseline methods compensates for this deficiency.

Conclusions
In this paper, we propose a novel infrared small and faint target detection approach based on the NMoG and MFD models for complex and noisy scenarios. By using the NMoG model, the proposed approach closely matches the noise characteristics embedded in real infrared image sequences. We formulate the recovery of the low-rank background component and the noise component as an LRMF model, which is solved by the VB algorithm. Finally, the target is extracted correctly from the noise by using the MFD map. Experimental results show that the proposed approach performs better than other competitive approaches, since it is more robust to complex noisy scenarios in real applications.

Figure 1. The framework of the proposed method.

Figure 2. The results of the NMoG method with K = 3.

Figure 5. The receiver operating characteristic (ROC) curves for different values of the parameter K on Sequences 1-5.

Figure 6. The results of different methods on a representative image of Sequence 5. (a-e) are the original image, the noisy image, and the results of the NMoG (non-independent and identical distribution (i.i.d.) mixture of Gaussians (MoG)) with MFD, NMoG without MFD and MoG with Markov random field (MRF), respectively. The red rectangles denote the targets and the green ellipses are representative examples of noise.

Figure 7. Multiple target scenes. The first and second rows of (a-e) are five original images and the corresponding results processed by the proposed method, respectively.

Figure 8. The histograms of the representative frames in original and noisy Sequences 1-5. The first row of (a-e) shows the histograms of the five original infrared images used in the experiments. The second row of (a-e) shows the corresponding histograms of the noisy infrared images.
and d t by Equation (17), respectively.
2. Update approximate posterior of background component U, V by Equations (18) and (19).
3. Update approximate posterior of parameters in noise component γ t by Equation (20).
4. Update t = t + 1.
Noise component E by E = F − UV^T.
Decompose E into K components by Equation (

Table 1. Parameter settings of the methods.

Table 2. Characteristics of noisy infrared sequences.

Table 3. The detection performance of the proposed method with different K values.

Table 4. Quantitative evaluation of different methods for the representative images of Sequences 1-3.

Table 5. Quantitative evaluation of different methods for the representative images of Sequences 4 and 5.

Table 6. The evaluation results of average contrast gain (CG) and accuracy of background recovery (ABR) values of different methods for all image sequences.

Table 8. The detection performance of different methods on Sequences 6-8.