Fast and Robust Infrared Small Target Detection Using Weighted Local Difference Variance Measure

The performance of infrared (IR) small-target detection limits the development of infrared search and track (IRST) systems. Existing detection methods are prone to missed detections and false alarms under complex backgrounds and interference, and they focus only on the target position while ignoring target shape features, so they cannot support further identification of the IR target category. To address these issues while keeping the runtime acceptable, a weighted local difference variance measure (WLDVM) algorithm is proposed. First, following the idea of a matched filter, Gaussian filtering is used to preprocess the image, enhancing the target and suppressing noise. Then, the target area is divided into a new tri-layer filtering window according to the distribution characteristics of the target area, and a window intensity level (WIL) is proposed to represent the complexity level of each layer of windows. Next, a local difference variance measure (LDVM) is proposed, which eliminates the high-brightness background through its difference form and then uses the local variance to make the target area appear brighter. Background estimation is then adopted to calculate a weighting function that determines the shape of the real small target. Finally, a simple adaptive threshold is applied to the WLDVM saliency map (SM) to capture the true target. Experiments on nine groups of IR small-target datasets with complex backgrounds show that the proposed method effectively solves the above problems, and its detection performance is better than that of seven classic and widely used methods.


Introduction
Infrared (IR) imaging technology has been widely used in civilian fields such as car navigation, diseased-cell diagnosis, industrial-flaw detection, physiological performance of animal life processes, and plant monitoring [1]. It is worth noting that the infrared search and track (IRST) system based on IR imaging technology has the advantages of passive surveillance, all-weather use, and high spatial resolution, and uses the difference in thermal radiation between the target and the background to achieve long-distance target detection [2,3]. It has very important application value in military fields such as precision guidance, early-warning systems, space-based surveillance, and geological analysis [4,5]. IR small-target detection plays a vital role in these applications. To find the target as early as possible, long-distance detection and tracking are required, so the target has few pixel and texture features and lacks shape and structure information [6]. Furthermore, targets are usually immersed in complex backgrounds, and targets can be affected by a low signal-to-clutter ratio (SCR) [7]. Therefore, IR small-target detection is still a difficult and challenging task.
IR small-target detection methods in complex scenes can be divided into sequence detection methods and single frame detection methods [8,9]. Compared with the sequence detection method, the single frame detection method has a small amount of calculation and strong scene adaptability. Since real-time target detection becomes urgent in the military application of an IRST system, research based on single frame detection method is very necessary [10,11].
Existing single-frame detection methods can be divided into four categories. The first category is based on filtering and comprises spatial-filtering algorithms and frequency-domain filtering algorithms. Spatial-filtering algorithms are simple in design, fast, and perform well on uniform backgrounds, but they easily produce false detections in complex backgrounds and have poor robustness [12,13]. Frequency-domain filtering algorithms can suppress complex backgrounds, but they have high computational complexity [14,15]. The second category is based on low-rank sparse restoration, which offers high detection performance under strong noise but has high computational complexity when dealing with large-scale images [16–19]. The third category is based on deep learning, which can improve the detection accuracy of small targets to a certain extent but is held back by the lack of large and varied datasets [20–24]. The fourth category is based on the human visual system. These methods run close to real time and do not easily lose target features during detection, but they easily produce false positives in complex scenes [25–35]. Given the importance of real-time detection and detection rate, this paper is inspired by the human visual system, and a brief overview of detection methods based on the human visual system follows.
IR small-target detection algorithms based on the local-contrast method of the human visual system have attracted much attention. These algorithms focus on the differences between the target and the background surrounding it. For instance, Chen et al. [5] proposed a local contrast measure (LCM) that uses nested windows with eight orientations to suppress background edges; Han et al. [25] proposed an improved LCM (ILCM) that uses the target area average to suppress pixel-sized noise with high brightness (PNHB); Han et al. [26] proposed the relative LCM (RLCM), computed by combining ratio and difference operations, and then generalized it to the sub-block level [27]; Wei et al. [28] used the multi-scale patch-based contrast measure (MPCM) algorithm to fuse the corresponding two directions into a whole to capture the target; Han et al. [29] adopted a multi-scale three-layer local contrast measure (TLLCM), used Gaussian filtering to enhance the target area, and took the average value of several largest pixels in the surrounding area; Moradi et al. [30] proposed absolute directional mean difference (ADMD), which uses an orientation method to suppress the structural background; and Zhang et al. [20] proposed a multi-scale strengthened directional difference (MSDD) algorithm, which combines the local directional-intensity measure and the local directional-fluctuation measure to effectively suppress the angular clutter.
Furthermore, many researchers apply weighting functions on top of basic local-contrast algorithms to improve detection performance. For example, Qin et al. [10] used the variance of the central unit as the weight function; Deng et al. [31] improved the local entropy as the weight function; Nasiri et al. [32] used the center and surrounding variance difference (VAR_DIFF) as the weighting function; Liu et al. [33] proposed a weighted LCM, which defines a weighting function based on the strong clutter edge features; Lv et al. [34] proposed the regional intensity level (RIL) algorithm to assess the complexity level of each unit, taking the RIL difference between the central unit and its surrounding background as a weighting function; and Han et al. [35] proposed the weighted strengthened LCM (WSLCM) together with an improved RIL (IRIL) that replaces the maximum with the average of several of the largest gray values.
The weighted LCM using more local information can reduce the false-alarm rate to a certain extent. However, there are still some problems. First, current algorithms usually directly compute the contrastive information between the target area and surrounding areas, but when the target scale is small, the edge information cannot be captured for effective enhancement. Second, some weighting algorithms increase the time of image processing during detection. Third, the existing methods do not sufficiently consider the shape of the true target, and the detection process is easily disturbed by noise.
To better enhance targets of different scales in different complex scenes, ensure a certain detection time, better preserve target shape characteristics, and reduce false-alarm rates, a detection framework based on weighted local difference variance measure (WLDVM) is proposed. First, the image is preprocessed by Gaussian filtering, and then according to the distribution characteristics in the target area, the target area is divided into a new tri-layer filtering window and the window intensity level (WIL) value of each layer of windows is calculated. Second, the local difference variance measure (LDVM) and weighting function are calculated by ratio and difference operations using the obtained position and WIL value of each layer window. Finally, a simple threshold is used to segment the fused result WLDVM to capture the true target. The contributions of this paper are as follows:
1. The new tri-layer filtering window is proposed. The target area is divided according to its distribution characteristics and size, which can adapt to the detection of targets of different scales and save detection time.
2. WIL is proposed. Each layer of window uses the mean of the two largest subblock averages instead of the single largest subblock average to better capture the target and suppress edge noise.
3. LDVM is proposed. Through the idea of local fluctuation, the target area is further enhanced, and the high-brightness background is eliminated.
4. A detection framework based on WLDVM is proposed. The experimental results using multiple sets of IR datasets show that the proposed algorithm has the best detection performance and consumes less time.

Proposed Algorithm
Figure 1 shows the proposed WLDVM algorithm framework. First, the image is preprocessed by Gaussian filtering, and the WIL values of each layer are calculated through the new tri-layer filtering window. Then, according to the WIL value and location of each layer, the idea of local fluctuation and background estimation is introduced to calculate LDVM and weighting function. The true small target is the most prominent in the final weighted result, which can be easily captured with a simple threshold segmentation.

Gaussian Filtering Pre-Processing
Small targets in IR images usually have a low SCR and are susceptible to noise interference because of long imaging distances and atmospheric transmission. These factors make detection more difficult and call for noise suppression and target enhancement. According to matched filter theory [36], the best filter for enhancing the target should have the same distribution as the target. Small IR targets have Gaussian-like properties, and Gaussian filters are effective at suppressing high-frequency components of IR images, including scattered noise, Gaussian noise, and PNHB [29,37]. In this study, Gaussian filtering is therefore used to reduce noise and enhance small targets. The result of the Gaussian filtering operation is expressed as

$$GI = I * G \tag{1}$$

where G is the Gaussian template, I is the original IR image, and * denotes two-dimensional convolution.
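The filtering step can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the 3 × 3 template values in `G`, the helper name `filter2d`, and the replicated-border handling are all assumptions, since the text specifies none of them.

```python
# Assumed 3 x 3 Gaussian template; the paper text does not list the coefficients.
G = [[1 / 16, 2 / 16, 1 / 16],
     [2 / 16, 4 / 16, 2 / 16],
     [1 / 16, 2 / 16, 1 / 16]]


def filter2d(image, kernel):
    """Filter `image` with a square `kernel`, replicating border pixels.

    The kernel is applied as a correlation; for a symmetric kernel such as G
    this is identical to convolution.
    """
    rows, cols = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), rows - 1)  # clamp to the image
                    xx = min(max(x + dx, 0), cols - 1)
                    acc += kernel[dy + r][dx + r] * image[yy][xx]
            out[y][x] = acc
    return out


# Flat background with one isolated bright pixel (a PNHB-like spike).
image = [[10.0] * 5 for _ in range(5)]
image[2][2] = 90.0

gi = filter2d(image, G)
print(gi[2][2])  # the isolated spike is pulled down from 90.0 to 30.0
```

An isolated single-pixel spike loses most of its amplitude, whereas a target that already has a Gaussian-like footprint would keep more of its energy, which is the matched-filter argument made above.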

Construction of the New Tri-Layer Filtering Window
Traditional LCM and its improved algorithms adopt a double-layer filtering window: the central unit captures the target area, and the surrounding units capture the background area around the target (see Figure 2a). However, when the true target area is smaller than the central unit, the detected target is enlarged. Therefore, Nasiri et al. [32] proposed a three-layer nested window that divides the central unit into two parts, namely the core layer and the reserve layer. The core layer captures the main energy of the target area, and the reserve layer separates the target from its surrounding units (see Figure 2b). Usually, PNHB in complex backgrounds is difficult to suppress because its core layer differs significantly from the surrounding layers. It is well known that the real target area has a compact two-dimensional Gaussian-shaped distribution whose intensity weakens towards the surroundings, as shown in Figure 2d, while PNHB does not possess such a distribution. In this paper, according to the target area distribution characteristics in Figure 2d, a new tri-layer filtering window is proposed to capture the target area, consisting of an inner layer ($T_0$, yellow area), a middle layer ($T_1$, green area), and an outer layer ($T_2$, blue area), see Figure 2c. According to SPIE, the total spatial extent of a small target is usually less than 80 pixels [5]. Therefore, the inner layer is set to 1 × 1, and the middle layer and the outer layer are each divided into four subblocks along four directions. The subblock of the middle layer is a symmetrical trapezoid with a height of 2, an upper base of 1, and a lower base of 3. The subblock of the outer layer is a symmetrical trapezoid with a height of 2, an upper base of 5, and a lower base of 7. The proposed tri-layer filtering window can adapt to the detection of targets of different scales, and its total extent is small, which makes the algorithm run faster.
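One possible reading of this geometry is sketched below, offered as an assumption rather than the authors' exact layout: each trapezoid has its short base toward the centre, so the subblock row at distance d from the centre pixel is 2d − 1 pixels wide, the whole window fits in a 9 × 9 footprint (81 pixels, close to the 80-pixel extent quoted above), and the diagonal pixels belong to no subblock. The names `trapezoid`, `T0`, `T1`, `T2`, and `render` are illustrative.

```python
DIRECTIONS = ("up", "down", "left", "right")


def trapezoid(direction, distances):
    """Return the (dy, dx) offsets of one trapezoidal subblock.

    At distance d from the centre the row holds 2*d - 1 pixels, which gives
    bases 1 and 3 for d = 1, 2 and bases 5 and 7 for d = 3, 4.
    """
    cells = []
    for d in distances:
        for t in range(-(d - 1), d):
            if direction == "up":
                cells.append((-d, t))
            elif direction == "down":
                cells.append((d, t))
            elif direction == "left":
                cells.append((t, -d))
            else:  # "right"
                cells.append((t, d))
    return cells


T0 = [(0, 0)]                                     # inner layer, 1 x 1
T1 = [trapezoid(k, (1, 2)) for k in DIRECTIONS]   # middle layer, four subblocks
T2 = [trapezoid(k, (3, 4)) for k in DIRECTIONS]   # outer layer, four subblocks


def render():
    """Draw the window: 0/1/2 mark the layers, '.' marks unused pixels."""
    grid = [["."] * 9 for _ in range(9)]
    grid[4][4] = "0"
    for label, layer in (("1", T1), ("2", T2)):
        for block in layer:
            for dy, dx in block:
                grid[4 + dy][4 + dx] = label
    return ["".join(row) for row in grid]


for line in render():
    print(line)
```

Under this reading each middle-layer subblock holds 4 pixels and each outer-layer subblock holds 12, so a full window touches 65 pixels per position.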

Calculation of the Window Intensity Level (WIL)
Apply the new tri-layer filtering window from top to bottom and left to right on the Gaussian filtered image and follow the steps below to calculate the WIL value for each layer of each pixel.

1. For the inner layer:

$$WIL_{T_0} = GI_{T_0} \tag{2}$$

where $GI_{T_0}$ is the pixel in cell $T_0$ of the Gaussian filtered image GI.

2. For the middle and outer layers: First, the average value of each subblock in the layer is calculated as the key parameter for the next calculation:

$$M_{T_{ij}} = \frac{1}{N_{T_{ij}}} \sum_{k=1}^{N_{T_{ij}}} GI^{k}_{T_{ij}}, \quad i = 1, 2;\ j = 1, \dots, 4 \tag{3}$$

where $N_{T_{ij}}$ is the total number of pixels in cell $T_{ij}$, and $GI^{k}_{T_{ij}}$ is the gray value of the kth pixel in cell $T_{ij}$. Then, $WIL_{T_i}$ is the mean of the m largest $M_{T_{ij}}$ values in the $T_i$ area, that is,

$$WIL_{T_i} = \frac{1}{m} \sum_{l=1}^{m} M^{l}_{T_i} \tag{4}$$

where $M^{l}_{T_i}$ is the lth largest $M_{T_{ij}}$ value in the $T_i$ area.

The distribution of a cloud layer changes gradually: the interior of the cloud layer changes slowly, while the gray value at its edge fluctuates greatly, as shown in Figure 2e. Such edges can easily occlude a weak target, and for a small target at a cloud edge, the gray value of the inner layer may differ little from the average gray value of a subblock in another layer. To enhance this type of small target and avoid missed detection, m in Equation (4) needs to be greater than 1. When m is greater than 2, edge clutter is enhanced and causes false alarms, so 2 is the most suitable value for m.
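The WIL steps above amount to averaging each subblock and then averaging the m largest subblock means. A minimal sketch for a single pixel follows, assuming a window geometry in which the subblock row at distance d from the centre is 2d − 1 pixels wide, and assuming replicated image borders; both are assumptions, and `subblock` and `wil` are hypothetical helper names.

```python
DIRECTIONS = ("up", "down", "left", "right")
LAYERS = {1: (1, 2), 2: (3, 4)}  # distances covered by the middle and outer layers


def subblock(direction, distances):
    """Offsets (dy, dx) of one trapezoidal subblock (assumed geometry)."""
    sign = -1 if direction in ("up", "left") else 1
    cells = []
    for d in distances:
        for t in range(-(d - 1), d):
            if direction in ("up", "down"):
                cells.append((sign * d, t))
            else:
                cells.append((t, sign * d))
    return cells


def wil(gi, y, x, m=2):
    """Return [WIL_T0, WIL_T1, WIL_T2] at pixel (y, x) of the filtered image gi."""
    rows, cols = len(gi), len(gi[0])

    def pixel(dy, dx):
        # Replicate the border when the window leaves the image (assumption).
        yy = min(max(y + dy, 0), rows - 1)
        xx = min(max(x + dx, 0), cols - 1)
        return gi[yy][xx]

    result = [gi[y][x]]  # inner layer: the pixel itself
    for i in (1, 2):
        means = []
        for direction in DIRECTIONS:
            cells = subblock(direction, LAYERS[i])
            means.append(sum(pixel(dy, dx) for dy, dx in cells) / len(cells))
        top = sorted(means, reverse=True)[:m]  # the m largest subblock means
        result.append(sum(top) / m)
    return result


# 9 x 9 test image: flat background, a bright 3 x 3 blob, a brighter centre,
# and one extra-bright pixel directly above the centre.
gi = [[10.0] * 9 for _ in range(9)]
for yy in range(3, 6):
    for xx in range(3, 6):
        gi[yy][xx] = 30.0
gi[4][4] = 50.0
gi[3][4] = 70.0

print(wil(gi, 4, 4, m=2))  # middle layer averages the two brightest subblocks
print(wil(gi, 4, 4, m=1))  # middle layer follows only the brightest subblock
```

With m = 1 the middle-layer value is driven entirely by the one subblock containing the extra-bright pixel, while m = 2 tempers it with the second-brightest subblock, which illustrates the trade-off discussed for the choice of m.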

Local Difference Variance Measure (LDVM)
Local contrast in difference form can eliminate the high-brightness background. The difference of WIL (DoWIL) is defined from the difference between layers, where $WIL_{T_q}$ is the maximum of the $WIL_{T_i}$ values, attained in layer $T_q$, and $WIL_{T_p}$ is the minimum, attained in layer $T_p$. Clutter can be further suppressed by non-negative constraints. After this calculation, some pixels at the edges inside the target area may be suppressed. To prevent these pixels from being suppressed in further calculations, this paper enhances areas with large local fluctuations by computing the mean filtering of the square of the image minus the square of its mean filtering. The LDVM of each pixel is defined in terms of $M_{2L}$ and $M_L$, which are computed with MF, a 5 × 5 normalized mean filtering template. The gray values in the local area around pixels at the edges inside the target area fluctuate greatly, so these pixels are effectively enhanced.
Sensors 2023, 23, 2630
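The operation described in words above, the mean filtering of the squared image minus the square of the mean-filtered image, is the standard identity for local variance. A minimal sketch under stated assumptions: the input `d_map` stands for the difference map being processed (its exact form is not reproduced here), borders are replicated, and `mean_filter` and `local_variance` are illustrative names.

```python
def mean_filter(img, size=5):
    """Normalized size x size mean filter with replicated borders (assumption)."""
    rows, cols = len(img), len(img[0])
    r = size // 2
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), rows - 1)
                    xx = min(max(x + dx, 0), cols - 1)
                    total += img[yy][xx]
            out[y][x] = total / (size * size)
    return out


def local_variance(d_map, size=5):
    """Mean filter of the square minus the square of the mean filter."""
    squared = [[v * v for v in row] for row in d_map]
    m_2l = mean_filter(squared, size)  # mean of the squared map
    m_l = mean_filter(d_map, size)     # mean of the map itself
    return [[m_2l[y][x] - m_l[y][x] ** 2 for x in range(len(d_map[0]))]
            for y in range(len(d_map))]


# A flat map has no local fluctuation; a map with one strong response does.
flat = [[3.0] * 7 for _ in range(7)]
spike = [[0.0] * 7 for _ in range(7)]
spike[3][3] = 5.0

var_flat = local_variance(flat)
var_spike = local_variance(spike)
print(var_flat[3][3], var_spike[3][3])
```

Every pixel whose 5 × 5 neighbourhood contains the strong response receives the same non-zero variance, so pixels at the inner edge of a target are lifted together with its centre, which matches the motivation given in the text.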

Weighting Function
Local contrast in ratio form can enhance the true target, and the ratio of WIL (RoWIL) is defined between layers for this purpose. Mean filtering reduces sharp changes in image gray values and thereby smooths the image; in this paper, the background estimation is performed by mean filtering. Although RoWIL, as an enhancement factor, can effectively enhance the target area, a lot of background clutter remains. Therefore, the background estimation is used to calculate the weight function of each pixel in a combined ratio-difference form, which suppresses part of the background clutter. In general, the weight of the true target is very large and its surrounding local background is completely suppressed, so the weighting function fully takes the shape of the target into account.

Weighted Local Difference Variance Measure (WLDVM)
The LDVM and weighting function are fused to obtain the WLDVM of the current pixel, that is WLDVM(x, y) = W(x, y)LDVM(x, y).
The LDVM calculation eliminates the high-brightness background and makes the whole target appear bright, while the weighting function takes the shape of the target into account. As a result, the WLDVM algorithm preserves the shape of the original target, effectively enhances the target, and effectively suppresses the background. In most cases the target size is unknown and multi-scale detection is required. The proposed method does not require multi-scale detection and can therefore detect efficiently, which greatly saves detection time.

Threshold Operation
The saliency map (SM) of each IR image is obtained by computing WLDVM. The results produced by pixels in different situations are analyzed below.

1. For a pixel in the real target area, since the target area often presents a compact two-dimensional Gaussian shape, its DoWIL will be large and RoWIL > 1, and its LDVM and weight will be very large. Hence, the resulting value of WLDVM will be large.
2. For a pixel in the pure background area, since the pure background area is often continuous and evenly distributed, its DoWIL ≈ 0 and RoWIL ≈ 1, so its LDVM ≈ 0 and W ≈ 0. Therefore, WLDVM ≈ 0.
3. For a pixel at the edge of the background, its DoWIL may be greater than 0 but less than that of the true target, so its LDVM is much less than that of the true target; in addition, RoWIL may be greater than 1, but its enhancement effect is not much different from the local background estimation, so the corresponding W will be less than the true target's W. Hence, its WLDVM is much less than that of the true target.
4. For a pixel in the PNHB area, its DoWIL will be less than that of the true target, and thus its LDVM will be less than the true target's LDVM; in addition, its W will be less than the true target's W. Hence, its WLDVM is much less than that of the true target.
As can be seen from this discussion, the true target area will be the most salient in the SM, so a simple threshold operation is used to extract it. The threshold is defined as

$$Th = \lambda \cdot max_{SM} + (1 - \lambda) \cdot mean_{SM}$$

where $max_{SM}$ is the maximum gray value of the SM, $mean_{SM}$ is the average gray value of the SM, and λ is an experimental constant between 0 and 1. In the experimental part, the value of λ is analyzed in detail, and the experiment shows that λ can take any value between 0.5 and 0.6.
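A minimal sketch of the threshold step, assuming the threshold is the weighted combination λ · max_SM + (1 − λ) · mean_SM of the two quantities named above, with λ = 0.55 taken from the middle of the reported 0.5 to 0.6 range. The function names and the toy saliency map are illustrative only.

```python
def adaptive_threshold(sm, lam=0.55):
    """Threshold between the mean and the maximum of the saliency map.

    Assumed form: lam * max(SM) + (1 - lam) * mean(SM).
    """
    values = [v for row in sm for v in row]
    return lam * max(values) + (1.0 - lam) * (sum(values) / len(values))


def segment(sm, lam=0.55):
    """Binary mask: 1 where the saliency exceeds the adaptive threshold."""
    th = adaptive_threshold(sm, lam)
    return [[1 if v > th else 0 for v in row] for row in sm]


# Toy saliency map: one strong response (target) and one weak residual (clutter).
sm = [[0.0] * 4 for _ in range(4)]
sm[1][2] = 8.0
sm[3][0] = 2.0

mask = segment(sm)
print(adaptive_threshold(sm))
print(mask)
```

Because the threshold sits between the mean and the maximum of the map, a weak residual such as the 2.0 response here is rejected while the dominant response survives.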

Experimental Results
To demonstrate the detection performance of the proposed algorithm, nine groups of IR datasets were used, including three sets of real IR sequences (datasets 1, 3, and 4), five sets of simulated IR sequences (datasets 2, 5, 6, 7, and 8), and one non-sequential dataset (dataset 9). The datasets are described in [5,38–43]. The targets in dataset 1 are all immersed in very complex dense cloud cover and most of the targets have very low contrast. The targets in dataset 2 are all immersed in a dimly lit background. The target in dataset 3 moves from the cloud layer to the cloudless area, some targets have low contrast, and the background contains a lot of noise. The aircraft target in dataset 4 is large in scale and immersed in a cloudless area with a few thin clouds in the background. The target in dataset 5 moves from a cloudless dark background area into thin cloud cover. The target in dataset 6 is immersed in a complex air and sea background, which contains many PNHBs. Targets in dataset 7 move from a background containing buildings. The targets in dataset 8 are immersed in complex and changing land backgrounds. Dataset 9 consists of representative images of different sequences, with both targets and backgrounds differing between images. Additional details are shown in Table 1.
First, we analyzed the effect of the λ value on detection performance. Table 2 shows the number of false-alarm images $N_{FA}$ and the number of missed images $N_{MD}$ corresponding to different datasets under different values of λ, where λ increases from 0 to 1 with a step size of 0.1. The experiments showed that when the value of λ was between 0.5 and 0.6, small targets in different complex scenes could be effectively captured, there were no missed detections and false alarms in any dataset, and high classification accuracy could be obtained.
Then, seven LCM-based algorithms were selected from multiple perspectives for comparison with the proposed algorithm, including LCM [5], MPCM [28], RLCM [26], TLLCM [29], VAR_DIFF [32], ADMD [30], and WSLCM [35]. Among them, VAR_DIFF and TLLCM are local-contrast algorithms based on tri-layer windows, and the rest are local-contrast algorithms based on double-layer windows; RLCM, TLLCM, and WSLCM are local-contrast methods using ratio difference joint operations; and VAR_DIFF and WSLCM are local-contrast algorithms that use the weighting function.
To analyze different methods intuitively, Figure 3 shows the SMs of different algorithms. Each dataset's original image sample may be found in the first column. The target size was variable, the backdrop was intricate, and there were various levels of noise present. As shown in the second column of the figure, LCM enhanced the target and made the target area larger, while enhancing the noise, and the background suppression effect was not good. As shown in the third column of the figure, MPCM enhanced the target but did not preserve the target shape very well and had a certain suppression effect on the background and noise, but when the background was more complex, the detection effect was not good. As shown in the fourth column of the figure, RLCM enhanced the target and made the target area larger and had a general effect on background and noise suppression. As shown in the fifth column, TLLCM had a mediocre level of noise and background suppression efficiency, but the detection effect was poor when the background was complicated. As shown in the sixth, seventh, and eighth columns of the figure, VAR_DIFF, ADMD, and WSLCM had better background suppression effects, but when the background was complex, the noise suppression effect was average, and the detection performance was unstable. As shown in the ninth column of the figure, the proposed method effectively improved the target SCR and better preserved the target outline, could better suppress the background and noise, and the detection performance was the best.  
To illustrate the detection performance of these algorithms before thresholding, two indicators are used together, the signal-to-clutter ratio gain (SCRG) and the background suppression factor (BSF), defined as

$$SCRG = \frac{SCR_{out}}{SCR_{in}}, \qquad BSF = \frac{\delta_{in}}{\delta_{out}}, \qquad SCR = \frac{|m_t - m_b|}{\sigma_b}$$

where $SCR_{in}$ is the SCR value of the original image, $SCR_{out}$ is the SCR value of the SM, $\delta_{in}$ is the standard deviation of the non-target area in the original image, $\delta_{out}$ is the standard deviation of the non-target area in the SM, $m_t$ is the mean of the target area, and $m_b$ and $\sigma_b$ are the mean and standard deviation of the local background area around the target, respectively. It can be seen in Table 3 that VAR_DIFF had one set with the highest SCRG value, WSLCM had two sets with the highest SCRG value, and the proposed algorithm had six sets with the highest SCRG value and the highest average SCRG value.
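A small sketch of how these two indicators could be computed, assuming the usual definitions SCR = |m_t − m_b| / σ_b, SCRG = SCR_out / SCR_in, and BSF = δ_in / δ_out. The use of the population standard deviation, the function names, and the toy pixel lists are assumptions made for illustration and are not values from the paper.

```python
from statistics import mean, pstdev


def scr(target_pixels, local_background_pixels):
    """Signal-to-clutter ratio of a target against its local background."""
    m_t = mean(target_pixels)
    m_b = mean(local_background_pixels)
    sigma_b = pstdev(local_background_pixels)  # population std (assumption)
    return abs(m_t - m_b) / sigma_b


def scrg(scr_in, scr_out):
    """SCR gain: how much the saliency map improves the SCR."""
    return scr_out / scr_in


def bsf(background_in, background_out):
    """Background suppression factor from the non-target pixels."""
    return pstdev(background_in) / pstdev(background_out)


# Toy values: a dim target in the original image, a strong one in the SM.
target_in, background_in = [60.0, 60.0], [48.0, 52.0, 48.0, 52.0]
target_out, background_out = [101.0, 101.0], [0.0, 2.0, 0.0, 2.0]

scr_in = scr(target_in, background_in)
scr_out = scr(target_out, background_out)
print(scr_in, scr_out, scrg(scr_in, scr_out), bsf(background_in, background_out))
```

Both indicators divide by a standard deviation, so a perfectly flat background (σ_b = 0 or δ_out = 0) has to be handled separately in practice, for example by adding a small constant to the denominator.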
The results show that the proposed method achieved more significant target enhancement before thresholding than the other methods. VAR_DIFF had one set with the highest BSF value, WSLCM had five sets with the highest BSF value, and the proposed algorithm had three sets with the highest BSF value and the highest mean. This shows that the background-suppression ability of the proposed algorithm is equivalent to that of the WSLCM algorithm and better than that of the other algorithms.
Figure 4 depicts the receiver operating characteristic (ROC) curves of the different algorithms, which evaluate the target-enhancement ability and background-suppression ability after thresholding. The false-positive rate (FPR) and the true-positive rate (TPR) are the horizontal and vertical coordinates of the ROC curve [44], respectively, and are defined as

$$FPR = \frac{N_{false}}{N_{pixel}}, \qquad TPR = \frac{N_{detected}}{N_{true}}$$

where $N_{false}$ is the number of detected false targets, $N_{pixel}$ is the total number of pixels in the whole image, $N_{detected}$ is the number of detected true targets, and $N_{true}$ is the total number of true targets. The closer the ROC curve lies to the upper left corner, the better the detection performance. Under the same FPR, the larger the TPR, the better the performance of the algorithm.
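To make the scale of these rates concrete, the sketch below computes one ROC point from raw counts, assuming FPR = N_false / N_pixel and TPR = N_detected / N_true. The frame size, frame count, and detection counts are invented for illustration (they are not taken from the paper's datasets), and summing pixels over all frames of a sequence is also an assumption.

```python
def roc_point(n_false, n_pixel, n_detected, n_true):
    """Return (FPR, TPR) from detection counts."""
    return n_false / n_pixel, n_detected / n_true


# Hypothetical sequence: 100 frames of 256 x 256 pixels, one target per frame.
frames, height, width = 100, 256, 256
n_pixel = frames * height * width

fpr, tpr = roc_point(n_false=6, n_pixel=n_pixel, n_detected=98, n_true=100)
print(fpr, tpr)

# Number of false targets that an FPR of 1e-5 would allow on this sequence.
allowed_false = 1e-5 * n_pixel
print(allowed_false)
```

Because the denominator of the FPR is a pixel count, an operating point such as FPR = 10^-5 corresponds to well under one false target per frame for images of this assumed size.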
As can be seen from Figure 4, when FPR = $10^{-5}$:
• LCM had a TPR greater than 0.9 and less than 1 in dataset 4, and performed poorly in other datasets;
• MPCM had a TPR greater than 0.9 and less than 1 in dataset 5, and performed poorly in other datasets;
• RLCM achieved the highest TPR in dataset 2, and performed poorly in other datasets;
• TLLCM achieved the highest TPR in dataset 2, TPR greater than 0.9 and less than 1 in dataset 8, while performing poorly in other datasets;
• VAR_DIFF achieved the highest TPR in datasets 2 and 4, TPR greater than 0.8 and less than 1 in datasets 1, 7, 8, and 9, and performed poorly in other datasets;
• ADMD achieved the highest TPR in dataset 7, while performing poorly in other datasets;
• WSLCM achieved the highest TPR in datasets 2, 4, 6, 7, and 8, and the TPR was greater than 0.8 and less than 1 in datasets 1, 3, 5, and 9;
• The proposed algorithm achieved the highest TPR in all nine datasets.
Obviously, the proposed algorithm achieved satisfactory results, but the existing algorithms were affected by varying degrees of background clutter, resulting in algorithm instability. Compared with existing algorithms, the proposed algorithm was more stable, could effectively handle different scenarios, and had the best detection performance.
Table 4 reports the full specification of the implementation environment. The mean runtime was used to demonstrate the computational complexity of different detection algorithms. As can be seen in Table 5, the VAR_DIFF algorithm was faster than other existing algorithms, and the proposed algorithm was second only to the VAR_DIFF algorithm. Although our method was not the most time efficient, it was still relatively fast.
It can be seen from all the above experimental results that none of the existing algorithms could preserve the shape characteristics of the target well. Among them, LCM diffused the target, RLCM diffused the smaller scale target; and LCM, MPCM, RLCM, and TLLCM had poor background suppression.
Although VAR_DIFF, ADMD, and WSLCM had strong background suppression capabilities, the detection rate of ADMD was average, and VAR_DIFF and WSLCM missed detections in scenes with low-SCR targets. Among the existing algorithms, only WSLCM had a relatively low false-alarm rate, but its detection was particularly time-consuming. In contrast, the proposed algorithm preserved the target shape well and combined strong background suppression, a high detection rate, and a low false-alarm rate with a faster detection speed. In general, the proposed algorithm can effectively preserve the shape features of targets of different scales and types and can adapt to detection in different scenarios, while remaining fast without sacrificing detection effectiveness. Therefore, the proposed algorithm performs better overall.
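The mean-runtime comparison can be sketched as a simple timing harness. This is an illustration only: `mean_runtime`, `toy_detector`, the frame size, and the number of repeats are assumptions, and the placeholder detector is not any of the algorithms compared in Table 5.

```python
import time
import numpy as np

def mean_runtime(detector, frames, repeats=3):
    """Mean wall-clock seconds per frame, averaged over `repeats` passes."""
    per_frame = []
    for _ in range(repeats):
        start = time.perf_counter()
        for frame in frames:
            detector(frame)
        per_frame.append((time.perf_counter() - start) / len(frames))
    return sum(per_frame) / len(per_frame)

def toy_detector(frame):
    # Placeholder detector (assumption): global mean + 3 * std threshold.
    return frame > frame.mean() + 3.0 * frame.std()

rng = np.random.default_rng(0)
frames = [rng.random((128, 128)) for _ in range(10)]  # synthetic frames
print(f"{mean_runtime(toy_detector, frames):.6f} s/frame")
```

Averaging over several passes reduces the influence of one-off effects such as cache warm-up, although absolute timings still depend on the environment.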
Furthermore, to evaluate the robustness of the proposed algorithm against noise, different types of noise were added to dataset 3, which already contained varying degrees of noise, for performance comparison. Figure 5 shows representative images of the original IR dataset 3 and images with different types of noise added. Five types of noise were added in the experiment: Gaussian white noise with a variance of 0.001, Poisson noise, Rayleigh noise with a variance of 15, multiplicative noise with a variance of 3, and uniform noise with a minimum of −14 and a maximum of 14. Table 6 shows the adaptive threshold formulas of the different algorithms and the ranges of their experimental constants. VAR_DIFF and ADMD do not give specific threshold formulas, so the other five algorithms were selected for comparison, and the middle value of the applicable range of each constant was used in the experiment. Figures 6 and 7 show, respectively, the number of missed images and the number of false-alarm images after each detection algorithm applied its threshold to the datasets with different noise types. The experimental results show that the proposed algorithm did not miss detection under any of the noise types. Although the proposed algorithm had false alarms under Poisson noise and Rayleigh noise, the other algorithms had both missed detections and false alarms under the different noise types. Overall, compared with other methods, the proposed algorithm successfully suppressed most of the noise and was strongly robust against noise.
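The five noise types listed above can be approximated with NumPy as in the sketch below. Several details are assumptions rather than facts from the experiment: images are taken to be on a 0 to 255 grey-level scale, the Gaussian variance of 0.001 is applied on a normalized 0 to 1 scale, and the multiplicative noise is modelled as I·(1 + n) with zero-mean Gaussian n. The function name `add_noise` is hypothetical.

```python
import numpy as np

def add_noise(img, kind, rng):
    """Return a noisy copy of `img` (float grey levels in 0..255)."""
    img = np.asarray(img, dtype=float)
    if kind == "gaussian":
        # Assumption: variance 0.001 refers to a normalized [0, 1] scale.
        noisy = (img / 255.0 + rng.normal(0.0, np.sqrt(0.001), img.shape)) * 255.0
    elif kind == "poisson":
        # Each pixel is drawn from a Poisson law with mean equal to its value.
        noisy = rng.poisson(img).astype(float)
    elif kind == "rayleigh":
        # Rayleigh variance is (4 - pi) / 2 * sigma^2; solve for sigma at variance 15.
        sigma = np.sqrt(2.0 * 15.0 / (4.0 - np.pi))
        noisy = img + rng.rayleigh(sigma, img.shape)
    elif kind == "multiplicative":
        # Assumption: I * (1 + n), with n zero-mean Gaussian of variance 3.
        noisy = img * (1.0 + rng.normal(0.0, np.sqrt(3.0), img.shape))
    elif kind == "uniform":
        noisy = img + rng.uniform(-14.0, 14.0, img.shape)
    else:
        raise ValueError(f"unknown noise type: {kind}")
    return np.clip(noisy, 0.0, 255.0)

rng = np.random.default_rng(0)
frame = np.full((64, 64), 100.0)  # synthetic flat frame, illustration only
for kind in ("gaussian", "poisson", "rayleigh", "multiplicative", "uniform"):
    print(kind, round(float(add_noise(frame, kind, rng).std()), 2))
```

Each noisy copy can then be passed through a detector and its threshold to count missed and false-alarm images per noise type.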

Conclusions
This paper proposes an IR small-target detection algorithm based on WLDVM. The proposed algorithm preprocesses the image following the idea of matched filtering, which reduces noise and enhances small targets to a certain extent. The distribution characteristics of the target area are fully utilized to divide the window area, which allows the method to adapt to small targets of different scales. LDVM highlights the target area more effectively and eliminates the bright background, thereby improving the detection rate and reducing the false-alarm rate. The weighting function improves the adaptability to complex backgrounds and preserves the shape features of targets of different scales. The fused results further reduce the missed-detection rate and false-positive rate in complex scenes, thus achieving robust detection. Experiments show that the algorithm has good anti-noise ability and is robust to targets of different scales and categories under complex backgrounds. Compared with other methods, the proposed method has obvious advantages in quantitative results such as BSF, SCRG, and the mean runtime, and visually preserves the shape features of targets better. In future work, we will further study the application of this method to the recognition of IR target categories such as tanks, warships, and aircraft.