DP–MHT–TBD: A Dynamic Programming and Multiple Hypothesis Testing-Based Infrared Dim Point Target Detection Algorithm

The detection and tracking of small targets under low signal-to-clutter ratio (SCR) has been a challenging task for infrared search and track (IRST) systems. Track-before-detect (TBD) is a widely known algorithm which can solve this problem. However, huge computation costs and storage requirements limit its application. To address these issues, a dynamic programming (DP) and multiple hypothesis testing (MHT)-based infrared dim point target detection algorithm (DP–MHT–TBD) is proposed. It consists of three parts. (1) For each pixel in the current frame, the second power optimal merit function-based DP is designed and performed in eight search areas to find the target search area that contains the real target trajectory. (2) In the target search area, the parallel MHT model is designed to save the tree-structured trajectory space, and a two-stage strategy is designed to mitigate the contradiction between the redundant trajectories and the requirements of more trajectories under low SCR. After constant false alarm segmentation of the energy accumulation map, the preliminary candidate points can be obtained. (3) The target tracking method is designed to eliminate false alarms. In this work, an efficient second power optimal merit function-based DP is designed to find the target search area for each pixel, which greatly reduces the trajectory search space. A two-stage MHT model, in which pruning for the tree-structured trajectory space is avoided and all trajectories can be processed in parallel, is designed to further reduce the hypothesis space exponentially. This model greatly reduces computational complexity and saves storage space, improving the engineering application of the TBD method. The DP–MHT–TBD not only takes advantage of the small computation amount of DP and the high accuracy of an exhaustive search but also utilizes a novel structure. It can detect a single infrared point target when the SCR is 1.5 with detection probability above 90% and a false alarm rate below 0.01%.


Introduction
Target detection techniques in infrared (IR) images have played an important role in military and civil applications. IR imaging of long-range space targets, which can be modeled as point targets, contains little information about shape and texture. These point targets are usually buried in strong clutter. Therefore, detecting dim small IR targets remains a challenging problem [1][2][3][4]. During the past few decades, scholars have performed many studies on dim small-target detection. Existing IR small-target detection methods can be roughly divided into three categories: TBD methods, detect-before-track (DBT) methods and deep learning (DL)-based methods.
All methods have advantages and disadvantages. DBT methods, such as the classical local contrast measure (LCM) [5], local intensity and gradient (LIG) [6] and IR patch-image model (IPI) [7], concentrate on detecting targets in a single frame based on prior information. These methods are usually based on the image data structure and are designed to enhance the target, suppress the background, expand the contrast between the target and the background to improve the recognizability of the target, and then realize the detection of the target. Generally, this kind of method has the advantages of low complexity, high efficiency and easy hardware implementation. However, when the target SCR is low (SCR ≤ 1.5), many false alarms are produced, and the detection accuracy is decreased. Under low SCR conditions, TBD methods, such as 3-D matched filtering [8], Hough transform (HT) [9][10][11], DP [12] and MHT [13], are more stable and demonstrate better detection performance because they usually utilize spatial and temporal information by capturing the target trajectory. However, the computational cost and storage requirements of this kind of method inevitably increase, making practical applications difficult. Recently, DL methods, such as the single shot multibox detector (SSD) [14], you only look once (YOLO) [15], and Faster-RCNN [16], have already achieved remarkable progress in the target detection field. Relying on their strong feature extraction and generalization ability, neural network-based infrared target detection methods, such as the spatial-temporal feature-based detection framework [1], target-oriented shallow-deep feature (TSDF)-based detection method [2] and TBC-Net [17], have attracted increasing attention. In the previous work [1,2], some DL-based IR target detection algorithms were analyzed in detail. 
When the target has certain shape or texture information (size greater than 2 × 2) and the characteristics of background clutter are spatially nonstationary but temporally stationary, as with the images in Figure 1a, DL methods could play a role in the IR target detection field by assisting some techniques, such as shallow-deep feature fusion, spatial-temporal information extraction, and a reasonable sample selection strategy. However, when the target occupies only one pixel and the background clutter is temporally nonstationary, as with the images in Figure 1b, it is difficult for the DBT and DL methods to detect targets mainly for the following reasons. (1) Because the target SCR is very low, there are many pixels similar to the target in a single frame image, and the contrast between the target and the background is very low. Even the human eye cannot distinguish the target from the background by relying only on gray information. (2) The spatial-temporal information extraction method applied to the temporally stationary background has difficulty playing a role in the temporally nonstationary background, resulting in the neural network not learning the useful features. Taking the spatial-temporal information extraction method in [2] as an example, to suppress the background and extract the moving features of the target, the operations of frame subtraction and addition are used. When applied to an infrared sequence with a temporally nonstationary background, this kind of operation will not suppress the background but will reduce the SCR of the target and increase the difficulty of detection. This work focuses on the dim point target detection task. The problem analysis, methods and experiment part in this work focus on the 1 × 1 point target. 
The key to solving the problem of infrared point target detection under low SCR conditions is to adopt an appropriate TBD method; that is, first accumulate the energy of the target along the target trajectory to improve the SCR, and then detect the target.
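The SCR gain obtained by accumulating along the trajectory can be illustrated with a short calculation. This is a minimal sketch under assumptions that are not stated in the text: the SCR is taken as (T − µ)/σ, the noise is independent from frame to frame, and the energy is summed along the true trajectory. The function names are hypothetical.

```python
import math


def accumulated_scr(scr_single, n_frames):
    """SCR after summing n_frames samples along the true trajectory.

    Assumes SCR = (T - mu) / sigma and frame-to-frame independent noise:
    the target excess grows as n, the noise standard deviation as sqrt(n).
    """
    return scr_single * math.sqrt(n_frames)


def frames_needed(scr_single, scr_required):
    """Smallest frame count whose ideal accumulation reaches scr_required."""
    return max(1, math.ceil((scr_required / scr_single) ** 2))


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, accumulated_scr(1.5, n))
    print(frames_needed(1.5, 3.0))
```

In practice the trajectory is unknown, so the gain actually achieved by a TBD search is lower than this ideal value.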
In recent years, DP-TBD has become a hot research direction. The core idea of DP is to transform a complex multistep decision process into multiple single-stage decision processes and then optimize each decision to obtain a global optimal solution. The algorithm has clear principles, and the operation is based on the pixel level. This method is relatively small in computation and storage and easy to implement in hardware. There have been several improved DP-TBD methods in recent years. Direction information-based DP methods [18][19][20] have been proposed to reduce the number of pseudo trajectories under a strong clutter background and reduce the search range to decrease the diffusion effect of the target energy in the accumulation process. As the coverage of the target maneuvering range is limited to the fixed transition step, ISTS-DP-TBD (TBD algorithm with an improved state transition set) [21] was proposed. The state search efficiency of the maneuvering target is improved with optimization of the state transition strategy. PC-DP-TBD (DP-TBD method using parallel computing) [22] was proposed to solve the problem that the adjacent targets may interfere with each other and the computational complexity is increased with the number of targets. Although some progress has been made in DP-TBD, there are several problems to be solved [23,24]:
(1) The algorithm is affected by some parameters, such as the number of state transitions, the number of accumulation frames and the type of merit function. The selection of parameters has a great impact on the performance of the algorithm. It is difficult to find a set of parameters that is suitable for different backgrounds.
(2) Agglomeration effect. When tracking and detecting a target, the energy is accumulated along the possible trajectories. After K frame accumulation, the merit value of the target state increases, and in most cases this value is the largest. However, during the accumulation process, the energy of the target in the previous observation frame will diffuse to the next observation frame, forming a cluster of observation areas with similar energy. This is also known as the 'agglomeration effect'. This effect becomes increasingly serious with increasing accumulation frames. When the SCR is low, after accumulation, the agglomeration effect will occur not only in the target but also in noise with strong energy, and even the diffusion of noise will be greater than that of the target. In this case, the state of this kind of noise is stronger than that of the target, which is not conducive to target detection or tracking. In addition, when there are multiple adjacent targets, there will be many trajectories with similar energy after energy accumulation. At this time, the agglomeration areas of different targets may overlap, causing difficulties in differentiating different targets. This is also the reason why most algorithms require the targets to be neither adjacent nor intersected.
Due to the above problems, compared with exhaustive search methods, the DP-TBD method performance can be reduced by 3 dB [12]. When the energy of the detected target drops to a certain extent, even if the number of accumulated observation frames is increased, the detection performance is poor. Even for some improved methods [21,25,26,27], the detection performance will be greatly reduced when the SCR is lower than 1.8.
Therefore, to achieve high-precision point target detection under low SCR (SCR ≤ 1.5), an exhaustive search method needs to be adopted. The real target trajectory can be found by searching as many trajectories as possible. The classical representative algorithm is MHT. However, the number of trajectories increases exponentially with the number of accumulation frames, causing tremendous computation and storage costs, so it is difficult to realize the accumulation of many frames. Therefore, the key to exhaustive search methods is to design a reasonable strategy to reduce the number of redundant trajectories.
In summary, to detect point targets under very low SCR (SCR ≤ 1.5), it is necessary to design a TBD algorithm that can reduce the trajectory hypothesis space. Based on this core idea, the DP-MHT-TBD is proposed. The main contributions are summarized as follows:
(i) A second power optimal merit function-based DP method is designed. It can find the target search area, whose range is 90° (out of the full 360°), with high confidence and can reduce the trajectory hypothesis space by 3/4 for each pixel.
(ii) A two-stage MHT model is designed. It can reduce the trajectory hypothesis space exponentially, be operated in parallel, avoid pruning for the tree-structured trajectory space, greatly reduce the computational cost and save the storage space.
(iii) The proposed DP-MHT-TBD improves the engineering application of the TBD method. It takes advantage of the DP and exhaustive search, utilizes a novel structure, and can detect point targets when the SCR is 1.5 with a probability of more than 90% and a false alarm rate of less than 0.01%.
The remainder of this article is organized as follows. In Section 2, the methodology is described in detail. An overview of the proposed DP-MHT-TBD detection framework in IR images is given. In Section 3, simulation experiments and an analysis of the results are presented. In Section 4, discussions are given. In Section 5, conclusions are given.

Methodology
The proposed DP-MHT-TBD, which consists of a second power optimal merit function-based DP, a two-stage MHT model and a target tracking method, is described in detail.
In Section 2.1, through qualitative and quantitative analysis, the agglomeration effect caused by the diffusion of energy in the process of accumulation is studied. It is concluded that using DP to find the target search area is more reliable than directly detecting the target. Based on this idea and the property of the power function, the second power optimal merit function-based DP is designed to obtain a 90° target search area for each point, reducing the trajectory hypothesis space by 3/4. In Section 2.2, a novel parallel MHT is first designed. For each point, via the proposed MHT, the tree-structured trajectory space can be quickly obtained, and all trajectories can be processed in parallel. The final accumulated energy can be obtained by using only one testing. Compared with the classical MHT, a one-by-one search of root nodes in all trajectories is avoided, multistage testing is avoided, and pruning for the tree-structured trajectory space is avoided, greatly reducing the computational complexity. In addition, to mitigate the contradiction between the redundant trajectories and the requirements of more trajectories under low SCR, a two-stage strategy is designed. This not only ensures that the target energy is accumulated to a certain extent under low SCR but also reduces the number of trajectories exponentially. The two-stage MHT not only reduces the amount of calculation but also saves on the storage requirements.
In Section 2.3, a target tracking method that can eliminate false alarms by deleting discontinuous trajectories is introduced.
In Section 2.4, the DP-MHT-TBD detection framework is shown; it is a sequential detection process.

Basic DP Model
In infrared images, from the 1st measurement to the kth measurement, the target observation model is $z_{1:k} = \{z_1, z_2, \cdots, z_k\}$, where k denotes the kth measurement and η denotes observation noise, which obeys a Gaussian distribution $N(\mu, \sigma)$ with mean µ and standard deviation σ. $\mathbf{x}_k$ is the state vector of the target at the kth measurement frame, in which $(x_k, y_k)$ denotes the position of the discrete target state in the kth infrared frame on the X-Y plane, $\dot{x}_k$ and $\dot{y}_k$ denote the velocity along the X and Y axes, respectively, and $I_k$ denotes the gray value.
In the IR field, the target energy accumulation is based on the principle of ballistic trajectory integral (3), where e denotes the energy of the target or noise, and c and $\dot{c}$ denote the trajectory of the target and the noise, respectively. This principle means that if the energy is accumulated along the trajectory of the target in the IR sequence, the accumulated energy must be greater than the energy accumulated along any other trajectory. Therefore, when using the DP method to accumulate the energy of IR small targets, the optimization process opt and stage merit function ω are usually taken as the maximum function max and gray value I, respectively. At this time, the DP-TBD model is given by (4) and (5), where I denotes the gray value, $(x_k, y_k)$ denotes image coordinates in the kth frame, and $D_k$ denotes the state transition set consisting of possible states at time k − 1; it refers to the motion range of the target between frames and is determined by the position and velocity of the target. l denotes the number of pixels corresponding to the velocity. According to (4) and (5), after k frame energy accumulation, the energy accumulation map I can be obtained. Because the accumulated energy of the noise may be greater than that of the target, the threshold Th is selected according to a certain false alarm rate to segment I to obtain the candidate point set X.
After the candidate points are obtained, the noise points can be further eliminated by track-association detection or other methods, and the final reserved point is taken as the target.
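The accumulation and segmentation steps described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes a square state transition set of half-width `step` (standing in for l), takes max as the optimization process and the gray value as the stage merit function, and picks the threshold as an empirical quantile of the accumulation map. The names `dp_accumulate` and `segment` are hypothetical.

```python
def dp_accumulate(frames, step=1):
    """Accumulate energy over frames (each a 2-D list of gray values).

    Each pixel adds its own gray value to the best accumulated value found
    in the previous frame inside a (2*step+1) x (2*step+1) neighborhood.
    """
    height, width = len(frames[0]), len(frames[0][0])
    acc = [row[:] for row in frames[0]]
    for frame in frames[1:]:
        new = [[0.0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                best = max(
                    acc[yy][xx]
                    for yy in range(max(0, y - step), min(height, y + step + 1))
                    for xx in range(max(0, x - step), min(width, x + step + 1))
                )
                new[y][x] = frame[y][x] + best
        acc = new
    return acc


def segment(acc, false_alarm_rate):
    """Threshold the accumulation map; returns (threshold, candidate points).

    Assumption: the threshold is the empirical (1 - false_alarm_rate)
    quantile of the map itself, a stand-in for a calibrated CFAR threshold.
    """
    values = sorted(v for row in acc for v in row)
    index = min(len(values) - 1, int((1.0 - false_alarm_rate) * len(values)))
    threshold = values[index]
    points = [(x, y) for y, row in enumerate(acc)
              for x, v in enumerate(row) if v >= threshold]
    return threshold, points


if __name__ == "__main__":
    # Three 5x5 frames, zero background, target of gray value 3 moving diagonally.
    frames = [[[0.0] * 5 for _ in range(5)] for _ in range(3)]
    for k in range(3):
        frames[k][k + 1][k + 1] = 3.0
    print(segment(dp_accumulate(frames), 0.0))
```

Because every pixel is updated from the previous accumulation map only, the per-frame cost is fixed and independent of the number of frames already processed, which is what makes DP cheap compared with an exhaustive trajectory search.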

Agglomeration Effect
From the above analysis, it can be seen that the type of merit function and the size of the state transition set affect the performance of DP methods. In practical applications, due to the lack of prior information about the target's moving direction, the transition of the target state from the (k − 1)th to the kth frame is usually assumed to be of equal probability. This means that the target energy in the (k − 1)th frame diffuses to the neighborhood of the corresponding position in the kth frame with equal weight. The other pixels in this neighborhood are noise points, resulting in a trajectory containing both target and noise points. In addition, if the target SCR is very low, then the energy of the noise with strong energy will have a similar diffusion phenomenon. Finally, many bright blocks, which are centered on the target or noise with strong energy, appear on the energy accumulation map. This is called the agglomeration effect.
To solve the problem of the agglomeration effect, direction information-based DP methods [18][19][20] have been proposed. However, as long as the target is not accumulated in strict accordance with the real trajectory, there will be diffusion of the target energy. Therefore, due to the lack of prior information about the target moving direction, the energy diffusion problem can only be alleviated.
To determine whether there is regularity that can be used, qualitative and quantitative analyses of the energy diffusion were carried out.

1. Qualitative Analysis
As in Figure 2a, the target position in the current frame is $O(x_o, y_o)$, and the target trajectory is the red curve in the XOY area with O as the origin point. When O is the point to be detected, energy accumulation based on the DP method is carried out in the XOY area. The energy accumulation and diffusion process of the target is shown in Figure 2a, where the red dot indicates the result of energy accumulation along the target track during the accumulation process, and the blue arrow indicates the diffusion of the target energy. The larger the dot is, the greater the energy accumulated. The wider the arrow is, the more energy diffused. The black triangular points A, B and C indicate the noise points with strong energy in the current frame. When the number of accumulated frames is very small (as in Figure 2a, t = 1), the accumulated target energy and the diffused energy are very small and can be ignored. With a gradual increase in the accumulated frames (as t > k), the accumulated target energy and the diffused energy increase gradually and cannot be ignored. After the accumulation of the previous n − 1 frames, the accumulated energy of the target is E(t = n − 1). From t = n − 1 to t = n, the previous accumulated energy diffuses to the target point O (with energy $E_O(t = n)$) and the nearby noise point set {A, B, C · · ·} ∈ N (with energy $E_N(t = n)$). At t = n, the energy of the target and the neighborhood noise is given by (8). According to (8), after energy accumulation, the probability that the energy of the target point is greater than that of the nearby noise point ($P_r\{E(\text{target}) > E(\text{noise})\}$) depends on the energy of the target and its nearby points in the current frame. When the target SCR is large (SCR > 3), this probability is high; as the SCR decreases, the probability decreases, meaning that an increasing number of noise points are enhanced and more false alarms are generated.

2. Quantitative Analysis
The analysis focuses on a popular DP method in which the maximum function max is the optimization process and the gray value is the stage merit function. Suppose that before processing, the noise obeys a Gaussian distribution $N(\mu, \sigma)$, the noise point energy is µ, the target point energy is T, and the number of accumulated frames is n.
For each pixel P in the current image, as in Figure 3a, 4 accumulated energy values $I_{xpy}(P)$, $I_{ypz}(P)$, $I_{zpw}(P)$, and $I_{wpx}(P)$ can be obtained after performing the DP method in 4 different areas {XPY, YPZ, ZPW, WPX}. There are three types of pixels in an image (see (9)): pixel P belongs to the target, to noise far from the target (noise_far), or to noise near the target (noise_near).
Assume that when pixel P is the target, as in Figure 3a, the target trajectory C belongs to the XPY area. After energy accumulation, the final energy of P is:
$$I(P) = \max\{I_{xpy}(P), I_{ypz}(P), I_{zpw}(P), I_{wpx}(P)\} = \begin{cases} I_{xpy}(P) = nT, & P \in \text{target},\ C \in XPY; \\ I_{xpy}(P) = kT + (n-k)\mu, & P \in \text{noise\_near},\ C \in XPY; \\ n\mu, & P \in \text{noise\_far},\ C \in 4 \text{ areas with equal probability.} \end{cases} \qquad (10)$$
When pixel P belongs to noise that is near the target, in the trajectory of this kind of point, the first k points belong to the targets, while the last n − k points belong to the noise. After accumulation, if SCR > 3, according to the 3σ rule of thumb, then the probability that the target energy is larger than the noise is more than 99%. That is, on the energy accumulation map, the mean value of the probability distribution of the target point is greater than that of the noise point. However, when SCR < 3, this relation might be incorrect. As in Figure 2b, for noise point A with strong energy (µ > T), k = n − 1, after accumulation (kT + (n − k)µ) > nT, which means E(noise) > E(target). The lower the SCR is, the more noise points are enhanced after energy accumulation.
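The three cases in (10), and the counterexample in which a strong noise point overtakes the target, can be reproduced on a one-dimensional toy sequence. This is a sketch under simplifying assumptions that are not in the text: a noise-free constant background µ, a target that moves one pixel per frame, and a symmetric transition set of ±1 pixel. The function names and the numeric values are hypothetical.

```python
def make_frames(n_frames, width, start, target, background):
    """Frames with a constant background and a target moving +1 pixel per frame."""
    frames = []
    for k in range(n_frames):
        frame = [background] * width
        frame[start + k] = target
        frames.append(frame)
    return frames


def accumulate_1d(frames, max_step=1):
    """DP accumulation with max as the optimization and gray value as the merit."""
    acc = list(frames[0])
    for frame in frames[1:]:
        prev = acc
        acc = []
        for x, value in enumerate(frame):
            lo = max(0, x - max_step)
            hi = min(len(frame) - 1, x + max_step)
            acc.append(value + max(prev[lo:hi + 1]))
    return acc


if __name__ == "__main__":
    n, T, mu = 4, 5, 1
    frames = make_frames(n, width=9, start=2, target=T, background=mu)
    print(accumulate_1d(frames))   # target pixel is index 5

    # Strong noise (gray value 10 > T) next to the target in the current frame.
    frames[-1][4] = 10
    print(accumulate_1d(frames))
```

With these values the target pixel accumulates nT = 20, its two neighbors on the trajectory side accumulate kT + (n − k)µ = 16 with k = 3, and pixels far from the target accumulate nµ = 4. After the strong noise point is inserted, that pixel accumulates 25 and exceeds the target, which is the situation E(noise) > E(target) described above.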
Another conclusion can be drawn from (10): if $P_t$ ∈ target, then the corresponding trajectory $C_t$ of $P_t$ contains n target points; if $P_{near}$ ∈ noise_near, then the corresponding trajectory $C_{near}$ of $P_{near}$ contains k target points and n − k noise points; if $P_{far}$ ∈ noise_far, then the corresponding trajectory $C_{far}$ of $P_{far}$ contains n noise points. After accumulation, regardless of whether the energy of $P_t$ is larger than that of $P_{near}$, the trajectories $C_t$ and $C_{near}$ are in the same area. As shown in Figure 2, the trajectories of target point O and noise point A belong to areas XOY and XAY, respectively. Because the categories of points O and A are not known in advance, O and A are treated as one point P in the current image; thus, XOY and XAY are XPY. Therefore, as long as the energy of $C_t$ or $C_{near}$ is larger than that of trajectory $C_{far}$, the target search area can be found via the backtracking method (11). First, for the accumulation map $I_{map}$, the point with the largest energy $P_{max}$ is found through the maximum function max. When the SCR is not very low, the following premise (12) is true, so the point $P_{max}$ belongs to the target or to the target's nearby noise (noise_near):
$$\{nT \ \text{or}\ (kT + (n-k)\mu)\} > n\mu \qquad (12)$$
Then, through $P_{max}$ and (10), the area to which the corresponding trajectory $C_{max}$ belongs is found and taken as the target search area for each pixel.
Summarizing the above analysis, when using the DP method for energy accumulation under low SCR conditions, (1) the energy of the target will diffuse to the noise points and produce many false alarms; (2) if there is only one target, then the area to which the target trajectory belongs can be found according to the point with the largest energy (as the process in (11)), but this point cannot be taken as the target because it probably belongs to the noise.

Second Power Optimal Merit Function
As in the above analysis, the lower the SCR is, the more similar the probability distributions of the target and noise become, the smaller the probability that nT > nµ, and the larger the probability that the point with the largest energy $P_{max}$ belongs to the noise far from the target (noise_far). According to (10), when $P_{max}$ ∈ noise_far, it is impossible to find the area to which the trajectory belongs because it belongs to the 4 areas with equal probability. This means that the accuracy of finding the target trajectory according to (11) decreases under low SCR conditions. To solve this problem, the second power optimal merit function-based DP method is proposed. It consists of four steps.
(1) As in Figure 3b, for each pixel P, 8 different search areas are set: {APC, BPD, CPE, DPF, EPG, FPH, GPA, HPB}. Each area spans 90° of the full 360°, and each two adjacent areas overlap by 45°.
(2) In 8 search areas, for each pixel P, the DP method is performed, in which max is the optimization process and the gray value is the stage merit function to obtain 8 accumulated energy values I(P).
(3) For each pixel P, the optimal value is calculated according to the second power optimal merit function (14). Assume that when pixel P is the target, as in Figure 3b, the target trajectory C belongs to APC and BPD.
$$I_{opt}(P) = \max\{I_1, I_2, I_3, I_4, I_5, I_6, I_7, I_8\} = \begin{cases} (nT)^2, & P \in \text{target},\ C \in APC \,\&\, BPD; \\ (kT + (n-k)\mu)^2, & P \in \text{noise\_near},\ C \in APC \,\&\, BPD; \\ (n\mu)^2, & P \in \text{noise\_far},\ C \in 8 \text{ areas with equal probability.} \end{cases} \qquad (14)$$
(4) As in (11), the target search area is found via the backtracking method in (15) and (16), where the area pair denotes the areas with overlaps, and APC & BPD denotes the areas APC and BPD.
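The geometry of the eight overlapping search areas can be made concrete with a small lookup. This is a sketch under an assumed labeling that is not given in the text: the rays PA, PB, ..., PH are placed at 0°, 45°, ..., 315°, so that area APC spans 0° to 90°, BPD spans 45° to 135°, and so on. The function name is hypothetical.

```python
AREAS = ["APC", "BPD", "CPE", "DPF", "EPG", "FPH", "GPA", "HPB"]
AREA_SPAN_DEG = 90.0
FULL_RANGE_DEG = 360.0


def areas_containing(theta_deg):
    """Return the pair of overlapping areas that contain direction theta_deg.

    Area i spans [45*i, 45*i + 90] degrees under the assumed labeling, so a
    direction inside the 45-degree wedge j lies in areas j-1 and j.
    Directions exactly on a ray also touch a third area; only the two areas
    that fully share the wedge starting at that ray are returned.
    """
    wedge = int((theta_deg % FULL_RANGE_DEG) // 45)
    return AREAS[(wedge - 1) % 8], AREAS[wedge]


if __name__ == "__main__":
    print(areas_containing(60.0))    # direction between rays PB and PC
    print(areas_containing(10.0))    # direction between rays PA and PB
    # Fraction of all directions discarded once one 90-degree area is chosen.
    print(1.0 - AREA_SPAN_DEG / FULL_RANGE_DEG)
```

Under this labeling a trajectory between rays PB and PC lies in both APC and BPD, which matches the area pair used in the example of Figure 3b, and restricting the search to one 90° area discards 3/4 of the directions.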
The proposed second power optimal merit function essentially transforms the comparison of the target and noise trajectory energy from the first power (as in (12)) to the following second power.
As long as premise (17) is true, this method can be used to find the area to which the target trajectory belongs.
According to the property of the power function $y = x^{\alpha}$: if x > 1, y increases with increasing α (α > 0); if x > 1 and α > 1, the slope of the curve (the derivative) increases with increasing x, meaning that a small increase in x will lead to a large increase in y.
According to the principle of ballistic trajectory integral (3), after accumulation, the probability ($P_r$) that the target energy is greater than the noise energy is larger than the probability that the noise energy is greater than the target energy, as expressed in (19) and (20). When the accumulation frame number n > 1, both nT and nµ are greater than 1. From {nT, nµ} to {$(nT)^2$, $(n\mu)^2$}, the increase in nT is larger than that in nµ, meaning there is a larger increase for the y of nT. According to the property of the power function and (20), relation (21) can be deduced. Similarly, relation (22) holds for nT and kT + (n − k)µ. To find the area to which the target trajectory belongs, premises (12) and (17) must be true. Based on (21) and (22), it can be concluded that the probability of premise (17) being true is larger than that of premise (12), meaning the second power optimal merit function-based method is more reliable than the first power, in theory. Similar to (11), the point with the largest energy $P_{max}$ in (15) can only be used to find the target search area but cannot be taken as the target because it probably belongs to the noise.
As in Figure 3b, point P is the target, and the target trajectory is the blue curve in the APC area with P as the origin point. According to the proposed second power merit function-based DP ((12)-(15)), the area pair {APC&BPD} can be obtained. Both APC and BPD can be used as the target search area because both areas contain the target trajectory.
To accurately locate the position of the target, for each pixel to be detected, the next task is to find the possible trajectories in the target search area. Compared with an exhaustive search, this approach reduces the trajectory hypothesis space by 3/4 for each pixel.

The Proposed Parallel MHT Model
After the target search area is obtained, for each pixel in the current frame, the possible trajectories in the target search area need to be found. Since the sampling frequency of the infrared detector cannot completely match the moving speed of the target, if the target moves in a straight line in the real 2D space, then the imaging of the target is probably not a straight line in the discretized 2D image space. As in Figure 4a, in the 2D space the real trajectories that pass through point P are the red straight lines. If these lines are mapped to the 2D image space, then the trajectories are no longer straight lines but tree-structure-like curves that pass through point P, and the nodes of the curves are different pixels, as shown in Figure 4b. The number of trajectories (m) in the XPY area is related to the length of the sequence n:
$$m = 2^{n-1} \qquad (23)$$
These trajectories have the following property. For the points in different positions on the current image, only the coordinates of nodes in each trajectory are different; the shape of the tree-structured trajectory space and the number of all trajectories are the same. This is called the 'trajectory shape similarity property' in this paper. To describe this property, the multiple trajectory hypotheses model $\Delta H_{tree}$ is constructed. As in Figure 5, $\Delta H_{tree}$ is an m × n 2-D matrix. The relative coordinates of all trajectories are saved in $\Delta H_{tree}$. As in Figure 5, $\Delta X_n^m$ denotes the relative coordinates (∆x, ∆y) of the nth point in the mth trajectory. For the point at position P(x, y), the tree-structured trajectory space can be described by $H_{(x,y)}$.
$$H_{(x,y)} = \Delta H_{tree} \oplus P(x, y) \tag{24}$$
where ⊕ denotes adding the coordinates of the current point (x, y) to each node in ∆H_tree. For point P, after obtaining the trajectories, the testing process is:
$$I_m = S \circledast H_{(x,y)}, \qquad E(P(x, y)) = \max_{1:m}\left(I_m\right) \tag{25}$$
where S denotes the infrared sequence with length n, the size of each frame is M × N, and 1 ≤ x ≤ M, 1 ≤ y ≤ N. ⊛ denotes matching the node coordinates on each trajectory with the image gray value and accumulating energy for each trajectory. I_m denotes the energy of the mth trajectory. max_{1:m} denotes the testing process, which takes the trajectory with the largest energy as the trajectory for point P and the corresponding energy as the accumulated energy E(P(x, y)). Equations (24) and (25) constitute the proposed MHT model. It has the following advantages:
(1) The trajectory search problem is simplified. Because of the 'trajectory shape similarity property', the tree-structured trajectory space for each point can be saved in H_(x,y) using (24). In this way, a one-by-one search of root nodes in all trajectories is avoided, reducing the calculation cost and improving the efficiency of obtaining the trajectory space. The larger the image resolution, the larger the reduction in calculation.
(2) The energy accumulation process can be implemented in parallel. The process of energy accumulation is independent of the trajectory order, so all trajectories can be processed in parallel. In actual operation, the trajectories can be allocated to different threads of different central processing units (CPUs), which means that (25) can be performed in different threads. This is very beneficial for improving the operability of the TBD method.
(3) Compared with the classical MHT method [13], the proposed MHT method considers both accuracy and efficiency. Under low SCR conditions, to ensure the detection probability, more nodes should be reserved in the early stage, and the redundant trajectories should be deleted in the later stage, when the accumulated energy reaches a certain extent. The classical MHT is a multistage testing process in which some nodes are deleted in each stage, causing a loss of accuracy. To avoid this loss, all trajectories and nodes are saved in the first stage of the proposed MHT model. In addition, the whole process of the classical MHT includes the establishment of a tree-structured list and the judgment, insertion and deletion of the tree nodes, so the computational complexity of operating the tree list is high. In the proposed MHT model, the only operation involving the tree list is (24), and the only testing process is max_{1:m}, so the computational complexity is very low.
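The shift-and-accumulate idea behind (24) and (25) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the two-move step set used to enumerate trajectories, and the rule of skipping trajectories that leave the image are all assumptions made for the example, and this enumeration does not reproduce the trajectory count in (23).

```python
from itertools import product

# Hypothetical backward steps inside one search area (an assumption for illustration).
ASSUMED_MOVES = ((-1, 0), (-1, -1))


def build_delta_h_tree(n, moves=ASSUMED_MOVES):
    """Return relative (dx, dy) coordinates: one row per trajectory, n nodes per row."""
    table = []
    for steps in product(moves, repeat=n - 1):
        dx, dy = 0, 0
        row = [(0, 0)]  # the node in the current frame is the pixel itself
        for sx, sy in steps:
            dx, dy = dx + sx, dy + sy
            row.append((dx, dy))
        table.append(row)
    return table


def accumulated_energy(sequence, delta_h_tree, x, y):
    """Shift the shared table to (x, y), sum gray values per trajectory, keep the maximum."""
    if len(sequence) < len(delta_h_tree[0]):
        raise ValueError("the sequence must hold at least one frame per trajectory node")
    best = float("-inf")
    for row in delta_h_tree:
        energy = 0.0
        inside = True
        for age, (dx, dy) in enumerate(row):
            frame = sequence[len(sequence) - 1 - age]  # age 0 is the current frame
            px, py = x + dx, y + dy
            if not (0 <= px < len(frame) and 0 <= py < len(frame[0])):
                inside = False  # the trajectory leaves the image, so it is skipped
                break
            energy += frame[px][py]
        if inside:
            best = max(best, energy)
    return best
```

The table is built once and reused for every pixel, which is the point of the 'trajectory shape similarity property'. Each row is accumulated independently of the others, so the rows could be divided among threads without changing the result.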

Two-Stage MHT Model
In the process of energy accumulation, the lower the target SCR is, the longer the trajectory needed and the larger the number of trajectories, meaning a larger trajectory hypothesis space. As in (23), the number of trajectories m increases exponentially with increasing sequence length n. For example, from n = 15 to n = 30, m changes from 3.3 × 10^4 to 1.1 × 10^9. The trajectory length is only doubled, but the number of trajectories increases by a factor of approximately 3.3 × 10^4, so the added computing and storage costs far outweigh the benefits of energy accumulation. The main reason for this problem is the redundancy of the trajectories. As shown in Figure 6a, trajectory OP is one trajectory that accumulates from point O (t = 1) to point P (t = n). For point P, when t = n + 1, there are 3 possible trajectories: OPQ, OPR and OPS. OP is the overlapping trajectory. If OP is the target trajectory and n is relatively large, the energy of the target has been accumulated to a certain extent at point P. OPQ, OPR and OPS all contain the target trajectory OP, so the energy accumulated along the three trajectories may be similar. In this case, OPQ, OPR and OPS are redundant trajectories, and two of them should be removed. When n is small, the energy of the target has not been fully accumulated when reaching point P. In this case, no trajectory can be eliminated if the target trajectory is to be retained. Therefore, the premise of eliminating redundant trajectories is that the target energy has accumulated to a certain extent.
The two-stage search-based energy accumulation method can be used to reduce redundant trajectories, as shown in Figure 6b. In the first stage, from t = 1 to t = k, O is taken as the starting point, and energy is accumulated along the sparse trajectories. After accumulation, some trajectories are missed because the trajectories are sparse. In the second stage, starting from the missing point in the first stage, the energy is accumulated from t = k to t = n. After the two-stage search, not only is the number of redundant trajectories reduced, but the search of multiple trajectories is also realized.
Therefore, to mitigate the contradiction between the redundant trajectories and the requirement for more trajectories under low SCR, the two-stage MHT model is proposed. For each point P(x, y) in the current frame, the two-stage MHT model is used to obtain the accumulated energy. It consists of three steps.
(1) In the first stage, from t = 1 to t = k, in the target search area XPY, as in (24) and (25), the proposed MHT model is used to obtain the tree-structured trajectory space and the accumulated energy E_1(P(x, y)).
(2) In the second stage, from t = k to t = n, in the opposite area X′PY′, as in Figure 6c, the proposed MHT model is used to obtain E_2(P(x, y)). In H_(x,y), the relative coordinates of all trajectories in X′PY′ are saved.
(3) For each point P, the final accumulated energy E(P(x, y)) is given by (28).
If k = n ÷ 2, the trajectory number m_2 in the proposed two-stage MHT model is given by (29), and the ratio of m to m_2 is α, given by (30). α shows that with increasing trajectory length, compared with the proposed single-stage MHT, the trajectory hypothesis space of the two-stage MHT decreases exponentially. For example, when n = 30, the trajectory number in the single-stage MHT is approximately 1.07 × 10^9. So many trajectories make the trajectory space very difficult to store and compute, and the single-stage model is essentially impractical. However, the trajectory number in the two-stage MHT is approximately 6.5 × 10^4, making 30-frame-based energy accumulation possible.
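The quoted trajectory counts can be checked with a few lines of Python. This is a sketch under two assumptions that are not spelled out in this section: that (23) reads m = 2^n − 1, which reproduces the quoted values of 3.3 × 10^4 and 1.07 × 10^9, and that the two stages are searched one after the other so that their counts add. The function names are hypothetical.

```python
def single_stage_count(n):
    """Trajectory count of the single-stage model, assuming (23) reads m = 2**n - 1."""
    return 2 ** n - 1


def two_stage_count(n, k):
    """Assumed two-stage count: the stages are searched separately, so their counts add."""
    return single_stage_count(k) + single_stage_count(n - k)


n = 30
m = single_stage_count(n)         # 1073741823, about 1.07e9
m2 = two_stage_count(n, n // 2)   # 65534, about 6.5e4
alpha = m / m2                    # 16384.5
print(m, m2, alpha)
```

Under these assumptions the ratio simplifies to (2^(n/2) + 1) / 2, which grows exponentially with n and is consistent with the claim that the two-stage hypothesis space shrinks exponentially relative to the single-stage one.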

Target Tracking Method
For each pixel of the current frame, after energy accumulation via two-stage MHT, the energy accumulation map can be obtained, and the candidate target points can be obtained by constant false alarm (CFAR) segmentation of the map.
To further eliminate false alarms, a target tracking method is designed. It consists of two steps.
(1) All the reserved candidate points in the target search area are tracked.
(2) The discontinuous trajectories and the associated points are deleted.
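The two steps above can be sketched as a continuity filter over consecutive segmentation maps. This is a hedged illustration only: the association rule (a candidate must be linked backwards through every map in the window, moving at most `max_step` pixels per frame), the function names and the default parameter values are assumptions, since the association details are not given in this section.

```python
def has_continuous_track(maps, point, window=5, max_step=1):
    """Check whether `point` in the newest map links back through `window` maps.

    maps: list of sets of (x, y) candidate points, oldest first.
    """
    if len(maps) < window:
        raise ValueError("need at least `window` segmentation maps")
    frontier = {point}
    # Walk backwards through the maps that precede the newest one.
    for candidates in reversed(maps[-window:-1]):
        frontier = {
            q for q in candidates
            if any(max(abs(q[0] - p[0]), abs(q[1] - p[1])) <= max_step for p in frontier)
        }
        if not frontier:
            return False  # the track is discontinuous
    return True


def remove_discontinuous(maps, window=5, max_step=1):
    """Keep only the candidates of the newest map that belong to a continuous track."""
    return {p for p in maps[-1] if has_continuous_track(maps, p, window, max_step)}
```

A candidate that appears only in the newest map, or whose chain breaks in any earlier map of the window, is treated as noise and dropped.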

Detection Framework
The DP-MHT-TBD detection framework is shown in Figure 7. It is a sequential detection process.
Part 1: Second power optimal merit function-based DP
The continuous n_1 frames are fed into the proposed second power optimal merit function-based DP to obtain the target search area XPY and its opposite area X′PY′. These two areas are used in the next part. The flow of this part consists of four steps (the details are given by Formulas (13) to (16) in Section 2.1.3).
Part 2: Two-stage MHT and CFAR segmentation
(1) The previous n_2 frames are used for energy accumulation. For each pixel P in the current frame, the proposed two-stage MHT model is used to accumulate energy along the trajectories only in areas XPY and X′PY′. Once all the points have been processed, the energy accumulation map can be obtained. The flow of this part consists of three steps (the details are given by Formulas (26) to (28) in Section 2.2.2).
(2) The CFAR segmentation method is used to segment the energy accumulation map to obtain the candidate points of the current frame. In the segmentation process, the threshold Th is set according to the preset false alarm rate F_a.
Part 3: Target tracking
All the reserved candidate points are tracked, the discontinuous trajectories are found, and the candidate points in the discontinuous trajectories are classified as noise. The flow of this part consists of two steps (the details are given in Section 2.3).
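The thresholding step in Part 2 can be illustrated with an empirical-quantile rule. This is only one possible way to set a threshold from a preset false alarm rate, and it is an assumption for the example, not the rule used in the paper: the threshold is chosen from a sample of noise-only accumulated energies so that at most a fraction F_a of them exceeds it. The function names are hypothetical.

```python
import math


def cfar_threshold(noise_energies, false_alarm_rate):
    """Pick Th so that at most `false_alarm_rate` of the noise samples lie above it."""
    if not noise_energies:
        raise ValueError("noise_energies must not be empty")
    if not 0.0 <= false_alarm_rate < 1.0:
        raise ValueError("false_alarm_rate must be in [0, 1)")
    ordered = sorted(noise_energies)
    # Largest number of noise samples that may exceed the threshold.
    allowed = min(math.floor(false_alarm_rate * len(ordered)), len(ordered) - 1)
    return ordered[len(ordered) - 1 - allowed]


def segment(energy_map, threshold):
    """Return the (row, col) positions whose accumulated energy exceeds the threshold."""
    return [
        (r, c)
        for r, row in enumerate(energy_map)
        for c, value in enumerate(row)
        if value > threshold
    ]
```

For example, with eight noise samples and a false alarm rate of 0.25, the threshold is the sixth smallest sample, so exactly two noise samples lie above it.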

Figure 7. Proposed DP-MHT-TBD detection framework. It consists of the second power optimal merit function-based DP, two-stage MHT and CFAR segmentation, and target tracking. It is a sequential detection process.

Experiments and Analysis
To verify the proposed algorithm, experiments regarding the proposed second power optimal merit function-based DP, two-stage MHT and target tracking were carried out.


Datasets and Evaluation Setup
In practical applications, it is difficult to obtain the IR data of space point targets, so IR images with low SCR are simulated first. Some images are shown in Figure 1b. When building a simulation image, the noise function of MATLAB is first used to add Gaussian noise N(µ, σ) to a blank 256 × 256 image to obtain the background B. The mean value µ and the standard deviation σ of the background are 90 and 10, respectively. Then, according to an SCR value, as in (32), the target T is set at only one position in each image. In the datasets, the target occupies only one pixel in each frame and the target motion speed is 1 pixel per frame.
The targets in the infrared sequences simulated according to the above process obey a normal distribution, making the simulated datasets more reasonable and more consistent with actual infrared point target scenes.
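The simulation procedure can be sketched in Python as follows. This is a simplified stand-in for the MATLAB process described above, under stated assumptions: the SCR definition (32) is not reproduced in this section, so the sketch assumes SCR = (T − µ) / σ and sets the target gray value deterministically to µ + SCR · σ; the motion direction (one column per frame), the function name and the parameter names are also hypothetical.

```python
import random


def simulate_sequence(n_frames, scr, size=256, mu=90.0, sigma=10.0, start=(100, 100), seed=0):
    """Return (frames, positions) for a single 1 x 1 target moving 1 pixel per frame."""
    row, col = start
    if not (0 <= row < size and 0 <= col and col + n_frames <= size):
        raise ValueError("the target must stay inside the image")
    rng = random.Random(seed)
    frames, positions = [], []
    for t in range(n_frames):
        # Gaussian background with mean mu and standard deviation sigma.
        frame = [[rng.gauss(mu, sigma) for _ in range(size)] for _ in range(size)]
        r, c = row, col + t  # assumed motion: one column per frame
        frame[r][c] = mu + scr * sigma  # assumed SCR definition: (T - mu) / sigma
        frames.append(frame)
        positions.append((r, c))
    return frames, positions
```

With µ = 90, σ = 10 and SCR = 1.5, the assumed target gray value is 105, which is well inside the range of the background noise and illustrates why single-frame detection is unreliable at this SCR.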
The target search area detection probability P_area is used to verify whether the proposed second power DP can correctly find the target search area. The detection probability P_d and the false alarm rate F_a are used as the evaluation metrics for target detection, and P_d and F_a are the ordinate and the abscissa of the receiver operating characteristic (ROC) curve, respectively [4].
The detected target is considered true if it simultaneously meets two requirements: (1) the center of the ground truth is detected and (2) the pixel distance between the center of the ground truth and the result is less than 2 pixels (Manhattan distance).
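The distance part of this criterion is simple enough to state as code. The sketch below covers only requirement (2), the Manhattan distance test; the function name and the coordinate convention are assumptions.

```python
def is_true_detection(ground_truth, detection, max_distance=2):
    """Return True if the Manhattan distance between the two pixels is below max_distance."""
    gx, gy = ground_truth
    dx, dy = detection
    return abs(gx - dx) + abs(gy - dy) < max_distance
```

With the default value, a detection counts as true only if it falls on the ground-truth pixel or on one of its four direct neighbours; a diagonal neighbour has a Manhattan distance of 2 and is rejected.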
All the experiments were implemented with MATLAB R2019 and C++ in Ubuntu 16.04 on a PC with a 4-core CPU and 16-GB RAM.

Experiments on Second Power Optimal Merit Function-Based DP
A total of 1000 sets of sequences with different lengths and different SCRs are used to verify the proposed second power optimal merit function-based DP.

The Second Power Optimal Merit Function-Based DP
To determine the advantage and validity of the proposed method, a comparative experiment between the first and second power optimal merit function-based DP is carried out. The SCR of the sequences used in the experiment is 1.5. The comparison results are shown in Table 1, where n denotes the number of frames used in the different methods. From the comparison results, the following conclusions can be drawn:
(1) Under a very low SCR condition (SCR = 1.5), no matter how many frames are used, and whether the first or second power function-based DP is used, P_area > P_d. For example, when using 60 frames of images, the target search area detection probability is 98.57%, while the target detection probability is only 78.76%. As in Table 1, the P_d value of all methods is less than 80%. These data show that using the DP method to find the target search area is more reliable than directly detecting the target position.
(2) P_area increases with the number of frames used in the energy accumulation, but the growth is not linear and slows as n increases. For example, when n increases from 45 to 60, P_area increases by approximately 3.3%, whereas when n increases from 60 to 75, P_area increases by only approximately 0.1%. P_d initially increases with the number of frames but does not keep increasing as more frames are added. For example, compared with n = 60, when n = 75, P_d decreases by approximately 5%.
(3) The first power DP and the second power DP accumulate energy in four and eight areas, respectively, so the time cost of the second power DP is twice that of the first power DP. However, even if 60 frames are used, it takes only 1.6 s to find the target search area. In practical applications, the more frames that are used, the greater the amount of calculation and the worse the real-time performance. After comprehensive consideration of these factors, the appropriate n is 60.

The SCR of the Infrared Sequence
For each SCR value (from 1 to 1.5), 1000 sets of sequences are used to investigate the performance of the proposed method. The length n of every sequence is 60.
The P_area and P_d performances are shown in Table 2 and Figure 8. The following conclusions can be drawn from the results:
(1) At every SCR, the performance of the proposed second power DP is better than that of the original first power DP. As shown in Figure 8, the P_area and P_d curves of the second power DP are always above the corresponding curves of the first power DP, showing the advantages of the second power DP. In addition, the P_area curves of the different methods are above the P_d curves, further showing that using the DP method to find the target search area is more reliable than directly detecting the target position.
(2) As the SCR decreases, P_area decreases rapidly. For the proposed second power DP, when SCR ≤ 1.3, P_area is less than 80%, and when SCR ≤ 1.2, P_area is less than 50%, indicating that the proposed method does not perform well when SCR < 1.4. The subsequent target detection part is based on the correct detection of the target search area. If P_area is less than 90%, then the final detection probability will not be very high even if the P_d of the subsequent MHT part is very high.

Experiments on Two-Stage MHT
The condition of the proposed MHT is that the target search area is correctly determined by the proposed DP. Event A is defined as the target search area being correctly found by the proposed DP, while event B is defined as the target being correctly detected by the proposed MHT. The target detection probability of the DP-MHT-TBD is P(dp − mht − tbd). In this work, P_area = P(A) and P(dp − mht − tbd) = P(AB). According to the conditional probability formula, P(AB) is:
$$P(AB) = P(A)\,P(B|A) \tag{36}$$
where P(B|A) denotes the conditional probability. Therefore, to obtain P(B|A), experiments were carried out on the basis of event A being true. According to the above analysis, when SCR ≥ 1.4, the proposed DP can find the target search area with a probability greater than 90%. The MHT-based target detection part is based on the target search area. Therefore, this section only investigates the performance of the proposed two-stage MHT when SCR ≥ 1.4. A total of 1000 sets of sequences with different lengths n are used to verify the performance of the proposed MHT, and the target SCR of each sequence is 1.5 or 1.4. The ROC curves are shown in Figure 9, and the P_d and time cost are shown in Tables 3 and 4, respectively. In this section, P_d = P(B|A).
From the results, the following conclusions can be drawn:
(1) The more images used, the better the performance but the greater the amount of calculation. As shown in Figure 9, the larger n is, the closer the ROC curve is to the top left corner, indicating that the more images used, the greater the energy accumulated in the energy accumulation process, the higher the target detection probability and the lower the false alarm rate. With increasing n, the amount of calculation also increases. For example, when n = 30, the time cost is 33 s, which means that it takes approximately 9 h to perform 1000 experiments. This amount of calculation is too large, so n is less than 28 in the experiment.
(2) The performance decreases with decreasing SCR. As in Tables 3 and 4, to meet the dual requirements of F_a < 0.1% and P_d > 90%, when SCR is 1.5 and 1.4, the minimum number of images required is approximately 16 and 20, respectively.
After obtaining P(A) and P(B|A), according to (36), the final target detection probability of the DP-MHT-TBD, P(dp − mht − tbd), can be obtained. When F_a = 0.1%, P(dp − mht − tbd) is shown in Table 5. When SCR is 1.5, the proposed DP-MHT-TBD can detect the target in the image sequence with a detection probability of more than 90%.
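The way the two probabilities combine can be made concrete with a short calculation. The sketch below applies the conditional probability formula to the P_area value of 98.57% quoted earlier for 60 frames at SCR = 1.5 and asks what P(B|A) would be needed for an overall detection probability of 90%. The function names are hypothetical, and the 90% target is used only as an illustration; the measured P(B|A) values are those reported in the tables.

```python
def joint_detection_probability(p_area, p_b_given_a):
    """P(AB) = P(A) * P(B|A): both the search area and the target must be found."""
    return p_area * p_b_given_a


def required_conditional_probability(p_area, target):
    """Smallest P(B|A) for which P(AB) reaches `target`, given P(A) = p_area."""
    return target / p_area


p_area = 0.9857  # P(A) quoted above for 60 frames at SCR = 1.5
needed = required_conditional_probability(p_area, 0.90)
print(round(needed, 4))  # 0.9131
```

Because P(A) is below 1, the MHT stage must exceed the overall target by a margin: here P(B|A) has to be at least about 91.3% for the combined probability to reach 90%.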

Experiments on Target Tracking
After CFAR segmentation of the energy accumulation map, candidate points can be obtained. However, the candidate points contain both targets and noise.
The energy accumulation maps are obtained by the two-stage MHT, in which 20 frames are used for energy accumulation. Then, the CFAR segmentation method is used to segment 450 continuous energy accumulation maps. In the target tracking part, the trajectories that are discontinuous in five continuous segmentation maps are deleted, and the associated points are classified as noise. For each operation, the time cost is 0.003 s. The change in F_a before and after target tracking is shown in Table 6. It can be seen that the target tracking method reduces the false alarm rate by an order of magnitude.

Comparison Experiments
To verify the superiority of the proposed DP-MHT-TBD, three kinds of methods should be compared: TBD, DBT and deep learning (DL) methods. However, this work focuses on the detection of the 1 × 1 point target under a low SCR condition (SCR = 1.5). There is no shape or texture information in the 1 × 1 point target, and it is difficult to train neural networks with samples that contain only the gray information of one pixel. Many deep learning methods were tested in the experiment, such as Faster-RCNN, the previous work's spatial-temporal based method [1], TSDF [2] and so on. With these methods, the neural network could not learn useful features and could not achieve convergence during the training process. To the best of our knowledge, there is no deep learning method that can detect the infrared point target under low SCR (SCR = 1.5). Thus, there are no experiments on deep learning methods in this work. In addition, the point target detection task under low SCR is usually related to the military field, so it is difficult to obtain relevant code due to confidentiality rules or technical reasons. Considering the above factors, the following representative DBT and TBD methods are selected:
DBT: LIG [6], multiscale patch-based contrast measure (MPCM) [28], absolute average gray difference (AAGD) [29], absolute average difference weighted by cumulative directional derivatives (AADCDD) [30] and LCM utilizing a tri-layer (TLCM) [31];
TBD: DP [12], second power optimal merit function-based DP, and facet derivative-based multidirectional edge awareness with spatial-temporal tensor (FDMDEA-STT) [32].
The experimental parameters and time cost of all methods are listed in Table 7. The ROC curves are shown in Figure 10. Some CFAR segmentation maps are shown in Figure 11.
Table 7. Parameter setting and time cost of different methods.
From the results, it can be seen that:
(1) The proposed DP-MHT-TBD is superior to the other methods with respect to the detection probability and false alarm rate. The proposed second power DP has better performance than the original DP.
(2) The proper energy accumulation method is the key to the detection of dim point targets. Only the proposed DP-MHT-TBD can detect the target because the energy of the point target is accumulated in the right way. In the other methods, the target energy is not accumulated (in single-frame detection methods and FDMDEA-STT), or there is a serious diffusion of energy during the accumulation process (in DP and second power DP); the target is missed after segmentation because the energy of the target point is not higher than that of the noise points.
(3) The time cost of the DP-MHT-TBD is greater than that of the other methods because more images are processed. The proposed method is a sequential detection process, and the image processing speed can be accelerated by using a GPU or other means, which gives it high practical application value.
In brief, in Section 3, experiments on the proposed second power optimal merit function-based DP, two-stage MHT, target tracking and comparison methods are carried out. The results of each part verify the superiority of the corresponding part. When SCR is 1.5, 60 images (image size 256 × 256) are used in the second power optimal merit function-based DP to find the target search area, 20 images are used in the two-stage MHT to accumulate energy, and 5 segmentation maps are used in target tracking to further eliminate false alarms. According to Tables 2, 5 and 6, the proposed DP-MHT-TBD can detect a single point target in the image sequence with a detection probability above 90% and a false alarm rate below 0.01%.

Discussion
Several observations from experimental and quantitative analysis are discussed.
First, theoretical and experimental studies have shown that the proposed second power optimal merit function-based DP outperforms the original DP. The proposed second power DP is used to find the target search area instead of directly detecting the target, avoiding the influence of the agglomeration effect and reducing the trajectory hypothesis space.
Second, the proposed two-stage parallel MHT is the key to detecting a point target under low SCR. This can be attributed to the fact that the novel parallel MHT structure and the two-stage strategy simplify the trajectory search problem and reduce the hypothesis space exponentially. This makes the energy accumulation process for dim targets very efficient.
Third, as the SCR decreases, the point detection task becomes more and more difficult. The gain that can be obtained by using only infrared data or by improving TBD methods is limited, so some potential ideas should be considered. (1) The idea of adaptive spatial-temporal context [33] can be used to improve the robustness of the tracking part in the TBD. (2) Multi-sensor data fusion [34,35] may be a valuable topic for theoretical investigation and practical application in the dim point detection field. With the development of detector technology, it will be easy to obtain radar, infrared, hyperspectral and other data. If these data can be fused to exploit their respective advantages, it will be of great help in reducing the trajectory hypothesis space and improving processing efficiency and detection capability.

Conclusions
In this paper, a novel and accurate DP-MHT-TBD (dynamic programming-multiple hypothesis testing-track before detect) algorithm is proposed for infrared dim point target detection. The method consists of three parts: the second power optimal merit function-based DP, two-stage MHT and target tracking. First, for each point to be detected, the second power optimal merit function-based DP is used to find the target search area to reduce the trajectory hypothesis space. Next, the two-stage MHT, which can further reduce the trajectory hypothesis space exponentially and mitigate the contradiction between the redundant trajectories and the requirement for more trajectories under low SCR (signal-to-clutter ratio), is used to save the tree-structured trajectory space and accumulate energy in parallel. Finally, after CFAR (constant false alarm) segmentation of the accumulation map, the target tracking method is used to further eliminate false alarms. Experiments on each part and on comparison methods show that the proposed DP-MHT-TBD algorithm, which takes advantage of the small computation cost of DP and the high accuracy of an exhaustive search, greatly reduces the computational cost and storage requirements and shows superior ability in detecting point targets under low SCR conditions. The point target detection task under low SCR is difficult but significant. Although the proposed method has engineering applicability and good detection performance, it can only detect a single point target. Future work could address the following aspects: the combination of TBD and deep learning methods, the use of adaptive spatial-temporal context information and the fusion of multi-sensor data.