A Novel 4D Track-before-Detect Approach for Weak Targets Detection in Clutter Regions

: A 4D TBD approach is developed here for closely weak extended target tracking and overcoming heterogeneous clutter background and various clutter regions. The 4D measurements in this work are the points containing three positional information in spatial space and corresponding timestamp. The proposed method is mainly designed to address two issues. The ﬁrst one is the dilemma between the weak target detection and difﬁcult computation originating from the high dimensions of measurement. The second issue is the suppression of inhomogeneous background clutter and various clutter regions. The extension experiment using synthetic data showcases that no false alarm track would be built in the clutter regions, and the detection rate of close targets exceeds 94%. The experiments using real 3D radar also prove that the method works well in tracking closely maneuvering extended targets even if a clutter region exists.


Introduction
In this work, a 4D TBD approach method is developed to detect the weak tracks in 3D radar systems. The 4D information of measurement includes the 3D position and its timestamp. Two extensively existing issues greatly deteriorate the target tracking performance in the application of 3D radars. The first issue is the dilemma between the weak extended target detection and difficult computation originating from the high dimensions of measurement [1].
The targets in this work are assumed extended targets [2]. The PHD filter [3] and CPHD filter [4] have been developed for multiple extended target tracking. Compared with single point target tracking, measurement partition [5] should be added before using a PHD filter. Extensive computation will be necessary to associate all possible measurement partitions with potential tracks, especially where lots of targets or measurements exist [3]. In [6,7], raw hydrophone data of a bistatic architecture sonar are processed using a TBD strategy, where several consecutive scans (or frames) are jointly processed, relying on target kinematics or, simply, exploiting the physically admissible target transitions, jointly declaring the presence of a target [8]. The unthresholded data in [6][7][8] are far more than the normal measurements. Therefore, extended target tracking using a TBD method strategy is a computationally complex task. In the issue of weak trajectory detection, various TBD methods have been developed such as DP-TBD [9][10][11], PF-TBD [12][13][14][15][16], HT-TBD [2,[17][18][19], and some optimization based TBD [20]. In 3D target tracking scenarios, the computation of the TBD methods [9,10,[12][13][14]17,18,20] will sharply increase for a satisfying tracking performance. It is mainly because all possible high-dimensional solutions are exploited. The methods [2,19,21] are designed for 2D radars whose position information only contains the X axis and Y axis. Therefore, to solve the problem, the proposed 4D TBD consists of two stages. A 3D TBD in the first stage is designed to filter out the majority of clutter points, and multiple 2D TBD in the second stage is used to carefully detect the tracks with the reserved measurements. Therefore, the proposed method is efficient in dealing with high-dimensional measurements and maintains a remarkably good performance in tracking weak extended targets.
The proposed method here is designed to track the weak targets in uniform linear motion and slowly maneuvering targets, especially for aircraft, using air surveillance radar [22]. In our former work [13,19,21], tracking maneuvering targets [23] were divided into two stages, tracklet generation and tracklet association. In maneuvering target tracking cases, extended target tracklets can be extracted by the proposed method. In the tracklet association, the tracklets of the same target are associated to get the whole maneuvering trajectory. The tracklets are far fewer than the points, which decreases the computation of obtaining trajectory by associating points directly. Therefore, the maneuvering target tracking problem can be also addressed with fewer extra computations.
The second issue is the clutter region suppression [24]. The regions in the surveillance area can be quite different. For example, downtown areas usually generate far more false alarm points than villages. The nonhomogeneity of the surveillance region makes the tracking method very sensitive to false alarms and target miss-detection. In these scenarios, the clutter region suppression is significant in reducing false tracks and improving true target track life [25]. To address this issue, Spatio-temporal detection [26], a clutter map [27], and a prior knowledge-based method [25] have been developed to suppress the clutter region. In [25,[28][29][30], clutter region suppression using a clutter map is applied as preprocessing before target tracking. In [25,28], the clutter regions are labeled and then the measurements or initial tracks in these regions are removed. In [29,30], the clutter densities of different regions are considered, and the clutter map is integrated with standard tracking methods. The points that originated from low clutter density regions are given a higher weight and are more likely to be associated with a track.
In the proposed method, rather than estimating point weight with a clutter map, a novel clutter map in each detection bin is integrated with the TBD method. The component of various clutter regions and background can be well canceled in track detection.
Not only the two issues mentioned above, but other issues, such as the ability to cope with multiple close tracks and the universality of the extended target and non-extended target, are also addressed in this work. The merits are validated by both simulation and real experiments of 3D radar. This paper is organized as follows. Section 2 introduces the system and state-ofthe-art models. In Section 3, the framework of the proposed 4D-TBD is introduced at first. Then, a detailed description of the three parts of the proposed 4D-TBD is presented. Thirdly, the theoretical model of the whole algorithm is presented. Finally, the expression of achieving the optimal detection thresholds is derived. To illustrate the effectiveness of the proposed method, existing approaches are compared with the proposed method. Section 4 shows results using both Monte Carlo simulation and real data gathered from actual air surveillance radar. Conclusions are drawn in Section 5.

Preliminaries
A 3D radar system is considered in this work capable of returning the target position in 3D space. We write k for the index of a potential target, i for the index of measurement, and t for the time. Vectors and matrices are denoted by boldface characters. The state evolution of individual targets is assumed following the linear Markov model; thus, the target motion model can be expressed by: where the (x k t ,ŷ k t ,ẑ k t ) is the target position at time t in X, Y, and Z axes, respectively. The (v k x , v k y , v k z ) is the corresponding velocities, and the (x k 0 , y k 0 , z k 0 ) regards the initial location at time t = 0.
The measurement model of 3D radar can be represented by: The (e x , e y , e z ) is the system noise originating from the mismatching between the assumed motion model (1) and ground truth. The (e r , e α , e φ ) is the measurement noise originating from the noise in the radar system. Both the (e x , e y , e z ) and the (e r , e α , e φ ) are unavoidable in real cases and assumed zero-mean and Gaussian. The track detection in this work is performed in Cartesian coordinates. The measurement is the unlabeled points which have 3 components. The position component is the (x i , y i , z i ). The time component (t i ) regards the time when the measurement is acquired. The feature component (w i ) means the point score of points such as the amplitude of the point. A larger point score infers that this point more likely originated from a target. Therefore, each measurement point Z i can be represented by: Meanwhile, the extended target problem arises for the high resolution of the radar. Each target may generate more than one point. The set of points originating from a target can be expressed by {z} k t , where {z} k t = {x k,n t , y k,n t , z k,n t , w k,n t |n = 1, . . . , |{z} k t |}, where | • | denotes cardinality. The quantity of target points during one illumination of radar beam |{z} k t | usually is assumed as following a Poisson distribution whose expectation is γ k . The γ k is related to the target size and radar parameters. The time interval between two scans of the target is represented by t In , and t 2 −t 1 t In × γ k target points will be generated during t 1 to t 2 on average.
Besides the points of the target, the surveillance area also contains the points of the non-target. The false alarm points can be divided into two categories. The first resource is background clutter originating from the noise of radar itself and ghost waves. The false alarm points of the background are usually randomly and uniformly scattered among the entire surveillance area. We assume that the background clutter density is γ 0 per cubic meter. The second source of false alarm points is clutter regions. For the difference between areas, the clutter density of regions can be quite different. For example, an urban region generates far more electromagnetic interference and brings more clutter points. The clutter regions are usually irregular and contain more false alarm points. Meanwhile, the false alarm rate of clutter regions is various and unknown. The false point set of background and a clutter region is denoted by {z} 0 F and {z} c F , and c is the index of the clutter region. The quantity of false alarm points in the background and clutter region is also assumed to follow a Poisson distribution whose expectation is the production of clutter density and clutter region volume. The density of c-th clutter region is represented by γ 0 c Therefore the set of measurements during time t 1 and t 2 can be expressed by where N, M, and C are the number of measurement points, targets, and clutter regions, respectively. The proposed method is designed to acquire the target trajectory {z} k which contains all the points of the kth target.
The temporal and spatial relationship between the target points in (1) is exploited to solve the problem.

Overview and Comparison with Previous Work
The proposed 4D TBD method is mainly designed to address the issues of the weak target detection and non-homogeneous clutter with measurement of 3D radar systems. Clutter density of background and clutter regions in each detection bin is estimated for clutter suppression before the processing of current measurements. In each detection bin, the clutter component can be canceled by subtracting the current map and the clutter one. By processing the measurements during a relatively long time period jointly, the "energy" or "vote" of a target can be accumulated, and that of clutter is well canceled.
The 4D TBD problem is decomposed into a 3D TBD stage and a 2D TBD stage, corresponding to a rough detection and a final confirmation, respectively. Two thresholds corresponding to the two TBD stages are set. The 3D TBD is applied to get the most promising candidates using the first detection threshold T 1 d . The candidates are further detected by multiple 2D TBD, and a candidate can be confirmed as a target if its score exceeds the second detection threshold T 2 d . The majority of non-candidate points are discarded in the first stage. So, we can save a lot of computation time because far fewer reserved measurements are carefully processed.
According to (1), each track is defined by 6 unknown parameters ( , and a 6D parameter space is necessary for enumeration of all possible candidates in [17,31]. The measurements will vote for the 6D parameter space, and the optimal track is confirmed by searching the highest score in the 6D parameter space. The computation will be incurably large. In [18], HT is performed C 2 n times, where n is the dimension of the measurements, by independently mapping each of the measured parameters versus the others. In 4D measurement scenarios, there are X versus Y, X versus Z, X versus time, Y versus Z, Y versus time, and Z versus time. Each 2D HT obtains a set of candidate points. The target track should be the intersection of the C 2 n point set. However, the method is insufficient in coping multiple closely distributed tracks for two reasons. Firstly, the candidate tracks obtained in each 2D HT should be associated before the intersection. If all possible associations of 2D HT detections are enumerated, pairs of associations will take an extremely long time to compute. Secondly, several close tracks would be detected as a "larger" target, which results in some misdetection. In [13], PF TBD is applied, but 6D particles are necessary. Far more particles should be exploited, otherwise the track performance is unsatisfying because of under sampling.

Framework of the Proposed Method
The proposed method consists of three parts, clutter suppression, 3D TBD, and 2D HT. The processing flowchart is presented in Figure 1. The three components are highlighted by blue, green, and red boxes. Firstly, N v detection bins are built [21], each detection bin corresponding to a 3D vector. In [2,19,21], the 3 dimensions in 3D TBD regard the X axis, Y axis, and time, respectively. A total of 140 3D vectors are selected for the limitation of target velocity [2]. However, the three dimensions here are the X axis, Y axis, and Z axis. Without the limitation of the target motion model, 1217 3D vectors covering all directions are exploited, and 1217 detection bins are built. The 3D vectors are built based on tessellation of Platonic solids in [32].
In clutter suppression, history measurements are firstly rotated and then projected to get a score map. Clutter density of background and clutter regions in the detection bin is denoted by the score map, so we called the score map "clutter map". Each detection bin corresponds to one clutter map, and 1217 clutter maps are obtained in total. Then, in track detection processing, the current measurements are rotated by the same 3D vector and generate a current score map in the same way. In each detection bin, the current score map is subtracted by the clutter map. The reserved votes are regarded as target votes. The 3D TBD method introduced the 3D rotation, voting strategy for maps, and backtrace for the candidates whose score exceeds threshold T 1 d . In the third part, each candidate contains one track at most. Therefore, three HT are performed on X-axis versus time, Y-axis versus time and Z-axis versus time respectively. The intersection of the point set obtained by the 3 Hough transformations is tested with the second threshold T 2 d . The intersection is confirmed as a target track if its score is larger than T 2 d . In maneuvering target tracking scenarios, tracklets can be detected by the proposed method firstly, and the whole trajectory can be then obtained by associating the tracklets [21,33].

Clutter Region Suppression
The acquisition during a long period, Rotation of the projected points is performed along the Z axis by α j , and α j = arctan Then, the rotated point ẍ Then, the above points are rotated again along the X axis by β j = arctan The points after the above two rotations x The clutter map corresponding to the 3D vector e x j , e y j , e z j can be represented by S c j , The clutter map set {S c j |j = 1, . . . , N v } is designed to cancel the component of the clutter region in the projected map.

The 4D TBD
After obtaining the clutter map in N v detection bins, we detect the current tracks by the latest measurements. The latest measurement point set is represented by denote the time window in which the target track exists. The Z is also rotated by N v 3D vectors in each detection bin. Each detection bin generates a current projected map. Corresponding to the N v clutter map, the N v current projected map is represented by {S T j |j = 1, . . . , N v }. Current projected map S T j is obtained by (7)-(9) and (12) using measurements Z. Then, we can get the projected map after clutter region suppression, {S j |j = 1, . . . , N v }, by subtraction of the two maps.
Then the cells whose score is larger than a threshold T 1 d in map {S j } are found out. These cells are candidates potentially containing a target track. Therefore, these cells are called "candidate cell". If all the cells in {S j } are smaller than the T 1 d , it infers that no target whose direction close to the 3D vector of this detection bin exists. It is unnecessary to do further processing on this detection bin.
The candidate cells in the same score map are firstly clustered into sets to get the candidate regions by the region growing method in [33,34]. Each candidate cell set potentially contains a target track. It assumes that N j candidate regions are obtained in the S j . Further detection will be performed on each set. The cardinality of a candidate cell set is denoted by N r j . Candidate region set R j can be expressed by The processing of each region in {S j } follows three steps. Firstly, the points voting for the candidate cell set are backtracked. The 3D straight line which corresponds to a candidate cell S j (m a , n a ) can be expressed by If the distance d  [19]. The point set voting for the S j (m a , n a ) is denoted by Z j (m a , n a ) = x j i ,ŷ j i ,ẑ j i |i = 1, . . . , N j (m a , n a ) . Then, the corresponding point set of the candidate region Z r j can be expressed by the union of N c j point set, i.e., The parameter r denotes the index of the candidate regions in S j . Each set Z r j is promising and contains one track.
Secondly, three 2D HT are performed according to (1). On the X axis, a parameter space The bin width of x k 0 and v k x is w 0 H and w k x , respectively. The voting process corresponding to the X axis can be expressed by The maximum score of parameter space map, M r j,x , can be obtained by enumeration. Similarly, on the Y axis and Z axis, 2D HT is also performed for the M r j,y and M r j,z . If all three maximums, M r j,x ,M r j,y , and M r j,z , are larger than a fixed threshold T 2 d , we believe that one track exits in the point set Z r j , i.e., The points which were voting for the cell of maximum vote M r j,x are represented by T r,x j . Similarly, we can get the point sets T r,y j and T r,z j . The obtained tracklet in T r j is exactly the intersection of the three point set, i.e., Thirdly, track fusion is necessary to avoid duplicate detection. The same track may be detected in more than one detection bin if 3D vectors of detection bins are very close.
, e z j 2 , meet the expression in (23), the tracks detected in the two detection bins are attempted for fusion.
The T f is a constant set to π 15 in our experiments. Each pair of the tracklets obtained in j 1 -th and j 2 -th detection bins are compared. Two tracks are fused if they are sharing the same points, and the fusion can be expressed by (24) where the constant parameter η is set to 0.5 in the experiments.

Theoretical Model
After the introduction of methods in each stage, an integrated theoretical model of the whole algorithm is given.
The measurements during a time period are projected, rotated firstly. Then, a map voted by the rotated points is obtained in each detection bin. Each map contains N x j × N Y j grid cells which are sized w xy × w xy . The votes in a map cell consist of three parts when a track exists, i.e., the votes of the target, background, and clutter region. The expectation of the three components all follow a Poisson distribution, and their expectation is The R C means the radar coverage range. The historical measurement Z c is exactly used to estimate the clutter components, i.e., (26) and (27). The clutter components in each grid cell and in N v detection bins ({S c j |j = 1, . . . , N v }) are estimated by (7)- (12). If a target exists in a cell, the expectation of the points voting for the cell is the summation of (25) and (26) (target not in a clutter region) or the summation of (25)- (27) (target in a clutter region). The map is precisely the S T j , which is estimated by Z by using (7)- (12).
In S c j and S T j , the distributions of false alarm points in each cell generated by background (26) and clutter regions (27) are the same, as the variation of clutter is quite slow compared with that of the target cell. Therefore, after the subtraction of S c j and S T j (15), only the components of target (25) are reserved in S j , and it is much larger than the residual error of subtraction in (15). Therefore, target detection in S j is quite easy, and the subtraction in each detection bin is the theoretical basis of our improvements.
The distribution of the quantity of the points in a target cell can be expressed by (28) because the summation of several individual Poisson processes also follows a Poisson distribution.
Similarly, the distribution of a cell in which a target is absent can be written as The vote of cell in clutter map S c j (x, y) is the expectation of clutter components, i.e., ). If the threshold equals T 1 d , the false alarm detection probability of a cell (x, y) can be written as The false alarm rate is the summation of the probability of votes more than γ 3D c + T 1 d . According to (30), the estimated clutter S c j (x, y) and the component γ 3D c in (29) are canceled out. Therefore, the summation is usually very low in using an appropriate T 1 d . According to (28), the detection rate can be written as For the contribution of target component, (t T 2 − t T 1 )( γ k t In ), exceeding γ 3D c + T 1 d is quite easy for the γ 3D t . It provides a remarkable detection rate of the target tracklet. Even if a false alarm tracklet is built by the 3D HT, three 2D HT are performed to further verify the tracklet. In the 2D HT, the parameter space is divided into H 0 H × H V H grid cells. Each point would vote for all H V H speed bins in 2D Hough map S r j,x , each velocity bin one vote. Therefore, the map S r j,x has |Z r j | × H V H votes in total. If a cell has been correctly confirmed as a tracklet candidate, it contains (t 2 − t 1 )(w xy w xy R C )(γ 0 + I 3 (x, y)γ 0 c ) + T 1 d points, at least. The point cardinality of the set Z r j follows the distribution: Similarly, the point cardinality of set |Z r j | in which no target points exist but has been falsely confirmed as a candidate follows the distribution: If the clutter points randomly vote for the H 0 H × H V H cells in map S r j,x , the expectation of vote in each cell can be expressed by According to (33), the vote distribution of a cell in map S r j,x if no target exits can be expressed by Then, the false alarm rate of T 2 d in map S r j,x can be written as Similarly, the distributions P 2D,y FA (T 2 d ) and P 2D,z FA (T 2 d ), which correspond to S r j,y and S r j,z , are available, and the three distributions are individual with each other. Therefore, according to (21), the overall false alarm rate of track detection can be expressed by The P 2D,x FA (T 2 d ) is usually very low because the score is hard to exceed T 2 d if the point set originates from clutter and the points are randomly voting for the cells. Meanwhile, according to (37), even if the false alarm arises in one 2D HT map, the false alarm track is very hard to pass the three 2D HT map detection at the same time. Therefore, (36) and (37) grant our method an extremely low false track detection rate.
If the target points exist in a cell, the clutter points randomly vote for the H 0 H × H V H cells in map S r j,x . However, all the target points would vote for the one cell, and the vote expectation of the cell can be expressed by According to (32), the distribution of votes in a cell of map S r j,x when the target actually exists can be expressed by The detection rate of T 2 d in map S r j,x can be written as According to (21), the track can be confirmed only if it is detected in all three 2D Hough maps. Therefore, the overall track detection rate can be expressed by For the component of target points, γ k t In , the vote is very easy to exceed T 2 d , and the P 2D,x D (T 2 d ) is close to 1. Similarly, a high detection rate of target track P 2D D can be achieved.

Implementation of the Whole Algorithm
The implementation of the proposed method is explained by the Pseudocode in Algorithm 1. In the Pseudocode, lines 1-3 present the clutter map S c j estimation using Z c in N v detection bins. In lines 5 and 6, target map S j is calculated, and only the component of target (25) is reserved. In line 7, N j target regions are easily obtained. In lines 9-12, three 2D HT are performed. In lines 13-15, the target tracklet is confirmed. After getting all the tracklets in N v detection bins in lines 4-17, tracklet fusion is performed in line 18.
A diagram presenting the whole process is given in Figure 2 to make our method more understandable. The two sub-figures in the first line present the measurement points Z in the example and the 3D vectors e x j , e y j , e z j . They are the inputs to detect the four target tracklets. The measurements include the target points and false alarm points originating in the background and two clutter regions. The weight of all points is the same at the beginning. The target points are labeled by larger balls for distinction. The blue to red points denote the initial points to most recently obtained points. In the second line, three 3D vectors are presented as examples. Each vector regards a detection bin. In the third line, points after projection (7) and two rotation (8) and (9), i.e., x j i ,ŷ j i ,ẑ j i , in the three detection bins are presented. In the first detection bin, one target track is perpendicular to the current X-Y plane. Two target tracks are perpendicular to the plane in the subfigure in line 3, column 2. In the fourth line, the score maps before clutter suppression S T j in the three detection bins are presented. The vote of two clutter regions is quite outstanding. The clutter map S c j , which is built by history measurements Z c , is presented by the subfigures in the fifth line. Subfigures in the sixth line and seventh line present the projected map when the false alarm points are absent and present, respectively. In the projected map S j , the target region is much easier to be detected. The target region in the first detection bin is outstanding in subfigure of line 7, column 1. The points in set Z 1 1 , which vote for the target region, are presented in the subfigure of line 8, column 1. Then, one tracklet can be obtained after 2D HT in the subfigure of line 9, column 1. Similarly, in the second detection bin, two target regions are detected in the subfigures of line 7, column 2. Two tracklets can be then obtained after two individual 2D HT processing. In the third column (jth detection bin), no target regions can be obtained in subfigures of line 7, column 3. That means no tracklet can be obtained in this detection bin. The subfigures in the bottom (line 10) present all the tracklets obtained by the N v detection bins.  Get S c j using Z c by (7)-(9) and (12); 3 end 4 for j ← 1 to N v do 5 Get S T j using Z T by (7)- (9) and (12); 6 Get S j using Z c and Z T by (14); 7 Get the N j candidate regions R j with region growing method; 8 for r ← 1 to N j do 9 Get the point set of region Z r j by (17) and (18) (23) and (24) An example of 2D HT is presented in Figure 3. The subfigure in the first line presents the point set of a target region, Z r i . It shows that the point of a target tracklet and a few false alarm points exist. In the second line, the maps of HT in X axis-time, Y axis-time, and Z axis-time are present. The target region in each 2D HT map (M r j,x , M r j,y and M r j,z ) is highlighted by a red circle. As is presented in the three subfigures in the third line, the points which were voting for the target region in each 2D HT map are extracted, i.e., T r,x j , T r,Y j , and T r,Z j . In the three point sets, both the target points and a few false alarm points are included. At the bottom of Figure 3, the intersection of three point set, also the obtained track, is presented. It shows that all the target points are detected in the track T r j , and false alarm points have been filtered out.

Discussion of Thresholds
The tracking performance is closely related to the detection thresholds T 1 d in (16) and T 2 d in (21). The optimal values of the two thresholds are decided by the parameters of targets and measurement environment.
The T 1 d is applied to confirm the candidate target region in the 3D voting process. The optimal threshold T 1 d,opt can be defined by (42) according to (30) and (31) The track is easier to be detected if more target points and fewer clutter points are voting for the cell, i.e., In (43), t In and R C are the constants in the radar system. The γ k , γ 0 , and γ 0 c are related to targets and measurement environment and are out of our control. The w xy should match with the measurement position error of radar systems, otherwise not all the target points would vote for the cell. Therefore, the most simple approach to improve the target track detection performance is to enhance the time period, i.e., (t 2 − t 1 ). However, it is worth noting that the discussion in this section is built on the target model in (17). Therefore, a very large value of (t 2 − t 1 ) is unsuitable because the target may be maneuvering during t 1 to t 2 . The (43) also proves that the merit of track-before-detect method comes from accumulating the target "votes" according to the target motion model during a long enough time. The T 2 d is applied to confirm the candidate target region in the 2D voting process. Similarly, the optimal threshold T 2 d,opt can be defined by With using T 1 d,opt and T 2 d,opt , a high target track detection rate and a very low false alarm track rate can be achieved at the same time.

Synthetic Data Experiment
To fully access the superiority of the proposed algorithm, experiments using synthetic data are conducted.

Optimal Threshold Estimation
Before performing the tracking method, the optimal thresholds T 1 d,opt and T 2 d,opt are estimated. The parameters used in the method are presented in Table 1. These parameters are applied in both synthetic data and real experiments. Figure 4a presents the relationship between T 1 d and P 3D Figure 4b. The detection performance could be remarkably good when the difference is close to 1. The curves in different colors mean using different T t e , T t e = t T 2 − t T 1 . A large T t e represents more measurements are jointly processed in the time window. The curves of T t e = 1, · · · , 10 s infer that a larger time window is beneficial to achieve a better performance on weak targets. According to (42) and (44), the optimal T 1 d and T 2 d are 11.5 and 19.97 respectively.

. Scenario and Parameters of Synthetic Experiment
In order to numerically assess the performance of the proposed processing chain, 200 Monte Carlo numerical simulations are performed for the tracks. Figure 5a presents the simulated scenario in which 9 target tracks and 2 clutter regions exist. The time duration of the measurement is 10 scans. The 9 tracks are labeled by T1 to T9 and can be partitioned into 3 groups. The first group, which includes track 1 to track 6, is closely distributed tracks. The second group is merely track 7, an individual target. The third group contains track 8 and track 9. Track 8 is located at the boundary of clutter region 1. Half of track 8 is in the clutter region, and the other 5 scans are out of the clutter region. Track 9 is totally inside the clutter region. The two clutter regions, clutter region 1 and clutter region 2, are different in clutter density. Besides clutter regions, the background false alarm points also widely exist in the surveillance area. The values of simulation parameters are patched in Table 2.

Parameter Values
Measurement rate (γ k ) 2.5 Measurement noise in X Y Z axis (30,30,30) (m) Background clutter density (γ 0 ) 5 × 10 −10 ( 1 m 3 ) Clutter region 1 density (γ 0 1 ) 15 × 10 −10 ( 1 m 3 ) Clutter region 2 density (γ 0 2 ) 25 × 10 −10 ( 1 m 3 ) The synthetic measurement points in an experiment during 10 scans are presented in Figure 5b. The point color represents the time information; red means the first scan, and blue means the last scan. It is hard to find the target points without an outstanding tracking method because of the massive clutter points.
Monte Carlo experiments are performed by the proposed method, ET-PHD [35], CPHD filter [36], Bernoulli filter [37], and GLMB filter [38]. In ET-PHD [35], points are partitioned into sets in advance for extended target tracking. In the CPHD filter [36], Bernoulli filter [37], and GLMB filter [38], targets are assumed to generate one point at most. Tables 1 and 3 summarize the values of the parameters used in the proposed method and in the comparison methods [35][36][37][38], respectively. The parameter did not change in all experiments.
The above methods in comparison to [35][36][37][38] are not TBD methods. Therefore, multidimensional HT-based TBD [18] is also tested to present the superiority of the proposed method in weak target detection.

Result of the Proposed Method
The target points obtained in 200 Monte Carlo experiments using the proposed method are presented in Figure 5c. It infers that almost all the target points are founded, and no false tracks are built in clutter regions.
The detection rate and OSPA distance of the three groups has been patched in Table 4. The cut-off distance and distance order in the calculation of OSPA distance are 75 m and 1, respectively. The detection rate of the three groups is always higher than 94%. Not unexpectedly, group 2, an individual track, achieved the highest detection rate and the lowest OSPA distance. The two tracks in the clutter region can be also well detected. It proves that the clutter can be well suppressed by the clutter maps in each detection bin no matter if the track is inside or at the boundary of the clutter region. In 200 Monte Carlo experiments, no false alarm tracks were built by the two clutter regions or by background clutter. In the group of close tracks, all the track points were associated with the correct track.
No incorrect tracks were built by incorrect associations of target points. The incorrect track here regards a track whose points originated from different targets. Incorrect tracks usually arise in the method which is insufficient in coping with multiple closely distributed tracks.   Table 4 presents the results of comparison methods, ET-PHD [35], CPHD [36], Bernoulli filter [37], and GLMB filter [38]. In all three groups, for the merits of multi-scan detection, the detection rate of ET-PHD [35] was lower than that of the proposed method by about 20%. For the existence of close tracks and clutter region, 1.16 false alarm targets were acquired per scan, on average. Among the 1.16 false alarm targets, 0.253 targets in the background (most of them nearby the group 1), 0.152 targets in clutter region 1, and 0.759 targets in clutter region 2 were obtained.
In [36][37][38], targets are assumed to merely generate one point. Tracking extended targets here is beneficial for weak target detection, but false alarm tracks may be generated by the reserved points of the extended target. As is presented in Table 4, their detection rate is higher than ET-PHD [35], but it suffers from more false alarm tracks. Therefore, the OSPA distance of methods [36][37][38] is higher than ET-PHD [35].
In the sixth column of Table 4, the detection rate vs false alarm rate is calculated. That of the proposed method can be very large, and the ET-PHD [35] is higher than the other methods [36][37][38]. As to the positional error, ET-PHD [35] is slightly larger than our method and methods [36][37][38], particularly in group 1. The positional error of [38] is slightly lower than the others. Therefore, Table 4 proves that the proposed method achieves a much lower OSPA distance. The superiority of the proposed method includes improving target detection, decreasing positional error, and suppression of clutter.

Result of Multi-Dimensional HT-TBD
To access the superiority of the proposed method in TBD methods. Multi-dimensional HT-TBD [18] is performed with the synthetic data as comparison. The result is presented in Figure 6. In processing the 4D points with the method in [18], 2D HT is performed 4×3 2 times by independently mapping each of the measured parameters versus the other three. Then, the most-likely linear target tracks for each data cluster are determined. After that, the candidate linear target tracks are associated. Finally, the intersections of the sets for the 6 HT domains are regarded as possible target tracks. The experiment of [18] consists of two parts, testing the multi-dimensional HT-TBD [18] method only using the target points and using the both the target points and clutter points. The multi-dimensional HT-TBD [18] is performed using the information of the X-axis and Y-axis.
Firstly, only the target points in Figure 6a are fed to the multi-dimensional HT-TBD. The parameter map in Figure 6b can be obtained if the gradient k p and intercept b p in (45) are the two parameters of the voting map.
The widths of the gradient bin and intercept bin are 25 and 100, respectively. In (45), the y i and x i are the position on the X axis and timestamp, respectively. Figure 6b infers that the vote of the 3 individual tracks is outstanding compared with their surrounding cells. However, the vote of the close tracks, group 1 in Figure 5a, is overlapped. It is hard to distinguish the individual tracks. The cluster result of Figure 6b is presented in Figure 6c. The three individual tracks can be detected as three individual regions, and each region generates a candidate point set. It means the three tracks can be detected individually. However, the close tracks will be detected as one larger region, and the points of all 6 tracks in group 1 will be put in one set. Then, another parameter space, range ρ and azimuth θ, is tested. The range ρ and azimuth θ in (46) are constants if the target points in a straight line.
The widths of the range bin and azimuth bin are 100 m and π/90, respectively. In (46), the y i and x i are the position on the X axis and timestamp, respectively. The 2D HT map and corresponding result using the ρ and θ are presented in Figure 6d,e. A similar problem arises in the close tracks. Figure 6c,e infers that the multi-dimensional HT-TBD [18] is available to detect individual targets but insufficient to detect close tracks. Secondly, both the points of the target and clutter are processed by the multi-dimensional HT-TBD; the fed points are presented in Figure 6f. The result corresponding to (45) is presented in Figure 6g,h. The vote of targets is falling in that of enormous clutter. The correct candidate point set is missed. The result corresponding to (46) is presented in Figure 6i,j. The problem is similar. The result in Figure 6f-g infers that, in 4D scenarios, the multi-dimensional HT-TBD is insufficient to cope with the strong clutter backgrounds and clutter regions.

Real Data Experiment
To further access the performance of the proposed method. An extensive experiment using a 3D radar is performed. The coverage range of the radar is about 30 km, and the horizontal rotate speed of the radar antenna is π rad/s, i.e., t In = 2. The scope of the vertical angle for the elevation of targets is 0 to π 3 . The targets in the surveillance area are multiple maneuvering airplanes. The GPS in the airplane records the ground truth of the airplane's trajectory. The experiment contains two real scenarios. However, in both experiments, only the ground truth of the airplane which belongs to our experiment is available.

Experiment 1
The measurements of scenario 1 are the unlabelled points presented in Figure 7a. It contains the measurements of 400 s, i.e., all the points from the 1st to 200th scan. The 3 axes of the figure regard the X position, Y position, and altitude. The point color regards the time information, the blue to red points denote the initial points to the most recently obtained points. The points are partitioned into time windows [21] to access the tracklet detection performance, and the points in each time window are individually processed. The width of the time window here is 20 s, i.e., the points of 10 scans are jointly processed.
The detected tracks in each time window are presented in Figure 7b. The target track can be well detected in each time window despite the inhomogeneous clutter background. It is worth noting that in the 3rd and 4th time windows, the target is maneuvering. Both the target course and speed are varying along with the time. The track is detected in 3rd and 4th time windows and infers that, although the method is designed for targets in a straight line, maneuvering targets can be also detected. The smoothed track can be obtained with the tracklet association method in [21]. The smoothed track is presented by the red line in Figure 7c. It matches well with the ground truth. Meanwhile, the PHD filter [35] in the 3D version is also tested, and the tracking result is presented in Figure 7d. An outline of the track can be obtained, but the target is lost in some scans. Meanwhile, in some scans, plenty of measurements (several times of measurement rate) are generated by the target. One target may be falsely detected as two close targets. So, some false alarm tracks are caused by the instability of the measurement rate.
Then, track smoothing and outlier removal are performed to decrease the position error of the track. The difference between the ground truth and the smoothed tracks is presented in Figure 8a. The red, blue, and green lines mean the difference in 3D space. In the 30th to 40th scan where the target is maneuvering, the difference is similar with the others. Those of the PHD filter [35] are presented in Figure 8b, and only the positional error of detected target is presented. Comparison between Figure 8a,b infers that the positional error of the two methods is similar. However, the positional error of scans where the target is maneuvering (30th to 40th scan) is slightly larger than the other scans. In Figure 8c, the innovation square which evaluates the difference between the measurement and corresponding prediction is presented. In most scans, even the scans in which the target is maneuvering, the innovation of the method is stable and similar with the others. It means that our tracking method is stable to access the correct measurement point and to get a suitable prediction even if the target is maneuvering. Figure 8d shows the OSPA distance of methods [35][36][37][38] among 200 scans. The cut-off value and the distance order are 200 m and 1, respectively. The result of the real experiment, on average, is summarized in Table 5. The comparison infers that the ET-PHD filter [35] and the methods [36][37][38] are no better than the proposed method mainly because of the misdetection and false alarm track, respectively. We think it is an acceptable result to track a moving target whose speed is about 200 m per second.  In the second experiment, the surveillance area has 2 tracks to access the tracking performance of close targets. The measurement of 220 scans is presented in Figure 9a. The longer track exists in all 220 scans, and the shorter one keeps 40 scans, from 151th to 190th scans. The two tracks are close with each other in 166th to 180th scans. Meanwhile, the clutter density in the region of track crossing is much larger than that of other regions. The 220 scans are partitioned into 22 not overlapped time windows. The tracks obtained in each time window are presented in Figure 9b. In the 15th to 18th time window, two tracks are obtained without false alarm tracks, even if the clutter region exists. The position difference between the smoothed track and ground truth of X, Y, and Z axes using the ET-PHD filter [35]. (c) The innovation square of the proposed method and four comparisons [35][36][37][38]. (d) The OSPA distance of the proposed method and four comparisons [35][36][37][38]. In (c,d), red squares mean the proposed method, blue balls mean the ET-PHD [35], green triangles mean the CPHD filter [36], black inverted triangles mean the Bernoulli filter [37], and cyan diamonds mean the GLMB filter [38]. The points of 166th to 175th are presented in Figure 10a, and 171th to 180th are presented in Figure 10b. Among the 36 points of track 1, it has 12 points whose distance from the points of track 2 is smaller than 350 m. Among the 36 points of track 2, 11 points have a distance from track 1 smaller than 350 m. The nearest distance between two tracks is merely 268 m. The obtained track using the the points of Figure 10a is presented in Figure 10c. The obtained track in Figure 10b is presented in Figure 10d. The two tracks obtained are labeled by balls and triangles, respectively. Figure 10 infers that the points of closely distributed tracks can be well partitioned into correct tracks. As to the position error, only the ground truth of track 1 is available. The position differences between the smoothed track and ground truth in X, Y, and Z axes using the proposed method are presented in Figure 11a. That of the ET-PHD filter is presented in Figure 11b. It infers that lots of false alarm tracks are built by the clutter region. Meanwhile, the target is lost in 22 successive scans for the clutter region. In Figure 8c, the innovation square of the five methods is presented, which proves the stability of the proposed method in multiple target tracking. The overall OSPA distance of 220 scans is presented in Figure 11d. In Figure 11d, the gray rectangle labels the scans in which two tracks are close with each other. It infers that the corresponding position error is similar to the other scans. The yellow rectangle labels the scans, 140th to 160th, in which the target is quite weak and merely generates a few measurements per scan. It has 3.21 target points per scan, on average, in all 220 scans and 1.39 target points during 140th to 160th scans. The target is hard to detect in these scans by using the comparisons [35][36][37][38]. The result further proves that the proposed method works well in close track scenarios.
The result summarized in Table 5 infers that the proposed method is superior in suppression of clutter region and detection of a weak target for the benefit of multi-scan detection. (c) (d) Figure 11. (a) The position difference between the smoothed track and ground truth of X, Y, and Z axes using the proposed method. (b) The position difference between the smoothed track and ground truth of X, Y, and Z axes using the ET-PHD filter [35]. (c) The innovation square of the proposed method and four comparisons [35][36][37][38]. (d) The OSPA distance of the proposed method and four comparisons [35][36][37][38]. Yellow rectangle means weak target scans. Gray rectangle means closely targeted scans. In (c,d), red squares mean the proposed method, blue balls mean the ET-PHD [35], green triangles mean the CPHD filter [36], black inverted triangles mean the Bernoulli filter [37], and cyan diamonds mean the GLMB filter [38].

Conclusions
In this work, a 4D TBD is developed for extended target detection in 3D radar systems. The problem is designed to detect the closely distributed weak extended targets using the 3D positional information and timestamp of points, in spite of heterogeneous background clutter and various clutter regions. The method is divided into three parts, and plenty of detection bins are built. The 4D TBD process is completed by a 3D TBD and three 2D TBD. Although the proposed 4D TBD is designed for the straight-line targets, the experiment using the real data infers that the method also works well for closely maneuvering targets.