Sensors
  • Article
  • Open Access

23 March 2023

3D-DIoU: 3D Distance Intersection over Union for Multi-Object Tracking in Point Cloud

1 Institute of Microengineering and Nanoelectronics (IMEN), Universiti Kebangsaan Malaysia, Bangi 43600, Malaysia
2 Department of Automotive Technology, Erbil Technology College, Erbil Polytechnic University, Erbil 44001, Iraq
3 Center for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Malaysia
* Author to whom correspondence should be addressed.
This article belongs to the Section Intelligent Sensors

Abstract

Multi-object tracking (MOT) is a prominent and important topic in point cloud processing and computer vision. The main objective of MOT is to predict complete tracklets of several objects in a point cloud sequence. Occlusion and similar-looking objects are two common problems that reduce an algorithm’s performance during the tracking phase. The tracking performance of current MOT techniques, which adopt the ‘tracking-by-detection’ paradigm, degrades in complex scenes, as evidenced by increasing numbers of identification (ID) switches and tracking drifts, because it is difficult to predict the locations of objects that cannot be reliably tracked. Since an occluded object may have been visible in earlier frames, we use the speed and position of the object in previous frames to estimate where the occluded object is likely to be. In this paper, we employ an intersection over union (IoU) method in three-dimensional (3D) space, namely distance-IoU non-maximum suppression (DIoU-NMS), to accurately select detections, and we then use 3D DIoU for the object association process in order to increase tracking robustness and speed. By using the hybrid 3D DIoU-NMS and 3D DIoU method, the tracking speed improves significantly. Experimental findings on the Waymo Open Dataset and the nuScenes dataset demonstrate that our multistage data association and tracking technique has clear benefits over previously developed algorithms in terms of tracking accuracy. In comparison with other 3D MOT tracking methods, our proposed approach demonstrates a significant enhancement in tracking performance.

1. Introduction

An important challenge in computer vision research is multi-object tracking (MOT), which identifies and maintains a unique identification (ID) for each object of interest in a point cloud sequence while predicting the locations of all objects. MOT has many important theoretical research implications and practical applications. Visual security surveillance, vehicle visual navigation [1], augmented reality [2], human–computer interfaces, and high-sensitivity audio-visual (AV) systems [3], to name a few, all rely heavily on MOT systems with well-behaved performance. Several difficulties can deteriorate tracking performance in real-world applications, including the way objects interact, occlusion, and how close and similar certain objects are to one another. These difficulties lead to many unwanted detection mistakes and errors, including bounding box drift and ID changes, which cause tracking performance to degrade severely. As a result, this work proposes an improved and reliable MOT method for point cloud scenarios. Previously developed three-dimensional (3D) multiple object tracking (3D MOT) algorithms [4,5,6,7,8,9] adopt the tracking-by-detection pattern: across frames, the tracklets depend directly on the 3D bounding boxes produced by 3D detectors.
In general, the tracking-by-detection algorithm consists of four modules: (i) an input detection pre-processing module, (ii) a motion module, (iii) an association module, and (iv) tracklet life cycle management. All objects of interest in the point cloud sequence are determined using the detector. Then, the identical objects from the detector and the predicted motion model are associated using a feature-based metric. A continually updated tracklet set is created by connecting the same object across many point cloud frames. In this procedure, the detector’s effectiveness and the performance of the data association algorithm jointly affect the tracking accuracy and flexibility. The detection process is normally evaluated using an intersection over union (IoU) metric.
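For illustration only, the following minimal Python sketch outlines how these four modules interact in a single tracking loop. All class and method names (detector.detect, tracker.preprocess, tracker.associate, tracker.update) are hypothetical placeholders and do not refer to the authors’ implementation.

def track_sequence(frames, detector, tracker):
    """Sketch of the four-module tracking-by-detection loop."""
    tracklets = []                                        # currently managed tracklets
    for points in frames:
        # (i) input detection pre-processing: detect boxes, then filter them (e.g., NMS)
        boxes = tracker.preprocess(detector.detect(points))
        # (ii) motion module: predict where each existing tracklet should be in this frame
        predictions = [t.predict() for t in tracklets]
        # (iii) association module: match detections to predictions with a feature-based metric
        matches, unmatched_dets, unmatched_trks = tracker.associate(boxes, predictions)
        # (iv) tracklet life cycle management: update matched, create new, terminate stale
        tracklets = tracker.update(tracklets, matches, unmatched_dets, unmatched_trks)
        yield [t.state() for t in tracklets]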
The association results can be wrong when the input detections are inaccurate. However, refining these detections with the non-maximum suppression (NMS) technique can improve the association. Additionally, we found that the association metric between two 3D bounding boxes should be designed properly: neither generalized IoU (GIoU) [10] nor the L2 distance [11] works well. The inference speed of the tracking system is significantly influenced by both the detector and the data association. Therefore, a multistage association process between predictions and tracklets can better capture the continued existence of objects. Based on these findings, using distance-IoU (DIoU) throughout the tracking pipeline can significantly improve the solution. In order to tackle 3D MOT issues, we propose an improved DIoU method in this paper. We use the Waymo Open Dataset [12] and nuScenes [13] to evaluate and verify the proposed algorithm. Our method and contributions, in brief, are as follows:
  • We added DIoU-NMS to the 3D MOT tracking pipeline and analyzed its performance;
  • We proposed the use of DIoU for two-stage and multi-stage data association, which showed competitive results on both the Waymo Open Dataset and nuScenes;
  • We used unmatched tracklets and unmatched detections from previous stages for data association in the next stage, and the verification results on the Waymo Open Dataset show better performance for cyclist objects.
By using DIoU in the tracking pipeline, we mitigate premature tracklet termination by letting the tracking framework rely on the predicted position of temporarily invisible objects. Previous work [14] used GIoU for the tracking process, which terminates an unassociated tracklet. Instead, we use DIoU and keep the unassociated tracklet alive through its predicted position. Therefore, when a temporarily invisible object reappears, it can be associated with its original predicted position.

3. Materials and Methods

In this section, a simple tracking framework built on PointPillars detections and a motion prediction technique is proposed; the workflow of the tracking procedure is shown in Figure 1. The tracking process consists of the following parts:
Figure 1. Three-dimensional MOT workflow steps.
  • Detection: for this step, the bounding boxes are selected from the detector, as shown in Figure 2;
    Figure 2. Vehicle bounding box detectors.
  • Selected Detection: by applying the NMS process, the number of bounding boxes is reduced and the unwanted boxes are removed;
  • Tracklets, Prediction, and Motion Update: these processes are interrelated and rely on a Kalman filter, as illustrated in Figure 3 and Figure 4 (a minimal prediction sketch is given after this list);
    Figure 3. Vehicle tracklets bounding boxes.
    Figure 4. Motion prediction process.
  • Multi-Stage Association: in this step, the detections in the present frame are associated with the tracklets from the previous frame. The unmatched predictions and tracklets are associated in another stage. The 3D GIoU and DIoU association metrics used in this work are coupled with the Hungarian algorithm;
  • Motion Update and Life Cycle Management: the creation and termination of tracklets are determined and updated in this step, and the final tracklets are shown in Figure 5.
    Figure 5. Tracklets at frame k.
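The prediction and motion update steps listed above can be realized with a constant-velocity Kalman filter. The following is a minimal Python sketch; the state vector, time step, and noise settings are illustrative assumptions and are not taken from the paper.

import numpy as np

class BoxMotionKF:
    # Constant-velocity Kalman filter over the box center; state = [x, y, z, vx, vy, vz].
    def __init__(self, center, dt=0.1):
        self.x = np.hstack([center, np.zeros(3)])            # initial state
        self.P = np.eye(6)                                    # state covariance
        self.F = np.eye(6); self.F[:3, 3:] = dt * np.eye(3)   # constant-velocity transition
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])     # only the center is observed
        self.Q = 0.01 * np.eye(6)                             # process noise (assumed)
        self.R = 0.1 * np.eye(3)                              # measurement noise (assumed)

    def predict(self):
        # Prediction step: propagate the state to the current frame.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]                                     # predicted box center

    def update(self, measured_center):
        # Motion update step: correct the state with the associated detection.
        y = measured_center - self.H @ self.x                 # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)              # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P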

3.1. Adding 3D DIoU Function

In this part, we examine and enhance the detection and multi-stage association modules by including a 3D DIoU model. In this work, we revised the NMS step and the bounding-box association function of the conventional tracking method to enhance the tracking capability for multiscale and occluded objects.
The association speed and performance of the object tracker depend directly on the values of the association function. To determine the multiple object tracking (MOT) value, it is necessary to calculate the correspondence among the bounding boxes produced by the tracking method. To measure the overlap between two bounding boxes, the intersection over union (IoU) metric [41] is used, and the corresponding association function is stated as follows:
IoU = |B1 ∩ B2| / |B1 ∪ B2|,   (1)
where B1 and B2 are 3D bounding boxes, |B1 ∩ B2| denotes the volume of the intersection of B1 and B2, and |B1 ∪ B2| denotes the volume of the union of B1 and B2. The IoU is equal to 0 when there is no intersection between the two 3D bounding boxes; in this case, the tracking process cannot continue.
As a solution to this vanishing-gradient issue, the generalized intersection over union (GIoU) [33] is used in the tracking technique, which is stated as follows:
GIoU = IoU − |D| / |C|,   (2)
where C is the smallest enclosing box that covers B1 and B2, and D = C \ (B1 ∪ B2), so that C = (B1 ∪ B2) ∪ D; |C| and |D| denote the volumes of C and D, respectively. When the B2 box fully contains the B1 box, the GIoU value is the same wherever B1 lies inside B2; in this case, GIoU degenerates into IoU and no longer provides a useful tracking relationship.
IoU and GIoU only take the overlapping volume into account, and the resulting association functions have two drawbacks: slow convergence and inaccurate association. In contrast, distance intersection over union (DIoU) uses the normalized distance between the centers of the B1 and B2 bounding boxes. This association function is defined as follows [42]:
DIoU = IoU − d² / c²,   (3)
where d is the Euclidean distance between the center points of the B1 and B2 bounding boxes, and c is the diagonal length of the smallest enclosing box that encompasses the two boxes. The DIoU function allows the model to achieve fast association even when the two boxes are offset in the horizontal or vertical direction. Directly reducing the normalized distance between center points using the DIoU function leads to a faster convergence rate [42] and more precise association. The IoU, GIoU, and DIoU expressions above are used to describe the association between any two bounding boxes. The 3D DIoU metric is defined in Algorithm 1.
Algorithm 1. 3D Distance Intersection Over Union Function
Input: the parameters of the bounding boxes B1 and B2: B1 = (x1, y1, z1, l1, w1, h1, θ1), B2 = (x2, y2, z2, l2, w2, h2, θ2)
Output: 3D DIoU association metric
1. Project B1 and B2 onto the bird’s-eye view, giving the 2D rotated boxes B1′ = (x1¹, y1¹, x2¹, y2¹, θ1) and B2′ = (x1², y1², x2², y2², θ2)
2. A1 ← the area of the 2D box B1′
3. A2 ← the area of the 2D box B2′
4. I_2D ← the intersection area of B1′ and B2′
5. U_2D ← the union area of B1′ and B2′
6. I_h ← the height of the intersection of B1 and B2
7. U_h ← the height of the union of B1 and B2
8. I_w ← the width of the intersection of B1 and B2
9. U_w ← the width of the union of B1 and B2
10. I_l ← the length of the intersection of B1 and B2
11. U_l ← the length of the union of B1 and B2
12. I_v ← the volume of the intersection of B1 and B2
13. U_v ← the volume of the union of B1 and B2
14. d_x ← the center distance (x1 − x2)
15. d_y ← the center distance (y1 − y2)
16. d_z ← the center distance (z1 − z2)
17. d² ← the squared distance between the centers of B1 and B2
18. c² ← the squared diagonal of the smallest enclosing box that encompasses B1 and B2
19. if I_2D ≤ 0: I_v = 0; else if I_h ≤ 0: I_v = 0; else I_v = I_2D × I_h
20. d² = d_x² + d_y² + d_z²
21. c² = U_w² + U_l² + U_h²
22. IoU_3D = I_v / U_v
23. DIoU_3D = IoU_3D − d² / c²
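For illustration, the following Python sketch follows the structure of Algorithm 1 but, as a simplifying assumption, ignores the yaw angles θ1 and θ2 (i.e., it treats both boxes as axis-aligned in the bird’s-eye view). It is therefore an approximation of the metric and not the authors’ implementation.

def diou_3d(b1, b2):
    # b = (x, y, z, l, w, h): box center and its extents along x, y, z.
    x1, y1, z1, l1, w1, h1 = b1
    x2, y2, z2, l2, w2, h2 = b2

    def overlap(c1, s1, c2, s2):
        # 1D overlap of two centered intervals.
        return max(0.0, min(c1 + s1 / 2, c2 + s2 / 2) - max(c1 - s1 / 2, c2 - s2 / 2))

    def extent(c1, s1, c2, s2):
        # 1D extent of the smallest interval enclosing both intervals.
        return max(c1 + s1 / 2, c2 + s2 / 2) - min(c1 - s1 / 2, c2 - s2 / 2)

    i_l, i_w, i_h = overlap(x1, l1, x2, l2), overlap(y1, w1, y2, w2), overlap(z1, h1, z2, h2)
    i_v = i_l * i_w * i_h                                    # intersection volume (steps 12, 19)
    u_v = l1 * w1 * h1 + l2 * w2 * h2 - i_v                  # union volume (step 13)
    d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2    # squared center distance (step 20)
    u_l, u_w, u_h = extent(x1, l1, x2, l2), extent(y1, w1, y2, w2), extent(z1, h1, z2, h2)
    c2 = u_l ** 2 + u_w ** 2 + u_h ** 2                      # squared enclosing diagonal (step 21)
    iou_3d = i_v / u_v if u_v > 0 else 0.0                   # step 22
    return iou_3d - d2 / c2                                  # step 23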

3.2. Non-Maximum Suppression (NMS) Upgrade to DIoU-NMS

To locate local maxima and eliminate non-maximum bounding boxes, the NMS approach is used. Most object-tracking systems use NMS as a pre-processing stage to select the bounding boxes before starting the tracking operation. The original NMS relies on the classification confidence score: only the bounding box with the highest confidence score is kept. Since localization quality (IoU) and classification confidence are typically not strongly correlated, boxes with high confidence scores are not necessarily the best localized. When the tracking method uses the original NMS technique, only the overlapping regions are analyzed, increasing the likelihood of missed and false detections, especially in scenes with heavily overlapping objects.
We use DIoU-NMS to increase the detection efficiency for occluded objects. DIoU-NMS uses DIoU as the criterion for suppressing redundant bounding boxes, in contrast to the original NMS, which uses IoU. DIoU-NMS takes into account the distance between the center points of the two bounding boxes in addition to the overlapping area. DIoU-NMS is stated as
s_i = { s_i,  IoU − R_DIoU(M, B_i) < ε;  0,  IoU − R_DIoU(M, B_i) ≥ ε },   (4)
where s_i stands for the classification confidence score; IoU is as stated in Equation (1); ε denotes the NMS threshold; M represents the highest-scoring bounding box; and B_i is the pending bounding box. When conducting DIoU-NMS, the distance between the centers of two bounding boxes is taken into account concurrently with the IoU. This distance term, denoted R_DIoU, is given by
R_DIoU = ρ²(b1, b2) / c²,   (5)
where ρ²(b1, b2) denotes the squared distance between the center points b1 and b2 of the two bounding boxes, and c is the diagonal length of the smallest box that contains both boxes.
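Since IoU − R_DIoU is exactly the DIoU value of Equation (3), the suppression rule of Equation (4) can be sketched as the greedy procedure below. The box format, the default threshold value, and the reuse of the diou_3d helper from Section 3.1 are illustrative assumptions, not the authors’ exact implementation.

import numpy as np

def diou_nms(boxes, scores, eps=0.7):
    # boxes: list of (x, y, z, l, w, h) tuples; scores: classification confidences.
    order = np.argsort(scores)[::-1]          # visit boxes from highest to lowest score
    keep, suppressed = [], set()
    for idx in order:
        if idx in suppressed:
            continue
        keep.append(int(idx))                 # M: current highest-scoring box is retained
        for j in order:
            if j == idx or j in suppressed:
                continue
            # IoU - R_DIoU equals DIoU, so suppress B_j once DIoU(M, B_j) reaches eps.
            if diou_3d(boxes[int(idx)], boxes[int(j)]) >= eps:
                suppressed.add(j)
    return keep                               # indices of the retained detections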

4. Results

4.1. Datasets

Several MOT datasets have been proposed and used during the last few years. The Waymo Open Dataset [12] and nuScenes [13] are the most commonly used and most comprehensive benchmarks for MOT. The Waymo Open Dataset (WOD) includes a perception dataset and a motion dataset. The total number of scenes in the dataset is 1150, divided into 150, 202, and 798 scenes for testing, validation, and training, respectively. While the motion dataset comprises 103,354 sequences, the perception dataset has 1950 annotated lidar sequences. Each sequence is recorded for 20 s at a sample rate of 10 Hz. For each frame, point cloud data and 3D ground truth boxes for vehicles, pedestrians, and cyclists are provided. Using the evaluation metrics stated in [12], we report multiple object tracking accuracy (MOTA), multiple object tracking precision (MOTP) [43], Miss, Mismatch, and False Positive (FP) for objects at the L2 difficulty level.
NuScenes [13] provides LiDAR scans at 20 frames per second (fps) and ground truth 3D box annotations at 2 fps for a total of 1000 driving scenes. We report identity switches (IDS), AMOTA [7], and MOTA for nuScenes. AMOTA, the average value of MOTA over several recall thresholds, serves as the main indicator for assessing 3D MOT on nuScenes. Meanwhile, AMOTP, the average value of MOTP, indicates the error of the association process. Hence, the values of MOTP and AMOTP should be kept as small as possible.
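For reference, the CLEAR MOT accuracy [43] and its recall-averaged variant can be written as follows; this is the standard formulation, and the exact recall normalization used by the nuScenes benchmark (the MOTAR-based AMOTA of [7]) may differ slightly:
MOTA = 1 − (Σ_t (m_t + fp_t + mme_t)) / (Σ_t g_t),   AMOTA = (1/L) Σ_r MOTA_r,
where m_t, fp_t, and mme_t are the numbers of misses, false positives, and mismatches (ID switches) in frame t; g_t is the number of ground truth objects in frame t; and MOTA_r is the MOTA evaluated at recall threshold r, averaged over L thresholds.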

4.2. DIoU-NMS Results

Our approach aims to increase the precision without considerably reducing the recall. We apply a strict DIoU-NMS to the input detections and find that only 479 ID switches are recorded, compared with 519 for the IoU method, as shown in Table 1.
Table 1. NMS for IoU and DIoU in the detection process with GIoU in association two stage.
In addition, when DIoU-NMS is applied to the Waymo Open Dataset, the MOTA is higher than that resulting from IoU-NMS. Similarly, the mismatch value improved and reached 0.077% for vehicle class, as in Table 2. Meanwhile, MOTA results reached 51% and the mismatch value is equal to 0.4% for pedestrian objects, as shown in Table 3.
Table 2. Comparison of the tracking results for vehicle objects using different NMS metrics on the validation set of Waymo Open Dataset.
Table 3. Comparison of the tracking results for pedestrian objects using different NMS metrics on the validation set of Waymo Open Dataset.

4.3. Association Results

We used the 3D box detections from the CenterPoint method as the input data. On the Waymo Open Dataset, boxes with scores higher than 0.7 were selected, and the 3D IoU-NMS threshold was set to 0.7. To associate detection and prediction boxes, we used two-stage data association based on 3D GIoU and DIoU. In this case, we associated the detection and prediction boxes using DIoU in the first stage; then, in the second stage, we re-associated any remaining unassociated detections and tracklets using DIoU. A similar approach was applied in the third and higher-order association stages; a minimal sketch of one association stage is given after Table 5. The results for two-stage data association are shown in Table 4 for the vehicle class and Table 5 for the pedestrian class on the Waymo Open Dataset. The first row in both Table 4 and Table 5 represents two-stage data association results, where the 3D GIoU metric is coupled with the Hungarian algorithm to match detections and tracklets. In the second row, we used the 3D DIoU metric instead of GIoU.
Table 4. Comparisons for 3D MOT two-stage association on vehicle class, Waymo Open Dataset validation set.
Table 5. Comparisons for 3D MOT two-stage association on pedestrian class, Waymo Open Dataset validation set.
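As an illustration of the association stages described above, the following Python sketch performs Hungarian matching on a 3D DIoU cost matrix, reusing the diou_3d helper sketched in Section 3.1. The gating threshold, box format, and the two-stage usage shown in the trailing comment are assumptions for illustration, not the exact experimental settings.

import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(detections, predictions, min_diou=-1.0):
    # detections, predictions: lists of (x, y, z, l, w, h) boxes.
    if not detections or not predictions:
        return [], list(range(len(detections))), list(range(len(predictions)))
    cost = np.array([[-diou_3d(d, p) for p in predictions] for d in detections])
    det_idx, pred_idx = linear_sum_assignment(cost)           # minimize negative DIoU
    matches, matched_d, matched_p = [], set(), set()
    for d, p in zip(det_idx, pred_idx):
        if -cost[d, p] >= min_diou:                           # gate: keep sufficiently close pairs
            matches.append((int(d), int(p))); matched_d.add(d); matched_p.add(p)
    unmatched_dets = [i for i in range(len(detections)) if i not in matched_d]
    unmatched_preds = [i for i in range(len(predictions)) if i not in matched_p]
    return matches, unmatched_dets, unmatched_preds

# Two-stage usage: leftovers from the first stage are re-associated against the tracklets.
# m1, ud, up = associate(dets, preds)
# m2, _, _ = associate([dets[i] for i in ud], tracklet_boxes)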
On the other hand, Table 6 presents three-stage data association results, in which the unmatched detections and unmatched tracklets from earlier stages are associated using 3D DIoU coupled with the Hungarian algorithm; only the cyclist class shows a low false positive (FP) value.
Table 6. Three-dimensional MOT three-stage association on cyclist class, Waymo Open Dataset validation set.

4.4. Comparison with Previous Techniques

In this part, we incorporate the aforementioned methods into the combined DIoU-NMS and DIoU data association in order to demonstrate how the performance can be enhanced. Table 7 below shows that our proposed 3D MOT trackers perform better than the baselines. In the case of the Waymo Open Dataset, although the dimensions of vehicles and pedestrians differ considerably, DIoU-NMS and two-stage DIoU data association are adequate and appropriate for both vehicle and pedestrian objects, as indicated by the high tracking performance values in Table 7 and Table 8. The comparison for the vehicle class on the Waymo Open Dataset test set is tabulated in Table 7, in which CenterPoint [8] detections are utilized. For comparison, the results from AB3DMOT [7] and Chiu et al. [5] are also presented. Similarly, Table 8 highlights the results for the pedestrian class on the Waymo Open Dataset test set. Meanwhile, the three-stage technique is only applicable to cyclist objects due to the limitation of MOTA value computation for vehicles and pedestrians in this multi-stage evaluation. DIoU-NMS shows effective results on the nuScenes dataset, as shown in Table 9; in this case, CenterPoint [8] detections are utilized and compared. In all tests, a 2 Hz frame rate is used for detection.
Table 7. Comparison on Waymo Open Dataset test set, vehicle class.
Table 8. Comparison on Waymo Open Dataset test set, pedestrian class.
Table 9. Comparison on the nuScenes test set.

4.5. Comparison between GIoU and DIoU

To compare the GIoU and DIoU association metrics, the score threshold for selecting boxes is set to 0.7, the GIoU association threshold is set to 1.5, the DIoU association threshold is set to 1, and the NMS-IoU threshold is set to 0. Figure 6 shows the associations between the detection results (green boxes) and the predicted results (blue boxes). With DIoU, the predicted box is preserved until the object is detected again, whereas with GIoU the predicted box is terminated when the object is temporarily not observed, causing an identity switch, as illustrated in Figure 6. At frame 9 (first row of the figure), the detected box for vehicle ID 2 (green box) is associated with its predicted box (blue box). The second row shows frame 11, which contains the tracklet for vehicle ID 10; with DIoU, the predicted box for vehicle ID 2 is also kept for the association process. However, when GIoU is applied to the association process, we obtain the tracklet only for vehicle ID 10, and the predicted box for vehicle ID 2 is terminated, as shown in the first column of the second row of Figure 6. At frame 28 (final row, second column), the tracklets for vehicles 3 and 11 and the predicted positions for vehicles 0, 1, 2, and 6 are shown when DIoU is applied to the association process. On the other hand, when GIoU is applied, we obtain the tracklets for vehicles 3 and 11 only, and the predicted boxes for vehicles 0, 1, 2, and 6 are terminated, which leads to an increase in ID switches and a lower multiple object tracking accuracy (MOTA) value.
Figure 6. Comparison between GIoU and DIoU for Association process.

5. Conclusions

It was discovered that tracklet termination leads to identity switches in 3D MOT, which are common and unresolved issues in recent 3D MOT studies. Therefore, in this paper, we proposed a hybrid method of using DIoU-NMS and DIoU in order to improve the association between tracklet and prediction boxes for objects. We found that by using the combination of DIoU-NMS and DIoU, the identity switch cases can be reduced.
Additionally, we used DIoU for multi-stage association, which led to an increase in MOTA values for small objects on the Waymo Open Dataset. Experimental results show that DIoU-NMS can significantly reduce identity switches when it is used to select the detections for tracking. Our approach achieved 479 ID switches for vehicle objects on nuScenes, compared with 519 for GIoU only. While the mismatches improved slightly for vehicle and pedestrian objects, the MOTA results also recorded better tracking performance on the Waymo Open Dataset. Meanwhile, the two-stage data association results demonstrated significant improvements in MOTA values, with 58.9% and 59.7% for vehicle and pedestrian objects, respectively. The FP values also improved significantly, reaching 11.1% and 47.7% for vehicle and pedestrian objects, respectively. In addition, using DIoU for three-stage association reduced false positive detections as well as improved MOTA values.
In comparison to previous work, our method recorded a significant improvement in ID mismatches, achieving at least 63.8% and 28.6% reductions for vehicle and pedestrian objects, respectively. Similarly, test results on the Waymo Open Dataset show MOTA values for both vehicle and pedestrian objects reaching over 60%, surpassing all previously published LiDAR-based methods. The results show great potential for future 3D MOT analysis and can pave the way for many real-time 3D tracking-by-detection applications.

Author Contributions

Conceptualization and investigation, S.A.K.M.; Supervision, writing—review and editing, M.Z.A.R.; Validation, Rahman, A.H.A.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Ministry of Higher Education (MOHE) Malaysia under research grant FRGS/1/2020/STG07/UKM/02/3 and the APC was partially funded by Universiti Kebangsaan Malaysia (UKM).

Data Availability Statement

Data supporting the conclusions of this manuscript are provided within the article and will be available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pang, Z.; Li, Z.; Wang, N. Model-Free Vehicle Tracking and State Estimation in Point Cloud Sequences. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 8075–8082. [Google Scholar]
  2. Qi, C.R.; Zhou, Y.; Najibi, M.; Sun, P.; Vo, K.; Deng, B.; Anguelov, D. Offboard 3d Object Detection from Point Cloud Sequences. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6134–6144. [Google Scholar]
  3. Liu, Y.; Wang, W.; Chambers, J.; Kilic, V.; Hilton, A. Particle Flow SMC-PHD Filter for Audio-Visual Multi-Speaker Tracking. In Proceedings of the Latent Variable Analysis and Signal Separation: 13th International Conference, LVA/ICA 2017, Grenoble, France, 21–23 February 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 344–353. [Google Scholar]
  4. Benbarka, N.; Schröder, J.; Zell, A. Score Refinement for Confidence-Based 3D Multi-Object Tracking. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 8083–8090. [Google Scholar]
  5. Kuang Chiu, H.; Prioletti, A.; Li, J.; Bohg, J. Probabilistic 3d Multi-Object Tracking for Autonomous Driving. arXiv 2020, arXiv:2001.05673. [Google Scholar]
  6. Pöschmann, J.; Pfeifer, T.; Protzel, P. Factor Graph Based 3d Multi-Object Tracking in Point Clouds. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October–24 January 2020; pp. 10343–10350. [Google Scholar]
  7. Weng, X.; Wang, J.; Held, D.; Kitani, K. 3d Multi-Object Tracking: A Baseline and New Evaluation Metrics. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October–24 January 2020; pp. 10359–10366. [Google Scholar]
  8. Yin, T.; Zhou, X.; Krahenbuhl, P. Center-Based 3d Object Detection and Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 11784–11793. [Google Scholar]
  9. Zaech, J.-N.; Liniger, A.; Dai, D.; Danelljan, M.; Van Gool, L. Learnable Online Graph Representations for 3d Multi-Object Tracking. IEEE Robot. Autom. Lett. 2022, 7, 5103–5110. [Google Scholar] [CrossRef]
  10. Mahalanobis, P.C. On the Generalised Distance in Statistics. Proc. Natl. Inst. Sci. India 1936, 12, 49–55. [Google Scholar]
  11. Zhou, Y.; Tuzel, O. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4490–4499. [Google Scholar]
  12. Sun, P.; Kretzschmar, H.; Dotiwalla, X.; Chouard, A.; Patnaik, V.; Tsui, P.; Guo, J.; Zhou, Y.; Chai, Y.; Caine, B. Scalability in Perception for Autonomous Driving: Waymo Open Dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2446–2454. [Google Scholar]
  13. Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. Nuscenes: A Multimodal Dataset for Autonomous Driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11621–11631. [Google Scholar]
  14. Pang, Z.; Li, Z.; Wang, N. Simpletrack: Understanding and Rethinking 3d Multi-Object Tracking. In Proceedings of the Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2023; pp. 680–696. [Google Scholar]
  15. Li, X.; Ma, C.; Wu, B.; He, Z.; Yang, M.-H. Target-Aware Deep Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1369–1378. [Google Scholar]
  16. Yuan, D.; Shu, X.; Fan, N.; Chang, X.; Liu, Q.; He, Z. Accurate Bounding-Box Regression with Distance-IoU Loss for Visual Tracking. J. Vis. Commun. Image Represent. 2022, 83, 103428. [Google Scholar] [CrossRef]
  17. Zhihao, C.A.I.; Longhong, W.; Jiang, Z.; Kun, W.U.; Yingxun, W. Virtual Target Guidance-Based Distributed Model Predictive Control for Formation Control of Multiple UAVs. Chin. J. Aeronaut. 2020, 33, 1037–1056. [Google Scholar]
  18. Huang, Y.; Liu, W.; Li, B.; Yang, Y.; Xiao, B. Finite-Time Formation Tracking Control with Collision Avoidance for Quadrotor UAVs. J. Franklin Inst. 2020, 357, 4034–4058. [Google Scholar] [CrossRef]
  19. Dewangan, D.K.; Sahu, S.P. Lane Detection in Intelligent Vehicle System Using Optimal 2-Tier Deep Convolutional Neural Network. Multimed. Tools Appl. 2023, 82, 7293–7317. [Google Scholar] [CrossRef]
  20. Dicle, C.; Camps, O.I.; Sznaier, M. The Way They Move: Tracking Multiple Targets with Similar Appearance. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1–8 December 2013; pp. 2304–2311. [Google Scholar]
  21. Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple Online and Realtime Tracking. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468. [Google Scholar]
  22. Bergmann, P.; Meinhardt, T.; Leal-Taixe, L. Tracking without Bells and Whistles. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Long Beach, CA, USA, 15–20 June 2019; pp. 941–951. [Google Scholar]
  23. Lu, Z.; Rathod, V.; Votel, R.; Huang, J. Retinatrack: Online Single Stage Joint Detection and Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 14668–14678. [Google Scholar]
  24. Sadeghian, A.; Alahi, A.; Savarese, S. Tracking the Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 300–311. [Google Scholar]
  25. Zhang, Y.; Wang, C.; Wang, X.; Zeng, W.; Liu, W. Fairmot: On the Fairness of Detection and Re-Identification in Multiple Object Tracking. Int. J. Comput. Vis. 2021, 129, 3069–3087. [Google Scholar] [CrossRef]
  26. Leal-Taixé, L.; Canton-Ferrer, C.; Schindler, K. Learning by Tracking: Siamese CNN for Robust Target Association. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA, 27–30 July 2016; pp. 33–40. [Google Scholar]
  27. Li, J.; Gao, X.; Jiang, T. Graph Networks for Multiple Object Tracking. In Proceedings of the IEEE/CVF winter Conference on Applications of Computer Vision, Snowmass, CO, USA, 1–5 March 2020; pp. 719–728. [Google Scholar]
  28. Wojke, N.; Bewley, A.; Paulus, D. Simple Online and Realtime Tracking with a Deep Association Metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649. [Google Scholar]
  29. Patil, A.; Malla, S.; Gang, H.; Chen, Y.-T. The H3d Dataset for Full-Surround 3d Multi-Object Detection and Tracking in Crowded Urban Scenes. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 9552–9557. [Google Scholar]
  30. Shi, S.; Wang, X.; Li, H. PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 770–779. [Google Scholar]
  31. Kuhn, H.W. The Hungarian Method for the Assignment Problem. Nav. Res. Logist. Q. 1955, 2, 83–97. [Google Scholar] [CrossRef]
  32. Chiu, H.; Li, J.; Ambruş, R.; Bohg, J. Probabilistic 3d Multi-Modal, Multi-Object Tracking for Autonomous Driving. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–June 5 2021; pp. 14227–14233. [Google Scholar]
  33. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 658–666. [Google Scholar]
  34. Zhou, X.; Koltun, V.; Krähenbühl, P. Tracking Objects as Points. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 474–490. [Google Scholar]
  35. Yang, B.; Bai, M.; Liang, M.; Zeng, W.; Urtasun, R. Auto4d: Learning to Label 4d Objects from Sequential Point Clouds. arXiv 2021, arXiv:2101.06586. [Google Scholar]
  36. Weng, X.; Wang, Y.; Man, Y.; Kitani, K.M. Gnn3dmot: Graph Neural Network for 3d Multi-Object Tracking with 2d-3d Multi-Feature Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 6499–6508. [Google Scholar]
  37. Kim, A.; Ošep, A.; Leal-Taixé, L. Eagermot: 3d Multi-Object Tracking via Sensor Fusion. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 11315–11321. [Google Scholar]
  38. He, J.; Huang, Z.; Wang, N.; Zhang, Z. Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 5299–5309. [Google Scholar]
  39. Liu, Y.; Wang, W.; Kilic, V. Intensity Particle Flow Smc-Phd Filter for Audio Speaker Tracking. arXiv 2018, arXiv:1812.01570. [Google Scholar]
  40. Liu, Y.; Hu, Q.; Zou, Y.; Wang, W. Labelled Non-Zero Particle Flow for Smc-Phd Filtering. In Proceedings of the ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 5197–5201. [Google Scholar]
  41. Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; Huang, T. Unitbox: An Advanced Object Detection Network. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 516–520. [Google Scholar]
  42. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000. [Google Scholar]
  43. Bernardin, K.; Stiefelhagen, R. Evaluating Multiple Object Tracking Performance: The Clear Mot Metrics. EURASIP J. Image Video Process. 2008, 2008, 1–10. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
