Real-Time Robust 2.5D Stereo Multi-Object Tracking with Lightweight Stereo Matching Algorithm
Highlights
- Lightweight stereo matching using only bounding box coordinates achieves robust multi-object tracking with a MOTA of 0.932 and an IDF1 of 0.823, outperforming state-of-the-art monocular trackers.
- A dual-tracker design with a re-identification mechanism maintains consistent object identities during occlusions and truncations by leveraging stereo redundancy.
- Resource-efficient 2.5D tracking enables real-time deployment (70 FPS) on standard hardware without expensive 3D reconstruction or dense stereo matching.
- Stereo vision’s inherent redundancy provides a practical solution for robust tracking in challenging real-world scenarios like retail monitoring and autonomous systems.
Abstract
1. Introduction
2. Related Work
2.1. Stereo Vision-Based Object Tracking
2.2. Deep Learning-Based Object Detection
2.3. Depth Estimation and Stereo Matching Algorithms
2.4. Multi-Object Tracking Under Occlusion and Truncation
3. Materials and Methods
3.1. Overview of the Proposed Framework
3.2. Stage 1: Object Detection from Stereo Image Pairs
**Algorithm 1: Stereo Object Detection (Pseudocode)**

    Input:  Stereo image pair (IL, IR) at time t
    Output: Detection sets DL and DR

    procedure STEREO_OBJECT_DETECTION(IL, IR)
        // Synchronize and preprocess images
        IL_sync, IR_sync ← SYNCHRONIZE(IL, IR)

        // Initialize the detection sets
        DL ← ∅, DR ← ∅

        // Detection on left image
        DL ← DETECT_OBJECTS(IL_sync)
        for each detection dL in DL do
            dL.class ← CLASSIFY(dL)
            dL.bbox ← (xL, yL, wL, hL)
            dL.conf ← CONFIDENCE_SCORE(dL)
        end for

        // Detection on right image
        DR ← DETECT_OBJECTS(IR_sync)
        for each detection dR in DR do
            dR.class ← CLASSIFY(dR)
            dR.bbox ← (xR, yR, wR, hR)
            dR.conf ← CONFIDENCE_SCORE(dR)
        end for

        return DL, DR
    end procedure
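This stage is detector-agnostic. Below is a minimal Python sketch, assuming the Ultralytics YOLO interface [19] as the detector backend; the `Detection` dataclass, helper names, and confidence threshold are illustrative choices, not the authors' implementation:

```python
from dataclasses import dataclass

from ultralytics import YOLO  # assumed backend; any per-image detector works

@dataclass
class Detection:
    bbox: tuple   # (x, y, w, h): top-left corner plus size, in pixels
    cls: int      # class index
    conf: float   # detector confidence

    @property
    def center(self):
        x, y, w, h = self.bbox
        return (x + w / 2.0, y + h / 2.0)

def detect_objects(model, image, conf_thres=0.25):
    """Run the detector on one image and collect Detection records."""
    result = model(image, verbose=False)[0]
    dets = []
    for box in result.boxes:
        cx, cy, w, h = box.xywh[0].tolist()  # Ultralytics returns center-format boxes
        dets.append(Detection(bbox=(cx - w / 2, cy - h / 2, w, h),
                              cls=int(box.cls), conf=float(box.conf)))
    return [d for d in dets if d.conf >= conf_thres]

def stereo_object_detection(model, img_left, img_right):
    """Algorithm 1: independent detection on a synchronized stereo pair."""
    # Frames are assumed hardware-synchronized and rectified upstream.
    return detect_objects(model, img_left), detect_objects(model, img_right)
```

Note that the left and right images are processed independently; all cross-view reasoning is deferred to the matching stage (Algorithm 2).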
3.3. Stage 2: Stereo Matching for Optimal Object Pairing
**Algorithm 2: Stereo Object Matching (Pseudocode)**

    Input:  Detection sets DL and DR, camera parameters (b, f)
    Output: Stereo object pairs P and their depths Z

    procedure STEREO_MATCHING(DL, DR, b, f)
        // Initialize cost matrix
        C ← zeros(|DL|, |DR|)
        Z_temp ← zeros(|DL|, |DR|)

        // Compute matching costs for all pairs
        for i = 1 to |DL| do
            for j = 1 to |DR| do
                // Assume correspondence and estimate depth
                Z_temp[i,j] ← TRIANGULATE(DL[i].center, DR[j].center, b, f)

                // Calculate expected disparity
                d ← b × f / Z_temp[i,j]

                // Warp bounding box and compute IoU
                bbox_warped ← WARP_BBOX(DL[i].bbox, d)
                C[i,j] ← −IoU(bbox_warped, DR[j].bbox)
            end for
        end for

        // Find optimal assignment using Hungarian algorithm
        P ← HUNGARIAN_ASSIGNMENT(C)

        // Filter matches by threshold and assign depths
        for each match (i,j) in P do
            if C[i,j] > −τ then    // since C contains negative IoU values
                Remove (i,j) from P
            else
                Z[i,j] ← Z_temp[i,j]
            end if
        end for

        return P, Z
    end procedure
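A compact Python sketch of Algorithm 2, reusing the `Detection` class from the Stage 1 sketch and assuming rectified images (so the disparity of a hypothesized pair is simply the horizontal offset of the box centers, d = xL − xR) and SciPy's `linear_sum_assignment` as the Hungarian solver [44]; the threshold value is illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm

def iou(a, b):
    """IoU of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    return inter / (aw * ah + bw * bh - inter + 1e-9)

def stereo_matching(det_left, det_right, b, f, tau=0.5):
    """Algorithm 2: pair left/right detections and estimate per-pair depth."""
    C = np.zeros((len(det_left), len(det_right)))   # negative-IoU cost matrix
    Z = np.zeros_like(C)                            # candidate depths
    for i, dl in enumerate(det_left):
        for j, dr in enumerate(det_right):
            d = dl.center[0] - dr.center[0]         # rectified pair: xL > xR
            if d <= 0:
                continue                            # impossible pair: cost stays 0
            Z[i, j] = b * f / d                     # triangulated depth
            x, y, w, h = dl.bbox
            warped = (x - d, y, w, h)               # shift left box into right view
            C[i, j] = -iou(warped, dr.bbox)
    rows, cols = linear_sum_assignment(C)           # optimal assignment
    pairs, depths = [], {}
    for i, j in zip(rows, cols):
        if C[i, j] <= -tau:                         # keep only matches with IoU >= tau
            pairs.append((i, j))
            depths[(i, j)] = Z[i, j]
    return pairs, depths
```

The sketch computes disparity first and then Z = b·f/d, whereas the pseudocode triangulates first and derives the expected disparity; for a rectified pair the two orderings are algebraically equivalent.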
3.4. Stage 3: Stereo Tracking with Temporal Data Association
- Unassociated Trackers: When existing trackers fail to associate with any detected object, the system increments their age counter as long as the age remains below the maximum threshold. As shown in the right branch of Figure 3, for trackers whose age reaches or exceeds the maximum threshold, the system checks their status in the stereo track association map (denoted T in Figure 3 and Algorithm 3). If the tracker exists in the map and its paired tracker's age also meets or exceeds the threshold, both trackers are deleted. If the paired tracker's age is still below the threshold, the current tracker is kept without updates to preserve stereo consistency. Trackers absent from the map are deleted immediately once their age reaches the maximum threshold [23].
- Unassociated Objects: The middle branch of Figure 3 illustrates how detected objects without tracker associations are handled. The system first checks their stereo pairing status in P (produced by Algorithm 2 and shown in Figure 3). Unpaired objects spawn new independent trackers. For stereo object pairs, if the paired object already has a tracker recorded in the stereo track association map, a new tracker is created for the object that inherits the ID of the previously tracked object; this re-identification keeps the ID consistent across temporary occlusions or truncations while restarting motion estimation from the current position [49]. Otherwise, a tracker with a new ID is initialized for the object.
- Associated Tracker–Object Pairs: The left branch of Figure 3 shows that, for successfully associated pairs, the predicted tracker position is replaced by the detected object position in the current frame. While the motion estimator predicts positions from historical trajectory information, the association confirms the actual object location, so the tracker state is updated with this observed position rather than the prediction [1].
**Algorithm 3: Stereo Tracking (Pseudocode)**

    Input:  Detection sets DL and DR, stereo object pairs P,
            tracker sets TL and TR, association map T (list)
    Output: Updated trackers and association map

    procedure STEREO_TRACKING(DL, DR, P, TL, TR, T)
        // Predict tracker positions for all trackers
        for each tracker t in TL ∪ TR do
            t.predicted ← PREDICT(t.estimator)
        end for

        // Temporal data association (returns tracker-object pairs)
        AL ← ASSOCIATE(TL, DL)    // associated tracker-object pairs, left
        AR ← ASSOCIATE(TR, DR)    // associated tracker-object pairs, right

        // Process unassociated trackers
        for each tracker t in TL ∪ TR do
            if t not in TRACKERS(AL ∪ AR) then
                if t.age ≥ max_age then
                    if EXISTS_IN_MAP(t, T) then
                        paired_t ← GET_PAIRED_TRACKER(t, T)
                        if paired_t.age ≥ max_age then
                            DELETE(t)
                            DELETE(paired_t)
                        end if
                        // otherwise, maintain tracker without update
                    else
                        DELETE(t)
                    end if
                else
                    t.age ← t.age + 1
                end if
            end if
        end for

        // Handle unassociated objects
        for each object o not in OBJECTS(AL ∪ AR) do
            if HAS_STEREO_PAIR(o, P) and PAIRED_TRACKER_EXISTS(o, P, T) then
                paired_tracker ← GET_PAIRED_TRACKER(o, P, T)
                // Create new tracker with re-identified ID
                t_new ← CREATE_TRACKER(o)
                t_new.id ← paired_tracker.id    // re-identification
            else
                t_new ← CREATE_TRACKER(o)
                t_new.id ← GENERATE_NEW_ID()
            end if
            t_new.age ← 0
            if o ∈ DL then ADD_TO_TL(t_new) else ADD_TO_TR(t_new)
        end for

        // Update associated trackers with actual detected positions
        for each (tracker t, object o) in AL ∪ AR do
            t.position ← o.position    // replace predicted position with detection
            t.age ← 0                  // reset age for associated trackers
            UPDATE_TRACKER_STATE(t, o)
        end for

        // Update motion estimators for associated and new trackers only
        for each tracker t in TL ∪ TR do
            if t.age = 0 then    // associated or newly created trackers
                UPDATE_MOTION_ESTIMATOR(t)
            end if
        end for

        // Update stereo track association map (list structure)
        T ← UPDATE_STEREO_ASSOCIATIONS(P, TL, TR, T)

        return TL, TR, T
    end procedure
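To make the re-identification branch concrete, the following Python sketch implements the middle branch of Figure 3 for a single unassociated detection. The `Track` class, the `(det_idx, side)` bookkeeping, and the ID-to-ID `assoc_map` are illustrative stand-ins for the authors' data structures:

```python
import itertools

_next_id = itertools.count(1)

class Track:
    """Minimal track record; the Kalman-style motion estimator is omitted."""
    def __init__(self, det, track_id=None):
        self.id = next(_next_id) if track_id is None else track_id
        self.age = 0
        self.position = det.bbox

def stereo_partner(det_idx, side, pairs):
    """Index of the stereo-paired detection in the other view, or None."""
    for i, j in pairs:
        if side == 'L' and i == det_idx:
            return j
        if side == 'R' and j == det_idx:
            return i
    return None

def reidentify_or_spawn(det_idx, det, side, pairs, assoc_map,
                        partner_tracks, TL, TR):
    """Middle branch of Figure 3 for one unassociated detection.

    assoc_map      -- dict mapping a track ID to its stereo partner's ID
                      (entries in both directions)
    partner_tracks -- detection index -> Track for the other view's
                      currently associated pairs (i.e., from AR if side == 'L')
    """
    j = stereo_partner(det_idx, side, pairs)
    partner = partner_tracks.get(j) if j is not None else None
    if partner is not None and partner.id in assoc_map:
        # Re-identification: revive the ID this view carried before the
        # occlusion/truncation; motion estimation restarts from here.
        new_track = Track(det, track_id=assoc_map[partner.id])
    else:
        # Unpaired detection, or no history in the map: fresh identity.
        new_track = Track(det)
    (TL if side == 'L' else TR).append(new_track)
    return new_track
```

The aging, deferred-deletion, and state-update branches translate one-to-one from the pseudocode and are omitted here.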
4. Results
4.1. Experimental Setup
4.1.1. Stereo Vision System Configuration
4.1.2. Computational Platform
4.2. Dataset Preparation
4.2.1. Custom Dataset Construction
4.2.2. Data Collection and Annotation
4.3. Implementation Details
4.3.1. Object Detection Model Selection and Training
4.3.2. Tracking Algorithm Implementation
4.3.3. 2.5D Multi-Object Tracking Framework Integration
4.4. Depth Estimation Accuracy Evaluation
4.4.1. Validation Methodology
4.4.2. Experimental Results
4.5. Multi-Object Tracking Performance
4.5.1. Evaluation Dataset and Metrics
4.5.2. Comparative Analysis
5. Discussion
5.1. Analysis of Stereo Matching Performance
5.2. Robustness to Occlusions and Truncations
5.3. Computational Efficiency Considerations
5.4. Limitations and Practical Considerations
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AssA | Association Accuracy |
| AssR | Association Recall |
| BoT-SORT | Bag of Tricks for Simple Online and Realtime Tracking |
| CPU | Central Processing Unit |
| DeepSORT | Deep Simple Online and Realtime Tracking |
| DetA | Detection Accuracy |
| DLT | Direct Linear Transform |
| FPS | Frames per second |
| GPU | Graphics Processing Unit |
| HOTA | Higher Order Tracking Accuracy |
| ID | Identity |
| IDF1 | Identification F1 Score |
| IoU | Intersection over Union |
| LocA | Localization Accuracy |
| MAE | Mean Absolute Error |
| MOT | Multi-Object Tracking |
| MOTA | Multiple Object Tracking Accuracy |
| OCM | Observation-Centric Momentum |
| OCR | Observation-Centric Reassociation |
| OC-SORT | Observation-Centric SORT |
| P95 | 95th Percentile Error |
| RMSE | Root Mean Square Error |
| SORT | Simple Online and Realtime Tracking |
| YOLO | You Only Look Once |
References
- Luo, W.; Xing, J.; Milan, A.; Zhang, X.; Liu, W.; Kim, T.-K. Multiple object tracking: A literature review. Artif. Intell. 2021, 293, 103448. [Google Scholar] [CrossRef]
- Reid, D. An algorithm for tracking multiple targets. IEEE Trans. Autom. Control 1979, 24, 843–854. [Google Scholar] [CrossRef]
- Bar-Shalom, Y.; Fortmann, T.E.; Cable, P.G. Tracking and Data Association; Academic Press Professional, Inc.: San Diego, CA, USA, 1990. [Google Scholar]
- Breitenstein, M.D.; Reichlin, F.; Leibe, B.; Koller-Meier, E.; Van Gool, L. Robust tracking-by-detection using a detector confidence particle filter. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 27 September–4 October 2009; pp. 1515–1522. [Google Scholar]
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361. [Google Scholar]
- Luiten, J.; Osep, A.; Dendorfer, P.; Torr, P.; Geiger, A.; Leal-Taixé, L.; Leibe, B. HOTA: A higher order metric for evaluating multi-object tracking. Int. J. Comput. Vis. 2021, 129, 548–578. [Google Scholar] [CrossRef] [PubMed]
- Weng, X.; Wang, J.; Held, D.; Kitani, K. 3D multi-object tracking: A baseline and new evaluation metrics. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 10359–10366. [Google Scholar]
- Yin, T.; Zhou, X.; Krahenbuhl, P. Center-based 3D object detection and tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 11784–11793. [Google Scholar]
- Kim, A.; Ošep, A.; Leal-Taixé, L. EagerMOT: 3D multi-object tracking via sensor fusion. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 11315–11321. [Google Scholar]
- Zhu, Y.; Wang, T.; Zhu, S. Adaptive multi-pedestrian tracking by multi-sensor: Track-to-track fusion using monocular 3D detection and MMW radar. Remote Sens. 2022, 14, 1837. [Google Scholar] [CrossRef]
- Ahmadyan, A.; Hou, T.; Wei, J.; Zhang, L.; Ablavatski, A.; Grundmann, M. Instant 3D object tracking with applications in augmented reality. arXiv 2020, arXiv:2006.13194. [Google Scholar] [CrossRef]
- Hu, H.-N.; Cai, Q.-Z.; Wang, D.; Lin, J.; Sun, M.; Krahenbuhl, P.; Darrell, T.; Yu, F. Joint monocular 3D vehicle detection and tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5390–5399. [Google Scholar]
- Hu, H.-N.; Yang, Y.-H.; Fischer, T.; Darrell, T.; Yu, F.; Sun, M. Monocular quasi-dense 3D object tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1992–2008. [Google Scholar] [CrossRef] [PubMed]
- Tosi, F.; Bartolomei, L.; Poggi, M. A survey on deep stereo matching in the twenties. Int. J. Comput. Vis. 2025, 133, 4245–4276. [Google Scholar] [CrossRef]
- Li, Y.; Ibanez-Guzman, J. Lidar for autonomous driving: The principles, challenges, and trends for automotive lidar and perception systems. IEEE Signal Process. Mag. 2020, 37, 50–61. [Google Scholar] [CrossRef]
- Roriz, R.; Cabral, J.; Gomes, T. Automotive LiDAR technology: A survey. IEEE Trans. Intell. Transp. Syst. 2021, 23, 6282–6297. [Google Scholar] [CrossRef]
- Scharstein, D.; Szeliski, R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 2002, 47, 7–42. [Google Scholar] [CrossRef]
- Hirschmuller, H. Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 328–341. [Google Scholar] [CrossRef] [PubMed]
- Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLO, version 8.3.182; Ultralytics: Online, 2023. [Google Scholar]
- Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
- Pollefeys, M.; Koch, R.; Van Gool, L. A simple and efficient rectification method for general motion. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; pp. 496–501. [Google Scholar]
- Bolya, D.; Foley, S.; Hays, J.; Hoffman, J. TIDE: A general toolbox for identifying object detection errors. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 558–573. [Google Scholar]
- Bergmann, P.; Meinhardt, T.; Leal-Taixe, L. Tracking without bells and whistles. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 941–951. [Google Scholar]
- Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-object tracking by associating every detection box. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 1–21. [Google Scholar]
- Zhang, F.; Prisacariu, V.; Yang, R.; Torr, P.H. GA-Net: Guided aggregation net for end-to-end stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 185–194. [Google Scholar]
- Kendall, A.; Martirosyan, H.; Dasgupta, S.; Henry, P.; Kennedy, R.; Bachrach, A.; Bry, A. End-to-end learning of geometry and context for deep stereo regression. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 66–75. [Google Scholar]
- Zhang, S.; Zheng, L.; Tao, W. Survey and evaluation of RGB-D SLAM. IEEE Access 2021, 9, 21367–21387. [Google Scholar] [CrossRef]
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A Multimodal Dataset for Autonomous Driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11621–11631. [Google Scholar]
- Li, P.; Shi, J.; Shen, S. Joint spatial-temporal optimization for stereo 3D object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6877–6886. [Google Scholar]
- Karaev, N.; Rocco, I.; Graham, B.; Neverova, N.; Vedaldi, A.; Rupprecht, C. DynamicStereo: Consistent dynamic depth from stereo videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 13229–13239. [Google Scholar]
- Zhang, Y.; Poggi, M.; Mattoccia, S. TemporalStereo: Efficient spatial-temporal stereo matching network. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 9528–9535. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788. [Google Scholar]
- Chang, J.-R.; Chen, Y.-S. Pyramid stereo matching network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5410–5418. [Google Scholar]
- Milan, A.; Leal-Taixé, L.; Reid, I.; Roth, S.; Schindler, K. MOT16: A benchmark for multi-object tracking. arXiv 2016, arXiv:1603.00831. [Google Scholar] [CrossRef]
- Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649. [Google Scholar]
- Cao, J.; Pang, J.; Weng, X.; Khirodkar, R.; Kitani, K. Observation-centric SORT: Rethinking SORT for robust multi-object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 9686–9696. [Google Scholar]
- Mahler, R.P. Advances in Statistical Multisource-Multitarget Information Fusion; Artech House: London, UK, 2014. [Google Scholar]
- Vo, B.-N.; Vo, B.-T.; Nguyen, T.T.D.; Shim, C. An overview of multi-object estimation via labeled random finite set. IEEE Trans. Signal Process. 2024, 72, 4888–4917. [Google Scholar] [CrossRef]
- Menze, M.; Geiger, A. Object scene flow for autonomous vehicles. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3061–3070. [Google Scholar]
- Wang, Z.; Wu, Y.; Niu, Q. Multi-sensor fusion in automated driving: A survey. IEEE Access 2019, 8, 2847–2868. [Google Scholar] [CrossRef]
- Szeliski, R. Computer Vision: Algorithms and Applications; Springer Nature: Berlin/Heidelberg, Germany, 2022. [Google Scholar]
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 658–666. [Google Scholar]
- Kuhn, H.W. The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 1955, 2, 83–97. [Google Scholar] [CrossRef]
- Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468. [Google Scholar]
- Yu, F.; Li, W.; Li, Q.; Liu, Y.; Shi, X.; Yan, J. POI: Multiple object tracking with high performance detection and appearance feature. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 36–42. [Google Scholar]
- Zhang, Y.; Wang, C.; Wang, X.; Zeng, W.; Liu, W. FairMOT: On the fairness of detection and re-identification in multiple object tracking. Int. J. Comput. Vis. 2021, 129, 3069–3087. [Google Scholar] [CrossRef]
- Kalman, R.E. A new approach to linear filtering and prediction problems. ASME J. Basic Eng. 1960, 82, 35–45. [Google Scholar] [CrossRef]
- Chen, L.; Ai, H.; Zhuang, Z.; Shang, C. Real-time multiple people tracking with deeply learned candidate selection and person re-identification. In Proceedings of the 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, 23–27 July 2018; pp. 1–6. [Google Scholar]
- Ciaparrone, G.; Sánchez, F.L.; Tabik, S.; Troiano, L.; Tagliaferri, R.; Herrera, F. Deep learning in video multi-object tracking: A survey. Neurocomputing 2020, 381, 61–88. [Google Scholar] [CrossRef]
- Dwyer, B.; Nelson, J.; Hansen, T. Roboflow, version 1.0; Roboflow, Inc.: Des Moines, IA, USA, 2025. [Google Scholar]
- Tian, Y.; Ye, Q.; Doermann, D. YOLOv12: Attention-centric real-time object detectors. arXiv 2025, arXiv:2502.12524. [Google Scholar]
- Ristani, E.; Solera, F.; Zou, R.; Cucchiara, R.; Tomasi, C. Performance measures and a data set for multi-target, multi-camera tracking. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 17–35. [Google Scholar]
- Aharon, N.; Orfaig, R.; Bobrovsky, B.-Z. BoT-SORT: Robust associations multi-pedestrian tracking. arXiv 2022, arXiv:2206.14651. [Google Scholar]
- Mokdad, S.; Khalid, A.; Nasr, D.; Talib, M.A. Interpretable deep learning: Evaluating YOLO models and XAI techniques for video annotation. In Proceedings of the IET Conference Proceedings CP870, Patna, India, 14–15 July 2023; pp. 487–496. [Google Scholar]
- Yoon, C.; Park, E.; Misra, S.; Kim, J.Y.; Baik, J.W.; Kim, K.G.; Jung, C.K.; Kim, C. Deep learning-based virtual staining, segmentation, and classification in label-free photoacoustic histology of human specimens. Light Sci. Appl. 2024, 13, 226. [Google Scholar] [CrossRef] [PubMed]
- Park, E.; Misra, S.; Hwang, D.G.; Yoon, C.; Ahn, J.; Kim, D.; Jang, J.; Kim, C. Unsupervised inter-domain transformation for virtually stained high-resolution mid-infrared photoacoustic microscopy using explainable deep learning. Nat. Commun. 2024, 15, 10892. [Google Scholar] [CrossRef] [PubMed]
- Beemelmanns, T.; Zahr, W.; Eckstein, L. Explainable Multi-Camera 3D Object Detection with Transformer-Based Saliency Maps. arXiv 2023, arXiv:2312.14606. [Google Scholar] [CrossRef]
| Tracker | MOTA (↑) | IDF1 (↑) | HOTA (↑) | DetA (↑) | AssA (↑) | AssR (↑) | LocA (↑) |
|---|---|---|---|---|---|---|---|
| SORT [45] | 0.920 | 0.651 | 0.665 | 0.828 | 0.535 | 0.550 | 0.909 |
| DeepSORT [36] | 0.920 | 0.601 | 0.692 | 0.912 | 0.525 | 0.529 | 0.981 |
| ByteTrack [24] | 0.874 | 0.609 | 0.616 | 0.775 | 0.492 | 0.510 | 0.888 |
| BoT-SORT [54] | 0.929 | 0.624 | 0.694 | 0.892 | 0.540 | 0.548 | 0.949 |
| OC-SORT [37] | 0.926 | 0.765 | 0.802 | 0.915 | 0.703 | 0.711 | 0.982 |
| StereoSORT (proposed) | 0.932 | 0.823 | 0.844 | 0.919 | 0.775 | 0.787 | 0.981 |
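For reference, the two headline metrics in the table follow their standard definitions (MOTA from the CLEAR MOT metrics as restated in [35], IDF1 from [53]):

$$
\mathrm{MOTA} = 1 - \frac{\mathrm{FN} + \mathrm{FP} + \mathrm{IDSW}}{\mathrm{GT}}, \qquad
\mathrm{IDF1} = \frac{2\,\mathrm{IDTP}}{2\,\mathrm{IDTP} + \mathrm{IDFP} + \mathrm{IDFN}},
$$

where FN, FP, IDSW, and GT count missed detections, false positives, identity switches, and ground-truth objects, and IDTP, IDFP, and IDFN are the identity-level true positives, false positives, and false negatives.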
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lee, J.; Shin, J.; Park, E.; Kim, D. Real-Time Robust 2.5D Stereo Multi-Object Tracking with Lightweight Stereo Matching Algorithm. Sensors 2025, 25, 6773. https://doi.org/10.3390/s25216773