Bidirectional Tracking Method for Construction Workers in Dealing with Identity Errors
Abstract
1. Introduction
2. Related Works
- Feature extraction (appearance model): employing a person re-identification (ReID) [8] network to extract a one-dimensional feature vector from the ROI (see the sketch after this list).
- TBD treats object detection as a separate detector, while feature extraction and data association form the tracker [8]. TBD offers flexibility: individual modules can be replaced with better DNNs or association methods. However, the detector and the tracker cannot enhance each other’s performance: if the detector produces missed or false bounding boxes, the tracker fails to track or correctly identify the target.
- JDT integrates the detector and tracker into one unified network that can be trained end-to-end, such as Siamese [22] or Transformer networks [23]. JDT relies exclusively on appearance features, and training such DNNs often demands high-end GPUs and significant time. For instance, TransTrack [24] requires 16.1 GB of GPU memory for inference, making it incompatible with the NVIDIA Tesla T4 (15 GB) on Google Colaboratory [25]. To provide a simple, training-free tracking method, TBD is the better choice.
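To make the appearance cost concrete, here is a minimal Python sketch of ReID-style matching, assuming the embeddings already come from some ReID network (the 128-d size follows DeepSORT [8]; the random vectors are toy stand-ins, not real network outputs):

```python
import numpy as np

def cosine_distance(a, b):
    """Appearance cost between two ReID vectors (0 = identical direction)."""
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
track_feat = rng.normal(size=128)                       # stored track appearance
same_person = track_feat + 0.05 * rng.normal(size=128)  # small appearance change
other_person = rng.normal(size=128)                     # an unrelated worker

print(cosine_distance(track_feat, same_person))   # small -> likely the same ID
print(cosine_distance(track_feat, other_person))  # near 1 -> a different ID
```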
3. Motion Estimation with KF
3.1. Basic KF Formula
3.2. KF Divergence Proof
- If the KF diverges → the elements of the covariance matrix grow larger → the elements of the inverse covariance matrix become smaller → the Mahalanobis distance between two different persons’ IDs drops below the threshold → “accepting false ID” errors (a numerical sketch follows).
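A minimal numerical sketch of this chain, assuming a 2-D position innovation and the usual chi-square gate on the squared Mahalanobis distance (the numbers are illustrative, not taken from the paper):

```python
import numpy as np

def sq_mahalanobis(y, S):
    """Squared Mahalanobis distance of innovation y under covariance S."""
    return float(y @ np.linalg.inv(S) @ y)

y = np.array([8.0, 8.0])               # another person's detection, 8 px away
S_healthy = np.diag([4.0, 4.0])        # well-conditioned innovation covariance
S_diverged = np.diag([400.0, 400.0])   # covariance inflated by KF divergence

GATE = 5.991  # chi-square 0.95 quantile with 2 degrees of freedom
print(sq_mahalanobis(y, S_healthy) > GATE)   # True  -> correctly rejected
print(sq_mahalanobis(y, S_diverged) > GATE)  # False -> "accepting false ID"
```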
3.3. KF Processing Example
4. Methods
- Head_tracker for tracking heads (in Section 4.1).
- Body_tracker for tracking bodies (in Section 4.1).
- Intra-frame processing to delete false positives of heads and bodies (in Section 4.2).
- Inter-frame matching to find the pairing relationship between heads and bodies (in Section 4.3).
4.1. Head Tracker and Body Tracker
Algorithm 1: Two Stages of the Matching Algorithm for the Trackers

Input: M ← number of workers; A ← number of frames; two detection sets.
Output: two track sets.
1: for frame j in A do:
2: obtain the observations from the detector and ReID at frame j and the posterior state from the KF up to frame j − 1
3: first match: assign detections into matches, unmatched_tracks, unmatched_detections; build the cost matrix at j and use linear assignment; reject a pair if (… > 0.2 and … > 2 × head_width)
4: second match: for the remaining detection IDs in …, use linear assignment; reject a pair if (… > 0.7) or (match_times = 0 and …)
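As a hedged sketch of the two-stage association (not the paper’s released code), the following uses SciPy’s Hungarian solver with the 0.2 and 0.7 gates named above; the cost matrices are illustrative placeholders:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

GATE_FIRST, GATE_SECOND = 0.2, 0.7  # the two thresholds named in Algorithm 1

def match_stage(cost, gate):
    """One assignment stage: solve globally, then drop pairs above the gate."""
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]
    matched_r = {r for r, _ in matches}
    matched_c = {c for _, c in matches}
    unmatched_tracks = [r for r in range(cost.shape[0]) if r not in matched_r]
    unmatched_dets = [c for c in range(cost.shape[1]) if c not in matched_c]
    return matches, unmatched_tracks, unmatched_dets

# Stage 1: appearance (ReID) cost with a tight gate; stage 2 re-matches the
# leftovers with a looser gate (indices in m2 refer to the leftover lists).
appearance_cost = np.array([[0.05, 0.90],
                            [0.80, 0.60]])
m1, ut1, ud1 = match_stage(appearance_cost, GATE_FIRST)
second_cost = appearance_cost[np.ix_(ut1, ud1)]  # stand-in for the 2nd-stage cost
m2, ut2, ud2 = match_stage(second_cost, GATE_SECOND)
print(m1, m2)  # [(0, 0)] [(0, 0)] -> track 1 pairs with detection 1 in stage 2
```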
4.2. Intra-Frame Processing
Algorithm 2: Filter out False Positives of Detected Heads

Input: M ← number of detected heads; N ← number of remaining heads; j = frame.
Output: …
1: sort the detected heads by head-center y_c
2: while i + 1 < M − 1 do:
    if … < −5: /* two heads are very close; maybe an FP here */
        if … ≥ 99%: /* confidence ≥ 0.99 */ delete i
        elif …: delete i + 1
        else: delete i and i + 1
    i += 1
3: …
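The extracted pseudocode leaves the branch conditions partly blank, so the following runnable sketch adopts one plausible reading: for each overlapping pair, keep the box whose confidence reaches 0.99, and drop both if neither does:

```python
import numpy as np

def filter_head_fps(heads):
    """heads: (M, 5) rows of [x1, y1, x2, y2, conf], sorted here by center y.
    Drops one (or both) of two nearly coincident head boxes; the -5 px gap
    and the 0.99 confidence follow Algorithm 2."""
    heads = heads[np.argsort((heads[:, 1] + heads[:, 3]) / 2)]
    keep = np.ones(len(heads), dtype=bool)
    for i in range(len(heads) - 1):
        if not (keep[i] and keep[i + 1]):
            continue
        gap = heads[i + 1, 1] - heads[i, 3]   # vertical gap between neighbours
        if gap < -5:                          # overlapping pair: likely an FP
            if heads[i, 4] >= 0.99:           # i is trustworthy -> drop i + 1
                keep[i + 1] = False
            elif heads[i + 1, 4] >= 0.99:     # i + 1 is trustworthy -> drop i
                keep[i] = False
            else:                             # neither is -> drop both
                keep[i] = keep[i + 1] = False
    return heads[keep]

# Two overlapping heads; the 0.995 one survives, its 0.60 twin is removed.
print(filter_head_fps(np.array([[10., 10., 50., 50., 0.995],
                                [12., 12., 52., 52., 0.60]])))
```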
Algorithm 3: Filter out False Positives of Detected Bodies

Input: M ← number of detected bodies; N ← number of remaining bodies; j = frame.
Output: …
1: for i, k in M do:
    if … ≥ 60%: /* two bodies are very close; maybe FPs here */ delete the lower-confidence box
    i += 1; k += 1
2: for i in length(keypoints) do:
    /* the body has no more than two effective keypoints in total */
    if torch.sum(one_key[:, −1] ≥ 0.05) < 2: delete the body
    i += 1
3: …
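A runnable NumPy sketch of the two passes (the 0.6 IoU and 0.05 keypoint-score thresholds are as in Algorithm 3; the exact bookkeeping is an assumption):

```python
import numpy as np

def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def filter_body_fps(bodies, keypoints):
    """bodies: (M, 5) rows of [x1, y1, x2, y2, conf]; keypoints: (M, K, 3)
    with a per-keypoint score in the last channel. Two passes as in Algorithm 3."""
    keep = np.ones(len(bodies), dtype=bool)
    # Pass 1: heavily overlapping body pairs -> drop the lower-confidence one.
    for i in range(len(bodies)):
        for k in range(i + 1, len(bodies)):
            if keep[i] and keep[k] and iou(bodies[i, :4], bodies[k, :4]) >= 0.6:
                keep[i if bodies[i, 4] < bodies[k, 4] else k] = False
    # Pass 2: a body with fewer than two effective keypoints is deleted
    # (mirrors the torch.sum(one_key[:, -1] >= 0.05) < 2 test above).
    for i in range(len(bodies)):
        if np.sum(keypoints[i][:, -1] >= 0.05) < 2:
            keep[i] = False
    return bodies[keep]
```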
4.3. Inter-Frame Matching
Algorithm 4: Matching of “Head to Body” across Frames

Input: MT ← number of tracked heads; NT ← number of tracked bodies; j = frame.
Output: …
1: if …: for k in … do:
    /* find the closest head and calculate the Euclidean distance between centers */
    if Euclidean > 3 × w and … > 0.95: …
2: if …: for k in … do:
    /* find the closest body and calculate the IoU */
    if IoU > 0.8 and …: …
3: if …: /* matching of newly added head–body pairs */
    …; …; use linear assignment # row_indices, col_indices = linear_assignment(cost_matrix)
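A simplified stand-in for this pairing step (the real cost in Algorithm 4 also uses the Euclidean and IoU gates above; head_body_cost and its normalization are assumptions for illustration, and the 100,000 sentinel follows Appendix B):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

BIG = 100_000.0  # the "impossible match" sentinel used in Appendix B

def head_body_cost(heads, bodies):
    """Cost = head-center distance to the body's top-center, normalized by
    body height, when the head center lies inside the body box; BIG otherwise."""
    cost = np.full((len(bodies), len(heads)), BIG)
    for i, b in enumerate(bodies):
        for j, h in enumerate(heads):
            cx, cy = (h[0] + h[2]) / 2, (h[1] + h[3]) / 2
            if b[0] <= cx <= b[2] and b[1] <= cy <= b[3]:
                bx, by = (b[0] + b[2]) / 2, b[1]  # body top-center
                cost[i, j] = np.hypot(cx - bx, cy - by) / (b[3] - b[1])
    return cost

# Frame #1 boxes from Appendix B (rounded): three bodies, two heads.
heads = np.array([[1148.3, 274.9, 1189.9, 322.4], [1382.6, 198.3, 1422.8, 249.8]])
bodies = np.array([[1102.4, 275.1, 1219.5, 563.1],
                   [733.7, 1.1, 789.8, 97.4],
                   [1312.4, 197.9, 1435.0, 470.2]])
cost = head_body_cost(heads, bodies)
rows, cols = linear_sum_assignment(cost)
matched = {r: c for r, c in zip(rows, cols) if cost[r, c] < BIG}
pairs = [(i, matched.get(i)) for i in range(len(bodies))]
print(pairs)  # [(0, 0), (1, None), (2, 1)] -> same pairs as Appendix B frame #1
```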
4.4. Evaluation Metrics
5. Results and Discussion
5.1. Quantitative Results
5.2. Comparison with Other SOTA Methods
5.3. Discussion
- (1) Low dependency on the detection DNN.
- (2) Head tracking aids body tracking.
- (3) Avoidance of KF divergence issues.
- (4) Focus on metric performance under ID errors.
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix B
- Detector calculation → intra-frame processing → head_tracker → body_tracker → inter-frame matching (a minimal stub sketch of this per-frame loop follows).
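```python
# Every callable below is a hypothetical stub standing in for the corresponding
# module, not the paper's API; only the calling order matters here.
def detector(frame):                              # 1. detector calculation
    return [], []                                 # (head dets, body dets)

def intra_frame_filter(heads, bodies):            # 2. Algorithms 2 and 3
    return heads, bodies

def update_tracker(tracks, detections):           # 3-4. Algorithm 1 per class
    return tracks + list(detections)

def match_head_body(head_tracks, body_tracks):    # 5. Algorithm 4
    return []

head_tracks, body_tracks = [], []
for frame in range(8):                            # stand-in for video frames
    heads, bodies = detector(frame)
    heads, bodies = intra_frame_filter(heads, bodies)
    head_tracks = update_tracker(head_tracks, heads)
    body_tracks = update_tracker(body_tracks, bodies)
    pairs = match_head_body(head_tracks, body_tracks)
```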
Input frame #1

Step 1. Bounding boxes in (x1, y1, x2, y2, confidence, class_id) = tensor([
[1102.4229, 275.0947, 1219.5381, 563.0864, 0.9966, 0],   # body ID = 0
[733.7047, 1.0617, 789.8318, 97.3516, 0.9864, 0],        # body ID = 1
[1312.4270, 197.8513, 1434.9780, 470.1823, 0.9817, 0],   # body ID = 2
[601.3785, 0.0000, 654.7200, 104.9832, 0.9561, 0],       # body ID = 3
[1148.3512, 274.9141, 1189.9109, 322.3585, 0.9955, 1],   # head ID = 0
[1382.5782, 198.2840, 1422.7518, 249.7969, 0.9886, 1]])  # head ID = 1
Number of body IDs = 4; number of head IDs = 2.
Step 2. Algorithm 3: delta_del_body = [3]. The detection [601.3785, 0.0000, 654.7200, 104.9832, 0.9561, 0] is deleted, and the remaining bounding boxes are boxes_xyxy = [
[1102.422852, 275.094666, 1219.538086, 563.086426, 0.996561, 0],
[733.704712, 1.061707, 789.831787, 97.351639, 0.986442, 0],
[1312.427002, 197.851257, 1434.978027, 470.182251, 0.98167, 0],
[1148.351318, 274.914093, 1189.910889, 322.35849, 0.995536, 1],
[1382.578247, 198.283997, 1422.751831, 249.796875, 0.988596, 1]]
Step 3. Head tracker, the second match: matches_b, unmatched_tracks_b, unmatched_detections = [], [], [0, 1]; matches, unmatched_tracks, unmatched_detections = [], [], [0, 1]. The tracking IDs in the head sequence set are 0 and 1.
Step 4. Body tracker, the second match: matches_b, unmatched_tracks_b, unmatched_detections = [], [], [0, 1, 2]. The tracking IDs in the body sequence set are 0, 1, and 2.
Step 5. Inter-frame matching, cost_matrix = [
[0.003819, 100,000.],
[100,000., 100,000.],
[100,000., 0.]]
row_indices = bodies, col_indices = heads: [0 2] [0 1]. The matched body-IDs and head-IDs are self.match_body_head = [(0, 0), (2, 1), (1, None)].
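This step-5 assignment can be reproduced with SciPy’s Hungarian solver (a verification sketch, not the paper’s code):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows are body-IDs 0..2, columns head-IDs 0..1; 100,000 marks impossible pairs.
cost_matrix = np.array([[0.003819, 100_000.0],
                        [100_000.0, 100_000.0],
                        [100_000.0, 0.0]])
rows, cols = linear_sum_assignment(cost_matrix)
print(rows, cols)  # [0 2] [0 1] -> pairs (0, 0) and (2, 1); body 1 gets None
```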
Input frame #2
Step 1. Bounding boxes in (x1, y1, x2, y2, confidence, class_id) = tensor([
[1100.2812, 274.1615, 1218.8927, 563.6195, 0.9977, 0],
[1317.1239, 199.0990, 1435.9489, 468.3447, 0.9848, 0],
[733.1741, 0.5491, 783.7123, 96.3040, 0.9834, 0],
[598.9022, 0.0000, 644.2981, 103.0769, 0.9701, 0],
[1148.8118, 275.4363, 1190.7026, 322.5742, 0.9915, 1],
[1383.2902, 199.0062, 1423.4681, 248.5701, 0.9893, 1]])
Number of body IDs = 4; number of head IDs = 2.
Step 2. Algorithm 3: delta_del_body = [3]. The detection [598.9022, 0.0000, 644.2981, 103.0769, 0.9701, 0] is deleted.
Step 3. Head tracker, the second match: matches_b, unmatched_tracks_b, unmatched_detections = [], [], [].
Step 4. Body tracker, the second match: matches_b, unmatched_tracks_b, unmatched_detections = [], [], [].
Step 5. body-np_xyxy_final = [
[1100.495617, 274.254816, 1218.957032, 563.566199, 0.997711, 0],
[731.006493, 0.616912, 786.759376, 96.442545, 0.983394, 1],
[1315.511496, 198.934008, 1436.811807, 468.587705, 0.984758, 2]]
⋮ (frames #3–#6 processed analogously)
Input frame #7
Step 1. Bounding boxes in (x1, y1, x2, y2, confidence, class_id) = tensor([
[1103.0753, 278.1667, 1218.2045, 573.1316, 0.9963, 0],
[1327.7861, 199.1935, 1435.1885, 466.8471, 0.9849, 0],
[576.0401, 0.9442, 636.5553, 104.2498, 0.9785, 0],
[721.5332, 2.7390, 797.3731, 105.7765, 0.9664, 0],
[646.1354, 0.9752, 769.4316, 103.6911, 0.8158, 0],
[1153.6139, 279.1069, 1196.0039, 327.2707, 0.9968, 1],
[1388.5093, 198.7742, 1425.3273, 243.2225, 0.9838, 1]])
Number of body IDs = 5; number of head IDs = 2.
Step 2. Algorithm 3: delta_del_body = [4]. The detection [646.1354, 0.9752, 769.4316, 103.6911, 0.8158] is deleted.
Step 3. Head tracker, the second match: matches_b, unmatched_tracks_b, unmatched_detections = [], [], [].
Step 4. Body tracker, the second match: matches_b, unmatched_tracks_b, unmatched_detections = [], [3], [].
Step 5. head-candidates_tlwh = [
[1153.614397, 278.977509, 42.391931, 48.180364],
[1388.411052, 198.606596, 37.066294, 44.685888]]
body_id_list = [0, 2, 1]; head_id_list = [0, 1, None] # match between one newly added body-ID and two existing head-IDs
cost_matrix = [[100,000. 100,000.]] # 100,000 exceeds the threshold, so body-ID = 3 is paired with head-ID = None
self.match_body_head = [(0, 0), (2, 1), (1, None), (3, None)] # the newly added track ID’s bounding box is not in the current frame but will be shown in the next frame
body-np_xyxy_final = [
[1102.914528, 278.100212, 1218.567574, 572.995262, 0.996286, 0],
[570.531019, 0.777104, 631.442964, 104.358195, 0.978527, 1],
[1322.21706, 198.937752, 1440.727778, 466.620414, 0.984871, 2]]
Input frame #8
Step 5. body-np_xyxy_final = [
[1104.565028, 282.556513, 1212.796813, 574.978288, 0.99675, 0],
[563.08197, 1.296683, 623.345229, 103.507251, 0.983013, 1],
[1323.317717, 198.276943, 1441.398572, 466.717254, 0.983189, 2],
[722.836121, 2.213905, 787.554428, 106.905021, 0.941132, 3]] # newly added body-ID = 3
References
- Teizer, J. Status quo and open challenges in vision-based sensing and tracking of temporary resources on infrastructure construction sites. Adv. Eng. Inform. 2015, 29, 225–238. [Google Scholar] [CrossRef]
- Xiao, B.; Xiao, H.; Wang, J.; Chen, Y. Vision-based method for tracking workers by integrating deep learning instance segmentation in off-site construction. Autom. Constr. 2022, 136, 104148. [Google Scholar] [CrossRef]
- Golizadeh, H.; Hon, C.K.H.; Drogemuller, R.; Hosseini, M.R. Digital engineering potential in addressing causes of construction accidents. Autom. Constr. 2018, 95, 284–295. [Google Scholar] [CrossRef]
- Freimuth, H.; Koenig, M. Planning and executing construction inspections with unmanned aerial vehicles. Autom. Constr. 2018, 96, 540–553. [Google Scholar] [CrossRef]
- Guo, H.; Yu, Y.; Skitmore, M. Visualization technology-based construction safety management: A review. Autom. Constr. 2017, 73, 135–144. [Google Scholar] [CrossRef]
- Luiten, J.; Ošep, A.; Dendorfer, P.; Torr, P.; Geiger, A.; Leal-Taixé, L.; Leibe, B. HOTA: A Higher Order Metric for Evaluating Multi-object Tracking. Int. J. Comput. Vis. 2021, 129, 548–578. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. arXiv 2022, arXiv:2110.06864. [Google Scholar] [CrossRef]
- Wojke, N.; Bewley, A.; Paulus, D. Simple Online and Realtime Tracking with a Deep Association Metric. In Proceedings of the 24th IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649. [Google Scholar] [CrossRef]
- Cao, J.; Pang, J.; Weng, X.; Khirodkar, R.; Kitani, K. Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking. arXiv 2022, arXiv:2203.14360. [Google Scholar] [CrossRef]
- Liu, Y.; Zhou, Z.; Wang, Y.; Sun, C. Head-Integrated Detecting Method for Workers under Complex Construction Scenarios. Buildings 2024, 14, 859. [Google Scholar] [CrossRef]
- Dendorfer, P.; Ošep, A.; Milan, A.; Schindler, K.; Cremers, D.; Reid, I.; Roth, S.; Leal-Taixé, L. MOTChallenge: A Benchmark for Single-Camera Multiple Target Tracking. arXiv 2020, arXiv:2010.07548. [Google Scholar] [CrossRef]
- Leal-Taixé, L.; Milan, A.; Reid, I.; Roth, S.; Schindler, K. MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking. arXiv 2015, arXiv:1504.01942. [Google Scholar] [CrossRef]
- Ciaparrone, G.; Sánchez, F.L.; Tabik, S.; Troiano, L.; Tagliaferri, R.; Herrera, F. Deep Learning in Video Multi-Object Tracking: A Survey. arXiv 2019, arXiv:1907.12740. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv 2016, arXiv:1506.01497. [Google Scholar] [CrossRef] [PubMed]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. arXiv 2015, arXiv:1506.02640. [Google Scholar] [CrossRef]
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint Triplets for Object Detection. arXiv 2019, arXiv:1904.08189. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar] [CrossRef]
- Girshick, R. Fast R-CNN. arXiv 2015, arXiv:1504.08083. [Google Scholar] [CrossRef]
- Bewley, A.; Ge, Z.Y.; Ott, L.; Ramos, F.; Upcroft, B. Simple Online and Realtime Tracking. In Proceedings of the 23rd IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468. [Google Scholar] [CrossRef]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. arXiv 2018, arXiv:1703.06870. [Google Scholar] [CrossRef]
- Bashar, M.; Islam, S.; Hussain, K.K.; Hasan, M.B.; Rahman, A.B.M.A.; Kabir, M.H. Multiple Object Tracking in Recent Times: A Literature Review. arXiv 2022, arXiv:2209.04796. [Google Scholar] [CrossRef]
- Shuai, B.; Berneshawi, A.; Li, X.; Modolo, D.; Tighe, J. SiamMOT: Siamese Multi-Object Tracking. arXiv 2021, arXiv:2105.11595. [Google Scholar] [CrossRef]
- Meinhardt, T.; Kirillov, A.; Leal-Taixe, L.; Feichtenhofer, C. TrackFormer: Multi-Object Tracking with Transformers. arXiv 2021, arXiv:2101.02702. [Google Scholar] [CrossRef]
- Sun, P.; Cao, J.; Jiang, Y.; Zhang, R.; Xie, E.; Yuan, Z.; Wang, C.; Luo, P. TransTrack: Multiple Object Tracking with Transformer. arXiv 2020, arXiv:2012.15460. [Google Scholar] [CrossRef]
- Google. Google Colaboratory. 2023. Available online: https://colab.research.google.com/ (accessed on 29 August 2023).
- Maggiolino, G.; Ahmad, A.; Cao, J.; Kitani, K. Deep OC-SORT: Multi-Pedestrian Tracking by Adaptive Re-Identification. arXiv 2023, arXiv:2302.11813. [Google Scholar] [CrossRef]
- Yang, M.; Han, G.; Yan, B.; Zhang, W.; Qi, J.; Lu, H.; Wang, D. Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking. arXiv 2023, arXiv:2308.00783. [Google Scholar] [CrossRef]
- Duan, P.; Zhou, J.; Goh, Y.M. Spatial-temporal analysis of safety risks in trajectories of construction workers based on complex network theory. Adv. Eng. Inform. 2023, 5, 101990. [Google Scholar] [CrossRef]
- Aharon, N.; Orfaig, R.; Bobrovsky, B.-Z. BoT-SORT: Robust Associations Multi-Pedestrian Tracking. arXiv 2022, arXiv:2206.14651. [Google Scholar] [CrossRef]
- Du, Y.; Zhao, Z.; Song, Y.; Zhao, Y.; Su, F.; Gong, T.; Meng, H. StrongSORT: Make DeepSORT Great Again. arXiv 2022, arXiv:2202.13514. [Google Scholar] [CrossRef]
- Wang, Z.; Zhao, H.; Li, Y.L.; Wang, S.; Torr, P.; Bertinetto, L. Do Different Tracking Tasks Require Different Appearance Models? arXiv 2021, arXiv:2107.02156. [Google Scholar] [CrossRef]
- Zhang, Y.; Wang, C.; Wang, X.; Zeng, W.; Liu, W. FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking. arXiv 2020, arXiv:2004.01888. [Google Scholar] [CrossRef]
- Chu, P.; Wang, J.; You, Q.; Ling, H.; Liu, Z. TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking. arXiv 2021, arXiv:2104.00194. [Google Scholar] [CrossRef]
- Zheng, L.; Shen, L.; Tian, L.; Wang, S.; Bu, J.; Tian, Q. Person Re-identification Meets Image Search. arXiv 2015, arXiv:1502.02171. [Google Scholar] [CrossRef]
- Bernardin, K.; Stiefelhagen, R. Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. Eurasip J. Image Video Process. 2008, 2008, 246309. [Google Scholar] [CrossRef]
- Ristani, E.; Solera, F.; Zou, R.S.; Cucchiara, R.; Tomasi, C. Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. arXiv 2016, arXiv:1609.01775. [Google Scholar] [CrossRef]
- KubaRurak. Detectron2-Deepsort-Repo. 2021. Available online: https://github.com/KubaRurak/detectron2-deepsort-repo (accessed on 31 August 2023).
- Konstantinou, E.; Lasenby, J.; Brilakis, I. Adaptive computer vision-based 2D tracking of workers in complex environments. Autom. Constr. 2019, 103, 168–184. [Google Scholar] [CrossRef]
- JonathonLuiten. TrackEval. 2021. Available online: https://github.com/JonathonLuiten/TrackEval (accessed on 31 August 2023).
- Mikel-Brostrom. YOLO_Tracking. 2023. Available online: https://github.com/mikel-brostrom/yolo_tracking#real-time-multi-object-segmentation-and-pose-tracking-using-yolov8--yolo-nas--yolox-with-deepocsort-and-lightmbn (accessed on 31 August 2023).
- pmj110119. YOLOX_Deepsort_Tracker. 2021. Available online: https://github.com/pmj110119/YOLOX_deepsort_tracker (accessed on 31 August 2023).
- Xiao, B.; Kang, S.-C. Vision-Based Method Integrating Deep Learning Detection for Tracking Multiple Construction Machines. J. Comput. Civ. Eng. 2021, 35, 04020071. [Google Scholar] [CrossRef]
- Xiao, B.; Lin, Q.; Chen, Y. A vision-based method for automatic tracking of construction machines at nighttime based on deep learning illumination enhancement. Autom. Constr. 2021, 127, 13. [Google Scholar] [CrossRef]
- Google Drive. Deepsort_Parameters. 2023. Available online: https://drive.google.com/drive/folders/1xhG0kRH1EX5B9_Iz8gQJb7UNnn_riXi6 (accessed on 31 August 2023).
No. | SOTA Methods | Year | Information Types | Advantages (A) & Shortcomings (S)
---|---|---|---|---
1 | SORT [19] | 2016 | O: Faster R-CNN; A: None; M: KF + IoU + Hungarian. | A: presented as the baseline; KF state is [u, v, s, r, u̇, v̇, ṡ], with s = scale (area) and r = aspect ratio. S: highly dependent on detection performance, with many IDSWs; no occlusion handling.
2 | DeepSORT [8] | 2017 | O: Faster R-CNN; A: ReID (128-d); M: KF + cosine distance + IoU + Hungarian. | A: extends the baseline by integrating appearance information. KF state is [u, v, γ, h, u̇, v̇, γ̇, ḣ], with γ = aspect ratio and h = height. S: dependent on detection performance; occlusion-related IDSWs reduced but still frequent; constant-velocity model.
3 | ByteTrack [7] | 2022 | O: re-trained YOLOX-x on 1400 videos; A: ReID (1024-d); M: KF + cosine distance + IoU + Hungarian + re-matching of low-score boxes. | A: best performance on MOT20, with published code. KF state is [x, y, a, h, ẋ, ẏ, ȧ, ḣ], with a = aspect ratio and h = height. S: highly dependent on detection performance, with many IDSWs; no occlusion handling; constant-velocity model.
4 | OC_SORT [9] | 2022 | O: baseline detections in MOTChallenge; A: None; M: KF + IoU + Hungarian + re-update of KF + motion-direction difference. | A: first to explain the accumulation of KF prediction errors in detail; adds the motion-direction difference to the association cost matrix. KF state uses a = area and s = aspect ratio. S: not a truly online KF update (needs future frames); the constant-velocity assumption cannot remain effective during long-term occlusions; dependent on detection performance.
5 | Deep OC_SORT [26] | 2023 | O: YOLOX; A: ReID (SBS50, 287 MB) + camera motion compensation + dynamic appearance; M: KF + IoU + Hungarian + re-update of KF + motion-direction difference. | A: applies camera motion compensation to correct the KF state for better bounding-box locations; uses detection confidence to modulate the ReID output vectors. KF state uses a = area and s = aspect ratio. S: the same as OC_SORT; constant-velocity model.
6 | BoTSORT [29] | 2022 | O: Faster R-CNN; A: ReID + camera motion compensation; M: KF + cosine distance + IoU + Hungarian. | A: modifies the KF state (s = area and a = aspect ratio); applies camera motion compensation to reduce errors from moving cameras; uses a new cost matrix that weights appearance cost and motion cost. S: the same as OC_SORT; constant-velocity model; slow when working with sparse optical flow.
7 | Strong_SORT [30] | 2022 | O: YOLOX-x; A: ReID (BoT) + camera motion compensation; M: NSA-KF + cosine distance + IoU + Hungarian. | A: applies a new cost matrix that weights appearance cost and motion cost. KF state uses a = aspect ratio and h = height. S: MOTA is slightly lower, mainly because the high detection-score threshold leads to many missed detections; low running speed.
8 | TransTrack [24] | 2020 | O: re-trained Transformer; A: None; M: None. | A: self-attention mechanism and query-key pipeline. S: hard to train; no motion information used; as a JDT method, it does not outperform TBD.
9 | UniTrack [31] | 2021 | O: ResNet-50; A: ImageNet-supervised appearance model; M: KF + cosine distance + IoU + Hungarian. | A: supports different tracking tasks and can leverage many existing general-purpose appearance models. KF state uses a = aspect ratio and h = height. S: weaker metric performance.
10 | FairMOT [32] | 2020 | O: encoder–decoder; A: encoder–decoder; M: KF + cosine distance + IoU + Hungarian. | A: one encoder–decoder network obtains observations and appearance simultaneously, with no need for an independent ReID model. S: needs about 30 h of training on two RTX 2080 Ti GPUs; still a SORT-style method.
11 | TransMOT [33] | 2021 | O: spatial–temporal graph Transformer; A: None; M: KF + cosine distance + IoU + Hungarian. | A: a cascaded association structure to handle low-confidence detections and long-term occlusion. S: relatively large demands on computing resources and data; no public code.
12 | Hybrid-SORT [27] | 2023 | O: YOLOX-x; A: ReID + camera motion compensation; M: KF + cosine distance + IoU + Hungarian + weak cues. | A: applies a new cost matrix that weights appearance cost, motion cost, the velocity directions of the four corners, and height-modulated IoU. KF state uses r = aspect ratio, s = area, and c = confidence score. S: dependent on detection performance.
13 | Xiao et al. [2] | 2023 | O: Mask R-CNN; A: ReID (128-d); M: KF + cosine distance + IoU + Hungarian. | A: baseline in worker tracking. S: needs retraining on a new dataset; does not address severe occlusions; no public code.
No. | Metrics | Video-1 | Video-2 | Video-3 | Video-4 | Video-5 | Video-6 | Video-7 | Video-8 | Video-9 | Combined
---|---|---|---|---|---|---|---|---|---|---|---
1 | MOTA↑ (%) | 100 | 94.08 | 93.943 | 93.582 | 94.833 | 98.859 | 95.44 | 96.843 | 93.972 | 95.191 |
2 | IDF1↑ (%) | 100 | 97.04 | 96.966 | 96.824 | 97.468 | 99.431 | 97.73 | 98.422 | 97.007 | 97.609 |
3 | HOTA↑ (%) | 93.023 | 77.601 | 81.446 | 75.7 | 74.801 | 76.911 | 78.273 | 81.799 | 77.446 | 78.884 |
4 | AssA↑ (%) | 93.023 | 79.784 | 86.066 | 78.962 | 81.28 | 85.451 | 80.484 | 83.341 | 78.149 | 83.296 |
5 | AssRe↑ (%) | 94.298 | 84.026 | 89.916 | 84.525 | 87.219 | 89.145 | 85.352 | 86.762 | 84.28 | 87.83 |
6 | AssPr↑ (%) | 94.298 | 84.026 | 90.052 | 83.247 | 84.142 | 88.895 | 84.634 | 86.689 | 83.016 | 86.976 |
7 | LocA↑ (%) | 92.568 | 83.922 | 86.915 | 82.642 | 82.643 | 81.798 | 84.326 | 85.815 | 85.614 | 84.725 |
8 | IDSW↓ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
9 | Frag↓ | 0 | 4 | 7 | 21 | 22 | 4 | 6 | 10 | 12 | 86 |
10 | Hz↑ | 9.05 | 6.01 | 5.69 | 5.96 | 7.11 | 6.69 | 6.62 | 5.33 | 6.82 | 6.58 |
11 | MT | 1 | 3 | 9 | 5 | 5 | 4 | 4 | 6 | 4 | 41 |
No. | Metrics | DeepSORT | ByteTrack | Deep OC_SORT | BoTSORT | OC_SORT | Strong_SORT | TransTrack | UniTrack
---|---|---|---|---|---|---|---|---|---
1 | MOTA↑ (%) | 69.914 | 62.515 | 56.429 | 60.699 | 66.002 | 56.698 | 39.345 | 27.441 |
2 | IDF1↑ (%) | 79.771 | 81.829 | 74.471 | 77.365 | 80.633 | 77.065 | 70.459 | 61.887 |
3 | HOTA↑ (%) | 68.418 | 66.915 | 62.425 | 63.208 | 66.596 | 63.893 | 57.607 | 49.201 |
4 | AssA↑ (%) | 74.585 | 77.995 | 71.394 | 73.429 | 76.166 | 74.7 | 72.685 | 63.686 |
5 | AssRe↑ (%) | 78.896 | 82.717 | 75.462 | 76.777 | 80.316 | 78.679 | 76.712 | 68.208 |
6 | AssPr↑ (%) | 84.488 | 83.637 | 83.043 | 85.385 | 86.221 | 84.348 | 82.495 | 74.389 |
7 | LocA↑ (%) | 85.353 | 80.694 | 82.01 | 82.833 | 82.657 | 82.058 | 80.222 | 69.791 |
8 | IDSW↓ | 21 | 5 | 51 | 8 | 10 | 12 | 7 | 15 |
9 | Frag↓ | 101 | 107 | 178 | 180 | 168 | 140 | 161 | 268 |
10 | Hz↑ | 1.77 | 9.09 | 8.33 | 7.14 | 8.47 | 7.69 | 5.13 | 4.30 |