Improved DeepSORT-Based Object Tracking in Foggy Weather for AVs Using Semantic Labels and Fused Appearance Feature Network
Abstract
1. Introduction
- Instead of the single-modality sensor system (video sequences only) used for the detection phase in [9,10], we employed our deep camera-radar fusion network (CR-YOLOnet model) [31] for faster and more accurate object detection in the detection phase of our improved deep learning-based MOT in Figure 2. Our CR-YOLOnet model reached an accuracy (mAP at 0.50) of 0.849 and a speed of 69 fps for small and distant object detection under medium and heavy fog conditions.
- We simulated a real-time autonomous driving environment in CARLA [32]. In addition to the radar and camera sensors, we obtained semantic labels of the ego vehicle's environment using semantic segmentation cameras. A semantic segmentation camera renders each object in its field of view with a distinct color corresponding to a predetermined object category (label). We fed the semantic labels into the segmentation module of our own deep convolutional neural network-based appearance descriptor.
- We replaced the basic convolutional neural network used for the appearance descriptor in DeepSORT with our deep convolutional neural network. Our deep appearance descriptor uses a cross-stage partial connection (CSP)-based backbone for low-level feature extraction and a feature pyramid network (FPN)-based neck that produces multi-scale feature vectors to handle objects of different sizes. We incorporated GhostNet into our deep appearance descriptor to replace the traditional convolutional layers used in standard neural networks (a minimal sketch of a Ghost module follows this list). Using GhostNet helps to: (i) generate more features, thus improving the integrity of the features extracted for an accurate detection-prediction match; (ii) reduce the number of parameters, computational complexity, and cost, thus improving tracking speed without diminishing the output feature map.
- We incorporated a segmentation module that uses the semantic labels to add rich semantic information to the low-level appearance feature map. With semantic labels, the segmentation module helps the deep appearance descriptor distinguish between objects with similar appearances, even when the background is noisy.
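To make the GhostNet substitution concrete, the following is a minimal PyTorch sketch of a Ghost module: a narrow "primary" convolution produces the intrinsic feature maps, and cheap depthwise convolutions generate the additional "ghost" maps that are concatenated to form the full output. The channel counts, kernel sizes, and ratio below are illustrative placeholders, not the exact configuration of our appearance descriptor.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Ghost module sketch: a narrow standard convolution plus cheap depthwise
    operations stand in for a full-width convolutional layer."""

    def __init__(self, in_ch, out_ch, ratio=2, kernel_size=1, dw_size=3):
        super().__init__()
        init_ch = out_ch // ratio          # intrinsic channels (out_ch divisible by ratio here)
        new_ch = out_ch - init_ch          # "ghost" channels
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, kernel_size, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(init_ch),
            nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(        # depthwise convolution = cheap operation
            nn.Conv2d(init_ch, new_ch, dw_size, padding=dw_size // 2,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(new_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        intrinsic = self.primary(x)
        ghost = self.cheap(intrinsic)
        return torch.cat([intrinsic, ghost], dim=1)

# Example: a Ghost module standing in for a 64 -> 128 standard convolution.
features = GhostModule(64, 128)(torch.randn(1, 64, 56, 56))
print(features.shape)  # torch.Size([1, 128, 56, 56])
```

Because only the intrinsic channels pass through the full convolution, the module emits the same number of output channels with substantially fewer parameters and multiply-accumulate operations than the standard layer it replaces.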
2. Related Works
2.1. Object Detection
2.2. Tracking by Detection
2.3. Issues Related to the Use of Semantic Segmentation Cameras and Alternative Solutions
3. Methods
3.1. Experimental Platform
3.2. Datasets and Semantic Labels
3.3. Object Detection Model
3.4. The Object Tracking Method
- (i) The KF takes the detection information (the object bounding box provided by CR-YOLOnet) as the input measurement and predicts the target object's future state (position) in the detection frame, yielding the prior estimate of the target object's state. In parallel, CR-YOLOnet extracts and saves the feature information of the target object in the detection frame.
- (ii) The Hungarian algorithm uses the appearance features and the Mahalanobis distance to associate target objects in the detection frame with existing tracks. If the association procedure produces a successful match, the system proceeds to a Kalman update and returns the tracking results. If there is no match, cascade matching is employed to associate the unmatched detection frame and track. Each track maintains its own time-update parameter, “time_since_update”, throughout the cascade matching process. Since the tracks are sorted by “time_since_update”, the unmatched detection frame is first associated with the track with the smallest “time_since_update”, preventing tracking failure while decreasing the frequency of ID switches.
- (iii) To decide whether the target object in the detection frame and the track share the same ID and are a match, we use the percentage of overlapping area as the similarity measure. If the similarity computation produces a successful match, the system proceeds to a Kalman update and returns the tracking results. If there is no match, the unmatched track is retained and compared against new detection frames over the subsequent five frames; if a matching detection frame is found within those five frames, the tracking result is returned. Otherwise, the target is deleted after five frames. A condensed sketch of this matching-and-maintenance loop is given after this list.
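The following is a condensed Python sketch of the matching-and-maintenance loop in steps (i)-(iii). The Track objects (with predict(), update(), and time_since_update) and the cost_fn callable (e.g., the combined motion/appearance cost of Section 3.4.2) are placeholders for illustration rather than our exact implementation; only the Hungarian solver from SciPy is a real library call.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm

MAX_AGE = 5      # a track is deleted after five consecutive unmatched frames
GATE = 0.7       # illustrative gate on the association cost

def hungarian_match(cost):
    """Hungarian assignment on a cost matrix; pairs whose cost exceeds GATE are rejected."""
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= GATE]
    matched_r = {r for r, _ in matches}
    matched_c = {c for _, c in matches}
    un_rows = [r for r in range(cost.shape[0]) if r not in matched_r]
    un_cols = [c for c in range(cost.shape[1]) if c not in matched_c]
    return matches, un_rows, un_cols

def cascade_match(tracks, detections, cost_fn):
    """Associate the most recently updated tracks first, as in step (ii)."""
    matches, unmatched_dets = [], list(range(len(detections)))
    for age in range(MAX_AGE + 1):                       # smallest time_since_update first
        idx = [i for i, t in enumerate(tracks) if t.time_since_update == age]
        if not idx or not unmatched_dets:
            continue
        cost = cost_fn([tracks[i] for i in idx],
                       [detections[j] for j in unmatched_dets])
        m, _, un_d = hungarian_match(cost)
        matches += [(idx[r], unmatched_dets[c]) for r, c in m]
        unmatched_dets = [unmatched_dets[c] for c in un_d]
    matched_tracks = {t for t, _ in matches}
    unmatched_tracks = [i for i in range(len(tracks)) if i not in matched_tracks]
    return matches, unmatched_tracks, unmatched_dets

def maintain(tracks, detections, matches, unmatched_tracks):
    """Kalman update for matched tracks; age and prune the rest, as in step (iii)."""
    for t_idx, d_idx in matches:
        tracks[t_idx].update(detections[d_idx])          # Kalman correction
        tracks[t_idx].time_since_update = 0
    for t_idx in unmatched_tracks:
        tracks[t_idx].time_since_update += 1
    return [t for t in tracks if t.time_since_update <= MAX_AGE]
```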
3.4.1. Kalman Filter Prediction of Target Object State
- Tracking Space: The tracking space in autonomous vehicle systems is the coordinate system used to track objects. Our method performs tracking in image-plane coordinates, which offers several advantages and limitations. Image-coordinate tracking entails observing variations in the location and dimensions of objects as they appear in the two-dimensional projection of the scene recorded by the vehicle-mounted camera. This approach is very advantageous for identifying and monitoring objects in real time because of its efficient processing and compatibility with a wide range of vision-based algorithms [3,81]. Nevertheless, to track objects accurately, it is important to understand the correlation between coordinates in the image plane and the corresponding spatial coordinates in the real world. The main obstacle is that the two-dimensional image plane does not directly provide depth information, which complicates the extraction of precise distances and relative velocities of objects. These measurements are essential for the navigation and decision-making processes of autonomous vehicles [82].
- Physical Interpretation of Tracking Parameters: Image-based tracking represents the physical parameters of tracked objects in the image plane, such as the object's position, height, and aspect ratio, as described in Equation (3). These parameters change over time because of the relative motion between the camera (which is attached to the vehicle) and the objects in the scene. The key parameters and their physical interpretations are as follows (a minimal state-prediction sketch follows this list):
- Position: The coordinates of the object in the image plane. These coordinates represent the location of the object’s bounding box within the 2D image frame [83].
- Aspect Ratio: The aspect ratio is the ratio of the width to the height of the object’s bounding box. Changes in aspect ratio can indicate changes in the object’s orientation relative to the camera. For example, a vehicle turning relative to the camera may show a different aspect ratio over time [84].
- Height: The height is the vertical dimension of the object’s bounding box in the image plane. This parameter changes as the object moves closer to or further away from the camera, affecting its apparent size. Changes in the height of the bounding box can indicate relative changes in distance between the object and the camera [85].
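To illustrate how these image-plane parameters enter the Kalman filter of Section 3.4.1, the sketch below runs a constant-velocity predict/update cycle over the state [u, v, a, h] (center position, aspect ratio, bounding-box height) and its velocities in plain NumPy. The noise covariances and example values are placeholders; the exact parameterization follows Equation (3) in the paper.

```python
import numpy as np

dt = 1.0                      # one frame between predictions
dim = 4                       # measured quantities: center x, center y, aspect ratio, height

# State x = [u, v, a, h, du, dv, da, dh]^T with a constant-velocity motion model.
F = np.eye(2 * dim)
F[:dim, dim:] = dt * np.eye(dim)                      # position += velocity * dt
H = np.hstack([np.eye(dim), np.zeros((dim, dim))])    # we observe only [u, v, a, h]

Q = 1e-2 * np.eye(2 * dim)    # process noise (placeholder values)
R = 1e-1 * np.eye(dim)        # measurement noise (placeholder values)

def predict(x, P):
    """Prior estimate of the target state and covariance for the next frame."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Kalman correction with a new detection z = [u, v, a, h]."""
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(2 * dim) - K @ H) @ P
    return x, P

# Example: one predict/update cycle from an initial detection.
x = np.array([320.0, 240.0, 0.5, 80.0, 0.0, 0.0, 0.0, 0.0])
P = np.eye(2 * dim)
x, P = predict(x, P)
x, P = update(x, P, np.array([324.0, 241.0, 0.5, 82.0]))
```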
3.4.2. Matching New Detection Measurements and Predicted Target States
- (i) Mahalanobis distance:
- (ii) Appearance feature matching (a combined-cost sketch follows this list):
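The sketch below shows one way the two cues can be combined into a single association cost: the squared Mahalanobis distance gates physically implausible pairs (9.4877 is the standard 95% chi-square threshold for a four-dimensional measurement, as in DeepSORT), and the smallest cosine distance to a track's stored appearance descriptors supplies the appearance term. The weighting factor lam is an illustrative value, not the one used in our experiments.

```python
import numpy as np

CHI2_GATE_4D = 9.4877   # 95% chi-square gate for a 4-dimensional measurement

def mahalanobis_sq(z, z_pred, S):
    """Squared Mahalanobis distance between a detection z and the predicted
    measurement z_pred with innovation covariance S."""
    d = z - z_pred
    return float(d @ np.linalg.inv(S) @ d)

def cosine_distance(det_feat, track_gallery):
    """Smallest cosine distance between a detection descriptor and the
    descriptors previously stored for a track (one row per stored descriptor)."""
    det = det_feat / np.linalg.norm(det_feat)
    gal = track_gallery / np.linalg.norm(track_gallery, axis=1, keepdims=True)
    return float(1.0 - np.max(gal @ det))

def combined_cost(z, z_pred, S, det_feat, track_gallery, lam=0.1):
    """Weighted sum of motion and appearance terms; gated pairs receive infinite cost."""
    d_motion = mahalanobis_sq(z, z_pred, S)
    if d_motion > CHI2_GATE_4D:
        return np.inf                      # reject physically implausible pairs
    return lam * d_motion + (1.0 - lam) * cosine_distance(det_feat, track_gallery)
```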
3.5. The Appearance Feature Extraction Model
3.5.1. GhostNet for Improved Performance and Reduced Computational Complexity and Cost
3.5.2. Segmentation Module for an Improved Appearance Feature
4. Improved Deep Appearance Feature Extraction Network
4.1. The Architecture of Our Appearance Feature Extraction Network
4.2. Training
5. Multi-Object Tracking Experimental Results and Discussion
5.1. Comparison of Multi-Object Tracking Performance Using Our CARLA Dataset
- The multi-object tracking accuracy (MOTA) describes the overall tracking accuracy with respect to false positives (FP), false negatives (FN), and identity switches (IDS), and it is expressed in Equation (27).
- The multi-object tracking precision (MOTP) describes the overall tracking precision, measured as the amount of overlap between the actual bounding boxes and the predicted positions, and it is expressed in Equation (28). The standard forms of both metrics are recalled after this list.
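In their standard CLEAR MOT form, which Equations (27) and (28) are expected to match, the two metrics read as follows, with GT_t the number of ground-truth objects in frame t, c_t the number of matched track-object pairs, and d_{t,i} the bounding-box overlap of match i in frame t:

```latex
\mathrm{MOTA} = 1 - \frac{\sum_{t}\left(\mathrm{FN}_{t} + \mathrm{FP}_{t} + \mathrm{IDS}_{t}\right)}{\sum_{t}\mathrm{GT}_{t}},
\qquad
\mathrm{MOTP} = \frac{\sum_{t,i} d_{t,i}}{\sum_{t} c_{t}}
```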
5.2. Qualitative Results of Multi-Object Tracking Performance on Our CARLA Dataset
5.3. Addressing Fundamental Issues and the Real-Life Applicability of Our Proposed Method
5.3.1. Domain Adaptation
- Adversarial Training: Adversarial domain adaptation employs a domain discriminator to differentiate between features from the source and target domains. The model is trained to confuse the domain discriminator, thereby acquiring domain-invariant features. This method enhances the model's ability to generalize across diverse domains [97].
- Feature Alignment: Feature alignment approaches seek to align the feature distributions of the source and target domains. This can be accomplished using techniques such as Maximum Mean Discrepancy (MMD) and Correlation Alignment (CORAL). By aligning the distributions, the model learns features that are significant in both domains [98] (a minimal MMD sketch follows this list).
- Self-Training: Self-training is a process in which the model uses its own predictions on the target-domain data as pseudo-labels for retraining. This iterative method lets the model adapt steadily to the target domain by refining its predictions and improving its performance [99].
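As a concrete example of feature alignment, the sketch below computes an RBF-kernel Maximum Mean Discrepancy (MMD) loss between batches of source- and target-domain features in PyTorch. The kernel bandwidth, feature dimensions, and the choice of CARLA features as the source domain are illustrative assumptions, not part of the trained system reported above.

```python
import torch

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between two batches of feature vectors."""
    d2 = torch.cdist(a, b) ** 2
    return torch.exp(-d2 / (2.0 * sigma ** 2))

def mmd_loss(source_feats, target_feats, sigma=1.0):
    """Squared Maximum Mean Discrepancy between source- and target-domain features."""
    k_ss = rbf_kernel(source_feats, source_feats, sigma).mean()
    k_tt = rbf_kernel(target_feats, target_feats, sigma).mean()
    k_st = rbf_kernel(source_feats, target_feats, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st

# Example: align simulated (CARLA) and real-domain appearance features.
sim_feats = torch.randn(32, 128)     # features from a source (simulated) batch
real_feats = torch.randn(32, 128)    # features from a target (real-world) batch
loss = mmd_loss(sim_feats, real_feats)
```

In practice, such a term would be added to the task loss with a small weighting factor, e.g., total = task_loss + lambda * mmd_loss(...).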
5.3.2. Transfer Learning
- Fine-Tuning: Fine-tuning refers to initially training a model on a large dataset from one domain and subsequently refining it using target-domain data. This method capitalizes on the knowledge obtained from pre-training and adapts it to the specific target domain [100] (a minimal fine-tuning sketch follows this list).
- Domain Randomization: Domain randomization is a simulation approach that randomizes elements of the simulated environment, such as lighting and textures, to produce a diverse range of scenarios. Training on this variety helps the model learn robust features that generalize well, so that it can perform effectively in real-world situations [101].
- Multi-Task Learning: Multi-task learning trains a model on several related tasks simultaneously. This methodology enables the model to share representations across tasks, enhancing its ability to generalize and perform well in the target domain. For instance, a model trained to perform both object detection and semantic segmentation can exploit the knowledge shared between these two tasks [102].
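A minimal sketch of the fine-tuning recipe is given below: a network pre-trained on simulated data keeps its backbone frozen while the head is retrained on real-world data. The attribute names model.backbone and model.head, the loss, and the hyperparameters are hypothetical placeholders rather than the structure of our network.

```python
import torch
import torch.nn as nn

def fine_tune(model, real_loader, epochs=10, lr=1e-4, device="cuda"):
    """Freeze the pre-trained backbone and retrain only the head on
    target-domain (real-world) data."""
    model.to(device)
    for p in model.backbone.parameters():
        p.requires_grad = False                  # keep features learned on simulated data
    optimizer = torch.optim.Adam(model.head.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()            # placeholder task loss
    model.train()
    for _ in range(epochs):
        for images, labels in real_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```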
5.3.3. Applications and Benefits
- Minimizing the Requirement for Abundant Real-World Data: By utilizing simulated data and transferring expertise to real-world applications, these methods diminish the reliance on substantial quantities of labeled real-world data, which may be expensive and time-consuming to gather.
- Improving Model Robustness and Generalization: Domain adaptation and transfer learning improve the ability of models to handle varied real-world settings, enhancing their robustness and generalization capabilities.
- Accelerating Development and Deployment: Using readily available simulated data for initial training and subsequently adapting the models to real-world settings speeds up model development and deployment.
5.3.4. Potential Proof of Concept
- Data Collection and Preprocessing: The data collection and preprocessing phase will include collecting data from the KITTI and Waymo Open Datasets, with the aim of covering a wide range of driving scenarios and conditions. The data will undergo preprocessing to synchronize and calibrate the sensor inputs, such as the radar and camera data, as described above.
- Model Training and Adaptation: The models will undergo initial training using simulated data from the CARLA simulator. Subsequently, domain adaptation and transfer learning methodologies will be utilized to customize these models for real-world data. The training procedure will involve adjusting hyperparameters to enhance performance on various datasets.
- Performance Evaluation: The adapted models will be evaluated using standard metrics such as Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and F1-score (a small metric-computation sketch follows this list). The effectiveness of the adaptation approaches will be assessed by comparing performance on simulated data with performance on real-world data. The expected results are intended to demonstrate that the adapted models maintain a high level of accuracy and resilience, thereby confirming the validity of the proposed methodology.
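For completeness, the error metrics named above are straightforward to compute; a small NumPy sketch with illustrative values is shown below.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Example with illustrative values (e.g., predicted vs. ground-truth distances in meters).
y_true = np.array([12.0, 8.5, 30.2])
y_pred = np.array([11.4, 9.0, 28.9])
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```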
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Nabati, M.R. Sensor Fusion for Object Detection and Tracking in Autonomous Vehicles. Ph.D. Thesis, University of Tennessee, Knoxville, TN, USA, 2021. [Google Scholar]
- Zhang, X.-Q.; Jiang, R.-H.; Fan, C.-X.; Tong, T.-Y.; Wang, T.; Huang, P.-C. Advances in Deep Learning Methods for Visual Tracking: Literature Review and Fundamentals. Int. J. Autom. Comput. 2021, 18, 311–333. [Google Scholar] [CrossRef]
- Wu, Z.; Li, F.; Zhu, Y.; Lu, K.; Wu, M. Design of a Robust System Architecture for Tracking Vehicle on Highway Based on Monocular Camera. Sensors 2022, 22, 3359. [Google Scholar] [CrossRef] [PubMed]
- Jang, J.; Seon, M.; Choi, J. Lightweight Indoor Multi-Object Tracking in Overlapping FOV Multi-Camera Environments. Sensors 2022, 22, 5267. [Google Scholar] [CrossRef] [PubMed]
- Li, J.; Ding, Y.; Wei, H.-L.; Zhang, Y.; Lin, W. SimpleTrack: Rethinking and Improving the JDE Approach for Multi-Object Tracking. Sensors 2022, 22, 5863. [Google Scholar] [CrossRef] [PubMed]
- Zhang, J.; Hu, T.; Shao, X.; Xiao, M.; Rong, Y.; Xiao, Z. Multi-Target Tracking Using Windowed Fourier Single-Pixel Imaging. Sensors 2021, 21, 7934. [Google Scholar] [CrossRef] [PubMed]
- Diab, M.S.; Elhosseini, M.A.; El-Sayed, M.S.; Ali, H.A. Brain Strategy Algorithm for Multiple Object Tracking Based on Merging Semantic Attributes and Appearance Features. Sensors 2021, 21, 7604. [Google Scholar] [CrossRef]
- Bar-Shalom, Y.; Daum, F.; Huang, J. The probabilistic data association filter. IEEE Control Syst. Mag. 2009, 29, 82–100. [Google Scholar] [CrossRef]
- Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468. [Google Scholar]
- Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649. [Google Scholar]
- Parico, A.I.B.; Ahamed, T. Real Time Pear Fruit Detection and Counting Using YOLOv4 Models and Deep SORT. Sensors 2021, 21, 4803. [Google Scholar] [CrossRef] [PubMed]
- Qiu, Z.; Zhao, N.; Zhou, L.; Wang, M.; Yang, L.; Fang, H.; He, Y.; Liu, Y. Vision-Based Moving Obstacle Detection and Tracking in Paddy Field Using Improved Yolov3 and Deep SORT. Sensors 2020, 20, 4082. [Google Scholar] [CrossRef]
- Zhao, Y.; Zhou, X.; Xu, X.; Jiang, Z.; Cheng, F.; Tang, J.; Shen, Y. A Novel Vehicle Tracking ID Switches Algorithm for Driving Recording Sensors. Sensors 2020, 20, 3638. [Google Scholar] [CrossRef]
- Pereira, R.; Carvalho, G.; Garrote, L.; Nunes, U.J. Sort and Deep-SORT Based Multi-Object Tracking for Mobile Robotics: Evaluation with New Data Association Metrics. Appl. Sci. 2022, 12, 1319. [Google Scholar] [CrossRef]
- Lee, S.; Kim, E. Multiple object tracking via feature pyramid siamese networks. IEEE Access 2018, 7, 8181–8194. [Google Scholar] [CrossRef]
- Jin, J.; Li, X.; Li, X.; Guan, S. Online multi-object tracking with Siamese network and optical flow. In Proceedings of the 2020 IEEE 5th International Conference on Image, Vision and Computing (ICIVC), Beijing, China, 10–12 July 2020; pp. 193–198. [Google Scholar]
- De Ponte Müller, F. Survey on Ranging Sensors and Cooperative Techniques for Relative Positioning of Vehicles. Sensors 2017, 17, 271. [Google Scholar] [CrossRef] [PubMed]
- De-Las-Heras, G.; Sánchez-Soriano, J.; Puertas, E. Advanced Driver Assistance Systems (ADAS) Based on Machine Learning Techniques for the Detection and Transcription of Variable Message Signs on Roads. Sensors 2021, 21, 5866. [Google Scholar] [CrossRef] [PubMed]
- Fayyad, J.; Jaradat, M.A.; Gruyer, D.; Najjaran, H. Deep Learning Sensor Fusion for Autonomous Vehicle Perception and Localization: A Review. Sensors 2020, 20, 4220. [Google Scholar] [CrossRef] [PubMed]
- Al-Haija, Q.A.; Gharaibeh, M.; Odeh, A. Detection in Adverse Weather Conditions for Autonomous Vehicles via Deep Learning. AI 2022, 3, 303–317. [Google Scholar] [CrossRef]
- Bijelic, M.; Gruber, T.; Mannan, F.; Kraus, F.; Ritter, W.; Dietmayer, K.; Heide, F. Seeing through fog without seeing fog: Deep multimodal sensor fusion in unseen adverse weather. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11682–11692. [Google Scholar]
- Hasirlioglu, S.; Riener, A. Challenges in Object Detection under Rainy Weather Conditions; Springer: Cham, Switzerland, 2019; pp. 53–65. [Google Scholar]
- Song, R.; Wetherall, J.; Maskell, S.; Ralph, F.J. Weather Effects on Obstacle Detection for Autonomous Car. In Proceedings of the International Conference on Vehicle Technology and Intelligent Transport Systems, Prague, Czech Republic, 2–4 May 2020. [Google Scholar]
- Zang, S.; Ding, M.; Smith, D.; Tyler, P.; Rakotoarivelo, T.; Kaafar, M.A. The Impact of Adverse Weather Conditions on Autonomous Vehicles: How Rain, Snow, Fog, and Hail Affect the Performance of a Self-Driving Car. IEEE Veh. Technol. Mag. 2019, 14, 103–111. [Google Scholar] [CrossRef]
- Ogunrinde, I.; Bernadin, S. A Review of the Impacts of Defogging on Deep Learning-Based Object Detectors in Self-Driving Cars. In Proceedings of the SoutheastCon 2021, Atlanta, GA, USA, 10–13 March 2021; pp. 1–8. [Google Scholar]
- Tan, R.T. Visibility in bad weather from a single image. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8. [Google Scholar]
- Choi, W.Y.; Yang, J.H.; Chung, C.C. Data-Driven Object Vehicle Estimation by Radar Accuracy Modeling with Weighted Interpolation. Sensors 2021, 21, 2317. [Google Scholar] [CrossRef] [PubMed]
- Nabati, R.; Qi, H. Radar-Camera Sensor Fusion for Joint Object Detection and Distance Estimation in Autonomous Vehicles. arXiv 2020, arXiv:2009.08428. [Google Scholar]
- Chang, S.; Zhang, Y.; Zhang, F.; Zhao, X.; Huang, S.; Feng, Z.; Wei, Z. Spatial Attention Fusion for Obstacle Detection Using MmWave Radar and Vision Sensor. Sensors 2020, 20, 956. [Google Scholar] [CrossRef]
- Zhang, X.; Zhou, M.; Qiu, P.; Huang, Y.; Li, J. Radar and vision fusion for the real-time obstacle detection and identification. Ind. Robot Int. J. Robot. Res. Appl. 2019, 46, 391–395. [Google Scholar] [CrossRef]
- Ogunrinde, I.; Bernadin, S. Deep Camera-Radar Fusion with an Attention Framework for Autonomous Vehicle Vision in Foggy Weather Conditions. Sensors 2023, 23, 6255. [Google Scholar] [CrossRef] [PubMed]
- Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An open urban driving simulator. In Proceedings of the 1st Annual Conference on Robot Learning, Mountain View, CA, USA, 13–15 November 2017. [Google Scholar]
- Ahmed, M.; Hashmi, K.A.; Pagani, A.; Liwicki, M.; Stricker, D.; Afzal, M.Z. Survey and Performance Analysis of Deep Learning Based Object Detection in Challenging Environments. Sensors 2021, 21, 5116. [Google Scholar] [CrossRef] [PubMed]
- Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53. [Google Scholar] [CrossRef] [PubMed]
- Jiao, L.; Zhang, F.; Liu, F.; Yang, S.; Li, L.; Feng, Z.; Qu, R. A Survey of Deep Learning-Based Object Detection. IEEE Access 2019, 7, 128837–128868. [Google Scholar] [CrossRef]
- Abdu, F.J.; Zhang, Y.; Fu, M.; Li, Y.; Deng, Z. Application of Deep Learning on Millimeter-Wave Radar Signals: A Review. Sensors 2021, 21, 1951. [Google Scholar] [CrossRef] [PubMed]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Benjdira, B.; Khursheed, T.; Koubaa, A.; Ammar, A.; Ouni, K. Car detection using unmanned aerial vehicles: Comparison between faster r-cnn and yolov3. In Proceedings of the 2019 1st International Conference on Unmanned Vehicle Systems-Oman (UVS), Muscat, Oman, 5–7 February 2019; pp. 1–6. [Google Scholar]
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Jocher, G.; Nishimura, K.; Mineeva, T.; Vilariño, R. YOLOv5 (2020). Available online: https://github.com/ultralytics/yolov5 (accessed on 16 August 2022).
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
- Chadwick, S.; Maddern, W.; Newman, P. Distant vehicle detection using radar and vision. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 8311–8317. [Google Scholar]
- John, V.; Nithilan, M.; Mita, S.; Tehrani, H.; Sudheesh, R.; Lalu, P. So-net: Joint semantic segmentation and obstacle detection using deep fusion of monocular camera and radar. In Proceedings of the Pacific-Rim Symposium on Image and Video Technology, Sydney, Australia, 18–22 November 2019; pp. 138–148. [Google Scholar]
- Meyer, M.; Kuschk, G. Deep learning based 3D object detection for automotive radar and camera. In Proceedings of the 2019 16th European Radar Conference (EuRAD), Paris, France, 2–4 October 2019; pp. 133–136. [Google Scholar]
- Nobis, F.; Geisslinger, M.; Weber, M.; Betz, J.; Lienkamp, M. A deep learning-based radar and camera sensor fusion architecture for object detection. In Proceedings of the 2019 Sensor Data Fusion: Trends, Solutions, Applications (SDF), Bonn, Germany, 15–17 October 2019; pp. 1–7. [Google Scholar]
- Yoo, H.; Kim, K.; Byeon, M.; Jeon, Y.; Choi, J.Y. Online Scheme for Multiple Camera Multiple Target Tracking Based on Multiple Hypothesis Tracking. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 454–469. [Google Scholar] [CrossRef]
- Sheng, H.; Chen, J.; Zhang, Y.; Ke, W.; Xiong, Z.; Yu, J. Iterative Multiple Hypothesis Tracking with Tracklet-Level Association. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 3660–3672. [Google Scholar] [CrossRef]
- Reid, D. An algorithm for tracking multiple targets. IEEE Trans. Autom. Control 1979, 24, 843–854. [Google Scholar] [CrossRef]
- Chen, L.; Ai, H.; Zhuang, Z.; Shang, C. Real-time multiple people tracking with deeply learned candidate selection and person re-identification. In Proceedings of the 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, 23–27 July 2018; pp. 1–6. [Google Scholar]
- Mozhdehi, R.J.; Medeiros, H. Deep convolutional particle filter for visual tracking. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3650–3654. [Google Scholar]
- Kalman, R.E. A new Approach to Linear Filtering and Prediction Problems. J. Basic Eng. 1960, 82, 35–45. [Google Scholar] [CrossRef]
- Kuhn, H.W. The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 1955, 2, 83–97. [Google Scholar] [CrossRef]
- Chen, J.; Xi, Z.; Wei, C.; Lu, J.; Niu, Y.; Li, Z. Multiple Object Tracking Using Edge Multi-Channel Gradient Model with ORB Feature. IEEE Access 2021, 9, 2294–2309. [Google Scholar] [CrossRef]
- He, J.; Huang, Z.; Wang, N.; Zhang, Z. Learnable graph matching: Incorporating graph partitioning with deep feature learning for multiple object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 5299–5309. [Google Scholar]
- Lucas, B.D.; Kanade, T. An Iterative Image Registration Technique with an Application to Stereo Vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI), Vancouver, BC, Canada, 1981; pp. 674–679. [Google Scholar]
- Rosique, F.; Navarro, P.J.; Fernández, C.; Padilla, A. A Systematic Review of Perception System and Simulators for Autonomous Vehicles Research. Sensors 2019, 19, 648. [Google Scholar] [CrossRef] [PubMed]
- Richter, S.R.; Vineet, V.; Roth, S.; Koltun, V. Playing for data: Ground truth from computer games. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Part II 14. pp. 102–118. [Google Scholar]
- Mueller, M.; Smith, N.; Ghanem, B. A Benchmark and Simulator for UAV Tracking; Springer: Berlin/Heidelberg, Germany, 2016; Volume 9905, pp. 445–461. [Google Scholar]
- Tremblay, J.; Prakash, A.; Acuna, D.; Brophy, M.; Jampani, V.; Anil, C.; To, T.; Cameracci, E.; Boochoon, S.; Birchfield, S. Training deep networks with synthetic data: Bridging the reality gap by domain randomization. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018; pp. 969–977. [Google Scholar]
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Ye, H.; Chen, Y.; Liu, M. Tightly coupled 3d lidar inertial odometry and mapping. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 3144–3150. [Google Scholar]
- Zhang, G.; Yin, J.; Deng, P.; Sun, Y.; Zhou, L.; Zhang, K. Achieving Adaptive Visual Multi-Object Tracking with Unscented Kalman Filter. Sensors 2022, 22, 9106. [Google Scholar] [CrossRef]
- Ogunrinde, I.O. Multi-Sensor Fusion for Object Detection and Tracking Under Foggy Weather Conditions. Ph.D. Thesis, Florida Agricultural and Mechanical University, Tallahassee, FL, USA, 2023. [Google Scholar]
- Hasirlioglu, S.; Kamann, A.; Doric, I.; Brandmeier, T. Test methodology for rain influence on automotive surround sensors. In Proceedings of the 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 1–4 November 2016; pp. 2242–2247. [Google Scholar]
- Henley, C.; Somasundaram, S.; Hollmann, J.; Raskar, R. Detection and Mapping of Specular Surfaces Using Multibounce Lidar Returns. Opt. Express 2022, 31, 6370–6388. [Google Scholar] [CrossRef]
- Gao, R.; Park, J.; Hu, X.; Yang, S.; Cho, K. Reflective Noise Filtering of Large-Scale Point Cloud Using Multi-Position LiDAR Sensing Data. Remote Sens. 2021, 13, 3058. [Google Scholar] [CrossRef]
- Kashani, A.G.; Olsen, M.J.; Parrish, C.E.; Wilson, N. A Review of LIDAR Radiometric Processing: From Ad Hoc Intensity Correction to Rigorous Radiometric Calibration. Sensors 2015, 15, 28099–28128. [Google Scholar] [CrossRef] [PubMed]
- Zhou, Y.; Liu, L.; Zhao, H.; López-Benítez, M.; Yu, L.; Yue, Y. Towards Deep Radar Perception for Autonomous Driving: Datasets, Methods, and Challenges. Sensors 2022, 22, 4208. [Google Scholar] [CrossRef] [PubMed]
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A Multimodal Dataset for Autonomous Driving. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11618–11628. [Google Scholar]
- Barnes, D.; Gadd, M.; Murcutt, P.; Newman, P.; Posner, I. The oxford radar robotcar dataset: A radar extension to the oxford robotcar dataset. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 6433–6438. [Google Scholar]
- Kim, G.; Park, Y.S.; Cho, Y.; Jeong, J.; Kim, A. MulRan: Multimodal Range Dataset for Urban Place Recognition. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 6246–6253. [Google Scholar]
- Sheeny, M.; De Pellegrin, E.; Mukherjee, S.; Ahrabian, A.; Wang, S.; Wallace, A. RADIATE: A radar dataset for automotive perception in bad weather. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 1–7. [Google Scholar]
- Meyer, M.; Kuschk, G. Automotive radar dataset for deep learning based 3d object detection. In Proceedings of the 2019 16th European Radar Conference (EuRAD), Paris, France, 2–4 October 2019; pp. 129–132. [Google Scholar]
- Shi, J.; Tang, Y.; Gao, J.; Piao, C.; Wang, Z. Multitarget-Tracking Method Based on the Fusion of Millimeter-Wave Radar and LiDAR Sensor Information for Autonomous Vehicles. Sensors 2023, 23, 6920. [Google Scholar] [CrossRef] [PubMed]
- Alaba, S.Y.; Ball, J.E. A Survey on Deep-Learning-Based LiDAR 3D Object Detection for Autonomous Driving. Sensors 2022, 22, 9577. [Google Scholar] [CrossRef] [PubMed]
- Leon, F.; Gavrilescu, M. A Review of Tracking and Trajectory Prediction Methods for Autonomous Driving. Mathematics 2021, 9, 660. [Google Scholar] [CrossRef]
- Zhao, D.; Fu, H.; Xiao, L.; Wu, T.; Dai, B. Multi-Object Tracking with Correlation Filter for Autonomous Vehicle. Sensors 2018, 18, 2004. [Google Scholar] [CrossRef] [PubMed]
- Lin, C.; Li, B.; Siampis, E.; Longo, S.; Velenis, E. Predictive Path-Tracking Control of an Autonomous Electric Vehicle with Various Multi-Actuation Topologies. Sensors 2024, 24, 1566. [Google Scholar] [CrossRef] [PubMed]
- El Natour, G.; Bresson, G.; Trichet, R. Multi-Sensors System and Deep Learning Models for Object Tracking. Sensors 2023, 23, 7804. [Google Scholar] [CrossRef] [PubMed]
- Mauri, A.; Khemmar, R.; Decoux, B.; Ragot, N.; Rossi, R.; Trabelsi, R.; Boutteau, R.; Ertaud, J.-Y.; Savatier, X. Deep Learning for Real-Time 3D Multi-Object Detection, Localisation, and Tracking: Application to Smart Mobility. Sensors 2020, 20, 532. [Google Scholar] [CrossRef]
- Daramouskas, I.; Meimetis, D.; Patrinopoulou, N.; Lappas, V.; Kostopoulos, V.; Kapoulas, V. Camera-Based Local and Global Target Detection, Tracking, and Localization Techniques for UAVs. Machines 2023, 11, 315. [Google Scholar] [CrossRef]
- Liu, H.; Pei, Y.; Bei, Q.; Deng, L. Improved DeepSORT Algorithm Based on Multi-Feature Fusion. Appl. Syst. Innov. 2022, 5, 55. [Google Scholar] [CrossRef]
- Wang, C.-Y.; Liao, H.-Y.M.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W.; Yeh, I.-H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 390–391. [Google Scholar]
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features From Cheap Operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1577–1586. [Google Scholar]
- Zhang, Z.; Qiao, S.; Xie, C.; Shen, W.; Wang, B.; Yuille, A.L. Single-Shot Object Detection with Enriched Semantics. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5813–5821. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 12993–13000. [Google Scholar]
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666. [Google Scholar]
- Toldo, M.; Maracani, A.; Michieli, U.; Zanuttigh, P. Unsupervised Domain Adaptation in Semantic Segmentation: A Review. Technologies 2020, 8, 35. [Google Scholar] [CrossRef]
- Ivanovs, M.; Ozols, K.; Dobrajs, A.; Kadikis, R. Improving Semantic Segmentation of Urban Scenes for Self-Driving Cars with Synthetic Images. Sensors 2022, 22, 2252. [Google Scholar] [CrossRef] [PubMed]
- Caltagirone, L.; Bellone, M.; Svensson, L.; Wahde, M.; Sell, R. Lidar–Camera Semi-Supervised Learning for Semantic Segmentation. Sensors 2021, 21, 4813. [Google Scholar] [CrossRef] [PubMed]
- Qiu, Y.; Lu, Y.; Wang, Y.; Jiang, H. IDOD-YOLOV7: Image-Dehazing YOLOV7 for Object Detection in Low-Light Foggy Traffic Environments. Sensors 2023, 23, 1347. [Google Scholar] [CrossRef] [PubMed]
- Gomaa, A.; Abdalrazik, A. Novel Deep Learning Domain Adaptation Approach for Object Detection Using Semi-Self Building Dataset and Modified YOLOv4. World Electr. Veh. J. 2024, 15, 255. [Google Scholar] [CrossRef]
- Mounsey, A.; Khan, A.; Sharma, S. Deep and Transfer Learning Approaches for Pedestrian Identification and Classification in Autonomous Vehicles. Electronics 2021, 10, 3159. [Google Scholar] [CrossRef]
Models | Segmentation Module (SM) | MOTA | MOTP | MT | ML | IDS | FP | FN | FPS |
---|---|---|---|---|---|---|---|---|---|
CR-YOLO + Ours (CIoU) | WITH | 74.86 | 84.21 | 43.40% | 15.86% | 513 | 5737 | 1259 | 68.34 |
CR-YOLO + Ours (GIoU) | WITH | 72.35 | 83.79 | 41.80% | 17.09% | 534 | 3079 | 1427 | 66.09 |
CR-YOLO + Ours (CIoU) | WITHOUT | 71.77 | 80.10 | 40.25% | 18.06% | 598 | 4765 | 3882 | 62.94 |
CR-YOLO + Ours (GIoU) | WITHOUT | 68.07 | 78.96 | 35.25% | 21.22% | 782 | 5357 | 4224 | 58.8 |
Models | Segmentation Module (SM) | MOTA | MOTP | MT | ML | IDS | FP | FN | FPS |
---|---|---|---|---|---|---|---|---|---|
CR-YOLO + Ours (CIoU) | WITH | 68.25 | 79.65 | 38.77% | 17.05% | 691 | 5767 | 1401 | 66.15 |
CR-YOLO + Ours (GIoU) | WITH | 66.93 | 76.99 | 35.53% | 19.24% | 723 | 3628 | 2183 | 62.37 |
CR-YOLO + Ours (CIoU) | WITHOUT | 64.93 | 73.38 | 33.90% | 20.13% | 833 | 4239 | 5992 | 59.58 |
CR-YOLO + Ours (GIoU) | WITHOUT | 61.05 | 70.32 | 30.16% | 23.80% | 910 | 3897 | 6327 | 55.95 |
Models | Segmentation Module (SM) | MOTA | MOTP | MT | ML | IDS | FP | FN | FPS |
---|---|---|---|---|---|---|---|---|---|
CR-YOLO + Ours (CIoU) | WITH | 66.14 | 75.78 | 36.80% | 19.24% | 816 | 7158 | 5062 | 64.88 |
CR-YOLO + Ours (GIoU) | WITH | 64.23 | 73.33 | 34.02% | 21.44% | 865 | 6592 | 7914 | 60.42 |
CR-YOLO + Ours (CIoU) | WITHOUT | 60.67 | 70.54 | 31.17% | 22.79% | 1034 | 10548 | 5523 | 56.27 |
CR-YOLO + Ours (GIoU) | WITHOUT | 52.09 | 66.48 | 28.76% | 26.52% | 1193 | 8349 | 13626 | 51.82 |
Models | MOTA | MOTP | MT | ML | IDS | FP | FN | FPS |
---|---|---|---|---|---|---|---|---|
CR-YOLO + Ours (CIoU) with SM | 74.86 | 84.21 | 43.40% | 15.86% | 513 | 5737 | 1259 | 68.34 |
CR-YOLO + Ours (GIoU) with SM | 72.35 | 83.79 | 41.80% | 17.09% | 534 | 3079 | 1427 | 66.09 |
CR-YOLO + DeepSORT | 68.94 | 78.82 | 38.11% | 20.49% | 827 | 7280 | 4947 | 61.24 |
CR-YOLO + SORT | 65.81 | 72.03 | 30.94% | 22.63% | 1129 | 6945 | 6740 | 52.47 |
YOLOv5 + Ours (CIoU) with SM | 67.13 | 74.91 | 36.27% | 18.29% | 841 | 9394 | 4235 | 56.43 |
YOLOv5 + Ours (GIoU) with SM | 66.94 | 73.58 | 33.79% | 20.45% | 867 | 6651 | 6685 | 53.12 |
YOLOv5 + DeepSORT | 63.37 | 71.41 | 31.63% | 22.95% | 893 | 4476 | 8708 | 51.5 |
YOLOv5 + SORT | 60.68 | 69.48 | 28.18% | 26.43% | 1342 | 6911 | 10882 | 44.52 |
Models | MOTA | MOTP | MT | ML | IDS | FP | FN | FPS |
---|---|---|---|---|---|---|---|---|
CR-YOLO + Ours (CIoU) with SM | 68.25 | 79.65 | 38.77% | 17.05% | 691 | 5767 | 1401 | 66.15 |
CR-YOLO + Ours (GIoU) with SM | 66.93 | 76.99 | 35.53% | 19.24% | 723 | 3628 | 2183 | 62.37 |
CR-YOLO + DeepSORT | 61.67 | 71.84 | 33.31% | 22.61% | 954 | 4210 | 9632 | 56.8 |
CR-YOLO + SORT | 54.31 | 64.85 | 25.60% | 36.54% | 1685 | 5877 | 10184 | 49.02 |
YOLOv5 + Ours (CIoU) with SM | 63.14 | 73.42 | 32.41% | 23.09% | 976 | 7394 | 5185 | 54.66 |
YOLOv5 + Ours (GIoU) with SM | 60.56 | 69.25 | 30.15% | 25.03% | 1151 | 6053 | 7739 | 52.92 |
YOLOv5 + DeepSORT | 54.98 | 63.86 | 27.06% | 26.28% | 1234 | 4579 | 10371 | 48.71 |
YOLOv5 + SORT | 47.1 | 58.59 | 20.72% | 29.35% | 2425 | 1706 | 11125 | 41.78 |
Models | MOTA | MOTP | MT | ML | IDS | FP | FN | FPS |
---|---|---|---|---|---|---|---|---|
CR-YOLO + Ours (CIoU) with SM | 66.14 | 75.78 | 36.80% | 19.24% | 816 | 7158 | 5062 | 64.88 |
CR-YOLO + Ours (GIoU) with SM | 64.23 | 73.33 | 34.02% | 21.44% | 865 | 6592 | 7914 | 60.42 |
CR-YOLO + DeepSORT | 56.94 | 64.23 | 29.85% | 26.56% | 1378 | 6122 | 8378 | 52.59 |
CR-YOLO + SORT | 49.67 | 58.4 | 21.10% | 24.27% | 1797 | 4773 | 9354 | 47.9 |
YOLOv5 + Ours (CIoU) with SM | 61.21 | 70.02 | 30.34% | 21.52% | 1213 | 9612 | 8941 | 54.01 |
YOLOv5 + Ours (GIoU) with SM | 55.27 | 64.74 | 27.18% | 27.88% | 1276 | 2315 | 9723 | 51.45 |
YOLOv5 + DeepSORT | 48.94 | 57.13 | 24.75% | 32.98% | 1534 | 3380 | 11042 | 47.14 |
YOLOv5 + SORT | 42.07 | 53.01 | 18.87% | 38.06% | 3305 | 2911 | 13185 | 36.65 |