# Real-Time Object Detection and Tracking Based on Embedded Edge Devices for Local Dynamic Map Generation

^{1}

^{2}

^{3}

^{4}

^{*}

## Abstract

**:**

## 1. Introduction

## 2. Overview of the Proposed Camera System

#### 2.1. Hardware Overview

^{2}and 88 × 106 mm

^{2}, respectively. Moreover, the system’s average power consumption is low at 15 watts, making it highly suitable for application on mobile platforms. The power can be supplied through either a DC adapter or power over ethernet (PoE). The specifications of the proposed camera system are summarized in Table 1.

#### 2.2. Software Overview

## 3. Object Detection Network on DSP

## 4. Tracking Feature Extraction Network on GPU

## 5. Three-dimensional Trajectory Estimation on CPU

## 6. Experimental Results

#### 6.1. Detector Performance Evaluation

#### 6.2. Tracker Performance Evaluation

## 7. Conclusions and Future Works

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

- TR 102 863—V1.1.1; Intelligent Transport Systems (ITS); Vehicular Communications; Basic Set of Applications; Local Dynamic Map (LDM); Rationale for and Guidance on Standardization. ETSI: Sophia Antipolis, France, 2011.
- Damerow, F.; Puphal, T.; Flade, B.; Li, Y.; Eggert, J. Intersection Warning System for Occlusion Risks Using Relational Local Dynamic Maps. IEEE Intell. Transp. Syst. Mag.
**2018**, 10, 47–59. [Google Scholar] [CrossRef] - Carletti, C.M.R.; Raviglione, F.; Casetti, C.; Stoffella, F.; Yilma, G.M.; Visintainer, F.; Risma Carletti, C.M. S-LDM: Server Local Dynamic Map for 5G-Based Centralized Enhanced Collective Perception. SSRN
**2023**. [Google Scholar] [CrossRef] - Qualcomm QCS605 SoC|Next-Gen 8-Core IoT & Smart Camera Chipset|Qualcomm. Available online: https://www.qualcomm.com/products/technology/processors/application-processors/qcs605 (accessed on 29 September 2022).
- Zaidi, S.S.A.; Ansari, M.S.; Aslam, A.; Kanwal, N.; Asghar, M.; Lee, B. A Survey of Modern Deep Learning Based Object Detection Models. Digit. Signal Process.
**2021**, 126, 103514. [Google Scholar] [CrossRef] - Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv
**2020**. [Google Scholar] [CrossRef] - Choi, K.; Wi, S.M.; Jung, H.G.; Suhr, J.K. Simplification of Deep Neural Network-Based Object Detector for Real-Time Edge Computing. Sensors
**2023**, 23, 3777. [Google Scholar] [CrossRef] [PubMed] - Wojke, N.; Bewley, A.; Paulus, D. Simple Online and Realtime Tracking with a Deep Association Metric. arXiv
**2017**. [Google Scholar] [CrossRef] - Kim, G.; Jung, H.G.; Suhr, J.K. CNN-Based Vehicle Bottom Face Quadrilateral Detection Using Surveillance Cameras for Intelligent Transportation Systems. Sensors
**2023**, 23, 6688. [Google Scholar] [CrossRef] [PubMed] - Caprile, B.; Torre, V. Using Vanishing Points for Camera Calibration; Springer: Berlin/Heidelberg, Germany, 1990; Volume 4. [Google Scholar]
- RoadGaze Hardware Specification. Available online: http://withrobot.com/en/ai-camera/roadgaze/ (accessed on 2 February 2024).
- MIPI CSI-2. Available online: https://www.mipi.org/specifications/csi-2 (accessed on 2 February 2024).
- Yurdusev, A.A.; Adem, K.; Hekim, M. Detection and Classification of Microcalcifications in Mammograms Images Using Difference Filter and Yolov4 Deep Learning Model. Biomed. Signal Process. Control
**2023**, 80, 104360. [Google Scholar] [CrossRef] - Dlamini, S.; Chen, Y.-H.; Jeffrey Kuo, C.-F. Complete Fully Automatic Detection, Segmentation and 3D Reconstruction of Tumor Volume for Non-Small Cell Lung Cancer Using YOLOv4 and Region-Based Active Contour Model. Expert Syst. Appl.
**2023**, 212, 118661. [Google Scholar] [CrossRef] - YOLOv4. Available online: https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/yolo_v4.html (accessed on 6 March 2023).
- Getting Started with YOLO V4. Available online: https://www.mathworks.com/help/vision/ug/getting-started-with-yolo-v4.html (accessed on 16 February 2024).
- Zhang, B.; Zhang, J. A Traffic Surveillance System for Obtaining Comprehensive Information of the Passing Vehicles Based on Instance Segmentation. IEEE Trans. Intell. Transp. Syst.
**2021**, 22, 7040–7055. [Google Scholar] [CrossRef] - Mauri, A.; Khemmar, R.; Decoux, B.; Haddad, M.; Boutteau, R. Real-Time 3D Multi-Object Detection and Localization Based on Deep Learning for Road and Railway Smart Mobility. J. Imaging
**2021**, 7, 145. [Google Scholar] [CrossRef] [PubMed] - Li, P. RTM3D: Real-Time Monocular 3D Detection from Object Keypoints for Autonomous Driving. arXiv
**2020**. [Google Scholar] [CrossRef] - Zhu, M.; Zhang, S.; Zhong, Y.; Lu, P.; Peng, H.; Lenneman, J. Monocular 3D Vehicle Detection Using Uncalibrated Traffic Cameras through Homography. arXiv
**2021**. [Google Scholar] [CrossRef] - Kannala, J.; Brandt, S.S. A Generic Camera Model and Calibration Method for Conventional, Wide-Angle, and Fish-Eye Lenses. IEEE Trans. Pattern Anal. Mach. Intell.
**2006**, 28, 1335–1340. [Google Scholar] [CrossRef] [PubMed] - Bouguet, J.-Y. Complete Camera Calibration Toolbox for Matlab. In Jean-Yves Bouguet’s Homepage; 1999; Available online: http://robots.stanford.edu/cs223b04/JeanYvesCalib/ (accessed on 1 January 2024).
- Cipolla, R.; Drummond, T.; Robertson, D. Camera Calibration from Vanishing Points in Image OfArchitectural Scenes. In Proceedings of the 1999 British Machine Vision Conference, Nottingham, UK, 1 January 1999. [Google Scholar]
- Li, N.; Pan, Y.; Chen, Y.; Ding, Z.; Zhao, D.; Xu, Z. Heuristic Rank Selection with Progressively Searching Tensor Ring Network. arXiv
**2020**. [Google Scholar] [CrossRef] - Yin, M.; Sui, Y.; Liao, S.; Yuan, B. Towards Efficient Tensor Decomposition-Based DNN Model Compression with Optimization Framework. arXiv
**2021**. [Google Scholar] [CrossRef] - Liang, T.; Glossner, J.; Wang, L.; Shi, S.; Zhang, X. Pruning and Quantization for Deep Neural Network Acceleration: A Survey. arXiv
**2021**. [Google Scholar] [CrossRef] - Liu, Z.; Li, J.; Shen, Z.; Huang, G.; Yan, S.; Zhang, C. Learning Efficient Convolutional Networks through Network Slimming. arXiv
**2017**. [Google Scholar] [CrossRef] - Masana, M.; Van De Weijer, J.; Herranz, L.; Bagdanov, A.D.; Alvarez, J.M. Domain-Adaptive Deep Network Compression. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017. [Google Scholar]
- Yang, J.; Zou, H.; Cao, S.; Chen, Z.; Xie, L. MobileDA: Toward Edge-Domain Adaptation. IEEE Internet Things J.
**2020**, 7, 6909–6918. [Google Scholar] [CrossRef] - Liu, C.; Zoph, B.; Neumann, M.; Shlens, J.; Hua, W.; Li, L.-J.; Fei-Fei, L.; Yuille, A.; Huang, J.; Murphy, K. Progressive Neural Architecture Search. arXiv
**2017**. [Google Scholar] [CrossRef] - White, C.; Neiswanger, W.; Savani, Y. BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search. arXiv
**2019**. [Google Scholar] [CrossRef] - Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple Online and Realtime Tracking. arXiv
**2016**. [Google Scholar] [CrossRef] - Ondrasovic, M.; Tarabek, P. Siamese Visual Object Tracking: A Survey. IEEE Access
**2021**, 9, 110149–110172. [Google Scholar] [CrossRef] - Chen, F.; Wang, X.; Zhao, Y.; Lv, S.; Niu, X. Visual Object Tracking: A Survey. Comput. Vis. Image Underst.
**2022**, 222, 103508. [Google Scholar] [CrossRef] - Wojke, N.; Bewley, A. Deep Cosine Metric Learning for Person Re-Identification. arXiv
**2018**. [Google Scholar] [CrossRef] - Du, D.; Zhu, P.; Wen, L.; Bian, X.; Ling, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y.; et al. VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019. [Google Scholar]
- Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep High-Resolution Representation Learning for Visual Recognition. arXiv
**2019**. [Google Scholar] [CrossRef] [PubMed] - Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. arXiv
**2016**. [Google Scholar] [CrossRef] - Triki, N.; Ksantini, M.; Karray, M. Traffic Sign Recognition System Based on Belief Functions Theory. In Proceedings of the ICAART 2021—13th International Conference on Agents and Artificial Intelligence, Virtual Event, 4–6 February 2021; SciTePress: Setúbal, Portugal, 2021; Volume 2, pp. 775–780. [Google Scholar]

**Figure 1.**Hardware overview [11].

Major Components | Items | Specification |
---|---|---|

Main Board | QCS605 | CPU: Kyro 300: 64 bit-8 cores, up 2.5 GHz |

DSP: 2$\times $ Hexagon Vector Processor, Hexagon 685 | ||

GPU: Adreno 615 | ||

Memory | 4 GB LPDDR4, eMMC 64 GB | |

OS | Android | |

Size | $42\times 35\text{}{\mathrm{m}\mathrm{m}}^{2}$ | |

Carrier Board | Interface | Exterior: Ethernet, Camera: MIPI |

Power | 15 w (PoE or DC Adaptor) | |

Data Transfer | CODEC: H.264 Protocol: RTSP | |

Size | $88\times 106\text{}{\mathrm{m}\mathrm{m}}^{2}$ | |

Camera | Sensor | Sony IMX334(CMOS) M12 Mount, Rolling shutter |

FOV | 34.4°~128° | |

Resolution | $1920\times 1080$ |

Network | Dataset | Input Size | mAP (%) | BFLOPs |
---|---|---|---|---|

YOLOv4 | Visdrone2019-Det | $416\times 416$ | 20.62 | 59.75 |

SCOD | $416\times 256$ | 86.69 | 36.79 | |

Modified YOLOv4 | Visdrone2019-Det | $416\times 416$ | 25.79 | 63.77 |

SCOD | $416\times 256$ | 89.47 | 39.22 |

Dataset | Pruning Rate (%) | mAP (%) | Parameter | BFLOPs |
---|---|---|---|---|

Visdrone2019-Det | 0 | 25.79 | 48.0 M | 63.77 |

50 | 27.38 | 15.1 M | 38.45 | |

70 | 26.99 | 6.49 M | 27.28 | |

SCOD | 0 | 89.47 | 48.0 M | 39.22 |

50 | 89.98 | 12.7 M | 19.25 | |

70 | 89.56 | 6.48 M | 13.23 |

DB | FOV | Sequences ID | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | Total | ||

Train | N | 810 (11) | 690 (12) | 1020 (7) | 930 (12) | 780 (8) | 900 (10) | 611 (8) | 5741 (68) | ||||||

W | 990 (30) | 1020 (25) | 630 (26) | 899 (22) | 659 (20) | 990 (33) | 5098 (156) | ||||||||

Test | N | 1080 (7) | 900 (7) | 509 (4) | 604 (10) | 795 (19) | 695 (7) | 4583 (54) | |||||||

W | 630 (15) | 900 (22) | 660 (17) | 450 (15) | 2640 (69) |

Tracker | NFOV Sequences | WFOV Sequences | ||||||||
---|---|---|---|---|---|---|---|---|---|---|

MOTA (%) | FN | FP | IDsw | BFLOPs | MOTA (%) | FN | FP | IDsw | BFLOPs | |

DeepSORT | 41.24 | 2688 | 1885 | 56 | 0.59 x N | 87.04 | 277 | 361 | 21 | 1.18 x N |

Modified DeepSORT | 41.54 | 2726 | 1817 | 62 | 10.66 | 87.35 | 271 | 358 | 14 | 10.66 |

Pruning Rate | BFLOPs | NFOV Sequences | WFOV Sequences | ||||||
---|---|---|---|---|---|---|---|---|---|

MOTA (%) | FN | FP | IDsw | MOTA (%) | FN | FP | IDsw | ||

0 | 10.66 | 41.54 | 2726 | 1817 | 62 | 87.35 | 271 | 358 | 14 |

50 | 4.32 | 39.74 | 2772 | 1913 | 62 | 87.49 | 264 | 349 | 23 |

70 | 2.99 | 40.73 | 2879 | 1732 | 58 | 87.37 | 263 | 360 | 19 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Choi, K.; Moon, J.; Jung, H.G.; Suhr, J.K.
Real-Time Object Detection and Tracking Based on Embedded Edge Devices for Local Dynamic Map Generation. *Electronics* **2024**, *13*, 811.
https://doi.org/10.3390/electronics13050811

**AMA Style**

Choi K, Moon J, Jung HG, Suhr JK.
Real-Time Object Detection and Tracking Based on Embedded Edge Devices for Local Dynamic Map Generation. *Electronics*. 2024; 13(5):811.
https://doi.org/10.3390/electronics13050811

**Chicago/Turabian Style**

Choi, Kyoungtaek, Jongwon Moon, Ho Gi Jung, and Jae Kyu Suhr.
2024. "Real-Time Object Detection and Tracking Based on Embedded Edge Devices for Local Dynamic Map Generation" *Electronics* 13, no. 5: 811.
https://doi.org/10.3390/electronics13050811