# Real-Time Object Detection and Tracking Based on Embedded Edge Devices for Local Dynamic Map Generation

## Abstract

## 1. Introduction

## 2. Overview of the Proposed Camera System

#### 2.1. Hardware Overview

The main board and the carrier board measure 42 × 35 mm² and 88 × 106 mm², respectively. The system's average power consumption is only 15 W, which makes it well suited to mobile platforms. Power can be supplied through either a DC adapter or Power over Ethernet (PoE). The specifications of the proposed camera system are summarized in Table 1.

#### 2.2. Software Overview

## 3. Object Detection Network on DSP

## 4. Tracking Feature Extraction Network on GPU

## 5. Three-dimensional Trajectory Estimation on CPU

## 6. Experimental Results

#### 6.1. Detector Performance Evaluation

#### 6.2. Tracker Performance Evaluation

## 7. Conclusions and Future Works

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

**Figure 1.** Hardware overview [11].

| Major Components | Items | Specification |
|---|---|---|
| Main Board | QCS605 | CPU: Kryo 300, 64-bit, 8 cores, up to 2.5 GHz |
| | | DSP: 2× Hexagon Vector Processor, Hexagon 685 |
| | | GPU: Adreno 615 |
| | Memory | 4 GB LPDDR4, 64 GB eMMC |
| | OS | Android |
| | Size | 42 × 35 mm² |
| Carrier Board | Interface | Exterior: Ethernet; camera: MIPI |
| | Power | 15 W (PoE or DC adapter) |
| | Data Transfer | Codec: H.264; protocol: RTSP |
| | Size | 88 × 106 mm² |
| Camera | Sensor | Sony IMX334 (CMOS), M12 mount, rolling shutter |
| | FOV | 34.4°–128° |
| | Resolution | 1920 × 1080 |

| Network | Dataset | Input Size | mAP (%) | BFLOPs |
|---|---|---|---|---|
| YOLOv4 | Visdrone2019-Det | 416 × 416 | 20.62 | 59.75 |
| | SCOD | 416 × 256 | 86.69 | 36.79 |
| Modified YOLOv4 | Visdrone2019-Det | 416 × 416 | 25.79 | 63.77 |
| | SCOD | 416 × 256 | 89.47 | 39.22 |
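
The BFLOPs figures for the two input sizes are consistent with a convolutional network whose cost scales linearly with the number of input pixels. The sketch below checks this by rescaling the 416 × 416 values to 416 × 256; the linear-in-area assumption and the helper name `scale_bflops` are illustrative, not taken from the paper.

```python
def scale_bflops(bflops: float, src_size: tuple[int, int],
                 dst_size: tuple[int, int]) -> float:
    """Rescale a convolutional network's cost to a different input size.

    Assumes cost is proportional to input area, which holds when every
    layer's feature map shrinks by the same factor as the input.
    """
    src_area = src_size[0] * src_size[1]
    dst_area = dst_size[0] * dst_size[1]
    return bflops * dst_area / src_area


# Values copied from the detector table.
at_416x416 = {"YOLOv4": 59.75, "Modified YOLOv4": 63.77}
at_416x256 = {"YOLOv4": 36.79, "Modified YOLOv4": 39.22}

for name, bflops in at_416x416.items():
    predicted = scale_bflops(bflops, (416, 416), (416, 256))
    print(f"{name}: predicted {predicted:.2f}, reported {at_416x256[name]:.2f}")
# YOLOv4: predicted 36.77, reported 36.79
# Modified YOLOv4: predicted 39.24, reported 39.22
```

The agreement to within about 0.02 BFLOPs suggests that only the spatial resolution changes between the two settings, not the architecture.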

| Dataset | Pruning Rate (%) | mAP (%) | Parameters | BFLOPs |
|---|---|---|---|---|
| Visdrone2019-Det | 0 | 25.79 | 48.0 M | 63.77 |
| | 50 | 27.38 | 15.1 M | 38.45 |
| | 70 | 26.99 | 6.49 M | 27.28 |
| SCOD | 0 | 89.47 | 48.0 M | 39.22 |
| | 50 | 89.98 | 12.7 M | 19.25 |
| | 70 | 89.56 | 6.48 M | 13.23 |
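
In the pruning table, parameter counts fall faster than the pruning rate (48.0 M to 15.1 M at 50% on Visdrone2019-Det). One way to see why, assuming the pruning rate is the fraction of convolutional channels removed: a k × k convolution holds k·k·C_in·C_out weights, so halving both channel counts quarters its weights. The sketch selects channels with a network-slimming-style global threshold on batch-normalization scale factors; this criterion, the layer names, and the helpers `select_channels` and `conv_params` are hypothetical illustrations, not necessarily the pruning method used here.

```python
import math


def select_channels(bn_gammas: dict[str, list[float]],
                    pruning_rate: float) -> dict[str, list[int]]:
    """Return, per layer, the indices of channels that survive pruning.

    Assumed criterion (network-slimming style): rank every channel in the
    network by |gamma| of its batch-norm layer and drop the globally smallest
    `pruning_rate` fraction. Channels tied with the threshold are dropped too.
    """
    if not 0.0 <= pruning_rate < 1.0:
        raise ValueError("pruning_rate must be in [0, 1)")
    scores = sorted(abs(g) for gammas in bn_gammas.values() for g in gammas)
    n_prune = int(len(scores) * pruning_rate)
    # Channels whose |gamma| is at or below this value are removed.
    threshold = scores[n_prune - 1] if n_prune > 0 else -math.inf
    kept: dict[str, list[int]] = {}
    for layer, gammas in bn_gammas.items():
        survivors = [i for i, g in enumerate(gammas) if abs(g) > threshold]
        if not survivors:
            # Never empty a layer entirely: keep its strongest channel.
            survivors = [max(range(len(gammas)), key=lambda i: abs(gammas[i]))]
        kept[layer] = survivors
    return kept


def conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weight count of a k x k convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


# Hypothetical BN scale factors for two layers.
gammas = {"conv1": [0.9, 0.05, 0.4, 0.01], "conv2": [0.3, 0.02, 0.7, 0.6]}
print(select_channels(gammas, 0.5))  # {'conv1': [0, 2], 'conv2': [2, 3]}
# Halving both input and output channels quarters the weights.
print(conv_params(3, 256, 256), conv_params(3, 128, 128))  # 589824 147456
```

Under uniform halving, 25% of the weights would remain; the reported 15.1 M is about 31% of 48.0 M, which would be consistent with some layers being pruned less than others.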

| DB | FOV | Sequences | Total |
|---|---|---|---|
| Train | N | 810 (11), 690 (12), 1020 (7), 930 (12), 780 (8), 900 (10), 611 (8) | 5741 (68) |
| | W | 990 (30), 1020 (25), 630 (26), 899 (22), 659 (20), 990 (33) | 5098 (156) |
| Test | N | 1080 (7), 900 (7), 509 (4), 604 (10), 795 (19), 695 (7) | 4583 (54) |
| | W | 630 (15), 900 (22), 660 (17), 450 (15) | 2640 (69) |

| Tracker | NFOV MOTA (%) | NFOV FN | NFOV FP | NFOV IDsw | NFOV BFLOPs | WFOV MOTA (%) | WFOV FN | WFOV FP | WFOV IDsw | WFOV BFLOPs |
|---|---|---|---|---|---|---|---|---|---|---|
| DeepSORT | 41.24 | 2688 | 1885 | 56 | 0.59 × N | 87.04 | 277 | 361 | 21 | 1.18 × N |
| Modified DeepSORT | 41.54 | 2726 | 1817 | 62 | 10.66 | 87.35 | 271 | 358 | 14 | 10.66 |
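
MOTA in the CLEAR MOT sense is 1 − (FN + FP + IDsw) / GT, where GT is the total number of ground-truth object instances over all frames. GT is not listed in the tables, so the sketch below, assuming the tables use this standard definition, back-solves the GT implied by each row. Because both trackers are scored against the same annotations, the implied counts should agree, and they do to within the rounding of the two-decimal MOTA values (about 7,877 to 7,878 for NFOV and 5,083 to 5,085 for WFOV). The helper names `mota` and `implied_num_gt` are hypothetical.

```python
def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """CLEAR MOT accuracy, 1 - (FN + FP + IDSW) / GT, as a fraction."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


def implied_num_gt(fn: int, fp: int, idsw: int, mota_fraction: float) -> float:
    """Back-solve the ground-truth count GT from a reported MOTA."""
    return (fn + fp + idsw) / (1.0 - mota_fraction)


# (FN, FP, IDsw, MOTA) copied from the tracker comparison table.
rows = {
    ("DeepSORT", "NFOV"): (2688, 1885, 56, 0.4124),
    ("Modified DeepSORT", "NFOV"): (2726, 1817, 62, 0.4154),
    ("DeepSORT", "WFOV"): (277, 361, 21, 0.8704),
    ("Modified DeepSORT", "WFOV"): (271, 358, 14, 0.8735),
}
for (tracker, fov), (fn, fp, idsw, m) in rows.items():
    gt = implied_num_gt(fn, fp, idsw, m)
    print(f"{tracker} ({fov}): implied GT = {gt:.0f}")
# DeepSORT (NFOV): implied GT = 7878
# Modified DeepSORT (NFOV): implied GT = 7877
# DeepSORT (WFOV): implied GT = 5085
# Modified DeepSORT (WFOV): implied GT = 5083
```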

| Pruning Rate (%) | BFLOPs | NFOV MOTA (%) | NFOV FN | NFOV FP | NFOV IDsw | WFOV MOTA (%) | WFOV FN | WFOV FP | WFOV IDsw |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10.66 | 41.54 | 2726 | 1817 | 62 | 87.35 | 271 | 358 | 14 |
| 50 | 4.32 | 39.74 | 2772 | 1913 | 62 | 87.49 | 264 | 349 | 23 |
| 70 | 2.99 | 40.73 | 2879 | 1732 | 58 | 87.37 | 263 | 360 | 19 |
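
The original DeepSORT feature-extraction cost is listed as 0.59 × N BFLOPs (NFOV) and 1.18 × N BFLOPs (WFOV), whereas the modified tracker's cost does not depend on N (10.66, 4.32, or 2.99 BFLOPs depending on the pruning rate). Reading N as the number of tracked objects per frame, which is an assumption since N is not defined in the tables, the break-even object count is the fixed cost divided by the per-object cost. The sketch compares only these BFLOPs figures and ignores latency, memory traffic, and which processor runs each network; `break_even_objects` is a hypothetical helper.

```python
import math


def break_even_objects(fixed_bflops: float, per_object_bflops: float) -> int:
    """Smallest object count N at which a fixed per-frame cost is strictly
    cheaper than a cost of per_object_bflops * N."""
    if per_object_bflops <= 0:
        raise ValueError("per_object_bflops must be positive")
    return math.floor(fixed_bflops / per_object_bflops) + 1


# Per-object cost of the original DeepSORT extractor (0.59 x N, 1.18 x N).
per_object = {"NFOV": 0.59, "WFOV": 1.18}
# Fixed cost of the modified extractor at each pruning rate (%).
fixed = {0: 10.66, 50: 4.32, 70: 2.99}

for fov, cost in per_object.items():
    for rate, bflops in fixed.items():
        n = break_even_objects(bflops, cost)
        print(f"{fov}, pruning {rate}%: cheaper from N = {n} objects")
```

With these numbers, the unpruned modified extractor becomes cheaper from 19 objects in NFOV and 10 in WFOV; at 70% pruning the thresholds drop to 6 and 3 objects, respectively.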

