Human Detection in UAV Thermal Imagery: Dataset Extension and Comparative Evaluation on Embedded Platforms
Abstract
1. Introduction
- A new mountain-oriented thermal dataset extending existing UAV infrared benchmarks for search and rescue applications.
- A comprehensive evaluation of state-of-the-art detection and segmentation models on both desktop and embedded hardware.
2. Related Work
2.1. Detection Algorithms
2.2. Domain Adaptation
2.3. Multimodal Fusion
2.4. Synthetic Data Generation
2.5. Embedded Inference
| Paper/Model | mAP | FPS | Dataset | Device | Quantization | Comments |
|---|---|---|---|---|---|---|
| EdgeYOLO [42] | 44.8% | 34 | VisDrone | Jetson Xavier | FP16 | Lightweight YOLO-style detector with decoupled head and simplified backbone for edge deployment. |
| Quantized YOLOv4 [41] | 71.36% | 16 | KITTI | Jetson AGX | FP32 | Quantized YOLOv4 achieving real-time inference on low-cost Jetsons; minimal accuracy loss (<2%). |
| Quantized YOLOv4 [41] | 68.33% | 47 | KITTI | Jetson AGX | FP16 | Same study; FP16 variant. |
| Quantized YOLOv4 [41] | 56.69% | 62 | KITTI | Jetson AGX | INT8 | Same study; INT8 variant. |
| EL-YOLO [43] | 44.1% | 3 | VisDrone | Jetson Xavier | FP16 | 9 W average power consumption. |
| YOLOv7 for remote insect monitoring [45] | 77.2% | 5 | Custom insect dataset | Jetson Nano | FP32 | YOLOv7 applied to insect detection. |
| YOLOv5s for people/bicycle counting [46] | 60.51% | 10 | Custom urban dataset | Jetson Nano | FP32 | The paper evaluates multiple YOLOv5 variants. |
2.6. Datasets
3. Methods
3.1. Object Detection
3.1.1. Dataset
- Performed a literature review to identify thermal image datasets containing labeled human instances;
- Converted all annotations to the YOLO format;
- Filtered the datasets to keep only thermal images (for datasets covering both the thermal and visible spectra);
- Filtered the annotations to keep only human classes (‘human’, ‘people’, ‘pedestrian’, etc.);
- Renamed all resulting labels to ‘human’;
- Manually checked the resulting annotations to remove any potential outliers;
- Prepended the original dataset source name to each resulting image file for traceability;
- Added images from our own custom mountainous dataset;
- Balanced the dataset so that every source contributes roughly equally to the model’s learning.
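The label-harmonization steps above (keeping only human classes, renaming them to a single ‘human’ class, and prefixing files with their source) can be sketched as follows. This is a minimal illustration, not the paper’s actual pipeline: the per-source class-id mapping, dataset names, and file layout are assumptions.

```python
from pathlib import Path

# Hypothetical mapping of which class ids denote humans in each source dataset;
# the real mapping depends on each dataset's own label definitions.
HUMAN_CLASS_IDS = {"hit_uav": {0}, "llvip": {0}, "flir_adas": {1}}

def harmonize_labels(yolo_lines, human_ids):
    """Keep only human boxes and rename their class id to 0 ('human').

    Each input line follows the YOLO format: '<class> <cx> <cy> <w> <h>'.
    """
    kept = []
    for line in yolo_lines:
        cls, *coords = line.split()
        if int(cls) in human_ids:
            kept.append(" ".join(["0", *coords]))
    return kept

def traceable_name(image: Path, source: str) -> str:
    """Prepend the source dataset name so merged files stay traceable."""
    return f"{source}_{image.name}"
```

For example, `harmonize_labels(["1 0.5 0.5 0.1 0.2", "3 0.2 0.2 0.1 0.1"], HUMAN_CLASS_IDS["flir_adas"])` keeps only the first box and relabels it to class 0.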
3.1.2. Algorithms and Models
3.1.3. Embedded Platform
3.1.4. Evaluation
3.1.5. Statistical Evaluation
3.1.6. Embedded Costs
3.2. Semantic Segmentation
4. Results
4.1. Accuracy
4.1.1. Object Detection
4.1.2. Dataset Characteristics and Scale Sensitivity
4.1.3. Training Data Limitations
4.1.4. Architectural Differences
4.1.5. Semantic Segmentation
4.2. Embedded Performance
4.2.1. Object Detection
4.2.2. Semantic Segmentation
4.3. Power Considerations
4.4. Comparison with State-of-the-Art
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| UAV | Unmanned Aerial Vehicle |
| SAR | Search and Rescue |
| GPU | Graphics Processing Unit |
| YOLO | You Only Look Once |
| CNNs | Convolutional Neural Networks |
| ADAS | Advanced Driver-Assistance System |
| TIR | Thermal Infrared |
| RGB | Red Green Blue |
Appendix A
| Model | Size | AP50 | AP75 | AP[50–95] |
|---|---|---|---|---|
| YOLO v8 m | Small | 27.8 | 6.9 | 11.2 |
| YOLO v8 m | Medium | 50.3 | 22.3 | 25.5 |
| YOLO v8 m | Large | 85.2 | 69.8 | 60.6 |
| YOLO v8 m | All | 84.3 | 47.7 | 48.1 |
| YOLO v8 x | Small | 28.6 | 7.1 | 11.6 |
| YOLO v8 x | Medium | 51.0 | 23.4 | 26.2 |
| YOLO v8 x | Large | 85.2 | 70.6 | 61.2 |
| YOLO v8 x | All | 85.4 | 49.3 | 49.4 |
| YOLO v9 e | Small | 27.3 | 6.7 | 11.1 |
| YOLO v9 e | Medium | 50.7 | 23.0 | 25.9 |
| YOLO v9 e | Large | 85.5 | 70.3 | 60.9 |
| YOLO v9 e | All | 84.6 | 48.4 | 48.6 |
| YOLO v10 m | Small | 26.7 | 6.3 | 10.7 |
| YOLO v10 m | Medium | 51.1 | 23.9 | 26.5 |
| YOLO v10 m | Large | 85.9 | 71.2 | 61.6 |
| YOLO v10 m | All | 84.3 | 49.2 | 49.0 |
| YOLO v10 x | Small | 27.3 | 7.0 | 11.2 |
| YOLO v10 x | Medium | 50.8 | 23.8 | 26.3 |
| YOLO v10 x | Large | 86.1 | 70.7 | 61.3 |
| YOLO v10 x | All | 84.4 | 49.3 | 49.0 |
| RT-DETR l | Small | 18.3 | 0.9 | 5.0 |
| RT-DETR l | Medium | 46.1 | 9.0 | 17.8 |
| RT-DETR l | Large | 89.4 | 62.2 | 55.6 |
| RT-DETR l | All | 76.7 | 29.8 | 36.3 |
| Model | Size | P | R | F1-Score | AR |
|---|---|---|---|---|---|
| YOLO v8 m | Small | 35.6 | 62.3 | 44.4 | 59.0 |
| YOLO v8 m | Medium | 52.1 | 86.0 | 61.8 | 76.0 |
| YOLO v8 m | Large | 74.3 | 95.8 | 78.5 | 83.1 |
| YOLO v8 m | All sizes | 85.9 | 81.8 | 81.3 | 77.2 |
| YOLO v8 x | Small | 36.7 | 63.9 | 45.5 | 60.0 |
| YOLO v8 x | Medium | 51.7 | 87.2 | 62.3 | 78.4 |
| YOLO v8 x | Large | 73.7 | 96.1 | 78.5 | 83.9 |
| YOLO v8 x | All sizes | 78.5 | 82.9 | 82.4 | 86.7 |
| YOLO v9 e | Small | 35.1 | 61.9 | 44.1 | 59.3 |
| YOLO v9 e | Medium | 52.2 | 86.8 | 62.3 | 77.3 |
| YOLO v9 e | Large | 73.5 | 96.0 | 78.4 | 83.9 |
| YOLO v9 e | All sizes | 86.1 | 82.1 | 81.6 | 77.6 |
| YOLO v10 m | Small | 34.5 | 59.1 | 42.8 | 56.2 |
| YOLO v10 m | Medium | 51.9 | 85.0 | 61.9 | 76.5 |
| YOLO v10 m | Large | 74.8 | 95.3 | 79.1 | 83.9 |
| YOLO v10 m | All sizes | 85.8 | 80.3 | 80.8 | 76.3 |
| YOLO v10 x | Small | 35.5 | 60.1 | 43.5 | 56.3 |
| YOLO v10 x | Medium | 51.6 | 85.3 | 61.7 | 76.7 |
| YOLO v10 x | Large | 75.5 | 95.4 | 79.4 | 83.8 |
| YOLO v10 x | All sizes | 85.8 | 80.8 | 80.9 | 76.5 |
| RT-DETR l | Small | 24.4 | 79.0 | 34.0 | 56.2 |
| RT-DETR l | Medium | 50.0 | 91.8 | 57.5 | 67.7 |
| RT-DETR l | Large | 79.0 | 97.4 | 82.3 | 85.9 |
| RT-DETR l | All sizes | 77.1 | 89.6 | 73.0 | 69.3 |
| Model | Size | PyTorch AP[50–95] | PyTorch AR (opt. conf.) | ONNX AP[50–95] | ONNX AR (opt. conf.) | TensorRT AP[50–95] | TensorRT AR (opt. conf.) |
|---|---|---|---|---|---|---|---|
| YOLO v8 m | Small | 11.2 | 59.0 | 11.1 | 58.6 | 11.1 | 58.6 |
| YOLO v8 m | Medium | 25.5 | 76.0 | 25.3 | 75.5 | 25.3 | 75.5 |
| YOLO v8 m | Large | 60.6 | 83.1 | 60.2 | 82.6 | 60.2 | 82.6 |
| YOLO v8 m | All sizes | 48.1 | 77.2 | 47.8 | 76.7 | 47.8 | 76.7 |
| YOLO v8 x | Small | 11.6 | 60.0 | 11.5 | 59.7 | 11.5 | 59.7 |
| YOLO v8 x | Medium | 26.3 | 78.4 | 26.0 | 78.0 | 26.0 | 78.0 |
| YOLO v8 x | Large | 61.2 | 83.9 | 60.8 | 83.5 | 60.8 | 83.5 |
| YOLO v8 x | All sizes | 49.4 | 86.7 | 49.0 | 86.3 | 49.0 | 86.3 |
| YOLO v10 m | Small | 10.7 | 56.2 | 10.6 | 55.8 | 10.6 | 55.8 |
| YOLO v10 m | Medium | 26.5 | 76.5 | 26.3 | 75.9 | 26.3 | 75.9 |
| YOLO v10 m | Large | 61.5 | 83.9 | 60.9 | 83.3 | 60.9 | 83.3 |
| YOLO v10 m | All sizes | 49.0 | 76.3 | 48.6 | 75.7 | 48.6 | 75.7 |
| YOLO v10 x | Small | 11.2 | 56.3 | 11.1 | 56.0 | 11.1 | 56.0 |
| YOLO v10 x | Medium | 26.3 | 76.7 | 26.1 | 76.3 | 26.2 | 76.3 |
| YOLO v10 x | Large | 61.3 | 83.8 | 61.0 | 83.4 | 61.0 | 83.4 |
| YOLO v10 x | All sizes | 49.0 | 76.5 | 48.7 | 76.1 | 48.8 | 76.1 |
| YOLO v9 e | Small | 11.1 | 59.3 | 11.0 | 58.9 | 11.0 | 58.9 |
| YOLO v9 e | Medium | 25.9 | 77.3 | 25.7 | 76.8 | 25.7 | 76.8 |
| YOLO v9 e | Large | 60.9 | 83.9 | 60.4 | 83.3 | 60.4 | 83.3 |
| YOLO v9 e | All sizes | 48.6 | 77.6 | 48.2 | 77.0 | 48.2 | 77.0 |
| RT-DETR l | Small | 5.0 | 56.2 | 5.0 | 56.2 | 5.0 | 56.2 |
| RT-DETR l | Medium | 17.8 | 67.7 | 17.8 | 67.7 | 17.8 | 67.7 |
| RT-DETR l | Large | 55.6 | 85.9 | 55.6 | 85.9 | 55.6 | 85.9 |
| RT-DETR l | All sizes | 36.3 | 69.3 | 36.3 | 69.3 | 36.3 | 69.3 |
| Model | Format | Time @ Min Power (ms) | Energy @ Min Power (J) | Time @ Max Power (ms) | Energy @ Max Power (J) |
|---|---|---|---|---|---|
| YOLO v8 m | FP16 | 137 | 0.58 | 30 | 0.62 |
| YOLO v8 m | INT8 | 152 | 0.39 | 31 | 0.69 |
| YOLO v8 x | FP16 | 260 | 1.41 | 51 | 1.63 |
| YOLO v8 x | INT8 | 265 | 0.62 | 49 | 1.31 |
| YOLO v9 e | FP16 | 334 | 1.78 | 53 | 1.87 |
| YOLO v9 e | INT8 | 319 | 1.32 | 51 | 1.96 |
| YOLO v10 m | FP16 | 135 | 0.58 | 27 | 0.59 |
| YOLO v10 m | INT8 | 115 | 0.41 | 49 | 0.68 |
| YOLO v10 x | FP16 | 240 | 1.25 | 43 | 1.38 |
| YOLO v10 x | INT8 | 218 | 0.56 | 62 | 1.49 |
| RT-DETR | FP16 | 3618 | 1.24 | 46 | 1.41 |
| RT-DETR | INT8 | 379 | 2.26 | 109 | 2.55 |
| Model | Format | Inference Time (ms) | FPS | Energy/img (J) |
|---|---|---|---|---|
| YOLO v8 x | TF32 | 259.82 | 3.85 | 1.41 |
| YOLO v8 x | FP16 | 134.40 | 7.44 | 0.61 |
| YOLO v9 e | TF32 | 334.10 | 2.99 | 1.78 |
| YOLO v9 e | FP16 | 172.84 | 5.79 | 0.80 |
| YOLO v10 x | TF32 | 240.33 | 4.16 | 1.25 |
| YOLO v10 x | FP16 | 124.68 | 8.02 | 0.57 |
| YOLO v8 m | TF32 | 136.98 | 7.30 | 0.58 |
| YOLO v8 m | FP16 | 68.68 | 14.56 | 0.19 |
| YOLO v10 m | TF32 | 135.33 | 7.39 | 0.58 |
| YOLO v10 m | FP16 | 68.17 | 14.67 | 0.24 |
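The throughput and energy columns above are related by two simple identities: FPS is the reciprocal of per-image latency, and energy per image is average board power times latency. A minimal sketch; the 5.43 W figure below is a back-computed illustration, not a reported measurement:

```python
def fps_from_latency(latency_ms: float) -> float:
    """Throughput in frames per second for a given per-image latency."""
    return 1000.0 / latency_ms

def energy_per_image(avg_power_w: float, latency_ms: float) -> float:
    """Energy in joules: average power draw (W) times latency (s)."""
    return avg_power_w * latency_ms / 1000.0

# YOLO v8 x in TF32: 259.82 ms per image corresponds to ~3.85 FPS, and a
# reported 1.41 J/img implies roughly 5.43 W of average draw over idle.
print(round(fps_from_latency(259.82), 2))        # 3.85
print(round(energy_per_image(5.43, 259.82), 2))  # 1.41
```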
References
- Liu, C.; Szirányi, T. Real-Time Human Detection and Gesture Recognition. Sensors 2021, 21, 2180.
- Jung, H.K.; Choi, G.S. Improved YOLOv5 for Object Detection in Drone Images. Appl. Sci. 2022, 12, 7255.
- Mantau, A.J.; Widayat, I.W.; Leu, J.S.; Köppen, M. Human Detection Using YOLOv5 and Thermal UAV Data. Drones 2022, 6, 290.
- Zhao, H.; Zhou, Y.; Zhang, L.; Peng, Y.; Hu, X.; Peng, H.; Cai, X. Mixed YOLOv3-LITE: A Lightweight Real-Time Object Detection Method. Sensors 2020, 20, 1861.
- Soundrapandiyan, R.; Mouli, P.C. Adaptive Pedestrian Detection in Infrared Images. Procedia Comput. Sci. 2015, 58, 706–713.
- Tsai, P.F.; Liao, C.H.; Yuan, S.M. Deep Learning with Thermal Imaging for Human Detection. Sensors 2022, 22, 5351.
- Guettala, W.; Sayah, A.; Kahloul, L.; Tibermacine, A. Real Time Human Detection by UAVs. arXiv 2024, arXiv:2401.03275.
- Kim, Y.H.; Shin, U.; Park, J.; Kweon, I.S. MS-UDA: Multi-Spectral Unsupervised Domain Adaptation for Thermal Image Semantic Segmentation. IEEE Robot. Autom. Lett. 2021, 6, 6497–6504.
- VS, V.; Poster, D.; You, S.; Hu, S.; Patel, V.M. Meta-UDA: Unsupervised Domain Adaptive Thermal Object Detection Using Meta-Learning. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 1412–1423.
- Rauch, J.; Doer, C.; Trommer, G.F. Object Detection on Thermal Images for Unmanned Aerial Vehicles Using Domain Adaption Through Fine-Tuning. In Proceedings of the 2021 28th Saint Petersburg International Conference on Integrated Navigation Systems (ICINS), Saint Petersburg, Russia, 31 May–2 June 2021; pp. 1–4.
- Jiao, L.; Wei, H.; Pan, Q. Region and Sample Level Domain Adaptation for Unsupervised Infrared Target Detection in Aerial Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 11289–11306.
- Gan, L.; Lee, C.; Chung, S.J. Unsupervised RGB-to-Thermal Domain Adaptation via Multi-Domain Attention Network. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 6014–6020.
- D, M.; Sikdar, A.; Gurunath, P.; Udupa, S.; Sundaram, S. SAGA: Semantic-Aware Gray color Augmentation for Visible-to-Thermal Domain Adaptation across Multi-View Drone and Ground-Based Vision Systems. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Nashville, TN, USA, 11–15 June 2025; pp. 4587–4597.
- Do, D.P.; Kim, T.; Na, J.; Kim, J.; Lee, K.; Cho, K.; Hwang, W. D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 23313–23322.
- Li, Q.; Tan, K.; Yuan, D.; Liu, Q. Progressive Domain Adaptation for Thermal Infrared Tracking. Electronics 2025, 14, 162.
- Shi, C.; Zheng, Y.; Chen, Z. Domain Adaptive Thermal Object Detection with Unbiased Granularity Alignment. ACM Trans. Multimed. Comput. Commun. Appl. 2024, 20, 1–23.
- Wu, Y.; Liu, C. A Method of Aerial Multi-Modal Image Registration for a Low-Visibility Approach Based on Virtual Reality Fusion. Appl. Sci. 2023, 13, 3396.
- Mo, Y.; Kang, X.; Zhang, S.; Duan, P.; Li, S. A Robust Infrared and Visible Image Registration Method for Dual-Sensor UAV System. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5004113.
- García-Moreno, L.M.; Díaz-Paz, J.P.; Loaiza-Correa, H.; Restrepo-Girón, A.D. Dataset of thermal and visible aerial images for multi-modal and multi-spectral image registration and fusion. Data Brief 2020, 29, 105326.
- Maes, W.H.; Huete, A.R.; Steppe, K. Optimizing the Processing of UAV-Based Thermal Imagery. Remote Sens. 2017, 9, 476.
- Li, H.; Ding, W.; Cao, X.; Liu, C. Image Registration and Fusion of Visible and Infrared Integrated Camera for Medium-Altitude Unmanned Aerial Vehicle Remote Sensing. Remote Sens. 2017, 9, 441.
- Zhang, P.; Zhao, J.; Wang, D.; Lu, H.; Ruan, X. Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 8886–8895.
- Ghannadi, M.A.; Alebooye, S.; Izadi, M.; Esmaeili, F. UAV-Borne Thermal Images Registration Using Optimal Gradient Filter. J. Indian Soc. Remote Sens. 2025, 53, 911–922.
- El Ahmar, W.; Massoud, Y.; Kolhatkar, D.; AlGhamdi, H.; Alja’Afreh, M.; Laganiere, R.; Hammoud, R. Enhanced Thermal-RGB Fusion for Robust Object Detection. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vancouver, BC, Canada, 17–24 June 2023; pp. 365–374.
- Yang, X.; Guo, R.; Li, H. Comparison of multimodal RGB-thermal fusion techniques for exterior wall multi-defect detection. J. Infrastruct. Intell. Resil. 2023, 2, 100029.
- Sousa, E.; Mota, K.O.S.; Gomes, I.P.; Garrote, L.; Wolf, D.F.; Premebida, C. Late-Fusion Multimodal Human Detection Based on RGB and Thermal Images for Robotic Perception. In Proceedings of the 2023 European Conference on Mobile Robots (ECMR), Coimbra, Portugal, 4–7 September 2023; pp. 1–6.
- Jiang, C.; Yang, H.; Huo, H.T.; Zhu, P.; Yao, Z.; Li, J.; Sun, M.; Yang, S. M2FNet: Multi-modal fusion network for object detection from visible and thermal infrared images. Int. J. Appl. Earth Obs. Geoinf. 2024, 130, 103918.
- Wang, Q.; Tu, Z.; Li, C.; Tang, J. High performance RGB-Thermal Video Object Detection via hybrid fusion with progressive interaction and temporal-modal difference. Inf. Fusion 2025, 114, 102665.
- Hwang, S.; Park, J.; Kim, N.; Choi, Y.; Kweon, I.S. Multispectral pedestrian detection: Benchmark dataset and baseline. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1037–1045.
- González, A.; Fang, Z.; Socarras, Y.; Serrat, J.; Vázquez, D.; Xu, J.; López, A.M. Pedestrian Detection at Day/Night Time with Visible and FIR Cameras: A Comparison. Sensors 2016, 16, 820.
- Teledyne FLIR. FREE—FLIR Thermal Dataset. Available online: https://oem.flir.com/solutions/automotive/adas-dataset-form/ (accessed on 20 August 2025).
- Zhang, Y.; Xu, C.; Yang, W.; He, G.; Yu, H.; Yu, L.; Xia, G. Drone-Based RGBT Tiny Person Detection. ISPRS J. Photogramm. Remote Sens. 2023, 204, 61–76.
- Bongini, F.; Berlincioni, L.; Bertini, M.; Bimbo, A.D. Partially fake it till you make it: Mixing real and fake thermal images for improved object detection. arXiv 2021, arXiv:2106.13603.
- Madan, N.; Siemon, M.S.N.; Gjerde, M.K.; Petersson, B.S.; Grotuzas, A.; Esbensen, M.A.; Nikolov, I.A.; Philipsen, M.P.; Nasrollahi, K.; Moeslund, T.B. ThermalSynth: A Novel Approach for Generating Synthetic Thermal Human Scenarios. In Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), Waikoloa, HI, USA, 3–7 January 2023; pp. 130–139.
- Qazi, T.; Lall, B.; Mukherjee, P. ThermalDiff: A diffusion architecture for thermal image synthesis. J. Vis. Commun. Image Represent. 2025, 111, 104524.
- Bianchi, L.; Bechini, M.; Quirino, M.; Lavagna, M. Synthetic thermal image generation and processing for close proximity operations. Acta Astronaut. 2025, 226, 611–625.
- Pavez, V.; Hermosilla, G.; Silva, M.; Farias, G. Advanced Deep Learning Techniques for High-Quality Synthetic Thermal Image Generation. Mathematics 2023, 11, 4446.
- Thinh Vo, D.; Nguyen Phong, L.; Nguyen Quoc, H.; Nguyen Nhu, T.; Phan Anh, D.; Anh Tu, N.; Ninh, H.; Tran Tien, H. A generative model for synthetic thermal infrared images. J. Electron. Imaging 2024, 33, 053047.
- Liu, P.; Li, F.; Li, W. Unsupervised Image-generation Enhanced Adaptation for Object Detection in Thermal images. arXiv 2021, arXiv:2002.06770.
- Vasile, C.-E.; Ulmamei, A.-A.; Bira, C. Image Processing Hardware Acceleration—A Review of Operations Involved and Current Hardware Approaches. J. Imaging 2024, 10, 298.
- Guerrouj, F.Z.; Florez, S.R.; Ouardi, A.E.; Abouzahir, M.; Ramzi, M. Quantized Object Detection for Real-Time Inference on Embedded GPU Architectures. Int. J. Adv. Comput. Sci. Appl. 2025, 20.
- Liu, S.; Zha, J.; Sun, J.; Li, Z.; Wang, G. EdgeYOLO: An Edge-Real-Time Object Detector. arXiv 2023, arXiv:2302.07483.
- Xue, C.; Xia, Y.; Wu, M.; Chen, Z.; Cheng, F.; Yun, L. EL-YOLO: An efficient and lightweight low-altitude aerial objects detector for onboard applications. Expert Syst. Appl. 2024, 256, 124848.
- Archet, A.; Gac, N.; Orieux, F.; Ventroux, N. Embedded AI performances of Nvidia’s Jetson Orin SoC series. In Proceedings of the 17ème Colloque National du GDR SOC2, Lyon, France, 13–14 June 2023.
- Doan, T.N.; Phan, T.H. A Novel Smart System with Jetson Nano for Remote Insect Monitoring. Int. J. Adv. Comput. Sci. Appl. 2024, 1002.
- Gomes, H.; Redinha, N.; Lavado, N.; Mendes, M. Counting People and Bicycles in Real Time Using YOLO on Jetson Nano. Energies 2022, 15, 8816.
- Lazarevich, I.; Grimaldi, M.; Kumar, R.; Mitra, S.; Khan, S.; Sah, S. YOLOBench: Benchmarking Efficient Object Detectors on Embedded Systems. arXiv 2023, arXiv:2307.13901.
- Zhu, P.; Wen, L.; Du, D.; Bian, X.; Fan, H.; Hu, Q.; Ling, H. Detection and Tracking Meet Drones Challenge. IEEE TPAMI 2022, 44, 7380–7399.
- Lannan, N.; Zhou, L.; Fan, G. A Multiview Depth-based Motion Capture Benchmark Dataset. In Proceedings of the CVPR Workshops, New Orleans, LA, USA, 19–20 June 2022; pp. 426–435.
- Davis, J.W.; Keck, M.A. A Two-Stage Template Approach to Person Detection in Thermal Imagery. In Proceedings of the IEEE Workshops on Applications of Computer Vision, Breckenridge, CO, USA, 5–7 January 2005; Volume 1, pp. 364–369.
- Davis, J.W.; Sharma, V. Background-subtraction using contour-based fusion of thermal and visible imagery. Comput. Vis. Image Underst. 2007, 106, 162–182.
- Bondi, E.; Jain, R.; Aggrawal, P.; Anand, S.; Hannaford, R.; Kapoor, A.; Piavis, J.; Shah, S.; Joppa, L.; Dilkina, B.; et al. BIRDSAI: A Dataset for Detection and Tracking in Aerial Thermal Infrared Videos. In Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA, 1–5 March 2020; pp. 1736–1745.
- Suo, J.; Wang, T.; Zhang, X.; Chen, H.; Zhou, W.; Shi, W. HIT-UAV: High-altitude Infrared Thermal Dataset. Sci. Data 2023, 10, 227.
- Lynen, S.; Portmann, J. Thermal Infrared Dataset. 2014. Available online: https://projects.asl.ethz.ch/datasets/doku.php?id=ir:iricra2014 (accessed on 25 August 2025).
- Jia, X.; Zhu, C.; Li, M.; Tang, W.; Liu, S.; Zhou, W. LLVIP: Visible-Infrared Paired Dataset. arXiv 2023, arXiv:2108.10831.
- Ramírez-Ayala, O.; González-Hernández, I.; Salazar, S.; Flores, J.; Lozano, R. Real-Time Person Detection in Wooded Areas Using Thermal Images from an Aerial Perspective. Sensors 2023, 23, 9216.
- CVAT.ai Corporation. Computer Vision Annotation Tool (CVAT). 2023. Available online: https://www.cvat.ai/ (accessed on 25 August 2025).
- Skalski, P. Make Sense. 2019. Available online: https://github.com/SkalskiP/make-sense/ (accessed on 28 August 2025).
- Jocher, G.; Qiu, J.; Chaurasia, A. Ultralytics YOLO. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 26 August 2025).
- Hosang, J.; Benenson, R.; Dollár, P.; Schiele, B. Effective Detection Proposals. IEEE TPAMI 2016, 38, 814.
- NVIDIA Corporation. Jetson Orin NX & AGX Orin—Power Modeling. Available online: https://docs.nvidia.com/jetson/archives/r34.1/DeveloperGuide/text/SD/PlatformPowerAndPerformance/JetsonOrinNxSeriesAndJetsonAgxOrinSeries.html (accessed on 26 August 2025).
- Passalis, N. jetson-power. 2022. Available online: http://github.com/opendr-eu/jetson_power (accessed on 27 August 2025).
- Archet, A.; Gac, N.; Orieux, F.; Ventroux, N. Embedded AI on Jetson Orin SoCs. In Proceedings of the 17ème Colloque GDR SOC2, Lyon, France, 12–14 June 2023.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
- Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture. arXiv 2018, arXiv:1807.10165.
- Diakogiannis, F.I.; Waldner, F.; Caccetta, P.; Wu, C. ResUNet-a: A framework for semantic segmentation of remotely sensed data. ISPRS J. Photogramm. Remote Sens. 2020, 162, 94–114.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. arXiv 2014, arXiv:1409.4842.
- Cai, S.; Tian, Y.; Lui, H.; Zeng, H.; Wu, Y.; Chen, G. Dense-UNet: A novel multiphoton in vivo cellular image segmentation model. Quant. Imaging Med. Surg. 2020, 1275.
- Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999.
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. arXiv 2019, arXiv:1709.01507.
- Wang, Q.; Liu, F.; Cao, Y.; Ullah, F.; Zhou, M. LFIR-YOLO: Lightweight Model for Infrared Vehicle and Pedestrian Detection. Sensors 2024, 24, 6609.
- Liu, S.; He, H.; Zhang, Z.; Zhou, Y. LI-YOLO: An Object Detection Algorithm for UAV Aerial Images in Low-Illumination Scenes. Drones 2024, 8, 653.
- Zhao, X.; Xia, Y.; Zhang, W.; Zheng, C.; Zhang, Z. YOLO-ViT-Based Method for Unmanned Aerial Vehicle Infrared Vehicle Target Detection. Remote Sens. 2023, 15, 3778.
- Wu, Y.; Liu, X.; Hao, J.; Xing, Z.; Xi, H.; Sun, W.; Yao, Q. MBAF-MSCNet: Multi-Branch Fusion for Multi-Scale Context in Infrared Wildlife Detection. IEEE Access 2025, 13, 165830–165843.
- Zhang, X.; Feng, Y.; Wang, N.; Lu, G.; Mei, S. Transformer-Based Person Detection in Paired RGB-T Aerial Images With VTSaR Dataset. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 5082–5099.








| Model | AP50 (%) | AP[50–95] (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| yolov8m | 84.13 ± 0.06 | 47.81 ± 0.04 | 85.81 ± 0.09 | 77.05 ± 0.12 |
| yolov8x | 85.17 ± 0.14 | 49.10 ± 0.09 | 86.43 ± 0.22 | 78.40 ± 0.19 |
| yolov9e | 84.39 ± 0.16 | 48.31 ± 0.08 | 85.95 ± 0.17 | 77.49 ± 0.18 |
| yolov10m | 84.15 ± 0.24 | 48.70 ± 0.22 | 85.61 ± 0.16 | 76.09 ± 0.30 |
| yolov10x | 84.45 ± 0.08 | 49.00 ± 0.05 | 85.89 ± 0.21 | 76.58 ± 0.20 |
| rtdetr-l | 74.98 ± 2.59 | 35.76 ± 1.27 | 76.24 ± 2.41 | 68.26 ± 2.21 |
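The ± values above summarize variability across repeated training runs. A minimal sketch of how such mean ± sample-standard-deviation summaries are typically produced; the number of runs and the example scores are assumptions, not the paper's data:

```python
import statistics

def summarize(values):
    """Format repeated-run scores as 'mean ± sample std' with two decimals."""
    return f"{statistics.mean(values):.2f} ± {statistics.stdev(values):.2f}"

# e.g. AP50 from three hypothetical training seeds
print(summarize([84.0, 84.2, 84.2]))  # 84.13 ± 0.12
```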
| Model | Format | AP[50–95] | AP Loss vs. PyTorch | AR (opt. conf.) | AR Loss vs. PyTorch |
|---|---|---|---|---|---|
| custom YOLO v8 x | ONNX | 49.0 | −0.72% | 86.3 | −0.49% |
| custom YOLO v8 x | TensorRT | 49.0 | −0.72% | 86.3 | −0.49% |
| custom YOLO v9 e | ONNX | 48.2 | −0.83% | 77.1 | −0.69% |
| custom YOLO v9 e | TensorRT | 48.1 | −0.85% | 77.0 | −0.71% |
| custom YOLO v10 x | ONNX | 48.7 | −0.55% | 76.1 | −0.51% |
| custom YOLO v10 x | TensorRT | 48.8 | −0.52% | 76.2 | −0.49% |
| custom YOLO v8 m | ONNX | 47.8 | −0.65% | 76.7 | −0.65% |
| custom YOLO v8 m | TensorRT | 47.8 | −0.65% | 76.7 | −0.65% |
| custom YOLO v10 m | ONNX | 48.6 | −0.91% | 75.7 | −0.75% |
| custom YOLO v10 m | TensorRT | 48.5 | −0.92% | 75.7 | −0.75% |
| Model | Format | AP[50–95] | AP Loss vs. PyTorch | AR (opt. conf.) | AR Loss vs. TRT TF32 | Time Gain vs. TRT TF32 | Energy Difference vs. TRT TF32 |
|---|---|---|---|---|---|---|---|
| custom YOLO v8 x | TRT FP16 | 49.3 | −0.06% | 86.7 | −0.04% | 40.48% | 51.72% |
| custom YOLO v8 x | TRT INT8 | 44.9 | −8.55% | 78.0 | −10.82% | 51.45% | 75.34% |
| custom YOLO v9 e | TRT FP16 | 48.6 | 0.01% | 77.6 | 0.03% | 37.19% | 50.53% |
| custom YOLO v9 e | TRT INT8 | 35.5 | −26.36% | 50.3 | −30.61% | 33.21% | 94.79% |
| custom YOLO v10 x | TRT FP16 | 48.9 | −0.09% | 76.4 | −0.07% | 34.97% | 49.92% |
| custom YOLO v10 x | TRT INT8 | 44.5 | −8.82% | 69.8 | −11.78% | 29.86% | 72.74% |
| custom YOLO v8 m | TRT FP16 | 48.1 | −0.03% | 77.2 | 0.00% | 35.48% | 53.38% |
| custom YOLO v8 m | TRT INT8 | 42.8 | −13.43% | 61.7 | −21.07% | 40.03% | 71.61% |
| custom YOLO v10 m | TRT FP16 | 49.0 | −0.03% | 76.3 | 0.00% | 26.94% | 51.90% |
| custom YOLO v10 m | TRT INT8 | 40.3 | −18.90% | 60.9 | −22.29% | −10.57% | 65.65% |
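The relative-loss and gain columns above follow the usual signed-percentage-change convention. A small sketch; the inputs below are illustrative rather than reproduced from the tables, whose baselines carry more decimal places than shown:

```python
def relative_change(value: float, baseline: float) -> float:
    """Signed change of `value` relative to `baseline`, in percent.

    Negative means a loss (e.g. accuracy after quantization)."""
    return (value - baseline) / baseline * 100.0

def time_gain(latency_ms: float, baseline_ms: float) -> float:
    """Latency reduction relative to the baseline, in percent."""
    return (baseline_ms - latency_ms) / baseline_ms * 100.0

# Illustrative: an INT8 engine scoring 44.9 AP against a 49.4 AP baseline,
# and an FP16 engine at 134.4 ms against a 259.82 ms baseline.
print(round(relative_change(44.9, 49.4), 2))  # -9.11
print(round(time_gain(134.4, 259.82), 2))     # 48.27
```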
| Method | Loss Function | Precision | Recall | F1-Score | F2-Score |
|---|---|---|---|---|---|
| U-Net | BASNet | 0.9141 | 0.8705 | 0.8918 | 0.8789 |
| U-Net++ | Cross-Entropy | 0.9068 | 0.8799 | 0.8931 | 0.8851 |
| Residual U-Net | BCE Dice | 0.9096 | 0.8802 | 0.8946 | 0.8859 |
| Inception U-Net | BASNet | 0.9194 | 0.8738 | 0.8960 | 0.8825 |
| Dense U-Net | BASNet | 0.9191 | 0.8811 | 0.8997 | 0.8884 |
| Attention U-Net | BCE Dice | 0.9078 | 0.9185 | 0.9131 | 0.9163 |
| SE U-Net | BASNet | 0.9072 | 0.8805 | 0.8937 | 0.8857 |
| Phase | GPU (ms) | CPU (ms) |
|---|---|---|
| Pre-processing | 1.03 | 1.03 |
| Inference | 16.82 | 2080 |
| Post-processing | 3.2 | 3.2 |
| Total | 21.06 | 2084 |
| FPS | 47.47 | 0.5 |

| Metric | No Limit | Max 15 W |
|---|---|---|
| Energy over idle (avg) | 0.70143 J | 1.3213 J |
| Pre-processing time (avg) | 3.04 ms | 7.21 ms |
| Inference time (avg) | 33 ms | 43 ms |
| Post-processing time | 1.03 ms | 2.43 ms |
| Average FPS | 27 | 18.99 |
| Dataset | Method | mAP50 | Reference |
|---|---|---|---|
| FLIR ADAS | LFIR-YOLO | 0.72 | Wang et al., 2024 [71] |
| FLIR ADAS | YOLOv9t | 0.72 | Wang et al., 2024 [71] |
| FLIR ADAS | Our YOLOv10 (subset) | 0.70 | This work |
| LLVIP | LI-YOLO | 0.90 | Liu et al., 2024 [72] |
| LLVIP | YOLOv8 | 0.90 | Liu et al., 2024 [72] |
| LLVIP | Our YOLOv10 (subset) | 0.85 | This work |
| HIT-UAV | YOLO-ViT | 0.93 | Zhao et al., 2023 [73] |
| HIT-UAV | YOLOv7s | 0.93 | Suo et al., 2023 [53] |
| HIT-UAV | Our YOLOv10 (subset) | 0.88 | This work |
| BIRDSAI | MBAF-MSC | 0.93 | Wu et al., 2025 [74] |
| BIRDSAI | Our YOLOv10 (subset) | 0.89 | This work |
| RGBTDronePerson | VTSaR-Trans | 0.94 | Zhang et al., 2025 [75] |
| RGBTDronePerson | YOLOv5s | 0.94 | Zhang et al., 2023 [32] |
| RGBTDronePerson | Our YOLOv10 (subset) | 0.90 | This work |
| Meta-dataset (ours) | Our YOLOv9e | 0.85 | This work |
| Mountain (ours) | Our YOLOv11 | 0.80 | This work |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ulmămei, A.-A.; D’Adamo, T.; Vasile, C.-E.; Hobincu, R. Human Detection in UAV Thermal Imagery: Dataset Extension and Comparative Evaluation on Embedded Platforms. J. Imaging 2025, 11, 436. https://doi.org/10.3390/jimaging11120436

