MVPOD: A Dataset and Benchmark for Multi-Vertical-Perspective Object Detection in Multi-Platform Remote Sensing Images
Abstract
1. Introduction
2. Review of Normal Object Detection
2.1. Two-Stage Object Detection Methods
2.2. One-Stage Object Detection Methods
3. Review of Rotated Object Detection
3.1. Methods Based on Rotated Rectangular Boxes
3.2. Methods Based on Quadrilateral Bounding Boxes
3.3. Methods Based on Point Set Representation
4. Proposed MVPOD Dataset
4.1. Category Information
4.2. Data Collection
4.3. Annotation Types
4.4. Dataset Characteristics
5. Experiments and Analysis
5.1. Implementation Details
5.2. Evaluation Metrics
5.3. Object Detection Benchmark
5.3.1. Experimental Results
5.3.2. Visualization
5.4. Rotated Object Detection Benchmark
5.4.1. Experimental Results
5.4.2. Visualization
5.5. Vertical Perspective Contrast Experiment
6. Future Work
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A


References
- Lang, C.; Cheng, G.; Wu, J.; Li, Z.; Xie, X.; Li, J.; Han, J. Toward Open-World Remote Sensing Imagery Interpretation: Past, present, and future. IEEE Geosci. Remote Sens. Mag. 2024, 2–38.
- Han, J.; Ding, J.; Li, J.; Xia, G.S. Align deep features for oriented object detection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5602511.
- Wu, X.; Li, W.; Hong, D.; Tao, R.; Du, Q. Deep learning for unmanned aerial vehicle-based object detection and tracking: A survey. IEEE Geosci. Remote Sens. Mag. 2021, 10, 91–124.
- Ma, X.; Ouyang, W.; Simonelli, A.; Ricci, E. 3d object detection from images for autonomous driving: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 3537–3556.
- Cheng, G.; Han, J.; Zhou, P.; Xu, D. Learning rotation-invariant and fisher discriminative convolutional neural networks for object detection. IEEE Trans. Image Process. 2018, 28, 265–278.
- Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. Proc. IEEE 2023, 111, 257–276.
- Zhao, Z.Q.; Zheng, P.; Xu, S.T.; Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3212–3232.
- Faraji, H.; Chen, B. Drone-yolo: Improved yolo for small object detection in uav. In Proceedings of the 2023 8th International Conference on Image, Vision and Computing (ICIVC), Dalian, China, 27–29 July 2023; pp. 93–100.
- Zhang, H.; Liu, K.; Gan, Z.; Zhu, G.N. UAV-DETR: Efficient End-to-End Object Detection for Unmanned Aerial Vehicle Imagery. arXiv 2025, arXiv:2501.01855.
- Li, X.; Diao, W.; Mao, Y.; Li, X.; Sun, X. SCLNet: A Scale-Robust Complementary Learning Network for Object Detection in UAV Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5651119.
- Wen, L.; Cheng, Y.; Fang, Y.; Li, X. A comprehensive survey of oriented object detection in remote sensing images. Expert Syst. Appl. 2023, 224, 119960.
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3974–3983.
- Zhang, S.; Long, J.; Xu, Y.; Mei, S. PMHO: Point-Supervised Oriented Object Detection Based on Segmentation-Driven Proposal Generation. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5638118.
- Li, Z.; Hou, B.; Wu, Z.; Ren, B.; Ren, Z.; Jiao, L. Gaussian Synthesis for High-Precision Location in Oriented Object Detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5619612.
- Zhou, J.; Li, W.; Cao, Y.; Cai, H.; Huang, T.; Xia, G.S.; Li, X. Few-Shot Oriented Object Detection in Remote Sensing Images via Memorable Contrastive Learning. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5630814.
- Zhou, S.; Liu, Z.; Luo, H.; Qi, G.; Liu, Y.; Zuo, H.; Zhang, J.; Wei, Y. GCA2Net: Global-Consolidation and Angle-Adaptive Network for Oriented Object Detection in Aerial Imagery. Remote Sens. 2025, 17, 1077.
- Wang, X.; Han, C.; Huang, L.; Nie, T.; Liu, X.; Liu, H.; Li, M. AG-Yolo: Attention-Guided Yolo for Efficient Remote Sensing Oriented Object Detection. Remote Sens. 2025, 17, 1027.
- Zhang, Y.; Yuan, Y.; Feng, Y.; Lu, X. Hierarchical and Robust Convolutional Neural Network for Very High-Resolution Remote Sensing Object Detection. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5535–5548.
- Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2020, 159, 296–307.
- Long, Y.; Gong, Y.; Xiao, Z.; Liu, Q. Accurate Object Localization in Remote Sensing Images Based on Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2486–2498.
- Qu, S.; Dang, C.; Chen, W.; Liu, Y. SMA-YOLO: An Improved YOLOv8 Algorithm Based on Parameter-Free Attention Mechanism and Multi-Scale Feature Fusion for Small Object Detection in UAV Images. Remote Sens. 2025, 17, 2421.
- Mao, Y.; Zhang, H.; Li, R.; Zhu, F.; Sun, R.; Ji, P. HSF-DETR: Hyper Scale Fusion Detection Transformer for Multi-Perspective UAV Object Detection. Remote Sens. 2025, 17, 1997.
- Chen, Y.; Ye, Z.; Sun, H.; Gong, T.; Xiong, S.; Lu, X. Global–Local Fusion With Semantic Information Guidance for Accurate Small Object Detection in UAV Aerial Images. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4701115.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
- Everingham, M.; Eslami, S.A.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes challenge: A retrospective. Int. J. Comput. Vis. 2015, 111, 98–136.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv 2015, arXiv:1506.01497.
- Cai, Z.; Vasconcelos, N. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6154–6162.
- Jocher, G. YOLOv5 by Ultralytics. 2020. Available online: https://docs.ultralytics.com/models/yolov5/ (accessed on 15 June 2025).
- Jocher, G.; Qiu, J.; Chaurasia, A. Ultralytics YOLO. 2023. Available online: https://docs.ultralytics.com/zh/models/yolov8/ (accessed on 16 June 2025).
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229.
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. Detrs beat yolos on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–18 June 2024; pp. 16965–16974.
- Xiao, Y.; Xu, T.; Xin, Y.; Li, J. FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Singapore, 20–27 January 2025; Volume 39, pp. 8673–8681.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Girshick, R. Fast r-cnn. arXiv 2015, arXiv:1504.08083.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Zhang, H.; Chang, H.; Ma, B.; Wang, N.; Chen, X. Dynamic R-CNN: Towards high quality object detection via dynamic training. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 260–275.
- Sun, P.; Zhang, R.; Jiang, Y.; Kong, T.; Xu, C.; Zhan, W.; Tomizuka, M.; Li, L.; Yuan, Z.; Wang, C.; et al. Sparse r-cnn: End-to-end object detection with learnable proposals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14454–14463.
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430.
- Lyu, C.; Zhang, W.; Huang, H.; Zhou, Y.; Wang, Y.; Liu, Y.; Zhang, S.; Chen, K. RTMDet: An Empirical Study of Designing Real-Time Object Detectors. arXiv 2022, arXiv:2212.07784.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. arXiv 2017, arXiv:1708.02002.
- Law, H.; Deng, J. Cornernet: Detecting objects as paired keypoints. In Proceedings of the 15th European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 765–781.
- Zhou, X.; Wang, D.; Krähenbühl, P. Objects as Points. arXiv 2019, arXiv:1904.07850.
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully convolutional one-stage object detection. arXiv 2019, arXiv:1904.01355.
- Yang, Z.; Liu, S.; Hu, H.; Wang, L.; Lin, S. Reppoints: Point set representation for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9657–9666.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010.
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv 2020, arXiv:2010.04159.
- Chen, Q.; Chen, X.; Wang, J.; Zhang, S.; Yao, K.; Feng, H.; Han, J.; Ding, E.; Zeng, G.; Wang, J. Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Paris, France, 2–3 October 2023.
- Zhang, H.; Li, F.; Liu, S.; Zhang, L.; Su, H.; Zhu, J.; Ni, L.M.; Shum, H.Y. DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection. arXiv 2022, arXiv:2203.03605.
- Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; Xue, X. Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimed. 2018, 20, 3111–3122.
- Wang, K.; Wang, Z.; Li, Z.; Su, A.; Teng, X.; Liu, M.; Yu, Q. Oriented object detection in optical remote sensing images using deep learning: A survey. arXiv 2023, arXiv:2302.10473.
- Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning RoI Transformer for Oriented Object Detection in Aerial Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 2849–2858.
- Yang, X.; Yan, J. Arbitrary-oriented object detection with circular smooth label. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 677–694.
- Yang, X.; Yang, X.; Yang, J.; Ming, Q.; Wang, W.; Tian, Q.; Yan, J. Learning High-Precision Bounding Box for Rotated Object Detection via Kullback-Leibler Divergence. Adv. Neural Inf. Process. Syst. 2021, 34, 18381–18394.
- Yang, X.; Yan, J.; Ming, Q.; Wang, W.; Zhang, X.; Tian, Q. Rethinking rotated object detection with gaussian wasserstein distance loss. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 11830–11841.
- Yang, X.; Yan, J.; Feng, Z.; He, T. R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 3163–3171.
- Han, J.; Ding, J.; Xue, N.; Xia, G.S. Redet: A rotation-equivariant detector for aerial object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2786–2795.
- Yang, X.; Zhou, Y.; Zhang, G.; Yang, J.; Wang, W.; Yan, J.; Zhang, X.; Tian, Q. The KFIoU loss for rotated object detection. arXiv 2022, arXiv:2201.12558.
- Hou, L.; Lu, K.; Xue, J.; Li, Y. Shape-Adaptive Selection and Measurement for Oriented Object Detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 22 February–1 March 2022.
- Yu, Y.; Da, F. Phase-Shifting Coder: Predicting Accurate Orientation in Oriented Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023.
- Xu, Y.; Fu, M.; Wang, Q.; Wang, Y.; Chen, K.; Xia, G.S.; Bai, X. Gliding vertex on the horizontal bounding box for multi-oriented object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 1452–1459.
- Qian, W.; Yang, X.; Peng, S.; Yan, J.; Guo, Y. Learning modulated loss for rotated object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 19–21 May 2021; Volume 35, pp. 2458–2466.
- Guo, Z.; Liu, C.; Zhang, X.; Jiao, J.; Ji, X.; Ye, Q. Beyond Bounding-Box: Convex-hull Feature Adaptation for Oriented and Densely Packed Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021.
- Li, W.; Chen, Y.; Hu, K.; Zhu, J. Oriented reppoints for aerial object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1829–1838.
- Zhou, Q.; Yu, C. Point rcnn: An angle-free framework for rotated object detection. Remote Sens. 2022, 14, 2605.
- Zhao, Z.; Xue, Q.; He, Y.; Bai, Y.; Wei, X.; Gong, Y. Projecting Points to Axes: Oriented Object Detection via Point-Axis Representation. arXiv 2024, arXiv:2407.08489.
- Cheng, G.; Han, J.; Zhou, P.; Guo, L. Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS J. Photogramm. Remote Sens. 2014, 98, 119–132.
- Zhu, H.; Chen, X.; Dai, W.; Fu, K.; Ye, Q.; Jiao, J. Orientation robust object detection in aerial images using deep convolutional neural network. In Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015; pp. 3735–3739.
- Liu, Z.; Wang, H.; Weng, L.; Yang, Y. Ship Rotated Bounding Box Space for Ship Extraction From High-Resolution Optical Satellite Images With Complex Backgrounds. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1074–1078.
- Lam, D.; Kuzma, R.; McGee, K.; Dooley, S.; Laielli, M.; Klaric, M.; Bulatov, Y.; McCord, B. xview: Objects in context in overhead imagery. arXiv 2018, arXiv:1802.07856.
- Ding, J.; Xue, N.; Xia, G.S.; Bai, X.; Yang, W.; Yang, M.Y.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; et al. Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges. arXiv 2021, arXiv:2102.12219.
- Zhu, P.; Wen, L.; Du, D.; Bian, X.; Fan, H.; Hu, Q.; Ling, H. Detection and Tracking Meet Drones Challenge. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7380–7399.
- Du, D.; Qi, Y.; Yu, H.; Yang, Y.; Duan, K.; Li, G.; Zhang, W.; Huang, Q.; Tian, Q. The unmanned aerial vehicle benchmark: Object detection and tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 370–386.
- Tzutalin. LabelImg. 2015. Available online: https://github.com/tzutalin/labelImg (accessed on 30 May 2025).
- cgvict. roLabelImg. 2020. Available online: https://github.com/cgvict/roLabelImg (accessed on 30 May 2025).
- Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open MMLab Detection Toolbox and Benchmark. arXiv 2019, arXiv:1906.07155.
- Jocher, G. Ultralytics. 2022. Available online: https://github.com/ultralytics/ultralytics (accessed on 18 June 2025).
- Zhou, Y.; Yang, X.; Zhang, G.; Wang, J.; Liu, Y.; Hou, L.; Jiang, X.; Liu, X.; Yan, J.; Lyu, C.; et al. MMRotate: A Rotated Object Detection Benchmark using PyTorch. In Proceedings of the 30th ACM International Conference on Multimedia, Lisbon, Portugal, 10–14 October 2022; pp. 7331–7334.
- Hu, S.M.; Liang, D.; Yang, G.Y.; Yang, G.W.; Zhou, W.Y. Jittor: A novel deep learning framework with meta-operators and unified graph execution. Sci. China Inf. Sci. 2020, 63, 222103.
- Yuan, Y.; Zhang, Y. OLCN: An Optimized Low Coupling Network for Small Objects Detection. IEEE Geosci. Remote Sens. Lett. 2022, 19, 8022005.
- Xu, C.; Wang, J.; Yang, W.; Yu, H.; Yu, L.; Xia, G.S. RFLA: Gaussian receptive field based label assignment for tiny object detection. In Proceedings of the European Conference on Computer Vision. Springer, Tel Aviv, Israel, 23–27 October 2022; pp. 526–543.
- Zhang, Z. Drone-YOLO: An Efficient Neural Network Method for Target Detection in Drone Images. Drones 2023, 7, 526.
- Zhang, Y.; Ye, M.; Zhu, G.; Liu, Y.; Guo, P.; Yan, J. FFCA-YOLO for Small Object Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5611215.
- Liu, D.; Zhang, J.; Qi, Y.; Wu, Y.; Zhang, Y. A Tiny Object Detection Method Based on Explicit Semantic Guidance for Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2024, 21, 6005405.
- Liu, D.; Zhang, J.; Qi, Y.; Xi, Y.; Jin, J. Exploring Lightweight Structures for Tiny Object Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5623215.

| Platform | Dataset | Categories | Images | Instances | Bounding Box Type | Vertical Perspective | Year |
|---|---|---|---|---|---|---|---|
| Spaceborne | NWPU VHR-10 [68] | 10 | 800 | 3775 | horizontal | NO | 2014 |
| Spaceborne | UCAS-AOD [69] | 2 | 910 | 6029 | horizontal | NO | 2015 |
| Spaceborne | HRSC2016 [70] | 1 | 1070 | 2976 | oriented | NO | 2016 |
| Spaceborne | RSOD [20] | 4 | 976 | 6950 | horizontal | NO | 2017 |
| Spaceborne | HRRSD [18] | 13 | 21,761 | 55,740 | horizontal | NO | 2017 |
| Spaceborne | DOTA [12] | 15 | 2806 | 188,282 | oriented | NO | 2017 |
| Spaceborne | DIOR [19] | 20 | 23,463 | 192,472 | horizontal | NO | 2018 |
| Spaceborne | xView [71] | 60 | 1127 | 1,000,000 | horizontal | NO | 2021 |
| Spaceborne | ODAI [72] | 18 | 11,268 | 1,793,658 | oriented | NO | 2021 |
| Airborne | VisDrone2021 [73] | 10 | 10,209 | 54,200 | horizontal | NO | 2021 |
| Airborne | UAVDT [74] | 4 | 77,819 | 835,879 | horizontal | NO | 2018 |
| Ground-based | COCO [24] | 80 | 123,287 | 886,266 | horizontal | NO | 2014 |
| Ground-based | VOC [25] | 20 | 21,503 | 52,090 | horizontal | NO | 2012 |
| Multi-Platform | MVPOD | 8 | 10,470 | 15,467 | horizontal & oriented | YES | 2025 |

| Vertical Perspective | Airplane | Car | Bus | Truck | Carrier | Cargoship | Warship | Bridge |
|---|---|---|---|---|---|---|---|---|
| nadir-downward | 1489 | 292 | 25 | 30 | 185 | 1867 | 1455 | 1663 |
| oblique-downward | 288 | 3380 | 408 | 801 | 413 | 291 | 187 | 202 |
| horizontal | 165 | 0 | 693 | 671 | 106 | 319 | 308 | 0 |
| oblique-upward | 206 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| nadir-upward | 23 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
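
The per-perspective counts in the table above can be cross-checked against the dataset comparison table with a few lines of arithmetic. The sketch below assumes the cells are instance counts (the table itself does not say so); that reading is supported by the grand total matching the 15,467 instances listed for MVPOD. The names `CATEGORIES`, `COUNTS`, `totals_by_perspective`, and `totals_by_category` are illustrative and do not come from the paper.

```python
# Counts transcribed from the per-perspective table above.
# Assumption: each cell is a number of annotated instances.
CATEGORIES = ["Airplane", "Car", "Bus", "Truck",
              "Carrier", "Cargoship", "Warship", "Bridge"]

COUNTS = {
    "nadir-downward":   [1489, 292, 25, 30, 185, 1867, 1455, 1663],
    "oblique-downward": [288, 3380, 408, 801, 413, 291, 187, 202],
    "horizontal":       [165, 0, 693, 671, 106, 319, 308, 0],
    "oblique-upward":   [206, 0, 0, 0, 0, 0, 0, 0],
    "nadir-upward":     [23, 0, 0, 0, 0, 0, 0, 0],
}


def totals_by_perspective(counts):
    """Sum each row: instances per vertical perspective."""
    return {perspective: sum(row) for perspective, row in counts.items()}


def totals_by_category(counts, categories):
    """Sum each column: instances per object category."""
    return {
        category: sum(row[i] for row in counts.values())
        for i, category in enumerate(categories)
    }


per_perspective = totals_by_perspective(COUNTS)
per_category = totals_by_category(COUNTS, CATEGORIES)
grand_total = sum(per_perspective.values())

if __name__ == "__main__":
    for perspective, total in per_perspective.items():
        print(f"{perspective:17s} {total:6d}")
    print(f"{'all perspectives':17s} {grand_total:6d}")
```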

| Model | Backbone | Epochs | Params (M) | GFLOPs | FPS | mAP | AP50 | AP75 | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Two-Stage Object Detection Methods | | | | | | | | | | | | | | | | |
| Faster R-CNN [26] | R50 | 12 | 41 | 188 | 20 | 0.791 | 0.947 | 0.885 | 0.884 | 0.917 | 0.837 | 0.859 | 0.839 | 0.760 | 0.731 | 0.538 |
| Casc. R-CNN [27] | R50 | 12 | 69 | 216 | 12 | 0.837 | 0.957 | 0.917 | 0.870 | 0.945 | 0.921 | 0.902 | 0.872 | 0.825 | 0.781 | 0.580 |
| Sparse R-CNN [38] | R50 | 12 | 106 | 135 | 25 | 0.768 | 0.910 | 0.842 | 0.789 | 0.843 | 0.869 | 0.859 | 0.871 | 0.788 | 0.627 | 0.494 |
| OLCN [81] | R50 | 12 | 55 | 199 | 20 | 0.809 | 0.952 | 0.902 | 0.860 | 0.927 | 0.846 | 0.874 | 0.865 | 0.806 | 0.755 | 0.539 |
| RFLA [82] | R50 | 12 | 69 | 196 | 14 | 0.808 | 0.945 | 0.892 | 0.849 | 0.918 | 0.855 | 0.871 | 0.887 | 0.801 | 0.737 | 0.545 |
| One-Stage Object Detection Methods | | | | | | | | | | | | | | | | |
| YOLOv5s [28] | CSPDark53 | 200 | 7 | 23 | 87 | 0.821 | 0.954 | 0.902 | 0.873 | 0.946 | 0.875 | 0.895 | 0.890 | 0.784 | 0.749 | 0.560 |
| YOLOv8s [29] | CSPDark53 | 200 | 10 | 16 | 58 | 0.826 | 0.960 | 0.919 | 0.871 | 0.934 | 0.881 | 0.909 | 0.879 | 0.783 | 0.761 | 0.588 |
| RTMDet [41] | CSPNeXt | 200 | 9 | 15 | 37 | 0.864 | 0.959 | 0.920 | 0.913 | 0.941 | 0.925 | 0.930 | 0.923 | 0.851 | 0.832 | 0.602 |
| Drone-YOLO [83] | Darknet53 | 200 | 11 | 37 | 39 | 0.869 | 0.961 | 0.920 | 0.927 | 0.963 | 0.935 | 0.935 | 0.931 | 0.849 | 0.824 | 0.584 |
| FFCA-YOLO [84] | CSPDark53 | 200 | 5 | 37 | 86 | 0.785 | 0.953 | 0.915 | 0.808 | 0.905 | 0.884 | 0.860 | 0.818 | 0.740 | 0.705 | 0.559 |
| FBRT-YOLO [32] | CSPDark53 | 200 | 3 | 23 | 70 | 0.865 | 0.962 | 0.923 | 0.906 | 0.957 | 0.949 | 0.934 | 0.914 | 0.844 | 0.805 | 0.609 |
| ESG-TOD [85] | R50 | 36 | 33 | 387 | 19 | 0.747 | 0.913 | 0.850 | 0.816 | 0.875 | 0.800 | 0.792 | 0.813 | 0.727 | 0.673 | 0.477 |
| LTDNet [86] | RepVit | 36 | 5 | 29 | 45 | 0.746 | 0.915 | 0.849 | 0.818 | 0.872 | 0.799 | 0.791 | 0.818 | 0.725 | 0.671 | 0.477 |
| DINO [50] | R50 | 24 | 48 | 249 | 14 | 0.858 | 0.952 | 0.900 | 0.919 | 0.955 | 0.933 | 0.916 | 0.925 | 0.861 | 0.788 | 0.567 |
| RT-DETR [31] | R50 | 200 | 42 | 125 | 31 | 0.840 | 0.938 | 0.894 | 0.912 | 0.934 | 0.894 | 0.920 | 0.914 | 0.798 | 0.770 | 0.577 |
| UAV-DETR [9] | R50 | 200 | 34 | 103 | 35 | 0.850 | 0.938 | 0.899 | 0.919 | 0.959 | 0.918 | 0.927 | 0.924 | 0.775 | 0.781 | 0.598 |

| Model | Backbone | Epochs | Params (M) | GFLOPs | FPS | AP50 | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rotated Rectangular Box Methods | | | | | | | | | | | | | | |
| RoITransf. [53] | R50 | 12 | 55.3 | - | 13 | 0.853 | 0.909 | 0.908 | 0.895 | 0.876 | 0.882 | 0.807 | 0.794 | 0.757 |
| KLD [55] | R50 | 12 | 36 | 213 | 19 | 0.763 | 0.906 | 0.896 | 0.864 | 0.827 | 0.834 | 0.635 | 0.536 | 0.605 |
| GWD [56] | R50 | 12 | 36 | 213 | 20 | 0.762 | 0.905 | 0.897 | 0.869 | 0.824 | 0.836 | 0.616 | 0.546 | 0.603 |
| R3Det [57] | R50 | 12 | 42 | 332 | 17 | 0.782 | 0.905 | 0.908 | 0.879 | 0.836 | 0.797 | 0.681 | 0.614 | 0.633 |
| S2A-Net [2] | R50 | 12 | 36 | 197 | 20 | 0.845 | 0.907 | 0.908 | 0.893 | 0.868 | 0.929 | 0.775 | 0.768 | 0.713 |
| ReDet [58] | Re50 | 12 | 32 | - | 12 | 0.857 | 0.908 | 0.908 | 0.899 | 0.863 | 0.893 | 0.785 | 0.798 | 0.799 |
| SASM [60] | R50 | 12 | 37 | 194 | 18 | 0.755 | 0.893 | 0.895 | 0.737 | 0.652 | 0.806 | 0.621 | 0.692 | 0.740 |
| KFIoU [59] | R50 | 12 | 36 | 213 | 20 | 0.816 | 0.904 | 0.898 | 0.883 | 0.861 | 0.878 | 0.760 | 0.688 | 0.654 |
| PSC [61] | R50 | 12 | 36 | 215 | 19 | 0.764 | 0.906 | 0.908 | 0.857 | 0.837 | 0.848 | 0.616 | 0.545 | 0.597 |
| Quadrilateral Bounding Box Methods | | | | | | | | | | | | | | |
| GlidVertex [62] | R50 | 12 | 41 | - | 12 | 0.716 | 0.889 | 0.800 | 0.876 | 0.745 | 0.729 | 0.661 | 0.545 | 0.486 |
| RSDet [63] | R50 | 12 | 36 | - | 14 | 0.773 | 0.839 | 0.759 | 0.949 | 0.915 | 0.880 | 0.632 | 0.564 | 0.646 |
| Point Set Representation Methods | | | | | | | | | | | | | | |
| CFA [64] | R50 | 12 | 37 | 194 | 17 | 0.797 | 0.906 | 0.906 | 0.861 | 0.822 | 0.870 | 0.714 | 0.669 | 0.627 |
| OrRepPoints [65] | R50 | 12 | 37 | 194 | 18 | 0.823 | 0.899 | 0.905 | 0.868 | 0.807 | 0.847 | 0.791 | 0.729 | 0.735 |
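
In the rotated benchmark table above, the AP50 column appears to be the unweighted mean of the eight per-class columns C1 to C8. The table does not state this, so the sketch below treats it as an assumption and checks it for four rows. Since every cell is rounded to three decimals, a tolerance of 0.001 is used. `REPORTED`, `mean_per_class`, and `is_consistent` are hypothetical names introduced for illustration.

```python
# Rows transcribed from the rotated benchmark table above:
# model -> (reported AP50, per-class AP50 for C1..C8).
REPORTED = {
    "RoITransf.":  (0.853, [0.909, 0.908, 0.895, 0.876, 0.882, 0.807, 0.794, 0.757]),
    "S2A-Net":     (0.845, [0.907, 0.908, 0.893, 0.868, 0.929, 0.775, 0.768, 0.713]),
    "ReDet":       (0.857, [0.908, 0.908, 0.899, 0.863, 0.893, 0.785, 0.798, 0.799]),
    "OrRepPoints": (0.823, [0.899, 0.905, 0.868, 0.807, 0.847, 0.791, 0.729, 0.735]),
}

# Inputs and output are each rounded to 3 decimals, so the recomputed mean
# can differ from the reported value by up to about 0.001.
TOLERANCE = 0.001


def mean_per_class(per_class_ap):
    """Unweighted mean over categories."""
    return sum(per_class_ap) / len(per_class_ap)


def is_consistent(reported, per_class_ap, tol=TOLERANCE):
    """True if the reported score matches the class mean within rounding."""
    return abs(mean_per_class(per_class_ap) - reported) <= tol + 1e-12


results = {
    model: is_consistent(ap50, per_class)
    for model, (ap50, per_class) in REPORTED.items()
}

if __name__ == "__main__":
    for model, (ap50, per_class) in REPORTED.items():
        print(f"{model:12s} reported={ap50:.3f} "
              f"recomputed={mean_per_class(per_class):.4f} ok={results[model]}")
```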

| Detector | Training Perspective | Test Perspective | AP | Airplane | Car | Bus | Truck | Carrier | Cargoship | Warship | Bridge |
|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv5 | nadir-downward | nadir-downward | 0.689 | 0.858 | 0.857 | 0.533 | 0.617 | 0.716 | 0.711 | 0.633 | 0.591 |
| YOLOv5 | nadir-downward | oblique-downward | 0.391 | 0.765 | 0.77 | 0.328 | 0.261 | 0.111 | 0.266 | 0.0645 | 0.562 |
| YOLOv5 | nadir-downward | horizontal | 0.037 | 0.006 | - | 0.052 | 0.109 | 0.007 | 0.040 | 0.008 | - |
| YOLOv5 | oblique-downward | nadir-downward | 0.475 | 0.742 | 0.919 | 0.501 | 0.779 | 0.1 | 0.289 | 0.065 | 0.405 |
| YOLOv5 | oblique-downward | oblique-downward | 0.755 | 0.685 | 0.929 | 0.846 | 0.833 | 0.858 | 0.772 | 0.669 | 0.444 |
| YOLOv5 | oblique-downward | horizontal | 0.556 | 0.378 | - | 0.515 | 0.493 | 0.756 | 0.587 | 0.61 | - |
| YOLOv5 | horizontal | nadir-downward | 0.00392 | 0.003 | - | 0.000 | 0.003 | 0.002 | 0.016 | 0.000 | - |
| YOLOv5 | horizontal | oblique-downward | 0.295 | 0.047 | - | 0.035 | 0.039 | 0.634 | 0.455 | 0.560 | - |
| YOLOv5 | horizontal | horizontal | 0.793 | 0.634 | - | 0.882 | 0.885 | 0.789 | 0.787 | 0.783 | - |
| Faster R-CNN | nadir-downward | nadir-downward | 0.644 | 0.854 | 0.895 | 0.279 | 0.513 | 0.672 | 0.723 | 0.684 | 0.536 |
| Faster R-CNN | nadir-downward | oblique-downward | 0.391 | 0.760 | 0.741 | 0.320 | 0.197 | 0.199 | 0.289 | 0.093 | 0.529 |
| Faster R-CNN | nadir-downward | horizontal | 0.052 | 0.005 | - | 0.065 | 0.105 | 0.008 | 0.133 | 0.002 | - |
| Faster R-CNN | oblique-downward | nadir-downward | 0.484 | 0.708 | 0.924 | 0.684 | 0.746 | 0.102 | 0.307 | 0.070 | 0.333 |
| Faster R-CNN | oblique-downward | oblique-downward | 0.750 | 0.691 | 0.922 | 0.792 | 0.829 | 0.830 | 0.761 | 0.749 | 0.426 |
| Faster R-CNN | oblique-downward | horizontal | 0.621 | 0.484 | - | 0.566 | 0.536 | 0.837 | 0.635 | 0.668 | - |
| Faster R-CNN | horizontal | nadir-downward | 0.007 | 0.005 | - | 0.001 | 0.019 | 0.001 | 0.002 | 0.001 | - |
| Faster R-CNN | horizontal | oblique-downward | 0.367 | 0.036 | - | 0.122 | 0.164 | 0.681 | 0.532 | 0.669 | - |
| Faster R-CNN | horizontal | horizontal | 0.816 | 0.648 | - | 0.893 | 0.874 | 0.858 | 0.821 | 0.800 | - |

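
The overall AP values in the perspective contrast table above can be read as a train-by-test matrix per detector. As a rough sketch, the code below computes two derived quantities that the paper does not define: the gap between same-perspective AP and cross-perspective AP, and the mean AP over the three test perspectives for each training perspective. Only the AP column is used. All names (`AP`, `generalization_gap`, `mean_ap_across_tests`) are illustrative assumptions.

```python
# Overall AP transcribed from the contrast table above:
# detector -> training perspective -> test perspective -> AP.
AP = {
    "YOLOv5": {
        "nadir-downward":   {"nadir-downward": 0.689, "oblique-downward": 0.391, "horizontal": 0.037},
        "oblique-downward": {"nadir-downward": 0.475, "oblique-downward": 0.755, "horizontal": 0.556},
        "horizontal":       {"nadir-downward": 0.00392, "oblique-downward": 0.295, "horizontal": 0.793},
    },
    "Faster R-CNN": {
        "nadir-downward":   {"nadir-downward": 0.644, "oblique-downward": 0.391, "horizontal": 0.052},
        "oblique-downward": {"nadir-downward": 0.484, "oblique-downward": 0.750, "horizontal": 0.621},
        "horizontal":       {"nadir-downward": 0.007, "oblique-downward": 0.367, "horizontal": 0.816},
    },
}


def generalization_gap(ap_by_train):
    """AP lost when the test perspective differs from the training one.

    Returns {(train, test): same_perspective_ap - cross_perspective_ap}.
    """
    gaps = {}
    for train, by_test in ap_by_train.items():
        same_perspective_ap = by_test[train]
        for test, ap in by_test.items():
            if test != train:
                gaps[(train, test)] = same_perspective_ap - ap
    return gaps


def mean_ap_across_tests(ap_by_train):
    """Mean AP over all test perspectives for each training perspective."""
    return {
        train: sum(by_test.values()) / len(by_test)
        for train, by_test in ap_by_train.items()
    }


if __name__ == "__main__":
    for detector, ap_by_train in AP.items():
        means = mean_ap_across_tests(ap_by_train)
        best = max(means, key=means.get)
        print(f"{detector}: best mean AP when trained on {best} ({means[best]:.3f})")
        for (train, test), gap in generalization_gap(ap_by_train).items():
            print(f"  train={train:17s} test={test:17s} gap={gap:.3f}")
```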
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Jin, H.; Chen, J.; Zhang, Y.; Su, H.; Wang, B. MVPOD: A Dataset and Benchmark for Multi-Vertical-Perspective Object Detection in Multi-Platform Remote Sensing Images. Remote Sens. 2025, 17, 3029. https://doi.org/10.3390/rs17173029

