Hierarchical Dual-Model Detection Framework for Spotted Seals Using Deep Learning on UAVs
Simple Summary
Abstract
1. Introduction
- (1) A dual-model hierarchical detection framework is developed that integrates UAV-based lightweight detection with high-precision ground-station verification to achieve real-time monitoring and accurate population estimation of spotted seals.
- (2) A lightweight YOLOv10 [34] variant optimized for edge deployment is constructed, incorporating focal modulation networks (FocalNets) to enhance the detection of hard-to-recognize targets under limited onboard resources.
- (3) The ground-based YOLOv7 [35] model is enhanced with multi-scale feature pyramids and partial convolution, which strengthens small-target representation and suppresses background interference, achieving a practical balance between accuracy and efficiency.
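The two-stage flow described in contribution (1) can be illustrated with a minimal sketch. It assumes one plausible hand-off rule: a frame is forwarded to the ground station only when the onboard model reports at least one candidate, and only ground-station detections enter the count. The `Detection` type, the `hierarchical_count` function, the two thresholds, and the stub detectors are all hypothetical names introduced for illustration; the source does not specify the hand-off logic.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Detection:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    score: float                            # detector confidence in [0, 1]


# A detector maps one frame to a list of detections (hypothetical interface).
Detector = Callable[[str], List[Detection]]


def hierarchical_count(frames: Sequence[str], onboard: Detector, ground: Detector,
                       trigger_score: float = 0.25,
                       accept_score: float = 0.5) -> List[int]:
    """Return one verified count per frame (thresholds are assumed values)."""
    counts = []
    for frame in frames:
        # Stage 1 (UAV): cheap screening; any candidate above the trigger
        # threshold causes the frame to be forwarded.
        if not any(d.score >= trigger_score for d in onboard(frame)):
            counts.append(0)
            continue
        # Stage 2 (ground station): only verified detections are counted.
        counts.append(sum(d.score >= accept_score for d in ground(frame)))
    return counts


def fake_onboard(frame: str) -> List[Detection]:
    # Stand-in for the lightweight UAV model: one coarse, low-confidence hit.
    return [Detection((0, 0, 10, 10), 0.4)] if "seal" in frame else []


def fake_ground(frame: str) -> List[Detection]:
    # Stand-in for the high-precision model: one box per "seal" token.
    return [Detection((0, 0, 10, 10), 0.9) for _ in range(frame.count("seal"))]


counts = hierarchical_count(["water", "seal seal", "seal"], fake_onboard, fake_ground)
print(counts, sum(counts))
```

In this arrangement the onboard model only has to be sensitive enough to avoid dropping frames that contain seals, while the final estimate depends on the ground-station model alone.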
2. Materials and Methods
2.1. Study Area
2.2. Data Acquisition
2.3. Real-Time Spotted Seal Detection Model Based on Enhanced YOLOv10
2.3.1. Lightweight C2f Module
2.3.2. Focal Modulation Networks
- (1) Focal Modulation
- (2) Hierarchical Contextualization
- (3) Gated Aggregation
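The three components listed above match the structure of the general focal modulation formulation in the FocalNets literature. The equations below restate that general formulation as a reference sketch. The symbols ($f_z$, $f_g$, $q$, $h$, and the number of focal levels $L$) follow common FocalNets notation and are assumptions here, since the source does not give its own notation; the lightweight YOLOv10 variant may configure these operators differently.

```latex
% Hierarchical contextualization: project the input, then widen the
% receptive field level by level with depth-wise convolutions.
\begin{aligned}
Z^{0} &= f_z(X), \\
Z^{\ell} &= \mathrm{GeLU}\!\left(\mathrm{DWConv}\!\left(Z^{\ell-1}\right)\right), \quad \ell = 1, \dots, L, \\
Z^{L+1} &= \mathrm{AvgPool}\!\left(Z^{L}\right), \\
% Gated aggregation: per-location gates weight each focal level.
G &= f_g(X) \in \mathbb{R}^{H \times W \times (L+1)}, \\
Z^{\mathrm{out}} &= \sum_{\ell=1}^{L+1} G^{\ell} \odot Z^{\ell}, \\
% Focal modulation: the aggregated context modulates the query.
y_i &= q(x_i) \odot h\!\left(z_i^{\mathrm{out}}\right).
\end{aligned}
```

Here $X$ is the input feature map, $f_z$ and $f_g$ are linear projections, $q$ is the query projection, $h$ is a linear layer applied to the aggregated context, and $\odot$ denotes element-wise multiplication.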
2.4. Improved YOLOv7-Based Precision Detection Model for Spotted Seals
2.4.1. Small-Target Detection Layer
2.4.2. Partial Convolution
3. Results
3.1. Environmental Configuration and Evaluation Metrics
3.2. Selection of Baseline Models for UAVs and Ground Stations
3.3. Comparison and Ablation Experiments of UAV End Models
3.3.1. Comparison Experiment of Backbone Network
3.3.2. Comparison with Other UAV End Models
3.3.3. Ablation Experiment of UAV End Models
3.4. Comparison and Ablation Experiments of Ground-Station Models
3.4.1. Comparison with Other Ground-Station Models
3.4.2. Ablation Experiment of Ground-Station Models
3.5. Comparison of Detection Results Under Different Weather Conditions
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- McNeely, J.A.; Miller, K.R.; Reid, W.V.; Mittermeier, R.A.; Werner, T.B. Conserving the World’s Biological Diversity; IUCN: Gland, Switzerland; WRI: Washington, DC, USA; CI: Crystal City, VA, USA; WWF-US: Washington, DC, USA; World Bank: Washington, DC, USA, 1990.
- Allen, S.G.; Ainley, D.G.; Page, G.W. The effect of disturbance on harbor seal haul out patterns at Bolinas Lagoon, California. Fish. Bull. 1984, 82, 493.
- Zhuang, H.; Hou, L.; Wang, S.; Gao, Y.; Zhang, C.; Wang, Z.; Zhao, L.; He, Y.; Zhou, Q.; Lu, Z.; et al. Facial feature-based individual identification of spotted seals (Phoca largha). Acta Ecol. Sin. 2025, 45, 6586–6599. (In Chinese)
- Birenbaum, Z.; Do, H.; Horstmyer, L.; Orff, H.; Ingram, K.; Ay, A. SEALNET: Facial recognition software for ecological studies of harbor seals. Ecol. Evol. 2022, 12, e8851.
- Aarts, G.; Brasseur, S.; Poos, J.J.; Schop, J.; Kirkwood, R.; Van Kooten, T.; Mul, E.; Reijnders, P.; Rijnsdorp, A.D.; Tulp, I. Top-down pressure on a coastal ecosystem by harbor seals. Ecosphere 2019, 10, e02538.
- Zhang, J.; Song, W. Construction and Practice of Marine Ecological Protection Importance Evaluation System–Taking Dalian Sea Area as an Example. Nat. Resour. Inf. 2025, 06, 47–54. (In Chinese)
- Wang, N.; Ding, K. Effects of the marine environment on spotted seals survival (Phoca largha) Bohai Sea. Mar. Sci. Bull. 2019, 38, 202–209.
- Mulero-Pázmány, M.; Hurtado, S.; Barba-González, C.; Antequera-Gómez, M.L.; Díaz-Ruiz, F.; Real, R.; Navas-Delgado, I.; Aldana-Montes, J.F. Addressing significant challenges for animal detection in camera trap images: A novel deep learning-based approach. Sci. Rep. 2025, 15, 16191.
- Cunha, F.; dos Santos, E.M.; Barreto, R.; Colonna, J.G. Filtering empty camera trap images in embedded systems. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2438–2446.
- Zhang, X.; Zhang, H.; Han, Y.; Weng, Q.; Yuan, Z.; Yao, Y. Research Progress on Wildlife Monitoring and Recognition Based on Deep Learning. Acta Theriol. Sin. 2022, 43, 251–258.
- Xiao, Z.; Chen, Y.; Zhou, X.; He, M.; Liu, L.; Yu, F.; Jiang, M. Human action recognition in immersive virtual reality based on multi-scale spatio-temporal attention network. Comput. Animat. Virtual Worlds 2024, 35, e2293.
- Ma, Z.; Dong, Y.; Xia, Y.; Xu, D.; Xu, F.; Chen, F. Wildlife real-time detection in complex forest scenes based on YOLOv5s deep learning network. Remote Sens. 2024, 16, 1350.
- Cen, Q.; Zhu, Q.; Wang, Y.; Chen, W.; Liu, S. YOLOv9-YX: Lightweight algorithm for underwater target detection. Vis. Comput. 2024, 41, 4033–4045.
- He, G.; Zhang, X.; Wang, J.; Xu, P.; Hou, X.; Dong, W.; Lei, Y.; Jin, X.; Wang, W.; Tian, W.; et al. Advancing primate surveillance with image recognition techniques from unmanned aerial vehicles. Am. J. Primatol. 2025, 87, e23676.
- Lee, S.; Song, Y.; Kil, S.H. Feasibility Analyses of Real-Time Detection of Wildlife Using UAV-Derived Thermal and RGB Images. Remote Sens. 2021, 13, 2169.
- Hodgson, A.; Kelly, N.; Peel, D. Unmanned aerial vehicles (UAVs) for surveying marine fauna: A dugong case study. PLoS ONE 2013, 8, e79556.
- Kiszka, J.J.; Mourier, J.; Gastrich, K.; Heithaus, M.R. Using unmanned aerial vehicles (UAVs) to investigate shark and ray densities in a shallow coral lagoon. Mar. Ecol. Prog. Ser. 2016, 560, 237–242.
- Hodgson, J.C.; Baylis, S.M.; Mott, R.; Herrod, A.; Clarke, R.H. Precision wildlife monitoring using unmanned aerial vehicles. Sci. Rep. 2016, 6, 22574.
- Beaver, J.T.; Baldwin, R.W.; Messinger, M.; Newbolt, C.H.; Ditchkoff, S.S.; Silman, M.R. Evaluating the use of drones equipped with thermal sensors as an effective method for estimating wildlife. Wildl. Soc. Bull. 2020, 44, 434–443.
- Gonzalez, L.F.; Montes, G.A.; Puig, E.; Johnson, S.; Mengersen, K.; Gaston, K.J. Unmanned aerial vehicles (UAVs) and artificial intelligence revolutionizing wildlife monitoring and conservation. Sensors 2016, 16, 97.
- Corcoran, E.; Winsen, M.; Sudholz, A.; Hamilton, G. Automated detection of wildlife using drones: Synthesis, opportunities and constraints. Methods Ecol. Evol. 2021, 12, 1103–1114.
- Tuia, D.; Kellenberger, B.; Beery, S.; Costelloe, B.R.; Zuffi, S.; Risse, B.; Mathis, A.; Mathis, M.W.; Van Langevelde, F.; Burghardt, T.; et al. Perspectives in machine learning for wildlife conservation. Nat. Commun. 2022, 13, 792.
- Yan, P.; Wang, W.; Li, G.; Zhao, Y.; Wang, J.; Wen, Z. A lightweight coal gangue detection method based on multispectral imaging and enhanced YOLOv8n. Microchem. J. 2024, 199, 110142.
- Peng, J.; Wang, D.; Liao, X.; Shao, Q.; Sun, Z.; Yue, H.; Ye, H. Wild animal survey using UAS imagery and deep learning: Modified Faster R-CNN for kiang detection in Tibetan Plateau. ISPRS J. Photogramm. Remote Sens. 2020, 169, 364–376.
- Gray, P.C.; Fleishman, A.B.; Klein, D.J.; McKown, M.W.; Bezy, V.S.; Lohmann, K.J.; Johnston, D.W. A convolutional neural network for detecting sea turtles in drone imagery. Methods Ecol. Evol. 2019, 10, 345–355.
- Tripathi, R.N.; Agarwal, K.; Tripathi, V.; Badola, R.; Hussain, S.A. Conservation in action: Cost-effective UAVs and real-time detection of the globally threatened swamp deer (Rucervus duvaucelii). Ecol. Inform. 2025, 85, 102913.
- Jiang, L.; Wu, L. Enhanced Yolov8 network with Extended Kalman Filter for wildlife detection and tracking in complex environments. Ecol. Inform. 2024, 84, 102856.
- Wu, L.; Jinma, Y.; Wang, X.; Yang, F.; Xu, F.; Cui, X.; Sun, Q. Amur Tiger Individual Identification Based on the Improved InceptionResNetV2. Animals 2024, 14, 2312.
- Zhao, Y.; Wang, L.; Lei, G.; Guo, C.; Ma, Q. Lightweight UAV Small Target Detection and Perception Based on Improved YOLOv8-E. Drones 2024, 8, 681.
- Lu, Y.; Sun, M. Lightweight multidimensional feature enhancement algorithm LPS-YOLO for UAV remote sensing target detection. Sci. Rep. 2025, 15, 1340.
- Axford, D.; Sohel, F.; Vanderklift, M.; Hodgson, A. Collectively advancing deep learning for animal detection in drone imagery: Successes, challenges, and research gaps. Ecol. Inform. 2024, 83, 102842.
- Bartlett, B.; Santos, M.; Dorian, T.; Moreno, M.; Trslic, P.; Dooly, G. Real-Time UAV Surveys with the Modular Detection and Targeting System: Balancing Wide-Area Coverage and High-Resolution Precision in Wildlife Monitoring. Remote Sens. 2025, 17, 879.
- Dat, N.N.; Richardson, T.; Watson, M.; Meier, K.; Kline, J.; Reid, S.; Maalouf, G.; Hine, D.; Mirmehdi, M.; Burghardt, T. WildLive: Near Real-time Visual Wildlife Tracking onboard UAVs. arXiv 2025, arXiv:2504.10165.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696.
- Jiao, F. Conservation and management of spotted seal resources in Liaodong Bay. China Fish. 2015, 04, 35–38. (In Chinese)
- Varghese, R.; M., S. YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–6.
- Chen, J.; Kao, S.-H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.H.; Chan, S.H.G. Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks. arXiv 2023, arXiv:2303.03667.
- Wang, C.; Han, Q.; Zhang, T.; Li, C.; Sun, X. Litchi picking points localization in natural environment based on the Litchi-YOSO model and branch morphology reconstruction algorithm. Comput. Electron. Agric. 2024, 226, 109473.
- Zhang, M.; Tian, X. Transformer architecture based on mutual attention for image-anomaly detection. Virtual Real. Intell. Hardw. 2023, 5, 57–67.
- Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725.
- Pham, V.; Ngoc, L.D.T.; Bui, D.L. Optimizing YOLO Architectures for Optimal Road Damage Detection and Classification: A Comparative Study from YOLOv7 to YOLOv10. arXiv 2024, arXiv:2410.08409.
- Alshibli, A.; Memon, Q. Benchmarking YOLO Models for Marine Search and Rescue in Variable Weather Conditions. Automation 2025, 6, 35.
- Mai, R.; Wang, J. UM-YOLOv10: Underwater Object Detection Algorithm for Marine Environment Based on YOLOv10 Model. Fishes 2025, 10, 173.
- Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. arXiv 2018, arXiv:1807.11164.
- Qin, D.; Leichner, C.; Delakis, M.; Fornoni, M.; Luo, S.; Yang, F.; Wang, W.; Banbury, C.; Ye, C.; Akin, B.; et al. MobileNetV4—Universal Models for the Mobile Ecosystem. arXiv 2024, arXiv:2404.10518.
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
- Luo, X.; Yao, M.; Chou, Y.; Xu, B.; Li, G. Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection. arXiv 2024, arXiv:2407.20708.
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-time Object Detection. arXiv 2024, arXiv:2304.08069.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
- Gao, S.; Zhang, P.; Yan, T.; Lu, H. Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection. In Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, VIC, Australia, 28 October–1 November 2024; Association for Computing Machinery: New York, NY, USA, 2024.
- Zhang, H.; Xu, C.; Li, Y.; Wang, J. Dense attention pyramid for tiny object detection. Pattern Recognit. 2021, 118, 108030.

| Configuration | Parameter |
|---|---|
| Programming language | Python 3.8.18 |
| Deep learning framework | PyTorch 1.8.0 |
| Operating system | Windows 10 x64 |
| CPU | Intel i9-10980XE |
| Host memory | 64 GB |
| GPU | NVIDIA GeForce RTX 3080 Ti |

| Model | Precision | Recall | mAP | GFLOPs | Model size (MB) | Parameters | FPS |
|---|---|---|---|---|---|---|---|
| YOLOv11 | 0.857 | 0.654 | 0.735 | 6.3 | 5.5 | 2,582,347 | 666.7 |
| YOLOv10 | 0.867 | 0.672 | 0.741 | 1.2 | 5.4 | 2,492,822 | 625.0 |
| YOLOv8 | 0.856 | 0.668 | 0.742 | 8.1 | 6.3 | 3,005,843 | 769.2 |
| YOLOv7 | 0.930 | 0.847 | 0.901 | 103.2 | 74.8 | 36,479,926 | 166.7 |
| YOLOv5 | 0.920 | 0.726 | 0.810 | 15.8 | 14.4 | 7,012,822 | 108.1 |

| Model | Recall | mAP | Model size (MB) | Parameters | FPS |
|---|---|---|---|---|---|
| Spike-YOLO | 0.609 | 0.696 | 27.1 | 13,248,643 | 196.1 |
| YOLOv7-tiny | 0.688 | 0.793 | 12.3 | 6,006,646 | 212.8 |
| RT-DETR | 0.683 | 0.753 | 66.2 | 31,985,795 | 81.3 |
| SSD | 0.465 | 0.691 | 90.6 | 23,612,246 | 25.2 |
| FF-YOLOv10 | 0.665 | 0.742 | 5.0 | 1,888,742 | 833.3 |

| Model | Recall | mAP | Model size (MB) | Parameters | FPS |
|---|---|---|---|---|---|
| YOLOv10 | 0.672 | 0.741 | 5.4 | 2,492,822 | 625.0 |
| F-YOLOv10 | 0.664 | 0.748 | 4.8 | 1,780,323 | 212.8 |
| FF-YOLOv10 | 0.665 | 0.742 | 5.0 | 1,888,742 | 833.3 (↑33.3%) |

| Model | Precision | Recall | mAP | Model size (MB) | Parameters |
|---|---|---|---|---|---|
| YOLOv7 | 0.930 | 0.847 | 0.901 | 74.8 | 36,479,926 |
| YOLOv7+A | 0.937 | 0.874 | 0.928 | 72.5 | 37,023,248 |
| YOLOv7+A+B | 0.942 | 0.866 | 0.924 | 72.5 | 37,023,184 |
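The ↑33.3% annotation in the UAV-end ablation table can be reproduced directly from the tabulated values. The short sketch below computes the relative change of FF-YOLOv10 against the YOLOv10 baseline for FPS, parameter count, and model size, using only numbers that appear in the table; the helper name `relative_change` and the dictionary layout are illustrative choices, not part of the original work.

```python
def relative_change(new: float, old: float) -> float:
    """Signed relative change of `new` with respect to `old`."""
    return (new - old) / old


# Values copied from the YOLOv10 and FF-YOLOv10 rows of the ablation table.
baseline = {"fps": 625.0, "params": 2_492_822, "size_mb": 5.4}
ff_yolov10 = {"fps": 833.3, "params": 1_888_742, "size_mb": 5.0}

fps_gain = relative_change(ff_yolov10["fps"], baseline["fps"])
param_change = relative_change(ff_yolov10["params"], baseline["params"])
size_change = relative_change(ff_yolov10["size_mb"], baseline["size_mb"])

print(f"FPS:        {fps_gain:+.1%}")      # +33.3%
print(f"Parameters: {param_change:+.1%}")  # -24.2%
print(f"Model size: {size_change:+.1%}")   # -7.4%
```

Read this way, the table shows the FF-YOLOv10 variant running about a third faster with roughly a quarter fewer parameters, while recall and mAP stay within 0.007 of the baseline.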
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, J.; Jin, F.; Ji, M.; Qu, L.; Wang, J.; Wang, C. Hierarchical Dual-Model Detection Framework for Spotted Seals Using Deep Learning on UAVs. Animals 2025, 15, 3100. https://doi.org/10.3390/ani15213100

