MSFN-YOLOv11: A Novel Multi-Scale Feature Fusion Recognition Model Based on Improved YOLOv11 for Real-Time Monitoring of Birds in Wetland Ecosystems
Simple Summary
Abstract
1. Introduction
2. Materials and Methods
2.1. Overview of the Methodology
2.2. Birds Dataset
2.3. Data Augmentation
2.4. The Original YOLOv11 Network
2.5. The Enhanced YOLOv11 Network
2.6. Assessment Metrics
- True Positive (TP): The number of positive instances correctly detected by the model.
- False Positive (FP): The number of negative instances incorrectly detected as positive.
- False Negative (FN): The number of positive instances that the model failed to detect.
- True Negative (TN): The number of negative instances correctly detected as negative.
- The Average Precision (AP) is computed for each object class based on the Precision-Recall curve.
- mAP@50 is the mean of AP across all classes at a single IoU threshold of 0.5. It is suitable for scenarios with moderate localization requirements.
- mAP@50:95 is a more stringent metric, defined as the average mAP computed at multiple IoU thresholds from 0.5 to 0.95 in steps of 0.05. It imposes a higher demand on bounding box accuracy.
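The definitions above can be turned into a short numerical sketch. Precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. The AP routine below integrates the Precision-Recall curve with all-point interpolation, which is an assumption made for illustration; the evaluation tooling behind the reported results may interpolate differently. All function names are hypothetical.

```python
import numpy as np

# IoU thresholds used by mAP@50:95: 0.50, 0.55, ..., 0.95 (10 values).
IOU_THRESHOLDS = np.linspace(0.5, 0.95, 10)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(recall, precision):
    """Area under the Precision-Recall curve (all-point interpolation).

    `recall` must be sorted in increasing order, with `precision`
    holding the matching precision values.
    """
    r = np.concatenate(([0.0], np.asarray(recall, dtype=float), [1.0]))
    p = np.concatenate(([1.0], np.asarray(precision, dtype=float), [0.0]))
    # Replace precision by its monotone non-increasing envelope.
    p = np.maximum.accumulate(p[::-1])[::-1]
    # Sum rectangle areas wherever recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))


def map_50(ap_per_class):
    """mAP@50: mean of per-class AP values computed at IoU = 0.5."""
    return float(np.mean(ap_per_class))


def map_50_95(ap_matrix):
    """mAP@50:95: mean over classes and over the 10 IoU thresholds.

    `ap_matrix` has shape (num_classes, len(IOU_THRESHOLDS)).
    """
    return float(np.mean(np.asarray(ap_matrix, dtype=float)))
```

For example, `precision_recall_f1(8, 2, 2)` gives precision, recall and F1 all equal to 0.8, since 8 of 10 detections are correct and 8 of 10 ground-truth objects are found.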
3. Experimental Results
3.1. Experimental Environment
3.2. Enhanced Convergence Robustness of MSFN-YOLOv11
- (1) Analysis of the Loss/Accuracy Graph
- (2) F1–Confidence Curve Analysis
3.3. Overall Performance Analysis of MSFN-YOLOv11 for Bird Detection
- (1) Superior Performance and Efficiency of MSFN-YOLOv11
- (2) Detection Accuracy of MSFN-YOLOv11 for Ten Types of Birds
3.4. Generalization Performance in Validation Experiment
- (1) Detection Results of MSFN-YOLOv11 Under Noisy Conditions
- (2) Bird Identification and Application on Real Surveillance Videos
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Gregory, R.D.; Noble, D.G.; Field, R.; Marchant, J.H.; Raven, M.; Gibbons, D. Using birds as indicators of biodiversity. Ornis Hung. 2003, 12, 11–24.
- Li, Y.; Qian, F.; Silbernagel, J.; Larson, H. Community structure, abundance variation and population trends of waterbirds in relation to water level fluctuation in Poyang Lake. J. Great Lakes Res. 2019, 45, 976–985.
- Wang, F.; Zhang, R.; Kang, P.; Zhao, H. An evaluation of the ecological security of the Dongting Lake, China. Desalination Water Treat. 2018, 110, 283–297.
- Hong, S.-J.; Han, Y.; Kim, S.-Y.; Lee, A.-Y.; Kim, G. Application of Deep-Learning Methods to Bird Detection Using Unmanned Aerial Vehicle Imagery. Sensors 2019, 19, 1651.
- Chalmers, C.; Fergus, P.; Wich, S.; Longmore, S.N.; Walsh, N.D.; Stephens, P.A.; Sutherland, C.; Matthews, N.; Mudde, J.; Nuseibeh, A. Removing Human Bottlenecks in Bird Classification Using Camera Trap Images and Deep Learning. Remote Sens. 2023, 15, 2638.
- Wu, E.; Wang, H.; Lu, H.; Zhu, W.; Jia, Y.; Wen, L.; Choi, C.-Y.; Guo, H.; Li, B.; Sun, L.; et al. Unlocking the Potential of Deep Learning for Migratory Waterbirds Monitoring Using Surveillance Video. Remote Sens. 2022, 14, 514.
- Hassanie, S.; Gohar, A.; Ali, H.; Khan, T.; Faiz, S.; Raja, H.; Ali, A. A Scalable AI Approach to Bird Species Identification for Conservation and Ecological Planning. IEEE Access 2025, 13, 1000–1010.
- Chen, X.; Liu, C.; Chen, L.; Zhu, X.; Zhang, Y.; Wang, C. A Pavement Crack Detection and Evaluation Framework for a UAV Inspection System Based on Deep Learning. Appl. Sci. 2024, 14, 1157.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Qiu, Z.; Zhu, X.; Liao, C.; Shi, D.; Kuang, Y.; Li, Y.; Zhang, Y. Detection of Bird Species Related to Transmission Line Faults Based on Lightweight Convolutional Neural Network. IET Gener. Transm. Distrib. 2022, 16, 869–881.
- Vo, H.-T.; Nguyen Thien, N.; Chau Mui, K. Bird Detection and Species Classification: Using YOLOv5 and Deep Transfer Learning Models. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 942–950.
- Jiang, T.; Zhao, J.; Wang, M. Bird Detection on Power Transmission Lines Based on Improved YOLOv7. Appl. Sci. 2023, 13, 11940.
- Lei, J.; Gao, S.; Rasool, M.A.; Fan, R.; Jia, Y.; Lei, G. Optimized Small Waterbird Detection Method Using Surveillance Videos Based on YOLOv7. Animals 2023, 13, 1929.
- Nie, H.; Pang, H.; Ma, M.; Zheng, R. A Lightweight Remote Sensing Small Target Image Detection Algorithm Based on Improved YOLOv8. Sensors 2024, 24, 2952.
- Hidayatullah, P.; Syakrani, N.; Sholahuddin, M.R.; Gelar, T.; Tubagus, R. YOLOv8 to YOLO11: A Comprehensive Architecture In-depth Comparative Review. arXiv 2025, arXiv:2501.13400.
- Vijayakumar, A.; Vairavasundaram, S. YOLO-Based Object Detection Models: A Review and Its Applications. Multimed. Tools Appl. 2024, 83, 83535–83574.
- Ma, J.; Guo, J.; Zheng, X.; Fang, C. An Improved Bird Detection Method Using Surveillance Videos from Poyang Lake Based on YOLOv8. Animals 2024, 14, 3353.
- Yang, X.; Cheng, Y.; Dong, M.; Xie, X. YOLO-AWK: A Model for Injurious Bird Detection in Complex Farmland Environments. Symmetry 2025, 17, 1210.
- Cao, Y.; Liu, M.; Liu, S.; Wang, X.; Lei, L. Physics-Guided ISO-Dependent Sensor Noise Modeling for Extreme Low-Light Photography. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 5744–5753.
- Hasinoff, S.W.; Durand, F.; Freeman, W.T. Noise-Optimal Capture for High Dynamic Range Photography. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 13–18 June 2010; pp. 553–560.
- Song, Q.; Guan, Y.; Guo, X.; Guo, X.; Chen, Y.; Wang, H.; Ge, J.-P.; Wang, T.; Bao, L. Benchmarking wild bird detection in complex forest scenes. Ecol. Inform. 2024, 80, 102466.
- Mpouziotas, D.; Karvelis, P.; Tsoulos, I.; Stylios, C. Automated Wildlife Bird Detection from Drone Footage Using Computer Vision Techniques. Appl. Sci. 2023, 13, 7787.
- Wah, C.; Branson, S.; Welinder, P.; Perona, P.; Belongie, S. The Caltech-UCSD Birds-200-2011 Dataset; California Institute of Technology: Pasadena, CA, USA, 2011.
- Zhang, L.; Wang, Y.; Zhou, J.; Zhang, C.; Zhang, Y.; Guan, J.; Bian, Y.; Zhou, S. Hierarchical few-shot object detection: Problem, benchmark and method (HiFSOD-Bird). In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 2002–2011.
- Ultralytics. Ultralytics YOLOv11, Version 8.3.0. Frederick, Maryland, United States. 2024. Available online: https://github.com/ultralytics/ultralytics (accessed on 18 January 2025).
- Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024.
- Hu, S.; Gao, F.; Zhou, X.W.; Dong, J.Y. Hybrid Convolutional and Attention Network for Hyperspectral Image Denoising. arXiv 2024, arXiv:2403.10067.
- Giannone, C.; Sahraeibelverdy, M.; Lamanna, M.; Cavallini, D.; Formigoni, A.; Tassinari, P.; Torreggiani, D.; Bovo, M. Automated Dairy Cow Identification and Feeding Behaviour Analysis Using a Computer Vision Model Based on YOLOv8. Smart Agric. Technol. 2025, 12, 101304.
- Jocher, G.; Derrenger, P. yolo11.yaml. GitHub 2025. Available online: https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/models/11/yolo11.yaml (accessed on 20 January 2025).
- Hidayatullah, P.; Tubagus, R. YOLO11 Architecture—Detailed Explanation [Video]. YouTube, 28 October 2024. Available online: https://www.youtube.com/watch?v=L9Va7Y9UT8E (accessed on 18 January 2025).
- Lengchitu, F.D. A Bird That Is Easy to Remember—Pied Avocet. Bilibili 2022. Available online: https://www.bilibili.com/video/BV1kK411B7Wd?vd_source=d0a7f02cdb49ce9a5e01a2d203ab1d77 (accessed on 28 October 2025).
- Shangengzi. Black Stork Foraging Leisurely. Bilibili 2025. Available online: https://www.bilibili.com/video/BV1UvsizaEdL?share_source=copy_web (accessed on 28 October 2025).
- National Ecology. Oriental Stork: The Elegant Dancer at Shengjin Lake. Bilibili 2025. Available online: https://www.bilibili.com/video/BV1rfAze2EwW/?share_source=copy_web&vd_source=d0a7f02cdb49ce9a5e01a2d203ab1d77 (accessed on 28 October 2025).
- Zhuwai. Exclusive Report: The Clash Between Green and Orange Mandarin Duck Factions Over Mid-Autumn Festival Naming Rights. Bilibili 2024. Available online: https://www.bilibili.com/video/BV1eEtheuEuB/?share_source=copy_web&vd_source=d0a7f02cdb49ce9a5e01a2d203ab1d77 (accessed on 28 October 2025).

| Bird Types | Bird Name | Images | Samples (Training Set) | Samples (Test Set) |
|---|---|---|---|---|
| B1 | Black Stork | 489 | 572 | 142 |
| B2 | Black-faced Spoonbill | 514 | 488 | 154 |
| B3 | Common Black-headed Gull | 311 | 523 | 153 |
| B4 | Great Bustard | 527 | 563 | 140 |
| B5 | Greater White-fronted Goose | 446 | 607 | 150 |
| B6 | Oriental Stork | 413 | 493 | 107 |
| B7 | Pied Avocet | 426 | 515 | 136 |
| B8 | Scaly-sided Merganser | 445 | 511 | 147 |
| B9 | White Crane | 342 | 599 | 171 |
| B10 | White-tailed Sea Eagle | 532 | 531 | 122 |
| / | Total | 4540 | 5402 | 1422 |

| Noise Types | Standard Deviation (σ) | Ratio | Number of Images |
|---|---|---|---|
| Mild Gaussian Noise | 0.2 | 0.10 | 427 |
| Medium Gaussian Noise | 0.3 | 0.12 | 512 |
| Severe Gaussian Noise | 0.4 | 0.03 | 126 |
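
The three noise levels in the table can be illustrated with a minimal sketch. It assumes zero-mean additive Gaussian noise applied to pixel intensities scaled to [0, 1], with σ taken from the table; the source does not state the intensity scale or the noise mean, so both are assumptions, and the names `NOISE_LEVELS` and `add_gaussian_noise` are hypothetical.

```python
import numpy as np

# Standard deviations from the noise table (assumed to be on a [0, 1] scale).
NOISE_LEVELS = {"mild": 0.2, "medium": 0.3, "severe": 0.4}


def add_gaussian_noise(image, sigma, rng=None):
    """Return a noisy copy of a uint8 image (H x W or H x W x C).

    Zero-mean Gaussian noise with standard deviation `sigma` is added
    to the image after scaling it to [0, 1]; the result is clipped and
    converted back to uint8.
    """
    rng = np.random.default_rng() if rng is None else rng
    scaled = image.astype(np.float32) / 255.0
    noisy = scaled + rng.normal(0.0, sigma, size=scaled.shape)
    return (np.clip(noisy, 0.0, 1.0) * 255.0).round().astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A flat grey test image stands in for a real wetland frame.
    frame = np.full((64, 64, 3), 128, dtype=np.uint8)
    for name, sigma in NOISE_LEVELS.items():
        noisy = add_gaussian_noise(frame, sigma, rng)
        print(name, float(noisy.std()))
```

Passing a seeded generator keeps the augmented set reproducible, which matches the fixed seed listed in the training settings.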

| Environment | Configuration Name | Configuration Parameter |
|---|---|---|
| Hardware environment | CPU | Intel(R) Xeon(R) Gold 5220 CPU @ 2.20 GHz |
| Hardware environment | Memory (RAM) | 62 GB |
| Hardware environment | GPU | NVIDIA Tesla V100 SXM2 32 GB |
| Hardware environment | Video memory | 32 GB |
| Software environment | Operating system | Linux |
| Software environment | PyTorch | v2.4.1 |
| Software environment | CUDA | v11.8 |
| Software environment | Python | v3.8.2 |
| Software environment | NVIDIA Driver | 535.154.02 |
| Training Settings | Optimizer | SGD |
| Training Settings | Epochs/Batch/Size | 500/16/640 × 640 |
| Training Settings | Seed | 0 (deterministic = True) |
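
The training settings in the table map directly onto keyword arguments of the Ultralytics training interface. The sketch below only collects those settings; the model configuration file and the dataset YAML passed to `train` are placeholders supplied by the caller, not files described in the source, and the helper name `train` is hypothetical.

```python
# Training settings taken from the table: SGD, 500 epochs, batch 16,
# 640 x 640 input, seed 0 with deterministic mode enabled.
TRAIN_SETTINGS = {
    "optimizer": "SGD",
    "epochs": 500,
    "batch": 16,
    "imgsz": 640,
    "seed": 0,
    "deterministic": True,
}


def train(model_cfg, data_yaml):
    """Train a YOLO model with the settings above.

    `model_cfg` is a model YAML or weights file (for example the stock
    "yolo11n.yaml"); `data_yaml` describes the dataset. Both are
    caller-supplied placeholders.
    """
    # Imported lazily so this module can be inspected without the
    # ultralytics package installed.
    from ultralytics import YOLO

    model = YOLO(model_cfg)
    return model.train(data=data_yaml, **TRAIN_SETTINGS)
```

Fixing the seed and enabling deterministic mode makes repeated runs comparable, which matters when the reported differences between models are around one percentage point.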

| Models | P (%) | R (%) | mAP@50 (%) | mAP@50:95 (%) | Training Time (h) |
|---|---|---|---|---|---|
| YOLOv8n (Baseline) | 92.4 | 96.0 | 95.6 | 81.7 | 2.914 |
| YOLOv11n (Original) | 95.0 (+2.6) | 96.0 (+0.0) | 96.1 (+0.5) | 82.9 (+1.2) | 2.886 (−0.028) |
| MSFN-YOLOv11n (Ours) | 96.3 (+3.9) | 97.0 (+1.0) | 96.4 (+0.8) | 83.2 (+1.5) | 2.367 (−0.547) |
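
The differences shown next to each value in the comparison table are measured against the YOLOv8n baseline. A small sketch of that arithmetic, using the table's numbers and hypothetical helper names, also gives the relative reduction in training time:

```python
# Values copied from the model comparison table.
RESULTS = {
    "YOLOv8n (Baseline)": {
        "P": 92.4, "R": 96.0, "mAP@50": 95.6, "mAP@50:95": 81.7, "train_h": 2.914,
    },
    "YOLOv11n (Original)": {
        "P": 95.0, "R": 96.0, "mAP@50": 96.1, "mAP@50:95": 82.9, "train_h": 2.886,
    },
    "MSFN-YOLOv11n (Ours)": {
        "P": 96.3, "R": 97.0, "mAP@50": 96.4, "mAP@50:95": 83.2, "train_h": 2.367,
    },
}


def deltas_vs_baseline(results, baseline="YOLOv8n (Baseline)"):
    """Absolute difference of every metric relative to the baseline row."""
    base = results[baseline]
    return {
        name: {k: round(v - base[k], 3) for k, v in metrics.items()}
        for name, metrics in results.items()
        if name != baseline
    }


def relative_time_saving(results, model, baseline="YOLOv8n (Baseline)"):
    """Fraction of baseline training time saved by `model`."""
    base = results[baseline]["train_h"]
    return (base - results[model]["train_h"]) / base


if __name__ == "__main__":
    for name, d in deltas_vs_baseline(RESULTS).items():
        print(name, d)
    saving = relative_time_saving(RESULTS, "MSFN-YOLOv11n (Ours)")
    print(f"training time saved: {saving:.1%}")
```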

| Bird Types | mAP@50 (%): YOLOv8n (Baseline) | mAP@50 (%): YOLOv11n (Original) | mAP@50 (%): MSFN-YOLOv11n (Ours) | F1-Score (%): YOLOv8n (Baseline) | F1-Score (%): YOLOv11n (Original) | F1-Score (%): MSFN-YOLOv11n (Ours) |
|---|---|---|---|---|---|---|
| B1 | 96.7 | 97.5 | 98.3 | 93.9 | 92.3 | 95.8 |
| B2 | 95.3 | 96.1 | 97.1 | 83.6 | 88.2 | 89.6 |
| B3 | 93.3 | 94.8 | 95.8 | 91.9 | 91.0 | 93.4 |
| B4 | 97.5 | 97.3 | 97.9 | 95.5 | 95.8 | 96.3 |
| B5 | 94.6 | 95.3 | 97.8 | 95.8 | 95.8 | 96.8 |
| B6 | 96.4 | 97.4 | 99.4 | 93.7 | 93.6 | 92.5 |
| B7 | 95.1 | 96.7 | 92.2 | 92.8 | 94.7 | 95.8 |
| B8 | 96.7 | 96.7 | 95.9 | 82.9 | 84.9 | 87.7 |
| B9 | 90.6 | 90.5 | 90.5 | 90.2 | 89.7 | 94.4 |
| B10 | 97.4 | 97.6 | 98.9 | 91.1 | 94.1 | 92.9 |
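
To read the per-class table more easily, the mAP@50 columns for YOLOv11n and MSFN-YOLOv11n can be compared class by class. The sketch below uses the table's values with hypothetical names; it only reports where the enhanced model is above or below the original YOLOv11n and makes no claim about why.

```python
# Per-class mAP@50 (%) copied from the table: (YOLOv11n, MSFN-YOLOv11n).
MAP50 = {
    "B1": (97.5, 98.3),
    "B2": (96.1, 97.1),
    "B3": (94.8, 95.8),
    "B4": (97.3, 97.9),
    "B5": (95.3, 97.8),
    "B6": (97.4, 99.4),
    "B7": (96.7, 92.2),
    "B8": (96.7, 95.9),
    "B9": (90.5, 90.5),
    "B10": (97.6, 98.9),
}


def per_class_change(table):
    """Difference (enhanced minus original) for every class, in points."""
    return {cls: round(new - old, 1) for cls, (old, new) in table.items()}


def classes_where(table, predicate):
    """Classes whose change satisfies `predicate`, in table order."""
    return [cls for cls, delta in per_class_change(table).items() if predicate(delta)]


if __name__ == "__main__":
    print("improved: ", classes_where(MAP50, lambda d: d > 0))
    print("unchanged:", classes_where(MAP50, lambda d: d == 0))
    print("lower:    ", classes_where(MAP50, lambda d: d < 0))
```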

| Videos (Birds) | Frame Count | FPS (f/s) | Accuracy (%) | Average Confidence (%) | Inference Time (ms) | Tracking or Smoothing |
|---|---|---|---|---|---|---|
| Video1 (Pied Avocet) [32] | 1750 | 70 | 62.8 | 71.1 | 11.43 | Smoothing |
| Video2 (Black Stork) [33] | 987 | 85 | 66.5 | 76.8 | 9.37 | Smoothing |
| Video3 (Oriental Stork) [34] | 1800 | 68 | 52.6 | 52.2 | 11.35 | Smoothing |
| Video4 (Scaly-sided Merganser) [35] | 2047 | 65 | 70.5 | 79.3 | 13.29 | Smoothing |
| Statistics (Average) | 1646 | 72 | 63.1 | 69.85 | 11.36 | Smoothing |
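
The "Statistics (Average)" row is the arithmetic mean of the four video rows, which a short sketch can reproduce. The helper `frame_budget_ms` converts a frame rate into the time available per frame; comparing it with the reported inference time is only an illustration, since the source does not say what the FPS figure includes beyond model inference. All names are hypothetical.

```python
# Values copied from the surveillance-video table.
VIDEOS = {
    "Video1 (Pied Avocet)": {
        "frames": 1750, "fps": 70, "accuracy": 62.8, "confidence": 71.1, "inference_ms": 11.43,
    },
    "Video2 (Black Stork)": {
        "frames": 987, "fps": 85, "accuracy": 66.5, "confidence": 76.8, "inference_ms": 9.37,
    },
    "Video3 (Oriental Stork)": {
        "frames": 1800, "fps": 68, "accuracy": 52.6, "confidence": 52.2, "inference_ms": 11.35,
    },
    "Video4 (Scaly-sided Merganser)": {
        "frames": 2047, "fps": 65, "accuracy": 70.5, "confidence": 79.3, "inference_ms": 13.29,
    },
}


def column_means(rows):
    """Arithmetic mean of every numeric column across the videos."""
    keys = next(iter(rows.values())).keys()
    return {k: sum(r[k] for r in rows.values()) / len(rows) for k in keys}


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    for key, value in column_means(VIDEOS).items():
        print(f"{key}: {value:.2f}")
    for name, row in VIDEOS.items():
        budget = frame_budget_ms(row["fps"])
        print(f"{name}: {budget:.2f} ms per frame, {row['inference_ms']} ms inference")
```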

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, L.; Ye, L.; Chen, X.; Chu, N. MSFN-YOLOv11: A Novel Multi-Scale Feature Fusion Recognition Model Based on Improved YOLOv11 for Real-Time Monitoring of Birds in Wetland Ecosystems. Animals 2025, 15, 3472. https://doi.org/10.3390/ani15233472

