Deep Learning Models for Waterfowl Detection and Classification in Aerial Images
Abstract
1. Introduction
1. We created three new labeled datasets specifically designed for waterfowl detection in aerial images, along with a new dataset for waterfowl classification in aerial images.
2. We rigorously evaluated state-of-the-art supervised deep learning models for both waterfowl detection and classification on real waterfowl datasets. The models achieved notably accurate results, demonstrating robust performance in real-life scenarios.
3. We trained and evaluated semi-supervised learning models for waterfowl detection and classification. Our experimental results showed improved detection and classification accuracy.
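The semi-supervised detectors evaluated in this work (e.g., Soft Teacher) build on the teacher–student paradigm, in which a teacher model's weights are typically an exponential moving average (EMA) of the student's. A minimal sketch of that EMA update, using plain Python lists as stand-in parameter vectors; the function name and decay value are illustrative, not taken from the paper:

```python
def ema_update(teacher, student, decay=0.999):
    """Update teacher parameters as an exponential moving average
    of student parameters: t <- decay * t + (1 - decay) * s."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

# After repeated student steps, the teacher drifts smoothly toward the student.
teacher = [0.0, 0.0]
student = [1.0, -1.0]
for _ in range(5):
    teacher = ema_update(teacher, student, decay=0.9)
```

The high decay keeps the teacher stable while the student is trained on noisy pseudo-labeled batches, which is what makes the teacher's pseudo-labels reliable.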
2. Related Work
2.1. Deep Learning Methods for Object Detection
2.2. Deep Learning Methods for Image Classification
3. New Waterfowl Aerial-Image Datasets
3.1. Waterfowl-Detection Datasets
3.2. Waterfowl-Classification Dataset
4. Methods
4.1. Deep Learning Models for Waterfowl Detection
4.2. Deep Learning Models for Waterfowl Classification
4.3. Data Processing
4.4. Evaluation Metrics
5. Experimental Results
5.1. Performance of Detectors Trained Using Individual Datasets
5.2. Performance of Detectors Trained Using All Datasets Combined
5.3. Performance of Altitude-Specific Detection Models
5.4. Performance of Semi-Supervised Learning Detectors
5.5. Performances of Classification Models
6. Summary and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Dataset Name | No. of Images | No. of Birds | Altitude (m) | Object
---|---|---|---|---
Bird-G | 181 | 62,758 | 15–90 | Birds |
Bird-H | 177 | 16,738 | 15–90 | Decoys |
Bird-I | 171 | 7058 | 15 | Birds |
Bird-J | 708 | 111,088 | 15–90 | Birds |
Unlabeled-K | 11,021 | Unknown | 15–90 | Birds |
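Aerial survey images are usually far larger than a detector's input resolution, so a common preprocessing step (cf. the Data Processing section) is to tile each image into overlapping crops before detection. A minimal sketch of tile-origin computation for axis-aligned crops; the tile size and overlap values are illustrative defaults, not the paper's settings:

```python
def tile_origins(width, height, tile=512, overlap=64):
    """Return (x, y) top-left corners of tiles covering a width x height
    image, stepping by tile - overlap and clamping the last tile to the edge."""
    step = tile - overlap
    xs = list(range(0, max(width - tile, 0) + 1, step))
    ys = list(range(0, max(height - tile, 0) + 1, step))
    # Add a final clamped tile so the right/bottom edges are always covered.
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y) for y in ys for x in xs]

origins = tile_origins(1024, 768)
```

The overlap ensures that birds falling on a tile boundary appear whole in at least one crop; detections from overlapping tiles are then merged (e.g., by non-maximum suppression).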
Dataset | Faster R-CNN | YOLOv5 | YOLONAS | Soft Teacher
---|---|---|---|---
Bird-G | 89.76 | 89.42 | 86.62 | 88.73 |
Bird-H | 81.77 | 52.14 | 91.52 | 78.56 |
Bird-I | 94.57 | 88.48 | 89.2 | 95.54 |
Bird-J | 73.60 | 78.61 | 79.23 | 71.43 |
Average | 84.92 | 77.16 | 86.66 | 83.31 |
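The detection scores in these tables are mAP-style percentages, which rest on intersection-over-union (IoU) matching between predicted and ground-truth boxes. A minimal IoU sketch for axis-aligned boxes in (x1, y1, x2, y2) form; this is a generic illustration, not the paper's exact evaluation code:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    # Guard against division by zero for degenerate or disjoint boxes.
    return inter / (area_a + area_b - inter) if inter else 0.0
```

A prediction is typically counted as a true positive when its IoU with an unmatched ground-truth box exceeds a threshold (0.5 is a common choice); average precision is then computed over the score-ranked predictions.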
Dataset | DeepForest | RetinaNet | Faster R-CNN | YOLOv5 | YOLONAS | Soft Teacher
---|---|---|---|---|---|---
Bird-G | 76.60 | 89.69 | 89.67 | 91.08 | 84.56 | 88.56 |
Bird-H | 55.65 | 81.69 | 82.88 | 68.78 | 88.11 | 82.45 |
Bird-I | 77.41 | 85.48 | 88.85 | 87.07 | 87.57 | 84.46 |
Bird-J | 48.93 | 74.71 | 74.48 | 88.98 | 76.97 | 72.20 |
Average | 64.64 | 82.89 | 83.71 | 83.97 | 84.30 | 81.91 |
Phase | DeepForest | RetinaNet | Faster R-CNN | YOLOv5 | YOLONAS | Soft Teacher
---|---|---|---|---|---|---
Training | - | 36 | 247 | 59 | 58 | 1014 |
Inference | 0.9 | 0.9 | 4.3 | 1.1 | 1.0 | 4.2 |
Habitat | Faster R-CNN (Sunny) | Faster R-CNN (Cloudy) | YOLOv5 (Sunny) | YOLOv5 (Cloudy) | YOLONAS (Sunny) | YOLONAS (Cloudy) | RetinaNet (Sunny) | RetinaNet (Cloudy) | Soft Teacher (Sunny) | Soft Teacher (Cloudy)
---|---|---|---|---|---|---|---|---|---|---
HarvestedCrop | 91.18 | 75.41 | 82.93 | 36.16 | 84.15 | 71.77 | 56.10 | 50.15 | 90.75 | 72.90 |
Ice | 99.05 | 70.68 | 99.37 | 98.48 | 99.33 | 92.69 | 89.76 | 34.28 | 96.66 | 49.00 |
Land | 95.82 | 74.29 | 88.71 | 66.95 | 90.56 | 68.99 | 73.11 | 48.85 | 68.03 | 61.81 |
Lotus | 88.24 | 85.70 | 79.32 | 66.16 | 85.59 | 81.75 | 88.24 | 85.88 | 56.08 | 74.99 |
MoistSoil | 93.13 | 90.20 | 72.98 | 93.59 | 86.58 | 91.99 | 76.44 | 84.61 | 78.65 | 71.91 |
OpenWater | 98.67 | 87.18 | 99.09 | 93.05 | 97.89 | 91.30 | 98.11 | 83.98 | 89.09 | 44.10 |
ShrubScrub | 93.73 | - | 56.81 | - | 89.15 | - | 64.01 | - | 84.80 | - |
StandingCorn | 90.75 | 93.36 | 55.06 | 69.46 | 83.20 | 86.07 | 87.27 | 72.51 | 75.88 | 74.70 |
WaterCorn | 95.48 | 91.69 | 71.27 | 68.03 | 91.78 | 88.56 | 94.87 | 66.24 | 83.21 | 52.99 |
Wooded | 81.92 | 92.88 | 67.66 | 92.16 | 78.02 | 87.29 | 89.04 | 88.13 | 58.19 | 75.33 |
Altitude | Faster R-CNN | RetinaNet | YOLOv5 | YOLONAS | Soft Teacher |
---|---|---|---|---|---
15 m | 95.38 | 86.20 | 85.37 | 93.96 | 92.59 |
30 m | 93.25 | 90.54 | 80.78 | 91.23 | 92.27 |
60 m | 87.56 | 43.93 | 86.21 | 91.41 | 88.58 |
90 m | 81.67 | 62.94 | 90.58 | 88.70 | 77.23 |
Model | 10% Labeled | 20% Labeled | 50% Labeled | 100% Labeled
---|---|---|---|---
Faster R-CNN | 67.50 | 74.12 | 78.17 | 82.79 |
Soft Teacher | 73.45 | 77.74 | 82.65 | - |
Dataset | EfficientNet | ResNet18 | MixMatch | FixMatch
---|---|---|---|---
15 m real bird | 81.65 | 78.37 | 82.88 | 80.70 |
15 m decoy | 91.58 | 89.78 | 87.54 | 88.71 |
30 m decoy | 79.98 | 76.74 | 81.34 | 80.09 |
60 m decoy | 43.75 | 40.66 | 46.40 | 48.80 |
90 m decoy | 41.05 | 36.72 | 47.92 | 46.25 |
Habitat | EfficientNet | ResNet18 | MixMatch | FixMatch
---|---|---|---|---
OpenWater | 93.46 | 91.25 | 92.18 | 93.57 |
MoistSoil | 70.77 | 54.78 | 71.58 | 72.53 |
StandingCorn | 83.68 | 82.44 | 84.55 | 81.56 |
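The semi-supervised classifiers above (MixMatch, FixMatch) assign pseudo-labels to unlabeled images, typically keeping only predictions whose confidence clears a threshold. A minimal sketch of that filtering step; the threshold of 0.95 matches the default reported in the original FixMatch paper, but the function and data here are illustrative:

```python
def filter_pseudo_labels(probs, threshold=0.95):
    """Keep (index, argmax class, confidence) for predictions whose top
    class probability clears the threshold; discard the rest."""
    kept = []
    for i, p in enumerate(probs):
        conf = max(p)
        if conf >= threshold:
            kept.append((i, p.index(conf), conf))
    return kept

# Three unlabeled images: only the confident predictions become pseudo-labels.
probs = [[0.97, 0.02, 0.01], [0.50, 0.30, 0.20], [0.01, 0.03, 0.96]]
pseudo = filter_pseudo_labels(probs)
```

Discarding low-confidence predictions keeps noisy pseudo-labels from dominating training, which is especially relevant for the higher-altitude imagery where per-class accuracy drops.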
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, Y.; Feng, Y.; Wang, S.; Tang, Z.; Zhai, Z.; Viegut, R.; Webb, L.; Raedeke, A.; Shang, Y. Deep Learning Models for Waterfowl Detection and Classification in Aerial Images. Information 2024, 15, 157. https://doi.org/10.3390/info15030157