Automatic Detection of Feral Pigeons in Urban Environments Using Deep Learning
Simple Summary
Abstract
1. Introduction
- (1) We created a unique dataset of feral pigeons in an urban environment with manually annotated bounding boxes, concentrating on the detection and enumeration of urban pigeons across diverse cityscapes.
- (2) We developed the Swin-Mask R-CNN with SAHI model for pigeon detection, incorporating the SAHI tool to preserve the fine details of smaller targets during the inference phase and thereby enhance detection accuracy.
- (3) With the SAHI tool, we further enabled the model to cover a greater number of feral pigeons by operating on large-scale images (4032 × 3024), achieving broader detection coverage, as illustrated by the sliced-inference sketch following this list.
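To make the slicing-aided inference concrete, the following is a minimal sketch using the open-source SAHI library with an MMDetection-style detector. The checkpoint, config, and image paths are hypothetical, and the slice size, overlap ratio, and confidence threshold are illustrative assumptions rather than the exact settings used in this work.

```python
# Minimal sliced-inference sketch with the SAHI library (pip install sahi).
# Paths, slice sizes, and thresholds are illustrative assumptions.
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

detection_model = AutoDetectionModel.from_pretrained(
    model_type="mmdet",                      # MMDetection-based detector
    model_path="swin_mask_rcnn_pigeon.pth",  # hypothetical checkpoint
    config_path="swin_mask_rcnn_pigeon.py",  # hypothetical config
    confidence_threshold=0.5,
    device="cuda:0",
)

# Slice the full-resolution 4032 x 3024 image into overlapping tiles, run
# inference on each tile, then merge detections back to full-image
# coordinates (SAHI resolves duplicate boxes at tile borders).
result = get_sliced_prediction(
    "street_scene.jpg",                      # hypothetical input image
    detection_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)
result.export_visuals(export_dir="demo_out/")  # annotated output image
```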
2. Materials and Methods
2.1. Image Data Collection
2.2. Data Labeling and Augmentation
2.3. Workflow Overview
2.4. Swin-Mask R-CNN with SAHI Model
2.5. Swin Transformer Backbone
2.6. Feature Pyramid Network (FPN)
2.7. Slicing-Aided Hyper Inference (SAHI) Tool
3. Results
3.1. Experimental Settings and Model Evaluation Indicators
3.2. mAP Comparison of Different Models
3.3. Results Visualization
3.4. Pigeon Counting Demo
4. Discussion
- The use of deep learning network algorithms for bird recognition has consistently demonstrated strong performance [43,44]. However, current bird detection methods have several limitations. First, two-stage detection approaches that rely on traditional backbone networks [45,46] do not realize the full potential of the network for bird detection. Second, most existing studies on deep-learning-based bird detection lack a specific focus on individual bird species, such as feral pigeons. Some studies concentrate on the accurate identification of various bird species in airborne scenarios [14,15,24,46], while others explore the classification and detection of different bird species in natural environments, such as wind farms or aquatic habitats [19,45]. A few studies specifically investigate the detection and counting of bird species in particular settings, such as birds on transmission lines [18]. Moreover, traditional feral pigeon research in urban environments requires extensive resources [47], and the limited work on urban pigeon detection targets only specific areas such as buildings [48]. There is currently no comprehensive study on feral pigeon detection in complex urban settings. To address these challenges, we propose an automatic detection method for feral pigeons in urban environments using deep learning. Through a series of experiments, we demonstrate the effectiveness of our proposed method for feral pigeon detection in urban areas.
- In bird detection, most studies have used one-stage (YOLO) [15,17,19,21] or two-stage (Faster R-CNN and Mask R-CNN) [43,44] object detection models. The original Mask R-CNN has demonstrated strong performance in bird detection [37]. Building on this, we propose an improved algorithm that upgrades the main components of the original Mask R-CNN and incorporates the SAHI tool to boost detection performance. Recent studies have shown that the Swin Transformer is effective at capturing fine-grained animal details [49,50]. We therefore replace the backbone of the original Mask R-CNN with the Swin Transformer and add an FPN as the network's neck for multi-scale feature fusion [51]. To evaluate the resulting Swin-Mask R-CNN, we compare it on our feral pigeon dataset against object detection methods commonly used for bird detection, including YOLO [16,18,19,21], Faster R-CNN [45], and Mask R-CNN [46]. Our Swin-Mask R-CNN achieves the highest mAP of 0.68, demonstrating that it performs best among the evaluated models on feral pigeon detection. Although Swin-Mask R-CNN yields the best results in these comparative experiments, there is still room for improvement in detecting small bird targets (AP50s), a problem that several studies address specifically [15,46]. To further improve the accuracy of detecting small feral pigeon targets, we therefore introduce the SAHI tool [30] to assist the inference process. In this phase, we incorporate the SAHI tool into all the models from the previous experiments and conduct further experiments on our dataset. The results show that our Swin-Mask R-CNN with SAHI model significantly improves feral pigeon detection accuracy, achieving the highest values in mAP, AP50, and AP50s, with improvements of 6%, 6%, and 10%, respectively.
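As a minimal sketch of the architectural change described above, the detector can be expressed in MMDetection-style configuration conventions: Mask R-CNN with the default ResNet backbone swapped for a Swin Transformer and an FPN neck fusing the four Swin stages. The hyperparameters shown are the stock Swin-T values and are assumptions; the paper may use different settings.

```python
# Sketch of the modified detector as an MMDetection-style config dict.
# Swin-T hyperparameters below are the stock values, not necessarily
# those used in this work.
model = dict(
    type="MaskRCNN",
    backbone=dict(
        type="SwinTransformer",
        embed_dims=96,                 # Swin-T channel width
        depths=[2, 2, 6, 2],           # transformer blocks per stage
        num_heads=[3, 6, 12, 24],      # attention heads per stage
        window_size=7,                 # local (shifted) attention window
        out_indices=(0, 1, 2, 3),      # expose all four stages to the neck
    ),
    neck=dict(
        type="FPN",
        in_channels=[96, 192, 384, 768],  # Swin stage output channels
        out_channels=256,                 # unified feature width
        num_outs=5,                       # extra level for RPN anchors
    ),
    # rpn_head / roi_head are kept as in the standard Mask R-CNN config,
    # with num_classes=1 for the single "pigeon" category.
)
```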
- Detecting and estimating feral pigeon populations gives us a better understanding of their growth and distribution across different urban areas. When feral pigeon overpopulation occurs, the relevant authorities can take appropriate management measures to prevent negative impacts on the urban environment and to avoid excessive competition with other urban bird species, which could disrupt ecological balance and species diversity. Our current work has significantly improved the detection of feral pigeons in urban environments, but some challenges remain. Our research has two limitations: we have not further tested the generalization ability of our model, and we have not fully deployed it in real time on portable terminals. We plan to address both in future work. On one hand, although our proposed model demonstrates good detection performance, to further validate its generalization ability we intend to collect larger datasets covering feral pigeons and other bird species from various cities through collaborations with researchers and public data sources. On the other hand, while we have developed a demo for automatic feral pigeon counting, it has not been extensively deployed in real-world scenarios. Our goal is to deploy the algorithm on cloud and mobile platforms so that researchers can upload photos and videos for automatic analysis. This will provide feral pigeon detection and counting results, enabling estimation of feral pigeon populations in different areas and assessment of the impact of feral pigeon overpopulation.
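The counting demo itself is not released in detail here, but a plausible sketch reduces counting to the number of merged predictions per image. The following reuses the `detection_model` built in the earlier SAHI snippet; the survey folder name is a hypothetical placeholder.

```python
# Hypothetical counting-demo sketch: sliced inference over a folder of
# survey photos, reporting per-image and total pigeon counts. Reuses the
# `detection_model` from the earlier SAHI snippet.
from pathlib import Path

from sahi.predict import get_sliced_prediction

total = 0
for image_path in sorted(Path("survey_photos").glob("*.jpg")):
    result = get_sliced_prediction(
        str(image_path),
        detection_model,
        slice_height=512,
        slice_width=512,
        overlap_height_ratio=0.2,
        overlap_width_ratio=0.2,
    )
    count = len(result.object_prediction_list)  # one prediction per pigeon
    total += count
    print(f"{image_path.name}: {count} pigeons")

print(f"Estimated pigeons across survey photos: {total}")
```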
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Angen, Ø.; Johannesen, T.B.; Petersen, R.F.; Uldum, S.A.; Schnee, C. Development of a species-specific real-time PCR test for Chlamydia psittaci and its employment in the investigation of zoonotic transmission from racing pigeons in Denmark. Diagn. Microbiol. Infect. Dis. 2021, 100, 115341.
- Smith, W.J.; Jezierski, M.T.; Dunn, J.C.; Clegg, S.M. Parasite exchange and hybridisation at a wild-feral-domestic interface. Int. J. Parasitol. 2023, 53, 797–808.
- Oldekop, W.; Oldekop, G.; Vahldiek, K.; Klawonn, F.; Rinas, U. Counting young birds: A simple tool for the determination of avian population parameters. PLoS ONE 2023, 18, e0279899.
- Edwards, B.P.M.; Smith, A.C.; Docherty, T.D.S.; Gahbauer, M.A.; Gillespie, C.R.; Grinde, A.R.; Harmer, T.; Iles, D.T.; Matsuoka, S.M.; Michel, N.L.; et al. Point count offsets for estimating population sizes of North American landbirds. Ibis 2023, 165, 482–503.
- Ding, Z.; Guo, A.; Lian, M.; Wang, Y.; Ying, W.; Jiang, H.; Zhou, X.; Qian, C.; Lai, J.; Cao, J. Landscape factors influencing bird nest site selection in urban green spaces. Front. Ecol. Evol. 2023, 11, 1258185.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Xiao, Y.; Tian, Z.; Yu, J.; Zhang, Y.; Liu, S.; Du, S.; Lan, X. A review of object detection based on deep learning. Multimed. Tools Appl. 2020, 79, 23729–23791.
- Oliveira, D.A.B.; Pereira, L.G.R.; Bresolin, T.; Ferreira, R.E.P.; Dorea, J.R.R. A review of deep learning algorithms for computer vision systems in livestock. Livest. Sci. 2021, 253, 104700.
- Banupriya, N.; Saranya, S.; Swaminathan, R.; Harikumar, S.; Palanisamy, S. Animal detection using deep learning algorithm. J. Crit. Rev. 2020, 7, 434–439.
- Huang, E.; Mao, A.; Gan, H.; Ceballos, M.C.; Parsons, T.D.; Xue, Y.; Liu, K. Center clustering network improves piglet counting under occlusion. Comput. Electron. Agric. 2021, 189, 106417.
- Shao, W.; Kawakami, R.; Yoshihashi, R.; You, S.; Kawase, H.; Naemura, T. Cattle detection and counting in UAV images based on convolutional neural networks. Int. J. Remote Sens. 2020, 41, 31–52.
- Sarwar, F.; Griffin, A.; Periasamy, P.; Portas, K.; Law, J. Detecting and counting sheep with a convolutional neural network. In Proceedings of the 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Auckland, New Zealand, 27–30 November 2018; IEEE: Toulouse, France, 2018; pp. 1–6.
- Xu, B.; Wang, W.; Falzon, G.; Kwan, P.; Guo, L.; Chen, G.; Tait, A.; Schneider, D. Automated cattle counting using Mask R-CNN in quadcopter vision system. Comput. Electron. Agric. 2020, 171, 105300.
- Chabot, D.; Francis, C.M. Computer-automated bird detection and counts in high-resolution aerial images: A review. J. Field Ornithol. 2016, 87, 343–359.
- Boudaoud, L.B.; Maussang, F.; Garello, R.; Chevallier, A. Marine bird detection based on deep learning using high-resolution aerial images. In Proceedings of the OCEANS 2019-Marseille, Marseille, France, 17–20 June 2019; IEEE: Toulouse, France, 2019; pp. 1–7.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Zou, C.; Liang, Y.Q. Bird detection on transmission lines based on DC-YOLO model. In Proceedings of the 11th IFIP TC 12 International Conference on Intelligent Information Processing (IIP 2020), Hangzhou, China, 3–6 July 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 222–231.
- Yang, X.; Chai, L.; Bist, R.B.; Subedi, S.; Wu, Z. A deep learning model for detecting cage-free hens on the litter floor. Animals 2022, 12, 1983.
- Alqaysi, H.; Fedorov, I.; Qureshi, F.Z.; O’Nils, M. A temporal boosted YOLO-based model for birds detection around wind farms. J. Imaging 2021, 7, 227.
- Welch, G.F. Kalman filter. In Computer Vision: A Reference Guide; Springer: Cham, Switzerland, 2020; pp. 1–3.
- Siriani, A.L.R.; Kodaira, V.; Mehdizadeh, S.A.; Nääs, I.d.A.; de Moura, D.J.; Pereira, D.F. Detection and tracking of chickens in low-light images using YOLO network and Kalman filter. Neural Comput. Appl. 2022, 34, 21987–21997.
- Zhang, Y.; Li, X.; Wang, F.; Wei, B.; Li, L. A comprehensive review of one-stage networks for object detection. In Proceedings of the 2021 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), Xi’an, China, 17–19 August 2021; IEEE: Toulouse, France, 2021.
- Du, L.; Zhang, R.; Wang, X. Overview of two-stage object detection algorithms. J. Phys. Conf. Ser. 2020, 1544, 012034.
- Hong, S.-J.; Han, Y.; Kim, S.-Y.; Lee, A.-Y.; Kim, G. Application of deep-learning methods to bird detection using unmanned aerial vehicle imagery. Sensors 2019, 19, 1651.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot Multibox Detector. In Proceedings of the 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Akyon, F.C.; Altinuc, S.O.; Temizel, A. Slicing aided hyper inference and fine-tuning for small object detection. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; IEEE: Toulouse, France, 2022; pp. 966–970.
- Neubeck, A.; Van Gool, L. Efficient non-maximum suppression. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, 20–24 August 2006; IEEE: Toulouse, France, 2006.
- Tzutalin. LabelImg. Git Code. 2015. Available online: https://github.com/tzutalin/labelImg (accessed on 16 December 2023).
- Tao, A.; Sapra, K.; Catanzaro, B. Hierarchical multi-scale attention for semantic segmentation. arXiv 2020, arXiv:2005.10821.
- Tian, Y.; Wang, Y.; Krishnan, D.; Tenenbaum, J.B.; Isola, P. Rethinking few-shot image classification: A good embedding is all you need? In Proceedings of the Computer Vision–ECCV 2020 16th European Conference, Glasgow, UK, 23–28 August 2020; Part XIV; Springer International Publishing: Cham, Switzerland, 2020.
- Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6999–7019.
- Dollár, P.; Appel, R.; Belongie, S.; Perona, P. Fast feature pyramids for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 1532–1545.
- Zhang, Y.; Chen, Y.; Huang, C.; Gao, M. Object detection network based on feature fusion and attention mechanism. Future Internet 2019, 11, 9.
- Zhang, W.; Zhou, L.; Zhuang, P.; Li, G.; Pan, X.; Zhao, W.; Li, C. Underwater image enhancement via weighted wavelet visual perception fusion. IEEE Trans. Circuits Syst. Video Technol. 2023.
- Zhang, W.; Jin, S.; Zhuang, P.; Liang, Z.; Li, C. Underwater image enhancement via piecewise color correction and dual prior optimized contrast enhancement. IEEE Signal Process. Lett. 2023, 30, 229–233.
- Tang, Y.; Han, K.; Guo, J.; Xu, C.; Xu, C.; Wang, Y. GhostNetV2: Enhance cheap operation with long-range attention. Adv. Neural Inf. Process. Syst. 2022, 35, 9969–9982.
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019.
- Mehta, S.; Rastegari, M. MobileViT: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv 2021, arXiv:2110.02178.
- Datar, P.; Jain, K.; Dhedhi, B. Detection of birds in the wild using deep learning methods. In Proceedings of the 2018 4th International Conference for Convergence in Technology (I2CT), Mangalore, India, 27–28 October 2018; IEEE: Toulouse, France, 2018.
- Pillai, S.K.; Raghuwanshi, M.M.; Borkar, P. Super resolution Mask RCNN based transfer deep learning approach for identification of bird species. Int. J. Adv. Res. Eng. Technol. 2020, 11, 864–874.
- Xiang, W.; Song, Z.; Zhang, G.; Wu, X. Birds detection in natural scenes based on improved Faster RCNN. Appl. Sci. 2022, 12, 6094.
- Kassim, Y.M.; Byrne, M.E.; Burch, C.; Mote, K.; Hardin, J.; Larsen, D.R.; Palaniappan, K. Small object bird detection in infrared drone videos using Mask R-CNN deep learning. Electron. Imaging 2020, 32, art00003.
- Giunchi, D.; Gaggini, V.; Baldaccini, N.E. Distance sampling as an effective method for monitoring feral pigeon (Columba livia f. domestica) urban populations. Urban Ecosyst. 2007, 10, 397–412.
- Schiano, F.; Natter, D.; Zambrano, D.; Floreano, D. Autonomous detection and deterrence of pigeons on buildings by drones. IEEE Access 2021, 10, 1745–1755.
- Agilandeeswari, L.; Meena, S. Swin Transformer based contrastive self-supervised learning for animal detection and classification. Multimed. Tools Appl. 2023, 82, 10445–10470.
- Gu, T.; Min, R. A Swin Transformer based framework for shape recognition. In Proceedings of the 2022 14th International Conference on Machine Learning and Computing (ICMLC), Guangzhou, China, 18–21 February 2022; pp. 388–393.
- Dogra, A.; Goyal, B.; Agrawal, S. From multi-scale decomposition to non-multi-scale decomposition methods: A comprehensive survey of image fusion techniques and its applications. IEEE Access 2017, 5, 16040–16067.
| Dataset | Train | Validation | Test | All |
|---|---|---|---|---|
| Original dataset | 266 | 67 | 67 | 400 |
| Augmented data | 2400 | 600 | 600 | 3600 |
| Final dataset | 2666 | 667 | 667 | 4000 |
| Configuration | Parameters |
|---|---|
| CPU | 32 vCPU AMD EPYC 7763 64-core processor |
| GPU | NVIDIA A100-SXM4 (80 GB) |
| Development environment | Python 3.8 |
| Operating system | Ubuntu 18.04 |
| Deep learning framework | PyTorch 1.9.0 |
| CUDA version | CUDA 11.1 |
| Model | Backbone | Model Weight Size (MB) | mAP | AP50 | AP50s |
|---|---|---|---|---|---|
| YOLOv5-s | Darknet53 | 72 | 0.45 | 0.65 | 0.36 |
| YOLOv5-m | Darknet53 | 98 | 0.44 | 0.69 | 0.39 |
| YOLOv5-s | GhostNetV2 | 73 | 0.42 | 0.62 | 0.33 |
| YOLOv5-s | MobileNetV3 | 111 | 0.47 | 0.65 | 0.35 |
| Faster R-CNN | ResNet | 142 | 0.52 | 0.70 | 0.43 |
| Mask R-CNN | ResNet | 229 | 0.51 | 0.75 | 0.40 |
| Mask R-CNN | MobileViT | 240 | 0.61 | 0.77 | 0.47 |
| Mask R-CNN | Swin Transformer | 256 | 0.64 | 0.77 | 0.52 |
| Modified Mask R-CNN (Swin-Mask R-CNN) | Swin Transformer + FPN | 269 | 0.68 | 0.87 | 0.57 |
| Step | Model | Backbone | Model Weight Size (MB) | mAP | AP50 | AP50s |
|---|---|---|---|---|---|---|
| 1 | Mask R-CNN | Swin Transformer + FPN | 269 | 0.68 | 0.87 | 0.57 |
| 2 | Mask R-CNN | Swin Transformer | 256 | 0.64 | 0.77 | 0.52 |
| 3 | Mask R-CNN | ResNet | 229 | 0.51 | 0.75 | 0.40 |
| Model | mAP | AP50 | AP50s |
|---|---|---|---|
| YOLOv5-s + SAHI | 0.51 | 0.71 | 0.42 |
| YOLOv5-m + SAHI | 0.56 | 0.74 | 0.46 |
| YOLOv5-s + GhostNetV2 + SAHI | 0.49 | 0.67 | 0.38 |
| YOLOv5-s + MobileNetV3 + SAHI | 0.53 | 0.70 | 0.41 |
| Faster R-CNN + ResNet + SAHI | 0.60 | 0.72 | 0.46 |
| Mask R-CNN + ResNet + SAHI | 0.62 | 0.78 | 0.52 |
| Mask R-CNN + MobileViT + SAHI | 0.69 | 0.83 | 0.62 |
| Swin-Mask R-CNN + SAHI (ours) | 0.74 | 0.93 | 0.67 |
| Dataset Size (Swin-Mask R-CNN + SAHI) | mAP | AP50 | AP50s |
|---|---|---|---|
| 400 images | 0.51 | 0.71 | 0.42 |
| 2000 images | 0.71 | 0.74 | 0.63 |
| 4000 images | 0.74 | 0.93 | 0.67 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).