BAE-UNet: A Background-Aware and Edge-Enhanced Segmentation Network for Two-Stage Pest Recognition in Complex Field Environments
Abstract
1. Introduction
- The BAE-UNet segmentation model is proposed, integrating three modules: BACM, SCRA, and MESA. The modules are designed to work together in a task-driven manner: BACM handles multi-scale feature representation, SCRA suppresses background noise, and MESA enhances boundary-aware features. Together they remove complex field-background interference and extract pest regions accurately, providing high-quality inputs for the subsequent detection stage.
- A detection stage based on YOLOv8 is constructed. It takes the pest regions produced by the segmentation stage as input and performs localization and category discrimination. This reduces the interference of complex backgrounds on detection and improves the model’s robustness and generalization for multi-scale, multi-pose pests.
- The method is systematically validated on a self-built dataset. Experimental results on this dataset show that the proposed two-stage method achieves higher detection accuracy and robustness than mainstream single-stage models, providing reliable data support and a practical reference for deploying intelligent pest monitoring systems in the field.
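The hand-off between the two stages can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the segmentation stage outputs a binary mask (1 = pest, 0 = background) and that background pixels are overwritten with a constant value before detection. The names `apply_mask` and `fill` are hypothetical, and plain nested lists stand in for image arrays.

```python
from typing import List, Sequence, Tuple

Pixel = Sequence[int]  # e.g. an (R, G, B) triple


def apply_mask(image: Sequence[Sequence[Pixel]],
               mask: Sequence[Sequence[int]],
               fill: int = 0) -> List[List[Tuple[int, ...]]]:
    """Keep pixels where mask is 1; overwrite all others with `fill`.

    Assumption: `mask` is binary and has the same height and width as `image`.
    """
    if len(image) != len(mask) or any(
        len(img_row) != len(mask_row) for img_row, mask_row in zip(image, mask)
    ):
        raise ValueError("image and mask must have the same height and width")

    masked = []
    for img_row, mask_row in zip(image, mask):
        out_row = []
        for pixel, keep in zip(img_row, mask_row):
            if keep:
                out_row.append(tuple(pixel))
            else:
                out_row.append(tuple(fill for _ in pixel))
        masked.append(out_row)
    return masked


# Toy 2 x 2 RGB image in which only the top-left pixel is marked as pest.
image = [[(10, 20, 30), (40, 50, 60)],
         [(70, 80, 90), (100, 110, 120)]]
mask = [[1, 0],
        [0, 0]]

# In a two-stage pipeline this would be the input passed to the detector.
detector_input = apply_mask(image, mask)
```

In practice the same operation would be applied to array-based images; the point here is only that the detector sees pest pixels and a uniform background.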
2. Materials and Methods
2.1. Materials
2.1.1. Image Data Collection
2.1.2. Dataset Construction
- These species are of significant economic importance in the crop ecosystems of Northeast China;
- Their morphology is highly diverse, varying in body size, structure, and phenotypic traits. For example, Gryllotalpidae are relatively large, whereas moths such as Ostrinia furnacalis and Chilo suppressalis are slender; this diversity helps assess the model’s adaptability to multi-scale targets;
- Some species have similar morphologies. For instance, Ostrinia furnacalis and Chilo suppressalis appear similar from certain viewpoints, providing challenging samples for the model to distinguish subtle morphological differences.
2.1.3. Data Augmentation
2.2. Methods
2.2.1. BAE-UNet
- Its encoder typically uses a stack of standard convolutions with a fixed receptive field, making it difficult to adapt to the large variation in scale among pest targets.
- The shallow-layer features transmitted directly by skip connections contain substantial background clutter, which may interfere with the decoder’s restoration of target details.
- The feature map at the end of the decoder lacks a mechanism to strengthen key boundary information, resulting in blurred predicted boundaries where pests and background are mixed.
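The first limitation can be made concrete with the standard receptive-field recurrence for a stack of convolutions. The sketch below is illustrative only: the layer configurations are hypothetical examples, not the BAE-UNet or VGG16 specification, and the function name `receptive_field` is an assumption.

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int, int]]) -> int:
    """Receptive field, in input pixels, of a stack of conv or pooling layers.

    Each layer is (kernel_size, stride, dilation). The recurrence is
        r    <- r + (k_eff - 1) * jump
        jump <- jump * stride
    with k_eff = dilation * (kernel_size - 1) + 1.
    """
    r, jump = 1, 1
    for kernel, stride, dilation in layers:
        k_eff = dilation * (kernel - 1) + 1
        r += (k_eff - 1) * jump
        jump *= stride
    return r


# Three standard 3x3 convolutions (stride 1, dilation 1).
plain_stack = [(3, 1, 1)] * 3

# Same depth with dilation rates 1, 2, 4 (a hypothetical multi-scale variant).
dilated_stack = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]

print(receptive_field(plain_stack))    # 7
print(receptive_field(dilated_stack))  # 15
```

At a fixed depth the plain stack always sees the same 7 × 7 window, regardless of whether the pest is large or small. Dilation is one common way to widen that window; whether BACM uses it is not stated in this section.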
2.2.2. BACM
2.2.3. SCRA
2.2.4. MESA
2.2.5. YOLOv8
2.3. Evaluation Metrics and Experimental Setup
2.3.1. Evaluation Metrics
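The segmentation tables in this paper report mIoU, Dice, Boundary F1, and mPA. As a reference point, the sketch below computes the three region-based metrics from their standard confusion-count definitions. It is an assumption that the paper uses these standard forms; the function name `segmentation_metrics` is hypothetical, and Boundary F1 is not covered here because it depends on a boundary-matching tolerance that this section does not specify.

```python
from typing import Dict, Sequence


def segmentation_metrics(pred: Sequence[int],
                         target: Sequence[int],
                         num_classes: int = 2) -> Dict[str, float]:
    """Mean IoU, mean Dice, and mean pixel accuracy over classes.

    `pred` and `target` are flattened label maps of equal, non-zero length.
    A class absent from both prediction and ground truth is skipped.
    """
    if len(pred) != len(target) or len(target) == 0:
        raise ValueError("pred and target must be non-empty and equally long")

    ious, dices, accs = [], [], []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        if tp + fp + fn == 0:
            continue  # class does not occur anywhere
        ious.append(tp / (tp + fp + fn))           # IoU  = TP / (TP + FP + FN)
        dices.append(2 * tp / (2 * tp + fp + fn))  # Dice = 2TP / (2TP + FP + FN)
        if tp + fn > 0:
            accs.append(tp / (tp + fn))            # per-class pixel accuracy

    return {
        "mIoU": sum(ious) / len(ious),
        "Dice": sum(dices) / len(dices),
        "mPA": sum(accs) / len(accs),
    }


# Toy example: six pixels, class 1 = pest, class 0 = background.
target = [0, 0, 1, 1, 1, 0]
pred = [0, 1, 1, 1, 0, 0]
metrics = segmentation_metrics(pred, target)
```

For this toy input each class has two true positives, one false positive, and one false negative, so mIoU is 0.5 and both Dice and mPA are 2/3.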
2.3.2. Experimental Setup
3. Experimental Results
3.1. First-Stage Segmentation Results
3.1.1. Comparative Experiments
3.1.2. Ablation Experiments
3.2. Second-Stage Detection Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Phase | Dataset/Subset | Number of Original Images | Data Augmentation | Final Number of Images |
|---|---|---|---|---|
| Data Preparation | Basic Dataset | 3000 | No | 2667 |
| First Phase (BAE-UNet) | Total | 1452 | - | 3352 |
| | Training Set | 950 | Yes | 2850 |
| | Validation Set | 502 | No | 502 |
| | Test Set | 1215 | No | 1215 |
| Second Phase (YOLOv8) | Total | 1215 | - | 1215 |
| | Training Set | 972 | No | 972 |
| | Validation Set | 243 | No | 243 |
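
The counts in the dataset table are internally consistent, which the following check makes explicit. The ratios it derives (a threefold expansion of the training set and an 80/20 split in the second phase) are inferred from the table values and are not stated in this section; the variable names are illustrative.

```python
# Counts copied from the dataset table.
basic_final = 2667
seg_train_orig, seg_train_final = 950, 2850
seg_val = 502
seg_test = 1215
det_train, det_val = 972, 243

# First-phase "Total" row: training + validation, before and after augmentation.
seg_total_orig = seg_train_orig + seg_val    # 1452
seg_total_final = seg_train_final + seg_val  # 3352

# The first-phase subsets together account for the final basic dataset.
assert seg_total_orig + seg_test == basic_final

# Augmentation is applied to the training set only.
aug_factor = seg_train_final / seg_train_orig  # 3.0

# The second-phase total matches the first-phase test set, which suggests
# (as an inference) that those images are re-split for detector training.
assert det_train + det_val == seg_test
det_train_ratio = det_train / seg_test  # 0.8
```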

| Environment Configuration | Parameter |
|---|---|
| CPU | Intel(R) Xeon(R) Gold 6148 CPU @ 2.40 GHz |
| GPU | 2 × A100 (80 GB) |
| Development environment | PyCharm 2023.2.5 |
| Language | Python 3.8 |
| Framework | PyTorch 2.0.1 |
| Operating platform | CUDA 11.8 |
| Operating system | Windows 11 |

| Hyperparameter | First Phase (BAE-UNet) | Second Phase (YOLOv8) |
|---|---|---|
| Epochs | 300 | 300 |
| Batch Size | 16 | 32 |
| Learning Rate | 1 × 10⁻⁴ | 5 × 10⁻⁴ |
| Optimizer | Adam | AdamW |
| Input Image Size | 512 × 512 | 512 × 512 |

| Model | Backbone | mIoU | Dice | Boundary F1 | mPA |
|---|---|---|---|---|---|
| U-Net | VGG16 | 0.852 | 0.867 | 0.857 | 0.981 |
| DeepLabV3+ | ResNet101 | 0.838 | 0.848 | 0.830 | 0.975 |
| PSPNet | ResNet50 | 0.826 | 0.836 | 0.796 | 0.983 |
| HRNetV2 | HRNetV2-W32 | 0.850 | 0.852 | 0.811 | 0.993 |
| BAE-UNet | VGG16 | 0.930 | 0.951 | 0.943 | 0.985 |

| Model | mIoU | Dice | Boundary F1 | mPA |
|---|---|---|---|---|
| U-Net | 0.852 | 0.867 | 0.857 | 0.981 |
| U-Net + BACM | 0.932 | 0.944 | 0.891 | 0.971 |
| U-Net + SCRA | 0.929 | 0.936 | 0.916 | 0.984 |
| U-Net + MESA | 0.933 | 0.938 | 0.937 | 0.979 |
| BAE-UNet | 0.930 | 0.951 | 0.943 | 0.985 |

| Model | Precision | Recall | mAP50 | mAP50–95 |
|---|---|---|---|---|
| Original image + YOLOv8 | 0.748 | 0.796 | 0.818 | 0.525 |
| BAE-UNet + YOLOv8 | 0.958 | 0.971 | 0.977 | 0.882 |
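
The size of the improvement in the detection table can be read off with a short calculation. The snippet below only subtracts the reported values; the dictionary names are illustrative, and no figures beyond those in the table are introduced.

```python
# Values copied from the detection results table.
baseline = {"Precision": 0.748, "Recall": 0.796, "mAP50": 0.818, "mAP50-95": 0.525}
two_stage = {"Precision": 0.958, "Recall": 0.971, "mAP50": 0.977, "mAP50-95": 0.882}

# Absolute gain of the two-stage pipeline over YOLOv8 on original images.
gains = {name: round(two_stage[name] - baseline[name], 3) for name in baseline}

for name, gain in gains.items():
    print(f"{name}: +{gain:.3f}")
```

The largest absolute gain is in mAP50–95 (0.357), the metric that averages over stricter IoU thresholds and is therefore most sensitive to localization quality.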
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Chang, J.; Li, X.; Ze, X.; Ding, X.; Gong, H. BAE-UNet: A Background-Aware and Edge-Enhanced Segmentation Network for Two-Stage Pest Recognition in Complex Field Environments. Agronomy 2026, 16, 166. https://doi.org/10.3390/agronomy16020166

