Small-Target Pest Detection Model Based on Dynamic Multi-Scale Feature Extraction and Dimensionally Selected Feature Fusion
Abstract
1. Introduction
2. Related Work
2.1. YOLO Algorithm Based on Feature Extraction Network
2.2. YOLO Algorithm Based on Feature Fusion Network
- (1) A dynamic multi-scale feature extraction module is proposed. It adapts to different input features and fuses information across layers, so that multi-scale and diverse feature information in images is captured efficiently.
- (2) A dimension selection feature pyramid network (DSFPN) integrates feature maps at different scales, which enhances feature fusion and propagation across layers. The fused feature maps are then passed through the dynamic multi-scale feature extraction module again to extract effective features and reduce the information lost during fusion.
- (3) NWD combined with CIoU is used as the position loss function of MSDS-YOLO to measure the prediction error more accurately. In addition, a dedicated detection head for small targets is added, which improves detection accuracy on small objects.
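The combined position loss named in contribution (3) can be illustrated with a minimal sketch. It assumes the NWD definition from Wang et al. (2021, listed in the references), where each box (cx, cy, w, h) is modelled as a 2D Gaussian, and the standard CIoU definition. The mixing weight `nwd_weight`, the constant `c = 12.8`, and all function names are placeholders chosen for illustration; they are not taken from the MSDS-YOLO implementation.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein distance for (cx, cy, w, h) boxes.

    Each box is treated as a Gaussian with mean (cx, cy) and covariance
    diag(w^2 / 4, h^2 / 4). `c` is a dataset-dependent constant (assumed value).
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)


def ciou(box_a, box_b, eps=1e-9):
    """Complete IoU for (cx, cy, w, h) boxes."""
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    ax1, ay1, ax2, ay2 = cxa - wa / 2, cya - ha / 2, cxa + wa / 2, cya + ha / 2
    bx1, by1, bx2, by2 = cxb - wb / 2, cyb - hb / 2, cxb + wb / 2, cyb + hb / 2
    # Plain IoU from the intersection rectangle.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    iou = inter / (wa * ha + wb * hb - inter + eps)
    # Squared diagonal of the smallest enclosing box and squared centre distance.
    diag_sq = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2 + eps
    center_sq = (cxa - cxb) ** 2 + (cya - cyb) ** 2
    # Aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - center_sq / diag_sq - alpha * v


def position_loss(pred, target, nwd_weight=0.5, c=12.8):
    """Weighted sum of the NWD loss and the CIoU loss (weight is an assumption)."""
    return nwd_weight * (1 - nwd(pred, target, c)) + (1 - nwd_weight) * (1 - ciou(pred, target))


# Two 4 x 4 boxes that do not overlap: IoU is zero, but NWD still varies with distance.
print(nwd((10, 10, 4, 4), (16, 10, 4, 4)), position_loss((10, 10, 4, 4), (16, 10, 4, 4)))
```

For boxes only a few pixels wide, a small shift can drive IoU to zero, while the NWD term still changes smoothly with the centre distance; this is the usual motivation for mixing the two terms.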
3. Methodology
3.1. Dynamic Multiscale Feature Extraction Module (C3k2_DMSFE)
3.2. Dimension Selection Feature Pyramid Network (DSFPN)
3.3. Normalized Gaussian Wasserstein Distance Loss Function
4. Experimental Results and Discussion
4.1. Datasets
4.1.1. Public Datasets
Yellow-Sticky-Traps-Datasets [38]
VisDrone2019 [40]
4.1.2. Self-Built Datasets
Data Collection
Dataset Preparation
4.2. Experimental Environment and Evaluation Indicators
4.3. Experimental Results and Discussion
4.3.1. Comparative Analysis of Detection Model Performance Metrics
Cottonpest2 Dataset
Public Dataset
4.3.2. Ablation Experiments
4.4. Visualization Analysis
4.5. Model Deployment
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Chen, P.; Xiao, Q.; Zhang, J.; Xie, C.; Wang, B. Occurrence prediction of cotton pests and diseases by bidirectional long short-term memory networks with climate and atmosphere circulation. Comput. Electron. Agric. 2020, 176, 105612.
- Jing, R.; Zhang, W.; Li, Y.; Li, W.; Liu, Y. Feature aggregation network for small object detection. Expert Syst. Appl. 2024, 255, 124686.
- Chen, X.; Fang, H.; Lin, T.-Y.; Vedantam, R.; Gupta, S.; Dollár, P.; Zitnick, C.L. Microsoft COCO Captions: Data collection and evaluation server. arXiv 2015, arXiv:1504.00325.
- Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, HI, USA, 8–14 December 2001.
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2005), San Diego, CA, USA, 20–25 June 2005; IEEE: Piscataway, NJ, USA, 2005; Volume 1, pp. 886–893.
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
- Schapire, R.E. Explaining adaboost. In Empirical inference: Festschrift in Honor of Vladimir N. Vapnik; Schölkopf, B., Luo, Z., Vovk, V., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 37–52.
- Ye, Y.; Huang, Q.; Rong, Y.; Yu, X.; Liang, W.; Chen, Y.; Xiong, S. Field detection of small pests through stochastic gradient descent with genetic algorithm. Comput. Electron. Agric. 2023, 206, 107694.
- Li, W.; Zheng, T.; Yang, Z.; Li, M.; Sun, C.; Yang, X. Classification and detection of insects from field images using deep learning for smart pest management: A systematic review. Ecol. Inform. 2021, 66, 101460.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Xie, Y.-L.; Lin, C.-W. YOLO-ResTinyECG: ECG-based lightweight embedded AI arrhythmia small object detector with pruning methods. Expert Syst. Appl. 2025, 263, 125691.
- Tian, Y.; Wang, S.; Li, E.; Yang, G.; Liang, Z.; Tan, M. MD-YOLO: Multi-scale Dense YOLO for small target pest detection. Comput. Electron. Agric. 2023, 213, 108233.
- Hu, X.; Li, X.; Huang, Z.; Chen, Q.; Lin, S. Detecting tea tree pests in complex backgrounds using a hybrid architecture guided by transformers and multi-scale attention mechanism. J. Sci. Food Agric. 2024, 104, 3570–3584.
- Wang, J.; Wang, J. A lightweight YOLOv8 based on attention mechanism for mango pest and disease detection. J. Real-Time Image Process. 2024, 21, 136.
- Tang, Z.; Lu, J.; Chen, Z.; Qi, F.; Zhang, L. Improved Pest-YOLO: Real-time pest detection based on efficient channel attention mechanism and transformer encoder. Ecol. Inform. 2023, 78, 102340.
- Hu, J.; Li, Z.; Huang, H.; Hong, T.; Jiang, S.; Zeng, J. Citrus psyllid detection based on improved YOLOv4-Tiny model. Trans. Chin. Soc. Agric. Eng. 2021, 37, 197–203.
- Chu, J.; Li, Y.; Feng, H.; Weng, X.; Ruan, Y. Research on Multi-Scale Pest Detection and Identification Method in Granary Based on Improved YOLOv5. Agriculture 2023, 13, 364.
- Xu, H.; Zheng, W.; Liu, F.; Li, P.; Wang, R. Unmanned Aerial Vehicle Perspective Small Target Recognition Algorithm Based on Improved YOLOv5. Remote Sens. 2023, 15, 3583.
- Guo, A.; Jia, Z.; Ge, B.; Chen, W.; Song, S.; He, C.; Zhou, G.; Wang, J.; Lv, X. RLCFE-Net: A reparameterization large convolutional kernel feature extraction network for weed detection in multiple scenarios. Expert Syst. Appl. 2025, 274, 126941.
- Bai, C.; Zhang, K.; Jin, H.; Qian, P.; Zhai, R.; Lu, K. SFFEF-YOLO: Small object detection network based on fine-grained feature extraction and fusion for unmanned aerial images. Image Vis. Comput. 2025, 156, 105469.
- Jiang, L.; Yuan, B.; Du, J.; Chen, B.; Xie, H.; Tian, J.; Yuan, Z. MFFSODNet: Multiscale Feature Fusion Small Object Detection Network for UAV Aerial Images. IEEE Trans. Instrum. Meas. 2024, 73, 1–14.
- Dong, S.; Teng, Y.; Jiao, L.; Du, J.; Liu, K.; Wang, R. ESA-Net: An efficient scale-aware network for small crop pest detection. Expert Syst. Appl. 2024, 236, 121308.
- Iqra; Giri, K.J. SO-YOLOv8: A novel deep learning-based approach for small object detection with YOLO beyond COCO. Expert Syst. Appl. 2025, 280, 127447.
- Ding, S.; Xiong, M.; Wang, X.; Zhang, Z.; Chen, Q.; Zhang, J.; Wang, X.; Zhang, Z.; Li, D.; Xu, S.; et al. Dynamic feature and context enhancement network for faster detection of small objects. Expert Syst. Appl. 2025, 265, 125732.
- Zhang, Y.; Zhang, H.; Huang, Q.; Han, Y.; Zhao, M. DsP-YOLO: An anchor-free network with DsPAN for small object detection of multiscale defects. Expert Syst. Appl. 2024, 241, 122669.
- Shi, P.; He, Q.; Zhu, S.; Li, X.; Fan, X.; Xin, Y. Multi-scale fusion and efficient feature extraction for enhanced sonar image object detection. Expert Syst. Appl. 2024, 256, 124958.
- Zhang, Y.; Wu, C.; Guo, W.; Zhang, T.; Li, W. CFANet: Efficient Detection of UAV Image Based on Cross-Layer Feature Aggregation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–11.
- Zhang, Y.; Zhang, T.; Wu, C.; Tao, R. Multi-Scale Spatiotemporal Feature Fusion Network for Video Saliency Prediction. IEEE Trans. Multimed. 2024, 26, 4183–4193.
- Chen, Z.; Ji, H.; Zhang, Y.; Zhu, Z.; Li, Y. High-Resolution Feature Pyramid Network for Small Object Detection on Drone View. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 475–489.
- Wang, H.; Liu, J.; Zhao, J.; Zhang, J.; Zhao, D. Precision and speed: LSOD-YOLO for lightweight small object detection. Expert Syst. Appl. 2025, 269, 126440.
- Zhao, W.; Kang, Y.; Chen, H.; Zhao, Z.; Zhao, Z.; Zhai, Y. Adaptively Attentional Feature Fusion Oriented to Multiscale Object Detection in Remote Sensing Images. IEEE Trans. Instrum. Meas. 2023, 72, 1–11.
- Yu, W.; Zhou, P.; Yan, S.; Wang, X. InceptionNeXt: When Inception Meets ConvNeXt. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 5672–5683.
- Shi, D. TransNeXt: Robust Foveal Visual Perception for Vision Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 17773–17783.
- Xu, S.; Zheng, S.; Xu, W.; Xu, R.; Wang, C.; Zhang, J.; Teng, X.; Li, A.; Guo, L. HCF-Net: Hierarchical Context Fusion Network for Infrared Small Object Detection. In Proceedings of the 2024 IEEE International Conference on Multimedia and Expo (ICME), Niagara Falls, ON, Canada, 15–19 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
- Wang, J.; Xu, C.; Yang, W.; Yu, L. A Normalized Gaussian Wasserstein Distance for Tiny Object Detection. arXiv 2021, arXiv:2110.13389.
- Nieuwenhuizen, A.; Hemming, J.; Janssen, D.; Suh, H.K.; Bosmans, L.; Sluydts, V.; Brenard, N.; Rodríguez, E.; Tellez, M. Raw data from Yellow Sticky Traps with insects for training of deep learning Convolutional Neural Network for object detection. Wagening. Univ. Res. 2019, 3, S2.
- Shi, J.; Jia, Y.; Zhou, G.; Wang, J.; Jia, Z. Small Target Insect Detection Based on Improved YOLOv8n. In Proceedings of the ICASSP 2025—2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 6–11 April 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1–5.
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y.; et al. VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 213–226.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Jocher, G.; Stoken, A.; Borovec, J.; Liu, C.; Hogan, A.; Diaconu, L.; Poznanski, J.; Yu, L.; Rai, P.; Ferriday, R.; et al. ultralytics/yolov5: V3.0. Zenodo. 12 August 2020. Available online: https://zenodo.org/records/3983579 (accessed on 8 January 2026).
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 7464–7475.
- Yaseen, M. What is YOLOv8: An In-Depth Exploration of the Internal Features of the Next-Generation Object Detector. arXiv 2024, arXiv:2408.15857.
- Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458.
- Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725.
- Tian, Y.; Ye, Q.; Doermann, D. Yolov12: Attention-centric real-time object detectors. arXiv 2025, arXiv:2502.12524.








| Dataset | Class | Instances | Max Box Size | Min Box Size | Small Targets | Medium Targets | Large Targets |
|---|---|---|---|---|---|---|---|
| Training Set | aphids | 5754 | 54 × 53 | 11 × 6 | 5304 | 450 | 0 |
| Training Set | thrips | 4292 | 30 × 30 | 9 × 3 | 4292 | 0 | 0 |
| Val Set | aphids | 1746 | 70 × 38 | 11 × 7 | 1623 | 123 | 0 |
| Val Set | thrips | 1226 | 33 × 28 | 9 × 3 | 1226 | 0 | 0 |
| Test Set | aphids | 751 | 52 × 47 | 14 × 8 | 653 | 62 | 0 |
| Test Set | thrips | 582 | 26 × 21 | 8 × 4 | 582 | 0 | 0 |
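The small, medium, and large counts in the dataset table are consistent with the COCO size convention, where a target is small if its box area is below 32 × 32 pixels and large if it exceeds 96 × 96 pixels. The sketch below assumes that convention; the source does not state the thresholds, and the function name and the handling of boxes exactly on a boundary are illustrative choices. The example boxes are the maximum box sizes listed in the table.

```python
SMALL_MAX_AREA = 32 * 32   # assumed COCO threshold, in square pixels
MEDIUM_MAX_AREA = 96 * 96  # assumed COCO threshold, in square pixels


def size_bucket(width, height):
    """Classify a bounding box by pixel area using the assumed COCO thresholds."""
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


# Maximum box sizes from the table: aphids (training set) and thrips (validation set).
for name, (w, h) in {"aphids 54 x 53": (54, 53), "thrips 33 x 28": (33, 28)}.items():
    print(name, size_bucket(w, h))
```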
| Parameter | Setup |
|---|---|
| Image size | |
| Momentum | |
| BatchSize | 8 |
| Epoch | 750 |
| Patience | 100 |
| Initial learning rate | |
| Final learning rate | |
| Weight decay | |
| Warmup epochs | |
| IoU | |
| Close-Mosaic | 0 |
| Optimizer | SGD |
| Seed | 0 |
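The settings that survive in the parameter table can be collected into a small configuration, sketched below. The key names follow Ultralytics-style training arguments, which is an assumption because the source does not name the training framework. Rows whose values are missing from the table (image size, momentum, learning rates, weight decay, warmup epochs, IoU) are left out rather than guessed, and the helper name is hypothetical.

```python
# Only the rows with values present in the table; key names are assumed.
train_settings = {
    "batch": 8,
    "epochs": 750,
    "patience": 100,
    "close_mosaic": 0,
    "optimizer": "SGD",
    "seed": 0,
}


def as_cli_overrides(settings):
    """Render the settings as space-separated key=value overrides."""
    return " ".join(f"{key}={value}" for key, value in settings.items())


print(as_cli_overrides(train_settings))
```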

| Model | Parameters | FLOPs (G) | Size (MB) | F1-Score | mAP(50) | mAP(50:95) |
|---|---|---|---|---|---|---|
| YOLOv3-tiny | 8,671,312 | 12.9 | 33.08 | 67.12 | 68.7 | 30.4 |
| YOLOv5n | 1,761,871 | 4.1 | 6.72 | 78.99 | 84.5 | 38.3 |
| YOLOv7n | 6,010,302 | 13.0 | 22.93 | 73.05 | 77.0 | 33.4 |
| YOLOv8n | 3,006,038 | 8.1 | 11.47 | 79.61 | 84.6 | 40.0 |
| YOLOv9-t | 2,801,644 | 11.7 | 10.69 | 78.80 | 83.4 | 39.4 |
| GELAN-t | 1,879,014 | 7.1 | 7.17 | 79.62 | 84.1 | 40.2 |
| YOLOv10n | 2,695,196 | 8.2 | 10.28 | 76.69 | 81.6 | 38.0 |
| YOLOv11n | 2,582,542 | 6.3 | 9.85 | 78.55 | 83.7 | 39.9 |
| YOLOv12n | 2,508,734 | 5.8 | 9.57 | 77.75 | 83.7 | 39.5 |
| MSDS-YOLO (ours) | 2,818,094 | 18.8 | 10.75 | 80.83 | 86.7 | 40.6 |

| Dataset | Model | Parameters | FLOPs (G) | Size (MB) | F1-Score | mAP(50) | mAP(50:95) |
|---|---|---|---|---|---|---|---|
| Yellow-Sticky-Traps-Datasets (public) | YOLOv3-tiny | 8,671,312 | 12.9 | 33.08 | 57.04 | 50.7 | 18.5 |
| Yellow-Sticky-Traps-Datasets (public) | YOLOv5n | 1,763,224 | 4.1 | 6.73 | 85.00 | 85.6 | 34.9 |
| Yellow-Sticky-Traps-Datasets (public) | YOLOv7n | 6,013,008 | 13.0 | 22.94 | 68.78 | 70.0 | 25.2 |
| Yellow-Sticky-Traps-Datasets (public) | YOLOv8n | 3,006,233 | 8.1 | 11.47 | 81.88 | 83.9 | 37.1 |
| Yellow-Sticky-Traps-Datasets (public) | YOLOv9-t | 2,802,034 | 11.7 | 10.69 | 74.77 | 79.2 | 34.9 |
| Yellow-Sticky-Traps-Datasets (public) | GELAN-t | 1,879,209 | 7.1 | 7.17 | 76.16 | 80.9 | 36.0 |
| Yellow-Sticky-Traps-Datasets (public) | YOLOv10n | 2,695,586 | 8.2 | 10.28 | 79.15 | 83.8 | 34.8 |
| Yellow-Sticky-Traps-Datasets (public) | YOLOv11n | 2,582,737 | 6.3 | 9.85 | 82.69 | 86.2 | 36.4 |
| Yellow-Sticky-Traps-Datasets (public) | YOLOv12n | 2,508,929 | 5.8 | 9.57 | 82.40 | 87.5 | 40.6 |
| Yellow-Sticky-Traps-Datasets (public) | MSDS-YOLO (ours) | 2,818,289 | 18.8 | 10.75 | 87.65 | 91.5 | 40.2 |
| VisDrone2019 (public) | YOLOv3-tiny | 8,687,482 | 12.9 | 33.14 | 21.89 | 14.4 | 6.09 |
| VisDrone2019 (public) | YOLOv5n | 1,772,695 | 4.2 | 6.76 | 30.74 | 24.0 | 12.1 |
| VisDrone2019 (public) | YOLOv7n | 6,031,950 | 13.1 | 23.01 | 38.78 | 31.2 | 15.8 |
| VisDrone2019 (public) | YOLOv8n | 3,007,598 | 8.1 | 11.47 | 35.00 | 28.3 | 16.0 |
| VisDrone2019 (public) | YOLOv9-t | 2,804,764 | 11.7 | 10.70 | 18.52 | 31.9 | 18.4 |
| VisDrone2019 (public) | GELAN-t | 1,880,574 | 7.1 | 7.17 | 37.23 | 31.0 | 17.0 |
| VisDrone2019 (public) | YOLOv10n | 2,698,316 | 8.2 | 10.29 | 35.03 | 28.6 | 15.9 |
| VisDrone2019 (public) | YOLOv11n | 2,584,102 | 6.3 | 9.86 | 34.82 | 28.1 | 15.7 |
| VisDrone2019 (public) | YOLOv12n | 2,510,294 | 5.8 | 9.58 | 34.60 | 28.1 | 15.8 |
| VisDrone2019 (public) | MSDS-YOLO (ours) | 2,819,654 | 18.8 | 10.77 | 38.82 | 32.2 | 17.3 |

| Basic (YOLO11) | +C3k2_DMSFE | +DSFPN | +NWD | Parameters | FLOPs (G) | Size (MB) | F1-Score | mAP(50) | mAP(50:95) |
|---|---|---|---|---|---|---|---|---|---|
| ✓ | | | | 2,582,542 | 6.3 | 9.85 | 78.55 | 83.7 | 39.9 |
| ✓ | ✓ | | | 2,320,958 | 5.8 | 8.85 | 79.09 | 85.3 | 40.5 |
| ✓ | | ✓ | | 2,982,238 | 19.7 | 11.38 | 80.15 | 86.0 | 40.3 |
| ✓ | | | ✓ | 2,582,542 | 6.3 | 9.85 | 79.95 | 84.6 | 39.5 |
| ✓ | ✓ | ✓ | | 2,818,094 | 18.8 | 10.75 | 80.42 | 86.3 | 41.7 |
| ✓ | ✓ | | ✓ | 2,320,958 | 5.8 | 8.85 | 79.05 | 84.4 | 39.2 |
| ✓ | | ✓ | ✓ | 2,982,238 | 19.7 | 11.38 | 80.19 | 85.5 | 40.1 |
| ✓ | ✓ | ✓ | ✓ | 2,818,094 | 18.8 | 10.75 | 80.83 | 86.7 | 40.6 |
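The trade-off shown by the ablation results can be made explicit by comparing the baseline row with the full MSDS-YOLO row. The short sketch below only does arithmetic on figures copied from the table; the dictionary keys and function names are illustrative.

```python
# First row (YOLO11 baseline) and last row (all modules) of the ablation table.
baseline = {"params": 2_582_542, "gflops": 6.3, "f1": 78.55, "map50": 83.7, "map50_95": 39.9}
full_model = {"params": 2_818_094, "gflops": 18.8, "f1": 80.83, "map50": 86.7, "map50_95": 40.6}


def absolute_gain(base, model):
    """Per-metric difference between a model row and the baseline row."""
    return {key: round(model[key] - base[key], 2) for key in base}


def relative_increase(base, model, key):
    """Relative change of one metric, as a percentage of the baseline value."""
    return round(100 * (model[key] - base[key]) / base[key], 1)


print(absolute_gain(baseline, full_model))
print(relative_increase(baseline, full_model, "params"), "% more parameters")
```

Under these figures, the accuracy gains come with a modest rise in parameter count and a much larger rise in FLOPs.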
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Li, J.; Le, W.; Jia, Z.; Zhou, G.; Wang, J.; Chen, G.; Wang, Y.; Guo, Y. Small-Target Pest Detection Model Based on Dynamic Multi-Scale Feature Extraction and Dimensionally Selected Feature Fusion. Appl. Sci. 2026, 16, 793. https://doi.org/10.3390/app16020793

