MSG-YOLO: A Multi-Scale Dynamically Enhanced Network for the Real-Time Detection of Small Impurities in Large-Volume Parenterals
Abstract
1. Introduction
- (1) Multi-scale Dynamic Perception: We implement adaptive multi-scale feature fusion through parallel dilated convolution, leveraging feature disparities across scales to enhance small target detection.
- (2) Dual-dimensional Attention Enhancement: We integrate channel re-weighting with spatial detail enhancement to create a two-level noise filtering mechanism, effectively suppressing interference and highlighting target features.
- (3) Lightweight Structural Design: Our module employs grouped computation for spatial attention, enabling channel group-specific attention calculation while preserving inter-channel differences. This approach significantly reduces parameters while improving deployment efficiency.
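The first two ideas can be sketched in a toy form: several dilated convolutions with the same kernel but growing receptive fields run in parallel, and a squeeze-then-softmax step re-weights the branches. This is a minimal single-channel NumPy illustration under our own simplifications, not the authors' implementation; `dilated_conv2d` and `channel_reweight` are hypothetical helpers.

```python
import numpy as np

def dilated_conv2d(x, k, d):
    """Single-channel 'same' convolution with a 3x3 kernel dilated by d (zero padding)."""
    xp = np.pad(x, d)  # pad by the dilation rate to keep the output size
    out = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * xp[i * d:i * d + x.shape[0], j * d:j * d + x.shape[1]]
    return out

def channel_reweight(feats):
    """Squeeze (global average pooling) then softmax re-weighting over the branches."""
    w = np.array([f.mean() for f in feats])
    w = np.exp(w - w.max())
    w /= w.sum()
    return sum(wi * fi for wi, fi in zip(w, feats))

x = np.random.rand(16, 16)                # toy single-channel feature map
kernel = np.full((3, 3), 1 / 9.0)         # shared smoothing kernel
# Parallel branches: same kernel, receptive field grows with the dilation rate.
branches = [dilated_conv2d(x, kernel, d) for d in (1, 2, 3)]
fused = channel_reweight(branches)        # attention-weighted multi-scale fusion
```

In the real network the fusion weights would be learned; the softmax over pooled branch means simply stands in for that learned re-weighting.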
2. Proposed Method
2.1. Overall Model Structure
2.2. MSGELAN
- CSPNet: CSPNet splits the input into two branches via a transition layer, routes one branch through arbitrary computational blocks in parallel with the other, then concatenates the branches and applies a second transition layer to re-fuse the information flow.
- ELAN: In contrast to CSPNet, ELAN stacks convolutional layers hierarchically: each layer's output is combined with the input of the next layer and passed through further convolutions. This layered aggregation allows ELAN to capture complex patterns and relationships effectively.
- GELAN: GELAN combines the design philosophies of CSPNet and ELAN: it retains CSPNet's split-and-recombine principle while applying ELAN's hierarchical convolutional processing within each segment. Its key distinction is that each segment can use any type of computational block rather than convolutional layers alone, which lets GELAN be tailored to diverse application requirements.
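The data flow described above can be summarized structurally: split the channels (CSPNet), stack blocks while retaining every intermediate output (ELAN), then concatenate and apply a transition. The sketch below is our own structural illustration, not the published GELAN code; the `block` argument emphasizes that any computational block fits, and the final mean stands in for a learned transition convolution.

```python
import numpy as np

def gelan_unit(x, block, n_stages=2):
    """Structural sketch of a GELAN-style unit: CSP split, ELAN-style stacked
    blocks whose intermediate outputs are all retained, concat, transition."""
    c = x.shape[0] // 2
    main, shortcut = x[:c], x[c:]              # CSPNet: split channels in two
    kept = [shortcut, main]
    for _ in range(n_stages):                  # ELAN: stack blocks, keep each output
        main = block(main)
        kept.append(main)
    merged = np.concatenate(kept, axis=0)      # recombine all partial features
    return merged.mean(axis=0, keepdims=True)  # stand-in for the transition layer

x = np.random.rand(8, 16, 16)                  # (channels, height, width)
out = gelan_unit(x, block=lambda t: np.maximum(0.9 * t, 0))  # any block works
```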
2.3. MSG-CECM Module
2.3.1. Multi-Scale Deep Feature Extraction
2.3.2. Channel Attention Enhancement
2.3.3. Dynamic Group-Wise Spatial Attention
2.3.4. Residual Connections
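The MSG-CECM submodules close with a residual connection. The general pattern, shown here as a generic sketch rather than the authors' exact module, is output = input + module(input): the wrapped module only needs to learn a correction, and gradients pass through the identity path unchanged.

```python
import numpy as np

def with_residual(x, module):
    """Residual connection: output = input + module(input)."""
    return x + module(x)

x = np.random.rand(4, 8, 8)                              # toy (channels, H, W) map
identity = with_residual(x, lambda t: np.zeros_like(t))  # zero module => identity
y = with_residual(x, lambda t: 0.1 * np.tanh(t))         # small learned correction
```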
3. Methodology
3.1. Materials
3.2. Experimental Setup
3.3. Evaluation Metrics
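The mAP@0.5 and AP_Small figures reported in the tables below rest on box IoU: a detection counts as a true positive when its IoU with an unmatched ground-truth box is at least 0.5, and per-class average precision is then averaged over classes. A minimal IoU sketch (standard definition, not paper-specific code):

```python
def iou(a, b):
    """Intersection-over-Union for boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # overlap area (0 if disjoint)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes overlapping by a 5x10 strip: 50 / (100 + 100 - 50)
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333...
```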
4. Experimental Results
4.1. Ablation Studies
4.2. Comparative Experiments
4.3. Visualization Analysis of Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Eisenhauer, D.A.; Schmidt, R.; Martin, C.; Schultz, S.G. Processing of small volume parenterals and large volume parenterals. In Pharmaceutical Dosage Forms-Parenteral Medications; CRC Press: Boca Raton, FL, USA, 2016; pp. 348–366.
- Jia, D.; Sun, H.; Zhang, C.; Tang, J.; Li, Z.; Wu, N.; He, Z. Detection Method of Foreign Body in Large Volume Parenteral Based on Continuous Time Series. Available online: https://ssrn.com/abstract=4178845 (accessed on 11 March 2025).
- Zhang, H.; Li, X.; Zhong, H.; Yang, Y.; Wu, Q.J.; Ge, J.; Wang, Y. Automated machine vision system for liquid particle inspection of pharmaceutical injection. IEEE Trans. Instrum. Meas. 2018, 67, 1278–1297.
- Zhang, Q.; Liu, K.; Huang, B. Research on Defect Detection of The Liquid Bag of Bag Infusion Sets Based on Machine Vision. Acad. J. Sci. Technol. 2023, 5, 186–197.
- Ge, J.; Xie, S.; Wang, Y.; Liu, J.; Zhang, H.; Zhou, B.; Weng, F.; Ru, C.; Zhou, C.; Tan, M.; et al. A system for automated detection of ampoule injection impurities. IEEE Trans. Autom. Sci. Eng. 2015, 14, 1119–1128.
- Zhang, H.; Shi, T.; He, S.; Wang, H.; Ruan, F. Visual detection system design for plastic infusion combinations containers based on reverse PM diffusion. In Proceedings of the 2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics, Hangzhou, China, 26–27 August 2015; Volume 2, pp. 306–310.
- Cheng, K.S.; Lin, J.S.; Mao, C.W. Techniques and comparative analysis of neural network systems and fuzzy systems in medical image segmentation. In Fuzzy Theory Systems; Elsevier: Amsterdam, The Netherlands, 1999; pp. 973–1008.
- Liang, Q.; Luo, B. Visual inspection intelligent robot technology for large infusion industry. Open Comput. Sci. 2023, 13, 20220262.
- Wang, C.Y.; Yeh, I.H.; Mark Liao, H.Y. YOLOv9: Learning what you want to learn using programmable gradient information. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 1–21.
- Terven, J.; Córdova-Esparza, D.M.; Romero-González, J.A. A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716.
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
- Varghese, R.; Sambath, M. YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–6.
- Wei, X.; Li, Z.; Wang, Y. SED-YOLO based multi-scale attention for small object detection in remote sensing. Sci. Rep. 2025, 15, 3125.
- Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Michael, K.; Fang, J.; Wong, C.; Yifu, Z.; Montes, D.; et al. Ultralytics/YOLOv5: v6.2 - YOLOv5 Classification Models, Apple M1, Reproducibility, ClearML and Deci.ai Integrations. Zenodo 2022. Available online: https://ui.adsabs.harvard.edu/abs/2022zndo...7002879J/abstract (accessed on 11 March 2025).
- Wang, J.; Gao, J.; Zhang, B. A small object detection model in aerial images based on CPDD-YOLOv8. Sci. Rep. 2025, 15, 770.
- Dong, Y.; Xu, F.; Guo, J. LKR-DETR: Small object detection in remote sensing images based on multi-large kernel convolution. J. Real-Time Image Process. 2025, 22, 46.
- Peng, Y.; Li, H.; Wu, P.; Zhang, Y.; Sun, X.; Wu, F. D-FINE: Redefine regression task in DETRs as fine-grained distribution refinement. arXiv 2024, arXiv:2410.13842.
- Chen, L.C. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Umar, M.; Siddique, M.F.; Ullah, N.; Kim, J.M. Milling machine fault diagnosis using acoustic emission and hybrid deep learning with feature optimization. Appl. Sci. 2024, 14, 10404.
- Siddique, M.F.; Zaman, W.; Ullah, S.; Umar, M.; Saleem, F.; Shon, D.; Yoon, T.H.; Yoo, D.S.; Kim, J.M. Advanced Bearing-Fault Diagnosis and Classification Using Mel-Scalograms and FOX-Optimized ANN. Sensors 2024, 24, 7303.
- Ma, Y.; Mao, Z. LiqD: A Dynamic Liquid Level Detection Model under Tricky Small Containers. arXiv 2024, arXiv:2403.08273.
- Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 390–391.
- Wittenburg, P.; Brugman, H.; Russel, A.; Klassmann, A.; Sloetjes, H. ELAN: A professional framework for multimodality research. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006), Genoa, Italy, 22–28 May 2006; pp. 1556–1559.
- Khanam, R.; Hussain, M. What is YOLOv5: A deep look into the internal features of the popular object detector. arXiv 2024, arXiv:2407.20892.
- Xu, S.; Wang, X.; Lv, W.; Chang, Q.; Cui, C.; Deng, K.; Wang, G.; Dang, Q.; Wei, S.; Du, Y.; et al. PP-YOLOE: An evolved version of YOLO. arXiv 2022, arXiv:2203.16250.
- Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 2778–2788.
- Miao, L.; Li, N.; Zhou, M.; Zhou, H. CBAM-Yolov5: Improved Yolov5 based on attention model for infrared ship detection. In Proceedings of the International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2021), Harbin, China, 24–26 December 2021; Volume 12168, pp. 564–571.
Hyperparameter | Value | Hyperparameter | Value |
---|---|---|---|
lr0 | 0.01 | warmup_momentum | 0.8 |
lrf | 0.01 | dfl | 1.5 |
momentum | 0.937 | box | 7.5 |
warmup_bias_lr | 0.1 | cls | 0.5 |
warmup_epochs | 3.0 | obj | 0.7 |
Experiment Group | Multi-Scale Conv. | Channel Attention | Spatial Attention | Dynamic Grouping | mAP@0.5 (%) | AP_Small (%) |
---|---|---|---|---|---|---|
Baseline | × | × | × | × | 31.5 | 19.5 |
Stage 1 | ✓ | × | × | × | +1.7 (33.2) | +2.3 (21.8) |
Stage 2 | ✓ | ✓ | × | × | +1.5 (34.7) | +1.3 (23.1) |
Stage 3 | ✓ | ✓ | ✓ | × | +1.1 (35.8) | +2.1 (25.2) |
MSG-YOLO | ✓ | ✓ | ✓ | ✓ | +0.7 (36.5) | +0.4 (25.6) |
Group Size (g) | mAP@0.5 (%) | AP_Small (%) | FPS | Params (M) |
---|---|---|---|---|
2 | 35.1 | 24.3 | 56 | 7.1 |
4 | 36.5 | 25.6 | 54 | 6.8 |
8 | 36.9 | 26.1 | 51 | 6.7 |
16 | 36.4 | 25.4 | 47 | 6.6 |
32 | 35.8 | 24.9 | 42 | 6.6 |
Test Condition | Parameters | mAP@0.5 | AP_Small | FPS | mAP Drop |
---|---|---|---|---|---|
Resolution | Original | 36.5 | 25.6 | 58 | - |
Resolution | 768 × 768 | 38.1 | 26.9 | 46 | - |
Resolution | 1024 × 1024 | 39.7 | 27.7 | 25 | - |
Gaussian Noise | σ = 10 | 35.1 | 24.8 | - | −3.9% |
Gaussian Noise | σ = 30 | 29.4 | 18.2 | - | −19.5% |
Salt-and-Pepper Noise | Density = 10% | 27.6 | 15.9 | - | −24.4% |
Motion Blur | Kernel Length = 15 | 34.2 | 23.1 | - | −6.3% |
Motion Blur | Kernel Length = 25 | 30.7 | 19.5 | - | −15.9% |
Models | mAP@0.5 ↑ | AP_Small ↑ | FPS ↑ | Params (M) ↓ | FLOPs (G) ↓ |
---|---|---|---|---|---|
YOLOv9t [9] | 34.2 | 22.5 | 72 | 2.4 | 10.1 |
YOLOv5s [29] | 33.8 | 21.7 | 63 | 7.2 | 16.5 |
PP-YOLOE [30] | 35.1 | 23.1 | 65 | 8.9 | 24.3 |
TPH-YOLO [31] | 36.5 | 24.8 | 41 | 16.7 | 36.8 |
YOLOv5s-CBAM [32] | 34.6 | 23.3 | 58 | 7.5 | 17.2 |
MSG-YOLO (ours) | 36.5 | 25.6 | 58 | 2.5 | 10.7 |
Model | mAP@0.5 | AP_Small | FPS | Params (M) | FLOPs (G) |
---|---|---|---|---|---|
MSG-YOLO | 41.2 | 28.6 | 54 | 6.8 | 15.4 |
YOLOv9t | 38.1 | 24.1 | 62 | 5.2 | 12.7 |
TPH-YOLO | 39.4 | 26.8 | 47 | 8.5 | 18.9 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, Z.; Jia, D.; He, Z.; Wu, N. MSG-YOLO: A Multi-Scale Dynamically Enhanced Network for the Real-Time Detection of Small Impurities in Large-Volume Parenterals. Electronics 2025, 14, 1149. https://doi.org/10.3390/electronics14061149