DScanNet: Packaging Defect Detection Algorithm Based on Selective State Space Models
Abstract
1. Introduction
- We propose DScanNet, a detection algorithm based on deep learning and selective state-space models, as an efficient solution for defect detection in logistics packaging. It offers a new technical approach to the real-time defect detection problem.
- We propose the multi-scale enhanced feature extractor (MEFE Block), the PCR Block, the Mamba Block, and the local feature extraction module (LFEM Block). The MEFE Block captures detailed features and contextual information in the image through multi-scale feature extraction; the PCR Block streamlines feature extraction and reduces redundant computation through partial convolution; the Mamba Block exploits Mamba's linear complexity to keep the model efficient; and the LFEM Block fully extracts local spatial information. These modules work in concert to improve the model's ability to capture and focus on defective features, enhancing detection accuracy.
- Given the current shortage of packaging defect datasets, we conducted experiments on our self-built dataset, BIGC-LP. Extensive experiments show that the proposed algorithm performs excellently on this defect dataset: compared with mainstream detection algorithms it reaches a precision of 96.8%, a clear advantage in detection accuracy, while the parameter count and computational cost are kept low, striking a good balance between performance and efficiency.
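To make the Mamba Block's linear-complexity claim concrete, the recurrence at the heart of a state-space model can be sketched as below. This is a minimal, non-selective toy (the function and parameter names are ours, not the paper's; real Mamba blocks make A, B, C input-dependent and use a hardware-aware parallel scan), but it shows why the cost grows linearly with sequence length rather than quadratically as in self-attention.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal diagonal state-space recurrence:
        h_t = A * h_{t-1} + B * x_t,   y_t = C . h_t
    x: (L,) input sequence; A, B, C: (N,) toy parameter vectors.
    A single pass over the sequence, so the cost is O(L * N) -- linear in L."""
    h = np.zeros_like(A, dtype=float)
    y = np.empty(len(x))
    for t, xt in enumerate(x):
        h = A * h + B * xt   # elementwise state update (diagonal A)
        y[t] = C @ h         # linear readout
    return y
```

With A = 0 the state carries no history and each output depends only on the current input; with |A| < 1 the state decays geometrically, giving the model a controllable memory at constant per-step cost.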
2. Methods
2.1. YOLO Series Real-Time Target Detection Algorithm
2.2. An Efficient Visual Modeling Approach Based on Mamba
3. Proposed Method
3.1. Multi-Scale Enhanced Feature Extractor (MEFE Block)
3.2. PCR Block
3.3. Mamba Block
3.4. Local Feature Extraction Module (LFEM Block)
4. Dataset Preparation and Experimental Environment
4.1. Dataset Preparation
4.2. Experimental Environment
5. Experiments
5.1. Evaluation Metrics
5.2. Experimental Analysis
5.3. Comparison Experiment
5.4. Ablation Experiment
6. Discussion
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Anjomshoae, S.T.; Rahim, M.S.M. Enhancement of template-based method for overlapping rubber tree leaf identification. Comput. Electron. Agric. 2016, 122, 176–184.
- Rahman, M.O.; Hussain, A.; Scavino, E.; Hannan, M.; Basri, H. DNA computer based algorithm for recyclable waste paper segregation. Appl. Soft Comput. 2015, 31, 223–240.
- Kulkarni, K.; Evangelidis, G.; Cech, J.; Horaud, R. Continuous action recognition based on sequence alignment. Int. J. Comput. Vis. 2015, 112, 90–114.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 142–158.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762.
- Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023, arXiv:2312.00752.
- Yasir, M.; Shanwei, L.; Mingming, X.; Jianhua, W.; Nazir, S.; Islam, Q.U.; Dang, K.B. SwinYOLOv7: Robust ship detection in complex synthetic aperture radar images. Appl. Soft Comput. 2024, 160, 111704.
- Tong, Y.; Yue, G.; Fan, L.; Lyu, G.; Zhu, D.; Liu, Y.; Meng, B.; Liu, S.; Mu, X.; Tian, C. YOLO-Faster: An efficient remote sensing object detection method based on AMFFN. Sci. Prog. 2024, 107, 00368504241280765.
- Fu, X.; Zhou, Z.; Meng, H.; Li, S. A synthetic aperture radar small ship detector based on transformers and multi-dimensional parallel feature extraction. Eng. Appl. Artif. Intell. 2024, 137, 109049.
- Wang, T.; Ma, Z.; Yang, T.; Zou, S. PETNet: A YOLO-based prior enhanced transformer network for aerial image detection. Neurocomputing 2023, 547, 126384.
- Mehta, S.; Rastegari, M. MobileViT: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv 2021, arXiv:2110.02178.
- Chen, Z.; Zhong, F.; Luo, Q.; Zhang, X.; Zheng, Y. EdgeViT: Efficient visual modeling for edge computing. In Proceedings of the International Conference on Wireless Algorithms, Systems, and Applications, Dalian, China, 24–26 November 2022; pp. 393–405.
- Li, Y.; Hu, J.; Wen, Y.; Evangelidis, G.; Salahi, K.; Wang, Y.; Tulyakov, S.; Ren, J. Rethinking vision transformers for MobileNet size and speed. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 16889–16900.
- Wang, Z.; Li, C.; Xu, H.; Zhu, X. Mamba YOLO: SSMs-based YOLO for object detection. arXiv 2024, arXiv:2406.05835.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Nelson, J.; Solawetz, J. YOLOv5 is here: State-of-the-art object detection at 140 FPS. Roboflow 2020, 17, 26. Available online: https://blog.roboflow.com/yolov5-is-here/ (accessed on 22 November 2022).
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
- Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8. Version 8.0.0. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 12 June 2025).
- Wang, C.-Y.; Yeh, I.-H.; Mark Liao, H.-Y. YOLOv9: Learning what you want to learn using programmable gradient information. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 1–21.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J. YOLOv10: Real-time end-to-end object detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011.
- Khanam, R.; Hussain, M. YOLOv11: An overview of the key architectural enhancements. arXiv 2024, arXiv:2410.17725.
- Tian, Y.; Ye, Q.; Doermann, D. YOLOv12: Attention-centric real-time object detectors. arXiv 2025, arXiv:2502.12524.
- Liu, Y.; Tian, Y.; Zhao, Y.; Yu, H.; Xie, L.; Wang, Y.; Ye, Q.; Jiao, J.; Liu, Y. VMamba: Visual state space model. Adv. Neural Inf. Process. Syst. 2024, 37, 103031–103063.
- Ma, J.; Li, F.; Wang, B. U-Mamba: Enhancing long-range dependency for biomedical image segmentation. arXiv 2024, arXiv:2401.04722.
- Zhu, L.; Liao, B.; Zhang, Q.; Wang, X.; Liu, W.; Wang, X. Vision Mamba: Efficient visual representation learning with bidirectional state space model. ICML 2024. arXiv 2024, arXiv:2401.09417.
- Chen, J.; Kao, S.-H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.-H.; Chan, S.-H.G. Run, don't walk: Chasing higher FLOPS for faster neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 12021–12031.
- Shi, H.; Wang, N.; Xu, X.; Qian, Y.; Zeng, L.; Zhu, Y. HeMoDU: High-efficiency multi-object detection algorithm for unmanned aerial vehicles on urban roads. Sensors 2024, 24, 4045.
- Szőlősi, J.; Szekeres, B.J.; Magyar, P.; Adrián, B.; Farkas, G.; Andó, M. Welding defect detection with image processing on a custom small dataset: A comparative study. IET Collab. Intell. Manuf. 2024, 6, e70005.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I. pp. 21–37.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
Name | Configuration |
---|---|
Operating System | Linux Ubuntu 20.04 |
CPU | Intel Core i7-11700 |
GPU | NVIDIA GeForce RTX 3090 |
RAM | 24 GB |
IDE | PyCharm 2020 |
Deep Learning Framework | PyTorch 1.11 |
Programming Language | Python 3.9 |
CUDA | Version 11.3 |

Parameter | Value |
---|---|
Image size | 640 × 640 |
Learning rate | 0.01 |
Epochs | 100 |
Batch size | 16 |
Optimizer | SGD |
Algorithm | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 | FPS | Params (M) | FLOPs (B) |
---|---|---|---|---|---|---|---|
YOLOv5 | 92.5 | 91.7 | 93.1 | 61.3 | 83.1 | 7.02 | 16.1 |
DScanNet | 96.8 | 94.6 | 95.1 | 66.1 | 97.1 | 3.68 | 9.4 |
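For reference, the Precision and Recall columns follow the standard detection definitions over true positives (TP), false positives (FP), and false negatives (FN). A minimal sketch, with counts that are purely illustrative rather than taken from the experiments:

```python
def precision_recall(tp, fp, fn):
    """Standard detection metrics:
    Precision = TP / (TP + FP)  -- fraction of detections that are correct
    Recall    = TP / (TP + FN)  -- fraction of true defects that are found"""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# e.g. 90 correct detections, 10 false alarms, 10 missed defects:
p, r = precision_recall(tp=90, fp=10, fn=10)  # -> (0.9, 0.9)
```

mAP@0.5 then averages, over all classes, the area under the precision–recall curve with detections counted as TP when their IoU with a ground-truth box exceeds 0.5.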
Model | Precision | mAP@0.5 | mAP@0.5:0.95 | FPS | Params (M) | FLOPs (B) |
---|---|---|---|---|---|---|
YOLOv6 [21] | 92.7 | 91.8 | 61.3 | 60 | 9.14 | 17.23 |
YOLOv7 [22] | 93.5 | 93.1 | 64 | 90.2 | 6.007 | 13 |
YOLOv8 [23] | 90.1 | 88.7 | 58.2 | 76 | 11.2 | 28 |
YOLOv9 [24] | 89.5 | 85 | 54.4 | 72.5 | 7.2 | 26 |
YOLOv10 [25] | 60.2 | 58.7 | 34.9 | 83.4 | 7.2 | 21.6 |
YOLOv11 [26] | 86.8 | 86.1 | 54.8 | 72.5 | 9.4 | 21.5 |
YOLOv12 [27] | 85.9 | 82.2 | 47.9 | 65.7 | 9.3 | 21.4 |
SSD [34] | 84.98 | 86.06 | 50.9 | 52.3 | 62.74 | 26.29 |
Faster R-CNN [35] | 85.66 | 87.05 | 56 | 40 | 137 | 370.21 |
DScanNet | 96.8 | 95.1 | 66.1 | 97.1 | 3.68 | 9.4 |
Model | MEFE Block | PCR | Mamba Block | Precision | FPS | mAP@0.5 | Params (M) |
---|---|---|---|---|---|---|---|
1 |  |  |  | 92.5 | 83.1 | 93.1 | 7.02 |
2 | √ |  |  | 93.1 | 92.5 | 93.5 | 7.27 |
3 | √ | √ |  | 94.3 | 96.4 | 93.9 | 5.67 |
4 | √ | √ | √ | 96.8 | 97.1 | 95.1 | 3.68 |
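The parameter and FLOPs reduction contributed by the PCR Block is consistent with the partial-convolution (PConv) analysis of Chen et al. [31]: convolving only a fraction 1/r of the channels and passing the rest through cuts the convolution's FLOPs by roughly a factor of r². A back-of-the-envelope sketch, with layer sizes chosen purely for illustration:

```python
def conv_flops(h, w, k, c):
    """Multiply-accumulates of a standard k x k convolution with c input and
    c output channels on an h x w feature map (bias ignored)."""
    return h * w * k * k * c * c

def pconv_flops(h, w, k, c, r=4):
    """Partial convolution: convolve only c // r channels, identity on the rest.
    r = 4 is the FasterNet default; the value here is illustrative."""
    cp = c // r
    return h * w * k * k * cp * cp

# With r = 4, PConv needs 1/16 of the standard convolution's FLOPs.
ratio = conv_flops(32, 32, 3, 64) / pconv_flops(32, 32, 3, 64)  # -> 16.0
```

In practice a pointwise convolution usually follows the PConv to mix information back across all channels, so the end-to-end saving is smaller than r² but still substantial.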
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Luo, Y.; Du, Y.; Wang, Z.; Mo, J.; Yu, W.; Dou, S. DScanNet: Packaging Defect Detection Algorithm Based on Selective State Space Models. Algorithms 2025, 18, 370. https://doi.org/10.3390/a18060370