SLA-YOLO—Enhancing YOLO for Tiny Defect Detection in Industrial Defect Scenes
Abstract
1. Introduction
- To address the difficulty of modeling local features in high-resolution images that contain tiny defects, we introduce a SAHI-inspired Image Slicing Processing (ISP) strategy that works on image slices during both training and inference, improving the model’s sensitivity to small-scale structures.
- We introduce the Large Receptive-Field Selective Context (LRSC) module before the detection head; it adaptively adjusts the receptive field according to object scale, thereby improving contextual feature modeling for tiny defects.
- We incorporate the THFE module between the backbone and neck networks to enhance high-level feature representations through positional encoding and attention-based global interaction, thereby improving the model’s discriminative ability for tiny defect detection.
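The slicing idea behind ISP can be illustrated with a small sketch. It is not the authors' implementation: the function name `slice_windows`, the 640-pixel tile size, and the 20% overlap are assumptions chosen for illustration, and the source does not state the values actually used. The sketch only computes the overlapping crop windows that would be cut from a high-resolution image, so that a tiny defect occupies a larger share of each network input.

```python
from typing import List, Tuple

Window = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in full-image pixels


def slice_windows(width: int, height: int,
                  tile: int = 640, overlap: float = 0.2) -> List[Window]:
    """Return overlapping crop windows that together cover the whole image.

    `tile` and `overlap` are illustrative defaults, not values from the paper.
    """
    if tile <= 0:
        raise ValueError("tile must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    stride = max(1, int(tile * (1.0 - overlap)))

    def starts(size: int) -> List[int]:
        # An image side that fits in one tile needs a single window.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        # The last window is aligned with the border so no pixels are missed.
        positions.append(size - tile)
        return positions

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height)
            for x in starts(width)]


if __name__ == "__main__":
    for window in slice_windows(1000, 800):
        print(window)
```

Aligning the last window with the image border, rather than padding, keeps every slice at the same size when the image is larger than one tile; padding would be an equally reasonable choice.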
2. Related Work
2.1. Data Augmentation
2.2. Contextual Modeling
2.3. The YOLO Family and Its Variants
3. Methods
3.1. Image Slicing Processing
3.2. THFE in Backbone
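THFE is described in the contributions as enhancing high-level features through positional encoding and attention-based global interaction. The sketch below shows only that general pattern in plain Python and should not be read as the authors' module: the function names, the 1-D sinusoidal encoding over flattened positions, the single attention head with identity projections, and the residual addition are all assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]


def sinusoidal_encoding(num_positions: int, dim: int) -> List[Vector]:
    """Sine/cosine positional encoding, one row per flattened position."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention with identity projections."""
    dim = len(tokens[0])
    out = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * value[d] for w, value in zip(weights, tokens))
                    for d in range(dim)])
    return out


def enhance_high_level(feature_map: List[List[Vector]]) -> List[List[Vector]]:
    """Flatten an H x W x C map, add positions, attend globally, add residual."""
    h, w = len(feature_map), len(feature_map[0])
    c = len(feature_map[0][0])
    tokens = [feature_map[y][x] for y in range(h) for x in range(w)]
    positions = sinusoidal_encoding(h * w, c)
    encoded = [[t + p for t, p in zip(tok, pos)]
               for tok, pos in zip(tokens, positions)]
    attended = self_attention(encoded)
    fused = [[t + a for t, a in zip(tok, att)]
             for tok, att in zip(tokens, attended)]
    return [[fused[y * w + x] for x in range(w)] for y in range(h)]
```

Because every output token is a weighted sum over all positions, each location can draw on context from the whole feature map, which is the "global interaction" the description refers to.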
3.3. Large Receptive-Field Selective Context Module
4. Experiments
1. Can the proposed framework deliver noticeable performance gains in scenarios characterized by extremely small scale, low contrast, and blurred boundaries?
2. Do the core modules (ISP, THFE, and LRSC) function as intended and fully achieve their design objectives?
3. While preserving real-time performance, how well does the model generalize and remain practically feasible across different industrial application scenarios?
4.1. Datasets
4.2. Implementation Details
4.3. Ablation Study
4.3.1. Ablation Study of ISP
4.3.2. Ablation Study of THFE
4.3.3. Ablation Study of LRSC
4.4. Main Results
4.4.1. Baseline Model Selection
4.4.2. Comparative Results on Three Datasets
- (1) The local high-resolution perception mechanism based on image slicing (the ISP strategy) effectively alleviates the information attenuation of tiny object features caused by repeated downsampling operations, thereby enhancing the model’s ability to capture fine-grained local features of tiny defects.
- (2) The semantic-level feature enhancement mechanism (the THFE module) improves the representation capability of discriminative structural patterns in tiny defects, which consequently enhances the accuracy of both object category recognition and spatial localization.
- (3) The dynamic context modeling mechanism (the LRSC module) significantly enhances the stability of detection performance in scenarios characterized by low contrast and uniform backgrounds, thereby improving the robustness of the model in complex environments.
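Because the ISP strategy in item (1) runs the detector on slices, the per-slice predictions have to be brought back into full-image coordinates at inference time. The source does not describe how this merging is done, so the following is a minimal sketch under stated assumptions: boxes are `(x0, y0, x1, y1, score)` tuples in slice coordinates, each slice carries its top-left offset, and duplicates from overlapping slices are removed with a class-agnostic greedy NMS at a hypothetical IoU threshold of 0.5.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float, float]  # x0, y0, x1, y1, score
Offset = Tuple[int, int]                        # top-left corner of a slice


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_slice_detections(per_slice: List[Tuple[Offset, List[Box]]],
                           iou_thr: float = 0.5) -> List[Box]:
    """Shift slice-local boxes to image coordinates and apply greedy NMS."""
    shifted = [(x0 + ox, y0 + oy, x1 + ox, y1 + oy, score)
               for (ox, oy), boxes in per_slice
               for (x0, y0, x1, y1, score) in boxes]
    # Highest-scoring boxes are kept first; lower-scoring overlaps are dropped.
    shifted.sort(key=lambda box: box[4], reverse=True)
    kept: List[Box] = []
    for box in shifted:
        if all(iou(box, other) < iou_thr for other in kept):
            kept.append(box)
    return kept
```

A defect that falls in the overlap of two slices is detected twice; after shifting, both boxes land on the same image region and only the higher-scoring one survives.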
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Cheng, Y.; Cao, Y.; Yao, H.; Luo, W.; Jiang, C.; Zhang, H.; Shen, W. A comprehensive survey for real-world industrial surface defect detection: Challenges, approaches, and prospects. J. Manuf. Syst. 2026, 84, 152–172.
2. Sun, P.; Hua, C.; Ding, W.; Hua, C.; Liu, P.; Lei, Z. Ceramic tableware surface defect detection based on deep learning. Eng. Appl. Artif. Intell. 2025, 141, 109723.
3. Nautiyal, R.; Deshmukh, M. Tiny object detection: An in-depth survey of techniques, challenges, and future directions. Digit. Signal Process. 2026, 174, 105995.
4. Wang, J.; Yang, W.; Guo, H.; Zhang, R.; Xia, G.S. Tiny object detection in aerial images. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; IEEE: New York, NY, USA, 2021; pp. 3791–3798.
5. Deng, R. A Review of the Applications of Machine Vision in Industrial Surface Defect Detection. J. Artif. Intell. Pract. 2025, 8, 144–151.
6. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
7. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
8. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
9. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016, Proceedings, Part I; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37.
10. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
11. Zhu, G.; Qi, H.; Lv, K. DGYOLOv8: An enhanced model for steel surface defect detection based on YOLOv8. Mathematics 2025, 13, 831.
12. Liu, Y.; Fan, G.; Zhang, H.; Xiao, D. Defect detection algorithm of galvanized sheet based on S-C-B-YOLO. Mathematics 2026, 14, 110.
13. Buslaev, A.; Parinov, A.; Khvedchenya, E.; Iglovikov, V.I.; Kalinin, A.A. Albumentations: Fast and Flexible Image Augmentation. Information 2020, 11, 125.
14. Deng, Z.; Sun, H.; Zhou, S.; Zhao, J.; Lei, L.; Zou, H. Multi-scale Object Detection in Remote Sensing Imagery with Convolutional Neural Networks. ISPRS J. Photogramm. Remote Sens. 2018, 145, 3–22.
15. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems 25 (NeurIPS 2012), Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1106–1114.
16. Wan, L.; Zeiler, M.; Zhang, S.; Le Cun, Y.; Fergus, R. Regularization of neural networks using dropconnect. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1058–1066.
17. Su, X.; Chang, L.; Shen, J.; Cheng, Y. Data Augmentation Techniques for Deep Learning-Based Object Detection: A Comprehensive Survey. J. Vis. Commun. Image Represent. 2023, 90, 103724.
18. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond Empirical Risk Minimization. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
19. Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27–28 October 2019; pp. 6023–6032.
20. Kisantal, M. Augmentation for Small Object Detection. arXiv 2019, arXiv:1902.07296.
21. Chen, C.; Zhang, Y.; Lv, Q.; Wei, S.; Wang, X.; Sun, X.; Dong, J. RRNet: A hybrid detector for object detection in drone-captured images. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 100–108.
22. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
23. Chen, Y.; Zhang, P.; Li, Z.L.Y.; Zhang, X.; Meng, G. Feedback-driven data provider for object detection. arXiv 2020, arXiv:2004.12432.
24. Suárez-Ramírez, J.; Santana-Cedrés, D.; Monzón, N. DAHI: A Fast and Efficient Density Aided Hyper Inference Technique for Large Scene Object Detection. Pattern Recognit. 2025, 171, 112228.
25. De Ridder, V.; Dey, B.; Blanco, V.; Halder, S.; Van Waeyenberge, B. Improved Defect Detection and Classification Method for Advanced IC Nodes by Using Slicing Aided Hyper Inference with Refinement Strategy. arXiv 2023, arXiv:2311.11439.
26. Chen, X.; Gupta, A. Spatial memory for context reasoning in object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4086–4096.
27. Chen, C.; Liu, M.Y.; Tuzel, O.; Xiao, J. R-CNN for small object detection. In Proceedings of the Computer Vision–ACCV 2016: 13th Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016, Revised Selected Papers, Part V; Springer International Publishing: Cham, Switzerland, 2017; pp. 214–230.
28. Cai, Z.; Fan, Q.; Feris, R.S.; Vasconcelos, N. A unified multi-scale deep convolutional neural network for fast object detection. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016, Proceedings, Part IV; Springer International Publishing: Cham, Switzerland, 2016; pp. 354–370.
29. Zhu, Y.; Zhao, C.; Wang, J.; Zhao, X.; Wu, Y.; Lu, H. Couplenet: Coupling global structure with local parts for object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4126–4134.
30. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
31. Wang, A.; Peng, T.; Cao, H.; Xu, Y.; Wei, X.; Cui, B. TIA-YOLOv5: An improved YOLOv5 network for real-time detection of crop and weed in the field. Front. Plant Sci. 2022, 13, 1091655.
32. Nazli, N.A.N.M.; Sabri, N.; Aminuddin, R.; Ibrahim, S.; Yusof, S.; Nasir, S.D.N.M. A real-time system for detecting personal protective equipment compliance using deep learning model YOLOv5. Procedia Comput. Sci. 2024, 245, 647–656.
33. Varga, L.A.; Kiefer, B.; Messmer, M.; Zell, A. SeaDronesSee: A Maritime Benchmark for Detecting Humans in Open Water. In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 2260–2270.
34. Lai, H.; Chen, L.; Liu, W.; Yan, Z.; Ye, S. STCYOLO: Small object detection network for traffic signs in complex environments. Sensors 2023, 23, 5307.
35. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Computer Vision–ECCV 2018: 15th European Conference, Munich, Germany, 8–14 September 2018; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2018; pp. 3–19.
36. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event, 3–7 May 2021.
37. Wang, H.; Yang, H.; Chen, H.; Wang, J.; Zhou, X.; Xu, Y. A remote sensing image target detection algorithm based on improved YOLOv8. Appl. Sci. 2024, 14, 1557.
38. Shen, L.; Lang, B.; Song, Z. DS-YOLOv8-based object detection method for remote sensing images. IEEE Access 2023, 11, 125122–125137.
39. Wang, G.; Chen, Y.; An, P.; Hong, H.; Hu, J.; Huang, T. UAV-YOLOv8: A small-object-detection model based on improved YOLOv8 for UAV aerial photography scenarios. Sensors 2023, 23, 7190.
40. Lv, Y.; Dong, G.; Song, X. MicroDETR: DETR with frequency-spatial aware and cross-scale fusion for tiny object detection. Pattern Recognit. 2026, 172, 113747.
41. Zhou, W.; Tang, B.; Cong, R.; Jiang, Q. Turbidity-Similarity Decoupling: Feature-Consistent Mutual Learning for Underwater Salient Object Detection. IEEE Trans. Image Process. 2026, 35, 495–510.
42. Zhou, W.; Zhu, Y.; Lei, J.; Yang, R.; Yu, L. LSNet: Lightweight Spatial Boosting Network for Detecting Salient Objects in RGB-Thermal Images. IEEE Trans. Image Process. 2023, 32, 1329–1340.
43. Zhou, W.; Guo, Q.; Lei, J.; Yu, L.; Hwang, J.N. ECFFNet: Effective and Consistent Feature Fusion Network for RGB-T Salient Object Detection. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 1224–1235.
44. Akyon, F.C.; Altinuc, S.O.; Temizel, A. Slicing aided hyper inference and fine-tuning for small object detection. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; IEEE: New York, NY, USA, 2022; pp. 966–970.
45. Aldubaikhi, A.; Patel, S. Advancements in Small-Object Detection (2023–2025): Approaches, Datasets, Benchmarks, Applications, and Practical Guidance. Appl. Sci. 2025, 15, 11882.
46. Hua, W.; Chen, Q. A survey of small object detection based on deep learning in aerial images. Artif. Intell. Rev. 2025, 58, 162.
47. Gao, Y.; Gao, Q.; Shao, L.; Wang, X.; Liu, L. HFI-Former: High-Frequency Interaction Transformer for Robust Scene Text Detection. Information 2026, 17, 365.
48. Li, Y.; Shen, L. A Frequency Domain-Enhanced Transformer for Nighttime Object Detection. Sensors 2025, 25, 3673.
49. Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in Vision: A Survey. ACM Comput. Surv. 2022, 54, 1–41.
50. Peng, X.; Jiang, H. A Review of Small Object Detection Based on Deep Learning. In Proceedings of the 2025 2nd International Conference on Big Data Analytics and Artificial Intelligence Application (BDAIA ’25), New York, NY, USA, 28–30 November 2025; pp. 89–96.
51. Jamali, M.; Davidsson, P.; Khoshkangini, R.; Ljungqvist, M.G.; Mihailescu, R.C. Context in object detection: A systematic literature review. Artif. Intell. Rev. 2025, 58, 175.
52. Wang, Z.; Chen, Y.; Gu, Y.; Liu, J.; Zhu, X.; He, M. The evolution of object detection from CNNs to transformers and multi-modal fusion. Sci. Rep. 2026, 16, 7517.
53. Li, Y.; Hou, Q.; Zheng, Z.; Cheng, M.M.; Yang, J.; Li, X. Large selective kernel network for remote sensing object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 10–17 October 2023; pp. 16794–16805.
54. Lema, D.G.; Sánchez-González, L.; Usamentiaga, R.; delaCalle, F.J. Benchmarking Deep Learning Models for Surface Defect Detection: A Reproducible and Statistically-Rigorous Approach. J. Intell. Manuf. 2025; in press.
55. Lyu, Y.; Liu, Y.; Zhao, Q.; Hao, Z.; Song, X. SFSIN: A Lightweight Model for Remote Sensing Image Super-Resolution with Strip-like Feature Superpixel Interaction Network. Mathematics 2025, 13, 1720.
56. Varghese, R.; Sambath, M. YOLOv8: A novel object detection algorithm with enhanced performance and robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Singapore, 5–7 July 2024; IEEE: New York, NY, USA, 2024; pp. 1–6.
57. Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning what you want to learn using programmable gradient information. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 1030–1040.
58. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458.
59. Ultralytics. YOLO11 Documentation. 2024. Available online: https://docs.ultralytics.com/models/yolo11/ (accessed on 8 May 2026).
60. Ultralytics. YOLO12 Documentation. 2025. Available online: https://docs.ultralytics.com/ (accessed on 8 May 2026).
61. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
62. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. Detrs beat yolos on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 19–25 June 2024; pp. 16965–16974.
63. Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable transformers for end-to-end object detection. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event, 3–7 May 2021.
64. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 213–229.
65. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636.
66. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969.
67. Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162.






| Dataset | Method | R | 50:95 | | |
|---|---|---|---|---|---|
| CCB | YOLOv8s | 0.358 | 0.330 | 0.104 | 0.157 |
| CCB | YOLOv8s+ISP | 0.591 | 0.657 | 0.371 | 0.388 |
| S-ODv2 | YOLOv8s | 0.611 | 0.683 | 0.434 | 0.419 |
| S-ODv2 | YOLOv8s+ISP | 0.737 | 0.796 | 0.513 | 0.501 |

| Dataset | THFE | LRSC | R | 50:95 | | | Params (M) | FLOPs (G) | FPS |
|---|---|---|---|---|---|---|---|---|---|
| S-ODv2 | | | 0.737 | 0.796 | 0.513 | 0.501 | 11.13 | 28.4 | 263.16 |
| S-ODv2 | ✓ | | 0.745 | 0.800 | 0.526 | 0.510 | 13.22 | 29.3 | 217.39 |
| S-ODv2 | | ✓ | 0.746 | 0.801 | 0.524 | 0.509 | 11.71 | 29.6 | 285.71 |
| S-ODv2 | ✓ | ✓ | 0.747 | 0.804 | 0.541 | 0.516 | 15.13 | 30.4 | 222.22 |
| CH-Glass | | | 0.614 | 0.652 | 0.375 | 0.371 | 11.13 | 28.4 | 98.04 |
| CH-Glass | ✓ | | 0.626 | 0.670 | 0.390 | 0.396 | 13.22 | 29.3 | 77.52 |
| CH-Glass | | ✓ | 0.614 | 0.676 | 0.394 | 0.390 | 11.71 | 29.6 | 142.86 |
| CH-Glass | ✓ | ✓ | 0.618 | 0.685 | 0.407 | 0.398 | 15.13 | 30.4 | 149.25 |
| Mini-COCO | | | 0.524 | 0.565 | 0.418 | 0.377 | 11.13 | 28.4 | 454.55 |
| Mini-COCO | ✓ | | 0.533 | 0.572 | 0.419 | 0.380 | 13.22 | 29.3 | 333.33 |
| Mini-COCO | | ✓ | 0.516 | 0.568 | 0.419 | 0.379 | 11.71 | 29.6 | 434.78 |
| Mini-COCO | ✓ | ✓ | 0.532 | 0.576 | 0.424 | 0.383 | 15.13 | 30.4 | 416.67 |

| Method | R | 50:95 | | | Params (M) | FLOPs (G) | FPS |
|---|---|---|---|---|---|---|---|
| YOLOv5 [22] | 0.574 | 0.650 | 0.376 | 0.370 | 7.2 | 16.5 | 170.12 |
| YOLOv8s [56] | 0.614 | 0.652 | 0.375 | 0.371 | 8.7 | 28.6 | 98.04 |
| YOLOv9 [57] | 0.590 | 0.627 | 0.361 | 0.357 | 8.6 | 27.9 | 100.45 |
| YOLOv10 [58] | 0.616 | 0.669 | 0.355 | 0.366 | 8.9 | 29.4 | 95.30 |
| YOLOv11 [59] | 0.480 | 0.503 | 0.234 | 0.269 | 9.5 | 31.2 | 90.15 |
| YOLOv12 [60] | 0.615 | 0.653 | 0.386 | 0.382 | 9.0 | 30.0 | 94.20 |
| YOLOX [61] | 0.331 | 0.555 | 0.179 | 0.245 | 9.0 | 26.8 | 104.50 |
| RT-DETR [62] | 0.392 | 0.417 | 0.240 | 0.237 | 42.0 | 80.0 | 35.12 |
| Deformable DETR [63] | 0.475 | 0.505 | 0.291 | 0.287 | 40.0 | 78.0 | 22.40 |
| DETR [64] | 0.460 | 0.595 | 0.253 | 0.294 | 41.0 | 86.0 | 18.50 |
| FCOS [65] | 0.314 | 0.508 | 0.162 | 0.235 | 32.0 | 54.0 | 55.60 |
| Mask R-CNN [66] | 0.362 | 0.542 | 0.366 | 0.318 | 44.0 | 178.0 | 12.40 |
| Cascade R-CNN [67] | 0.392 | 0.536 | 0.364 | 0.335 | 68.0 | 240.0 | 8.25 |
| SLA-YOLO (ours) | 0.618 | 0.685 | 0.407 | 0.398 | 15.13 | 30.4 | 149.25 |

| Method | R | 50:95 | | | Params (M) | FLOPs (G) | FPS |
|---|---|---|---|---|---|---|---|
| YOLOv5 [22] | 0.629 | 0.685 | 0.400 | 0.400 | 9.1 | 23.9 | 284.15 |
| YOLOv8s [56] | 0.737 | 0.796 | 0.513 | 0.501 | 11.1 | 28.7 | 263.16 |
| YOLOv9 [57] | 0.708 | 0.765 | 0.487 | 0.469 | 5.16 | 13.41 | 342.10 |
| YOLOv10 [58] | 0.748 | 0.795 | 0.498 | 0.491 | 8.0 | 24.5 | 308.20 |
| YOLOv11 [59] | 0.592 | 0.648 | 0.394 | 0.389 | 9.4 | 21.3 | 325.45 |
| YOLOv12 [60] | 0.738 | 0.798 | 0.514 | 0.501 | 6.72 | 15.47 | 331.18 |
| YOLOX [61] | 0.482 | 0.796 | 0.379 | 0.419 | 5.0 | 17.1 | 255.40 |
| RT-DETR [62] | 0.472 | 0.509 | 0.265 | 0.280 | 42.8 | 130.5 | 78.50 |
| Deformable DETR [63] | 0.567 | 0.616 | 0.349 | 0.359 | 39.8 | 173.0 | 45.12 |
| DETR [64] | 0.522 | 0.717 | 0.379 | 0.382 | 41.6 | 85.8 | 32.40 |
| Mask R-CNN [66] | 0.444 | 0.600 | 0.420 | 0.390 | 44.0 | 178.0 | 21.50 |
| Cascade R-CNN [67] | 0.464 | 0.641 | 0.429 | 0.406 | 69.3 | 243.0 | 14.20 |
| SLA-YOLO (ours) | 0.747 | 0.804 | 0.541 | 0.516 | 15.13 | 30.4 | 222.22 |

| Method | R | 50:95 | | | Params (M) | FLOPs (G) | FPS |
|---|---|---|---|---|---|---|---|
| YOLOv5 [22] | 0.528 | 0.561 | 0.378 | 0.351 | 9.1 | 23.9 | 510.20 |
| YOLOv8s [56] | 0.524 | 0.565 | 0.418 | 0.377 | 11.1 | 28.7 | 454.55 |
| YOLOv9 [57] | 0.504 | 0.543 | 0.402 | 0.362 | 5.16 | 13.41 | 590.35 |
| YOLOv10 [58] | 0.509 | 0.546 | 0.395 | 0.361 | 8.0 | 24.5 | 530.15 |
| YOLOv11 [59] | 0.519 | 0.550 | 0.404 | 0.367 | 9.4 | 21.3 | 545.60 |
| YOLOv12 [60] | 0.525 | 0.566 | 0.419 | 0.378 | 6.72 | 15.47 | 575.40 |
| YOLOX [61] | 0.510 | 0.560 | 0.389 | 0.358 | 5.0 | 17.1 | 550.25 |
| RT-DETR [62] | 0.335 | 0.361 | 0.267 | 0.241 | 42.8 | 130.5 | 150.35 |
| Deformable DETR [63] | 0.406 | 0.437 | 0.324 | 0.292 | 39.8 | 173.0 | 110.45 |
| DETR [64] | 0.403 | 0.459 | 0.170 | 0.214 | 41.6 | 85.8 | 100.20 |
| Mask R-CNN [66] | 0.485 | 0.563 | 0.384 | 0.352 | 44.0 | 178.0 | 50.15 |
| Cascade R-CNN [67] | 0.459 | 0.559 | 0.374 | 0.341 | 69.3 | 243.0 | 35.50 |
| SLA-YOLO (ours) | 0.532 | 0.576 | 0.424 | 0.383 | 15.13 | 30.4 | 416.67 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Lyu, Y.; Wang, X.; Jin, C.; Wei, Y.; Sun, Z. SLA-YOLO—Enhancing YOLO for Tiny Defect Detection in Industrial Defect Scenes. Mathematics 2026, 14, 1973. https://doi.org/10.3390/math14111973

