Foreign Object Detection Model for Retail Cabinets Under Complex Backgrounds
Abstract
1. Introduction
2. Related Work
2.1. Deep Learning-Based Foreign Object Detection
2.2. Low-Light Image Enhancement
- (1) A lightweight improved YOLOv11n-FOD (foreign object detection) model is proposed for retail cabinets, balancing high detection accuracy against real-time inference efficiency.
- (2) A dynamic up-and-down sampling module integrated with the CBAM is introduced, enhancing multi-scale feature extraction and improving detection robustness under complex background interference.
- (3) By incorporating low-light image enhancement techniques, the proposed framework achieves more robust detection than state-of-the-art detectors in dimly illuminated retail scenarios.
3. Materials and Methods
3.1. Overall Algorithm Framework
3.2. C3K2-SAC
- Parameter Analysis:
3.3. CBAM
3.4. CARAFE
4. Results and Analysis
4.1. Experimental Environment Setup
4.2. Object Detection Evaluation Metrics
- AP (average precision) is the area under the precision–recall (P-R) curve and reflects the average precision of the model across recall levels. Let the total number of samples be $n$, and let $k$ index the samples detected from them. The recall at the $k$-th detection is denoted $r_k$ (with $r_0 = 0$), and $p_{\mathrm{interp}}(r_k)$ represents the maximum precision for recall greater than or equal to $r_k$. AP is then calculated as

  $$AP = \sum_{k=1}^{n} \left(r_k - r_{k-1}\right)\, p_{\mathrm{interp}}(r_k), \qquad p_{\mathrm{interp}}(r_k) = \max_{\tilde{r} \ge r_k} p(\tilde{r})$$

- Recall denotes the proportion of actual positive samples that are correctly detected:

  $$Recall = \frac{TP}{TP + FN}$$

  In the formula, TP represents the number of true positives, and FN represents the number of false negatives.
- Precision denotes the proportion of correctly detected positive samples among all predicted positive results:

  $$Precision = \frac{TP}{TP + FP}$$

  In the formula, FP represents the number of negative samples incorrectly detected as positive by the model.
- mAP is the mean of the average precision across all classes and is defined as

  $$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$$

  In the formula, $N$ represents the total number of classes, and $AP_i$ denotes the average precision of the $i$-th class.
- Inference time is adopted as a key metric of the model's inference capability. It is defined as the sum of three components: preprocessing, model inference, and postprocessing. This composite indicator reflects the real-time responsiveness of the entire object detection framework in practical deployment scenarios.
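The metric definitions above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the evaluation code used in the paper: the function names are hypothetical, AP is computed with all-point interpolation (detection toolkits often use a fixed-point variant instead), and matching detections to ground truth (the `is_tp` flags) is assumed to have been done beforehand.

```python
from typing import Sequence


def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP); returns 0.0 when nothing was predicted."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN); returns 0.0 when there are no positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def average_precision(scores: Sequence[float],
                      is_tp: Sequence[bool],
                      num_gt: int) -> float:
    """Area under the P-R curve for one class (all-point interpolation).

    scores : confidence of each detection
    is_tp  : whether each detection matched a ground-truth object
    num_gt : number of ground-truth objects of this class
    """
    if num_gt == 0 or not scores:
        return 0.0
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at r_k becomes the max precision at recall >= r_k.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum the rectangles (r_k - r_{k-1}) * p_interp(r_k), with r_0 = 0.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


def mean_average_precision(aps: Sequence[float]) -> float:
    """mAP = mean of the per-class AP values."""
    return sum(aps) / len(aps) if aps else 0.0


def total_inference_time_ms(pre_ms: float, infer_ms: float, post_ms: float) -> float:
    """Inference time as the sum of preprocessing, inference, and postprocessing."""
    return pre_ms + infer_ms + post_ms
```

For instance, four detections ranked by confidence with match flags `[True, False, True, True]` against three ground-truth objects give `average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, True], 3)` of about 0.833.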
4.3. Experimental Dataset
4.4. Heatmap Visualization Experiment
4.5. Occlusion Comparison Experiment
- Python: 3.10.14;
- Ultralytics framework: 8.3.23;
- PyTorch: 2.5.1;
- Hardware: NVIDIA RTX A6000 GPU with 48670 MiB video memory.
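With this software stack, a training run using the hyperparameters reported in the configuration table (SGD, 100 epochs, batch size 16, learning rate 0.001) might be launched roughly as follows. This is a sketch under assumptions, not the authors' script: `yolo11n.pt` is the stock Ultralytics YOLO11n checkpoint rather than the YOLOv11n-FOD architecture, and the dataset file name in the usage note is hypothetical.

```python
# Hyperparameters taken from the paper's configuration table.
TRAIN_ARGS = {
    "epochs": 100,       # Epochs
    "batch": 16,         # Batch size
    "lr0": 0.001,        # Initial learning rate
    "optimizer": "SGD",  # Optimizer
}


def train(data_yaml: str, model_cfg: str = "yolo11n.pt"):
    """Train a detector with the reported settings.

    data_yaml : path to an Ultralytics dataset description file
    model_cfg : checkpoint or model YAML; the default is the stock
                YOLO11n weights, NOT the paper's modified model.
    """
    # Imported lazily so this module can be inspected without Ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(model_cfg)
    return model.train(data=data_yaml, **TRAIN_ARGS)
```

A call such as `train("retail_fod.yaml")` would start training, assuming a dataset file of that (hypothetical) name exists. Reproducing the proposed model would additionally require a model YAML that defines the C3K2-SAC, CBAM, and CARAFE modules.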
4.6. Comparison Under Diverse Brightness Levels
4.7. Ablation Experiment
4.8. Attention Comparison Experiment
4.9. Training Loss Curve Comparison
4.10. Generalization Experiment
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- [1] Wang, Z.; Zhang, S.; Chen, Y.; Xia, Y.; Wang, H.; Jin, R.; Wang, C.; Fan, Z.; Wang, Y.; Wang, B. Detection of small foreign objects in Pu-erh sun-dried green tea: An enhanced YOLOv8 neural network model based on deep learning. Food Control 2025, 168, 110890.
- [2] Khan, Z.; Yoon, S.C.; Bhandarkar, S.M. Deep learning model compression and hardware acceleration for high-performance foreign material detection on poultry meat using NIR hyperspectral imaging. Sensors 2025, 25, 970.
- [3] Li, Q.; Zeng, R.; Wang, G.; Yang, T. Conveyor belt foreign object detection method based on improved YOLOv11 and ESRGAN. Sci. Rep. 2026, online ahead of print.
- [4] Chen, Z.; Yang, J.; Li, F.; Feng, Z.; Chen, L.; Jia, L.; Li, P. Foreign object detection method for railway catenary based on a scarce image generation model and lightweight perception architecture. IEEE Trans. Circuits Syst. Video Technol. 2025, 36, 1377–1391.
- [5] Gu, W.; Gao, W.; Zou, Y.; Ma, S. ATW-YOLO: Reconstructing the downsampling process and attention mechanism of YOLO network for rail foreign body detection. Signal Image Video Process. 2025, 19, 368.
- [6] Bin, F.; He, J.; Qiu, K.; Hu, L.; Zheng, Z.; Sun, Q. CI-YOLO: A lightweight foreign object detection model for inspecting transmission line. Measurement 2025, 242, 116193.
- [7] Dong, Z.; Yang, Q.; Chen, H.L.; Zhou, H.; Gao, D. A lightweight transformer-based framework for real-time foreign object detection in complex railway environments. J. Real-Time Image Process. 2026, 23, 3.
- [8] Mushtaq, Y.; Ali, W.; Ghani, U.; Khan, R.U.; Adak, A.K. Advancing aviation safety and sustainable infrastructure: High-accuracy detection and classification of foreign object debris using deep learning models. Int. J. Sustain. Dev. Goals 2025, 1, 82–98.
- [9] Luo, Z.; Fu, Z.; Huang, Z.; Fu, W.; Zhu, Z.; Chen, X. A day-night cross-modal network for robust commodity recognition under low-light illumination. Eng. Appl. Artif. Intell. 2026, 164, 113164.
- [10] Hou, P.; Huang, S. BCSM-YOLO: An improved product package recognition algorithm for unmanned retail stores based on YOLOv11. IEEE Access 2025, 13, 139665–139679.
- [11] Patel, S. Multi-modal product recognition in retail environments: Enhancing accuracy through integrated vision and OCR approaches. World J. Adv. Res. Rev. 2025, 25, 1837–1844.
- [12] Chadha, S. Vision-based object recognition in retail. Int. J. Sci. Res. Eng. Trends 2025, 11, 1–8.
- [13] Agranata, I.Y.B.; Hamami, F.; Suakanto, S. Detection of customer interaction and density patterns using pose estimation and object detection for retail store layout optimization. In Proceedings of the 2025 International Seminar on Intelligent Technology and Its Applications (ISITIA); IEEE: Piscataway, NJ, USA, 2025; pp. 230–235.
- [14] Yan, Q.; Feng, Y.; Zhang, C.; Pang, G.; Shi, K.; Wu, P.; Dong, W.; Sun, J.; Zhang, Y. HVI: A new color space for low-light image enhancement. In Proceedings of the Computer Vision and Pattern Recognition Conference; IEEE: Piscataway, NJ, USA, 2025; pp. 5678–5687.
- [15] Xu, L.; Hu, C.; Hu, Y.; Jing, X.; Cai, Z.; Lu, X. UPT-Flow: Multi-scale transformer-guided normalizing flow for low-light image enhancement. Pattern Recognit. 2025, 158, 111076.
- [16] Feijoo, D.; Benito, J.C.; Garcia, A.; Conde, M.V. DarkIR: Robust low-light image restoration. In Proceedings of the Computer Vision and Pattern Recognition Conference; IEEE: Piscataway, NJ, USA, 2025; pp. 10879–10889.
- [17] Ciubotariu, G.; Rehman, A.; Dharejo, F.A.; Naqvi, R.A.; Conde, M.V.; Timofte, R.; Jin, Z.; Wu, H.; Zhang, W.; Ye, C.; et al. Low light image enhancement challenge at NTIRE 2026. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2026.
- [18] Zhao, Q.; Li, G.; He, B.; Shen, R. Deep learning for low-light vision: A comprehensive survey. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 15685–15705.
- [19] Liu, F.; Fan, L. A review of advancements in low-light image enhancement using deep learning. Neurocomputing 2025, 652, 131052.
- [20] Wang, J.; Chen, K.; Xu, R.; Liu, Z.; Loy, C.C.; Lin, D. CARAFE: Content-aware reassembly of features. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2019; pp. 3007–3016.
- [21] Tian, J.; Lee, S.; Kang, K. Faster R-CNN in healthcare and disease detection: A comprehensive review. In Proceedings of the 2025 International Conference on Electronics, Information, and Communication (ICEIC); IEEE: Piscataway, NJ, USA, 2025; pp. 1–6.
- [22] Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint triplets for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2019; pp. 6569–6578.
- [23] Ale, L.; Zhang, N.; Li, L. Road damage detection using RetinaNet. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data); IEEE: Piscataway, NJ, USA, 2018; pp. 5197–5200.
- [24] Zhang, H.; Li, F.; Liu, S.; Zhang, L.; Su, H.; Zhu, J.; Ni, L.M.; Shum, H.Y. DINO: DETR with improved denoising anchor boxes for end-to-end object detection. arXiv 2022, arXiv:2203.03605.
- [25] Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159.
- [26] Sapkota, R.; Cheppally, R.H.; Sharda, A.; Karkee, M. RF-DETR object detection vs YOLOv12: A study of transformer-based and CNN-based architectures for single-class and multi-class greenfruit detection in complex orchard environments under label ambiguity. arXiv 2025, arXiv:2504.13099.
- [27] Sapkota, R.; Karkee, M. Ultralytics YOLO evolution: An overview of YOLO26, YOLO11, YOLOv8 and YOLOv5 object detectors for computer vision and pattern recognition. arXiv 2025, arXiv:2510.09653.
- [28] Afifah, V.; Erniwati, S. YOLOv8 for object detection: A comprehensive review of advances, techniques, and applications. IJACI Int. J. Adv. Comput. Inform. 2026, 2, 53–61.
- [29] Ghahremani, A.; Adams, S.D.; Norton, M.; Khoo, S.Y.; Kouzani, A.Z. Detecting defects in solar panels using the YOLO v10 and v11 algorithms. Electronics 2025, 14, 344.
- [30] Tian, Y.; Ye, Q.; Doermann, D. YOLOv12: Attention-centric real-time object detectors. Adv. Neural Inf. Process. Syst. 2026, 38, 78433–78457.
- [31] Liu, Z.; Wang, J.; Wu, H.; Xue, F.; Qin, Z.; Sun, S.; Guo, X.; Zhao, F. Water-aware real-time detection of floating plastic debris via an enhanced YOLOv13 framework for aquatic pollution monitoring. Expert Syst. Appl. 2026, 313, 131552.
- [32] Hidayatullah, P.; Tubagus, R. YOLO26: A comprehensive architecture overview and key improvements. arXiv 2026, arXiv:2602.14582.
- [33] Goldman, E.; Herzig, R.; Eisenschtat, A.; Goldberger, J.; Hassner, T. Precise detection in densely packed scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2019; pp. 5227–5236.

| Name | Configuration |
|---|---|
| CPU | Intel(R) Core(TM) i7-14700K 3.40 GHz |
| Graphics card | NVIDIA GeForce RTX 4080 (16 GB) / NVIDIA RTX A6000 (48 GB) |
| Deep learning environment | CUDA v12.4 + cuDNN v8.9.8 |
| PyTorch | 2.5.1 |
| Operating system | Windows 11 Professional Edition |
| Camera | HF899_27mm |
| Development software | PyCharm 2025.3 EAP |
| Optimizer | SGD |
| Epochs | 100 |
| Batch size | 16 |
| Learning rate | 0.001 |
| Augment | True |

| Model | GFLOPs | Recall | mAP@50 | mAP@50-95 | Inference Time (ms) |
|---|---|---|---|---|---|
| Faster R-CNN [21] | 20.9 | 0.899 | 0.724 | 0.329 | 7.8 |
| CenterNet [22] | 15.32 | 0.902 | 0.714 | 0.330 | 4.3 |
| RetinaNet [23] | 48.64 | 0.897 | 0.710 | 0.328 | 6.3 |
| DETR [24] | 156.77 | 0.905 | 0.718 | 0.312 | 9.5 |
| Deformable DETR [25] | 89.30 | 0.908 | 0.733 | 0.315 | 7.2 |
| Rf-DETR [26] | 14.43 | 0.910 | 0.725 | 0.312 | 6.1 |
| YOLOv5n [27] | 4.50 | 0.885 | 0.880 | 0.420 | 2.6 |
| YOLOv8n [28] | 7.40 | 0.891 | 0.891 | 0.483 | 3.1 |
| YOLOv10n [29] | 7.80 | 0.749 | 0.749 | 0.430 | 2.7 |
| YOLOv11n [3] | 6.50 | 0.911 | 0.956 | 0.467 | 2.6 |
| YOLOv12 [30] | 6.30 | 0.907 | 0.954 | 0.456 | 3.8 |
| YOLOv13 [31] | 6.40 | 0.913 | 0.953 | 0.480 | 4.8 |
| YOLOv26 [32] | 5.80 | 0.788 | 0.855 | 0.432 | 2.1 |

| Model | GFLOPs | mAP50 | mAP50-95 | Recall | Inference Time (ms) |
|---|---|---|---|---|---|
| Faster R-CNN [21] | 88.32 | 0.975 | 0.765 | 0.953 | 7.6 |
| CenterNet [22] | 15.32 | 0.976 | 0.764 | 0.896 | 3.4 |
| RetinaNet [23] | 48.64 | 0.973 | 0.763 | 0.953 | 6.7 |
| DETR [24] | 156.78 | 0.977 | 0.761 | 0.942 | 8.8 |
| Deformable DETR [25] | 8.92 | 0.975 | 0.763 | 0.938 | 8.3 |
| Rf-DETR [26] | 14.43 | 0.983 | 0.788 | 0.955 | 5.6 |
| YOLOv5n [27] | 4.50 | 0.977 | 0.772 | 0.948 | 2.6 |
| YOLOv8n [28] | 8.70 | 0.981 | 0.754 | 0.958 | 2.7 |
| YOLOv10n [29] | 8.70 | 0.963 | 0.772 | 0.964 | 2.5 |
| YOLOv11n [3] | 6.50 | 0.984 | 0.768 | 0.992 | 2.3 |
| YOLOv12 [30] | 6.30 | 0.973 | 0.763 | 0.986 | 2.7 |
| YOLOv13 [31] | 6.40 | 0.984 | 0.774 | 0.967 | 2.9 |
| YOLOv26 [32] | 5.80 | 0.983 | 0.771 | 0.961 | 1.3 |
| YOLOv11n-FOD | 6.60 | 0.992 | 0.776 | 0.995 | 2.4 |

| Index | C3K2-SAC | CARAFE | CBAM + Conv | GFLOPs | mAP50 | mAP50-95 | Recall | Inference Time (ms) |
|---|---|---|---|---|---|---|---|---|
| 0 | × | × | × | 6.5 | 0.984 | 0.768 | 0.966 | 2.1 |
| 1 | ✓ | × | × | 6.8 | 0.986 | 0.742 | 0.962 | 2.7 |
| 2 | × | ✓ | × | 5.8 | 0.984 | 0.753 | 0.968 | 2.6 |
| 3 | × | × | ✓ | 6.6 | 0.985 | 0.771 | 0.973 | 2.7 |
| 4 | ✓ | ✓ | × | 6.3 | 0.988 | 0.778 | 0.970 | 1.9 |
| 5 | ✓ | × | ✓ | 7.1 | 0.983 | 0.779 | 0.980 | 2.1 |
| 6 | × | ✓ | ✓ | 7.1 | 0.983 | 0.779 | 0.980 | 2.8 |
| 7 | ✓ | ✓ | ✓ | 6.8 | 0.988 | 0.778 | 0.976 | 2.8 |

| Attention Type | GFLOPs | mAP50 | mAP50-95 | Recall | Inference Time (ms) |
|---|---|---|---|---|---|
| CBAM | 6.6 | 0.988 | 0.788 | 0.976 | 2.3 |
| CA | 6.6 | 0.987 | 0.746 | 0.983 | 2.6 |
| ECA | 6.6 | 0.985 | 0.782 | 0.960 | 2.7 |
| SENet | 6.6 | 0.972 | 0.782 | 0.965 | 2.8 |
| EMA | 6.6 | 0.959 | 0.786 | 0.959 | 3.8 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhou, Z.; Xie, K.; Zhang, W.; He, J. Foreign Object Detection Model for Retail Cabinets Under Complex Backgrounds. Electronics 2026, 15, 2920. https://doi.org/10.3390/electronics15132920

