Unordered Stacked Pillbox Detection Algorithm Based on Improved YOLOv8
Abstract
1. Introduction
- (i) An improved YOLOv8 detection algorithm is proposed by integrating SPD-Conv and BiFormer. These components improve the model's ability to detect small objects while maintaining high computational efficiency.
- (ii) To address boundary discontinuities and periodic anomalies in angle prediction for rotated targets, the Circular Smooth Label (CSL) technique is incorporated. By transforming discrete angle classification into a continuous cyclic distribution, the proposed method significantly improves the accuracy of rotation-angle prediction for small objects.
- (iii) The proposed algorithm achieves high-speed, high-precision detection of unordered stacked pillboxes. Validation on a substantial dataset of real-world images collected from production environments confirms the practical applicability of the proposed algorithm.
2. Materials and Methods
2.1. Pillbox Detection Based on YOLOv8
2.2. Small-Target Detection in Pillbox Detection
2.3. Rotating Target Detection in Pillbox Detection
2.4. Overall Architecture
2.5. SPD-Conv for Feature Extraction
- (i) Spatial attention: The embedded attention mechanism enables SPD-Conv to focus more precisely on the spatial distribution of visual cues. This facilitates the identification of subtle structural features critical for the detection of small targets.
- (ii) Parameter efficiency: In contrast to standard convolution operations, SPD-Conv requires fewer parameters while maintaining or even enhancing performance. This significantly reduces the complexity of the model and the memory footprint.
- (iii) Computational efficiency: By reorganizing feature maps and leveraging efficient convolutional operations, SPD-Conv achieves improved perceptual capability under similar computational constraints, thereby enhancing overall model throughput.
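The feature-map reorganization mentioned in item (iii) can be illustrated with a minimal NumPy sketch of a space-to-depth rearrangement. It assumes a channels-first layout and a scale factor of 2; the function name `space_to_depth` is hypothetical, and the non-strided convolution that follows this step in SPD-Conv is omitted.

```python
import numpy as np

def space_to_depth(x: np.ndarray, scale: int = 2) -> np.ndarray:
    """Rearrange a (C, H, W) map into (C * scale**2, H // scale, W // scale).

    No pixel is discarded: spatial resolution is traded for channels.
    """
    c, h, w = x.shape
    if h % scale or w % scale:
        raise ValueError("H and W must be divisible by scale")
    # Each (i, j) offset selects one sub-sampled grid; stack all grids on the channel axis.
    slices = [x[:, i::scale, j::scale] for i in range(scale) for j in range(scale)]
    return np.concatenate(slices, axis=0)

# Toy input (assumption): 2 channels, 4x4 resolution.
feature_map = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
downsampled = space_to_depth(feature_map, scale=2)
print(downsampled.shape)
```

Unlike a strided convolution or pooling layer, this step halves the resolution without dropping any activation, which is the property usually cited as helpful for small targets.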
2.6. Vision Transformer with Bi-Level Routing Attention
2.7. Rotation Angle Prediction Based on CSL
- 1. In the regression-based approach, two distinct boundary cases exist (vertical and horizontal), leading to ambiguity when the predicted angle lies near these transitions. In contrast, the regression method limits this issue to the vertical boundary. However, edge swappability remains a source of instability, particularly for near-boundary angle predictions.
- 2. Conventional classification loss functions (for example, cross-entropy) are inherently agnostic to the angular distance between the predicted label and the true label. For example, two incorrect predictions incur the same loss even when one of them is significantly closer to the ground truth in a rotational sense. This uniform treatment of all incorrect classes leads to suboptimal learning for angle prediction tasks.
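The second limitation can be made concrete with a minimal sketch of a circular smooth label. It assumes 180 one-degree angle bins and a Gaussian window of radius 6; both values and the helper names `circular_distance` and `csl_encode` are illustrative assumptions, not settings taken from the paper.

```python
import math

def circular_distance(a: int, b: int, num_bins: int = 180) -> int:
    """Shortest distance between two angle bins on a ring of num_bins bins."""
    d = abs(a - b) % num_bins
    return min(d, num_bins - d)

def csl_encode(angle_bin: int, num_bins: int = 180, radius: int = 6) -> list:
    """Gaussian-windowed soft label that wraps around the angular boundary."""
    label = []
    for i in range(num_bins):
        d = circular_distance(i, angle_bin, num_bins)
        # Bins inside the window get a Gaussian weight; all others get zero.
        label.append(math.exp(-d * d / (2 * radius * radius)) if d <= radius else 0.0)
    return label

gt_bin = 179                 # ground truth next to the angular boundary
label = csl_encode(gt_bin)
near, far = 0, 90            # bin 0 is one bin away from 179 across the boundary
print(label[near] > label[far])
```

With a one-hot target, bins 0 and 90 would both carry zero target weight. With the smoothed label, the neighbouring bin across the boundary keeps a weight close to 1, so a near miss is penalized less than a distant one.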
2.8. Theoretical Complexity Summary
2.9. Method Stability and Theoretical Justification
- (1) SPD-Conv: spatial unfolding and information retention. SPD-Conv reorganizes local spatial neighborhoods into expanded channel representations without changing the receptive-field topology. This operation increases the effective rank of the local covariance of feature activations, thereby enriching descriptive capacity for small-scale patterns. Because it does not depend on specific residual-block formulations or kernel sizes, SPD-Conv functions as a plug-in operator compatible with a wide range of convolutional backbones. Its performance depends mainly on the statistical properties of local image patches, not on YOLO-specific structural details, which confers robustness to future updates [27].
- (2) BiFormer: hierarchical sparse routing and signal isolation. BiFormer introduces a coarse-to-fine attention routing mechanism that adaptively filters irrelevant spatial tokens. Because computation is limited to the top-k informative regions, the effective signal-to-noise ratio of feature aggregation increases. Because this routing process relies only on the existence of multi-scale feature maps, a standard property across YOLO versions, it remains stable even if the backbone architecture evolves [28].
- (3) CSL: continuity in orientation representation. Circular Smooth Label (CSL) transforms angular regression into a smoothed classification space that preserves topological adjacency between labels. This change concerns the label representation rather than the model internals, making it independent of any particular detection head configuration. Consequently, CSL ensures continuity in output prediction and contributes to general stability across models.
- (4) Integrated stability perspective. The three modules act at distinct yet complementary levels: local encoding, attention routing, and label representation. None of them alters the global detection objective or training regime, which minimizes coupling to a specific YOLO implementation. Therefore, even as YOLO continues to evolve, the proposed modules are expected to provide consistent benefits provided that (i) high-resolution feature maps remain available for SPD-Conv, (ii) multi-scale feature fusion persists for BiFormer, and (iii) classification-style output heads exist for CSL. This modular design mitigates the “inherent instability” concern raised by the reviewer and explains, at a theoretical level, why the method’s efficacy should extend across future iterations of the YOLO framework.
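The coarse-to-fine routing described in item (2) can be sketched in NumPy as follows. This is a simplified illustration under several assumptions: a single head, no learned projections, regions formed from contiguous chunks of a 1D token sequence instead of 2D windows, and a hypothetical function name `bi_level_routing_attention`. It is not the BiFormer implementation.

```python
import numpy as np

def bi_level_routing_attention(q, k, v, num_regions, top_k):
    """q, k, v: (N, d) token arrays; N must be divisible by num_regions."""
    n, d = q.shape
    per_region = n // num_regions
    # Partition tokens into regions: (R, T, d), T = tokens per region.
    qr = q.reshape(num_regions, per_region, d)
    kr = k.reshape(num_regions, per_region, d)
    vr = v.reshape(num_regions, per_region, d)
    # Coarse level: region-to-region affinity from mean-pooled queries and keys.
    affinity = qr.mean(axis=1) @ kr.mean(axis=1).T        # (R, R)
    routed = np.argsort(-affinity, axis=1)[:, :top_k]     # (R, top_k) region indices
    # Fine level: each region attends only to tokens of its routed regions.
    k_gather = kr[routed].reshape(num_regions, top_k * per_region, d)
    v_gather = vr[routed].reshape(num_regions, top_k * per_region, d)
    scores = qr @ k_gather.transpose(0, 2, 1) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    out = weights @ v_gather                              # (R, T, d)
    return out.reshape(n, d), routed

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))   # toy input: 16 tokens, 8 channels
sparse_out, routed = bi_level_routing_attention(tokens, tokens, tokens, num_regions=4, top_k=2)
print(sparse_out.shape, routed.shape)
```

Each query token here scores only `top_k * per_region` keys instead of all `N`, which is where the saving in attention cost comes from. Setting `top_k` equal to `num_regions` recovers ordinary dense attention.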
3. Results
3.1. Setup
3.2. Ablation and Sensitivity Studies
- Learning rate scaling factor (0.5×–1.5×);
- Attention sparsity threshold in BiFormer (k = 6–12);
- CSL smoothness factor (1–5).
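One way to organize such a sensitivity study is a small sweep grid, sketched below. Only the ranges come from the list above. The sampled values inside each range, the full-factorial design, and the key names `lr_scale`, `top_k`, and `csl_smoothness` are assumptions made for illustration.

```python
from itertools import product

# Sampled points inside the stated ranges (the step sizes are assumptions).
lr_scales = [0.5, 1.0, 1.5]      # learning rate scaling factor, 0.5x to 1.5x
top_k_values = [6, 8, 10, 12]    # BiFormer attention sparsity threshold, k = 6 to 12
csl_smoothness = [1, 3, 5]       # CSL smoothness factor, 1 to 5

# Full-factorial grid: one configuration per combination.
sweep = [
    {"lr_scale": s, "top_k": k, "csl_smoothness": r}
    for s, k, r in product(lr_scales, top_k_values, csl_smoothness)
]
print(len(sweep))
```

A one-factor-at-a-time design, which varies a single hyperparameter while holding the others at their defaults, would need far fewer runs and may be closer to what a sensitivity study typically uses.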
3.3. Comparison with Advanced Methods
3.4. Per-Class Evaluation and Precision–Recall Analysis
4. Discussion
5. Conclusions and Recommendations for Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Nomenclature
| Abbreviation | Definition |
|---|---|
| SPD-Conv | Space-to-Depth Convolution |
| BiFormer | Bi-Level Routing Attention Transformer |
| CSL | Circular Smooth Label |
| CNN | Convolutional Neural Network |
| mAP | Mean Average Precision |
| FLOPs | Floating Point Operations |
| GFLOPs | Giga Floating Point Operations |
| PP-HGNetv2 | PaddlePaddle High-Granularity Network v2 |
References
- Jiang, W.; Chen, Z.; Zhang, H.; Li, J. MelSPPNET—A self-explainable recognition model for emerald ash borer vibrational signals. Front. For. Glob. Change 2024, 7, 1239424.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014.
- Wang, R.; Jing, Y.; Gu, C.; He, S.; Chen, J. End-to-end multi-target flexible job shop scheduling with deep reinforcement learning. IEEE Internet of Things J. 2024, 12, 4420–4434.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Kijdech, D.; Vongbunyong, S. Manipulation of a Complex Object Using Dual-Arm Robot with Mask R-CNN and Grasping Strategy. J. Intell. Robot. Syst. 2024, 110, 103.
- Liu, L.; Xie, L.; Zhang, P.; Wang, Y.; Tian, H. Method of Dead Standing Tree Detection Based on RetinaNet Object Detection Network. Int. J. New Dev. Eng. Soc. 2024, 8, 110–114.
- Zhou, R.; Peng, H.; Liu, S. Research on Employee Abnormal Behavior Detection Algorithm Based on Improved SSD. Adv. Comput. Signals Syst. 2024, 8, 83–87.
- Hua, L.; Wu, X.; Gu, J. Optimization of intelligent guided vehicle vision navigation based on improved YOLOv2. Rev. Sci. Instrum. 2024, 95, 351–359.
- Xiao, D.; Shan, F.; Li, Z.; Tuan, B.; Liu, X.; Li, X. A Target Detection Model Based on Improved Tiny-Yolov3 Under the Environment of Mining Truck. IEEE Access 2019, 7, 123757–123764.
- Panda, J. Refining Yolov4 for vehicle detection. Int. J. Adv. Res. Eng. Technol. 2020, 11, 409–419.
- Pan, Q.; Liu, Y.; Wei, S. Design of a multi-category drug information integration platform for intelligent pharmacy management: A needs analysis study. Medicine 2024, 103, E37591.
- Jin, Y.; Wu, X.; Dong, H.; Yu, L.; Zhang, W. Helmet wearing detection algorithm based on improved YOLOv4. Comput. Sci. 2021, 48, 268–275.
- Wang, J.; Huang, Y.; Liu, Y. Low contrast stamped dates recognition for pill packaging boxes based on YOLO-SFD and image fusion. Digit. Signal Process. 2024, 153, 104602.
- Sun, W.; Niu, X.; Wu, Z.; Guo, Z. Lightweight Detection and Counting Method for Pill Boxes Using Machine Vision. Electronics 2024, 13, 4953.
- Li, H.; Chen, B.; Qian, L.; Zeng, K. Improvement of YOLOv5 algorithm for detecting counting of pill boxes in drug vending machines. Comput. Eng. Des. 2024, 45, 1572–1579.
- Liu, D.; Zhang, F.; Meng, T.; Ding, Y. Research on Deep Learning-based Drug Name Recognition Technology for Pill Boxes. J. Qingdao Univ. 2021, 34, 29–33+39.
- My, L.N.T.; Le, V.-T.; Vo, T.; Hoang, V.T. A Comprehensive Review of Pill Image Recognition. Comput. Mater. Contin. 2025, 82, 3693–3740.
- Yuan, B.; Lang, Y.; Chen, L.; Li, C. Design of Multi-target Pillbox Gripping System Based on YOLOv5 and U-NET. Packag. Eng. 2024, 45, 141–149.
- Shen, Z.; Chen, W.; Gan, Z. Research on rotating target detection algorithm based on improved YOLOv5 and its application. Packag. Eng. 2023, 44, 229–237.
- Wei, C.; Ni, W.; Qin, Y.; Wu, J.; Zhang, H.; Liu, Q.; Cheng, K.; Bian, H. RiDOP: A Rotation-Invariant Detector with Simple Oriented Proposals in Remote Sensing Images. Remote Sens. 2023, 15, 594.
- Li, H.; Ma, H. MCSC-Net: A Rotated Object Detection Strategy for Densely Arranged Objects. Appl. Soft Comput. 2024, 166, 112181.
- Wang, L.; Shen, Y.; Yang, J.; Zeng, H.; Gao, H. Rotated Points for Object Detection in Remote Sensing Images. IET Image Process. 2024, 18, 1655–1665.
- Zhu, L.; Wang, X.; Ke, Z.; Zhang, W.; Lau, R. BiFormer: Vision Transformer with Bi-Level Routing Attention. arXiv 2023, arXiv:2303.08810.
- Cao, X.; Zhang, Y.; Lang, S.; Gong, Y. Swin-Transformer-Based YOLOv5 for Small-Object Detection in Remote Sensing Images. Sensors 2023, 23, 3634.
- Tsai, C.-Y.; Lin, W.-C. Precise Orientation Estimation for Rotated Object Detection Based on a Unit Vector Coding Approach. Electronics 2024, 13, 4402.
- Yaseen, M. What is YOLOv8: An In-Depth Exploration of the Internal Features of the Next-Generation Object Detector. arXiv 2024, arXiv:2408.15857.
- Li, J.; Wu, J.; Shao, Y. FSNB-YOLOv8: Improvement of Object Detection Model for Surface Defects Inspection in Online Industrial Systems. Appl. Sci. 2024, 14, 7913.
- Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. Appl. Intell. 2025, 159, 296–307.
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-time Object Detection. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974.
- Tang, S.; Zhang, S.; Fang, Y. HIC-YOLOv5: Improved YOLOv5 For Small Object Detection. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 6614–6619.
















| Module | Theoretical Complexity (Big-O) | Estimated GFLOPs * | Parameter Change vs. Baseline | Memory Change vs. Baseline | Main Benefit |
|---|---|---|---|---|---|
| Baseline YOLOv8 | | 16.6 | – | – | Standard convolutional baseline |
| +SPD-Conv | | 10.8 (down 35%) | +3.8% | +5.1% | Expanded receptive field, richer small-target features |
| +BiFormer | | 9.3 (down 44%) | +2.9% | −11.2% | Sparse attention, lower FLOPs |
| +CSL | | 9.7 (down 42%) | +1.4% | negligible | Smooth angle classification |
| Full Model (Ours) | | 4.9 (down 70%) | +8.1% total | −6.0% overall | Balanced precision/efficiency |

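The "down N%" figures in the GFLOPs column are relative reductions against the 16.6 GFLOPs baseline. The short check below reproduces them from the table values. The helper name `percent_reduction` is hypothetical, and rounding to the nearest integer is an assumption about how the percentages were reported.

```python
baseline_gflops = 16.6
variants = {
    "+SPD-Conv": 10.8,
    "+BiFormer": 9.3,
    "+CSL": 9.7,
    "Full Model (Ours)": 4.9,
}

def percent_reduction(baseline: float, value: float) -> float:
    """Relative reduction of value with respect to baseline, in percent."""
    return 100.0 * (baseline - value) / baseline

reductions = {
    name: round(percent_reduction(baseline_gflops, gflops))
    for name, gflops in variants.items()
}
for name, pct in reductions.items():
    print(f"{name}: down {pct}%")
```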
| Model | Precision (%) | Recall (%) | mAP (%) | GFLOPs |
|---|---|---|---|---|
| YOLOv8 (baseline) | | | | 16.6 |
| +SPD-Conv | | | | 10.8 |
| +BiFormer | | | | 9.3 |
| +CSL | | | | 9.7 |
| All modules (Ours) | | | | 4.9 |

| Model | Precision (%) | Recall (%) | mAP (%) | Weight Size (MB) | GFLOPs | Detection Time (ms) |
|---|---|---|---|---|---|---|
| YOLOv8 | 90.90 | 87.86 | 90.81 | 5.96 | 16.6 | 220.47 |
| RT-DETR | 93.37 | 88.99 | 93.41 | 44.7 | 14.1 | 271.71 |
| HIC-YOLOv8 | 93.86 | 89.43 | 93.68 | 10.5 | 9.4 | 212.64 |
| Our method | 94.24 | 90.39 | 94.16 | 4.4 | 4.9 | 200.83 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pan, J.; Zhou, R.; Feng, J.; Wu, M.; Wu, X.; Dong, H. Unordered Stacked Pillbox Detection Algorithm Based on Improved YOLOv8. Big Data Cogn. Comput. 2025, 9, 300. https://doi.org/10.3390/bdcc9120300

