YOLOv11-GLIDE: An Improved YOLOv11n Student Behavior Detection Algorithm Based on Scale-Based Dynamic Loss and Channel Prior Convolutional Attention
Abstract
1. Introduction
- (1) The original feature extraction module of YOLOv11n does not capture local and global information well enough when it handles complex occlusion and multi-scale behaviors, so the model struggles to focus on key regions. To address this, this article introduces Channel Prior Convolutional Attention (CPCA) [13], which fuses channel and spatial attention to adaptively enhance salient target regions and thereby improves the accuracy and robustness of feature extraction.
- (2) Traditional loss functions are sensitive to labeling errors and IoU fluctuations in small-target detection, which makes training unstable. To solve this problem, the scale-based dynamic loss (SD Loss) [14] is adopted. It adaptively adjusts the relative weights of the scale loss and the location loss according to the actual area of each target, which significantly improves detection stability for small-scale behaviors.
- (3) In embedded deployment, the strided convolutions of YOLOv11n tend to discard fine-grained information and add computational burden. This article therefore replaces the conventional down-sampling layers with the space-to-depth convolution (SPD-Conv) [15] module, which retains high-resolution feature information while reducing the number of parameters and computations, so the model keeps strong feature expression ability under lightweight conditions.
2. YOLO Series
2.1. The Evolution of YOLO Architecture
2.2. YOLOv11n
3. YOLOv11-GLIDE Architecture
3.1. Channel Prior Convolutional Attention
3.2. Scale-Based Dynamic Loss
3.3. Space-to-Depth Convolution
3.4. Improved YOLOv11n Model Construction Based on Scale-Based Dynamic Loss and Channel Prior Convolutional Attention
4. Experiments
4.1. Dataset
4.2. Experimental Environment
4.3. Evaluation Metrics
4.4. The Influence of Attention Mechanism on the Performance of Classroom Behavior Detection and Result Analysis
4.5. Performance Comparison and Result Analysis of Different Algorithms
4.6. Ablation Experiment and Result Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Sheng, X.; Li, S.; Chan, S. Real-time classroom student behavior detection based on improved YOLOv8s. Sci. Rep. 2025, 15, 14470.
2. Zhang, H.; Zhou, G.; He, W.; Deng, H. Classroom Behavior Detection Method Based on PLA-YOLO11n. Sensors 2025, 25, 5386.
3. Chen, J.; Kao, S.-H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.-H.; Chan, S.-H.G. Run, don’t walk: Chasing higher FLOPS for faster neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 12021–12031.
4. Lau, K.W.; Po, L.M.; Rehman, Y.A.U. Large separable kernel attention: Rethinking the large kernel attention design in CNN. Expert Syst. Appl. 2024, 236, 121352.
5. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs beat YOLOs on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974.
6. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
7. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856.
8. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems 28 (NIPS 2015); Curran Associates, Inc.: Red Hook, NY, USA, 2015; Volume 28.
9. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
10. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems 30 (NIPS 2017); Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
11. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2008, 20, 61–80.
12. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907.
13. Huang, H.; Chen, Z.; Zou, Y.; Lu, M.; Chen, C.; Song, Y.; Zhang, H.; Yan, F. Channel prior convolutional attention for medical image segmentation. Comput. Biol. Med. 2024, 178, 108784.
14. Yang, J.; Liu, S.; Wu, J.; Su, X.; Hai, N.; Huang, X. Pinwheel-shaped convolution and scale-based dynamic loss for infrared small target detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 9202–9210.
15. Sunkara, R.; Luo, T. No more strided convolutions or pooling: A new CNN building block for low-resolution images and small objects. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Grenoble, France, 19–23 September 2022; Springer: Cham, Switzerland, 2022; pp. 443–459.
16. Wang, C.-Y.; Liao, H.-Y.M. YOLOv1 to YOLOv10: The fastest and most accurate real-time object detection systems. arXiv 2024, arXiv:2408.09332.
17. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
18. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
19. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
20. Adarsh, P.; Rathi, P.; Kumar, M. YOLO v3-Tiny: Object Detection and Recognition using one stage improved model. In Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 6–7 March 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 687–694.
21. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
22. Khanam, R.; Hussain, M. What is YOLOv5: A deep look into the internal features of the popular object detector. arXiv 2024, arXiv:2407.20892.
23. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
24. Sohan, M.; Sai Ram, T.; Rami Reddy, C.V. A review on YOLOv8 and its advancements. In Proceedings of the International Conference on Data Intelligence and Cognitive Informatics, Tirunelveli, India, 18–20 November 2024; Springer: Singapore, 2024; pp. 529–545.
25. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J. YOLOv10: Real-time end-to-end object detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011.
26. Khanam, R.; Hussain, M. YOLOv11: An overview of the key architectural enhancements. arXiv 2024, arXiv:2410.17725.
27. Tian, Y.; Ye, Q.; Doermann, D. YOLOv12: Attention-centric real-time object detectors. arXiv 2025, arXiv:2502.12524.
28. Park, H.; Yoo, Y.; Seo, G.; Han, D.; Yun, S.; Kwak, N. C3: Concentrated-comprehensive convolution and its application to semantic segmentation. arXiv 2018, arXiv:1812.04920.
29. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768.
30. LeCun, Y.; Boser, B.; Denker, J.; Henderson, D.; Howard, R.; Hubbard, W.; Jackel, L. Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems 2 (NIPS 1989); Curran Associates, Inc.: Red Hook, NY, USA, 1989; Volume 2.
31. Yang, F.; Wang, T. SCB-dataset3: A benchmark for detecting student classroom behavior. arXiv 2023, arXiv:2310.02522.
32. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
33. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
34. Guo, M.H.; Lu, C.Z.; Hou, Q.; Liu, Z.; Cheng, M.M.; Hu, S.M. SegNeXt: Rethinking convolutional attention design for semantic segmentation. Adv. Neural Inf. Process. Syst. 2022, 35, 1140–1156.
35. Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient multi-scale attention module with cross-spatial learning. In Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–5.
36. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11534–11542.
37. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37.
38. Zhu, W.; Yang, Z. CSB-YOLO: A rapid and efficient real-time algorithm for classroom student behavior detection. J. Real-Time Image Process. 2024, 21, 140.
39. Wang, Z.; Wang, M.; Zeng, C.; Li, L. SBD-Net: Incorporating multi-level features for an efficient detection network of student behavior in smart classrooms. Appl. Sci. 2024, 14, 8357.






| Environment Configuration | Name | Related Configuration |
|---|---|---|
| Hardware environment | CPU | 12 vCPU Intel(R) Xeon(R) Platinum 8352V CPU @ 2.10 GHz |
| | Running memory | 24 GB |
| | GPU | NVIDIA GeForce RTX 4090 |
| Software environment | Operating system | Ubuntu 22.04 |
| | Python | 3.12 |
| | PyTorch | 2.3.0 |
| | CUDA | 12.1 |

| Attention Module | Precision | Recall | mAP@0.5 | mAP@0.5-0.95 | Parameters/10⁶ | GFLOPS | FPS (Frame/s) |
|---|---|---|---|---|---|---|---|
| None (YOLOv11n baseline) | 0.859 | 0.832 | 0.876 | 0.673 | 2.6 | 6.3 | 126.6 |
| SE | 0.862 | 0.832 | 0.879 | 0.669 | 2.6 | 6.3 | 125.8 |
| CBAM | 0.864 | 0.833 | 0.881 | 0.672 | 2.6 | 6.4 | 124.2 |
| MSCA | 0.868 | 0.836 | 0.884 | 0.682 | 2.7 | 6.5 | 123.9 |
| EMA | 0.866 | 0.831 | 0.883 | 0.675 | 2.6 | 6.5 | 124.8 |
| ECA | 0.861 | 0.832 | 0.877 | 0.667 | 2.6 | 6.3 | 126.1 |
| CPCA | 0.871 | 0.838 | 0.889 | 0.702 | 2.7 | 6.5 | 123.7 |

| Model | Precision | Recall | mAP@0.5 | mAP@0.5-0.95 | Parameters/10⁶ | GFLOPS | FPS (Frame/s) |
|---|---|---|---|---|---|---|---|
| YOLOv3-tiny | 0.819 | 0.804 | 0.827 | 0.597 | 12.1 | 18.9 | 112.1 |
| YOLOv5n | 0.833 | 0.821 | 0.865 | 0.657 | 2.5 | 7.1 | 117.4 |
| YOLOv6n | 0.847 | 0.826 | 0.857 | 0.656 | 4.2 | 11.8 | 121.4 |
| YOLOv8n | 0.851 | 0.830 | 0.872 | 0.673 | 3.0 | 8.1 | 125.5 |
| YOLOv10n | 0.853 | 0.819 | 0.875 | 0.674 | 2.7 | 8.2 | 114.3 |
| YOLOv11n | 0.859 | 0.832 | 0.876 | 0.673 | 2.6 | 6.3 | 126.6 |
| YOLOv12n | 0.856 | 0.828 | 0.877 | 0.676 | 2.6 | 6.3 | 123.2 |
| SSD | 0.848 | 0.794 | 0.855 | 0.614 | 17.8 | 12.3 | 87.4 |
| Faster R-CNN | 0.867 | 0.837 | 0.882 | 0.628 | 47.5 | 71.8 | 8.8 |
| RT-DETR | 0.869 | 0.840 | 0.889 | 0.689 | 44.7 | 27.6 | 115.2 |
| Ours | 0.879 | 0.842 | 0.898 | 0.724 | 2.3 | 5.6 | 127.9 |

| Model | Precision | Recall | mAP@0.5 | mAP@0.5-0.95 | Parameters/10⁶ |
|---|---|---|---|---|---|
| CSB-YOLO [38] | 0.855 | 0.824 | 0.747 | 0.576 | 1.9 |
| SBD-Net [39] | 0.863 | 0.816 | 0.745 | 0.577 | 36.5 |
| PLA-YOLO11n [2] | 0.871 | 0.824 | 0.764 | 0.513 | 18.4 |
| Ours | 0.879 | 0.842 | 0.776 | 0.582 | 2.7 |

| Model | Precision | Recall | mAP@0.5 | mAP@0.5-0.95 | Parameters/10⁶ | GFLOPS | FPS (Frame/s) |
|---|---|---|---|---|---|---|---|
| YOLOv11n | 0.859 | 0.832 | 0.876 | 0.673 | 2.6 | 6.3 | 126.6 |
| YOLOv11n + SPD-Conv | 0.864 | 0.837 | 0.882 | 0.695 | 2.2 | 5.4 | 129.6 |
| YOLOv11n + CPCA | 0.871 | 0.838 | 0.889 | 0.702 | 2.7 | 6.5 | 123.7 |
| YOLOv11n + SDLoss | 0.864 | 0.835 | 0.885 | 0.691 | 2.6 | 6.3 | 126.6 |
| YOLOv11n + SPD-Conv + CPCA | 0.877 | 0.841 | 0.896 | 0.718 | 2.3 | 5.6 | 127.9 |
| YOLOv11n + SPD-Conv + SDLoss | 0.867 | 0.838 | 0.887 | 0.707 | 2.2 | 5.4 | 129.6 |
| YOLOv11n + SDLoss + CPCA | 0.875 | 0.840 | 0.894 | 0.719 | 2.7 | 6.5 | 123.7 |
| Ours | 0.879 | 0.842 | 0.898 | 0.724 | 2.3 | 5.6 | 127.9 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, H.; Gao, G.; Zhang, W.; Li, K.; Che, N.; Yan, C.; Wang, L. YOLOv11-GLIDE: An Improved YOLOv11n Student Behavior Detection Algorithm Based on Scale-Based Dynamic Loss and Channel Prior Convolutional Attention. Sensors 2025, 25, 6972. https://doi.org/10.3390/s25226972

