Student Classroom Behavior Recognition Based on YOLOv8 and Attention Mechanism
Abstract
1. Introduction
2. Dataset
2.1. Dataset Overview
2.2. Dataset Annotation
3. Methods
3.1. YOLOv8 and Its Network Structure
3.2. Attention Mechanism
3.3. MHSA Mechanism
3.4. Construction of the YOLOv8-MHSA Model
3.5. CA Attention Mechanism
3.6. Construction of the YOLOv8-CA Model
3.7. Loss Function
3.8. Implementation Details
4. Results
4.1. Comparison with the Benchmark YOLOv8 Model
4.2. Comparison with Other Models
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References

Comparison with the benchmark YOLOv8 model (Section 4.1):

| Model | Precision | Recall | mAP50 | mAP50-95 |
|---|---|---|---|---|
| YOLOv8 | 0.84 | 0.807 | 0.842 | 0.669 |
| YOLOv8-MHSA | 0.86 | 0.807 | 0.855 | 0.677 |
| YOLOv8-CA | 0.843 | 0.808 | 0.85 | 0.670 |

Comparison with other models (Section 4.2):

| Model | Precision | Recall | mAP50 | mAP50-95 |
|---|---|---|---|---|
| YOLOv8-CBAM | 0.817 | 0.803 | 0.841 | 0.671 |
| YOLOv8-SimAM | 0.818 | 0.808 | 0.838 | 0.667 |
| YOLOv8-ECA | 0.839 | 0.801 | 0.84 | 0.664 |
| YOLOv8-SE | 0.816 | 0.816 | 0.843 | 0.666 |
| SBD-Net | 0.853 | 0.801 | 0.845 | 0.674 |
| YOLOv8-MHSA | 0.86 | 0.807 | 0.855 | 0.677 |
| YOLOv8-CA | 0.843 | 0.808 | 0.85 | 0.670 |
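
The gains over the baseline can be read off the reported numbers with simple subtraction. The following is a minimal sketch of that arithmetic, assuming the values are copied exactly from the result tables above; the names `results`, `METRICS`, and `delta_vs_baseline` are illustrative and do not come from the paper.

```python
# Reported metrics copied from the result tables above:
# (precision, recall, mAP50, mAP50-95). Nothing here is re-measured.
METRICS = ("precision", "recall", "mAP50", "mAP50-95")

results = {
    "YOLOv8":       (0.840, 0.807, 0.842, 0.669),
    "YOLOv8-MHSA":  (0.860, 0.807, 0.855, 0.677),
    "YOLOv8-CA":    (0.843, 0.808, 0.850, 0.670),
    "YOLOv8-CBAM":  (0.817, 0.803, 0.841, 0.671),
    "YOLOv8-SimAM": (0.818, 0.808, 0.838, 0.667),
    "YOLOv8-ECA":   (0.839, 0.801, 0.840, 0.664),
    "YOLOv8-SE":    (0.816, 0.816, 0.843, 0.666),
    "SBD-Net":      (0.853, 0.801, 0.845, 0.674),
}


def delta_vs_baseline(model, baseline="YOLOv8"):
    """Absolute difference of each metric relative to the baseline row."""
    return {
        metric: round(value - base, 3)
        for metric, value, base in zip(METRICS, results[model], results[baseline])
    }


def best_by(metric):
    """Name of the model with the highest reported value for one metric."""
    index = METRICS.index(metric)
    return max(results, key=lambda name: results[name][index])


if __name__ == "__main__":
    for name in results:
        if name != "YOLOv8":
            print(name, delta_vs_baseline(name))
    print("best mAP50:", best_by("mAP50"))
```

These are absolute differences in the reported scores, not statistical tests; whether a gap of a few thousandths is meaningful depends on run-to-run variance, which the tables do not report.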
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, J.; Guo, L.; Wang, X. Student Classroom Behavior Recognition Based on YOLOv8 and Attention Mechanism. Information 2025, 16, 934. https://doi.org/10.3390/info16110934

