Research on Eye-Tracking Control Methods Based on an Improved YOLOv11 Model
Abstract
1. Introduction
2. Method
2.1. Dataset Construction and Analysis
2.2. Improving the YOLOv11 Model
2.2.1. EFFM Module
2.2.2. ORC Module
2.3. Eye-Tracking Control Method Based on an Improved Model
2.3.1. Eye Movement Area Discrimination
2.3.2. Eye Movement Area Discrimination Method Combined with Frame Voting Mechanism
2.3.3. Eye Movement Encoding and Command Design
2.4. Experimental Platform Setup
2.4.1. Physical Implementation of Platform Movement
2.4.2. Implementation of the Platform’s Data Transmission Protocol
3. Experimental Results and Analysis
3.1. Human Eye Socket and Iris Recognition Experiment
3.1.1. Experimental Environment and Parameter Settings
3.1.2. Model Comparison Experiment
3.1.3. EFFM Module Comparison Experiment
3.1.4. Ablation Experiment
3.1.5. Comparison of Experiments Before and After Model Improvement
3.2. Eye Movement Discrimination and Coding Method Experiment
3.2.1. Eye Movement Direction Discrimination Accuracy Experiment
3.2.2. Eye Movement Coding Matching Experiment
3.3. Eye-Controlled Robotic Arm Experiment
3.3.1. Human–Machine Motion Coordination Experiment
3.3.2. Eye-Controlled Robotic Arm Grasping Experiment
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Discrimination Criteria | Eye Movement Direction
---|---
 | left direction
 | right direction
 | left direction
 | right direction
 | gaze
 | gaze
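Since the discrimination criteria are geometric conditions on the detected boxes, a minimal sketch of one plausible formulation is given below. It assumes the direction is decided by the horizontal offset of the iris centre relative to the eye-socket centre, normalised by the socket width; the threshold `T` and the function name are placeholders rather than the paper's exact rule.

```python
def classify_eye_direction(socket_box, iris_box, T=0.15):
    """Classify gaze direction from detected eye-socket and iris boxes.

    Boxes are (x1, y1, x2, y2) in pixels. The rule below is an assumed
    reconstruction: the iris centre's horizontal offset from the socket
    centre, normalised by the socket width, is compared against a
    placeholder threshold T.
    """
    socket_cx = (socket_box[0] + socket_box[2]) / 2.0
    socket_w = socket_box[2] - socket_box[0]
    iris_cx = (iris_box[0] + iris_box[2]) / 2.0

    offset = (iris_cx - socket_cx) / socket_w  # negative -> left, positive -> right
    if offset < -T:
        return "left"
    if offset > T:
        return "right"
    return "gaze"


if __name__ == "__main__":
    # Example: iris shifted toward the left edge of the socket.
    print(classify_eye_direction((100, 50, 180, 90), (110, 60, 130, 80)))  # -> "left"
```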
Serial Number | Robotic Arm Movements | Eye Movement Control Binary Encoding Instructions |
---|---|---|
1 | upward shift | 0001 |
2 | downward shift | 0010 |
3 | left shift | 0100 |
4 | right shift | 1000 |
5 | forward shift | 0011 |
6 | backward shift | 1100 |
7 | grab | 0101 |
8 | loosen | 1010 |
9 | start | 1111 |
10 | confirmation | 0000 |
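The command table defines a one-to-one mapping from 4-bit eye-movement codes to arm actions; the lookup below transcribes it directly (function and variable names are illustrative, not taken from the paper's code).

```python
# Eye-movement binary codes and the robotic-arm actions they trigger,
# transcribed from the command table above.
EYE_CODE_TO_ACTION = {
    "0001": "upward shift",
    "0010": "downward shift",
    "0100": "left shift",
    "1000": "right shift",
    "0011": "forward shift",
    "1100": "backward shift",
    "0101": "grab",
    "1010": "loosen",
    "1111": "start",
    "0000": "confirmation",
}


def decode_eye_command(code: str) -> str:
    """Return the arm action for a 4-bit code, or raise if the code is invalid."""
    try:
        return EYE_CODE_TO_ACTION[code]
    except KeyError:
        raise ValueError(f"Unrecognised eye-movement code: {code!r}")


print(decode_eye_command("0101"))  # -> "grab"
```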
M528 | M529 | M530 | M531 | M532 | M533 | M534 | M535
---|---|---|---|---|---|---|---
Start check bit | Write/read data | Get current location | Coil I/O | Coil I/O | Coil I/O | Coil I/O | End check bit
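As a sketch only, the eight coils M528 to M535 can be packed into a boolean frame before being written to the controller. The helper below follows the layout in the table, assumes the start and end check bits are always set, and leaves the actual Modbus write to whichever client library the platform uses.

```python
from typing import List


def build_coil_frame(write_data: bool, get_position: bool, command_bits: str) -> List[bool]:
    """Assemble the M528-M535 coil values described in the table above.

    Layout (as read from the table): start check bit, write/read data flag,
    get-current-location flag, four command coils (M531-M534), end check bit.
    `command_bits` is the 4-bit eye-movement code, e.g. "0001".
    """
    if len(command_bits) != 4 or any(b not in "01" for b in command_bits):
        raise ValueError("command_bits must be a 4-character binary string")
    frame = [True]                                 # M528: start check bit (assumed always 1)
    frame.append(write_data)                       # M529: write/read data flag
    frame.append(get_position)                     # M530: get current location flag
    frame.extend(b == "1" for b in command_bits)   # M531-M534: command coils
    frame.append(True)                             # M535: end check bit (assumed always 1)
    return frame


# Example: request an upward shift (code 0001) as a write operation.
print(build_coil_frame(write_data=True, get_position=False, command_bits="0001"))
```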
M531~M534 | Robotic Arm Movements | Offset Direction Setting
---|---|---
0001 | upward shift | positive direction of the Z-axis
0010 | downward shift | negative direction of the Z-axis
0100 | left shift | negative direction of the X-axis
1000 | right shift | positive direction of the X-axis
0011 | forward shift | positive direction of the Y-axis
1100 | backward shift | negative direction of the Y-axis
0101 | grab | irrespective of the offset direction
1010 | loosen | irrespective of the offset direction
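The offset table translates naturally into unit direction vectors along the arm's Cartesian axes. The sketch below mirrors it, with grab/loosen mapped to a zero offset because they do not displace the end effector; the step length is a placeholder, since the platform's actual step size is not given here.

```python
# Map of M531-M534 command codes to (dx, dy, dz) unit offsets, per the table.
# "grab" and "loosen" do not displace the end effector, so they map to zeros.
CODE_TO_OFFSET = {
    "0001": (0, 0, +1),   # upward shift:   +Z
    "0010": (0, 0, -1),   # downward shift: -Z
    "0100": (-1, 0, 0),   # left shift:     -X
    "1000": (+1, 0, 0),   # right shift:    +X
    "0011": (0, +1, 0),   # forward shift:  +Y
    "1100": (0, -1, 0),   # backward shift: -Y
    "0101": (0, 0, 0),    # grab:   no offset
    "1010": (0, 0, 0),    # loosen: no offset
}


def target_position(current_xyz, code, step_mm=10.0):
    """Return the new Cartesian target after applying one command.

    `step_mm` is a placeholder step length, not a value from the paper.
    """
    dx, dy, dz = CODE_TO_OFFSET[code]
    x, y, z = current_xyz
    return (x + dx * step_mm, y + dy * step_mm, z + dz * step_mm)


print(target_position((100.0, 200.0, 50.0), "0001"))  # -> (100.0, 200.0, 60.0)
```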
Experimental Environment | Version/Model
---|---
Operating system | Windows 10 Professional Edition
CPU | 12th Gen Intel Core i7-12700F
GPU | NVIDIA GeForce RTX 3060
NVIDIA driver | 522.25
CUDA | 10.2
IDE | PyCharm 2022.1.1
Programming language | Python 3.9.19
Deep learning framework | PyTorch 2.0.1
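A quick runtime check of this environment, assuming only that PyTorch is installed as listed, can be done with:

```python
import sys

import torch

# Sanity check that the runtime matches the environment table above.
print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```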
Algorithm | Precision | Recall | mAP@0.5
---|---|---|---
SSD | 0.570 | 0.513 | 0.455
Faster R-CNN | 0.562 | 0.528 | 0.522
YOLOv10-n | 0.908 | 0.431 | 0.489
YOLOv11-n | 0.853 | 0.645 | 0.719
YOLOv11-n+CA | 0.897 | 0.650 | 0.745
YOLOv11-n+SE | 0.908 | 0.637 | 0.760
YOLOv11-n+BiFormer | 0.906 | 0.862 | 0.907
Ours | 0.911 | 0.892 | 0.920
Algorithm Model | Precision | Recall | mAP@0.5 |
---|---|---|---|
YOLOv11+CA | 0.897 | 0.650 | 0.745 |
YOLOv11+SE | 0.908 | 0.637 | 0.760 |
YOLOv11+BiFormer | 0.906 | 0.862 | 0.907 |
YOLOv11+EFFM | 0.915 | 0.844 | 0.910 |
Algorithm | Category | Precision | Recall | AP (Average Precision)
---|---|---|---|---
Baseline | eye socket | 0.901 | 0.869 | 0.927
 | iris | 0.806 | 0.421 | 0.511
 | average | 0.853 | 0.645 | 0.719
Experiment 1 | eye socket | 0.929 | 0.930 | 0.973
 | iris | 0.901 | 0.758 | 0.848
 | average | 0.915 | 0.844 | 0.910
Experiment 2 | eye socket | 0.925 | 0.845 | 0.928
 | iris | 0.852 | 0.470 | 0.591
 | average | 0.888 | 0.657 | 0.759
Experiment 3 | eye socket | 0.918 | 0.924 | 0.970
 | iris | 0.905 | 0.861 | 0.871
 | average | 0.911 | 0.892 | 0.920
Model | Parameters | Gradients | GFLOPs |
---|---|---|---|
Baseline | 2,590,230 | 2,590,241 | 6.4 |
Experiment 1 | 2,623,254 | 2,623,238 | 6.5 |
Experiment 2 | 2,485,174 | 2,485,158 | 6.2 |
Experiment 3 | 2,516,278 | 2,516,262 | 6.3 |
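The "Parameters" column can be reproduced for any trained network by summing tensor sizes; the sketch below uses a stand-in model because loading the improved YOLOv11 weights is outside the scope of this table.

```python
from torch import nn


def count_parameters(model: nn.Module) -> int:
    """Total trainable parameters, comparable to the 'Parameters' column above."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


if __name__ == "__main__":
    # Stand-in model; in practice the improved YOLOv11 network would be passed here.
    toy = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 2, 1))
    print(count_parameters(toy))
```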
Number of Completions | Triangular Cone | Rectangular Prism | Cylinder |
---|---|---|---|
maximum | 46 | 50 | 50 |
minimum | 36 | 47 | 45 |
average | 39 | 49 | 48 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).