Helpful or Harmful? Comparative Study of Perceived and Actual Effectiveness of LLM-Driven Tutors in Game-Based CFL Learning
Abstract
1. Introduction
2. Literature Review
2.1. The Role of LLM-Driven Tutors in Digital Game-Based Learning Environments
2.2. Effectiveness of LLM-Driven Tutors: Perception vs. Reality
2.2.1. Perceived Effectiveness
2.2.2. Actual Effectiveness
2.3. The Present Study
3. Method
3.1. LLM-Driven Tutor System
3.2. Participants
3.3. Procedure
3.4. Category of Chat Topic
4. Results
4.1. The Perceived Effectiveness of the LLM-Driven Tutor
4.2. The Actual Effectiveness of the LLM-Driven Tutor
4.2.1. Differences in Accuracy Before and After Interacting with the LLM-Driven Tutor
4.2.2. Post-Interaction Cognitive Behavior Patterns
4.3. Relationship Between Perceived and Actual Effectiveness of the LLM-Driven Tutor
4.3.1. Effect of Perceived Effectiveness on Immediate Cognitive Performance
4.3.2. Effect of Perceived Effectiveness of the LLM-Driven Tutor on Long-Term Learning Behavior Patterns
5. Conclusions and Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References



| Group | Behavior | Description |
|---|---|---|
| Highly Effective | Error Correction | Learner submits the correct option for the current question |
| Low Effective | Pass | Learner submits an initial correct answer |
| Low Ineffective | New Errors | Learner submits a different incorrect option for the current question |
| Low Ineffective | Failure | Learner submits an initial incorrect answer |
| Highly Ineffective | Repeated Error | Learner submits the same incorrect option for the current question |
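
The coding scheme in the table above can be sketched as a small lookup plus a labeling function. This is only an illustration under stated assumptions, not the authors' implementation: the function name `label_behavior`, the option strings, and the `previous_wrong` argument are hypothetical, and the sketch assumes that an "initial" answer means the learner's first attempt on a question, while the other three behaviors describe a retry of the current question after an earlier incorrect option.

```python
from enum import Enum
from typing import Optional


class Group(Enum):
    HIGHLY_EFFECTIVE = "Highly Effective"
    LOW_EFFECTIVE = "Low Effective"
    LOW_INEFFECTIVE = "Low Ineffective"
    HIGHLY_INEFFECTIVE = "Highly Ineffective"


# Behavior label -> effectiveness group, copied from the table above.
BEHAVIOR_GROUP = {
    "Error Correction": Group.HIGHLY_EFFECTIVE,
    "Pass": Group.LOW_EFFECTIVE,
    "New Errors": Group.LOW_INEFFECTIVE,
    "Failure": Group.LOW_INEFFECTIVE,
    "Repeated Error": Group.HIGHLY_INEFFECTIVE,
}


def label_behavior(submitted: str, correct: str,
                   previous_wrong: Optional[str] = None) -> str:
    """Label one submission made after interacting with the tutor.

    previous_wrong is the learner's earlier incorrect option on the same
    question, or None when this is a first attempt (assumed reading of
    "initial answer"; the source table does not spell this out).
    """
    if previous_wrong is None:
        # First attempt on a question: Pass or Failure.
        return "Pass" if submitted == correct else "Failure"
    if submitted == correct:
        return "Error Correction"
    if submitted == previous_wrong:
        return "Repeated Error"
    return "New Errors"


# Hypothetical usage: the learner first chose "A", the key is "B".
behavior = label_behavior(submitted="C", correct="B", previous_wrong="A")
print(behavior, "->", BEHAVIOR_GROUP[behavior].value)
```

Keeping the behavior-to-group mapping in a separate dictionary means the five behavior labels and the four groups can be audited against the table row by row.
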

| Content Relevance | Highly Effective: mean | Highly Effective: std | Low Effective: mean | Low Effective: std | Low Ineffective: mean | Low Ineffective: std | Highly Ineffective: mean | Highly Ineffective: std |
|---|---|---|---|---|---|---|---|---|
| Content-relevant (N = 39) | 0.04 | 0.07 | 0.15 | 0.23 | 0.76 | 0.32 | 0.002 | 0.01 |
| Content-irrelevant (N = 43) | 0.09 | 0.07 | 0.33 | 0.29 | 0.57 | 0.34 | 0.008 | 0.17 |
| t | 2.828 * | | 3.115 ** | | −2.588 * | | 1.750 | |
| Cohen’s d | 0.71 | | 0.68 | | −0.57 | | 0.05 | |
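
The effect sizes in the table above can be checked with a short calculation. The sketch below assumes the conventional pooled-standard-deviation form of Cohen's d and a sign convention of content-irrelevant minus content-relevant; the source does not state either, and both are inferred from the reported values. The function name `cohens_d` and the `rows` dictionary are illustrative only. Under these assumptions the results agree with the reported d values to two decimals.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d for group B minus group A, using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_b - mean_a) / math.sqrt(pooled_var)


# Group sizes and (mean, std) pairs copied from the table above.
N_RELEVANT, N_IRRELEVANT = 39, 43
rows = {
    # level: ((relevant mean, relevant std), (irrelevant mean, irrelevant std))
    "Highly Effective": ((0.04, 0.07), (0.09, 0.07)),
    "Low Effective": ((0.15, 0.23), (0.33, 0.29)),
    "Low Ineffective": ((0.76, 0.32), (0.57, 0.34)),
    "Highly Ineffective": ((0.002, 0.01), (0.008, 0.17)),
}

effect_sizes = {}
for level, ((m_rel, s_rel), (m_irr, s_irr)) in rows.items():
    # Assumed direction: content-irrelevant minus content-relevant.
    effect_sizes[level] = cohens_d(m_rel, s_rel, N_RELEVANT, m_irr, s_irr, N_IRRELEVANT)
    print(f"{level}: d = {effect_sizes[level]:.2f}")
```

Because the inputs are the rounded means and standard deviations printed in the table, small discrepancies in the third decimal are to be expected.
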
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Fang, L.; Tang, G.; Zhang, L. Helpful or Harmful? Comparative Study of Perceived and Actual Effectiveness of LLM-Driven Tutors in Game-Based CFL Learning. Educ. Sci. 2025, 15, 1502. https://doi.org/10.3390/educsci15111502

