Processing Written Language in Video Games: An Eye-Tracking Study on Subtitled Instructions
Highlights
- Relevant subtitles attracted more visual attention than irrelevant ones.
- Action-related words received more fixations, and longer fixations, than words in other categories.
- The absence of audio increased reliance on subtitles.
- Findings offer a basis for future research on text processing in interactive multimedia.
Abstract
1. Introduction
1.1. Subtitles in Video Games and Digital Literacy
1.2. Theoretical Frameworks in Multimodal Processing
2. Methods
2.1. Materials and Design
- Communication Decision Model: Communication decisions were determined by comparing GP and TA locations and headings against a set of logic-based rules. These rules were derived from prior research on optimal movement dynamics and operator instructions in herding tasks [43,44,45,46].
- Communication Generator: Each confederate was recorded reading text passages aloud for three minutes. The recordings were used to train voice clone models (ElevenLabs Inc., New York, U.S., 2023), which were optimized to match each confederate’s gender and accent and produced near-indistinguishable vocal tonality. A large language model (LLM), GPT-4-0125-Preview (OpenAI, California, U.S., 2023), generated the text-based instructions from the output of the communication decision model. “Function calls” were employed to restrict LLM instructions to those relevant to the task and to minimize LLM hallucinations. These constraints and the output of the LLM function calls were derived from previous research [44]. Finally, the assigned confederate’s voice clone model converted this text into speech.
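The two components above can be illustrated with a minimal sketch. It is not the study's implementation: the rule set, the tolerance threshold, the instruction names, and the `Agent` type are all hypothetical, chosen only to show how a location-and-heading comparison can yield a discrete decision, and how a whitelist of permitted calls can keep a language model's output inside the task. No LLM or speech API is called here.

```python
import math
from dataclasses import dataclass

# Hypothetical whitelist standing in for the "function call" constraint:
# only these instruction names are accepted from the text generator.
ALLOWED_INSTRUCTIONS = ("turn_left", "turn_right", "go_straight")


@dataclass
class Agent:
    x: float
    y: float
    heading_deg: float  # 0 = +x axis, angles increase counter-clockwise


def bearing_deg(src: Agent, dst: Agent) -> float:
    """Direction from src to dst, in degrees, same convention as heading_deg."""
    return math.degrees(math.atan2(dst.y - src.y, dst.x - src.x))


def signed_angle_diff(a: float, b: float) -> float:
    """Smallest signed difference a - b, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def decide_instruction(player: Agent, target: Agent,
                       tolerance_deg: float = 15.0) -> str:
    """Illustrative logic-based rule: compare the player's heading with the
    bearing to the target and pick one of the allowed instructions."""
    diff = signed_angle_diff(bearing_deg(player, target), player.heading_deg)
    if abs(diff) <= tolerance_deg:
        return "go_straight"
    # Positive difference means the target lies counter-clockwise (to the left).
    return "turn_left" if diff > 0 else "turn_right"


def validate_call(call: dict) -> str:
    """Reject any generated call whose name is not on the whitelist."""
    name = call.get("name")
    if name not in ALLOWED_INSTRUCTIONS:
        raise ValueError(f"instruction not permitted: {name!r}")
    return name


player = Agent(0.0, 0.0, 0.0)
decision = decide_instruction(player, Agent(0.0, 10.0, 0.0))
instruction = validate_call({"name": decision})
```

Keeping the decision logic deterministic and validating the generator's output against a fixed list means the language model only affects wording, not which instruction is issued.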
2.2. Participants
2.3. Procedure
2.4. Measures
2.5. Analysis
3. Results
3.1. Behavioral Performance
3.2. Visual Attention Allocation on Directional Information
3.3. Subtitle-Level Analysis: Modality and Relevancy
3.4. Word-Level Analysis: Modality and Word Categories
4. Discussion
4.1. Gaming Performance and General Processing
4.2. Subtitle-Level Processing
4.3. Word-Level Processing
5. Implications for Future Research in Digital Literacy
6. Limitations
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Player Profiles: Gaming and Subtitle Experience
Appendix A.1. Gaming Experience
Appendix A.2. Subtitle Experience
Appendix B. Summary of G/LMM Models and Major Impact Factors
| Dependent Variables | Final Models | Major Factors Significantly Affecting the Dependent Variables (p < 0.05) | Reference Level | Observations | Marginal/Conditional R² |
|---|---|---|---|---|---|
| Contained number | Modalities + (1\|Subjects) | None | Modality: Auditory only | 177 | 0.006/0.285 |
| Herding efficiency | Modalities + (1\|Subjects) | None | Modality: Auditory only | 177 | 0.011/0.144 |
| Containment efficiency | Modalities + (1\|Subjects) | None | Modality: Auditory only | 177 | 0.026/0.130 |
| Subtitle-level processing | | | | | |
| Total Fixation Counts | Modalities ∗ Relevancy + (1\|Subjects) | Visual–auditory (); Irrelevant () | (1) Modality: Visual only; (2) Relevancy: Relevant | 384 | 0.250/0.751 |
| Dwell Time Percentage | Modalities ∗ Relevancy + (1\|Subjects) | Visual–auditory (); Irrelevant () | (1) Modality: Visual only; (2) Relevancy: Relevant | 384 | 0.213/0.741 |
| Subtitle Skipping Rate | Modalities ∗ Relevancy + (1\|Subjects) | None | (1) Modality: Visual only; (2) Relevancy: Relevant | 384 | 0.087/0.290 |
| Mean Fixation Duration | Modalities ∗ Relevancy + (1\|Subjects) | Irrelevant () | (1) Modality: Visual only; (2) Relevancy: Relevant | 375 | 0.031/0.545 |
| Word-level processing | | | | | |
| Word Skipping Rate | Modalities ∗ Word Category + (1\|Subjects) | Visual–auditory (); for interaction effects, see Table 1 and Table 2 | (1) Modality: Visual only; (2) Word categories: Directional Words; (3) Modality ∗ Word categories: Directional Words ∗ Visual only | 29,118 | 0.017/0.165 |
| Fixation Counts | Modalities ∗ Word Category + (1\|Subjects) | Visual–auditory (); for interaction effects, see Table 1 and Table 2 | (1) Modality: Visual only; (2) Word categories: Directional Words; (3) Modality ∗ Word categories: Directional Words ∗ Visual only | 648 | 0.072/0.364 |
| Mean Fixation Duration | Modalities ∗ Word Category + (1\|Subjects) | None | (1) Modality: Visual only; (2) Word categories: Directional Words; (3) Modality ∗ Word categories: Directional Words ∗ Visual only | 648 | 0.007/0.297 |
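The marginal/conditional values in the last column are presumably R² statistics of the kind described by Nakagawa and Schielzeth for mixed-effects models; that reading is an assumption, as the table does not say so explicitly. As a rough sketch for a Gaussian model with a single random intercept, the marginal value is the share of variance explained by the fixed effects alone, and the conditional value adds the random-intercept variance. The function name and the variance components below are illustrative and are not taken from the study.

```python
def r2_mixed(var_fixed: float, var_random: float, var_residual: float):
    """Marginal and conditional R^2 for a random-intercept Gaussian LMM.

    var_fixed    -- variance of the fixed-effects predictions
    var_random   -- random-intercept variance (e.g. between subjects)
    var_residual -- residual variance
    """
    total = var_fixed + var_random + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    marginal = var_fixed / total                     # fixed effects only
    conditional = (var_fixed + var_random) / total   # fixed + random effects
    return marginal, conditional


# Made-up variance components, for illustration only.
marginal, conditional = r2_mixed(var_fixed=0.25, var_random=0.5, var_residual=0.25)
```

On this reading, a large gap between the two values (for example 0.250 versus 0.751 for Total Fixation Counts) indicates that much of the explained variance comes from differences between subjects rather than from the fixed factors.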
Appendix C. Subtitle-Level Analysis: Sanity Check

References
- Lo, N.P.K. Revolutionising language teaching and learning via digital media innovations. In Learning Environment and Design: Current and Future Impacts; Springer: Singapore, 2020; pp. 245–261.
- Chik, A. Digital gameplay for autonomous foreign language learning: Gamers’ and language teachers’ perspectives. In Digital Games in Language Learning and Teaching; Springer: London, UK, 2012; pp. 95–114.
- Wang, X.; Hamat, A.B.; Shi, N.L. Designing a pedagogical framework for mobile-assisted language learning. Heliyon 2024, 10, e28102.
- De Witte, B.; Reynaert, V.; Hutain, J.; Kieken, D.; Jabbour, J.; Possik, J. Immersive learning of factual knowledge while assessing the influence of cognitive load and spatial abilities. Comput. Educ. X Real. 2024, 5, 100085.
- Gee, J.P. What video games have to teach us about learning and literacy. Comput. Entertain. (CIE) 2003, 1, 20.
- Chik, A. Digital gaming and language learning: Autonomy and community. Lang. Learn. Technol. 2014, 18, 85–100.
- Nash, B.L.; Brady, R.B. Video games in the secondary English language arts classroom: A state-of-the-art review of the literature. Read. Res. Q. 2022, 57, 957–981.
- Green, C.S. Video games and cognitive skills. In Video Games; Routledge: London, UK, 2018; pp. 25–43.
- Tarchi, C.; Zaccoletti, S.; Mason, L. Learning from text, video, or subtitles: A comparative analysis. Comput. Educ. 2021, 160, 104034.
- Peters, E.; Heynen, E.; Puimège, E. Learning vocabulary through audiovisual input: The differential effect of L1 subtitles and captions. System 2016, 63, 134–148.
- Dizon, G.; Thanyawatpokin, B. Language learning with Netflix: Exploring the effects of dual subtitles on vocabulary learning and listening comprehension. Comput. Assist. Lang. Learn. Electron. J. 2021, 22, 52–65.
- Bisson, M.J.; Van Heuven, W.J.; Conklin, K.; Tunney, R.J. Processing of native and foreign language subtitles in films: An eye tracking study. Appl. Psycholinguist. 2014, 35, 399–418.
- De Linde, Z.; Kay, N. Processing subtitles and film images: Hearing vs deaf viewers. Translator 1999, 5, 45–60.
- Perego, E.; Del Missier, F.; Porta, M.; Mosconi, M. The cognitive effectiveness of subtitle processing. Media Psychol. 2010, 13, 243–272.
- Szarkowska, A.; Gerber-Morón, O. Two or three lines: A mixed-methods study on subtitle processing and preferences. Perspectives 2019, 27, 144–164.
- Szarkowska, A.; Ragni, V.; Szkriba, S.; Black, S.; Orrego-Carmona, D.; Kruger, J.L. Watching subtitled videos with the sound off affects viewers’ comprehension, cognitive load, immersion, enjoyment, and gaze patterns: A mixed-methods eye-tracking study. PLoS ONE 2024, 19, e0306251.
- Kruger, J.L.; Wisniewska, N.; Liao, S. Why subtitle speed matters: Evidence from word skipping and rereading. Appl. Psycholinguist. 2022, 43, 211–236.
- Liao, S.; Yu, L.; Reichle, E.D.; Kruger, J.L. Using eye movements to study the reading of subtitles in video. Sci. Stud. Read. 2021, 25, 417–435.
- Ryan, M.L. From narrative games to playable stories: Toward a poetics of interactive narrative. StoryWorlds J. Narrat. Stud. 2009, 1, 43–59.
- Gee, J.P. Learning by design: Games as learning machines. Interact. Educ. Multimed. IEM 2004, 8, 15–23.
- Wildfeuer, J.; Stamenković, D. The discourse structure of video games: A multimodal discourse semantics approach to game tutorials. Lang. Commun. 2022, 82, 28–51.
- Bowman, N.D. The demanding nature of video game play. In Video Games; Routledge: London, UK, 2018; pp. 1–24.
- von Gillern, S. Perceptual, decision-making, and learning processes during video gameplay: An analysis of Infamous-Second Son with the Gamer Response and Decision (GRAD) framework. In Proceedings of the Games and Learning Society 2016 Conference Proceedings, Madison, WI, USA, 16 August 2016.
- De Bruycker, W.; d’Ydewalle, G. Reading native and foreign language television subtitles in children and adults. In The Mind’s Eye; Elsevier: Amsterdam, The Netherlands, 2003; pp. 671–684.
- Frumuselu, A.D.; De Maeyer, S.; Donche, V.; Plana, M.d.M.G.C. Television series inside the EFL classroom: Bridging the gap between teaching and learning informal language through subtitles. Linguist. Educ. 2015, 32, 107–117.
- Paivio, A. Mental Representations: A Dual Coding Approach; Oxford University Press: Oxford, UK, 1990.
- Baddeley, A.D.; Hitch, G.J. Working Memory. In The Psychology of Learning and Motivation: Advances in Research and Theory; Bower, G.H., Ed.; Academic Press: New York, NY, USA, 1974; Volume 8, pp. 47–89.
- Baddeley, A.D.; Hitch, G.; Allen, R. A multicomponent model of working memory. In Working Memory: State of the Science; Oxford University Press: Oxford, UK, 2021; pp. 10–43.
- Mayer, R.E. Multimedia learning. In Psychology of Learning and Motivation; Elsevier: Amsterdam, The Netherlands, 2002; Volume 41, pp. 85–139.
- Mayer, R.E. The Cambridge Handbook of Multimedia Learning; Cambridge University Press: Cambridge, UK, 2005.
- Schnotz, W. An integrated model of text and picture comprehension. In The Cambridge Handbook of Multimedia Learning; Cambridge University Press: Cambridge, UK, 2005; Volume 49, p. 69.
- Loschky, L.C.; Larson, A.M.; Smith, T.J.; Magliano, J.P. The scene perception & event comprehension theory (SPECT) applied to visual narratives. Top. Cogn. Sci. 2020, 12, 311–351.
- Cohn, N. The architecture of visual narrative comprehension: The interaction of narrative structure and page layout in understanding comics. Front. Psychol. 2014, 5, 680.
- Cohn, N. Your brain on comics: A cognitive model of visual narrative comprehension. Top. Cogn. Sci. 2020, 12, 352–386.
- Liao, S.; Yu, L.; Kruger, J.L.; Reichle, E.D. The impact of audio on the reading of intralingual versus interlingual subtitles: Evidence from eye movements. Appl. Psycholinguist. 2022, 43, 237–269.
- Reichle, E.D.; Pollatsek, A.; Rayner, K. E–Z Reader: A cognitive-control, serial-attention model of eye-movement behavior during reading. Cogn. Syst. Res. 2006, 7, 4–22.
- Britt, M.A.; Durik, A.; Rouet, J.F. Reading contexts, goals, and decisions: Text comprehension as a situated activity. Discourse Processes 2022, 59, 361–378.
- Rouet, J.F.; Britt, M.A.; Durik, A.M. RESOLV: Readers’ representation of reading contexts and tasks. Educ. Psychol. 2017, 52, 200–215.
- Gillern, S.V. The Gamer Response and Decision Framework: A Tool for Understanding Video Gameplay Experiences. Simul. Gaming 2016, 47, 666–683.
- von Gillern, S.; Stufft, C. Multimodality, learning and decision-making: Children’s metacognitive reflections on their engagement with video games as interactive texts. Literacy 2023, 57, 3–16.
- Miyake, A.; Friedman, N.P.; Emerson, M.J.; Witzki, A.H.; Howerter, A.; Wager, T.D. The unity and diversity of executive functions and their contributions to complex “frontal lobe” tasks: A latent variable analysis. Cogn. Psychol. 2000, 41, 49–100.
- Miyake, A.; Friedman, N.P. The nature and organization of individual differences in executive functions: Four general conclusions. Curr. Dir. Psychol. Sci. 2012, 21, 8–14.
- Prants, M.J.; Simpson, J.; Nalepka, P.; Kallen, R.W.; Dras, M.; Reichle, E.D.; Hosking, S.; Best, C.J.; Richardson, M.J. The structure of team search behaviors with varying access to information. In Proceedings of the Annual Meeting of the Cognitive Science Society, Vienna, Austria, 26–29 July 2021; Volume 43.
- Simpson, J.; Nalepka, P.; Kallen, R.W.; Dras, M.; Reichle, E.D.; Hosking, S.G.; Best, C.; Richards, D.; Richardson, M.J. Conversation dynamics in a multiplayer video game with knowledge asymmetry. Front. Psychol. 2022, 13, 1039431.
- Nalepka, P.; Prants, M.; Stening, H.; Simpson, J.; Kallen, R.W.; Dras, M.; Reichle, E.D.; Hosking, S.G.; Best, C.; Richardson, M.J. Assessing team effectiveness by how players structure their search in a first-person multiplayer video game. Cogn. Sci. 2022, 46, e13204.
- Simpson, J.; Stening, H.; Nalepka, P.; Dras, M.; Reichle, E.D.; Hosking, S.; Best, C.J.; Richards, D.; Richardson, M.J. DesertWoZ: A Wizard of Oz environment to support the design of collaborative conversational agents. In Proceedings of the Companion Publication of the 2022 Conference on Computer Supported Cooperative Work and Social Computing, Virtual, 8–22 November 2022; pp. 188–192.
- Kruger, J.L.; Steyn, F. Subtitles and eye tracking: Reading and performance. Read. Res. Q. 2014, 49, 105–120.
- Mangiron, C. Reception of game subtitles: An empirical study. Translator 2016, 22, 72–93.
- Baayen, R.H.; Davidson, D.J.; Bates, D.M. Mixed-effects modeling with crossed random effects for subjects and items. J. Mem. Lang. 2008, 59, 390–412.
- Faraway, J.J. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models; Chapman and Hall/CRC: Boca Raton, FL, USA, 2016.
- Nakagawa, S.; Schielzeth, H. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods Ecol. Evol. 2013, 4, 133–142.
- Bates, D.; Mächler, M.; Bolker, B.; Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 2015, 67, 1–48.
- Kuznetsova, A.; Brockhoff, P.B.; Christensen, R.H. lmerTest package: Tests in linear mixed effects models. J. Stat. Softw. 2017, 82, 1–26.
- Powell, M.J. The BOBYQA Algorithm for Bound Constrained Optimization Without Derivatives; Cambridge NA Report NA2009/06; University of Cambridge: Cambridge, UK, 2009; Volume 26, pp. 26–46.
- Nagle, C. An Introduction to fitting and evaluating mixed-effects models in R. Pronunciation Second. Lang. Learn. Teach. Proc. 2018, 10, 82–105.
- Barr, D.J. Random effects structure for testing interactions in linear mixed-effects models. Front. Psychol. 2013, 4, 328.
- Lenth, R.; Lenth, M.R. Package ‘lsmeans’. Am. Stat. 2018, 34, 216–221.
- León, J.A.; Moreno, J.D.; Escudero, I.; Olmos, R.; Ruiz, M.; Lorch, R.F., Jr. Specific relevance instructions promote selective reading strategies: Evidences from eye tracking and oral summaries. J. Res. Read. 2019, 42, 432–453.
- Rayner, K. Eye movements and attention in reading, scene perception, and visual search. Q. J. Exp. Psychol. 2009, 62, 1457–1506.
- Fitzsimmons, G.; Weal, M.; Drieghe, D. Skim Reading: An Adaptive Strategy for Reading on the Web. In Proceedings of the 2014 ACM Conference on Web Science (WebSci’14); ACM: New York, NY, USA, 2014; pp. 211–219.
- Liu, Y.; Wang, C.; Zhou, K.; Nie, J.; Zhang, M.; Ma, S. From skimming to reading: A two-stage examination model for web search. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, Shanghai, China, 3–7 November 2014; pp. 849–858.
- Yang, Q.; de Melo, G.; Cheng, Y.; Wang, S. HiText: Text reading with dynamic salience marking. In Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, 3–7 April 2017; pp. 311–319.
- Fitzsimmons, G.; Jayes, L.T.; Weal, M.J.; Drieghe, D. The impact of skim reading and navigation when reading hyperlinks on the web. PLoS ONE 2020, 15, e0239134.





| Eye-Movement Measures | Category | Contrast | Estimate | SE | z/t Ratio | p-Value |
|---|---|---|---|---|---|---|
| Word Skipping Rates | Directional Words | Visual only/Visual–auditory | −0.28 | 0.06 | −4.52 | <0.0001 |
| | Names | Visual only/Visual–auditory | −0.59 | 0.07 | −8.81 | <0.0001 |
| | Target N | Visual only/Visual–auditory | −0.39 | 0.07 | −5.70 | <0.0001 |
| Fixation Counts | Directional Words | Visual only/Visual–auditory | 0.22 | 0.08 | 2.97 | 0.00 |
| | Names | Visual only/Visual–auditory | 0.65 | 0.14 | 4.49 | <0.0001 |
| | Target N | Visual only/Visual–auditory | 0.56 | 0.14 | 3.96 | 0.00 |

| Measure | Modality | Contrast | Estimate | SE | z/t Ratio | p-Value |
|---|---|---|---|---|---|---|
| Word Skipping Rates | Visual only modality | Directional Words–Names | −0.03 | 0.06 | −0.53 | 0.59 |
| | | Directional Words–Target N | 0.20 | 0.06 | 3.31 | 0.00 |
| | | Names–Target N | 0.23 | 0.06 | 3.81 | 0.00 |
| | Visual–auditory modality | Directional Words–Names | −0.34 | 0.07 | −4.92 | <0.0001 |
| | | Directional Words–Target N | 0.09 | 0.07 | 1.37 | 0.17 |
| | | Names–Target N | 0.44 | 0.07 | 5.98 | <0.0001 |
| Fixation Counts | Visual only modality | Directional Words–Names | −0.06 | 0.12 | −0.51 | 0.61 |
| | | Directional Words–Target N | 0.24 | 0.11 | 2.11 | 0.04 |
| | | Names–Target N | 0.30 | 0.14 | 2.08 | 0.04 |
| | Visual–auditory modality | Directional Words–Names | 0.37 | 0.12 | 3.15 | 0.00 |
| | | Directional Words–Target N | 0.58 | 0.11 | 5.07 | <0.0001 |
| | | Names–Target N | 0.21 | 0.14 | 1.49 | 0.14 |
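Several p-values in the contrast tables are printed as 0.00, which reflects rounding to two decimals rather than a probability of zero. As a rough check, a two-sided p-value can be recovered from a reported ratio under a standard normal reference distribution. This is an assumption: the column is labelled "z/t Ratio", and where a ratio is a t statistic the normal approximation is only close when the degrees of freedom are large. The function name is illustrative.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a z ratio under the standard normal distribution.

    Uses the identity P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


# Ratio reported for Fixation Counts, Directional Words,
# Visual only/Visual-auditory (printed p-value: 0.00).
p = two_sided_p_from_z(2.97)
print(f"{p:.4f}")  # a small positive value that rounds to 0.00 at two decimals
```

Read this way, entries printed as 0.00 are best understood as p < 0.005.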
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lan, H.; Liao, S.; Kruger, J.-L.; Richardson, M.J. Processing Written Language in Video Games: An Eye-Tracking Study on Subtitled Instructions. J. Eye Mov. Res. 2025, 18, 44. https://doi.org/10.3390/jemr18050044