Natural Language Processing as a Scalable Method for Evaluating Educational Text Personalization by LLMs
Abstract
1. Introduction
1.1. Personalized Learning
1.2. Text Personalization Evaluation Using Natural Language Processing
1.3. Current Research
2. Materials and Methods
2.1. LLM Selection and Implementation Details
2.2. Text Corpus
2.3. Descriptions of Reader Profiles
2.4. Procedure
3. Results
3.1. Main Effect of Reader Profile on Variations in Linguistic Features of Modified Texts
3.2. Main Effect of LLMs
3.3. Main Effect of Text Types
3.4. Interaction Effect Reader × Text Genre
4. Discussion
4.1. Texts Adapted for Different Reader Profiles
4.2. LLMs Generated Outputs with Unique Linguistic Patterns
4.3. Linguistic Differences in Adapted Texts from Science Versus History Domain
4.4. Implications
4.5. Limitations and Future Directions
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Full Term |
|---|---|
| PK | Prior Knowledge |
| RS | Reading Skills |
| GenAI | Generative AI |
| LLM | Large Language Model |
| AFS | Adaptive Feedback System |
| RAG | Retrieval-Augmented Generation |
| NLP | Natural Language Processing |
| iSTART | Interactive Strategy Training for Active Reading and Thinking |
| FKGL | Flesch–Kincaid Grade Level |
| WAT | Writing Analytics Tool |
Appendix A. LLM Descriptions
| Model | Claude 3.5 Sonnet | Llama 3.1 | ChatGPT 4 | Gemini Pro 1.5 |
|---|---|---|---|---|
| Owned by | Anthropic | Meta | OpenAI | Google DeepMind |
| Training Size | Anthropic does not disclose the exact number, but the training data consist of large, publicly available, and curated online sources. | 2 trillion tokens sourced from publicly available datasets (e.g., books, websites, and other digital content) | 1.8 trillion tokens from diverse sources, including books, web pages, academic papers, and large text corpora | 1.5 trillion tokens, sourced from a wide variety of publicly available and curated data, including text from books, websites, and other large corpora |
| Number of Parameters | Estimated between 70 and 100 billion parameters (not officially disclosed) | 70 billion parameters | Not publicly disclosed; estimated at approximately 175 billion parameters | 100 billion parameters |
Appendix B. Reader Profile Descriptions
| Reader Profile | Description of Reader for Science Texts | Description of Reader for History Texts |
|---|---|---|
| Reader 1 (High RS/High PK *) | Age: 25; Educational level: Senior; Major: Chemistry (Pre-med); ACT English composite score: 32/36 (96th percentile); ACT Reading composite score: 32/36 (96th percentile); ACT Math composite score: 28/36 (89th percentile); ACT Science composite score: 30/36 (94th percentile); Science background: Completed eight required biology, physics, and chemistry college-level courses (comprehensive academic background in the sciences, covering advanced topics in biology, chemistry, and physics; well prepared for higher-level scientific learning and analysis); Reading goal: Understand scientific concepts and principles | Age: 25; Educational level: Senior; Major: History and Archeology; ACT English: 32/36 (96th percentile); ACT Reading: 33/36 (97th percentile); AP History score: 5 out of 5; History background: Completed 4 years of college-level courses in U.S. and World History (extensive training in historical analysis, primary source evaluation, and historiography); Reading goal: Understand key historical events and their relevance to society |
| Reader 2 (High RS/Low PK *) | Age: 20; Educational level: Sophomore; Major: Psychology; ACT English composite score: 32/36 (96th percentile); ACT Reading composite score: 31/36 (94th percentile); ACT Math composite score: 18/36 (42nd percentile); ACT Science composite score: 19/36 (46th percentile); Science background: Completed one high-school-level chemistry course (no advanced science courses); limited exposure to and understanding of scientific concepts; Interests/Favorite subjects: arts, literature; Reading goal: Understand scientific concepts and principles | Age: 21; Educational level: Junior; Major: Biology; ACT English: 32/36 (96th percentile); ACT Reading: 31/36 (94th percentile); AP History score: 2 out of 5; History background: Completed general education high school history; no college-level history courses; limited interest in and knowledge of historical events; Interests/Favorite subjects: arts, literature; Reading goal: Understand key historical events and their relevance to society |
| Reader 3 (Low RS/High PK *) | Age: 20; Educational level: Sophomore; Major: Health Science; ACT English composite score: 19/36 (44th percentile); ACT Reading composite score: 20/36 (47th percentile); ACT Math composite score: 32/36 (97th percentile); ACT Science composite score: 30/36 (94th percentile); Science background: Completed one physics, one astronomy, and two college-level biology courses (substantial prior knowledge in science across several disciplines; strong foundation in scientific principles and concepts); Reading goal: Understand scientific concepts; Reading disability: Dyslexia | Age: 22; Educational level: Junior; Major: History; ACT English: 19/36 (44th percentile); ACT Reading: 20/36 (47th percentile); AP History score: 5 out of 5; History background: Completed 3 years of college-level history courses (specializing in U.S. history and early modern Europe); Reading goal: Understand key historical events and their relevance to society; Reading disability: Dyslexia |
| Reader 4 (Low RS/Low PK *) | Age: 18; Educational level: Freshman; Major: Marketing; ACT English composite score: 17/36 (33rd percentile); ACT Reading composite score: 18/36 (36th percentile); ACT Math composite score: 19/36 (48th percentile); ACT Science composite score: 17/36 (34th percentile); Science background: Completed one high-school-level biology course (no advanced science courses); limited exposure to and understanding of scientific concepts; Reading goal: Understand scientific concepts | Age: 18; Educational level: Freshman; Major: Finance; ACT English: 18/36 (35th percentile); ACT Reading: 17/36 (32nd percentile); AP History: 1 out of 5; History background: Only completed basic U.S. History in high school; little engagement or interest in history topics; Reading goal: Understand key historical events and their relevance to society |
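To make the structure of these profiles concrete, the sketch below shows one way the attributes in the table above could be encoded and rendered into the plain-text block inserted into a prompt. This is an illustrative sketch only; the class, field names, and `to_prompt_text` helper are assumptions and not part of the study's materials.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ReaderProfile:
    """Illustrative container for the reader attributes listed in Appendix B."""
    label: str                   # e.g., "Reader 2 (High RS/Low PK)"
    age: int
    educational_level: str
    major: str
    test_scores: Dict[str, str]  # e.g., {"ACT English": "32/36 (96th percentile)"}
    background: str              # domain background in science or history
    reading_goal: str
    interests: List[str] = field(default_factory=list)
    reading_disability: Optional[str] = None

    def to_prompt_text(self) -> str:
        """Render the profile as the plain-text block inserted into the prompt."""
        lines = [
            f"Age: {self.age}",
            f"Educational level: {self.educational_level}",
            f"Major: {self.major}",
        ]
        lines += [f"{test}: {score}" for test, score in self.test_scores.items()]
        lines.append(f"Background: {self.background}")
        if self.interests:
            lines.append("Interests/Favorite subjects: " + ", ".join(self.interests))
        lines.append(f"Reading goal: {self.reading_goal}")
        if self.reading_disability:
            lines.append(f"Reading disability: {self.reading_disability}")
        return "\n".join(lines)
```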
Appendix C. Prompt Used
| Components | Augmented Prompt |
|---|---|
| Personification | Imagine you are a cognitive scientist specializing in reading comprehension and learning science |
| Task objectives | |
| Chain-of-thought | Explain the rationale behind each modification approach and how each change helps the reader grasp the scientific concepts and retain information |
| RAG | Refer to the attached PDF files. Apply the empirical findings and theoretical frameworks from these files as guidelines to tailor the text |
| Reader profile | [Insert Reader Profile Description from Appendix B] |
| Text input | [Insert Text] |
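As a rough illustration of how the components in the table above could be assembled into a single augmented prompt, the sketch below concatenates them in order. The function name and ordering are assumptions; the exact wording of the task objectives was not reproduced in the table, and the retrieval of the attached PDF guidelines is not implemented here.

```python
def build_augmented_prompt(task_objectives: str,
                           reader_profile_text: str,
                           source_text: str) -> str:
    """Assemble the Appendix C prompt components (illustrative sketch only)."""
    persona = ("Imagine you are a cognitive scientist specializing in "
               "reading comprehension and learning science.")
    chain_of_thought = ("Explain the rationale behind each modification approach and how "
                        "each change helps the reader grasp the scientific concepts and "
                        "retain information.")
    rag_instruction = ("Refer to the attached PDF files. Apply the empirical findings and "
                       "theoretical frameworks from these files as guidelines to tailor the text.")
    return "\n\n".join([
        persona,
        task_objectives,   # exact wording not reproduced in Appendix C above
        chain_of_thought,
        rag_instruction,
        "Reader profile:\n" + reader_profile_text,
        "Text to adapt:\n" + source_text,
    ])
```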
Appendix D. Quality Assessment Rubric
- Readability: Does the text use an appropriate reading level and sentence length for the intended audience? Examine whether the syntax, vocabulary, and tone are accessible to readers of varying skill and knowledge levels.
- Structure and Organization: Is information presented in a clear, logical sequence that supports understanding? Evaluate the structure for cohesion across paragraphs, the use of headings and transitions, and the ease with which readers can follow the text.
- Cohesion and Flow: Are titles, headings, and subheadings used effectively to guide readers through the material?
- Language Use: Is the language appropriate for the reader’s background knowledge and interests? Assess the formality, specificity, and engagement level of the text.
- Engagement: Does the writing capture and sustain the reader’s interest? Consider whether the content, tone, and examples make the reading experience enjoyable and relatable.
- Reader Perspective: Reviewers also considered the question, “If I were the student, would this text hold my attention and make me want to keep reading?”
- Clarity and Precision: Are key concepts explained clearly, concisely, and without ambiguity? Consider whether definitions, explanations, and descriptions are phrased in a way that minimizes confusion.
- Depth and Rigor: Does the level of detail match the reader’s background knowledge? Does each adaptation provide sufficient technical detail and conceptual depth without overwhelming the intended audience?
- Use of Examples: Rate the examples’ relevance and explanatory value. High-quality examples should concretely reinforce abstract ideas and help the reader connect new information to prior knowledge.
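For reviewers who want to record ratings systematically, the sketch below shows one possible way to capture scores on the nine rubric dimensions above. The dimension names mirror the rubric, but the record structure and the implied 1–5 scale are assumptions, not part of the published rubric.

```python
from typing import Dict

# Dimension names taken directly from the Appendix D rubric.
RUBRIC_DIMENSIONS = [
    "Readability", "Structure and Organization", "Cohesion and Flow",
    "Language Use", "Engagement", "Reader Perspective",
    "Clarity and Precision", "Depth and Rigor", "Use of Examples",
]

def new_rating_record(text_id: str, reviewer: str) -> Dict[str, object]:
    """Create an empty rating record; scores are assumed to use a 1-5 scale."""
    return {
        "text_id": text_id,
        "reviewer": reviewer,
        "scores": {dimension: None for dimension in RUBRIC_DIMENSIONS},
        "comments": "",
    }
```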



| Features | Metrics and Descriptions |
|---|---|
| Writing Style | Academic writing *: The extent to which texts include domain-specific terminology and complex sentence structures typical of academic discourse. Texts with higher academic writing measures reflect a more formal style that can create difficulty for less-skilled and low-knowledge readers. |
| Conceptual Density and Cohesion | Lexical density: The extent to which the text contains sentences with dense and precise information, including complex noun phrases and sophisticated words. Texts with high lexical density convey more information per sentence but may require greater effort to process. Noun-to-verb ratio *: A high noun-to-verb ratio signals densely packed information and complex sentences. High nominalization is characteristic of academic discourse and can be challenging for struggling readers. Sentence cohesion *: The degree to which sentences are explicitly connected through connectives (e.g., because, therefore, or in addition) or cohesion cues (e.g., overlapping ideas and concepts). Cohesive texts support comprehension, especially for low-knowledge readers, who rely on explicit textual cues to infer meaning. |
| Syntax Complexity | Sentence length *: Longer sentences often have multiple clauses and embedded phrases, increasing syntactic complexity and potentially hindering comprehension for less skilled readers. Language variety *: The extent to which the text contains a variety of lexical and syntactic structures. High language variety enhances stylistic richness and engagement, while low variety can be monotonous but simplifies comprehension. |
| Lexical Complexity | Word concreteness: The degree to which words refer to physical objects or experiences that can be directly perceived. High measures indicate that the text contains more tangible words, while low measures indicate more abstract concepts that can be difficult for novices. Sophisticated wording *: Lower measures indicate common, familiar vocabulary, whereas higher measures indicate more advanced words. Sophisticated vocabulary enriches expression and academic tone but can reduce readability for readers with less knowledge. Academic frequency *: The extent to which the text uses vocabulary that is common in academic writing. High academic frequency indicates technical or scholarly language that requires greater background knowledge to comprehend. |
| Connectives | All connectives: The overall density of linking words and phrases (e.g., however, therefore, then, in addition). Higher values indicate that the text overtly guides the reader through logical, additive, contrastive, temporal, or causal relations, increasing cohesion; lower values imply that relationships must be inferred from context. Temporal connectives: Markers that place events on a timeline (e.g., then, meanwhile, during, subsequently). Causal connectives: Markers that signal cause-and-effect or reasoning links (e.g., because, since, therefore, thus, as a result). |
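The study computed these features with the Writing Analytics Tool (WAT); as an illustration only, the sketch below approximates two of the simpler metrics in the table above, the noun-to-verb ratio and overall connective density, using spaCy. The connective list is a small assumed sample of single-word markers, not WAT's actual lexicon, so the resulting values will not match the reported measures exactly.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline

# Assumed, abbreviated single-word connective list for illustration only.
CONNECTIVES = {"because", "since", "therefore", "thus", "however",
               "then", "meanwhile", "subsequently", "moreover", "consequently"}

def noun_to_verb_ratio(text: str) -> float:
    """Approximate the noun-to-verb ratio from part-of-speech tags."""
    doc = nlp(text)
    nouns = sum(1 for tok in doc if tok.pos_ in ("NOUN", "PROPN"))
    verbs = sum(1 for tok in doc if tok.pos_ == "VERB")
    return nouns / verbs if verbs else float("nan")

def connective_density(text: str) -> float:
    """Proportion of alphabetic tokens that are (single-word) connectives."""
    doc = nlp(text)
    tokens = [tok.text.lower() for tok in doc if tok.is_alpha]
    hits = sum(1 for tok in tokens if tok in CONNECTIVES)
    return hits / len(tokens) if tokens else 0.0
```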
| Domain | Topic | Text Titles | Word Count | FKGL * |
|---|---|---|---|---|
| Science | Biology | Bacteria | 468 | 12.10 |
| Science | Biology | The Cells | 426 | 11.61 |
| Science | Biology | Microbes | 407 | 14.38 |
| Science | Biology | Genetic Equilibrium | 441 | 12.61 |
| Science | Biology | Food Webs | 492 | 12.06 |
| Science | Biology | Patterns of Evolution | 341 | 15.09 |
| Science | Biology | Causes and Effects of Mutations | 318 | 11.35 |
| Science | Biochemistry | Photosynthesis | 427 | 11.44 |
| Science | Chemistry | Chemistry of Life | 436 | 12.71 |
| Science | Physics | What are Gravitational Waves? | 359 | 16.51 |
| History | American History | Battle of Saratoga | 424 | 9.86 |
| History | American History | Battles of New York | 445 | 11.77 |
| History | American History | Battles of Lexington and Concord | 483 | 12.85 |
| History | American History | Emancipation Proclamation | 271 | 13.40 |
| History | American History | House of Burgesses | 200 | 12.80 |
| History | American History | Abraham Lincoln—Rise to Presidency | 631 | 12.15 |
| History | American History | George Washington | 260 | 9.79 |
| History | French and American History | Marquis de Lafayette | 356 | 13.78 |
| History | Dutch and American History | New York (New Amsterdam) Colony | 403 | 12.97 |
| History | World History | Age of Exploration | 490 | 10.49 |
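For reference, the FKGL column above reports the standard Flesch–Kincaid Grade Level, computed as 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The sketch below implements this formula with a crude syllable heuristic; it is for illustration only and may differ slightly from the tool the authors used.

```python
import re

def count_syllables(word: str) -> int:
    """Crude vowel-group heuristic; dedicated readability tools use better estimators."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level of a passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59
```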
Main effect of reader profile on linguistic features of the adapted texts:
| Linguistic Features | Reader 1 (High RS/High PK **) M | Reader 1 SD | Reader 2 (High RS/Low PK **) M | Reader 2 SD | Reader 3 (Low RS/High PK **) M | Reader 3 SD | Reader 4 (Low RS/Low PK **) M | Reader 4 SD | F (3, 303) | p | η² |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Academic Writing * | 75.84 | 24.74 | 51.66 | 26.48 | 33.06 | 27.15 | 34.30 | 22.96 | 121.25 | <0.001 | 0.38 |
| Language Variety * | 80.73 | 19.21 | 50.76 | 20.57 | 27.72 | 17.39 | 30.33 | 18.44 | 251.32 | <0.001 | 0.55 |
| Lexical Density * | 0.68 | 0.12 | 0.61 | 0.12 | 0.59 | 0.11 | 0.58 | 0.10 | 226.13 | <0.001 | 0.53 |
| Sentence Cohesion * | 32.86 | 28.89 | 54.75 | 29.93 | 55.83 | 22.68 | 60.45 | 26.92 | 35.11 | <0.001 | 0.15 |
| Noun-to-Verb Ratio * | 2.79 | 0.46 | 2.53 | 0.55 | 2.54 | 0.72 | 1.84 | 0.34 | 119.86 | <0.001 | 0.37 |
| Sentence Length * | 18.62 | 5.97 | 14.78 | 5.49 | 14.59 | 4.47 | 13.53 | 4.11 | 61.98 | <0.001 | 0.23 |
| Word Concreteness * | 29.86 | 17.79 | 50.52 | 25.63 | 55.18 | 27.21 | 60.76 | 24.96 | 57.26 | <0.001 | 0.22 |
| Sophisticated Word * | 88.85 | 9.52 | 51.12 | 21.09 | 29.05 | 17.64 | 23.42 | 16.06 | 603.28 | <0.001 | 0.75 |
| Academic Frequency * | 2.78 | 0.01 | 2.77 | 0.01 | 2.73 | 0.01 | 2.72 | 0.01 | 12.41 | <0.001 | 0.06 |
| Causal Connectives * | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 3.11 | 0.03 | 0.02 |
| Temporal Connectives * | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.79 | 0.50 | 0.00 |
| All Connectives * | 0.05 | 0.01 | 0.05 | 0.01 | 0.05 | 0.01 | 0.05 | 0.01 | 3.54 | 0.02 | 0.02 |
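As a schematic illustration of how the F, p, and η² values in this and the following tables could be obtained, the sketch below runs a one-way ANOVA for a single linguistic feature across the four reader profiles with statsmodels and derives eta-squared from the sums of squares. The data frame and column names are assumptions, and the published analysis may use a different model specification (the reported denominator df of 303 suggests additional factors in the design).

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

def profile_main_effect(df: pd.DataFrame, feature: str) -> dict:
    """One-way ANOVA of a linguistic feature across reader profiles.

    Assumes one row per adapted text, with a categorical `profile` column
    (Reader 1-4) and a numeric column named by `feature` (no spaces).
    """
    model = ols(f"{feature} ~ C(profile)", data=df).fit()
    table = anova_lm(model, typ=2)
    ss_effect = table.loc["C(profile)", "sum_sq"]
    ss_total = table["sum_sq"].sum()
    return {
        "F": table.loc["C(profile)", "F"],
        "p": table.loc["C(profile)", "PR(>F)"],
        "eta_squared": ss_effect / ss_total,
    }

# Example usage with a hypothetical data frame of adapted-text measures:
# results = profile_main_effect(adapted_texts_df, "sentence_length")
```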
Main effect of LLM on linguistic features of the adapted texts:
| Linguistic Features | Claude M | Claude SD | Llama M | Llama SD | Gemini M | Gemini SD | ChatGPT M | ChatGPT SD | F (3, 303) | p | η² |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Academic Writing | 48.42 | 31.73 | 53.28 | 30.50 | 45.78 | 30.98 | 47.37 | 29.23 | 1.98 | 0.12 | 0.01 |
| Language Variety * | 47.16 | 29.78 | 40.04 | 27.99 | 54.42 | 28.78 | 47.92 | 25.37 | 13.15 | <0.001 | 0.06 |
| Lexical Density * | 0.62 | 0.12 | 0.61 | 0.12 | 0.62 | 0.12 | 0.61 | 0.12 | 5.26 | 0.001 | 0.03 |
| Sentence Cohesion * | 63.35 | 28.59 | 41.55 | 27.20 | 47.94 | 27.31 | 51.05 | 29.52 | 21.73 | <0.001 | 0.10 |
| Noun-to-Verb Ratio * | 2.56 | 0.84 | 2.38 | 0.56 | 2.44 | 0.57 | 2.33 | 0.51 | 7.88 | <0.001 | 0.17 |
| Sentence Length * | 12.46 | 4.38 | 16.32 | 5.15 | 16.38 | 5.09 | 16.35 | 5.88 | 42.71 | <0.001 | 0.17 |
| Word Concreteness | 47.33 | 26.23 | 46.25 | 26.89 | 51.94 | 27.85 | 50.80 | 26.00 | 1.60 | 0.189 | 0.10 |
| Sophisticated Word * | 47.30 | 32.20 | 46.70 | 27.88 | 49.38 | 31.52 | 49.05 | 30.82 | 2.72 | 0.04 | 0.01 |
| Academic Frequency * | 2.71 | 0.01 | 2.77 | 0.01 | 2.73 | 0.01 | 2.80 | 0.01 | 17.57 | <0.001 | 0.08 |
| Causal Connectives | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.56 | 0.64 | 0.00 |
| Temporal Connectives * | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 11.87 | <0.001 | 0.06 |
| All Connectives * | 0.05 | 0.01 | 0.05 | 0.01 | 0.05 | 0.01 | 0.05 | 0.01 | 4.03 | 0.007 | 0.02 |
Main effect of text type (science vs. history) on linguistic features of the adapted texts:
| Linguistic Features | Science Texts M | Science Texts SD | History Texts M | History Texts SD | F (1, 303) | p | η² |
|---|---|---|---|---|---|---|---|
| Academic Writing | 50.08 | 27.98 | 47.34 | 33.15 | 2.80 | 0.10 | 0.01 |
| Language Variety | 47.06 | 29.30 | 47.71 | 27.56 | 0.84 | 0.36 | 0.00 |
| Lexical Density * | 0.72 | 0.06 | 0.51 | 0.05 | 5743.50 | <0.001 | 0.90 |
| Sentence Cohesion | 51.26 | 28.63 | 50.68 | 29.80 | 0.09 | 0.76 | 0.00 |
| Noun-to-Verb Ratio * | 2.34 | 0.58 | 2.51 | 0.68 | 18.31 | <0.001 | 0.03 |
| Sentence Length * | 17.67 | 5.21 | 13.09 | 4.59 | 254.88 | <0.001 | 0.30 |
| Word Concreteness | 49.53 | 28.42 | 48.62 | 25.10 | 0.12 | 0.73 | 0.00 |
| Sophisticated Word * | 49.27 | 30.95 | 46.94 | 30.25 | 5.00 | 0.03 | 0.01 |
| Academic Frequency * | 2.80 | 0.01 | 2.81 | 0.01 | 97.11 | <0.001 | 0.14 |
| Causal Connectives * | 0.01 | 0.01 | 0.00 | 0.00 | 78.44 | <0.001 | 0.11 |
| Temporal Connectives * | 0.01 | 0.01 | 0.01 | 0.01 | 17.01 | <0.001 | 0.03 |
| All Connectives * | 0.06 | 0.01 | 0.05 | 0.01 | 26.50 | <0.001 | 0.04 |

