Article

GenAI-Powered Text Personalization: Natural Language Processing Validation of Adaptation Capabilities

Learning Engineering Institute, Arizona State University, 120 Cady Mall, Tempe, AZ 85281, USA
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(12), 6791; https://doi.org/10.3390/app15126791
Submission received: 26 April 2025 / Revised: 10 June 2025 / Accepted: 11 June 2025 / Published: 17 June 2025
(This article belongs to the Special Issue Applied Intelligence in Natural Language Processing)

Abstract

The authors conducted two experiments to assess the alignment between Generative AI (GenAI) text personalization and hypothetical readers’ profiles. In Experiment 1, four LLMs (i.e., Claude 3.5 Sonnet, Llama, Gemini Pro 1.5, and ChatGPT 4) were prompted to tailor 10 science texts (i.e., biology, chemistry, and physics) to accommodate four different profiles varying in knowledge, reading skills, and learning goals. Natural Language Processing (NLP) was leveraged to evaluate the GenAI-adapted texts using an array of linguistic and semantic features empirically associated with text readability. NLP analyses revealed variations in the degree to which the LLMs successfully adjusted linguistic features to suit reader profiles. Most notably, NLP highlighted inconsistent alignment between potential reader abilities and text complexity. The results pointed toward the need to augment the AI prompts using personification, chain-of-thought, and documents regarding text comprehension, text readability, and individual differences (i.e., leveraging RAG). The resulting text modifications in Experiment 2 were better aligned with readers’ profiles. Augmented prompts resulted in LLM modifications with more appropriate cohesion features tailored to high- and low-knowledge readers for optimal comprehension. This study demonstrates how LLMs can be prompted to modify text and uniquely demonstrates the application of NLP to evaluate theory-driven content personalization using GenAI. NLP offers an efficient, real-time solution to validate personalized content across multiple domains and contexts.

1. Introduction

Reading is a fundamental learning activity that supports students in acquiring new knowledge and developing expertise across domains [1]. To build a coherent understanding of complex topics, readers must integrate information from texts into an interconnected mental model [2]. While learning from texts is essential for academic success, it can be challenging for students who lack prior knowledge of the topic or sufficient reading skills to understand the content [3,4,5]. Furthermore, reading comprehension is influenced by both reader characteristics and textual features. Text readability is the extent to which a reader can process and understand a text [6,7]. The same text can be easy or challenging to a reader depending on their reading skills and familiarity with the topic. Given the significance of reading comprehension in learning and the variability in reader characteristics, personalized learning has emerged to address students’ diverse needs [8,9,10].

1.1. Personalized Learning

Personalized learning involves matching students with materials at an appropriate difficulty level and tailoring content to their unique interests, thereby enhancing comprehension and engagement while supporting vocabulary acquisition, skill development, and overall academic success [11,12]. When students perceive learning materials as relevant and aligned with their interests, they are more likely to persist through challenges and actively engage in learning activities, even when the content is difficult [13,14]. Moreover, students engaged in personalized learning experiences demonstrate improved knowledge retention and higher gains in mathematics and reading performance compared to those in traditional learning settings [15,16]. For example, tailoring algebra problems based on students’ interests has been shown to enhance students’ motivation, engagement, and learning outcomes [17].
Personalized learning has also emerged as a strategy for aligning text readability and writing style with individual needs [18]. Reading texts suited to students’ abilities also fosters interest, motivation, and a passion for learning [19,20]. Prior research has demonstrated the importance and efficacy of matching texts to readers’ abilities to optimize comprehension and learning outcomes [21,22]. To effectively align reading materials with students’ reading capabilities, the Lexile Framework is a widely used text–reader matching tool that assesses children’s reading skills and matches them with texts based on a readability formula [23,24]. This framework quantifies text difficulty based on factors such as syntactic complexity, sentence length, and word frequency, while reader skill is assessed using standardized assessments [24,25].
While established frameworks like Lexile [23,24,25] have demonstrated effectiveness, advancements in artificial intelligence (AI) provide promising tools to implement personalized learning at scale. Generative AI (GenAI) plays a crucial role in advancing personalized learning by dynamically tailoring learning materials to students’ needs and abilities. Large Language Models (LLMs) have been applied to adapt writing style, simplify content, and generate materials tailored to individual needs [26,27]. For instance, in educational contexts, LLMs have been leveraged to simplify content and adjust lexical and syntactic complexity to enhance accessibility for students with learning disabilities [28]. Similarly, researchers have leveraged LLMs and OpenAI’s API to automatically generate personalized reading materials and comprehension assessments within learning management systems [29,30,31]. These personalized experiences resulted in higher student engagement and increased study time, ultimately enhancing academic performance and outcomes [32,33,34].
Although the positive impact of personalization on learners’ engagement and performance is well established, implementing GenAI-powered personalization in the classroom remains challenging. While text adaptation using LLMs offers a scalable solution to personalize learning, evaluating the effectiveness of modifications can be complicated. Effective text personalization requires rapid validation and iterative refinement to ensure that materials continuously align with students’ evolving needs and learning goals. Traditional assessment (e.g., human comprehension studies) is both time-consuming and resource-intensive, which makes it impractical to implement on a large scale. To address this challenge, an automated and scalable validation method is necessary to evaluate the extent to which personalized content aligns with students’ varied needs and assets.

1.2. Evaluating Text Differences Using Natural Language Processing

Natural Language Processing (NLP) is a computational approach that systematically analyzes and extracts patterns from text [7]. In educational and psychological research, NLP tools have been widely applied to automatically evaluate linguistic and semantic features of texts. NLP analyses are extensively used in computer-based learning systems to evaluate text quality and assess student-generated responses (e.g., self-explanations, think-aloud responses, essays, and open-ended answers) [8]. These linguistic analyses provide insights into task performance, cognitive attributes, and individual differences such as literacy skills and prior knowledge [35,36].
Another application of NLP analyses is text readability assessment. Text readability is the extent to which a reader can process and understand a text, and it is quantifiable based on various linguistic indices (e.g., cohesion, noun-to-verb ratio, and syntactic and lexical complexity; [8,37]). NLP tools (e.g., Coh-Metrix; [37]) have been leveraged to extract and analyze features that influence text readability based on discourse processing theories. For instance, cohesion, syntactic complexity, and lexical sophistication are associated with coherence-building processes and reading comprehension for different students [8,37,38]. Additionally, NLP analyses make it possible to systematically distinguish between different types of texts and discourse based on linguistic and semantic features and, in turn, to evaluate the difficulty of educational materials [38,39]. Scientific texts often contain dense information, abstract concepts, and complex sentence structures that are challenging for students to process and understand [40,41]. Moreover, academic vocabulary, which is common in academic texts, increases comprehension difficulty because readers need relevant background knowledge to infer word meanings [41,42].
Text cohesion (e.g., connectives and causal, referential, and coreferential links), or how well ideas are connected within a text, is another text property measured using NLP. Cohesion influences how readers synthesize information and construct meaning [43]. High-cohesion texts benefit low-knowledge readers who lack the necessary background knowledge to infer meanings across conceptual gaps [43]. In contrast, low-cohesion texts are more suitable for high-knowledge readers, as they facilitate deep learning by encouraging inference generation using prior knowledge [43].
Comprehension difficulties arise not only from a lack of reading skills or prior knowledge but also from a mismatch between textual features and the individual characteristics of the reader [44,45]. When readers lack requisite background knowledge or reading skills, linguistic features such as complex syntax or low cohesion can exacerbate comprehension challenges [8]. Skilled readers are able to comprehend advanced academic texts because they possess high-quality lexical representations and apply effective reading strategies [44]. These strategies enable them to efficiently decode complex syntactic structures, infer the meaning of unfamiliar vocabulary from context, and integrate prior knowledge to support deeper understanding. In contrast, readers with limited skills or knowledge may struggle with texts that are less cohesive or that contain sophisticated vocabulary and sentence structures [45,46]. Additionally, less skilled readers may have difficulty processing complex syntax, as effective comprehension requires sustained attention and cognitive engagement [47,48].
Table 1 describes several linguistic measures related to text complexity that have been shown to correspond to comprehension difficulties faced by students [37]. We measured these linguistic features using NLP because these linguistic patterns provide objective, quantifiable measures of text difficulty that correlate with expert judgments of readability [49]. To effectively tailor text readability for diverse readers with unique characteristics, it is essential to examine the alignment between linguistic features and reader abilities. Additionally, assessing multiple linguistic features related to text difficulty rather than relying on a single metric provides a more comprehensive assessment of readability [8]. Texts written using academic language, uncommon vocabulary, high lexical density, and complex syntax pose challenges for many readers [50]. In addition, text comprehensibility can be enhanced by revising materials to include clearer causal relationships, explicit explanations, and more elaborated background information for less skilled, low-knowledge readers [43,45].
As adaptive learning technologies evolve, automated methods for quick and efficient assessment of text modifications are essential. Leveraging NLP analyses to validate modifications offers several advantages over traditional human assessment. NLP tools can extract linguistic metrics, quantify text complexity, and provide an objective validation method for text personalization. NLP is efficient and scalable, which allows for iterative evaluation of content across various domains without requiring multiple human studies. NLP-based assessments can be scaled to provide immediate feedback, enabling real-time assessment and text refinement. As a result, personalized content can be modified based on quantitative feedback and assessed and adapted iteratively. Thus, in this study, we leveraged NLP to assess these linguistic features and determine how text adaptations varied in complexity across reader profiles and LLMs. The linguistic features of each text modification were extracted using the Writing Analytics Tool (WAT; [51]). By quantifying linguistic properties using NLP, we aimed to evaluate how LLMs tailored the modifications to accommodate the needs of diverse readers.
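To make this concrete, the short Python sketch below computes a few surface-level readability indices using the open-source textstat package. This is a minimal stand-in for a fuller analytics pipeline such as WAT, which reports many additional linguistic and semantic indices; the sample text and the choice of indices are illustrative.

```python
# A minimal sketch of automated readability profiling, assuming the
# open-source textstat package as a stand-in for a fuller tool such as WAT.
import textstat

def readability_profile(text: str) -> dict:
    """Return a few surface-level indices associated with text difficulty."""
    words = textstat.lexicon_count(text)
    sentences = textstat.sentence_count(text)
    return {
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),
        "mean_sentence_length": words / max(sentences, 1),
        "difficult_word_count": textstat.difficult_words(text),
    }

sample = ("Photosynthesis converts light energy into chemical energy. "
          "Chlorophyll molecules in the thylakoid membrane absorb photons.")
print(readability_profile(sample))
```

Because each index is computed automatically, the same function can be applied to an original text and each of its modifications to quantify how complexity shifts across reader profiles.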

1.3. Current Research

In the following experiments, we examined how Generative AI (GenAI) can personalize educational texts for different reader profiles, leveraging NLP tools to validate the modifications. In Experiment 1, we prompted LLMs to modify scientific texts to accommodate four different reader profiles varying in knowledge, reading skills, and learning goals. In Experiment 2, we refined the prompting strategies to improve text personalization and focused on one LLM to examine the effect. NLP analyses were leveraged to evaluate how different LLMs and prompting techniques adjust text readability and to examine the extent to which modifications align with the unique needs of readers. We explored two research questions: (1) To what extent do different LLMs adapt scientific texts for different reader profiles? (2) How do augmented prompting strategies influence the quality of modified texts? These studies highlight how NLP analyses can be applied to validate text readability in real time across large datasets. The NLP-based validation method also facilitates rapid assessment and iteration so that adaptive learning technologies can quickly tailor content to suit students’ needs. Rapid testing and refinement help ensure that learning materials are effectively personalized to optimize students’ learning.

2. Experiment 1: LLM Text Personalization

2.1. Introduction Experiment 1

Due to inherent differences in model design (i.e., training corpora and tuning strategies), different LLMs may produce varied outputs even when given identical prompts [52]. The corpus on which an LLM is trained can have a significant influence on its performance and suitability for different task applications. For instance, an LLM trained on academic texts may perform better in educational or scientific content generation, while a model trained on code may excel in programming-related tasks [52]. Recent research comparing outputs from widely used LLMs such as Claude, Llama, ChatGPT, Gemini, and DeepSeek highlighted noticeable differences in output patterns [53]. Furthermore, different versions of the same model (e.g., GPT-3.5 vs. GPT-4) also produce outputs with varied levels of lexical richness and syntactic complexity [54,55,56].
Considering these sources of variability, it is important to examine how different LLMs personalize learning materials. In Experiment 1, we investigated how four commonly used LLMs (Claude 3.5 Sonnet, Llama 3.1, Gemini Pro 1.5, and ChatGPT 4) modify scientific texts to suit distinct reader characteristics and learning contexts (e.g., prior knowledge, reading skills, and learning goals). The goal was to apply NLP analyses to assess the extent to which linguistic features varied based on reader profiles and whether the models differed in their ability to personalize texts. Ten texts from biology, chemistry, and physics were adapted to align with different reader profiles. We used NLP to extract and analyze linguistic features in the modified texts, evaluating the extent to which the outputs were tailored to the intended readers. We hypothesized that readability-related linguistic features would vary across profiles and that each LLM would exhibit unique output patterns even when prompted identically.

2.2. Materials and Method Experiment 1

2.2.1. LLM Selection and Implementation Details

The four LLMs selected for the comparative analysis were Claude 3.5 Sonnet (Anthropic), Llama 3.1 (Meta), Gemini Pro 1.5 (Google), and ChatGPT 4 (OpenAI). See Appendix A for details (e.g., versions, dates of usage, and training size and parameters) of the four LLMs. These models were selected based on three criteria. First, they have comparable training sizes and parameter counts, which suggests comparable capabilities in language comprehension and generation. Second, they are easily accessible and commonly used across settings (i.e., personal, academic, and industry). Lastly, all four LLMs have well-established reputations for high performance on general-purpose and instructional NLP tasks and for generating contextually relevant text.

2.2.2. Reader Profiles Experiment 1

Four hypothetical reader profiles were created to simulate readers’ unique characteristics and assess the LLMs’ capability to adapt texts to each persona. The profiles differ in background knowledge and reading level. Each profile included detailed information on the reader’s age, educational background, reading level, prior knowledge, interests, and reading goals (see Table 2 for reader profile descriptions). Although four profiles cannot capture the full range of reader differences, they were designed as a proof-of-concept to initially explore personalized adaptations and the capabilities of NLP in evaluating personalized content.

2.2.3. Text Corpus

Ten expository texts in the domains of biology, chemistry, and physics were compiled. The texts were selected from the iSTART website (www.adaptiveliteracy.com/istart, accessed on 21 August 2024) [57]. The texts are publicly available through the website: users may create a free account and navigate to the “Texts Library” menu (e.g., iSTART StairStepper, 55). These materials are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0, https://creativecommons.org/licenses/by/4.0/, accessed on 6 June 2025). The texts were chosen for their varying levels of complexity and relevance to scientific topics, making them suitable for testing the models’ ability to modify text based on different reader profiles. Table 3 provides the list of the texts, their domains, titles, number of words, and Flesch–Kincaid Grade Level.

2.2.4. Procedure Experiment 1

The four LLMs were prompted to modify 10 scientific texts to suit four different reader profiles varying in prior knowledge and reading skills. See Appendix B for details of the prompts used. The LLMs were prompted to modify each text to improve comprehension and engagement for a reader, with the goal of tailoring the text to align with the unique characteristics of the reader (age, educational background, reading skills, prior knowledge, and reading goals). Table 2 provides the descriptions of the four reader profiles that varied as a function of prior knowledge (PK) and reading skills (RS). For each reader profile, the procedure was repeated 10 times, each time with one text. Then, the conversation history was erased, and the same procedure was repeated for the next reader profile, modifying 10 science texts using the same prompt. In total, each LLM was prompted 40 times, generating 160 modifications for four reader profiles from four LLMs (Claude, Llama, ChatGPT, and Gemini).
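For readers who wish to script a comparable design, the sketch below outlines the generation loop (four profiles by ten texts per LLM, with a fresh start for each profile). The generate() client is hypothetical; in the present study, the models were prompted through their web deployments with conversation history erased manually.

```python
# Illustrative sketch of the Experiment 1 design: 4 reader profiles x 10 texts
# per LLM, with a fresh conversation for every call. The generate() client is
# hypothetical; the study used each model's web deployment directly.
PROFILES = ["High RS/High PK", "High RS/Low PK", "Low RS/High PK", "Low RS/Low PK"]

def run_experiment(llm_name, texts, profile_descriptions, prompt_template, generate):
    modifications = []
    for profile in PROFILES:
        # History is reset between profiles; one stateless call per text
        # mirrors the erased-history procedure described above.
        for text in texts:
            prompt = prompt_template.format(
                profile=profile_descriptions[profile], text=text)
            modifications.append(
                {"llm": llm_name, "profile": profile, "output": generate(prompt)})
    return modifications  # 40 per LLM; 160 across the four LLMs
```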
The linguistic features (shown in Table 1) were automatically extracted using WAT [51], which provides linguistic and semantic indices associated with writing quality. These indices serve as validation metrics to evaluate linguistic and semantic features of the LLM text personalization. By aligning linguistic features with the given reader profiles, we provide a basis to assess the extent to which LLM modifications appropriately tailor texts to readers’ needs (e.g., low-knowledge readers benefit from cohesive text and simplified syntax and vocabulary; high-knowledge readers benefit from lower cohesion and increased complexity). In Table 4, we outline how linguistic features should be adapted to align with the characteristics of the various reader profiles.

2.3. Results Experiment 1

2.3.1. Main Effect of Reader Profile on Variations in Linguistic Features of Modified Texts

The goal of this analysis was to examine the extent to which the linguistic features of LLM-modified texts aligned with different reader profiles. A 4 × 4 MANCOVA was conducted to examine the effects of reader profiles (High RS/High PK, High RS/Low PK, Low RS/High PK, and Low RS/Low PK) and LLMs (Claude, Llama, Gemini, and ChatGPT) on linguistic features of modifications, with word count included as a covariate. The main effect of reader profiles on linguistic features was significant. See Table 5 for descriptive statistics and F-values of the main effects of reader profiles. Results showed that linguistic features of modifications significantly differed across reader profiles.
As expected, the modifications for Profile 1 (High RS/High PK) had the highest syntactic and lexical complexity measures, followed by Profile 2, then 3 and 4. Texts modified for Profile 1 had a significantly higher Flesch–Kincaid Grade Level (FKGL) (M = 16.97, SD = 2.36) compared to texts modified for Profile 2 (High RS/Low PK) (M = 10.63, SD = 1.76), p < 0.001, Profile 3 (Low RS/High PK) (M = 9.84, SD = 1.89), p < 0.001, and Profile 4 (Low RS/Low PK) (M = 8.50, SD = 1.39), p < 0.001. Pairwise comparisons also revealed significant differences across reader profiles for measures of text complexity, including academic writing, noun-to-verb ratio, sentence length, language variety, sophisticated wording, and academic vocabulary, aligning with the hypothesized difficulty levels for each profile. As intended, texts modified for Profile 1 had the highest syntactic complexity and lexical sophistication, followed by Profiles 2, 3, and 4 (all p < 0.05). Contrary to expectations, cohesion was higher for Profile 1, which has high knowledge, than for Profiles 2 and 4, which both have low prior knowledge (p < 0.05).
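For illustration, an omnibus test of this kind can be specified as follows, assuming a pandas DataFrame with one row per modification; the column names are hypothetical stand-ins for the WAT indices, and statsmodels is one plausible implementation.

```python
# Sketch of the 4 x 4 MANCOVA on the extracted linguistic features.
# Assumes a DataFrame with one row per modified text; column names are
# hypothetical stand-ins for the WAT indices described in Table 1.
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

df = pd.read_csv("modification_features.csv")  # hypothetical export of WAT scores

# Multiple dependent variables on the left; profile, LLM, and their
# interaction as factors, with word count entering as a covariate.
mancova = MANOVA.from_formula(
    "fkgl + sentence_cohesion + academic_writing + language_variety "
    "~ C(profile) * C(llm) + word_count",
    data=df,
)
print(mancova.mv_test())  # multivariate tests (e.g., Wilks' lambda) per effect
```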

2.3.2. Main Effect of LLMs

The main effect of the models was significant. Linguistic features of modifications from different LLMs differed regardless of reader characteristics. See Table 6 for descriptive statistics and F-values.
Llama’s modifications (M = 11.87, SD = 3.18) had significantly higher FKGL compared to modifications by Claude (M = 11.13, SD = 4.34), p < 0.05. Llama’s modifications were significantly higher in academic writing and academic wording compared to modifications by Claude, Gemini, and ChatGPT (all p < 0.05). Modifications by Claude (M = 30.38, SD = 24.15) had significantly lower sentence cohesion compared to modifications by Llama (M = 60.86, SD = 24.71), p < 0.001; Gemini (M = 52.65, SD = 25.32), p < 0.001; and ChatGPT (M = 51.06, SD = 31.06), p < 0.001. Modifications by Gemini had the highest language variety (M = 55.51, SD = 25.77) compared to texts generated by Claude (M = 47.61, SD = 31.9), p < 0.05, and Llama (M = 38.07, SD = 27.84), p < 0.001.

2.4. Discussion Experiment 1

NLP analyses showed variations in the linguistic features that the LLMs adjusted to address individual differences. These variations help ensure that the modifications are comprehensible and suitable for the given reader. Specifically, modifications generated for skilled, high prior knowledge profiles showed greater syntactic and lexical complexity, a more academic writing style, and more sophisticated language. In contrast, texts modified for skilled, low-knowledge profiles featured long sentences, sophisticated wording, and varied language while maintaining moderate information density, avoiding overwhelming these readers with too many unfamiliar science concepts. Modifications for low-knowledge and less skilled readers simplified sentence structures and wording to increase readability. The LLMs’ sensitivity to different readers’ needs led to variations in the linguistic features of modified texts, which were captured by the NLP analyses.
Moreover, each LLM’s modifications differed linguistically regardless of reader profiles. Claude’s modifications contained short sentences and were the least cohesive but also had the highest levels of idea development. Llama’s modifications had the highest complexity measures (e.g., FKGL, academic vocabulary, and academic writing). Gemini balanced the depth of information with engaging and cohesive language and used relatable analogies and illustrative examples, which are particularly beneficial for educational materials. ChatGPT modifications were cohesive and elaborated concepts thoroughly in the modifications, especially for low-knowledge profiles.
However, NLP analyses also revealed that cohesion features were not appropriately aligned with readers’ needs, contradicting findings on optimal comprehension reported in prior research [43,45]. These inconsistencies in the modifications highlighted the need to improve prompting techniques and provide the LLMs with more explicit instructions and background information to guide the modification process. The single-shot prompt used in Experiment 1 lacked context-relevant information that could guide the LLMs to perform the task more precisely. Having detected limitations in the generic prompt by leveraging NLP measures, we refined and tested an augmented prompt that was more detailed and structured to improve text personalization.

3. Experiment 2: Prompt Refinements

3.1. Introduction Experiment 2

In Experiment 1, the LLMs did not effectively tailor cohesion features to align with the readers’ background knowledge. Specifically, texts modified for high-knowledge readers were highly cohesive compared to those for low-knowledge readers. Cohesion plays a critical role in comprehension and learning because conceptual gaps can promote active processing to generate bridging inferences [43,45]. As a result, readers with low prior knowledge benefit from cohesive texts because they lack the necessary background knowledge to fill in conceptual gaps. In contrast, texts adapted for high-knowledge readers should contain fewer cohesive cues to foster knowledge-bridging and deep comprehension. The mismatch in cohesion between modifications for high- versus low-knowledge readers suggested that the LLMs require further instruction on how to adjust cohesion appropriately for these profiles. In particular, there may be limitations in the LLMs’ ability to fully differentiate text cohesion for less skilled, high-knowledge readers, highlighting the need for further refinement in tailoring text modifications for this profile.
Findings from the NLP analyses also showed that idea development was significantly higher in modifications tailored for high-knowledge, skilled readers compared to modifications for all other profiles. When reading for academic purposes, it is crucial that students can learn effectively from the text. As such, scientific concepts should be sufficiently and equally elaborated for all students, not just the skilled ones. It is critical that text modifications for low-knowledge or less skilled profiles simplify vocabulary and syntax but remain cohesive and comprehensible.
One of the most effective methods to enhance LLM performance is instructing the models with clear and structured prompts. Providing LLMs with additional context on how to complete the task also helps them generate more accurate outputs and follow adaptation guidelines suggested by prior research [58,59]. High-quality instructional prompts also reduce hallucinations and help models adhere more closely to intended outcomes. Prompting plays a critical role in determining the quality and relevance of a model’s responses [52]. The input prompt can influence variability in performance such that ambiguous or underspecified prompts can lead to inconsistent outputs [52]. Similarly, prior research has demonstrated that well-structured and carefully designed prompts consistently enhance task performance across diverse NLP tasks [60]. Prompt quality is a major factor in LLM performance, especially for tasks involving reasoning, adaptation, or instruction following.
These findings underscore the importance of systematic prompt design in enhancing output quality. Based on the findings of Experiment 1, we identified several areas in which the prompt could be improved to enhance text personalization. Iterative prompt improvement can potentially improve an LLM’s capability to tailor texts to specific reader profiles. As such, the primary focus of Experiment 2 was on enhancing prompts to optimize LLM performance in text personalization and to improve the alignment between LLM-generated personalized texts and specific reader profiles.
Several refinements were made to strengthen the instruction prompt used in Experiment 1. See Table 7 for the changes made to the modified prompts. First, the descriptions of reader profiles lacked specificity and clarity. To address this, we included more detailed and quantifiable metrics to describe reading proficiency and prior knowledge level. Instead of using broad terms like high or low, we predicted that incorporating more specific metrics to describe these attributes would enable the model to better tailor texts. By explicitly including quantifiable measures in the profile description, we aimed to reduce ambiguity and enhance the model’s ability to adjust linguistic features aligned with unique needs. Objective data about the reader were included to enhance clarity and ensure more consistent and goal-aligned responses. This approach aligns with practices in intelligent tutoring systems (ITSs), in which detailed learner profiles are essential for delivering content tailored to each student [61,62,63]. Furthermore, insights from educational assessment and LLM evaluation methodologies suggest that standardized metrics provide a clear context for both human and machine interpretation [64]. As such, in LLM prompt design, standardized metrics provide a more precise context, which helps the LLM tailor the text more effectively and align it with a complexity level appropriate for the student’s proficiency [59,65].
To quantify reading skills, standardized test scores were used as indicators. For instance, the ACT English composite score (32/36; 96th percentile) and ACT Reading composite score (31/36; 94th percentile) are included to represent a skilled reader. A reader profile with low knowledge of science is represented by ACT Math (18/36; 42nd percentile) and ACT Science percentile ranking (19/36; 46th percentile). The description in the profile includes additional information like “Completed one high-school level biology course, with no advanced science coursework, and limited exposure to scientific concepts”. By including standardized test scores and quantifiable metrics representing reading proficiency and science knowledge, we predicted that LLM performance would be enhanced.
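For illustration, the quantifiable profile attributes described above can be represented in a structured form such as the following sketch of a high-reading-skill, low-knowledge profile; the field names are ours, and the values follow the examples given in the text.

```python
# Illustrative structure for a revised reader profile with quantifiable
# metrics. Field names are hypothetical; values follow the examples above.
reader_profile = {
    "reading_skills": {
        "act_english": {"score": 32, "max": 36, "percentile": 96},
        "act_reading": {"score": 31, "max": 36, "percentile": 94},
    },
    "prior_knowledge": {
        "act_math": {"score": 18, "max": 36, "percentile": 42},
        "act_science": {"score": 19, "max": 36, "percentile": 46},
        "background": ("Completed one high-school level biology course, "
                       "with no advanced science coursework, and limited "
                       "exposure to scientific concepts"),
    },
}
```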
Secondly, we integrated Retrieval-Augmented Generation (RAG), a framework that enhances the accuracy and contextual relevance of outputs by incorporating an external knowledge base during the text generation process [66,67]. Grounding the LLMs’ responses in empirical findings helps guide the modification process and helps the LLMs align linguistic features with reader abilities [68,69]. The LLM integrates these insights by tailoring text features to align with established cognitive science theories of discourse processing without referencing the documents directly.
In our prompt, the LLMs were instructed to retrieve and utilize research papers as guidelines for modifying content. The RAG approach utilized here involves content-grounded referencing: the retrieved empirical and theoretical documents serve as contextual frameworks that guide the adaptation process implicitly rather than through explicit textual referencing. The reference materials provided to the LLMs to support the RAG process consisted of peer-reviewed research papers and book chapters focusing on key domains such as text comprehension, readability, cohesion, and individual differences. We included theoretical papers on comprehension theories, such as the Construction–Integration model [70] and the comprehensive model of comprehension [2], which explain how readers construct meaning based on textual features and prior knowledge. Empirical studies were also used to provide grounded evidence on how text cohesion and prior knowledge interact to influence comprehension [43,45]. Moreover, we included research papers on linguistic features of writing quality to guide the adaptation of cohesion, writing style, and lexical and syntactic features. See Appendix D for a complete list of documents.
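The following sketch illustrates the general retrieval step of a RAG pipeline. Note that in this study the full documents were attached to the prompt rather than retrieved programmatically, so the TF-IDF retrieval shown here is an illustrative assumption rather than the procedure we used.

```python
# Minimal sketch of a retrieval step for grounding text modification in
# research documents. TF-IDF retrieval is an illustrative assumption; the
# study attached the full PDFs to the prompt instead.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_guidelines(science_text: str, passages: list[str], k: int = 3) -> list[str]:
    """Return the k research passages most relevant to the text to be modified."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([science_text] + passages)
    scores = cosine_similarity(matrix[0], matrix[1:]).ravel()
    return [passages[i] for i in scores.argsort()[::-1][:k]]

# The retrieved passages would then be prepended to the modification prompt so
# the LLM can ground its adaptations in the empirical findings.
```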
Two additional techniques were also incorporated into the new prompt: chain-of-thought and personification. Personification involves instructing the LLMs to adopt a role aligned with the task objectives, which leverages its ability to simulate human-like reasoning [59,60]. When researchers prompt LLMs to adopt a specific role (e.g., cognitive scientist), they more effectively leverage the model’s capability to simulate goal-directed behaviors aligned with task expectations [59]. Chain-of-thought involves instructing the model to explain its reasoning process step by step. Prior research has shown that making the reasoning process transparent can enhance LLM performance, accuracy, and consistency because it breaks down tasks into smaller parts, resulting in more coherent and logical responses [71]. Outlining steps and rationales also helps researchers identify errors and refine prompts.
Prompting plays a critical role in response quality, and making the prompt more detailed and structured enhances task performance. As such, in the second experiment, we used NLP to evaluate the improvement in modifications generated using the augmented prompt. By analyzing linguistic features, we assessed the alignment between modifications and readers’ unique needs and abilities. The four hypothetical reader profiles from Experiment 1 were revised to be more detailed and structured (see Table 7). Unlike Experiment 1, which compared multiple LLMs, Experiment 2 focused on examining the effect of augmented prompting strategies using Gemini to isolate variability across LLMs. The augmented prompts were tested on the same 10 scientific texts from Experiment 1, and the modifications were analyzed using NLP tools to extract linguistic features such as cohesion, lexical complexity, and sentence structure. The NLP analyses were used to assess whether the augmented prompt led to improvements in text alignment with different reader profiles. By demonstrating how NLP tools can assess the fit between textual features and reader characteristics, this research proposes a scalable solution for evaluating text personalization.

3.2. Materials and Method Experiment 2

3.2.1. LLM Selection Experiment 2

In the second experiment, we refined the prompting process to improve text personalization and focused on the performance of Gemini. Gemini was selected for Experiment 2 because its modifications closely resembled those of ChatGPT in terms of conceptual depth but provided more engaging and cohesive outputs. See Appendix A for details (e.g., versions, dates of usage, and training size and parameters).

3.2.2. Reader Profiles Experiment 2

The four hypothetical reader profiles created for Experiment 1 were modified to improve their structure and level of specificity. These enhancements aimed to provide the LLM with clearer and more detailed context about the profiles. For instance, standardized test scores (e.g., ACT composite scores in English, Reading, Math, and Science) were included to quantify readers’ proficiency levels. Statements about readers’ academic backgrounds and exposure to scientific concepts further clarified each profile’s prior knowledge and topic familiarity. Table 7 provides comprehensive descriptions of these enhanced reader profiles.

3.2.3. Procedure Experiment 2

The goal of Experiment 2 was to enhance prompt clarity and structure to improve the alignment between LLM-generated modifications and the unique needs of diverse reader profiles. Gemini was prompted with the augmented prompt to modify ten scientific texts to suit the four revised reader profiles (see Appendix C). The ten expository texts from Experiment 1 were used for this experiment. See Table 3 for information about the texts. Each generation iteration involved ten modifications per reader profile, for a total of 40 modifications across the four reader profiles. Each modified text was systematically analyzed using WAT [51] to extract linguistic features related to text readability, such as cohesion, lexical complexity, syntactic complexity, academic writing style, and idea elaboration. Table 1 describes each linguistic feature included in the analysis. These features provided quantitative validation metrics to assess the extent to which the augmented prompt improved the alignment between text modifications and reader profiles. The linguistic features of the modifications generated by Gemini in Experiment 1 were used for comparison across the two prompts.

3.3. Results Experiment 2

3.3.1. Academic Writing

To examine the extent to which augmented prompts improved alignment between linguistic features and readers’ needs, a two-way ANOVA (four reader profiles: High PK/High RS, High PK/Low RS, Low PK/High RS, and Low PK/Low RS × two prompt types: single-shot vs. augmented) was conducted. There was a significant two-way interaction effect between reader profile and prompt, suggesting that the impact of the augmented prompt on linguistic features of modified texts varied depending on the reader profiles.
The interaction effect between prompt and profile was significant, F (3, 240) = 10.45, p < 0.001, η2 = 0.12. As intended, the results showed that academic writing was significantly lower for texts modified for less skilled and low-knowledge readers using the augmented prompt compared to the single-shot prompt (p < 0.001). See Figure 1 for the main effect of prompt and reader profiles on academic writing.
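For illustration, the feature-level test can be specified as follows, assuming a DataFrame of per-modification scores with hypothetical column names; statsmodels is one plausible implementation.

```python
# Sketch of the 4 (profile) x 2 (prompt type) ANOVA on a single feature.
# Assumes per-modification scores in `df`; column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.read_csv("gemini_modification_features.csv")  # hypothetical file

model = smf.ols("academic_writing ~ C(profile) * C(prompt_type)", data=df).fit()
print(anova_lm(model, typ=2))  # main effects and the profile x prompt interaction
```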

3.3.2. Conceptual Density and Cohesion

The interaction effect between prompt and profile was significant for cohesion, F (3, 240) = 6.18, p < 0.001, η2 = 0.07. As predicted, cohesion was significantly lower for texts modified for skilled and high-knowledge readers using the augmented prompt (p = 0.02) and was significantly higher for less skilled and low-knowledge readers (p < 0.05). See Figure 2 for the main effect of prompt and reader profiles on sentence cohesion.

3.3.3. Syntactic and Lexical Complexity

The interaction effect between prompt and profile was significant for language variety, F (3, 240) = 6.24, p < 0.001, η2 = 0.08, and sophisticated wording, F (3, 240) = 17.44, p < 0.001, η2 = 0.19. As intended, modifications generated with the augmented prompt for high-knowledge and skilled profiles significantly increased language variety and sophisticated wording. In contrast, modifications for low-knowledge or less skilled profiles significantly decreased lexical sophistication, aligning with the hypothesized difficulty levels for each profile. See Figure 3 and Figure 4 for the main effect of prompt and reader profiles on language variety and sophisticated wording.

3.4. Discussion Experiment 2

NLP analyses provided quantifiable evidence suggesting that the augmented prompt effectively tailored linguistic features to suit different reader needs. Linguistic features were more appropriately aligned to reader needs, consistent with evidence-based text modifications reported in prior research on comprehending challenging texts. Specifically, texts generated for skilled, high-knowledge reader profiles were more syntactically complex and lexically sophisticated and lower in cohesion. In contrast, the modifications for less skilled and low-knowledge profiles had simpler syntax and more common vocabulary and were more cohesive. Simplifying language and using familiar words improves readability and ease of understanding for less skilled, low-knowledge readers. The results of Experiment 2 suggest that NLP successfully provided objective measures that illustrate how prompt refinements can lead to more effective text personalization.

4. General Discussion

4.1. Text Readability and Reader Alignment

In these two experiments, we leveraged LLMs to modify texts for different reader profiles and used linguistic features to evaluate the LLM-generated modifications. Experiment 1 demonstrated that LLMs vary in how they adapt text readability to align with readers’ characteristics, as evidenced by variations in linguistic features related to text difficulty. In Experiment 2, NLP analyses illustrated how the augmented prompt enhanced text personalization. These prompt refinements resulted in better-aligned text readability.
Tailoring text complexity to reader abilities is essential for supporting comprehension and learning, especially for less skilled readers. Students with lower topic knowledge and reading proficiency often struggle with decoding, fluency, and monitoring comprehension [72,73]. These challenges are exacerbated when texts contain dense sentence structures, academic vocabulary, or lack cohesion. Therefore, to support comprehension, personalized texts should simplify syntax (e.g., active voice and shorter sentences) and lexical complexity and use consistent language. Deeper linguistic properties, such as cohesion and conceptual density, also influence comprehension, especially for readers with limited prior knowledge. In our research, personalized texts for low-knowledge readers demonstrated high cohesion through elaborations and connective words. These features have been shown to benefit low-knowledge readers and support comprehension by bridging conceptual gaps [43,45]. In contrast, texts modified for high-knowledge and skilled readers included more sophisticated syntax and vocabulary, more varied language, and a writing style more similar to academic writing. These texts exhibited lower cohesion, which can promote deeper comprehension and learning by encouraging active inference-making [43,45].

4.2. Variability Across LLMs

In addition, NLP analyses revealed variations in linguistic features across LLM outputs, regardless of reader profiles. While all LLMs demonstrated sensitivity to readers’ needs, the modifications varied in writing style and linguistic patterns. Llama produced the most complex texts, marked by extensive use of academic language and sophisticated terminology. Claude’s outputs were significantly less cohesive compared to other models, while Gemini’s outputs had the highest level of lexical diversity and used varied language structures. The LLMs also differed significantly on features such as idea development, sentence length, and academic word frequency, suggesting each model produced outputs with distinct linguistic patterns. These findings align with prior research suggesting that differences in both training data and fine-tuning strategies result in variability in outputs even given identical input [54,65,74]. It is both practically necessary and theoretically important to examine how LLMs personalize texts when considering applying LLMs to modify learning materials. Even small variations in output can influence readability, coherence, and ease of comprehension, ultimately impacting the quality of personalized learning experiences.
This research further demonstrates the benefits of augmenting the information provided to the LLMs using various techniques (e.g., personification, chain-of-thought, and RAG). Generative AI can be leveraged to quickly modify content; there is little doubt that LLMs can generate revised versions of content based on human instructions. However, few attempts (if any) have been made, beyond intuition, to validate content modifications generated with GenAI. These findings also demonstrate how text analysis using NLP tools can be used to evaluate the linguistic features of LLM-generated texts, providing a basis to assess whether adaptations align with different reader needs. The current research highlights NLP-based analysis as a practical and scalable validation method for assessing personalization quality. In an adaptive learning system, NLP tools can be leveraged to analyze linguistic features in real time, which supports rapid evaluation and iterative refinement without requiring costly human studies.

4.3. Limitations and Future Directions

In the current research, only four reader profiles were used for generating LLM-adapted texts. While this number provided sufficient proof-of-concept for exploring personalized adaptations and demonstrating the capabilities of NLP analyses in evaluating personalized content, the limited number of profiles potentially constrains the generalizability of our findings. Specifically, the reader profiles might not capture the full complexity and range of individual differences in a diverse student population. Additional characteristics such as interests, motivation, cultural backgrounds, learning disabilities beyond dyslexia, or cognitive characteristics were not sufficiently represented in these profiles [75,76,77]. To accurately personalize and evaluate educational materials, it is essential to capture these broader individual differences beyond reading skills and prior knowledge [78]. Future research should explore more nuanced reader characteristics and incorporate a more diverse set of reader profiles to improve external validity. In addition, it would be beneficial to test texts from a larger and more diverse set of domains.
Another limitation of these two experiments is that each text was modified only once for each reader profile. LLMs are inherently non-deterministic, and each iteration can produce varying outputs even when using the same prompts [79]. This variability in responses across identical prompts, even within the same model, is particularly problematic for educational tasks that require consistency [80]. As such, single-run trials limit the ability to capture and quantify this internal variability. Future research can systematically analyze repeated text generations to examine the reliability and consistency of LLM outputs.
The current research demonstrated how LLMs can be used to generate personalized educational content, as well as how NLP can be leveraged to assess and validate content quality. Notably, however, a persistent challenge is the risk of hallucinations: LLMs can generate information that is inaccurate or irrelevant [81,82,83]. While semantic similarity between the original and revised texts was automatically calculated to detect inconsistencies, we still had to have a human in the loop to manually review whether the generated content preserved key concepts and remained relevant to the original text (see Appendix E for the rubric guiding the evaluation process). While fully automating this process would require multi-agent programming in which a separate LLM checks accuracy and relevance, NLP offers an efficient means of supporting quality assessment and iterative refinement of content. This approach is valuable because personalized content can be dynamically and continuously adapted to students’ evolving knowledge, interests, and skills.
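A screening step of this kind could be sketched as follows; the sentence-transformers model and the similarity threshold are illustrative assumptions rather than the exact procedure used in this study.

```python
# Sketch of an automated semantic-similarity screen between an original text
# and its modification. The embedding model and threshold are illustrative.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def flag_for_human_review(original: str, modified: str, threshold: float = 0.75) -> bool:
    """Flag a modification whose meaning drifts too far from the source text."""
    embeddings = encoder.encode([original, modified], convert_to_tensor=True)
    similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
    return similarity < threshold  # low similarity triggers manual review
```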

5. Conclusions

The novel contribution of this research is the demonstration of an NLP-based validation method that systematically and iteratively assesses GenAI-modified educational texts. Unlike traditional human assessment, which is resource-intensive and time-consuming, leveraging NLP analyses provides a scalable and efficient validation process. Specifically, NLP metrics objectively quantify text complexity and readability, allowing for immediate feedback loops that support rapid iteration and content refinement. In the two experiments, NLP analyses provided objective and quantifiable evidence to evaluate linguistic alignment between LLM-generated modifications and readers’ individual needs. Experiment 1 illustrated that LLMs varied in their capacity to adjust readability appropriately for different reader profiles. Experiment 2 demonstrated how augmented prompting techniques improved alignment with readers’ characteristics, further emphasizing the effectiveness of iterative refinements using NLP-based feedback.
Leveraging NLP to validate LLM modifications has strong potential to support continuous and iterative refinements that enhance personalization. NLP analyses can provide real-time assessment regarding the extent to which content is appropriately tailored to students’ needs and abilities. As a result, modifications can be refined iteratively, increasing the likelihood that the LLM will generate content aligned with expectations. Modifications can be adjusted based on quantitative analyses and further improved without requiring costly and time-consuming studies with human participants. By quickly tailoring text features to individual needs and validating the content, NLP-based validation approaches have the potential to enhance personalized learning [84,85]. After initial LLM outputs are automatically analyzed, subsequent iterations can integrate NLP-generated feedback. This data-driven iterative cycle allows the system to continuously and dynamically adapt educational content as learners’ skills and knowledge evolve, significantly enhancing personalized learning experiences over time. The current research demonstrated that NLP can serve as a scalable, real-time, and efficient evaluation method for automatically validating LLM-generated personalized content. NLP-based validation can be applied to create adaptive, personalized learning environments that evolve in response to learner growth. Investigating the impact of learner-aligned LLM adaptation on academic performance and engagement represents a critical next step.

Author Contributions

Conceptualization, L.H. and D.S.M.; methodology, L.H. and D.S.M.; formal analysis, L.H.; investigation, L.H. and D.S.M.; resources, D.S.M.; data curation, L.H.; writing—original draft preparation, L.H.; writing—review and editing, L.H. and D.S.M.; visualization, L.H.; supervision, D.S.M.; project administration, D.S.M.; funding acquisition, D.S.M. All authors have read and agreed to the published version of the manuscript.

Funding

The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305T240035 to Arizona State University. The opinions expressed are those of the authors and do not represent the views of the Institute or the U.S. Department of Education.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in Google Drive at: https://drive.google.com/drive/folders/1qkkt6OwtpfJRQENKNoDrSUj1NGABL2Hm?usp=sharing, accessed on 10 June 2025.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
PK	Prior knowledge
RS	Reading skills
GenAI	Generative AI
LLM	Large Language Model
RAG	Retrieval-Augmented Generation
NLP	Natural Language Processing
ITS	Intelligent Tutoring System
iSTART	Interactive Strategy Training for Active Reading and Thinking
FKGL	Flesch–Kincaid Grade Level
WAT	Writing Analytics Tool

Appendix A. LLM Descriptions

This appendix includes details on the LLMs used in this research, including model versions, dates of access, access method, training size, and number of parameters.
Claude 3.5 Sonnet (Anthropic)
  • Version Used: Claude 3.5;
  • Date Accessed: 31 August 2024;
  • Accessed via https://poe.com web deployment, default configurations were used;
  • Training Size: Claude is trained on a large-scale, diverse dataset derived from a broad range of online and curated sources. The exact size of the training data remains proprietary;
  • Number of Parameters: The exact number of parameters for Claude 3.5 is not disclosed by Anthropic, but it is estimated to be between 70 and 100 billion parameters.
Llama (Meta)
  • Version Used: Llama 3.1;
  • Date Accessed: 31 August 2024;
  • Accessed via https://poe.com web deployment, default configurations were used;
  • Training Size: Llama 3.1 was trained on 2 trillion tokens sourced from publicly available datasets, including books, websites, and other digital content;
  • Number of Parameters: Llama 3.1 consists of 70 billion parameters.
Gemini Pro 1.5 (Google DeepMind)
  • Version Used: Gemini Pro 1.5;
  • Date Accessed: 31 August 2024;
  • Accessed via https://poe.com web deployment, default configurations were used;
  • Training Size: Gemini is trained on 1.5 trillion tokens, sourced from a wide variety of publicly available and curated data, including text from books, websites, and other large corpora;
  • Number of Parameters: Gemini 1.0 operates with 100 billion parameters.
ChatGPT (OpenAI)
  • Version Used: GPT-4o;
  • Date Accessed: 31 August 2024;
  • Accessed via https://poe.com web deployment, default configurations were used;
  • Training Size: GPT-4 was trained on an estimated 1.8 trillion tokens from diverse sources, including books, web pages, academic papers, and large text corpora;
  • Number of Parameters: The exact number of parameters for GPT-4 is not publicly disclosed but is in the range of 175 billion parameters.

Appendix B. Single-Shot Prompt Experiment 1

This appendix contains the single-shot prompt used in Experiment 1. The prompt instructs the LLM to modify a scientific text to enhance comprehension and engagement for a specific reader profile.
Prompt: Modify the text to improve comprehension and engagement for a reader. The goal of this personalization task is to tailor materials that align with the unique characteristics of the reader (age, background knowledge, reading skills, reading goal and preferences, and interests) while maintaining content coverage and important terms/concepts from the original text. Follow these steps:
  • Analyze the input text and determine its reading level (e.g., Flesch–Kincaid Grade Level), linguistic complexity (e.g., sentence length and vocabulary), and the assumed background knowledge required for comprehension.
  • Analyze the reader profile and identify key information: age, reading level (e.g., beginner, intermediate, advanced), prior knowledge (specific knowledge related to the text’s topic), reading goals (e.g., learning new concepts, enjoyment, research, pass an exam), interests (what topics or themes are motivating for the reader?), accessibility needs (specify any learning disabilities or preferences that require text adaptations, dyslexia, or visual impairments).
  • Reorganize information and modify the syntax, vocabulary, and tone to tailor to the readers’ characteristics.
  • If the reader has less knowledge about the topic, then provide sufficient background knowledge or relatable examples and analogies to support comprehension and engagement. If the reader has strong background knowledge and high reading skills, then increase the depth of information and avoid overly explaining details.
  • [Insert Reader 1 Description].
  • [Insert Text].

Appendix C. Augmented Prompt Experiment 2

This appendix includes the augmented prompt used in Experiment 2. The prompt incorporates advanced prompting strategies, including personification, task objectives, chain-of-thought reasoning, and Retrieval-Augmented Generation (RAG).
Table A1. Augmented Prompt used in Experiment 2. Source: authors’ contribution.
Personification: Imagine you are a cognitive scientist specializing in reading comprehension and learning science.
Task objectives:
  • Modify this text to enhance text comprehension, engagement, and accessibility for the reader profile while maintaining conceptual depth, scientific rigor, and pedagogical value
  • Adapt the text in a way that supports the reader’s understanding of scientific concepts, using strategies that align with empirical findings on text cohesion, reading skills, and prior knowledge
  • Help the reader retain scientific concepts and reinforce understanding
  • Ensure that the reader can build meaningful understanding while being challenged at an appropriate level
Chain-of-thought: Explain the rationale behind each modification approach and how each change helps the reader grasp the scientific concepts and retain information.
RAG: Refer to the attached PDF files. Apply the empirical findings and theoretical frameworks from these files as guidelines to tailor the text:
  • Impact of prior knowledge and reading skills on comprehension of science texts
  • Impact of prior knowledge on the integration of new knowledge according to the Construction–Integration (CI) Model of Text Comprehension (Kintsch, 1998)
  • Impact of text cohesion on comprehension: the differential effect of cohesion depending on the reader’s level of prior knowledge and reading skills
Reader profile: [Insert Revised Reader Profile Description from Table 7]
Text input: [Insert Text]
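A comparable sketch for the augmented prompt composes the Table A1 components in order (persona, task objectives, chain-of-thought, RAG instruction). The component strings are abridged from the table; the assembly function is illustrative rather than the pipeline used in Experiment 2, where the supporting documents were attached as PDF files.

```python
# Illustrative sketch: composing the augmented prompt from Table A1 components.

PERSONA = ("Imagine you are a cognitive scientist specializing in reading "
           "comprehension and learning science.")

OBJECTIVES = [
    "Modify this text to enhance comprehension, engagement, and accessibility "
    "for the reader profile while maintaining conceptual depth and rigor.",
    "Adapt the text using strategies aligned with empirical findings on text "
    "cohesion, reading skills, and prior knowledge.",
]

CHAIN_OF_THOUGHT = ("Explain the rationale behind each modification and how it "
                    "helps the reader grasp and retain the scientific concepts.")

RAG_INSTRUCTION = ("Refer to the attached PDF files and apply their empirical "
                   "findings and theoretical frameworks as guidelines.")

def build_augmented_prompt(profile: str, text: str) -> str:
    """Concatenate persona, objectives, chain-of-thought, and RAG components."""
    objectives = "\n".join(f"- {o}" for o in OBJECTIVES)
    return (f"{PERSONA}\n\nTask objectives:\n{objectives}\n\n"
            f"{CHAIN_OF_THOUGHT}\n{RAG_INSTRUCTION}\n\n"
            f"Reader profile: {profile}\n\nText: {text}")
```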

Appendix D. Articles Used in RAG Process

This appendix provides the documents used during the Retrieval-Augmented Generation (RAG) process in Experiment 2.
McNamara, D.S.; Crossley, S.A.; McCarthy, P.M. Linguistic features of writing quality. Written Communication 2010, 27, 57–86. https://doi.org/10.1177/0741088309351547.
McNamara, D.S.; Graesser, A.C.; Louwerse, M.M. Sources of text difficulty: Across genres and grades. In Measuring Up: Advances in How We Assess Reading Ability; Sabatini, J., Albro, E., O’Reilly, T., Eds.; Rowman & Littlefield: Lanham, MD, USA, 2012; pp. 89–116.
McNamara, D.S.; Louwerse, M.M.; McCarthy, P.M.; Graesser, A.C. Coh-Metrix: Capturing linguistic features of cohesion. Discourse Processes 2010, 47, 292–330. https://doi.org/10.1080/01638530902959943.
Kintsch, W. Revisiting the construction–integration model of text comprehension and its implications for instruction. In Theoretical Models and Processes of Literacy, 7th ed.; Alvermann, D.E., Unrau, N.J., Ruddell, R.B., Eds.; Routledge: New York, NY, USA, 2018; pp. 178–203.
McNamara, D.S.; Magliano, J.P. Toward a comprehensive model of comprehension. In Psychology of Learning and Motivation; Ross, B., Ed.; Elsevier, 2009; Volume 51, pp. 297–384. https://doi.org/10.1016/S0079-7421(09)51009-2.
McNamara, D.S.; Ozuru, Y.; Floyd, R.G. Comprehension challenges in the fourth grade: The roles of text cohesion, genre, and readers’ prior knowledge. International Electronic Journal of Elementary Education 2011, 4, 229–257.
Ozuru, Y.; Dempsey, K.; McNamara, D.S. Prior knowledge, reading skill, and text cohesion in the comprehension of science texts. Learning and Instruction 2009, 19, 228–242. https://doi.org/10.1016/j.learninstruc.2008.04.003.
O’Reilly, T.; McNamara, D.S. The impact of science knowledge, reading skill, and reading strategy knowledge on more traditional “high-stakes” measures of high school students’ science achievement. American Educational Research Journal 2007, 44, 161–196. https://doi.org/10.3102/0002831206298171.

Appendix E. Quality Assessment Rubric

This appendix provides the rubric used by researchers to assess the quality, relevance, and alignment of the modified texts with the intended reader profiles.
1. Reader Engagement:
  • Given the reader’s characteristics, what factors of the modified text make it suitable and engaging for the specific reader?
  • If I were the student:
    Does the text capture my attention and interest?
    Do I feel interested in/engaged with the text?
2. Text–Reader Alignment:
Which modification is best suited to the reader, considering the following:
  • Readability: reading level, overall length, syntax and wording, and tone and style
  • Structure and organization: Does the text present information in a way that is easily processed considering the reader’s characteristics (age, reading level, reading disability)?
  • Titles, headings, and subheadings (cohesion, clear flow).
  • Language used and word choice.
  • Engagement: Does the text capture and maintain the reader’s interest throughout? Consider factors like motivation and interest and writing tone.
  • Depth of information: level of technicality, focus, and emphasis. Which version provides sufficient detail and scientific rigor suitable for the reader’s background knowledge?
  • Accessibility: Does the text accommodate potential learning/reading disabilities (e.g., dyslexia)?
3. Comprehension Support:
  • Are scientific concepts conveyed clearly and at an appropriate level?
  • Are there features that facilitate memory and comprehension (e.g., a summary section that reiterates the main points and important concepts, bullet points, highlighted key terms, relatable examples, and analogies)?
  • Quality of examples?

References

  1. Abbes, F.; Bennani, S.; Maalel, A. Generative AI and gamification for personalized learning: Literature review and future challenges. SN Comput. Sci. 2024, 5, 1154. [Google Scholar] [CrossRef]
  2. McNamara, D.S.; Magliano, J. Toward a comprehensive model of comprehension. Psychol. Learn. Motiv. 2009, 51, 297–384. [Google Scholar]
  3. Calloway, R.C.; Helder, A.; Perfetti, C.A. A measure of individual differences in readers’ approaches to text and its relation to reading experience and reading comprehension. Behav. Res. Methods 2023, 55, 899–931. [Google Scholar] [CrossRef] [PubMed]
  4. Smith, R.; Snow, P.; Serry, T.; Hammond, L. The role of background knowledge in reading comprehension: A critical review. Read. Psychol. 2021, 42, 214–240. [Google Scholar] [CrossRef]
  5. Leong, J.; Pataranutaporn, P.; Danry, V.; Perteneder, F.; Mao, Y.; Maes, P. Putting things into context: Generative AI-enabled context personalization for vocabulary learning improves learning motivation. In Proceedings of the CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 11–16 May 2024; pp. 1–15. [Google Scholar]
  6. Crossley, S.A.; Skalicky, S.; Dascalu, M.; McNamara, D.; Kyle, K. Predicting text comprehension, processing, and familiarity in adult readers: New approaches to readability formulas. Discourse Process. 2017, 54, 340–359. [Google Scholar] [CrossRef]
  7. Crossley, S.A.; Heintz, A.; Choi, J.S.; Batchelor, J.; Karimi, M.; Malatinszky, A. A large-scaled corpus for assessing text difficulty. Behav. Res. Methods 2023, 55, 491–507. [Google Scholar] [CrossRef]
  8. Natriello, G. The adaptive learning landscape. Teach. Coll. Rec. 2017, 119, 1–22. [Google Scholar] [CrossRef]
  9. du Boulay, B.; Poulovassilis, A.; Holmes, W.; Mavrikis, M. What does the research say about how artificial intelligence and big data can close the achievement gap? In Enhancing Learning and Teaching with Technology: What the Research Says; Luckin, R., Ed.; Routledge: London, UK, 2018; pp. 256–285. [Google Scholar]
  10. Kucirkova, N. Personalised learning with digital technologies at home and school: Where is children’s agency? In Mobile Technologies in Children’s Language and Literacy; Oakley, G., Ed.; Emerald Publishing: Bingley, UK, 2018; pp. 133–153. [Google Scholar]
  11. Dong, L.; Tang, X.; Wang, X. Examining the effect of artificial intelligence in relation to students’ academic achievement in classroom: A meta-analysis. Comput. Educ. Artif. Intell. 2025, 8, 100400. [Google Scholar] [CrossRef]
  12. Pesovski, I.; Santos, R.; Henriques, R.; Trajkovik, V. Generative AI for customizable learning experiences. Sustainability 2024, 16, 3034. [Google Scholar] [CrossRef]
  13. Eccles, J.S.; Wigfield, A. From expectancy-value theory to situated expectancy-value theory: A developmental, social cognitive, and sociocultural perspective on motivation. Contemp. Educ. Psychol. 2020, 61, 101859. [Google Scholar] [CrossRef]
  14. Bonifacci, P.; Viroli, C.; Vassura, C.; Colombini, E.; Desideri, L. The relationship between mind wandering and reading comprehension: A meta-analysis. Psychon. Bull. Rev. 2023, 30, 40–59. [Google Scholar] [CrossRef] [PubMed]
  15. Wang, X.; Huang, R.T.; Sommer, M.; Pei, B.; Shidfar, P.; Rehman, M.S.; Martin, F. The efficacy of AI-enabled adaptive learning systems from 2010 to 2022 on learner outcomes: A meta-analysis. J. Educ. Comput. Res. 2024, 62, 1568–1603. [Google Scholar] [CrossRef]
  16. Pane, J.F.; Steiner, E.D.; Baird, M.D.; Hamilton, L.S.; Pane, J.D. Informing Progress: Insights on Personalized Learning Implementation and Effects; Res. Rep. RR-2042-BMGF; Rand Corp.: Santa Monica, CA, USA, 2017. [Google Scholar]
  17. Walkington, C.; Bernacki, M.L. Personalizing algebra to students’ individual interests in an intelligent tutoring system: Moderators of impact. Int. J. Artif. Intell. Educ. 2019, 29, 58–88. [Google Scholar] [CrossRef]
  18. Simon, P.D.; Zeng, L.M. Behind the scenes of adaptive learning: A scoping review of teachers’ perspectives on the use of adaptive learning technologies. Educ. Sci. 2024, 14, 1413. [Google Scholar] [CrossRef]
  19. Kim, J.S.; Burkhauser, M.A.; Mesite, L.M.; Asher, C.A.; Relyea, J.E.; Fitzgerald, J.; Elmore, J. Improving reading comprehension, science domain knowledge, and reading engagement through a first-grade content literacy intervention. J. Educ. Psychol. 2021, 113, 3–26. [Google Scholar] [CrossRef]
  20. Hattan, C.; Alexander, P.A.; Lupo, S.M. Leveraging what students know to make sense of texts: What the research says about prior knowledge activation. Rev. Educ. Res. 2024, 94, 73–111. [Google Scholar] [CrossRef]
  21. Major, L.; Francis, G.A.; Tsapali, M. The effectiveness of technology-supported personalized learning in low- and middle-income countries: A meta-analysis. Br. J. Educ. Technol. 2021, 52, 1935–1964. [Google Scholar] [CrossRef]
  22. FitzGerald, E.; Jones, A.; Kucirkova, N.; Scanlon, E. A literature synthesis of personalised technology-enhanced learning: What works and why. Res. Learn. Technol. 2018, 26, 1–13. [Google Scholar] [CrossRef]
  23. Mesmer, H.A.E. Tools for Matching Readers to Texts: Research-Based Practices, 1st ed.; Guilford Press: New York, NY, USA, 2008; pp. 1–234. [Google Scholar]
  24. Lennon, C.; Burdick, H. The Lexile Framework as an Approach for Reading Measurement and Success. Available online: https://metametricsinc.com/wp-content/uploads/2017/07/The-Lexile-Framework-for-Reading.pdf (accessed on 10 June 2025).
  25. Stenner, A.J.; Burdick, H.; Sanford, E.E.; Burdick, D.S. How accurate are Lexile text measures. J. Appl. Meas. 2007, 8, 307–322. [Google Scholar]
  26. Gligorea, I.; Cioca, M.; Oancea, R.; Gorski, A.T.; Gorski, H.; Tudorache, P. Adaptive learning using artificial intelligence in e-learning: A literature review. Educ. Sci. 2023, 13, 1216. [Google Scholar] [CrossRef]
  27. Sharma, S.; Mittal, P.; Kumar, M.; Bhardwaj, V. The role of large language models in personalized learning: A systematic review of educational impact. Discov. Sustain. 2025, 6, 243. [Google Scholar] [CrossRef]
  28. Martínez, P.; Ramos, A.; Moreno, L. Exploring large language models to generate easy-to-read content. Front. Comput. Sci. 2024, 6, 1394705. [Google Scholar] [CrossRef]
  29. Xiao, C.; Xu, S.X.; Zhang, K.; Wang, Y.; Xia, L. Evaluating reading comprehension exercises generated by LLMs: A showcase of ChatGPT in education applications. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), Toronto, ON, Canada, 13 July 2023; pp. 610–625. [Google Scholar]
  30. Hadzhikoleva, S.; Rachovski, T.; Ivanov, I.; Hadzhikolev, E.; Dimitrov, G. Automated test creation using large language models: A practical application. Appl. Sci. 2024, 14, 9125. [Google Scholar] [CrossRef]
  31. Chu, Z.; Wang, S.; Xie, J.; Zhu, T.; Yan, Y.; Ye, J.; Wen, Q. LLM agents for education: Advances and applications. arXiv 2025, arXiv:2503.11733. [Google Scholar]
  32. Dunn, T.J.; Kennedy, M. Technology-enhanced learning in higher education: Motivators and demotivators of student engagement. Comput. Educ. 2019, 129, 13–22. [Google Scholar]
  33. Caccavale, F.; Gargalo, C.L.; Gernaey, K.V.; Krühne, U. Towards Education 4.0: The role of Large Language Models as virtual tutors in chemical engineering. Educ. Chem. Eng. 2024, 49, 1–11. [Google Scholar] [CrossRef]
  34. Alamri, H.A.; Watson, S.; Watson, W. Learning technology models that support personalization within blended learning environments in higher education. TechTrends 2021, 65, 62–78. [Google Scholar] [CrossRef]
  35. Crossley, S.A. Linguistic features in writing quality and development: An overview. J. Writ. Res. 2020, 11, 415–443. [Google Scholar] [CrossRef]
  36. Gao, R.; Merzdorf, H.E.; Anwar, S.; Hipwell, M.C.; Srinivasa, A.R. Automatic assessment of text-based responses in post-secondary education: A systematic review. Comput. Educ. Artif. Intell. 2024, 6, 100206. [Google Scholar] [CrossRef]
  37. McNamara, D.S.; Graesser, A.C.; McCarthy, P.M.; Cai, Z. Automated Evaluation of Text and Discourse with Coh-Metrix, 1st ed.; Cambridge University Press: Cambridge, UK, 2014; pp. 1–312. [Google Scholar]
  38. Lupo, S.M.; Tortorelli, L.; Invernizzi, M.; Ryoo, J.H.; Strong, J.Z. An exploration of text difficulty and knowledge support on adolescents’ comprehension. Read. Res. Q. 2019, 54, 457–479. [Google Scholar] [CrossRef]
  39. Allen, L.K.; Creer, S.C.; Öncel, P. Natural language processing as a tool for learning analytics—Towards a multi-dimensional view of the learning process. In Proceedings of the Artificial Intelligence in Education Conference (AIED 2022), Durham, UK, 27–31 July 2022. [Google Scholar]
  40. Nagy, W.; Townsend, D. Words as tools: Learning academic vocabulary as language acquisition. Read. Res. Q. 2012, 47, 91–108. [Google Scholar] [CrossRef]
  41. Snow, C.E. Academic language and the challenge of reading for learning about science. Science 2010, 328, 450–452. [Google Scholar] [CrossRef] [PubMed]
  42. Frantz, R.S.; Starr, L.E.; Bailey, A.L. Syntactic complexity as an aspect of text complexity. Educ. Res. 2015, 44, 387–393. [Google Scholar] [CrossRef]
  43. Ozuru, Y.; Dempsey, K.; McNamara, D.S. Prior knowledge, reading skill, and text cohesion in the comprehension of science texts. Learn. Instr. 2009, 19, 228–242. [Google Scholar] [CrossRef]
  44. Van den Broek, P.; Bohn-Gettelmann, S.; Kendeou, P.; White, M.J. Reading comprehension and the comprehension-monitoring activities of children with learning disabilities: A review using the dual-process theory of reading. Educ. Psychol. Rev. 2015, 27, 641–644. [Google Scholar]
  45. O’Reilly, T.; McNamara, D.S. Reversing the reverse cohesion effect: Good texts can be better for strategic, high-knowledge readers. Discourse Process. 2007, 43, 121–152. [Google Scholar] [CrossRef]
  46. Pressley, M.; Afflerbach, P. Verbal Protocols of Reading: The Nature of Constructively Responsive Reading; Routledge: New York, NY, USA, 2012. [Google Scholar]
  47. Smallwood, J.; Schooler, J.W. The restless mind. Psychol. Bull. 2006, 132, 946–957. [Google Scholar] [CrossRef]
  48. Unsworth, N.; McMillan, B.D. Mind wandering and reading comprehension: Examining the roles of working memory capacity, interest, motivation, and topic experience. J. Exp. Psychol. Learn. Mem. Cogn. 2013, 39, 832–848. [Google Scholar] [CrossRef]
  49. Lind, F.; Gruber, M.; Boomgaarden, H.G. Content analysis by the crowd: Assessing the usability of crowdsourcing for coding latent constructs. Commun. Methods Meas. 2017, 11, 191–209. [Google Scholar] [CrossRef]
  50. Beck, I.L.; McKeown, M.G.; Sinatra, G.M.; Loxterman, J.A. Revising social studies text from a text-processing perspective: Evidence of improved comprehensibility. Read. Res. Q. 1991, 26, 251–276. [Google Scholar] [CrossRef]
  51. Potter, A.; Shortt, M.; Goldshtein, M.; Roscoe, R.D. Assessing academic language in tenth-grade essays using natural language processing. Assess. Writ. 2025, in press. [CrossRef]
  52. Srivastava, A.; Rastogi, A.; Rao, A.; Shoeb, A.A.M.; Abid, A.; Fisch, A.; Wang, G. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arXiv 2022, arXiv:2206.04615. [Google Scholar]
  53. Gao, T.; Jin, J.; Ke, Z.T.; Moryoussef, G. A comparison of DeepSeek and other LLMs. arXiv 2025, arXiv:2502.03688. [Google Scholar]
  54. Rosenfeld, A.; Lazebnik, T. Whose LLM is it anyway? Linguistic comparison and LLM attribution for GPT-3.5, GPT-4 and Bard. arXiv 2024, arXiv:2402.14533. [Google Scholar]
  55. Reviriego, P.; Conde, J.; Merino-Gómez, E.; Martínez, G.; Hernández, J.A. Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans. Mach. Learn. Appl. 2024, 18, 100602. [Google Scholar] [CrossRef]
  56. Lee, N.; Hong, J.; Thorne, J. Evaluating the consistency of LLM evaluators. arXiv 2024, arXiv:2412.00543. [Google Scholar]
  57. Arner, T.; McCarthy, K.S.; McNamara, D.S. iSTART StairStepper—Using comprehension strategy training to game the test. Computers 2021, 10, 48. [Google Scholar] [CrossRef]
  58. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
  59. Reynolds, L.; McDonell, K. Prompt programming for large language models: Beyond the few-shot paradigm. In Proceedings of the Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 8–13 May 2021; pp. 1–7. [Google Scholar]
  60. Zhou, Y.; Muresanu, A.I.; Han, Z.; Paster, K.; Pitis, S.; Chan, H.; Ba, J. Large language models are human-level prompt engineers. arXiv 2023, arXiv:2305.10412. [Google Scholar]
  61. Barthakur, A.; Dawson, S.; Kovanovic, V. Advancing learner profiles with learning analytics: A scoping review of current trends and challenges. In Proceedings of the 13th International Learning Analytics and Knowledge Conference (LAK23), Arlington, TX, USA, 13–17 March 2023; pp. 606–612. [Google Scholar]
  62. Hu, S. The effect of artificial intelligence-assisted personalized learning on student learning outcomes: A meta-analysis based on 31 empirical research papers. Sci. Insights Educ. Front. 2024, 24, 3873–3894. [Google Scholar] [CrossRef]
  63. Lagos-Castillo, A.; Chiappe, A.; Ramirez-Montoya, M.S.; Rodríguez, D.F.B. Mapping the intelligent classroom: Examining the emergence of personalized learning solutions in the digital age. Contemp. Educ. Technol. 2025, 17, ep543. [Google Scholar] [CrossRef] [PubMed]
  64. Baird, M.D.; Pane, J.F. Translating standardized effects of education programs into more interpretable metrics. Educ. Res. 2019, 48, 217–228. [Google Scholar] [CrossRef]
  65. Marvin, G.; Hellen, N.; Jjingo, D.; Nakatumba-Nabende, J. Prompt engineering in large language models. In Proceedings of the International Conference on Data Intelligence and Cognitive Informatics, Tirunelveli, India, 27–28 June 2023; pp. 387–402. [Google Scholar]
  66. Sahoo, P.; Singh, A.K.; Saha, S.; Jain, V.; Mondal, S.; Chadha, A. A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv 2024, arXiv:2402.07927. [Google Scholar]
  67. Bisk, Y.; Zellers, R.; Bras, R.L.; Gao, J.; Choi, Y. Experience grounds language. arXiv 2020, arXiv:2004.10151. [Google Scholar]
  68. Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Kiela, D. Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474. [Google Scholar]
  69. Zhao, R.; Chen, H.; Wang, W.; Jiao, F.; Do, X.L.; Qin, C.; Joty, S. Retrieving multimodal information for augmented generation: A survey. arXiv 2023, arXiv:2303.10868. [Google Scholar]
  70. Kintsch, W. Revisiting the construction-integration model of text comprehension and its implications for instruction. In Theoretical Models and Processes of Literacy, 6th ed.; Routledge: New York, NY, USA, 2018; pp. 178–203. [Google Scholar]
  71. Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 2022, 35, 24824–24837. [Google Scholar]
  72. Cain, K.; Oakhill, J. Children’s Comprehension Problems in Oral and Written Language: A Cognitive Perspective; Guilford Press: New York, NY, USA, 2008. [Google Scholar]
  73. Perfetti, C.; Stafura, J. Word knowledge in a theory of reading comprehension. Sci. Stud. Read. 2014, 18, 22–37. [Google Scholar] [CrossRef]
  74. Muñoz-Ortiz, A.; Gómez-Rodríguez, C.; Vilares, D. Contrasting linguistic patterns in human and LLM-generated news text. Artif. Intell. Rev. 2024, 57, 265. [Google Scholar] [CrossRef]
  75. Unsworth, N.; Engle, R.W. The nature of individual differences in working memory capacity: Active maintenance in primary memory and controlled search from secondary memory. Psychol. Rev. 2007, 114, 104–132. [Google Scholar] [CrossRef]
  76. Eccles, J.S.; Wigfield, A. Motivational beliefs, values, and goals. Annu. Rev. Psychol. 2002, 53, 109–132. [Google Scholar] [CrossRef] [PubMed]
  77. Fletcher, J.M.; Lyon, G.R.; Fuchs, L.S.; Barnes, M.A. Learning Disabilities: From Identification to Intervention, 2nd ed.; Guilford Press: New York, NY, USA, 2018; pp. 1–350. [Google Scholar]
  78. Ladson-Billings, G. Culturally relevant pedagogy 2.0: Aka the remix. Harv. Educ. Rev. 2014, 84, 74–84. [Google Scholar] [CrossRef]
  79. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models are Unsupervised Multitask Learners. OpenAI Blog. 2019. Available online: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf (accessed on 10 June 2025).
  80. Dornburg, A.; Davin, K. To what extent is ChatGPT useful for language teacher lesson plan creation? arXiv 2024, arXiv:2407.09974. [Google Scholar]
  81. Bang, Y.; Cahyawijaya, S.; Lee, N.; Dai, W.; Su, D.; Wilie, B.; Fung, P. A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity. arXiv 2023, arXiv:2302.04023. [Google Scholar]
  82. Ji, Z.; Lee, N.; Frieske, R.; Yu, T.; Su, D.; Xu, Y.; Fung, P. Survey of hallucination in natural language generation. ACM Comput. Surv. 2023, 55, 1–38. [Google Scholar] [CrossRef]
  83. Maynez, J.; Narayan, S.; Bohnet, B.; McDonald, R. On faithfulness and factuality in abstractive summarization. arXiv 2020, arXiv:2005.00661. [Google Scholar]
  84. Ward, B.; Bhati, D.; Neha, F.; Guercio, A. Analyzing the impact of AI tools on student study habits and academic performance. arXiv 2024, arXiv:2412.02166. [Google Scholar]
  85. Zheng, L.; Niu, J.; Zhong, L.; Gyasi, J.F. The effectiveness of artificial intelligence on learning achievement and learning perception: A meta-analysis. Interact. Learn. Environ. 2023, 31, 5650–5664. [Google Scholar] [CrossRef]
Figure 1. Academic writing as a function of prompt and reader profile. Source: authors’ contribution.
Figure 2. Sentence cohesion as a function of prompt and reader profile. Source: authors’ contribution.
Figure 3. Language variety as a function of prompt and reader profile. Source: authors’ contribution.
Figure 4. Sophisticated wording as a function of prompt and reader profile. Source: authors’ contribution.
Table 1. Linguistic features related to text readability. Source: authors’ contribution.
Overall Readability:
  • Flesch–Kincaid Grade Level (FKGL): indicates text difficulty based on sentence length and word length
  • Academic writing: the extent to which the text includes domain-specific words and sophisticated sentence structures commonly found in academic writing
  • Development of ideas: the extent to which ideas and concepts are developed and elaborated throughout a text
Conceptual Density and Cohesion:
  • Noun-to-verb ratio: text with a high noun-to-verb ratio presents dense information and complex sentences that require greater cognitive effort to process
  • Sentence cohesion: the extent to which the text contains connectives and cohesion cues (e.g., repeating ideas and concepts)
Syntax Complexity:
  • Sentence length: longer sentences often have more clauses and complex structure
  • Language variety: the extent to which the text varies in the language used (sentence structures and wording)
Lexical Complexity:
  • Sophisticated wording: lower measures indicate familiar and common vocabulary, whereas higher measures indicate more advanced words
  • Academic frequency: the extent to which sophisticated vocabulary that is also common in academic texts is used
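The FKGL metric in Table 1 follows the standard Flesch–Kincaid formula, FKGL = 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The sketch below illustrates the computation with a rough vowel-group syllable heuristic; production readability tools use dictionary-based syllabification and more careful tokenization, so this is an approximation rather than the tool used in this study.

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as vowel groups; a rough stand-in for the
    dictionary-based syllabification used by production readability tools."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)

print(round(fkgl("Bacteria are single-celled organisms. They lack a nucleus."), 2))
```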
Table 2. Descriptions of various reader profiles provided to the LLMs in Experiment 1. Source: authors’ contribution.
Descriptions
Reader 1
(High RS/High PK *)
  • Young professional (25-year-old) who has a strong educational background in science and engineering. Interested in physics
  • High prior knowledge in natural sciences subjects, especially physics
  • Learning goal: To develop an in-depth understanding of scientific concepts and principles to pass a class exam
Reader 2
(High RS/Low PK *)
  • Young adult (20-year-old) who is a curious learner. Enjoys learning about science in an engaging and accessible way
  • Low background knowledge in natural sciences subjects
  • Learning goal: To develop an understanding of scientific concepts and principles to pass a class exam
Reader 3
(Low RS/High PK *)
  • College student (18-year-old) struggles with reading because of dyslexia. Feels anxious and frustrated when reading complex scientific texts. Lacks confidence in reading comprehension
  • High background knowledge in natural science subjects
  • Learning goal: To develop an understanding of scientific concepts and principles to pass a class exam
Reader 4
(Low RS/Low PK *)
  • Student (18-year-old) who is looking for a fun and engaging introduction to scientific concepts. Enjoys imaginative and relatable explanations
  • Limited science knowledge and low understanding of scientific terms.
  • Learning goal: To develop an understanding of scientific concepts and principles
* RS = reading skills; PK = prior knowledge.
Table 3. Scientific texts. Source: authors’ contribution.
Domain | Text Title | Word Count | FKGL *
Biology | Bacteria | 468 | 12.10
Biology | The Cells | 426 | 11.61
Chemistry | Chemistry of Life | 436 | 12.71
Biology | Genetic Equilibrium | 441 | 12.61
Biology | Food Webs | 492 | 12.06
Biology | Patterns of Evolution | 341 | 15.09
Biology | Causes and Effects of Mutations | 318 | 11.35
Physics | What are Gravitational Waves? | 359 | 16.51
Biochemistry | Photosynthesis | 427 | 11.44
Biology | Microbes | 407 | 14.38
* Flesch–Kincaid Grade Level.
Table 4. Hypothesized linguistic features of adapted texts aligned to reader profiles. Source: authors’ contribution.
Reader 1 (High RS/High PK)
  Overall readability:
    • High Flesch–Kincaid Grade Level (FKGL *), high complexity
    • High academic writing
  Conceptual density and cohesion:
    • Low cohesion to encourage active engagement
    • High information density
  Syntax and lexical complexity:
    • Complex sentence structures, high sentence length
    • Sophisticated vocabulary
    • Varied language and syntax
Reader 2 (High RS/Low PK)
  Overall readability:
    • Moderate text complexity
    • Moderate academic writing
  Conceptual density and cohesion:
    • Introduce scientific concepts with elaborated explanations and examples
    • Moderate cohesion
  Syntax and lexical complexity:
    • Complex sentence structures, moderate sentence length
    • Sophisticated vocabulary
    • Varied language and syntax
Reader 3 (Low RS/High PK)
  Overall readability:
    • Moderate text complexity
    • Moderate academic writing
  Conceptual density and cohesion:
    • Elaborated explanations of key concepts
    • High cohesion
  Syntax and lexical complexity:
    • Simplified sentence structures, moderate sentence length
    • Minimal technical vocabulary
    • Clear and concise language
Reader 4 (Low RS/Low PK)
  Overall readability:
    • Low FKGL *, low text complexity
    • Low academic writing
  Conceptual density and cohesion:
    • Elaborated explanations of key concepts
    • High cohesion
  Syntax and lexical complexity:
    • Simplified sentence structures, short sentence length
    • Minimal technical vocabulary
    • Clear and concise language
* Flesch–Kincaid Grade Level.
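Table 4 can also be read as a machine-actionable specification: an NLP validation pipeline could encode the hypothesized direction of each feature per profile and compare it against the measured values of an adapted text. The dictionary below is one illustrative encoding; the keys and level labels are assumptions for this sketch, not the representation used in this study.

```python
# Illustrative encoding of Table 4 as target feature directions per profile,
# which an automated validation step could compare against measured features.
PROFILE_TARGETS = {
    "reader_1_high_rs_high_pk": {"fkgl": "high", "academic_writing": "high",
                                 "cohesion": "low", "sentence_length": "high",
                                 "vocabulary": "sophisticated"},
    "reader_2_high_rs_low_pk":  {"fkgl": "moderate", "academic_writing": "moderate",
                                 "cohesion": "moderate", "sentence_length": "moderate",
                                 "vocabulary": "sophisticated"},
    "reader_3_low_rs_high_pk":  {"fkgl": "moderate", "academic_writing": "moderate",
                                 "cohesion": "high", "sentence_length": "moderate",
                                 "vocabulary": "minimal technical"},
    "reader_4_low_rs_low_pk":   {"fkgl": "low", "academic_writing": "low",
                                 "cohesion": "high", "sentence_length": "short",
                                 "vocabulary": "minimal technical"},
}
```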
Table 5. Descriptive statistics and main effects of reader profiles. Source: authors’ contribution.
Linguistic Features | Reader 1 (High RS/High PK *) M (SD) | Reader 2 (High RS/Low PK *) M (SD) | Reader 3 (Low RS/High PK *) M (SD) | Reader 4 (Low RS/Low PK *) M (SD) | F(3, 320) | p | η²
FKGL | 16.97 (2.36) | 10.63 (1.76) | 9.84 (1.89) | 8.50 (1.39) | 355.64 | <0.001 | 0.79
Academic Writing | 89.93 (12.53) | 45.08 (24.86) | 37.20 (23.25) | 17.17 (15.95) | 228.05 | <0.001 | 0.70
Idea Development | 57.38 (28.05) | 47.40 (25.37) | 48.19 (23.94) | 45.19 (23.95) | 4.97 | 0.002 | 0.05
Sentence Cohesion | 55.00 (32.55) | 50.30 (29.19) | 40.85 (23.96) | 48.81 (26.84) | 2.67 | 0.04 | 0.03
Noun-to-Verb Ratio | 2.81 (0.62) | 1.93 (0.25) | 1.84 (0.31) | 1.87 (0.25) | 133.37 | <0.001 | 0.58
Sentence Length | 20.91 (6.59) | 18.70 (4.46) | 14.75 (3.27) | 16.31 (3.64) | 30.42 | <0.001 | 0.24
Language Variety | 75.75 (21.26) | 54.07 (21.43) | 27.14 (18.46) | 33.88 (18.85) | 112.79 | <0.001 | 0.54
Sophisticated Word | 90.23 (9.97) | 42.87 (19.35) | 31.17 (17.71) | 23.50 (13.59) | 342.11 | <0.001 | 0.78
Academic Frequency | 9591.39 (1425.57) | 8708.02 (1328.34) | 7763.13 (1426.14) | 8016.06 (1308.47) | 30.42 | <0.001 | 0.24
* RS = reading skills; PK = prior knowledge.
Table 6. Descriptive statistics and main effects of LLMs. Source: authors’ contribution.
Linguistic Features | Claude M (SD) | Llama M (SD) | Gemini M (SD) | ChatGPT M (SD) | F(3, 320) | p | η²
FKGL | 11.13 (4.34) | 11.87 (3.18) | 11.23 (3.91) | 11.72 (3.53) | 3.35 | 0.02 | 0.03
Academic Writing | 44.43 (34.11) | 54.14 (32.34) | 45.47 (34.45) | 45.34 (31.29) | 4.70 | 0.01 | 0.05
Idea Development | 59.97 (22.63) | 33.77 (16.90) | 51.52 (23.93) | 52.91 (30.29) | 19.58 | <0.001 | 0.17
Sentence Cohesion | 30.38 (24.15) | 60.86 (24.71) | 52.65 (25.32) | 51.06 (31.06) | 20.30 | <0.001 | 0.17
Noun-to-Verb Ratio | 2.25 (0.80) | 2.11 (0.48) | 2.06 (0.43) | 2.03 (0.43) | 6.27 | <0.001 | 0.06
Sentence Length | 14.71 (3.91) | 18.68 (5.06) | 18.55 (4.25) | 18.73 (6.22) | 17.68 | <0.001 | 0.16
Language Variety | 47.61 (28.43) | 38.07 (27.84) | 55.51 (25.77) | 49.64 (25.68) | 12.21 | <0.001 | 0.11
Sophisticated Word | 46.21 (31.90) | 46.55 (26.74) | 47.58 (30.07) | 47.43 (32.51) | 0.15 | 0.93 | 0.00
Academic Frequency | 7851.69 (1465.06) | 9420.10 (1569.22) | 8646.06 (1291.46) | 8342.75 (1412.02) | 21.48 | <0.001 | 0.18
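The main effects reported in Tables 5 and 6 are one-way ANOVAs with eta squared (η²) effect sizes. The sketch below shows how such a test could be computed, assuming a long-format table with one row per adapted text; the DataFrame and column names are illustrative, not the authors' analysis scripts.

```python
import pandas as pd
from scipy import stats

def anova_with_eta_squared(df: pd.DataFrame, group_col: str, feature: str):
    """One-way ANOVA across groups plus eta squared (SS_between / SS_total)."""
    groups = [g[feature].to_numpy() for _, g in df.groupby(group_col)]
    f_stat, p_value = stats.f_oneway(*groups)
    grand_mean = df[feature].mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((df[feature] - grand_mean) ** 2).sum()
    return f_stat, p_value, ss_between / ss_total

# e.g., main effect of reader profile on FKGL (names are assumptions):
# f, p, eta2 = anova_with_eta_squared(texts_df, "profile", "fkgl")
```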
Table 7. Modified descriptions of various reader profiles provided to the LLMs in Experiment 2. Source: authors’ contribution.
Descriptions
Reader 1
(High RS/High PK *)
Age: 25
Educational level: Senior
Major: Chemistry (Pre-med)
ACT English composite score: 32/36 (performance is in the 96th percentile)
ACT Reading composite score: 32/36 (performance is in the 96th percentile)
ACT Math composite score: 28/36 (performance is in the 89th percentile)
ACT Science composite score: 30/36 (performance is in the 94th percentile)
Science background: Completed eight required biology, physics, and chemistry college-level courses (comprehensive academic background in the sciences, covering advanced topics in biology, chemistry, and physics, well-prepared for higher-level scientific learning and analysis)
Reading goal: Understand scientific concepts and principles
Reader 2
(High RS/Low PK *)
Age: 20
Educational level: Sophomore
Major: Psychology
ACT English composite score: 32/36 (performance is in the 96th percentile)
ACT Reading composite score: 31/36 (performance is in the 94th percentile)
ACT Math composite score: 18/36 (performance is in the 42nd percentile)
ACT Science composite score: 19/36 (performance is in the 46th percentile)
Science background: Completed one high-school-level chemistry course (no advanced science courses); limited exposure to and understanding of scientific concepts
Interests/Favorite subjects: arts, literature
Reading goal: Understand scientific concepts and principles
Reader 3
(Low RS/High PK *)
Age: 20
Educational level: Sophomore
Major: Health Science
ACT English composite score: 19/36 (performance is in the 44th percentile)
ACT Reading composite score: 20/36 (performance is in the 47th percentile)
ACT Math composite score: 32/36 (performance is in the 97th percentile)
ACT Science composite score: 30/36 (performance is in the 94th percentile)
Science background: Completed one physics, one astronomy, and two college-level biology courses (substantial prior knowledge in science, having completed multiple college-level courses across several disciplines, strong foundation in scientific principles and concepts)
Reading goal: Understand scientific concepts
Reading disability: Dyslexia
Reader 4
(Low RS/Low PK *)
Age: 18
Educational level: Freshman
Major: Marketing
ACT English composite score: 17/36 (performance is in the 33rd percentile)
ACT Reading composite score: 18/36 (performance is in the 36th percentile)
ACT Math composite score: 19/36 (performance is in the 48th percentile)
ACT Science composite score: 17/36 (performance is in the 34th percentile)
Science background: Completed one high-school-level biology course (no advanced science courses); limited exposure to and understanding of scientific concepts
Reading goal: Understand scientific concepts
* RS = reading skills; PK = prior knowledge.