1. Introduction
Throughout history, people have devoted time and creativity to creating and telling stories, which have long been an essential tool for transmitting knowledge, values, and life experiences, among many other things. The key point is that narrative organizes thought, fosters creativity, and strengthens social and emotional skills [1,2]. In educational contexts, this translates into a powerful means of facilitating learning, especially considering the cognitive diversity of students. The objectives of this study are as follows: to analyze the dynamics of entropy, compare architectures, assess diversity, and study the applicability of the approach in the field of neurodiversity.
In this work, the term neurodiversity is used broadly to refer to cognitive and learning variability, including conditions such as autism spectrum disorder (ASD), attention-deficit/hyperactivity disorder (ADHD), and learning-related cognitive differences. While this study does not include any clinical assessment, the paper discusses how flexible probabilistic text-generation systems may, in the future, support the reading and interaction preferences of cognitively diverse readers, including those with attention deficit disorder and learning disorders. It is important to ensure that the narrative allows for understanding of certain topics, such as the expression of emotions, thereby enabling the building of social connections [3,4,5]. Currently, with the growth of generative artificial intelligence, the automatic generation of narratives and stories has become a very broad field, creating new opportunities for personalized interaction adjusted to the reader’s cognitive style. However, assessing semantic consistency, entity continuity, and lexical variability in the generated text sequences remains challenging due to the intricacies of human language [6,7]. In this work, we use “narrative” in an operational sense to refer to short sequential text structures produced by probabilistic language models, rather than the fully developed narrative constructs of literary theory.
This paper analyzes simple statistical models, N-grams, which predict words based on the preceding sequence; their main problem is that they lose coherence over long sequences. RNNs and LSTMs, on the other hand, tend to overcome this limitation through memory and the use of long-term strategies [8]. Furthermore, attention strategies make it possible to capture long-term relationships more efficiently [9]. However, there is still a long way to go, as the vast majority of studies do not consider the application of AI for neurodiversity, where coherence, diversity, and consistency of entities must be considered essential [10,11]. Compared to GPT-style LLMs, this work focuses on interpretability.
This research focuses on analyzing the behavior of entropy and the dynamics of probabilistic text generation, applying various generative language architectures (N-grams, simple RNNs, LSTMs, and Transformers). Several elements are analyzed across architectures and random seeds, such as entropy, robustness between seeds, lexical diversity, repetition, and entity consistency.
Furthermore, it is important to emphasize that the proposal does not focus on the evolution of narrative quality from a literary or narrative theory perspective, but rather examines the semantics and statistics of text sequences. An analysis of the relevance to applications related to the topic of neurodiversity is also conducted, considering that this may be a future area of application.
Generative artificial intelligence has given rise to a vast field of automated narrative generation. The literature reflects this growth and emphasizes the importance of ensuring the following:
These are relevant in educational and cognitive adaptation settings. Recent studies show that, although there have been improvements in both fluency and contextual modeling, issues such as repetition, semantic drift, and narrative degeneration still need to be addressed [12,13,14]. Another area of interest involves approaches based on information theory and entropy analysis, which are viewed as tools with potential future applicability to support an understanding of:
Generation stability;
Predictive uncertainty;
Semantic degradation in linguistic models [15,16].
The literature also indicates that studies grounded in entropy dynamics show that token-level trajectories can reveal unstable patterns associated with inconsistent or low-quality results [17].
Research on neuro-symbolic and cognitively inspired approaches suggests that working on memory, emotional regulation, and structured control is beneficial for improving long-term narrative ability [18,19]. If applied, this research could open up a very broad field in educational settings focused on neurodiversity, as it would emphasize the use of adaptive narratives; this could in turn support reading comprehension, emotional development, and cognitive interaction styles [20,21].
2. Related Work
This section presents research related to the topic of this study. Early work with n-grams showed that they cannot maintain narrative coherence over the long term [22]. Based on this, recurrent neural networks (RNNs) were analyzed and developed for sequential language modeling [23]. Later, long short-term memory (LSTM) networks were introduced, which were found to improve the handling of two important aspects, temporal dependencies and contextual retention, with a focus on text generation tasks [8]. However, this was not sufficient, as these models continued to suffer from vanishing gradients and instability when generating long narratives [8].
Reinforcement learning and controllable attribute modeling have been widely applied to better align generated outputs with human preferences and conversational quality metrics [24]. In a similar vein, unlikelihood training techniques were proposed to directly discourage the model from producing repetitive tokens during decoding. Together, these methods have shown gains in narrative diversity and have helped reduce degeneration issues in long-form text generation [25].
Most prior work, however, concentrates on fluency, perplexity, or subjective human preference scores, rather than tracking how entropy changes across narrative structures over time.
This research gap provides the motivation for the present study, which systematically investigates entropy evolution across different neural architectures and connects entropy patterns with lexical diversity, repetition behavior, and entity coherence in narrative generation systems.
3. Materials and Methods
In this work, we created a reduced corpus inspired by the ROCStories dataset format [26] of short sequential text samples that have recurring semantic elements such as a child, a forest, and a dragon. The corpus was intentionally constrained so that the development of entropy, lexical diversity, repetition behavior, and consistency of entities could be studied in a controlled fashion over the training epochs. Therefore, the generated sequences must be understood as simplified semantic narrative structures and not as complete literary stories with complex narrative arcs or character development.
This allows for an evaluation with a narrative that maintains consistency in short, comprehensible sequences.
For the evaluation process, the texts were processed and tokenized, and padding was applied to standardize the length of the text sequences.
N-gram: An N-gram model estimates the probability of a word appearing in a given context. Such models are trained on text corpora, and their uses include natural language processing, speech recognition, machine translation, and text prediction systems [27].
Simple RNN: Recurrent neural networks (RNNs) were designed to process sequential data, such as text, voice signals, and time series [28], in which the order of elements is essential. The main feature of RNNs is the presence of recurrent connections, through which the output of a neuron at a given time step is fed back and incorporated as input at the next time step. This allows contextual information to be carried along the sequence, enabling, in principle, the modeling of long-term temporal dependencies.
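The recurrent connection described above can be sketched as a single update, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b). The code below is an illustrative pure-Python version with hypothetical toy weights; it is not the 128-unit model trained in this work.

```python
import math


def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One simple-RNN step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    h_new = []
    for i in range(len(b)):
        pre = b[i]
        pre += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        pre += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(pre))
    return h_new


def run_rnn(inputs, W_xh, W_hh, b):
    """Feed a sequence through the cell, carrying the hidden state forward."""
    h = [0.0] * len(b)
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b)
    return h


# Hypothetical toy parameters (2 inputs, 2 hidden units).
W_xh = [[0.5, 0.0], [0.0, 0.5]]
W_hh = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]

h_with_signal = run_rnn([[1.0, 0.0], [0.0, 0.0]], W_xh, W_hh, b)
h_without_signal = run_rnn([[0.0, 0.0], [0.0, 0.0]], W_xh, W_hh, b)
```

The two runs differ only in their first input, yet the final hidden states differ, which is the sense in which the recurrent state keeps earlier context available at later steps.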
LSTM: A type of recurrent neural network that works with sequential data, including video and sound, among others. LSTMs are used because they allow past information to be remembered over long periods of time, resulting in a more effective network [29,30].
Transformer: A deep learning architecture originally developed as a sequence-to-sequence model for translation [31]. Some studies indicate that pre-trained models based on the Transformer can perform very well [32]. In addition to NLP, it has been applied in other areas, such as vision, audio processing, and speech [33,34,35].
3.1. Training
A small corpus of stories was created around three recurring elements: a child, a forest, and a dragon. The main objective is to analyze how the models learn to generate narration. Each sentence was tokenized at the word level and then converted into numerical sequences. Special tokens (<PAD>, <UNK>) were added to handle sequence normalization and out-of-vocabulary words. During training, to keep a fixed input length, we truncated sequences longer than 20 tokens and padded shorter sequences to 20 tokens. Three architectures were compared: a simple RNN, a single-layer LSTM, and a causal Transformer with multi-head attention. The models were trained using the Adam optimizer for 100, 500, 1000, 1500, and 2000 epochs with each random seed (42–46), in order to confirm that the results are reproducible. Validation loss stabilizes after 900–1200 epochs without divergence. This helped each of the models learn the temporal and semantic relationships of tokens within the stories, providing a solid foundation for text generation.
RNN: 128 units; LSTM: 2 layers, 256 hidden units; Transformer: 4 heads, 2 layers, d_model = 128.
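The preprocessing described above (word-level tokenization, the <PAD> and <UNK> tokens, and a fixed length of 20) can be sketched as follows. Whitespace splitting, lowercasing, the index assignment, and the function names are assumptions made for illustration; the study's exact tokenizer settings are not reproduced here.

```python
PAD, UNK = "<PAD>", "<UNK>"
MAX_LEN = 20  # fixed input length stated in the text


def build_vocab(sentences):
    """Map each word to an integer id; ids 0 and 1 are reserved (assumed layout)."""
    vocab = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(sentence, vocab, max_len=MAX_LEN):
    """Convert a sentence to ids, truncating or padding to exactly max_len."""
    ids = [vocab.get(token, vocab[UNK]) for token in sentence.lower().split()]
    ids = ids[:max_len]  # truncate long sequences
    return ids + [vocab[PAD]] * (max_len - len(ids))  # pad short sequences


# Hypothetical example sentence, not taken from the study's corpus.
vocab = build_vocab(["the child saw the dragon"])
encoded = encode("the dragon flew", vocab)
```

In this example the unseen word "flew" is mapped to the <UNK> id, and the remaining positions are filled with the <PAD> id.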
In the interest of reproducibility, we explicitly describe all preprocessing steps, tokenization parameters, epoch settings, random seeds, and model hyperparameters. The corpus structure and generation templates used in the experiments can be shared as Supplementary Material or as a repository to enable replication: https://doi.org/10.6084/m9.figshare.32085366.
3.2. Metrics
Several metrics were used to evaluate different aspects of the generated narratives. To verify that the generated text maintains fluency, entropy and perplexity were measured, since they reflect uncertainty and consistency in token prediction. Lexical diversity (LexDiv) and Self-BLEU were used to quantify the variety of words and phrases within the generated texts and to check that the models did not repeat rigid patterns. The Flesch-Kincaid and Gunning Fog readability indices were also added. SweetSpot is a weighted combination of entropy, LexDiv, and Self-BLEU. Finally, we assessed entity consistency in terms of named-entity continuity in generated text sequences, computed using spaCy NER plus coreference resolution; the metric evaluates the proportion of correctly maintained entity references throughout generated narrative sequences. This metric provides an estimate of semantic referential stability, but it should not be taken as a full measure of narrative coherence in literary theory.
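To make the uncertainty and diversity metrics concrete, the sketch below computes Shannon entropy of a next-token distribution, perplexity over a sequence, and a type-token ratio. The text does not state the exact LexDiv formula, so the type-token ratio is used here only as an assumed stand-in, and the use of natural logarithms (nats) is likewise an assumption.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def perplexity(token_probs):
    """exp of the mean negative log-probability assigned to the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


def type_token_ratio(tokens):
    """Unique tokens divided by total tokens (assumed proxy for LexDiv)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Hypothetical values for illustration only.
uniform_entropy = entropy([0.25, 0.25, 0.25, 0.25])
ppl = perplexity([0.25, 0.25])
ttr = type_token_ratio("the child saw the dragon".split())
```

A uniform distribution over four tokens gives the maximum entropy for that vocabulary size (log 4), while a model that always assigns probability 0.25 to the observed token has a perplexity of 4.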
To further extend the evaluation framework, we also incorporate syntactic complexity measures to capture structural properties of the generated narratives.
Mean Sentence Length (MSL) was computed as a basic proxy for syntactic elaboration, capturing average sentence size in tokens. Metrics based on syntactic analysis were also used, specifically dependency distance and the average depth of the dependency tree, in order to calculate two aspects: the hierarchical and relational complexity of sentence structures. Finally, the subordination index was obtained to calculate the percentage of subordinate clauses relative to the main clauses, which provided a precise perspective on syntactic nesting.
3.3. Narrative Generation
Once trained, the models generated new stories starting with “once upon a time” using temperature-controlled sampling (temperature = 0.5) to maintain a balance between consistency and creativity. The resulting text sequences were then analyzed using each of the metrics indicated above. Boxplots, violin plots, and raincloud plots were used to display the results, combining summary statistics with the complete data distribution. This made it possible to compare the models simply and clearly in terms of creativity, coherence, and variability, offering a comprehensive and detailed view of their ability to generate coherent and varied narratives.
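Temperature-controlled sampling can be sketched as dividing the logits by the temperature before the softmax and then drawing a token from the resulting distribution. The logits below are hypothetical, and the function names are illustrative rather than taken from the study's code.

```python
import math
import random


def softmax_with_temperature(logits, temperature=0.5):
    """Softmax over logits / temperature; lower temperature sharpens the distribution."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def sample_token(logits, temperature=0.5, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical logits for a three-token vocabulary.
logits = [2.0, 1.0, 0.0]
probs_t05 = softmax_with_temperature(logits, temperature=0.5)
probs_t10 = softmax_with_temperature(logits, temperature=1.0)
token = sample_token(logits, temperature=0.5, rng=random.Random(42))
```

At a temperature of 0.5 the most likely token receives more probability mass than at 1.0, which lowers the entropy of the sampling distribution and favors consistency over variety.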
3.4. Hardware/Software
The software used was Python 3.11, PyTorch 2.2, and TensorFlow, and the hardware used was Intel Core i7, RAM 32 GB, and Windows 11.
4. Results
The results of analyzing the behavior of recurrent and attention-based linguistic models are presented. The objective is to examine the functioning of key elements such as:
Entropy;
Robustness generated between random seeds;
The impact of lexical diversity;
The effects caused by repetition;
Narrative coherence.
4.1. Evolution of Entropy Throughout Training Periods
The evolution of predictive entropy is examined as a function of training duration, which helps to understand the change in uncertainty as learning is generated in different architectures.
Figure 1, Figure 2 and Figure 3 show the evolution of the entropy metric as a function of the number of epochs for different seeds. The patterns of initial variability and subsequent stabilization are indicated, allowing factors such as generative diversity and model convergence to be evaluated.
4.2. Variability and Robustness Between Random Seeds
Figure 1. Entropy evolution (LSTM).
Figure 2. Entropy evolution (RNN).
Figure 3. Entropy evolution (TRF).
Figure 4, Figure 5 and Figure 6 show the evolution of the SeedVariability metric during training, demonstrating the model’s sensitivity to random initialization.
Figure 7, Figure 8 and Figure 9 show the evolution of the lexical diversity metric (LexDiv) as a function of the number of epochs for different seeds, indicating the changes in the linguistic richness of the generated text.
4.3. Self-BLEU and Repetitiveness
This section presents how the degree of repetition in the generated results is evaluated by analyzing Self-BLEU scores throughout the training epochs.
Figure 10, Figure 11 and Figure 12 show the evolution of the SelfBLEU metric during training, thereby allowing for analysis of the degree of redundancy and internal similarity between the generated texts.
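As a rough illustration of how Self-BLEU captures internal similarity, the sketch below scores each generated text against all the others with a simplified BLEU (unigram and bigram clipped precision, geometric mean, brevity penalty) and averages the scores. The n-gram order, the absence of smoothing, and the whitespace tokenization are assumptions; the study's exact Self-BLEU configuration is not stated in the text.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hyp, refs, n):
    """Clipped n-gram precision of hyp against a set of references."""
    hyp_counts = ngrams(hyp, n)
    if not hyp_counts:
        return 0.0
    max_ref = Counter()
    for ref in refs:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())


def bleu(hyp, refs, max_n=2):
    """Simplified BLEU without smoothing: returns 0.0 if any precision is zero."""
    precisions = [modified_precision(hyp, refs, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    closest = min((abs(len(ref) - len(hyp)), len(ref)) for ref in refs)[1]
    brevity = 1.0 if len(hyp) > closest else math.exp(1 - closest / len(hyp))
    return brevity * math.exp(log_avg)


def self_bleu(texts, max_n=2):
    """Average BLEU of each text against all other texts in the set."""
    tokenized = [text.lower().split() for text in texts]
    scores = [
        bleu(hyp, tokenized[:i] + tokenized[i + 1:], max_n)
        for i, hyp in enumerate(tokenized)
    ]
    return sum(scores) / len(scores)


# Hypothetical generated samples for illustration only.
repetitive = ["the child saw the dragon"] * 3
varied = ["the child saw the dragon", "a forest hid many secrets"]
```

Higher Self-BLEU means the generated texts resemble one another more closely, so a value near 1 signals repetitive output and a value near 0 signals little overlap.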
4.4. Entity Consistency in Generated Narratives
This section presents how the coherence of entities in the generated narratives is evaluated.
Figure 13, Figure 14 and Figure 15 show the evolution of the entity consistency metric (EntityCons) during training, which helps indicate the model’s ability to maintain consistency in the use of narrative entities.
4.5. Optimal Training Epoch Detection
This section describes how the optimal training period is detected.
Figure 16, Figure 17 and Figure 18 show the evolution of the SweetSpot metric, which integrates diversity and consistency into a single indicator.
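One way to read the SweetSpot indicator is as a weighted score computed per training checkpoint, with the best-scoring epoch selected. The sketch below is only an assumption-laden illustration: the equal weights, the sign convention (repetition penalized through 1 − Self-BLEU), the normalization of all inputs to [0, 1], and the example values are hypothetical and are not the weighting used in the study.

```python
def sweet_spot(entropy, lex_div, self_bleu, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical weighted combination; inputs assumed normalized to [0, 1]."""
    w_entropy, w_lex, w_rep = weights
    # Assumed convention: reward diversity and uncertainty, penalize repetition.
    return w_entropy * entropy + w_lex * lex_div + w_rep * (1.0 - self_bleu)


def best_epoch(history, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return the epoch whose metrics give the highest combined score."""
    return max(history, key=lambda epoch: sweet_spot(*history[epoch], weights=weights))


# Hypothetical (entropy, LexDiv, Self-BLEU) values per checkpoint.
history = {
    100: (0.9, 0.2, 0.6),
    1000: (0.5, 0.7, 0.3),
    2000: (0.1, 0.4, 0.9),
}
selected = best_epoch(history)
```

With different weights the selected epoch can change, so the weighting itself is a design decision that should be reported alongside the result.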
4.6. Evolution of EntityCons
Figure 19, Figure 20, Figure 21, Figure 22, Figure 23 and Figure 24 show the evolution of the Entropy, SeedVariability, LexDiv, SelfBLEU, EntityCons, and SweetSpot metrics for all models and seeds. The comparative analysis identifies overall trends, regions of stability, and critical transitions during training, providing relevant information for the optimization of generative narratives aimed at neurodivergent users.
Figure 19. Evolution of EntityCons across epochs for all models and seeds.
4.7. Evolution of Entropy
Figure 20. Temporal evolution of Entropy across epochs for all models and seeds.
4.8. Evolution of LexDiv
Figure 21. Evolution of LexDiv across epochs for all models and seeds.
4.9. Evolution of SeedVariability
Figure 22. Evolution of SeedVariability across epochs for all models and seeds.
4.10. Evolution of SelfBLEU
Figure 23. Evolution of SelfBLEU across epochs for all models and seeds.
4.11. Evolution of SweetSpot
Figure 24. Evolution of SweetSpot across epochs for all models and seeds.
4.12. Evolution of Sweet Spot/Optimal Training Periods
Figure 25 shows the evolution of the periods in which an optimal balance between diversity and consistency is achieved.
Figure 26 shows heatmaps of the results to support their visual interpretation.
Figure 30 and Figure 31 show the quantification of variance across seeds using standard deviation and coefficient of variation.
Table 1 shows means ± standard deviations for Entropy, LexDiv, Self-BLEU, EntityCons, and SweetSpot.
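The seed-level summaries mentioned above (mean ± standard deviation and coefficient of variation) can be computed as in the following sketch. The use of the sample standard deviation (n − 1 denominator) is an assumption, and the per-seed values are hypothetical placeholders rather than results from the study.

```python
import statistics


def seed_summary(values):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1), assumed
    cv = std / mean if mean else float("nan")
    return mean, std, cv


# Hypothetical metric values for five seeds (e.g., seeds 42 to 46).
per_seed_values = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, std, cv = seed_summary(per_seed_values)
```

The coefficient of variation expresses the spread relative to the mean, which makes seed sensitivity comparable across metrics that live on different scales.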
5. Statistics
An ANOVA test was used to assess the results: p < 0.01 for entropy differences and p < 0.05 for LexDiv differences.
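For reference, the F statistic of a one-way ANOVA can be sketched as the ratio of between-group to within-group mean squares. The grouping below (one group of values per model) and the numbers are hypothetical; the text does not specify how the groups were formed, and this sketch does not compute the p-value.

```python
def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df)."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / n_total
    means = [sum(group) / len(group) for group in groups]
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2 for group, mean in zip(groups, means)
    )
    ss_within = sum(
        (value - mean) ** 2 for group, mean in zip(groups, means) for value in group
    )
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-model metric values, for illustration only.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
```

Converting the F statistic into a p-value requires the F distribution; if SciPy is available, `scipy.stats.f_oneway` returns both the statistic and the p-value.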
6. Discussion
In this work, we propose that the generative quality of probabilistic text-generation systems may benefit from evaluation across multiple interacting dimensions of the output, including uncertainty, lexical diversity, semantic consistency, and repetition behavior, beyond probability optimization or fluency measures. Special care is needed when these systems are intended to support neurodivergent individuals. Traditional optimization objectives focus primarily on maximizing probability and surface fluency, but these alone are insufficient for narratives that must balance predictability, variability, and coherence in a cognitively accessible manner. Just as people with autism spectrum disorders may benefit from greater predictability, those with ADHD may benefit from greater variability.
The analysis of each of the architectures evaluated shows a consistent pattern: with prolonged training, entropy tends to decrease progressively, which indicates that the model becomes more confident in its predictions. However, excessive minimization can lead to lower lexical diversity, higher repetition rates, and limited narratives. It should be noted that neurodiverse readers may lose interest if narratives are inflexible or repetitive, and even more so when there is no variation. Without variation and adaptation, their focus and comprehension are limited or even lost.
The comparison of results also shows that the choice of architecture is decisive in determining how uncertainty is distributed during training. Simpler recurrent models converge faster and become overconfident sooner, almost always to the detriment of expressiveness. In contrast, architectures such as LSTM and those based on attention tend to keep uncertainty under control over longer periods, allowing narratives to evolve with good lexical and structural variation and helping to maintain coherence. All of this is very important for developing narratives oriented towards neurodiversity.
In this type of process, entropy regularization is very important in mediating these effects. Given the very low certainty at the outset, entropy-regularized training helps to preserve a stable and controlled level of uncertainty, which favors narrative diversity without losing semantic coherence. In other words, rather than treating entropy as a passive diagnostic element, this study demonstrates its effectiveness as an active control mechanism that shapes generative behavior in a cognitively considered manner. A relevant point is that entropy-regularized models may also have lower sensitivity to random initialization, which reinforces their reliability for implementation in healthcare or educational contexts. Results are comparable to prior studies reporting LexDiv between 0.68 and 0.72.
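The text does not give the exact form of the regularizer, so the following is only a sketch under an assumed, commonly used formulation: the per-token loss is the cross-entropy minus a coefficient β times the entropy of the predicted distribution. The coefficient value and the example distributions are hypothetical.

```python
import math


def entropy_regularized_loss(probs, target_index, beta=0.01):
    """Assumed form: cross-entropy minus beta times predictive entropy."""
    cross_entropy = -math.log(probs[target_index])
    predictive_entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # Subtracting the entropy term rewards distributions that stay less peaked.
    return cross_entropy - beta * predictive_entropy


# Two hypothetical predictions that assign the same probability to the target
# token (index 0) but spread the remaining mass differently.
concentrated = [0.5, 0.5, 0.0]
spread = [0.5, 0.25, 0.25]
loss_concentrated = entropy_regularized_loss(concentrated, 0, beta=0.1)
loss_spread = entropy_regularized_loss(spread, 0, beta=0.1)
```

Under this assumed objective, the prediction that spreads its remaining probability mass receives the lower loss, which is the mechanism by which a regularizer of this kind discourages overconfident, collapsed distributions.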
The automated identification of training periods using multi-objective criteria supports the alignment of design optimization with neurodiversity awareness. Considering entropy, diversity, repetition, and entity consistency together helps identify training periods that generate more balanced narratives, neither overly restricted nor overly unpredictable. This approach may be a good starting point for the development of narrative systems that are robust, adaptable, and inclusive by design.
The temporal evolution of entropy and the identification of optimal training periods, as shown in Figure 19, Figure 20 and Figure 21, further reinforce the importance of controlling uncertainty when designing narrative systems for neurodivergent users.
Overall, the analysis presented here demonstrates how entropy changes across generative models such as N-grams, RNNs, LSTMs, and Transformers, as well as the relationship between entropy and semantic coherence, repetition, and lexical diversity. The results indicate that the simplest models tend toward faster convergence but lower variability, whereas LSTMs and Transformers tend to maintain a better balance between uncertainty and coherence. Entropy regularization is a key proposal in this work for controlling the generation process. Finally, a key point is the application to educational systems, as this type of research can support the design of systems with easy-to-understand adaptive narratives that adjust predictability and diversity to the needs of neurodivergent students, ultimately promoting comprehension, attention, and personalized learning.
7. Conclusions
In this paper we systematically analyze the behavior of predictive entropy when generating text probabilistically using recurrent and attention-based architectures. The paper introduces entropy as a diagnostic measure of uncertainty but also as a possible control tool for balancing lexical diversity, semantic stability, repetition, and robustness in the training of language models. Through a comparative evaluation of recurrent and attention-based architectures, the study demonstrates that good generative quality arises from the interaction between several factors, such as uncertainty, diversity, and coherence, and not only from probability optimization.
The results obtained indicate that unregulated entropy minimization can sometimes generate rigid, repetitive narratives, whereas entropy-regularized training helps to preserve lexical diversity, supports the stabilization of narrative structures, and improves robustness across random seeds. This is consistent across all models, highlighting the general applicability of the proposed framework.
These findings have implications beyond technical performance. In narrative contexts geared toward neurodivergent individuals, maintaining controlled uncertainty is essential for narratives that are adaptable, appealing, and cognitively accessible. Entropy-regularized models can accommodate variation without sacrificing coherence, aligning probabilistic language modeling with the principles of inclusive design. The Transformer achieved a correlation of r = −0.71 between entropy and diversity.
Therefore, developers of educational narrative systems should incorporate entropy regularization mechanisms in order to maintain a balance between coherence and lexical diversity. Additionally, predictive entropy monitoring may be used as a diagnostic tool to detect repetitive generation patterns and adapt narrative complexity according to users’ cognitive accessibility requirements. In contexts oriented toward neurodivergent users, maintaining moderate entropy levels is recommended to avoid excessively rigid or unpredictable narratives, thereby promoting more accessible, stable, and adaptive reading experiences.
In conclusion, it can be said that predictive entropy should be understood not only as an element derived from probabilistic modeling, but also as a key element in guiding generative systems aimed at achieving balanced narrative behavior that is conscious of neurodiversity. Future work aims to explore how elements such as conscious entropy objectives can be extended to larger-scale datasets, interactive narrative scenarios, and user-adaptable narrative systems. This effect is consistent with known mode collapse in NLP models.