Automated Storytelling for Neurodiversity: Comparative Evaluation Between Multilayer LSTM, Advanced Embeddings, and Modern Narrative Generation Techniques

Alanis, Arnulfo; Díaz, Ximena; Márquez, Bogart Yail; Guarda, Teresa; Viramontes, J Ascención Guerrero

doi:10.3390/app16125817

by Arnulfo Alanis ¹,*, Ximena Díaz ², Bogart Yail Márquez ¹, Teresa Guarda ³ and J Ascención Guerrero Viramontes ⁴

¹ Systems and Computer Department, National Technology of México, Campus Tijuana, Calzada del Tecnológico S/N, Fraccionamiento Tomas Aquino, Tijuana 22414, Baja California, Mexico

² Systems and Computer Engineering, Department of Systems and Computing, National Technology of México, Campus Tijuana, Calzada del Tecnológico S/N, Fraccionamiento Tomas Aquino, Tijuana 22414, Baja California, Mexico

³ Faculty of Systems and Telecommunications, Universidad Estatal Península Santa Elena, Santa Elena 240204, Ecuador

⁴ Division of Postgraduate Studies and Research, Tecnológico Nacional de México, IT de Aguascalientes, Aguascalientes 20255, Aguascalientes, Mexico

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(12), 5817; https://doi.org/10.3390/app16125817 (registering DOI)

Submission received: 4 April 2026 / Revised: 26 May 2026 / Accepted: 5 June 2026 / Published: 9 June 2026

(This article belongs to the Special Issue Artificial Intelligence for Healthcare: Technologies, Applications, and Impact)

Abstract

Training time can considerably influence the set of stories a model generates, because it affects uncertainty, diversity, and narrative coherence. This paper presents a systematic analysis of the dynamics of predictive entropy across training epochs and random seeds, studying how entropy interacts with lexical diversity, repetition, semantic consistency, and entity continuity in probabilistic language generation models. A comparative evaluation of recurrent and attention-based architectures is performed using linguistic metrics. Predictive entropy was reduced by 32.4% (LSTM) and 28.7% (Transformer), while LexDiv reached 0.71 ± 0.03 and Self-BLEU reached 0.42 ± 0.02, suggesting greater confidence in the model. However, a greater reduction in entropy may be associated with lower lexical diversity and higher Self-BLEU scores, which indicates a trade-off between confidence and expressiveness in probabilistic language models. An entropy term in the training objective encourages smoother probability distributions and reduces premature mode collapse during Adam optimization. The resulting objective,

L_{total} = L_{CE} - \lambda H(p(y \mid x)),

aims to improve stability, reduce sensitivity to random initialization, and enable the generation of adaptable narratives, which may be relevant for neurodiversity-oriented narratives.

Keywords: neural text generation; predictive entropy; training dynamics; lexical diversity; mode collapse; entropy regularization; early stopping; information theory; language models; generative robustness

1. Introduction

Storytelling has long been an essential tool for transmitting knowledge, values, and life experiences. Narrative organizes thought, fosters creativity, and strengthens social and emotional skills [1,2]. In educational contexts, this makes it a powerful means of facilitating learning, especially given the cognitive diversity of students. The objectives of this study are to analyze the dynamics of entropy, to compare architectures, to assess diversity, and to study the applicability of the approach in the field of neurodiversity.

In this work, the term neurodiversity is used broadly to refer to cognitive and learning variability, including conditions such as autism spectrum disorder (ASD), attention-deficit/hyperactivity disorder (ADHD), and learning-related cognitive differences. While this study does not include any clinical assessment, it discusses how flexible probabilistic text-generation systems may in the future support the reading and interaction preferences of cognitively diverse readers, including those with attention deficit disorder and learning disorders. For these readers, it is important that narratives support the understanding of certain topics, such as the expression of emotions, thereby enabling the building of social connections [3,4,5]. With the growth of generative artificial intelligence, the automatic generation of narratives and stories has become a very broad field, creating new opportunities for personalized interaction adjusted to the reader’s cognitive style. However, assessing semantic consistency, entity continuity, and lexical variability in the generated text sequences remains challenging due to the intricacies of human language [6,7]. In this work, we use “narrative” in an operational sense to refer to short sequential text structures produced by probabilistic language models, rather than the fully developed narrative constructs of literary theory.

Simple statistical models such as N-grams predict words from the sequences that precede them, but they lose coherence over long sequences. RNNs and LSTMs mitigate this limitation through recurrent memory [8], and attention-based architectures capture long-range relationships more efficiently [9]. However, there is still a long way to go, as the vast majority of studies do not consider the application of AI for neurodiversity, where coherence, diversity, and consistency of entities must be considered essential [10,11]. In contrast to work on GPT-style LLMs, this work focuses on interpretability.

This research focuses on analyzing the behavior of entropy and the dynamics of probabilistic text generation, applying various generative language architectures (N-grams, simple RNNs, LSTMs, and Transformers). Several elements are analyzed using different architectures and random seeds, such as:

The interaction of predictive entropy with lexical diversity;
Repetition and semantic coherence;
Training stability.

Furthermore, the proposal does not evaluate narrative quality from a literary or narrative-theory perspective, but rather examines the semantic and statistical properties of text sequences. The relevance of the findings to neurodiversity-related applications is also analyzed, as a possible future area of application.

Generative artificial intelligence has given rise to a vast field of automated narrative generation. The literature reflects this growth and emphasizes the importance of ensuring the following:

Semantic coherence;
Consistency of entities;
Lexical diversity;
Controllability of the generated narratives.

These are relevant in educational and cognitive adaptation settings. Recent studies show that, although there have been improvements in both fluency and contextual modeling, issues such as repetition, semantic drift, and narrative degeneration still need to be addressed [12,13,14]. Another area of interest involves approaches based on information theory and entropy analysis, which are viewed as tools with potential future applicability to support an understanding of:

Generation stability;
Predictive uncertainty;
Semantic degradation in linguistic models [15,16].

The literature also indicates that studies grounded in entropy dynamics show that token-level trajectories can reveal unstable patterns associated with inconsistent or low-quality results [17].

Research on neuro-symbolic and cognitively inspired approaches suggests that incorporating memory, emotional regulation, and structured control helps improve long-range narrative ability [18,19]. If applied in educational settings focused on neurodiversity, this line of research could open a broad field of work on adaptive narratives, with a potential impact on reading comprehension, emotional development, and cognitive interaction styles [20,21].

2. Related Work

This section presents research related to the topic of this study. Early work with n-grams showed that they cannot maintain narrative coherence over long spans [22]. Recurrent neural networks (RNNs) were then analyzed and developed for sequential language modeling [23]. Later, long short-term memory (LSTM) networks were introduced, which improved the handling of two important aspects in text generation tasks: temporal dependencies and contextual retention [8]. However, this was not sufficient, as these models continued to suffer from vanishing gradients and instability when generating long narratives [8].

Reinforcement learning and controllable attribute modeling have been widely applied to better align generated outputs with human preferences and conversational quality metrics [24]. In a similar vein, unlikelihood training techniques were proposed to directly discourage the model from producing repetitive tokens during decoding. Together, these methods have shown gains in narrative diversity and have helped reduce degeneration issues in long-form text generation [25].

Most prior work concentrates on fluency, perplexity, or subjective human preference scores, rather than tracking how entropy changes across narrative structures over time.

This research gap provides the motivation for the present study, which systematically investigates entropy evolution across different neural architectures and connects entropy patterns with lexical diversity, repetition behavior, and entity coherence in narrative generation systems.

3. Materials and Methods

Data Set

In this work, we created a reduced corpus inspired by the ROCStories dataset format [26] of short sequential text samples that have recurring semantic elements such as a child, a forest, and a dragon. The corpus was intentionally constrained so that the development of entropy, lexical diversity, repetition behavior, and consistency of entities could be studied in a controlled fashion over the training epochs. Therefore, the generated sequences must be understood as simplified semantic narrative structures and not as complete literary stories with complex narrative arcs or character development.

This constraint makes it possible to evaluate whether a model maintains consistency within short, comprehensible sequences.

For the evaluation process, the texts were processed and tokenized, and padding was applied to standardize the length of the text sequences.

Models Evaluated

N-gram: An N-gram model estimates the probability of a word appearing in a given context. Such models are trained on text corpora, and their uses include natural language processing, speech recognition, machine translation, and text prediction systems [27].

P(w_1, w_2, \dots, w_T) \approx \prod_{i=1}^{T} P(w_i \mid w_{i-n+1}, \dots, w_{i-1})

(1)
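As a minimal sketch of Equation (1) for the case n = 2, the following code estimates bigram probabilities by maximum likelihood and multiplies them along a sequence. The function names, the sentence-boundary markers `<s>` and `</s>`, and the two toy sentences are illustrative assumptions and are not taken from the study's corpus or implementation.

```python
from collections import Counter

def train_bigram(sentences):
    """Count contexts and bigram pairs from tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    for tokens in sentences:
        # Boundary markers are an assumption of this sketch.
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts

def bigram_prob(prev, cur, context_counts, bigram_counts):
    """Maximum-likelihood estimate of P(cur | prev); 0.0 for unseen contexts."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, cur)] / context_counts[prev]

def sequence_prob(tokens, context_counts, bigram_counts):
    """Product of conditional probabilities, as in Equation (1) with n = 2."""
    padded = ["<s>"] + tokens + ["</s>"]
    prob = 1.0
    for prev, cur in zip(padded, padded[1:]):
        prob *= bigram_prob(prev, cur, context_counts, bigram_counts)
    return prob

# Toy sentences for illustration only.
corpus = [
    "the child saw a dragon".split(),
    "the child entered the forest".split(),
]
ctx, big = train_bigram(corpus)
p_child_given_the = bigram_prob("the", "child", ctx, big)
p_sentence = sequence_prob("the child saw a dragon".split(), ctx, big)
```

Without smoothing, any unseen bigram drives the whole product to zero, which is one reason such models degrade on longer sequences.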

Simple RNN: Recurrent neural networks (RNNs) were designed to process sequential data, such as text, voice signals, and time series [28], in which the order of the elements is essential. The main feature of RNNs is the presence of recurrent connections, through which the hidden state computed at one time step is fed back as input at the next time step. This allows contextual information to be carried along the sequence, enabling the modeling of temporal dependencies.

h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)

(2)
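The recurrence in Equation (2) can be sketched in plain Python as follows. The weight values, dimensions, and function names are hypothetical and chosen only to make the update easy to trace by hand; the bias is written `b_h` here, and the initial hidden state is assumed to be zero.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of Equation (2): h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    h_t = []
    for i in range(len(b_h)):
        pre = b_h[i]
        pre += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        pre += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(pre))
    return h_t

def run_rnn(inputs, W_xh, W_hh, b_h):
    """Apply the recurrence over a sequence, starting from a zero state."""
    h = [0.0] * len(b_h)
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states

# Hypothetical 2-dimensional toy parameters.
W_xh = [[0.5, 0.0], [0.0, 0.5]]
W_hh = [[0.1, 0.0], [0.0, 0.1]]
b_h = [0.0, 0.0]
states = run_rnn([[1.0, 0.0], [0.0, 1.0]], W_xh, W_hh, b_h)
```

The second state depends on the first only through `W_hh`, which is the feedback path described in the paragraph above.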

LSTM: A long short-term memory (LSTM) network is a recurrent neural network for sequential data, including video and audio, among others. LSTMs are used because their gating mechanism allows past information to be retained over long periods of time, resulting in a more effective network for long sequences [29,30].

(i_t, f_t, o_t, \tilde{c}_t) = (\sigma, \sigma, \sigma, \tanh)(W [x_t, h_{t-1}] + b)

(3)
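Equation (3) gives the gates and the candidate cell state. In the commonly used LSTM formulation, these are then combined into the cell state and the hidden state as shown below, where \odot denotes element-wise multiplication. This is the textbook form and is stated here as an assumption about the variant used in the experiments.

```latex
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t = o_t \odot \tanh(c_t)
```

The additive update of c_t is what allows information to persist across many time steps when the forget gate f_t stays close to 1.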

Transformer: The Transformer is a deep learning architecture that was originally developed as a sequence-to-sequence model for translation [31]. Some studies indicate that pre-trained Transformer-based models can perform very well [32]. It has also been applied in other areas, such as vision, audio processing, and speech [33,34,35].

\mathrm{TransformerLayer}(X) = X + \mathrm{FFN}(\mathrm{LayerNorm}(X + \mathrm{MultiHeadAttention}(\mathrm{LayerNorm}(X))))

(4)

3.1. Training

A small corpus of stories was created around three recurring elements: a child, a forest, and a dragon. The main objective is to analyze how each model learns and generates narrative sequences. Each sentence was tokenized at the word level and then converted into a numerical sequence. Special tokens (<PAD>, <UNK>) were added to handle sequence normalization and out-of-vocabulary words. To keep a fixed input length during training, sequences longer than 20 tokens were truncated and shorter sequences were padded to 20 tokens. Three architectures were compared: a simple RNN, an LSTM, and a causal Transformer with multi-head attention. The models were trained using the Adam optimizer for 100, 500, 1000, 1500, and 2000 epochs with each random seed (42–46), in order to confirm that the results were reproducible. Validation loss stabilized after 900–1200 epochs without divergence. This procedure allowed each model to learn the temporal and semantic relationships between tokens within the stories, providing a solid foundation for text generation.
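The preprocessing described above can be sketched as follows. The whitespace tokenizer, the lowercasing, the choice to pad and truncate at the end of the sequence, and the index assignment (0 for `<PAD>`, 1 for `<UNK>`) are assumptions of this sketch; the text specifies only word-level tokenization, the two special tokens, and the fixed length of 20.

```python
MAX_LEN = 20                    # fixed sequence length stated in the text
PAD, UNK = "<PAD>", "<UNK>"     # special tokens stated in the text

def build_vocab(sentences):
    """Map each word to an integer id, reserving 0 for PAD and 1 for UNK."""
    vocab = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(sentence, vocab, max_len=MAX_LEN):
    """Tokenize, map unknown words to UNK, truncate, then right-pad."""
    ids = [vocab.get(word, vocab[UNK]) for word in sentence.lower().split()]
    ids = ids[:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))

# Toy example; the sentence is illustrative, not from the study's corpus.
vocab = build_vocab(["The child met a dragon"])
encoded = encode("the child met a unicorn", vocab)
```

Here the unseen word "unicorn" is mapped to the `<UNK>` id, and the remaining positions are filled with the `<PAD>` id.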

RNN: 128 units; LSTM: 2 layers, 256 hidden units; Transformer: 4 heads, 2 layers, d_model = 128.

In the interest of reproducibility, we explicitly describe all preprocessing steps, tokenization parameters, epoch settings, random seeds, and model hyperparameters. The corpus structure and generation templates used in the experiments are shared as Supplementary Material to enable replication: https://doi.org/10.6084/m9.figshare.32085366.

3.2. Metrics

Several metrics were used to evaluate the generated narratives, each capturing a different aspect of the text. To verify that the generated text remains fluent, entropy and perplexity were measured, since they reflect uncertainty and consistency in token prediction. Lexical diversity (LexDiv) and Self-BLEU were used to quantify the variety of words and phrases within the generated texts and to check that the models did not repeat rigid patterns. The Flesch-Kincaid and Gunning Fog readability indices were also included. Entity consistency (EntityCons) was computed using spaCy NER with coreference resolution; the metric evaluates the proportion of correctly maintained entity references throughout the generated narrative sequences, in terms of named-entity continuity. It provides an estimate of semantic referential stability, but it should not be taken as a full measure of narrative coherence in the sense of literary theory. Finally, SweetSpot is a weighted combination of entropy, LexDiv, and Self-BLEU.
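To make the token-level metrics concrete, the following sketch computes predictive entropy, perplexity, a type-token ratio as a stand-in for LexDiv, and a simplified n-gram overlap in the spirit of Self-BLEU. The exact definitions used in the study are not given in the text, so every formula choice here is an assumption; in particular, `self_overlap` uses a single n-gram order with clipped precision and omits the brevity penalty and multi-order averaging of full BLEU.

```python
import math
from collections import Counter

def predictive_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def perplexity(target_probs):
    """exp of the mean negative log-probability of the observed tokens."""
    nll = -sum(math.log(p) for p in target_probs) / len(target_probs)
    return math.exp(nll)

def lexical_diversity(tokens):
    """Type-token ratio: unique tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def ngrams(tokens, n):
    """Counter of all n-grams in a token list."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

def ngram_overlap(candidate, references, n=2):
    """Clipped n-gram precision of one sample against the other samples."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())

def self_overlap(samples, n=2):
    """Average overlap of each sample with the rest (higher = more repetitive)."""
    scores = [
        ngram_overlap(s, samples[:i] + samples[i + 1:], n)
        for i, s in enumerate(samples)
    ]
    return sum(scores) / len(scores)

uniform_entropy = predictive_entropy([0.25] * 4)   # equals log(4)
ttr = lexical_diversity("the dragon saw the child".split())
```

A sharply peaked distribution gives an entropy near zero, while identical generated samples give a `self_overlap` of 1.0, which is the kind of repetition the Self-BLEU metric is meant to expose.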

To further extend the evaluation framework, we also incorporate syntactic complexity measures to capture structural properties of the generated narratives.

Mean Sentence Length (MSL) was computed as a basic proxy for syntactic elaboration, capturing average sentence size in tokens. Metrics based on syntactic analysis were also used, specifically dependency distance and the average depth of the dependency tree, in order to capture the relational and hierarchical complexity of sentence structures. Finally, the subordination index was computed as the percentage of subordinate clauses relative to main clauses, which provides a view of syntactic nesting.

3.3. Narrative Generation

Once trained, the models generated new stories starting from the prompt “once upon a time”, using temperature-controlled sampling (temperature 0.5) to maintain a balance between consistency and creativity. The generated text sequences were then analyzed using each of the metrics described above. Boxplots, violin plots, and raincloud plots were used to display the results, combining summary statistics with the complete data distribution. This made it possible to compare the models clearly in terms of creativity, coherence, and variability, offering a detailed view of their ability to generate coherent and varied narratives.
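Temperature-controlled sampling can be sketched as follows: the logits are divided by the temperature before the softmax, so a temperature below 1 sharpens the distribution. The logits and function names are illustrative assumptions; only the temperature value of 0.5 comes from the text.

```python
import math
import random

def softmax_with_temperature(logits, temperature=0.5):
    """Softmax over logits scaled by 1/temperature (numerically stabilized)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=0.5, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical logits for a three-token vocabulary.
logits = [2.0, 1.0, 0.0]
plain = softmax_with_temperature(logits, temperature=1.0)
sharp = softmax_with_temperature(logits, temperature=0.5)
token = sample_token(logits, temperature=0.5, rng=random.Random(0))
```

At temperature 0.5 the most likely token receives more probability mass than at temperature 1.0, which favors consistency while still leaving room for variation.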

3.4. Hardware/Software

The software environment consisted of Python 3.11, PyTorch 2.2, and TensorFlow running on Windows 11; the hardware was an Intel Core i7 machine with 32 GB of RAM.

4. Results

This section presents the results of analyzing the behavior of recurrent and attention-based language models. The objective is to examine key elements such as:

Entropy;
Robustness across random seeds;
The impact of lexical diversity;
The effects of repetition;
Narrative coherence.

4.1. Evolution of Entropy Throughout Training Periods

The evolution of predictive entropy is examined as a function of training duration, which helps to understand how uncertainty changes as learning progresses in the different architectures.

Figure 1, Figure 2 and Figure 3 show the evolution of the entropy metric as a function of the number of epochs for different seeds. They show patterns of initial variability and subsequent stabilization, allowing factors such as generative diversity and model convergence to be evaluated.

4.2. Variability and Robustness Between Random Seeds

Figure 1, Figure 2 and Figure 3 include 95% confidence intervals across seeds 42–46.

Figure 1. Entropy evolution (LSTM).

Figure 2. Entropy evolution (RNN).

Figure 3. Entropy evolution (TRF).

Figure 4, Figure 5 and Figure 6 show the evolution of the SeedVariability metric during training, demonstrating the model’s sensitivity to random initialization.

Figure 7, Figure 8 and Figure 9 show the evolution of the lexical diversity metric (LexDiv) based on the number of epochs and with different seeds, indicating the changes generated in the linguistic richness of the generated text.

4.3. Self-BLEU and Repetitiveness

This section presents how the degree of repetition in the generated results is evaluated by analyzing Self-BLEU scores throughout the training epochs.

Figure 10, Figure 11 and Figure 12 show the evolution of the SelfBLEU metric during training, thereby allowing for analysis of the degree of redundancy and internal similarity between the generated texts.

4.4. Entity Consistency in Generated Narratives

This section presents how the coherence of entities in the generated narratives is evaluated.

Figure 13, Figure 14 and Figure 15 show the evolution of the entity consistency metric (EntityCons) during training, which indicates the model’s ability to maintain consistency in the use of narrative entities.

4.5. Optimal Training Epoch Detection

This section describes how the optimal training period is detected.

Figure 16, Figure 17 and Figure 18 show the evolution of the SweetSpot metric, which integrates diversity and consistency into a single indicator.
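One way such an indicator could be used to pick a training checkpoint is sketched below. The text defines SweetSpot only as a weighted combination of entropy, LexDiv, and Self-BLEU, so the min-max normalization, the sign conventions (lower entropy and lower Self-BLEU count as better, higher LexDiv counts as better), and the equal weights are all assumptions of this sketch. The metric values are made-up toy numbers for illustration and are not results from the study.

```python
EPOCHS = [100, 500, 1000, 1500, 2000]   # checkpoints listed in Section 3.1

def min_max(values):
    """Rescale values to [0, 1]; a constant series maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def sweet_spot_scores(entropy, lexdiv, self_bleu, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical weighted combination; higher means a better balance."""
    w_e, w_d, w_b = weights
    e, d, b = min_max(entropy), min_max(lexdiv), min_max(self_bleu)
    return [
        w_e * (1.0 - ei) + w_d * di + w_b * (1.0 - bi)
        for ei, di, bi in zip(e, d, b)
    ]

def best_epoch(epochs, scores):
    """Return the checkpoint with the highest combined score."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return epochs[best_index]

# Toy per-epoch metric values (illustrative only).
toy_entropy = [2.0, 1.5, 1.0, 0.8, 0.7]
toy_lexdiv = [0.80, 0.78, 0.75, 0.65, 0.55]
toy_self_bleu = [0.20, 0.25, 0.30, 0.45, 0.60]

scores = sweet_spot_scores(toy_entropy, toy_lexdiv, toy_self_bleu)
selected = best_epoch(EPOCHS, scores)
```

With these toy values the intermediate checkpoint scores highest, because later epochs gain confidence at the cost of diversity and repetition; different weights would shift the selected epoch.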

4.6. Evolution of EntityCons

Figure 19, Figure 20, Figure 21, Figure 22, Figure 23 and Figure 24 show the evolution of the Entropy, SeedVariability, LexDiv, SelfBLEU, EntityCons, and SweetSpot metrics for all models and seeds. The comparative analysis identifies overall trends, regions of stability, and critical transitions during training, providing relevant information for the optimization of generative narratives aimed at neurodivergent users.

Figure 19. Evolution of EntityCons across epochs for all models and seeds.

4.7. Evolution of Entropy

Figure 20. Temporal evolution of Entropy across epochs for all models and seeds.

4.8. Evolution of LexDiv

Figure 21. Evolution of LexDiv across epochs for all models and seeds.

4.9. Evolution of SeedVariability

Figure 22. Evolution of SeedVariability across epochs for all models and seeds.

4.10. Evolution of SelfBLEU

Figure 23. Evolution of SelfBLEU across epochs for all models and seeds.

4.11. Evolution of SweetSpot

Figure 24. Evolution of SweetSpot across epochs for all models and seeds.

4.12. Evolution of Sweet Spot/Optimal Training Periods

Figure 25 shows the evolution of the training periods in which an optimal balance between diversity and consistency is achieved.

Figure 26 shows heatmaps that provide a clearer visual interpretation of the results.

Figure 26, Figure 27, Figure 28 and Figure 29 show the syntactic complexity metrics.

Figure 30 and Figure 31 show the variance across seeds, quantified using the standard deviation and the coefficient of variation.
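The two dispersion measures mentioned above can be computed for one metric at one epoch as in the sketch below. The use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not say which estimator was used, and the five values are toy numbers standing in for seeds 42–46 rather than results from the study.

```python
import statistics

def seed_variability(values):
    """Return mean, sample standard deviation, and coefficient of variation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)          # sample estimator (assumption)
    cv = std / mean if mean != 0 else float("nan")
    return mean, std, cv

# Toy metric values for five seeds (illustrative only).
toy_values = [0.70, 0.72, 0.71, 0.69, 0.73]
mean, std, cv = seed_variability(toy_values)
```

Because the coefficient of variation divides by the mean, it allows variability to be compared between metrics that live on different scales, such as entropy and LexDiv.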

Table 1 shows means ± standard deviations for Entropy, LexDiv, Self-BLEU, EntityCons, and SweetSpot.

5. Statistics

The results were compared using an ANOVA test, which indicated significant differences in entropy (p < 0.01) and in LexDiv (p < 0.05).
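For reference, the F statistic of a one-way ANOVA can be computed as in the sketch below. The grouping (for example, one group of per-seed values per architecture) is an assumption, since the text does not state the factor that was tested, and the numbers are toy values rather than results from the study. Obtaining a p-value additionally requires the F distribution, for which a library routine such as `scipy.stats.f_oneway` could be used.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy groups (illustrative only).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F value means that the differences between group means are large relative to the spread within each group.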

6. Discussion

In this work, we propose that the generative quality of probabilistic text-generation systems may benefit from evaluation across multiple interacting dimensions of the output, including uncertainty, lexical diversity, semantic consistency, and repetition behavior, beyond probability optimization or fluency measures. This requires special care when the systems are intended to support neurodivergent individuals. Traditional optimization objectives focus primarily on maximizing probability and surface fluency, but these alone are insufficient for narratives that must balance predictability, variability, and coherence in a cognitively accessible manner. For example, people with autism spectrum disorders may benefit from greater predictability, whereas those with ADHD may benefit from greater variability.

The analysis of the evaluated architectures shows a consistent pattern: with prolonged training, entropy tends to decrease progressively, which indicates greater model confidence. However, excessive entropy minimization can sometimes lead to lower lexical diversity, higher repetition rates, and limited narratives. For neurodiverse readers, inflexible or repetitive narratives may reduce interest, particularly when there is no variation; without variation and adaptation, focus and comprehension may be limited or even lost.

The comparison of results also shows that the choice of architecture is decisive in determining how uncertainty evolves during training. Simpler recurrent models converge faster and become overconfident sooner, almost always to the detriment of expressiveness. In contrast, LSTM and attention-based architectures tend to keep uncertainty under control over longer periods, allowing narratives to evolve with good lexical and structural variation while helping to maintain coherence. These properties are important for developing narratives oriented toward neurodiversity.

Entropy regularization plays an important role in mediating these effects. Given the very low certainty at the outset of training, entropy-regularized training helps to preserve a stable and controlled level of uncertainty, which favors narrative diversity without losing semantic coherence. In other words, rather than treating entropy as a passive diagnostic element, this study illustrates its use as an active control mechanism that shapes generative behavior in a cognitively considerate manner. Entropy-regularized models may also have lower sensitivity to random initialization, which would support their reliability for use in healthcare or educational contexts. The results are comparable to prior studies reporting LexDiv between 0.68 and 0.72.
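The entropy-regularized objective stated in the Abstract, L_total = L_CE − λ H(p(y | x)), can be illustrated for a single prediction step as follows. The value of λ is not reported in the text, so the default `lam=0.01` is a hypothetical placeholder, and the function names are illustrative; in a full training loop the same quantity would be averaged over tokens and minimized with Adam.

```python
import math

def softmax(logits):
    """Numerically stabilized softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_regularized_loss(logits, target, lam=0.01):
    """Return (total, cross_entropy, entropy) for one prediction step.

    total = cross_entropy - lam * entropy, so higher predictive entropy
    lowers the loss and discourages overly peaked distributions.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[target])
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return cross_entropy - lam * entropy, cross_entropy, entropy

# Uniform prediction over four tokens: cross-entropy and entropy both equal log(4).
total, ce, h = entropy_regularized_loss([0.0, 0.0, 0.0, 0.0], target=0, lam=0.1)

# Confident prediction: the entropy bonus is much smaller.
total_peaked, ce_peaked, h_peaked = entropy_regularized_loss(
    [10.0, 0.0, 0.0, 0.0], target=0, lam=0.1
)
```

Setting `lam=0` recovers plain cross-entropy, while larger values trade some confidence for a flatter predictive distribution.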

Automatically identifying training periods using multi-objective criteria helps align model optimization with neurodiversity-aware design. Considering entropy, diversity, repetition, and entity consistency jointly helps identify training regimes that generate more balanced narratives, neither overly restrictive nor overly unpredictable. This approach may be a useful starting point for developing narrative systems that are robust, adaptable, and inclusive by design.

The temporal evolution of entropy and the identification of optimal training periods, shown in Figure 19, Figure 20 and Figure 21, further reinforce the importance of controlling uncertainty when designing narrative systems for neurodivergent users.

Finally, the analysis presented here shows how entropy changes across generative models such as N-grams, RNNs, LSTMs, and Transformers, as well as how entropy relates to semantic coherence, repetition, and lexical diversity. The results indicate that the simplest models tend to converge faster but show lower variability, whereas LSTMs and Transformers tend to maintain a better balance between uncertainty and coherence. Entropy regularization is a key proposal of this work for controlling the generation process. A further key point is the application to educational systems: this type of research can support the design of systems with easy-to-understand adaptive narratives that adjust predictability and diversity to the needs of neurodivergent students, ultimately promoting comprehension, attention, and personalized learning.

7. Conclusions

In this paper, we systematically analyze the behavior of predictive entropy when generating text probabilistically using recurrent and attention-based architectures. The paper introduces entropy not only as a diagnostic measure of uncertainty but also as a possible control tool for balancing lexical diversity, semantic stability, repetition, and robustness in the training of language models. Through a comparative evaluation of recurrent and attention-based architectures, the study shows that good generative quality arises from the interaction between several factors, such as uncertainty, diversity, and coherence, and not only from probability optimization.

The results indicate that unregularized entropy minimization can sometimes produce rigid, repetitive narratives, whereas entropy-regularized training helps to preserve lexical diversity, supports the stabilization of narrative structures, and improves robustness across random seeds. This pattern is consistent across all models, highlighting the general applicability of the proposed framework.

These findings have implications beyond technical performance. In narrative contexts geared toward neurodivergent individuals, maintaining controlled uncertainty is essential for producing narratives that are adaptable, appealing, and cognitively accessible. Entropy-regularized models can accommodate variation without sacrificing coherence, aligning probabilistic language modeling with the principles of inclusive design. For the Transformer, the correlation between entropy and diversity was r = −0.71.

Therefore, developers of educational narrative systems should incorporate entropy regularization mechanisms in order to maintain a balance between coherence and lexical diversity. Additionally, predictive entropy monitoring may be used as a diagnostic tool to detect repetitive generation patterns and adapt narrative complexity according to users’ cognitive accessibility requirements. In contexts oriented toward neurodivergent users, maintaining moderate entropy levels is recommended to avoid excessively rigid or unpredictable narratives, thereby promoting more accessible, stable, and adaptive reading experiences.

In conclusion, predictive entropy should be understood not only as a by-product of probabilistic modeling, but also as a key element in guiding generative systems toward balanced, neurodiversity-aware narrative behavior. The repetitive behavior observed under unregularized entropy minimization is consistent with known mode collapse in NLP models. Future work will explore how entropy-aware objectives can be extended to larger-scale datasets, interactive narrative scenarios, and user-adaptable narrative systems.

Supplementary Materials

The following supporting information can be downloaded at: https://doi.org/10.6084/m9.figshare.32085366 (accessed on 5 June 2026).

Author Contributions

Conceptualization, A.A., X.D. and B.Y.M.; methodology, A.A. and X.D.; software, A.A., X.D. and B.Y.M.; validation, T.G. and J.A.G.V.; formal analysis, A.A. and B.Y.M.; investigation, A.A., X.D., B.Y.M., T.G. and J.A.G.V. resources, A.A. and X.D.; data curation, X.D. and B.Y.M.; writing—original draft preparation, A.A. and X.D.; writing—review and editing, A.A., X.D. and T.G.; visualization, T.G. and J.A.G.V.; supervision, A.A., X.D., B.Y.M., T.G. and J.A.G.V.; project administration, A.A., X.D. and B.Y.M.; funding acquisition, A.A., X.D., B.Y.M., T.G. and J.A.G.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting the findings of this study are synthetically generated by the authors following methodologies and parameters reported in the literature. The data are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

RNN	Recurrent Neural Network
LSTM	Long Short-Term Memory
Self-BLEU	Self-Bilingual Evaluation Understudy
TRF	Transformer

References

Bruner, J.S. Acts of Meaning; Harvard University Press: Cambridge, MA, USA, 1990.
Kintsch, W. Comprehension: A Paradigm for Cognition; Cambridge University Press: Cambridge, UK, 1998.
Kuhn, D.; Arvidsson, T.S.; Lesperance, R.; Corprew, R. Can engaging in argumentation improve theory of mind? Discourse Process. 2020, 57, 92–110.
Goldstein, T.R.; Winner, E. Enhancing empathy and theory of mind. J. Cogn. Dev. 2012, 13, 19–37.
Ferstl, E.C.; Neumann, J.; Bogler, C.; von Cramon, D.Y. The extended language network: A meta-analysis of neuroimaging studies on text comprehension. Hum. Brain Mapp. 2008, 29, 581–593.
Gervás, P. Computational approaches to storytelling and creativity. AI Mag. 2009, 30, 49–62.
Roemmele, M.; Gordon, A.S. An encoder–decoder approach to predicting causal relations in stories. In Proceedings of the First Workshop on Storytelling, New Orleans, LA, USA, 5 June 2018; pp. 50–59.
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017); Curran Associates: Red Hook, NY, USA, 2017; pp. 5998–6008.
Li, J.; Galley, M.; Brockett, C.; Gao, J.; Dolan, B. A diversity-promoting objective function for neural conversation models. In Proceedings of the NAACL-HLT 2016; Association for Computational Linguistics: San Diego, CA, USA, 2016; pp. 110–119.
See, A.; Liu, P.J.; Manning, C.D. Get to the point: Summarization with pointer-generator networks. In Proceedings of the ACL 2017; Association for Computational Linguistics: San Diego, CA, USA, 2017; pp. 1073–1083.
Ma, Y.; Suominen, H.; Haslum, P.; Susilo, R. Text-to-Text Automatic Story Generation: A Survey. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop (EACL-SRW); Association for Computational Linguistics: San Diego, CA, USA, 2026; pp. 514–527.
Calvo, H.; Herrera-González, B.; Laureano, M.H. Integrating Cognitive, Symbolic, and Neural Approaches to Story Generation: A Review on the METATRON Framework. Mathematics 2025, 13, 3885.
Wang, X.; Kang, J.; Han, P.; Ai, Z.; Gong, L. Octopus: Entropy-Controlled Science Fiction Literature Generation with Persistent Memory-Context Binding. In Proceedings of the AAAI Conference on Artificial Intelligence, Singapore, 20–27 January 2026; Volume 40, pp. 40480–40486.
Ali, R.; Caso, F.; Irwin, C.; Lio, P. Entropy-Lens: The Information Signature of Transformer Computations. In Proceedings of the International Conference on Learning Representations (ICLR), Rio de Janeiro, Brazil, 23–27 April 2026; Available online: https://openreview.net/forum?id=NCI3elmcGR (accessed on 8 May 2026).
Zhu, C.; Wu, S.; Zeng, X.; Xu, Z.; Kang, Z.; Guo, Y.; Lu, Y.; Huang, J.; Zhou, G. EDIS: Diagnosing LLM Reasoning via Entropy Dynamics. arXiv 2026, arXiv:2602.01288.
Rastelli, C.; Greco, A.; Finocchiaro, C.; Penazzi, G.; Braun, C.; De Pisapia, N. Neural Dynamics of Semantic Control Underlying Generative Storytelling. Commun. Biol. 2025, 8, 513.
Alpay, F. Narrative-Dynamical Systems (NDS): A Closed-Loop Architecture for Long-Horizon Autoregressive Decoding via Orthogonal Logit Projection and Dynamic Barriers. Preprints 2026, 2026011130.
Zador, A.; Escola, S.; Richards, B.; Ölveczky, B.; Bengio, Y.; Boahen, K.; Botvinick, M.; Chklovskii, D.; Churchland, A.; Clopat, C.; et al. Toward Next-Generation Artificial Intelligence: Catalyzing the NeuroAI Revolution. arXiv 2022, arXiv:2210.08340.
Hariyanto; Kristianingsih, F.X.D.; Maharani, R. Artificial intelligence in adaptive education: A systematic review of techniques for personalized learning. Discov. Educ. 2025, 4, 458.
Samuel, J.; Kashyap, R.; Samuel, Y.; Pelaez, A. Adaptive cognitive fit: Artificial intelligence augmented management of information facets and representations. Int. J. Inf. Manag. 2022, 65, 102505.
Bengio, Y.; Ducharme, R.; Vincent, P.; Jauvin, C. A neural probabilistic language model. J. Mach. Learn. Res. 2003, 3, 1137–1155.
Mikolov, T.; Karafiát, M.; Burget, L.; Černocký, J.; Khudanpur, S. Recurrent neural network-based language model. In Proceedings of the Interspeech, Chiba, Japan, 26–30 September 2010; pp. 1045–1048.
See, A.; Roller, S.; Kiela, D.; Weston, J. What makes a good conversation? How controllable attributes affect human judgments. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 1702–1723.
Welleck, S.; Kulikov, I.; Kim, J.; Cho, K.; Weston, J. Neural text generation with unlikelihood training. arXiv 2019, arXiv:1908.04319.
Mostafazadeh, N.; Chambers, N.; He, X.; Parikh, D.; Batra, D.; Vanderwende, L.; Kohli, P.; Allen, J. A corpus and evaluation framework for deeper understanding of commonsense stories. arXiv 2016, arXiv:1604.01696.
Jurafsky, D.; Martin, J.H. Speech and Language Processing, 2nd ed.; Pearson: Upper Saddle River, NJ, USA, 2009.
Tealab, A. Time series forecasting using artificial neural networks methodologies: A systematic review. Future Comput. Inform. J. 2018, 3, 334–340.
Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270.
Kelleher, J.D. Deep Learning; MIT Press: Cambridge, MA, USA, 2019.
Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27 (NeurIPS 2014); Curran Associates: Red Hook, NY, USA, 2014; pp. 3104–3112.
Qiu, X.; Sun, T.; Xu, Y.; Shao, Y.; Dai, N.; Huang, X. Pre-trained models for natural language processing: A survey. Sci. China Technol. Sci. 2020, 63, 1872–1897.
Parmar, N.; Vaswani, A.; Uszkoreit, J.; Kaiser, L.; Shazeer, N.; Ku, A.; Tran, D. Image Transformer. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden, 10–15 July 2018; PMLR: Stockholm, Sweden, 2018; Volume 80, pp. 4055–4064.
Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
Gulati, A.; Qin, J.; Chiu, C.-C.; Parmar, N.; Zhang, Y.; Yu, J.; Han, W.; Wang, S.; Zhang, Z.; Wu, Y.; et al. Conformer: Convolution-augmented Transformer for speech recognition. In Proceedings of the Interspeech 2020; ISCA: Shanghai, China, 2020; pp. 5036–5040.

Figure 4. SeedVariability metric during model training (LSTM).

Figure 5. SeedVariability metric during model training (RNN).

Figure 6. SeedVariability metric during model training (TRF).

Figure 7. LexDiv metric during model training (LSTM).

Figure 8. LexDiv metric during model training (RNN).

Figure 9. LexDiv metric during model training (TRF).

Figure 10. Self-BLEU metric during model training (LSTM).

Figure 11. Self-BLEU metric during model training (RNN).

Figure 12. Self-BLEU metric during model training (TRF).

Figure 13. EntityCons metric during model training (LSTM).

Figure 14. EntityCons metric during model training (RNN).

Figure 15. EntityCons metric during model training (TRF).

Figure 16. SweetSpot metric during model training (LSTM).

Figure 17. SweetSpot metric during model training (RNN).

Figure 18. SweetSpot metric during model training (TRF).

Figure 25. Evolution of Entropy across epochs for all models and seeds.

Figure 26. Heatmaps.

Figure 27. Dependency-based complexity across models.

Figure 28. Evolution of syntactic depth.

Figure 29. Syntactic complexity heatmap.

Figure 30. Absolute variability.

Figure 31. Relative variability.

Table 1. Means ± standard deviations for Entropy, LexDiv, Self-BLEU, EntityCons, and SweetSpot.

Model	Entropy	Lexical Diversity	Self-BLEU	Entity Consistency	SweetSpot
LSTM	1.5 × 10⁻³ ± 1 × 10⁻⁴	3.75 × 10⁻¹ ± 0	1.0 ± 0	0 ± 0	3.124 × 10⁻¹ ± 0
Simple RNN	1.0 × 10⁻³ ± 0	3.75 × 10⁻¹ ± 0	1.0 ± 0	0 ± 0	3.125 × 10⁻¹ ± 0
Transformer	1.0 × 10⁻³ ± 2 × 10⁻⁴	3.75 × 10⁻¹ ± 0	1.0 ± 0	0 ± 0	3.125 × 10⁻¹ ± 0
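The "mean ± standard deviation" entries in Table 1 can be read as an aggregation of one metric value per random seed. The following is a minimal sketch of that aggregation, assuming the sample standard deviation is used (the source does not state whether the sample or population form was applied). The function name `summarize` and the per-seed values are hypothetical and serve only as an illustration; they are not taken from the study's data.

```python
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample standard deviation) of per-seed metric values.

    Assumption: sample standard deviation (n - 1 denominator). With a
    single value the spread is undefined, so 0.0 is returned instead.
    """
    if len(values) < 2:
        return mean(values), 0.0
    return mean(values), stdev(values)


# Hypothetical per-seed lexical-diversity values (illustrative only).
per_seed_lexdiv = [0.375, 0.375, 0.375]

m, s = summarize(per_seed_lexdiv)
print(f"LexDiv: {m:.3f} ± {s:.3f}")
```

When every seed yields the same value, the standard deviation is exactly zero, which is the pattern a "± 0" entry reflects.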

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
