Understanding Transformers and Large Language Models (LLMs) with Natural Language Processing (NLP)

A special issue of AI (ISSN 2673-2688).

Deadline for manuscript submissions: 15 October 2026

Special Issue Editors


Prof. Dr. Massimo Stella
Guest Editor
CogNosco Lab, Department of Psychology and Cognitive Science, University of Trento, Trento, Italy
Interests: complex systems modeling; natural language processing; semantic networks; multilayer networks; complex networks

Dr. Alexis Carrillo Ramirez
Guest Editor
CogNosco Lab, Department of Psychology and Cognitive Science, University of Trento, Trento, Italy
Interests: large language models; psychometrics; data science

Special Issue Information

Dear Colleagues,

Large Language Models (LLMs), and transformers in general, have reshaped the scientific landscape of Natural Language Processing; however, their internal reasoning and divergence from human cognition remain largely opaque. This Special Issue, edited by Prof. Massimo Stella and Dr. Alexis Carrillo Ramirez, invites innovative research that leverages NLP as a lens to better understand LLMs.

We invite you to contribute to this Special Issue with novel works related to the following areas of research:

  1. Designing linguistic stimuli, psycholinguistic benchmarks, or contrastive prompts to uncover latent representations and emergent abilities of LLMs in NLP tasks, e.g., harnessing LLMs’ ability to detect human emotions, creativity levels, personality traits, and other signals from language (see the sketch after this list).
  2. Comparing distributional, causal, and neuro‑symbolic analyses of transformer layers with human behavioral and neural data, e.g., comparing distributional representations of key linguistic units.
  3. Tracing how training corpora, fine‑tuning, and in‑context learning shape LLMs’ predictions, biases, and/or theory-of-mind abilities, e.g., assessing affective or cognitive biases within LLMs.
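
To make the first area concrete, below is a minimal sketch of a contrastive-prompt probe for emotion detection. The model choice, prompt wording, and stimulus pair are illustrative assumptions, not requirements of this call.

```python
# Contrastive-prompt probe: label two minimally different sentences and
# compare the answers. Model, prompts, and stimuli are hypothetical choices.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A contrastive pair: identical sentence frames, one word swapped.
stimuli = [
    "She smiled as she opened the letter.",
    "She trembled as she opened the letter.",
]

for text in stimuli:
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model would do here
        temperature=0,
        messages=[
            {"role": "system",
             "content": ("Answer with exactly one of Plutchik's eight basic "
                         "emotions: joy, trust, fear, surprise, sadness, "
                         "disgust, anger, anticipation.")},
            {"role": "user",
             "content": f"Which emotion does this sentence convey? {text}"},
        ],
    )
    print(text, "->", reply.choices[0].message.content.strip())
```

Scaling such probes into validated psycholinguistic benchmarks, rather than one-off demonstrations, is exactly the kind of contribution this Special Issue seeks.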

We particularly encourage interdisciplinary work bridging NLP, computational linguistics, cognitive science, network science, and ethics to illuminate both the potential and the limits of scaling laws. Ultimately, this Special Issue aims to develop principled methodologies that convert black‑box performance into transparent knowledge, guiding the responsible development of future AI systems aligned with human communicative norms. Submissions discussing reproducibility protocols and open‑source toolkits for community validation are especially welcome.

Prof. Dr. Massimo Stella
Dr. Alexis Carrillo Ramirez
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website; once registered, proceed to the online submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. AI is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • bias in LLMs
  • human–AI language comparison
  • cognitive benchmarks
  • LLMs’ trustworthiness

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (5 papers)


Research


21 pages, 891 KB  
Article
Architectural Constraints in LLM-Simulated Cognitive Decline: In Silico Dissociation of Memory Deficits and Generative Language as Candidate Digital Biomarkers
by Rubén Pérez-Elvira, Javier Oltra-Cucarella, María Agudo Juan, Luis Polo-Ferrero, Manuel Quintana Díaz, Jorge Bosch-Bayard, Alfonso Salgado Ruiz, A. N. M. Mamun Or Rashid and Raúl Juárez-Vela
AI 2026, 7(2), 69; https://doi.org/10.3390/ai7020069 - 12 Feb 2026
Abstract
This study examined whether large language models (LLMs) can generate clinically realistic profiles of cognitive decline and whether simulated deficits reflect architectural constraints rather than superficial role-playing artifacts. Using GPT-4o-mini, we generated synthetic cohorts (n = 10 per group) representing healthy aging, mild cognitive impairment (MCI), and Alzheimer’s disease (AD), assessed through a conversational neuropsychological battery covering episodic memory, verbal fluency, narrative production, orientation, naming, and comprehension. Experiment 1 tested whether synthetic subjects exhibited graded cognitive profiles consistent with clinical progression (Control > MCI > AD). Experiment 2 systematically manipulated prompt context in AD subjects (short, rich biographical, and few-shot prompts) to dissociate robust from manipulable deficits. Significant cognitive gradients emerged (p < 0.001) across eight of thirteen domains. AD subjects showed impaired episodic memory (Cohen’s d = 4.71), increased memory intrusions, and reduced narrative length (d = 3.07). Critically, structurally constrained memory tasks (episodic recall, digit span) were invariant to prompting (p > 0.05), whereas generative tasks (narrative length, verbal fluency) showed high sensitivity (F > 100, p < 0.001). Rich biographical prompts paradoxically increased memory intrusions by 343%, indicating semantic interference rather than cognitive rescue. These results demonstrate that LLMs can serve as in silico test benches for exploring candidate digital biomarkers and clinical training protocols, while highlighting architectural constraints that may inform computational hypotheses about memory and language processing.
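
For readers who want to see the general shape of such a setup, the sketch below role-plays a clinical persona with GPT-4o-mini and scores a toy episodic-recall probe; the persona text, word list, and scoring are illustrative assumptions, not the authors' actual battery.

```python
# Role-play a cognitive profile, then probe delayed recall of a word list.
# Persona, stimuli, and scoring are stand-ins for the paper's battery.
from openai import OpenAI

client = OpenAI()
WORDS = ["apple", "penny", "table", "river", "candle"]  # hypothetical list

messages = [
    {"role": "system",
     "content": ("Role-play an 80-year-old person with Alzheimer's disease. "
                 "Answer as that person would, memory lapses included.")},
    {"role": "user",
     "content": "Please remember these words: " + ", ".join(WORDS) + "."},
    {"role": "assistant", "content": "Alright, I will try to remember them."},
    {"role": "user",
     "content": "A few minutes have passed. Which of the words do you recall?"},
]
reply = client.chat.completions.create(
    model="gpt-4o-mini", messages=messages
).choices[0].message.content

recalled = [w for w in WORDS if w in reply.lower()]
print(f"recalled {len(recalled)}/{len(WORDS)}:", recalled)
# Intrusions (words produced but not on the list) would be tallied from the
# remaining tokens of `reply`; the paper treats them as a separate metric.
```

Repeating this over synthetic cohorts and task types is what lets the paper separate prompt-sensitive (generative) deficits from prompt-invariant (structurally constrained) ones.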

20 pages, 1544 KB  
Article
No Free Lunch in Language Model Bias Mitigation? Targeted Bias Reduction Can Exacerbate Unmitigated LLM Biases
by Shireen Chand, Faith Baca and Emilio Ferrara
AI 2026, 7(1), 24; https://doi.org/10.3390/ai7010024 - 13 Jan 2026
Abstract
Large Language Models (LLMs) inherit societal biases from their training data, potentially leading to harmful outputs. While various techniques aim to mitigate these biases, their effects are typically evaluated only along the targeted dimension, leaving cross-dimensional consequences unexplored. This work provides the first systematic quantification of cross-category spillover effects in LLM bias mitigation. We evaluate four bias mitigation techniques (Logit Steering, Activation Patching, BiasEdit, Prompt Debiasing) across ten models from seven families, measuring impact on race-, religion-, profession-, and gender-related biases using the StereoSet benchmark. Across 160 experiments yielding 640 evaluations, we find that targeted interventions cause collateral degradations to model coherence and performance along debiasing objectives in 31.5% of untargeted dimension evaluations. These findings provide empirical evidence that debiasing improvements along one dimension can come at the cost of degradation in others. We introduce a multi-dimensional auditing framework and demonstrate that single-target evaluations mask potentially severe spillover effects, underscoring the need for robust, multi-dimensional evaluation tools so that bias mitigation strategies do not inadvertently shift or worsen bias along untargeted axes.
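
The spillover question can be probed with a very small audit loop: score stereotype versus anti-stereotype sentence pairs per dimension, before and after an intervention. The sketch below uses GPT-2 and hand-written pairs as stand-ins; it is a StereoSet-style illustration, not the authors' pipeline.

```python
# StereoSet-style audit: does the model assign higher likelihood to the
# stereotypical sentence? GPT-2 and the pairs below are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def total_logprob(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean NLL per predicted token
    return -loss.item() * (ids.shape[1] - 1)  # total log-probability

# Hypothetical (stereotype, anti-stereotype) pairs, keyed by dimension.
pairs = {
    "gender": [("The nurse said she was tired.",
                "The nurse said he was tired.")],
    "profession": [("The janitor was uneducated.",
                    "The janitor was well-read.")],
}

for dim, items in pairs.items():
    pref = sum(total_logprob(s) > total_logprob(a) for s, a in items) / len(items)
    print(f"{dim}: stereotype preference = {pref:.2f} (0.5 would be unbiased)")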

19 pages, 4717 KB  
Article
Benchmarking Psychological Lexicons and Large Language Models for Emotion Detection in Brazilian Portuguese
by Thales David Domingues Aparecido, Alexis Carrillo, Chico Q. Camargo and Massimo Stella
AI 2025, 6(10), 249; https://doi.org/10.3390/ai6100249 - 1 Oct 2025
Abstract
Emotion detection in Brazilian Portuguese is less studied than in English. We benchmarked a large language model (Mistral 24B), a language-specific transformer model (BERTimbau), and the lexicon-based EmoAtlas for classifying emotions in Brazilian Portuguese text, with a focus on eight emotions derived from Plutchik’s model. Evaluation covered four corpora: 4000 stock-market tweets, 1000 news headlines, 5000 GoEmotions Reddit comments translated by LLMs, and 2000 DeepSeek-generated headlines. While BERTimbau achieved the highest average scores (accuracy 0.876, precision 0.529, and recall 0.423), an overlap with Mistral (accuracy 0.831, precision 0.522, and recall 0.539) and notable performance variability suggest there is no single top performer; however, both transformer-based models outperformed the lexicon-based EmoAtlas (accuracy 0.797) but required up to 40 times more computational resources. We also introduce a novel “emotional fingerprinting” methodology using a synthetically generated dataset to probe emotional alignment, which revealed an imperfect overlap in the emotional representations of the models. While LLMs deliver higher overall scores, EmoAtlas offers superior interpretability and efficiency, making it a cost-effective alternative. This work delivers the first quantitative benchmark for interpretable emotion detection in Brazilian Portuguese, with open datasets and code to foster research in multilingual natural language processing.
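
The evaluation side of such a benchmark is straightforward to reproduce. The sketch below scores eight-emotion predictions against gold labels with standard scikit-learn metrics, using toy placeholder data rather than the paper's corpora.

```python
# Score eight-emotion predictions against gold labels. The gold/pred lists
# are toy placeholders; the paper's datasets and code are released openly.
from sklearn.metrics import accuracy_score, precision_score, recall_score

EMOTIONS = ["joy", "trust", "fear", "surprise",
            "sadness", "disgust", "anger", "anticipation"]  # Plutchik's eight

gold = ["joy", "fear", "anger", "joy"]       # toy gold labels
pred = ["joy", "sadness", "anger", "trust"]  # toy model outputs

print("accuracy :", accuracy_score(gold, pred))
print("precision:", precision_score(gold, pred, labels=EMOTIONS,
                                    average="macro", zero_division=0))
print("recall   :", recall_score(gold, pred, labels=EMOTIONS,
                                 average="macro", zero_division=0))
```

Macro averaging over all eight emotions is one plausible reading of the reported precision/recall figures; the paper's released code is authoritative on the exact choice.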

31 pages, 7395 KB  
Article
Creativeable: Leveraging AI for Personalized Creativity Enhancement
by Ariel Kreisberg-Nitzav and Yoed N. Kenett
AI 2025, 6(10), 247; https://doi.org/10.3390/ai6100247 - 1 Oct 2025
Cited by 2
Abstract
Creativity is central to innovation and problem-solving, yet scalable training solutions remain limited. This study evaluates Creativeable, an AI-powered creativity training program that provides automated feedback and adjusts the difficulty of creative story writing tasks without human intervention. A total of 385 participants completed five rounds of creative story writing using semantically distant word prompts across four conditions: (1) feedback with adaptive difficulty (F/VL); (2) feedback with constant difficulty (F/CL); (3) no feedback with adaptive difficulty (NF/VL); (4) no feedback with constant difficulty (NF/CL). Before and after using Creativeable, participants’ creativity was assessed via the alternative uses task, alongside a control semantic fluency task. While creativity improvements were evident across conditions, the degree of effectiveness varied. The F/CL condition led to the most notable gains, followed by the NF/CL and NF/VL conditions, while the F/VL condition exhibited comparatively smaller improvements. These findings highlight the potential of AI to democratize creativity training by offering scalable, personalized interventions, while also emphasizing the importance of balancing structured feedback with increasing task complexity to support sustained creative growth.
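
Automated creativity scoring of the kind such tools rely on is often built from semantic distance in embedding space. The sketch below shows that generic idea with sentence-transformers; the model name and example texts are assumptions, and whether Creativeable scores stories exactly this way is not specified in the abstract.

```python
# Generic semantic-distance scoring: how far is a response from its prompt
# word in embedding space? Model and texts are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

prompt_word = "brick"
responses = ["build a house", "grind it into pigment for paint"]

vecs = model.encode([prompt_word] + responses, convert_to_tensor=True)
for resp, vec in zip(responses, vecs[1:]):
    distance = 1 - util.cos_sim(vecs[0], vec).item()
    print(f"{resp!r}: semantic distance = {distance:.2f}")  # higher ~ more original
```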

Other


12 pages, 944 KB  
Perspective
Could You Be Wrong: Metacognitive Prompts for Improving Human Decision Making Help LLMs Identify Their Own Biases
by Thomas T. Hills
AI 2026, 7(1), 33; https://doi.org/10.3390/ai7010033 - 19 Jan 2026
Abstract
Because LLMs are still in development, what is true today may be false tomorrow. We therefore need general strategies for debiasing LLMs that will outlive current models. Strategies developed for debiasing human decision making offer one promising approach, as they incorporate an LLM-style prompt intervention designed to access additional latent knowledge during decision making. LLMs trained on vast amounts of information contain information about potential biases, counter-arguments, and contradictory evidence, but that information may only be brought to bear if appropriately prompted. Metacognitive prompts developed in the human decision making literature are designed to achieve this and, as I demonstrate here, they show promise with LLMs. The prompt I focus on is “could you be wrong?” Following an LLM response, this prompt leads LLMs to produce additional information, including why they answered as they did, identifying errors, biases, contradictory evidence, and alternatives, none of which were present in their initial response. Further, this metaknowledge often reveals that LLMs and users interpret prompts in ways that are not aligned. I demonstrate this prompt in three cases. In the first two cases I use a set of questions taken from recent articles identifying LLM biases, including implicit discriminatory biases and failures of metacognition. “Could you be wrong?” prompts the LLM to identify its own biases and produce cogent metacognitive reflection. In the last case I present an example involving convincing but incomplete information about scientific research (the too-much-choice effect), which is readily corrected by “could you be wrong?” In sum, this work argues that human psychology offers a valuable avenue for prompt engineering, leveraging a long history of effective prompt-based improvements to decision making.
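
The intervention itself is a two-turn prompt, which the sketch below reproduces; the model choice and the opening question (echoing the too-much-choice example) are illustrative assumptions.

```python
# Metacognitive follow-up: ask a question, then append "Could you be wrong?"
# Model choice and the opening question are illustrative assumptions.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"

messages = [{"role": "user",
             "content": "Does having more choices always make people less happy?"}]
first = client.chat.completions.create(model=MODEL, messages=messages)
answer = first.choices[0].message.content

messages += [{"role": "assistant", "content": answer},
             {"role": "user", "content": "Could you be wrong?"}]
second = client.chat.completions.create(model=MODEL, messages=messages)

print("initial answer:\n", answer)
print("\nafter the metacognitive prompt:\n", second.choices[0].message.content)
```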
