Article

Exploring the Link Between the Emotional Recall Task and Mental Health in Humans and LLMs

CogNosco Lab, Department of Psychology and Cognitive Science, University of Trento, Corso Bettini 31, 38068 Rovereto, Trento, Italy
* Author to whom correspondence should be addressed.
Information 2025, 16(12), 1057; https://doi.org/10.3390/info16121057
Submission received: 13 October 2025 / Revised: 13 November 2025 / Accepted: 26 November 2025 / Published: 2 December 2025

Abstract

The ability of large language models to recall human emotions provides a novel opportunity to investigate links among memory, affect, and mental health. This study explores whether the Emotional Recall Task (ERT), a free word-association paradigm, can reveal cognitive markers of distress in both humans and large language models (LLMs). Using spreading activation simulations grounded in cognitive network science, we examined how the recall of emotional concepts (e.g., stress, anxiety, and depression) relates to psychometric measures of well-being and personality. In Study 1, correlations were tested between activation dynamics and clinical scales (DASS-21, PANAS, and Life Satisfaction) in human participants (N = 1200) and artificial participants generated by GPT-4, Claude Haiku, and Claude Opus. For both human and LLM samples, spreading activation was modeled from participants’ ERT words within a human-derived semantic network, enabling a direct comparison of structural activation dynamics rather than psychological states. Humans with higher distress scores exhibited stronger, faster, and more persistent activation of negative concepts, supporting theories of rumination and memory bias. GPT-4 approximated human-like trajectories most closely, though with reduced variability. Study 2 linked recall dynamics with the Big Five traits, confirming that neuroticism predicted greater activation of negative concepts, while extraversion acted as a protective factor. While LLMs lack autobiographical memory, their semantic activation partially mirrored human associations. These findings demonstrate that network-based spreading activation analysis can reveal cognitive signatures of distress while also highlighting the limits of LLMs in modeling human affect.

1. Introduction

The ability of artificial systems to replicate human-like emotional recall provides a new perspective for evaluating how closely a large language model (LLM) can approximate complex psychological processes [1]. Examples include the logical reasoning behind constructing an argument [2], the activation of emotion-laden words in clinical settings [1], and the agreement on naming conventions in social systems [3]. Humans and LLMs might differ significantly in how emotions and cognition interact with one another. In humans, emotions strongly shape cognitive processes, influencing how people perceive the world, retrieve memories, and approach everyday tasks in general [4]. When negative emotional states intensify, leading to distress, anxiety, or depressive disorders, human behavior and decision making can be further compromised. Anxiety, for instance, is known to negatively impact executive functioning [5]; depression is closely intertwined with the cognitive structure that sustains it [6], and stress interferes with processes such as attention, memory formation, and recall [7]. These connections highlight how closely emotions and cognition are intertwined, especially regarding memory. Cognitive network science offers an effective way to examine these connections, enabling researchers to map the network of associations that support or amplify negative emotional states [8,9,10].
Cognitive network science conceptualizes knowledge as a network of interconnected concepts [11,12,13], where associations reflect how information is stored and retrieved in the mental lexicon [9]. Within this framework, spreading activation models describe how the retrieval of one concept can trigger cascades of activity across related nodes, allowing emotionally charged words to become more accessible depending on their position and connectivity in the network [8,9,10]. For example, highly central nodes such as “stress” or “anxiety” may receive disproportionate activation, amplifying their salience and sustaining negative emotional states through recurrent loops of activation. This mechanism provides a quantitative account of memory biases observed in psychological distress, where minor triggers can rapidly propagate toward clusters of negative-valence concepts. By applying these tools, cognitive network science [11] enables researchers to capture how structural and dynamical properties of semantic networks mirror individual differences in affect, personality, and vulnerability to mental disorders. This approach is the one we follow in the present work.
By integrating network metrics of concepts within a network of memory recalls among them, this work aims to understand how closely the activation of nodes related to mental distress (such as “anxiety”, “stress”, and “depression”) correlates with measurable psychological indicators of well-being, both at the clinical and personality levels. Applying network science and spreading activation [10], we focus on the interpretation of network-driven results in the context of mental health. Additionally, we aim to emphasize the difference between humans, whose cognitive representations are shaped by personal history and socio-cognitive factors [13,14], and LLMs, which rely on linguistic training data without direct experience or autobiographical context for events and concepts [15].

2. Literature Review

Despite a rich tradition linking language, affect, and cognition, few approaches have quantitatively modeled how individuals’ spontaneous word associations relate to self-reported emotional well-being [16]. The present research study addresses this gap by integrating the Emotional Recall Task (ERT) with a spreading activation framework derived from cognitive network science. This integration allows us to examine whether personal lexical choices, when propagated through a large-scale semantic network, reveal systematic associations with affective self-reports. In doing so, we aim to bridge descriptive linguistic markers of emotion with formal network models of cognitive organization.
Cognitive networks are representational models of cognition. Cognitive network science models human knowledge as a network of concepts linked by learned associations [11], enabling the formal study of how structure shapes retrieval and reasoning [9,10,13]. Foundational spreading activation theories [8,17] propose that cueing one concept propagates activation along associative links, increasing the accessibility of nearby nodes; these ideas are now operationalized in computational simulations that estimate how activation unfolds over time across a lexical network. Normative free-association resources such as the Small World of Words (SWOW) [18] provide the backbone topology for such models, while affective lexica with valence–arousal–dominance (VAD) norms [19] annotate nodes with emotional properties, allowing for joint analyses of semantics and affect. Within this framework, classic network measures (e.g., degree, shortest paths, clustering, and centrality) predict which concepts act as hubs or bridges for activation flow and hence which ideas are most likely to be retrieved under minimal input [9,10]. When negative-valence clusters (e.g., anxiety, stress, and depression) are densely interconnected, even mild cues can traverse short associative paths and disproportionately energize these hubs, sustaining recurrent activation loops consistent with rumination and other maladaptive dynamics observed in psychopathology.
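The classic network measures named above can be computed directly with standard tools. The following is a minimal, illustrative sketch using NetworkX on a toy free-association network; the nodes and edges are invented for demonstration and are not drawn from SWOW or from the study’s data.

```python
import networkx as nx

# Toy free-association network; edges are illustrative only.
G = nx.Graph()
G.add_edges_from([
    ("anxiety", "stress"), ("anxiety", "worry"), ("stress", "work"),
    ("anxiety", "depression"), ("depression", "sadness"),
    ("stress", "depression"), ("calm", "peace"),
])

# Classic measures mentioned in the text: degree, clustering,
# centrality, and shortest associative paths.
degree = dict(G.degree())
clustering = nx.clustering(G)
betweenness = nx.betweenness_centrality(G)
path = nx.shortest_path(G, "worry", "sadness")

print(degree["anxiety"])  # a densely connected negative hub
print(path)               # short associative route between negative concepts
```

In this toy example, “anxiety” emerges as the highest-degree node, and the shortest path from “worry” to “sadness” runs through the negative cluster, illustrating how mild cues can traverse short associative routes toward negative hubs.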
Personality traits and mental health. The Big Five personality traits—neuroticism, conscientiousness, extraversion, agreeableness, and openness [20]—have long been linked with psychological well-being and increased risk for psychopathology. Among these, the trait of neuroticism is the most consistently associated with greater vulnerability to negative emotions and with an elevated risk of developing anxiety and depressive disorders [14,21]. Neurotic individuals tend to simulate past and future problems, thus experiencing negative mood states in the apparent absence of a plausible cause [22]. From a network science perspective, we can assume that these individuals possess cognitive networks in which concepts linked with negative mood states (e.g., “anxiety” or “depression”) exhibit greater connectivity. This network property could lead to quicker or more persistent activation of anxious or depressive thoughts, characteristic of rumination. Rumination is defined as the repetitive and excessive focus on negative emotions, thoughts, or events [23]. It has been strongly linked with the development and maintenance of affective disorders such as depression and anxiety [23] and commonly occurs in other mental disorders. In the context of spreading activation models [17], rumination can be interpreted as a self-perpetuating activation loop: once a negative concept is triggered (e.g., concepts/nodes like depression, anxiety, and stress), the structure and connectivity of the semantic network determine whether this activation dissipates or initiates a recursive cycle. This occurs because highly central and connected nodes (often representing concepts with high valence for the individual) activate neighboring nodes in a cluster with similar affective valence, reinforcing the same emotional patterns. 
Moreover, these secondary nodes, once activated, may have fewer connections yet can return activation to the starting node, reinforcing a long-lasting emotional loop characteristic of rumination. Unlike neuroticism, conscientiousness, extraversion, agreeableness, and openness may serve as protective factors against mental health disorders [14,21]. Conscientious individuals have a goal-focused mindset, which helps mitigate rumination. Extraversion correlates with positive affect and sociability, and agreeableness promotes harmonious social interactions, while openness encourages creativity and flexible thinking. Collectively, these behavioral traits and thinking tendencies can foster dense positive clusters of nodes and weaken direct connections with negative concepts, potentially providing protection against stress-related factors. Personality traits are thus studied both as a structural and a behavioral moderator of emotional activation, potentially influencing which concepts are accessible, how strongly they interlink, and how stimuli are elaborated.
Psychometric scales as proxies for mental health assessment. To assess psychological well-being and its relationship with the dynamics of cognitive networks, three validated psychometric scales were employed: the Depression, Anxiety, and Stress Scales (DASS-21), the Life Satisfaction Scale, and the Positive and Negative Affect Schedule (PANAS).
Depression Anxiety Stress Scales (DASS-21). The DASS-21 [24] is a widely used self-report scale designed to assess the severity of three distinct psychological constructs: depression, anxiety, and stress. Each subscale consists of seven items that capture core symptomatology—depression (e.g., anhedonia and hopelessness), anxiety (e.g., hyperarousal and excessive worry), and stress (e.g., tension and irritability). By conceptualizing these constructs dimensionally, this tool is especially suited to examining how different manifestations of distress operate in associative networks. In this context, we hypothesize that individuals with higher subscale scores on the DASS-21 may also exhibit cognitive networks with denser and more interconnected negative associations among nodes representing depression, stress, and anxiety, much like those observed in other studies on math anxiety [10].
Life Satisfaction Scale. The Life Satisfaction Scale [25] measures global well-being by assessing an individual’s overall perception of life quality. The scale consists of five items that evaluate the extent to which individuals perceive their lives as fulfilling and meaningful; unlike the DASS-21, this scale provides a cognitive evaluation of well-being, focusing on the presence or absence of positive self-perception. In cognitive networks, individuals reporting greater life satisfaction may exhibit a network structure in which positive concepts are more central or more densely clustered with other positive concepts, as found in past studies using the Emotional Recall Task [16].
Positive and Negative Affect Schedule (PANAS). The PANAS [26] is a psychometric scale that differentiates between positive affect (PA) and negative affect (NA). The PA subscale measures the frequency with which an individual experiences positive states (such as attention, determination, and enthusiasm), while the NA subscale captures distress-related emotions such as fear, hostility, and guilt. From a network science perspective, high NA scores could be linked with stronger and more persistent activation of negative semantic nodes, which in turn could reinforce distress, as found in past studies using the Emotional Recall Task [16].
The Emotional Recall Task (ERT). The ERT is a free-association paradigm designed to capture how individuals spontaneously retrieve and verbalize emotional experiences [27]. In its standard form, participants are asked to generate a fixed number of words—typically around ten—that describe how they have felt over a recent period (e.g., the past week or month). These self-reported words are then analyzed in terms of their affective properties (such as valence, arousal, and dominance) and their position within normative lexical networks. By mapping recalled words onto semantic networks, the ERT provides a window into the accessibility of emotion-laden concepts and how patterns of recall may reflect underlying psychological states such as stress, anxiety, or depression. This approach has been shown to link individual differences in recall with validated psychometric measures, making it a useful tool for exploring the relationships among emotional memory, well-being, and personality [16]. Throughout this work, we interpret ERT-derived activation as a property of lexical–semantic propagation and its association with self-reported affect and well-being. Consistent with past research [16], we do not make claims regarding clinical status or diagnosis.

3. Manuscript Aims and Study Outline

Building on the methodological framework of cognitive network science [11], this work aims to extend the analysis of emotional recall patterns through two complementary studies. In Study 1, we investigated the relationship between the Emotional Recall Task (ERT) responses and psychometric measures of well-being in humans and LLMs. In Study 2, we focused on the link between ERT responses and personality traits. More specifically, Study 1 examined whether lexical activation dynamics derived from a brief Emotional Recall Task (ERT) relate to participants’ self-reported affective scales—namely, the DASS-21 (Depression, Anxiety, and Stress), PANAS (Positive and Negative Affect), and the Satisfaction With Life Scale (SWLS). Study 2 extended this framework to personality dimensions, correlating activation indices with the Big Five traits (neuroticism, conscientiousness, extraversion, agreeableness, and openness). To test the generality of these structural relationships, we applied the same ERT and spreading activation pipeline to text responses generated by three large language models (GPT-4, Claude Haiku, and Claude Opus), treating these outputs as linguistic data for network analysis rather than as indicators of affective states. Consequently, three fundamental themes and research questions are addressed:
  • In Study 1, we examine the extent to which human-generated associations correlate with clinical measures such as the DASS-21 scale, the PANAS, and the Life Satisfaction Scale. RQ1: Can spreading activation signals mirror psychological well-being indicators?
  • In Studies 1 and 2, we compare the capacity of GPT-4, Haiku, and Opus to simulate the associative patterns observed in humans. RQ2: Can these models replicate human emotional dynamics under the lens of cognitive network science?
  • In Study 2, we analyze the correlation between the results from recall tasks and the Big Five personality traits. RQ3: In either humans or LLMs, is there a relationship between the structure of associative memory and personality traits?
Theoretically, this work contributes to cognitive network science by linking spreading activation dynamics to individual differences in affective and personality measures. It extends existing research on the mental lexicon [8,17,28] by testing whether activation flows from personally generated words correspond to well-established dimensions of affect and personality. Practically, the approach demonstrates how brief, language-based tasks—analyzed within transparent network models—can serve as scalable proxies for studying affective lexical organization without collecting sensitive clinical data. By comparing human responses with outputs from large language models (LLMs), we also evaluate the extent to which current AI systems reproduce human-like patterns of lexical–affective activation, thereby providing a foundation for future applications in affect-aware natural language technologies [29].

4. Materials and Methods

For Study 1, five distinct datasets were employed to examine the associations between words generated in the Emotional Recall Task and participants’ psychometric outcomes as measured by the PANAS, DASS-21, and Life Satisfaction scales. Specifically, these datasets included two collections of human participant data, sourced from previous studies [16,27,30], and three datasets of LLM-simulated artificial participants—GPT-4, Claude Haiku, and Claude Opus. This diversity in sources enables the comparison of association patterns across different populations and allows for the assessment of how closely artificial models reflect human-like representations of emotion and well-being.
For Study 2, we utilized data from a large-scale online survey conducted in the United States between May and August 2024, collected and publicly shared on an Open Science Repository by De Duro and colleagues [30]. The survey was administered to a sample of 1000 adult participants, who completed first a brief demographic questionnaire and then a series of psychometric assessments. Personality traits were measured using the IPIP-NEO Inventory (short form) [31], which evaluates neuroticism, extraversion, openness, agreeableness, and conscientiousness through five items per trait. After completing the scale, participants took part in the Emotional Recall Task (ERT), in which they were asked to freely generate words describing how they had felt in recent weeks. The resulting data were used to construct individual activation trajectories, which were then analyzed to examine relationships between specific personality profiles and the activation strength of nodes associated with mental distress.
Human participant datasets. Two datasets were used to analyze the relationship between emotional word associations and psychological states in human subjects. The first dataset, derived from a study by Li et al. [27], contains data from 200 native English speakers recruited via Amazon Mechanical Turk. Participants provided ten emotional words describing their feelings over the past month, each accompanied by a self-rated frequency of experience, a valence rating (1–9 scale from unpleasant to pleasant), and an arousal rating (1–9 scale from calm to excited). These responses allowed for the computation of a valence–arousal emotional profile for each participant. After completing the emotional recall task, participants were administered several psychometric instruments: the Positive and Negative Affect Schedule (PANAS) [26], the Depression Anxiety Stress Scale (DASS-21) [24], and the Satisfaction With Life Scale (SWLS) [25].
The second human dataset consists of Emotional Recall Task responses from a larger sample of 1000 individuals, collected by De Duro [30] to validate the association between personality traits and trust in LLMs using free-recall results. This dataset contains personality trait data (the IPIP-NEO questionnaire for personality traits [32]) and includes broader indicators of emotional and psychological functioning, allowing for the generalization of activation results to a larger human sample. This dataset was used in both Studies 1 and 2.
Artificial participant datasets. We analyzed outputs from three large language models (LLMs): GPT-4 (OpenAI, 04/2024 release), Claude Haiku 3.5 (Anthropic, 10/2024 release), and Claude Opus 3 (Anthropic, 03/2024 release). Each model was accessed via its respective API under standard temperature and sampling conditions (temperature = 0.7). For each model, 200 independent pseudo-participants were generated using unique random seeds. Each pseudo-participant received the full Emotional Recall Task (ERT) prompt, followed by item-level questionnaire prompts for the DASS-21, PANAS, and SWLS. Model outputs were parsed and scored according to the original scale guidelines. These simulations provide text-based, reproducible responses for structural comparison with human data and do not imply the presence of self-reported mental or affective states.
For each model, artificial profiles were generated using a fixed, standardized prompt designed to elicit emotionally relevant content and simulate responses to psychometric questionnaires. All LLMs were presented with the complete text of each questionnaire item—identical to what human participants received—and were asked to select or generate a Likert-style response for each item. The model-generated responses were then parsed and scored using the same procedures applied to human data. Importantly, these outputs are treated purely as text-based data derived from the models’ linguistic probability distributions and are not interpreted as self-reports or reflections of internal mental states (see Section 6).
Each questionnaire—the DASS-21, PANAS, and Satisfaction With Life Scale (SWLS)—was administered in its original format, with each item and its Likert-style options presented sequentially to the model; the prompt instructed the model to select or generate one response per item (e.g., on a 1–5 or 1–7 scale, depending on the instrument). Sampling conditions were fixed across the 200 pseudo-participants generated per model (temperature = 0.7 and top-p = 0.9). The numerical responses were parsed and scored using the official scoring keys for each questionnaire, yielding total and subscale scores structurally equivalent to those obtained from human participants. It is essential to note that the LLM questionnaire scores reported here do not possess psychometric validity in the conventional sense. Although the administration and scoring procedures were identical to those used with human participants, the resulting values reflect probabilistic text-generation tendencies rather than subjective introspection. Consequently, these data should be understood as linguistic artifacts that approximate the statistical structure of self-report scales, not as indicators of genuine emotion, personality, or well-being within the models. The prompt used for LLM generation was the following:
Impersonate a [x] years old [male/female/person].
Please use 10 English words to describe feelings you have experienced during the past month. Reply only with 10 words separated by a comma.
Please read each numbered statement and indicate how much the statement applied to you over the past week. The rating scale is as follows: 0 indicates it did not apply to you at all, 1 indicates it applied to you to some degree, or some of the time, 2 indicates it applied to you to a considerable degree or a good part of time, 3 indicates it applied to you very much or most of the time. Reply only with the vector number corresponding to your answers.
[Statements from the psychometric questionnaire y are listed.]
Repeat the two tasks independently [z] times.
Here, x represented an age value and ranged so as to match the age ranges in TILLMI by De Duro and colleagues [30]; y was a questionnaire among DASS-21, Life Satisfaction, and PANAS; and z controlled the number of repetitions, allowing the task to be performed ten times to facilitate simulations. Each LLM was tasked with independently generating hundreds of artificial participants by repeating the prompt across different fictional profiles. For each simulated participant, ten emotional words and full psychometric vectors were collected. These data were treated identically to those of human participants in subsequent analyses, allowing for a direct comparison of emotional word associations and activation dynamics across human- and LLM-generated datasets. We excluded the IPIP-NEO Inventory from the LLM simulations because the current literature indicates that LLMs without explicit prompting instructions might not possess clear, reliable, or well-specified personality traits [33,34].
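The parsing and scoring steps described above can be sketched as follows. This is a hedged illustration: the helper names and example reply strings are our own, and the item-to-subscale grouping follows the standard published DASS-21 scoring key (raw subscale sums doubled for comparability with the DASS-42), not code released by the authors.

```python
def parse_ert_words(reply: str, expected: int = 10) -> list[str]:
    """Split a comma-separated ERT reply into lowercase words."""
    words = [w.strip().lower() for w in reply.split(",") if w.strip()]
    if len(words) != expected:
        raise ValueError(f"expected {expected} words, got {len(words)}")
    return words

def score_dass21(responses: list[int]) -> dict[str, int]:
    """Score a 21-item vector of 0-3 responses into the three subscales.
    Indices (0-based) follow the standard DASS-21 key; raw sums are
    doubled for comparability with the full DASS-42."""
    depression = [2, 4, 9, 12, 15, 16, 20]
    anxiety = [1, 3, 6, 8, 14, 18, 19]
    stress = [0, 5, 7, 10, 11, 13, 17]
    return {
        "Depression": 2 * sum(responses[i] for i in depression),
        "Anxiety": 2 * sum(responses[i] for i in anxiety),
        "Stress": 2 * sum(responses[i] for i in stress),
    }

# Invented example reply, mimicking the 10-word ERT format above.
words = parse_ert_words(
    "tired, anxious, calm, happy, stressed, bored, hopeful, sad, angry, relieved"
)
scores = score_dass21([1] * 21)  # uniform replies of 1 on every item
```

Replies that fail to parse (e.g., fewer than ten words, or non-numeric questionnaire answers) would be rejected, mirroring the exclusion criteria applied to human data.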
This design enabled us to assess the representational fidelity of LLMs in capturing emotional constructs and their relationship with mental health indicators. Furthermore, the use of three different models allowed for the identification of model-specific biases and performance differences in reproducing human-like semantic associations. All LLM outputs are treated strictly as text-based samples generated from probabilistic language models. They are not interpreted as self-reports or indicators of internal mental states, and any human–LLM comparison is framed exclusively as a comparison of structural activation patterns in text, not of affect or cognition.

4.1. Preprocessing and Construction of the Network

Only participants whose entire set of recalled words from the Emotional Recall Task existed in the Small World of Words (SWOW) English dataset [18] were included in the simulations. This filtering step ensured that each word had a corresponding node within the associative network, preventing bias introduced by missing data. When misspellings occurred and a valid corrected version existed in SWOW, the misspelled words were manually corrected; otherwise, the participant’s full set was excluded to preserve consistency in network structure and input quality.
The cognitive network was constructed from the SWOW dataset [18], with nodes representing individual words and edges reflecting associative strengths based on lexical recall frequencies. This network was implemented using the NetworkX library (available at https://networkx.org/; last accessed: 6 October 2025).
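A minimal sketch of this construction step is shown below. The column names (“cue”, “response”, “strength”) and the example rows are placeholders: the actual SWOW-EN release uses its own file schema, so they would need to be adapted to the downloaded dataset.

```python
import pandas as pd
import networkx as nx

# Placeholder rows standing in for SWOW-EN cue-response pairs with
# normalized recall frequencies; invented for illustration.
rows = [
    ("anxiety", "stress", 0.20), ("anxiety", "fear", 0.15),
    ("stress", "work", 0.30), ("depression", "sadness", 0.25),
]
edges = pd.DataFrame(rows, columns=["cue", "response", "strength"])

# Directed network: associations run from cue to response, with edge
# attributes recording associative strength.
G = nx.from_pandas_edgelist(
    edges, source="cue", target="response",
    edge_attr="strength", create_using=nx.DiGraph,
)

print(G.number_of_nodes(), G.number_of_edges())
```

Note that, as described in Section 4.3, the study’s simulations ultimately discard the edge weights and run on the unweighted adjacency structure; the weights are retained here only to show how the full dataset maps onto the graph.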
For the dataset including the PANAS and DASS-21 scales, participants’ psychometric scores were linked with their total network activation outputs, i.e., the sum of all activation levels of specific nodes over time, to analyze possible relationships between psychological distress and emotional concepts spreading in memory.

4.2. Spreading Activation Dynamics

We adopted spreading activation as implemented in SpreadPy [10]. SpreadPy (https://github.com/dsalvaz/SpreadPy; last accessed: 26 September 2025) is a Python 3.11 library for simulating spreading activation on single- or multi-layer cognitive networks. Each node $i$ at time $t$ carries an activation energy $e_{i,t}$, which updates in discrete steps according to a retention parameter $r$ and a transfer function $\varphi(i \to j)$ that distributes residual energy to neighbors. The general update rule is defined as follows:

$$e_{i,t+1} = r\, e_{i,t} + \sum_{j \in N(i)} \varphi(j \to i),$$

where $N(i)$ is the neighborhood of node $i$. In the case of unweighted connections, SpreadPy assigns equal probability to all neighbors:

$$\varphi(i \to j) = \frac{(1 - r)\, e_{i,t}}{\deg(i)},$$

where $\deg(i)$ is the degree of node $i$. In the weighted case, transitions depend on normalized edge weights $\alpha_{ij}$:

$$\varphi(i \to j) = (1 - r)\, e_{i,t}\, \alpha_{ij}, \quad \text{with} \quad \sum_{j} \alpha_{ij} = 1.$$

By tuning $r$ and $\alpha_{ij}$, SpreadPy provides a flexible framework for modeling how activation flows through semantic networks, making it a suitable tool for investigating memory biases, rumination, and emotional recall. Figure 1 contains an overview of the spreading activation dynamics on a cognitive network.
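To make the unweighted update rule concrete, the following is a minimal re-implementation from the equations above. It is a sketch for illustration only, not the SpreadPy API (whose interface may differ); the graph, seed word, and function names are our own.

```python
import networkx as nx

def spread_step(G: nx.Graph, energy: dict, r: float = 0.5) -> dict:
    """One discrete step of the unweighted rule: each node keeps r of
    its energy and splits the remaining (1 - r) equally among its
    neighbors, i.e., phi(i -> j) = (1 - r) * e_i / deg(i)."""
    nxt = {node: r * energy.get(node, 0.0) for node in G}
    for i in G:
        e, deg = energy.get(i, 0.0), G.degree(i)
        if e > 0 and deg > 0:
            share = (1 - r) * e / deg
            for j in G.neighbors(i):
                nxt[j] += share
    return nxt

def simulate(G, seeds, steps=200, r=0.5):
    """Seed nodes start with unit energy (e.g., the ten ERT words);
    returns per-node energy after the given number of steps."""
    energy = {node: 0.0 for node in G}
    for s in seeds:
        energy[s] = 1.0
    for _ in range(steps):
        energy = spread_step(G, energy, r)
    return energy

# Tiny illustrative graph: a single chain fear-anxiety-stress-work.
G = nx.Graph([("anxiety", "stress"), ("stress", "work"), ("anxiety", "fear")])
final = simulate(G, seeds=["fear"], steps=200, r=0.5)
```

Because every node redistributes exactly what it does not retain, total energy is conserved, and on a connected graph the long-run distribution concentrates on high-degree nodes, which is the mechanism by which hub concepts accumulate activation.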

4.3. Spreading Activation Model Implementation

The simulation framework implemented with SpreadPy [10] was designed to trace the semantic activation of the emotional concepts of stress, anxiety, and depression within a cognitive network following the recall of emotional states. Initially, all ten emotional words recalled by each participant from the ERT data were simultaneously activated. Then, spreading activation was run to simulate how other associated concepts mediated the flow of activation across the cognitive network of free associations, i.e., memory recall patterns. Spreading activation was computed on a human-derived associative network (SWOW-EN). Consequently, model behavior reflects diffusion within this empirically constructed lexical–semantic structure. Inferences, therefore, pertain to propagation processes inside this network and align with established findings from prior research [9,10] linking spreading activation to the recall and processing of concepts in memory.
This approach allowed for the observation of the total activation level achieved by a given individual for a target concept related to mental health, i.e., one among “anxiety,” “stress,” and “depression.” In other words, the analysis aimed to determine how individual differences in self-reported emotions—and, in the case of the first dataset, psychological profiles—influenced the dynamics of semantic activation toward negative emotional concepts.
For the sake of simplicity, all analyses were performed using spreading activation diffusion on the unweighted network of free associations derived from SWOW-EN. Edge weights were ignored because their cognitive interpretation is ambiguous: a stronger association could in principle either amplify activation transfer (if interpreted as associative strength) or, conversely, act as a more saturated channel that reduces further spreading. To avoid introducing such theoretical uncertainty, we adopted a binary (unweighted) adjacency structure, where activation spreads uniformly along all outgoing links. Step counts were selected to be long enough for activation peaks to emerge in the time series and for target nodes to reach stationary levels. Empirically, runs of 100, 200, and 400 steps produced equivalent activation profiles. For brevity, we report results for the 200-step runs, as they capture the same asymptotic dynamics. The retention parameter was fixed at r = 0.5 , meaning that each node retained half of the activation it received on first arrival and diffused the remaining half to its outgoing neighbors. This setting provides a cognitive compromise between the influence of network topology and the temporal order of activation propagation, ensuring that spreading dynamics remain sensitive both to structural connectivity and to initial lexical activation.
In Study 2, the spreading activation framework described in Study 1 was applied to examine the relationship between personality traits and semantic activation dynamics. The goal was to assess whether individual differences in traits such as neuroticism, conscientiousness, or extraversion influenced semantic activation toward emotionally salient concepts like stress, anxiety, and depression following word recall. As in Study 1, participants’ recalled ERT words served as simultaneous activation inputs in the network, and activation levels for each of the three target nodes were tracked over 200 computational time steps.

4.4. Batch Simulations and Correlation Analysis

Simulations employed a batch-processing approach, adopting the implementation of spreading activation from SpreadPy [10]. SpreadPy implements Collins and Loftus’ spreading activation model in both single-layer and multiplex networks; in this study, the single-layer configuration was used. For each participant, all ten recall words were activated simultaneously in the semantic network, with spreading activation tracked over 200 time steps. Activation levels of the target nodes “stress,” “anxiety,” and “depression” were recorded at each time step.
The resulting activation time series for each participant were visualized using log–log plots, and the final activation levels were extracted. For participants in the MTurk dataset, these final activation scores were correlated with their DASS-21 subscale scores (Stress, Anxiety, and Depression), their PANAS Positive and Negative Affect scores, and their Life Satisfaction Scale scores.
Kendall’s tau coefficient was selected for correlation analysis because it detects monotonic relationships between activation levels and psychological measures without assuming linearity or normality. The visual and statistical comparisons aimed to investigate whether higher levels of emotional distress corresponded to faster or more intense activation of negative emotional concepts. All simulations and visualizations were implemented using SpreadPy 1.0, Seaborn 0.13.2, and Matplotlib 3.10.7, ensuring methodological continuity with Study 1.
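The correlation step can be sketched with `scipy.stats.kendalltau`. The paired activation and DASS-Depression values below are fabricated for illustration only and carry no empirical meaning.

```python
from scipy.stats import kendalltau

# Hypothetical paired data: one final "depression" activation value and one
# DASS-Depression score per simulated participant (illustrative numbers only).
final_activation = [0.012, 0.034, 0.008, 0.051, 0.027, 0.044, 0.019, 0.061]
dass_depression = [4, 11, 2, 16, 9, 14, 6, 18]

# Kendall's tau is rank-based: it tests the monotonic association between
# the two variables without assuming linearity or normality.
tau, p_value = kendalltau(final_activation, dass_depression)
```

With these toy values the ranks agree perfectly, so tau is 1; real activation–scale pairs would of course yield far smaller coefficients, as reported in the Results.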

5. Results

5.1. Study 1

The first set of results stems from the correlation analyses between the lexical activation levels of specific negative concepts (anxiety, stress, and depression) and scores from the various psychometric scales. The reference measure is the DASS-21, divided into three distinct subscales (DASS-Anxiety, DASS-Stress, and DASS-Depression), with the PANAS Positive and Negative subscales and Life Satisfaction scores also being considered to explore broader associations with well-being. These associations suggest that participants’ cue-word choices channel activation toward negative affect nodes within the lexical network; they should not be interpreted as clinical assessments or as direct evidence of emotional memory.
All plots, presented in Figure 2, Figure 3 and Figure 4, use log–log axes for human and LLM data alike. Correlations were computed on the log-transformed values and visualized as scatter plots with confidence intervals.
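The log–log transformation applied before plotting can be sketched as follows. How zero activation values were handled is not specified in the text; dropping non-positive points, and shifting the time index by one so that step 0 has a finite logarithm, are assumptions of this sketch.

```python
import math

# Toy activation trajectory over six time steps (illustrative values only).
series = [0.0, 0.08, 0.21, 0.16, 0.12, 0.11]

# Log-transform both axes; non-positive activations are dropped (assumption),
# and the time index is shifted by 1 so every kept point has finite coordinates.
log_points = [(math.log10(t + 1), math.log10(a))
              for t, a in enumerate(series) if a > 0]
```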

5.1.1. Correlations Between Node Activation and Mental Health Scales

Table 1 presents the correlations between the maximum activation levels of keywords (e.g., anxiety) and their corresponding psychometric dimensions (e.g., DASS-21 Anxiety score) derived from activation initiated by the ERT data.
For human participants, activation of negative concepts in the spreading activation model significantly and positively correlated with their corresponding DASS-21 subscales. Depression activation showed the strongest correlation with DASS-Depression (Kendall’s τ = 0.375 , p < 0.001). Similarly, anxiety activation positively correlated with DASS-Anxiety (Kendall’s τ = 0.225 , p < 0.001), and stress activation correlated with DASS-Stress (Kendall’s τ = 0.298 , p < 0.001).
Interestingly, GPT-4 displayed significant positive correlations with both the DASS-Anxiety ( τ = 0.149 , p < 0.05 ) and DASS-Stress ( τ = 0.152 , p < 0.05 ) subscales but not with DASS-Depression ( τ = 0.029 , n.s.). One possible explanation lies in the semantic topology of the network and the distributional semantics captured by GPT-4’s training corpus. Because GPT-4’s embeddings reflect broad linguistic regularities, its activation patterns may more easily align with general-usage affective terms like stress, which also show higher degree and strength centrality in the SWOW-EN network [35]. In contrast, depression and anxiety reside in more peripheral, narrowly connected regions of the semantic network of SWOW-EN [18], reducing the overlap between model-based and human-based activation trajectories.
These findings confirm that individuals reporting higher levels of psychological distress also exhibit stronger, faster, and more persistent activation of distress-related concepts in their cognitive networks, supporting the memory bias hypothesis in emotional processing. In contrast, LLMs exhibited inconsistent correlation patterns compared with human participants. Haiku showed a weak positive correlation between depression activation and DASS-Depression (Kendall’s τ = 0.136, p = 0.009) but nonsignificant correlations for anxiety (Kendall’s τ = 0.015, p = 0.772) and stress (Kendall’s τ = 0.007, p = 0.899). Opus demonstrated an inverse, marginally nonsignificant correlation between anxiety activation and DASS-Anxiety (Kendall’s τ = −0.123, p = 0.057) and nonsignificant correlations for depression and stress, indicating that its network structure does not reflect human-like emotional connectivity. GPT-4, despite showing a significant correlation with DASS-Stress (Kendall’s τ = 0.152, p = 0.026), yielded nonsignificant results for depression activation and DASS-Depression, as well as for anxiety activation and DASS-Anxiety. The DASS-Stress pattern, however, makes GPT-4’s activation structure the most similar to that of human participants. Beyond the DASS-21, the relationship between negative activation and well-being measures (PANAS Positive and Life Satisfaction) was examined.
In humans, analyses of well-being indicators revealed consistent inverse associations between negative-concept activation and both positive affect and life satisfaction. This is also reported in Figure 5. Specifically, activation strength was negatively correlated with PANAS Positive across all three domains, with the strongest effect observed for depression (τ = −0.3510, p < 0.001), followed by stress (τ = −0.1490, p < 0.05) and anxiety (τ = −0.1265, p < 0.05). Similarly, higher activation of distress-related nodes was associated with lower scores on the Life Satisfaction Scale. This effect was most pronounced for depression (τ = −0.3801, p < 0.001) but was also robust for anxiety (τ = −0.2538, p < 0.001) and stress (τ = −0.2444, p < 0.001). Together, these findings indicate that individuals whose recall networks more strongly activate concepts of depression, anxiety, and stress tend to report reduced positive affect and diminished satisfaction with life, highlighting the sensitivity of the Emotional Recall Task to broader aspects of subjective well-being. Compared with humans, LLMs displayed considerably weaker correlations between total activation scores and PANAS/Life Satisfaction scores (cf. Table 1).
The direction of the above findings suggests that the absence of autobiographical experience in LLMs creates a disconnection between lexical activation and subjective emotional states; while they demonstrate learned associations between depression and negative affect, their activation patterns are weaker and less systematic.

5.1.2. Differences Between Human Participants and LLMs

The log–log scale analysis revealed significant differences in activation intensity distributions between human participants and LLMs. First, human participants displayed greater variability compared with LLMs, with activation levels covering multiple orders of magnitude. For example, in cases where anxiety was triggered, some participants exhibited extreme spikes in activation, while others remained at lower, more stable levels. This variability reflects individual differences in emotional processing, where psychosocial factors (e.g., personal memories, biases, and personality traits) modulate how strongly distress-related concepts are activated.
In contrast, LLMs exhibited substantially less activation variability. Despite the identical spreading activation mechanism applied across all models, extreme peaks of activation were rare, and overall variability was compressed. Among the models, GPT-4 produced the most human-like activation curves, possibly due to its extensive training corpus and richer semantic representations. However, the absence of episodic memory prevented a complete replication of the full range of activation intensities observed in human participants.
Another key finding in the log–log curves is a pattern of convergence: after reaching peak activation, both human and LLM trajectories gradually decline and stabilize at a metastable level where activation neither increases nor decreases. For humans, this demonstrates functional universality: while individual differences shape the curves before reaching the maximum activation point, all human participants eventually reach comparable activation levels within their group, sharing approximately similar structural patterns. LLMs exhibited a comparable trend, though with considerably less individual variability. This may be due to LLMs displaying emotional biases akin to those observed in humans, as previously noted in the literature [36].

5.2. Study 2

Study 2 focuses on investigating correlations between personality traits and total activation levels of concepts such as anxiety, depression, and stress. Results are reported in Table 2 and refer exclusively to human participants. Simulations with LLMs were not considered for personality-trait analyses, because without explicit prompting instructions, it remains unclear whether LLMs exhibit stable, reliable, and well-defined personality characteristics, as noted in the previous literature [33,34].
Humans with high neuroticism traits show stronger activation of concepts such as anxiety and stress, along with higher activation peaks for depression. Their tendency to worry and ruminate on negative experiences may explain the denser connectivity of their semantic networks, which facilitates rapid activation even without direct exposure to stressors. The correlation between neuroticism and depression activation (Kendall’s τ = 0.1608 ,   p = 0.0013 ) confirms a statistically significant association between higher neuroticism scores and increased activation of depression-related concepts.
On the other hand, high extraversion appears to mitigate negative activation, as extraverted individuals demonstrate weaker activation of depression. The negative correlation between extraversion and depression activation (Kendall’s τ = −0.1864, p < 0.001) suggests that extraversion may function as a protective factor, reducing the spread of negative semantic activation and potentially preventing ruminative loops of negative thought. Notably, this variability cannot be replicated in LLMs, as they do not possess personality traits; while they can statistically model semantic relationships between words, they lack the structural and experiential factors that shape personality. As a result, direct comparisons are not feasible.
These findings underscore the value of cognitive network science in elucidating the mechanisms underlying anxiety, stress, and depression.

6. Discussion

This paper examined the relationship between the results from the Emotional Recall Task (ERT) and mental health. By applying the spreading activation model to word-pair lists generated through the ERT, we hypothesized that individuals scoring higher on mental health scales (e.g., DASS-21) and those with higher neuroticism traits would exhibit stronger activation peaks of “cognitive energy” following the initial activation of nodes associated with mental distress (e.g., stress, depression, and anxiety). The findings support these hypotheses, revealing stronger activation peaks in these groups and comparable results across human participants and LLMs in terms of the relationship between mental health scale scores and the word pairs generated from the ERT. These results offer new insights into the structure and functioning of emotional memory and its relationship with mental health.
While the same questionnaires and prompts were administered to LLMs and human participants, the resulting scores represent simulated, text-based outputs rather than introspective reports. Accordingly, our comparisons reflect structural similarities in linguistic activation patterns, not shared affective or cognitive processes. This distinction is further clarified in our review of both studies’ results.
Study 1 focused on the link between the word pairs generated from the ERT and scores on the psychometric scales PANAS, DASS-21, and Life Satisfaction. We examined the correlations in both human participants and LLMs (Haiku, Opus, and GPT-4) by generating simulated participants. This approach allowed us to investigate whether relationships exist between psychometric scale scores and ERT responses, and to what extent these associations correspond to clinical indicators.
Study 1 also revealed recurrent patterns in activation trajectories among negative concepts, particularly in participants with higher distress scores. These loops likely reflect ruminative circuits present in individuals’ cognitive structures, in which activation moves cyclically from one negative concept to another. This quantitative pattern indicates the presence of semantic closures within cognitive networks, potentially giving rise to negative activation zones where activation becomes self-sustaining and confined, limiting transitions to other conceptual clusters. This structure is not only consistent with models of rumination that sustain negative emotional states and affective disorders but also aligns with the “attractor states” theory proposed to study dynamic systems and applied to psychology. According to this theory, certain psychological states, thoughts, and behaviors become stable over time through repeated reinforcement [37]. Similarly, negative clusters are automatically activated and reinforced, and this lexical activation generalizes across multiple contexts over time, maintaining negative emotional states and thoughts and even contributing to resistance to change or treatment in mental health disorders.
Additionally, Study 1 aimed to determine whether LLMs are able to mirror the same emotional and mnemonic dynamics observed in humans. The results demonstrated that the associations emerging from the ERT in humans positively correlated with scores on psychometric scales. Participants showing higher and faster activation peaks following word recall also scored higher on scales measuring negative affect (such as the DASS-21 and PANAS Negative) and lower on scales measuring positive affect (such as the Life Satisfaction Scale and PANAS Positive). As shown in previous studies [1], LLMs exhibited variable performance: GPT-4 produced the most human-like results, though with lower activation peaks and less variability among simulated participants. Haiku and Opus, on the other hand, showed weaker and statistically nonsignificant correlations between ERT-generated activation levels and the scores of simulated participants across the different scales.
The significant GPT-4 correlations for anxiety and stress, but not for depression, likely reflect differences in the lexical and corpus-based representations of these constructs. As shown by the SWOW-EN network used in past studies [35], stress and anxiety occupy highly connected, polysemous neighborhoods spanning work, academic, and physiological contexts. In contrast, depression tends to appear within more specialized or clinical discourse, forming a narrower and less densely connected subnetwork. Consequently, GPT-4’s activation diffusion patterns may mirror the structure of common affective vocabulary but do not extend to conceptually deeper or less frequent terms associated with depressive experience.
The findings from Study 1 indicate that lexical recall patterns, analyzed through the lens of associative networks and their cognitive activation dynamics, can be successfully compared with scales designed to assess mental distress (such as the DASS-21, PANAS, and Life Satisfaction Scale), with cognitive activation toward words indicating emotional distress (such as depression, anxiety, and stress) varying monotonically with the scores obtained on these scales. This association confirms that emotional states are tightly linked with lexical memory access, aligning with previous research showing that negative emotions bias memory recall and information processing [4,6]. Individuals with higher DASS-21 Anxiety, Depression, or Stress subscale scores exhibited significantly stronger, faster, and more persistent activation of negative concepts. These distress-concept activation patterns suggest that emotional recall, assessed through free associations, can identify mental distress and is sensitive to the cognitive changes in thought patterns found in mental disorders [38,39].
Study 2 explored the associations between the Big Five personality traits and the activation of negative emotional concepts—specifically anxiety, stress, and depression—during spreading activation simulations in cognitive semantic networks. The results revealed that high neuroticism scores correlated significantly and positively with increased activation of negative concepts, particularly depression, while high extraversion scores correlated negatively with depression activation. These results are consistent with the psychological literature identifying neuroticism as a key vulnerability factor for affective disorders [40,41] and extraversion as a potential protective trait against negative affective states [42]. These findings also extend theoretical perspectives in psychology, suggesting that lexical recall patterns [9,10,28] could be integrated into assessments of at-risk individuals; high centrality scores for anxiety or depression or elevated activation around specific nodes may signal a neuroticism-related cognitive profile associated with greater emotional vulnerability.
Importantly, the personality effects observed in Study 2 converge with the affective associations identified in Study 1. Neuroticism has long been linked with greater negative emotionality, rumination, and vulnerability to anxiety and depression [43]. Individuals scoring higher in neuroticism exhibited stronger activation of depression- and anxiety-related nodes. This pattern parallels the heightened activation toward negative-valence concepts observed among participants reporting higher DASS-21 distress and lower life satisfaction. In contrast, extraversion represents a broad disposition associated with positive affectivity, approach motivation, and social engagement [21]. This trait predicted weaker activation in negative-emotion regions, mirroring the attenuated activation observed among participants with higher PANAS Positive Affect scores. Taken together, these converging findings suggest that the spreading activation dynamics derived from the Emotional Recall Task [27] capture a shared lexical–affective organization underlying both transient states of well-being and enduring personality dispositions [44].

6.1. Memory in Humans vs. LLMs: The Role of Episodic Memory

The LLMs tested in Study 1 only partly replicated human responses: their outputs did not exhibit the same associative or emotional patterns observed in human participants. This discrepancy aligns with recent findings in cognitive psychology [45] and psycholinguistics [28]. In our mental health case, it likely stems from the lack of autobiographical memory [1,15]: emotional states linked with personal memory (as in the Emotional Recall Task) also require autobiographical content, which is inherently absent in LLMs.
Study 1 involved the comparison of spreading activation dynamics between human participants and large language models (LLMs). The comparison between the two groups highlighted both similarities and differences. While humans showed wide variability in activation levels across individuals, LLMs showed a sort of “compressed” distribution. This difference could be accounted for by the absence of episodic memory in LLMs, which also reflects the lack of autobiographical depth and emotional self-reference that characterizes human cognition and the memories that humans form. Janik [46] claims that human memory integrates contextual, sensory, and emotional features, resulting in richly personal and highly variable memory recall. In contrast, LLMs can only reproduce the statistical co-occurrence mechanisms and semantic regularities found within their learning corpora or arising from their training processes [15].
Nonetheless, despite their lack of personal experience and memories [1], LLMs displayed surprisingly structured activation trajectories and regularities resembling those seen in human participants. This was especially true for GPT-4, whose spreading activation patterns most closely approximated those of humans, especially around the stress node activation. These similarities suggest that although LLMs obviously lack the biological [47] and emotional [12,24] foundation of human cognition, their internal representations may partially reflect aspects of human-like semantic organization.
We highlighted how negative-concept activation in humans was not only stronger but also more varied, hinting at a role for individual differences, potentially including personality traits, trauma exposure, stress levels, or even the presence or absence of symptoms of mental distress, in modulating the recall patterns in their cognitive network. LLMs, in contrast, showed minimal variability within each model, lacking the heterogeneity that characterizes human emotional cognition, influenced by lived experiences. This again underscores the structural limitations of artificial cognition compared with human mental life. Raz and colleagues [48] show that LLMs exhibit certain similarities to human participants in terms of problem-solving and question-complexity performance but warn about significant differences in the underlying cognitive processes between humans and LLMs. Mahowald and colleagues [49] emphasize that while humans and LLMs produce similar outputs in recall or language generation tasks, the underlying cognitive mechanisms remain fundamentally different.

6.2. Limitations and Future Directions

Importantly, our outcomes represent correlations with self-reported scales and do not constitute clinical diagnoses. LLM outputs are simulated textual responses without psychological states, and comparisons with human data are structural only. Moreover, all inferences are conditioned on a human-derived association network; thus, insights reflect diffusion over that structure rather than direct claims about emotional memory mechanisms. This approach entails several important limitations. First, while statistically significant, the correlations found were modest and should be interpreted cautiously; their effect sizes suggest that personality is only one of many factors influencing semantic activation, and individual variability remains high. Second, the Emotional Recall Task [27] itself may be more sensitive to emotional states than to traits, introducing noise into personality associations. While neuroticism is a relatively stable trait [50], the activation of words concerning symptoms or negative concepts in a single session may reflect momentary distress rather than trait-like tendencies. Repeated measurements from the same individuals over time would therefore be necessary to disentangle which results reflect personality traits and which reflect transient mental states. Third, as stated above, the link between mental health and personality traits remains unclear. Although some patterns linking personality traits with attitudes [51], thought patterns, and emotional styles have emerged in the literature, researchers speak of tendencies rather than clear, direct associations. Finally, although this study links personality traits with semantic activation patterns, it does not examine the mechanisms through which this link arises. Future studies could integrate measures of rumination [52] and other cognitive styles, such as mind wandering [47], to better specify these mediators and measure their impact on recall associations.
The strength of the correlations between activation and distress as measured by scales further highlights the potential of using network-based tools in psychological assessment.
A major consideration emerging from this work concerns whether LLMs can effectively simulate patients with psychiatric symptoms. Although LLMs can reproduce psycholinguistic data with some effectiveness [53], simulating psychiatric symptoms is far more complex. Here, LLMs were able to partially generate responses that activated the same negative concepts targeted in human participants. However, the depth and variability of these activations diverged significantly. As described earlier, GPT-4 showed moderate alignment with human-like activation paths, particularly for the stress node, suggesting some capacity of the model to mimic emotional associations. However, its inconsistent correlations with depression and anxiety reveal a ceiling effect in its ability to simulate distress dynamics. Nevertheless, there is growing evidence indicating that LLMs, when prompted appropriately, can mirror certain styles of thought [2,3,53] and, importantly, specific emotional tones. Both CounseLLMe by De Duro and colleagues [30] and recent work by Wang and colleagues [54] demonstrate that GPT-generated patient narratives and interactions can effectively reproduce the communicative and emotional characteristics of real patients when guided with clinically relevant prompts. These results suggest that while the internal processes of LLMs are not rooted in personal experience, their training on emotionally charged corpora and patient dialogues allows them to reconstruct the form of emotional expression found in humans. Therefore, although the underlying mechanisms are different, the outputs produced by LLMs under certain conditions resemble those of human patients.
This opens promising directions for future work, positioning LLMs as a potentially valuable addition to clinical research and practice [55,56]. By embedding LLMs into therapeutic simulations for mental health trainees or diagnostic interviews, practitioners could explore differential emotional responses, identify linguistic markers of distress, and test hypotheses about psychopathology dynamics—provided that their limitations are clearly recognized [57].

7. Conclusions

The findings from both studies revealed that peaks of activation, indicating cognitive distress, correlate with higher scores on mental health scales and with the personality trait of neuroticism. This trait has, in turn, been correlated with a higher risk of developing mental disorders, emphasizing the link between mental health and cognitive representations in semantic memory. This pattern also emerged, although more modestly, in LLMs, with GPT-4 showing the strongest alignment with humans. Although this work does not claim precise mental state tracking from free-recall tasks, it demonstrates that cognitive network science can reveal mental health patterns in humans, potentially identifying at-risk individuals, while simultaneously underscoring the differences in cognition between humans and large language models.

Author Contributions

Conceptualization, A.C. and M.S.; methodology, A.C. and M.S.; software, A.C. and E.T.; validation, A.C. and E.T.; formal analysis, A.C.; data curation, A.C., E.T. and M.S.; writing—original draft preparation, A.C., E.T. and M.S.; writing—review and editing, A.C., E.T. and M.S.; visualization, A.C. and M.S.; supervision, M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the COGNOSCO project of the University of Trento (grant ID PSCal2227).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new human data were generated for this study. The code for the activation simulations implemented in this work is available at the following repository (as Google Colab notebooks running Python): https://osf.io/c7shb/ (accessed on 12 October 2025).

Acknowledgments

The authors acknowledge Enrico Perinelli and Tiziano Gaddo for valuable feedback in the early stages of this project.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. De Duro, E.S.; Improta, R.; Stella, M. Introducing CounseLLMe: A dataset of simulated mental health dialogues for comparing LLMs like Haiku, LLaMAntino and ChatGPT against humans. Emerg. Trends Drugs Addict. Health 2025, 5, 100170. [Google Scholar] [CrossRef]
  2. Cau, E.; Pansanella, V.; Pedreschi, D.; Rossetti, G. Selective agreement, not sycophancy: Investigating opinion dynamics in LLM interactions. EPJ Data Sci. 2025, 14, 59. [Google Scholar] [CrossRef]
  3. Ashery, A.F.; Aiello, L.M.; Baronchelli, A. Emergent social conventions and collective bias in LLM populations. Sci. Adv. 2025, 11, eadu9368. [Google Scholar] [CrossRef]
  4. Brosch, T.; Scherer, K.; Grandjean, D.; Sander, D. The impact of emotion on perception, attention, memory, and decision-making. Swiss Med. Wkly. 2013, 143, w13786. [Google Scholar] [CrossRef]
  5. Shields, G.S.; Sazma, M.A.; Yonelinas, A.P. The effects of acute stress on core executive functions: A meta-analysis and comparison with cortisol. Neurosci. Biobehav. Rev. 2016, 68, 651–668. [Google Scholar] [CrossRef] [PubMed]
  6. Joormann, J.; Quinn, M.E. Cognitive processes and emotion regulation in depression. Depress. Anxiety 2014, 31, 308–315. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Visualization of the spreading activation dynamics over a given network topology and across subsequent time steps.
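Figure 1 depicts activation spreading over a network topology across subsequent time steps. As a rough illustration of this kind of dynamic (a minimal sketch, not the authors' exact model: the `retention` parameter and the uniform split of activation among neighbors are assumptions made here for simplicity), the process can be iterated over an adjacency structure:

```python
def spread_activation(adj, a0, retention=0.5, steps=5):
    """Simple spreading activation on an undirected network.

    adj: dict mapping node -> list of neighbor nodes.
    a0: dict mapping node -> initial activation level.
    At each step, every node keeps `retention` of its activation and
    splits the remainder equally among its neighbors.
    Returns a list of activation dicts, one per time step.
    """
    a = dict(a0)
    history = [dict(a)]
    for _ in range(steps):
        # Each node retains part of its activation...
        nxt = {node: retention * a[node] for node in adj}
        # ...and distributes the rest to its neighbors.
        for node, neighbors in adj.items():
            if neighbors:
                share = (1 - retention) * a[node] / len(neighbors)
                for nb in neighbors:
                    nxt[nb] += share
        a = nxt
        history.append(dict(a))
    return history

# Toy path network 0-1-2-3, with all initial activation on node 0
# (e.g., a seed concept such as "anxiety").
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
a0 = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}
history = spread_activation(adj, a0, retention=0.5, steps=3)
```

Because the redistribution rule conserves activation, the total stays constant over time while its distribution diffuses from the seed node, which is the quantity tracked in the time-evolution plots below.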
Figure 2. Time evolution of activation levels across subsequent steps for the word “anxiety” in free-association networks, using human ERT data from Study 1 (top left), GPT-4 ERT data (top right), Haiku ERT data (middle left), Opus ERT data (middle right), and human ERT data from Study 2 (bottom).
Figure 3. Time evolution of activation levels across subsequent steps for the word “depression” in free-association networks, using human ERT data from Study 1 (top left), GPT-4 ERT data (top right), Haiku ERT data (middle left), Opus ERT data (middle right), and human ERT data from Study 2 (bottom).
Figure 4. Time evolution of activation levels across subsequent steps for the word “stress” in free-association networks, using human ERT data from Study 1 (top left), GPT-4 ERT data (top right), Haiku ERT data (middle left), Opus ERT data (middle right), and human ERT data from Study 2 (bottom).
Figure 5. Scatter plots of total activation levels for the concepts “anxiety”, “depression”, and “stress” against the corresponding Life Satisfaction scores (left) and PANAS Positive scores (right); each dot represents one human participant’s ERT data.
Table 1. Kendall’s τ correlations between activation of negative concepts and psychological scales.
| Scale | Humans | GPT-4 | Haiku | Opus |
|---|---|---|---|---|
| DASS-Depression vs. “depression” | 0.375 *** | 0.029 | 0.136 ** | 0.036 |
| DASS-Anxiety vs. “anxiety” | 0.225 *** | 0.149 * | 0.015 | 0.123 |
| DASS-Stress vs. “stress” | 0.298 *** | 0.152 * | 0.007 | 0.005 |
| Life Satisfaction vs. “depression” | 0.380 *** | 0.071 | 0.155 ** | 0.051 |
| Life Satisfaction vs. “anxiety” | 0.254 *** | 0.132 * | 0.007 | 0.059 |
| Life Satisfaction vs. “stress” | 0.244 *** | 0.127 * | 0.007 | 0.048 |
| PANAS Positive vs. “depression” | 0.351 *** | 0.089 | 0.137 * | 0.025 |
| PANAS Positive vs. “anxiety” | 0.127 *** | 0.138 * | 0.087 | 0.071 |
| PANAS Positive vs. “stress” | 0.149 *** | 0.131 * | 0.009 | 0.064 |

Notes: Significance levels: * p < 0.05, ** p < 0.01, and *** p < 0.001.
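The correlations in Tables 1 and 2 are Kendall’s τ, a rank-based measure robust to non-normal score distributions. For readers who want to reproduce this style of analysis on their own data, a simple tau-a variant (which ignores ties; library implementations such as SciPy’s `kendalltau` default to the tie-corrected tau-b, so values can differ slightly) can be computed directly:

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    For every index pair, checks whether x and y move in the same
    direction (concordant) or in opposite directions (discordant).
    """
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Example: a single swapped pair among five ranks gives tau = 0.8,
# i.e., 9 concordant and 1 discordant pair out of 10.
tau = kendall_tau([1, 2, 3, 4, 5], [1, 2, 3, 5, 4])
```

In this setting, `x` would hold each participant’s scale score (e.g., DASS-Depression) and `y` the total activation of the matching concept from their ERT-seeded simulation.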
Table 2. Kendall’s τ correlations between activation of distress-related concepts and Big Five personality traits in humans (values rounded to four decimals).
| Trait | Stress | Anxiety | Depression |
|---|---|---|---|
| Conscientiousness | 0.0545 | 0.0659 | 0.1600 ** |
| Agreeableness | 0.0466 | 0.0580 | 0.1275 * |
| Openness | 0.0466 | 0.0580 | 0.1275 * |
| Extraversion | 0.0032 | 0.0064 | 0.1864 *** |
| Neuroticism | 0.0244 | 0.0046 | 0.1608 ** |

Notes: Significance levels: * p < 0.05, ** p < 0.01, and *** p < 0.001.
Carini, A.; Taietta, E.; Stella, M. Exploring the Link Between the Emotional Recall Task and Mental Health in Humans and LLMs. Information 2025, 16, 1057. https://doi.org/10.3390/info16121057