Article

The Neurophysiological Paradox of AI-Induced Frustration: A Multimodal Study of Heart Rate Variability, Affective Responses, and Creative Output

1 Intelligent Design Laboratory, School of Fine Arts, Central China Normal University, Wuhan 430079, China
2 Digital Media Art, School of Fine Arts, Central China Normal University, Wuhan 430079, China
* Author to whom correspondence should be addressed.
Brain Sci. 2025, 15(6), 565; https://doi.org/10.3390/brainsci15060565
Submission received: 11 April 2025 / Revised: 20 May 2025 / Accepted: 20 May 2025 / Published: 25 May 2025
(This article belongs to the Section Cognitive, Social and Affective Neuroscience)

Abstract

AI code generators are increasingly used in creative contexts, offering operational efficiencies on the one hand and prompting concerns about psychological and neurophysiological strain on the other. This study employed a multimodal approach to examine the affective, autonomic, and creative consequences of AI-assisted coding in early-stage learners. Fifty-eight undergraduate design students with no formal programming experience were randomly assigned to either an AI-assisted group or a control group and engaged in a two-day generative programming task. Emotional states (PANAS), creative self-efficacy (CSES), and subjective workload (NASA-TLX) were assessed, alongside continuous monitoring of heart rate variability (HRV; RMSSD and LF/HF). Compared to the controls, the AI-assisted group exhibited greater increases in negative affect (p = 0.006), reduced parasympathetic activity during the task (p = 0.001), and significant post-task declines in creative self-efficacy (p < 0.05). Expert evaluation of creative outputs revealed a significantly lower performance in the AI group (p = 0.040), corroborated by behavioral observations showing higher tool dependency, emotional volatility, and rigid problem-solving strategies. These findings indicate that, in novice users, the opacity and unpredictability of AI feedback may disrupt emotional regulation and autonomic balance, thereby undermining creative engagement. The results highlight the need to consider neurocognitive vulnerability and the learner’s developmental stage when integrating AI tools into cognitively demanding creative workflows.

1. Introduction

The integration of Artificial Intelligence (AI) into creative tasks has grown rapidly, particularly with the rise of large language models (LLMs) such as ChatGPT-4o and ERNIE Bot 4.5 Turbo, which are extensively used for code generation, digital art, and interactive design. AI-powered code generators have successfully lowered the technical threshold for programming and improved efficiency, enabling creators to focus more on ideation. However, the widespread application of such systems has raised concerns about their psychological impact on users, particularly regarding emotional disturbance, cognitive overload, and stress regulation. While prior studies have acknowledged the potential of AI to streamline routine work and enhance task productivity [1], they also caution that the instability and opacity of AI outputs may increase emotional volatility, cognitive conflict, and reduce the quality of final outcomes [2].
Emotional state and cognitive load are key psychological variables influencing creative performance. Positive affect (e.g., enthusiasm and satisfaction) is typically associated with a greater cognitive flexibility and divergent thinking, whereas negative affect (e.g., anxiety and frustration) may suppress creative engagement [3]. A high cognitive load is known to impair emotional regulation and increase sympathetic nervous system (SNS) activity, thus disrupting the autonomic nervous system (ANS) balance [4]. However, in AI-assisted code generation environments, creators often encounter unpredictable or semantically ambiguous outputs that require frequent debugging, potentially heightening stress levels and hindering emotional control.
Heart rate variability (HRV) serves as a well-established physiological marker of ANS functioning and stress regulation [5]. A higher HRV generally reflects stronger parasympathetic activity and better emotional adaptability, while a decreased HRV is indicative of sympathetic dominance and elevated psychological stress [6]. HRV has been widely applied to evaluate cognitive load, emotional arousal, and self-regulatory capacity in task-based research [7], providing reliable insight into psychophysiological responses during cognitively demanding tasks [8]. Accordingly, HRV fluctuations during AI-assisted creation could offer critical physiological evidence of stress and emotional strain.
From the perspective of social cognitive and affective neuroscience, emotional regulation is mediated by the prefrontal cortex–amygdala circuit, which plays a central role in balancing cognition and emotion [9]. The prefrontal cortex (PFC) governs top-down regulation by attenuating excessive amygdala activation, thereby mitigating anxiety and stress responses [10]. However, under conditions of task failure or an elevated cognitive load, PFC efficacy may decline, leading to overactivation of the amygdala and stronger negative affect [11]. In AI-mediated tasks, creators may experience goal obstruction or dissonance when facing logic errors or incomprehensible AI outputs, activating affective stress pathways and reducing HRV.
Moreover, the Cognitive–Affective Conflict Hypothesis suggests that cognitive conflicts encountered during task execution, including goal obstruction or increased informational uncertainty, trigger stress responses through excessive activation of the sympathetic nervous system [7,12,13]. Empirical evidence indicates that an elevated cognitive load corresponds to a lower HRV and increased negative affect, highlighting the convergence of physiological and psychological stress responses [14]. In the context of AI-assisted creation, this interplay may be further exacerbated by poor code quality, overreliance on automated outputs, or a lack of interpretability, which elevate stress and reduce autonomy [12,15,16].
Despite the growing interest in AI-assisted creativity, most existing studies have focused on either productivity outcomes or user perceptions in isolation, and often rely solely on self-report data without physiological validation [13,15,17]. There remains a lack of research that systematically investigates the emotional, physiological, and behavioral consequences of AI use in open-ended creative tasks, especially from a psychophysiological perspective. Such an approach is necessary to gain a more comprehensive understanding of how AI systems shape users’ internal states and behavioral outcomes during creative engagement.
A further limitation of prior work is that it rarely considers the learner’s developmental stage when evaluating the impact of AI-assisted systems. Existing studies often treat users as a homogeneous group, without acknowledging how differences in domain expertise shape the cognitive and emotional consequences of AI interaction [18,19]. Particular concerns arise when AI tools are introduced at the very beginning of domain learning, a stage during which users lack both technical competence and the metacognitive strategies required to evaluate or adapt AI-generated content [20]. Prior research has identified a fundamental divergence between novice and expert users. While domain experts can make effective use of AI outputs through contextual reasoning and top-down control, novice learners are more likely to experience confusion, overreliance, and difficulty in interpreting opaque system feedback [2,20]. In such foundational learning environments, the premature integration of AI may interrupt schema acquisition, overload cognitive resources, and hinder the development of intrinsic creative motivation [21].
To address these concerns, the present study investigates how AI code generators influence learners at the initial stage of programming education. Specifically, we examine a population of undergraduate design students with no prior programming experience who completed a two-day creative coding task. First, the study utilizes the Positive and Negative Affect Schedule (PANAS) to assess emotional state changes in both AI and non-AI groups during the task, evaluating whether AI impacts emotional stability. Second, HRV is employed as a physiological marker to examine whether creators in the AI group demonstrate a lower HRV, thereby assessing potential increases in physiological stress due to AI usage. Finally, expert blind reviews are conducted to compare the creative output quality between the AI and non-AI groups, elucidating the practical impact of AI in creative tasks.
In line with these objectives, we propose the following hypotheses:
H1. 
AI-assisted creation increases psychological stress and impairs autonomic nervous system regulation.
H2. 
AI assistance leads to higher levels of negative affect and lower levels of positive affect during the creative process.
H3. 
Participants who use AI report greater perceived cognitive workload and frustration compared to those in the control group.
H4. 
The creative output quality is lower among AI-assisted participants than among those who complete the task independently.
This study is among the first to integrate HRV physiological data, PANAS emotional assessments, NASA-TLX workload evaluation, and expert blind reviews to comprehensively analyze the influence of AI-assisted creation on emotional regulation, stress, and creativity.

2. Materials and Methods

2.1. Participants

A total of 58 undergraduate students majoring in design at Central China Normal University (mean age = 19.07 years, SD = 0.84) were recruited for this study, comprising 7 males and 51 females. All participants took part voluntarily and provided written informed consent prior to the experiment. The study was approved by the institutional ethics committee and conducted in accordance with the ethical principles outlined in the Declaration of Helsinki (Approval Code: CCNU-IRB-202306002a).
None of the participants had formal programming experience; inclusion required undergraduate enrollment in a design discipline and no formal programming background. Based on their performance on a Processing programming comprehension test, the participants were evenly assigned to the following two groups: Group A (control group, n = 29), which completed the creative task without AI assistance, and Group B (experimental group, n = 29), which was permitted to use Wenxiaoyan, a large language model developed by Baidu, for code generation after receiving the same instructional content. During the final creative stage, two participants in Group B were unable to complete their creative outputs due to execution errors in the AI-generated code and were therefore excluded from the analysis of creative scores; their remaining data (HRV, emotional state, and self-efficacy) were retained for statistical analysis. A priori power analysis was conducted using G*Power 3.1 to estimate the required sample size for detecting medium-to-large effects (d = 0.63) with 80% power and a two-tailed significance level of 0.05. The analysis indicated that approximately 23 participants per group would be required under these parameters. Each group in this study initially included 29 participants, exceeding this reference value.
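For readers without access to G*Power, the sketch below runs a comparable a priori power analysis in Python with statsmodels. Because the original G*Power settings (test family, tails) are not fully reported, the code computes the required n under two plausible families; which one the authors used is an assumption.

```python
# A minimal sketch (not the authors' G*Power session): required sample size
# for d = 0.63, alpha = .05 (two-tailed), power = .80 under two test families.
from statsmodels.stats.power import TTestIndPower, TTestPower

n_ind = TTestIndPower().solve_power(effect_size=0.63, alpha=0.05, power=0.80,
                                    alternative="two-sided")
n_paired = TTestPower().solve_power(effect_size=0.63, alpha=0.05, power=0.80,
                                    alternative="two-sided")
print(f"independent-samples t: ~{n_ind:.0f} per group")  # ~41 per group
print(f"matched-pairs t (dz):  ~{n_paired:.0f} pairs")   # ~22, close to the
# ~23 quoted above if a matched-pairs family was assumed in G*Power
```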

2.2. Experimental Procedures

The two-day experimental procedure is illustrated in Figure 1, which provides an overview of the study design and sequence. On the first day, all participants received approximately 210 min of foundational Processing programming instruction in a standardized classroom environment. The course content covered basic syntax for drawing, variable and structure control, and the application of for-loop constructs. Upon completion of the instruction, a comprehension test was immediately administered, and the participants were evenly assigned to two groups based on their test scores to ensure no systematic differences in initial programming ability.
The participants in the experimental group were additionally introduced to the basic usage guidelines of Wenxiaoyan. The second day was designated for the formal experimental procedure. Before the task began, all participants entered the laboratory and remained seated in a quiet state for five minutes while wearing a Polar H10 chest strap, which was used to initiate continuous heart rate monitoring. Data from this phase served as the baseline for heart rate variability (HRV; T1). Immediately following this resting period, the participants completed the initial administration of the PANAS and the CSES. The participants then engaged in a 180-min creative task under the theme “Healing”, during which they were required to create at least one interactive poster incorporating for-loop structures on the Processing platform. The participants in Group A completed the task using only instructional materials and sample code, whereas those in Group B were permitted to use Wenxiaoyan for code generation. Throughout the session, the researchers continuously observed the participants’ behavior and provided non-strategic technical support. The midpoint of the task (at 90 min) was marked as the second HRV data collection point (T2) for physiological analysis. Upon task completion, a third resting HRV data collection (T3) was conducted, followed by the second administration of the PANAS and CSES, as well as the NASA-TLX. The participants subsequently submitted their source files and image outputs. All creative works were anonymized and independently evaluated by three expert reviewers affiliated with the Artists’ Association. The evaluation criteria comprised creativity (50%), technical execution (30%), and thematic relevance (20%), with the final score calculated as the mean of the three reviewers’ ratings.
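To illustrate the scoring rule, the hypothetical helper below assumes the 50/30/20 weighting is implemented as point allocations (creativity out of 50, technical execution out of 30, thematic relevance out of 20), which is consistent with the dimension means reported in Table 1; the exact rubric form is an assumption, not stated in the text.

```python
# Hypothetical scoring helper; the 50/30/20 split is from the text, while the
# point-allocation form (sub-scores out of 50/30/20) is an assumption.
def reviewer_total(creativity: float, technical: float, thematic: float) -> float:
    """One reviewer's total on a 100-point scale."""
    assert 0 <= creativity <= 50 and 0 <= technical <= 30 and 0 <= thematic <= 20
    return creativity + technical + thematic

def final_score(ratings: list[tuple[float, float, float]]) -> float:
    """Final score = mean of the reviewers' totals (three reviewers here)."""
    return sum(reviewer_total(*r) for r in ratings) / len(ratings)

# Example with made-up ratings from three reviewers:
print(final_score([(28, 25, 17), (26, 24, 18), (30, 26, 16)]))  # -> 70.0
```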

2.3. Measurement

A multimodal measurement framework was employed to test the four hypotheses. Hypothesis 1 (H1), related to autonomic stress, was assessed using heart rate variability (HRV) indices—RMSSD and the LF/HF ratio—collected at three task phases. Hypothesis 2 (H2), concerning affective changes, was evaluated using the Positive and Negative Affect Schedule (PANAS), administered before and after the task. Hypothesis 3 (H3) was tested using the NASA Task Load Index (NASA-TLX), which measures perceived mental demand, effort, and frustration. Hypothesis 4 (H4) was tested through expert blind evaluations of the participants’ creative outputs, based on standardized rubrics of creativity, technical execution, and thematic relevance.
HRV data were continuously recorded using the Polar H10 chest strap device, with a sampling rate of 250 Hz. The participants wore the device before the experimental task began and continued to wear it throughout the creative task, enabling continuous heart rate monitoring. During the data analysis phase, the following three standardized 5-min segments were extracted from the complete dataset: the pre-task resting phase (T1), the mid-task phase (T2), and the post-task resting phase (T3). Data processing and HRV parameter computation were conducted using Kubios HRV Premium software, which applied automated artifact correction and a default band-pass filter (0.04–0.4 Hz) to ensure signal stability and cross-sample comparability. Kubios is a widely adopted analytical tool in psychophysiological research, recognized for its high algorithmic reproducibility and accurate parameter estimation [22]. The analysis indices included time-domain parameters—standard deviation of normal-to-normal intervals (SDNN) and root mean square of successive differences (RMSSD)—as well as the following frequency-domain parameters: low-frequency power (LF), high-frequency power (HF), and the LF/HF ratio. These parameters have been widely recommended as valid physiological markers of changes in psychological and emotional states, and are particularly suitable for HRV assessments in studies on stress responses, task load, and emotion regulation [23].
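To make the HRV pipeline concrete, the sketch below computes the same time- and frequency-domain indices from a series of RR intervals using open-source Python tools. It is a simplified stand-in for Kubios (whose artifact correction and filtering differ), using the conventional LF (0.04–0.15 Hz) and HF (0.15–0.40 Hz) band definitions.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import welch
from scipy.integrate import trapezoid

def time_domain(rr_ms: np.ndarray) -> dict:
    """SDNN and RMSSD (ms) from a 5-min segment of RR intervals in ms."""
    diff = np.diff(rr_ms)
    return {"SDNN": float(rr_ms.std(ddof=1)),
            "RMSSD": float(np.sqrt(np.mean(diff ** 2)))}

def lf_hf(rr_ms: np.ndarray, fs: float = 4.0) -> float:
    """LF/HF ratio from a Welch PSD of the RR tachogram resampled at fs Hz.
    Conventional bands: LF 0.04-0.15 Hz, HF 0.15-0.40 Hz."""
    t = np.cumsum(rr_ms) / 1000.0                  # beat times in seconds
    grid = np.arange(t[0], t[-1], 1.0 / fs)        # evenly spaced time grid
    rr_even = interp1d(t, rr_ms, kind="cubic")(grid)
    f, pxx = welch(rr_even - rr_even.mean(), fs=fs,
                   nperseg=min(256, len(rr_even)))
    lf_band = (f >= 0.04) & (f < 0.15)
    hf_band = (f >= 0.15) & (f <= 0.40)
    return float(trapezoid(pxx[lf_band], f[lf_band]) /
                 trapezoid(pxx[hf_band], f[hf_band]))
```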
Emotional states were assessed using the Chinese version of the PANAS, originally developed by Watson, Clark, and Tellegen [24], which is designed to evaluate individuals’ emotional responses in specific situational contexts. The scale consists of 14 emotion-descriptive adjectives, including 9 items measuring positive affect (e.g., “excited”, “attentive”, and “happy”) and 5 items measuring negative affect (e.g., “nervous”, “irritable”, and “afraid”). The participants rated each item based on their subjective emotional experience using a 5-point Likert scale (1 = very slightly or not at all; 5 = extremely). In the present study, a Chinese version of the PANAS that has been revised and psychometrically validated by Chinese scholars was used [25]. This version has been widely employed in empirical research on stress, emotional interventions, creativity, and educational contexts [16,17,18]. It should be noted that the PANAS measures state-based emotional responses and is not designed to diagnose clinical emotional disorders such as anxiety or depression.
Subjective task load was assessed using the NASA-TLX, developed by Hart and Staveland [26], which comprises the following six dimensions: mental demand, physical demand, temporal demand, performance, effort, and frustration. In the present study, a simplified Chinese version of the scale was adopted. This version has been widely applied in research fields such as design, education, cognitive neuroscience, and human–computer interaction, and has demonstrated a good reliability and cross-contextual adaptability [27,28].
Creative self-efficacy was assessed using the Creative Self-Efficacy Scale (CSES), developed by Tierney and Farmer [29] based on Bandura’s theory of self-efficacy. The scale consists of four items that evaluate participants’ perceived confidence and expectations of their ability to complete creative tasks, covering dimensions such as creative problem solving, independent idea generation, elaboration on others’ ideas, and flexible adaptation to challenges. All items are rated on a 5-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree). The scale has been widely validated across various cultural contexts and task types, and has been found to be highly correlated with variables such as creativity, emotion regulation, and occupational motivation [30,31]. In the present study, a bilingual, culturally adapted Chinese version was used, which demonstrated a good internal consistency reliability (Cronbach’s α = 0.87).
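For reference, the internal consistency coefficient reported here (Cronbach’s α) can be computed from a participants-by-items response matrix as follows; this is a generic sketch, not the software the authors used.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (participants x items) response matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Example with a hypothetical 58 x 4 matrix of CSES responses (1-5 Likert);
# random answers give alpha near zero, real correlated items give higher values.
rng = np.random.default_rng(1)
responses = rng.integers(1, 6, size=(58, 4)).astype(float)
print(round(cronbach_alpha(responses), 2))
```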

2.4. Data Analysis

All statistical analyses were conducted using SPSS version 27.0, with a significance level set at α = 0.05 (two-tailed). Before conducting inferential analyses, the normality of all continuous variables was assessed using the Shapiro–Wilk test. The results indicated that the PANAS and CSES scores were normally distributed (p > 0.05), justifying the use of parametric tests. However, several NASA-TLX dimensions deviated significantly from normality (p < 0.05), and, thus, non-parametric Mann–Whitney U tests were employed for between-group comparisons on these subscales. Descriptive statistics and reliability information (Cronbach’s α, means, standard deviations, and score ranges) for all measurement scales (PANAS, NASA-TLX, and CSES) are provided in Appendix A (Table A1).
HRV data were analyzed using repeated-measures ANOVA (RM-ANOVA) to examine interaction effects between time points (T1, T2, and T3) and group (A vs. B). Pre- and post-task PANAS and CSES scores were analyzed using paired-sample t-tests for within-group changes and independent-sample t-tests for between-group comparisons. As NASA-TLX scores deviated from normal distribution, non-parametric Mann–Whitney U tests were applied to evaluate differences across each dimension. Creative output scores were compared between groups using independent-sample t-tests, with Cohen’s d computed to interpret effect sizes. The two participants from Group B who failed to submit their work were included only in non-rating analyses; listwise deletion was applied for the creative score dataset.
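The analyses were run in SPSS; for illustration, the same families of tests are available in SciPy. The sketch below uses hypothetical arrays (random draws seeded for reproducibility), not the study data.

```python
import numpy as np
from scipy import stats

def cohens_d(x, y):
    """Cohen's d for independent groups, using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled = np.sqrt(((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1))
                     / (nx + ny - 2))
    return (np.mean(x) - np.mean(y)) / pooled

rng = np.random.default_rng(0)                        # hypothetical data
scores_a = rng.normal(76.6, 17.6, 29)                 # control creative scores
scores_b = rng.normal(72.2, 18.5, 27)                 # AI-group creative scores
pre, post = rng.normal(33.1, 5.9, 29), rng.normal(29.4, 6.5, 29)

t_ind, p_ind = stats.ttest_ind(scores_a, scores_b)    # between-group comparison
t_rel, p_rel = stats.ttest_rel(pre, post)             # within-group pre/post
u, p_u = stats.mannwhitneyu(scores_a, scores_b)       # non-normal NASA-TLX scale
print(t_ind, p_ind, cohens_d(scores_a, scores_b))
```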

3. Results

3.1. Expert Ratings and Creative Completion Outcomes

To evaluate the impact of AI tools on the quality and efficiency of creative output, expert ratings, submission time, and completion rate were analyzed. Three independent experts evaluated each participant’s work across the following three dimensions: creativity, technical execution, and thematic relevance, with a total score subsequently calculated. As shown in Table 1, the control group (Group A) scored significantly higher on the creativity dimension (M = 27.4 ± 5.3) than the AI-assisted group (Group B) (M = 23.9 ± 5.9), a statistically significant difference (t(56) = 2.67, p = 0.010, d = 0.72). Regarding the total score, the control group also outperformed the AI group (M = 76.61 ± 17.61 vs. M = 72.19 ± 18.51; t(56) = 2.10, p = 0.040, d = 0.57). No significant group differences were found in technical execution or thematic relevance.
Moreover, during the 180-min creative task, the two groups showed a clear discrepancy in submission time distribution. The submission times in the control group were more concentrated, with most participants submitting within the last five minutes of the task (M ≈ 175 min, SD = 6.2 min). In contrast, the AI group exhibited a wider distribution; some participants submitted as early as 90 min into the task, while others submitted after the full 180 min due to technical difficulties (M ≈ 162 min, SD = 22.7 min). Levene’s test showed that the AI group had a significantly greater variance in submission time compared to the control group (F(1,56) = 7.86, p = 0.007, η2 = 0.127), suggesting that AI usage substantially disrupted task pacing and process control. Notably, two participants in the AI group failed to submit their final work due to executable code errors (submission rate = 93.1%), whereas all participants in the control group successfully submitted their projects (100%).
These findings provide direct support for H4, which predicted that AI assistance would reduce the overall quality of creative output. The statistically significant differences observed in both creativity scores and total expert ratings between the AI-assisted group and the control group indicate that, despite the potential for increased efficiency, AI usage may constrain participants’ originality, thematic coherence, and execution quality in complex creative tasks.

3.2. Emotional State Change Analysis

To evaluate the impact of AI tool intervention on the participants’ emotional states, the Positive and Negative Affect Schedule (PANAS) was administered before and after the task to assess changes in positive affect (PA) and negative affect (NA). To ensure baseline equivalence, independent-sample t-tests were first conducted on the pre-task PANAS scores, with degrees of freedom estimated using the Welch–Satterthwaite equation. The results showed no significant group differences in either PA or NA prior to the task, with PA scores of 3.12 ± 0.90 for the control group versus 2.97 ± 0.49 for the AI group (t(43) = 0.79, p = 0.435) and NA scores of 2.77 ± 0.96 versus 2.75 ± 0.66 (t(50) = 0.09, p = 0.927). These findings confirm that the two groups were comparable in affective state at baseline (see Table 2).
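For completeness, the Welch–Satterthwaite approximation underlying these fractional degrees of freedom is

$$\nu \approx \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1-1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2-1}}$$

where $s_i^2$ and $n_i$ denote the sample variance and size of group $i$. Unequal group variances pull $\nu$ below the pooled value of $n_1 + n_2 - 2 = 56$, which is why the reported df (43 for PA, where the variances differ more, and 50 for NA) fall below 56 and differ between the two subscales.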
Subsequently, paired-sample t-tests revealed that, in the AI group, post-task PA significantly decreased (pre-task M = 33.1 ± 5.9; post-task M = 29.4 ± 6.5), t(29) = −2.58, p = 0.015, with a medium effect size (d = 0.66). In contrast, NA significantly increased in the AI group (pre-task M = 13.1 ± 5.0; post-task M = 16.2 ± 5.4), t(29) = 3.42, p = 0.002, with a large effect size (d = 0.83). No significant differences were observed in the control group for either PA (t(29) = −0.84, p = 0.409) or NA (t(29) = 0.56, p = 0.580). Further analysis using independent-sample t-tests on the difference scores (post–pre) revealed that the AI group showed a significantly greater decrease in PA (∆PA = −3.7 ± 2.6) and a greater increase in NA (∆NA = +3.1 ± 2.2) compared to the control group (∆PA = −0.7 ± 2.1; ∆NA = +0.3 ± 1.9), with statistically significant differences (∆PA: t(58) = −2.14, p = 0.037, d = 0.57; ∆NA: t(58) = 2.86, p = 0.006, d = 0.75). These results indicate that participants in the AI-assisted group experienced more pronounced emotional disturbances, characterized by reduced positive affect and heightened negative experiences (see Figure 2).

3.3. Subjective Task Load Analysis

To assess the participants’ subjective cognitive and emotional workload during the creative task, the NASA Task Load Index (NASA-TLX) was employed. Widely used in human factors and cognitive psychology research, this instrument evaluates the following six dimensions of perceived task demand: mental demand, physical demand, temporal demand, effort, performance, and frustration, providing a comprehensive measure of subjective task difficulty [32,33]. As the NASA-TLX is a post-task instrument designed to capture subjective workload following task completion, no pre-task scores are available. However, random assignment and comparable pre-task affective states across groups reduce the likelihood that the observed differences stemmed from individual predispositions.
The results revealed significant between-group differences in three of the six NASA-TLX dimensions. Scores on the mental demand dimension were significantly higher in the AI group (M = 61.7 ± 11.9) compared to the control group (M = 53.2 ± 12.4), p = 0.030. Similar trends were observed for effort (AI group: M = 57.3 ± 12.6; control group: M = 49.1 ± 13.7; p = 0.022) and frustration (AI group: M = 52.6 ± 14.3; control group: M = 41.5 ± 15.2; p = 0.015), both of which were significantly higher in the AI group. No significant differences were found between the groups for physical demand (p = 0.513), temporal demand (p = 0.482), or self-rated performance (p = 0.634). These findings suggest that AI-assisted conditions primarily increased cognitive regulation and emotional control demands, with minimal influence on the participants’ perceived time pressure or performance evaluation. These results align with Hypothesis 3, which posits that AI-assisted creation is associated with greater perceived workload and emotional effort. The observed increases in mental demand, effort, and frustration in the AI group support this interpretation. Figure 3 presents the distribution of subjective task load scores across the six NASA-TLX dimensions in boxplot form, highlighting particularly notable differences in dimensions with significant group effects.

3.4. Heart Rate Variability Analysis

To investigate whether AI-assisted creation influenced autonomic nervous system regulation, heart rate variability (HRV) indices were collected at the following three time points: the pre-task resting phase (T1), mid-task phase (T2), and post-task resting phase (T3). The following two core physiological indicators were selected: the LF/HF ratio, which reflects the relative activity of sympathetic versus parasympathetic branches, and the root mean square of successive differences (RMSSD), a sensitive marker of parasympathetic tone. These indices were analyzed using repeated-measures analysis of covariance (RM-ANCOVA), with baseline HRV (T1) as a covariate, to evaluate group-by-time interaction effects on autonomic regulation.
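The RM-ANCOVA was run in SPSS; as a rough open-source analogue (an approximation, not the identical procedure), a linear mixed model with the T1 baseline as a covariate can be specified as below. The file and column names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Long-format table, one row per participant x time point (T2, T3), with
# hypothetical columns: subject, group ("A"/"B"), time, rmssd, rmssd_t1.
df = pd.read_csv("hrv_long.csv")  # hypothetical file

# Group x time effects on RMSSD, adjusting for the T1 baseline; a random
# intercept per participant stands in for the repeated-measures structure.
model = smf.mixedlm("rmssd ~ group * time + rmssd_t1",
                    data=df, groups=df["subject"]).fit()
print(model.summary())
```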
As shown in Figure 4, the descriptive results revealed a continuous increase in LF/HF values across the task for the AI group (T1 = 2.53, T2 = 2.91, T3 = 3.23), whereas the control group showed a relatively stable trend (T1 = 2.40, T3 = 2.59), suggesting enhanced sympathetic activation in the AI group during the creative task. RMSSD exhibited a downward trend in both groups, with the AI group decreasing from 34.3 ms at T1 to 25.7 ms at T3 and the control group decreasing from 35.2 ms to 31.6 ms. This indicates a general reduction in parasympathetic tone as the task progressed, with a more pronounced decline in the AI group.
To control for baseline HRV variability, repeated-measures ANCOVA (RM-ANCOVA) was conducted using T1 values as covariates for both LF/HF and RMSSD. The results (Table 3) showed a significant main effect of group on LF/HF (F(1,58) = 4.069, p = 0.050, η2 = 0.099), indicating consistently higher sympathetic activation levels in the AI group throughout the task. Although no significant main or interaction effects were found for LF/HF across time, RMSSD exhibited a significant main effect of time (F(2,114) = 7.803, p = 0.001, η2 = 0.121) and a significant group × time interaction (F(2,116) = 3.285, p = 0.043, η2 = 0.082). This indicates a more pronounced decline in parasympathetic activity in the AI group, possibly reflecting greater autonomic resource mobilization and physiological stress under AI-assisted conditions.
To further explore the relationship between HRV changes and subjective experiences, scatterplots were generated to examine the association between HRV at T2 and emotional and task-load variables, accompanied by Pearson’s correlation analyses. The T2 phase represents the core period of task execution, minimally affected by initial adaptation or terminal fatigue, thus providing an accurate reflection of task-induced autonomic regulation. The results showed a strong negative correlation between RMSSD and negative affect (r = −0.89, p < 0.001), indicating that lower parasympathetic activity was associated with higher perceived negative emotion. Additionally, RMSSD was positively correlated with subjective effort ratings (r = 0.60, p < 0.001), suggesting that individuals with stronger autonomic regulation exhibited greater cognitive engagement during the task. Moreover, LF/HF was positively correlated with frustration scores (r = 0.61, p < 0.001), suggesting that sympathetic activation played a key role in the subjective experience of emotional pressure. These significant correlations are visually presented in the scatterplots (see Figure 5), with regression lines and fit indices clearly illustrated. These findings demonstrate a significant coupling between physiological fluctuations induced by AI-assisted creation and the participants’ subjective cognitive–emotional experiences, offering critical evidence for understanding the psychophysiological mechanisms at play under technological intervention.
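These coefficients correspond to standard Pearson correlations with an ordinary least-squares regression line; a minimal sketch with hypothetical, seeded data (not the study data):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)                      # hypothetical data
rmssd_t2 = rng.normal(28.0, 5.0, 58)                # mid-task RMSSD (ms)
negative_affect = 40 - rmssd_t2 + rng.normal(0, 2, 58)  # inversely related NA

r, p = pearsonr(rmssd_t2, negative_affect)          # correlation coefficient
slope, intercept = np.polyfit(rmssd_t2, negative_affect, 1)  # regression line
print(f"r = {r:.2f}, p = {p:.3g}; fit: NA = {slope:.2f}*RMSSD + {intercept:.1f}")
```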

3.5. Creative Self-Efficacy Analysis

To investigate the impact of AI-assisted creation on individuals’ self-perception of creativity, the participants’ CSES was measured both before and after the experimental task. CSES reflects an individual’s subjective belief in their ability to accomplish creative tasks when facing novel challenges, and is considered a key indicator of affective regulation and cognitive resource integration. It is widely applied in studies exploring motivational regulation in creative behavior.
Paired-sample t-test results showed significant declines in all CSES items for the AI group after the task (see Table 4). Specifically, scores for “Confidence in using creativity to solve problems” decreased from 4.10 ± 0.64 pre-task to 3.67 ± 0.72 post-task (p = 0.018, d = 0.65). For “Good at coming up with new ideas”, the score dropped from 3.90 ± 0.60 to 3.53 ± 0.65 (p = 0.023, d = 0.61). Scores on “Skilled in developing ideas from others” declined from 4.00 ± 0.59 to 3.60 ± 0.62 post-task (p = 0.012, d = 0.66). The item “Good at finding new ways to solve problems” also decreased, from 4.07 ± 0.66 to 3.63 ± 0.68 (p = 0.015, d = 0.63). All effect sizes were in the medium-to-high range, indicating that AI-assisted creation exerted a substantial influence on the participants’ creative self-beliefs. In contrast, the control group exhibited no significant differences across the CSES items before and after the task (p > 0.5), and effect sizes were minimal (d < 0.15), indicating stable creative self-efficacy throughout the experiment. Among these, item 4 (“Good at finding new ways to solve problems”) is particularly noteworthy, as it directly reflects participants’ confidence in approaching problems creatively and independently. The AI group showed a significant decline on this item (p = 0.015, d = 0.63), which may indicate a perceived erosion of creative agency in the presence of AI assistance. By comparison, the control group exhibited a slight, non-significant increase on the same item (Δ = +0.03, p = 0.772, d = 0.06), suggesting possible motivational benefits of self-reliant engagement.

3.6. Behavioral Observation Analysis

To comprehensively understand how AI-assisted creative tasks influence behavioral patterns, this study established eight core behavioral dimensions (see Table 5) based on full-process observational records. These dimensions encompassed tool dependency, feedback interpretation, strategic flexibility, emotional regulation, task pacing, and social interaction. A systematic comparison of behavioral frequencies was conducted between the AI group and the control group, accompanied by case-based analyses of representative participants.
Frequency analysis revealed that the AI group exhibited significantly higher counts in the dimensions of “tool dependency”, “feedback confusion”, “strategic rigidity”, and “emotional reactivity” compared to the control group. For example, the “tool dependency” behavior was recorded 24 times in the AI group versus 5 times in the control group. Similarly, “feedback confusion” (e.g., misinterpreting AI-generated code or mismanaging error output) occurred 20 times in the AI group and only 2 times in the control group. The “strategic rigidity” count was 12 for the AI group versus 2 for the control group. These behavioral discrepancies suggest that current AI systems may exacerbate cognitive conflict and task execution challenges within the creative environment.
A chi-square test indicated statistically significant group differences across seven of the eight behavioral dimensions, excluding submission timing (χ2(6) = 42.83, p < 0.001). These findings suggest that AI intervention altered not only user–tool interaction, but also behavioral strategies and emotional regulation mechanisms.
Specifically, the AI group demonstrated a higher frequency of programming-related inquiries (23 vs. 8). Participant B-01 repeatedly experienced a loop of “AI output misinterpretation → seeking instructor support → regenerating output”, ultimately leading to intense frustration, accompanied by verbal and physical emotional expressions such as sighing and head-holding.
Regarding emotional regulation, the AI group demonstrated 17 instances of emotional reactivity (e.g., muttering, giving up, and expressing resistance), compared to only 5 in the control group. Participant B-06 explicitly stated that “the AI is useless” and exhibited ongoing resistance during the latter phase of the task. Participant B-08 abandoned the task after repeated AI output errors and submitted their work the earliest among all participants, representing a typical case of emotional collapse and task withdrawal.
In terms of task strategy, the AI group frequently exhibited “persistent debugging without strategy change” (n = 12), characterized by repetitive AI output generation without parameter adjustment or alternative planning. Participant B-12 exemplified this behavior, continuously generating identical command strings despite repeated error messages, indicating strategic rigidity and inefficient persistence. In contrast, the control group exhibited more flexible strategies, relying on course materials and reference samples to engage in trial-and-error adjustments.
The control group also outperformed the AI group in “information seeking” (12 vs. 3). For example, A-03 proactively consulted OpenProcessing tutorials when initially unclear about the task, ultimately building a coherent creative logic. Participant A-13 progressively developed a functional code block structure during early task phases, demonstrating strong integrative learning. Meanwhile, the AI group exhibited more instances of “cognitive dissociation” (e.g., distraction or confusion), with 14 occurrences versus 6 in the control group, suggesting greater susceptibility to cognitive overload and goal disengagement under AI conditions. For instance, participant B-15 frequently paused, stared blankly, and rechecked their AI output during the task, but ultimately failed to produce a structured outcome.
In contrast, the control group showed significantly better focus maintenance (19 vs. 5). Participant A-21 maintained stable progress throughout the session with minimal emotional fluctuation. The AI group exhibited both early exits and prolonged delays, resulting in a significantly higher variance in submission times compared to the control group, indicating the disruptive effect of system uncertainty on time management. Although overall peer interaction was low in both groups, it was slightly higher in the control group (4 vs. 2), suggesting that the emphasis on human–machine interaction may have reduced collaborative engagement in the AI group.
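A hedged sketch of the omnibus comparison over these behavioral frequencies follows. The exact test specification is not reported, so a contingency-table chi-square is assumed; the counts are those quoted in this subsection, and the selection of seven dimensions (matching the df of 6 reported above) is one plausible reading.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: AI group (B), control group (A); columns: seven behavioral dimensions
# with counts as quoted in the text (the assignment of exactly seven
# dimensions is an assumption consistent with df = 6).
dimensions = ["tool dependency", "feedback confusion", "strategic rigidity",
              "inquiries", "emotional reactivity", "information seeking",
              "cognitive dissociation"]
counts = np.array([[24, 20, 12, 23, 17,  3, 14],   # AI group (B)
                   [ 5,  2,  2,  8,  5, 12,  6]])  # control group (A)

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3g}")    # dof = (2-1)*(7-1) = 6
```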
In summary, both frequency data and case analysis indicate that AI-assisted creative tasks, in their current technical form, do not improve task strategy or emotional regulation at the behavioral level. Instead, they introduce maladaptive behavioral patterns such as heightened emotional reactivity, increased tool dependency, diminished cognitive flexibility, and impaired task control.

4. Discussion

This study examined the impact of AI code generators on university students’ psychological states, ANS function, and creative output quality in the context of a creative task. Using a multimodal approach integrating emotional assessments (PANAS), creative self-efficacy (CSES), subjective task load (NASA-TLX), heart rate variability (HRV), and behavioral observations, we tested the following four theory-driven hypotheses: (H1) AI-assisted creation increases psychological stress and impairs autonomic nervous system regulation; (H2) AI usage increases negative affect and reduces positive affect; (H3) AI use results in greater perceived workload and emotional effort; and (H4) AI usage lowers the quality of creative output.
First, HRV indicators revealed phase-dependent physiological differences. At the task midpoint (T2), the AI group had a significantly higher LF/HF ratio (p = 0.050, η2 = 0.099), along with reductions in RMSSD and HF, indicating sympathetic dominance and impaired ANS regulation, thus supporting H1. As an index of brain–heart interaction, HRV reflected simultaneous rises in emotional load and physiological stress in the AI environment, adding a physiological dimension to the cognitive neuroscience perspective.

Second, the AI group showed a statistically significant decrease in positive mood scores and a statistically significant increase in negative mood scores on the post-task PANAS measure (∆PA: −3.7 ± 2.6; ∆NA: +3.1 ± 2.2; p < 0.01), validating H2, which states that AI use induces higher levels of negative mood. This emotional disturbance may stem from the AI system’s opacity, frequent failures, and unpredictable outputs, which undermine users’ sense of control and violate their expectations. Behavioral observations further substantiated these findings by revealing frequent indicators of emotional agitation in the AI group, including sighing, self-directed speech, verbalized frustration, and premature task disengagement, suggesting a depletion of emotional regulatory capacity under AI-assisted conditions.

Third, in line with H3, the AI-assisted participants reported significantly higher levels of mental demand, effort, and frustration on the NASA-TLX. These results indicate that, despite their automation potential, AI tools may impose hidden cognitive and emotional costs. The elevated perceived workload suggests increased prefrontal engagement to manage uncertainty, error correction, and interpretability gaps inherent in AI-generated content. Frustration may further reflect a mismatch between user expectations and system feedback, consistent with emotional–cognitive conflict models. Together, these findings highlight the substantial subjective toll of navigating opaque AI systems during complex tasks.

Finally, the AI group’s creative outputs scored significantly lower than those of the control group on several dimensions, such as creative integrity, expressiveness, and technical execution (difference in scoring means > 1.2, p < 0.05), verifying H4: in its current form, the AI system may limit creators’ proactive thinking and innovative expression. Although AI tools enhance surface-level generation speed, their cognitive burden and emotional side effects may hinder creators’ active engagement and deep processing. The dissociation between subjective effort and creative quality is notable: although AI tools may increase perceived task difficulty and emotional load, these costs do not map directly onto expert-rated output, suggesting a complex relationship between user experience and product quality, particularly in novice users with limited domain knowledge.
These findings align with previous theoretical frameworks. According to Inzlicht et al.’s Emotional–Cognitive Conflict Model [34], technological opacity and feedback unpredictability may activate the prefrontal–amygdala circuit, impairing emotional regulation and decision making. The high emotional volatility and cognitive strain observed in the AI group reflect this conflict. Moreover, our HRV findings are consistent with Thayer et al.’s Central Autonomic Network (CAN) model [4], which posits that HRV reflects integrated regulation via the prefrontal cortex, anterior cingulate, and amygdala. A reduced HRV among AI users may, thus, indicate disrupted top-down modulation of stress responses. Recent studies have also explored the affective implications of AI interactions in various task contexts. For example, it has been reported that AI-generated feedback on educational platforms heightened emotional stress and reduced intrinsic motivation among students [35]. Reliance on opaque AI decision systems has been found to be associated with increased anxiety and cognitive uncertainty [36]. Similarly, AI-assisted creativity tools, while helpful in procedural aspects, have been found to reduce confidence and increase emotional disengagement [37]. In addition, the limited explainability of AI outputs has been shown to exacerbate user stress and impair affect regulation [38]. These findings converge with our results and suggest that the emotional dysregulation and elevated physiological arousal observed in the AI group may be driven not only by task complexity, but also by the affective opacity of AI systems, thus reinforcing the neuropsychological burden of AI-supported cognitive environments. However, it is important to note that these neural and affective responses were observed in participants at the earliest stage of domain learning. The extent to which such mechanisms generalize to expert users remains an open question.
Furthermore, the creative performance findings align with the concept of “adaptive tension between AI and creativity” proposed by Frith [39] and Sternberg with Kaufman [40]. These scholars contend that although AI can alleviate low-level cognitive burdens, it may also erode individuals’ creative autonomy and agency, potentially diminishing final output quality in certain contexts. In our study, this mechanism was reflected in both evaluation results and behavioral data, including higher task dependency, information avoidance, and feedback confusion in the AI group. Notably, a significantly higher proportion of early submissions and task withdrawals was observed among AI participants, suggesting a propensity toward cognitive disengagement under technological strain. These findings collectively underscore that behavioral outcomes must be interpreted within a broader psychoneurocognitive framework, and specifically within the context of novice learners undergoing early-stage skill acquisition.
A key consideration in interpreting these findings is the novice status of our participants, who had no formal programming experience and were engaging with AI-assisted creation for the first time. Their cognitive and emotional responses must, therefore, be understood within a foundational learning context, where domain-specific schemas [15] and self-efficacy are still under development [41]. Prior studies have shown that the psychological demands and feedback-related uncertainty of AI systems may be amplified in novice users, potentially leading to overreliance, confusion, and reduced intrinsic motivation [2]. As such, the emotional strain, physiological arousal, and reduced creative performance observed in this study should not be overgeneralized to experienced programmers or professional creators, who possess more stable domain knowledge and can exert greater control over AI output [20]. These results instead reflect a context-specific interaction between novice learning status and opaque AI feedback [42], underscoring the need for differentiated design and deployment strategies based on user expertise.
Despite the study’s systematic evaluation of the effects of AI code generation on emotion, self-efficacy, HRV, physiological stress, and behavioral performance, several limitations remain. First, although HRV is widely used as a proxy for autonomic nervous system activity, its accuracy and interpretability are partially constrained by the absence of complementary physiological parameters such as respiratory rate, electrodermal activity, and blood pressure. Prior research has shown that a reduced respiratory rate can lead to elevated LF power, potentially confounding the interpretation of LF/HF ratios [43]. Hence, the observed sympathetic activation in this study may have been partially influenced by variations in respiratory rhythms. Future research should incorporate respiratory and electrodermal metrics to offer a more comprehensive understanding of HRV dynamics.

Second, the sample was composed exclusively of university students in design disciplines, limiting generalizability, and the AI tool used was a single general-purpose Chinese-language model, without cross-platform comparisons. Individuals from other disciplines or using different AI systems may exhibit diverse emotional and behavioral responses to AI-assisted creation. Expanding sample diversity and adopting multi-platform experimental designs in future studies would improve both external and ecological validity.

Third, although the sample size was determined based on a priori power analysis, it may still be underpowered to detect small between-group effects, particularly in secondary outcome variables. This limitation should be taken into account when interpreting the robustness and generalizability of the findings. Furthermore, the participant population was highly gender-skewed, with 51 females and only 7 males. While the primary aim of this study was not to examine sex-based differences, this imbalance may limit the generalizability of the findings and precludes any meaningful analysis of sex-dependent responses. Future studies should aim for a more balanced sample distribution to assess potential gender effects.

In addition, although participants’ emotional state was assessed using the PANAS at baseline, this measure primarily captures transient affective states rather than stable affective traits. Given that university students are known to exhibit elevated rates of subclinical anxiety and depression symptoms during academic activities [44,45], the absence of screening for these conditions may confound the interpretation of affective and physiological outcomes. Future studies should consider incorporating standardized clinical screening tools (e.g., the GAD-7 and PHQ-9) to distinguish state-based emotional changes from trait-level affective predispositions.

Lastly, behavioral observation relied predominantly on manual annotation by the researchers. While a standardized coding protocol was employed, risks of subjectivity and omission remain. Future studies should integrate objective behavioral tracking technologies, such as screen recording, keystroke/mouse logging, and eye-tracking systems, to construct high-resolution behavioral datasets and enable fine-grained modeling of decision paths and psychophysiological mechanisms during task execution.

5. Conclusions

This study systematically examined the psychological, autonomic, and creative effects of AI code generators in the context of university students engaged in creative tasks. By integrating subjective emotional assessments (PANAS), creative self-efficacy measures (CSES), perceived task load evaluations (NASA-TLX), heart rate variability (HRV) data, and behavioral observations, the study identified consistent patterns indicating increased emotional strain, reduced autonomic flexibility, and diminished creative output under AI-assisted conditions.
Specifically, the participants in the AI-assisted group exhibited significantly lower positive affect and higher negative affect following task completion, indicating impaired emotional regulation. Physiologically, the AI group demonstrated reduced RMSSD and increased LF/HF ratios during the mid-task phase, reflecting heightened sympathetic nervous system activity and reduced parasympathetic modulation. This pattern indicates a decline in autonomic flexibility and impaired stress recovery capacity. Moreover, creative outputs from the AI-assisted group received significantly lower scores in dimensions such as originality, completeness, and aesthetic expression, suggesting that excessive reliance on AI may impair individuals’ capacity for active information processing and creative articulation in complex creative contexts.
Although these results offer meaningful insights into the cognitive and affective implications of AI-assisted creation, they must be interpreted within the context of novice learners. All participants in this study were undergraduate design students with no formal training in programming, and their interaction with AI tools occurred during the initial phase of domain-specific skill acquisition. At this early stage, learners tend to lack established problem-solving schemas and self-efficacy, which may amplify the emotional and cognitive challenges posed by opaque and unpredictable AI feedback. Therefore, the observed effects should not be generalized to experienced users or professional developers. Rather, these findings highlight the importance of aligning AI system design with users’ developmental readiness and cognitive maturity.
Overall, the study contributes to the growing body of empirical research on human–AI interaction, especially within the domain of social cognitive and affective neuroscience. It proposes a multidimensional framework that integrates subjective experiences, behavioral indicators, and autonomic nervous system responses to elucidate the cognitive–affective–physiological coupling mechanisms at play under AI-assisted conditions. Future research should extend this work to more diverse populations and task environments, incorporating multimodal neurophysiological data to further explore the adaptive and maladaptive dynamics of cognitive and emotional regulation in AI-mediated contexts.

Author Contributions

Conceptualization, H.Z.; methodology, H.Z.; software, H.Z.; validation, H.Z.; formal analysis, H.Z. and S.W.; investigation, Z.L.; resources, H.Z. and Z.L.; data curation, H.Z. and Z.L.; writing—original draft preparation, H.Z. and S.W.; writing—review and editing, H.Z. and Z.L.; visualization, S.W.; supervision, H.Z. and Z.L.; project administration, Z.L.; funding acquisition, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Education in China, Project of Humanities and Social Sciences, grant number 22YJC760044, and the Fundamental Research Funds for the Central Universities, grant number CCNU23XJ047.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of CENTRAL CHINA NORMAL UNIVERSITY (protocol code CCNU-IRB-202306002 and date of approval: 13 June 2023).

Informed Consent Statement

All participants received written information about the research project, benefits and risks of participation. They were informed that they could withdraw from the study at any time. Written informed consent was obtained prior to the experiment.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available since they constitute an excerpt of research in progress.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Psychometric Properties of Measurement Scales

Table A1. Cronbach’s α, means, standard deviations, and score ranges for all scales and dimensions (PANAS, NASA-TLX, CSES).

Scale    | Dimension                    | Cronbach’s α | M ± SD        | Min  | Max
PANAS    | Positive Affect (pre)        | 0.89         | 3.12 ± 0.90   | 1.0  | 5.0
PANAS    | Positive Affect (post)       | 0.89         | 2.97 ± 0.49   | 1.0  | 4.5
PANAS    | Negative Affect (pre)        | 0.86         | 2.77 ± 0.96   | 1.0  | 5.0
PANAS    | Negative Affect (post)       | 0.86         | 2.75 ± 0.66   | 1.0  | 4.0
NASA-TLX | Mental Demand                | –            | 61.70 ± 11.90 | 35.0 | 85.0
NASA-TLX | Effort                       | –            | 57.30 ± 12.60 | 35.0 | 85.0
NASA-TLX | Frustration                  | –            | 52.60 ± 14.30 | 20.0 | 75.0
CSES     | Confidence (pre)             | –            | 4.10 ± 0.64   | 2.25 | 5.0
CSES     | Confidence (post)            | –            | 3.67 ± 0.72   | 2.0  | 5.0
CSES     | New Ideas (pre)              | –            | 3.90 ± 0.60   | 2.0  | 5.0
CSES     | New Ideas (post)             | –            | 3.53 ± 0.65   | 2.0  | 5.0
CSES     | Develop Others’ Ideas (pre)  | –            | 4.00 ± 0.59   | 2.0  | 5.0
CSES     | Develop Others’ Ideas (post) | –            | 3.60 ± 0.62   | 2.0  | 5.0
CSES     | Problem Solving (pre)        | –            | 4.07 ± 0.66   | 2.5  | 5.0
CSES     | Problem Solving (post)       | –            | 3.63 ± 0.68   | 2.0  | 5.0
As the NASA-TLX and CSES instruments were administered using independent single-item dimensions, Cronbach’s α values were not calculated for these measures. Internal consistency was only computed for PANAS subscales, which involved multiple items per dimension.

Appendix B. Test Submission of Creative Posters

Group A
Brainsci 15 00565 i001
Group B
Brainsci 15 00565 i002

References

  1. Amabile, T.M. Creativity in Context: Update to the Social Psychology of Creativity; Routledge: New York, NY, USA, 2018; ISBN 978-0-429-50123-4. [Google Scholar]
  2. Kazemitabaar, M.; Chow, J.; Ma, C.K.T.; Ericson, B.J.; Weintrop, D.; Grossman, T. Studying the Effect of AI Code Generators on Supporting Novice Learners in Introductory Programming. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Hamburg, Germany, 19 April 2023; ACM: New York, NY, USA; pp. 1–23. [Google Scholar]
  3. Baas, M.; de Dreu, C.K.W.; Nijstad, B.A. A Meta-Analysis of 25 Years of Mood-Creativity Research: Hedonic Tone, Activation, or Regulatory Focus? Psychol. Bull. 2008, 134, 779–806. [Google Scholar] [CrossRef] [PubMed]
  4. Thayer, J.F.; Lane, R.D. Claude Bernard and the Heart–Brain Connection: Further Elaboration of a Model of Neurovisceral Integration. Neurosci. Biobehav. Rev. 2009, 33, 81–88. [Google Scholar] [CrossRef] [PubMed]
  5. Shaffer, F.; Ginsberg, J.P. An Overview of Heart Rate Variability Metrics and Norms. Front. Public. Health 2017, 5, 258. [Google Scholar] [CrossRef] [PubMed]
  6. Chalmers, J.A.; Quintana, D.S.; Abbott, M.J.-A.; Kemp, A.H. Anxiety Disorders Are Associated with Reduced Heart Rate Variability: A Meta-Analysis. Front. Psychiatry 2014, 5, 80. [Google Scholar] [CrossRef]
  7. Thayer, J.F.; Åhs, F.; Fredrikson, M.; Sollers, J.J.; Wager, T.D. A Meta-Analysis of Heart Rate Variability and Neuroimaging Studies: Implications for Heart Rate Variability as a Marker of Stress and Health. Neurosci. Biobehav. Rev. 2012, 36, 747–756. [Google Scholar] [CrossRef]
  8. Castaldo, R.; Melillo, P.; Bracale, U.; Caserta, M.; Triassi, M.; Pecchia, L. Acute Mental Stress Assessment via Short Term HRV Analysis in Healthy Adults: A Systematic Review with Meta-Analysis. Biomed. Signal Process. Control 2015, 18, 370–377. [Google Scholar] [CrossRef]
  9. Ochsner, K.N.; Silvers, J.A.; Buhle, J.T. Functional Imaging Studies of Emotion Regulation: A Synthetic Review and Evolving Model of the Cognitive Control of Emotion. Ann. N. Y. Acad. Sci. 2012, 1251, E1–E24. [Google Scholar] [CrossRef]
  10. Gross, J.J.; John, O.P. Individual Differences in Two Emotion Regulation Processes: Implications for Affect, Relationships, and Well-Being. J. Pers. Soc. Psychol. 2003, 85, 348–362. [Google Scholar] [CrossRef]
  11. McRae, K.; Misra, S.; Prasad, A.K.; Pereira, S.C.; Gross, J.J. Bottom-up and Top-down Emotion Generation: Implications for Emotion Regulation. Soc. Cogn. Affect. Neurosci. 2012, 7, 253–262. [Google Scholar] [CrossRef]
  12. Pessoa, L. On the Relationship between Emotion and Cognition. Nat. Rev. Neurosci. 2008, 9, 148–158. [Google Scholar] [CrossRef]
  13. Gillie, B.L.; Thayer, J.F. Individual Differences in Resting Heart Rate Variability and Cognitive Control in Posttraumatic Stress Disorder. Front. Psychol. 2014, 5, 758. [Google Scholar] [CrossRef] [PubMed]
  14. Botvinick, M.M.; Cohen, J.D.; Carter, C.S. Conflict Monitoring and Anterior Cingulate Cortex: An Update. Trends Cognit. Sci. 2004, 8, 539–546. [Google Scholar] [CrossRef] [PubMed]
  15. Sweller, J. Cognitive Load during Problem Solving: Effects on Learning. Cognit. Sci. 1988, 12, 257–285. [Google Scholar] [CrossRef]
  16. Ding, X.; Tang, Y.-Y.; Tang, R.; Posner, M.I. Improving Creativity Performance by Short-Term Meditation. Behav. Brain Funct. 2014, 10, 9. [Google Scholar] [CrossRef]
  17. Zhang, Y. Assessing the Wellbeing of Chinese University Students: Validation of a Chinese Version of the College Student Subjective Wellbeing Questionnaire. BMC Psychol. 2021, 9, 69. [Google Scholar] [CrossRef]
  18. He, W. Positive and Negative Affect Facilitate Creativity Motivation: Findings on the Effects of Habitual Mood and Experimentally Induced Emotion. Front. Psychol. 2023, 14, 1014612. [Google Scholar] [CrossRef]
  19. Barredo Arrieta, A.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI. Inf. Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef]
  20. Kalyuga, S.; Ayres, P.; Chandler, P.; Sweller, J. The Expertise Reversal Effect. Educ. Psychol. 2003, 38, 23–31. [Google Scholar] [CrossRef]
  21. Shin, D. The Effects of Explainability and Causability on Perception, Trust, and Acceptance: Implications for Explainable AI. Int. J. Hum.-Comput. Stud. 2021, 146, 102551. [Google Scholar] [CrossRef]
  22. Tarvainen, M.P.; Niskanen, J.-P.; Lipponen, J.A.; Ranta-aho, P.O.; Karjalainen, P.A. Kubios HRV—Heart Rate Variability Analysis Software. Comput. Methods Programs Biomed. 2014, 113, 210–220. [Google Scholar] [CrossRef]
  23. Laborde, S.; Mosley, E.; Thayer, J.F. Heart Rate Variability and Cardiac Vagal Tone in Psychophysiological Research—Recommendations for Experiment Planning, Data Analysis, and Data Reporting. Front. Psychol. 2017, 8, 213. [Google Scholar] [CrossRef] [PubMed]
  24. Watson, D.; Clark, L.A.; Tellegen, A. Development and Validation of Brief Measures of Positive and Negative Affect: The PANAS Scales. J. Pers. Soc. Psychol. 1988, 54, 1063–1070. [Google Scholar] [CrossRef] [PubMed]
  25. Huang, L. Adaptation Study of the Positive and Negative Affect Scale (PANAS) in the Chinese Population. Chin. J. Health Psychol. 2003, 11, 54–56. (In Chinese) [Google Scholar]
  26. Hart, S.G.; Staveland, L.E. Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research. In Advances in Psychology; Elsevier: Amsterdam, The Netherlands, 1988; Volume 52, pp. 139–183. ISBN 978-0-444-70388-0. [Google Scholar]
  27. Xiong, Q.; Luo, F.; Chen, Y.; Duan, Y.; Huang, J.; Liu, H.; Jin, P.; Li, R. Factors Influencing Fatigue, Mental Workload and Burnout among Chinese Health Care Workers during Public Emergencies: An Online Cross-Sectional Study. BMC Nurs. 2024, 23, 428. [Google Scholar] [CrossRef]
  28. Liu, Y.; Han, W.; Chu, D.; Zhang, J. A Study of the Effects of Different Semantic Distance Icons on Drivers’ Cognitive Load in Automotive Human-Machine Interface. In HCI in Mobility, Transport, and Automotive Systems; Lecture Notes in Computer Science; Krömker, H., Ed.; Springer Nature: Cham, Switzerland, 2024; Volume 14733, pp. 169–182. ISBN 978-3-031-60479-9. [Google Scholar]
  29. Tierney, P.; Farmer, S.M. Creative Self-Efficacy: Its Potential Antecedents and Relationship to Creative Performance. Acad. Manag. J. 2002, 45, 1137–1148. [Google Scholar] [CrossRef]
  30. Choi, W.-S.; Kang, S.-W.; Choi, S.B. Innovative Behavior in the Workplace: An Empirical Study of Moderated Mediation Model of Self-Efficacy, Perceived Organizational Support, and Leader–Member Exchange. Behav. Sci. 2021, 11, 182. [Google Scholar] [CrossRef]
  31. Karwowski, M.; Lebuda, I.; Wisniewska, E.; Gralewski, J. Big Five Personality Traits as the Predictors of Creative Self-efficacy and Creative Personal Identity: Does Gender Matter? J. Creat. Behav. 2013, 47, 215–232. [Google Scholar] [CrossRef]
  32. Rubio, S.; Díaz, E.; Martín, J.; Puente, J.M. A Comparison of SWAT, NASA-TLX, and Workload Profile Methods. Appl. Psychol. Int. Rev. 2004, 53, 61–86. [Google Scholar] [CrossRef]
  33. Hart, S.G. NASA-Task Load Index (NASA-TLX); 20 Years Later. Proc. Hum. Factors Ergon. Soc. Annu. Meet. 2006, 50, 904–908. [Google Scholar] [CrossRef]
  34. Inzlicht, M.; Bartholow, B.D.; Hirsh, J.B. Emotional Foundations of Cognitive Control. Trends Cognit. Sci. 2015, 19, 126–132. [Google Scholar] [CrossRef]
  35. Delello, J.A.; Sung, W.; Mokhtari, K.; Hebert, J.; Bronson, A.; De Giuseppe, T. AI in the Classroom: Insights from Educators on Usage, Challenges, and Mental Health. Educ. Sci. 2025, 15, 113. [Google Scholar] [CrossRef]
  36. Yuan, H. Artificial Intelligence in Language Learning: Biometric Feedback and Adaptive Reading for Improved Comprehension and Reduced Anxiety. Humanit. Soc. Sci. Commun. 2025, 12, 556. [Google Scholar] [CrossRef]
  37. Wu, S.; Liu, Y.; Ruan, M.; Chen, S.; Xie, X.-Y. Human-Generative AI Collaboration Enhances Task Performance but Undermines Human’s Intrinsic Motivation. Sci. Rep. 2025, 15, 15105. [Google Scholar] [CrossRef] [PubMed]
  38. Khare, S.K.; Blanes-Vidal, V.; Nadimi, E.S.; Acharya, U.R. Emotion Recognition and Artificial Intelligence: A Systematic Review (2014–2023) and Research Recommendations. Inf. Fusion 2024, 102, 102019. [Google Scholar] [CrossRef]
  39. Frith, C.D. The Social Brain? Philos. Trans. R. Soc. B Biol. Sci. 2007, 362, 671–678. [Google Scholar] [CrossRef]
  40. Sternberg, R.J.; Kaufman, J.C.; Roberts, A.M. The Relation of Creativity to Intelligence and Wisdom. In The Cambridge Handbook of Creativity, 2nd ed.; Kaufman, J.C., Sternberg, R.J., Eds.; Cambridge University Press: Cambridge, UK, 2019; pp. 337–352. [Google Scholar] [CrossRef]
  41. Bandura, A. Self-Efficacy: The Exercise of Control, 12th printing; Freeman: New York, NY, USA, 2012; ISBN 978-0-7167-2626-5. [Google Scholar]
  42. Tsiakas, K.; Murray-Rust, D. Unpacking Human-AI Interactions: From Interaction Primitives to a Design Space. ACM Trans. Interact. Intell. Syst. 2024, 14, 1–51. [Google Scholar] [CrossRef]
  43. Saboul, D.; Pialoux, V.; Hautier, C. The Breathing Effect of the LF/HF Ratio in the Heart Rate Variability Measurements of Athletes. Eur. J. Sport Sci. 2014, 14, S282–S288. [Google Scholar] [CrossRef]
  44. Eisenberg, D.; Gollust, S.E.; Golberstein, E.; Hefner, J.L. Prevalence and Correlates of Depression, Anxiety, and Suicidality among University Students. Am. J. Orthopsychiatry 2007, 77, 534–542. [Google Scholar] [CrossRef]
  45. Conteh, I.; Yan, J.; Dovi, K.S.; Bajinka, O.; Massey, I.Y.; Turay, B. Prevalence and Associated Influential Factors of Mental Health Problems among Chinese College Students during Different Stages of COVID-19 Pandemic: A Systematic Review. Psychiatry Res. Commun. 2022, 2, 100082. [Google Scholar] [CrossRef]
Figure 1. Experimental procedure. Solid arrows represent the temporal flow of tasks; the gray dashed arrow indicates the transition between Day 1 and Day 2. HRV = Heart Rate Variability. PANAS = Positive and Negative Affect Schedule. CSES = Creative Self-Efficacy Scale. NASA-TLX = NASA Task Load Index. Time is measured in minutes (min).
Figure 2. Boxplot of changes in positive (PA) and negative (NA) affect for the control group (A) and the AI-assisted group (B). Values represent post–pre difference scores. The gray dashed line indicates no change from baseline. The horizontal line inside each box indicates the median; boxes represent the interquartile range (IQR), whiskers extend to 1.5 × IQR. Diamonds represent outlier values beyond this range.
Figure 3. Boxplot comparison of subjective task load ratings across six NASA-TLX dimensions between the control group (A) and the AI-assisted group (B). Each box shows the interquartile range (IQR), the line inside the box represents the median, and whiskers extend to 1.5 times the IQR. Diamonds represent outlier values beyond this range.
Figure 4. Group-wise HRV changes across task phases. (a) Line plot of LF/HF ratio across task phases (T1–T3), with separate trends for the control group and the AI-assisted group. (b) Line plot of RMSSD values across the three phases, showing a sharper decline in the AI-assisted group. Error bars denote ±1 SD.
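As context for the RMSSD trends in Figure 4, the sketch below applies the standard time-domain RMSSD formula to a hypothetical RR-interval series; the study itself derived HRV indices with Kubios HRV [22], so this is illustrative only.

```python
# Illustrative only: the standard time-domain RMSSD formula applied to a
# hypothetical RR-interval series. The study derived HRV indices with
# Kubios HRV [22]; this sketch is not the authors' pipeline.
import numpy as np

def rmssd(rr_ms: np.ndarray) -> float:
    """Root mean square of successive RR-interval differences (ms)."""
    diffs = np.diff(rr_ms)                      # beat-to-beat differences
    return float(np.sqrt(np.mean(diffs ** 2)))  # RMSSD in ms

rr = np.array([812.0, 798.0, 805.0, 840.0, 822.0, 810.0, 795.0, 830.0])
print(f"RMSSD = {rmssd(rr):.1f} ms")  # lower values suggest reduced vagal tone
```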
Figure 5. Scatterplot and correlation analysis of HRV physiological indicators (RMSSD vs. LF/HF) with subjective variables (negative mood, effort, and frustration). Regression lines with 95% CI are shown.
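A minimal sketch of the kind of correlation analysis shown in Figure 5, using synthetic data; the variable names and the simulated relationship are assumptions, and the published figure additionally plots regression lines with 95% CIs.

```python
# Synthetic illustration of the Figure 5 analysis: a Pearson correlation
# between an HRV index and a subjective rating. Variable names and the
# simulated relationship are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
rmssd_values = rng.normal(40.0, 8.0, 58)                        # hypothetical RMSSD (ms)
frustration = 80.0 - 0.6 * rmssd_values + rng.normal(0, 6, 58)  # hypothetical NASA-TLX scores

r, p = stats.pearsonr(rmssd_values, frustration)
print(f"r = {r:.2f}, p = {p:.3f}")  # negative association here by construction
```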
Table 1. Group differences in creative scores and submission timing (mean ± SD, significance tests). Group A = control; Group B = AI-assisted.

| Measure | Group A Mean | Group A SD | Group B Mean | Group B SD | t | p |
|---|---|---|---|---|---|---|
| Creativity Score | 27.40 | 5.3 | 23.9 | 5.9 | 2.67 | 0.01 |
| Total Score | 76.54 | 17.61 | 72.19 | 18.51 | 2.10 | 0.04 |
| Submission Time (min) | 175.1 | 6.2 | 162.3 | 22.7 | - | 0.007 |
Table 2. Between-group comparison of pre-task and change scores on PANAS dimensions (mean ± SD). Group A = control; Group B = AI-assisted.

| Affect Dimension | Metric | Group A (M ± SD) | Group B (M ± SD) | t(df) | p-Value | Cohen’s d |
|---|---|---|---|---|---|---|
| Positive Affect (PA) | Pre-task Score | 3.12 ± 0.90 | 2.97 ± 0.49 | 0.79 (43) | 0.435 | - |
| Positive Affect (PA) | Δ Score (Post–Pre) | −0.7 ± 2.1 | −3.7 ± 2.6 | −2.14 (58) | 0.037 | 0.57 |
| Negative Affect (NA) | Pre-task Score | 2.77 ± 0.96 | 2.75 ± 0.66 | 0.09 (50) | 0.927 | - |
| Negative Affect (NA) | Δ Score (Post–Pre) | +0.3 ± 1.9 | +3.1 ± 2.2 | 2.86 (58) | 0.006 | 0.75 |
Table 3. RM-ANCOVA results for LF/HF and RMSSD.

| HRV Index | Effect | F(df) | p-Value | η² |
|---|---|---|---|---|
| LF/HF | Group | F(1,58) = 4.069 | 0.050 * | 0.099 |
| LF/HF | Time | F(2,116) = 1.283 | 0.281 | 0.022 |
| LF/HF | Group × Time | F(2,116) = 2.113 | 0.125 | 0.036 |
| RMSSD | Group | F(1,58) = 2.519 | 0.121 | 0.064 |
| RMSSD | Time | F(2,116) = 7.803 | 0.001 ** | 0.121 |
| RMSSD | Group × Time | F(2,116) = 3.285 | 0.043 * | 0.082 |

* p < 0.05; ** p < 0.01.
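For readers wanting a concrete starting point, a 2 (Group) × 3 (Time) mixed design of the kind summarized in Table 3 could be sketched as follows with the pingouin package; the column names and simulated values are assumptions, and the study’s RM-ANCOVA additionally adjusted for covariates not modeled here.

```python
# Sketch of a 2 (Group) x 3 (Time) mixed design on RMSSD using pingouin's
# mixed_anova. Column names and simulated values are assumptions; the
# study's RM-ANCOVA additionally included covariates not modeled here.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
n = 30  # participants per group
df = pd.DataFrame({
    "subject": np.repeat(np.arange(2 * n), 3),
    "group":   np.repeat(["control", "ai"], n * 3),
    "time":    np.tile(["T1", "T2", "T3"], 2 * n),
    "rmssd":   rng.normal(40.0, 8.0, 2 * n * 3),
})

aov = pg.mixed_anova(data=df, dv="rmssd", within="time",
                     subject="subject", between="group")
print(aov[["Source", "F", "p-unc", "np2"]])  # Group, Time, Interaction rows
```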
Table 4. Item-level comparison of creative self-efficacy (CSES) scores before and after the task in the AI-assisted and control groups.

| Item | Group | Pre-Test | Post-Test | Δ Score | p | d |
|---|---|---|---|---|---|---|
| 1. Confidence in using creativity to solve problems. | AI Group | 4.10 ± 0.64 | 3.67 ± 0.72 | −0.43 | 0.018 * | 0.65 |
| | Control Group | 3.90 ± 0.58 | 3.88 ± 0.60 | −0.02 | 0.785 | 0.04 |
| 2. Good at coming up with new ideas. | AI Group | 3.90 ± 0.60 | 3.53 ± 0.65 | −0.37 | 0.023 * | 0.61 |
| | Control Group | 3.80 ± 0.63 | 3.75 ± 0.59 | −0.05 | 0.644 | 0.08 |
| 3. Skilled in developing ideas from others. | AI Group | 4.00 ± 0.59 | 3.60 ± 0.62 | −0.40 | 0.012 * | 0.66 |
| | Control Group | 3.95 ± 0.55 | 3.88 ± 0.58 | −0.07 | 0.552 | 0.12 |
| 4. Good at finding new ways to solve problems. | AI Group | 4.07 ± 0.66 | 3.63 ± 0.68 | −0.44 | 0.015 * | 0.63 |
| | Control Group | 3.75 ± 0.61 | 3.78 ± 0.57 | +0.03 | 0.772 | 0.06 |

* p < 0.05.
Table 5. Comparative behavioral observations between AI-assisted and control groups.

| No. | Behavior Category | Control Group | AI Group | Remarks |
|---|---|---|---|---|
| 1 | Tool Dependency | 5 | 24 | Significantly higher in AI group |
| 2 | Feedback Confusion | 2 | 20 | AI participants showed more frequent confusion |
| 3 | Strategic Rigidity | 2 | 12 | Persistent debugging without adjustment in AI group |
| 4 | Emotional Reactivity | 5 | 17 | Stronger emotional responses in AI group |
| 5 | Task Withdrawal | 1 | 6 | Early termination more common in AI group |
| 6 | Information Seeking | 12 | 3 | Control group showed higher proactive learning |
| 7 | Sustained Focus | 19 | 5 | Control group showed more stable task pacing |
| 8 | Peer Interaction | 4 | 2 | More collaboration in the control group |