1. Introduction
As a complex and multi-dimensional psychological phenomenon, human emotion plays a core role in individual behavior, cognitive processes, and social interaction [1,2]. An in-depth understanding and automatic recognition of emotional states are of great significance for the development of human–computer interaction systems with emotional intelligence, mental health monitoring, and personalized services [3,4,5]. Current emotion recognition research is mainly divided into discrete emotion classification (e.g., joy, sadness, anger) and dimensional emotion models (e.g., the Valence–Arousal model). Among them, Valence (P) describes the positive or negative degree of emotion, while Arousal (A) reflects the level of physiological activation. These dimensions provide a more refined and continuous characterization of emotion, better capturing subtle changes in human emotion [6,7].
Personality traits, as stable tendencies in individuals' thinking, emotional, and behavioral patterns, significantly influence the generation, experience, and regulation of emotions [8,9,10]. For example, Neuroticism is often associated with greater negative emotional experiences and arousal, while Extraversion may predict more positive emotional states and arousal [11]. The Big Five personality model has been widely accepted as a mainstream framework for describing personality structure due to its cross-cultural and cross-situational stability. This model covers five core dimensions: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism [12,13,14,15,16,17,18]. Exploring how the Big Five personality traits affect individuals' emotional dynamics in the valence–arousal space helps to reveal the complex interaction mechanisms between personality and emotion [19,20,21].
In recent years, advances in deep learning have driven significant progress in affective computing [22]. In particular, emotion recognition based on multimodal physiological signals, such as the electroencephalogram (EEG), electrocardiogram (ECG), and galvanic skin response (GSR), has shown great potential because these signals capture genuine physiological responses [23]. The Bidirectional Gated Recurrent Unit (Bi-GRU) performs well on sequential data and has been widely applied to emotion recognition tasks involving speech, video, and physiological signals [24,25,26]. However, traditional emotion recognition models remain limited in capturing complex temporal dependencies and cross-modal information interaction [18]. To address these challenges, this study made a series of improvements to the base model TFACE, which uses a multi-head self-attention mechanism and Bidirectional Long Short-Term Memory (Bi-LSTM) to process temporal features. Because the multi-head self-attention mechanism in TFACE suffers from computational redundancy and an insufficient ability to focus on key emotional features, we replaced it with the Squeeze-and-Excitation (SE) mechanism, which improves feature expression by adaptively weighting channel features. The SE module recalibrates channel features, enabling the model to concentrate on the feature channels that contribute most to emotion recognition.
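The SE recalibration described above can be sketched in a few lines. The following is a minimal, framework-free illustration in plain NumPy with random weights and an assumed reduction ratio r = 4; it shows the squeeze–excite–rescale pattern, not the study's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def se_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Squeeze-and-Excitation on a (channels, time) feature map.

    Squeeze: global average pooling over time yields one descriptor per channel.
    Excitation: a two-layer bottleneck (ReLU, then sigmoid) produces per-channel
    gates in (0, 1), which rescale the original features.
    """
    z = x.mean(axis=1)                       # squeeze: (C,)
    s = np.maximum(w1 @ z, 0.0)              # bottleneck + ReLU: (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ s)))  # sigmoid gates: (C,)
    return x * gates[:, None]                # recalibrate each channel

C, T, r = 16, 100, 4                         # channels, frames, reduction ratio (assumed)
w1 = rng.standard_normal((C // r, C)) * 0.1  # illustrative random weights
w2 = rng.standard_normal((C, C // r)) * 0.1
features = rng.standard_normal((C, T))
out = se_block(features, w1, w2)
print(out.shape)  # (16, 100)
```

Each output channel is the input channel multiplied by a single learned gate, so informative channels can be amplified and redundant ones suppressed without changing the temporal resolution.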
Meanwhile, to address the parameter redundancy and long training time of the Bi-LSTM unit in TFACE, we replaced it with a Bi-GRU unit. By simplifying the gating structure, Bi-GRU reduces the model's parameter count while retaining the ability to capture bidirectional context, making it better suited to the rapid modeling of video emotion frame sequences [24]. Compared with Bi-LSTM, Bi-GRU typically achieves similar performance with fewer parameters and faster training, improving the model's efficiency and robustness.
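The parameter saving is easy to quantify: per direction, an LSTM layer learns four gate weight sets versus three for a GRU. A back-of-the-envelope sketch (classic formulation with one bias vector per gate; the input and hidden sizes are arbitrary illustrative choices):

```python
def lstm_params(input_size: int, hidden_size: int) -> int:
    # 4 gates (input, forget, cell, output), each with input weights,
    # recurrent weights, and one bias vector
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)

def gru_params(input_size: int, hidden_size: int) -> int:
    # 3 gates (reset, update, candidate) with the same per-gate structure
    return 3 * (hidden_size * (input_size + hidden_size) + hidden_size)

x, h = 512, 256                 # illustrative feature and hidden sizes
lstm = 2 * lstm_params(x, h)    # "2 *" accounts for the bidirectional pass
gru = 2 * gru_params(x, h)
print(lstm, gru, f"{1 - gru / lstm:.0%} fewer parameters")  # GRU uses 25% fewer
```

Under this formulation the saving is exactly 25% regardless of layer size, since the only difference is the 3-versus-4 gate count.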
This study used the improved model to extract emotion labels from emotion-inducing videos. The model was applied to the first 10 min of emotional video data from 30 participants to obtain the corresponding emotion labels. These discrete labels were then converted into continuous valence (P) and arousal (A) values and mapped onto the circumplex model of emotion [6,7,27]. To further explore the influence of the Big Five personality traits on valence and arousal in the circumplex model, this study fitted the data with continuous-time structural equation modeling (CTSEM). CTSEM is a powerful statistical method for analyzing dynamic processes that evolve over time and is well suited to emotional and personality variables with temporal dependence [11]. In addition, age and gender were included in the model as covariates to control for their potential influences [19]. For parameter estimation, this study selected the L-BFGS optimization algorithm. As a representative quasi-Newton method, L-BFGS reduces computational complexity by approximating the inverse Hessian matrix, strikes a good balance between efficiency and memory consumption, and is particularly suitable for the large-scale time-series data (19,262 observations) in the CTSEM analysis, allowing efficient estimation of the drift matrix and the moderating-effect matrix [28,29,30,31]. Compared with the traditional Newton method, L-BFGS balances computational efficiency against memory requirements and has therefore been widely used in machine learning and large-scale optimization problems [32,33,34,35].
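The label-to-dimension conversion described above can be sketched as a lookup from discrete frame labels to circumplex coordinates. The seven class names and their (P, A) coordinates below are placeholder assumptions for illustration only, not the mapping used in this study:

```python
# Illustrative sketch only: the circumplex coordinates below are placeholder
# values for seven common emotion classes, NOT the study's actual mapping.
CIRCUMPLEX = {               # (valence P, arousal A), each in [-1, 1]
    "happiness": ( 0.8,  0.5),
    "sadness":   (-0.7, -0.4),
    "neutral":   ( 0.0,  0.0),
    "anger":     (-0.6,  0.8),
    "surprise":  ( 0.4,  0.7),
    "disgust":   (-0.7,  0.3),
    "fear":      (-0.6,  0.6),
}

def labels_to_pa(frame_labels):
    """Convert per-frame discrete labels into a continuous P/A time series."""
    return [CIRCUMPLEX[label] for label in frame_labels]

series = labels_to_pa(["neutral", "happiness", "happiness", "surprise"])
p_values = [p for p, _ in series]
print(p_values)  # [0.0, 0.8, 0.8, 0.4]
```

Applied frame by frame, this turns a discrete label sequence into the two continuous trajectories that CTSEM then models.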
Previous research has established associations between Big Five personality traits and emotional states, yet most studies emphasize static correlations while lacking a precise characterization of emotional dynamics. Moreover, substantial gaps persist in the efficiency and accuracy of multimodal emotion recognition, as well as in the systematic investigation of personality–emotion regulatory mechanisms. Existing recognition models exhibit limitations in temporal feature extraction and attentional focusing, hindering efficient derivation of dynamic affective parameters.
This study systematically addresses these gaps through the following integrated approach. First, we developed an optimized TFace-Bi-GRU-SE deep learning architecture that improves video-based emotion recognition accuracy and inference efficiency by incorporating Bi-GRU modules to streamline temporal modeling and Squeeze-and-Excitation (SE) blocks to amplify salient channel features. Second, we applied this model to 10 min video recordings from 30 participants, generating continuous time-series trajectories for valence (P) and arousal (A). Third, integrating Big Five personality assessments with these affective trajectories, we employed CTSEM to examine how personality dimensions, age, and gender modulate core dynamic parameters, including autoregressive effects, cross-lagged relationships, and diffusion coefficients. Finally, through complementary analyses comprising group comparisons, correlation tests, robustness checks, and sensitivity evaluations, we elucidated specific mechanisms by which personality traits shape affective response patterns. Together, these steps establish a methodological framework for integrating deep learning-based emotion recognition with continuous-time dynamic modeling and generate preliminary hypotheses about personality–emotion associations that warrant investigation in future, adequately powered studies.
4. Discussion
This study found significant differences in the time decay of pleasure and arousal: the half-life of pleasure was approximately 12.3 s, with about 5% of the effect intensity remaining after 50 s, indicating a more persistent experience, whereas the half-life of arousal was only about 1.2 s, with the effect essentially disappearing after 10 s. Physiological arousal thus subsided roughly ten times faster than subjective pleasure. This finding is consistent with the research of Kuppens et al., suggesting that positive emotional experiences have a stronger self-sustaining capacity [54].
Based on emotion regulation theory, pleasure (P), as one of the core affect dimensions, is typically closely associated with an individual's overall well-being and cognitive evaluation of the environment. Statistical analyses revealed that changes in pleasure may involve complex cognitive processes, such as situational reappraisal and meaning construction. Given the relative slowness of these processes, pleasure exhibits greater persistence [55]. Arousal (A), by contrast, reflects an individual's immediate physiological responses to environmental stimuli (e.g., changes in heart rate and respiration), responses that are typically rapid and transient [56].
From an evolutionary psychology perspective, the ability to quickly identify environmental threats or opportunities and initiate physiological preparations is critical for survival. As a rapid response mechanism, arousal prompts organisms to generate an immediate fight-or-flight response; once the threat is eliminated or the opportunity is seized, physiological arousal levels decline rapidly to conserve energy [57]. Pleasure, however, is linked to long-term resource acquisition and environmental assessment (e.g., feelings of safety, comfort, and social connection). These experiences are stable and consequential for an individual's long-term well-being and adaptability, and thus require more extensive processing and maintenance [58]. The persistence of pleasure may play a pivotal role in emotion regulation and adaptive functioning, helping individuals recover from negative emotions and sustain a positive psychological state [59,60].
In the analysis of autoregressive effects, the autoregressive parameter for pleasure was βP_P = −0.056 (z = −4.05, p < 0.001), and for arousal was βA_A = −0.558 (z = −11.14, p < 0.001). Both parameters were significantly negative, with arousal returning to baseline more rapidly. This result is consistent with the study by Sosnowska et al., which suggests that individual differences exist in emotional dynamics and in the baseline attractor state of emotions, indicating that emotional states return to an intrinsic stable point (homeostasis) over time [61]. Collectively, the rapid initiation and dissipation of arousal, a physiological stress response, may reflect its strong autoregressive characteristics, enabling individuals to adapt to environmental changes efficiently [62].
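In a continuous-time model, a negative autoregressive (drift) coefficient a implies that a perturbation decays as exp(a·t), so the half-life is ln(2)/|a|. A minimal sketch using the estimates above recovers the half-lives reported earlier (≈12.4 s for pleasure, ≈1.2 s for arousal) and the residual effect fractions:

```python
import math

def half_life(drift: float) -> float:
    """Half-life of an exponentially decaying effect exp(drift * t), drift < 0."""
    return math.log(2) / abs(drift)

def remaining_fraction(drift: float, t: float) -> float:
    """Fraction of the initial effect remaining after t seconds."""
    return math.exp(drift * t)

# Autoregressive (drift) estimates reported in this study
drift_pleasure, drift_arousal = -0.056, -0.558

print(f"pleasure half-life: {half_life(drift_pleasure):.1f} s")   # 12.4 s
print(f"arousal  half-life: {half_life(drift_arousal):.2f} s")    # 1.24 s
print(f"pleasure remaining after 50 s: {remaining_fraction(drift_pleasure, 50):.1%}")
print(f"arousal  remaining after 10 s: {remaining_fraction(drift_arousal, 10):.2%}")
```

The 50 s pleasure residual computes to roughly 6%, in line with the "about 5%" figure quoted above, and the arousal effect falls below 1% well before 10 s.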
This study observed non-significant cross-effects between the two dimensions: the cross-effect of pleasure on arousal was estimated at 0.007 (z = 0.10, p = 0.920), and that of arousal on pleasure at −0.026 (z = −1.49, p = 0.136). These findings support the relative independence of pleasure and arousal in their dynamic changes. Tidikis et al. [63] noted that emotional valence and arousal exert distinct effects on task comprehension, suggesting they may influence cognitive processes via separate pathways. This independence holds important implications for emotion theory: Russell's circumplex model of affect conceptualizes emotions as a continuous space defined by two orthogonal dimensions, pleasure and arousal [64], and the current results align closely with this framework, confirming that pleasure and arousal are relatively independent core emotional dimensions rather than simply linearly correlated. This independence may reflect distinct neurophysiological mechanisms underlying emotional evaluation (pleasure) and physiological activation (arousal) [65,66]. For instance, high pleasure can co-occur with high arousal (e.g., excitement) or low arousal (e.g., calmness), and the independent variation in these dimensions suggests mediation by distinct neurophysiological pathways and cognitive mechanisms [67].
Regarding the diffusion parameters, the pleasure diffusion parameter was significant (estimate = 0.286, z = 18.11, p < 0.001), as was the arousal diffusion parameter (estimate = 0.339, z = 13.13, p < 0.001); the diffusion covariance parameter was also significant (estimate = 0.218, z = 2.75, p = 0.006). Diffusion parameters reflect the instantaneous variability of emotional states, i.e., the degree of random short-term fluctuation in emotions. Previous research has demonstrated bidirectional links between the dynamic interplay of positive/negative affect and physical activity [68], suggesting that multiple factors may influence the instantaneous variability of emotions. These findings indicate that pleasure (P) and arousal (A), as core emotional dimensions, exhibit distinct temporal dynamic patterns: pleasure is persistent, potentially contributing to the maintenance of a positive mindset and adaptive behaviors; arousal dissipates rapidly, which may facilitate stress response management. The dynamic independence and significant positive covariation of these two dimensions reflect the complexity and multi-layered regulation of the emotional system. Exploratory analyses further suggested that distinct personality dimensions may be differentially associated with emotional dynamic parameters. However, these patterns require replication in adequately powered studies before conclusions about personality's role can be drawn.
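The drift and diffusion structure described above corresponds to a bivariate Ornstein–Uhlenbeck process. The Euler–Maruyama sketch below simulates such a process; the drift diagonal uses the reported autoregressive estimates, the cross-effects are set to zero (consistent with their non-significance), and the diffusion entries are illustrative magnitudes only, not the study's exact diffusion matrix:

```python
import math
import random

def simulate(drift, diffusion, x0, dt=0.1, steps=500, rng=None):
    """Euler–Maruyama simulation of dX = drift @ X dt + diffusion dW (2-D)."""
    rng = rng or random.Random(42)
    x = list(x0)
    path = [tuple(x)]
    sq = math.sqrt(dt)
    for _ in range(steps):
        dw = [rng.gauss(0, sq), rng.gauss(0, sq)]   # Wiener increments
        x = [
            x[i] + dt * sum(drift[i][j] * x[j] for j in range(2))
                 + sum(diffusion[i][j] * dw[j] for j in range(2))
            for i in range(2)
        ]
        path.append(tuple(x))
    return path

# Drift diagonal from the reported autoregressive estimates; cross-effects
# were non-significant and are set to zero here.
DRIFT = [[-0.056, 0.0],
         [0.0, -0.558]]
# Illustrative diffusion magnitudes (NOT the study's exact diffusion matrix);
# the off-diagonal term mimics the positive diffusion covariance.
DIFFUSION = [[0.286, 0.0],
             [0.218, 0.339]]

path = simulate(DRIFT, DIFFUSION, x0=(1.0, 1.0))
```

With the noise switched off, the simulation reduces to pure exponential decay at the two reported rates, which makes the persistence contrast between the dimensions directly visible in the trajectories.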
The observed tenfold difference in decay rates between valence (half-life: 12.3 s) and arousal (half-life: 1.2 s) has direct implications for the design of affect-sensitive systems. Most existing video-based emotion recognition pipelines treat valence and arousal as synchronously evolving outputs and apply uniform temporal smoothing across both dimensions [7]. The present findings suggest that this assumption is empirically untenable: a smoothing window appropriate for valence, which persists across tens of seconds, would systematically over-smooth arousal dynamics that resolve within seconds, effectively erasing physiologically meaningful reactivity signals. This pattern is consistent with the hypothesis that dimension-specific temporal filtering may be warranted in affect-sensitive pipeline design. However, as it derives from a single exploratory sample (N = 30), replication in independent samples is necessary before design recommendations can be drawn.
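One way to operationalize dimension-specific filtering is an exponential moving average whose time constant differs per dimension. In the sketch below, the time constants are illustrative choices tied to the reported half-lives, not validated recommendations, and the 25 fps frame rate is an assumption:

```python
import math

def ema(series, dt, tau):
    """Exponential moving average with time constant tau (seconds)."""
    alpha = 1.0 - math.exp(-dt / tau)   # per-frame update weight
    out, state = [], series[0]
    for value in series:
        state += alpha * (value - state)
        out.append(state)
    return out

dt = 1 / 25                             # 25 fps frame rate (assumed)
TAU_VALENCE, TAU_AROUSAL = 12.3, 1.2    # time constants tied to the reported half-lives

raw_p = [0.0] * 25 + [1.0] * 250        # a 1 s baseline, then a 10 s step in valence
raw_a = [0.0] * 25 + [1.0] * 250        # the same step in arousal
smooth_p = ema(raw_p, dt, TAU_VALENCE)
smooth_a = ema(raw_a, dt, TAU_AROUSAL)
# Arousal tracks the step within a couple of seconds; valence lags far behind.
print(round(smooth_a[-1], 2), round(smooth_p[-1], 2))
```

Ten seconds after the step, the arousal filter has essentially converged while the valence filter has covered only about half the distance, illustrating why a single shared smoothing window cannot serve both dimensions.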
A secondary methodological contribution concerns the boundary conditions under which CTSEM can reliably recover dynamic parameters from AI-derived emotion labels. The present pipeline achieved a weighted accuracy of 64.96% (seed = 42; five-seed mean: 63.50 ± 0.98%), a level typical of current seven-class, video-based recognition systems, yet autoregressive parameters exhibited complete directional consistency across 50 bootstrap iterations and less than 9% deviation under a soft-label sensitivity analysis. This robustness is not incidental: the dense longitudinal design (mean T = 642 observations per person) provides redundancy that effectively averages out frame-level classification noise [69]. The practical implication is significant: researchers need not await near-perfect classifiers before applying CTSEM to AI-derived emotion data, provided that temporal sampling density is sufficient. A preliminary heuristic suggested by the present results is that T ≥ 600 observations per person may provide adequate noise-averaging capacity at classification accuracies in the 60–70% range. Although this threshold requires systematic empirical validation in future work, it constitutes a concrete and falsifiable starting point for field standardization.
Exploratory between-person moderation analyses revealed preliminary patterns, organized here by convergent evidence strength. Bootstrap resampling of the complete model (Supplementary Table S6) revealed heterogeneous stability across effects. Ten predictor–DRIFT associations achieved directional consistency ≥ 70%: Agreeableness–P_A (88%), Openness–A_A (84%), Conscientiousness–P_A (80%), Agreeableness–A_P (78%), Agreeableness–A_A (76%), Openness–A_P (74%), age–A_P (72%), Extraversion–A_A (72%), age–P_A (70%), and Extraversion–P_P (70%). These constitute the most replicable preliminary evidence. Effects with directional consistency below 70% are noted where relevant but should be treated as directionally unstable at this sample size.
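Directional consistency, as used here, is simply the share of bootstrap replicates whose estimate has the same sign as the full-sample point estimate. A minimal sketch (the replicate values are toy numbers for illustration, not the study's bootstrap output):

```python
def directional_consistency(bootstrap_estimates, point_estimate):
    """Share of bootstrap replicates matching the sign of the point estimate."""
    if point_estimate == 0:
        raise ValueError("point estimate has no sign")
    target = point_estimate > 0
    matches = sum((b > 0) == target for b in bootstrap_estimates)
    return matches / len(bootstrap_estimates)

# Toy replicates for illustration only
replicates = [0.12, 0.08, -0.03, 0.15, 0.09, 0.11, -0.01, 0.07, 0.10, 0.05]
print(directional_consistency(replicates, 0.10))  # 0.8
```

An 88% figure, for example, means 44 of 50 bootstrap replicates reproduced the sign of the full-sample estimate.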
Agreeableness and Openness showed the strongest convergent evidence across the CTSEM, bootstrap, and Bayesian analyses. Agreeableness selectively and positively modulated A_A (z = 7.62, p < 0.001) and P_A (z = 4.91, p < 0.001), with bootstrap directional consistency of 76% and 88%, respectively, and Bayesian analysis yielded moderate evidence for the Agreeableness–P_mean association (BF10 = 2.39, r = 0.40, p = 0.028). This pattern may reflect Agreeableness's role in transforming pleasantness into behavioral activation through positive social interactions [70,71]; individuals high in Agreeableness are more likely to receive social support and positive feedback in interpersonal relationships, thereby maintaining an elevated baseline of pleasantness [72]. Openness positively modulated A_A (z = 11.64, p < 0.001), with bootstrap directional consistency of 84%, and showed preliminary evidence of negative modulation of A_P (z = −4.65, p < 0.001; 74% directional consistency). Bayesian analysis provided the strongest evidence across all predictors for the Openness–P_mean association (BF10 = 18.31, r = 0.54, p = 0.002). Openness is typically associated with curiosity, imagination, and receptiveness to novel experiences, traits that may facilitate the generation and maintenance of positive emotions [73]. Individuals high in Openness demonstrate greater flexibility in adjusting their cognitive appraisals when confronted with emotional stimuli, for instance, reinterpreting adverse events as learning opportunities [74]; consequently, their emotional fluctuations may decay more rapidly [75].
Extraversion positively modulated P_P (70% directional consistency) and A_A (72% directional consistency), suggesting that individuals higher in Extraversion show greater emotional stability. Extraversion also significantly negatively modulated the bidirectional cross-effects between pleasure and arousal (both p < 0.001), indicating that highly extraverted individuals are better at decoupling these two dimensions, maintaining positive pleasure even during high-arousal states and preventing the spread of negative emotions [76,77]. This decoupling ability may stem from more robust social support systems and superior coping strategies [78]. Bootstrap directional consistency was 72% for Extraversion–A_A and 70% for Extraversion–P_P, providing modest corroborating support. Conscientiousness negatively modulated A_A (z = −6.87, p < 0.001) and P_A (z = −6.91, p < 0.001), with the latter achieving bootstrap directional consistency of 80%. Highly conscientious individuals may accelerate the dissipation of arousal through self-discipline and goal-directed behavior; Conscientiousness indirectly influences emotional dynamics by promoting goal setting and achievement [79,80]. Neuroticism demonstrated the opposite pattern to Extraversion: it positively modulated A_P (z = 6.03, p < 0.001) and P_A (z = 4.55, p < 0.001) while negatively modulating A_A (z = −4.71, p < 0.001), suggesting that highly neurotic individuals experience stronger bidirectional coupling between pleasure and arousal alongside faster arousal dissipation. Bayesian analysis independently supported the Neuroticism–A_mean association (BF10 = 2.75, r = 0.41, p = 0.023); however, bootstrap directional consistency for Neuroticism ranged from only 54% to 64% across DRIFT parameters, so these dynamic-parameter effects should be treated as tentative. Neuroticism is frequently associated with emotional instability, heightened sensitivity to harmful stimuli, and elevated physiological arousal levels [73,81].
Age-related effects suggested a more rapid emotional return to baseline with increasing age (negative modulation of P_P and A_A, both p < 0.001) alongside enhanced bidimensional coupling (positive modulation of A_P and P_A, both p < 0.001), aligning with socioemotional selectivity theory [82] and lifespan developmental perspectives [73,81]. A gender-moderated effect was observed in the Neuroticism–Arousal relationship, with stronger associations among females (r = 0.746, p = 0.008) than males (r = 0.129, p = 0.597) [83]. Bootstrap directional consistency analysis identified ten associations with ≥ 70% stability; these constitute the most replicable preliminary findings and warrant prioritized replication in adequately powered studies (N ≥ 61). Detailed results are presented in the Supplementary Materials. The present moderation analyses were conducted on N = 30 participants, which constrains between-person detection power to large effects (f2 ≥ 0.35, partial |r| ≥ 0.51). Bootstrap directional consistency and Bayes factor analysis were applied to stratify findings by evidence strength, retaining only effects with ≥ 70% directional stability for substantive interpretation; ten personality–DRIFT associations met this threshold. Expanding the CTSEM sample to N ≥ 61 would provide 80% power to detect medium-sized moderation effects (f2 ≥ 0.15), enabling more precise estimation of individual-difference effects on emotional dynamics. Future studies should recruit larger independent samples with complete personality assessments to confirm the preliminary patterns identified here and to characterize the full range of personality–DRIFT associations with adequate statistical resolution.
This study implemented rigorous quality-control measures in its sample design. Initially, 198 participants were recruited. After screening with the SCL-90 scale and video quality-control procedures, 174 valid samples were retained, yielding a retention rate of 87.88%, which meets the recommendations of Osborne and Overbay (2004) [84].
A total of 144 participants were allocated to deep learning model training and testing (70%/30% split), and 30 independent samples were used for the continuous-time structural equation modeling (CTSEM) analysis. This design ensures the independence of the CTSEM analysis, potentially mitigating the risks of data leakage and overfitting. Regarding the adequacy of the N = 30 sample size, there is methodological support. Hecht and Zitzmann (2020) demonstrated that increasing the number of time points (T) can partially compensate for a reduced number of individuals (N) [85]. This study provided 19,262 time-series observation points in total (a mean of 642 per individual), potentially furnishing sufficient data for CTSEM parameter estimation. Rodebaugh et al. (2022) demonstrated that dynamic patterns can be reliably detected even with few individuals, provided there are abundant within-individual time points [86]. Andriamiarana et al. (2023) showed, using Bayesian methods, that small samples can yield relatively stable estimates when appropriate estimation procedures are used [87]. The CTSEM multilevel modeling framework accounts for individual-level uncertainty, and the reported coefficient of variation (CV) quantifies inter-individual heterogeneity. Bayes factor analysis based on individual-level data (n = 30) provided strong evidence for the core finding (BF10 = 18.3). The deep learning model was pretrained on the RAVDESS dataset. The internal dataset (174 cases) was divided into training (70%) and testing (30%) sets at the subject level to ensure unbiased model evaluation. True_Ratio is defined as the percentage of frames in which the model's prediction matches the majority vote of three independent experts across 20% of the frames.
The dataset splitting follows the official fixed split scheme of DFEW rather than five-fold cross-validation, primarily to enable fair comparison with DFEW baseline studies and because of computational resource constraints. To acknowledge the stochasticity of the training process, the experiment was repeated with five random seeds (42, 68, 189, 618, and 719), and results are reported as mean ± standard deviation (WA: 63.50% ± 0.98%) to strengthen the robustness of the model results.
This study employed the TFace-Bi-GRU-SE deep learning model to generate continuous emotion time series, achieving a weighted accuracy (WA) of 63.50%. Approximately 36.5% of frame labels were misclassified, which may constitute the primary source of measurement error, with misclassifications predominantly occurring between similar emotion categories [88,89]. The error may exhibit both random-noise characteristics and systematic bias. Frame misclassification may cause deviations in single-frame pleasure (P) and arousal (A) values; when integrated across a continuous sequence, the error may propagate along the temporal dimension, leading to a regression dilution effect [88,90]. The impact of measurement error on CTSEM parameter estimation may vary: autoregressive parameters (P_P, A_A), based on individual temporal patterns, may be relatively robust to single-point measurement errors and better capture the temporal continuity of emotional states; cross-effect parameters (P_A, A_P), involving interactions between different emotional dimensions, may be more susceptible to interference, which may partially account for the non-significant cross-effects observed in this study [91]. Structural equation modeling (SEM) theory indicates that measurement error can attenuate the true strength of associations between latent variables, a phenomenon known as attenuation bias or regression dilution [92], potentially leading to underestimation of some path coefficients in CTSEM. This study used fuzzy-frame cleaning to reduce noise in the raw data, and the CTSEM model, by drawing on data across multiple time points, may smooth single-point errors. Robustness tests and measurement-error assessments collectively support the reliability of the core findings. Bootstrap resampling (50 iterations; Table 9) demonstrated that the directional consistency of the P_P and A_A autoregressive parameters reached 100%, with 95% confidence intervals entirely excluding zero. Subgroup validation (Supplementary Table S7) further confirmed that the signs of all core dynamic parameters remained stable across subgroups. Two complementary sensitivity analyses assessed robustness against measurement error: confusion-matrix-based structured noise injection (Table 10) showed that autoregressive parameters maintained negative signs under all perturbation conditions (directional consistency = 100%), while soft-label uncertainty propagation (Table 11) yielded estimates deviating less than 9% from baseline values, with negligible Monte Carlo variance (βP_P: ±0.005; βA_A: ±0.007; Figure 7). Collectively, these findings demonstrate that under a dense longitudinal design (mean = 642 observations per person), CTSEM can reliably recover core dynamic parameters from AI-derived emotion data of moderate classification accuracy. The cross-effects exhibited lower directional consistency (60–70%) in bootstrap and subgroup validation, warranting cautious interpretation; this may reflect the inherently weak nature of the cross-effects or the substantial impact of moderate classification accuracy on the estimation of weak effects. Future research can enhance measurement accuracy through multimodal fusion or attention mechanisms [93,94,95,96]. This study provides a methodological reference for AI-assisted emotion measurement, emphasizing the need for multidimensional validation (bootstrap resampling, subgroup analysis, and sensitivity testing).
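The regression dilution mechanism invoked above is easy to demonstrate: adding independent measurement noise to an autocorrelated series biases the estimated lag-1 coefficient toward zero by the factor var(signal)/(var(signal) + var(noise)). A Monte Carlo sketch with assumed parameters (an AR(1) latent signal with φ = 0.8 and unit-variance label noise, neither taken from the study):

```python
import random

def ar1_series(phi, n, rng, sd=1.0):
    """Simulate an AR(1) process x[t] = phi * x[t-1] + noise."""
    x = [rng.gauss(0, sd)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0, sd))
    return x

def lag1_slope(x):
    """OLS slope of x[t] on x[t-1] (attenuated by any measurement noise)."""
    prev, curr = x[:-1], x[1:]
    mp = sum(prev) / len(prev)
    mc = sum(curr) / len(curr)
    var = sum((p - mp) ** 2 for p in prev)
    cov = sum((p - mp) * (c - mc) for p, c in zip(prev, curr))
    return cov / var

rng = random.Random(7)
true = ar1_series(phi=0.8, n=5000, rng=rng)      # latent emotion signal
noisy = [v + rng.gauss(0, 1.0) for v in true]    # + frame-level label noise (assumed SD)
print(round(lag1_slope(true), 2), round(lag1_slope(noisy), 2))
```

The noisy series recovers a clearly smaller lag-1 coefficient than the clean series, which is the attenuation pattern the text attributes to frame-level misclassification.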
The dynamic interaction between emotional valence and arousal was systematically analyzed using a continuous-time structural equation model, and the specific regulatory effects of the Big Five personality dimensions, age, and gender on these dynamic parameters were quantified. CTSEM, as a statistical method for analyzing temporally evolving processes, is well suited to time-dependent emotional and personality variables and can thereby reveal the complex interaction mechanisms between personality and emotions [11,43,97,98]. Integrating the continuous emotional dimension data extracted by deep learning models with Big Five personality scale data provides a multilevel, cross-domain analytical framework that deepens our understanding of how personality shapes emotional response patterns [99,100,101]. Methods that integrate multiple data modalities, such as physiological signals and behavioral data, are receiving increasing attention in affective computing [95,102].
The findings of this study are exploratory and insufficient to inform the design of personalized intervention strategies [103]. The primary contribution of this work is methodological: it demonstrates a pipeline for extracting continuous emotional dynamics from ecologically valid video data and subjects those dynamics to rigorous sensitivity testing. Whether the personality-moderation patterns identified here carry clinical relevance remains an open empirical question, contingent on replication with adequate sample sizes (N ≥ 61) and more refined measurement models. At the methodological level, this study offers a systematic evaluation of deep learning models as measurement tools and examines error-mitigation strategies, thereby laying a foundation for the application of AI-assisted emotion measurement in psychology and medicine.
To address concerns regarding sample-size generalizability, we validated the robustness of the findings through two supplementary analyses. Bootstrap resampling (n = 50), focused on the core emotional-dynamics DRIFT parameters (the P_P and A_A autoregressive coefficients), revealed stable estimates (coefficient of variation < 0.5, narrow 95% confidence intervals, 100% directional consistency), confirming that dense longitudinal data can provide reliable within-subject dynamic estimates even with small samples. Subgroup validation demonstrated that these parameters exhibited 100% directional consistency across subgroups defined by emotional expression intensity, video quality, and personality extremity, supporting the cross-group consistency of the core emotional dynamic patterns.
This study has several limitations. First, sample size constitutes the primary limitation. Although dense longitudinal data supported the estimation of core dynamic parameters, the CTSEM analysis included only 30 participants, limiting the power to detect between-person moderation effects to large effects (f2 ≥ 0.35). To address this constraint directly, moderation effects were reported according to convergent evidence strength: effects supported by both bootstrap directional consistency ≥ 70% and Bayesian corroboration were retained for interpretation, whereas directionally unstable effects (<70% consistency) were noted but excluded from substantive conclusions. This evidence-based filtering ensures that the reported findings are matched to the sample's actual detection capacity. Second, the sample design contains a flaw: the 144 participants in the model training/testing group did not undergo Big Five personality assessment, precluding verification of trait comparability between the two groups and limiting evaluation of the model's generalizability across personality groups. Third, the deep learning model's approximately 36.5% frame misclassification rate may lead to systematic underestimation of fine-grained dynamic parameters. Fourth, all participants were recruited from a Chinese population, resulting in high sample homogeneity and limiting the generalizability of the results across cultures and populations.
Future research should prioritize addressing the sample size limitation. We recommend recruiting at least 100 participants who complete both video recording and personality assessment, employing stratified sampling to ensure a representative distribution of personality traits, and matching personality traits between the training and validation groups to verify the exploratory findings of this study and enhance the generalizability of the results. Additionally, including participants from diverse cultural backgrounds would strengthen the cross-cultural applicability of the conclusions. Furthermore, multimodal emotion recognition techniques (such as integrating speech and physiological signals) can improve measurement accuracy. Future studies should first prioritize replication of the exploratory personality–DRIFT associations identified here in adequately powered independent samples (N ≥ 61). Only upon successful replication would it be appropriate to consider downstream applications such as personalized emotion regulation programs.