SED-GPT: A Non-Invasive Method for Long-Sequence Fine-Grained Semantics and Emotions Decoding

Cui, Wenhao; Wang, Zhaoxin; Ma, Lei

doi:10.3390/app152011100

Open AccessArticle

SED-GPT: A Non-Invasive Method for Long-Sequence Fine-Grained Semantics and Emotions Decoding

by

Wenhao Cui

^1,2

,

Zhaoxin Wang

¹ and

Lei Ma

^1,*

¹

School of Information Science and Technology, Nantong University, Nantong 226019, China

²

Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London WC2R 2LS, UK

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(20), 11100; https://doi.org/10.3390/app152011100

Submission received: 19 August 2025 / Revised: 9 October 2025 / Accepted: 14 October 2025 / Published: 16 October 2025

Download

Browse Figures

Versions Notes

Abstract

Featured Application

This study introduces the Semantic and Emotion Decoding Generative Pre-trained Transformer (SED-GPT), a non-invasive framework for decoding fine-grained semantics and emotions from long-sequence fMRI. The approach provides potential applications in precise detection of affective disorders, personalized neuromodulation, and human–computer interaction systems that adapt to users’ emotional and semantic states.

Abstract

Traditional emotion decoding methods typically rely on short sequences with limited context and coarse-grained emotion categories. To address these limitations, we proposed the Semantic and Emotion Decoding Generative Pre-trained Transformer (SED-GPT), a non-invasive method for long-sequence fine-grained semantics and emotions decoding on extended narrative stimuli. Using a publicly available fMRI dataset from 8 participants, this exploratory study investigates the feasibility of reconstructing complex semantic and emotional states from brain activity. SED-GPT achieves a BERTScore-F1 of 0.650 on semantic decoding and attains a cosine similarity (CS) of 0.504 and a Jensen–Shannon similarity (JSS) of 0.469 for emotion decoding (p < 0.05). Functional connectivity analyses reveal persistent coupling between the language network and the emotion network, which provides neural evidence for the language–emotion interaction mechanism in Chinese. These findings should be interpreted as pilot-level feasibility evidence.

Keywords:

semantic decoding; emotion decoding; functional magnetic resonance imaging (fMRI); large language models (LLMs); Semantic to Brain Response Conversion Module

1. Introduction

Human emotional experience is complex and involves inner speech activities and self-dialogue during semantic perception [1]. These inner speech processes encode subjective emotional states and elicit neural activation in regions such as the anterior cingulate cortex, insula, and ventromedial prefrontal cortex. These patterns partially overlap with those observed during emotional expression [2,3,4]. Therefore, emotion constitutes not merely a physiological or behavioral response but also involves cognitive–semantic processing [5]. Emotional states activate their associated semantic networks, while semantic representations contribute to the refined construction of emotional experience [6,7,8,9]. Long-sequence semantic decoding can effectively extract context-dependent information during emotional construction, thereby providing the foundation for fine-grained emotional representation [10,11].

Common neuroimaging modalities for emotion decoding primarily include electroencephalography (EEG), magnetoencephalography (MEG), functional near-infrared spectroscopy (fNIRS), and functional magnetic resonance imaging (fMRI) [12]. EEG and MEG offer millisecond temporal resolution but poor spatial specificity due to volume conduction and field spread, making source localization inherently uncertain [13]. For fNIRS, although it can provide relatively high spatial resolution, its measurement depth is only 2–3 cm [14]. In contrast, fMRI offers high spatial resolution and whole-brain coverage. This technique can not only enable precise localization of emotion-related brain regions but also supports simultaneous detection of co-activation patterns between language processing networks and emotion regulation systems via blood oxygenation level-dependent (BOLD) signals [15]. These features provide a clear advantage for studying the neural mechanisms of emotion and semantic interaction in the Chinese context.

Traditional emotion classification methods typically rely on short-sequence emotion-induction paradigms and machine learning (ML) algorithms. Kassam et al. elicited nine discrete emotions using word-cued spontaneous emotion induction tasks, achieving a rank accuracy of 0.84 in fMRI-based emotion discrimination [16]. Saarimäki et al. employed multivariate pattern analysis (MVPA) to distinguish six basic emotions, demonstrating good cross-subject generalizability [17]. However, conventional algorithms show significantly reduced decoding accuracy when the number of target emotion categories increases, and these algorithms face constraints from short stimulus sequences, non-naturalistic experimental paradigms, and coarse-grained emotion classification [18].

In recent years, large language models (LLMs) exemplified by GPT have demonstrated outstanding performance across diverse domains. These models can effectively capture rich contextual semantic representations and exhibit deep-level semantic reasoning capabilities, offering a novel approach to long-sequence emotion decoding [19]. Tang et al. developed an fMRI-based generative decoding framework that maps brain activity patterns into the latent semantic space of LLMs, achieving neural reconstruction of continuous English speech (BERTScore = 0.82) and revealing the distributed encoding characteristics of the brain’s language network [20]. However, current research has two critical limitations. First, existing methods have not been validated on high-context language systems such as Chinese [21]. Second, prior studies have primarily focused on decoding basic semantic content, failing to further examine the subtle emotional components of language processing [22]. These limitations constrain our understanding of the neural mechanisms that underlie language and emotion interaction in real-world scenarios.

Accordingly, this study aims to develop and validate a fine-grained semantic and emotion decoding framework for extended Chinese narratives, bridging the current gap in understanding the neural mechanisms underlying language–emotion interactions. The innovations of this study include:

Extended emotion decoding in high-context language: This study investigates the feasibility of long-sequence emotion decoding in Chinese, extending emotion decoding to high-context language systems.
Fine-grained emotion decoding with SED-GPT: We propose a novel fine-grained decoding framework (SED-GPT) for Chinese narratives, which aligns brain activity with LLM-based semantic vector representations to reconstruct inner speech semantics.
Dynamic neural interactions in emotion–semantic processing: By systematically examining the dynamic interplay between the language network and emotional systems during Chinese semantic processing, this work provides neural evidence for cognition–emotion coupling.

Overall, this approach may inform the future development of affective brain–computer interfaces and therapeutic tools of cognitive behavioral therapy.

2. Materials and Methods

2.1. Dataset

This study employed the publicly available dataset SMN4Lang to evaluate our model, which comprises structural MRI and functional MRI data from 12 participants, with each participant contributing 6 h of fMRI data [23].

At trial onset, a screen displayed the instruction “Waiting for scan”, followed by an 8 s blank screen. The instruction then changed to “The audio will begin shortly. Please listen carefully” for 2.65 s before the auditory stimulus was presented.

The stimulus set consisted of 60 audio clips (4–7 min each) from People’s Daily news stories, covering diverse topics including education and culture. All recordings were narrated by the same male speaker. Manual timestamp alignment was performed to ensure precise synchronization between the audio and corresponding textual transcripts.

Structural MRI and fMRI were acquired using a Siemens Prisma 3T scanner (Siemens Healthineers, Erlangen, Germany) equipped with a 64-channel receive coil. T1-weighted images were obtained with a 3D MPRAGE sequence at an isotropic spatial resolution of 0.8 mm³ with the following parameters: TR = 2.4 s, TI = 1 s, TE = 2.22 ms, flip angle = 8°, and FOV = 256 × 256 mm.

fMRI data were collected using a BOLD-sensitive T2*-weighted GE-EPI sequence with the following parameters: TR = 710 ms, TE = 30 ms, flip angle = 54°, in-plane resolution = 2 mm, and FOV = 212 × 212 mm.

2.2. Data Preprocessing

We preprocessed structural and functional images with fMRIPrep v24.1.1 (Poldrack Lab, Stanford University, Stanford, CA, USA) and normalized them to MNI152NLin2009cAsym. Participants were included in the decoding experiments only if their skull-stripped and standardized T1-weighted images, as well as the reconstructed gray matter cortical surfaces, showed no evident cortical region loss after preprocessing. Four participants were excluded, leaving eight participants for analysis. The demographic details (ID, age, and sex) of the remaining eight participants are presented in Table 1.

30 long-sequence task fMRI files exclusively from training set were randomly sampled to identify brain regions showing significant activation during Chinese semantic perception and in response to twenty-one emotion word categories using second-level general linear model (GLM) [24,25]. Psychophysiological interaction analysis (PPI) then probed the neural pathways underlying semantic perception [26].

In comparing activation between task and resting states, at the individual analysis stage, we used a first level GLM based on the hemodynamic response function of SPM12 (Wellcome Centre for Human Neuroimaging, University College London, London, UK) and an autoregressive noise model AR(1). In the group analysis we constructed group level activation statistical maps using a second level GLM. After applying a voxel level correction (p < 0.001, Z > 3.09) and filtering by a minimum cluster size of 50 voxels, we identified regions showing significant activation [27].

In the study of neural pathways underlying Chinese semantic processing, BOLD signal time series were extracted from predefined brain seed regions. Physiological regressors, psychological regressors, and the original interaction regressor were constructed and then convolved with the hemodynamic response function. A first level GLM was fitted, followed by a second-level GLM group comparison to generate Z-statistic parametric maps. False Discovery Rate (FDR) correction (p < 0.001) was applied to identify brain regions exhibiting significant functional coupling with the semantic perception hubs [26].

We used the Affective Lexicon Ontology to mark timestamps for 21 emotions (joy, calm, respect, praise, trust, love, well wishing, anger, sadness, disappointment, guilt, longing, panic, fear, shame, frustration, disgust, blame, jealousy, doubt, and surprise) and built an event matrix to contrast emotion vs. neutral words [28]. After multiple comparison correction and filtering by a minimum cluster size of 10 voxels, we identified the ROIs showing significant activation for each emotion category [27].

Based on the significant activated brain regions and the functionally coupled regions identified above, along with prior regions implicated in Chinese semantic processing (probabilistic atlas threshold: Pr ≥ 0.25) [29,30], MNI coordinates were extracted to generate ROI masks. Using these ROI masks, the fMRI response time series of voxels within each mask were extracted. The 3D coordinates of the voxels were then re-encoded into 1D indices, forming a 2D neural response matrix.

To avoid data leakage and clearly separate training and evaluation phases, we adopted the following split strategy:

30 runs were randomly selected as the held-out test set, ensuring that these runs were never used for model training. The remaining runs from each participant were used exclusively to train that participant’s decoder. No run ever appeared in both sets, thereby preventing temporal or contextual overlap between training and test data. The exact subject–run list is presented in Table 2.

This split maximized the number of training runs retained per participant and minimized the risk of information leakage between training and testing phases.

2.3. SED-GPT

We propose Semantic and Emotion Decoding Generative Pretrained Transformer (SED-GPT), as shown in Figure 1.

In the encoding stage, we employed a Semantic to Brain Response Conversion Module (SBRCM) to minimize the inter-modal distance between linguistic stimuli and brain activity patterns. This module established a mapping between stimulus features and corresponding neural responses, incorporating a word rate model while estimating noise covariance matrices to improve model generalizability.

In the decoding stage, new semantic sequences were generated using linguistic priors from the language model and candidates. These sequences were then projected into neural response space through the SBRCM. The predicted simulated responses were compared with empirically observed brain activity patterns to compute likelihood distributions. Through iterative Bayesian integration of prior probabilities and likelihood estimates, the optimal semantic sequence was reconstructed. This iterative process enabled the generation of extended semantic vectors, whose emotional valence was subsequently classified into multiple categories using the GoEmotions framework [31].

2.4. Fine-Tuning of LLMs

English uses clear spaces between words and has rich morphology. This makes word-level tokenization effective for capturing both semantics and syntax. In contrast, Chinese lacks explicit word boundaries, and each character carries its own meaning, so character-level tokenization allows flexible combination into complete lexical units [32]. Consequently, English LLMs are typically trained with words as tokens, whereas Chinese LLMs use characters as tokens.

Character-encoded Chinese LLMs successfully avoid the complexity of Chinese word segmentation and require less computation for training [32]. However, this design choice also means that high-performance, open-source Chinese GPT based on word-level tokens remain unavailable [33,34]. Character-level tokenization in Chinese GPT models does not align with word-based semantic encoding mechanisms: neuroimaging studies show that Mandarin users rely on explicit word boundaries, encoding meaning at the word rather than character level [34,35].

Unlike the character-level tokenization commonly used in Chinese LLMs, human language comprehension integrates lexical, syntactic, and contextual cues in a distributed semantic network. New symbols are matched against existing nodes to activate concepts or domains [36]. This semantic information processing mechanism parallels the token-based encoding used in LLMs, where each input token is projected into a high-dimensional semantic space to capture and combine conceptual features [37].

In this study, we indexed semantic feature vectors at the word level. We used GPT-4 to translate the time aligned Chinese transcripts into English in batch and used a GPT-2 model—fine-tuned on the DeepMind Q&A news corpus—to construct the stimulus matrix [38]. The fine-tuning was carried out using a maximum sequence length of 1024 tokens and a stride of 896 tokens to preserve continuity across long contexts. The AdamW optimizer provided by PyTorch v2.3.1 (Meta AI, Menlo Park, CA, USA) was used with an initial learning rate of 0.00002 and a 3% linear warmup schedule, and the model was trained for three epochs. Performance was evaluated on a held-out validation set using perplexity, which was used to determine the final checkpoint. The prompt for LLM-based timestamp conversion was: Convert the Chinese text into the most appropriate English. You may split intervals and their corresponding times; ensure that each interval corresponds to exactly one English word and that the xmax value at the end of each sentence remains consistent.

For each word-time pair at every timestamp, the word sequence was fed into the GPT language model, and the semantic feature vector of the target word was extracted from the 9th layer of the model, where the semantic feature vector represents a 768-dimensional semantic embedding [39]. These embedding vectors were temporally resampled to match the fMRI data acquisition time points using a three-lobe Lanczos filter [40].

To evaluate potential translation bias, we repeated the pipeline without translation using a Chinese-native LLM (Qwen2 Base) to assess tokenization granularity effects on performance.

2.5. Semantic to Brain Response Conversion Module

In the encoding stage, the SBRCM receives input from two modalities: 768-dimensional semantic embeddings extracted from GPT-2 and BOLD signals recorded by fMRI. To account for the temporal-scale difference between these two modalities, the semantic stimulus vectors from 5–10th TRs prior to neural response onset were concatenated to construct a joint feature space. A linear mapping from this feature space to neural signals was then established through L2-regularized regression, with a word rate model being derived by estimating semantic occurrence frequency within individual TRs. To enhance model generalizability, a bootstrapping approach was incorporated in the encoding phase [41].

In the decoding stage, candidate semantic sequences were initially generated based on GPT-2 prior probabilities. These candidate sequences were subsequently projected into neural response space through the SBRCM to produce predicted brain responses. Each candidate’s likelihood was computed by evaluating the correspondence between its predicted response and the empirically observed neural signals. Candidate sequences exhibiting significant discrepancies with the actual brain response patterns were iteratively filtered out. We then combined language model priors with neural likelihoods, applying beam search to iteratively expand and retain the top-scoring candidates at each decoding step. This process was repeated until multiple complete text passages were generated. Beam search is employed to iteratively retain the top-scoring candidate sequences at each decoding step, while expanding them with the most probable next tokens. Finally, the sequence exhibiting the highest likelihood was selected, and the GoEmotions model was applied to extract an emotion probability distribution [31].

The detailed computational procedure of the SBRCM is summarized in Algorithm 1, which specifies the inputs, outputs, and step-by-step operations for both the encoding and decoding stages.

Algorithm 1. SBRCM

Input: Semantic embeddings

S (t) \in R^{768 \times 6}

, fMRI signals

R (t)

, TR = 0.71 s, Hemodynamic delays = [5, 6, 7, 8, 9, 10]
Output: Optimal decoded semantic sequence

S_{b e s t}

, Emotion Distribution
#Encoding
1: Construct delayed semantic feature matrix

X (t) = [S_{i} (t - 5), S_{i} (t - 6), S_{i} (t - 7), S_{i} (t - 8), S_{i} (t - 9) {, S}_{i} (t - 10)]

2: Normalize features

\tilde{X} (t) = Z s c o r e (X (t))

3: Predict brain response

\hat{R} (t) = \tilde{X} {(t)}^{⊤} W

4: Estimate optimal weight

W

\underset{W}{m i n} \{\sum_{t = 1}^{T} {[R (t) - \tilde{X} {(t)}^{⊤} W]}^{2} + λ \sum_{j = 1}^{4608} \sum_{j} W_{j}^{2}\}

λ^{*} = a r g \underset{λ}{m a x} E_{bootstrap} [R^{2} (λ)]

W_{λ^{*}} = {(\tilde{X} {(t)}^{⊤} \tilde{X} (t) + λ^{*} I)}^{- 1} \tilde{X} {(t)}^{⊤} R

5: Construct a delayed brain response feature vector

R_{d} (t) = [R (t + 5), R (t + 6), R (t + 7), R (t + 8), R (t + 9), R (t + 10)]

6: Predict word rate following the same logic as

\hat{R} (t) = \tilde{X} {(t)}^{⊤} W

:

\hat{W R} (t) = R_{d} {(t)}^{⊤} W

#Each TR is partitioned into sub-intervals according to word rate

\hat{W R} (t)

.
#Decoding
7: The posterior is decomposed into the prior and the likelihood

P (S | R_{t e s t}) \propto P (S) P (R_{t e s t} | S)

# The likelihood is obtained by comparing

\hat{R} (t)

with

R (t)

8: Compute the new candidate sequence

S^{'}

score

S c o r e (S^{'}) = 0.6 l o g P (R_{t e s t} ∣ S^{'}) + 0.4 l o g P (S^{'})

9. Iteratively retain the top-scoring sequences and update candidate sequences
10: Final decoded sequence

S_{b e s t} = \arg \underset{S \in B}{m a x} [P (R_{t e s t}∣ S) P (S)]

11: Compute emotion distribution
Emotion Distribution = GoEmotions(

S_{b e s t}

)

2.6. Evaluation Metrics

2.6.1. Semantic Similarity

BERTScore, word error rate (WER) and Euclidean distance (ED) were employed to quantify how closely the decoded text matches the original stimulus.

BERTScore measures semantic overlap by aligning contextual embeddings of the candidate and reference texts to compute precision, recall and F1 score [42]. Precision reflects the matching quality between the two texts. Recall reflects the extent of matching coverage. The F1 score balances both quality and coverage. The metrics are defined as follows:

P r e c i s i o n = \frac{1}{M} \sum_{i = 1}^{M} \underset{j \in {1, \dots, N}}{m a x} c o s (a_{i}, b_{j}),

(1)

R e c a l l = \frac{1}{N} \sum_{j = 1}^{N} \underset{i \in {1, \dots, M}}{m a x} c o s (b_{j}, a_{i}),

(2)

F 1 = \frac{2 \cdot P r e c i s i o n \cdot R e c a l l}{P r e c i s i o n + R e c a l l},

(3)

where

A = {a_{1}, a_{2}, \dots, a_{M}}

denotes the set of candidate text vectors, and

B = {b_{1}, b_{2}, \dots, b_{M}}

denotes the set of reference text vectors.

ED computes the Euclidean distance between corresponding semantic vectors, where smaller values indicate greater vector proximity and consequently lower semantic reconstruction error [43]. The metric is defined as follows:

E D = \sqrt{\sum_{i = 1}^{N} {(a_{i} - b_{i})}^{2}},

(4)

where

A = {a_{1}, a_{2}, \dots, a_{M}}

denotes the set of candidate text vectors, and

B = {b_{1}, b_{2}, \dots, b_{M}}

denotes the set of reference text vectors.

Word error rate (WER) measures, at the word level, the proportion of insertion, deletion, and substitution errors between the predicted text and the reference text relative to the total number of words in the reference [44]. The metric is defined as follows:

W E R = \frac{S + D + I}{N},

(5)

where

S

is the number of substitutions in the candidate text,

D

is the number of deletions (words present in the reference but missing in the candidate),

I

is the number of insertions (extra words in the candidate), and

N

the total number of words in the reference text.

2.6.2. Emotional Similarity

For emotion decoding, the GoEmotions framework was applied to extract normalized emotion probability distributions from the decoded text [31].

Cosine Similarity (CS) and Jensen-Shannon (JSS) similarities were then computed between two sets of comparisons: (a) 30 randomly sampled decoded emotion distributions and the true distributions, and (b) 30 randomly generated emotion distributions and true distributions. The emotion-decoding performance of SED-GPT was quantified through direct comparison of these similarity scores.

CS is defined as the cosine of the angle between the predicted and true probability vectors, reflecting the overall alignment of the distributions [45]. To evaluate emotion similarity, we employed the following formula:

C S (P, Q) = \frac{P^{⊤} Q}{∥ P ∥ ∥ Q ∥},

(6)

where the normalized emotion probability distribution of the decoded text is

P = (p_{1}, \dots, p_{n})

, and the normalized emotion probability distribution of the true text is

Q = (q_{1}, \dots, q_{n})

.

JSS is defined as one minus the Jensen–Shannon divergence between the two distributions. JSS measures the similarity between the predicted emotion distribution and the true distribution [46]. To evaluate semantic similarity, we employed the following formula:

J S S (P, Q) = 1 - \sqrt{\frac{1}{2} \sum_{i = 1}^{n} (p_{i} \log_{2} \frac{2 p_{i}}{p_{i} + q_{i}} + q_{i} \log_{2} \frac{2 q_{i}}{p_{i} + q_{i}})},

(7)

where the normalized emotion probability distribution of the decoded text is

P = (p_{1}, \dots, p_{n})

, and the normalized emotion probability distribution of the true text is

Q = (q_{1}, \dots, q_{n})

.

The similarity between the generated text and the reference text was quantified for each emotion category using GoEmotions with multi-emotion classifications. A similarity metric was defined through computation of the normalized ratio for each emotion category. The closer this ratio is to 1, the more consistent the predicted probabilities for that emotion between the two texts [47]. To evaluate semantic similarity, we employed the following formula:

S i m (p_{i}, q_{i}) = \frac{\min (p_{i}, q_{i})}{\max (p_{i}, q_{i})},

(8)

where the normalized emotion probability distribution of the decoded text is

P = (p_{1}, \dots, p_{n})

, and the normalized emotion probability distribution of the true text is

Q = (q_{1}, \dots, q_{n})

.

For the random baseline, the same metric is computed by replacing

P

with the distribution from the random text while keeping

Q

as the true distribution.

Determining positive labels for the ground truth in multi-label emotion distributions is challenging. We therefore binarize the gold emotion distribution using a Top-k rule (default k = 3), counting all ties at the third boundary as positive, and perform sensitivity analyses at k = 2/4. Selecting a threshold for multi-label emotion predictions is likewise challenging, so we adopt threshold-free AUPRC as the primary metric. For each emotion class, we sweep its GoEmotions scores in descending order as decision thresholds to trace the PR curve and compute the AUPRC. Finally, we report the overall AUPRC, with 95% confidence intervals estimated via sample-level paired bootstrap. Brier was used to evaluate probabilistic accuracy and calibration.

3. Results

3.1. Brain Activation and Functional Connectivity of Chinese Semantic Perception

3.1.1. Brain Activation of Chinese Semantic Perception

Comparative results between task-state and resting-state activation during the Chinese semantic perception task are shown in Figure 2 and Table 3.

During Chinese semantic processing task, significant activation was observed in the classical language network and semantic processing brain regions, accompanied by deactivation patterns in the default mode network (DMN) and primary sensorimotor cortices. Specifically, enhanced neural activity was identified in the bilateral frontal poles and posterior superior temporal gyri.

3.1.2. Functional Connectivity of Chinese Semantic Perception

Based on the above activation results, we selected bilateral frontal poles and posterior superior temporal gyri as seed ROIs and performed PPI analysis to investigate the functional connectivity mechanisms underlying semantic processing, as shown in Figure 3 and Table 4.

The PPI analysis revealed that core semantic processing regions exhibited significant task-state functional coupling with distributed brain areas. These semantic processing regions showed significant functional coupling with the 79 distinct connectivity clusters (FDR, p < 0.001) including paracingulate gyrus, anterior cingulate cortex and insular cortex.

These connections support processes such as semantic retrieval, contextual and narrative maintenance, motor simulation, attentional control, visual imagery and spatial scene construction, and affective and self-referential processing [48,49,50,51,52,53,54].

3.2. Brain Regions Activated by Emotional Words

The comparative activation results between each category of emotion words and neutral words during Chinese semantic perception tasks are shown in Table S1 in Supplementary Material.

Under conditions of continuous natural language stimulation, all emotion categories elicit widely distributed neural activation patterns. Specifically, the processing of emotional words engaged not only classic limbic regions associated with affective processing, but also significantly activated: primary visual cortex, sensorimotor cortices, facial expression-modulation regions and high-level cognitive cortices [55,56,57].

3.3. Semantic Decoding Performance

To assess semantic decoding performance, the text outputs generated by the decoder were quantitatively compared with both the original stimulus texts (ground truth) and randomly generated control texts, as shown in Table 5.

For the experimental group (EXP, n = 30), the 95% confidence intervals (CIs) were as follows: BERTScore F1 = 0.650 (95% CI: 0.594–0.706), ED = 12.432 (95% CI: 11.351–13.513), and WER = 0.924 (95% CI: 0.914–0.934). For the random group (RM, n = 200), the 95% CIs were BERTScore F1 = 0.326 (95% CI: 0.316–0.336), ED = 14.528 (95% CI: 14.434–14.622), and WER = 0.989 (95% CI: 0.982–0.996). Across all three metrics, the differences between the experimental and random groups were highly significant (p < 0.001). This indicates that the text produced by our semantic decoder significantly outperformed the random baseline at capturing and reconstructing semantic information in long Chinese narratives.

To further examine the potential influence of tokenization granularity on decoding performance, we conducted a sensitivity analysis by comparing the proposed English word-level decoder with a decoder constructed using the Chinese-native LLM Qwen2 Base, which operates at the character level. The results are summarized in Table 6.

For BERTScore-F1, the experimental decoder achieved 0.650 (95% CI: 0.594–0.706), which was significantly higher than the Chinese-native LLM decoder at 0.529 (95% CI: 0.526–0.532; p < 0.001). ED showed no significant difference between the two groups (p = 0.559), with overlapping CIs (EXP: 95% CI: 11.352–13.512; Chinese LLM: 95% CI: 14.033–14.415). For WER, the experimental group showed significantly lower values (95% CI: 0.914–0.934) compared to the Chinese-native LLM group (95% CI: 0.941–0.975; p < 0.001). This suggests that the performance difference is primarily associated with word-level representation.

3.4. Emotion Decoding Performance

The emotion recognition performance was quantitatively assessed by computing both CS and JSS between the decoded emotion distributions and the corresponding true distributions. These metric values were compared against a random baseline condition, as shown in Table 7.

For emotion recognition evaluation, we conducted multidimensional affective analysis of the decoded results and compared them with a random baseline. The experimental group achieved a CS of 0.504 (95% CI: 0.374–0.634) compared to 0.233 (95% CI: 0.140–0.326) for the random group, and a JSS of 0.469 (95% CI: 0.384–0.554) compared to 0.323 (95% CI: 0.268–0.378) for the random group. Overall, the experimental group demonstrated significantly higher scores in both CS and JSS compared to the random group (p < 0.05).

The normalized ratios between the emotion distributions of decoded texts and the true emotion distributions were calculated. These ratios were then compared against random baseline distributions, as detailed in Table 8 and Figure 4.

The decoding accuracy rates for anger, disgust, embarrassment, fear, grief, joy, nervousness, neutral, remorse, sadness, caring, confusion, desire and love in the experimental group was significantly above the random baseline (p < 0.05). These results underscore the decoder’s robust sensitivity to a wide spectrum of emotional states under naturalistic language conditions.

With Top-k = 3 ground truth, the overall AUPRC was 0.680 (95% CI: 0.597–0.783) for the decoded group versus 0.483 (95% CI: 0.327–0.660) for the random control, p < 0.01, as shown in Figure 5. The advantage persisted under k = 2/4 sensitivity. The Brier score was 0.409 for the decoded group versus 0.417 for the random group, indicating better overall probabilistic accuracy and calibration of the decoded predictions. Specifically, neutral, nervousness, joy, confusion, embarrassment, surprise, desire, and love each showed a significantly higher per-class AUPRC than random controls, with p < 0.05 for each.

4. Discussion

Conventional emotion decoding methods are constrained by factors such as short-sequence stimuli, coarse-grained categories, and low-context language systems, making it difficult to capture the interaction mechanisms between language and emotion in real-world scenarios [16]. Our study demonstrates fine-grained emotion decoding in long-sequence narratives and provides a methodological basis for investigating the neural mechanisms underlying the interaction between emotion and semantic processing. It may address the bottleneck in dynamic emotion monitoring for depression treatment and cognitive behavioral therapy (CBT) [58]. This work should be interpreted as an exploratory, pilot decoding study. The results demonstrate feasibility of long-sequence semantic and emotion decoding in Chinese but do not establish clinical-grade performance.

In this research, we introduce SED-GPT, a fine-grained emotion decoding framework designed for long-sequence Chinese language processing, which establishes a neural alignment between brain activity and LLM-based semantic vector representations to enable inner speech semantic reconstruction. Moreover, our findings reveal the dynamic interplay between language-related cortical networks and affective neural systems during Chinese emotional semantic processing, providing novel neurocognitive evidence for thought-emotion integration mechanisms.

For semantic decoding, SED-GPT achieved a BERTScore F1 of 0.65, an ED of 12.432 and a WER of 0.924. For emotion decoding, it attained a CS of 0.504 and a JSS of 0.469. All decoding performances significantly surpass random baseline.

The GLM results from both task and resting states demonstrated that during Chinese semantic processing, the coordinated activation of the left frontopolar cortex and right medial frontopolar cortex likely involves top-down attentional modulation and cross-modal semantic integration [59]. Bilateral posterior superior temporal gyri were engaged in acoustic feature analysis and semantic primitive extraction, and bilateral temporal poles participated in abstract semantic representation and integration of social context [60,61]. Concurrently, the suppression of the DMN and other non-task regions (e.g., the default mode network) indicates directed allocation of cognitive resources to core language networks [62]. Activity along the left central sulcus during Chinese lexical ambiguity resolution suggests enhanced phonological working memory [63]. The co-activation of the working memory network (left superior frontal gyrus to right superior parietal lobule) with the attentional network (right precuneus) supports context integration and interference suppression [64]. Cross-modal activation of the left middle occipital gyrus implies that orthography–phonology associations may facilitate automatic mapping from speech to visual representations [65]. In higher level comprehension (e.g., discourse level processing), bilateral frontal poles and middle frontal gyri support context maintenance and are implicated in controlled semantic retrieval [66,67].

PPI analyses showed that the right-hemisphere target seed region exhibited significant functional coupling with the paracingulate gyrus, anterior cingulate cortex (ACC), and insular cortex. From a network perspective, this model can be reasonably explained theoretically. The paracingulate gyrus is primarily involved in self-monitoring and reality monitoring processes [68], the anterior cingulate integrates negative emotions, pain and cognitive control [69], and the insula enriches semantic understanding by integrating abstract semantic information with interoceptive bodily states and socio-emotional experience [70]. This coupling mechanism likely supports multi-level integration of semantics, context, and emotional cues during narrative comprehension [71,72].

The GLM contrast between emotional and neutral words suggests:

Network reorganization and resource redistribution. Emotional words were associated with widespread changes across cortical networks. High social value words (e.g., praise) activate an integrated network of empathic, motor simulation and evaluation, while activity in general executive control regions is significantly reduced [57].
Embodied simulation mechanism. In the joy condition, significant activation was observed in the left and right precentral gyrus, which may involve the recruitment of oral and facial motor representations. In the praise condition, notable activation was detected in the postcentral gyrus and the anterior supramarginal gyrus, reflecting the engagement of somatosensory and speech-related pathways. These findings support the notion that abstract emotions participate in sensory-motor representation mapping [56].
The emotion-visual imagery coupling mechanism. Positive emotions (e.g., joy, admiration) and negative emotions (e.g., fear) were associated with activation in the inferior occipital cortex, cuneus and right higher-order visual areas, revealing the multi-level integration of vivid mental imagery with emotional processing [73].
The self/others reference and value assessment mechanism. Emotional words (e.g., admiration) activated the dorsomedial prefrontal cortex and superior frontal gyrus, reflecting metacognitive simulation of self and others in complex social emotions [74].
Suppression patterns of specific emotions. Emotional words (e.g., criticism) elicited large-scale deactivation in visual-semantic, sensorimotor and executive/metacognitive networks (e.g., middle frontal gyrus, frontal pole). This pattern may reflect down-regulation of DMN processing to concentrate cognitive resources on affective synesthesia and social reasoning network [75].

Fine-grained emotion decoding results indicate that the proposed method can distinguish 14 emotional states from brain activity. Notably, negative emotions (anger, disgust, fear, grief, sadness) exhibit greater decoding accuracy, which may be attributed to humans’ preferential processing of negative stimuli. During disgust-related words processing, participants experienced embodied interoceptive imagery (e.g., nausea), activating the right somatosensory cortex (postcentral gyrus) [76]. Concurrently, stronger engagement of higher-order cognitive regions (e.g., right frontal pole) was required to evaluate and regulate this negative emotional response (e.g., suppressing the “gagging” impulse) [76]. Sadness-related words can cause significant suppression in right lateral occipital cortex, right precuneus and right frontal pole. It indicates that the subjects exhibited weakened imagery, spatial association, and metacognitive processing under sad emotions [77]. Humans allocate more attentional, perceptual, learning and memory resources to negative stimuli, resulting in stronger and more stable neural responses with a higher signal-to-noise ratio, which facilitates emotion decoding [78].

Several limitations must be acknowledged:

The current study relied on translating Chinese transcripts into English to obtain word-level GPT-2 embeddings. Although we performed temporal alignment to minimize distortion, translation may still introduce semantic drift and reduce ecological validity. Future work will explore Chinese-native LLMs with word-level tokenization to avoid this bias.
The relatively small sample size and homogeneous participant pool (healthy young adults) may limit the generalizability of the findings to other populations, such as clinical cohorts with depression or anxiety disorders.
GLM and PPI analyses reveal correlational relationships rather than causal mechanisms. Follow-up studies incorporating brain stimulation or lesion models are needed to test the causal role of specific regions.
The study did not perform external cross-dataset validation, limiting the generalizability of decoding frameworks across different acquisition sites, scanners, and participant groups.

5. Conclusions

This exploratory study provides feasibility evidence that extended textual semantics and multilabel emotions can be decoded from fMRI of Chinese auditory narratives by aligning brain responses with large-language-model representations. The results show non-random discrimination of complex affective components in a Chinese context and underscore the promise of Chinese affective brain–computer interfaces. These are not yet clinical-grade results given the small sample (n = 8) and absence of external validation. Accordingly, potential applications—such as adjuncts to depression treatment or cognitive behavioral therapy—should be regarded as well-motivated hypotheses, pending larger and more diverse cohorts, as well as independent multi-site validation. Future work will pursue suitable datasets to enable larger and more diverse cohort studies.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app152011100/s1, Table S1: Activation and deactivation in response to emotional versus neutral words; Table S2: CLAIM Checklist.

Author Contributions

Conceptualization, W.C. and L.M.; methodology, W.C. and Z.W.; software, W.C. and Z.W.; validation, W.C. and Z.W.; formal analysis, W.C. and Z.W.; investigation, W.C. and L.M.; resources, L.M.; data curation, W.C. and Z.W.; writing—original draft preparation, W.C.; writing—review and editing, W.C. and Z.W.; visualization, W.C.; supervision, L.M.; project administration, L.M.; W.C. and Z.W. contributed equally to this work and should be considered co-first author. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used and analyzed in the current study are available from https://openneuro.org/datasets/ds004078/versions/1.2.1 (accessed on 6 December 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

FP	Frontal Pole
PreCG	Precentral Gyrus
PoCG	Postcentral Gyrus
MFG	Middle Frontal Gyrus
SFG	Superior Frontal Gyrus
IFGpt	Inferior Frontal Gyrus, pars triangularis
IFGoper	Inferior Frontal Gyrus, pars opercularis
ACC	Anterior Cingulate Cortex
ParaCG	Paracingulate Gyrus
PCC	Posterior Cingulate Cortex
PCG	Posterior Cingulate Gyrus
SMA	Supplementary Motor Area
FMC	Frontomedial Cortex
FOC	Frontal Orbital Cortex
COC	Central Opercular Cortex
SCC	Supracalcarine Cortex
LOCsup	Lateral Occipital Cortex, superior division
LOCid	Lateral Occipital Cortex, inferior division
LOCs	Lateral Occipital Cortex, superior division
OP	Occipital Pole
CunC	Cuneal Cortex
PCunC	Precuneous Cortex
MTG	Middle Temporal Gyrus
MTGto	Middle Temporal Gyrus, temporooccipital part
MTGpd	Middle Temporal Gyrus, posterior division
MTGad	Middle Temporal Gyrus, anterior division
STG	Superior Temporal Gyrus
STGpd	Superior Temporal Gyrus, posterior division
TP	Temporal Pole
ITGto	Inferior Temporal Gyrus, temporooccipital part
ITGpd	Inferior Temporal Gyrus, posterior division
AG	Angular Gyrus
SMGa	Supramarginal Gyrus, anterior division
SMGp	Supramarginal Gyrus, posterior division

References

Fernyhough, C.; Borghi, A.M. Inner speech as language process and cognitive tool. Trends Cogn. Sci. 2023, 27, 1180–1193. [Google Scholar] [CrossRef]
Nummenmaa, L.; Saarimäki, H.; Glerean, E.; Gotsopoulos, A.; Jääskeläinen, I.P.; Hari, R.; Sams, M. Emotional speech synchronizes brains across listeners and engages large-scale dynamic brain networks. Neuroimage 2014, 102, 498–509. [Google Scholar] [CrossRef] [PubMed]
Etkin, A.; Egner, T.; Kalisch, R. Emotional processing in anterior cingulate and medial prefrontal cortex. Trends Cogn. Sci. 2011, 15, 85–93. [Google Scholar] [CrossRef]
Devinsky, O.; Morrell, M.J.; Vogt, B.A. Contributions of anterior cingulate cortex to behaviour. Brain 1995, 118, 279–306. [Google Scholar] [CrossRef]
Binder, J.R.; Conant, L.L.; Humphries, C.J.; Fernandino, L.; Simons, S.B.; Aguilar, M.; Desai, R.H. Toward a brain-based componential semantic representation. Cogn. Neuropsychol. 2016, 33, 130–174. [Google Scholar] [CrossRef]
Lenci, A.; Lebani, G.E.; Passaro, L.C. The emotions of abstract words: A distributional semantic analysis. Top. Cogn. Sci. 2018, 10, 550–572. [Google Scholar] [CrossRef]
Satpute, A.B.; Lindquist, K.A. At the neural intersection between language and emotion. Affect. Sci. 2021, 2, 207–220. [Google Scholar] [CrossRef]
Gaillard, R.; Del Cul, A.; Naccache, L.; Vinckier, F.; Cohen, L.; Dehaene, S. Nonconscious semantic processing of emotional words modulates conscious access. Proc. Natl. Acad. Sci. USA 2006, 103, 7524–7529. [Google Scholar] [CrossRef]
Kuperberg, G.R.; Deckersbach, T.; Holt, D.J.; Goff, D.; West, W.C. Increased temporal and prefrontal activity in response to semantic associations in schizophrenia. Arch. Gen. Psychiatry 2007, 64, 138–151. [Google Scholar] [CrossRef]
Zhu, X.; Guo, C.; Feng, H.; Wang, X.; Wang, R. A review of key technologies for emotion analysis using multimodal information. Cogn. Comput. 2024, 16, 1504–1530. [Google Scholar] [CrossRef]
Zhang, Y.; Li, Y.; Yu, Z.; Tang, F.; Lu, Z.; Li, C.; Dang, K.; Su, J. Decoding the flow: Causemotion for emotional causality analysis in long-form conversations. arXiv 2025, arXiv:2501.00778. [Google Scholar]
Chaudhary, U. Non-invasive brain signal acquisition techniques: Exploring EEG, EOG, fNIRS, fMRI, MEG, and fUS. In Expanding Senses Using Neurotechnology: Volume 1—Foundation of Brain-Computer Interface Technology; Springer Nature: Cham, Switzerland, 2025; pp. 25–80. [Google Scholar] [CrossRef]
Winter, W.R.; Nunez, P.L.; Ding, J.; Srinivasan, R. Comparison of the effect of volume conduction on EEG coherence with the effect of field spread on MEG coherence. Stat. Med. 2007, 26, 3946–3957. [Google Scholar] [CrossRef] [PubMed]
Wilcox, T.; Biondi, M. fNIRS in the developmental sciences. Wiley Interdiscip. Rev. Cogn. Sci. 2015, 6, 263–283. [Google Scholar] [CrossRef] [PubMed]
deCharms, C.R. Applications of real-time fMRI. Nat. Rev. Neurosci. 2008, 9, 720–729. [Google Scholar] [CrossRef] [PubMed]
Kassam, K.S.; Markey, A.R.; Cherkassky, V.L.; Loewenstein, G.; Just, M.A. Identifying emotions on the basis of neural activation. PLoS ONE 2013, 8, e66032. [Google Scholar] [CrossRef]
Saarimäki, H.; Gotsopoulos, A.; Jääskeläinen, I.P.; Lampinen, J.; Vuilleumier, P.; Hari, R.; Sams, M.; Nummenmaa, L. Discrete neural signatures of basic emotions. Cereb. Cortex 2016, 26, 2563–2573. [Google Scholar] [CrossRef]
Kragel, P.A.; LaBar, K.S. Decoding the nature of emotion in the brain. Trends Cogn. Sci. 2016, 20, 444–455. [Google Scholar] [CrossRef]
Wei, J.; Tay, Y.; Bommasani, R.; Raffel, C.; Zoph, B.; Borgeaud, S.; Yogatama, D.; Bosma, M.; Zhou, D.; Metzler, D.; et al. Emergent abilities of large language models. arXiv 2022, arXiv:2206.07682. [Google Scholar] [CrossRef]
Tang, J.; LeBel, A.; Jain, S.; Huth, A.G. Semantic reconstruction of continuous language from non-invasive brain recordings. Nat. Neurosci. 2023, 26, 858–866. [Google Scholar] [CrossRef]
Ye, Z.; Ai, Q.; Liu, Y.; de Rijke, M.; Zhang, M.; Lioma, C.; Ruotsalo, T. Generative language reconstruction from brain recordings. Commun. Biol. 2025, 8, 346. [Google Scholar] [CrossRef]
Liu, P.; Dong, G.; Guo, D.; Li, K.; Li, F.; Yang, X.; Wang, M.; Ying, X. A survey on fMRI-based brain decoding for reconstructing multimodal stimuli. arXiv 2025, arXiv:2503.15978. [Google Scholar]
Wang, S.; Zhang, X.; Zhang, J.; Zong, C. A synchronized multimodal neuroimaging dataset for studying brain language processing. Sci. Data 2022, 9, 590. [Google Scholar] [CrossRef] [PubMed]
Pajula, J.; Tohka, J. How many is enough? Effect of sample size in inter-subject correlation analysis of fMRI. Comput. Intell. Neurosci. 2016, 2016, 2094601. [Google Scholar] [CrossRef] [PubMed]
Baker, D.H.; Vilidaite, G.; Lygo, F.A.; Smith, A.K.; Flack, T.R.; Gouws, A.D.; Andrews, T.J. Power contours: Optimising sample size and precision in experimental psychology and human neuroscience. Psychol. Methods 2021, 26, 295. [Google Scholar] [CrossRef]
Di, X.; Zhang, Z.; Biswal, B.B. Understanding psychophysiological interaction and its relations to beta series correlation. Brain Imaging Behav. 2021, 15, 958–973. [Google Scholar] [CrossRef]
Roiser, J.P.; Linden, D.E.; Gorno-Tempinin, M.L.; Moran, R.J.; Dickerson, B.C.; Grafton, S.T. Minimum statistical standards for submissions to Neuroimage: Clinical. Neuroimage Clin. 2016, 12, 1045. [Google Scholar] [CrossRef]
Xu, L.; Lin, H.; Pan, Y.; Chen, J. Constructing the affective lexicon ontology. J. China Soc. Sci. Tech. Inf. 2008, 27, 180–185. [Google Scholar]
Ge, J.; Gao, J.H. A review of functional MRI application for brain research of Chinese language processing. Magn. Reson. Lett. 2023, 3, 1–13. [Google Scholar] [CrossRef]
Zhang, Q.; Wang, H.; Luo, C.; Zhang, J.; Jin, Z.; Li, L. The neural basis of semantic cognition in Mandarin Chinese: A combined fMRI and TMS study. Hum. Brain Mapp. 2019, 40, 5412–5423. [Google Scholar] [CrossRef]
Demszky, D.; Movshovitz-Attias, D.; Ko, J.; Cowen, A.; Nemade, G.; Ravi, S. GoEmotions: A dataset of fine-grained emotions. arXiv 2020, arXiv:2005.00547. [Google Scholar]
Si, C.; Zhang, Z.; Chen, Y.; Qi, F.; Wang, X.; Liu, Z.; Wang, Y.; Liu, Q.; Sun, M. Sub-character tokenization for Chinese pretrained language models. Trans. Assoc. Comput. Linguist. 2023, 11, 469–487. [Google Scholar] [CrossRef]
Zhang, Z.; Han, X.; Zhou, H.; Ke, P.; Gu, Y.; Ye, D.; Qin, Y.; Su, Y.; Ji, H.; Guan, J.; et al. CPM: A large-scale generative Chinese pre-trained language model. AI Open 2021, 2, 93–99. [Google Scholar] [CrossRef]
Ma, L.; Cui, W.; Yang, W.; Wang, Z. Noninvasive decoding and reconstruction of continuous Chinese language semantics. J. Data Acquis. Process. 2025, 40, 616–636. [Google Scholar] [CrossRef]
Xun, G.R.E. Word Boundary Information and Chinese Word Segmentation. Int. J. Asian Lang. Process. 2012, 23, 15–32. [Google Scholar]
Binder, J.R.; Desai, R.H.; Graves, W.W.; Conant, L.L. Where is the semantic system? A critical review and meta-analysis of 120 functional neuroimaging studies. Cereb. Cortex 2009, 19, 2767–2796. [Google Scholar] [CrossRef]
Caucheteux, C.; Gramfort, A.; King, J.R. Disentangling syntax and semantics in the brain with deep networks. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; Volume 139, pp. 1336–1348. [Google Scholar]
Hermann, K.M.; Kocisky, T.; Grefenstette, E.; Espeholt, L.; Kay, W.; Suleyman, M.; Blunsom, P. Teaching machines to read and comprehend. Adv. Neural Inf. Process. Syst. 2015, 28, 1693–1701. [Google Scholar]
Jain, S.; Huth, A. Incorporating context into language encoding models for fMRI. Adv. Neural Inf. Process. Syst. 2018, 31, 6629–6638. [Google Scholar]
Deniz, F.; Nunez-Elizalde, A.O.; Huth, A.G.; Gallant, J.L. The representation of semantic information across human cerebral cortex during listening versus reading is invariant to stimulus modality. J. Neurosci. 2019, 39, 7722–7736. [Google Scholar] [CrossRef]
Benara, V.; Singh, C.; Morris, J.X.; Antonello, R.J.; Stoica, I.; Huth, A.G.; Gao, J. Crafting interpretable embeddings for language neuroscience by asking LLMs questions. Adv. Neural Inf. Process. Syst. 2024, 37, 124137. [Google Scholar]
Lin, J.; Nogueira, R.; Yates, A. Pretrained Transformers for Text Ranking: Bert and Beyond; Springer Nature: Berlin/Heidelberg, Germany, 2022. [Google Scholar] [CrossRef]
Elmore, K.L.; Richman, M.B. Euclidean distance as a similarity metric for principal component analysis. Mon. Weather Rev. 2001, 129, 540–549. [Google Scholar] [CrossRef]
Ali, A.; Renals, S. Word error rate estimation for speech recognition: E-WER. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Melbourne, Australia, 15–20 July 2018; Association for Computational Linguistics (ACL): Stroudsburg, PA, USA, 2018; pp. 20–24. [Google Scholar] [CrossRef]
Xia, P.; Zhang, L.; Li, F. Learning similarity with cosine similarity ensemble. Inf. Sci. 2015, 307, 39–52. [Google Scholar] [CrossRef]
Nielsen, F. On a generalization of the Jensen–Shannon divergence and the Jensen–Shannon centroid. Entropy 2020, 22, 221. [Google Scholar] [CrossRef] [PubMed]
Podani, J.; Ricotta, C.; Schmera, D. A general framework for analyzing beta diversity, nestedness and related community-level phenomena based on abundance data. Ecol. Complex. 2013, 15, 52–61. [Google Scholar] [CrossRef]
Chen, S.; Chen, M.; Wang, X.; Liu, X.; Liu, B.; Ming, D. Brain–computer interfaces in 2023–2024. Brain-X 2025, 3, e70024. [Google Scholar] [CrossRef]
Huth, A.G.; de Heer, W.A.; Griffiths, T.L.; Theunissen, F.E.; Gallant, J.L. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature 2016, 532, 453–458. [Google Scholar] [CrossRef]
Giacobbe, C.; Raimo, S.; Cropano, M.; Santangelo, G. Neural correlates of embodied action language processing: A systematic review and meta-analytic study. Brain Imaging Behav. 2022, 16, 2353–2374. [Google Scholar] [CrossRef]
Piai, V.; Roelofs, A.; Acheson, D.J.; Takashima, A. Attention for speaking: Domain-general control from the anterior cingulate cortex in spoken word production. Front. Hum. Neurosci. 2013, 7, 832. [Google Scholar] [CrossRef]
Epstein, R.A. Parahippocampal and retrosplenial contributions to human spatial navigation. Trends Cogn. Sci. 2008, 12, 388–396. [Google Scholar] [CrossRef]
Jackson, R.L.; Hoffman, P.; Pobric, G.; Ralph, M.A.L. The semantic network at work and rest: Differential connectivity of anterior temporal lobe subregions. J. Neurosci. 2016, 36, 1490–1501. [Google Scholar] [CrossRef]
Gennari, S.P.; Millman, R.E.; Hymers, M.; Mattys, S.L. Anterior paracingulate and cingulate cortex mediates the effects of cognitive load on speech sound discrimination. Neuroimage 2018, 178, 735–743. [Google Scholar] [CrossRef]
Ballotta, D.; Maramotti, R.; Borelli, E.; Lui, F.; Pagnoni, G. Neural correlates of emotional valence for faces and words. Front. Psychol. 2023, 14, 1055054. [Google Scholar] [CrossRef]
Hauk, O.; Johnsrude, I.; Pulvermüller, F. Somatotopic representation of action words in human motor and premotor cortex. Neuron 2004, 41, 301–307. [Google Scholar] [CrossRef] [PubMed]
Citron, F.M.M. Neural correlates of written emotion word processing: A review of recent electrophysiological and hemodynamic neuroimaging studies. Brain Lang. 2012, 122, 211–226. [Google Scholar] [CrossRef] [PubMed]
Ritchey, M.; Dolcos, F.; Eddington, K.M.; Strauman, T.J.; Cabeza, R. Neural correlates of emotional processing in depression: Changes with cognitive behavioral therapy and predictors of treatment response. J. Psychiatr. Res. 2011, 45, 577–587. [Google Scholar] [CrossRef] [PubMed]
van Ackeren, M.J.; Rueschemeyer, S.A. Cross-modal integration of lexical-semantic features during word processing: Evidence from oscillatory dynamics during EEG. PLoS ONE 2014, 9, e101042. [Google Scholar] [CrossRef]
Davey, J.; Thompson, H.E.; Hallam, G.; Karapanagiotidis, T.; Murphy, C.; De Caso, I.; Krieger-Redwood, K.; Bernhardt, B.C.; Smallwood, J.; Jefferies, E. Exploring the role of the posterior middle temporal gyrus in semantic cognition: Integration of anterior temporal lobe with executive processes. Neuroimage 2016, 137, 165–177. [Google Scholar] [CrossRef]
Poeppel, D.; Idsardi, W.J.; Van Wassenhove, V. Speech perception at the interface of neurobiology and linguistics. Philos. Trans. R. Soc. B Biol. Sci. 2008, 363, 1071–1086. [Google Scholar] [CrossRef]
Eysenck, M.W.; Moser, J.S.; Derakshan, N.; Hepsomali, P.; Allen, P. A neurocognitive account of attentional control theory: How does trait anxiety affect the brain’s attentional networks? Cogn. Emot. 2023, 37, 220–237. [Google Scholar] [CrossRef]
Xue, G.; Dong, Q.; Jin, Z.; Chen, C. Mapping of verbal working memory in nonfluent Chinese–English bilinguals with functional MRI. Neuroimage 2004, 22, 1–10. [Google Scholar] [CrossRef]
Wu, C.Y.; Ho, M.H.R.; Chen, S.H.A. A meta-analysis of fMRI studies on Chinese orthographic, phonological, and semantic processing. Neuroimage 2012, 63, 381–391. [Google Scholar] [CrossRef]
Booth, J.R.; Burman, D.D.; Meyer, J.R.; Gitelman, D.R.; Parrish, T.B.; Mesulam, M.M. Development of brain mechanisms for processing orthographic and phonologic representations. J. Cogn. Neurosci. 2004, 16, 1234–1249. [Google Scholar] [CrossRef]
Yang, X.; Zhang, X.; Yang, Y.; Lin, N. How context features modulate the involvement of the working memory system during discourse comprehension. Neuropsychologia 2018, 111, 36–44. [Google Scholar] [CrossRef]
Perfetti, C.A.; Frishkoff, G.A. The neural bases of text and discourse processing. In Handbook of the Neuroscience of Language; Elsevier: Amsterdam, The Netherlands, 2008; Volume 2, pp. 165–174. [Google Scholar]
Lavallé, L.; Brunelin, J.; Jardri, R.; Haesebaert, F.; Mondino, M. The neural signature of reality-monitoring: A meta-analysis of functional neuroimaging studies. Hum. Brain Mapp. 2023, 44, 4372–4389. [Google Scholar] [CrossRef] [PubMed]
Shackman, A.J.; Salomons, T.V.; Slagter, H.A.; Fox, A.S.; Winter, J.J.; Davidson, R.J. The integration of negative affect, pain and cognitive control in the cingulate cortex. Nat. Rev. Neurosci. 2011, 12, 154–167. [Google Scholar] [CrossRef] [PubMed]
Craig, A.D. How do you feel—Now? The anterior insula and human awareness. Nat. Rev. Neurosci. 2009, 10, 59–70. [Google Scholar] [CrossRef]
Simony, E.; Honey, C.J.; Chen, J.; Lositsky, O.; Yeshurun, Y.; Wiesel, A.; Hasson, U. Dynamic reconfiguration of the default mode network during narrative comprehension. Nat. Commun. 2016, 7, 12141. [Google Scholar] [CrossRef]
Vaccaro, A.G.; Scott, B.; Gimbel, S.I.; Kaplan, J.T. Functional brain connectivity during narrative processing relates to transportation and story influence. Front. Hum. Neurosci. 2021, 15, 665319. [Google Scholar] [CrossRef]
Sabatinelli, D.; Fortune, E.E.; Li, Q.; Siddiqui, A.; Krafft, C.; Oliver, W.T.; Beck, S.; Jeffries, J. Emotional perception: Meta-analyses of face and natural scene processing. Neuroimage 2011, 54, 2524–2533. [Google Scholar] [CrossRef]
Immordino-Yang, M.H.; McColl, A.; Damasio, H.; Damasio, A. Neural correlates of admiration and compassion. Proc. Natl. Acad. Sci. USA 2009, 106, 8021–8026. [Google Scholar] [CrossRef]
Anticevic, A.; Cole, M.W.; Murray, J.D.; Corlett, P.R.; Wang, X.-J.; Krystal, J.H. The role of default network deactivation in cognition and disease. Trends Cogn. Sci. 2012, 16, 584–592. [Google Scholar] [CrossRef]
Wicker, B.; Keysers, C.; Plailly, J.; Royet, J.-P.; Gallese, V.; Rizzolatti, G. Both of us disgusted in my insula: The common neural basis of seeing and feeling disgust. Neuron 2003, 40, 655–664. [Google Scholar] [CrossRef]
Maddock, R.J. The retrosplenial cortex and emotion: New insights from functional neuroimaging of the human brain. Trends Neurosci. 1999, 22, 310–316. [Google Scholar] [CrossRef]
Baumeister, R.F.; Bratslavsky, E.; Finkenauer, C.; Vohs, K.D. Bad is stronger than good. Rev. Gen. Psychol. 2001, 5, 323–370. [Google Scholar] [CrossRef]

Figure 1. Overall architecture of the SED-GPT model.

Figure 2. Activation comparison between Chinese semantic perception task state and resting state. (A) Sagittal view; (B) Coronal view; (C) Axial view; (D) Left hemisphere; (E) Right hemisphere.

Figure 3. PPI network analysis results. A represents the posterior part of the left superior temporal gyrus, B represents the posterior part of the right superior temporal gyrus, C represents the left frontal pole, and D represents the right frontal pole, and points with similar projection distances in the transverse plane are classified into the same cluster.

Figure 4. The chart illustrates the per-emotion advantage of the experimental group (EXP) over the random baseline (RM) for fine-grained emotion decoding. Each point represents the difference between EXP and RM (EXP−RM). Positive values indicate better decoding performance in EXP. Asterisks denote significance levels (* p < 0.05, ** p < 0.01, *** p < 0.001) based on Mann–Whitney U tests.

Figure 5. PR curves (Top-k = 3) of the decoded group and the random group.

Table 1. Demographic information of the eight participants.

ID	Age	Sex	ID	Age	Sex
01	26	M	07	26	F
02	30	F	08	23	M
05	26	M	10	25	M
06	25	M	12	24	M

Table 2. Exact composition of the 30 held-out test runs.

Subject	Runs
Sub01	09, 23, 31
Sub02	28, 29, 38, 40, 41
Sub05	19, 29, 51, 53
Sub06	13, 22, 25, 50
Sub07	03, 15, 32, 42, 46
Sub08	25, 52, 56
Sub10	12, 28, 44, 47
Sub12	02, 27

Table 3. Activation and deactivation clusters in response to task versus resting state.

Regions	x	y	z	Z-Peak	Size
Activated brain regions
FP	−26	62	−8	6.15	2225
STGpd	−60	−10	8	7.26	990
STGpd	56	4	−2	7.89	880
FP	26	68	−2	5.26	520
MFG	52	18	24	6.20	180
FP	−56	4	−2	6.41	112
Deactivated brain regions
FP	36	10	34	−3.09	730
SPL	30	−40	78	−3.09	346
PCunC	0	−38	52	−3.09	280
PoCG	−34	−48	76	−3.09	269
ParaCG	0	44	14	−3.10	249
PreCG	−52	−14	50	−3.09	236

Table 4. PPI functional connectivity index table.

Index	Regions	Index	Regions	Index	Regions
0	FMC, ParaCG	27	PreCG	54	IFGpt
1	MTG, PoCG, STG	28	LOCsup	55	ACC
2	PCG	29	MFG	56	IFGoper
3	FP	30	FOC, SFG	57	FP
4	FP	31	PCunC	58	COC, PreCG
5	TP	32	PreCG	59	FOC, SFG
6	PoCG	33	PCC	60	MTGto
7	SPL	34	PoCG	61	ITGpd, MTGpd
8	PreCG	35	PreCG	62	MFG
9	LOCsup	36	FOC, MFG	63	IFGoper
10	PreCG	37	IFGpt	64	OP
11	SCC, PoCG	38	ITGto	65	LOCsup
12	PoCG	39	FP	66	FP
13	PCunC	40	MFG	67	FP
14	PoCG	41	FP	68	IC, PoCG
15	COC, FP	42	AG	69	OP
16	PCG, PreCG	43	ParaCG	70	SCC
17	ACC, ParaCG	44	SFG	71	IC
18	SMA	45	pITG, pSMG	72	FOC
19	SPL	46	SPL	73	ACC, ParaCG
20	PCunC	47	FP	74	COC
21	FP, SFG	48	FP	75	MTGpd
22	SFG	49	PoCG	76	LOCsup
23	PoCG	50	ACC	77	PoCG
24	PoCG	51	MTGad, PreCG	78	FP
25	FP	52	FP, ParaCG
26	ParaCG, SFG	53	PoCG

Table 5. Comparison of semantic decoding performance.

Metrics	EXP	RM	U	p
BERT	0.650 ± 0.151	0.326 ± 0.074	5872	p < 0.001
ED	12.432 ± 2.896	14.528 ± 0.673	1767	p < 0.001
WER	0.924 ± 0.028	0.989 ± 0.051	862	p < 0.001

EXP denotes the experimental group. RM denotes the random group. BERT denotes BERTScore F1.

Table 6. Sensitivity analysis with available Chinese LLM embeddings.

Metrics	EXP	Chinese LLM	U	p
BERT	0.650 ± 0.151	0.529 ± 0.009	749	p < 0.001
ED	12.432 ± 2.896	14.224 ± 0.512	410	p = 0.559
WER	0.924 ± 0.028	0.958 ± 0.045	224	p < 0.001

EXP denotes the experimental group. Chinese LLM denotes the group using Chinese-native LLM Qwen2 Base to construct the decoder. BERT denotes BERTScore F1.

Table 7. Comparison of Emotion Recognition Similarity Metrics.

Metrics	EXP	RM	U	p
CS	0.504 ± 0.348	0.233 ± 0.248	645	p < 0.05
JSS	0.469 ± 0.227	0.323 ± 0.148	620	p < 0.05

EXP denotes the experimental group. RM denotes the random group.

Table 8. Comparison of fine-grained emotion decoding.

Emotions	EXP	RM	U	p
admiration	0.128 ± 0.208	0.313 ± 0.299	238	p < 0.05
amusement	0.311 ± 0.244	0.211 ± 0.154	539	p = 0.191
anger	0.246 ± 0.277	0.060 ± 0.117	654	p < 0.05
annoyance	0.429 ± 0.312	0.381 ± 0.325	488	p = 0.579
approval	0.323 ± 0.317	0.399 ± 0.255	352	p = 0.149
caring	0.260 ± 0.273	0.088 ± 0.098	589.5	p < 0.05
confusion	0.527 ± 0.304	0.335 ± 0.220	609	p < 0.05
curiosity	0.201 ± 0.258	0.175 ± 0.165	388	p = 0.363
desire	0.350 ± 0.255	0.213 ± 0.215	623	p < 0.05
disappointment	0.200 ± 0.212	0.146 ± 0.245	539	p = 0.191
disapproval	0.293 ± 0.232	0.322 ± 0.289	470.5	p = 0.767
disgust	0.506 ± 0.277	0.233 ± 0.242	699	p < 0.001
embarrassment	0.442 ± 0.302	0.175 ± 0.152	672.5	p < 0.001
excitement	0.354 ± 0.217	0.291 ± 0.171	531	p = 0.234
fear	0.420 ± 0.292	0.119 ± 0.150	732	p < 0.001
gratitude	0.271 ± 0.302	0.232 ± 0.241	456	p = 0.935
grief	0.302 ± 0.287	0.052 ± 0.091	772.5	p < 0.001
joy	0.344 ± 0.324	0.145 ± 0.187	646	p < 0.01
love	0.212 ± 0.256	0.061 ± 0.098	622	p < 0.05
nervousness	0.107 ± 0.107	0.028 ± 0.033	760	p < 0.001
neutral	0.358 ± 0.308	0.096 ± 0.174	733	p < 0.001
optimism	0.236 ± 0.242	0.231 ± 0.228	429	p = 0.762
pride	0.192 ± 0.176	0.199 ± 0.244	471	p = 0.762
realization	0.035 ± 0.059	0.039 ± 0.060	450.5	p = 0.997
relief	0.331 ± 0.284	0.255 ± 0.219	497	p = 0.492
remorse	0.276 ± 0.226	0.120 ± 0.153	659	p < 0.01
sadness	0.345 ± 0.362	0.078 ± 0.186	699.5	p < 0.001
surprise	0.370 ± 0.303	0.208 ± 0.210	579	p = 0.057

EXP denotes the experimental group. RM denotes the random group.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Cui, W.; Wang, Z.; Ma, L. SED-GPT: A Non-Invasive Method for Long-Sequence Fine-Grained Semantics and Emotions Decoding. Appl. Sci. 2025, 15, 11100. https://doi.org/10.3390/app152011100

AMA Style

Cui W, Wang Z, Ma L. SED-GPT: A Non-Invasive Method for Long-Sequence Fine-Grained Semantics and Emotions Decoding. Applied Sciences. 2025; 15(20):11100. https://doi.org/10.3390/app152011100

Chicago/Turabian Style

Cui, Wenhao, Zhaoxin Wang, and Lei Ma. 2025. "SED-GPT: A Non-Invasive Method for Long-Sequence Fine-Grained Semantics and Emotions Decoding" Applied Sciences 15, no. 20: 11100. https://doi.org/10.3390/app152011100

APA Style

Cui, W., Wang, Z., & Ma, L. (2025). SED-GPT: A Non-Invasive Method for Long-Sequence Fine-Grained Semantics and Emotions Decoding. Applied Sciences, 15(20), 11100. https://doi.org/10.3390/app152011100

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

SED-GPT: A Non-Invasive Method for Long-Sequence Fine-Grained Semantics and Emotions Decoding

Abstract

Featured Application

Abstract

1. Introduction

2. Materials and Methods

2.1. Dataset

2.2. Data Preprocessing

2.3. SED-GPT

2.4. Fine-Tuning of LLMs

2.5. Semantic to Brain Response Conversion Module

2.6. Evaluation Metrics

2.6.1. Semantic Similarity

2.6.2. Emotional Similarity

3. Results

3.1. Brain Activation and Functional Connectivity of Chinese Semantic Perception

3.1.1. Brain Activation of Chinese Semantic Perception

3.1.2. Functional Connectivity of Chinese Semantic Perception

3.2. Brain Regions Activated by Emotional Words

3.3. Semantic Decoding Performance

3.4. Emotion Decoding Performance

4. Discussion

5. Conclusions

Supplementary Materials

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI