TPHFC-Net—A Triple-Path Heterogeneous Feature Collaboration Network for Enhancing Motor Imagery Classification

Jin, Yuchen; Dou, Chunxu; Wang, Dingran; Liu, Chao

doi:10.3390/technologies14020096

¹ School of Intelligent Science and Control Engineering, Jinling Institute of Technology, Nanjing 211199, China
² College of Information Science and Technology & Artificial Intelligence, Nanjing Forestry University, Nanjing 210037, China
* Author to whom correspondence should be addressed.

Technologies 2026, 14(2), 96; https://doi.org/10.3390/technologies14020096

Submission received: 18 November 2025 / Revised: 3 January 2026 / Accepted: 7 January 2026 / Published: 2 February 2026

(This article belongs to the Special Issue Applications of Artificial Intelligence in Healthcare and Information Processing)

Abstract

Electroencephalography-based motor imagery (EEG-MI) classification is a cornerstone of Brain–Computer Interface (BCI) systems, enabling the identification of motor intentions by decoding neural patterns within EEG signals. However, conventional methods, which rely predominantly on convolutional neural networks (CNNs), are proficient at extracting local temporal features but struggle to capture long-range dependencies and global contextual information. To address this limitation, we propose a Triple-path Heterogeneous Feature Collaboration Network (TPHFC-Net), which integrates three distinct temporal modeling pathways: a multi-scale Temporal Convolutional Network (TCN) to capture fine-grained local dynamics, a Transformer branch to model global dependencies via multi-head self-attention, and a Long Short-Term Memory (LSTM) network to track sequential state evolution. These heterogeneous features are then fused adaptively by a dynamic gating mechanism. In addition, a lightweight front-end denoising diffusion model makes the feature representation more robust to noise, and a back-end prototype attention mechanism improves the inter-class separability of non-stationary EEG features. Extensive experiments on the BCI Competition IV-2a and IV-2b datasets validate the proposed model, which achieves mean classification accuracies of 82.45% and 89.49%, respectively, on the subject-dependent MI task, significantly outperforming existing mainstream baselines.

Keywords:

Brain-Computer Interface (BCI); motor imagery; Temporal Convolutional Network (TCN); Transformer; Long Short-Term Memory (LSTM)

1. Introduction

Brain–Computer Interface (BCI) technology establishes a direct communication pathway between the human brain and external devices, emerging as a transformative force in fields such as rehabilitative engineering, robotics, and cognitive neuroscience [1]. As a prominent paradigm within this domain, Motor Imagery-based BCI (MI-BCI) operates by decoding the specific neural patterns generated during imagined limb movements and holds significant promise for practical applications [2,3]. Among the various modalities for monitoring neural activity, electroencephalography (EEG) is the predominant method. It records bioelectric signals from cortical neurons non-invasively via scalp-mounted electrodes and offers a favorable balance of safety, excellent temporal resolution, and low cost, which has established it as the standard for both research and application in MI-BCI [4,5].

The technical workflow of an MI-BCI system typically involves the acquisition and preprocessing of EEG signals from a specific mental task, followed by feature extraction and classification to recognize the user’s motor intent. Despite its structured process, EEG-MI classification faces significant challenges arising from the inherent electrophysiological properties of EEG signals. At the signal level, the challenge lies in the extremely low signal-to-noise ratio (SNR), which is often below −10 dB. This poor SNR arises because the microvolt-level motor-related cortical potentials (MRCPs) are heavily contaminated by artifacts such as electromyography (EMG) and ocular interference, resulting in a weak signal that impairs the efficacy of feature extraction and classification [6]. At the feature level, the key event-related desynchronization/synchronization (ERD/ERS) patterns exhibit high non-stationarity and substantial inter-subject variability, making it difficult for traditional methods reliant on hand-crafted features like Common Spatial Patterns (CSP) or Power Spectral Density (PSD) to capture these complex non-linear dynamics [7,8]. Most critically, conventional approaches suffer from significant performance degradation across different sessions and subjects, with clinical data revealing accuracy fluctuations of up to 10–20% for such classic classifiers as Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) [9], thereby severely hampering the practical deployment of MI-BCI systems.

To overcome the heavy reliance on manual feature engineering and the poor generalization of traditional methods, deep learning has emerged as a new paradigm for motor imagery decoding in MI-BCI systems. Unlike their conventional counterparts, deep neural networks can automatically learn complex non-linear mappings and demonstrate superior robustness against inter-subject variability and low SNR. Early explorations, such as Joseph et al.’s 1994 use of neural networks to classify clinical neurophysiological data, paved the way for this shift [10]. Following this work, the advent of novel spatio-temporal convolutional architectures like Shallow ConvNet and Deep ConvNet, which directly extract ERD/ERS features from EEG $\mu/\beta$ rhythms, marked the transition of EEG-MI classification to an era of end-to-end modeling [11]. Subsequent research has focused on architectural refinements to advance classification performance. EEGNet introduced depthwise separable convolutions to reduce model complexity [12], while EEG-TCNet enhanced temporal dependency modeling by integrating a Temporal Convolutional Network (TCN) into the EEGNet framework [13]. Further advancements came from models like CIACNet [14], SMT [15], and ASiBLS [16], which augmented the TCN with techniques such as multi-branch structures, multi-scale convolutions, and attention mechanisms. Despite these improvements, the fixed-dilation convolutions of TCNs still struggle to capture the aperiodic rhythms, temporal jitter, and cross-phase latency variations inherent in EEG signals. This limitation has spurred the development of hybrid modeling architectures. One prominent direction involves integrating the Transformer’s self-attention mechanism with the TCN, enabling models like EEG-Conformer [17], M-FANet [18] and MCTD [19] to synergistically model both local rhythms and long-range dependencies. Another approach leverages the Long Short-Term Memory (LSTM) architecture for its capacity to track dynamic state evolution. By combining convolutions with the gated units of LSTM, models such as CNN-LSTM [20] and FBLSTM [21] have improved the representation of phase-like changes in EEG signals, thereby boosting the accuracy of classification.

Owing to their distinct designs, different temporal modeling architectures present unique strengths and limitations in capturing the complex temporal features of EEG signals. TCN, for instance, is capable of efficiently extracting short-term local features via dilated convolutions, making it well-suited for transient responses like ERD/ERS in $\mu/\beta$ rhythms. However, its fixed receptive field is ill-equipped for non-uniform rhythmic variations. Transformer excels at modeling global dependencies using self-attention but is less sensitive to abrupt local events. Meanwhile, LSTM adeptly tracks gradual rhythmic trends by modeling state evolution, yet its inherent short-term memory restricts its capacity for global dependency modeling. The complementary nature of these architectures provides a compelling foundation for building robust, high-accuracy classification models. Nevertheless, naive fusion strategies have yielded limited gains. TCN-Transformer hybrids often suffer from poor modular coupling and inadequate fusion of global and local features [22]. Similarly, TCN-LSTM models, constrained by their serial design and parameter redundancy, struggle to simultaneously capture both long-range dependencies and sudden local rhythms [23]. These limitations reveal a critical insight: merely stacking modules is insufficient to unlock their respective structural advantages. This underscores the need for a sophisticated synergistic framework that organically integrates and dynamically complements the local perception of TCN, the global correlation of Transformer, and the state tracking of LSTM. Such a framework promises a more profound and comprehensive modeling of the complex temporal dynamics within EEG-MI signals.

Building upon the insights above, we introduce the Triple-path Heterogeneous Feature Collaboration Network (TPHFC-Net), an end-to-end model for enhanced motor imagery classification. As depicted in Figure 1, TPHFC-Net is architected with a four-stage progressive framework: (1) Progressive Feature Extractor (PFE): A composite front-end, integrating multi-scale temporal and depthwise separable convolutions, performs initial feature extraction. Subsequently, a denoising diffusion model is employed to bolster the noise robustness of these features. (2) Triple-Path Collaborative Temporal Architecture (TPCTA): The features extracted by PFE are channeled into three parallel streams and processed independently by TCN, Transformer, and LSTM modules. This tripartite design concurrently captures local rhythmic dynamics, models global cross-stage dependencies, and tracks the continuous evolution of signal states. (3) Dynamic Gating Fusion Module (DGFM): A dynamic gating mechanism adaptively learns the importance weights of the heterogeneous features from each stream, followed by a weighted sum fusion that achieves synergistic complementarity and yields an optimized, unified representation. (4) Prototype-Guided Classifier (PGC): In the final stage, a prototype-based attention mechanism guides features toward their corresponding class centers to enhance inter-class separability, after which a fully connected layer performs the final classification.

The primary contributions of this work are threefold:

We introduce a synergistic triple-path temporal modeling mechanism that concurrently leverages TCN, Transformer, and LSTM. This approach holistically models the short-term, global, and state-evolutionary characteristics of EEG-MI signals, thereby enhancing the representational power of the model.
We architect TPHFC-Net, an end-to-end neural network featuring a four-stage progressive framework for accurate motor intent recognition from EEG-MI signals.
We conduct comprehensive experiments on the BCI Competition IV-2a and IV-2b datasets under both subject-dependent and subject-independent settings, demonstrating that the proposed TPHFC-Net consistently outperforms mainstream baseline methods in EEG-MI classification.

2. Related Works

2.1. Classification of Motor-Imagery EEG

EEG-MI classification is a task fundamentally composed of two stages: data preprocessing, followed by model construction and training. The primary goal of preprocessing is to enhance the quality of raw EEG data, a critical step that directly governs the accuracy of all subsequent feature extraction and classification processes. Standard preprocessing techniques include filtering (e.g., band-pass filtering to mitigate high-frequency noise and baseline drift), artifact removal (to correct for ocular interference) and signal normalization, all of which serve to improve the SNR [24,25]. Once the data is cleaned, the subsequent stage involves constructing and training a model to extract discriminative features and ultimately classify the user’s intended motor task. It should be noted that the design of this model must be closely adapted to the intrinsic characteristics of the EEG signal.

In the context of MI, EEG signals present three crucial temporal characteristics: (1) Short-term local features: MI-induced neural phenomena, such as the suppression of $\mu$ (8–13 Hz) and $\beta$ (13–30 Hz) rhythms, are often transient and concentrated within narrow time windows. Capturing these rapid, localized dynamics is therefore essential for EEG-MI classification [26]. (2) Global dependency features: Complex imagined movements can elicit synergistic activity across distant brain regions, characterized by large temporal spans and strong global interdependencies. This manifests as phenomena like delayed synchronization and time-lagged coupling between signals from central and parietal areas [27]. (3) State-evolutionary features: The MI process itself is not static but unfolds through distinct phases (e.g., preparation, initiation, maintenance, termination). This phased evolution results in an unstable temporal structure where rhythms evolve slowly, exhibiting clear state continuity and long-range temporal dependencies. For instance, the $\beta$ rhythm might be enhanced during initiation and return to baseline upon termination [28].

To address the challenge of EEG-MI classification, early research heavily relied on traditional machine learning pipelines. A typical workflow would involve first handcrafting features from the preprocessed data such as frequency energy, PSD or CSP, and then feeding them into a classic classifier like LDA, SVM, or k-Nearest Neighbors (k-NN) [29,30]. Among these, the Filter Bank Common Spatial Pattern (FBCSP) framework, proposed by Kai et al. [31], is arguably the most iconic. By integrating multi-band filtering with spatial feature extraction, FBCSP proved highly effective at enhancing task-relevant discriminative patterns and has seen widespread adoption in practical MI-BCI systems [32].

Despite their successes, these traditional methods’ heavy reliance on handcrafted features makes them struggle to capture the intricate, non-linear temporal structures inherent in EEG-MI signals, such as the short-term local features, global dependencies, and state-evolutionary dynamics discussed earlier. Furthermore, these methods are often sensitive to noise and exhibit poor generalization. Consequently, they are increasingly unable to meet the stringent demands for accuracy and robustness required in practical application scenarios.

2.2. Motor-Imagery Classification with CNN

The limitations of traditional machine learning spurred the adoption of deep learning, with CNNs yielding significant performance gains in MI classification. Pioneering work by Joseph et al. [10] in 1994 first applied neural networks to clinical neurophysiological data, developing a classification model that not only outperformed LDA but also demonstrated a distinct advantage in processing non-linear features. Though constrained by the computational bottlenecks and prohibitive training times of the era, this research paved the way for the later dominance of deep learning in MI classification.

A major breakthrough was the design of Shallow ConvNet by Schirrmeister et al. [11], inspired by the highly successful FBCSP method. This model emulated FBCSP’s core components, using temporal convolutions to replicate band-pass filtering and spatial convolutions to mimic CSP’s spatial transformations. It thereby isolated frequency-specific features and highlighted critical channel combinations, yielding accuracies that rivaled or surpassed FBCSP and firmly validating the effectiveness of CNNs in this domain. The drive for efficiency led Lawhern et al. [12] to develop EEGNet, a compact CNN architecture tailored for EEG data. By employing depthwise separable convolutions to decouple temporal and spatial feature learning, EEGNet drastically reduced its parameters while maintaining accuracy comparable to much larger models, establishing it as a cornerstone for lightweight MI classification. Building upon this foundation, Riyad et al. [33] proposed MI-EEGNet, which integrated an Inception-style architecture for multi-scale feature extraction and an Xception-like structure for enhanced modeling efficiency, thereby achieving stronger generalization across multiple datasets. A subsequent paradigm shift came when Ingolfsson et al. [13] addressed the persistent issue of causality in time-series modeling. Their EEG-TCNet model marked the first systematic application of TCN to MI classification. By incorporating causal and dilated convolutions, EEG-TCNet efficiently captured long-range temporal dependencies within a compact framework, solidifying TCN as a mainstream technique by delivering high accuracy with minimal parameters.

The advent of EEG-TCNet spurred a new wave of research aimed at extending and refining its architecture along three main lines. One major thrust was architectural innovation. CIACNet by Liao et al. [14] introduced a dual-branch convolutional structure with an enhanced convolutional block attention module (CBAM), empowering the TCN to model temporal features at varying semantic levels. Another approach, seen in ASiBLS by Yang et al. [16], employed a primary-auxiliary branch design to extract global and differential features, using a similarity-guided loss to foster complementary learning and boost generalization. A second area of focus was the optimization of convolutional units to better capture multi-scale features. Salami et al. [34] augmented the TCN with Inception modules in their EEG-ITNet model, enabling joint spectral-temporal modeling and significantly improving cross-subject recognition. Similarly, the SMT model from Yu et al. [15] featured a multi-branch separable convolution (MSC) module, where parallel branches with different kernel sizes captured short- and long-term temporal patterns that were subsequently integrated by a unified TCN. The integration of attention mechanisms emerged as a third key strategy for refining feature relevance. For example, ETCNet by Qin et al. [35] combined an Efficient Channel Attention (ECA) module with a TCN. In this design, the ECA module first refines channel-wise representations, which the TCN then processes for temporal modeling, ultimately yielding higher classification accuracy.

As this body of work illustrates, the exceptional capacity of TCN for modeling local temporal dynamics solidifies it as a cornerstone of modern MI classification. Consequently, innovating upon this TCN foundation, whether through novel architectures, advanced feature modeling techniques, or other enhancements, remains the primary frontier for advancing the accuracy, generalization, and robustness of MI classification.

2.3. TCN Combined with Transformer/LSTM

While TCNs demonstrate a marked ability to extract local temporal features from EEG signals, such as ERD/ERS, their inherent fixed receptive fields constrain the capacity to model long-range dependencies. This limitation makes it difficult for TCN to effectively capture cross-phase, long-term dynamic information within EEG signals. To circumvent this, some studies have integrated the self-attention mechanism of the Transformer model, which can directly model global dependencies across arbitrary time points, thereby enabling a sharper focus on critical temporal information. Song et al. [17] introduced the EEG Conformer model, which combines convolutional modules for local feature extraction with a Transformer to capture long-distance temporal dependencies, thus judiciously balancing local and global feature modeling capabilities. Expanding on this, Qin et al. [18] developed M-FANet, which incorporates multiple attention mechanisms to selectively emphasize frequency, spatial, and feature map dimensions for comprehensive multi-feature extraction, while simultaneously using regularization to suppress feature redundancy and bolster robustness and generalization. Furthermore, researchers have explored extending single convolutions to multi-scale variants, integrating them with Transformer to further enhance the model’s ability to represent EEG temporal characteristics. Hang et al. [19] presented the MCTD model, which extracts local features across diverse frequency ranges using dynamic convolutions, subsequently employing self-attention to model global temporal dependencies, thereby enriching the model’s capacity to express complex temporal features. In comparison, Zhu et al. [36] proposed IMCTNet, which adopts a more sophisticated multi-scale convolutional architecture and incorporates a channel attention mechanism to adaptively augment the representation capability of features at different scales, ultimately demonstrating superior feature expression and generalization performance.

Despite the Transformer’s notable strengths in modeling global dependencies, it still lacks an efficient mechanism for capturing the dynamic state evolution processes inherent in EEG signals. Long Short-Term Memory (LSTM) networks, as time-series modeling architectures endowed with memory mechanisms, are proficient at continuously tracking rhythmic changes in brain electrical signals. This makes them particularly well-suited for characterizing the evolving patterns from the initiation to the termination phases within MI tasks. Consequently, researchers have also endeavored to incorporate LSTMs to bolster models’ ability to characterize signal state evolution features. Early investigations by Saputra et al. [37] directly applied LSTMs for classification following CSP feature extraction to verify their basic utility; however, their experimental results revealed suboptimal adaptability to complex and high-noise EEG signals. Ghinoiu et al. [20] subsequently introduced a CNN-LSTM-based architecture that leverages convolutional layers to directly extract spatial features from multi-channel EEG signals, with LSTM then modeling their temporal evolution. This hybrid approach considerably enhanced the models’ joint spatio-temporal modeling capabilities. Gui et al. [21] designed the FBLSTM model, which utilizes filter banks for multi-frequency band information extraction, integrates convolutions for spatial feature extraction, and then employs an attention-equipped LSTM module to model temporal variations. This holistic strategy facilitates the joint learning of frequency, spatial, and temporal domain information, thereby effectively enhancing the synergistic expressive power across multi-modal features.

Evidently, constructing hybrid temporal feature modeling structures that judiciously integrate TCN, Transformer, and LSTM, by capitalizing on their complementary strengths, will enable the comprehensive, joint modeling of short-term local features, global dependencies, and state evolution characteristics of EEG signals. Such an approach holds significant promise for yielding more flexible, refined, and accurate temporal feature representations and classification capabilities.

3. Methodology

In this section, we detail the proposed end-to-end neural network model based on Triple-path Heterogeneous Feature Collaboration (TPHFC-Net). It adopts a four-stage progressive architecture: a Progressive Feature Extractor (PFE) with integrated noise diffusion modeling, a Triple-path Collaborative Temporal Architecture (TPCTA) comprising TCN, Transformer, and LSTM, a Dynamic Gating Fusion Module (DGFM) for adaptive fusion of heterogeneous temporal features, and a Prototype-Guided Classifier (PGC) with a prototype attention mechanism. The overall preprocessing, model construction, and training pipeline of the proposed model is illustrated in Figure 2.

3.1. Data Pre-Processing

The initial pre-processing stage focuses on preparing the raw EEG signals while retaining their intrinsic spectral characteristics. Specifically, a broad-band Finite Impulse Response (FIR) filter with a passband of 0.5–100 Hz is applied to remove extremely low-frequency baseline drift and suppress high-frequency noise, while largely preserving the original EEG information content. Owing to its inherent linear-phase property, the FIR filter ensures that the temporal structure of the EEG signals remains intact without introducing phase distortion.
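The filtering step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the windowed-sinc design, the Hamming window, the filter length of 1651 taps, and the 250 Hz sampling rate are all assumptions chosen for illustration, and the function names are hypothetical. The sketch also assumes that filtering is applied to a continuous recording that is longer than the filter, before trials are cut out.

```python
import numpy as np

def design_bandpass_fir(low_hz, high_hz, fs, num_taps):
    """Windowed-sinc band-pass FIR. Symmetric taps give a linear phase response."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the delay is a whole number of samples")
    n = np.arange(num_taps) - (num_taps - 1) // 2  # sample index centred on 0

    def ideal_lowpass(cutoff_hz):
        fc = cutoff_hz / fs  # cutoff in cycles per sample
        return 2.0 * fc * np.sinc(2.0 * fc * n)

    # Band-pass = difference of two low-pass responses, tapered by a Hamming window.
    return (ideal_lowpass(high_hz) - ideal_lowpass(low_hz)) * np.hamming(num_taps)

def filter_zero_delay(x, taps):
    """Filter each channel of x (channels, samples); samples must be >= len(taps).

    mode="same" keeps the centre of the full convolution, which cancels the
    constant (num_taps - 1) / 2 sample delay of the linear-phase filter.
    """
    return np.stack([np.convolve(channel, taps, mode="same") for channel in x])

# Assumed values for illustration only.
FS_HZ = 250.0
taps = design_bandpass_fir(0.5, 100.0, FS_HZ, num_taps=1651)
```

A long filter is used here because, for a Hamming window, the transition width is roughly 3.3 × fs / num_taps, so a 0.5 Hz lower edge at 250 Hz needs on the order of a thousand taps or more. Samples within half a filter length of either end of the recording are affected by zero padding and are best discarded.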

This research utilizes the BCI Competition IV-2a and IV-2b datasets, each of which contains recordings from nine subjects. In the BCI Competition IV-2a dataset, each subject completed 72 trials for each of four distinct MI tasks: left hand, right hand, feet, and tongue. Each trial comprises $T = 1000$ time samples recorded from $C = 22$ EEG channels. Consequently, the labeled sample set for any given subject can be defined as $S^{k} = \{(X_{i}^{k}, y_{i}^{k})\}_{i=1}^{M}$, where $X_{i}^{k} \in \mathbb{R}^{C \times T}$ is the data matrix of subject $k$ for the $i$-th trial, $y_{i}^{k}$ is the corresponding class label from {left hand, right hand, feet, tongue}, and $M = 288$ is the total number of trials per subject. In contrast, the BCI Competition IV-2b dataset also comprises EEG recordings from nine subjects but focuses on a binary MI classification task involving left-hand and right-hand imagery. Each trial comprises $T = 1000$ time samples recorded from $C = 3$ EEG channels (C3, Cz, and C4), which are closely associated with motor-related cortical activity. The corresponding labeled sample set provides a complementary evaluation scenario characterized by limited spatial information.

To finalize the pre-processing pipeline, the labeled sample set undergoes global Z-score normalization to ensure a consistent data distribution across all channels and time points, followed by reshaping into the required tensor format to yield a pre-processed dataset ready for model training.
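As a rough illustration of this final step, the sketch below standardizes a stack of trials with a single mean and standard deviation and adds a singleton feature-map axis. The function names are hypothetical, the (M, 1, C, T) layout is an assumption about the "required tensor format", and the choice of which trials the statistics are computed over (for example, the training set only) is left open because the text does not specify it.

```python
import numpy as np

def global_zscore(trials, eps=1e-8):
    """Standardize with one mean and one std taken over trials, channels, and time."""
    mean = trials.mean()
    std = trials.std()
    return (trials - mean) / (std + eps)

def to_model_tensor(trials):
    """Reshape (M, C, T) to (M, 1, C, T) by adding a singleton feature-map axis (assumed layout)."""
    return trials[:, np.newaxis, :, :]

# Synthetic stand-in for a few IV-2a trials: C = 22 channels, T = 1000 samples.
rng = np.random.default_rng(0)
raw_trials = rng.normal(loc=3.0, scale=12.0, size=(16, 22, 1000))
model_input = to_model_tensor(global_zscore(raw_trials))
```

Using one global statistic, rather than per-channel statistics, preserves the relative amplitude differences between channels, which is one plausible reason to prefer it for spatial filtering layers.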

3.2. Progressive Feature Extractor

The proposed PFE derives information-dense, dimensionally compact, and robust spatiotemporal features from the high-dimensional raw data through a three-stage process: decoupling of spatiotemporal features, multi-scale pattern capture, and diffusion-driven feature enhancement. This process ultimately generates an optimized feature tensor for the subsequent triple-path collaborative temporal architecture.

3.2.1. Decoupling of Spatiotemporal Features

For a given input sample tensor $X_{input} \in \mathbb{R}^{C \times T}$, the spatiotemporal decoupling process commences by applying a 2D temporal convolution layer with a (1, 32) kernel and $L = 16$ output channels to capture localized temporal patterns, yielding an initial feature map $F_{init} \in \mathbb{R}^{L \times C \times T}$. To prevent premature entanglement of spatiotemporal information, this map is then fed into a depthwise separable convolution module consisting of a depthwise spatial convolution and a pointwise convolution. Specifically, the depthwise layer utilizes a (C, 1) kernel to independently model inter-channel spatial relationships at each time step. This is followed by a pointwise layer that facilitates cross-channel information exchange and expands the feature channel dimension from $L = 16$ to $G = 32$. The final output of this decoupling module is formulated as:

$$F_{DST} = \mathrm{ELU}(W_{p} \cdot (W_{s} \cdot \mathrm{BatchNorm}(F_{init}))) \tag{1}$$

where $F_{DST} \in \mathbb{R}^{G \times T}$, and $W_{s}$ and $W_{p}$ represent the kernel weights for the depthwise spatial convolution and pointwise convolution, respectively. This design ensures that each output feature is a non-linear combination of all input channel features, achieving effective cross-channel fusion while preserving the critical separation of spatiotemporal information.
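To make the tensor shapes in Equation (1) concrete, the following NumPy sketch traces a single trial through the three operations. It is a sketch under stated assumptions rather than the authors' code: the temporal convolution uses zero "same" padding so that the time length stays at T, the depthwise layer has one spatial filter per feature map, BatchNorm is approximated by per-map standardization without learned scale and shift, and all function names and random weights are hypothetical.

```python
import numpy as np

def elu(x):
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0)))

def temporal_conv(x, kernels):
    """x: (C, T), kernels: (L, K). Cross-correlation along time with zero 'same' padding -> (L, C, T)."""
    k = kernels.shape[1]
    left = (k - 1) // 2
    padded = np.pad(x, ((0, 0), (left, k - 1 - left)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, k, axis=1)  # (C, T, K)
    return np.einsum("ctk,lk->lct", windows, kernels)

def batch_norm(f, eps=1e-5):
    """Standardize each feature map over (C, T); a stand-in for BatchNorm on one sample."""
    mean = f.mean(axis=(1, 2), keepdims=True)
    var = f.var(axis=(1, 2), keepdims=True)
    return (f - mean) / np.sqrt(var + eps)

def decouple(x, w_t, w_s, w_p):
    """Eq. (1): F_DST = ELU(W_p . (W_s . BatchNorm(F_init)))."""
    f_init = temporal_conv(x, w_t)                               # (L, C, T)
    spatial = np.einsum("lct,lc->lt", batch_norm(f_init), w_s)   # depthwise (C, 1) kernel -> (L, T)
    return elu(w_p @ spatial)                                    # pointwise L -> G: (G, T)

# Shapes from the text: C = 22, T = 1000, L = 16, G = 32, temporal kernel (1, 32).
rng = np.random.default_rng(0)
C, T, L, G, K = 22, 1000, 16, 32, 32
x_input = rng.standard_normal((C, T))
f_dst = decouple(
    x_input,
    rng.standard_normal((L, K)),   # temporal kernels (random placeholders)
    rng.standard_normal((L, C)),   # one spatial filter per feature map
    rng.standard_normal((G, L)),   # pointwise mixing weights
)
```

The depthwise step collapses the electrode axis separately for each temporal feature map, and only the pointwise step mixes maps, which is what keeps the temporal and spatial filtering factorized.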

3.2.2. Multi-Scale Pattern Capture

The core of our multi-scale pattern capture strategy is the Temporal Inception module, which enhances feature richness and discriminative power by employing multiple parallel temporal convolution paths. These paths utilize different kernel sizes to achieve varying receptive fields, enabling efficient modeling of multi-temporal resolution features within the signal. For computational efficiency and to broaden the temporal context, the process begins with an average pooling layer with a kernel size of (1, 8) to compress the time dimension from $T = 1000$ to $T' = 125$. The pooled feature map $F'_{DST} \in \mathbb{R}^{G \times T'}$ is then fed into four parallel branches with a unified output channel dimension $H = G / 4$ to capture temporal patterns at different time scales: three grouped-convolutional branches with kernel sizes of (1, $K_{i}$) ($K_{i} \in \{3, 5, 7\}$) and a max-pooling branch with a kernel size of (1, $K_{p}$) ($K_{p} = 3$). Unlike standard convolutions, these convolutional branches employ grouped convolutions by dividing input channels into $H$ groups for independent computation, which significantly reduces the parameter count. The outputs of the four branches, $P_{i} \in \mathbb{R}^{H \times T'}$, can be expressed as:

$$P_{i}[h, t] = \begin{cases} \mathrm{Dropout}\left(\mathrm{ELU}\left(\sum_{k=1}^{K_{i}} W_{i} \cdot F'_{DST}\right)\right), & i = 1, 2, 3 \\ \mathrm{Dropout}\left(\mathrm{ELU}\left(W_{i} \cdot \mathrm{MaxPooling}(F'_{DST})\right)\right), & i = 4 \end{cases} \tag{2}$$

Similarly, a second average pooling operation further compresses the time dimension to $T'' = 15$. This progressive dimensionality reduction strategy ensures that critical discriminative information is effectively encoded into the feature representation prior to extensive dimensionality reduction. Finally, the resulting feature tensors from these four branches are concatenated along the channel dimension, forming a unified and comprehensive multi-scale feature representation $F_{MMC} \in \mathbb{R}^{G \times T''}$, denoted as:

$$F_{MMC} = \mathrm{AvgPooling}(\mathrm{ELU}(\mathrm{Concat}(P_{1}, P_{2}, P_{3}, P_{4}))) \tag{3}$$
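The shape bookkeeping behind Equations (2) and (3) can be checked with a short NumPy sketch. Several details are assumptions because the text does not state them: the second average pooling is taken to use a kernel of 8 with incomplete windows dropped (125 // 8 = 15), the convolutions and the max pooling use stride 1 with "same" padding so that all branches keep length T', the pooling-branch weight is a pointwise projection from G to H channels, and Dropout is omitted as at inference time. Function names and random weights are hypothetical.

```python
import numpy as np

def elu(x):
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0)))

def avg_pool_time(x, k):
    """Non-overlapping average pooling along time; samples that do not fill a window are dropped."""
    t_out = x.shape[-1] // k
    return x[..., : t_out * k].reshape(*x.shape[:-1], t_out, k).mean(axis=-1)

def windows_same(x, k, pad_value=0.0):
    """Sliding windows of length k along time with 'same' padding -> (..., T, k)."""
    left = (k - 1) // 2
    pad = [(0, 0)] * (x.ndim - 1) + [(left, k - 1 - left)]
    padded = np.pad(x, pad, constant_values=pad_value)
    return np.lib.stride_tricks.sliding_window_view(padded, k, axis=-1)

def grouped_conv(x, w):
    """x: (G, T), w: (H, G // H, K). Each output channel only sees its own group of inputs."""
    h, per_group, k = w.shape
    win = windows_same(x, k).reshape(h, per_group, x.shape[-1], k)
    return np.einsum("hgtk,hgk->ht", win, w)

def temporal_inception(f_dst, conv_weights, w_pool):
    f = avg_pool_time(f_dst, 8)                                      # (G, 1000) -> (G, 125)
    branches = [elu(grouped_conv(f, w)) for w in conv_weights]       # three branches of (H, 125)
    pooled = windows_same(f, 3, pad_value=-np.inf).max(axis=-1)      # max pooling, stride 1
    branches.append(elu(w_pool @ pooled))                            # fourth branch: (H, 125)
    return avg_pool_time(elu(np.concatenate(branches, axis=0)), 8)   # Eq. (3): (G, 15)

rng = np.random.default_rng(0)
G, H, T = 32, 8, 1000
f_dst = rng.standard_normal((G, T))
conv_weights = [0.1 * rng.standard_normal((H, G // H, k)) for k in (3, 5, 7)]
w_pool = 0.1 * rng.standard_normal((H, G))
f_mmc = temporal_inception(f_dst, conv_weights, w_pool)
```

With G = 32 and H = 8, each grouped branch holds H × (G / H) × K = 32K weights, compared with H × G × K = 256K for an ungrouped convolution with the same output width, which illustrates the parameter saving mentioned above.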

3.2.3. Diffusion-Driven Feature Enhancement

To address the challenges of significant noise and inter-subject/session variability inherent in EEG signals, we introduce a diffusion-driven feature enhancement mechanism. In contrast to conventional methods like Dropout and Additive Noise that inject static noise, this mechanism dynamically adapts to the feature state by iteratively refining the noise distribution, thereby enabling a more robust recovery of the underlying signal representation. The mechanism operates iteratively, with each iteration comprising a forward noising phase and a reverse denoising phase based on the Denoising Diffusion Probabilistic Model (DDPM). In the forward phase, controlled Gaussian noise $\varepsilon_{t}$ is injected into the input feature map $F_{MMC}$ to construct a noisy version $\hat{F}_{t}$:

$$\hat{F}_{t} = \sqrt{\alpha_{t}} F_{MMC} + \sqrt{1 - \alpha_{t}} \, \varepsilon_{t}, \quad t \in \{1, 2, \dots, T_{S}\} \tag{4}$$

where $\alpha_{t} = 1 - \beta_{t}$ is defined as the fidelity coefficient, $\beta_{t}$ is a noise scale parameter following a predefined linear schedule, and $t$ represents the iteration timestep. The reverse phase employs a lightweight network $f_{\theta}$ to estimate the injected noise $\varepsilon_{pred} = f_{\theta}(\hat{F}_{t})$, which is then used to progressively denoise the feature map:

$$\varepsilon_{t-1} = \varepsilon_{t} - \frac{\beta_{t}}{\sqrt{1 - \alpha_{t}}} \cdot \varepsilon_{pred} \tag{5}$$

After $T_{S}$ rounds of iterative denoising, this module obtains the final noise correction result $\varepsilon_{0}$ and yields the final feature enhancement term, which is integrated back into the original feature map via a residual connection:

$$F_{aug} = F_{MMC} + \lambda_{d} \cdot \sigma(F_{MMC}) \cdot \varepsilon_{0} \tag{6}$$

This formulation scales the final injected noise based on both a fixed hyperparameter $\lambda_{d} = 0.1$ and the standard deviation of the original features $\sigma(F_{MMC})$. This adaptively matches the perturbation’s energy to the feature’s intrinsic scale, functioning as a stable and effective regularization method.

Through the entire progressive feature extraction process, the raw input data is compressed and encoded into a highly compact and refined feature tensor $F_{aug} \in \mathbb{R}^{G \times T''}$.

3.3. Triple-Path Collaborative Temporal Architecture

Upon completion of the progressive feature extraction, the model engages its core computational engine: the Triple-Path Collaborative Temporal Architecture (TPCTA). The TPCTA is founded on the premise that any single temporal modeling paradigm has inherent inductive biases, preventing it from comprehensively capturing all dependencies within a signal [38]. To overcome this, the TPCTA deploys three parallel paths tailored to target the distinct temporal properties coexisting in EEG signals: short-term local patterns, long-range global dependencies, and continuous state evolution. This multi-path approach ensures a holistic and robust representation of the signal’s intricate temporal characteristics.

3.3.1. Lite-MSTCN: Capturing Local Multi-Scale Dependencies

The first path, Lite-MSTCN, leverages a multi-scale TCN to capture local dependencies across various time scales, such as the

μ

/

β

rhythmic signatures in EEG signals. Unlike traditional TCNs that sequentially stack dilated convolutions, Lite-MSTCN employs a parallel structure to broaden its multi-scale perception and incorporates an attention mechanism for adaptive, cross-scale feature integration. Initially, a lightweight convolution (LiteConv) layer performs a foundational transformation on the input feature tensor

F_{aug}

. By synergizing depthwise and pointwise convolutions for intra-channel temporal modeling and inter-channel feature integration, respectively, which are followed by a channel shuffle operation to enhance cross-channel information exchange, this design boosts the model’s feature representation while markedly reducing computational overhead. The transformed feature tensor

F_{base} \in R^{G \times T^{″}}

is then channeled into three parallel dilated convolution branches. For branch

i \in {0, 1, 2}

, the dilation factor

d_{i} = 2^{i}

increases exponentially while the kernel size remains fixed as

K_{b} = 3

. This architectural choice allows the model to achieve an exponentially expanding receptive field without added parametric complexity, facilitating the efficient capture of temporal dependencies at diverse scales. The output of each branch

B_{i} \in R^{G \times T^{″}}

is further refined through Batch Normalization and a GELU activation function:

B_{i} = GELU (BN (CausalConv1D (F_{base}, K_{b}, d_{i})))

(7)

where CausalConv1D() represents a causal convolution.
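To make the causal, dilated convolution concrete, the following NumPy sketch applies one branch to a single channel and computes the receptive field of each branch. It assumes left zero-padding and a tap ordering in which `kernel[0]` multiplies the current sample; both are illustrative conventions rather than details taken from the model, and Batch Normalization and GELU are omitted.

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation):
    """x: (T,) signal, kernel: (K,) taps; output[t] uses x[t], x[t-d], x[t-2d], ..."""
    k = len(kernel)
    pad = (k - 1) * dilation                  # left padding keeps length and causality
    xp = np.concatenate([np.zeros(pad), x])
    out = np.zeros(len(x))
    for j, w in enumerate(kernel):
        shift = j * dilation                  # tap j looks back j * dilation samples
        out += w * xp[pad - shift: pad - shift + len(x)]
    return out

def receptive_field(kernel_size, dilation):
    """Number of input samples seen by one output sample of a single layer."""
    return 1 + (kernel_size - 1) * dilation

# Three parallel branches with K_b = 3 and d_i = 2**i
fields = [receptive_field(3, 2 ** i) for i in range(3)]   # [3, 5, 9]
```

With a fixed kernel size of 3, each branch has the same number of weights, yet the span covered grows from 3 to 9 samples as the dilation doubles.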

Departing from the static fusion methods (e.g., summation or concatenation) of conventional TCNs, Lite-MSTCN introduces a lightweight channel attention module to dynamically fuse the multi-scale features. The mechanism computes a global context vector via temporal average pooling, which then informs a compact two-layer convolutional network to generate adaptive attention weights

W_{DC}^{i}

for the three parallel branches. The final output of Lite-MSTCN

H_{TCN} \in R^{G \times T^{″}}

is a dynamically weighted combination of the branch features, tailored to the specific characteristics of

B_{i}

:

H_{TCN} = \sum_{i = 0}^{2} W_{DC}^{i} \cdot B_{i}

(8)

3.3.2. Lite-Transformer: Capturing Global Contextual Dependencies

To capture global contextual dependencies that extend beyond the fixed receptive fields of TCNs, such as the long-range association between task cues and motor execution, the TPCTA incorporates a second path: a lightweight Transformer (Lite-Transformer). Standard Transformers are prone to overfitting when applied to short-sequence, small-sample EEG datasets, primarily due to their lack of inductive bias. To mitigate this issue, Lite-Transformer fortifies the standard architecture by incorporating convolutional inductive biases and a dynamic gating mechanism. Distinct from variants that rely solely on self-attention or convolutional bias, Lite-Transformer introduces a dynamic fusion mechanism that orchestrates a parallel interplay between global self-attention and local convolutional attention. This allows the model to capture global context while retaining sensitivity to local rhythmic patterns, enhancing its adaptability to non-stationary EEG signals.

The process begins by projecting the input tensor

F_{aug}

into a stable feature space via a

1 \times 1

convolution and BatchNorm1d layer, yielding the projected feature map:

F_{MAP} = BN (W_{MAP} \cdot F_{aug})

(9)

where

F_{MAP} \in R^{C_{T} \times G}

, and

C_{T}

is the number of Transformer channels.

F_{MAP}

is then fed into two parallel branches within Lite-Transformer. The global context branch employs multi-head self-attention (

H_{head} = 4

) to capture non-local dependencies across the entire sequence:

F_{TRANS}^{g} = MultiHeadAttention (F_{MAP}^{T})

(10)

Concurrently, the local structure branch processes the tensor

F_{MAP}

with a LiteConv module. This step is crucial for injecting key convolutional inductive biases (e.g., translation invariance) into the model, enabling a more robust extraction of local structural features:

F_{TRANS}^{l} = {(LiteConv (F_{MAP}))}^{T}

(11)

Finally, a Linear Attention Gating unit composed of a multi-layer perceptron and a Sigmoid function is utilized to perform a weighted fusion of the features from the two branches. Critically, this unit takes the output of the global context branch as its input to dynamically generate a gating value between 0 and 1 for each time step and feature dimension, which then modulates the combination of the global and local feature streams:

H_{TRANS} = {(G (F_{TRANS}^{g}) ⊙ F_{TRANS}^{g} + (1 - G (F_{TRANS}^{g})) ⊙ F_{TRANS}^{l})}^{T}

(12)

where

G (\cdot)

and ⊙ represent linear gating operation and element-wise multiplication operation, respectively. This design empowers the model to autonomously arbitrate between the discriminative global context from self-attention and the robust local features from convolutions, based on the input data pattern, thereby resulting in a dynamic and complementary synergy between the two modeling paradigms.
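The gating in Equation (12) is a per-element convex combination of the two branches. The NumPy sketch below illustrates this under simplifying assumptions: the gating unit G is reduced to a single linear layer followed by a Sigmoid (the parameters `w` and `b` are hypothetical), and the transposes are dropped by keeping both inputs in a time-major layout.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(f_global, f_local, w, b):
    """f_global, f_local: (T, C); w: (C, C) and b: (C,) are a stand-in for G."""
    gate = sigmoid(f_global @ w + b)          # gate is driven by the global branch only
    fused = gate * f_global + (1.0 - gate) * f_local
    return fused, gate

# Hypothetical shapes: 15 time steps, 8 feature dimensions
rng = np.random.default_rng(0)
f_g = rng.standard_normal((15, 8))
f_l = rng.standard_normal((15, 8))
h_trans, gate = gated_fusion(f_g, f_l, rng.standard_normal((8, 8)), np.zeros(8))
```

Since every gate value lies between 0 and 1, each fused element stays between the corresponding global and local values; a gate near 1 favors self-attention and a gate near 0 favors the convolutional branch.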

3.3.3. Lite-LSTM: Modeling State Evolution Dynamics

In contrast to the stateless TCN and non-recurrent Transformer, the stateful architecture of LSTM offers a distinct advantage in modeling temporal dynamics and non-stationarity. This rationale underpins the inclusion of a third path, Lite-LSTM, whose inclusion is not for architectural novelty but to serve as the dedicated “state evolution expert”. Leveraging its internal cell state and sophisticated gating mechanism, Lite-LSTM models the continuous narrative of the cognitive task, thereby filling a functional void left by the other two paths.

Lite-LSTM consists of a two-layer unidirectional LSTM architecture: the first layer maps the input sequence to a sequence of hidden states, which in turn serves as the input for the second layer to generate the final hidden state sequence. The state transition process can be concisely expressed as:

h^{(1)} = {LSTM}_{1} (F_{aug})

(13)

h^{(2)} = {LSTM}_{2} (h^{(1)})

(14)

The output sequence from the second layer serves directly as the final feature representation of Lite-LSTM:

H_{LSTM} = h^{(2)} = [h_{1}^{(2)}, h_{2}^{(2)}, \dots, h_{T^{″}}^{(2)}]

(15)
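For illustration, Equations (13) to (15) can be written out with a plain NumPy LSTM cell. This is a sketch under stated assumptions, not the trained model: the weights are random, the hidden size is set equal to G = 32 so that the output matches the other paths, the sequence length `T` is a hypothetical value, and the gate ordering (input, forget, candidate, output) is one common convention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_layer(x, w_x, w_h, b):
    """x: (T, D_in); w_x: (D_in, 4H); w_h: (H, 4H); b: (4H,). Returns (T, H)."""
    hidden = w_h.shape[0]
    h = np.zeros(hidden)                      # hidden state
    c = np.zeros(hidden)                      # cell state carried across time
    outputs = []
    for x_t in x:
        z = x_t @ w_x + h @ w_h + b
        i, f, g, o = np.split(z, 4)           # input, forget, candidate, output
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)            # update the cell state
        h = o * np.tanh(c)                    # expose the gated hidden state
        outputs.append(h)
    return np.stack(outputs)

def init_params(d_in, hidden, rng):
    scale = 1.0 / np.sqrt(hidden)
    return (rng.uniform(-scale, scale, (d_in, 4 * hidden)),
            rng.uniform(-scale, scale, (hidden, 4 * hidden)),
            np.zeros(4 * hidden))

rng = np.random.default_rng(0)
G, T = 32, 15                                 # T is a hypothetical sequence length
f_aug = rng.standard_normal((T, G))           # time-major layout of F_aug
h1 = lstm_layer(f_aug, *init_params(G, G, rng))   # Eq. (13)
h2 = lstm_layer(h1, *init_params(G, G, rng))      # Eq. (14)
h_lstm = h2                                       # Eq. (15): full output sequence
```

The cell state `c` is the only quantity that persists from one step to the next, which is what distinguishes this path from the convolutional and attention paths.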

Within the TPCTA framework, Lite-LSTM provides a modeling perspective that is orthogonal to Lite-MSTCN and Lite-Transformer. It offers a Markovian view of state evolution, enabling the model to capture state-dependent memory, such as the continuous progression of brain states during an MI task, which stateless or non-recurrent architectures are inherently ill-equipped to handle. The inclusion of Lite-LSTM is therefore vital for ensuring the architecture’s robustness, complementing and enhancing the comprehensive feature learning capabilities of the model.

3.4. Dynamic Gating Fusion Module

As detailed in Section 3.3, the TPCTA architecture yields three heterogeneous feature tensors:

H_{TCN}

,

H_{TRANS}

and

H_{LSTM}

. While dimensionally identical, these tensors encapsulate temporal information derived from three distinct modeling paradigms—convolutional, self-attentional, and recurrent. This heterogeneity demands their fusion into a single, more discriminative representation. To this end, we introduce a dynamic gating fusion module designed to adaptively weight the contribution of each path at every timestep, enabling a context-aware synthesis of these diverse features.

The process begins by concatenating the three heterogeneous features along the channel dimension to form an aggregated feature tensor

H_{a} = Concat [H_{TCN}; H_{TRANS}; H_{LSTM}]

. This provides a holistic input to a dedicated lightweight gating network

F_{g}

, which is composed of two 1D convolutional layers with a unified kernel size of 3 and ELU activations. The output of

F_{g}

is then passed through a Softmax function to yield the dynamic gating weights for the three paths

W_{g} \in R^{3 \times T^{″}}

:

W_{g} = Softmax (F_{g} (H_{a}))

(16)

where

W_{g} = [W_{TCN}; W_{TRANS}; W_{LSTM}]

and each slice quantifies the relative importance of the path features across all timesteps. The final fused representation

H_{fused}

is then computed by performing a timestep-wise weighted summation of the path features with these dynamic weights:

H_{fused} = W_{TCN} ⊙ H_{TCN} + W_{TRANS} ⊙ H_{TRANS} + W_{LSTM} ⊙ H_{LSTM}

(17)

This dynamic gating fusion mechanism is essentially a data-driven arbitration strategy that empowers the model to learn complex fusion policies directly from the input data. For instance, the model can learn to amplify the contribution of

H_{TCN}

when local signal rhythms are prominent, or conversely, prioritize

H_{TRANS}

when long-range dependencies are more critical.
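A compact NumPy sketch of Equations (16) and (17) is given below. The gating network F_g is passed in as a callable; in the example it is replaced by a random linear projection, which is a hypothetical stand-in for the two convolutional layers with ELU activations, and the sequence length is an assumed value.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_gating_fusion(h_tcn, h_trans, h_lstm, gate_net):
    """Each h_*: (G, T). gate_net maps the (3G, T) concatenation to (3, T) logits."""
    h_a = np.concatenate([h_tcn, h_trans, h_lstm], axis=0)
    w_g = softmax(gate_net(h_a), axis=0)      # Eq. (16): one weight per path and timestep
    fused = w_g[0] * h_tcn + w_g[1] * h_trans + w_g[2] * h_lstm   # Eq. (17)
    return fused, w_g

rng = np.random.default_rng(0)
G, T = 32, 15                                 # T is a hypothetical sequence length
paths = [rng.standard_normal((G, T)) for _ in range(3)]
proj = rng.standard_normal((3, 3 * G))        # stand-in for the gating network F_g
h_fused, w_g = dynamic_gating_fusion(*paths, lambda h: proj @ h)
```

Because the Softmax is taken over the path axis, the three weights at each timestep sum to one, so the fused feature is a timestep-wise convex combination of the three paths.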

Following this dynamic fusion, the model proceeds to a final feature integration stage. A

1 \times 1

convolution first projects the dimension of the fused feature

H_{fused}

from

G = 32

to a more expressive dimension of

G_{fused} = 48

, followed by normalization and a non-linear activation to enhance the feature representation. The resulting temporal sequence is then condensed into a fixed-dimension feature vector

χ_{final} \in R^{1 \times G_{fused}}

, by applying Global Average Pooling across the temporal dimension, preparing it for the ultimate classification.

3.5. Prototype-Guided Classifier

The significant non-stationarity and distribution shifts inherent in EEG signals pose a considerable challenge, often rendering linear classifiers insufficient to establish stable inter-class decision boundaries. We address this limitation by introducing a Prototype-Guided Classifier (PGC) that precedes the final linear classifier. The core principle of PGC is to enhance feature separability through a refinement step that leverages a set of learnable class prototypes to optimize feature representations prior to the classification decision.

The PGC maintains a set of learnable class prototypes

ρ = [ρ_{1}, ρ_{2}, \dots, ρ_{N}]

, where N is the number of classes (four in this task), and the prototype vector

ρ_{n} \in R^{1 \times G_{fused}}

can be viewed as a learnable centroid or a canonical exemplar of class n within the feature space. For any given input feature tensor

χ_{final} \in R^{1 \times G_{fused}}

, the module processes it via a two-phase procedure:

Phase 1: Attention Weight Generation. Rather than directly computing input-prototype similarities, the module first feeds the input feature

χ_{final}

into a dedicated feed-forward attention network to generate a set of dynamic, input-specific attention weights. These weights then undergo channel-wise refinement via a lightweight depthwise separable convolution before being normalized by a Softmax function to produce the final prototype fusion weights

w = [w_{1}, w_{2}, \dots, w_{N}]

:

w = Softmax (DepthwiseConv (AttentionNet (χ_{final})))

(18)

Phase 2: Prototype-Guided Feature Refinement. Using the weights computed in the previous phase, the model performs a weighted sum over the prototype space to construct

χ_{proto}

, a prototype context vector highly relevant to the current samples:

χ_{proto} = \sum_{n = 1}^{N} w_{n} \cdot ρ_{n}

(19)

This vector embodies a context-aware representation synthesized from the global class structure but guided by the sample’s specific affinities, which is then integrated back into the original feature via a scaled residual connection with a learnable scaling factor

λ_{p}

, forming the refined feature vector

χ_{refined} \in R^{G_{fused}}

:

χ_{refined} = χ_{final} + λ_{p} \cdot χ_{proto}

(20)

This refinement process can be interpreted as an adaptive modulation of the original feature vector. It leverages the global manifold of the feature space, as defined by the prototypes, to gently steer each sample’s representation towards its corresponding class region. This process guides the model to learn an enhanced feature space characterized by greater intra-class compactness and inter-class separability.
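The two phases can be sketched in a few lines of NumPy. The example is an illustration under assumptions: the attention network is reduced to a single hypothetical linear map `w_att`, the depthwise convolution is omitted, the prototypes are random rather than learned, and the value used for the scaling factor is arbitrary.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())                   # subtract the max for numerical stability
    return e / e.sum()

def prototype_refine(x_final, prototypes, w_att, lam_p):
    """x_final: (G_fused,); prototypes: (N, G_fused); w_att: (G_fused, N)."""
    w = softmax(x_final @ w_att)              # Eq. (18), with a linear stand-in
    x_proto = w @ prototypes                  # Eq. (19): weighted sum of prototypes
    return x_final + lam_p * x_proto, w       # Eq. (20): scaled residual connection

rng = np.random.default_rng(0)
n_classes, g_fused = 4, 48
x_final = rng.standard_normal(g_fused)
prototypes = rng.standard_normal((n_classes, g_fused))
w_att = rng.standard_normal((g_fused, n_classes))
x_refined, weights = prototype_refine(x_final, prototypes, w_att, lam_p=0.1)
```

The residual form means that a small scaling factor leaves the original feature nearly intact, while larger values pull it further toward the prototype mixture.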

Finally, the prototype-guided feature vector

χ_{refined}

is fed into a standard fully-connected layer with weight matrix

W_{c} \in R^{N \times G_{fused}}

and a Softmax function to compute the final posterior class probabilities:

\hat{y_{i}} = Softmax (W_{c} \cdot χ_{refined})

(21)

This entire pipeline, from the dynamic gating fusion to the prototype-guided classification, collectively ensures that the model maximally utilizes the heterogeneous temporal features from multiple paths and leads to highly robust and accurate classification.

3.6. Loss Functions and Training Strategy

Given the model’s predicted probabilities from the Softmax layer and the one-hot encoded ground-truth labels, the loss is formulated as:

L_{C E} = - \frac{1}{B} \sum_{i = 1}^{B} \sum_{n = 1}^{N} y_{i, n} log ({\hat{y}}_{i, n})

(22)

where B is the number of samples in a batch, N is the number of classes, and

{\hat{y}}_{i, n}

is the predicted probability that sample i belongs to class n. This formulation is equivalent to the negative log-likelihood of the true class and serves to minimize the Kullback–Leibler (KL) divergence between the predicted and true distributions, which effectively reduces their statistical distance and strengthens the discriminative power of the model.
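With one-hot labels, the inner sum in Equation (22) keeps only the log-probability of the true class. The NumPy sketch below computes the loss that way; it takes integer class indices instead of one-hot vectors, and the small constant guarding against log(0) is an implementation detail added here for illustration.

```python
import numpy as np

def cross_entropy(probs, labels):
    """probs: (B, N) Softmax outputs; labels: (B,) integer class indices."""
    eps = 1e-12                               # guards against log(0)
    picked = probs[np.arange(len(labels)), labels]   # probability of the true class
    return float(-np.mean(np.log(picked + eps)))

# A uniform prediction over four classes gives a loss of log(4)
uniform = np.full((8, 4), 0.25)
loss_uniform = cross_entropy(uniform, np.zeros(8, dtype=int))
```

The uniform case is a useful reference point: a four-class model whose loss stays near log(4), roughly 1.386, has not yet learned anything beyond chance.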

The model’s parameters are optimized by using Stochastic Gradient Descent (SGD). We selected SGD for its well-documented stability and predictable convergence, qualities that are particularly beneficial for maintaining strong generalization performance in models with complex feature fusion architectures. The update rule for the trainable parameters

θ_{t}

is given by:

θ_{t + 1} = θ_{t} - η (∇ L_{C E} (θ_{t}) + λ θ_{t})

(23)

where

η

denotes the learning rate,

∇ L_{C E} (θ_{t})

is the gradient of the loss function with respect to

θ_{t}

, and

λ

is the weight decay coefficient that enforces L2 regularization.
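The update in Equation (23) is a single line of code. The sketch below applies it to a toy quadratic loss; the loss, the learning rate, and the weight decay value are chosen purely for illustration and are not the training settings of the model.

```python
import numpy as np

def sgd_step(theta, grad, lr, weight_decay):
    """One update of Eq. (23): gradient step plus L2 weight decay."""
    return theta - lr * (grad + weight_decay * theta)

# Toy problem: minimize 0.5 * ||theta - target||^2, whose gradient is theta - target
target = np.array([1.0, -2.0])
theta = np.zeros(2)
for _ in range(500):
    theta = sgd_step(theta, theta - target, lr=0.1, weight_decay=0.01)
```

On this toy problem the iterate converges to `target / (1 + weight_decay)` rather than to `target` itself, which shows how the decay term shrinks the parameters toward zero.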

4. Experimental Details

4.1. Experiment Setup

4.1.1. Experiment Preparations

In our experiments, we use a software stack consisting of Python 3.8, PyTorch 2.4.0, and CUDA 12.6, running on Windows 11. The hardware platform is a workstation equipped with an Intel i7-14700KF CPU (Intel, Santa Clara, USA), 32 GB of DDR4 RAM (Gloway, Shenzhen, China), and a Tesla P40 GPU (NVIDIA, Santa Clara, USA).

4.1.2. Dataset and Evaluation Metrics

To evaluate the performance of our proposed model, we conduct extensive experiments on the BCI Competition IV-2a and IV-2b datasets, two widely adopted benchmarks for MI classification. The BCI Competition IV-2a dataset contains

9 \times 4 \times 72 = 2592

samples collected from 9 subjects, each performing 72 trials for each of the four MI tasks (left hand, right hand, both feet, and tongue), with each trial’s data constituting a single sample of

22 \times 1000

= 22,000 points acquired from 22 EEG channels over 1000 time points. By contrast, the BCI Competition IV-2b dataset consists of EEG recordings from nine subjects performing two motor imagery tasks (left hand and right hand). Each subject completes multiple trials per session, with EEG signals acquired from three central channels closely related to MI-related activity. For each trial, the EEG data are segmented into 1000 time points, resulting in a single sample of

3 \times 1000 = 3000

points. Compared with BCI IV-2a, it provides substantially fewer spatial channels, thereby serving as a more challenging benchmark for evaluating the model’s ability to extract discriminative temporal features under limited spatial information.

We selected Accuracy and Cohen’s Kappa as the primary metrics for performance evaluation, which are considered among the most prevalent and significant indicators in EEG classification. Accuracy, denoted as

p_{o}

, is formally defined as the ratio of the number of correct predictions to the total number of classification trials:

p_{o} = \frac{\sum_{i = 1}^{N} T P_{i}}{M}

(24)

where N represents the number of classes (specifically,

N = 4

for the four-class MI task), M is the total number of samples, and

T P_{i}

is the count of true positives for class i, i.e., instances of class i correctly identified as such.

Cohen’s Kappa, a metric particularly effective for imbalanced datasets, evaluates the consistency between model predictions and true labels while explicitly correcting for chance agreement. This makes Kappa a fairer metric for comparing algorithm performance, as it avoids the misleadingly high scores often achieved by models that rely on naive or biased classification strategies. The Kappa coefficient (

κ

) is calculated as follows:

κ = \frac{p_{o} - p_{e}}{1 - p_{e}}

(25)

In this equation,

p_{o}

denotes the overall accuracy (observed agreement). The term

p_{e}

quantifies the hypothetical probability of chance agreement and is defined as

p_{e} = \sum_{i = 1}^{N} (\frac{A_{i} \times B_{i}}{M^{2}})

, where

A_{i}

and

B_{i}

denote the total number of actual instances and predicted instances for class i, respectively.
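Both metrics follow directly from a confusion matrix, as the short Python sketch below shows. The confusion matrix in the example is hypothetical (72 trials per class, 60 of them correct) and is not taken from the reported results.

```python
def accuracy_and_kappa(confusion):
    """confusion[i][j]: number of class-i trials predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(n)) / total           # Eq. (24)
    actual = [sum(confusion[i]) for i in range(n)]                 # A_i (row sums)
    predicted = [sum(confusion[i][j] for i in range(n)) for j in range(n)]  # B_i
    p_e = sum(a * b for a, b in zip(actual, predicted)) / total ** 2
    return p_o, (p_o - p_e) / (1 - p_e)                            # Eq. (25)

# Hypothetical balanced four-class result: 60 correct and 4 errors per other class
confusion = [[60, 4, 4, 4],
             [4, 60, 4, 4],
             [4, 4, 60, 4],
             [4, 4, 4, 60]]
acc, kappa = accuracy_and_kappa(confusion)   # acc = 5/6, kappa = 7/9
```

When both the true labels and the predictions are evenly spread over four classes, the chance agreement is 0.25, so Kappa reduces to (accuracy − 0.25) / 0.75. Under that balanced assumption, an accuracy of 82.45% corresponds to a Kappa of about 0.77.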

4.1.3. Implementation Details

The hyperparameters for model training are detailed in Table 1. The model was trained using the Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.001. A batch size of 32 was selected to balance maximizing the GPU’s parallel processing capability for higher computational throughput against the constraint of the hardware memory capacity, thereby preventing out-of-memory (OOM) errors. With these settings, the model consistently achieved convergence within 2000 epochs.

4.2. Comparison with SOTA

Table 2 presents the quantitative results for subject-dependent classification on the BCI Competition IV-2a dataset, where the proposed model achieves an average accuracy of 82.45%. This performance represents a significant gain over classic baselines, such as EEGNet (+10.05%), EEG-ITNet (+5.71%), and EEG-TCNet (+5.10%). We attribute this advantage to overcoming a limitation of conventional methods, which rely on convolutional networks that primarily capture short-term local features (e.g.,

μ / β

rhythms) while overlooking other critical temporal dynamics. This advantage also extends beyond classic benchmarks to recent state-of-the-art (SOTA) works, including MBCNN-EATCFNet (2025), DMSACNN (2025), and MSSAN (2024). While these advanced methods enhance the TCN framework with techniques like multi-branch structures, multi-scale convolutions, or attention mechanisms, they still adopt a one-sided approach, failing to achieve comprehensive temporal modeling. In contrast, our model introduces a three-path synergistic architecture that uniquely integrates three distinct modeling paradigms: leveraging TCN for short-term local features, Transformer for long-range global dependencies, and LSTM for state evolution dynamics. By adaptively fusing these complementary features derived from convolutional, self-attention, and recurrent paradigms, our model generates a more discriminative representation, leading to a substantial boost in classification performance.

The model’s superiority is further corroborated by its Kappa coefficient. With an average Kappa value of 0.77, it surpasses all other models listed in Table 2 and confirms a higher degree of agreement between its predictions and the true labels. This metric is particularly insightful because it quantifies agreement while accounting for chance, meaning the improved Kappa score indicates substantially enhanced reliability and stability in the model predictions. This implies that our model excels not only in capturing latent data patterns and minimizing misclassifications but also in demonstrating robust performance in practical applications. Consequently, the proposed model is distinguished by its dual advantages in accuracy and reliability, highlighting its significant practical utility and strong potential for widespread adoption.

To further validate the robustness of the proposed framework, we extended the evaluation to the BCI Competition IV-2b dataset. As presented in Table 3, the model achieves a state-of-the-art average subject-dependent classification accuracy of 89.49% and a Kappa coefficient of 0.78, consistently outperforming all competing approaches. This represents a substantial improvement over the second-best SMT model (87.67%) and a significant margin over classic baselines such as EEGNet (85.24%) and Shallow ConvNet (83.98%). Notably, the model demonstrates exceptional adaptability to individual subjects, attaining near-perfect classification rates on Subject 4 (99.17%) and Subject 5 (98.33%). This finding is particularly significant given that the BCI IV-2b dataset contains only three EEG channels (C3, Cz, C4), providing limited spatial information. The results indicate that even in scenarios with sparse spatial features, our proposed three-path synergistic architecture effectively compensates by extracting high-quality heterogeneous temporal features. This confirms the model’s efficacy and stability across varying EEG acquisition configurations.

In addition to subject-dependent evaluations, we employed a Leave-One-Subject-Out (LOSO) cross-validation protocol to rigorously assess the model’s generalization capability. Table 4 presents a comprehensive comparison of the classification accuracy and Kappa scores against several state-of-the-art baseline methods on the BCI Competition IV-2a and IV-2b datasets. As shown in the table, the proposed model consistently outperforms all competing approaches across both datasets. Specifically, on the BCI IV-2a dataset, our model achieves a leading accuracy of 67.36% and a Kappa coefficient of 0.56, surpassing the second-best performer, EEG-TCNet. The performance advantage is even more pronounced on the BCI IV-2b dataset, where the proposed method attains an accuracy of 83.74% and a Kappa of 0.67, demonstrating a significant margin over established models such as EEGNet. These results underscore the capability of the proposed architecture to capture subject-invariant features, thereby exhibiting strong robustness against inter-subject variability.

Figure 3 provides a visual performance assessment via box plots, which illustrate the accuracy distribution of our model and competing models across all subjects. An analysis of the plots reveals that our model is superior to the other models across key statistical metrics, including the median (horizontal line), quartiles (box edges), and range (whiskers). However, it is noteworthy that certain models exhibit strong outlier performance. For instance, the EEG Conformer achieves a higher maximum accuracy on some subjects, while MSSAN and M-FANet exhibit greater stability (i.e., lower variance) in some cases, indicating their efficacy under particular conditions. Despite these isolated strengths, our model achieves a superior overall balance between peak performance and consistency across the cohort, which we attribute to a temporal architecture that effectively harmonizes diverse modeling paradigms.

Figure 4 details the classification outcomes for each subject on the MI task through confusion matrices, where the main diagonal signifies correct predictions and off-diagonal elements indicate misclassifications between true (Y-axis) and predicted (X-axis) labels. The results reveal a significant performance divergence among subjects. Subjects 3, 7, and 9 demonstrated robust performance, characterized by high accuracy for the left/right-hand classes and minimal inter-class confusion. Conversely, Subjects 2, 4, and 5 exhibited suboptimal results, with particularly high error rates for the ‘Feet’ class. This is exemplified by Subject 2, who showed the most pronounced performance degradation, with seven ‘Feet’ trials misclassified as ‘Left Hand’. We attribute this inter-subject performance variance to three primary factors: signal quality, individual neurophysiological differences, and the representativeness of the training data. Superior performance in certain subjects likely correlates with high SNR and more distinct features, whereas poor results may stem from noise-corrupted signals that impede effective feature extraction, a challenge especially prominent for the more complex ‘Feet’ and ‘Tongue’ MI tasks. Furthermore, inherent variability in individual brainwave patterns and muscle artifacts can lead to subject-specific model performance. The success with Subject 3, for instance, may be due to their highly discernible and stable EEG patterns. Finally, the comprehensiveness of the training data is critical. If the training set inadequately captures the full spectrum of a subject’s unique EEG signatures, the model’s generalization capabilities will inevitably be compromised.

4.3. Computational Cost

Apart from classification accuracy, computational efficiency is a critical criterion for evaluating the practicality of deep learning models in real-world BCI systems. To rigorously assess the proposed model’s deployment feasibility, we conducted a quantitative analysis of its complexity using three key metrics: the number of trainable parameters (Params), floating-point operations (FLOPs), and average inference latency per trial. Table 5 presents a comparative summary of these metrics against state-of-the-art baseline methods. As shown in Table 5, the proposed model comprises 48.47 k parameters and requires approximately 62.07 M FLOPs per forward inference. While this indicates a moderate increase in computational load compared to ultra-lightweight architectures like EEGNet (3.44 k Params, 24.44 M FLOPs) or EEG-TCNet, our model maintains a significantly lower footprint than deeper or Transformer-based networks. Specifically, its computational cost is substantially reduced compared to DeepConvNet (283.25 M FLOPs) and EEG-Conformer (789.8 k Params), balancing structural complexity with resource efficiency. In terms of execution speed, inference latency is a decisive factor for online decoding. The proposed model achieves an average latency of 2.743 ms, which is faster than complex models such as SMT (3.065 ms) and EEG-Conformer (4.67 ms). Although slightly higher than that of shallow networks, this latency remains negligible within the context of BCI feedback loops. These results demonstrate that the proposed model effectively trades off a marginal increase in computational cost for enhanced representation capability, ensuring it remains lightweight enough for real-time applications.

4.4. t-SNE Visualization of the Extracted Features

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensionality reduction technique designed to visualize high-dimensional data in a lower-dimensional space while preserving local structural relationships. In this study, t-SNE was employed to analyze the feature distributions learned by the model, serving as a qualitative assessment of its classification efficacy. Figure 5 illustrates the feature embeddings for Subject 1 of the BCI IV-2a dataset across various stages: the Lite-MSTCN, Lite-Transformer, and Lite-LSTM paths, followed by the dynamic gating fusion module and the prototype-guided classifier.

In t-SNE visualization, a distinct separation between clusters of different classes suggests significant divergence in the high-dimensional space, indicating that the model has effectively captured discriminative features. Conversely, dense clustering within the same class reflects high intra-class consistency, implying strong feature similarity among samples of the same category. As shown in the results, features processed through the dynamic gated fusion stage and subsequent prototype-enhanced classifier exhibit superior inter-class separability and intra-class compactness. This demonstrates the model’s capability to optimally integrate heterogeneous temporal features extracted from multiple pathways, ultimately ensuring robust and accurate classification performance.

4.5. Ablation Study

An ablation study was conducted to systematically evaluate the contributions of the TCN, Transformer, and LSTM modules. Quantitative results (mean accuracy and kappa) are summarized in Table 6, with per-subject accuracy detailed in Figure 6. The baseline model, stripped of all three temporal feature extraction paths, established a performance floor at 73.27% accuracy and 0.66 kappa. Individually enabling each path validated their distinct and complementary roles: the TCN path yielded the largest gain (+4.74% accuracy, +0.05 kappa) by capturing local temporal patterns; the Transformer path contributed by modeling global dependencies (+2.73% accuracy, +0.02 kappa); and the LSTM path offered benefits by tracking state evolution (+2.19% accuracy, +0.01 kappa). The synergy between these paradigms was evident in dual-path configurations. Fusing TCN with either Transformer or LSTM via dynamic gating consistently outperformed single-path models, boosting accuracy by at least 1.89%. Figure 6 corroborates this, showing that these dual-path combinations outperform single-path models across most subjects. This synergy suggests that the integration of diverse temporal modeling paradigms effectively overcomes the inherent blind spots of any single approach. Notably, the TCN module was crucial for stabilizing performance on non-stationary subjects, where standalone Transformer or LSTM models faltered. This stabilizing effect is also reflected in the reduced cross-subject performance volatility observed in the TCN+Transformer combination.

The full tripartite architecture, leveraging adaptive fusion of all three paths, achieved the highest performance, reaching a mean accuracy of 82.45% and a kappa of 0.77. This configuration not only surpassed all sub-models in aggregate metrics but also delivered a more balanced and robust performance profile across all nine subjects, as seen in Figure 6. This demonstrates the architecture’s adaptability in feature fusion, which mitigates dependency on any single modeling paradigm. Collectively, the ablation study provides compelling evidence for the architectural rationale of our model, validating the synergy achieved by the fusion of TCN, Transformer, and LSTM for MI classification.

5. Conclusions

In this paper, we present TPHFC-Net, an end-to-end neural network built upon a triple-path collaborative temporal architecture for the four-class MI classification task. The model concurrently leverages TCN, Transformer, and LSTM to capture short-term local, long-range global, and state evolution features from EEG-MI signals, respectively. By integrating these heterogeneous yet complementary features through an adaptive fusion module, TPHFC-Net creates a highly discriminative representation. This representation enables superior classification performance by addressing the incomplete temporal modeling of prior methods. Extensive experiments on the BCI Competition IV-2a dataset validated our approach, demonstrating that TPHFC-Net significantly outperforms existing mainstream models.

The central finding of this study is that the synergistic integration of diverse temporal modeling paradigms, rather than their simple concatenation, can raise the performance ceiling for EEG-MI classification. Despite its strong performance, TPHFC-Net has two primary limitations. First, its parallel architecture introduces significant computational overhead. Second, its feature modeling is predominantly confined to the temporal domain. These limitations point to clear directions for future research. Future work should focus on multi-domain fusion, integrating spatial and frequency-domain information to complement the temporal features. Furthermore, optimizing the model through techniques such as network pruning or knowledge distillation could enhance its computational efficiency, making it more viable for real-world MI-BCI applications.

Author Contributions

Conceptualization, Y.J.; data curation, D.W.; formal analysis, Y.J. and C.L.; methodology, C.D. and Y.J.; software, Y.J.; supervision, C.L.; validation, D.W.; writing—original draft, Y.J. and C.D.; writing—review and editing, C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is partially supported by the Jiangsu Province Industry-University-Research Collaboration Project (Grant No. BY20230186).

Institutional Review Board Statement

The study that collected and published the original dataset, Brunner et al. (2008) [43], states that its data collection protocol was approved by the local ethics committee of Graz University of Technology. All participants provided written informed consent before the experiments, as detailed in the original publication. This information can be verified in the official dataset description available at: http://www.bbci.de/competition/iv/desc_2a.pdf, accessed on 17 November 2025.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in the BCI Competition IV dataset 2a at https://bnci-horizon-2020.eu/database/data-sets, accessed on 17 November 2025, reference number 001-2014. These data were derived from the following resource available in the public domain: https://www.bbci.de/competition/iv/#dataset2a, accessed on 17 November 2025.

Acknowledgments

We are grateful to the anonymous reviewers for their careful, unbiased, and constructive suggestions on the original manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Shenoy Handiru, V.; Vinod, A.; Guan, C. EEG Source Imaging of Movement Decoding: The State of the Art and Future Directions. IEEE Syst. Man Cybern. Mag. 2018, 4, 14–23.
2. Liang, W.; Jin, J.; Xu, R.; Wang, X.; Cichocki, A. Variance characteristic preserving common spatial pattern for motor imagery BCI. Front. Hum. Neurosci. 2023, 17, 1243750.
3. Tung, S.W.; Guan, C.; Ang, K.K.; Phua, K.S.; Wang, C.; Zhao, L.; Teo, W.P.; Chew, E. Motor imagery BCI for upper limb stroke rehabilitation: An evaluation of the EEG recordings using coherence analysis. In Proceedings of the 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Osaka, Japan, 3–7 July 2013; Volume 2013, pp. 261–264.
4. Khademi, Z.; Ebrahimi, F.; Kordy, H.M. A review of critical challenges in MI-BCI: From conventional to deep learning methods. J. Neurosci. Methods 2023, 383, 109736.
5. Orban, M.; Elsamanty, M.; Guo, K.; Zhang, S.; Yang, H. A Review of Brain Activity and EEG-Based Brain–Computer Interfaces for Rehabilitation Application. Bioengineering 2022, 9, 768.
6. Saha, S.; Mamun, K.A.; Ahmed, K.; Mostafa, R.; Naik, G.R.; Darvishi, S.; Khandoker, A.H.; Baumert, M. Progress in Brain Computer Interface: Challenges and Opportunities. Front. Syst. Neurosci. 2021, 15, 578875.
7. Samek, W.; Kawanabe, M.; Müller, K.R. Divergence-Based Framework for Common Spatial Patterns Algorithms. IEEE Rev. Biomed. Eng. 2014, 7, 50–72.
8. Lotte, F.; Congedo, M.; Lécuyer, A.; Lamarche, F.; Arnaldi, B. A review of classification algorithms for EEG-based brain–computer interfaces. J. Neural Eng. 2007, 4, R1.
9. dos Santos, E.M.; San-Martin, R.; Fraga, F.J. Comparison of subject-independent and subject-specific EEG-based BCI using LDA and SVM classifiers. Med. Biol. Eng. Comput. 2023, 61, 835–845.
10. Sgro, J. Neural network classification of clinical neurophysiological data for acute care monitoring. In A Decade of Neural Networks: Practical Applications and Prospects; Alacron, Inc.: Nashua, NH, USA, 1994; pp. 95–106.
11. Tibor Schirrmeister, R.; Springenberg, J.T.; Fiederer, L.D.J.; Glasstetter, M.; Eggensperger, K.; Tangermann, M.; Hutter, F.; Burgard, W.; Ball, T. Deep learning with convolutional neural networks for EEG decoding and visualization. arXiv 2017, arXiv:1703.05051.
12. Lawhern, V.J.; Solon, A.J.; Waytowich, N.R.; Gordon, S.M.; Hung, C.P.; Lance, B.J. EEGNet: A compact convolutional neural network for EEG-based brain–computer interfaces. J. Neural Eng. 2018, 15, 056013.
13. Ingolfsson, T.M.; Hersche, M.; Wang, X.; Kobayashi, N.; Cavigelli, L.; Benini, L. EEG-TCNet: An Accurate Temporal Convolutional Network for Embedded Motor-Imagery Brain–Machine Interfaces. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 11–14 October 2020; pp. 2958–2965.
14. Liao, W.; Miao, Z.; Liang, S.; Zhang, L.; Li, C. A composite improved attention convolutional network for motor imagery EEG classification. Front. Neurosci. 2025, 19, 1543508.
15. Yu, Z.; Cao, D.; Zhou, P. Motor Imagery EEG Decoding Based on Multi-Branch Separable Temporal Convolutional Network. In Proceedings of the 2024 China Automation Congress (CAC), Qingdao, China, 1–3 November 2024; pp. 6058–6063.
16. Yang, Y.; Li, M.; Wang, L. An adaptive session-incremental broad learning system for continuous motor imagery EEG classification. Med. Biol. Eng. Comput. 2025, 63, 1059–1079.
17. Song, Y.; Zheng, Q.; Liu, B.; Gao, X. EEG Conformer: Convolutional Transformer for EEG Decoding and Visualization. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 710–719.
18. Qin, Y.; Yang, B.; Ke, S.; Liu, P.; Rong, F.; Xia, X. M-FANet: Multi-Feature Attention Convolutional Neural Network for Motor Imagery Decoding. IEEE Trans. Neural Syst. Rehabil. Eng. 2024, 32, 401–411.
19. Hang, W.; Wang, J.; Liang, S.; Lei, B.; Wang, Q.; Li, G.; Chen, B.; Qin, J. Multiscale Convolutional Transformer with Diverse-aware Feature Learning for Motor Imagery EEG Decoding. IEEE Trans. Cogn. Dev. Syst. 2025, 17, 1389–1400.
20. Ghinoiu, B.; Vlădăreanu, V.; Travediu, A.M.; Vlădăreanu, L.; Pop, A.; Feng, Y.; Zamfirescu, A. EEG-Based Mobile Robot Control Using Deep Learning and ROS Integration. Technologies 2024, 12, 261.
21. Gui, Y.; Tian, Z.; Liu, X.; Hu, B.; Wang, Q. FBLSTM: A Filter-Bank LSTM-based deep learning method for MI-EEG classification. In Proceedings of the International Conference on Signal Processing and Communication Technology (SPCT 2022), Harbin, China, 23–25 December 2022; Proceedings of SPIE, the International Society for Optical Engineering; SPIE: Bellingham, WA, USA, 2023; Volume 12615, pp. 470–475.
22. Chen, H.; Tian, A.; Zhang, Y.; Liu, Y. Early Time Series Classification Using TCN-Transformer. In Proceedings of the 2022 IEEE 4th International Conference on Civil Aviation Safety and Information Technology (ICCASIT), Dali, China, 12–14 October 2022; pp. 1079–1082.
23. Xiong, F.; Fan, M.; Yang, X.; Li, Y.; Yang, C.; Zheng, J.; Wang, C.; Zhou, J. Research on Emotion Recognition Model Based on ConvTCN-LSTM-DCAN Model with Sparse EEG Channels. Res. Sq. 2024.
24. Jiang, X.; Bian, G.B.; Tian, Z. Removal of Artifacts from EEG Signals: A Review. Sensors 2019, 19, 987.
25. Abibullaev, B.; Keutayeva, A.; Zollanvari, A. Deep Learning in EEG-Based BCIs: A Comprehensive Review of Transformer Models, Advantages, Challenges, and Applications. IEEE Access 2023, 11, 127271–127301.
26. McFarland, D.J.; Miner, L.A.; Vaughan, T.M.; Wolpaw, J.R. Mu and Beta Rhythm Topographies During Motor Imagery and Actual Movements. Brain Topogr. 2000, 12, 177–186.
27. Vafaei, E.; Hosseini, M. Transformers in EEG Analysis: A Review of Architectures and Applications in Motor Imagery, Seizure, and Emotion Classification. Sensors 2025, 25, 1293.
28. Liu, Y.; Yu, S.; Li, J.; Ma, J.; Wang, F.; Sun, S.; Yao, D.; Xu, P.; Zhang, T. Brain state and dynamic transition patterns of motor imagery revealed by the Bayes hidden Markov model. Cognitive Neurodyn. 2024, 18, 2455–2470.
29. Narayan, Y. Motor-Imagery EEG Signals Classificationusing SVM, MLP and LDA Classifiers. Turk. J. Comput. Math. Educ. (TURCOMAT) 2021, 12, 3339–3344.
30. Aggarwal, S.; Chugh, N. Signal processing techniques for motor imagery brain computer interface: A review. Array 2019, 1–2, 100003.
31. Ang, K.K.; Chin, Z.Y.; Zhang, H.; Guan, C. Filter Bank Common Spatial Pattern (FBCSP) in Brain-Computer Interface. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; pp. 2390–2397.
32. Avelar, M.C.; Almeida, P.; Faria, B.M.; Reis, L.P. Applications of Brain Wave Classification for Controlling an Intelligent Wheelchair. Technologies 2024, 12, 80.
33. Riyad, M.; Khalil, M.; Adib, A. MI-EEGNET: A novel convolutional neural network for motor imagery classification. J. Neurosci. Methods 2021, 353, 109037.
34. Salami, A.; Andreu-Perez, J.; Gillmeister, H. EEG-ITNet: An Explainable Inception Temporal Convolutional Network for Motor Imagery Classification. IEEE Access 2022, 10, 36672–36685.
35. Qin, Y.; Li, B.; Wang, W.; Shi, X.; Wang, H.; Wang, X. ETCNet: An EEG-based motor imagery classification model combining efficient channel attention and temporal convolutional network. Brain Res. 2024, 1823, 148673.
36. Zhu, L.; Wang, Y.; Huang, A.; Tan, X.; Zhang, J. An improved multi-scale convolution and Transformer network for EEG-based motor imagery decoding. Int. J. Mach. Learn. Cybern. 2025, 16, 4997–5012.
37. Saputra, M.; Setiawan, N.A.; Ardiyanto, I. Deep Learning Methods for EEG Signals Classification of Motor Imagery in BCI. Int. J. Inf. Technol. Electr. Eng. (IJITEE) 2019, 3, 80.
38. Kim, J.; Kim, H.; Kim, H.; Lee, D.; Yoon, S. A comprehensive survey of deep learning for time series forecasting: Architectural diversity and open challenges. Artif. Intell. Rev. 2025, 58, 216.
39. Liu, K.; Xing, X.; Yang, T.; Yu, Z.; Xiao, B.; Wang, G.; Wu, W. DMSACNN: Deep Multiscale Attentional Convolutional Neural Network for EEG-Based Motor Decoding. IEEE J. Biomed. Health Inform. 2025, 29, 4884–4896.
40. Chunduri, V.; Aoudni, Y.; Khan, S.; Aziz, A.; Rizwan, A.; Deb, N.; Keshta, I.; Soni, M. Multi-scale spatiotemporal attention network for neuron based motor imagery EEG classification. J. Neurosci. Methods 2024, 406, 110128.
41. Xiong, S.; Wang, L.; Xia, G.; Deng, J. MBCNN-EATCFNet: A multi-branch neural network with efficient attention mechanism for decoding EEG-based motor imagery. Robot. Auton. Syst. 2025, 185, 104899.
42. Chen, X.; Teng, X.; Chen, H.; Pan, Y.; Geyer, P. Toward reliable signals decoding for electroencephalogram: A benchmark study to EEGNeX. Biomed. Signal Process. Control 2024, 87, 105475.
43. Brunner, C.; Leeb, R.; Müller-Putz, G. BCI Competition 2008–Graz Data Set A. IEEE Dataport 2024.

Figure 1. The overall architecture of TPHFC-Net. The architecture begins by progressively extracting robust features from EEG signals, a process that incorporates a denoising diffusion model. The extracted features are then channeled into three parallel streams, where TCN, Transformer, and LSTM modules concurrently model the heterogeneous temporal dynamics. To integrate these complementary representations, a dynamic gating module adaptively fuses the features from all three pathways and feeds the resulting unified representation into a prototype-attention-based classifier for the final classification.

Figure 2. The flowchart of our methodology.

Figure 3. Boxplot of accuracy distribution for different models on BCI Competition IV-2a.

Figure 4. The confusion matrices for all 9 subjects on BCI Competition IV-2a.

Figure 5. The distribution of feature vectors for subject S01 from BCI Competition IV-2a. All feature vectors are mapped to 2D space using the t-SNE method. (a) Raw Signal. (b) TCN Features. (c) Transformer Features. (d) LSTM Features. (e) Fused Features. (f) Prototype-Attention Features.

Figure 6. The accuracy for each subject of the ablation experiment on BCI Competition IV-2a.

Table 1. Model Training Parameter Configuration.

Configuration Item	Parameter
Batch-size	32
Learning-rate	0.001
Epochs	2000
Optimizer	SGD

Table 2. Subject-dependent classification accuracy (%) and Kappa scores of different models on the BCI Competition IV-2a dataset.

Method	S1	S2	S3	S4	S5	S6	S7	S8	S9	AVG	Kappa
EEGNet [12]	84.34	54.06	87.54	63.59	67.39	54.88	88.8	76.75	74.24	72.40	0.63
Shallow ConvNet [11]	79.51	56.25	88.89	80.9	57.29	53.82	91.67	81.25	79.17	74.31	0.66
EEG-TCNet [13]	85.77	65.02	94.51	64.91	75.36	61.4	87.36	83.76	78.03	77.35	0.70
EEG-ITNet [34]	84.38	62.85	89.93	69.1	74.31	57.64	88.54	83.68	80.21	76.74	–
DMSACNN [39]	86.81	61.11	92.71	67.01	72.57	70.83	87.5	85.07	80.21	78.20	0.71
EEG Conformer [17]	88.19	61.46	93.40	78.13	52.08	65.28	92.36	88.19	88.89	78.66	0.72
MSSAN [40]	83.19	69.97	93.44	70.97	79.31	67.28	81.22	84.66	83.33	79.26	–
M-FANet [18]	86.81	75.00	91.67	73.61	76.39	61.46	85.76	75.69	87.15	79.39	0.73
ASiBLS [16]	85.17	75.83	86.71	73.71	79.20	68.78	82.91	83.2	83.46	79.89	0.72
ETCNet [35]	90.62	64.93	93.75	78.47	79.51	66.32	87.85	81.94	82.99	80.71	0.74
MBCNN-EATCFNet [41]	84.72	67.71	94.58	74.17	81.74	69.31	90.35	83.68	85.83	81.34	–
SMT [15]	83.33	68.41	92.93	83.33	76.65	74.65	94.09	82.56	83.68	82.18	0.76
Ours	87.50	62.50	96.53	78.47	78.82	69.44	90.62	87.85	90.28	82.45	0.77
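The AVG and Kappa columns of Table 2 are linked by simple arithmetic. The sketch below recomputes both for the "Ours" row, assuming Cohen's kappa with balanced classes, so that chance agreement is 1/4 for the four-class IV-2a task. That balance assumption and the function names are illustrative and not taken from the paper, so small rounding differences can appear for other rows.

```python
def mean_accuracy(per_subject):
    """Arithmetic mean of per-subject accuracies (in percent)."""
    return sum(per_subject) / len(per_subject)


def kappa_from_accuracy(accuracy_pct, n_classes):
    """Cohen's kappa, assuming balanced classes so chance agreement is 1/n_classes."""
    p_o = accuracy_pct / 100.0   # observed agreement
    p_e = 1.0 / n_classes        # chance agreement under the balance assumption
    return (p_o - p_e) / (1.0 - p_e)


# Per-subject accuracies (S1..S9) of the proposed model, copied from Table 2.
OURS_IV2A = [87.50, 62.50, 96.53, 78.47, 78.82, 69.44, 90.62, 87.85, 90.28]

avg = mean_accuracy(OURS_IV2A)
kappa = kappa_from_accuracy(avg, n_classes=4)
print(f"AVG = {avg:.2f}%, kappa = {kappa:.2f}")  # AVG = 82.45%, kappa = 0.77
```

Because kappa is a linear function of accuracy under this assumption, averaging per-subject kappas gives the same value as computing kappa from the mean accuracy.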

Table 3. Subject-dependent classification accuracy (%) and Kappa scores of different models on the BCI Competition IV-2b dataset.

Method	S1	S2	S3	S4	S5	S6	S7	S8	S9	AVG	Kappa
EEGNet [12]	73.56	69.21	85.81	96.94	91.44	77.94	91.13	93.06	88.06	85.24	0.71
Shallow ConvNet [11]	71.25	63.93	77.81	96.56	94.06	87.81	87.19	91.56	85.63	83.98	0.68
Deep ConvNet [11]	72.69	67.79	81.75	94.25	90.81	85.44	90.50	91.31	86.12	84.50	0.69
EEG-TCNet [13]	75.08	70.43	84.38	96.38	95.25	78.44	88.31	92.69	84.19	85.01	0.70
SMT [15]	77.93	72.35	86.88	97.56	94.38	85.63	92.82	95.07	86.52	87.67	0.74
Ours	83.33	72.50	77.92	99.17	98.33	90.83	93.33	96.25	93.75	89.49	0.78

Table 4. Comparison of classification accuracy (%) and Kappa scores for subject-independent MI tasks across various models on BCI Competition IV-2a and IV-2b datasets.

Method	BCI IV-2a Accuracy (%)	BCI IV-2a Kappa	BCI IV-2b Accuracy (%)	BCI IV-2b Kappa
EEGNet [12]	56.85	0.42	76.11	0.52
Shallow ConvNet [11]	56.75	0.42	74.92	0.50
EEG Conformer [17]	57.43	0.43	73.61	0.47
EEGNeX [42]	63.02	0.51	74.47	0.48
EEG-TCNet [13]	65.12	0.52	75.14	0.50
Ours	67.36	0.56	83.74	0.67

Table 5. Comparison of FLOPs, parameter count, and mean latency.

Method	FLOPs (M)	Params (k)	Mean Latency (ms)
EEGNet [12]	24.44	3.44	0.71
Shallow ConvNet [11]	113.92	46.12	0.45
Deep ConvNet [11]	283.25	67.05	0.79
EEG-TCNet [13]	13.74	4.04	1.56
EEG Conformer [17]	63.86	789.80	4.67
SMT [15]	56.20	297.49	3.065
Ours	62.07	48.47	2.743
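To put the computational overhead noted in the Conclusions in context, the cost figures in Table 5 can be expressed relative to the proposed model. This is a minimal sketch; the `COST` layout and the helper `relative_to_ours` are hypothetical names introduced for illustration, and the numbers are copied from Table 5.

```python
# (FLOPs in M, parameters in k, mean latency in ms), copied from Table 5.
COST = {
    "EEGNet": (24.44, 3.44, 0.71),
    "Shallow ConvNet": (113.92, 46.12, 0.45),
    "Deep ConvNet": (283.25, 67.05, 0.79),
    "EEG-TCNet": (13.74, 4.04, 1.56),
    "EEG Conformer": (63.86, 789.80, 4.67),
    "SMT": (56.20, 297.49, 3.065),
    "Ours": (62.07, 48.47, 2.743),
}


def relative_to_ours(name):
    """Return (FLOPs, params, latency) of a model as multiples of the proposed model's cost."""
    ours = COST["Ours"]
    return tuple(round(value / ref, 2) for value, ref in zip(COST[name], ours))


for model in COST:
    flops, params, latency = relative_to_ours(model)
    print(f"{model:16s} FLOPs x{flops:<6} params x{params:<6} latency x{latency}")
```

A ratio above 1 means the listed model costs more than TPHFC-Net on that metric. For example, EEG Conformer uses roughly 16 times as many parameters, while EEGNet and EEG-TCNet are cheaper on every metric.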

Table 6. The average accuracy and kappa of the ablation experiment on BCI Competition IV-2a. The symbol ✓ indicates that the corresponding module is included, while a blank entry indicates that the module is not included.

TCN	Transformer	LSTM	Accuracy (%)	Kappa
✓	✓	✓	82.45	0.77
✓	✓		79.86	0.73
✓		✓	79.90	0.73
	✓	✓	78.86	0.72
✓			78.01	0.71
	✓		76.00	0.68
		✓	75.46	0.67
			73.27	0.66

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
