Article

Neuroscience-Inspired Deep Learning Brain–Machine Interface Decoder

1 Graduate School of Life Science and Technology, Institute of Science Tokyo, Yokohama 226-0026, Japan
2 Graduate School of Information Science and Technology, The University of Tokyo, Bunkyo, Tokyo 113-8656, Japan
3 Center for Information and Neural Networks, Advanced ICT Research Institute, National Institute of Information and Communications Technology, Suita 565-0871, Japan
* Author to whom correspondence should be addressed.
Bioengineering 2026, 13(4), 440; https://doi.org/10.3390/bioengineering13040440
Submission received: 5 February 2026 / Revised: 2 April 2026 / Accepted: 3 April 2026 / Published: 10 April 2026

Abstract

Brain–machine interfaces (BMIs) aim to decode motor intentions from neural activity to enable direct control of external devices. However, most existing decoders rely on monolithic architectures that fail to capture the distinct neural representations of different joint movement directions, limiting their generalizability. In this work, we propose a Single-Direction CNN-LSTM decoder inspired by motor cortex encoding mechanisms, which separately models extension and flexion dynamics through parallel CNN-LSTM branches. Each branch extracts spatial–temporal features from neural spike data and predicts directional joint variables, which are then combined by subtraction to yield the net angular velocity and torque of upper-limb joints. Using invasive recordings from a macaque during a 2D center-out reaching task, we demonstrate that our decoder achieves comparable performance to a conventional CNN-LSTM when trained on all tasks, while significantly outperforming both CNN-LSTM and linear regression baselines in cross-target generalization scenarios. Moreover, the model can capture physiologically meaningful co-contraction patterns, providing richer insights into motor control. These results suggest that incorporating neuroscience-inspired modular decoding into deep neural architectures enhances robustness and adaptability across tasks, offering a promising pathway for BMI applications in prosthetics and rehabilitation.

1. Introduction

A brain–machine interface (BMI, also called a brain–computer interface, BCI) is a technology that bridges the brain and external devices: the system decodes the user’s intended movement or selection from neural signals so that external devices can be controlled by brain activity alone [1,2,3,4,5]. Depending on the placement of the electrodes and the signals the decoder uses, BMI devices can be divided into invasive and non-invasive systems [6]. Non-invasive BMIs usually use signals such as EEG to estimate imagined movement [7,8,9] or responses to sensory stimulation [10,11]. However, EEG signals are noisy and prone to artifacts because the electrodes are attached to the surface of the scalp [9,12,13,14]. In contrast, invasive signals such as ECoG, LFP, or spike data are recorded by electrodes implanted on the surface of or inside the cortex, so their signal quality is much higher than EEG [15] and more interpretable.
To decode information from neural signals, numerous studies have employed the population vector [16] as a feature representation. State discrimination has often been performed with linear discriminators, and linear state-space models have been solved with methods such as Kalman filtering [17]. Because neural signals are non-stationary and non-Gaussian, these decoders can roughly estimate the presence or absence of movement or torque, but they are neither precise nor reproducible. With the development of artificial intelligence (AI), deep neural networks (DNNs) have made remarkable achievements in non-invasive BMIs [18], and benchmark decoder models such as EEGnet [19] and DeepConvNet [20] have been proposed. These deep learning based methods have significantly improved decoding capabilities [21]. A growing number of researchers are also using DNN models to decode information from cortical neural signals and have obtained exciting results in motor, speech [22], and cognitive function reconstruction [23]. Furthermore, data-driven feature extraction modules can expose abstract but important features of neural activity that are imperceptible to humans [24,25]. In motor decoding, for example, Xie et al. [26] predicted continuous flexion and extension of five fingers from ECoG signals using an end-to-end DNN model with four spatial/temporal convolutional layers (CNNs) as feature extractors and one LSTM layer to predict finger activation; the coefficient of determination of their model reached 74%. Sliwowski et al. [24] employed a CNN layer combined with several other architectures to reconstruct 3D hand movements from the ECoG signals of a tetraplegic patient; they reported that a CNN-LSTM model with multi-time-step trajectory prediction achieved an average R² of approximately 0.232 for decoding 3D hand positions during a motor imagery task, about a 60% improvement over other architectures. Although DNN decoders have shown higher performance in offline decoding, their generalizability across subjects and tasks remains a major challenge.
Brain science has long focused on how the brain represents, or encodes, information. Many neural-decoding studies have estimated the output movement variables in Cartesian coordinates [24]. However, one study argued that neural activity in the motor cortex represents information in joint coordinates (e.g., joint angle or joint torque) rather than in Cartesian coordinates [27]. Moreover, joint coordinates are essential in scenarios such as prostheses and motor rehabilitation [4,28], in which an accurate output joint torque gives patients effective support. In our previous research, we found that the monkey motor cortex encoded the movement variables of the shoulder and elbow in four separate modules, representing the different rotation directions of the two joints: shoulder extension, shoulder flexion, elbow extension, and elbow flexion [29]. Furthermore, a recent study by Tian et al. [30] indicated that features representing variables in different dimensions should be orthogonal. This evidence implies that joint movement variables in different rotation directions could be encoded by different neural patterns in the motor cortex. Inspired by this observation, we propose that a model which extracts features independently for each motion direction can enhance the generalizability of the decoder. In contrast, previous decoder architectures have consistently employed feature extraction blocks shared across all movement variables, a design that we argue limits the generalizability of motor decoders in free-moving tasks. However, training a direction-specific decoder independently is challenging, because movement variables along a single direction are often harder to measure than net movement variables.
Drawing on these insights, we propose a DNN architecture to separately decode the upper-limb joint movement variables, inspired by the motor cortex encoding mechanism we previously identified. Our backbone decoder is a CNN-LSTM model, which we term the Single-Direction CNN-LSTM. The “Single-Direction” design is intended to extract independent representations of joint movement variables along different directions, thereby enhancing generalization across tasks. The decoder consists of three main components: (1) independent branches that estimate extension and flexion variables separately; (2) a feature extraction module with convolutional layers to capture both spatial and temporal patterns of neural activity; and (3) an output module composed of an LSTM and fully connected (FC) layer to model the dynamic patterns, with parameters that can be fine-tuned for efficient few-shot adaptation to new tasks.
To validate the effectiveness of our approach, we employed a dataset previously recorded from the motor cortex of a Japanese macaque during a 2D center-out reaching experiment, shown in Figure 1 [29]. During the experiment, we recorded the hand position on the horizontal workspace and then calculated the joint angle, angular velocity, and joint torque with the motion equation of the monkey’s upper limb. We first demonstrated the practical feasibility of our decoder on a single task by training and testing with all the recorded neural spike activity and movement variables. Subsequently, we evaluated the generalization capability of our Single-Direction decoder: we trained it on data from only two targets and then fine-tuned the parameters of the output layer to mimic multi-task scenarios. A conventional CNN-LSTM model without branches and a simple linear regression (LR) model served as baselines. The results show that our Single-Direction decoder achieves the highest coefficient of determination (R²) in the generalization setting, with performance competitive with the conventional CNN-LSTM model in the single-task setting. The main contributions of this work include:
1.
We propose a DNN-based decoder (CNN-LSTM) for motor decoding, offering a novel formulation that bridges neuroscience mechanisms with deep learning approaches for prosthetic and rehabilitation applications.
2.
We introduce the Single-Direction CNN-LSTM, which decodes joint variables independently across directions, thereby improving task-level generalizability.

2. Materials and Methods

2.1. Data Preparation

Part of the dataset from our previous work [29] was used in the present study. Briefly, neural activity in the motor cortex of a Japanese monkey (Macaca fuscata) was systematically recorded through a glass-coated Elgiloy electrode during a center-out hitting task toward eight peripheral targets on a 1-mm grid. The schematic of the experiment is shown in Figure 1. Instead of being recorded simultaneously with a multi-channel electrode array, the electrode recording sites were changed daily; that is, we used a pseudo-population, since the reaching tasks were repetitive across sessions. We resampled the dataset so that the monkey executed similar left-hand movement trajectories toward each target.
After spike sorting, we selected neurons with markedly different preferred direction vectors (PVs) [31] in the flexion and extension rotations of the elbow and shoulder, which we used for decoding. In total, 73 neurons were used. Spike-firing timing data were aligned with movement onset, and trial-averaged spike-firing rates for the eight targets were calculated using a 1-ms interval. Finally, the processed data from each experimental day were combined to form a large dataset, as illustrated in Figure 1.
To ensure that the monkey’s arm movements followed approximately the same trajectory toward each target across recording sessions, we first computed a reference hand trajectory for each target using all available motion data recorded in each session. Trials that exhibited large deviations from this reference within the same day were then discarded. Specifically, for each trial we calculated the angle of hand trajectory at the point located 47 mm away from the home position and clustered the trials based on these angles into three groups by K-means algorithm. The cluster containing the largest number of trials was identified as the reference, and its average trajectory was denoted as x ¯ ( t ) and y ¯ ( t ) for the two spatial dimensions. Movement onset was aligned across trials using the mean onset timing, which was defined as the time point when the hand’s tangential acceleration first reached or exceeded 0.8 m/s2. To quantify the similarity between each individual trial and the reference, we evaluated both the trajectory error and the coefficient of determination, calculated as follows:
$$ s_i = \frac{1}{T} \sum_{t=1}^{T} \sqrt{\bigl(x_i(t) - \bar{x}(t)\bigr)^2 + \bigl(y_i(t) - \bar{y}(t)\bigr)^2} $$
$$ R_i^2 = 1 - \frac{\sum_t \bigl(v_i(t) - \bar{v}(t)\bigr)^2}{\sum_t \bigl(v_i(t) - \bar{v}\bigr)^2} $$
where $s_i$ and $R_i^2$ denote the trajectory error and the coefficient of determination between the hand trajectory of trial $i$ and the reference. Here, $v_i(t)$ and $\bar{v}(t)$ represent the hand velocities of trial $i$ and the reference trajectory, respectively, and $\bar{v}$ is the temporal mean of $\bar{v}(t)$. $T$ denotes the total number of time steps, which in this study corresponded to a 1500-ms window from movement onset to termination, sampled at 1 kHz. Only trials satisfying $s_i < 20$ and $R_i^2 > 0.6$ were retained for further analysis.
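As a concrete illustration, the trial-screening criteria above can be sketched in plain Python. The function names (`trajectory_error`, `velocity_r2`, `keep_trial`) are our own, and interpreting $s_i$ as the mean per-time-step Euclidean deviation from the reference is our reading of the equation:

```python
import math

def trajectory_error(xi, yi, x_ref, y_ref):
    """Mean Euclidean deviation s_i between trial i and the reference trajectory."""
    T = len(xi)
    return sum(math.hypot(xi[t] - x_ref[t], yi[t] - y_ref[t]) for t in range(T)) / T

def velocity_r2(vi, v_ref):
    """Coefficient of determination R_i^2 between trial and reference velocities."""
    v_bar = sum(v_ref) / len(v_ref)  # temporal mean of the reference velocity
    ss_res = sum((a - b) ** 2 for a, b in zip(vi, v_ref))
    ss_tot = sum((a - v_bar) ** 2 for a in vi)
    return 1.0 - ss_res / ss_tot

def keep_trial(xi, yi, vi, x_ref, y_ref, v_ref, s_max=20.0, r2_min=0.6):
    """Apply the paper's screening thresholds: s_i < 20 and R_i^2 > 0.6."""
    return (trajectory_error(xi, yi, x_ref, y_ref) < s_max
            and velocity_r2(vi, v_ref) > r2_min)
```

A trial identical to the reference trivially passes both thresholds, while a wildly deviating trial is discarded.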

2.2. Data Preprocessing

Recorded neural activity was digitized at a sampling rate of 40 kHz. To extract single-neuron activity from the time-series extracellular potential data, we sorted spikes using Wave-Clus (University of Leicester, Leicester, UK). A spike was detected when the amplitude crossed a threshold level, which defined the spike-firing timing, and the data from 0.25 ms before to 0.75 ms after this timing were treated as a spike. We computed the firing rate as the inverse of the inter-spike interval between the spikes immediately before and after each 1-ms time bin and smoothed it with a 4th-order Butterworth filter with a cutoff frequency of 7 Hz. Spike-firing timing data were aligned to movement onset, defined in this paper as the time when the tangential acceleration of the hand first reached or exceeded 0.8 m/s². Trial-averaged spike-firing rates for the eight targets were calculated using a 1-ms interval. After applying a moving average with a 40-ms time window, the data were resampled at a 10-ms interval.
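The inverse-ISI firing-rate estimate can be sketched as follows. This is a toy illustration under our reading of the text (the rate at a bin is the reciprocal of the interval between the surrounding spikes); the Butterworth smoothing and resampling steps are omitted, and the function name is hypothetical:

```python
def firing_rate(spike_times_ms, t_ms):
    """Instantaneous firing rate (spikes/s) at time t, taken as the inverse of the
    inter-spike interval spanning that 1-ms bin (spike before and spike after)."""
    before = [s for s in spike_times_ms if s <= t_ms]
    after = [s for s in spike_times_ms if s > t_ms]
    if not before or not after:
        return 0.0  # no surrounding ISI: rate undefined here, report 0
    isi_ms = after[0] - before[-1]
    return 1000.0 / isi_ms  # convert 1/ms to spikes per second
```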
On the one hand, the positions of the recorded LEDs were sampled at 1 kHz and converted into the monkey’s hand position in Cartesian coordinates. From these trajectories, hand velocity and acceleration were derived. The kinematic data were then down-sampled to 100 Hz. On the other hand, the shoulder and elbow joint angles, as well as their angular velocities, were computed:
$$ \theta_s = \arctan\!\left(\frac{y}{x}\right) - \arccos\!\left(\frac{l_1^2 - l_2^2 + x^2 + y^2}{2 l_1 \sqrt{x^2 + y^2}}\right) $$
$$ \theta_e = \arccos\!\left(\frac{l_1^2 + l_2^2 - x^2 - y^2}{2 l_1 l_2}\right) $$
where $x$ and $y$ are the x- and y-axis components of the hand position in Cartesian coordinates, and $l_1$ and $l_2$ are the upper arm and forearm lengths, respectively. Finally, the same filter as that used for the spike data was applied to obtain smooth hand position, velocity, joint angle, and angular velocity. All processing was done in Matlab 2024b.
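A minimal sketch of these inverse-kinematics equations in Python. Using `atan2` in place of $\arctan(y/x)$ to handle quadrants, and clamping the cosine arguments against floating-point round-off, are our own implementation choices, not from the paper:

```python
import math

def _clamp(v):
    """Keep a cosine argument inside [-1, 1] despite round-off."""
    return max(-1.0, min(1.0, v))

def joint_angles(x, y, l1=0.132, l2=0.207):
    """Shoulder and elbow angles (rad) from hand position (m) for a two-link arm,
    following the equations for theta_s and theta_e above."""
    r2 = x * x + y * y
    theta_s = math.atan2(y, x) - math.acos(
        _clamp((l1**2 - l2**2 + r2) / (2 * l1 * math.sqrt(r2))))
    theta_e = math.acos(_clamp((l1**2 + l2**2 - r2) / (2 * l1 * l2)))
    return theta_s, theta_e
```

With this convention, a fully extended arm along the x-axis gives $\theta_s \approx 0$ and $\theta_e \approx \pi$.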
Further, the joint torques were calculated according to the arm motion equation:
$$ T_A = M_A(\Theta_A)\,\ddot{\Theta}_A + V_A(\Theta_A, \dot{\Theta}_A) $$
$$ T_A = \begin{bmatrix} \tau_s \\ \tau_e \end{bmatrix}, \quad \Theta_A = \begin{bmatrix} \theta_s \\ \theta_e \end{bmatrix} $$
$$ M_A(\Theta_A) = \begin{bmatrix} I_1 + I_2 + 2 m_2 l_1 l_{g2} \cos\theta_e + m_2 l_1^2 & I_2 + m_2 l_1 l_{g2} \cos\theta_e \\ I_2 + m_2 l_1 l_{g2} \cos\theta_e & I_2 \end{bmatrix} $$
In this paper, we used four variables as outputs: shoulder angular velocity ($\dot{\theta}_s$), elbow angular velocity ($\dot{\theta}_e$), shoulder torque ($\tau_s$), and elbow torque ($\tau_e$). To obtain the angular velocities of the shoulder and elbow, we first computed the central difference of the joint angles and then applied a fourth-order low-pass filter with a cutoff frequency of 5 Hz. $I_1$ and $I_2$ are the moments of inertia of the upper arm around the shoulder joint and of the forearm around the elbow joint, respectively. $m_2$ and $l_{g2}$ are the mass of the forearm and the distance from the elbow joint to the forearm’s center of gravity (half of the forearm length). We defined the monkey’s arm segment lengths as $l_1$ = 132 mm and $l_2$ = 207 mm, and the masses as $m_1$ = 0.295 kg and $m_2$ = 0.280 kg. $V_A$ corresponds to the Coriolis and centrifugal forces. To obtain the total torque produced by joint rotation, the torque required to push the manipulandum must also be added to $T_A$:
$$ F_M = M_m(\Theta_m)\,\ddot{X} + V_m(\Theta_m, \dot{\Theta}_m) $$
$$ M(\Theta) = M_A(\Theta_A) + J^T(\Theta_A)\,M_m(\Theta_m) $$
$$ V(\Theta, \dot{\Theta}) = V_A(\Theta_A, \dot{\Theta}_A) + J^T(\Theta_A)\,V_m(\Theta_m, \dot{\Theta}_m) $$
$$ T_{or} = M(\Theta)\,\ddot{\Theta}_A + V(\Theta, \dot{\Theta}) $$
where $T_{or}$ is the total torque vector used in this study, and $J^T(\Theta_A)$ is the transpose of the Jacobian relating joint angles to the manipulandum position. $\Theta_m$ represents the mass vector of the manipulandum ([0.565 kg, 0.065 kg] in this study), while $F_M$ denotes the force from the manipulandum. The output movements last 1800 ms, from 300 ms before the onset of the reaching phase toward the target to the end of the movement after hitting the target.
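The inertia matrix $M_A$ and the rigid-body torque equation $T_A = M_A \ddot{\Theta}_A + V_A$ translate directly to code. This is an illustrative sketch with our own function names; $V_A$ is passed in as a precomputed vector rather than derived:

```python
import math

def inertia_matrix(theta_e, I1, I2, m2, l1, lg2):
    """Arm inertia matrix M_A(Theta_A) for the two-link model (symmetric 2x2)."""
    c = m2 * l1 * lg2 * math.cos(theta_e)  # configuration-dependent coupling term
    return [[I1 + I2 + 2 * c + m2 * l1**2, I2 + c],
            [I2 + c, I2]]

def joint_torque(M, theta_ddot, V):
    """T_A = M_A(Theta_A) Theta_ddot_A + V_A, as a 2-vector [tau_s, tau_e]."""
    return [M[0][0] * theta_ddot[0] + M[0][1] * theta_ddot[1] + V[0],
            M[1][0] * theta_ddot[0] + M[1][1] * theta_ddot[1] + V[1]]
```

Note that $M_A$ is symmetric and its lower-right entry is simply $I_2$, which is a quick sanity check on any implementation.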
Before feeding the spike-firing data and motion data into the decoder, we divided the dataset into a train/validation set and a test set. All trials were aligned to movement onset. For all models, performance was assessed with five-fold cross-validation. For the DNN decoders, to avoid overfitting, we used 80% of the sessions (about 896 trials, each target repeated 112 times) as the train/validation set and 20% of the sessions as the test set (about 224 trials, each target repeated 28 times) [32]. For the linear decoder, we used 80% of the sessions for training and the remaining 20% for testing, since a validation set is not necessary for the linear decoder.
$$ \hat{x}_i = \frac{x_i - \frac{1}{C} \sum_{i=1}^{C} x_i}{\sigma(x_i)} $$
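A small sketch of this z-score normalization across units at a single time bin (the function name is ours; $\sigma$ is taken as the population standard deviation across the $C$ units, per the description in Section 2.7):

```python
import statistics

def zscore_across_units(x):
    """Normalize one time bin's firing vector x (length C) across units."""
    mean = sum(x) / len(x)
    sd = statistics.pstdev(x)  # population standard deviation across units
    return [(v - mean) / sd for v in x]
```

After normalization, the values at each time bin have zero mean across units, which keeps units with very different baseline rates on a comparable scale.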

2.3. Proposed Model

In this study, we employed a deep neural network (DNN) model—referred to as the Single-Direction CNN-LSTM model (SingleNet)—to decode the angular velocity and torque of the monkey’s shoulder and elbow joints. The model architecture comprises multiple CNN-LSTM branches, as illustrated in Figure 2b. A fundamental building block, termed the decoder block, is shown in Figure 2a (note that the activation function of the final layer may vary depending on the specific output). Unlike conventional CNN-LSTM decoders used in previous studies, which typically employ a single decoder block to estimate all output variables simultaneously and share feature extraction layers across all outputs [24,26], the SingleNet adopts a different design. It consists of four pairs of CNN-LSTM branches—amounting to eight decoder blocks in total—each dedicated to independently estimating the parameters for extension and flexion movements corresponding to a single joint variable.
The final output, representing the net joint parameter, is obtained by subtracting the flexion-related component from the extension-related one, as motivated by prior work [33]. To implement this, a subtraction layer is placed after each pair of branches. Furthermore, we apply the ReLU activation function to the final layer of each decoder block, constraining the output to positive values to emulate a single-directional response. This design choice reflects the observation from our previous research that neural encoding patterns differ for opposite movement directions [29]. Because the decoder independently estimates extension and flexion parameters, we refer to it as the single-direction decoder. The internal structure of each CNN-LSTM branch is detailed in Figure 2b, and the full configuration is described in the following section.

2.3.1. Convolutional Layers

Similar to EEGnet [19], we employed two convolutional layers to extract temporal and spatial features from the spike data independently. The first separable convolution layer focuses on temporal feature extraction, capturing the specific time intervals that the network is most responsive to—referred to as the “lag time” [34,35]. The temporally filtered data is then passed to a second separable convolutional layer, which is responsible for extracting spatial features. The learned weights in this layer indicate which neuronal units the model attends to most, i.e., those whose activity exhibits the highest correlation with the target output. The output from the spatial convolution layer is subsequently flattened and passed through a max-pooling operation, after which it is fed into an LSTM layer for temporal sequence modeling.
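The temporal-then-spatial two-stage filtering can be illustrated with a toy pure-Python version. The names and fixed kernels here are hypothetical; real separable convolution layers learn their kernels, use multiple filters, and operate on batched tensors:

```python
def temporal_conv(x, kernel):
    """Convolve each unit's time series with a shared temporal kernel (valid mode).
    x: [units][time]; kernel: list of taps; returns [units][time - K + 1]."""
    K = len(kernel)
    return [[sum(row[t + k] * kernel[k] for k in range(K))
             for t in range(len(row) - K + 1)] for row in x]

def spatial_conv(h, weights):
    """One spatial filter: a weighted sum across units at each time step."""
    return [sum(weights[c] * h[c][t] for c in range(len(h)))
            for t in range(len(h[0]))]
```

In the decoder, the learned temporal taps reflect the lag time the network attends to, and the spatial weights indicate which units drive the output.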

2.3.2. LSTM Layer

LSTM is a special form of recurrent neural network (RNN) introduced by Hochreiter and Schmidhuber in 1997 [36]. When the time sequence is very long, a traditional RNN faces vanishing or exploding gradients [37]; LSTM solves this problem by using multiple gates to control the data flow. Since its introduction, LSTM has been widely used in time-sequence prediction problems, and many studies have used LSTM as a decoder in intracranial BMIs with exciting results. The equations and structure of the LSTM cell are shown in (12)–(17) and Figure 2c, where $f_t$, $i_t$, and $O_t$ denote the forget, input, and output gates at time $t$, and $x_t$, $h_t$, and $C_t$ denote the input data, hidden state, and memory cell state at time $t$; details are given in [36]. In our decoder, an LSTM layer fits the dynamic sequences of the output parameters; together with a fully connected readout layer, it serves as the output layer of our decoder, as in Figure 2a. Since the angular velocity and torque during extension or flexion are always non-negative, we employ the ReLU function [38] as the activation function of the fully connected layer.
$$ f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) $$
$$ i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) $$
$$ \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) $$
$$ C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t $$
$$ O_t = \sigma(W_O \cdot [h_{t-1}, x_t] + b_O) $$
$$ h_t = O_t \odot \tanh(C_t) $$
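A single LSTM step following these equations can be written out for scalar input and state. This is a didactic sketch with our own parameter layout, not the decoder's actual implementation (frameworks vectorize the gates and states):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step for scalar input/state. W and b hold gate parameters keyed
    'f', 'i', 'C', 'O'; each W[k] is a pair (weight on h_{t-1}, weight on x_t)."""
    z = {k: W[k][0] * h_prev + W[k][1] * x_t + b[k] for k in W}
    f_t = sigmoid(z['f'])               # forget gate
    i_t = sigmoid(z['i'])               # input gate
    c_tilde = math.tanh(z['C'])         # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde  # cell state update
    o_t = sigmoid(z['O'])               # output gate
    h_t = o_t * math.tanh(c_t)          # hidden state
    return h_t, c_t
```

With all weights and biases at zero, every gate sits at 0.5, so the cell state simply halves at each step, which makes the gating mechanics easy to trace by hand.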

2.4. Single-Direction CNN-LSTM Decoder

As shown in Figure 2, the Single-Direction CNN-LSTM decoder contains a pair of branches for each output variable. First, two convolution layers extract the spatial and temporal features of the input neural activity independently for the extension and flexion variables:
$$ h_e = \mathrm{ReLU}\bigl(\mathrm{BatchNorm2d}(\mathrm{Conv}(\mathrm{Conv}(x, \theta_e^T), \theta_e^S))\bigr) $$
$$ h_f = \mathrm{ReLU}\bigl(\mathrm{BatchNorm2d}(\mathrm{Conv}(\mathrm{Conv}(x, \theta_f^T), \theta_f^S))\bigr) $$
where $h_e$ and $h_f$ represent the extension and flexion features, respectively. Conv corresponds to the two convolution layers, and $\theta^T$ and $\theta^S$ denote the learnable parameters of the temporal and spatial layers, respectively. $\mathrm{BatchNorm2d}$ represents batch normalization for stable training, and $\mathrm{ReLU}(\cdot)$ is the Rectified Linear Unit. The extracted spatial–temporal features then flow into the output layer, composed of an LSTM and a fully connected layer, which fits the dynamics of the variables and outputs the extension and flexion estimates:
$$ H_{e,k} = \mathrm{ReLU}\bigl(\mathrm{BatchNorm2d}(FC_e(LSTM_{e,k}(h_e)))\bigr) $$
$$ H_{f,k} = \mathrm{ReLU}\bigl(\mathrm{BatchNorm2d}(FC_f(LSTM_{f,k}(h_f)))\bigr) $$
where $H_{e,k}$ and $H_{f,k}$ denote the outputs of the extension and flexion branches at the $k$-th time step, respectively, and $FC_e$ and $FC_f$ represent the fully connected layers of the extension and flexion branches. Since ReLU outputs are always non-negative, the estimated values for extension and flexion are also non-negative. Finally, the net variable is obtained by subtracting the flexion branch output from the extension branch output:
$$ H_{net,k} = H_{e,k} - H_{f,k} $$
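The ReLU constraint and the subtraction layer amount to the following toy sketch (`net_output` and `relu` are our names; the real layers operate on learned branch activations):

```python
def relu(v):
    """Element-wise rectification: branch outputs are clipped to be non-negative."""
    return [max(0.0, a) for a in v]

def net_output(extension_raw, flexion_raw):
    """Net joint variable: rectified extension minus rectified flexion output."""
    H_e, H_f = relu(extension_raw), relu(flexion_raw)
    return [e - f for e, f in zip(H_e, H_f)]
```

Because each branch is non-negative, simultaneous activity in both branches mimics co-contraction while their difference still yields the signed net variable.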

2.5. Baseline Model

2.5.1. Conventional CNN-LSTM Decoder

We compared our single-direction decoder with a conventional CNN-LSTM decoder without branches [26]. The structure of the conventional CNN-LSTM decoder is shown in Figure 2c. A single CNN-LSTM outputs all the parameters (angular velocities and torques) at the same time. To speed up training and keep the loss balanced across the multiple regression outputs, the output series must be scaled, so we applied the same z-score normalization to the output motion data as to the input spike data [39].

2.5.2. Linear Decoder

To give a more complete comparison, we also compared the performance of our single-direction decoder against a linear decoder [29,40]. The linear decoder model is given by Equations (24) and (25), where $X$ is the input spike data with shape $[(B \cdot T) \times C]$, as explained above, and $P$ is the output parameter with shape $[(B \cdot T) \times 4]$. We first calculated $\beta$ on the training dataset and then evaluated the trained decoder on the test set.
$$ P = X\beta + \epsilon $$
$$ \beta = (X^T X)^{-1} X^T P $$
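For intuition, the normal-equations solution can be worked through on a tiny two-feature, one-output example (a sketch with our own names; a practical implementation would use a numerical least-squares solver rather than an explicit matrix inverse):

```python
def fit_linear_decoder(X, P):
    """Solve beta = (X^T X)^{-1} X^T P for one output column, with X: [n][2], P: [n].
    The 2x2 system is inverted in closed form for clarity."""
    # accumulate X^T X (2x2, symmetric) and X^T P (2x1)
    a = sum(x[0] * x[0] for x in X)
    b = sum(x[0] * x[1] for x in X)
    d = sum(x[1] * x[1] for x in X)
    p0 = sum(x[0] * y for x, y in zip(X, P))
    p1 = sum(x[1] * y for x, y in zip(X, P))
    det = a * d - b * b
    return [(d * p0 - b * p1) / det, (a * p1 - b * p0) / det]
```

On data generated exactly as $P = 2x_0 + 3x_1$, the solver recovers $\beta = [2, 3]$.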

2.6. Fine-Tuned and Generalizability Test

In center-out experiments, motion toward a target in a different direction can be regarded as a different sub-task, since the dynamics for each target have different patterns and initial states [41]. We therefore trained the decoder on some targets and tested it on the others to validate its generalizability.
First, we plotted the motion space of each target using the elbow and shoulder variables as the horizontal and vertical coordinates, as shown in Figure 3. From the figure, we chose Target IV and Target V to train our decoder, because the trajectories of these two targets cover the largest area of the motion space with the fewest trajectories. We therefore expect a decoder trained on these two targets to have enough knowledge to estimate the parameters of the other targets. We also pre-trained the decoder on other target combinations to validate this choice.
After training the decoder, we fine-tuned its output layer—including the LSTM layer and the connected fully connected (FC) layer—using a small number of trials from a target with a different movement direction. The fine-tuned decoder was then evaluated on the remaining data from this target [42]. During fine-tuning, the convolutional layers were kept fixed, based on the assumption that they extract general features shared across all directions, while the temporal dynamics captured by the output layers vary depending on the target direction.
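The freeze-and-fine-tune scheme can be caricatured as updating only the output-layer parameter groups while leaving the convolutional features untouched. This is a toy scalar-parameter sketch with hypothetical names; in practice it corresponds to marking the convolutional layers as non-trainable in the deep learning framework:

```python
def sgd_finetune(params, grads, lr=1e-4, trainable=('lstm', 'fc')):
    """One gradient step that updates only the output-layer groups ('lstm', 'fc'),
    keeping the frozen convolutional features ('conv') unchanged."""
    return {name: (value - lr * grads[name] if name in trainable else value)
            for name, value in params.items()}
```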

2.7. Environment and Hyperparameter

The experiments were conducted on a Windows PC running Python 3.10 and TensorFlow 2.10, with an RTX A4000 GPU (16 GB memory). The hyperparameter and layer settings of the decoder are summarized in Table 1. The total number of parameters is 19.5 K for our SingleNet and 9.8 K for the conventional model. After preprocessing, the spike-firing data were z-score normalized as in (12) across all units [32,43], where $\sigma$ denotes the standard deviation across units. A sliding window with a length of 300 ms (200 ms before the motion sample and 100 ms after [44]) cleaved the normalized spike data into segments with the same number of sampling points as the motion data. The input data of the DNN decoder thus has the shape $[B \times T \times C \times L]$, where $B$, $T$, $C$, and $L$ represent the batch size, motion time, number of units, and sliding-window length, respectively; in this study, the input shape is $[32 \times 180 \times 73 \times 30]$. For the linear decoder, we simply concatenated the spike data along the temporal dimension, so its input shape is $[(B \cdot T) \times C]$, which is $[(32 \cdot 180) \times 73]$ in this study [40]. To compare the estimated R² among models, we performed paired two-tailed Student’s t-tests. To control for multiple comparisons across the outputs (shoulder angular velocity, elbow angular velocity, shoulder torque, and elbow torque), p-values were adjusted using the Benjamini–Hochberg false discovery rate (FDR) procedure [45]. Statistical significance was defined as FDR-adjusted p < 0.05.
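The sliding-window segmentation can be sketched as follows. The names and the exact indexing convention are our own; with 10-ms bins, 200 ms before plus 100 ms after each motion sample gives a window length of 30:

```python
def sliding_windows(spikes, n_before=20, n_after=10):
    """Cut unit-by-time spike data (10-ms bins) into per-time-step windows.
    spikes: [units][time]; returns [time][units][window], window = n_before + n_after."""
    C, T = len(spikes), len(spikes[0])
    out = []
    for t in range(n_before, T - n_after):  # keep only time steps with a full window
        out.append([row[t - n_before:t + n_after] for row in spikes])
    return out
```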
To mitigate potential bias from data ordering, all training and testing procedures were performed with five-fold cross-validation. The learning rate was set to $1.0 \times 10^{-3}$ for training and $1.0 \times 10^{-4}$ for fine-tuning. We used a reduce-on-plateau learning rate scheduler that decreased the learning rate by a factor of 0.1 if the validation loss did not improve for 15 epochs. For each target except Target IV and Target V, 20% of the trials (28 trials) were used for fine-tuning the decoder, while the remaining trials were used for testing. We chose the mean squared error (MSE) as the loss function and used Adam [46] to optimize the trainable parameters. Early stopping with a patience of 20 epochs was applied during both training and fine-tuning to prevent overfitting, with a maximum of 1000 epochs. In most cases, training of SingleNet stopped around epoch 63, taking approximately 15 min, while the conventional model stopped around epoch 45, taking approximately 5 min. All data splits were performed randomly. The batch size was set to 32, and the dropout rate was 0.5.
We used the coefficient of determination, R 2 , as a measure of the strength of the linear association between the predicted and the ground-truth kinematic parameters [47]. The definition of R 2 is:
$$ R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} $$
where $y_i$ and $\hat{y}_i$ are the ground truth and the prediction, and $\bar{y}$ is the mean of the ground truth. The larger $R^2$ is, the better the performance.
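The R² metric translates directly to code (the function name is ours):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination between ground truth and prediction."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A perfect prediction yields 1, while predicting the mean of the ground truth yields 0.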

3. Results

In this study, we developed a deep neural network architecture termed the Single-Direction CNN-LSTM Decoder, specifically designed to decode the angular velocity and joint torque of a monkey’s upper limb by modeling extension and flexion movements in isolation. The decoder incorporates parallel CNN-LSTM branches, each of which independently learns spatial and temporal representations corresponding to either extension or flexion from the input neural signals. These learned features are then propagated through an output module composed of an LSTM layer followed by a fully connected (FC) layer, which maps the high-dimensional feature representations to continuous dynamic output sequences. We hypothesize that this modular structure, by explicitly separating the directional dynamics, enhances the model’s capacity to disentangle complex motor representations and thus offers improved generalizability across movement contexts compared with conventional monolithic decoding architectures.
To evaluate the model’s decoding performance, we utilized comprehensive neural spike-firing and kinematic datasets recorded during center-out reaching tasks in a non-human primate. For assessing generalizability, the model was trained exclusively on trials involving Targets IV and V and then evaluated on trials toward the other directions. This cross-target evaluation was designed to mimic transfer learning scenarios. We benchmarked our model against a conventional CNN-LSTM decoder and a baseline linear regression model. Quantitative comparisons demonstrated that our Single-Direction CNN-LSTM Decoder achieved superior generalization performance, highlighting its potential for robust neural decoding in variable motor tasks.

3.1. Validation on All Data

The decoding performance across all targets is shown in Figure 4. Panel (e) presents the R² values for shoulder and elbow angular velocity and torque decoded by the Single-Direction CNN-LSTM model, the conventional CNN-LSTM model, and the Linear Regression (LR) model. The two deep learning-based decoders significantly outperformed the LR model, with improvements of approximately 48%, 41%, 59%, and 76% for shoulder angular velocity, elbow angular velocity, shoulder torque, and elbow torque, respectively (five-fold cross-validation, * p < 0.05, paired t-test, FDR correction).
Hand trajectories reconstructed from shoulder and elbow angular velocities are shown in panels (b–d). These results further demonstrate the poor performance of the LR decoder across all targets. Although the average trajectory for each target is relatively clear in the LR decoder (x: R 2 = 0.578; y: R 2 = 0.586), the trial-by-trial trajectories are highly scattered and difficult to distinguish, especially when compared to the more consistent outputs from the two neural network decoders. This suggests that the LR model lacks stability at the single-trial level.
However, the R² scores of the Single-Direction decoder did not differ significantly from those of the conventional CNN-LSTM decoder trained on all eight targets (p = 0.066, 0.164, 0.121, and 0.202 for shoulder angular velocity, elbow angular velocity, shoulder torque, and elbow torque, respectively). The mean R² values across five-fold cross-validation for both CNN-LSTM models were approximately 0.825–0.855 for angular velocity and 0.650–0.700 for torque.
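The statistics reported above combine per-fold R² scores with paired tests corrected by the Benjamini-Hochberg procedure [45]. Both computations can be sketched in a few lines of NumPy (function names here are illustrative):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fdr_bh(pvals, alpha=0.05):
    """Benjamini-Hochberg FDR: with p-values sorted ascending, reject all
    hypotheses up to the largest k with p_(k) <= (k/m) * alpha."""
    p = np.asarray(pvals, float)
    order = np.argsort(p)
    m = len(p)
    thresh = alpha * (np.arange(1, m + 1) / m)
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject
```

For example, comparing two decoders' per-fold R² values with a paired test per output variable yields four p-values, which are then screened jointly by `fdr_bh` before declaring significance.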

3.2. Generalizability

To justify our selection of Targets IV and V for decoder pre-training, we compared this choice against two alternative pre-training configurations: Targets I and II, and Targets VII and VIII. For each configuration, the decoder was pre-trained on the designated targets, then fine-tuned and tested on the remaining targets. The test set size was consistent with the description in the preceding section. The resulting mean R² values for the output motions on the fine-tuned targets are presented in Table 2 for all three pre-training combinations.
As shown in Table 2, the pre-training combination influenced decoding performance across different output variables. For shoulder and elbow angular velocity, the decoder pre-trained on Targets VII and VIII achieved the highest mean R² values (shoulder: 0.827; elbow: 0.740). However, this same combination yielded the lowest mean R² for shoulder and elbow torque. A similar pattern was observed for the decoder pre-trained on Targets I and II, which performed well on some variables but poorly on others. In contrast, the decoder pre-trained on Targets IV and V, while not achieving the highest performance on any single variable, produced the most consistent and balanced results across all output measures. This suggests that pre-training on Targets IV and V offers the best trade-off, resulting in a decoder with stable performance and strong generalizability across different movement variables.
In this experiment, we first used all the data from Target IV and Target V to pre-train a decoder. We then fine-tuned the decoder for each of the other targets individually, using 20% of their data (approximately 28 trials per target). Finally, we evaluated the fine-tuned decoder on the remaining data. Each fine-tuning process was conducted for 100 epochs. The test results for the six other targets are presented in Figure 5.
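The 20%/80% per-target split used for fine-tuning and testing can be sketched as follows. This is a hypothetical helper assuming a random split; the paper does not specify the trial-selection procedure beyond the proportions.

```python
import numpy as np

def split_finetune(trials, frac=0.2, seed=0):
    """Split one target's trials into a fine-tuning set (~20%, about 28 of
    ~140 trials per target here) and a held-out test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(trials))
    n_ft = int(round(frac * len(trials)))
    return [trials[i] for i in idx[:n_ft]], [trials[i] for i in idx[n_ft:]]
```

A fixed seed makes the split reproducible across the five cross-validation folds.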
From the figure, it is evident that the deep learning-based decoder achieved strong performance across most output parameters for each target. Although the Single-Direction decoder we developed showed only minor differences from the conventional CNN-LSTM model when validated on the full dataset, the fine-tuned model for each target clearly outperformed the conventional CNN-LSTM decoder in most cases (five-fold cross-validation, p < 0.05, paired t-test, FDR correction), particularly for Targets II and VI. For Target II, the mean R² values of the fine-tuned model were approximately 15.1%, 48.7%, and 13.6% higher than those of the conventional CNN-LSTM decoder for shoulder angular velocity, elbow angular velocity, and shoulder torque, respectively. For Target VI, the corresponding improvements for the four output variables were 10.6%, 22.9%, 17.8%, and 43.9%.

3.3. Ablation Study

To assess the applicability of our single-direction structure, we conducted an ablation study using the same validation approach. In this study, we compared our Single-Direction CNN-LSTM decoder with two variant models: (i) a model in which the feature extraction layers (CNN and LSTM) are shared while the output layers are separated (referred to as ‘SharedNet’); and (ii) a model in which the activation function of the output layers is replaced with a linear function (referred to as ‘LinearNet’). Following the procedure described in the previous subsection, each model was pre-trained on Targets IV and V, then fine-tuned and tested on the remaining targets. For each model and target, we performed five-fold cross-validation and compared the mean R² across outputs. The results of the ablation study are shown in Table 3, where ‘SingleNet’ denotes our proposed Single-Direction CNN-LSTM model.
AV_s and AV_e denote the angular velocities of the shoulder and elbow, while T_s and T_e represent the corresponding torques. The table clearly demonstrates the effectiveness of our single-direction decoder structure. The proposed Single-Direction CNN-LSTM model consistently achieves the highest or comparable mean R² values for most targets. Notably, for Target VI, it surpasses both baseline models across all motion variables (shoulder and elbow angular velocities and torques), outperforming LinearNet by an average of 0.068 and SharedNet by 0.128.

3.4. Features Extracted by the Decoder

To further examine the features extracted by the Single-Direction decoder, we analyzed the weights of the two CNN layers [48], which reflect spatial and temporal representations, as shown in Figure 6 and Figure 7. In Figure 6, the black diagonal line indicates equal weights in the extension and flexion directions. The results show that most units exhibit distinct weights between these two directions, and this trend varies across output variables. Specifically, for elbow torque, many units display identical or nearly identical weights in both directions, whereas for shoulder torque and shoulder angular velocity, a larger number of units tend to exhibit stronger weights in either flexion or extension. Similarly, Figure 7 illustrates that temporal weights also differ across output variables as well as between directions. These results on spatial and temporal features suggest that the encoding and decoding patterns differ across variables and directions, which is consistent with our hypothesis as well as with findings from our previous research [29].
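As a rough illustration of the comparison in Figure 6, each unit's weight magnitudes in the extension and flexion branches can be reduced to a signed preference score. This is a hypothetical summary statistic for illustration, not the exact analysis used in the paper.

```python
import numpy as np

def direction_preference(w_ext, w_flex):
    """Per-unit flexion-minus-extension weight magnitude.
    Positive: flexion-dominant unit; negative: extension-dominant;
    near zero: unit lies on the diagonal of equal weights."""
    a = np.abs(np.asarray(w_ext, dtype=float))
    b = np.abs(np.asarray(w_flex, dtype=float))
    return b - a
```

Units far from zero on this score correspond to points far from the black diagonal line in Figure 6.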

3.5. Estimation of Co-Contraction Increases Generalizability

The above results indicate that the CNN-LSTM-based Single-Direction model achieves superior generalizability compared with both the conventional CNN-LSTM decoder and the linear regression model. This advantage may stem from the model's implicit estimation of muscle co-contraction, a physiological phenomenon in which agonist and antagonist muscles are simultaneously activated. This mechanism is crucial for primates and humans to maintain upper-limb stability and adapt to different environments. For instance, when lifting a cup filled with water, the movement trajectory is largely similar to that of lifting an empty cup; however, both agonist and antagonist muscles must contract simultaneously and with comparable amplitude to achieve the motion goal while ensuring hand stability. Previous studies have shown that such co-contraction information is encoded in neural activity within the M1 region [49]. Nevertheless, conventional decoders struggle to extract and utilize this information independently.
By independently decoding motion variables in extension and flexion, the Single-Direction model can more accurately capture the nuanced neural representations underlying different movement directions, which likely contributes to its superior performance across tasks. To estimate the co-contraction of the monkey’s shoulder and elbow from the decoder output, we first extract the predicted extension and flexion torques from each branch. Since both flexion and extension variables are non-negative, we define the co-contraction torque of the shoulder and elbow as the minimum of the two, as illustrated below:
τ_net = τ_extension − τ_flexion
τ_co-contraction = min(τ_extension, τ_flexion)
Figure 8 presents the results. The Single-Direction decoder can independently estimate extension and flexion variables, enabling the identification of periods when agonist and antagonist muscles are simultaneously activated, as indicated by the red line. Importantly, this red line—representing the co-contraction torque—peaks when the net torque equals zero, indicating that although the joint motion remains unchanged, the joint stiffness increases. Such a phenomenon cannot be captured by the conventional CNN-LSTM or linear regression decoders, as they do not separately model extension and flexion variables. While our approach successfully distinguished between flexion and extension movements, it is critical to emphasize that we have not yet definitively proven that these decoded patterns correspond to actual joint kinematics, nor that the predicted co-contraction torque precisely matches the true physiological torque. Obtaining ground-truth measurements for these parameters in vivo is inherently challenging. Consequently, these findings should be regarded as preliminary, warranting further validation through dedicated physiological experiments.
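The co-contraction estimate defined above reduces to an elementwise minimum over the two non-negative branch outputs; a minimal NumPy sketch:

```python
import numpy as np

def net_and_cocontraction(tau_ext, tau_flex):
    """Net torque = extension - flexion; co-contraction torque = elementwise
    minimum of the two non-negative directional torques."""
    tau_ext = np.asarray(tau_ext, dtype=float)
    tau_flex = np.asarray(tau_flex, dtype=float)
    assert (tau_ext >= 0).all() and (tau_flex >= 0).all(), "branch outputs are non-negative"
    return tau_ext - tau_flex, np.minimum(tau_ext, tau_flex)
```

Note that the minimum peaks exactly where the extension and flexion torques are equal, i.e., where the net torque crosses zero, matching the behavior of the red line in Figure 8.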

4. Discussion

With the rapid advancement of deep learning, neural decoders have shown impressive performance. Yet, their limited generalizability across different workspaces remains a critical challenge. Our Single-Direction CNN-LSTM model, which independently estimates extension and flexion through multiple branches, addresses this issue by capturing finer motor details such as joint co-contraction. This design appears to improve adaptability compared with conventional CNN-LSTM or linear regression decoders. In comparison with previous studies that decoded joint angle variables from motor cortex activity using LSTM models and reported an average R² of approximately 0.8 [21,26], our decoder demonstrates superior performance and is further capable of estimating joint torque simultaneously. Although in some cases there is no substantial difference between the R² values of the fine-tuned conventional CNN-LSTM and the Single-Direction decoder, the Single-Direction decoder does not perform notably worse. In contrast, the pre-trained linear regression decoder demonstrated limited generalizability across the other six targets, with R² values consistently lower than zero. It should be emphasized that although our Single-Direction decoder exhibits good generalizability across targets, the hypothesized encoding mechanism mentioned in the Introduction cannot be considered validated without rigorous physiological experiments.
The results confirm that the Single-Direction decoder not only outperforms linear regression but also shows advantages over the conventional CNN-LSTM in fine-tuning tests. Nevertheless, four issues merit further discussion:

4.1. Limitation of Fine-Tuning

In this study, we first pre-trained a decoder using data from Targets IV and V, and then fine-tuned the model’s output layer on the remaining targets. As a result, the model trained on only two targets was able to generalize to others. However, fine-tuning still requires a sufficient amount of data, which is often limited in real clinical scenarios, and may also suffer from an overly biased pre-trained model [50,51].
Currently, many approaches have been proposed to address this challenge through few-shot learning techniques, such as meta-learning [50], domain adaptation [52], and metric learning [53], which transfer knowledge across multiple tasks to a new one. Therefore, validating our Single-Direction model within these frameworks is crucial for its practical application in BMI systems.

4.2. Limitation of Data

We used pseudo-population neural spike data from a single macaque to decode shoulder and elbow angular velocities and torques. Although the recordings—collected across multiple days—produced promising results, they do not demonstrate generalizability across subjects. Moreover, because no online tests were performed, the decoder’s performance in real-time BMI applications remains uncertain. To address these limitations, our next step is to validate the decoder with online experiments in multiple subjects using a variety of paradigms.

4.3. Musculoskeletal Model

To calculate the joint torque, we employed a musculoskeletal motion model of the monkey’s upper limb. However, accurately measuring limb length and mass is challenging, as these measurements are often subject to errors and artifacts. Consequently, obtaining precise variables for the musculoskeletal model is difficult, which in turn affects the performance of data-driven decoders.
To address this issue, one potential solution is to generate synthetic movement data based on noisy measurements to augment the training set [54]. In future work, we aim to explore data augmentation techniques to mitigate measurement inaccuracies in limb variables, thereby enhancing the overall performance of the decoder.
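One simple form of this augmentation idea is to resample musculoskeletal parameters within an assumed measurement-noise band before generating synthetic training data. The 5% relative noise below is a hypothetical value chosen for illustration, not one taken from the paper.

```python
import numpy as np

def augment_params(length, mass, n=100, rel_sd=0.05, seed=0):
    """Sample n perturbed (limb length, mass) pairs around the nominal
    measurements, reflecting an assumed relative measurement uncertainty."""
    rng = np.random.default_rng(seed)
    lengths = length * (1.0 + rel_sd * rng.standard_normal(n))
    masses = mass * (1.0 + rel_sd * rng.standard_normal(n))
    return lengths, masses
```

Each sampled parameter pair would then drive the musculoskeletal model to produce one synthetic torque trace for the training set.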

4.4. Co-Contraction Judgment

We operationally define the co-contraction torque as the minimum of the estimated extension and flexion torques, as expressed in (27). This definition is motivated by the following observations: (1) both extension and flexion torques are inherently non-negative, (2) their difference approximates the net torque with reasonable accuracy, and (3) the derived co-contraction tends to peak when the net torque is close to zero. Nevertheless, this measure should be regarded as a proxy rather than a definitive estimate of co-contraction. A rigorous validation would require comparison with ground-truth quantitative co-contraction data, which remains difficult to acquire [55]. As a next step, we plan to validate our predicted co-contraction torque using EMG signals recorded from the monkey’s upper-limb muscles.

5. Future Work and Limitations

In this study, we proposed a Single-Direction CNN-LSTM architecture in which the extension and flexion of each joint were estimated separately, with the final net movement derived by subtraction. The results demonstrate that, compared to previous conventional models that rely on shared feature extraction layers, our model achieves a higher R² across different targets. We interpret this improvement as evidence of enhanced generalizability across diverse workspaces, suggesting better suitability for free-movement applications. Although these findings are promising, substantial work remains before such an approach can be developed into a truly general-purpose BMI suitable for unconstrained, real-world use.

5.1. Subject-Specific Bias

While this study focuses on task generalizability rather than cross-subject generalizability (largely because data were available from only a single monkey), we acknowledge that cross-subject universality remains a major challenge for practical BMI devices. Due to the high instability and inter-subject variability of neural signals, a decoder fine-tuned on one individual may generalize poorly to others [4]. A general decoder model is therefore needed for real-world applications. Recent studies have employed transfer learning to improve inter-subject generalization and reduce the gap between training and test subjects [48,56]. Given that our decoder was fine-tuned across different task targets, its subject-specific bias and generalizability across individuals remain to be validated in future work.

5.2. Scalability to Human BMI

While our study demonstrates promising results in non-human primates, we recognize that scaling the approach to human brain–machine interfaces (BMI) presents additional challenges, including differences in signal fidelity, electrode stability, and neural plasticity [57]. However, recent advances in cross-species transfer learning suggest that these challenges may be surmountable. For instance, a study by Wang et al. [58] successfully detected epileptic seizures in humans using a model initially trained on canine data, achieved through dataset alignment techniques. This finding provides preliminary evidence that a decoder pre-trained on animal neural activity could potentially be adapted for human application, provided that appropriate data alignment and transfer learning strategies are employed.

5.3. Musculoskeletal Model Sensitivity

As noted in the Introduction, decoding joint torque is critical for rehabilitation devices such as exoskeletons, and our study provides insights that could inform the development of future BMI-controlled exoskeletons. However, a major limitation of joint torque decoding lies in the uncertainty inherent to musculoskeletal models. Specifically, certain parameters of the upper limb, such as muscle mass, are difficult or impossible to measure directly in vivo. These unobservable or noisy parameters can introduce errors that degrade decoder performance. Recent studies have explored strategies to mitigate this issue, including physics-informed neural networks (PINNs) [59] and the use of deep neural networks to approximate or replace traditional musculoskeletal models [60]. Incorporating such approaches to enhance model robustness represents an important direction for our future work.

6. Conclusions

This study demonstrates that a Single-Direction CNN-LSTM model, guided by prior neuroscience knowledge, can effectively decode upper-limb joint movements with high generalizability. Using only limited training targets, the model achieves comparable performance to conventional CNN-LSTM approaches, while its strength may lie in predicting joint co-contraction. These findings suggest that incorporating neuroscience principles into brain–machine interface design provides a promising pathway toward developing universal and robust BMI systems.

Author Contributions

Conceptualization, E.M. and O.F.; methodology, H.-Y.O. and T.H.; software, H.-Y.O.; validation, H.-Y.O.; formal analysis, H.-Y.O.; investigation, H.-Y.O.; data curation, T.H. and E.M.; resources, E.M.; writing—original draft preparation, H.-Y.O.; writing—review and editing, O.F., E.M., H.-Y.O. and T.H.; visualization, H.-Y.O.; supervision, E.M.; project administration, E.M.; funding acquisition, E.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Japan Agency for Medical Research and Development (AMED) under Grant Numbers JP23ym0126812 and JP24ym0126812.

Institutional Review Board Statement

The animal experiment was approved by the Animal Experimentation Committee of the National Institute for Physiological Sciences. All animal procedures were conducted in accordance with the Guide for the Care and Use of Laboratory Animals by the institute.

Informed Consent Statement

Not applicable.

Data Availability Statement

Available upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lebedev, M.A.; Nicolelis, M.A. Brain–machine interfaces: Past, present and future. TRENDS Neurosci. 2006, 29, 536–546. [Google Scholar] [CrossRef]
  2. Gao, X.; Wang, Y.; Chen, X.; Gao, S. Interface, interaction, and intelligence in generalized brain–computer interfaces. Trends Cogn. Sci. 2021, 25, 671–684. [Google Scholar] [CrossRef]
  3. Dong, Y.; Wang, S.; Huang, Q.; Berg, R.W.; Li, G.; He, J. Neural decoding for intracortical brain–computer interfaces. Cyborg Bionic Syst. 2023, 4, 0044. [Google Scholar] [CrossRef]
  4. Orban, M.; Elsamanty, M.; Guo, K.; Zhang, S.; Yang, H. A review of brain activity and EEG-based brain–computer interfaces for rehabilitation application. Bioengineering 2022, 9, 768. [Google Scholar] [CrossRef]
  5. Abdullah; Faye, I.; Islam, M.R. EEG channel selection techniques in motor imagery applications: A review and new perspectives. Bioengineering 2022, 9, 726. [Google Scholar] [CrossRef]
  6. Wu, X.; Metcalfe, B.; He, S.; Tan, H.; Zhang, D. A review of Motor Brain-Computer interfaces using Intracranial Electroencephalography based on surface electrodes and depth electrodes. IEEE Trans. Neural Syst. Rehabil. Eng. 2024, 32, 2408–2431. [Google Scholar] [CrossRef]
  7. Anjum, M.; Sakib, N.; Islam, M.K. Effect of artifact removal on EEG based motor imagery BCI applications. In Proceedings of the Fourth International Conference on Computer Vision and Information Technology (CVIT 2023); SPIE: Bellingham, WA, USA, 2024; Volume 12984, pp. 69–78. [Google Scholar]
  8. Abu-Rmileh, A.; Zakkay, E.; Shmuelof, L.; Shriki, O. Co-adaptive training improves efficacy of a multi-day EEG-based motor imagery BCI training. Front. Hum. Neurosci. 2019, 13, 362. [Google Scholar] [CrossRef] [PubMed]
  9. Altuwaijri, G.A.; Muhammad, G. Electroencephalogram-based motor imagery signals classification using a multi-branch convolutional neural network model with attention blocks. Bioengineering 2022, 9, 323. [Google Scholar] [CrossRef]
  10. Zhu, D.; Bieger, J.; Garcia Molina, G.; Aarts, R.M. A survey of stimulation methods used in SSVEP-based BCIs. Comput. Intell. Neurosci. 2010, 2010, 702357. [Google Scholar] [CrossRef] [PubMed]
  11. Lin, Z.; Zhang, C.; Wu, W.; Gao, X. Frequency recognition based on canonical correlation analysis for SSVEP-based BCIs. IEEE Trans. Biomed. Eng. 2006, 53, 2610–2614. [Google Scholar] [CrossRef] [PubMed]
  12. Gaur, P.; Pachori, R.B.; Wang, H.; Prasad, G. An empirical mode decomposition based filtering method for classification of motor-imagery EEG signals for enhancing brain-computer interface. In Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN); IEEE: New York, NY, USA, 2015; pp. 1–7. [Google Scholar]
  13. Mushtaq, F.; Welke, D.; Gallagher, A.; Pavlov, Y.G.; Kouara, L.; Bosch-Bayard, J.; van den Bosch, J.J.; Arvaneh, M.; Bland, A.R.; Chaumon, M.; et al. One hundred years of EEG for brain and behaviour research. Nat. Hum. Behav. 2024, 8, 1437–1443. [Google Scholar] [CrossRef]
  14. Grobbelaar, M.; Phadikar, S.; Ghaderpour, E.; Struck, A.F.; Sinha, N.; Ghosh, R.; Ahmed, M.Z.I. A survey on denoising techniques of electroencephalogram signals using wavelet transform. Signals 2022, 3, 577–586. [Google Scholar] [CrossRef]
  15. Buzsáki, G.; Anastassiou, C.A.; Koch, C. The origin of extracellular fields and currents—EEG, ECoG, LFP and spikes. Nat. Rev. Neurosci. 2012, 13, 407–420. [Google Scholar] [CrossRef]
  16. Georgopoulos, A.P.; Schwartz, A.B.; Kettner, R.E. Neuronal population coding of movement direction. Science 1986, 233, 1416–1419. [Google Scholar] [CrossRef]
  17. Wu, W.; Gao, Y.; Bienenstock, E.; Donoghue, J.P.; Black, M.J. Bayesian population decoding of motor cortical activity using a Kalman filter. Neural Comput. 2006, 18, 80–118. [Google Scholar] [CrossRef] [PubMed]
  18. Autthasan, P.; Chaisaen, R.; Sudhawiyangkul, T.; Rangpong, P.; Kiatthaveephong, S.; Dilokthanakul, N.; Bhakdisongkhram, G.; Phan, H.; Guan, C.; Wilaiprasitporn, T. MIN2Net: End-to-end multi-task learning for subject-independent motor imagery EEG classification. IEEE Trans. Biomed. Eng. 2021, 69, 2105–2118. [Google Scholar] [CrossRef] [PubMed]
  19. Lawhern, V.J.; Solon, A.J.; Waytowich, N.R.; Gordon, S.M.; Hung, C.P.; Lance, B.J. EEGNet: A compact convolutional neural network for EEG-based brain–computer interfaces. J. Neural Eng. 2018, 15, 056013. [Google Scholar] [CrossRef] [PubMed]
  20. Schirrmeister, R.T.; Springenberg, J.T.; Fiederer, L.D.J.; Glasstetter, M.; Eggensperger, K.; Tangermann, M.; Hutter, F.; Burgard, W.; Ball, T. Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp. 2017, 38, 5391–5420. [Google Scholar] [CrossRef] [PubMed]
  21. Glaser, J.I.; Benjamin, A.S.; Chowdhury, R.H.; Perich, M.G.; Miller, L.E.; Kording, K.P. Machine learning for neural decoding. eneuro 2020, 7. [Google Scholar] [CrossRef]
  22. Willett, F.R.; Kunz, E.M.; Fan, C.; Avansino, D.T.; Wilson, G.H.; Choi, E.Y.; Kamdar, F.; Glasser, M.F.; Hochberg, L.R.; Druckmann, S.; et al. A high-performance speech neuroprosthesis. Nature 2023, 620, 1031–1036. [Google Scholar] [CrossRef]
  23. Chandrasekaran, S.; Wandelt, S.K.; Jangam, A.; Elias, Z.; Ibroci, E.; Maffei, C.; Rosenthal, I.A.; Ramdeo, R.; Kim, J.W.; Xu, J.; et al. Restoring Cortically Mediated Movement and Sensation in Complete Tetraplegia. medRxiv 2025. [Google Scholar] [CrossRef]
  24. Śliwowski, M.; Martin, M.; Souloumiac, A.; Blanchart, P.; Aksenova, T. Decoding ECoG signal into 3D hand translation using deep learning. J. Neural Eng. 2022, 19, 026023. [Google Scholar] [CrossRef]
  25. Ji, C. Explainable mst-ecognet decode visual information from ecog signal. arXiv 2024, arXiv:2411.16165. [Google Scholar]
  26. Xie, Z.; Schwartz, O.; Prasad, A. Decoding of finger trajectory from ECoG using deep learning. J. Neural Eng. 2018, 15, 036009. [Google Scholar] [CrossRef]
  27. Scott, S.H.; Kalaska, J.F. Reaching movements with similar hand paths but different arm orientations. I. Activity of individual cells in motor cortex. J. Neurophysiol. 1997, 77, 826–852. [Google Scholar] [CrossRef]
  28. Dey, S.; Yoshida, T.; Foerster, R.H.; Ernst, M.; Schmalz, T.; Carnier, R.M.; Schilling, A.F. A hybrid approach for dynamically training a torque prediction model for devising a human-machine interface control strategy. arXiv 2021, arXiv:2110.03085. [Google Scholar] [CrossRef]
  29. Miyashita, E.; Sakaguchi, Y. State variables of the arm may be encoded by single neuron activity in the monkey motor cortex. IEEE Trans. Ind. Electron. 2015, 63, 1943–1952. [Google Scholar] [CrossRef]
  30. Tian, K.; Zhao, S.; Zhang, Y.; Yu, S. Multi-dimensional Neural Decoding with Orthogonal Representations for Brain-Computer Interfaces. arXiv 2025, arXiv:2508.08681. [Google Scholar] [CrossRef]
  31. Georgopoulos, A.P.; Kalaska, J.F.; Caminiti, R.; Massey, J.T. On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex. J. Neurosci. 1982, 2, 1527–1537. [Google Scholar] [CrossRef]
  32. Ahmadi, N.; Constandinou, T.G.; Bouganis, C.S. Robust and accurate decoding of hand kinematics from entire spiking activity using deep learning. J. Neural Eng. 2021, 18, 026011. [Google Scholar] [CrossRef] [PubMed]
  33. Meattini, R.; Chiaravalli, D.; Biagiotti, L.; Palli, G.; Melchiorri, C. Combining unsupervised muscle co-contraction estimation with bio-feedback allows augmented kinesthetic teaching. IEEE Robot. Autom. Lett. 2021, 6, 6180–6187. [Google Scholar] [CrossRef]
  34. Nijhawan, R. Neural delays, visual motion and the flash-lag effect. Trends Cogn. Sci. 2002, 6, 387–393. [Google Scholar] [CrossRef]
  35. Awasthi, P.; Lin, T.H.; Bae, J.; Miller, L.E.; Danziger, Z.C. Validation of a non-invasive, real-time, human-in-the-loop model of intracortical brain-computer interfaces. J. Neural Eng. 2022, 19, 056038. [Google Scholar] [CrossRef]
  36. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  37. Noh, S.H. Analysis of gradient vanishing of RNNs and performance comparison. Information 2021, 12, 442. [Google Scholar] [CrossRef]
  38. Schmidt-Hieber, J. Nonparametric regression using deep neural networks with ReLU activation function. Ann. Statist. 2020, 48, 1875–1897. [Google Scholar]
  39. Bhanja, S.; Das, A. Impact of data normalization on deep neural network for time series forecasting. arXiv 2018, arXiv:1812.05519. [Google Scholar]
  40. Haghi, B.; Aflalo, T.; Kellis, S.; Guan, C.; Gamez de Leon, J.A.; Huang, A.Y.; Pouratian, N.; Andersen, R.A.; Emami, A. Enhanced control of a brain–computer interface by tetraplegic participants via neural-network-mediated feature extraction. Nat. Biomed. Eng. 2024, 9, 917–934. [Google Scholar] [CrossRef]
  41. Pandarinath, C.; O’Shea, D.J.; Collins, J.; Jozefowicz, R.; Stavisky, S.D.; Kao, J.C.; Trautmann, E.M.; Kaufman, M.T.; Ryu, S.I.; Hochberg, L.R.; et al. Inferring single-trial neural population dynamics using sequential auto-encoders. Nat. Methods 2018, 15, 805–815. [Google Scholar] [CrossRef]
  42. Zhang, L.; Soselia, D.; Wang, R.; Gutierrez-Farewik, E.M. Lower-limb joint torque prediction using LSTM neural networks and transfer learning. IEEE Trans. Neural Syst. Rehabil. Eng. 2022, 30, 600–609. [Google Scholar] [CrossRef]
  43. Shah, S.; Haghi, B.; Kellis, S.; Bashford, L.; Kramer, D.; Lee, B.; Liu, C.; Andersen, R.; Emami, A. Decoding kinematics from human parietal cortex using neural networks. In Proceedings of the 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER); IEEE: New York, NY, USA, 2019; pp. 1138–1141. [Google Scholar]
  44. De Feo, V.; Boi, F.; Safaai, H.; Onken, A.; Panzeri, S.; Vato, A. State-dependent decoding algorithms improve the performance of a bidirectional bmi in anesthetized rats. Front. Neurosci. 2017, 11, 269. [Google Scholar] [CrossRef] [PubMed]
  45. Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 1995, 57, 289–300. [Google Scholar] [CrossRef]
  46. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  47. Rao, C.R.; Rao, C.R.; Statistiker, M.; Rao, C.R.; Rao, C.R. Linear Statistical Inference and Its Applications; Wiley: New York, NY, USA, 1973; Volume 2. [Google Scholar]
  48. Peterson, S.M.; Steine-Hanson, Z.; Davis, N.; Rao, R.P.; Brunton, B.W. Generalized neural decoders for transfer learning across participants and recording modalities. J. Neural Eng. 2021, 18, 026014. [Google Scholar] [CrossRef]
  49. Churchland, M.M.; Shenoy, K.V. Temporal complexity and heterogeneity of single-neuron activity in premotor and motor cortex. J. Neurophysiol. 2007, 97, 4235–4257. [Google Scholar] [CrossRef]
  50. Chen, X.; Fu, Z.; Zhang, P.; Chen, X.; Huang, J. Intracortical Brain-Machine Interfaces with High-Performance Neural Decoding through Efficient Transfer Meta-learning. IEEE Trans. Biomed. Eng. 2025, 73, 518–529. [Google Scholar] [CrossRef] [PubMed]
  51. Iman, M.; Arabnia, H.R.; Rasheed, K. A review of deep transfer learning and recent advancements. Technologies 2023, 11, 40. [Google Scholar] [CrossRef]
  52. Hong, X.; Zheng, Q.; Liu, L.; Chen, P.; Ma, K.; Gao, Z.; Zheng, Y. Dynamic joint domain adaptation network for motor imagery classification. IEEE Trans. Neural Syst. Rehabil. Eng. 2021, 29, 556–565. [Google Scholar] [CrossRef]
  53. Lee, D.Y.; Lee, M.; Lee, S.W. Decoding imagined speech based on deep metric learning for intuitive BCI communication. IEEE Trans. Neural Syst. Rehabil. Eng. 2021, 29, 1363–1374. [Google Scholar] [CrossRef] [PubMed]
  54. Halmich, C.; Höschler, L.; Schranz, C.; Borgelt, C. Data augmentation of time-series data in human movement biomechanics: A scoping review. PLoS ONE 2025, 20, e0327038. [Google Scholar] [CrossRef]
  55. Banks, C.L.; Huang, H.J.; Little, V.L.; Patten, C. Electromyography exposes heterogeneity in muscle co-contraction following stroke. Front. Neurol. 2017, 8, 699. [Google Scholar] [CrossRef]
  56. Kumar, S.; Alawieh, H.; Racz, F.S.; Fakhreddine, R.; Millán, J.d.R. Transfer learning promotes acquisition of individual BCI skills. PNAS Nexus 2024, 3, pgae076. [Google Scholar] [CrossRef]
  57. Chen, S.; Chen, M.; Wang, X.; Liu, X.; Liu, B.; Ming, D. Brain–computer interfaces in 2023–2024. Brain-x 2025, 3, e70024. [Google Scholar] [CrossRef]
  58. Wang, Z.; Li, S.; Wu, D. Canine EEG helps human: Cross-species and cross-modality epileptic seizure detection via multi-space alignment. Natl. Sci. Rev. 2025, 12, nwaf086. [Google Scholar] [CrossRef] [PubMed]
  59. Shi, Y.; Ma, S.; Zhao, Y.; Shi, C.; Zhang, Z. A physics-informed low-shot adversarial learning for semg-based estimation of muscle force and joint kinematics. IEEE J. Biomed. Health Inform. 2023, 28, 1309–1320. [Google Scholar] [CrossRef] [PubMed]
  60. Kim, Y.; Kim, C.; Hwangbo, J. Learning forward dynamics model and informed trajectory sampler for safe quadruped navigation. arXiv 2022, arXiv:2204.08647. [Google Scholar] [CrossRef]
Figure 1. The experimental schematic is as follows: first, we compiled neural data from a center-out reaching task to form a pseudo-population. Each daily session provided recordings from one or two neurons simultaneously and 10 trials for each target. Due to the repetitive nature of the task, we assumed that the response properties of individual neurons were consistent across recording days. Therefore, we aggregated all recorded neurons across all sessions to construct a high-dimensional, pseudo-population spike dataset. This aggregated dataset was subsequently divided into training and test sets and fed into the decoder for evaluation. u_n: nth recorded neuron; θ_s: rotation angle of the shoulder; θ_e: rotation angle of the elbow.
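The pseudo-population construction described in Figure 1 can be sketched as follows. This is a minimal illustration only: the array sizes, the Poisson spike counts, and the variable names are hypothetical placeholders, not the authors' pipeline; the key step is stacking session-wise single-neuron recordings along a new neuron axis under the stable-response assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration: 73 sessions, each recording one
# neuron over 10 trials of the same target, with spikes in 30 time bins.
n_sessions, n_trials, n_bins = 73, 10, 30

# One binned spike-count array (trials x time bins) per daily session.
sessions = [rng.poisson(2.0, size=(n_trials, n_bins)) for _ in range(n_sessions)]

# Assuming neuron responses are stable across days, trial k of every
# session is treated as if recorded simultaneously, so stacking sessions
# along a new axis yields a pseudo-population: trials x neurons x bins.
pseudo_population = np.stack(sessions, axis=1)

print(pseudo_population.shape)  # (10, 73, 30)
```

The resulting tensor can then be split into training and test trials before being passed to the decoder.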
Figure 2. Schematic of the decoder. (a) Structure of the decoder block, the basic building block of this study. (b) Structure of our Single-Direction CNN-LSTM. For each output motion variable, two symmetrical decoder-block branches represent extension and flexion, respectively; the final net variable is obtained by subtracting the outputs of these two blocks. (c) Structure of the conventional CNN-LSTM. In contrast to the Single-Direction model, a single decoder block estimates all motion variables simultaneously. Av: angular velocity; Tor: torque.
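The branch-subtraction scheme of Figure 2b can be sketched minimally. The two linear readouts below (`w_ext`, `w_flx`) and the ReLU-style clipping are hypothetical stand-ins for the full extension/flexion CNN-LSTM blocks; only the subtraction and co-contraction reading reflect the scheme described in the figure.

```python
import numpy as np

rng = np.random.default_rng(1)

n_neurons, n_bins = 73, 30
spikes = rng.poisson(2.0, size=(n_neurons, n_bins)).astype(float)

# Hypothetical linear readouts standing in for the extension and flexion
# CNN-LSTM blocks; the clipping mirrors each branch's non-negative output.
w_ext = rng.normal(size=(n_neurons, n_bins))
w_flx = rng.normal(size=(n_neurons, n_bins))

ext_drive = max(float(np.sum(w_ext * spikes)), 0.0)  # extension branch
flx_drive = max(float(np.sum(w_flx * spikes)), 0.0)  # flexion branch

# The net motion variable (e.g., shoulder angular velocity) is the
# difference of the two directional outputs; their overlap reflects
# the co-contraction level.
net_output = ext_drive - flx_drive
co_contraction = min(ext_drive, flx_drive)
print(net_output, co_contraction)
```

Because each branch is constrained to one direction, the common component of the two outputs remains available as a co-contraction signal, which a single monolithic decoder head would not expose.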
Figure 3. Motion space of all eight targets. Faint pink lines show the trajectories of individual trials, while the red line shows the mean trajectory across all trials for each target. (Upper row) motion space of angular velocities; (lower row) motion space of joint torques.
Figure 4. Trajectories for all eight targets based on angular velocities estimated by the three decoders. Solid lines show the average trajectory per target; faint lines show individual trial trajectories, with colors distinguishing targets. (a) Ground-truth trajectories recorded during the experiment; the x- and y-axes give hand position coordinates in the plane (unit: mm), a convention that applies to all subsequent figures showing hand positions. (b) Trajectories reconstructed by the Single-Direction model; (c) the conventional CNN-LSTM model; (d) the linear regression (LR) model. (e) R² scores for each output variable across the three decoders. Av_s: shoulder angular velocity; Av_e: elbow angular velocity; T_s: shoulder torque; T_e: elbow torque; Baseline: conventional CNN-LSTM model. **: p < 0.01, paired t-test with FDR correction.
Figure 5. Results of the generalizability test for each target. Each colored point shows the mean R² for a given model, with error bars indicating one standard deviation over five-fold cross-validation. Negative R² values were set to zero for display purposes only; the original values were used in the statistical analysis. Av_s: shoulder angular velocity; Av_e: elbow angular velocity; T_s: shoulder torque; T_e: elbow torque; Baseline: conventional CNN-LSTM model. *: p < 0.05, **: p < 0.01, paired t-test with FDR correction.
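The display convention used in Figure 5 (negative R² floored at zero for plotting while statistics use the raw values) can be sketched as follows; the score values are hypothetical.

```python
import numpy as np

# Hypothetical cross-validated R^2 scores for one variable and model.
# R^2 can be negative when predictions are worse than the mean baseline.
r2 = np.array([0.83, 0.41, -0.07, 0.65, -0.52])

# For the plot only, floor negative values at zero, as in Figure 5;
# all statistics are computed on the unclipped scores.
r2_display = np.clip(r2, 0.0, None)

print(r2_display.min())  # 0.0
print(r2.mean())         # unclipped mean, used for statistics
```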
Figure 6. Spatial features extracted by the Single-Direction decoder. Each red point represents a unit of the input neural data, with the abscissa and ordinate giving the unit’s weight in the extension and flexion directions, respectively. The blue histogram and overlaid red curve above the diagonal show the distribution of units across weight values (p < 0.05, independent t-test). (a) Shoulder angular velocity; (b) elbow angular velocity; (c) shoulder torque; (d) elbow torque.
Figure 7. Temporal features extracted by the Single-Direction decoder. The x-axis shows the time bins of the input neural data (−200 ms to 100 ms; negative values indicate time before movement onset), and the y-axis denotes the decoder branches estimating variables in different directions. All weights are normalized to the range 0.0–1.0 within each branch.
Figure 8. Torque results. Arrows indicate movement targets. Solid lines show trial averages; shaded areas show the standard deviation. (a) Shoulder; (b) elbow. Blue: experimentally measured torque; green: decoder-estimated torque; red: presumed co-contraction torque.
Table 1. Neural network architecture.
| Branch | Layer | Activation Function | Hyperparameters | Output Shape |
|---|---|---|---|---|
| Input | N/A | N/A | Input shape: [32 × 150 × 73 × 30] | [32 × 150 × 73 × 30 × 1] |
| Extension | Conv_2D | ReLU | Temporal filters: 8; kernel size: (1, 30); padding: same; stride: 1 | [32 × 180 × 73 × 30 × 8] |
| | Depthwise Conv_2D | ReLU | Depth multiplier: 2; kernel size: (73, 1); padding: valid; stride: 1 | [32 × 180 × 1 × 30 × 16] |
| | Average Pooling & Flatten | N/A | Pooling kernel size/stride: 4 | [32 × 180 × 112] |
| | LSTM | tanh | Layers: 1; hidden units: 16; sequence length: 180 | [32 × 180 × 16] |
| | FC | ReLU | Units: 8 | [32 × 180 × 8] |
| Flexion | Conv_2D | ReLU | Temporal filters: 8; kernel size: (1, 30); padding: same; stride: 1 | [32 × 180 × 73 × 30 × 8] |
| | Depthwise Conv_2D | ReLU | Depth multiplier: 2; kernel size: (73, 1); padding: valid; stride: 1 | [32 × 180 × 1 × 30 × 16] |
| | Average Pooling & Flatten | N/A | Pooling kernel size/stride: 4 | [32 × 180 × 112] |
| | LSTM | tanh | Layers: 1; hidden units: 16; sequence length: 180 | [32 × 180 × 16] |
| | FC | ReLU | Units: 8 | [32 × 180 × 8] |
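The per-time-step feature dimension of 112 after pooling and flattening in Table 1 follows from standard convolution shape rules. A minimal bookkeeping sketch (assuming 'same' padding preserves spatial size and 'valid' padding shrinks it by kernel size minus one):

```python
# Shape bookkeeping for one branch of Table 1, using the table's numbers
# and standard Conv2D semantics (assumed, not taken from the paper's code).
neurons, bins = 73, 30

# Conv_2D, kernel (1, 30), padding 'same', 8 temporal filters.
h, w, c = neurons, bins, 8                # -> (73, 30, 8)

# Depthwise Conv_2D, kernel (73, 1), padding 'valid', depth multiplier 2:
# collapses the neuron axis and doubles the channel count.
h, w, c = h - neurons + 1, w, c * 2       # -> (1, 30, 16)

# Average pooling with kernel/stride 4 along the time-bin axis, then flatten.
w = w // 4                                # 30 // 4 = 7
features = h * w * c                      # 1 * 7 * 16 = 112, as in the table

print(h, w, c, features)
```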
Table 2. Pre-train Validation.
| | Av_s | Av_e | T_s | T_e |
|---|---|---|---|---|
| Target I & II | 0.775 | 0.534 | 0.645 | 0.612 |
| Target VI & VIII | 0.827 | 0.740 | 0.486 | 0.542 |
| Target IV & V | 0.827 | 0.710 | 0.604 | 0.587 |
Table 3. Ablation results.
| Model | Variable | I | II | III | VI | VII | VIII |
|---|---|---|---|---|---|---|---|
| SingleNet | Av_s | 0.818 ± 0.016 | 0.852 ± 0.003 | 0.847 ± 0.004 ↑ | 0.913 ± 0.003 ↑ | 0.689 ± 0.008 | 0.773 ± 0.017 |
| | Av_e | 0.810 ± 0.005 ↑ | 0.799 ± 0.011 | 0.395 ± 0.016 ↑ | 0.904 ± 0.002 ↑ | 0.329 ± 0.046 | 0.585 ± 0.034 ↑ |
| | T_s | 0.220 ± 0.040 | 0.516 ± 0.014 | 0.736 ± 0.013 | 0.731 ± 0.006 ↑ | 0.629 ± 0.028 | 0.602 ± 0.048 |
| | T_e | 0.186 ± 0.033 | 0.412 ± 0.031 | 0.603 ± 0.009 | 0.704 ± 0.004 ↑ | 0.603 ± 0.011 | 0.644 ± 0.052 |
| LinearNet | Av_s | 0.848 ± 0.005 ↑ | 0.856 ± 0.006 ↑ | 0.831 ± 0.008 | 0.897 ± 0.002 | 0.724 ± 0.014 ↑ | 0.500 ± 0.063 |
| | Av_e | 0.803 ± 0.013 | 0.816 ± 0.013 | 0.394 ± 0.029 | 0.828 ± 0.043 | 0.360 ± 0.028 | 0.767 ± 0.029 |
| | T_s | 0.209 ± 0.016 | 0.558 ± 0.014 ↑ | 0.745 ± 0.004 ↑ | 0.688 ± 0.006 | 0.666 ± 0.013 ↑ | 0.611 ± 0.026 |
| | T_e | 0.226 ± 0.056 ↑ | 0.400 ± 0.040 | 0.595 ± 0.029 | 0.569 ± 0.022 | 0.604 ± 0.013 | 0.670 ± 0.032 ↑ |
| SharedNet | Av_s | 0.827 ± 0.007 | 0.833 ± 0.005 | 0.842 ± 0.007 | 0.875 ± 0.002 | 0.344 ± 0.015 | 0.801 ± 0.010 ↑ |
| | Av_e | 0.778 ± 0.019 | 0.863 ± 0.003 ↑ | 0.371 ± 0.051 | 0.893 ± 0.004 | 0.703 ± 0.004 ↑ | 0.544 ± 0.036 |
| | T_s | 0.226 ± 0.051 ↑ | 0.463 ± 0.008 | 0.602 ± 0.021 | 0.279 ± 0.016 | 0.620 ± 0.012 | 0.698 ± 0.012 ↑ |
| | T_e | 0.209 ± 0.010 | 0.571 ± 0.013 ↑ | 0.741 ± 0.008 ↑ | 0.694 ± 0.009 | 0.605 ± 0.065 ↑ | 0.632 ± 0.054 |

↑: denotes the highest mean R² among the three models for that target and variable.
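The ↑ markers in Table 3 simply flag, per target and variable, the model with the highest mean R². A one-cell sketch using the Av_s values for target I from the table:

```python
# Mean R^2 for target I, shoulder angular velocity, across the three
# ablation models (values from Table 3).
means = {"SingleNet": 0.818, "LinearNet": 0.848, "SharedNet": 0.827}

# The arrow marks the model with the highest mean R^2 in this cell.
best = max(means, key=means.get)
print(best)  # LinearNet
```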

Share and Cite

Ou, H.-Y.; Hasegawa, T.; Fukayama, O.; Miyashita, E. Neuroscience-Inspired Deep Learning Brain–Machine Interface Decoder. Bioengineering 2026, 13, 440. https://doi.org/10.3390/bioengineering13040440
