1. Introduction
Motor imagery (MI) is one of the main applications of brain–computer interfaces (BCIs). Through motor imagery, a person can mentally simulate specific movements without physically executing them. This capability can aid in the control of assistive devices for individuals with disabilities, as well as in the rehabilitation of injured individuals to improve muscle strength or manage pain. EEG signals, which are commonly used to measure motor imagery activity in the brain, play a crucial role in BCIs, clinical diagnostics, and neuroscience research. These signals are characterized by their high temporal resolution, allowing real-time monitoring of brain activity. EEG-based systems have demonstrated their efficacy across various fields, such as stroke rehabilitation [1,2], wheelchair control [3], robot control [4], and other assistive technologies. However, EEG signals are non-stationary, meaning their statistical properties change over time, which complicates the classification process [5]. Additionally, the challenge is heightened when only limited data are available for a given person, making accurate analysis and classification of EEG signals for motor imagery tasks even more difficult.
Consider developing an online motor imagery rehabilitation system that incrementally gathers patient data over time. The main difficulty in designing such systems is accurately identifying the motor imagery tasks performed by the patient, particularly in the initial stages of rehabilitation when patient-specific data are limited. To address this, we adopted a pre-trained model to provide preliminary predictions about motor imagery tasks, especially during the initial sessions. This model can then be fine-tuned with the patient’s specific data to improve its accuracy—a process known as transfer learning [6]. Transfer learning is a crucial technique in deep learning, especially given the limited availability of EEG data and the extensive datasets typically required by neural network models [7]. It is also valuable in scenarios where the initial data for classification are lacking. However, the non-stationary nature of EEG signals poses significant challenges to the transfer-learning process. EEG signals may vary considerably between trials, or even within the same trial, for the same subject [5]. Nevertheless, subjects tend to display consistent EEG patterns when mentally simulating the same movement [8].
Transfer learning (TL) is a well-studied technique for adapting models trained on one subject to another related subject. Various strategies have been employed to improve accuracy for the target subject, including freezing specific layers [9], the subject-to-subject semantic-style transfer network (SSSTN) [10], and approaches based on Riemannian geometry [6]. While TL is typically applied to a single target subject [6,7,8,10], the concept can be extended to multiple target subjects within the same region, offering opportunities to address broader variability and improve performance. Research indicates that cultural background can influence brain activity patterns. For instance, the researchers in [11] showed differences in EEG patterns between individuals from different cultural backgrounds during cognitive tasks. The authors of [12] provided a meta-analysis that explores how cultural differences affect human brain activity, including a discussion of motor imagery. Moreover, the researchers in [13] emphasized that BCI applications should account for cultural differences, considering how linguistic and cultural factors affect BCI performance and usage. Consequently, we propose using distinct models retrained in different regions to account for cultural differences. This use of multiple target subjects for retraining represents a key distinction between our work and previous studies, in which transfer learning targeted a single subject.
The architecture of the online rehabilitation system requires a multi-layered structure. At the cloud level, pre-trained models are generated, and distinct models are distributed to each region for retraining at the patient level. Data collection in this system is performed by IoT devices and sensors, including EEG sensors, which gather real-time patient information. These data are then transmitted to the cloud for processing. However, transmitting large volumes of patient data to the cloud necessitates efficient management and control [14,15]. Additionally, IoT devices face inherent limitations, such as restricted computational power, memory, bandwidth, and energy resources, which complicate the processing of such extensive data [16]. These challenges call for solutions that distribute computational tasks across multiple layers. The authors of [17] identified the need for an additional network layer to decrease the load on the cloud and to carry out complex computational processes. Researchers have proposed architectures that integrate cloud, fog, and edge computing to address these challenges and support IoT applications [18,19,20,21]. The cloud serves as a centralized hub for model training, while the fog and edge layers process data closer to the source, reducing delays and decreasing the amount of data transmitted. As highlighted in [17], both fog and edge computing facilitate data processing between IoT devices and the cloud, offering faster analysis and localized decision-making. Fog nodes specialize in aggregating information from geographically dispersed devices, making them suitable for extensive systems but often difficult to manage. In contrast, edge computing is more compatible with our proposed architecture, as it supports real-time processing of EEG data and delivers immediate feedback to patients. In this context, we propose an efficient IoT framework for remote training based on edge/cloud computing, in which three network layers cooperate to balance data processing. EEG signal classification is performed at the edge level to reduce communication costs and to provide the patient with a quick response. Moreover, the edge level can enhance MI accuracy by implementing subject-specific training.
Recognizing the patient’s MI tasks remains a critical challenge due to the complexity of EEG signals. Various techniques have been proposed in the literature to address this challenge, including feature extraction methods such as common spatial patterns (CSP) combined with machine-learning models like support vector machines (SVMs) and linear discriminant analysis (LDA) [22,23,24]. Convolutional neural networks (CNNs) have also been explored, with examples including a dual-branch CNN with a self-attention mechanism [25] and a multi-branch one-dimensional CNN with residual blocks [26]. Moreover, a fusion CNN with attention blocks (FCNNA) provided a well-performing classification model, as demonstrated in our previous work [27]. In this study, we build upon that work by incorporating transfer learning. Additionally, EEG signals often contain redundant channels that can negatively affect accuracy and efficiency. To mitigate this issue, our approach minimizes the number of EEG channels used, thereby optimizing both accuracy and efficiency. Moreover, various transfer strategies, such as adjusting the number of epochs, freezing layers, and employing different data division techniques, are applied to enhance both efficiency and accuracy. Both the classification and channel selection methods are derived from this robust framework [27]. Consequently, our system’s effectiveness in addressing these challenges through different classification models, channel selection algorithms, and transfer-learning strategies highlights its potential for improving the accuracy and efficiency of EEG-based MI task recognition.
The main contributions of this paper are as follows:
To develop an online rehabilitation system with an edge-computing layer to provide an efficient IoT framework for remote training. This layer reduces communication and computational costs on central servers;
To introduce a multi-subject transfer-learning process, in which more than one target subject is used to retrain the same model alternately. This approach enhances MI accuracy, optimizes memory, and accounts for regional cultural differences while maintaining only one model per region;
To improve efficiency and accuracy by applying a freezing strategy and selecting a unified set of optimal channels for all subjects, which optimizes memory usage and computation time;
To provide a comprehensive comparison for cross-subject classification for each model and transfer-learning strategy used, exploring their respective advantages and disadvantages.
The rest of the paper is organized as follows. Section 2 reviews the solutions required for edge computing within an IoT framework and briefly describes the current transfer-learning classification techniques employed for EEG signals. Section 3 provides a detailed explanation of the methods and architectures proposed in this study. In Section 4, the data and the models of the system are analyzed. The experimental results and discussion are presented in Section 5. Finally, the paper is concluded in Section 6.
3. Materials and Methods
This section illustrates the architecture of the proposed online rehabilitation system and provides descriptions of the techniques used in transfer-learning classification at the edge level.
3.1. Online Rehabilitation (OR) System Architecture
The architecture of the proposed system is depicted in Figure 1, illustrating the main components of the online rehabilitation network and its layers. The system comprises three primary layers: cloud, edge, and client. The edge layer is included to enhance efficiency, providing quick responses while maintaining the specific characteristics of each region. The figure demonstrates the direction of data flow and the protocols used for transactions. According to Figure 1, a brief description of each layer is as follows:
Client Level (Sensors)
At the Client Level, EEG sensors for MI are attached to the user’s headset to capture brain signals. These sensors are connected via Bluetooth to an edge gateway, which includes a Raspberry Pi device responsible for initial data processing and transmission. This level primarily focuses on collecting raw data from the sensors and performing preliminary processing, such as extracting 4.5 s windows from each trial and retaining only the selected channels. The processed data are then transmitted to the edge node for further analysis.
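For illustration, the sketch below shows the kind of client-level preprocessing described above: cropping a 4.5 s window from each trial and keeping only the channels selected by the cloud. The 250 Hz sampling rate, the channel indices, and the function names are assumptions for this example, not the exact implementation running on the gateway.

```python
import numpy as np

SAMPLING_RATE = 250            # Hz; assumed sampling rate of the EEG headset
WINDOW_SECONDS = 4.5           # window length stated in the text
WINDOW_SAMPLES = int(WINDOW_SECONDS * SAMPLING_RATE)   # 1125 samples


def preprocess_trial(trial, selected_channels, start_sample=0):
    """Crop a 4.5 s window from one raw trial and keep only the selected channels.

    trial: ndarray of shape (n_channels, n_samples) -- raw EEG of one trial.
    selected_channels: channel indices provided by the cloud-level selection.
    """
    window = trial[:, start_sample:start_sample + WINDOW_SAMPLES]
    return window[selected_channels, :]


if __name__ == "__main__":
    # Dummy trial: 22 channels, 7 s of random placeholder signal.
    raw_trial = np.random.randn(22, 7 * SAMPLING_RATE)
    optimal_channels = [0, 2, 7, 9, 11, 13, 17]          # hypothetical selection
    prepared = preprocess_trial(raw_trial, optimal_channels)
    print(prepared.shape)       # (7, 1125) -> transmitted to the edge node
```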
Furthermore, the rehabilitation instructions for each session, such as ‘imagine moving your right hand’, are generated by the main hospital and stored in the cloud. These instructions are then sent to the care provider’s server at the edge node. From there, they are delivered to the patient’s phone prior to the rehabilitation session. The patient uses these instructions to perform specific motor imagery tasks during their rehabilitation exercises.
Edge Level
The edge level consists of an edge node (regional server) that acts as an intermediary between the client devices and the cloud infrastructure. Communication between the edge node, the client (via the edge gateway), and the cloud is facilitated through Wi-Fi. The edge node provides additional processing power and storage to handle data efficiently. Processing data locally at the edge node ensures rapid response times and preserves the unique characteristics of each region. It reduces communication overhead and offloads computational burden from the cloud. In the proposed framework, a regional care provider’s server acts as the edge node for retraining the model with local patient data. The updated model parameters are saved at the edge node and used exclusively for that region. This decentralized approach ensures that each region has a model fine-tuned to its specific patient population, enhancing the accuracy and effectiveness of predictions during rehabilitation training while reducing the need for extensive data transmission to the main hospital. For a detailed discussion on the retrained model and transfer-learning strategies, refer to Section 3.2.
Cloud Level
The cloud represents the servers of the primary rehabilitation service provider (i.e., the main hospital). The main service provider is responsible for providing the trained model and the common channels to the edge nodes in each region. The trained model assists in predicting tasks for new patient data during rehabilitation training. Meanwhile, the channels are selected to minimize the amount of information transmitted through the network, based on the optimal channel selection methodology outlined in [27]. This approach enhances efficiency while maintaining a high level of accuracy. Also, the cloud servers receive performance feedback on the patients’ training sessions from each edge node.
To ensure reliable Wi-Fi transmission between the cloud, edge, and client levels, mechanisms such as automatic repeat request (ARQ) and acknowledgment signals could be employed. These strategies help detect and recover from potential packet loss or interference, ensuring the integrity of the transmitted data [43].
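As a purely illustrative sketch of such a mechanism, the snippet below implements a minimal stop-and-wait ARQ sender over UDP; the address, timeout, retry count, and packet format are assumptions and do not describe the actual transport used in the proposed system.

```python
import socket


def send_reliable(payload: bytes, addr=("edge-node.local", 5005),
                  timeout=0.5, max_retries=5):
    """Stop-and-wait ARQ sketch: send a datagram and wait for an ACK,
    retransmitting on timeout (illustrative values, hypothetical endpoint)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        for _ in range(max_retries):
            sock.sendto(payload, addr)
            try:
                reply, _unused = sock.recvfrom(16)
                if reply == b"ACK":
                    return True            # receiver confirmed the packet
            except socket.timeout:
                continue                   # lost packet or ACK: retransmit
        return False                       # give up after max_retries
    finally:
        sock.close()
```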
3.2. Transfer-Learning Framework and Strategies
In the edge node, transfer learning is deployed to retrain the model, adapting it to the patients’ data in the edge region. Following the principles of transfer learning, we built and compared the performances of four models developed for this purpose. Two of these models replicated the structure used in our earlier study [27]: one without channel selection and one with channel selection employing a cross-subject strategy. The other two models are similar to the previous ones, differing only in their kernel size values. This adjustment reduces processing time, effectively addressing the challenges posed by large data volumes. The process of choosing the models and the main comparison between the performances of the models are shown in Section 4.3. The models used are as follows:
FCNNA: the fusion CNN with attention blocks from [27], without channel selection;
FCNNA with channel selection (CS);
LFCNN: a variant of FCNNA with reduced kernel sizes, without channel selection;
LFCNN with channel selection (CS).
The previously listed models are structured with two layers of convolutional blocks, each followed by CBAM attention mechanisms. Each convolutional layer comprises two distinct blocks, with one employing both frequency and spatial filters, and the other utilizing a separable convolutional block. Furthermore, a genetic algorithm is implemented to reduce the number of channels. By selecting one set of optimal channels, the same channels are used for all subjects during training [27]. The overall architecture of the FCNNA and LFCNN models remains consistent, with the primary variation being the kernel sizes, as detailed in Section 4.3, Table 2.
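To illustrate the role of the genetic algorithm, the sketch below evolves a binary channel mask whose fitness is the validation accuracy obtained with the retained channels. It is a generic GA skeleton under assumed population size, mutation rate, and a placeholder fitness callable; it is not the exact procedure of [27].

```python
import random

N_CHANNELS = 22                                   # montage size (assumed)
POP_SIZE, N_GENERATIONS, MUT_RATE = 20, 15, 0.05  # illustrative GA settings


def fitness(mask, evaluate_accuracy):
    """Validation accuracy when only channels with mask == 1 are kept.
    `evaluate_accuracy(channels)` is a placeholder that would train/validate
    the CNN on the reduced montage."""
    channels = [i for i, keep in enumerate(mask) if keep]
    return evaluate_accuracy(channels) if channels else 0.0


def select_channels(evaluate_accuracy):
    """Evolve a binary channel mask and return the best channel subset found."""
    population = [[random.randint(0, 1) for _ in range(N_CHANNELS)]
                  for _ in range(POP_SIZE)]
    for _ in range(N_GENERATIONS):
        scores = [fitness(m, evaluate_accuracy) for m in population]
        ranked = [m for _, m in sorted(zip(scores, population),
                                       key=lambda pair: pair[0], reverse=True)]
        parents = ranked[:POP_SIZE // 2]                 # keep the fitter half
        children = []
        while len(parents) + len(children) < POP_SIZE:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_CHANNELS)        # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if random.random() < MUT_RATE else g for g in child]
            children.append(child)
        population = parents + children
    best = max(population, key=lambda m: fitness(m, evaluate_accuracy))
    return [i for i, keep in enumerate(best) if keep]
```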
It is important to note that all of the aforementioned methods have been utilized to generate multiple pre-trained models. In our approach, which employs a cross-subject strategy, each model is trained on a subset of source subjects and subsequently tested on a different source subject. These pre-trained models are subsequently deployed in a transfer-learning process to adapt to the data of the target subjects. Based on the accuracy and efficiency results from the transfer-learning experiments conducted in this study, the optimal retrained model will be selected for preparation in the cloud and then deployed to the edge nodes.
The following sections focus on describing and clarifying the methodologies used in the proposed transfer-learning process. We begin by providing a comprehensive overview of the general framework for EEG classification in a subject-specific context, incorporating both pre-trained and retrained models. Additionally, we will outline the strategies employed to update the model at the edge node, including the transfer-learning online mode, the session-based data division strategy, and the freezing strategy. These strategies are essential for enhancing the model’s adaptability and performance, ensuring that it can effectively learn from new data in real-time while maintaining stability and accuracy.
3.2.1. General Framework of Subject-Specific Classification
The framework of the selected models consists of three main processes: channel selection, preprocessing, and classification. Figure 2 illustrates the deployment of transfer learning. It shows the primary steps of EEG classification across the client, edge, and cloud levels. Raw EEG data from source subjects are received and preprocessed at the cloud level, while data from the target subjects are received and preprocessed at the client level. At the client level, the data (from patient rehabilitation sessions) are preprocessed before being sent to the edge level. Both the cloud and edge levels process data through a series of steps, including channel selection and deep-learning classification. At the cloud level, these processes are applied to the source subjects to identify the optimal channels using a channel selection algorithm and to develop a pre-trained subject-specific classification model through deep learning. This pre-trained model, along with the optimal channels, is then sent to the edge level. At the edge level, the selected channels and the preprocessed target data are used to retrain the pre-trained model through transfer learning, adapting it to the specific characteristics of the target subjects.
3.2.2. Transfer-Learning Online Mode
In this paper, we use the online mode [7] in transfer learning to enhance our model, facilitating real-time data processing and online rehabilitation. This approach involves incrementally updating the model with new data as it becomes available, rather than retraining it with a static dataset. This continuous learning process allows us to adapt our model to new patterns and variations in the data in real-time, maintaining robustness and accuracy. Figure 3 illustrates the implementation of the online mode. Initially, a classifier is trained using a cross-subject strategy, where one of the source subjects is tested at a time in the cloud. Subsequently, the classifier is transferred to the edge node for use in new sessions with the target subjects. Throughout the process, a list of predicted trials, initially empty, is maintained along with the saved classifier. At the beginning of each session with the target subject, predictions are made using the saved classifier, which first predicts the class of several trials. Taking advantage of the new data, the classifier is then updated based on these newly predicted trials. This process is repeated for each new session. The saved classifier predicts the new trials and is subsequently updated according to the newly available data. At the end of each session, the newly predicted trials are saved. Moreover, if the classifier demonstrates improved performance at the end of the session, it will also be saved on the edge node server. In general, the online mode reduces computational overhead and facilitates real-time adaptation, making it an optimal choice for our study, especially in dynamic and data-rich environments. The system was not specifically examined in an online setting, but it closely mimics the dynamics of an online learning environment.
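The sketch below mirrors the session loop of Figure 3: the saved classifier first predicts the new trials, the trials are appended to the stored set, and the model is fine-tuned and saved only if its end-of-session performance improves. It assumes a compiled Keras model (with an accuracy metric) and that the session's cue labels are available for retraining; the function name, batch size, and checkpoint path are illustrative.

```python
import numpy as np


def run_session(model, session_x, session_y, stored_x, stored_y,
                best_accuracy, checkpoint_path="edge_classifier.keras"):
    """One rehabilitation session in the online mode (illustrative sketch).

    session_x: EEG trials of the new session, shaped as the model expects.
    session_y: cue labels for those trials, encoded as the model expects.
    stored_x, stored_y: trials accumulated from previous sessions.
    """
    # 1) Predict the classes of the new trials with the saved classifier.
    predictions = np.argmax(model.predict(session_x, verbose=0), axis=1)

    # 2) Update the classifier with the newly available trials.
    new_x = np.concatenate([stored_x, session_x]) if len(stored_x) else session_x
    new_y = np.concatenate([stored_y, session_y]) if len(stored_y) else session_y
    model.fit(new_x, new_y, epochs=30, batch_size=16, verbose=0)

    # 3) Save the classifier only if end-of-session performance improved.
    _, accuracy = model.evaluate(session_x, session_y, verbose=0)
    if accuracy > best_accuracy:
        model.save(checkpoint_path)
        best_accuracy = accuracy

    return new_x, new_y, best_accuracy, predictions
```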
3.2.3. Session-Based Data Division Strategy
In this section, we describe the methodology used to divide the data into multiple sessions, each trained separately to facilitate the transfer-learning process. Our approach involves training different target subjects at different times using the same model. By selecting the data size for each session, we effectively determine the data size for each training period. The model is updated and retrained with each new session, incorporating new data to progressively improve its performance. This approach ensures that the model receives diverse and comprehensive training inputs from different subjects on the edge node, which helps manage the training process efficiently and enhances the model’s ability to generalize across different data segments. Figure 4 illustrates how each session, containing data from each subject, is trained at alternating times through transfer learning with the same model, which is continuously updated and retrained. In this figure, we consider three subjects on the edge node with N sessions, where N is the division size of the rehabilitation training. The division strategy is crucial for optimizing transfer learning and achieving high accuracy in subject-specific EEG classification. In the experiment, we will demonstrate two values of N and how they affect performance.
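A minimal sketch of this alternating scheme, assuming three target subjects whose data have already been split into N sessions each, is given below; the helper name and batch size are hypothetical, and the shared model instance is simply retrained on the subjects' sessions in turn.

```python
def alternate_training(model, subject_sessions, epochs=30):
    """Retrain one shared model on alternating sessions from several subjects.

    subject_sessions: dict mapping subject id -> list of N (x, y) session tuples,
                      e.g. {7: [(x1, y1), ...], 8: [...], 9: [...]}.
    """
    n_sessions = len(next(iter(subject_sessions.values())))
    for session_idx in range(n_sessions):                        # session 1 .. N
        for subject_id, sessions in subject_sessions.items():    # subjects in turn
            x, y = sessions[session_idx]
            # The same model instance is updated by every subject's session,
            # so later sessions build on what earlier subjects contributed.
            model.fit(x, y, epochs=epochs, batch_size=16, verbose=0)
    return model
```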
3.2.4. Freezing Strategy in Transfer Learning
In the proposed methodology, we use a freezing strategy as part of the transfer-learning approach to fine-tune the model efficiently. We start by freezing the parameters of some initial layers of the pre-trained model and retraining the parameters of the remaining layers to adapt the model to new data or new subjects. To optimize this fine-tuning process, we also adjust the model’s configuration parameters by applying different levels of freezing over various numbers of epochs and reducing the learning rate as needed. The levels of freezing applied in this work are shown in Figure 5. We use four different levels of freezing, each designed to preserve specific parameters of the FCNNA model used in [27]. At the first level, we freeze the first six layers, preserving only the time-domain filters to maintain the initial feature extraction. By freezing these layers, we ensure that the fundamental temporal patterns learned from the original dataset are retained, providing a stable foundation for further processing. At the second level, the parameters of the initial sixteen layers are frozen. These layers typically include the time-domain filters and spatial filters of the pre-trained model. By freezing these layers, we retain the model’s ability to extract fundamental temporal and spatial features from the input EEG signals, ensuring that the basic structure and essential features learned from the original dataset are preserved. At the third level, freezing extends to the initial layers up to layer 26, including the separable convolutions, which preserves the parameters responsible for extracting higher-order temporal and spatial features. At the final level, 44 layers are frozen, which include parts of the attention block. By freezing these layers, we maintain the model’s ability to focus on significant features learned from the original dataset, ensuring that important aspects of the data are emphasized while adapting to new tasks. Applying different levels of freezing allows us to make meaningful comparisons and assess the impact of each freezing level on model performance. This approach helps in understanding how retaining specific learned features affects the model’s ability to adapt to new datasets or subjects. Additionally, it helps identify the most efficient freezing strategy for achieving optimal performance with minimal computational cost.
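These freezing levels can be realized in Keras by marking the first k layers as non-trainable and recompiling with a reduced learning rate before fine-tuning, as in the sketch below. The layer counts 6/16/26/44 follow the levels described above, while the loss and optimizer settings are illustrative assumptions rather than the exact training configuration.

```python
import tensorflow as tf

FREEZING_LEVELS = (6, 16, 26, 44)    # freezing levels used in this work


def apply_freezing(model, n_frozen_layers, learning_rate=1e-4):
    """Freeze the first n layers of a pre-trained model and recompile it
    so that only the remaining layers are updated during fine-tuning."""
    for layer in model.layers[:n_frozen_layers]:
        layer.trainable = False      # keep pre-trained temporal/spatial filters
    for layer in model.layers[n_frozen_layers:]:
        layer.trainable = True       # adapt the deeper layers to the new subjects
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="categorical_crossentropy",   # assumed 4-class setup
                  metrics=["accuracy"])
    return model

# Example usage (hypothetical): level-44 freezing with a reduced learning rate.
# model = apply_freezing(pretrained_model, n_frozen_layers=44, learning_rate=9e-5)
```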
By applying transfer learning with different model configurations, including varying levels of freezing, epochs, and learning rates, we can thoroughly compare their effects on performance. By adjusting the learning rate across different ranges, we can identify the optimal rate that preserves the model’s learned weights and prevents disruptive updates to the optimized parameters. Additionally, by carefully selecting the number of epochs, we can control the extent of training, allowing the model to converge appropriately without overfitting. This balance between learning rate and epochs is crucial for maintaining the integrity of the pre-trained features while adapting the model to new data. Similarly, we apply a non-freezing strategy (retraining all layers), adjusting the same parameters (epochs and learning rates) to provide a comprehensive comparison. This approach allows us to evaluate the impact of different training strategies on the model’s ability to adapt and generalize to new data.
3.3. Stages and Key Contributions in the Online Rehabilitation (OR) Models
As illustrated in Figure 6, the models are generated within the online rehabilitation architecture. Model 1 is generated in the cloud as the main pre-trained model and then distributed to all of the edge nodes. Each edge node receives new data specific to the targets of its respective region and subsequently retrains Model 1 to generate updated models. Consequently, each node generates a different model, with Model 2 produced by the first edge node and Model 3 produced by the second edge node. Later, as new data from their respective regions are received, each edge node uses its own model (Model 2 or Model 3) for prediction and further retraining.
Table 1 presents three key points of our approach regarding the model generation process using user data: (1) the data of the target subject are excluded from training the pre-trained model (Model 1); (2) the model is retrained; and (3) transfer learning is applied to multiple subjects. The table highlights the effectiveness of our approach by emphasizing these key aspects and showcasing the novelty of our method compared to previous studies. Unlike most studies, which inconsistently incorporate target data into the pre-trained models, our work does not include any data from the target subjects in the pre-trained model; only in [37,41,43] are the target data fully excluded during pre-training. Also, our model is retrained using continuous data from the target subjects, a practice rarely employed, except in study [7] for four-class experiments and study [37] for two-class experiments. The main difference between our study and prior research is that we applied transfer learning to multiple target subjects, whereas previous studies were limited to a single target subject.
5. Experiments and Discussions
In this study, we built the FCNNA and LFCNN models within the same environment described in [27]. The models were implemented in Python using TensorFlow (version 2.15.0) and deployed on Google Colab, equipped with a T4 GPU and 15.0 GB of GPU RAM. To achieve reliable outcomes, we conducted three runs for each transfer-learning experiment, which involved varying model architectures, division sizes, learning rates, numbers of epochs, and levels of freezing, as detailed in this section. During training, we employed a callback function to save the model weights at the point of highest average accuracy across the target subjects’ trials, thus optimizing the use of computational resources.
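A sketch of the kind of callback used for this purpose is shown below: it keeps the weights from the epoch with the highest mean accuracy over the target subjects' trial sets. The class name and the way the per-subject validation sets are passed in are assumptions; the model is assumed to be compiled with an accuracy metric.

```python
import numpy as np
import tensorflow as tf


class BestAverageAccuracy(tf.keras.callbacks.Callback):
    """Keep the weights of the epoch with the highest mean accuracy
    over the target subjects' trial sets (illustrative sketch)."""

    def __init__(self, subject_sets):
        super().__init__()
        self.subject_sets = subject_sets      # list of (x, y) per target subject
        self.best_avg = -np.inf
        self.best_weights = None

    def on_epoch_end(self, epoch, logs=None):
        accs = [self.model.evaluate(x, y, verbose=0)[1]
                for x, y in self.subject_sets]
        avg = float(np.mean(accs))
        if avg > self.best_avg:               # new best average accuracy
            self.best_avg = avg
            self.best_weights = self.model.get_weights()

    def on_train_end(self, logs=None):
        if self.best_weights is not None:
            self.model.set_weights(self.best_weights)   # restore best weights
```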
We utilized an online strategy with two division levels for subjects 7, 8, and 9. Each subject’s data, which consisted of approximately 600 samples, were divided into groups for prediction and retraining. Under the six-division strategy, the data were split into six groups (0–90, 90–180, 180–270, 270–360, 360–450, 450–600), and under the four-division strategy into four groups (0–140, 140–280, 280–420, 420–600). During retraining, the first group of samples was run for five epochs to gradually introduce the data to the classifier. Afterward, the runs were conducted with one of the following epoch settings: 30, 50, 100, or 200.
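For clarity, the two division schemes and the warm-up schedule just described can be written out as follows (a small sketch; the sample boundaries are those listed above, and the helper name is hypothetical).

```python
# Sample-index groups for the two division strategies described above.
SIX_DIVISIONS = [(0, 90), (90, 180), (180, 270), (270, 360), (360, 450), (450, 600)]
FOUR_DIVISIONS = [(0, 140), (140, 280), (280, 420), (420, 600)]


def epoch_schedule(n_groups, later_epochs=30):
    """First group is introduced gently with 5 epochs; later groups use one of
    the settings explored in the experiments (30, 50, 100, or 200 epochs)."""
    return [5] + [later_epochs] * (n_groups - 1)

# e.g. epoch_schedule(len(FOUR_DIVISIONS), later_epochs=50) -> [5, 50, 50, 50]
```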
5.1. Preparation of Pre-Trained Models
Initially, to begin working with transfer learning, we prepared the pre-trained classifiers. As noted in Section 3.2, we developed four models: FCNNA without channel selection, FCNNA with channel selection, LFCNN without channel selection, and LFCNN with channel selection.
A cross-subject classification across all source subjects was adopted for each model. In this approach, one subject was used for testing while the remaining subjects were utilized for training. This technique enables the model to train on and learn the majority of the features from the source subject data. Consequently, each of the four listed models generated six classifiers, resulting in a total of 24 classifiers. Specifically for the models using channel selection, we applied a genetic algorithm following the methodology described in [27]. The genetic algorithm was employed to identify the optimal channels for testing each subject from the source data. As a result, different combinations of channels produced varying accuracies, and the channels with the highest accuracy were selected as the optimal channels for testing that subject. The selected channels were determined after applying the algorithm two to three times. Subsequently, cross-subject classification was conducted to test each subject using the optimal channels selected for that subject. This classification was performed six times for each model, with or without channel selection, and the classifier with the best accuracy was chosen. The final results are presented in Table 4, which shows the accuracy results for each classifier when testing a single subject across the six source subjects, both with and without channel selection. Additionally, the optimal channels identified using the genetic algorithm are listed.
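The cross-subject procedure that produces the six classifiers per model can be sketched as a leave-one-source-subject-out loop, as below; `build_model` and the data dictionary are placeholders for the actual FCNNA/LFCNN construction and data-loading code, and the batch size is illustrative.

```python
import numpy as np


def cross_subject_pretraining(build_model, source_data, epochs=1000):
    """Leave-one-source-subject-out pre-training (illustrative sketch).

    source_data: dict mapping source-subject id -> (x, y) arrays.
    Returns one classifier per held-out subject together with its test accuracy.
    """
    classifiers = {}
    for test_subject in source_data:
        train_x = np.concatenate([x for s, (x, _) in source_data.items()
                                  if s != test_subject])
        train_y = np.concatenate([y for s, (_, y) in source_data.items()
                                  if s != test_subject])
        test_x, test_y = source_data[test_subject]

        model = build_model()                       # fresh FCNNA or LFCNN instance
        model.fit(train_x, train_y, epochs=epochs, batch_size=64, verbose=0)
        _, acc = model.evaluate(test_x, test_y, verbose=0)
        classifiers[test_subject] = (model, acc)    # e.g. classifiers "TS1", "TS3"
    return classifiers
```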
To select the best classifier for transfer learning from each set of six classifiers, we applied two criteria. First, we chose the classifier with the highest accuracy. Second, we selected the classifier that showed the greatest improvement in accuracy after applying channel selection. Based on the results presented in Table 4, it is evident that testing on subject 3 yields the highest accuracy across all four models. Additionally, subject 1 shows the largest increase after applying channel selection (CS), with a 5.06% improvement in FCNNA and a 2.34% increase in LFCNN. The accuracy in FCNNA increased from 72.92% to 77.98% after channel selection. Similarly, using LFCNN, the accuracy for subject 1 improved from 73.47% to 75.81% with channel selection. This indicates that the classifiers testing subjects 1 and 3, highlighted in bold in the table, should be selected. Consequently, we have two classifiers for each model, resulting in a total of eight classifiers. It is important to highlight that, when applying transfer learning using the classifier tested on subject 1 with channel selection, the optimal channels chosen for subject 1 are applied to all target subjects. The same approach is followed for subject 3.
Across the eight selected classifiers, we applied various transfer-learning strategies, including adjustments to the learning rates, division sizes, epochs, and freezing levels. In the following section, we will present the results based on these strategies.
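The space of strategy combinations explored in the next section can be enumerated as a simple grid, sketched below using the division sizes and learning rates reported in Section 5.2.1; the classifier identifiers are shorthand, and not every point of this grid was run in the experiments.

```python
from itertools import product

PRETRAINED = ["FCNNA_TS1", "FCNNA_CS_TS1", "LFCNN_TS1", "LFCNN_CS_TS1",
              "FCNNA_TS3", "FCNNA_CS_TS3", "LFCNN_TS3", "LFCNN_CS_TS3"]
DIVISIONS = [6, 4]
LEARNING_RATES = {"LR0": 9e-4, "LR1": 1e-4, "LR2": 9e-5, "LR3": 1e-5}
FREEZING = [None, 6, 16, 26, 44]          # None = no freezing

# Every transfer-learning run corresponds to one point in this grid.
experiment_grid = list(product(PRETRAINED, DIVISIONS, LEARNING_RATES, FREEZING))
print(len(experiment_grid))               # 8 * 2 * 4 * 5 = 320 possible runs
```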
5.2. Performances of the Proposed Transfer-Learning Technique Applied in the Main Models
This section analyzes the effectiveness of the proposed transfer-learning technique and provides evidence that it can enhance accuracy even when merging data from different subjects. Initially, we conducted cross-subject classification on all nine subjects without employing transfer learning, utilizing both the FCNNA and LFCNN models. This classification provides a baseline analysis, which serves as a reference for later comparisons. The results for subjects 7, 8, and 9 are presented in Table 5. The FCNNA model achieved an average accuracy of 75.79%, with individual accuracies of 72.63% for subject 7, 81.68% for subject 8, and 73.05% for subject 9. On the other hand, the LFCNN model outperformed FCNNA, with an average accuracy of 78.70%, based on individual accuracies of 76.09%, 83.55%, and 76.45% for subjects 7, 8, and 9, respectively. The table also includes the average accuracies for pairs of subjects, which can be used in future comparisons. The FCNNA model achieved two-subject averages ranging from 72.84% to 77.37%, while the LFCNN model showed higher two-subject averages, ranging from 76.27% to 80.00%.
In this section, we will demonstrate how our transfer-learning strategy can further improve these average accuracy results, particularly across different subjects.
5.2.1. Transfer Learning on Three Subjects
In this study, we focus on three target subjects (subjects 7, 8, and 9), applying transfer learning to each subject’s data alternately, as described in Section 3.2.3. We used eight pre-trained classifiers and applied transfer learning across the target subjects using two division sizes—six and four—and four learning rates: LR0 = 0.0009 (same as the pre-trained model), LR1 = 0.0001, LR2 = 0.00009, and LR3 = 0.00001. Training was conducted for 30 epochs, and the freezing technique was not applied at this stage. The resulting outcomes are presented below.
Table 6 presents the average accuracy results obtained by applying various strategies to the eight pre-trained models. The best results, which are bolded, were achieved using the pre-trained model of testing subject 1 (TS1) with FCNNA combined with channel selection, yielding accuracies of 72.44% in the six-division setup and 74.02% in the four-division setup. Figure 8 further illustrates the impact of different learning rates and pre-trained models. As shown in Figure 8a, the FCNNA pre-trained models generally produce the best outcomes, particularly in the cases of testing subject 1 with channel selection, testing subject 1 without channel selection, and testing subject 3 without channel selection. A detailed comparison reveals that, while the model based on testing subject 1 consistently delivers the highest overall accuracy, the model based on testing subject 3 occasionally outperforms it in specific instances. Meanwhile, Figure 8b highlights that using LR0 consistently results in the lowest accuracy, regardless of whether the data are divided into six or four segments.
At this stage, we applied freezing strategies across all levels, starting with level 26, which represents a middle ground among the levels (6, 16, 26, and 44). Based on the previous results, we concentrated on the three best-performing classifiers, applying freezing at level 26 over 30 epochs and comparing the subsequent outcomes. However, we excluded LR0, as it had previously yielded the worst results.
Table 7 illustrates the average accuracy results after applying transfer learning to the three target subjects alternately. We found that freezing improved accuracy by retaining certain features of the original data while incorporating features from the target data at the final stages of the model architecture. Notably, we achieved an accuracy of 78.24% in the six-division setup and 78.48% in the four-division setup using the FCNNA model without channel selection. Moreover, the efficiency of this approach was evident as freezing reduced both the time and parameters, demonstrating a significant performance enhancement. However, the best accuracy without using transfer learning, which was 78.70%, has not yet been surpassed, although we have come close. There remains a potential for further improvement by exploring other freezing levels or adjusting the number of epochs.
According to the level 26 freezing results, we selected the best pre-trained model—FCNNA from testing subject 1 without channel selection—to identify the optimal freezing level. The training was conducted over 30 epochs, using various learning rates and division sizes. As shown in Table 8, increasing the level of freezing significantly improves the results. At level 44, we achieved a higher average accuracy compared to not using transfer learning, reaching 79% in the six-division setup and 79.77% in the four-division setup. By increasing the freezing level, we ensure that the new training retains more of the complex patterns learned from the original data, suggesting that the new dataset shares similar characteristics with the dataset used for pre-training.
Given the strong accuracy achieved with level 44 freezing, we explored different numbers of epochs (30, 50, 100, 200) to assess the potential for further accuracy improvements. Table 9 illustrates how each epoch setting affects the accuracy across various learning rates and division sizes. Our findings indicate that 50 epochs yielded the single highest accuracy, reaching 80.29%; however, 30 epochs produced better accuracy more consistently across the different conditions. While 100 and 200 epochs also provided good accuracy, they required significantly more computation without offering any notable improvement.
Since level 44 freezing with 30 epochs yielded promising results, we applied the same conditions to the four main models on testing subject 1, exploring various learning rates and division setups to assess the potential for further accuracy improvements. However, as presented in Table 10, the FCNNA model without channel selection in testing subject 1 consistently proved to be the best-performing model. It is worth noting that the LFCNN model showed a slight improvement with freezing compared to the results in Table 6, but this improvement is not considered significant. In the experiments conducted using FCNNA, the highest average classification accuracy without channel selection was 79.77%, while with channel selection, it reached 76.90%. A closer examination of the individual subject performances reveals that, without channel selection, subject 7 achieved an accuracy of 88.28%, subject 8 attained 84.35%, and subject 9 achieved 66.67%. However, when channel selection was applied, subject 7’s accuracy dropped significantly to 79.69%, while subject 8 and subject 9’s performances remained unchanged at 84.35% and 66.67%, respectively. This suggests that the channel selection process may negatively impact some subjects more than others, particularly in the case of subject 7, indicating potential variability in the effectiveness of channel selection across different subjects.
Based on the data presented in Table 11, a comparison of efficiency versus average accuracy for subjects 7, 8, and 9 between the FCNNA and LFCNN models, both with and without transfer learning (TL), highlights several key differences. Without transfer learning, the FCNNA model requires 9 h and 30 min for cross-subject classification, achieving an average accuracy of 75.79% with 358.20 KB of trainable parameters. In contrast, the LFCNN model completes the task in 2 h and 58 min, with a higher accuracy of 78.70%, while using fewer trainable parameters (31.86 KB). When transfer learning is introduced, with four divisions (Four D.) and freezing at 44 layers (F44), the FCNNA model performs the classification in just 5 min, achieving an accuracy of 79.77% without channel selection. With channel selection, the time is reduced to 3 min, achieving an accuracy of 76.90%. In both cases, the model uses 114.53 KB of trainable parameters. The LFCNN model with transfer learning achieves a runtime as short as 2 min, with accuracies varying by approach, ranging from 68.85% to 72.88%. This analysis underscores the trade-off between model efficiency and performance. Without transfer learning, the LFCNN model is both more parameter-efficient and more accurate than the FCNNA model. However, transfer learning introduces additional considerations, balancing reduced computation time against potential impacts on accuracy. In general, the number of trainable parameters is the same with or without channel selection, since the number of channels only affects the early layers, which are frozen.
5.2.2. Transfer Learning on Two Subjects and One Subject
As part of our ongoing experiments, we explore a scenario in which transfer learning is applied to the target subjects, either alternately across two subjects or individually to a single subject. This follows our earlier experiments in which transfer learning was applied across three subjects, allowing us to further investigate model adaptability and performance under different conditions. We compare the results of the four main models, each pre-trained on testing subject 1, under conditions without freezing (No F.) and with freezing applied at layer 44, and using two setups, namely six and four divisions, over 30 epochs.
Starting with two-subject transfer learning, Figure 9 presents the average transfer-learning results for each two-subject combination across the selected classifiers, compared with the previously reported two-subject average accuracies without transfer learning, as shown in Table 5. Based on the averages of subjects 7 and 8, applying transfer learning with the FCNNA model improved the accuracy by approximately 6.55% without channel selection and 0.46% with channel selection. For the averages of subjects 8 and 9, the accuracy without transfer learning reached 77.37% with the FCNNA model and 80.00% with the LFCNN model. When applying transfer learning, the best accuracy was achieved with the FCNNA model without channel selection, improving the FCNNA accuracy by approximately 1.11% but still falling short of the LFCNN accuracy by approximately 1.77%. According to the average accuracy of subjects 7 and 9, transfer learning with the FCNNA model without channel selection achieved the highest accuracy, improving the non-transfer-learning classification accuracy by approximately 0.17%. Additionally, transfer learning with the FCNNA model and channel selection led to a significant improvement, increasing the non-transfer-learning accuracy of FCNNA by approximately 2.61%. In conclusion, the FCNNA model, both with and without channel selection, generally improved the results of non-transfer-learning classification, especially when utilizing four divisions and freezing 44 layers, which also improved time and memory usage.
Single-subject classification with transfer learning has been the standard approach in previous studies. As shown in Figure 10, transfer learning significantly improves performance, with all models in various configurations enhancing the accuracy of subject 7 by approximately 12.19%, increasing from 76.09% to 88.28%. For subject 8, the LFCNN model demonstrated greater improvement compared to the FCNNA model. Finally, for subject 9, improvements were only observed with the FCNNA model when using channel selection, but without freezing.
5.2.3. Transfer-Learning Strategies Analysis
The experiments in our study reveal an intriguing pattern in performance based on the combination of transfer-learning strategies employed (freezing levels, divisions, and learning rates) across different classifiers and subject groups. To illustrate this, Figure 11 presents a heatmap displaying the accuracy results for each subject group across all main classifiers and the various strategies used. The heatmap uses color gradients ranging from blue (indicating lower accuracy values) to red (indicating higher accuracy values) to visualize the performance metrics. From the figure, it is evident that transfer learning on a single subject generally yields better accuracy compared to multiple-subject averages. However, using two or three averaged subjects yields better results when combined with four divisions and freezing strategies. Additionally, the FCNNA model without channel selection provides the best accuracy in most scenarios. When averaging two or three subjects, FCNNA with channel selection ranks as the second-best option in terms of accuracy. On the other hand, the LFCNN model, both with and without channel selection, demonstrates strong performance in the accuracy results for individual subjects.
For further clarity, we analyzed the instances where each strategy achieved the best accuracy across the 28 columns of the heatmap table, as illustrated in Figure 12a. In each subject group, the best result from each model is presented in a way that highlights the techniques used, such as freezing, division, and learning rates. The figure shows that four divisions yielded the best results, outperforming six divisions, which provided the highest accuracy in only 3 out of the 28 cases. Additionally, learning rates 2 and 1 produced higher accuracy, with learning rate 2 being the most effective. In contrast, learning rate 3 achieved the highest accuracy in only one situation. Furthermore, freezing strategies delivered the best accuracy when averaging three subjects, followed by two subjects. However, in single-subject scenarios, the non-freezing strategy performed better, which is expected since the model can more easily adapt to the specific features of that individual subject. Regarding the training time, Figure 12b illustrates the duration in seconds required to complete a single rehabilitation session, with the data divided into either four or six divisions. As previously discussed, the FCNNA model without channel selection usually produces the best results across all subject groups. However, this configuration also incurs the longest processing time, approximately 4 min for three subjects. Generally, there is a trade-off between time and accuracy, except when applying the freezing strategy. Freezing allows for a significant reduction in processing time—by about half to one and a half minutes—while still improving accuracy, particularly in multi-subject transfer-learning scenarios.
5.3. Comparison of Previous Works
To demonstrate the performance of our approach, we compared it against previous work that applied cross-subject transfer learning on the BCI IV 2a dataset. In our study, we applied transfer learning exclusively to subjects 7, 8, and 9, so the comparison focuses solely on these three subjects. As previously mentioned, our study is unique in that it applies multi-subject transfer learning with multiple target subjects, but we also include cases where only one subject is used as the target. Therefore, our comparison covers all possible scenarios.
Table 12 illustrates a comparison of various approaches applied to the BCI IV 2a dataset. The paper in [9] proposed an adaptive transfer-learning approach, with the results on the same dataset reported in [8]. The study in [8] also explains and presents the results of the multi-direction transfer-learning (MDTL) strategy applied to the EEGNet model (EEGNet_MDTL). Additionally, the paper in [41] conducted experiments on this dataset using two approaches: RA-MDRM, detailed in [40], and EA-CSP-LDA. The results of the MMFT and DSTL methods are presented only in figures in the papers [35,39]; accordingly, the values used in the following comparisons were derived directly from those figures. The papers in [37,48] represent state-of-the-art works focusing on domain adaptation techniques. While ref. [37] combined GNNs with transfer learning, ref. [48] applied cross-subject classification without transfer learning. The study in [48] incorporated a feature extraction module, including graph-related features, and integrated it with domain generalization (DG) techniques, which help extract domain-invariant features that can be effectively applied to unseen target data. For both papers, the results of cross-subject four-class classification are presented in [48]. According to the results, in our work, subject 7 consistently achieved the best performance across all scenarios, particularly in the two-subject configuration. Furthermore, our method demonstrated the highest overall average accuracy in all configurations, especially when applying two-target-subject transfer learning. This comparison highlights the effectiveness of our approach for cross-subject transfer learning on this dataset.
7. Conclusions
In this study, we proposed a novel multi-subject transfer-learning method to enhance online rehabilitation systems using edge computing in an IoT environment. This approach improves MI classification accuracy and enables real-time data integration of new subjects, supporting more effective rehabilitation. The architecture consists of three layers, namely cloud, edge, and sensor, each enhancing system efficiency and responsiveness. The edge layer minimizes communication latency while enabling a unified model for local predictions. This model starts as a pre-trained version in the cloud and then is retrained at the edge node using local EEG sensor data. Data are transmitted via Bluetooth to the edge gateway (Raspberry Pi) and relayed to the edge node via Wi-Fi, maintaining continuous updates.
Transfer learning for MI classification was applied using different strategies, including freezing layers, varying data divisions, and adjusting the number of epochs. These strategies were tested on two main models, FCNNA and LFCNN, both with and without channel selection. Our goal was to optimize accuracy while maintaining computational efficiency. We observed that reducing the epochs from 1000 to as low as 30 significantly improved both accuracy and efficiency, with 30 epochs achieving the best results. Freezing layers at different levels (6, 16, 26, and 44) also reduced the trainable parameters and computation time, with greater accuracy achieved as more layers were frozen, especially in the multi-subject setting.
We evaluated the proposed framework using the BCI IV 2a dataset for both multi- and single-subject transfer learning, focusing on subjects 7, 8, and 9. The highest accuracy was achieved by the FCNNA model using 30 epochs, four data divisions, and freezing 44 layers, reaching 79.77% without channel selection and 76.90% with it. In two-subject transfer learning, the accuracy improved by up to 6.55%, while single-subject transfer learning saw enhancements of about 12.19%. Notably, our approach significantly boosted the accuracy for subject 7, and the average across subjects 7, 8, and 9 surpassed previous studies.
Overall, this study demonstrates the potential of multi-subject transfer learning to enhance MI classification within an edge-computing and IoT-based rehabilitation system. The enhancements in cross-subject classification through various transfer-learning strategies, applied to both multi-subject and single-subject data, underscore the robustness of the proposed framework in adapting to diverse patient data. Future research might expand this approach to accommodate larger populations and investigate its use in a wider range of motor imagery tasks. Moreover, practical implementation and testing of the proposed system in real-world environments is still under development and will allow the true impact of the proposed approach to be assessed in practice. This includes addressing key challenges, such as managing network delays, ensuring efficient real-time data processing, and achieving system scalability. The findings of this study provide a foundation for creating personalized, efficient, and adaptive healthcare solutions, paving the way toward fully autonomous and practical rehabilitation systems.