Cross-Domain Transfer of EEG to EEG or ECG Learning for CNN Classification Models

Electroencephalography (EEG) is often used to evaluate several types of neurological brain disorders because of its noninvasiveness and high temporal resolution. In contrast to electrocardiography (ECG), however, EEG can be uncomfortable and inconvenient for patients. Moreover, deep-learning techniques require a large dataset and a long training time when trained from scratch. Therefore, in this study, EEG-EEG and EEG-ECG transfer learning strategies were applied to explore their effectiveness for the training of simple cross-domain convolutional neural networks (CNNs) used in seizure prediction and sleep staging systems, respectively. The seizure model detected interictal and preictal periods, whereas the sleep staging model classified signals into five stages. The patient-specific seizure prediction model with six frozen layers achieved 100% accuracy for seven out of nine patients and required only 40 s of training time for personalization. Moreover, the cross-signal transfer learning EEG-ECG model for sleep staging achieved an accuracy approximately 2.5% higher than that of the ECG model; additionally, the training time was reduced by >50%. In summary, transfer learning from an EEG model to produce personalized models for a more convenient signal can both reduce the training time and increase the accuracy; moreover, challenges such as data insufficiency, variability, and inefficiency can be effectively overcome.


Introduction
Electroencephalography (EEG) is often used to evaluate several types of neurological brain disorders, such as epilepsy, dementia (e.g., Alzheimer's disease), mental illness (e.g., depression), sleep disturbance, and unexplained headaches (e.g., intracranial hematoma) [1]. As artificial intelligence techniques have improved, many researchers have used machine-learning or deep-learning technology to identify or classify physiological signals [2][3][4] to reduce the burden on doctors and the time patients spend waiting for a diagnosis. Although machine learning is a mature field, most of its algorithms still require domain knowledge for feature selection [5]. By contrast, in deep-learning, useful features are extracted automatically, simplifying data preprocessing and improving recognition performance. For example, Shoeibi et al. [6] compared the performance of several conventional machine-learning methods (support vector machine (SVM), k-nearest neighbors, decision tree, naïve Bayes, random forest, extremely randomized trees, and bagging) with that of three deep-learning architectures (a convolutional neural network (CNN), long short-term memory (LSTM), and a one-dimensional (1D) CNN-LSTM) for schizophrenia (SZ) diagnosis based on z-score-normalized EEG signals from 14 subjects without and 14 patients with SZ. Bagging obtained the highest accuracy among the machine-learning models (81%); the best deep-learning algorithm, the 1D-CNN-LSTM model, achieved a substantially superior accuracy of 99%.
However, the application of deep-learning requires the collection of a large dataset and substantial training time. In practice, the accuracy of well-trained models often decreases substantially when they are applied to new data. For example, Cimtay and Ekmekcioglu [7] selected a pretrained CNN model, InceptionResnetV2, for classifying emotions from EEG data. In their one-subject-out binary classification tests on the SJTU Emotion EEG Dataset (SEED), InceptionResnetV2 achieved a mean accuracy of 82.94%; however, the mean cross-dataset prediction accuracy of the model trained on SEED and tested on the Loughborough University Multimodal Emotion Dataset was only 57.89%. Consequently, developing and training a bespoke model for each patient would require an excessive investment of time and resources. Furthermore, many sleep monitoring studies have successfully input multichannel EEG data into deep-learning models. However, for clinical use, multichannel EEG must be performed by a professional. If such signals were to be collected using a wearable device at home, various factors would have to be considered, including long-term data storage, easy operation by a nonprofessional, and user comfort. Hence, many researchers have begun to investigate the potential of using other physiological signals, such as electrocardiography (ECG), respiration, or blood oxygen, for sleep assessment. For example, Urtnasan et al. [8] used a deep convolutional recurrent model for the automatic scoring of sleep stages on the basis of raw single-lead ECG data from 112 subjects. They achieved an overall accuracy of 74.2% for five classes and 86.4% for three classes. Although they concluded that ECG can be used for at-home sleep monitoring, effectively improving the low accuracy of this method would be challenging.
In recent years, researchers have applied transfer learning in attempts to overcome these challenges (e.g., [9,10]). In transfer learning, the knowledge of a trained model, such as its features and weights, is reused in a new model; that is, a pretrained model is repurposed for a new problem. A common implementation starts from a pretrained model, removes or replaces the task-specific top layers, and fine-tunes them on the new data while the bottom layers are frozen. The pretrained model is thus only partially transferred, since only the parameters of the bottom layers are reused. Commonly used pretrained models for fine-tuning include AlexNet, ResNet, and VGG-16 [11]. This approach can greatly reduce not only the training data required but also the computing resources and time required for training a new model. For example, Zargar et al. [12] combined three ImageNet CNNs with three classifiers for predicting seizures. The Xception convolutional network with a fully connected (FC) classifier achieved a sensitivity of 98.47% for 10 patients from a European database, and the MobileNet-V2 model with an FC classifier trained on only one patient's data but tested on six other patients achieved a sensitivity of 98.39%. Their study demonstrated the feasibility of cross-patient application and the performance improvements enabled by transfer learning. One interesting application of transfer learning is cross-signal transfer learning, in which a model pretrained on one type of signal is transferred to a completely different type of signal. However, cross-domain transfer learning is rarely applied in the medical literature. Bird et al. [13] attempted to use unsupervised transfer learning to adapt a multilayer perceptron and a CNN from EEG classification to electromyographic (EMG) classification. Their results revealed that if only EEG or EMG was used to train the model, the accuracy was 62% or 84%, respectively.
However, EEG to EMG transfer learning (i.e., EEG pretrained weights were used as the initial weight distribution for the EMG classification models) and EMG to EEG transfer learning achieved accuracies of 85% and 93%, respectively. Hence, EEG to EMG transfer learning did result in a higher initial classification accuracy than using EMG alone; however, the improvement was lower than that of EMG to EEG transfer learning. This result demonstrated the possibility of using cross-domain transfer learning for different biosignals to reduce both the complexity of the models and the difficulty and tediousness of signal collection.
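The freeze-and-fine-tune idea described above can be illustrated with a toy example: a minimal two-layer linear network in NumPy (all sizes and data below are hypothetical) in which the transferred bottom layer is frozen, so only the unfrozen top layer receives gradient updates.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" weights for a toy two-layer linear network (hypothetical sizes).
layers = [rng.normal(size=(4, 8)), rng.normal(size=(8, 2))]
trainable = [False, True]  # freeze layer 0 (transferred), fine-tune layer 1

def forward(x, layers):
    """Forward pass, keeping per-layer activations for backpropagation."""
    acts = [x]
    for W in layers:
        acts.append(acts[-1] @ W)
    return acts[-1], acts

def sgd_step(x, y, layers, trainable, lr=0.01):
    """One least-squares SGD step that skips frozen layers."""
    y_hat, acts = forward(x, layers)
    grad_out = 2 * (y_hat - y) / len(x)       # dLoss/dOutput for MSE
    for i in reversed(range(len(layers))):
        grad_W = acts[i].T @ grad_out         # gradient w.r.t. layer i weights
        grad_out = grad_out @ layers[i].T     # backpropagate to previous layer
        if trainable[i]:                      # frozen layers keep pretrained weights
            layers[i] = layers[i] - lr * grad_W
    return layers

x = rng.normal(size=(16, 4))
y = rng.normal(size=(16, 2))
before = [W.copy() for W in layers]
layers = sgd_step(x, y, layers, trainable)
print(np.allclose(layers[0], before[0]))  # frozen layer: unchanged
print(np.allclose(layers[1], before[1]))  # fine-tuned layer: updated
```

In a deep-learning framework, the same effect is obtained by setting a layer's trainable flag to false before retraining, which is how the freezing experiments later in this study are typically realized.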
EEG can be used to detect brain abnormalities and provides an effective basis for patient evaluation. However, the method has many practical challenges. By using transfer learning, the aforementioned problems of time-consuming model training, low accuracy on novel data, and insufficient training data might be effectively solved. Therefore, this study applied transfer learning to EEG-based classification to explore the effectiveness of various cross-domain training methods for improving recognition performance. Two experiments were performed to verify the proposed methods. (1) In Experiment 1, a seizure prediction system for detecting interictal and preictal periods was developed by using a patient-specific/cross-dataset transfer learning strategy. The preictal period was defined as 20, 30, or 40 min before a seizure. Epilepsy is a chronic neurological disease caused by abnormal brain electrical activity; it disturbs behavior, movement, sensory perception, or cognition, negatively affecting work, daily life, and social relationships [14]. An early seizure warning could greatly reduce the danger to and harm experienced by patients with epilepsy. In this experiment, a general CNN-based epilepsy prediction model was first developed and then adapted for particular patients by using transfer learning to fine-tune its parameters, with the goals of reducing the model development time and improving the results for each patient. (2) In Experiment 2, a sleep staging system for detecting the five sleep stages was developed by using a cross-signal transfer learning strategy. Collecting ECG signals during sleep is easier and more convenient than collecting EEG signals; however, ECG models typically have lower accuracy.
Hence, in the experiment, a CNN-based sleep staging model for EEG was first developed and validated; the EEG model was then converted into an ECG model and fine-tuned in an attempt to reduce the required number of training samples for the ECG model and achieve higher accuracy. The CNN is a common type of neural network used in deep-learning. Because it automatically detects visual features, the CNN is widely used in image segmentation and classification; this advantage also applies to raw EEG data for a variety of recognition purposes [15].

Experiment 1

Datasets
From the Siena Scalp EEG database, EEG signals of patients with epilepsy (mean ± standard deviation age 42.6 ± 13.8 years) were obtained; one patient included in the database had data of insufficient length, so these data were excluded. The record duration was 9 h 17 min ± 5 h 39 min [16,17]. From the Zenodo dataset, EEG signals were obtained for 14 patients with epilepsy (age 17.4 ± 9.6 years), again excluding one, with a record duration of 7 h 55 min ± 4 h 15 min [18]. For each patient, the diagnosis of epilepsy and its classification were made by a doctor. All patients provided written informed consent approved by the Ethics Committee of the University of Siena.

Data Analysis
EEG signals were preprocessed using MATLAB R2019a v9.6.0 in three steps: (1) all signals were detrended to remove means, offsets, and slow linear drifts over the time course; (2) the detrended signals were filtered using a 0.5-50 Hz bandpass filter; and (3) the global field power (GFP) was computed over time for the filtered 29-channel signals using the formula [19]:

GFP(t) = sqrt( (1/N) Σ_{i=1}^{N} (x_i(t) − x̄(t))² )

where t is the time in milliseconds, N is the number of channels, x_i(t) is the value of channel i at time point t, and x̄(t) is the mean value across channels at time point t. After preprocessing, the signals were truncated by using 10-s overlapping windows with 8 s of overlap and divided into four epileptic states. The interictal period was defined as the period after the previous seizure and before the current seizure, with an interval of at least 50 min [13]; the preictal period was defined as 20, 30, or 40 min before a seizure.
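As an illustration, the GFP computation and the 10-s/8-s-overlap windowing described above can be sketched in NumPy (the sampling rate and signal below are synthetic placeholders, not taken from the datasets):

```python
import numpy as np

def global_field_power(x):
    """GFP(t) = sqrt(mean over channels of (x_i(t) - mean(t))^2).

    x: array of shape (channels, samples)."""
    return np.sqrt(np.mean((x - x.mean(axis=0)) ** 2, axis=0))

def sliding_windows(signal, fs, win_s=10, overlap_s=8):
    """Truncate a 1-D signal into win_s-second windows overlapping by overlap_s seconds."""
    win, step = int(win_s * fs), int((win_s - overlap_s) * fs)
    return np.stack([signal[i:i + win]
                     for i in range(0, len(signal) - win + 1, step)])

fs = 512                                                   # illustrative sampling rate
eeg = np.random.default_rng(1).normal(size=(29, 60 * fs))  # 29 channels, 60 s of noise
gfp = global_field_power(eeg)       # one GFP value per time point
windows = sliding_windows(gfp, fs)  # 10-s windows, 8-s overlap -> 2-s hop
print(gfp.shape, windows.shape)     # (30720,) (26, 5120)
```

With a 2-s hop, a 60-s trace yields 26 windows of 5120 samples each; each window would then be labeled with its epileptic state.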

Classification and Performance Evaluation
The CNN model was implemented using Python v3.8.8 on a personal computer with an Intel Core i7-10700K CPU, NVIDIA Quadro RTX 4000, and 64.0 GB of RAM running Windows 10 with CUDA 10.1. We modified the model of Wang et al. [20]; the model comprised four convolutional layers, five pooling layers, and three FC layers (Table 1).

Three approaches were used for training: recordwise, subjectwise, and patient-specific. For all approaches, 10-fold cross-validation was used to evaluate the trained models. The optimized model was then validated on the testing dataset by calculating its accuracy, specificity, and sensitivity. These processes were performed five times (Figure 2).
In the recordwise approach, data from the two datasets were randomly divided into two sets: 90% for training (approximately 11,000 samples per state) and 10% for testing (approximately 1222 samples per state). In the subjectwise approach, the Siena Scalp EEG data were used for training (11,000 samples per state), and the Zenodo data were used for testing (1222 trials per state). In the patient-specific transfer learning, the subjectwise-trained model was transferred to a model for the data of an individual subject in the Zenodo dataset. Subject data were randomly divided into training and testing datasets in a 90:10 ratio (178 and 20 samples per state, respectively). The first 12, 9, 6, or 3 layers were frozen (i.e., their weights were fixed) and the unfrozen layers were retrained for the individual. The performance of the models with various numbers of frozen layers was compared (Figure 3).
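The accuracy, sensitivity, and specificity used above to evaluate the two-class (interictal/preictal) models follow directly from the confusion counts; a minimal NumPy sketch with hypothetical labels, treating preictal as the positive class:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall of positives), and specificity (recall of negatives)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # preictal correctly detected
    tn = np.sum((y_true == 0) & (y_pred == 0))  # interictal correctly detected
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {"accuracy": (tp + tn) / len(y_true),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp)}

# 1 = preictal, 0 = interictal (hypothetical predictions for six windows)
m = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
print(m)  # accuracy 4/6, sensitivity 2/3, specificity 2/3
```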

Experiment 2

Datasets
We used EEG data downloaded from the Sleep Cassette subset of the Sleep-EDFX database [16,21], which consists of 153 polysomnographic (PSG) recordings. Seventy-eight healthy subjects (age = 58.8 ± 22.4 years) were included, and the record duration was approximately 20 h, including the whole sleep period. The ECG data were downloaded from the Haaglanden Medisch Centrum (HMC) sleep staging database [16,22], which consists of 154 PSG files. A total of 154 patients with different sleep disorders (age = 53.8 ± 15.4 years) were included, and the record duration was 7-13 h.


Data Acquisition
EEG signals were recorded using a dual-channel (Fpz-Cz) cassette recorder at a sampling rate of 100 Hz. Each 30-s epoch was manually labeled by experts in accordance with the R&K standard [23] as belonging to one of six sleep stages: wake, S1, S2, S3, S4, or REM. We coded S1 as NREM1, S2 as NREM2, and combined S3 and S4 as NREM3 in accordance with the American Academy of Sleep Medicine (AASM) standard. ECG signals were recorded using a SOMNOscreen PSG recorder at a sampling frequency of 256 Hz. Each 30-s epoch was manually labeled by sleep technicians at HMC in accordance with the AASM standard (Figure 4).
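The R&K-to-AASM relabeling described above amounts to a simple lookup; a minimal sketch (stage names as used in the text):

```python
# Map the six R&K stages onto the five AASM stages used for classification:
# S3 and S4 are merged into NREM3.
RK_TO_AASM = {
    "wake": "wake",
    "S1": "NREM1",
    "S2": "NREM2",
    "S3": "NREM3",
    "S4": "NREM3",
    "REM": "REM",
}

epochs = ["wake", "S1", "S2", "S3", "S4", "REM"]
print([RK_TO_AASM[e] for e in epochs])
# ['wake', 'NREM1', 'NREM2', 'NREM3', 'NREM3', 'REM']
```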

Data Analysis
EEG signals were preprocessed in two steps using MATLAB R2019a v9.6.0. First, all signals were detrended to remove means, offsets, and slow linear drifts over the time course. These detrended signals were then filtered using a 30-Hz lowpass filter. After preprocessing, the signals were truncated using 30-s windows with 22.5-s overlaps and categorized by sleep state; a total of 16,000 samples were obtained for each state. ECG signals were preprocessed in the same two steps: all signals were first detrended and then filtered using a 0.5-40-Hz bandpass filter. After preprocessing, the same truncation process as for the EEG signals was performed, and a total of 16,000 samples were obtained for each state.
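These preprocessing steps can be sketched with SciPy on synthetic signals. The sampling rates follow the acquisition section; the fourth-order zero-phase Butterworth filter is an assumption, since the filter design is not specified in the text:

```python
import numpy as np
from scipy.signal import detrend, butter, filtfilt

def preprocess(signal, fs, band):
    """Detrend, then band- or low-pass filter (4th-order zero-phase Butterworth)."""
    x = detrend(signal)  # remove mean, offset, and linear drift
    b, a = butter(4, band, btype="bandpass" if np.ndim(band) else "lowpass", fs=fs)
    return filtfilt(b, a, x)

def epochs_30s(signal, fs, win_s=30.0, overlap_s=22.5):
    """30-s windows with 22.5-s overlap, i.e., a 7.5-s hop."""
    win, step = int(win_s * fs), int((win_s - overlap_s) * fs)
    return np.stack([signal[i:i + win]
                     for i in range(0, len(signal) - win + 1, step)])

rng = np.random.default_rng(2)
eeg = preprocess(rng.normal(size=100 * 300), fs=100, band=30.0)         # 30-Hz lowpass
ecg = preprocess(rng.normal(size=256 * 300), fs=256, band=[0.5, 40.0])  # 0.5-40-Hz bandpass
print(epochs_30s(eeg, 100).shape, epochs_30s(ecg, 256).shape)           # 37 epochs each
```

For a 300-s recording, the 7.5-s hop yields 37 epochs regardless of the sampling rate; only the samples per epoch differ (3000 for EEG at 100 Hz, 7680 for ECG at 256 Hz).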

Classification and Performance Evaluation
The CNN model was implemented using Python v3.5.4 on a personal computer with an Intel Core i7-9700K CPU, NVIDIA GeForce RTX 2060, and 64.0 GB of RAM running Windows 10 with CUDA 10.1. We modified the model of Jadhav and Mukhopadhyay [24]; the model comprised three blocks and two FC layers. Block_1 and block_2 each comprised two convolutional layers, two batch normalization (BN) layers, and one pooling layer; block_3 comprised one convolutional layer, one BN layer, and one global pooling layer (Table 2). The EEG to ECG transfer learning was performed in three steps: (1) construction of an EEG-based sleep stage model; (2) transfer of the trained EEG model to an ECG model; and (3) freezing of block_1-3, block_1-2, or block_1 (i.e., fixing the pretrained weights) and retraining of the unfrozen layers (Figure 5). Data were randomly divided into two sets: 80% for training and 20% for testing. Five-fold cross-validation was used to evaluate the trained models. The optimal model was then tested on the testing dataset and evaluated in terms of its accuracy, Cohen's kappa, and the F1-score. These processes were performed 10 times (Figure 6). Figure 6. Scheme of the training process for a 5-fold cross-validation.
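The evaluation metrics named above (Cohen's kappa and the F1-score) can be computed directly from the labels; a minimal NumPy sketch for illustration (equivalent functions exist in scikit-learn):

```python
import numpy as np

def cohen_kappa(y_true, y_pred):
    """kappa = (p_o - p_e) / (1 - p_e), with p_e from the marginal label frequencies."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(np.concatenate([y_true, y_pred]))
    p_o = np.mean(y_true == y_pred)                    # observed agreement
    p_e = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in classes)
    return (p_o - p_e) / (1 - p_e)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1s = []
    for c in np.unique(y_true):
        tp = np.sum((y_true == c) & (y_pred == c))
        fp = np.sum((y_true != c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        f1s.append(2 * tp / (2 * tp + fp + fn))
    return float(np.mean(f1s))

y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(cohen_kappa(y_true, y_pred))  # 0.5
print(macro_f1(y_true, y_pred))     # mean of 2/3 and 4/5
```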

Experiment 1
The effectiveness of the three training approaches for establishing a CNN-based epilepsy prediction model was investigated. The results for the recordwise training (Table 1) revealed that the accuracy, sensitivity, and specificity for classifying interictal and preictal 20-10-, 30-20-, and 40-30-min states were all greater than 99%, 98%, and 99%, respectively; the training time for all three models was approximately 2 h. Hence, this training approach had an excellent performance, but the training was somewhat time-consuming.

Experiment 2
Three CNN-based sleep staging models were established: the EEG model, the ECG model, and the EEG-ECG transfer learning model (Table 5). The EEG model for sleep stage classification achieved accuracy, Cohen's kappa, and F1 scores of 92.67%, 0.908, and 92.69%, respectively; the training time (including five-fold cross-validation) was approximately 1.5 h; the favorable Cohen's kappa and F1 scores indicated that the model had favorable validity and reliability. The ECG model achieved accuracy, Cohen's kappa, and F1 scores of 86.13%, 0.827, and 86.07%, respectively; the training time was still approximately 1.5 h. Finally, the EEG-ECG transfer learning model with block_1 frozen achieved metrics superior to those of the ECG-only model: 88.64%, 0.858, and 88.59%, with a lower training time of approximately 47 min. Freezing block_1 and block_2 or all three blocks resulted in lower scores than the ECG model; however, the training time was far shorter than that for the ECG model at approximately 17 min. Hence, the model with block_1 frozen (two convolutional layers, two BN layers, and one pooling layer) achieved both a higher performance and a lower training time than the ECG-only model.

Figure 7 illustrates the accuracy and loss functions of the EEG, ECG, and EEG-ECG models. An early stop strategy with a patience of 10 was implemented to terminate the training process. The validation accuracy and loss curve of the EEG model increased and decreased quickly, respectively. The validation accuracy and loss curves of the ECG model both fluctuated initially and then stabilized. For the EEG-ECG transfer learning model, the validation accuracy and loss curves were initially high and low, respectively, but slowly stabilized after fluctuating slightly. Overall, overfitting was not evident for any of the three models; hence, the training was judged to be effective.
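The early-stop rule used above (patience of 10 on the validation loss) can be sketched independently of any framework; this toy version assumes the common behavior of stopping once the loss has failed to improve for `patience` consecutive epochs while remembering the best epoch:

```python
class EarlyStopper:
    """Stop training when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.wait = 0  # epochs since the last improvement

    def should_stop(self, epoch, val_loss):
        if val_loss < self.best_loss:
            self.best_loss, self.best_epoch, self.wait = val_loss, epoch, 0
            return False
        self.wait += 1
        return self.wait >= self.patience

# Synthetic validation-loss curve: improves for 5 epochs, then plateaus.
losses = [1.0, 0.8, 0.6, 0.5, 0.45] + [0.46] * 20
stopper = EarlyStopper(patience=10)
for epoch, loss in enumerate(losses):
    if stopper.should_stop(epoch, loss):
        print(f"stopped at epoch {epoch}, best epoch {stopper.best_epoch}")
        break
```

With the synthetic curve above, the best loss occurs at epoch 4 and training halts 10 non-improving epochs later, at epoch 14; a framework callback would additionally restore the weights from the best epoch.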
Figure 8 presents the confusion matrices for the three sleep staging models. Except for NREM1, which was frequently misclassified as wake or NREM2, high classification accuracies were achieved. Hence, the three models achieved favorable classification results, and no substantial imbalance was identified.

Experiment 1
Recordwise training is a commonly used approach in initial deep-learning research. In this training, data from all subjects in a database are randomly divided into a training set and a testing set; each sample (even those from the same subject) is considered independent. For example, Acharya et al. [25] developed a computer-aided seizure diagnosis system to automatically distinguish the class of EEG signals (i.e., normal, preictal, or seizure) by using a 13-layer CNN model. They employed a dataset of 100 epochs for each of five healthy subjects and five patients with epilepsy, with 90% of the total data used for training. Their model achieved an accuracy, specificity, and sensitivity of 88.67%, 90.00%, and 95.00%, respectively. Moreover, Wei et al. [26] proposed a long-term recurrent CNN for discriminating preictal from interictal states for seizure prediction. They similarly used a 9:1 ratio to divide the EEG data of each subject into training and test sets. Their seizure prediction model achieved an accuracy of 93.40%, prediction sensitivity of 91.88%, and specificity of 86.13%. In our experiment, all EEG samples from the 27 patients with epilepsy in the two datasets were mixed and randomly divided into training and test sets (1222 samples per state). The classification accuracy, sensitivity, and specificity for interictal and preictal states (regardless of period) were all greater than 98%. Nevertheless, a recordwise-trained model is often only effective for classifying the dataset used, and its performance on novel data is poor [7].
Many studies have adopted subjectwise training for deep-learning models, in which the data from a given individual are included in either the training or the testing set, but not both. This method better matches practical applications of the trained model to novel patients; however, reduced accuracy is an inevitable issue. In our experiment, we used EEG data from one dataset (the Siena Scalp EEG database) for training and EEG data from another dataset (the Zenodo database) for testing. The accuracy decreased from 98% for the recordwise approach to only 84%; this may be attributable to inter-person differences and the diversity of the data. This problem of cross-subject domain shift has been partly addressed by some scholars. For example, Wang et al. [27] proposed a multiscale CNN, known as SEEG-Net, for evaluating drug-resistant epilepsy. They conducted cross-validation on a multicenter stereoelectroencephalography dataset by using the leave-one-group-out method and achieved accuracies of 94.12% and 87.02% for the MAYO and FNUSA datasets, respectively; leave-one-subject-out cross-validation on a private clinical dataset led to an accuracy of 93.85%. Although their proposed model performed well in detecting pathological activity, it still has insufficient generalizability for practical applications.

The quality of EEG signals is affected by breathing, blinking, and swallowing during the measurement. In addition, individual differences may also affect evaluations based on these signals [28]. To avoid overfitting, deep-learning requires an enormous volume of training data and hence a long training time, delaying system development. Hence, we selected patient-specific transfer learning for retraining our model; specifically, data for a specific subject in the Zenodo dataset were used to fine-tune a model pretrained on the Siena Scalp EEG database. This method required a smaller amount of data, achieved high accuracy, and required little additional training time to produce the customized model. Layer-wise transfer learning is a commonly used approach in which some layers are frozen to decrease the training time. If only a few layers are frozen, the model has high elasticity but requires a longer training time; by contrast, freezing many layers reduces the training time but often reduces the accuracy. Our experimental results indicated that a model with six frozen layers had a short training time (~40 s) and achieved the highest accuracy of nearly 100%. Freezing nine layers achieved a performance similar to freezing six layers; however, the imperfect results for patient 11 revealed that such a model may have insufficient elasticity to be applicable to all individuals. The optimal number of frozen layers may depend on the size of the training data [29]. Hence, for smaller datasets, training the FC layers alone is insufficient; some convolutional layers must also be trained to obtain a stable, accurate model.
Finally, we compared the accuracy of our model with that of models reported in other recent studies on epileptic seizure prediction using EEG data (Table 6). Dissanayake et al. [30] extracted Mel-frequency cepstrum coefficient (MFCC) features from EEG signals and used them in a graph neural network (C-GNN) based on geometric deep-learning to predict epileptic seizures. Their subject-independent models were trained through 10-fold cross-validation and achieved over 95% accuracy on both the CHB-MIT and Siena databases. Zhao et al. [31] proposed a novel end-to-end model, AddNet-SCL, for seizure prediction based on EEG signals. They used a quasi-patient-specific method (i.e., 0.75 × (1−1/N), 0.25 × (1−1/N), and 1/N of a patient's EEG data were used for training, validation, and testing, respectively, where N was the number of seizure events) to train a separate model for each subject from the CHB-MIT and Kaggle databases and achieved AUCs of 0.94 and 0.831, respectively. Regarding the robustness and generalization of the learning models, either training manner (subject-independent or patient-specific) could achieve a high performance, while ours had the highest accuracy, specificity, and sensitivity. Furthermore, the use of raw EEG data in our experiment can facilitate data collection and processing and benefit future applications by bypassing the need for feature extraction or selection. S-Ind: subject independent; P-Spc: patient-specific.

Experiment 2
Silveira et al. [32] used random forest to classify 106,376 single-channel EEG epochs from the Physionet public database into two- to six-state sleep stages. They computed the kurtosis, skewness, and variance of the coefficients decomposed through the discrete wavelet transform as classification features. The accuracy and Cohen's kappa were >90% and >0.8, respectively, demonstrating that single-channel EEG is a feasible method of sleep staging. More recently, many studies have applied various deep-learning models for sleep staging with the goal of achieving automatic and accurate classification by avoiding manual feature extraction. For example, Yildirim et al. [33] developed a 1D-CNN model by using EEG signals from two public databases (Sleep-EDF and Sleep-EDFX) for sleep stage classification. The accuracy of the model for five sleep classes on single-channel EEGs from the Sleep-EDF and Sleep-EDFX databases was 90.83% and 90.48%, respectively. In our experiment, we also used the Fpz-Cz single-channel EEG signals from the Sleep-EDFX database for five-class sleep staging to train a modified 1D-CNN (10 layers in total; 9 layers fewer than in the model of [33]). The accuracy reached 92.67%, indicating that using fewer convolutional layers and max pooling instead of average pooling can slightly improve both the accuracy (~2%) and the training efficiency. Although max pooling retains the key sleep features in EEG, it ignores secondary features that may be effective for classification; by contrast, average pooling retains these features.
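The max-versus-average pooling distinction discussed above can be illustrated on a toy 1-D sequence (values are arbitrary): max pooling keeps only the dominant value in each window, whereas average pooling blends every value into the output.

```python
import numpy as np

def pool1d(x, size, mode="max"):
    """Non-overlapping 1-D pooling: max keeps the dominant value per window,
    average blends all values in the window."""
    x = np.asarray(x, dtype=float)[: len(x) // size * size].reshape(-1, size)
    return x.max(axis=1) if mode == "max" else x.mean(axis=1)

x = [0, 4, 1, 1, 2, 0, 3, 3]
print(pool1d(x, 2, "max"))  # [4. 1. 2. 3.]
print(pool1d(x, 2, "avg"))  # [2. 1. 1. 3.]
```

Note how the isolated peak of 4 dominates its window under max pooling but is halved under average pooling, while the small secondary values survive only in the averaged output.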
Owing to the increasing prevalence of wearable biosignal sensors, many researchers have begun to study ECG sleep staging as an alternative to EEG staging. For example, Ebrahimi et al. [34] extracted features from ECG-derived respiration signals based on the R and S waves of the QRS complex, the raw thoracic respiratory signal (R), and heart rate variability (HRV) and evaluated the performance of various signal combinations in an SVM automatic sleep staging model. Their best accuracy (89.32%) for classifying four stages (wake, Stage 2, slow wave sleep, and REM) was obtained when using the HRV and R signals. Furthermore, Wei et al. [35] extracted 25 features from the HRV and R signals and used an LSTM for two- to five-class sleep staging of patients with mental disorders, achieving accuracies of 89.84%, 84.07%, 77.76%, and 71.16% and Cohen's kappa of 0.52, 0.58, 0.55, and 0.52, respectively, for the four classification tasks. These results indicate that increasing the number of classes decreases performance; improving the accuracy requires combining various signals with different features. However, manual feature selection is time-consuming; hence, some researchers have used deep-learning models for ECG sleep staging. For example, Tang et al. [36] used a CNN with gated recurrent units to classify sleep stages into four classes on the basis of single-lead ECG signals from three public datasets (SHHS2, SHHS1, and MESA). Their best accuracy and Cohen's kappa were 80.6% and 0.70, respectively, a substantial improvement over previous attempts at cross-dataset classification. In our experiment, we used a CNN model for five-class sleep staging and achieved an average accuracy of 86.13%, which demonstrates that the model structure is effective for both EEG and ECG signals; however, the ECG model required more computational resources and training time.
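The exact 25-feature sets of [34,35] are not reproduced here; as a sketch, two standard time-domain HRV features of the kind fed to such SVM/LSTM stagers can be computed from detected R-peak times as follows (function and key names are illustrative).

```python
import numpy as np

def hrv_time_features(r_peak_times_s):
    # RR intervals in milliseconds from successive R-peak times (seconds).
    rr = np.diff(np.asarray(r_peak_times_s, dtype=float)) * 1000.0
    return {
        "mean_hr_bpm": 60000.0 / rr.mean(),             # mean heart rate
        "sdnn_ms": rr.std(ddof=1),                      # overall variability
        "rmssd_ms": np.sqrt(np.mean(np.diff(rr) ** 2)), # beat-to-beat variability
    }
```

As a sanity check, a perfectly regular 60-bpm rhythm (R peaks at 0, 1, 2, ... s) gives a mean heart rate of 60 bpm and SDNN = RMSSD = 0 ms.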
Therefore, we applied transfer learning to improve the performance of the ECG model by basing it on the highly accurate EEG model. Freezing block_1 produced an EEG-ECG transfer-learning model with an accuracy of 88.64%, a small improvement (~2.5%) over the ECG-only model. Radha et al. [37] trained an LSTM model to classify four sleep stages using ECG data (292 participants, 584 recordings) and then transferred some of its weights to photoplethysmography (PPG) data (60 participants, 101 recordings) using three transfer-learning strategies. The accuracy and Cohen's kappa of the ECG-PPG model were 76.36% and 0.65, respectively, a substantial improvement over those of the PPG-only model (69.82% and 0.55). This result demonstrates the merit of transfer learning when similar data are reused. However, few studies have attempted cross-signal transfer learning. Phan et al. [10] trained two recurrent neural networks in the source domain (a large database, in this case the Montreal Archive of Sleep Studies) and then fine-tuned them in the target domain (two small databases: the Surrey-cEEGrid database and the Sleep Cassette and Sleep Telemetry subsets of the Sleep-EDFX database). Transfer learning improved the accuracy by 1.5% for their SeqSleepNet+ network (from 78.5% for EEG-only to 80.0% for EEG-EOG) and by 3.5% for their DeepSleepNet+ network (from 75.9% to 79.4%). These transfer-learning studies reveal that knowledge transfer between the same or similar signals can considerably increase model performance; between different signals, however, it did not greatly increase the accuracy in our case but substantially reduced the training time. Moreover, if too many layers are frozen (i.e., too much knowledge is shared), training the new model has a limited effect and the model may fit the data poorly, resulting in fast training but low performance.
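The freezing strategy described above can be sketched framework-independently as follows; this is a minimal illustration of the weight-copy-then-freeze idea, not the paper's actual training code, and the layer representation is a deliberately simplified assumption.

```python
def transfer_with_freezing(eeg_layers, n_frozen):
    # Copy every layer's weights from the source (EEG) model, then mark the
    # first n_frozen layers as non-trainable, so that fine-tuning on ECG
    # epochs updates only the remaining layers.
    ecg_layers = []
    for i, layer in enumerate(eeg_layers):
        ecg_layers.append({
            "name": layer["name"],
            "weights": list(layer["weights"]),  # transferred knowledge
            "trainable": i >= n_frozen,         # frozen block(s) stay fixed
        })
    return ecg_layers

eeg_model = [{"name": f"block_{i + 1}", "weights": [0.0, 0.0]} for i in range(4)]
ecg_model = transfer_with_freezing(eeg_model, n_frozen=1)  # freeze block_1 only
```

Raising `n_frozen` shares more knowledge and speeds up training, but, as noted above, freezing too many layers leaves too few free parameters to adapt to the new signal.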
Finally, we compared the accuracy of our model with that of models reported in other recent studies on sleep staging using EEG data (Table 7). Li et al. [38] proposed EEGSNet, a model based on a CNN and a bi-directional LSTM (Bi-LSTM), to extract features from EEG spectrograms and classify them into five sleep stages. They trained their model using 20-fold or leave-one-out cross-validation, depending on the size of the dataset. The accuracies were 94.17%, 86.82%, 83.02%, and 85.12% for the sleep-edfx-8, sleep-edfx-20, sleep-edfx-78, and SHHS datasets, respectively. Jadhav et al. [24] evaluated raw EEG epochs, the short-time Fourier transform (STFT), and the stationary wavelet transform (SWT) on the same dataset (sleep-edfx-78) using CNN models. Their subject-wise models were trained through 20-fold cross-validation with over 83% accuracy. For the classification of five sleep stages, our model achieved better performance with fewer layers, and its direct use of raw EEG data can benefit fast diagnosis. Table 8 compares our model with other recent closely related studies using ECG data. Urtnasan et al. [8] used a deep convolutional recurrent (DCR) model based on a CNN and a gated recurrent unit (GRU) for the automatic scoring of sleep stages. They trained and tested the model using the ECG signals of 89 and 23 subjects, respectively, randomly selected from the dataset, and achieved an overall accuracy of 74.2% for five classes and 86.4% for three classes. Tang et al. [36] pre-trained a model built on five CNN blocks, bi-directional GRU layers, and a fully connected layer with one dataset and then re-trained it with another dataset, obtaining an improvement of 20%. Considering resources and time, they randomly sampled 100 subjects (70% for training and 30% for testing) from each dataset. There remains room for improvement when ECG signals alone are used for classification.
By using transfer learning from EEG to ECG, our model could classify more classes with better performance, which demonstrates the feasibility of automatic sleep staging using ECG signals. Our experiments have some limitations. First, the sample size was insufficient; including more databases in the training and test sets would improve the reliability of the model. Second, temporal information was not considered; automatic feature extraction coupled with time-series training, such as a CNN-LSTM, may be more effective.

Conclusions
This study attempted to apply cross-domain transfer learning for two EEG-based classification tasks-seizure prediction and sleep staging-to explore its effects on recognition performance.
In Experiment 1, binary classification models were trained using a record-wise approach to test our model architecture; this model achieved an accuracy, specificity, and sensitivity of >98%. Subsequent subject-wise training simulated practical applications in which the test and training data were independent; this model achieved an accuracy, specificity, and sensitivity of >82%. Owing to this marked decrease in model performance, cross-dataset transfer learning was used to train patient-specific models; the model with six frozen layers achieved an accuracy, specificity, and sensitivity of 100% for seven out of nine subjects and >97% for the remaining two; moreover, only 40 s of additional training time was required. Through transfer learning, the model could learn the EEG characteristics of an individual to achieve personalized and accurate detection, increasing the practicality of seizure prediction.
In Experiment 2, transfer learning across different signal sources was attempted for five-class sleep staging. The same modified model architecture was used to build the EEG and ECG models. As expected, the accuracy, Cohen's kappa, and F1-score of the EEG model (92.67%, 0.908, and 92.695%, respectively) were superior to those of the ECG model (86.13%, 0.827, and 86.07%). However, transfer learning produced an EEG-ECG model with an accuracy approximately 2.5% greater than that of the ECG model. Although this cross-signal transfer-learning method achieved only a small performance improvement, the training time was reduced by >50% compared with that of the ECG-only model, effectively reducing the computing resource consumption. Additional studies should be conducted on the challenges of knowledge transfer between different signals. To the best of our knowledge, this experiment is the first to demonstrate the feasibility of cross-signal transfer learning from EEG to ECG for sleep staging. EEG measurement is inconvenient and uncomfortable; hence, using ECG for sleep staging could enable practical applications, such as wearable devices employed for sleep analysis and recording sleep quality.
In summary, EEG can be used to detect brain abnormalities and provides an effective basis for patient evaluation. However, its limitations restrict its use in practice. Cross-domain transfer-learning strategies may overcome these problems for specific uses, such as precision medicine, portable devices, and rare disease detection, with simple or original model structures.