Enhancing Electrocardiogram Classification with Multiple Datasets and Distant Transfer Learning

Electrocardiogram classification is crucial for various applications such as the medical diagnosis of cardiovascular diseases, the level of heart damage, and stress. One of the typical challenges of electrocardiogram classification problems is the small size of the datasets, which may lead to limitation in the performance of the classification models, particularly for models based on deep-learning algorithms. Transfer learning has demonstrated effectiveness in transferring knowledge from a source model with a similar domain and can enhance the performance of the target model. Nevertheless, the consideration of datasets with similar domains restricts the selection of source domains. In this paper, electrocardiogram classification was enhanced by distant transfer learning where a generative-adversarial-network-based auxiliary domain with a domain-feature-classifier negative-transfer-avoidance (GANAD-DFCNTA) algorithm was proposed to bridge the knowledge transfer from distant sources to target domains. To evaluate the performance of the proposed algorithm, eight benchmark datasets were chosen, with four from electrocardiogram datasets and four from the following distant domains: ImageNet, COCO, WordNet, and Sentiment140. The results showed an average accuracy improvement of 3.67 to 4.89%. The proposed algorithm was also compared with existing works using traditional transfer learning, revealing an average accuracy improvement of 0.303–5.19%. Ablation studies confirmed the effectiveness of the components of GANAD-DFCNTA.


Introduction
Bioengineering 2022, 9, 683
Electrocardiograms (ECGs) have supported medical monitoring and diagnosis for more than a century. Machine learning algorithms have been applied to ECG classification problems such as the detection of various types of cardiovascular disease [1], heart muscle damage detection [2], stress detection [3], and drowsiness detection [4]. Large-scale medical data collection is challenging due to privacy [5], ethics [6], and security [7] concerns. As a result, medical institutions and research groups usually collect only small-scale electrocardiogram datasets. From an algorithmic perspective, deep-learning algorithms offer superior model performance when sufficient training data are available [8,9]. Some studies have suggested the applicability of deep-support vector machines to small-scale datasets [10].
Attention is drawn to transfer learning, which can improve the performance of the target model by building on a pretrained source model. Typically, the source and target datasets have similar domains and modalities. Related works show that transfer learning can improve the performance of the target model to a certain extent; however, there is still room for improvement. ECG classification often suffers from biased classification and small-scale datasets. Applying deep-learning algorithms to enhance the classification model therefore requires overcoming potential model overfitting with limited training data. In view of this concern, this article extends the problem formulation of transfer learning to heterogeneous datasets between the source and target domains. This emerging research area, namely distant transfer learning, offers benefits such as a far wider choice of pretrained source models, particularly for source domains that are highly dissimilar to the target domain. The pretrained source models are trained with large-scale datasets, which could benefit the fine-tuning of the target ECG classification models by supplying new knowledge and by reducing model overfitting and biased classification. Nevertheless, distant transfer learning poses key challenges in the design of new auxiliary domains and negative-transfer-avoidance algorithms. If distant transfer learning succeeds, the performance of the target model can be enhanced by both similar and distant source domains.
In the following, a literature review is conducted to study the methodology and results of the existing works. Several research limitations were observed, which serve as the rationale for our proposal.

Literature Review
To the best of our knowledge, there are no related works on distant transfer learning using auxiliary domains for ECG classification. Therefore, the discussion in this subsection considers traditional transfer learning for ECG classification without auxiliary domains as the bridge between the source and target domains. Related works on distant transfer learning in other areas were also studied, which provided insights for the design and formulation of the proposed distant transfer learning algorithm.

Traditional Transfer Learning for ECG Classification
Various works considered a knowledge transfer from a source model with different domains compared with the target domain. Transfer learning was performed to transfer knowledge from a pretrained model using the ImageNet database to the MIT-BIH arrhythmia database [11]. Before model construction, the continuous wavelet transform was used to transform the 1D ECG signals to 2D ECG signals. The input signals were passed to a convolutional neural network (CNN) algorithm for feature extraction and classification. The algorithm achieved a sensitivity of 96.2%, a specificity of 99.3%, and an accuracy of 99.1%. The accuracy was enhanced by 0.507% to 1.86% compared with existing works. Another work [12] also applied transfer learning to the MIT-BIH arrhythmia database. Three pretrained models, AlexNet, ResNet18, and GoogleNet, were selected for analysis. These models facilitated the knowledge transfer to the target model based on the CNN. The CNN model was optimized using different optimizers such as RMSprop, Adam, and SGDM. A performance evaluation showed that the AlexNet-CNN achieved the best accuracy with RMSprop (98.5%), the ResNet18-CNN achieved the best accuracy with SGDM (99.5%), and the GoogleNet-CNN achieved the best accuracy with Adam (98.6%). To prepare 2D ECG signals, the short-time Fourier transform was employed [13]. EfficientNet was selected as the pretrained model to support CNN-based target model construction for the MIT-BIH arrhythmia database and the PTB Diagnostic ECG database. An ablation study showed that the average accuracy improved from 94.7% to 97.0%.
Among existing works using traditional machine-learning-based classifiers, a preliminary study with 294 ECG samples (a small portion of six benchmark datasets) was carried out for ECG classification [14]. The continuous wavelet transform was first applied to prepare 2D ECG signals. ResNet50 served as the pretrained model to fine-tune the target model. Three algorithms, namely XGBoost, random forest, and Softmax, were chosen to build the classifiers. These classifiers yielded accuracies of 98.3%, 94.9%, and 93.2%, respectively.
The abovementioned works considered source and target domains that were not similar. In [15], a pretrained model for right bundle branch block classification was transferred to Brugada syndrome classification. The CNN and bidirectional long short-term memory (Bi-LSTM) were the two core components for the ECG classification problem. An ablation study showed that the sensitivity of the model could be improved from 79.2% to 87.6%, whereas the specificity remained constant at 69.6%. Another work [16] presented a pretrained model using various datasets from different hospitals. Knowledge was transferred to a hybrid CNN and autoencoder target model. The design of the model was able to suppress the noise level of ECG signals and, thus, improve the model accuracy from 94.5% to 98.9%.

Distant Transfer Learning Applications
A distant-domain high-level feature fusion approach was proposed for distant transfer learning [17]. The source domain was one of four benchmark datasets, namely Dslr, Webcam, Amazon, and Caltech-256. The auxiliary domain was based on breast ultrasound images, and the target domain was based on thyroid ultrasound images. Ablation studies showed that distant transfer learning improved the accuracy of the target model from 82.5% to 86.7% (Dslr), 79.5% to 84.4% (Webcam), 76.1% to 83.4% (Amazon), and 78.5% to 82.7% (Caltech-256). The work [18] adopted these four source datasets for multiple-dataset distant transfer learning using distant feature fusion and a reduced-size U-Net segmentation model. Chest X-rays were selected as the auxiliary dataset to bridge the multiple-source domains and the target domain (COVID-19 computed tomography). An ablation study showed that the accuracy of the model improved from 86% to 96%.
In contrast to the common assumption of distant transfer learning that the source and target domains are different, the work [19] adopted distant transfer learning when both the source and target domains were similar (industrial fault samples). A transitive distant-domain adaptation network was proposed for the transitive exploration of distant-domain samples. The average accuracy was 81.4% for five types of faults. Another work [20] proposed an autonomous machine learning pipeline with feature transfer. Three experiments were carried out to evaluate the algorithm: (i) text classification, where the source domain (BERT dataset) transferred knowledge to the target domain (the toxic comment dataset or the spam-email classification dataset); (ii) image classification, where the source domain (the ImageNet dataset) transferred knowledge to the target domain (the CIFAR-10 dataset or the CASIA-FaceV5 dataset); and (iii) audio classification, where the source domain (the Audioset dataset) transferred knowledge to the target domain (the ESC-50 dataset or the Speech Command dataset).

The research limitations of the existing works are summarized as follows:
• There was a lack of research on distant transfer learning for ECG classification.
• Some research works [19,20] on distant transfer learning considered source and target domains that were similar; however, in that setting, traditional transfer learning algorithms with a lower model complexity can achieve a similar performance.
• The details of the design and formulation of multiple-source datasets in the distant transfer learning process [18] were insufficient.
• There were limited discussions on negative transfer avoidance between the source and target domains at the domain, instance, and feature levels.

Research Contributions of the Article
To address the research limitations, a generative-adversarial-network-based auxiliary domain with a domain-feature-classifier negative-transfer-avoidance (GANAD-DFCNTA) algorithm was proposed for the knowledge transfer from distant source domains to target domains. The research contributions of the article are summarized below. It is worth noting that k-fold cross-validation with k = 5 was adopted for the performance evaluation and analysis, and ablation studies were carried out to reveal the effectiveness of the components of the GANAD-DFCNTA algorithm.

• Distant transfer learning was newly applied to ECG classification. Four benchmark ECG datasets were selected for the research studies.
• With the unrestricted discipline of the source domain in distant transfer learning, generative-adversarial-network-based auxiliary domains were designed using both the source and target datasets.
• To minimize the risk of negative transfer from the source model to the target model, a domain-feature-classifier negative-transfer-avoidance algorithm was proposed to minimize the loss for domain reconstruction, feature extraction, and the classifier.
• The GANAD-DFCNTA algorithm improved the accuracy by 0.303–5.19% compared with existing works.
• An investigation was carried out on the extension of the GANAD-DFCNTA algorithm with multiple-source datasets. An evaluation showed that multiple-source datasets can enhance the accuracy of the target model by 3.67 to 4.89%.

Organization of the Article
The rest of the article is organized as follows: Section 2 presents the details of the design and formulation of the GANAD-DFCNTA algorithm. The summary of the benchmark datasets and the performance evaluation of the GANAD-DFCNTA algorithm can be found in Section 3. To evaluate the contributions of the components of the GANAD-DFCNTA algorithm, Section 4 reports ablation studies in which one component is removed in each study. The article concludes with future research directions in Section 5.

Methodology
The methodology begins with an overview of the proposed distant transfer learning. For the design and formulation of the GANAD-DFCNTA algorithm, the GANAD algorithm is presented first in Section 2.2, followed by the DFCNTA algorithm. Figure 1 compares the conceptual architecture of traditional transfer learning for ECG classification (left) with that of the proposed distant transfer learning (right). The former assumes that the source and target domains are similar, i.e., both the source and target datasets are ECG-related. The latter considers distant source and target domains.

Overview of the Proposed Distant Transfer Learning Algorithm for ECG Classification
Since the desired application was ECG classification, the source domain was assumed to be non-ECG-related, whereas the target domain was ECG-related. An exhaustive search and analysis (of infinitely many datasets) may be required to select an appropriate auxiliary domain to serve as a bridge between the source and target domains. In our work, an algorithmic approach was adopted instead: two auxiliary domains were generated by applying a GAN to the source and target domains. One auxiliary domain was based on a GAN with the source domain and the other on a GAN with the target domain. These auxiliary domains have the advantage of generating more relevant data and, thus, reduce the dissimilarities between the source domain and the target domain compared with the original formulation of direct distant transfer learning. As a result, the auxiliary domains contribute to negative transfer avoidance. In regard to positive transfer, attention was drawn to the small-scale nature of ECG datasets: deep-learning-based ECG classifiers may experience overfitting and biased classification.

Generative-Adversarial-Network-Based Auxiliary Domains (GANAD) Algorithm
The generative adversarial network (GAN) has demonstrated its superiority in generating additional training data for the training of machine learning models [21][22][23]. One of the key advantages is the generation of additional training data for minority classes to reduce the impact of imbalanced classifications. To formulate the GAN for the two auxiliary domains (one relates to the source domain and one relates to the target domain), it is important to ensure good diversity for the generated data in order to enhance the functionality of the auxiliary domains as bridges between the source and target domains. Otherwise, without good diversity, the problem reduces to the traditional transfer learning process.
Starting with the formulation of the conditional GAN (cGAN) [24], there is a conditional mapping function, the generator $G: X \to Y$, which uses the input $x \in X$ to conditionally generate the output $y \in Y$. The latent variable $z \in Z$ is important for controlling the learning problem of the multi-modal mapping $G: X \times Z \to Y$, such that an input $x$ is mapped to multiple diverse outputs $y$. The minimization problem of the generator, with $D$ as the discriminator, is given by:
$$\min_G \; \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x,z))\right)\right] \tag{1}$$
Equivalently, maximizing the score $D(x, G(x,z))$ produces outputs from the true data distribution. The discriminator aims to minimize the score $D(x, G(x,z))$ that it gives to generated samples $G(x,z)$ and to maximize the score $D(x,y)$ that it gives to the ground truth pair $(x,y)$, while $G$ attempts to fool $D$ into believing that the generated samples are ground truth. This requires the formulation of the maximization problem of $D$ as:
$$\max_D \; \mathbb{E}_{x,y}\left[\log D(x,y)\right] + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x,z))\right)\right] \tag{2}$$
Combining Equations (1) and (2), the loss function of the cGAN is defined as:
$$\min_G \max_D \; L_{cGAN}(G,D) = \mathbb{E}_{x,y}\left[\log D(x,y)\right] + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x,z))\right)\right] \tag{3}$$
In a typical formulation of a GAN, a mapping is learnt from a latent distribution $p_z$ to a complicated data distribution, which requires a large-scale dataset. Given the small-scale nature of ECG datasets in the target domain, increasing the network depth was not a feasible approach. Instead, the latent distribution was enhanced for better modelling power. From [24], the diversity of the data can be enhanced by modifying the latent space with a Gaussian mixture model:
$$p_z(z) = \sum_{i=1}^{N} \omega_i \, f(z \mid \alpha_i, \Sigma_i) \tag{4}$$
where $\omega = [\omega_1, \ldots, \omega_N]$ is the mixture weights vector and $f(z \mid \alpha_i, \Sigma_i)$ is the probability of the sample $z$ following the normal distribution $N(\alpha_i, \Sigma_i)$. If equal weights are assumed, Equation (4) reduces to:
$$p_z(z) = \frac{1}{N} \sum_{i=1}^{N} f(z \mid \alpha_i, \Sigma_i) \tag{5}$$
However, the normal distribution has low kurtosis, and equal weights reduce the flexibility of the mixture weights. It is also not practical to assign equal importance to all generated samples, because transferring knowledge between distant domains is more challenging with less relevant samples.
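To make the cGAN objective described above concrete, the following minimal sketch estimates the value $\mathbb{E}[\log D(x,y)] + \mathbb{E}[\log(1 - D(x,G(x,z)))]$ from batches of discriminator scores. The function name `cgan_value` and the score values are illustrative assumptions, not part of the original work; no network is trained here.

```python
import numpy as np

def cgan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of E[log D(x, y)] + E[log(1 - D(x, G(x, z)))].

    d_real: discriminator scores in (0, 1) for ground-truth pairs (x, y)
    d_fake: discriminator scores in (0, 1) for generated pairs (x, G(x, z))
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# Hypothetical scores: a discriminator that separates real from generated pairs well ...
confident = cgan_value(d_real=[0.90, 0.95], d_fake=[0.05, 0.10])
# ... versus one that the generator has fooled (all scores at 0.5).
fooled = cgan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

The discriminator ascends this value while the generator descends it, so `confident` is larger than `fooled`; the fooled case equals $2\log 0.5$.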
In contrast to the traditional formulations of the cGAN in Equations (1)-(5), in our work, we proposed the reformulation of Equation (4) with the logistic distribution $g(z \mid \mu, \rho)$, which has high kurtosis, and the mixture weights vector $\beta = [\beta_1, \ldots, \beta_N]$ with unequal weights:
$$p_z(z) = \sum_{i=1}^{N} \beta_i \, g(z \mid \mu_i, \rho_i) \tag{6}$$
Individual samples can be obtained by:
$$z = \gamma_i + \delta_i \sigma \tag{7}$$
where $\gamma_i$ is the deterministic function, $\delta_i$ is the covariance matrix, and $\sigma$ is the auxiliary noise following the logistic distribution. The model was then specified (Equation (8)) with $N$ logistic components. For data generation, one of the $N$ logistic components was selected and a sample $z$ was obtained from the selected component. The sample was passed to the generator to obtain the output. By applying L2 regularization to the generator, Equation (1) was updated to Equation (9).
Algorithm 1 trains G and D for the first GAN-based auxiliary domain in two alternating loops. For D, gradient ascent is applied to D to solve the maximization problem. For G, over steps k = 1, ..., N, a minibatch of size p is sampled from the latent space and gradient descent is applied to G to solve the minimization problem.
Likewise, Algorithm 2 summarizes the workflow to train G and D for GAN-based auxiliary domain stage 2 using the distant target domain. G generated a batch of synthetic ECG-related samples along with samples from the input ECG-related sample. The same two loops are used: gradient ascent on D to solve the maximization problem, followed, for steps k = 1, ..., N, by sampling a minibatch of size p from the latent space and applying gradient descent to G to solve the minimization problem.
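The data-generation step described above (select one of the N logistic components, then draw a latent sample from it) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `sample_logistic_mixture`, the weights, locations, and scales are hypothetical, and per-dimension scales are used as a simplification in place of a full covariance matrix.

```python
import numpy as np

def sample_logistic_mixture(n, weights, locs, scales, rng):
    """Draw n latent vectors from a mixture of logistic components.

    weights: (N,) unequal mixture weights (normalized here to sum to 1)
    locs:    (N, d) per-component location
    scales:  (N, d) per-component scale (diagonal simplification, an assumption)
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    locs = np.asarray(locs, dtype=float)
    scales = np.asarray(scales, dtype=float)
    # Step 1: pick one component per sample according to the unequal weights.
    idx = rng.choice(len(weights), size=n, p=weights)
    # Step 2: reparameterize with auxiliary logistic noise,
    # z = location_i + scale_i * noise, noise ~ Logistic(0, 1).
    noise = rng.logistic(loc=0.0, scale=1.0, size=(n, locs.shape[1]))
    return locs[idx] + scales[idx] * noise, idx

rng = np.random.default_rng(0)
z, idx = sample_logistic_mixture(
    n=1000,
    weights=[0.6, 0.3, 0.1],                      # hypothetical unequal weights
    locs=[[-2.0, 0.0], [0.0, 2.0], [3.0, -1.0]],  # hypothetical locations
    scales=[[0.5, 0.5], [1.0, 1.0], [0.3, 0.3]],  # hypothetical scales
    rng=rng,
)
```

In a full pipeline, each row of `z` would be passed to the generator; here the heavier-weighted component simply contributes more latent samples, which is the flexibility that equal weights would remove.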

Domain-Feature-Classifier Negative-Transfer-Avoidance (DFCNTA) Algorithm
In the distant transfer learning process (recall Figure 1), a feature space is shared between the source and target domains in all stages. The basic formulations of the GAN loss $L_{GAN}$ and the classification loss $L_{class}$ are [25]:
$$L_{GAN} = \mathbb{E}_{x_u \sim P_T(X)}\left[\log D(F(x_u))\right] + \mathbb{E}_{x_s \sim P_S(X)}\left[\log\left(1 - D(F(x_s))\right)\right] \tag{10}$$
$$L_{class} = \mathbb{E}_{(x_l, y_l) \sim T_L}\left[\ell\left(C(F(x_l)), y_l\right)\right] + \mathbb{E}_{(x_s, y_s) \sim S}\left[\ell\left(C(F(x_s)), y_s\right)\right] \tag{11}$$
where $x_u$ is the sample from the unlabeled target domain, $P_T(X)$ is the target marginal, $D$ is the discriminator, $F(x_u)$ is the true feature, $x_s$ is the input sample from the labeled source domain, $P_S(X)$ is the source marginal, $F(x_s)$ is the fake feature, $x_l$ is the input sample from the labeled target domain based on the target joint $P_T(X, Y)$, $y_l$ is the output sample from the labeled target domain based on $P_T(X, Y)$, $T_L$ is the labeled target domain dataset, $C$ is the classifier, $F(x_l)$ is the joint feature, $y_s$ is the output sample from the labeled source domain, $S$ is the labeled source set, and $\ell$ is the per-sample classification loss.
It is worth noting that the assumption in Equation (12), namely that every $x_s$ provides positive transfer to the target domain, is not appropriate. Instead, some source samples do not provide positive transfer, and Equation (12) was revised accordingly. To ensure good transferability of the marginal discriminator and the joint discriminator, a virtual label $y_v$ was defined so that a single discriminator can serve as both the marginal discriminator and the joint discriminator. In particular, the joint discriminator can use $x_u$ because the labeled data are limited. $y_v$ was passed to the feature network, and Equation (10) was updated accordingly. Since each sample in the source domain set may contribute to the positive transfer to a different extent, weighting factors can be introduced into the second term of Equation (11). In [25–27], the weighting factors were estimated by the ratio between the target joint and the source joint. In our work, to improve diversity, the weighting factors $\phi$ were not restricted by this ratio. Instead, the factors were treated as hyperparameters to be optimized. Equation (11) was modified with these weighting factors, where $\gamma$ is the scaling factor that controls $\phi$. As a result, the overall transfer learning problem was obtained, in which $\sigma$ is the hyperparameter.
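The idea of weighting the source term of the classification loss can be illustrated with a short sketch. It assumes a cross-entropy per-sample loss and assumes that the scaling factor multiplies the per-sample weights; both choices, and the names `cross_entropy` and `weighted_class_loss`, are illustrative assumptions and not the exact formulation of the proposed algorithm.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Per-sample cross-entropy from predicted class probabilities."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    labels = np.asarray(labels, dtype=int)
    return -np.log(probs[np.arange(len(labels)), labels])

def weighted_class_loss(p_tgt, y_tgt, p_src, y_src, phi, gamma=1.0):
    """Labeled-target loss plus source loss re-weighted per sample by gamma * phi."""
    target_term = cross_entropy(p_tgt, y_tgt).mean()
    source_term = (gamma * np.asarray(phi, dtype=float)
                   * cross_entropy(p_src, y_src)).mean()
    return float(target_term + source_term)

# Hypothetical classifier outputs for two target and two source samples.
p_tgt, y_tgt = [[0.8, 0.2], [0.3, 0.7]], [0, 1]
p_src, y_src = [[0.9, 0.1], [0.2, 0.8]], [0, 0]  # second source sample fits poorly

loss_uniform = weighted_class_loss(p_tgt, y_tgt, p_src, y_src, phi=[1.0, 1.0])
loss_downweighted = weighted_class_loss(p_tgt, y_tgt, p_src, y_src, phi=[1.0, 0.0])
```

Setting the weight of the poorly fitting source sample to zero removes its contribution, which is the mechanism by which tuned weights can limit negative transfer from individual source samples.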

Benchmark Datasets and Performance Evaluation
This section starts with a brief summary of the eight benchmark datasets in three data types: image, text, and ECG signal. The performance evaluation and analysis of the GANAD-DFCNTA algorithm are then presented, followed by a comparison with existing works.

Benchmark Dataset
Benchmark datasets are essential for the performance evaluation of the GANAD-DFCNTA algorithm. For the source datasets, image-based and text-based datasets were selected because of their popularity and their successful use in previous transfer learning works. More importantly, these datasets are not related to ECG classification or time-series data. For image-based datasets, ImageNet [28] and COCO [29] were chosen. For text-based datasets, WordNet [30] and Sentiment140 [31] were used. For the target datasets, four ECG-based datasets were chosen, namely PTB-XL [32], the MIT-BIH arrhythmia database [33], the European ST-T database [34], and the long-term ST database [35]. Table 1 summarizes the details of the eight benchmark datasets. Because the GANAD-DFCNTA algorithm uses between one and four source datasets and the ordering of the datasets matters, 64 scenarios can be generated for each target dataset (4 + 12 + 24 + 24 = 64 ordered selections of one, two, three, and four of the four source datasets). Overall, 64 × 4 = 256 models were analyzed.
ECG beat segmentation was performed on the ECG signals to obtain individual samples. Since ECG beat segmentation is a well-known technique, only the major steps are highlighted here; the full details can be found in [36,37]. The general idea of ECG beat segmentation is to locate all R waves so that each individual ECG sample is defined as the short segment between two consecutive R waves. Given that the typical frequency range of QRS complexes is 10 to 30 Hz, a high-pass filter with transfer function $H_{high}(z)$, a cutoff frequency of 5 Hz, and a fixed delay (in samples) is first applied to the ECG signal. A linear-phase low-pass filter $H_{low}(z)$ is then applied to the ECG signal, which attenuates the powerline noise and muscle noise by 35 dB. The delay and the gain of $H_{low}(z)$ are 5 samples and 31 dB, respectively.
To extract the slope information of the ECG signals (particularly the QR and RS segments), a linear-phase derivative filter with impulse response $h_d[n]$ is applied, whose delay and gain are 2 samples and 14 dB, respectively. The output is then squared sample by sample. Afterwards, moving integration is applied with the difference equation
$$y_{MI}[n] = \frac{1}{N} \sum_{k=0}^{N-1} x[n-k]$$
where $x[n]$ is the squared signal and $N$ is the total number of samples in the window. The first sample of the output is the location of the Q wave, and the length of the output is the sum of twice the QS segment and the window width. With two thresholds defined as $\delta_1 = P_N + 0.25(P_S - P_N)$ and $\delta_2 = 0.5\delta_1$, where $P_N$ is the noise peak and $P_S$ is the signal peak, the locations of the Q waves, R waves, and S waves can be obtained. Table 2 summarizes the sample size of each class for the four ECG benchmark datasets [32–35].
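The last stages of the beat segmentation described above (derivative, squaring, moving integration, and the two thresholds) can be sketched as follows. This is a simplified illustration, not the exact filters of [36,37]: the five-point derivative kernel is the commonly used Pan-Tompkins form applied here without delay, the band-pass stage is omitted, and the synthetic signal, window length, and function names are assumptions.

```python
import numpy as np

def qrs_feature_signal(ecg, window):
    """Derivative -> squaring -> moving-window integration."""
    # Five-point derivative (assumed Pan-Tompkins form, centered so that no
    # delay is introduced); it emphasizes the steep QR and RS slopes.
    kernel = np.array([1.0, 2.0, 0.0, -2.0, -1.0]) / 8.0
    slope = np.convolve(ecg, kernel, mode="same")
    squared = slope ** 2
    # Moving integration: mean of the current and previous (window - 1) samples.
    integrated = np.convolve(squared, np.ones(window) / window)[: len(ecg)]
    return integrated

def detection_thresholds(signal_peak, noise_peak):
    """delta_1 = P_N + 0.25 (P_S - P_N) and delta_2 = 0.5 delta_1."""
    delta_1 = noise_peak + 0.25 * (signal_peak - noise_peak)
    return delta_1, 0.5 * delta_1

# Synthetic stand-in for one beat: a narrow spike centered at sample 100.
ecg = np.zeros(200)
ecg[99:102] = [0.5, 1.0, 0.5]

feature = qrs_feature_signal(ecg, window=10)
peak_index = int(np.argmax(feature))
delta_1, delta_2 = detection_thresholds(signal_peak=feature.max(), noise_peak=0.0)
candidates = np.flatnonzero(feature > delta_1)  # samples exceeding the first threshold
```

Because the moving integration is causal, the feature peak appears slightly after the spike; a practical detector would compensate for the accumulated filter delays before marking the R wave.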

Two Auxiliary Domains
The ECG classification model was supported by the GANAD-DFCNTA and CNN algorithms. The formulations were based on interpatient ECG classification to align with the settings in existing works [11–16] for a like-for-like comparison. The architecture of the CNN is summarized as follows: layer 1: Conv1D with kernel = 50, unit = 128, ReLU = 3, and strides = 3; layer 2: batch normalization; layer 3: maximum pooling with size = 2 and stride = 3; layer 4: Conv1D with kernel = 8, unit = 32, ReLU = 1, and strides = 1; layer 5: batch normalization; layer 6: maximum pooling with size = 2 and stride = 2; layer 7: Conv1D with kernel = 5, unit = 512, ReLU = 1, and strides = 1; layer 8: Conv1D with kernel = 3, unit = 128, ReLU = 1, and strides = 1; layer 9: fully connected layer; and layer 10: output layer. For all experiments, k-fold cross-validation with k = 5 was selected, as it is a common choice in classification problems [38–40].
Table 3 summarizes the performance of the best model for the multiple datasets (from 1 to 4) using specificity, sensitivity, and accuracy. These evaluation metrics are defined as follows:
$$\text{Specificity} = \frac{TN}{N}, \qquad \text{Sensitivity} = \frac{TP}{P}, \qquad \text{Accuracy} = \frac{TN + TP}{N + P}$$
where TN is the number of true negatives, TP is the number of true positives, N is the number of real negatives, and P is the number of real positives. Since four distant source domains (two image-based [28,29] and two text-based [30,31]) were selected to perform distant transfer learning to enhance the performance of the target model for ECG classification problems, four models were built, one for each of the target datasets [32–35]. An evaluation and an analysis were also conducted on how the number of distant source domains affects the enhancement of the target domain. Therefore, four scenarios were set up with a different number of source domains: one dataset, two datasets, three datasets, and four datasets. To avoid presenting the long list of 25 pairs of sensitivity and specificity (25 classes in Table 2), Tables 3-5 summarize the overall sensitivity and specificity.
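The three evaluation metrics described above follow directly from the confusion counts. The sketch below uses the same symbols (TP, TN, P, N); the function name `classification_metrics` and the counts are made up for illustration and are not results from the article.

```python
def classification_metrics(tp, tn, p, n):
    """Sensitivity, specificity, and accuracy from confusion counts.

    tp: true positives, tn: true negatives,
    p: number of real positives, n: number of real negatives.
    """
    if p <= 0 or n <= 0:
        raise ValueError("p and n must be positive")
    return {
        "sensitivity": tp / p,
        "specificity": tn / n,
        "accuracy": (tp + tn) / (p + n),
    }

# Hypothetical counts for an imbalanced binary problem.
metrics = classification_metrics(tp=80, tn=190, p=100, n=200)
gap = abs(metrics["sensitivity"] - metrics["specificity"])
```

A large gap between sensitivity and specificity, as in this made-up example, is the kind of signal used in the article to indicate classification biased towards the majority class.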
The key observations are as follows:
• Distant transfer learning via the GANAD-DFCNTA algorithm improved the performance (specificity, sensitivity, and accuracy) of the baseline ECG classification model. With more source datasets, the performance of the model can be enhanced further. It is worth noting that the saturation of model performance may be reached at some point, depending on the similarities between the source and target datasets.
• The percentage improvement of the specificity, sensitivity, and accuracy in PTB-XL [...]
• The deviations between the overall specificity and sensitivity with a varying number of datasets were 0.763% in PTB-XL, 0.613% in the MIT-BIH arrhythmia database, 0.842% in the European ST-T database, and 0.940% in the long-term ST database.
• To better investigate the individual classes of the highly imbalanced dataset [33], the overall deviations of the five classes with the highest imbalance ratios were 1.81% in Class 14, 1.45% in Class 13, 1.27% in Class 12, 1.13% in Class 11, and 1.03% in Class 10.
• As a remark, the baseline CNN algorithm serves as a common architecture that was adopted in many existing works. The main theme is the distant transfer learning process between distant multiple-source domains and target domains.

One Auxiliary Domain
To reveal the benefits of two auxiliary domains, formulations with only one auxiliary domain were evaluated. The two scenarios were (i) a GAN-based auxiliary domain based on the source domain; and (ii) a GAN-based auxiliary domain based on the target domain. Similar to the settings of Section 3.2.1 with two auxiliary domains, Tables 4 and 5 summarize the performance of the best model for the multiple datasets (from 1 to 4) using specificity, sensitivity, and accuracy for scenario (i) and scenario (ii), respectively. In both scenarios, the performance of the target models was enhanced as the number of datasets increased. Compared with the proposed GANAD-DFCNTA algorithm with two auxiliary domains, the formulation with one GAN-based auxiliary domain based on the source domain is less accurate, and the formulation with one GAN-based auxiliary domain based on the target domain is the least accurate. This revealed that both auxiliary domains were important to bridge the gap between domains using distant transfer learning.

Performance Comparison between GANAD-DFCNTA Algorithm and Existing Works
Table 6 compares the performance of the GANAD-DFCNTA algorithm with that of existing works. For a fair comparison, only target domains that matched the four benchmark datasets in [32–35] were included.
The discussion of the comparison is presented item by item:
• Method: The basic architecture for ECG classification was typically the CNN, except for [14], which used XGBoost. The CNN is a useful architecture that can automatically extract features and serve as a classifier.
• Source domain: In related works, the source domain was similar to the target domain in the field of ECG datasets. To the best of our knowledge, this work is the first to consider distant transfer learning for ECG classification with multiple distant source domains and target domains.
• Target domain: In related works, the MIT-BIH arrhythmia database [11–13,16] and the long-term ST database [14] were considered as the benchmark datasets in the target domain. Our work included two more benchmark datasets, the European ST-T database and the long-term ST database, for analysis.
• Ablation study: An ablation study was omitted in some works [11,12,14]. Our work and [13,16] included an ablation study to analyze the effectiveness of the individual components of the algorithm where multiple techniques were incorporated.
• Sensitivity and specificity: The difference between sensitivity and specificity was 3.22% for [11], which suggests a slightly biased classification towards the majority class. The differences in our work ranged from 0.621 to 0.928%. Other works [12–14,16] did not report the sensitivity and specificity.
• Accuracy: Our work outperformed the existing works [11,13,14,16] by 0.303–5.19%. Compared with [12], our work enhanced the accuracy by 0.303–2.47% in 8 out of 9 scenarios.
To further analyze the effectiveness of the deep-learning-based algorithm, it was compared with traditional machine learning algorithms. Table 7 summarizes the results. It can be seen from the results that our work outperformed existing works with traditional machine learning in all the target ECG classification models. The accuracy of our GANAD-DFCNTA algorithm outperformed existing works by 36.3% in PTB-XL, 6.88% in the MIT-BIH arrhythmia database, 2.54-2.98% in the European ST-T database, and 2.96-8.94% in the long-term ST database. Therefore, it is worth formulating the ECG classification problem with deep learning, even though traditional machine learning has less model complexity.

Ablation Studies
Ablation studies of the GANAD-DFCNTA algorithm were carried out to evaluate the effectiveness of the components. Four algorithms, namely DFCNTA, GANAD-FCNTA, GANAD-DCNTA, and GANAD-DFNTA, were considered.
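The relationship between the full algorithm and the four ablated variants can be summarized in a small configuration sketch. The variant names come from the text; the class name `Variant`, the field names, and the helper `removed_components` are hypothetical labels introduced only for illustration and do not reflect the authors' implementation.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Variant:
    name: str
    gan_auxiliary_domain: bool  # GANAD component
    domain_level: bool          # "D" in DFCNTA
    feature_level: bool         # "F" in DFCNTA
    classifier_level: bool      # "C" in DFCNTA


FULL = Variant("GANAD-DFCNTA", True, True, True, True)

# Each ablated variant switches off exactly one component of the full algorithm.
ABLATIONS = [
    Variant("DFCNTA", False, True, True, True),
    Variant("GANAD-FCNTA", True, False, True, True),
    Variant("GANAD-DCNTA", True, True, False, True),
    Variant("GANAD-DFNTA", True, True, True, False),
]


def removed_components(variant):
    """List the components enabled in FULL but disabled in the given variant."""
    return [
        f.name
        for f in fields(Variant)
        if f.name != "name"
        and getattr(FULL, f.name)
        and not getattr(variant, f.name)
    ]


for v in ABLATIONS:
    print(v.name, "removes", removed_components(v))
```

Comparing each variant against the full algorithm then isolates the contribution of the single component that was removed.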

DFCNTA
To evaluate the effectiveness of the generative-adversarial-network-based auxiliary domains algorithm, we considered the DFCNTA algorithm for ECG classifications in the four target domains. Table 8 presents the performance of the DFCNTA algorithm.
The key observations were drawn as follows:
• The percentage improvement of the specificity, sensitivity, and accuracy in PTB-XL was: 0.323, 0.213, and 0.276% for one dataset; 0.647, 0.640, and 0.643% for two datasets;
To further investigate the effectiveness of the generative-adversarial-network-based auxiliary domains algorithm, the performance of Classes 10-14 in the MIT-BIH arrhythmia database [33] with and without the algorithm is summarized in Table 9. The key observations were as follows:
• The percentage improvement of the accuracy in Class 10 was: 29
The improvement was due to the generation of additional training data, which enhanced the model. In particular, the enhancement was more significant in the minority classes (for example, Classes 10-14 in Table 9). This is consistent with the tendency of machine learning classifiers to be biased towards the majority classes.
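A per-class comparison of this kind can be sketched as follows. The sketch assumes that per-class accuracy means the share of samples of a class that are predicted correctly, which the text does not define explicitly; the function `per_class_accuracy` and the toy labels are hypothetical and do not reproduce any value from Table 9.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Percentage of correctly predicted samples for each class in y_true."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: 100.0 * correct[c] / total[c] for c in sorted(total)}


# Toy labels: class 0 is the majority class, class 1 the minority class.
y_true = [0] * 8 + [1] * 2
without_aux = [0] * 8 + [0, 1]        # one minority sample is missed
with_aux = [0] * 7 + [1] + [1, 1]     # minority class fully recovered

before = per_class_accuracy(y_true, without_aux)
after = per_class_accuracy(y_true, with_aux)

# Change in percentage points per class (illustrative only).
change = {c: after[c] - before[c] for c in before}
print(before, after, change)
```

In this toy case the minority class gains far more than the majority class loses, which is the kind of pattern the per-class breakdown is meant to expose.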

GANAD-FCNTA
To evaluate the effectiveness of the domain level in the domain-feature-classifier negative-transfer-avoidance algorithm, we considered the generative-adversarial-network-based auxiliary domains with the feature-classifier negative-transfer-avoidance (GANAD-FCNTA) algorithm for ECG classifications in the four target domains. Table 10 presents its performance. The findings revealed that the domain-level-based negative-transfer-avoidance algorithm is important to the enhancement of the accuracy of the target model. In particular, the dissimilarity between the source and target domains in distant transfer learning is high, which requires the incorporation of domain-level information in the algorithm.

GANAD-DCNTA
To evaluate the effectiveness of the feature level in the domain-feature-classifier negative-transfer-avoidance algorithm, we considered the generative-adversarial-network-based auxiliary domains with the domain-classifier negative-transfer-avoidance (GANAD-DCNTA) algorithm for ECG classifications in the four target domains. Table 11 presents its performance.

GANAD-DFNTA
To evaluate the effectiveness of the classifier level in the domain-feature-classifier negative-transfer-avoidance algorithm, we considered the generative-adversarial-network-based auxiliary domains with the domain-feature negative-transfer-avoidance (GANAD-DFNTA) algorithm for ECG classifications in the four target domains. Table 12 presents its performance. The key observations were drawn as follows:
• The percentage improvement of the specificity, sensitivity, and accuracy in PTB-XL was: 0.431, 0.320, and 0.375% for one dataset; 0.862, 0.747, and 0.785% for two datasets; 1. 19
The findings revealed that the classifier-level-based negative-transfer-avoidance algorithm enhanced the accuracy of the target model. Fine-tuning the hyperparameters of the classifiers in the target model was crucial to ensure a positive transfer.

Conclusions
Distant transfer learning has received attention in recent years because it relaxes the requirement of high similarity between the source and target domains. Owing to the fact that distant transfer learning has not yet been studied in ECG classification problems, this paper investigated the applicability of distant transfer learning for ECG classifications. A generative-adversarial-network-based auxiliary domain with the domain-feature-classifier negative-transfer-avoidance algorithm was proposed. Four benchmark distant-domain datasets were selected as source datasets, and four benchmark ECG datasets were selected as target datasets. A performance evaluation of the proposed algorithm showed that the accuracy improvement was 3.67 to 4.89% using four source datasets. Compared with existing works using traditional transfer learning, our work enhanced the accuracy of the ECG classification by 0.303-5.19%. Ablation studies on the generative-adversarial-network-based auxiliary domains algorithm with the domain-feature-classifier negative-transfer-avoidance algorithm also confirmed the effectiveness of the components.
As this is the first work to study distant transfer learning in ECG classifications with auxiliary domains, several future research directions are suggested: (i) investigating an algorithm for the selection of appropriate distant source domains; (ii) investigating an algorithm for the minimization of the number of distant source domains; (iii) merging similar and distant source domains to further enhance the performance of the target model; (iv) studying other baseline classification models; (v) investigating the performance of ECG classification models using varying settings such as personalized, interpatient, and intrapatient ECG classifications; (vi) enhancing the images with an image enhancement algorithm [48]; and (vii) proposing new variants of the generative adversarial network for data generation [49].