Resource-Aware ECG Classification with Heterogeneous Models in Federated Learning

Islam, Mohammad Munzurul; Alawad, Mohammed

doi:10.3390/fi17030130

by Mohammad Munzurul Islam and Mohammed Alawad *

Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI 48202, USA

* Author to whom correspondence should be addressed.

Future Internet 2025, 17(3), 130; https://doi.org/10.3390/fi17030130

Submission received: 28 January 2025 / Revised: 13 March 2025 / Accepted: 17 March 2025 / Published: 19 March 2025

(This article belongs to the Special Issue Distributed Machine Learning and Federated Edge Computing for IoT)

Abstract

In real-world scenarios, ECG data are collected from a diverse range of heterogeneous devices, including high-end medical equipment and consumer-grade wearable devices, each with varying computational capabilities and constraints. This heterogeneity presents significant challenges in developing a highly accurate deep learning (DL) global model for ECG classification, as traditional centralized approaches struggle to address privacy concerns, scalability issues, and model inconsistencies arising from diverse device characteristics. Federated Learning (FL) has emerged as a promising solution by enabling collaborative model training without sharing raw data, thus preserving privacy and security. However, standard FL assumes uniform device capabilities and model architectures, which is impractical given the varied nature of ECG data collection devices. Although heterogeneity has been explored in other domains, its impact on ECG classification and the classification of similar time series physiological signals remains underexplored. In this study, we adopted HeteroFL, a technique that enables model heterogeneity to reflect real-world resource constraints. By allowing local models to vary in complexity while aggregating their updates, HeteroFL accommodates the computational diversity of different devices. This study evaluated the applicability of HeteroFL for ECG classification using the MIT-BIH Arrhythmia dataset, identifying both its strengths and limitations. Our findings establish a foundation for future research on improving FL strategies for heterogeneous medical data, highlighting areas for further optimization and adaptation in real-world deployments.

Keywords: federated learning; resource-aware computing; ECG classification; model heterogeneity

1. Introduction

Electrocardiography (ECG) records contain critical information about cardiac conditions by capturing the heart’s electrical activity [1]. With the growing availability of wearable devices and remote health monitoring systems, the demand for automated ECG analysis has increased significantly. Artificial Intelligence (AI) has shown great potential in analyzing ECG data, enabling the accurate detection and diagnosis of various cardiac abnormalities [2]. However, analyzing these signals is complex, as it involves handling high-dimensional, noisy, and irregularly sampled data that vary across individuals and devices [3]. Additionally, robust model training requires large, diverse datasets collected from devices with varying capabilities. Traditional centralized AI systems struggle with privacy concerns and security risks, while decentralized approaches face challenges in achieving consistent model performance across diverse devices and data sources, as limited collaboration hinders generalization [4,5,6,7]. These limitations motivate techniques that preserve privacy while still learning from data spread across heterogeneous devices.

In 2016, Google researchers introduced Federated Learning (FL) [8] as a privacy-preserving solution to train models directly on edge devices, addressing challenges in data privacy and scalability. FL aggregates updates from local devices iteratively until the global model converges, enabling collaborative model training without sharing sensitive data. This approach is particularly suited for applications like ECG classification, where the data distribution is inherently non-identical across devices. By training models on local data, FL enhances data privacy and security while also improving the generalization of the global model by leveraging diverse patient data collected from different sources [9].

Traditional FL assumes that all devices collecting ECG data, such as wearables or smartphones, have similar computational capabilities and network conditions. Each device independently trains the same model using its local ECG data and then synchronously transmits the updated parameters to a central server. An aggregation algorithm, such as FedAvg [8,10], combines these updates on the server after collecting all local model updates. While this synchronous aggregation ensures uniformity, it may struggle with efficiency and scalability, especially in resource-constrained environments where devices have limited computational power, battery life, and connectivity. Consequently, slower devices can delay global updates, leading to inefficient training and a reduced model convergence speed [11].
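The synchronous aggregation step described above can be illustrated with a minimal sketch of FedAvg-style averaging. It assumes every client trains the same architecture and that updates are weighted by local sample counts; the function name `fedavg`, the layer names, and the toy values are hypothetical and are not taken from the paper's implementation.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count."""
    total = sum(client_sizes)
    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][j] * n for w, n in zip(client_weights, client_sizes)) / total
            for j in range(length)
        ]
    return averaged


# Two hypothetical clients that share one architecture (flattened parameters).
clients = [
    {"conv1": [1.0, 2.0], "fc": [0.0]},
    {"conv1": [3.0, 4.0], "fc": [1.0]},
]
sizes = [100, 300]  # hypothetical local sample counts

global_weights = fedavg(clients, sizes)
print(global_weights)
```

Because the server waits for every selected client before averaging, the slowest device determines the duration of each round, which is the bottleneck discussed above.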

To overcome these limitations, several approaches, such as HeteroFL [12] and federated dropout [13], have been proposed to address the challenges posed by heterogeneous devices with varying computational power, memory, and network conditions, making them more suitable for real-world FL environments. Specifically, devices with different computational capacities train models of varying complexities, sending updates to the server asynchronously upon completing local training. This allows for the immediate integration of updates and reduces the impact of slower devices. For instance, high-end devices may train a full model, while resource-constrained devices might train a pruned version with fewer parameters. The server then applies techniques like layer-wise [12,14] or hierarchical aggregation [15] to reconcile and merge updates from heterogeneous models. This approach ensures effective global model updates despite diverse device capabilities, providing a robust and scalable solution in real-world deployments where data are collected from a wide variety of devices.

Despite the advancements in FL for heterogeneous devices, existing ECG classification methods lack efficient adaptation to device heterogeneity while maintaining model accuracy and scalability. Most prior approaches have assumed homogeneous client capabilities or struggled with optimizing resource-constrained participation, limiting the robustness of FL in real-world ECG applications [16,17]. To bridge this gap, this research explored heterogeneous FL techniques specifically tailored for ECG classification. We implemented and extended the HeteroFL framework to optimize model performance across devices with different computational capacities. Specifically, we utilized the MIT-BIH Arrhythmia ECG dataset [18] and integrated the model slimming technique from HeteroFL [12]. This technique adjusts the neural network’s width, enabling resource-limited devices to participate effectively in training without compromising the global model’s accuracy. By adapting the model width to each device’s capabilities, our approach ensures that even low-power devices contribute meaningfully to the FL process, enhancing scalability, efficiency, and generalization in ECG classification.

The main contributions of this paper are as follows:

Extending HeteroFL to ECG classification by implementing model slimming, enabling efficient training across heterogeneous devices.
Demonstrating the effectiveness of heterogeneous FL for ECG classification using the MIT-BIH Arrhythmia dataset.
Analyzing the effects of device heterogeneity on model convergence and generalization to better understand its implications for FL in ECG classification.

The rest of this paper is structured as follows: Section 2 discusses existing ECG classification methods, ranging from centralized to decentralized approaches. Section 3 describes the methodology and materials used for the experiments. Section 4 outlines the experimental settings, including dataset preprocessing strategies, model configurations, and performance evaluation metrics. Section 5 presents the results and analysis of different heterogeneous configurations. Finally, Section 6 concludes the paper with key findings and future research directions.

2. Related Work

Machine learning (ML) techniques have been widely applied to ECG classification, with methods such as the Support Vector Machine (SVM) [19] and Least Square Twin SVM with k-Nearest Neighbors (kNNs) [20] used for arrhythmia detection. Early ML-based methods also included techniques like fuzzy C-means clustering, the Mahalanobis distance, and abstract feature extraction [21], as well as hybrid models such as the Long Short-Term Memory–Support Vector Machine (LSTM-SVM) [22]. While these approaches provided initial success, they struggled with accurately interpreting ECG features and modeling the complex interrelationships in high-dimensional data, limiting their overall effectiveness.

To overcome the limitations of ML classifiers, researchers adopted deep learning (DL) models, particularly Convolutional Neural Networks (CNNs). Notable work included CNN-based heartbeat detection [23], optimized CNN architectures using bat–rider algorithms [24], and deep multiscale fusion models that integrate diverse convolutional kernels [25]. Hybrid architectures, such as CNN-LSTM models [26] and subject-adaptive learning frameworks [27], have further improved ECG classification accuracy. However, these approaches require substantial computational resources and large-scale labeled datasets, making them unsuitable for deployment on real-time, resource-constrained wearable devices.

With the increasing need for decentralized, privacy-preserving solutions, FL has emerged as a promising approach for ECG classification. FL enables large-scale model training across multiple clients while ensuring that sensitive patient data remain on local devices. Since 2021, numerous FL strategies have been proposed to address the challenges of ECG data distribution and privacy. Asynchronous Federated Learning (Async-FL) [28] was introduced to reduce the communication bandwidth and improve model convergence by allowing for independent client updates, effectively handling non-IID (non-independent and identically distributed) ECG data and enhancing the arrhythmia detection efficiency. Federated Cluster (FedCluster) [29] improved classification accuracy by clustering clients based on similar data distributions, mitigating non-IID effects and benefiting devices with skewed datasets. Another study [30] introduced an FL-based healthcare framework integrating explainable AI and deep CNNs while employing communication cost reduction techniques to enhance privacy. More recent advancements, such as FedECG [31], leveraged ResNet-9 with labeled ECG data, demonstrating the feasibility of semi-supervised learning in federated settings. A comparative study [32] analyzed the performance of centralized and federated models on the MIT-BIH Arrhythmia dataset, confirming that FL can maintain high accuracy even in real-world decentralized scenarios. While these studies significantly improved FL-based ECG classification, they primarily focused on model accuracy and privacy without addressing the critical issue of device heterogeneity. In real-world FL deployments, client devices have vastly different computational capacities, limiting the efficiency and scalability of existing solutions.

To tackle device heterogeneity in FL, various strategies have been proposed in other domains, primarily focusing on image and text datasets. HeteroFL [12] mitigates variability in client resources by employing output channel pruning and layer-wise aggregation, ensuring that devices with different computing capacities can still participate effectively. Split-Mix [33] and adaptive FL [34] optimize training by dynamically adjusting model partitions based on the client–server configuration. Other approaches, such as federated edge learning [35] and low-rank model fusion in FL in Heterogeneous Models (FEDHM) [36], enhance FL performance in distributed environments. Despite these advancements, most of these techniques have been tested in controlled settings on image and text datasets, leaving a significant gap in addressing heterogeneity in FL for real-world biomedical data, such as ECG signals. Given the computational constraints of wearable and mobile ECG devices, there remains an urgent need for resource-aware FL approaches that ensure the effective participation of low-power devices without compromising model performance.

Our research builds on these advancements by addressing device heterogeneity in FL-based ECG classification. We leveraged HeteroFL’s model slimming technique, which adjusts the neural network width to accommodate resource-limited devices. Unlike existing FL-based ECG studies, which primarily focus on data privacy and accuracy, our work explicitly tackled the challenge of deploying FL in a heterogeneous device ecosystem. By integrating model slimming, we enabled efficient, scalable, and resource-adaptive ECG classification, ensuring that even low-power devices could meaningfully contribute to the global model while maintaining robust performance.

3. Materials and Methods

This section presents an overview of the MIT-BIH Arrhythmia dataset, the CNN model used as the ECG classifier, and the HeteroFL framework, which enables FL in heterogeneous device settings.

3.1. Dataset

The purpose of this study was to explore ECG classification in a heterogeneous FL environment. While single-lead ECG classification has been extensively studied, our work focused on a more complex scenario that integrates signals from multiple ECG sources with varying configurations. This study specifically addressed the challenges posed by device heterogeneity, signal variability, and privacy constraints in a distributed learning setting. By leveraging FL, we aimed to enable effective model training across diverse ECG devices while ensuring data security and robustness in real-world deployments.

The MIT-BIH Arrhythmia dataset is a benchmark dataset in cardiology and biomedical signal processing, widely used for the development and assessment of arrhythmia detection algorithms. Released in 1980, it has since become an essential resource for ECG research [18]. It contains 48 two-lead Holter recordings from 47 patients, each 30 min long. The recordings are digitized at 360 Hz, which provides the temporal resolution needed for accurate beat detection. Each sample was captured with an 11-bit resolution across a 10 mV range, offering precise heart activity measurements. The recordings are identified by record numbers 100–109, 111–119, 121–124, 200–210, 212–215, 217, 219–223, and 228–231. Two or more cardiologists independently annotated each recording, identifying key events such as heartbeats and arrhythmias. Disagreements between the cardiologists were resolved through collaboration, resulting in computer-readable reference annotations. This process led to the annotation of approximately 110,000 heartbeats, making the dataset a valuable tool for arrhythmia research and the development of diagnostic algorithms [18]. However, to maintain the quality of the dataset, certain records were excluded. Specifically, records 102, 104, 107, and 217 were removed due to insufficient signal quality, making them unreliable for cardiac diagnosis [37]. Consequently, the number of heartbeats in the dataset available for classification was reduced to 99,188.
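As a quick worked check of the acquisition figures quoted above, the following sketch derives the number of samples per channel in one recording, the quantization step, and the total recording time. It uses only the values stated in the text (360 Hz, 30 min, 11 bits, 10 mV, 48 recordings); the variable names are illustrative.

```python
SAMPLING_RATE_HZ = 360   # samples per second per channel
RECORD_MINUTES = 30      # duration of each recording
ADC_BITS = 11            # resolution of each sample
ADC_RANGE_MV = 10.0      # full-scale input range
NUM_RECORDS = 48

# Samples per channel in one 30 min recording.
samples_per_record = SAMPLING_RATE_HZ * RECORD_MINUTES * 60

# Number of distinct amplitude levels and the size of one step in microvolts.
quantization_levels = 2 ** ADC_BITS
step_microvolts = ADC_RANGE_MV * 1000 / quantization_levels

# Total recording time across all records, in hours.
total_hours = NUM_RECORDS * RECORD_MINUTES / 60

print(samples_per_record, quantization_levels, step_microvolts, total_hours)
```

Each recording therefore holds 648,000 samples per channel, with an amplitude step of roughly 4.9 µV.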

The dataset includes five distinct classes of beats: normal beats (N), atrial premature beats (A), premature ventricular contraction beats (V), left bundle branch block beats (L), and right bundle branch block beats (R). ECG signal classification relies on analyzing the morphological features of the detected P-, Q-, R-, S-, and T- waves, as represented in Figure 1. Normal beats (N) display consistent P-wave activity, along with well-defined QRS and T-wave structures. Atrial contractions (A) are characterized by irregular P-waves due to premature atrial depolarization. Ventricular contractions (V) typically lack a preceding P-wave and feature widened, abnormal QRS complexes. Left bundle branch blocks (L) present broad QRS complexes with notched R-waves in specific leads, whereas right bundle branch blocks (R) are marked by prolonged QRS durations and distinctive secondary R-waves.

By examining these waveform characteristics within ECG cycles, the classification system categorizes each heartbeat based on its deviation from normal patterns [37,38].

The MIT-BIH Arrhythmia dataset is heavily imbalanced, as shown in Table 1, with the majority of beats classified as normal beats (N), while the other classes appear much less frequently. This imbalance presents a challenge for ML models, as the underrepresentation of minority classes can lead to biased predictions where the model favors the dominant normal class, reducing the overall classification performance for arrhythmic beats.

3.2. ECG Classification Model

In this study, we utilized a 1D CNN ECG classification model [39], shown in Figure 2, as the classifier in our heterogeneous FL framework. CNNs, adapted with 1D convolutional layers for ECG time series data, are highly effective in extracting features such as peaks and valleys, achieving high accuracy in arrhythmia classification. In this model, each ECG segment is treated as a 1D signal with a length of 187, captured from raw ECG data. The CNN has three 1D convolutional layers that apply a set of learnable filters sliding across the data to extract local features, allowing the network to identify significant characteristics like peaks and troughs, which help differentiate between various heartbeats or arrhythmias. The first two convolutional layers are followed by a max pooling layer, which selects the maximum value within a specified window from the input feature map, enhancing feature selection by reducing dimensionality while retaining important information. The last convolutional layer is followed by an average pooling layer, which takes the average of the values within the window, providing a more generalized representation of the features. To learn complex patterns in the data, the ReLU activation function is applied to all convolutional layers, introducing non-linearity to improve model performance. Finally, a fully connected layer maps the extracted features to a probability distribution over the number of heartbeat classes, allowing the model to classify each ECG segment accordingly. Together, these layers enable the CNN to efficiently extract meaningful information from the ECG signal, reduce noise, and enhance its ability to classify different types of heartbeats accurately.
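To make the layer operations described above concrete, the following pure-Python sketch applies a 1D convolution, ReLU, max pooling, and average pooling to a toy signal. It is not the model in Figure 2: the kernel values, window sizes, and toy beat are hypothetical, and a practical implementation would use a deep learning framework with learned multi-channel filters.

```python
from typing import List


def conv1d(signal: List[float], kernel: List[float]) -> List[float]:
    """Valid (unpadded) 1D cross-correlation, the core operation of a conv layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values: List[float]) -> List[float]:
    """Element-wise non-linearity that zeroes negative responses."""
    return [max(0.0, v) for v in values]


def max_pool(values: List[float], window: int) -> List[float]:
    """Keep the largest response in each non-overlapping window."""
    return [max(values[i:i + window])
            for i in range(0, len(values) - window + 1, window)]


def avg_pool(values: List[float], window: int) -> List[float]:
    """Average the responses in each non-overlapping window."""
    return [sum(values[i:i + window]) / window
            for i in range(0, len(values) - window + 1, window)]


# Hypothetical 10-sample beat with a sharp peak at index 4.
toy_beat = [0.0, 0.1, 0.0, -0.2, 1.0, -0.3, 0.0, 0.2, 0.1, 0.0]
peak_kernel = [-1.0, 2.0, -1.0]  # hypothetical filter that responds to narrow peaks

features = relu(conv1d(toy_beat, peak_kernel))
pooled = max_pool(features, 2)
summary = avg_pool(pooled, 2)
print(len(features), len(pooled), len(summary))

# With the same unpadded 3-tap kernel, a 187-sample segment yields 185 outputs.
segment_features = conv1d([0.0] * 187, peak_kernel)
print(len(segment_features))
```

The strongest response in `features` sits at the position of the peak, which is the kind of local feature the convolutional layers are described as extracting.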

3.3. HeteroFL Framework for ECG Classification

The heterogeneous FL framework “HeteroFL” [12] is designed to enable heterogeneous ECG clients with varying computational power to train models of different capacities. To emulate diverse resource-aware ECG clients, each client is assigned a computational complexity level, p, and the output channels of the global model are shrunk according to that level. Figure 3 illustrates the distribution of five local clients connected to the global model, $W_{g}^{0}$, based on three computational complexity levels. The framework consists of a global model on the server connected to five clients, each assigned one of the three complexity levels, $L^{1}$, $L^{2}$, and $L^{3}$. $L^{1}$ retains all the parameters of the global model, while $L^{2}$ and $L^{3}$ retain 75% and 50% of the output channels of the global model, respectively. After the local models are created, the server assigns them to the clients. Each client then trains its assigned model on its local data. ECG clients with higher computational power train larger model instances, while clients with lower computational power receive shrunk versions of the model. After each training round, clients transmit their updated model parameters (instead of raw data) to the central server, where an aggregation mechanism averages these updates and merges the result into the global model. This iterative process ensures that heterogeneous ECG clients contribute effectively to the training process while benefiting from a collectively refined global model.
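The channel shrinking described above can be sketched as follows. The sketch assumes that a slimmed model keeps the leading output channels of each layer, which is the convention used by HeteroFL [12]; the helper `slim_layer`, the 8 × 4 toy layer, and the level names as dictionary keys are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]  # rows = output channels, columns = input channels


def slim_layer(weight: Matrix, out_ratio: float, in_ratio: float = 1.0) -> Matrix:
    """Keep the leading output rows and input columns of a layer's weights."""
    out_keep = max(1, int(len(weight) * out_ratio))
    in_keep = max(1, int(len(weight[0]) * in_ratio))
    return [row[:in_keep] for row in weight[:out_keep]]


# Hypothetical global layer with 8 output channels and 4 input channels.
global_layer: Matrix = [[float(10 * o + i) for i in range(4)] for o in range(8)]

# Fractions of output channels retained at each complexity level (from the text).
level_ratios: Dict[str, float] = {"L1": 1.0, "L2": 0.75, "L3": 0.5}

local_layers = {
    level: slim_layer(global_layer, ratio) for level, ratio in level_ratios.items()
}
for level, layer in local_layers.items():
    print(level, len(layer), "of", len(global_layer), "output channels")
```

In a full network, the input channels of each subsequent layer would be reduced by the same ratio so that consecutive layers remain compatible, which is what the `in_ratio` argument is for.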

The entire process is outlined in the following steps. First, the global model $W_{g}^{0}$ is initialized at the server, and width slimming is applied to compress the global model with predetermined compression ratios. The model is scaled using the channel shrinkage ratio r, reducing its input and output dimensions as follows:

$$d_{lp} = r^{p-1} d_{g}, \qquad k_{lp} = r^{p-1} k_{g} \tag{1}$$

where $d_{lp}$ and $k_{lp}$ are the output and input channels at complexity level p, $d_{g}$ and $k_{g}$ are the corresponding global model dimensions, and r is the shrinkage ratio.

The total parameter count of each local model is as follows:

$$|W_{l}^{p}| = r^{2(p-1)} |W_{g}^{0}| \tag{2}$$

where $|\cdot|$ denotes the number of parameters. Thus, both the input and output channels scale by $r^{p-1}$, and the total number of parameters scales quadratically, by $r^{2(p-1)}$, ensuring a balanced trade-off between efficiency and model expressiveness.
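A short numerical sketch of Equations (1) and (2) follows. It assumes a shrinkage ratio of r = 0.5 and a hypothetical global layer width of 64 channels; neither value is stated for this layer in the text, and the function names are illustrative.

```python
def local_channels(global_channels: int, r: float, p: int) -> int:
    """Equation (1): channels kept at complexity level p."""
    return max(1, int(global_channels * r ** (p - 1)))


def parameter_fraction(r: float, p: int) -> float:
    """Equation (2): share of the global parameters kept at level p."""
    return r ** (2 * (p - 1))


R = 0.5               # assumed shrinkage ratio
GLOBAL_CHANNELS = 64  # hypothetical layer width

for p in range(1, 6):
    print(p, local_channels(GLOBAL_CHANNELS, R, p), parameter_fraction(R, p))
```

Halving the width at each level therefore quarters the parameter count. The exact counts of a real network can deviate slightly from this ratio, since the input of the first layer and the output of the last layer are fixed by the data and the number of classes, and bias terms scale only linearly with width.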

Each client is assigned a model variant based on its available computing resources. Clients with higher computational power receive a larger model instance, while resource-constrained clients are assigned a compressed version with a reduced number of parameters. This ensures that all clients can participate in training while maintaining efficiency based on their capabilities. In each communication round, t, a subset of active clients, m, is selected from the local client pool M to participate in training. Each client trains its assigned model variant using local ECG data through batch-based optimization to minimize a loss function specific to ECG classification. After training, the updated model parameters are transmitted to the central server. The server aggregates the local model updates, ensuring that each subnetwork’s updates are properly aligned with the global model structure. The aggregation process follows a weighted averaging approach to effectively integrate the contributions from different clients.

$$W_{g}^{l} = \frac{1}{m} \sum_{i=1}^{m} W_{l}^{p} \tag{3}$$

where $W_{g}^{l}$ represents the global model weights for clients at complexity level p, and $W_{l}^{p}$ denotes the local updates from participating clients.
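The alignment of sub-network updates with the global model can be sketched as an element-wise average in which each global parameter is averaged only over the clients whose model contains it, following the aggregation rule of HeteroFL [12]. For brevity, the sketch flattens the model to a single parameter vector and assumes that smaller models hold its leading entries; the function name and toy values are hypothetical.

```python
from typing import List


def aggregate_heterogeneous(previous: List[float],
                            client_updates: List[List[float]]) -> List[float]:
    """Average each global parameter over the clients that actually hold it."""
    sums = [0.0] * len(previous)
    counts = [0] * len(previous)
    for update in client_updates:
        for j, value in enumerate(update):
            sums[j] += value
            counts[j] += 1
    # Parameters that no active client holds keep their previous value.
    return [
        sums[j] / counts[j] if counts[j] else previous[j]
        for j in range(len(previous))
    ]


previous_global = [0.0, 0.0, 0.0, 0.0]
updates = [
    [1.0, 1.0, 1.0, 1.0],  # full model
    [3.0, 3.0],            # half-width model
    [5.0],                 # quarter-width model
]
new_global = aggregate_heterogeneous(previous_global, updates)
print(new_global)
```

The leading parameters, shared by every client, receive contributions from all three updates, whereas the trailing parameters are shaped only by the largest model.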

The updated global model is distributed back to clients for further training. The learning rate $\eta$ and computational settings are adjusted to optimize future training rounds. This iterative process continues until the global model converges. After each round t, the global model is updated as follows:

$$W_{g}^{t} = A \sum_{i=1}^{t} W_{g}^{l} \tag{4}$$

where $W_{g}^{t}$ is the updated global model after each round of local model training and the aggregation of local weights from clients with varying computational complexity levels, and A is the aggregating function.

HeteroFL ensures that local models remain stable by maintaining a shared subnetwork structure across all complexity levels. Even low-resource clients contribute meaningfully, as their trained parameters align with the global model’s structure. This heterogeneous model strategy ensures that ECG classification benefits from FL without excluding lower-capacity devices.

4. Experimental Settings

Our experiments were conducted on a computing system with the specifications listed in Table 2.

4.1. Dataset Preprocessing

Efficient data preprocessing is paramount for achieving optimal results in any classification task in ML and DL. Since our FL approach is based on DL, ensuring data quality is fundamental. Consequently, we focused on improving the dataset preprocessing efficacy, specifically addressing the significant imbalance in the dataset, where certain heartbeat types are underrepresented [18]. We extracted the ECG data for each record, focusing on the modified version of lead II (MLII), and labeled the corresponding heartbeats based on annotations provided in the dataset. This involved detecting R-peaks in the ECG signals and extracting adjacent segments for classification. Preprocessing started with cleaning input signals using a wavelet transform, followed by a series of steps including loading the data, marking key points, and labeling the corresponding heartbeats. The processed data consisted of segments with a length of 187: 64,533 for training, 9220 for validation, and 18,439 for testing, split in a 7:1:2 ratio. In this research, we divided the training data into multiple parts using the Independent and Identically Distributed (i.i.d.) setting. Specifically, the data were uniformly assigned to each client, ensuring that each client received an equal number of samples. This approach guaranteed that each client had a similar distribution of all five classes of heartbeats [18].
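The uniform i.i.d. assignment described above can be sketched as a shuffle followed by an even deal of sample indices to clients. The number of clients, the random seed, and the class counts below are hypothetical placeholders rather than the values used in the experiments.

```python
import random
from collections import Counter
from typing import Dict, List


def partition_iid(num_samples: int, num_clients: int, seed: int = 0) -> Dict[int, List[int]]:
    """Shuffle sample indices and deal them evenly to the clients."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    return {c: indices[c::num_clients] for c in range(num_clients)}


# Hypothetical imbalanced label list over the five beat classes.
labels = ["N"] * 700 + ["L"] * 100 + ["R"] * 100 + ["V"] * 70 + ["A"] * 30

shards = partition_iid(len(labels), num_clients=10)
for client, idx in shards.items():
    print(client, len(idx), Counter(labels[i] for i in idx).most_common())
```

Because the indices are shuffled before being dealt out, each client's class proportions approximate those of the full training set, although the rarest classes can still vary noticeably between clients when shards are small.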

4.2. Heterogeneous Model Configurations

Our experiments were conducted using the CNN model explained in Section 3 on the MIT-BIH Arrhythmia ECG dataset [18]. We used the HeteroFL configuration, applying slimming ratios to create the five model classes. For instance, model ‘a’ retained all the parameters, corresponding to a ratio of 1, while models ‘b’ through ‘e’ progressively decreased in model size with slimming ratios of 0.5, 0.25, 0.125, and 0.0625, respectively, compared to the full model [12]. In our experimental setup, we used two strategies, the two-model scenario and the multi-model scenario, to evaluate performance under heterogeneous conditions. In the two-model scenario, each client was assigned one of two possible model configurations. In the multi-model scenario, clients had more than two model configurations to choose from, such as combinations like ‘c-d-e’.
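The model classes and combinations described above can be expressed as a small configuration sketch. The slimming ratios are those given in the text, whereas the round-robin assignment of classes to clients is a hypothetical policy chosen only for illustration, since the text does not specify how clients are distributed across classes.

```python
from typing import Dict, List

# Slimming ratio of each model class relative to the full model (from the text).
SLIMMING_RATIOS: Dict[str, float] = {
    "a": 1.0, "b": 0.5, "c": 0.25, "d": 0.125, "e": 0.0625,
}


def assign_models(combination: str, num_clients: int) -> List[str]:
    """Assign model classes to clients in round-robin order (hypothetical policy)."""
    classes = combination.split("-")
    return [classes[i % len(classes)] for i in range(num_clients)]


def global_model_class(combination: str) -> str:
    """The largest class in a combination defines the global model."""
    return max(combination.split("-"), key=lambda c: SLIMMING_RATIOS[c])


assignment = assign_models("c-d-e", num_clients=6)
print(assignment, [SLIMMING_RATIOS[c] for c in assignment])
print(global_model_class("c-d-e"))
```

Under this reading, a two-model scenario such as ‘a-e’ and a multi-model scenario such as ‘c-d-e’ differ only in the combination string passed to `assign_models`.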

The hyperparameters of HeteroFL were fine-tuned specifically for the ECG classification task, and the detailed configurations are provided in Table 3.

4.3. Performance Metrics

To evaluate the effectiveness of our ECG classification model within the heterogeneous Federated Learning (HeteroFL) [12] framework, we used two primary performance metrics: the accuracy and F1 score. These metrics were crucial for assessing the model’s ability to correctly classify ECG signals, particularly in the context of imbalanced datasets, where certain classes may be underrepresented.

The accuracy is the proportion of correctly classified instances out of the total number of instances. It provides an overall measure of the model’s ability to correctly classify both the majority and minority classes. However, in the case of imbalanced datasets, the accuracy alone may not fully capture the model’s performance, as it can be biased towards the majority class. To address this limitation, we also report the F1 score, which is the harmonic mean of precision and recall.

$$\text{F1 score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}$$

where precision and recall are defined as

$$\text{Precision} = \frac{TP}{TP + FP} \quad \text{and} \quad \text{Recall} = \frac{TP}{TP + FN}$$

Here, $TP$ denotes the number of true positives, $FP$ denotes the number of false positives, and $FN$ denotes the number of false negatives.
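Equation (5) and the definitions of precision and recall translate directly into a few lines of code. The counts in the example are hypothetical, and the guards against division by zero are an added convention rather than part of the definitions above.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one class from its error counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for a single arrhythmia class.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

For these counts, the precision is 0.8, the recall is approximately 0.889, and the F1 score is approximately 0.842, which lies between the two and closer to the lower value, as expected of a harmonic mean.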

The F1 score is particularly useful when dealing with imbalanced classes, as it provides a more balanced measure of the model’s performance by considering both false positives and false negatives. The F1 score is especially important in medical applications like ECG classification, where misclassifying certain arrhythmia types could have serious implications [40,41].

Together, the accuracy and the F1 score provide a comprehensive view of the model’s performance, ensuring that it is both reliable and robust in handling imbalanced ECG data.
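The difference between the two metrics on imbalanced data can be illustrated with a degenerate classifier that always predicts the normal class. The label counts are hypothetical, and the macro-averaging of per-class F1 scores is an assumption, since the averaging scheme is not stated in the text.

```python
from typing import List


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced test set and a classifier that always predicts "N".
y_true = ["N"] * 90 + ["V"] * 6 + ["A"] * 4
y_pred = ["N"] * 100

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(round(acc, 3), round(mf1, 3))
```

The classifier reaches 90% accuracy while detecting no arrhythmic beats at all, and its macro F1 score of roughly 0.32 exposes the failure that accuracy alone conceals.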

5. Results

In Figure 4, we show the performance of two-model combinations under i.i.d. settings. As the number of parameters decreased in combinations such as ‘a-e’, ‘b-e’, and ‘c-e’, the global model’s accuracy did not maintain the same high levels as in models ‘a’, ‘b’, and ‘c’ alone, which achieved accuracies of 99%, 98%, and 98%, respectively. An exception was the ‘d-e’ combination, which reached 97.6%, matching the accuracy of model ‘d’. In these combinations, only the aggregated global models (‘a’, ‘b’, ‘c’, or ‘d’) were used during testing. Consequently, smaller models (‘e’) trained by weak learners could be tested with the larger models (‘a’, ‘b’, ‘c’, or ‘d’), achieving performance above 95%. This was still better than the global accuracy of the single model ‘e’, which was 94.8%.

Figure 4 also illustrates that larger models, such as ‘a’ and ‘b’, exhibited faster convergence due to their higher parameter count and greater representational capacity, which enabled them to learn patterns more efficiently during training. In contrast, smaller models like ‘e’, which have fewer parameters and lower complexity, converged more slowly as they required more iterations to achieve comparable performance. Additionally, in heterogeneous settings where model ‘e’ was combined with larger models, the global model’s convergence was slowed. This occurred because the smaller model introduced weaker updates into the aggregation process, reducing the overall pace of optimization. The disparity in the model capacities created imbalances in the contributions of local updates, causing the global model to require more rounds to stabilize and converge effectively.

In terms of the global model accuracy, homogeneous settings consistently achieved the highest accuracy, as all models shared the same architecture and capacity, allowing for the more uniform and efficient aggregation of updates. In contrast, heterogeneous settings tended to show lower global accuracy, as the variation in the model capacities and complexities led to less effective aggregation. Smaller models in heterogeneous settings may contribute less informative updates, resulting in a global model that cannot leverage the full potential of the larger models. Moreover, the accuracy drop became more pronounced as the difference in the model sizes increased. When the disparity between the sizes of the local models was large, the small models’ contributions became more detrimental to the global model’s performance, further reducing the overall accuracy. This highlights the challenge of balancing the model capacities in heterogeneous environments to maintain optimal global performance.

In Table 4, we present the results for the i.i.d. data for the multi-model combination performance. We hierarchically show the number of model parameters, as well as the global and local model accuracies, along with the F1 score for each model setting. As the parameter count decreased, the model performance was also reduced. The largest model combined with relatively smaller models, such as ‘a-b-c-d-e’, yielded a reasonable global model accuracy of 98%, compared to the homogeneous global accuracy of model ‘a’, which was 99%. In the ‘b’ class, the multi-model combination ‘b-c-d-e’ achieved an accuracy of 97%, which was a slight reduction from the largest model’s accuracy of 99%. In the ‘c-d-e’ combination, the model achieved a global accuracy of 97%, which was close to the performance of the largest model in this class, model ‘c’, with an accuracy of 98%. This combination also yielded a reasonable F1 score, which was calculated based on the final global model accuracy. In all cases, the smaller models ‘d’ and ‘e’ showed improved performance compared to their own model class performances, which were 97% and 94%, respectively.

Table 4 shows that the largest model ‘a’, with 267,464 parameters, achieved superior performance, with F1 scores of 0.9735 locally and 0.9987 globally, effectively handling the class imbalance. However, as the model size decreased, the F1 scores dropped significantly, especially in the global metrics, as seen in ‘a-e’ with 160,973 parameters, which only achieved a global F1 score of 0.8131. Interestingly, multi-model combinations like ‘a-b-c-d-e’, ‘b-c-d-e’, and ‘c-d-e’, with 71,326, 32,763, and 9320 parameters, respectively, maintained relatively strong local F1 scores of 0.9325, 0.9120, and 0.9330, indicating better local adaptation to the class imbalance. This improved performance in multi-model setups can be attributed to the diversity of the model capacities, where different models specialize in capturing distinct patterns across classes. By leveraging multiple models, the system can better preserve minority class features, mitigating the performance drop observed in smaller individual models. Small models, such as ‘e’ with 1118 parameters, struggled the most, with the global F1 score dropping to 0.8248, reflecting the difficulty of capturing minority class features. These results show how hard it is for smaller models to balance performance across the classes of an imbalanced dataset.
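To see why accuracy and F1 score can diverge on imbalanced data, consider the following minimal sketch. It uses the class counts from Table 1 and a hypothetical classifier that labels every beat as Normal. The macro-averaged F1 used here is an assumption, since the averaging scheme is not restated in this section, and the helper name `accuracy_and_macro_f1` is illustrative only.

```python
import numpy as np

# Class counts from Table 1, in the order N, A, V, L, R.
CLASS_COUNTS = {"N": 74_000, "A": 6_640, "V": 6_732, "L": 8_843, "R": 2_979}


def accuracy_and_macro_f1(confusion: np.ndarray):
    """Return (accuracy, macro F1) for a confusion matrix.

    Rows are true classes and columns are predicted classes.
    """
    true_pos = np.diag(confusion).astype(float)
    predicted = confusion.sum(axis=0).astype(float)  # TP + FP per class
    actual = confusion.sum(axis=1).astype(float)     # TP + FN per class
    denom = predicted + actual                       # 2*TP + FP + FN
    # Per-class F1 = 2*TP / (2*TP + FP + FN); set to 0 when undefined.
    f1 = np.divide(2 * true_pos, denom,
                   out=np.zeros_like(true_pos), where=denom > 0)
    accuracy = true_pos.sum() / confusion.sum()
    return float(accuracy), float(f1.mean())


# Hypothetical degenerate classifier: every beat is predicted as class N.
counts = np.array(list(CLASS_COUNTS.values()))
majority_only = np.zeros((5, 5), dtype=int)
majority_only[:, 0] = counts

acc, macro_f1 = accuracy_and_macro_f1(majority_only)
print(f"accuracy = {acc:.4f}, macro F1 = {macro_f1:.4f}")
```

Under these assumptions the accuracy stays near the share of Normal beats, while the macro F1 collapses because four of the five classes contribute zero. This is the kind of gap that makes the F1 score the more informative metric for the smaller models discussed above.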

In the two-model settings, pairing model ‘e’ with a progressively larger model decreased the global accuracy. In Table 4, the model combinations ‘d-e’, ‘c-e’, ‘b-e’, and ‘a-e’ had global accuracies of 97.56%, 95.61%, 93.99%, and 92.89%, respectively. This suggests that smaller combinations, like ‘d-e’, are more effective, likely due to their ability to balance the model size and representation diversity.

Taken together, these results emphasize the importance of selecting appropriately sized and complementary model combinations to achieve optimal performance in heterogeneous scenarios.

6. Discussions

Through extensive experimentation on highly imbalanced ECG data, we have demonstrated that HeteroFL effectively accommodates device heterogeneity in FL settings. Our results indicate that multi-model configurations enable a diverse range of ECG-enabled wearable devices to collaboratively train a global model, allowing both high-end and resource-constrained devices to contribute meaningfully. This is particularly significant as it suggests that FL can facilitate broad participation across devices with varying computational capacities, thereby enhancing accessibility and inclusivity in medical AI applications.

A key insight from our study is that the meticulous selection of model configurations plays a crucial role in optimizing performance. While larger models provide stronger global performance, smaller models still contribute meaningfully, and their inclusion can lead to a more balanced and robust learning process. However, imbalanced model contributions pose a challenge, as weaker devices with fewer parameters introduce updates that may slow convergence. Our findings suggest that strategic configurations—where smaller models are paired with appropriately sized larger models—can mitigate these effects and improve the overall performance.

Moreover, our analysis highlights that heterogeneous FL settings can achieve competitive accuracy while maintaining computational efficiency, making them a viable approach for real-world ECG classification. The results encourage further exploration into adaptive strategies that dynamically adjust model configurations based on the available resources and data characteristics, potentially leading to more efficient and scalable FL systems for physiological signal processing.

Despite its advantages, our study reveals two key limitations of HeteroFL in its current form. First, the current implementation of HeteroFL relies on a fixed number of model variants (i.e., five configurations: a, b, c, d, and e). This rigid structure limits flexibility in real-world deployments, where devices may have a more continuous spectrum of computational capabilities. A more dynamic approach that automatically adjusts the number and complexity of model variants could enhance adaptability and efficiency. Second, as the number of participating devices increases, the aggregation process becomes more complex, and disparities in model sizes can cause slower convergence and potential performance degradation. Efficient aggregation mechanisms that adaptively weigh contributions from heterogeneous models will be necessary to ensure that the system scales effectively while maintaining high performance across diverse device capabilities.

By addressing these limitations, future work can enhance the practicality of HeteroFL for large-scale, real-world ECG classification tasks, paving the way for more robust and adaptive FL frameworks in medical AI applications.

7. Conclusions

In this study, we explored the performance of heterogeneous model combinations for ECG classification in FL settings. Our findings demonstrate that while larger, homogeneous models achieve superior global accuracy and faster convergence, smaller models face significant challenges, particularly when a class imbalance is present. Despite this, smaller models can still contribute positively in heterogeneous settings when combined with larger models, as seen in the multi-model combinations. These combinations can balance model diversity and representation, improving local performance and reducing the global model’s convergence time.

Moreover, our results highlight the importance of carefully selecting model combinations that are appropriately sized and complementary. In scenarios where model sizes vary significantly, the disparity in local model capacities can slow down global model convergence and lead to a decrease in the overall performance. Therefore, optimizing model selection and ensuring diversity while maintaining a balance between the model size and complexity is crucial for enhancing the effectiveness of FL in resource-constrained environments.

Overall, this study provides valuable insights into the trade-offs between the model size, convergence time, and accuracy in heterogeneous FL scenarios, offering a roadmap for future research and development in the field of resource-aware ECG classification.

Author Contributions

Conceptualization, M.A.; Methodology, M.M.I. and M.A.; Software, M.M.I.; Validation, M.M.I.; Formal analysis, M.M.I. and M.A.; Writing—original draft, M.M.I.; Writing—review & editing, M.A.; Supervision, M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. A single ECG cycle from the MIT-BIH Arrhythmia dataset (record no. 102), showing the detected P-, QRS, and T-waves.

Figure 2. Architecture of the 1D CNN model used for ECG classification.

Figure 3. HeteroFL framework for ECG classification. This figure shows the global model $W_{g}^{0}$ on the central server, connected to five local ECG clients, each assigned one of three computational complexity levels: $L^{1}$ (full model), $L^{2}$ (75% of the model), and $L^{3}$ (50% of the model).

Figure 4. Global Test Accuracy per round for different heterogeneous local clients.

Table 1. Class distribution of heartbeats in the MIT-BIH Arrhythmia dataset, highlighting the imbalance among different beat types.

Class	Description	Samples
N	Normal Beat	74,000
A	Atrial Premature Beat	6640
V	Premature Ventricular Contraction Beat	6732
L	Left Bundle Branch Block Beat	8843
R	Right Bundle Branch Block Beat	2979

Table 2. Hardware infrastructure.

Component	Specification
Disk size	4.5 TB
Processor model	Intel Xeon Processor W-2295
Number of processors	1
Memory	16 GB
Operating system (Linux)	Ubuntu 22.04.4 LTS
GPU	Nvidia A4000 GPU, RTX 4090
Python version	3.9.13

Table 3. Experiment hyperparameters.

Category	Key	Value
Control	data_split_mode	i.i.d
	model_split_mode	two-model/multi-model
	fixed_control_norm_bn	1
	control_scale_1	1
	num_clients	10
	control_frac	0.1
Data	data_name	MITBIHECG
	test_ratio	0.2
	num_epochs	25
	batch_size	64
	lr	0.001
	num_classes	5
	num_rounds	20
	seed	42
	server_learning_rate	0.001
Optimizer	optimizer_name	Adam
	weight_decay	3.0 × 10⁻⁴
	momentum	0.9
Scheduler	Reduce Relu

Table 4. Performance analysis and average number of parameters of various computational complexity combinations in multi-model settings for MIT-BIH Arrhythmia ECG (i.i.d) dataset.

Model	Ratio	Parameters	Acc., Global (%)	Acc., Local (%)	F1 Score, Local	F1 Score, Global
a	1	267,464	99.28	99.21	0.9735	0.9987
a-e	0.5	160,973	92.89	96.17	0.7053	0.8131
a-b-c-d-e	0.2	71,326	98.02	97.86	0.9325	0.8874
b	1	66,868	99.07	99.03	0.9678	0.9930
b-e	0.5	34,053	93.99	93.89	0.9329	0.9317
b-c-d-e	0.4, 0.3, 0.2, 0.1	32,763	97.53	97.66	0.9120	0.8533
c	1	16,774	98.56	98.47	0.9487	0.9416
c-e	0.5	10,574	95.61	97.17	0.8158	0.8544
c-d-e	0.5, 0.1, 0.4	9320	97.93	97.84	0.9330	0.8850
d	1	4263	97.22	98.17	0.9391	0.8902
d-e	0.5	2750	97.56	96.36	0.80	0.8800
e	1	1118	94.13	93.99	0.9083	0.8248

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
