Article

Resource-Efficient Clustered Federated Learning Framework for Industry 4.0 Edge Devices

by Atallo Kassaw Takele * and Balázs Villányi
Department of Electronics Technology, Faculty of Electrical Engineering and Informatics, Budapest University of Technology and Economics, 1111 Budapest, Hungary
* Author to whom correspondence should be addressed.
Submission received: 3 December 2024 / Revised: 26 January 2025 / Accepted: 1 February 2025 / Published: 6 February 2025

Abstract

Industry 4.0 is an aggregate of recent technologies, including artificial intelligence, big data, edge computing, and the Internet of Things (IoT), that enhances efficiency and real-time decision-making. Industry 4.0 data analytics demands a privacy-focused approach, and federated learning offers a viable solution for such scenarios. It allows each edge device to train the model locally on its own collected data and to share only the model updates with the server, without the need to share the raw collected data. However, the communication and computational costs of sharing model updates, as well as model performance, are major bottlenecks for resource-constrained edge devices. This study introduces a representative-based parameter-sharing framework that aims to enhance the efficiency of federated learning in the Industry 4.0 environment. The framework begins with the server distributing an initial model to edge devices, which train it locally and send updated parameters back to the server for aggregation. To reduce communication and computational costs, the framework identifies groups of devices with similar parameter distributions and sends to the server only the update of the most resourceful and best-performing device in each group, termed the cluster head. A backup cluster head is also elected to ensure reliability. Clustering is performed on the devices' parameter distributions and data characteristics. Moreover, the server incorporates randomly selected past aggregated parameters into the current aggregation process through weighted averaging, where more recent parameters are given greater weight, to enhance model performance. A comparative experimental evaluation against the state of the art on a testbed dataset demonstrates promising results, minimizing computational cost while preserving prediction performance, which ultimately enhances data analytics on edge devices in industrial environments.

1. Introduction

Industry 4.0, also known as the Fourth Industrial Revolution, represents a paradigm shift in manufacturing and industrial processes through the integration of the latest digital technologies. This revolution brings together the Internet of Things (IoT), artificial intelligence, machine learning, robotics, cloud computing, edge computing, and big data analytics to create highly automated, intelligent, and interconnected systems [1,2]. Industry 4.0 utilizes these technologies to enable real-time monitoring, predictive maintenance, and self-optimization of processes, leading to greater efficiency, flexibility, and cost savings. These innovations enable machines and systems to communicate with one another, collect and analyze vast amounts of real-time data, make autonomous decisions, and adapt dynamically to changing conditions. This convergence of physical and digital technologies transforms traditional industries, reshaping supply chains, enhancing productivity, and fostering innovation across sectors like manufacturing, logistics, healthcare, and energy [1,3]. The result is a more efficient, flexible, and adaptive production environment, where downtime is minimized and productivity is maximized.
Artificial intelligence (AI) plays a significant role in Industry 4.0 by driving automation, optimization, and decision-making processes within smart factories and industrial systems [4,5]. AI technologies, such as machine learning, neural networks, and computer vision, enable machines to analyze large volumes of data generated by sensors, IoT devices, and other systems. This allows for real-time monitoring, predictive maintenance, and autonomous adjustments to production lines, which significantly improves efficiency and reduces downtime. AI also supports quality control by detecting defects, analyzing patterns, and even suggesting improvements without human intervention. Furthermore, AI-powered systems enhance supply chain management by predicting demand, optimizing inventory levels, and streamlining logistics. Tiny devices (such as IoT devices) are increasingly being utilized to perform machine learning tasks. Even though these devices often have limited computational resources and memory, they are being used in machine learning training through techniques like edge computing, federated learning, and model compression [6,7].
Federated learning (FL), the most widely adopted approach for distributed industrial systems, offers significant benefits by enhancing data privacy and reducing data transfer [8]. However, maintaining model consistency across devices with different computational capabilities is a challenge. To address this, asynchronous updates and federated averaging algorithms are used, allowing devices to contribute to the global model at their own pace [9]. One major challenge is data heterogeneity: data distributed across different devices and locations may vary in quality, format, and distribution, leading to difficulties in training consistent and accurate models [9,10,11]. Several methods have been proposed to address this challenge, including clustered and personalized federated learning techniques [12], which tailor models to specific local environments while still contributing to the global model. Another challenge is model computation and communication overhead, as devices must frequently exchange model updates, which can strain network resources and reduce efficiency. Solutions such as model sparsification and compression can minimize the amount of data exchanged between devices and the central server [13,14]. However, compression and sparsification introduce delays and additional overhead, particularly for resource-constrained devices and real-time applications.
In this paper, we propose a clustered federated learning approach that employs representative parameter sharing and reuses previously recorded aggregated parameters to optimize performance and resource usage for industrial tiny devices. The server initiates training by distributing an initial model to the connected edge devices. After local training, the edge devices send their updated parameters and computational resource information back to the server. Once the server receives updates from all edge devices, it clusters them based on their data distribution and logs the updated parameters for future training rounds. To reduce unnecessary communication costs, only a capable device from each cluster sends updates to the server, as parameter distributions are similar within a cluster. Hence, the server selects two cluster heads per group, one active and one backup, based on computational resources, performance, and communication delay, which are responsible for handling the communication. Furthermore, samples of previously aggregated parameters are incorporated into the aggregation process through weighted averaging, with newer parameters given a higher ratio to improve predictive performance. To the best of the authors' knowledge, representative parameter sharing with one active and one backup cluster head, as well as the utilization of previously aggregated parameter samples, has not been explored in the existing literature for minimizing resource usage and enhancing performance on Industry 4.0 tiny devices.
The rest of this study is organized as follows. Section 2 reviews the related literature in the field, and Section 3 describes the proposed methodology with pseudocode, equations, and figures. Section 4 presents the experimental evaluation of the proposed methodology against the existing state of the art. The conclusion and future work are presented in Section 5.

2. Literature

This section presents centralized and federated learning, along with their challenges, in Section 2.1 and Section 2.2, respectively. Section 2.3 discusses recent works related to the proposed approach.

2.1. Centralized Learning

Centralized learning is the most common traditional machine learning approach in Industry 4.0, where data from various sources, such as sensors, IoT devices, and production systems, are collected and consolidated on a central server for processing. This enables the development of AI models that can analyze and optimize industrial processes at scale, helping to automate tasks, monitor systems, and predict maintenance needs [15,16]. Centralized learning is highly effective when large amounts of diverse data are required to train complex models, such as those used in supply chain optimization, quality control, and predictive analytics [17]. By aggregating data from multiple sources, centralized learning helps companies make data-driven decisions and streamline operations across different factories or production lines, creating uniform solutions that can be deployed across the organization.
However, centralized learning in Industry 4.0 comes with several challenges. One major issue is data privacy and security, particularly when sensitive data must be transferred to a central server [18,19]. Industries such as healthcare, automotive, and aerospace generate sensitive information that cannot be shared easily without risking data breaches or violating regulations [15,20]. Another challenge is data transfer and latency: moving vast amounts of data from edge devices (e.g., machines and sensors) to a centralized location can be expensive and time-consuming, especially for geographically distributed facilities. This can also lead to communication bottlenecks and delays in decision-making, reducing the real-time benefits of smart manufacturing. Additionally, data points from different devices are not drawn from the same distribution, a situation usually known as non-independent and identically distributed (non-IID) data in the literature [21]. Non-IID data may result in overfitting to the dominant data distribution, leading to poor generalization performance on minority distributions.

2.2. Federated Learning

To address the challenges of centralized learning in Industry 4.0, several solutions are emerging. Federated and on-device learning offer decentralized alternatives by enabling data processing and model training at the edge, which is closer to where data are generated rather than relying on a central server [22,23,24,25]. This reduces latency, data transfer costs, and the risk of communication bottlenecks while also preserving data privacy by keeping sensitive data local. Data encryption and secure communication protocols can further safeguard data during transmission, addressing security concerns. Additionally, hybrid approaches that combine centralized and decentralized learning allow industries to balance the benefits of both models, optimizing resource use and minimizing the risk of single points of failure. Finally, adopting scalable, cloud-based infrastructure can provide the computational power needed while distributing workloads efficiently. Federated learning is a decentralized machine learning approach that allows multiple devices or systems to collaboratively train a model without sharing raw data [26]. In this setup, data remain local to each edge device (such as sensors, actuators, and machines), and only the model updates are sent to a central server for aggregation [27]. This technique enhances data privacy and security because sensitive information never leaves the local environment, making it particularly useful in industries with strict privacy requirements, like healthcare, manufacturing, and finance. Federated learning also reduces latency and bandwidth usage by minimizing data transfer, as models are trained at the edge where the data are generated.

2.3. Related Works

The authors in [28] integrate blockchain technology with federated learning to enhance sustainability performance. Throughout the process, blockchain continuously monitors and stores detailed logs, helping to detect any flaws in production. Federated learning plays a key role by validating both sustainability metrics and flaw detection, enabling adjustments to be made in subsequent operational cycles. This approach identifies sustainability levels based on internal and external distribution demands, offering precise recommendations for improvement. During the learning phase, the system predicts the maximum achievable sustainability and optimizes performance accordingly. These predictions are then used to adjust the production process based on the recommended changes. Consequently, energy scheduling is refined using a combination of blockchain and learning frameworks, gradually improving sustainability across different operational timelines and varying demand levels. However, integrating blockchain and FL introduces potential complexity and computational costs, particularly when handling large-scale production data.
Brik et al. [29] introduced a disruption monitoring tool for Industry 4.0, utilizing Fog computing and federated learning. The tool focuses on disruptions from resource mislocalization, where a mobile resource is in an unexpected location. Using federated deep learning, they build a predictive model for resource localization in a distributed manner, ensuring privacy by processing personal data without central storage. Trained on real-world task schedules and resource locations, the model is deployed at the Fog computing layer for low-latency disruption detection. By comparing predicted and actual resource locations (collected via IoT), the tool detects real-time disruptions. A dynamic rescheduling module then assigns tasks to the nearest available resource, improving accuracy and reducing delays. Reliance on Fog computing infrastructure is beneficial for latency reduction but may introduce issues with scalability and computational limitations, as well as node disappearance due to battery shortage.
The authors in [30] proposed an attack–defense model to analyze the security of an IoT-based Transactive Energy System (IoTES). The model provides a framework for real-time interfacing and energy trading among demand and supply nodes. Initially, an experience-driven false data injection attack (FDIA) is designed to disrupt the IoTES at the distribution level, targeting reliable energy operations. A recurrent deterministic policy gradient (RDPG) algorithm is used to launch an optimal attack within a continuous observation and action space model. The user outage rate (UOR) is the key metric for assessing the impact of the FDIA. To counter this, a decentralized FDIA detection scheme based on deep federated learning with attentive aggregation (DeepFed-AA) is employed. This scheme detects stealthy attacks without violating data privacy while offering fast detection, high accuracy, and scalability in distributed IoT environments. Despite the privacy protections, data leakage could still occur through indirect inference in dynamic IoT systems, and real-world deployment may pose a challenge for resource-deficient devices due to the complexity of deployment and continuous updates across nodes.
Gao et al. [31] proposed a secure aggregation algorithm that defends against both client-side and server-side malicious behaviors in federated learning. The core of the design is a new secure partial aggregation protocol that enhances the robustness and security of the learning process. On the client side, the protocol restricts each client to uploading only a required proportion of its model updates, rather than the entire model. This reduces the risk of exposing sensitive information and limits the potential impact of a malicious client attempting to send deceptive or harmful updates. On the server side, model updates are encrypted, preventing the server from inferring private information while still allowing it to aggregate updates securely. The server applies techniques that can perform aggregation without access to the raw updates, ensuring data privacy even if the server is compromised. Additionally, the system incorporates methods to detect dishonest client behavior. While improving efficiency, restricting clients to sending only partial updates may reduce the accuracy of the aggregated global model, since only a subset of the updates is utilized.
The authors in [32] presented a customized federated learning framework to address challenges in non-IID data settings. This approach integrates knowledge distillation during the training of local models to boost generalization capabilities, while also applying differential privacy mechanisms to introduce noise into output parameters, ensuring privacy preservation. Additionally, a bidirectional feedback mechanism is introduced to dynamically adjust the knowledge distillation and differential privacy parameters based on both model performance and user-specific privacy needs. This adaptive process aims to balance privacy protection with model efficiency, striving for optimal performance under various privacy constraints. However, maintaining optimal model accuracy remains a challenge, as the noise introduced by differential privacy can degrade learning performance in sensitive environments.
Chen et al. [33] proposed a sharing incentive mechanism for federated learning in edge computing based on contract theory, data quality, and resource allocation. It addresses two key incentive points: a cooperative game model among model consumers, which aims to maximize coalition profit in a multi-task federated learning system, and a dynamic game with incomplete information between workers and model consumers, who select workers based on performance. To optimize consumer profits, the paper frames the incentive as an optimization problem, ensuring individual rationality (IR) and incentive compatibility (IC). A novel weight dispersion evaluation mechanism based on the Wasserstein distance is introduced to measure the impact of workers' data quality on model performance. The feasibility of the approach is rigorously proven, and experimental results show that the optimal contract effectively motivates high-quality workers, improving the efficiency of FL systems. However, the computational overhead and the evolution of worker behavior and data quality over time are underexplored, which could affect long-term system performance.
A semi-decentralized federated edge learning framework was proposed in [34] that handles limited training data in a single central cluster, where edge models are trained across devices and edge servers. The authors formulate an optimization problem for edge aggregation, considering device association, resource block allocation, and edge server placement, with the goal of minimizing training loss while staying within the edge server’s budget. They propose transforming the problem into a dynamic optimization challenge based on training loss degradation. A Trilateral Matching-based Association (TMA) method is introduced to solve device association and resource allocation subproblems, and a Tabu Search-based Service Placement (TSP) method is used to optimize edge server placement. The proposed framework can face scalability challenges as the complexity of the optimization problem increases with the number of edge devices, resource blocks, and edge servers.
The authors in [35] proposed a method that safeguards data distribution characteristics while enabling efficient similarity computation. It ensures that the server cannot reconstruct individual models from aggregated models across any number of training rounds, preserving privacy during user feature analysis. Unlike traditional random selection, which risks local model leakage over time, a lightweight differential privacy technique was applied to ensure timely processing in time-critical systems. To meet real-time requirements, the algorithm simplifies label analysis by focusing on the presence or absence of labels rather than exact counts. A linear-complexity clustering algorithm is used to improve the efficiency of analyzing heterogeneous data. However, the utilization of differential privacy limits performance and imposes computational overhead on resource-constrained devices.

3. Methodology

The Methodology Section outlines the approach adopted in this study, structured into five subsections: Section 3.1: Overview of the Proposed Framework, describing the system’s design; Section 3.2: Model Initialization, detailing initial parameter setup; Section 3.3: Edge Device Training, explaining local training processes; Section 3.4: Clustering and Cluster Head Selection, outlining the clustering approach and selection criteria; and Section 3.5: Utilization of Previously Aggregated Parameters, discussing the use of past aggregated data to enhance performance.

3.1. Overview of the Proposed Framework

In this study, a representative-based parameter-sharing framework is proposed for clustered federated learning. As shown in Figure 1, the training process is initiated by the server, which distributes the initial model to all connected edge devices. Upon receiving the initial model in the first round, edge devices begin training using their available collected data. Subsequently, they transmit updated parameters and computational resources back to the server. Upon receiving the updated parameters and computational capabilities of the edge devices, the server is responsible for clustering, selecting cluster heads, aggregating parameters, and finally distributing the aggregated parameters along with the announcement of the cluster heads.
If the parameter distributions of edge devices are similar, it is unnecessary to receive parameters from all of them in every round. Consequently, one of the most capable edge devices in terms of resources and predictive performance within each group can send parameters to the server. Hence, the server gathers all parameters, performs clustering, and selects two cluster head candidates. The two cluster heads are elected based on their resources, performance, and delay. The top-ranked edge device, labeled green in Figure 1, sends the parameters to the server, while the second-ranked device, labeled yellow in Figure 1, serves as the backup cluster head. The clustering algorithm employs the edge devices' parameters and data distributions for grouping. The server then sends a message to the elected devices announcing whether they have been selected as cluster heads or backup cluster heads. This approach reduces communication costs, computational costs at the server, parameter transmission at the edge devices, and the delays incurred while aggregating redundant parameters.
All of the aggregated parameters are stored at the server. To enhance performance, we leverage samples of these stored aggregated parameters in the current aggregation process. The proposed method randomly selects some of the stored aggregated parameters and applies weighted averaging with the current aggregated value. The current aggregated value is assigned a higher ratio, while the remaining parameters receive a lower ratio. The ratio of the stored parameters depends on their age, with newer parameters receiving a higher ratio and older ones receiving a lower ratio.

3.2. Model Initialization

The central server initiates the training process by creating an initial global model, denoted as θ_0. Algorithm 1 depicts the server-side operation, including parameter initialization, clustering, cluster head selection, parameter aggregation, and reuse of previously aggregated parameters. This initial model serves as the starting point for all edge devices participating in the federated learning process. The server distributes θ_0 to all connected edge devices simultaneously. These edge devices, which are typically distributed across different geographical locations and possess varying computational resources and local data distributions, receive the initial model parameters. This initial distribution ensures that every edge device starts the training process from the same baseline model, facilitating a more coordinated and synchronized training process across the network. By providing a unified initial model, the server sets a common ground for the subsequent training iterations, allowing the edge devices to begin their local training with consistent parameters. This step is crucial for maintaining uniformity and ensuring that the aggregation of parameters in later stages is meaningful and effective, ultimately contributing to the stability and convergence of the federated learning system.
Algorithm 1 Server-side training
1: Input: Edge devices E = {E_1, E_2, …, E_N} with their updates, performance, and resource reports
2: Output: Aggregated parameters θ_aggreg^t
3: Initialize model parameters θ_0
4: if round r = 0 then
5:     Distribute θ_0 to all edge devices E = {E_1, E_2, …, E_N}
6:     {θ_1^t, θ_2^t, …, θ_N^t} ← receive(E_1, E_2, …, E_N)
7:     Receive the data distribution d(E), performance, and resources of the edge devices
8:     {C_1, C_2, …, C_k} ← Cluster(d(E_1), d(E_2), …, d(E_N))    ▹ Server performs clustering
9:     for each cluster C_i in clusters do
10:        ▹ Select top 2 devices based on resources, performance, and delay
11:        CH1_{C_i}, CH2_{C_i} ← top 2 devices in C_i
12:    end for
13:    ▹ Aggregation of parameters at the server
14:    θ_aggreg^t ← aggregate({θ_1^t, θ_2^t, …, θ_N^t})
15:    S ← store(θ_aggreg^t)    ▹ Store aggregated parameters
16:    Broadcast the aggregated parameters θ_aggreg^t and the cluster head status (CH status)
17: else
18:    ▹ Server receives parameters from the cluster heads only
19:    {θ_1^t, θ_2^t, …, θ_k^t} ← receive(CH1_{C_1}, CH1_{C_2}, …, CH1_{C_k})
20:    ▹ Randomly choose a subset S′ of the stored aggregated parameters
21:    S′ ⊆ S = {θ_aggreg^{t−1}, θ_aggreg^{t−2}, …, θ_aggreg^{t−r}}
22:    ▹ Assign weight values to the chosen stored parameters
23:    w_t > w_{t−1} > w_{t−2} > … > w_{t−m}
24:    ▹ Aggregate the current parameters at the server
25:    θ_aggreg^t ← aggregate({θ_1^t, θ_2^t, …, θ_k^t})
26:    θ_aggreg^t ← w_t · θ_aggreg^t + Σ_{j∈S′} w_j · θ_aggreg^j    ▹ Update the current parameters with stored parameters
27:    ▹ Store aggregated parameters
28:    S ← store(θ_aggreg^t)
29:    Broadcast the aggregated parameters θ_aggreg^t
30: end if
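Algorithm 1 leaves the aggregate operator unspecified. Below is a minimal Python sketch of one plausible instantiation, assuming a FedAvg-style parameter-wise mean over the updates received in a round; the function name and the representation of updates as lists of NumPy arrays are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def aggregate(updates):
    """FedAvg-style aggregation (assumed): parameter-wise mean over the
    received updates, where each update is a list of NumPy weight arrays,
    one entry per model layer."""
    num_updates = len(updates)
    # Sum the corresponding layer arrays across updates, then average.
    return [sum(layer_stack) / num_updates for layer_stack in zip(*updates)]

# Example: three devices, each reporting two weight tensors.
updates = [[np.ones((2, 2)) * i, np.ones(2) * i] for i in range(1, 4)]
global_params = aggregate(updates)  # every entry equals the mean, 2.0
```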

3.3. Edge Device Training

As shown in Algorithm 2, the edge devices start training after receiving the initial global model parameters θ_0 from the central server. Using these initial parameters, each device proceeds to train the model on its own local dataset D_i. This involves running several iterations of a training algorithm. The process includes calculating the gradient of the loss with respect to the model parameters, updating the parameters accordingly, and repeating this over multiple epochs until convergence. Through this localized training, each edge device refines the model to better fit its specific data, resulting in updated parameters θ_i^t that capture local data patterns. The local training allows each edge device to contribute unique insights from its own data without sharing the data themselves, thereby preserving privacy. By the end of each round of training, all edge devices have locally optimized versions of the model parameters, ready to be aggregated by the central server. This decentralized approach not only enhances the model's overall performance and robustness by incorporating diverse data distributions but also maintains data security by keeping the data local to each device.
θ_i^t = Train(θ^{t−1}, D_i)
where edge device E_i receives θ^{t−1} and updates it using its local data D_i, and Train(θ, D) represents a training procedure that updates the model parameters θ using the dataset D.
Algorithm 2 Client-side training
1: Input: Local dataset D_i and the parameters received from the server
2: Output: Updated parameters θ_i^t
3: if round r = 0 then
4:     Edge device E_i receives the initial model θ_0 from the server
5:     θ_i^t ← Train(θ_0, D_i)    ▹ Undertake training with the available data D_i
6:     Send θ_i^t, the data distribution d(E_i), performance, and resources to the server
7: else
8:     E_i receives the aggregated parameters θ_aggreg^t and CH status from the server
9:     θ_i^t ← Train(θ_aggreg^t, D_i)
10:    if CH status = 1 then
11:        Send parameters θ_i^t to the server
12:    end if
13: end if
Once each edge device completes its local training, it transmits its updated model parameters θ_i^t back to the central server. Along with these parameters, the edge devices also send information about their computational resources, data distribution, and performance metrics. This additional information includes details such as processing power, memory availability, network bandwidth, and training efficiency, which are essential for the server to evaluate the capability and reliability of each device. The transmission allows the server to gather a comprehensive dataset comprising the updated parameters from all participating devices, as well as insights into the resources and performance of each device. This information is crucial for the subsequent steps, in which the server performs clustering and selects cluster heads based on these metrics to optimize the federated learning process. The aim is to manage the communication overhead and computational load efficiently while ensuring that the model continues to improve in a resource-effective manner.
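The client-side behavior of Algorithm 2 can be sketched in TensorFlow/Keras, the framework used in Section 4. This is a minimal sketch under stated assumptions: the hyperparameters follow Table 1, while the function signature and the reported fields are illustrative, since the paper does not fix a client API.

```python
import tensorflow as tf

def client_round(model: tf.keras.Model, server_weights, x_local, y_local,
                 is_cluster_head: bool):
    """One round on edge device E_i: local training, then a conditional
    upload (only cluster heads send parameters after round 0)."""
    model.set_weights(server_weights)              # start from server model
    model.fit(x_local, y_local, batch_size=32, epochs=25, verbose=0)
    loss = model.evaluate(x_local, y_local, verbose=0)

    if is_cluster_head:                            # CH status = 1
        return {"weights": model.get_weights(), "performance": loss}
    return {"performance": loss}                   # members save uplink cost
```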

3.4. Clustering and Cluster Head Selection

Once the central server receives the updated model parameters θ_i^t and resource metrics from all edge devices, it groups these devices into clusters based on the similarity of their parameter distributions and data characteristics. The goal is to create clusters (C_1, C_2, …, C_k) such that devices within the same cluster have similar parameter updates, indicating that they are learning from similar data patterns. This grouping optimizes the federated learning process by reducing redundant parameter transmissions and focusing on representative updates. Consequently, communication costs are minimized, as the cluster members stop transmitting updated parameters to the server. By leveraging the natural similarities among the devices, the server can ensure more efficient communication and aggregation, ultimately improving the overall learning efficiency and model performance.
{C_1, C_2, …, C_k} = Cluster({θ_i^t}_{i∈G})
where G is the set of participating edge devices and k is the number of clusters.
Following the clustering step, the server proceeds to select two cluster head candidates from each cluster C_k. These candidates are chosen based on a combination of factors, including their computational resources, predictive performance, and communication delay. The server evaluates each device within a cluster and ranks them accordingly. The top-ranked device, designated as the primary cluster head CH_k, is responsible for sending its updated parameters θ_{CH_k}^t to the server. The second-ranked device, CH′_k, serves as the backup cluster head. This backup ensures robustness in the system; if the primary cluster head fails or encounters issues, the backup takes over the transmission duties. By selecting cluster heads based on resource availability and performance, the server optimizes the parameter aggregation process, minimizes communication overhead, and enhances the reliability and scalability of the federated learning framework. This strategic selection ensures that the most capable and efficient devices represent each cluster, thereby contributing to the overall effectiveness of the model training process.
N_CPU(E_i) = (CPU(E_i) − min(CPU)) / (max(CPU) − min(CPU))
N_MEM(E_i) = (MEM(E_i) − min(MEM)) / (max(MEM) − min(MEM))
N_Perf(E_i) = (Perf(E_i) − min(Perf)) / (max(Perf) − min(Perf))
N_Delay(E_i) = 1 − (Delay(E_i) − min(Delay)) / (max(Delay) − min(Delay))
S(E_i) = w_C · N_CPU(E_i) + w_M · N_MEM(E_i) + w_P · N_Perf(E_i) + w_D · N_Delay(E_i)
where w_C, w_M, w_P, and w_D are the weights for CPU, memory, performance, and delay, respectively.
CH_1, CH_2 = arg max_{E_i ∈ E} S(E_i)
For electing cluster heads, we utilized a multi-criteria decision-making approach, as presented in Equations (3)–(8) and Algorithm 3. It takes as input the metrics of each IoT device, including CPU capacity, memory capacity, prediction performance, and communication delay, along with a weight assigned to each criterion. It then normalizes these metrics to a common scale on which higher values represent better standing (except for communication delay, where lower values are better, so the normalized value is inverted). We assign equal weights to all criteria to avoid bias, since prioritizing one criterion over another depends on the specific application scenario and expert judgment. Next, it computes an aggregate score for each device as the weighted sum of the normalized values. Finally, the devices with the highest aggregate scores are selected as the cluster heads.
Algorithm 3 Multi-criteria decision-making approach for cluster head selection
1: Input: Performance (Perf), communication delay (Delay), and resources (CPU, MEM) of edge devices
2: Output: Selected cluster heads (CH_1, CH_2)
3: Server receives the performance (Perf), communication delay (Delay), and resources (CPU, MEM) of the edge devices
4: Initialize weights uniformly (w_C, w_M, w_P, and w_D for CPU, memory, performance, and delay, respectively)
   ▹ Normalize the criteria
5: for each edge device E_i do
6:     N_CPU(E_i) ← (CPU(E_i) − min(CPU)) / (max(CPU) − min(CPU))
7:     N_MEM(E_i) ← (MEM(E_i) − min(MEM)) / (max(MEM) − min(MEM))
8:     N_Perf(E_i) ← (Perf(E_i) − min(Perf)) / (max(Perf) − min(Perf))
9:     N_Delay(E_i) ← 1 − (Delay(E_i) − min(Delay)) / (max(Delay) − min(Delay))
10: end for
   ▹ Calculate the score of each edge device
11: for each edge device E_i do
12:     S(E_i) ← w_C × N_CPU(E_i) + w_M × N_MEM(E_i) + w_P × N_Perf(E_i) + w_D × N_Delay(E_i)
13: end for
14: ▹ Select the top 2 edge devices with the highest scores
15: CH_1, CH_2 ← arg max_{E_i ∈ E} S(E_i)
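A direct transcription of Algorithm 3 in Python might look as follows. The device records are hypothetical, and a small epsilon guards against a zero range (max = min), a corner case the pseudocode does not address.

```python
def select_cluster_heads(devices, weights=(0.25, 0.25, 0.25, 0.25)):
    """devices: dicts with 'cpu', 'mem', 'perf', and 'delay' fields.
    Returns the active and backup cluster heads (top-2 scores)."""
    w_c, w_m, w_p, w_d = weights              # equal weights, as in the paper
    eps = 1e-9

    def norm(values, x):                      # min-max normalization
        return (x - min(values)) / (max(values) - min(values) + eps)

    cpus = [d["cpu"] for d in devices]
    mems = [d["mem"] for d in devices]
    perfs = [d["perf"] for d in devices]
    delays = [d["delay"] for d in devices]

    def score(d):
        return (w_c * norm(cpus, d["cpu"]) + w_m * norm(mems, d["mem"])
                + w_p * norm(perfs, d["perf"])
                + w_d * (1 - norm(delays, d["delay"])))  # lower delay is better

    ranked = sorted(devices, key=score, reverse=True)
    return ranked[0], ranked[1]               # CH_1 (active), CH_2 (backup)

# Hypothetical cluster of three devices:
cluster = [{"cpu": 1.2, "mem": 512, "perf": 0.74, "delay": 30},
           {"cpu": 2.0, "mem": 1024, "perf": 0.71, "delay": 12},
           {"cpu": 0.8, "mem": 256, "perf": 0.69, "delay": 45}]
active, backup = select_cluster_heads(cluster)
```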

3.5. Utilization of Previously Aggregated Parameters

The server continuously records the output of the aggregated parameters from each round. In this phase, the central server retains a history of past aggregated model parameters to enhance future aggregation processes. By maintaining this repository, the server can leverage the diversity and robustness of these historical parameters. The stored aggregated parameters reflect various stages of model training influenced by different subsets of edge devices and their local data distributions, thus enriching the server’s capability to produce a more generalized and stable model in future rounds. This storage mechanism is a strategic measure to prevent the loss of useful information and to continually improve the model by integrating knowledge from past iterations. These sets of stored parameters can be denoted as
S = {θ_aggreg^{t−1}, θ_aggreg^{t−2}, …, θ_aggreg^0}
where t is the index of the current round and S is the set of stored aggregated parameters.
The server takes samples of previously aggregated parameters and uses weighted averaging to incorporate them into the current aggregation. Once the server receives the current round's parameters θ_i^t from the selected cluster heads, it performs a weighted averaging of these parameters with a randomly selected subset S′ ⊆ S of the stored parameters. The current set of aggregated parameters is assigned the highest weight w_t to ensure that the latest training updates have a significant impact on the global model. The stored parameters are assigned weights w_j based on their age, with newer stored parameters receiving higher weights than older ones. The weighted averaging is formulated as
θ_aggreg^t ← w_t · θ_aggreg^t + Σ_{j∈S′} w_j · θ_aggreg^j
where w_t is the weight of the current aggregated parameters; w_j is the weight of a stored set of aggregated parameters, with w_t > w_{t−1} > w_{t−2} > … > w_0; θ_aggreg^t is the current aggregation output; S′ ⊆ S is the set of selected stored parameters; and θ_aggreg^j is a selected stored aggregate.
This method was designed to balance the influence of recent updates with the historical knowledge captured in previous rounds, promoting a more stable and robust model. By integrating past and present parameters, the server enhances the model’s ability to generalize across diverse and potentially non-IID data distributions from different edge devices. This weighted averaging approach helps mitigate the effects of noise and outliers in any single round, leading to a more consistent and reliable federated learning process.
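A minimal sketch of this history-aware aggregation step is shown below. The paper fixes neither the sample size nor the decay schedule of the weights, so the linear decay and the normalization to unit sum used here are assumptions.

```python
import random

def merge_with_history(current, history, sample_size=3):
    """Combine the current aggregate with a random sample of stored
    aggregates; newer samples receive larger weights (Section 3.5).
    history: list of (round_index, params); params: list of arrays."""
    sample = random.sample(history, min(sample_size, len(history)))
    sample.sort(key=lambda item: item[0], reverse=True)   # newest first

    # Linearly decaying raw weights: current > newest stored > ... > oldest.
    raw = [len(sample) + 1.0] + [float(len(sample) - i)
                                 for i in range(len(sample))]
    weights = [w / sum(raw) for w in raw]

    terms = [current] + [params for _, params in sample]
    # Parameter-wise weighted sum, layer by layer.
    return [sum(w * layer for w, layer in zip(weights, layers))
            for layers in zip(*terms)]
```

With three stored samples, for instance, the normalized weights come out as 0.4 for the current aggregate and 0.3, 0.2, and 0.1 for the stored ones, newest to oldest, satisfying the ordering w_t > w_{t−1} > … required above.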

4. Results

In this section, we present and analyze the results obtained from the experimental evaluation to assess the performance of the proposed approach against existing methods. The experiment was performed on a machine with 64 GB of RAM and an NVIDIA GeForce RTX 4060 Ti GPU (NVIDIA, Santa Clara, CA, USA), operating at a base clock speed of 3.2 GHz. The proposed algorithm was implemented in Python with the TensorFlow framework. We selected Gated Recurrent Units (GRUs) for implementing the proposed approach due to their ability to handle complex time-series data efficiently. GRUs are highly effective at capturing temporal dependencies while being computationally lighter than other recurrent neural networks, which makes them suitable for resource-constrained devices. Their lightweight architecture ensures faster training and inference, which is crucial for applications on devices with limited processing power and memory.
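A minimal TensorFlow/Keras sketch of such a GRU model follows, wired to the settings in Table 1 (Adam, MSE, batch size 32, 25 epochs). The layer width, input shape, and single sigmoid output for the attacked/normal label are assumptions, as the paper does not report the exact architecture.

```python
import tensorflow as tf

def build_gru_model(timesteps: int, n_features: int) -> tf.keras.Model:
    """Lightweight GRU classifier for IoT sensor streams (sketch)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(timesteps, n_features)),
        tf.keras.layers.GRU(32),                        # compact recurrent layer
        tf.keras.layers.Dense(1, activation="sigmoid")  # attacked vs. normal
    ])
    model.compile(optimizer="adam", loss="mse",         # per Table 1
                  metrics=["accuracy"])
    return model

# Training then follows Table 1: model.fit(x, y, batch_size=32, epochs=25)
```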
We employ a multi-threading approach to simulate the client–server architecture [36,37]. The server program operates on the main thread, while a separate thread is allocated to each edge node. The server-side tasks of Algorithm 1, including model initialization, parameter recording, clustering, cluster head selection, and aggregation, are executed on the main thread. In contrast, the edge-device tasks of Algorithm 2, which involve local training and the sharing of resources and parameters, are processed in subthreads.
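The simulation setup can be sketched with Python's threading and queue modules as described: the main thread acts as the server, and each edge device runs in a subthread, returning its result through a thread-safe queue. The function bodies below are placeholders; real clients would run the local training of Algorithm 2.

```python
import threading
import queue

def edge_device(device_id: int, server_weights, results: queue.Queue):
    """Subthread body: stands in for local training on device E_i."""
    local_update = server_weights        # placeholder for Train(θ, D_i)
    results.put((device_id, local_update))

def run_round(server_weights, num_devices: int = 8):
    """Main thread (server): spawn clients, wait, then collect updates."""
    results: queue.Queue = queue.Queue()
    threads = [threading.Thread(target=edge_device,
                                args=(i, server_weights, results))
               for i in range(num_devices)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                         # server waits for all subthreads
    return [results.get() for _ in range(num_devices)]
```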
The remainder of this section is structured as follows: Section 4.1: Dataset and Evaluation Metrics describes the data used and the metrics for evaluation; Section 4.2: Baseline Methods outlines the comparison approaches; Section 4.3: Time Complexity Analysis analyzes the computational efficiency; and Section 4.4: Simulation Results presents and interprets the outcomes of the experiments.

4.1. Dataset and Evaluation Metrics

The experiment utilized the ToN-IoT testbed dataset from the University of New South Wales, focusing on specific subsets among its seven distinct sensor datasets [38]. In particular, the Weather and Thermostat subsets, which include both attacked and normal categories, were used to assess the effectiveness of the proposed methodology. The rationale for choosing the weather subset is its significant impact on industrial processes, such as cooling and heating systems, while the thermostat subset represents internal control systems critical for maintaining operational efficiency in industrial environments. The weather subset contains 650,242 records, 14% of which are attack records, while the thermostat subset has 442,228 records, 13% of which are attack records. The dataset was evenly split into 40 rounds to simulate federated learning, with 70% of the data used for training the model and 30% reserved for testing.
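The partitioning described above can be reproduced along the following lines; the CSV file name and the shuffling seed are hypothetical placeholders for the actual ToN-IoT weather subset layout.

```python
import numpy as np
import pandas as pd

# Hypothetical path; adapt to the local copy of the ToN-IoT weather subset.
df = pd.read_csv("ToN_IoT_weather.csv").sample(frac=1.0, random_state=42)

# One shard of row indices per communication round (40 rounds in total).
round_indices = np.array_split(np.arange(len(df)), 40)

rounds = []
for idx in round_indices:
    shard = df.iloc[idx]
    cut = int(0.7 * len(shard))              # 70% training / 30% testing
    rounds.append((shard.iloc[:cut], shard.iloc[cut:]))
```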
A comprehensive comparative analysis was conducted between the proposed approach and the selected baseline methods. A variety of metrics were used in this evaluation, including accuracy, precision, recall, the F1-score (the harmonic mean of precision and recall), time complexity analysis, and processing time. The processing time and time complexity analysis were used to assess the computational cost and complexity of both the proposed and the baseline methods. In addition, we computed the standard deviation (σ) of the accuracy to measure the variability in the results across rounds. This provides insight into the consistency of the model's performance, with lower values indicating stable accuracy and higher values reflecting greater fluctuations. Table 1 presents the parameter settings employed during the experiment.

4.2. Baseline Methods

The proposed methodology was evaluated against Federated Averaging and Clustered Federated Averaging (Clustered Fed-Avg), two of the most widely adopted frameworks in federated learning [33,36]. This comparative analysis demonstrates the advantages and efficiencies of the proposed approach, particularly in terms of prediction performance and computational efficiency. By contrasting the proposed method with Federated Averaging and Clustered Fed-Avg, this study shows how it addresses some of the limitations associated with traditional federated learning methods. The comparison provides valuable insights into its potential applications and contributions to the advancement of federated learning practice in Industry 4.0.

4.3. Time Complexity Analysis

Both theoretical and runtime evaluations were conducted to assess the effectiveness of the proposed approach, with time complexity analysis used to evaluate the computational overhead. As depicted in Algorithm 1, every edge device sends its updated parameters only in the first round, which saves the resources otherwise required for sending them in every round; in subsequent rounds, only the cluster heads send updated parameters. Hence, let D be the dimension of the parameters and R the number of rounds; the complexity is then O(1) for sending parameters and O(R × D) for receiving them. In contrast, the time complexity of Federated Averaging and Clustered Fed-Avg is O(R × D) for both sending and receiving, since they must send to and receive from all member edge devices. The time complexity of parameter aggregation in the proposed method is O(C × D), where C is the number of clusters, as the server receives updated parameters only from the cluster heads; for Federated Averaging and Clustered Fed-Avg, it is O(N × D), where N is the total number of edge devices. As a rough illustration (with hypothetical values), N = 40 devices grouped into C = 4 clusters would leave the server aggregating one-tenth as many parameter sets per round. The server must also perform clustering, cluster head selection, and sampling of the stored aggregated parameters. For cluster head selection, lines 5 through 13 of Algorithm 3 comprise two independent loops, resulting in a time complexity of O(m × n) for n devices and m criteria. The time complexity of clustering and sampling depends on the specific methods employed.
To sum up, the cluster members conserve the resources required for sending updated parameters compared with Federated Averaging and Clustered Fed-Avg. The cluster heads still need to send updated parameters, but they can manage this, as they are assumed to have better resources than the other cluster members. Although the server incurs some additional overhead for storing aggregated parameters, clustering, cluster head selection, and sampling, it only aggregates the smaller number of parameter sets sent by the cluster heads.

4.4. Simulation Results

Figure 2a,b presents a comparison of the training times for the weather and thermostat datasets, respectively. These figures clearly show that the proposed model significantly reduces training time in comparison with Federated Averaging and Clustered Fed-Avg. The reduction becomes particularly noticeable from the second training round onward. This improvement is primarily due to the proposed model's ability to cluster the edge devices and, after the first round, process only the cluster heads' contributions. By limiting aggregation to the most representative updates from the cluster heads, the model avoids redundant processing, resulting in a more efficient training process.
Table 2 and Table 3 present the experimental outcomes assessing the performance of the proposed approach on the weather and thermostat datasets, respectively. The results demonstrate that the proposed method achieves notable improvements over the baseline approaches. Specifically, it enhances the model's ability to assign data points to their respective labels. This improvement underscores the efficacy of the systematic strategies employed in the proposed methodology: the utilization of previously aggregated parameters in the current round, combined with the grouping of edge devices with similar distributions for aggregation, contributes significantly to the improved results.

5. Conclusions

This study introduces an enhanced clustered federated learning approach by incorporating representative-based parameter sharing and utilizing previously stored aggregated parameters to optimize resources and improve performance. By strategically clustering edge devices based on their parameter and data distributions, the proposed method reduces the parameter transmissions from all member edge devices, thus minimizing communication and computational costs. The use of cluster heads that are selected based on resource availability, performance, and latency further enhances efficiency by allowing only the most capable devices to transmit updated parameters. Additionally, the weighted averaging of stored aggregated parameters with new values improves the model’s performance over time, with a bias towards more recent data. The results of the experimental evaluation are promising in terms of resource optimization and prediction performance.
In the proposed method, equal weights were applied to the cluster head selection criteria; future work could explore biased weight assignments based on the relative importance of the criteria in specific application scenarios. Due to resource limitations, the experiment was conducted using a threading simulation and two subsets of the ToN-IoT dataset. Hence, future work could focus on securing additional resources to enable practical implementation and testing of the proposed method on further real-world, real-time datasets.

Author Contributions

Conceptualization, A.K.T. and B.V.; methodology, A.K.T.; software, A.K.T.; validation, A.K.T. and B.V.; formal analysis, A.K.T. and B.V.; investigation, A.K.T. and B.V.; resources, A.K.T. and B.V.; data curation, A.K.T. and B.V.; writing—original draft preparation, A.K.T.; writing—review and editing, A.K.T. and B.V.; visualization, A.K.T.; supervision, B.V.; project administration, B.V.; funding acquisition, B.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset used was public and cited in the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ullah, I.; Hassan, U.U.; Ali, M.I. Multi-level federated learning for Industry 4.0-a crowdsourcing approach. Procedia Comput. Sci. 2023, 217, 423–435. [Google Scholar] [CrossRef]
  2. Devagiri, J.S.; Paheding, S.; Niyaz, Q.; Yang, X.; Smith, S. Augmented Reality and Artificial Intelligence in Industry: Trends, tools, and future challenges. Expert Syst. Appl. 2022, 207, 118002. [Google Scholar] [CrossRef]
  3. Jan, Z.; Ahamed, F.; Mayer, W.; Patel, N.; Grossmann, G.; Stumptner, M.; Kuusk, A. Artificial intelligence for Industry 4.0: Systematic review of applications, challenges, and opportunities. Expert Syst. Appl. 2023, 216, 119456. [Google Scholar] [CrossRef]
  4. Qu, Y.; Pokhrel, S.R.; Garg, S.; Gao, L.; Xiang, Y. A blockchained federated learning framework for cognitive computing in Industry 4.0 networks. IEEE Trans. Ind. Inform. 2020, 17, 2964–2973. [Google Scholar] [CrossRef]
  5. Zhou, H.; She, C.; Deng, Y.; Dohler, M.; Nallanathan, A. Machine learning for massive industrial internet of things. IEEE Wirel. Commun. 2021, 28, 81–87. [Google Scholar] [CrossRef]
  6. Liu, H.I.; Galindo, M.; Xie, H.; Wong, L.K.; Shuai, H.H.; Li, Y.H.; Cheng, W.H. Lightweight Deep Learning for Resource-Constrained Environments: A Survey. ACM Comput. Surv. 2024, 56, 1–42. [Google Scholar] [CrossRef]
  7. Imteaj, A.; Thakker, U.; Wang, S.; Li, J.; Amini, M.H. A survey on federated learning for resource-constrained IoT devices. IEEE Internet Things J. 2021, 9, 1–24. [Google Scholar] [CrossRef]
  8. Boobalan, P.; Ramu, S.P.; Pham, Q.V.; Dev, K.; Pandya, S.; Maddikunta, P.K.R.; Gadekallu, T.R.; Huynh-The, T. Fusion of Federated Learning and Industrial Internet of Things: A survey. Comput. Netw. 2022, 212, 109048. [Google Scholar] [CrossRef]
  9. Yaacoub, J.P.A.; Noura, H.N.; Salman, O. Security of federated learning with IoT systems: Issues, limitations, challenges, and solutions. Internet Things Cyber-Phys. Syst. 2023, 3, 155–179. [Google Scholar] [CrossRef]
  10. Zhang, T.; Gao, L.; He, C.; Zhang, M.; Krishnamachari, B.; Avestimehr, A.S. Federated learning for the internet of things: Applications, challenges, and opportunities. IEEE Internet Things Mag. 2022, 5, 24–29. [Google Scholar] [CrossRef]
  11. Campos, E.M.; Saura, P.F.; González-Vidal, A.; Hernández-Ramos, J.L.; Bernabe, J.B.; Baldini, G.; Skarmeta, A. Evaluating Federated Learning for intrusion detection in Internet of Things: Review and challenges. Comput. Netw. 2022, 203, 108661. [Google Scholar] [CrossRef]
  12. Li, Z.; Zhao, H.; Li, B.; Chi, Y. SoteriaFL: A unified framework for private federated learning with communication compression. Adv. Neural Inf. Process. Syst. 2022, 35, 4285–4300. [Google Scholar]
  13. Li, S.; Qi, Q.; Wang, J.; Sun, H.; Li, Y.; Yu, F.R. GGS: General gradient sparsification for federated learning in edge computing. In Proceedings of the ICC 2020—2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, 7–11 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–7. [Google Scholar]
  14. Takele, A.K.; Villányi, B. LSTM-autoencoder based incremental learning for industrial Internet of Things. IEEE Access 2023, 11, 137929–137936. [Google Scholar] [CrossRef]
  15. Dramé-Maigné, S.; Laurent, M.; Castillo, L.; Ganem, H. Centralized, distributed, and everything in between: Reviewing access control solutions for the IoT. ACM Comput. Surv. (CSUR) 2021, 54, 1–34. [Google Scholar] [CrossRef]
  16. Boccella, A.R.; Centobelli, P.; Cerchione, R.; Murino, T.; Riedel, R. Evaluating centralized and heterarchical control of smart manufacturing systems in the era of Industry 4.0. Appl. Sci. 2020, 10, 755. [Google Scholar] [CrossRef]
  17. Baena, F.; Guarin, A.; Mora, J.; Sauza, J.; Retat, S. Learning factory: The path to Industry 4.0. Procedia Manuf. 2017, 9, 73–80. [Google Scholar] [CrossRef]
  18. Chen, J.; Xue, J.; Wang, Y.; Huang, L.; Baker, T.; Zhou, Z. Privacy-Preserving and Traceable Federated Learning for data sharing in industrial IoT applications. Expert Syst. Appl. 2023, 213, 119036. [Google Scholar] [CrossRef]
  19. Mozaffari-Kermani, M.; Sur-Kolay, S.; Raghunathan, A.; Jha, N.K. Systematic poisoning attacks on and defenses for machine learning in healthcare. IEEE J. Biomed. Health Inform. 2014, 19, 1893–1905. [Google Scholar] [CrossRef]
  20. Nia, A.M.; Mozaffari-Kermani, M.; Sur-Kolay, S.; Raghunathan, A.; Jha, N.K. Energy-efficient long-term continuous personal health monitoring. IEEE Trans. Multi-Scale Comput. Syst. 2015, 1, 85–98. [Google Scholar] [CrossRef]
  21. Khalil, U.; Malik, O.A.; Uddin, M.; Chen, C.L. A comparative analysis on blockchain versus centralized authentication architectures for IoT-enabled smart devices in smart cities: A comprehensive review, recent advances, and future research directions. Sensors 2022, 22, 5168. [Google Scholar] [CrossRef]
  22. Friha, O.; Ferrag, M.A.; Benbouzid, M.; Berghout, T.; Kantarci, B.; Choo, K.K.R. 2DF-IDS: Decentralized and differentially private federated learning-based intrusion detection system for industrial IoT. Comput. Secur. 2023, 127, 103097. [Google Scholar] [CrossRef]
  23. Lazzarini, R.; Tianfield, H.; Charissis, V. Federated Learning for IoT Intrusion Detection. AI 2023, 4, 509–530. [Google Scholar] [CrossRef]
  24. Ren, H.; Anicic, D.; Runkler, T.A. Towards semantic management of on-device applications in industrial IoT. ACM Trans. Internet Technol. 2022, 22, 1–30. [Google Scholar] [CrossRef]
  25. Ren, H.; Anicic, D.; Runkler, T.A. The synergy of complex event processing and tiny machine learning in industrial IoT. In Proceedings of the 15th ACM International Conference on Distributed and Event-Based Systems, Milan, Italy, 28 June–2 July 2021; pp. 126–135. [Google Scholar]
  26. Rashid, M.M.; Khan, S.U.; Eusufzai, F.; Redwan, M.A.; Sabuj, S.R.; Elsharief, M. A federated learning-based approach for improving intrusion detection in industrial internet of things networks. Network 2023, 3, 158–179. [Google Scholar] [CrossRef]
  27. Farahani, B.; Monsefi, A.K. Smart and collaborative industrial IoT: A federated learning and data space approach. Digit. Commun. Netw. 2023, 9, 436–447. [Google Scholar] [CrossRef]
  28. Sun, F.; Diao, Z. Federated Learning and Blockchain-Enabled Intelligent Manufacturing for Sustainable Energy Production in Industry 4.0. Processes 2023, 11, 1482. [Google Scholar] [CrossRef]
  29. Brik, B.; Messaadia, M.; Sahnoun, M.; Bettayeb, B.; Benatia, M.A. Fog-supported low-latency monitoring of system disruptions in Industry 4.0: A federated learning approach. ACM Trans. Cyber-Physical Syst. (TCPS) 2022, 6, 1–23. [Google Scholar] [CrossRef]
  30. Tahir, B.; Jolfaei, A.; Tariq, M. Experience-driven attack design and federated-learning-based intrusion detection in Industry 4.0. IEEE Trans. Ind. Inform. 2021, 18, 6398–6405. [Google Scholar] [CrossRef]
  31. Gao, J.; Zhang, B.; Guo, X.; Baker, T.; Li, M.; Liu, Z. Secure partial aggregation: Making federated learning more robust for Industry 4.0 applications. IEEE Trans. Ind. Inform. 2022, 18, 6340–6348. [Google Scholar] [CrossRef]
  32. Jiang, Y.; Zhao, X.; Li, H.; Xue, Y. A Personalized Federated Learning Method Based on Knowledge Distillation and Differential Privacy. Electronics 2024, 13, 3538. [Google Scholar] [CrossRef]
  33. Chen, J.; Guo, S.; Shen, T.; Feng, Y.; Gao, J.; Qiu, X. IncEFL: A sharing incentive mechanism for edge-assisted federated learning in industrial IoT. Digit. Commun. Netw. 2023, in press. [CrossRef]
  34. Xu, B.; Zhao, H.; Cao, H.; Garg, S.; Kaddoum, G.; Hassan, M.M. Edge aggregation placement for semi-decentralized federated learning in Industrial Internet of Things. Future Gener. Comput. Syst. 2024, 150, 160–170. [Google Scholar] [CrossRef]
  35. Luo, G.; Chen, N.; He, J.; Jin, B.; Zhang, Z.; Li, Y. Privacy-preserving clustering federated learning for non-IID data. Future Gener. Comput. Syst. 2024, 154, 384–395. [Google Scholar] [CrossRef]
  36. Shan, Y.; Yao, Y.; Zhou, X.; Zhao, T.; Hu, B.; Wang, L. CFL-IDS: An Effective Clustered Federated Learning Framework for Industrial Internet of Things Intrusion Detection. IEEE Internet Things J. 2023, 11, 10007–10019. [Google Scholar] [CrossRef]
  37. Chamikara, M.A.P.; Bertok, P.; Khalil, I.; Liu, D.; Camtepe, S. Privacy preserving distributed machine learning with federated learning. Comput. Commun. 2021, 171, 112–125. [Google Scholar] [CrossRef]
  38. Gad, A.R.; Nashat, A.A.; Barkat, T.M. Intrusion detection system using machine learning for vehicular ad hoc networks based on ToN-IoT dataset. IEEE Access 2021, 9, 142206–142217. [Google Scholar] [CrossRef]
Figure 1. Proposed method architecture.
Figure 2. Processing time comparison: (a) processing time for the thermostat dataset; (b) processing time for the weather dataset.
Table 1. Model parameters and values.

No | Parameter | Value
1 | Batch size | 32
2 | Epochs | 25
3 | Optimizer | Adam
4 | Loss function | MSE
5 | Communication rounds | 40
Table 2. Prediction performance of the proposed approach and baseline methods for the weather dataset.

Approach | Accuracy | Precision | Recall | F1-Score | σ (Accuracy)
Proposed method | 0.7425 | 0.7512 | 0.7198 | 0.7351 | 0.0632
Clustered Fed-Avg | 0.7351 | 0.7125 | 0.6870 | 0.7001 | 0.0650
Vanilla Fed-Avg | 0.6951 | 0.7050 | 0.6321 | 0.6670 | 0.0690
Table 3. Prediction performance of the proposed approach and baseline methods for the thermostat dataset.

Approach | Accuracy | Precision | Recall | F1-Score | σ (Accuracy)
Proposed method | 0.7900 | 0.8129 | 0.7098 | 0.7578 | 0.0587
Clustered Fed-Avg | 0.7861 | 0.7787 | 0.6907 | 0.7321 | 0.0630
Vanilla Fed-Avg | 0.7432 | 0.7205 | 0.7050 | 0.7126 | 0.0619
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
