Genetic Clustered Federated Learning for COVID-19 Detection

Abstract: Coronavirus (COVID-19) has caused a global disaster with adverse effects on global health and the economy. Early detection of COVID-19 symptoms helps reduce the severity of the disease, so a method for the early recognition of COVID-19 is much needed. Artificial Intelligence (AI) plays a vital role in the detection of COVID-19 cases. To detect COVID-19, however, AI requires access to patients' personal records, which are sensitive, and sharing these data can threaten patient privacy. This necessitates a technique that can accurately detect COVID-19 patients in a privacy-preserving manner. Federated Learning (FL) is a promising solution that can detect COVID-19 at early stages without compromising the sensitive information of patients. In this paper, we propose a novel hybrid algorithm named genetic clustered FL (Genetic CFL) that groups edge devices based on their tuned hyper-parameters and genetically modifies the parameters cluster-wise. The experimental results showed that the proposed Genetic CFL approach performed better than conventional AI approaches.


Introduction
The COVID-19 outbreak has disrupted public health and human life [1]. The spread of COVID-19 [2] is still ongoing, and researchers are trying to find effective ways to detect the disease early. The aim is to identify and isolate affected people, thereby limiting the spread of COVID-19. AI plays a vital role in the detection of COVID-19, allowing researchers to identify it by analyzing symptoms such as throat infection, cold sweats, and difficulty in breathing, as well as with the assistance of X-ray images [3]. Using historical COVID-19 data, AI can help predict the spread of the pandemic and provide essential guidelines to control it. However, the historical data used by AI include patient records, which are confidential. Patients hesitate to share their sensitive information because their privacy can be compromised, which creates a scarcity of the data required for predictions and prevents the development of a robust model [1]. A novel strategy is therefore required that allows the development of models that provide accurate predictions without compromising patients' personal information.
FL was introduced by Google in 2016 as an innovative machine learning (ML) approach [4]. The goal of FL is to train an ML model across multiple datasets without gathering the actual data, while preserving confidentiality, privacy, transparency, and security [5,6]. In every iteration of the FL process, each local system trains a classifier on its own data and sends the model parameters, not the data, to a global system. FL provides a collaborative environment in which different healthcare organizations can build a COVID-19 prediction framework while maintaining data privacy [7,8]. Researchers have used FL to diagnose COVID-19 from computed tomography (CT) or X-ray images [1,9]. Current FL studies focus on communication costs and performance. In FL, communication costs grow as updates are sent to the server more frequently to reflect changing patient data. FL addresses privacy and security issues in the healthcare sector by allowing data servers to train their models locally and share models with each other without compromising patients' data privacy [10].
Every iteration of an FL approach consists of client-server communication, local training, and model aggregation [11][12][13][14]. Transmitting the model from the server to all clients and back can cause communication overhead. Each communication session faces implementation issues such as poor data usage, network congestion, and ethical concerns. Modified communication algorithms, such as the privacy-preserving and communication efficient scheme for federated learning (PCFL) [15,16], reduce the model size and improve security through compression and encryption. The number of edge devices also influences the communication load. Communication sparsification [17] over clients is used to increase the convergence rate and reduce network traffic on the server. The hierarchical clustering approach [18] is also used in many models to group similar client models and reduce clustering complexity.
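Communication sparsification is often realized as top-k sparsification: each client transmits only the k largest-magnitude entries of its update as (index, value) pairs, and the server treats the remaining entries as zero. The sketch below is a generic illustration under that assumption, not necessarily the scheme used in [17]; the function names are hypothetical.

```python
from typing import List, Tuple


def top_k_sparsify(update: List[float], k: int) -> List[Tuple[int, float]]:
    """Keep only the k entries of `update` with the largest absolute value.

    Returns (index, value) pairs in index order; this is what a client
    would transmit instead of the dense update vector.
    """
    k = max(0, min(k, len(update)))
    # Rank positions by magnitude, largest first, and keep the first k.
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    return [(i, update[i]) for i in sorted(order[:k])]


def densify(pairs: List[Tuple[int, float]], size: int) -> List[float]:
    """Server side: rebuild a dense vector, treating untransmitted entries as 0."""
    dense = [0.0] * size
    for i, v in pairs:
        dense[i] = v
    return dense


if __name__ == "__main__":
    update = [0.02, -0.9, 0.1, 0.5, -0.03]
    sent = top_k_sparsify(update, k=2)
    print(sent)                        # [(1, -0.9), (3, 0.5)]
    print(densify(sent, len(update)))  # [0.0, -0.9, 0.0, 0.5, 0.0]
```

Only 2 of the 5 values (plus their indices) are sent. A common refinement, not shown here, accumulates the dropped residual on the client and adds it to the next update.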
In a heterogeneous environment, not only communication but also AI model training is more challenging [19]. Clients train the server model using hyper-parameters including the client ratio (the fraction of clients selected in each round, e.g., 10 out of 100), epochs per round, batch size, and learning rate. Edge devices differ in computational power and data properties, which makes it difficult to integrate independently trained client models. Optimization methods such as FedMA and FedAvg [20] focus on integrating the weights of model parameters. Both training and aggregation are affected by the integration rate and training intensity. Many new techniques for model clustering have been proposed to improve classification, such as combining new and existing features [21] or identifying standard client models [22]. Several studies use alternative global models, such as Federated Cloning-and-Deletion (FedCD), to improve convergence [23].
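To make the client-ratio hyper-parameter concrete, the sketch below assumes a pool of 100 clients and samples round(ratio × pool) distinct clients uniformly at random for one round; with the ratios used in the experiments (0.1, 0.15, and 0.3) this yields 10, 15, and 30 clients. The function name and seeds are hypothetical.

```python
import random
from typing import List


def select_clients(pool_size: int, ratio: float, rng: random.Random) -> List[int]:
    """Sample round(ratio * pool_size) distinct client IDs (at least one)."""
    m = max(1, round(ratio * pool_size))
    return sorted(rng.sample(range(pool_size), m))


if __name__ == "__main__":
    rng = random.Random(7)
    for ratio in (0.1, 0.15, 0.3):
        chosen = select_clients(100, ratio, rng)
        print(ratio, len(chosen))  # 10, 15 and 30 clients, respectively
```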
Researchers have mainly focused on model aggregation to make FL adaptable to non-IID client data [24]. However, local training has a significant impact on the model's accuracy. In this study, a novel solution based on a genetic algorithm is proposed for hyper-parameter tuning to improve model aggregation within a cluster, as illustrated in Figure 1. The proposed genetic CFL model involves the following steps:
• The clients are grouped based on their hyper-parameters, thereby increasing the learning efficiency per training unit.
• A genetic algorithm is used to tune the hyper-parameters and improve model aggregation within a cluster.
The genetically optimized FL approach is a novel method for enhancing COVID-19 detection and improving AI model efficiency and performance. In this study, we create a genetically optimized FL system architecture to detect COVID-19. Compared with the basic FL technique, the proposed technique is more accurate and preserves privacy.
The rest of the article is organized as follows: Section 2 presents the literature survey. Section 3 details the proposed framework and its methodology. Section 4 discusses the experimental results. The final section presents the conclusions and directions for future research.

Literature Survey
This section surveys the current literature on FL, clustering, and evolutionary algorithms in order to understand their limitations.

Federated Learning
Recent studies have focused on FL as a distributed and edge AI architecture [25,26]. In a heterogeneous environment containing non-IID data, FL's decentralized nature is at odds with traditional AI algorithms, which are centralized. Many novel approaches have tried to address the aggregation of non-IID data with various aggregation algorithms, including FedMA [20], feature fusion [21], and grouping of similar client models [22], to obtain more personalized and accurate results. The clustering process uses client-model similarity [27] and enables efficient communication that improves data generalization [28].
Clustering helps optimize communication in FL. In a realistic scenario with thousands of nodes, model convergence may be significantly slowed. Cluster-partitioning algorithms such as k-means clustering [29] require a predetermined number of clusters, which is often not feasible because the number of client groups is not known in advance. Clustering based on generative adversarial networks and agglomerative hierarchical clustering [18] are examples of methods that do not require a predefined number of clusters.
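To illustrate why agglomerative hierarchical clustering does not need a predetermined number of clusters, the sketch below repeatedly merges the closest pair of client clusters until no pair is closer than a distance threshold, so the number of clusters emerges from the data. Single linkage, Euclidean distance over flattened model weights, and the function name are assumptions chosen for simplicity.

```python
import math
from typing import List


def agglomerative_clusters(vectors: List[List[float]], threshold: float) -> List[List[int]]:
    """Single-linkage agglomerative clustering with a distance threshold.

    Returns clusters as sorted lists of client indices. Merging stops when
    the closest pair of clusters is farther apart than `threshold`.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Single linkage: distance between the closest members of two clusters.
        return min(math.dist(vectors[i], vectors[j]) for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break  # no remaining pair is close enough to merge
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so position x is unaffected
    return clusters


if __name__ == "__main__":
    # Hypothetical flattened client weights forming two obvious groups.
    clients = [[0.10, 0.20], [0.12, 0.18], [0.90, 1.00], [0.88, 1.05], [0.11, 0.21]]
    print(agglomerative_clusters(clients, threshold=0.2))  # [[0, 1, 4], [2, 3]]
```

Lowering the threshold yields more, smaller clusters; k-means would instead require the value 2 to be supplied up front.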

Evolutionary Algorithms
The choice of a model's hyper-parameters determines its ability to learn from data. Many researchers are working on optimizing AI models and their parameters using evolutionary algorithms [30], such as genetic algorithms [31] and whale optimization [32].
Ensemble models that combine evolutionary algorithms with deep learning techniques have become increasingly popular for optimization tasks [33].
The use of evolutionary algorithms with FL has not yet been fully explored. Because of data ambiguity, hyper-parameter tuning is even more critical in FL. Optimization algorithms help tune these parameters beyond what is possible manually. A genetic algorithm can be used to optimize the learning rates and batch sizes of the individual end-device models. Agrawal et al. [34] showed that FL is restricted by the efficiency of client training, which involves selecting hyper-parameters effectively, adjusting the model, and streamlining the procedure. FedTune automatically tunes FL hyper-parameters during model training based on application training preferences [35]. Tuning multiple hyper-parameters to meet diverse training preferences can be challenging, particularly when several aspects of the system have to be optimized. FedEx estimates gradients based on client-side hyper-parameter distributions in federated settings [36]; this approach uses weight-sharing methods from neural architecture search. The training process for FL must aim not only at high accuracy but also at reducing training time and resource consumption in practical environments with low-capacity computing devices [37,38]. In FL, a best-epoch algorithm is used to determine how many epochs are necessary per training round. A summary of the key findings from the above discussion can be found in Table 1.

Table 1. Summary of key findings from the literature.

| Ref. | Technique | Key findings | Limitations / challenges |
|---|---|---|---|
| … | … | … | Privacy, data bias |
| [21] | Feature fusion mechanism | The accuracy and generalization abilities of FedFusion outperform baselines while reducing communication rounds by more than 60 percent. | The issue of high communication costs must be addressed. |
| [22] | Iterative Federated Clustering Algorithm (IFCA) | Under proper initialization, the convergence rate of the population loss function ensures both convergence of the training loss and generalization to test data. | Data heterogeneity remains to be addressed. |
| [27] | Multi-center aggregation mechanism | The proposed objective function is optimized using the Federated Stochastic Expectation Maximization method (FeSEM). | Data heterogeneity remains to be addressed. |
| [31] | Genetic algorithms | Convolutional neural networks can be efficiently tuned by using a variable-length genetic algorithm. | Networks with fewer layers may be too small for the problem, resulting in underfitting. |
| [32] | Whale optimization algorithm (WOA) | The WOA was applied to train multilayer perceptrons (MLPs) because of its strong avoidance of local optima and fast convergence speed. | Slow convergence and stagnation in local optima are the main disadvantages of conventional training algorithms. |
| [39] | Clustered federated learning | Proposes a collaborative learning framework that intelligently processes visual data at the edge device with a multi-modal ML algorithm capable of diagnosing COVID-19 from both X-ray and ultrasound images. | The major challenge is the performance of CFL when the number of samples per client varies. |
| [13] | Clustered federated learning | Addresses suboptimal results when the local clients' data distributions diverge by separating the client population into groups based on pairwise cosine similarities. | Some suspicious clients are separated from the main cluster only after a few rounds, which poses a major challenge. |
| This paper | Genetic CFL algorithm | Genetic algorithms are used to optimize hyper-parameters, such as the batch size and learning rate, of the clustered FL models. | Client training has a significant impact on FL efficiency. |

Proposed Methodology
This section describes the genetic CFL optimization technique using a comprehensive statistical method. The workflow has two phases: the first broadcasting round, represented by Algorithm 1, which determines the number of clusters, and federated training with a genetic algorithm, represented by Algorithm 2. This section mainly describes how the algorithm behaves under various hyper-parameters, including the number of iterations, the client ratio (n), the minimum number of samples, the batch size, and the learning rate (η). Table 2 lists most of the symbols used in the algorithms.

Algorithm 2.
    …
    8:  for k ← 2 to size(η_N) do
    9:      parent_A, parent_B ← sampling(0, size(η_N))
    10:     …
    11: return η_temp
    12: function EVOLVE(losses_N, η_N)
    13:     losses_N, order ← sort(losses_N)
    14:     η_N ← sort(η_N, order)
    15:     return CROSSOVER(η_N)
    16: procedure TRAIN
    17:     len ← size(clusters)
    18:     ind ← 0 to len
    19:     Assign η_global with structure (len, size(clusters[ind]))
    20:     clusters_unique ← Identical(clusters)
    21:     for i ← 0 to iterations do
    22:         for k ← 0 to dimensions(clusters) do
    23:             …

Table 2. Symbols used in the algorithms.

| Symbol | Description |
|---|---|
| … | Weights of models |
| $w_0^n$ | Weights of models of the nth client |

Dataset Description
The dataset used in this work is taken from the Kaggle repository https://www.kaggle.com/datasets/mykeysid10/covid19-dataset-for-year-2020?select=covid_data_2020-2021.csv (accessed on 26 July 2022). It has 10 attributes, among which the attribute Corona result indicates whether a person's COVID-19 result is positive or negative. Table 3 gives an overview of each column used in our implementation.

The objective of Algorithm 1 is to identify the characteristics of each edge device's data individually without violating its security. The server model $w_0$ is broadcast to all $N$ clients, $C \subseteq \{C_0, C_1, \ldots, C_{tot}\}$. Along with the server model, three different learning rates $\eta$ are broadcast; they are selected from the array $\eta_m$, whose values range from 1e-1 to 1e-5. The number of samples is also chosen randomly, and a larger number of samples can improve training accuracy. Each edge device receives $w_0$, duplicates it for every $\eta$ value, and trains each copy independently for one full iteration. Data characteristics such as complexity, size, ambiguity, and variance are unique to each edge device. Because these characteristics affect training, the hyper-parameters $\eta$ must be selected carefully. Of the three models on an edge device, only the one with the least loss, $w_0^{min}$, is selected. Every edge device returns $w_0^{min}$, $\eta_{min}$, and $losses_{min}$. These statistics are important because they represent the data on the respective edge devices.
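The per-device step of Algorithm 1 can be sketched as follows: the device copies $w_0$ once per broadcast learning rate, trains each copy for one pass over its local data, and returns only the lowest-loss model together with its learning rate and loss. The one-parameter linear model, the squared-error loss, the toy data, and the helper names are assumptions for illustration; the paper itself trains an ANN.

```python
import random
from typing import List, Tuple


def local_train(w: float, data: List[Tuple[float, float]], lr: float) -> Tuple[float, float]:
    """One SGD pass on a toy model y ~ w * x; returns (new_w, mean squared error)."""
    for x, y in data:
        grad = 2 * (w * x - y) * x  # derivative of (w*x - y)^2 with respect to w
        w -= lr * grad
    loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
    return w, loss


def device_round(w0: float, data: List[Tuple[float, float]], learning_rates: List[float]):
    """Train one copy of w0 per learning rate; return (w_min, eta_min, loss_min)."""
    results = []
    for eta in learning_rates:
        w, loss = local_train(w0, data, eta)  # every copy starts from the same w0
        results.append((w, eta, loss))
    return min(results, key=lambda r: r[2])  # keep only the least-loss model


if __name__ == "__main__":
    rng = random.Random(0)
    eta_m = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]  # learning-rate array from 1e-1 to 1e-5
    etas = rng.sample(eta_m, 3)              # three learning rates sent with w0
    data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]  # private device data
    w_min, eta_min, loss_min = device_round(0.0, data, etas)
    print(etas, w_min, eta_min, loss_min)
```

Only the triple (w_min, eta_min, loss_min) leaves the device; the raw (x, y) pairs never do.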
The models $\{w_0^0, w_0^1, \ldots, w_0^n\}$, the learning rates $\{\eta_0, \eta_1, \ldots, \eta_n\}$, and their respective losses are collected at the server. The server model is built by integrating the edge device models with the model aggregation technique. The model weights $w_0^n$ are first added iteratively:

$$w_{sum} = \sum_{j=0}^{n} w_0^{j}$$

The output of the aggregation is then divided by the number of clients $N$:

$$w_0 = \frac{w_{sum}}{N}$$

In phase 2, every edge device is assigned a cluster ID, as shown in Algorithm 2. This component runs inside the algorithm's main control loop, which continues for $i$ iterations. In each i-th cycle:
1. …
2. clients receive optimized hyper-parameters from their cluster-based servers;
3. every client is trained using its set of parameters;
4. the client models are combined to produce the most effective server model.
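The server-side grouping and aggregation can be sketched as below. Clients whose Algorithm 1 run returned the same $\eta_{min}$ are given the same cluster ID, which is one possible reading of the Identical(·) step in Algorithm 2 and is stated here as an assumption; each cluster's models are then averaged by summing the weights and dividing by the number of clients, as described above. Model weights are flat lists of floats, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def assign_clusters(eta_min: List[float]) -> List[int]:
    """Give clients with identical eta_min the same cluster ID (0, 1, ...)."""
    ids: Dict[float, int] = {}
    # setdefault evaluates len(ids) before inserting, so new values get the next ID.
    return [ids.setdefault(eta, len(ids)) for eta in eta_min]


def average(models: List[List[float]]) -> List[float]:
    """Add the client weight vectors element-wise, then divide by the client count."""
    total = [0.0] * len(models[0])
    for w in models:  # iterative summation of the client weights
        total = [t + x for t, x in zip(total, w)]
    return [t / len(models) for t in total]


def aggregate_per_cluster(models: List[List[float]], cluster_ids: List[int]) -> Dict[int, List[float]]:
    """Return one aggregated server model per cluster ID."""
    groups: Dict[int, List[List[float]]] = defaultdict(list)
    for w, cid in zip(models, cluster_ids):
        groups[cid].append(w)
    return {cid: average(ws) for cid, ws in groups.items()}


if __name__ == "__main__":
    eta_min = [1e-2, 1e-3, 1e-2, 1e-3]
    models = [[1.0, 2.0], [0.0, 4.0], [3.0, 2.0], [2.0, 0.0]]
    ids = assign_clusters(eta_min)
    print(ids)                                 # [0, 1, 0, 1]
    print(aggregate_per_cluster(models, ids))  # {0: [2.0, 2.0], 1: [1.0, 2.0]}
```

Replacing the plain mean with a mean weighted by each client's sample count would give FedAvg-style aggregation; the text above describes division by the number of clients.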
Each cluster has its own range of hyper-parameters, personalized to the edge devices that belong to it. These aggregated parameters are updated in every i-th training iteration. During genetic optimization, the hyper-parameters are moved toward the ideal range in each iteration. Every iteration changes the contents of $\eta_{global}[k]$, which stores the learning rates of cluster $k$. This array has shape $\langle C, \mathrm{size}(C_i) \rangle$, where $C$ is the set of clusters, $C_i$ is the i-th cluster, and $\mathrm{size}(C_i)$ is the number of edge devices in the i-th cluster. The losses of a cluster, with shape $m_0$, are used to sort its hyper-parameters.
After sorting, new individuals are produced through crossover and mutation. The best performers pass their genes on to the next generation, while the others are produced by mating members of the previous generation. Natural selection then determines whether the updated learning rates $\eta_{new}$ are retained fully or partially, so the $\eta$ values obtained from successive generations can differ slightly. The modified parameters are derived from Equation (5), in which $P_A, P_B \in [0, 9]$ are the positions of the two parents and $f \in [-1, 1]$. After genetic evolution, all devices are configured with the hyper-parameters of their cluster. The training process repeats model aggregation, genetic optimization, and training for $i - 1$ iterations before a new epoch begins.
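A sketch of the EVOLVE and CROSSOVER steps of Algorithm 2 for each cluster follows: the learning rates in $\eta_{global}[k]$ are sorted by their clients' losses, the two best are kept unchanged (the crossover loop in Algorithm 2 starts at k = 2), and every other slot is filled by a child of two randomly sampled parents. The exact combination rule is Equation (5); the rule used below, the parents' mean scaled by (1 + 0.5 f) with f drawn uniformly from [-1, 1], is a hypothetical stand-in chosen only so the child stays positive.

```python
import random
from typing import List


def crossover(eta_sorted: List[float], rng: random.Random) -> List[float]:
    """Keep the two best learning rates; breed the remaining slots from random parents."""
    eta_temp = list(eta_sorted)
    for k in range(2, len(eta_sorted)):
        # Sample parent positions P_A, P_B from the whole sorted population.
        p_a = rng.randrange(len(eta_sorted))
        p_b = rng.randrange(len(eta_sorted))
        f = rng.uniform(-1.0, 1.0)
        # HYPOTHETICAL child rule (stand-in for Equation (5)): parents' mean
        # scaled by a factor in [0.5, 1.5], so the learning rate stays positive.
        eta_temp[k] = 0.5 * (eta_sorted[p_a] + eta_sorted[p_b]) * (1 + 0.5 * f)
    return eta_temp


def evolve(losses: List[float], etas: List[float], rng: random.Random) -> List[float]:
    """Sort learning rates by ascending loss, then apply crossover."""
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return crossover([etas[i] for i in order], rng)


if __name__ == "__main__":
    rng = random.Random(42)
    # eta_global[k]: learning rates of the clients in cluster k (ragged per cluster).
    eta_global = [[1e-2, 1e-3, 5e-3, 1e-4], [1e-1, 5e-2, 1e-2]]
    losses = [[0.40, 0.35, 0.50, 0.90], [0.70, 0.20, 0.30]]
    eta_global = [evolve(l, e, rng) for l, e in zip(losses, eta_global)]
    print(eta_global)
```

Keeping the top two learning rates unchanged is a simple form of elitism; the sketch does not model how parent positions are restricted to [0, 9].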
In this work, an artificial neural network (ANN) is used to classify the data in each cluster. The ANN has three layers: an input layer, a hidden layer, and an output layer. The hidden layer uses the ReLU activation function, the output layer uses the sigmoid activation function, and the network is optimized with Adam.
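A minimal pure-Python sketch of such a three-layer ANN follows: a ReLU hidden layer, a sigmoid output for the binary Corona result, a binary cross-entropy loss, and Adam updates. The hidden size, learning rate, Adam constants (the common defaults beta1 = 0.9, beta2 = 0.999, eps = 1e-8), the toy data, and all names are assumptions; in practice a deep-learning framework would normally be used.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyANN:
    """Input layer -> one hidden layer (ReLU) -> one output neuron (sigmoid)."""

    def __init__(self, n_in: int, n_hid: int, seed: int = 0):
        rng = random.Random(seed)
        self.n_in, self.n_hid = n_in, n_hid
        # Flat parameter vector laid out as W1 (n_hid*n_in), b1 (n_hid), W2 (n_hid), b2 (1).
        size = n_hid * n_in + 2 * n_hid + 1
        self.p = [rng.uniform(-0.5, 0.5) for _ in range(size)]
        self.m = [0.0] * size  # Adam first-moment estimates
        self.v = [0.0] * size  # Adam second-moment estimates
        self.t = 0             # Adam time step
        self.b1_off = n_hid * n_in          # start of hidden biases
        self.w2_off = n_hid * n_in + n_hid  # start of output weights

    def _forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        p, n_in = self.p, self.n_in
        h = [max(0.0, sum(p[j * n_in + i] * x[i] for i in range(n_in)) + p[self.b1_off + j])
             for j in range(self.n_hid)]
        z = sum(p[self.w2_off + j] * h[j] for j in range(self.n_hid)) + p[-1]
        return h, sigmoid(z)

    def predict(self, x: Sequence[float]) -> float:
        return self._forward(x)[1]

    def train_step(self, batch, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8) -> float:
        """One Adam update on a mini-batch; returns the mean binary cross-entropy."""
        p, n_in = self.p, self.n_in
        grad = [0.0] * len(p)
        loss = 0.0
        for x, y in batch:
            h, out = self._forward(x)
            q = min(max(out, 1e-7), 1 - 1e-7)  # clamp so log() stays finite
            loss -= y * math.log(q) + (1 - y) * math.log(1 - q)
            dz = out - y                        # dBCE/dlogit for a sigmoid output
            grad[-1] += dz
            for j in range(self.n_hid):
                grad[self.w2_off + j] += dz * h[j]
                if h[j] > 0.0:                  # ReLU passes gradient only when active
                    dh = dz * p[self.w2_off + j]
                    grad[self.b1_off + j] += dh
                    for i in range(n_in):
                        grad[j * n_in + i] += dh * x[i]
        n = len(batch)
        self.t += 1
        for k in range(len(p)):
            g = grad[k] / n
            self.m[k] = beta1 * self.m[k] + (1 - beta1) * g
            self.v[k] = beta2 * self.v[k] + (1 - beta2) * g * g
            m_hat = self.m[k] / (1 - beta1 ** self.t)  # bias-corrected moments
            v_hat = self.v[k] / (1 - beta2 ** self.t)
            p[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
        return loss / n


if __name__ == "__main__":
    # Hypothetical toy data: label is 1 when the first two binary features are both 1.
    data = [([a, b, c], float(a and b)) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    net = TinyANN(n_in=3, n_hid=8)
    for epoch in range(300):
        loss = net.train_step(data, lr=0.05)
    print(round(loss, 4), [round(net.predict(x)) for x, _ in data])
```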

Results and Discussion
The objective of this section is to provide an overview of the experiments conducted to evaluate and assess the genetic CFL architecture. Section 4.1 examines how the genetic CFL architecture performs on the COVID-19 dataset and how it compares with the generic FL architecture. Section 4.2 discusses the performance analysis of the genetic CFL architecture.

COVID Dataset Performance for Genetic CFL Architecture
This subsection discusses the training and performance evaluation of the models. COVID-19 dataset samples are first used to train the server model, which is then allocated to clients according to the client ratio. For this experiment, 100 clients are randomly selected, and three client ratios are tested: 0.1, 0.15, and 0.3, so model performance is evaluated with 10, 15, and 30 clients, respectively. Observations from the dataset are assigned at random to each client device, with the aim of making the client observations non-IID and replicating the key features of a real situation. Section 3 describes how the hyper-parameters are genetically modified after two training epochs. Tables 4 and 5 show all such iterations, and Figure 2 plots the best performance in each round. Because the training hyper-parameters cannot be determined in advance, the training and performance of the model are optimized locally, and the training is therefore more personalized [24,40]. After training, the server models learn smoothly and converge faster than typical FL models. Tables 4 and 5 and Figure 2 show the performance of both architectures in terms of accuracy, loss, precision, recall, and F1-score. In every iteration, genetic CFL outperforms generic FL: it achieves higher accuracy and lower loss than the generic FL architecture, which indicates that useful information is aggregated at the server. The loss value is used to train the ANN, whereas accuracy and other metrics such as precision, recall, and F1-score are used to assess the training outcome. Table 6 shows the training accuracy, training loss, validation accuracy, and validation loss of the proposed genetic CFL algorithm on the COVID-19 dataset. The table shows that, in the first round, the maximum training and validation accuracy and the minimum training and validation loss are attained after the first epoch.
After the first epoch, performance decreases, indicating that the genetic CFL algorithm encounters overfitting. Similar behavior can be observed in rounds 2, 3, 4, and 5. Hence, we conclude that one epoch per round is sufficient, which reduces the training time and resource consumption (CPU and memory) of the proposed genetic CFL approach.
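The epoch choice above amounts to selecting, in each round, the epoch with the lowest validation loss, or equivalently stopping once validation loss stops improving. A minimal sketch of both rules follows; the patience value and the loss values in the example are made up for illustration and are not the values reported in Table 6.

```python
from typing import List


def best_epoch(val_losses: List[float]) -> int:
    """Return the 1-based epoch with the lowest validation loss (earliest on ties)."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i]) + 1


def stop_epoch(val_losses: List[float], patience: int = 1) -> int:
    """1-based epoch at which early stopping halts after `patience` epochs without improvement."""
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)


if __name__ == "__main__":
    # Hypothetical validation losses that rise after epoch 1 (overfitting).
    losses = [0.21, 0.24, 0.27, 0.30]
    print(best_epoch(losses))  # 1
    print(stop_epoch(losses))  # 2
```

With patience 1, training halts at epoch 2 and the epoch-1 weights would be kept, which mirrors the one-epoch conclusion.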

Genetic CFL Performance Analysis
The genetic CFL method appears to perform better with larger datasets. Improving the quality of the test results for every sample might therefore help overall hyper-parameter optimization. When several datasets differ in data characteristics and number of data points, an ideal grouping of similar scenarios leads to higher model accuracy. A balance is needed between cluster size and the number of clusters. In the given instance, a well-chosen combination could verify that such methods produce better results in a decentralized design. In a practical application, the number of edge devices is expected to be larger than in an artificial environment, and increasing the number of clients results in improved performance. Genetic CFL optimizes hyper-parameters to increase throughput with a relatively small number of optimization iterations.
The proposed genetic CFL architecture performs better than the regular CFL architecture while using fewer iterations. On the COVID-19 data, our architecture is more efficient in iterative clustering. This shows that the proposed genetic CFL is a flexible and adjustable method for optimizing hyper-parameters. Its advantage over other methods is adaptability: it can be tailored to the dataset and the situation at hand. Most other architectures require considerable manual effort to adjust hyper-parameters: they reset the mechanism and load a new set of parameters for each data analysis task and application. This conventional mechanism incurs both time and resource costs. Furthermore, each client is tested separately, which affects server and client performance. The proposed genetic CFL model ensures service delivery for all client devices while increasing the server model's performance.

Conclusions and Future Directions
In this paper, we used a genetic algorithm to optimize the learning rate and batch size hyper-parameters for clustered FL. To evaluate the performance of the proposed genetic CFL algorithm, we used a COVID-19 dataset. We also discussed the best deployment conditions and the limitations of the algorithm. In the future, we would like to test the genetic CFL model on scalable and real-time datasets. The refinement of model parameters becomes more accurate as the sample size increases, resulting in higher performance in real-time scenarios. We would also like to test the proposed model on several applications, such as recommendation systems, image classification, and natural language processing. Furthermore, time-sensitive techniques could be combined with genetic CFL.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: