Data Quality-Aware Client Selection in Heterogeneous Federated Learning

Shinan Song; Yaxin Li; Jin Wan; Xianghua Fu; Jingyan Jiang

doi:10.3390/math12203229

,

and

¹

School of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China

²

College of Bigdata and Internet, Shenzhen Technology University, Shenzhen 518118, China

³

School of Electronic Information, Sichuan University, Chengdu 610017, China

^*

Author to whom correspondence should be addressed.

Mathematics2024, 12(20), 3229;https://doi.org/10.3390/math12203229

This article belongs to the Special Issue New Trends in Computer Vision, Deep Learning and Artificial Intelligence

Version Notes

Order Reprints

Abstract

Federated Learning (FL) enables decentralized data utilization while maintaining edge user privacy, but it faces challenges due to statistical heterogeneity. Existing approaches address client drift and data heterogeneity issues. However, real-world settings often involve low-quality data with noisy features, such as covariate drift or adversarial samples, which are usually ignored. Noisy samples significantly impact the global model’s accuracy and convergence rate. Assessing data quality and selectively aggregating updates from high-quality clients is crucial, but dynamically perceiving data quality without additional computations or data exchanges is challenging. In this paper, we introduce the FedDQA (Federated learning via Data Quality-Aware) (FedDQA) framework. We discover increased data noise leads to slower loss reduction during local model training. We propose a loss sharpness-based Data-Quality-Awareness (DQA) metric to differentiate between high-quality and low-quality data. Based on the DQA, we design a client selection algorithm that strategically selects participant clients to reduce the negative impact of noisy clients. Experiment results indicate that FedDQA significantly outperforms the baselines. Notably, it achieves up to a 4% increase in global model accuracy and demonstrates faster convergence rates.

Keywords:

heterogeneous federated learning; data quality; loss sharpness; noisy data space

MSC:

68T07

1. Introduction

Federated Learning (FL) is a distributed machine learning approach that enables the utilization of decentralized data while maintaining user privacy in the nearby edge. Conventional FL downloads share global models to local clients, perform training on these models with the client’s unique data, upload the parameter updates, and then synthesize these updates server-side to refine the global model. Despite its promise, FL encounters significant obstacles stemming from statistical heterogeneity. This diversity in local datasets can precipitate skewed learning processes, where the resulting global model may exhibit suboptimal performance when generalized across the spectrum of client data distributions [].

To tackle the challenges posed by statistical heterogeneity in federated learning, current approaches concentrate on issues of client drift and data heterogeneity, a consequence of the non-independent and identically distributed (non-IID) nature and imbalance of data across clients, including (1) creating a standard data sample set or using generative models to generate common data [,,,,,,,]; (2) training personalized local models tailored to the individual objectives of clients [,,]; and (3) using external variables or partial training to bridge the differences between local and global models [,,,].

Despite the effectiveness of these existing methods in handling data heterogeneity, in real-world settings, the data can be unexpectedly diverse, containing low-quality data and extraneous elements outside the global model’s scope, which we refer to as noisy samples (Figure 1). Some studies have acknowledged the challenges posed by low-quality or noisy data, primarily focusing on label noise, such as [,,,]. In contrast, the issue of feature noise receives less consideration. For instance, in federated learning, malicious nodes may introduce adversarial examples detrimental to the training process, or some clients face covariate shifts due to environmental factors. As shown in Figure 1, the presence of noisy nodes not only substantially diminishes the accuracy of the global model (

61 % \to 58 %

) but also notably slows down its convergence rate (FedDQA, by selectively aggregating cleaner nodes, achieved a

60 %

accuracy in just 10 rounds, whereas a random node selection strategy, which included a more significant number of noisy nodes, failed to reach

60 %

accuracy even after 50 rounds).

Figure 1. Under heterogeneous federation learning, the correctness of noisy datasets with different weights can vary significantly. This example demonstrates the simple federation learning process with a noisy data space. The experimental results show that when the proportion of noise in the client model is small, the accuracy is

61 %

(FedDQA); however, when a larger proportion of noise is mixed in, the accuracy decreases to

58 %

(Random), indicating that the data noise has a large impact on the model and reduces the aggregation performance.

Mitigating the impact of data heterogeneity caused by noisy datasets is crucial in federated learning. Assessing data quality and selectively aggregating updates from high-quality clients offers an intuitive solution to avoid the detrimental effects of noisy data on the global model. A straightforward approach to assessing client data quality is to evaluate each client’s local model using a clean, public dataset. However, this method requires a labeled dataset and consumes additional computational and bandwidth resources. Therefore, dynamically perceiving the quality of each client’s local data during the training process without introducing additional computations or data exchanges and selectively choosing high-quality participating clients is a highly challenging task.

In this paper, we introduce a Federatedlearning via Data Quality-Aware (FedDQA) framework. Through experimental analysis, we investigated the impact of local models on the global model in heterogeneous federated learning. A crucial discovery reveals that increased data noise leads to a slower loss reduction rate during the training process of local neural network models. The fluctuation of the loss function reflects the feature skew from the noisy data, which is formalized as loss sharpness. Based on this insight, we proposed a loss sharpness-based metric to differentiate between high-quality and low-quality data, called Data-Quality-Awareness (DQA) metrics. By strategically identifying the data noise level of the clients according to their DQA, such as light-noise, mixed-noise, and heavy-noise, the server utilizes data of superior quality for reducing the negative impact of noisy clients on the global model. In general, our approach makes the following contributions to the research on heterogeneous federated learning:

First, to the best of our knowledge, we are the first to investigate the feature noisy issue in heterogeneity federated learning. Through in-depth analysis, we discovered an inherent relationship between the loss values of local models and the level of data noise. Based on this insight, we designed a novel loss sharpness metric that dynamically perceives data quality during training.

Second, our proposed FedDQA framework senses the client’s data quality through the loss sharpness metric and strategically selects the participant client. The method can obtain perceptual results with fewer training rounds at the early stage of training and does not require the construction of any public datasets or data exchange.

Third, we conduct experiments on multiple datasets to validate the effectiveness of FedDQA. The results demonstrate that FedDQA achieves superior average accuracy by up to

4 %

more than the baselines.

2. Related Works

2.1. Federated with Data Heterogeneity

Federated Learning (FL) has long been grappling with the issue of data heterogeneity, particularly in the context of non-independently and identically distributed (non-i.i.d) data. The prevalent aggregation technique, FedAVG [], has demonstrated a performance degradation when applied to heterogeneous data across distinct local clients.

Investigations into non-i.i.d issues have primarily revolved around the skewness in label distribution, where non-i.i.d datasets are established by segregating an existing dataset based on labels []. Notable advancements have been made in addressing these limitations. For instance, FedProx [] addressed data heterogeneity by allowing partial information aggregation and supplementing FedAVG with a proximal term. Additionally, FedMA [] introduced an aggregation strategy for non-i.i.d data that disseminates the global model layer-wise. However, while numerous [,,], these techniques often overlook the performance across diverse domains, focusing predominantly on developing an internal model.

Nevertheless, there has been a scarcity of solutions considering non-i.i.d issues stemming from feature shift, a common scenario in medical data collection from different instruments and natural images gathered in varying noisy environments. FedRobust [] and SiloBN [] represent recent attempts at addressing this problem, with the former assuming an affine distribution shift and the latter demonstrating improved robustness to data heterogeneity through local clients maintaining some untrainable batch normalization (BN) parameters. However, both approaches have their shortcomings. FedRobust’s effectiveness is hampered when the explicit affine transformation cannot be estimated. SiloBN, despite its empirical success, lacks a theoretical foundation. An alternative approach, FedBN [], opts for strictly local retention of all BN parameters.

Recent studies have also explored unsupervised domain adaptation for target domain [,] and domain generalization on unseen domains []. However, these methods typically require exhaustive data collection in the target domains and make optimistic assumptions about performance on unknown domains.

2.2. Federated Client Selection

The objective of client selection in federated learning is to identify the most suitable clients for participation in each iterative cycle of the learning procedure. This crucial step is designed to effectively address and mitigate various challenges arising from heterogeneous conditions in the federated learning environment. Several client selection strategies have been developed in the literature to counteract the bias inherently introduced by non-IID.

One such approach is the experience-driven control framework, Favor [], which innovatively recasts the problem of device selection for federated learning as a deep reinforcement learning problem. The aim is to cultivate an agent capable of learning an optimal selection policy. This methodology allows for actively selecting the most suitable subset of clients for participation in federated learning iterations, thereby improving model performance. In situations where the client data distribution is uneven, class imbalance can arise. Yang et al. [] address this issue by developing an estimation scheme to discern the client class distribution. This is achieved by examining the gradient of client service updates while excluding the original data. In tandem with this, they designed a client selection algorithm geared towards minimum class imbalance, thereby enhancing global model convergence. FedCor [] put forth a correlation-based client selection strategy. By utilizing a Gaussian process to model loss changes of clients, they select one client in each iteration to minimize the overall loss expectation. RIPFL [] is an alternative perspective on client selection for collaborative training. The authors argue that beyond considering the individual characteristics of clients, it is vitally important to examine the synergy between clients. This holistic view can further optimize the collaboration among selected clients, enhancing the overall federated learning process.

3. Notations and Preliminaries

This section defines the data domain and model for heterogeneous federated learning.

3.1. General FL Framework

In the vanilla federated learning, the global objective across N clients is to obtain a global model

ω^{*}

training over the global dataset

D = ⋃_{i \in [N]} D_{i}

:

ω^{*} = arg min_{w} L (ω) = \sum_{i = 1}^{N} \frac{|D_{i}|}{| D |} L_{i} (ω) = \sum_{i = 1}^{N} p_{i} L_{i} (ω)

(1)

where

ω

is the parameter of the global model,

L (ω)

is the empirical loss on the global dataset

D

, each client i corresponds to a private training data

D_{i}

and a local model

ω_{i}

. A sample-label pair is denoted as

(x, y)

where

x

is the raw input data point and y is its label.

|D_{i}|

is the number of data samples on

D

,

L_{i} (ω) = E_{(x, y) \in D_{i}} l (ω; (x, y))

is the local empirical loss on the dataset

D_{i}

of client i.

In light of privacy concerns and communication restrictions prevalent in the federated learning (FL) paradigm, algorithms often incorporate the presumption of partial client participation and execute localized model updates. Specifically, during the communication round t, a subset of clients, denoted as

K_{t}

, is selected from the total client set N. The size of this subset,

| K_{t} | = C

, is less than or equal to the total number of clients N.

Only the clients within

K_{t}

are chosen to receive the global model weights

ω_{t}

and independently conduct training iterations on their local datasets. Upon completion of local training, the server retrieves the updated models from the selected clients. It aggregates them, typically utilizing a weighted averaging approach [], to derive the updated global model

ω_{t + 1}

.

This procedure can be formally articulated as follows:

\begin{matrix} ω_{i}^{t + 1} & = ω^{t} - η_{t} \tilde{\nabla} l_{i} (ω^{t}) \end{matrix}

(2)

\begin{matrix} ω^{t + 1} & = \frac{1}{C} \sum_{i \in K_{t}} ω_{i}^{t + 1} \\ = ω^{t} - \frac{η_{t}}{C} \sum_{i \in K_{t}} \tilde{\nabla} l_{i} (ω^{t}), \end{matrix}

(3)

Here,

η_{t}

represents the learning rate, and

\tilde{\nabla} l_{i} (ω^{t})

is interpreted as the equivalent cumulative gradient [] in the t-th communication round. More precisely, for an arbitrary optimizer operating on client i, it generates

Δ w_{i}^{t, e} = - η d_{i}^{t, e}

as the local model update at the e-th epoch within the specific round. The cumulative gradient is subsequently computed as

\tilde{\nabla} l_{i} (w^{t}) = \sum_{e} d_{i}^{t, e}

.

In the pursuit of preserving data privacy and preventing potential data leakage, every client must maintain the confidentiality of its raw data, refraining from sharing it with other clients. The Federated Averaging (FedAVG) algorithm, is a strategy proposed to facilitate cooperative global model training across multiple clients, orchestrated by a central server, while ensuring rigorous data privacy measures [].

3.2. Heterogeneous Federated Learning

Data Heterogeneity: the data distribution of client i is denoted as

P_{i}

. For any client i and client j, the data heterogeneity refers to the

P_{i} (x) \neq P_{j} (x)

while

P_{i} (y | x) = P_{j} (y | x)

.

Recent studies have demonstrated the existence of a divergence between the local model trained by each client on its dataset and the global model derived directly from the aggregated dataset [,]. If this inter-client drift is not addressed, the server is prone to producing a skewed aggregation of the global model. Federated learning faces the challenge of accommodating heterogeneous data distributions among clients. FedAVG suffers significantly degraded performance when facing non-independent and identically distributed (non-IID) data in federated settings. This underscores that neglecting local drift can lead the global model astray from the optimal solution.

Figure 2 provides a toy example where local divergence arising from disparate local data partitions can bias the aggregated outcome of FedAVG. We assume there is a non-linear transformation function f, for instance,

s i g m o i d

activation layer in the model. There are three clients with local parameters named

θ_{1}, θ_{2}, θ_{3}

.

ω_{c}

is the optimal model parameter,

ω_{12}

(

ω_{23}

) is the global model parameters aggregated by

θ_{1}

and

θ_{2}

(

θ_{2}

and

θ_{3}

) using FedAVG. x is the input data point, and the outputs on client 1 are

y_{1} = f (θ_{1}, x)

and

y_{2} = f (θ_{2} x)

on client 2. Although

f (ω_{12}, x) \neq \frac{y_{1} + y_{2}}{2} \neq f (ω_{c}, x)

, the figure illustrates that compared to

ω_{23}

,

ω_{12}

is closer to the optimal global model parameter.

Figure 2. (a) Illustration of the local drift in FedAVG with a sigmoid activation function f.

ω_{c}

is the parameter of the model trained with centralized data (optimal model);

ω_{12}

and

ω_{23}

are the parameters of the model generated by FedAVG.

θ_{1}

,

θ_{2}

, and

θ_{3}

are the parameters of local models of client 1, client 2, and client 3, respectively. (b) In the case of natural data distribution, more clients are aggregated in each round, obtaining better generalization and, eventually, higher accuracy. For a noisy data environment, aggregating too many clients means aggregating to those clients that are heavily noisy, which results in lower accuracy instead.

Therefore, we can infer that the existence of data heterogeneity among clients can result in the deviation of aggregated global parameters from the optimal ones. However, the effect varies depending on the client selection strategy employed, indicating that different strategies may lead to different outcomes. For example, choosing

ω_{12}

may converge faster than

ω_{23}

.

We choose the Emnist dataset [] and the FedAVG method for a simple experiment to verify the impact of noisy clients in the federated aggregation. We split the training dataset into 20 parts in the form of independent homogeneous distributions. In the natural setting, clients do not undergo any additional processing. For the noisy setting, different levels of Gaussian noise

Δ

are added to each copy of the data and assigned to different clients i, totaling 20 clients. Specifically,

Δ \sim N (μ = 0, σ^{2})

, when

σ \sim i \times \frac{1}{C}, i \in (1, | C |)

.

Suppose we know in advance how noisy the clients are in this experiment. Therefore, when performing federated learning, we adopt a different client selection strategy, where Top

k %

denotes the selection of the top

k %

of client IDs in each round of aggregation, and Random

k %

denotes the random selection; the experimental results are shown in Figure 2b. According to the experimental results, it can be seen that in the case of nature’s data distribution, the more clients are aggregated in each round, the better the generalization and eventually higher accuracy, while randomly selecting

50 %

clients for aggregation in each round does not make much difference to the effect from the effect of aggregating the top

90 %

. However, for a noisy data environment, aggregating too many clients means aggregating to those heavily noisy clients, which results in lower accuracy. Random aggregation still cannot avoid aggregating to heavily noisy clients.

4. Loss Sharpness in Federated Learning

Suppose there exist data

D_{i}

, a model inference process

f (ω_{i}, x_{i}) = y_{i}^{'}

for client i, where the model weights

ω_{i}

come from the supervised training process:

T (D_{i}) \to ω_{i}

, then for this training process, there is a corresponding loss function:

l (ω_{i}; (x_{i}, y_{i})) \in R^{+}

, abbreviated as

l (ω_{i}, D_{i})

.

We let

D_{i}

be the IID testing dataset, and there is an inference accuracy

A C C (ω_{i}, D_{i})

:

A C C (ω_{i}, D_{i}) = \frac{| {(x_{j}, y_{j}) \in D_{i} | f (ω, x_{j}) = y_{j}} |}{n}

(4)

Assuming that there is noise

Δ = {δ_{j}}_{j = 1}^{n}, δ_{j} \sim R^{+}

, then the data

D_{i}

with noise can be expressed as

D_{i}^{'}

:

D_{i}^{'} = D_{i} + Δ = {(x + δ), y^{'}}

(5)

For noisy data, there is a neural network model training process

T (D_{i}^{'}) \to ω_{i}^{'}

and a corresponding inference process

f (ω_{i}^{'}, x^{'}) = y^{″}

.

As a supervised machine learning training process, the model parameters obtained from the training process are an abstraction of the training data space. An intuitive understanding is that the closer the feature distribution of the test dataset is to that of the training dataset, the higher the inference accuracy of the model. Conversely, a large amount of out-of-distribution data in the test data poses a challenge to model inference. However, the noisy data change the data feature distribution of the original dataset to a certain extent, i.e., feature skew occurs for the noisy data. For the original data

x

, the inference results of the model

f (ω_{i}, x)

, which is trained with the same distribution of the data, are better than the model

f (ω_{i}^{'}, x)

, which is trained with the noisy data, i.e.,

A C C (ω_{i}^{'}, \bar{D_{i}})

is better than the model

f (ω_{i}^{'}, x)

trained with the noisy data, i.e.,

A C C (ω_{i}, \bar{D_{i}}) > A C C (ω_{i}^{'}, \bar{D_{i}})

. Similarly, for the model

f (ω, x)

, inference using data x with the same distribution yields better inference results than model inference

f (ω_{i}, x^{'})

using noisy data

x^{'}

, i.e.,

A C C (ω_{i}, \bar{D_{i}}) > A C C (ω_{i}, {\bar{D_{i}}}^{'})

. To obtain better inference results in the natural data space, federated learning tends to select models obtained from clients’ training of data whose feature skew is less deviated.

The accuracy of the model inference is affected due to the noise that makes the original data feature skew. Since there is some positive correlation between test accuracy and loss function value, such a feature skew may also be reflected in the matter and volatility of the loss function. We verify such a phenomenon using a simple experiment, as shown in Figure 3. In the figure, the loss function values of different noisy data show other distribution characteristics during the training process compared with natural data.

Figure 3. The figure gives the loss values for each epoch during the training process (data source: CIFAR-10 and CIFAR-10-C). Even when using neural networks with the same structure, data with different types of noises (or nature) show other distributions and trends in the loss values during the training process.

In the following, we use formal definitions to describe this phenomenon. First, the loss function for one training epoch can be denoted as follows:

L_{i} (ω_{i}, D_{i}) = E_{(x, y) \in D_{i}} l (ω; (x, y))

(6)

For the noisy data and the corresponding model, the value of the loss function is denoted as follows:

L_{i} (ω_{i}^{'}, D_{i}^{'}) = E_{(x, y) \in D_{i}} l (ω; (x + δ^{'}, y))

(7)

Assuming here that the loss function l measures the difference between the inference result and the actual value, then, when

f (ω_{i}, x) = y

, there is

l (f (ω, x), y) \to 0

.

L_{i} (ω_{i}, D_{i}) = \sim \frac{1}{A C C (ω_{i}, {\bar{D}}_{i})}

(8)

Then, for the training process of the model, there is

L_{i} (ω_{i}^{'}, D_{i}^{'}) > L_{i} (ω_{i}^{'}, D_{i}) > L_{i} (ω_{i}, D_{i})

(9)

Let e epochs be trained at a time. The loss value of training is written as follows:

L_{i} = {L_{i}^{1}, L_{i}^{2}, \dots, L_{i}^{e}}

(10)

We compute the mean value of the loss function variation process L as follows:

M e a n (L_{i}) = \frac{1}{e} \sum_{j = 1}^{e} L_{i}^{j}

(11)

The standard deviation is

S t d (L_{i}) = \sqrt{\frac{1}{e} \sum_{j = 1}^{e} {(L_{i}^{j} - M e a n (L_{i}))}^{2}}

(12)

When noise is added uniformly to all data, it is more difficult to perform accurate inference with noisy data than raw data. As a result, noisy data with lower inference accuracy may have a higher loss function mean. At the same time, the noisy data are less cohesive, which may lead to a slower model optimization process, i.e., having a lower loss function standard deviation. We describe this phenomenon using the coefficient of variation of loss, i.e., the sensation of loss sharpness:

\begin{matrix} C o V (L_{i} (ω_{i}, D_{i})) & = \frac{S t d (L_{i} (ω_{i}, D_{i}))}{M e a n (L_{i} (ω_{i}, D_{i}))} \\ > \frac{S t d (L_{i} (ω_{i}^{'}, D_{i}^{'}))}{M e a n (L_{i} (ω_{i}^{'}, D_{i}^{'}))} \\ = C o V (L_{i} (ω_{i}^{'}, D_{i}^{'})) \end{matrix}

(13)

Loss sharpness is a metric used to assess the quality of data on each client in federated learning by analyzing the variability of the loss function during local training. It is based on the observation that the rate and variability at which the loss decreases during training can indicate the presence of noise in the data. In federated learning, data quality is assessed across clients without requiring direct access to their datasets. It leverages the natural training dynamics of the model. High loss sharpness signifies that the model is effectively learning from clean, informative data, as evidenced by significant fluctuations and reductions in loss during training. However, low loss sharpness indicates potential issues with the data, such as noise or corruption, causing slow and steady loss reduction without substantial learning progress.

By incorporating loss sharpness into the client selection process, the FedDQA framework enhances the federated learning process by improving global model quality and prioritizing client updates with high-quality data, leading to a more accurate and generalizable global model. Furthermore, focusing on clients contributing valuable learning signals helps the global model converge more quickly. The metric relies solely on loss values available during standard training, avoiding the need for additional data exchange or computations.

5. FedDQA

The nature of the loss sharpness is only significant at the beginning of training. As updating proceeds, the coefficient of the loss is likely to gradually converge to a smooth state due to over-fitting, regardless of the data quality. In addition, for the federated learning process, the loss may re-emerge with large fluctuations after aggregation with the global model because of individualized differences. To better perceive the changes in the loss distribution at the beginning of training, we set the slow-start loss threshold

ϵ > 0

. At the beginning of the updating, the federation system waits for the loss function value of all clients to be less than

ϵ

to start model aggregation and federation learning. To avoid individual clients failing to reach the threshold for a long time, it is also agreed that federation learning should be started automatically after a specific time (training epochs). Then, for client k, there is a sequence of loss values when the threshold is reached:

L_{i}^{*} = {L_{i}^{1}, \dots, L_{i}^{λ} | L_{i}^{λ} \geq ϵ, L_{i}^{λ + 1} < ϵ}

(14)

At the same time, we calculate the number of epochs

λ

at which the loss reaches the threshold

ϵ

and the constants

μ_{1}, μ_{2}

as additional constraints to amplify the volatility of the change in the loss to obtain the final DQA formula:

Q_{i} = (\frac{C o V (L_{i}^{*})}{λ})^{μ_{1}} \times μ_{2}

(15)

In the federated learning process, client selection relates to the quality of the global model and affects the synchronization mechanism and network transmission. Due to network instability, client dropouts, etc., assuming all clients are synchronized for aggregation is unrealistic. Usually, some federated frameworks take a randomly selected set of clients for aggregation or randomly arrange some clients to drop out. When network resources are limited, choosing a limited number of high-quality clients also helps to save transmission resources and improve training efficiency. Here, we give the method of selection of clients based on the variation in loss sharpness. Suppose a set of clients

K = {c_{i}}_{i = 1}^{n}

exists. Then, the group of loss sharpness of

K

in the tth updating can be obtained according to:

Q_{K} = {Q_{i}}_{i = 1}^{n}

(16)

where

Q_{i}

is the DQA of client

c_{i}

. ordering

Q_{K}

yields:

Q_{K} = < Q_{u}, Q_{v}, \dots >, Q_{u} \geq Q_{v}, u \in [1, n], v \in [1, n]

(17)

The server aggregates the clients based on the

Q_{K}

order. In contrast, the number of clients participating in each round of aggregation can be dynamically decided in conjunction with the network environment.

The

Q_{i}^{t}

computation occurs on the server side to save the client’s computational resources and standardize the metric. Clients only need to send the server the first round (

t = 1

) of DQA value

Q_{i}^{1}

. This subsection describes a client selection process based on a dynamic threshold to make the client selection more flexible, as shown in Figure 4. The process is motivated by the consideration that not all low-quality clients are harmful. In general, mild noise does not harm the model’s knowledge acquisition and even helps increase the model’s generalization. Only those heavy noises have the potential to poison the model and impair the reasoning of the aggregated model. Therefore, we first sort

Q_{K}

according to quality perception in the client selection. Different selection probabilities

P_{i}

are assigned to clients with other degrees or types of noise, e.g., by posting a higher selection probability to clients with slight noise; such clients have a higher likelihood of being selected in the aggregation process. In addition, the computationally obtained data quality perception

Q_{i}

will also be used as a weighting parameter in the computational process of model aggregation.

Figure 4. Overview of FedDQA. We set the slow-start loss threshold

ϵ

in the local update stage. A set of loss values

L

occurs when the threshold is reached. We calculate the number of epochs

λ

at which the loss reaches the threshold

ϵ

. The volatility of the change in the loss to obtain the final DQA formula consists of the number of epochs

λ

and the loss sharpness value. In the server aggregation stage, the server classifies clients into light-noise, mixed-noise, and heavy-noise according to its DQA

Q_{k}

, calculated by the loss sharpness metric. Then, it selects client k for model aggregation.

According to the above conventions, FedDQA can be described as Figure 4.

For local updating

F^{L}

, there are:

L1: Local training. Train local model $ω_{k}$ by $D_{i}$ .
L2: Data quality-aware with loss sharpness. Perceive client data quality during training and obtain the quality score $Q_{i}$ .

Remote updating

F^{R}

, involves the following:

R1: Client selection. Client selection using quality score $Q_{C} \to P_{C}$ to obtain a subset of users $K$ .
R2: Model aggregation. A global model is aggregated based on client subset $K$ .
R3: Global model broadcast. Distribute the global model parameters obtained from aggregation to clients for the next round of updating.

6. Performance Evaluation

6.1. Evaluation Setup

6.1.1. System Settings and Datasets

This section uses simulation experiments to validate the application of FedDQA based on loss sharpness-aware in noisy data space. First, for the system model,

n = 20

clients are used to participate in federated learning. A total of

\frac{n}{2}

clients participate in model aggregation for each round of updating. Simultaneous updating mode is used, with 5 local and 50 global updating rounds. The datasets used are the grayscale image datasets Emnist (26 classes) [] and Fashion (10 classes) [], the color image dataset CIFAR-10 (10 classes) [], and the noisy color image dataset CIFAR-10-C (10 classes) []. Among them, CIFAR-10-C contains 19 different kinds of noise data, namely: ‘gaussian_noise’, ‘shot_noise’, ‘speckle_noise’, ‘impulse_noise’, ‘defocus_blur’, ‘gaussian_blur’, ‘motion_blur’, ‘zoom_blur’, ‘snow’, ‘fog’, ‘brightness’, ‘contrast’, ‘elastic_transform’, ‘pixelate’, ‘jpeg_compression’, ‘spatter’, ‘saturate’, ‘frost’, ‘glass_blur’. Each noise corresponds to 1 to 5 noise levels. These datasets have many applications in federated learning and data noise studies [,].

For grayscale images, a simple 2-layer fully connected model is used. For color images, a 2-layer CNN model is used. These simple models are more suitable to be deployed in smart IoT devices or mobile devices. The learning rate is

0.1

, and the batch size is 10. Slow-start selection parameters for the FedDQA algorithm are as follow:

β = 0.7, λ = 2.0

. After slow-start client selection, FedDQA only needs to select the client in the

70 %

. Ideally, real-world scenarios with unstable network connections and device resource constraints make it challenging to maintain synchronized model aggregation over long periods. As analyzed in Section 3, aggregating low-quality local models can lead to slow or non-convergence of the global model. In our experiments, client selection strategies prioritize aggregating models trained on high-quality data, resulting in a biased and non-fair aggregation approach. To confirm that the global model is derived from clients with high-quality data, we use natural, noise-free test data for evaluation.

6.1.2. Noisy Environment Setup

In the noisy environment setup, we consider three noise scenarios as follows:

(1): Single noise type and multiple noise levels (1-M). We choose the Emnist and Fashion datasets. We split the training dataset into n parts as independent homogeneous distributions and add different Gaussian noise $Δ$ levels to each data part, assigning them to different clients i. Specifically, there are $Δ \sim N (μ = 0, σ^{2})$ , when $σ \sim i \times \frac{1}{C}, i \in (1, | C |)$ .
(2): Multiple noise types and single noise level (M-1). We choose the CIFAR-10 and CIFAR-10-C datasets. Among them, the noiseless natural data of client0 comes from the dataset CIFAR-10. client1∼client19 each comes with one different type of noise data, and the data come from 19 types of noise (level 5) in CIFAR-10-C.
(3): Multiple noise types and noise levels (M-M). We choose the CIFAR-10 and CIFAR-10-C datasets. Among them, the noiseless natural data of client0 comes from the dataset CIFAR-10. client1∼client9 each carries a different type of noise data, and the data come from 9 types of noise in CIFAR-10-C (level 3). client10∼client19 each carries a different type of noise data from the FedDQA parameter $λ = 1.0$ .

6.1.3. Benchmark Algorithms

We chose a variety of user-selected methods combined with federated learning as a comparison experiment. The underlying federated learning method is FedAVG. Since loss sharpness is a simple data metric, the experiments mainly consider the effects of various independent factors [] and do not select those that utilize a combination of factors for client selection.

(1): Random: This method selects a subset of clients randomly within the available range, which has been widely used in many studies (e.g., [,,]).
(2): POW-D: This method prioritizes clients with lower local losses. The method considers that the smaller the local final loss, the better its training results. This is a biased client selection scheme proposed in study [] and applied in study FedCor [].
(3): WGD: Weights Gradient Differences prioritizes the clients with the most significant parameter gradient of model updates. The method considers that the global model that receives more parameter updates during the local training process is considered to learn more; this metric has been applied in studies such as [,,].
(4): FAST: This method prioritizes the client with the fastest decreasing loss. The method considers that the faster the loss decreases during the local training process, the more content updates it has; this metric is applied in the study [].
(5): DQA: The method used in this paper.

6.1.4. Experimental Settings for Clients Selection

Considering that only those with heavy noise have the potential to poison the model, in the client selection process, we applied straightforward settings to classify clients into three subsets as follows: light-noise, mix-noise, and heavy-noise, based on the quality-aware ranking

{\bar{Q}}^{C}

, notated as

{\bar{C}}^{l}

,

{\bar{C}}^{m}

,

{\bar{C}}^{h}

. Light-noise clients are recommended to participate in model aggregation in this round, while heavy-noise clients are not advised to participate. For the vacancies that do not satisfy the number of candidate clients, a certain number of clients from mix-noise are randomly selected to participate in model aggregation. To distinguish three kinds of clients with different quality perceptions, we set two threshold parameters

(α, β)

, where

α \in (0, 1), β \in (0, 1)

are ranked according to the quality perceptions, and

P^{l}, P^{m}, P^{h}

are the selected probability for each clients set. The specific process is shown in Algorithm 1.

Algorithm 1: Client selection with dynamic thresholds

Data: client set C, DQA ordered set

{\bar{Q}}^{C}

, thresholds parameters

α, β

, client sub set size

κ

Result: selected client sub set

{\bar{C}}^{k}

A = α \times | C |

;

B = β \times | C |

;

select

{c_{i}}_{i = 0}^{A} \to {\bar{C}}^{l}

according to

{\bar{Q}}^{C}

and

P^{l}

;

combine

({\bar{C}}^{l}, {\bar{C}}^{m}, {\bar{C}}^{h}) \to {\bar{C}}^{k}

6.2. Effecient and Convergence Analysis

The FedDQA is designed to dynamically assess data quality based on loss sharpness during local training, allowing it to handle various types of noise in the data, not just simple Gaussian noise. In this subsection, we evaluate FedDQA’s performance in scenarios where the noise in the data is more complex by using the CIFAR-10-C dataset, which introduces a wide range of corruption types that simulate real-world noisy conditions.

In Figure 5, we observe that FedDQA can perceive the data quality better by slow-start client selection and loss sharpness computation compared to other factors of client selection. Combined with Table 1, it can be seen that FedDQA can perceive the data quality stably under different environmental settings with multiple noise levels and multiple noise types. By aggregating more clients of high-quality data, the obtained global model performs even better on natural datasets. The experimental results demonstrate that FedDQA consistently obtains higher accuracy on different noise sets. This shows that FedDQA performs robustly in scenarios where the noise in the data is more complex than simple Gaussian noise. The experiments using the CIFAR-10-C dataset, including a diverse set of complex corruptions, demonstrate that FedDQA handles heterogeneous and complex noise conditions effectively.

Figure 5. Analysis of the convergence of the global model. Compared to the baseline, FedDQA obtained more stable training results with faster convergence and higher accuracy.

Table 1. Classification accuracy (%) and corresponding standard deviations of FedDQA compared to other client selection methods on different datasets. Bold numbers have the highest accuracy.

6.3. Flexibility in Parameter Choices

To test the scalability of DQA, we evaluate the DQA accuracy at client choice by setting different parameters

λ

. Table 2 shows the accuracy for the dataset Emnist and the CIFAR-10-C task when 50 rounds of training are performed using different

λ

. Expressly, we set a threshold for the parameter

λ

, train on the dataset using FedDQA, and evaluate the performance by using the accuracy. We can observe that when the threshold parameter

λ

is varied, the accuracy rate does not change significantly. FedDQA achieves robust and consistent results across a range of parameter values, making it practical for real-world federated learning scenarios where exact parameter tuning may be challenging. This robustness ensures that FedDQA can be effectively applied without extensive parameter optimization, allowing for greater flexibility and ease of use in heterogeneous federated learning environments.

Table 2. Impact of slow-start metric

ϵ

on results. Better accuracy can still be obtained using higher

ϵ

, which further proves that the loss sharpness phenomenon is evident at the beginning of training.

6.4. Impact of Feature Skew

In this subsection, we further discuss the impact of noisy clients. Noise is essentially a change to the data domain. In federated learning, the convergence of the global model is affected when the model obtained by training different environments is aggregated. Figure 6 shows the performance under different noisy clients when there are 20 clients in the dataset Emnist. We make the following main statements: First, our proposed FedDQA client selection method can outperform other benchmarks in most settings. For example, we can visualize that FedDQA can achieve

87 %

accuracy after 50 training sessions when noisy clients = 19, but the accuracy is below

85 %

in other benchmarks. Second, when noisy clients increase, the performance gap between FedDQA and other benchmarks becomes more evident. This means that FedDQA can have higher accuracy against different numbers of noisy clients, which is favorable for its practical application.

Figure 6. Federated learning with noisy data space. FedDQA remains stable in a heavy-noise environment with its rational client selection strategy compared to the baseline.

6.5. Conjunction with Popular Federal Learning Methods

In the personalized federated learning process, the data quality problem is usually regarded as a feature of skew of the data, i.e., client personalization. Usually, these methods achieve model optimization by adjusting the difference between the local model and the global model. However, the noise space remains challenging for these personalized federated learning methods. We select popular personalized federated learning methods as our basic experiments. Relevant methods include (1) FedAVG []: classical federated learning, model aggregation based on average model weights; (2) FedProx []: classical personalized federated learning, used to cope with statistical and system heterogeneity by parameterizing FedAVG; (3) Ditto []: federation learning approach focusing on improving fairness and robustness for heterogeneous clients.

The experimental results are shown in Table 3. The experimental results show that aggregation of the personalized federated learning model is more stable and obtains higher accuracy when combined with the DQA client selection method. For example, for a given dataset CIFAR-10-C (M-M), the accuracy can be improved from

57.89 %

to

59.29 %

,

57.43 %

to

60.23 %

, and

55.29 %

to

59.07 %

for FedAVG, FedProx, and Ditto methods combined with DQA.

Table 3. Comparison of accuracy (%) of popular personalized federated learning combined with FedDQA client selection methods under different CIFAR-10-C datasets. Bold results indicate higher accuracy, which was improved for all of these methods when combined with FedDQA.

7. Conclusions

In conclusion, our proposed FedDQA framework effectively addresses the challenges of feature noise in heterogeneous federated learning by dynamically assessing data quality through loss sharpness metrics during training. This strategic client selection mitigates the negative impact of noisy clients on the global model’s performance and convergence rate and enhances overall model accuracy without additional computations or data exchanges. Furthermore, FedDQA can be adapted for model heterogeneity by incorporating a model compatibility assessment. This could involve evaluating the alignment of local models with the global model in terms of architecture and parameters and adjusting the aggregation process to account for structural differences, possibly through model distillation techniques or layer-wise aggregation strategies. Regarding computational resource heterogeneity, FedDQA can be extended by integrating a resource-awareness component, such as training time per epoch, memory usage, or energy consumption in the client selection process.

Author Contributions

Conceptualization, S.S. and Y.L.; methodology, S.S.; software, S.S.; validation, S.S., Y.L. and J.J.; formal analysis, S.S.; investigation, S.S.; resources, S.S.; data curation, S.S.; writing—original draft preparation, S.S. and Y.L.; writing—review and editing, S.S. and J.J.; visualization, S.S.; supervision, J.W.; project administration, S.S.; funding acquisition, X.F. and J.J. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was supported by Natural Science Foundation of Jilin Province under Grant YDZJ202401610ZYTS and supported by the Research Promotion Project of Key Construction Discipline in Guangdong Province (2022ZDJS112).

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

N	Total number of clients
$D$	Global dataset
$D_{i}$	Local dataset of client i
$D_{i}^{'}$	Noisy dataset of client i
$Δ, δ$	Noise added to data sample
$ω$	Parameters of the global model
$ω^{*}$	Optimal global model parameters
$ω_{i}$	Parameters of the local model on nature data of client i
$ω_{i}^{'}$	Parameters of the model trained on noisy data of client i
$L (ω)$	Empirical loss on the global dataset $D$
$L_{i} (ω)$	Local empirical loss on the dataset $D_{i}$
$L_{i} (ω_{i}, D_{i})$	Loss function for dataset $D_{i}$ with model $ω_{i}$
$L_{i}$	Set of loss values of client i
$K_{t}$	Subset of clients selected in the t-th communication round
$ϵ$	Slow-start loss threshold
$λ$	Number of epochs to reach the loss threshold $ϵ$
$Q_{i}$	Data-Quality-Aware (DQA) score for client i
$Q_{K}$	Ordered set of DQA scores for client set $K$
f	Non-linear transformation function
$P_{i}$	Data distribution of client i
$P_{C}$	Selection probabilities for clients

References

Li, X.; Jiang, M.; Zhang, X.; Kamp, M.; Dou, Q. Fedbn: Federated learning on non-iid features via local batch normalization. arXiv 2021, arXiv:2102.07623. [Google Scholar]
Zhu, Z.; Hong, J.; Zhou, J. Data-Free Knowledge Distillation for Heterogeneous Federated Learning. In Proceedings of the 38th International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 12878–12889. [Google Scholar]
Zhang, J.; Chen, C.; Li, B.; Lyu, L.; Wu, S.; Ding, S.; Shen, C.; Wu, C. DENSE: Data-Free One-Shot Federated Learning. Adv. Neural Inf. Process. Syst. 2022, 35, 21414–21428. [Google Scholar]
Qi, P.; Zhou, X.; Ding, Y.; Zhang, Z.; Zheng, S.; Li, Z. FedBKD: Heterogenous Federated Learning via Bidirectional Knowledge Distillation for Modulation Classification in IoT-Edge System. IEEE J. Sel. Top. Signal Process. 2023, 17, 189–204. [Google Scholar] [CrossRef]
Fang, X.; Ye, M. Robust Federated Learning With Noisy and Heterogeneous Clients. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10072–10081. [Google Scholar]
Zhang, L.; Wu, D.; Yuan, X. FedZKT: Zero-Shot Knowledge Transfer towards Resource-Constrained Federated Learning with Heterogeneous On-Device Models. In Proceedings of the 2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS), Bologna, Italy, 10–13 July 2022; pp. 928–938. [Google Scholar]
Zhang, L.; Shen, L.; Ding, L.; Tao, D.; Duan, L.-Y. Fine-Tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10174–10183. [Google Scholar]
Huang, W.; Ye, M.; Du, B. Learn from Others and Be Yourself in Heterogeneous Federated Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10133–10143. [Google Scholar]
Tang, Z.; Zhang, Y.; Shi, S.; He, X.; Han, B.; Chu, X. Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning. In Proceedings of the 39th International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 June 2022; pp. 21111–21132. [Google Scholar]
Zhang, J.; Hua, Y.; Wang, H.; Song, T.; Xue, Z.; Ma, R.; Guan, H. FedALA: Adaptive Local Aggregation for Personalized Federated Learning. Proc. AAAI Conf. Artif. Intell. 2023, 37, 11237–11244. [Google Scholar] [CrossRef]
He, Y.; Chen, Y.; Yang, X.; Yu, H.; Huang, Y.-H.; Gu, Y. Learning Critically: Selective Self-Distillation in Federated Learning on Non-IID Data. IEEE Trans. Big Data 2022, 1–12. [Google Scholar] [CrossRef]
Marfoq, O.; Neglia, G.; Vidal, R.; Kameni, L. Personalized Federated Learning through Local Memorization. In Proceedings of the 39th International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 June 2022; pp. 15070–15092. [Google Scholar]
Gao, L.; Fu, H.; Li, L.; Chen, Y.; Xu, M.; Xu, C.-Z. FedDC: Federated Learning with Non-IID Data via Local Drift Decoupling and Correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10112–10121. [Google Scholar]
Alam, S.; Liu, L.; Yan, M.; Zhang, M. FedRolex: Model-Heterogeneous Federated Learning with Rolling Sub-Model Extraction. Adv. Neural Inf. Process. Syst. 2022, 35, 29677–29690. [Google Scholar]
Mendieta, M.; Yang, T.; Wang, P.; Lee, M.; Ding, Z.; Chen, C. Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 8397–8406. [Google Scholar]
Qu, Z.; Li, X.; Duan, R.; Liu, Y.; Tang, B.; Lu, Z. Generalized Federated Learning via Sharpness Aware Minimization. In Proceedings of the 39th International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 June 2022; pp. 18250–18280. [Google Scholar]
Tuor, T.; Wang, S.; Ko, B.J.; Liu, C.; Leung, K.K. Overcoming Noisy and Irrelevant Data in Federated Learning. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 5020–5027. [Google Scholar]
Yang, M.; Wang, X.; Zhu, H.; Wang, H.; Qian, H. Federated Learning with Class Imbalance Reduction. In Proceedings of the 2021 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 23–27 August 2021; pp. 2174–2178. [Google Scholar]
Xu, J.; Chen, Z.; Quek, T.Q.S.; Chong, K.F.E. Fedcorr: Multi-Stage Federated Learning for Label Noise Correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10184–10193. [Google Scholar]
Tam, K.; Li, L.; Han, B.; Xu, C.; Fu, H. Federated Noisy Client Learning. IEEE Trans. Neural Netw. Learn. Syst. 2023, 1–14. [Google Scholar] [CrossRef]
McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Aguera y Arcas, B. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, USA, 20–22 April 2017; PMLR: Fort Lauderdale, FL, USA, 2017; pp. 1273–1282. [Google Scholar]
Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and Open Problems in Federated Learning; Now Foundations and Trends: Norwell, MA, YSA, 2021; Volume 14, pp. 1–210. [Google Scholar]
Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated Optimization in Heterogeneous Networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450. [Google Scholar]
Wang, H.; Yurochkin, M.; Sun, Y.; Papailiopoulos, D.; Khazaeni, Y. Federated Learning with Matched Averaging. arXiv 2020, arXiv:2002.06440. [Google Scholar]
Li, Q.; He, B.; Song, D. Model-Contrastive Federated Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10713–10722. [Google Scholar]
Shoham, N.; Avidor, T.; Keren, A.; Israel, N.; Benditkis, D.; Mor-Yosef, L.; Zeitak, I. Overcoming Forgetting in Federated Learning on Non-IID Data. arXiv 2019, arXiv:1910.07796. [Google Scholar]
Dinh, C.T.; Tran, N.; Nguyen, J. Personalized Federated Learning with Moreau Envelopes. Adv. Neural Inf. Process. Syst. 2020, 33, 21394–21405. [Google Scholar]
Reisizadeh, A.; Farnia, F.; Pedarsani, R.; Jadbabaie, A. Robust Federated Learning: The Case of Affine Distribution Shifts. Adv. Neural Inf. Process. Syst. 2020, 33, 21554–21565. [Google Scholar]
Andreux, M.; du Terrail, J.O.; Beguier, C.; Tramel, E.W. Siloed Federated Learning for Multi-Centric Histopathology Datasets. In Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning: Second MICCAI Workshop, DART 2020, and First MICCAI Workshop, DCL 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, 4–8 October 2020, Proceedings 2; Springer: Cham, Switzerland, 2020; pp. 129–139. [Google Scholar]
Li, X.; Gu, Y.; Dvornek, N.; Staib, L.H.; Ventola, P.; Duncan, J.S. Multi-Site fMRI Analysis Using Privacy-Preserving Federated Learning and Domain Adaptation: ABIDE Results. Med. Image Anal. 2020, 65, 101765. [Google Scholar] [CrossRef] [PubMed]
Peng, X.; Huang, Z.; Zhu, Y.; Saenko, K. Federated Adversarial Domain Adaptation. arXiv 2019, arXiv:1911.02054. [Google Scholar]
Liu, Q.; Chen, C.; Qin, J.; Dou, Q.; Heng, P.A. FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1013–1023. [Google Scholar]
Wang, H.; Kaplan, Z.; Niu, D.; Li, B. Optimizing Federated Learning on Non-IID Data with Reinforcement Learning. In Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications, Toronto, ON, Canada, 6–9 July 2020; IEEE: New York, NY, USA, 2020; pp. 1698–1707. [Google Scholar]
Tang, M.; Ning, X.; Wang, Y.; Sun, J.; Wang, Y.; Li, H.; Chen, Y. FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10102–10111. [Google Scholar]
Qin, Z.; Yang, L.; Wang, Q.; Han, Y.; Hu, Q. Reliable and Interpretable Personalized Federated Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 20422–20431. [Google Scholar]
Cohen, G.; Afshar, S.; Tapson, J.; van Schaik, A. EMNIST: An Extension of MNIST to Handwritten Letters. arXiv 2017, arXiv:1702.05373. [Google Scholar]
Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv 2017, arXiv:1708.07747. [Google Scholar]
Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images. Tech. Rep. 2009, 1, 1–60. [Google Scholar]
Hendrycks, D.; Dietterich, T. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. arXiv 2019, arXiv:1903.12261. [Google Scholar]
Lee, J.; Ko, H.; Seo, S.; Pack, S. Data Distribution-Aware Online Client Selection Algorithm for Federated Learning in Heterogeneous Networks. IEEE Trans. Veh. Technol. 2022, 72, 1127–1136. [Google Scholar] [CrossRef]
Fang, X.; Ye, M.; Yang, X. Robust Heterogeneous Federated Learning under Data Corruption. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 5020–5030. [Google Scholar]
Deng, Y.; Lyu, F.; Ren, J.; Wu, H.; Zhou, Y.; Zhang, Y.; Shen, X. Auction: Automated and Quality-Aware Client Selection Framework for Efficient Federated Learning. IEEE Trans. Parallel Distrib. Syst. 2021, 33, 1996–2009. [Google Scholar] [CrossRef]
Karimireddy, S.P.; Kale, S.; Mohri, M.; Reddi, S.; Stich, S.; Suresh, A.T. Scaffold: Stochastic Controlled Averaging for Federated Learning. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 12–18 July 2020; pp. 5132–5143. [Google Scholar]
Li, X.; Huang, K.; Yang, W.; Wang, S.; Zhang, Z. On the Convergence of FedAvg on Non-IID Data. arXiv 2019, arXiv:1907.02189. [Google Scholar]
Cho, Y.J.; Wang, J.; Joshi, G. Client Selection in Federated Learning: Convergence Analysis and Power-of-Choice Selection Strategies. arXiv 2020, arXiv:2010.01243. [Google Scholar]
Wu, H.; Wang, P. Node Selection Toward Faster Convergence for Federated Learning on Non-IID Data. IEEE Trans. Netw. Sci. Eng. 2022, 9, 3099–3111. [Google Scholar] [CrossRef]
Amiri, M.M.; Gündüz, D.; Kulkarni, S.R.; Poor, H.V. Convergence of Update Aware Device Scheduling for Federated Learning at the Wireless Edge. IEEE Trans. Wirel. Commun. 2021, 20, 3643–3658. [Google Scholar] [CrossRef]
Nishio, T.; Yonetani, R. Client Selection for Federated Learning with Heterogeneous Resources in Mobile Edge. In Proceedings of the ICC 2019—2019 IEEE International Conference on Communications (ICC), Shanghai, China, 20–24 May 2019; pp. 1–7. [Google Scholar]
Li, T.; Hu, S.; Beirami, A.; Smith, V. Ditto: Fair and Robust Federated Learning Through Personalization. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 6357–6368. [Google Scholar]

Figure 1. Under heterogeneous federation learning, the correctness of noisy datasets with different weights can vary significantly. This example demonstrates the simple federation learning process with a noisy data space. The experimental results show that when the proportion of noise in the client model is small, the accuracy is

61 %

(FedDQA); however, when a larger proportion of noise is mixed in, the accuracy decreases to

58 %

(Random), indicating that the data noise has a large impact on the model and reduces the aggregation performance.

Figure 2. (a) Illustration of the local drift in FedAVG with a sigmoid activation function f.

ω_{c}

is the parameter of the model trained with centralized data (optimal model);

ω_{12}

and

ω_{23}

are the parameters of the model generated by FedAVG.

θ_{1}

,

θ_{2}

, and

θ_{3}

are the parameters of local models of client 1, client 2, and client 3, respectively. (b) In the case of natural data distribution, more clients are aggregated in each round, obtaining better generalization and, eventually, higher accuracy. For a noisy data environment, aggregating too many clients means aggregating to those clients that are heavily noisy, which results in lower accuracy instead.

Figure 3. The figure gives the loss values for each epoch during the training process (data source: CIFAR-10 and CIFAR-10-C). Even when using neural networks with the same structure, data with different types of noises (or nature) show other distributions and trends in the loss values during the training process.

Figure 4. Overview of FedDQA. We set the slow-start loss threshold

ϵ

in the local update stage. A set of loss values

L

occurs when the threshold is reached. We calculate the number of epochs

λ

at which the loss reaches the threshold

ϵ

. The volatility of the change in the loss to obtain the final DQA formula consists of the number of epochs

λ

and the loss sharpness value. In the server aggregation stage, the server classifies clients into light-noise, mixed-noise, and heavy-noise according to its DQA

Q_{k}

, calculated by the loss sharpness metric. Then, it selects client k for model aggregation.

Figure 5. Analysis of the convergence of the global model. Compared to the baseline, FedDQA obtained more stable training results with faster convergence and higher accuracy.

Figure 6. Federated learning with noisy data space. FedDQA remains stable in a heavy-noise environment with its rational client selection strategy compared to the baseline.

Table 1. Classification accuracy (%) and corresponding standard deviations of FedDQA compared to other client selection methods on different datasets. Bold numbers have the highest accuracy.

Dataset	Letters (1-M)	Fashion (1-M)	cifar-10 (M-1)	cifar-10-C (M-1)	cifar-10-C (M-M)
Random	85.37 ± 0.3	85.72 ± 0.01	57.11 ± 0.75	59.19 ± 0.66	59.02 ± 0.8
POW-D	86.08 ± 0.58	82.89 ± 0.3	59.5 ± 0.38	57.17 ± 0.48	57.25 ± 0.69
WGD	84.42 ± 2.2	83.22 ± 0.34	58.51 ± 1.64	59.44 ± 0.59	59.38 ± 0.3
FAST	83.92 ± 0.37	84.72 ± 0.23	58.03 ± 1.02	58.93 ± 0.81	58.76 ± 0.86
DQA	87.33 ± 0.34	86.88 ± 0.33	59.82 ± 0.15	60.24 ± 0.76	59.34 ± 0.07

Table 2. Impact of slow-start metric

ϵ

on results. Better accuracy can still be obtained using higher

ϵ

, which further proves that the loss sharpness phenomenon is evident at the beginning of training.

Table 2. Impact of slow-start metric

ϵ

on results. Better accuracy can still be obtained using higher

ϵ

, which further proves that the loss sharpness phenomenon is evident at the beginning of training.

Metric $ϵ$	Emnist (1-M)	CIFAR-10-C (M-1)
0.1	86.93	60.24
0.5	87.18	59.84
1.0	87.75	59.32
1.5	87.96	59.65
2.0	87.75	61.23
Avg	87.51	60.06

Table 3. Comparison of accuracy (%) of popular personalized federated learning combined with FedDQA client selection methods under different CIFAR-10-C datasets. Bold results indicate higher accuracy, which was improved for all of these methods when combined with FedDQA.

Method	1-M	M-1	M-M
FedAVG	56.15	58.26	57.89
+Ours(FedDQA)	59.92	59.32	59.29
FedProx	55.34	57.9	57.43
+Ours(FedDQA)	60.59	59.84	60.23
Ditto	57.93	55.01	55.29
+Ours(FedDQA)	59.69	58.57	59.07

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Data Quality-Aware Client Selection in Heterogeneous Federated Learning

Abstract

1. Introduction

3. Notations and Preliminaries

3.1. General FL Framework

3.2. Heterogeneous Federated Learning

4. Loss Sharpness in Federated Learning

5. FedDQA

6. Performance Evaluation

6.1. Evaluation Setup

6.1.1. System Settings and Datasets

6.1.2. Noisy Environment Setup

6.1.3. Benchmark Algorithms

6.1.4. Experimental Settings for Clients Selection

6.2. Effecient and Convergence Analysis

6.3. Flexibility in Parameter Choices

6.4. Impact of Feature Skew

6.5. Conjunction with Popular Federal Learning Methods

7. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Article Metrics

Citations

Article Access Statistics