
Node Selection Algorithm for Federated Learning Based on Deep Reinforcement Learning for Edge Computing in IoT

Xi’an Research Institute of Hi-Tech, Xi’an 710025, China
College of Information and Communication, National University of Defense Technology, Wuhan 430000, China
Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an 710071, China
Xiongan Institute of Innovation, Chinese Academy of Sciences, Baoding 071702, China
College of Science, China University of Petroleum (East China), Qingdao 266580, China
Department of Computer Science, Community College, King Saud University, Riyadh 11437, Saudi Arabia
Author to whom correspondence should be addressed.
Electronics 2023, 12(11), 2478
Submission received: 30 April 2023 / Revised: 26 May 2023 / Accepted: 29 May 2023 / Published: 31 May 2023


The Internet of Things (IoT) and edge computing technologies have been rapidly developing in recent years, leading to the emergence of new challenges in privacy and security. Personal privacy and data leakage have become major concerns in IoT edge computing environments. Federated learning has been proposed as a solution to address these privacy issues, but the heterogeneity of devices in IoT edge computing environments poses a significant challenge to the implementation of federated learning. To overcome this challenge, this paper proposes a novel node selection strategy based on deep reinforcement learning to optimize federated learning in heterogeneous device IoT environments. Additionally, a metric model for IoT devices is proposed to evaluate the performance of different devices. The experimental results demonstrate that the proposed method can improve training accuracy by 30% in a heterogeneous device IoT environment.

1. Introduction

With the continuous development of the Internet of Things (IoT) and edge computing technology, privacy issues in edge computing for IoT have become increasingly prominent [1]. Personal privacy and data leakage are among the most prominent issues. Due to the large number of sensors and devices involved in IoT, they continuously collect and transmit various types of data, including personal identification information, geographic location information, health status information, and so on [2]. If these data are obtained by malicious individuals, it could pose significant security threats and privacy risks. Another privacy issue is data security. Data in IoT are usually scattered among different devices, cloud servers, edge nodes, and sensors. These data need to be transmitted and stored, and the networks and devices used for transmitting and storing data face various security threats. For example, there may be hackers attacking the network, data centers being stolen, edge devices being eavesdropped or tampered with, and so on. These issues could all lead to data leakage and security risks. In addition, due to the inconsistency of data formats and standards among different devices and systems, data cannot be effectively shared and utilized, resulting in the problem of data silos. This not only limits the application and effectiveness of IoT but also leads to inefficiency in data management and analysis. This is because many data are stored and processed in isolation on different devices or systems, resulting in data fragmentation and the inability to achieve complete data analysis and application. With the growth of data and the increase in data transmission, IoT edge computing systems must handle more and more sensitive data, including personal privacy data and business confidential data. However, privacy and data silos are not only challenges faced by IoT edge computing but also important obstacles restricting the development of IoT technology. 
In order to solve these problems, federated learning has become an important solution.
Federated learning is a distributed machine learning approach that allows multiple devices or data sources to collaborate in learning without exposing raw data [3,4]. This approach not only reduces the cost of data transmission and storage but also better protects privacy and data security, thereby avoiding privacy leaks and data loss. By training models without sharing data, federated learning protects the privacy of participating users and improves both privacy protection and training effectiveness in IoT edge computing [5]. However, in the IoT edge computing environment, the application of federated learning faces many challenges, the most significant of which are heterogeneous devices and malicious nodes. Heterogeneous devices are devices participating in federated learning that differ in computing capability, bandwidth, and data, which leads to imbalanced and unstable training [6,7]. This heterogeneity also makes the solution space of the node selection problem in federated learning high-dimensional. Faced with such complex problems, heuristic algorithms are prone to falling into local optima and failing to find the global optimum. In federated learning, each device uses only its own local data for training, so the computing power and data quality of the device directly affect the effectiveness of federated learning [8,9]. The network bandwidth between devices also affects training speed and effectiveness. Each participant, acting as a node, trains on its local data and then uploads the trained model parameters to the server for global model updates [10]. Because participants are diverse and uncertain, the presence of malicious nodes may seriously degrade the training effectiveness of federated learning [11,12].
Malicious nodes may engage in a variety of behaviors, such as transmitting false model parameters or intentionally destroying model parameters. For example, some participants may transmit incomplete or tampered data, or maliciously modify the training model to achieve their private interests or destroy the global model. These malicious behaviors may result in a decrease in the accuracy of the global model or complete collapse, seriously affecting the effectiveness and application value of federated learning. All of these issues need to be properly addressed in federated learning to enable effective model training on edge devices and ensure user privacy and security.
In summary, there are the following issues with applying federated learning in edge computing:
  • The node selection strategy in federated learning is not targeted enough: few selection mechanisms are designed specifically for IoT environments, and most are based on random selection;
  • There are many heterogeneous devices in IoT edge computing, with different computing power, bandwidth, and data, which leads to training imbalance and instability;
  • There are some malicious devices in IoT edge computing that upload outdated or incorrect local models for various reasons, which negatively impact the convergence of the global model.
To address the problems associated with applying federated learning in edge computing networks mentioned above, this manuscript proposes the following solutions:
  • This manuscript proposes using deep reinforcement learning methods instead of traditional heuristic methods to select terminal devices to improve the accuracy of selection;
  • This manuscript proposes measuring the resource properties of IoT devices to determine their likelihood of participating in federated learning and improve the algorithm’s applicability in IoT environments;
  • To address the issue of devices uploading outdated or incorrect local models, this manuscript proposes a node credibility measurement scheme to eliminate the impact of malicious nodes on federated learning in edge computing networks.

2. Related Works

2.1. Federated Learning

In addition to privacy and security issues, the uneven distribution of data, communication resources, and computing resources lowers the efficiency of model training. To optimize the iterative model update process and improve the efficiency of federated learning, researchers have studied these problems in a variety of scenarios [13,14,15]. Because federated learning needs many iterations to update the training parameters, it incurs a large communication overhead, and some researchers have worked on optimizing the communication process. One research direction is to compress the data exchanged in each model update. For example, Sattler et al. [16] proposed the sparse ternary compression framework. Existing federated learning compression methods either compress only the upstream communication from the client to the server (leaving the downstream communication uncompressed) or perform well only under ideal conditions (e.g., independent and identically distributed data). The framework extends the existing top-k gradient sparsification technique with a novel mechanism that enables downstream compression as well as ternarization and optimal Golomb encoding of the weight updates, which optimizes federated learning communication, especially in bandwidth-limited learning environments. Another research direction is to design new mechanisms and learning algorithms. Mills et al. [17] proposed a multi-task federated learning system that improves the accuracy of user models by using distributed Adam optimization and introducing non-federated batch normalization patch layers, and that only needs to upload a certain proportion of user data for model integration each time the model is updated. Guo et al. [18] proposed a novel transceiver and learning algorithm design for the analog gradient aggregation (AGA) solution, which significantly reduces the multi-access delay. Wu et al. [19] proposed a framework that automatically selects the most representative data from unlabeled input streams, so that large datasets do not need to be accumulated despite the storage limitations of edge devices. They also proposed a data replacement strategy based on contrast scores, which measure the representation quality of each data sample without using labels: samples whose representations are of low quality have not been effectively learned by the model and remain in the buffer for further learning, while samples whose representations are of high quality are discarded.

2.2. Federated Learning Based on Edge Computing

As an extension of cloud computing, edge computing deploys computing resources in the edge network near the user side [20,21,22,23]. Terminal equipment can perform data analysis, storage, and computation directly at the edge node, meeting service requirements for low delay, short communication distance, and high reliability. Federated learning is a learning mode that involves long-running distributed interaction with terminal devices, so its performance can be improved effectively if edge computing is used to carry out task training or model merging in advance. However, unlike cloud computing centers, edge nodes generally have limited computing and communication resources. In a large-scale federated learning network, a large number of terminals communicate and compute through edge computing, which is prone to communication bottlenecks and uneven resources, so that the slowest participants determine the overall delay. Therefore, resource scheduling must be optimized for federated learning based on edge computing. First, based on the traditional federated learning framework, Shi et al. [24] proposed a joint device scheduling and resource allocation strategy. According to the number of training rounds and the number of devices scheduled in each round, communication and computing resources are considered jointly, and a greedy device scheduling algorithm is designed to maximize model accuracy under a time constraint. Liu et al. [25] considered splitting the model in the edge-computing-based federated learning scenario: part of the model is kept for local training and the rest is offloaded to edge nodes for training, which reduces the training load of end users but increases the communication overhead. Zhang et al. [26] proposed a federated learning-based service function chain mapping algorithm to solve the resource allocation problem of air–space integrated networks and effectively improve resource utilization.
In addition, some researchers have innovated on and optimized the federated learning framework itself. Luo et al. [27] introduced a novel hierarchical federated edge learning framework in which some model aggregations are migrated from the cloud to edge servers, and further optimized the joint allocation of computing and communication resources and the edge association of devices under this framework. Hosseinalipour et al. [28] proposed a multi-layer federated learning framework for heterogeneous networks, which takes into account the heterogeneity of the network structure, device computing capacity, and data distribution, and realizes efficient federated learning by offloading learning tasks and allocating communication and computing resources accordingly. Xue et al. [29] implemented a clinical decision support system based on federated learning in edge computing networks. A double deep Q network was deployed at the edge node, and a stable and orderly clinical treatment strategy was obtained. Considering link, delay, and energy constraints, Lyapunov optimization was used to improve the convergence of the system.

3. System Implementation

The process of the federated learning node selection mechanism based on deep reinforcement learning designed in this manuscript is shown in Figure 1 below. The physical network environment composed of IoT devices and the policy network constitutes the entire deep reinforcement learning system. When a federated learning request arrives, the policy network, acting as an intelligent agent, extracts a specific feature matrix from the physical network as input based on the current state of the IoT devices. The training is conducted in an environment built by the physical resource state, and this process is considered the environment sending a state to the agent. The intelligent agent infers the federated learning node selection decision based on the training, which is considered an action applied to the environment. The environment provides the agent with a reward signal based on the execution effectiveness of the action. The agent continually optimizes the action by interacting with the environment to accumulate the maximum reward signal.

3.1. Feature Extraction

The training environment and training method have a great impact on training effectiveness. In order to train the agent in an environment closer to the real network, this paper extracts the following four features of each device as the device attributes used by deep reinforcement learning:

3.1.1. Computational Model

For IoT devices, computing power is usually limited by requirements for low power consumption and small size, and it is an important measure of whether a device is suitable for participating in federated learning. This section measures the computing power of an IoT device by the computing power of its processor, usually expressed in FLOPS (floating-point operations per second). FLOPS is the number of floating-point operations that a device can complete per unit time and is an important indicator of computer performance. Generally, the FLOPS of an IoT device can be calculated using the following formula:
$$\mathrm{FLOPS}_i = \mathrm{CPU\_Fre}_i \times \mathrm{CPU\_Core}_i \times \frac{\mathrm{CPU\_FPU}_i}{\mathrm{CPU\_Core}_i} \times 2,$$
where $\mathrm{FLOPS}_i$ represents the FLOPS value of IoT device $i$, $\mathrm{CPU\_Fre}_i$ represents the CPU frequency of the device, $\mathrm{CPU\_Core}_i$ represents the number of CPU cores of the device, and $\mathrm{CPU\_FPU}_i$ represents the number of FPUs of the device.
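As a rough illustration of the FLOPS formula above, the following Python sketch evaluates it for a single device. The function name `device_flops`, its parameter names, and the example device (1.5 GHz, 4 cores, 4 FPUs) are hypothetical and chosen only for illustration; the factor 2 is taken directly from the formula as written.

```python
def device_flops(cpu_freq_hz: float, cpu_cores: int, cpu_fpus: int) -> float:
    """Estimate FLOPS_i = CPU_Fre_i * CPU_Core_i * (CPU_FPU_i / CPU_Core_i) * 2."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    # FPUs available per core, as in the fraction of the formula.
    fpus_per_core = cpu_fpus / cpu_cores
    return cpu_freq_hz * cpu_cores * fpus_per_core * 2


# Hypothetical device: 1.5 GHz clock, 4 cores, 4 FPUs in total.
example_flops = device_flops(1.5e9, 4, 4)
```

Note that the core count cancels in the formula, so under this model the estimate depends only on the clock frequency and the total number of FPUs.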

3.1.2. Communication Model

The communication resources of IoT devices refer to the network resources required for devices to communicate, including bandwidth, network delay, network stability, etc. The adequacy of communication resources directly affects the communication quality and stability of the equipment. In IoT, different devices have different communication resources. For example, infrastructure devices usually have strong communication resources, which can support high-speed and stable data transmission, while some edge devices may have relatively limited communication resources, which need to be scheduled and optimized according to their specific usage scenarios and needs. For the communication resources of IoT devices, adequate evaluation and management are required to ensure the communication quality and stability of the devices. At the same time, it is also necessary to consider the allocation and utilization of communication resources during device design and deployment to meet the communication needs of the device.
In IoT, communication between devices can use different wireless technologies, such as Bluetooth, Zigbee, and Wi-Fi. Typically, these technologies employ radio frequency-based wireless communication techniques. In this context, the communication capability of an IoT device is measured by its bandwidth, channel, modulation method, and signal-to-noise ratio. This section uses the following formula to calculate the communication model of the device:
$$\mathrm{DTE}_i = \frac{\mathrm{CC}_i \times \mathrm{MRE}_i \times \mathrm{SNR}_i}{\mathrm{BW}_i},$$
where $\mathrm{DTE}_i$ represents the data transmission efficiency of IoT device $i$, $\mathrm{CC}_i$ represents the channel capacity of the device, $\mathrm{MRE}_i$ represents the modulation efficiency of the device, $\mathrm{SNR}_i$ represents the signal-to-noise ratio of the device, and $\mathrm{BW}_i$ represents the bandwidth of the device.
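The communication metric above can be sketched in the same way. The function name, the parameter names, and the example values below are hypothetical; in particular, the sketch assumes that the signal-to-noise ratio is given as a linear ratio rather than in decibels, which the text does not specify.

```python
def data_transmission_efficiency(channel_capacity: float,
                                 modulation_efficiency: float,
                                 snr: float,
                                 bandwidth: float) -> float:
    """Compute DTE_i = (CC_i * MRE_i * SNR_i) / BW_i for one device."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return channel_capacity * modulation_efficiency * snr / bandwidth


# Hypothetical device: 2 Mbit/s channel capacity, modulation efficiency 0.5,
# linear SNR of 10, and 1 MHz bandwidth.
example_dte = data_transmission_efficiency(2.0e6, 0.5, 10.0, 1.0e6)
```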

3.1.3. Data Quality Model

In IoT applications, the data quality is crucial, as it affects subsequent data analysis and applications. The model used to evaluate the quality of data generated by IoT devices is known as the IoT device data quality evaluation model. The total data quality management (TDQM) model can help determine whether the data generated by IoT devices are reliable, accurate, consistent, complete, and usable, thus improving the accuracy and credibility of data analysis [30]. For edge IoT devices, the TDQM model based on data accuracy and data integrity is chosen in this paper to evaluate the quality of data contained in edge IoT devices by assessing the data quality through the source and availability of data. The calculation method is as follows:
$$\mathrm{Data}_i = \mathrm{TDQM}_i(\mathit{integrity}, \mathit{readspeed}, \mathit{accuracy}),$$
where $\mathrm{Data}_i$ represents the data metric of device $i$, while $\mathrm{TDQM}_i$ represents the local data quality of the device under the TDQM model.

3.1.4. Equipment Contribution

In federated learning, each device participating in training needs to upload its locally trained model parameters so that the server can integrate them into a global model. However, some devices may be unwilling or unable to upload their local models’ correct or latest versions due to various reasons, such as network issues, computational resource limitations, or privacy protection. This may have a negative impact on the performance of the global model. Therefore, it is necessary to measure the contribution of each device to identify and exclude unreliable or low-contributing devices, thereby improving the quality and convergence speed of the global model. We evaluate the contribution of each device by the improvements made to the global model parameters by the local model parameters. In this paper, the contribution of IoT device i can be defined as follows:
$$V_{i,k} = \frac{1}{K} \sum_{k=1}^{K} \frac{w_{i,k}\, w_k}{\sigma_k},$$
where $V_{i,k}$ represents the contribution value of device $i$ to the global model in the $k$-th round of training, $w_{i,k}$ represents the local model parameters of device $i$ after the $k$-th round of training, $w_k$ represents the global model parameters after the $k$-th round of training, and $\sigma_k$ represents the sum of weights of all devices after the $k$-th round of training.
For terminal device i, after extracting the above network properties, the terminal device attributes are combined into a feature vector, represented as
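$$v_i = (\mathrm{FLOPS}_i, \mathrm{DTE}_i, \mathrm{Data}_i, V_{i,k}).$$
All the feature vectors are then combined into a feature matrix with one row per device and four columns, as follows:
$$FM = (v_1, v_2, \ldots, v_N) = \begin{bmatrix} \mathrm{FLOPS}_1 & \mathrm{DTE}_1 & \mathrm{Data}_1 & V_{1,k} \\ \mathrm{FLOPS}_2 & \mathrm{DTE}_2 & \mathrm{Data}_2 & V_{2,k} \\ \vdots & \vdots & \vdots & \vdots \\ \mathrm{FLOPS}_N & \mathrm{DTE}_N & \mathrm{Data}_N & V_{N,k} \end{bmatrix}.$$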
Whenever a new round of federated learning is required, the policy network will extract the above feature matrix from the terminal devices as input, providing the intelligent agent with a training environment. At the same time, the feature matrix will be continuously updated as the terminal device’s resources are consumed.
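The assembly of the feature matrix can be illustrated with a small Python sketch. The names `DeviceFeatures`, `build_feature_matrix`, and `min_max_normalize` are hypothetical, and the column-wise min-max scaling is an assumption added here only because the four features live on very different scales; the paper does not state which normalization, if any, is applied.

```python
from typing import List, NamedTuple


class DeviceFeatures(NamedTuple):
    """Feature vector v_i = (FLOPS_i, DTE_i, Data_i, V_{i,k}) of one device."""
    flops: float
    dte: float
    data: float
    contribution: float


def build_feature_matrix(devices: List[DeviceFeatures]) -> List[List[float]]:
    """Stack the per-device feature vectors into an N x 4 matrix."""
    return [list(device) for device in devices]


def min_max_normalize(matrix: List[List[float]]) -> List[List[float]]:
    """Scale every column into [0, 1] (assumed preprocessing, not from the paper)."""
    scaled_columns = []
    for column in zip(*matrix):
        low, high = min(column), max(column)
        span = high - low
        # A constant column carries no information and is mapped to zeros.
        scaled_columns.append([0.0 if span == 0 else (x - low) / span for x in column])
    return [list(row) for row in zip(*scaled_columns)]


# Two hypothetical devices with made-up metric values.
devices = [
    DeviceFeatures(flops=1.2e10, dte=10.0, data=0.9, contribution=0.2),
    DeviceFeatures(flops=6.0e9, dte=5.0, data=0.9, contribution=0.4),
]
feature_matrix = build_feature_matrix(devices)
normalized_matrix = min_max_normalize(feature_matrix)
```

Rebuilding the matrix at the start of each round, as described above, would simply mean calling `build_feature_matrix` again with the devices' updated metrics.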

3.2. Policy Network

The policy network in deep reinforcement learning outputs a policy that selects the next action based on the current state. In our proposed method, we select the devices whose output probability from the policy network exceeds a specific threshold, rather than selecting a fixed number of devices to participate in the training. As shown in Figure 2, the policy network structure designed in this paper includes four layers: the extraction layer, the convolutional layer, the probability layer, and the output layer.
  • Extraction layer: The extraction layer, also known as the input layer, is primarily responsible for converting the input raw data into a format that can be processed by the deep neural network, usually by standardizing, normalizing, and other processing methods. In this chapter, the extraction layer extracts the feature matrix from all terminal devices based on their current states, and uses it as the input to the policy network. The feature matrix is then transferred to the next layer of the policy network.
  • Convolutional layer: The convolutional layer is a commonly used layer structure in deep learning. It uses convolutional kernels to perform convolution operations on input data in order to extract features. In this section, the convolutional layer performs convolution operations on the input matrix according to the following equation:
    $$y_{i,j} = (K * I)_{i,j} = \sum_{m} \sum_{n} I_{i+m,j+n} K_{m,n},$$
    where $y_{i,j}$ denotes an element of the output matrix, $I$ denotes the input matrix, and $K$ denotes the convolution kernel. Each term $I_{i+m,j+n} K_{m,n}$ of the sum is the element $I_{i+m,j+n}$ of the input matrix multiplied by the element $K_{m,n}$ of the convolution kernel matrix. The ReLU activation function is then applied to the result before it is passed to the fully connected layers:
    $$f(y_{i,j}) = \max(0, y_{i,j}).$$
    The generated vectors are passed to the probability layer in order to generate the probability of each node.
  • Probability layer: The probability layer uses the softmax function to compute the feature vector and generate the probability of each terminal device. The softmax function can map the elements of a K-dimensional vector to a K-dimensional probability distribution, where each element represents a probability value in the corresponding distribution. Specifically, for a federated learning network consisting of n terminal devices, the probability layer outputs an n-dimensional probability distribution, where each element represents the probability of selecting a terminal device. In this chapter, the calculation of the probability of device i participating in federated learning can be represented by the following formula:
    $$P_i = \frac{e^{v_i}}{\sum_{j=1}^{n} e^{v_j}},$$
    where the denominator is the sum of the exponentials of all elements and the numerator is the exponential of $v_i$. In this way, $P_i$ is the probability value corresponding to the dimension where $v_i$ is located, and the sum of all $P_i$ is equal to 1.
  • Output layer: The output layer outputs IoT devices and their probability of participating in federated learning.
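The forward pass through the layers listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the kernel is assumed to be a single 1 × 4 filter so that each device row is reduced to one score, the fully connected layers are omitted, and the helper names (`conv2d_valid`, `relu`, `softmax`, `selection_probabilities`) are hypothetical. The convolution follows the index pattern of the equation above, multiplying the input element at (i+m, j+n) by the kernel element at (m, n).

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d_valid(inp: Matrix, kernel: Matrix) -> Matrix:
    """Compute y[i][j] = sum_m sum_n inp[i+m][j+n] * kernel[m][n] without padding."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(inp) - k_rows + 1
    out_cols = len(inp[0]) - k_cols + 1
    return [
        [
            sum(inp[i + m][j + n] * kernel[m][n]
                for m in range(k_rows) for n in range(k_cols))
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


def relu(value: float) -> float:
    """ReLU activation: max(0, value)."""
    return max(0.0, value)


def softmax(scores: List[float]) -> List[float]:
    """Map scores to probabilities that sum to 1 (shifted by the max for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def selection_probabilities(feature_matrix: Matrix, kernel: Matrix) -> List[float]:
    """Convolution, ReLU, then softmax over devices (assumes a 1 x 4 kernel)."""
    conv_out = conv2d_valid(feature_matrix, kernel)
    scores = [relu(row[0]) for row in conv_out]
    return softmax(scores)


# Three hypothetical devices with normalized features and an all-ones kernel.
features = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
probabilities = selection_probabilities(features, [[1.0, 1.0, 1.0, 1.0]])
```

With a 1 × 4 kernel the convolution acts as a weighted sum of each device's four features; a trained network would learn the kernel weights instead of using the fixed values shown here.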

3.3. Model Training

In our study, we employ deep reinforcement learning to implement a node selection strategy, as shown in Figure 2. We first randomly initialize the parameters of the policy network and train it for several epochs. For the node selection task in each training iteration, we extract the feature matrix from the set of federated learning nodes as the input of the policy network. The policy network outputs a set of available nodes and the selection probability of each node according to the node feature vectors. The selection probability of a node represents how likely it is that selecting the node for federated learning will produce better results. In the training phase, we do not select a fixed number of nodes; instead, we select the devices whose probability value is greater than a threshold that we set. The selected nodes participate in the training process of federated learning and work together to learn the global model. Our node selection strategy is flexible and does not limit the number of nodes selected each time. This means that our method can adapt to federated learning scenarios of different scales and select an appropriate number of nodes according to demand.
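The threshold rule described above can be sketched as follows. The function name `select_devices`, the example probabilities, and the threshold values are hypothetical; the paper does not report the threshold it uses.

```python
from typing import List


def select_devices(probabilities: List[float], threshold: float) -> List[int]:
    """Return the indices of devices whose selection probability exceeds the threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return [index for index, p in enumerate(probabilities) if p > threshold]


# Hypothetical policy network output for four devices.
probabilities = [0.40, 0.05, 0.30, 0.25]
selected = select_devices(probabilities, threshold=0.2)
```

Because the softmax probabilities sum to 1, fewer than 1/threshold devices can exceed the threshold in any round, so the threshold implicitly caps the number of participants and would need to be chosen relative to the number of devices.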
In deep reinforcement learning, unlike supervised learning, we do not have label information in the training data to guide the training process [31,32]. Instead, our learning agent relies on reward signals to evaluate the effectiveness of its actions. The magnitude of the reward signal indicates the agent’s decision quality, with larger reward signals indicating good decisions, while smaller or even negative reward signals indicate misbehavior that needs to be adjusted [33]. The choice of reward is crucial to the training process and the formation of the final policy. In the federated learning node selection problem based on deep reinforcement learning, the effect of each round of federated learning is used as a reward signal [34]. This indicator can better reflect the contribution of all devices to the global model aggregation of federated learning under the current selection scheme, which is very representative [35,36]. Therefore, after each round of federated learning, the agent calculates the reward signal according to the aggregation effect of the global model, and updates the parameters of the policy network to optimize the performance of the policy network. Through continuous iterative training, the policy network gradually learns the optimal node selection strategy, and can provide an efficient and reliable node selection scheme for federated learning. In practical implementations, due to the lack of real label information for node selection, we introduce hand-crafted labels to approximate the agent’s decision. Suppose we choose the i-th and i+2-th nodes, then in the policy network, the manual label will be an all-zero vector y, except for the i-th and i+2-th positions, which are 1. By computing the cross-entropy loss between the output of the policy network and the hand-crafted labels, we can measure the deviation of the output of the policy network from expectations and use this loss to guide the training process.
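The hand-crafted label and the cross-entropy loss described above can be illustrated with a short sketch. The helper names `manual_label` and `cross_entropy` and the small constant `eps` (added to avoid taking the logarithm of zero) are assumptions made for this example.

```python
import math
from typing import Iterable, List


def manual_label(num_devices: int, selected: Iterable[int]) -> List[float]:
    """All-zero vector with ones at the positions of the selected devices."""
    chosen = set(selected)
    return [1.0 if index in chosen else 0.0 for index in range(num_devices)]


def cross_entropy(probabilities: List[float], label: List[float],
                  eps: float = 1e-12) -> float:
    """Cross-entropy between the policy output and the hand-crafted label."""
    return -sum(y * math.log(p + eps) for p, y in zip(probabilities, label))


# Selecting the devices at positions 1 and 3 out of five devices.
label = manual_label(5, [1, 3])
loss = cross_entropy([0.1, 0.4, 0.1, 0.3, 0.1], label)
```

The loss is small when the policy network assigns high probability to the devices marked in the label and grows as probability mass moves to unselected devices.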
In this manuscript, backpropagation is used to calculate the parameter gradient of the policy network. First, the loss function is calculated using cross entropy based on the training samples and the output of the policy network. Then, the backpropagation algorithm is used to calculate the gradient of the loss function with respect to the parameters of the policy network. The backpropagation algorithm uses the chain rule to calculate the gradient of each parameter, starting from the output layer, calculating the partial derivative of each neuron, and then calculating the gradient of each parameter layer by layer. Finally, the gradient descent optimization algorithm is used to update the parameters of the policy network. The gradient calculated by the backpropagation algorithm indicates which direction the parameters should be adjusted, while the gradient descent optimization algorithm tells us how much to adjust. In this way, the parameters of the policy network can be continuously adjusted to improve the accuracy and performance of the policy network.
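The update described above can be sketched with a deliberately simplified model in which the per-device scores fed to the softmax are themselves the trainable parameters; a real policy network would propagate the same gradient further back into the convolution weights. For softmax followed by cross-entropy, the gradient of the loss with respect to score j is p_j multiplied by the sum of the label entries, minus y_j. The function names and the learning rate below are assumptions for illustration.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Map scores to probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities: List[float], label: List[float]) -> float:
    """Cross-entropy loss; the small constant avoids log(0)."""
    return -sum(y * math.log(p + 1e-12) for p, y in zip(probabilities, label))


def gradient_wrt_scores(probabilities: List[float], label: List[float]) -> List[float]:
    """Chain rule through softmax and cross-entropy: dL/dz_j = p_j * sum(y) - y_j."""
    label_sum = sum(label)
    return [p * label_sum - y for p, y in zip(probabilities, label)]


def gradient_descent_step(scores: List[float], label: List[float],
                          learning_rate: float) -> List[float]:
    """Move each score against its gradient, scaled by the learning rate."""
    grads = gradient_wrt_scores(softmax(scores), label)
    return [s - learning_rate * g for s, g in zip(scores, grads)]


# Four devices, with devices 0 and 2 marked in the hand-crafted label.
scores = [0.0, 0.0, 0.0, 0.0]
label = [1.0, 0.0, 1.0, 0.0]
loss_before = cross_entropy(softmax(scores), label)
for _ in range(50):
    scores = gradient_descent_step(scores, label, learning_rate=0.5)
loss_after = cross_entropy(softmax(scores), label)
```

The gradient supplies the direction of each adjustment and the learning rate sets its size, which matches the division of roles between backpropagation and gradient descent described above.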

4. Experiment

4.1. Experimental Environment

For the node selection-optimized federated learning scheme (FL-IoTEL) proposed in this manuscript for IoT edge computing, we designed simulation experiments to verify its reliability. The simulation models an IoT edge computing network with 110 devices involved in node selection: 10 edge nodes and 100 IoT devices. Each edge node is connected to multiple IoT devices and is responsible for aggregating the local models of those IoT devices into a global model. The experiments use the CIFAR-10 and MNIST datasets. CIFAR-10 contains 50,000 training images and 10,000 test images [37]. Its images are in color and slightly larger than those of MNIST, whereas MNIST has 10,000 more training images than CIFAR-10 [38]. Each IoT device is assigned different hardware metrics (such as computing power and communication capability) and different data.

4.2. Simulation Results and Analysis

In the edge computing environment of the Internet of Things, federated learning needs to consider the characteristics of many heterogeneous devices, including but not limited to differences in hardware performance, network communication capabilities, data volume, and data quality. These differences can lead to variations in the quality and quantity of data provided by each device, as well as affecting the training speed and effectiveness of the devices. For example, some devices may have faster processors and higher memory capacities, enabling them to train and analyze more quickly, while other devices may be limited by lower hardware performance and take longer to complete the same task. Additionally, data may differ in quality and quantity depending on their source, with some data having better quality and more samples, while other data may have more noise or lack diversity. Therefore, federated learning needs to consider the heterogeneity of devices and data and use appropriate federated algorithms and node selection strategies to effectively utilize these heterogeneous resources, improve the training effectiveness and inference speed of models, and protect the privacy of devices. This section designs experiments to verify the performance of the node selection strategy for federated learning based on deep reinforcement learning under the edge computing of the Internet of Things, and compares the algorithm designed in this chapter with the traditional FedProx algorithm.
First, this manuscript compares the performance of the final global model under the condition that the data satisfy the independent and identically distributed (IID) assumption. Figure 3 shows the accuracy of the model when the local data on the IoT devices follow the IID assumption, and the size of the local dataset is the same. It can be seen that when the models of the two algorithms finally converge, there is little difference in the accuracy of the model on the test set. This is because the data distribution on the edge nodes is consistent, and the effect of randomly selecting nodes for learning is the same as that of purposefully selecting nodes on the server. However, because the scheme considered in this article not only considers the heterogeneity of node data but also the heterogeneity of resources, the algorithm proposed in this chapter is slightly better than the traditional FedProx algorithm in terms of final convergence. From Figure 3, it can also be seen that both schemes have good performance on the MNIST dataset but perform poorly on the CIFAR-10 dataset. This is because the CIFAR-10 dataset contains more information than the MNIST dataset, and the model performance is not as good as that on MNIST. This also proves that different datasets have a significant impact on learning models, and in scenarios with node selection, small-scale algorithms can always achieve optimal results.
Next, experiments were conducted for devices with Non-IID data. Figure 4 shows the training accuracy when the data on the IoT devices are Non-IID. In this setting, the proposed algorithm achieves significantly higher accuracy than the traditional FedProx algorithm. When the data are IID, or when the differences between the data on each device are small, the traditional random device selection algorithm and the federated learning node selection strategy based on deep reinforcement learning perform similarly. When the data differences are large, however, the proposed algorithm performs better because it takes the device data into account when selecting nodes.
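The difference between the IID and Non-IID settings can be illustrated with a small partitioning sketch. The manuscript does not state here how the data were split across devices, so the shard-based Non-IID split below (sort by label, cut into shards, assign a few shards per device) is an assumption chosen because it is a common way to build such benchmarks; all function names are hypothetical.

```python
import random
from typing import Dict, List


def partition_iid(labels: List[int], num_devices: int, seed: int = 0) -> Dict[int, List[int]]:
    """Shuffle sample indices and deal them out evenly across devices."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    return {d: indices[d::num_devices] for d in range(num_devices)}


def partition_non_iid(
    labels: List[int], num_devices: int, shards_per_device: int = 2, seed: int = 0
) -> Dict[int, List[int]]:
    """Sort by label, cut into shards, and give each device a few shards.

    Each device then holds only a few classes. Samples left over when the
    dataset size is not divisible by the number of shards are dropped.
    """
    rng = random.Random(seed)
    indices = sorted(range(len(labels)), key=lambda i: labels[i])
    num_shards = num_devices * shards_per_device
    shard_size = len(indices) // num_shards
    shards = [indices[s * shard_size:(s + 1) * shard_size] for s in range(num_shards)]
    rng.shuffle(shards)
    return {
        d: [i for shard in shards[d * shards_per_device:(d + 1) * shards_per_device] for i in shard]
        for d in range(num_devices)
    }


def classes_per_device(labels: List[int], partition: Dict[int, List[int]]) -> Dict[int, int]:
    """Number of distinct classes each device holds."""
    return {d: len({labels[i] for i in idx}) for d, idx in partition.items()}


if __name__ == "__main__":
    # Synthetic labels: 10 classes with 100 samples each (stand-in for MNIST labels).
    labels = [c for c in range(10) for _ in range(100)]
    print(classes_per_device(labels, partition_iid(labels, 10)))
    print(classes_per_device(labels, partition_non_iid(labels, 10)))
```

Under the shard-based split, each device sees at most two classes, so a randomly chosen subset of devices can easily miss whole classes in a round; this is the situation in which purposeful node selection has room to help.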
Furthermore, this manuscript simulated the performance in different Non-IID scenarios, using the variance of the local data distribution to represent the degree of data heterogeneity: the larger the variance, the greater the heterogeneity of the local data on the terminal devices. As can be seen in Figure 5, when the variance is small, all devices perform almost identically, and the results of random node selection are consistent with those of algorithm-based node selection. As the variance increases, the advantage of the proposed algorithm becomes apparent, because selecting the nodes that are more useful to the global model is more effective than selecting nodes at random. In addition, the number of users also affects the accuracy on the test set. The more nodes that participate in the learning process, the better the learned model performs, so a model trained with more nodes is slightly better than one trained with fewer nodes.
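One way to turn "variance of the local data distribution" into a number is sketched below. The exact definition used in the experiments is not given in this section, so the measure here is an assumption: for each class, take the variance across devices of that class's share of the local dataset, then average over classes. The function names are hypothetical.

```python
from collections import Counter
from statistics import pvariance
from typing import Dict, List


def label_proportions(labels: List[int], num_classes: int) -> List[float]:
    """Share of each class in one device's local dataset (must be non-empty)."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def heterogeneity_variance(device_labels: Dict[str, List[int]], num_classes: int) -> float:
    """Mean over classes of the across-device variance of class proportions.

    Returns 0.0 when every device has the same label mix and grows as the
    devices' label mixes diverge.
    """
    props = [label_proportions(lbls, num_classes) for lbls in device_labels.values()]
    per_class = [pvariance([p[c] for p in props]) for c in range(num_classes)]
    return sum(per_class) / num_classes


if __name__ == "__main__":
    same_mix = {"dev_a": [0, 1, 0, 1], "dev_b": [1, 0, 1, 0]}
    disjoint = {"dev_a": [0, 0, 0, 0], "dev_b": [1, 1, 1, 1]}
    print(heterogeneity_variance(same_mix, num_classes=2))
    print(heterogeneity_variance(disjoint, num_classes=2))
```

With this measure, identical label mixes give zero and fully disjoint label sets give the largest value, which matches the qualitative use of variance in the text even if the experiments used a different formula.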
Finally, experiments were conducted to analyze the performance of the algorithm under device heterogeneity; the results are shown in Figures 6 and 7. Under device heterogeneity, the experimental results are similar to those under data heterogeneity. As shown in Figure 6, when the devices have similar performance, the training accuracy of the two algorithms differs little. However, when device performance differs significantly and some devices perform poorly, the proposed algorithm has a more obvious advantage over the traditional FedProx algorithm. This is because, under significant device heterogeneity, some devices may be unable to complete their training task, which lowers the accuracy of the global model. The proposed node selection strategy selects better-performing devices to participate in training, thereby ensuring the efficiency and accuracy of federated learning.
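The effect described above can be sketched with a simple stand-in for the selection step. The code below is not the deep reinforcement learning policy of this manuscript: it replaces the learned policy with a greedy top-k rule over a hypothetical weighted score, and it assumes that a device whose compute capability falls below a threshold fails to finish its round. All field names, weights, and thresholds are illustrative assumptions.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Device:
    name: str
    compute: float       # hypothetical normalized compute capability in [0, 1]
    data_quality: float  # hypothetical normalized data quality in [0, 1]


def score(device: Device, w_compute: float = 0.5, w_quality: float = 0.5) -> float:
    """Hypothetical device metric: weighted sum of compute and data quality."""
    return w_compute * device.compute + w_quality * device.data_quality


def select_random(devices: List[Device], k: int, rng: random.Random) -> List[Device]:
    """Baseline: pick k devices uniformly at random."""
    return rng.sample(devices, k)


def select_top_k(devices: List[Device], k: int) -> List[Device]:
    """Greedy stand-in for a learned policy: pick the k highest-scoring devices."""
    return sorted(devices, key=score, reverse=True)[:k]


def completed(selected: List[Device], min_compute: float) -> List[Device]:
    """Devices below min_compute are assumed to miss the round deadline."""
    return [d for d in selected if d.compute >= min_compute]


if __name__ == "__main__":
    devices = [
        Device("a", 0.9, 0.8),
        Device("b", 0.2, 0.9),
        Device("c", 0.7, 0.6),
        Device("d", 0.1, 0.3),
    ]
    rng = random.Random(0)
    print([d.name for d in completed(select_random(devices, 2, rng), 0.5)])
    print([d.name for d in completed(select_top_k(devices, 2), 0.5)])
```

When device capabilities are similar, both rules return devices that finish the round, so the two approaches behave alike; when capabilities diverge, random selection can pick devices that never contribute an update, whereas a score-aware rule avoids them.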

5. Conclusions

This manuscript optimized federated learning for IoT edge computing. A node selection strategy based on deep reinforcement learning was proposed to select the IoT nodes that participate in federated learning training, ensuring efficient participation of heterogeneous IoT devices and improving the privacy protection capability of edge computing. The experimental results showed that the proposed method can improve training accuracy by 30% in a heterogeneous device IoT environment. The strategy keeps heterogeneous devices participating efficiently in training and improves the accuracy of the model while preserving privacy. These results offer a new perspective on, and new methods for, privacy protection in IoT edge computing, and the approach is expected to be applied more widely in practice.

Author Contributions

Conceptualization, S.Y. and P.Z.; methodology, S.Y.; software, S.H.; validation, J.W., H.S. and Y.Z.; formal analysis, S.H.; investigation, A.T.; resources, J.W. and A.T.; data curation, P.Z.; writing—original draft preparation, S.Y. and P.Z.; writing—review and editing, Y.Z. and A.T.; visualization, H.S.; supervision, J.W.; project administration, A.T.; funding acquisition, A.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Researchers Supporting Project number (RSPD2023R681), King Saud University, Riyadh, Saudi Arabia.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.


  1. Deng, S.; Zhao, H.; Fang, W.; Yin, J.; Dustdar, S.; Zomaya, A.Y. Edge intelligence: The confluence of edge computing and artificial intelligence. IEEE Internet Things J. 2020, 7, 7457–7469. [Google Scholar] [CrossRef]
  2. Shafique, K.; Khawaja, B.A.; Sabir, F.; Qazi, S.; Mustaqim, M. Internet of things (IoT) for next-generation smart systems: A review of current challenges, future trends and prospects for emerging 5G-IoT scenarios. IEEE Access 2020, 8, 23022–23040. [Google Scholar] [CrossRef]
  3. Khan, L.U.; Saad, W.; Han, Z.; Hossain, E.; Hong, C.S. Federated learning for internet of things: Recent advances, taxonomy, and open challenges. IEEE Commun. Surv. Tutor. 2021, 23, 1759–1799. [Google Scholar] [CrossRef]
  4. Niknam, S.; Dhillon, H.S.; Reed, J.H. Federated learning for wireless communications: Motivation, opportunities, and challenges. IEEE Commun. Mag. 2020, 58, 46–51. [Google Scholar] [CrossRef]
  5. Wang, X.; Ning, Z.; Guo, L.; Guo, S.; Gao, X.; Wang, G. Mean-Field Learning for Edge Computing in Mobile Blockchain Networks. IEEE Trans. Mob. Comput. 2022, 1–17. [Google Scholar] [CrossRef]
  6. Zhu, Z.; Hong, J.; Zhou, J. Data-free knowledge distillation for heterogeneous federated learning. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 12878–12889. [Google Scholar] [CrossRef]
  7. Ning, Z.; Zhang, K.; Wang, X.; Guo, L.; Hu, X.; Huang, J.; Hu, B.; Kwok, R.Y.K. Intelligent Edge Computing in Internet of Vehicles: A Joint Computation Offloading and Caching Solution. IEEE Trans. Intell. Transp. Syst. 2021, 22, 2212–2225. [Google Scholar] [CrossRef]
  8. Wang, J.; Zhang, H.; Wang, J.; Pu, Y.; Pal, N.R. Feature Selection Using a Neural Network With Group Lasso Regularization and Controlled Redundancy. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 1110–1123. [Google Scholar] [CrossRef]
  9. Zhang, H.; Wang, J.; Sun, Z.; Zurada, J.M.; Pal, N.R. Feature Selection for Neural Networks Using Group Lasso Regularization. IEEE Trans. Knowl. Data Eng. 2020, 32, 659–673. [Google Scholar] [CrossRef]
  10. Zhou, X.; Liang, W.; She, J.; Yan, Z.; Kevin, I.; Wang, K. Two-layer federated learning with heterogeneous model aggregation for 6g supported internet of vehicles. IEEE Trans. Veh. Technol. 2021, 70, 5308–5317. [Google Scholar] [CrossRef]
  11. Xue, G.; Chang, Q.; Wang, J.; Zhang, K.; Pal, N.R. An Adaptive Neuro-Fuzzy System With Integrated Feature Selection and Rule Extraction for High-Dimensional Classification Problems. IEEE Trans. Fuzzy Syst. 2022, 1–15. [Google Scholar] [CrossRef]
  12. Zhang, P.; Sun, H.; Situ, J.; Jiang, C.; Xie, D. Federated transfer learning for IIoT devices with low computing power based on blockchain and edge computing. IEEE Access 2021, 9, 98630–98638. [Google Scholar] [CrossRef]
  13. Li, T.; Sahu, A.K.; Talwalkar, A.; Smith, V. Federated learning: Challenges, methods, and future directions. IEEE Signal Process. Mag. 2020, 37, 50–60. [Google Scholar] [CrossRef]
  14. Ning, Z.; Zhang, K.; Wang, X.; Obaidat, M.S.; Guo, L.; Hu, X.; Hu, B.; Guo, Y.; Sadoun, B.; Kwok, R.Y.K. Joint Computing and Caching in 5G-Envisioned Internet of Vehicles: A Deep Reinforcement Learning-Based Traffic Control System. IEEE Trans. Intell. Transp. Syst. 2021, 22, 5201–5212. [Google Scholar] [CrossRef]
  15. Konečnỳ, J.; McMahan, H.B.; Yu, F.X.; Richtárik, P.; Suresh, A.T.; Bacon, D. Federated learning: Strategies for improving communication efficiency. arXiv 2016, arXiv:1610.05492. [Google Scholar] [CrossRef]
  16. Sattler, F.; Wiedemann, S.; Müller, K.R.; Samek, W. Robust and communication-efficient federated learning from non-iid data. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 3400–3413. [Google Scholar] [CrossRef] [PubMed]
  17. Mills, J.; Hu, J.; Min, G. User-Oriented Multi-Task Federated Deep Learning for Mobile Edge Computing. arXiv 2020, arXiv:2007.09236. [Google Scholar] [CrossRef]
  18. Guo, H.; Liu, A.; Lau, V.K. Analog gradient aggregation for federated learning over wireless networks: Customized design and convergence analysis. IEEE Internet Things J. 2020, 8, 197–210. [Google Scholar] [CrossRef]
  19. Wu, Y.; Wang, Z.; Zeng, D.; Shi, Y.; Hu, J. Enabling on-device self-supervised contrastive learning with selective data contrast. In Proceedings of the 2021 58th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 5–9 December 2021; pp. 655–660. [Google Scholar] [CrossRef]
  20. Sonmez, C.; Ozgovde, A.; Ersoy, C. Edgecloudsim: An environment for performance evaluation of edge computing systems. Trans. Emerg. Telecommun. Technol. 2018, 29, e3493. [Google Scholar] [CrossRef]
  21. Hu, Y.C.; Patel, M.; Sabella, D.; Sprecher, N.; Young, V. Mobile edge computing—A key technology towards 5G. ETSI White Pap. 2015, 11, 1–16. [Google Scholar]
  22. Ning, Z.; Sun, S.; Wang, X.; Guo, L.; Guo, S.; Hu, X.; Hu, B.; Kwok, R.Y.K. Blockchain-Enabled Intelligent Transportation Systems: A Distributed Crowdsensing Framework. IEEE Trans. Mob. Comput. 2022, 21, 4201–4217. [Google Scholar] [CrossRef]
  23. Mao, Y.; You, C.; Zhang, J.; Huang, K.; Letaief, K.B. A survey on mobile edge computing: The communication perspective. IEEE Commun. Surv. Tutor. 2017, 19, 2322–2358. [Google Scholar] [CrossRef]
  24. Shi, W.; Zhou, S.; Niu, Z.; Jiang, M.; Geng, L. Joint device scheduling and resource allocation for latency constrained wireless federated learning. IEEE Trans. Wirel. Commun. 2020, 20, 453–467. [Google Scholar] [CrossRef]
  25. Liu, Y.; Yuan, X.; Xiong, Z.; Kang, J.; Wang, X.; Niyato, D. Federated learning for 6G communications: Challenges, methods, and future directions. China Commun. 2020, 17, 105–118. [Google Scholar] [CrossRef]
  26. Zhang, P.; Zhang, Y.; Kumar, N.; Guizani, M. Dynamic SFC Embedding Algorithm Assisted by Federated Learning in Space–Air–Ground-Integrated Network Resource Allocation Scenario. IEEE Internet Things J. 2023, 10, 9308–9318. [Google Scholar] [CrossRef]
  27. Luo, S.; Chen, X.; Wu, Q.; Zhou, Z.; Yu, S. HFEL: Joint edge association and resource allocation for cost-efficient hierarchical federated edge learning. IEEE Trans. Wirel. Commun. 2020, 19, 6535–6548. [Google Scholar] [CrossRef]
  28. Hosseinalipour, S.; Brinton, C.G.; Aggarwal, V.; Dai, H.; Chiang, M. From federated to fog learning: Distributed machine learning over heterogeneous wireless networks. IEEE Commun. Mag. 2020, 58, 41–47. [Google Scholar] [CrossRef]
  29. Xue, Z.; Zhou, P.; Xu, Z.; Wang, X.; Xie, Y.; Ding, X.; Wen, S. A resource-constrained and privacy-preserving edge-computing-enabled clinical decision system: A federated reinforcement learning approach. IEEE Internet Things J. 2021, 8, 9122–9138. [Google Scholar] [CrossRef]
  30. Bowo, W.A.; Suhanto, A.; Naisuty, M.; Ma’mun, S.; Hidayanto, A.N.; Habsari, I.C. Data Quality Assessment: A Case Study of PT JAS Using TDQM Framework. In Proceedings of the 2019 Fourth International Conference on Informatics and Computing (ICIC), Semarang, Indonesia, 16–17 October 2019; pp. 1–6. [Google Scholar] [CrossRef]
  31. Zhao, W.; Queralta, J.P.; Westerlund, T. Sim-to-real transfer in deep reinforcement learning for robotics: A survey. In Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, ACT, Australia, 1–4 December 2020; pp. 737–744. [Google Scholar] [CrossRef]
  32. Ibarz, J.; Tan, J.; Finn, C.; Kalakrishnan, M.; Pastor, P.; Levine, S. How to train your robot with deep reinforcement learning: Lessons we have learned. Int. J. Robot. Res. 2021, 40, 698–721. [Google Scholar] [CrossRef]
  33. Kiran, B.R.; Sobh, I.; Talpaert, V.; Mannion, P.; Al Sallab, A.A.; Yogamani, S.; Pérez, P. Deep reinforcement learning for autonomous driving: A survey. IEEE Trans. Intell. Transp. Syst. 2021, 23, 4909–4926. [Google Scholar] [CrossRef]
  34. Luong, N.C.; Hoang, D.T.; Gong, S.; Niyato, D.; Wang, P.; Liang, Y.C.; Kim, D.I. Applications of deep reinforcement learning in communications and networking: A survey. IEEE Commun. Surv. Tutor. 2019, 21, 3133–3174. [Google Scholar] [CrossRef]
  35. Rudin, N.; Hoeller, D.; Reist, P.; Hutter, M. Learning to walk in minutes using massively parallel deep reinforcement learning. In Proceedings of the Conference on Robot Learning, PMLR, Auckland, New Zealand, 14–18 December 2022; pp. 91–100. [Google Scholar] [CrossRef]
  36. Vithayathil Varghese, N.; Mahmoud, Q.H. A survey of multi-task deep reinforcement learning. Electronics 2020, 9, 1363. [Google Scholar] [CrossRef]
  37. Thakkar, V.; Tewary, S.; Chakraborty, C. Batch Normalization in Convolutional Neural Networks—A comparative study with CIFAR-10 data. In Proceedings of the 2018 Fifth International Conference on Emerging Applications of Information Technology (EAIT), Kolkata, India, 12–13 January 2018; pp. 1–5. [Google Scholar] [CrossRef]
  38. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
Figure 1. The federated learning node selection strategy framework.
Figure 2. Reinforcement learning strategy.
Figure 3. Training accuracy when data conform to IID.
Figure 4. Training accuracy when data conform to Non-IID.
Figure 5. Performance under different data variances.
Figure 6. Accuracy of IoT devices with similar performance.
Figure 7. Accuracy in the case of heterogeneous IoT devices.

Share and Cite

MDPI and ACS Style

Yan, S.; Zhang, P.; Huang, S.; Wang, J.; Sun, H.; Zhang, Y.; Tolba, A. Node Selection Algorithm for Federated Learning Based on Deep Reinforcement Learning for Edge Computing in IoT. Electronics 2023, 12, 2478.

AMA Style

Yan S, Zhang P, Huang S, Wang J, Sun H, Zhang Y, Tolba A. Node Selection Algorithm for Federated Learning Based on Deep Reinforcement Learning for Edge Computing in IoT. Electronics. 2023; 12(11):2478.

Chicago/Turabian Style

Yan, Shuai, Peiying Zhang, Siyu Huang, Jian Wang, Hao Sun, Yi Zhang, and Amr Tolba. 2023. "Node Selection Algorithm for Federated Learning Based on Deep Reinforcement Learning for Edge Computing in IoT" Electronics 12, no. 11: 2478.
