Symmetry
  • Article
  • Open Access

13 August 2025

FedRP: Region-Specific Personalized Identification for Large-Scale IoT Systems

1 Hubei Key Laboratory of Internet of Intelligence, School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China
2 School of Economics, Huazhong University of Science and Technology, Wuhan 430074, China
* Author to whom correspondence should be addressed.
This article belongs to the Section Computer

Abstract

The widespread adoption of Internet of Things (IoT) technology has significantly expanded the scale at which devices are connected, posing new challenges to maintaining symmetry in network management. Traditional centralized identification architectures adopt a symmetric processing paradigm in which all device data are uniformly transmitted to the cloud for processing. However, this rigid symmetric structure fails to accommodate the asymmetric distribution typical of IoT edge devices. To address these challenges, this paper proposes an asymmetric identification framework based on cloud–edge collaboration, exploring a high-performance, resource-efficient, and privacy-preserving solution for IoT device identification. The proposed region-specific personalized algorithm (FedRP) introduces a region-specific, personalized identification approach grounded in federated learning principles. Firstly, FedRP leverages a decentralized processing framework to enhance data security by processing data locally. Secondly, it employs a personalized federated learning framework to optimize local models, thus improving identification accuracy and effectiveness. Finally, FedRP strategically separates the personalized parameters of transformer-based blocks from shared parameters and selectively transmits them, reducing the burden on network resources. Comprehensive comparative experiments demonstrate the efficacy of the proposed approach for large-scale IoT environments, which are characterized by numerous devices and complex network conditions.

1. Introduction

With the rapid advancement and widespread adoption of Internet of Things (IoT) technologies, the number of edge devices has grown substantially. While this proliferation strengthens human connectivity, the increasing diversity and number of connected devices also pose significant challenges for network management [1]. As IoT networks continue to expand in both size and complexity, effective device identification has emerged as a pressing necessity [2,3].
In recent years, deep learning has gained substantial attention in the field of IoT device recognition due to its high efficiency and accuracy. By learning the characteristics and behavioral patterns of devices, deep learning models can achieve fine-grained and precise classification [4]. Traditional deep learning-based training paradigms can be broadly categorized as using either centralized or distributed approaches. In centralized training, data must be uploaded to a central server, which becomes inefficient and bandwidth-intensive when handling massive amounts of data from edge devices. Distributed training mitigates some of these issues by enabling parallel computation across nodes; however, it still falls short in ensuring data privacy and security, especially in the face of increasingly frequent and sophisticated network attacks [5,6]. These limitations underscore the need for a learning paradigm that can simultaneously preserve data privacy while supporting distributed training. As an emerging paradigm in distributed learning, Federated Learning (FL) has demonstrated significant potential for privacy-preserving model training [7,8,9]. FL enables collaborative model optimization by training local models on client devices and aggregating model updates on a central server—without transferring raw data—thus ensuring data privacy while achieving global model convergence [10,11].
However, the substantial heterogeneity of IoT devices and their complex behavioral patterns often demand that personalization strategies be differentiated across datasets and deployment scenarios, a need that intensifies the generalization difficulties faced by federated learning (FL) algorithms [12,13]. Under conditions of non-independent and identically distributed (non-IID) data, model performance across clients tends to vary significantly, making it challenging to strike a balance between global model performance and local adaptability. This challenge is further compounded by the limited computational capabilities of edge devices and stringent privacy-preservation requirements, both of which restrict the design space for efficient trade-offs between personalized modeling and shared model parameters. Despite the demonstrated advantages of FL in preserving privacy and enabling collaborative learning across distributed nodes, its practical adoption in real-world IoT environments remains fraught with challenges. Most existing FL methods assume homogeneous data distributions across clients—a condition seldom met in heterogeneous IoT ecosystems, which are characterized by diverse data modalities and behavioral variability. Furthermore, prevailing personalized FL approaches predominantly target user-level customization, neglecting the region- or scenario-level heterogeneity that is commonly encountered in real-world IoT deployments such as smart homes, industrial control systems, and geographically dispersed networks. Compounding these issues is the limited volume of labeled data typically available on individual devices, which significantly hampers the capacity of deep learning models to extract meaningful statistical patterns and to contribute effectively to optimization of a global model.
In summary, this paper aims to design a distributed learning framework that enables efficient and accurate identification of IoT devices while rigorously preserving user data privacy. It is important to note that IoT devices span a wide range of application domains, including smart home appliances (e.g., smart locks, thermostats), wearable health monitors, and industrial control sensors. These devices exhibit significant variations in traffic patterns, communication protocols, and behavioral characteristics, introducing a high degree of heterogeneity into the learning environment and limiting the effectiveness of unified modeling approaches. Against this backdrop, several key challenges remain to be addressed. First, the diverse and non-IID nature of data in IoT ecosystems demands the use of models with strong generalization capabilities—those that can achieve high recognition accuracy while maintaining computational efficiency. One promising direction involves incorporating large-scale pretrained language models, which offer powerful feature extraction and transfer capabilities well-suited to the complex traffic patterns of IoT networks. Second, federated learning must demonstrate robustness across heterogeneous datasets and deployment scenarios. It is critical not only to achieve strong global model performance, but also to enable effective personalization for each client, particularly in situations in which only limited labeled data are available locally, thus ensuring reliable deployment in real-world, resource-constrained environments.
In this paper, we address the aforementioned challenges by proposing a novel distributed learning framework designed to enhance the recognition and identification of IoT devices while ensuring the security and privacy of client data. Our contributions are summarized as follows:
  • We propose a hybrid federated learning framework, the region-specific personalized algorithm (FedRP), which combines federated learning with advanced personalization techniques, enhancing global model generalization and client-specific fine-tuning.
  • Our approach leverages transformer-based blocks to explore data representation in multiple dimensions, accommodating the fluctuating packet lengths characteristic of traffic flows from different devices.
  • Our framework employs differential privacy and secure multiparty computation to ensure data security during training and updates.
  • We design lightweight, resource-efficient algorithms suitable for the varying computational capacities of IoT devices.
  • Extensive experiments on diverse IoT datasets validate our framework’s ability to maintain high accuracy, generalization, personalization, and efficiency while preserving data privacy.
By addressing these critical challenges, our proposed distributed learning model sets a new standard for IoT device recognition, balancing accuracy, efficiency, personalization, and data privacy. Our contributions pave the way for more secure and effective IoT network management, fostering the continued growth and integration of IoT technologies in various applications and industries.

3. Problem Definition

In IoT device identification, concerns about user privacy have been raised in association with the capturing of device traffic [39]. Researchers have developed methods to prevent privacy leakage [40], particularly in Wide Area Networks (WANs), whereas Local Area Networks (LANs) are generally considered more trustworthy. We assume trusted IoT systems, such as smart homes and industrial IoT, where the possibility that malicious attacks may confuse the model is disregarded, as in previous studies. Due to the difficulty of directly accessing operational nodes for device identification, we leverage communication behavior to identify connected IoT devices. The traffic set S of the IoT device is defined as follows:
$$S = \{(f_i, a_i) \mid f_i \in FL,\ a_i \in A\}$$
$$f_i = \{x_i^1, x_i^2, \ldots, x_i^n\}$$
$$x_i^k = \left(IP_{src}^{i}, Port_{src}^{i}, IP_{dst}^{i}, Port_{dst}^{i}, Prot^{i}, x_{payload}^{ik}\right)$$
where $FL$ represents the set of device traffic flows and $A$ is the set of sensing devices that generate these flows. $f_i$ is the $i$-th flow and is composed of a series of packets $x_i^k$, and $n$ is the length of the traffic sequence. Packets within the same flow share the same five-tuple (source IP, source port, destination IP, destination port, transport layer protocol). $x_{payload}^{ik}$ is the payload information of the packets. Our goal is to build an end-to-end system $\psi(f_i)$ to predict the label $\hat{T}_p$, which should match the real device type $T_r$ (i.e., $\psi : f_i \to a$).
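The flow definition above can be illustrated with a minimal sketch that groups captured packets into unidirectional flows by their five-tuple. The `Packet` class, its field names, and `group_into_flows` are hypothetical names introduced here for illustration; they are not part of the FedRP implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

FiveTuple = Tuple[str, int, str, int, str]


@dataclass(frozen=True)
class Packet:
    """One packet x_i^k: the five-tuple plus the raw payload (illustrative fields)."""
    ip_src: str
    port_src: int
    ip_dst: str
    port_dst: int
    proto: str
    payload: bytes

    @property
    def five_tuple(self) -> FiveTuple:
        return (self.ip_src, self.port_src, self.ip_dst, self.port_dst, self.proto)


def group_into_flows(packets: List[Packet]) -> Dict[FiveTuple, List[Packet]]:
    """Group packets into unidirectional flows f_i, preserving arrival order."""
    flows: Dict[FiveTuple, List[Packet]] = defaultdict(list)
    for pkt in packets:
        flows[pkt.five_tuple].append(pkt)
    return dict(flows)


# Toy capture: two packets of a DNS flow interleaved with one TLS packet.
packets = [
    Packet("10.0.0.2", 5000, "8.8.8.8", 53, "UDP", b"\x01"),
    Packet("10.0.0.3", 6000, "1.1.1.1", 443, "TCP", b"\x02"),
    Packet("10.0.0.2", 5000, "8.8.8.8", 53, "UDP", b"\x03"),
]
flows = group_into_flows(packets)
```

Each dictionary value plays the role of one flow $f_i$; pairing it with a device label would give one element of $S$.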
In traditional federated learning, the client plays a crucial role and independently trains its local model [7]. Generally, the central server initially deploys a base model upon which the clients train using their local data. Once training is complete, clients send model updates rather than raw data to the central server for aggregation. To illustrate the algorithm model in detail, this paper assumes a large-scale IoT system with m heterogeneous intelligent local area networks acting as gateways, i.e., clients. Each client aims to preserve its own data privacy while effectively leveraging the knowledge derived from other clients’ data, thus incorporating the federated learning framework for collaborative training. During the federated learning phase, the IoT cloud center is responsible for coordinating and managing the entire learning process, including global model initialization, parameter aggregation, and model evaluation. Additionally, the collection of device traffic data that client i can store is denoted as:
$$F_i = \{(f_i^k, a_i^k) \mid k \in [1, N_i],\ a_i^k \in A_i\}$$
where $f_i^k$ represents unidirectional flow data, $a_i^k$ represents device type labels, $N_i$ is the current number of flow data instances owned by the client, and $A_i$ represents the set of device types within the current edge node. Assuming that the model parameters of the local model are $\theta$ and that $\ell_i(f_i^k, \theta)$ denotes the loss function evaluated by the client’s model for a single data instance $f_i^k$, the overall loss function $L_i(\theta)$ for the local model across all data instances of the $i$-th client can be expressed as follows:
$$L_i(\theta) = \frac{1}{N_i} \sum_{k=1}^{N_i} \ell_i(f_i^k, \theta)$$
The objective of traditional federated learning can be simplified into an optimization problem, as shown in Equation (3), below:
$$\min_{\theta} \frac{1}{m} \sum_{i=1}^{m} L_i(\theta, F_i)$$
This objective minimizes the average local loss across all clients, ensuring that the global model $\theta_{global}$ converges to a solution that performs well on all clients.
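As a small worked example of this objective, the sketch below evaluates the local loss $L_i(\theta)$ and the averaged global objective for a scalar parameter. The squared-error loss and the two toy clients are assumptions chosen only to keep the arithmetic visible; the paper's model uses a classification loss over traffic flows.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (feature, target): toy stand-in for (f_i^k, a_i^k)


def squared_error(sample: Sample, theta: float) -> float:
    """Per-instance loss l_i(f_i^k, theta) for a one-parameter linear model."""
    x, y = sample
    return (theta * x - y) ** 2


def local_loss(data: Sequence[Sample], theta: float) -> float:
    """L_i(theta): mean per-instance loss over the client's N_i samples."""
    return sum(squared_error(s, theta) for s in data) / len(data)


def global_objective(clients: List[Sequence[Sample]], theta: float) -> float:
    """(1/m) * sum_i L_i(theta, F_i): unweighted average over the m clients."""
    return sum(local_loss(data, theta) for data in clients) / len(clients)


clients = [
    [(1.0, 2.0), (2.0, 4.0)],  # client 1 follows y = 2x
    [(1.0, 4.0)],              # client 2 follows y = 4x
]
loss_at_2 = global_objective(clients, 2.0)  # perfect for client 1 only
loss_at_3 = global_objective(clients, 3.0)  # a compromise between both clients
```

With these toy clients, no single $\theta$ drives both local losses to zero, which is the kind of heterogeneity that motivates the personalized formulation discussed next.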
Federated learning yields high-performance models by minimizing the total loss function, making its distributed training crucial in cloud–edge collaborative environments. However, in IoT systems, edge nodes may struggle to obtain optimal models due to limitations in local data or computational capabilities. Centralizing all data in the cloud introduces privacy and security risks. Federated learning addresses this by exchanging information through a cloud center, leveraging local data patterns for model training. Nevertheless, the high heterogeneity of data from different sources can result in poor model performance on certain clients, affecting overall performance and fairness. Therefore, personalized federated learning has been proposed to retain each node’s data characteristics while enhancing the model’s generalization ability and adaptability.
This paper adopts personalized federated learning as the foundational approach, optimizing models locally to better meet the specific requirements and data characteristics of each client. This approach achieves true model convergence while preserving data privacy. The optimization objective of modern personalized federated learning techniques can be summarized as follows:
$$\min_{\{\theta_i\}_{i=1}^{m},\ \theta_0} \frac{1}{m} \sum_{i=1}^{m} \left( L_i(\theta_i, F_i) + \lambda_i \lVert \theta_i - \theta_0 \rVert \right)$$
In this equation, $\lambda_i$ represents the regularization parameter and $\theta_0$ denotes the reference model used for calibrating and adjusting the personalized model. Both the parameter set $\theta_i$ contained in the local model and the reference parameter set $\theta_0$ must be stored by the client to generate the personalized model. The additional parameter sets not only increase communication costs but also waste edge storage space by storing redundant information. Inspired by traditional centralized deep learning, where heterogeneous data distributed across different tasks can be used to train a universal representation that requires only a small subset of parameters for learning specific tasks, this paper proposes communicating only the globally shared universal parameters and retaining the personalized parameters locally. This approach eliminates the need for the reference parameter set typically required in personalized federated learning. Therefore, the optimization problem in Equation (5) can be rewritten as:
$$\min_{(s,\, p_i)} \frac{1}{m} \sum_{i=1}^{m} L_i(s, p_i, F_i)$$
where $s$ represents the shared parameters that are common across all clients and facilitate the cloud–edge model interaction, while $p_i$ denotes the personalized parameters specific to client $i$. The optimization involves adjusting both the shared parameters $s$ and the personalized parameters $p_i$ for each client.
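One way to picture the split into $s$ and $p_i$ is to partition a model's named parameters and upload only the shared part. The sketch below assumes that personalized parameters can be recognized by the substring `"adapter"` in their names; that naming rule, the parameter names, and the toy sizes are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]


def split_params(params: Params, personal_marker: str = "adapter") -> Tuple[Params, Params]:
    """Split named parameters into shared (s) and personalized (p_i) groups.

    Assumption: a parameter is personalized iff its name contains `personal_marker`.
    """
    shared = {name: w for name, w in params.items() if personal_marker not in name}
    personal = {name: w for name, w in params.items() if personal_marker in name}
    return shared, personal


def num_values(params: Params) -> int:
    """Number of scalar values, a rough proxy for upload size."""
    return sum(len(w) for w in params.values())


# Toy model: names and sizes are illustrative only.
model = {
    "encoder.0.attention.weight": [0.0] * 16,
    "encoder.0.adapter.down": [0.0] * 4,
    "encoder.0.adapter.up": [0.0] * 4,
    "classifier.weight": [0.0] * 8,
}
shared, personal = split_params(model)
upload_size = num_values(shared)    # sent to the cloud center
kept_local = num_values(personal)   # never leaves the client
```

Under this scheme a client stores a single copy of each parameter, and no separate reference set $\theta_0$ is needed.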

4. Methodology

This study proposes a region-personalized federated learning framework, termed FedRP, to address the challenges of IoT device classification. By leveraging the powerful feature-extraction capabilities of large-scale pretrained transformer models, FedRP enables accurate and robust device categorization. Unlike traditional centralized processing paradigms, FedRP effectively mitigates concerns related to data privacy and high hardware-based resource demands. Moreover, it improves communication efficiency and is well-suited for deployment on resource-constrained edge devices. A detailed framework of the FedRP model is illustrated in Figure 1.
Figure 1. The overview of the FedRP model.

4.1. Model Architecture

IoT device traffic is inherently sequential and stream-oriented, with strong temporal dependencies between adjacent packets. Therefore, in order to effectively identify devices under specific scenarios, the pretrained network must learn the interpacket associations present in labeled traffic inputs. This learning paradigm, widely employed in natural language processing (NLP), is commonly referred to as “fine-tuning.” Since the pretrained encoder is device-agnostic, it can be reused across various device types after fine-tuning.
Additionally, traffic flows generated by different IoT devices often exhibit fluctuations in packet lengths, introducing diversity in traffic patterns. To address this, we employ transformer-based modules to extract multidimensional representations from the input sequences. Specifically, we use pretrained transformer blocks to capture byte-level dependencies in device traffic. Let $X = [x_1, x_2, \ldots, x_k]$ denote the tokenized and labeled input data, which are first projected into the input embedding space using a trainable matrix $W_H$, as follows:
$$\hat{X} = X \times W_H$$
We then augment $\hat{X}$ with positional embeddings to form the final embedding matrix $E$, which is passed through a stack of $M$ transformer encoders. The operation of the $(N+1)$-th encoder block, where $N = 0, 1, \ldots, M-1$, begins by taking the input $H^N = [h_1^N, h_2^N, \ldots, h_k^N]$ (with $H^0 = E$) and applying linear projections to generate the corresponding key, query, and value vectors:
$$K^N = W_K H^N, \quad Q^N = W_Q H^N, \quad V^N = W_V H^N$$
where $W_K$, $W_Q$, and $W_V$ are the respective learnable projection matrices. These vectors are used to compute the core component of the transformer, the multi-head self-attention mechanism:
$$\mathrm{ATT}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_K}}\right) V$$
$$\mathrm{head}_i = \mathrm{ATT}(Q W_i^{Q}, K W_i^{K}, V W_i^{V})$$
$$\mathrm{MultiH}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_H) \cdot W^{O}$$
where $d_K$ denotes the dimensionality of $K$ and $\mathrm{head}_i$ is the output of the $i$-th attention head. $W^{O}$ is a learnable projection matrix used to combine the outputs of all heads. This is followed by a residual connection and layer normalization, after which a feed-forward network (FFN) with $F$ hidden units is applied to produce the final encoder output, as follows:
$$\mathrm{Output} = \mathrm{LayerNorm}\left(H^{N} + \mathrm{MultiH}(Q, K, V)\right)$$
$$H^{N+1} = \sum_{i=1}^{F} W_2^{i} \max\left(0, W_1^{i}\, \mathrm{Output} + b_1^{i}\right) + b_2$$
where $W_1$, $W_2$, $b_1$, and $b_2$ are learnable parameters of the FFN and $\max(0, x)$ denotes the ReLU activation function. The output embedding $H^{N+1}$ serves either as input to the next encoder or as the final representation $H^{T}$ used for device classification.
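To make the attention formula concrete, the following sketch implements single-head scaled dot-product attention with plain Python lists. It assumes a row-per-token layout (each row of `q`, `k`, and `v` is one token) and omits the learned projections, multiple heads, layer normalization, and the FFN; it is an illustration of the formula, not the model used in the experiments.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x d) and b (d x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """ATT(Q, K, V) = softmax(Q K^T / sqrt(d_K)) V, one row per token."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


# When all keys are identical, every query attends uniformly,
# so each output row is the mean of the value rows.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
uniform_out = attention(queries, keys, values)
```

A multi-head version would apply `attention` once per head to separately projected inputs and concatenate the results before the output projection.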
As shown in Figure 2, each transformer encoder module consists of two key components: the multi-head attention layer and the feed-forward network. Both are connected via linear projection layers that map the feature space and are followed by residual connections and normalization layers. To enable local data-driven personalization, we insert lightweight adapter modules within each sublayer to support customized adaptation.
Figure 2. Schematic diagram of the adapter module.
Given the resource limitations on edge devices, each adapter adopts a bottleneck architecture to minimize computational overhead. As shown in Figure 2, the adapter comprises two feed-forward layers: the first compresses the original $D$-dimensional feature into a lower $M$-dimensional subspace for task-specific adaptation, and the second projects it back to $D$ dimensions. By ensuring $M \ll D$, the adapter achieves efficient parameter usage while maintaining compatibility with edge hardware. A residual connection is also integrated to approximate identity mapping during early training stages, thereby preserving model stability and facilitating effective personalization.
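The bottleneck structure can be sketched as follows. Several details are assumptions made for illustration: the ReLU between the two layers, the zero initialization of the up-projection (which makes the adapter an exact identity map before training), and the hidden size $D = 768$. Only the bottleneck width $M = 64$ comes from the experimental settings.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """y = W x + b, where each row of `weight` produces one output value."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class Adapter:
    """Bottleneck adapter: D -> M -> D with a residual connection."""

    def __init__(self, d_model: int, bottleneck: int, down_init: float = 0.01):
        self.w_down = [[down_init] * d_model for _ in range(bottleneck)]  # M x D
        self.b_down = [0.0] * bottleneck
        # Assumed zero init: the adapter starts as an identity mapping.
        self.w_up = [[0.0] * bottleneck for _ in range(d_model)]  # D x M
        self.b_up = [0.0] * d_model

    def forward(self, h: Vector) -> Vector:
        z = relu(linear(h, self.w_down, self.b_down))  # compress to M dimensions
        up = linear(z, self.w_up, self.b_up)           # project back to D dimensions
        return [hi + ui for hi, ui in zip(h, up)]      # residual connection

    def num_params(self) -> int:
        d, m = len(self.w_up), len(self.w_down)
        return 2 * d * m + d + m


adapter = Adapter(d_model=768, bottleneck=64)  # D = 768 is a hypothetical size
hidden = [1.0] * 768
out = adapter.forward(hidden)
adapter_params = adapter.num_params()
dense_params = 768 * 768 + 768  # a full D x D layer, for comparison
```

Under these assumed sizes the adapter holds roughly one sixth of the parameters of a single dense $D \times D$ layer, which is the saving that $M \ll D$ is meant to provide.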

4.2. The Proposed FedRP Method

The regional personalized federated learning algorithm proposed in this paper combines the regional personalized updates of local models with the generation of a global recognition model. Algorithm 1 presents the pseudocode implementation of the FedRP algorithm. Algorithm 2 presents the implementation of local parameter updates in FedRP. Specifically, FedRP comprises the following steps. First, the cloud server utilizes the global model as the initialization model and distributes this original model to each client $i$. Next, each client $i$ uses its local dataset $F_i$ for local training, thereby updating the shared parameters of the local model. The local update of the shared parameters by client $i$ in the $t$-th communication round is represented as follows:
$$s_i^{t+1} = s_i^{t} - \eta \nabla_{s} L_i(s_i^{t}, p_i^{t}, F_i)$$
where $s_i^{t}$ represents the shared parameters of client $i$ in the $t$-th communication round and $\eta$ is the learning rate.
Each client $i$ also updates the personalized parameters of the local model using its local dataset $F_i$. The local update of the personalized parameters by client $i$ in the $t$-th communication round is represented as follows:
$$p_i^{t+1} = p_i^{t} - \eta \nabla_{p} L_i(s_i^{t}, p_i^{t}, F_i)$$
Algorithm 1 Regional Personalized Federated Learning (FedRP)
Require: $m$, $F = \{F_i \mid i \in [1, m]\}$, $T$, $p_i^0$, $E$, $\eta$
Ensure: The personalized models at the edge and the central global model
Center Aggregation:
Extract the initial shared parameters $s^0$.
for $t \leftarrow 0$ to $T - 1$ do
    The cloud center distributes the shared parameters $s^t$ to the clients, so that $s_i^t = s^t$.
    for $i \leftarrow 1$ to $m$ do
        $s_i^{t+1} = ClientUpdate(s_i^t, p_i^t, F_i)$
        Device $i$ uploads $s_i^{t+1}$ to the server.
    end for
    Model-parameter aggregation: $s^{t+1} = \frac{1}{m} \sum_{i=1}^{m} s_i^{t+1}$
end for
Algorithm 2 ClientUpdate
Require: $s_i^t$, $p_i^t$, $F_i$, $E$, $\eta$
Init $s_{i,0}^t = s_i^t$
Init $p_{i,0}^t = p_i^t$
for $k \leftarrow 0$ to $E - 1$ do
    Compute shared parameters: $s_{i,k+1}^t = s_{i,k}^t - \eta \nabla_{s} L_i(s_{i,k}^t, p_{i,k}^t, F_i)$
    Compute personalized parameters: $p_{i,k+1}^t = p_{i,k}^t - \eta \nabla_{p} L_i(s_{i,k}^t, p_{i,k}^t, F_i)$
end for
update $s_i^{t+1} = s_{i,E}^t$
update $p_i^{t+1} = p_{i,E}^t$
return $s_i^{t+1}$
Each client uploads the updated shared parameters $s_i^{t+1}$ to the IoT cloud center server while retaining the personalized parameters locally. The cloud center receives the shared parameter sets $S^{t+1} = \{s_i^{t+1} \mid i \in [1, m]\}$ provided by the clients. These parameters contain coarse-grained representations of the data from each client. The cloud center aggregates them by averaging, so the aggregation of the shared parameters in the $t$-th communication round is represented as follows:
$$s^{t+1} = \frac{1}{m} \sum_{i=1}^{m} s_i^{t+1}$$
After local training, each edge LAN client holds shared parameters and personalized parameters that together form its updated personalized model. These steps are repeated over multiple communication rounds. Eventually, the personalized models obtained by each client can be used for edge-node device management, while the global model can be utilized for macroscopic monitoring by the cloud center. In the algorithms, the number of clients is denoted as $m$, the dataset of client traffic as $F = \{F_i \mid i \in [1, m]\}$, the number of communication rounds as $T$, the initial value of the personalized parameters as $p_i^0$, the number of local client updates per round as $E$, and the learning rate as $\eta$.
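The interplay of server averaging and local updates can be sketched on a toy problem. The example assumes a two-parameter model $y = s \cdot x + p_i$ with a shared slope $s$ and a client-specific offset $p_i$, a squared-error loss, and full-batch gradient descent; these choices, along with the function names, are illustrative assumptions and do not reproduce the transformer model or the Adam optimizer used in the experiments.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y)


def gradients(s: float, p: float, data: Sequence[Sample]) -> Tuple[float, float]:
    """Gradients of the mean squared error of y = s*x + p with respect to s and p."""
    n = len(data)
    grad_s = sum(2.0 * (s * x + p - y) * x for x, y in data) / n
    grad_p = sum(2.0 * (s * x + p - y) for x, y in data) / n
    return grad_s, grad_p


def client_update(s: float, p: float, data: Sequence[Sample],
                  lr: float, local_steps: int) -> Tuple[float, float]:
    """E local steps; s and p are both updated from the same (s_k, p_k) point."""
    for _ in range(local_steps):
        grad_s, grad_p = gradients(s, p, data)
        s, p = s - lr * grad_s, p - lr * grad_p
    return s, p


def fedrp(clients: List[Sequence[Sample]], rounds: int,
          local_steps: int, lr: float) -> Tuple[float, List[float]]:
    """Server loop: broadcast s, collect updated copies, average them."""
    s_global = 0.0
    personal = [0.0] * len(clients)  # p_i^0, kept on the clients
    for _ in range(rounds):
        uploads = []
        for i, data in enumerate(clients):
            s_i, personal[i] = client_update(s_global, personal[i], data, lr, local_steps)
            uploads.append(s_i)      # only the shared parameter is uploaded
        s_global = sum(uploads) / len(uploads)
    return s_global, personal


# Both clients share the slope 2 but have different offsets (+1 and -3).
clients = [
    [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0)],
    [(-1.0, -5.0), (0.0, -3.0), (1.0, -1.0)],
]
s_final, p_final = fedrp(clients, rounds=50, local_steps=5, lr=0.1)
```

In this toy setting the averaged shared parameter recovers the common slope while each client's personalized offset fits its own data, even though the offsets are never transmitted.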

5. Evaluation

5.1. Datasets

The experiments in this section are primarily based on the UNSW-2016 IoT dataset curated by Sivanathan et al. [28], which captures network traffic from a wide range of IoT devices in a real-world environment. TP-Link routers were used as gateways, with traffic collected using the TCPdump tool and stored in PCAP format over a continuous 20-day period. The dataset includes 23 commonly used IoT device types. However, several devices were associated with extremely limited traffic, which may lead to significant class imbalance and reduced experimental stability. To ensure robustness and reproducibility, we excluded three device types with insufficient data and focused on the remaining twenty for training and evaluation. Additionally, the traffic flows were labeled according to device type, and MAC addresses were omitted to protect user privacy. The mapping between device labels and MAC addresses is shown in Table 1.
Table 1. UNSW-2016 DATASET.
To further validate the effectiveness and generalizability of the proposed approach, we introduced the USTC-TFC2016 dataset as a supplementary benchmark. This dataset was compiled by researchers at the University of Science and Technology of China (USTC) between 2011 and 2015 from real-world network environments. It consists of two parts: the first includes samples of ten types of malicious traffic, and the second includes ten types of benign traffic. USTC-TFC2016 comprises a total of 119,820 traffic records, from which we randomly selected 100,000 samples for training and evaluation. The details of the composition of the dataset are provided in Table 2.
Table 2. USTC-TFC2016 DATASET.
Because federated learning trains on data held by separate clients, each dataset must first be partitioned across the clients. To realistically simulate data heterogeneity in real-world IoT systems, this experiment employs three non-independent and identically distributed (non-IID) data patterns commonly used in the federated learning literature:
  • Balanced Dirichlet Distribution: Each client receives an equal number of samples, but the label distribution varies across clients, following a Dirichlet distribution with parameter $\alpha$. In this experiment, $\alpha$ is set to 0.01 or 0.1. The datasets generated from this distribution are labeled as $D_{B1}$, $D_{B2}$, $D_{B3}$, and $D_{B4}$ and are distributed to the clients.
  • Unbalanced Dirichlet Distribution: Each client receives a different number of samples, and the label distribution varies, also following a Dirichlet distribution with parameter $\alpha$. In this experiment, $\alpha$ is set to 0.1 or 1.0. The datasets generated from this distribution are labeled as $D_{U1}$, $D_{U2}$, $D_{U3}$, and $D_{U4}$ and are distributed to the clients.
  • Pathological Non-IID Distribution: Each client may have a different total number of samples, but the key feature is that each client possesses only samples from two completely distinct categories. The datasets generated from this distribution are labeled as $D_{P1}$, $D_{P2}$, $D_{P3}$, and $D_{P4}$ and are distributed to the clients.
Compared to the pathological non-IID setting, the unbalanced Dirichlet distribution more closely resembles real-world IoT applications. In practical IoT environments, each client participating in personalized federated learning typically has devices from a variety of categories, with potential overlap in categories between clients with different data volumes. This scenario is common in practice. However, the data distribution in the pathological non-IID setting is more extreme, as it contains only two categories, which creates a more challenging scenario for research purposes compared to the balanced and unbalanced Dirichlet distributions.
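A label-wise Dirichlet split of the kind described in the list above can be sketched as follows. This version draws, for every class, a Dirichlet proportion vector over clients and divides that class's samples accordingly, so client sizes generally differ (closest to the unbalanced variant). The function names are hypothetical, and the exact partitioning procedure used in the experiments may differ.

```python
import random
from collections import defaultdict
from typing import Dict, List


def dirichlet(alpha: float, size: int, rng: random.Random) -> List[float]:
    """Sample a symmetric Dirichlet(alpha) vector via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0.0:  # guard against underflow for very small alpha
        winner = rng.randrange(size)
        return [1.0 if i == winner else 0.0 for i in range(size)]
    return [d / total for d in draws]


def dirichlet_partition(labels: List[int], num_clients: int,
                        alpha: float, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to clients with class proportions ~ Dirichlet(alpha)."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        proportions = dirichlet(alpha, num_clients, rng)
        start, cumulative = 0, 0.0
        for c in range(num_clients):
            cumulative += proportions[c]
            # The last client takes the remainder so that no sample is dropped.
            end = len(indices) if c == num_clients - 1 else min(
                len(indices), round(cumulative * len(indices)))
            clients[c].extend(indices[start:end])
            start = end
    return clients


labels = [i % 5 for i in range(200)]  # toy labels: 5 classes, 40 samples each
parts = dirichlet_partition(labels, num_clients=4, alpha=0.1)
```

Smaller values of $\alpha$ concentrate each class on fewer clients, which is why $\alpha = 0.01$ yields a much more heterogeneous split than $\alpha = 1.0$.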

5.2. Experimental Settings

All experiments in this section were conducted on the Windows 11 operating system. The server configuration featured an Intel Core i9-13900KF processor, 128 GB of RAM, and an RTX-4090 GPU with CUDA 11.3 for parallel training acceleration. Python 3.9 was used as the programming language, and PyTorch 1.11.0 was employed as the deep learning framework. In all experiments, the Adam optimizer was used for training, with the learning rate set to 0.0001 and $\beta$ values set to (0.9, 0.999). The dimension $M$ of the FFN in the adapter was set to 64, which is consistent with widely adopted settings in the literature. Four clients participated in the federated learning process, which ran for 100 communication rounds between the server and the clients, with each client performing five local update rounds per communication round.
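For reproduction purposes, the hyperparameters reported above can be collected in one configuration object. The class and field names below are hypothetical; only the values come from the text, and the reading that the five local update rounds occur within each communication round is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FedRPConfig:
    """Reported training settings (field names are illustrative)."""
    num_clients: int = 4
    communication_rounds: int = 100
    local_update_rounds: int = 5
    learning_rate: float = 1e-4
    adam_betas: Tuple[float, float] = (0.9, 0.999)
    adapter_bottleneck_dim: int = 64

    def total_local_updates_per_client(self) -> int:
        # Assumes the local update rounds are repeated in every communication round.
        return self.communication_rounds * self.local_update_rounds


config = FedRPConfig()
updates_per_client = config.total_local_updates_per_client()
```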

5.3. Metrics

To evaluate the performance of IoT device identification, we use several key metrics: Accuracy (Acc), macro-averaged F1 score ($F1_{macro}$), average precision (Pre), and average recall (Rec). These metrics are defined as follows:
$$Acc = \frac{1}{20} \sum_{i=1}^{20} \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i}$$
Accuracy (Acc) measures the overall correctness of the model: for each class, the ratio of correctly classified instances (true positives and true negatives) to the total number of instances is computed, and the ratios are averaged across all classes.
$$Pre = \frac{1}{20} \sum_{i=1}^{20} \frac{TP_i}{TP_i + FP_i}$$
Precision (Pre) is the ratio of true positive predictions to the total number of positive predictions, averaged across all classes.
$$Rec = \frac{1}{20} \sum_{i=1}^{20} \frac{TP_i}{TP_i + FN_i}$$
Recall (Rec) is the ratio of true positive predictions to the total number of actual positive instances, averaged across all classes.
$$F1_{macro} = \frac{1}{20} \sum_{i=1}^{20} \frac{2 \times Precision_i \times Recall_i}{Precision_i + Recall_i}$$
Macro-averaged F1 Score ($F1_{macro}$) is the harmonic mean of Precision and Recall, calculated for each class and then averaged.
In these formulas, $TP$, $TN$, $FP$, and $FN$ denote true positives, true negatives, false positives, and false negatives, respectively. The division by 20 indicates averaging across 20 classes, reflecting a multi-class classification problem.
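The four metrics can be computed from a confusion matrix as in the sketch below, which generalizes the fixed factor of 20 to the number of classes in the matrix. It assumes rows index the true class and columns the predicted class, and it treats undefined ratios (zero denominators) as 0; both conventions are assumptions rather than details stated in the paper.

```python
from typing import Dict, List


def macro_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Class-averaged Acc, Pre, Rec, and macro F1 from a confusion matrix.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    acc = pre = rec = f1 = 0.0
    for i in range(n):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp
        fp = sum(confusion[r][i] for r in range(n)) - tp
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        acc += (tp + tn) / total
        pre += precision
        rec += recall
        f1 += 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"acc": acc / n, "pre": pre / n, "rec": rec / n, "f1_macro": f1 / n}


# Toy two-class example: 8 of 10 and 9 of 10 samples are classified correctly.
scores = macro_metrics([[8, 2], [1, 9]])
```

For this toy matrix, class 0 has $TP = 8$, $FN = 2$, $FP = 1$, and $TN = 9$, giving a per-class accuracy of $17/20 = 0.85$; class 1 yields the same value, so the averaged accuracy is 0.85.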

5.4. Experimental Results

To validate the effectiveness of FedRP, this experiment selects four categories of federated learning algorithms and fully local training as baselines: FedAvg [27], PACFL [41], FedProto [42], FedAP [43], and LOCAL. To validate the effectiveness of transformer-based blocks in IoT device identification, this experiment also selects five existing ML- and DL-based identification algorithms as benchmarks: NB [44], SVM [45], CNN [26], Audi [17], and DEFT [46]. To ensure fairness, each model is re-implemented with the parameters and configurations described in the corresponding research, thereby reproducing the results of these approaches. Additionally, all models are trained on the same training set and evaluated on the same test set.

5.4.1. Quantitative Evaluation

This section presents the experimental evaluation of the FedRP algorithm on the UNSW-2016 and USTC-TFC2016 datasets. The average accuracy across all client test sets is used as the evaluation criterion, with results presented in Table 3 and Table 4. The experimental results show that the FedRP algorithm outperforms other baseline methods on both datasets.
Table 3. Average accuracy and performance improvement on UNSW-2016.
Table 4. Average accuracy and performance improvement on USTC-TFC2016.
On the UNSW-2016 dataset, FedRP demonstrates superior performance in most cases, especially under conditions of pathological distributions. In such cases, the average accuracy of FedRP reaches 95.13%, which is significantly higher than those of FedAvg and other personalized federated learning methods. When α is set to 0.01, the accuracy of FedRP is 93.03%, further validating its ability to handle data heterogeneity and imbalanced distributions effectively. On the USTC-TFC2016 dataset, all algorithms perform significantly better than they do on the UNSW-2016 dataset, primarily because the UNSW-2016 dataset contains more noise, such as nonattack traffic or mixed traffic, which can interfere with model performance. Nonetheless, FedRP still performs very well on this dataset. Compared to FedAvg, FedRP shows a significant improvement in accuracy across all data distributions. For example, under pathological conditions, FedRP’s accuracy is 98.13%, approximately 7.22% higher than that of FedAvg. The results from both datasets demonstrate that FedRP consistently achieves high accuracy across different data distributions, confirming its effectiveness and stability in diverse scenarios.
Furthermore, while FedAP also achieves good performance on both datasets, it relies on additional local data to generate pretrained models to guide the personalization of local models. In contrast, FedRP assesses the relationships between clients without requiring extra local data. Unlike FedAvg or FedAP, FedRP does not aggregate parameters globally across all clients. Instead, it first performs local aggregation of shared parameters within regions to form regional representations; then, each client uses the adapter module for lightweight personalization. This approach retains the locality of the data while avoiding the overhead of large-scale parameter synchronization, making it more suitable for deployment on resource-constrained edge devices. Thus, FedRP is more efficient in environments with limited resources.

5.4.2. Local Comparison

In this section, we compare FedRP with locally trained centralized learning methods on the UNSW-2016 dataset. The local methods are trained on the first dataset, D_B1, which is constructed using a balanced Dirichlet distribution (α = 0.01). As shown in Figure 3, the personalized FedRP model improves substantially on methods such as NB, SVM, CNN, and others across the reported performance metrics. Specifically, compared to NB and SVM, FedRP improves both accuracy and F1 score by over 40%. Compared to DEFT, FedRP improves accuracy by more than 10% and F1 score by over 11%. Furthermore, the performance of the locally trained models is comparable to that of the FedRP models, further confirming the strong device-recognition capabilities of the LLM itself. This demonstrates that, within the federated learning framework, FedRP is capable of efficient and accurate device recognition.
Figure 3. The recognition accuracy and F1 score of local centralized learning on UNSW-2016.

5.4.3. Error Analysis

This section analyzes the recognition performance of FedRP across different datasets and data distributions, with error analysis conducted using confusion matrices. Figure 4 and Figure 5 present the average confusion matrices of FedRP on the UNSW-2016 and USTC-TFC2016 datasets under imbalanced Dirichlet distributions (α = 0.1 and α = 1.0). As shown in the figures, FedRP accurately identifies most device types, achieving recognition accuracies exceeding 90% in many cases. Misclassifications are sparse and occur with low probability. Notably, the recognition performance under α = 0.1 is slightly better than that under α = 1.0, which may be because increased data heterogeneity enhances the benefits of personalization, allowing models to better capture task-specific features. Although performance declines slightly under the less heterogeneous distribution (α = 1.0), the recognition accuracy for most devices remains consistently high. This robustness is largely attributed to the strong representational and generalization capabilities of the global model within the federated learning framework, which enables local personalized models to achieve competitive recognition results even when the degree of personalization is moderate.
Figure 4. Average confusion matrix on UNSW-2016. (a) The result under the imbalanced Dirichlet distribution with α = 0.1. (b) The result under the imbalanced Dirichlet distribution with α = 1.0.
Figure 5. Average confusion matrix on USTC-TFC2016. (a) The result under the imbalanced Dirichlet distribution with α = 0.1. (b) The result under the imbalanced Dirichlet distribution with α = 1.0.
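For readers who want to reproduce this kind of error analysis, the sketch below shows one plausible way to build an average confusion matrix across clients. It assumes, without confirmation from the source, that each client's matrix is row-normalized (so that the diagonal holds per-class recognition rates) and that the matrices are then averaged element-wise; all function names are illustrative.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def row_normalize(matrix):
    """Turn each row into rates; a class with no samples gives a zero row."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


def average_matrices(matrices):
    """Element-wise mean of equally sized square matrices."""
    n = len(matrices)
    k = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / n for j in range(k)]
        for i in range(k)
    ]


# Two hypothetical clients, two device classes (0 and 1).
client_a = row_normalize(confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], 2))
client_b = row_normalize(confusion_matrix([0, 0, 1, 1], [0, 0, 1, 0], 2))
avg = average_matrices([client_a, client_b])
```

One caveat of this simple mean: a client that holds no samples of a class contributes a zero row, which lowers the averaged diagonal entry. Under strongly non-IID partitions it may be preferable to average each row only over the clients that actually hold that class.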

5.4.4. Training Validation

Figure 6 illustrates the training loss curves of four representative devices in FedRP under a balanced Dirichlet distribution with α = 0.01. For better visualization, only the first 350 training epochs are displayed. As observed, all four devices exhibit rapid convergence, with loss values consistently decreasing over time. This downward trend indicates that the model is progressively learning meaningful representations from the data during training. Moreover, the absence of pronounced signs of overfitting or underfitting suggests that the personalized federated learning approach effectively mitigates model drift and maintains training stability. A closer inspection of the curves reveals periodic small peaks occurring every five epochs. These fluctuations correspond to the communication rounds between clients and the central server. During these rounds, the server distributes updated global parameters, prompting local models to adjust accordingly and thus resulting in transient instability. Nevertheless, the curves quickly return to stable trajectories after each communication round, reflecting the robustness and adaptability of the FedRP framework.
Figure 6. Loss functions of local models under a balanced Dirichlet distribution with α = 0.01: (a) local model 1; (b) local model 2; (c) local model 3; (d) local model 4.
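The periodic peaks can be reproduced qualitatively with a toy simulation. The sketch below is not the FedRP training procedure: it assumes a single scalar parameter per client, a quadratic local loss with a client-specific optimum (standing in for data heterogeneity), plain gradient descent, and a synchronization step every five epochs that replaces each client's parameter with the mean over clients. The function name and all constants are hypothetical.

```python
def simulate_loss(local_optima, total_epochs=20, local_epochs=5, lr=0.3):
    """Return one per-epoch loss list for each client in a toy federation.

    Client i minimizes (w - local_optima[i]) ** 2 by gradient descent.
    After every `local_epochs` epochs, all clients are reset to the mean
    parameter, mimicking the distribution of aggregated parameters.
    """
    weights = [0.0 for _ in local_optima]
    history = [[] for _ in local_optima]
    for epoch in range(1, total_epochs + 1):
        for i, target in enumerate(local_optima):
            grad = 2.0 * (weights[i] - target)
            weights[i] -= lr * grad
            # Loss is logged after the local step of this epoch.
            history[i].append((weights[i] - target) ** 2)
        if epoch % local_epochs == 0:
            mean_w = sum(weights) / len(weights)
            weights = [mean_w for _ in weights]
    return history


history = simulate_loss([1.0, 3.0])
```

In this toy setting the loss of each client falls within a round, jumps in the first epoch after a synchronization (epochs 6, 11, and 16), and then falls again, which matches the shape of the transient peaks described above.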

5.4.5. Ablation Study

This section investigates the impact of different local training epochs and learning rates on the performance of the FedRP model. Table 5 presents the average performance of FedRP on the UNSW-2016 dataset under a balanced Dirichlet distribution with α = 0.01. As Table 5 shows, the performance metrics of FedRP first improve and then decrease as the number of local training epochs increases. There may be two reasons for this trend. On the one hand, with more local epochs, each client trains repeatedly on its own local dataset, which can lead to overfitting. On the other hand, because the local data are highly heterogeneous, clients with more local epochs are more likely to focus on updating their personalized parameters, leading to a phenomenon known as “knowledge forgetting,” particularly with small sample sizes: the model forgets previously learned knowledge, which hinders the development of effective personalized models. Therefore, this experiment adopts five local training epochs.
Table 5. Performance comparison of FedRP under different local epochs (%).
Figure 7 compares the performance of FedRP under different learning rates on the UNSW-2016 dataset. The experimental results show that the performance of the model changes significantly with the learning rate. The model performs best when the learning rate is set to 1 × 10⁻⁴, achieving an accuracy of 94.92% and an F1 score of 0.9501. Lower learning rates (such as 1 × 10⁻⁵ and 1 × 10⁻⁶) yield relatively high accuracy but lower F1 scores (0.9326 and 0.9221, respectively), indicating possible underfitting. When the learning rate is raised to 1 × 10⁻³, the accuracy drops to 82.32% and the F1 score to 0.8342, suggesting that the model may have overfitted at this high learning rate. Therefore, when selecting the learning rate, it is crucial to consider both the model’s generalization ability and the stability of the training process.
Figure 7. Performance comparison of FedRP under different learning rates on UNSW-2016.
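The gap between accuracy and F1 score noted in the learning-rate comparison is easiest to see on a small imbalanced example. The sketch below computes both metrics from label lists; it assumes a macro-averaged F1 score, which the text does not specify, and the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


# Nine samples of a common device type, one of a rare type;
# the model predicts the common type for everything.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, 2)
```

Here the accuracy is 0.9 while the macro F1 score is 9/19 (about 0.47), because the rare class is never recognized. A model that underfits minority device types can therefore show high accuracy and a noticeably lower F1 score.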

6. Conclusions

This study proposed a privacy-preserving personalized IoT device-identification model suitable for cloud–edge collaborative systems, aiming to achieve high-precision classification and data-privacy security. Through the construction of a cloud–edge collaborative system model, the identification algorithm was deployed at both edge nodes and central nodes. The FedRP model, based on federated learning and personalized fine-tuning, ensured efficient performance and protection of data privacy. FedRP utilized a federated learning framework to gather comprehensive information from multiple participants, addressing functional differences and data heterogeneity across network nodes. By incorporating adapters into the transformer model, FedRP obtained personalized and shared parameters, uploading only the shared parameters to reduce communication overhead. Experimental results demonstrated that FedRP struck a balance between performance in device identification and privacy security, making it suitable for device-identification tasks in large-scale cloud–edge collaborative IoT systems.

Author Contributions

Conceptualization, Y.J. and B.X.; methodology, B.C.; validation, B.Z. and Y.L.; formal analysis, J.W. (Jiacheng Wang) and F.G.; investigation, Y.J.; writing—original draft preparation, B.C. and J.W. (Junfei Wang); writing—review and editing, Y.L. and B.X.; visualization, Y.L.; supervision, J.W. (Junfei Wang) and Y.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the Key Research and Development Program of Hubei Province, China under Grants 2024BAB031, 2024BAB016.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Kim, H.; Feamster, N. Improving network management with software defined networking. IEEE Commun. Mag. 2013, 51, 114–119. [Google Scholar] [CrossRef]
  2. Ghimire, B.; Rawat, D.B. Recent Advances on Federated Learning for Cybersecurity and Cybersecurity for Federated Learning for Internet of Things. IEEE Internet Things J. 2022, 9, 8229–8249. [Google Scholar] [CrossRef]
  3. Peng, K.; Wang, L.; He, J.; Cai, C.; Hu, M. Joint Optimization of Service Deployment and Request Routing for Microservices in Mobile Edge Computing. IEEE Trans. Serv. Comput. 2024, 17, 1016–1028. [Google Scholar] [CrossRef]
  4. Zha, Z.; He, J.; Zhen, L.; Yu, M.; Dong, C.; Li, Z.; Wu, G.; Zuo, H.; Peng, K. A BiGRU Model Based on the DBO Algorithm for Cloud-Edge Communication Networks. Appl. Sci. 2024, 14, 10155. [Google Scholar] [CrossRef]
  5. Baccour, E.; Mhaisen, N.; Abdellatif, A.A.; Erbad, A.; Mohamed, A.; Hamdi, M.; Guizani, M. Pervasive AI for IoT Applications: A Survey on Resource-Efficient Distributed Artificial Intelligence. IEEE Commun. Surv. Tutor. 2022, 24, 2366–2418. [Google Scholar] [CrossRef]
  6. Deng, T.; Xu, X.; Zou, Z.; Liu, W.; Wang, D.; Hu, M. Multidrone Parcel Delivery via Public Vehicles: A Joint Optimization Approach. IEEE Internet Things J. 2024, 11, 9312–9323. [Google Scholar] [CrossRef]
  7. Arisdakessian, S.; Wahab, O.A.; Mourad, A.; Otrok, H.; Guizani, M. A Survey on IoT Intrusion Detection: Federated Learning, Game Theory, Social Psychology, and Explainable AI as Future Directions. IEEE Internet Things J. 2023, 10, 4059–4092. [Google Scholar] [CrossRef]
  8. Xu, B.; Guo, J.; Ma, F.; Hu, M.; Liu, W.; Peng, K. On the Joint Design of Microservice Deployment and Routing in Cloud Data Centers. J. Grid Comput. 2024, 22, 42. [Google Scholar] [CrossRef]
  9. Peng, K.; Xie, J.; Wei, L.; Hu, J.; Hu, X.; Deng, T.; Hu, M. Clustering-Based Collaborative Storage for Blockchain in IoT Systems. IEEE Internet Things J. 2024, 11, 33847–33860. [Google Scholar] [CrossRef]
  10. Hu, M.; Guo, Z.; Wen, H.; Wang, Z.; Xu, B.; Xu, J.; Peng, K. Collaborative Deployment and Routing of Industrial Microservices in Smart Factories. IEEE Trans. Ind. Inform. 2024, 20, 12758–12770. [Google Scholar] [CrossRef]
  11. Wang, L.; Li, Z.; Wang, C.; Li, J.; Hu, M.; Liu, W.; Peng, K. Obstacle-Aware Multicast Routing Algorithm for Large-Scale LEO Constellations. IEEE Trans. Netw. Sci. Eng. 2024, 11, 4551–4563. [Google Scholar] [CrossRef]
  12. Habbal, A.; Ali, M.K.; Abuzaraida, M.A. Artificial Intelligence Trust, Risk and Security Management (AI TRiSM): Frameworks, applications, challenges and future research directions. Expert Syst. Appl. 2024, 240, 122442. [Google Scholar] [CrossRef]
  13. Peng, K.; He, J.; Guo, J.; Liu, Y.; He, J.; Liu, W.; Hu, M. Delay-Aware Optimization of Fine-Grained Microservice Deployment and Routing in Edge via Reinforcement Learning. IEEE Trans. Netw. Sci. Eng. 2024, 11, 6024–6037. [Google Scholar] [CrossRef]
  14. Kebande, V.R.; Awad, A.I. Industrial Internet of Things Ecosystems Security and Digital Forensics: Achievements, Open Challenges, and Future Directions. ACM Comput. Surv. 2024, 56, 1–37. [Google Scholar] [CrossRef]
  15. Xu, Q.; Zheng, R.; Saad, W.; Han, Z. Device Fingerprinting in Wireless Networks: Challenges and Opportunities. IEEE Commun. Surv. Tutor. 2016, 18, 94–104. [Google Scholar] [CrossRef]
  16. Sivanathan, A.; Gharakheili, H.H.; Sivaraman, V. Detecting Behavioral Change of IoT Devices Using Clustering-Based Network Traffic Modeling. IEEE Internet Things J. 2020, 7, 7295–7309. [Google Scholar] [CrossRef]
  17. Marchal, S.; Miettinen, M.; Nguyen, T.D.; Sadeghi, A.R.; Asokan, N. AuDI: Toward Autonomous IoT Device-Type Identification Using Periodic Communication. IEEE J. Sel. Areas Commun. 2019, 37, 1402–1412. [Google Scholar] [CrossRef]
  18. Meidan, Y.; Bohadana, M.; Shabtai, A.; Ochoa, M.; Tippenhauer, N.O.; Guarnizo, J.D.; Elovici, Y. Detection of Unauthorized IoT Devices Using Machine Learning Techniques. arXiv 2017, arXiv:1709.04647. [Google Scholar]
  19. Aksoy, A.; Gunes, M.H. Automated IoT Device Identification using Network Traffic. In Proceedings of the ICC 2019—2019 IEEE International Conference on Communications (ICC), Shanghai, China, 20–24 May 2019; pp. 1–7. [Google Scholar]
  20. Desai, B.A.; Divakaran, D.M.; Nevat, I.; Peter, G.W.; Gurusamy, M. A feature-ranking framework for IoT device classification. In Proceedings of the 2019 11th International Conference on Communication Systems & Networks (COMSNETS), Bengaluru, India, 7–11 January 2019; pp. 64–71. [Google Scholar]
  21. Yeganeh, Y.; Farshad, A.; Boschmann, J.; Gaus, R.; Frantzen, M.; Navab, N. FedAP: Adaptive Personalization in Federated Learning for Non-IID Data. In Distributed, Collaborative, and Federated Learning, and Affordable AI and Healthcare for Resource Diverse Global Health; Albarqouni, S., Bakas, S., Bano, S., Cardoso, M.J., Khanal, B., Landman, B., Li, X., Qin, C., Rekik, I., Rieke, N., et al., Eds.; Springer: Cham, Switzerland, 2022; pp. 17–27. [Google Scholar]
  22. Bao, J.; Hamdaoui, B.; Wong, W.K. IoT Device Type Identification Using Hybrid Deep Learning Approach for Increased IoT Security. In Proceedings of the 2020 International Wireless Communications and Mobile Computing (IWCMC), Limassol, Cyprus, 15–19 June 2020; pp. 565–570. [Google Scholar]
  23. Xu, H.; Zhang, Z.; Yu, X.; Wu, Y.; Zha, Z.; Xu, B.; Xu, W.; Hu, M.; Peng, K. Targeted Training Data Extraction—Neighborhood Comparison-Based Membership Inference Attacks in Large Language Models. Appl. Sci. 2024, 14, 7118. [Google Scholar] [CrossRef]
  24. Yin, F.; Yang, L.; Wang, Y.; Dai, J. IoT ETEI: End-to-End IoT Device Identification Method. In Proceedings of the 2021 IEEE Conference on Dependable and Secure Computing (DSC), Fukushima, Japan, 30 January–2 February 2021; pp. 1–8. [Google Scholar]
  25. Sánchez Sánchez, P.M.; Huertas Celdrán, A.; Bovet, G.; Martínez Pérez, G. Adversarial attacks and defenses on ML- and hardware-based IoT device fingerprinting and identification. Future Gener. Comput. Syst. 2024, 152, 30–42. [Google Scholar] [CrossRef]
  26. Liu, Y.; Wang, J.; Li, J.; Song, H.; Yang, T.; Niu, S.; Ming, Z. Zero-Bias Deep Learning for Accurate Identification of Internet-of-Things (IoT) Devices. IEEE Internet Things J. 2021, 8, 2627–2634. [Google Scholar] [CrossRef]
  27. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A.y. Communication-Efficient Learning of Deep Networks from Decentralized Data. Proc. Mach. Learn. Res. 2017, 54, 1273–1282. [Google Scholar]
  28. Feng, K.; Luo, L.; Xia, Y.; Luo, B.; He, X.; Li, K.; Zha, Z.; Xu, B.; Peng, K. Optimizing Microservice Deployment in Edge Computing with Large Language Models: Integrating Retrieval Augmented Generation and Chain of Thought Techniques. Symmetry 2024, 16, 1470. [Google Scholar] [CrossRef]
  29. Wang, J.; Liu, Q.; Liang, H.; Joshi, G.; Poor, H.V. Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization. Adv. Neural Inf. Process. Syst. 2020, 33, 7611–7623. [Google Scholar]
  30. Wu, F.; Guo, S.; Qu, Z.; He, S.; Liu, Z.; Gao, J. Anchor Sampling for Federated Learning with Partial Client Participation. Proc. Mach. Learn. Res. 2023, 202, 37379–37416. [Google Scholar]
  31. Li, Z.; Lin, T.; Shang, X.; Wu, C. Revisiting Weighted Aggregation in Federated Learning with Neural Networks. Proc. Mach. Learn. Res. 2023, 202, 19767–19788. [Google Scholar]
  32. Palihawadana, C.; Wiratunga, N.; Wijekoon, A.; Kalutarage, H. FedSim: Similarity guided model aggregation for Federated Learning. Neurocomputing 2022, 483, 432–445. [Google Scholar] [CrossRef]
  33. Lim, W.Y.B.; Luong, N.C.; Hoang, D.T.; Jiao, Y.; Liang, Y.C.; Yang, Q.; Niyato, D.; Miao, C. Federated Learning in Mobile Edge Networks: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2020, 22, 2031–2063. [Google Scholar] [CrossRef]
  34. He, Z.; Yin, J.; Wang, Y.; Gui, G.; Adebisi, B.; Ohtsuki, T.; Gacanin, H.; Sari, H. Edge Device Identification Based on Federated Learning and Network Traffic Feature Engineering. IEEE Trans. Cogn. Commun. Netw. 2022, 8, 1898–1909. [Google Scholar] [CrossRef]
  35. Zhang, W.; Lu, Q.; Yu, Q.; Li, Z.; Liu, Y.; Lo, S.K.; Chen, S.; Xu, X.; Zhu, L. Blockchain-Based Federated Learning for Device Failure Detection in Industrial IoT. IEEE Internet Things J. 2021, 8, 5926–5937. [Google Scholar] [CrossRef]
  36. Mothukuri, V.; Khare, P.; Parizi, R.M.; Pouriyeh, S.; Dehghantanha, A.; Srivastava, G. Federated-Learning-Based Anomaly Detection for IoT Security Attacks. IEEE Internet Things J. 2022, 9, 2545–2554. [Google Scholar] [CrossRef]
  37. Xu, G.; Xu, S.; Fan, X.; Cao, Y.; Mao, Y.; Xie, Y.; Chen, X.B. RAT Ring: Event Driven Publish/Subscribe Communication Protocol for IIoT by Report and Traceable Ring Signature. IEEE Trans. Ind. Inform. 2025, 1–9. [Google Scholar] [CrossRef]
  38. Cao, Z.; Huang, L.; Wang, T.; Wang, Y.; Shi, J.; Zhu, A.; Shi, T.; Snoussi, H. Understanding the Dimensional Need of Noncontrastive Learning. IEEE Trans. Cybern. 2025, 1–14. [Google Scholar] [CrossRef]
  39. Peng, K.; Liao, T.; Liao, X.; Xie, J.; Xu, B.; Deng, T.; Hu, M. DCMM: Dynamic Cluster-Based Mobile Node Migration Scheme for Blockchain Collaborative Storage in Mobile IoT Networks. IEEE Trans. Netw. Sci. Eng. 2025, 12, 584–598. [Google Scholar] [CrossRef]
  40. Alneyadi, S.; Sithirasenan, E.; Muthukkumarasamy, V. A survey on data leakage prevention systems. J. Netw. Comput. Appl. 2016, 62, 137–152. [Google Scholar] [CrossRef]
  41. Vahidian, S.; Morafah, M.; Wang, W.; Kungurtsev, V.; Chen, C.; Shah, M.; Lin, B. Efficient Distribution Similarity Identification in Clustered Federated Learning via Principal Angles between Client Data Subspaces. Proc. AAAI Conf. Artif. Intell. 2023, 37, 10043–10052. [Google Scholar] [CrossRef]
  42. Tan, Y.; Long, G.; Liu, L.; Zhou, T.; Lu, Q.; Jiang, J.; Zhang, C. FedProto: Federated Prototype Learning across Heterogeneous Clients. Proc. AAAI Conf. Artif. Intell. 2022, 36, 8432–8440. [Google Scholar] [CrossRef]
  43. Lu, W.; Wang, J.; Chen, Y.; Qin, X.; Xu, R.; Dimitriadis, D.; Qin, T. Personalized Federated Learning with Adaptive Batchnorm for Healthcare. arXiv 2022, arXiv:2112.00734. [Google Scholar] [CrossRef]
  44. Chakraborty, B.; Divakaran, D.M.; Nevat, I.; Peters, G.W.; Gurusamy, M. Cost-Aware Feature Selection for IoT Device Classification. IEEE Internet Things J. 2021, 8, 11052–11064. [Google Scholar] [CrossRef]
  45. Hamad, S.A.; Zhang, W.E.; Sheng, Q.Z.; Nepal, S. IoT Device Identification via Network-Flow Based Fingerprinting and Learning. In Proceedings of the 2019 18th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/13th IEEE International Conference on Big Data Science And Engineering (TrustCom/BigDataSE), Rotorua, New Zealand, 5–8 August 2019; pp. 103–111. [Google Scholar]
  46. Thangavelu, V.; Divakaran, D.M.; Sairam, R.; Bhunia, S.S.; Gurusamy, M. DEFT: A Distributed IoT Fingerprinting Technique. IEEE Internet Things J. 2019, 6, 940–952. [Google Scholar] [CrossRef]