Semi-Supervised Learning with Entropy Filtering for Intrusion Detection in Asymmetrical IoT Systems

Alturki, Badraddin; Alsulami, Abdulaziz A.

doi:10.3390/sym17060973


by Badraddin Alturki ¹ and Abdulaziz A. Alsulami ²,*

¹ Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia

² Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia

* Author to whom correspondence should be addressed.

Symmetry 2025, 17(6), 973; https://doi.org/10.3390/sym17060973

Submission received: 9 May 2025 / Revised: 12 June 2025 / Accepted: 17 June 2025 / Published: 19 June 2025

(This article belongs to the Special Issue Symmetry and Asymmetry in Cyber Security, IoTs and Privacy)


Abstract

The growth of Internet of Things (IoT) systems has brought serious security concerns, especially in asymmetrical environments where device capabilities and communication flows vary widely. Many machine-learning-based intrusion detection systems struggle to address the noise, uncertainty, and class imbalance found in real-world data and therefore require intensive data preprocessing. In this work, we introduce a semi-supervised learning approach that uses entropy-based uncertainty filtering to improve intrusion detection in IoT environments. By dynamically identifying uncertain predictions from tree-based classifiers, we retain only high-confidence results during training. Later, confident samples from the uncertain set are used to retrain the model through a self-training loop. We evaluate this method using three diverse benchmark datasets, RT-IoT2022, CICIoT2023, and CICIoMT2024, which include up to 34 different attack types. The experimental results reveal that XGBoost and Random Forest outperformed other tree-based models while maintaining their robustness when predicting attacks in the IoT environment. In addition, our proposed model was compared with other models proposed by researchers in the field, and the findings confirmed that our model presented promising results.

Keywords: intrusion detection systems; semi-supervised learning; internet of things; machine learning; deep learning; entropy

1. Introduction

The Internet of Things (IoT) has grown rapidly in recent years, revolutionizing several sectors, including smart cities, transportation, healthcare, and manufacturing [1]. The IoT is a network of interconnected devices that can gather, exchange, and analyze data to support automation and decision-making [2]. IoT systems combine sensors, computational power, software, and other technologies that allow them to connect to other devices and systems through networks. For example, IoT software can be installed on connected devices such as appliances, wearable health equipment, smart home systems, and smart factory devices [3]. IoT technologies are deployed in many domains, including smart homes, smart cities, smart agriculture, smart industries, security, healthcare, and digital consumers [4]. The real-time communication abilities of IoT systems require quick responses from IoT devices and collaboration with cloud services when high-quality performance is needed [5]. In 2024, IoT Analytics [6] forecasted that the number of connected IoT devices would reach around 40 billion by 2030. The number of internet-connected devices is growing significantly due to the demand for IoT networks [7].

However, the rapid growth of the IoT has created significant security challenges, making IoT systems more frequent targets for attackers [8]. IoT networks face security challenges including malicious attacks, unauthorized access, denial of service, and information theft. Another challenge in IoT environments is adversarial attacks, in which attackers change the input data to mislead machine-learning algorithms. Changes to input attributes might result in inaccurate classifications that allow malicious traffic to avoid detection, which makes these kinds of attacks challenging for machine-learning- and deep-learning-based IDSs [9]. These challenges need to be addressed at the software development stage, which underscores the importance of building inherently secure software systems [3]. Therefore, it is important to design and implement an effective intrusion detection system (IDS) to protect IoT applications from potential threats and to ensure their consistent accessibility and integrity [10]. An IDS can monitor and track the behavior of IoT devices on the network while detecting abnormal, potentially malicious activities [11]. The IDS can alert the administrator when an attack is found, so that the detected attacks can be analyzed to decide which further actions are needed to prevent them [12]. The IDS also relies on efficient monitoring of network traffic and on security techniques to ensure the security of the IoT environment [13]. In IoT environments, IDSs provide real-time detection of online threats to IoT devices and the devices linked to them [14]. Since IoT devices produce vast volumes of data via sensors, machine-learning (ML) models are often considered the most effective way to implement IDSs because of their capability to process such volumes of IoT data [15]. In addition to processing the enormous volume of data that users produce on a regular basis, ML techniques can be used to detect attacks in IoT systems [16]. Researchers have proposed numerous methods utilizing ML and deep-learning (DL) approaches to identify cybersecurity threats in the IoT environment [17]. However, an ML model may be biased towards disproportionately represented attack types when the dataset is highly unbalanced [3]. Researchers are motivated to explore DL techniques further because it becomes difficult for conventional machine-learning methods to classify the different attack types in complex datasets with many attributes [3]. DL uses novel representations of the data extracted from the input features to distinguish between different types of attacks [18]. DL can recognize known and unknown attack patterns by investigating past attack patterns, observing data dependencies, and extracting information from large, dynamic volumes of data [19]. However, DL requires high computational resources, which could be limited in IoT systems [20].

In recent years, algorithmic advances and parallel hardware infrastructure have enabled artificial neural networks to process enormous amounts of data with impressive results, especially when DL algorithms are used to support IDSs for IoT devices [21]. However, many studies have demonstrated that ML models, including DL models, often generate unreasonably high per-class classification scores. This includes cases in which instances from known classes are incorrectly assigned to unknown classes. Since an IDS needs to deal with novel types of attacks or variants of well-known attacks, this makes it challenging to implement a reliable ML-based IDS [22]. Although supervised learning techniques have been used in state-of-the-art IDSs with labeled datasets, semi-supervised learning (SSL) techniques are also used for intrusion detection because of their capacity to use both labeled and unlabeled data, as in [23]. However, noisy pseudo-labels are frequently a problem with current SSL techniques and can hinder learning. Therefore, there is a need for a reliable IDS that uses the power of ML and DL while handling uncertainty in the classification process.

The primary objective of this research is to develop a robust IDS for IoT environments by leveraging entropy-based uncertainty estimation and semi-supervised learning. The proposed model integrates tree-based classifiers with dynamic entropy thresholds to identify uncertain predictions, which are selectively included in a self-training process using high-confidence samples. This approach is validated across three realistic and diverse IoT-related datasets (RT-IoT2022, CICIoT2023, and CICIoMT2024) to demonstrate its effectiveness in enhancing classifier performance, generalization, and resilience to label noise in security-critical applications.

The main contributions of this research can be summarized as follows:

We introduce an entropy-based uncertainty filtering mechanism calculated dynamically to identify uncertain predictions. This technique ensures only confident predictions are used during the initial supervised training phase.
We use a self-training model that leverages high-confidence uncertain predictions as pseudo-labeled samples for semi-supervised retraining. This enables the model to improve its generalization using additional unlabeled or low-confidence data.
We evaluate the proposed model across three IoT intrusion detection datasets: RT-IoT2022, CICIoT2023, and CICIoMT2024. The datasets include up to 34 attack types, providing a comprehensive benchmark for multiclass intrusion detection in IoT environments. Additionally, we compare our proposed model with existing models.
We present a detailed evaluation of semi-supervised classifiers for cyberattack detection, including per-class performance analysis to highlight each model’s strengths and limitations in classifying diverse attack types.

The rest of the paper is organized as follows: Section 2 discusses Related Works in the field of IDS in IoT environments. Section 3 presents the proposed Methodology, including datasets, model architecture, and entropy-based filtering. Section 4 provides the experimental Results and Discussion. Finally, Section 5 provides Conclusions for the paper and outlines possible directions for future work.

2. Related Works

Several studies have used traditional supervised machine-learning algorithms such as random forests (RF), support vector machines (SVM), and decision trees (DT) to implement IDSs in the IoT environment. While these models have achieved acceptable results on balanced datasets, their performance might decline when they are applied to real-world datasets that have high levels of noise and class imbalance. Furthermore, these techniques depend on the availability of large labeled datasets, which are often impractical to obtain in IoT systems. Commonly used datasets in the literature include the RT-IoT2022 dataset [24], the CICIoT2023 dataset [25], and the CICIoMT2024 dataset [26], which are used in this research for benchmarking purposes. In this section, we review recent related works that use the same datasets in IoT environments.

The researchers in [27] provided a comparative performance study of several ML models to detect intrusions in IoT networks like extremely randomized trees (ERT), k-nearest neighbors (KNN), XGBoost (XGB), SVM, gradient boosting (GB), DT and RF. They evaluated the abovementioned models by utilizing the RT-IoT2022 dataset. The study analyzed the results and the performance of each classifier, and the outcome was that all classifiers had achieved acceptable results. However, ERT, RF, and XGBoost outperformed other models. The ERT can be considered the most effective classifier for real-time detection, with an achieved accuracy of 99.7%. The authors stated that ensemble models outperform other models in IoT-based IDS.

The authors in [28] presented a hybrid deep-learning architecture for detecting attacks in real-world IoT environments. Their proposed method combines bidirectional long short-term memory (BLSTM), gated recurrent units (GRU), attention mechanisms, and convolutional neural networks (CNN). The aim of this integration was to enhance the classification and detection of complex threats while considering the practical limitations of various IoT systems. They evaluated their model using the RT-IoT2022 dataset, which covers various devices, operations, and attacks. Their proposed model outperformed conventional classifiers with an accuracy of 99.62%. The authors indicate that their model strengthens the reliability and privacy of IoT infrastructure while demonstrating the ability of DL to improve IoT security by providing a dependable and adaptable response to evolving cybersecurity threats.

The study in [29] proposed a combined method for real-time intrusion detection, which integrates deep-learning models with principal component analysis (PCA) and feature selection techniques to improve accuracy and computational efficiency. In the feature selection process, they integrated five methods, namely gain ratio (GR), correlation-based feature subset selection (CFS), Pearson analysis, symmetrical uncertainty (SU), and information gain (IG), with PCA to reduce feature dimensionality and improve prediction performance. They used three classifiers, namely deep neural networks (DNNs), artificial neural networks (ANNs), and TabNet, which were evaluated using the RT-IoT2022 dataset. The authors used three experimental setups: a classifier without feature selection techniques, a classifier with feature selection techniques, and a classifier with feature selection techniques and PCA. Their findings show that the setup combining a classifier with feature selection techniques and PCA achieved the best accuracy. The highest accuracies achieved were as follows: ANN and Pearson with PCA achieved 99.7%, TabNet and Pearson with PCA achieved 99.3%, and DNN and Pearson with PCA achieved 99.6%.

Another study in [3] presented a federated learning IDS that uses a one-dimensional CNN for effective and precise intrusion detection in IoT networks. The authors also addressed the issue of privacy preservation (PP) in federated learning using three techniques: homomorphic encryption, Diffie–Hellman key exchange, and differential privacy. They evaluated the effectiveness of their proposal through experiments on seven public IoT datasets: RT-IoT 2022, CIC IoT 2023, CIC IoMT 2024, TON-IoT, IoT-23, BoT-IoT, and EdgeIIoT. Their findings show that the CNN-based method outperformed other classifiers by achieving an accuracy of 97.31%. Their CNN model achieved average accuracies of 99.63% on the TON-IoT dataset, 99.99% on the IoT-23 dataset, 100.00% on the BoT-IoT dataset, 93.96% on the CIC IoT 2023 dataset, 95.63% on the CIC IoMT 2024 dataset, 99.47% on the RT-IoT 2022 dataset, and 89.12% on the EdgeIIoT dataset.

The work in [30] proposed an intrusion detection model utilizing DL models based on CICIoT2023. They addressed the challenges of high dimensionality using techniques to reduce the size and enhance efficiency. They used an optimization algorithm called gaining–sharing knowledge (GSK), multilayer perceptron (MLP), and autoencoder (AE) algorithm. The goal of the GSK optimization method is to determine the percentage of important information in the data. MLP and AE are two deep-learning techniques that are utilized to build a model for intrusion detection with high accuracy and lower testing times. The MLP technique was initially used to train the dataset, followed by AE. The findings of the paper showed that the MLP classifier achieved an accuracy of 99.26%, and the AE achieved an accuracy of 98.76% in binary classification. The MLP model achieved an accuracy of 97.46%, and the AE model achieved an accuracy of 83.81% in multiclassification.

The researchers in [31] evaluated the performance of the three models—CNN, LSTM, and DNN—for classifying and detecting attacks in cybersecurity in IoT networks, and they evaluated the models using the CICIoT2023 dataset. Their findings showed that CNN outperforms DNN and LSTM in terms of accuracy rate and computational effectiveness. The CNN achieved an accuracy of 99.10% (multiclass classification) and 99.40% (binary classification). The LSTM achieved an accuracy of 85.98% (multiclass classification) and 99.36% (binary classification). DNN achieved an accuracy of 99.02% (multiclass classification) and 99.38% (binary classification).

The authors in [32] proposed a model called the deep-learning multilayer perceptron intrusion detection and prevention system model (DLMIDPSM) for real-time intrusion detection and prevention in the IoT, utilizing the CICIoT2023 dataset. They combined conventional ML and DL techniques, including MLP, CNN, RNN, and ANN. They used a topology called the intrusion detection and prevention system topology (IDPST) to prevent and block cybersecurity threats in real time with tools such as iptables. The ML techniques used were DT and SVM, while the DL techniques were CNN, RNN, and ANN.

Researchers in [26] proposed a practical benchmark dataset called the CICIoMT2024 dataset to fill the gaps in the development and evaluation of IoMT security solutions. Additionally, they employed widely used ML techniques, including AdaBoost (ADA), logistic regression (LR), deep neural networks, and RF. These ML models are used to detect and classify the attacks and are evaluated using the proposed dataset. The authors concentrated on three classification approaches: binary, categorical, and multiclass classification. The findings showed that the binary class classification results achieved the highest accuracy among others. In binary classification, the accuracy of each classifier is as follows: ADA—99.6%, LR—99.5%, DNN—99.6%, and RF—99.6%. In multiclass classification, the accuracy of each classifier is as follows: ADA—42.2%, LR—72.7%, DNN—72.9%, and RF—73.3%.

The study in [33] proposed a CNN IDS called DCNN, which can be used for analyzing and detecting attacks in smart networks. The DCNN method is applied to three datasets (CICIoT2023, CICIDS-2017, and CICIoMT2024) that are benchmarking datasets and compared with other State-of-the-Art models. In addition, the DCNN model achieved better performance than comparable intrusion detection systems in terms of accuracy, F1 measure, positive predicted value, and true positive rate. The accuracy of DCNN is as follows: CICIoT2023—99.5% (binary) and 99.25% (multiclass), CICIDS-2017—99.96% (binary) and 99.96% (multiclass), and CICIoMT2024—99.98% (binary) and 99.86% (multiclass).

The authors in [34] proposed a model that was an enhanced version of the LSTM deep-learning algorithm to detect different types of attacks on the Internet of Medical Things (IoMT) devices. The proposed algorithm was evaluated and compared with alternative approaches using the CICIoMT2024 dataset that has a variety of equipment types and associated attacks. The findings show that the suggested method produced a promising result for IoMT environments that achieved an accuracy of 98%.

Table 1 provides a summary of related studies for intrusion detection, along with their corresponding techniques and reported accuracy.

Most of the reviewed research indicates that ML models, including DL models, can produce excessively high scores for each class. In addition, no attention was given to cases in which ML or DL models incorrectly assign samples from known classes to unknown classes. This behavior creates a significant challenge for the deployment of a trustworthy ML-based IDS, since an IDS has to handle new attack types or variations of well-known attacks.

3. Methodology

This section illustrates the materials and methods used in this research. First, it explains the datasets that are used to test and evaluate the proposed model. Second, it discusses the proposed model in detail.

3.1. Datasets

3.1.1. RT-IoT2022

The RT-IoT2022 dataset [24] encompasses three benign classes and nine adversarial network behaviors. It mimics real-world scenarios for the IoT environment. The normal network traffic in the dataset includes ThingSpeak-LED, Wipro-Bulb, and MQTT-Temp. ThingSpeak is an IoT cloud platform that is used to visualize stream data [35], and the LED states are monitored by ThingSpeak. Wipro-Bulb is a smart bulb that can be controlled remotely by mobile devices over Wi-Fi. MQTT is an IoT communication protocol and was used to publish temperature values. The rest of the classes belong to various types of attacks that were simulated using four victim devices (ThingSpeak-LED, Wipro-Bulb, MQTT-Temp, and Amazon Alexa) and a router. Additionally, there are three attacker devices, a Raspberry Pi and two virtual machines, that can perform attack scenarios. Table 2 presents information about each class included in the dataset with a brief description.

3.1.2. CICIoT2023

The CICIoT2023 dataset [25] was created based on real-time experiments for the IoT environment. It provides benign IoT traffic and large-scale IoT cyberattacks covering thirty-three different attack types. Therefore, CICIoT2023 helps researchers evaluate security analytics and ML or DL models for IoT intrusion detection systems. The dataset contains seven main attack categories: DDoS, Reconnaissance, Spoofing, DoS, Mirai, Brute-Force, and Web-Based. A total of 105 devices were used in the experiments that produced the CICIoT2023 dataset. Table 3 lists each class category and subcategory with a brief description.

3.1.3. CICIoMT2024

The CICIoMT2024 dataset [26] is a real-time experimental dataset that provides a realistic benchmark dataset for the IoMT environment. It includes eighteen attack types that were generated from forty IoMT devices. There are five main attack categories: DDoS, DoS, Reconnaissance, MQTT, and spoofing. The main difference between CICIoMT2024 and CICIoT2023 is that CICIoMT2024 focuses more on IoT healthcare devices. Table 4 provides summarized information about the category and subcategory of each class in the dataset.

It is important to note that all three datasets are imbalanced. Therefore, they need to be balanced to obtain reliable results.

3.2. Proposed Model Components

3.2.1. Training and Testing Classifiers

Multiple supervised learning models, including DT, gradient boosting classifier (GBC), RF, XGBoost, and extremely randomized trees (XRT), were trained and tested in this research. The reason for focusing on tree-based classifiers in this research is that they can be integrated seamlessly with entropy-based uncertainty estimation, as they naturally generate well-defined class probabilities at the leaf nodes. Furthermore, tree-based classifiers offer greater interpretability compared to other classifiers, such as neural networks and SVMs, which have limited interpretability and are often considered black-box models.

Each model was trained and tested using the aforementioned three datasets. Before training a classifier on a dataset, a preprocessing procedure is performed, which includes balancing classes of each dataset, feature scaling, and encoding labels. As was mentioned above, the datasets used in this work suffer from class imbalance, and to increase the robustness of the classification, the SMOTE oversampling technique was used to balance the number of samples in each class. SMOTE generates synthetic samples for minority classes [36]. It is worth mentioning that SMOTE was applied only to the labeled training data before running the entropy-based filtering step.

Feature scaling normalizes feature values to the range [0, 1]. It ensures that all features are treated equally during training, which allows for more effective training. Label encoding converts categorical labels into numeric values so that the classifier can process them during training and evaluation. After that, the dataset is split into two groups, 70% for training and 30% for testing, and then the training and testing of each classifier proceed.
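The scaling, label-encoding, and splitting steps described above can be sketched in plain Python as follows. This is a minimal illustration only: the helper names, the toy records, and the class names are hypothetical, the seed of 42 is borrowed from the classifier settings reported later, and SMOTE oversampling is deliberately left out because it would normally come from a dedicated library.

```python
import random

def min_max_scale(rows):
    """Rescale each feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def encode_labels(labels):
    """Map each distinct label to an integer, in sorted label order."""
    mapping = {name: i for i, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping

def split_70_30(X, y, seed=42):
    """Shuffle sample indices and keep the first 70% for training."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(round(0.7 * len(idx)))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

# Toy records: two features and three hypothetical class names.
X_raw = [[10, 200], [20, 400], [30, 200], [40, 600], [50, 1000],
         [10, 800], [20, 200], [30, 400], [40, 400], [50, 600]]
y_raw = ["benign", "ddos", "benign", "scan", "ddos",
         "scan", "benign", "ddos", "scan", "benign"]

X_scaled = min_max_scale(X_raw)
y_enc, label_map = encode_labels(y_raw)
X_tr, y_tr, X_te, y_te = split_70_30(X_scaled, y_enc)
```

The order used here (scale, encode, then split) follows the order in which the text lists the steps; whether the original experiments fitted the scaler before or after splitting is not stated.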

3.2.2. Entropy-Based Uncertainty Detection

Entropy measures the uncertainty or randomness in a system or dataset. High entropy indicates that the outcome is highly unpredictable, while low entropy indicates that the outcome is more predictable. In this research, entropy is used to measure uncertainty in the prediction probabilities of a classifier, so it works as a filter to identify uncertain predictions. A prediction whose entropy is above a dynamic threshold is marked as uncertain and is later considered for self-training if its confidence is still high.

Entropy is defined mathematically in Equation (1) [37], where n represents the number of classes in a dataset and p_{i} is the predicted probability of class i.

H(p) = - \sum_{i=1}^{n} p_{i} \log_{b} (p_{i})

(1)
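As an illustration of Equation (1), the entropy of a single predicted probability vector could be computed as in the sketch below. The function name is hypothetical, and the logarithm base b = 2 is an assumption, since the text does not state the base.

```python
import math

def prediction_entropy(probs, base=2.0):
    """Shannon entropy of one predicted class-probability vector.

    Terms with p == 0 are skipped, since p * log(p) tends to 0 as p -> 0.
    The base defaults to 2 (an assumption; the source leaves b unspecified).
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)

confident = [1.0, 0.0, 0.0, 0.0]     # all probability on one class
uniform = [0.25, 0.25, 0.25, 0.25]   # no preference among four classes

h_confident = prediction_entropy(confident)   # lowest possible value
h_uniform = prediction_entropy(uniform)       # highest value for four classes
```

A fully confident prediction gives zero entropy, and a uniform prediction over n classes gives the maximum, log_b(n).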

The threshold is selected dynamically using Equation (2), where μ_{H(p)} refers to the average of entropy and σ_{H(p)} is the standard deviation of entropy. Calculating the average of entropy gives the center of the entropy distribution, while the standard deviation shows how spread out the values are. λ is a tuning constant used to control the sensitivity of the entropy-based uncertainty threshold. In our case, it is set to 0.5 to select only samples that are uncertain (moderate–high) without being overly strict in the filtering process.

DynamicThreshold = μ_{H(p)} + λ \cdot σ_{H(p)}

(2)

This dynamic threshold is calculated after training the classifier; therefore, each classifier may have a different threshold.
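A minimal sketch of Equation (2) is given below. The function name and the entropy values are hypothetical, and the use of the population standard deviation is an assumption, because the text does not say which estimator was used.

```python
import statistics

def dynamic_threshold(entropies, lam=0.5):
    """Return mean(entropy) + lam * std(entropy) for one classifier."""
    mu = statistics.mean(entropies)
    sigma = statistics.pstdev(entropies)  # population std (an assumption)
    return mu + lam * sigma

# Hypothetical entropy values for eight predictions of one classifier.
entropies = [0.0, 0.0, 0.1, 0.1, 0.2, 0.2, 1.0, 2.4]

threshold = dynamic_threshold(entropies)  # lam = 0.5, as in the text
uncertain = [i for i, h in enumerate(entropies) if h > threshold]
```

With these toy values the mean is 0.5 and the threshold lands a little below 0.9, so only the last two predictions are flagged as uncertain. Because the statistics come from each classifier's own entropy distribution, a different classifier on the same data would generally obtain a different threshold.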

Using a fixed threshold has some drawbacks, such as relying on a single entropy value for every dataset. Also, each classifier has its own way of measuring certainty, as class probabilities are assigned to each sample differently. These probabilities are used to compute entropy, which measures how confident the model is about its predictions. In addition, the dynamic threshold was selected because it generalizes better, whereas a fixed threshold can be biased. Table 5 shows the range of entropy values that can be expected for each dataset. The minimum and maximum possible entropy values can be calculated using Equation (1) based on the number of classes in the dataset. The minimum possible entropy value (~0.0) means that the model is completely confident in its prediction and assigns full probability to a single class. The maximum possible entropy value (~3.58 for RT-IoT2022 and CICIoMT2024 and ~5.09 for CICIoT2023) represents the worst-case scenario, where the model is completely uncertain and assigns equal probability to all classes.
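The two bounds can be derived directly from Equation (1). The short derivation below assumes a logarithm base of b = 2, which the text does not state but which reproduces the quoted maxima; n = 12 corresponds to the three benign and nine attack classes of RT-IoT2022, and n = 34 to the thirty-three attack types of CICIoT2023 plus benign traffic.

```latex
% One-hot prediction (p_k = 1, all other p_i = 0), taking 0 \log 0 = 0:
H_{\min} = -1 \cdot \log_{b}(1) = 0

% Uniform prediction (p_i = 1/n for every class):
H_{\max} = -\sum_{i=1}^{n} \frac{1}{n} \log_{b}\!\left(\frac{1}{n}\right) = \log_{b}(n)

% Assuming b = 2:
n = 12: \quad H_{\max} = \log_{2}(12) \approx 3.585
n = 34: \quad H_{\max} = \log_{2}(34) \approx 5.087
```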

Table 6 summarizes the entropy threshold that was calculated dynamically for each classifier on each dataset. In Section 4, those values are explained together with the corresponding accuracy; they are mentioned here to show the outcomes of Equations (1) and (2).

3.2.3. Self-Training of Classifiers

Each classifier is retrained using a self-training approach, which is a form of semi-supervised learning [38]. Initially, the classifier is trained on high-confidence labeled data selected using an entropy-based uncertainty filtering mechanism. Predictions whose entropy exceeds the computed threshold are considered uncertain and are excluded from this initial training phase.

After the initial training, the classifier evaluates the uncertain samples and identifies those with high confidence based on their predicted class probabilities, even though their entropy was relatively high. These predictions are treated as pseudo-labels assigned by the model to data points that were not initially trusted and are conditionally accepted based on their confidence levels.

The high-confidence pseudo-labeled samples are then combined with the original training dataset, and the classifier is retrained (self-training) on this extended set. This process allows the model to learn from additional data points, especially those near decision boundaries or previously considered ambiguous. By doing so, the model becomes more robust and achieves better generalization. Self-training, in this context, enables the classifier to improve iteratively by leveraging its own confident predictions.
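One round of the self-training procedure described above could look like the following sketch. It assumes a model object with `fit` and `predict_proba` methods in the scikit-learn style and integer labels 0..n-1, so that a probability column index equals the class label. The function name, the 0.8 default cutoff, and the tiny one-dimensional `CentroidModel` are hypothetical stand-ins used only to make the example runnable; they are not the classifiers used in the paper.

```python
def self_training_round(model, X_train, y_train, X_uncertain, min_conf=0.8):
    """Fit, pseudo-label the confident uncertain samples, then refit."""
    model.fit(X_train, y_train)
    X_extra, y_extra = [], []
    for x, probs in zip(X_uncertain, model.predict_proba(X_uncertain)):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] > min_conf:
            X_extra.append(x)
            y_extra.append(best)  # pseudo-label = most probable class
    # Retrain on the original labeled data plus the accepted pseudo-labels.
    model.fit(list(X_train) + X_extra, list(y_train) + y_extra)
    return model, len(X_extra)

class CentroidModel:
    """Hypothetical two-class, one-feature stand-in classifier."""

    def fit(self, X, y):
        self.centroids = [
            sum(x for x, c in zip(X, y) if c == k) / y.count(k)
            for k in (0, 1)
        ]
        return self

    def predict_proba(self, X):
        out = []
        for x in X:
            d0 = abs(x - self.centroids[0])
            d1 = abs(x - self.centroids[1])
            total = d0 + d1
            # The closer centroid receives the larger probability.
            out.append([0.5, 0.5] if total == 0 else [d1 / total, d0 / total])
        return out

X_train, y_train = [0.0, 1.0, 9.0, 10.0], [0, 0, 1, 1]
X_uncertain = [1.5, 5.0, 8.8]   # 5.0 sits on the decision boundary

model, n_added = self_training_round(CentroidModel(), X_train, y_train, X_uncertain)
```

In this toy run the two samples near a centroid are accepted as pseudo-labels and shift the centroids slightly, while the boundary sample at 5.0 is left out of the extended training set.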

The proposed IDS addresses the challenge of model uncertainty in supervised learning by integrating a semi-supervised approach built on top of multiple tree-based classification algorithms. This model incorporates entropy-based uncertainty filtering and high-confidence self-training to obtain robust and reliable classifiers for critical security or classification scenarios.

3.3. Proposed Model Flow of Operations

Figure 1 illustrates the flow of operations of our proposed model. It begins with loading and preprocessing each dataset and ends with model retraining and evaluation. Initially, each dataset is loaded and preprocessed independently. In this stage, feature values are normalized to ensure uniform influence across all classifiers. In addition, labels are encoded into numerical values that are compatible with ML classifiers.

After that, each dataset is partitioned, an essential step when training ML classifiers: 70% of the data is used for training, and 30% is reserved for testing. Then, each supervised classifier is trained independently on the training set, and probabilistic predictions are produced for the test set.

Following this stage, each classifier computes entropy for its prediction probabilities to identify uncertain samples. A dynamic entropy threshold is calculated based on the distribution of entropy values. Samples with entropy values greater than the threshold are considered uncertain and are excluded from the initial performance evaluation.

Among the uncertain samples, those with high-confidence predictions are identified based on their maximum predicted class probability. Specifically, only uncertain samples for which the top predicted class probability exceeds 0.8 are considered reliable enough for pseudo-labeling and self-training. This strategy minimizes the risk of introducing noise, as some uncertain samples may still include instances for which the classifier makes high-confidence predictions. The classifier is then retrained (self-training) on an extended training set, which includes both the original labeled samples and the high-confidence uncertain samples. This ensures that only trusted predictions are incorporated, thereby boosting the model’s learning capacity and overall robustness. Finally, testing accuracy, evaluation metrics including precision, recall, and F1-score, as well as five-fold cross-validation, were calculated and reported to ensure robust performance evaluation.
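The filtering rule in the two paragraphs above can be sketched as a three-way partition. The function name and the sample values are hypothetical; only the "entropy greater than the threshold" test and the 0.8 confidence cutoff come from the text.

```python
def partition_predictions(entropies, max_probs, threshold, min_conf=0.8):
    """Split sample indices into certain / pseudo-label / rejected groups."""
    certain, pseudo, rejected = [], [], []
    for i, (h, p) in enumerate(zip(entropies, max_probs)):
        if h <= threshold:
            certain.append(i)      # kept for the initial evaluation
        elif p > min_conf:
            pseudo.append(i)       # uncertain, but top class is confident
        else:
            rejected.append(i)     # uncertain and not confident enough
    return certain, pseudo, rejected

# Hypothetical per-sample entropy and top-class probability.
entropies = [0.05, 0.30, 1.30, 2.10, 0.95]
max_probs = [0.99, 0.95, 0.82, 0.40, 0.70]

certain, pseudo, rejected = partition_predictions(
    entropies, max_probs, threshold=0.9
)
```

High entropy and a high top-class probability can coexist when there are many classes. For instance, with twelve classes, a prediction that puts 0.82 on one class and spreads the remaining 0.18 evenly over the other eleven has an entropy of roughly 1.3 (assuming base-2 logarithms), which is the situation modeled by the third sample.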

4. Results and Discussion

This section provides the experimental Results and Discussion of this research. It begins with providing the system configuration that is used to conduct the experiments. Next, it explains each classifier’s parameters. After that, the classification performance of our proposed model is evaluated using the three datasets, RT-IoT2022, CICIoT2023, and CICIoMT2024. In addition, we compared the performance of multiple tree-based models to evaluate their classification performance and computational time. The Results and Discussion of the proposed model are reported in separate subsections for each dataset. In the end, we compared our proposed model with existing IoT environment methods.

4.1. Experimental Environment

The experiments were conducted on a personal computer running Windows 11 Pro 64-bit. The system is equipped with an Intel Core i9-14900K processor operating at a base clock of 3.2 GHz and 32 GB of RAM. It utilizes an NVIDIA GeForce RTX 4090 GPU that features 24 GB of dedicated VRAM and a total of 40 GB of GPU memory.

4.2. Classifiers Parameters

4.2.1. DT

The DT classifier was implemented with the following parameter settings. The splitting criterion was set to gini, which measures the impurity of a node to determine the quality of a split. The splitter strategy was set to best, which lets the algorithm choose the best split at each node. The minimum number of samples required to split an internal node (min_samples_split) was set to 2. The minimum number of samples required to be at a leaf node (min_samples_leaf) was set to 1. The random_state parameter was set to 42 to ensure the reproducibility of results.

4.2.2. GBC

The GBC classifier was implemented using its default configuration, with the random_state parameter set to 42. The model used the friedman_mse criterion to measure the quality of splits in the individual trees. It utilized a learning rate of 0.1 to control the contribution of each tree and prevent overfitting. The number of boosting stages (n_estimators) was set to 100. The maximum depth of each individual regression tree (max_depth) was set to 3, which keeps the individual trees simple. The minimum number of samples required to split an internal node (min_samples_split) and to be at a leaf node (min_samples_leaf) were 2 and 1, respectively. The subsample ratio was maintained at 1.0, meaning each boosting stage used the entire training dataset.

4.2.3. RF

The RF classifier was configured with n_estimators set to 100. The criterion used for measuring the quality of splits was the Gini impurity. The minimum number of samples required to split an internal node was set to 2, and at least one sample was required for a node to be considered a leaf. The random_state parameter was set to 42.

4.2.4. XGBoost

For the XGBoost classifier, the number of boosting trees (n_estimators) was set to the default value of 100, and the learning rate (learning_rate) remained at its default value of 0.3. The model used the gradient-boosted decision tree booster (booster = ‘gbtree’) with the default maximum tree depth (max_depth = 6). The objective function was set automatically to ‘binary:logistic’ or ‘multi:softprob’, depending on the number of classes in the encoded labels. The random seed was fixed (random_state = 42). The regularization parameters (reg_alpha and reg_lambda) and the subsampling controls (subsample, colsample_bytree) were left at their default values, so no regularization or row and feature subsampling beyond the library defaults was imposed.

4.2.5. XRT

For the XRT classifier, the random_state parameter was set to 42. The model constructs an ensemble of 100 decision trees (n_estimators = 100). The criterion used to measure split quality is Gini impurity. The min_samples_split was set to 2, and the min_samples_leaf was set to 1.

The value of λ was kept fixed at 0.5 across all models to ensure a fair comparison and to observe how each model responds under the same controlled entropy-thresholding condition.
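For reference, the settings reported in this subsection can be collected in one place. The sketch below is only a bookkeeping aid under stated assumptions: the parameter names match scikit-learn and XGBoost conventions, but the class names in the comments (including the reading of XRT as scikit-learn's ExtraTreesClassifier) and the names SEED, LAMBDA, and CLASSIFIER_PARAMS are assumptions, not details confirmed by the text.

```python
# Hypothetical mapping of the reported settings to keyword arguments.
SEED = 42      # random_state used for every classifier
LAMBDA = 0.5   # fixed value of lambda used in the entropy threshold

CLASSIFIER_PARAMS = {
    # e.g. sklearn.tree.DecisionTreeClassifier(**params) (assumed class)
    "DT": dict(criterion="gini", splitter="best",
               min_samples_split=2, min_samples_leaf=1, random_state=SEED),
    # e.g. sklearn.ensemble.GradientBoostingClassifier(**params) (assumed class);
    # friedman_mse is the split-quality criterion in scikit-learn
    "GBC": dict(criterion="friedman_mse", learning_rate=0.1, n_estimators=100,
                max_depth=3, min_samples_split=2, min_samples_leaf=1,
                subsample=1.0, random_state=SEED),
    # e.g. sklearn.ensemble.RandomForestClassifier(**params) (assumed class)
    "RF": dict(n_estimators=100, criterion="gini",
               min_samples_split=2, min_samples_leaf=1, random_state=SEED),
    # e.g. xgboost.XGBClassifier(**params) (assumed class)
    "XGBoost": dict(n_estimators=100, learning_rate=0.3, booster="gbtree",
                    max_depth=6, random_state=SEED),
    # e.g. sklearn.ensemble.ExtraTreesClassifier(**params) (assumed class)
    "XRT": dict(n_estimators=100, criterion="gini",
                min_samples_split=2, min_samples_leaf=1, random_state=SEED),
}
```

Keeping the seed and λ as shared constants mirrors the stated goal of comparing all models under the same controlled conditions.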

4.3. Result of RT-IoT2022 Dataset

The results of the proposed model on the RT-IoT2022 dataset are shown in Table 7, which reports, for each model, the selected entropy threshold, the accuracy of supervised training on certain predictions only, and the overall accuracy after semi-supervised retraining. As mentioned earlier, the entropy values for RT-IoT2022 range from approximately 0 to a maximum of 3.58. For each model, a dynamic entropy threshold was computed using Equation (2), which serves as a cutoff point to distinguish between certain and uncertain predictions. Any prediction with an entropy value below the threshold is considered reliable and used in the initial supervised training phase, while those above the threshold are flagged as uncertain. The selected entropy threshold varies across models, which reflects the differences in the prediction confidence distributions of the classifiers. For instance, DT has a very low threshold of 0.0080, which indicates that its predictions are generally very confident. In contrast, GBC has a higher threshold of 0.1081, which suggests a broader range of uncertainty in its outputs.
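For reference, the entropy range quoted above follows from the standard definition of Shannon entropy. The expression below is a sketch under assumptions: it uses a base-2 logarithm and K classes, and the value K = 12 is inferred from the reported maximum of 3.58 rather than stated in this section. The paper's own notation may differ, and the threshold formula in Equation (2) is not reproduced here.

```latex
H(\mathbf{p}) = -\sum_{k=1}^{K} p_k \log_2 p_k,
\qquad 0 \le H(\mathbf{p}) \le \log_2 K,
\qquad \log_2 12 \approx 3.585
```

The lower bound is reached when one class receives probability 1, and the upper bound when all K classes are equally likely.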

The third column of the table reports the classification accuracy of each classifier evaluated on certain predictions only, which tests how effectively the entropy-based filtering identifies reliable outputs. The results indicate that the filtering accomplished its purpose, since the classifiers achieved 100% classification accuracy on this subset.

In the next phase, the uncertain predictions were revisited, and only those with high confidence scores (greater than 0.8) were reselected and treated as pseudo-labeled data to be retrained. These high-confidence uncertain samples were then added to the original training set, and the model was retrained. The fourth column reflects the final accuracy of each model after the retraining process. While DT retained its perfect accuracy at 1.00, GBC and XGBoost slightly decreased to 0.99, and RF and XRT settled at 0.97 as they retrained using a broader dataset that included not only certain predictions but also additional pseudo-labeled samples.

The table highlights how entropy-based filtering integrated with self-training can enhance model learning while maintaining high accuracy, especially when the confidence threshold is carefully selected.

The classification performance of the five semi-supervised learning classifiers is illustrated in Figure 2. Precision, recall, and F1-score values were reported for each classifier. These metrics are used to assess the effectiveness of each classifier in terms of its classification accuracy.

The results indicate that DT achieved perfect performance on every metric, showing that it classified all instances correctly, without any false positives or false negatives, after semi-supervised retraining. Similarly, GBC and XGBoost exhibited strong performance, with perfect precision and F1-score, although their recall was slightly lower at approximately 0.99.

In contrast, RF and XRT classifiers showed comparatively lower recall values, around 0.97, and corresponding F1-scores of approximately 0.98 and 0.99, respectively. Despite this minor drop, both models still demonstrated robust classification capability, suggesting that their semi-supervised learning process was effective, even though they are slightly more sensitive to noise in pseudo-labeled data.

The decrease in accuracy for RF and XRT is likely due to the inclusion of noisy pseudo-labeled samples during self-training. Although we applied a confidence threshold of 0.8, some samples with incorrect labels could still be included. These mislabels can negatively affect models that are sensitive to training noise. This highlights a limitation of self-training, where high confidence does not always guarantee label correctness.

Overall, the figure confirms that semi-supervised learning, when paired with entropy-based uncertainty filtering, can lead to highly accurate classification results and good generalization across various tree-based models on the RT-IoT2022 dataset.

The per-class classification performance of the five semi-supervised learning models on the RT-IoT2022 dataset is presented in Figure 3. It shows how well each model classifies the individual attack types and benign categories.

Among all classifiers, DT achieved perfect classification accuracy across all classes, which reflects its strong generalization even when trained in a semi-supervised manner. Meanwhile, the other classifiers were unable to correctly classify the ARP Poisoning attack, likely due to a failure to learn its underlying pattern.

However, for most classes, all classifiers achieved near-perfect classification accuracy.

Slight performance variations were observed in a few specific classes. For instance, DDoS_Slowloris and Wipro_bulb show slightly lower performance in some models, particularly in RF, indicating potential challenges in recognizing these classes under the self-training setting.

The reduced classification accuracy for certain classes, such as Wipro_bulb and ARP_Poisoning, is likely due to extreme class imbalance and limited data representation before applying SMOTE. For example, the dataset initially included only 253 samples for Wipro_bulb, 37 for Metasploit_Brute_Force_SSH, and 28 for NMAP_FIN_SCAN, in contrast to over 94,000 samples for DoS_SYN_Hping. This imbalance limits model learning, especially during semi-supervised retraining. Additionally, these classes may suffer from feature overlap or noise, which further affects recognition accuracy. Despite these issues, the models still achieved high overall performance, confirming the robustness of the proposed approach for most classes.
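To make the oversampling step mentioned above more concrete, the following sketch shows the core idea of SMOTE: a synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration under assumptions. The function name, the toy points, and the values of k and the seed are hypothetical, and the SMOTE settings used in the study are not given in this section.

```python
import random

def smote_like_samples(minority, n_new, k=2, seed=42):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority samples
        others = [m for m in minority if m is not x]
        neighbours = sorted(
            others, key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m))
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

# Toy minority class with four two-dimensional samples (hypothetical data).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like_samples(minority, n_new=5)
```

Because every synthetic point lies between two existing minority samples, oversampling a class with only a few dozen samples adds volume but little new information, which is consistent with the weaker results reported for the rarest classes.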

This detailed per-class comparison confirms that entropy-based semi-supervised learning not only maintains high overall accuracy but also performs reliably across a diverse set of attack categories, making it highly suitable for intelligent intrusion detection in IoT environments.

4.4. Result of CICIoT2023 Dataset

Table 8 presents the performance summary of the semi-supervised learning classifiers used on the CICIoT2023 dataset.

Notably, GBC, RF, XGBoost, and XRT models had significantly higher entropy thresholds (ranging from 0.5584 to 1.0545), which indicates that they generate more uncertain predictions compared to DT, which had a much lower threshold of 0.0171. This means the DT classifier uses stricter filtering, only accepting highly confident predictions.

When evaluating the accuracy of the filtered certain predictions, most models performed exceptionally well. GBC and XGBoost achieved perfect accuracy at 1.00, followed closely by XRT at 0.99 and RF at 0.96, while DT reached 0.92.

However, after retraining with high-confidence pseudo-labeled samples, a noticeable drop in overall accuracy was observed in some models. GBC and RF dropped to 0.89 and 0.85, respectively, suggesting that the pseudo-labeled data introduced some level of noise or mislabeling during self-training. The entropy threshold for RF was relatively high (1.0545), which reflects a less restrictive strategy that allowed more uncertain predictions to be included. This may have led to the addition of incorrect labels that negatively affected retraining.

In contrast, XGBoost and XRT maintained high overall accuracy (0.93), demonstrating better robustness to uncertain data. DT’s performance remained stable at 0.92.

These results suggest that while entropy-based semi-supervised learning is generally effective, its impact can vary significantly depending on the model’s sensitivity to pseudo-label noise and its ability to generalize under uncertain conditions.

Figure 4 displays the precision, recall, and F1-score for the five semi-supervised classifiers on the CICIoT2023 dataset. The XGBoost and XRT models achieved the highest scores across all three metrics, with precision, recall, and F1-score values around 0.93, indicating their robust performance in both detecting and correctly classifying diverse attack types. The DT model also performed well, maintaining a balance across all metrics, each reaching approximately 0.92.

In contrast, GBC demonstrated slightly lower performance, with precision near 0.90 and recall and F1-score slightly below. This suggests that the model may have missed some true positives. The RF model showed the lowest performance in this evaluation, with precision and F1-score at approximately 0.84 and recall slightly higher at 0.85.

Overall, the figure demonstrates that while entropy-based semi-supervised learning supports strong classification performance, its success depends on the classifier’s architecture and ability to learn effectively from pseudo-labeled data. XGBoost and XRT showed superior resilience and effectiveness under these conditions, making them promising choices for real-world IoT security applications.

Figure 5 presents the per-class classification accuracy of the five semi-supervised classifiers evaluated on the CICIoT2023 dataset. The figure illustrates each model’s ability to correctly classify a wide variety of attack types and benign traffic across 34 different categories.

The results show that for a large number of classes, particularly DDoS attack types such as DDoS-ICMP Flood, DDoS-ICMP Fragmentation, DDoS-PSHACK Flood, DDoS-RSTFINFlood, DDoS-TCP_Flood, and DDoS-UDP_Fragmentation, all models achieved perfect accuracy, indicating their effectiveness in distinguishing between DDoS attack types. Notably, XGBoost and XRT maintained consistently high accuracy across nearly all classes, demonstrating robust generalization and strong resilience to noisy pseudo-labeled data introduced during semi-supervised retraining.

However, variability is observed in several challenging or less frequent classes. For instance, Backdoor_Malware was correctly classified only by RF and XGBoost, with modest accuracy (0.79 and 0.83, respectively), while DT, GBC, and XRT failed to detect it. Reconnaissance-based attacks, such as Recon-OSScan, Recon-PortScan, and Recon-PingSweep, exhibited considerable variation. RF achieved only 0.38 on Recon-OSScan and 0.44 on Recon-PortScan, while XGBoost and XRT showed improved robustness in these categories. The Benign Traffic class, which is crucial for avoiding false positives, was reliably detected by XRT (0.94) and XGBoost (0.83), while GBC and RF showed lower accuracy (0.71 and 0.60, respectively).

Notably, XGBoost and XRT consistently outperformed other models in more complex classes, maintaining high classification accuracy across most attack categories. This demonstrates their ability to generalize well under entropy-based semi-supervised learning and manage noisy pseudo-labels effectively. However, RF and GBC showed reduced performance in some classes, suggesting greater sensitivity to training noise or difficulty in learning from uncertain pseudo-labeled data.

4.5. Result of CICIoMT2024 Dataset

Table 9 summarizes the performance of the semi-supervised classifiers on the CICIoMT2024 dataset. The table reports each model’s selected entropy threshold, its accuracy on certain predictions only, and the overall accuracy after retraining with high-confidence pseudo-labeled data.

The selected entropy thresholds vary across models. GBC and XRT had the highest thresholds (0.3995 and 0.2529, respectively), which suggests that they are more tolerant of prediction uncertainty. In contrast, DT and RF had relatively low thresholds (0.0952 and 0.1207), applying stricter filtering and accepting only highly confident predictions for initial training.

When evaluated on the certain predictions only, all models except DT achieved perfect accuracy (1.00). This demonstrates their ability to classify high-certainty samples correctly. DT followed closely with 0.98 accuracy.

After semi-supervised retraining using high-confidence uncertain samples, some models experienced a drop in performance. GBC and XRT dropped to 0.94 and 0.91, respectively, indicating a potential sensitivity to noise in the pseudo-labeled data. On the other hand, RF and XGBoost maintained strong generalization performance, achieving 0.98 and 0.97 overall accuracy, respectively. DT also maintained a high performance with 0.96.

These results confirm that entropy-based semi-supervised learning is effective for improving generalization while managing uncertainty, but its success depends on the model’s robustness to pseudo-labeling noise and its ability to leverage uncertain data effectively. Notably, RF and XGBoost appear to offer the best balance between confidence filtering and retraining effectiveness on the CICIoMT2024 dataset.

Figure 6 illustrates the precision, recall, and F1-score of the semi-supervised classifiers evaluated on the CICIoMT2024 dataset.

The results show that RF achieved the best overall performance, indicating excellent generalization across all classes. XGBoost followed closely, maintaining precision and F1-score at 0.98, with recall slightly lower at approximately 0.97, reflecting a strong balance between minimizing false positives and false negatives.

DT also performed well, and it achieved consistent values of approximately 0.96 across all three metrics, which demonstrates stable and reliable classification behavior. GBC showed slightly lower performance, with all three metrics at 0.94, suggesting a moderate drop in balance and possibly a sensitivity to mislabeled pseudo-samples during retraining.

The XRT model showed the most variation among the five, with high precision (0.98) but significantly lower recall (~0.91), which results in a reduced F1-score (~0.94). This suggests that XRT was highly confident in its predictions but failed to identify a sufficient number of true positives, which lowered its recall and its balance across classes.
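The relation between the three XRT values can be checked with the definition of the F1-score as the harmonic mean of precision and recall. The snippet below is a small worked example. The function name is hypothetical, and because the reported metrics are presumably averaged over classes (an assumption), the harmonic mean of the averaged precision and recall only approximates the reported F1-score.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall reported for XRT on CICIoMT2024.
xrt_f1 = f1_from_precision_recall(0.98, 0.91)
```

The result rounds to 0.94, which agrees with the F1-score quoted above and shows how a drop in recall pulls the F1-score down even when precision stays high.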

In summary, the figure demonstrates that while all models benefited from entropy-based semi-supervised learning, RF and XGBoost showed the most robust and well-balanced performance on CICIoMT2024, making them the most suitable choices for reliable intrusion detection in IoT environments.

Figure 7 illustrates the per-class classification accuracy of the semi-supervised models on the CICIoMT2024 dataset. This figure highlights the ability of each model to correctly identify specific attack types and benign traffic in a multiclass intrusion detection setting.

Across most classes, such as MQTT-DoS-Publish-Flood, MQTT-Malformed-Data, TCP_IP-DDoS-SYN, and TCP-IP-DDoS-UDP, all models achieved perfect or near-perfect accuracy. This indicates that these attacks have clear distinguishing patterns that are easily learnable, even under semi-supervised training.

The XGBoost model consistently outperformed others across the more challenging classes, achieving the highest score in ARP-Spoofing (0.95), while the other models failed to detect this class altogether. It also achieved the best accuracy in Recon-Ping-Sweep (0.99) and Recon-VulScan (0.98) attacks, which are often more prone to misclassification due to their similarity to normal scanning behavior.

RF and DT also showed robust results for many classes, especially in Recon-Port-Scan, TCP_IP-DDoS-ICMP, and Benign Traffic. However, GBC and XRT exhibited relatively lower performance in several cases, such as Benign (0.88 and 0.84, respectively) and Recon-Ping-Sweep (0.80 and 0.86, respectively).

A notable observation is that ARP-Spoofing was only correctly detected by XGBoost, suggesting that this attack was either underrepresented or poorly learned by the other models during supervised filtering and semi-supervised refinement.

These results emphasize the importance of evaluating classifier performance at a per-class level, as high overall accuracy can mask poor performance in critical but rare attack types. The figure confirms that XGBoost demonstrates the best generalization across the most diverse range of classes in the CICIoMT2024 dataset, making it the most reliable model for fine-grained IoT threat detection.
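The masking effect described here is easy to reproduce on toy data. The sketch below assumes that per-class accuracy means the fraction of each class's samples that are classified correctly (equivalent to per-class recall). The function name and the label counts are hypothetical and are not taken from the datasets.

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred):
    """Fraction of samples of each true class that were predicted correctly."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}

# Toy labels: 95 samples of a frequent class and 5 of a rare class.
y_true = ["DDoS"] * 95 + ["ARP-Spoofing"] * 5
y_pred = ["DDoS"] * 100  # the rare class is never predicted

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
by_class = per_class_accuracy(y_true, y_pred)
```

Here the overall accuracy is 0.95 even though the rare class is missed entirely, which is the situation a per-class breakdown is meant to expose.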

It is worth noting that the XGBoost model achieved consistently strong performance across all datasets and evaluation metrics despite being used with default hyperparameter settings. No manual tuning or optimization was performed. This choice was made to ensure consistency across all classifiers and to fairly evaluate the effectiveness of the entropy-based filtering and self-training approach. The ability of XGBoost to perform well without tuning further demonstrates its robustness and adaptability in the context of IoT intrusion detection.

Five-fold cross-validation was conducted to validate the accuracy of the classifiers, as presented in Table 10. Each classifier was evaluated on the three datasets to assess its classification performance. The results show that DT and XGBoost achieved the highest accuracies, exceeding 97% on most datasets. GBC showed good performance, slightly lower than DT and XGBoost, at 90% or above across all datasets. RF showed the lowest accuracy, approximately 85%, on the CICIoT2023 dataset, while XRT performed better than RF but worse than DT, GBC, and XGBoost on RT-IoT2022.
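As a reminder of what five-fold cross-validation involves, the sketch below splits sample indices into five folds and uses each fold once as the test set. It is a minimal illustration under assumptions: the function name is hypothetical, and whether the study shuffled or stratified the folds is not stated in this section, so the shuffling and the seed shown here are illustrative choices.

```python
import random

def k_fold_indices(n_samples, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(n_samples=23, k=5))
```

Each sample appears in exactly one test fold, so the reported accuracy is an average over five models, each trained on roughly 80% of the data.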

4.6. Comparative Evaluation with Existing Work

Table 11 presents a comparative analysis of the proposed entropy-based semi-supervised learning model against benchmark models used in existing research. The results demonstrate that the proposed model shows highly competitive performance across all three benchmark datasets: RT-IoT2022, CICIoT2023, and CICIoMT2024. On RT-IoT2022, the model achieved perfect classification accuracy using DT (100%) and near-perfect results with XGBoost (99%), outperforming or matching many existing supervised and deep-learning methods. For CICIoT2023, the proposed model reached 93% accuracy with XGBoost and XRT. CNN-based approaches achieved higher accuracy on this dataset, reaching 99%, although tree-based models offer lower computational cost [20]. On CICIoMT2024, the model again proved effective, with RF achieving 98% and XGBoost 98%, surpassing many traditional ML baselines and approaching deep-learning benchmarks. Overall, the results confirm that the proposed model is not only robust and efficient but also highly practical for real-world IoT intrusion detection scenarios where label noise and uncertainty are prevalent.

4.7. Computation Time Analysis

Table 12 presents the average inference times (in microseconds) of the semi-supervised classifiers on the CICIoMT2024 dataset. The inference (testing) time of each model was measured to assess its suitability for real-time and resource-constrained IoT environments. DT achieves the fastest inference at 0.15 µs, which makes it highly suitable for real-time and resource-constrained IoMT environments. In contrast, the ensemble methods GBC (17.35 µs), XRT (15.76 µs), and RF (12.51 µs) incur higher latency, and on this dataset only RF converts that cost into a higher overall accuracy than DT (0.98 versus 0.96). XGBoost (6.04 µs) provides a balanced compromise between performance and speed. All models complete inference within microseconds, but the differences can be important for some applications, such as healthcare, where a slight delay can impact the effectiveness of the system.
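A measurement of this kind can be sketched with a high-resolution timer. The example below is an illustration under assumptions: it reports the time per sample averaged over several repeated batch predictions, which may differ from the measurement procedure used for Table 12, and the function names and the stand-in predictor are hypothetical.

```python
import time

def mean_inference_time_us(predict, samples, repeats=5):
    """Average per-sample prediction time in microseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        predict(samples)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(samples)) * 1e6

def dummy_predict(batch):
    # Stand-in for a trained model's predict method: thresholds one feature.
    return [1 if row[0] > 0.5 else 0 for row in batch]

samples = [[i / 1000.0, 0.0] for i in range(1000)]
latency_us = mean_inference_time_us(dummy_predict, samples)
```

Averaging over repeated runs reduces the influence of timer resolution and background load, which matters when the quantity being measured is well below a millisecond.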

5. Conclusions

To conclude, this study presented a robust semi-supervised learning model for intrusion detection in asymmetrical IoT environments. The proposed model integrates entropy-based uncertainty filtering with a self-training procedure to improve model generalization, especially under conditions of label noise and misclassification. By applying dynamically computed entropy thresholds, the method effectively isolated uncertain predictions and selectively included high-confidence samples to enhance classifier learning. Comprehensive experiments were conducted on three benchmark datasets, RT-IoT2022, CICIoT2023, and CICIoMT2024, each representing diverse and realistic IoT attack scenarios. The results demonstrated that the proposed model achieved strong multiclass classification accuracy, precision, recall, and F1-scores across various tree-based classifiers. Notably, XGBoost and Random Forest exhibited strong resilience to noise and delivered robust performance across the datasets. The findings confirm that entropy-driven semi-supervised learning offers a practical and effective solution for enhancing intrusion detection capabilities in asymmetrical IoT systems. Future work may investigate the potential for attackers to compromise the IDS by subtly altering input patterns to evade detection, and may improve the robustness of the IDS against such adversarial manipulation through methods like adversarial training, anomaly detection, and adaptive feature refinement.

Author Contributions

Conceptualization: A.A.A. and B.A. Methodology: A.A.A. and B.A. Project management: A.A.A. Resources: B.A. Software: A.A.A. and B.A. Supervision: A.A.A. Validation: B.A. Visualization: B.A. Writing—original draft: A.A.A. and B.A. Writing—proofreading and editing: A.A.A. and B.A. All authors have read and agreed to the published version of the manuscript.

Funding

This project was funded by the KAU Endowment (WAQF) at King Abdulaziz University, Jeddah, under grant no. WAQF: 186-611-2024. The authors, therefore, acknowledge with thanks WAQF and the Deanship of Scientific Research (DSR) for technical and financial support.

Data Availability Statement

The datasets used in this study—RT-IoT2022, CICIoT2023, and CICIoMT2024—are publicly available and can be accessed through their respective official sources.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Shen, M.; Gu, A.; Kang, J.; Tang, X.; Lin, X.; Zhu, L.; Niyato, D. Blockchains for Artificial Intelligence of Things: A Comprehensive Survey. IEEE Internet Things J. 2023, 10, 14483–14506.
Humayun, M.; Tariq, N.; Alfayad, M.; Zakwan, M.; Alwakid, G.; Assiri, M. Securing the Internet of Things in Artificial Intelligence Era: A Comprehensive Survey. IEEE Access 2024, 12, 25469–25490.
Torre, D.; Chennamaneni, A.; Jo, J.; Vyas, G.; Sabrsula, B. Toward Enhancing Privacy Preservation of a Federated Learning CNN Intrusion Detection System in IoT: Method and Empirical Study. ACM Trans. Softw. Eng. Methodol. 2025, 34, 1–48.
Goulart, A.; Chennamaneni, A.; Torre, D.; Hur, B.; Al-Aboosi, F.Y. On Wide-Area IoT Networks, Lightweight Security and Their Applications—A Practical Review. Electronics 2022, 11, 1762.
Ficili, I.; Giacobbe, M.; Tricomi, G.; Puliafito, A. From Sensors to Data Intelligence: Leveraging IoT, Cloud, and Edge Computing with AI. Sensors 2025, 25, 1763.
Sinha, S. State of IoT 2024; Technical Report; IoT Analytics GmbH: Hamburg, Germany, 2024.
Xu, B.; Sun, L.; Mao, X.; Ding, R.; Liu, C. IoT Intrusion Detection System Based on Machine Learning. Electronics 2023, 12, 4289.
Tariq, N.; Asim, M.; Al-Obeidat, F.; Zubair Farooqi, M.; Baker, T.; Hammoudeh, M.; Ghafir, I. The Security of Big Data in Fog-Enabled IoT Applications Including Blockchain: A Survey. Sensors 2019, 19, 1788.
Khazane, H.; Ridouani, M.; Salahdine, F.; Kaabouch, N. A Holistic Review of Machine Learning Adversarial Attacks in IoT Networks. Future Internet 2024, 16, 32.
Rahman, M.; Al Shakil, S.; Mustakim, M.R. A survey on intrusion detection system in IoT networks. Cyber Secur. Appl. 2025, 3, 100082.
Kumar, S.V.N.S.; Selvi, M.; Kannan, A.; Doulamis, A.D. A Comprehensive Survey on Machine Learning-Based Intrusion Detection Systems for Secure Communication in Internet of Things. Comput. Intell. Neurosci. 2023, 2023, 8981988.
Abdulganiyu, O.H.; Tchakoucht, T.A.; Saheed, Y.K. A systematic literature review for network intrusion detection system (IDS). Int. J. Inf. Secur. 2023, 22, 1125–1162.
Mohy-Eddine, M.; Guezzaz, A.; Benkirane, S.; Azrour, M. An efficient network intrusion detection model for IoT security using K-NN classifier and feature selection. Multimedia Tools Appl. 2023, 82, 23615–23633.
Awajan, A. A Novel Deep Learning-Based Intrusion Detection System for IoT Networks. Computers 2023, 12, 34.
Muneer, S.; Farooq, U.; Athar, A.; Raza, M.A.; Ghazal, T.M.; Sakib, S.; Madhukumar, A.S. A Critical Review of Artificial Intelligence Based Approaches in Intrusion Detection: A Comprehensive Analysis. J. Eng. 2024, 2024, 3909173.
Sharma, B.; Sharma, L.; Lal, C.; Roy, S. Explainable artificial intelligence for intrusion detection in IoT networks: A deep learning based approach. Expert Syst. Appl. 2024, 238, 121751.
Ozkan-Okay, M.; Akin, E.; Aslan, Ö.; Kosunalp, S.; Iliev, T.; Stoyanov, I.; Beloev, I. A Comprehensive Survey: Evaluating the Efficiency of Artificial Intelligence and Machine Learning Techniques on Cyber Security Solutions. IEEE Access 2024, 12, 12229–12256.
Elghalhoud, O.; Naik, K.; Zaman, M.; Manzano, R. Data Balancing and CNN based Network Intrusion Detection System. In Proceedings of the 2023 IEEE Wireless Communications and Networking Conference (WCNC), Glasgow, UK, 26–29 March 2023; IEEE: New York, NY, USA; pp. 1–6.
Aljumah, A. IoT-based intrusion detection system using convolution neural networks. PeerJ Comput. Sci. 2021, 7, e721.
Udurume, M.; Shakhov, V.; Koo, I. Comparative Analysis of Deep Convolutional Neural Network—Bidirectional Long Short-Term Memory and Machine Learning Methods in Intrusion Detection Systems. Appl. Sci. 2024, 14, 6967.
Phong, L.T.; Aono, Y.; Hayashi, T.; Wang, L.; Moriai, S. Privacy-Preserving Deep Learning via Additively Homomorphic Encryption. IEEE Trans. Inf. Forensics Secur. 2018, 13, 1333–1345.
Talpini, J.; Sartori, F.; Savi, M. Enhancing trustworthiness in ML-based network intrusion detection with uncertainty quantification. J. Reliab. Intell. Environ. 2024, 10, 501–520.
Sarantos, P.; Violos, J.; Leivadeas, A. Enabling semi-supervised learning in intrusion detection systems. J. Parallel Distrib. Comput. 2025, 196, 105010.
Sharmila, B.S.; Nagapadma, R. Quantized autoencoder (QAE) intrusion detection system for anomaly detection in resource-constrained IoT devices using RT-IoT2022 dataset. Cybersecurity 2023, 6, 1–15.
Neto, E.C.P.; Dadkhah, S.; Ferreira, R.; Zohourian, A.; Lu, R.; Ghorbani, A.A. CICIoT2023: A Real-Time Dataset and Benchmark for Large-Scale Attacks in IoT Environment. Sensors 2023, 23, 5941.
Dadkhah, S.; Neto, E.C.P.; Ferreira, R.; Molokwu, R.C.; Sadeghi, S.; Ghorbani, A.A. CICIoMT2024: A benchmark dataset for multi-protocol security assessment in IoMT. Internet Things 2024, 28, 101351.
Sama, N.U.; Ullah, S.; Kazmi, S.M.A.; Mazzara, M. Cutting-Edge Intrusion Detection in IoT Networks: A Focus on Ensemble Models. IEEE Access 2025, 13, 8375–8392.
Elzaghmouri, B.M.; Jbara, Y.H.F.; Elaiwat, S.; Innab, N.; Osman, A.A.F.; Ataelfadiel, M.A.M.; Zawaideh, F.H.; Alawneh, M.F.; Al-Khateeb, A.; Abu-Zanona, M. A Novel Hybrid Architecture for Superior IoT Threat Detection through Real IoT Environments. Comput. Mater. Contin. 2024, 81, 2299–2316.
Albalwy, F.; Almohaimeed, M. Advancing Artificial Intelligence of Things Security: Integrating Feature Selection and Deep Learning for Real-Time Intrusion Detection. Systems 2025, 13, 231.
Gheni, H.Q.; Al-Yaseen, W.L. Two-step data clustering for improved intrusion detection system using CICIoT2023 dataset. e-Prime 2024, 9, 100673.
Becerra-Suarez, F.L.; Tuesta-Monteza, V.A.; Mejia-Cabrera, H.I.; Arcila-Diaz, J. Performance Evaluation of Deep Learning Models for Classifying Cybersecurity Attacks in IoT Networks. Informatics 2024, 11, 32.
Erskine, S.K. Real-Time Large-Scale Intrusion Detection and Prevention System (IDPS) CICIoT Dataset Traffic Assessment Based on Deep Learning. Appl. Syst. Innov. 2025, 8, 52.
Shebl, A.; Elsedimy, E.I.; Ismail, A.; Salama, A.A.; Herajy, M. DCNN: A novel binary and multi-class network intrusion detection model via deep convolutional neural network. EURASIP J. Inf. Secur. 2024, 2024, 36.
Akar, G.; Sahmoud, S.; Onat, M.; Cavusoglu, Ü.; Malondo, E. L2D2: A Novel LSTM Model for Multi-Class Intrusion Detection Systems in the Era of IoMT. IEEE Access 2025, 13, 7002–7013.
Alqahtani, A.; Alsulami, A.A.; Alqahtani, N.; Alturki, B.; Alghamdi, B.M. A Comprehensive Security Framework for Asymmetrical IoT Network Environments to Monitor and Classify Cyberattack via Machine Learning. Symmetry 2024, 16, 1121.
Bajenaid, A.; Khemakhem, M.; Eassa, F.E.; Bourennani, F.; Qurashi, J.M.; Alsulami, A.A.; Alturki, B. Towards Robust SDN Security: A Comparative Analysis of Oversampling Techniques with ML and DL Classifiers. Electronics 2025, 14, 995.
Venne, S.; Clarkson, T.; Bennett, E.; Fischer, G.; Bakker, O.; Callaghan, R. Automated Ransomware Detection using Pattern-Entropy Segmentation Analysis: A Novel Approach to Network Security. Authorea 2024.
Zhang, K.; Wang, Y.; Li, O.; Hao, S.; He, J.; Lan, X.; Yang, J.; Ye, Y.; Zakariah, M. Improved self-training-based distant label denoising method for cybersecurity entity extractions. PLoS ONE 2024, 19, e0315479.

Figure 1. Proposed Model Flowchart.

Figure 2. Accuracy of semi-supervised classifiers on RT-IoT2022.

Figure 3. Classification accuracy per class across semi-supervised models on RT-IoT2022.

Figure 4. Accuracy of semi-supervised classifiers on CICIoT2023.

Figure 5. Classification accuracy per class across semi-supervised models on CICIoT2023.

Figure 6. Accuracy of semi-supervised classifiers on CICIoMT2024.

Figure 7. Classification accuracy per class across semi-supervised models on CICIoMT2024.

Table 1. Summary of Related Works.

Article	Year	Techniques	Datasets	Accuracy
[27]	2024	ERT, KNN, XGB, SVM, GB, DT, and RF	RT-IoT2022	ERT = 99.7%; KNN = 99.4%; XGB = 99.6%; SVM = 99.2%; GB = 99.6%; DT = 99.5%; RF = 99.6%
[28]	2024	BLSTM, GRU, and CNN	RT-IoT2022	Proposed model = 99.62%
[29]	2025	GR, CFS, Pearson, SU, IG, PCA, ANN, TabNet, and DNN	RT-IoT2022	ANN + Pearson + PCA = 99.7%; TabNet + Pearson + PCA = 99.3%; DNN + Pearson + PCA = 99.6%
[3]	2025	CNN	RT-IoT2022, CICIoT2023, CICIoMT2024, TON-IoT, IoT-23, BoT-IoT, and EdgeIIoT	Average accuracy: 99.47% on RT-IoT2022; 93.96% on CICIoT2023; 95.63% on CICIoMT2024; 99.63% on TON-IoT; 99.99% on IoT-23; 100.00% on BoT-IoT; 89.12% on EdgeIIoT
[30]	2024	GSK, MLP, and AE	CICIoT2023	Binary: MLP = 99.26%, AE = 98.76%; Multiclass: MLP = 97.46%, AE = 83.81%
[31]	2024	CNN, LSTM, and DNN	CICIoT2023	CNN = 99.10% (multiclass), 99.40% (binary); LSTM = 85.98% (multiclass), 99.36% (binary); DNN = 99.02% (multiclass), 99.38% (binary)
[32]	2025	DT, SVM, MLP, CNN, RNN, and ANN	CICIoT2023	DLMIDPSM: accuracy above 85%, precision of 99%
[26]	2024	ADA, LR, DNN, and RF	CICIoMT2024	Binary: ADA = 99.6%, LR = 99.5%, DNN = 99.6%, RF = 99.6%; Multiclass: ADA = 42.2%, LR = 72.7%, DNN = 72.9%, RF = 73.3%
[33]	2024	DCNN	CICIoT2023, CICIDS-2017, and CICIoMT2024	CICIoT2023: 99.5% (binary), 99.25% (multiclass); CICIDS-2017: 99.96% (binary), 99.96% (multiclass); CICIoMT2024: 99.98% (binary), 99.86% (multiclass)
[34]	2025	LSTM	CICIoMT2024	Accuracy of 98%

Table 2. RT-IoT2022 Dataset Information.

Class	Description	Benign/Attack
Thing Speak-LED	Controlling an LED through ThingSpeak.	Benign
Wipro Bulb	A WiFi-enabled smart bulb communicating with a mobile device for remote control.	Benign
MQTT Publish	MQTT protocol used in IoT communications.	Benign
DoS SYN Hping	DoS SYN flood attack generated using hping.	Attack
ARP Poisoning	Address Resolution Protocol (ARP) spoofing attack.	Attack
NMAP UDP Scan	UDP scan performed with Nmap, a network scanning tool.	Attack
NMAP TCP Scan	TCP scan performed with Nmap, a network scanning tool.	Attack
NMAP XMAS Tree Scan	Nmap XMAS tree scan attack.	Attack
NMAP OS Detection	An attack using Nmap’s OS detection scan.	Attack
NMAP FIN Scan	Nmap FIN scan attack.	Attack
DDoS Slow Loris	DDoS attack using Slow Loris.	Attack
Metasploit Brute-Force SSH	Brute-force attack on SSH login using Metasploit.	Attack

Table 3. CICIoT2023 Dataset Information.

Category	Subcategory	Description
Benign Traffic	-	Normal IoT traffic
DDoS	RST FIN Flood	Sends a tremendous number of packets with RST and FIN flags to exhaust system resources.
	ICMP Fragmentation	Sends fragmented ICMP packets to overwhelm parsing systems.
	PSHACK Flood	Uses PUSH and ACK TCP flags in high volume to exhaust resources.
	TCP Flood	Overwhelms a target with TCP packets to exhaust system resources.
	UDP Fragmentation	Uses large, fragmented UDP packets to consume bandwidth.
	UDP Flood	Overwhelms a target with UDP packets to exhaust system resources.
	SYN Flood	Floods with TCP SYN packets without completing handshakes, exhausting connections.
	HTTP Flood	Floods HTTP requests targeting application layer servers.
	ACK Fragmentation	Uses fragmented ACK packets to bypass security and exhaust resources.
	ICMP Flood	Floods with ICMP (ping) packets to exhaust connections.
	Synonymous IP Flood	Sends spoofed TCP SYN packets with identical source/destination IPs to drain resources.
	Slow Loris	Keeps connections open by sending partial HTTP requests and exhausting web server resources.
Reconnaissance	Host Discovery	Scans the network to find active hosts.
	OS Scan	Attempts to determine the operating system of a host.
	Vulnerability Scan	Uses automated tools to identify weaknesses in hosts.
	Port Scan	Identifies open ports on devices.
	Ping Sweep	Sends ICMP (ping) requests across a subnet to detect live hosts.
Spoofing	DNS Spoofing	Alters DNS responses to redirect victims to malicious sites.
	MITM ARP Spoofing	Sends fake ARP messages to intercept traffic between devices.
DoS	TCP Flood	Uses a single source to overwhelm the target by sending a high volume of TCP packets.
	UDP Flood	Uses a single source to overwhelm the target by sending a high volume of UDP packets.
	HTTP Flood	A single source sends excessive HTTP requests.
	SYN Flood	A single source SYN flood targeting TCP handshakes.
Mirai	Greip Flood	Uses GRE protocol with spoofed IPs to flood a victim.
	UDP Plain	Sends repetitive UDP packets to disrupt targets.
	Greeth Flood	Floods using GRE packets with Ethernet header spoofing.
Brute-Force	Dictionary Brute-Force	Attempts to guess credentials using a dictionary of common passwords.
Web-Based	SQL Injection	Injects malicious SQL into input fields to manipulate databases.
	Command Injection	Executes system-level commands through vulnerable input.
	Uploading Attack	Uploads malicious files to exploit vulnerable servers.
	XSS	Injects scripts into web pages to steal data or hijack sessions.
	Browser Hijacking	Modifies browser settings to redirect traffic or display ads.
	Backdoor Malware	Installs software to maintain unauthorized access.

Table 4. CICIoMT2024 Dataset Information.

Category	Subcategory	Description
Benign Traffic	-	Normal IoT traffic
DDoS	SYN Flood	Described in CICIoT2023
	ICMP Flood	Described in CICIoT2023
	UDP Flood	Described in CICIoT2023
Reconnaissance	Ping Sweep	Described in CICIoT2023
	Vulnerability Scan	Described in CICIoT2023
	OS Scan	Described in CICIoT2023
	Port Scan	Described in CICIoT2023
MQTT	Malformed Data	Sends corrupted MQTT packets to crash the broker.
	DoS Publish Flood	Publishes large volumes of messages to MQTT topics from one device.
	DDoS Publish Flood	Publishes large volumes of messages to MQTT topics from multiple devices.
Spoofing	ARP Spoofing	Sends falsified ARP messages to redirect traffic through a malicious device.

Table 5. Range of Entropy Values.

Dataset	Number of Classes	Min Possible Entropy Value	Max Possible Entropy Value
RT-IoT2022	12	~0.0	~3.58
CICIoT2023	34	~0.0	~5.09
CICIoMT2024	12	~0.0	~3.58
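
The upper bounds in Table 5 are consistent with Shannon entropy computed with a base-2 logarithm, where the maximum for K classes is log2(K) and is reached when the predicted probabilities are uniform. The base is inferred here from the tabulated values rather than stated in the table, and the function names below are illustrative. A minimal sketch:

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of one predicted probability vector."""
    # Terms with p == 0 contribute nothing (the limit of p*log2(p) is 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)


def max_entropy_bits(num_classes):
    """Largest possible entropy for num_classes classes (uniform prediction)."""
    return math.log2(num_classes)


if __name__ == "__main__":
    # Class counts taken from Table 5.
    for name, k in [("RT-IoT2022", 12), ("CICIoT2023", 34), ("CICIoMT2024", 12)]:
        print(name, k, round(max_entropy_bits(k), 2))
```

With 12 classes the bound is about 3.58 bits and with 34 classes about 5.09 bits, matching the table. A fully confident prediction (probability 1 for a single class) gives an entropy of 0.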

Table 6. Selected Entropy Threshold.

Model	RT-IoT2022	CICIoT2023	CICIoMT2024
DT	0.008	0.0171	0.0952
GBC	0.1081	0.7916	0.3995
RF	0.0407	1.0545	0.1207
XGBoost	0.0188	0.5584	0.1665
XRT	0.0368	0.8088	0.2529
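
To illustrate how a threshold from Table 6 could be applied, the sketch below splits predictions into a certain set and an uncertain set by comparing each prediction's entropy against the threshold. It assumes base-2 entropy and treats a prediction as certain when its entropy is at or below the threshold; the comparison direction at the boundary, the function names, and the example probabilities are assumptions made for illustration, not details taken from the paper.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of one predicted probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def split_by_entropy(prob_rows, threshold):
    """Return (certain_indices, uncertain_indices) for a batch of predictions."""
    certain, uncertain = [], []
    for i, probs in enumerate(prob_rows):
        if shannon_entropy(probs) <= threshold:
            certain.append(i)
        else:
            uncertain.append(i)
    return certain, uncertain


if __name__ == "__main__":
    # Hypothetical class-probability outputs for two samples.
    predictions = [
        [0.999, 0.001, 0.0],  # sharply peaked, low entropy
        [0.6, 0.4, 0.0],      # spread over two classes, high entropy
    ]
    # 0.0188 is the XGBoost threshold for RT-IoT2022 in Table 6.
    certain, uncertain = split_by_entropy(predictions, threshold=0.0188)
    print("certain:", certain, "uncertain:", uncertain)
```

The thresholds for CICIoT2023 are larger for most models, which under this scheme means that predictions with more spread-out probabilities are still kept as certain on that dataset.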

Table 7. Entropy and Accuracy for Semi-Supervised Classifiers on RT-IoT2022.

Model	Selected Entropy Threshold	Supervised Training (Certain Predictions Accuracy)	Semi-Supervised Retraining (Overall Accuracy)
DT	0.0080	1.00	1.00
GBC	0.1081	1.00	0.99
RF	0.0407	1.00	0.97
XGBoost	0.0188	1.00	0.99
XRT	0.0368	1.00	0.97
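
One plausible reading of the two accuracy columns in Tables 7 to 9 is that the first is measured only on predictions whose entropy falls at or below the selected threshold, while the second is measured on all test samples after retraining. The sketch below computes both quantities for a toy set of labels; this interpretation, the helper names, and the sample values are assumptions for illustration only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def certain_subset_accuracy(y_true, y_pred, entropies, threshold):
    """Accuracy restricted to predictions with entropy <= threshold.

    Returns None when no prediction passes the filter.
    """
    kept = [(t, p) for t, p, h in zip(y_true, y_pred, entropies) if h <= threshold]
    if not kept:
        return None
    return sum(t == p for t, p in kept) / len(kept)


if __name__ == "__main__":
    # Hypothetical labels, predictions, and per-prediction entropies.
    y_true = [0, 1, 2, 1]
    y_pred = [0, 1, 1, 1]
    entropies = [0.001, 0.01, 0.9, 0.02]
    # 0.0407 is the RF threshold for RT-IoT2022 in Table 7.
    print(certain_subset_accuracy(y_true, y_pred, entropies, 0.0407))
    print(accuracy(y_true, y_pred))
```

In this toy case the only misclassified sample is also the high-entropy one, so filtering raises the accuracy on the retained subset. That is the pattern the first column would show if uncertain predictions concentrate the errors.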

Table 8. Entropy and Accuracy for Semi-Supervised Classifiers on CICIoT2023.

Model	Selected Entropy Threshold	Supervised Training (Certain Predictions Accuracy)	Semi-Supervised Retraining (Overall Accuracy)
DT	0.0171	0.92	0.92
GBC	0.7916	1.00	0.89
RF	1.0545	0.96	0.85
XGBoost	0.5584	1.00	0.93
XRT	0.8088	0.99	0.93

Table 9. Entropy and Accuracy for Semi-Supervised Classifiers on CICIoMT2024.

Model	Selected Entropy Threshold	Supervised Training (Certain Predictions Accuracy)	Semi-Supervised Retraining (Overall Accuracy)
DT	0.0952	0.98	0.96
GBC	0.3995	1.00	0.94
RF	0.1207	1.00	0.98
XGBoost	0.1665	1.00	0.97
XRT	0.2529	1.00	0.91

Table 10. Average five-fold cross-validation accuracy of the classifiers.

Model	RT-IoT2022	CICIoT2023	CICIoMT2024
DT	0.9987	0.9259	0.9770
GBC	0.9956	0.90	0.97
RF	0.9584	0.8488	0.8895
XGBoost	0.9951	0.9277	0.9759
XRT	0.9763	0.9344	0.9043
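
The averages in Table 10 come from five-fold cross-validation: the data are split into five folds, each fold serves once as the test set, and the five accuracies are averaged. The sketch below shows that procedure with contiguous, unshuffled folds and a majority-class baseline standing in for a real classifier. Whether the paper shuffled or stratified its folds is not stated here, so those choices, the helper names, and the toy data are assumptions.

```python
from collections import Counter


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Distribute any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def majority_fit_predict(X_train, y_train, X_test):
    """Placeholder model: always predicts the most common training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common for _ in X_test]


def cross_val_accuracy(fit_predict, X, y, k=5):
    """Average accuracy over k folds for a fit-and-predict callable."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    X = [[i] for i in range(10)]         # toy features
    y = [0] * 8 + [1] * 2                # toy imbalanced labels
    print(cross_val_accuracy(majority_fit_predict, X, y, k=5))
```

Any function with the same `fit_predict(X_train, y_train, X_test)` shape could replace the baseline, for example a thin wrapper around a tree-based classifier.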

Table 11. Comparison of existing IDS Techniques.

Work	Year	Techniques	Dataset	Classification Type	Training Type	Best Reported Accuracy
[27]	2024	ERT, KNN, XGB, SVM, GB, DT, RF	RT-IoT2022	Multiclass	Supervised	Accuracy of 99.7% (ERT), F1-Scores of 95% (XGBoost and RF).
[28]	2024	BLSTM, GRU, CNN	RT-IoT2022	Multiclass	Deep Learning	Accuracy of 99.62%, F1-scores of 99.61%
[29]	2025	ANN, TabNet, DNN + PCA + FS	RT-IoT2022	Multiclass	Deep Learning	Accuracy of 99.7%, F1-scores of 99.6%
[3]	2025	CNN	RT-IoT2022, CICIoT2023, CICIoMT2024	Multiclass	Deep Learning	RT-IoT2022: accuracy of 99.47%, F1-score of 97.31%; CICIoT2023: accuracy of 93.96%, F1-score of 77.63%; CICIoMT2024: accuracy of 95.63%, F1-score of 95.16%
[30]	2024	GSK, MLP, AE	CICIoT2023	Binary, Multiclass	Deep Learning	Accuracy of 97.46%, F1-scores of 97% (multiclass)
[31]	2024	CNN, LSTM, DNN	CICIoT2023	Binary, Multiclass	Deep Learning	Accuracy: CNN = 99.10%, DNN = 99.02%, LSTM = 85.98%; F1-score: CNN = 99.05%, DNN = 98.95%, LSTM = 84.03% (multiclass)
[32]	2025	DT, SVM, MLP, CNN, RNN, ANN	CICIoT2023	Binary, Multiclass	Deep Learning	Accuracy above 85%
[26]	2024	ADA, LR, DNN, RF	CICIoMT2024	Binary, Multiclass	Machine Learning	Accuracy of 73.3%, F1-scores of 55.1% (multiclass, RF)
[33]	2024	DCNN	CICIoT2023, CICIoMT2024	Binary, Multiclass	Deep Learning	Accuracy and F1-scores of 99.25% (CICIoT), Accuracy and F1-scores of 99.86% (CICIoMT) (multiclass)
[34]	2025	LSTM	CICIoMT2024	Binary, Multiclass	Deep Learning	Accuracy and F1-scores of 98%
(Proposed)	2025	Semi-Supervised Tree-Based + Entropy Filtering	RT-IoT2022, CICIoT2023, CICIoMT2024	Multiclass	Semi-Supervised	RT-IoT2022: 100% (DT), 99% (XGB); CICIoT2023: 93% (XGB/XRT); CICIoMT2024: 98% (RF), 98% (XGB)

Table 12. Average Inference Time for Semi-Supervised Classifiers on CICIoMT2024.

Model	Average Inference Time (µs)
DT	0.15
GBC	17.35
RF	12.51
XGBoost	6.04
XRT	15.76
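
An average inference time of the kind reported in Table 12 can be estimated by timing predictions over a batch and dividing by the number of samples. The sketch below does this with the standard-library timer and keeps the fastest of several repeats to reduce noise. The paper's exact measurement procedure is not described in this table, so the per-sample averaging, the repeat strategy, and the stand-in predictor are all assumptions.

```python
import time


def average_inference_time_us(predict_one, samples, repeats=3):
    """Estimate the average per-sample prediction time in microseconds.

    Runs the whole batch `repeats` times and keeps the fastest run.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for sample in samples:
            predict_one(sample)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best / len(samples) * 1e6  # seconds -> microseconds


def toy_predict(sample):
    """Hypothetical stand-in for a trained classifier's single-sample predict."""
    return 1 if sum(sample) > 1.0 else 0


if __name__ == "__main__":
    batch = [[0.1 * i, 0.2 * i] for i in range(1000)]
    print(f"{average_inference_time_us(toy_predict, batch):.3f} microseconds per sample")
```

Measured values depend heavily on hardware, batch size, and whether the model predicts one sample at a time or a whole batch in a single call, so figures from different setups are not directly comparable.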

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Alturki, B.; Alsulami, A.A. Semi-Supervised Learning with Entropy Filtering for Intrusion Detection in Asymmetrical IoT Systems. Symmetry 2025, 17, 973. https://doi.org/10.3390/sym17060973
