Robust Intrusion Detection System Using an Improved Hybrid Deep Learning Model for Binary and Multi-Class Classification in IoT Networks

Hesham Kamal; Maggie Mashaly

doi:10.3390/technologies13030102

Abstract

The rapid expansion of internet of things (IoT) applications has significantly boosted productivity and streamlined daily activities. However, this widespread adoption has also introduced considerable security challenges, making IoT environments vulnerable to large-scale botnet attacks. These attacks have often succeeded in achieving their malicious goals, highlighting the urgent need for robust detection strategies to secure IoT networks. To overcome these obstacles, this research presents an innovative anomaly-driven intrusion detection approach specifically tailored for IoT networks. The proposed model employs an advanced hybrid architecture that seamlessly integrates convolutional neural networks (CNN) with multilayer perceptron (MLP), enabling precise detection and classification of both binary and multi-class IoT network traffic. The CNN component is responsible for extracting and enhancing features from network traffic data and preparing these features for effective classification by the MLP, which handles the final classification task. To further manage class imbalance, the model incorporates the enhanced hybrid adaptive synthetic sampling-synthetic minority oversampling technique (ADASYN-SMOTE) for binary classification, advanced ADASYN for multiclass classification, and employs edited nearest neighbors (ENN) alongside class weights. The CNN-MLP architecture is meticulously crafted to minimize erroneous classifications, enhance instantaneous threat detection, and precisely recognize previously unseen cyber intrusions. The model’s effectiveness was rigorously tested using the IoT-23 and NF-BoT-IoT-v2 datasets. On the IoT-23 dataset, the model achieved 99.94% accuracy in two-stage binary classification, 99.99% accuracy in multiclass classification excluding the normal class, and 99.91% accuracy in single-phase multiclass classification including the normal class. 
Utilizing the NF-BoT-IoT-v2 dataset, the model attained an exceptional 99.96% accuracy in the dual-phase binary classification paradigm, 98.02% accuracy in multiclass classification excluding the normal class, and 98.11% accuracy in single-phase multiclass classification including the normal class. The results demonstrate that our model consistently delivers high levels of accuracy, precision, recall, and F1 score across both binary and multiclass classifications, establishing it as a robust solution for securing IoT networks.

Keywords:

ADASYN; CNN-MLP; deep learning; DL; DNN; ENN; IDS; SMOTE

1. Introduction

The advancement of internet of things (IoT) network infrastructure has led to a significant increase in the proliferation of sophisticated embedded technologies and autonomous intelligent systems. The fundamental objective of IoT is to cultivate intelligent ecosystems that elevate human well-being, optimize convenience, and drive strategic technological superiority. Within these intelligent systems, devices interact and collaborate to perform various functions, finding applications across diverse sectors from manufacturing to commercial uses. These ecosystems encompass intelligent residential spaces, interconnected urban infrastructures, adaptive architectural structures, and cutting-edge utilities like automated manufacturing processes, energy distribution grids, and advanced transportation networks [1]. Despite its benefits, IoT introduces several challenges, particularly concerning privacy and security. As IoT technology continues to evolve, its security vulnerabilities are expected to grow, increasing the risk of cyber-attacks [2]. Reports from 2020 have highlighted a notable rise in attacks on IoT devices, underscoring an increase in vulnerabilities within wireless networks [3]. The rising potential rewards for successful breaches are likely to encourage attackers to develop more sophisticated methods to exploit these technologies. Traditional security measures designed for conventional internet environments often fall short in addressing the unique vulnerabilities of IoT networks. Effective network security requires a combination of prevention, detection, and mitigation strategies. In the IoT context, cyber security is a crucial component of the information technology infrastructure. Although IoT enhances performance and competitiveness through advanced control mechanisms, it also heightens the risk of cyber-attacks. 
The evolving privacy and security paradigm in IoT necessitates the creation of a unified framework to streamline device communication amidst increasing complexity.

Intrusion detection systems (IDS) are essential for addressing security challenges in IoT networks. By monitoring and analyzing network traffic, IDS detect and respond to suspicious activities in real-time, identifying anomalies that traditional security measures might overlook. Leveraging advanced machine learning and deep learning techniques, IDS can adapt to evolving threats and strengthen IoT security. Implementing a robust IDS is vital for mitigating risks and protecting IoT infrastructures from sophisticated cyber-attacks [4,5,6].

Real-time intrusion detection is vital for protecting network systems, especially in IoT environments. Deep learning models have proven effective at analyzing network traffic in real time, enabling rapid recognition of intrusion attempts [7]. A range of learning techniques improves the adaptability of intrusion detection systems, especially their ability to evolve in response to emerging security vulnerabilities [8]. Embedding real-time detection mechanisms can substantially enhance network defense by rapidly spotting and counteracting malicious activities [9]. IDS, a widely used security solution, is categorized into two main types: signature-based IDS, which relies on known attack patterns and requires frequent updates, and anomaly-based IDS, which detects deviations from normal behavior and is effective against zero-day attacks. The latter approach often utilizes machine learning and deep learning techniques to analyze large datasets and identify anomalies with high accuracy, reducing false positives. This approach has been adopted in our study for IoT networks.

In this study, we present an enhanced hybrid convolutional neural network-multilayer perceptron (CNN-MLP) deep learning model tailored for effective intrusion detection. Our model tackles class imbalance through a comprehensive set of data resampling techniques. We utilize the enhanced hybrid adaptive synthetic sampling-synthetic minority oversampling technique (ADASYN-SMOTE) and enhanced ADASYN for oversampling in binary and multiclass scenarios, respectively. To enhance model performance on underrepresented classes, we employ edited nearest neighbors (ENN) for targeted undersampling while adjusting class weights to increase the significance of minority classes throughout training. The results highlight the exceptional performance of our model, which surpasses prior methods. On the IoT-23 dataset, the model achieved 99.94% accuracy in two-stage binary classification, 99.99% accuracy in multiclass classification excluding the normal class, and 99.91% accuracy in single-phase multiclass classification including the normal class. Utilizing the NF-BoT-IoT-v2 dataset, the model attained an exceptional 99.96% accuracy in the dual-phase binary classification paradigm, 98.02% accuracy in multiclass classification excluding the normal class, and 98.11% accuracy in single-phase multiclass classification including the normal class. The core contributions of our research are as follows:

Development of a robust intrusion detection system leveraging an advanced hybrid CNN-MLP architecture, wherein the CNN layers intricately extract high-dimensional feature representations from input data, while the MLP layers execute refined classification, ensuring precise anomaly and intrusion detection.
Mitigating class imbalance by employing advanced hybrid ADASYN-SMOTE for binary classification, advanced ADASYN for multi-class classification, alongside ENN for noise reduction and class weights to ensure balanced model performance.
Applying an enhanced Z-score method to remove outliers, which improves dataset quality and model performance by allowing the model to focus on relevant data and classify attacks more accurately.
Utilizing the IoT-23 and NF-BoT-IoT-v2 datasets, this study substantiates the superior efficacy of the suggested approach, exceeding the performance of existing cutting-edge techniques in intrusion detection.
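
As an illustration of two of the preprocessing ideas listed above, the following minimal sketch applies a plain Z-score outlier filter and computes inverse-frequency class weights. It is not the enhanced Z-score method or the weighting scheme used in this study, whose details are not given in this section; the threshold of 2.5, the toy feature values, and the function names are assumptions chosen only for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def zscore_filter(rows, labels, threshold=3.0):
    """Drop rows in which any feature lies more than `threshold`
    population standard deviations from its column mean."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    kept_rows, kept_labels = [], []
    for row, label in zip(rows, labels):
        within_range = all(
            std == 0 or abs(value - mu) / std <= threshold
            for value, mu, std in zip(row, means, stds)
        )
        if within_range:
            kept_rows.append(row)
            kept_labels.append(label)
    return kept_rows, kept_labels


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy flows with two hypothetical features; the last row has an extreme second value.
rows = [
    [1.0, 10.0], [1.1, 11.0], [0.9, 9.0], [1.0, 10.0], [1.2, 12.0],
    [0.8, 8.0], [1.0, 10.0], [1.1, 11.0], [0.9, 9.0], [1.0, 500.0],
]
labels = ["benign"] * 6 + ["attack"] * 3 + ["benign"]

clean_rows, clean_labels = zscore_filter(rows, labels, threshold=2.5)
weights = balanced_class_weights(clean_labels)
print(len(clean_rows), weights)
```

In this toy example the extreme row is discarded and the minority class receives the larger weight, which is the effect class weighting is meant to have during training.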

The paper is organized as follows: Section 2 reviews the relevant literature. Section 3 details the methodology utilized in this study. Section 4 presents the experimental results, while Section 5 provides a detailed interpretation of the outcomes. Section 6 examines the limitations and challenges of the proposed methodology. Section 7 concludes the paper by summarizing its main contributions and findings. Finally, Section 8 outlines directions for future research.

2. Related Literature

IDSs have emerged as indispensable tools for protecting national, economic, and individual security, driven by the exponential growth in data accumulation and the ever-expanding web of global interconnectivity. In 1980, James P. Anderson laid the groundwork for intrusion detection [10], conceived as a strategic approach to address system vulnerabilities and fortify monitoring capabilities. Over time, IDSs have been widely adopted, with ongoing efforts by security professionals to improve their accuracy and effectiveness. This section undertakes a profound examination of the diverse deep learning (DL) strategies implemented for threat identification and system defense, as detailed across the scholarly landscape. Building on its remarkable success in domains like image recognition and natural language processing, deep learning has risen as a leading approach for detecting traffic anomalies within IDSs. In scholarly research, DL methods are frequently employed for classifying different types of attacks in intrusion detection systems due to their versatility and superior performance.

2.1. Binary Classification

In IDS binary classification, the CNN-MLP technique leverages the spatial pattern recognition capabilities of CNNs alongside the feature learning strength of MLPs. CNNs excel at identifying complex patterns in network traffic, while MLPs refine these features, enabling more precise decision-making. This synergy enhances the model’s ability to distinguish between malicious and benign activities. By combining CNN’s pattern recognition with MLP’s robust classification abilities, this integrated approach significantly improves the accuracy and efficiency of IDS, bolstering real-time detection of cyber threats and fortifying defenses against evolving attacks.
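
The division of labor described above can be sketched as a single forward pass: convolutional filters turn a flow's feature vector into a few pattern scores, and a small MLP maps those scores to an attack probability. This is a minimal sketch under stated assumptions, not the architecture proposed in this paper; the kernels, weights, and input values are hand-picked and untrained, and all function names are hypothetical.

```python
import math


def conv1d(signal, kernel):
    """Valid 1-D cross-correlation, the operation deep learning libraries call convolution."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def cnn_features(signal, kernels):
    """One feature per kernel: convolution, then ReLU, then global max pooling."""
    return [max(relu(conv1d(signal, kernel))) for kernel in kernels]


def mlp_binary(features, hidden_w, hidden_b, out_w, out_b):
    """One hidden ReLU layer followed by a sigmoid output unit."""
    hidden = relu([
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(hidden_w, hidden_b)
    ])
    logit = sum(w * h for w, h in zip(out_w, hidden)) + out_b
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical scaled flow features and illustrative (untrained) parameters.
flow = [0.0, 1.0, 0.0, 2.0, 0.0]
kernels = [[1.0, -1.0], [0.5, 0.5]]
features = cnn_features(flow, kernels)

attack_probability = mlp_binary(
    features,
    hidden_w=[[1.0, 0.0], [0.0, 1.0]],
    hidden_b=[0.0, 0.0],
    out_w=[1.0, -1.0],
    out_b=0.0,
)
print(features, round(attack_probability, 3))
```

In a trained model the kernels and weights would be learned jointly, so the convolutional stage adapts its features to whatever the classifier needs.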

In [11], the researchers unveil the range-optimized attention convolutional scattered technique (ROAST-IoT), a pioneering and intricately designed AI model, meticulously engineered to revolutionize intrusion detection within the dynamic and multifaceted landscape of IoT ecosystems. This range-optimized attention convolutional scattered technique harnesses a multi-faceted approach to decode the intricate, hidden correlations within diverse network traffic data, facilitating a profound analysis of the underlying dynamics. System behavior is meticulously tracked through sensors, with the data subsequently stored on a cloud server for thorough and exhaustive examination. The model’s performance is scrutinized across a diverse set of well-established datasets, like IoT-23, Edge-IIoT, ToN-IoT, and UNSW-NB15, ensuring a comprehensive and multifaceted evaluation of its capabilities. In [12], the authors present an innovative classification algorithm, carefully crafted to identify malicious traffic in IoT ecosystems, leveraging cutting-edge machine learning methods to improve detection precision and bolster system security. The suggested approach utilizes an authentic IoT dataset that closely resembles actual traffic behavior, thoroughly evaluating the performance of various classification algorithms to measure their effectiveness in accurately identifying anomalies and ensuring the resilience of the system. In [13], the authors intricately define the critical constraints necessary to construct a plausible adversarial cyber-attack scenario and introduce a sophisticated methodology for conducting an exhaustive adversarial analysis, with particular emphasis on employing a genuine evasion attack vector designed to authentically bypass defenses and rigorously challenge system robustness. 
This method was implemented to perform an extensive and intricate assessment of the effectiveness of three supervised learning algorithms: random forest (RF), extreme gradient boosting (XGB), and light gradient boosting machine (LGBM), as well as an unsupervised approach, isolation forest (IFOR), providing a profound and exhaustive investigation into their performance and appropriateness for the specific context. In [14], the authors introduce three pivotal machine learning methodologies, meticulously applied for handling classification tasks involving both binary and multiple categories, offering a detailed exploration of their capabilities and applications. These techniques are applied within an IDS specifically designed for detecting IoT-based attacks and classifying different types of attacks. The study utilizes the IoT-23 dataset, a state-of-the-art and all-encompassing dataset, to craft an advanced intelligent IDS proficient in detecting and categorizing attack patterns within the complex landscape of IoT environments. In [15], the authors confront this complex dilemma by unveiling a groundbreaking IoT/IIoT dataset, meticulously engineered with ground truth labels that clearly distinguish between normal and attack classes, offering a sophisticated solution for enhancing detection accuracy. This dataset also integrates a type feature that delineates attack sub-classes, enhancing the capability to perform intricate multi-class classification tasks and offering a deeper level of precision in categorizing diverse attack variants. Referred to as TON_IoT, the dataset carefully aggregates telemetry information from IoT/IIoT services, along with comprehensive operating system logs, and network traffic, providing a profound and multifaceted view of the intricate operations and interactions within an IoT network. 
The data is sourced from a highly realistic simulation of a mid-sized network environment, meticulously constructed within the Cyber Range and IoT Labs at UNSW Canberra (Australia), reflecting authentic operational conditions. In [16], the authors carried out their research utilizing PySpark in conjunction with Apache Spark within the Google Colaboratory (Colab) environment, harnessing the power of Keras and Scikit-Learn libraries for advanced machine learning processing. The datasets employed for training and testing the model include CICIoT2023 and TON_IoT. To ensure the inclusion of relevant features, the datasets were subjected to a correlation analysis process to effectively determine and isolate the most relevant features. The researchers constructed an intricate deep learning framework that blends one-dimensional CNN with LSTM models, meticulously crafted to harness the full potential of temporal and spatial feature extraction. In [13], the authors establish the fundamental conditions required for modeling a genuine adversarial cyber-attack scenario and present a comprehensive methodology for conducting reliable adversarial robustness analysis, employing a credible technique for evading detection. This methodology was implemented to meticulously analyze the effectiveness and capabilities of three supervised learning algorithms RF, XGB, and LGBM, in addition to an unsupervised technique, IFOR, for evaluating their performance and effectiveness.

This research provides an in-depth analysis of various deep learning models and their integration with big data to enhance the performance of IDS. In [17], a deep neural network (DNN) model demonstrates exceptional performance, achieving a near-perfect accuracy rate of 99.99% in binary classification tasks. The study emphasizes the effectiveness of deep learning combined with big data analytics in improving IDS capabilities. Three distinct classification techniques, random forest, gradient boosting tree (GBT), and a deep feed-forward neural network, are leveraged to analyze and categorize network traffic. An exhaustive uniformity assessment is meticulously performed to pinpoint and isolate the foremost impactful characteristics inherent in the datasets, thereby enhancing the model’s fidelity and amplifying its predictive proficiency. In [18], the authors present a sophisticated DNN model, which delivers outstanding results by achieving a remarkable 93.1% accuracy in binary classification, highlighting its capability to accurately differentiate between classes with exceptional reliability and performance. This research focuses on building a highly adaptive and proficient intrusion detection system, engineered to identify and categorize novel and unforeseen cyber threats with precision. Considering the constantly changing landscape of network environments and the swift progression of emerging threats, a range of datasets is rigorously analyzed through both static and dynamic methodologies to determine the most effective strategies for uncovering new and evolving threats. The research comprehensively evaluates DNN models in comparison with conventional machine learning classifiers, utilizing publicly accessible benchmark malware datasets for a robust analysis. The authors of [19] introduce a DNN-based IDS model that achieves 99% accuracy. 
This framework undergoes rigorous evaluation on a recently curated dataset encompassing both packet-level and flow-oriented data, supplemented with comprehensive metadata. The dataset exhibits a pronounced imbalance and comprises 79 distinct attributes, some of which correspond to classes characterized by an extremely limited number of training instances. This research illuminates the complexities introduced by data imbalance and emphasizes the transformative capabilities of deep learning in mitigating these challenges. In [20], the researchers present a cutting-edge stacked autoencoder (SAE) model, achieving an outstanding accuracy of 99.92%. Their study outlines a novel framework for IDS, which is comprised of five integral components: data preprocessing, the Autoencoder structure, a robust database, a classification mechanism, and a feedback loop, all working in synergy to optimize the system’s efficiency and precision. The Autoencoder compresses preprocessed data to extract lower-dimensional features, which are then used for classification. The compressed data is stored in the database, which can be leveraged for forensic analysis, post-event evaluation, and retraining. In [21], the authors present an LSTM model that reaches 92.2% accuracy in binary classification. This model represents a groundbreaking approach by seamlessly incorporating attention mechanisms with LSTM networks, enabling it to capture and effectively process both the time-based and spatial relationships within network communication. The model undergoes evaluation using the UNSW-NB15 dataset, which offers a broad spectrum of patterns and considerable discrepancies between the training and testing data, thus presenting a complex and demanding context for assessing the model’s performance. 
In [22], the authors introduce a hybrid model combining CNN with bi-directional long short-term memory (BiLSTM) networks, reaching 97.90% accuracy in binary classification. This framework fuses a bidirectional LSTM with an optimized CNN and employs feature refinement techniques to simplify its structure and reduce computational complexity. Similarly, in [23], a random forest model achieves 98.6% accuracy. The paper presents a comprehensive attack detection strategy on the UNSW-NB15 dataset using advanced machine learning and deep learning techniques. In [17], a DNN model reaches 99.16% accuracy in network traffic classification. This investigation adopts a stratified 5-fold cross-validation approach and combines ensemble learning methods from Apache Spark MLlib with deep neural networks built in Keras.
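
Stratified 5-fold cross-validation, as mentioned for [17], keeps the class ratio roughly constant in every fold, which matters when attack classes are rare. The sketch below shows one simple way to build such folds; it is not the procedure used in [17], and the function name, the seed, and the 40/10 toy label split are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs whose test folds
    preserve the class proportions of `labels` as closely as possible."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)  # deal each class round-robin

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(
            index for j, fold in enumerate(folds) if j != i for index in fold
        )
        yield train, test


# Toy imbalanced label set: 40 benign flows and 10 attack flows.
labels = ["benign"] * 40 + ["attack"] * 10
splits = list(stratified_kfold(labels, k=5))

for train, test in splits:
    attacks_in_test = sum(labels[i] == "attack" for i in test)
    print(len(train), len(test), attacks_in_test)
```

Each test fold here contains the same 4:1 benign-to-attack ratio as the full label set, so no fold is evaluated without attack samples.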

In a comparable study, [24] proposed a recurrent neural network (RNN) leveraging LSTM units, designed to classify categories using a dataset that incorporates 122 distinct features. This model attained an accuracy rate of 82.68% on the Test+ dataset. In [25], the authors tackled the challenge of class imbalance by combining a CNN-BiLSTM architecture with the ADASYN resampling technique; the model achieved 90.73% accuracy on the Test+ dataset. To further improve performance, [26] optimized the Autoencoder network for anomaly detection, which achieved an accuracy of 90.61% on Test+. In [27], a multi-CNN model with discrete pre-processing was proposed, classifying attacks on Test+ with an accuracy of 83%. In [28], the authors developed and assessed two deep neural frameworks for IDS in cloud computing environments. The first model implemented an MLP trained via backpropagation (BP), whereas the second enhanced the MLP training process by incorporating particle swarm optimization (PSO) to refine its performance. Both approaches delivered an accuracy of 98.97%, showing substantial improvements in IDS performance and efficiency for intrusion detection and prevention. In [29], the investigation analyzed the capabilities of deep learning methodologies for network intrusion detection, undertaking an extensive comparative assessment across diverse frameworks, including Keras, TensorFlow, Theano, fast.ai, and PyTorch 2.6. The investigation devised a deep-learning-driven MLP architecture, achieving a precision rate of 98.68% in detecting network intrusions and categorizing a wide range of cyber threats. 
The evaluation was conducted using the CSE-CIC-IDS2018 dataset as a standardized benchmark. Along the same lines, [30] introduced an IDS that integrated a customized recurrent convolutional neural network (RC-NN) optimized through the Ant Lion Optimization algorithm. This approach achieved an accuracy of 94%, demonstrating the effectiveness of the RC-NN-IDS in detecting and mitigating network intrusions. In [31], the authors introduce a deep learning framework designed to improve the effectiveness of intrusion detection systems, with a denoising autoencoder (DAE) as the foundation of their methodology. The DAE model is trained using a progressive, adaptive, stepwise optimization strategy designed to minimize overfitting and avoid local minima, achieving an accuracy of 96.53%. This approach improves dependability in identifying and preventing network intrusions. In [32], the authors introduce a hidden naïve Bayes (HNB) classifier, specifically designed to counter DoS attacks. This data mining model relaxes the traditional naïve Bayes assumption of conditional independence and incorporates discretization and feature selection techniques to refine the system. The approach achieves an accuracy of 97% by prioritizing the most relevant features, which not only improves performance but also reduces processing time. In [33], the authors present a novel method for classifier development, demonstrating results on two prominent intrusion detection datasets. Their method employs a sequentially optimized ensemble of artificial neural networks (ANNs) to develop a robust and high-performance multi-class classification system. 
Reaching an impressive accuracy of 98.25%, this approach refines the conventional one-vs-rest technique by incorporating an extra example filtering phase, which greatly amplifies its overall performance.

We firmly contend that our innovative CNN-MLP model outperforms current models, as evidenced by an in-depth comparison with existing studies. On the IoT-23 and NF-BoT-IoT-v2 datasets, our model achieves exceptional performance with 99.94% accuracy in binary classification on the IoT-23 dataset and 99.96% accuracy in binary classification on the NF-BoT-IoT-v2 dataset. These results significantly exceed those reported in previous studies. An in-depth evaluation of the effectiveness of our model, compared with findings from other pertinent studies, is illustrated in Table 1.

Table 1. Overview of related work in binary classification.

2.2. Multi-Class Classification

The combined application of convolutional neural networks and multilayer perceptrons forms a potent method for accurate multi-class attack categorization in intrusion detection systems. Initially, CNNs, known for their capability in recognizing intricate patterns and irregularities, extract and capture crucial patterns and features from the network traffic data. Subsequently, these refined features are processed by MLPs, which perform the final classification. This cohesive strategy strengthens the IDS’s capacity to differentiate among a broad spectrum of attack categories, improving detection accuracy and the robustness of the entire system.
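
In multi-class categorization, the final layer of a neural classifier commonly produces one score per attack category, and a softmax converts these scores into probabilities. The sketch below assumes such a softmax output layer, which this section does not specify for any particular model; the class names and score values are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(logits, class_names):
    """Return the most probable class and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Illustrative traffic categories and hypothetical per-class scores for one flow.
class_names = ["Benign", "DDoS", "DoS", "Reconnaissance", "Theft"]
logits = [0.2, 3.1, 1.4, -0.5, -2.0]

probabilities = softmax(logits)
label, confidence = predict_class(logits, class_names)
print(label, round(confidence, 3))
```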

In [12], the researchers unveil a groundbreaking classification algorithm meticulously engineered to identify malicious traffic within IoT ecosystems, employing advanced machine learning methodologies to enhance detection capabilities. This methodology leverages a real-world IoT dataset, meticulously designed to replicate authentic traffic scenarios, and conducts a thorough evaluation of multiple classification algorithms to scrutinize their effectiveness in distinguishing malicious activities. In [34], the authors delve into the realm of IoT network security, exploring the effectiveness of machine learning algorithms in detecting anomalies within network data, aiming to enhance the identification of potential threats. The paper assesses several ML algorithms proven effective in similar scenarios and conducts a thorough comparative analysis using diverse parameters and approaches. In [35], the authors embark on an extensive investigation into a diverse array of machine learning and deep learning methodologies, leveraging well-established datasets, with the goal of advancing the security measures within the IoT ecosystem. The study delves into the creation of a deep learning-driven algorithm, meticulously crafted to identify and mitigate DoS attacks. In [36], the researchers delve into the strategies for managing absent data within practical computational intelligence applications, addressing the challenges posed due to the absence of data in practical situations. Two rigorous experimental evaluations were performed, assessing the effects of varied missing data remediation methods on Random Forest classifier precision, utilizing contemporary cybersecurity validation datasets, CICIDS2017 and IoT-23, which are crafted to reflect modern security threats.

In [15], the authors present an IoT/IIoT dataset furnished with ground truth labels that distinguish normal from attack traffic. The dataset also includes an attribute identifying attack sub-classes, enabling multi-class classification. The TON_IoT dataset comprises telemetry data from IoT/IIoT services, operating system logs, and network traffic. The data was collected from a realistic, medium-scale network testbed built within the Cyber Range and IoT Labs at UNSW Canberra, Australia. In [37], the authors applied machine learning and deep learning techniques to analyze denial of service (DoS) and distributed denial of service (DDoS) attacks. The Bot-IoT dataset from the UNSW Canberra Cyber Centre served as the training data. The ARGUS software (version 24.12) was used to extract essential features from the pcap files of the UNSW dataset. In [37], the researchers unveiled a privacy-enhanced intrusion detection architecture, termed the privacy-preserving intrusion detection framework (P2IDF), designed to safeguard network traffic inside programmable IoT and distributed edge processing architectures. This framework uses an SAE architecture to transform raw network data into an encoded latent representation, thereby reinforcing privacy preservation and mitigating exposure to inference attacks. Subsequently, the framework integrates an ANN-based IDS, assessed using the ToN-IoT dataset, to distinguish benign from malicious traffic both in its raw state and after transformation. 
This synergistic dual-layer mechanism fortifies the security architecture of interconnected IoT-Fog infrastructures, simultaneously upholding stringent data confidentiality through advanced protective measures. Authors in [38] conducted an exhaustive evaluation of attribute importance within six varied intrusion detection data collections. They employed three separate attribute selection techniques: Chi-squared, entropy-based gain, and inter-variable dependency analysis, to order and grade attributes based on their predictive power. Following this, the features underwent rigorous evaluation using deep feedforward networks (DFF) and RF classifiers, resulting in a broad array of 414 analytical trials. The core discovery demonstrates that a refined, selective attribute array achieves detection parity or superiority over exhaustive attribute usage, emphasizing attribute compression’s impact on network intrusion detection system optimization. In [39], the researchers unveil a groundbreaking and exhaustive cybersecurity dataset, meticulously crafted for applications within IoT and IIoT ecosystems, named Edge-IIoTset. This data repository is precisely structured to enable automated intrusion detection frameworks, supporting both consolidated and distributed training methodologies. The data collection was precisely engineered within a tailored IoT/IIoT experimentation environment, encompassing a broad spectrum of realistic devices, sensing units, transmission methods, and cloud/edge arrangements, thus validating its accuracy and suitability for authentic deployments.
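
Of the attribute-ranking techniques mentioned for [38], entropy-based gain is easy to illustrate: a feature scores highly when splitting the data by its values leaves each group dominated by one class. The sketch below ranks categorical features by information gain. It is not the procedure of [38]; the feature names `proto` and `flag` and the four toy flows are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Reduction in label entropy after grouping samples by feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(table, labels):
    """Return (feature_name, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(column, labels) for name, column in table.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Toy data: `proto` separates the classes perfectly, `flag` carries no information.
labels = ["attack", "attack", "benign", "benign"]
table = {
    "flag": ["S", "A", "S", "A"],
    "proto": ["tcp", "tcp", "udp", "udp"],
}

ranking = rank_features(table, labels)
print(ranking)
```

Keeping only the top-ranked attributes is one way to obtain the reduced feature set that [38] reports as matching or exceeding the full set.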

In [17], the researchers present a DNN model that attains 99.56% accuracy for multi-class classification. This research integrates big data and deep learning methodologies to improve the performance of IDSs. The study employs three classifiers for network traffic categorization: random forest and GBT as ensemble algorithms, and a deep feed-forward neural network. To optimize feature selection, a homogeneity measure is applied to the datasets. A deep neural network, detailed in [18], achieves 95.6% accuracy across multiple classes. The goal of that work is to build a resilient and effective intrusion detection system able to detect and categorize emerging and unforeseen cyber-attacks. The study highlights the importance of evaluating diverse datasets using both static and dynamic methodologies to identify the most effective algorithms for future threat detection. The investigation offers a complete appraisal of deep learning networks and classical machine learning classifiers, utilizing various public malware benchmark datasets. The researchers in [19] propose a DNN model with a 99% accuracy rate. This model was evaluated using a contemporary, publicly available dataset, which includes packet-level and flow-level data, in addition to supplementary metadata. The dataset is labeled and imbalanced and contains 79 attributes. Reference [40] employs principal component analysis, random forest, linear discriminant analysis, and quadratic discriminant analysis, attaining 99.6% accuracy in multi-class classification. Principal component analysis reduces the feature space, and the reduced features are used to build several intrusion detection system classifiers. A deep learning model in [17] attained 97.01% accuracy. 
This study leveraged a five-tiered validation scheme and utilized the Keras Deep Learning platform coupled with Apache Spark for concurrent computing tasks. Ensemble techniques were also employed using the Apache Spark Machine Learning Library. The research conducted by [41] introduced an ANN model that achieved a notable 99.59% accuracy in multi-class classification. By consolidating the entire dataset into a single file and reclassifying attack types into new categories, the study assessed deep learning performance across both binary and multi-class scenarios. Instead of evaluating models in isolation, this approach offered a comprehensive strategy for dataset integration. In [42], an RF model attained 97.37% accuracy in multi-class classification. The study introduced a feature clustering technique for Flow, TCP, and MQTT data to address challenges such as dataset imbalance, dimensionality reduction, and overfitting. Supervised machine learning methods, including ANN, SVM, and RF, were employed to enhance performance. Research presented in [23] offered a substantial advancement, with an RF model reaching 98.3% accuracy. This work broadened its intrusion detection method to incorporate the UNSW-NB15 dataset, on which it reported this 98.3% accuracy in multi-class classification. In [43], the researchers introduced an RNN model that achieved a notable 94% accuracy in multi-class classification. This study employed RFE for feature selection, enhancing the RNN’s classification performance across five categories: Normal, DoS, Probe, R2L, and U2R. Researchers in [44] developed a novel architecture combining deep convolutional and recurrent neural networks, achieving a striking 99.5% accuracy. This model used convolutional layers to automatically learn and select relevant features; thereafter, a softmax function was integrated to discern and classify various network intrusion occurrences. 
In [45], a methodology incorporating sparse stacked autoencoders was proposed, reaching an exceptional 98.5% accuracy in the classification of network traffic data. This methodology was executed through a three-phase process: the preliminary phase focused on feature extraction via sparse stacked autoencoders, followed by system training using a softmax classifier, and concluded with an optimization phase for fine-tuning the model’s parameters. In [46], the authors propose a semi-supervised optimization method for network anomaly traffic detection, utilizing a double deep Q-network (DDQN), a prominent algorithm in Deep Reinforcement Learning. In the proposed semi-supervised double deep Q-network (SSDDQN) model, the current network first employs an autoencoder to reconstruct traffic features, followed by a deep neural network classifier. For the target network, the method begins with an unsupervised K-Means clustering algorithm, which is then followed by prediction using a deep neural network. The model achieved an accuracy of 79.43% in multi-class classification.

In [29], the study delved deeply into the capabilities and performance of various deep learning frameworks, assessing their contribution to advancing network intrusion detection methodologies. Through an extensive comparison of leading frameworks like Keras, TensorFlow, Theano, fast.ai, and PyTorch, alongside the integration of MLP into their evaluation, a 98.31% classification accuracy was facilitated on the CSE-CIC-IDS2018 dataset, encompassing both network intrusion detection and attack type categorization. The research presented in [47] highlights the crucial role of cyber security in protecting network infrastructures from vulnerabilities and intrusions. It highlights the profound progress made in machine learning, especially deep learning, in propelling the proactive identification and thwarting of attacks through the utilization of intricate self-learning mechanisms and advanced feature extraction methodologies. The CSE-CIC-IDS2018 dataset, including both normal and malicious network activity, was analyzed using a deep learning LSTM model, yielding 99% intrusion detection accuracy. In [48], the authors evaluate a DNN model, which has demonstrated a commendable detection accuracy of approximately 90%. This evaluation underscores the model’s effectiveness in identifying network intrusions. The research in [49] introduces an adaptive network anomaly detection framework aimed at fortifying network security by leveraging advanced deep learning methodologies. The authors meticulously crafted a deep neural network framework, leveraging the power of LSTM coupled with an advanced attention mechanism (AM), thereby significantly augmenting its performance and optimizing its overall efficacy in complex network anomaly detection tasks. 
To tackle the challenge of class imbalance within the CSE-CIC-IDS2018 dataset, they employed the SMOTE in conjunction with a refined loss function, resulting in a remarkable accuracy of 96.2%, thus optimizing model performance and enhancing its robustness against skewed class distributions. The study reported in [33] introduces a sophisticated classifier development technique and presents results from two intrusion detection datasets. Their approach leverages a sequential ensemble of boosting-enhanced ANNs, crafting a potent and highly efficient multi-class classifier that significantly amplifies detection capabilities and model performance across diverse attack scenarios. This technique, which builds on a one-vs-remaining strategy, is further refined with an example filtering step. This innovative approach achieves an impressive accuracy of 99.36%, significantly improving both performance and accuracy. In [18], the authors delve into the creation of a sophisticated DNN designed to build a dynamic and robust IDS, adept at identifying and categorizing both emerging and unpredictable cyber threats with remarkable flexibility and precision. Considering the ever-evolving landscape of network dynamics and the continuous adaptation of attack techniques, it is imperative to assess datasets derived from both static and dynamic approaches over time to effectively capture and respond to these shifting patterns. The proposed DNN model exhibits outstanding performance, reaching a high accuracy of 93.5%, demonstrating its exceptional capacity to adapt to dynamic network environments and its efficacy in providing real-time detection of emerging cyber threats, thereby ensuring comprehensive security in the face of evolving attack strategies. Study [50] examines network intrusion detection using multi-class classification on the KDD-CUP 99 and NSL-KDD datasets. 
The research thoroughly evaluates the model’s ability to discern and categorize various attack types, ensuring robust performance assessment across diverse security scenarios. Utilizing a CNN, the experiment attains an impressive accuracy rate of 98.2%, demonstrating the model’s exceptional proficiency in precisely detecting and classifying a wide spectrum of network intrusions, further highlighting its robustness in addressing diverse and evolving security threats. In [51], the authors deliver an extensive, experiment-centric evaluation of a range of neural network techniques applied within IDS, providing an insightful exploration of their efficacy and versatility in fortifying network security against a diverse array of cyber threats. They specifically highlight the performance of MLP, a prominent type of neural network-based method, which achieved an accuracy of 88.92% in multi-class classification for detecting intrusions. The growing integration of the IoT in manufacturing enhances real-time monitoring and decision-making but also raises security concerns due to potential anomalies in IoT networks, highlighting the need for rapid detection and resolution to prevent harm and losses [52].

Based on a thorough evaluation of previous research, we confidently assert that our proposed CNN-MLP model significantly outperforms existing approaches. This innovative model achieves exceptional performance with 99.99% accuracy in multi-class classification (excluding the normal class) and 99.91% accuracy in single-phase multi-class classification (including the normal class) on the IoT-23 dataset, and 98.02% accuracy in multi-class classification (excluding the normal class) and 98.11% accuracy in single-phase multi-class classification (including the normal class) on the NF-BoT-IoT-v2 dataset. These results underscore the advancements our model offers over prior methods. A detailed comparison of our results with relevant studies can be found in Table 2.

Table 2. Overview of related work in multi-class classification.

2.3. Challenges

Cutting-edge IDS employing deep learning models encounter numerous substantial obstacles. A primary challenge is attaining superior accuracy, which is frequently impeded by class imbalance in benchmark datasets, where benign traffic overwhelmingly surpasses attack traffic. This disparity complicates the identification of infrequent yet crucial attacks, often resulting in increased false positive rates and diminished overall system efficiency. The elevated processing complexity and resource consumption of deep learning, though offering the possibility of enhanced detection correctness, constitute a noteworthy difficulty. This inherent trait presents a substantial impediment to expandability and operational streamlining, notably in broad, real-time architectures where such prerequisites can hamper functionality and realistic application. A key challenge in these models is their generalizability, as they often face difficulties in adjusting to varying network environments or novel attack patterns not included in the training data. This restricts their ability to maintain robustness and effectiveness when deployed in real-world situations. Moreover, numerous studies tend to concentrate on theoretical and experimental dimensions, overlooking crucial practical concerns like preserving information confidentiality, reducing response time, and guaranteeing effortless compatibility with established protective measures. Lastly, prioritizing accuracy alone can obscure other crucial metrics, such as precision, recall, and F1-score, as well as the impact of false positives and negatives. Tackling these obstacles demands a comprehensive strategy that harmonizes efficient data management, scalability, flexibility, and real-world applicability.

Our CNN-MLP architecture successfully overcomes key challenges commonly encountered in modern IDSs. It outperforms conventional approaches, showcasing enhanced accuracy and improved performance across multiple evaluation metrics. The model mitigates class imbalance by utilizing sophisticated strategies such as the advanced hybrid ADASYN-SMOTE, improved ADASYN, ENN, and class weights, significantly strengthening its ability to recognize uncommon attack vectors. Utilizing the CNN for feature extraction and the MLP for final classification, the model preprocesses IoT network traffic data to improve feature representation and balance class distributions, thereby boosting classification performance. The CNN-MLP architecture is crafted to ensure scalability and efficiency, adeptly handling vast datasets. It capitalizes on the CNN’s proficiency in extracting features and the MLP’s power in classification, optimizing computational resources while ensuring real-time processing capability. By means of extensive testing across the IoT-23 and NF-BoT-IoT-v2 datasets, the model’s resilience is confirmed, showcasing its capability to perform effectively across varied network conditions and attack types. Furthermore, the model addresses practical deployment obstacles by minimizing both erroneous positive and negative identifications, thus guaranteeing reliable and trustworthy operation in real-world scenarios. Additionally, the assessment of the CNN-MLP model includes a broad range of performance metrics beyond mere accuracy, offering a detailed evaluation of its effectiveness while addressing possible shortcomings in detection dependability and real-world application.

3. Proposed Approach

The CNN-MLP model represents a state-of-the-art deep learning architecture that merges CNNs with MLP to deliver outstanding performance across both binary and multi-class classification challenges. This innovative framework adeptly tackles key issues encountered in IDS, such as boosting classification accuracy and managing class imbalances, with a particular focus on the IoT-23 and NF-BoT-IoT-v2 datasets. In this section, we provide a detailed overview of the steps involved in our model, including the thorough preprocessing procedures applied to the IoT-23 dataset. The model’s capabilities are subsequently scrutinized through testing on the IoT-23 and NF-BoT-IoT-v2 benchmark collections. A suite of sophisticated data preparation methods is utilized by the model to mitigate the challenge of unequal class representation. To counteract class imbalance, the model utilizes a sophisticated hybrid ADASYN-SMOTE mechanism, incorporating a refined ADASYN strategy. This strategic oversampling of minority classes strengthens the model’s capacity to identify and learn from rare but essential data samples. Furthermore, the model leverages ENN for selective undersampling and integrates class weighting to modulate the significance of each class during training. This integrated technique guarantees meticulous processing of complex data points and preserves balanced class proportions during training. The CNN component excels in feature extraction and enhancement, transforming network traffic data into a refined format that is then processed by the MLP for precise classification. This integrated architecture efficiently minimizes both false alarms and missed detections, improving the model’s capacity to recognize both known and unknown (zero-day) threats. The model’s remarkable effectiveness is definitively established by its superior results on the IoT-23 and NF-BoT-IoT-v2 data collections. 
On the IoT-23 dataset, it achieved 99.94% accuracy in two-stage binary classification, 99.99% accuracy in multi-class classification excluding the normal class, and 99.91% accuracy in single-phase multi-class classification including the normal class. Tested on the NF-BoT-IoT-v2 dataset, the model yielded 99.96% accuracy in binary classification, 98.02% accuracy in multi-class classification excluding the normal class, and 98.11% accuracy in single-phase multi-class classification including the normal class. Figure 1 highlights the efficacy of the dual-stage binary and multi-class classification workflow, excluding the normal class, while also illustrating the single-phase multi-class classification, which includes the normal class, thereby offering a comprehensive roadmap for practical deployment in IDS scenarios using the IoT-23 dataset.

Figure 1. System architecture for both binary and multi-class classification tasks utilizing the IoT-23 dataset.

The two-stage classification process employed in IDS begins with a binary classifier that assesses whether the network traffic is normal or indicative of an attack, as illustrated in Figure 2. If the traffic is classified as Normal, it is permitted to proceed without interruption. Conversely, if an Attack is detected, the system blocks the traffic and initiates a multi-class classifier in the background. This background classification occurs after blocking, ensuring that the critical decision-making is prioritized and based solely on the binary classification. By separating these processes, this approach significantly reduces the time and complexity associated with a single-phase multi-class classification system. The binary classifier’s rapid determination of whether to allow or block traffic allows for swift action, while the detailed classification of the attack type is handled independently, thus preserving real-time performance. Ultimately, this two-stage method enhances both efficiency and response time in detecting and mitigating security threats.

Figure 2. Design of the two-stage process of binary and multi-class classification.
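The control flow of the two-stage process can be sketched as follows. This is a minimal illustration, not the authors’ implementation: the function name `two_stage_decision` and the two toy classifiers are hypothetical stand-ins for the trained binary and multi-class models, and the attack label is computed synchronously here, whereas the described system runs that step in the background after blocking.

```python
from typing import Callable, Optional, Sequence, Tuple

Flow = Sequence[float]


def two_stage_decision(
    flow: Flow,
    is_attack: Callable[[Flow], bool],
    attack_type: Callable[[Flow], str],
) -> Tuple[str, Optional[str]]:
    """Return (action, attack_label) for one flow.

    The allow/block action depends only on the binary stage; the
    multi-class stage runs only for flows that were already blocked.
    """
    if not is_attack(flow):
        return "allow", None
    # The blocking decision is already made; the label is derived afterwards.
    return "block", attack_type(flow)


# Hypothetical stand-in classifiers, for illustration only.
def toy_binary(flow: Flow) -> bool:
    return sum(flow) > 1.0


def toy_multiclass(flow: Flow) -> str:
    return "DDoS" if flow[0] > 0.5 else "Okiru"


decisions = [
    two_stage_decision(f, toy_binary, toy_multiclass)
    for f in ([0.1, 0.2], [0.9, 0.8], [0.3, 0.9])
]
```

Because the action is fixed before the attack type is computed, the latency of the allow/block decision is bounded by the binary classifier alone.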

3.1. Dataset Description

Throughout the past decade, the utilization of machine learning and deep learning methodologies for detecting anomalies has garnered significant attention in academic research, particularly in relation to IDS [53]. IoT-23, a comprehensive dataset released in early 2020 by Stratosphere Lab in Prague, Czech Republic, and developed with the support of Avast Software in Prague, serves as a critical resource for IoT security research. The IoT-23 dataset is structured around twenty scenarios, each representing different types of malicious traffic generated by various attacks targeting IoT networks. These scenarios are labeled with the specific malware executed on the IoT devices. In addition to the malicious traffic, the dataset also includes three scenarios capturing normal (benign), non-infected IoT traffic, which serve as a baseline for comparison. The dataset, which encompasses twenty-three distinct scenarios, is aptly referred to as IoT-23. Each scenario was performed in a regulated setting, where devices were linked to the internet, emulating the standard behavior of IoT devices [54]. As a result, the IoT-23 dataset provides both normal and malicious traffic patterns, facilitating its use for binary classification tasks. Moreover, the malicious traffic flows have been further annotated with additional labels, enabling the dataset to support multi-class classification as well. After removing the extreme minority classes [12,55], the various classes along with their detailed descriptions are outlined in Table 3.

Table 3. Class types in IoT-23 dataset.

Each row of the dataset represents one data flow. Every flow is characterized by the same set of features, which are captured regardless of whether the traffic is normal or malicious. Table 4 details the features recorded for each flow between the source and destination. Each feature is stored in its own column, enabling its use in the upcoming phases of classification. We applied two distinct approaches for classification: one for binary outcomes and another for multiple classes. Binary classification assesses whether a flow is harmful or benign. Multi-class classification evaluates the IDS’s ability to identify different types of attacks.

Table 4. Attributes included in the IoT-23 dataset.

3.2. Data Preprocessing

Data preparation is fundamental to the data analysis and machine learning pipeline. It involves converting raw, unstructured data into a clean, structured format that can be effectively analyzed. Common preprocessing tasks include managing missing values, encoding categorical variables, normalizing or standardizing numerical data, and eliminating duplicates or irrelevant entries. Effective data preprocessing enhances dataset quality, reduces noise, and improves machine learning model performance by facilitating more accurate data interpretation. In this section, we present the preprocessing steps applied to the IoT-23 dataset, one of the most widely used datasets in IDS, detailing the techniques used to prepare the data for both types of classification (binary and multi-class). The IoT-23 data repository initially presents challenges such as missing values and duplicate entries. The preprocessing pipeline addresses these issues by first eliminating any missing values and subsequently removing duplicate records. In multi-class classification without normal class, normal traffic is removed, permitting the model to dedicate itself solely to discerning attack types, thereby improving its ability to categorize malicious activities. To handle outliers, Z-score is applied to identify and remove extreme values that could distort the model’s performance. Following these steps, numerical features are scaled to a uniform range using the MinMaxScaler 1.2.2, which ensures consistent feature scaling across the dataset. Categorical variables are then converted into a numerical format through One-Hot Encoding, making them compatible with machine learning algorithms. Upon completion of preprocessing, the dataset is partitioned into training and testing sets. 
To balance classes, resampling techniques such as hybrid ADASYN-SMOTE or ADASYN are employed for oversampling to create artificial samples within the training dataset, while ENN is used for undersampling within the training data, along with adjusting class weights, to further enhance the balance of the training set. This comprehensive approach to data preparation, including removing outliers, normalization, encoding, resampling, and model development, is visualized in Figure 1.

3.2.1. Removing Outliers Using Z-Score

Outliers are removed from a specific class in a dataset using the Z-score method. The process begins by identifying the indices of data points belonging to the target class. Numerical and categorical data for the specified class are separated, and Z-scores are calculated for the numerical features. Outliers are detected based on a Z-score threshold of 3, meaning any data point with a Z-score greater than 3 (or less than −3) is considered an outlier. These outliers are then removed from both the numerical and categorical columns of the class, and the cleaned data is recombined with the rest of the dataset. This results in a refined dataset that improves data quality and enables more accurate analysis or model training. By removing extreme values, the model is better able to focus on meaningful patterns, which enhances performance in classification tasks.
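The per-class filtering described above can be sketched as follows. This is a dependency-free illustration under stated assumptions: the function name `remove_class_outliers` is hypothetical, the population standard deviation is used, a row is dropped when any numerical feature exceeds the threshold, and the toy values are not taken from IoT-23. The function returns the indices to keep, so the same indices can be used to filter the categorical columns.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def remove_class_outliers(
    rows: Sequence[Sequence[float]],
    labels: Sequence[str],
    target: str,
    threshold: float = 3.0,
) -> List[int]:
    """Return indices of rows to keep after filtering one class.

    Rows of `target` are dropped when any numerical feature has
    |z| > threshold, with z computed within that class only.
    """
    idx = [i for i, lab in enumerate(labels) if lab == target]
    n_features = len(rows[0])
    mu = [mean([rows[i][j] for i in idx]) for j in range(n_features)]
    sigma = [pstdev([rows[i][j] for i in idx]) for j in range(n_features)]

    def is_outlier(i: int) -> bool:
        # Features with zero spread cannot produce outliers.
        return any(
            s > 0 and abs((rows[i][j] - m) / s) > threshold
            for j, (m, s) in enumerate(zip(mu, sigma))
        )

    drop = {i for i in idx if is_outlier(i)}
    return [i for i in range(len(rows)) if i not in drop]


# Toy data: 20 ordinary "Attack" flows, one extreme "Attack" flow,
# and two "Normal" flows that must not be touched.
rows = [[1.0 + 0.01 * i] for i in range(20)] + [[100.0], [500.0], [510.0]]
labels = ["Attack"] * 21 + ["Normal"] * 2
kept = remove_class_outliers(rows, labels, "Attack")
```

Only the extreme Attack row (index 20) is removed; the Normal rows are kept even though their values are large, because Z-scores are computed within the target class only.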

3.2.2. Normalization

Normalization is a vital preprocessing step in machine and deep learning, particularly for datasets like IoT-23. This process, which scales numerical data to a standardized range, is crucial for enhancing the efficiency and effectiveness of the model. Among the various normalization techniques considered, MinMaxScaler was selected as the optimal method. This tool from the scikit-learn library ensures consistent scaling across all numerical features, which is essential for improving model accuracy. The MinMaxScaler normalizes feature values to a specified range, typically [0, 1], using Equation (1) [56].

X_{scaled} = \frac{X - \min(X)}{\max(X) - \min(X)} \qquad (1)

In this equation, X refers to the initial values of the feature, while min(X) denotes the smallest value within the feature, and max(X) represents the largest value within the same feature. This process involves subtracting the minimum value from each feature value and then dividing by the range of the feature values. For the IoT-23 dataset, normalization with MinMaxScaler was applied to all numerical features, except for categorical variables such as ‘proto’, ‘history’, and ‘conn_state’. These categorical columns were transformed using one-hot encoding, converting them into numerical format suitable for machine learning models. This combination of MinMaxScaler for numerical features and one-hot encoding for categorical variables ensured uniform scaling and effective model training. This comprehensive approach to normalization and encoding significantly contributed to the quality and reliability of the machine learning models.
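Equation (1) can be illustrated on a single feature with a short sketch. The function name `min_max_scale` and the sample values are illustrative assumptions; mapping a constant feature to 0 is a design choice made here to avoid division by zero, not something stated in the text.

```python
from typing import List, Sequence


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Scale one numerical feature to [0, 1] following Equation (1)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant feature: the range is zero, so map everything to 0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


duration = [2.0, 4.0, 10.0]  # illustrative values, not from IoT-23
scaled = min_max_scale(duration)
```

Here the minimum (2.0) maps to 0, the maximum (10.0) maps to 1, and 4.0 maps to (4 - 2) / (10 - 2) = 0.25.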

3.2.3. Encoding

Machine and deep learning models require numerical data, necessitating the conversion of non-numeric attributes into numerical formats. In this study, one-hot encoding was used to transform categorical variables such as ‘proto’, ‘history’, and ‘conn_state’ into numerical representations. This technique is favored for its simplicity and effectiveness in handling categorical data [57]. One-hot encoding creates a new binary column for each unique category, with a 1 indicating the presence of the category and 0 indicating its absence. This method converts categorical variables into a binary vector format that is compatible with machine learning algorithms. Applying one-hot encoding increased the number of features from 15 to 119 for binary and multi-class classification tasks excluding the normal class, and to 125 for multi-class classification including the normal class in the IoT-23 dataset. Additionally, classification labels were encoded using the one-hot encoder from the scikit-learn library [58]. This approach was selected after evaluating various encoding techniques, as it provided the most effective preparation for model training and analysis.
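The column expansion performed by one-hot encoding can be sketched as follows. The helper `one_hot` and the example values are illustrative assumptions; the study itself reports using the scikit-learn encoder for the labels.

```python
from typing import Dict, List, Sequence


def one_hot(values: Sequence[str], prefix: str) -> Dict[str, List[int]]:
    """Create one binary column per unique category in `values`."""
    categories = sorted(set(values))
    return {
        f"{prefix}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }


proto = ["tcp", "udp", "tcp", "icmp"]  # illustrative values
encoded = one_hot(proto, "proto")
```

A single column with three distinct values becomes three binary columns, which is why the feature count grows substantially when high-cardinality fields such as ‘history’ are encoded.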

3.2.4. Splitting Dataset for Training and Evaluation

The data is partitioned into training and testing subsets to evaluate the model’s effectiveness and confirm its ability to generalize effectively. The training set is used for learning patterns, while the testing set remains separate for an unbiased evaluation on new data. This method helps evaluate the model’s ability to generalize across both types of classification (binary and multi-class). IoT-23 binary classification sample distribution (training and testing sets) is shown in Table 5. 58,997 samples are dedicated to training the Normal class, and 10,497 are used for testing. 137,358 samples are used for training the Attack class, and 24,155 for testing. This division guarantees sufficient representation of both classes during the training and testing stages, fostering a fair and comprehensive assessment of the model’s effectiveness.

Table 5. Sample distribution of the IoT-23 dataset for binary classification.
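As a quick consistency check, the share of samples held out for testing can be computed from the reported binary counts. The roughly 85/15 ratio below is inferred from those counts and is not stated explicitly in this section.

```python
# Training/testing counts as reported for IoT-23 binary classification.
counts = {
    "Normal": {"train": 58_997, "test": 10_497},
    "Attack": {"train": 137_358, "test": 24_155},
}

# Fraction of each class that ends up in the testing set.
test_share = {
    name: c["test"] / (c["train"] + c["test"]) for name, c in counts.items()
}

total_train = sum(c["train"] for c in counts.values())
total_test = sum(c["test"] for c in counts.values())
overall_test_share = total_test / (total_train + total_test)
```

Both per-class shares are close to 0.15, and the overall share is 34,652 / 231,007, which is approximately 0.15.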

IoT-23 multi-class sample counts (Normal class removed) are in Table 6. For the various attack types, the training set contains 54,762 samples for PartOfAHorizontalPortScan, 51,048 samples for DDoS, and 12,777 samples for Okiru. The C&C-HeartBeat class has 6965 training samples, while C&C consists of 5756 training samples. The Attack class has 1814 training samples, and the C&C-PartOfAHorizontalPortScan class has 225 training samples. In the testing set, the PartOfAHorizontalPortScan class has 9733 samples, DDoS has 9069, Okiru has 2156, C&C-HeartBeat has 1174, C&C has 1036, Attack has 327, and C&C-PartOfAHorizontalPortScan has 37 samples. This distribution ensures the presence of all attack class instances in both the training and testing partitions, facilitating a comprehensive assessment of multi-class model efficacy.

Table 6. IoT-23 multi-class sample counts (Normal class removed).

Multi-class sample counts (including Normal) are in Table 7. For the Normal (benign) class, the training set contains 58,964 samples, with 10,507 samples in the testing set. The PartOfAHorizontalPortScan class has 58,901 training samples and 10,241 test samples, while DDoS has 51,026 training samples and 9016 test samples. The Okiru class consists of 12,675 training samples and 2260 test samples. The C&C-HeartBeat class has 6863 training samples and 1232 test samples, while C&C contains 5697 training samples and 990 test samples. The Attack class has 1825 training samples and 328 test samples, and the C&C-PartOfAHorizontalPortScan class has 227 training samples and 46 test samples. This arrangement ensures that every class, including the Normal class, is represented in both the training and testing sets, supporting a fair appraisal of the model’s capability across classes.

Table 7. Sample distribution of the IoT-23 dataset for multi-class classification including normal class.

3.2.5. Class Balancing

The IoT-23 dataset presents a substantial obstacle because of its unequal class proportions, which can hinder the success of learning algorithms. To remedy this, a careful approach to class balancing was adopted, focused on the training set. For binary classification, a hybrid ADASYN-SMOTE method was used to synthesize artificial samples for the minority class, thereby balancing the classes within the training set. In multi-class scenarios, ADASYN was used to create additional samples for underrepresented classes, enhancing their representation. In both cases, oversampling created artificial samples that improved the overall balance of class proportions in the training data. Additionally, to further address class imbalance, undersampling with ENN was applied within the training set to remove noisy and borderline instances from the majority class, thereby enhancing the overall quality of the data. Class weights were adjusted to reflect the importance of each class during model training, which helps to prevent bias towards the majority class. Despite these measures, the accuracy paradox, where models achieve high accuracy but struggle to effectively predict minority class instances, remains a concern [59]. To counteract this, an integrated approach combining oversampling techniques such as hybrid ADASYN-SMOTE or ADASYN with ENN for undersampling, along with the adjustment of class weights, was adopted [60,61]. This multifaceted strategy aimed to provide a robust solution to class imbalance, ultimately enhancing model performance and reliability in detecting and classifying minority class instances.
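The text does not state how the class weights were computed. One common choice, shown here purely as an assumption, is the inverse-frequency heuristic w_c = N / (K · n_c), where N is the number of training samples, K the number of classes, and n_c the size of class c. The function name `balanced_class_weights` is hypothetical; the counts are the binary training counts before resampling.

```python
from typing import Dict


def balanced_class_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Inverse-frequency weights: w_c = N / (K * n_c).

    Assumed weighting scheme for illustration; the weighting actually
    used in the study is not specified in this section.
    """
    total = sum(counts.values())
    k = len(counts)
    return {name: total / (k * n) for name, n in counts.items()}


# Binary training counts before resampling.
weights = balanced_class_weights({"Normal": 58_997, "Attack": 137_358})
```

Under this scheme the smaller Normal class receives a weight above 1 and the larger Attack class a weight below 1, so each class contributes equally to the weighted loss in aggregate.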

1. Hybrid ADASYN-SMOTE

The Hybrid ADASYN-SMOTE technique was utilized to create artificial samples for the underrepresented class, successfully mitigating the class imbalance present in the dataset. For binary classification on the IoT-23 dataset, Hybrid ADASYN-SMOTE was used to equalize the class distribution between the Normal and Attack instances. At the outset, the dataset contained 58,997 instances of the Normal category and 137,358 instances of the Attack category, demonstrating a considerable disparity in class representation. As outlined in Table 8, following the application of ADASYN, the sample count for the Normal class rose to 140,815, whereas the Attack class maintained a total of 137,358 samples. Then, SMOTE was applied to the already resampled data, which further balanced the class distribution by increasing the number of samples in the Attack class to 140,815. This adjustment ensured an even distribution across both classes, effectively mitigating the class imbalance. This balance allows the model to more effectively learn and classify instances from both classes, ultimately improving its performance in detecting attacks.

Table 8. Training class sample counts (pre- and post-hybrid ADASYN-SMOTE resampling) for IoT-23 binary classification.
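The second step of the hybrid procedure, in which SMOTE raises the Attack class from 137,358 to 140,815 samples, relies on interpolation between neighbouring samples of the same class. The sketch below is a simplified, dependency-free illustration of that interpolation; the function name `smote`, the choice k = 5, and the toy points are assumptions rather than the study’s configuration.

```python
import random
from math import dist
from typing import List, Sequence

Point = Sequence[float]


def smote(minority: Sequence[Point], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Generate `n_new` synthetic points by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    sample and one of its k nearest neighbours from the same class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Number of synthetic Attack samples implied by the reported counts.
n_needed = 140_815 - 137_358

# Toy demonstration on the corners of the unit square.
attack = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(attack, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing samples, it always falls inside the region already spanned by the class, which is why the toy points stay within the unit square.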

2. ADASYN

The breakdown of sample distribution across the different classes in the IoT-23 dataset, both prior to and following the application of ADASYN resampling for multi-class classification, excluding the normal class, is presented in Table 9. ADASYN was applied twice in a cascaded manner to balance the class distribution. Initially, certain classes, such as “PartOfAHorizontalPortScan”, “DDoS”, and “Okiru”, retained their original sample sizes of 54,762, 51,048, and 12,777, respectively. Through the first application, minority classes with fewer samples were resampled to improve balance. Then, in the second stage, ADASYN was applied again to the already resampled data, further balancing the class distribution. This resulted in “Attack” growing from 1814 to 54,762 samples and “C&C-PartOfAHorizontalPortScan” increasing from 225 to 54,761 samples. The cascaded application of ADASYN ensures a more balanced dataset, which is crucial for improving the performance of machine learning models, particularly when addressing class imbalance in multi-class categorization scenarios.

Table 9. Training class sample counts (pre- and post-ADASYN resampling) for IoT-23 multi-class classification (excluding Normal).

The sample distribution across the different classes in the IoT-23 dataset, including the normal class, before and after resampling with ADASYN for multi-class classification is shown in Table 10. ADASYN was applied in a two-stage process to address class imbalance. Initially, classes such as “benign”, “PartOfAHorizontalPortScan”, and “DDoS” retained their original sample sizes of 58,964, 58,901, and 51,026, respectively. During the first stage, minority classes with fewer instances were resampled to improve balance. In the second stage, ADASYN was applied once again to the already resampled data, further balancing the class distribution. As a result, “Attack” increased from 1825 to 58,962 samples, and “C&C-PartOfAHorizontalPortScan” grew from 227 to 58,924 samples. The cascaded application of ADASYN results in a more balanced dataset, which is crucial for enhancing the performance of machine learning models, especially when addressing class imbalance in multi-class categorization scenarios.

Table 10. Training class sample counts (pre- and post-ADASYN resampling) for IoT-23 multi-class classification (including Normal).
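ADASYN distributes synthetic samples unevenly: a minority sample whose k nearest neighbours mostly belong to other classes receives a larger share of the generation budget than one deep inside its own class. The sketch below illustrates only this allocation step in plain Python. The function name, the neighbour counts, k = 5, and the budget of 10 samples are illustrative assumptions rather than values from this study, and the cascaded two-pass procedure described above is not reproduced.

```python
def adasyn_allocation(other_class_neighbour_counts, k, n_to_generate):
    """Split a synthetic-sample budget across minority samples, ADASYN-style.

    other_class_neighbour_counts[i] is how many of the k nearest neighbours
    of minority sample i belong to a different class.
    """
    ratios = [count / k for count in other_class_neighbour_counts]
    total = sum(ratios)
    if total == 0:
        # No minority sample borders another class: nothing to emphasise.
        return [0] * len(ratios)
    # Normalise the ratios into a distribution, then scale by the budget.
    return [round(r / total * n_to_generate) for r in ratios]


# Hypothetical toy input: three minority samples, k = 5 neighbours each.
allocation = adasyn_allocation([0, 1, 4], k=5, n_to_generate=10)
print(allocation)
```

In this toy case the sample surrounded by four other-class neighbours receives most of the budget, which is the behaviour that distinguishes ADASYN from uniform oversampling.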

3.: ENN

In the binary classification task for the IoT-23 dataset, the ENN undersampling method was utilized to eliminate noisy and borderline instances from the majority class, specifically targeting the Attack class. As shown in Table 11, the Normal class retained its original 140,815 samples after resampling, while the Attack class was reduced from 140,815 to 115,850 samples. This reduction of 24,965 samples aimed to improve the dataset’s overall quality by removing potentially confusing data points. By focusing on the elimination of ambiguous instances within the majority class, ENN helps to better balance the dataset, enabling the model to more effectively distinguish between Normal and Attack classes, ultimately enhancing classification performance.

Table 11. Training class sample counts (pre- and post-ENN resampling) for IoT-23 binary classification.
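The ENN rule applied to a single target class can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a majority-vote rule over k = 3 Euclidean neighbours; the text does not state the neighbourhood size or the selection rule used in the study, and the function name and toy coordinates are hypothetical.

```python
import math
from collections import Counter


def edited_nearest_neighbours(points, labels, target_classes, k=3):
    """Return indices to keep after ENN restricted to `target_classes`.

    A sample from a target class is dropped when the most common label
    among its k nearest neighbours differs from its own label.
    """
    keep = []
    for i, (point, label) in enumerate(zip(points, labels)):
        if label not in target_classes:
            keep.append(i)  # classes outside the target set are never edited
            continue
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(point, points[j]),
        )[:k]
        votes = Counter(labels[j] for j in neighbours)
        majority_label = votes.most_common(1)[0][0]
        if majority_label == label:
            keep.append(i)
    return keep


# Toy data: index 4 is an Attack sample sitting inside the Normal cluster.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5.5, 5.5),
          (5, 5), (5, 6), (6, 5), (6, 6)]
labels = ["Attack"] * 5 + ["Normal"] * 4
kept = edited_nearest_neighbours(points, labels, target_classes={"Attack"})
print(kept)
```

Only the Attack sample embedded among Normal samples is removed, while the Normal class is left untouched, mirroring the pattern in Table 11 where only the Attack count changes.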

In the multi-class classification task for the IoT-23 dataset, excluding the normal class, the ENN undersampling method was employed to eliminate uncertain or noisy instances from the dominant classes, refining the dataset for better model accuracy. As detailed in Table 12, the PartOfAHorizontalPortScan class was reduced from 54,762 to 54,491 samples, while the DDoS class saw a decrease from 51,048 to 51,043 samples. The C&C-HeartBeat class experienced a slight reduction, going from 6965 to 6957 samples. Other classes, including Okiru, C&C, Attack, and C&C-PartOfAHorizontalPortScan, maintained almost identical sample counts after resampling. By applying ENN, the dataset was refined by eliminating potentially misleading data points while preserving the integrity of minority classes. This enhancement ensures that the model can more effectively distinguish between various attack types, thereby improving its classification performance.

Table 12. Training class sample counts (pre- and post-ENN resampling) for IoT-23 multi-class classification (excluding Normal).

The multi-class classification task involving the IoT-23 dataset, including the normal class, utilized the ENN undersampling approach to address class imbalance. This method focuses on refining the dataset by identifying and removing ambiguous or noisy samples, particularly those located near decision boundaries within the majority classes. As detailed in Table 13, the benign class experienced a reduction from 58,964 to 48,622 samples, and the PartOfAHorizontalPortScan class decreased from 58,901 to 42,206 samples. Additionally, the DDoS class showed a slight decrease from 51,026 to 51,020 samples, while the Okiru class retained its original count of 12,675 samples. Similarly, the C&C-HeartBeat class saw a small reduction from 6863 to 6861 samples. The C&C and Attack classes retained their original counts of 5697 and 58,962 samples, respectively, and the C&C-PartOfAHorizontalPortScan class had a minimal decrease from 58,924 to 58,923 samples. By applying ENN, the dataset was refined through the removal of potentially problematic instances, thereby improving the model’s effectiveness and its capacity to correctly identify different types of attacks.

Table 13. Training class sample counts (pre- and post-ENN resampling) for IoT-23 multi-class classification (including Normal).

4.: Class Weights

In machine learning, class weights are essential for managing datasets with imbalanced class distributions by assigning appropriate weights to each class based on its prevalence. This strategy helps the model give more attention to less frequent classes. For the IoT-23 dataset, class weights were applied in binary classification to address this imbalance. As illustrated in Table 14, the Normal class was allocated a value of 0.9114, while the Attack class was given a value of 1.1077. These values were strategically selected to equalize the model’s attention on both classes, guaranteeing efficient performance despite their disproportionate distribution. Implementing these class weights allowed the model to better handle the imbalance, thereby enhancing accuracy for both classes.

Table 14. Class weighting for each training class on IoT-23 binary classification.
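The text does not state how the weights were computed. As a hedged check, the common inverse-frequency ("balanced") heuristic, n_samples / (n_classes × n_samples_in_class), reproduces the reported binary weights when applied to the post-ENN training counts given above (140,815 Normal and 115,850 Attack). The function name below is hypothetical, and the use of this heuristic is an assumption that is consistent with the reported numbers rather than a confirmed detail of the study.

```python
def balanced_class_weights(class_counts):
    """Inverse-frequency weights: n_samples / (n_classes * n_samples_in_class)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }


# Post-ENN training counts reported for IoT-23 binary classification.
counts = {"Normal": 140_815, "Attack": 115_850}
weights = {
    label: round(weight, 4)
    for label, weight in balanced_class_weights(counts).items()
}
print(weights)
```

Rounded to four decimals, the result matches the values of 0.9114 and 1.1077 listed for the Normal and Attack classes.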

In the multi-class classification task for the IoT-23 dataset, excluding the normal class, class weights were carefully assigned to address the imbalance among various attack types. As detailed in Table 15, the weights were set according to each class’s representation within the dataset. The PartOfAHorizontalPortScan class was allocated a value of 0.6306, whereas the DDoS class was assigned a value of 0.6732. To address the underrepresented classes, Okiru, C&C-HeartBeat, and C&C were given higher weights of 2.6895, 4.9394, and 5.9701, respectively. The Attack class was assigned a weight of 0.6275, and C&C-PartOfAHorizontalPortScan was given a weight of 0.6275 as well. These class weights were intended to balance the influence of each class during the training process, improving the model’s ability to accurately classify and differentiate between attack types, despite the fluctuating occurrence rates within the dataset.

Table 15. The assigned weights for each training class using class weights for multi-class classification, excluding the normal class, on the IoT-23 dataset.

In the multi-class classification of the IoT-23 dataset, including the normal class, class weights were utilized to address imbalances among the various classes. As shown in Table 16, the weights were assigned based on the frequency and significance of each class. The benign class was given a value of 0.7326, whereas the PartOfAHorizontalPortScan class was allocated a value of 0.8440. The DDoS class was given a weight of 0.6982. To improve the representation of less frequent classes, Okiru was assigned a weight of 2.8103, C&C-HeartBeat received a weight of 5.1918, and C&C was given a weight of 6.2525. The Attack and C&C-PartOfAHorizontalPortScan classes received weights of 0.6041 and 0.6045, respectively. These class weights were carefully chosen to ensure balanced training, improving the model’s capability to precisely identify and differentiate between various attack categories and normal behaviors.

Table 16. The assigned weights for each training class using class weights for multi-class classification, including the normal class, on the IoT-23 dataset.

3.3. Architectures of Models

This investigation implemented a variety of distinct architectural designs, encompassing CNNs, Autoencoders, DNNs, and a hybrid CNN-MLP configuration. These architectures were selected due to their exceptional performance across a diverse spectrum of evaluation metrics, as documented in prior work [62,63,64].

3.3.1. Convolutional Neural Networks (CNN)

The CNN model is composed of an input layer, three hidden convolutional blocks, and an output layer designed for binary or multi-class classification. The input layer processes sequences with a specified number of features and channels. Each hidden layer is formed by a convolutional block consisting of a Conv1D layer for extracting patterns from the input data, followed by BatchNormalization to stabilize and normalize activations, a ReLU activation function to introduce non-linearity, a MaxPooling1D layer to reduce spatial dimensions and focus on important features, and a Dropout layer to reduce the risk of overfitting. These three convolutional blocks are stacked to capture features at varying resolutions, with increasing receptive fields, serving as the core hidden layers of the model. The output from the final convolutional block is flattened using a Flatten layer, converting the feature maps into a 1D vector. The final layer of the network is a dense layer, also known as a fully connected layer. This layer uses a sigmoid activation function when performing binary classification and a softmax activation function when performing multi-class classification. The model is compiled with an appropriate optimizer, a loss function suited to the classification type, and relevant evaluation metrics such as accuracy.

(i): Binary Classification

The binary classification CNN model is composed of a series of layers. The input shape is determined by the dataset: 119 features for IoT-23 and 18 features for NF-BoT-IoT-v2, as shown in Figure 3a,b, respectively. Hidden Block 1 comprises a 1D convolutional layer (256 filters) with ReLU activation, followed by max pooling (size 2) to reduce the spatial dimension and a Dropout layer with a dropout probability of 0.0000001 to combat overfitting. Hidden Block 2 replicates Block 1 but uses a max pooling size of four. Hidden Block 3 follows the same structure, except that its max pooling layer uses a pooling size of eight. The output of the third block is then flattened using a Flatten layer, and the Output block applies a Dense layer with a single neuron and a Sigmoid activation function, making the model suitable for binary classification.

Figure 3. CNN model layer configuration for binary classification tasks (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.
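Because the three blocks pool by 2, 4, and 8, the sequence length shrinks quickly, which matters for the narrow NF-BoT-IoT-v2 input. The following sketch traces the length after each pooling layer. It assumes that the convolutions preserve length (stride 1 with "same" padding) and that the pooling stride equals the pool size; neither the kernel size nor the padding mode is stated in the text, so both padding variants are shown and the function name is hypothetical.

```python
import math


def pooled_lengths(n_features, pool_sizes=(2, 4, 8), padding="valid"):
    """Sequence length after each max-pooling layer (stride = pool size).

    Assumes the convolutions themselves keep the length unchanged,
    which the text does not specify.
    """
    lengths = []
    length = n_features
    for pool in pool_sizes:
        if padding == "same":
            length = math.ceil(length / pool)  # partial last window is kept
        else:
            length = length // pool  # 'valid': partial last window is dropped
        lengths.append(length)
    return lengths


for n_features in (119, 18):  # IoT-23 and NF-BoT-IoT-v2 binary input widths
    print(
        n_features,
        pooled_lengths(n_features, padding="valid"),
        pooled_lengths(n_features, padding="same"),
    )
```

Under these assumptions, "valid" pooling would reduce the 18-feature input to length 0 at the third block, so some form of padding would be required for that configuration. With 256 filters, the flattened vector has 256 times the final length in either case.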

(ii): Multi-Class Classification excluding Normal Class

The CNN model, tailored for multi-class classification without the normal class, begins with the input block, which includes an input layer with a size of 119 for the IoT-23 dataset and 27 for the NF-BoT-IoT-v2 dataset, as shown in Figure 4a,b, respectively. In the first hidden block, a 1D convolutional layer (256 filters) precedes ReLU activation. Following the convolutional and activation layers, a 1D max pooling operation with a size of two is performed, and subsequently, a dropout layer with a rate of 0.0000001 is included to regulate overfitting. The second hidden block mirrors the first, employing a one-dimensional convolutional layer with 256 filters followed by a ReLU activation function, then max pooling (size 4) and dropout (same rate). Hidden block 3 replicates the design of its predecessors, except that the one-dimensional max pooling layer uses a pool size of 8. After the third hidden block, the output is flattened using a Flatten layer. For IoT-23, the output block uses a 7-unit Dense layer, and for NF-BoT-IoT-v2, it uses 4 units, as shown in Figure 4a,b, respectively, both with Softmax. The model is compiled using the Adam optimizer, categorical cross-entropy loss, and accuracy as the evaluation metric.

Figure 4. CNN model layer configuration for multi-class classification excluding normal class (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

(iii): Multi-Class Classification including Normal Class

The multi-class CNN (including normal) begins with an input layer (125 for IoT-23, 19 for NF-BoT-IoT-v2), as shown in Figure 5a,b, respectively. The first hidden block features a one-dimensional convolutional layer with 256 filters, followed by a ReLU activation function. Next, a max pooling layer (size 2) and a dropout layer (rate 0.0000001) are used to prevent overfitting. The second hidden block follows the same structure as the first, featuring a one-dimensional convolutional layer with 256 filters and a ReLU activation function, then max pooling (size 4) and dropout (same rate). Hidden block 3 retains the same configuration as the preceding ones, the sole difference being that the one-dimensional max pooling layer uses a pool size of eight. After passing through the third hidden block, the output is flattened using a Flatten layer. The output block consists of a Dense layer with 8 units for the IoT-23 dataset and 5 units for the NF-BoT-IoT-v2 dataset, as shown in Figure 5a,b, respectively, employing Softmax activation to generate class probabilities. The model is compiled using the Adam optimizer, categorical cross-entropy loss, and accuracy as the evaluation metric.

Figure 5. CNN model layer configuration for multi-class classification including normal class (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

(iv): Hyperparameters for the CNN Model

The CNN model parameters, as delineated in Table 17, are specified for both binary and multi-class classification. A batch size of 128 is employed for both types of classifiers. The learning rate follows a scheduled approach using ReduceLROnPlateau with a starting value of 0.001, a factor of 0.5, and a minimum value of 1 × 10⁻⁵ for both classifiers. The Adam optimizer is employed for both tasks. The binary classifier uses binary cross-entropy as the loss function, while the multi-class classifier uses categorical cross-entropy. Accuracy is the evaluation metric for both types of classifiers.

Table 17. Hyperparameter Specification for the CNN Model.
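The effect of the plateau-based schedule can be illustrated with a simplified plain-Python simulation. It mimics the core rule of the Keras ReduceLROnPlateau callback (multiply the learning rate by the factor once the monitored loss has failed to improve for `patience` epochs, never going below the minimum) but omits options such as cooldown and min_delta. The patience of 2 and the validation losses are hypothetical, since the text reports only the starting value, factor, and minimum.

```python
def simulate_reduce_lr_on_plateau(val_losses, lr=1e-3, factor=0.5,
                                  min_lr=1e-5, patience=2):
    """Return the learning rate in effect after each epoch."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # halve, but respect the floor
                wait = 0
        history.append(lr)
    return history


# Hypothetical validation losses that stop improving after the second epoch.
history = simulate_reduce_lr_on_plateau([1.0, 0.9, 0.9, 0.9, 0.9, 0.9])
print(history)
```

Starting from 0.001 with a factor of 0.5, six reductions give about 1.56 × 10⁻⁵, and a seventh would fall below the floor, so the rate is clamped at 1 × 10⁻⁵ from that point on.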

3.3.2. Autoencoder (AE)

The model is first trained as an Autoencoder, where the encoder learns a compact representation of the input, and the decoder reconstructs the original feature space. The encoder consists of multiple dense layers with ReLU activation, progressively reducing the feature dimensionality to capture essential representations, while the decoder symmetrically reconstructs the input through corresponding dense layers. Once trained, the encoder is extracted and used to transform input data into a lower-dimensional representation for classification, while the decoder is discarded. The extracted encoder is then used to construct a separate classifier by adding a classification layer, which employs a sigmoid activation function for binary classification and a softmax activation function for multi-class classification [65,66,67,68,69].

(i): Binary Classification

The model is designed for binary classification using an Autoencoder architecture. It begins with an input layer that accepts feature vectors with dimensions matching the dataset, such as 119 for IoT-23 and 18 for NF-BoT-IoT-v2, as depicted in Figure 6a,b, respectively. The Autoencoder consists of an encoder and a decoder, where the encoder is composed of three fully connected layers with 128, 64, and 32 neurons, respectively, each utilizing the ReLU activation function to progressively reduce the feature dimensionality while capturing essential representations. The decoder symmetrically reconstructs the input using corresponding dense layers. Once training is complete, the encoder is extracted to transform input data into a lower-dimensional representation, while the decoder is discarded. A classification layer is then added to the extracted encoder, employing a sigmoid activation function for binary classification, which outputs a probability score within the unit interval.

Figure 6. Binary classification Autoencoder layer configuration (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.
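To give a sense of scale, the trainable parameters of the extracted encoder plus the sigmoid head can be counted directly from the layer widths given above (128, 64, and 32 neurons, then one output unit). The sketch assumes plain fully connected layers with biases and no additional layers in between, which the text does not confirm. The decoder is left out because its exact layer widths are not listed, and the function names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return (n_in + 1) * n_out


def encoder_classifier_params(n_features, hidden=(128, 64, 32), n_outputs=1):
    """Total parameters of the encoder stack followed by the output layer."""
    sizes = (n_features, *hidden, n_outputs)
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))


iot23_params = encoder_classifier_params(119)   # IoT-23 binary input width
nfbot_params = encoder_classifier_params(18)    # NF-BoT-IoT-v2 binary input width
print(iot23_params, nfbot_params)
```

Most of the difference between the two datasets comes from the first layer, whose size scales with the number of input features.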

(ii): Multi-Class Classification excluding Normal Class

The model is designed for multi-class classification, excluding the normal class, using an Autoencoder architecture. It begins with an input layer that accepts feature vectors with dimensions corresponding to the dataset (119 for IoT-23 and 27 for NF-BoT-IoT-v2), as depicted in Figure 7a,b, respectively. The Autoencoder consists of an encoder and a decoder, where the encoder comprises three sequential dense layers with 128, 64, and 32 neurons, respectively. Each layer applies the ReLU activation function to extract essential features and progressively reduce the input dimensionality. The decoder symmetrically reconstructs the input through corresponding dense layers. Once training is complete, the encoder is extracted and used to transform input data into a lower-dimensional representation, while the decoder is discarded. A classification layer is then appended to the extracted encoder, utilizing a Softmax activation function with an output size that aligns with the dataset requirements, 7 neurons for IoT-23 and 4 neurons for NF-BoT-IoT-v2, as depicted in Figure 7a,b, respectively. This configuration ensures that the model produces a probability distribution over the possible classes, with each output neuron representing the likelihood of a specific class.

Figure 7. Multi-class Autoencoder layer configuration (excluding normal) (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

(iii): Multi-Class Classification including Normal Class

The Autoencoder architecture for multi-class classification, inclusive of the normal class, begins with an input layer that accepts feature vectors with dimensions of 125 for the IoT-23 dataset and 19 for the NF-BoT-IoT-v2 dataset, as depicted in Figure 8a,b, respectively. The encoder consists of three sequential dense layers designed to progressively reduce the input dimensionality while retaining essential features. The first layer contains 128 neurons, followed by 64 neurons in the second and 32 neurons in the third, with each layer employing the ReLU activation function to facilitate non-linear feature extraction. The decoder symmetrically reconstructs the input using corresponding dense layers. Once the Autoencoder is trained, the encoder is extracted to transform the input data into a compact representation, while the decoder is discarded. A classification layer is then added to the extracted encoder, utilizing a Softmax activation function. The number of output neurons corresponds to the total number of classes, including the normal class, resulting in 8 output neurons for the IoT-23 dataset and 5 for the NF-BoT-IoT-v2 dataset, as depicted in Figure 8a,b, respectively. This setup ensures the model produces a probability distribution over all possible classes, with each output neuron representing the likelihood of a specific class.

Figure 8. Multi-class Autoencoder layer configuration (including normal) (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

(iv): Autoencoder Model Hyperparameter Specifications

The configuration parameters for both the binary and multi-class classifiers are largely analogous, as outlined in Table 18. The batch size is set to 128, meaning 128 samples are processed before the model updates its weights. The learning rate is scheduled to start at 0.001, with a reduction factor of 0.5 when the validation loss plateaus, and a minimum learning rate of 1 × 10⁻⁵, managed by the ReduceLROnPlateau callback. The Adam optimizer is the chosen optimization method for both classifiers. This optimizer dynamically adjusts the learning rate associated with each parameter to improve the overall performance of the models. The binary cross-entropy loss function is used for the binary classifier, while the categorical cross-entropy loss function is used for the multi-class classifier. In both cases, accuracy is the evaluation metric, measuring the proportion of correct predictions.

Table 18. Hyperparameter Specification for the Autoencoder Model.

3.3.3. Deep Neural Network (DNN)

The DNN architecture is designed to handle both binary and multi-class classification. The model starts with an input block containing a fully connected (Dense) layer that uses the ReLU activation function. The size of the input to this layer is determined by the number of features in the dataset. This layer introduces non-linearity, enabling the model to capture complex relationships in the data. The hidden layers are organized into two blocks. The first hidden block contains a Dropout layer to prevent overfitting, followed by a Dense layer with ReLU activation and a BatchNormalization layer to stabilize and speed up training. The second hidden block follows a similar configuration, adding a further Dropout layer alongside BatchNormalization to improve the model’s stability and performance. The output block is specific to the classification task. For binary classification, a Dense layer with a single unit applies the sigmoid activation function to produce a probability score. For multi-class classification, a Dense layer with one neuron per target class is followed by a Softmax activation function, which outputs a probability distribution across all classes.

(i): Binary Classification

The DNN model is configured for binary classification. It begins with an input block matched to each dataset’s feature set, using 119 neurons for the IoT-23 dataset and 18 neurons for the NF-BoT-IoT-v2 dataset, as presented in Figure 9a,b, respectively. This block contains a Dense layer (1024 neurons) with ReLU activation, allowing the network to capture complex, non-linear relationships in the input data. The first hidden block comprises a Dropout layer (rate 0.0000001) to mitigate overfitting by randomly disabling neurons during training, followed by a Dense layer (1024 neurons) with ReLU activation for deeper feature extraction, and a BatchNormalization layer to stabilize and accelerate training by normalizing activations. The second hidden block follows a similar structure, with an additional Dropout layer (rate 0.0000001) for regularization and a BatchNormalization layer to ensure consistent training behavior. The output block consists of a Dense layer with a single neuron and sigmoid activation, generating a probability suitable for binary classification.

Figure 9. Binary classification DNN layer configuration (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

(ii): Multi-Class Classification excluding Normal Class

The DNN model is structured for multi-class classification tasks, excluding the normal class. It begins with an input block designed for each dataset’s feature set, utilizing 119 neurons for the IoT-23 dataset and 27 neurons for the NF-BoT-IoT-v2 dataset, as presented in Figure 10a,b, respectively. This block includes a Dense layer (1024 neurons) with ReLU activation, enabling the model to capture complex, non-linear patterns in the input data. The first hidden block comprises a Dropout layer (rate 0.0000001) for overfitting mitigation, and a Dense layer (768 neurons, ReLU activation) for deeper feature representation. A BatchNormalization layer is included to stabilize and accelerate training by normalizing activations. The second hidden block follows a similar structure, with another Dropout layer (rate 0.0000001) and BatchNormalization layer to ensure regularization and consistent training behavior. The output block consists of a Dense layer with 7 neurons and Softmax activation for the IoT-23 dataset, and a Dense layer with 4 neurons and Softmax activation for the NF-BoT-IoT-v2 dataset, as presented in Figure 10a,b, respectively, making it suitable for multi-class classification by generating class probabilities.

Figure 10. Multi-class DNN layer configuration (excluding normal) (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

(iii): Multi-Class Classification including Normal Class

The DNN model is structured for multi-class classification tasks, including the normal class. It begins with an input block tailored to each dataset’s feature set, using 125 neurons for the IoT-23 dataset and 19 neurons for the NF-BoT-IoT-v2 dataset, as presented in Figure 11a,b, respectively. This block includes a Dense layer (1024 neurons) with ReLU activation, allowing the model to capture complex, non-linear patterns in the input data. The first hidden block comprises a Dropout layer (rate 0.0000001) for overfitting mitigation, and a Dense layer (768 neurons, ReLU activation) for deeper feature representation. A BatchNormalization layer is included to stabilize and accelerate training by normalizing activations. The second hidden block follows a similar structure, with another Dropout layer (rate 0.0000001) and BatchNormalization layer to ensure regularization and consistent training behavior. The output block consists of a Dense layer with 8 neurons and Softmax activation for the IoT-23 dataset, and a Dense layer with 5 neurons and Softmax activation for the NF-BoT-IoT-v2 dataset, as presented in Figure 11a,b, respectively, making it suitable for multi-class classification by generating class probabilities.

Figure 11. Multi-class DNN layer configuration (including normal) (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

(iv): DNN Model Hyperparameter Specifications

The hyperparameters in Table 19 are carefully chosen to optimize the training and performance of both the binary and multi-class classifiers. The batch size is set to 128, meaning that 128 samples are processed in each training step to balance computational efficiency with effective learning. The learning rate follows a scheduled decay approach using Exponential Decay, starting at an initial value of 0.0003, with a decay factor of 0.9 and decay steps of 10,000. This setup allows the learning rate to decrease gradually over time, ensuring smoother convergence. Both models use the Adam optimizer, known for its efficiency in adjusting the learning rate during training. The primary distinction lies in the loss function: the Binary classifier uses binary_crossentropy, suitable for binary classification tasks, while the Multi-Class classifier utilizes categorical_crossentropy to manage multi-class classification problems. Both models are evaluated based on accuracy, which measures the proportion of correct predictions made by the models.

Table 19. Hyperparameter Specification for the DNN Model.
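With the reported settings, the exponential decay schedule follows lr(step) = 0.0003 × 0.9^(step / 10,000). The short sketch below evaluates this formula in plain Python. The `staircase` flag, which holds the rate constant within each 10,000-step interval, is included for illustration only, as the text does not say whether it was used.

```python
import math


def exponential_decay(step, initial_lr=3e-4, decay_rate=0.9,
                      decay_steps=10_000, staircase=False):
    """Learning rate after `step` optimizer updates."""
    exponent = step / decay_steps
    if staircase:
        exponent = math.floor(exponent)  # decay in discrete jumps
    return initial_lr * decay_rate ** exponent


for step in (0, 10_000, 20_000, 50_000):
    print(step, exponential_decay(step))
```

The rate therefore drops by 10% every 10,000 updates, for example from 0.0003 to 0.00027 after the first 10,000 steps and to 0.000243 after 20,000.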

3.3.4. Convolutional Neural Network-Multilayer Perceptron (CNN-MLP)

The proposed model combines the strengths of CNN and MLP, providing a robust framework for both binary and multi-class classification. The model begins with an input layer designed to process ordered sequences represented as one-dimensional arrays. The first CNN block performs a convolution, followed by batch normalization, a ReLU activation function, max pooling, and dropout. Together, these operations extract essential features and mitigate the risk of overfitting. The same pattern is repeated in the additional CNN blocks, each with its own kernel size, enabling the model to capture a range of patterns and relationships in the input data. The output of the CNN blocks is then flattened and passed to an MLP section. This section is composed of a dense layer with L2 regularization, batch normalization, an activation function, and dropout, which together improve the model’s capacity to capture intricate features while lowering the likelihood of overfitting. The features extracted by the convolutional and multilayer perceptron components are combined to create a unified feature set. The final prediction layer uses a Sigmoid activation function for binary classification or a Softmax activation function for multi-class classification.
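The difference between the two output activations can be shown with a small plain-Python sketch: the sigmoid maps a single logit to one probability, whereas the softmax maps one logit per class to a distribution that sums to one. The logits below are hypothetical values, not outputs of the model.

```python
import math


def sigmoid(logit):
    """Probability of the positive class for a single-unit output layer."""
    return 1.0 / (1.0 + math.exp(-logit))


def softmax(logits):
    """Probability distribution over classes for a multi-unit output layer."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


p_attack = sigmoid(0.0)  # a logit of 0 corresponds to a probability of 0.5
probs = softmax([2.0, 1.0, 0.1, 0.0, -1.0, -1.0, -2.0])  # 7 hypothetical logits
predicted = max(range(len(probs)), key=probs.__getitem__)
print(p_attack, predicted, sum(probs))
```

In the binary case a threshold on the single probability yields the class decision, while in the multi-class case the predicted class is the index with the highest probability.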

(i): Binary Classification

The CNN-MLP model for binary classification is applied to two datasets, IoT-23 and NF-BoT-IoT-v2. For IoT-23, the input block processes 119 features, while for NF-BoT-IoT-v2, the input block handles 18 features, as illustrated in Figure 12a,b, respectively. The architecture comprises two hidden blocks, each containing a one-dimensional CNN layer with 256 filters and ReLU activation. These are followed by max pooling operations with pool sizes of 2 and 4, respectively, along with dropout layers set to a minimal rate of 0.0000001 to limit overfitting. The model further processes the extracted features through a dense layer with 1024 neurons and ReLU activation, along with an additional dropout layer (rate 0.0000001). The final output layer consists of a single neuron with a sigmoid activation function, generating a probability suited to binary classification. This design combines convolutional and fully connected layers to extract meaningful features and perform classification.

Figure 12. Binary classification CNN-MLP layer configuration (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

(ii): Multi-Class Classification excluding Normal Class

The CNN-MLP model for multi-class classification, excluding the normal class, is structured into input, hidden, and output blocks for both the IoT-23 and NF-BoT-IoT-v2 datasets. In the input block, the model accepts data with 119 features for IoT-23 and 27 features for NF-BoT-IoT-v2, as illustrated in Figure 13a,b, respectively. The hidden blocks consist of multiple (three) 1D CNN layers with 256 filters and ReLU activation functions, followed by max-pooling and dropout layers (rate 0.0000001) to prevent overfitting. The model also includes two dense layers with 1024 and 768 units, where the first dense layer uses ReLU activation, the second applies Softmax activation to produce class probabilities, and each dense layer is followed by a dropout layer (rate 0.0000001) to further mitigate overfitting. The output block uses a Softmax layer, with 7 output classes for IoT-23 and 4 output classes for NF-BoT-IoT-v2, as illustrated in Figure 13a,b, respectively, ensuring multi-class classification results.

Figure 13. Multi-class CNN-MLP layer configuration (excluding normal) (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

(iii): Multi-Class Classification including Normal Class

The CNN-MLP model for multi-class classification, including the normal class, is structured into input, hidden, and output blocks for both the IoT-23 and NF-BoT-IoT-v2 datasets. In the input block, the model accepts data with 125 features for IoT-23 and 19 features for NF-BoT-IoT-v2, as illustrated in Figure 14a,b, respectively. The hidden blocks consist of several (three) 1D CNN layers with 256 filters and ReLU activation functions, followed by max-pooling and dropout layers (rate 0.0000001) to reduce overfitting. Dense layers with 1024 and 768 units are employed, with the first dense layer using ReLU activation and the second applying Softmax activation to produce class probabilities. Each dense layer is followed by a dropout layer (rate 0.0000001) to further prevent overfitting. The output block utilizes a Softmax layer, with 8 output classes for IoT-23 and 5 output classes for NF-BoT-IoT-v2, as illustrated in Figure 14a,b, respectively, ensuring multi-class classification with the inclusion of the normal class.

Figure 14. Multi-class CNN-MLP layer configuration (including normal) (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

(iv): CNN-MLP Model Hyperparameter Specifications

The hyperparameter configurations for the CNN-MLP model, detailed in Table 20, are calibrated for both binary and multi-class classification. In both scenarios, the batch size is set to 128, meaning that the model processes 128 samples in each training step. The learning rate is scheduled, with a starting value of 0.001, a reduction factor of 0.5 when learning plateaus, and a minimum learning rate of 1 × 10⁻⁵, controlled by the ReduceLROnPlateau mechanism, which adjusts the learning rate dynamically based on the performance of the model. The Adam optimizer is used for both classifiers, as it is widely effective for training deep learning models. The binary classification model adopts the binary cross-entropy loss function, suited to problems involving two classes, whereas the multi-class classification model uses categorical cross-entropy, a loss function for tasks with more than two categories. The performance of both models is assessed using accuracy as the key metric.

Table 20. Hyperparameter Specification for the CNN-MLP Model.
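
As a rough illustration of the learning-rate schedule described above, the sketch below simulates a reduce-on-plateau rule in plain Python using the reported values (initial rate 0.001, factor 0.5, floor 1 × 10⁻⁵). The patience value, the function name `plateau_schedule`, and the validation-loss sequence are assumptions made for illustration; the text does not report them, and this simplified rule omits details of the actual Keras callback such as cooldown and minimum-delta thresholds.

```python
def plateau_schedule(val_losses, initial_lr=1e-3, factor=0.5, min_lr=1e-5, patience=2):
    """Return the learning rate in effect for each epoch under a reduce-on-plateau rule.

    patience is a hypothetical value; it is not reported in the text.
    """
    lr = initial_lr
    best = float("inf")
    wait = 0
    rates = []
    for loss in val_losses:
        rates.append(lr)              # rate used during this epoch
        if loss < best:               # improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:                         # no improvement: count towards the patience limit
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)   # halve, but never go below the floor
                wait = 0
    return rates


# Hypothetical validation losses: improvement stalls after the third epoch.
losses = [0.50, 0.40, 0.35, 0.35, 0.36, 0.35, 0.37, 0.30]
rates = plateau_schedule(losses)
for epoch, (loss, lr) in enumerate(zip(losses, rates), start=1):
    print(f"epoch {epoch}: val_loss={loss:.2f}, lr={lr:.6f}")
```

With a long enough plateau, repeated halving would take the rate from 0.001 down to the 1 × 10⁻⁵ floor, after which it stays constant.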

4. Results and Experiments

In this section, we conduct a thorough evaluation of the proposed models using advanced data resampling and class-weighting techniques. This approach is designed to address class imbalance and improve model robustness. The experimental results reveal that our models not only achieve superior anomaly detection capabilities but also consistently outperform state-of-the-art approaches. These findings underscore the effectiveness of our methodology in enhancing the accuracy and reliability of intrusion detection systems.

4.1. Description of Dataset and Preprocessing Overview

This work leverages the IoT-23 and NF-BoT-IoT-v2 datasets, which are widely regarded as comprehensive benchmarks for IDS. These datasets encompass a broad spectrum of network activities and attack types, serving as a robust basis for designing and evaluating anomaly detection models. Despite their strengths, they are accompanied by certain challenges, such as incomplete records, duplicated entries, the presence of outliers, and significant class imbalances. Addressing these challenges requires meticulous preprocessing. This section delves into the characteristics of these datasets, their applicability to binary and multi-class classification, and their critical role in advancing IDS research. It also outlines the preprocessing strategies implemented to enhance data quality. These strategies include imputing missing values, removing redundant entries, managing anomalous data points, and correcting imbalanced class distributions, all of which are essential for preparing the datasets for reliable and effective model performance evaluation.

4.1.1. IoT-23 Dataset

The IoT-23 dataset, as described in Section 3.1, spans a wide array of network activities, incorporating both legitimate and harmful traffic from diverse attack types, thereby offering significant value for the advancement of IDS. Nevertheless, the dataset poses multiple challenges, including the presence of missing data, redundant records, and an imbalance in class distribution. The aforementioned issues were mitigated by the application of the preprocessing methods elaborated in Section 3.2. The data preprocessing involved addressing gaps in the dataset, eliminating redundant entries, and employing techniques like z-score for identifying and managing outliers. Additionally, numerical features were normalized using the MinMaxScaler. To tackle class imbalance, advanced resampling methods like enhanced ADASYN-SMOTE for oversampling, advanced ADASYN for multi-class scenarios, and ENN for undersampling were employed, complemented by the use of dynamic class weights during model training. These preprocessing strategies collectively improved the dataset’s ability to address classification challenges for both types of classification (binary and multi-class).

4.1.2. NF-BoT-IoT-v2 Dataset

An IoT NetFlow-based dataset is created by augmenting the NF-BoT-IoT dataset. Features were extracted from publicly available pcap files, and the flows were labeled according to their respective attack categories. The dataset contains a total of 37,763,497 data flows, with 37,628,460 (99.64%) classified as attack samples and 135,037 (0.36%) as benign [70]. The dataset includes five classes, including benign and four distinct attack categories. Data preprocessing is a fundamental process within data examination and machine learning workflows, involving the meticulous transformation of raw, unrefined data into a coherent, structured format that is primed for advanced analysis and the development of effective predictive models. This procedure involves several steps, such as managing missing values, removing duplicate entries, discarding irrelevant information or outliers, identifying key features, and performing normalization or standardization on numerical variables. Furthermore, methods for adjusting class distribution are employed to address imbalances within the classes. Effective preprocessing not only improves the overall quality of the data but also reduces noise, enhancing the efficiency of machine learning models in learning from the processed data. The preprocessing procedures can vary according to the dataset, and for the NF-BoT-IoT-v2 dataset, the initial step involves handling any missing or NaN values, which are promptly eliminated. Duplicates are then removed, followed by outlier detection and elimination using combined Z-Score and LOF methods. A correlation-based feature selection method is employed to reduce dimensionality, and numerical attributes are normalized with MinMaxScaler to ensure uniform data representation. 
Principal component analysis (PCA) is applied to reduce dimensionality in case of binary classification and one-phase multi-class classification (including the normal class), to retain the most significant features and minimize data redundancy. After performing these steps, the dataset is split into training and testing subsets, followed by recombination, at which point the ADASYN method is applied to the combined data to generate synthetic samples. These synthetic samples are incorporated into the dataset, after which the training and testing sets are split again, keeping the test set intact. This augmented training set improves the model’s ability to learn more effectively, addressing class imbalances and improving the model’s accuracy and overall performance. To further balance the dataset, the ENN method is used for undersampling, and class weights are adjusted during the model training phase. These thorough preprocessing techniques guarantee that the dataset is optimally formatted for machine learning, fostering effective model development and improving its capacity to precisely identify and categorize network anomalies in IoT ecosystems.
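
To make the class-weighting step more concrete, the following sketch computes inverse-frequency weights from the benign and attack flow counts reported above. The "balanced" heuristic (total samples divided by the number of classes times the class count) and the function name `balanced_class_weights` are assumptions; the text does not specify how the class weights are derived, and in practice the weights would be computed on the resampled training split rather than on the full dataset.

```python
def balanced_class_weights(class_counts):
    """Inverse-frequency weights: n_samples / (n_classes * count_c).

    A common heuristic, assumed here for illustration only.
    """
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * count) for label, count in class_counts.items()}


# Flow counts reported for NF-BoT-IoT-v2 (binary view of the dataset).
counts = {"benign": 135_037, "attack": 37_628_460}
weights = balanced_class_weights(counts)

total = sum(counts.values())
for label, weight in weights.items():
    share = counts[label] / total
    print(f"{label}: share={share:.2%}, weight={weight:.4f}")
```

Under this heuristic the rare benign class receives a far larger weight than the dominant attack class, so that each class contributes equally to the weighted loss.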

4.2. Experiment’s Establishment

The models were developed using TensorFlow 2.17.0 and Keras 3.4.1, utilizing the computational resources provided by the Kaggle platform 1.6.17. The tests were conducted on a system featuring an Nvidia GeForce RTX 1050 graphics card running on the Windows 10 operating system. To maintain the integrity of the evaluation phase, all data resampling techniques were applied solely to the training dataset, ensuring the test dataset remained untouched. Model training was carried out to optimize performance while preventing overfitting.

4.3. Setting up the Experiment

The confusion matrix serves as a crucial mechanism in the evaluation of machine learning models. It organizes data into a comprehensive grid that juxtaposes the true class labels with the model’s predicted outcomes, offering a detailed comparison as described in reference [71]. This matrix serves as a foundational element in the calculation of numerous performance metrics, offering a clear and organized framework for evaluating the accuracy and effectiveness of the model’s predictions. By presenting this comparative information, the confusion matrix facilitates the straightforward computation of key evaluation metrics, enabling a more nuanced understanding of model performance.

True Positive (TP): Instances where the model accurately predicts the positive class, correctly identifying the presence of the target condition or event.
False Negative (FN): Cases where the model incorrectly predicts a positive instance as negative, failing to identify the presence of the target condition or event.
True Negative (TN): Instances where the model accurately predicts the negative class, correctly recognizing the absence of the target condition or event.
False Positive (FP): Cases where the model erroneously classifies a negative instance as positive, mistakenly identifying the absence of the target condition or event as its presence.

Equation (2) [72] delineates accuracy, a fundamental and straightforward metric derived from the confusion matrix. Accuracy quantifies the fraction of instances that are correctly identified by the model, including both correctly identified positive cases and correctly identified negative cases, in relation to the overall total of instances. This metric provides a transparent evaluation of the model’s overall effectiveness in accurately detecting both positive and negative instances.

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(2)

In the case of imbalanced datasets, relying solely on accuracy may fail to offer a thorough assessment of the model’s effectiveness. As a result, evaluating the model requires incorporating supplementary metrics like recall, precision, and the F1-score for a more comprehensive analysis. Precision, defined as the ratio of correctly identified positive instances to the total number of predicted positives (including incorrect ones), is also known as the positive predictive value and is computed according to Equation (3) [72]. Recall, as defined in Equation (4) [72,73], quantifies the fraction of genuinely positive instances that the model accurately detects from the total number of actual positives. The F-score, calculated using Equation (5) [74,75,76], represents the harmonic mean of precision and recall, offering a balanced measure of the model’s performance across these two dimensions.

\mathrm{Precision} = \frac{TP}{TP + FP}

(3)

\mathrm{Recall} = \frac{TP}{TP + FN}

(4)

\mathrm{F\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

(5)

In this context, the main objective is to improve key performance measures, such as the F-score, accuracy, recall, and precision, as specified by the assessment standards. This involves optimizing these metrics to achieve a well-rounded assessment of the model’s effectiveness, ensuring it performs robustly across all key aspects of classification quality.
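
The sketch below implements Equations (2) to (5) directly and applies them to a deliberately imbalanced, hypothetical set of counts to show why accuracy alone can mislead. The function name `binary_metrics` and the counts are illustrative assumptions, not values from the experiments; returning 0 when a denominator is zero is likewise a convention chosen here.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F-score as in Equations (2) to (5)."""
    def ratio(num, den):
        return num / den if den else 0.0   # convention: 0 when the ratio is undefined

    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f_score = ratio(2 * precision * recall, precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


# Hypothetical imbalanced test set: 990 negatives and 10 positives.
# The model finds only 2 of the 10 positives, yet accuracy remains high.
m = binary_metrics(tp=2, tn=988, fp=2, fn=8)
print({name: round(value, 4) for name, value in m.items()})
```

In this example accuracy is 0.99 while recall is only 0.2, which is the situation the supplementary metrics are meant to expose.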

4.4. Results

The analysis of the IoT-23 dataset, followed by an additional evaluation on the NF-BoT-IoT-v2 dataset, validates the models’ effectiveness and generalizability, demonstrating consistent performance across diverse network environments. This illustrates the resilience of the designed architectures in tackling classification challenges across both binary and multi-class scenarios. The CNN-MLP model not only detected various types of attacks but also distinguished between them accurately, matching or exceeding the DNN, CNN, and Autoencoder models on every evaluation measure. These results underscore the impact of the preprocessing methods used and demonstrate the effectiveness of the CNN-MLP model in tackling intricate classification challenges. Training and testing were conducted using carefully prepared subsets of the IoT-23 and NF-BoT-IoT-v2 datasets, with a comprehensive evaluation carried out to examine the effect of data resampling methods and class weights on model performance. Additionally, the performance of the proposed models was compared with established intrusion detection frameworks from prior research, offering insight into their comparative advantages and constraints.

(i): Binary Classification

The evaluation of binary classification performance using data resampling techniques and class weights on the IoT-23 and NF-BoT-IoT-v2 datasets revealed varying results across different models, as detailed in Table 21. For the IoT-23 dataset, the CNN model achieved 99.92% across all metrics, while the DNN model recorded an accuracy, recall, and F1-score of 99.24% and a precision of 99.25%. The Autoencoder revealed considerably diminished performance, recording an accuracy and recall of 84.02%, precision at 89.34%, and an F1-score of 84.64%. For the NF-BoT-IoT-v2 dataset, both the DNN and Autoencoder models achieved uniform metrics of 99.95%, while the CNN model exhibited a slight reduction to 99.94% across all metrics. Notably, the hybrid CNN-MLP model consistently outperformed all other approaches, achieving the highest metrics of 99.94% on the IoT-23 dataset and 99.96% on the NF-BoT-IoT-v2 dataset. These results underscore the effectiveness of the CNN-MLP model in leveraging data resampling techniques and class weights to deliver superior classification performance across a variety of datasets.

Table 21. Evaluation metrics for binary classification with the application of data resampling techniques and class weights.

(ii): Multi-Class Classification excluding Normal Class

The evaluation of multi-class classification performance, excluding the normal class, with data resampling techniques and class weights on the IoT-23 and NF-BoT-IoT-v2 datasets yielded different results across the models, as detailed in Table 22. For the IoT-23 dataset, the CNN model attained an accuracy of 99.97%, a precision of 99.98%, a recall of 99.97%, and an F1-score of 99.97%. The Autoencoder exhibited slightly lower effectiveness, achieving an accuracy of 99.57%, a precision of 99.60%, a recall of 99.57%, and an F1-score of 99.57%. The DNN model obtained 99.94% across accuracy, precision, recall, and F1-score. The combined CNN-MLP framework reached 99.99% for accuracy, precision, recall, and F1-score. For the NF-BoT-IoT-v2 dataset, the CNN model registered an accuracy of 97.99%, a precision of 98.04%, a recall of 97.99%, and an F1-score of 97.99%. The Autoencoder achieved an accuracy of 98.01%, a precision of 98.06%, a recall of 98.01%, and an F1-score of 98.00%. The DNN model recorded 97.92% for accuracy, recall, and F1-score, with a precision of 97.99%. The CNN-MLP model achieved the highest performance, with an accuracy of 98.02%, precision of 98.07%, recall of 98.02%, and F1-score of 98.02%. These results show that the CNN-MLP model consistently outperformed all other models across the two datasets.

Table 22. Evaluation metrics for multi-class classification excluding normal class with the application of data resampling techniques and class weights.

(iii): Multi-Class Classification including Normal Class

The evaluation of multi-class classification performance, including the normal class, using data resampling techniques and class weights on the IoT-23 and NF-BoT-IoT-v2 datasets revealed notable differences across the models, as detailed in Table 23. For the IoT-23 dataset, the CNN model recorded an accuracy of 99.81%, a precision of 99.84%, a recall of 99.81%, and an F1-score of 99.82%. The Autoencoder demonstrated notably poorer performance, achieving an accuracy of 87.12%, a precision of 87.70%, a recall of 87.12%, and an F1-score of 87.16%. The DNN model exhibited exceptional performance, with an accuracy of 99.67%, precision of 99.73%, recall of 99.67%, and an F1-score of 99.69%. The hybrid CNN-MLP model surpassed all other models, achieving an accuracy of 99.91%, precision of 99.92%, recall of 99.91%, and an F1-score of 99.91%. For the NF-BoT-IoT-v2 dataset, the CNN model attained an accuracy of 98.04%, precision of 98.04%, recall of 98.04%, and an F1-score of 98.04%. The Autoencoder achieved an accuracy of 98.08%, a precision of 98.12%, a recall of 98.08%, and an F1-score of 98.07%. The DNN model attained 97.89% accuracy, 97.92% precision, 97.89% recall, and 97.88% F1-score. The CNN-MLP model again outperformed all models with an accuracy of 98.11%, precision of 98.17%, recall of 98.11%, and F1-score of 98.11%. These results demonstrate the superior performance of the CNN-MLP model across the two datasets.

Table 23. Evaluation metrics for multi-class classification including normal class with the application of data resampling techniques and class weights.

4.5. Time, Memory, and Energy for CNN-MLP Model

In the context of deep learning models, the computational resources required for training and inference are critical factors that influence both model performance and deployment feasibility, particularly in IoT devices such as edge devices. This section explores the time, memory, and energy consumption aspects of the CNN-MLP model, which are essential for evaluating the model’s efficiency and practicality in real-world applications. We will present the training and inference times across different IoT datasets and classification types, along with the memory consumption of the model. Additionally, the energy efficiency, a key consideration for optimizing the model’s performance without excessive energy consumption, is also assessed. By analyzing these metrics, we aim to provide a comprehensive overview of the model’s resource demands and the trade-offs involved in deploying deep learning models on devices with limited computational capacity, especially in IoT devices.

(i): Training time

The training times for the CNN-MLP model across different datasets and classification types, with each batch containing 128 samples, are presented in Table 24. For the IoT-23 dataset, the model exhibits varying training times, with the binary classification taking 0.0688 s per batch (0.54 milliseconds per sample), the multi-class classification excluding the normal class requiring 0.0824 s per batch (0.64 milliseconds per sample), and the multi-class classification including the normal class needing 0.0870 s per batch (0.68 milliseconds per sample). On the NF-BoT-IoT-v2 dataset, the binary classification shows a slightly shorter training time of 0.0579 s per batch (0.45 milliseconds per sample), while the multi-class classification excluding the normal class takes 0.0680 s per batch (0.53 milliseconds per sample). The multi-class classification including the normal class on this dataset takes 0.0747 s per batch (0.58 milliseconds per sample). These results demonstrate that the model’s training time varies with both the dataset and classification type, with the NF-BoT-IoT-v2 dataset generally requiring shorter training times compared to the IoT-23 dataset.

Table 24. Training time for CNN-MLP Model.
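
The per-sample training figures follow from dividing each per-batch time by the batch size of 128. A minimal sketch of that conversion, using the training times quoted above (the dictionary keys and variable names are illustrative):

```python
BATCH_SIZE = 128

# Training time per batch in seconds, as quoted in the text.
train_s_per_batch = {
    ("IoT-23", "binary"): 0.0688,
    ("IoT-23", "multi-class excl. normal"): 0.0824,
    ("IoT-23", "multi-class incl. normal"): 0.0870,
    ("NF-BoT-IoT-v2", "binary"): 0.0579,
    ("NF-BoT-IoT-v2", "multi-class excl. normal"): 0.0680,
    ("NF-BoT-IoT-v2", "multi-class incl. normal"): 0.0747,
}


def ms_per_sample(seconds_per_batch, batch_size=BATCH_SIZE):
    """Convert a per-batch time in seconds to a per-sample time in milliseconds."""
    return seconds_per_batch / batch_size * 1000.0


train_ms_per_sample = {key: ms_per_sample(value) for key, value in train_s_per_batch.items()}
for (dataset, task), ms in train_ms_per_sample.items():
    print(f"{dataset:14s} {task:26s} {ms:.2f} ms/sample")
```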

(ii): Inference time

The inference times for the CNN-MLP model across different datasets and classification types, with each batch containing 128 samples, are summarized in Table 25. For the IoT-23 dataset, the model demonstrates an inference time of 0.0575 s per batch (0.45 milliseconds per sample) for binary classification, 0.0613 s per batch (0.48 milliseconds per sample) for multi-class classification excluding the normal class, and 0.0674 s per batch (0.53 milliseconds per sample) for multi-class classification including the normal class. On the NF-BoT-IoT-v2 dataset, the binary classification shows an inference time of 0.0570 s per batch (0.45 milliseconds per sample), while the multi-class classification excluding the normal class takes 0.0634 s per batch (0.50 milliseconds per sample). The multi-class classification including the normal class requires 0.0687 s per batch (0.54 milliseconds per sample). These results indicate that inference times are comparable across the two datasets: IoT-23 is marginally slower for binary classification, whereas NF-BoT-IoT-v2 is marginally slower for both multi-class settings. On both datasets, multi-class classification takes longer to infer than binary classification.

Table 25. Inference time for CNN-MLP Model.

(iii): Memory consumption

The memory consumption for the CNN-MLP model is shown in Table 26 across different datasets and classification types. For the IoT-23 dataset, memory consumption increases with the complexity of the classification task. The model consumes 111.63 Megabytes (MB) for binary classification, 119.70 MB for multi-class classification excluding the normal class, and 123.84 MB for multi-class classification including the normal class. In comparison, the NF-BoT-IoT-v2 dataset requires significantly less memory, with 15.54 MB for binary classification, 23.99 MB for multi-class classification excluding the normal class, and 25.69 MB for multi-class classification including the normal class. These findings illustrate that the IoT-23 dataset demands notably higher memory resources than the NF-BoT-IoT-v2 dataset for similar classification tasks.

Table 26. Memory consumption for CNN-MLP Model.

(iv): Energy efficiency

Energy efficiency in the context of deep learning models, particularly for resource-constrained devices like IoT systems and mobile devices, is crucial for ensuring optimal performance without excessive energy consumption. Energy consumption (E) is the amount of energy a system uses to perform its tasks, while energy efficiency refers to how effectively the system uses this energy. The efficiency can be evaluated by the ratio of useful output (performance) to the energy consumed. In deep learning models, particularly for IDS, energy consumption is directly influenced by the model’s complexity, the hardware used for computation, and the algorithms deployed. Energy consumption can be calculated as shown in Equation (6) [77].

E = \mu \times CC \times F^{2}

(6)

where µ is a coefficient expressing the capacitance of the operating chip, F is the frequency of the processor’s operation, and CC is the number of clock cycles required to process the code.

Energy efficiency (η) is the ratio of performance to energy consumed, and is therefore inversely proportional to energy consumption, as shown in Equation (7) [78].

\eta = \frac{\mathrm{Performance}}{E}

(7)

where Performance refers to the output of the model, such as accuracy or classification performance, and E is the total energy consumed. Optimizing energy efficiency for deep learning models involves minimizing E while maximizing Performance, which can be achieved by techniques such as model pruning, quantization, and the use of energy-efficient hardware accelerators.
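
A small numerical sketch of Equations (6) and (7) is given below. All input values (the capacitance coefficient, the cycle count, the clock frequencies, and the use of accuracy as the performance term) are hypothetical placeholders chosen only to show how the quantities interact; none are measurements from this study.

```python
def energy_consumption(mu, clock_cycles, frequency_hz):
    """Equation (6): E = mu * CC * F^2."""
    return mu * clock_cycles * frequency_hz ** 2


def energy_efficiency(performance, energy):
    """Equation (7): eta = Performance / E."""
    return performance / energy


# Hypothetical values for illustration only.
MU = 1e-27          # effective capacitance coefficient (assumed)
CYCLES = 5e8        # clock cycles needed to process the workload (assumed)
ACCURACY = 0.9994   # performance term, here a classification accuracy

e_fast = energy_consumption(MU, CYCLES, 2.0e9)   # processor at 2 GHz
e_slow = energy_consumption(MU, CYCLES, 1.0e9)   # processor at 1 GHz
eta_fast = energy_efficiency(ACCURACY, e_fast)
eta_slow = energy_efficiency(ACCURACY, e_slow)

print(f"E at 2 GHz: {e_fast:.3e}, eta: {eta_fast:.3e}")
print(f"E at 1 GHz: {e_slow:.3e}, eta: {eta_slow:.3e}")
```

Because F enters Equation (6) quadratically, halving the clock frequency at a fixed cycle count quarters E and quadruples η under this model, provided the performance term is unchanged.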

5. Discussion

This section offers an in-depth evaluation of the CNN-MLP model’s effectiveness, comparing its performance with other classification approaches, including CNN, Autoencoder, and DNN, in both types of classification (binary and multi-class) scenarios. A comprehensive examination of confusion matrices and evaluation metrics will be conducted for each model, providing an in-depth comparative assessment. By evaluating results from the IoT-23 and NF-BoT-IoT-v2 datasets, we will highlight the comparative strengths and weaknesses of each approach. Special attention will be given to how the CNN-MLP model’s metrics across various classes reflect its integration of CNN and MLP, demonstrating advancements in accuracy, precision, recall, and F1-score. This discussion aims to shed light on the practical implications of these findings, particularly in advancing the accuracy, reliability, and overall effectiveness of intrusion detection systems in real-world environments.

5.1. Binary Classification

For binary classification on the IoT-23 and NF-BoT-IoT-v2 datasets, the hybrid CNN-MLP framework performs strongly and produces very few false positives, as reflected in the confusion matrices and evaluation metrics. On the IoT-23 dataset, the model achieved an accuracy of 99.94%, with precision, recall, and F1-score also at 99.94%. The confusion matrix shows that it handles the class imbalance well, correctly identifying 10,488 normal instances and 24,144 attack instances, while mislabeling only 9 normal instances as attacks and 11 attack instances as normal, as shown in Figure 15a. These results underscore the model’s minimal false positives and false negatives, ensuring precise differentiation between benign and malicious traffic. Similarly, for the NF-BoT-IoT-v2 dataset, the CNN-MLP model attained an accuracy of 99.96%, with precision, recall, and F1-score all at 99.96%. The confusion matrix highlights its reliability, with 1270 normal instances and 10,786 attack instances correctly classified, while only 5 attack instances were misclassified as normal and no normal instances were incorrectly flagged as attacks, as shown in Figure 15b. This near-perfect precision and recall demonstrate the model’s robust feature extraction capabilities and its ability to mitigate false positives, which is critical for reducing unnecessary alerts in practical intrusion detection systems. These findings validate the CNN-MLP model’s effectiveness in achieving balanced, high-performance metrics across diverse datasets.

Figure 15. Confusion matrix for binary classification with the CNN-MLP model on (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

The CNN-MLP model outperforms the other binary classifiers (the standalone CNN, Autoencoder, and DNN models) on both the IoT-23 and NF-BoT-IoT-v2 datasets, as shown in Figure 16 and Figure 17. On the IoT-23 dataset, the CNN-MLP achieves 99.94% accuracy, precision, recall, and F1-score, highlighting its capability in detecting intrusions. The 99.94% precision indicates very few false positives, while the 99.94% recall reflects its proficiency in identifying nearly all true positives with very few false negatives. The F1-score of 99.94% underscores the model’s balanced performance. On the NF-BoT-IoT-v2 dataset, the CNN-MLP achieves 99.96% in all metrics, giving nearly perfect detection with negligible false positives and false negatives. In contrast, the Autoencoder demonstrates significantly lower performance on the IoT-23 dataset, with 84.02% accuracy, 89.34% precision, 84.02% recall, and 84.64% F1-score, indicating its limited ability to identify both benign and malicious traffic effectively. The CNN and DNN models perform well, with the CNN achieving 99.92% across all metrics and the DNN reaching 99.24% accuracy, 99.25% precision, 99.24% recall, and 99.24% F1-score. On the NF-BoT-IoT-v2 dataset, the DNN and Autoencoder achieve 99.95% across all metrics, while the CNN reaches 99.94% in accuracy, precision, recall, and F1-score. The CNN-MLP surpasses all of them, making it the most reliable and robust classifier for binary intrusion detection across both datasets.

Figure 16. Comparison of the proposed CNN-MLP model with binary classifiers on IoT-23 dataset.

Figure 17. Comparison of the proposed CNN-MLP model with binary classifiers on NF-BoT-IoT-v2 dataset.

The CNN-MLP model demonstrates exceptional performance in binary classification across the IoT-23 and NF-BoT-IoT-v2 datasets, as shown in Table 27 and Table 28, showcasing its reliability in identifying both benign and malicious traffic. On the IoT-23 dataset, it achieves 99.91% accuracy, 99.90% precision, 99.91% recall, and an F1-score of 99.90% for the Normal class, alongside 99.95% accuracy, 99.96% precision, 99.95% recall, and an F1-score of 99.96% for the Attack class, highlighting its ability to minimize false positives and negatives. Similarly, for the NF-BoT-IoT-v2 dataset, the model attains 100% accuracy and recall, 99.61% precision, and a 99.80% F1-score for the Normal class, while achieving 99.95% accuracy, 100% precision, 99.95% recall, and an F1-score of 99.98% for the Attack class. These results underscore the CNN-MLP model’s balanced precision and recall, making it a robust choice for intrusion detection in real-world scenarios.

Table 27. Evaluation metrics for the CNN-MLP model in binary classification across different classes on the IoT-23 dataset.

Table 28. Evaluation metrics for the CNN-MLP model in binary classification across different classes on NF-BoT-IoT-v2 dataset.
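
The per-class values above can be cross-checked against the confusion-matrix counts reported for Figure 15. The sketch below does this with one-vs-rest precision, recall, and F1 per class; the function name `per_class_metrics` and the matrix layout (rows are true classes, columns are predicted classes) are assumptions made for illustration.

```python
def per_class_metrics(matrix, labels):
    """One-vs-rest precision, recall and F1 from a square confusion matrix.

    Assumed layout: matrix[i][j] = samples of true class i predicted as class j.
    """
    results = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                    # true class k, predicted as another class
        fp = sum(row[k] for row in matrix) - tp     # predicted as k, true class is another
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[label] = (precision, recall, f1)
    return results


labels = ["Normal", "Attack"]
# Counts as reported in the text for Figure 15a and Figure 15b.
iot23 = [[10488, 9], [11, 24144]]
nf_bot = [[1270, 0], [5, 10786]]

iot23_metrics = per_class_metrics(iot23, labels)
nf_bot_metrics = per_class_metrics(nf_bot, labels)

for name, metrics in (("IoT-23", iot23_metrics), ("NF-BoT-IoT-v2", nf_bot_metrics)):
    for label, (p, r, f1) in metrics.items():
        print(f"{name:14s} {label:6s} precision={p:.2%} recall={r:.2%} F1={f1:.2%}")
```

Under these assumptions, the computed precision, recall, and F1 values agree with those quoted for both datasets, and the per-class accuracy quoted in the text appears to coincide with per-class recall.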

5.2. Multi-Class Classification Excluding Normal Class

For multi-class classification excluding the normal class on the IoT-23 dataset, the CNN-MLP architecture performed strongly across the key evaluation metrics, reaching 99.99% in accuracy, precision, recall, and F1-score, consistent with the confusion matrix. The model’s proficiency in identifying distinct attack types is evident in the per-class counts. For example, it correctly identified 9732 instances of PartOfAHorizontalPortScan, 9068 instances of DDoS, and 2156 instances of Okiru, with minimal misclassification. Similarly, it correctly recognized 1173 instances of C&C-HeartBeat, 1036 instances of C&C, 327 instances of Attack, and 37 instances of C&C-PartOfAHorizontalPortScan, with a negligible number of false positives and false negatives, as depicted in Figure 18. The model’s precision of 99.99% highlights its ability to minimize false positives and to classify the various attack types accurately. This matters because reducing false positives is crucial for avoiding unnecessary alerts and resource consumption in real-world intrusion detection systems. Furthermore, the recall of 99.99% reflects the model’s capability in identifying nearly all true positives, so that most attacks are detected with very few false negatives. The F1-score of 99.99% demonstrates the model’s balanced performance in both precision and recall across a wide range of attack types. The confusion matrix, as shown in Figure 18, validates the model’s proficiency in multi-class classification, establishing it as a highly effective solution for intrusion detection, especially when dealing with varied and imbalanced datasets such as IoT-23.
The model’s success in reducing false positives and accurately classifying various attack classes underlines its robustness and effectiveness in real-world scenarios.

Figure 18. Confusion matrix for multi-class classification (excluding the normal class) with the CNN-MLP model on the IoT-23 dataset.

In multi-class classification excluding normal class on the NF-BoT-IoT-v2 dataset, the CNN-MLP model attained an accuracy of 98.02%, precision of 98.07%, recall of 98.02%, and F1-score of 98.02%, demonstrating its ability to effectively distinguish between attack types such as Reconnaissance, DDoS, DoS, and Theft. The model correctly identified key attack instances, including 2601 instances of Reconnaissance, 4207 of DDoS, 3509 of DoS, and 54 of Theft, with minimal misclassification, as shown in Figure 19. Its precision underscores the model’s ability to significantly reduce false positives, ensuring accurate classification of legitimate traffic while minimizing unnecessary alerts and conserving system resources, which is vital for practical intrusion detection systems. The recall illustrates the model’s effectiveness in identifying true positives with minimal false negatives. The well-balanced F1-score further validates the model’s robust performance, proving its efficiency in handling imbalanced datasets like NF-BoT-IoT-v2 for intrusion detection.

Figure 19. Confusion matrix for multi-class classification (excluding the normal class) with the CNN-MLP model on NF-BoT-IoT-v2 dataset.

The effectiveness of multi-category classification, excluding the Normal class, on the IoT-23 and NF-BoT-IoT-v2 datasets, as depicted in Figure 20 and Figure 21, reveals that the CNN-MLP model outperforms other classifiers, including the standalone CNN, Autoencoder, and DNN models. For the IoT-23 dataset, the CNN achieves 99.97% accuracy, 99.98% precision, 99.97% recall, and 99.97% F1-score, demonstrating high efficiency and solid detection capabilities. The Autoencoder shows slightly lower performance with 99.57% accuracy, 99.60% precision, 99.57% recall, and 99.57% F1-score, indicating a strong, though less optimal, performance. The DNN model attained 99.94% accuracy, 99.94% precision, 99.94% recall, and 99.94% F1-score, showcasing reliable detection capabilities. On the NF-BoT-IoT-v2 dataset, the CNN attains 97.99% accuracy, 98.04% precision, 97.99% recall, and 97.99% F1-score. The Autoencoder achieves 98.01% accuracy, 98.06% precision, 98.01% recall, and 98.00% F1-score, while the DNN reaches 97.92% accuracy, 97.99% precision, 97.92% recall, and 97.92% F1-score. The CNN-MLP model excels on both datasets, achieving 99.99% accuracy, 99.99% precision, 99.99% recall, and 99.99% F1-score for the IoT-23 dataset, and 98.02% accuracy, 98.07% precision, 98.02% recall, and 98.02% F1-score for the NF-BoT-IoT-v2 dataset. Overall, the CNN-MLP model proves to be the most reliable and efficient classifier for intrusion detection, outperforming the other models across both datasets with minimal false positives and false negatives.

Figure 20. Comparison of the proposed CNN-MLP model with multi-class classifiers excluding normal class on IoT-23 dataset.

Figure 21. Comparison of the proposed CNN-MLP model with multi-class classifiers excluding normal class on NF-BoT-IoT-v2 dataset.

The CNN-MLP model performs strongly in multi-class classification on the IoT-23 dataset, excluding the normal class, as shown in Table 29, with near-perfect results across the attack classes. For the PartOfAHorizontalPortScan class, it attains 99.99% accuracy, 100% precision, 99.99% recall, and 99.99% F1-score. For the DDoS class, it achieves 99.99% in accuracy, precision, recall, and F1-score. The Okiru and C&C classes reach 100% in accuracy, precision, recall, and F1-score. The C&C-HeartBeat class attains 99.91% accuracy, 100% precision, 99.91% recall, and 99.96% F1-score. The Attack class reaches 100% accuracy, 99.70% precision, 100% recall, and 99.85% F1-score. The C&C-PartOfAHorizontalPortScan class shows the lowest precision, with 100% accuracy, 97.37% precision, 100% recall, and 98.67% F1-score, so every instance of this class is detected while a small share of the samples assigned to it belong to other classes. Overall, these results indicate that the CNN-MLP model distinguishes between attack classes with high precision, which makes it well suited to practical intrusion detection frameworks that require dependable identification of diverse attack patterns.

Table 29. Evaluation metrics for the CNN-MLP model in multi-class classification, excluding the normal class, on the IoT-23 dataset.

The CNN-MLP model demonstrates exceptional effectiveness in executing multi-class classification on the NF-BoT-IoT-v2 dataset, intentionally omitting the normal class, as outlined in Table 30. The model achieves 94.72% accuracy, 99.27% precision, 94.72% recall, and 96.94% F1-score for the Reconnaissance class, indicating a strong ability to identify this attack type with high precision. For the DDoS class, the model excels with 99.41% accuracy, 99.53% precision, 99.41% recall, and 99.47% F1-score, showcasing its excellent detection capabilities for this specific attack. The DoS class also demonstrates impressive results with 98.93% accuracy, 95.43% precision, 98.93% recall, and 97.15% F1-score, reflecting the model’s robust performance. In the Theft class, the CNN-MLP model attains 98.18% accuracy, 96.43% precision, 98.18% recall, and 97.30% F1-score, further demonstrating its reliability in detecting a wide range of attack types. These results underline the CNN-MLP model’s effectiveness in handling complex multi-class classification tasks on the NF-BoT-IoT-v2 dataset, providing highly reliable and accurate performance across various attack categories.

Table 30. Evaluation metrics for the CNN-MLP model in multi-class classification, excluding the normal class, on NF-BoT-IoT-v2 dataset.
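The per-class precision and recall values in Tables 29 and 30 follow directly from confusion-matrix counts. The snippet below is a minimal sketch of that calculation, assuming rows hold true classes and columns hold predicted classes. The 3x3 matrix and the class names A, B, and C are made up for illustration and are not the counts behind Figure 19. The per-class accuracy reported in the tables coincides with recall for every class, which is consistent with computing it as correct predictions divided by the number of true instances of that class; that reading is an assumption, not something the text states.

```python
def per_class_metrics(cm, labels):
    """Return {label: (precision, recall)} from a square confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(labels)
    metrics = {}
    for k in range(n):
        tp = cm[k][k]
        row_total = sum(cm[k])                       # samples truly in class k
        col_total = sum(cm[i][k] for i in range(n))  # samples predicted as class k
        precision = tp / col_total if col_total else 0.0
        recall = tp / row_total if row_total else 0.0
        metrics[labels[k]] = (precision, recall)
    return metrics


# Illustrative counts only (hypothetical, not taken from the paper's figures).
labels = ["A", "B", "C"]
cm = [
    [90, 10, 0],
    [5, 95, 0],
    [0, 5, 45],
]

for name, (p, r) in per_class_metrics(cm, labels).items():
    print(f"{name}: precision={p:.4f} recall={r:.4f}")
```

In this toy matrix, class C has perfect precision but 90% recall, while class B absorbs misclassified samples from A and C and so has the lowest precision.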

5.3. Multi-Class Classification Including Normal Class

In multi-class classification including the normal class on the IoT-23 dataset, the CNN-MLP model achieved exceptional performance, attaining an accuracy of 99.91%, precision of 99.92%, recall of 99.91%, and an F1-score of 99.91%. The model effectively distinguished between benign traffic and various attack types, including PartOfAHorizontalPortScan, DDoS, Okiru, C&C-HeartBeat, C&C, Attack, and C&C-PartOfAHorizontalPortScan. It correctly identified key class instances, including 10,499 of Benign, 10,220 of PartOfAHorizontalPortScan, 9015 of DDoS, 2260 of Okiru, 1232 of C&C-HeartBeat, 988 of C&C, 328 of Attack, and 46 of C&C-PartOfAHorizontalPortScan, with minimal misclassification, as shown in Figure 22. The model achieves 99.92% precision, effectively reducing false positives and enhancing system efficiency by focusing on actual threats. With a recall of 99.91%, it excels in identifying true positives, minimizing false negatives. The balanced F1-score demonstrates the model’s strong performance in handling imbalanced datasets such as IoT-23, ensuring accurate classification across various classes.

Figure 22. Confusion matrix for multi-class classification (including the normal class) with the CNN-MLP model on the IoT-23 dataset.

In multi-class classification including the normal class on the NF-BoT-IoT-v2 dataset, the CNN-MLP model exhibited excellent effectiveness, attaining an accuracy of 98.11%, precision of 98.17%, recall of 98.11%, and an F1-score of 98.11%. The model effectively differentiated between benign traffic and various attack types, such as Reconnaissance, DDoS, DoS, and Theft. It accurately identified key class instances, including 1287 of Benign, 2715 of Reconnaissance, 4228 of DDoS, 3559 of DoS, and 50 of Theft, with minimal misclassification, as shown in Figure 23. The model’s high precision and balanced F1-score demonstrate its efficiency in minimizing false positives and handling imbalanced datasets like NF-BoT-IoT-v2, ensuring accurate and reliable classification of all traffic classes.

Figure 23. Confusion matrix for multi-class classification (including the normal class) with the CNN-MLP model on NF-BoT-IoT-v2 dataset.

The results for multi-class classification, including the normal class, on both the IoT-23 and NF-BoT-IoT-v2 datasets show that the CNN-MLP model outperforms the CNN, Autoencoder, and DNN models, as shown in Figure 24 and Figure 25. For the IoT-23 dataset, the CNN attains 99.81% accuracy, 99.84% precision, 99.81% recall, and 99.82% F1-score, demonstrating robust and efficient intrusion detection. The Autoencoder achieves 87.12% accuracy, 87.70% precision, 87.12% recall, and 87.16% F1-score, indicating that it is less effective in distinguishing among the classes. The DNN delivers solid results with 99.67% accuracy, 99.73% precision, 99.67% recall, and 99.69% F1-score. The CNN-MLP outperforms all three with 99.91% accuracy, 99.92% precision, 99.91% recall, and 99.91% F1-score. On the NF-BoT-IoT-v2 dataset, the CNN achieves 98.04% across all metrics. The Autoencoder’s results are close, at 98.08% accuracy, 98.12% precision, 98.08% recall, and 98.07% F1-score. The DNN achieves 97.89% accuracy, 97.92% precision, 97.89% recall, and 97.88% F1-score, a slightly lower performance. The CNN-MLP attains 98.11% accuracy, 98.17% precision, 98.11% recall, and 98.11% F1-score, making it the top performer on both datasets, although on NF-BoT-IoT-v2 its accuracy margin over the Autoencoder is only 0.03 percentage points.

Figure 24. Proposed CNN-MLP versus multi-class classifiers including normal class on IoT-23 dataset.

Figure 25. Proposed CNN-MLP versus multi-class classifiers including normal class on NF-BoT-IoT-v2 dataset.

The CNN-MLP model achieves outstanding performance in multi-class classification, including the normal class, on the IoT-23 dataset, as presented in Table 31. For the Benign class, the model attains 99.92% accuracy, 99.93% precision, 99.92% recall, and 99.93% F1-score, demonstrating excellent capability in correctly classifying normal traffic. The model performs exceptionally well in attack detection, with the PartOfAHorizontalPortScan class achieving 99.79% accuracy, 99.99% precision, 99.79% recall, and 99.89% F1-score. For the DDoS class, the model reaches near-perfect performance with 99.99% accuracy, 100% precision, 99.99% recall, and 99.99% F1-score. The Okiru and C&C-HeartBeat classes each achieve 100% accuracy, precision, recall, and F1-score. The C&C class performs well with 99.80% accuracy, 99.40% precision, 99.80% recall, and 99.60% F1-score. The Attack class shows perfect classification with 100% across all metrics. The C&C-PartOfAHorizontalPortScan class is the weakest case, with 100% accuracy, 71.88% precision, 100% recall, and 83.64% F1-score: every instance of this rare class is detected, but a noticeable share of the samples assigned to it belong to other classes. These results highlight the CNN-MLP model’s ability to achieve high accuracy and, for all but the rarest class, high precision, offering a reliable and efficient solution for multi-class intrusion detection on the IoT-23 dataset.

Table 31. Evaluation metrics for the CNN-MLP model in multi-class classification, including the normal class, on the IoT-23 dataset.
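The gap between 100% recall and 83.64% F1-score for the C&C-PartOfAHorizontalPortScan class is explained by the F1-score being the harmonic mean of precision and recall, which is pulled toward the lower of the two. The following sketch recomputes F1 from the precision and recall values quoted in the text; the function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the text for two IoT-23 classes.
reported = {
    "C&C-PartOfAHorizontalPortScan": (0.7188, 1.0),
    "C&C": (0.9940, 0.9980),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {100 * f1_score(p, r):.2f}%")
```

Both recomputed values agree with the reported F1-scores (83.64% and 99.60%). As a rough plausibility check, 46 correct predictions out of 64 total predictions for a class would give a precision of 71.88%; the actual counts are those shown in Figure 22.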

The CNN-MLP model exhibits strong performance in multi-class classification, including the normal class, on the NF-BoT-IoT-v2 dataset, as shown in Table 32. For the Benign class, the model achieves perfect accuracy of 100%, with 99% precision, 100% recall, and 99.50% F1-score, indicating exceptional performance in correctly identifying normal traffic. In the Reconnaissance class, the model performs well with 94.47% accuracy, 99.85% precision, 94.47% recall, and 97.09% F1-score, demonstrating effective detection of this attack type. The DDoS class shows strong results with 99.18% accuracy, 99.30% precision, 99.18% recall, and 99.24% F1-score. Likewise, for the DoS class, the model attains 99.05% accuracy, 95.19% precision, 99.05% recall, and 97.08% F1-score, demonstrating robust efficacy in detecting denial-of-service attacks. The Theft class also performs excellently with 100% accuracy, 98.04% precision, 100% recall, and 99.01% F1-score. These results underscore the CNN-MLP model’s robust and reliable classification performance across various classes, making it an effective tool for intrusion detection in IoT-based networks.

Table 32. Evaluation metrics for the CNN-MLP model in multi-class classification, including the normal class, on NF-BoT-IoT-v2 dataset.

5.4. Advantages and Trade-Offs of Two-Stage and Single-Phase Classification

The two-stage classification process used in the IDS begins with a binary classifier that determines whether network traffic is normal or indicative of an attack. For binary classification, the CNN-MLP model achieved 99.94% accuracy on the IoT-23 dataset and 99.96% on the NF-BoT-IoT-v2 dataset. If the traffic is classified as Normal, it is allowed to proceed uninterrupted. When an Attack is detected, the system blocks the traffic and activates a multi-class classifier in the background, which reached an accuracy of 99.99% in multi-class classification excluding the normal class on the IoT-23 dataset and 98.02% on the NF-BoT-IoT-v2 dataset. Because the allow-or-block decision depends solely on the binary classification, this sequential design prioritizes rapid responses. Decoupling the two steps reduces the time and complexity of the critical decision compared with a single-phase multi-class system, which achieved an accuracy of 99.91% including the normal class on the IoT-23 dataset and 98.11% on the NF-BoT-IoT-v2 dataset. Inference times reflect this advantage: the binary classifier takes 0.45 milliseconds on both the IoT-23 and NF-BoT-IoT-v2 datasets, whereas single-phase multi-class classification, including the normal class, requires 0.53 milliseconds on the IoT-23 dataset and 0.54 milliseconds on the NF-BoT-IoT-v2 dataset. The binary classifier’s swift allow-or-block determination enables prompt action, while the detailed attack classification is handled independently, maintaining real-time performance. Ultimately, the two-stage methodology improves response times in detecting and mitigating security threats, and its binary stage attains higher accuracy than the single-phase approach on both datasets.
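The control flow of the two-stage design can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `is_attack` and `attack_type` are hypothetical callables standing in for the trained binary and multi-class models, the toy rules below are placeholders, and the second stage runs synchronously here for simplicity, whereas the text describes it as running in the background.

```python
from typing import Callable, Optional, Sequence, Tuple

Features = Sequence[float]


def two_stage_decision(
    flow: Features,
    is_attack: Callable[[Features], bool],
    attack_type: Callable[[Features], str],
) -> Tuple[str, Optional[str]]:
    """Return (action, attack_label) for a single flow.

    Stage 1 decides allow or block; stage 2 runs only for blocked flows.
    """
    if not is_attack(flow):
        return "allow", None
    # The block decision is already final here; the label is extra detail.
    return "block", attack_type(flow)


# Hypothetical stand-ins for the trained models (toy rules, not the CNN-MLP).
def toy_binary(flow: Features) -> bool:
    return sum(flow) > 1.0


def toy_multiclass(flow: Features) -> str:
    return "DDoS" if flow[0] > 0.5 else "Reconnaissance"


print(two_stage_decision([0.1, 0.2], toy_binary, toy_multiclass))  # ('allow', None)
print(two_stage_decision([0.9, 0.8], toy_binary, toy_multiclass))  # ('block', 'DDoS')
```

The point of the structure is that normal traffic never pays for the multi-class model, and the blocking decision never waits on it.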

5.5. Effect of Data Resampling and Class Weights on CNN-MLP Model Performance

The performance of the CNN-MLP model on the IoT-23 dataset, with and without data resampling and class weights, is shown in Table 33. For the binary classification task, the model performs exceptionally well without resampling and class weights, achieving an accuracy of 99.38%, precision of 99.39%, recall of 99.38%, and F1-score of 99.38%. When data resampling and class weights are applied, the model shows a notable improvement, with accuracy, precision, recall, and F1-score all attaining 99.94%. In the multi-class classification task excluding the normal class, the model maintains excellent performance both with and without resampling and class weights. Without resampling, it attained 99.96% for accuracy, precision, recall, and F1-score, while with resampling and class weights, the performance improves to 99.99% across all metrics. When the normal class is included in multi-class classification, the model still demonstrates strong performance, with a slight reduction in scores. Without the application of resampling, the model attains an accuracy of 99.84%, along with precision, recall, and F1-score values of 99.84%, 99.84%, and 99.80%, respectively. When incorporating resampling techniques and class weights, the model demonstrates an improved performance, reaching 99.91% in accuracy, 99.92% in precision, 99.91% in recall, and 99.91% in F1-score. These outcomes highlight the effectiveness of both the data resampling and class weights in improving the model’s classification capabilities across different tasks.

Table 33. Performance metrics of the CNN-MLP model on the IoT-23 dataset with and without data resampling and class weights across different classification types.
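To illustrate what class weights contribute alongside resampling, the sketch below computes inverse-frequency weights, so that errors on rare classes count more in the training loss. The formula used, n_samples / (n_classes * class_count), is the common "balanced" heuristic and is an assumption here; this section does not state which weighting scheme the CNN-MLP model uses. The label counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Hypothetical, heavily imbalanced label distribution.
labels = ["DDoS"] * 900 + ["DoS"] * 90 + ["Theft"] * 10
weights = balanced_class_weights(labels)

for cls, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{cls}: {w:.3f}")
```

With these counts the rarest class receives a weight roughly ninety times that of the most frequent one, which is the mechanism by which weighting counteracts bias toward majority classes.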

5.6. Case Study for Zero-Day Attack

In today’s rapidly evolving cyber threat landscape, Zero-Day attacks pose a critical challenge, exploiting unknown vulnerabilities and evading traditional security defenses. To address this, we applied a cutting-edge CNN-MLP deep learning model to the IoT-23 dataset, aiming to enhance the detection of such attacks. In a carefully designed evaluation scenario, we excluded the Okiru attack from the training dataset, leaving it solely in the testing dataset to simulate a real-world Zero-Day attack detection situation. The model demonstrated impressive performance, correctly identifying 710 out of 717 instances of the Okiru attack, as illustrated in Figure 26, highlighting its robustness in detecting previously unseen threats and showcasing its ability to effectively generalize in dynamic, high-stakes cybersecurity environments.

Figure 26. Confusion matrix of the CNN-MLP model on the IoT-23 dataset, focusing on Zero-Day attack detection.
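The evaluation protocol described above, in which one attack class is withheld from training and appears only at test time, can be sketched as follows. The helper names and the tiny sample lists are hypothetical; only the counts 710 and 717 come from the text.

```python
def hold_out_class(train, test, unseen_label):
    """Drop every sample of `unseen_label` from the training split only."""
    train_seen = [(x, y) for x, y in train if y != unseen_label]
    return train_seen, test


def detection_rate(detected, total):
    """Fraction of held-out attack instances that were flagged."""
    return detected / total


# Toy (features, label) pairs, for illustration only.
train = [([0.1], "benign"), ([0.9], "Okiru"), ([0.7], "DDoS")]
test = [([0.2], "benign"), ([0.8], "Okiru")]

train_seen, test_kept = hold_out_class(train, test, "Okiru")
print([y for _, y in train_seen])                 # ['benign', 'DDoS']
print(f"{100 * detection_rate(710, 717):.2f}%")   # 99.02%
```

The reported 710 of 717 detections therefore corresponds to a detection rate of about 99.02% on the held-out class.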

5.7. Case Study for Detecting Zero-Day Attack for Fully Synthetic Data

The experimental setup was further extended with synthetic data that represents previously unseen attack patterns. Synthetic data can be generated using techniques such as Gaussian noise injection, data augmentation, and domain-specific rules to simulate realistic network traffic anomalies and attack scenarios. These synthetic samples were excluded during training to emulate real-world conditions where new threats emerge, and their inclusion in testing allowed us to assess the model’s generalization capabilities and adaptability to novel intrusions. In our work, a synthetic dataset is generated from the IoT-23 dataset using the Gaussian noise injection method to simulate network traffic anomalies for evaluating the model’s ability to detect zero-day attacks. This synthetic data consists of 1000 samples for the attack class, representing unseen attack patterns, and 1000 samples for the normal class, representing legitimate network traffic. The results demonstrate that the CNN-MLP model successfully detects 983 out of 1000 synthetic attacks (98.3%), as shown in Figure 27, indicating that it identifies anomalous patterns even when exposed to synthetic zero-day attacks and reinforcing its robustness for real-world deployment.

Figure 27. Confusion matrix of the CNN-MLP model on the IoT-23 dataset, highlighting Zero-Day attack detection using synthetic data.
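Gaussian noise injection, as used for the synthetic test set, amounts to perturbing existing feature vectors with zero-mean normal noise. The sketch below shows the idea under assumptions that are not in the text: the base rows, the noise level `sigma`, and the random seed are all hypothetical, and the features are assumed to be already scaled.

```python
import random


def gaussian_noise_samples(base_rows, n_samples, sigma, seed=0):
    """Create n_samples rows by adding N(0, sigma^2) noise to randomly chosen base rows."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_samples):
        row = rng.choice(base_rows)
        out.append([v + rng.gauss(0.0, sigma) for v in row])
    return out


# Hypothetical scaled feature rows standing in for real attack flows.
base_attack = [[0.8, 0.1, 0.5], [0.9, 0.2, 0.4]]
synthetic = gaussian_noise_samples(base_attack, n_samples=1000, sigma=0.05)

print(len(synthetic), len(synthetic[0]))  # 1000 3
```

The choice of `sigma` controls how far the synthetic samples drift from the originals, so it directly affects how hard the resulting zero-day test is.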

5.8. Strategies for Minimizing False Positives and Negatives

Minimizing false positives in attack detection, particularly within IoT networks, requires a multi-faceted approach that combines resampling, data preprocessing, and model optimization. Resampling methods such as SMOTE and ADASYN address class imbalance by generating synthetic examples of minority classes, so that the model learns patterns that reduce bias toward majority classes. Feature engineering, including the selection of IoT-specific features and the elimination of noisy or redundant data, enhances the model’s ability to differentiate between benign and malicious traffic. Threshold optimization is also crucial for balancing sensitivity and specificity, tuning the system to detect anomalies without overreacting to normal behavior. Hybrid models, such as CNN-MLP frameworks, further improve detection by using the CNN for feature extraction and the MLP for classification. To adapt to the dynamic nature of IoT traffic, real-time learning mechanisms and dynamic resampling allow the system to adjust as patterns evolve. Finally, post-processing of alerts using meta-classifiers or aggregation techniques filters out low-confidence predictions, further reducing unnecessary alarms. When evaluated on the IoT-23 dataset for binary classification, the CNN-MLP model produced 9 false positives out of 10,497 normal instances (a false positive rate of about 0.09%), meaning that a few normal flows were misclassified as attacks. On the NF-BoT-IoT-v2 dataset, the model produced 0 false positives, accurately separating benign from malicious traffic in that dataset.

Minimizing false negatives, or missed attacks, is equally important in attack detection for IoT networks. Lowering the detection threshold increases model sensitivity, capturing subtle anomalies that might otherwise be missed. Hybrid CNN-MLP models can reduce false negatives by detecting both known and previously unseen attack patterns. Dynamic resampling helps the model remain sensitive to evolving attack patterns in real time, improving its ability to detect emerging threats. Context-aware detection, which considers both temporal and spatial factors, further reduces the risk of false negatives by identifying subtle, context-specific threats that might be overlooked in conventional systems. When tested on the IoT-23 dataset for binary classification, the CNN-MLP model produced 11 false negatives out of 24,155 attack instances, meaning that some attack flows were wrongly classified as normal. On the NF-BoT-IoT-v2 dataset, it produced 5 false negatives out of 10,791 attack instances. The absolute count is lower, but the miss rate is about 0.05% on both datasets, so false negative performance is comparable across the two. Together, these strategies support a robust and precise anomaly detection system tailored to the unique challenges of IoT networks.
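The trade-off behind threshold optimization can be made concrete with a short sketch. It counts false positives and false negatives at several thresholds on a handful of made-up scores, then converts the counts reported in the text into rates. The scores, labels, and function name are hypothetical; only the counts in the final three lines come from the text.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count (fp, fn) when a flow is flagged as an attack if score >= threshold.

    labels: 1 for attack, 0 for normal; scores: predicted attack probability.
    """
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


# Hypothetical scores, for illustration only.
scores = [0.05, 0.20, 0.45, 0.55, 0.60, 0.90]
labels = [0, 0, 1, 0, 1, 1]

for t in (0.3, 0.5, 0.7):
    print(t, confusion_at_threshold(scores, labels, t))

# Error rates implied by the counts reported in the text.
print(f"IoT-23 false positive rate: {100 * 9 / 10497:.3f}%")
print(f"IoT-23 false negative rate: {100 * 11 / 24155:.3f}%")
print(f"NF-BoT-IoT-v2 false negative rate: {100 * 5 / 10791:.3f}%")
```

On the toy data, lowering the threshold from 0.7 to 0.3 removes both false negatives at the cost of one false positive, which is the sensitivity and specificity balance the text refers to.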

6. Limitations

The CNN-MLP architecture embodies a highly refined deep learning methodology, merging CNN with MLP to significantly elevate its capability in addressing both binary and multi-class classification challenges. This strategy adeptly tackles fundamental issues in intrusion detection systems by improving performance accuracy and mitigating the effects of class imbalance. However, it is important to recognize and address several limitations and challenges associated with this model:

Scalability: As datasets expand or the intricacy of network traffic intensifies, the computational requirements of the model are likely to rise substantially. This increase may have a considerable effect on the model’s performance, limiting its ability to process extensive datasets or adjust to changing network environments.
Generalization: Despite the CNN-MLP model’s remarkable outcomes on the IoT-23 and NF-BoT-IoT-v2 datasets, its ability to generalize across a broad range of network traffic variations or respond to novel attack strategies has yet to be fully proven. To thoroughly evaluate the model’s robustness and its ability to generalize, it is essential to apply it to a diverse set of datasets, including established ones like KDDCup99 and NSL-KDD, alongside newer datasets such as ToN-IoT and CSE-CIC-IDS2018. This broader evaluation will help determine the model’s effectiveness in diverse and evolving network environments.
Data Preprocessing: Effective data preprocessing is crucial for optimizing model performance across various datasets. This workflow entails handling absent values, converting categorical data into suitable representations, adjusting numerical features for uniformity, and eliminating unnecessary or irrelevant data. The success of the CNN-MLP model is closely tied to the quality and thoroughness of these preprocessing steps, which can significantly influence the model’s overall accuracy and reliability.
Model Adaptation: Adjusting the model for various datasets requires an iterative process of hyperparameter tuning. This step is vital for refining the model, allowing it to align more precisely with the unique patterns and subtleties inherent in unfamiliar datasets. Effective adaptation involves systematically experimenting with different hyperparameters to enhance the model’s performance and ensure its alignment with the specific attributes of each dataset.
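The preprocessing steps named in the Data Preprocessing item (handling missing values, encoding categorical data, and scaling numerical features) can be sketched in a few lines. This is a generic illustration, not the pipeline used in the paper: median imputation, one-hot encoding, and min-max scaling are assumed choices, and the column names `duration` and `proto` are hypothetical.

```python
from statistics import median


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode categories as 0/1 vectors; returns (rows, sorted category list)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


# Hypothetical columns from a flow table.
duration = [1.0, None, 3.0, 5.0]
proto = ["tcp", "udp", "tcp", "icmp"]

scaled = min_max_scale(fill_missing(duration))
encoded, cats = one_hot(proto)
print(scaled)   # [0.0, 0.5, 0.5, 1.0]
print(cats)     # ['icmp', 'tcp', 'udp']
print(encoded[0])  # [0, 1, 0]
```

In practice the imputation value, category list, and scaling bounds would be fitted on the training split only and then reused on the test split, so that no information leaks from test data.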

7. Conclusions

Securing IoT networks is crucial for organizations aiming to protect their digital infrastructure from emerging cyber threats. This study introduces an advanced IDS that leverages a hybrid CNN-MLP model. By incorporating innovative data resampling techniques such as hybrid ADASYN-SMOTE for binary classification, ADASYN for multiclass scenarios, and ENN combined with class weights, our model is specifically designed to address the challenges of class imbalance within IoT environments, substantially improving detection and classification performance. Comprehensive assessment using the IoT-23 and NF-BoT-IoT-v2 datasets reveals that the proposed CNN-MLP model achieves impressive results with 99.94% accuracy in two-stage binary classification, 99.99% in multiclass classification excluding the normal class, and 99.91% in single-phase multiclass classification, including the normal class on the IoT-23 dataset, while achieving 99.96% accuracy in binary classification, 98.02% in multiclass classification excluding the normal class, and 98.11% in single-phase multiclass classification, including the normal class on the NF-BoT-IoT-v2 dataset. These findings underscore the model’s effectiveness in managing class imbalance and its strength in real-time intrusion detection. The model’s exceptional performance in terms of accuracy, precision, recall, and F1 score highlights its capacity to defend IoT networks against sophisticated threats. This research validates the effectiveness of the hybrid CNN-MLP approach, establishing it as a leading solution for enhancing IoT network security and effectively addressing the complex challenges posed by modern cyber threats.

8. Future Work

In order to tackle the challenges and limitations highlighted in Section 6, future investigations must prioritize the following domains:

Broader Dataset Evaluation: Upcoming research should shift focus to evaluating the CNN-MLP model on an expanded selection of datasets, incorporating both classic datasets such as KDDCup99 and NSL-KDD, and contemporary ones like ToN-IoT and CSE-CIC-IDS2018. This broader evaluation will provide insights into the model’s robustness, generalizability, and effectiveness in identifying and mitigating emerging attack vectors, ensuring that it remains effective in diverse and evolving network environments.
Data Preprocessing Refinement: Further refinement and customization of data preprocessing techniques are essential for optimizing model performance across various datasets. This involves experimenting with and fine-tuning preprocessing methods to better align with the characteristics of each dataset and to understand their impact on the model’s effectiveness. A comprehensive analysis of these preprocessing strategies and their implications for model performance is detailed in Section 3.2 and Section 4.1 of the manuscript.
Model Adaptation and Hyperparameter Optimization: Continued exploration of strategies for fine-tuning the model is of utmost importance, particularly in refining hyperparameter tuning for various datasets. This requires a comprehensive and systematic evaluation to identify the optimal strategies for customizing the model to fit varying data landscapes. An in-depth exploration of these adaptation techniques and hyperparameter optimization processes is presented in Section 3, with a focus on Section 3.3.4.
Scalability and Computational Efficiency: Improving the model’s processing capability and expandability is essential, enabling it to effectively handle extensive datasets and adjust to progressively intricate network traffic environments. This involves optimizing the model’s architecture and processing capabilities to ensure robust performance and adaptability as data volume and complexity increase.

Author Contributions

Conceptualization, H.K. and M.M.; Methodology, H.K. and M.M.; Software, H.K. and M.M.; Validation, H.K. and M.M.; Writing—original draft, H.K. and M.M.; Supervision, M.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not Applicable.

Informed Consent Statement

Not Applicable.

Data Availability Statement

The datasets used in our study, IoT-23 and NF-BoT-IoT-v2, are publicly available. Below are the URLs for the datasets: IoT-23: https://zenodo.org/records/4743746 (accessed on 20 February 2024) and NF-BoT-IoT-v2: https://staff.itee.uq.edu.au/marius/NIDS_datasets/#RA8 (accessed on 20 February 2024).

Conflicts of Interest

The authors declare no conflict of interest.

References

Makhdoom, I.; Abolhasan, M.; Lipman, J.; Liu, R.P.; Ni, W. Anatomy of Threats to the Internet of Things. IEEE Commun. Surv. Tutorials 2018, 21, 1636–1675. [Google Scholar] [CrossRef]
Tawalbeh, L.; Muheidat, F.; Tawalbeh, M.; Quwaider, M. IoT Privacy and Security: Challenges and Solutions. Appl. Sci. 2020, 10, 4102. [Google Scholar] [CrossRef]
Donnell, L.O. IoT Device Takeovers Surge 100 Percent in 2020. 2020. Available online: https://threatpost.com/iot-devicetakeovers-surge/160504 (accessed on 20 May 2021).
Alzain, M.A.; Soni, S. A Comprehensive Survey on Intrusion Detection Systems in IoT. IEEE Access 2020, 8, 114786–114804. [Google Scholar]
Gao, J.; Wang, Y.; Liu, Y. Deep Learning-Based Network Intrusion Detection for IoT Devices. In Proceedings of the 2019 IEEE 24th Pacific Rim International Symposium on Dependable Computing (PRDC), Kyoto, Japan, 1–3 December 2019; Volume 8, pp. 10112–10122. [Google Scholar]
Ansam, K.; Alazab, A. A critical review of intrusion detection systems in the internet of things: Techniques, deployment strategy, validation strategy, attacks, public datasets and challenges. Cybersecurity 2021, 4, 1–27. [Google Scholar]
Zhang, X.; Xie, J.; Huang, L. Real-Time Intrusion Detection Using Deep Learning Techniques. J. Netw. Comput. Appl. 2020, 140, 45–53. [Google Scholar]
Kumar, S.; Kumar, R. A Review of Real-Time Intrusion Detection Systems Using Machine Learning Approaches. Comput. Secur. 2020, 95, 101944. [Google Scholar]
Smith, A.; Jones, B.; Taylor, C. Enhancing Network Security with Real-Time Intrusion Detection Systems. Int. J. Inf. Secur. 2021, 21, 123–135. [Google Scholar]
Anderson, J.P. Computer security threat monitoring and surveillance. In Technical Report; James P. Anderson Company: Washington, DC, USA, 1980. [Google Scholar]
Mahalingam, A.; Perumal, G.; Subburayalu, G.; Albathan, M.; Altameem, A.; Almakki, R.S.; Hussain, A.; Abbas, Q. ROAST-IoT: A novel range-optimized attention convolutional scattered technique for intrusion detection in IoT networks. Sensors 2023, 23, 8044. [Google Scholar] [CrossRef] [PubMed]
ElKashlan, M.; Elsayed, M.S.; Jurcut, A.D.; Azer, M. A machine learning-based intrusion detection system for iot electric vehicle charging stations (evcss). Electronics 2023, 12, 1044. [Google Scholar] [CrossRef]
Vitorino, J.; Praça, I.; Maia, E. Towards adversarial realism and robust learning for IoT intrusion detection and classification. Ann. Telecommun. 2023, 78, 401–412. [Google Scholar] [CrossRef]
Othman, T.S.; Abdullah, S.M. An intelligent intrusion detection system for internet of things attack detection and identification using machine learning. Aro-Sci. J. Koya Univ. 2023, 11, 126–137. [Google Scholar] [CrossRef]
Gad, A.R.; Nashat, A.A.; Barkat, T.M. Barkat. Intrusion detection system using machine learning for vehicular ad hoc networks based on ToN-IoT dataset. IEEE Access 2021, 9, 142206–142217. [Google Scholar] [CrossRef]
Yaras, S.; Dener, M. IoT-Based Intrusion Detection System Using New Hybrid Deep Learning Algorithm. Electronics 2024, 13, 1053. [Google Scholar] [CrossRef]
Faker, O.; Dogdu, E. Intrusion detection using big data and deep learning techniques. In Proceedings of the 2019 ACM Southeast Conference, Kennesaw, GA, USA, 18–20 April 2019; pp. 86–93. [Google Scholar]
Vinayakumar, R.; Alazab, M.; Soman, K.P.; Poornachandran, P.; Al-Nemrat, A.; Venkatraman, S. Deep learning approach for intelligent intrusion detection system. IEEE Access 2019, 7, 41525–41550. [Google Scholar] [CrossRef]
Farhana, K.; Rahman, M.; Ahmed, M.T. An intrusion detection system for packet and flow based networks using deep neural network approach. Int. J. Electr. Comput. Eng. 2020, 10, 2088–8708. [Google Scholar] [CrossRef]
Zhang, C.; Chen, Y.; Meng, Y.; Ruan, F.; Chen, R.; Li, Y.; Yang, Y. A novel framework design of network intrusion detection based on machine learning techniques. Secur. Commun. Netw. 2021, 2021, 6610675. [Google Scholar] [CrossRef]
Alsharaiah, M.A.; Abualhaj, M.; Baniata, L.H.; Al-Saaidah, A.; Kharma, Q.M.; Al-Zyoud, M.M. An innovative network intrusion detection system (NIDS): Hierarchical deep learning model based on Unsw-Nb15 dataset. Int. J. Data Netw. Sci. 2024, 8, 709–722. [Google Scholar] [CrossRef]
Jouhari, M.; Benaddi, H.; Ibrahimi, K. Efficient Intrusion Detection: Combining X² Feature Selection with CNN-BiLSTM on the UNSW-NB15 Dataset. arXiv 2024, arXiv:2407.14945. [Google Scholar]
Türk, F. Analysis of intrusion detection systems in UNSW-NB15 and NSL-KDD datasets with machine learning algorithms. Bitlis Eren Üniversitesi Fen. Bilim. Derg. 2023, 12, 465–477. [Google Scholar] [CrossRef]
Muhuri, P.S.; Chatterjee, P.; Yuan, X.; Roy, K.; Esterline, A. Using a long short-term memory recurrent neural network (lstm-rnn) to classify network attacks. Information 2020, 11, 243. [Google Scholar] [CrossRef]
Fu, Y.; Du, Y.; Cao, Z.; Li, Q.; Xiang, W. A deep learning model for network intrusion detection with imbalanced data. Electronics 2022, 11, 898. [Google Scholar] [CrossRef]
Yin, Y.; Jang-Jaccard, J.; Xu, W.; Singh, A.; Zhu, J.; Sabrina, F.; Kwak, J. IGRF-RFE: A hybrid feature selection method for MLP-based network intrusion detection on UNSW-NB15 dataset. J. Big Data 2023, 10, 15. [Google Scholar] [CrossRef]
Yoo, J.; Min, B.; Kim, S.; Shin, D.; Shin, D. Study on network intrusion detection method using discrete pre-processing method and convolution neural network. IEEE Access 2021, 9, 142348–142361. [Google Scholar] [CrossRef]
Alzughaibi, S.; El Khediri, S. A cloud intrusion detection systems based on dnn using backpropagation and pso on the cse-cic-ids2018 dataset. Appl. Sci. 2023, 13, 2276. [Google Scholar] [CrossRef]
Basnet, R.B.; Shash, R.; Johnson, C.; Walgren, L.; Doleck, T. Towards Detecting and Classifying Network Intrusion Traffic Using Deep Learning Frameworks. J. Internet Serv. Inf. Secur. 2019, 9, 1–17. [Google Scholar]
Thilagam, T.; Aruna, R. Intrusion detection for network based cloud computing by custom RC-NN and optimization. ICT Express 2021, 7, 512–520. [Google Scholar] [CrossRef]
Farahnakian, F.; Heikkonen, J. A deep auto-encoder based approach for intrusion detection system. In Proceedings of the 2018 20th International Conference on Advanced Communication Technology (ICACT), Chuncheon, Republic of Korea, 11–14 February 2018; pp. 178–183. [Google Scholar]
Mahmood, H.A.; Hashem, S.H.H. Network intrusion detection system (NIDS) in cloud environment based on hidden Naïve Bayes multiclass classifier. Al-Mustansiriyah J. Sci. 2018, 28, 134–142. [Google Scholar] [CrossRef]
Baig, M.M.; Awais, M.M.; El-Alfy, E.S.M. A multiclass cascade of artificial neural network for network intrusion detection. J. Intell. Fuzzy Syst. 2017, 32, 2875–2883. [Google Scholar] [CrossRef]
Stoian, N.A. Machine Learning for Anomaly Detection in IoT Networks: Malware Analysis on the IoT-23 Data Set. Bachelor’s Thesis, University of Twente, Enschede, The Netherlands, 2020. [Google Scholar]
Susilo, B.; Sari, R.F. Intrusion detection in IoT networks using deep learning algorithm. Information 2020, 11, 279. [Google Scholar] [CrossRef]
Szczepański, M.; Pawlicki, M.; Kozik, R.; Choraś, M. The application of deep learning imputation and other advanced methods for handling missing values in network intrusion detection. J. Comput. Sci. 2023, 10, 1–23. [Google Scholar] [CrossRef]
Kumar, P.; Bagga, H.; Netam, B.S.; Uduthalapally, V. Sad-iot: Security analysis of ddos attacks in iot networks. Wirel. Pers. Commun. 2022, 122, 87–108. [Google Scholar] [CrossRef]
Sarhan, M.; Layeghy, S.; Portmann, M. Feature analysis for machine learning-based IoT intrusion detection. arXiv 2021, arXiv:2108.12732. [Google Scholar]
Ferrag, M.A.; Friha, O.; Hamouda, D.; Maglaras, L.; Janicke, H. Edge-IIoTset: A new comprehensive realistic cyber security dataset of IoT and IIoT applications for centralized and federated learning. IEEE Access 2022, 10, 40281–40306. [Google Scholar] [CrossRef]
Abdulhammed, R.; Faezipour, M.; Musafer, H.; Abuzneid, A. Efficient network intrusion detection using PCA-based dimensionality reduction of features. In Proceedings of the 2019 International Symposium on Networks, Computers and Communications (ISNCC), Istanbul, Turkey, 18–20 June 2019; pp. 1–6. [Google Scholar]
Aleesa, A.; Younis, M.; Mohammed, A.A.; Sahar, N. Deep-intrusion detection system with enhanced UNSW-NB15 dataset based on deep learning techniques. J. Eng. Sci. Technol. 2021, 16, 711–727. [Google Scholar]
Ahmad, M.; Riaz, Q.; Zeeshan, M.; Tahir, H.; Haider, S.A.; Khan, M.S. Intrusion detection in internet of things using supervised machine learning based on application and transport layer features using UNSW-NB15 data-set. EURASIP J. Wirel. Commun. Netw. 2021, 2021, 10. [Google Scholar] [CrossRef]
Mohammed, B.; Gbashi, E.K. Intrusion detection system for NSL-KDD dataset based on deep learning and recursive feature elimination. Eng. Technol. J. 2021, 39, 1069–1079. [Google Scholar] [CrossRef]
Umair, M.B.; Iqbal, Z.; Faraz, M.A.; Khan, M.A.; Zhang, Y.-D.; Razmjooy, N.; Kadry, S. A network intrusion detection system using hybrid multilayer deep learning model. Big Data 2022, 12, 367–376. [Google Scholar] [CrossRef]
Choobdar, P.; Naderan, M.; Naderan, M. Detection and multi-class classification of intrusion in software defined networks using stacked auto-encoders and CICIDS2017 dataset. Wirel. Pers. Commun. 2022, 123, 437–471. [Google Scholar] [CrossRef]
Dong, S.; Xia, Y.; Peng, T. Network abnormal traffic detection model based on semi-supervised deep reinforcement learning. IEEE Trans. Netw. Serv. Manag. 2021, 18, 4197–4212. [Google Scholar] [CrossRef]
Farhan, B.I.; Jasim, A.D. Performance analysis of intrusion detection for deep learning model based on CSE-CIC-IDS2018 dataset. Indones. J. Electr. Eng. Comput. Sci. 2022, 26, 1165–1172. [Google Scholar] [CrossRef]
Farhan, R.I.; Maolood, A.T.; Hassan, N. Performance analysis of flow-based attacks detection on CSE-CIC-IDS2018 dataset using deep learning. Indones. J. Electr. Eng. Comput. Sci. 2020, 20, 1413–1418. [Google Scholar] [CrossRef]
Lin, P.; Ye, K.; Xu, C.Z. Dynamic network anomaly detection system by using deep learning techniques. In Proceedings of the Cloud Computing–CLOUD 2019: 12th International Conference, Held as Part of the Services Conference Federation, SCF 2019, San Diego, CA, USA, 25–30 June 2019; Proceedings 12. Springer International Publishing: Berlin/Heidelberg, Germany, 2019; pp. 161–176. [Google Scholar]
Liu, G.; Zhang, J. CNID: Research of network intrusion detection based on convolutional neural network. Discret. Dyn. Nat. Soc. 2020, 2020, 4705982. [Google Scholar] [CrossRef]
Di Mauro, M.; Galatro, G.; Liotta, A. Experimental review of neural-based approaches for network intrusion management. IEEE Trans. Netw. Serv. Manag. 2020, 17, 2480–2495. [Google Scholar] [CrossRef]
Jahangir, M.T.; Wakeel, M.; Asif, H.; Ateeq, A. Systematic Approach to Analyze The Avast IOT-23 Challenge Dataset For Malware Detection Using Machine Learning. In Proceedings of the 2023 18th International Conference on Emerging Technologies (ICET), Peshawar, Pakistan, 6–7 November 2023; pp. 234–239. [Google Scholar]
Balaji, R.; Deepajothi, S.; Prabaharan, G.; Daniya, T.; Karthikeyan, P.; Velliangiri, S. Survey on intrusions detection system using deep learning in iot environment. In Proceedings of the 2022 International Conference on Sustainable Computing and Data Communication Systems (ICSCDS), Erode, India, 7–9 April 2022; pp. 195–199. [Google Scholar]
Garcia, S.; Parmisano, A.; Erquiaga, M.J. IoT-23: A Labeled Dataset with Malicious and Benign IoT Network Traffic. Zenodo. 2021. Available online: https://zenodo.org/records/4743746 (accessed on 20 February 2024).
Abdalgawad, N.; Sajun, A.; Kaddoura, Y.; Zualkernan, I.A.; Aloul, F. Generative deep learning to detect cyberattacks for the IoT-23 dataset. IEEE Access 2021, 10, 6430–6441. [Google Scholar] [CrossRef]
Patro, S.G.; Sahu, D.-K.K. Normalization: A preprocessing stage. Int. Adv. Res. J. Sci. Eng. Technol. 2015, 2, 20–22. [Google Scholar] [CrossRef]
Rodríguez, P.; Bautista, M.A.; Gonzalez, J.; Escalera, S. Beyond one-hot encoding: Lower dimensional target embedding. Image Vis. Comput. 2018, 75, 21–31. [Google Scholar] [CrossRef]
Jie, L.; Jiahao, C.; Xueqin, Z.; Yue, Z.; Jiajun, L. One-hot encoding and convolutional neural network based anomaly detection. J. Tsinghua Univ. Sci. Technol. 2019, 59, 523–529. [Google Scholar]
Elmasry, W.; Akbulut, A.; Zaim, A.H. Empirical study on multiclass classification-based network intrusion detection. Comput. Intell. 2019, 35, 919–954. [Google Scholar] [CrossRef]
Bagui, S.; Li, K. Resampling imbalanced data for network intrusion detection datasets. J. Big Data 2021, 8, 1–41. [Google Scholar] [CrossRef]
Mbow, M.; Koide, H.; Sakurai, K. Handling class imbalance problem in intrusion detection system based on deep learning. Int. J. Netw. Comput. 2022, 12, 467–492. [Google Scholar] [CrossRef]
EL-Habil, B.Y.; Abu-naser, S.S. Global climate prediction using deep learning. J. Theor. Appl. Inf. Technol. 2022, 100, 24. [Google Scholar]
Zhendong, S.; Jinping, M. Deep learning-driven MIMO: Data encoding and processing mechanism. Phys. Commun. 2022, 57, 101976. [Google Scholar] [CrossRef]
Xin, Z.; Chunjiang, Z.; Jun, S.; Kunshan, Y.; Min, X. Detection of lead content in oilseed rape leaves and roots based on deep transfer learning and hyperspectral imaging technology. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2022, 290, 122288. [Google Scholar] [CrossRef]
Hesham, K.; Mashaly, M. Advanced Hybrid Transformer-CNN Deep Learning Model for Effective Intrusion Detection Systems with Class Imbalance Mitigation Using Resampling Techniques. Future Internet 2024, 16, 481. [Google Scholar] [CrossRef]
Novaria, K.Y.; Nurmaini, S.; Stiawan, D.; Zarkasi, A. Automatic features extraction using autoencoder in intrusion detection system. In Proceedings of the 2018 International Conference on Electrical Engineering and Computer Science (ICECOS), Pangkal, Indonesia, 2–4 October 2018; pp. 219–224. [Google Scholar]
Hesham, K.; Mashaly, M. Enhanced Hybrid Deep Learning Models-Based Anomaly Detection Method for Two-Stage Binary and Multi-Class Classification of Attacks in Intrusion Detection Systems. Algorithms 2025, 18, 69. [Google Scholar] [CrossRef]
Anupriya, G.; Majumdar, A. Discriminative autoencoder for feature extraction: Application to character recognition. Neural Process. Lett. 2019, 49, 1723–1735. [Google Scholar]
Chen, X.; Ma, L.; Yang, X. Stacked denoise autoencoder based feature extraction and classification for hyperspectral images. J. Sens. 2016, 2016, 3632943. [Google Scholar]
Sarhan, M.; Layeghy, S.; Portmann, M. Towards a standard feature set for network intrusion detection system datasets. Mob. Netw. Appl. 2022, 27, 357–370. [Google Scholar] [CrossRef]
Veeramreddy, J.; Prasad, K. Anomaly-Based Intrusion Detection System. In Computer and Network Security; Alexandrov, A.A., Ed.; IntechOpen: Rijeka, Croatia, 2019. [Google Scholar] [CrossRef]
Chen, C.; Song, Y.; Yue, S.; Xu, X.; Zhou, L.; Lv, Q.; Yang, L. FCNN-SE: An Intrusion Detection Model Based on a Fusion CNN and Stacked Ensemble. Appl. Sci. 2022, 12, 8601. [Google Scholar] [CrossRef]
Ahmed, A.; Mashaly, M. Addressing the class imbalance problem in network intrusion detection systems using data resampling and deep learning. J. Supercomput. 2023, 79, 10611–10644. [Google Scholar]
Powers, D.M.W. Evaluation: From Precision, Recall, and F-Measure to ROC, Informedness, Markedness & Correlation. J. Mach. Learn. Technol. 2011, 2, 37–63. [Google Scholar]
Hesham, K.; Mashaly, M. Improving Anomaly Detection in IDS with Hybrid Auto Encoder-SVM and Auto Encoder-LSTM Models Using Resampling Methods. In Proceedings of the 2024 6th Novel Intelligent and Leading Emerging Sciences Conference (NILES), Giza, Egypt, 19–21 October 2024; pp. 34–39. [Google Scholar]
Tamer, A.A.; Mostafa, Y.; El-khaleq, A.A.; Mashaly, M. Anomaly-based intrusion detection system using one-dimensional convolutional neural network. Procedia Comput. Sci. 2023, 220, 78–85. [Google Scholar]
Khalid, M.H.; Awad, A.I.; Khashaba, M.M.; Mohamed, E.R. New improved multi-objective gorilla troops algorithm for dependent tasks offloading problem in multi-access edge computing. J. Grid Comput. 2023, 21, 21. [Google Scholar]
Vanessa, M.; Schacht, S.; Lanquillon, C. Towards energy-efficient deep learning: An overview of energy-efficient approaches along the deep learning lifecycle. arXiv 2023, arXiv:2303.01980. [Google Scholar]

Figure 1. System architecture for both binary and multi-class classification tasks utilizing the IoT-23 dataset.

Figure 2. Design of the two-stage process of binary and multi-class classification.

Figure 3. CNN model layer configuration for binary classification tasks (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

Figure 4. CNN model layer configuration for multi-class classification excluding normal class (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

Figure 5. CNN model layer configuration for multi-class classification including normal class (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

Figure 6. Binary classification Autoencoder layer configuration (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

Figure 7. Multi-class Autoencoder layer configuration (excluding normal) (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

Figure 8. Multi-class Autoencoder layer configuration (including normal) (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

Figure 9. Binary classification DNN layer configuration (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

Figure 10. Multi-class DNN layer configuration (excluding normal) (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

Figure 11. Multi-class DNN layer configuration (including normal) (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

Figure 12. Binary classification CNN-MLP layer configuration (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

Figure 13. Multi-class CNN-MLP layer configuration (excluding normal) (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

Figure 14. Multi-class CNN-MLP layer configuration (including normal) (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

Figure 15. Confusion matrix for binary classification with the CNN-MLP model on (a) IoT-23 dataset (b) NF-BoT-IoT-v2 dataset.

Figure 16. Comparison of the proposed CNN-MLP model with binary classifiers on IoT-23 dataset.

Figure 17. Comparison of the proposed CNN-MLP model with binary classifiers on NF-BoT-IoT-v2 dataset.

Figure 18. Confusion matrix for multi-class classification (excluding the normal class) with the CNN-MLP model on the IoT-23 dataset.

Figure 19. Confusion matrix for multi-class classification (excluding the normal class) with the CNN-MLP model on NF-BoT-IoT-v2 dataset.

Figure 20. Comparison of the proposed CNN-MLP model with multi-class classifiers excluding normal class on IoT-23 dataset.

Figure 21. Comparison of the proposed CNN-MLP model with multi-class classifiers excluding normal class on NF-BoT-IoT-v2 dataset.

Figure 22. Confusion matrix for multi-class classification (including the normal class) with the CNN-MLP model on the IoT-23 dataset.

Figure 23. Confusion matrix for multi-class classification (including the normal class) with the CNN-MLP model on NF-BoT-IoT-v2 dataset.

Figure 24. Proposed CNN-MLP versus multi-class classifiers including normal class on IoT-23 dataset.

Figure 25. Proposed CNN-MLP versus multi-class classifiers including normal class on NF-BoT-IoT-v2 dataset.

Figure 26. Confusion matrix of the CNN-MLP model on the IoT-23 dataset, focusing on Zero-Day attack detection.

Figure 27. Confusion matrix of the CNN-MLP model on the IoT-23 dataset, highlighting Zero-Day attack detection using synthetic data.

Table 1. Overview of related work in binary classification.

Author	Dataset	Year	Utilized Technique	Accuracy	Contribution	Limitations
Anandaraj Mahalingam et al. [11]	IoT-23	2023	ROAST-IoT	99.15%	This research presents ROAST-IoT, an AI-powered architecture for intrusion detection in IoT environments. The framework uses a multi-modal architecture to capture the interdependencies among diverse network traffic patterns. Intelligent sensors continuously monitor system behavior and transmit telemetry to a cloud-based repository for analysis. The framework is validated on established reference datasets: IoT-23, Edge-IIoT, ToN-IoT, and UNSW-NB15.	The study notes the need to integrate more advanced deep learning models to further protect IIoT infrastructures against evolving cybersecurity threats.
Mohamed ElKashlan et al. [12]	IoT-23	2023	Filtered classifier	99.2%	This study presents a machine learning classification approach for identifying malicious traffic in IoT environments. The proposed system uses a dataset of real IoT traffic, and various classification algorithms are assessed for performance.	The research relies exclusively on the IoT-23 dataset, which may restrict its applicability across the full range of attack scenarios in IoT-driven EVCS ecosystems. Future work should incorporate larger datasets and deep learning architectures for a more thorough evaluation of the framework.
João Vitorino et al. [13]	IoT-23	2023	RF, XGB, LGBM, and IFOR	99%	This research defines the requirements for crafting a realistic adversarial cyber-attack and introduces a methodology for robustness analysis using an adversarial evasion strategy designed to bypass security measures. The methodology was applied to assess the robustness of three supervised learning models (RF, XGB, and LGBM) together with the unsupervised IFOR model.	Although adversarial training strengthened the models’ defenses against attacks, certain algorithms, notably LGBM, remain highly susceptible to adversarial perturbations, particularly in imbalanced multi-class classification. This calls for further investigation of defense strategies, together with evaluation on new datasets and adaptation to emerging attack types.
Trifa S. Othman and Saman M. Abdullah [14]	IoT-23	2023	ANN	99%	This research presents three computational intelligence methods for two-phase classification covering both binary and multi-class categorization. The methods form the basis of an intrusion detection mechanism for IoT infrastructures, identifying a range of cyber intrusions and distinguishing their attack types. Using the IoT-23 dataset, the research develops an intelligent IDS capable of identifying malicious activity and categorizing attack vectors in real time.	The SMOTE technique did not improve the predictive performance of the proposed IIDS framework on the IoT-23 dataset, even though it is conventionally effective at addressing class imbalance.
Abdallah R. Gad et al. [15]	ToN-IoT	2020	XGBoost	98.2%	This work introduces a data-centric IoT/IIoT dataset with annotated ground truth that distinguishes normal and attack classes. A type attribute identifies the attack sub-classes, enabling multi-class classification. The TON_IoT dataset covers a wide range of telemetry from IoT and IIoT infrastructures, including system-generated event records and network communication flows. The data comes from an emulation of a medium-scale network conducted at the Cyber Range and IoT Research Facilities at UNSW Canberra, Australia.	The study identifies pronounced class imbalance and incomplete data in the ToN-IoT dataset, despite the use of Chi2 for feature selection and SMOTE for class balancing, which may affect the generalizability and scalability of the model in real-world applications.
Sami Yaras and Murat Dener [16]	ToN-IoT	2024	CNN-LSTM	98.75%	This research was carried out in the Colab environment using PySpark and the Apache Spark framework together with the Keras deep learning library and the Scikit-Learn machine learning toolkit. The ‘CICIoT2023’ and ‘TON_IoT’ datasets were used for training and evaluating the model. Feature reduction was performed using the correlation method so that only the most relevant features were included. The authors designed a hybrid deep learning model combining a 1D-CNN with an LSTM architecture to improve predictive performance and robustness.	Although the model achieved high accuracy, the scale of the data processed considerably increased the duration of both the training and evaluation phases. Future optimization should balance accuracy against computational efficiency and cost.
João Vitorino et al. [13]	ToN-IoT	2023	RF, XGB, LGBM, and IFOR	85%	This research defines the fundamental constraints required to craft a realistic malicious cyber-attack and proposes a methodical framework for robustness analysis through a targeted perturbation grounded in practical execution. The approach was used to examine three supervised learning models (RF, XGB, and LGBM) along with the unsupervised IFOR model, assessing their resilience under adversarial conditions.	Although adversarial training strengthened the models’ defenses against attacks, certain algorithms, notably LGBM, remain considerably susceptible to adversarial perturbations in imbalanced multi-class classification. This calls for further investigation of advanced defense methods, together with validation on new datasets and adaptation to emerging, more complex attack strategies.
Osama Faker and Erdogan Dogdu [17]	CICIDS2017	2019	DNN	99.9%	This research combines deep learning with big data analytics for intrusion detection, using RF, GBT, and a DFFNN to improve detection performance. It assesses feature importance, evaluates the models on the UNSW-NB15 and CICIDS2017 datasets with 5-fold cross-validation, integrates Keras with Apache Spark, and uses ensemble learning techniques to improve predictive performance.	The work does not address scalability concerns related to distributed processing and offers limited exploration of advanced feature selection techniques.
R. Vinayakumar et al. [18]	CICIDS2017	2019	DNN	93.1%	This research explores the application of DNNs to create an adaptable IDS for identifying and categorizing emerging cyber threats, and assesses their performance against conventional machine learning classifiers on standard benchmark datasets.	A notable shortcoming is the inadequate assessment of scalability and performance in distributed environments, especially for sophisticated deep neural network structures.
Kaniz Farhana et al. [19]	CICIDS2017	2020	DNN	99%	This work presents a DNN-based IDS developed with Keras on top of TensorFlow. The model was applied to a recent, significantly imbalanced dataset comprising 79 features. The dataset contains packet-level details, flow-based statistics, and supplementary data, with some classes having limited representation.	The model fails to accurately classify ‘Heartbleed’, ‘Infiltration’, and ‘Web Attack SQL Injection’, which underscores the difficulties introduced by class imbalance arising from the inadequate representation of these attack categories in the dataset.
Chongzhen Zhang et al. [20]	CICIDS2017	2021	SAE	99.92%	This study presents an IDS architecture structured into five components: data preprocessing, AE processing, database integration, classification, and dynamic feedback. The AE compresses the data, while the classification component produces the detection results. The database stores the compressed features, supporting later assessment and model refinement.	The system’s recovery and re-training capabilities require further development to improve flexibility and efficiency.
Mohammad A. Alsharaiah et al. [21]	UNSW-NB15	2024	AT-LSTM	92.2%	This work presents a new NIDS approach that uses LSTM and attention mechanisms to analyze both the temporal and spatial patterns in network data. The UNSW-NB15 dataset is used with varied splits for the training and evaluation stages to assess the model thoroughly.	Complicated structural design. Although the AT-LSTM model performs well in terms of accuracy on the UNSW-NB15 dataset, it does not address the inherent class imbalance and is not evaluated on alternative datasets such as NSL-KDD, which could limit its ability to generalize across diverse environments.
Mohammed Jouhari et al. [22]	UNSW-NB15	2024	CNN-BiLSTM	97.90%	This study presents an efficient IDS model that combines BiLSTM with a compact CNN and employs dimensionality reduction to optimize the model’s structure and performance.	Complicated structural design. The research focused mainly on refining the IDS model to meet computational limitations, which may have constrained its investigation of other aspects, such as the model’s capacity for broader generalization and its robustness across diverse datasets.
Fuat Türk [23]	UNSW-NB15	2023	RF	98.6%	In this research, intrusion detection was evaluated on the UNSW-NB15 dataset using machine learning and deep learning techniques, reaching 98.6% accuracy for binary classification and 98.3% for multi-class classification.	Attack classes are occasionally misclassified, indicating a need for better dataset balancing and real-time model adjustments to improve effectiveness.
Osama Faker and Erdogan Dogdu [17]	UNSW-NB15	2019	DNN	99.16%	This study analyzes machine learning methods using 5-fold cross-validation, applies ensemble techniques with Apache Spark, and integrates deep learning by linking Apache Spark with Keras.	This research omits a comprehensive evaluation of the scalability of processing across multiple systems and does not explore sophisticated feature selection techniques.
Pramita Sree Muhuri et al. [24]	NSL-KDD	2020	LSTM-RNN	96.51%	This paper introduces an intrusion detection approach that combines RNN with LSTM and uses a genetic algorithm to select the optimal features. It shows that LSTM-RNN classifiers, when given appropriate features, improve intrusion detection on the NSL-KDD dataset.	Confined to binary (two-class) classification. Intricate structure. The research relies entirely on the NSL-KDD dataset and includes no analysis of training time or evaluation on a GPU-accelerated platform. Consequently, the model’s behavior and efficiency on more current datasets or live network traffic may be overlooked.
Yanfang Fu et al. [25]	NSL-KDD	2022	CNN and BiLSTMs	90.73%	This study presents DLNID, a framework for detecting anomalies in network traffic that combines an attention mechanism with Bi-LSTM to improve detection accuracy. The model uses a CNN for feature extraction, attention mechanisms to optimize channel weights, and Bi-LSTM to learn sequential features.	Confined to binary (two-class) classification. Intricate structure. The DLNID model shows improved performance on the KDDTest+ dataset, yet it remains untested in operational environments or for real-time intrusion detection, which restricts its broader applicability.
Wen Xu et al. [26]	NSL-KDD	2021	Autoencoder	90.61%	This work presents a 5-layer autoencoder model for network anomaly detection, supported by a thorough assessment of its performance metrics.	Confined to binary (two-class) classification. Although the 5-layer AE-based model performs well on the NSL-KDD dataset, its effectiveness in real-world applications and across a broader range of intrusion types and datasets remains untested.
Jihoon Yoo et al. [27]	NSL-KDD	2021	CNN	83%	This study examines a CNN-based classifier designed to address class imbalance in network traffic datasets. The technique preprocesses and transforms the data before analysis, including data cleaning, feature engineering, and restructuring, to improve the model’s performance and accuracy.	Suboptimal performance. Confined to two-class classification. Although the proposed method improves classification for some categories and reduces computational complexity, it performs poorly on specific classes and may not fully mitigate the challenges posed by class imbalance.
Saud Alzughaibi and Salim El Khediri [28]	CSE-CIC-IDS2018	2023	MLP-BP, MLP-PSO	98.97%	This work studies cloud IDS improvement using two new deep neural networks: one uses an MLP with BP, the other an MLP with PSO. These models aim to improve IDS effectiveness in intrusion detection and response while ensuring adaptive, fast action.	Suboptimal accuracy. Intricate structure. This work achieves high prediction accuracy using MLP-BP and MLP-PSO, yet lacks real-world or cloud tests. Future studies exploring diverse optimization methods may further improve the findings.
Ram B. Basnet et al. [29]	CSE-CIC-IDS2018	2019	MLP	98.68%	This study performs a detailed evaluation of deep learning models for intrusion detection. It systematically compares frameworks such as Keras, TensorFlow, Theano, fast.ai, and PyTorch on the CSE-CIC-IDS2018 dataset.	This research highlights the accuracy and computational efficiency of fast.ai; however, its validation is confined to a single dataset, without evaluation on other datasets or in practical network deployment scenarios. Future research should emphasize careful hyperparameter optimization and investigate other deep learning methods.
T. Thilagam and R. Aruna [30]	CSE-CIC-IDS2018	2021	RC-NN-IDS	94%	This research unveils a highly intricate IDS, architected with a meticulously tailored RC-NN framework, synergistically fine-tuned via the ALO algorithm to profoundly amplify detection precision and operational efficacy.	Complicated architecture. The proposed RC-NN model surpasses current classifiers in intrusion detection but lacks a management module for initiating proactive countermeasures following identification.
Fahimeh Farahnakian and Jukka Heikkonen [31]	KDD-CUP’99	2018	DAE	96.53%	This study introduces a novel IDS, rooted in the DAE framework, to tackle this critical issue. The model’s core strength lies in its sequential, layer-wise training strategy, meticulously designed to minimize overfitting and navigate past local optima. This refined training process fundamentally enhances the system’s robustness and significantly improves its capacity for accurate and effective intrusion detection.	This study used the KDD99 dataset, which exhibits a high degree of redundancy. The proposed Deep DAE achieves high accuracy but does not investigate the advantages of sparsity constraints or alternative deep learning methods for further enhancement.
Hafza A. Mahmood and Soukaena H. Hashem [32]	KDD-CUP’99	2017	HNB	97%	This research suggests employing an HNB Classifier to tackle DoS attacks. The HNB model, an advanced data mining technique, relaxes the conditional independence constraint of the NB classifier. It integrates HNB with discretization and feature selection to enhance performance and minimize processing time by optimizing feature relevance.	This study used the KDD99 dataset, which exhibits a high degree of redundancy. The proposed system demonstrates high accuracy in detecting DoS attacks using specific feature selections from the KDD Cup 99 dataset. Nonetheless, it overlooks constraints related to the NSL-KDD dataset and fails to address how variations in feature selection could impact effectiveness across different cloud computing settings.
Mirza M. Baig et al. [33]	KDD-CUP’99	2017	ANNs	98.25%	This work pioneers a methodology for developing a highly durable classifier, leveraging a cascading architecture of boosting-reinforced neural networks, and rigorously demonstrates its performance using two distinct intrusion detection data repositories. This strategy, resembling the one-vs-rest approach but augmented with additional filtering of examples, leads to improved classifier effectiveness.	The KDD99 dataset, which is characterized by a greater level of redundancy, was employed in this study. This study observes the proposed method’s adeptness with dominant KDD’99 classes, but a marked performance decline with sparse classes. Additionally, its diminished performance on UNSW-NB15 emphasizes the acute requirement for refined techniques to manage class imbalance and a more comprehensive evaluation across diverse data domains.

Table 2. Overview of related work in multi-class classification.

Author	Dataset	Year	Utilized Technique	Accuracy	Contribution	Limitations
Mohamed ElKashlan et al. [12]	IoT-23	2023	Filtered classifier	99.2%	This research pioneers a sophisticated machine learning-based framework, engineered for the precise identification of malicious traffic within the complex landscape of IoT networks. It employs a uniquely curated, highly representative dataset, meticulously designed to mirror authentic, real-world IoT traffic dynamics, and subsequently conducts a comprehensive, comparative performance assessment across a diverse spectrum of state-of-the-art classification algorithms.	This study’s scope is limited by its exclusive reliance on the IoT-23 dataset, potentially overlooking diverse threat vectors present in real-world IoT-EVCS scenarios. Future work should expand its analysis to include a richer array of datasets and leverage advanced deep learning techniques to achieve a more comprehensive evaluation.
Nicolas-Alin Stoian [34]	IoT-23	2020	RF	99.5%	This research delves into the critical area of IoT network security, meticulously evaluating the capacity of machine learning algorithms to discern anomalous patterns within intricate network data streams, thereby enhancing network resilience. It evaluates various ML algorithms that have demonstrated success in comparable contexts and performs a comparative analysis based on multiple parameters and methodologies.	The study’s limitations involve data splitting and encoding issues, potential skewing from correlation elimination, and mixed results with the Multi-Layer Perceptron. Future work should focus on using the complete dataset, assessing minimal data requirements, comprehending the Decision Tree’s exceptional precision, and evaluating cutting-edge neural network models.
Bambang Susilo and Riri Fitri Sari [35]	IoT-23	2020	CNN	91.24%	This investigation examines diverse machine learning and deep learning strategies, paired with conventional datasets, to fortify the security of IoT systems. An algorithm specifically designed for detecting DoS attacks has been developed using DL methods.	The study’s limitations include the lack of exploration of various algorithms beyond RF, CNN, and MLP, and the need for further research to optimize batch sizes and develop models that combine different machine learning or deep learning algorithms for real-time intrusion detection.
Mateusz Szczepański et al. [36]	IoT-23	2022	RF	96.30%	This research confronts the problem of effectively managing absent or incomplete data in real-world applications of computational intelligence. It presents two experimental studies that evaluate various methods for imputing missing values in Random Forest classifiers, which were trained using contemporary cyber security benchmark datasets, including CICIDS2017 and IoT-23.	The research’s limitation is its focus on contrasting imputation techniques without thoroughly exploring how deep learning imputation affects different machine learning classifiers; future research should address this gap and investigate explainability methods and autoencoder latent representations.
Abdallah R. Gad et al. [15]	ToN-IoT	2020	XGBoost	97.8%	This work innovates by delivering a highly detailed, data-centric IoT/IIoT dataset, meticulously annotated with ground truth labels for precise normal/attack differentiation. It further incorporates attack subtype categorization, enabling nuanced multi-class analysis. The TON_IoT dataset aggregates telemetry from real-world IoT/IIoT services, system logs, and network traffic, all captured from a genuine, medium-scale network at UNSW Canberra’s Cyber Range and IoT Labs	This research is constrained by inherent data complexities: class imbalance and missing data points within the ToN-IoT dataset. Despite implementing Chi2 feature selection and SMOTE for mitigation, these factors may still significantly impede the model’s ability to scale effectively and achieve robust generalization in authentic, operational deployments.
Prahlad Kumar et al. [37]	Bot-IOT	2021	DT, RF, KNN, NB, and ANN	99.6%	This research conducts a rigorous, dual-pronged investigation into DoS and DDoS attack vectors, employing both Machine Learning and Deep Learning paradigms. The study’s primary training data is the Bot-IoT dataset, originating from the UNSW Canberra Cyber Centre. To achieve granular feature extraction from UNSW dataset’s pcap files, the ARGUS software was deployed. This process facilitates an in-depth dissection of attack signatures, enabling precise identification and hierarchical classification of malicious actions within IoT environments.	This investigation establishes that while both machine learning and deep learning paradigms achieve demonstrable efficacy in discerning DoS and DDoS attack signatures, deep learning constructs demand a substantially elevated consumption of computational assets for optimal functionality, positioning them as ideal for environments with high resource availability. Conversely, machine learning models exhibit enhanced efficiency in scenarios where resources are constrained and data traffic remains modest.
Prabhat Kumar et al. [37].	ToN-IoT	2021	ANN	99.44%	This study pioneers a groundbreaking P2IDF, specifically engineered for Software-Defined IoT-Fog architectures, implementing a SAE to encode data into a form that effectively neutralizes inference-driven malicious intrusions. This study rigorously scrutinizes the operational effectiveness of an ANN driven intrusion detection paradigm, employing the ToN-IoT dataset, via a comparative analysis of its performance across both the unaltered and modified data configurations. The framework demonstrates a robust ability to detect attacks while maintaining stringent information confidentiality.	The research emphasizes that although the P2IDF structure surpasses contemporary methods in terms of detection accuracy and precision, upcoming efforts will concentrate on creating a real-time prototype to tackle privacy and security concerns in Software-Defined IoT-Fog networks.
Mohanad Sarhan et al. [38]	ToN-IoT	2022	DFF, RF	96.10%, 97.35%	This study meticulously scrutinizes the salience of attributes across six Network Intrusion Detection System (NIDS) data repositories, applying three divergent attribute selection techniques: Chi-squared, Mutual Information, and interrelation analysis. The chosen attribute subsets were then rigorously evaluated via Deep Feedforward Neural Networks and Random Forest classification algorithms, culminating in 414 comprehensive experimental runs. A pivotal revelation indicates that a diminished attribute subset can equal, or even surpass, the detection efficacy of the complete attribute set, thus highlighting the profound effectiveness of attribute selection in optimizing NIDS operational efficiency and diagnostic accuracy.	A crucial shortcoming highlighted in this research is the lack of a one-size-fits-all approach for identifying the most effective feature sets, given that feature significance can vary extensively across different datasets and classifiers. This necessitates a thorough, context-specific analysis for each situation. Additionally, certain unrealistic features, such as TTL-related attributes within synthetic datasets like UNSW-NB15, must be excluded to preserve the integrity and reliability of the assessment findings
Mohamed Amine Ferrag et al. [39]	Edge-IIoT	2022	DNN	94.67%	This work unveils Edge-IIoTset, a comprehensive cyber security dataset for the application of machine learning-based intrusion detection systems in IoT and IIoT contexts. Designed with a customized IoT/IIoT testbed, featuring a diverse array of devices, sensors, protocols, and cloud/edge setups, it supports both centralized and federated learning models, ensuring its relevance and applicability in real-world environments.	A fundamental constraint of the envisioned dataset lies in its principal objective of mitigating the inadequacies of prevailing repositories by embedding cutting-edge advancements and multi-tiered architectures. However, its efficacy and authenticity in serving as a robust benchmark for assessing machine learning-driven intrusion detection frameworks across heterogeneous, real-world threat landscapes and evolving technological paradigms necessitate rigorous substantiation.
Osama Faker and Erdogan Dogdu [17].	CICIDS2017	2019	DNN	99.56%	This study advances the field of intrusion detection by harnessing the synergy between Big Data analytics and Deep Learning methodologies, deploying Gradient Boosted Trees, Random Forest ensembles, and Deep Feed-Forward Neural Networks to fortify detection capabilities. It examines features using a homogeneity measure and performs evaluations utilizing the UNSW-NB15 and CICIDS2017 data repositories through a five-segment cross-validation methodology, employing Apache Spark and Keras.	The paper does not include an analysis of scalability for distributed systems or advanced feature selection techniques.
R. Vinayakumar et al. [18]	CICIDS2017	2019	DNN	95.6%	This research endeavors to engineer a supremely adaptable IDS, leveraging deep neural network architectures to discern and categorize dynamic cybernetic assaults. It evaluates diverse data repositories and algorithmic methodologies, contrasting Deep Neural Networks with traditional classification paradigms across standardized malware benchmarks to pinpoint the optimal technique for emergent threat recognition	This study exhibits restricted capacity for expansion and operational efficacy assessment within decentralized architectures and sophisticated Deep Neural Networks.
Kaniz Farhana et al. [19].	CICIDS2017	2020	DNN	99%	This study fabricates an IDS predicated upon a deep neural network topology, authenticated against a modern, non-uniform dataset comprising 79 unique descriptors. Assembled utilizing the Keras and TensorFlow computational libraries, the model examines data at the packet level, flow-derived metrics, and pertinent supplementary information.	This study’s deficiency in recognizing ‘Heartbleed’, ‘Infiltration’, and ‘Web Attack SQL Injection’ underscores the challenge of managing class disparity, originating from the inadequate quantity of exemplars representing these particular intrusions.
Razan Abdulhammed et al. [40].	CICIDS2017	2019	PCA, RF, LDA and QDA	99.6%	This study applies PCA for feature dimensionality reduction, followed by utilizing the reduced features to train multiple classifiers, including RF, LDA, and QDA, for an IDS.	This investigation fails to confront the exigency for immediate alterations to training data repositories or the formulation of IDS architectures that possess adaptive dynamism and continuous knowledge acquisition for online network intrusion identification.
Osama Faker and Erdogan Dogdu [17].	UNSW-NB15	2019	DNN	97.01%	This investigation performs a valuation of machine learning paradigms via a quintuple-partition cross-validation procedure, merging Keras with Apache Spark for deep learning execution, and employing Apache Spark MLlib for ensemble-based algorithmic frameworks.	This work disregards the scrutiny of growth potential within distributed computational systems and the integration of refined processes for feature extraction.
A. M. ALEESA et al. [41]	UNSW-NB15	2021	ANN	99.59%	This investigation appraised the operational proficiency of deep learning paradigms for both dual-class and multi-categorical classification through the utilization of a reformed data compilation, integrating all data into one file and forming new multi-class labels derived from attack families.	This work remains confined to regulated experimentation employing the UNSW-NB15 dataset and precludes the instantiation of deep learning paradigms in operational, real-world deployments.
Muhammad Ahmad et al. [42]	UNSW-NB15	2021	RF	97.37%	This research posits the utilization of attribute agglomerations associated with Flow, TCP, and MQTT within the UNSW-NB15 dataset to alleviate predicaments such as category imbalance, excessive dimensionality, and model overfitting, concurrently implementing ANN, SVM, and RF for classification objectives.	This study’s accentuation on specific communication paradigms and data interpolation techniques may hinder the model’s potential for comprehensive adaptability to divergent IoT protocol ecosystems and data compilations.
Fuat Turk [23].	UNSW-NB15	2023	RF	98.3%	This work employs advanced computational learning and deep neural network methods to discern security violations within the UNSW-NB15 and NSL-KDD data repositories, attaining 98.6% correctness in binary categorization and 98.3% accuracy in poly-class labeling on the UNSW-NB15 data compilation.	Occasional misclassification of class types suggests the need for improved dataset equalization and continuous model refinements to boost performance.
Bilal Mohammed, Ekhlas K. Gbashi [43].	NSL KDD	2021	RNN	94%	The study uses RFE to select features and applies both DNN and RNN for categorization, attaining 94% accuracy across five distinct categories using the RNN learning model.	Restricted to classification across various categories. The approach performs well on the NSL-KDD dataset but needs further evaluation on other datasets, exploration of different feature selection techniques, and consideration for real-time implementation.
Muhammad Basit Umair et al. [44]	NSL KDD	2022	Multilayer CNN-LSTM	99.5%	In response to the shortcomings of traditional techniques, this work introduces a data-driven method for detecting unauthorized activities. It involves the process of extracting relevant features, classification through a multi-layered CNN with softmax, followed by additional classification using a multi-layered DNN.	Restricted to classification across various categories. Intricate structure. The suggested IDS shows excellent accuracy and robust performance metrics; however, its performance has not been tested on a variety of datasets or under real-world scenarios, which could hinder its capacity to adapt to different conditions.
Padideh Choobdar et al. [45]	NSL KDD	2021	Sparse Stacked Auto-Encoders	98.5%	This study outlines the creation of a controller component tailored for an intrusion detection system based on SDN principles, incorporating an initial pre-training phase using sparse stacking autoencoders, followed by training with a softmax classifier, and concludes with the fine-tuning of parameters to optimize performance.	Restricted to classification across various categories. The suggested intrusion detection system demonstrates exceptional accuracy but lacks testing on distributed SDN networks and advanced DL methods like GANs, and could benefit from enhanced hardware for quicker training.
Shi Dong et al. [46]	NSL KDD	2021	SSDDQN	79.43%	This study presents a semi-supervised optimization method for network anomaly traffic detection, based on a DDQN approach, which is a prominent technique in Deep Reinforcement Learning. In the proposed SSDDQN model, the current network utilizes an autoencoder to reconstruct traffic features, followed by a deep neural network classifier. The target network, on the other hand, first applies the unsupervised K-Means clustering algorithm and then employs a deep neural network for prediction.	The SSDDQN model faces challenges in detecting the lowest number of U2R abnormal attack traffic and struggles with identifying certain unknown attacks, which limits its overall detection capability.
Ram B. Basnet et al. [29]	CSE-CIC-IDS2018	2019	MLP	98.31%	This investigation scrutinizes a broad assortment of deep neural network methodologies for network security breach recognition, evaluating numerous computational platforms, including Keras, TensorFlow, Theano, fast.ai, and PyTorch, while leveraging the CSE-CIC-IDS2018 data compilation for assessment purposes	This research accentuates fast.ai’s outstanding performance and functional effectiveness, however, it remains unvalidated across disparate data compilations or real-world application contexts. Subsequent studies should focus on optimizing hyperparameters and investigating a broader range of deep learning approaches.
Baraa Ismael Farhan and Ammar D. Jasim [47]	CSE-CIC-IDS2018	2022	LSTM	99%	This study recognizes the intensifying imperative for robust digital protective measures; consequently, the necessity for efficient network surveillance becomes paramount. This investigation utilizes deep learning algorithms upon the CSE-CIC-IDS2018 data repository, achieving 99% detection accuracy via an LSTM paradigm for discerning network security breaches.	Restricted to classification across various categories. The proposed LSTM intrusion detection model attains high accuracy but is hampered by the imbalance and large size of the dataset, both of which may affect operational efficacy and complicate model design.
Rawaa Ismael Farhan et al. [48]	CSE-CIC-IDS2018	2020	DNN	90%	This research comprehensively assesses their DNN construct, which has attained a remarkable detection correctness of roughly 90%, exhibiting substantial capacity for subsequent refinement and operational enhancement.	Restricted to classification across various categories. The DNN model proposed for flow-based intrusion detection attains an impressive 90% accuracy; however, it grapples with issues stemming from vast data size, elevated dimensionality, and intricate data preprocessing. Addressing these hurdles will require meticulous feature selection and the fine-tuning of hyperparameters to drive further enhancements in effectiveness.
Peng Lin et al. [49]	CSE-CIC-IDS2018	2019	LSTM	96.2%	In order to fortify cyber defense, we have designed an adaptive anomaly detection framework utilizing advanced deep learning approaches. This construct implements an LSTM deep neural network system, bolstered by an AM to heighten its overall efficiency. Moreover, the SMOTE, fused with an improved loss criterion, is utilized to effectively mitigate the distributional disparity present within the CSE-CIC-IDS2018 data repository.	Restricted to classification across various categories. This paradigm exhibits remarkable precision and recall; nonetheless, its reliance on transformed attributes may restrict its capacity to acquire and adapt autonomously from unprocessed network traffic information.
Mirza M. Baig et al. [33]	KDD-CUP’99	2017	ANNs	99.36%	This research proposes a methodology that deploys a succession of augmentation-based artificial neural network frameworks to develop a vigorous classification model. Validated on two intrusion recognition data repositories, this method enhances the one-against-remaining strategy through the inclusion of additional instance screening to improve fidelity	The KDD99 dataset was employed, demonstrating a greater level of redundancy. The suggested approach shows strong performance with dominant classes in the KDD’99 dataset but faces challenges with infrequent classes, resulting in compromised efficiency on the UNSW-NB15 dataset. This underscores the need for more effective management of minority classes and a more comprehensive assessment across a wider range of datasets.
R. Vinayakumar et al. [18]	KDD-CUP’99	2019	DNN	93%	The researchers have designed a DNN-driven intrusion detection system that reaches 93% accuracy in identifying and categorizing novel cyberattacks by analyzing both fixed and evolving datasets.	The KDD99 dataset was employed, demonstrating a greater level of redundancy. The evaluation falls short in addressing the scalability and performance challenges associated with distributed systems, along with the limitations of employing sophisticated DNNs.
Guojie Liu and Jianbiao Zhang [50]	KDD-CUP’99	2020	CNN	98.2%	This research executed network security penetration detection across diverse categorizations employing the KDD-CUP 99 and NSL-KDD data repositories. The CNN architecture accomplished an outstanding accuracy of 98.2%, illustrating its skill in effectively recognizing diverse network attack variations.	The KDD99 dataset was employed, demonstrating a greater level of redundancy. Limited to multi-category classification. The suggested model boosts accuracy and recall, yet still requires further work to better identify unknown attacks, as well as additional validation with real-world network traffic data.
Mario Di Mauro et al. [51]	CICIDS2017	2020	MLP	88.92%	In this paper, the authors present an experimental-based review of various neural network-based methods applied to IDS. They specifically highlight the performance of MLP, a prominent type of neural network-based method, which achieved an accuracy of 88.92% in multi-class classification for detecting intrusions	This study’s limitation lies in its focus on traditional neural network approaches for intrusion detection, without fully exploring emerging techniques like feature selection integration or weightless neural networks in real-time network environments.

Table 3. Class types in IoT-23 dataset.

Class Type	Sample Count	Description
benign	70,000	A general label for traffic captures deemed non-suspicious.
PartOfAHorizontalPortScan	69,198	It involves collecting information from a device by scanning multiple ports to identify open or vulnerable ports. This information is gathered to facilitate planning and execution of future attacks.
DDoS	65,000	An attack where the compromised device is used to participate in a distributed denial-of-service, overwhelming a target system or network with excessive traffic to disrupt its normal operations.
Okiru	14,942	An attack carried out by the Okiru botnet, a more advanced and sophisticated variant of the Mirai botnet. This network is designed to exploit IoT devices for malicious purposes, extending the capabilities of earlier botnet versions.
C&C-HeartBeat	10,239	A method where the server controlling the infected device sends regular messages to monitor the status of the compromised device. These periodic communications are detected by identifying small data packets sent at regular intervals from a suspicious source.
C&C	8939	A type of attack in which an attacker gains control over a device and establishes a command channel to issue instructions. This allows the attacker to direct the device to perform various malicious actions or attacks at their discretion in the future.
Attack	3396	A general label applied to anomalies that cannot be specifically identified or classified. This designation is used when a detected issue falls into an undefined category of malicious activity or remains unclassified due to insufficient details.
C&C-PartOfAHorizontalPortScan	327	A scenario where the network sends data packets to perform a horizontal port scan, collecting information about open ports and vulnerabilities on multiple devices. This data gathering is aimed at identifying potential targets for future attacks.

Table 4. Attributes included in the IoT-23 dataset.

Attribute	Description
ts	The time at which the capture occurred, represented in Unix time format.
uid	A unique identifier assigned to the capture, serving as a distinct reference for each data entry.
id_orig.h	The IP address of the originating source where the attack occurred, which can be either in IPv4 or IPv6 format.
id_orig.p	The port number utilized by the originating source in the communication process.
id_resp.h	The IP address of the device that received the capture, indicating the destination of the network traffic.
id_resp.p	The port number used for the response from the device where the capture took place, indicating the communication endpoint on the receiving device.
proto	The network protocol employed for the data packet transmission, specifying the method of communication used for the data exchange.
service	The application-level protocol used for the communication, defining the specific type of service or application interacting over the network.
duration	The total amount of time that data was exchanged between the device and the attacker, representing the period of active communication.
orig_bytes	The volume of data transmitted to the device from the source, indicating the total amount of incoming data.
resp_bytes	The volume of data sent by the device in response, representing the total amount of outbound data transmitted from the device.
conn_state	The current status of the connection, reflecting the connection’s operational state or phase at the time of the data capture.
local_orig	Indicates whether the connection was initiated locally from within the same network or system.
local_resp	Indicates whether the response was generated locally within the same network or system.
missed_bytes	The count of bytes that were not captured or were lost within a message transmission, indicating gaps in the data recorded.
history	The record of changes or transitions in the connection’s state over time, detailing the evolution and previous statuses of the connection.
orig_pkts	The total count of packets transmitted to the device from the source, reflecting the volume of incoming network traffic directed towards it.
orig_ip_bytes	The total number of bytes transmitted to the device over the IP network, indicating the volume of data received by the device.
resp_pkts	The total count of packets sent from the device, representing the volume of outbound network traffic originating from it.
resp_ip_bytes	The total number of bytes transmitted from the device over the IP network, reflecting the volume of data sent out by the device.
labels	The classification of the capture, denoting whether it is benign (normal) or malicious.
detailed_label	When the capture is identified as malicious, this field specifies the type of malicious activity, as described in the detailed classifications provided above.

Table 5. Sample distribution of the IoT-23 dataset for binary classification.

Class Type	Train	Test
Normal	58,997	10,497
Attack	137,358	24,155

Table 6. IoT-23 multi-class sample counts (Normal class removed).

Class Type	Train	Test
PartOfAHorizontalPortScan	54,762	9733
DDoS	51,048	9069
Okiru	12,777	2156
C&C-HeartBeat	6965	1174
C&C	5756	1036
Attack	1814	327
C&C-PartOfAHorizontalPortScan	225	37

Table 7. Sample distribution of the IoT-23 dataset for multi-class classification including normal class.

Class Type	Train	Test
benign	58,964	10,507
PartOfAHorizontalPortScan	58,901	10,241
DDoS	51,026	9016
Okiru	12,675	2260
C&C-HeartBeat	6863	1232
C&C	5697	990
Attack	1825	328
C&C-PartOfAHorizontalPortScan	227	46

Table 8. Training class sample counts (pre- and post-hybrid ADASYN-SMOTE resampling) for IoT-23 binary classification.

Category Label	Pre-Resampling Sample Count (ADASYN-SMOTE)	Post-Resampling Sample Count (ADASYN-SMOTE)
Normal	58,997	140,815
Attack	137,358	140,815
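
The oversampling summarized in Table 8 relies on synthetic minority samples. The authors' hybrid ADASYN-SMOTE procedure is not specified in this section, so the following is only a minimal sketch of the plain SMOTE interpolation step, written in pure Python. The function name `smote_like_oversample`, the neighbour count `k`, the seed, and the toy points are all illustrative assumptions and not taken from the paper.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy 2-D minority class (made-up values, not IoT-23 features).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_oversample(minority, n_new=5)
print(new_points)

# Table 8: the Normal class grows from 58,997 to 140,815 training samples.
extra_normal = 140_815 - 58_997
print(extra_normal)
```

Each synthetic point lies on the segment between two existing minority points, which is why SMOTE-style methods densify a class without stepping outside its observed range. ADASYN differs mainly in how many synthetic points it allots to each base sample, a detail this sketch omits.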

Table 9. Training class sample counts (pre- and post-ADASYN resampling) for IoT-23 multi-class classification (excluding Normal).

Category Label	Pre-Resampling Sample Count (ADASYN)	Post-Resampling Sample Count (ADASYN)
PartOfAHorizontalPortScan	54,762	54,762
DDoS	51,048	51,048
Okiru	12,777	12,777
C&C-HeartBeat	6965	6965
C&C	5756	5756
Attack	1814	54,762
C&C-PartOfAHorizontalPortScan	225	54,761

Table 10. Training class sample counts (pre- and post-ADASYN resampling) for IoT-23 multi-class classification (including Normal).

Category Label	Pre-Resampling Sample Count (ADASYN)	Post-Resampling Sample Count (ADASYN)
benign	58,964	58,964
PartOfAHorizontalPortScan	58,901	58,901
DDoS	51,026	51,026
Okiru	12,675	12,675
C&C-HeartBeat	6863	6863
C&C	5697	5697
Attack	1825	58,962
C&C-PartOfAHorizontalPortScan	227	58,924

Table 11. Training class sample counts (pre- and post-ENN resampling) for IoT-23 binary classification.

Category Label	Pre-Resampling Sample Count (ENN)	Post-Resampling Sample Count (ENN)
Normal	140,815	140,815
Attack	140,815	115,850
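
Table 11 shows ENN removing samples from the Attack class only. As a rough illustration of the editing rule, the sketch below drops a sample when the majority label among its k nearest neighbours disagrees with its own label. The majority-vote criterion, `k=3`, the `clean_classes` argument, and the toy data are assumptions made for this example. The paper's exact ENN configuration is not given in this section.

```python
from collections import Counter


def edited_nearest_neighbours(X, y, k=3, clean_classes=None):
    """Drop samples whose label disagrees with the majority label of
    their k nearest neighbours (Wilson's editing rule)."""

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if clean_classes is not None and yi not in clean_classes:
            keep.append(i)  # classes outside clean_classes are never edited
            continue
        order = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sq_dist(xi, X[j]),
        )
        votes = Counter(y[j] for j in order[:k])
        if votes.most_common(1)[0][0] == yi:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy 1-D data: one Attack sample sits inside the Normal cluster.
X = [(0.0,), (0.1,), (0.2,), (0.3,), (0.15,), (5.0,), (5.1,), (5.2,), (5.3,)]
y = ["Normal"] * 4 + ["Attack"] * 5

# Only the Attack class is edited, mirroring the pattern seen in Table 11.
X_clean, y_clean = edited_nearest_neighbours(X, y, k=3, clean_classes={"Attack"})
print(Counter(y_clean))
```

The isolated Attack sample at 0.15 is removed because all three of its nearest neighbours are Normal, while the Attack samples near 5.0 are kept.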

Table 12. Training class sample counts (pre- and post-ENN resampling) for IoT-23 multi-class classification (excluding Normal).

Category Label	Pre-Resampling Sample Count (ENN)	Post-Resampling Sample Count (ENN)
PartOfAHorizontalPortScan	54,762	54,491
DDoS	51,048	51,043
Okiru	12,777	12,777
C&C-HeartBeat	6965	6957
C&C	5756	5756
Attack	54,762	54,761
C&C-PartOfAHorizontalPortScan	54,761	54,761

Table 13. Training class sample counts (pre- and post-ENN resampling) for IoT-23 multi-class classification (including Normal).

Category Label	Pre-Resampling Sample Count (ENN)	Post-Resampling Sample Count (ENN)
benign	58,964	48,622
PartOfAHorizontalPortScan	58,901	42,206
DDoS	51,026	51,020
Okiru	12,675	12,675
C&C-HeartBeat	6863	6861
C&C	5697	5697
Attack	58,962	58,962
C&C-PartOfAHorizontalPortScan	58,924	58,923

Table 14. Class weighting for each training class on IoT-23 binary classification.

Class Type	Weight Using Class Weights
Normal	0.9114
Attack	1.1077
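
The weights in Table 14 are numerically consistent with the common "balanced" heuristic, in which each class weight equals the total sample count divided by the product of the number of classes and that class's sample count. The sketch below applies that formula to the post-ENN counts from Table 11. That the authors used exactly this formula is an assumption inferred from the numbers, and the function name `balanced_class_weights` is illustrative.

```python
def balanced_class_weights(counts):
    """Return weight = total / (n_classes * class_count) for every class."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Post-ENN training counts for IoT-23 binary classification (Table 11).
post_enn_counts = {"Normal": 140_815, "Attack": 115_850}

weights = balanced_class_weights(post_enn_counts)
for label, w in weights.items():
    print(f"{label}: {w:.4f}")
# Normal: 0.9114
# Attack: 1.1077
```

Under this heuristic, the class that ENN shrank (Attack) receives a weight above 1, so errors on it count for more in the loss.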

Table 15. Class weighting for each training class on IoT-23 multi-class classification (excluding Normal).

Class Type	Weight Using Class Weights
PartOfAHorizontalPortScan	0.6306
DDoS	0.6732
Okiru	2.6895
C&C-HeartBeat	4.9394
C&C	5.9701
Attack	0.6275
C&C-PartOfAHorizontalPortScan	0.6275

Table 16. Class weights assigned to each training class for IoT-23 multi-class classification (including Normal).

Class Type	Class Weight
benign	0.7326
PartOfAHorizontalPortScan	0.8440
DDoS	0.6982
Okiru	2.8103
C&C-HeartBeat	5.1918
C&C	6.2525
Attack	0.6041
C&C-PartOfAHorizontalPortScan	0.6045
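
The weights in Tables 14 to 16 are numerically consistent with the common "balanced" heuristic, w_c = N / (K · n_c), applied to the post-ENN counts of Tables 11 to 13, where N is the total number of training samples, K the number of classes, and n_c the count of class c. This is the formula used by scikit-learn's `compute_class_weight` with `class_weight="balanced"`. The tables do not state which tool produced the weights, so the sketch below is an assumption that merely reproduces the reported values; the function and variable names are illustrative.

```python
def balanced_class_weights(counts):
    """Weight each class by n_total / (n_classes * n_class)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Post-ENN training counts copied from Tables 11, 12 and 13.
BINARY = {"Normal": 140_815, "Attack": 115_850}
MULTI_EXCL_NORMAL = {
    "PartOfAHorizontalPortScan": 54_491, "DDoS": 51_043, "Okiru": 12_777,
    "C&C-HeartBeat": 6_957, "C&C": 5_756, "Attack": 54_761,
    "C&C-PartOfAHorizontalPortScan": 54_761,
}
MULTI_INCL_NORMAL = {
    "benign": 48_622, "PartOfAHorizontalPortScan": 42_206, "DDoS": 51_020,
    "Okiru": 12_675, "C&C-HeartBeat": 6_861, "C&C": 5_697, "Attack": 58_962,
    "C&C-PartOfAHorizontalPortScan": 58_923,
}

for name, counts in [("binary", BINARY),
                     ("multi-class excl. Normal", MULTI_EXCL_NORMAL),
                     ("multi-class incl. Normal", MULTI_INCL_NORMAL)]:
    print(name)
    for label, weight in balanced_class_weights(counts).items():
        print(f"  {label:32s} {weight:.4f}")
```

Under this heuristic a class at the mean class size receives weight 1, which is why the classes left small after resampling (Okiru, C&C-HeartBeat, C&C) carry weights well above 1.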

Table 17. Hyperparameter Specification for the CNN Model.

Parameter	Binary Classifier	Multi-Class Classifier
Batch size	128	128
Learning rate	Scheduled: Starting Value = 0.001, Factor = 0.5, Min = 1 × 10⁻⁵ (ReduceLROnPlateau)	Scheduled: Starting Value = 0.001, Factor = 0.5, Min = 1 × 10⁻⁵ (ReduceLROnPlateau)
Optimizer	Adam	Adam
Loss function	Binary cross-entropy	Categorical cross-entropy
Metric	Accuracy	Accuracy
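
To make the plateau-based schedule concrete, the following sketch lists the learning rates that result from successive reductions with the tabulated starting value, factor, and minimum. The patience and the monitored quantity are not given in the table, so the sketch models only the sequence of rates, not when each reduction is triggered; the function name is illustrative.

```python
def plateau_lr_sequence(initial=1e-3, factor=0.5, min_lr=1e-5):
    """Learning rates after each successive plateau-triggered reduction.

    Each reduction multiplies the current rate by `factor` and clips the
    result at `min_lr`, mirroring the ReduceLROnPlateau update rule.
    """
    rates = [initial]
    while rates[-1] > min_lr:
        rates.append(max(rates[-1] * factor, min_lr))
    return rates


rates = plateau_lr_sequence()
for i, lr in enumerate(rates):
    print(f"after {i} reductions: lr = {lr:.3e}")
```

With these settings the floor of 1 × 10⁻⁵ is reached on the seventh reduction, after which further plateaus leave the rate unchanged.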

Table 18. Hyperparameter Specification for the Autoencoder Model.

Parameter	Binary Classifier	Multi-Class Classifier
Batch size	128	128
Learning rate	Scheduled: Starting Value = 0.001, Factor = 0.5, Min = 1 × 10⁻⁵ (ReduceLROnPlateau)	Scheduled: Starting Value = 0.001, Factor = 0.5, Min = 1 × 10⁻⁵ (ReduceLROnPlateau)
Optimizer	Adam	Adam
Loss function	Binary cross-entropy	Categorical cross-entropy
Metric	Accuracy	Accuracy

Table 19. Hyperparameter Specification for the DNN Model.

Parameter	Binary Classifier	Multi-Class Classifier
Batch size	128	128
Learning rate	Scheduled: Starting Value = 0.0003, Factor = 0.9, Decay Steps = 10,000 (Exponential Decay)	Scheduled: Starting Value = 0.0003, Factor = 0.9, Decay Steps = 10,000 (Exponential Decay)
Optimizer	Adam	Adam
Loss function	Binary cross-entropy	Categorical cross-entropy
Metric	Accuracy	Accuracy
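
The exponential-decay schedule of the DNN can be written as lr(step) = 0.0003 · 0.9^(step / 10,000). The sketch below evaluates this expression; whether the decay was applied continuously or in discrete (staircase) steps is not stated in the table, so both variants are shown as assumptions, and the function name is illustrative.

```python
def exponential_decay_lr(step, initial=3e-4, decay_rate=0.9,
                         decay_steps=10_000, staircase=False):
    """Learning rate at a given optimizer step under exponential decay."""
    # Staircase mode holds the rate constant within each block of decay_steps.
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial * decay_rate ** exponent


for step in (0, 5_000, 10_000, 50_000):
    print(step, exponential_decay_lr(step), exponential_decay_lr(step, staircase=True))
```

With a batch size of 128, 10,000 steps correspond to 1.28 million training samples, so the rate decays by 10% for every 1.28 million samples processed.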

Table 20. Hyperparameter Specification for the CNN-MLP Model.

Parameter	Binary Classifier	Multi-Class Classifier
Batch size	128	128
Learning rate	Scheduled: Starting Value = 0.001, Factor = 0.5, Min = 1 × 10⁻⁵ (ReduceLROnPlateau)	Scheduled: Starting Value = 0.001, Factor = 0.5, Min = 1 × 10⁻⁵ (ReduceLROnPlateau)
Optimizer	Adam	Adam
Loss function	Binary cross-entropy	Categorical cross-entropy
Metric	Accuracy	Accuracy

Table 21. Evaluation metrics for binary classification with the application of data resampling techniques and class weights.

Dataset	Model	Accuracy	Precision	Recall	F-Score
IoT-23	CNN	99.92%	99.92%	99.92%	99.92%
	Autoencoder	84.02%	89.34%	84.02%	84.64%
	DNN	99.24%	99.25%	99.24%	99.24%
	CNN-MLP	99.94%	99.94%	99.94%	99.94%
NF-BoT-IoT-v2	CNN	99.94%	99.94%	99.94%	99.94%
	Autoencoder	99.95%	99.95%	99.95%	99.95%
	DNN	99.95%	99.95%	99.95%	99.95%
	CNN-MLP	99.96%	99.96%	99.96%	99.96%

Table 22. Evaluation metrics for multi-class classification excluding normal class with the application of data resampling techniques and class weights.

Dataset	Model	Accuracy	Precision	Recall	F-Score
IoT-23	CNN	99.97%	99.98%	99.97%	99.97%
	Autoencoder	99.57%	99.60%	99.57%	99.57%
	DNN	99.94%	99.94%	99.94%	99.94%
	CNN-MLP	99.99%	99.99%	99.99%	99.99%
NF-BoT-IoT-v2	CNN	97.99%	98.04%	97.99%	97.99%
	Autoencoder	98.01%	98.06%	98.01%	98.00%
	DNN	97.92%	97.99%	97.92%	97.92%
	CNN-MLP	98.02%	98.07%	98.02%	98.02%

Table 23. Evaluation metrics for multi-class classification including normal class with the application of data resampling techniques and class weights.

Dataset	Model	Accuracy	Precision	Recall	F-Score
IoT-23	CNN	99.81%	99.84%	99.81%	99.82%
	Autoencoder	87.12%	87.70%	87.12%	87.16%
	DNN	99.67%	99.73%	99.67%	99.69%
	CNN-MLP	99.91%	99.92%	99.91%	99.91%
NF-BoT-IoT-v2	CNN	98.04%	98.04%	98.04%	98.04%
	Autoencoder	98.08%	98.12%	98.08%	98.07%
	DNN	97.89%	97.92%	97.89%	97.88%
	CNN-MLP	98.11%	98.17%	98.11%	98.11%

Table 24. Training time for CNN-MLP Model.

Dataset	Classification Type	Training Time per Batch (Seconds)	Training Time per Sample (Milliseconds)
IoT-23	Binary classification	0.0688	0.54
	Multi-class classification excluding normal class	0.0824	0.64
	Multi-class classification including normal class	0.0870	0.68
NF-BoT-IoT-v2	Binary classification	0.0579	0.45
	Multi-class classification excluding normal class	0.0680	0.53
	Multi-class classification including normal class	0.0747	0.58

Table 25. Inference time for CNN-MLP Model.

Dataset	Classification Type	Inference Time per Batch (Seconds)	Inference Time per Sample (Milliseconds)
IoT-23	Binary classification	0.0575	0.45
	Multi-class classification excluding normal class	0.0613	0.48
	Multi-class classification including normal class	0.0674	0.53
NF-BoT-IoT-v2	Binary classification	0.0570	0.45
	Multi-class classification excluding normal class	0.0634	0.50
	Multi-class classification including normal class	0.0687	0.54
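
The per-sample times in Tables 24 and 25 follow from the per-batch times and the batch size of 128 listed in the hyperparameter tables. A minimal sketch of the conversion is given below; the function names are illustrative, and the throughput figure is a derived quantity that the tables do not report.

```python
BATCH_SIZE = 128  # batch size from the hyperparameter tables


def per_sample_ms(seconds_per_batch, batch_size=BATCH_SIZE):
    """Convert a per-batch time in seconds to a per-sample time in milliseconds."""
    return seconds_per_batch / batch_size * 1000.0


def samples_per_second(seconds_per_batch, batch_size=BATCH_SIZE):
    """Throughput implied by a per-batch time."""
    return batch_size / seconds_per_batch


# Per-batch inference times for IoT-23, copied from Table 25.
for task, t in [("binary", 0.0575),
                ("multi-class excl. Normal", 0.0613),
                ("multi-class incl. Normal", 0.0674)]:
    print(f"{task:26s} {per_sample_ms(t):.2f} ms/sample, "
          f"{samples_per_second(t):.0f} samples/s")
```

For example, the 0.0688 s per training batch reported for IoT-23 binary classification corresponds to about 0.54 ms per sample, matching Table 24.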

Table 26. Memory consumption for CNN-MLP Model.

Dataset	Classification Type	Memory consumption (MB)
IoT-23	Binary classification	111.63
	Multi-class classification excluding normal class	119.70
	Multi-class classification including normal class	123.84
NF-BoT-IoT-v2	Binary classification	15.54
	Multi-class classification excluding normal class	23.99
	Multi-class classification including normal class	25.69

Table 27. Evaluation metrics for the CNN-MLP model in binary classification across different classes on the IoT-23 dataset.

Label	Accuracy	Precision	Recall	F-Score
Normal	99.91%	99.90%	99.91%	99.90%
Attack	99.95%	99.96%	99.95%	99.96%

Table 28. Evaluation metrics for the CNN-MLP model in binary classification across different classes on the NF-BoT-IoT-v2 dataset.

Label	Accuracy	Precision	Recall	F-Score
Normal	100%	99.61%	100%	99.80%
Attack	99.95%	100%	99.95%	99.98%

Table 29. Evaluation metrics for the CNN-MLP model in multi-class classification, excluding the normal class, on the IoT-23 dataset.

Class	Accuracy	Precision	Recall	F-Score
PartOfAHorizontalPortScan	99.99%	100%	99.99%	99.99%
DDoS	99.99%	99.99%	99.99%	99.99%
Okiru	100%	100%	100%	100%
C&C-HeartBeat	99.91%	100%	99.91%	99.96%
C&C	100%	100%	100%	100%
Attack	100%	99.70%	100%	99.85%
C&C-PartOfAHorizontalPortScan	100%	97.37%	100%	98.67%

Table 30. Evaluation metrics for the CNN-MLP model in multi-class classification, excluding the normal class, on the NF-BoT-IoT-v2 dataset.

Class	Accuracy	Precision	Recall	F-Score
Reconnaissance	94.72%	99.27%	94.72%	96.94%
DDoS	99.41%	99.53%	99.41%	99.47%
DoS	98.93%	95.43%	98.93%	97.15%
Theft	98.18%	96.43%	98.18%	97.30%

Table 31. Evaluation metrics for the CNN-MLP model in multi-class classification, including the normal class, on the IoT-23 dataset.

Class	Accuracy	Precision	Recall	F-Score
Benign	99.92%	99.93%	99.92%	99.93%
PartOfAHorizontalPortScan	99.79%	99.99%	99.79%	99.89%
DDoS	99.99%	100%	99.99%	99.99%
Okiru	100%	100%	100%	100%
C&C-HeartBeat	100%	100%	100%	100%
C&C	99.80%	99.40%	99.80%	99.60%
Attack	100%	100%	100%	100%
C&C-PartOfAHorizontalPortScan	100%	71.88%	100%	83.64%
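
The per-class F-scores are consistent with the harmonic mean of precision and recall (the F1 score, assuming β = 1). The sketch below recomputes the F-score for the class with the lowest precision in Table 31; the function name is illustrative.

```python
def f_score(precision_pct, recall_pct):
    """Harmonic mean of precision and recall, both given in percent."""
    p, r = precision_pct / 100.0, recall_pct / 100.0
    if p + r == 0:
        return 0.0
    return 100.0 * 2.0 * p * r / (p + r)


# C&C-PartOfAHorizontalPortScan in Table 31: precision 71.88%, recall 100%.
print(f"{f_score(71.88, 100.0):.2f}%")
```

Because the harmonic mean is pulled toward the smaller value, a recall of 100% cannot compensate for the 71.88% precision of this rare class, which gives the 83.64% F-score in the table.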

Table 32. Evaluation metrics for the CNN-MLP model in multi-class classification, including the normal class, on the NF-BoT-IoT-v2 dataset.

Class	Accuracy	Precision	Recall	F-Score
Benign	100%	99%	100%	99.50%
Reconnaissance	94.47%	99.85%	94.47%	97.09%
DDoS	99.18%	99.30%	99.18%	99.24%
DoS	99.05%	95.19%	99.05%	97.08%
Theft	100%	98.04%	100%	99.01%

Table 33. Performance metrics of the CNN-MLP model on the IoT-23 dataset with and without data resampling and class weights across different classification types.

Data Handling Strategy	Classification Type	Accuracy	Precision	Recall	F-Score
Without Resampling and Class Weights	Binary classification	99.38%	99.39%	99.38%	99.38%
	Multi-class classification excluding normal class	99.96%	99.96%	99.96%	99.96%
	Multi-class classification including normal class	99.84%	99.84%	99.84%	99.80%
With Resampling and Class Weights	Binary classification	99.94%	99.94%	99.94%	99.94%
	Multi-class classification excluding normal class	99.99%	99.99%	99.99%	99.99%
	Multi-class classification including normal class	99.91%	99.92%	99.91%	99.91%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
