Lightweight MLP-Based Feature Extraction with Linear Classifier for Intrusion Detection System in Internet of Things

Chandroth, Jisi; Ali, Jehad

doi:10.3390/electronics15081604

by Jisi Chandroth ¹ and Jehad Ali ²,*

¹ Department of AI and Software, Gachon University, Seongnam 13120, Republic of Korea
² Department of AI Convergence Network, Ajou University, Suwon 16499, Republic of Korea
* Author to whom correspondence should be addressed.

Electronics 2026, 15(8), 1604; https://doi.org/10.3390/electronics15081604

Submission received: 20 March 2026 / Revised: 3 April 2026 / Accepted: 8 April 2026 / Published: 12 April 2026

(This article belongs to the Special Issue Secure and Intelligent IoT & CPS: AI Driven Attack–Defense, Network Analysis and Smart Data Protection)

Abstract

The Internet of Things (IoT) comprises diverse devices connected through heterogeneous communication protocols to deliver a wide range of services. However, the complexity and scale of IoT networks make them difficult to secure. Network intrusion detection systems (NIDSs) have therefore become essential for identifying malicious activities and protecting IoT environments across many applications. Although recent deep learning (DL)-based IDS approaches achieve strong detection performance, they often require substantial computation and storage, which limits their practicality on resource-constrained IoT devices. To balance detection accuracy with computational efficiency, we propose a lightweight deep learning model for IoT intrusion detection. Specifically, our method learns compact, intrusion-relevant representations from traffic features using a two-layer multi-layer perceptron (MLP) embedding backbone, followed by a linear SoftMax classification head for multi-class attack detection. We evaluate the proposed approach on three benchmark datasets, CICIDS2017, NSL-KDD, and CICIoT2023, and the results show strong performance, achieving 99.85%, 99.21%, and 98.45% accuracy, respectively, while significantly reducing model size and computational overhead. The experimental results demonstrate that the proposed method achieves excellent classification performance while maintaining a lightweight design, with fewer parameters and lower FLOPs than existing approaches.

Keywords: Internet of Things; intrusion detection system; multi-layer perceptron; deep learning

1. Introduction

The Internet of Things (IoT) is among the fastest-developing areas of technology and helps merge physical space with cyberspace. An IoT network consists of interrelated physical objects, sensors, actuators, and other devices that communicate through different protocols to create, receive, and transmit data. IoT has become popular in application fields such as healthcare, transportation, and industrial systems, where it enhances the overall quality of experience in everyday life [1].

Although IoT is widely applied, it has low resistance to a broad range of security threats. The heterogeneous and dispersed nature of IoT devices, together with their limited computational capabilities and weak security settings, makes these environments attractive targets for cyber-attackers. IoT networks are therefore frequently subject to many types of attacks, e.g., Distributed Denial of Service (DDoS) attacks, data loss, malware distribution, and unauthorized access. These concerns are underlined by the number of large-scale IoT attacks observed in the last ten years. For example, the BADBOX 2.0 malware campaign affected over one million devices by March 2025, and it is estimated that it may since have reached over ten million devices in 222 countries [2]. Another notable example is the Reaper botnet, identified in September 2017; unlike the Mirai botnet, which primarily exploited weak default credentials, Reaper exploited known vulnerabilities in IoT devices, making it more sophisticated and more threatening [3].

An Intrusion Detection System (IDS) is essential for mitigating these threats and securing IoT systems. An IDS continuously inspects and analyzes network traffic and system activity to detect anomalies and possible intrusions [4,5], which allows different attacks in IoT networks to be identified. In recent years, much attention has focused on IDS methods based on machine learning (ML) and deep learning (DL), since they can automatically learn complex patterns in network traffic data and improve detection accuracy in IoT contexts [6,7].

The IoT ecosystem consists of a wide range of devices, such as sensors, actuators, and edge nodes, which usually operate under severe resource limits. IoT devices typically have less processing power, memory, storage capacity, and energy than traditional computing systems [8]. These limitations make it difficult to run traditional security mechanisms and computationally expensive intrusion detection systems directly on IoT devices, which leaves most IoT devices vulnerable to security threats and exploitable weaknesses. Moreover, conventional IDS techniques usually depend on sophisticated machine learning algorithms that require massive training datasets and high-dimensional features, and the resulting computation and storage demands cannot be met in IoT settings [9].

Applying effective IDS solutions to IoT networks consequently poses several challenges. Many current methods prioritize high detection accuracy while neglecting the practical constraints of IoT devices, such as limited processing power, reliance on batteries, and limited memory. Moreover, IoT networks are large-scale and heterogeneous, which makes real-time intrusion detection even more difficult [10]. Edge computing has been used to offload the computational workload of IoT devices to nearby edge nodes, but this may still increase latency and energy usage [11,12]. Thus, computational overhead, energy efficiency, and real-time detection must all be considered when designing intrusion detection systems for IoT settings.

To address these problems, recent studies have developed lightweight IDS models optimized for resource-constrained IoT settings. The main aim is to balance high detection performance against low computational cost [13]. Nevertheless, building ML-based IDS models that are both efficient and accurate remains an open challenge. Researchers have recently investigated methods such as model compression, federated learning, and dynamic quantization to simplify models without significantly degrading detection performance [14]. These methods are designed to enable effective intrusion detection within the constrained computational and energy budgets of IoT devices.

Despite these advances, most available IDS solutions for IoT still fail to balance detection performance and computational efficiency. Most traditional ML- and DL-based methods rely on complicated architectures with many parameters and consequently consume considerable memory, computation time, and energy [15], which makes them unsuitable for resource-limited IoT devices. Hence, there is a strong need for lightweight and efficient intrusion detection mechanisms that keep detection performance high with minimal resource consumption. In this paper, we develop an IDS suited to IoT settings that balances detection rate against computational cost and is appropriate for deployment in resource-constrained IoT systems.

The main contributions of this work are summarized as follows:

We propose a lightweight intrusion detection model that learns compact representations from network traffic features using a two-layer multi-layer perceptron embedding backbone.
The proposed architecture employs a simple, efficient design that combines an MLP feature-embedding network with a linear SoftMax classification head for multi-class attack detection.
The proposed approach is evaluated on three widely used benchmark datasets, namely, CICIDS2017, NSL-KDD, and CICIoT2023, to validate its effectiveness for intrusion detection tasks.
The experimental results demonstrate that the proposed model achieves strong classification performance, with accuracies of 99.85% on CICIDS2017, 99.21% on NSL-KDD, and 98.45% on CICIoT2023.
The proposed method maintains a lightweight structure with reduced model size, fewer parameters, and significantly lower FLOPs compared with existing approaches, making it suitable for deployment in resource-constrained IoT environments.

The rest of the paper is structured as follows. Section 2 reviews lightweight intrusion detection methods for IoT networks. Section 3 presents the proposed intrusion detection model, which is built on a two-layer multi-layer perceptron embedding backbone and a linear SoftMax classification head for multi-class attack detection. Section 4 describes the datasets, experimental setup, model implementation, evaluation metrics, and performance analysis. Lastly, Section 5 concludes the paper, discusses the findings, and outlines directions for future work.

2. Related Works

Because IoT devices are resource-constrained, recent studies have increasingly focused on lightweight intrusion detection systems (IDSs) that perform effectively with limited computational power, memory, and energy. A number of lightweight IDS models have been proposed to meet these challenges. Nevertheless, most current techniques do not deliver high detection accuracy and low model complexity at the same time. For example, Z. Wang et al. [11] developed a lightweight IoT intrusion detection model built on a DL-BiLSTM architecture that combines deep neural networks (DNNs) and bidirectional long short-term memory (BiLSTM) to learn nonlinear relationships and long-term temporal dependencies in network traffic. Incremental principal component analysis (IPCA) is employed to reduce the computational cost of feature dimensionality reduction, whereas post-training dynamic quantization is employed to compress the model. Although this strategy decreases the computational cost, dynamic quantization can reduce detection accuracy relative to the original model and requires extra processing steps. In addition, the evaluation dataset is not fully representative of the complexity and dynamic nature of real-world IoT networks, which limits the generalization ability of the model.

To improve lightweight IDSs, several studies have explored techniques including feature optimization, knowledge distillation, and lightweight architectural design. Wang et al. [14] proposed a lightweight intrusion detection model based on self-knowledge distillation (SKD), known as the tied block convolution lightweight neural network (TBCLNN). In this model, binary Harris Hawk Optimization (bHHO) is used to reduce the number of features, and lightweight convolutional structures with residual and inverted residual blocks are used to reduce computational complexity. Similarly, Benaddi et al. [16] presented a hybrid IDS model that uses convolutional neural networks (CNNs) and BiLSTM networks to capture both spatial and temporal characteristics of IoT traffic data. A chi-square-based feature selection algorithm is applied to the UNSW-NB15 dataset to identify its most significant features, and the input dimensions are reduced to improve the efficiency of the method.

Moreover, recent work has investigated knowledge distillation and lightweight architectures to trade off detection performance against computational cost. For example, the CL-SKD framework [17] uses a two-step learning approach that combines self-supervised contrastive learning with self-knowledge distillation to improve representation learning and reduce the reliance on labeled data. The model first acquires traffic representations through contrastive learning and then transfers knowledge from the teacher model to a lightweight student model to improve detection performance.

LNet-SKD [18] also introduces a lightweight intrusion detection model based on self-knowledge distillation. It presents a DeepMax block for efficiently deriving compact traffic representations and stacks several of these blocks to build a lightweight model. It also includes batch-wise self-knowledge distillation to limit the performance loss caused by simplifying the model. Similarly, IRNet-MBSKD [19] combines an inverted residual network with a multi-batch self-knowledge distillation mechanism to detect network intrusions. Its protocol-aware inverted residual design promotes effective feature extraction, and the multi-batch self-distillation strategy improves generalization. The approach attains high detection rates on the NSL-KDD dataset at lower computational cost.

Several other studies have examined optimization-based feature selection and advanced learning strategies to improve lightweight IDSs. Khan et al. [20] proposed DP2, a deep learning-based intrusion detection model that extends DNN- and Bi-LSTM-based intrusion detection with a wrapper-based genetic algorithm (GA) feature selection approach to eliminate redundant features and reduce memory consumption. Likewise, Yang et al. [21] created CompM3, a lightweight open-source framework for intrusion detection in the industrial IoT (IIoT), which comprises known-attack classification, unknown-attack detection based on reconstruction error analysis, and dynamic updating to adapt to new attack patterns. Moreover, Ma et al. [22] proposed a cloud–edge–node architecture in which computationally intensive training is performed in the cloud while lightweight detection modules are deployed on the edge and node tiers. Although these methods reduce computational pressure at the device level, they also have several shortcomings, such as vulnerability to adversarial attacks, dependence on particular malicious traffic patterns, and limited attention to post-detection mitigation measures.

Recent works also discuss lightweight model compression, transfer learning, and frequency-domain feature representations for IoT intrusion detection. Zhang et al. [23] proposed a few-shot intrusion detection method based on lightweight transfer learning (NID-LTL) that combines model pruning, nonlinear feature selection, and knowledge distillation to produce a small model that can adapt to new attacks. Fard et al. [24] explored design space optimization methods to create smaller deep neural networks that fit on embedded IoT devices without significantly degrading the training process. Similarly, Wang et al. [25] presented a lightweight CNN model that combines Fourier-transform features with knowledge distillation to improve feature representation and generalization. PNet-IDS [26], other packet-embedding-based classification methods [27], and attention-enhanced BiLSTM models [28] are likewise designed to achieve higher detection accuracy with lower model complexity.

Recent research has extensively explored feature optimization and hybrid deep learning models to improve intrusion detection performance. Optimization-based feature selection methods, such as ISSOA, have been integrated with deep models, such as attention recurrent autoencoders, to enhance feature relevance and improve detection accuracy, achieving up to 98% recall in IoT environments [29]. Similarly, metaheuristic approaches, such as the Mayfly Optimization Algorithm (MOA) combined with BiLSTM, enable effective feature selection and sequential pattern learning, achieving 99.25% accuracy on the NSL-KDD dataset [30]. In addition, hybrid architectures such as CNN-BiGRU have been proposed to capture the spatial and temporal characteristics of network traffic jointly. At the same time, genetic algorithm-based data augmentation further improves minority class detection [31]. More advanced hybrid models, including ResNeSt-biGRU, leverage residual feature extraction and bidirectional temporal modeling to achieve high detection accuracy, exceeding 99% on IoT datasets [32]. Although these approaches achieve strong performance, they introduce higher computational complexity due to optimization procedures and deep hybrid architectures. In contrast, lightweight models aim to reduce this complexity while maintaining competitive detection performance.

However, most existing strategies still struggle to balance detection accuracy, model complexity, and the feasibility of real-time deployment. Many models rely on additional optimization methods, including quantization, pruning, or knowledge distillation, that may add computation or even degrade detection. Moreover, the datasets used in many studies fail to capture the highly dynamic and heterogeneous nature of real-world IoT environments, which limits the generalizability of the proposed models. Problems such as dataset imbalance, scalability to large networks, and robustness to changing attack patterns are also not explicitly addressed. Thus, there remains a need for lightweight intrusion detection models that learn compact and useful traffic representations while achieving high detection rates at low computational cost, so that they can be readily deployed in real-world IoT systems. Table 1 provides a summary of the existing literature.

3. Proposed Method

This study proposes a lightweight intrusion detection model for resource-constrained IoT environments. The main objective of the proposed method is to achieve high detection accuracy while maintaining low computational complexity and a small model size. The proposed architecture employs a two-layer multi-layer perceptron feature embedding network, followed by a linear SoftMax classification layer, to learn compact representations of network traffic features. The overall framework of the proposed IDS is illustrated in Figure 1. The framework consists of three main stages: (1) data preprocessing, (2) feature extraction, and (3) classification. In the data preprocessing stage, raw network traffic data are processed to ensure data quality and consistency prior to model training. In the feature extraction stage, the preprocessed feature vectors are fed into the MLP layers, which learn compact representations of the traffic patterns. Finally, in the classification stage, a linear SoftMax classifier maps the learned feature embeddings to multiple attack classes for final prediction. The mathematical notations used in this work are shown in Table 2.

3.1. Data Preprocessing

Let $(X, Y)$ denote the original dataset, where $X = \{x_{1}, x_{2}, \dots, x_{n}\} \in \mathbb{R}^{n \times d}$ represents the input feature matrix containing n samples and d features, and $Y = \{y_{1}, y_{2}, \dots, y_{c}\}$ denotes the corresponding class labels. The label set Y comprises c traffic classes that represent different types of normal and malicious network activities. During data preprocessing, several operations are performed to ensure data quality and suitability for model training. First, missing values are removed to prevent potential bias and skewness in the learning process. Subsequently, rows containing non-finite values are filtered out to avoid numerical instability during model training. The percentage of samples eliminated due to missing and non-finite values was extremely low and had no noticeable effect on the distribution of the entire dataset. After cleaning the dataset, categorical class labels are converted to numerical values using LabelEncoder, which maps each class label to an integer in the range $\{0, 1, \dots, c - 1\}$.
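The cleaning and label-encoding steps described above can be sketched without any dependencies. This is only an illustration under stated assumptions: the helper names `drop_invalid_rows` and `encode_labels` and the toy rows are hypothetical, and the integer codes are assigned in sorted class order, which is how scikit-learn's `LabelEncoder` behaves (the text names LabelEncoder but does not say which library it comes from).

```python
import math


def drop_invalid_rows(rows, labels):
    """Keep only samples whose features are all present and finite."""
    kept_rows, kept_labels = [], []
    for row, label in zip(rows, labels):
        if all(v is not None and math.isfinite(v) for v in row):
            kept_rows.append(row)
            kept_labels.append(label)
    return kept_rows, kept_labels


def encode_labels(labels):
    """Map each distinct class name to an integer in {0, ..., c-1} (sorted order)."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


# Toy data (hypothetical values, not taken from any benchmark dataset).
rows = [[0.5, 10.0], [float("nan"), 3.0], [1.5, float("inf")], [2.0, None], [0.1, 7.0]]
labels = ["Benign", "DDoS", "Bot", "DDoS", "DDoS"]

clean_rows, clean_labels = drop_invalid_rows(rows, labels)
encoded, classes = encode_labels(clean_labels)
```

Only the first and last toy rows survive the filter, so the remaining labels `Benign` and `DDoS` are encoded as 0 and 1.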

To handle categorical attributes in the feature set, one-hot encoding is applied, transforming categorical variables into binary vectors, thereby making all features numerical. Let the resulting feature matrix after encoding be denoted as $X_{enc}$. Next, feature scaling is performed using Min–Max normalization to transform the feature values into a consistent range. Specifically, each feature value is normalized to the interval $[0, 1]$ using the following transformation:

$$X_{norm} = \frac{X_{enc} - X_{min}}{X_{max} - X_{min}} \tag{1}$$

where $X_{min}$ and $X_{max}$ represent the minimum and maximum values of each feature, respectively. The final preprocessed dataset can, therefore, be represented as $(X_{p}, Y)$, where $X_{p} = X_{norm}$. This normalization step helps stabilize the training process and improves the convergence behavior of the neural network.
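A minimal sketch of one-hot encoding followed by the Min–Max transformation in Equation (1) is given below. The function names and the toy `protocol` column are hypothetical. Two choices are assumptions not stated in the text: constant columns (where the maximum equals the minimum) are mapped to 0 to avoid division by zero, and the per-feature minima and maxima are returned so the same statistics could be reused on held-out data.

```python
def one_hot(values):
    """Turn a categorical column into binary indicator columns (sorted categories)."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values], categories


def min_max_normalize(matrix):
    """Column-wise scaling to [0, 1] as in Equation (1); constant columns map to 0."""
    n_features = len(matrix[0])
    mins = [min(row[j] for row in matrix) for j in range(n_features)]
    maxs = [max(row[j] for row in matrix) for j in range(n_features)]
    normalized = [
        [
            (row[j] - mins[j]) / (maxs[j] - mins[j]) if maxs[j] > mins[j] else 0.0
            for j in range(n_features)
        ]
        for row in matrix
    ]
    return normalized, mins, maxs


# Toy example: one categorical attribute plus two numeric features.
protocol = ["tcp", "udp", "tcp"]
numeric = [[10.0, 200.0], [20.0, 200.0], [30.0, 200.0]]

encoded, categories = one_hot(protocol)
X_enc = [num + hot for num, hot in zip(numeric, encoded)]
X_norm, X_min, X_max = min_max_normalize(X_enc)
```

In this toy case the first numeric feature becomes 0, 0.5, and 1, the constant second feature becomes 0 everywhere, and the one-hot columns are unchanged because they already lie in [0, 1].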

3.2. Feature Extraction

We employ a multi-layer perceptron for feature extraction. The MLP is a popular feedforward neural network architecture consisting of an input layer, one or more hidden layers, and an output layer. The network can learn intricate patterns in the data using a nonlinear activation function after each neuron processes the input through weighted connections. In general, MLPs use forward propagation to generate predictions, backpropagation to update network parameters, and hidden-layer representation learning to extract meaningful patterns from input data.

After the preprocessing stage, the normalized feature matrix is represented as $X_{p} \in \mathbb{R}^{n \times d}$, where n denotes the number of samples, and d represents the number of input features. Each input vector $x_{p}$ is fed into the MLP through the input layer, where the number of input neurons is equal to the feature dimension of the dataset. The proposed model consists of an input layer followed by two hidden layers containing 128 and 64 neurons, respectively, and a final output layer for multi-class classification. The hidden layers employ the ReLU activation function to learn complex nonlinear patterns from IoT traffic data. In particular, the second hidden layer reduces the feature representation to 64 neurons, thereby generating a compact embedding representation of the network traffic features.

The first hidden layer projects the input vector into a higher-dimensional representation, which is defined as

$$h_{1} = \sigma (W_{1} x_{p} + b_{1}) \tag{2}$$

where $W_{1} \in \mathbb{R}^{h \times d}$ and $b_{1} \in \mathbb{R}^{h}$ denote the weight matrix and bias vector of the first layer, respectively, and h denotes the hidden layer dimension. Here, $\sigma (\cdot)$ represents the ReLU activation function, which introduces nonlinearity and allows the model to capture complex feature interactions in network traffic data.

The hidden representation is then transformed into a compact embedding vector through the second linear layer:

$$z = W_{2} h_{1} + b_{2} \tag{3}$$

where $W_{2} \in \mathbb{R}^{e \times h}$ and $b_{2} \in \mathbb{R}^{e}$ denote the parameters of the second layer, and e represents the embedding dimension. The resulting vector $z \in \mathbb{R}^{e}$ serves as a low-dimensional representation of the input traffic features. This embedding vector captures the most relevant information required to distinguish between different network traffic categories. By compressing the input features into a compact representation, the embedding network reduces computational complexity while preserving discriminative characteristics necessary for accurate intrusion detection.
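The embedding network of Equations (2) and (3) can be illustrated with a dependency-free forward pass. This is a sketch and not the authors' PyTorch implementation. Each weight matrix is stored as (outputs × inputs) so that the product W x is well defined. The input dimension `d = 10`, the random initialization, and all helper names are assumptions. The sketch follows Equation (3) as written, with no activation after the second layer, although the text also states that the hidden layers use ReLU.

```python
import random


def linear(weights, bias, x):
    """Compute W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]


def init_layer(n_out, n_in, rng):
    """Small uniform random weights and zero biases (initialization is an assumption)."""
    scale = 1.0 / n_in ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def embed(x_p, params):
    """Equations (2) and (3): h1 = ReLU(W1 x_p + b1), then z = W2 h1 + b2."""
    (w1, b1), (w2, b2) = params
    h1 = relu(linear(w1, b1, x_p))
    return linear(w2, b2, h1)


rng = random.Random(0)
d, h, e = 10, 128, 64                     # d is a hypothetical input dimension
params = [init_layer(h, d, rng), init_layer(e, h, rng)]
x_p = [rng.random() for _ in range(d)]    # stands in for one normalized sample
z = embed(x_p, params)                    # 64-dimensional embedding
```

In a PyTorch setting the same structure would typically be expressed with two `torch.nn.Linear` layers and a `torch.nn.ReLU` between them.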

3.3. Classification Layer and Loss Function

After extracting the compact embedding representation $z \in \mathbb{R}^{e}$ from the feature embedding network, the embedding vector is passed to a linear SoftMax classification layer to predict the final traffic class. The classifier maps the embedding vector to the output space containing C traffic categories (the number of classes, written c in Section 3.1). The output logits are computed as follows:

$$o = W_{3} z + b_{3} \tag{4}$$

where $W_{3} \in \mathbb{R}^{C \times e}$ and $b_{3} \in \mathbb{R}^{C}$ represent the weight matrix and bias vector of the classification layer, respectively, and C denotes the number of traffic classes in the dataset. The vector $o \in \mathbb{R}^{C}$ contains the unnormalized prediction scores (logits) for each class.
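Because the model is just three dense layers (Equations (2) to (4)), its size can be estimated by hand. The sketch below counts trainable parameters and multiply-accumulate operations (MACs) per sample for the 128 and 64 unit layers described in the text. The values `d=78` and `num_classes=7` are hypothetical placeholders, not the dimensions used in the experiments, and the result is not claimed to match the parameter or FLOP counts reported in the paper.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def model_cost(d, num_classes, h=128, e=64):
    """Parameter count and per-sample MACs for d -> h -> e -> num_classes."""
    layers = [(d, h), (h, e), (e, num_classes)]
    params = sum(dense_params(n_in, n_out) for n_in, n_out in layers)
    macs = sum(n_in * n_out for n_in, n_out in layers)
    return params, macs


# Hypothetical input dimension and class count, chosen only for illustration.
params, macs = model_cost(d=78, num_classes=7)
```

With these placeholder dimensions the sketch gives 18,823 parameters and 18,624 MACs per sample. Under the common convention that one MAC counts as two floating-point operations, that is roughly 37 thousand FLOPs per forward pass.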

To obtain the probability distribution over the classes, the SoftMax function is applied to the output logits:

$${\hat{y}}_{i} = \frac{\exp (o_{i})}{\sum_{j = 1}^{C} \exp (o_{j})}, \quad i = 1, 2, \dots, C \tag{5}$$

where ${\hat{y}}_{i}$ represents the predicted probability for class i. The predicted class label is determined by selecting the class with the highest probability.
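Equation (5) and the arg-max decision rule can be sketched as follows. Subtracting the largest logit before exponentiating is a standard numerical-stability measure that leaves the probabilities unchanged. It is an implementation detail assumed here, not something the text specifies, and the function names are illustrative.

```python
import math


def softmax(logits):
    """Equation (5), computed with a max-shift so large logits do not overflow."""
    shift = max(logits)
    exps = [math.exp(o - shift) for o in logits]
    total = sum(exps)
    return [v / total for v in exps]


def predict(logits):
    """Return the index of the most probable class together with all probabilities."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__), probs


label, probs = predict([2.0, 0.5, -1.0])   # class 0 has the largest logit
```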

To train the model, a cross-entropy loss function is used to measure the difference between the predicted probabilities and the true class labels. The loss function is defined as

$$L = - \sum_{i = 1}^{C} y_{i} \log ({\hat{y}}_{i}) \tag{6}$$

where $y_{i}$ represents the ground-truth label encoded in one-hot form, and ${\hat{y}}_{i}$ denotes the predicted probability for class i. Minimizing this loss encourages the model to assign higher probabilities to the correct class labels. The pseudocode for the proposed algorithm is shown in Algorithm 1.

Algorithm 1 Proposed Lightweight MLP-Based Intrusion Detection Model

Require: Original dataset $(X, Y)$, learning rate $\eta$, batch size $B$, number of epochs $E$
Ensure: Predicted class labels $\hat{y}$

1: Preprocess the data to obtain preprocessed dataset $(X_{p}, Y)$
2: Initialize weights $W_{1}, W_{2}, W_{3}$ and biases $b_{1}, b_{2}, b_{3}$
3: for epoch $= 1$ to $E$ do
4:     for each mini-batch $\mathcal{B} \subset X_{p}$ of size $B$ do
5:         for each input sample $x_{p} \in \mathcal{B}$ do
6:             Compute predicted probabilities $\hat{y}$ using the MLP and SoftMax classifier
7:             Compute cross-entropy loss $L$
8:         end for
9:         Update model parameters using AdamW optimizer
10:     end for
11: end for
12: return predicted class labels $\hat{y}$
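The loss computation in steps 6 and 7 of Algorithm 1 can be sketched as below. With a one-hot target, the sum in Equation (6) reduces to the negative log-probability of the true class. Averaging the per-sample losses over the mini-batch is an assumption: it is the default reduction of PyTorch's `CrossEntropyLoss`, but the text does not state which reduction was used. The function names are illustrative.

```python
import math


def log_softmax(logits):
    """Log of Equation (5), computed with the log-sum-exp trick for stability."""
    shift = max(logits)
    log_total = shift + math.log(sum(math.exp(o - shift) for o in logits))
    return [o - log_total for o in logits]


def cross_entropy(logits, target):
    """Equation (6) with a one-hot y: only the true-class term survives."""
    return -log_softmax(logits)[target]


def batch_loss(batch_logits, batch_targets):
    """Mean cross-entropy over one mini-batch (mean reduction is an assumption)."""
    losses = [cross_entropy(o, t) for o, t in zip(batch_logits, batch_targets)]
    return sum(losses) / len(losses)


# Uniform logits over four classes give a loss of log(4), whatever the target.
uniform_loss = cross_entropy([0.0, 0.0, 0.0, 0.0], target=2)
confident_loss = batch_loss([[10.0, 0.0], [0.0, 10.0]], [0, 1])
```

The loss is close to zero when the correct class has a much larger logit than the others and grows as probability mass moves to the wrong classes.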

4. Experimental Setup and Performance Analysis

4.1. Datasets

In this section, we describe the datasets used to evaluate the performance of the proposed intrusion detection model. Three widely used benchmark datasets, CICIDS2017 [33], NSL-KDD [34], and CICIoT2023 [35], are used to assess the effectiveness of the proposed approach across different network intrusion scenarios. Each dataset was divided into 80% for training and 20% for testing to assess the generalization capability of the proposed approach. The training subset was used to learn the model’s parameters, while the testing subset was used to evaluate the final detection performance.
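An 80/20 split of the kind described above can be sketched as follows. The text does not say whether the split was shuffled, seeded, or stratified by class, so the shuffling and the seed value here are assumptions, and `train_test_split_indices` is a hypothetical helper.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle sample indices and reserve a fraction of them for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)    # seeded for reproducibility
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(1000)
```

For heavily imbalanced classes, a stratified split that preserves the class proportions in both subsets is a common alternative.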

4.1.1. CICIDS2017

CICIDS2017 is a large, realistic dataset commonly used to test network intrusion detection systems. The Canadian Institute for Cybersecurity (CIC) created it to address weaknesses in previous intrusion detection datasets. The dataset includes normal network traffic and contemporary attack scenarios that mirror real cyber threats. The data were collected in a controlled network environment over five working days (Monday–Friday) to simulate the normal activities of an organization together with various forms of malicious activity. The dataset includes various network traffic classes, including Benign, PortScan, Hulk, DDoS, FTP, Bot, and Web Attack, among others. This variety of attack types makes CICIDS2017 appropriate for analyzing how robustly intrusion detection frameworks identify different forms of cyber-attack.

4.1.2. NSL-KDD

The NSL-KDD dataset is an improved version of the earlier KDD’99 dataset and is widely used for benchmarking intrusion detection methods. It is designed to represent different types of network traffic behaviors, including both normal and malicious activities. The dataset consists of one normal traffic class and several attack categories, such as neptune, satan, smurf, and portsweep. These attack categories provide a diverse set of intrusion patterns for evaluating the performance of intrusion detection models.

4.1.3. CICIoT2023

The CICIoT2023 dataset is a comprehensive and realistic benchmark for IoT security research introduced by Neto et al. [35]. It is constructed using a variety of network topologies and includes 105 real IoT devices to closely simulate the operation of IoT systems within a smart home setting. The dataset comprises multiple categories of network traffic, namely Benign, DDoS, DoS, MITM, Mirai, and Recon. Among these, Recon attacks are designed to collect detailed information about target devices within the IoT network, while Mirai represents large-scale distributed denial-of-service attacks specifically aimed at compromising IoT infrastructures. These attack types reflect both common and emerging threats observed in modern IoT environments.

4.2. Experimental Setup

All experiments were implemented in Python 3.9.25 using the PyTorch 2.5.1 deep learning framework. Data preprocessing and analysis were performed using commonly used scientific computing libraries, including NumPy 2.0.2, Pandas 2.3.3, Matplotlib 3.9.2, and Seaborn 0.13.2. The experiments were conducted on a system equipped with an Intel Core i5-12600K processor running at 3.69 GHz with 48 GB RAM.

4.3. Model Training and Hyperparameter Settings

The proposed model employs a two-layer multi-layer perceptron architecture with 128 and 64 neurons in the hidden layers, respectively. The hidden layers utilize the ReLU activation function to capture nonlinear relationships in network traffic data, while the output layer applies a SoftMax classifier for multi-class traffic classification. The model was trained for 100 epochs with a batch size of 128. Parameter optimization was performed using the AdamW optimizer with a learning rate of 0.003, and L2 regularization (weight decay of $1 \times 10^{-4}$) was applied to improve generalization and prevent overfitting. The hyperparameters used in the proposed model are summarized in Table 3.
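To show how the learning rate and weight decay enter the parameter update, the sketch below implements a single AdamW step on a list of scalar parameters. The learning rate 0.003 and weight decay 1e-4 come from the text. The values `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` are assumed, since the text does not report them, and the function name and state layout are illustrative. In AdamW the decay shrinks the parameters directly (decoupled weight decay) and is not added to the loss as an L2 penalty.

```python
import math


def adamw_step(params, grads, state, lr=0.003, weight_decay=1e-4,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdamW update; betas and eps are assumed values, not from the paper."""
    state["t"] += 1
    t = state["t"]
    for i, (p, g) in enumerate(zip(params, grads)):
        # Decoupled weight decay: shrink the parameter itself.
        p = p * (1.0 - lr * weight_decay)
        # Exponential moving averages of the gradient and its square.
        state["m"][i] = beta1 * state["m"][i] + (1.0 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1.0 - beta2) * g * g
        # Bias correction for the zero-initialized averages.
        m_hat = state["m"][i] / (1.0 - beta1 ** t)
        v_hat = state["v"][i] / (1.0 - beta2 ** t)
        params[i] = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


params = [1.0, -2.0]
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
params = adamw_step(params, grads=[0.5, -0.5], state=state)
```

On the first step the bias-corrected ratio has magnitude close to one, so each parameter moves by about the learning rate (0.003) against the sign of its gradient, plus a very small shrinkage from the weight decay.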

4.4. Evaluation Metrics

To evaluate the effectiveness of the proposed intrusion detection model, several standard performance metrics were employed, including accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve (AUC-ROC). These metrics provide a comprehensive assessment of the classification performance of the proposed model.

Accuracy measures the overall proportion of correctly classified samples among the total number of samples. It is defined as

A c c u r a c y = \frac{T P + T N}{T P + T N + F P + F N}

(7)

where

T P

(true positive) represents correctly detected attack samples,

T N

(true negative) denotes correctly identified normal samples,

F P

(false positive) refers to normal samples incorrectly classified as attacks, and

F N

(false negative) represents attack samples incorrectly classified as normal traffic.

Precision measures the proportion of correctly predicted attack samples among all predicted attack samples and is defined as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (8)$$

Recall, also known as detection rate, measures the proportion of actual attack samples that are correctly identified by the model:

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (9)$$

The F1 score represents the harmonic mean of precision and recall and provides a balanced measure of the model’s performance:

$$F1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (10)$$
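Equations (7) to (10) can be computed directly from the four confusion counts. The sketch below is illustrative: the function name `binary_metrics` and the example counts are hypothetical, and the zero-denominator fallback to 0.0 is a convention assumed here rather than one stated in the paper.

```python
def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion counts (Eqs. 7-10)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for a single attack class treated as "positive".
metrics = binary_metrics(tp=90, tn=880, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For multi-class detection, the same function can be applied once per class by treating that class as positive and all other classes as negative.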

In addition, the area under the receiver operating characteristic curve (AUC-ROC) is used to evaluate the ability of the model to distinguish between different classes across various threshold values. A higher AUC value indicates better classification performance.
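One way to read the AUC is as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Python; the function names and the toy scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    A tie between a positive and a negative score counts as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(true_classes, class_scores, target):
    """Treat `target` as the positive class and every other class as negative."""
    labels = [1 if c == target else 0 for c in true_classes]
    return roc_auc(labels, class_scores)


# Toy example: one positive (score 0.4) is ranked below one negative (score 0.5).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In a multi-class setting, `class_scores` would hold the predicted probability of the target class for each sample, giving one curve per class as in a one-vs-rest ROC plot.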

4.5. Performance Results

Table 4, Table 5 and Table 6 report the per-class accuracy, precision, recall, and F1 score of the proposed model. On the CICIDS2017 dataset (Table 4), the Benign, DDoS, FTP, Hulk, and PortScan classes all reach accuracy above 99%, with precision, recall, and F1 scores above 0.99. The Bot class is the exception: its precision is 1.00, but its recall falls to 0.6814 (F1 score of 0.8105), which indicates that the model confuses a substantial share of Bot traffic with benign traffic. The Web Attack class shows slightly lower precision (0.9825), although its overall detection performance remains high.

Table 5 shows the results for the NSL-KDD dataset. The Neptune class achieves a precision of 1.00 and an F1 score of 0.9985, and the Normal class also performs well. The PortSweep, Satan, and Smurf classes show lower precision (between 0.92 and 0.95), but their accuracy and F1 scores remain above 0.95. These results show that the proposed model detects the different attack types effectively on both CICIDS2017 and NSL-KDD.

Table 6 presents the per-class performance of the proposed model on the CICIoT2023 dataset. The DDoS, DoS, and Mirai classes achieve accuracy above 99%, with precision, recall, and F1 scores all exceeding 0.99, indicating robust detection of large-scale attack patterns. The Benign class also performs well (recall of 0.9663 and F1 score of 0.9410), though its precision is lower (0.9169) because some attack traffic is misclassified as benign. Performance is weaker for the MITM and Recon classes. The MITM class shows lower recall (0.7015) and F1 score (0.7872), suggesting that the model misclassifies a notable share of these instances, likely because they resemble normal or other attack traffic patterns. Similarly, the Recon class exhibits lower precision (0.7500) and F1 score (0.7740), indicating difficulty in distinguishing reconnaissance activity from other classes.
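As a quick consistency check, the F1 scores of the weaker classes can be recomputed from the precision and recall values reported in Tables 4 to 6. The helper name `f1_from` and the choice of rows are illustrative; the numbers themselves are copied from those tables.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (Eq. 10)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from Tables 4-6.
reported = {
    "CICIDS2017 / Bot": (1.0000, 0.6814, 0.8105),
    "NSL-KDD / Satan": (0.9286, 0.9905, 0.9585),
    "CICIoT2023 / MITM": (0.8967, 0.7015, 0.7872),
    "CICIoT2023 / Recon": (0.7500, 0.7996, 0.7740),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: reported {f1:.4f}, recomputed {f1_from(p, r):.4f}")
```

The recomputed values agree with the reported ones to within the four-decimal rounding of the tables.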

The accuracy curves of the proposed model over the training epochs are illustrated in Figure 2 for the CICIDS2017, NSL-KDD, and CICIoT2023 datasets. The plots show training and test accuracy values over 100 epochs. As shown in Figure 2a, for the CICIDS2017 dataset, training and testing accuracies increase rapidly during the initial epochs and gradually stabilize as training progresses. Although minor fluctuations in testing accuracy are observed at certain epochs, the training and testing curves remain closely aligned. Similarly, Figure 2b illustrates the training performance on the NSL-KDD dataset. The model converges quickly within the first few epochs and reaches maximum accuracy. The training and test accuracy curves closely overlap throughout training, indicating stable learning behavior. Figure 2c shows the training and testing accuracy curves for the CICIoT2023 dataset. As with the other datasets, the model shows steady improvement in accuracy during the initial training epochs. However, compared to CICIDS2017 and NSL-KDD, the convergence is relatively slower due to the higher complexity and diversity of attack patterns in the IoT environment. After the initial phase, both training and testing accuracy continue to increase and eventually stabilize at high values. The close alignment between the training and test curves demonstrates that the model maintains strong generalization performance on the CICIoT2023 dataset despite its heterogeneous nature.

The loss curves of the proposed model across training epochs are illustrated in Figure 3 for the CICIDS2017, NSL-KDD, and CICIoT2023 datasets. The plots show the training and testing losses over 100 epochs. As shown in Figure 3a, for the CICIDS2017 dataset, the training loss decreases rapidly during the initial epochs and gradually stabilizes as the training progresses. Similarly, Figure 3b presents the loss curves for the NSL-KDD dataset. The training and test losses decrease sharply in the early epochs and converge quickly as training continues. Both curves remain closely aligned throughout the training process. Figure 3c illustrates the loss curves for the CICIoT2023 dataset. Training loss decreases steadily during the initial epochs, indicating effective learning of underlying patterns. However, compared to CICIDS2017 and NSL-KDD, the loss reduction is more gradual due to the increased complexity and diversity of IoT traffic. The test loss also shows a decreasing trend in the early stages but exhibits noticeable fluctuations at later epochs. Despite these variations, the overall gap between training and test loss remains moderate, indicating that the model does not suffer from severe overfitting.
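The quantity plotted in such curves is the cross-entropy loss listed in Table 3, averaged over the samples of an epoch. A minimal plain-Python sketch of that computation is given below; the function names and the example logits are hypothetical, and the log-sum-exp form is used here for numerical stability.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of the true class under SoftMax.

    Computed as log(sum(exp(logits))) - logits[target], with the maximum
    logit subtracted first so that exp() cannot overflow.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]


def mean_loss(batch_logits, targets):
    """Average cross-entropy over a batch, as plotted per epoch."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# Hypothetical logits for three samples over three classes.
batch = [[4.0, 0.5, 0.1], [0.2, 3.0, 0.3], [1.0, 1.0, 1.0]]
print(mean_loss(batch, [0, 1, 2]))
```

A confident correct prediction contributes a loss close to zero, while a uniform prediction over C classes contributes log(C), which is why the curves start high and fall as training progresses.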

The normalized confusion matrices in Figure 4 show the relationship between the true and predicted class labels produced by the proposed model. The diagonal elements give the fraction of each class that is correctly classified, whereas the off-diagonal elements correspond to misclassified samples. The diagonal elements across the three datasets show that the proposed model achieves high classification performance for most attack categories. However, the model confuses some specific attack types. For instance, as shown in Figure 4a for the CICIDS2017 dataset, the proposed model misclassifies approximately 32% of the Bot attacks as Benign traffic, while the remaining classes are accurately classified. In contrast, Figure 4b shows that on the NSL-KDD dataset the proposed model classifies each attack category almost perfectly, with very few misclassifications. Figure 4c shows the confusion matrix for the CICIoT2023 dataset. The proposed model classifies most DDoS, DoS, and Mirai traffic correctly. However, some misclassifications are observed in the Benign, MITM, and Recon classes.
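A row-normalized confusion matrix of this kind can be built as sketched below. The function names are hypothetical, and the two-class counts are invented for illustration (they are not the paper's data); they are chosen only so that the second row mirrors a class in which 32% of samples are assigned to the wrong label.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the true class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def normalize_rows(cm):
    """Divide each row by its total so that every row sums to 1."""
    normalized = []
    for row in cm:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


# Hypothetical counts: class 0 = Benign, class 1 = Bot.
counts = [[990, 10], [32, 68]]
for row in normalize_rows(counts):
    print([round(v, 2) for v in row])
```

After row normalization, each diagonal entry equals the recall of the corresponding class, and each off-diagonal entry is the fraction of that class assigned to another label.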

Figure 5 shows the ROC curves of the proposed model for the CICIDS2017, NSL-KDD, and CICIoT2023 datasets. In Figure 5a, the ROC curves for the CICIDS2017 dataset are presented. The curves are located near the top-left corner of the plot. This indicates strong classification performance. Most classes achieve AUC values close to 1.0. The Benign, DDoS, FTP, Hulk, PortScan, and Web Attack classes show perfect discrimination. The Bot class has a slightly lower AUC compared to the other classes. In Figure 5b, the ROC curves for the NSL-KDD dataset are shown. The curves also remain very close to the top-left corner. This indicates strong detection capability for the attack classes. The Neptune, Normal, and Smurf classes achieve an AUC value of 1.0. The PortSweep and Satan classes show slightly lower values. Figure 5c shows the ROC curves for the CICIoT2023 dataset. The curves are mostly close to the top-left corner, indicating strong classification performance. Most classes achieve high AUC values near 1.0, demonstrating effective detection. The DDoS, DoS, and Mirai classes show near-perfect discrimination. However, the Benign, MITM, and Recon classes have slightly lower AUC values compared to the others.

Figure 6 shows the precision–recall curves of the proposed model for the CICIDS2017, NSL-KDD, and CICIoT2023 datasets. In Figure 6a, most classes show precision–recall curves close to the top-right corner. This indicates strong classification performance. The Benign, DDoS, FTP, Hulk, PortScan, and Web Attack classes achieve very high average precision values. The Bot class performs worse than the other classes. The curve for the Bot class decreases as recall increases, indicating that the model struggles to detect Bot attacks in some cases. In Figure 6b, the curves remain close to the top-right corner for most classes, indicating high precision and recall. The Neptune, Normal, and Smurf classes show very high average precision values. The PortSweep and Satan classes show slightly lower values. However, their performance remains strong. In Figure 6c, most classes have curves close to the top-right corner, indicating strong classification performance. The DDoS, DoS, and Mirai classes achieve high average precision values and maintain stable precision across different recall levels. However, the Benign, MITM, and Recon classes show relatively lower performance. In particular, precision decreases as recall increases for the MITM and Recon classes, indicating that the model struggles to accurately distinguish these traffic types.
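The average precision that summarizes a precision-recall curve can be sketched as the mean of the precision values measured at the rank of each positive sample. The function name and the toy scores below are hypothetical; ties in the scores are broken by input order, which is a simplification assumed here.

```python
def average_precision(labels, scores):
    """Mean of precision@k taken at every rank k that holds a positive sample."""
    num_pos = sum(labels)
    if num_pos == 0:
        raise ValueError("average precision needs at least one positive sample")
    # Rank samples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` samples
    return total / num_pos


# Toy example: a negative sample (score 0.8) is ranked above one positive (0.7).
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

A class whose positives are all ranked first obtains an average precision of 1.0, whereas negatives ranked above positives pull precision down as recall increases, which is the falling-curve behavior described for the Bot, MITM, and Recon classes.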

4.6. Computational Complexity Analysis

Table 7 presents the computational complexity and model size of the proposed intrusion detection method. The results show that the proposed model maintains a lightweight structure while achieving strong detection performance. For the CICIDS2017 dataset, the model requires 22,827 parameters with 45,200 FLOPs and 22,600 MAC operations, resulting in a model size of 89.6 KB. The training time is 26.98 min, and the testing time is 1.33 s. For the NSL-KDD dataset, the model uses 13,573 parameters, 26,752 FLOPs, and 13,376 MAC operations, resulting in a model size of 54.4 KB. The training time is 43.67 s, and the testing time is 0.05 s. For the CICIoT2023 dataset, the model requires 14,662 parameters, 28,928 FLOPs, and 14,464 MAC operations, resulting in a model size of 58.6 KB. The training time is 13.52 min, and the testing time is 1.03 s. Compared to CICIDS2017, the model achieves lower computational cost and a smaller size while maintaining efficient performance. These results show that the proposed model requires low computational resources and small storage space. Therefore, the model is suitable for deployment in resource-constrained IoT environments.
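The parameter, MAC, and FLOP counts of a fully connected network follow directly from its layer widths. The sketch below counts them for a d → 128 → 64 → C network, taking one MAC per weight and two FLOPs per MAC. The function name is hypothetical, and the input widths of 38 and 46 features are inferred values that happen to reproduce the NSL-KDD and CICIoT2023 rows of Table 7; they are assumptions, not figures stated in this section.

```python
def mlp_cost(num_features: int, num_classes: int, hidden=(128, 64)) -> dict:
    """Count parameters, MACs and FLOPs of a fully connected network."""
    widths = [num_features, *hidden, num_classes]
    # One multiply-accumulate per weight in each dense layer.
    macs = sum(a * b for a, b in zip(widths, widths[1:]))
    # One bias per output unit of each dense layer.
    biases = sum(widths[1:])
    return {"params": macs + biases, "macs": macs, "flops": 2 * macs}


# Assumed input widths (38 and 46) with 5 and 6 output classes.
print("NSL-KDD   :", mlp_cost(38, 5))
print("CICIoT2023:", mlp_cost(46, 6))
```

The CICIDS2017 row is not reproduced here because the corresponding input width is not stated in this section. Activation functions and the final SoftMax are ignored in this count, which is a common simplification.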

4.7. Comparison Study

Several existing lightweight intrusion detection models are used as comparison methods, including FBMP-IDS [5], DL-BiLSTM [11], CL-SKD [17], LNet-SKD [18], and IRNet-MBSKD [19]. Table 8 compares their performance with that of the proposed model, which achieves strong classification performance on all three datasets. For the CICIDS2017 dataset, the proposed method obtains an accuracy of 99.85% and an F1 score of 99.93%. These results are slightly higher than those of the compared methods. At the same time, the proposed model requires only 13,573 parameters and 26,752 FLOPs. Its FLOP count is far lower than those of the compared models, although DL-BiLSTM [11] uses fewer parameters.

For the NSL-KDD dataset, the proposed model achieves an accuracy of 99.21% and an F1 score of 99.22%. These results are higher than those of the other compared approaches. In addition, the computational complexity remains very low: the FLOP count (26,752) is much smaller than those of the compared deep learning models, although LNet-SKD [18] and IRNet-MBSKD [19] use fewer parameters. For the CICIoT2023 dataset, the proposed model achieves an accuracy of 98.45%, with precision and recall values of 98.45% and an F1 score of 98.41%. These results are substantially higher than those of the compared lightweight models. In addition, the proposed model maintains a compact architecture with only 14,662 parameters and 28,928 FLOPs. It uses far fewer FLOPs than DL-BiLSTM [11] and far fewer parameters than FBMP-IDS [5], although DL-BiLSTM [11] has a smaller parameter count. These results demonstrate that the proposed lightweight architecture provides strong detection capability while maintaining low computational overhead. Therefore, the model is suitable for deployment in resource-constrained IoT environments.
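The relative cost of the compared models can be worked out from the parameter and FLOP columns of Table 8. The sketch below does this for the NSL-KDD rows; the function name `relative_cost` is hypothetical, and the numbers are copied from Table 8.

```python
# (parameters, FLOPs) for the NSL-KDD rows of Table 8, keyed by reference label.
NSL_KDD = {
    "[17]": (14_494, 145_297_408),
    "[18]": (4_940, 194_580),
    "[19]": (5_070, 198_650),
    "Ours": (13_573, 26_752),
}


def relative_cost(entries, baseline="Ours"):
    """Return (parameter ratio, FLOP ratio) of every model versus the baseline."""
    base_params, base_flops = entries[baseline]
    return {
        ref: (params / base_params, flops / base_flops)
        for ref, (params, flops) in entries.items()
        if ref != baseline
    }


for ref, (param_ratio, flop_ratio) in relative_cost(NSL_KDD).items():
    print(f"{ref}: {param_ratio:.2f}x parameters, {flop_ratio:.2f}x FLOPs")
```

A ratio above 1 means the compared model is more expensive than the proposed one on that axis, so the two columns can point in different directions for the same model.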

5. Discussion

The proposed lightweight intrusion detection model detects various attack categories in IoT network data using a two-layer multi-layer perceptron embedding backbone and a linear SoftMax classification head. The model has several advantages, including simplicity, efficiency, and excellent classification performance. The compact MLP design allows the model to learn discriminative representations of network traffic features while being lightweight. This design enables the model to achieve high detection performance with fewer parameters and lower computational overhead than many other deep learning algorithms. The experimental results on the CICIDS2017, NSL-KDD and CICIoT2023 datasets show that the proposed model achieves high classification accuracy while utilizing fewer FLOPs and MAC operations and having a modest model size. These aspects make the model suitable for use in resource-constrained IoT environments with limited memory and processing power. Furthermore, the comparatively low inference time enables more rapid identification of malicious traffic, which is critical for real-time network security monitoring. Moreover, unlike many existing lightweight IDS approaches that rely on post-training optimization techniques such as quantization and pruning, the proposed model achieves a compact design and strong detection performance without requiring additional optimization steps, thereby reducing implementation complexity and improving suitability for real-time IoT deployment.

Although it has several advantages, the proposed method has several limitations. The model performs poorly at detecting some attack types that exhibit traffic patterns similar to those of benign network behavior. This suggests that lightweight models continue to struggle to recognize subtle variations across traffic classes. Furthermore, the current design relies on a simplistic feature-embedding structure, limiting the model’s ability to capture the complex traffic patterns in large-scale IoT networks. Future work will employ additional strategies to optimize detection capability while maintaining computational efficiency. Advanced feature selection, model pruning, quantization, and parameter optimization will be used to reduce the number of parameters and overall model size. These enhancements would make the model more viable for real-world IoT applications with limited resources.

6. Conclusions

To address security challenges in IoT networks, this work proposes a lightweight intrusion detection model based on a two-layer multi-layer perceptron with an embedding backbone, followed by a linear SoftMax classification head for multi-class attack detection. The proposed approach learns compact, discriminative representations of network traffic attributes while remaining lightweight in architecture. The approach optimizes intrusion detection for resource-constrained IoT systems by minimizing the number of parameters and computational operations. The suggested method performs well on the CICIDS2017, NSL-KDD, and CICIoT2023 datasets, achieving accuracies of 99.85%, 99.21%, and 98.45%, respectively, while maintaining minimal computational complexity, parameter count, and model size.

Future studies will focus on improving the generalization capability of the proposed model across more varied and extensive network scenarios to further increase its applicability in real-world IoT systems. This includes testing the model on additional real-world datasets and strengthening it against complex, emerging cyber threats. Future studies will also investigate optimization techniques, such as feature selection, parameter tuning, model pruning, and quantization, to minimize the number of parameters and overall model size while maintaining strong detection performance. In addition, future work will explore imbalance-handling techniques to improve the detection performance of minority classes, which often exhibit lower classification accuracy. Furthermore, real-time deployment of the proposed model will be considered to validate its effectiveness and efficiency in practical IoT environments.

Author Contributions

Conceptualization, J.C.; data curation, J.C.; funding acquisition, J.A.; investigation, J.A.; methodology, J.C.; project administration, J.A.; resources, J.A.; software, J.C.; supervision, J.A.; validation, J.A.; visualization, J.A.; writing—original draft, J.C.; writing—review and editing, J.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets analyzed in this study are publicly available from the following sources: (i) University of New Brunswick (UNB) CICIDS2017 dataset: https://www.unb.ca/cic/datasets/ids-2017.html, accessed on 21 February 2026; (ii) University of New Brunswick (UNB) NSL-KDD dataset: https://www.kaggle.com/datasets/hassan06/nslkdd, accessed on 21 February 2026; (iii) University of New Brunswick (UNB) CICIoT2023 dataset: https://www.kaggle.com/datasets/himadri07/ciciot2023, accessed on 4 March 2026.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Vishwakarma, A.K.; Chaurasia, S.; Kumar, K.; Singh, Y.N.; Chaurasia, R. Internet of Things Technology, Research, and Challenges: A Survey. Multimed. Tools Appl. 2025, 84, 8455–8490.
2. Secnora. BADBOX 2.0 Android Malware Infects Millions of Consumer Devices. 18 July 2025. Available online: https://secnora.com/blog/badbox-2-0-android-malware-infects-millions-of-consumer-devices/ (accessed on 4 March 2026).
3. Seals, T. Reaper Botnet Has Come for the Internet. 21 October 2017. Available online: https://www.infosecurity-magazine.com/news/reaper-botnet-has-come-for-the/ (accessed on 4 March 2026).
4. Rahman, M.; Al Shakil, S.; Mustakim, M.R. A Survey on Intrusion Detection System in IoT Networks. Cyber Secur. Appl. 2025, 3, 100082.
5. Sakraoui, S.; Ahmim, A.; Derdour, M.; Ahmim, M.; Namane, S.; Ben Dhaou, I. FBMP-IDS: FL-Based Blockchain-Powered Lightweight MPC-Secured IDS for 6G Networks. IEEE Access 2024, 12, 105887–105905.
6. Hozouri, A.; Mirzaei, A.; Effatparvar, M. A Comprehensive Survey on Intrusion Detection Systems with Advances in Machine Learning, Deep Learning and Emerging Cybersecurity Challenges. Discov. Artif. Intell. 2025, 5, 314.
7. Xu, Z.; Wu, Y.; Wang, S.; Gao, J.; Qiu, T.; Wang, Z.; Wan, H.; Zhao, X. Deep Learning-Based Intrusion Detection Systems: A Survey. arXiv 2025, arXiv:2504.07839.
8. Aldaej, A.; Ahanger, T.A.; Ullah, I. Deep Learning-Inspired IoT-IDS Mechanism for Edge Computing Environments. Sensors 2023, 23, 9869.
9. Rawat, M.; Singal, G. Surveying Technology Fusion in IoT Networks for IDS: Exploring Datasets, Tools, Challenges, and Research Prospects. ACM Trans. Intell. Syst. Technol. 2025, 16, 107.
10. Ali, J.; Song, H.H.; Sharma, V.; Al-Khasawneh, M.A. DDoS Intrusions Detection in Low Power SD-IoT Devices Leveraging Effective Machine Learning. IEEE Trans. Consum. Electron. 2024, 71, 343–351.
11. Wang, Z.; Chen, H.; Yang, S.; Luo, X.; Li, D.; Wang, J. A Lightweight Intrusion Detection Method for IoT Based on Deep Learning and Dynamic Quantization. PeerJ Comput. Sci. 2023, 9, e1569.
12. Omarov, B.; Auelbekov, O.; Suliman, A.; Zhaxanova, A. CNN-BiLSTM Hybrid Model for Network Anomaly Detection in Internet of Things. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 349.
13. Fatima, M.; Rehman, O.; Rahman, I.M.H.; Ajmal, A.; Park, S.J. Towards Ensemble Feature Selection for Lightweight Intrusion Detection in Resource-Constrained IoT Devices. Future Internet 2024, 16, 368.
14. Wang, Z.; Zhou, R.; Yang, S.; He, D.; Chan, S. A Novel Lightweight IoT Intrusion Detection Model Based on Self-Knowledge Distillation. IEEE Internet Things J. 2025, 12, 16912–16930.
15. Govindarajan, V.; Ahmed, F.; Faheem, Z.B.; Bilal, M.; Ayadi, M.; Ali, J. Aegis-5: A Hybrid Ensemble Framework for Intrusion Detection in Industry 5.0 Driven Smart Manufacturing Environment. In ACM Transactions on Autonomous and Adaptive Systems; Association for Computing Machinery: New York, NY, USA, 2026.
16. Benaddi, H.; Jouhari, M.; Elharrouss, O. A Lightweight Hybrid Approach for Intrusion Detection Systems Using a Chi-Square Feature Selection Approach in IoT. Internet Things 2025, 32, 101624.
17. Li, Z.; Yao, W. A Two-Stage Lightweight Approach for Intrusion Detection in Internet of Things. Expert Syst. Appl. 2024, 257, 124965.
18. Yang, S.; Zheng, X.; Xu, Z.; Wang, X. A Lightweight Approach for Network Intrusion Detection Based on Self-Knowledge Distillation. In Proceedings of the IEEE International Conference on Communications (ICC), Rome, Italy, 28 May–1 June 2023; IEEE: New York, NY, USA, 2023; pp. 3000–3005.
19. Feng, S.; Ma, S.; Ma, M. A Lightweight Network Intrusion Detection Method Based on Protocol-Aware Dynamic Inverted Residuals and a Sliding-Window Multi-Batch Self-Knowledge Distillation Strategy. In Proceedings of the Guangdong-Hong Kong-Macao Greater Bay Area Education Digitalization and Computer Science International Conference (EDCS), Shenzhen, China, 18–19 April 2025; Association for Computing Machinery: New York, NY, USA, 2025; pp. 833–838.
20. Khan, A.; Hussain, M.A.; Anwer, F. A Hybrid Lightweight Deep Learning-Based Intrusion Detection Approach in IoT Utilizing Feature Selection and Explainable Artificial Intelligence. IEEE Access 2025, 13, 192451–192466.
21. Yang, X.; Tong, F.; Jiang, F.; Cheng, G. A Lightweight and Dynamic Open-Set Intrusion Detection for Industrial Internet of Things. IEEE Trans. Inf. Forensics Secur. 2025, 20, 2930–2943.
22. Ma, W.; Wang, X.; Dong, J.; Hu, M.; Zhou, Q. A Lightweight Method for Botnet Detection in Internet of Things Environment. IEEE Trans. Netw. Sci. Eng. 2025, 12, 2458–2472.
23. Zhang, X.; Wang, Y.; Han, G.; Gui, G. Advanced Few-Shot Network Intrusion Detection Method Using Lightweight Transfer Learning. IEEE Internet Things J. 2025, 12, 48678–48688.
24. Fard, E.; Soltani, M.; Jahangir, A.H.; Ko, S. LightIDS: A Lightweight Neural Network-Based Intrusion Detection System. J. Supercomput. 2026, 82, 18.
25. Wang, L.-H.; Dai, Q.; Du, T.; Chen, L.-F. Lightweight Intrusion Detection Model Based on CNN and Knowledge Distillation. Appl. Soft Comput. 2024, 165, 112118.
26. Iliyasu, A.S.; Siddiqui, A.J.; Song, H.; Abdu, F.J. PNet-IDS: A Lightweight and Generalizable Convolutional Neural Network for Intrusion Detection in Internet of Things. IEEE Access 2025, 13, 102624–102639.
27. Pasquini, A.; Vasa, R.; Logothetis, I.; Habibi Gharakheili, H.; Chambers, A.; Tran, M. Robust and Lightweight Modeling of IoT Network Behaviors From Raw Traffic Packets. IEEE Trans. Mach. Learn. Commun. Netw. 2025, 3, 98–116.
28. Alhassan, A.M. Self-Adaptive Lightweight Attention Module-Based BiLSTM Model for Effective Intrusion Detection. Arab. J. Sci. Eng. 2025, 50, 11513–11538.
29. Dontu, S.; Vallabhaneni, R.; Addula, S.R.; Pareek, P.K.; Balassem, Z.A. Cybersecurity Framework Development for BoTNet Attack Detection Using ISSOA-Based Attention Recurrent Autoencoder. In Proceedings of the International Conference on Intelligent Algorithms for Computational Intelligence Systems (IACIS); Association for Computing Machinery: New York, NY, USA, 2024; pp. 1–7.
30. Vadakkethil, S.E.; Polimetla, K.; Alsalami, Z.; Pareek, P.K.; Kumar, D. Mayfly Optimization Algorithm with Bidirectional Long Short-Term Memory for Intrusion Detection System in Internet of Things. In Proceedings of the Third International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE), Ballari, India, 26–27 April 2024; IEEE: New York, NY, USA, 2024; pp. 1–4.
31. Jablaoui, A.; Liouane, N. GA-CNN-BiGRU-IDS: A Robust Framework for Intrusion Detection System Based on GA for Data Augmentation and Hybrid CNN-BiGRU Model for Spatiotemporal Feature Extraction. Comput. Electr. Eng. 2026, 130, 110900.
32. Xiang, Y.; Li, D.; Meng, X.; Dong, C.; Qin, G. ResNeSt-biGRU: An Intrusion Detection Model Based on Internet of Things. Comput. Mater. Contin. 2024, 79, 1005–1023.
33. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. In Proceedings of the International Conference on Information Systems Security and Privacy (ICISSP), Funchal, Portugal, 22–24 January 2018; pp. 108–116.
34. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A Detailed Analysis of the KDD CUP 99 Data Set. In Proceedings of the IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), Ottawa, ON, Canada, 8–10 July 2009; IEEE: New York, NY, USA, 2009; pp. 1–6.
35. Neto, E.C.P.; Dadkhah, S.; Ferreira, R.; Zohourian, A.; Lu, R.; Ghorbani, A.A. CICIoT2023: A Real-Time Dataset and Benchmark for Large-Scale Attacks in IoT Environment. Sensors 2023, 23, 5941.

Figure 1. Overall system architecture.

Figure 2. Accuracy curves of the proposed model in different datasets. (a) CICIDS2017; (b) NSL-KDD; (c) CICIoT2023.

Figure 3. Loss curves of the proposed model in different datasets. (a) CICIDS2017; (b) NSL-KDD; (c) CICIoT2023.

Figure 4. Confusion matrix of the proposed model in different datasets. (a) CICIDS2017; (b) NSL-KDD; (c) CICIoT2023.

Figure 5. ROC curves of the proposed model in different datasets. (a) CICIDS2017; (b) NSL-KDD; (c) CICIoT2023.

Figure 6. Precision–recall curves of the proposed model in different datasets. (a) CICIDS2017; (b) NSL-KDD; (c) CICIoT2023.

Table 1. Comparison of existing lightweight IoT intrusion detection methods.

Ref.	Feature Selection	Model	Disadvantages
[11]	Incremental PCA (IPCA)	DNN + BiLSTM	Dynamic quantization reduces the optimal detection capability of the trained model. The dataset does not represent highly dynamic IoT environments.
[14]	Binary Harris Hawk Optimization (bHHO)	TBCLNN with SKD	Performance strongly depends on the optimization process. Complex lightweight convolution structures increase model design complexity.
[16]	Chi-square ($\chi^{2}$)	CNN + BiLSTM	Limited scalability in large IoT environments. Reduced robustness under dynamic network conditions. Dataset imbalance affects detection performance.
[20]	Genetic algorithm (GA)	DNN + BiLSTM	Quantization introduces accuracy degradation. The dataset does not fully represent real IoT traffic variability.
[21]	–	VAE + Extreme Value Machine	The EVM classifier produces unstable accuracy and F1 score. Removing the EVT component significantly degrades detection performance.
[22]	CDF ranking + Gini importance	MLP, CNN, LSTM, RF	Detection accuracy decreases when the number of selected features is reduced. Performance depends heavily on the selected classifier.
[23]	Kernel-based nonlinear feature selection	VGG-based model	Heavy dependence on pretrained models reduces adaptability to new traffic distributions. Transfer learning increases training complexity.
[24]	–	DNN	Design space exploration requires extensive computational search. Model tuning significantly increases development time.
[25]	–	CNN with Knowledge Distillation	The method focuses mainly on intrusion detection rather than attack family classification. Simple oversampling does not effectively solve dataset imbalance.
[26]	–	Lightweight CNN (PNet-IDS)	Performance degrades in large-scale IoT environments. Model struggles with complex traffic patterns.
[27]	Packet embeddings	1D CNN	Lower predictive accuracy compared to deep models. Requires centralized server processing for improved classification.
[28]	–	Attention-based BiLSTM	The model suffers from scalability issues with large network traffic volumes. High computational overhead for large datasets.
[17]	–	CL-SKD	The model contains many hyperparameters that require manual tuning. Hyperparameter selection increases training complexity.
[18]	–	LNet-SKD	Lightweight architecture reduces feature representation capability. Knowledge distillation introduces additional training overhead.
[19]	–	IRNet-MBSKD	Protocol-aware architecture increases model complexity. Multi-batch self-distillation increases training time.

Table 2. Mathematical notation used in the proposed model.

Symbol	Description
n	Number of samples in the dataset
d	Number of input features
C	Number of traffic classes
X	Original feature matrix
Y	Class label vector
$X_{p}$	Preprocessed and normalized feature matrix
$x_{p}$	Input feature vector
$h_{1}$	Hidden representation of the first layer
z	Compact embedding vector from the second hidden layer
$W_{1}$	Weight matrix of the first hidden layer
$b_{1}$	Bias vector of the first hidden layer
$W_{2}$	Weight matrix of the second hidden layer
$b_{2}$	Bias vector of the second hidden layer
$W_{3}$	Weight matrix of the classification layer
$b_{3}$	Bias vector of the classification layer
$\sigma(\cdot)$	Activation function (ReLU)
$\hat{y}$	Predicted class probability vector
o	Output logits before SoftMax
L	Cross-entropy loss function

Table 3. Hyperparameter settings of the proposed model.

Parameter	Value
Hidden layer 1 neurons	128
Hidden layer 2 neurons	64
Activation function	ReLU
Optimizer	AdamW
Learning rate	0.003
Weight decay (L2 regularization)	$1 \times 10^{- 4}$
Batch size	128
Epochs	100
Loss function	Cross-entropy

Table 4. Per-class performance metrics on CICIDS2017.

Class	Accuracy	Precision	Recall	F1-Score
Benign	0.9971	0.9961	0.9971	0.9966
Bot	0.6814	1.0000	0.6814	0.8105
DDoS	0.9997	0.9994	0.9997	0.9995
FTP	0.9983	0.9983	0.9983	0.9983
Hulk	0.9989	0.9955	0.9989	0.9972
PortScan	0.9996	0.9992	0.9996	0.9994
Web Attack	0.9912	0.9825	0.9912	0.9868

Table 5. Per-class performance metrics on NSL-KDD.

Class	Accuracy	Precision	Recall	F1-Score
Neptune	0.9970	1.0000	0.9970	0.9985
Normal	0.9911	0.9958	0.9911	0.9933
PortSweep	0.9643	0.9474	0.9643	0.9558
Satan	0.9905	0.9286	0.9905	0.9585
Smurf	0.9753	0.9405	0.9753	0.9576

Table 6. Per-class performance metrics on CICIoT2023.

Class	Accuracy	Precision	Recall	F1-Score
Benign	0.9663	0.9169	0.9663	0.9410
DDoS	0.9984	0.9989	0.9984	0.9986
DoS	0.9982	0.9970	0.9982	0.9976
MITM	0.7015	0.8967	0.7015	0.7872
Mirai	0.9973	0.9991	0.9973	0.9982
Recon	0.7996	0.7500	0.7996	0.7740

Table 7. Computational complexity and model size of the proposed method.

Dataset	Training Time	Testing Time (s)	No. Parameters	FLOPs	MACs	Model Size (KB)
CICIDS2017	26.98 min	1.33	22,827	45,200	22,600	89.6
NSL-KDD	43.67 s	0.05	13,573	26,752	13,376	54.4
CICIoT2023	13.52 min	1.03	14,662	28,928	14,464	58.6

Table 8. Performance comparison with existing lightweight IDS models.

Dataset	Ref.	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)	No. Parameters	FLOPs
CICIDS2017	[11]	99.67	99.54	99.67	99.59	1654	610,800
	[17]	99.84	99.84	99.84	99.84	14,494	145,297,408
	Ours	99.85	99.89	99.85	99.93	13,573	26,752
NSL-KDD	[17]	97.55	97.50	97.55	97.49	14,494	145,297,408
	[18]	98.66	95.22	85.68	89.03	4940	194,580
	[19]	98.79	95.37	87.33	91.32	5070	198,650
	Ours	99.21	99.24	99.21	99.22	13,573	26,752
CICIoT2023	[11]	93.13	91.80	93.13	91.94	1988	628,800
	[5]	91.35	77.41	-	-	206,602	-
	Ours	98.45	98.45	98.45	98.41	14,662	28,928


