4.1. Dataset
To validate the effectiveness and generalization ability of the proposed intrusion detection model, three representative datasets are selected in this paper: NSL-KDD, CSE-CIC-IDS2018, and TON-IoT. These datasets cover traditional attack scenarios, modern cyber threats, and IoT environments, allowing us to assess the stability and adaptability of the model across multiple application scenarios.
The NSL-KDD dataset [43] is a revised version of the widely known KDD99 dataset, encompassing four different types of attacks: DoS, Probe, U2R, and R2L. It consists of four sub-datasets, with the full training set containing 308,071 records and the full test set containing 28,954 records. Each record includes 43 features, 41 of which describe the traffic itself, while the remaining two correspond to the label and the difficulty level. We use this dataset to evaluate the model's ability to detect traditional structured network attacks, providing a benchmark for subsequent performance comparisons.
The CICIDS2018 dataset [44] was developed under the guidance of the Canadian Institute for Cybersecurity (CIC). It consists of network traffic data collected over five consecutive days, during which multiple cyber-attacks and normal traffic were simulated in a controlled experimental environment. The dataset is highly realistic and diverse, with each entry containing detailed characteristics of the network traffic, including packet sizes, protocol types, and source and destination IP addresses. In total, the dataset includes fifteen distinct attack scenarios, and over seventy features were extracted using CICFlowMeter-V3. We use this dataset to examine the performance of the model against modern, sophisticated network attacks (e.g., DDoS, brute force, and penetration attacks).
The TON-IoT dataset [45,46,47,48,49,50,51,52] is a next-generation dataset designed for evaluating the accuracy and efficiency of AI-based cybersecurity applications in the context of Industry 4.0 and the Industrial Internet of Things (IIoT). It leverages a variety of IoT and IIoT sensors to capture and record telemetry data. The dataset is built around interconnected network elements and IoT systems organized in a three-layer architecture comprising edge, fog, and cloud computing, which enables the simulation of real-world IoT networks in production environments. We downloaded the heterogeneous data sources collected from the Windows 10 operating system, which were then processed and filtered to generate standard features and labels, making the dataset more accessible for research purposes. After feature alignment and normalization, we verify the generalization capability of the model in heterogeneous IoT environments, with particular emphasis on its robustness in the presence of significant device-level data diversity.
For the above datasets, this paper applies a unified preprocessing pipeline consisting of one-hot encoding of categorical features, min-max normalization, and class balancing where necessary. Through this joint validation on multi-source datasets, the paper demonstrates the adaptability and effectiveness of the proposed method under different network environments and attack types.
4.2. Data Processing
In the data preprocessing phase, we first performed data cleaning to ensure dataset quality by handling outliers and missing values. For the NSL-KDD dataset, we applied one-hot encoding to non-numeric features and to numeric features with values greater than 100, converting them into multi-dimensional binary vector representations. For numeric features with values less than 100, we applied min-max normalization, scaling the values to the range [0, 1].
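As a concrete illustration, the following sketch shows how such a split encoding scheme could be implemented with pandas and scikit-learn. The column handling and the value-100 threshold check are illustrative assumptions rather than our exact pipeline; in particular, wide-range numeric columns are treated directly as categorical here for brevity.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def preprocess_nsl_kdd(df: pd.DataFrame) -> pd.DataFrame:
    # Categorical columns (e.g., protocol_type, service, flag) are one-hot encoded.
    cat_cols = df.select_dtypes(include="object").columns.tolist()
    num_cols = df.select_dtypes(exclude="object").columns

    # Numeric features whose maximum exceeds 100 are also one-hot encoded;
    # a practical pipeline would bin them first to limit dimensionality.
    wide_cols = [c for c in num_cols if df[c].max() > 100]
    small_cols = [c for c in num_cols if c not in wide_cols]

    df = pd.get_dummies(df, columns=cat_cols + wide_cols)

    # Remaining numeric features are scaled to [0, 1] with min-max normalization.
    df[small_cols] = MinMaxScaler().fit_transform(df[small_cols])
    return df
```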
For the CICIDS2018 dataset, we first converted the Timestamp feature into a numeric form usable for modeling. We then applied standard (z-score) normalization across all features to ensure a consistent scale during model training, minimizing the potential impact of scale discrepancies on model performance.
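A minimal sketch of this step, assuming the released CSV timestamps can be parsed by pd.to_datetime and that the label column is named Label (both are assumptions about the file layout):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Parse the Timestamp column and convert it to Unix seconds.
df["Timestamp"] = pd.to_datetime(df["Timestamp"]).astype("int64") // 10**9

# z-score normalization across all feature columns (label column excluded).
features = df.columns.drop("Label")
df[features] = StandardScaler().fit_transform(df[features])
```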
For the ToN-IoT dataset, which was already preprocessed and normalized upon official release, our main preprocessing step was removing samples with missing values to ensure data integrity. We then applied min-max normalization to the numerical features, scaling them to the range [0, 1] to offset the impact of differing magnitudes on model training. We selected the TYPE column as the target label for the supervised learning task and extracted it for model training; to avoid redundancy, the remaining label column was removed from the feature set.
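The corresponding steps could look as follows; the column names type and label reflect the released ToN-IoT schema, but the exact filtering shown here is a simplified assumption:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = df.dropna()                      # remove samples with missing values
y = df.pop("type")                    # multi-class target (normal, ddos, ...)
df = df.drop(columns=["label"])       # drop the redundant binary label column

# Scale numeric features to [0, 1].
num_cols = df.select_dtypes("number").columns
df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])
```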
Given the notable class imbalance present in some datasets, we employed the SMOTETomek method [53] for combined resampling. This technique first uses SMOTE (Synthetic Minority Over-sampling Technique) to oversample the underrepresented classes by generating synthetic samples. The Tomek-link method is then applied to identify and remove pairs of samples that are close to each other but belong to different classes. This combination effectively balances the class distribution, increasing the number of minority-class samples while also cleaning up the dataset's class-boundary structure. As a result, the classification model's stability and accuracy during training are improved. We note, however, that this method significantly improves accuracy on the training set, while the improvement on the test set is comparatively limited.
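This combined resampling is available out of the box in the imbalanced-learn library; a minimal sketch, assuming a feature matrix X_train and labels y_train that are already encoded and scaled (the seed value is illustrative):

```python
from imblearn.combine import SMOTETomek

# SMOTE oversamples the minority classes, then Tomek links prune ambiguous
# boundary pairs; only the training split should be resampled.
resampler = SMOTETomek(random_state=77)
X_res, y_res = resampler.fit_resample(X_train, y_train)
```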
4.3. Data Visualization
In IDS datasets, particularly the NSL-KDD dataset, a significant class imbalance is evident, with the quantitative disparity between certain classes reaching tens of thousands of samples. This imbalance leads to an asymmetric data distribution, which can negatively impact model generalization and detection performance. Although the original dataset is partitioned into training and test sets, our analysis using box plot visualizations reveals substantial discrepancies in the feature distributions between these sets, further exacerbating the asymmetry. Notably, features such as Feature 33, Feature 34, and Feature 36 exhibit pronounced distributional differences across the training and test sets, as shown in Figure 5. Addressing this issue by ensuring a more symmetric data distribution is crucial for improving the robustness and reliability of the model in intrusion detection.
This disparity introduces the risk of the model mislearning, particularly with respect to distinguishing between the “normal” class, the “u2r” (user-to-root) attack class, and the “r2l” (remote-to-local) attack class. Simply removing these features could lead to incomplete learning of certain categories, potentially affecting the overall accuracy of the model. To mitigate this bias, we chose to merge the training and test sets and then re-partition the combined dataset to ensure a consistent data distribution, thereby improving the model's learning process.
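A sketch of this merge-and-repartition step, assuming pandas DataFrames train_df and test_df with identical columns; the 80/20 ratio and seed are illustrative assumptions, and stratification keeps class proportions consistent across the new splits:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

full = pd.concat([train_df, test_df], ignore_index=True)
X, y = full.drop(columns=["label"]), full["label"]

# Stratified re-partition so both splits share the same class distribution.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=77
)
```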
After re-partitioning the concatenated dataset and analyzing the feature distributions before and after applying SMOTETomek, we observed (as shown in Figure 6) that the preprocessing step significantly enhances the symmetry of the data distribution, making the class distribution more balanced and uniform. This improvement ensures that the model is trained on a more representative dataset, thereby enhancing its generalization ability. However, for datasets with inherently imbalanced distributions, we recommend integrating additional data augmentation techniques to further improve model performance.
Furthermore, a comparison of the confusion matrices before and after sampling (as illustrated in Figure 6) reveals a notable improvement in the detection accuracy of minority classes. Before applying SMOTETomek, the model tended to misclassify minority attack types as majority attack types, leading to low recall and precision and degrading the detection capability for low-frequency attack types. After applying SMOTETomek, the detection accuracy of minority classes improved significantly. This result indicates that a more symmetric data distribution contributes to a more balanced decision boundary, ultimately enhancing the overall detection performance of the model.
4.7. Comparison of SE-DWNet Performance with Other Models
In this experiment, we used the Adam optimizer with a learning rate of 0.0005. Adam is particularly well suited to sparse data, as it dynamically adapts the learning rate of each individual parameter. After SMOTETomek-integrated sampling, the NSL-KDD dataset consists of five categories containing 61,585, 61,621, 61,631, 61,628, and 61,606 samples, respectively.
The model was trained for 100 epochs with a batch size of 1000, with the learning rate held at 0.0005; the best-performing model parameters were recorded during training. The best model, as determined by the recorded parameters, was then evaluated through 5-fold cross-validation (K = 5).
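A condensed PyTorch-style sketch of this loop, assuming the model, loss criterion, DataLoaders, and evaluation function are already defined (all names here are illustrative):

```python
import copy
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
best_state, best_acc = None, 0.0

for epoch in range(100):
    model.train()
    for xb, yb in train_loader:          # batches of 1000 samples
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

    # Record the parameters of the best-performing model so far.
    acc = evaluate(model, val_loader)
    if acc > best_acc:
        best_acc, best_state = acc, copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)    # restore the best checkpoint
```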
For the loss function, we used focal loss with the focusing parameter γ set to 2, which addresses class imbalance by placing more emphasis on hard-to-classify instances. The final performance metrics are presented in Table 3, and the corresponding confusion matrix is shown in Figure 7. The results indicate that our model consistently outperforms the other models in all evaluation metrics.
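For reference, a minimal multi-class focal loss can be written in a few lines of PyTorch; this sketch assumes raw logits and integer class targets, and omits the optional per-class weighting term α:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma: float = 2.0):
    # Standard cross-entropy per sample, kept unreduced.
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)                  # probability of the true class
    # Down-weight easy examples; hard ones (small pt) dominate the loss.
    return ((1.0 - pt) ** gamma * ce).mean()
```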
To comprehensively evaluate the performance of the proposed model on the multi-class intrusion detection task, this paper adopts a ‘one-vs-rest’ strategy for each category in the NSL-KDD dataset, as shown in Table 4. Measured by AUC-ROC and AUC-PR, the model attains near-perfect performance in the majority of categories, particularly Normal, DoS, and Probe. Notably, the model shows a substantial improvement over prevailing methods in identifying infrequent attack categories (e.g., U2R and R2L). This indicates that the residual attention mechanism and the integrated sampling strategy proposed in this study play a pivotal role in enhancing the model's discriminative capability.
To construct a new dataset with both diversity and representativeness, we performed sample extraction and processing on the CICIDS2018 dataset. Given the large size of the dataset, we extracted a fixed number of samples from each category within each file. The number of extracted samples per category is presented in Table 5, ensuring balanced representation across all categories. Additionally, because of the large sample size of the ‘Benign’ category, we experimented with SMOTETomek for combined resampling. The results indicate that this hybrid sampling improves training-set accuracy by approximately 4% but has no significant effect on the test set. Consequently, to optimize computational efficiency, we did not use SMOTETomek hybrid sampling for this dataset.
After these preprocessing steps, the generated dataset includes multiple types of attack samples, ensuring both diversity and representativeness. To ensure effective model training and evaluation, we conducted 800 training rounds with a batch size of 1000, allowing more efficient handling of the large dataset in each iteration. The sampled training set contains 118,990 instances, while the test set consists of 4571 instances, covering a total of 15 categories.
For model evaluation, we employed K-fold cross-validation to enhance the generalization ability and stability of the model. To address class imbalance, we used stratified K-fold cross-validation (StratifiedKFold), which ensures that the class distributions in the training and validation sets are similar for each fold, mitigating the negative effects of data skew. We set K to 5, performing 5-fold cross-validation to improve the reliability of the performance scores and the robustness of the model; a sketch of this setup follows below. As shown in Table 6 and Figure 8, the performance metrics of our model consistently exceed those of the other models. Each category in the CICIDS2018 dataset was also evaluated using a ‘one-vs-rest’ strategy, as shown in Table 7; as demonstrated by the AUC-ROC and AUC-PR metrics, the model achieved high accuracy in the majority of categories, with performance approaching 100%.
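A minimal scikit-learn sketch of the stratified 5-fold protocol (the feature matrix X and labels y are assumed to be NumPy arrays; the shuffle seed is illustrative):

```python
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=77)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    # Each fold preserves the overall class proportions in both splits.
    X_tr, X_val = X[train_idx], X[val_idx]
    y_tr, y_val = y[train_idx], y[val_idx]
    # ... train the model on (X_tr, y_tr) and score it on (X_val, y_val)
```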
We conducted multi-class classification experiments on the TON-IoT dataset. For model evaluation, 5-fold cross-validation was employed. To ensure the reproducibility of the experimental results and the stability of the cross-validation process, the random_state parameter was set to 77 for both data partitioning and model initialization. The training set consists of 28,171 samples, while the validation set contains 7043 samples. The dataset comprises eight categories: normal, DDoS, password, XSS, injection, DoS, scanning, and MITM.
The model was trained for 50 epochs with a batch size of 11, using the Adam optimizer with a learning rate of 0.001; for the loss function, we again selected focal loss. Comparative experiments were performed with different models, and the best performance results are presented in Table 8 and Table 9; the corresponding confusion matrix is shown in Figure 9. The experimental results demonstrate that our model outperforms the other models across all performance metrics, particularly in the detection of the Normal and MITM classes. The Transformer model performs marginally worse on this dataset, primarily because the ToN-IoT telemetry records lack significant temporal dependencies. The Transformer relies on the global self-attention mechanism for temporal modeling, which is less effective on non-temporal data and may introduce redundant modeling and noise, degrading detection performance. Our model accurately identifies the MITM class, whereas the other models fail to detect it effectively.