Article

CSCVAE-NID: A Conditionally Symmetric Two-Stage CVAE Framework with Cost-Sensitive Learning for Imbalanced Network Intrusion Detection

College of Computer Science, Beijing University of Technology, Beijing 100124, China
*
Author to whom correspondence should be addressed.
Entropy 2025, 27(11), 1086; https://doi.org/10.3390/e27111086
Submission received: 23 September 2025 / Revised: 9 October 2025 / Accepted: 17 October 2025 / Published: 22 October 2025

Abstract

With the increasing complexity and diversity of network threats, developing high-performance Network Intrusion Detection Systems (NIDSs) has become a critical challenge. A primary obstacle in this domain is the pervasive issue of class imbalance, where the scarcity of minority attack samples and the varying costs of misclassification severely limit the effectiveness of traditional models, often leading to a difficult trade-off between high False Positive Rates (FPRs) and low Recall. To address this challenge, this paper proposes a novel, conditionally symmetric two-stage framework, termed CSCVAE-NID (Conditionally Symmetric Two-Stage CVAE for Network Intrusion Detection). The framework operates in two synergistic stages: Firstly, a Data Augmentation Conditional Variational Autoencoder (DA-CVAE) is introduced to tackle the data imbalance problem at the data level. By conditioning on attack categories, the DA-CVAE generates high-quality and diverse synthetic samples for underrepresented classes, providing a more balanced training dataset. Secondly, the core of our framework, a Cost-Sensitive Multi-Class Classification CVAE (CSMC-CVAE), is proposed. This model innovatively reframes the classification task as a probabilistic distribution matching problem and integrates a cost-sensitive learning strategy at the algorithm level. By incorporating a predefined cost matrix into its loss function, the CSMC-CVAE is compelled to prioritize the correct classification of high-cost, minority attack classes. Comprehensive experiments conducted on the public CICIDS-2017 and UNSW-NB15 datasets demonstrate the superiority of the proposed CSCVAE-NID framework. Compared to several state-of-the-art methods, our approach achieves exceptional performance in both binary and multi-class classification tasks. Notably, the DA-CVAE module is designed to be independent and extensible, allowing the effective data that it generates to support any advanced intrusion detection methodology.

1. Introduction

With the rapid development of big data, cloud computing, and high-speed communication technologies, global networks have penetrated all aspects of social life with unprecedented depth and breadth. The dependence of modern society on cyberspace has grown to an unparalleled degree, underpinning everything from the stable operation of critical infrastructure and corporate digital transformation to personal communication and entertainment [1,2,3]. The network, acting as the central nexus for information exchange and service interaction, concurrently presents a significant attack surface due to its intrinsic openness and complexity, making it a prime target for a wide range of malicious activities and cyber threats [4,5]. Consequently, developing a proactive, efficient, and precise cybersecurity architecture to safeguard the stability and security of cyberspace remains a perennial and central challenge for the networking technology community.
Among the numerous defense technologies, Network Intrusion Detection Systems (NIDSs) play a crucial role. They work by continuously monitoring and analyzing network traffic to detect and issue alerts for potential attack behaviors and security threats in real time. Recent years have witnessed the advent of numerous innovative intrusion detection techniques, developed in response to the increasing complexity of cyber threats [6,7,8].
The intrusion detection task is commonly framed in academia and industry as a binary classification problem (normal vs. anomalous) or a multi-class classification problem (identifying specific attack types). Based on this foundation, a variety of classic machine learning algorithms have seen extensive application in this field, including decision trees, support vector machines, multilayer perceptrons, and random forests [9,10]. Ref. [9] introduced the Dendron framework, which employs a combination of decision trees and genetic algorithms to evolutionarily generate highly readable intrusion detection rules. The framework also incorporates heuristics targeting minority classes, thereby improving the detection of rare attacks. Ref. [10] utilized classic classifiers, such as random forests, Bayesian networks, LDA, and QDA, to conduct binary and multi-class classification experiments on the CICIDS2017 dataset [11]. Their work effectively addressed the bottleneck related to both classification performance and computational efficiency when dealing with high-dimensional network traffic data. Nevertheless, the efficacy of these traditional methods is constrained by their inherently shallow architecture when confronted with the massive, high-dimensional, and intricate nature of modern network traffic data, limiting their capacity for effective feature learning and representation.
Deep learning approaches, in contrast, exhibit superior performance in processing high-dimensional, complex, and noisy data. This superiority stems from their inherent ability to perform powerful non-linear transformations and automatically extract hierarchical features. As a result, the integration of deep learning technologies into network intrusion detection has become a highly active area of research. Convolutional Neural Networks (CNNs) [12], Recurrent Neural Networks (RNNs) [13], and autoencoders (AEs) [14], in particular, stand out as the most prevalent and representative models currently employed in this domain. Ref. [12] proposed a tree-like CNN architecture that leverages a hierarchical structure in conjunction with the Soft-Root-Sign (SRS) activation function. This design facilitates accelerated training and has demonstrated superior detection accuracy for specific attack classes, notably DDoS, infiltration, and brute-force attacks. Ref. [14] introduced a two-stage learning approach, integrating a Conditional Variational Autoencoder (CVAE) with Extreme Value Theory (EVT), which substantially enhanced the model’s ability to detect unknown attacks.
Nevertheless, while many advanced intrusion detection methods have been proposed, several pressing challenges persist in practical applications. Firstly, acquiring effective attack samples is resource-intensive and expensive: collecting and annotating diverse attack traffic from live network environments is labor-intensive and computationally costly. This intrinsic challenge in sample collection is a direct cause of the severe class imbalance prevalent in most public datasets, which are characterized by a profound scarcity of attack samples relative to normal ones. Secondly, existing methods exhibit limited performance when processing datasets characterized by severe class imbalance. For attack categories with extreme data sparsity, neither traditional techniques for addressing class imbalance (such as SMOTE [15]) nor standard deep learning classifiers can reliably learn robust, generalizable discriminative features. This limitation ultimately results in unsatisfactory classification performance.
To overcome the aforementioned limitations, this paper introduces a novel two-stage framework for network intrusion detection, termed CSCVAE-NID. Specifically, our framework first employs a DA-CVAE network for data augmentation, which leverages a parallel dual-encoder structure to disentangle the content features of anomaly samples from the style features of their attack categories. This design effectively mitigates the class imbalance issue arising from the insufficient number of minority-class attack samples in the training data. For the second stage, aiming for high-precision multi-class classification, we designed the CSMC-CVAE network by symmetrically swapping the inputs of the parallel dual-encoder structure and slightly altering the decoder architecture. This novel approach reframes the classification task as an inference process based on probabilistic distribution matching, which involves measuring the consistency between the content feature distribution of a given sample and the prototypical distribution of each potential class. Moreover, to augment the training robustness of the CSMC-CVAE in class-imbalanced scenarios, a cost-sensitive learning strategy was integrated. This was implemented by introducing a predefined cost matrix into the loss function, where each element quantifies the penalty for a specific type of misclassification. This approach ensures that the model’s optimization process is guided not just by classification accuracy but also by the real-world costs of prediction errors.
Comprehensive experiments on the public CICIDS2017 [11] and UNSW-NB15 datasets [16,17,18] demonstrate that our proposed method significantly outperforms existing state-of-the-art approaches in both binary and multi-class classification tasks. The specific contributions of our proposed method can be summarized as follows:
  • We propose a data augmentation scheme based on a Conditional Variational Autoencoder (CVAE), namely, the DA-CVAE model. This model leverages the attack category as a condition to directionally generate high-quality and diverse synthetic samples for minority classes within the dataset. Consequently, it effectively counteracts the training bias induced by the severe class imbalance, a common issue in such data.
  • To enable high-precision multi-class classification, we developed the CSMC-CVAE model, which operates on the fundamental principle of matching probabilistic distributions. This model reframes the classification task as a problem of quantifying the consistency between a given sample’s feature distribution and the prototypical distribution of each candidate class.
  • To further address the issue of class imbalance at the algorithmic level, we incorporated a cost-sensitive learning strategy within the CSMC-CVAE’s training paradigm. This was achieved by augmenting the loss function with a predefined cost matrix, which effectively forces the optimization process to prioritize the correct classification of samples from underrepresented classes.
  • Experimental results on two public datasets indicate that our proposed CSCVAE-NID framework significantly outperforms both conventional and state-of-the-art methods in terms of detection accuracy.
The rest of this paper is structured as follows: In Section 2, we review the relevant literature. Section 3 provides a detailed description of our proposed CSCVAE-NID methodology. Section 4 presents the comprehensive experimental validation of CSCVAE-NID’s performance on public benchmark datasets. Finally, Section 5 concludes the paper and discusses future work.

2. Related Work

2.1. Machine Learning-Based Network Intrusion Detection

The proliferation of inter-user communication has led to a sustained increase in anomalous network traffic, thereby introducing substantial security risks. To curtail the disruption that such intrusions cause to inter-device communication, a number of researchers have developed machine learning-based network intrusion detection methods [19].
These methods often necessitate manual statistical feature engineering. In the study by Dogan [20], a set of features including flow duration, packet length, arrival time, and time intervals was selected. By applying tree-based classification algorithms to these features, a detection rate of 90% was attained. In their work [21], Lawa et al. surveyed and compared various anomaly detection mechanisms within the networking domain. They also benchmarked the performance of several anomaly classification methods on the UNSW-NB15 dataset. In an effort to bolster the performance of machine learning techniques, ref. [22] devised a two-stage classification framework that synergistically combines machine learning with Deep Packet Inspection (DPI). The primary objective of this architecture is to capitalize on the computational efficiency of machine learning while harnessing the high recognition accuracy afforded by DPI. However, this method is unable to handle encrypted traffic, as Deep Packet Inspection (DPI) necessitates access to the raw payload content. Furthermore, the computationally intensive nature of DPI restricts its applicability in real-time settings. The work presented in [23] introduces an ensemble classification technique grounded in statistical flow features. Evaluation on the UNSW-NB15 dataset revealed that this model achieves a high level of detection accuracy while maintaining a low false positive risk. Ref. [24] presents a thorough investigation into the relative efficacy of single machine learning classifiers as opposed to ensemble strategies for detecting anomalous network behaviors. Nevertheless, a more thorough evaluation of the computational cost and real-time performance of these models is still required. Furthermore, the significant time investment demanded by manual feature engineering continues to be a major challenge for machine learning-based network intrusion detection approaches.

2.2. Deep Learning-Based Network Intrusion Detection

Whereas traditional machine learning relies on manual feature engineering, the primary advantage of deep learning is its capability for automatic feature extraction. Ref. [25] employed Long Short-Term Memory (LSTM) to classify the ISCX VPN-NonVPN traffic and also discussed the applicability of Convolutional Neural Networks (CNNs) for processing this type of traffic. Their experiments achieved an accuracy of 91% on the dataset. Ref. [26] introduced a lightweight CNN (LNN) for classifying the BoT-IoT dataset, attaining an accuracy of 96.15%. Despite its advantages over conventional CNNs in model complexity and computational efficiency, the robustness of its feature extraction capabilities was not demonstrably improved.
To improve multi-class classification of anomalous traffic, deep learning approaches frequently utilize ensemble strategies involving multiple neural networks. As an illustration, ref. [27] devised a hybrid architecture for traffic identification and classification on the BoT-IoT dataset by integrating Long Short-Term Memory (LSTM) with autoencoder (AE) networks. The performance of this model in multi-class classification tasks was thereby substantially elevated. A limitation of this work, however, is the narrow scope of its comparative study. The investigation was confined to exploring performance variations among different optimizers and CNN architectures, omitting a direct, horizontal comparison with other classes of deep learning models. A hybrid neural network model for anomaly detection grounded in traffic features was proposed in [28]. The architecture assigns distinct roles to its components: a Convolutional Neural Network (CNN) is employed for the extraction of sequential features, whereas a Deep Neural Network (DNN) is specialized for learning high-dimensional feature representations. A key advantage of this model over other deep learning methods is its ability to achieve high classification accuracy while maintaining a lower model complexity. An unsupervised and lightweight anomaly detection model, ARCADE, was proposed in [29]. The model synergistically combines a one-dimensional convolutional autoencoder (1D-CNN AE) with a Wasserstein GAN with Gradient Penalty (WGAN-GP) adversarial regularization strategy. Notably, it operates directly on raw traffic bytes to perform unsupervised anomaly detection. Nevertheless, the model exhibits subpar performance in detecting flood attacks, achieving F1-scores of only 68.70% and 66.61% for these two categories, respectively.
Ref. [30] presents a multi-factor feature-encoding methodology grounded in autoencoders. It formulates a geometry-aware feature representation through the joint consideration of latent representations, reconstruction residual directions, and reconstruction errors. A key component of this approach is a specialized network engineered to inject reconstruction error information into the representation. The method demonstrates a substantial performance enhancement in weakly supervised anomaly detection, especially in scenarios with limited labeled data. Nevertheless, it is constrained by two factors, a relatively high computational complexity and a dependency on data manifold assumptions, which could potentially hinder the capture of complex anomaly patterns.
In summary, while existing deep learning methods demonstrate considerable potential for intrusion detection, they are still beset by several prevalent limitations. On the one hand, many advanced models feature increasingly complex network architectures, which incorporate multiple neural networks or intricate feature-encoding modules. This complexity results in significant computational overheads and challenges in practical deployment. On the other hand, existing research has generally not sufficiently considered the negative impact of data imbalance on multi-class detection performance. Most models are still designed with the primary goal of improving overall accuracy, while paying insufficient attention to the recognition capability for minority attack classes and the risk costs associated with different misclassification types. This ultimately limits their reliability in real-world network environments.

2.3. Imbalanced Data Handling in Network Intrusion Detection

A fundamental challenge in network intrusion detection, when treated as a classification task, is the pervasive issue of imbalanced class distributions [31,32]. This is often manifested by a stark disparity in sample sizes, where some classes comprise tens of thousands of instances while others are represented by only several hundred. This class imbalance severely undermines the generalization performance of a model, with the detection of minority classes being the most adversely affected.
Generally, the approaches to mitigate class imbalance in network intrusion detection are primarily bifurcated into two categories: data-level strategies and algorithm-level strategies.
Data-level strategies [33,34,35] aim to balance the dataset prior to classification training by modifying the sample distribution via oversampling or undersampling techniques. Oversampling [36] aims to augment the minority class by generating additional samples, whereas undersampling seeks to reduce the majority class by removing existing samples. For instance, Tan et al. [37] developed an effective anomaly detection model by integrating the SMOTE oversampling technique with a random forest ensemble. In order to enhance intrusion detection performance, Wang and Septian [38] devised a multi-stage detection framework that incorporates both data sampling techniques and deep learning models. The framework operates in three key stages: initially, selecting the most informative features via an ensemble of decision trees; subsequently, balancing the dataset by augmenting the minority class with the SMOTE algorithm; and ultimately, feeding the processed data into a Recurrent Neural Network (RNN) for final attack identification. The study by Al and Dener [39] illustrates the synergy between hybrid sampling at the data level and a hybrid architecture at the model level. Their data balancing process involves a two-step hybrid approach. First, the SMOTE algorithm is employed to oversample the minority class. This is followed by an undersampling step, where the Tomek links method is used to eliminate noisy or borderline instances from the majority class. Building upon this, they constructed a deep model combining a CNN with LSTM, where the CNN extracts spatial features and the LSTM captures temporal dependencies, ultimately performing the intrusion detection task.
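The oversampling step used in these pipelines can be illustrated with a minimal SMOTE-style sketch. It is a simplified illustration of the interpolation idea only, not the implementation used in [37,38,39]: the function name `smote_like`, the neighbour count `k`, and the toy data are assumptions, and the Tomek-links cleaning step is omitted.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority class: four points on the corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(minority, n_new=5)
```

Each synthetic point lies on the segment between two real minority samples, so this kind of oversampling cannot produce samples outside the region already spanned by the minority class.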
From the perspective of model learning, both predominant sampling techniques present potential issues. By discarding samples from the majority class, undersampling risks causing the model to learn an incomplete representation of that class, potentially introducing information bias. Conversely, oversampling techniques, by augmenting the dataset with synthetic samples, compel the model to learn a decision boundary for a more complex, artificially expanded data distribution. This inherently increases the risk of the model overfitting to the training data.
A distinct paradigm for addressing class imbalance involves modifications at the algorithm level [40,41,42,43]. In contrast to data-level strategies, this approach does not manipulate the data distribution. Instead, it focuses on redesigning or adapting the learning algorithm itself, such that it can effectively learn from an imbalanced training set without bias. A quintessential example of an algorithm-level strategy is cost-sensitive learning [44]. This methodology confronts class imbalance by modifying the optimization objective, i.e., the loss function. Whereas standard models assign a uniform cost to all classification errors, cost-sensitive learning introduces a predefined cost matrix. This matrix is designed to heavily penalize the misclassification of minority class instances as members of the majority class, thereby forcing the model to pay greater attention to them. As a result, during the iterative parameter optimization, the model is compelled to preferentially reduce classification errors on the minority class. This ultimately yields a higher degree of recognition accuracy for the underrepresented category. The work of Telikani and Gandomi [45] serves as a prime example of the application of cost-sensitive learning. They put forward a deep learning model, termed the Cost-Sensitive Stacked Autoencoder (CSSAE), tailored specifically to the challenge of data imbalance in anomaly detection. The fundamental principle of this model is the establishment of a cost matrix to assign disparate penalty weights to different types of misclassification errors, thereby compelling the model to learn more effectively from the imbalanced data.
Motivated by these pioneering works, the synergistic combination of cost-sensitive mechanisms and various Deep Neural Network architectures has become a prominent and active research avenue. Researchers have incorporated asymmetric costs into various deep learning models, including autoencoders, Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs). The objective of this integration is to harness the potent feature extraction capabilities of deep learning, while concurrently mitigating learning biases on imbalanced data via cost-sensitive strategies. This approach ultimately leads to enhanced performance in complex detection tasks. Ref. [45] provides a tangible example of cost-sensitive deep learning in practice. The contribution is twofold: Firstly, the authors devised a methodology for constructing the cost matrix wherein the costs are inversely proportional to the class sample counts. Secondly, these derived cost weights are directly incorporated into the loss function, thereby compelling the model to improve its recognition capabilities for minority classes. Despite achieving an approximate 2% improvement in accuracy on the VPN-NonVPN dataset, the cost integration mechanism of this method presents a theoretical deficiency. The primary issue is that the model’s output vector may forfeit its characteristic as a standard probability distribution, meaning that the summation of class probabilities is not guaranteed to equal one.
This study aims to overcome the limitations of existing methods by positioning the cost-sensitive learning strategy as the central component of the algorithm-level imbalance handling within our CSCVAE-NID framework. Drawing upon this concept, we also employ a weighting scheme where the cost for each class is set to be inversely proportional to its sample frequency. These cost weights are incorporated into a weighted cross-entropy loss formulation, which, in turn, forms an integral part of the joint loss function for our CSMC-CVAE sub-framework. This dual approach allows us to leverage the cost-sensitive strategy to force the model to prioritize the correct classification of minority attacks during training, while simultaneously enabling risk-aware decisions via our unique CVAE classification mechanism based on matching sample content distributions with prototypical class distributions.
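The weighting scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the paper's implementation: the helper names, the normalisation of the weights to a mean of one, and the class counts are all hypothetical.

```python
import math

def inverse_frequency_weights(class_counts):
    """Weight each class inversely to its sample count, rescaled so the
    weights average to 1 (the rescaling is an assumption of this sketch)."""
    inv = [1.0 / c for c in class_counts]
    scale = len(inv) / sum(inv)
    return [w * scale for w in inv]

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_cross_entropy(logits, true_class, weights):
    """Cost-weighted cross-entropy for a single sample."""
    probs = softmax(logits)
    return -weights[true_class] * math.log(probs[true_class])

# Hypothetical counts: one majority class and two minority classes.
counts = [9000, 900, 100]
weights = inverse_frequency_weights(counts)

# The same uninformative prediction costs far more on the rarest class.
loss_major = weighted_cross_entropy([0.0, 0.0, 0.0], 0, weights)
loss_minor = weighted_cross_entropy([0.0, 0.0, 0.0], 2, weights)
```

In this sketch the weights scale the loss term rather than the model outputs, so the softmax probabilities still sum to one.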

3. The Proposed Methodology

3.1. Overview

As depicted in Figure 1, the proposed CSCVAE-NID model comprises two sub-frameworks. The first of these, the DA-CVAE, is a data augmentation module that takes the specific attack category as a condition to generate designated anomaly samples, thus expanding the sample size for minority attack classes. The second sub-framework, which we term the CSMC-CVAE, utilizes the anomaly sample as a condition to execute the multi-class network intrusion classification task. A key feature of this sub-framework is the integration of a cost-sensitive learning strategy. This strategy assigns differential classification costs—manifested as weights in the loss function—to various classes, thereby compelling the model to prioritize the correct classification of high-cost categories during training. In the following subsections, we will detail the specific architectures of the two CVAEs, as well as the implementation of our cost-sensitive learning strategy.

3.2. DA-CVAE for Data Augmentation

Class imbalance represents a prevalent and formidable challenge in real-world network intrusion detection tasks. Not only does the number of normal samples vastly outnumber attack samples, but there is also a substantial imbalance in the sample distribution across different attack categories. This form of data skewness induces a bias in the classification model during training, causing it to disproportionately focus on the majority class. As a consequence, the model often fails to adequately learn the discriminative patterns of minority attacks, which ultimately leads to a degradation in detection performance for these rare events. To alleviate this issue, we first employ a Conditional Variational Autoencoder dedicated to data augmentation, designated as the DA-CVAE (Data Augmentation Conditional Variational Autoencoder).
As depicted in the upper part of Figure 1, the primary goal of this module is to learn and model the data distribution of minority attack classes. It focuses on anomaly samples from these underrepresented categories, generating high-quality and diverse synthetic samples conditioned on the given class label.
To achieve a more effective disentanglement of the sample-specific content information and the class-specific style information, we engineered a parallel dual-encoder architecture. This architecture is composed of two parallel sub-networks within the DA-CVAE’s encoder, namely, $E_1$ and $E_2$. The first sub-network, the content encoder $E_1$, is designed to process a real anomaly sample $x \in \mathbb{R}^D$, where $D$ signifies the dimensionality of the feature space. The purpose of the content encoder $E_1$ is to learn a mapping that extracts unique, class-agnostic content features, $f_1 = E_1(x)$, from a specific sample. In doing so, $E_1$ effectively parameterizes the content feature distribution, $p(z \mid x)$, which represents the sample in the latent space. The condition encoder $E_2$ takes the corresponding attack category label, denoted as $y \in \{0, 1\}^K$, as its conditional input. This label is one-hot encoded, and $K$ represents the total number of attack classes. The purpose of the condition encoder $E_2$ is to learn a mapping to the class-conditional prior distribution, $p(z \mid y)$. This distribution serves as a global prototype for the given category, and its parameterizing feature vector is given by $f_2 = E_2(y)$. The class prototypes used by DA-CVAE in the data augmentation process can be regarded as being formed by the parameters of the class-conditional prior distribution. The parameters here are trainable Gaussian parameters in the latent space, which are mainly used to constrain the latent variables of each category sample to cluster near the corresponding category prototype. It should be noted that the category prototype is not a point but a probability distribution, and the distance between each sample and its corresponding category prototype is calculated by KL divergence.
The two encoders, $E_1$ and $E_2$, output feature vectors $f_1$ and $f_2$, which, in turn, parameterize two separate probability distributions. A primary training objective of the model is to align the content feature distribution $p(z \mid x)$, derived from a specific sample, with the class prototypical distribution $p(z \mid y)$, derived from its corresponding category. This alignment is enforced by minimizing the Kullback–Leibler (KL) divergence between the two distributions.
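For reference, when both distributions are Gaussian, this KL term has a standard closed form. The expression below is a sketch under assumptions not stated in the text: diagonal covariances, a latent dimension written as $d$, the symbols $\mu_x, \sigma_x$ and $\mu_y, \sigma_y$ for the parameters produced by the two encoders, and the direction of the divergence.

```latex
% Assumed: p(z|x) = N(mu_x, diag(sigma_x^2)),  p(z|y) = N(mu_y, diag(sigma_y^2))
D_{\mathrm{KL}}\big(p(z \mid x)\,\|\,p(z \mid y)\big)
  = \frac{1}{2} \sum_{j=1}^{d}
    \left[
      \log\frac{\sigma_{y,j}^{2}}{\sigma_{x,j}^{2}}
      + \frac{\sigma_{x,j}^{2} + (\mu_{x,j} - \mu_{y,j})^{2}}{\sigma_{y,j}^{2}}
      - 1
    \right]
```

The expression is zero exactly when the two Gaussians coincide and is not symmetric in its arguments, so the choice of direction matters.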
In the training phase, the latent variable $z$ is sampled from the conditional posterior distribution $p(z \mid x, y)$, which is parameterized by the joint outputs of the content encoder $E_1$ and the condition encoder $E_2$. For the decoding stage, the sampled content feature $f_1$ (representing sample-specific information) is concatenated with the class feature $f_2$ (representing the generic style of the category). This fusion of features ensures that the reconstruction process is conditioned on both aspects. The resulting concatenated vector $[f_1, f_2]$ is then fed into the decoder, which guarantees that the generation process remains conditioned on the target class information. The decoder network’s function is to map the latent vector back into the original data space, ultimately yielding a generated anomaly sample, $\hat{x}$. This synthetic sample shares the same class label as the input $x$ while exhibiting novel feature characteristics.
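The sampling and concatenation steps can be sketched as below. This is an illustrative toy, not the paper's network: the reparameterization trick is the usual way to sample in a VAE and is assumed here, the encoder outputs are replaced by fixed stand-in values, and `toy_decoder` is a hypothetical single affine layer.

```python
import math
import random

def reparameterize(mu, logvar, rng):
    """Draw mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

def toy_decoder(vector, weights, bias):
    """Hypothetical stand-in for the decoder: one affine layer."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

rng = random.Random(0)

# Stand-ins for the Gaussian parameters produced by the content encoder.
content_mu = [0.2, -0.1]
content_logvar = [-2.0, -2.0]
# Stand-in for the class feature produced by the condition encoder.
class_feature = [1.0, 0.0, 0.0]

f1 = reparameterize(content_mu, content_logvar, rng)  # sampled content feature
decoder_input = f1 + class_feature                    # concatenation [f_1, f_2]

weights = [[0.5, 0.0, 1.0, 0.0, 0.0],
           [0.0, 0.5, 0.0, 1.0, 0.0]]
bias = [0.0, 0.0]
x_hat = toy_decoder(decoder_input, weights, bias)     # generated sample
```

Because the class feature is appended to every decoder input, the same sampled content feature decodes to different outputs under different class conditions.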
The training loss function for the DA-CVAE consists of a weighted sum of two components: a reconstruction loss, which quantifies the discrepancy between the original input $x$ and the reconstructed output $\hat{x}$; and the previously mentioned KL divergence regularization term.
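A minimal sketch of such a two-term objective is given below. Several choices are assumptions, since the text does not fix them: mean squared error as the reconstruction loss, diagonal Gaussian distributions, and a single weighting factor named `beta`.

```python
import math

def mse(x, x_hat):
    """Mean squared reconstruction error (assumed reconstruction loss)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians given means and log-variances."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total

def da_cvae_loss(x, x_hat, content_dist, prototype_dist, beta=1.0):
    """Weighted sum of reconstruction loss and KL regularization.
    content_dist and prototype_dist are (mean, log-variance) pairs."""
    mu_q, logvar_q = content_dist
    mu_p, logvar_p = prototype_dist
    return mse(x, x_hat) + beta * kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p)

# Perfect reconstruction and matching distributions give zero loss.
zero_loss = da_cvae_loss([1.0, 2.0], [1.0, 2.0], ([0.0], [0.0]), ([0.0], [0.0]))
# A reconstruction error of 0.5 plus a KL of 0.5 weighted by beta = 2.
some_loss = da_cvae_loss([1.0, 2.0], [0.0, 2.0], ([1.0], [0.0]), ([0.0], [0.0]), beta=2.0)
```

Raising `beta` pulls each sample's latent distribution more tightly towards its class prototype at the expense of reconstruction fidelity.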
Following the training phase, the DA-CVAE is utilized to generate high-quality synthetic data for specified minority attack classes. The generation process involves sampling a latent vector $z$ from the conditional posterior distribution $p(z \mid x, y)$. This distribution is jointly defined by the content encoder $E_1$ (which processes the input sample $x$) and the condition encoder $E_2$ (which processes the class condition $y$). The sampled vector $z$ is then passed to the decoder to synthesize a new data instance, thereby creating a more class-balanced training set for the subsequent classification model.
The DA-CVAE has two limitations. First, because it relies on the richness of the data samples seen during training, any bias in the original minority-class samples is amplified by the synthetic data, which may harm the classification results. Second, when samples from two minority classes differ only slightly, the synthetic data may further blur the classification boundary between them. Therefore, we have set reasonable limits on the volume of synthetic data.
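One simple way to impose such a limit is sketched below. The text does not specify how the limits are chosen, so the rule used here (fill each class up to a target size, but never generate more than a fixed multiple of its real samples) is purely an assumption, as are the function name, the `max_ratio` value, and the class names and counts.

```python
def synthetic_counts(class_counts, target, max_ratio=5.0):
    """Per-class number of synthetic samples: fill up to `target`, but never
    generate more than max_ratio times the real samples of that class."""
    plan = {}
    for label, n_real in class_counts.items():
        needed = max(0, target - n_real)   # shortfall relative to the target
        cap = int(max_ratio * n_real)      # upper bound on synthetic volume
        plan[label] = min(needed, cap)
    return plan

# Hypothetical class counts, not taken from either dataset.
counts = {"benign": 50000, "dos": 8000, "infiltration": 36}
plan = synthetic_counts(counts, target=10000)
```

With these numbers the rarest class receives 180 synthetic samples instead of the 9964 needed for full balance, which keeps a handful of real samples from dominating the augmented set.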

3.3. Cost-Sensitive CVAE for Multi-Class Classification

Subsequent to the data balancing phase performed by the DA-CVAE, we introduce our novel Cost-Sensitive Multi-Class Classification Conditional Variational Autoencoder (CSMC-CVAE), which is designed for precise attack type identification. Although conceptually rooted in generative models, the proposed model is fundamentally discriminative in its objective, aiming for classification rather than data generation. As illustrated in the lower part of Figure 1, the CSMC-CVAE utilizes a conditionally symmetric architecture to reconceptualize the classification problem. Specifically, it transforms the task into a problem of evaluating the likelihood of a candidate class given the evidence of a sample, which is essentially a probabilistic matching problem.
Similar to the DA-CVAE, the CSMC-CVAE utilizes a parallel dual-encoder structure, but with a crucial difference: the input and condition are symmetrically swapped. Furthermore, its decoder is replaced with a specialized classification head. The encoder of the CSMC-CVAE is composed of two parallel sub-networks, hereafter referred to as $E'_1$ and $E'_2$. The first sub-network, the category prototype encoder $E'_1$, receives a candidate attack category label $y \in \{0, 1\}^K$ as its conditional input, where $y$ is a one-hot vector and $K$ is the total number of distinct attack classes. $E'_1$ learns a mapping from the category label $y$ to its prototypical distribution $p(z \mid y)$ in the latent space, yielding the feature representation $f_1 = E'_1(y)$. The second sub-network, the sample feature encoder $E'_2$, accepts a real anomaly sample $x \in \mathbb{R}^D$ as input, where $D$ is the feature dimension. $E'_2$ learns a mapping from the input sample $x$ to its content feature distribution $p(z \mid x)$ and outputs the feature representation $f_2 = E'_2(x)$.
A core objective of training is to minimize the KL divergence between these two distributions. This ensures that samples belonging to the same category conform to a shared distributional pattern in the latent space, a pattern dictated by the corresponding class label. During inference, a test sample $x_{test}$ is classified by identifying the attack category $\hat{y}$ that minimizes the KL divergence between the sample's content distribution and the category's prototypical distribution. This process is formally expressed as follows:
$$\hat{y} = \arg\min_{y_i \in Y} D_{KL}\big(q(z \mid x_{test}) \,\|\, p(z \mid y_i)\big) \tag{1}$$
The sampled category and content features are then concatenated to form a composite feature vector, which serves as the input to the final classification head. The responsibility of this classification head is to infer and output the definitive class label for the given anomaly sample.
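The KL-based matching rule described above can be sketched as follows. The sketch assumes that both the sample distribution and each class prototype are diagonal Gaussians, for which the KL divergence has a closed form. The prototype values and class names are made up for illustration and are not taken from the trained model.

```python
import math


def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """Closed-form KL( N(mu_q, diag(var_q)) || N(mu_p, diag(var_p)) )."""
    return 0.5 * sum(
        math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )


def classify(sample_dist, prototypes):
    """Return the label whose prototype is closest in KL to the sample distribution."""
    mu_q, var_q = sample_dist
    return min(
        prototypes,
        key=lambda label: kl_diag_gaussians(mu_q, var_q, *prototypes[label]),
    )


# Hypothetical class prototypes: (mean vector, variance vector) per class.
prototypes = {
    "Normal": ([0.0, 0.0], [1.0, 1.0]),
    "DoS": ([3.0, 3.0], [1.0, 1.0]),
    "Worms": ([-3.0, 2.0], [0.5, 0.5]),
}
predicted = classify(([2.6, 3.1], [0.8, 1.2]), prototypes)
```

In this toy setting the sample distribution lies closest to the "DoS" prototype, so that label is returned.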
Standard classification models conventionally rely on the minimization of an average loss function, like cross-entropy, for their training. This approach, however, has a significant shortcoming: the loss function becomes dominated by the prevalent majority classes within the training data. Consequently, the model’s ability to effectively learn and recognize samples from underrepresented minority classes is compromised. Despite the data-level rebalancing achieved by the DA-CVAE, the efficacy of data augmentation may remain limited for certain minority classes. Specifically, for classes with high feature complexity, generating high-fidelity synthetic samples is non-trivial. Consequently, we introduce cost-sensitive learning as an algorithm-level solution to further intensify the model’s focus on these underrepresented categories.
The fundamental concept of cost-sensitive learning is to assign differential misclassification costs to various classes. This is operationalized by weighting the loss function according to these costs, thereby guiding the model's optimization process to place greater emphasis on the accurate prediction of classes associated with higher penalties. For each class $y_i$, a corresponding cost weight $C_{y_i}$ is assigned. This cost weight is generally set to be inversely proportional to the sample size of the corresponding class. In other words, classes with fewer samples are assigned a higher cost weight. For instance, the weights can be formulated as follows:
$$C_{y_i} = \gamma \, \frac{N_{total}}{N_{y_i}} \tag{2}$$
where $N_{total}$ is the total number of training samples, $N_{y_i}$ is the number of samples in class $y_i$, and $\gamma$ is a hyperparameter. This strategy effectively compels the model during training to prioritize the correction of errors on minority classes, since the contribution of these misclassified samples to the overall loss is magnified by their cost weights. We use a strategy based on the inverse frequency of categories to automatically generate the costs $C_{y_i}$. This approach has the following benefits:
  • Our approach automates cost assignment by using the inverse of class frequencies. This method obviates the need for subjective, manual cost matrix definition by domain experts, ensuring that the weighting scheme is transparent and highly reproducible.
  • By assigning higher weights to minority classes, the contribution of their misclassification errors to the total loss is significantly magnified. This compels the optimization process to prioritize learning from these underrepresented samples, directly counteracting the learning bias induced by the dominant majority classes.
  • The frequency-inverse weighting scheme is inherently adaptive and does not require prior domain knowledge of attack severity. Costs are automatically inferred from the data distribution, allowing our framework to be readily applied to diverse datasets without manual recalibration, which enhances its versatility and practical applicability.
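As a small worked example of the inverse-frequency rule, the sketch below computes $C_{y_i} = \gamma N_{total} / N_{y_i}$ from a list of labels. The class counts are invented for illustration and do not correspond to the datasets used in the paper. The default `gamma=0.2` mirrors the value reported in the experimental setup.

```python
from collections import Counter


def cost_weights(labels, gamma=0.2):
    """Inverse-frequency cost weight per class: gamma * N_total / N_class."""
    counts = Counter(labels)
    n_total = len(labels)
    return {cls: gamma * n_total / n for cls, n in counts.items()}


# Hypothetical, highly imbalanced label distribution.
labels = ["Normal"] * 900 + ["DoS"] * 90 + ["Worms"] * 10
weights = cost_weights(labels)
```

With these counts the rarest class receives a weight 90 times larger than the majority class, which is exactly the ratio of their sample sizes.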
During training, the total loss function of the CSMC-CVAE, $L_{joint}$, consists of two parts:
$$L_{joint} = \alpha L_{KL} + \beta L_{CSL} \tag{3}$$
in which $L_{KL}$ denotes the Kullback–Leibler (KL) divergence between the category prototype distribution $p(z \mid y)$ and the sample content distribution $p(z \mid x)$. The coefficients $\alpha$ and $\beta$ are tunable hyperparameters that balance the latent space's distributional consistency against the multi-class classification objective. $L_{CSL}$ represents our proposed cost-sensitive loss, which is formulated as follows:
$$L_{CSL} = \sum_{i=1}^{N} C_{y_i} \cdot L(\hat{y}_i, y_i) \tag{4}$$
where $y_i$ is the true class label of the $i$-th sample, $\hat{y}_i$ is the model's predicted output for the $i$-th sample, and $C_{y_i}$ is the cost weight corresponding to the true class $y_i$. In other words, the base loss of each instance, $L(\hat{y}_i, y_i)$, is scaled by the cost weight $C_{y_i}$ associated with its true class.
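The cost-sensitive loss and the joint objective can be sketched as below. The text does not name the base loss $L$, so cross-entropy is assumed here. The function names are hypothetical, and the defaults `alpha=0.25` and `beta=0.75` follow the values reported in the experimental setup.

```python
import math


def cross_entropy(probs, true_idx):
    """Negative log-probability assigned to the true class (assumed base loss)."""
    return -math.log(probs[true_idx])


def cost_sensitive_loss(batch_probs, true_idxs, class_costs):
    """Sum of per-sample base losses, each scaled by the cost of its true class."""
    return sum(
        class_costs[y] * cross_entropy(p, y) for p, y in zip(batch_probs, true_idxs)
    )


def joint_loss(kl_term, csl_term, alpha=0.25, beta=0.75):
    """Weighted combination of the KL term and the cost-sensitive term."""
    return alpha * kl_term + beta * csl_term


# Two samples with the same predicted confidence but different true classes.
batch_probs = [[0.5, 0.5], [0.5, 0.5]]
true_idxs = [0, 1]
class_costs = {0: 1.0, 1: 4.0}  # hypothetical: class 1 is the rare class
csl = cost_sensitive_loss(batch_probs, true_idxs, class_costs)
total = joint_loss(kl_term=0.1, csl_term=csl)
```

Both samples are equally uncertain, yet the rare-class sample contributes four times as much to the loss, which is the intended effect of the weighting.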
Optimizing the joint loss function $L_{joint}$ enables the CSMC-CVAE to do more than just learn a classification task. Crucially, it directs the model to prioritize and strengthen its predictive capabilities for underrepresented attack classes. Consequently, the model's final output, the classification of an anomaly sample, is characterized not only by high overall accuracy but, more importantly, by a markedly better ability to identify attack types that are rare yet potentially of high impact.

4. Experiments and Result Analysis

In this section, we systematically evaluate and validate the effectiveness of our proposed CSCVAE-NID framework through a comprehensive suite of experiments. We begin by detailing the experimental setup, which encompasses the dataset descriptions, the chosen performance evaluation metrics, and the key hyperparameter settings for our models. Next, we present the performance of the CSCVAE-NID framework on the multi-class intrusion detection task. A comprehensive comparison against various baseline and state-of-the-art methods is then conducted to fully assess its superiority. Furthermore, we delve into how the model’s performance is affected by different levels of class imbalance. Lastly, in order to ascertain the individual contribution of each innovative component within the framework, a series of thorough ablation studies were designed and conducted.

4.1. Experimental Setup

Datasets: The effectiveness of the proposed framework was comprehensively validated using two well-established benchmark datasets in the domain of network anomaly detection: CICIDS-2017 [11] and UNSW-NB15 [16,17,18]. The CICIDS-2017 dataset encompasses network traffic data spanning a continuous five-day period. The dataset is structured such that the traffic on the first day is purely benign. The subsequent four days, however, are populated with a variety of contemporary network attacks, which encompass brute-force FTP, brute-force SSH, Denial of Service (DoS), Heartbleed, web attacks, infiltration, botnet, and Distributed Denial of Service (DDoS). Every instance within this dataset comprises 78 network traffic features and a single class label, encompassing both the normal category and various common attack types. The UNSW-NB15 dataset, in its original form, comprises nine distinct attack categories. Each data instance is characterized by a 47-dimensional feature vector and a corresponding class label. For this experiment, we selected five representative categories for investigation: one normal class and four distinct attack classes. These four attack categories were chosen to encompass both prevalent and rare sample types. Specifically, DoS and reconnaissance represent high-frequency attacks abundant in the dataset, whereas Shellcode and Worms represent low-frequency attacks with very limited samples. The inclusion of the latter is intended to evaluate the model’s performance under conditions of extreme class imbalance. Table 1 provides the detailed statistical information for the training data utilized in our experiments.
Given the substantial variance in scales and numerical ranges across the feature attributes of the dataset, a normalization step was applied. All feature values were linearly scaled to the closed interval of [0, 1]. This transformation is formally defined by the mathematical equation shown below:
$$x' = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{5}$$
where $x$ and $x'$ represent the original data and the normalized data, respectively, while $x_{min}$ and $x_{max}$ denote the minimum and maximum values of the current attribute, respectively.
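A minimal sketch of this per-attribute min-max scaling is given below. Mapping a constant attribute (where the maximum equals the minimum) to 0 is an assumption added here to avoid division by zero. The text does not say how such attributes are handled.

```python
def min_max_scale(rows):
    """Scale each column of a list of feature rows linearly to [0, 1]."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # Constant columns are mapped to 0.0 (assumption, see lead-in).
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, mins, maxs)
        ])
    return scaled


# Three toy samples with three attributes; the last attribute is constant.
data = [[0, 10, 5], [5, 20, 5], [10, 30, 5]]
normalized = min_max_scale(data)
```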
Evaluation Metrics: To accurately assess the efficacy of our proposed method, we employed three key evaluation metrics: Precision, Recall, and F1-score. For simplicity, let $N_{TP}$ be the number of anomaly samples classified as attacks, $N_{TN}$ be the number of normal samples classified as normal, $N_{FP}$ be the number of normal samples classified as attacks, and $N_{FN}$ be the number of anomaly samples classified as normal. The metrics are then defined as follows:
(1) Precision: Precision measures the proportion of true positive instances among all instances classified as positive (anomalous), and $\text{Precision} = \frac{N_{TP}}{N_{TP} + N_{FP}}$.
(2) Recall: Recall is the percentage of correctly predicted anomaly samples out of the total number of actual anomaly samples, and $\text{Recall} = \frac{N_{TP}}{N_{TP} + N_{FN}}$.
(3) F1-score: The F1-score is the harmonic mean of Precision and Recall, providing a single metric to measure the overall detection accuracy of a model, and $\text{F1-score} = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}}$.
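The three definitions above, together with the macro-average used when reporting results (the arithmetic mean of per-class scores), can be written compactly as follows. Returning 0.0 when a denominator is zero is a convention assumed for this sketch, and the counts are illustrative only.

```python
def precision_recall_f1(n_tp, n_fp, n_fn):
    """Precision, Recall, and F1-score from confusion-matrix counts."""
    precision = n_tp / (n_tp + n_fp) if n_tp + n_fp else 0.0
    recall = n_tp / (n_tp + n_fn) if n_tp + n_fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def macro_average(per_class_metrics):
    """Unweighted mean of (precision, recall, f1) tuples across classes."""
    n = len(per_class_metrics)
    return tuple(sum(m[i] for m in per_class_metrics) / n for i in range(3))


# Hypothetical counts for a majority class and a rare class.
majority = precision_recall_f1(n_tp=950, n_fp=50, n_fn=50)
rare = precision_recall_f1(n_tp=6, n_fp=4, n_fn=4)
macro = macro_average([majority, rare])
```

Because the macro-average weights both classes equally, the weak scores on the rare class pull the aggregate down far more than they would in a sample-weighted average.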
Table 2 presents the results for the three evaluation metrics, which are derived from the confusion matrix.
Our proposed method was implemented and evaluated on an NVIDIA A30 GPU with the PyTorch [46] framework. The DA-CVAE for data augmentation and the CSMC-CVAE for multi-class classification share a nearly identical encoder–decoder architecture, the specifics of which are detailed in Table 3. The one crucial modification concerns the CSMC-CVAE, whose final deconvolutional layer (DeConv4) was replaced by a specialized classification head to facilitate the multi-class prediction task. We employed the Adam [47] optimizer for the two-stage training of the DA-CVAE and the CSMC-CVAE. The initial learning rates were set to 0.0005 and 0.0002, respectively, followed by a gradual decay based on the cosine annealing schedule [48]. Acknowledging the disparities in scale and complexity between the UNSW-NB15 and CICIDS-2017 datasets, the models were trained for 50 and 60 epochs, respectively, with a uniform batch size of 128. During training, our objective was to achieve the highest possible macro F1-score. To this end, we used a small-batch validation set to empirically fine-tune the hyperparameters $\alpha$, $\beta$, and $\gamma$. The final configuration was as follows: $\gamma$ in Equation (2) was set to 0.2, and the coefficients $\alpha$ and $\beta$ in Equation (3) were set to 0.25 and 0.75, respectively.
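For readers unfamiliar with cosine annealing, the following sketch shows the standard closed-form schedule without restarts. The minimum learning rate of 0 and the per-epoch update are assumptions, since the text only gives the initial learning rates and epoch counts. The example uses the DA-CVAE initial rate of 0.0005 over 50 epochs.

```python
import math


def cosine_annealed_lr(epoch, total_epochs, lr_init, lr_min=0.0):
    """Learning rate after `epoch` epochs under cosine annealing (no restarts)."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_init - lr_min) * (1.0 + cosine)


# Schedule for an initial rate of 0.0005 decayed over 50 epochs.
schedule = [cosine_annealed_lr(e, 50, 0.0005) for e in range(51)]
```

The rate starts at its initial value, reaches half of it at the midpoint, and decays smoothly to the minimum at the final epoch.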
As shown in Table 4, our method exhibits competitive training efficiency, being faster than TMG-IDS, SALAD, and DCHAE. While its total training time is 0.39 h longer than that of CAEP, this is offset by its superior average inference speed, which significantly surpasses that of all compared state-of-the-art methods.

4.2. Experimental Results

This section presents an extensive set of validation experiments conducted on the CICIDS-2017 and UNSW-NB15 datasets, encompassing both binary and multi-class classification scenarios.

4.2.1. Comparisons with State-of-the-Art Methods

Binary Classification Results: To initially assess the efficacy of our framework on the foundational task of anomaly detection, we performed binary classification experiments on the preprocessed UNSW-NB15 and CICIDS-2017 datasets. In these experiments, all attack traffic types were aggregated into a unified “Attack” class, in contrast to the “Normal” class. Table 5 details the Precision, Recall, and F1-score achieved by each model for both the normal and attack classes on the UNSW-NB15 dataset. The final row, labeled “Macro-”, gives the macro-average of each metric, i.e., the simple arithmetic mean of the per-class scores, which provides an impartial evaluation of aggregate performance on the imbalanced dataset. The results show that the proposed CSCVAE-NID framework outperforms all competing methods on both datasets. On UNSW-NB15, models built upon deep learning principles or featuring tailored architectures, such as DCHAE and CAEP, consistently outperform their counterparts. The DCHAE model is the best-performing baseline among all comparative methods, attaining macro-averaged Precision, Recall, and F1-scores of circa 0.971. Our CSCVAE-NID framework improves on this further, achieving macro-averaged scores of 0.988 for Precision, 0.984 for Recall, and 0.986 for F1-score. Compared with CAEP, this corresponds to relative improvements of 1.7%, 2.0%, and 1.9% in the three metrics, respectively.
In Table 6, for the CICIDS-2017 dataset, the CSCVAE-NID framework again achieved the best performance, with macro-averaged Precision, Recall, and F1-scores reaching 0.978, 0.985, and 0.981, respectively. These results improve on all other methods under comparison. Specifically, our CSCVAE-NID outperforms the best-performing baseline, DCHAE, by 0.7%, 1.3%, and 1.0% in terms of macro-averaged Precision, Recall, and F1-score, respectively.
In conclusion, the results of the binary classification experiments validate the efficacy of our proposed method. It not only demonstrates a superior ability to differentiate between normal and anomalous traffic but also sets a new benchmark for detection performance among the compared methods.
Multi-Class Classification Results: Table 7 presents the experimental results for the multi-class classification task on the UNSW-NB15 dataset. This dataset is challenging for classification models because of rare attack categories, such as “Shellcode” and “Worms”, which have extremely few samples. Among the baselines, the SALAD model is the most competitive in terms of macro-averaged metrics, with scores of 0.908 for Precision, 0.914 for Recall, and 0.911 for F1-score. Our proposed CSCVAE-NID framework surpasses this leading baseline, achieving the best macro-averaged scores of 0.934 for Precision, 0.933 for Recall, and 0.933 for F1-score. These correspond to relative improvements over SALAD of 2.86%, 2.08%, and 2.41% in the three metrics, respectively. This outcome underscores the robustness of our model: even under the complexity and severe class imbalance of the UNSW-NB15 dataset, our framework accurately classifies not only normal traffic but also diverse attack types.
Table 8, in addition, presents the performance of the models on the multi-class classification task using the CICIDS-2017 dataset. An analysis of the baseline models’ performance on this dataset also reveals a significant challenge in accurately classifying attack categories that are underrepresented. To illustrate, the RFFE model exhibited a notably low Recall for the underrepresented “Web Attack” (label 4) and “Bot” (label 5) classes, with scores of only 0.689 and 0.577, respectively. Even though advanced approaches like MF-Net and TMG-IDS have bolstered the detection of majority-class samples, accurately identifying these rare attack categories continues to be a significant challenge for them. In the multi-class classification task on this dataset, TMG-IDS emerged as the top-performing model among all baselines, achieving macro-averaged Precision, Recall, and F1-scores of 0.949, 0.953, and 0.951, respectively. Once again, our proposed CSCVAE-NID framework surpasses this strong baseline by further boosting the three key macro-averaged metrics to 0.977, 0.982, and 0.980, respectively. This constitutes a significant enhancement of 2.95%, 3.04%, and 3.05%, respectively, over the leading baseline model, TMG-IDS.
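The relative improvements quoted for the multi-class task can be reproduced directly from the macro-averaged scores reported in the text, using $(\text{ours} - \text{baseline}) / \text{baseline}$. The short check below does this for both datasets. The helper name is hypothetical.

```python
def relative_improvement(ours, baseline):
    """Relative improvement over a baseline score, in percent."""
    return 100.0 * (ours - baseline) / baseline


# Macro-averaged (Precision, Recall, F1-score) as reported in the text.
unsw_ours, unsw_salad = (0.934, 0.933, 0.933), (0.908, 0.914, 0.911)
cic_ours, cic_tmg = (0.977, 0.982, 0.980), (0.949, 0.953, 0.951)

unsw_gain = [round(relative_improvement(o, b), 2) for o, b in zip(unsw_ours, unsw_salad)]
cic_gain = [round(relative_improvement(o, b), 2) for o, b in zip(cic_ours, cic_tmg)]
# unsw_gain -> [2.86, 2.08, 2.41]; cic_gain -> [2.95, 3.04, 3.05]
```

These values match the percentages stated for SALAD on UNSW-NB15 and for TMG-IDS on CICIDS-2017, which confirms that the multi-class gains are relative rather than absolute differences.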

4.2.2. Ablation Studies

Effectiveness of Key Components: Ablation studies were conducted to verify the effectiveness of each enhancement made to our model for the multi-class classification task. Table 9 presents the detection results obtained from various combinations of our proposed enhancements. Combination (1) is the baseline model, which excludes all of our proposed enhancements: a standard multi-class CVAE (MC-CVAE) trained exclusively on the original, imbalanced data, without any cost-sensitive learning strategy. Under this setup, the misclassification costs are uniform across all classes (i.e., $C_{y_i} = 1$ for all $i$). Adding the DA-CVAE data augmentation module in Combination (2) yielded a substantial boost in overall performance. On the UNSW-NB15 dataset, Precision, Recall, and F1-score improved by approximately 3.7%, 1.9%, and 2.8%, respectively, while on the CICIDS-2017 dataset, the corresponding improvements were about 2.3%, 1.8%, and 2.0%. Combination (3) introduces the cost-sensitive learning strategy to the baseline model without data augmentation, which isolates the impact of the algorithm-level enhancement. On the UNSW-NB15 dataset, this addition raises the Precision, Recall, and F1-score to 0.968, 0.970, and 0.969, respectively, outperforming the data-level augmentation of Combination (2). A similar trend is observed on the CICIDS-2017 dataset. The incremental contribution of integrating both strategies is evident in Combination (4), which represents our full CSCVAE-NID framework. Compared with Combination (2), which only has data augmentation, the further integration of the cost-sensitive strategy raises the F1-score by an additional 3.2 percentage points on the UNSW-NB15 dataset (from 0.948 to 0.980) and by 2.1 percentage points on the CICIDS-2017 dataset (from 0.912 to 0.933).
Combination (4), which incorporates all of our proposed enhancements, achieves the final optimal results. The ablation study results confirm that the incorporation of each component into our methodology progressively enhances the final detection performance. This validates that each modification and enhancement within the CSCVAE-NID framework is indeed meaningful and contributes to the overall efficacy.
To investigate the impact of key hyperparameters on model performance, we performed a sensitivity analysis of the hyperparameter $\gamma$. This hyperparameter scales the cost weights in Equation (2) and thereby controls the trade-off between the cost-sensitive classification loss and the KL divergence regularization term in the CSMC-CVAE's joint loss function. Table 10 illustrates the effects of varying $\gamma$ on the multi-class detection performance across the UNSW-NB15 and CICIDS-2017 datasets.
A clear observation from the results in Table 10 is that the model's performance, as measured by Precision, Recall, and F1-score, is highly sensitive to the choice of $\gamma$. The performance metrics, particularly the F1-score, peak when $\gamma$ is set to approximately 0.20. At $\gamma = 0.20$, the F1-score was 0.980 on the UNSW-NB15 dataset, and the same setting yielded the peak F1-score of 0.933 on the CICIDS-2017 dataset. This finding underscores the robustness of our proposed model across diverse datasets and indicates that the optimal trade-off point for its key hyperparameters is largely consistent, irrespective of the dataset. Performance drops when $\gamma$ is set below this value. This phenomenon can be attributed to the reduced emphasis on the KL divergence term during optimization: a smaller $\gamma$ provides insufficient regularization for the latent space, leading to unstable learning of the category prototypes and, ultimately, a degradation in classification accuracy. Conversely, an overly large $\gamma$ causes the model to over-prioritize the classification of minority classes at the cost of neglecting majority-class samples, resulting in a decline in the macro-averaged performance metrics.
On aggregate, the experimental findings confirm that an optimal equilibrium exists between the cost-sensitive classification objective and the latent space distribution regularization in our CSCVAE-NID framework. The hyperparameter $\gamma$ serves as the critical control parameter for modulating this equilibrium. A well-chosen value of $\gamma$ ensures a synergistic interplay between the cost-sensitive learning strategy and the CVAE's generative classification principle. This synergy is crucial for attaining the optimal detection efficacy.
Our key innovation for handling imbalanced data is the DA-CVAE. We benchmarked its performance against three state-of-the-art oversampling techniques [49,50,51] using the CICIDS-2017 dataset. In this comparative study, the synthetic data generated by each method, including our DA-CVAE, was used to train the same CSMC-CVAE classifier. The resulting performance metrics are summarized in Table 11.

5. Conclusions

In this work, we present the CSCVAE-NID, a novel two-stage framework for network intrusion detection. This framework is specifically engineered to achieve a significant enhancement in detection performance for a diverse range of attack types, particularly within the challenging context of imbalanced datasets.
To begin with, we address the challenge of underrepresented minority attack classes in the training data by proposing the DA-CVAE (Data Augmentation Conditional Variational Autoencoder). This generative model is engineered to effectively learn the distinct data distribution of each attack category, enabling the conditional generation of high-quality and diverse synthetic samples. By generating synthetic data, the DA-CVAE constructs a more class-balanced training corpus for the downstream classification model. This data-level enhancement provides a solid groundwork for mitigating the challenges posed by the imbalanced class distribution.
Secondly, we introduce the core of our framework: the CSMC-CVAE (Cost-Sensitive Multi-Class Classification Conditional Variational Autoencoder). This model reconceptualizes the classification problem as an inference process centered on probabilistic distribution matching. This principle is operationalized through a novel, conditionally symmetric dual-encoder architecture. Crucially, a cost-sensitive learning strategy is incorporated into the training of the CSMC-CVAE. The introduction of a predefined cost matrix fundamentally alters the model’s optimization objective during training. Instead of solely pursuing the maximization of classification accuracy, the model is guided to minimize the aggregate weighted risk associated with various types of misclassification errors.
Experimental evaluation on the public and challenging CICIDS-2017 and UNSW-NB15 datasets shows that the proposed CSCVAE-NID framework outperforms the compared state-of-the-art approaches in both binary and multi-class classification scenarios.
It is noteworthy that our proposed CSCVAE-NID framework offers a solution to the dual challenges of class imbalance and risk-aware classification, tackling them concurrently at both the data and algorithmic levels. Notably, the DA-CVAE component is designed as a modular and independent data augmentation unit. The high-quality synthetic data generated by this module can serve as an input for any sophisticated intrusion detection methodology, highlighting its excellent extensibility.
However, it is also important to discuss the inherent limitations of this data-level approach. The effectiveness of the data augmentation provided by the DA-CVAE in the first stage is critical to the overall performance of our framework. When the dataset exhibits severe class imbalance, i.e., the sample size of some classes is extremely small, the DA-CVAE has difficulty augmenting such data effectively. This is a challenge inherent to most existing data augmentation techniques. We hope that acknowledging this limitation will stimulate further targeted research in this area.

Author Contributions

Conceptualization, Z.W. and X.Y.; methodology, Z.W.; software, Z.W.; validation, Z.W. and X.Y.; formal analysis, Z.W.; investigation, Z.W.; data curation, Z.W.; writing—original draft preparation, Z.W.; writing—review and editing, Z.W. and X.Y.; visualization, Z.W.; supervision, X.Y.; project administration, X.Y.; funding acquisition, X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the project “Construction of Material Lifecycle Data Resource Node” (Grant No. 2024ZD0607600), Advanced Technology National Science and Technology Major Project.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mao, J.; Wei, Z.; Li, B.; Zhang, R.; Song, L. Toward Ever-Evolution Network Threats: A Hierarchical Federated Class-Incremental Learning Approach for Network Intrusion Detection in IIoT. IEEE Internet Things J. 2024, 11, 29864–29877.
  2. Nguyen, X.H.; Le, K.H. nNFST: A single-model approach for multiclass novelty detection in network intrusion detection systems. J. Netw. Comput. Appl. 2025, 236, 104128.
  3. Thakkar, A.; Kikani, N.; Geddam, R. Fusion of linear and non-linear dimensionality reduction techniques for feature reduction in LSTM-based Intrusion Detection System. Appl. Soft Comput. 2024, 154, 111378.
  4. Cai, S.; Zhao, Y.; Lyu, J.; Wang, S.; Hu, Y.; Cheng, M.; Zhang, G. DDP-DAR: Network intrusion detection based on denoising diffusion probabilistic model and dual-attention residual network. Neural Netw. 2025, 184, 107064.
  5. Zhang, Y.; Wu, Y.; Huang, X. Toward transferable adversarial attacks against autoencoder-based network intrusion detectors. IEEE Trans. Ind. Inform. 2024, 20, 13863–13872.
  6. Qi, L.; Yang, Y.; Zhou, X.; Rafique, W.; Ma, J. Fast anomaly identification based on multiaspect data streams for intelligent intrusion detection toward secure industry 4.0. IEEE Trans. Ind. Inform. 2021, 18, 6503–6511.
  7. Mehedi, S.T.; Anwar, A.; Rahman, Z.; Ahmed, K.; Islam, R. Dependable intrusion detection system for IoT: A deep transfer learning based approach. IEEE Trans. Ind. Inform. 2022, 19, 1006–1017.
  8. Xu, C.; Shen, J.; Du, X. A method of few-shot network intrusion detection based on meta-learning framework. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3540–3552.
  9. Papamartzivanos, D.; Mármol, F.G.; Kambourakis, G. Dendron: Genetic trees driven rule induction for network intrusion detection systems. Future Gener. Comput. Syst. 2018, 79, 558–574.
  10. Abdulhammed, R.; Musafer, H.; Alessa, A.; Faezipour, M.; Abuzneid, A. Features dimensionality reduction approaches for machine learning based network intrusion detection. Electronics 2019, 8, 322.
  11. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp 2018, 1, 108–116.
  12. Mendonça, R.V.; Teodoro, A.A.; Rosa, R.L.; Saadi, M.; Melgarejo, D.C.; Nardelli, P.H.; Rodríguez, D.Z. Intrusion detection system based on fast hierarchical deep convolutional neural network. IEEE Access 2021, 9, 61024–61034.
  13. Atefinia, R.; Ahmadi, M. Network intrusion detection using multi-architectural modular deep neural network. J. Supercomput. 2021, 77, 3571–3593.
  14. Yang, J.; Chen, X.; Chen, S.; Jiang, X.; Tan, X. Conditional variational auto-encoder and extreme value theory aided two-stage learning approach for intelligent fine-grained known/unknown intrusion detection. IEEE Trans. Inf. Forensics Secur. 2021, 16, 3538–3553.
  15. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
  16. Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, Australia, 10–12 November 2015; pp. 1–6.
  17. Moustafa, N.; Slay, J. The evaluation of Network Anomaly Detection Systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set. Inf. Secur. J. Glob. Perspect. 2016, 25, 18–31.
  18. Moustafa, N.; Slay, J.; Creech, G. Novel Geometric Area Analysis Technique for Anomaly Detection Using Trapezoidal Area Estimation on Large-Scale Networks. IEEE Trans. Big Data 2019, 5, 481–494.
  19. Gamage, S.; Samarabandu, J. Deep learning methods in network intrusion detection: A survey and an objective comparison. J. Netw. Comput. Appl. 2020, 169, 102767.
  20. Dogan, G. ProTru: A provenance-based trust architecture for wireless sensor networks. Int. J. Netw. Manag. 2016, 26, 131–151.
  21. Lawal, M.A.; Shaikh, R.A.; Hassan, S.R. Security analysis of network anomalies mitigation schemes in IoT networks. IEEE Access 2020, 8, 43355–43374.
  22. Li, Y.; Li, J. MultiClassifier: A combination of DPI and ML for application-layer classification in SDN. In Proceedings of the 2014 2nd International Conference on Systems and Informatics (ICSAI 2014), Shanghai, China, 15–17 November 2014; pp. 682–686.
  23. Moustafa, N.; Turnbull, B.; Choo, K.K.R. An ensemble intrusion detection technique based on proposed statistical flow features for protecting network traffic of internet of things. IEEE Internet Things J. 2018, 6, 4815–4830.
  24. Maniriho, P.; Mahoro, L.J.; Niyigaba, E.; Bizimana, Z.; Ahmad, T. Detecting intrusions in computer network traffic with machine learning approaches. Int. J. Intell. Eng. Syst. 2020, 13, 433–445.
  25. Tong, X.; Tan, X.; Chen, L.; Yang, J.; Zheng, Q. BFSN: A novel method of encrypted traffic classification based on bidirectional flow sequence network. In Proceedings of the 2020 3rd International Conference on Hot Information-Centric Networking (HotICN), Hefei, China, 12–14 December 2020; pp. 160–165.
  26. Zhao, R.; Gui, G.; Xue, Z.; Yin, J.; Ohtsuki, T.; Adebisi, B.; Gacanin, H. A novel intrusion detection method based on lightweight neural network for internet of things. IEEE Internet Things J. 2021, 9, 9960–9972.
  27. Popoola, S.I.; Adebisi, B.; Hammoudeh, M.; Gui, G.; Gacanin, H. Hybrid deep learning for botnet attack detection in the internet-of-things networks. IEEE Internet Things J. 2020, 8, 4944–4956. [Google Scholar] [CrossRef]
  28. Ma, C.; Du, X.; Cao, L. Analysis of multi-types of flow features based on hybrid neural network for improving network anomaly detection. IEEE Access 2019, 7, 148363–148380. [Google Scholar] [CrossRef]
  29. Lunardi, W.T.; Lopez, M.A.; Giacalone, J.P. Arcade: Adversarially regularized convolutional autoencoder for network anomaly detection. IEEE Trans. Netw. Serv. Manag. 2022, 20, 1305–1318. [Google Scholar] [CrossRef]
  30. Zhou, Y.; Song, X.; Zhang, Y.; Liu, F.; Zhu, C.; Liu, L. Feature encoding with autoencoders for weakly supervised anomaly detection. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 2454–2465. [Google Scholar] [CrossRef]
  31. Al-Qarni, E.A.; Al-Asmari, G.A. Addressing imbalanced data in network intrusion detection: A review and survey. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 136–143. [Google Scholar] [CrossRef]
  32. Liu, Z.; Li, Y.; Chen, N.; Wang, Q.; Hooi, B.; He, B. A survey of imbalanced learning on graphs: Problems, techniques, and future directions. IEEE Trans. Knowl. Data Eng. 2025, 37, 3132–3152. [Google Scholar] [CrossRef]
  33. Elyan, E.; Moreno-Garcia, C.F.; Jayne, C. CDSMOTE: Class decomposition and synthetic minority class oversampling technique for imbalanced-data classification. Neural Comput. Appl. 2021, 33, 2839–2851. [Google Scholar] [CrossRef]
  34. Arifin, M.A.S.; Stiawan, D.; Yudho Suprapto, B.; Susanto, S.; Salim, T.; Idris, M.Y.; Budiarto, R. Oversampling and undersampling for intrusion detection system in the supervisory control and data acquisition IEC 60870-5-104. IET Cyber-Phys. Syst. Theory Appl. 2024, 9, 282–292. [Google Scholar] [CrossRef]
  35. Zheng, M.; Hu, X.; Hu, Y.; Zheng, X.; Luo, Y. Fed-UGI: Federated undersampling learning framework with Gini impurity for imbalanced network intrusion detection. IEEE Trans. Inf. Forensics Secur. 2024, 20, 1262–1277. [Google Scholar] [CrossRef]
  36. Tao, X.; Zheng, Y.; Chen, W.; Zhang, X.; Qi, L.; Fan, Z.; Huang, S. SVDD-based weighted oversampling technique for imbalanced and overlapped dataset learning. Inf. Sci. 2022, 588, 13–51. [Google Scholar] [CrossRef]
  37. Tan, X.; Su, S.; Huang, Z.; Guo, X.; Zuo, Z.; Sun, X.; Li, L. Wireless sensor networks intrusion detection based on SMOTE and the random forest algorithm. Sensors 2019, 19, 203. [Google Scholar] [CrossRef] [PubMed]
  38. Wang, J.H.; Septian, T.W. Combining oversampling with recurrent neural networks for intrusion detection. In Proceedings of the International Conference on Database Systems for Advanced Applications, Taipei, Taiwan, 11–14 April 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 305–320. [Google Scholar]
  39. Al, S.; Dener, M. STL-HDL: A new hybrid network intrusion detection system for imbalanced dataset on big data environment. Comput. Secur. 2021, 110, 102435. [Google Scholar] [CrossRef]
  40. Jedrzejowicz, J.; Jedrzejowicz, P. GEP-based classifier for mining imbalanced data. Expert Syst. Appl. 2021, 164, 114058. [Google Scholar] [CrossRef]
  41. Luo, Y.; Chen, R.; Li, C.; Yang, D.; Tang, K.; Su, J. An improved binary simulated annealing algorithm and TPE-FL-LightGBM for fast network intrusion detection. Electronics 2025, 14, 231. [Google Scholar] [CrossRef]
  42. Chen, W.; Yang, K.; Yu, Z.; Shi, Y.; Chen, C.P. A survey on imbalanced learning: Latest research, applications and future directions. Artif. Intell. Rev. 2024, 57, 137. [Google Scholar] [CrossRef]
  43. Zeng, W.; Zhang, C.; Liang, X.; Xia, J.; Lin, Y.; Lin, Y. Intrusion detection-embedded chaotic encryption via hybrid modulation for data center interconnects. Opt. Lett. 2025, 50, 4450–4453. [Google Scholar] [CrossRef]
  44. Wang, Z.; Chu, X.; Li, D.; Yang, H.; Qu, W. Cost-sensitive matrixized classification learning with information entropy. Appl. Soft Comput. 2022, 116, 108266. [Google Scholar] [CrossRef]
  45. Telikani, A.; Gandomi, A.H. Cost-sensitive stacked auto-encoders for intrusion detection in the Internet of Things. Internet Things 2021, 14, 100122. [Google Scholar] [CrossRef]
  46. Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic differentiation in pytorch. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  47. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  48. Loshchilov, I.; Hutter, F. Sgdr: Stochastic gradient descent with warm restarts. arXiv 2016, arXiv:1608.03983. [Google Scholar]
  49. Habibi, O.; Chemmakha, M.; Lazaar, M. Imbalanced tabular data modelization using CTGAN and machine learning to improve IoT Botnet attacks detection. Eng. Appl. Artif. Intell. 2023, 118, 105669. [Google Scholar] [CrossRef]
  50. Chen, H.; Wei, J.; Huang, H.; Wen, L.; Yuan, Y.; Wu, J. Novel imbalanced fault diagnosis method based on generative adversarial networks with balancing serial CNN and Transformer (BCTGAN). Expert Syst. Appl. 2024, 258, 125171. [Google Scholar] [CrossRef]
  51. He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; pp. 1322–1328. [Google Scholar]
Figure 1. Overview of the proposed CSCVAE-NID framework. Blue boxes represent inputs/outputs, light blue trapezoids are encoder/decoder modules, yellow boxes are features, the pink oval is the sampling process, and the green box is the concatenation operation.
Table 1. Detailed statistics of the training datasets before and after augmentation.

| Dataset | Label | Category | Original Training Set | Augmented Training Set |
|---|---|---|---|---|
| UNSW-NB15 | 0 | Normal | 56,000 | 56,000 |
| | 1 | DoS | 12,264 | 22,264 |
| | 2 | Reconnaissance | 10,491 | 20,491 |
| | 3 | Shellcode | 1133 | 2133 |
| | 4 | Worms | 130 | 230 |
| CICIDS-2017 | 0 | Benign | 105,222 | 105,222 |
| | 1 | DoS | 21,550 | 31,550 |
| | 2 | Port Scan | 10,809 | 20,809 |
| | 3 | Brute Force | 5235 | 7235 |
| | 4 | Web Attack | 1476 | 2476 |
| | 5 | Bot | 857 | 1057 |
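The effect of augmentation in Table 1 is easier to judge as arithmetic. The following minimal sketch takes the UNSW-NB15 counts from the table and derives, per class, how many synthetic samples were added and how the majority-to-class ratio changes. The function name `summarize` and the choice of ratio as a summary statistic are illustrative assumptions, not part of the paper.

```python
# Training-set counts copied from Table 1 (UNSW-NB15): (original, augmented).
unsw_nb15 = {
    "Normal": (56_000, 56_000),
    "DoS": (12_264, 22_264),
    "Reconnaissance": (10_491, 20_491),
    "Shellcode": (1_133, 2_133),
    "Worms": (130, 230),
}


def summarize(counts):
    """Per class: synthetic samples added and majority-to-class ratio before/after."""
    major_before = max(orig for orig, _ in counts.values())
    major_after = max(aug for _, aug in counts.values())
    summary = {}
    for name, (orig, aug) in counts.items():
        summary[name] = {
            "added": aug - orig,                            # synthetic samples
            "ratio_before": round(major_before / orig, 1),  # imbalance before
            "ratio_after": round(major_after / aug, 1),     # imbalance after
        }
    return summary


summary = summarize(unsw_nb15)
for name, row in summary.items():
    print(name, row["added"], row["ratio_before"], row["ratio_after"])
```

For the rarest class (Worms), the ratio to the Normal class drops from roughly 431:1 to roughly 243:1, so the augmented set is less skewed but still far from balanced.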
Table 2. Structure of the confusion matrix.

| | Predicted: Attack | Predicted: Normal |
|---|---|---|
| Actual: Attack | True Positive (TP) | False Negative (FN) |
| Actual: Normal | False Positive (FP) | True Negative (TN) |
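The four cells in Table 2 are enough to compute the usual detection metrics. The sketch below uses the standard definitions of precision, recall, F1-score, and False Positive Rate; the function name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard metrics derived from the confusion-matrix cells of Table 2."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # FPR: share of normal traffic that is wrongly flagged as an attack.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


# Hypothetical counts: 100 attack flows and 900 normal flows.
metrics = binary_metrics(tp=90, fn=10, fp=5, tn=895)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the classifier misses 10 of 100 attacks (recall 0.900) while raising 5 false alarms on 900 normal flows, which is the Recall versus FPR trade-off the tables below summarize.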
Table 3. The encoder–decoder architecture of the model.

| Component | Layer | Filter Size | Stride |
|---|---|---|---|
| Encoder | Conv1 | 3 × 3 | (2, 2) |
| | Conv2 | | (1, 1) |
| | Conv3 | | (2, 2) |
| | Conv4 | | (1, 1) |
| Decoder | DeConv1 | | (1, 1) |
| | DeConv2 | | (2, 2) |
| | DeConv3 | | (1, 1) |
| | DeConv4 | | (2, 2) |
| Multi-Classification Header | Dense(128) | | |
| | Softmax | | |
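The strides in Table 3 determine how quickly the encoder shrinks its input. As a rough illustration, the sketch below applies the standard convolution output-size formula layer by layer. The 16 × 16 input, the padding of 1, and the use of a 3 × 3 filter on every encoder layer are assumptions made for this example (Table 3 lists the filter size only against Conv1); none of them is taken from the paper.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of one spatial dimension after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


# Encoder strides for Conv1..Conv4, as listed in Table 3.
ENCODER_STRIDES = [2, 1, 2, 1]


def encoder_sizes(input_size, strides=ENCODER_STRIDES):
    """Spatial size after each encoder layer (square inputs assumed)."""
    sizes = []
    size = input_size
    for stride in strides:
        size = conv_out(size, stride=stride)
        sizes.append(size)
    return sizes


sizes = encoder_sizes(16)  # hypothetical 16 x 16 input
print(sizes)  # [8, 8, 4, 4]
```

Under these assumptions only the two stride-2 layers reduce resolution, so the encoder downsamples by a factor of four overall, and the decoder strides in Table 3 mirror this in reverse order.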
Table 4. Comparison of model time efficiency on the CICIDS-2017 dataset.

| Method | Dataset | Total Training Time (h) | Average Inference Time (ms) |
|---|---|---|---|
| TMG-IDS | CICIDS-2017 | 8.41 | 4.64 |
| SALAD | | 8.35 | 1.23 |
| DCHAE | | 8.29 | 2.58 |
| CAEP | | 7.83 | 3.71 |
| CSCVAE-NID (Ours) | | 8.22 | 0.90 |
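Table 5. Binary classification results (UNSW-NB15).

| Label | RFFE Precision | RFFE Recall | RFFE F1-score | IDS-INT Precision | IDS-INT Recall | IDS-INT F1-score |
|---|---|---|---|---|---|---|
| Normal | 0.869 | 0.882 | 0.875 | 0.933 | 0.927 | 0.930 |
| Attack | 0.857 | 0.826 | 0.841 | 0.930 | 0.908 | 0.919 |
| Macro- | 0.866 | 0.867 | 0.866 | 0.932 | 0.922 | 0.927 |

| Label | MF-Net Precision | MF-Net Recall | MF-Net F1-score | TMG-IDS Precision | TMG-IDS Recall | TMG-IDS F1-score |
|---|---|---|---|---|---|---|
| Normal | 0.925 | 0.916 | 0.920 | 0.967 | 0.958 | 0.962 |
| Attack | 0.941 | 0.936 | 0.938 | 0.972 | 0.951 | 0.961 |
| Macro- | 0.929 | 0.922 | 0.925 | 0.970 | 0.956 | 0.963 |

| Label | SALAD Precision | SALAD Recall | SALAD F1-score | DCHAE Precision | DCHAE Recall | DCHAE F1-score |
|---|---|---|---|---|---|---|
| Normal | 0.946 | 0.940 | 0.943 | 0.961 | 0.972 | 0.966 |
| Attack | 0.962 | 0.935 | 0.948 | 0.956 | 0.968 | 0.962 |
| Macro- | 0.950 | 0.939 | 0.944 | 0.960 | 0.971 | 0.965 |

| Label | CAEP Precision | CAEP Recall | CAEP F1-score | CSCVAE-NID (Ours) Precision | CSCVAE-NID (Ours) Recall | CSCVAE-NID (Ours) F1-score |
|---|---|---|---|---|---|---|
| Normal | 0.975 | 0.968 | 0.971 | 0.989 | 0.981 | 0.985 |
| Attack | 0.961 | 0.954 | 0.957 | 0.985 | 0.992 | 0.988 |
| Macro- | 0.971 | 0.964 | 0.967 | 0.988 | 0.984 | 0.986 |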
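Table 6. Binary classification results (CICIDS-2017).

| Label | RFFE Precision | RFFE Recall | RFFE F1-score | IDS-INT Precision | IDS-INT Recall | IDS-INT F1-score |
|---|---|---|---|---|---|---|
| Normal | 0.893 | 0.904 | 0.898 | 0.925 | 0.941 | 0.933 |
| Attack | 0.858 | 0.873 | 0.865 | 0.931 | 0.916 | 0.923 |
| Macro- | 0.883 | 0.895 | 0.889 | 0.927 | 0.933 | 0.930 |

| Label | MF-Net Precision | MF-Net Recall | MF-Net F1-score | TMG-IDS Precision | TMG-IDS Recall | TMG-IDS F1-score |
|---|---|---|---|---|---|---|
| Normal | 0.952 | 0.937 | 0.944 | 0.974 | 0.930 | 0.951 |
| Attack | 0.928 | 0.940 | 0.934 | 0.946 | 0.953 | 0.950 |
| Macro- | 0.945 | 0.938 | 0.941 | 0.966 | 0.937 | 0.951 |

| Label | SALAD Precision | SALAD Recall | SALAD F1-score | DCHAE Precision | DCHAE Recall | DCHAE F1-score |
|---|---|---|---|---|---|---|
| Normal | 0.955 | 0.941 | 0.948 | 0.979 | 0.975 | 0.977 |
| Attack | 0.936 | 0.965 | 0.950 | 0.952 | 0.966 | 0.959 |
| Macro- | 0.949 | 0.948 | 0.948 | 0.971 | 0.972 | 0.971 |

| Label | CAEP Precision | CAEP Recall | CAEP F1-score | CSCVAE-NID (Ours) Precision | CSCVAE-NID (Ours) Recall | CSCVAE-NID (Ours) F1-score |
|---|---|---|---|---|---|---|
| Normal | 0.964 | 0.949 | 0.956 | 0.982 | 0.989 | 0.985 |
| Attack | 0.958 | 0.973 | 0.965 | 0.968 | 0.977 | 0.972 |
| Macro- | 0.962 | 0.956 | 0.959 | 0.978 | 0.985 | 0.981 |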
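Table 7. Multi-class classification results (UNSW-NB15).

| Label | RFFE Precision | RFFE Recall | RFFE F1-score | IDS-INT Precision | IDS-INT Recall | IDS-INT F1-score |
|---|---|---|---|---|---|---|
| 0 | 0.931 | 0.879 | 0.904 | 0.928 | 0.901 | 0.914 |
| 1 | 0.719 | 0.921 | 0.808 | 0.846 | 0.769 | 0.806 |
| 2 | 0.545 | 0.835 | 0.659 | 0.653 | 0.717 | 0.684 |
| 3 | 0.298 | 0.352 | 0.323 | 0.446 | 0.570 | 0.499 |
| 4 | 0.772 | 0.160 | 0.265 | 0.483 | 0.384 | 0.428 |
| Macro- | 0.838 | 0.871 | 0.854 | 0.871 | 0.851 | 0.861 |

| Label | MF-Net Precision | MF-Net Recall | MF-Net F1-score | TMG-IDS Precision | TMG-IDS Recall | TMG-IDS F1-score |
|---|---|---|---|---|---|---|
| 0 | 0.954 | 0.909 | 0.931 | 0.929 | 0.951 | 0.940 |
| 1 | 0.672 | 0.823 | 0.740 | 0.892 | 0.819 | 0.854 |
| 2 | 0.559 | 0.541 | 0.550 | 0.821 | 0.752 | 0.785 |
| 3 | 0.342 | 0.578 | 0.430 | 0.689 | 0.550 | 0.612 |
| 4 | 0.741 | 0.636 | 0.684 | 0.547 | 0.508 | 0.527 |
| Macro- | 0.850 | 0.842 | 0.842 | 0.905 | 0.898 | 0.901 |

| Label | SALAD Precision | SALAD Recall | SALAD F1-score | DCHAE Precision | DCHAE Recall | DCHAE F1-score |
|---|---|---|---|---|---|---|
| 0 | 0.934 | 0.944 | 0.939 | 0.963 | 0.920 | 0.941 |
| 1 | 0.872 | 0.804 | 0.837 | 0.752 | 0.903 | 0.820 |
| 2 | 0.859 | 0.915 | 0.886 | 0.671 | 0.841 | 0.746 |
| 3 | 0.507 | 0.701 | 0.588 | 0.283 | 0.707 | 0.404 |
| 4 | 0.497 | 0.447 | 0.471 | 0.574 | 0.735 | 0.645 |
| Macro- | 0.908 | 0.914 | 0.911 | 0.882 | 0.903 | 0.892 |

| Label | CAEP Precision | CAEP Recall | CAEP F1-score | CSCVAE-NID (Ours) Precision | CSCVAE-NID (Ours) Recall | CSCVAE-NID (Ours) F1-score |
|---|---|---|---|---|---|---|
| 0 | 0.955 | 0.946 | 0.950 | 0.974 | 0.959 | 0.966 |
| 1 | 0.826 | 0.908 | 0.865 | 0.879 | 0.921 | 0.899 |
| 2 | 0.771 | 0.674 | 0.719 | 0.806 | 0.840 | 0.823 |
| 3 | 0.303 | 0.512 | 0.381 | 0.768 | 0.693 | 0.729 |
| 4 | 0.533 | 0.609 | 0.568 | 0.692 | 0.784 | 0.735 |
| Macro- | 0.901 | 0.897 | 0.899 | 0.934 | 0.933 | 0.933 |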
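Table 8. Multi-class classification results (CICIDS-2017).

| Label | RFFE Precision | RFFE Recall | RFFE F1-score | IDS-INT Precision | IDS-INT Recall | IDS-INT F1-score |
|---|---|---|---|---|---|---|
| 0 | 0.841 | 0.874 | 0.857 | 0.904 | 0.918 | 0.911 |
| 1 | 0.918 | 0.859 | 0.888 | 0.937 | 0.906 | 0.921 |
| 2 | 0.902 | 0.934 | 0.918 | 0.926 | 0.860 | 0.892 |
| 3 | 0.961 | 0.782 | 0.862 | 0.798 | 0.851 | 0.824 |
| 4 | 0.743 | 0.689 | 0.715 | 0.905 | 0.699 | 0.789 |
| 5 | 0.892 | 0.577 | 0.700 | 0.931 | 0.784 | 0.851 |
| Macro- | 0.860 | 0.868 | 0.864 | 0.906 | 0.906 | 0.906 |

| Label | MF-Net Precision | MF-Net Recall | MF-Net F1-score | TMG-IDS Precision | TMG-IDS Recall | TMG-IDS F1-score |
|---|---|---|---|---|---|---|
| 0 | 0.935 | 0.947 | 0.941 | 0.955 | 0.967 | 0.961 |
| 1 | 0.972 | 0.931 | 0.951 | 0.926 | 0.934 | 0.930 |
| 2 | 0.938 | 0.984 | 0.960 | 0.983 | 0.925 | 0.953 |
| 3 | 0.904 | 0.846 | 0.874 | 0.920 | 0.881 | 0.900 |
| 4 | 0.896 | 0.821 | 0.857 | 0.874 | 0.796 | 0.833 |
| 5 | 0.961 | 0.919 | 0.940 | 0.907 | 0.942 | 0.924 |
| Macro- | 0.938 | 0.941 | 0.939 | 0.949 | 0.953 | 0.951 |

| Label | SALAD Precision | SALAD Recall | SALAD F1-score | DCHAE Precision | DCHAE Recall | DCHAE F1-score |
|---|---|---|---|---|---|---|
| 0 | 0.929 | 0.911 | 0.920 | 0.946 | 0.956 | 0.951 |
| 1 | 0.967 | 0.963 | 0.965 | 0.913 | 0.943 | 0.928 |
| 2 | 0.943 | 0.926 | 0.934 | 0.947 | 0.954 | 0.950 |
| 3 | 0.950 | 0.825 | 0.883 | 0.932 | 0.893 | 0.912 |
| 4 | 0.912 | 0.907 | 0.910 | 0.908 | 0.849 | 0.878 |
| 5 | 0.827 | 0.743 | 0.783 | 0.925 | 0.941 | 0.933 |
| Macro- | 0.935 | 0.916 | 0.925 | 0.939 | 0.950 | 0.944 |

| Label | CAEP Precision | CAEP Recall | CAEP F1-score | CSCVAE-NID (Ours) Precision | CSCVAE-NID (Ours) Recall | CSCVAE-NID (Ours) F1-score |
|---|---|---|---|---|---|---|
| 0 | 0.940 | 0.981 | 0.960 | 0.979 | 0.983 | 0.981 |
| 1 | 0.929 | 0.952 | 0.940 | 0.965 | 0.976 | 0.970 |
| 2 | 0.951 | 0.925 | 0.938 | 0.991 | 0.994 | 0.992 |
| 3 | 0.937 | 0.956 | 0.946 | 0.983 | 0.978 | 0.980 |
| 4 | 0.908 | 0.867 | 0.887 | 1.000 | 1.000 | 1.000 |
| 5 | 0.944 | 0.883 | 0.913 | 1.000 | 0.989 | 0.994 |
| Macro- | 0.944 | 0.938 | 0.952 | 0.977 | 0.982 | 0.980 |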
Table 9. Ablation study results for the multi-class classification task.

| Datasets | Metrics | Comb. 1 (MC-CVAE) | Comb. 2 (+DA-CVAE) | Comb. 3 (+Cost-Sens.) | Comb. 4 (CSMC-CVAE) |
|---|---|---|---|---|---|
| UNSW-NB15 | Precision | 0.914 | 0.951 | 0.968 | 0.977 |
| | Recall | 0.926 | 0.945 | 0.970 | 0.982 |
| | F1-score | 0.920 | 0.948 | 0.969 | 0.980 |
| CICIDS-2017 | Precision | 0.894 | 0.917 | 0.928 | 0.934 |
| | Recall | 0.890 | 0.908 | 0.925 | 0.933 |
| | F1-score | 0.892 | 0.912 | 0.926 | 0.933 |
Table 10. Impact of the hyperparameter γ on detection performance.

| γ | UNSW-NB15 Precision | UNSW-NB15 Recall | UNSW-NB15 F1-Score | CICIDS-2017 Precision | CICIDS-2017 Recall | CICIDS-2017 F1-Score |
|---|---|---|---|---|---|---|
| γ = 0.16 | 0.962 | 0.965 | 0.963 | 0.914 | 0.912 | 0.913 |
| γ = 0.18 | 0.973 | 0.974 | 0.973 | 0.926 | 0.928 | 0.927 |
| γ = 0.20 | 0.977 | 0.982 | 0.980 | 0.934 | 0.933 | 0.933 |
| γ = 0.22 | 0.975 | 0.971 | 0.973 | 0.930 | 0.924 | 0.927 |
| γ = 0.24 | 0.967 | 0.963 | 0.965 | 0.923 | 0.917 | 0.921 |
| γ = 0.26 | 0.958 | 0.955 | 0.956 | 0.917 | 0.915 | 0.916 |
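Reading the best setting off Table 10 amounts to an argmax over the F1 column of each dataset. The short sketch below does this with the F1-scores copied from the table; selecting γ by F1-score, and the helper name `best_gamma`, are assumptions made for illustration rather than the authors' stated tuning procedure.

```python
# F1-scores copied from Table 10, keyed by gamma and then by dataset column.
f1_by_gamma = {
    0.16: {"UNSW-NB15": 0.963, "CICIDS-2017": 0.913},
    0.18: {"UNSW-NB15": 0.973, "CICIDS-2017": 0.927},
    0.20: {"UNSW-NB15": 0.980, "CICIDS-2017": 0.933},
    0.22: {"UNSW-NB15": 0.973, "CICIDS-2017": 0.927},
    0.24: {"UNSW-NB15": 0.965, "CICIDS-2017": 0.921},
    0.26: {"UNSW-NB15": 0.956, "CICIDS-2017": 0.916},
}


def best_gamma(table, dataset):
    """Return the gamma value with the highest F1-score for one dataset."""
    return max(table, key=lambda gamma: table[gamma][dataset])


best = {name: best_gamma(f1_by_gamma, name)
        for name in ("UNSW-NB15", "CICIDS-2017")}
print(best)
```

Both columns peak at γ = 0.20 and fall off on either side of it, so the same value is selected for the two datasets.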
Table 11. Comparison of different data augmentation methods.

| Data Augmentation | Dataset | Classifier | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| CTGAN | CICIDS-2017 | CSMC-CVAE | 0.968 | 0.971 | 0.969 |
| BCTGAN | | | 0.971 | 0.975 | 0.973 |
| ADASYN | | | 0.959 | 0.968 | 0.963 |
| DA-CVAE (Ours) | | | 0.977 | 0.982 | 0.980 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Wang, Z.; Yu, X. CSCVAE-NID: A Conditionally Symmetric Two-Stage CVAE Framework with Cost-Sensitive Learning for Imbalanced Network Intrusion Detection. Entropy 2025, 27, 1086. https://doi.org/10.3390/e27111086
