4.4. Results
It is first necessary to clarify the distinction between the CNN-DT and DT-CNN model structures. CNN-DT refers to a hybrid structure that first uses a CNN to extract features and then feeds these high-level features into a decision tree (DT) for final classification, whereas DT-CNN first uses a decision tree to extract or select features and then feeds them into a CNN for classification. This structural difference has a significant impact on model performance, as shown in Table 3.
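To make the CNN-DT arrangement concrete, the sketch below shows the two-stage pipeline under simplified assumptions: a small 1-D CNN encodes each traffic record into a feature vector, and a scikit-learn `DecisionTreeClassifier` performs the final classification. The layer sizes and the `CNNFeatureExtractor`/`fit_cnn_dt` names are illustrative rather than the paper's exact configuration, and the encoder is left untrained here for brevity.

```python
# Minimal sketch of the CNN-DT arrangement: a CNN encodes each traffic record
# into a feature vector, then a decision tree performs the final classification.
# Layer sizes and names are illustrative assumptions, not the paper's exact setup.
import numpy as np
import torch
import torch.nn as nn
from sklearn.tree import DecisionTreeClassifier

class CNNFeatureExtractor(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),              # collapse the feature axis
        )
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features) -> add a channel axis for Conv1d
        h = self.conv(x.unsqueeze(1)).squeeze(-1)
        return self.proj(h)

def fit_cnn_dt(X_train: np.ndarray, y_train: np.ndarray, depth: int = 10):
    """CNN-DT: extract CNN features, then fit a depth-limited decision tree.
    (In the full pipeline the encoder would be trained first; it is untrained here.)"""
    encoder = CNNFeatureExtractor().eval()
    with torch.no_grad():
        feats = encoder(torch.tensor(X_train, dtype=torch.float32)).numpy()
    tree = DecisionTreeClassifier(max_depth=depth).fit(feats, y_train)
    return encoder, tree

# Toy usage with random data standing in for preprocessed traffic records.
X, y = np.random.rand(256, 41).astype("float32"), np.random.randint(0, 2, 256)
encoder, tree = fit_cnn_dt(X, y)
```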
The experimental results fully verified the superiority of the CNN-DT structure and the effectiveness of the hyperparameter optimization strategy based on the actor–critic (AC) algorithm. The specific analysis was as follows (Figure 4):
First, comparing the two underlying structures, CNN-DT outperformed DT-CNN on all evaluation metrics: accuracy (0.9229 vs. 0.9220), recall (0.9083 vs. 0.9045), and F1-score (0.9499 vs. 0.9492). This indicated that extracting features with the CNN and then classifying with the decision tree yielded better feature representations and decision boundaries.
Second, hyperparameter optimization with the actor–critic algorithm significantly improved the performance of the CNN-DT model. The actor–critic algorithm is a deep reinforcement learning method that combines policy gradients (actor) with value function estimation (critic): the actor selects the hyperparameters as its action policy, the critic evaluates the value of the selected policy, and the two work together to optimize the model hyperparameters. The optimized CNN-DT model achieved a score of 0.9792 on all metrics, an improvement of 5.63, 7.09, and 2.93 percentage points in accuracy, recall, and F1-score, respectively, over the unoptimized CNN-DT model. Reaching the same high score on every evaluation metric indicated that the AC-based optimization not only improved the overall performance of the model but also produced a well-balanced result across metrics.
These results validated two important design choices: (1) the CNN-DT architectural design had obvious advantages over DT-CNN; and (2) the actor–critic algorithm demonstrated excellent performance in hyperparameter optimization tasks. Through the dual learning mechanism of actor and critic, the algorithm was able to effectively find the optimal configuration in the complex hyperparameter space, which enabled the optimized CNN-DT model to achieve a consistently high score of 0.9792 in all the metrics.
In addition, the significant performance improvement obtained with the actor–critic algorithm also highlighted the importance of reinforcement learning optimization methods in deep learning applications. By dynamically adjusting the policy and continuously evaluating the improvement, the AC algorithm not only effectively improved the model's performance but also balanced the individual metrics, demonstrating that reinforcement-learning-based hyperparameter optimization could handle complex optimization problems well and help the model achieve better generalization.
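As a concrete illustration of this dual mechanism, the hedged sketch below implements a minimal actor–critic loop in PyTorch: the actor proposes continuous pooling weights, a placeholder `evaluate_cnn_dt()` stands in for training and evaluating the CNN-DT model (its reward surface here is a dummy), and the critic learns a baseline for the policy-gradient update. The network sizes, the fixed state, and the function names are assumptions for illustration, not the paper's exact implementation; the 3 × 10⁻⁴ learning rate follows Table 4.

```python
# Hedged sketch of actor-critic hyperparameter search: the actor samples
# continuous pooling weights, the environment returns validation accuracy
# as the reward, and the critic provides a learned baseline.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Outputs a Gaussian over 3 pooling weights (max / avg / global-avg)."""
    def __init__(self, state_dim: int = 4, action_dim: int = 3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 32), nn.Tanh(),
                                 nn.Linear(32, action_dim))
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state):
        return torch.distributions.Normal(self.net(state), self.log_std.exp())

class Critic(nn.Module):
    def __init__(self, state_dim: int = 4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 32), nn.Tanh(),
                                 nn.Linear(32, 1))

    def forward(self, state):
        return self.net(state).squeeze(-1)

def evaluate_cnn_dt(pool_weights: torch.Tensor) -> float:
    """Placeholder: train CNN-DT with these pooling weights and return accuracy.
    A dummy quadratic reward surface is used here instead of real training."""
    return float(1.0 - (pool_weights - 0.33).pow(2).mean())

actor, critic = Actor(), Critic()
opt_a = torch.optim.Adam(actor.parameters(), lr=3e-4)   # best rate per Table 4
opt_c = torch.optim.Adam(critic.parameters(), lr=3e-4)
state = torch.zeros(4)                                   # simple fixed state (bandit-style)

for step in range(200):
    dist = actor(state)
    action = dist.sample()
    weights = torch.softmax(action, dim=-1)              # valid pooling mixture
    reward = torch.tensor(evaluate_cnn_dt(weights))
    value = critic(state)
    advantage = (reward - value).detach()
    actor_loss = -(dist.log_prob(action).sum() * advantage)
    critic_loss = (reward - value).pow(2)
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()
```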
In addition to the performance metrics shown in Table 3, this study also calculated Cohen’s Kappa coefficient to evaluate the classification consistency of the model. The CNN-DT model optimized by the actor–critic network achieved a Kappa coefficient of 0.7947, indicating high consistency and reliability in classification tasks. This result further validated the effectiveness of the proposed model, especially in handling network intrusion detection tasks with imbalanced class distributions.
Furthermore, the ROC curve analysis (as shown in Figure 5) demonstrated that the model achieved an AUC (Area Under the Curve) of 0.93, indicating excellent performance in distinguishing normal traffic from intrusion attacks. These evaluation metrics collectively confirmed the significant advantages of combining the CNN-DT structure with actor–critic optimization in improving intrusion detection performance.
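Both quantities are standard scikit-learn metrics; the short snippet below shows how they would be computed from the test labels and the model's attack-probability scores. The arrays here are dummy placeholders, not the experimental data.

```python
# Cohen's Kappa (classification consistency) and ROC AUC (discrimination),
# computed with scikit-learn; y_true / y_score are dummy placeholders.
import numpy as np
from sklearn.metrics import cohen_kappa_score, roc_auc_score, roc_curve

y_true  = np.random.randint(0, 2, 500)                               # ground-truth labels
y_score = np.clip(y_true * 0.7 + np.random.rand(500) * 0.5, 0, 1)    # attack probabilities
y_pred  = (y_score >= 0.5).astype(int)                               # hard predictions

kappa = cohen_kappa_score(y_true, y_pred)       # paper reports 0.7947
auc   = roc_auc_score(y_true, y_score)          # paper reports 0.93
fpr, tpr, _ = roc_curve(y_true, y_score)        # points for the ROC plot (Figure 5)
print(f"Kappa={kappa:.4f}, AUC={auc:.4f}")
```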
The experimental results in Figure 6 indicated that the decision tree’s depth significantly affected model performance. The depth-3 model showed stable weight changes and sustained cumulative rewards; the depth-5 model exhibited severe weight fluctuations throughout training, with extreme values near batch 200 and cumulative rewards that declined after an initial increase; the depth-10 model, despite large initial weight adjustments, converged quickly and maintained reward levels similar to the shallower model. This suggested that the optimal tree depth was not simply “deeper is better” but required a balance between complexity and stability, and that shallow or deep configurations may outperform medium-depth settings.
The weight change curves revealed how the pooling parameters evolved during training across different tree depths. The depth-3 tree showed stable weight trends with minor parameter adjustments, indicating smooth learning and gradual optimization. The depth-5 tree exhibited dramatic weight fluctuations, especially extreme values near batch 200, suggesting that medium-depth trees were unstable when combined with CNNs. This instability likely stemmed from a mismatch between model complexity and data characteristics, causing learning oscillations. The non-convergence in Figure 6b showed that the actor–critic algorithm struggled to find stable parameter combinations at that depth, a noteworthy finding that contradicted the intuition that “moderately complex models perform best”. The depth-10 tree, despite larger initial weight adjustments, converged rapidly and stably, indicating that deeper trees could form more stable synergies with CNNs, possibly by capturing more complex decision boundaries from the high-level CNN features.
The cumulative reward curves reflected performance changes during training. The depth-3 model showed steady improvement, indicating shallow trees continuously benefited from CNN-extracted features. The depth-5 model’s rewards declined significantly after initial gains, corresponding to the observed weight instability and suggesting medium-depth trees may overfit CNN-extracted features. The depth-10 model maintained high reward levels similar to shallow models, demonstrating deep trees could effectively utilize complex CNN features while avoiding the instability of medium-depth models.
The non-convergence in Figure 6b deserves special attention. In the depth-5 configuration, severe weight fluctuations and extreme values near batch 200 indicated that the actor–critic algorithm struggled to find stable pooling weights. This may result from several factors: a mismatch in decision-boundary complexity, in which depth-5 trees could not adequately express the required boundaries despite being more complex than depth-3 models; gradient instability at this specific depth, which hindered convergence of the actor network; and an exploration–exploitation imbalance that caused performance fluctuations.
This finding has significant practical implications: when designing hybrid CNN-DT models, tree depth selection should avoid the “middle ground”, favoring either shallow (simple but stable) or deep (complex but expressive) configurations. Careful depth selection and validation are crucial for both accuracy and training stability. By selecting an appropriate tree depth (10 in this study) combined with early stopping, we successfully overcame the non-convergence shown in Figure 6b, ultimately achieving a stable, high-performance intrusion detection model.
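The early-stopping rule mentioned above can be as simple as halting the search once the best reward stops improving. The helper below is a minimal sketch with illustrative `patience` and `min_delta` values; the paper does not specify its exact criterion.

```python
# Sketch of a reward-based early-stopping rule: stop the hyperparameter
# search when the best reward has not improved for `patience` steps.
def should_stop(reward_history, patience: int = 20, min_delta: float = 1e-4) -> bool:
    """Return True if the best recent reward no longer improves on the earlier best."""
    if len(reward_history) <= patience:
        return False
    best_recent = max(reward_history[-patience:])
    best_before = max(reward_history[:-patience])
    return best_recent < best_before + min_delta

# Example: a plateauing reward curve triggers the stop condition.
history = [0.90, 0.93, 0.95, 0.97, 0.979] + [0.979] * 25
print(should_stop(history))   # True once the plateau exceeds the patience window
```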
Table 4 shows the effect of different learning rates on the model reward values: the learning rate of 3 × 10⁻⁴ performed the best, not only reaching the highest maximum reward value (0.9923) but also maintaining a near-optimal final reward value (0.9830); the learning rate of 1 × 10⁻⁴ was relatively stable, and although its maximum reward was slightly lower (0.9874), its final reward remained high (0.9787); a learning rate of 3 × 10⁻³ achieved a high maximum reward (0.9896), but its final reward decreased (0.9562); the most significant instability occurred with a learning rate of 1 × 10⁻³, which reached a high maximum reward (0.9896) but whose final reward dropped dramatically to only 0.5189. This suggests that appropriately lowering the learning rate helps the model remain stable, while too high a learning rate may prevent the model from converging to the optimal solution; in particular, a moderate learning rate (1 × 10⁻³) may trigger the most severe performance degradation.
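Procedurally, this comparison amounts to repeating the actor–critic search with each candidate rate and recording the maximum and final reward of each run, as in the sketch below. Here `run_ac_search()` is a dummy stand-in for the full training loop and returns a flat trace, so the printed numbers carry no meaning; only the sweep structure is illustrated.

```python
# Sketch of the learning-rate sweep behind Table 4: one search run per rate,
# keeping the maximum and final reward of each trace.
def run_ac_search(lr: float, steps: int = 500) -> list:
    """Dummy placeholder: would run the actor-critic loop with Adam(lr=lr)
    and return the per-step reward trace."""
    return [0.5] * steps

results = {}
for lr in (1e-4, 3e-4, 1e-3, 3e-3):
    trace = run_ac_search(lr)
    results[lr] = {"max_reward": max(trace), "final_reward": trace[-1]}

for lr, stats in results.items():
    print(f"lr={lr:.0e}  max={stats['max_reward']:.4f}  final={stats['final_reward']:.4f}")
```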
To verify the effectiveness of the hybrid pooling mechanism, we conducted ablation experiments and tested the performance of models using max pooling, average pooling, global average pooling, and hybrid pooling separately. The experimental results are shown in Table 5:
The experimental results showed that although a single pooling method could achieve high performance indicators, the hybrid pooling mechanism achieved better results in accuracy, recall, and F1-score by dynamically adjusting the weights of different pooling methods. Especially in terms of F1-score, hybrid pooling improved by 0.42 percentage points compared to the best single pooling method (max pooling). This confirmed that the hybrid pooling mechanism could more comprehensively capture complex feature patterns in network traffic, improving the model’s ability to detect different types of network attacks.
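A minimal sketch of the hybrid pooling idea is given below, assuming a 1-D feature map and three pooling branches (max, average, global average) combined by softmax-normalized weights. Here the mixture weights are an ordinary learnable parameter for illustration, whereas in our framework they are tuned by the actor–critic search, and the exact layer placement and sizes may differ.

```python
# Sketch of hybrid pooling: max, average, and global-average pooling combined
# with softmax-normalized weights over a 1-D feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridPooling(nn.Module):
    def __init__(self, kernel_size: int = 2):
        super().__init__()
        self.kernel_size = kernel_size
        self.logits = nn.Parameter(torch.zeros(3))   # weights for max / avg / global-avg

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, length)
        w = torch.softmax(self.logits, dim=0)
        p_max = F.max_pool1d(x, self.kernel_size)
        p_avg = F.avg_pool1d(x, self.kernel_size)
        # broadcast the global average over the pooled length
        p_gap = F.adaptive_avg_pool1d(x, 1).expand_as(p_max)
        return w[0] * p_max + w[1] * p_avg + w[2] * p_gap

# Usage: pool a batch of CNN feature maps.
feat = torch.randn(8, 64, 40)
pooled = HybridPooling()(feat)      # shape (8, 64, 20)
```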
To validate the feasibility of using classification accuracy as the reward function in our reinforcement learning framework for network intrusion detection, we provide experimental justification. Accuracy is a direct and computationally efficient metric that aligns with the primary goal of accurately identifying intrusions, offering a clear learning signal for the agent. Comparative experiments, detailed in Table 6, evaluated accuracy, F1-score, and a combined accuracy + F1-score reward function, yielding nearly identical performance (accuracy, recall, and F1-score of 0.9792 for the accuracy-based model), confirming its effectiveness. Furthermore, our approach mitigated class imbalance in intrusion detection datasets by leveraging the CNN-extracted high-quality features, which created balanced, linearly separable representations for decision tree classification, as evidenced by the strong recall and F1-scores (0.9792). The decision tree’s tendency to maximize information gain and balance class importance under depth constraints complemented accuracy as a reward signal, enhancing optimization in the CNN-derived feature space. Thus, the experimental results collectively demonstrated that using accuracy as the reward function was both feasible and effective, simplifying model design while achieving superior detection performance.
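For reference, the three reward formulations compared in Table 6 can be expressed as simple functions of the validation predictions, as in the sketch below; the equal 0.5/0.5 weighting in the combined variant is an illustrative assumption.

```python
# The three candidate reward signals: accuracy, (weighted) F1-score, and a
# weighted combination of the two.
from sklearn.metrics import accuracy_score, f1_score

def reward_accuracy(y_true, y_pred) -> float:
    return accuracy_score(y_true, y_pred)

def reward_f1(y_true, y_pred) -> float:
    return f1_score(y_true, y_pred, average="weighted")

def reward_combined(y_true, y_pred, alpha: float = 0.5) -> float:
    return alpha * reward_accuracy(y_true, y_pred) + (1 - alpha) * reward_f1(y_true, y_pred)

# Toy usage on a handful of predictions.
y_true, y_pred = [0, 1, 1, 0, 1], [0, 1, 0, 0, 1]
print(reward_accuracy(y_true, y_pred), reward_f1(y_true, y_pred), reward_combined(y_true, y_pred))
```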
The experimental results in Table 7 clearly demonstrate the superiority of our proposed CNN-DT hybrid model based on deep reinforcement learning optimization in network intrusion detection tasks. We can analyze them from the following aspects:
Firstly, comparing the deep reinforcement learning optimization methods, all three (actor–critic, DQN, and DDQN) achieved significant performance improvements, with the actor–critic method achieving the best result (accuracy of 0.9792), slightly better than DQN (0.9787) and DDQN (0.9783). This indicated that the actor–critic framework had an advantage in handling continuous action spaces (the pooling layer weights) and could adjust the model parameters more finely; although DQN and DDQN achieved similar results by discretizing the action space, they fell somewhat short in terms of precise optimization.
Secondly, compared to traditional machine learning methods, our approach significantly improved performance. The accuracy of the best traditional method, XGBoost, was 0.9283, which our method exceeded by 5.09 percentage points. Compared to Random Forest (0.9261) and LightGBM (0.9271), the improvements were 5.31 and 5.21 percentage points, respectively, and compared to SVM (0.8108) the improvement reached 16.84 percentage points. This indicated that our hybrid model structure could effectively combine the feature extraction capability of CNNs with the classification advantage of decision trees.
Thirdly, compared to pure deep learning methods, our approach also demonstrated significant advantages. The accuracy of both CNN and Conv-LSTM was 0.9047, which our method exceeded by approximately 7.45 percentage points, and compared to the FT Transformer (0.8052) the improvement reached 17.4 percentage points. This result suggests that a reasonable hybrid model structure may be more effective than a single deep model in specific tasks such as network intrusion detection.
Finally, it is worth noting that among all the traditional and deep learning methods, the ensemble learning methods (XGBoost, Random Forest, LightGBM) performed better overall than the pure deep neural network methods, indicating that feature combination and decision boundary learning may be more important than deep feature extraction in network intrusion detection. Our method successfully overcame the limitations of a single model by combining the feature extraction ability of the CNN with the decision boundary learning ability of the decision tree, using deep reinforcement learning to optimize the weights.
Overall, the experimental results fully validated the effectiveness of our proposed CNN-DT hybrid model architecture and the advantages of using deep reinforcement learning to optimize model hyperparameters. All three deep reinforcement learning methods could effectively improve model performance, providing new directions for intelligent algorithm design in the field of network security.
While the KDD Cup 1999 dataset serves as a classic benchmark for intrusion detection systems, it is admittedly dated and may not fully reflect the characteristics of modern network attacks. To address this limitation and further validate the effectiveness of the proposed CNN-DT model with AC optimization, additional experiments were conducted on two more recent benchmark datasets: CICIDS2017 and UNSW-NB15.
The CICIDS2017 dataset contains modern attack scenarios, including brute force attacks, DoS, DDoS, Web attacks, and insider threats, collected in a realistic network environment. The UNSW-NB15 dataset comprises nine types of modern attacks, providing a more diverse and challenging evaluation scenario. As shown in Table 8, the proposed model demonstrated exceptional performance on these modern datasets, in some cases achieving even higher metrics than on the KDD dataset. The near-perfect performance on the CICIDS2017 dataset, with a remarkable recall of 1.0000, indicated that the model identified virtually all attack instances without missing any. The slightly lower but still strong performance on UNSW-NB15 (accuracy of 0.9491 and Kappa of 0.8851) demonstrated the model’s robustness against more diverse and complex attack patterns.
These results further confirm that the CNN-DT structure with AC optimization maintains its effectiveness when faced with modern attack scenarios, suggesting strong generalization capabilities across different network security contexts and time periods.