Systematic Review

Deep Reinforcement Learning Algorithms for Intrusion Detection: A Bibliometric Analysis and Systematic Review

by Lekhetho Joseph Mpoporo 1,*, Pius Adewale Owolawi 2,* and Chunling Tu 1

1 Department of Computer Systems Engineering, Tshwane University of Technology, Private Bag X680, Pretoria 0001, South Africa
2 Faculty of Information and Communication Technology, Tshwane University of Technology, Private Bag X680, Pretoria 0001, South Africa
* Authors to whom correspondence should be addressed.
Appl. Sci. 2026, 16(2), 1048; https://doi.org/10.3390/app16021048
Submission received: 17 October 2025 / Revised: 29 December 2025 / Accepted: 29 December 2025 / Published: 20 January 2026
(This article belongs to the Special Issue Advances in Cyber Security)

Abstract

Intrusion detection systems (IDSs) are crucial for safeguarding modern digital infrastructure against ever-evolving cyber threats. As cyberattacks become increasingly complex, traditional machine learning (ML) algorithms, while effective at classifying known threats, face limitations such as static learning, dependency on labeled data, and susceptibility to adversarial exploits. Deep reinforcement learning (DRL) has recently emerged as a viable alternative, providing resilience in unanticipated circumstances, dynamic adaptation, and continuous learning. This study conducts a thorough bibliometric analysis and systematic literature review (SLR) of DRL-based intrusion detection systems (DRL-based IDSs). The relevant literature from 2020 to 2024 was identified and investigated using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework. Emerging research themes, influential works, and structural relationships in the field were identified through bibliometric analysis, while the SLR was used to synthesize methodological techniques, datasets, and performance analyses. The results indicate that DRL algorithms such as the deep Q-network (DQN), double DQN (DDQN), dueling DQN (D3QN), policy gradient methods, and actor–critic models have been actively utilized to enhance IDS performance across various applications and datasets. The findings highlight the increasing significance of DRL-based solutions for developing intelligent, robust intrusion detection systems and advancing cybersecurity.

1. Introduction

A system or software that automates intrusion detection processes is referred to as an intrusion detection system (IDS). An IDS plays a pivotal role in modern cybersecurity infrastructure by monitoring and analyzing network traffic to identify unauthorized access, malicious behavior, or policy violations. The cyber intrusion landscape is characterized by rapidly evolving threats and increasing attack sophistication and complexity, driven by the billions of Internet connections made every day worldwide [1]. This growth greatly expands the attack surface, leaving organizations and individuals vulnerable to attacks [2].
Intrusions in the context of cybersecurity refer to any unauthorized acts aimed at compromising confidentiality, integrity, or even the availability of network systems or data. Therefore, it is necessary to safeguard network and system data to protect their integrity and confidentiality. As cyber threats continue to evolve in complexity and scale, IDSs have become crucial in detecting and mitigating cyberattacks in real time [3].
Intrusion detection is vital in the digital world and is used in a wide range of cybersecurity domains, including infrastructure security, network security, endpoint security, information security, vehicular ad hoc networks (VANETs), Internet of Things (IoT) security, cloud security, industrial control systems (ICSs), and application security. Intrusion detection can be categorized into three groups:
(1) Anomaly-based detection, which is able to detect deviations from normal behavior, making it suitable for novel attack detection.
(2) Signature-based detection, which identifies known threats through predefined patterns or rules.
(3) Hybrid detection, which combines both signature and anomaly-based approaches for improved accuracy.
Detection of malicious and suspicious traffic, specifically on devices connected to a network, is grouped under network intrusion detection (NID) [4]. Detecting attacks on the perimeter of critical infrastructure falls under perimeter intrusion detection (PID) [5], while identifying anomalous behavior on a host machine falls under host-based intrusion detection (HID).
IDSs play an essential role in the modern cybersecurity environment by monitoring and analyzing networks, information systems, and traffic to identify unauthorized access, policy violations, or malicious activities. They assist in the discovery, identification, and determination of unauthorized use, misuse, alteration, and destruction of networks and information systems [6].
The significance of improving IDSs is paramount. Enhanced intrusion detection not only safeguards sensitive information and critical infrastructure but also strengthens trust in the digital environment. In sectors such as banking, healthcare, education, engineering, the military, and industrial control systems, the ability to promptly detect intrusions can prevent devastating breaches and financial loss. For instance, the timely detection of a distributed denial-of-service (DDoS) attack or insider threat can mitigate systemic damage and ensure business continuity.
Various models have been used to enhance intrusion detection. Statistical models are among the oldest techniques used in intrusion detection, where statistical characteristics of network traffic or the environment are used to identify anomalies or malicious activities. They are typically used in anomaly-based IDS [7]. Machine learning (ML) models have played an important role in improving the performance of IDS by enabling the automated analysis and classification of network traffic patterns. ML models use historical data to identify malicious activities and have been successful to some extent. ML approaches include supervised learning using labeled datasets, unsupervised learning using unlabeled data, and semi-supervised learning using a small portion of labeled data and unlabeled samples in datasets to detect intrusions. Despite their strengths, ML techniques frequently require considerable feature engineering and struggle to cope with dynamic threats [8].
Reinforcement learning (RL) has become a popular technique for intrusion detection and classification of attacks using automated agents. These agents can learn, adapt, and respond to the dynamics of attacks launched in their environment. However, most proposed models suffer from high time consumption and false positives when scanning network traffic because of their inability to handle large datasets and their low intrusion feature detection capability. In recent years, deep reinforcement learning (DRL) algorithms have been proposed [9], which combine deep learning (DL) with reinforcement learning (RL). DRL leverages the learning power of DL with RL’s decision-making agent to optimize decision-making, making it highly suitable for adaptive intrusion detection. DRL algorithms perform well in intrusion detection for huge datasets and various applications, such as medical applications, computer vision, robotics, and video games.
Several DRL algorithms have been developed, including the deep Q-network (DQN) [10,11], double DQN (DDQN) [12], dueling DQN (D3QN) [12], policy gradient, actor–critic [13,14], and multi-agent reinforcement learning (MARL) [12,15]. The importance of DRL in an IDS emanates from its ability to adapt, scale, and be resilient against adversarial attacks. DRL algorithms are able to continuously advance and adapt to new threats, making them a suitable technique in the ever-changing landscape of cybersecurity [16].

2. Motivation of This Study

In this section, we examine the foundational and emerging research on intrusion detection systems and applications, concentrating on DRL approaches.
Earlier studies utilized various ML and deep learning techniques to enhance the performance of IDSs. A study reported in [17] used ML to detect intrusions at the network level for industrial control systems. An experiment conducted in [18] combined the genetic algorithm with a support vector machine (SVM) for detecting intrusions in secure cloud data. The experiment was performed using the CICIDS2017 dataset, and the model demonstrated a 5.74% improvement over existing models. Nevertheless, further experiments still need to be conducted using larger datasets.
DRL has gained momentum owing to its ability to learn policies that adapt to changing attack strategies. The authors of [19] used a DQN for intrusion detection, which is well recognized. Shen et al. [20] utilized a DQN-based model to address intrusion detection on an edge-based Social Internet of Things (S-IoT) zero-day attack. In contrast, [21] applied a DQN for the dynamic detection of network intrusions in real time, showing improved detection rates over traditional techniques.
Double DQN (DDQN) and dueling DQN (D3QN) variants have been used to mitigate the limitations of DQN, such as overestimation of action values. The research conducted in [22,23] indicated that these models can improve stability and detection accuracy in adversarial environments.
Moreover, the policy gradient [24] and actor–critic [13] methods have been introduced for continuous control scenarios. These techniques perform better in situations that require complex decision-making in the face of uncertainty. Multi-agent reinforcement learning (MARL) [25] is another avenue that presents a significant extension to the distributed IDS in big data networks.
Despite these advancements, gaps remain in terms of real-time scalability and robustness to adversarial scenarios across domains. Therefore, a systematic review is crucial for gathering findings, comparing approaches, and assessing potential avenues for future studies.
Nassif et al. [26] focused on DRL for anomaly detection, considering resources published from 2017 to 2022. However, the focus of their study was not on intrusion detection in cybersecurity settings. Consequently, neither that study nor the review by Arshad et al. [27] explicitly addresses DRL for intrusion detection in the field of cybersecurity.
This study aims to bridge this gap by conducting a focused bibliometric analysis and systematic review of DRL for intrusion detection in the cybersecurity context. In contrast to prior reviews that broadly address ML-based anomaly detection [26] or DRL for anomaly detection across heterogeneous domains [27], our work makes the following specific contributions:
a. We provide an up-to-date, DRL-centric evidence base by systematically reviewing studies published between 2020 and 2024 that explicitly apply deep reinforcement learning to intrusion detection problems across network, IoT, cloud, industrial, and cyber–physical environments.
b. We integrate performance-based SLR synthesis with science-mapping bibliometric techniques (co-authorship, co-citation, keyword co-occurrence, and bibliographic coupling) to reveal structural relationships between DRL-based IDS research themes, influential authors, and venues.
c. We present a structured comparison of DRL algorithms, datasets, and evaluation metrics used in IDS applications, highlighting how specific algorithm–dataset combinations affect detection performance and robustness.
d. We identify persistent methodological and practical gaps—such as limited real-time deployment, adversarial robustness, and standardization of evaluation protocols—and translate these into concrete directions for future DRL-based IDS research and deployment.
The unique advantage of the integrated (bibliometric and SLR) approach is that it will provide a holistic and longitudinal understanding of DRL-based IDS research, offering actionable insights for researchers and practitioners that extend beyond the scope of prior narrative or single-method reviews.

3. Methodology

This study utilized a hybrid methodology consisting of bibliometric analysis and an SLR based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework [28]. This study has been registered with OSF and is accessible at https://osf.io/59pvg (accessed 21 December 2025).

3.1. Bibliometric Analysis

This study integrated bibliometric analysis with systematic synthesis. The systematic literature review (SLR) followed the PRISMA framework. Records were retrieved by searching the title, abstract, and keywords with the search terms “intrusion detection” and “deep reinforcement learning,” combined with the “AND” operator to refine the results. The search was limited to 2020–2024, with both boundary years included, because this paper focuses on recent research.
For the bibliometric analysis, we extracted standard metadata fields (title, abstract, author keywords, index keywords, authors, affiliations, sources, citation counts, and references). We prioritized the following indicators:
i. Annual publication trends;
ii. Source-level citation counts;
iii. Co-authorship networks;
iv. Keyword and index-keyword co-occurrence;
v. Bibliographic coupling and co-citation relationships among documents and sources.
We used VOSviewer version 1.6.20 for data network visualization and gained macro-level insights into the research area.
Network visualizations were generated in VOSviewer using the fractional counting method. For keyword and index-keyword co-occurrence maps, we set the minimum occurrence threshold to five (5), which balances noise reduction with topic coverage. For co-authorship and co-citation analyses, the minimum number of documents (co-authorship) and citations (co-citation) was set to one (1) and twenty (20), respectively, and only nodes above these thresholds were visualized. Choosing 1 as the minimum value ensures the inclusion of authors with at least 1 publication to enhance network completeness and inclusiveness and ensure accurate mapping of collaborative dynamics. Moreover, setting a minimum of 20 co-citations filters out marginally cited works and ensures that only influential studies are included. This improves the clarity of the visualization and allows meaningful interpretation of thematic clusters. Co-citation analysis aims to uncover shared theoretical foundations and dominant methodological influences. Highly co-cited references are more likely to represent seminal algorithms, benchmark IDS methodologies, or foundational cybersecurity frameworks. Edge weights were based on total link strength as computed by VOSviewer, and layout parameters were left at their default settings to allow reproducibility.

3.2. Systematic Literature Review (SLR)

As indicated earlier, SLR was conducted in accordance with the guidelines outlined in the PRISMA framework. This systematic review was conducted using a comprehensive method to collect, analyze, and summarize the available research in the area of deep reinforcement learning for intrusion detection in cybersecurity.

3.2.1. Definition of Research Questions

The research questions for this study were developed on the basis of the research focus and aim. They follow the PICOS framework. This is discussed below.
  • RQ1: What types of deep reinforcement learning techniques have been used to address intrusion detection?
    RQ1 aims to discuss the DRL techniques used to address intrusion detection. Current IDS applications using DRL models were analyzed.
    • Population (P): IDS, network datasets;
    • Intervention (I): DRL algorithms applied to IDS;
    • Comparison (C): ML/DL baselines and RL variants;
    • Outcomes (O): Types and characteristics of DRL techniques;
    • Study Design (S): Experimental studies, bibliometric, and systematic review.
  • RQ2: What is the overall performance of the reviewed DRL algorithms and the current limitations in applying them in dynamic and adversarial environments?
    RQ2 presents the performance of the model. Performance evaluation methods were used, focusing on the dataset used, the performance metrics utilized, the accuracy value, and limitations.
    • Population (P): IDS in a dynamic/adversarial setting;
    • Intervention (I): Performance evaluation of DRL algorithms;
    • Comparison (C): Baseline models, alternative DRL methods;
    • Outcomes (O): Performance metrics and limitations;
    • Study Design (S): Experimental/comparative studies, systematic review.

3.2.2. Information Sources and Research Criteria

Resources were collected from online database libraries using the aforementioned search terms and the manual search through reading references. To ensure a comprehensive set of resource findings, we used an electronic search approach across several online databases, including IEEE Xplore, Elsevier, ACM Digital Library, Springer, MDPI, and Google Scholar.

3.2.3. Eligibility Criteria

To conduct an SLR on deep reinforcement learning for intrusion detection, the initial eligibility criterion for inclusion was that a document's title and basic metadata pertain to these concepts. The inclusion and exclusion criteria for extracted resources are described below.
The inclusion criteria for the resources used in this SLR were as follows:
i. Papers written in English;
ii. Conference papers, journal papers, and book chapters;
iii. Articles that applied deep reinforcement learning to intrusion detection;
iv. Research published from 2020 to 2024.
The exclusion criteria for filtering out resources were as follows:
i. Papers written in a language other than English;
ii. Papers that apply deep reinforcement learning to areas other than intrusion detection, as well as those that use machine learning techniques other than deep reinforcement learning for intrusion detection.

3.2.4. Resource Selection

For the SLR, records retrieved from each database were first screened using database filters (publication years 2020–2024 and document type, such as journal articles and conference papers) to restrict retrieval to the scope of the work. Separate searches were performed in each database using the defined search string. Titles and abstracts were then screened to exclude clearly irrelevant records, followed by full-text screening of the remaining articles to ensure alignment with the research questions. The overall selection process followed the PRISMA guidelines [28,29] as summarized below.
i. Remove duplicate records from the combined database search results.
ii. Apply the inclusion and exclusion criteria to titles and abstracts to retain only potentially relevant articles.
iii. Conduct full-text screening of the remaining articles to confirm their relevance to DRL-based intrusion detection.
iv. Apply the quality assessment so that only documents that address the research questions are considered.
v. Screen the reference lists of the included articles (backward snowballing) to identify additional relevant studies, and, where applicable, subject these additional records to the same screening and quality assessment steps (Steps i–v).

3.2.5. Quality Assessment Criteria

To ensure that only methodologically sound studies contributed to the synthesis, we conducted a structured quality assessment of all candidate resources that passed the inclusion/exclusion screening. Each study was evaluated against ten (10) criteria (Q):
Q1. Does the paper clearly define its research objectives?
Q2. Are the deep reinforcement learning techniques described in sufficient detail to understand how the study was conducted?
Q3. Does the study clearly define the specific application of intrusion detection?
Q4. Does the study clearly describe the dataset used in the experiment?
Q5. Does the paper cover practical experiments in the proposed algorithm?
Q6. Are the key DRL components clearly defined and technically sound?
Q7. Does the study present results using performance metrics and demonstrate that the findings support the conclusions?
Q8. Does the paper identify and discuss limitations in its experimental design?
Q9. Does the paper provide enough technical detail (algorithms, parameters, or code) to allow replication of the study?
Q10. Is the proposed methodology compared with other methods?
Each criterion was scored on a three-point scale (0 = not addressed, 1 = partially addressed, 2 = fully addressed), giving a maximum total of 20 points. Papers that scored 10 or more were included, while studies scoring below 10 were excluded from the final synthesis. We chose 10 as the cut-off because it is half of the maximum score. The assessment results are shown in Table 1. One author performed the initial scoring, and the other two authors independently validated the scores. Disagreements were resolved through discussion until consensus was reached.
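The scoring and inclusion rule described above can be expressed as a minimal sketch. The function name `assess_study` and the example score vectors are hypothetical and used only to illustrate the arithmetic of the threshold rule:

```python
# Minimal sketch of the quality-assessment scoring described above.
# Each study receives a score of 0 (not addressed), 1 (partially
# addressed), or 2 (fully addressed) on criteria Q1-Q10; studies
# scoring at least 10 of the maximum 20 points are retained.

def assess_study(scores: list[int], threshold: int = 10) -> tuple[int, bool]:
    """Return (total score, include?) for one study's Q1-Q10 scores."""
    assert len(scores) == 10 and all(s in (0, 1, 2) for s in scores)
    total = sum(scores)
    return total, total >= threshold

# Example: a study fully addressing six criteria and ignoring four
# scores 12/20 and is included in the synthesis.
total, include = assess_study([2, 2, 2, 2, 2, 2, 0, 0, 0, 0])
print(total, include)  # 12 True
```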

3.2.6. Data Extraction Process

Our goal in this step was to examine the final list of resources and extract the metadata essential for addressing the research questions. We aimed to outline DRL approaches for intrusion detection and review their applications. We also reviewed the performance metrics, the datasets used in the intrusion detection models, and the effectiveness of the methods concerned. The following metadata were extracted from the selected resources: document title, publication type, year of publication, DRL algorithm or algorithms, dataset, and performance metrics.

3.2.7. Synthesis of Data Items

To conduct this SLR, we gathered and synthesized data items from the selected publications, including document title, model, application, dataset, evaluation metrics, and document type, to ensure a comprehensive approach to addressing the RQs.

4. Results and Discussion

This section presents a bibliometric analysis and a systematic literature review. Dominant research themes, important contributors, and publication trends were identified through bibliometric analysis. DRL-based IDS performance measures, datasets, and methodology were combined in the systematic review.

4.1. Bibliometric Results

A total of 227 documents were found, with 1 document excluded because it was not written in English. After deleting duplicates, 188 documents were included in the bibliometric analysis.
Co-occurrence analysis is a popular technique used to analyze bibliometric data that involves identifying and visualizing the relationships between all keywords, authors, and index keywords.
Figure 1 shows a network visualization of all-keyword co-occurrences. The minimum number of keyword occurrences was set to five. The keywords “reinforcement learning,” “intrusion detection,” “deep learning,” “reinforcement learnings,” “deep reinforcement learning,” “network security,” and “intrusion detection systems” have the highest co-occurrence counts, with 162, 136, 139, 118, 128, 103, and 69 occurrences, respectively. These keywords represent the main research foci of the collected studies, with total link strengths of 1604, 1437, 1354, 1275, 1197, 1107, and 768, respectively.
A network visualization of the index-keyword co-occurrences is shown in Figure 2. The keywords “reinforcement learning,” “intrusion detection,” “deep learning,” “reinforcement learnings,” “network security,” and “deep reinforcement learning” have the highest co-occurrence counts, with 160, 129, 137, 118, 97, and 100 occurrences, respectively. These keywords have total link strengths of 1480, 1297, 1248, 1198, and 1007.
Figure 3 shows the co-authorship network of authors, generated using a minimum of one document per author. In this network, Jin-hee Cho has 18 co-authorship links, indicating extensive collaboration with other researchers in the field. The presence of several interconnected clusters suggests that DRL-based IDS research is organized around multiple collaborative groups rather than isolated individual efforts.
The density visualization of bibliographic coupling by document is shown in Figure 4. The analysis provides insight into some influential works. The studies conducted by Nguyen and Reddi [30] and Sethi [31] are some of the most influential works in the area, with total link strengths of 217 and 138 and total citations of 209 and 103, respectively.
Figure 5 presents the network visualization of the co-citation structure with respect to cited authors. The network was constructed using a minimum of 20 citations per author; consequently, the number of authors was filtered to 95.
The clusters indicate a strong intellectual connection between authors in this research area. The authors' works are similar, as the cluster connections between them are strong and follow a similar pattern. Total link strengths range from 152 to 3185.
Figure 6 shows a density visualization of co-citations with respect to sources. The top co-cited sources are IEEE Access, Sensors, IEEE Internet of Things Journal, and Nature. IEEE Access has a total link strength of 1494 and 266 citations.

4.2. Systematic Review Results

This section presents the outcomes of the systematic review. Each research question is investigated and addressed in detail in the following subsections.
Applying the PRISMA-based selection process yielded 89 records after initial database retrieval and preliminary filtering. Following removal of duplicates and title and abstract screening, a reduced set of full-text articles was assessed for eligibility. After applying the inclusion and exclusion criteria and the quality assessment, a total of 36 studies were finally selected for the systematic review. The identification, screening, eligibility, and inclusion stages are summarized in the PRISMA flow diagram shown in Figure 7.
To address the research questions, the selected resources used in this study were extracted and organized, as shown in Table 2, listing their title, source, paper type, and assigned paper ID. Paper IDs were used for identification in the SLR.

4.2.1. Applications of Intrusion Detection Systems (RQ1)

This section discusses part of RQ1, which seeks to assess DRL-based intrusion detection applications that have been identified in the literature. Among the many IDS applications reported in the selected studies, thirteen (13) distinct DRL-based IDS application domains were identified. Table 3 summarizes these application domains and indicates the number of studies in which each application was investigated.
Figure 8 presents the percentage distribution of DRL-based IDS applications. Network intrusion detection is the most common application in the literature (42%), followed by IoT networks (20%). Adversarial attacks and network anomaly intrusion detection constitute 7% each, whereas general intrusion detection and cloud network intrusion detection constitute 5% each. Intelligent outlier detection, industrial control systems, software-defined networking (SDN), fog-to-cloud computing, vehicular ad hoc networks (VANETs), SCADA networks, and cyberattacks are among the least studied DRL-based applications, each representing 2% of DRL-based IDS applications.

4.2.2. DRL Algorithms Utilized (RQ1)

This section addresses the remaining part of RQ1, in which the DRL techniques used for intrusion detection in the selected studies were assessed. Table 4 presents the list of DRL algorithms used in the documents included in the applications of intrusion detection in the cybersecurity space.
Deep Q-networks (DQN), double deep Q-networks (DDQN), proximal policy optimization 2 (PPO2), actor–critic, convolutional neural networks (CNNs), deep neural networks (DNNs), and multi-agent reinforcement learning (MARL) are the most commonly used deep reinforcement learning algorithms in intrusion detection systems. The following subsections discuss popular deep reinforcement learning techniques used in intrusion detection systems.
Actor–Critic (AC) Algorithms
An actor network is a policy gradient method in which the actor makes decisions according to the current policy and immediately evaluates the performance gradient based on the actor parameter. The actor parameters are updated for improvement [64].
The critic network is a value function method that relies on value function approximation, which improves the learning process of the actor by learning the approximate solution to the Bellman equation [65], which should thereafter suggest a near-optimal policy.
The actor–critic algorithm addresses the limitations of actor-only and critic-only methods by combining the strengths of both. The critic criticizes the actor's actions by learning a value function through simulation and approximation techniques; this value function is then used to adjust the actor's policy parameters in the direction of performance improvement [66].
Because actor–critic is both a policy-based and a value-based method, the optimal policy is computed by directly modifying the policy, whereas the value function implicitly determines the optimal policy by approximating the optimal value function [67]. Studies show that both networks often use deep neural networks (DNNs) [23].
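The interplay described above can be sketched with a minimal tabular one-step actor–critic on a hypothetical two-state, two-action MDP (not any specific cited IDS model; the toy dynamics, learning rates, and seed are assumptions for illustration). The critic learns V(s) by TD(0), and the actor adjusts softmax action preferences in the direction suggested by the TD error:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
theta = np.zeros((n_states, n_actions))  # actor: action preferences
V = np.zeros(n_states)                   # critic: state-value estimates
alpha_actor, alpha_critic, gamma = 0.1, 0.2, 0.9

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def step(s, a):
    # Hypothetical dynamics: action 0 is "correct" in state 0, action 1 in state 1.
    reward = 1.0 if a == s else 0.0
    return int(rng.integers(n_states)), reward

s = 0
for _ in range(2000):
    pi = softmax(theta[s])
    a = int(rng.choice(n_actions, p=pi))
    s_next, r = step(s, a)
    td_error = r + gamma * V[s_next] - V[s]   # critic's evaluation of the action
    V[s] += alpha_critic * td_error           # critic update (TD(0))
    grad = -pi
    grad[a] += 1.0                            # grad of log pi(a|s) w.r.t. theta[s]
    theta[s] += alpha_actor * td_error * grad # actor update guided by the critic
    s = s_next

# After training, the actor prefers the rewarding action in each state.
print(softmax(theta[0]), softmax(theta[1]))
```

The same division of labor (critic evaluates, actor adjusts) carries over to the DNN-based actor–critic IDS models surveyed, with the tables replaced by networks.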
Proximal Policy Optimization 2 (PPO2)
Proximal policy optimization (PPO) is a family of policy gradient methods for training an intelligent policy agent network in DRL. The method uses constraints to manage the divergence between the old and new policies during an update so that the difference does not become too large. Keeping policy updates neither too small nor too large produces optimal performance [68]. Studies have shown that this method performs best in dynamic environments in which data are continuously changing, as the algorithm exploits small-scale data changes across many interactive processes [37]. This method has been used to identify anomalies in intrusion detection applications.
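The constraint mechanism can be illustrated with PPO's standard clipped surrogate objective (the function name and example numbers below are illustrative, not taken from the reviewed studies). The probability ratio between the new and old policies is clipped to [1 − ε, 1 + ε], so a single update cannot move the policy too far:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)

ratios = np.array([0.5, 1.0, 1.5])      # pi_new(a|s) / pi_old(a|s)
advantages = np.array([1.0, 1.0, 1.0])  # positive advantage estimates
print(ppo_clip_objective(ratios, advantages))  # [0.5 1.  1.2]
```

With a positive advantage, the ratio of 1.5 is capped at 1.2, illustrating how the update's benefit from moving far past the old policy is bounded.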
Deep Q-Network, Double Deep Q-Network (DDQN), and Dueling Double Deep Q-Network (D3DQN)
Q-learning is one of the most effective forms of reinforcement learning (RL). When a function approximator, such as a neural network (NN) or a deep neural network (DNN), is used as the Q-function to estimate action values, the approach is referred to as deep Q-learning (DQL).
The Q-function takes the model's state, action, and reward value as parameters. To estimate and predict the desired Q-values, a DNN is employed as a deep Q-network [55].
The DNN was integrated to enhance learning in complex environments with high-dimensional state spaces, such as video games and automation. Nevertheless, DQN has several limitations that DDQN [69] seeks to address.
DDQN mitigates the overestimation bias observed with DQN by decoupling action selection and action evaluation mechanisms. The separation processes enhance the accuracy of the Q-value estimates and improve performance in intrusion detection applications. Whereas DQN uses a single network, DDQN utilizes two networks: an offline target Q-network for target value calculation and an online Q-network for action prediction.
The dueling architecture is similar to the DQN; however, it incorporates an advantage function to determine Q-values, which indicates an action's quality in a given state. The advantage function removes the need to calculate all action Q-values to assess a state's value [23]. The D3DQN is created by integrating this dueling architecture with the DDQN, combining the benefits of reduced overestimation bias and efficient learning of state values [70].
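The dueling architecture's aggregation step, which recombines the separate value and advantage streams into Q-values [70], can be sketched in a few lines of Python (the numeric inputs in the test are illustrative):

```python
def dueling_q_values(state_value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean(A).
    Subtracting the mean advantage keeps V and A identifiable, so the
    network can learn how good a state is without scoring every action."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]
```

For instance, a state value of 2.0 with advantages [1.0, -1.0] yields Q-values [3.0, 1.0]: the actions are ranked by advantage while their mean stays anchored at the state value.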
Policy Gradient (PG)
PG is a type of reinforcement learning technique in which the policy is explicitly parameterized and optimized. Unlike value-based approaches (such as Q-learning), PG methods seek the best course of action by adjusting the policy parameters to maximize the expected cumulative reward [71].
Instead of estimating a value or action-value function, PG techniques optimize the policy directly, without intermediate steps. They are therefore well-suited to scenarios with continuous action spaces, such as automated cybersecurity responses.
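The simplest member of this family is REINFORCE, sketched below in Python on an invented two-armed bandit (the reward values, learning rate, and episode count are illustrative assumptions). Note that no value function appears anywhere; the softmax policy parameters are moved directly along the log-likelihood gradient scaled by the reward.

```python
import math
import random

def reinforce_bandit(rewards=(0.0, 1.0), episodes=3000, lr=0.05, seed=1):
    """REINFORCE on a two-armed bandit: adjust softmax policy
    parameters along grad log pi(action) * reward, with no value
    function estimated at any point."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(episodes):
        m = max(theta)
        exps = [math.exp(t - m) for t in theta]
        probs = [e / sum(exps) for e in exps]
        action = 0 if rng.random() < probs[0] else 1
        reward = rewards[action]
        for i in range(2):
            indicator = 1.0 if i == action else 0.0
            # grad of log pi(action) w.r.t. theta[i] is (indicator - probs[i])
            theta[i] += lr * reward * (indicator - probs[i])
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    return [e / sum(exps) for e in exps]

policy = reinforce_bandit()
```

After training, the policy assigns nearly all probability to the rewarding arm, purely through direct parameter adjustment.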
Multi-Agent Reinforcement Learning (MARL)
In the cybersecurity context, MARL is an extension of traditional RL, where multiple network entities learn their optimal policy through interaction or observation with other entities in a shared environment, either collaboratively, competitively, or independently. Each agent or network entity learns from the environment and other entities, performs actions, and receives feedback (rewards) [72].
MARL is distinguished by its dynamic and adaptive capabilities, which enhance cybersecurity defenses by enabling intelligent systems that continuously learn and adapt to evolving threats, and it can significantly improve the learning efficiency of network entities [73]. Unlike single-agent RL, the presence of other agents makes the environment non-stationary, which matches the adversarial nature of cybersecurity environments. Several authors have adopted this algorithm to solve issues related to network intrusion detection and threat hunting.
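A minimal Python sketch of the simplest MARL setup, independent Q-learners in a shared environment, is shown below. The two-agent "coordinated defense" payoff matrix is invented for illustration and is not drawn from any reviewed study; its point is that each agent's best action depends on what the other agent learns.

```python
import random

def marl_coordination(episodes=5000, lr=0.1, eps=0.1, seed=0):
    """Two independent Q-learners sharing one environment. Assumed
    payoffs: both agents defending (action 1) earns 1.0, a single
    defender earns 0.3, neither earns 0.0."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # one stateless Q-table per agent

    def payoff(a, b):
        return 1.0 if (a and b) else (0.3 if (a or b) else 0.0)

    for _ in range(episodes):
        actions = []
        for i in range(2):  # each agent acts epsilon-greedily on its own
            if rng.random() < eps:
                actions.append(rng.randrange(2))
            else:
                actions.append(0 if q[i][0] > q[i][1] else 1)
        reward = payoff(actions[0], actions[1])  # shared feedback
        for i in range(2):  # independent updates, no central controller
            q[i][actions[i]] += lr * (reward - q[i][actions[i]])
    return q

q_tables = marl_coordination()
```

Both agents converge on the cooperative defense action even though neither observes the other's Q-table, only the joint reward, which is the core mechanism the reviewed MARL-based IDS studies exploit at larger scale.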
Deep Neural Network (DNN)
A DNN consists of multiple layers between the input and output layers. These networks can model complex nonlinear relationships in data. DNNs leverage multiple hidden layers and can learn features from raw input data without manual feature engineering, which is an advantage in dynamic cybersecurity applications such as intrusion detection. The authors in [74] observed that DNNs work well in IoT security because, although IoT devices constantly produce massive amounts of data, the data patterns are largely regular; hence, a change in pattern indicates a potential intrusion.
Adversarial Reinforcement Learning (ARL)
In adversarial reinforcement learning, an agent is trained to achieve optimal performance in the presence of adversaries that may perturb the learning process, the observations, or the working environment itself. Unlike standard RL, where an agent interacts with an environment to maximize cumulative rewards, in ARL the agent learns in adversarial settings in which the learning environment is manipulated to compromise its policy learning [74,75].
In IDS applications, this involves simulating both attackers and a defender, with the attacker attempting to manipulate the model into producing false outputs. Defender agents maintain a secure state by detecting threats, while adversary agents continually attempt to evade detection. Consequently, the model becomes dynamic and adaptive to novel threats: the adversarial agent keeps introducing network attacks, and the RL agent learns to detect the intrusions.
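The attacker/defender dynamic can be caricatured in a few lines of Python. This is an illustrative toy loop, not a published ARL algorithm: the attacker always launches the attack type the defender currently detects worst, and the defender then improves its detection estimate for whatever it just observed.

```python
def adversarial_training(rounds=400, lr=0.2):
    """Toy adversarial loop: an attacker probes the defender's weakest
    detection capability each round; the defender adapts toward a
    detection strength of 1.0 for the attack it just observed."""
    detect = [0.0, 0.0]  # defender's detection strength per attack type
    for _ in range(rounds):
        attack = 0 if detect[0] <= detect[1] else 1  # probe the weakness
        defended = detect[attack]
        detect[attack] = defended + lr * (1.0 - defended)  # defender adapts
    return detect

strengths = adversarial_training()
```

Because the adversary always targets the weakest point, the defender's detection strengths equalize as they rise, which is exactly the robustness pressure ARL is meant to create.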

4.3. DRL Datasets Used and Performance Analysis of Deep Reinforcement Learning Algorithms (RQ2)

In this section, we address RQ2, which concentrates on the datasets used in the experiments of the reviewed resources and on the performance of the proposed DRL models. Table 5 summarizes model performance, listing the paper ID for document identification, the proposed DRL algorithm, the intrusion detection application, the dataset used, the performance analysis criteria, and the reported performance value or comparison. It is worth mentioning that most of the evaluated resources used statistical performance analysis and provided values to support the performance of the models; the remaining resources evaluated performance only relative to reference models.
As shown in both Table 5 and Figure 9, the NSL-KDD dataset constitutes 28.57% of the datasets utilized in the selected resources, making it the most widely used. NSL-KDD is publicly accessible and is mostly used for evaluating or simulating cybersecurity tasks, such as intrusion detection, anomaly detection, and denial-of-service detection, with machine learning, deep learning, and reinforcement learning algorithms.
Among the selected resources, the studies that used the NSL-KDD dataset include SS9, SS13, SS14, SS19, SS22, and SS25. The accuracies obtained are 79%, 96.09%, 91.4%, and 99.31% for SS9, SS13, SS14, and SS19, respectively. SS22 and SS25 were compared with reference models, and both outperformed them with improved accuracy. The corresponding applications are network intrusion detection, network intrusion, anomaly network intrusion detection systems, network traffic, intrusion detection systems, and wireless sensor networks (WSNs) for SS9, SS13, SS14, SS19, SS22, and SS25, respectively.
UNSW-NB15 accounts for 10.71% of all the datasets utilized in the selected sources. The UNSW-NB15 dataset is a modern and comprehensive network traffic dataset designed to overcome the limitations of older datasets such as KDD99 and NSL-KDD by including a broader range of modern attack vectors and realistic traffic patterns. It has been widely adopted in many studies, including SS23, which used the DQN algorithm for an IDS on cloud infrastructure and achieved an accuracy of 83.8% with a false positive rate (FPR) of 2.6%. SS7 used the proximal policy optimization 2 (PPO2) technique for network anomaly behavior intrusion detection and achieved higher accuracy than alternative models.
AWID and KDDTest+ each account for 5.36% of the datasets. AWID is mostly used for intrusion detection in wireless applications, such as IoT. KDDTest+ is used more frequently with machine learning models. The study in SS16 utilized a stacked autoencoder–soft actor–critic (SA-AC) with AWID for network intrusion detection and achieved an accuracy, precision, and recall of 0.9898, 0.9896, and 0.9898, respectively. The same technique applied to KDDTest+ in the same paper achieved an accuracy, precision, and recall of 0.8415, 0.8427, and 0.8415, respectively, indicating that the dataset used affects model performance. Actor–critic models and policy gradient methods are preferred in environments that require continuous control, while DQN remains the most widely applied algorithm in IDS. Examining the results obtained in SS10 and SS12, where DDQN was utilized with different types of data, shows that DDQN offered improved accuracy in adversarial and dynamic contexts.
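For reference, the metrics reported across these studies derive from a binary (attack/normal) confusion matrix. The sketch below shows the standard definitions in Python; the numeric inputs in the test are illustrative, not values from any reviewed paper.

```python
def ids_metrics(tp, fp, fn, tn):
    """Compute the evaluation metrics most often reported by the
    reviewed IDS studies from a binary confusion matrix:
    tp/fp/fn/tn = true/false positives and negatives."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called the detection rate
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)               # false positive rate
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "fpr": fpr}
```

Reporting FPR alongside accuracy, as SS23 does, matters because a high-accuracy IDS can still generate an operationally unacceptable volume of false alarms on imbalanced traffic.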

4.4. Research Gaps and Implications for Future DRL-Based IDS

The combined bibliometric and SLR findings highlight several cross-cutting trends and unresolved challenges. First, DQN and its variants (DDQN, D3QN) dominate current DRL-IDS research, especially when applied to legacy benchmark datasets such as NSL-KDD and UNSW-NB15 (Table 4 and Table 5). While these studies report high accuracy in static settings, they provide limited evidence of performance under realistic, high-throughput traffic or adversarial perturbations. This imbalance suggests a methodological gap between algorithmic innovation and deployment-oriented evaluation.
Second, a strong dependence on a small set of public datasets raises concerns about overfitting specific traffic profiles and attack taxonomies. Very few studies incorporate streaming data, concept drift, or continuous learning setups, even though these are central to the motivation for using DRL. Moreover, evaluation metrics are often restricted to aggregate classification measures (accuracy, precision, recall, and F1-score), with less attention paid to latency, energy consumption, or resource usage in edge and IoT deployments.
Third, the bibliometric maps reveal emerging but still fragmented communities around MARL, adversarial reinforcement learning, and federated DRL (for example, SS13, SS16, SS22, and SS27). Although these approaches are well aligned with distributed and adversarial cybersecurity scenarios, the number of empirical studies and cross-domain replications remains limited. There is also little standardization of reward design and state representations across application domains, which hinders comparability.
Taken together, these observations imply that future DRL-based IDS research should (i) prioritize evaluation on more diverse and up-to-date datasets, including real or semi-synthetic traffic; (ii) incorporate deployment-oriented metrics such as latency, resource utilization, and robustness under adversarial manipulation; (iii) explore MARL, federated DRL, and ARL in multi-domain settings with standardized experimental protocols; and (iv) develop shared benchmarks and open implementations to facilitate reproducibility and fair comparisons across studies.

5. Conclusions

This study investigated the use of DRL-based IDS by integrating bibliometric analysis with a systematic literature review. The bibliometric analysis demonstrated the growing significance of DRL approaches in cybersecurity by providing a macro-level overview of research trends, influential publications, collaboration patterns, and core thematic clusters (deep learning, anomaly detection, and network security).
Complementing this, the SLR synthesized empirical findings from 36 selected studies, offering a structured comparison of DRL algorithms (for example, DQN, DDQN, D3QN, policy gradient, actor–critic, and MARL), datasets (for example, NSL-KDD, UNSW-NB15, CIC-IDS2017/2018, and TON-IoT), and performance metrics. By linking algorithmic choices to datasets and reported performance, the review provides practitioners and researchers with concrete guidance on which DRL configurations are most promising for specific IDS scenarios.
Nevertheless, several opportunities remain for future work. Our synthesis shows that research is still concentrated on a narrow set of DQN-based architectures and legacy datasets, with limited evaluation of real-time performance, adversarial robustness, and resource constraints. Expanding the scope to cover additional years, digital libraries, real-world traffic traces, and gray literature would help capture emerging DRL designs and reduce publication bias. Furthermore, standardized performance metrics and reproducible experimental setups—especially for MARL, federated DRL, and adversarial settings—are essential enablers of fair comparison across studies and of reliable transfer of DRL-based IDS solutions into operational cybersecurity environments.

Author Contributions

Conceptualization, L.J.M. and P.A.O.; methodology, L.J.M., P.A.O. and C.T.; software, L.J.M. and P.A.O.; validation, P.A.O.; formal analysis, C.T.; investigation, L.J.M.; resources, L.J.M., P.A.O. and C.T.; data curation, L.J.M.; writing—original draft preparation, L.J.M.; writing—review and editing, P.A.O. and C.T.; visualization, L.J.M.; supervision, P.A.O. and C.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

During the preparation of this manuscript, the author(s) used free web-based OpenAI’s ChatGPT (https://chat.chatbot.app/) for the purposes of language formatting. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Goodman, M. Future Crimes: Inside the Digital Underground and the Battle for Our Connected World; Random House: New York, NY, USA, 2015. [Google Scholar]
  2. Achari, A. Cybersecurity in Cloud Computing; Educohack Press: Delhi, India, 2025. [Google Scholar]
  3. Pinto, L.; Davidson, J.; Sukthankar, R.; Gupta, A. Robust adversarial reinforcement learning. In Proceedings of the International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2017; pp. 2817–2826. [Google Scholar]
  4. Wang, X.; Qiao, Y.; Xiong, J.; Zhao, Z.; Zhang, N.; Feng, M.; Jiang, C. Advanced network intrusion detection with tabtransformer. J. Theory Pract. Eng. Sci. 2024, 4, 191–198. [Google Scholar] [CrossRef] [PubMed]
  5. Pitafi, S.; Anwar, T.; Widia, I.D.M.; Yimwadsana, B. Revolutionizing perimeter intrusion detection: A machine learning-driven approach with curated dataset generation for enhanced security. IEEE Access 2023, 11, 106954–106966. [Google Scholar] [CrossRef]
  6. Mallick, M.A.I.; Nath, R. Navigating the cyber security landscape: A comprehensive review of cyber-attacks, emerging trends, and recent developments. World Sci. News 2024, 190, 1–69. [Google Scholar]
  7. Zoppi, T.; Ceccarelli, A.; Capecchi, T.; Bondavalli, A. Unsupervised anomaly detectors to detect intrusions in the current threat landscape. ACM/IMS Trans. Data Sci. 2021, 2, 1–26. [Google Scholar] [CrossRef]
  8. Valencia-Arias, A.; González-Ruiz, J.D.; Flores, L.V.; Vega-Mori, L.; Rodríguez-Correa, P.; Santos, G.S. Machine learning and blockchain: A bibliometric study on security and privacy. Information 2024, 15, 65. [Google Scholar] [CrossRef]
  9. Tang, C.; Abbatematteo, B.; Hu, J.; Chandra, R.; Martín-Martín, R.; Stone, P. Deep reinforcement learning for robotics: A survey of real-world successes. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 28694–28698. [Google Scholar]
  10. Hernández-Carlón, J.J.; Pérez-Romero, J.; Sallent, O.; Vilà, I.; Casadevall, F. A deep Q-network-based algorithm for multi-connectivity optimization in heterogeneous cellular-networks. Sensors 2022, 22, 6179. [Google Scholar] [CrossRef]
  11. Geng, X.; Zhang, B. Deep Q-network-based intelligent routing protocol for underwater acoustic sensor network. IEEE Sens. J. 2023, 23, 3936–3943. [Google Scholar] [CrossRef]
  12. Gök, M. Dynamic path planning via Dueling Double Deep Q-Network (D3QN) with prioritized experience replay. Appl. Soft Comput. 2024, 158, 111503. [Google Scholar] [CrossRef]
  13. Kumar, H.; Koppel, A.; Ribeiro, A. On the sample complexity of actor-critic method for reinforcement learning with function approximation. Mach. Learn. 2023, 112, 2433–2467. [Google Scholar] [CrossRef]
  14. Tiong, T.; Saad, I.; Teo, K.T.K.; Lago, H.B. Deep reinforcement learning with robust deep deterministic policy gradient. In Proceedings of the 2020 2nd International Conference on Electrical, Control and Instrumentation Engineering (ICECIE); IEEE: New York, NY, USA, 2020; pp. 1–5. [Google Scholar]
  15. Chen, D.; Chen, K.; Li, Z.; Chu, T.; Yao, R.; Qiu, F.; Lin, K. Powernet: Multi-agent deep reinforcement learning for scalable powergrid control. IEEE Trans. Power Syst. 2021, 37, 1007–1017. [Google Scholar] [CrossRef]
  16. Benaddi, H.; Ibrahimi, K.; Benslimane, A.; Jouhari, M.; Qadir, J. Robust enhancement of intrusion detection systems using deep reinforcement learning and stochastic game. IEEE Trans. Veh. Technol. 2022, 71, 11089–11102. [Google Scholar] [CrossRef]
  17. Umer, M.A.; Junejo, K.N.; Jilani, M.T.; Mathur, A.P. Machine learning for intrusion detection in industrial control systems: Applications, challenges, and recommendations. Int. J. Crit. Infrastruct. Prot. 2022, 38, 100516. [Google Scholar] [CrossRef]
  18. Aldallal, A.; Alisa, F. Effective intrusion detection system to secure data in cloud using machine learning. Symmetry 2021, 13, 2306. [Google Scholar] [CrossRef]
  19. Van Hasselt, H.P.; Guez, A.; Hessel, M.; Mnih, V.; Silver, D. Learning values across many orders of magnitude. arXiv 2016, arXiv:1602.07714. [Google Scholar] [CrossRef]
  20. Shen, S.; Cai, C.; Li, Z.; Shen, Y.; Wu, G.; Yu, S. Deep Q-network-based heuristic intrusion detection against edge-based SIoT zero-day attacks. Appl. Soft Comput. 2024, 150, 111080. [Google Scholar] [CrossRef]
  21. Suwannalai, E.; Polprasert, C. Network intrusion detection systems using adversarial reinforcement learning with deep Q-network. In Proceedings of the 2020 18th International Conference on ICT and Knowledge Engineering (ICT&KE); IEEE: New York, NY, USA, 2020; pp. 1–7. [Google Scholar]
  22. Zhu, Z.; Chen, M.; Zhu, C.; Zhu, Y. Effective defense strategies in network security using improved double dueling deep Q-network. Comput. Secur. 2024, 136, 103578. [Google Scholar] [CrossRef]
  23. Sangoleye, F.; Johnson, J.; Tsiropoulou, E.E. Intrusion detection in industrial control systems based on deep reinforcement learning. IEEE Access 2024, 12, 151444–151459. [Google Scholar] [CrossRef]
  24. Wang, Y.; Zou, S. Policy gradient method for robust reinforcement learning. In Proceedings of the International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2022; pp. 23484–23526. [Google Scholar]
  25. Louati, F.; Ktata, F.B.; Amous, I. Big-IDS: A decentralized multi agent reinforcement learning approach for distributed intrusion detection in big data networks. Clust. Comput. 2024, 27, 6823–6841. [Google Scholar] [CrossRef]
  26. Nassif, A.B.; Talib, M.A.; Nasir, Q.; Dakalbab, F.M. Machine learning for anomaly detection: A systematic review. IEEE Access 2021, 9, 78658–78700. [Google Scholar] [CrossRef]
  27. Arshad, K.; Ali, R.F.; Muneer, A.; Aziz, I.A.; Naseer, S.; Khan, N.S.; Taib, S.M. Deep reinforcement learning for anomaly detection: A systematic review. IEEE Access 2022, 10, 124017–124035. [Google Scholar] [CrossRef]
  28. Keele, S. Guidelines for Performing Systematic Literature Reviews in Software Engineering. Technical report, ver. 2.3 ebse Technical Report. ebse 2007. Available online: https://legacyfileshare.elsevier.com/promis_misc/525444systematicreviewsguide.pdf (accessed on 15 October 2025).
  29. Page, M.J.; Moher, D.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. PRISMA 2020 explanation and elaboration: Updated guidance and exemplars for reporting systematic reviews. Bmj 2021, 372, n160. [Google Scholar] [CrossRef]
  30. Nguyen, T.T.; Reddi, V.J. Deep reinforcement learning for cyber security. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 3779–3795. [Google Scholar] [CrossRef] [PubMed]
  31. Sethi, K.; Rupesh, E.S.; Kumar, R.; Bera, P.; Madhav, Y.V. A context-aware robust intrusion detection system: A reinforcement learning-based approach. Int. J. Inf. Secur. 2020, 19, 657–678. [Google Scholar] [CrossRef]
  32. Priya, S.; PradeepMohankumar, K. Intelligent outlier detection with optimal deep reinforcement learning model for intrusion detection. In Proceedings of the 2021 4th International Conference on Computing and Communications Technologies (ICCCT); IEEE: New York, NY, USA, 2021; pp. 336–341. [Google Scholar]
  33. Wang, Z.; Wang, Y.; Xu, H.; Wang, Y. Effective anomaly detection based on reinforcement learning in network traffic data. In Proceedings of the 2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS); IEEE: New York, NY, USA, 2021; pp. 299–306. [Google Scholar]
  34. Su, Y.-J.; Wu, L.-C.; Chen, C.-H.; Chen, T.-Y. Combining Data Resampling and DRL Algorithm for Intrusion Detection. In Proceedings of the 2023 5th International Conference on Computer Communication and the Internet (ICCCI); IEEE: New York, NY, USA, 2023; pp. 47–51. [Google Scholar]
  35. Merzouk, M.A.; Delas, J.; Neal, C.; Cuppens, F.; Boulahia-Cuppens, N.; Yaich, R. Evading deep reinforcement learning-based network intrusion detection with adversarial attacks. In Proceedings of the 17th International Conference on Availability, Reliability and Security, Vienna, Austria, 23–26 August 2022; pp. 1–6. [Google Scholar]
  36. Tariq, Z.U.A.; Baccour, E.; Erbad, A.; Guizani, M.; Hamdi, M. Network intrusion detection for smart infrastructure using multi-armed bandit based reinforcement learning in adversarial environment. In Proceedings of the 2022 International Conference on Cyber Warfare and Security (ICCWS); IEEE: New York, NY, USA, 2022; pp. 75–82. [Google Scholar]
  37. He, M.; Wang, X.; Wei, P.; Yang, L.; Teng, Y.; Lyu, R. Reinforcement learning meets network intrusion detection: A transferable and adaptable framework for anomaly behavior identification. IEEE Trans. Netw. Serv. Manag. 2024, 21, 2477–2492. [Google Scholar] [CrossRef]
  38. Zolotukhin, M.; Kumar, S.; Hämäläinen, T. Reinforcement learning for attack mitigation in SDN-enabled networks. In Proceedings of the 2020 6th IEEE Conference on Network Softwarization (NetSoft); IEEE: New York, NY, USA, 2020; pp. 282–286. [Google Scholar]
  39. Sujatha, V.; Prasanna, K.L.; Niharika, K.; Charishma, V.; Sai, K.B. Network intrusion detection using deep reinforcement learning. In Proceedings of the 2023 7th International Conference on Computing Methodologies and Communication (ICCMC); IEEE: New York, NY, USA, 2023; pp. 1146–1150. [Google Scholar]
  40. Dang, Q.-V.; Vo, T.-H. Studying the Reinforcement Learning techniques for the problem of intrusion detection. In Proceedings of the 2021 4th International Conference on Artificial Intelligence and Big Data (ICAIBD); IEEE: New York, NY, USA, 2021; pp. 87–91. [Google Scholar]
  41. Bebortta, S.; Tripathy, S.S.; Sharma, V.; Behera, J.R.; Nayak, A. A Secure Deep Q-Reinforcement Learning Framework for Network Intrusion Detection in IoT-Fog Systems. In Proceedings of the 2024 OPJU International Technology Conference (OTCON) on Smart Computing for Innovation and Advancement in Industry 4.0; IEEE: New York, NY, USA, 2024; pp. 1–6. [Google Scholar]
  42. Rookard, C.; Khojandi, A. Applying deep reinforcement learning for detection of Internet-of-Things cyber attacks. In Proceedings of the 2023 IEEE 13th Annual Computing and Communication Workshop and Conference (CCWC); IEEE: New York, NY, USA, 2023; pp. 0389–0395. [Google Scholar]
  43. Vadigi, S.; Sethi, K.; Mohanty, D.; Das, S.P.; Bera, P. Federated reinforcement learning based intrusion detection system using dynamic attention mechanism. J. Inf. Secur. Appl. 2023, 78, 103608. [Google Scholar] [CrossRef]
  44. Hsu, Y.-F.; Matsuoka, M. A deep reinforcement learning approach for anomaly network intrusion detection system. In Proceedings of the 2020 IEEE 9th International Conference on Cloud Networking (CloudNet); IEEE: New York, NY, USA, 2020; pp. 1–6. [Google Scholar]
  45. Ren, K.; Zeng, Y.; Zhong, Y.; Sheng, B.; Zhang, Y. MAFSIDS: A reinforcement learning-based intrusion detection model for multi-agent feature selection networks. J. Big Data 2023, 10, 137. [Google Scholar] [CrossRef]
  46. Li, Z.; Huang, C.; Deng, S.; Qiu, W.; Gao, X. A soft actor-critic reinforcement learning algorithm for network intrusion detection. Comput. Secur. 2023, 135, 103502. [Google Scholar] [CrossRef]
  47. Benaddi, H.; Ibrahimi, K.; Benslimane, A.; Qadir, J. A deep reinforcement learning based intrusion detection system (drl-ids) for securing wireless sensor networks and internet of things. In Proceedings of the International Wireless Internet Conference; Springer: Berlin/Heidelberg, Germany, 2019; pp. 73–87. [Google Scholar]
  48. Najafli, S.; Haghighat, A.T.; Karasfi, B. A novel reinforcement learning-based hybrid intrusion detection system on fog-to-cloud computing. J. Supercomput. 2024, 80, 26088–26110. [Google Scholar] [CrossRef]
  49. Sharma, A.; Singh, M. Batch reinforcement learning approach using recursive feature elimination for network intrusion detection. Eng. Appl. Artif. Intell. 2024, 136, 109013. [Google Scholar] [CrossRef]
  50. Sultana, R.; Grover, J.; Tripathi, M. Intelligent defense strategies: Comprehensive attack detection in VANET with deep reinforcement learning. Pervasive Mob. Comput. 2024, 103, 101962. [Google Scholar] [CrossRef]
  51. Santos, R.R.D.; Viegas, E.K.; Santin, A.O.; Cogo, V.V. Reinforcement learning for intrusion detection: More model longness and fewer updates. IEEE Trans. Netw. Serv. Manag. 2022, 20, 2040–2055. [Google Scholar] [CrossRef]
  52. Sethi, K.; Kumar, R.; Prajapati, N.; Bera, P. Deep reinforcement learning based intrusion detection system for cloud infrastructure. In Proceedings of the 2020 International Conference on COMmunication Systems & NETworkS (COMSNETS); IEEE: New York, NY, USA, 2020; pp. 1–6. [Google Scholar]
  53. Ren, K.; Wang, M.; Zeng, Y.; Zhang, Y. An unmanned network intrusion detection model based on deep reinforcement learning. In Proceedings of the 2022 IEEE International Conference on Unmanned Systems (ICUS); IEEE: New York, NY, USA, 2022; pp. 1070–1076. [Google Scholar]
  54. Raj, N.N.; Rajesh, R.; Justin, A.; Shihab, F. Enhancing Network Intrusion Detection Using Deep Reinforcement Learning: An Adaptive Learning Approach. In Proceedings of the International Conference on Emerging Trends in Communication, Computing and Electronics; Springer: Berlin/Heidelberg, Germany, 2018; pp. 297–315. [Google Scholar]
  55. Alavizadeh, H.; Alavizadeh, H.; Jang-Jaccard, J. Deep Q-learning based reinforcement learning approach for network intrusion detection. Computers 2022, 11, 41. [Google Scholar] [CrossRef]
  56. Benaddi, H.; Jouhari, M.; Ibrahimi, K.; Othman, J.B.; Amhoud, E.M. Anomaly detection in industrial IoT using distributional reinforcement learning and generative adversarial networks. Sensors 2022, 22, 8085. [Google Scholar] [CrossRef]
  57. Dong, S.; Xia, Y.; Peng, T. Network abnormal traffic detection model based on semi-supervised deep reinforcement learning. IEEE Trans. Netw. Serv. Manag. 2021, 18, 4197–4212. [Google Scholar] [CrossRef]
  58. Tharewal, S.; Ashfaque, M.W.; Banu, S.S.; Uma, P.; Hassen, S.M.; Shabaz, M. Intrusion detection system for industrial Internet of Things based on deep reinforcement learning. Wirel. Commun. Mob. Comput. 2022, 2022, 9023719. [Google Scholar] [CrossRef]
  59. Ren, K.; Zeng, Y.; Cao, Z.; Zhang, Y. ID-RDRL: A deep reinforcement learning-based feature selection intrusion detection model. Sci. Rep. 2022, 12, 15370. [Google Scholar] [CrossRef] [PubMed]
  60. Mesadieu, F.; Torre, D.; Chennamaneni, A. Leveraging deep reinforcement learning technique for intrusion detection in SCADA infrastructure. IEEE Access 2024, 12, 63381–63399. [Google Scholar] [CrossRef]
  61. Baby, R.; Pooranian, Z.; Shojafar, M.; Tafazolli, R. A heterogenous IoT attack detection through deep reinforcement learning: A dynamic ML approach. In Proceedings of the ICC 2023-IEEE International Conference on Communications; IEEE: New York, NY, USA, 2023; pp. 479–484. [Google Scholar]
  62. Sethi, K.; Kumar, R.; Mohanty, D.; Bera, P. Robust adaptive cloud intrusion detection system using advanced deep reinforcement learning. In Proceedings of the International Conference on Security, Privacy, and Applied Cryptography Engineering; Springer: Berlin/Heidelberg, Germany, 2020; pp. 66–85. [Google Scholar]
  63. Geo Francis, E.; Sheeja, S. Enhanced intrusion detection in wireless sensor networks using deep reinforcement learning with improved feature extraction and selection. Multimed. Tools Appl. 2025, 84, 11943–11982. [Google Scholar] [CrossRef]
  64. Jia, Y.; Zhou, X.Y. Policy gradient and actor-critic learning in continuous time and space: Theory and algorithms. J. Mach. Learn. Res. 2022, 23, 1–50. [Google Scholar] [CrossRef]
  65. Bahdanau, D.; Brakel, P.; Xu, K.; Goyal, A.; Lowe, R.; Pineau, J.; Courville, A.; Bengio, Y. An Actor-Critic Algorithm for Sequence Prediction. arXiv 2016, arXiv:1607.07086. [Google Scholar]
  66. Duan, J.; Wang, W.; Xiao, L.; Gao, J.; Li, S.E.; Liu, C.; Zhang, Y.Q.; Cheng, B.; Li, K. Distributional soft actor-critic with three refinements. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 3935–3946. [Google Scholar] [CrossRef]
  67. Sutton, R.S.; Barto, A.G. Reinforcement learning: An introduction; MIT Press: Cambridge, MA, USA, 1998. [Google Scholar]
  68. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar] [CrossRef]
  69. Van Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30. [Google Scholar]
  70. Wang, Z.; Schaul, T.; Hessel, M.; Hasselt, H.; Lanctot, M.; Freitas, N. Dueling network architectures for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2016; pp. 1995–2003. [Google Scholar]
  71. Lopez-Martin, M.; Carro, B.; Sanchez-Esguevillas, A. Application of deep reinforcement learning to intrusion detection for supervised problems. Expert. Syst. Appl. 2020, 141, 112963. [Google Scholar] [CrossRef]
  72. Zhang, K.; Yang, Z.; Başar, T. Multi-agent reinforcement learning: A selective overview of theories and algorithms. Handb. Reinf. Learn. Control 2021, 325, 321–384. [Google Scholar]
  73. Albrecht, S.V.; Christianos, F.; Schäfer, L. Multi-Agent Reinforcement Learning: Foundations and Modern Approaches; MIT Press: Cambridge, MA, USA, 2024. [Google Scholar]
  74. Ghani, H.; Virdee, B.; Salekzamankhani, S. A deep learning approach for network intrusion detection using a small features vector. J. Cybersecur. Priv. 2023, 3, 451–463. [Google Scholar] [CrossRef]
  75. Zhang, H.; Chen, H.; Xiao, C.; Li, B.; Liu, M.; Boning, D.; Hsieh, C.J. Robust deep reinforcement learning against adversarial perturbations on state observations. Adv. Neural Inf. Process. Syst. 2020, 33, 21024–21037. [Google Scholar]
Figure 1. Network visualization of co-occurrence by all keywords.
Figure 2. Network visualization of co-occurrence by index keyword.
Figure 3. Network visualization of co-authorship by authors.
Figure 4. Density visualization of bibliographic coupling by document.
Figure 5. Network visualization of co-citation by cited authors.
Figure 6. Density of co-citation and cited sources.
Figure 7. PRISMA flow chart for SLR resources.
Figure 8. Percentage distribution of DRL-based IDS applications across different domains.
Figure 9. Distribution of datasets used in DRL-based IDS studies.
Table 1. Quality assessment results for selected papers.

| Results | Number of Papers | Paper ID |
|---|---|---|
| Below 10 | 3 | Excluded |
| 11 | 1 | SS7 |
| 12 | 1 | SS21 |
| 13 | 5 | SS22, SS24, SS28, SS30, SS31 |
| 14 | 4 | SS1, SS8, SS25, SS27 |
| 15 | 5 | SS16, SS17, SS20, SS33, SS35 |
| 16 | 6 | SS2, SS3, SS4, SS10, SS11, SS29 |
| 17 | 4 | SS14, SS18, SS34, SS36 |
| 18 | 2 | SS19, SS23 |
| 19 | 3 | SS6, SS13, SS26 |
| 20 | 5 | SS5, SS9, SS12, SS15, SS32 |
Table 2. Metadata of resources included in this study for systematic review.

| Paper ID | Title | Source | Type | Year | Reference |
|---|---|---|---|---|---|
| SS1 | "Intelligent outlier detection with optimal deep reinforcement learning model for intrusion detection" | IEEE | Conf. | 2021 | [32] |
| SS2 | "Effective Anomaly Detection Based on Reinforcement Learning in Network Traffic Data" | IEEE | Conf. | 2021 | [33] |
| SS3 | "Combining Data Resampling and DRL Algorithm for Intrusion Detection" | IEEE | Conf. | 2023 | [34] |
| SS4 | "Evading deep reinforcement learning-based network intrusion detection with adversarial attacks" | ACM | Conf. | 2022 | [35] |
| SS5 | "Network intrusion detection for smart infrastructure using multi-armed bandit based reinforcement learning in adversarial environment" | IEEE | Conf. | 2022 | [36] |
| SS6 | "Network intrusion detection systems using adversarial reinforcement learning with deep Q-network" | IEEE | Conf. | 2020 | [21] |
| SS7 | "Reinforcement learning meets network intrusion detection: A transferable and adaptable framework for anomaly behavior identification" | IEEE | Jour. | 2024 | [37] |
| SS8 | "Reinforcement learning for attack mitigation in SDN-enabled networks" | IEEE | Conf. | 2020 | [38] |
| SS9 | "Network intrusion detection using deep reinforcement learning" | IEEE | Conf. | 2023 | [39] |
| SS10 | "Studying the Reinforcement Learning techniques for the problem of intrusion detection" | IEEE | Conf. | 2021 | [40] |
| SS11 | "A Secure Deep Q-Reinforcement Learning Framework for Network Intrusion Detection in IoT-Fog Systems" | IEEE | Conf. | 2024 | [41] |
| SS12 | "Applying deep reinforcement learning for detection of internet-of-things cyber attacks" | IEEE | Conf. | 2023 | [42] |
| SS13 | "Federated reinforcement learning based intrusion detection system using dynamic attention mechanism" | Elsevier | Jour. | 2023 | [43] |
| SS14 | "A deep reinforcement learning approach for anomaly network intrusion detection system" | IEEE | Conf. | 2020 | [44] |
| SS15 | "MAFSIDS: a reinforcement learning-based intrusion detection model for multi-agent feature selection networks" | Springer | Jour. | 2023 | [45] |
| SS16 | "A soft actor-critic reinforcement learning algorithm for network intrusion detection" | Elsevier | Jour. | 2023 | [46] |
| SS17 | "A deep reinforcement learning based intrusion detection system (drl-ids) for securing wireless sensor networks and internet of things" | Springer | Conf. | 2020 | [47] |
| SS18 | "A novel reinforcement learning-based hybrid intrusion detection system on fog-to-cloud computing" | Springer | Jour. | 2024 | [48] |
| SS19 | "Batch reinforcement learning approach using recursive feature elimination for network intrusion detection" | Elsevier | Jour. | 2024 | [49] |
| SS20 | "Intelligent defense strategies: Comprehensive attack detection in VANET with deep reinforcement learning" | Elsevier | Jour. | 2024 | [50] |
| SS21 | "Reinforcement learning for intrusion detection: More model longness and fewer updates" | IEEE | Jour. | 2023 | [51] |
| SS22 | "Robust enhancement of intrusion detection systems using deep reinforcement learning and stochastic game" | IEEE | Jour. | 2022 | [16] |
| SS23 | "Deep reinforcement learning based intrusion detection system for cloud infrastructure" | IEEE | Conf. | 2020 | [52] |
| SS24 | "An unmanned network intrusion detection model based on deep reinforcement learning" | IEEE | Conf. | 2022 | [53] |
| SS25 | "Enhancing Network Intrusion Detection Using Deep Reinforcement Learning: An Adaptive Learning Approach" | Springer | Conf. | 2024 | [54] |
| SS26 | "Deep Q-learning based reinforcement learning approach for network intrusion detection" | MDPI | Jour. | 2022 | [55] |
| SS27 | "Anomaly detection in industrial IoT using distributional reinforcement learning and generative adversarial networks" | MDPI | Jour. | 2022 | [56] |
| SS28 | "Network abnormal traffic detection model based on semi-supervised deep reinforcement learning" | IEEE | Jour. | 2021 | [57] |
| SS29 | "Intrusion detection system for industrial Internet of Things based on deep reinforcement learning" | Wiley | Jour. | 2022 | [58] |
| SS30 | "ID-RDRL: a deep reinforcement learning-based feature selection intrusion detection model" | PubMed | Jour. | 2022 | [59] |
| SS31 | "Intrusion Detection in Industrial Control Systems Based on Deep Reinforcement Learning" | IEEE | Jour. | 2024 | [23] |
| SS32 | "Leveraging Deep Reinforcement Learning Technique for Intrusion Detection in SCADA Infrastructure" | IEEE | Jour. | 2024 | [60] |
| SS33 | "A heterogenous IoT attack detection through deep reinforcement learning: a dynamic ML approach" | IEEE | Conf. | 2023 | [61] |
| SS34 | "Robust adaptive cloud intrusion detection system using advanced deep reinforcement learning" | Springer | Conf. | 2020 | [62] |
| SS35 | "A context-aware robust intrusion detection system: a reinforcement learning-based approach" | Springer | Jour. | 2020 | [31] |
| SS36 | "Enhanced intrusion detection in wireless sensor networks using deep reinforcement learning with improved feature extraction and selection" | Springer | Jour. | 2024 | [63] |
Table 3. Applications of DRL-based intrusion detection observed in the studies included in the systematic review.

| Application | Number of Studies | Paper ID |
|---|---|---|
| Intelligent outlier detection | 1 | SS1 |
| Network intrusion detection | 19 | SS2, SS3, SS4, SS6, SS9, SS11, SS13, SS14, SS15, SS16, SS19, SS21, SS22, SS23, SS24, SS26, SS30, SS35 |
| Industrial control systems | 1 | SS31 |
| Intrusion detection | 2 | SS7, SS10 |
| Software-defined network (SDN) | 1 | SS8 |
| IoT network | 9 | SS5, SS11, SS12, SS17, SS25, SS27, SS29, SS33, SS36 |
| Fog-to-cloud computing | 1 | SS18 |
| Vehicular ad hoc network (VANET) | 1 | SS20 |
| Cloud network | 2 | SS23, SS34 |
| SCADA network | 1 | SS32 |
| Adversarial attack | 3 | SS4, SS5, SS6 |
| Network anomaly | 3 | SS7, SS14, SS27, SS28 |
| Cyber attacks | 1 | SS12 |
Table 4. Deep reinforcement learning algorithms utilized in the selected studies.

| Paper ID | Proposed DRL Algorithm | Intrusion Detection Application |
|---|---|---|
| SS1 | Intelligent outlier detection with optimal deep reinforcement learning (IOD-ODRL) | Intrusion detection (ID) |
| SS2 | Convolutional neural networks (CNNs) | Intrusion detection systems (IDSs) |
| SS3 | Deep Q-network (DQN) | Intrusion detection system (IDS) |
| SS4 | Deep Q-network (DQN) | Intrusion detection system (IDS), adversarial attack |
| SS5 | Multi-armed bandit (MAB) algorithm | Network intrusion detection (NID) |
| SS6 | Multi-agent reinforcement learning (MARL), adversarial reinforcement learning (ARL), deep Q-learning (DQN) | Anomaly-based network intrusion detection system (NIDS) |
| SS7 | Proximal policy optimization 2 (PPO2) | Anomaly behavior identification intrusion detection |
| SS8 | Deep Q-network (DQN), proximal policy optimization (PPO) | Software-defined network (SDN) |
| SS9 | Multi-agent reinforcement learning (MARL), adversarial reinforcement learning (ARL), deep Q-learning (DQN) | Network intrusion detection |
| SS10 | Deep Q-network (DQN), double deep Q-network (DDQN), policy gradient (PG), actor–critic (AC) | Intrusion detection in a network |
| SS11 | Deep Q-reinforcement learning (DQRL) | Network intrusion detection in IoT-fog systems |
| SS12 | Deep Q-network (DQN) | Network intrusion detection system on IoT systems |
| SS13 | Q-learning, deep Q-learning | Network intrusion detection |
| SS14 | Deep reinforcement learning deep Q-network (DRL-DQN) | Anomaly network intrusion detection system |
| SS15 | Deep Q-network (DQN) | Network traffic intrusion detection |
| SS16 | Stacked autoencoder soft actor–critic (SA-AC) | Network intrusion detection |
| SS17 | Deep Q-network (DQN), deep reinforcement learning-based IDS (DRL-IDS) | Wireless sensor networks and Internet of Things |
| SS18 | Deep Q-network (DQN) | Intrusion detection system (IDS) on fog-to-cloud computing |
| SS19 | Deep Q-network (DQN) | Network traffic |
| SS20 | Deep Q-network (DQN) | Attack detection in VANET |
| SS21 | Q-learning (new IDS) | Intrusion detection on network traffic |
| SS22 | Deep reinforcement learning-based IDS (DRL-IDS) | Intrusion detection systems |
| SS23 | Deep Q-network (DQN) | Intrusion detection system for cloud infrastructure |
| SS24 | Deep Q-network (DQN) | Network IDS |
| SS25 | Deep Q-network (DQN) | Wireless sensor network (WSN) and Internet of Things (IoT) |
| SS26 | Deep Q-learning (DQL), deep neural network (DNN) | Network intrusion detection |
| SS27 | Distributional reinforcement learning (distributional RL), generative adversarial network (GAN) | IoT network |
| SS28 | Semi-supervised double deep Q-network (SSDDQN) | Network abnormal traffic detection |
| SS29 | Double deep Q-network (DDQN), deep Q-network (DQN) | Intrusion detection system for industrial IoT |
| SS30 | Recursive feature elimination with DT (DT + RFE) and DQD + RFE | Network intrusion detection |
| SS31 | Deep Q-network (DQN), double deep Q-network (DDQN), dueling double deep Q-network (D3QN), actor–critic (AC), proximal policy optimization (PPO) | Intrusion detection for industrial control systems |
| SS32 | Actor–critic (AC), deep Q-network (DQN) | Intrusion detection in SCADA network |
| SS33 | Deep neural network (DNN), deep reinforcement learning intrusion detection system (DRL-IDS) | Heterogeneous IoT attack detection |
| SS34 | Double deep Q-network (DDQN) | Cloud intrusion detection system |
| SS35 | Multiple independent deep reinforcement learning (MI-DRL) | Robust intrusion detection system |
| SS36 | Deep reinforcement learning-based intrusion detection (DRL-IDS) | Intrusion detection in wireless sensor networks |
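The value-based methods that dominate Table 4 (DQN, DDQN, D3QN) share one core formulation: the traffic feature vector is the state, the action is a detection decision (benign vs. attack), and a reward signals whether that decision was correct. The following is a minimal sketch of that Q-learning loop, with a linear Q-function standing in for the deep network; the feature dimension, reward scheme, synthetic traffic generator, and hyperparameters are illustrative assumptions, not values taken from any of the reviewed studies.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 8                  # assumed size of the flow feature vector
N_ACTIONS = 2                   # 0 = classify benign, 1 = classify attack
ALPHA, EPSILON = 0.01, 0.1      # learning rate, exploration rate

# Linear Q-function standing in for the deep network:
# Q(s, a) = W[a] @ [features, 1]   (the trailing 1 is a bias term)
W = np.zeros((N_ACTIONS, N_FEATURES + 1))

def q_values(state):
    return W @ state

def choose_action(state):
    """Epsilon-greedy policy over the two detection actions."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_values(state)))

def update(state, action, reward):
    """One-step TD update; with a per-decision +/-1 reward and no
    successor state, the target reduces to the reward itself."""
    td_error = reward - q_values(state)[action]
    W[action] += ALPHA * td_error * state

def sample_flow():
    """Toy traffic stream: attack flows have larger mean feature values."""
    label = int(rng.random() < 0.3)                  # 1 = attack
    features = rng.normal(loc=2.0 * label, scale=1.0, size=N_FEATURES)
    return np.append(features, 1.0), label

for _ in range(5000):
    s, y = sample_flow()
    a = choose_action(s)
    update(s, a, reward=1.0 if a == y else -1.0)     # +1 correct, -1 wrong

# Greedy evaluation on fresh synthetic flows
hits = sum(int(np.argmax(q_values(s)) == y)
           for s, y in (sample_flow() for _ in range(1000)))
print(f"greedy detection accuracy: {hits / 1000:.2f}")
```

Replacing the linear Q-function with a neural network and adding experience replay and a target network gives the DQN setup most of the reviewed studies start from; DDQN and the dueling variants then change only how the TD target is computed or how Q is parameterized.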
Table 5. Datasets used and performance of the proposed techniques.

| Paper ID | Proposed DRL Algorithm | Dataset | Performance Metrics | Value or Comparison | Limitations |
|---|---|---|---|---|---|
| SS1 | Intelligent outlier detection with optimal deep reinforcement learning (IOD-ODRL) | UNSW-NB15 | Detection rate | 95.29% | Lack of generalization to unseen attacks |
| | | | Accuracy | 96.10% | |
| | | | FPR | 5.30% | |
| SS2 | Convolutional neural networks (CNNs) | 5G-NIDD | Accuracy | 0.98 | Limited scalability to large-scale networks |
| | | FLNET2023 | Accuracy | 1.00 | |
| SS3 | Deep Q-network (DQN) | UNSW-NB15 | Accuracy | 78% | The use of static datasets does not represent the full context or metadata |
| SS4 | Fast gradient sign method (FGSM), basic iterative method (BIM), adversarial attacks | NSL-KDD | Accuracy | 84.81% | Large deterioration in detection performance when adversarial attacks are used |
| | | | F1-score | 84.09% | |
| SS5 | Multi-armed bandit (MAB) algorithm | MAGPIE | Accuracy | 0.749 | Poor performance on adversarial networks |
| | | | Recall | 0.780 | |
| | | | Precision | 0.787 | |
| | | | FPR | 0.281 | |
| SS6 | Multi-agent reinforcement learning (MARL), adversarial reinforcement learning (ARL), deep Q-learning (DQN) | KDDTest+ | Accuracy | 80% | Poor performance in a changing environment |
| | | | F1-score | 79% | |
| SS7 | Proximal policy optimization 2 (PPO2) | IDS2017, IDS2018, NSL-KDD, UNSW-NB15, CIC-IoT2023 | Accuracy | Higher accuracy compared to alternative models | Limited use of multi-agent systems in the framework; low multi-classification accuracy and slow model training |
| SS8 | Deep Q-network (DQN), proximal policy optimization (PPO) | Real-time traffic | Threat detection capability | Able to detect anomalies in the network | Further investigation needed for larger network environments; not evaluated on real malware samples |
| SS9 | Multi-agent reinforcement learning (MARL), adversarial reinforcement learning (ARL), deep Q-learning (DQN) | NSL-KDD, KDDTest+ | F1-score | 79% | Low intrusion detection performance against dynamic attacks |
| | | | Accuracy | 80% | |
| SS10 | Deep Q-network (DQN), double deep Q-network (DDQN), policy gradient (PG), actor–critic (AC) | DoHBRw | Accuracy | 0.9999 | Not tested in offline settings |
| | | | DQN accuracy | 0.934 | |
| | | | DDQN accuracy | 0.933 | |
| | | | PG accuracy | 0.91 | |
| | | | AC accuracy | 0.89 | |
| SS11 | Deep Q-reinforcement learning (DQRL) | NSL-KDD | Latency | Low | Vulnerability to adversarial environments |
| | | | Precision | High | |
| | | | Energy efficiency | High | |
| SS12 | Deep Q-network (DQN) | TON-IoT | Accuracy | 0.7969 | Low performance |
| | | | Precision | 0.7678 | |
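The metrics reported in Table 5 (accuracy, precision, recall/detection rate, F1-score, and FPR) all derive from the same four confusion-matrix counts, which is what makes results comparable across studies when those counts are reported. A small helper makes the definitions explicit; the counts below are illustrative, not figures taken from any reviewed paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard IDS metrics from confusion-matrix counts
    (attack = positive class)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0      # a.k.a. detection rate
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0         # false-positive rate
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "fpr": fpr}

# Illustrative counts for a balanced 1000-flow test set
m = detection_metrics(tp=450, fp=50, tn=480, fn=20)
print({k: round(v, 3) for k, v in m.items()})
```

Note that a high accuracy alone can mask a poor detector on imbalanced traffic, which is why several of the reviewed studies report FPR and F1-score alongside it.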

Share and Cite

MDPI and ACS Style

Mpoporo, L.J.; Owolawi, P.A.; Tu, C. Deep Reinforcement Learning Algorithms for Intrusion Detection: A Bibliometric Analysis and Systematic Review. Appl. Sci. 2026, 16, 1048. https://doi.org/10.3390/app16021048

