Review

Automated Network Defense: A Systematic Survey and Analysis of AutoML Paradigms for Network Intrusion Detection

1 Information Engineering College, Capital Normal University, Beijing 100048, China
2 Key Laboratory of Cyberspace Security, Ministry of Education, Beijing 100048, China
3 Library of Beijing Institute of Technology, Beijing Institute of Technology, Beijing 100081, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(19), 10389; https://doi.org/10.3390/app151910389
Submission received: 27 August 2025 / Revised: 22 September 2025 / Accepted: 23 September 2025 / Published: 24 September 2025
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

As cyberattacks grow increasingly sophisticated, advanced Network Intrusion Detection Systems (NIDS) have become essential for securing cyberspace. While Machine Learning (ML) is foundational to modern NIDS, its effectiveness is often hampered by a resource-intensive development pipeline involving feature engineering, model selection, and hyperparameter tuning. Automated Machine Learning (AutoML) promises a solution, but its application to the massive, high-speed data streams in NIDS is fundamentally a parallel and distributed computing challenge. This paper argues that the scalability and performance of AutoML in NIDS are governed by the underlying computational paradigm. We introduce a novel taxonomy of AutoML frameworks, uniquely classifying them by their parallel and distributed architectures. Through a comprehensive meta-analysis of over 15 NID methods on benchmark datasets, we demonstrate how the performance of leading systems is a direct consequence of their chosen computational paradigm. Finally, we identify frontier challenges and future research directions at the intersection of AutoML, NIDS, and high-performance distributed systems, focusing on computational scalability, security, and end-to-end automation.

1. Introduction

In recent years, the threat landscape of cyberspace has evolved at an unprecedented pace. The complexity, stealthiness, and polymorphic nature of cyberattacks have significantly increased, while the volume of malicious traffic has grown exponentially with the rapid proliferation of Internet of Things (IoT) devices [1], cloud services, and encrypted communications. These developments have rendered traditional security mechanisms increasingly inadequate [2].
Conventional signature-based Intrusion Detection Systems (IDS), though effective for known attacks, struggle to detect novel, obfuscated, or zero-day threats. This limitation has become particularly evident as attackers increasingly exploit encrypted channels and adaptive evasion strategies. To address this gap, the research community has shifted attention toward Machine Learning (ML)-based Network Intrusion Detection Systems (NIDS) [3], which can learn latent representations and complex traffic patterns directly from data. By doing so, they are capable of identifying anomalous or malicious behaviors that are not explicitly defined in pre-existing rules.
Nevertheless, building an effective ML-based NIDS remains non-trivial. The design process typically involves data preprocessing, feature extraction, algorithm selection, and hyperparameter optimization. Each of these steps is highly labor-intensive and demands significant domain expertise. Moreover, the pipeline often requires repeated manual iterations to adapt to new datasets, evolving attack vectors, or deployment environments. This reliance on expert-driven trial-and-error creates the notorious “human-in-the-loop bottleneck”, which hinders the scalability and practicality of ML-based IDS in real-world deployments.
Automated Machine Learning (AutoML) has emerged as a promising paradigm to address this bottleneck. AutoML frameworks aim to democratize machine learning by reducing technical barriers, accelerating experimentation, and enhancing model robustness. At its core, AutoML formalizes the model construction task as the Combined Algorithm Selection and Hyperparameter (CASH) optimization problem [4], which seeks the optimal model configuration from a given algorithm set and hyperparameter space. Modern AutoML systems incorporate diverse strategies such as Bayesian optimization, reinforcement learning, meta-learning, and Neural Architecture Search (NAS), enabling them to automatically discover high-performing models with minimal human intervention.
Despite these advantages, applying AutoML to the NID domain introduces unique challenges. Unlike typical tabular or image datasets, network traffic data is high-dimensional, heterogeneous, often encrypted, and highly dynamic. Moreover, NIDS must operate under real-time constraints, requiring models to process gigabit-level traffic streams while maintaining low latency and high accuracy. These requirements introduce stringent trade-offs between accuracy, scalability, and computational efficiency.
From a system perspective, AutoML for NIDS is not solely a matter of automating ML workflows; rather, it represents a parallel and distributed computing challenge. Training a large number of candidate models, searching across vast architecture spaces, and handling terabyte-scale traffic data necessitate efficient parallel execution, resource-aware scheduling, and distributed optimization mechanisms. In addition, ensuring the security and trustworthiness of AutoML pipelines, particularly when deployed in federated or cloud-based environments, remains an open research issue.
Our main contributions are as follows:
1. We propose a novel taxonomy of AutoML frameworks. Unlike traditional function-based classifications, our taxonomy is rooted in their underlying parallel and distributed computation strategies, providing a new perspective for system-level understanding and optimization.
2. We present a comprehensive meta-analysis of reported performance from more than 15 methods on benchmark datasets, providing a structured overview of the current state-of-the-art.
3. We explore frontier challenges and future research directions, focusing on computational scalability, security in distributed learning, and end-to-end automation, aimed at inspiring the parallel and distributed computing community to address these system-level bottlenecks.
The rest of the paper is organized as follows: Section 2 introduces our AutoML taxonomy based on parallel computation paradigms. Section 3 presents and discusses empirical evaluation results. Section 4 outlines key challenges and future directions. Section 5 concludes the paper.

Related Work

Traditional NIDS often rely on handcrafted traffic features (e.g., port numbers, byte statistics, and payload entropy) combined with shallow classifiers such as decision trees, SVMs, or random forests. However, as encrypted traffic dominates modern networks, such approaches have become increasingly ineffective. To overcome this limitation, deep learning methods have been widely adopted for traffic classification and intrusion detection [5,6]. CNNs, RNNs, and transformer-based architectures have demonstrated strong performance in feature learning from raw traffic data [7].
For example, on the USTC-TFC2016 dataset, LGLFormer (a hybrid transformer model combining local and global attention [8]) achieved an F1-score of 0.9932, significantly outperforming the vision transformer [9] ViT-B/16 (0.7345). On CICIDS2017, LGLFormer reached 0.9997, while ViT achieved only 0.7983. These results highlight the importance of domain-specific adaptation in designing models for network traffic classification [10].
Meanwhile, AutoML research has rapidly progressed. Auto-WEKA first formalized model construction as the CASH problem; Auto-Sklearn extended this by incorporating meta-learning and automated ensemble construction [11]; AutoGluon introduced large-scale parallel stacking ensembles; and Auto-Keras and Google Cloud AutoML focus on distributed Neural Architecture Search (NAS). Recently, AutoML has been applied to encrypted traffic and intrusion detection. Tools such as GGFAST generate traffic classifiers automatically, while AutoML4ETC searches for efficient NAS-based architectures tailored to encrypted traffic classification. Lightweight NAS-based models have also been proposed, achieving accuracy above 96% on VPN datasets with only 88.3 K parameters, showing that AutoML can deliver both efficiency and accuracy [12].
Despite these advances, challenges remain in scaling AutoML to NIDS, particularly regarding parallelization, security in distributed training, and end-to-end feature automation.

2. Taxonomy of Parallel and Distributed AutoML Paradigms

While existing AutoML frameworks differ in functionality, their scalability and efficiency are fundamentally governed by their computational architectures. To examine these frameworks systematically through the lens of parallel and distributed computing, we propose a new taxonomy comprising four paradigms.

2.1. Enhanced Sequential Search with Meta-Learning

These frameworks are primarily based on sequential optimization (e.g., Bayesian optimization) but improve efficiency by incorporating parallelizable preprocessing and meta-learning components [13].
Auto-WEKA [14] pioneered the formalization of the CASH problem and introduced Sequential Model-Based Optimization (SMBO), where a surrogate model (e.g., Gaussian process) is iteratively constructed to guide the next sampling point. Although model updates are sequential, the evaluation of each candidate configuration λ (i.e., model training and validation) is independent and can thus be parallelized across nodes.
The CASH problem can be mathematically defined as follows:
$$A^{*}_{\lambda^{*}} \in \operatorname*{arg\,min}_{A^{(j)} \in \mathcal{A},\; \lambda \in \Lambda_{A^{(j)}}} \frac{1}{k} \sum_{i=1}^{k} \mathcal{L}\!\left(A^{(j)}_{\lambda},\, D_{\mathrm{train}}^{(i)},\, D_{\mathrm{valid}}^{(i)}\right)$$
Given the algorithm set $\mathcal{A} = \{A^{(1)}, \ldots, A^{(n)}\}$, where each algorithm $A^{(j)}$ has a hyperparameter domain $\Lambda_{A^{(j)}}$, the goal is to identify the algorithm–configuration pair $A^{(j)}_{\lambda}$ with the best generalization performance: the combination that achieves the lowest average loss $\mathcal{L}$ under k-fold cross-validation, trained on each fold $D_{\mathrm{train}}^{(i)}$ and evaluated on the corresponding validation fold $D_{\mathrm{valid}}^{(i)}$.
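Because each candidate evaluation is independent, the inner loop of SMBO parallelizes naturally. The following is a minimal sketch of this embarrassingly parallel evaluation, assuming scikit-learn estimators and joblib for multi-core execution; the candidate list and the synthetic dataset are illustrative, not drawn from any framework discussed here.

```python
from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Illustrative candidate (algorithm, hyperparameter) configurations.
candidates = [
    RandomForestClassifier(n_estimators=100, max_depth=8),
    RandomForestClassifier(n_estimators=300, max_depth=None),
    SVC(C=1.0, kernel="rbf"),
    SVC(C=10.0, kernel="linear"),
]

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

def cash_loss(model, X, y, k=5):
    # The inner term of the CASH objective: average k-fold validation loss.
    return 1.0 - cross_val_score(model, X, y, cv=k).mean()

# Each configuration is evaluated independently -> embarrassingly parallel.
losses = Parallel(n_jobs=-1)(delayed(cash_loss)(m, X, y) for m in candidates)
best = candidates[losses.index(min(losses))]
print(f"Best configuration: {best} (loss={min(losses):.4f})")
```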
Auto-Sklearn [15] extended Auto-WEKA with two innovations (Figure 1). First, it employs meta-learning to compute meta-features (e.g., number of instances, dimensionality, and class entropy) and prioritize configurations that performed well on similar historical datasets. Second, it applies automated ensemble selection to combine the best-performing models from the search process. Both model evaluation and ensemble construction can be parallelized.
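As a concrete illustration of this workflow, the sketch below shows typical usage of the auto-sklearn 1.x API on synthetic data; the time budgets and the n_jobs value are arbitrary illustrative choices, not recommended settings.

```python
import autosklearn.classification
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Meta-learning warm-starts the Bayesian search with configurations that
# performed well on similar datasets; n_jobs evaluates candidates in
# parallel; the final predictor is an automatically built ensemble.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=600,  # total search budget (seconds)
    per_run_time_limit=60,        # cap on each candidate evaluation
    n_jobs=4,                     # parallel candidate evaluations
)
automl.fit(X_train, y_train)
print(automl.leaderboard())
print("test accuracy:", automl.score(X_test, y_test))
```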

2.2. Intrinsically Parallel Population-Based Search

This paradigm uses Evolutionary Algorithms (EA) or Genetic Programming (GP) to simulate natural selection and optimize ML pipelines. A population of candidate solutions evolves over generations via selection, crossover, and mutation.
TPOT [16] (Tree-based Pipeline Optimization Tool) exemplifies this approach, representing ML pipelines as trees where nodes are preprocessing steps or models. Fitness evaluation of each pipeline in a population is independent, enabling parallel execution across hundreds of CPUs or nodes, an embarrassingly parallel strategy ideal for distributed environments.
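A minimal run under the classic (pre-1.0) TPOT API is sketched below; the generation and population sizes are arbitrary illustrative values. Setting n_jobs=-1 spreads the per-pipeline fitness evaluations, which are mutually independent, across all available cores.

```python
from tpot import TPOTClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each generation, every pipeline in the population is scored by
# cross-validation independently of the others (embarrassingly parallel).
tpot = TPOTClassifier(generations=5, population_size=50,
                      n_jobs=-1, random_state=0, verbosity=2)
tpot.fit(X_train, y_train)
print("test accuracy:", tpot.score(X_test, y_test))
tpot.export("best_pipeline.py")  # emit the winning pipeline as Python code
```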

2.3. Large-Scale Parallel Ensemble and Stacking

Instead of exploring large configuration spaces, this strategy trains a wide range of standard, robust models in parallel and uses stacking or ensembling to aggregate them.
AutoGluon [17] embodies this approach via multi-layer stacking with K-fold bagging (Figure 2). It trains diverse base learners (e.g., LightGBM, CatBoost, Random Forest, and MLP) in parallel, and their outputs are combined with the raw features as inputs to higher-level models. This combination of model and data parallelism enables rapid construction of high-performance ensembles.
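The sketch below shows what this looks like in AutoGluon's tabular API; the CSV file names and the label column are placeholders (e.g., CICFlowMeter flow features with an attack-label column), and the "best_quality" preset is what activates K-fold bagging with multi-layer stacking.

```python
from autogluon.tabular import TabularDataset, TabularPredictor

# Placeholder files: substitute your own flow-feature tables.
train = TabularDataset("flows_train.csv")
test = TabularDataset("flows_test.csv")

# "best_quality" enables K-fold bagging and multi-layer stacking: diverse
# base learners (GBMs, random forests, neural networks, ...) are trained
# in parallel, and their out-of-fold predictions are concatenated with
# the raw features as inputs to the next stacking layer.
predictor = TabularPredictor(label="attack_label").fit(
    train, presets="best_quality", time_limit=3600)
print(predictor.leaderboard(test))
```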

2.4. Distributed Neural Architecture Search (NAS)

NAS, one of the most computation-intensive branches of AutoML, aims to automatically design neural network topologies, typically using a distributed controller–worker architecture.
Auto-Keras [18] and Google Cloud AutoML are representative frameworks. A controller (e.g., RNN or RL agent) generates candidate architectures, which are dispatched to worker clusters (GPUs) for training and evaluation. Feedback from the evaluation updates the controller. This loop (generate, train, evaluate, and feedback) requires massive computational resources.
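The generic loop can be summarized in schematic Python. This is a sketch of the controller–worker pattern only, not the API of Auto-Keras or Google Cloud AutoML; controller, sample_architecture, train_and_evaluate, and update are hypothetical stand-ins for an RNN/RL controller and a GPU worker routine.

```python
from concurrent.futures import ProcessPoolExecutor

def nas_search(controller, num_rounds=100, num_workers=8):
    """Schematic controller-worker NAS loop: generate, train, evaluate, feedback."""
    best_arch, best_score = None, float("-inf")
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        for _ in range(num_rounds):
            # 1. Generate: the controller proposes a batch of candidate nets.
            archs = [controller.sample_architecture() for _ in range(num_workers)]
            # 2. Train + evaluate: candidates are dispatched to workers in parallel.
            scores = list(pool.map(train_and_evaluate, archs))
            # 3. Feedback: validation scores update the controller's policy.
            controller.update(archs, scores)
            arch, score = max(zip(archs, scores), key=lambda p: p[1])
            if score > best_score:
                best_arch, best_score = arch, score
    return best_arch, best_score
```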
To mitigate costs, AutoML4ETC [19] and others propose proxy models that predict architecture performance without full training, enabling faster evaluations and broader exploration in distributed settings.

2.5. Summary

These four paradigms represent complementary approaches to scaling AutoML within parallel and distributed environments. Sequential search methods emphasize efficiency through surrogate modeling and meta-learning, while population-based methods exploit inherent parallelism to explore large search spaces. Ensemble-based approaches prioritize robustness and accuracy via parallel model training, and NAS-based methods push the frontier of deep AutoML at scale. Together, they form a comprehensive taxonomy that highlights the trade-offs among performance, scalability, interpretability, and computational cost.
Table 1 below summarizes this taxonomy, highlighting a fundamental trade-off between search complexity and parallel efficiency.
The choice of paradigm requires careful consideration of the problem's nature. For NIDS, where traffic data is highly structured with strong temporal and feature-level dependencies, the undirected exploration of population-based search can be less efficient: the randomness inherent in genetic operators such as crossover and mutation may disrupt meaningful feature combinations that are critical for identifying attack patterns. In contrast, guided methods like Bayesian optimization can more effectively learn the promising regions of the search space for such structured data. NAS offers the potential for novel architectures but requires immense resources, while approaches like AutoGluon provide scalability and robustness via simpler, highly parallelizable strategies. The optimal choice therefore depends on problem complexity, available resources, and explainability requirements.

3. Empirical Analysis of NID Systems

To connect the theoretical framework with real-world performance, this section presents an empirical analysis based on data collected from multiple studies, evaluating a wide range of NID methods built on different technologies.

3.1. Meta-Analysis Methodology

Objective: The purpose of this analysis is to empirically evaluate the performance of various ML methods for NID based on existing research, providing a solid reference for understanding the effectiveness of automated approaches.
Datasets: The comparative analysis covers several widely used benchmark datasets in network security:
1. CICIDS2017 [20]: A modern dataset with various recent attack types, such as DDoS, brute-force, and botnet attacks.
2. CICDOS2017: A dataset focused on denial of service attacks, including slow-rate and application-layer attacks.
3. ISCX2016 (ISCX-VPN-nonVPN) [24]: A dataset of VPN and non-VPN traffic from a range of real applications, widely used for encrypted traffic classification.
4. NSL-KDD: A cleaned and improved version of the classic KDD intrusion detection dataset.
5. AUCK-VI: A specialized dataset for traffic classification.
6. USTC-TFC2016, ISCX-VPN-Service, etc.: Additional datasets used to evaluate advanced deep learning models.

Critical Analysis of Benchmark Datasets

Directly comparing performance scores across different datasets without scrutinizing the datasets’ own limitations is methodologically flawed. This section provides a critical analysis of key benchmark datasets to offer the necessary context for a more nuanced interpretation of the reported results.
1. CICIDS2017: This dataset’s strengths include its realistic traffic generation, diversity of modern attacks (e.g., Brute Force, DoS, Botnet, and DDoS), and complete network configuration. However, it also has significant, documented weaknesses: severe class imbalance (with “Benign” and “DoS Hulk” traffic dominating), which can lead to inflated accuracy scores for models biased towards the majority class; instances of missing class labels; and, most critically, systematic labeling errors caused by the CICFlowMeter tool’s flawed handling of TCP flow timeouts. This flaw can cause malicious TCP flows to be mislabeled as “BENIGN” if they follow a timed-out flow, fundamentally compromising the ground truth for a subset of the data.
2. ISCX2016 (ISCX-VPN-nonVPN): While valuable for studying encrypted and VPN traffic, this dataset was generated in a highly controlled, sanitized environment. During data capture, all unnecessary services and applications were closed, and only the target application was active. This clean capture environment simplifies the classification task by removing the background noise and concurrent application traffic typical of real-world networks.
Table 2 summarizes this critical information. The near-perfect scores reported on CICIDS2017 must be interpreted with caution, as high performance may reflect an ability to exploit dataset artifacts rather than true, generalizable detection capability. Similarly, models trained on the “clean” ISCX2016 dataset may not generalize well to noisy, real-world network environments.
Compared Methods: We categorized the methods extracted from the literature into four groups:
1. AutoML-based Methods: This category includes systems built on AutoGluon and other AutoML frameworks, such as nPrint (implemented on AutoGluon), GGFAST, AutoML4ETC, NAS-Net, and AutoML ensemble systems. These methods either automatically search for suitable models or directly provide end-to-end classifiers.
2. Advanced Deep Learning Models (transformer-based): This group includes recently proposed models leveraging transformer architectures, such as HiLo-MAE, LGLFormer, and other vision or sequence transformer models (e.g., ViT-B/16, ET-BERT, and YaTC). These models generally require large datasets and significant computational resources but are capable of capturing complex temporal or global features.
3. Standard Deep Learning Models: This category includes well-established models such as CNNs and LSTMs. Representative examples are DeepPacketCNN (1D CNN), UC Davis CNN, E2ECNN, LeNet, and ResNet. These architectures are relatively standard and are commonly applied in traffic classification tasks.
4. Traditional ML Models: This group covers classical classifiers, such as k-Nearest Neighbors (KNN), decision tree, Random Forest (RF), Extremely Randomized Trees (ET), XGBoost, LightGBM, and CatBoost, as well as feature-engineering-based approaches like AppScanner and CUMUL.
Evaluation Metrics: Standard evaluation metrics were used, including Accuracy (ACC), precision, and F1-score.
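For reference, these metrics follow their standard definitions in terms of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN):

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \quad \text{where}\ \mathrm{Recall} = \frac{TP}{TP + FN}.$$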
It should be noted that different studies may vary in data preprocessing, feature extraction, and model complexity. Therefore, this comparison primarily reflects the reported performance in the original papers and is intended to highlight overall trends across different technical approaches.

3.2. Performance and Efficiency Comparison

Table 3 and Table 4 summarize the performance indicators of representative methods on benchmark datasets, providing an intuitive comparison of detection effectiveness.
In general, AutoML methods demonstrate outstanding performance, often achieving near-perfect detection on widely used datasets. However, this advantage comes with increased computational requirements, and the reported details allow a more quantitative view of this overhead. For instance, AutoML4ETC’s NAS process is computationally expensive, but strategies can manage this cost: a “partial training” strategy (training candidate models for only 10 epochs instead of 40) can reduce total search time by approximately 75%, at the cost of a 3% drop in final model accuracy. Nevertheless, AutoML4ETC produces models with over 50 times fewer parameters than state-of-the-art manual designs, implying a much lower memory footprint and faster inference time. On the other hand, the “Autonomous Cybersecurity” framework’s experiments were conducted on a standard desktop computer (Intel i7-8700, 16 GB memory), demonstrating the feasibility of running sophisticated AutoML pipelines without a large-scale cluster. Furthermore, its AutoML-OCSE model achieves the fastest average inference time per sample on the CICIDS2017 dataset, which is critical for real-time NIDS deployment.
Deep learning approaches, particularly those based on Transformers, show competitive results but heavily depend on large-scale labeled datasets and GPU acceleration. Traditional ML models offer clear advantages in training speed and deployment costs but often fall short when handling high-dimensional or evolving attack traffic. The choice between paradigms thus involves a trade-off between detection accuracy, computational resources, and practical deployment requirements.

3.3. Result Discussion

The analysis reveals several critical findings that directly link the predictive performance of NID systems to their underlying computational paradigms, as defined by our taxonomy.

3.3.1. State-of-the-Art Performance via Large-Scale Parallel Ensembling

On the widely-used CICIDS2017 dataset, which consists of pre-processed tabular flow features, the “Large-Scale Parallel Ensemble” paradigm yields state-of-the-art results. For example, nPrint, which is implemented on the AutoGluon framework, achieves a near-perfect F1-score of 99.90%. This success is a direct consequence of the paradigm’s strategy: training a diverse set of strong, independent learners (e.g., LightGBM, CatBoost, random forest, and MLP) in parallel and combining them via multi-layer stacked ensembling. This approach is exceptionally well-suited to the structured, high-dimensional nature of CICFlowMeter data, where identifying complex feature interactions is key to distinguishing malicious from benign flows. The inherent parallelism allows for rapid exploration and combination of robust models, leading to high-performance ensembles with minimal manual tuning.

3.3.2. Discovering Novel Architectures with Distributed NAS

In contrast, for tasks involving encrypted traffic classification from raw or semi-structured data, the “Distributed Neural Architecture Search” (NAS) paradigm demonstrates significant potential. AutoML4ETC, for instance, automatically discovers novel, lightweight CNN architectures tailored for encrypted traffic, achieving a 94.40% F1-score on the ISCX2016 dataset. This highlights the paradigm’s strength in tailoring deep learning models to specific data modalities, moving beyond pre-defined architectures. However, this power comes at the cost of significant computational resources, as the controller-worker architecture must train and evaluate a large number of candidate networks.

3.3.3. Synergy Between Domain Knowledge and Automation

While AutoML excels at automating model selection, the results suggest that combining domain expertise with automation produces the most effective systems. For instance, nPrint does not rely on raw byte streams alone; it incorporates rich, domain-expert-crafted n-gram representations. These representations encode expert knowledge about which parts of a packet are most informative into a standardized format. AutoGluon then automates the process of building models on these well-designed features. This indicates that the most effective strategy for NIDS often involves a hybrid paradigm: experts design powerful feature encodings, and AutoML frameworks optimize the downstream learning pipelines.

3.3.4. Expanded Observations

Beyond accuracy, efficiency and robustness are equally important in practical NIDS deployments. AutoML systems that rely on large-scale ensembling can face challenges in real-time inference due to latency and memory requirements. In contrast, transformer-based models may scale well in terms of representation learning but demand extensive GPU support, limiting their accessibility. Another emerging observation is the generalization gap across datasets. Many models perform exceptionally well on CICIDS2017 but show reduced performance on ISCX2016, which contains more heterogeneous traffic patterns. This highlights the importance of cross-dataset evaluation and suggests that future AutoML pipelines should incorporate mechanisms for domain adaptation.

4. Discussion: Challenges and Future Research Directions

Although automated network defense has made promising progress, widespread adoption in real-world scenarios still faces several system-level challenges. This paper focuses on three primary issues.

4.1. Computational Scalability and Green AutoML

The computational cost of AutoML, especially NAS methods, is often extremely high, potentially requiring hundreds of GPU-days for a single search. This introduces severe energy consumption and environmental costs. In particular, high-performing transformer-based models often require data-center-grade, high-VRAM GPUs (e.g., an NVIDIA A100 80 GB for training ViT-B/16 models) and multi-GPU distributed training setups (e.g., four NVIDIA GeForce RTX 3090 GPUs for YaTC), exacerbating the scalability and sustainability challenge.
To address this, researchers are exploring several strategies:
1. Efficient Algorithms: Designing more efficient distributed optimization algorithms and incorporating techniques such as early stopping and model compression.
2. Resource-Aware Scheduling: Introducing scheduling strategies that allow AutoML systems to dynamically adjust the search process based on a given budget (a minimal sketch of this pattern follows this list).
3. Green AutoML: Treating energy consumption itself as an optimization objective to balance model performance against energy costs.
4. Hardware-Aware NAS: Employing hardware-aware NAS to optimize models for specific platforms, significantly reducing parameter counts.
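As a minimal sketch of strategies (1) and (2), the loop below respects a wall-clock budget and stops unpromising candidates early once their validation loss plateaus; propose_candidate, partial_train, and validate are hypothetical helpers, and the pattern is illustrative rather than the interface of any named system.

```python
import time

def budget_aware_search(budget_seconds, patience=3):
    """Search until the wall-clock budget is spent, early-stopping each
    candidate whose validation loss stops improving."""
    deadline = time.monotonic() + budget_seconds
    best_model, best_loss = None, float("inf")
    while time.monotonic() < deadline:
        model = propose_candidate()          # hypothetical: next configuration
        stale, last_loss = 0, float("inf")
        while stale < patience and time.monotonic() < deadline:
            partial_train(model, epochs=1)   # hypothetical: one cheap training step
            loss = validate(model)           # hypothetical: validation loss
            stale = stale + 1 if loss >= last_loss else 0
            last_loss = min(last_loss, loss)
        if last_loss < best_loss:
            best_model, best_loss = model, last_loss
    return best_model, best_loss
```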

4.2. Security and Trust in Distributed Learning

While distributed AutoML frameworks improve efficiency, they also introduce new attack surfaces.
First, studies have shown that NAS-generated models may be more vulnerable to adversarial samples. Pang et al. found that because the NAS search process favors models that converge quickly, the resulting architectures are demonstrably more susceptible to adversarial evasion, model poisoning, and backdoor injection than manually designed models [30].
Second, distributed paradigms such as federated learning may suffer from data or model poisoning attacks by malicious participants.
Third, using cloud-based services (e.g., Google AutoML) to train traffic models creates significant privacy risks due to the need to upload sensitive traffic logs.
Future research must embed security and privacy-preserving mechanisms into AutoML pipelines, including the following:
1. Robustness Training: Adopting adversarially robust training and verifiable AutoML methods.
2. Privacy-Preserving Technologies: Integrating differential privacy and homomorphic encryption. However, it is important to note that homomorphic encryption is computationally intense, which can conflict with “Green AutoML” goals.
3. Anti-Poisoning Protocols: Developing collaborative learning protocols that are robust against malicious behavior.

4.3. End-to-End Feature Engineering for NID

As discussed, even state-of-the-art AutoML backends still heavily rely on handcrafted or semi-automated feature engineering. Current tools are primarily optimized for tabular datasets and cannot directly handle raw network traffic (e.g., PCAP traces).

Exploring Universal Feature Extraction Methods

To move toward true end-to-end AutoML for NID, several promising research directions can be explored:
1. Raw Traffic Parsing Pipelines: AutoML frameworks must incorporate standardized modules to directly parse and preprocess raw PCAP data, automatically extracting session-level, packet-level, and flow-level statistics (see the parsing sketch after this list).
2. Cross-Modal Representations: Raw traffic can be transformed into multiple modalities such as sequences, images, and graphs. For instance, sequence-based methods treat traffic as a time series of bytes or packets, suitable for RNNs or transformers; image-based encodings (e.g., FlowPic) reshape packet payloads into matrices, enabling the use of AutoML image pipelines; graph-based methods construct communication graphs, allowing NAS to search over Graph Neural Networks (GNNs).
3. Unsupervised Feature Learning: To reduce reliance on scarce labeled datasets, unsupervised and self-supervised learning methods should be integrated to generate reusable feature embeddings.
4. Parallelized Preprocessing and Feature Selection: Preprocessing of massive network traffic should be parallelized and distributed, with the AutoML system automatically managing feature selection and dimensionality reduction.
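As a minimal illustration of direction (1), the sketch below groups packets from a PCAP trace into bidirectional flows and derives a few simple flow-level statistics with Scapy; the feature set is deliberately tiny and the file name is a placeholder.

```python
from collections import defaultdict
from statistics import mean
from scapy.all import IP, TCP, UDP, rdpcap

def flow_features(pcap_path):
    """Group packets into bidirectional 5-tuple flows and compute
    simple statistics (packet count, bytes, mean size, duration)."""
    flows = defaultdict(list)
    for pkt in rdpcap(pcap_path):
        if IP not in pkt:
            continue
        proto = TCP if TCP in pkt else UDP if UDP in pkt else None
        if proto is None:
            continue
        # Sort the endpoints so both directions map to the same flow key.
        a = (pkt[IP].src, pkt[proto].sport)
        b = (pkt[IP].dst, pkt[proto].dport)
        key = (min(a, b), max(a, b), proto.name)
        flows[key].append((float(pkt.time), len(pkt)))
    return {
        key: {
            "n_packets": len(pkts),
            "total_bytes": sum(size for _, size in pkts),
            "mean_pkt_size": mean(size for _, size in pkts),
            "duration": max(t for t, _ in pkts) - min(t for t, _ in pkts),
        }
        for key, pkts in flows.items()
    }

print(flow_features("sample.pcap"))  # placeholder capture file
```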

5. Conclusions

This paper presented a comprehensive survey and empirical study of Automated Machine Learning (AutoML) for Network Intrusion Detection (NID). We systematically reviewed existing AutoML frameworks, proposed a taxonomy from the perspective of parallel and distributed computing, and benchmarked more than 15 representative methods across widely used datasets such as CICIDS2017, ISCX2016, and USTC-TFC2016. Our experimental results confirm that AutoML-based approaches can achieve, and in some cases surpass, the performance of expert-designed models, particularly when combined with domain-specific traffic representations such as nPrint.
The findings highlight several important observations:
1. AutoML pipelines consistently outperform or match traditional ML and DL approaches, delivering state-of-the-art F1-scores above 99% on CICIDS2017.
2. Distributed and parallel optimization plays a critical role in improving both performance and efficiency, demonstrating the necessity of resource-aware AutoML.
3. Synergy between expert knowledge and automation remains essential; hybrid systems that integrate domain-specific features with AutoML search achieve the best performance.
Nevertheless, challenges remain before AutoML can become a mainstream solution for real-world intrusion detection. These include the following:
1. Computational scalability and sustainability, ensuring AutoML can operate efficiently at the scale of modern networks while reducing environmental and financial costs (Green AutoML).
2. Security and trustworthiness, strengthening AutoML pipelines against poisoning, adversarial attacks, and privacy risks in distributed learning.
3. End-to-end feature automation, designing AutoML frameworks that can directly handle raw traffic, employ cross-modal representations, and integrate unsupervised feature learning.
4. Interpretability and explainability, providing security analysts with transparent insights into model behavior to foster trust and adoption.
Looking ahead, the integration of Green AutoML, secure distributed optimization, and end-to-end traffic feature automation will be pivotal in advancing AutoML-powered NID systems. Achieving these goals will not only improve accuracy and robustness but also enable scalable, interpretable, and sustainable cybersecurity solutions. Ultimately, AutoML has the potential to transform intrusion detection from a manual, expertise-driven process into a fully autonomous, adaptive, and trustworthy component of network defense.

Author Contributions

Conceptualization, H.L. and X.W.; methodology, H.L. and Z.Z.; software, H.L. and Z.Z.; validation, X.W. and F.H.; formal analysis, X.W. and F.H.; investigation, H.L. and Z.Z.; resources, X.W. and F.H.; data curation, H.L. and Z.Z.; writing—original draft preparation, H.L. and Z.Z.; writing—review and editing, X.W. and F.H.; visualization, F.H. and Z.Z.; supervision, X.W. and F.H.; project administration, X.W.; funding acquisition, X.W. and F.H. All authors have read and agreed to the published version of the manuscript.

Funding

Funded by the Open Foundation of Key Laboratory of Cyberspace Security, Ministry of Education of China (No. KLCS20240206).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Dadkhah, S.; Mahdikhani, H.; Danso, P.K.; Zohourian, A.; Truong, K.A.; Ghorbani, A.A. Towards the development of a realistic multidimensional iot profiling dataset. In Proceedings of the 2022 19th Annual International Conference on Privacy, Security & Trust (PST), Fredericton, NB, Canada, 22–24 August 2022; pp. 1–11. [Google Scholar]
  2. Lashkari, A.H.; Kadir, A.F.A.; Taheri, L.; Ghorbani, A.A. Toward developing a systematic approach to generate benchmark android malware datasets and classification. In Proceedings of the 2018 International Carnahan Conference on Security Technology (ICCST), Montreal, QC, Canada, 22–25 October 2018; pp. 1–7. [Google Scholar]
  3. Jian, S.J.; Lu, Z.G.; Du, D.; Jiang, B.; Liu, B.X. Overview of Network Intrusion Detection Technology. J. Cyber Secur. 2020, 5, 96–122. [Google Scholar]
  4. He, X.; Zhao, K.; Chu, X. AutoML: A survey of the state-of-the-art. Knowl.-Based Syst. 2021, 212, 106622. [Google Scholar] [CrossRef]
  5. Zhou, Y.; Shi, H.; Zhao, Y.; Ding, W.; Han, J.; Sun, H.; Zhang, X.; Tang, C.; Zhang, W. Identification of encrypted and malicious network traffic based on one-dimensional convolutional neural network. J. Cloud Comput. 2023, 12, 53. [Google Scholar] [CrossRef]
  6. Wang, W.; Zhu, M.; Wang, J.; Zeng, X.; Yang, Z. End-to-end encrypted traffic classification with one-dimensional convolution neural networks. In Proceedings of the 2017 IEEE International Conference on Intelligence and Security Informatics (ISI), Beijing, China, 22–24 July 2017; pp. 43–48. [Google Scholar]
  7. Ruff, L.; Vandermeulen, R.; Goernitz, N.; Deecke, L.; Siddiqui, S.A.; Binder, A.; Müller, E.; Kloft, M. Deep one-class classification. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018; pp. 4393–4402. [Google Scholar]
  8. Pan, J.; Bulat, A.; Tan, F.; Zhu, X.; Dudziak, L.; Li, H.; Tzimiropoulos, G.; Martinez, B. EdgeViTs: Competing light-weight CNNs on mobile devices with vision transformers. In Proceedings of the ECCV 2022: 17th European Conference (Part XI), Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 294–311. [Google Scholar]
  9. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. Available online: https://openreview.net/forum?id=YicbFdNTTy (accessed on 11 February 2025).
  10. Chu, X.; Tian, Z.; Zhang, B.; Wang, X.; Shen, C. Conditional Positional Encodings for Vision Transformers. Available online: https://openreview.net/forum?id=3KWnuT-R1bh (accessed on 11 February 2025).
  11. Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a few examples: A survey on few-shot learning. Acm Comput. Surv. 2020, 53, 1–34. [Google Scholar] [CrossRef]
  12. Lotfollahi, M.; Jafari Siavoshani, M.; Shirali Hossein Zade, R.; Saberian, M. Deep packet: A novel approach for encrypted traffic classification using deep learning. Soft Comput. 2020, 24, 1999–2012. [Google Scholar] [CrossRef]
  13. Liao, H.J.; Lin, C.H.R.; Lin, Y.C.; Tung, K.Y. Intrusion detection system: A comprehensive review. J. Netw. Comput. Appl. 2013, 36, 16–24. [Google Scholar] [CrossRef]
  14. Thornton, C. Auto-WEKA: Combined Selection and Hyperparameter Optimization of Supervised Machine Learning Algorithms. Ph.D. Thesis, University of British Columbia, Vancouver, BC, Canada, 2014. [Google Scholar]
  15. Feurer, M.; Klein, A.; Eggensperger, K.; Springenberg, J.; Blum, M.; Hutter, F. Efficient and robust automated machine learning. Adv. Neural Inf. Process. Syst. 2015, 28, 2962–2970. [Google Scholar]
  16. Olson, R.S.; Bartley, N.; Urbanowicz, R.J.; Moore, J.H. Evaluation of a tree-based pipeline optimization tool for automating data science. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, Denver, CO, USA, 20–24 July 2016; pp. 485–492. [Google Scholar]
  17. Erickson, N.; Mueller, J.; Shirkov, A.; Zhang, H.; Larroy, P.; Li, M.; Smola, A. Autogluon-tabular: Robust and accurate automl for structured data. arXiv 2020, arXiv:2003.06505. [Google Scholar]
  18. Jin, H.; Song, Q.; Hu, X. Auto-keras: An efficient neural architecture search system. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 1946–1956. [Google Scholar]
  19. Malekghaini, N.; Akbari, E.; Salahuddin, M.A.; Limam, N.; Boutaba, R.; Mathieu, B.; Moteau, S.; Tuffin, S. AutoML4ETC: Automated neural architecture search for real-world encrypted traffic classification. IEEE Trans. Netw. Serv. Manag. 2023, 21, 2715–2730. [Google Scholar] [CrossRef]
  20. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp 2018, 1, 108–116. [Google Scholar]
  21. Holland, J.; Schmitt, P.; Feamster, N.; Mittal, P. New directions in automated traffic analysis. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual, 15–19 November 2021; pp. 3366–3383. [Google Scholar]
  22. Yang, L.; Shami, A. Towards Autonomous Cybersecurity: An Intelligent AutoML Framework for Autonomous Intrusion Detection. In Proceedings of the Workshop on Autonomous Cybersecurity, Rome, Italy, 28 May–1 June 2023; pp. 68–78. [Google Scholar]
  23. Lyu, R.; He, M.; Zhang, Y.; Jin, L.; Wang, X. Network Intrusion Detection Based on an Efficient Neural Architecture Search. Symmetry 2021, 13, 1453. [Google Scholar] [CrossRef]
  24. Draper-Gil, G.; Lashkari, A.H.; Mamun, M.S.I.; Ghorbani, A.A. Characterization of encrypted and vpn traffic using time-related. In Proceedings of the 2nd International Conference on Information Systems Security and Privacy (ICISSP), Rome, Italy, 19–21 February 2016; pp. 407–414. [Google Scholar]
  25. Wang, W.; Zhu, M.; Zeng, X.; Ye, X.; Sheng, Y. Malware traffic classification using convolutional neural network for representation learning. In Proceedings of the 2017 International Conference on Information Networking (ICOIN), Da Nang, Vietnam, 11–13 January 2017; pp. 712–717. [Google Scholar]
  26. Zheng, W.; Gou, C.; Yan, L.; Mo, S. Learning to Classify: A Flow-Based Relation Network for Encrypted Traffic Classification. Proc. Web Conf. 2020, 2020, 13–22. [Google Scholar]
  27. Lin, X.; Xiong, G.; Gou, G.; Li, Z.; Shi, J.; Yu, J. ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification. Proc. ACM Web Conf. 2022, 2022, 633–642. [Google Scholar]
  28. Zhao, R.; Zhan, M.; Deng, X.; Wang, Y.; Wang, Y.; Gui, G.; Xue, Z. Yet Another Traffic Classifier: A Masked Autoencoder Based Traffic Transformer with Multi-Level Flow Representation. Proc. AAAI Conf. Artif. Intell. 2023, 37, 5420–5427. [Google Scholar] [CrossRef]
  29. Piet, J.; Nwoji, D.; Paxson, V. Ggfast: Automating generation of flexible network traffic classifiers. In Proceedings of the ACM SIGCOMM 2023 Conference, New York City, NY, USA, 10–14 September 2023; pp. 850–866. [Google Scholar]
  30. Pang, R.; Xi, Z.; Ji, S.; Luo, X.; Wang, T. On the security risks of AutoML. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, USA, 10–12 August 2022; pp. 3953–3970. [Google Scholar]
Figure 1. Workflow of Auto-Sklearn: the search is first initialized using meta-learning, Bayesian optimization then automatically finds the best combination of data processing steps and models, and finally the top models are integrated into an ensemble that produces the prediction.
Figure 2. AutoGluon’s multi-layer stacking strategy: multiple base models are trained in parallel (the first layer) and their predictions are used as new features for a second-layer model, ultimately combining them into a higher-performance ensemble.
Table 1. Classification of AutoML frameworks based on parallel and distributed mechanisms.

| Paradigm | Example Framework | Core Principle | Parallel Mechanism | NID Applicability Analysis |
| --- | --- | --- | --- | --- |
| Enhanced Sequential Search with Meta-Learning | Auto-WEKA, Auto-Sklearn | Accelerates Bayesian sequential optimization through meta-learning and parallel evaluation | Parallel evaluation of candidate configurations; parallel initialization via meta-learning | Suitable for small- to medium-sized datasets; finds near-optimal traditional ML models but offers limited support for deep learning |
| Intrinsically Parallel Population-Based Search | TPOT | Uses evolutionary algorithms to evaluate a population of candidate ML pipelines in parallel | Embarrassingly parallel (EP) population fitness evaluation | Can explore complex pipeline combinations, but the search is highly stochastic and results may be unstable |
| Large-Scale Parallel Ensemble and Stacking | AutoGluon | Trains a large number of standard models in parallel, then combines them through stacking and ensembling | Model parallelism and data parallelism (K-fold bagging) | Powerful, stable performance and fast training; particularly suitable for tabular flow data, but interpretability is relatively poor |
| Distributed Neural Architecture Search (NAS) | Auto-Keras, AutoML4ETC | Uses a controller–worker architecture to search network structures on a distributed cluster | Distributed training and evaluation of candidate sub-network architectures | Great potential for discovering highly customized deep learning models for traffic data, but the computational cost is extremely high |
Table 2. Characteristics and limitations of key NIDS benchmark datasets.

| Dataset | Year | Primary Use Case | Strengths | Known Limitations and Biases | Implication for Performance Comparison |
| --- | --- | --- | --- | --- | --- |
| CICIDS2017 | 2017 | General-purpose NIDS evaluation | Realistic network topology; diverse, modern attack types; labeled flows with >80 features. | Severe class imbalance; missing labels; systematic labeling errors from the CICFlowMeter tool can mislabel malicious TCP flows as benign. | Near-perfect scores should be viewed critically; performance may reflect overfitting to dataset artifacts rather than generalizable detection. |
| ISCX2016 | 2016 | Encrypted, VPN traffic classification | Labeled VPN vs. non-VPN traffic for various applications; includes full packet captures. | Generated in a highly controlled, sanitized environment with only one application active at a time; lacks realistic background noise. | High performance may not generalize to noisy, real-world networks with concurrent application traffic. |
| USTC-TFC2016 | 2016 | Malicious, application traffic classification | Contains 10 types of malicious traffic from real-world captures and 10 types of normal application traffic. | Primarily used for classification tasks, not anomaly detection like CICIDS2017; less information on capture methodology. | Useful for evaluating classifiers on known malware vs. benign traffic, but less suited for evaluating zero-day anomaly detection systems. |
Table 3. Performance comparison of NID methods on CICIDS2017 and CICDOS2017 datasets.

| Category | Method | Dataset | ACC (%) | Precision (%) | F1-Score (%) |
| --- | --- | --- | --- | --- | --- |
| Traditional ML Models | KNN | CICIDS2017 | 96.30 | 96.20 | 96.30 |
| | DT | CICIDS2017 | 99.61 | 99.61 | 99.60 |
| | RF | CICIDS2017 | 99.71 | 99.71 | 99.71 |
| | ET | CICIDS2017 | 99.24 | 99.25 | 99.24 |
| | XGBoost | CICIDS2017 | 99.75 | 99.75 | 99.75 |
| | LightGBM | CICIDS2017 | 99.77 | 99.77 | 99.76 |
| | CatBoost | CICIDS2017 | 99.55 | 99.55 | 99.55 |
| Standard DL-CNN | LeNet | CICDOS2017 | 98.87 | 96.41 | 88.90 |
| | CNN | CICDOS2017 | 85.99 | 86.25 | 95.20 |
| | ResNet | CICDOS2017 | 98.70 | 98.50 | 98.10 |
| Advanced DL-Transformer | ViT-B/16 | CICIDS2017 | 80.32 | 82.38 | 79.83 |
| AutoML-based Systems | nPrint [21] | CICIDS2017 | 99.90 | 100.00 | 99.90 |
| | Autonomous Cybersecurity [22] | CICIDS2017 | 99.80 | 99.80 | 99.80 |
| | NAS-Net [23] | CICDOS2017 | 99.40 | 99.50 | 99.50 |
| | KNN-AIDS | CICIDS2017 | 99.52 | 99.49 | 99.49 |
| | DL-LSTM | CICIDS2017 | 99.32 | 99.32 | 99.32 |
| | PyDSC-IDS | CICIDS2017 | 97.60 | 90.73 | 94.13 |
| | OE-IDS | CICIDS2017 | 98.00 | 97.30 | 96.70 |
| | PSO-DL | CICIDS2017 | 98.95 | 95.82 | 95.80 |
Table 4. Performance comparison of NID methods on ISCX, USTC-TFC, and other datasets.

| Category | Method | Dataset | ACC (%) | Precision (%) | F1-Score (%) |
| --- | --- | --- | --- | --- | --- |
| Traditional ML Models | AppScanner | ISCX-VPN-App [24] | 47.11 | 52.76 | 46.09 |
| | CUMUL | ISCX-VPN-App | 34.50 | 27.85 | 28.64 |
| Standard DL-CNN | 1DCNN | USTC-TFC2016 [25] | 96.79 | 96.93 | 96.76 |
| | 2DCNN | USTC-TFC2016 | 96.94 | 97.07 | 96.92 |
| | DeepPacketCNN | ISCX2016 | 92.24 | 94.01 | 92.04 |
| | E2ECNN [12] | ISCX2016 | 92.48 | 93.03 | 92.47 |
| Advanced DL-Transformer | FS-Net [26] | ISCX-VPN-Service [24] | 69.51 | 59.98 | 57.08 |
| | ET-BERT [27] | ISCX-VPN-Service | 97.83 | 97.98 | 97.86 |
| | YaTC [28] | ISCX-VPN-Service | 95.72 | 95.70 | 95.71 |
| | HiLo-MAE | ISCX-VPN-Service | 99.19 | 99.20 | 99.19 |
| | ViT-B/16 | USTC-TFC2016 | 73.77 | 75.79 | 73.45 |
| | LGLFormer | USTC-TFC2016 | 99.32 | 99.30 | 99.30 |
| AutoML-based Systems | GGFAST [29] | AUCK-VI | 98.60 | 98.10 | 97.41 |
| | AutoML4ETC [19] | ISCX2016 | 94.35 | 94.87 | 94.40 |
| | UWOrange-H | ISCX2016 | 92.56 | 92.56 | 94.87 |
| | UCDavisCNN | ISCX2016 | 93.82 | 94.01 | 93.82 |