A Rigorous Comparative Study of Supervised Machine Learning Techniques for Network Anomaly Detection: Empirical Insights from the UNSW-NB15 Dataset

Alkhater, Nouf

doi:10.3390/computers15050285

by

Nouf Alkhater

Department of Computer Science and Engineering, College of Computer Science and Engineering, University of Hafr Al Batin, Hafr Al Batin 39524, Saudi Arabia

Computers 2026, 15(5), 285; https://doi.org/10.3390/computers15050285

Submission received: 15 March 2026 / Revised: 27 April 2026 / Accepted: 28 April 2026 / Published: 1 May 2026

(This article belongs to the Special Issue Intelligent Systems Security: AI-Driven Approaches for Attacks, Detection & Explainability)

Abstract

The increasing complexity of modern network infrastructures has intensified the need for reliable and efficient intrusion detection systems. While advanced deep learning approaches have demonstrated strong performance, their high computational cost and limited interpretability restrict their practical deployment in real-time environments. This study presents a systematic empirical evaluation of four supervised machine learning models—Decision Tree, Random Forest, Support Vector Machine (SVM), and XGBoost—for network anomaly detection using the UNSW-NB15 dataset. To ensure methodological rigor, a structured preprocessing pipeline and a five-fold stratified cross-validation framework were employed. Model performance was assessed using multiple evaluation metrics, including accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). In addition, a feature importance analysis was conducted to identify the most influential network traffic attributes contributing to anomaly detection. The results show that ensemble-based methods outperform individual classifiers, with XGBoost achieving the best overall performance (accuracy = 0.97, AUC = 0.98) along with high stability across validation folds. The analysis further reveals that a subset of flow-based and temporal features—such as sttl, sload, and dload—plays a critical role in distinguishing between normal and malicious traffic. This study provides a rigorous, interpretable, and reproducible benchmarking framework for supervised machine learning in network anomaly detection. The findings provide practical insights for developing efficient and scalable intrusion detection systems suitable for real-world deployment.

Keywords:

machine learning; network anomaly detection; intrusion detection system; UNSW-NB15 dataset; cybersecurity; supervised learning; network security

1. Introduction

The rapid expansion of digital infrastructures, cloud services, Internet of Things (IoT) ecosystems, and interconnected enterprise networks has substantially increased the complexity and scale of modern communication environments. Although this transformation has enabled unprecedented levels of connectivity and data exchange, it has also expanded the attack surface and intensified cybersecurity risks. Contemporary networks are continuously exposed to a wide spectrum of malicious activities, including denial-of-service attacks, unauthorized access attempts, reconnaissance operations, and data exfiltration. Detecting such malicious behaviors in a timely and reliable manner has therefore become a fundamental requirement for maintaining secure and resilient network environments [1,2,3].

Traditional intrusion detection systems (IDSs), particularly signature-based approaches, remain effective for identifying previously known attack patterns. However, their dependence on predefined signatures limits their capability to detect zero-day attacks and evolving threat strategies. As a result, anomaly-based detection has received increasing attention because it aims to identify deviations from legitimate network behavior rather than relying solely on known attack fingerprints [2,4,5]. This ability makes anomaly detection especially valuable in dynamic threat landscapes where new and previously unseen attack patterns continue to emerge.

In this context, machine learning (ML) has become one of the most widely adopted paradigms for anomaly-based intrusion detection. By learning discriminative patterns from historical traffic data, ML models can automatically distinguish between normal and malicious activities. A broad range of supervised and unsupervised techniques has been explored in the literature to improve detection performance and reduce false alarm rates. Among these, supervised learning methods remain particularly attractive because of their strong predictive capabilities, relatively mature implementation pipelines, and suitability for labeled benchmark datasets [4,5,6,7].

A central requirement for the development of reliable intrusion detection systems is the availability of appropriate evaluation datasets. Earlier benchmark datasets, such as KDD Cup 99, were extensively used in prior studies; however, they suffer from important limitations, including redundancy, outdated traffic patterns, and unrealistic attack distributions [8]. To overcome these issues, more representative datasets have been proposed. Among them, the UNSW-NB15 dataset has become one of the most widely used benchmarks for modern intrusion detection research because it incorporates contemporary attack scenarios, realistic traffic behavior, and a diverse set of network flow features [9,10,11].

At the same time, recent research has increasingly shifted toward deep learning and hybrid AI-based intrusion detection architectures. These approaches can achieve strong predictive performance, especially in complex and high-dimensional settings. Moreover, the broader cybersecurity and cyber-physical literature has begun to integrate advanced AI paradigms into specialized network environments, including edge computing, vehicular communication systems, and intelligent offloading frameworks. For example, recent studies have explored multi-agent deep reinforcement learning, secure computation offloading, and AI-driven optimization strategies in mobile edge computing (MEC) and vehicular edge computing (VEC) environments [12,13]. While these directions are important and technically sophisticated, they often involve substantial computational overhead, reduced interpretability, and increased deployment complexity.

Despite the progress reported in the literature, an important practical gap remains. In many real-world network security settings, intrusion detection models must satisfy not only predictive accuracy, but also computational efficiency, interpretability, and methodological reliability. However, many existing studies either emphasize complex model architectures without sufficient attention to explainability or report performance using simple train–test splits that may yield unstable estimates. In addition, comparatively fewer studies provide a transparent analysis of which traffic attributes actually drive detection performance in classical supervised models. Consequently, there is a clear need for a rigorous and interpretable benchmarking study that evaluates well-established supervised algorithms under a more reliable validation framework.

This study addresses that need through a systematic comparative evaluation of four supervised machine learning models—Decision Tree, Random Forest, Support Vector Machine (SVM), and XGBoost—using the UNSW-NB15 dataset for binary network anomaly detection. Unlike studies that rely solely on a single random split, the present work adopts a five-fold stratified cross-validation protocol to provide more stable and statistically informative performance estimates. In addition, the study reports multiple evaluation metrics, including accuracy, precision, recall, F1-score, and AUC, and complements predictive evaluation with feature importance analysis to improve interpretability. In this way, the study does not claim algorithmic novelty; rather, its contribution lies in delivering a more rigorous, transparent, and practically relevant empirical baseline for classical supervised learning in intrusion detection.

Main Contributions

The main contributions of this study are as follows:

Rigorous comparative evaluation: The study provides a systematic comparison of four widely used supervised machine learning models (Decision Tree, Random Forest, SVM, and XGBoost) for binary network anomaly detection using the UNSW-NB15 benchmark dataset.
Robust validation framework: A five-fold stratified cross-validation protocol is employed to generate more reliable performance estimates and reduce the bias associated with a single train–test split evaluation.
Transparent methodological reporting: The study presents a clear preprocessing pipeline, model configuration details, and evaluation criteria to improve reproducibility and experimental transparency.
Multi-metric performance assessment: Model performance is analyzed using accuracy, precision, recall, F1-score, and AUC in order to capture different aspects of classification quality in intrusion detection tasks.
Interpretability-oriented analysis: A feature importance analysis is conducted to identify the network traffic attributes that most strongly influence anomaly detection performance, thereby improving the practical explainability of the evaluated models.
Practical baseline for deployment-oriented research: The findings establish an interpretable and computationally feasible benchmark that can support future work on real-time, lightweight, and operational intrusion detection systems.

2. Literature Review

2.1. Machine Learning and Deep Learning in Network Intrusion Detection

Network anomaly detection has become a central research focus due to the escalating complexity of modern network infrastructures and the continuous evolution of cyber threats. Machine learning (ML) techniques are widely adopted in this domain because of their ability to model complex traffic patterns and identify deviations from normal behavior [1,14]. Unlike traditional signature-based systems—which are limited to known attack patterns—ML-based approaches enable the detection of previously unseen or zero-day attacks by learning discriminative patterns from data [2,12]. While deep learning approaches are discussed to provide a comprehensive overview of recent advancements, the primary focus of this study remains on the rigorous evaluation of classical supervised machine learning models.

Supervised learning algorithms, including Decision Trees (DTs), Random Forests (RFs), and Support Vector Machines (SVMs), have demonstrated strong classification capabilities in intrusion detection tasks. These models are particularly effective when labeled datasets are available, allowing them to learn decision boundaries that distinguish between legitimate and malicious traffic [4,5,15,16,17]. However, many existing studies using these algorithms focus primarily on reporting accuracy improvements without sufficiently analyzing model stability, generalization behavior, or interpretability.

In parallel, deep learning (DL) techniques have been increasingly explored for network anomaly detection. Neural networks are capable of capturing complex nonlinear relationships in high-dimensional traffic data, making them suitable for detecting sophisticated attack patterns [3,8,18,19]. Hybrid approaches that combine ML and DL have also been proposed to enhance detection performance [9]. Nevertheless, a critical limitation persists: despite their strong predictive performance, DL-based methods often introduce high computational overhead and operate as “black-box” models, limiting their transparency and practical applicability in real-time intrusion detection systems [9,20].

Beyond conventional architectures, recent research has expanded toward advanced AI-driven paradigms in cyber-physical and edge environments. Techniques such as multi-agent deep reinforcement learning (DRL), intelligent resource allocation, and secure computation offloading have been investigated in mobile edge computing (MEC) and vehicular edge computing (VEC) scenarios [14]. While these approaches address complex optimization problems and dynamic network conditions, they further amplify concerns related to computational cost, system complexity, and lack of interpretability. This trend reinforces the need for reliable and efficient baseline models that can deliver strong performance while remaining transparent and deployable [13,21].

2.2. Intrusion Detection Using the UNSW-NB15 Dataset

The choice of dataset plays a critical role in evaluating intrusion detection models. Earlier datasets, such as KDD Cup 99, have been widely used but suffer from limitations including redundancy and outdated traffic patterns [8]. To address these shortcomings, the UNSW-NB15 dataset was introduced to provide a more realistic representation of modern network traffic, incorporating diverse attack categories and meaningful flow-based features [10,22,23].

Recent studies using the UNSW-NB15 dataset have explored various machine learning models, feature selection techniques, and optimization strategies [24,25,26]. For example, prior work has demonstrated that classical supervised models can achieve strong detection performance on this dataset [27,28]. However, these studies often focus on improving specific metrics or applying feature reduction techniques, rather than conducting a systematic and statistically robust comparison across multiple models under consistent evaluation conditions.

As a result, while the dataset has been extensively used, there remains a lack of comprehensive benchmarking studies that simultaneously address model performance, stability, and interpretability.

2.3. Research Gap and Motivation

Despite the substantial body of research in network anomaly detection, several limitations remain evident in the existing literature. First, many studies rely on computationally intensive deep learning or hybrid architectures that, while effective, lack transparency and are difficult to deploy in real-time environments. Second, a significant portion of prior work evaluates model performance using simple train–test splits, which may lead to unstable or biased estimates of generalization performance. Third, limited attention has been given to understanding which network traffic features contribute most to improving detection performance, reducing the interpretability of the resulting models.

These limitations highlight the need for a more rigorous and transparent evaluation of classical supervised learning approaches. In particular, there is a clear research gap in providing a statistically reliable benchmarking framework that evaluates model performance across multiple data partitions while also offering interpretable insights into feature importance.

In addition, although feature selection has been explored in previous studies [29,30], there remains a lack of qualitative analysis explaining why specific features—such as time-to-live (TTL) attributes or traffic load indicators—are particularly discriminative in identifying anomalous behavior. Understanding these relationships is essential for developing efficient and interpretable intrusion detection systems.

To address these gaps, this study proposes a systematic evaluation of supervised machine learning models using a five-fold stratified cross-validation framework, combined with feature importance analysis. The objective is not to introduce a new algorithm, but to establish a reliable, interpretable, and statistically robust performance baseline that can support both research and practical deployment.

To contextualize these efforts, Table 1 summarizes representative studies, highlighting their methodological focus and the specific gaps that the current study seeks to address.

3. Materials and Methods

3.1. Study Overview

This study aims to systematically evaluate the effectiveness of supervised machine learning techniques for network anomaly detection using structured network traffic data. The proposed framework focuses on identifying anomalous behavior by learning discriminative patterns from network flow features extracted from a benchmark intrusion detection dataset.

The anomaly detection task is formulated as a binary classification problem, in which each network traffic instance is labeled as either normal or malicious. This formulation is widely adopted in intrusion detection research because it provides a clear and operationally relevant distinction between legitimate and potentially harmful activities, while maintaining a tractable modeling framework.

To ensure methodological rigor and reproducibility, the study follows a structured experimental pipeline consisting of four main stages:

Data Acquisition: Network traffic data are obtained from the UNSW-NB15 dataset, a widely used benchmark that captures realistic modern network behavior and diverse attack scenarios.
Data Preprocessing: The raw dataset is processed through a series of transformation steps, including categorical encoding, feature normalization, and removal of non-informative attributes, to produce a consistent and model-ready feature space (see Section 3.3).
Model Development and Training: Multiple supervised machine learning algorithms—Decision Tree, Random Forest, Support Vector Machine (SVM), and XGBoost—are implemented and trained on the preprocessed dataset using a unified experimental configuration.
Performance Evaluation: Model performance is evaluated using a 5-fold stratified cross-validation framework to ensure robust and unbiased estimation. The evaluation is based on multiple metrics, including accuracy, precision, recall, F1-score, and area under the ROC curve (AUC), providing a comprehensive assessment of classification performance.

These stages collectively define a reproducible and statistically grounded methodology for assessing the effectiveness of supervised learning models in network anomaly detection. The overall workflow of the proposed framework is illustrated in Figure 1.

3.2. Dataset Description

The experimental evaluation in this study was conducted using the UNSW-NB15 dataset, a widely adopted benchmark for network intrusion detection research. This dataset was generated using the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security, with the aim of simulating realistic modern network traffic and contemporary attack scenarios.

The dataset consisted of approximately 257,000 network flow records, each described by 49 features capturing different aspects of network behavior. These features included a combination of basic packet attributes, content-related characteristics, temporal statistics, and flow-based metrics. Such diversity enabled the machine learning models to capture both low-level and high-level patterns in network traffic.

For clarity, the features in the UNSW-NB15 dataset can be categorized into four main groups, as summarized in Table 2.

The categorization of features into basic, content, time-based, and flow-based groups is essential for understanding how different types of network attributes contribute to anomaly detection. In particular, flow-based and time-based features are expected to play a critical role in identifying malicious behavior, as they capture dynamic traffic patterns and temporal variations in network activity.

The dataset includes multiple categories of cyber attacks, such as denial-of-service (DoS), exploitation, reconnaissance, and generic malicious activities. This diversity makes it suitable for evaluating intrusion detection systems under a variety of attack conditions.

In this study, the anomaly detection problem was formulated as a binary classification task. All the attack categories were aggregated into a single class labeled as malicious (1), while legitimate traffic was labeled as normal (0). This formulation is commonly adopted in intrusion detection research to simplify the classification task while maintaining a realistic distinction between benign and harmful traffic.
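The aggregation of attack categories into one malicious class can be sketched in a few lines. This is only an illustration: the category spellings and the helper name `to_binary_label` are assumptions, and the public CSV files are commonly distributed with a binary label column already present, in which case this step may be unnecessary.

```python
# Category names are illustrative; the text lists DoS, exploitation,
# reconnaissance, and generic activity among the attack categories.
NORMAL_NAME = "Normal"  # assumed spelling of the benign category


def to_binary_label(attack_category: str) -> int:
    """Return 0 for normal traffic and 1 for any attack category."""
    return 0 if attack_category.strip().lower() == NORMAL_NAME.lower() else 1


categories = ["Normal", "DoS", "Exploits", "Reconnaissance", "Generic", "Normal"]
labels = [to_binary_label(c) for c in categories]
print(labels)  # [0, 1, 1, 1, 1, 0]
```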

3.3. Data Preprocessing

To ensure the reliability, consistency, and reproducibility of the experimental results, a structured multi-stage preprocessing pipeline was applied to the UNSW-NB15 dataset.

Initially, the dataset was examined for missing values and redundant records. The clean partition of the UNSW-NB15 dataset was found to be internally consistent, and no imputation was required. Non-informative attributes, specifically the id column, were removed as they do not contribute to the classification task.
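A minimal sketch of these integrity checks with pandas is given here. The tiny frame and its values are invented for illustration, and the exact checks used in the study are not reported, so the calls shown are one plausible way to do it rather than the author's procedure.

```python
import pandas as pd

# Tiny hand-made frame standing in for the real data; values are invented.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "dur": [0.12, 0.65, 0.12, 1.62],
    "proto": ["tcp", "udp", "tcp", "tcp"],
    "label": [0, 1, 0, 1],
})

# Total number of missing cells across the whole frame.
missing_cells = int(df.isna().sum().sum())

# The id column is a row counter, so it is dropped before modelling.
features = df.drop(columns=["id"])

# Exact repeated rows, counted after dropping id (id makes every row unique).
duplicate_rows = int(features.duplicated().sum())

print(missing_cells, duplicate_rows, list(features.columns))
```

Counting duplicates after removing the identifier matters, because a unique id would otherwise hide repeated records.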

3.3.1. Categorical Feature Encoding

The dataset contains three categorical features: proto, service, and state. These attributes were transformed into numerical representations using Label Encoding, which maps each categorical value to an integer. This approach was adopted to maintain a consistent feature space across all the models and to avoid high-dimensional expansion associated with one-hot encoding, particularly given the large dataset size.
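As a small illustration of what Label Encoding does, the sketch below applies scikit-learn's `LabelEncoder` to a handful of invented `proto` values. Whether the study used this class or another integer-mapping utility is not stated, so treat the choice as an assumption.

```python
from sklearn.preprocessing import LabelEncoder

proto_values = ["tcp", "udp", "tcp", "arp"]  # invented sample values

encoder = LabelEncoder()
proto_codes = encoder.fit_transform(proto_values)

# Classes are stored in sorted order, so codes follow alphabetical position.
print(encoder.classes_.tolist())  # ['arp', 'tcp', 'udp']
print(proto_codes.tolist())       # [1, 2, 1, 0]
```

The resulting integers carry an arbitrary (alphabetical) order. Tree-based models can usually split around such codes, whereas a distance-based model such as SVM may treat the order as meaningful, which is worth keeping in mind when comparing results.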

3.3.2. Feature Scaling

To ensure that features with larger numerical ranges do not dominate the learning process, all the numerical attributes were normalized using Min–Max scaling, mapping feature values to the range [0, 1]. This step is particularly important for distance-based and margin-based algorithms such as SVM, as well as for improving convergence in gradient-based models.
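Min–Max scaling maps each value x to (x - min) / (max - min), computed per feature. The sketch below implements this with NumPy and learns the minimum and range from training rows only; the text does not say whether scaling was fitted on the full dataset or within each fold, so the train-only fit and the helper names are assumptions.

```python
import numpy as np


def min_max_fit(train: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Column-wise minimum and range, learned from the training rows only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns map to 0, no division by zero
    return col_min, col_range


def min_max_apply(x: np.ndarray, col_min: np.ndarray, col_range: np.ndarray) -> np.ndarray:
    """Apply previously learned scaling parameters to new rows."""
    return (x - col_min) / col_range


train = np.array([[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]])
held_out = np.array([[25.0, 700.0]])

col_min, col_range = min_max_fit(train)
print(min_max_apply(train, col_min, col_range).tolist())     # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
print(min_max_apply(held_out, col_min, col_range).tolist())  # [[0.75, 1.25]]
```

Held-out values can fall outside [0, 1] when they exceed the training range, as the second feature does here.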

3.3.3. Final Feature Representation

After preprocessing, the dataset consisted of 48 input features (following the removal of the id attribute), all represented in numerical form. The target variable was encoded as a binary label, where 0 denotes normal traffic and 1 denotes malicious traffic.

3.3.4. Hyperparameter Configuration

To ensure a fair and consistent comparison, all the models were implemented using a unified configuration with standard and commonly adopted parameter settings. The selected hyperparameters are summarized as follows:

Decision Tree: max_depth = 10.
Random Forest: n_estimators = 100.
Support Vector Machine (SVM): RBF kernel with default regularization parameter (C = 1.0).
XGBoost: default configuration with eval_metric = ‘logloss’.

It is important to note that the objective of this study is to provide a robust and interpretable comparative baseline, rather than to perform extensive hyperparameter optimization. Therefore, default or commonly used parameter settings were adopted to ensure reproducibility and fair comparison across models.
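The listed settings could be expressed with scikit-learn and XGBoost roughly as follows. The seed value, the `build_models` helper, and the lazy XGBoost import are assumptions added for illustration; every parameter not named in the list above is left at the library default.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

SEED = 42  # hypothetical; the text does not report a random seed


def build_models(include_xgboost: bool = True) -> dict:
    """Return the classifiers with the hyperparameters listed in the text."""
    models = {
        "Decision Tree": DecisionTreeClassifier(max_depth=10, random_state=SEED),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=SEED),
        "SVM": SVC(kernel="rbf", C=1.0),
    }
    if include_xgboost:
        # Imported lazily because XGBoost is a separate package from scikit-learn.
        from xgboost import XGBClassifier
        models["XGBoost"] = XGBClassifier(eval_metric="logloss", random_state=SEED)
    return models
```

`SVC` exposes `decision_function`, so ROC-based scoring works without enabling probability estimates.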

3.3.5. Cross-Validation Strategy

To obtain statistically reliable performance estimates, a 5-fold stratified cross-validation procedure was employed. In this approach, the dataset is partitioned into five mutually exclusive subsets while preserving the class distribution in each fold. Each model is trained and evaluated five times, with each subset used once as a validation set and the remaining subsets used for training.

This strategy mitigates the bias associated with single train–test splits and provides a more robust estimation of model generalization performance. The final reported results correspond to the mean and standard deviation of each evaluation metric across the five folds.
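A minimal sketch of this protocol with scikit-learn is shown below. It runs on synthetic data rather than UNSW-NB15, so the printed numbers say nothing about the study's results. Placing the scaler inside a pipeline, the shuffling, the seeds, and the class weights are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed feature matrix; not UNSW-NB15 data.
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.4, 0.6], random_state=0
)

# The scaler inside the pipeline is refit on each training split (an assumption).
model = make_pipeline(
    MinMaxScaler(), DecisionTreeClassifier(max_depth=10, random_state=0)
)

# Five stratified folds: each fold keeps roughly the overall class ratio.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]
scores = cross_validate(model, X, y, cv=cv, scoring=scoring)

# Mean and standard deviation of each metric across the five validation folds.
summary = {
    m: (scores[f"test_{m}"].mean(), scores[f"test_{m}"].std()) for m in scoring
}
for metric, (mean, std) in summary.items():
    print(f"{metric:>9}: {mean:.3f} +/- {std:.3f}")
```

NumPy's `std` defaults to the population form (`ddof=0`); the text does not say which form was reported, so `ddof=1` may be the better match if sample standard deviation is intended.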

3.3.6. Implementation Environment

The experiments were implemented using the Python programming language (Python 3.10). Data preprocessing and analysis were conducted using pandas (v1.5) and NumPy (v1.23). Machine learning models were developed and evaluated using scikit-learn (v1.3), while the XGBoost model was implemented using the XGBoost library (v1.7). All experiments were executed in a reproducible computational environment to ensure consistency across model training and evaluation stages.

3.3.7. Experimental Workflow

The overall workflow and architecture of the proposed anomaly detection framework are illustrated in Figure 1.

3.4. Supervised Machine Learning Models

This study evaluates the performance of four widely used supervised machine learning algorithms for network anomaly detection: Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and XGBoost. These models were selected to represent a diverse set of learning paradigms, including tree-based methods, ensemble learning, margin-based classification, and gradient boosting.

The selection of these models is motivated by two key considerations. First, they are among the most frequently used algorithms in intrusion detection research, enabling meaningful comparison with existing studies. Second, they offer varying trade-offs between predictive performance, computational efficiency, and interpretability, which are critical factors in practical cybersecurity applications.

Decision Tree (DT): Decision Trees provide a simple and interpretable classification mechanism based on hierarchical decision rules. In the context of network anomaly detection, DT serves as a baseline model that allows direct interpretation of how specific traffic features contribute to classification decisions.
Random Forest (RF): Random Forest extends Decision Trees through an ensemble learning approach, combining multiple trees to reduce variance and improve generalization. Its robustness to noise and ability to handle high-dimensional data make it particularly suitable for network traffic analysis.
Support Vector Machine (SVM): SVM constructs a decision boundary that maximizes the margin between classes in the feature space. Using the RBF kernel, the model is capable of capturing nonlinear relationships in network traffic data. However, its performance may depend on kernel configuration and sensitivity to feature scaling.
XGBoost: XGBoost is a gradient boosting algorithm that iteratively refines predictions by correcting errors from previous models. Its ability to capture complex feature interactions and optimize performance efficiently makes it a strong candidate for anomaly detection tasks involving high-dimensional data.
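To illustrate how tree ensembles expose the contribution of individual features, the sketch below reads impurity-based importances from a Random Forest. The column names are borrowed from the text, but the values are random and the label is constructed to depend only on the first column, so the ranking is true by design and says nothing about UNSW-NB15. Whether the study used impurity-based importance or another method is not stated here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = ["sttl", "sload", "dload"]  # names from the text; values are random
X = rng.random((600, 3))
y = (X[:, 0] > 0.5).astype(int)  # label depends only on the first column

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Importances are normalised to sum to 1; sort from most to least important.
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```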

Rather than focusing on introducing new models, this study emphasizes a rigorous and controlled comparative evaluation of these established algorithms under a unified experimental framework. All the models were trained and evaluated using the same preprocessed dataset and validation strategy to ensure fair comparison.

To ensure statistical reliability, all the models were evaluated using a 5-fold stratified cross-validation approach. Performance was assessed using multiple evaluation metrics, including accuracy, precision, recall, F1-score, and area under the ROC curve (AUC), allowing for a comprehensive comparison of classification effectiveness across models.

A summary of the key characteristics of the evaluated models is provided in Table 3.

3.5. Evaluation Metrics

To evaluate the performance of the machine learning models, a set of widely used classification metrics was employed. These metrics provide a comprehensive quantitative assessment of the models’ ability to correctly distinguish between normal and malicious network traffic.

The evaluation framework includes accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC). The use of multiple metrics is essential in intrusion detection systems, as class imbalances and asymmetric misclassification costs can affect model evaluation.

▪ Accuracy: Accuracy measures the proportion of correctly classified instances among the total number of observations:

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)

where TP (true positive) represents correctly identified malicious instances, TN (true negative) represents correctly identified normal instances, FP (false positive) represents normal instances incorrectly classified as malicious, and FN (false negative) represents malicious instances incorrectly classified as normal.

▪ Precision: Precision measures the proportion of correctly predicted attack instances among all the instances predicted as attacks. High precision indicates a low false alarm rate, which is critical in intrusion detection systems to avoid unnecessary alerts.

\text{Precision} = \frac{TP}{TP + FP} \qquad (2)

▪ Recall: Recall measures the proportion of actual malicious instances that are correctly identified by the model:

\text{Recall} = \frac{TP}{TP + FN} \qquad (3)

In cybersecurity applications, recall is particularly important because failing to detect an actual attack (false negative) can have severe consequences.

▪ F1-Score: The F1-score is the harmonic mean of precision and recall, offering a balanced evaluation of the model’s classification performance.

\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (4)
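Equations (1) to (4) can be checked with a short worked example. The confusion-matrix counts below are invented for illustration and are not results from the study; the function name and the zero-division guards are assumptions.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # guard: no predicted attacks
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # guard: no actual attacks
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented counts: 100 attacks and 900 normal flows.
m = classification_metrics(tp=90, tn=880, fp=20, fn=10)
print({k: round(v, 4) for k, v in m.items()})
# {'accuracy': 0.97, 'precision': 0.8182, 'recall': 0.9, 'f1': 0.8571}
```

With these counts the accuracy of 0.97 sits well above the precision of about 0.82, which shows why a single metric can be misleading when one class dominates.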

▪ Area Under the ROC Curve (AUC): The area under the receiver operating characteristic curve (AUC) measures the model’s ability to discriminate between classes across different classification thresholds. It reflects the trade-off between the True Positive Rate (TPR) and False Positive Rate (FPR). A higher AUC value indicates better overall separability between normal and malicious traffic. Unlike accuracy, AUC is less sensitive to class imbalance and provides a more robust evaluation of model performance in intrusion detection scenarios.
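One way to make the AUC concrete is its pairwise interpretation: it equals the probability that a randomly chosen malicious instance receives a higher score than a randomly chosen normal one, with ties counted as half. The sketch below computes it that way on four invented scores; the function name is hypothetical, and the quadratic loop is only meant for small examples.

```python
def auc_by_pairs(labels: list[int], scores: list[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one instance of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Invented scores: one malicious flow (0.35) is ranked below a normal one (0.4).
print(auc_by_pairs([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

For datasets of realistic size, a rank-based routine such as scikit-learn's `roc_auc_score` is the practical choice.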

4. Experimental Setup and Results

4.1. Experimental Setup

This section presents the experimental configuration used to evaluate the performance of the selected supervised machine learning models for network anomaly detection. All the experiments were conducted using the UNSW-NB15 dataset, following the preprocessing pipeline described in Section 3.3.

The anomaly detection task was formulated as a binary classification problem, where network traffic instances were categorized as either normal or malicious. All the attack categories in the dataset were aggregated into a single class to provide a simplified yet practically relevant evaluation scenario.

To ensure robust and statistically reliable performance estimation, a five-fold stratified cross-validation procedure was employed for all the models. In this setup, the dataset is partitioned into five folds while preserving class distribution, and each model is trained and evaluated across five iterations. This approach reduces the bias associated with single train–test splits and provides a more reliable estimate of model generalization performance.

The experimental evaluation included four supervised machine learning algorithms—Decision Tree, Random Forest, Support Vector Machine (SVM), and XGBoost—as described in Section 3.4. All the models were trained using the same preprocessed feature set and evaluated under identical conditions to ensure a fair and consistent comparison.

Performance was assessed using multiple evaluation metrics, including accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). These metrics collectively capture different aspects of classification performance, including the overall correctness, detection capability, and robustness to class imbalance. A summary of the experimental configuration is provided in Table 4.

4.2. Experimental Results

After training the supervised machine learning models on the preprocessed UNSW-NB15 dataset, their performance was evaluated using the metrics defined in Section 3.5. To ensure statistical robustness, all the results were obtained using a five-fold stratified cross-validation framework, and the reported values corresponded to the mean and standard deviation across the folds.

The comparative performance of the evaluated models is presented in Table 5, which summarizes the results in terms of accuracy, precision, recall, F1-score, and AUC.

The results demonstrate that all the evaluated models achieve strong classification performance, with accuracy values exceeding 90% across all cases. However, clear performance differences can be observed among the models.

XGBoost achieves the highest performance across all the evaluation metrics, including an accuracy of 0.97 and an AUC of 0.98, indicating superior ability to distinguish between normal and malicious traffic. In addition to its high mean performance, XGBoost also exhibits the lowest standard deviation among all the models, suggesting more stable and consistent behavior across different data splits.

Random Forest also demonstrates strong and reliable performance, with accuracy reaching 0.96 and relatively low variance. This indicates that ensemble-based approaches are particularly effective in handling the complexity and variability of network traffic data.

In contrast, the Decision Tree and SVM models show comparatively lower performance. While Decision Tree maintains reasonable accuracy with moderate variance, its simpler structure limits its ability to capture complex feature interactions. The SVM model exhibits the highest variability among the evaluated models, as reflected in its larger standard deviation values, indicating less stable performance across folds.

Overall, the results highlight two key observations:

(1) Ensemble-based models consistently outperform individual classifiers in network anomaly detection tasks.
(2) The use of cross-validation provides not only performance estimates but also insights into model stability, which is critical for practical deployment.

4.3. Comparative Analysis of the Results

The results presented in Table 5 demonstrate that all the evaluated models achieve strong performance in detecting anomalous network traffic, with accuracy values consistently exceeding 90%. However, clear differences emerge in terms of predictive performance and stability across models.

Among the evaluated algorithms, XGBoost achieves the best overall performance, obtaining the highest values across all the evaluation metrics, including accuracy, precision, recall, F1-score, and AUC. In addition to its superior mean performance, XGBoost exhibits the lowest standard deviation, indicating consistent behavior across different cross-validation folds. This stability suggests that the model is less sensitive to variations in training data and is better suited for generalization.

The Random Forest model also demonstrates strong and reliable performance, achieving high scores across all the metrics with relatively low variance. Although its performance is slightly lower than that of XGBoost (by about 0.01 on every metric in Table 5), Random Forest remains a robust alternative due to its ability to reduce variance through ensemble averaging.

In contrast, the Decision Tree model, while achieving competitive accuracy, shows slightly higher variability and lower recall compared to ensemble methods. This indicates a reduced ability to capture complex interactions among network traffic features. However, its interpretability and low computational cost make it suitable for scenarios where transparency and efficiency are prioritized.

The Support Vector Machine (SVM) model achieves the lowest accuracy, precision, recall, and F1-score among the evaluated models (its AUC of 0.940 matches that of the Decision Tree) and exhibits the highest variability across folds. This suggests that its performance is more sensitive to data distribution and parameter configuration. While SVM is effective in high-dimensional settings, its limitations become evident when handling complex, nonlinear feature interactions present in network traffic data.

Overall, these results confirm that ensemble-based approaches consistently outperform individual classifiers in network anomaly detection tasks. This finding aligns with previous studies [27,28,33,34,35,36,37], which highlight the effectiveness of ensemble methods in capturing complex feature interactions and improving generalization performance.

4.3.1. Confusion Matrix Analysis

To further evaluate the classification behavior of the best-performing model, a confusion matrix analysis was conducted for XGBoost, as presented in Table 6.

The confusion matrix reveals that XGBoost achieves high true positive and true negative rates (96.8% and 97.5%, respectively), indicating strong capability in correctly identifying both malicious and normal traffic. In particular, the low false negative rate (3.2%) is critical in intrusion detection systems, where failing to detect an actual attack can have severe consequences.

At the same time, the relatively low false positive rate (2.5%) indicates that the model maintains a reasonable balance between detection sensitivity and false alarm control. This balance is essential in practical deployment scenarios, where excessive false alerts can overwhelm security analysts.

These results further support the effectiveness of ensemble boosting methods in capturing subtle patterns in network traffic and delivering both high detection accuracy and operational reliability.
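The row-normalized rates in Table 6 can be turned into derived quantities with a few lines of arithmetic. The sketch below is a worked illustration: the rates are copied from Table 6, while the prevalence values (the share of malicious traffic) are hypothetical, since a row-normalized matrix alone does not fix the class ratio.

```python
# Row-normalized rates copied from Table 6 (XGBoost).
tnr, fpr = 0.975, 0.025   # row "Actual: Normal"
fnr, tpr = 0.032, 0.968   # row "Actual: Malicious"

# Balanced accuracy is the mean of the two per-class recall values.
balanced_accuracy = (tpr + tnr) / 2


def precision_at(prevalence):
    """Precision implied by the Table 6 rates for a given share of malicious traffic."""
    true_alerts = prevalence * tpr          # malicious flows correctly flagged
    false_alerts = (1 - prevalence) * fpr   # normal flows flagged by mistake
    return true_alerts / (true_alerts + false_alerts)


print(f"balanced accuracy: {balanced_accuracy:.4f}")
# Hypothetical prevalence values, chosen only to show the trend.
for p in (0.5, 0.1, 0.01):
    print(f"malicious share {p:.2f}: implied precision {precision_at(p):.3f}")
```

With fixed per-class rates, the share of alerts that are genuine falls as attacks become rarer, which is why the false positive rate matters for analyst workload in deployment.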

4.3.2. Error Analysis

Building on the confusion matrix analysis, an error-oriented evaluation was conducted to further examine model performance beyond aggregate metrics.

The results indicate that the model achieves a low false negative rate (3.2%), suggesting strong capability in detecting malicious traffic. However, a small proportion of malicious instances remain misclassified as normal traffic.

These misclassifications can be more specifically attributed to similarities in traffic behavior between certain attack instances and normal network activity. For example, low-intensity reconnaissance or slow-rate probing attacks may generate traffic patterns that closely resemble benign traffic in terms of packet size, flow duration, and transmission rate.

As a result, features such as sload and dload may not exhibit sufficiently distinct values to clearly separate these instances, leading to classification ambiguity. This indicates that misclassification is not purely random but is influenced by intrinsic overlap in feature space, particularly for subtle or low-profile attack behaviors.

Similarly, the false positive rate (2.5%) reflects that a limited number of normal traffic instances are incorrectly identified as malicious. While relatively low, such errors highlight the inherent trade-off between detection sensitivity and false alarm control in intrusion detection systems.
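The trade-off between detection sensitivity and false alarm control can be illustrated by sweeping the decision threshold applied to a classifier's scores. The scores below are made up for the example and do not come from the study; the point is only how the two error rates move in opposite directions.

```python
# Hypothetical classifier scores (higher means "more likely malicious").
normal_scores = [0.05, 0.10, 0.20, 0.30, 0.45, 0.55]
malicious_scores = [0.35, 0.50, 0.60, 0.80, 0.90, 0.95]


def error_rates(threshold):
    """Return (false positive rate, false negative rate) at a given threshold."""
    # A flow is flagged as malicious when its score reaches the threshold.
    false_positives = sum(s >= threshold for s in normal_scores)
    false_negatives = sum(s < threshold for s in malicious_scores)
    return (false_positives / len(normal_scores),
            false_negatives / len(malicious_scores))


sweep = {t: error_rates(t) for t in (0.3, 0.5, 0.7)}
for t, (fp_rate, fn_rate) in sweep.items():
    print(f"threshold {t:.1f}: FPR={fp_rate:.2f}, FNR={fn_rate:.2f}")
```

Lowering the threshold catches more attacks at the cost of more false alarms, and raising it does the reverse, so the operating point is a deployment decision rather than a property of the model alone.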

It is important to note that this study adopts a binary classification setting, where all the attack categories are aggregated into a single class. As a result, detailed analysis of misclassification across specific attack types is beyond the scope of the current experimental design. Future work may extend this analysis to multi-class classification settings to enable a more granular understanding of model limitations across different attack categories.

4.4. Visualization of Model Performance

To provide a clearer visual comparison of the evaluated machine learning models, Figure 2 illustrates the cross-validated performance achieved by each model.

While Figure 2 visually represents the comparative performance summarized in Table 5, it provides additional insight into the relative differences among models. In particular, the figure highlights the consistent performance advantage of XGBoost across all the evaluation metrics, as well as the narrower performance gap between Random Forest and XGBoost compared to the other models.

The visualization also facilitates a clearer interpretation of performance trends, especially in terms of variance and relative ranking, which may be less immediately apparent in tabular form.

4.5. Feature Importance Analysis

To provide deeper insight into the decision-making behavior of the evaluated models, a feature importance analysis was conducted. This analysis aims to identify the most influential network traffic attributes contributing to anomaly detection performance within the UNSW-NB15 dataset.

For the tree-based models (Decision Tree, Random Forest, and XGBoost), feature importance was computed using Gini importance (mean decrease in impurity), which quantifies the contribution of each feature to reducing classification uncertainty. For the SVM model, due to the use of the RBF kernel, permutation importance was employed to assess feature relevance by measuring the impact of feature value shuffling on model performance.
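The two importance measures named above can be sketched with scikit-learn. This is an assumption-laden illustration rather than the study's procedure: the data are synthetic (only the first two columns carry signal), the feature names are placeholders, a Random Forest stands in for the tree-based family, and the hold-out split is chosen just for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data: with shuffle=False the informative features are columns 0 and 1.
X, y = make_classification(n_samples=800, n_features=6, n_informative=2,
                           n_redundant=0, class_sep=2.0, shuffle=False,
                           random_state=0)
feature_names = [f"feat_{i}" for i in range(X.shape[1])]  # placeholder names
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Impurity-based (Gini / mean decrease in impurity) importance from a tree ensemble.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
mdi = forest.feature_importances_

# Permutation importance for an RBF-kernel SVM: the drop in held-out score
# when one feature column is shuffled, averaged over several repeats.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)
perm = permutation_importance(svm, X_test, y_test, n_repeats=10, random_state=0)

for name, gini, drop in zip(feature_names, mdi, perm.importances_mean):
    print(f"{name}: impurity importance={gini:.3f}, permutation drop={drop:.3f}")
```

The two measures are on different scales (impurity importances sum to one, permutation importances are score differences), so rankings rather than raw values are what can be compared across models.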

The results consistently indicate that temporal and flow-based features play a dominant role in distinguishing between normal and malicious traffic. In particular, features such as sttl (source-to-destination time-to-live), sload (source bits per second), and dload (destination bits per second) were ranked among the most influential features across multiple models.

The high importance of sttl indicates that changes in time-to-live (TTL) values reveal meaningful patterns associated with abnormal network behavior. Similarly, data load characteristics, such as sload and dload, reflect abnormal data transmission patterns often linked to attack activity. Furthermore, flow attributes, such as ct_state_ttl and throughput rate, contribute to revealing patterns related to Denial-of-Service (DoS) and reconnaissance attacks.

Building on these findings, the relationship between feature importance and model performance is further examined. The superior performance of ensemble-based models, particularly XGBoost and Random Forest, can be partially attributed to their ability to effectively exploit high-impact flow-based features such as sttl, sload, and dload.

These features exhibit strong discriminative power due to their capacity to capture temporal and traffic load dynamics, which are critical in distinguishing between normal and malicious behavior. Tree-based models inherently leverage such features through hierarchical splitting, enabling them to model nonlinear interactions more effectively.

In contrast, the feature importance results obtained from the SVM model (using permutation importance) show partial consistency with tree-based models in identifying key features; however, the relative importance ranking differs due to the model’s reliance on global decision boundaries rather than hierarchical feature partitioning. This difference helps explain the relatively lower performance of SVM compared to ensemble-based methods.

These findings highlight that a subset of traffic flow and temporal features carries substantial discriminative power. From a practical perspective, this suggests that effective intrusion detection can be achieved using a reduced feature set, which may help lower computational overhead while maintaining high detection performance. Such insights are particularly valuable for real-time and resource-constrained network environments.
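One way to test the reduced-feature-set idea is to keep only the top-ranked features and compare cross-validated accuracy against the full set. The sketch below uses synthetic data, and the choice of k = 3, the Random Forest ranker, and the default hyperparameters are illustrative assumptions. Feature selection is placed inside the pipeline so that it is refit on each training fold and does not see the held-out data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data: 20 features, of which only 3 are informative (assumption).
X, y = make_classification(n_samples=600, n_features=20, n_informative=3,
                           n_redundant=0, class_sep=2.0, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
k = 3  # hypothetical size of the reduced feature set

full_model = RandomForestClassifier(n_estimators=50, random_state=0)

# threshold=-inf with max_features=k keeps exactly the k highest-ranked features.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=50, random_state=0),
    threshold=-np.inf, max_features=k)
reduced_model = make_pipeline(
    selector, RandomForestClassifier(n_estimators=50, random_state=0))

full_acc = cross_val_score(full_model, X, y, cv=cv, scoring="accuracy").mean()
reduced_acc = cross_val_score(reduced_model, X, y, cv=cv, scoring="accuracy").mean()

# Fit once on all data only to inspect how many columns the selector retains.
n_kept = int(reduced_model.fit(X, y)[0].get_support().sum())
print(f"all 20 features: accuracy={full_acc:.3f}")
print(f"top {n_kept} features: accuracy={reduced_acc:.3f}")
```

Whether a reduced set preserves performance on real traffic would need to be checked on the actual dataset; the sketch only shows how such a comparison can be set up without leaking information across folds.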

5. Discussion

The experimental results demonstrate the effectiveness of supervised machine learning techniques in detecting network anomalies. As shown in Table 5, all the evaluated models achieved strong classification performance, with the accuracy, F1-score, and AUC values all at or above 0.90. These results indicate that supervised learning approaches are capable of reliably distinguishing between normal and malicious network traffic in the UNSW-NB15 dataset. While the results demonstrate strong performance under controlled experimental conditions, further validation in real-world deployment scenarios is required to fully assess operational effectiveness.

Among the evaluated models, XGBoost achieved the best overall performance, followed by Random Forest. In addition to higher accuracy and AUC values, both models exhibited lower variance across cross-validation folds, indicating more stable and consistent performance. This suggests that ensemble-based methods are better suited to capturing complex and nonlinear relationships in network traffic data compared to individual classifiers.

One of the key empirical insights derived from this study is that model performance depends on the ability to exploit discriminative flow-level features, as shown by the feature importance analysis (Section 4.5). Features such as sttl, sload, and dload ranked among the most influential across the evaluated models. This finding underscores the importance of flow-based and time-based features in identifying anomalous behavior in modern network environments.

While the Decision Tree model provided reasonable performance with high interpretability and low computational cost, it was less effective in capturing complex feature interactions. The SVM model, although capable of handling high-dimensional data, showed lower performance and higher variability, indicating sensitivity to data distribution and parameter settings.

Overall, the findings confirm that ensemble learning approaches provide a strong balance between predictive performance and stability, making them suitable candidates for practical intrusion detection systems. These results are consistent with prior studies [27,28,33], which report improved detection performance when ensemble methods are applied to network anomaly detection tasks.

Limitations of the Study

Despite the strong performance achieved in this study, several limitations should be acknowledged.

First, although the UNSW-NB15 dataset provides a realistic and widely used benchmark, it is generated within a controlled experimental environment. In real-world network deployments, traffic patterns are dynamic and subject to concept drift, where statistical properties change over time. As a result, models trained on static datasets may require periodic retraining to maintain performance against evolving threats.

Second, this study focuses on binary classification, where all the attack types are grouped into a single class. While this simplifies the anomaly detection task, practical intrusion detection systems often require multi-class classification to distinguish between different attack categories for more effective incident response.

Finally, although the proposed experimental framework is designed to be general and reproducible, the findings are based on a single dataset. Therefore, further validation on additional datasets—such as CIC-IDS2017 or NSL-KDD—would strengthen the generalizability of the results.

6. Conclusions

This study presented a systematic empirical evaluation of four supervised machine learning models—Decision Tree, Random Forest, Support Vector Machine (SVM), and XGBoost—for network anomaly detection using the UNSW-NB15 dataset. By employing a structured preprocessing pipeline and a five-fold stratified cross-validation framework, the study provided a reliable and reproducible performance assessment across multiple evaluation metrics.

The results demonstrate that ensemble-based methods, particularly XGBoost, achieve superior predictive performance, with accuracy reaching 0.97 and AUC reaching 0.98. In addition to high performance, these models exhibit greater stability across data partitions, highlighting their suitability for practical deployment in intrusion detection systems. The findings also confirm that classical machine learning approaches, when properly configured and evaluated, can provide an effective balance between predictive accuracy, computational efficiency, and interpretability.

The feature importance analysis also revealed that a subset of flow-based and time-based features, such as sttl, sload, and dload, plays a crucial role in anomaly detection. This finding suggests the potential for developing efficient and lightweight intrusion detection systems by focusing on a small set of information-rich features. Overall, this study provides a standardized, interpretable, and reproducible framework for supervised machine learning in network anomaly detection. The results offer practical guidance for selecting appropriate models and features in real-world cybersecurity applications.

For future research, several directions can be explored:

▪ Dynamic feature engineering: Investigating automated feature selection techniques to reduce computational overhead while maintaining performance.
▪ Real-time deployment: Evaluating model performance in live network environments to assess robustness under dynamic traffic conditions.
▪ Multi-class classification: Extending the framework to distinguish between different attack categories for more granular threat analysis.
▪ Hybrid approaches: Exploring combinations of classical machine learning and lightweight deep learning methods to enhance detection capabilities.

Funding

This research received no external funding.

Data Availability Statement

The UNSW-NB15 dataset used in this study is publicly available. The implementation code used in this study is available upon reasonable request from the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Abdallah, A.M.; Alkaabi, A.S.R.O.; Alameri, G.B.N.B.; Rafique, S.H.; Musa, N.S.; Murugan, T. Cloud network anomaly detection using machine and deep learning techniques—Recent research advancements. IEEE Access 2024, 12, 56749–56773.
2. Kasongo, S.M. An advanced intrusion detection system for IIoT based on GA and tree-based algorithms. IEEE Access 2021, 9, 113199–113212.
3. Iqbal, A.; Amin, R. Time series forecasting and anomaly detection using deep learning. Comput. Chem. Eng. 2024, 182, 108560.
4. Habib, B.; Khursheed, F. REST-API based DDoS detection using random forest classifier in a platform as a service cloud environment. Int. J. Comput. Digit. Syst. 2023, 14, 1075–1089.
5. Al-Shareeda, M.A.; Manickam, S.; Saare, M.A. DDoS attacks detection using machine learning and deep learning techniques: Analysis and comparison. Bull. Electr. Eng. Inform. 2023, 12, 930–939.
6. Momand, A.; Jan, S.U.; Ramzan, N. A systematic and comprehensive survey of recent advances in intrusion detection systems using machine learning: Deep learning, datasets, and attack taxonomy. J. Sens. 2023, 2023, 6048087.
7. Ji, Z.; Wang, Y.; Yan, K.; Xie, X.; Xiang, Y.; Huang, J. A space-embedding strategy for anomaly detection in multivariate time series. Expert Syst. Appl. 2022, 206, 117892.
8. Shiravi, A.; Shiravi, H.; Tavallaee, M.; Ghorbani, A.A. Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput. Secur. 2012, 31, 357–374.
9. Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the Military Communications and Information Systems Conference; IEEE: New York, NY, USA, 2015; pp. 1–6.
10. Yang, Z.; Liu, X.; Li, T.; Wu, D.; Wang, J.; Zhao, Y.; Han, H. A systematic literature review of methods and datasets for anomaly-based network intrusion detection. Comput. Secur. 2022, 116, 102675.
11. Chen, Z.; Yeo, C.; Lee, B.; Lau, C. Autoencoder-based network anomaly detection. In Proceedings of the 2018 Wireless Telecommunications Symposium (WTS); IEEE: New York, NY, USA, 2018; pp. 1–5.
12. Ozkan-Okay, M.; Akin, E.; Aslan, Ö.; Kosunalp, S.; Iliev, T.; Stoyanov, I.; Beloev, I. A Comprehensive Survey: Evaluating the Efficiency of Artificial Intelligence and Machine Learning Techniques on Cyber Security Solutions. IEEE Access 2024, 12, 12229–12256.
13. Apruzzese, G.; Laskov, P.; Montes de Oca, E.; Mallouli, W.; Burdalo Rapa, L.; Grammatopoulos, A.V.; Di Franco, F. The role of machine learning in cybersecurity. Digit. Threat. Res. Pract. 2023, 4, 1–38.
14. Sohail, M.; Hamad, M.; Saeed, A. Network intrusion detection using the UNSW-NB15 dataset and the conditional GAN-augmented CNN. Comput. Sci. Inf. Technol. 2025, 2025, 73–109.
15. Fosic, I.; Zagar, D.; Grgic, K.; Krizanovic, V. Anomaly detection in NetFlow network traffic using supervised machine learning algorithms. J. Ind. Inf. Integr. 2023, 33, 100466.
16. Shoyab, M.; Ahmed, M. Network Anomaly Detection using machine learning and stream it. Indian J. Comput. Sci. Technol. 2025, 4, 28–33.
17. Vinisha, N. Detecting Network Traffic Anomalies with Machine Learning: A Comprehensive Approach. Int. J. Res. Appl. Sci. Eng. Technol. 2025, 13, 762–770.
18. Waghmode, P.; Kanumuri, M.; El-Ocla, H.; Boyle, T. Intrusion detection system based on machine learning using least square support vector machine. Sci. Rep. 2025, 15, 12066.
19. Mishra, P.; Varadharajan, V.; Tupakula, U.; Pilli, E. A Detailed Investigation and Analysis of Using Machine Learning Techniques for Intrusion Detection. IEEE Commun. Surv. Tutor. 2018, 21, 686–728.
20. Saheed, Y.; Abiodun, A.; Misra, S.; Holone, M.; Colomo-Palacios, R. A machine learning-based intrusion detection for detecting internet of things network attacks. Alex. Eng. J. 2022, 61, 9395–9409.
21. Ahmad, Z.; Khan, A.; Cheah, W.; Abdullah, J.; Ahmad, F. Network intrusion detection system: A systematic study of machine learning and deep learning approaches. Trans. Emerg. Telecommun. Technol. 2020, 32, e4150.
22. Ness, S.; Eswarakrishnan, V.; Sridharan, H.; Shinde, V.; Janapareddy, N.; Dhanawat, V. Anomaly Detection in Network Traffic Using Advanced Machine Learning Techniques. IEEE Access 2025, 13, 16133–16149.
23. Kumar, V.; Das, A.; Sinha, D. Statistical Analysis of the UNSW-NB15 Dataset for Intrusion Detection. Comput. Intell. Pattern Recognit. 2019, 999, 279–294.
24. Wang, Y.; Houng, Y.; Chen, H.; Tseng, S. Network Anomaly Intrusion Detection Based on Deep Learning Approach. Sensors 2023, 23, 2171.
25. Pai, V.; Pai, K.; S, M.; Hirmeti, S.; Bhat, V. Adaptive network anomaly detection using machine learning approaches. EURASIP J. Inf. Secur. 2025, 2025, 29.
26. Zhou, P. A survey of streaming data anomaly detection in network security. PeerJ Comput. Sci. 2025, 11, e3066.
27. Meftah, S.; Rachidi, T.; Assem, N. Network Based Intrusion Detection Using the UNSW-NB15 Dataset. Int. J. Comput. Digit. Syst. 2019, 8, 477–487.
28. Moualla, S.; Khorzom, K.; Jafar, A. Improving the Performance of Machine Learning-Based Network Intrusion Detection Systems on the UNSW-NB15 Dataset. Comput. Intell. Neurosci. 2021, 2021, 5557577.
29. Alani, M. Implementation-Oriented Feature Selection in UNSW-NB15 Intrusion Detection Dataset. In International Conference on Intelligent Systems Design and Applications; Springer International Publishing: Cham, Switzerland, 2022; Volume 418, pp. 548–558.
30. Sajid, M.; Malik, K.; Almogren, A.; Malik, T.; Khan, A.; Tanveer, J.; Rehman, A. Enhancing intrusion detection: A hybrid machine and deep learning approach. J. Cloud Comput. 2024, 13, 123.
31. Wang, S.; Balarezo, J.; Kandeepan, S.; Al-Hourani, A.; Chavez, K.; Rubinstein, B. Machine Learning in Network Anomaly Detection: A Survey. IEEE Access 2021, 9, 152379–152396.
32. Naseer, S.; Saleem, Y.; Khalid, S.; Bashir, M.; Han, J.; Iqbal, M.; Han, K. Enhanced Network Anomaly Detection Based on Deep Neural Networks. IEEE Access 2018, 6, 48231–48246.
33. More, S.; Idrissi, M.; Mahmoud, H.; Asyhari, A. Enhanced Intrusion Detection Systems Performance with UNSW-NB15 Data Analysis. Algorithms 2024, 17, 64.
34. Achari, B.; Sreedevi, M. Network Intrusion Detection Using Supervised Machine Learning Technique with Feature Selection. Int. J. Sci. Res. 2025, 14, 208–212.
35. Alang, K.; Bindewari, S. Machine Learning for Anomaly Detection and Prediction in Network Data. J. Quantum Sci. Technol. 2025, 2, 207–220.
36. Liu, Y.; Pan, S.; Gong, C.; Zhou, C.; Karypis, G. Anomaly Detection on Attributed Networks via Contrastive Self-Supervised Learning. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 2378–2392.
37. Chen, X.; Li, B.; Proietti, R.; Zhu, Z.; Yoo, S. Self-Taught Anomaly Detection with Hybrid Unsupervised/Supervised Machine Learning in Optical Networks. J. Light. Technol. 2019, 37, 1742–1749.

Figure 1. Research workflow of the proposed anomaly detection framework.

Figure 2. Comparative performance of supervised machine learning models for network anomaly detection based on cross-validation results.

Table 1. Comparison of representative studies on machine learning-based network anomaly detection.

Study	Dataset	Method	Key Contribution	Limitation/Gap Addressed
[10,21,31]	Multiple	Surveys/Reviews	Comprehensive review of IDS methods and datasets.	Lack of direct experimental benchmarking on unified datasets.
[24,32]	Network Data	Deep Learning	Demonstrated DL’s ability to capture nonlinear patterns.	High computational cost and lower interpretability.
[27,33]	UNSW-NB15	Classical ML	Evaluated performance of standard ML classifiers.	Limited comparative coverage and lack of robust variance estimates.
[28]	UNSW-NB15	Supervised ML	Rigorous cross-validated benchmarking with interpretability analysis	Addresses lack of statistical validation and feature-level interpretability
[29]	UNSW-NB15	Feature Selection	Focused on implementation-oriented feature reduction.	Focused mainly on reduction rather than classifier benchmarking.
[30]	Multiple	Hybrid ML/DL	Proposed frameworks to enhance detection accuracy.	Increased complexity reduces deployment simplicity.

Table 2. Categories of features in the UNSW-NB15 dataset.

Feature Category	Description
Basic Features	Fundamental packet-level attributes such as protocol type, service, and duration
Content Features	Attributes related to packet content and payload characteristics
Time-Based Features	Traffic statistics computed over temporal windows
Flow-Based Features	Statistical properties derived from network flows

Table 3. Characteristics of the machine learning models used in the study.

Model	Type	Key Strength
Decision Tree	Tree-based classifier	High interpretability and simple structure
Random Forest	Ensemble learning	Robustness and reduced overfitting
Support Vector Machine	Margin-based classifier	Effective in high-dimensional feature spaces
XGBoost	Gradient boosting	High predictive performance and efficiency

Table 4. Experimental configuration used in the study.

Parameter	Description
Dataset	UNSW-NB15
Classification Type	Binary classification
Validation Strategy	5-fold stratified cross-validation
Machine Learning Models	Decision Tree, Random Forest, SVM, XGBoost
Evaluation Metrics	Accuracy, Precision, Recall, F1-score, AUC
Feature Type	Network flow features (48 features after preprocessing)

Table 5. Performance metrics of the evaluated models using 5-fold cross-validation (Mean ± SD).

Model	Accuracy	Precision	Recall	F1-Score	AUC
Decision Tree	0.93 ± 0.012	0.92 ± 0.015	0.91 ± 0.011	0.91 ± 0.013	0.940 ± 0.008
Random Forest	0.96 ± 0.008	0.95 ± 0.009	0.95 ± 0.007	0.95 ± 0.008	0.970 ± 0.005
Support Vector Machine	0.92 ± 0.018	0.91 ± 0.020	0.90 ± 0.019	0.90 ± 0.019	0.940 ± 0.012
XGBoost	0.97 ± 0.005	0.96 ± 0.006	0.96 ± 0.005	0.96 ± 0.006	0.980 ± 0.003

Table 6. Confusion Matrix of the XGBoost Model on the UNSW-NB15 Dataset.

	Predicted: Normal	Predicted: Malicious
Actual: Normal	97.5% (TN)	2.5% (FP)
Actual: Malicious	3.2% (FN)	96.8% (TP)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Alkhater, N. A Rigorous Comparative Study of Supervised Machine Learning Techniques for Network Anomaly Detection: Empirical Insights from the UNSW-NB15 Dataset. Computers 2026, 15, 285. https://doi.org/10.3390/computers15050285
