
Harnessing AI for Cyber Defense: Honeypot-Driven Intrusion Detection Systems

College of Computing and Information Technology, University of Tabuk, Tabuk 71491, Saudi Arabia
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(5), 628; https://doi.org/10.3390/sym17050628
Submission received: 15 March 2025 / Revised: 18 April 2025 / Accepted: 19 April 2025 / Published: 22 April 2025
(This article belongs to the Section Computer)

Abstract

Anomaly detection is essential in cybersecurity for identifying abnormal activities, a requirement that has grown increasingly critical with the complexity of cyberthreats. This study leverages the BPF-Extended Tracking Honeypot (BETH) dataset, a comprehensive resource designed to benchmark robustness in detecting anomalous behavior in kernel-level process and network logs. The symmetry of the proposed system lies in its ability to identify balanced and consistent patterns within kernel-level process logs, which form the foundation for accurately distinguishing anomalies. This study focuses on anomaly detection in kernel-level process logs by introducing an enhanced Isolation Forest (iForest) model, which is integrated into a structured framework that includes exploratory data analysis (EDA), data pre-processing, model training, validation, and evaluation. The proposed approach achieves a significant performance improvement in the anomaly detection results, with an area under the receiver operating characteristic curve (AUROC) score of 0.917—an approximate 7.88% increase over the baseline model’s AUROC of 0.850. Additionally, the model demonstrates high precision (99.57%), F1-score (91.69%), and accuracy (86.03%), effectively minimizing false positives while maintaining balanced detection capabilities. These results underscore the role of leveraging symmetry in designing advanced intrusion detection systems, offering a structured and efficient solution for identifying cyberthreats.

1. Introduction

Cybersecurity has become a critical concern in today’s digital era, with the volume and sophistication of cyberattacks growing exponentially [1]. From phishing schemes and ransomware attacks to sophisticated state-sponsored actions, cyberthreats continue to evolve, exploiting vulnerabilities in systems, networks, and users themselves. These attacks pose significant risks, including the loss of sensitive data, reputational damage, legal repercussions, and financial losses [2]. Addressing such challenges is vital for safeguarding individual and corporate entities and maintaining global digital trust. However, traditional security measures, such as firewalls and basic intrusion detection systems, often struggle to keep pace with these rapidly evolving threats, necessitating innovative approaches to enhance detection and response mechanisms.
In this context, machine learning (ML) has emerged as a transformative technology. By leveraging the ability of ML models to analyze large datasets, organizations can uncover patterns, detect anomalies, and predict potential security breaches with unprecedented speed and accuracy. Through the continuous learning of historical attack patterns, ML systems demonstrate dynamic adaptability to emerging threats, offering real-time insights [3,4]. These capabilities are especially critical in proactively mitigating risks and preventing significant damage before threats materialize. Nonetheless, effectively leveraging ML for anomaly detection in cybersecurity systems introduces a set of challenges that must be addressed.
Cybersecurity anomaly detection, particularly when supported by machine learning, faces challenges arising from diverse data sources, sophisticated attacks, and limited detection systems. Although anomaly detection plays an important role in identifying potential threats and mitigating associated risks, its effectiveness is often constrained by several factors. These include the difficulty in selecting suitable algorithms, the complexities of processing large and diverse datasets, and the need to effectively identify previously unknown or emerging attack patterns [5,6,7,8]. Addressing these challenges is essential for improving the robustness and reliability of cybersecurity anomaly detection systems.
The integration of machine learning (ML) algorithms with honeypot systems offers a promising approach for enhancing real-time anomaly detection by combining the strengths of both technologies. Honeypots, which serve as decoy systems to attract and analyze malicious activities, generate valuable datasets that can be utilized to train machine learning models [9]. These models are adept at identifying patterns and anomalies in network traffic, thereby improving the detection of potential threats [10]. By leveraging the dynamic capabilities of machine learning, such integrated systems can achieve greater accuracy and efficiency in anomaly detection, while also demonstrating adaptability to evolving threats in real time. As the need for more effective anomaly detection grows, the BETH (BPF-Extended Tracking Honeypot) dataset serves as a critical resource for advancing these capabilities [11]. The dataset, designed to provide realistic and comprehensive data, has been instrumental in training and evaluating machine learning models to detect and mitigate cyberthreats across various sectors. By bridging the gap between theoretical research and practical applications, the dataset offers a robust foundation for benchmarking and improving anomaly detection techniques. This dataset, collected using a novel honeypot tracking system, consists of over eight million data points from 23 hosts, with activities fully labeled as benign or malicious. Its heterogeneity, encompassing kernel-process and network logs, enables detailed analysis of diverse cyberthreats. Prior studies utilizing the BETH dataset have evaluated several anomaly detection models, including robust covariance, one-class SVM, and isolation forest (iForest). Among these, the iForest model achieved the highest AUROC score of 0.850, demonstrating its effectiveness in differentiating suspicious events from benign ones in the testing dataset, while highlighting areas for improvement in both accuracy and adaptability to real-world scenarios [11].
Building on these evaluations, this study proposes and develops an isolation forest model that outperforms the BETH baseline in terms of AUROC, the primary metric used by the authors of the BETH dataset to evaluate their models. In addition, we evaluate performance using other metrics, including precision, F1-score, and accuracy. Our contributions include refining the iForest model’s anomaly detection capabilities, validating its performance on the dataset, and offering insights into its practical applications for strengthening cybersecurity defenses.
The paper is organized as follows: Section 2 provides an overview of the related work on anomaly detection, honeypot systems, and the integration of machine learning in cybersecurity. Section 3 outlines the methodology used to develop the isolation forest model and the evaluation metrics employed. Section 4 presents the results, compares the proposed model with the baseline model, and discusses its implications. Finally, Section 5 concludes the paper, summarizes the key findings, and suggests directions for future research.

2. Related Work

Anomaly detection, honeypot systems, and the integration of machine learning into cybersecurity are critical components of the evolving landscape of digital security. Anomaly detection is essential for spotting unusual patterns that might signal threats. Traditional methods, such as rule-based systems, often fail to recognize novel attacks, so machine learning has become central, learning from data to adapt to changing threat behavior [12]. The quality of the underlying data is crucial, however, and this is where honeypots contribute. Honeypots are decoy systems that attract attackers, enabling the collection of detailed data on attacker actions that is well suited to training anomaly detection models while simultaneously diverting threats from critical systems [13]. The Honeyboost framework integrates honeypots with data fusion and anomaly detection techniques, thereby enhancing the overall performance of honeypots in detecting and mitigating security threats [14]. Studies have also shown honeypots defending against advanced threats in industrial IoT and securing 5G networks.
The study in [15] presents a novel honeypot-based strategy to defend industrial Internet of Things (IIoT) systems against Advanced Persistent Threats (APTs), currently the leading class of cyberattacks. Because of their persistence and sophistication, APTs represent the most dangerous threat to IIoT systems, including critical infrastructure. The study introduces an adaptive, resource-saving honeypot strategy using game theory and behavioral economics to advance IIoT security. Radoglou-Grammatikis et al. [16] show that strategic honeypot deployment can enhance the security of ultra-dense beyond-5G (B5G) networks.
In the field of power grids, honeypots and anomaly detection are vital. The research in [17] investigates a defense strategy against attacks targeting the power grid, and its findings emphasize the importance of adaptive and psychology-aware defense mechanisms in protecting power grids. The authors of [18] propose a novel framework for detecting cyberthreats in smart grid networks using collaborative and incentivized honeypot-based detection mechanisms. By combining honeypot technology, federated learning, and incentivization, the framework achieves a balance among effective detection, scalability, and privacy. Its adaptability makes it a promising solution to cybersecurity challenges in smart grids.
As an example of intrusion detection, Rehman et al. [19] propose FLASH, which advances intrusion detection by combining provenance graph analysis with state-of-the-art graph representation learning. The authors of [20] introduce a robust framework for backdoor detection in deep learning models using topological evolution dynamics. The study highlights the potential of leveraging topological data analysis to enhance the security of machine learning systems. Inam et al. [21] investigate the importance of provenance auditing as a powerful tool for intrusion detection and system analysis.
Despite these advances, a significant gap remains in research on anomaly detection using kernel-level logs, which track low-level system activities such as system calls. These logs are high-dimensional, noisy, and often imbalanced [22]. Public datasets for kernel-level anomaly detection remain scarce, impeding the development and rigorous evaluation of new models. Nonetheless, monitoring system calls during program execution offers broad coverage for detecting malicious behaviors across diverse applications [23].

3. Methodology

This section describes the systematic approach undertaken to develop and evaluate the proposed Isolation Forest (iForest) model using the BETH dataset [24]. The framework involves multiple stages that delineate the key components of the proposed approach, as illustrated in Figure 1. These stages include the environment setup, dataset selection, exploratory data analysis (EDA) (Section 3.1), data pre-processing (Section 3.2), model training (Section 3.3), model validation, model evaluation, and model predictions (Section 3.4). For the exploratory data analysis and the implementation of the proposed method, Python version 3.12 is used with Anaconda Navigator version 2.6.4 and Jupyter Notebook version 7.2.2 to ensure seamless integration of essential libraries. These libraries include Pandas for data processing, Matplotlib and Seaborn for data visualization, and Scikit-learn for developing and executing the machine learning workflow pipeline.
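For reference, the following minimal sketch gathers the imports implied by this toolchain in one place; the module aliases are conventional choices rather than part of the paper’s original implementation:

    import pandas as pd                                # data processing
    import matplotlib.pyplot as plt                    # plotting
    import seaborn as sns                              # statistical visualization
    from sklearn.preprocessing import StandardScaler   # feature scaling
    from sklearn.ensemble import IsolationForest       # anomaly detection model
    from sklearn.metrics import roc_auc_score, confusion_matrix  # evaluation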

3.1. Dataset Selection and Exploratory Data Analysis (EDA)

We focus on the kernel-process log subset, comprising 1,141,078 data points generated from multiple hosts collected using a honeypot tracking system. These data points are organized into training, validation, and testing sets, with activities labeled as benign, suspicious, or malicious, providing a comprehensive foundation for evaluating the performance of our anomaly detection model. However, they require targeted pre-processing to handle their high dimensionality and noise, as discussed in Section 3.2. Figure 2 illustrates the proportional distribution of the dataset, divided into three subsets: the training dataset (763,144 instances, 66.88%), validation dataset (188,967 instances, 16.56%), and test dataset (188,967 instances, 16.56%). The training dataset comprises the majority of the data, facilitating model learning and training, whereas the validation and test datasets are equally divided to ensure robust model evaluation and performance assessment.
The training, validation, and test datasets are verified to have consistent columns, comprising 16 features, including two labels (targets). This ensured compatibility and uniformity throughout the machine learning pipeline. Importantly, no missing data or values are identified, further reinforcing the reliability of the dataset for the analysis and model development.
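A minimal sketch of this integrity check is shown below; the CSV file names follow the Kaggle distribution of the dataset, and the DataFrame variable names are assumptions:

    # Load the three kernel-process splits (file names assumed from the
    # Kaggle distribution of the BETH dataset).
    train_df = pd.read_csv("labelled_training_data.csv")
    val_df = pd.read_csv("labelled_validation_data.csv")
    test_df = pd.read_csv("labelled_testing_data.csv")

    # Verify that all splits share the same 16 columns (including the two
    # labels) and contain no missing values.
    assert list(train_df.columns) == list(val_df.columns) == list(test_df.columns)
    for name, df in [("train", train_df), ("val", val_df), ("test", test_df)]:
        print(name, df.shape, "missing values:", int(df.isna().sum().sum()))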
The EDA process aimed to gain insights into the structure of the dataset, statistical properties, feature relationships, and distributions. This process involved visualizing data patterns, relationships, and distributions to uncover trends, frequency distributions, and correlations among features. Key visualizations included histograms, bar charts, grouped bar plots, and correlation heatmaps. These tools provided valuable guidance for the pre-processing step:
  • Event Timeline (Figure 3): Revealed activity bursts, including sudden spikes in event frequency associated with distributed denial-of-service (DDoS) or botnet activities (e.g., setup Dota3 botnet). Significant spikes are observed between the 400 and 500 s mark, indicating periods of increased activity or anomalies in the system.
  • Label Frequency Distributions and Combinations (Figure 4 and Figure 5): Highlighted the occurrences of suspicious (‘sus’) and malicious (‘evil’) labels, as well as their combinations, to provide a deeper understanding of label occurrences and relationships across the training, validation, and test datasets. The combination distribution illustrated in Figure 5 reveals significant variations and challenges that impact the development and evaluation of the anomaly detection model. In the training dataset, notably, malicious events (sus = 1, evil = 1) are absent, constituting 0% of the dataset. This lack of exposure to malicious activity during training presents a critical challenge for the model because it may struggle to effectively detect and classify rare but severe threats in unseen data. Similarly, the validation dataset demonstrates a pronounced imbalance: malicious events are again entirely absent. This imbalance restricts the capacity of the model to fine-tune its parameters for detecting severe threats. Consequently, the model may perform well in classifying benign and suspicious activities but fail to generalize effectively when encountering malicious events in real-world scenarios. By contrast, the test dataset offers a distinct distribution, with a significant proportion of malicious events. This dataset provides a unique opportunity to rigorously evaluate the model’s ability to detect and classify severe threats under realistic conditions. However, the comparatively lower representation of benign and suspicious events highlights the necessity for the model to balance its classification capabilities across all activity types to avoid overfitting to the dominant malicious class. Overall, the distribution imbalances across these datasets present notable challenges for model development and evaluation. Although the training and validation datasets lack sufficient representation of malicious events, the test dataset emphasizes the importance of detecting these critical threats. To address these challenges, the pre-processing stage focuses on strategies to balance the datasets, ensuring a well-rounded and effective anomaly detection model that performs robustly in diverse scenarios. As shown in Table 1, the majority of events are benign, with only 13.88% classified as malicious.
  • Correlation Heatmaps (Figure 6): Conducted to examine the relationships among the selected features and the target variables across the training, validation, and testing datasets, as well as their respective impacts on model design and pre-processing decisions. These heatmaps provide valuable insights into feature dependencies and their potential influence on model training and evaluation. The correlation heatmaps were generated by computing Pearson correlation coefficients for all pairs of numeric features using pandas’ corr() function and visualized with seaborn’s heatmap() (a minimal sketch of this computation follows this list). In the training dataset, a strong correlation is observed between userId and the sus label, indicating a significant relationship between user activity patterns and suspicious behavior. However, no data are available for the evil label in the training set, limiting insights into its interaction with other features. The perfect positive correlation (1.0) observed between the processId and threadId features indicates that these features are essentially identical, implying possible redundancy. The features eventId and argsNum show a moderate positive correlation (0.63), highlighting their relevance in distinguishing event characteristics. Other features, such as parentProcessId, exhibit weak correlations with the target variable, suggesting limited predictive utility. Additionally, most feature correlations remain near zero, indicating minimal multicollinearity, which is favorable for machine learning models such as iForest that assume feature independence. In the validation dataset, the correlation patterns are consistent with those of the training dataset, and sus maintains a strong correlation with userId (0.99). However, no substantial correlations are evident between the other features and the target variable, which reinforces the independence assumption. Interestingly, mountNamespace displays a relatively negative correlation with several features, potentially indicating distinct behavior patterns associated with this feature. The test dataset reveals distinct correlation patterns, given the significant presence of malicious events (evil). The feature evil exhibits a strong correlation with userId (0.90) and parentProcessId (0.72). Furthermore, the sus and timestamp features exhibit moderate positive correlations with the evil feature, with coefficients of 0.73 and 0.70, respectively, indicating their significance in identifying malicious activities. The higher prevalence of malicious events in the test set contributes to stronger correlations with specific features, providing a realistic context for model evaluation.
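Since the text names the exact library calls (pandas’ corr() and seaborn’s heatmap()), the heatmap computation can be sketched in a few lines; the figure size, color map, and annotation settings below are illustrative choices rather than the paper’s exact configuration:

    # Pearson correlations over the numeric features of one split.
    corr = train_df.corr(numeric_only=True)
    plt.figure(figsize=(8, 6))
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="Greens")
    plt.title("Correlation heatmap (training dataset)")
    plt.tight_layout()
    plt.show()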

3.2. Data Pre-Processing

The data pre-processing stage aimed to enhance the quality, consistency, and relevance of the input data, facilitating effective model training and evaluation. As part of this process, seven key features are selected along with a single target label for anomaly detection. The selected features included processId, parentProcessId, userId, mountNamespace, eventId, argsNum, and returnValue. The target label sus is used to represent the anomalous classification. Key steps in this process are as follows:
  • Data Transformation: This step ensured consistency and compatibility across dataset formats. Raw features are transformed to align with the requirements of the anomaly detection model, adhering to the recommendations and suggestions outlined in [11]. These transformations aimed to optimize the feature representation, thereby improving the model performance. In the proposed iForest model, we employed the same set of engineered features to ensure consistency and comparability.
  • Data Standardization: To address variations in feature scales, data standardization is performed by normalizing the feature values. This process ensured that the features are on a comparable scale, enhancing the convergence of the model during training. The target variable, sus (suspicious activity), served as the primary label for model training. Standardizing the data is particularly crucial for the iForest algorithm because it relies on consistent feature scaling to accurately detect anomalies. We standardized all seven input features using the StandardScaler() class from the scikit-learn library to ensure zero mean and unit variance across each dimension. The following Python code snippet illustrates how we applied it:
    from sklearn.preprocessing import StandardScaler

    # Fit on the training split only; reuse the same scaling elsewhere.
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_val_scaled = scaler.transform(X_val)
    X_test_scaled = scaler.transform(X_test)
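For completeness, the feature and label selection that logically precedes the scaling shown above can be sketched as follows, carrying over the assumed DataFrame names from Section 3.1:

    # Seven input features and the 'sus' target label listed in Section 3.2.
    features = ["processId", "parentProcessId", "userId", "mountNamespace",
                "eventId", "argsNum", "returnValue"]
    X_train, y_train = train_df[features], train_df["sus"]
    X_val, y_val = val_df[features], val_df["sus"]
    X_test, y_test = test_df[features], test_df["sus"]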

3.3. Machine Learning Model Training

An Isolation Forest (iForest) model, which is a tree-based unsupervised learning model, is employed because of its proven effectiveness in detecting anomalies within the dataset. Its inherent ability to isolate outliers using recursive partitioning makes it particularly suitable for cybersecurity anomaly detection. The implementation utilizes the scikit-learn library, a widely adopted Python framework for machine learning. The model is initialized with the following hyperparameters to optimize performance: the number of base estimators (n_estimators) is set to 100, the contamination rate (contamination) is set to 0.01 to indicate the proportion of expected anomalies, and the random state (random_state) is fixed at 42 to ensure reproducibility. The model is trained on a scaled version of the training dataset (X_train_scaled) to ensure that all features contribute equally to the anomaly detection process. The following Python code snippet illustrates the configuration and training process:
    from sklearn.ensemble import IsolationForest

    iforest = IsolationForest(n_estimators=100,
                              contamination=0.01, random_state=42)
    iforest.fit(X_train_scaled)
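The paper does not list its inference code; one plausible way to turn the fitted model’s outputs into binary labels and anomaly scores is sketched below. In scikit-learn, predict() returns +1 for inliers and −1 for outliers, and score_samples() returns values that are higher for more normal points:

    # Binary predictions: map iForest's -1 (outlier) to 1 (suspicious).
    y_pred = (iforest.predict(X_test_scaled) == -1).astype(int)

    # Continuous anomaly scores, negated so that larger = more anomalous;
    # these are what threshold-free metrics such as AUROC consume.
    test_scores = -iforest.score_samples(X_test_scaled)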

3.4. Model Validation and Evaluation

To ensure a robust performance evaluation, the proposed model is validated using a hold-out validation set. The validation process employed a separate dataset, with a minimum AUROC threshold of 0.55 as the performance criterion. Once the model exceeded this threshold, it was subjected to testing. The validity of the model is assessed using standard evaluation metrics to measure the performance of the ML-based approach. These metrics included the confusion matrix, AUROC, precision, recall, accuracy, and F1-score. The confusion matrix provided an analysis of the true versus predicted classes, enabling a comprehensive evaluation of the model’s detection capabilities across the defined classes. The corresponding evaluation metrics derived from the confusion matrix are used to quantify the effectiveness of the model in distinguishing between benign and suspicious activities. For comparison, we reference the highest published AUROC score from [11] for the iForest model, which demonstrates the best performance in distinguishing suspicious events from benign ones, while noting that no additional metrics are reported in that study.
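A minimal sketch of this validation gate, under the score convention and variable names assumed in the previous sections:

    # Hold-out validation gate: proceed to testing only if AUROC > 0.55.
    val_scores = -iforest.score_samples(X_val_scaled)
    val_auroc = roc_auc_score(y_val, val_scores)
    if val_auroc > 0.55:
        test_auroc = roc_auc_score(y_test, test_scores)
        print(f"validation AUROC = {val_auroc:.3f}, test AUROC = {test_auroc:.3f}")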

4. Experimentation Results and Discussion

This study introduces an unsupervised learning-based detection system for identifying cybersecurity anomalies by utilizing the BETH dataset. We train, validate, and test our model using a subset of over 1 million records, a sufficiently large sample for reliable training and evaluation. The performance of the proposed model is assessed using the aforementioned evaluation metrics, summarized in Figure 7. In this section, we present the key findings of our experiment. The subsequent subsections provide detailed analyses of various performance metrics.

4.1. Confusion Matrix Analysis

Figure 8 depicts the confusion matrix for the developed isolation forest model, providing a detailed visual representation of the classification results. The model demonstrates its ability to effectively identify suspicious activities, achieving a high number of true positives (TP), with 145,688 cases correctly classified as suspicious. Additionally, the model maintains a low false positive (FP) rate, with only 623 instances incorrectly classified as suspicious when they are actually benign. This underscores the strength of the model in detecting genuine anomalies with high precision (99.57%). However, the model misclassified 25,771 cases (15.03%) as false negatives (FN), that is, suspicious cases incorrectly labeled as benign. The true negatives (TN), with 16,885 cases correctly classified as benign, further indicate the model’s ability to distinguish normal behavior from anomalies effectively. Overall, the confusion matrix reveals that, out of 171,459 suspicious cases, the model achieved a notable detection rate but leaves room for improvement in identifying all anomalies.
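The metrics analyzed in the following subsections follow directly from these four counts; the short sketch below reproduces the arithmetic:

    # Confusion-matrix counts reported in Figure 8.
    TP, FP, FN, TN = 145_688, 623, 25_771, 16_885
    recall = TP / (TP + FN)                              # 0.8497
    precision = TP / (TP + FP)                           # 0.9957
    f1 = 2 * precision * recall / (precision + recall)   # 0.9169
    accuracy = (TP + TN) / (TP + TN + FP + FN)           # 0.8603
    print(f"recall={recall:.4f}, precision={precision:.4f}, "
          f"f1={f1:.4f}, accuracy={accuracy:.4f}")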

4.2. Sensitivity Analysis

Sensitivity, also known as the recall or true positive rate (TPR), refers to the proportion of true positive detections out of all actual anomalies (both true positives and false negatives). It measures the ability of the model to identify all actual anomalies. Mathematically, it is defined as
$$\mathrm{Recall} = \frac{TP}{TP + FN} = \frac{145{,}688}{145{,}688 + 25{,}771} = 0.8497$$
The model achieved a recall score of 0.8497, indicating that it successfully identified 84.97% of all the actual suspicious activities.

4.3. Precision Analysis

Precision refers to the proportion of true positive detections (correctly identified anomalies) out of all positive detections (both true positives and false positives). It measures the accuracy of the model in identifying only the actual anomalies.
$$\mathrm{Precision} = \frac{TP}{TP + FP} = \frac{145{,}688}{145{,}688 + 623} = 0.9957$$
The model exhibited outstanding precision, with a score of 0.9957. This high level of precision signifies that when the model classifies an activity as suspicious, it is accurate 99.57% of the time. In the realm of cybersecurity, such high precision is vital as it reduces false alarms (FPs), enabling security teams to concentrate on actual threats.

4.4. F1-Score Analysis

The F1-score provides a balanced measure of the model’s performance, considering both precision and recall.
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} = \frac{2\,(0.9957 \times 0.8497)}{0.9957 + 0.8497} = 0.9169$$
The model achieved an F1-score of 91.69%, indicating that it maintains a good balance between identifying suspicious activities accurately and capturing a high proportion of actual threats.

4.5. Accuracy Analysis

Accuracy refers to the proportion of correctly classified instances (both true positives and true negatives) out of the total number of instances. It measures the overall effectiveness of the model in correctly identifying both anomalies and normal activity.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{145{,}688 + 16{,}885}{145{,}688 + 16{,}885 + 623 + 25{,}771} = 0.8603$$
The model achieved 86.03% accuracy, indicating good classification of both normal and suspicious activities. However, this metric can be misleading with imbalanced datasets. We prioritized precision, F1-score, and AUROC. These metrics provide a more accurate assessment of anomaly detection performance, particularly given the absence of malicious events during training and validation (Figure 5).

4.6. Area Under the Receiver Operating Characteristic Curve (AUROC) Analysis

Figure 9 illustrates the ROC curve, which plots the true positive rate (TPR) against the false positive rate (FPR) across various thresholds. The AUROC score of 0.917 indicates excellent performance, highlighting the capability of the model to effectively differentiate between benign and suspicious activities across various threshold settings. This represents a significant improvement over the baseline iForest model of BETH [11], which achieved an AUROC of 0.850. Figure 10 displays a comparison of the AUROC results of the ML models.
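An ROC curve like the one in Figure 9 can be traced from the continuous anomaly scores; the sketch below uses standard scikit-learn utilities and the score convention assumed earlier:

    from sklearn.metrics import roc_curve, auc

    # Trace the ROC curve from continuous anomaly scores.
    fpr, tpr, _ = roc_curve(y_test, test_scores)
    plt.plot(fpr, tpr, label=f"iForest (AUROC = {auc(fpr, tpr):.3f})")
    plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()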

5. Conclusions

In this study, we demonstrated the effectiveness of the proposed Isolation Forest (iForest) model for anomaly detection by leveraging the rich data provided by the BPF-Extended Tracking Honeypot (BETH) dataset to enhance cybersecurity defenses. The study began with an exploratory data analysis (EDA) to gain insights into the dataset, examining correlations, relationships, and patterns within kernel-process logs classified as benign, suspicious, or malicious. The proposed iForest model significantly outperformed the baseline model, achieving an AUROC score of 0.917, compared with the baseline’s 0.850, representing an approximate 7.88% improvement in detection performance on kernel-system logs using the same set of features. This underscores the ability of the model to effectively differentiate between normal and anomalous activities. The model further demonstrated strong performance, with a precision of 99.57%, an F1-score of 91.69%, and an overall accuracy of 86.03%, confirming its capability to minimize false positives and deliver balanced performance across multiple metrics. These results highlight its reliability and suitability for real-world cybersecurity applications.
While the outcomes are promising, several areas require improvement. Notably, the false negative rate is a critical concern. Future research should enhance the recall by optimizing the model’s sensitivity to subtle and rare anomalies. This includes refining feature selection and engineering to capture informative patterns more effectively, informed by insights from the exploratory data analysis. By emphasizing the most relevant features and their relationships, the ability of the model to detect subtle anomalies will improve significantly. In addition, incorporating deep learning techniques may provide advanced capabilities for feature extraction and representation, allowing the model to identify complex and evolving patterns in the landscape of cyberthreats.

Author Contributions

E.A., methodology, software, visualization, and writing. U.A., conceptualization and supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available at https://www.kaggle.com/datasets/katehighnam/beth-dataset/data (accessed on 2 December 2024), reference number [24].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Verma, R. Cybersecurity Challenges in the Era of Digital Transformation; Infinity Publication Pvt. Ltd.: Lunawada, India, 2024; p. 187.
  2. Pearson, N. A larger problem: Financial and reputational risks. Comput. Fraud. Secur. 2014, 2014, 11–13.
  3. Mahdi, A.A. Machine learning applications of network security enhancement: Review. Comput. Sci. Res. J. 2024, 5, 2283–2300.
  4. Albtosh, L.B. Advancements in cybersecurity and machine learning: A comprehensive review of recent research. World J. Adv. Eng. Technol. Sci. 2024, 13, 271–284.
  5. Phulre, A.K.; Jain, S.; Jain, G. Evaluating Security Enhancement Through Machine Learning Approaches for Anomaly-Based Intrusion Detection Systems. In Proceedings of the 2024 IEEE International Students’ Conference on Electrical, Electronics and Computer Science (SCEECS), Bhopal, India, 24–25 February 2024; pp. 1–5.
  6. Vajpayee, P.; Hossain, G. Reduction of Cyber Value at Risk (CVaR) Through AI Enabled Anomaly Detection. In Proceedings of the SoutheastCon 2024, Atlanta, GA, USA, 15–24 March 2024; pp. 623–629.
  7. Arjunan, T. Detecting Anomalies and Intrusions in Unstructured Cybersecurity Data Using Natural Language Processing. Int. J. Sci. Technol. Eng. 2024, 12, 1023–1029.
  8. Tushkanova, O.; Levshun, D.; Branitskiy, A.; Fedorchenko, E.; Novikova, E.; Kotenko, I. Detection of Cyberattacks and Anomalies in Cyber-Physical Systems: Approaches, Data Sources, Evaluation. Algorithms 2023, 16, 85.
  9. Amal, M.; Venkadesh, P. Review of cyber attack detection: Honeypot system. Webology 2022, 19, 5497–5514.
  10. Baisholan, N.; Baisholanova, K.; Kubayev, K.; Alimzhanova, Z.; Baimuldina, N. Corporate network anomaly detection methodology utilizing machine learning algorithms. Smart Sci. 2024, 12, 666–678.
  11. Highnam, K.; Arulkumaran, K.; Hanif, Z.; Jennings, N.R. BETH Dataset: Real Cybersecurity Data for Anomaly Detection Research. In Proceedings of the ICML Workshop on Uncertainty and Robustness in Deep Learning, Virtual, 23 July 2021.
  12. Landauer, M.; Onder, S.; Skopik, F.; Wurzenberger, M. Deep learning for anomaly detection in log data: A survey. Mach. Learn. Appl. 2023, 12, 100470.
  13. Diamantoulakis, P.; Dalamagkas, C.; Radoglou-Grammatikis, P.; Sarigiannidis, P.; Karagiannidis, G. Game Theoretic Honeypot Deployment in Smart Grid. Sensors 2020, 20, 4199.
  14. Kandanaarachchi, S.; Ochiai, H.; Rao, A. Honeyboost: Boosting honeypot performance with data fusion and anomaly detection. arXiv 2021, arXiv:2105.02526.
  15. Tian, W.; Du, M.; Ji, X.; Liu, G.; Dai, Y.; Han, Z. Honeypot detection strategy against advanced persistent threats in industrial internet of things: A prospect theoretic game. IEEE Internet Things J. 2021, 8, 17372–17381.
  16. Radoglou-Grammatikis, P.; Sarigiannidis, P.; Diamantoulakis, P.; Lagkas, T.; Saoulidis, T.; Fountoukidis, E.; Karagiannidis, G. Strategic honeypot deployment in ultra-dense beyond 5g networks: A reinforcement learning approach. IEEE Trans. Emerg. Top. Comput. 2022, 12, 643–655.
  17. Tian, W.; Ji, X.; Liu, W.; Liu, G.; Zhai, J.; Dai, Y.; Huang, S. Prospect theoretic study of honeypot defense against advanced persistent threats in power grid. IEEE Access 2020, 8, 64075–64085.
  18. Albaseer, A.; Abdi, N.; Abdallah, M.; Qaraqe, M.; Al-Kuwari, S. FedPot: A Quality-Aware Collaborative and Incentivized Honeypot-Based Detector for Smart Grid Networks. IEEE Trans. Netw. Serv. Manag. 2024, 21, 4844–4860.
  19. Rehman, M.U.; Ahmadi, H.; Hassan, W.U. FLASH: A Comprehensive Approach to Intrusion Detection via Provenance Graph Representation Learning. In Proceedings of the 2024 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2024; p. 139.
  20. Mo, X.; Zhang, Y.; Zhang, L.Y.; Luo, W.; Sun, N.; Hu, S.; Gao, S.; Xiang, Y. Robust backdoor detection for deep learning via topological evolution dynamics. In Proceedings of the 2024 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2024; p. 171.
  21. Inam, M.A.; Chen, Y.; Goyal, A.; Liu, J.; Mink, J.; Michael, N.; Gaur, S.; Bates, A.; Hassan, W.U. SoK: History is a vast early warning system: Auditing the provenance of system intrusions. In Proceedings of the 2023 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 21–25 May 2023; pp. 2620–2638.
  22. Thudumu, S.; Branch, P.; Jin, J.; Singh, J.J. A comprehensive survey of anomaly detection techniques for high dimensional big data. J. Big Data 2020, 7, 42.
  23. Stolfo, S.J.; Hershkop, S.; Bui, L.H.; Ferster, R.; Wang, K. Anomaly detection in computer security and an application to file system accesses. In Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2005; pp. 14–28.
  24. BETH Dataset. Available online: https://www.kaggle.com/datasets/katehighnam/beth-dataset/data (accessed on 2 December 2024).
Figure 1. The proposed anomaly detection framework.
Figure 2. Distribution of training, validation, and test subsets.
Figure 3. Attack timeline in the test dataset. This line plot displays the timeline of the attack captured in the Test Dataset, with events grouped by seconds from the machine’s startup.
Figure 4. Frequency distributions of suspicious (‘sus’) and malicious (‘evil’) labels across datasets.
Figure 5. Distribution of ‘sus’ and ‘evil’ combinations across the training, validation, and testing datasets: (a) training dataset. (b) validation dataset. (c) testing dataset.
Figure 6. Correlation heatmaps. The color intensity represents the Pearson correlation coefficients, ranging from −1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 representing no correlation. Color intensity from light green (negative) to dark green (positive) reflects the strength of the relationship: (a) Correlation heatmap for training dataset. (b) Correlation heatmap for validation dataset. (c) Correlation heatmap for testing dataset.
Figure 7. Performance metrics for isolation forest model.
Figure 8. Confusion matrix analysis for isolation forest model.
Figure 9. Area under the receiver operating characteristic curve (AUROC) for isolation forest model.
Figure 10. Comparison of AUROC between our proposed iForest model and BETH’s baseline [11].
Table 1. Summary of distribution of event types.

Type | Combination ¹ | Count | Percentage
Benign Events | (0, 0) | 967,564 | 84.79%
Suspicious Events | (1, 0) | 15,082 | 1.32%
Malicious Events | (1, 1) | 158,432 | 13.88%

¹ Combination given as (sus, evil).
