Unveiling Cyber Threats: An In-Depth Study on Data Mining Techniques for Exploit Attack Detection
Abstract
1. Introduction
- Address a gap in the literature: previous works did not adequately consider exploit attacks, despite their significance.
- Analyze the detection of exploit attacks using different data mining techniques.
- Evaluate the performance of different classification models and different feature selection methods.
2. Literature Review
3. Methodology
3.1. Data Preprocessing
3.2. Feature Selection
3.3. Model Development, Training, and Evaluation
4. Experiments and Results
4.1. Dataset
4.2. Evaluation Measures
4.3. Experiments
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Number of records per category in the dataset

Category | Count |
---|---|
Normal | 2,218,761 |
Fuzzers | 24,246 |
Reconnaissance | 13,987 |
Shellcode | 1,511 |
Analysis | 2,677 |
Backdoors | 2,329 |
DoS | 16,353 |
Exploits | 44,525 |
Generic | 215,481 |
Worms | 174 |
Total | 2,540,044 |
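To put the category counts in perspective, the short sketch below computes each category's share of the total. The `counts` dictionary simply transcribes the table above; the variable names are illustrative and are not taken from the paper.

```python
# Record counts transcribed from the category table above.
counts = {
    "Normal": 2_218_761,
    "Fuzzers": 24_246,
    "Reconnaissance": 13_987,
    "Shellcode": 1_511,
    "Analysis": 2_677,
    "Backdoors": 2_329,
    "DoS": 16_353,
    "Exploits": 44_525,
    "Generic": 215_481,
    "Worms": 174,
}

total = sum(counts.values())  # should equal the "Total" row of the table
shares = {name: n / total for name, n in counts.items()}

# Print categories from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<15} {counts[name]:>10,} {share:7.2%}")
```

By these counts, Exploits make up roughly 1.75% of all records while Normal traffic makes up about 87%, so an exploit-versus-rest experiment on the raw data would be heavily imbalanced.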
Model performance with SelectKBest feature selection

Model | Accuracy | Recall | Precision | F1 Score | AUC |
---|---|---|---|---|---|
AdaBoost | 0.796 | 0.916 | 0.739 | 0.818 | 0.796 |
Decision Tree | 0.876 | 0.934 | 0.837 | 0.883 | 0.876 |
KNN | 0.775 | 0.828 | 0.748 | 0.786 | 0.774 |
Logistic Regression | 0.673 | 0.473 | 0.790 | 0.592 | 0.674 |
MLP Classifier | 0.710 | 0.814 | 0.673 | 0.733 | 0.710 |
Random Forest | 0.875 | 0.956 | 0.824 | 0.885 | 0.875 |
SVM | 0.667 | 0.861 | 0.621 | 0.721 | 0.666 |
XGBoost | 0.844 | 0.955 | 0.782 | 0.860 | 0.844 |
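As a quick consistency check on the table above, the F1 score can be recomputed from the reported precision and recall. This is a minimal sketch that assumes the standard definition of F1 as the harmonic mean of precision and recall; the excerpt does not show the formula the authors used, and the function name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from three rows of the table above.
rows = {
    "Decision Tree": (0.837, 0.934, 0.883),
    "Random Forest": (0.824, 0.956, 0.885),
    "XGBoost": (0.782, 0.955, 0.860),
}

for model, (p, r, reported) in rows.items():
    print(f"{model:<14} computed={f1_score(p, r):.3f} reported={reported:.3f}")
```

For these three rows the recomputed values agree with the reported F1 scores to three decimal places.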
Model performance with Logistic Regression-based feature selection

Model | Accuracy | Recall | Precision | F1 Score | AUC |
---|---|---|---|---|---|
AdaBoost | 0.7906 | 0.8635 | 0.7541 | 0.8051 | 0.7904 |
Decision Tree | 0.8791 | 0.9081 | 0.8587 | 0.8827 | 0.8790 |
KNN | 0.8094 | 0.8394 | 0.7925 | 0.8153 | 0.8093 |
Logistic Regression | 0.7425 | 0.6920 | 0.7707 | 0.7292 | 0.7426 |
MLP Classifier | 0.7913 | 0.8684 | 0.7531 | 0.8064 | 0.7911 |
Random Forest | 0.8790 | 0.9266 | 0.8465 | 0.8847 | 0.8789 |
SVM | 0.6985 | 0.6242 | 0.7342 | 0.6747 | 0.6986 |
XGBoost | 0.8433 | 0.9402 | 0.7880 | 0.8574 | 0.8431 |
Model performance with RF-based feature selection

Model | Accuracy | Recall | Precision | F1 Score | AUC |
---|---|---|---|---|---|
AdaBoost | 0.7973 | 0.9220 | 0.7384 | 0.8201 | 0.7970 |
Decision Tree | 0.8791 | 0.9081 | 0.8587 | 0.8827 | 0.8790 |
KNN | 0.7785 | 0.8312 | 0.7526 | 0.7899 | 0.7784 |
Logistic Regression | 0.6759 | 0.4905 | 0.7813 | 0.6026 | 0.6763 |
MLP Classifier | 0.6872 | 0.7147 | 0.6909 | 0.6945 | 0.6871 |
Random Forest | 0.8790 | 0.9261 | 0.8468 | 0.8847 | 0.8789 |
SVM | 0.6664 | 0.8606 | 0.6205 | 0.7211 | 0.6660 |
XGBoost | 0.8467 | 0.9460 | 0.7896 | 0.8608 | 0.8465 |
Model performance with RFECV feature selection

Model | Accuracy | Recall | Precision | F1 Score | AUC |
---|---|---|---|---|---|
AdaBoost | 0.7922 | 0.9249 | 0.7315 | 0.8169 | 0.7920 |
Decision Tree | 0.8665 | 0.9967 | 0.7912 | 0.8821 | 0.8663 |
KNN | 0.7799 | 0.6723 | 0.8577 | 0.7537 | 0.7801 |
Logistic Regression | 0.5010 | 1.0000 | 0.5010 | 0.6676 | 0.5000 |
MLP Classifier | 0.5519 | 0.6139 | 0.5597 | 0.5450 | 0.5518 |
Random Forest | 0.8665 | 0.9980 | 0.7906 | 0.8823 | 0.8663 |
SVM | 0.6402 | 0.9445 | 0.5877 | 0.7245 | 0.6395 |
XGBoost | 0.8325 | 0.9809 | 0.7568 | 0.8544 | 0.8322 |
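The Logistic Regression row in the table above (recall 1.0000, precision 0.5010, accuracy 0.5010, AUC 0.5000) has the signature of a classifier that assigns every sample to the positive class on a nearly balanced test set. The sketch below reproduces those numbers under that assumption. The 501/499 split is hypothetical, chosen only to match the reported precision; the excerpt does not state the actual class balance or the model's predictions.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": recall,
        "precision": precision,
        "f1": f1,
        # For hard 0/1 predictions, ROC AUC reduces to (recall + specificity) / 2.
        "auc": (recall + specificity) / 2,
    }


# Hypothetical test set: 501 exploit and 499 non-exploit samples,
# scored by a classifier that labels every sample as an exploit.
m = binary_metrics(tp=501, fp=499, tn=0, fn=0)
print({name: round(value, 4) for name, value in m.items()})
```

Read this way, a recall of 1.0 on its own says little about detection quality, which is why it is worth reading recall together with precision and AUC.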
Model performance with GA-based feature selection

Model | Accuracy | Recall | Precision | F1 Score | AUC |
---|---|---|---|---|---|
AdaBoost | 0.7955 | 0.9181 | 0.7379 | 0.8182 | 0.7953 |
Decision Tree | 0.8704 | 0.9619 | 0.8135 | 0.8815 | 0.8702 |
KNN | 0.8068 | 0.8233 | 0.7977 | 0.8103 | 0.8068 |
Logistic Regression | 0.6457 | 0.4792 | 0.7200 | 0.5754 | 0.6461 |
MLP Classifier | 0.7152 | 0.9239 | 0.6591 | 0.7663 | 0.7148 |
Random Forest | 0.8704 | 0.9642 | 0.8123 | 0.8817 | 0.8702 |
SVM | 0.6400 | 0.9437 | 0.5877 | 0.7243 | 0.6394 |
XGBoost | 0.8408 | 0.9517 | 0.7794 | 0.8570 | 0.8406 |
Feature Selection Technique | Evaluation Metric | Best Classification Model | Best Value |
---|---|---|---|
SelectKBest | Accuracy | Decision Tree and Random Forest | 0.8759 |
Logistic Regression | Accuracy | Decision Tree and Random Forest | 0.8791 |
RF | Accuracy | Decision Tree and Random Forest | 0.8791 |
RFECV | Accuracy | Decision Tree and Random Forest | 0.8665 |
GA | Accuracy | Decision Tree and Random Forest | 0.8704 |
SelectKBest | Recall | Random Forest | 0.9560 |
Logistic Regression | Recall | XGBoost | 0.9402 |
RF | Recall | XGBoost | 0.9460 |
RFECV | Recall | Logistic Regression | 1.0000 |
GA | Recall | Random Forest | 0.9642 |
SelectKBest | Precision | Decision Tree | 0.8374 |
Logistic Regression | Precision | Decision Tree | 0.8587 |
RF | Precision | Decision Tree | 0.8587 |
RFECV | Precision | KNN | 0.8577 |
GA | Precision | Decision Tree | 0.8135 |
SelectKBest | F1 Score | Random Forest | 0.8850 |
Logistic Regression | F1 Score | Random Forest | 0.8847 |
RF | F1 Score | Random Forest | 0.8847 |
RFECV | F1 Score | Random Forest | 0.8823 |
GA | F1 Score | Random Forest | 0.8817 |
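The summary rows can be derived mechanically from the per-model tables. As a sketch, the snippet below takes the values from the last per-model table (the one whose maxima match the GA rows of the summary) and picks the best model or models for each metric. The `results` dictionary and the `best_models` helper are illustrative names, not part of the paper.

```python
# Per-model results copied from the table whose maxima match the GA summary rows.
results = {
    "AdaBoost":            {"Accuracy": 0.7955, "Recall": 0.9181, "Precision": 0.7379, "F1 Score": 0.8182},
    "Decision Tree":       {"Accuracy": 0.8704, "Recall": 0.9619, "Precision": 0.8135, "F1 Score": 0.8815},
    "KNN":                 {"Accuracy": 0.8068, "Recall": 0.8233, "Precision": 0.7977, "F1 Score": 0.8103},
    "Logistic Regression": {"Accuracy": 0.6457, "Recall": 0.4792, "Precision": 0.7200, "F1 Score": 0.5754},
    "MLP Classifier":      {"Accuracy": 0.7152, "Recall": 0.9239, "Precision": 0.6591, "F1 Score": 0.7663},
    "Random Forest":       {"Accuracy": 0.8704, "Recall": 0.9642, "Precision": 0.8123, "F1 Score": 0.8817},
    "SVM":                 {"Accuracy": 0.6400, "Recall": 0.9437, "Precision": 0.5877, "F1 Score": 0.7243},
    "XGBoost":             {"Accuracy": 0.8408, "Recall": 0.9517, "Precision": 0.7794, "F1 Score": 0.8570},
}


def best_models(results: dict, metric: str) -> tuple:
    """Return (best value, sorted list of models that achieve it) for a metric."""
    best = max(row[metric] for row in results.values())
    winners = sorted(model for model, row in results.items() if row[metric] == best)
    return best, winners


for metric in ("Accuracy", "Recall", "Precision", "F1 Score"):
    value, winners = best_models(results, metric)
    print(f"{metric:<10} {' and '.join(winners):<32} {value:.4f}")
```

Ties are reported as a list, which mirrors the "Decision Tree and Random Forest" entries in the accuracy rows of the summary.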
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hyassat, A.S.; Abu Zayed, R.E.; Al Khateeb, E.A.; Shalaldeh, A.; Abdelhamied, M.M.; Qaddara, I. Unveiling Cyber Threats: An In-Depth Study on Data Mining Techniques for Exploit Attack Detection. Eng. Proc. 2025, 104, 28. https://doi.org/10.3390/engproc2025104028