Detecting Malware C&C Communication Traffic Using Artificial Intelligence Techniques
Abstract
1. Introduction
1.1. Need for Detecting Banking Malware
- The need to update signature-based malware systems.
- The inability of these systems to detect newer malware variants.
- The inability to detect malware that uses sophisticated obfuscation techniques.
- The inability to detect zero-day malware.
1.2. Paper Contribution and Rationale
- Identify which of the analyzed ML algorithms performs best.
- Establish whether the features used to detect the Zeus banking malware can also be used to detect the other banking malware variants.
- Determine a minimum set of features that could be used for detecting Zeus.
- Determine a minimum set of features that could be used for detecting other variants of the Zeus malware.
- Compare the performance results of all the ML algorithms.
- Compare the classification results with other research examined in Section 2.
1.3. Overview of the Zeus Banking Malware
1.4. Overview of the Zeus Panda Banking Malware
1.5. Overview of the Ramnit Banking Malware
1.6. Banking Malware Communication (C&C) Architecture
1.7. Proposed Banking Malware Tree
2. Related Studies
- Lenient version with cost settings of 10, 20, and 30
- Strict version with cost settings of 10, 20, and 30
3. Problem Statement
- Signature-based systems are unable to detect zero-day malware or unknown malware variants.
- Signature-based systems must be updated frequently to accommodate newly emerging malware variants.
- Malware uses various obfuscation techniques to evade detection.
- There can be a time delay between discovering new malware and creating a signature to identify the malware.
- Signature databases can consume significant system resources and slow down detection.
- Modern malware can dynamically change its structure (polymorphic malware) or rewrite its code (metamorphic malware) to avoid signature-based malware systems.
- As the malware landscape evolves, maintaining and updating the signature database becomes increasingly complex.
- Effective and continuous tuning is required to reduce false positives.
- The network has to be baselined, and normal communication traffic needs to be identified.
- Network traffic must be constantly monitored.
- Malware can hide within the normal traffic flows, making these malware types difficult to detect.
- Cross-Variant Detection: To apply the trained model to identify other banking malware variants and evaluate its generalizability.
- Algorithm Performance Evaluation: To compare the detection performances of various machine learning algorithms in this context.
- Feature Optimization: To determine the minimum set of features required to achieve satisfactory prediction results, thereby optimizing computational efficiency and simplifying the detection process.
4. Research Methodology
- Obtain pcap samples of the Zeus banking malware and benign traffic.
- Extract features from the pcap samples.
- Train and test the algorithms with the data.
- Compare and discuss the results.
4.1. Machine Learning Algorithms
- Binary classification: Each data point is assigned to one of two classes. For example, an email is either spam or not spam; in detection problems the two classes are usually normal and abnormal.
- Multi-Class classification: More than two classes are involved, and each data point is assigned to exactly one of them.
- Multi-Label classification: Several classes can be predicted for the same data point. For example, a single photo could be labelled as containing both a house and a tree.
4.2. System Architecture and Methodology
- The datasets are identified and collected.
- Features are extracted from these datasets.
- The extracted features are transferred to a CSV file and prepared.
- The features are selected for training and testing.
- The algorithm is trained and tested, and a model is created. Only one dataset is used for the training.
- The model is tuned, trained, and tested again if required.
- The model is used to test and evaluate the remaining datasets.
- The final model is deployed, all the data samples are tested, and a report highlighting the evaluation metrics is created.
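The steps above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the paper's implementation: the flows, their two numeric features, and the dataset contents are invented, and a one-level decision tree (a "stump") stands in for the real algorithms. What it does preserve is the workflow described in the list: a model is trained on one dataset only and then evaluated on the remaining datasets.

```python
from typing import Dict, List, Tuple

# A flow is (feature vector, label); label 1 = malware, 0 = benign.
Flow = Tuple[List[float], int]
Model = Tuple[int, float, int]  # (feature index, threshold, label if value > threshold)


def predict(model: Model, x: List[float]) -> int:
    feature, threshold, polarity = model
    return polarity if x[feature] > threshold else 1 - polarity


def train_stump(flows: List[Flow]) -> Model:
    """Fit a one-level decision tree by exhaustive search over splits."""
    best: Model = (0, 0.0, 1)
    best_errors = len(flows) + 1
    for feature in range(len(flows[0][0])):
        for threshold in sorted({x[feature] for x, _ in flows}):
            for polarity in (1, 0):
                candidate = (feature, threshold, polarity)
                errors = sum(predict(candidate, x) != y for x, y in flows)
                if errors < best_errors:
                    best, best_errors = candidate, errors
    return best


def accuracy(model: Model, flows: List[Flow]) -> float:
    return sum(predict(model, x) == y for x, y in flows) / len(flows)


# Hypothetical, hand-made datasets (not taken from the paper).
datasets: Dict[str, List[Flow]] = {
    "Dataset1": [([0.2, 10.0], 0), ([0.3, 12.0], 0), ([5.0, 11.0], 1), ([6.0, 9.0], 1)],
    "Dataset2": [([0.1, 8.0], 0), ([4.5, 13.0], 1)],
    "Dataset3": [([0.4, 9.5], 0), ([7.0, 10.5], 1), ([0.25, 14.0], 0)],
}

# Train on one dataset only, then evaluate on every remaining dataset.
model = train_stump(datasets["Dataset1"])
report = {
    name: accuracy(model, flows)
    for name, flows in datasets.items()
    if name != "Dataset1"
}
print(model, report)
```

In this toy setup the model scores lower on the third dataset than on the second, which is the kind of generalization gap the cross-dataset evaluation step is meant to expose.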
4.3. Data Samples
4.4. Feature Selection
- Filter method—Feature selection is independent of the ML algorithm.
- Wrapper method—Features are selectively used to train the ML algorithm, and through continual experimental analysis, the best features are selected for the final model. This method can be very time-consuming.
- Hybrid—A fusion of the filter and wrapper approaches.
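As an illustration of the filter method, the sketch below ranks features by a one-way ANOVA F-score and keeps the top k, without consulting any classifier. The choice of ANOVA as the ranking criterion, the feature names, and the sample values are assumptions made for this example; the list above does not say which statistic was used.

```python
from statistics import mean
from typing import Dict, List, Sequence


def anova_f(values: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F-statistic of a single feature across the label groups."""
    groups: Dict[int, List[float]] = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(rows: List[List[float]], labels: List[int],
                  names: List[str], k: int) -> List[str]:
    """Filter method: score every feature independently, keep the k best."""
    scores = {
        name: anova_f([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)[:k]


# Hypothetical flow features; labels: 0 = benign, 1 = malware.
names = ["mean_packet_size", "flow_duration", "port_parity"]
rows = [
    [100.0, 1.0, 0.0],
    [110.0, 5.0, 1.0],
    [500.0, 2.0, 0.0],
    [520.0, 4.0, 1.0],
]
labels = [0, 0, 1, 1]

selected = select_k_best(rows, labels, names, k=1)
print(selected)
```

A wrapper method would instead retrain the classifier for each candidate feature subset, which is why it is described above as more time-consuming.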
4.5. Evaluation Approach of the Experimental Analysis
5. Results
5.1. Training and Testing the Decision Tree (DT) Machine Learning Algorithm
5.2. Training and Testing the Random Forest (RF) Machine Learning Algorithm
5.3. Training and Testing the K-Nearest Neighbor (KNN) Machine Learning Algorithm
5.4. Training and Testing Using the Ensemble Machine Learning Approach
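The outline does not state how the ensemble combines its members. Purely as an illustration, the sketch below assumes majority (hard) voting over the three classifiers named in Sections 5.1 to 5.3; the per-classifier predictions are invented.

```python
from collections import Counter
from typing import List


def hard_vote(predictions: List[List[int]]) -> List[int]:
    """Return, for each sample, the label predicted by most classifiers."""
    return [
        Counter(sample_votes).most_common(1)[0][0]
        for sample_votes in zip(*predictions)
    ]


# Hypothetical predictions for four flows (1 = malware, 0 = benign).
dt_pred = [1, 0, 1, 0]
rf_pred = [1, 1, 0, 0]
knn_pred = [0, 1, 1, 0]

ensemble_pred = hard_vote([dt_pred, rf_pred, knn_pred])
print(ensemble_pred)
```

With an odd number of binary classifiers there are no ties; soft voting would average predicted class probabilities instead of counting labels.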
5.5. Comparing the Prediction Results of All the Algorithms Tested
5.6. Comparing the Prediction Results with Previous Research
6. Conclusions
Funding
Data Availability Statement
Conflicts of Interest
Classifier | FP | FN | Accuracy |
---|---|---|---|
Kstar | 0.275 | 0.026 | 88.69 |
J48 | 0.156 | 0.026 | 92.84 |
DT | 0.14 | 0.031 | 97.47 |
Malware Variant | TP | TN | FP | FN | Accuracy |
---|---|---|---|---|---|
Zeus 1 | 14,678 | 4352 | 969 | 1 | 0.9515 |
Zeus 2 | 14,663 | 4341 | 991 | 5 | 0.9502 |
Waledac 1 | 14,536 | 4500 | 963 | 1 | 0.9518 |
Waledac 2 | 14,521 | 4525 | 963 | 1 | 0.9523 |
Storm 1 | 10,139 | 4499 | 501 | 1386 | 0.8858 |
Storm 2 | 2300 | 503 | 247 | 3 | 0.9181 |
Botnet | Average Detection Rate | Average False Alarm Rate |
---|---|---|
HTTP-based | 0.95 | 0.041 |
IRC-based | 0.96 | 0.033 |
P2P-based | 0.91 | 0.037 |
Algorithm | Recall Score | Precision Score | F-Measure Score |
---|---|---|---|
Standard | 0.556 | 0.964 | 0.705 |
Algorithm | Recall Score | Precision Score | F-Measure Score |
---|---|---|---|
Lenient with cost 10 | 0.556 | 0.964 | 0.705 |
Lenient with cost 20 | 0.667 | 0.676 | 0.671 |
Lenient with cost 30 | 0.667 | 0.686 | 0.676 |
Strict with cost 10 | 0.667 | 0.952 | 0.787 |
Strict with cost 20 | 0.611 | 0.989 | 0.755 |
Strict with cost 30 | 0.611 | 0.989 | 0.755 |
Dataset | Benign Samples Used for Training | Benign Samples Used for Testing | Malware Samples Used for Training | Malware Samples Used for Testing |
---|---|---|---|---|
Zeus-1 | 6099 | 6099 | 2614 | 2614 |
Zeus-2 | 611 | 611 | 262 | 262 |
Zeus (NETRESEC) | 252 | 252 | 108 | 108,100 |
Zeus (Snort) | 100 | 100 | 43 | 43 |
Conficker | 28,951 | 28,951 | 12,386 | 12,416 |
Torpig | 1864 | 1856 | 794 | 800 |
Dataset and Algorithm | Benign TPR | Benign FPR | Malware TPR | Malware FPR |
---|---|---|---|---|
Zeus-1—C4.5 | 86 | 17 | 83 | 14 |
Zeus-2—C4.5 | 96 | 1 | 99 | 4 |
Zeus (NETRESEC)—C4.5 | 97 | 3 | 97 | 3 |
Zeus (Snort)—C4.5 | 98 | 12 | 88 | 2 |
Zeus-1—SBB | 80 | 27 | 73 | 20 |
Zeus-2—SBB | 96 | 1 | 99 | 4 |
Zeus (NETRESEC)—SBB | 93 | 13 | 87 | 7 |
Zeus (Snort)—SBB | 98 | 2 | 98 | 2 |
Dataset and Algorithm | Benign TPR | Benign FPR | Malware TPR | Malware FPR |
---|---|---|---|---|
Zeus-1—C4.5 | 90 | 16 | 84 | 10 |
Zeus-2—C4.5 | 97 | 3 | 97 | 3 |
Zeus (NETRESEC)—C4.5 | 97 | 6 | 94 | 3 |
Zeus (Snort)—C4.5 | 97 | 1 | 99 | 3 |
Zeus-1—SBB | 73 | 18 | 82 | 27 |
Zeus-2—SBB | 94 | 0 | 100 | 6 |
Zeus (NETRESEC)—SBB | 87 | 7 | 93 | 13 |
Zeus (Snort)—SBB | 100 | 0 | 100 | 0 |
Algorithm | Test Log-Loss | Misclassification Rate (%) | Accuracy (%) |
---|---|---|---|
KNN | 0.24 | 4.5 | 95.5 |
Logistic regression | 0.528 | 12.32 | 77.68 |
Random forest | 0.085 | 2.02 | 97.98 |
XGBoost | 0.078 | 1.24 | 98.76 |
Dataset Type | Malware Name/Year | Number of Flows | Name of Dataset for This Paper |
---|---|---|---|
Malware | Zeus/2019 | 66,009 | Dataset1 |
Benign | N/A | 66,009 | Dataset1 |
Malware | Zeus/2019 | 38,282 | Dataset2 |
Benign | N/A | 38,282 | Dataset2 |
Malware | Zeus/2022 | 272,425 | Dataset3 |
Benign | N/A | 272,425 | Dataset3 |
Malware | ZeusPanda/2022 | 11,864 | Dataset4 |
Benign | N/A | 11,864 | Dataset4 |
Malware | Ramnit/2022 | 10,204 | Dataset5 |
Benign | N/A | 10,204 | Dataset5 |
Malware | Dridex/2018 | 134,998 | Dataset6 |
Benign | N/A | 134,998 | Dataset6 |
Actual Class | Predicted Benign | Predicted Zeus |
---|---|---|
Actual Benign (Total) | TN | FP |
Actual Zeus (Total) | FN | TP |
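The four cells of this confusion matrix feed the evaluation metrics reported in the results tables. As a reference sketch, assuming the standard definitions with Zeus as the positive class, they can be written as:

$$
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP}, &
\text{Recall} &= \frac{TP}{TP + FN}, \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, &
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}.
\end{aligned}
$$

The benign-class scores follow from the same formulas with the roles of the two classes swapped, so that TN plays the part of TP and FN the part of FP.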
Dataset Name | Malware Precision Score | Malware Recall Score | Malware F1-Score | Benign Precision Score | Benign Recall Score | Benign F1-Score |
---|---|---|---|---|---|---|
Dataset 1 | 1.00 | 0.95 | 0.97 | 0.95 | 1.00 | 0.97 |
Dataset 2 | 1.00 | 0.95 | 0.97 | 0.96 | 1.00 | 0.98 |
Dataset 3 | 1.00 | 0.99 | 0.99 | 0.99 | 1.00 | 0.99 |
Dataset 4 | 1.00 | 0.99 | 0.99 | 0.99 | 1.00 | 0.99 |
Dataset 5 | 0.87 | 0.97 | 0.92 | 0.97 | 0.86 | 0.91 |
Dataset 6 | 0.78 | 0.66 | 0.71 | 0.70 | 0.82 | 0.76 |
Dataset Name | Malware Total Samples Tested | Malware Samples Classified Correctly | Malware Samples Classified Incorrectly | Total Benign Samples Tested | Benign Samples Classified Correctly | Benign Samples Classified Incorrectly |
---|---|---|---|---|---|---|
Dataset 1 | 66,009 | 62,906 | 3103 | 66,009 | 65,722 | 287 |
Dataset 2 | 38,282 | 36,519 | 1763 | 38,282 | 38,152 | 130 |
Dataset 3 | 272,425 | 270,328 | 2097 | 272,425 | 271,439 | 986 |
Dataset 4 | 11,864 | 11,728 | 136 | 11,864 | 11,820 | 44 |
Dataset 5 | 10,204 | 9941 | 263 | 10,204 | 8759 | 1445 |
Dataset 6 | 134,998 | 88,500 | 46,498 | 134,998 | 110,167 | 24,831 |
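The counts in this table can be tied back to the precision, recall and F1 scores reported for the same datasets. The sketch below does this for the Dataset 1 row, assuming the usual per-class definitions; the function and variable names are illustrative only.

```python
from typing import Tuple


def class_metrics(correct: int, missed: int, false_alarms: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one class, rounded to two decimals.

    correct      -- samples of this class classified correctly
    missed       -- samples of this class classified as the other class
    false_alarms -- samples of the other class classified as this class
    """
    precision = correct / (correct + false_alarms)
    recall = correct / (correct + missed)
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 2), round(recall, 2), round(f1, 2)


# Dataset 1 row of the table above.
malware_correct, malware_wrong = 62906, 3103
benign_correct, benign_wrong = 65722, 287

# A misclassified benign flow is a false alarm for the malware class, and vice versa.
malware_scores = class_metrics(malware_correct, malware_wrong, benign_wrong)
benign_scores = class_metrics(benign_correct, benign_wrong, malware_wrong)
print(malware_scores, benign_scores)
```

The rounded results agree with the Dataset 1 row of the preceding score table: 1.00, 0.95 and 0.97 for malware, and 0.95, 1.00 and 0.97 for benign traffic.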
Dataset Name | Malware Precision Score | Malware Recall Score | Malware F1-Score | Benign Precision Score | Benign Recall Score | Benign F1-Score |
---|---|---|---|---|---|---|
Dataset 1 | 1.00 | 0.95 | 0.97 | 0.95 | 1.00 | 0.97 |
Dataset 2 | 1.00 | 0.95 | 0.97 | 0.96 | 1.00 | 0.98 |
Dataset 3 | 1.00 | 0.99 | 0.99 | 0.99 | 1.00 | 0.99 |
Dataset 4 | 1.00 | 0.99 | 0.99 | 0.99 | 1.00 | 0.99 |
Dataset 5 | 0.87 | 0.97 | 0.92 | 0.97 | 0.86 | 0.91 |
Dataset 6 | 0.78 | 0.66 | 0.71 | 0.70 | 0.82 | 0.76 |
Dataset Name | Total Malware Samples Tested | Malware Samples Classified Correctly | Malware Samples Classified Incorrectly | Total Benign Samples Tested | Benign Samples Classified Correctly | Benign Samples Classified Incorrectly |
---|---|---|---|---|---|---|
Dataset 1 | 66,009 | 65,051 | 958 | 66,009 | 66,003 | 6 |
Dataset 2 | 38,282 | 37,737 | 545 | 38,282 | 38,278 | 4 |
Dataset 3 | 272,425 | 272,276 | 149 | 272,425 | 272,401 | 24 |
Dataset 4 | 11,864 | 11,758 | 106 | 11,864 | 11,863 | 1 |
Dataset 5 | 10,204 | 9990 | 214 | 10,204 | 8852 | 1352 |
Dataset 6 | 134,998 | 88,586 | 46,412 | 134,998 | 111,428 | 23,570 |
Dataset Name | Malware Precision Score | Malware Recall Score | Malware F1-Score | Benign Precision Score | Benign Recall Score | Benign F1-Score |
---|---|---|---|---|---|---|
Dataset 1 | 1.00 | 0.90 | 0.95 | 0.91 | 1.00 | 0.95 |
Dataset 2 | 1.00 | 0.91 | 0.95 | 0.91 | 1.00 | 0.95 |
Dataset 3 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Dataset 4 | 1.00 | 0.99 | 0.99 | 0.99 | 1.00 | 0.99 |
Dataset 5 | 0.92 | 0.97 | 0.95 | 0.97 | 0.92 | 0.95 |
Dataset 6 | 0.85 | 0.50 | 0.63 | 0.65 | 0.91 | 0.76 |
Dataset Name | Total Malware Samples Tested | Malware Samples Classified Correctly | Malware Samples Classified Incorrectly | Total Benign Samples Tested | Benign Samples Classified Correctly | Benign Samples Classified Incorrectly |
---|---|---|---|---|---|---|
Dataset 1 | 66,009 | 59,476 | 6533 | 66,009 | 66,003 | 6 |
Dataset 2 | 38,282 | 34,659 | 3623 | 38,282 | 38,278 | 4 |
Dataset 3 | 272,425 | 272,423 | 2 | 272,425 | 272,401 | 24 |
Dataset 4 | 11,864 | 11,719 | 145 | 11,864 | 11,863 | 1 |
Dataset 5 | 10,204 | 9939 | 265 | 10,204 | 9397 | 807 |
Dataset 6 | 134,998 | 68,156 | 66,842 | 134,998 | 123,232 | 11,766 |
Dataset Name | Malware Precision Score | Malware Recall Score | Malware F1-Score | Benign Precision Score | Benign Recall Score | Benign F1-Score |
---|---|---|---|---|---|---|
Dataset 1 | 1.00 | 0.95 | 0.97 | 0.95 | 1.00 | 0.97 |
Dataset 2 | 1.00 | 0.95 | 0.97 | 0.96 | 1.00 | 0.98 |
Dataset 3 | 1.00 | 0.99 | 0.99 | 0.99 | 1.00 | 0.99 |
Dataset 4 | 1.00 | 0.99 | 0.99 | 0.99 | 1.00 | 0.99 |
Dataset 5 | 0.87 | 0.97 | 0.92 | 0.97 | 0.86 | 0.91 |
Dataset 6 | 0.78 | 0.66 | 0.71 | 0.70 | 0.82 | 0.76 |
Dataset Name | Total Malware Samples Tested | Malware Samples Classified Correctly | Malware Samples Classified Incorrectly | Total Benign Samples Tested | Benign Samples Classified Correctly | Benign Samples Classified Incorrectly |
---|---|---|---|---|---|---|
Dataset 1 | 66,009 | 65,051 | 958 | 66,009 | 66,003 | 6 |
Dataset 2 | 38,282 | 37,737 | 545 | 38,282 | 38,278 | 4 |
Dataset 3 | 272,425 | 272,276 | 149 | 272,425 | 272,401 | 24 |
Dataset 4 | 11,864 | 11,758 | 106 | 11,864 | 11,863 | 1 |
Dataset 5 | 10,204 | 9990 | 214 | 10,204 | 8852 | 1352 |
Dataset 6 | 134,998 | 88,586 | 46,412 | 134,998 | 111,428 | 23,570 |
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kazi, M.A. Detecting Malware C&C Communication Traffic Using Artificial Intelligence Techniques. J. Cybersecur. Priv. 2025, 5, 4. https://doi.org/10.3390/jcp5010004