Article

An Investigation to Detect Banking Malware Network Communication Traffic Using Machine Learning Techniques

Old Royal Naval College, The University of Greenwich, Park Row, London SE10 9LS, UK
*
Author to whom correspondence should be addressed.
J. Cybersecur. Priv. 2023, 3(1), 1-23; https://doi.org/10.3390/jcp3010001
Submission received: 11 November 2022 / Revised: 7 December 2022 / Accepted: 19 December 2022 / Published: 27 December 2022
(This article belongs to the Special Issue Secure Software Engineering)

Abstract
Banking malware are malicious programs that attempt to steal confidential information, such as banking authentication credentials, from users. Zeus is one of the most widespread banking malware variants ever discovered. Since the Zeus source code was leaked, many other variants of Zeus have emerged, and tools such as anti-malware programs exist that can detect Zeus; however, these have limitations. Anti-malware programs need to be regularly updated to recognise Zeus, and the signatures or patterns can only be made available when the malware has been seen. This limits the capability of these anti-malware products because they are unable to detect unseen malware variants, and furthermore, malicious users are developing malware that seeks to evade signature-based anti-malware programs. In this paper, a methodology is proposed for detecting Zeus malware network traffic flows by using machine learning (ML) binary classification algorithms. This research explores and compares several ML algorithms to determine the algorithm best suited for this problem and then uses these algorithms to conduct further experiments to determine the minimum number of features that could be used for detecting the Zeus malware. This research also explores the suitability of these features when used to detect both older and newer versions of Zeus as well as when used to detect additional variants of the Zeus malware. This will help researchers understand which network flow features could be used for detecting Zeus and whether these features will work across multiple versions and variants of the Zeus malware.

1. Introduction

Cybercrime is a major threat to cybersecurity [1], and [2] estimates that the yearly cost of cybercrime could rise to USD 10.5 trillion by the year 2025, with a significant proportion of this related to malware such as banking malware. Banking malware attacks have also been increasing on a yearly basis, rising by 80% in 2021 alone according to [3]. One of these banking variants, specifically, the Zeus malware (from hereon referred to as Zeus), has become one of the most prevalent banking malware variants ever discovered [4]. Furthermore, in 2011, the Zeus program code was made public [5], allowing malware developers to create additional variants of Zeus and to develop additional modules for the Zeus malware [6]. Since the Zeus code was leaked, many variants of Zeus have emerged, including ZeusPanda, Ramnit and Citadel.

1.1. Need for Malware Detection

As the number of malware and their variants increases rapidly and they become more sophisticated and prevalent [7], additional modern techniques need to be developed to detect these malware variants, and [7] highlights the importance of using AI to detect malware. The authors of [8] also discuss the limitations of other malware detection approaches, such as detecting malicious patterns in executables and using heuristic-based and statistical approaches, and recommend that researchers use machine learning and deep learning approaches to address these limitations. Signature-based malware detection systems also exist, but these systems have their own limitations; for example, they can only detect known malware [9].
This paper proposes a framework and methodology to detect malware and benign traffic using machine learning and deep learning algorithms. The main contribution of this paper is a methodology for detecting the Zeus banking malware and differentiating it from benign traffic using binary classification machine learning algorithms. This paper compares three binary classification algorithms to determine which provides the best detection results when used to distinguish Zeus from benign traffic, and also determines the minimum number of features that could be used to detect Zeus and benign traffic. Researchers [10,11,12,13] have discussed and proposed several supervised machine learning (ML) algorithms that could be used for analysing this type of problem, and this paper uses three of them: the random forest ML algorithm, the decision tree ML algorithm and the keras neural network (KNN) deep learning algorithm. This paper aims to:
  • Determine a methodology that can be used by deep learning and machine learning algorithms for detecting the Zeus malware.
  • Determine which ML algorithm produces the best detection results.
  • Determine whether the features that produce the best detection results on one dataset will work on other datasets from other sources.
  • Determine a minimum set of features that could be used for detecting Zeus.
  • Determine whether the features that produce the best detection results work across newer and older versions of Zeus.
  • Determine whether the features that produce the best detection results when detecting Zeus also work on additional variants of the Zeus malware.

1.2. Zeus Malware Architecture

An important feature of the Zeus malware is the way that it communicates, as it uses command and control channels (C&C) for this purpose. The author of [14] has discussed the various phases of the C&C communication, which can be seen in Figure 1. This communication can occur using either a centralised or a peer-to-peer architecture, with the peer-to-peer architecture being more robust and resilient [15]. This is because if the central C&C server becomes unreachable or is taken down, the Zeus bots will not be able to communicate with the C&C server, preventing the bots from receiving commands, updating themselves and downloading new configuration files [16]. Newer variants of Zeus use the P2P C&C architecture. These are more resilient to takedown efforts because the configuration file does not point to a static C&C server [17]. Instead, the C&C server information is obtained from a peer (proxy bot), which can be updated if the C&C server is taken down or becomes unreachable [18]. Stolen data is routed through the C&C network to the malware authors’ C&C server, where the stolen data is decrypted and saved to a database [19].
As discussed by [20], Zeus propagates like a virus, mainly infecting Windows systems and predominantly, the infection vector occurs via phishing emails, which is a significant distribution mechanism for malware. Research by [21] has discussed this in detail, and states that around 90 percent of data breaches are caused by phishing. Once the Zeus binary executes on a Windows system, it performs several actions. One of these is to create two files called local.ds and user.ds. Local.ds is the dynamic configuration of the file downloaded from the command and control (C&C) server, while the user.ds stores stolen credentials and other information that needs to be transmitted back to the C&C server [22]. Additional code is injected into svchost and is responsible for network communications. Svchost is also responsible for injecting malicious code into many Windows processes, which provide Zeus with the ability to steal credentials and launch financial attacks.

2. Related Studies

Bothunter [23] is a perimeter scanning system that uses three sensors and a correlation engine to identify malicious traffic flows between an infected host and a malicious entity. It is built on top of the open-source platform SNORT and tracks the various stages of the malware communication flow, correlating both inbound and outbound traffic to identify malware traffic. Bothunter uses two plugins, SLADE and SCADE; SCADE's role is to analyse the communication flows to identify traffic patterns that can be considered harmful. These traffic patterns include:
  • Hosts that frequently scan external IP addresses.
  • Outbound connection failures.
  • An evenly distributed communication pattern, which is likely to indicate that the communication is malicious.
SLADE's role is to analyse network packets and alert the administrator if a packet deviates from an established profile. SLADE was developed using PAYL [24], which allows it to examine 256 features of a packet and use this information to determine whether the packet is malicious.
Botminer [25] is a tool that was designed to detect groups of compromised computers, and this is achieved by monitoring network communication flows using two modules, a C-plane module, and an A-plane module. The C-plane’s role is to log network traffic to identify all the hosts that are communicating, and the A-plane’s role is to identify what these hosts are doing. Features extracted from both these modules can be used to identify communication patterns that are similar between hosts and if these communication patterns are malicious, it is indicative that a particular group of hosts are communicating maliciously. The A-plane module is based on Bothunter’s [23] SCADE module and can analyse communications to determine malicious communication patterns [25].
CONIFA [26] uses machine learning to detect malware communication traffic, training and testing on the Zeus malware using the correlation-based feature selection (CFS) algorithm with the C4.5 classification algorithm. To improve CONIFA's accuracy and prediction results, [26] created a cost-sensitive variant of the C4.5 classification algorithm, which uses a lenient and a strict classifier, and compared the prediction results to a standard machine learning framework using a cost-insensitive version of the C4.5 algorithm. The standard framework's detection rate was good when evaluating the training dataset; however, when evaluating the test data, the recall rate dropped to 56%. CONIFA's results demonstrated an improvement in detection accuracy, with the recall rate increasing to 67%.
The RCC Detector (RCC) [27] analyses network traffic flowing from a host to identify any malware communication traffic. To do this, the RCC [27] uses a multi-layer perceptron (MLP) and a temporal persistence (TP) classifier. The MLP classifier is made up of an input layer, an output layer and one hidden layer [27], and these are used to classify botnets using several characteristics, including, the Flow count, session length, uniformity score and the Kolmogorov–Smirnov Test.
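The Kolmogorov–Smirnov test used by the RCC classifier above compares two empirical distributions, which makes it useful for separating the highly regular timing of bot C&C traffic from more variable human-driven traffic. A minimal sketch using SciPy's `ks_2samp`, with synthetic inter-arrival times standing in for real flow measurements:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Synthetic inter-arrival times (seconds): organic traffic is bursty and
# roughly exponential, while beaconing C&C traffic is tightly periodic.
human_traffic = rng.exponential(scale=2.0, size=200)
beaconing_traffic = rng.normal(loc=5.0, scale=0.1, size=200)

# The KS statistic measures the maximum distance between the two empirical
# CDFs; a large value flags the beacon-like flow as anomalous.
statistic, p_value = ks_2samp(human_traffic, beaconing_traffic)
```

A classifier such as RCC's would use the statistic (or a score derived from it) as one input feature alongside flow count and session length.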
The multi-layer feed forward network (MLFFN) [28] is a tool that extracts TCP features from the TCP connections originating from a host computer and uses these to identify botnet communication traffic. MLFFN [28] consists of an input layer made up of six neurons and an output layer made up of four neurons. MLFFN was tested on four datasets, namely, Zeus-1, Zeus-2, Spyeye-1 and Spyeye-2, and it is worth noting that these are all older versions of the Zeus malware.
Genetic programming (GP) [29] used the Symbiotic Bid-Based (SBB) algorithm and the C4.5 machine learning algorithm to identify unique botnet communication patterns; to do this, features were extracted from the communication flows of three malware variants, namely Zeus, Conficker and Torpig. The features were extracted using Softflowd [30], and the authors of [29] were able to categorise these three malware variants. It is worth noting that the results are based on the usage of older versions of the malware variants.
MOCA [31] uses a two-stage monitoring and classification system to detect and classify malicious attacks. It does this by identifying behaviours within the network flows that are outside of the normal range (abnormal) and this part of the MOCA system is classed as the stage one classifier of the MOCA system. These abnormal behaviours are then sent to the stage two classifier, which attempts to classify the attacks into a class such as a DDoS attack in an IoT network or a Botnet attack. Two datasets were used for testing, CICIDS2017 and CICDDOS2019, and the accuracy achieved was 99.84% for CICIDS2017 and 93% for the CICDDOS2019 dataset. The algorithms used in this research include the decision tree, random forest and XGBoost ML algorithms.

3. Problem Statement

This paper intends to develop a framework and methodology that uses machine learning techniques to detect malware. Other methodologies exist and have been used by many researchers to detect malware. These include anomaly-based detection approaches such as those discussed by [32,33], and signature-based approaches such as those discussed by [34,35]; however, these do have drawbacks, and these are highlighted in [36]. For example, signature based-systems need to be updated regularly to cater for newly emerging malware variants, and signature-based systems are not able to detect unknown malware variants or zero-day malware.
Machine learning can help address many of these issues [37], and this paper develops a framework and approach using machine learning that can detect several banking malware variants. Although other researchers [26,27,28] have done some experimental work on detecting malware, there is little to no research that aims to detect a range of malware variants by training on only one dataset, i.e., one malware variant. This research uses only one dataset for training and uses it to build a machine learning model. This model is then used to detect multiple banking malware variants and to distinguish between benign and malware communication traffic. This research also analyses banking malware variants that have emerged recently as well as those that have been in circulation for years, which should ensure that both the older and newer versions of the banking malware are detectable by the machine learning algorithms.

4. Research Methodology

This research paper aims to classify network traffic flows as either Zeus (malware) or benign (good). For this research, the raw network traffic samples were collected as pcap files, and each pcap file is made up of network flows, which refers to a sequence of packets flowing between a source and a destination host. In this paper, the flows are referred to as ML samples and the features are extracted from these samples.

4.1. Data Collection and Preparation

Figure 2 depicts the data collection and preparation steps and is discussed further in this section. To prepare the data for the ML algorithms, the features were extracted from the samples using Netmate-flowcalc (NF), a tool developed by [38], and were then exported into a CSV file. NF was used because it is an open-source tool that can extract the features required by the ML algorithms and has also been used by other researchers [39,40,41,42]. A total of 44 features were extracted by NF (see Appendix A for a brief description of the features), and the features from the benign and Zeus flows were extracted into separate CSV files and labelled. A label of ‘1’ was applied to the Zeus samples and a label of ‘0’ was applied to the benign samples. The two files were then combined into one CSV file, and this was used for the empirical analysis conducted during this research.
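The labelling and combining step described above can be sketched with pandas; the frames below are small stand-ins for the two CSV files that Netmate-flowcalc would export (in practice they would be loaded with `pd.read_csv`), and the column names follow the NF feature names used in this paper:

```python
import pandas as pd

# Stand-in frames for the NF-exported CSV files of Zeus and benign flows.
zeus = pd.DataFrame({"total_fpackets": [12, 7], "duration": [3100, 870]})
benign = pd.DataFrame({"total_fpackets": [4, 9], "duration": [150, 2200]})

zeus["label"] = 1      # '1' marks Zeus flows
benign["label"] = 0    # '0' marks benign flows

# Combine the two labelled sets into the single dataset used for the analysis.
combined = pd.concat([zeus, benign], ignore_index=True)
```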

4.2. Feature Selection

One of the main issues in ML is selecting the appropriate features for the ML algorithm, and the criticality of this has been discussed by many researchers such as [43,44]. Selecting the right features has the following benefits:
  • Variance (overfitting) is reduced.
  • Computational cost and the time for running the algorithm is reduced.
  • Enables the ML algorithm to learn faster.
There are several techniques that can be used for selecting the appropriate and best features and [45,46] discuss these in detail. For example, two of these techniques are:
  • Filter method—Feature selection is independent of the ML algorithm.
  • Wrapper method—A subset of the features are selected and used to train the ML algorithm. Based on the results, features are either removed or added until the best features are determined.
For this research, the features were studied [47,48,49,50] and based on this, the features were divided into two groups, called Feature set1 and Feature set2, and only the features from Feature set1 were used during this research. Feature set2 contained those features that were not used during this research and were excluded. This was because these features could potentially be related to the characteristics of the network from which the packets were extracted, resulting in the ML algorithm making false correlations. For example, if the benign and malware traffic came from a particular IP address range, the ML algorithm might use the IP address information to make predictions. Table 1 shows the features that were excluded (Feature set2). All the remaining features were included in Feature set1 and were used during this research. These features are: total_fpackets, total_fvolume, total_bpackets, total_bvolume, min_fpktl, mean_fpktl, max_fpktl, std_fpktl, min_bpktl, mean_bpktl, max_bpktl, std_bpktl, sflow_fpackets, sflow_fbytes, sflow_bpackets, sflow_bbytes, fpsh_cnt, bpsh_cnt, furg_cnt, burg_cnt, total_fhlen, total_bhlen, duration, min_active, mean_active, max_active, std_active, min_idle, mean_idle, max_idle and std_idle.
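Splitting the features into the two sets can be implemented as a simple column drop. The Feature set2 column names below are illustrative stand-ins for network-identifying fields (the paper's Table 1 gives the actual excluded list):

```python
import pandas as pd

# Illustrative Feature set2 columns: fields that identify the capture network
# rather than describe the flow's behaviour (Table 1 lists the real set).
FEATURE_SET2 = ["srcip", "srcport", "dstip", "dstport", "proto"]

def select_feature_set1(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the excluded Feature set2 columns, keeping the behavioural features."""
    return df.drop(columns=[c for c in FEATURE_SET2 if c in df.columns])

flows = pd.DataFrame({
    "srcip": ["10.0.0.1"], "dstport": [443],
    "duration": [1200], "mean_fpktl": [512.0], "label": [1],
})
clean = select_feature_set1(flows)
```

Dropping these columns before training prevents the ML algorithm from correlating predictions with, for example, a particular IP address range.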

4.3. Datasets (Samples)

This paper analyses and compares the performance of the ML algorithms using nine datasets obtained from four locations. One location was Zeustracker [51], a website that monitors Zeus C&C activities, and these samples were downloaded on 4 February 2019. The other datasets were obtained from Stratosphere, Abuse.ch and Dalhousie University, and these datasets are a combination of older and newer versions of the Zeus malware and three other variants of the Zeus malware, which are ZeusPanda, Ramnit and Citadel. Stratosphere [52] specialises in collecting malware and benign traffic captures, and they have multiple datasets which have been made available for research purposes. Abuse.ch is a research project that identifies and tracks malware and botnets, and is a platform integrated with many commercial and open-source platforms, including VirusTotal, ClamAV, Kaspersky and Avast [53]. Dalhousie University has botnet samples that are available for download; these samples are part of the NIMS botnet research dataset and have been used by other researchers [54]. Table 2 describes the datasets that were used for the research reported in this paper.

4.4. Machine Learning Algorithms

The ML algorithms used for this research are discussed in this section. They are supervised machine learning algorithms, as these are the most suitable for classification problems, as discussed by [55]. The machine learning algorithms used during this research are the decision tree (DT) algorithm, the random forest (RF) algorithm and the keras neural network (KNN) deep learning algorithm.
The decision tree algorithm is a common machine learning algorithm that can be used for classification problems [56] and is especially useful when used for binary classification problems [56]. For this reason, the decision tree algorithm is well suited for this prediction problem because this analysis is trying to determine if the network flow is malicious (Zeus banking malware traffic), or benign. The authors of [57] also state that the decision tree algorithm can produce good prediction results.
The random forest (RF) algorithm works by building and combining multiple decision trees [58]. It can be more efficient and provide better prediction results than the decision tree algorithm [59], and it reduces the possibility of overfitting [60]. It is important to tune the parameters to try to increase the prediction accuracy when using the RF algorithm; however, it is difficult to predict the best parameters ahead of time, as they are selected by trial and error. One of these parameters is the number of trees built during the training and testing of the data. The authors of [61] state that building more than 128 trees provides no significant gain in accuracy and can increase costs, and that the optimum number of trees for the random forest classifier lies between 64 and 128. For this empirical analysis, the random forest algorithm was coded to build between 64 and 128 decision trees, and once the training was complete, the optimal number of trees was selected based on the best prediction results.
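The tree-count search described above can be sketched with scikit-learn's `RandomForestClassifier`; the synthetic data, the step size of the search and the use of the f1-score as the selection criterion are assumptions, not the paper's exact setup:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Dataset1 flows with Feature set1.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Search the 64-128 tree range suggested by [61] and keep the best forest.
best_score, best_model = -1.0, None
for n_trees in range(64, 129, 8):
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    rf.fit(X_tr, y_tr)
    score = f1_score(y_te, rf.predict(X_te))
    if score > best_score:
        best_score, best_model = score, rf
```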
Keras is a popular neural network library implemented in Python [62] and can be used for classification problems such as the one examined during this research [63]. The keras neural network (KNN) deep learning algorithm was used for training and testing the datasets and for this empirical analysis, a sequential KNN model [64] was used, which means that the output of one layer is input into the next layer. For this research, the deep learning model consisted of one input layer, three hidden layers and one output layer, and a graphical representation of this can be seen in Figure 3. It is important to note that only one of the datasets was used for training and the remaining datasets were used for testing.
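A sequential model with one input layer, three hidden layers and one output layer, as described above, might look like the following sketch; the layer widths, activations and optimiser are illustrative guesses, since they are not specified here, and 31 matches the size of Feature set1:

```python
from tensorflow import keras
from tensorflow.keras import layers

n_features = 31  # number of Feature set1 flow features

# One input layer, three hidden layers and a single sigmoid output unit for
# the binary Zeus/benign decision. Widths and activations are assumptions.
model = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```

In a sequential model, each layer's output feeds directly into the next layer, matching the description above.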

4.5. System Architecture and Methodology

The system architecture is depicted in Figure 4 and shows the steps that are completed to prepare the samples for the ML algorithms. These include:
  • The datasets are identified and collected.
  • Features are extracted from these datasets.
  • The extracted features are transferred to a CSV file and prepared.
  • The features are selected for training and testing.
  • The algorithm is trained and tested, and a model is created. Only one dataset is used for the training.
  • The model is tuned and trained and tested again if required.
  • The model is used to test and evaluate the remaining datasets.
  • Deploy the final model, test all the data samples and create a report highlighting the evaluation metrics.
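The train-once/test-many design of the steps above can be sketched as follows; scikit-learn's random forest and synthetic datasets stand in for the paper's models and the nine datasets of Table 2:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def train_once_test_many(train_set, test_sets):
    """Fit one model on a single dataset, then evaluate it on every other
    dataset, mirroring the train-once/test-many pipeline design."""
    X, y = train_set
    model = RandomForestClassifier(random_state=0).fit(X, y)
    return {name: accuracy_score(ys, model.predict(Xs))
            for name, (Xs, ys) in test_sets.items()}

# Stand-in datasets; in the paper these are the nine flow datasets of Table 2.
dataset1 = make_classification(n_samples=300, n_features=10, random_state=1)
dataset2 = make_classification(n_samples=300, n_features=10, random_state=2)
scores = train_once_test_many(dataset1, {"dataset2": dataset2})
```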
Figure 4. System Design.

4.6. Evaluation

Precision, recall and f1-score evaluation metrics [65] are used to determine the accuracy of the ML algorithms. Precision is the percentage of samples predicted as positive that are truly positive [65], which in this case is the proportion of flows classified as malware that really are malware. Recall is the percentage of the truly positive samples that are correctly identified [66], which in this case is the proportion of malware samples detected. The formulas to calculate precision and recall are:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
The f1-score is another measure used for evaluation, and this considers both the positive and negative cases. The author of [67] states that the precision and recall are both combined during the evaluation of the ML algorithm. The formula to calculate the f1-Score is set out below:
F1-Score = 2 × (Precision × Recall) / (Precision + Recall)
A confusion matrix [67] will also be generated, and an example of this is shown in Table 3. The confusion matrix will be used to measure the performance and prediction accuracy of the algorithm when tested and evaluated on the unseen datasets, and it will identify how many Zeus and benign samples were correctly identified.
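These metrics and the confusion matrix can be computed directly with scikit-learn, which is a convenient way to check the formulas above against a toy set of predictions:

```python
from sklearn.metrics import (confusion_matrix, f1_score,
                             precision_score, recall_score)

# Toy labels and predictions: 1 = Zeus, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

precision = precision_score(y_true, y_pred)   # TP / (TP + FP) = 3/4
recall = recall_score(y_true, y_pred)         # TP / (TP + FN) = 3/4
f1 = f1_score(y_true, y_pred)                 # harmonic mean of the two

# The confusion matrix gives the raw counts behind the three metrics.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
```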

5. Results

This section presents the training and testing results of the three algorithms and compares the prediction results.

5.1. Training and Testing the Machine Learning Algorithms Using the Data Sets

The DT and RF algorithms were trained on Dataset1, using all the features from Feature set1, and a model was created and used for testing all the remaining datasets. The precision, recall and f1-score results for the DT algorithm can be seen in Table 4, and those for the RF algorithm can be seen in Table 5. A confusion matrix was generated for testing all the datasets, and the results can be seen in Table 6 and Table 7. These tables show the number of Zeus samples tested, how many of the Zeus samples were correctly classified (true positives) and how many were misclassified (false negatives). They also show the number of benign samples tested, how many of these were classified correctly (true negatives) and how many were misclassified (false positives).

5.2. Training and Testing the Deep Learning Algorithm Using the Data Sets

The DL algorithm was also trained in a similar manner and the precision, recall and f1-score results can be seen in Table 8, and the confusion matrices can be seen in Table 9.

5.3. Comparing the Prediction Results of the Three Algorithms Tested

The results obtained from testing the three algorithms are all compared in this section. Figure 5 shows the true positive results of all the algorithms when tested against all the datasets, and Figure 6 shows the true negative results when tested against all the datasets.
The Zeus malware prediction accuracies for dataset1, dataset2, dataset3, dataset7, dataset8 and dataset9 were all above 90%, with the random forest algorithm performing the best, achieving an average prediction accuracy of 97% across these datasets. The three 2014 Zeus datasets (dataset4, dataset5 and dataset6) produced mixed results; the deep learning algorithm performed better than the other two for dataset6, with a detection result of 86%, while for dataset4 and dataset5, the random forest algorithm performed the best, with results of 71% and 74%, respectively.
For the benign traffic, the prediction results showed that for dataset1, dataset2, dataset3, dataset5, dataset6, dataset7, dataset8 and dataset9, the prediction accuracy for all the algorithms was above 90%, with the random forest algorithm performing the best, achieving an average prediction accuracy of 98% across these datasets. For dataset4 and dataset8, the deep learning algorithm performed best, with results of 95% and 94%, respectively, while the decision tree algorithm had the lowest prediction, with a result of 87% for both these datasets.
This paper has demonstrated a methodology that could be used to detect the Zeus malware and its variants and has demonstrated that the methodology works across multiple datasets and three other variants of the Zeus malware. The next section (Section 5.4) investigates the impact on the prediction accuracy when the number of features used during training and testing is reduced.

5.4. Reducing the Features to the Minimum Number of Possible Features

Multiple experiments were conducted by reducing the features from Feature set1, and this section investigates the prediction accuracy for both the malware and benign traffic as the number of features is reduced. To do this, the ML algorithms were trained and tested using dataset1, and the impact rating of each feature was determined and then used to establish which features have the highest and lowest impact ratings. Some of these impact ratings can be seen in Figure 7; for example, the mean_active feature has an impact rating of 13.103% and max_bpktl has an impact rating of 6.025%. Analysing the features in this way supported the systematic removal of the features, and this process, shown in Figure 8, is described here.
  • Remove the feature with the lowest impact score.
  • Train the dataset with this feature redacted.
  • Test the remaining datasets.
  • Calculate the prediction accuracy and record the results.
  • Remove the next-lowest-impact feature and re-train the dataset.
  • Test the remaining datasets.
  • Calculate the prediction accuracy and record these results.
  • Repeat this process until the accuracy for two of the datasets falls below 50% during testing, as this would mean that more than half of the Zeus samples were misclassified for two or more of the datasets.
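The redaction process above can be sketched as follows, with a random forest's `feature_importances_` standing in for the impact ratings and a single synthetic held-out set standing in for the multiple test datasets:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the paper uses dataset1 for training.
X, y = make_classification(n_samples=400, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
features = list(range(X.shape[1]))

# Repeatedly retrain, re-test and drop the lowest-impact feature, stopping
# when accuracy falls below the 50% threshold or one feature remains.
while len(features) > 1:
    model = RandomForestClassifier(random_state=0).fit(X_tr[:, features], y_tr)
    acc = accuracy_score(y_te, model.predict(X_te[:, features]))
    if acc < 0.5:
        break
    # feature_importances_ provides the per-feature impact rating.
    weakest = features[int(np.argmin(model.feature_importances_))]
    features.remove(weakest)

minimum_features = features
```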
Figure 7. Feature impact scores.
Figure 8. Flow diagram showing the feature redaction process.

5.5. Training and Testing with the Minimum Number of Features with the DL Algorithm

Following the process discussed in Section 5.4, it was determined that the minimum number of features that could be used by the DL algorithm are as follows: total_fvolume, total_bpackets, total_bvolume, min_fpktl, mean_fpktl, max_fpktl, std_fpktl, min_bpktl, mean_bpktl, max_bpktl, std_bpktl, sflow_fbytes, sflow_bbytes, bpsh_cnt, duration, min_active, mean_active, max_active, min_idle and max_idle. The precision, recall and f1-score results can be seen in Table 10 and the confusion matrices can be seen in Table 11.
Figure 9 compares the results of detecting the Zeus malware between using all the features and the minimum number of features. The prediction results of dataset1, dataset2, dataset3, dataset4, dataset7, dataset8 and dataset9 were all within 5% of each other, dataset5 was within 9% and dataset6 was within 12%. Figure 10 compares the results of detecting the benign samples between using all the features and the minimum number of features when tested with the deep learning algorithm and shows that the prediction results of all the datasets were within 1% of each other.

5.6. Training and Testing Using the Minimum Number of Features with the DT Algorithm

Similar experiments were conducted using the DT algorithm and it was determined that the minimum number of features that could be used by the DT algorithm are: total_fvolume, total_bpackets, total_bvolume, min_fpktl, mean_fpktl, max_fpktl, min_bpktl, mean_bpktl, max_bpktl, std_bpktl, sflow_fbytes, sflow_bbytes, furg_cnt, burg_cnt, duration, min_active, mean_active, max_active, min_idle and max_idle.
The precision, recall and f1-score results can be seen in Table 12, and the confusion matrices can be seen in Table 13. Figure 11 compares the results of detecting the Zeus malware between using all the features and the minimum number of features and shows that the prediction results of dataset1, dataset2, dataset3, dataset4, dataset6, dataset7, dataset8 and dataset9 were all within 5% of each other, and dataset5 was within 8%. Figure 12 compares the results of detecting the benign samples between using all the features and the minimum number of features and shows that the prediction results of all the datasets are within 1% of each other.

5.7. Training and Testing Using the Minimum Number of Features with the RF Algorithm

Multiple experiments were conducted using the RF algorithm and the features were manually reduced by following the process described above (Section 5.4). This process was repeated until two of the dataset prediction results fell below 50% and it was determined that the minimum number of features that could be used are as follows: total_fvolume, total_bvolume, min_fpktl, mean_fpktl, max_fpktl, min_bpktl, mean_bpktl, max_bpktl, std_bpktl, sflow_fbytes, sflow_bbytes, bpsh_cnt, duration, min_active, mean_active, max_active and min_idle. The precision, recall and f1-score results for testing all the datasets using the minimum number of features with the RF algorithm can be seen in Table 14 and the confusion matrices can be seen in Table 15.
Figure 13 compares the results of detecting the Zeus malware between using all the features and the minimum number of features when tested with the RF algorithm, and shows that the prediction results of dataset1, dataset2, dataset3, dataset7, dataset8 and dataset9 were all within 5% of each other, and that dataset4, dataset5 and dataset6 were within 16% of each other. When the benign detection results were compared between using all the features and the minimum number of features, the prediction results of all the datasets were again within 1% of each other.
Figure 14 compares the true positive results of all three algorithms when tested using the minimum number of features; the malware prediction results for all the datasets apart from dataset6 were within 10% of each other. Dataset6 was an outlier with a difference of 36%, with the DL algorithm performing the best (a prediction result of 74%) and the DT algorithm performing the worst (38%). Figure 15 compares the true negative results of all three algorithms when tested using the minimum number of features and shows that the benign prediction results of all the datasets were within 2% of each other.
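The dataset6 outlier can be verified from the minimum-feature confusion matrices (Tables 11 and 13), since both algorithms were tested on the same 6049 malware flows. A short check in plain Python:

```python
# Dataset6, minimum feature set: malware flows correctly detected.
TOTAL = 6049    # total malware flows in dataset6
TP_DL = 4499    # deep learning (Table 11)
TP_DT = 2328    # decision tree (Table 13)

rate_dl = TP_DL / TOTAL   # ~74%
rate_dt = TP_DT / TOTAL   # ~38%
print(f"DL {rate_dl:.0%}  DT {rate_dt:.0%}  gap {100 * (rate_dl - rate_dt):.0f} points")
```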

6. Conclusions

The empirical analysis has shown that the framework and methodology adopted for this research can detect both older and newer versions of the Zeus banking malware, which demonstrates their potential to detect banking malware that evolves over time. The framework and methodology can also detect other banking malware variants, which demonstrates the potential to cover a wide range of banking malware without the need to analyse each variant to learn its features.
For future work, there is potential to extend this research by incorporating additional banking malware variants into the methodology. Further research could also target other malware families and improve the prediction accuracy when detecting them. The findings could inform the design of an IDS solution capable of detecting a wide range of malware and could be used by anti-malware vendors when designing detection tools; action could then be taken on the malicious traffic once the malware has been detected. Other researchers can also build on these findings to develop their own malware prediction tools.

Author Contributions

Conceptualization, M.A.K.; methodology, M.A.K.; software, M.A.K.; validation, M.A.K., S.W. and D.G.; formal analysis, M.A.K.; investigation, M.A.K.; resources, M.A.K.; data curation, M.A.K.; writing—original draft preparation, M.A.K.; writing—review and editing, M.A.K., S.W. and D.G.; visualization, M.A.K.; supervision, S.W. and D.G.; project administration, M.A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Feature | Description
srcip | (string) The source IP address
srcport | The source port number
dstip | (string) The destination IP address
dstport | The destination port number
proto | The protocol (i.e., TCP = 6, UDP = 17)
total_fpackets | Total packets in the forward direction
total_fvolume | Total bytes in the forward direction
total_bpackets | Total packets in the backward direction
total_bvolume | Total bytes in the backward direction
min_fpktl | The size of the smallest packet sent in the forward direction (in bytes)
mean_fpktl | The mean size of packets sent in the forward direction (in bytes)
max_fpktl | The size of the largest packet sent in the forward direction (in bytes)
std_fpktl | The standard deviation from the mean of the packets sent in the forward direction (in bytes)
min_bpktl | The size of the smallest packet sent in the backward direction (in bytes)
mean_bpktl | The mean size of packets sent in the backward direction (in bytes)
max_bpktl | The size of the largest packet sent in the backward direction (in bytes)
std_bpktl | The standard deviation from the mean of the packets sent in the backward direction (in bytes)
min_fiat | The minimum amount of time between two packets sent in the forward direction (in microseconds)
mean_fiat | The mean amount of time between two packets sent in the forward direction (in microseconds)
max_fiat | The maximum amount of time between two packets sent in the forward direction (in microseconds)
std_fiat | The standard deviation from the mean amount of time between two packets sent in the forward direction (in microseconds)
min_biat | The minimum amount of time between two packets sent in the backward direction (in microseconds)
mean_biat | The mean amount of time between two packets sent in the backward direction (in microseconds)
max_biat | The maximum amount of time between two packets sent in the backward direction (in microseconds)
std_biat | The standard deviation from the mean amount of time between two packets sent in the backward direction (in microseconds)
duration | The duration of the flow (in microseconds)
min_active | The minimum amount of time that the flow was active before going idle (in microseconds)
mean_active | The mean amount of time that the flow was active before going idle (in microseconds)
max_active | The maximum amount of time that the flow was active before going idle (in microseconds)
std_active | The standard deviation from the mean amount of time that the flow was active before going idle (in microseconds)
min_idle | The minimum time a flow was idle before becoming active (in microseconds)
mean_idle | The mean time a flow was idle before becoming active (in microseconds)
max_idle | The maximum time a flow was idle before becoming active (in microseconds)
std_idle | The standard deviation from the mean time a flow was idle before becoming active (in microseconds)
sflow_fpackets | The average number of packets in a sub-flow in the forward direction
sflow_fbytes | The average number of bytes in a sub-flow in the forward direction
sflow_bpackets | The average number of packets in a sub-flow in the backward direction
sflow_bbytes | The average number of bytes in a sub-flow in the backward direction
fpsh_cnt | The number of times the PSH flag was set in packets travelling in the forward direction (0 for UDP)
bpsh_cnt | The number of times the PSH flag was set in packets travelling in the backward direction (0 for UDP)
furg_cnt | The number of times the URG flag was set in packets travelling in the forward direction (0 for UDP)
burg_cnt | The number of times the URG flag was set in packets travelling in the backward direction (0 for UDP)
total_fhlen | The total bytes used for headers in the forward direction
total_bhlen | The total bytes used for headers in the backward direction
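The per-direction packet-length features above can be derived from the raw packet sizes of a flow. A minimal sketch in plain Python (the packet sizes are hypothetical, and the choice of population standard deviation is an assumption; the exporter's exact variant is not specified here):

```python
import statistics

def pktl_features(sizes, prefix):
    """min/mean/max/std of packet lengths (in bytes) for one direction.
    Population std (pstdev) is assumed; a sample std (stdev) would differ."""
    return {
        f"min_{prefix}pktl": min(sizes),
        f"mean_{prefix}pktl": statistics.mean(sizes),
        f"max_{prefix}pktl": max(sizes),
        f"std_{prefix}pktl": statistics.pstdev(sizes),
    }

# Hypothetical forward-direction packet sizes for one flow:
fwd = [60, 60, 1500]
features = pktl_features(fwd, "f")
print(features["min_fpktl"], features["mean_fpktl"], features["max_fpktl"])
```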

References

  1. Wadhwa, A.; Arora, N. A Review on Cyber Crime: Major Threats and Solutions. Int. J. Adv. Res. Comput. Sci. 2017, 8, 2217–2221. [Google Scholar]
  2. Morgan, S. Cybercrime to Cost the World $10.5 Trillion Annually by 2025. Available online: https://cybersecurityventures.com/hackerpocalypse-cybercrime-report-2016/ (accessed on 2 November 2022).
  3. Nokia Banking Malware Threats Surging as Mobile Banking Increases–Nokia Threat Intelligence Report. Available online: https://www.nokia.com/about-us/news/releases/2021/11/08/banking-malware-threats-surging-as-mobile-banking-increases-nokia-threat-intelligence-report/ (accessed on 2 November 2022).
  4. Vijayalakshmi, Y.; Natarajan, N.; Manimegalai, P.; Babu, S.S. Study on emerging trends in malware variants. Int. J. Pure Appl. Math. 2017, 116, 479–489. [Google Scholar]
  5. Etaher, N.; Weir, G.R.; Alazab, M. From zeus to zitmo: Trends in banking malware. In Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA, Helsinki, Finland, 20–22 August 2015; IEEE: Washington, DC, USA, 2015; Volume 1, pp. 1386–1391. [Google Scholar]
  6. Ibrahim, L.M.; Thanon, K.H. Botnet Detection on the Analysis of Zeus Panda Financial Botnet. Int. J. Eng. Adv. Technol. 2019, 8, 1972–1976. [Google Scholar] [CrossRef]
  7. Owen, H.; Zarrin, J.; Pour, S.M. A Survey on Botnets, Issues, Threats, Methods, Detection and Prevention. J. Cybersecur. Priv. 2022, 2, 74–88. [Google Scholar] [CrossRef]
  8. Tayyab, U.-E.; Khan, F.B.; Durad, M.H.; Khan, A.; Lee, Y.S. A Survey of the Recent Trends in Deep Learning Based Malware Detection. J. Cybersecur. Priv. 2022, 2, 800–829. [Google Scholar] [CrossRef]
  9. Aboaoja, F.A.; Zainal, A.; Ghaleb, F.A.; Al-rimy, B.A.S.; Eisa, T.A.E.; Elnour, A.A.H. Malware Detection Issues, Challenges, and Future Directions: A Survey. Appl. Sci. 2022, 12, 8482. [Google Scholar] [CrossRef]
  10. Ahsan, M.; Nygard, K.E.; Gomes, R.; Chowdhury, M.M.; Rifat, N.; Connolly, J.F. Cybersecurity Threats and Their Mitigation Approaches Using Machine Learning—A Review. J. Cybersecur. Priv. 2022, 2, 527–555. [Google Scholar] [CrossRef]
  11. Bukvić, L.; Pašagić Škrinjar, J.; Fratrović, T.; Abramović, B. Price Prediction and Classification of Used-Vehicles Using Supervised Machine Learning. Sustainability 2022, 14, 17034. [Google Scholar] [CrossRef]
  12. Okey, O.D.; Maidin, S.S.; Adasme, P.; Lopes Rosa, R.; Saadi, M.; Carrillo Melgarejo, D.; Zegarra Rodríguez, D. BoostedEnML: Efficient Technique for Detecting Cyberattacks in IoT Systems Using Boosted Ensemble Machine Learning. Sensors 2022, 22, 7409. [Google Scholar] [CrossRef]
  13. Singh, A.; Thakur, N.; Sharma, A. A review of supervised machine learning algorithms. In Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development 2016, (INDIACom), New Delhi, India, 16–18 March 2016; pp. 1310–1315. [Google Scholar]
  14. Aswathi, K.B.; Jayadev, S.; Krishna, N.; Krishnan, R.; Sarath, G. Botnet Detection using Machine Learning. In Proceedings of the 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kharagpur, India, 6–8 July 2021; pp. 1–7. [Google Scholar] [CrossRef]
  15. Kazi, M.; Woodhead, S.; Gan, D. A Contemporary Taxonomy of Banking Malware. In Proceedings of the First International Conference on Secure Cyber Computing and Communications, Jalandhar, India, 15–17 December 2018. [Google Scholar]
  16. Falliere, N.; Chien, E. Zeus: King of the Bots. 2009. Available online: http://bit.ly/3VyFV1 (accessed on 12 November 2022).
  17. Lelli, A. Zeusbot/Spyeye P2P Updated, Fortifying the Botnet. Available online: https://www.symantec.com/connect/blogs/zeusbotspyeye-p2p-updated-fortifying-botnet (accessed on 5 November 2019).
  18. Riccardi, M.; Di Pietro, R.; Palanques, M.; Vila, J.A. Titans’ Revenge: Detecting Zeus via Its Own Flaws. Comput. Netw. 2013, 57, 422–435. [Google Scholar] [CrossRef]
  19. Andriesse, D.; Rossow, C.; Stone-Gross, B.; Plohmann, D.; Bos, H. Highly Resilient Peer-to-Peer Botnets Are Here: An Analysis of Gameover Zeus. In Proceedings of the 2013 8th International Conference on Malicious and Unwanted Software: “The Americas” (MALWARE), Fajardo, PR, USA, 22–24 October 2013; pp. 116–123. [Google Scholar]
  20. Kazi, M.A.; Woodhead, S.; Gan, D. Comparing the performance of supervised machine learning algorithms when used with a manual feature selection process to detect Zeus malware. Int. J. Grid Util. Comput. 2022, 13, 495–504. [Google Scholar] [CrossRef]
  21. Md, A.Q.; Jaiswal, D.; Daftari, J.; Haneef, S.; Iwendi, C.; Jain, S.K. Efficient Dynamic Phishing Safeguard System Using Neural Boost Phishing Protection. Electronics 2022, 11, 3133. [Google Scholar] [CrossRef]
  22. Ibrahim, L.M.; Thanon, K.H. Analysis and detection of the zeus botnet crimeware. Int. J. Comput. Sci. Inf. Secur. 2015, 13, 121. [Google Scholar]
  23. Gu, G.; Porras, P.; Yegneswaran, V.; Fong, M.; Lee, W. BotHunter: Detecting Malware Infection Through IDS-Driven Dialog Correlation. In Proceedings of the USENIX Conference on Security Symposium, Anaheim, CA, USA, 9–11 August 2007; pp. 167–182. [Google Scholar]
  24. Thorat, S.A.; Khandelwal, A.K.; Bruhadeshwar, B.; Kishore, K. Payload Content Based Network Anomaly Detection. In Proceedings of the 2008 First International Conference on the Applications of Digital Information and Web Technologies (ICADIWT), Ostrava, Czech Republic, 4–6 August 2008. [Google Scholar] [CrossRef]
  25. Guofei, G.; Perdisci, R.; Zhang, J.; Lee, W. BotMiner: Clustering analysis of network traffic for protocol- and structure-independent botnet detection. In Proceedings of the 17th Conference on Security Symposium, San Jose, CA, USA, 28 July–1 August 2008; pp. 139–154. [Google Scholar]
  26. Azab, A.; Alazab, M.; Aiash, M. Machine Learning Based Botnet Identification Traffic. In Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, Tianjin, China, 23–26 August 2016; pp. 1788–1794. [Google Scholar]
  27. Soniya, B.; Wilscy, M. Detection of Randomized Bot Command and Control Traffic on an End-Point Host. Alex. Eng. J. 2016, 55, 2771–2781. [Google Scholar] [CrossRef] [Green Version]
  28. Venkatesh, G.K.; Nadarajan, R.A. HTTP botnet detection using adaptive learning rate multilayer feed-forward neural network. In Proceedings of the IFIP International Workshop on Information Security Theory and Practice, Egham, UK, 20–22 June 2012; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7322 LNCS, pp. 38–48. [Google Scholar]
  29. Haddadi, F.; Runkel, D.; Zincir-Heywood, A.N.; Heywood, M.I. On Botnet Behaviour Analysis Using GP and C4.5; Association for Computing Machinery: New York, NY, USA, 2014; pp. 1253–1260. [Google Scholar]
  30. Fernandez, D.; Lorenzo, H.; Novoa, F.J.; Cacheda, F.; Carneiro, V. Tools for managing network traffic flows: A comparative analysis. In Proceedings of the 2017 IEEE 16th International Symposium on Network Computing and Applications (NCA), Cambridge, MA, USA, 30 October–1 November 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–5. [Google Scholar]
  31. Fuhr, J.; Wang, F.; Tang, Y. MOCA: A Network Intrusion Monitoring and Classification System. J. Cybersecur. Priv. 2022, 2, 629–639. [Google Scholar] [CrossRef]
  32. He, S.; Zhu, J.; He, P.; Lyu, M.R. Experience report: System log analysis for anomaly detection. In Proceedings of the 2016 IEEE 27th international symposium on software reliability engineering (ISSRE), Ottawa, ON, Canada, 23–27 October 2016; pp. 207–218. [Google Scholar]
  33. Zhou, J.; Qian, Y.; Zou, Q.; Liu, P.; Xiang, J. DeepSyslog: Deep Anomaly Detection on Syslog Using Sentence Embedding and Metadata. IEEE Trans. Inf. Forensics Secur. 2022, 17, 3051–3061. [Google Scholar] [CrossRef]
  34. Ghafir, I.; Prenosil, V.; Hammoudeh, M.; Baker, T.; Jabbar, S.; Khalid, S.; Jaf, S. BotDet: A System for Real Time Botnet Command and Control Traffic Detection. IEEE Access 2018, 6, 38947–38958. [Google Scholar] [CrossRef]
  35. Agarwal, P.; Satapathy, S. Implementation of signature-based detection system using snort in windows. Int. J. Comput. Appl. Inf. Technol. 2014, 3, 3–4. [Google Scholar]
  36. Khraisat, A.; Gondal, I.; Vamplew, P.; Kamruzzaman, J. Survey of intrusion detection systems: Techniques, datasets and challenges. Cybersecurity 2019, 2, 1–22. [Google Scholar] [CrossRef] [Green Version]
  37. Sharma, P.; Said, Z.; Memon, S.; Elavarasan, R.M.; Khalid, M.; Nguyen, X.P.; Arıcı, M.; Hoang, A.T.; Nguyen, L.H. Comparative evaluation of AI-based intelligent GEP and ANFIS models in prediction of thermophysical properties of Fe3O4-coated MWCNT hybrid nanofluids for potential application in energy systems. Int. J. Energy Res. 2022, 37, 19242–19257. [Google Scholar] [CrossRef]
  38. Arndt, D. DanielArndt/Netmate-Flowcalc. Available online: https://github.com/DanielArndt/netmate-flowcalc (accessed on 6 November 2019).
  39. Montigny-Leboeuf, A.D.; Couture, M.; Massicotte, F. Traffic Behaviour Characterization Using NetMate. In International Workshop on Recent Advances in Intrusion Detection 2019; Springer: Berlin/Heidelberg, Germany; pp. 367–368.
  40. De Montigny-Leboeuf, A.; Couture, M.; Massicotte, F. Traffic Behaviour Characterization Using NetMate. Lect. Notes Comput. Sci. 2009, 5758, 367–368. [Google Scholar] [CrossRef]
  41. de Menezes, N.A.T.; de Mello, F.L. Flow Feature-Based Network Traffic Classification Using Machine Learning. J. Inf. Secur. Cryptogr. 2021, 8, 12–16. [Google Scholar] [CrossRef]
  42. Miller, S.; Curran, K.; Lunney, T. Multilayer perceptron neural network for detection of encrypted VPN network traffic. In Proceedings of the International Conference On Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA), Glasgow, UK, 11–12 June 2018. [Google Scholar]
  43. Kasongo, S.M.; Sun, Y. A Deep Learning Method with Filter Based Feature Engineering for Wireless Intrusion Detection System. IEEE Access 2019, 7, 38597–38607. [Google Scholar] [CrossRef]
  44. Reis, B.; Maia, E.; Praça, I. Selection and Performance Analysis of CICIDS2017 Features Importance. Found. Pract. Secur. 2020, 12056, 56–71. [Google Scholar] [CrossRef]
  45. Maldonado, S.; Weber, R. A wrapper method for feature selection using Support Vector Machines. Inf. Sci. 2009, 179, 2208–2217. [Google Scholar] [CrossRef]
  46. Wald, R.; Khoshgoftaar, T.; Napolitano, A. Comparison of Stability for Different Families of Filter-Based and Wrapper-Based Feature Selection. In Proceedings of the 2013 12th International Conference on Machine Learning and Applications, Miami, FL, USA, 4–7 December 2013. [Google Scholar] [CrossRef]
  47. Schmoll, C.; Zander, S. NetMate-User and Developer Manual. 2004. Available online: https://www.researchgate.net/publication/246926554_NetMate-User_and_Developer_Manual (accessed on 22 December 2022).
  48. Saghezchi, F.B.; Mantas, G.; Violas, M.A.; de Oliveira Duarte, A.M.; Rodriguez, J. Machine Learning for DDoS Attack Detection in Industry 4.0 CPPSs. Electronics 2022, 11, 602. [Google Scholar] [CrossRef]
  49. Alshammari, R.; Zincir-Heywood, A.N. Investigating Two Different Approaches for Encrypted Traffic Classification. In Proceedings of the 2008 Sixth Annual Conference on Privacy, Security and Trust, Fredericton, NB, Canada, 1–3 October 2008. [Google Scholar] [CrossRef]
  50. Yeo, M.; Koo, Y.; Yoon, Y.; Hwang, T.; Ryu, J.; Song, J.; Park, C. Flow-Based Malware Detection Using Convolutional Neural Network. In Proceedings of the 2018 International Conference on Information Networking (ICOIN), Chiang Mai, Thailand, 10–12 January 2018. [Google Scholar] [CrossRef]
  51. Shomiron. Zeustracker. Available online: https://github.com/dnif-archive/enrich-zeustracker (accessed on 25 July 2022).
  52. Stratosphere. Stratosphere Laboratory Datasets. Available online: https://www.stratosphereips.org/datasets-overview (accessed on 25 November 2022).
  53. Abuse, C. Fighting Malware and Botnets. Available online: https://abuse.ch/ (accessed on 13 May 2022).
  54. Haddadi, F.; Zincir-Heywood, A.N. Benchmarking the effect of flow exporters and protocol filters on botnet traffic classification. IEEE Syst. J. 2014, 10, 1390–1401. [Google Scholar] [CrossRef]
  55. Khodamoradi, P.; Fazlali, M.; Mardukhi, F.; Nosrati, M. Heuristic metamorphic malware detection based on statistics of assembly instructions using classification algorithms. In Proceedings of the 2015 18th CSI International Symposium on Computer Architecture and Digital Systems (CADS), Tehran, Iran, 7–8 October 2015; pp. 1–6. [Google Scholar]
  56. Salzberg, S.L. C4.5: Programs for Machine Learning by J. Ross Quinlan; Morgan Kaufmann Publishers, Inc.: Burlington, MA, USA, 1993. [Google Scholar]
  57. Xhemali, D.; Hinde, C.J.; Stone, R.G. Naïve Bayes vs. Decision Trees vs. Neural Networks in the Classification of Training Web Pages. Int. J. Comput. Sci. Issues 2009, 4, 16–23. [Google Scholar]
  58. Bernard, S.; Heutte, L.; Adam, S. On the selection of decision trees in random forests. In Proceedings of the 2009 International Joint Conference on Neural Networks, Atlanta, GA, USA, 14–19 June 2009; pp. 302–307. [Google Scholar]
  59. Maimon, O.; Rokach, L. (Eds.) Data Mining and Knowledge Discovery Handbook; Springer: Berlin/Heidelberg, Germany, 2005. [Google Scholar]
  60. Liu, Z.; Thapa, N.; Shaver, A.; Roy, K.; Siddula, M.; Yuan, X.; Yu, A. Using Embedded Feature Selection and CNN for Classification on CCD-INID-V1—A New IoT Dataset. Sensors 2021, 21, 4834. [Google Scholar] [CrossRef]
  61. Oshiro, T.M.; Perez, P.S.; Baranauskas, J.A. How Many Trees in a Random Forest? Mach. Learn. Data Min. Pattern Recognit. 2012, 7376, 154–168. [Google Scholar] [CrossRef]
  62. Jiang, Z.; Shen, G. Prediction of House Price Based on the Back Propagation Neural Network in the Keras Deep Learning Framework. In Proceedings of the 2019 6th International Conference on Systems and Informatics (ICSAI), Shanghai, China, 2–4 November 2019; pp. 1408–1412. [Google Scholar] [CrossRef]
  63. Nagisetty, A.; Gupta, G.P. Framework for detection of malicious activities in IoT networks using keras deep learning library. In Proceedings of the 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 27–29 March 2019; pp. 633–637. [Google Scholar]
  64. Heller, M. What Is Keras? The Deep Neural Network API Explained. Available online: https://www.infoworld.com/article/3336192/what-is-keras-the-deep-neural-network-api-explained.html (accessed on 25 November 2022).
  65. Ali, S.; Rehman, S.U.; Imran, A.; Adeem, G.; Iqbal, Z.; Kim, K.-I. Comparative Evaluation of AI-Based Techniques for Zero-Day Attacks Detection. Electronics 2022, 11, 3934. [Google Scholar] [CrossRef]
  66. Kumar, V.; Lalotra, G.S.; Sasikala, P.; Rajput, D.S.; Kaluri, R.; Lakshmanna, K.; Shorfuzzaman, M.; Alsufyani, A.; Uddin, M. Addressing Binary Classification over Class Imbalanced Clinical Datasets Using Computationally Intelligent Techniques. Healthcare 2022, 10, 1293. [Google Scholar] [CrossRef] [PubMed]
  67. Maudoux, C.; Boumerdassi, S.; Barcello, A.; Renault, E. Combined Forest: A New Supervised Approach for a Machine-Learning-Based Botnets Detection. In Proceedings of the 2021 IEEE Global Communications Conference (GLOBECOM), Madrid, Spain, 7–11 December 2021. [Google Scholar] [CrossRef]
Figure 1. C&C Communication phases.
Figure 2. Methodology for collecting and preparing the data.
Figure 3. A neural network.
Figure 5. Comparison of the Zeus prediction results for all three ML algorithms.
Figure 6. Comparison of the benign prediction results for all three ML algorithms.
Figure 9. Zeus prediction results when tested using the minimum number of features.
Figure 10. Benign communication prediction results when tested using the minimum number of features.
Figure 11. DT Zeus prediction results compared between using the minimum number of features and all the features in Feature set1.
Figure 12. DT benign communication prediction results compared between using the minimum number of features and all the features in Feature set1.
Figure 13. RF Zeus prediction results compared between using the minimum number of features and all the features in Feature set1.
Figure 14. True positive rates compared for all three algorithms when using the minimum number of features.
Figure 15. True negative rates compared for all three algorithms when using the minimum number of features.
Table 1. The features that were not used during this research.
Feature That Was Removed | Justification
srcip | The source IP address, removed to negate any correlation with a network characteristic
srcport | The source port number, removed to negate any correlation with a network characteristic
dstip | The destination IP address, removed to negate any correlation with a network characteristic
dstport | The destination port number, removed to negate any correlation with a network characteristic
proto | The protocol in use (i.e., TCP = 6, UDP = 17), removed to negate any correlation with a network characteristic
min_fiat | The minimum time between two packets sent in the forward direction (in microseconds), removed to negate any correlation with a network characteristic
mean_fiat | The mean time between two packets sent in the forward direction (in microseconds), removed to negate any correlation with a network characteristic
max_fiat | The maximum time between two packets sent in the forward direction (in microseconds), removed to negate any correlation with a network characteristic
std_fiat | The standard deviation from the mean time between two packets sent in the forward direction (in microseconds), removed to negate any correlation with a network characteristic
min_biat | The minimum time between two packets sent in the backward direction (in microseconds), removed to negate any correlation with a network characteristic
mean_biat | The mean time between two packets sent in the backward direction (in microseconds), removed to negate any correlation with a network characteristic
std_biat | The standard deviation from the mean time between two packets sent in the backward direction (in microseconds), removed to negate any correlation with a network characteristic
Table 2. Datasets used in this research. Each dataset contains an equal number of malware and benign flows (benign malware name/year: N/A).
Dataset Name | Malware Name/Year | Malware Flows | Benign Flows
Dataset1 | Zeus/2022 | 272,425 | 272,425
Dataset2 | Zeus/2019 | 66,009 | 66,009
Dataset3 | Zeus/2019 | 38,282 | 38,282
Dataset4 | Zeus/2014 | 200,000 | 200,000
Dataset5 | Zeus/2014 | 35,054 | 35,054
Dataset6 | Zeus/2014 | 6,049 | 6,049
Dataset7 | ZeusPanda/2022 | 11,864 | 11,864
Dataset8 | Ramnit/2022 | 10,204 | 10,204
Dataset9 | Citadel/2022 | 7,152 | 7,152
Table 3. An example of the confusion matrix used to measure the detection accuracy.
 | Predicted Benign | Predicted Zeus
Actual Benign (Total) | TN | FN
Actual Zeus (Total) | FP | TP
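Under this layout, the precision, recall and f1-scores reported below follow directly from the four counts. A small sketch in plain Python, using the dataset1 decision-tree counts from Table 6 (which reproduces the corresponding Table 4 row):

```python
# Paper's confusion-matrix convention (Table 3):
#   actual benign -> TN (predicted benign), FN (predicted Zeus)
#   actual Zeus   -> FP (predicted benign), TP (predicted Zeus)
# Counts below: dataset1 tested with the decision tree (Table 6).
TN, FN = 271_721, 704
TP, FP = 270_109, 2_316

malware_precision = TP / (TP + FN)   # of all flows predicted as Zeus
malware_recall = TP / (TP + FP)      # of all actual Zeus flows
malware_f1 = 2 * malware_precision * malware_recall / (malware_precision + malware_recall)

print(round(malware_precision, 2), round(malware_recall, 2), round(malware_f1, 2))
```

Rounded to two decimal places this gives 1, 0.99 and 0.99, matching the dataset1 malware scores in Table 4.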
Table 4. Test results when using the decision tree algorithm.
Dataset Name | Benign Precision | Benign Recall | Benign f1-Score | Malware Precision | Malware Recall | Malware f1-Score
Dataset1 | 0.99 | 1 | 0.99 | 1 | 0.99 | 0.99
Dataset2 | 0.94 | 1 | 0.97 | 1 | 0.94 | 0.97
Dataset3 | 0.95 | 1 | 0.97 | 1 | 0.94 | 0.97
Dataset4 | 0.71 | 0.87 | 0.78 | 0.83 | 0.64 | 0.72
Dataset5 | 0.67 | 1 | 0.8 | 1 | 0.52 | 0.68
Dataset6 | 0.62 | 1 | 0.76 | 1 | 0.39 | 0.56
Dataset7 | 0.99 | 1 | 0.99 | 1 | 0.99 | 0.99
Dataset8 | 0.98 | 0.87 | 0.92 | 0.88 | 0.98 | 0.93
Dataset9 | 0.96 | 1 | 0.98 | 1 | 0.96 | 0.98
Table 5. Test results when using the random forest algorithm.
Dataset Name | Benign Precision | Benign Recall | Benign f1-Score | Malware Precision | Malware Recall | Malware f1-Score
Dataset1 | 1 | 1 | 1 | 1 | 1 | 1
Dataset2 | 0.98 | 1 | 0.99 | 1 | 0.98 | 0.99
Dataset3 | 0.98 | 1 | 0.99 | 1 | 0.98 | 0.99
Dataset4 | 0.75 | 0.89 | 0.82 | 0.86 | 0.71 | 0.78
Dataset5 | 0.8 | 1 | 0.89 | 1 | 0.74 | 0.85
Dataset6 | 0.68 | 1 | 0.81 | 1 | 0.53 | 0.69
Dataset7 | 0.99 | 1 | 0.99 | 1 | 0.99 | 0.99
Dataset8 | 0.94 | 0.88 | 0.91 | 0.89 | 0.94 | 0.92
Dataset9 | 0.95 | 1 | 0.98 | 1 | 0.95 | 0.97
Table 6. Confusion matrix for testing with the decision tree algorithm.
Dataset Name | Actual Benign (Total) | Predicted Benign (TN) | Predicted Zeus (FN) | Actual Zeus (Total) | Predicted Zeus (TP) | Predicted Benign (FP)
Dataset1 | 272,425 | 271,721 | 704 | 272,425 | 270,109 | 2,316
Dataset2 | 66,009 | 65,832 | 177 | 66,009 | 61,920 | 4,089
Dataset3 | 38,222 | 38,127 | 95 | 38,282 | 36,061 | 2,221
Dataset4 | 200,000 | 173,219 | 26,781 | 200,000 | 127,706 | 72,294
Dataset5 | 35,054 | 34,963 | 91 | 35,054 | 18,116 | 16,938
Dataset6 | 6,049 | 6,040 | 9 | 6,049 | 2,333 | 3,716
Dataset7 | 11,864 | 11,836 | 28 | 11,864 | 11,715 | 149
Dataset8 | 10,204 | 8,865 | 1,339 | 10,204 | 10,041 | 163
Dataset9 | 7,152 | 7,138 | 14 | 7,152 | 6,839 | 313
Table 7. Confusion matrix for testing with the random forest algorithm.
Dataset Name | Actual Benign (Total) | Predicted Benign (TN) | Predicted Zeus (FN) | Actual Zeus (Total) | Predicted Zeus (TP) | Predicted Benign (FP)
Dataset1 | 272,425 | 272,200 | 225 | 272,425 | 271,312 | 1,113
Dataset2 | 66,009 | 65,946 | 63 | 66,009 | 64,438 | 1,571
Dataset3 | 38,282 | 38,252 | 30 | 38,282 | 37,546 | 736
Dataset4 | 200,000 | 177,309 | 22,691 | 200,000 | 142,286 | 57,714
Dataset5 | 35,054 | 35,025 | 29 | 35,054 | 26,104 | 8,950
Dataset6 | 6,049 | 6,046 | 3 | 6,049 | 3,179 | 2,870
Dataset7 | 11,864 | 11,852 | 12 | 11,864 | 11,740 | 124
Dataset8 | 10,204 | 9,014 | 1,190 | 10,204 | 9,642 | 562
Dataset9 | 7,152 | 7,145 | 7 | 7,152 | 6,803 | 349
Table 8. Test results when using the deep learning algorithm.
Dataset Name | Benign Precision | Benign Recall | Benign f1-Score | Malware Precision | Malware Recall | Malware f1-Score
Dataset1 | 0.98 | 0.97 | 0.98 | 0.97 | 0.98 | 0.98
Dataset2 | 0.92 | 0.97 | 0.94 | 0.97 | 0.91 | 0.94
Dataset3 | 0.92 | 0.98 | 0.95 | 0.97 | 0.92 | 0.95
Dataset4 | 0.69 | 0.95 | 0.8 | 0.93 | 0.56 | 0.7
Dataset5 | 0.7 | 0.97 | 0.81 | 0.96 | 0.58 | 0.72
Dataset6 | 0.87 | 0.99 | 0.93 | 0.99 | 0.86 | 0.92
Dataset7 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97
Dataset8 | 0.91 | 0.94 | 0.93 | 0.94 | 0.91 | 0.92
Dataset9 | 0.92 | 0.98 | 0.95 | 0.98 | 0.92 | 0.95
Table 9. Confusion matrix for using the deep learning algorithm.
Dataset Name | Actual Benign (Total) | Predicted Benign (TN) | Predicted Zeus (FN) | Actual Zeus (Total) | Predicted Zeus (TP) | Predicted Benign (FP)
Dataset1 | 272,425 | 265,452 | 6,973 | 272,425 | 266,091 | 6,334
Dataset2 | 66,009 | 64,123 | 1,886 | 66,009 | 60,310 | 5,699
Dataset3 | 38,282 | 37,356 | 926 | 38,282 | 35,177 | 3,105
Dataset4 | 200,000 | 190,935 | 9,065 | 200,000 | 112,731 | 87,269
Dataset5 | 35,054 | 34,155 | 899 | 35,054 | 14,753 | 20,301
Dataset6 | 6,049 | 5,973 | 76 | 6,049 | 5,180 | 869
Dataset7 | 11,864 | 11,566 | 298 | 11,864 | 11,551 | 313
Dataset8 | 10,204 | 9,605 | 599 | 10,204 | 9,260 | 944
Dataset9 | 7,152 | 7,023 | 129 | 7,152 | 6,547 | 605
Table 10. Prediction results when using the DL algorithm with minimum features.
Dataset Name | Benign Precision | Benign Recall | Benign f1-Score | Malware Precision | Malware Recall | Malware f1-Score
Dataset1 | 0.98 | 0.97 | 0.97 | 0.97 | 0.98 | 0.97
Dataset2 | 0.9 | 0.97 | 0.93 | 0.97 | 0.89 | 0.93
Dataset3 | 0.9 | 0.97 | 0.94 | 0.97 | 0.9 | 0.93
Dataset4 | 0.66 | 0.95 | 0.78 | 0.92 | 0.51 | 0.66
Dataset5 | 0.67 | 0.97 | 0.79 | 0.95 | 0.51 | 0.67
Dataset6 | 0.79 | 0.99 | 0.88 | 0.98 | 0.74 | 0.85
Dataset7 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97
Dataset8 | 0.91 | 0.94 | 0.92 | 0.94 | 0.9 | 0.92
Dataset9 | 0.92 | 0.98 | 0.95 | 0.98 | 0.91 | 0.95
Table 11. Confusion matrix for testing the DL algorithm with minimum features.
Dataset Name | Actual Benign (Total) | Predicted Benign (TN) | Predicted Zeus (FN) | Actual Zeus (Total) | Predicted Zeus (TP) | Predicted Benign (FP)
Dataset1 | 272,425 | 265,049 | 7,376 | 272,425 | 266,052 | 6,373
Dataset2 | 66,009 | 64,028 | 1,981 | 66,009 | 58,811 | 7,198
Dataset3 | 38,282 | 37,298 | 984 | 38,282 | 34,304 | 3,978
Dataset4 | 200,000 | 190,820 | 9,180 | 200,000 | 102,371 | 97,629
Dataset5 | 35,054 | 34,103 | 951 | 35,054 | 17,996 | 17,058
Dataset6 | 6,049 | 5,963 | 86 | 6,049 | 4,499 | 1,550
Dataset7 | 11,864 | 11,553 | 311 | 11,864 | 11,535 | 329
Dataset8 | 10,204 | 9,575 | 629 | 10,204 | 9,233 | 971
Dataset9 | 7,152 | 7,015 | 137 | 7,152 | 6,544 | 608
Table 12. Prediction results when using the DT algorithm with minimum features.

| Dataset Name | Benign Precision Score | Benign Recall Score | Benign f1-Score | Malware Precision Score | Malware Recall Score | Malware f1-Score |
|---|---|---|---|---|---|---|
| Dataset1 | 0.99 | 1 | 0.99 | 1 | 0.99 | 0.99 |
| Dataset2 | 0.94 | 1 | 0.97 | 1 | 0.94 | 0.97 |
| Dataset3 | 0.94 | 1 | 0.97 | 1 | 0.94 | 0.97 |
| Dataset4 | 0.69 | 0.87 | 0.77 | 0.82 | 0.6 | 0.7 |
| Dataset5 | 0.71 | 1 | 0.83 | 0.99 | 0.6 | 0.75 |
| Dataset6 | 0.62 | 1 | 0.76 | 0.99 | 0.38 | 0.55 |
| Dataset7 | 0.99 | 1 | 0.99 | 1 | 0.99 | 0.99 |
| Dataset8 | 0.94 | 0.87 | 0.9 | 0.88 | 0.94 | 0.91 |
| Dataset9 | 0.96 | 1 | 0.98 | 1 | 0.96 | 0.98 |
Table 13. Confusion matrix for testing the DT algorithm with minimum features.

| Dataset Name | Benign Flows (Total) | Benign Classified Correctly | Benign Misclassified | Malware Flows (Total) | Malware Classified Correctly | Malware Misclassified |
|---|---|---|---|---|---|---|
| Dataset1 | 272,425 | 271,502 | 923 | 272,425 | 270,408 | 2,017 |
| Dataset2 | 66,009 | 65,788 | 221 | 66,009 | 61,804 | 4,205 |
| Dataset3 | 38,282 | 38,159 | 123 | 38,282 | 36,018 | 2,264 |
| Dataset4 | 200,000 | 174,267 | 25,733 | 200,000 | 120,785 | 79,215 |
| Dataset5 | 35,054 | 34,935 | 119 | 35,054 | 20,984 | 14,070 |
| Dataset6 | 6,049 | 6,032 | 17 | 6,049 | 2,328 | 3,721 |
| Dataset7 | 11,864 | 11,825 | 39 | 11,864 | 11,724 | 140 |
| Dataset8 | 10,204 | 8,857 | 1,347 | 10,204 | 9,630 | 574 |
| Dataset9 | 7,152 | 7,130 | 22 | 7,152 | 6,837 | 315 |
Table 14. Prediction results when using the RF algorithm with minimum features.

| Dataset Name | Benign Precision Score | Benign Recall Score | Benign f1-Score | Malware Precision Score | Malware Recall Score | Malware f1-Score |
|---|---|---|---|---|---|---|
| Dataset1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Dataset2 | 0.93 | 1 | 0.95 | 1 | 0.93 | 0.95 |
| Dataset3 | 0.94 | 1 | 0.97 | 1 | 0.93 | 0.96 |
| Dataset4 | 0.67 | 0.89 | 0.77 | 0.84 | 0.57 | 0.68 |
| Dataset5 | 0.7 | 1 | 0.83 | 1 | 0.58 | 0.73 |
| Dataset6 | 0.62 | 1 | 0.77 | 1 | 0.39 | 0.56 |
| Dataset7 | 0.99 | 1 | 0.99 | 1 | 0.99 | 0.99 |
| Dataset8 | 0.99 | 0.89 | 0.93 | 0.9 | 0.99 | 0.94 |
| Dataset9 | 0.96 | 1 | 0.98 | 1 | 0.96 | 0.98 |
Table 15. Confusion matrix for testing the RF algorithm with minimum features.

| Dataset Name | Benign Flows (Total) | Benign Classified Correctly | Benign Misclassified | Malware Flows (Total) | Malware Classified Correctly | Malware Misclassified |
|---|---|---|---|---|---|---|
| Dataset1 | 272,425 | 272,233 | 192 | 272,425 | 271,328 | 1,097 |
| Dataset2 | 66,009 | 65,961 | 48 | 66,009 | 61,230 | 4,779 |
| Dataset3 | 38,282 | 38,256 | 26 | 38,282 | 35,641 | 2,641 |
| Dataset4 | 200,000 | 178,346 | 21,654 | 200,000 | 114,030 | 85,970 |
| Dataset5 | 35,054 | 35,029 | 25 | 35,054 | 20,221 | 14,833 |
| Dataset6 | 6,049 | 6,047 | 2 | 6,049 | 2,352 | 3,697 |
| Dataset7 | 11,864 | 11,855 | 9 | 11,864 | 11,743 | 121 |
| Dataset8 | 10,204 | 9,033 | 1,171 | 10,204 | 10,081 | 123 |
| Dataset9 | 7,152 | 7,148 | 4 | 7,152 | 6,849 | 303 |
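Although the tables report per-class scores rather than overall accuracy, accuracy is easy to derive from the confusion-matrix counts and gives a single number for comparing the three algorithms on a given dataset. The sketch below (an illustration derived from the published counts, not the authors' code) compares the minimum-feature models on Dataset6 using the counts from Tables 11, 13 and 15:

```python
# Overall accuracy on Dataset6 with minimum features, derived from the
# confusion-matrix counts reported in Tables 11 (DL), 13 (DT) and 15 (RF).
def accuracy(benign_correct, malware_correct, benign_total, malware_total):
    """Fraction of all flows (benign + malware) classified correctly."""
    return (benign_correct + malware_correct) / (benign_total + malware_total)

dl = accuracy(5_963, 4_499, 6_049, 6_049)  # Table 11
dt = accuracy(6_032, 2_328, 6_049, 6_049)  # Table 13
rf = accuracy(6_047, 2_352, 6_049, 6_049)  # Table 15

# Rounded to two decimals: dl -> 0.86, dt -> 0.69, rf -> 0.69
```

On this dataset the derived accuracies show that the DL model loses far less malware-detection capability than DT or RF when restricted to the minimum feature set, which is consistent with the malware recall columns of Tables 10, 12 and 14 (0.74 versus 0.38 and 0.39).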
Kazi, M.A.; Woodhead, S.; Gan, D. An Investigation to Detect Banking Malware Network Communication Traffic Using Machine Learning Techniques. J. Cybersecur. Priv. 2023, 3, 1-23. https://doi.org/10.3390/jcp3010001