A Systematic Review of Defensive and Offensive Cybersecurity with Machine Learning


Introduction
In the fight against malicious threats, experts have collaborated to design a range of cyber defense systems. Researchers and designers share the same goals: to maintain the privacy, integrity, and accessibility of information through cyber defense systems against both internal and external threats. The main goal of cybersecurity systems is to combat security threats originating from online sources, including viruses, Trojans, worms, spam, and botnets [1]. These systems defend against cybersecurity threats at both the network and host levels. Network-based defense systems make use of the network flow, while host-based defense systems control a workstation's incoming data through mechanisms implemented in firewalls, antiviruses, and Intrusion Detection Systems (IDS).
These mechanisms monitor, track, and block viruses and other malicious cyberattacks. However, these methods do not completely eliminate vulnerabilities, threats, and attacks because the design and implementation of software and network infrastructure is inherently imperfect. The old adage that "a security chain is only as strong as its weakest link" sums this up aptly, because a single weak spot within modern software and network infrastructures can lead to cascading security compromises at multiple sub-levels [2]. This has led to a constant cycle of patches to protect cyberspace infrastructure, but patching has not deterred attackers. Thus, building defense systems only for known attacks is insufficient to protect users. Effective cybersecurity is more critical than ever, as modern attacks are being initiated with the intent of cyberwarfare by well-trained and well-funded militaries and criminal organizations. Moreover, the intensity of attacks has increased, and with it the impact of intrusions, as people and organizations become more connected via the Internet of Things (IoT) [3].
Advanced methods are needed to discover previously unknown cyber intrusions and techniques, moving towards a more dependable cybersecurity infrastructure that includes both defensive and offensive approaches. Defensive approaches use reactive strategies that focus on prevention, detection, and response. This is the more traditional method of keeping networks safe from cyber criminals, and it requires a thorough understanding of the system to be secured. Preventive measures are developed from an understanding of the system and its potential weak points [4]. Offensive approaches, on the other hand, are the counterpoint to defensive methods: they proactively predict and remove threats in the system using ethical hacking techniques. Security experts mimic exploits and attacks as cyber attackers would. Ultimately, experts aim to eliminate vulnerabilities by identifying them ahead of time [4]. Because vast volumes of data are accessible and cyber criminals keep trying to gain illegal access to cyberinfrastructures, various Artificial Intelligence (AI) and Machine Learning (ML) techniques have been explored. ML-based cybersecurity solutions, both offensive and defensive, can handle and analyze large amounts of data and complex detection logic where traditional methods would struggle.
Previous reviews in this area have focused on several fragments of research topics. This paper systematically combines the knowledge base in cybersecurity with ML. The goal of our research is to provide a baseline for readers, covering ML techniques, objectives, and effectiveness in cybersecurity, as well as current challenges and future directions of ML techniques in cybersecurity. We focus on a clear depiction of various ML methods, including Data Mining (DM) and computational intelligence. We survey nearly two decades of research papers on application of ML techniques to cybersecurity. We also review and contextualize the literature through the Six Dimensions of Intersection of AI/ML and Cybersecurity (AI-ML-CS) framework [3]. Our systematic review focuses on two dimensions: firstly data and new information frontiers and, secondly, algorithms for AI/ML and cybersecurity.
The paper is organized as follows, in line with the IMRAD (Introduction, Methods, Results, and Discussion) organization structure typically used in scientific literature [5]. After the Introduction section, we provide an overview of our Methodology grounded in Systematic Literature Reviews (SLR). The outcomes from implementation of the SLR methods are contained in the Results section, while applications of the results are covered in the Discussion section.

Methodology
There are other complementary surveys on the topic of ML and cyberattacks [6] and cybersecurity [7]. In order to add to domain knowledge in this area, this review aims to produce an impartial and comprehensive search of the resources considered from the defensive and offensive cybersecurity perspectives. This involves utilizing systematic methods and secondary data, while critically appraising research studies in order to synthesize findings qualitatively and quantitatively. The recent interest in cybersecurity approaches using ML has not yet resulted in an effort to survey the underlying concepts, methods, and problems systematically. Our research partially follows the Kitchenham and Charters methodology for SLRs [8]. This includes three critical steps that pre-define a review protocol to reduce potential researcher bias: outlining the research questions, generating a search strategy, and specifying selection criteria. Overall, this systematic review adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [9]. To fulfill the requirements of PRISMA, this paper has been structured in accordance with the required sections, and we have provided the PRISMA flow diagram in Figure 1, as well as the PRISMA checklist with cross-references in Appendix A.

Research Questions
This systematic review aims to answer the following three main research questions, with emphasis on current ML methods being used in cybersecurity. These questions are formulated to be relevant to both researchers and practitioners in cybersecurity. In line with the AI-ML-CS framework, our research questions explore the need for and availability of training data sets, as well as the need for a two-pronged approach using passive (forensic or defensive) versus proactive (offensive) cybersecurity strategies in algorithms [3].

• What ML techniques have been used in offensive and defensive cybersecurity?
• What data sets are used in training supervised ML models by researchers?
• What cybersecurity threats can be better tackled by ML in cybersecurity?

Search Strategy
Figure 1 shows the study selection divided into three different stages: identification, screening, and eligibility. For the identification step, we shortlisted five major databases: ACM Digital Library, IEEE Xplore, Springer Link, Science Direct, and Scopus. By scanning the bibliographies of relevant articles, we also searched for resources outside those databases. For example, Google Scholar was also investigated but excluded from our sources, due to overlapping citations with the other databases. Google Scholar is not a publishing entity, and mostly indexes citations and papers from other primary database sources. The selected sources, on the other hand, are original publishers who index original papers from self-managed conferences and journals by ACM, IEEE, Springer, and Elsevier. Hence, the overlap between publications retrieved from the selected databases is minimal. In order to collect all relevant existing research, we employed groups of several keywords. Although major contributions to the cybersecurity industry started in the 2010s, we chose our timeline based on the impact of ML in the early 2000s and therefore also included contributions made prior to 2010.
In line with PRISMA, we also self-evaluate our sampling methods for any hidden bias, which could lead to diverging outcomes from systematic studies [10]. Our data collection strategy had no known hidden bias. The databases selected are well-known, and we were not limited by any issues regarding access to manuscripts. We have also taken into account citation counts and year of publication for the selected studies, as documented in Appendix B, and hence, it is expected that any potential errors from missing or incomplete data will be minimal.
After collection, we identified duplicates manually based on citation similarity. The following search terms were used (Box 1), with ML, artificial intelligence, and DM grouped inclusively so that the presence of any one of these terms would return the matched articles. To filter for only security-related articles, cybersecurity was required in every search query, so that only ML articles specific to cybersecurity were returned.

Appl. Sci. 2020, 10, x FOR PEER REVIEW

Box 1. Search terms.
(machine learning OR artificial intelligence OR data mining) AND cybersecurity

In the screening step, a team of researchers independently filtered the collection based on titles and abstracts. Next, the researchers examined the full contents of the selected collection. The eligibility step reviewed articles using the outlined search criteria to ensure that only relevant articles had been collected. In the event of differing opinions about the credibility of any selected article, meetings were held to re-examine the full article until consensus was reached. Ultimately, 245 articles on cybersecurity approaches using ML techniques were collected. From these, further criteria were applied based on the uniqueness of contributions to ML and cybersecurity, and duplicated contributions were removed, to arrive at the final listing of 120 studies, listed with references in Appendix B. We also extracted metadata from the final selected set, including authors' names, title, year of publication, publishing type, citation, objectives, ML techniques, current and future challenges, offensive or defensive cybersecurity approach, and data sets used.
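The Box 1 search string is a plain boolean expression, so its logic can be illustrated on a local list of titles. The following is only a minimal sketch: each database has its own query syntax and matching rules, and the function name, the substring matching, and the sample titles are assumptions made for illustration.

```python
import re

# Terms taken from Box 1; the matching logic itself is an illustrative assumption.
ML_TERMS = ("machine learning", "artificial intelligence", "data mining")
SECURITY_TERM = "cybersecurity"


def matches_box1_query(text: str) -> bool:
    """Return True if `text` satisfies (ML term OR ...) AND cybersecurity."""
    # Lowercase and collapse whitespace so multi-word terms match reliably.
    normalized = re.sub(r"\s+", " ", text.lower())
    has_ml_term = any(term in normalized for term in ML_TERMS)
    return has_ml_term and SECURITY_TERM in normalized


# Hypothetical titles, invented purely for illustration.
titles = [
    "Machine learning for cybersecurity intrusion detection",
    "Data mining in retail analytics",
    "A cybersecurity policy overview",
]
selected = [t for t in titles if matches_box1_query(t)]
print(selected)
```

Only the first title satisfies both halves of the query; the second lacks the security term and the third lacks any of the grouped ML terms.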

Search Criteria
A systematic search of the literature concerning offensive and defensive cybersecurity approaches using ML techniques was performed. Table 1 shows several criteria defined in order to find high-quality articles to answer our research questions. These criteria were applied in order to determine which articles would be included or excluded from all articles across each phase of the selection process.

Machine Learning Techniques Primer
To answer the research question about the prevalence of ML techniques, we divided the ML methods between defensive and offensive cybersecurity. In general, ML techniques can be categorized into three main groups: supervised, unsupervised, and semi-supervised learning. With supervised learning, the ML algorithms require prior knowledge to guide decisions. This includes detailed data from past security incidents, each with an assigned label stating whether or not it was a breach. This popular type of supervised ML method is called a classifier. For instance, the training data could include information about network packets sent during an attack, along with other properties such as originating source details. The patterns within the training data are then associated with a "threat" or "no threat" label by the ML algorithm, and the trained model can classify future unknown threats. On the other hand, unsupervised ML methods do not rely on training data or curated labels, but group threats and non-threats based on general-purpose patterns within observations. One popular unsupervised ML method is clustering, where data points with similar attributes, such as signals for attacks, are grouped together. One benefit of unsupervised ML is that historical data is not needed for training. However, unsupervised algorithms tend to be more general-purpose than supervised methods, which can learn domain-specific data properties better. For example, an unsupervised method for malware detection may take longer to detect new threats, until the algorithm parameters are changed by domain experts. Supervised ML algorithms, by contrast, would be able to correlate new threats as soon as new signatures are provided as training data. Semi-supervised ML is useful when the training data is insufficient for supervised ML, but the unsupervised alternative may not give the best results. In this scenario, a small curated data set of attack signals can be used to make temporary inferences about new signals in conjunction with unsupervised ML approaches.
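To make the contrast between supervised and unsupervised learning concrete, the sketch below runs a nearest-centroid classifier (supervised, uses labels) and a two-cluster k-means (unsupervised, ignores labels) on the same toy data. These two algorithms were chosen for brevity and are not specifically the methods used in the reviewed papers; the feature names and values are hypothetical.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(fmean(axis) for axis in zip(*points))


def train_nearest_centroid(samples, labels):
    """Supervised: learn one centroid per label from labeled samples."""
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    return {label: centroid(pts) for label, pts in by_label.items()}


def classify(model, sample):
    """Assign the label whose centroid is closest to the sample."""
    return min(model, key=lambda label: dist(model[label], sample))


def two_means(samples, iterations=10):
    """Unsupervised: split unlabeled samples into two clusters (k-means, k=2)."""
    centers = [min(samples), max(samples)]  # simple deterministic start
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for s in samples:
            nearest = 0 if dist(s, centers[0]) <= dist(s, centers[1]) else 1
            clusters[nearest].append(s)
        # Keep the old center if a cluster ends up empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters


# Hypothetical features: (packets per second, failed logins per minute).
traffic = [(10, 0), (12, 1), (11, 0), (95, 20), (100, 25), (90, 22)]
labels = ["no threat"] * 3 + ["threat"] * 3

model = train_nearest_centroid(traffic, labels)
print(classify(model, (97, 18)))  # label learned from the curated examples
clusters = two_means(traffic)
print(clusters)                   # two groups, with no names attached
```

The classifier returns a named label because it was given labels during training, whereas the clustering step only separates the observations into groups that a domain expert would still have to interpret.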
ML techniques are evaluated using standard metrics such as true positive rate, false positive rate, accuracy, precision, recall, F1 score, false alarm rate, and the confusion matrix. These metrics are all based on the counts of true positives, false positives, true negatives, and false negatives. The true positive rate is the number of intrusions that were correctly detected over the total number of intrusions in the testing set. The false positive rate is the number of normal requests recognized as intrusions over the total number of normal requests in the testing set [11]. It should also be noted that False Positive Rate (FPR) and False Negative Rate (FNR) are essential metrics in various types of network systems, including Erdős-Rényi (ER) networks, Random Regular (RR) networks, and Scale-Free (SF) networks. FPR and FNR provide a robust measurement of a network's probability of being compromised by attacks or of security threats being overlooked [12]. Accuracy is the number of requests correctly classified over the total number of requests in the testing set, expressed as a percentage [13]. Another common measure is precision, computed as the number of intrusions that were correctly classified over the total number of data points classified as intrusions. Recall is the ratio of the number of correctly identified intrusions to the total number of intrusions, and is therefore equivalent to the true positive rate. The F1 score is a composite measure computed as the harmonic mean of recall and precision [14], which balances the two. The false alarm rate is the proportion of normal connections that are incorrectly categorized as attacks out of all normal connections [11], which matches the false positive rate defined above. The confusion matrix presents the distribution of correctly and incorrectly classified or predicted data points [14].
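As a minimal sketch, the metrics defined above can be computed directly from the four confusion-matrix counts. The function name and the example counts are hypothetical, and values are returned as fractions rather than percentages.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common IDS evaluation metrics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # identical to the true positive rate
    return {
        "true_positive_rate": recall,
        "false_positive_rate": ratio(fp, fp + tn),  # also the false alarm rate
        "false_negative_rate": ratio(fn, fn + tp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical test set: 100 intrusions and 900 normal requests.
metrics = detection_metrics(tp=80, fp=10, tn=890, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With imbalanced data like this, accuracy stays high even though one in five intrusions is missed, which is why recall and FNR are usually reported alongside it.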

Results
Having followed the review protocol we outlined, the results of the systematic review are summarized by answering each of the three research questions we raised. Firstly, we presented results of the survey on offensive and defensive ML techniques in cybersecurity, followed by a summary of data sets used in supervised ML. Lastly, we categorized cybersecurity threats that were tackled with ML techniques in the related literature. It should be noted that there is no silver bullet classification algorithm, and specific context requires thorough evaluation to determine the best classifiers suitable for the cybersecurity classification problem at hand.
Generally, most of the authors of the selected reviewed papers employed defensive approaches to provide solutions to the various cybersecurity issues, as can be seen in Figure 2. Offensive approaches in cybersecurity were first employed in a few of the selected reviewed papers published in 2008, and the number of papers that used these approaches comparatively increased in 2012. Despite this increase, the number of papers that used defensive approaches in 2012 was about three times the number of papers leveraging offensive approaches in that same year. Furthermore, the selected reviewed papers published in 2017 recorded the highest number of studies that leveraged defensive approaches to solve cybersecurity problems. We postulated that a key reason for this discrepancy might be perceptions of Return on Investment (ROI) for offensive approaches, including unclear metrics for success. Offensive methods mitigate attacks and breaches before they occur. Hence, the effectiveness of these approaches is comparatively harder to quantify with traditional security metrics like FPR and FNR.
Defensive Machine Learning Techniques
We identified seven commonly used classification techniques in the selected reviewed papers on IDS, a defensive cybersecurity strategy: Support Vector Machine (SVM), naïve Bayes, decision trees, random forests, logistic regression, neural networks, and hybrid methods. Figure 3 summarizes the range and prevalence of ML defensive strategies surfaced in the literature. SVMs are a frequently used classification algorithm among the authors of the selected reviewed papers dealing with IDS. This classification algorithm searches for an optimum hyperplane to divide two classes, following the structural risk minimization principle of statistical learning theory [15]. To define this hyperplane, the algorithm computes a set of support vectors so that the maximum margin is achieved. Some authors applied a soft margin when the data set was not perfectly linearly separable, and used kernel functions to turn the non-linear classification problem into a linear one in a transformed feature space. Many of the reviewed papers employed the radial basis function (RBF) kernel. RBF has satisfactory non-linear forecasting abilities, and an RBF SVM has fewer tunable kernel parameters than other non-linear kernels such as the polynomial kernel.
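The role of the RBF kernel can be sketched with a few lines of code. The example below evaluates the standard RBF kernel and the decision function of a kernel SVM on an XOR-style layout that no straight line can separate. The support vectors, coefficients, and bias are hand-picked for illustration (an assumption, not the result of training), and gamma is an arbitrary choice.

```python
from math import exp


def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2); gamma controls the kernel width."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * squared_distance)


def decision_value(sample, support_vectors, coefficients, bias, gamma=0.5):
    """Signed score of a kernel SVM: sum_i coef_i * K(sv_i, sample) + bias."""
    return sum(
        c * rbf_kernel(sv, sample, gamma)
        for sv, c in zip(support_vectors, coefficients)
    ) + bias


# XOR-style layout: opposite corners share a class, so the classes are not
# linearly separable. Coefficients are hand-picked, not learned.
support_vectors = [(0, 0), (1, 1), (0, 1), (1, 0)]
coefficients = [-1.0, -1.0, 1.0, 1.0]  # sign encodes the class of each vector
bias = 0.0

for point in support_vectors:
    score = decision_value(point, support_vectors, coefficients, bias)
    print(point, "positive class" if score > 0 else "negative class")
```

A real SVM would obtain the coefficients and bias by solving the margin-maximization problem; the sketch only shows how the kernel lets a linear decision rule in the transformed space handle a non-linear layout.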
The naïve Bayes classifier is considered the simplest form of Bayesian network classifier, as all attributes are naively assumed to be conditionally independent of one another given the class. Several authors have used this classifier because of its accuracy, performance, and simplicity, all of which can be attributed to this conditional independence assumption. However, some authors found that the classifier performs poorly when the assumption does not hold, especially with data sets such as KDD'99, which has complex attribute dependencies [16].
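The independence assumption is what keeps the classifier simple: the score of a class is the class prior multiplied by one per-attribute likelihood at a time. The following is a minimal sketch for categorical attributes with add-one smoothing; the attribute values only loosely resemble connection records and are invented for illustration.

```python
from collections import Counter, defaultdict
from math import log


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class, per-attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, attribute index) -> value counts
    vocab = defaultdict(set)             # attribute index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict(model, row):
    """Pick the class maximizing log P(c) + sum_i log P(x_i | c)."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = log(count / total)
        for i, value in enumerate(row):
            # Laplace (add-one) smoothing avoids zero probabilities.
            numerator = value_counts[(label, i)][value] + 1
            denominator = count + len(vocab[i])
            score += log(numerator / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical records: (protocol, connection flag).
rows = [("tcp", "SF"), ("tcp", "SF"), ("udp", "SF"),
        ("tcp", "S0"), ("icmp", "S0"), ("tcp", "S0")]
labels = ["normal"] * 3 + ["attack"] * 3

model = train_naive_bayes(rows, labels)
print(predict(model, ("tcp", "S0")))
print(predict(model, ("udp", "SF")))
```

Because each attribute contributes independently, correlated attributes are effectively counted twice, which is one way to see why data sets with strong attribute dependencies can hurt this classifier.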
Decision trees are another widely used classification technique, consisting of decision nodes and leaf nodes. One major evaluation factor for decision trees is the classification error, defined as the percentage of misclassified cases. As the number of class categories handled by the decision tree grows, classification accuracy drops significantly [16]. Some authors considered decision tree-based algorithms to be more advantageous than SVM, as decision tree-based algorithms, especially J48, showed better weighted recall and overall accuracy. Furthermore, those authors concluded that decision tree-based algorithms provide a better understanding of the various classes of malicious behavior, because their results are easier to interpret.
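The classification error described above can be illustrated with the smallest possible tree, a single decision node with two leaves (often called a decision stump). This sketch is not J48 or any algorithm from the reviewed papers; it simply tries every threshold on one hypothetical feature and keeps the one with the lowest percentage of misclassified cases.

```python
def classification_error(predictions, labels):
    """Percentage of misclassified cases."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return 100.0 * wrong / len(labels)


def fit_stump(values, labels, positive="attack", negative="normal"):
    """Find the threshold t for the rule `value > t -> positive` with lowest error."""
    best = None
    for t in sorted(set(values)):
        predictions = [positive if v > t else negative for v in values]
        error = classification_error(predictions, labels)
        if best is None or error < best[1]:
            best = (t, error)
    return best


# Hypothetical feature: failed logins per connection; the last case is noisy.
failed_logins = [0, 1, 0, 2, 7, 9, 8, 1]
labels = ["normal"] * 4 + ["attack"] * 4

threshold, error = fit_stump(failed_logins, labels)
print(f"rule: failed_logins > {threshold} -> attack, error = {error}%")
```

The learned rule can be read directly as a sentence, which reflects the interpretability advantage that some authors attributed to tree-based methods.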
Another common classification technique is random forest. With this technique, several trees are built from the training data set. Every data point to be classified is passed through the forest of trees, and the result is calculated by averaging the predictions from all the trees. This classification technique has been considered to have excellent accuracy. Random forest has also been shown to help reduce false alarms and processing times [16].
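The aggregation step of a random forest can be sketched as follows. Only the averaging is shown: the three "trees" are hypothetical hand-written rules, whereas in an actual random forest each tree is learned from a bootstrap sample of the training data using random subsets of features.

```python
from statistics import fmean


# Three hypothetical hand-written "trees" over made-up connection features.
# Each returns 1.0 for "attack" and 0.0 for "normal".
def tree_a(conn):
    return 1.0 if conn["failed_logins"] > 3 else 0.0


def tree_b(conn):
    return 1.0 if conn["bytes_sent"] > 10_000 else 0.0


def tree_c(conn):
    return 1.0 if conn["duration_s"] < 1.0 and conn["failed_logins"] > 0 else 0.0


def forest_score(trees, conn):
    """Average the individual tree predictions into one attack score."""
    return fmean([tree(conn) for tree in trees])


def forest_predict(trees, conn, threshold=0.5):
    """Label a connection as an attack when most trees vote for it."""
    return "attack" if forest_score(trees, conn) > threshold else "normal"


forest = [tree_a, tree_b, tree_c]
suspicious = {"failed_logins": 5, "bytes_sent": 500, "duration_s": 0.2}
benign = {"failed_logins": 0, "bytes_sent": 20_000, "duration_s": 4.0}

print(forest_predict(forest, suspicious))
print(forest_predict(forest, benign))
```

A single tree that fires on an otherwise normal connection is outvoted by the others, which gives some intuition for how averaging can reduce false alarms.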
Logistic regression is a probabilistic linear classifier that involves the projection of input vectors onto hyperplanes. The probability that the input is a member of a corresponding class is reflected through the distance of the input to the hyperplane. Though the logistic classifier needs extensive training time in some instances, it is an efficient classifier and has been used in cybersecurity to more effectively handle noisy data that can often be generated when trying to deal with security threats and attacks [17].
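The link between distance to the hyperplane and class probability can be sketched as below. The weights and bias are hypothetical placeholders; in practice they are learned from labeled data, and the training step is omitted here.

```python
from math import exp, sqrt


def signed_distance(weights, bias, features):
    """Signed distance from `features` to the hyperplane w.x + b = 0."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return z / sqrt(sum(w * w for w in weights))


def predict_proba(weights, bias, features):
    """P(class = 1 | features) = sigmoid(w.x + b)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + exp(-z))


# Hypothetical parameters; in practice they are learned from labeled data.
weights, bias = [2.0, -1.0], -1.0
for point in [(0.0, 1.0), (1.0, 1.0), (3.0, 1.0)]:
    print(
        point,
        round(signed_distance(weights, bias, point), 3),
        round(predict_proba(weights, bias, point), 3),
    )
```

A point lying exactly on the hyperplane receives probability 0.5, and the probability moves towards 0 or 1 as the point moves further to either side.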
Neural networks were also used in many of the selected reviewed papers for IDS. More advanced variants of this classification technique include deep learning, which stacks several layers of connected networks. This classification algorithm is suitable for solving complicated data problems by extracting sophisticated patterns from features with limited prior knowledge. Backpropagation is a common method for training a neural network, but it can converge to local solutions or train slowly, so a single-hidden-layer feed-forward neural network, called the extreme learning machine (ELM), was proposed. ELMs use random weights and biases for the connections between the input and the hidden neurons. ELMs also determine the output weights with a single least-squares approximation step, thereby speeding up the learning process [18].
Finally, many current IDS studies have proposed hybrid detection techniques, combining both signature-based detection techniques and anomaly-based detection techniques. Several papers integrated classification algorithms to create effective IDS. One of the papers of interest proposed a Signature-based Anomaly Detection System (SADS) to overcome some of the drawbacks of the conventional IDS, such as false alarms, by integrating naïve Bayes and random forest classifiers [16].
It was also discovered that when the random forest and NBTree algorithms (the latter combines naïve Bayes and decision tree classifiers) were used cooperatively based on the sum rule scheme, the detection accuracy was greater than that of the random tree algorithm alone [19]. Malicious web sessions have also been automatically classified into multiple vulnerability scan classes and attack classes using various multiclass supervised ML methods such as SVM, J48, and PART [20].
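The sum rule mentioned above combines classifiers by adding up their per-class posterior estimates and choosing the class with the largest total. The sketch below shows only this combination step; the two probability tables are hypothetical outputs, not results from the cited study.

```python
def sum_rule(posteriors_per_classifier):
    """Combine classifiers by summing their per-class posterior estimates."""
    totals = {}
    for posteriors in posteriors_per_classifier:
        for label, probability in posteriors.items():
            totals[label] = totals.get(label, 0.0) + probability
    best_label = max(totals, key=totals.get)
    return best_label, totals


# Hypothetical posterior estimates for one connection from two classifiers.
random_forest_output = {"normal": 0.55, "dos": 0.40, "probe": 0.05}
nbtree_output = {"normal": 0.20, "dos": 0.70, "probe": 0.10}

label, totals = sum_rule([random_forest_output, nbtree_output])
print(label, totals)
```

In this toy case the first classifier alone would predict "normal", but the second is confident enough about "dos" that the combined decision changes.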

Offensive Machine Learning Techniques
Figure 4 shows the popular techniques used in offensive cybersecurity. Neural networks were the most commonly used technique, while unsupervised ML techniques such as association rule mining [21], frequent pattern mining [11], and clustering [22] were used in a few studies. Combinations of supervised and unsupervised methods as semi-supervised learning were also used [23].

Many authors have conducted research on bigraph-based ML algorithms, and a subset of anonymized data learning methods function as offensive filter-based methods for network defense, as do moving target defense techniques that change the attacker's view of the network through spatio-temporal randomization. These studies contributed significantly to a paradigm shift from defensive to offensive cybersecurity. One of the selected reviewed papers of interest discussed a framework created by the authors for predicting adversarial movement as threats progress. The authors employed the Cloppert 12-stage intrusion-chain model [24] and used various ML and DM models for predicting threats with time series data: Auto Regressive Integrated Moving Average (ARIMA), Nonlinear Auto Regressive (NAR) neural network, NAR neural network with eXogenous input (NARX), and NAR neural network for multi-step-ahead prediction. The authors were successful in predicting adversarial movement with the proposed mixed-methods approach [25]. Similarly, to solve the challenge of predicting potentially malicious actions in executable files, Recurrent Neural Network (RNN) models have been used, as RNNs have proven effective in time series data processing, providing high accuracy with minimal execution time for dynamic malware, given appropriate feature selection and optimally configured hyperparameters [26].
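The idea of forecasting threat activity from time series data can be sketched with a first-order autoregressive model, which is the simplest building block of the ARIMA family. This is a deliberate simplification and not the ARIMA, NAR, or NARX models of the cited work; the series of alert counts is synthetic.

```python
from statistics import fmean


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]."""
    prev, curr = series[:-1], series[1:]
    mean_prev, mean_curr = fmean(prev), fmean(curr)
    covariance = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    variance = sum((p - mean_prev) ** 2 for p in prev)
    phi = covariance / variance
    return mean_curr - phi * mean_prev, phi


def forecast(series, steps):
    """Multi-step-ahead prediction by feeding each forecast back into the model."""
    c, phi = fit_ar1(series)
    predictions, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions


# Synthetic daily alert counts generated by x[t] = 2 + 0.5 * x[t-1].
alerts = [20.0, 12.0, 8.0, 6.0, 5.0, 4.5]
c, phi = fit_ar1(alerts)
print(c, phi)
print(forecast(alerts, steps=2))
```

Because each forecast is fed back in as input, errors can accumulate over longer horizons, which is one reason multi-step-ahead prediction is usually treated as a harder problem than one-step prediction.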

Supervised Machine Learning Data Sets
Most of the studies surveyed using supervised ML with defensive or offensive security mechanisms used contest-curated data sets, while a number of recent articles have also used real-world data. The listing of data sets for supervised ML is summarized in Figure 5.
KDDCUP'99 was the most commonly used data set in the selected reviewed papers, followed by DARPA'99, NSL-KDD, log files, and honeypots.
The DARPA'99 data set was commonly used in the reviewed papers when testing IDS methods, as it is considered well documented and well studied in IDS research. This data set is widely used, especially in articles that focused on signature-based IDS. This type of detection requires normal network traffic together with a full set of attacks, each accompanied by comprehensive information such as source and destination ports, source and destination IP addresses, attack duration, attack starting time, and other relevant information [27]. Though the DARPA'99 data set is considered well documented and well studied, many of the reviewed papers combined it with other data sets when evaluating intrusion methods [16]. This was because many researchers concluded that DARPA'99 is old and not suitable for evaluating recent IDS methods. Some of the reviewed articles used either both DARPA'99 and the UNSW-NB15 data set or only the UNSW-NB15 data set, a recent data set generated to address the challenge of inaccessible network benchmark data sets [27].
The NSL-KDD data set is an updated version of KDDCUP'99 and was used in many of the articles reviewed, as researchers consider KDDCUP'99 to have some innate flaws. One major flaw is its large number of redundant records: many duplicate records were found when KDDCUP'99 was used for IDS evaluation. This flaw biases evaluation results [28]. However, many recent articles still employ KDDCUP'99 when evaluating the performance of anomaly-based IDS, and many authors consider it to provide non-biased evaluation results for that purpose [19]. NSL-KDD is itself regarded as having some problems, such as unrealistic data rates of normal and attack data. Nevertheless, many studies employ the NSL-KDD data set when evaluating IDS approaches because the numbers of records in its training and test subsets are reasonable [29].
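The effect of redundant records can be checked before training. The sketch below, which assumes each record is a tuple of feature values plus a label, measures the share of duplicates and removes them while preserving order. The sample records and the function names are hypothetical and only illustrate the kind of cleaning that motivated NSL-KDD.

```python
from typing import Hashable, List, Sequence, Tuple

Record = Tuple[Hashable, ...]


def duplicate_rate(records: Sequence[Record]) -> float:
    """Fraction of records that repeat an earlier record exactly."""
    if not records:
        return 0.0
    return 1.0 - len(set(records)) / len(records)


def drop_duplicates(records: Sequence[Record]) -> List[Record]:
    """Keep the first occurrence of each record, preserving order."""
    seen = set()
    unique: List[Record] = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


# Hypothetical connection records: (protocol, service, flag count, label).
records = [
    ("tcp", "http", 0, "normal"),
    ("tcp", "http", 0, "normal"),
    ("icmp", "ecr_i", 0, "smurf"),
    ("icmp", "ecr_i", 0, "smurf"),
    ("icmp", "ecr_i", 0, "smurf"),
    ("udp", "domain_u", 0, "normal"),
]

rate = duplicate_rate(records)
unique_records = drop_duplicates(records)
print(rate, len(unique_records))
```

Without such cleaning, a learner is rewarded for memorizing the most frequently repeated records, which is the evaluation bias described above.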
Log files were also employed in the reviewed literature. Though log files were not as commonly used in the reviewed papers as DARPA'99, KDDCUP'99, and NSL-KDD, they were still employed in a significant number of studies. Some of the studies used log files obtained from anti-virus logs, firewall servers, and IDS [30]. Web logs used as real-world data sets covered various websites and contained relevant information such as ATTACK-DATE, HOST, PARAMETERS, URL, REFER, USER-AGENT, COOKIE, IP, and POST-CONTENT, as well as other details.
Various honeypot mechanisms provided as data sets were also leveraged in the surveyed research papers; these could be deployed to different networks to provide effective and practical evaluation results. Most of the reviewed papers that employed honeypots when evaluating IDS approaches made some modifications to the traditional honeypots. For example, in one of the reviewed papers, the researchers deployed offensive systems that access joint botnets and malicious web servers to receive different types of commands, as traditional honeypots only receive attacks. There were also some less common real-world data sets used in some of the reviewed papers. These included a real-world benchmark corpus containing an estimated one billion words from the Google code project and the real-world data sets from the ML database repository of the University of California, Irvine (UCI) [31].

Cyberattacks Tackled by Machine Learning
In this section, an overview of the major challenges discussed in the selected reviewed papers is presented. Many authors of the selected reviewed papers employed different ML techniques to solve some of the most common IDS challenges which have been extensively discussed in IDS and cybersecurity research. Figure 6 summarizes the cyberattacks tackled by ML techniques.
Denial of Service (DoS), User to Root (U2R), Remote to Local (R2L), and probe attacks have been the most common categories of attacks addressed with ML techniques. These attacks were found in most of the data sets (training and testing sets). A DoS attack makes network resources unavailable to their intended users by suspending the services of a host connected to the internet. U2R attacks involve attackers attempting to gain access to a target system without official permission or approval. R2L attacks involve attackers exploiting vulnerabilities, which could involve guessing passwords, to take control of a remote machine. Probe attacks involve attackers examining machines to obtain relevant information [32]. The Distributed Denial of Service (DDoS) attack is also a common attack addressed with ML techniques. This attack has been considered the most enhanced form of DoS attack. Its power to deploy attack vectors over the internet in a "distributed way" and generate lethal traffic through the aggregation of these forces differentiates it from other attacks [33].
Anti-Malware software products which protect legitimate users from attacks mostly use signature-based detection methods. These methods involve the extraction of unique signatures from already known malicious files and the identification of an executable file as a malicious code if there is a match between its signatures and the list containing available signatures [34]. Many studies have shown that the restriction of the signature-based detection method to recognize already known malware has made it ineffective and unreliable against new malicious codes [17].
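The matching step of signature-based detection, and its blind spot for unseen code, can be sketched as follows. As a simplifying assumption, the "signature" here is a SHA-256 hash of the whole file; commercial products typically use richer signatures such as byte patterns. The payload bytes and function names are hypothetical.

```python
import hashlib
from typing import Set


def file_signature(content: bytes) -> str:
    """Simplified signature: SHA-256 digest of the full file content."""
    return hashlib.sha256(content).hexdigest()


def is_known_malware(content: bytes, signatures: Set[str]) -> bool:
    """Flag a file only if its signature is already in the database."""
    return file_signature(content) in signatures


# Hypothetical signature database built from one known malicious file.
known_sample = b"hypothetical malicious payload"
signature_db = {file_signature(known_sample)}

# A trivially modified variant no longer matches the stored signature.
variant = known_sample + b"\x00"

known_hit = is_known_malware(known_sample, signature_db)
variant_hit = is_known_malware(variant, signature_db)
print(known_hit, variant_hit)
```

The missed variant illustrates why signature matching alone is considered unreliable against new malicious code, and why the reviewed studies turn to learned detectors.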
Currently, many firms deal with botnets, which are considered one of the most significant cybersecurity threats. Many cybercrimes involve the use of botnets. Although many studies have discussed different methods by which botnets can be detected and analyzed, coping with new forms of botnets has become a major challenge that has received relatively limited attention [35]. Various reviewed papers proposed new detection techniques to address botnet attacks. In some of the papers, traffic behavior analysis was used to detect botnet activities, with network traffic behavior classified using ML. The traffic behavior analysis approach can function normally with encrypted network communication protocols, as it is independent of packet payloads. Also, information regarding network traffic can be recovered with ease from different network devices without significantly affecting service availability or network performance. Furthermore, it has been found that various existing botnet detection techniques depend on detecting botnet activities during the initial formation phase or the attack phase. As a result, some of the studies proposed techniques for the detection of botnets during the initial formation phase as well as during the command and control phase.
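Because traffic behavior analysis ignores packet payloads, its inputs can be derived from header metadata alone. The sketch below groups packets into flows and computes a few per-flow statistics that a classifier could consume. The `Packet` fields, the chosen features, and the sample addresses are assumptions made for illustration, not the feature set of any reviewed paper.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    timestamp: float  # seconds
    src: str
    dst: str
    dst_port: int
    size: int  # bytes on the wire; the payload itself is never inspected


FlowKey = Tuple[str, str, int]


def flow_features(packets: List[Packet]) -> Dict[FlowKey, Dict[str, float]]:
    """Aggregate header-only statistics per (src, dst, dst_port) flow."""
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for packet in packets:
        flows[(packet.src, packet.dst, packet.dst_port)].append(packet)

    features: Dict[FlowKey, Dict[str, float]] = {}
    for key, pkts in flows.items():
        times = [p.timestamp for p in pkts]
        total_bytes = sum(p.size for p in pkts)
        features[key] = {
            "packets": len(pkts),
            "bytes": total_bytes,
            "mean_size": total_bytes / len(pkts),
            "duration": max(times) - min(times),
        }
    return features


# Hypothetical periodic, small, fixed-size packets to a single server.
packets = [
    Packet(0.0, "10.0.0.5", "203.0.113.7", 6667, 60),
    Packet(30.0, "10.0.0.5", "203.0.113.7", 6667, 60),
    Packet(60.0, "10.0.0.5", "203.0.113.7", 6667, 60),
]

feats = flow_features(packets)
print(feats[("10.0.0.5", "203.0.113.7", 6667)])
```

Since only sizes, timings, and endpoints are used, the same features remain available when the traffic is encrypted.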
One of the current attacks organizations face is the zero-day attack [36]. Many of the selected reviewed papers focus on anomaly detection methods to detect zero-day attacks using behavior-based data from benign programs. Some of the studies proposed a host-based anomaly detection method. In one of the studies, fuzzy logic and genetic algorithms were employed for anomaly detection. Furthermore, a significant number of the selected reviewed papers presented an enhanced or modified SVM approach to solve this challenge. Several challenges in detecting anomalies in sequential data still exist, even though anomaly detection techniques are employed in various studies and applied in several areas [37]. Authors find detecting anomalies in sequential data more complicated than detecting anomalies in static patterns, owing to the temporally related nature of sequential data. Many authors of the selected reviewed papers propose different novel ML techniques to solve this challenge. One of the studies proposed a temporal difference (TD) learning based method. The authors made some modifications to the Markov reward model, which is often used to detect multi-stage cyberattacks. The value functions of the Markov reward process, which were equivalent to the anomaly probabilities of sequential behaviors, were included in the proposed TD learning based approach. Furthermore, many of the selected reviewed papers applied ML to cyberattacks such as malware, phishing, SQL injections, ransomware, and cross-site scripting (XSS) [38].
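The link between value functions and anomaly probabilities can be illustrated with tabular TD(0) on a small Markov reward process. In this sketch a reward of 1 is given only when a behavior sequence ends in an anomalous outcome, so with a discount factor of 1 the learned value of a state approximates the probability that a sequence passing through it ends anomalously. The states, episodes, and parameters are hypothetical, and the modified model used in the reviewed study is not reproduced here.

```python
from typing import Dict, List, Tuple

Transition = Tuple[str, float, str]  # (state, reward, next_state)


def td0(
    episodes: List[List[Transition]],
    alpha: float = 0.1,
    gamma: float = 1.0,
    sweeps: int = 200,
) -> Dict[str, float]:
    """Tabular TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))."""
    values: Dict[str, float] = {}
    for _ in range(sweeps):
        for episode in episodes:
            for state, reward, next_state in episode:
                v = values.get(state, 0.0)
                # Terminal states are never updated, so their value stays 0.
                v_next = values.get(next_state, 0.0)
                values[state] = v + alpha * (reward + gamma * v_next - v)
    return values


# Hypothetical behavior sequences: one ends in an attack, one is benign.
attack_episode = [("scan", 0.0, "exploit"), ("exploit", 1.0, "END")]
benign_episode = [("scan", 0.0, "browse"), ("browse", 0.0, "END")]

values = td0([attack_episode, benign_episode])
print(values)
```

With the two sequences seen equally often, the ambiguous `scan` state settles near one half, while `exploit` approaches 1 and `browse` stays at 0. With a constant step size the estimate fluctuates around the true probability rather than converging exactly.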

Discussion
There are a number of insights and challenges that can be identified in the use of ML with cybersecurity. These include the high dimensionality of network traffic data, class overlap between threats and legitimate data over the feature space, and general uncertainty of information. Nevertheless, ML in cybersecurity is growing exponentially, and the future points to more utilization.

High Dimensionality of Network Traffic Data
The high dimensionality of network traffic data has made classification challenging, as such data usually comprise many attributes and features [39]. This is mainly due to the computational complexity and resources required to process large, sparse matrices with many feature columns and observation rows. This challenge makes it difficult for researchers to train models that differentiate between anomalous and normal behavior [40]. As a result, many of the selected reviewed papers discussed the need for dimensionality reduction and feature selection, as well as the introduction of ML techniques, when classifying such data. Some of the authors of the selected reviewed papers employed data mining techniques in a cloud-based environment, by choosing suitable attributes and features with the least relevance with regard to weight for the classification. In the majority of the reviewed studies, the standard strategy is to choose features with more desirable weights.
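Weight-based feature selection can be sketched in a few lines. Here the weight of a feature is assumed to be the absolute difference between its mean value on attack records and on normal records, and only the k highest-weighted features are kept. This scoring rule, the sample data, and the function names are illustrative assumptions; the reviewed studies use a variety of weighting schemes.

```python
from typing import List, Sequence


def feature_weights(rows: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Weight = |mean over attack rows - mean over normal rows| per feature.

    Assumes features are on comparable scales and both classes are present.
    """
    weights = []
    for j in range(len(rows[0])):
        attack = [row[j] for row, y in zip(rows, labels) if y == 1]
        normal = [row[j] for row, y in zip(rows, labels) if y == 0]
        weights.append(abs(sum(attack) / len(attack) - sum(normal) / len(normal)))
    return weights


def select_top_k(weights: Sequence[float], k: int) -> List[int]:
    """Return the column indices of the k highest-weighted features."""
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical scaled traffic features; label 1 = attack, 0 = normal.
rows = [
    [0.9, 0.5, 0.3],
    [0.8, 0.5, 0.4],
    [0.1, 0.5, 0.1],
    [0.2, 0.5, 0.2],
]
labels = [1, 1, 0, 0]

weights = feature_weights(rows, labels)
kept = select_top_k(weights, 2)
print(kept)
```

The middle feature is identical across both classes, receives a weight of zero, and is dropped, which reduces the dimensionality the classifier has to handle.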

Class Overlap Challenge
Network intrusion detection systems face the challenge of providing satisfactory detection results due to class overlap between threat and legitimate data over feature space. This is also one of the causes behind false alarms and false positives observed in ML-based IDS. Another aspect of the class overlap challenge is the temporal shifting of network nodes from being threats to non-threats [41]. Nodes can be malicious or non-malicious at different times due to changes in performance, resource availability, or infections, disinfections, and re-infections. Various authors of the selected reviewed papers proposed different ML optimizations to solve this class overlap challenge. One of the studies introduced a wavelet based multi-scale Hebbian learning approach to neural networks [42], and the proposed methodology was able to properly differentiate between non-linear and overlapping boundaries.

Uncertainty of Information
The increasing occurrence of cyberattacks worldwide has resulted in the misuse or loss of information assets, thereby, increasing organizations' expenses [43]. Over the years, intrusion detection systems have been used for the protection of networks and computer systems. To detect cyberattacks, most of the present intrusion detection systems depend on low-level raw network data. A current practice is to employ knowledge-based intrusion detection systems which store cyberattack related information as well as the corresponding vulnerabilities. Also, this stored information is used for guiding the process of predicting attacks. A major challenge knowledge-based IDS face is the inability to predict attacks due to the lack of contextual information or the uncertainty of information.
Contextual information involves not only information about the configuration of the target systems and their vulnerabilities but also any important pre-condition that must exist for an attack to succeed. Contextual information also includes probable semantic relationships between the targeted locations and the activities of the attackers at the time of those activities. Machine learning and probabilistic approaches have been employed in many studies to tackle common uncertainty challenges. However, many authors found that these approaches rely on models that users cannot readily understand, whereas fuzzy logic approaches model uncertainty in a user-friendly form.

Future of Machine Learning in Cybersecurity
Looking ahead, some of the predictions around new cybersecurity threats involve the exploitation of ML systems by attackers and the use of these ML systems to aid assaults. Even though ML systems have been useful in automating manual activities and enhancing decision-making, they are also targets of new attacks. The fragility of some ML technologies has been predicted to become a growing concern. ML systems are potential targets for hackers, and ML techniques can be used by attackers to enhance their attack vectors and data sniffing activities. Also, phishing and other social engineering attacks could be made more effective using ML, fooling targeted individuals through the creation of well-crafted audio-visuals or untraceable emails. Furthermore, realistic disinformation campaigns could be launched using ML. The generation of new threats can be made relatively easy for attackers due to the availability of attack toolkits for sale online. Another prediction made by some researchers relating to cybersecurity is increased dependence on ML for countering attacks and identifying vulnerabilities. Mobile phone users could be warned of risky actions when ML is embedded into mobile phones. The trade-off between tracking personal information and gaining added security is an ongoing discussion, especially within research on security-based ML [44].

Conclusions
In the fight against malicious threats, there has been collaborative support from experts to design different cyber defense systems. Intrusion detection mechanisms monitor, track, and block viruses and other malicious cyberattacks. However, these methods are still vulnerable to attacks in applications because the design and implementation of software and networks is imperfect. Advanced methods using ML are being developed to discover previously unknown cyber intrusions and techniques towards a more dependable cybersecurity infrastructure, including both defensive and offensive approaches. This paper systematically synthesized the knowledge base in the domain of cybersecurity with ML. We also covered current challenges and future directions of ML in cybersecurity while surveying nearly two decades of research in the applications of ML to security. By employing the Systematic Literature Reviews and PRISMA model, we answered three research questions on ML techniques being used in offensive and defensive cybersecurity, data sets being used in training supervised ML models, as well as cyberattacks that have been tackled by ML. Our study was limited to literature investigation, and while algorithmic and experimental comparison of the different approaches was beyond our scope, this would be an interesting research direction for future work in this area. Although there is no silver bullet ML algorithm to handle all possible cybersecurity vulnerabilities, threats, and attacks, our study shows impressive outcomes from ML solutions, and provides a good starting point for researchers exploring ML techniques within cybersecurity.

PRISMA Checklist
Study selection (item 9): State the process for selecting studies (i.e., screening, eligibility, included in systematic review, and, if applicable, included in the meta-analysis).
Data collection process (item 10): Describe method of data extraction from reports (e.g., piloted forms, independently, in duplicate) and any processes for obtaining and confirming data from investigators.
Data items (item 11): List and define all variables for which data were sought (e.g., PICOS, funding sources) and any assumptions and simplifications made. Reported on page 3.
Risk of bias in individual studies (item 12): Describe methods used for assessing risk of bias of individual studies (including specification of whether this was done at the study or outcome level), and how this information is to be used in any data synthesis.
Risk of bias across studies (item 15): Specify any assessment of risk of bias that may affect the cumulative evidence (e.g., publication bias, selective reporting within studies). Reported on page 3.
Additional analyses (item 16): Describe methods of additional analyses (e.g., sensitivity or subgroup analyses, meta-regression), if done, indicating which were pre-specified.
Study selection (item 17): Give numbers of studies screened, assessed for eligibility, and included in the review, with reasons for exclusions at each stage, ideally with a flow diagram. Reported on page 3.
Study characteristics (item 18): For each study, present characteristics for which data were extracted (e.g., study size, PICOS, follow-up period) and provide the citations. Reported on pages 21-28.
Risk of bias within studies (item 19): Present data on risk of bias of each study and, if available, any outcome level assessment (see item 12).
Results of individual studies (item 20): For all outcomes considered (benefits or harms), present, for each study: (a) simple summary data for each intervention group (b) effect estimates and confidence intervals, ideally with a forest plot. Reported on pages 5-11.
Synthesis of results (item 21): Present results of each meta-analysis done, including confidence intervals and measures of consistency. Reported on pages 11-12.
Risk of bias across studies (item 22): Present results of any assessment of risk of bias across studies (see Item 15). N/A.
Additional analysis (item 23): Give results of additional analyses, if done (e.g., sensitivity or subgroup analyses, meta-regression [see Item 16]).
Summary of evidence (item 24): Summarize the main findings including the strength of evidence for each main outcome; consider their relevance to key groups (e.g., healthcare providers, users, and policy makers).
Limitations (item 25): Discuss limitations at study and outcome level (e.g., risk of bias), and at review-level (e.g., incomplete retrieval of identified research, reporting bias).
Conclusions (item 26): Provide a general interpretation of the results in the context of other evidence, and implications for future research.
Funding (item 27): Describe sources of funding for the systematic review and other support (e.g., supply of data); role of funders for the systematic review.