Functionality-Preserving Adversarial Machine Learning for Robust Classification in Cybersecurity and Intrusion Detection Domains: A Survey

Machine learning has become widely adopted as a strategy for dealing with a variety of cybersecurity issues, ranging from insider threat detection to intrusion and malware detection. However, by their very nature, machine learning systems can introduce vulnerabilities to a security defence whereby a learnt model is unaware of so-called adversarial examples that may intentionally result in mis-classification and therefore bypass a system. Adversarial machine learning has been a research topic for over a decade and is now an accepted but open problem. Much of the early research on adversarial examples has addressed issues related to computer vision, yet as machine learning continues to be adopted in other domains, it is likewise important to assess the potential vulnerabilities that may occur. A key part of transferring to new domains relates to functionality-preservation, such that any crafted attack can still execute the original intended functionality when inspected by a human and/or a machine. In this literature survey, our main objective is to address the domain of adversarial machine learning attacks and examine the robustness of machine learning models in the cybersecurity and intrusion detection domains. We identify the key trends in current work observed in the literature, and explore how these relate to the research challenges that remain open for future work. Inclusion criteria were: articles related to functionality-preservation in adversarial machine learning for cybersecurity or intrusion detection with insight into robust classification. Generally, we excluded works that are not yet peer-reviewed; however, we included some significant papers that make a clear contribution to the domain. There is a risk of subjective bias in the selection of non-peer-reviewed articles; however, this was mitigated by co-author review.
We selected the following databases with a sizeable computer science element to search and retrieve literature: IEEE Xplore, ACM Digital Library, ScienceDirect, Scopus, SpringerLink, and Google Scholar. The literature search was conducted up to January 2022. We have striven to ensure a comprehensive coverage of the domain to the best of our knowledge. We have performed systematic searches of the literature, noting our search terms and results, and following up on all materials that appear relevant and fit within the topic domains of this review. This research was funded by the Partnership PhD scheme at the University of the West of England in collaboration with Techmodal Ltd.


Introduction
Machine learning (ML) has become widely adopted as a strategy for dealing with a variety of cybersecurity issues. Cybersecurity domains particularly suited to ML include: intrusion detection and prevention [1], network traffic analysis [2], malware analysis [3,4], user behaviour analytics [5], insider threat detection [6], social engineering detection [7], spam detection [8], detection of malicious social media usage [9], health misinformation [10], climate misinformation [11], and more generally "Fake News" [12]. These are essentially classification problems. Papernot et al. [13] stated that most ML models can be described mathematically as functions h_θ(x) with an input x and parameterized by a vector θ ∈ Θ, although some models such as k-nearest neighbors are non-parametric. The output of the function h_θ(x) is the model's prediction of some property of interest for the given input x. The input x is usually represented as a vector of values called features. The space of functions H = {x ↦ h_θ(x) | θ ∈ Θ} defines the set of candidate hypotheses. In supervised learning, the parameters θ are adjusted to align model predictions h_θ(x) with the expected output y. This is achieved by minimizing a loss function that captures the dissimilarity of h_θ(x) and the corresponding y. Model performance must be validated against a separate test dataset to confirm that the model also generalizes well to unseen data. Classification ML systems find a function f that maps a feature vector x to its corresponding class y.
Dhar et al. [14] noted that few studies analyzed the complexity of models and associated trade-offs between accuracy and complexity. The complexity of an algorithm is often expressed in Big-O notation. They reviewed models, stating the number of features and activations have an effect on memory usage and computational complexity. Moreover, they argued that accuracy alone cannot justify the choice of model type, particularly in regard to DNN; however, we consider the risks involved for inaccurate predictions will vary across domains. In security domains, greater accuracy may be considered critical, possibly assuaging concerns regarding computational complexity of models.
Critically, ML systems are increasingly trusted within cyber-physical systems [15], such as power stations, factories, and oil and gas industries. In such complex physical environments, the potential damage that could be caused by a vulnerable system might even be life threatening [16]. Despite our reliance and trust in ML systems, the inherent nature of machine learning (learning to identify patterns) is in itself a potential attack vector for adversaries wishing to circumvent ML-based system detection processes. Adversarial examples are problematic for many ML algorithms and models including random forests (RF) and naive Bayes (NB) classifiers; however, we focus on artificial neural networks and particularly deep neural networks. Artificial neural networks (ANNs) are inspired by the network of neurons in the human brain. ANNs are useful because they can generalize from a finite set of examples, essentially mapping a large input space (infinite for continuous inputs) to a range of discrete outputs. Unfortunately, in common with other ML algorithms, neural networks are vulnerable to attacks using carefully crafted perturbations to inputs, including evasion and poisoning attacks. In recent work, carefully crafted inputs described as "adversarial examples" are considered possible in ANNs because of inherent properties that exist within neural networks [17], such as:
1. The semantic information of the model is held across the model and not localised to specific neurons;
2. Neural networks learn input-output mappings that are discontinuous (and discontiguous).
These properties mean that even extremely small perturbations of an input could cause a neural network to provide a misclassified output. Given that neural networks have these properties, we reasonably expect our biological neural networks to suffer misclassifications, and/or to have evolved mitigations. Human brains are more complex than current artificial neural networks, yet they suffer a type of misclassification (illusory perception), in the form of face pareidolia [18,19]. This strengthens the case that the properties of neural networks are a source of adversarial examples (AE). In cybersecurity-related domains it has been seen how adversaries exploit adversarial examples, using carefully-crafted noise to evade detection through misclassification [20,21].
In this way, an adversarial arms race exists between adversaries and defenders. The recent SolarWinds supply chain attack [22,23], identified in December 2020, indicates the reliance that organisations have on intrusion detection software, and the presence of advanced persistent threats (APTs) with the expertise and resources to attack organisations' network defenses. Adversarial machine learning is a critical area of research. If it is not addressed, there is increasing potential for novel attack strategies that seek to exploit the inherent weaknesses that exist within machine learning models. For this reason, this survey addresses the issues related to the robustness of machine learning models against adversarial attacks across the cybersecurity domain, where problems of functionality-preservation are recognized. While we use a case study of a network-based intrusion detection system (NIDS), these issues might be applicable in other areas where ML systems are used. We focus on papers detailing adversarial attacks and defenses. Attacks are further classified by attack type, attack objective, domain, model, knowledge required, and constraints. Defenses are further categorised by defense type, domain, and model. In the domain of network traffic analysis, adversaries need to evade detection methods. A suitable network firewall will reject adversarial traffic and malformed packets while accepting legitimate traffic. Therefore, successful adversarial examples must be crafted to comply with domain constraints such as those imposed by the transmission control protocol/internet protocol (TCP/IP) stack. Moreover, adversaries wish to preserve the functionality of their attacks. A successful attack must not sacrifice functionality in order to evade a classifier. The essence of a simple adversarial attack is that a malicious payload evades detection by masquerading as benign while still performing its intended function. We refer to this characteristic as functionality-preserving.
Compared to domains such as computer vision, where an image is modified only to fool human vision, adversarial attacks in other domains are significantly more challenging because they must fool a human and/or a system-based sensor while retaining their original function. The major contributions of this paper are:
• We conduct a survey of the literature to identify the trends and characteristics of published works on adversarial learning in relation to cybersecurity, addressing both attack vectors and defensive strategies;
• We address the issue of functionality-preservation in adversarial learning in contrast to domains such as computer vision, whereby a malformed input must suitably fool a system process as well as a human user such that the original functionality is maintained despite some modification;
• We summarise this relatively new research domain to address the future research challenges associated with adversarial machine learning across the cybersecurity domain.
The remainder of this paper is structured as follows: Section 2 provides an overview of other important surveys; Section 3 discusses background material; Section 4 details the literature survey; Section 5 details our results; Section 6 provides our discussion, and the conclusion summarises our findings and identifies research challenges.

Related Works
Corona et al. [24] provided a useful overview of intrusion detection systems. They predicted greater use of machine learning for intrusion detection and called for further investigation into adversarial machine learning. We now consider a number of related academic surveys that have been presented in the last five years with a focus on adversarial examples, security, and intrusion detection.

Secure and Trustworthy Systems
Machine learning systems are used in increasingly diverse areas including those of cyber-security. Trust in these systems is essential. Hankin and Barrère [25] note that there are many aspects to trustworthiness: reliability, trust, dependability, privacy, resilience, and safety. Adversaries ranging from solo hackers to state-sponsored APTs have an interest in attacking these systems. Successful attacks against machine learning models mean that systems are vulnerable and therefore potentially dangerous when deployed in cyber-security domains. Cho et al. [26] proposed a framework considering the security, trust, reliability and agility metrics of computer systems; however, they did not specifically consider adversarial machine learning, or robustness to adversarial examples.

Adversarial ML in General
Papernot et al. [13] noted that the security and privacy of ML is an active but nascent area of research. In this early work, they systematized their findings on security and privacy in machine learning. They noted that a science for understanding many of the vulnerabilities of ML and countermeasures is slowly emerging. They analysed ML systems using the classical confidentiality, integrity and availability (CIA) model. They analysed: training in adversarial settings; inferring adversarial settings; and robust, fair, accountable, and private ML models. Through their analysis, they identified a total of eight key takeaways that point towards two related notions of sensitivity. The sensitivity of learning models to their training data is essential to privacy-preserving ML, and similarly the sensitivity to inference data is essential to secure ML. Central to both notions of sensitivity is the generalization error (i.e., the gap between performance on training and test data). They focused on attacks and defenses for machine learning systems and hoped that understanding the sensitivity of modern ML algorithms to the data they analysed will foster a science of security and privacy in machine learning. They argued that the generalization error of models is key to secure and privacy-preserving ML.
Their focus was on the visual domain and they did not specifically discuss IDS or functionality-preserving adversarial attacks.
Apruzzese et al. [28] examined adversarial examples and considered realistic attacks, highlighting that most literature considers adversaries who have complete knowledge of the classifier and are free to interact with the target systems. They further emphasized that few works consider "realizable" perturbations that take account of domain and/or real-world constraints. There is perhaps a perception that the threat from adversarial attacks is low, based on the assumption that much prior knowledge of the system is required. This approach has some merit; however, this could be an over-confident position to take. Their idea was that, realistically, the adversary has less knowledge of the system. This conflicts with Shannon's maxim [29] and Kerckhoffs's second cryptographic principle [30], which states that the fewer secrets the system contains, the higher its safety. The pessimistic "complete knowledge" position is often used in cryptographic studies, where it is considered safe precisely because it is a bleak expectation. This expectation is also realistic, since we must expect well-resourced adversaries to eventually discover or acquire all details of the system. Many adversarial example papers assume complete knowledge; however, this is unlikely to always be the case, perhaps leading some to believe models are more secure against adversarial examples than they are. In fact, the transferability property of adversarial examples means that complete knowledge is not required for successful attacks, and black-box attacks are possible with no prior knowledge of machine learning models. An adversary may learn only through interacting with the model. We must therefore account for the level of knowledge required by an adversary, including white-box, black-box, and gray-box knowledge paradigms.

Intrusion Detection
Wu et al. [31] considered several types of deep learning systems for network attack detection, including supervised and unsupervised models, and they compared the efficiency and effectiveness of different attack detection methods using two intrusion detection datasets: the "KDD Cup 99" dataset and an improved version known as NSL-KDD [32,33]. These two datasets have been used widely in the past by academic researchers; however, they do not fairly represent modern network traffic analysis problems due to concept-drift. Networks have increasing numbers of connected devices, increasing communications per second, and new applications using the network. The use of computer networks and the Internet has changed substantially in twenty years. The continued introduction of IPv6, network address translation, Wi-Fi, mobile 5G networks, and cloud providers has changed network infrastructure [34]. Furthermore, the Internet is now increasingly used for financial services. Akamai [35] reported that financial services now see millions or tens of millions of attacks each day. These attacks were less common twenty years ago. Furthermore, social media now constitutes much internet traffic and most social media platforms were founded after the KDD Cup 99 and NSL-KDD datasets were introduced. For example, Facebook, YouTube, and Twitter were founded in 2004, 2005, and 2006, respectively. This limits the validity of some research using outdated datasets. Therefore, we suggest research should use modern datasets that represent modern network traffic.
Kok et al. [36] analysed intrusion detection systems (IDS) that use a machine learning approach. They specifically considered the datasets used, the ML algorithms, and the evaluation metrics. They warned that some researchers are still using datasets first introduced decades ago (e.g., KDD Cup 99, NSL-KDD), and that this trend could result in no or insufficient progress on IDS. This would ultimately lead to the untenable position of obsolete IDS while intrusion attacks continue to evolve along with user behaviour and the introduction of new technologies. Their paper did not consider adversarial examples or robustness of ML models. Alatwi and Morisset [37] tabulated a list of network intrusion datasets in the literature that we extend in Table 1.
Martins et al. [51] considered adversarial machine learning for intrusion detection and malware scenarios, noting that IDS are typically signature-based, and that machine learning approaches are being widely employed for intrusion detection. They described five "tribes" of ML algorithms before detailing some fundamentals of adversarial machine learning, including commonly used distance metrics: L∞, L0, and L2. They subsequently described common white-box methods to generate adversarial examples, including: the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm (L-BFGS), the fast gradient sign method (FGSM), the Jacobian-based saliency map attack (JSMA), DeepFool, and the Carlini and Wagner attacks (C&W). They also considered black-box methods using generative adversarial networks (GANs). Traditional GANs sometimes suffer problems of mode collapse. Wasserstein generative adversarial networks (WGANs) solve some of these problems. They introduced the zeroth-order optimization attack (ZOO) as a black-box method. ZOO estimates the gradient and optimises an attack by iteratively adding perturbations to features.
They noted that most attacks have been initially tested in the image domain, but can be applied to other types of data, which poses a security threat. Furthermore, they considered that there is a trade-off when choosing an adversarial attack. For example, JSMA is more computationally intensive than FGSM but modifies fewer features. They considered JSMA to be the most realistic attack because it perturbs fewer features. When considering defenses, they tabulated advantages and disadvantages of common defenses. For example, feature squeezing is effective in image scenarios, but unsuitable for other applications because compression methods would result in data loss for tabular data. They noted that GANs are a very powerful technique that can result in effective adversarial attacks where the samples follow a similar distribution to the original data but cause misclassification.

Cyber-Physical Systems
Cyber-physical systems (CPSs) rely on computational systems to create actuation of physical devices. The range of devices is increasing from factory operations to power stations, autonomous vehicles, and healthcare operations. Shafique et al. [52] considered such smart cyber-physical systems. They discussed reliability and security vulnerabilities of machine learning systems, including hardware trojans, side channel attacks, and adversarial machine learning. This is important, because system aging and harsh operating environments mean CPSs are vulnerable to numerous security and reliability concerns. Advanced persistent threats could compromise the training or deployment of CPSs through stealthy supply-chain attacks. A single vulnerability is sufficient for an adversary to cause a misclassification that could lead to drastic effects in a CPS (e.g., an incorrect steering decision of an autonomous vehicle could cause a collision). We consider that vulnerabilities in ML could lead to a range of unwanted effects in CPSs, including those that could lead to life-threatening consequences [16]. The Stuxnet worm is an example of malware with dire consequences.

Contributions of This Survey
Our main objectives are:
• Collect and collate current knowledge regarding robustness and functionality-preserving attacks in cybersecurity domains;
• Formulate key takeaways based on our presentation of the information, aiming to assist understanding of the field.
This survey aims to complement existing work while addressing clear differences, by also studying the robustness of adversarial examples, specifically functionality-preserving use cases. Most previous work aimed to improve the accuracy of models or examine the effect of adversarial examples. Instead, we consider the robustness of models to adversarial examples.
Machine learning systems are already widely adopted in cybersecurity. Indeed, with increasing network traffic, automated network monitoring using ML is becoming essential. Modern computer networks carry private personal and corporate data including financial transactions. These data are an attractive lure to cyber-criminals. Adversaries may wish to steal or disturb data. Malware, spyware, and ransomware threats are endemic on many computer networks. IDS help keep networks safe; however, an adversarial arms race exists, and it is likely that adversaries, including advanced persistent threats, are developing new ways to evade network defenses. Some research has evaded intrusion detection classifiers using adversarial examples.
We identify that while adversarial examples in the visual domain are well understood, less work has focused on how adversarial examples can be applied to network traffic analysis and other non-visual domains. The same imbalance applies to model types: convolutional neural networks (CNNs), used for image and object recognition, are well researched, whereas other model types used for intrusion detection, e.g., recurrent neural networks (RNNs), receive less attention. The generation of adversarial examples to fool IDS is more complicated than in visual domains because the features include discrete and non-continuous values [53]. Complicating the defense against adversarial examples is the overconfident assumption that successful adversarial examples require "complete knowledge" of the model and parameters. On the contrary, black-box attacks are possible with no or limited knowledge of the model. Most defenses so far proposed consider the visual domain, and most are ineffective against strong and black-box attacks. This survey addresses the problem of adversarial machine learning across cyber-security domains. Further research is required to head off future mature attack methods that could facilitate more complex and destructive attacks.

Background
Here we provide further background on some key concepts that are related to adversarial learning, to support the reader of this survey. We cover the topics of model training, robustness, common adversarial example algorithms, adversary capabilities, goals, and attack methods.

Model Training
It is important to consider the dataset on which models are trained, because the trustworthiness and quality of a model is impacted by the distribution, quality, quantity, and complexity of dataset training samples [54]. Biased models are more susceptible to adversarial examples. Therefore, models must be trained on unbiased training data; however, Johnson et al. considered that the absolute number of training samples may be more important than the ratio of class imbalance [55]. For example, a small percentage of a large number of samples is sufficient to train a model regardless of high class imbalance (e.g., 1% malicious samples in 1 million network flows yields 10,000 samples). Unfortunately, cybersecurity datasets are often prone to bias, in part because of limited samples of some malicious traffic (e.g., zero-day attacks) and large amounts of benign traffic. Sheatsley et al. [56]  Algorithm-level techniques tackling dataset bias commonly employ cost-sensitive learning where a class penalty or weight is considered or decision thresholds are shifted to reduce bias [55].
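As a minimal sketch of the cost-sensitive idea mentioned above, the following Python example derives per-class weights from class frequencies and applies them inside a weighted cross-entropy loss. The inverse-frequency weighting rule and the function names are illustrative assumptions, not a method taken from the surveyed papers; the class proportions reuse the 1% example from the text.

```python
import numpy as np

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (n_classes * count)."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

def weighted_log_loss(y_true, p_malicious, class_weights):
    """Binary cross-entropy in which each sample is scaled by its class weight."""
    p = np.clip(p_malicious, 1e-12, 1 - 1e-12)
    per_sample = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    w = np.where(y_true == 1, class_weights[1], class_weights[0])
    return float(np.sum(w * per_sample) / np.sum(w))

# 1% malicious flows (label 1) among 1,000,000 flows, as in the example above.
labels = np.zeros(1_000_000, dtype=int)
labels[:10_000] = 1
weights = balanced_class_weights(labels)

# A classifier that always outputs p = 0.01 is penalised far more heavily
# on the rare malicious class once the weights are applied.
loss_value = weighted_log_loss(labels, np.full(labels.shape, 0.01), weights)
```

With these weights, an error on a malicious flow costs roughly one hundred times more than an error on a benign flow, which is one way a class penalty can counteract imbalance without discarding benign samples.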

Loss Functions
When training a model, the goal is to minimize the loss function through use of an optimizer that adjusts the weights at each training step. Common optimizers include stochastic gradient descent (SGD), adaptive moment estimation (Adam), and root mean square propagation (RMSProp). Commonly, a regularizer is employed during training to ensure the model generalizes well to new data. A dropout layer is often employed as a regularizer.
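To make the roles of loss, optimizer, and regularizer concrete, the sketch below trains a logistic regression classifier with mini-batch SGD. It is an illustrative assumption throughout: the synthetic data, learning rate, and batch size are arbitrary, and an L2 penalty stands in for the regularizer (dropout applies to neural network layers and is not shown).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, X, y, l2):
    """Mean cross-entropy plus an L2 penalty on the weights."""
    p = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)
    data_term = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return float(data_term + 0.5 * l2 * np.sum(theta ** 2))

def sgd_step(theta, X_batch, y_batch, lr, l2):
    """One optimizer step: move against the gradient of the regularized loss."""
    residual = sigmoid(X_batch @ theta) - y_batch
    grad = X_batch.T @ residual / len(y_batch) + l2 * theta
    return theta - lr * grad

# Synthetic two-class data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

theta = np.zeros(3)
initial_loss = loss(theta, X, y, l2=0.01)
for epoch in range(50):
    order = rng.permutation(len(y))
    for start in range(0, len(y), 20):
        batch = order[start:start + 20]
        theta = sgd_step(theta, X[batch], y[batch], lr=0.1, l2=0.01)
final_loss = loss(theta, X, y, l2=0.01)
```

The same gradient that the defender uses here to update the weights is what gradient-based attacks later reuse, taken with respect to the input instead of the parameters.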

Cross Validation
Cross validation [57] is a widely used data resampling method to assess the generalizability of a model and to prevent over-fitting. Cross validation often involves stratified random sampling, meaning the sampling method retains the class proportions in the learning set. In leave-one-out cross validation, each sample is used in turn as the validation set. The test error approximates the true prediction error; however, it has high variance. Moreover, its computational cost can be high for large datasets. k-fold cross validation aims to optimise the bias/variance trade-off. In k-fold cross validation, the dataset is randomly split into k equal-sized partitions. A single partition is retained for validation, and the remaining k − 1 partitions are used for training. The cross validation steps are repeated until each partition has been used once for validation, as shown in Figure 1. The results are averaged across all iterations to produce an estimate of the performance of the model (Equation (4)). Refaeilzadeh et al. highlighted risks of elevated Type I errors (false positives).
With larger values of k, bias is reduced because each model is trained on more of the dataset; however, the variance of the estimate tends to increase, as in the leave-one-out case. We posit that resampling techniques could be used to improve robustness against adversarial examples.
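The stratified k-fold procedure described above can be sketched as follows. The helper names and the toy majority-class "model" are assumptions made for illustration; any `fit` and `score` callables with the same shape could be substituted.

```python
import numpy as np

def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, val_idx) pairs that preserve class proportions."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        for i, part in enumerate(np.array_split(idx, k)):
            folds[i].extend(part.tolist())
    for i in range(k):
        val_idx = np.array(sorted(folds[i]))
        train_idx = np.array(sorted(j for f in folds[:i] + folds[i + 1:] for j in f))
        yield train_idx, val_idx

def cross_validate(fit, score, X, y, k=5):
    """Train on k - 1 partitions, validate on the held-out one, and average."""
    scores = []
    for train_idx, val_idx in stratified_k_fold(y, k):
        model = fit(X[train_idx], y[train_idx])
        scores.append(score(model, X[val_idx], y[val_idx]))
    return float(np.mean(scores)), scores

# Toy imbalanced dataset: 90 benign (0) and 10 malicious (1) samples.
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100, dtype=float).reshape(-1, 1)

# Hypothetical stand-in model: always predict the majority training class.
fit = lambda X_tr, y_tr: int(np.bincount(y_tr).argmax())
score = lambda model, X_va, y_va: float(np.mean(y_va == model))

mean_accuracy, fold_scores = cross_validate(fit, score, X, y, k=5)
```

Because the split is stratified, every validation fold contains the same 9:1 class ratio, so the majority-class baseline scores identically on each fold; this also illustrates why accuracy alone is a weak metric on imbalanced security data.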

Bootstrapping
Bootstrapping is resampling with replacement, and is often used to statistically quantify the performance of a model, to determine if a model is statistically significantly better than other models.
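A minimal sketch of how bootstrapping might be used to compare two models is given below. It resamples the test set with replacement and reports a percentile confidence interval for the difference in accuracy. The per-sample correctness vectors are synthetic assumptions, and the percentile interval is only one of several bootstrap interval constructions.

```python
import numpy as np

def bootstrap_accuracy_difference(correct_a, correct_b, n_resamples=2000, seed=0):
    """95% percentile interval for accuracy(A) - accuracy(B) on paired test samples."""
    rng = np.random.default_rng(seed)
    n = len(correct_a)
    diffs = np.empty(n_resamples)
    for i in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # resample test indices with replacement
        diffs[i] = correct_a[idx].mean() - correct_b[idx].mean()
    low, high = np.percentile(diffs, [2.5, 97.5])
    return float(low), float(high)

# Synthetic example: model A is correct on 95% of 500 test samples, model B on 80%.
correct_a = np.ones(500)
correct_a[:25] = 0
correct_b = np.ones(500)
correct_b[-100:] = 0

low, high = bootstrap_accuracy_difference(correct_a, correct_b)
```

If the interval excludes zero, as it does for this synthetic case, the difference in accuracy is unlikely to be an artefact of the particular test sample.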

Robustness
Robustness can be defined as the performance of well-trained models facing adversarial examples [58]. Essentially, robustness considers how sensitive a model's output is to a change in the input. The robustness of a model is related to the generalization error of the model. There is a recognised trade-off between accuracy and robustness in machine learning. That is, highly accurate models are less robust to adversarial examples. Machine learning models in adversarial domains must be both highly accurate and robust. Therefore, improving the robustness of machine learning models enables safer deployment of ML systems across a wider range of domains.
Other potentially useful metrics to evaluate robustness include: the Lipschitzian property, which monitors the changes in the output with respect to small changes to inputs; and CLEVER (cross-Lipschitz extreme value for network robustness), which is an extreme value theory (EVT)-based robustness score for large-scale deep neural networks (DNNs). The proposed CLEVER score is attack-agnostic and computationally feasible for large neural networks, improving on the Lipschitzian property metric [59]. Table 2 details some advantages and disadvantages of some robustness metrics. CLEVER is less suited to black-box attacks and where gradient masking occurs [60]; however, extensions to CLEVER help mitigate these scenarios [61].
Table 2 (surviving fragment): [62] Empirical robustness: suitable for very deep neural networks and large datasets.
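To illustrate the intuition behind the Lipschitzian property, the sketch below estimates a lower bound on a model's local Lipschitz constant by random sampling around an input. This is a deliberately simple stand-in and is not the CLEVER algorithm, which fits an extreme value distribution to sampled gradient norms; the function name, sampling scheme, and linear test model are all assumptions.

```python
import numpy as np

def sampled_local_lipschitz(f, x, radius, n_samples=1000, seed=0):
    """Largest observed |f(x + d) - f(x)| / ||d|| for random d with ||d|| <= radius.

    The result is only a lower bound on the true local Lipschitz constant,
    because random sampling can miss the worst-case direction.
    """
    rng = np.random.default_rng(seed)
    fx = f(x)
    best = 0.0
    for _ in range(n_samples):
        direction = rng.normal(size=x.shape)
        direction /= np.linalg.norm(direction)
        delta = direction * radius * rng.uniform(1e-3, 1.0)
        ratio = abs(f(x + delta) - fx) / np.linalg.norm(delta)
        best = max(best, ratio)
    return best

# For a linear score f(x) = w . x the true L2 Lipschitz constant is ||w|| = 5.
w = np.array([3.0, 4.0])
f = lambda x: float(w @ x)
estimate = sampled_local_lipschitz(f, np.zeros(2), radius=0.1)
```

A small constant means that small input changes can only move the output a little, so a larger perturbation is needed to cross a decision boundary; a sampled estimate like this one can understate the constant and should not be read as a robustness guarantee.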

Common Adversarial Example Algorithms
There are numerous algorithms to produce adversarial examples. Szegedy et al. [17] used a box-constrained limited-memory BFGS (L-BFGS) optimiser. Other methods include FGSM [63] and its iterative variants, including the basic iterative method (BIM) and projected gradient descent (PGD). JSMA optimises for the minimal number of altered features (L0). The DeepFool algorithm [62] optimises for the Euclidean distance (L2). Carlini and Wagner [64] proposed powerful C&W attacks optimizing for the L0, L2, and L∞ distance metrics. Furthermore, Papernot et al. [65] developed a software library for the easy generation of adversarial examples. There are now a number of similar libraries that can be used to generate adversarial examples, as shown in Table 3.
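As a minimal sketch of the simplest of these algorithms, the following example applies a single FGSM step to a logistic regression classifier and reports the L0, L2, and L∞ sizes of the resulting perturbation. The weights, input, and ε are arbitrary values assumed for illustration; for a deep network the input gradient would come from automatic differentiation instead of the closed form used here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, epsilon):
    """One FGSM step: x' = x + epsilon * sign(gradient of the loss w.r.t. x)."""
    # For logistic regression with cross-entropy loss, dL/dx = (p - y) * w.
    grad_x = (sigmoid(w @ x + b) - y) * w
    return x + epsilon * np.sign(grad_x)

def perturbation_norms(x, x_adv):
    """Distance metrics commonly used to measure adversarial perturbations."""
    d = x_adv - x
    return {
        "L0": int(np.count_nonzero(d)),      # number of modified features
        "L2": float(np.linalg.norm(d)),      # Euclidean size of the change
        "Linf": float(np.max(np.abs(d))),    # largest change to any one feature
    }

w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.2, 0.1, 0.2])   # scored above 0.5, i.e. classified as class 1
y = 1.0                         # true label

x_adv = fgsm(x, y, w, b, epsilon=0.2)
norms = perturbation_norms(x, x_adv)
score_before = sigmoid(w @ x + b)
score_after = sigmoid(w @ x_adv + b)
```

The step changes every feature by exactly ε, so L0 equals the number of features while L∞ stays at ε; this is the property that distinguishes FGSM from sparse attacks such as JSMA.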
Moreover, algorithms such as FGSM that modify all features are unlikely to preserve functionality. Algorithms such as JSMA that modify a small subset of features are not guaranteed to preserve functionality; however, with fewer modified features, the likelihood improves. Checking for and keeping only examples that preserve functionality is possible, although it is a time-consuming and inelegant solution. A potentially better solution could ensure only functionality-preserving adversarial examples are generated.
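One possible way to restrict generation to candidates that respect simple domain constraints is to mask immutable features and project the result back into valid ranges, as sketched below. The feature layout (duration, packet count, protocol number), the bounds, and the function name are hypothetical, and satisfying these box and integrality constraints is necessary but not sufficient for true functionality preservation, which would also require replaying or validating the modified traffic.

```python
import numpy as np

def masked_signed_step(x, grad_x, epsilon, modifiable, lower, upper, integer_features):
    """Gradient-sign step that only touches modifiable features and stays in range."""
    step = epsilon * np.sign(grad_x) * modifiable   # immutable features receive zero change
    x_adv = np.clip(x + step, lower, upper)         # keep every feature within valid bounds
    x_adv[integer_features] = np.round(x_adv[integer_features])  # counts stay whole numbers
    return x_adv

# Hypothetical flow features: [duration in seconds, packet count, IP protocol number].
x = np.array([1.5, 10.0, 6.0])
grad_x = np.array([0.3, -0.7, 0.9])          # assumed loss gradient w.r.t. the input
modifiable = np.array([1.0, 1.0, 0.0])       # the protocol number must not change
lower = np.array([0.0, 1.0, 0.0])
upper = np.array([60.0, 1000.0, 255.0])
integer_features = np.array([False, True, True])

x_adv = masked_signed_step(x, grad_x, 2.0, modifiable, lower, upper, integer_features)
```

The mask encodes which features an adversary can change without breaking the attack, so the choice of mask is itself a domain-specific modelling decision.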
When considering the robustness of machine learning models, we first must consider the threat model. We must consider how much the adversary knows about the classifier, ranging from no knowledge to perfect knowledge. Adversaries may have a number of different goals:

1. Accuracy degradation (where the adversary wants to sabotage the overall accuracy of the classifier);
2. Targeted misclassification (where the adversary wants a particular instance to be misclassified as another given class);
3. Untargeted misclassification (where the adversary wants a particular instance to be misclassified as any other class).
We now consider the attack surface. In IDS, the attack surface can be considered as an end-to-end pipeline, with varying vulnerabilities and potential for compromise at each stage of the pipeline.
In one basic pipeline, as shown in Figure 2, the raw network traffic on network interfaces is collected as packet capture files (PCAPs), which are then processed into network flows. Different applications can be used to process PCAPs into network flows. CICFlowMeter [69] is a network traffic flow generator and analyser that has been used in cyber-security datasets [70,71] and produces bidirectional flows with over 80 statistical network traffic features. The generated flows are unlabelled and so must be labelled manually with the traffic type, typically benign/malicious, although multiple classes could be labelled given sufficient information, including attack type, source and destination IP dyad, duration, and start time. Finally, the labelled flows are used to train the model. Repetitive training cycles could enable detection of new attacks; however, the cyclic nature of the training means that an adversary could attack any iteration of training. Furthermore, an adversary could choose to attack any point in the pipeline.
The training data used to train the model generally consist of feature vectors and expected outputs, although some researchers are considering unsupervised learning models. The collection and validation of these data offer an attack surface. Separately, the inference phase also offers an attack surface.
It is interesting to note that the size of the feature set a machine learning model uses can be exploited as an attack surface. A fundamental issue is that each feature processed by a model may be modified by an adversary. Moreover, Sarker et al. [72] noted that the computational complexity of a model can be reduced by reducing the feature dimensions. Large feature sets include more features and hence provide more opportunities to an adversary for manipulation. Almomani et al. [73] indicated that accuracy can be maintained with fewer features, and McCarthy et al. [74] indicated that more features tend to reduce the necessary size of perturbations. Therefore, larger feature sets are more readily perturbed than smaller feature sets, which have fewer modifiable features and hence require larger perturbations.
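The relationship between feature count and perturbation size can be illustrated with a deliberately simple case. The sketch below assumes a linear detector (score = w·x + b), for which the smallest L∞ perturbation that reaches the decision boundary is |score| / ‖w‖₁. The model, the equal weights, and the fixed margin are hypothetical choices made only for illustration; they are not taken from the cited works, and real detectors are generally non-linear.

```python
def min_linf_perturbation(weights, bias, x):
    """Smallest L-infinity perturbation that moves x onto the decision
    boundary of the linear score  weights . x + bias."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return abs(score) / sum(abs(w) for w in weights)


def toy_model(n_features, weight=0.5, margin=2.0):
    """Hypothetical detector: equal weights, input at the origin, fixed margin."""
    return [weight] * n_features, margin, [0.0] * n_features


# Same margin from the boundary, different numbers of modifiable features.
eps_small = min_linf_perturbation(*toy_model(4))
eps_large = min_linf_perturbation(*toy_model(40))
print(eps_small, eps_large)
```

Under these assumptions, the 4-feature model needs a per-feature change of 1.0 to reach the boundary, whereas the 40-feature model needs only 0.1, because the adversary can spread the change across more features.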

Threat Model-Adversary Capabilities
Adversaries are constrained by their skills, knowledge, tools, and access to the system under attack. An insider threat might have access to the classification model and other associated knowledge, whereas an external threat might only be able to examine data packets. While the attack surface may be the same for both adversaries, the insider threat is potentially a much stronger adversary because they have greater knowledge and access. Adversary capabilities mean that attacks can be split into three scenarios: white-box, black-box, and gray-box.
In white-box attacks, an adversary has access to all machine learning model parameters. In black-box attacks, the adversary has no access to the machine learning model's parameters. Adversaries in black-box scenarios may therefore use a different model, or no model at all, to generate adversarial examples. This strategy depends on successfully transferring adversarial examples to the target model. Gray-box attacks consider scenarios where an adversary has some, but incomplete, knowledge of the system. White-box and black-box scenarios are the most commonly considered.

Threat Model-Adversary Goals
Adversaries aim to subvert a model by attacking its confidentiality, integrity, or availability. Confidentiality attacks attempt to expose the model or the data encapsulated within. Integrity attacks occur when an adversary attempts to control the output of the model, for example, to misclassify some adversarial traffic and therefore allow it to pass a detection process. Availability attacks could misclassify all traffic types, or deteriorate a model's confidence, consistency, performance, and access. In this way, an integrity attack resembles a subset of availability attacks, since an incorrect response is similar in nature to a correct response being unavailable; however, the complete unavailability of a response would likely be more easily noticed than decreases in confidence, consistency, or performance. The goals of an adversary may be different but are often achieved with similar methods. Figure 3 shows some common categories of adversarial machine learning attack methods, which we explore in this section.
In poisoning attacks, an adversary with access to the training data or procedure manipulates it, implanting an attack during the training phase, when the model is trained on adversarial training data. This is achieved with carefully crafted noise or sometimes random noise. Unused or dormant neurons in a trained deep neural network (DNN) signify that a model can learn more; essentially, an increased number of neurons allows for a greater set of distinct decision boundaries forming distinct classifications of data. The under-utilised degrees of freedom in the learned model could potentially be used for unexpected classification of inputs. That is, the model could learn to provide selected outputs based on adversarial inputs. These neurons have very small weights and biases. However, the existence of such neurons allows successful poisoning attacks through training the model to behave differently for poisoned data.
This suggests that distillation [75] could be effective at preventing poisoning attacks, because smaller models have lower knowledge capacity and likely fewer unused neurons. Distillation reduces the number of neurons that contribute to a model by transferring knowledge from a large model to a smaller model. Despite initial analysis indicating reduction in the success of adversarial attacks, Carlini [64] experimented with three powerful adversarial attacks and a high confidence adversarial example in a transferability attack, and found that distillation does not eliminate adversarial examples and provides little security benefit over undistilled networks in relation to powerful attacks. Unfortunately, they did not specifically consider poisoning attacks. Additional experiments could determine whether distillation is an effective defense against poisoning attacks.

Evasion
In evasion attacks, the adversary is often assumed to have no access to the training data. Instead, adversaries exploit their knowledge of the model and its parameters, aiming to minimise the cost function of adversarial noise, which, when combined with the input, causes changes to the model output. Untargeted attacks lead to a random incorrect output, targeted attacks lead to a specific incorrect output, and an attack may disrupt the model by changing the confidence of the output class. In the visual domain, the added noise is often imperceptible to humans. In non-visual domains such as intrusion detection, this problem may be much more challenging, since even small modifications may corrupt network packets and may cause firewalls to drop these malformed packets. This highlights the need for functionality preservation in adversarial learning as a clear distinction from vision-based attacks that exploit the human visual system.

Methodology
In this section, we describe our approach to conducting an effective and meaningful survey of the literature.

Eligibility Criteria
We determined the search terms leading to the most relevant articles. The chosen search terms are detailed in Table 4.
Table 4. Topics and associated search terms used in this survey.
Topic | Search Query
Cyber security/intrusion detection | ("cyber security" OR "intrusion detection" OR "IDS")
Adversarial machine learning attacks and defences | ("adversarial machine learning" OR "machine learning" OR "adversarial example") AND ("attack" OR "defence")
Robustness/Functionality Preservation | (("robustness" OR "generalization error" OR "accuracy" OR "f1score" OR "f-score" OR "TPR" OR "FPR") OR (("functionality" OR "payload") AND "preservation"))
We expect these to result in good coverage of the relevant literature. We searched each database using the identified search terms. The literature search was conducted up to September 2021. Generally, we have chosen to exclude works that have not yet been peer-reviewed, such as those appearing on arXiv, unless deemed by the authors as a significant paper that makes a clear contribution to the subject domain. We collated the searches and removed any duplicates. Each paper was screened by reading the title and abstract to determine its relevance. Inclusion criteria were: the article is related to functionality preservation in adversarial machine learning for cybersecurity or intrusion detection with insight into robust classification.
From this large list, we specifically focused on adversarial machine learning attacks and defenses, narrowing the literature down to relevant papers. Our selection process was roughly based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework [76].

Information Sources
We selected the following databases with a sizeable computer science element to search and retrieve literature: IEEE Xplore, ACM Digital Library, ScienceDirect, Scopus, SpringerLink, and Google Scholar.

Results
In this section, we describe the results of our search and selection process. We further describe our classification scheme, and tabulate and discuss our findings, including adversarial attacks in traditional and cybersecurity domains of malware, IDS, and CPS. We included 146 relevant papers in this survey.

Classification Scheme
We classify attacks by attack type, attack objective (targeted/untargeted), domain, model, knowledge required, and whether any constraints are placed on the adversarial examples. Defenses are classified by type, domain, and model. We summarise the attacks in Table 5.

Adversarial Example Attacks
The attacks we focus on exploit adversarial examples that cause differences in the output of neural networks. Adversarial examples were discovered by Szegedy et al. [17]. Adversarial examples are possible in ANNs as a consequence of the properties of neural networks; however, they are also possible for other ML models. This complicates mitigation efforts, and adversarial examples can be found even for networks explicitly trained on adversarial examples [102]. Furthermore, adversarial examples can be algorithmically generated, e.g., using gradient descent. Moreover, adversarial examples are often transferable; that is, an adversarial example presented to a second machine learning model trained on a subset of the original dataset may also cause the second network to misclassify the adversarial example.

Adversarial Examples-Similarity Metrics
In the visual domain, distance metrics are widely used to judge how similar two inputs are, and therefore how easily the differences might be perceived. The following metrics are commonly used to describe the difference between normal and adversarial inputs:
• Number of altered pixels (L0);
• Euclidean distance (L2, root-mean-square);
• Maximum change to any of the co-ordinates (L∞).
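The three metrics listed above can be computed directly from an original input and its perturbed counterpart. The following is a minimal sketch using plain Python lists; the function names and the example vectors are illustrative assumptions, not taken from any cited work.

```python
import math


def l0_distance(x, x_adv, tol=0.0):
    """Number of coordinates (pixels or features) that were altered."""
    return sum(1 for a, b in zip(x, x_adv) if abs(a - b) > tol)


def l2_distance(x, x_adv):
    """Euclidean distance between the original and perturbed inputs."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_adv)))


def linf_distance(x, x_adv):
    """Largest change made to any single coordinate."""
    return max(abs(a - b) for a, b in zip(x, x_adv))


original = [10.0, 200.0, 35.0, 90.0]      # hypothetical pixel/feature values
adversarial = [10.0, 203.0, 31.0, 90.0]   # two coordinates perturbed

print(l0_distance(original, adversarial))    # coordinates changed
print(l2_distance(original, adversarial))    # overall size of the change
print(linf_distance(original, adversarial))  # largest single change
```

For this example, two coordinates change (L0 = 2), the Euclidean distance is 5.0, and the largest single change is 4.0. None of these values says whether the perturbed input still performs its intended function, which is the point made in the following paragraph.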
Human perception may not be the best criterion to judge a successful adversarial input. A successful attack in a vision ML task may be to fool a human. Success in an ML-based system is to fool some other detection routine, while conforming to the expected inputs of the system. For example, a malicious packet must remain malicious after any perturbation has been applied. If a perturbed packet is very close to the original packet, this would only be considered successful if it also retained its malicious properties, and hence its intended function.
An early gradient descent approach was proposed by Szegedy et al. [17] using box-constrained limited-memory BFGS (L-BFGS). Given an original image, this method finds a different image that is classified differently, whilst remaining similar to the original image. Gradient descent is used by many different algorithms; however, algorithms have been designed to be optimized for different distance metrics. There are numerous gradient descent algorithms that produce adversarial examples; they can differ in their optimization and computational complexity. We note the relative computational complexity of common adversarial example algorithms in Table 6 (adapted from [27]). High success rates correlate with high computational complexity. We expect this correlation to be more pronounced for functionality-preserving attacks.
The fast gradient sign method (FGSM) [63] was improved by Kurakin et al. [103], who refined the fast gradient sign by taking multiple smaller steps. This iterative granular approach improves on FGSM by limiting the difference between the original and adversarial inputs. This often results in adversarial inputs with a predictably smaller L∞ metric. However, FGSM modifies all parameters. This is problematic for features that must remain unchanged or for discrete features such as application programming interface (API) calls. The Jacobian-based saliency map attack (JSMA) differs from FGSM in that it optimises to minimize the total number of modified features (L0 metric). In this greedy algorithm, individual features are chosen with the aim of step-wise increasing the target classification in each iteration. The gradient is used to generate a saliency map, modelling each feature's impact towards the resulting classification. Large values significantly increase the likelihood of classification as the target class. Thus, the most important feature is modified at each stage. This process continues until the input is successfully classified as the target class, or a threshold number of modified pixels is reached. This algorithm results in adversarial inputs with fewer modified features.
The DeepFool algorithm [62] similarly uses gradient descent but optimises for the root-mean-square, also known as Euclidean distance (L2). This technique simplifies the task of shifting an input over a decision boundary by assuming a linear hyper-plane separates each class. The optimal solution is derived through analysis and subsequently an adversarial example is constructed; however, neural network decision boundaries are not truly linear. Therefore, subsequent repetitions may be required until a true adversarial image is found. The optimizations for different distance metrics are types of constraint: maximum change to any feature (L∞); minimal root-mean-square (L2); minimal number of altered features (L0).
Table 5 shows that few researchers employed the transferability of adversarial examples. Other common black-box techniques include GANs and genetic algorithms (GAs). Sharif et al. [104] proposed a method of attacking DNNs with a general framework to train an attack generator or generative adversarial network (GAN). GANs can be trained to produce new, robust, and inconspicuous adversarial examples. Attacks like that of Biggio et al. [77] are more suitable for the security domain, where assessing the security of algorithms and systems under worst-case attacks is needed [105,106].
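To make the single-step and iterative gradient-sign approaches described above concrete, the sketch below applies them to a toy logistic-regression detector whose input gradient can be written in closed form. Everything here is an assumption made for illustration: the weights, the input, the `mutable` mask (used to leave selected features untouched, addressing the concern that FGSM otherwise modifies every feature), and the early stop once the prediction flips. It is not the implementation used in [63] or [103].

```python
import math


def predict_proba(w, b, x):
    """Probability of the 'malicious' class under a toy logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def loss_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps, mutable):
    """One signed-gradient step of size eps, applied to mutable features only."""
    g = loss_gradient(w, b, x, y)
    return [xi + eps * sign(gi) if m else xi for xi, gi, m in zip(x, g, mutable)]


def iterative_fgsm(w, b, x, y, eps, step, n_steps, mutable):
    """Repeated small steps, clipped to within eps of x (L-infinity ball).
    Stops early once the model no longer predicts 'malicious' (an assumption)."""
    x_adv = list(x)
    for _ in range(n_steps):
        if predict_proba(w, b, x_adv) < 0.5:
            break
        x_adv = fgsm(w, b, x_adv, y, step, mutable)
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


w, b = [2.0, -1.0, 0.5], -0.4          # hypothetical detector parameters
x, y = [1.0, 0.2, 0.4], 1              # a sample labelled malicious
mutable = [True, True, False]          # third feature must not change

one_step = fgsm(w, b, x, y, eps=1.0, mutable=mutable)
multi_step = iterative_fgsm(w, b, x, y, eps=1.0, step=0.1, n_steps=20, mutable=mutable)

linf = lambda a, c: max(abs(ai - ci) for ai, ci in zip(a, c))
print(linf(x, one_step), linf(x, multi_step))
```

In this toy setting both variants push the sample below the 0.5 decision threshold, but the iterative variant stops after a maximum per-feature change of roughly 0.6 rather than 1.0, and the masked feature is identical in both outputs. A mask of this kind only prevents changes to protected features; it does not by itself guarantee that the perturbed sample remains functional.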

White-Box
An important consideration in attacks against intrusion detection systems is that attackers cannot perform simple oracle queries against an intrusion detection system and must minimize the number of queries to decrease the likelihood of detection. Apruzzese et al. [28] further note that the output of the target model is not directly observable by the attacker; however, exceptions occur where detected malicious traffic is automatically stopped or dropped, or where the attacker gains access to, or knowledge of, the system.
Gray-box attacks consider scenarios where an adversary has only partial knowledge of the system. Biggio et al. [77] highlighted the threat from skilled adversaries with limited knowledge; more recently, gray-box attacks have received some attention: Kuppa et al. [92] considered malicious users of the system with knowledge of the features and architecture of the system, recognizing that attackers may differ in their level of knowledge of the system. Labaca-Castro et al. [99] used universal adversarial perturbations, showing that unprotected systems remain vulnerable even under limited knowledge scenarios. Li et al. [101] considered limited knowledge attacks against cyber physical systems and successfully deployed universal adversarial perturbations where attackers have incomplete knowledge of measurements across all sensors.
Building on Simple Adversarial Examples: Table 5 shows that much research considers simple adversarial examples, although less research considers sequences of adversarial examples or transferability. We chose to classify attacks as either a simple adversarial example, a sequence of adversarial examples, or a transferable adversarial example. A simple adversarial example is sufficient to alter the output of a simple classifier. Lin et al. [82] suggested that using adversarial examples strategically could affect the specific critical outputs of a

Adversarial Examples-Attack Objectives
There is a distinction between the objectives of attacks: targeted or untargeted. An attack objective might be to cause a classifier to misclassify an input as any other class (untargeted) or to misclassify an input as a specific class (targeted). In the cyber-security domain, IDS often focus on binary classification: malicious or benign. For binary classification the effect of targeted and untargeted attacks is the same. More complex multi-class IDS can help network analysts triage or prioritise different types of intrusions. Network analysts would certainly treat a distributed denial of service (DDoS) attack differently than a BotNet or infiltration attempt. Adversaries could gain significant advantage through targeted attacks, for example, by camouflaging an infiltration attack as a comparatively less serious network intrusion.
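The difference between the two objectives, and why they coincide for binary classifiers, can be stated as two simple success criteria. The following is a minimal sketch; the class labels are hypothetical examples.

```python
def untargeted_success(true_label, predicted_label):
    """Untargeted: any label other than the true one counts as success."""
    return predicted_label != true_label


def targeted_success(target_label, predicted_label):
    """Targeted: only the attacker's chosen label counts as success."""
    return predicted_label == target_label


# Binary IDS: the only wrong label for 'malicious' is 'benign',
# so both criteria agree for every possible prediction.
binary_agree = all(
    untargeted_success("malicious", pred) == targeted_success("benign", pred)
    for pred in ("benign", "malicious")
)

# Multi-class IDS: an infiltration flow misclassified as DDoS is an
# untargeted success but fails an attack that targeted 'benign'.
multi_untargeted = untargeted_success("infiltration", "ddos")
multi_targeted = targeted_success("benign", "ddos")
print(binary_agree, multi_untargeted, multi_targeted)
```

In the multi-class case, an adversary who specifically wants an infiltration attempt to be triaged as a less serious class must meet the stricter targeted criterion.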
Recent research goes beyond adversarial examples causing misclassification of a single input. Moosavi-Dezfooli et al. [107] showed the existence of untargeted universal adversarial perturbation (UAP) vectors for images, and ventured that this is problematic for classifiers deployed in real-world and hostile environments. In the cyber-security domain, Labaca-Castro et al. [99] demonstrated UAPs in the feature space of malware detection. They showed that UAPs have similar effectiveness to adversarial examples generated for specific inputs. Sheatsley et al. [56] looked at UAPs in the constrained domain of intrusion detection. Adversaries need only calculate one UAP that could be applied to multiple inputs. Precalculation of a UAP could enable faster network attacks (e.g., DDoS) that would otherwise require too much calculation time.
Table 5 shows that most research considers untargeted attacks. Targeted attacks are less represented in the literature. Furthermore, UAPs are a more recent avenue for research. Table 5 shows that attacks in the visual domain were the subject of much early research, and the visual domain continues to attract researchers; however, researchers are beginning to consider attacks against other DNN systems such as machine learning models for natural language processing, with some considering semantic-preserving attacks.
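The idea that one precomputed perturbation can be reused across many inputs can be illustrated with a linear detector, where a single input-independent vector lowers the score of every sample by the same amount. This is a simplified sketch under that assumption; the weights and inputs are hypothetical, and the UAP algorithms in [107], [99], and [56] are iterative procedures for non-linear models and are not reproduced here.

```python
def score(weights, bias, x):
    """Linear detector score; a positive score means 'malicious'."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def sign(v):
    return (v > 0) - (v < 0)


def universal_perturbation(weights, eps):
    """One input-independent perturbation that lowers the score of every input."""
    return [-eps * sign(w) for w in weights]


def fooling_rate(weights, bias, inputs, delta):
    """Fraction of detected inputs that evade detection after adding delta."""
    fooled = 0
    for x in inputs:
        x_adv = [xi + di for xi, di in zip(x, delta)]
        if score(weights, bias, x) > 0 and score(weights, bias, x_adv) <= 0:
            fooled += 1
    return fooled / len(inputs)


weights, bias = [1.0, -2.0, 0.5], 0.0            # hypothetical detector
malicious_inputs = [
    [0.5, 0.0, 0.2],
    [1.0, 0.1, 0.0],
    [0.2, -0.1, 0.4],
]

delta = universal_perturbation(weights, eps=0.2)  # computed once
rate = fooling_rate(weights, bias, malicious_inputs, delta)
print(delta, rate)
```

Here the same `delta` is added to all three inputs without any per-input computation, and two of the three fall below the detection threshold. The per-input cost at attack time is a single vector addition, which is why precalculation matters for high-rate attacks.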

Adversarial Examples in Traditional Domains
In visual domains, features are generally continuous. For example, pixel values range from 0 to 255. A consensus exists in the visual domain that adversarial examples are undetectable to humans. Moreover, the application domain is clearly interrelated with the choice of machine learning model. Models such as CNNs are appropriate for visual-based tasks, whereas RNNs are appropriate for sequence-based tasks. We discuss model types in Section 5.2.6.
Some models, such as recurrent neural networks, cannot be attacked using traditional attack algorithms; however, some research aims to discover new methods to attack these systems. Papernot et al. [78] noted that because RNNs handle time sequences by introducing cycles to their computational graphs, the presence of these computation cycles means that applying traditional adversarial example algorithms is challenging because cycles prevent direct computation of the gradients. They adapted adversarial example algorithms for RNNs and evaluated the performance of their adversarial samples. If the model is differentiable, FGSM can be applied even to RNN models. They used a case study of a binary classifier (positive or negative) for movie reviews. They defined an algorithm that iteratively modifies words in the input sentence to produce an adversarial sequence that is misclassified by a well-trained model. They noted that their attacks are white-box attacks, requiring access to, or knowledge of, the model parameters. Szegedy [17]

Adversarial Examples in Cyber-Security Domains
Adversarial examples (AEs) have been shown to exist in many domains. Indeed, no domain identified (so far) is immune to adversarial examples [56]. Researchers are beginning to consider cyber-security domains (Figure 5) where features are often a mixture of categorical, continuous, and discrete. Some research focuses on adversarial example attacks against IDS, although few studies specifically consider functionality-preserving attacks. In the visual domain, we briefly discussed the consensus that adversarial examples are undetectable to humans. However, it is unclear how this idea should be translated to other domains. Carlini [64] held that, strictly speaking, adversarial examples must be similar to the original input. However, Sheatsley et al. [56] noted that research in non-visual domains provides domain-specific definitions: perturbed malware must preserve its malware functionality [56], perturbations in audio must be nearly inaudible [56], and perturbed text must preserve its meaning. Sheatsley et al. further offered a definition for adversarial examples in intrusion detection: perturbed network flows must maintain their attack behaviour. We consider that human perception may not be the best criterion for defining adversarial examples in cyber-security domains. Indeed, human perception in some domains might be immaterial. For example, only very skilled engineers could perceive network packets in any meaningful way, even with the use of network analysis tools. Furthermore, users likely cannot perceive a difference between the execution of benign or malicious software. After malware is executed, the effects are clear; however, during malware execution users often suspect nothing is wrong. We therefore consider that while fooling human perception remains a valid ambition, it is critical that adversarial perturbations in cyber-security domains preserve functionality and behaviour.
In the cyber-security domain, traditional gradient descent algorithms may be insufficient. Algorithms that preserve functionality are required. Moreover, some models used in the cyber-security domain are distinct from those used for purely visual problems. For example, RNNs are useful for time sequences of network traffic analysis. We now consider recent functionality-preserving attacks in the cybersecurity domains of malware, intrusion detection, and CPS. We further examine functionality-preserving attacks in Table 7.
Table 7. Functionality-preservation in cybersecurity and intrusion detection.

Work | Year | Domain | Generation Method | Realistic Constraints | Findings
[53] | 2019 | Malware | Gradient-based | Minimal content additions/modification | Experiments showed that we are able to use that information to find optimal sequences of transformations without rendering the malware sample corrupt.
[94] | 2019 | IDS | GAN | Preserve functionality | The proposed adversarial attack successfully evades the IDS while ensuring preservation of functional behavior and network traffic features.
Evasion attacks achieved by inserting a dozen network connections.

Retains internal logic
Feature removal is insufficient defense against functionality-preserving attacks, which may be possible by modifying very few features.
Malware: Hu and Tan [84] proposed a novel algorithm to generate adversarial sequences to attack an RNN-based malware detection system. They claimed that algorithms adapted for RNNs are limited because they are not truly sequential. They considered a system to detect malicious API sequences. Generating adversarial examples effective against such systems is non-trivial because API sequences are discrete values. There is a discrete set of API calls; changing any single letter in an API call will create an invalid API call and cause that API call to fail. This will result in a program crash. Therefore, any perturbation of an API call must result in a set of valid API calls. They proposed an algorithm based around a generative RNN and a substitute RNN. The generative RNN takes an API sequence as input and generates an adversarial API sequence. The substitute RNN is trained on benign sequences and the outputs of the generative RNN. The generative model aims to minimize the predicted malicious probability. Subsequently, adversarial sequences are presented to six different models. Following adversarial perturbation, the majority of the malware was not detected by any victim RNNs. The authors noted that even when the adversarial generation algorithm and the victim RNN were implemented with different models and trained on different training sets, the majority of the adversarial examples successfully attacked the victim RNN through the transferability property of adversarial examples. For the MLP model, they reported a TPR of 94.89% that fell to 0.00% under adversarial perturbations.
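The validity constraint on API sequences described above (every perturbed element must itself be a valid API call, and the original calls must still occur in order) can be expressed as a simple check. The sketch below is an assumption-laden illustration: it uses random insertion from a small hypothetical vocabulary in place of the generative RNN of [84], and it checks only syntactic validity and ordering, not whether the inserted calls are free of side effects at run time.

```python
import random

# Hypothetical vocabulary of valid API call names (illustrative only).
VALID_API_CALLS = ["CreateFileW", "ReadFile", "WriteFile", "CloseHandle", "GetTickCount"]


def insert_api_calls(sequence, n_insertions, vocabulary, rng):
    """Insert whole, valid API calls at random positions; never edit or remove."""
    perturbed = list(sequence)
    for _ in range(n_insertions):
        position = rng.randint(0, len(perturbed))
        perturbed.insert(position, rng.choice(vocabulary))
    return perturbed


def is_subsequence(original, perturbed):
    """True if the original calls still appear, in order, in the perturbed sequence."""
    remaining = iter(perturbed)
    return all(call in remaining for call in original)


original_sequence = ["CreateFileW", "WriteFile", "CloseHandle"]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
perturbed_sequence = insert_api_calls(original_sequence, 4, VALID_API_CALLS, rng)

print(perturbed_sequence)
print(is_subsequence(original_sequence, perturbed_sequence))
```

Because the perturbation only inserts whole entries from the vocabulary, no call name is ever corrupted, and the ordering check confirms that the original behaviour trace is still present within the longer sequence.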
Demetrio et al. [98] preserved the functionality of malware while evading static Windows malware detectors. Their attacks exploit the structure of the portable executable (PE) file format. Their framework has three categories of functionality-preserving manipulations: structural, behavioural, and padding. Some of their attacks work by injecting unexecuted (benign) content in new sections in the PE file, or at the end of the malware file. The attacks are formulated as a constrained minimization problem optimizing the trade-off between the probability of evading detection and the size of injected content. Their experiments successfully evaded two Windows malware detectors with few queries and a small payload size. Furthermore, they discovered that their attacks transferred to other Windows malware products. We note that the creation of new sections provides a larger attack surface that may be populated with adversarial content. They reported that their section-injection attack was able to drastically decrease the detection rate (e.g., from an original detection rate of 93.5% to 30.5%, also significantly outperforming their random attack at 85.5%).
Labaca-Castro et al. [53] presented a gradient-based method to generate valid executable files that preserve their intended malicious functionality. They noted that malware evasion is a current area of adversarial learning research. Evading the classifier is often the foremost objective; however, the perturbations must also be carefully crafted to preserve the functionality of malware. They noted that removing objects from a PE file often leads to corrupt files. Therefore, they only implement additive or modifying perturbations. Their gradient-based attack relies on complete knowledge of the system with the advantage that the likelihood of evasion can be calculated and maximised. Furthermore, they stated that their system only generates valid executable malware files.
Wang et al. [117] noted that relatively few researchers are addressing adversarial examples against IDS. They proposed an ensemble defense for network intrusion detection that integrates GANs and adversarial retraining. Their training framework improved robustness while maintaining accuracy on unperturbed samples. Unfortunately, they evaluated their defences only against traditional attack algorithms (FGSM, basic iterative method (BIM), DeepFool, and JSMA) and did not specifically consider functionality-preserving adversarial examples. They further recognised the importance of using recent datasets for intrusion detection. They reported F1-scores for three classifiers and a range of adversarial example algorithms. For example, the F1-score for an ensemble classifier tested on clean data was 0.998 compared to 0.746 for JSMA. Among all classifiers, the ensemble classifier achieved superior F1-scores under all conditions.
Huang et al. [95] noted that it is more challenging to generate
Cyber-Physical Systems: Cai et al. [100] warned that adversarial examples have consequences for system safety because they can cause systems to provide incorrect outputs. They presented a detection method for adversarial examples in CPS. They used a case study of an advanced emergency braking system, where a DNN estimates the distance to an obstacle. Their adversarial example detection method uses a variational auto-encoder to predict a target variable (distance) and compare it with a new input. Any anomalies are considered adversarial. Furthermore, adversarial example detectors for CPS must function efficiently in a real-time monitoring environment and maintain low false alarm rates. They reported that since the p-values for the adversarial examples are almost 0, the number of false alarms is very small and the detection delay is smaller than 10 frames or 0.5 s.
CPS include critical national infrastructure, such as power grids, water treatment plants, and transportation. Li et al. [101] asserted that adversarial examples could exploit vulnerabilities in CPS with terrible consequences; however, such adversarial examples must satisfy real-world constraints (commonly linear inequality constraints). For example, meter readings downstream may never be larger than meter readings upstream. Adversarial examples breaking constraints are noticeably anomalous. Risks to CPS arising from adversarial examples are not yet fully understood. Furthermore, algorithms and models from other domains may not readily apply because of distributed sensors and inherent real-world constraints. However, generated adversarial examples that meet such linear constraints were successfully applied to power grids and water treatment system case studies. The evaluation results show that even with constraints imposed by the physical systems, their approach still effectively generates adversarial examples, significantly decreasing the detection accuracy. For example, they reported the accuracy under adversarial conditions to be as low as 0%.
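The linear inequality constraints mentioned above can be written as A·x ≤ b and checked before an adversarial example is accepted. The following minimal sketch encodes the upstream/downstream meter example from the text; the matrix, the readings, and the function name are hypothetical and are not taken from the case studies in [101].

```python
def satisfies_constraints(A, b, x, tol=1e-9):
    """True if every row of the linear system A x <= b holds for x."""
    return all(
        sum(a * xi for a, xi in zip(row, x)) <= bi + tol
        for row, bi in zip(A, b)
    )


# x = [upstream_reading, downstream_reading]
# Constraint: downstream - upstream <= 0 (downstream never exceeds upstream).
A = [[-1.0, 1.0]]
b = [0.0]

clean = [10.0, 9.5]
naive_adversarial = [10.0, 10.4]        # raises downstream only: noticeably anomalous
constrained_adversarial = [10.3, 10.1]  # perturbs both readings, constraint still holds

for reading in (clean, naive_adversarial, constrained_adversarial):
    print(reading, satisfies_constraints(A, b, reading))
```

A perturbation that ignores the constraint is rejected by the check, whereas one that adjusts both readings together passes it, which is the kind of example that remains plausible to a physical-consistency monitor.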

Adversarial Examples and Model Type
We classify models based on their architecture into four broad types: multi-layer perceptron (MLP), CNN, RNN, and random forest (RF). Ali et al. [118] observed that some deep learning architectures are more robust than others. They noted that CNN and RNN detectors are more robust than MLP and hybrid detectors, based on low attack success rates and high query counts. Architecture plays a role in the accuracy of these models because CNNs can learn contextual features due to their structure, and RNNs are temporally deeper, and thus demonstrate greater robustness.
Unsurprisingly, research on CNNs coincides with research in the visual domain, as shown in Table 5. The majority of adversarial example research on RNNs has until recently focused on the text or natural language domain; however, RNNs are also useful in the cybersecurity domain and researchers have recently considered adversarial example attacks against RNN-based IDS.
Other promising research shows that radial basis function neural networks (RBFNN) are more robust to adversarial examples [119]. RBFNNs fit a non-linear curve during training, as opposed to fitting linear decision boundaries. Commonly, RBFNNs transform the input such that when it is fed into the network it gives a linear separation. The non-linear nature of RBFNNs could be one potential direction for adversarial example research. Powerful attacks that are able to subvert RBFNNs would improve our understanding of decision boundaries. Goodfellow et al. [63] argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature. However, RBFNNs are less commonly deployed and are therefore not further discussed.

Adversarial Examples and Knowledge Requirement
The majority of the research focus is on white-box attacks, as shown in Table 5, perhaps because such attacks are known to be efficient and effective. Less research focuses on black-box attacks and few studies recognise gray-box attacks that need only partial model knowledge. Gray-box attacks will likely have advantages over black-box attacks. Adversaries will undoubtedly use any and all information available to them.
We classify the attacks by the knowledge required by the adversary. White-box attacks are likely the most effective and efficient method of attack, because the adversary has complete knowledge of the model architecture, and information on how the model was trained. However, access to this knowledge is harder to attain, although it might be gained through insider threats [120] or model extraction attacks [121]. Extracted models might be a feasible proxy on which to develop and test adversarial examples.
Notwithstanding the efficiency of white-box attacks, effective black-box attacks are possible. Black-box (or oracle) attacks require no knowledge of the model. Adversaries only need the ability to submit inputs to the model and receive its output. Typical black-box attacks include GAs [95] and GANs [89,97].
Gray-box attacks require only limited model knowledge, perhaps including knowledge of the features used by the model. This is a realistic prospect, as adversaries will likely have or gain at least partial knowledge of the model. Table 5 shows little research considering constraints of any sort. Much research on IDS ignores constraints; however, network traffic is highly constrained by protocols, and some network firewalls may drop malformed packets. Furthermore, it is insufficient that well-formed adversarial examples progress past firewalls. They must also retain their intended functionality.

Adversarial Example Constraints
Stringent constraints exist in the cyber-security domain. Extreme care must be taken to create valid adversarial examples. For example, in IDS, adversaries must conform to the protocol specification of the TCP/IP stack.

Defenses Against Adversarial Examples
Multi-Classifier System
[135] 2019 Weight Map Layers
[136] 2019 Sequence Squeezing
[109] 2019 Feature Removal
[137] 2020 Adversarial Training
[138] 2020 Adversarial Training
[139] 2019 Game Theory
[140] 2020 Hardening
[141] 2021 Variational Auto-encoder
[142] 2021 MANDA

It is hard to defend against adversarial examples. People expect ML models to give good outputs for all possible inputs. Because the range of possible inputs is so large, it is difficult to guarantee correct model behaviour for every input. Some researchers explored the possibility of exercising all neurons during training [132]. Furthermore, consideration must be given to how adversaries might react when faced with a defense. Researchers in secure machine learning must evaluate whether defenses remain secure against adversaries with knowledge of model defenses.
We classify the suggested defenses against adversarial examples into the following groups: pre-processing, adversarial training, architectural, detection, distillation, testing, ensembles, and game theory.

Pre-Processing as a Defense against Adversarial Examples
Some promising research considers transformations, such as translation, additive noise, blurring, cropping, and resizing. These often occur with cameras and scanners in the visual domain. Translations have shown initial success in the visual domain. Initial successes have prompted some researchers to discount security concerns. For example, Graese [123] overreached by declaring adversarial examples an "academic curiosity", not a security threat. This position misunderstands the threat from adversarial examples, which remain a concern for cyber-security researchers.
Eykholt et al. [143] noted the creation of perturbations in physical space that survive more challenging physical conditions (distance, pose, and lighting). Transformations are appropriate for images; however, such transformations may make little sense in cybersecurity domains. For example, what would it mean to rotate or blur a network packet? Nevertheless, inspiration could be taken from pre-processing methods in the visual domain. Adapting pre-processing methods to cyber-security and other non-visual domains is an interesting avenue for research.

Adversarial Training as a Defense against Adversarial Examples
Szegedy et al. [17] found that robustness to adversarial examples can be improved by training a model on a mixture of adversarial examples and unperturbed samples. Specific vulnerabilities in the training data can be identified through exploring UAPs, and identified vulnerabilities could potentially be addressed with adversarial training. Adversarial training has merit as a simple method to improve robustness; however, it is potentially a cosmetic solution: the problem of adversarial examples cannot be solved only through ever greater numbers of adversarial examples in the training data. Tramèr et al. [102] found that adversarial training is imperfect and can be bypassed. Moreover, black-box attacks have been shown to evade models subject to adversarial training. Adversarial training is therefore not a panacea and should be bolstered by other defenses; research avenues could combine it with other techniques. We warn that models used in cyber-security or other critical domains should not rely solely on adversarial training.
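To make the idea of training on a mixture of clean and perturbed samples concrete, the sketch below applies it to a logistic-regression model in plain Python. The choice of model, the use of an FGSM-style perturbation, the toy data, and all hyperparameters are assumptions made for illustration; they are not the procedure of any cited work.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of the positive class under logistic regression."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """FGSM-style step: move eps along the sign of dLoss/dx = (p - y) * w."""
    p = predict(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

def train(data, eps=0.0, epochs=200, lr=0.1):
    """SGD; when eps > 0 each clean sample is followed by its perturbed counterpart."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            batch = [x] if eps == 0.0 else [x, fgsm(w, b, x, y, eps)]
            for xs in batch:
                err = predict(w, b, xs) - y
                w = [wi - lr * err * xi for wi, xi in zip(w, xs)]
                b -= lr * err
    return w, b

# Toy, hypothetical two-feature data set: label 1 = malicious, 0 = benign.
data = [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([1.0, 1.0], 1), ([0.9, 1.2], 1)]
w, b = train(data, eps=0.1)
x, y = data[2]
x_adv = fgsm(w, b, x, y, eps=0.1)
```

For a positive sample the perturbation can only lower the predicted probability of the true class, which is why the perturbed copies act as harder training cases. The sketch also shows the limitation noted above: the model is hardened only against the perturbation it was trained on.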

Architectural Defenses against Adversarial Examples
Some research, rather than modifying a model's training data, investigated defenses through hardening the architecture of the model. This could involve changing model parameters or adding new layers. In Table 8, we classify such defenses as architectural.
Many white-box attacks rely on the quality of the gradient. Some research considers how the model's weights can be used to disrupt adversarial examples. Amer and Maul [135] modified convolutional neural networks (CNN), adding a weight map layer. Their proposed layer easily integrates into existing CNNs. A weight mapping layer may be inserted between other CNN layers, thus increasing the network's robustness to both noise and gradient-based adversarial attacks.
Other research aims to block algorithms from using weight transport and backpropagation to generate adversarial examples. Lillicrap et al. [122] proposed a mechanism called "feedback alignment", which introduces a separate feedback path via random fixed synaptic weights. Feedback alignment blocks the generation of adversarial examples that rely on the gradient because it uses the separate feedback path rather than weight transport.
Techniques to improve accuracy could similarly help harden models. For example, dropout can improve accuracy when used during training. It is particularly useful where there is limited training data and over-fitting is more likely to occur. Wang et al. [134] proposed hardening DNN using defensive dropout at test time. Unfortunately, there is inherently a trade-off between defensive dropout and test accuracy; however, a relatively small decrease in test accuracy can significantly reduce the success rate of attacks. Such hardening techniques force successful attacks to use larger perturbations, which in turn may be more readily recognized as adversarial.
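The following sketch illustrates what keeping dropout active at test time means for a tiny one-hidden-layer network. The network shape, the weights, and the inverted-dropout rescaling are assumptions for illustration, and the details of the scheme in [134] may differ.

```python
import random

def dropout(values, rate, rng):
    """Zero each value with probability `rate` (< 1) and rescale the survivors."""
    if rate <= 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def forward(x, w_hidden, w_out, rate=0.0, rng=None):
    """One-hidden-layer ReLU network; dropout remains active at inference if rate > 0."""
    rng = rng or random.Random()
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    hidden = dropout(hidden, rate, rng)
    return sum(w * h for w, h in zip(w_out, hidden))

# Hypothetical weights: with rate=0.0 the output is deterministic.
w_hidden = [[1.0, 0.0], [0.0, 1.0]]
w_out = [1.0, 1.0]
clean_output = forward([1.0, 2.0], w_hidden, w_out)
noisy_output = forward([1.0, 2.0], w_hidden, w_out, rate=0.5, rng=random.Random(0))
```

Because a fresh mask is drawn on every query, the output an attacker observes no longer corresponds to one fixed function, which is the source of both the robustness gain and the accuracy cost discussed above.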
Defenses that block gradient-based attacks complicate the generation of adversarial examples; however, like adversarial training, these defenses could be bypassed. In particular, black-box attacks and transferability-based attacks are not blocked by such defenses. A more promising defense, "defensive dropout", can block both black-box and transferability-based attacks.

Detecting Adversarial Examples
Much research has considered the best way to detect adversarial examples. If adversarial examples can be detected they could be more easily deflected, and perhaps even the original input could be salvaged and correctly classified. Grosse et al. [124] [144] assertion that Bayesian approximation using dropout can be applied to RNN networks.
Meng et al. [127] proposed a framework, "MagNet", to detect adversarial examples. This framework precedes the classifier it defends. The framework has two components: (1) a detector that finds and discards any out-of-distribution examples (those significantly far from the manifold boundary); (2) a reformer that aims to find close approximations to inputs before forwarding the approximations to the classifier. Their system generalizes well because it learns to detect adversarial examples without knowledge of how they were generated. They proposed a defense against gray-box attacks where the adversary has knowledge of the deployed defenses. The proposed defense trains a number of autoencoders (or reformers). At test time, a single autoencoder is selected at random.
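The detector-then-reformer pipeline can be sketched as follows. This is only a structural illustration: the "autoencoders" here are hypothetical stand-in functions (`clip01`, `round1`) rather than trained networks, and the use of mean squared reconstruction error with a fixed threshold is an assumption.

```python
import random

def reconstruction_error(autoencoder, x):
    """Mean squared difference between an input and its reconstruction."""
    recon = autoencoder(x)
    return sum((a - b) ** 2 for a, b in zip(x, recon)) / len(x)

def defend(x, detector, reformers, classifier, threshold, rng):
    """Reject inputs far from the learnt manifold, otherwise reform and classify."""
    if reconstruction_error(detector, x) > threshold:
        return None  # discarded as out-of-distribution
    reformer = rng.choice(reformers)  # a single reformer is picked at random per input
    return classifier(reformer(x))

# Hypothetical stand-ins for trained autoencoders and for the protected classifier.
def clip01(x):
    return [min(1.0, max(0.0, v)) for v in x]

def round1(x):
    return [round(v, 1) for v in clip01(x)]

def classifier(x):
    return int(sum(x) / len(x) > 0.5)

rng = random.Random(0)
accepted = defend([0.2, 0.9, 0.8], clip01, [clip01, round1], classifier, 0.01, rng)
rejected = defend([3.0, -2.0, 0.5], clip01, [clip01, round1], classifier, 0.01, rng)
```

In this toy setting the first input reconstructs perfectly and is classified, while the second lies far outside the valid range and is discarded before it reaches the classifier.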
Xu et al. [128] proposed "feature squeezing" as a strategy to detect adversarial examples by squeezing out unnecessary features in the input. Through comparing predictions of the original and feature squeezed inputs, adversarial examples are identified if the difference between the two predictions meets a threshold. Two feature-squeezing methods are used: (1) Reducing the colour bit-depth of the image; (2) Spatial smoothing. An adversary may adapt and circumvent this defense; however, the defense may frustrate the adversary because it changes the problem the adversary must overcome.
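A minimal sketch of this comparison is given below, using flat lists of values in [0, 1] in place of images. The toy model, the 1-D median filter used as a stand-in for spatial smoothing, the L1 distance between prediction vectors, and the threshold value are all illustrative assumptions.

```python
def reduce_bit_depth(pixels, bits):
    """Quantise values in [0, 1] to 2**bits levels."""
    levels = (1 << bits) - 1
    return [round(p * levels) / levels for p in pixels]

def median_smooth(pixels):
    """1-D median filter of width 3 with edge replication."""
    n = len(pixels)
    return [
        sorted([pixels[max(i - 1, 0)], pixels[i], pixels[min(i + 1, n - 1)]])[1]
        for i in range(n)
    ]

def is_adversarial(model, x, squeezers, threshold):
    """Flag x if any squeezed copy changes the prediction vector by >= threshold (L1)."""
    base = model(x)
    distances = [
        sum(abs(a - b) for a, b in zip(base, model(squeeze(x))))
        for squeeze in squeezers
    ]
    return max(distances) >= threshold

# Hypothetical two-class model whose output depends on the mean intensity.
def model(x):
    mean = sum(x) / len(x)
    return [mean, 1.0 - mean]

squeezers = [lambda x: reduce_bit_depth(x, 1), median_smooth]
clean_flag = is_adversarial(model, [0.0, 0.0, 1.0, 1.0], squeezers, threshold=0.1)
suspect_flag = is_adversarial(model, [0.45, 0.45, 0.45, 0.45], squeezers, threshold=0.1)
```

The first input is unchanged by both squeezers and is not flagged; the second relies on intermediate values that bit-depth reduction removes, so its prediction shifts and it is flagged.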
Rosenberg et al. [136] considered the feature squeezing defense designed for CNNs and proposed "sequence squeezing", which is adapted for RNNs. Adversarial examples are similarly detected by running the classifier twice: once on the original sequence, and once for the sequence-squeezed input. An input is identified as adversarial if the difference in the confidence scores meets a threshold value.
Zhang et al. [141] proposed an image classifier based on a variational auto-encoder. They trained two models each on half the dataset: a target model and a surrogate model. On the surrogate model they generated three types of strong transfer-based adversarial examples: $L_0$, $L_2$, and $L_\infty$. Analysis of their model using the CIFAR-10, MNIST, and Fashion-MNIST datasets found that their model achieves state-of-the-art accuracy with significantly better robustness. Their work is in the visual domain; however, perhaps their ideas can be applied to other domains such as intrusion detection.
We have discussed some architectural defenses against adversarial examples. In particular, we have considered methods for detecting adversarial examples. Carlini and Wagner [145] showed that adversarial examples are harder to detect than previously appreciated and that the properties believed to be intrinsic to adversarial examples are in fact not. Moreover, many detection methods can be broken by choosing good attacker-loss functions. Grosse et al. [124] noted that adversarial defenses exist within an arms race and that guarantees against future attacks are difficult because adversaries may adapt to the defenses by adopting new strategies. Meng et al. [127] advocated that defenses against adversarial examples should be independent of any particular attack. We have seen that human-in-the-loop solutions could be useful where few cases need human intervention; however, repeated requests might quickly overwhelm human operators given large numbers of adversarial examples, for example, as might be seen in network traffic analysis.

Defensive Testing
Adversarial examples cause unexpected behaviour. Recent research considers testing deep learning systems. Pei et al. [146] aimed to discover unusual or unexpected behaviour of a neural network through systematic testing. They produced test data by solving a joint optimization problem. Their tests aim to trigger different behaviours and activate a high proportion of neurons in a neural network. Their method finds corner-cases where incorrect behaviour is exhibited, for example, malware masquerading as benign. They claimed to expose more inputs and types of unexpected behaviour than adversarial examples. They further used the generated inputs to perform adversarial training. As a defense we question the practicability of triggering all neurons in larger neural networks; however, as an attack, their method could produce different types of adversarial inputs.
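The notion of activating a high proportion of neurons can be made measurable with a coverage ratio. The sketch below assumes a definition in which a neuron counts as covered once its activation exceeds a threshold for at least one test input; the threshold and the activation values are hypothetical.

```python
def neuron_coverage(activations, threshold=0.0):
    """Fraction of neurons activated above `threshold` by at least one test input.

    `activations` holds one row per test input and one column per neuron.
    """
    n_neurons = len(activations[0])
    covered = {
        j
        for row in activations
        for j, value in enumerate(row)
        if value > threshold
    }
    return len(covered) / n_neurons

# Hypothetical activations of a four-neuron layer for two, then three, test inputs.
tests = [[0.9, 0.0, 0.0, 0.0], [0.0, 0.7, 0.0, 0.0]]
before = neuron_coverage(tests)
after = neuron_coverage(tests + [[0.0, 0.0, 0.8, 0.0]])
```

A test generator in this style would prefer the third input because it raises coverage, which also hints at why reaching full coverage becomes costly as networks grow.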
Other researchers are considering similar techniques to generate test data. Tian et al. [132] evaluated a tool for automatically detecting erroneous behaviour, generating test inputs designed to maximise the number of activated neurons using realistic driving conditions, including blurring, rain, and fog. Zhang et al. [133] proposed a system to automatically synthesize large amounts of diverse driving scenes, including weather conditions, using GANs. We consider GANs useful for generating adversarial inputs. GANs should implicitly learn domain constraints.

Multi-Classifier Systems
Biggio et al. [105] highlighted that robustness against adversarial examples can be improved through the careful use of ensemble classifiers, for example, by using rejection-based mechanisms. Indeed, Biggio et al. had implemented a multi-classifier system (MCS) [147], which was hardened using randomisation. Randomising the decision boundary makes a classifier harder to evade. Since the attacker has less information on the exact position of a decision boundary, they must make too conservative or too risky choices when generating adversarial examples.
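The effect of randomising a decision boundary can be illustrated with a single score threshold that is redrawn on every query. This is a deliberately simplified, assumed setup (one scalar score, uniform jitter) and is not the construction used in [147].

```python
import random

def randomised_decision(score, centre=0.5, jitter=0.1, rng=random):
    """Flag the input if its score reaches a threshold drawn afresh for this query."""
    threshold = rng.uniform(centre - jitter, centre + jitter)
    return score >= threshold

rng = random.Random(1)
clearly_malicious = [randomised_decision(0.9, rng=rng) for _ in range(100)]
clearly_benign = [randomised_decision(0.1, rng=rng) for _ in range(100)]
borderline = [randomised_decision(0.5, rng=rng) for _ in range(200)]
```

Inputs far from the boundary are classified consistently, whereas a borderline input receives varying answers. An attacker probing for the boundary therefore either stops short of it and is still detected, or overshoots with a larger perturbation than necessary.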

Game Theory
Zhou et al. [139] consider game theoretic modeling of adversarial machine learning problems. Many different models have been proposed. Some aim to optimise the feature set using a set of high-quality features, thus making adversarial attacks more difficult. Game theoretic models are proposed to address more complex situations with many adversaries of different types. Equilibrium strategies are acceptable to both players and neither has an incentive to change. Therefore, assuming rational adversaries, game theory-based approaches allowing a Nash equilibrium could potentially end the evolutionary arms race.
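As a worked example of an equilibrium strategy, the sketch below solves a two-by-two zero-sum game in closed form. The scenario (a defender choosing which of two feature sets to inspect, an attacker choosing which to perturb) and the detection probabilities are hypothetical, and the formula assumes the game has no pure-strategy saddle point.

```python
def solve_2x2_zero_sum(a, b, c, d):
    """Mixed equilibrium of the payoff matrix [[a, b], [c, d]] for the row player.

    Returns (p, q, value): p is the probability the row player picks row 1,
    q the probability the column player picks column 1, value the expected payoff.
    """
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("degenerate game: no interior mixed equilibrium")
    p = (d - c) / denom  # makes the column player indifferent between columns
    q = (d - b) / denom  # makes the row player indifferent between rows
    if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0):
        raise ValueError("game has a pure-strategy saddle point")
    return p, q, (a * d - b * c) / denom

# Hypothetical detection probabilities: rows = defender inspects set A or B,
# columns = attacker perturbs set A or B.
p, q, value = solve_2x2_zero_sum(0.9, 0.2, 0.3, 0.8)
```

With these illustrative numbers the defender inspects set A about 42% of the time, the attacker splits evenly, and the expected detection rate is 0.55. Neither side can improve by deviating alone, which is the sense in which an equilibrium removes the incentive to change strategy.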

Adversarial Example Defenses in Cybersecurity Domains
We discussed domains in Section 5. Different model types are suited to different domains, and we consider that different model types may require different defenses. Again, we classify models into four types: MLP, CNN, RNN, and RF.

Discussion and Conclusions
ML systems are deployed in complex environments, including cybersecurity and critical national infrastructure. Such systems attract the interest of powerful advanced persistent threats that may target them. Crucially, we must address robustness against functionality-preserving adversarial examples before novel attack strategies exploit inherent weaknesses in critical ML models.
Machine learning and adversarial learning are becoming increasingly recognised by the research community, given the rapid uptake of ML models in a whole host of application domains. To put this in context, 2975 papers were published on arXiv in the last 12 months (October 2020-September 2021) related to machine learning and adversarial learning. Over recent years, the number of papers being published on this topic has grown substantially. According to Carlini, who maintains a blog post "A Complete List of All (arXiv) Adversarial Example Papers" [148], the cumulative number of adversarial example papers neared 4000 in the year 2021. It is therefore evident that there is a lot of interest and many researchers active in this area. Not all papers in this list are useful or relevant; we pass no judgement of their quality but merely aim to clarify the research landscape and draw important research to the fore. The majority of prior research has been applied to the visual domain. Seminal contributions have been made by Szegedy et al. [17], Goodfellow et al. [63], Carlini et al. [64], and Papernot [79]. It is clear that the visual domain continues to be well researched.
We conducted an extensive survey of the academic literature in relation to functionality-preservation in adversarial machine learning. We derived a classification based on both attack and defense. We considered possible robustness metrics. Moreover, we considered model training and data-level techniques that could help improve robustness through tackling biased datasets.
Analysis of functionality-preservation methods finds gradient-based methods may be less suitable for functionality-preservation and other constraints. Methods modifying large numbers of features are less likely to preserve functionality. We found that GANs and genetic algorithms are suitable for functionality-preserving attacks. We subsequently discussed defense strategies against functionality-preserving adversarial examples. We found that functionality-preserving adversarial machine learning is an open research topic. Finally, we identify some key future directions and research challenges in functionality-preserving machine learning.

Future Directions and Research Challenges
We now discuss future research challenges. Few researchers address the problem of transferability, which remains a key area of concern because hard-to-attack models are nevertheless susceptible to transferable adversarial examples generated against easy-to-attack models. Breaking the transferability of adversarial examples is a key challenge for the research community. Currently, defensive dropout [134] at test time is a promising defense. Adversarial example detection is a useful area of research.
Concept-drift is a real concern for cybersecurity [1], as new attacks and techniques are discovered daily. As the model and the current state of the art diverge, the model suffers from hidden technical debt. Therefore, the model must be retrained to reflect the current state-of-the-art attacks and new network traffic patterns [149]. Researchers might develop and use more up-to-date datasets. Further avenues for research include semi-supervised/unsupervised ML and active learning methods that continuously update the underlying model and do not rely on labelled datasets. We identify that data-level techniques such as resampling, balancing datasets, and cross validation could have effects on robustness against adversarial examples. Further research is required to explore how the bias-variance trade-off can affect robustness.
We prioritise the areas of future research, setting the agenda for research in this area. Critical areas of research include breaking the transferability of adversarial examples that would hopefully be applicable across domains. Non-visual domains including cybersecurity and cyber physical systems have been under-explored and this oversight should be rectified urgently. Further research on transformations in non-visual domains could provide useful knowledge. Detection of adversarial examples and pushing the fields of cybersecurity, intrusion detection, and cyber-physical systems will yield benefits beyond cybersecurity and may be applicable in other non-visual domains. Moreover, research is required in areas beyond instance classifiers. Areas of RNNs and reinforcement learning have been under-explored. More research is required to understand the use of domain constraints and functionality-preserving adversarial examples. Further research is needed towards effective countermeasures.
Additionally, we consider that more research attention could be given to dataset resampling strategies as a defence against adversarial examples. There is a need for better robustness metrics. Some researchers simply state accuracy, and others might state the better F1-score; however, the F1-score is biased by unbalanced datasets that are widespread in intrusion detection, partly due to large numbers of benign samples. Using F1-score could lead to a false sense of security. Researchers should adopt stronger metrics such as CLEVER [59] or empirical robustness [62].
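The sensitivity of the F1-score to class imbalance can be seen with a small calculation. The class counts and the degenerate "always benign" detector below are hypothetical, and the outcome depends on which class is treated as positive, which is an assumption made explicit in the code.

```python
def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom

# Hypothetical test set: 10,000 benign flows and 100 malicious flows,
# scored for a detector that labels every flow as benign.
benign, malicious = 10_000, 100

# Benign as the positive class: all benign flows are true positives and
# every missed attack is a false positive.
f1_benign_positive = f1_score(tp=benign, fp=malicious, fn=0)

# Malicious as the positive class: no attack is found, all are false negatives.
f1_malicious_positive = f1_score(tp=0, fp=0, fn=malicious)
```

The same detector, which finds no attacks at all, scores roughly 0.995 when the majority class is treated as positive and 0.0 when the minority class is. A single reported F1 value therefore says little about robustness unless the class balance and the positive class are stated.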
Adversarial machine learning is a critical area of research. If not addressed, there is increasing potential for novel attack strategies that seek to exploit the inherent weaknesses that exist within machine learning models; however, few works consider "realisable" perturbations that take account of domain and/or real-world constraints. Successful adversarial examples must be crafted to comply with domain and real-world constraints. This may be challenging since even small modifications may corrupt network packets that are likely to be dropped by firewalls. This necessitates functionality preservation in adversarial learning.
We propose that human perception may not be the best criterion for analyzing adversarial examples. In cybersecurity domains we propose that adversarial examples must preserve functionality. Traditionally, adversarial examples are thought of as having imperceptible noise. That is, that humans cannot perceive the difference between the original and perturbed inputs. Indeed, human perception in some domains might be immaterial. For example, strategic attacks triggered at crucial moments might cause damage to CPS before any human could reasonably act.
In cyber-security domains traditional gradient descent algorithms may be insufficient, although JSMA may be reasonable because it perturbs few features. Stringent constraints exist in the cyber-security domain and extreme care must be taken to create valid adversarial examples. We offer some guidelines for generating functionality-preserving adversarial examples. Functionality-preserving adversarial examples should: only perform legitimate transformations; respect mathematical dependencies, real-world, and domain constraints; minimize the number of perturbed features and restrict modification to non-critical features; and where possible retain the original payload and/or packet order.
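The guidelines above can be read as a projection step applied to any proposed perturbation. The sketch below assumes a hypothetical flow-feature schema: the feature names, bounds, and the choice of which features are mutable or append-only are illustrative and would need to be derived from the actual protocol and feature extractor.

```python
# Hypothetical schema: only these features may change, within these bounds.
MUTABLE = {
    "duration": (0.0, 120.0),
    "packets": (1, 10_000),
    "bytes": (40, 10_000_000),
}
# Padding may add traffic but must not remove any of the original payload.
APPEND_ONLY = {"packets", "bytes"}

def perturb_flow(flow, delta):
    """Apply `delta` while respecting mutability, bounds, and feature dependencies."""
    out = dict(flow)
    for name, change in delta.items():
        if name not in MUTABLE:
            continue  # critical features (e.g. protocol, port) are never modified
        low, high = MUTABLE[name]
        value = min(high, max(low, flow[name] + change))
        if name in APPEND_ONLY:
            value = max(value, flow[name])
        out[name] = value
    # Respect the mathematical dependency between derived and base features.
    out["mean_packet_size"] = out["bytes"] / out["packets"]
    return out

flow = {"protocol": 6, "dst_port": 443, "duration": 1.5,
        "packets": 10, "bytes": 5000, "mean_packet_size": 500.0}
delta = {"dst_port": 80, "duration": 0.5, "packets": 2, "bytes": -1000}
adversarial_flow = perturb_flow(flow, delta)
```

Here the requested port change and the byte reduction are discarded, the duration and packet count are increased, and the derived mean packet size is recomputed so that the record remains internally consistent.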
Defences against adversarial examples must consider that adversaries are likely to adapt by adopting new strategies. Many researchers propose adversarial training to improve robustness. Adversarial training is a simple method aiming to improve robustness; however, it is potentially a cosmetic solution: the problem of adversarial examples cannot be solved only through ever greater numbers of adversarial examples in the training data. Adversarial training, if used, must be bolstered by other defenses. Interesting defence strategies include randomisation: randomising decision boundaries makes evasion more difficult because attackers have less information on the exact position of a decision boundary. They must therefore make too conservative or too risky choices when generating adversarial examples.
Game theoretic models could be used to address more complex situations with many adversaries of different types as found in intrusion detection. Equilibrium strategies acceptable to both defender and adversary mean neither has an incentive to change. Therefore, assuming rational opponents, game theory-based approaches allowing a Nash equilibrium could potentially end the evolutionary arms race, although it is difficult to conceive a world where no advantage is possible.
Current promising defenses such as dropout exchange a relatively small decrease in accuracy for a significant reduction of successful attacks, even successfully blocking black-box and transferability-based attacks. Hardening techniques force successful attacks to use larger perturbations, which in turn may be more readily recognized as adversarial.
In a broader cybersecurity context, risks arising from adversarial examples are not yet fully understood. Furthermore, algorithms and models from other domains may not readily apply because of distributed sensors and inherent real-world constraints. It is uncertain whether current defences are sufficient. Furthermore, adversarial example detectors must function efficiently in a real-time monitoring environment while maintaining low false alarm rates.
Many academic researchers use old datasets that do not fairly represent modern network traffic analysis problems due to concept-drift. Problems of labelling data and retraining systems provide an impetus to explore unsupervised and active learning. Unfortunately, adversarial attacks are possible on active learning systems [150]. Lin et al. [82] described an enchanting attack to lure a machine learning system to a target state through crafting a series of adversarial examples. It is conceivable that similar attacks could lure anomaly detection systems towards normalizing and accepting malicious traffic.

Key Future Research Challenges
Adversarial ML is a critical area of research. Researchers must address the robustness of ML models against adversarial examples, allowing safer deployment of ML models across cybersecurity domains. Better robustness metrics should be used and developed. We find the traditional benchmark of human perception may be less relevant in functionality preservation. Moreover, traditional gradient descent algorithms may be insufficient to generate functionality-preserving attacks, and adversaries may use other methods such as GANs. Therefore, defences against gradient descent algorithms may likewise be insufficient. Defences must consider reactive adversaries who adapt to defences. Randomisation of decision boundaries can make evasion more difficult. Moreover, research into multi-classifier systems could help thwart evasion attacks, making it harder to evade classification. Dropout is currently a promising defense against adversarial examples, although multiple defenses may be required and a combination of defenses will likely offer better defense capability. Game-theory approaches could potentially end the adversarial arms race by achieving a Nash equilibrium. Concept-drift requires further research. Many researchers are using outdated datasets. Simply using newer datasets could postpone problems of concept-drift and is a good first step. Unsupervised/semi-supervised and active learning could potentially offer longer-term solutions to concept-drift, aiming for models to learn and detect novel attack methods. Transferability of adversarial examples remains an open issue, and more research here has the potential to disrupt many attack strategies.
More research is required in the area of functionality-preserving adversarial attacks, recognising the limits and trade-offs between functionality-preserving adversarial examples and their ability to evade classification; moreover, research into adversarial attacks in other constrained domains could improve robustness against complex attacks.
We offer these insights and hope that this survey offers other researchers a base for exploring the areas of robustness and functionality-preserving adversarial examples.