3.4.1. Traditional Sensitivity Analysis Methods
Traditional sensitivity analysis methods try to represent each input variable with a numeric value, called the sensitivity index. Sensitivity indices can be first-order indices, measuring the contribution of a single input variable to the output variance; second-, third-, or higher-order indices, measuring the contribution of the interaction between two, three, or more input variables to the output variance, respectively; or total-effect indices, combining the contributions of first-order and higher-order interactions to the output variance.
An output variance sensitivity analysis based on the ANOVA decomposition was formalised by Sobol, who proposed the approximation of first- and higher-order sensitivity indices using Monte-Carlo methods [101], while Saltelli [102] and Saltelli et al. [103] improved upon Sobol’s approach by using more efficient sampling techniques for first-, higher-, as well as total-effect indices.
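To make the variance-based indices concrete, the following is a minimal NumPy sketch of Monte-Carlo estimators for the first-order and total-effect Sobol indices in the style of Saltelli’s sampling scheme; the `sobol_indices` helper and the use of the Ishigami benchmark function are illustrative assumptions, not code from the cited works.

```python
import numpy as np

def sobol_indices(model, d, n=10_000, lower=-np.pi, upper=np.pi, seed=0):
    """Monte-Carlo estimates of first-order (S_i) and total-effect (S_Ti)
    Sobol indices, using two independent sample matrices A and B and the
    'pick-and-freeze' matrices AB_i (column i of A replaced by column i of B)."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(lower, upper, size=(n, d))
    B = rng.uniform(lower, upper, size=(n, d))
    fA, fB = model(A), model(B)
    total_var = np.var(np.concatenate([fA, fB]), ddof=1)
    S, ST = np.empty(d), np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]
        fABi = model(ABi)
        S[i] = np.mean(fB * (fABi - fA)) / total_var         # first-order effect
        ST[i] = 0.5 * np.mean((fA - fABi) ** 2) / total_var  # total effect (Jansen estimator)
    return S, ST

def ishigami(X, a=7.0, b=0.1):
    """Ishigami function, a standard sensitivity-analysis benchmark."""
    return np.sin(X[:, 0]) + a * np.sin(X[:, 1]) ** 2 + b * X[:, 2] ** 4 * np.sin(X[:, 0])

first_order, total_effect = sobol_indices(ishigami, d=3)
```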
Cukier et al. [104] proposed the Fourier Amplitude Sensitivity Test (FAST) method to improve the approximation of Sobol’s indices. This is achieved by applying a Fourier transformation to transform a multi-dimensional integral into a one-dimensional integral, with different transformations leading to different distributions of sampled points. Saltelli et al. [105] improved upon FAST to compute the total-effect indices, while Tarantola et al. [106] extended random balance designs, applied by Satterthwaite in regression problems, to sensitivity analysis for non-linear, non-additive models by combining them with FAST (RBD-FAST). The RBD-FAST method was further improved in terms of computational efficiency by Plischke [107], while Tissot et al. [108] introduced a bias correction method in order to improve estimation accuracy.
Another method for global sensitivity analysis is that of Morris [110], often referred to as the one-step-at-a-time (OAT) method. Under this approach, the input variables are split into three groups: input variables whose contributions are insignificant, inputs that have significant linear effects of their own without any interactions, and inputs that have significant non-linear and/or interaction effects. This is achieved by discretising the input space for each variable and iteratively making a number of local changes (one at a time) at different points across the possible range of input values. The Morris method, while complete, is very costly; as a result, in some cases, fractional factorial designs, as described in [109], need to be formulated and employed in practice in order for sensitivity analysis to be performed more efficiently. By devising a more effective sampling strategy, as well as other improvements, Campolongo et al. [111] proposed an improved version of Morris’s method.
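A compact sketch of the elementary-effects idea behind Morris screening is given below; the trajectory construction is simplified and the `morris_screening` helper is an assumption for illustration, not the cited implementations.

```python
import numpy as np

def morris_screening(model, bounds, r=20, levels=4, seed=0):
    """Simplified Morris screening: for r random trajectories, change one input
    at a time by a fixed step delta on a discretised grid and record the resulting
    elementary effects. Returns mu* (mean absolute effect, overall importance) and
    sigma (spread of effects, indicating non-linearity/interactions)."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)        # shape (d, 2): per-input [low, high]
    d = len(bounds)
    delta = levels / (2.0 * (levels - 1))           # standard Morris step on [0, 1]
    grid = np.linspace(0.0, 1.0, levels)
    base_grid = grid[grid + delta <= 1.0]           # base points that can still move up

    def denorm(u):                                  # map unit-cube point to the real bounds
        return bounds[:, 0] + u * (bounds[:, 1] - bounds[:, 0])

    effects = np.zeros((r, d))
    for t in range(r):
        u = rng.choice(base_grid, size=d)           # random base point
        y = model(denorm(u))
        for i in rng.permutation(d):                # one-at-a-time moves in random order
            u_new = u.copy()
            u_new[i] += delta
            y_new = model(denorm(u_new))
            effects[t, i] = (y_new - y) / delta     # elementary effect of input i
            u, y = u_new, y_new
    return np.abs(effects).mean(axis=0), effects.std(axis=0)

# Example usage on a toy function of three inputs
mu_star, sigma = morris_screening(lambda x: x[0] + 2 * x[1] ** 2 + 0.1 * x[2],
                                  bounds=[(0, 1), (0, 1), (0, 1)])
```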
In some cases, variance is not a good proxy for the variability of the distribution. As a result, some studies have focused on developing sensitivity indices that are not based on variance, often referred to as moment-independent importance measures, which require no calculation of the output moments. One example is Borgonovo’s [112] distribution- or density-based sensitivity indices, which measure the distance or the divergence between the unconditional output distribution and the output distribution conditioned on one or more input variables. Building on the work of Borgonovo, Plischke et al. [113] introduced a new class of estimators for approximating density-based sensitivity measures, independent of the sampling generation method used.
Introduced by Sobol and Kucherenko [114], the method of derivative-based global sensitivity measures (DGSM) is based on averaging local derivatives using Monte Carlo or quasi-Monte Carlo sampling methods. DGSM, which can be seen as a generalisation of the Morris method, are much easier to implement and evaluate than the Sobol sensitivity indices.
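As a sketch of the DGSM idea (averaging squared local partial derivatives over the input distribution), the measures can be estimated with finite differences on Monte-Carlo samples; the `dgsm` helper below is an illustrative assumption and expects a vectorised model that maps an (n, d) array of inputs to n outputs.

```python
import numpy as np

def dgsm(model, sample_inputs, n=10_000, h=1e-5, seed=0):
    """Sketch of derivative-based global sensitivity measures:
    nu_i = E[(df/dx_i)^2], estimated via central finite differences
    at Monte-Carlo samples drawn from the input distribution."""
    rng = np.random.default_rng(seed)
    X = sample_inputs(rng, n)                           # (n, d) samples of the inputs
    n, d = X.shape
    nu = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        deriv = (model(X + e) - model(X - e)) / (2 * h)  # df/dx_i at each sample
        nu[i] = np.mean(deriv ** 2)
    return nu
```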
3.4.2. Adversarial Example-based Sensitivity Analysis
Adversarial examples are datapoints whose features have been perturbed by a subtle yet sufficient amount to cause a machine learning model to make incorrect predictions about them. Adversarial examples are similar to counterfactual examples; however, they do not focus on explaining the model, but on misleading it. Adversarial example-based sensitivity analysis methods are methods that create adversarial examples for different kinds of data, such as images or text.
It was Szegedy et al. [115] who first discovered that the functions learnt by deep neural networks can be significantly discontinuous, so their output is very fragile to certain input perturbations. The term “adversarial examples” was coined for such perturbations, and it was found that adversarial examples can be shared among neural networks with different architectures, trained on different subsets, disjoint or not, of the same data: the very same input perturbations that caused one network to misclassify can cause a different network to also alter its output dramatically. The problem of finding the minimal necessary perturbations was formulated as a box-constrained L2-norm optimisation problem, and the L-BFGS optimisation algorithm was employed to approximate its solution. Goodfellow et al. [116] argued that high-confidence neural network misclassifications caused by small, yet intentionally worst-case, datapoint perturbations were not due to nonlinearity or overfitting, but instead due to neural networks’ linear nature. In addition to these findings, they also proposed a fast, simple, yet powerful gradient-based method of generating adversarial examples using the L∞ norm, called the fast gradient sign method (FGSM).
Figure 8 illustrates the effectiveness of the FGSM method, where instances of the MNIST dataset are perturbed using different values of ε, resulting in the model misclassifying them.
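A minimal PyTorch sketch of the FGSM update (a single signed-gradient step of size ε, assuming inputs scaled to [0, 1]) is shown below; the `fgsm` helper name is illustrative.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon):
    """Fast Gradient Sign Method: take a single step of size epsilon in the
    direction of the sign of the loss gradient with respect to the input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()       # L-infinity bounded perturbation
    return x_adv.clamp(0.0, 1.0).detach()     # keep pixels in the valid [0, 1] range
```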
In order to test the sensitivity of deep learning models, Moosavi-Dezfooli et al. proposed DeepFool [117], a method that generates minimum-perturbation adversarial examples optimised for the L2 norm. By making simplifying assumptions, DeepFool employs an iterative process of classifier linearisation, producing adversaries that work well against both binary and multi-class classifiers. Moosavi-Dezfooli et al. [118] also came up with a formulation that is able to produce a single perturbation such that the classifier misclassifies most of the instances. The existence of these so-called “universal adversarial examples” exposed the inherent weaknesses of deep neural networks across all of their inputs. Papernot et al. [119] conducted a thorough investigation of adversarial behaviour within the deep learning framework and proposed a new class of algorithms able to generate adversarial instances. More specifically, the method exploits the mathematical relationship between the inputs and outputs of deep neural networks to compute forward derivatives and subsequently construct adversarial saliency maps. Finally, the authors pointed towards the development and utilisation of a distance metric between non-adversarial inputs and the corresponding target labels as a way to defend against adversarial examples. Kurakin et al. [120] highlighted that, although most studies of machine learning sensitivity assume that adversarial examples can be input directly into the classifier, this assumption does not always hold true for classifiers engaging with the physical world, such as those receiving input in the form of signals from other devices. To this end, among the other methods used, a new method that improves upon the FGSM [116] algorithm was introduced, whereby FGSM is repeated many times with a small step size, truncating the intermediate results after each step so that the produced adversarial examples (pixels in this case) remain within close range of the original examples. Dong et al. [121] promoted the use of momentum to enhance the process of creating adversarial instances with iterative algorithms, thus introducing a broad class of momentum-based iterative adversarial algorithms. Momentum is well known to help iterative optimisation algorithms, such as gradient descent, stabilise gradients and escape from local minima/maxima.
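The iterative and momentum-based variants described above can be sketched as follows; this is a simplified illustration in the spirit of [120,121] for image tensors of shape (batch, channels, height, width), not the authors’ reference implementations.

```python
import torch
import torch.nn.functional as F

def momentum_iterative_fgsm(model, x, y, epsilon, steps=10, mu=1.0):
    """Iterative FGSM with momentum: repeat small signed-gradient steps,
    accumulate a momentum term over the (L1-normalised) gradients, and clip the
    running adversary back into the epsilon-ball around the original input."""
    x0 = x.clone().detach()
    x_adv = x0.clone()
    g = torch.zeros_like(x0)
    alpha = epsilon / steps                               # per-step size
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        l1 = grad.abs().flatten(1).sum(dim=1).clamp(min=1e-12)
        g = mu * g + grad / l1.view(-1, 1, 1, 1)          # momentum accumulation
        x_adv = x_adv.detach() + alpha * g.sign()
        # clip back into the epsilon-ball and the valid pixel range
        x_adv = (x0 + (x_adv - x0).clamp(-epsilon, epsilon)).clamp(0.0, 1.0)
    return x_adv
```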
NATTACK: instead of seeking an optimal adversarial example, Li et al. [122] considered fitting a probability distribution in a neighbourhood centered around a given example, with the assumption being that any example generated from this distribution is a good adversary candidate. The proposed approach offers two distinct benefits: first, it can be employed to attack any model and, secondly, it does not require any knowledge of the model’s internal workings.
Carlini and Wagner [123] introduced three novel adversarial attack algorithms, based on the L0, L2, and L∞ norms, respectively, which proved very effective against neural networks, even those protected by the defensive distillation technique [124]. The proposed attacks aim to address the same minimal perturbation problem as Szegedy et al. [115], but they formulate it using the margin loss instead of the cross-entropy loss, thus minimising the distance between adversarial and benign examples in a more direct way. In [125], Carlini et al. demonstrated how to construct a provably strongest attack, also called the ground truth attack. The problem of finding adversarial examples proven to be of minimal distortion was formulated as a linear-like optimisation problem. The resulting adversarial example, having the greatest similarity to the original instance, is called the ground truth adversarial example.
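For illustration, the margin-loss term at the heart of this formulation can be written as below; this is a sketch for a targeted attack, and the `cw_margin_loss` helper name is an assumption.

```python
import torch

def cw_margin_loss(logits, target, kappa=0.0):
    """Carlini-Wagner style margin term for a targeted attack: encourage the
    target-class logit to exceed the largest non-target logit by at least kappa.
    Returns max(max_{i != t} Z_i - Z_t, -kappa) per example."""
    one_hot = torch.nn.functional.one_hot(target, logits.size(1)).bool()
    target_logit = logits[one_hot]                                  # Z_t per example
    best_other = logits.masked_fill(one_hot, float("-inf")).max(dim=1).values
    return torch.clamp(best_other - target_logit, min=-kappa)
```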
Spatially Transformed Attack: Xiao et al. [126] proposed perturbing images by performing slight spatial transformations, such as translating, rotating, and/or distorting the image features. Such perturbations are small enough to escape human attention but are able to trick models.
One-pixel Attack: Su et al. [127] showed how neural networks can be fooled by altering the value of just a single input pixel. By constraining the L0 norm, they enforced a limit on the number of pixels that are allowed to be perturbed.
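A sketch of the single-pixel idea using differential evolution as the black-box search procedure is shown below; `predict_proba`, the float image format (height × width × 3 in [0, 1]), and the `one_pixel_attack` helper are assumptions for illustration, not the cited implementation.

```python
import numpy as np
from scipy.optimize import differential_evolution

def one_pixel_attack(predict_proba, image, true_label, maxiter=30, popsize=10):
    """Sketch of a one-pixel attack: search for a single pixel (row, col, r, g, b)
    whose modification minimises the model's probability for the true class."""
    h, w, _ = image.shape
    bounds = [(0, h - 1), (0, w - 1), (0, 1), (0, 1), (0, 1)]   # pixel coords + RGB value

    def apply(z):
        perturbed = image.copy()
        r, c = int(round(z[0])), int(round(z[1]))
        perturbed[r, c, :] = z[2:]                              # overwrite one pixel
        return perturbed

    objective = lambda z: predict_proba(apply(z))[true_label]   # minimise true-class score
    result = differential_evolution(objective, bounds, maxiter=maxiter,
                                    popsize=popsize, tol=1e-5, seed=0)
    return apply(result.x)
```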
Zeroth order optimisation based attack (ZOO): assuming that one has access to the prediction probability scores (rather than just the predicted labels) of a classifier and the respective inputs, Chen et al. [128] proposed an algorithm to infer the gradient information by observing the changes in the prediction scores, thus eliminating the need for a substitute model when creating adversarial examples.
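A sketch of the zeroth-order gradient estimation idea follows, using symmetric finite differences on the black-box probability scores; `predict_proba` is an assumed black-box callable and the helper is illustrative rather than the cited implementation.

```python
import numpy as np

def zoo_gradient_estimate(predict_proba, x, target_class, h=1e-4, n_coords=128, seed=0):
    """Zeroth-order (black-box) gradient estimate: approximate
    d log p(target | x) / dx_i with symmetric finite differences on the model's
    probability scores, for a random subset of coordinates."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(x, dtype=float)
    coords = rng.choice(x.size, size=min(n_coords, x.size), replace=False)
    for i in coords:
        e = np.zeros(x.size)
        e[i] = h
        e = e.reshape(x.shape)
        f_plus = np.log(predict_proba(x + e)[target_class] + 1e-12)
        f_minus = np.log(predict_proba(x - e)[target_class] + 1e-12)
        grad.flat[i] = (f_plus - f_minus) / (2 * h)   # estimated partial derivative
    return grad
```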
In their study [129], Narodytska et al. focused on generating adversarial examples for any deep convolutional neural network without prior knowledge of the internal workings of the network in question. To this end, they proposed two pixel-perturbing methods that operate without using any gradient information: the first randomly selects and perturbs a set of pixels, while the second improves upon the first by incorporating a greedy local-search algorithm to efficiently locate a better set of pixels to perturb. Introduced in [130], HopSkipJumpAttack is a family of adversarial-example generating algorithms that rely on binary information about the decision boundary and Monte Carlo methods in order to approximate the direction of the gradient. The method is able to produce both targeted and non-targeted examples that are optimised for the L2 and L∞ norms.
Liu et al. [131] performed a thorough investigation of the transferability of both non-targeted and targeted adversarial examples using models and datasets of large scale, concluding that, while transferring non-targeted adversarial examples can be very effective in fooling neural networks, targeted adversarial examples do not transfer as well. To this end, they proposed new ways of producing effective, transferable adversarial examples, both targeted and non-targeted, with a high success rate when tested against a black-box image classification model. Houdini [132] is an approach proposed by Cisse et al. that is able to produce adversarial instances for any specified task, according to the respective measure of performance. Houdini’s adversarial attacks were employed with success on a variety of structured prediction tasks, including the typical image classification challenge, but also extending the use of adversarial examples to other problems, such as voice recognition, pose estimation, and semantic segmentation. Finally, it should be noted that, in terms of measures of performance for the different tasks, Houdini is capable of handling complex measures, even non-decomposable ones, as well as combinations of measures. In [133], a novel approach that uses an elastic net-based regularisation framework (the combination of the L1 and L2 norms) to generate adversarial instances against deep neural networks was proposed. Empirical results on three different image datasets showed that the proposed framework was able to produce adversarial examples that can break through the defensive distillation technique and have high transferability. Lastly, the inner workings of the method and its way of exploiting the L1 norm revealed new useful insights into the relationship between the L1 norm and the generation of effective adversarial examples. Papernot et al. [134] proposed a novel method for generating adversarial examples by examining the inputs provided to a deep neural network and the corresponding labels assigned by the network. The method consists of training a model using synthetic instances, generated by an adversary, as input and the neural network’s predictions for these instances as the true labels. The trained model is subsequently used to create adversarial examples to attack the neural network. Such examples would be misclassified not only by the trained model, but also by the neural network, as, by definition, the two would have similar decision boundaries.
Brendel et al. [135] highlighted the lack of scientific studies regarding decision-based adversarial attacks and pointed to the benefits and the versatility of such attacks, namely that they can be used against any black-box model, require only observing the model’s final decisions, are easier to implement compared to transfer-based attacks, and, at the same time, are more effective against simple defences than gradient-based or score-based attacks. To support their arguments, they introduced the so-called Boundary Attack, a decision-boundary based adversarial attack, which, in principle, begins by creating adversarial instances with a high degree of perturbation and subsequently decreases the level of perturbation. More specifically, through a rejection process, the method learns the decision boundary between non-adversarial and adversarial instances and, with this knowledge, is able to generate effective adversaries. Brendel et al. [136] also developed a novel family of gradient-based adversarial attacks that not only performed better than previous gradient-based attacks, but were also more effective against gradient masking, more efficient in terms of querying the defended model, and able to optimise for a variety of adversarial criteria. Unlike other methods that explore areas far away from the decision boundary and, as a result, might get stuck, the point-wise attack only stays in areas close to the boundary, where gradient signals are more reliable, in order to minimise the distance between the adversarial and original example. Koh and Liang [137] proposed an indirect method of generating adversarial examples. The proposed method is capable of explicitly calculating what the difference in the final loss would be if one training example were altered, without retraining the model. By identifying the training instances with the highest impact on the model’s predictions, powerful adversaries can be deduced.
In the works of Zugner et al. [138] and Dai et al. [139], adversarial examples in graph-structured data were studied. The former method is a greedy approach concerned with attacking node classification models through the modification of node connections (adding/removing edges between nodes) or node features (flipping node features with a limited number of operations). Three different settings were considered: manipulation of all nodes in the graph, of a set of nodes including the node in question, and of a set of nodes excluding the node in question. The latter attack method is based on a reinforcement learning formulation of the problem and, more specifically, a Q-Learning game. Under this approach, only the addition and removal of edges is allowed when altering the graph structure.
In [140], a graph attack based on meta-learning was proposed. Meta-learning has historically been employed for fast reinforcement learning, hyperparameter tuning, and few-shot image recognition. In this scenario, the graph structure of the network was used as input to a meta-learning algorithm as the hyperparameter to be optimised.
Sharif et al. [141] proposed a method for fooling face recognition neural networks by modifying the original images through the insertion of 3D-printed sunglasses into the original face images. The colour of these glasses was optimised towards leading the neural network to misclassify the faces in question. Hayes and Danezis [142] introduced a generative universal adversarial example framework, whereby image perturbations are produced by a generative model, such that, when incorporated into a normal, non-adversarial instance, they transform it into an adversarial instance. Because the generator is not conditioned on the given images, the generated perturbations can be applied to any image to transform it into an adversarial one. Schott et al. also developed a high-accuracy image classification model, robust against adversarial attacks, that utilises the analysis-by-synthesis approach [143]. More specifically, for each instance in the dataset, a lower bound of the ELBO loss given each class is calculated and, subsequently, these class-conditional ELBOs are synthesised in order to produce the final prediction. Furthermore, two new attacks were developed: one specifically tailored to work well against the proposed model by exploiting its structure, and a decision-based attack that optimises towards the smallest number of perturbed pixels.
In noise-based adversarial attacks, original examples are perturbed with the addition of some form of noise before being passed as input to a machine learning model. However, in many cases, this addition of noise can cause some input values to fall outside their originally defined domain, and therefore clipping is required before they can be passed to the model. The clipping methods proposed prior to [144] were relatively slow and only provided approximations to the optimal solution, thus diminishing the effectiveness of the produced adversarial examples. In order to improve both the effectiveness and speed of the previously proposed clipping methods, Rauber and Bethge [144] proposed a fast and differentiable algorithm to rescale perturbation vectors, under which a perturbation with the desired norm after clipping can be analytically calculated using a closed-form solution.
Adversarial example vulnerability also exists in deep reinforcement learning modelling, as demonstrated by Huang et al. [145]. By employing the FGSM method [116], the authors created adversarial states to manipulate the network’s policy. They showed that even slight state perturbations can potentially lead to very significant differences in terms of performance and decisions.
Yang et al. [146] focussed on generating adversarial examples for discrete data, such as text. Firstly, a two-step greedy approach that locates which words in a piece of text to perturb and then alters them accordingly was implemented; secondly, they proposed a novel method, called Gumbel, in which the two steps of the first approach were parameterised and a model was trained to find the optimal ones. Samanta and Mehta [147] as well as Iyyer et al. [148] proposed methods for generating adversarial sentences that are both grammatically correct and in agreement with the syntax of the original sentences. To this end, the former replaced original words with synonyms and exploited words that, when used in different contexts, have different meanings, while the latter used paraphrasing techniques. Miyato et al. [149] proposed applying perturbations to the word embeddings in a recurrent neural network instead of the original input. The produced word embeddings were shown to be of greater quality, while the resulting model was shown to be less prone to over-fitting. Ebrahimi et al. [150] considered replacing a single character in a sentence in order to fool character-based text classifiers. Using gradient information, the method identifies the most influential letter to be replaced. A closely related work [151] by Liang et al. creates adversaries by adding, removing, and altering words or phrases instead of single characters. Such words or phrases are identified as more or less influential based on the influence of their individual characters, similarly to [150].
In their study, Jia and Liang [152] investigated generating adversarial examples for reading comprehension tasks: given a paragraph and a related question, the model has to generate an answer. Focusing on models using the Stanford Question Answering Dataset (SQuAD), they devised two attacks, ADDSENT and ADDANY, which both try to create adversarial examples by adding words from the original question. In addition, two variants of the original attacks were developed: ADDONESENT, where a random human-approved sentence is added to the original paragraph, and ADDCOMMON, which is identical to ADDANY, except that common words are added instead. Alzantot et al. [153] proposed a method to generate adversarial examples for text using a population-based genetic algorithm. The algorithm, which operates by looping through every word in each sentence and applying perturbations based on swapping counter-fitted word embeddings, yielded very high success rates when its adversarial examples were used to attack sentiment analysis models as well as textual entailment models. A similar idea was later also proposed by Kuleshov et al. [154], which uses word replacement by greedy heuristics, while later Wang et al. [155] improved upon the genetic algorithm, achieving not only higher success rates, but also lower word substitution rates and more transferable adversarial examples when compared to [153].
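As a simplified sketch of the greedy word-substitution strategy shared by several of these text attacks, consider the loop below; the `classify` callable (returning an array of class probabilities for a token list) and the `synonyms` lookup are assumptions for illustration, not the cited implementations.

```python
def greedy_synonym_attack(classify, synonyms, tokens, true_label, max_changes=5):
    """Greedy word substitution: at each step, apply the single synonym swap that
    most reduces the probability of the true label, until the prediction flips
    or a change budget is exhausted."""
    tokens = list(tokens)
    for _ in range(max_changes):
        probs = classify(tokens)
        if probs.argmax() != true_label:
            return tokens                      # mis-classification achieved
        best_drop, best_edit = 0.0, None
        for i, word in enumerate(tokens):
            for cand in synonyms(word):
                trial = tokens[:i] + [cand] + tokens[i + 1:]
                drop = probs[true_label] - classify(trial)[true_label]
                if drop > best_drop:
                    best_drop, best_edit = drop, (i, cand)
        if best_edit is None:
            break                              # no substitution helps any further
        i, cand = best_edit
        tokens[i] = cand
    return tokens
```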
DeepWordBug: the basic idea behind DeepWordBug [156] is to come up with a scoring strategy that is able to determine those text pieces which, if manipulated, are most likely to force a model into misclassifications. Such manipulations include token insertions, deletions, and substitutions, as well as k-nearest-neighbour token swaps based on cosine similarity. Textbugger [157] works in a similar fashion, providing improvements over DeepWordBug through the introduction of novel scoring functions.
Seq2Sick: Cheng et al. [158] considered adversarial attacks against seq2seq models, which are widely adopted in text summarisation and neural machine translation tasks. The two main challenges in producing successful seq2seq attacks are the discrete input domain and the almost infinite output domain. The former was addressed through the development of a projected gradient method that combines the regularisation method with group lasso, while the latter was handled by using newly-proposed loss functions.
Feng et al. [159] introduced a process called “input reduction”, which can expose issues regarding overconfidence and oversensitivity in natural language processing models. Under input reduction, non-important words are removed from the input text in an iterative fashion, while the model’s prediction for that input remains unchanged. The authors demonstrated that input texts can have their words removed to a degree where they make no sense to humans, without any impact on the model’s output. Ren et al. [160] proposed a greedy algorithm for textual adversarial example generation, called probability weighted word saliency (PWWS), which follows the synonym substitution strategy, but replaces words based on word saliency and classification probability. TextFooler [161] generates adversarial examples for text by utilising word embedding distance and part-of-speech matching to first identify the most important words in terms of the model’s output and subsequently greedily replaces them with synonyms that fit both semantically and grammatically until a misclassification occurs. The BERT language model was utilised in two studies in order to create textual adversarial examples: Garg and Ramakrishnan [162] and Li et al. [163] both proposed generating adversarial examples through text perturbations based on the BERT masked language model, whereby part of the original text is masked and alternative text pieces are generated to replace these masks. In their work [164], Tan et al. proposed Morpheus, a method for generating textual adversarial examples by greedily perturbing the inflections of the original words in the text to find the inflected forms with the greatest loss increase, only taking into consideration inflections that belong to the same part of speech as the original word. Unlike most work on textual adversarial examples, Morpheus produces its adversaries by exploiting the morphology of the text. Zang et al. [165] suggested applying word substitutions using the minimum semantic units, called sememes. The assumption was that the sememes of a word are indicative of the word’s meaning and, therefore, words with the same sememes should be good substitutes for one another. To search for such words efficiently, an algorithm based on particle swarm optimisation (PSO) was proposed.
Studies on sensitivity analysis in recent years have focussed on exposing the weaknesses of deep learning models and their vulnerability to adversarial attacks. The literature is extensive when it comes to fooling models in computer vision and natural language processing tasks. However, minimal work has been done on tabular data: in theory, some of the adversarial example generation techniques from computer vision could be applied to tabular data, but their effectiveness has not yet been clearly demonstrated.