A Study of Adversarial Attacks and Detection on Deep Learning-Based Plant Disease Identiﬁcation

: Transfer learning using pre-trained deep neural networks (DNNs) has been widely used for plant disease identiﬁcation recently. However, pre-trained DNNs are susceptible to adversarial attacks which generate adversarial samples causing DNN models to make wrong predictions. Successful adversarial attacks on deep learning (DL)-based plant disease identiﬁcation systems could result in a signiﬁcant delay of treatments and huge economic losses. This paper is the ﬁrst attempt to study adversarial attacks and detection on DL-based plant disease identiﬁcation. Our results show that adversarial attacks with a small number of perturbations can dramatically degrade the performance of DNN models for plant disease identiﬁcation. We also ﬁnd that adversarial attacks can be effectively defended by using adversarial sample detection with an appropriate choice of features. Our work will serve as a basis for developing more robust DNN models for plant disease identiﬁcation and guiding the defense against adversarial attacks.


Introduction
On a global scale, pathogens and pests are one major reason that reduces the yield and quality of agricultural production. According to the study of [1], estimated yield losses of five major crops (wheat, rice, maize, potato, and soybean) due to pathogens and pests range from 10.1% to 41.1% globally. Identifying plant diseases in an early time can prevent further yield losses of production by informing farmers of appropriate treatment processes regarding the diagnosis. Advanced lab-based methods for plant disease diagnosis such as DNA-based and serological methods, are accurate and authentic [2]. However, these methods are either more time-consuming or more costly than those based on the visual observation of symptoms shown on the organs of species. Traditionally the visual observations are performed by experienced experts or producers which are labor intensive and not reliable [3,4]. In recent years, due to the advancement of information and communication technologies (ICT), acquiring images from farms can be done easily by human observers using mobile devices [5] or automated sensing technologies such as unmanned aerial vehicles (UAVs) [6]. Thus, automatic identification of plant diseases using plant leaf images becomes more and more popular [7].
The automation of plant disease identification methods is achieved through predictive models built with machine learning algorithms. Traditional machine learning algorithms such as k-nearest neighbor (KNN) [8], artificial neural network (ANN) [9], support vector machine (SVM) [10], and random forest [11] have been widely applied for the problem. These algorithms rely heavily on features generated from plant leaf images, which require advanced image processing techniques and extensive involvement of domain experts. Recently, deep learning (DL) has emerged as a promising solution for many computer vision applications including image-based plant disease identification [12]. Instead of relying on advanced image processing techniques and domain experts, DL-based methods use deep neural networks (DNNs) that are capable of automatically extracting image features from raw data [13]. In addition, it has been shown that DL offers significantly better detection performance than traditional machine learning algorithms [3,5,12,14,15]. The major challenge associated with DL-based methods is the need for large amounts of data and vast computing resources to train DNN models. Fortunately, transfer learning solves this problem by using a pre-trained model from a similar domain instead of starting the model training from scratch [16,17]. Majority of DL-based plant disease identification models were built based on pre-trained DNN models such as VGGNet [18], ResNet [19], Inception [20], and DenseNet [21].
Although DL models have shown superior performance in many applications, they are susceptible to carefully crafted adversarial attacks. Adversaries can easily perturb normal samples to produce adversarial samples which cause DL models to make wrong predictions [22]. Adversarial attacks against DL can be categorized as white-box, gray-box, and black-box attacks of which the difficulties increase in order [23]. The pre-trained model used in transfer learning is usually publicly available to both normal users and adversaries. Based on this vulnerability, an adversary can generate adversarial samples solely with the knowledge of the pre-trained model to launch an effective and efficient white-box attack [24]. Recently web-based plant disease identification systems with DL have been proposed which use leaf images uploading from smartphones [25,26]. Adversaries can intercept uploaded normal images and apply white-box attacks to convert them to adversarial images. In the end, the misdiagnosis of plant diseases by the systems could result in a significant delay of treatments and huge economic losses. In this paper, we conduct a comprehensive study of the effects of popular white-box adversarial attacks on pre-trained DNN models widely used in plant disease identification. We also investigate the effectiveness of different adversarial sample detection methods on defending adversarial attacks. To the best of our knowledge, our work is the first attempt to investigate adversarial attacks and detection on DL-based plant disease identification.
The rest of this paper is organized as follows. In Section 2, the popular pre-trained DNN models for plant disease identification are introduced followed by the description of white-box adversarial attacks and adversarial sample detection methods. The experiments and results of adversarial attacks and detection on DL-based plant disease identification are presented in Section 3. Finally, we conclude the paper in Section 4.

Methods
In this section, we first describe the problem of plant disease identification. Popular pre-trained DNN models for plant disease identification are then introduced. After that, white-box adversarial attacks and adversarial sample detection methods adopted in this paper are presented.

Plant Disease Identification Problem
Plant disease identification is a classification problem that can be either 2-class or multi-class. A 2-class model classifies a plant leaf image as healthy or diseased while a multi-class model does a more fine-grained classification to predict the input image as healthy or a certain type of disease. Figure 1 shows the system architecture adopted in this paper that applies a DL model for plant disease identification. To alleviate the problem of insufficient training data for a DL model, the system transfers knowledge from the source domain of ImageNet [27] to the target domain of plant disease detection. This process first employs the partial network from the source domain to serve as a feature extractor of the new network and then replaces the final output layer of the source domain network with a new dense layer followed by a SoftMax function corresponding to the plant disease dataset. Finally, the new network is fine-tuned by using the plant disease dataset. Fine-tuning can optimize network parameters wholly or partially [28]. The shallow training of our work fine-tunes all network parameters.

VGGNet
VGGNet is a deep convolutional neural network (CNN) model proposed for the ILSVRC-2014 challenge [18]. The input of the model is a fixed-size 224 × 224 image, which passes through a stack of convolutional layers with 3 × 3 filter. The model also uses 2 × 2 max-pooling layer following some convolutional layers for down-sampling which reduces the input size of later layers. The end of the network consists of two fully connected layers with 4096 neurons each followed by a SoftMax layer. Depending on the number of convolutional layers of the network, there are two VGGNet architectures: VGG-16 and VGG-19. VGG-16 is considered in our study.

ResNet
Deep residual networks (ResNet) were proposed by He et al. in [19], which have shown compelling performance and good convergence behaviors. The ResNet architecture accepts a 224 × 224 image as input. It consists of a stack of residual blocks, which are feedforward neural networks with shortcuts (or skip connections). Shortcuts are connections skipping over some layers which are used to deal with the problem of vanishing-gradients as the network goes deeper. The ResNet architecture considered in this paper is ResNet-101.

Inception
The idea of Inception was first introduced in [20] as a module for the GoogleNet architecture, which approximates an optimal local sparse structure of a convolutional network by dense components. An Inception Module is a stack of a max-pooling layer and convolution layers, which is the basic module to construct the Inception network. Inception V3 proposed in [36] is considered in this paper, which is the 3rd version of Inception architecture. Inception V3 inherits the basic idea of "Inception Module" and the batch normalization introduced in Inception V2 [37]. In addition, three new features are added to Inception V3 including convolution factorization, efficient grid size reduction, and auxiliary classifier [36]. The input of Inception V3 is a fixed-size 299 × 299 image.

DenseNet
DenseNet was introduced by Huang et al. in [21], which maximizes the information flow between layers by connecting a layer to other layers in a feed-forward manner. The inputs of a layer in DenseNet are features maps of all preceding layers. The feature maps of a layer will then be used as inputs of all subsequent layers. DenseNet requires significantly fewer parameters than traditional deep CNNs [21]. It also provides other benefits including alleviating the vanishing-gradient problem, strengthening feature propagation, and encouraging feature reuse [21]. In this paper, DenseNet-121 is considered to be the DenseNet architecture, which accepts a fixed-size 224 × 224 image as input.

Adversarial Attacks
The idea of using adversarial samples to cause the misclassification of a DL model was first explored by Szegedy et al. [22]. There are two kinds of adversarial samples that can be crafted. Given a dataset ( is the corresponding label of the sample, an untargeted adversarial sample x x x is crafted to make f ({x = y yet x x x and x x x are close according to certain metric. Another more powerful but harder attack uses targeted adversarial samples. Given a normal sample x x x and a target label y = f (x x x) , the attacking algorithm searches for an adversarial sample x x x such that f (x x x ) = y where x x x and x x x are close. To measure the similarity between x x x and x x x , L 0 , L 2 , L ∞ are the three most widely used distance metrics among all L p -norms for generating adversarial samples [38]. The L p distance is also written as ||x For the three distance metrics, L 0 norm measures the number of points that differ between x x x and x x x , L 2 norm is the standard Euclidean distance, and L ∞ norm measures the maximum change to any of the points. Although all three distance metrics approximate to the human perceptual similarity, L ∞ , also known as max-norm, is the most commonly used one due to its better consistency to human perception [39].
There are three types of adversarial attacks: white-box, gray-box, and black-box attacks. In this work, we focus on untargeted white-box attacks under the L ∞ norm distance metric. White-box attacks require attackers have the highest-level knowledge of the model among the three types of attacks, which then have a greater impact on the performance of DL-based models than other two types of attacks. Since many DL-based plant disease identification schemes are built based on pre-trained DNN models, adversarial samples generated in the white-box manner against a fine-tuned model based on a pre-trained DNN model can be successfully transferred to the target model [24]. In the following, we describe four popular white-box attacks considered in this study: fast gradient sign method (FGSM) [40], basic iterate method (BIM) [41], projected gradient descent (PGD) [39], and Carilini and Wagner attack (CW) [38].

FGSM
FGSM is a popular untargeted white-box attack introduced by Goodfellow et al. [40], which perturbs one-step along the gradient direction of the adversarial loss J(θ θ θ, x x x, y) with a max-norm constraint of :

BIM
BIM was proposed in [41] which extends FGSM as an iterative method. BIM perturbs the input x x x iteratively with a step size α under the max-norm .
where x t x t x t is the adversarial sample generated at t-th step, J(θ θ θ, x x x, y) is the optimization loss of adversarial attack in which θ θ θ represents the weights of model. The number of perturbation steps, T, is chosen heuristically [41]. The step size is usually set to /T ≤ α < . Clip x, (x ) is defined as following.
where img_max and img_min are the maximum and minimum of image range, e.g., 1 and 0 for images in the range [0,1].

PGD
PGD attack was proposed by Madry et al. [39] to find adversarial samples. PGD perturbs a normal sample x x x for a total of T steps where each step perturbs x x x in the gradient direction of the adversarial loss with a projection constraint, which is a set of allowed perturbations denoted as S ⊆ R d .
where α is the step size, and Π(·) is the projection function which projects the intermediate perturbation into the valid data range and the L ∞ -ball around the normal sample x x x. PGD is similar to BIM with the differences of the projection step and random start.

CW
CW attack is an optimization-based adversarial attack [38], which achieves a perfect attack success rate against defensive distillation, an efficient approach hardening neural networks against adversarial samples [42]. CW attack creates an adversarial sample x x x from a normal sample x x x ∈ [0, 1] by minimizing ||δ||p where δ is the pixel-wise difference between x x x and x x x , p denotes the L p norm distance metric (L ∞ in our study). The box constraints of CW attack uses the idea of change of variables instead of projected gradient descent and clipped gradient descent used in PGD and BIM, respectively [38], which introduces and optimizes over a new variable w to smooth the box constraint process as follows:

Detection of Adversarial Samples
Several adversarial defenses have been proposed in recent years for DL models such as adversarial training [22,40], input data compression [43], gradient regularization [44], and defensive distillation [42]. However, those defenses were proved later that do not work partially or wholly [45]. Therefore, recent research has more focused on detection-based defenses [46][47][48] that detect adversarial samples using features extracted from trained DL models. In the following, we describe adversarial sample detection methods investigated in our study.

Kernel Density (KD) and Bayesian Uncertainty (BU)
KD estimation was proposed in [46] as a measure to submanifold in the feature space of the last hidden layer. The assumption is that an adversarial sample lies far from the normal data manifold. Given a sample x x x and a training set X t X t X t of class t, the KD estimation of x x x based on Gaussian distribution is calculated as: where σ is the bandwidth of the kernel. σ controls the smoothness of the density estimation which we heuristically set it to 1.2. BU, the second detection method proposed in [46], is based on an approximation from dropout mechanism in DNNs to the deep Gaussian process. The uncertainty is the additional information to the label prediction which gives a confidence interval to the prediction. Features extracted with KD and BU can be used as the input of a machine learning-based detector.

LID
LID models dimensional characteristics of adversarial subspaces based on the distance distribution amid adversarial samples [48]. The argument of [48] is that KD can fail to differentiate adversarial samples from normal samples which are differentiable in highdimension manifold but not in low-dimension manifold. Given a sample x x x, the maximum likelihood estimator of LID uses its distances to k nearest neighbors: where r i is the distance between x x x and its i-th nearest neighbor, r k (x x x) is the distance between x x x and x x x's furthest neighbor of k nearest neighbors. A LID-based detector is built based on LID computed from each layer of the DNN under a mini-batch manner for given training samples.

SafetyNet
Lu et al. [47] proposed an adversarial sample detection architecture called SafetyNet which uses RBF-SVM as the adversarial detector with features extracted from the outputs of later activation layers. The hypothesis of the method is that adversarial samples produce different patterns of activation in late stage than those produced by normal samples. There are two different kinds of features used in [47]: raw features extracted directly from activation denoted as DeepF and discrete features obtained by quantizing activation as discrete levels denoted as DiscF. DiscF forces the attacker to solve a hard discrete optimization problem [47]. Both DeepF and DiscF are considered in our study.

Experiments and Results
In this section, the efficacy of adversarial attacks (FGSM, BIM, PGD, and CW) against four popular DNN models (VGG-16, ResNet-101, Inception V3, and DenseNet-121) for plant disease identification is investigated. The effectiveness of adversarial sample detection methods against adversarial attacks is also studied. Without loss of generality, we present the results of apple leaf disease identification for which several DL models have been developed [30,35]. Although we only report the results of apple leaf disease identification, our unreported experiments obtained similar results from other leaf disease datasets.

Datasets
We use a publicly available apple leaf disease dataset which is a subset of the PlantVillage dataset [49]. Table 1 shows the details of the dataset including classes of apple leaf images and the number of images for each class. The dataset can be directly used to build multi-class disease identification models that a leaf image is labeled as one of the four classes (healthy or one of the three diseases). To build 2-class disease identification models, all images of three disease classes are labeled as "diseased" which are combined with healthy images to form the dataset. Total 3171

Performance of Fine-Tuned DNN Models without Adversarial Attacks
To evaluate the performance of different DNN models without adversarial attacks, the apple leaf disease dataset (2-class or multi-class) is divided as a training set and a testing set with an 80/20 ratio. For each DNN model, a leaf image is resized as the standard input size required by a pre-trained DNN model. In the fine-tuning phase, each DNN model is fine-tuned from weights pre-trained using ImageNet dataset [27]. The models are trained with a stochastic gradient descent (SGD) optimizer with an initial learning rate of 0.001 and momentum of 0.9. The learning rate decays with a rate of 0.1 every 7 epochs. Our shallow training has 100 epochs with an early stopping of 7-epoch tolerance. The test accuracy of the four fine-tuned DNN models are presented in Table 2. It can be seen that without adversarial attacks all models perform very well on both 2-class and multi-class disease identification.

Efficacy of Adversarial Attacks
To investigate the efficacy of adversarial attacks, we perturb the testing set used in Section 3.2. Each of the four adversarial attacks of Section 2.3 is applied to generate adversarial samples from randomly selected 50% of the test samples. The generated adversarial samples are combined with another half of normal samples as the testing set for adversarial attacks. Please note that all four attacks are bounded by a pre-defined maximum perturbation size with respect to the L ∞ norm. We generate different testing sets by varying from 0.2/255 to 4/255. Examples of adversarial images generated by the four attacks and their corresponding normal images under = 1/255 for fine-tuned 2-class and multi-class VGG-16 models are shown in Figures 2 and 3, respectively. It can be seen that adversarial images generated by the four attacks under small perturbations are hard to distinguish from original images by human eyes. Figures 4 and 5 show the results of applying adversarial attacks on fine-tuned 2-class and multi-class DNN models, respectively. It can be observed that the results of 2-class and multi-class models are similar. For all models, the accuracy of disease identification drops significantly as increases which demonstrates the efficacy of the attacks. One can find that three iterative perturbation attacks (BIM, PGD, and CW) are more efficient than the only one-step perturbation attack, FGSM. Another interesting finding is that VGG-16 is the most robust one against adversarial attacks among the four DNN models although its performance is still significantly degraded under attacks.

Results of Adversarial Sample Detection
The testsets generated for adversarial attacks with = 1/255 in Section 3.3 are used as the datasets for evaluating the performance of adversarial sample detection methods. Detection features are extracted from the fine-tuned DNN models of Section 3.2. The KD and BU features are estimated from the second-last layer and the SoftMax layer of the network, respectively. The LID features are calculated from the outputs of all layers of the network with a min-batch size of 100. DeepF and DiscF features are obtained from the second-last layer of the network. According to [46,48], logistic regression classifier is used for KD + BU and LID features. RBF-SVM classifier is used for DeepF and DiscF features based on the SafetyNet architecture [47]. We apply a 5-fold cross-validation to evaluate the performance of different adversarial sample detection methods. Tables 3 and 4 show the results of adversarial sample detection for fine-tuned 2-class and multi-class DNN models, respectively. The results show that KD + BU features achieve significantly better performance than other features with a nearly perfect detection rate. This demonstrates that adversarial attacks on DL-based plant disease identification models can be effectively defended by using adversarial sample detection with an appropriate choice of features. Surprisingly, LID features are the worst performed ones which were shown superior performance over KD + BU features on three benchmark image datasets: MNIST, CIFAR-10, and SVHN in [48]. This implies that the intrinsic characteristics of leaf images used for plant disease identification are different from those of benchmark images. Finally, we investigate the transferability of detection models built with KD + BU features. The detection model for a DNN model is trained with a testset generated in Section 3.3 that consists of normal samples and adversarial samples generated by one of the four attacks. The model is then applied for detecting adversarial samples generated by other three attacks. Tables 5 and 6 show the detection performance in terms of accuracy for 2-class and multi-class DNN models, respectively. It can be seen that detection models for 2-class and multi-class DNN models have comparable transferability. Detection models trained with adversarial samples generated by CW attack have the best transferability which can perfectly detect adversarial samples generated by other three attacks except one case (Inception V3, 2-class).

Conclusions
Pre-trained DNN models have been widely used in machine learning and computer vision applications including plant disease identification. In this paper, the vulnerabilities of DL-based plant disease identification models under four popular white-box adversarial attacks are investigated. Our results show that all attacks can significantly affect the performance of DNN models for plant disease identification. A small number of perturbations introduced by the attacks on acquired leaf images can lead to a significant degradation of disease identification performance. It is found that VGG-16 is more robust against attacks than other DNN models. We then study the effectiveness of adversarial sample detection methods based on features extracted from fine-tuned DNN models. The results show that attacks can be effectively detected with an appropriate choice of features such as KD + BU. The findings of this paper will serve as a basis for developing more robust DNN models for plant disease identification and guiding the defense against adversarial attacks.