Identification of Plant-Leaf Diseases Using CNN and Transfer-Learning Approach

The timely identification and early prevention of crop diseases are essential for improving production. In this paper, deep convolutional-neural-network (CNN) models are implemented to identify and diagnose diseases in plants from their leaves, since CNNs have achieved impressive results in the field of machine vision. Standard CNN models require a large number of parameters and a high computation cost. We therefore replaced standard convolution with depthwise separable convolution, which reduces the number of parameters and the computation cost. The implemented models were trained on an open dataset covering 14 plant species and 38 classes of diseased and healthy plant leaves. To evaluate the performance of the models, different parameters such as batch size, dropout, and number of epochs were varied. The implemented models achieved disease-classification accuracy rates of 98.42%, 99.11%, 97.02%, and 99.56% using InceptionV3, InceptionResNetV2, MobileNetV2, and EfficientNetB0, respectively, which were greater than those of traditional handcrafted-feature-based approaches. In comparison with other deep-learning models, the implemented models achieved better accuracy and required less training time. Moreover, the MobileNetV2 architecture is compatible with mobile devices because of its optimized parameter count. The accuracy results showed that the deep CNN model is promising, can greatly impact the efficient identification of diseases, and may have potential in the detection of diseases in real-time agricultural systems.


Introduction
The automated identification of plant diseases based on plant leaves is a major landmark in the field of agriculture. Moreover, the early and timely identification of plant diseases positively impacts crop yield and quality [1]. Because a large number of crop products are cultivated, even agriculturists and pathologists may often fail to identify plant diseases by visually inspecting disease-affected leaves. However, in the rural areas of developing countries, visual observation is still the primary approach to disease identification [2]. It also requires continuous monitoring by experts. In remote areas, farmers may need to travel far to consult an expert, which is time-consuming and expensive [3,4]. Automated computational systems for the detection and diagnosis of plant diseases assist farmers and agronomists with their high throughput and precision.
To overcome the above problems, researchers have proposed several solutions. Various types of feature sets can be used in machine learning for the classification of plant diseases. Among these, the most popular feature sets are traditional handcrafted features.
In this paper, different convolutional-neural-network (CNN) architectures such as InceptionV3, InceptionResNetV2, MobileNetV2, and EfficientNetB0 are implemented to diagnose plant diseases on the basis of healthy- and diseased-leaf images. In InceptionV3 and InceptionResNetV2, standard convolution was replaced with depthwise separable convolution, which reduced the number of parameters by a large margin while achieving the same accuracy level. The implemented InceptionV3 and InceptionResNetV2 use fewer parameters and are faster than the standard InceptionV3 and InceptionResNetV2 architectures.
A transfer-learning-based CNN was applied to the MobileNetV2 and EfficientNetB0 models. In each model, we froze the layer weights before the fully connected layer and removed all layers after that. We added a stack of an activation layer, a batch-normalization layer, and a dense layer. After each batch-normalization layer, we used a dropout layer with different dropout values to prevent the architecture from overfitting. Because the number of features was large, we used the L1 and L2 regularization techniques in the dense layer of all models, which simplified the models. We fine-tuned the network and performed extensive testing with different parameters to achieve optimal results: batch sizes in the range of 32-180, dropout values in the range of 0.2-0.8, and learning rates in the range of 0.01-0.0001. The models were trained with different numbers of epochs. To examine the robustness of the models, we used three formats of images, namely, color, segmented, and grayscale images. We compared the performance of the implemented models with that of other deep-learning models and state-of-the-art machine-learning techniques. Results showed that the implemented models performed better in terms of both accuracy and required training time.
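The parameter sweep described above can be sketched as a simple grid of configurations. This is only an illustration: the text gives the ranges (batch size 32-180, dropout 0.2-0.8, learning rate 0.01-0.0001) but not the individual values tried, so the specific values and the helper name `build_search_grid` below are assumptions.

```python
from itertools import product

# Only the ranges come from the text; the individual values are
# assumptions chosen to span those ranges.
BATCH_SIZES = [32, 64, 128, 180]
DROPOUT_RATES = [0.2, 0.4, 0.6, 0.8]
LEARNING_RATES = [0.01, 0.001, 0.0001]
IMAGE_FORMATS = ["color", "segmented", "grayscale"]


def build_search_grid():
    """Return every combination of the settings above as a list of dicts."""
    grid = []
    for batch, dropout, lr, fmt in product(
        BATCH_SIZES, DROPOUT_RATES, LEARNING_RATES, IMAGE_FORMATS
    ):
        grid.append(
            {
                "batch_size": batch,
                "dropout": dropout,
                "learning_rate": lr,
                "image_format": fmt,
            }
        )
    return grid


if __name__ == "__main__":
    configs = build_search_grid()
    print(len(configs), "configurations; first one:", configs[0])
```

Each dictionary would then be passed to a training routine; an exhaustive grid is only one possible search strategy and is not stated in the paper.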
This paper is organized as follows. Section 2 reviews the literature related to the detection of plant diseases. Section 3 presents the CNN models and the details of the datasets that are used in the experiments, along with their classes and labels. Section 4 presents the results and performance of the models on the basis of their ability to predict the correct class among 38 different classes. Section 5 offers a discussion, and outlines the study's limitations and future directions towards the development and enhancement of the system. Section 6 concludes the work.

Related Work
The implementation of proper techniques to identify healthy and diseased leaves helps in controlling crop loss and increasing productivity. This section comprises different existing machine-learning techniques for the identification of plant diseases.

Shape- and Texture-Based Identification
In [30], the authors identified diseases using tomato-leaf images. They extracted different geometric and histogram-based features from segmented diseased portions and applied an SVM classifier with different kernels for classification. S. Kaur et al. [31] identified three different soybean diseases using different color and texture features. In [32], P. Babu et al. used a feed-forward neural network with backpropagation to identify plant leaves and their diseases. S. S. Chouhan et al. [33] used a bacterial-foraging-optimization-based radial-basis-function neural network (BRBFNN) for the identification of leaves and fungal diseases in plants. In their approach, they used a region-growing algorithm to extract features from a leaf on the basis of seed points having similar attributes. The bacterial-foraging optimization technique was used to speed up the network and improve classification accuracy.

Deep-Learning-Based Identification
Mohanty et al. [24] used AlexNet and GoogleNet CNN architectures in the identification of 26 different plant diseases. Ferentinos et al. [25] used different CNN architectures to identify 58 different plant diseases, achieving high levels of classification accuracy. In their approach, they also tested the CNN architecture with real-time images. Sladojevic et al. [26] designed a DL architecture to identify 13 different plant diseases. They used the Caffe DL framework to perform CNN training. Kamilaris et al. [34] exhaustively researched different DL approaches and their drawbacks in the field of agriculture. In [35], the authors proposed a nine-layer CNN model to identify plant diseases. For experimentation purposes, they used the PlantVillage dataset and data-augmentation techniques to increase the data size, and analyzed performance. The authors reported better accuracy than that of a traditional machine-learning-based approach.
Pretrained AlexNet and GoogleNet were used in [36] to distinguish 3 different soybean diseases from healthy leaves with modified hyperparameters such as minibatch size, max epoch, and bias learning rate. Six different pretrained networks (AlexNet, VGG16, VGG19, GoogLeNet, ResNet101, and DenseNet201) were used by K. R. Aravind et al. [37] to identify 10 different diseases in plants, and they achieved the highest accuracy rate of 97.3% using GoogleNet. A pretrained VGG16 as the feature extractor and a multiclass SVM were used in [38] to classify different eggplant diseases. Different color spaces (RGB, HSV, YCbCr, and grayscale) were used to evaluate performance; using RGB images, the highest classification accuracy of 99.4% was achieved. In [39], the authors distinguished maize-leaf diseases from healthy leaves using deep-forest techniques. In their approach, they varied the deep-forest hyperparameters regarding number of trees, forests, and grains, and compared their results with those of traditional machine-learning models such as SVM, RF, LR, and KNN. Lee et al. compared different deep-learning architectures in the identification of plant diseases [22]. To improve the accuracy of the model, Ghazi et al. used a transfer-learning-based approach on pretrained deep-learning models [40].
In [41], the authors used a shallow CNN with SVM and RF classifiers to classify three different types of plant diseases. In their work, they mainly compared their results with those of deep-learning methods and showed that classification using SVM and RF classifiers with extracted features from the shallow CNN outperformed pretrained deep-learning models. A self-attention convolutional neural network (SACNN) was used in [42] to identify several crop diseases. To examine the robustness of the model, the authors added different noise levels in the test-image set.
Oyewola et al. [43] identified 5 different cassava-plant diseases using a plain convolutional neural network (PCNN) and a deep residual network (DRNN), and found that DRNN outperformed PCNN by a margin of 9.25%. Ramacharan et al. [4] used a transfer-learning approach in the identification of three diseases and two pest-damage types in cassava plants. The authors then extended their work on the identification of cassava plant diseases using a smartphone-based CNN model and achieved an accuracy of 80.6% [44].
A NASNet-based deep CNN architecture was used in [45] to identify leaf diseases in plants, and an accuracy rate of 93.82% was achieved. Rice- and maize-leaf diseases were identified by Chen et al. [2] using the INC-VGGN method. In their approach, they replaced the last convolutional layer of VGG19 with two inception layers and one global average pooling layer. A shallow CNN (SCNN) was used by Yang Li et al. [41] in the identification of maize, apple, and grape diseases. First, they extracted CNN features and classified them using SVM and RF classifiers. Sethy et al. [1] used different deep-learning models to extract features and classified them using an SVM classifier. Using ResNet50 with SVM, they achieved the highest accuracy. VGG16, ResNet, and DenseNet models were used by Yafeng Zhao et al. [46] to identify plant diseases from the PlantVillage dataset. To increase the dataset size, they used a double generative adversarial network (DoubleGAN), which improved the performance results. A summary of the related work on plant-disease identification based on leaf images is shown in Table 1.

Convolutional-Neural-Network Models
Interest in CNNs has recently surged, and DL is the most popular approach because DL models can learn relevant features from input images at different convolutional levels, similar to the function of the human brain. DL can solve complex problems particularly well and quickly, with high classification accuracy and a low error rate [47]. A DL model is composed of different components (convolutional layers, pooling layers, fully connected layers, and activation functions). Table 2 shows the number of layers and parameter sizes of different CNN architectures. AlexNet has 8 layers and 60 million parameters, whereas VGGNet-16 and GoogleNet have 138 million and 7 million parameters and 16 and 27 layers, respectively. ResNet-152 has 152 layers and 50 million parameters. InceptionV3, MobileNetV1, and MobileNetV2 have 27, 4.2, and 3.37 million parameters, respectively. In our work, we used the InceptionV3, InceptionResNetV2, MobileNetV2, and EfficientNetB0 architectures to identify different plant diseases using the leaves of different disease-affected plants. We used these models because their parameter size is optimal in comparison with that of other architectures. During implementation, we used weights pretrained on the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [48] dataset.
Convolutional neural networks became widespread in machine vision after the AlexNet model popularized DL architectures. The development of the Inception model was important in the field of machine vision. Inception is a simple yet powerful DL network with sparsely connected filters, which can replace fully connected network architectures, especially inside convolutional layers, as shown in Figure 1b. The Inception model's computational cost and number of parameters are much lower than those of other models such as AlexNet and VGGNet.
An inception layer consists of differently sized convolutional layers (e.g., 1 × 1, 3 × 3, and n × n convolutional layers) and pooling layers, with all outputs combined and propagated to the input of the next layer. Instead of using standard convolution in the inception block, we used depthwise separable convolution. Tables 3 and 4 show the required parameters in standard convolution and depthwise separable convolution, respectively. The number of parameters required in depthwise separable convolution is much smaller than that in standard convolution.
The InceptionResNetV2 architecture combines two recent deep-learning ideas: residual connections and the Inception architecture [49]. This hybrid deep-learning model has the advantages of a residual network and retains the unique characteristics of the multiconvolutional core of the Inception network. In [50], the authors showed that residual connections are implicit approaches for training very deep architectures. This improved version of the Inception architecture significantly improved performance and accelerated the model. Figure 2 shows the basic block diagram of InceptionResNetV2.
Figure 3a shows the modified InceptionResNet-A block, where the inception module uses a parallel structure to extract the features. The 3 × 3 standard convolution was replaced by 3 × 3 depthwise separable convolution. Figure 3b represents the modified InceptionResNet-B block, where the 7 × 7 standard convolutional structure of the inception model was replaced by 7 × 7 depthwise separable convolution. In the InceptionResNet-C block, the 3 × 3 convolutional structure was replaced by successive 3 × 1 and 1 × 3 convolutions, as shown in Figure 4. By replacing the original convolutional kernel with multiple smaller convolutional kernels, this model effectively reduced computational complexity. An increase in the number of convolutional layers and the deepening of the network improved accuracy.
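The parameter saving from swapping standard convolution for depthwise separable convolution can be checked with a few lines of arithmetic. The sketch below uses the usual weight-count formulas (biases ignored); the channel sizes in the usage example are hypothetical and are not taken from Tables 3 and 4.

```python
def standard_conv_params(kernel, in_channels, out_channels):
    """Weights in a standard k x k convolution (biases ignored)."""
    return kernel * kernel * in_channels * out_channels


def depthwise_separable_params(kernel, in_channels, out_channels):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = kernel * kernel * in_channels   # one k x k filter per input channel
    pointwise = in_channels * out_channels      # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


def reduction_ratio(kernel, in_channels, out_channels):
    """Separable / standard weight count; equals 1/out_channels + 1/kernel**2."""
    return (depthwise_separable_params(kernel, in_channels, out_channels)
            / standard_conv_params(kernel, in_channels, out_channels))


if __name__ == "__main__":
    # Hypothetical layer sizes for the 3 x 3 and 7 x 7 cases mentioned in the text.
    for k, c_in, c_out in [(3, 64, 128), (7, 128, 128)]:
        print(k, standard_conv_params(k, c_in, c_out),
              depthwise_separable_params(k, c_in, c_out),
              round(reduction_ratio(k, c_in, c_out), 4))
```

For a 3 × 3 kernel with 64 input and 128 output channels, the count drops from 73,728 to 8,768 weights, and the saving grows with kernel size, which is why the 7 × 7 replacement in the InceptionResNet-B block is especially effective.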
The main motivation for using the MobileNetV2 architecture is the cost of the convolutional layer, which is much more expensive in normal convolution than in MobileNetV2. To improve efficiency, depthwise separable convolution is used in the MobileNetV2 architecture [51,52]. Depthwise convolution is independently performed for each input channel. The blocks of MobileNetV2 are shown in Figure 1a. The first layer is called the expansion layer, a 1 × 1 convolution whose purpose is to expand the number of channels in the data. Next is the projection layer, in which a high number of dimensions is reduced to a smaller number. Except for the projection layer, each layer comprises a batch-normalization function and the ReLU activation function. In the MobileNetV2 architecture, there is one residual connection between the input and output layers. The residual network tries to learn already learned features; those that are not useful in decision making are discarded. This architecture can reduce the number of computations and of parameters. The MobileNetV2 architecture consists of 17 building blocks in a row followed by a 1 × 1 convolutional layer, a global average pooling layer, and a classification layer.
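To make the block structure concrete, the following sketch counts the convolution weights in one MobileNetV2-style block: a 1 × 1 expansion, a 3 × 3 depthwise convolution, and a 1 × 1 projection. The expansion factor of 6 is the default from the MobileNetV2 paper, not a value stated here, and the channel sizes are hypothetical; batch-normalization parameters and biases are ignored.

```python
def inverted_residual_params(in_channels, out_channels, expansion=6, kernel=3):
    """Convolution weights in one expansion -> depthwise -> projection block."""
    hidden = in_channels * expansion             # channels after the expansion layer
    expand = in_channels * hidden                # 1 x 1 expansion convolution
    depthwise = kernel * kernel * hidden         # one k x k filter per hidden channel
    project = hidden * out_channels              # 1 x 1 projection convolution
    return {
        "expand": expand,
        "depthwise": depthwise,
        "project": project,
        "total": expand + depthwise + project,
    }


def uses_residual(in_channels, out_channels, stride):
    """The skip connection is only valid when input and output shapes match."""
    return stride == 1 and in_channels == out_channels


if __name__ == "__main__":
    # Hypothetical block: 32 -> 32 channels, stride 1.
    print(inverted_residual_params(32, 32), uses_residual(32, 32, 1))
```

Most of the weights sit in the two 1 × 1 convolutions; the depthwise step, which does the spatial filtering, is comparatively cheap.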
A deep-learning architecture aims to achieve better accuracy and efficiency with smaller models. Unlike other state-of-the-art deep-learning models, the EfficientNet architecture uses a compound scaling method in which a compound coefficient uniformly scales network width, depth, and resolution [29]. EfficientNet consists of 8 different models from B0 to B7. Instead of the ReLU activation function, EfficientNet uses the swish activation function. EfficientNet uses inverted bottleneck convolution, first introduced in the MobileNetV2 model, which consists of a layer that first expands the network and then compresses the channels [52]. This architecture reduces computation by a factor of f² as compared to normal convolution, where f is the filter size. The authors in [29] showed that EfficientNetB0 is the simplest of all 8 models and uses the fewest parameters. So, in our experiment, we directly used EfficientNetB0 to evaluate performance. Figure 5 shows the basic block diagram of EfficientNetB0.
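Compound scaling and the swish activation can both be written in a few lines. The sketch below assumes the base coefficients reported in the EfficientNet paper [29] (1.2 for depth, 1.1 for width, 1.15 for resolution); these values are not given in this text, and the released B1-B7 models use their own rounded settings, so the output should be read as illustrative only.

```python
import math

# Base coefficients from the EfficientNet paper; an assumption here,
# since this text does not list them.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # depth, width, resolution


def compound_scale(phi):
    """Multipliers for depth, width, and resolution at compound coefficient phi."""
    return {
        "depth": ALPHA ** phi,
        "width": BETA ** phi,
        "resolution": GAMMA ** phi,
    }


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish(x):
    """Swish activation: x * sigmoid(x)."""
    return x * sigmoid(x)


if __name__ == "__main__":
    print(compound_scale(0))            # phi = 0 leaves the B0 baseline unscaled
    print(compound_scale(2))
    print(swish(-1.0), swish(0.0), swish(1.0))
```

With these coefficients, ALPHA · BETA² · GAMMA² is about 1.92, so each unit increase of phi roughly doubles the computation; phi = 0 corresponds to the unscaled B0 baseline used in this work.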

Transfer-Learning Approach
In deep learning, transfer learning is the reuse of a pretrained network on a new task. Transfer learning is very popular in deep learning because it allows a network to be trained with a small amount of data while still achieving high accuracy. In transfer learning, a machine exploits knowledge gained from a previous task to improve generalization on another. The last few layers of the trained network are replaced with new layers, such as a fully connected layer and a softmax classification layer whose size equals the number of classes, which is 38 in our paper. In each model, we unfroze the layer and added a stack of one activation layer, one batch-normalization layer, and one dropout layer. All models were tested with different dropout values, learning rates, and batch sizes. The input size used in MobileNetV2 and EfficientNetB0 is 224 × 224.
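As a rough illustration of how small the replaced head is, the sketch below counts the parameters that the new layers themselves add, independent of how the backbone layers are treated. It assumes a 1280-dimensional pooled feature vector (the width produced by the standard MobileNetV2 and EfficientNetB0 backbones) and a head made of the stack described above followed by a 38-way softmax dense layer; the exact layer order in the authors' implementation is not specified here.

```python
FEATURE_DIM = 1280   # assumed pooled feature size of the backbone
NUM_CLASSES = 38     # number of PlantVillage classes stated in the text


def batch_norm_params(features):
    """Scale and shift are trainable; moving mean and variance are not."""
    return {"trainable": 2 * features, "non_trainable": 2 * features}


def dense_params(in_features, out_features):
    """Weight matrix plus one bias per output unit."""
    return in_features * out_features + out_features


def head_trainable_params(features=FEATURE_DIM, classes=NUM_CLASSES):
    """Activation and dropout layers add no parameters of their own."""
    return batch_norm_params(features)["trainable"] + dense_params(features, classes)


if __name__ == "__main__":
    print("softmax layer:", dense_params(FEATURE_DIM, NUM_CLASSES))
    print("whole head:", head_trainable_params())
```

Under these assumptions the head has roughly 51 thousand trainable parameters, which is small next to the millions of pretrained backbone weights and helps explain why transfer learning works with limited data.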

Dataset
For training and testing purposes, we used the standard open-access PlantVillage dataset [53], which consists of 54,305 images of healthy and infected plant leaves. Detailed database information, the number of classes and images in each class, their common and scientific names, and the disease-causing viruses are shown in Tables 5 and 6. The database contains 38 different classes of 14 different plant species with healthy- and disease-affected-leaf images. All images were captured in laboratory conditions. Figure 6 shows some sample leaf images from the PlantVillage dataset [53].
In our experiment, we used three different formats of the PlantVillage dataset. First, we ran the experiment with colored leaf images, and then with segmented leaf images of the same dataset. In the segmented images, the background was smoothed, so that it could provide more meaningful information that would be easier to analyze. Lastly, we used grayscale images of the same dataset to evaluate the performance of the implemented methods. All leaf images were divided into two sets, a training set and a testing set.
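A division into training and testing sets such as the one described above can be sketched as follows. The paper does not say whether the split was stratified by class; the per-class (stratified) shuffling, the fixed seed, and the function name `stratified_split` are assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split sample indices into train/test lists, class by class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = int(round(len(indices) * train_fraction))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


if __name__ == "__main__":
    # Toy labels standing in for image class names.
    toy_labels = ["healthy"] * 10 + ["diseased"] * 20
    train_idx, test_idx = stratified_split(toy_labels, train_fraction=0.8)
    print(len(train_idx), "training samples,", len(test_idx), "testing samples")
```

Splitting per class keeps the class proportions the same in both sets, which matters for a dataset whose 38 classes differ in size.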

Results
The implemented CNN architectures, as described in the previous section, used the parameters in Table 7. EfficientNetB0 achieved the best accuracy in comparison with InceptionV3, MobileNetV2, and InceptionResNetV2. To evaluate performance, we used different metrics, for example, accuracy, F1 score, precision, recall, training loss, and time required per epoch. We used three different representations (i.e., color, grayscale, and segmented) of the PlantVillage image data, and the performance metrics differed across them. The color-image dataset performed better than the grayscale and segmented ones; the same number of CNN network parameters was maintained in all cases. Figure 7a-c shows the testing accuracy, loss, and F1 score against the number of epochs for the implemented models. Figure 7d shows the accuracy of the InceptionResNetV2 model for different training and testing splits. A summary of the performance comparisons of the implemented models based on testing accuracy and testing loss is presented in Table 8. The performance metrics considered in our work are as follows.

• Performance accuracy: the ratio of the number of correctly classified images to the total number of images.
• Loss function: how well the architecture models the data.
To avoid overfitting, we divided the dataset phasewise into different training and testing ratios. With 80% of the image data used for training and 20% for testing, we achieved an accuracy of 98.42% with InceptionV3, 99.11% with InceptionResNetV2, 97.02% with MobileNetV2, and 99.56% with EfficientNetB0 for color images. After splitting the dataset into different training and testing ratios, there was not much variation in the accuracy of the models. Hence, they did not suffer from the problem of overfitting. The accuracy of all models for different image types, with loss and number of epochs, is shown in Table 9. Table 10 presents the precision, recall, and F1 score of the implemented models on splitting the dataset into 80-20% training and testing ratios. EfficientNetB0 had a precision value of 0.9953, recall of 0.9971, and F1 score of 0.9961, which were higher than those of the other models.
Table 8 indicates that the implemented techniques achieved better performance in terms of the combination of accuracy and average time per epoch than other techniques. The highest classification accuracy, obtained by EfficientNetB0, was 99.56%, and its training time was much less than that of the InceptionV3, InceptionResNetV2, and MobileNetV2 architectures. The decrease in time per epoch was because the number of parameters in these models was considerably smaller than that of other existing models. A comparison between the number of parameters used in different models is highlighted in Table 1. The novelty of the implemented model lies in the fact that we used depthwise separable convolution, which reduces the network parameters.
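The accuracy, precision, recall, and F1 score reported above can be computed from true and predicted labels as in the sketch below. It macro-averages the per-class scores, which is an assumption: the text does not state which averaging scheme was used for the 38 classes.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 score."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1_scores = [], [], []
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    n = len(precisions)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


if __name__ == "__main__":
    # Toy two-class example; real use would pass the 38-class test labels.
    print(macro_metrics([0, 0, 1, 1], [0, 1, 1, 1]))
```

Note that a macro-averaged F1 score is the mean of per-class F1 values, so it need not equal the harmonic mean of the averaged precision and recall.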
We considered different deep-learning models, such as a deep-learning model with an inception layer, deep learning with a residual connection, deep learning with depthwise separable convolution, and deep-learning models that scale depth, width, and resolution. We fine-tuned the network parameters to achieve better accuracy in less time, as is shown in Table 8.
The prediction accuracy of the MobileNetV2 architecture decreased to 91% when a dropout value of 0.8 was used. Figure 8 shows accuracy with respect to the different dropout values used in the network. Figure 9 shows correctly classified results from the test image dataset with their predicted and source classes. The predicted class was returned with the confidence of that class.
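Returning a predicted class together with its confidence amounts to taking the largest softmax probability. A minimal sketch follows; the logits and class names in the usage example are made up for illustration and are not outputs of the models above.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)                            # subtract the max for stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits, class_names):
    """Return the most probable class name and its softmax probability."""
    probabilities = softmax(logits)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_names[best], probabilities[best]


if __name__ == "__main__":
    # Hypothetical three-class output; the real classifier has 38 outputs.
    names = ["Tomato healthy", "Tomato early blight", "Tomato late blight"]
    label, confidence = predict_with_confidence([2.0, 0.5, -1.0], names)
    print(f"{label} ({confidence:.1%})")
```

A high softmax probability indicates how strongly the model prefers one class over the others, not how likely the prediction is to be correct on images that differ from the training data.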

Discussion
The early detection and identification of plant diseases using deep-learning techniques has recently made tremendous progress. Identification using traditional approaches heavily depends on some factors such as image enhancement, the segmentation of disease regions, and feature extraction.
Our approach is based on the identification of diseases using a deep-learning-based transfer-learning approach. Instead of using standard convolution, we used depthwise separable convolution in the inception block, which reduced the number of parameters by a large margin. To use both the inception and the residual network connection layer, we used the InceptionResNetV2 model. The model both has higher accuracy and requires less training time than the original architecture does, as the used parameters are much fewer. To check the performance towards a smartphone-implemented lightweight model to assist in plant-disease diagnosis, we implemented the MobileNetV2 model. We also implemented EfficientNetB0, which considers depth, width, and resolution during convolution.
Although the convolutional-neural-network-based deep-learning architecture achieved high success rates in the detection of plant diseases, it has some limitations, and there is scope for future work. A little noise in the sample images can lead to misclassification by a deep-learning model [55,56]. Future work includes evaluating performance on noisy images and improving it. The dataset that we used to evaluate performance included 38 different classes of diseased and healthy leaves. However, the dataset needs to be expanded to cover wider land areas and more varieties of disease images. The dataset can also be improved with aerial photos captured by drones. Another important issue is that the testing images were all captured in laboratory conditions and come from the same dataset as the training images. Testing the network with real-time field images remains an important challenge. There is a need for an efficient machine-learning system that can identify diseases in real-time scenarios and from data collected from different datasets. Some researchers are working in this field; when they tested their models with real-time images, performance worsened by a huge margin, around 25-30%. Mohanty et al. [24] conducted an experiment where they tested their model with different images from those in the training dataset and achieved an accuracy rate of 31.5%. Ferentinos et al. [25] trained with images captured in laboratory conditions, tested on images captured in real-time conditions, and achieved an accuracy rate of 33%. To improve this, databases need wide variety, for example, images taken in different lighting conditions, from different geographical areas, and under different cultivation conditions.
In addition, we aim to carry this research forward by implementing it with a new deep-learning model, such as ACNet [57], and a transformer-based architecture, such as ViT [58] and the MLP Mixer [59] method, in plant disease identification, and evaluate its performance.

Conclusions
Many methods have been developed for the detection and classification of plant diseases using diseased leaves of plants. However, there is still no efficient and effective commercial solution that can be used to identify the diseases. In our work, we used four different DL models (InceptionV3, InceptionResNetV2, MobileNetV2, and EfficientNetB0) for the detection of plant diseases using healthy- and diseased-leaf images of plants. To train and test the models, we used the standard PlantVillage dataset with 53,407 images, which were all captured in laboratory conditions. This dataset consists of 38 different classes of healthy- and diseased-leaf images of 14 different species. After splitting the dataset 80-20 (80% of the images for training and 20% for testing), we achieved the best accuracy rate of 99.56% with the EfficientNetB0 model. On average, the MobileNetV2 and EfficientNetB0 architectures required less training time, taking 565 and 545 s/epoch, respectively, on colored images. In comparison with other deep-learning approaches, the implemented deep-learning model has better predictive ability in terms of both accuracy and loss. The required time to train the model was much less than that of other machine-learning approaches. Moreover, the MobileNetV2 architecture is an optimized deep convolutional neural network that limits the parameter number and operations as much as possible, and can easily run on mobile devices.