Deep Learning-Based Leaf Disease Detection in Crops Using Images for Agricultural Applications

Abstract: The agricultural sector plays a key role in supplying quality food and makes the greatest contribution to growing economies and populations. Plant disease may cause significant losses in food production and eradicate diversity in species. Early diagnosis of plant diseases using accurate or automatic detection techniques can enhance the quality of food production and minimize economic losses. In recent years, deep learning has brought tremendous improvements in the recognition accuracy of image classification and object detection systems. Hence, in this paper, we utilized convolutional neural network (CNN)-based pre-trained models for efficient plant disease identification. We focused on fine-tuning the hyperparameters of popular pre-trained models, such as DenseNet-121, ResNet-50, VGG-16, and Inception V4. The experiments were carried out using the popular PlantVillage dataset, which has 54,305 image samples of different plant disease species in 38 classes. The performance of the models was evaluated through classification accuracy, sensitivity, specificity, and F1 score. A comparative analysis was also performed with similar state-of-the-art studies. The experiments showed that DenseNet-121 achieved the highest classification accuracy, 99.81%, which was superior to state-of-the-art models.


Introduction
Agriculture, being a substantial contributor to the world's economy, is the key source of food, income, and employment. In India, as in other low- and middle-income countries, where an enormous number of farmers exist, agriculture contributes 18% of the nation's income and boosts the employment rate to 53% [1]. For the past 3 years, the gross value added (GVA) by agriculture to the country's total economy has increased from 17.6% to 20.2% [2,3]. This sector provides the highest share of economic growth. Hence, the impact of plant disease and infections from pests on agriculture may affect the world's economy by reducing the production quality of food. Prophylactic treatments are not effective for the prevention of epidemics and endemics. Early monitoring and proper diagnosis of crop disease using a suitable crop protection system may prevent losses in production quality.
Identifying types of plant disease is extremely important and is considered a crucial issue. Early diagnosis of plant disease may pave the way for better decision-making in managing agricultural production. Infected plants generally have obvious marks or spots on the stems, fruits, leaves, or flowers. More specifically, each infection and pest condition leaves unique patterns that can be used to diagnose abnormalities. Identifying a plant disease requires expertise and manpower. Furthermore, manual examination to identify the type of plant infection is subjective and time-consuming, and, sometimes, the disease identified by farmers or experts may be misleading [4].
CNN deep-learning models are popular for image-based research. They are efficient in learning low-level complex features from images. However, deep CNN layers are difficult to train, as this process is computationally expensive. To solve such issues, transfer learning-based models have been proposed by various researchers [22-26]. Popular transfer learning models include VGG-16, ResNet, DenseNet, and Inception [27]. These models are trained with the ImageNet dataset, which consists of multiple classes. Such models can be used for training with any dataset, as the features of the images, such as edges and contours, are common among the datasets. Hence, the transfer learning approach has been found to be the most suitable and robust approach for image classification [28]. Further, transfer learning can improve learning even when the dataset is small. Figure 2 shows the basic idea behind transfer learning.
With transfer learning [22], tasks are more precise, as the model can be trained by freezing the last or the first layers. Thus, by freezing the layers, the model parameters can be retained and tuned for feature extraction and classification [29]. In this study, we performed a comparative performance analysis of different transfer learning models with deep CNNs in order to enhance recognition and classification accuracy and reduce time complexity. Our workflow architecture is depicted in Figure 3. The experiments were carried out using the PlantVillage dataset with pre-trained CNN models, such as VGG-16, DenseNet-121, ResNet-50, and Inception V4. The major contributions of this manuscript can be summarized as follows:
• Development of a deep learning model for the diagnosis of various plant diseases;
• Determination of the best transfer learning technique to achieve the most accurate classification and optimal recognition accuracy for multi-class plant diseases;
• Resolution of distinct labeling and class issues in plant disease recognition by proposing a multi-class, multi-label transfer learning-based CNN model;
• Resolution of the overfitting problem through data augmentation techniques.
The rest of the article is arranged as follows. Section 2 provides a literature survey. The methodology used in this work is presented in Section 3. Section 4 discusses the various experiments conducted. The results and discussion are presented in Section 5. Finally, Section 6 concludes the paper with future directions.

Related Work
In the field of agricultural production, ignoring the early signs of plant disease may lead to losses in food crops, which could eventually destroy the world's economy [30]. This section presents an in-depth survey of state-of-the-art research in the field of leaf disease identification.
A CNN-based deep learning model was proposed for the accurate classification of plant disease in [31], and the model was trained using a publicly available dataset with 87,000 RGB images. Initially, preprocessing was undertaken, followed by segmentation. For classification, a CNN was used. Although this model attained a recognition accuracy of 93.5%, it failed to classify some classes, leading to confusion with the classes in subsequent stages. Further, the performance of the model deteriorated due to the limited availability of data. To improve recognition accuracy, Narayanan et al. [32] proposed a hybrid convolutional neural network to classify banana plant disease. In their approach, the raw input image was preprocessed without altering any default information, and the standard image dimensions were maintained using a median filter. This approach fused an SVM with a CNN. The SVM was used in phase 1 to classify whether the banana leaves were healthy or infected, whereas a multiclass SVM was used in the testing phase to identify the type of infection or disease in infected banana leaves. The classified CNN output was fed as input to the support vector machine, attaining a classification accuracy of 99%. The previous work stated that the CNN had better accuracy outcomes than traditional methods, but this approach lacked diversity. Jadhav et al. [33] proposed a CNN for the identification of plant disease. In this approach, they used pre-trained CNN models to identify diseases in soybean plants. The experiments were carried out using pre-trained transfer learning approaches, such as AlexNet and GoogleNet, and attained better outcomes, but the model fell behind in the diversity of classification. Many existing models focus on identifying single classes of plant disease rather than building a model to classify various plant diseases. This is mainly due to the limited databases available for training deep learning models with diversified plant species.
Jadhav et al. [34] were the first to propose a novel histogram transformation approach, which enhanced the recognition accuracy of deep learning models by generating synthetic image samples from low-quality test set images. The motivation behind this work was to enhance the images in the cassava leaf disease dataset using Gaussian blurring, motion blurring, resolution down-sampling, and over-exposure with a modified MobileNetV2 neural network model. In their approach, synthetic images using modified color value distributions were generated to address the data shortage that a data-hungry deep-learning model faces during its training phase and to achieve better outcomes.
Following Olusola et al., Abbas et al. [35] proposed a conditional generative adversarial network to generate a database of synthetic images of tomato plant leaves. With the advent of generative networks, data that previously required expensive, time-consuming, and laborious real-time acquisition or collection can now be generated. Anh et al. [36] proposed a benchmark dataset-based multi-leaf classification model using a pre-trained MobileNet CNN model and found it efficient in classification, attaining a reliable accuracy of 96.58%. Further, a multi-label CNN was put forward in [20] for the classification of multiple plant diseases using transfer learning approaches, such as DenseNet, Inception, Xception, ResNet, VGG, and MobileNet, and the authors claim that theirs is the first research work that classifies 28 classes of plant disease using a multi-label CNN. Classification of plant diseases using an ensemble classifier was proposed in [37]. The best ensemble classifier was evaluated with two datasets; namely, PlantVillage and Taiwan Tomato Leaves. Pradeep et al. [21] proposed the EfficientNet model using a convolutional neural network for multi-label and multi-class classification. The hidden layer network in the CNN had a better impact on the identification of plant diseases. However, the model underperformed when validated with benchmark datasets. An effective, loss-fused, resilient convolutional neural network (CNN) was proposed in [38] using the publicly available benchmark dataset PlantVillage and achieved a classification accuracy of 98.93%. Though this method improved the classification accuracy, the model lagged in its performance when using real-time images under different environmental conditions. Later, Enkvetchakul and Surinta [39] proposed a CNN network with a transfer learning approach for two plant diseases. NASMobileNet and MobileNetV2 were the two pre-trained network models used for the classification of plant diseases, among which the most accurate prediction outcome was that based on the NASMobileNet algorithm. Overfitting in deep learning can be resolved using the data augmentation approach. The data augmentation technique was implemented in an experimental setup that included cut-out, rotation, zoom, shift, brightness, and mix-up. Leaf disease datasets and iCassava 2019 were the two kinds of dataset used. The maximum test accuracy attained after the evaluation was 84.51%. Table 1 shows the different convolutional neural network models that have been proposed to improve accuracy.

Methodology
CNN models are best suited for object recognition and classification with image databases. Despite the advantages of CNNs, challenges still exist, such as the long duration of training and the requirement for large datasets. To extract the low-level and complex features from the images, deep CNN models are required; this increases the complexity of the model training. Transfer learning approaches are capable of addressing the aforementioned challenges. Transfer learning uses pre-trained networks, in which model parameters learned on a particular dataset can be used for other problems. In this section, we discuss the methodologies used in this work.

Multi-Class Classification
Plant disease datasets hold multiple images of infected and healthy plant samples, with each sample mapped to a particular class. For instance, if we consider the banana plant as a class, then all the images of healthy and infected samples of banana plants will be mapped to that specific class. The classification of the target image is then based purely on the features extracted from the source image. Considering the same example, the banana class has four diseases; namely, xanthomonas wilt, fusarium wilt, bunchy top virus, and black sigatoka [32]. When a sample of one particular disease is fed as input after training with samples of all four diseases under the banana class, the testing phase will output the exact label of the disease from among the four categories mapped under that class. Thus, multi-class classification is mutually exclusive, whereas, in multi-label classification, each category inside a class is itself considered a different class. Suppose we have N classes; then we can refer to N multi-classes, and if each of the N classes has M categories, then each category inside each of the N classes is itself considered a class.
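As a minimal sketch of the mutually exclusive (multi-class) case, the snippet below converts raw class scores into a probability distribution with softmax and selects the single most likely label. The four labels are the banana diseases named above; the score values and the helper names `softmax` and `predict_label` are illustrative assumptions, not part of the reported experiments.

```python
import math

# Banana disease labels named in the text.
LABELS = ["xanthomonas wilt", "fusarium wilt", "bunchy top virus", "black sigatoka"]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(scores, labels=LABELS):
    """Multi-class prediction: exactly one label wins."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical scores from a classifier head (not results from the paper).
example_scores = [0.3, 2.1, -0.5, 0.9]
label, confidence = predict_label(example_scores)
```

Because the softmax outputs sum to 1, raising the probability of one disease necessarily lowers the others, which is what makes the prediction mutually exclusive.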

Transfer Learning Approach
In general, it takes several days or weeks to train and tune most state-of-the-art models, even if the model is trained on high-end GPU machines. Training and building a model from scratch is time-consuming. A CNN model built from scratch with a publicly available plant disease dataset seemed to attain 25% accuracy in 200 epochs, whereas a pre-trained CNN model used with a transfer learning approach attained 63% accuracy in almost half the number of iterations (over 100 epochs). Transfer learning methods include several approaches, the choice of which depends on the choice of the pre-trained network model for classification and the particular nature of the dataset.

ResNet-50
ResNet-50 is a convolutional neural network that is 50 layers deep. The model has five stages, with convolution and identity blocks. These residual networks act as a backbone for computer vision tasks. ResNet [49] introduced the concept of stacking convolution layers one above the other. Besides stacking the convolution layers, the network also has several skip connections, which carry the original input past the stacked layers and add it to their output. Furthermore, the skip connection can be placed before the activation function to mitigate the vanishing gradient issue. Deeper plain models tend to end up with more errors, and skip connections in the residual neural network were introduced to resolve this issue. These shortcut connections are simply based on identity mapping.
Let us consider x as the input image, F(x) as the mapping fitted by the nonlinear layers, and H(x) as the residual mapping. Thus, the function for residual mapping becomes:
$$H(x) = F(x) + x$$
ResNet-50 has convolution and identity blocks. Each identity block has three convolutional layers, and the network has over 23 M trainable parameters. The input x and the shortcut x are two matrices, and they can only be added if the output dimension of the shortcut matches that of the convolution layers after convolution and batch normalization. Otherwise, the shortcut x must go through a convolution layer and batch normalization to match the dimension.
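The shortcut addition and the dimension-matching rule can be illustrated with a toy example. The sketch below is an assumption-laden simplification: F(x) is a single linear map on a small vector rather than three stacked convolutions, and the names `matvec`, `relu`, and `residual_block` are hypothetical helpers introduced only for this illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def relu(vector):
    return [max(0.0, v) for v in vector]

def residual_block(x, weights, projection=None):
    """Compute relu(F(x) + shortcut(x)) for one toy residual block.

    F(x) is a single linear map here (an assumption made to keep the
    sketch short; real ResNet-50 blocks stack three convolutions).
    """
    fx = matvec(weights, x)
    # Identity shortcut when shapes agree; otherwise project x to match F(x).
    shortcut = x if projection is None else matvec(projection, x)
    if len(shortcut) != len(fx):
        raise ValueError("shortcut and F(x) must have the same dimension")
    return relu([a + b for a, b in zip(fx, shortcut)])

# Identity shortcut: F maps 2 -> 2, so x can be added directly.
same = residual_block([1.0, 2.0], [[0.5, 0.0], [0.0, -2.0]])

# Projection shortcut: F maps 2 -> 3, so x is first projected to 3 dimensions.
proj = residual_block(
    [1.0, 2.0],
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    projection=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
)
```

Calling `residual_block` with mismatched shapes and no projection raises an error, mirroring the requirement that the two matrices can only be added when their dimensions agree.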

VGG-16
The VGG-16 [50] network model, also known as the Very Deep Convolutional Network for Large-Scale Image Recognition, was built by the Visual Geometry Group at Oxford University. The depth is pushed to 16-19 weight layers, and the 16-layer model has 138 M trainable parameters. The depth of the model is also expanded by reducing the convolution filter size to 3 × 3. This model requires more training time and occupies more disk space.
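The figure of 138 M parameters can be checked with simple arithmetic. The sketch below assumes the standard published VGG-16 layout (thirteen 3 × 3 convolutions, five 2 × 2 max-pooling layers, and three dense layers with a 1000-class ImageNet output); the layer list and the function name `count_vgg16_parameters` are assumptions introduced here and are not taken from this paper.

```python
# Standard VGG-16 configuration (assumed): output channels per 3x3
# convolution, with "M" marking a 2x2 max-pooling layer.
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]

def count_vgg16_parameters(input_size=224, in_channels=3, num_classes=1000):
    """Count weights and biases of VGG-16 including its three dense layers."""
    total = 0
    channels, size = in_channels, input_size
    for item in VGG16_CONFIG:
        if item == "M":
            size //= 2  # each pooling layer halves height and width
        else:
            total += (3 * 3 * channels + 1) * item  # kernel weights + bias
            channels = item
    features = channels * size * size  # flattened feature vector
    for units in (4096, 4096, num_classes):
        total += (features + 1) * units  # dense weights + bias
        features = units
    return total

total_parameters = count_vgg16_parameters()
```

Most of the parameters sit in the first dense layer after flattening, which is one reason the model occupies considerable disk space.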

DenseNet-121
DenseNet-121 [51] is a deep CNN model designed for image classification using dense layers with shorter connections between them. In this network, each layer receives additional inputs from its preceding layers and passes its generated feature maps to the succeeding layers. Concatenation is performed between layers, through which each successive layer receives collective knowledge from all the preceding layers. Further, the network is thin and compact since the preceding layers' feature maps are reused by the subsequent layers. In this manner, the number of channels in a dense block is reduced, and the growth rate of the channels is denoted by k. Figure 4 shows the working principle of a dense block in DenseNet. For each composition layer, regularization, activation, and convolution operations are carried out to produce output feature maps of k channels. Batch normalization, ReLU activation, convolution, and pooling are performed to transform the outcome of subsequent layers. The layers have a strong gradient flow and more diversified features. DenseNet is small compared to ResNet. Further, the classifiers in the standard ConvNet model process complex features, whereas DenseNet uses all features, even with different complexities, and provides smooth decision boundaries.
Agronomy 2022, 12, 2395

Inception V4
Images contain many details and salient features and may vary in size. With these variations in size, choosing the right filter size for feature extraction is challenging. For local information extraction, a smaller kernel size should be chosen, whereas, for global information, the kernel size should be large. Stacking up convolution layers may result in overfitting and vanishing gradient problems. To solve this, the Inception modules incorporate different kernel sizes in each block, such that the network model becomes wider instead of deeper [52]. For instance, the naïve Inception module applies convolutions with three different filter sizes (1 × 1, 3 × 3, and 5 × 5) to the same input. Max-pooling is also performed, and the outcomes are concatenated and passed to the next layer. The stem of the Inception network sets up an initial set of operations to be performed before the Inception modules. Further, Inception V4 has reduction blocks to alter the height and width of the grids.
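The "wider instead of deeper" idea can be made concrete by tracking the output shape of a naïve Inception module. The sketch below assumes stride 1 and "same" padding in every branch, so that only the channel count changes; the filter counts and the function name `naive_inception_output_shape` are hypothetical and chosen only for illustration.

```python
def naive_inception_output_shape(height, width, in_channels, branch_channels):
    """Return (height, width, channels) after a naive Inception module.

    branch_channels maps each convolution kernel size to its number of
    filters. With stride 1 and "same" padding (assumed), every branch keeps
    the spatial size, so the outputs can be concatenated along the channel
    axis. The max-pooling branch keeps the input channel count.
    """
    conv_channels = sum(branch_channels.values())
    pool_channels = in_channels
    return height, width, conv_channels + pool_channels

# Hypothetical filter counts for the 1x1, 3x3, and 5x5 branches.
shape = naive_inception_output_shape(
    28, 28, in_channels=192, branch_channels={1: 64, 3: 128, 5: 32}
)
```

The height and width are unchanged while the channel count grows with every branch, which is why later Inception versions add reduction blocks to shrink the grid.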

Experiments
The baseline system for evaluation of our experiments was an NVIDIA GeForce GTX GPU workstation. The operating environment was Windows 10, with GDDR5 graphics memory, a 9th-generation Core i5 processor, and 8 GB of RAM. Software implementation was undertaken using the Anaconda3, Keras, OpenCV, NumPy, cuDNN, and Theano libraries. cuDNN and CUMeM are simple libraries specially designed to carry out deep learning implementations with less memory and faster execution. Both these libraries were designed by NVIDIA to work in the Theano backend. OpenCV supports both academic and commercial project development and supports Linux, Windows, Mac OS, iOS, Python, Java, and Android interfaces. In this work, for each experiment, the training accuracy and the testing accuracy were evaluated. The losses obtained during the testing and training phases were calculated for each model. The models were trained using the PlantVillage dataset with the aim of accelerating the learning speed of the CNN with transfer learning models. The pre-trained models chosen for our study included ResNet-50, Inception V4, VGG-16, and DenseNet-121, which had been previously trained using the ImageNet dataset with 1.2 M images and 1000 image categories.

Description of Dataset
The PlantVillage [17] dataset is a publicly available dataset with different categories of plant diseases. This dataset comprises 38 classes with 54,305 images. For our experimental analysis, we split the dataset into training samples, testing samples, and validation samples. The pre-trained models were trained with 80% of the PlantVillage dataset, and 20% was used for validation and testing. Further, the total number of samples available for the plant classes was 54,305, out of which 43,955 samples were used for training, 4902 for validation, and 5488 for testing. All these train, test and validation sets include all the 38 classes of the different plant diseases. The details of the dataset split are presented in Table 2.
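One way to make sure that every split contains all classes is to split class by class. The sketch below is a minimal illustration under assumed 80/10/10 proportions; the function name `stratified_split`, the toy labels, and the fixed seed are hypothetical, and the sketch is not expected to reproduce the exact sample counts reported in Table 2.

```python
import random
from collections import defaultdict

def stratified_split(labels, train=0.8, val=0.1, seed=0):
    """Split sample indices into train/validation/test sets class by class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * train)
        n_val = int(len(indices) * val)
        splits["train"] += indices[:n_train]
        splits["val"] += indices[n_train:n_train + n_val]
        splits["test"] += indices[n_train + n_val:]  # remainder goes to test
    return splits

# Toy labels: 3 hypothetical classes with 50 samples each.
toy_labels = [c for c in ("healthy", "early_blight", "late_blight") for _ in range(50)]
parts = stratified_split(toy_labels)
```

Splitting per class keeps the class proportions roughly equal across the three sets, which matters when some diseases have far fewer images than others.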

Preprocessing and Data Augmentation
The dataset held 38 classes with 26 diseases and 14 species of crops. For our experimental purpose, we used the colour images from the PlantVillage dataset, as they fit well with the transfer learning models. The images were downscaled to 256 × 256 pixels as a standardized format since we used different pre-trained network models that require different input sizes. For VGG-16, DenseNet-121, and ResNet-50, the input size is 224 × 224 × 3 (height, width, and number of channels), whereas, for Inception V4, the input shape of images is 299 × 299 × 3 (height, width, and number of channels). Though the dataset is huge, with around 54,000 images of different crop diseases, the images match the real-life images captured by farmers using different image acquisition techniques, such as Kinect sensors, high-definition cameras, and smart phones. Further, a dataset of such a size is prone to overfitting. Therefore, to overcome overfitting, regularization techniques, such as data augmentation after preprocessing, were introduced. The augmentation processes used with the preprocessed images included clockwise and anticlockwise rotation, horizontal and vertical flipping, zoom intensity, and rescaling. The images were not duplicated but augmented during the training process, so physical copies of the augmented images were not stored but were used temporarily in the process. This augmentation technique not only prevents the model from overfitting and model loss but also increases the robustness of the model so that, when the model is used to classify real-life plant disease images, it can classify them with better accuracy.
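On-the-fly augmentation can be sketched with a few list operations. The example below is a simplified illustration, not the pipeline used in the experiments: it works on a tiny grayscale grid, restricts rotation to 90 degrees, omits zoom, and uses hypothetical helper names and an assumed 0.5 probability for each transform.

```python
import random

def horizontal_flip(image):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in image]

def vertical_flip(image):
    """Reverse the row order (top-bottom flip)."""
    return image[::-1]

def rotate_90_clockwise(image):
    """Rotate by 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def rescale(image, factor=1.0 / 255.0):
    """Scale pixel values, e.g. from [0, 255] to [0, 1]."""
    return [[pixel * factor for pixel in row] for row in image]

def augment(image, rng):
    """Apply randomly chosen transforms; the source image is left unchanged."""
    out = image
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    if rng.random() < 0.5:
        out = vertical_flip(out)
    if rng.random() < 0.5:
        out = rotate_90_clockwise(out)
    return rescale(out)

# Toy 2 x 3 grayscale "image" (hypothetical pixel values).
toy = [[0, 51, 102], [153, 204, 255]]
augmented = augment(toy, random.Random(42))
```

Each call to `augment` returns a new grid and never modifies or stores a copy of the source image, which corresponds to augmenting during training rather than enlarging the dataset on disk.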

Fine-Tuning of Hyperparameters in Pre-Trained Models
The advantages of the transfer learning model are that it learns faster compared to models built from scratch and that layers of the model can be frozen and the last layers trained for more accurate classification. Initially, certain standardizations of the hyperparameters for different pre-trained models were performed. The details of the hyperparameter tuning are listed in Table 3. The models were optimized using stochastic gradient descent. The initial learning rates of the DenseNet-121, ResNet-50, VGG-16, and Inception V4 models were set to 0.001. Each model was run for 30 epochs and the dropout value was fixed as 0.5. In our experiment, the output graph started to converge after a few iterations (i.e., from 30 epochs the graph started to converge); thus, our experiment overcame overfitting and degradation issues.
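The combination of frozen layers and stochastic gradient descent can be sketched as an update rule that skips frozen parameters. The snippet below is a toy illustration with scalar parameters; the parameter names, gradient values, and the function `sgd_step` are hypothetical, and only the learning rate of 0.001 is taken from the text.

```python
def sgd_step(params, grads, trainable, learning_rate=0.001):
    """One plain SGD update that skips parameters marked as frozen."""
    return {
        name: value - learning_rate * grads[name] if trainable[name] else value
        for name, value in params.items()
    }

# Hypothetical two-parameter "model": a frozen backbone weight and a
# trainable classifier weight (names and numbers are illustrative only).
params = {"backbone.w": 0.50, "classifier.w": -0.20}
grads = {"backbone.w": 4.0, "classifier.w": -10.0}
trainable = {"backbone.w": False, "classifier.w": True}

updated = sgd_step(params, grads, trainable)
```

Only the classifier weight moves, so the features learned on ImageNet are retained while the final layers adapt to the plant disease classes.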

Network Architecture Model
The pre-trained network models were chosen based on their applicability for the plant disease classification task. The details of the model architecture are given in Table 4. Each network has different filter sizes for extracting specific features from feature maps. Filters play a key role in feature extraction. Further, each filter, when convolved with the input, will extract different features from it, and the specific feature extraction from the feature maps depends on the specific values of the filters. In our experiments, we used the actual pre-trained network models with the actual combinations of convolution layers and actual filter sizes used for each network model. For VGG-16, the input image dimensions are 224 × 224 × 3, and the network has 64 channels in the first two layers with a filter size of 3 × 3 and stride of 2. The next two layers in the VGG-16 have 256 channels with 3 × 3 filters; these are followed by a max-pooling layer with a stride of 2. After the pooling layer, there are two convolution layers with 256 channels and a 3 × 3 filter size. Following the two convolution layers, there are two sets of three convolution layers, along with a pooling layer, with 3 × 3 filters. The network includes one flatten layer, five max pool layers, and two dense layers.

Inception V4 Tuning Details
The Inception V4 block has two phases: one is for feature extraction and the other uses fully connected layers. Inception V4 includes a stem block and the Inception A, B, and C blocks, which are followed by the reduction blocks A and B and an auxiliary classifier block.

ResNet-50 Tuning Details
This residual CNN network has 50 layers, and the first layer is a convolutional layer with kernel size 7 × 7, a stride of 2, and 64 channels. The next three stages are convolution layers with filter sizes of 1 × 1, 3 × 3, and 1 × 1 and 64, 64, 256 channels. These are repeated three times. Similarly, the next convolution layers are repeated four times and the subsequent convolutional blocks are repeated six times.
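The count of 50 layers follows from the block repetitions. The sketch below assumes the standard ResNet-50 layout, in which a final stage of three bottleneck blocks (not listed in the description above) follows the stages repeated three, four, and six times, and in which the first 7 × 7 convolution uses a padding of 3; the helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# Bottleneck blocks per stage in the standard ResNet-50 layout (assumed).
# The final stage of 3 blocks is not listed in the text above.
BLOCKS_PER_STAGE = [3, 4, 6, 3]
CONVS_PER_BLOCK = 3  # 1x1, 3x3, 1x1

def resnet50_weight_layers():
    """Initial 7x7 convolution + bottleneck convolutions + final dense layer."""
    return 1 + CONVS_PER_BLOCK * sum(BLOCKS_PER_STAGE) + 1

# A 224-pixel input passed through the 7x7, stride-2 first layer.
stem_size = conv_output_size(224, kernel=7, stride=2, padding=3)
layers = resnet50_weight_layers()
```

The stride of 2 in the first layer halves the 224-pixel input, and the sixteen bottleneck blocks account for 48 of the 50 weight layers.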

DenseNet-121 Tuning Details
DenseNet-121 increases the depth of the convolutional neural network by solving the vanishing gradient issues. It has four dense blocks. In the first dense block, convolution is performed with 1 × 1 and 3 × 3 filter sizes, and this is repeated six times. Similarly, in the second dense block, convolution is performed using the filter sizes 3 × 3 and 1 × 1 and the steps are repeated 12 times. In the third dense block, convolution operations with the same filter size are repeated 24 times, and in the fourth dense block, the steps are repeated 16 times. In between the dense blocks are transition blocks with convolution and pooling layers.
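The repetition counts above explain both the name DenseNet-121 and how the channel count evolves. The sketch below assumes the commonly used defaults of a growth rate of 32, 64 initial channels, and transition blocks that halve the channel count; these three values and the function name `densenet121_summary` are assumptions, not settings reported in this paper.

```python
BLOCK_LAYERS = [6, 12, 24, 16]  # repetitions per dense block, as in the text

def densenet121_summary(growth_rate=32, initial_channels=64, compression=0.5):
    """Return (weight_layers, channels_after_each_block)."""
    channels = initial_channels
    channels_after_block = []
    for i, repeats in enumerate(BLOCK_LAYERS):
        # Each repetition concatenates growth_rate new feature maps.
        channels += repeats * growth_rate
        channels_after_block.append(channels)
        if i < len(BLOCK_LAYERS) - 1:
            # Transition block: 1x1 convolution shrinks channels, then pooling.
            channels = int(channels * compression)
    # 1 stem conv + 2 convs per repetition + 1 conv per transition + 1 dense layer
    weight_layers = 1 + 2 * sum(BLOCK_LAYERS) + (len(BLOCK_LAYERS) - 1) + 1
    return weight_layers, channels_after_block

layers, channels = densenet121_summary()
```

Each repetition adds only a small, fixed number of feature maps, which is how the network stays compact despite its depth.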

Results and Discussion
This part of the study employed state-of-the-art deep learning models using the transfer learning approach for the diagnosis of plant diseases. PlantVillage, a publicly available dataset, was used to further train the pre-trained deep CNN networks, which were previously trained with the ImageNet dataset. For our experiment, each model was standardized with a learning rate of 0.01, a dropout of 0.5, and 38 output classes.
The dataset was split into training, test, and validation samples. A total of 80% of the samples from PlantVillage were used for training the pre-trained Inception V4, VGG-16, ResNet, and DenseNet-121 models. Each model was run for 30 epochs and it was found that our model started to converge after 10 epochs with high accuracy. The graph in Figure  5a depicts the recognition accuracy of the Inception V4 model. The training accuracy achieved using the inception V4 model was 99.78, and Figure 5b shows the log loss of the Inception V4 model. The second experiment evaluated the VGG-16 model using the same dataset. After standardization of the hyperparameters, the model was trained with 80% of the same dataset, with 10% used for testing and the remaining 10% of the image samples used for testing and validation. It can be observed from Figure 6a that the model recognition accuracy reached around 78% in the initial 10 epochs, after which is steadily increased to attain the maximum recognition accuracy of 84.27%, which was lower than the Inception V4 model. The training loss and the validation model were found to be 0.52% and 0.64%, respectively, as seen in Figure 6b. The second experiment evaluated the VGG-16 model using the same dataset. After standardization of the hyperparameters, the model was trained with 80% of the same dataset, with 10% used for testing and the remaining 10% of the image samples used for testing and validation. It can be observed from Figure 6a that the model recognition accuracy reached around 78% in the initial 10 epochs, after which is steadily increased to attain the maximum recognition accuracy of 84.27%, which was lower than the Inception V4 model. The training loss and the validation model were found to be 0.52% and 0.64%, respectively, as seen in Figure 6b. The third experiment was undertaken with the ResNet-50 model. 
The same method was applied in the evaluation of model loss and recognition accuracy, and the graphs for recognition accuracy and validation and training loss are plotted in Figure 7a.
After hyperparameter standardization, the final experiment was executed with DenseNet-121, which has 121 layers with four dense blocks and a transition layer between consecutive dense blocks. Figure 8a,b show the training and validation accuracy and loss over 30 epochs. In the testing phase after training, the maximum accuracy achieved was 99.81% and the maximum validation loss calculated was 0.0154%. A comparative performance analysis of the pre-trained network model experiments is shown in Table 5.
In agricultural production, early diagnosis of crop disease is essential for high yields. To maintain a high production rate, the latest technologies should be implemented in the early diagnosis of plant disease.
It was observed from the literature study that deep learning models are efficient in image classification, and that transfer-learning-based models are efficient in reducing training complexity and the need for huge datasets. Hence, in this work, we evaluated four pre-trained models (VGG-16, ResNet-50, Inception V4, and DenseNet-121) to determine which was best capable of classifying various plant diseases.
The results for the pre-trained models were evaluated with metrics such as specificity, sensitivity, and F1 score. The validation accuracy in terms of the F1 score was calculated, and a graphical representation of the validation accuracy for the pre-trained models is depicted in Figure 9. DenseNet-121 (Figure 9d) outperformed the other network models (Figure 9a-c) and attained the highest validation peak of 0.998, which is very close to the maximum F1 score of 1. In general, an F1 score ranges from 0 to 1, and a model performs better the closer its score is to 1. After repeating the same experiments for all the pre-trained models, we found that the highest validation accuracy in terms of the F1 score was achieved by DenseNet-121 at 0.998, whereas it was 0.887 for Inception V4, 0.901 for VGG-16, and 0.935 for ResNet-50.
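For reference, the F1 score is the harmonic mean of precision and recall, which is why it is bounded between 0 and 1. The following is a minimal Python sketch of that calculation, not the evaluation code used in this study. The function names, the example counts, and the choice of macro-averaging across classes are illustrative assumptions, since the averaging scheme is not stated here.

```python
from typing import Iterable, Tuple


def f1_score_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 for one class: harmonic mean of precision and recall.

    tp, fp, fn are the true-positive, false-positive, and false-negative
    counts for that class. Returns 0.0 when there are no true positives,
    because precision and recall are then both zero or undefined.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class_counts: Iterable[Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = [f1_score_from_counts(tp, fp, fn) for tp, fp, fn in per_class_counts]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical counts for two classes, for illustration only.
    counts = [(8, 2, 2), (10, 0, 0)]
    print(macro_f1(counts))
```

With macro-averaging, every class contributes equally regardless of how many images it has, so a weak result on a small class is not hidden by a strong result on a large one.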
A performance analysis of the pre-trained network models based on specificity, sensitivity, and F1 score is shown in Figure 10. The vanishing gradient issues that arise in deep networks were addressed with skip connections and regularization techniques, such as batch normalization. With deeper models, various challenges, such as overfitting, covariate shift, and training time complexity, occurred. To overcome these challenges in our experiments, we fine-tuned the hyperparameters. Sensitivity measures the proportion of actually healthy plants classed as healthy (true positives) among all actually healthy plants, including those classed as unhealthy (false negatives). From the evaluation, it was observed that ResNet-50 and DenseNet-121 performed better than the VGG-16 and Inception V4 models.
Specificity measures the proportion of actually unhealthy plants predicted to be unhealthy (true negatives) among all actually unhealthy plants, including those predicted to be healthy (false positives):
$$\text{specificity} = \frac{\text{True Negative}}{\text{True Negative} + \text{False Positive}} \tag{4}$$
Table 6 presents a comparison of the obtained results with those from state-of-the-art studies from the literature that used transfer learning models. We considered state-of-the-art studies that experimented on the PlantVillage dataset. It was observed from the analysis that our work considered more plant disease classes. Further, our fine-tuned, pre-trained model achieved the best accuracy of 99.81%.
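As a worked illustration of the sensitivity and specificity measures defined in this section, the sketch below derives both from a multi-class confusion matrix by treating one class as positive and all others as negative. This one-vs-rest treatment, the function names, and the small 3-class matrix are assumptions made for illustration; they are not taken from the experiments reported here.

```python
from typing import List, Tuple


def one_vs_rest_counts(confusion: List[List[int]], k: int) -> Tuple[int, int, int, int]:
    """Return (tp, fn, fp, tn) for class k.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                  # class k predicted as another class
    fp = sum(row[k] for row in confusion) - tp   # another class predicted as k
    tn = total - tp - fn - fp
    return tp, fn, fp, tn


def sensitivity(tp: int, fn: int) -> float:
    """True positives divided by all actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn: int, fp: int) -> float:
    """True negatives divided by all actual negatives, as in Equation (4)."""
    return tn / (tn + fp) if (tn + fp) else 0.0


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix, for illustration only.
    confusion = [
        [50, 2, 3],
        [4, 40, 1],
        [0, 5, 45],
    ]
    tp, fn, fp, tn = one_vs_rest_counts(confusion, 0)
    print(sensitivity(tp, fn), specificity(tn, fp))
```

Repeating the calculation for each class index gives per-class values that can then be averaged or inspected individually.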

Conclusions
In this work, we analysed different transfer learning models for the accurate classification of 38 classes of plant disease. State-of-the-art convolutional neural networks were standardized and evaluated using transfer learning, based on classification accuracy, sensitivity, specificity, and F1 score. The performance analysis of the pre-trained architectures showed that DenseNet-121 outperformed ResNet-50, VGG-16, and Inception V4. DenseNet-121 was also comparatively easy to train, as it has fewer trainable parameters and lower computational complexity. Hence, DenseNet-121 is the more suitable choice for plant disease identification, particularly when a new plant disease needs to be added to the model, because retraining is less complex. The proposed model achieved a classification accuracy of 99.81% and an F1 score of 99.8%.
In future work, we will address the problems of real-time data collection and develop a multi-object deep learning model that can detect plant diseases from a cluster of leaves rather than a single leaf. Furthermore, we are working towards a mobile application built on the trained model from this work, which will help farmers and the agricultural sector with real-time leaf disease identification.