A New Compact Method Based on a Convolutional Neural Network for Classification and Validation of Tomato Plant Disease

Abstract: With recent advancements in classification methods across various domains, deep learning has shown remarkable results over traditional neural networks. A compact convolutional neural network (CNN) model with reduced computational complexity that performs equally well compared to the pretrained ResNet-101 model was developed. This three-layer CNN model was developed for plant leaf classification in this work. The classification of disease in tomato plant leaf images of the healthy and disease classes from the PlantVillage (PV) database is discussed. Further, the models were validated with images taken at "Krishi Vigyan Kendra Narayangaon (KVKN)," Pune, India. The disease categories were chosen based on their prevalence in Indian states. The proposed approach improves on other state-of-the-art methods, achieving classification accuracies of 99.13%, 99.51%, and 99.40% with the N1, N2, and N3 models, respectively, on the PV dataset. Experimental results demonstrate the validity of the proposed approach under complex background conditions. For the images captured at KVKN for predicting tomato plant leaf disease, the validation accuracy was 100% for the N1 model, 98.44% for the N2 model, and 96% for the N3 model. The training time for the developed N2 model was reduced by 89% compared to the ResNet-101 model. The developed models are smaller, more efficient, and less time-complex. Their performance is a significant step towards managing infected plants, which will help farmers and contribute to sustainable agriculture.


Introduction
As the world's population grows, so does the demand for healthy, high-quality food. Agriculture is one of the most significant factors in the economy of many countries; it is considered a way of life and a national focus. Farming enables even people with little or no prior experience to grow plants or crops [1]. Farmers should take preventive steps to protect the farm from diseases, which can be proactively prevented if the cause of the disease is known in advance; the traditional techniques used are cumbersome and expensive [2,3]. Diagnosis by experts can be mistaken owing to the sizeable cultivating area they have to inspect, and treating the plants may then not be sufficient to save them or reduce the diseases [4]. Out of concern, farmers spray pesticides or chemical fertilizers to get rid of the diseases; however, this harms the crop along with the ecosystem.
The multidisciplinary approach incorporates botanical data, the species concept, and computer-aided plant classification solutions [5]. Botanists can now use computer vision for plant identification. In related work, disease in tomato fruit was classified with ResNet-50 and ResNet-101 into healthy and disease cases, with a mean average precision of 90.87% from ResNet-101 [31]. Li et al. [32] achieved an accuracy of 95% with the same model using different CNN-based training for remotely sensed images. Rangarajan et al. [17] classified tomato plant disease (TPD) with AlexNet and VGG16 for one healthy and six disease classes. Brahimi et al. [33] classified TPDs with the AlexNet and GoogLeNet models; for the PV dataset, they attained accuracies of 98.66% and 99.18%, respectively. Zhang et al. [34] used the ResNet-50 model to identify tomato leaf disease and achieved an accuracy of 97.28%. Karthik et al. [35], in their work on the detection of TPD, attained an accuracy of 95% with a residual CNN model and 98% with an attention-based residual CNN model. In detecting tomato plant leaves (TPL) with disease, Gonzalez et al. [36] used four models, MobileNetV2, NasNetMobile, Xception, and MobileNetV3, on the PV dataset and achieved accuracies of 75%, 84%, 100%, and 98%, respectively. Table 1 provides a comparative study of related work in plant disease classification. The PV database consists of leaf images of 14 plant species in 38 healthy and disease classes [37]. This paper offers TPD classification with the proposed compact CNN models. In this work, a database with the TPDs that occur in Indian states and the healthy type was chosen for analysis. Classification of nine leaf classes was performed, consisting of Tomato Healthy (H) and the disease classes Bacterial Spot (BS), Early Blight (EB), Late Blight (LB), Leaf Mold (LM), Mosaic Virus (MV), Septoria Leaf Spot (SLS), Target Spot (TS), and Yellow Leaf Curl Virus (YLCV). The performances of the developed models are compared with that of ResNet-101 with transfer learning.
The proposed models have less depth as compared to ResNet-101. The main contributions of this work are as follows:

1. Three highly accurate and compact models, N1, N2, and N3, have been proposed for the disease classification of TPL. The proposed models show high classification accuracy and require short training times. Their performances were validated by employing them to classify TPL from the challenging PV dataset and the KVKN dataset; the models exhibited high classification accuracy for an unknown dataset.

2. The proposed models maintained good classification accuracy with compact model size. N1 and N3 were 8.5 MB in size, and the N2 model was 17.14 MB.

3. To validate the versatility of the proposed models, they were also employed in tomato leaf disease classification using images captured with a mobile phone. The disease classification accuracy shows that the proposed models are well suited for TPL disease classification.
The paper describes the materials and methods in Section 2, followed by results and discussions in Section 3, and the conclusion in Section 4.

Materials and Methods
This research involved the classification of TPDs and the validation of the trained model with unknown data. Figure 1 depicts the workflow for classifying nine classes of TPL.

Dataset and Pre-Processing
The TPL images from the PV database were used in this work [37]. The healthy tomato class and eight diseased leaf categories found in Indian states were used for classification: Tomato Healthy (H) and the disease classes Bacterial Spot (BS), Early Blight (EB), Late Blight (LB), Leaf Mold (LM), Mosaic Virus (MV), Septoria Leaf Spot (SLS), Target Spot (TS), and Yellow Leaf Curl Virus (YLCV). It is critical to adhere to the basic pre-processing steps that are customary in such studies for the actual operation of any algorithm and the preservation of uniformity [34,[38][39][40][41]. During the pre-processing stage, the dataset was augmented with color augmentation of saturation, hue, and contrast; position augmentation of rotation by 45°, 135°, 225°, and 315°; and horizontal and vertical flipping. Saturation augmentation modifies the image's vibrancy: a grayscale image is fully desaturated, a partially desaturated image has muted colors, and positive saturation shifts colors closer to the primary colors. Adjusting the saturation of an image can help a model perform better. Hue augmentation changes the color channels of an input image at random, causing a model to consider alternative color schemes for objects and scenes in the input image. This technique helps to ensure that a model does not memorize the colors of a given object or scene; it allows a model to consider the edges and shapes of objects as well as their colors. Contrast is defined as the degree of separation between an image's darkest and brightest areas. The dataset was augmented with this combination of color and position augmentation. The augmented dataset consisted of 94,500 images, resized to a standard size of 256 × 256 × 3 for the developed N1, N2, and N3 models and 224 × 224 × 3 for the ResNet-101 model.
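As a rough illustration (not the exact MATLAB pipeline used in this work), the flip, saturation, and contrast transformations described above can be sketched in NumPy; rotation by arbitrary angles and hue shifts would normally rely on an image-processing library:

```python
import numpy as np

def flip_horizontal(img):
    # Mirror the image along its width axis (H x W x 3 array).
    return img[:, ::-1, :]

def flip_vertical(img):
    # Mirror the image along its height axis.
    return img[::-1, :, :]

def adjust_saturation(img, factor):
    # Blend each pixel toward its grayscale value: factor 0 gives a
    # fully desaturated image, 1 keeps the original, >1 boosts vibrancy.
    gray = img.mean(axis=2, keepdims=True)
    return np.clip(gray + factor * (img - gray), 0, 255)

def adjust_contrast(img, factor):
    # Scale pixel values away from the image mean: factor > 1 widens
    # the separation between the darkest and brightest areas.
    mean = img.mean()
    return np.clip(mean + factor * (img - mean), 0, 255)

img = np.random.randint(0, 256, size=(256, 256, 3)).astype(np.float64)
augmented = [flip_horizontal(img), flip_vertical(img),
             adjust_saturation(img, 1.5), adjust_contrast(img, 1.5)]
```

Each transformed copy is added to the dataset alongside the original, which is how the 94,500-image augmented set is built up from far fewer source images.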
Table 2 shows the number of PV dataset images for each class before and after data augmentation. The KVKN dataset, which the authors collected on the farm of KVKN and did not augment, was used to test the predictions of the trained models. Table 2. Class-wise image data before and after augmentation of the PV database.

CNN Models
The primary goal of this research was to develop a computationally less complex and precise learning model for classifying TPL. Figure 2 depicts the proposed compact CNN model for the classification and validation of TPD. The three CNN models have variations in the Conv2D layers, as shown in Table 3. There are three sets of convolutional layers, each consisting of a Conv2D layer, a batch normalization layer, and a ReLU layer. A max-pooling layer follows each of the first two sets; a fully connected layer, softmax classifier, and classification layer follow the third set. The functional descriptions of the convolutional layers for the developed CNN model 1 (N1), model 2 (N2), and model 3 (N3) are shown in Table 3. Table 3. Functional description of convolution layers for the N1, N2, and N3 models.

The convolutional layer describes a collection of filters carrying out convolution over the entire image. In this architecture, each convolutional layer learns the numerous features that detect discriminatory outlines in the tomato leaves to distinguish the type of disease. A CNN's feature extractor comprises particular neural network layers whose weights are decided during the training process. In a deep neural network, each layer sees differently distributed feature inputs from the preceding layer after every gradient update on the dataset. Furthermore, as the parameters of the initial layers are updated through the training phase, the distribution of each layer's input feature maps shifts significantly. This significantly impacts training speed and necessitates various heuristics for parameter initialization [35]. This model employs an activation function known as the rectified linear unit (ReLU). It is the identity function, f(x) = x, for all positive values of the input x, and zero for negative values. ReLU activates sparsely, mimicking a neuron's inactivity in response to certain impulses. The pooling layer activates only a subset of the feature-map neurons: a 2-by-2 window is used across all blocks with a stride factor of 2, combining neighboring pixels into a single pixel. The feature maps' width and height are effectively reduced while the number of channels remains constant. The network stacks convolutional layers, best described as a series of digital filters that transform the image, with pooling layers for feature extraction; the classification stage then operates on the extracted image features and generates the output. Batch normalization significantly reduces training time by normalizing the input of each layer in the network, not just the input layer.
This method allows for higher learning rates, which reduces the number of training steps required for the network to converge [42]. The softmax function is the activation function in the CNN model's output layer; it predicts a multinomial probability distribution. The benefit of small filter sizes is that they minimize computing costs, and weight sharing results in fewer back-propagation weights; to date, the best choice for practitioners has been 3 × 3 [43,44]. The N1 CNN model has a fixed filter size of 3 × 3 in all three convolution layers: there are eight filters in the 1st Conv2D, and 16 and 32 filters in the 2nd and 3rd Conv2D, respectively. In the N2 CNN model, the filter size is also 3 × 3, and the number of filters is doubled compared to N1. In the N3 CNN model, the filter size for the 1st Conv2D layer is 7 × 7, with eight filters; the 2nd Conv2D layer is 5 × 5, with 16 filters; and the 3rd Conv2D layer is 3 × 3, with 32 filters. The VGG16 model [45] is a 16-layer CNN model, shown in Figure 3. Each of its convolutional layers is followed by a ReLU activation layer; all convolutional layers have a filter size of 3 × 3 but a specific number of filters. A max-pooling layer follows each of two sets of two convolutional-layer-plus-ReLU combinations, and a max-pooling layer follows each of the next three sets of three such combinations; these layers are followed by the fully connected layer, softmax layer, and classification layer. The proposed N1, N2, and N3 models additionally have batch normalization layers. The top-5 error for the ResNet-50 model is 5.25; for ResNet-101 it is 4.60; and for ResNet-152 it is 4.49 [11]. ResNet-101 performs between ResNet-50 and ResNet-152, balancing accuracy against depth, so ResNet-101 was chosen for the classification in this work.
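As a back-of-the-envelope check on the models' compactness, the convolutional-layer parameter counts implied by Table 3 can be computed directly. This sketch assumes N2 doubles N1's filter counts, as stated above, and counts only the convolution weights and biases; the fully connected layer contributes most of the saved model's on-disk size:

```python
def conv2d_params(kh, kw, c_in, c_out):
    # One (kh x kw x c_in) kernel plus one bias per output filter.
    return (kh * kw * c_in + 1) * c_out

# (kernel height, kernel width, number of filters) per layer, from Table 3.
models = {
    "N1": [(3, 3, 8), (3, 3, 16), (3, 3, 32)],
    "N2": [(3, 3, 16), (3, 3, 32), (3, 3, 64)],   # filters doubled vs. N1
    "N3": [(7, 7, 8), (5, 5, 16), (3, 3, 32)],
}

for name, layers in models.items():
    c_in, total = 3, 0                            # RGB input
    for kh, kw, c_out in layers:
        total += conv2d_params(kh, kw, c_in, c_out)
        c_in = c_out
    print(name, total)                            # N1 6032, N2 23584, N3 9040
```

Even the largest of the three stays under 24,000 convolutional parameters, consistent with the small model sizes reported later.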
We used the ResNet-101 model with transfer learning and the proposed N1, N2, and N3 models to classify the nine TPL classes. The augmented dataset was used to train the CNN models for TPL classification. The models in this work were created in MATLAB R2019b using the Deep Learning Toolbox. The dataset, comprising healthy and diseased plant leaves, was split into training and testing datasets in an 80-20% ratio. The performance parameters evaluated were macro-recall, macro-precision, macro-F1-score, and mean accuracy. Sensitivity/recall measures how well the model detects the positive class and is also known as the true positive rate. Precision, also known as positive predictive value, measures how reliably the model assigns positive events to the positive class. The F1-score is the harmonic mean of recall and precision. Macro-recall is the average per-class effectiveness of a classifier at identifying class labels; macro-precision is the per-class average agreement of the data class labels with the classifier; and macro-F1-score is the per-class average relation between the positive labels of the data and those assigned by the classifier. Accuracy is the ratio of correct predictions to all predictions. The macro-averaged metrics are obtained by computing each metric per class and averaging: Macro-recall = (1/C) Σ Recall_i, Macro-precision = (1/C) Σ Precision_i, and Macro-F1-score = (1/C) Σ F1_i, where C represents the number of classes.
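The metrics described above can be sketched as a small helper working from a confusion matrix. This is a generic illustration, not the MATLAB code used in the study; macro-F1 is computed here as the per-class average of F1-scores, matching the per-class-average definition in the text:

```python
def macro_metrics(cm):
    # cm[i][j]: number of samples of true class i predicted as class j.
    C = len(cm)
    per_p, per_r, per_f1 = [], [], []
    for k in range(C):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(C)) - tp   # column sum minus TP
        fn = sum(cm[k][j] for j in range(C)) - tp   # row sum minus TP
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        per_p.append(p)
        per_r.append(r)
        per_f1.append(2 * p * r / (p + r) if p + r else 0.0)
    accuracy = sum(cm[k][k] for k in range(C)) / sum(map(sum, cm))
    # Macro metrics: unweighted average of the per-class values.
    return (sum(per_p) / C, sum(per_r) / C, sum(per_f1) / C, accuracy)
```

For a 9-class TPL confusion matrix such as those in Table 5, this returns macro-precision, macro-recall, macro-F1-score, and overall accuracy in one pass.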

Validation of the Trained CNN Model
Following classification, the CNN models were validated using images from the PV database that were not included in the training or testing sets and images taken at KVKN. Models were validated using 1090 images, which aided in predicting the class and accuracy of unknown data.

Results and Discussion
The entire investigation was carried out on an augmented dataset of 94,500 images from the PV database for nine tomato plant classes. Figure 4 depicts healthy and diseased TPL images. Data augmentation and image resizing were performed as per the pre-processing in Section 2.1. Transformations were used to increase the data, to avoid overfitting the training models, and to generalize their responses. The dataset was augmented with color augmentation of saturation, hue, and contrast; position augmentation of rotation by 45°, 135°, 225°, and 315°; and horizontal and vertical flipping. Figure 5 shows some of the pre-processed TPL images with color augmentation of hue, saturation, and contrast: the first row shows the original images of different TPL classes, and the images in the second, third, and fourth rows show the saturation, hue, and contrast augmentation, respectively, of the images in row one. Overfitting occurs when the model fits the training data well but does not generalize to new, previously unseen data. Overfitting can be prevented by measures such as data augmentation, simplifying the models, dropout, regularization, and early stopping [48,49]. To ensure consistency, all networks used here had the same hyperparameters: the mini-batch size was set to 10, the number of epochs to 2, and the learning rate to 0.0001. The training loss (TL) was reduced to a minimum value in two epochs; hence, two epochs were used. The training accuracy (TA) and TL, along with the validation accuracy (VA) and validation loss (VL), are shown in Figure 6 for the ResNet-101, N1, N2, and N3 models. Increasing TA and VA with decreasing TL and VL show that overfitting was prevented. The TA and TL for the models are shown in Figure 6a for ResNet-101, Figure 6c for N1, Figure 6e for N2, and Figure 6g for N3.
The VA and VL for the models are shown in Figure 6b for the ResNet-101 model, Figure 6d for the N1 model, Figure 6f for the N2 model, and Figure 6h for the N3 model. TL, VA, and VL were calculated for the ResNet-101, N1, N2, and N3 models (Figure 7).
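For reference, the hyperparameters above imply the following training workload per model, a simple calculation assuming the full augmented set of 94,500 images is split 80-20 before training:

```python
total_images = 94_500
train_images = int(total_images * 0.80)        # 80-20 train-test split
mini_batch_size = 10
epochs = 2

# Number of gradient-update iterations per epoch and in total.
iterations_per_epoch = train_images // mini_batch_size
total_iterations = iterations_per_epoch * epochs
print(train_images, iterations_per_epoch, total_iterations)  # 75600 7560 15120
```

With 15,120 iterations in total, the compact models can reach their minimum training loss in two epochs, which is why longer schedules were not needed.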
The tomato plant images were classified with the ResNet-101 model with transfer learning and the proposed N1, N2, and N3 models, as shown in Figure 8. For TPD classification, each model was trained with 80% of the dataset and tested with 20%. The classified images with the N1 model are shown in Figure 8a. Table 4 shows the classification accuracies of the N1, N2, N3, and ResNet-101 models for 80% of the training dataset, and compares previous work on plant leaf classification with the proposed work. Brahimi et al. [33] attained accuracies of 98.66% and 99.18% for AlexNet and GoogLeNet, respectively; N1, N2, and N3 achieved accuracies of 99.13%, 99.51%, and 99.40%, respectively. The input size of images for AlexNet was 227 × 227 × 3, and it was 224 × 224 × 3 for GoogLeNet, VGG16, and the ResNet models. The ten-layer CNN model was fed images of size 64 × 64 × 3, and the attention-based residual CNN images of size 256 × 256 × 3. All the models compared in Table 4 were trained with the PV database. The ten-layer CNN model by [22] achieved an accuracy of 87.92% with the Flavia dataset and 84.02% with the PV dataset. When identifying TPL disease, Anandhakrishnan et al. [50] achieved an accuracy of 99.45% with the Xception V4 model. Qiu et al. [8], in their work on plant disease recognition on a self-collected dataset, achieved an average accuracy of 97.62%: the VGG16 model was used to train a "teacher model" with a better recognition rate and a much larger volume than the "student model," and the information was then transferred to MobileNet via distillation, reducing the model size to 19.83 MB. The classification accuracy for the pretrained VGG16 model was 99.21%, and the size of the trained model was 477 MB. The proposed trained models N1 and N3 were 8.5 MB, and the N2 model was 17.14 MB. The pretrained ResNet-101 demonstrated a classification accuracy of 99.97%, and the size of the trained model was 151 MB.
AlexNet, GoogLeNet, and VGG16 had larger model sizes than the N1, N2, and N3 models. The developed N2 model achieved accuracy in the same range as ResNet-101 and VGG16 while being 88.65% smaller than ResNet-101 and 96.41% smaller than VGG16. A CNN model's training time is also critical, as illustrated in Figure 9. The N1, N2, and N3 models are three-layer CNN models that are compact in size. The N1 model takes less time than the N2 model, which has twice the number of filters. The VGG16 model also took more training time than the proposed models, and there was a steep rise in the training time for the ResNet-101 model: training the N2 model took 89% less time than training the ResNet-101 model. The proposed models have shown better results than state-of-the-art classifiers. The confusion matrix contains information on the correct and incorrect classification of each of the nine classes of tomato leaves; Table 5 shows the confusion matrices for the ResNet-101 and proposed N1, N2, and N3 models. The performance parameters were calculated from the elements of the confusion matrix and are shown for the N1, N2, N3, and ResNet-101 models in Table 6. Brahimi et al. [33] reported a mean accuracy of 99.18% for GoogLeNet. The average precisions reported by [31] for classification of disease in tomato fruit using VGG16, ResNet-50, and ResNet-101 were 88.28%, 89.53%, and 90.87%, respectively. The proposed N1, N2, N3, and ResNet-101 models achieved macro-precisions of 99.13%, 99.51%, 99.40%, and 98.10%, respectively. The proposed N1, N2, and N3 models achieved mean accuracies of 99.81%, 99.89%, and 99.86%, respectively, against a mean accuracy of 99.58% for the ResNet-101 model. The macro-recall, macro-precision, and macro-F1-score of N1, N2, and N3 are higher than those of the ResNet-101 model.
The performance parameters of recall, precision, F1-score, and accuracy for all nine TPL classes for the proposed N1, N2, and N3 models and the ResNet-101 model are shown in Figures 10-13. The recall for the nine classes is shown in Figure 10; the performance of the N2 model is good for all classes, while the recall values for the EB, SLS, and TS classes are low for the ResNet-101 model compared to the proposed N1, N2, and N3 models. Precision for all nine TPL classes was evaluated and is shown in Figure 11, and the F1-score is shown in Figure 12; the ResNet-101 model showed lower performance for EB, LB, LM, SLS, and TS. Accuracy for the TPL classes is shown in Figure 13 and was good for all classes with the proposed N1, N2, and N3 models. The receiver operating characteristic (ROC) curve explicitly states how well the probabilities of the positive and negative classes are distinguished. It is generated by varying the probability threshold and computing the corresponding true positive rate (TPR) and false positive rate (FPR); the x-axis represents the FPR, and the y-axis represents the TPR [53,54]. The area under the curve (AUC) is a critical metric for assessing the performance of any classification model: it summarizes the ROC curve as a measure of the classifier's ability to separate the classes, and the higher the AUC, the better the model is at distinguishing between the positive and negative classes. The ROC curves for the ResNet-101 and proposed N1, N2, and N3 models are shown in Figure 14. The AUC for ResNet-101 and N2 is 100%, and the AUC for N1 and N3 is 99.98%.
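The AUC just quoted can be illustrated through its rank-based interpretation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. A minimal sketch, with hypothetical scores rather than data from this study:

```python
def auc_score(labels, scores):
    # Fraction of positive-negative pairs in which the positive sample
    # receives the higher score; ties count as half a win. This equals
    # the area under the ROC curve traced by sweeping the threshold.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical per-image scores for one disease class (label 1 = that class).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.90, 0.40, 0.60, 0.20, 0.10]
print(auc_score(labels, scores))
```

An AUC of 100%, as reported for ResNet-101 and N2, means every positive sample outscored every negative one at the chosen class boundary.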
This result shows the excellent performance of the N1, N2, N3, and ResNet-101 models in the classification of TPL. The average precision (AP) is an essential parameter in a detection or classification task; it is the area under the precision-recall curve. The AP for the nine classes on the PV dataset is shown in Table 7. The trained N1, N2, N3, and ResNet-101 models were validated with unseen data to predict its class and the corresponding accuracy. The mean accuracy for validating the models on unseen PV data is shown in Figure 15. The N2 model delivered excellent performance for all the classes compared with the N1 and N3 models, showing a mean accuracy of 96.40% for the H and LB classes; EB had low accuracy for all the models. Overall, the N2 model behaved exceptionally well in classification and prediction, along with ResNet-101. Figure 16 shows the trained models' predictions for an image captured at KVKN. The image class was predicted as LB by all the models, with prediction accuracies of 100% for N1, 98.44% for N2, 96% for N3, and 95.95% for ResNet-101. Computational models with robustness and high-precision output have extended their usage to practical application scenarios, including classification in healthcare, industry, etc. The developed N1, N2, and N3 models trained on the PV dataset were able to predict the class of TPL in the KVKN dataset. The models can be deployed via applications on mobile phones in the future, allowing farmers to make quick decisions about tomato plant disease management; the management step for infected plants can be spraying appropriate pesticides or simply removing the infected plants from the field to avoid further spread of disease.

Conclusions
This work used the deep learning models N1, N2, N3, and ResNet-101 to classify TPL images from the PV database. The developed models showed classification accuracy as good as that of ResNet-101, while their training times, compared to the ResNet-101 model, were reduced by 92% for the N1 model, 89% for the N2 model, and 90% for the N3 model. N2 is 88.65% more compact than ResNet-101 and is about as accurate. The developed models outperform ResNet-101 in terms of performance parameters such as macro-precision, macro-F1-score, and mean accuracy. The proposed N2 model had an AUC of 100%, and the N1 and N3 models had an AUC of 99.98%, indicating good classifier performance. The average precision in each tomato plant class was consistently strong, affirming a robust classification process. The PV and KVKN images were used to validate the trained models. The mean accuracies of N1, N2, N3, and ResNet-101 were 99.81%, 99.89%, 99.86%, and 99.58%, respectively, for the PV test dataset. For the LB class of the KVKN dataset, the prediction accuracies of the proposed N1, N2, and N3 models and the ResNet-101 model were 100%, 98.44%, 96%, and 95.95%, respectively. This classification work will assist farmers in detecting disease and taking appropriate management steps, which will benefit society.
In the future, the models can be deployed via an application to mobile phones that can help farmers make rapid decisions about the management of tomato plant diseases.