Automatic and Reliable Leaf Disease Detection Using Deep Learning Techniques

Plants are a major source of food for the world population. Plant diseases contribute to production loss, which can be tackled with continuous monitoring. Manual plant disease monitoring is both laborious and error-prone. Early detection of plant diseases using computer vision and artificial intelligence (AI) can help to reduce the adverse effects of diseases and also overcome the shortcomings of continuous human monitoring. In this work, we propose the use of a deep learning architecture based on a recent convolutional neural network called EfficientNet on 18,161 plain and segmented tomato leaf images to classify tomato diseases. The performance of two segmentation models, i.e., U-net and Modified U-net, for the segmentation of leaves is reported. The comparative performance of the models for binary classification (healthy and unhealthy leaves), six-class classification (healthy and various groups of diseased leaves), and ten-class classification (healthy and various types of unhealthy leaves) is also reported. The Modified U-net segmentation model showed accuracy, IoU, and Dice score of 98.66%, 98.5%, and 98.73%, respectively, for the segmentation of leaf images. EfficientNet-B7 showed superior performance for the binary classification and six-class classification using segmented images with an accuracy of 99.95% and 99.12%, respectively. Finally, EfficientNet-B4 achieved an accuracy of 99.89% for ten-class classification using segmented images. It can be concluded that all the architectures performed better in classifying the diseases when trained with deeper networks on segmented images. The performance of each of the experimental studies reported in this work exceeds that reported in the existing literature.


Introduction
Agriculture contributed to the domestication of today's major food crops and animals thousands of years ago. Food insecurity [1], of which plant diseases are a major cause [2], is one of the major global problems that humanity faces today. According to one study, plant diseases account for around 16 percent of global crop yield loss [3]. Pest losses are projected to be about 50 percent for wheat and 26-29 percent for soybeans globally [3]. Fungi, fungus-like organisms, bacteria, viruses, viroids, virus-like organisms, nematodes, protozoa, algae, and parasitic plants are the main classes of plant pathogens. Many applications have benefited from artificial intelligence (AI), machine learning (ML), and computer vision, including power prediction from renewable resources [4,5] and biomedical applications [6,7]. During the COVID-19 pandemic, AI was used extensively for the identification of lung-related diseases [8][9][10][11] as well as other prognostic applications [12]. Similar advanced technologies can be used to mitigate the negative effects of plant diseases by detecting and diagnosing them at an early stage. The application of AI and computer vision to the automatic detection and diagnosis of plant diseases is currently being studied extensively because manual plant disease monitoring is tedious, time-consuming, and labor-intensive. Sidharth et al. [13] applied a Bacterial Foraging Optimization-based Radial Basis Function Network (BRBFNN) to automatically identify and classify plant disease, achieving a classification accuracy of 83.07%. The convolutional neural network (CNN) is a very popular neural network architecture that has been used successfully for a variety of computer vision tasks in diverse fields [14]. Researchers have used the CNN architecture and its variants for the classification and identification of plant diseases. Sunayana et al. [15] compared different CNN architectures for disease detection in potato and mango leaves, with AlexNet achieving 98.33% accuracy and a shallow CNN model achieving 90.85% accuracy. Guan et al. [15,16] used a pre-trained VGG16 model to predict disease severity in apple plants and achieved a 90.40% accuracy rate. Jihen et al. [17] used the LeNet model to distinguish healthy and diseased banana leaves, achieving a 99.72% accuracy rate.
Tomatoes are a major food crop in many parts of the world, with a per capita consumption of 20 kg per year, accounting for around 15% of total vegetable consumption. North America consumes 42 kg of tomatoes per person per year, while Europe consumes 31 kg per person per year [18,19]. To meet global demand for tomatoes, techniques for improving crop yield and for the early detection of pests and of bacterial and viral infections must be developed. Several studies have been performed to enhance tomato plant survival through early disease detection and subsequent disease management using artificial intelligence-based techniques. Manpreet et al. [20] classified seven tomato diseases with an accuracy of 98.8% using a pre-trained CNN-based architecture known as Residual Network, or ResNet. Rahman et al. [21] proposed a fully-connected deep learning-based network to distinguish Bacterial Spot, Late Blight, and Septoria Spot disease from tomato leaf images with a 99.25% accuracy. Fuentes et al. [22] used three types of detectors to identify 10 diseases from tomato leaf images: Faster Region-based Convolutional Neural Network (Faster R-CNN), Region-based Fully Convolutional Network (R-FCN), and Single Shot Multibox Detector (SSD). For real-time disease and pest recognition, these detectors were combined with different deep feature extractors: VGG16, ResNet50, and ResNet152 for Faster R-CNN, ResNet-50 for SSD, and ResNet-50 for R-FCN, with VGG16 on top of Faster R-CNN achieving the highest Average Precision of 83%. Agarwal et al. [23] proposed the Tomato Leaf Disease Detection (ToLeD) model, a CNN-based architecture for the classification of 10 diseases from tomato leaf images, with a 91.2% accuracy rate. Durmuş et al. [24] used AlexNet and SqueezeNet architectures to classify 10 diseases from images of tomato leaves and achieved a 95.5% accuracy.
While disease classification and identification in plant leaves have been studied extensively for tomatoes and other plants, there has been little research on segmenting leaves from the image background. Since real-world images can differ greatly in terms of lighting conditions, better segmentation techniques can help AI models learn from the correct region of interest rather than the background. U-net is a state-of-the-art deep learning-based image segmentation architecture that was created with biomedical image segmentation in mind [25]. The U-shaped network architecture gives U-net its name. Unlike traditional CNN models, U-net includes convolutional layers that up-sample, or recombine, feature maps into a complete image.
The authors have published articles using the state-of-the-art U-Net for segmentation [25] with very promising results [8]. EfficientNet is a recent classification network [26] and has not been used for the application intended in the paper. Thus, the authors used it in this application and have obtained promising results. The paper has the following main contributions:

1. Different variants of the U-net architecture are investigated to propose the best segmentation model by comparing the model predictions to the ground truth segmented images.
2. Different variants of CNN architecture are investigated for binary and multi-class classification of tomato diseases. Three types of classification were carried out in this work: (a) binary classification of healthy and diseased leaves, (b) six-class classification of healthy leaves and five groups of diseased leaves, and (c) ten-class classification of healthy leaves and nine different disease classes.
3. The performance achieved in this work outperforms the existing state-of-the-art works in this domain.
The rest of the paper is organized in the following manner: Section 1 gives a brief introduction, literature review, and motivation for the study. Section 2 describes the different types of plant pathogens. Section 3 provides the methodology of the study with technical details such as the dataset description, pre-processing techniques, and details of the experiments. Section 4 reports the results of the studies, followed by discussions in Section 5 and finally, the conclusion is provided in Section 6.

Deep Convolutional Neural Networks (CNN)
For detecting tomato leaf diseases, we fine-tuned the EfficientNet CNN proposed by Tan et al. [26]. EfficientNet balances the network's width, depth, and resolution. Tan et al. were the first to empirically quantify the relationship among all three dimensions, unlike other CNN scaling approaches that scale a single dimension. They developed their baseline architecture with the multi-objective neural architecture search of MnasNet [27], which optimizes both accuracy and FLOPS. The resulting EfficientNet-B0 network is similar to MnasNet [27] but slightly larger due to its higher FLOPS target. Its key building block is the mobile inverted bottleneck MBConv [28], to which squeeze-and-excitation optimization [29] is added. Starting from EfficientNet-B0, the compound scaling method uses a compound coefficient ϕ to uniformly scale the network depth, width, and resolution according to the following equation:

$$d = a^{\phi}, \quad w = b^{\phi}, \quad r = c^{\phi}, \quad \text{subject to } a \cdot b^{2} \cdot c^{2} \approx 2,\ a \geq 1,\ b \geq 1,\ c \geq 1 \tag{1}$$

where d, w, and r are the depth, width, and resolution scaling factors, and a, b, and c are constants that can be determined by a small grid search. ϕ is a user-specified coefficient that controls how many additional resources are available for model scaling, while a, b, and c specify how those extra resources are allocated to network depth, width, and resolution, respectively. Fixing a, b, and c as constants and scaling up the baseline network with different ϕ yields a family of EfficientNets (B0 to B7); the baseline network is shown in Table 1. EfficientNet-B7 achieves state-of-the-art performance on ImageNet, with a top-five accuracy of 97.1%, while being 8.4 times smaller and 6.1 times faster on inference than the best existing ConvNets such as SENet [29] and GPipe [30].
To build our model, we used EfficientNet-B0, EfficientNet-B4, and EfficientNet-B7. To enhance accuracy and reduce overfitting, we added a Global Average Pooling (GAP) layer after the network's final convolution layer. Following GAP, we added a Dense layer of size 1024 with 25% dropout. Finally, a Softmax layer gives the probability prediction scores for detecting tomato leaf diseases.
Starting from the baseline EfficientNet-B0, the compound scaling method of Tan et al. [26] scales the network up in two steps. In the first step, ϕ is fixed to 1, assuming that twice as many resources are available, and a small grid search is performed over a, b, and c. In particular, the best values for EfficientNet-B0 are a = 1.2, b = 1.1, and c = 1.15.
In the second step, a, b, and c are fixed as constants and the baseline network is scaled up with different ϕ using Equation (1) to obtain EfficientNet-B1 to B7.
Notably, searching for a, b, and c directly around a large model could yield even better results, but the search becomes prohibitively expensive for larger models. The method avoids this problem by performing a single search on a small baseline network (first step) and then applying the same scaling coefficients to all other models (second step).
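The two-step compound scaling described above can be illustrated with a few lines of arithmetic. The following is a minimal sketch, assuming the constants a = 1.2, b = 1.1, and c = 1.15 quoted in the text and the usual convention that FLOPS grow with depth × width² × resolution². The function names and the integer values of ϕ are illustrative assumptions and are not taken from the EfficientNet release.

```python
# Sketch of compound scaling; constants are the values quoted in the text.
A, B, C = 1.2, 1.1, 1.15  # depth, width, and resolution constants


def compound_scale(phi: float) -> dict:
    """Return the depth, width, and resolution multipliers for a given phi."""
    return {"depth": A ** phi, "width": B ** phi, "resolution": C ** phi}


def flops_factor(phi: float) -> float:
    """Approximate FLOPS growth, assumed proportional to d * w^2 * r^2."""
    s = compound_scale(phi)
    return s["depth"] * s["width"] ** 2 * s["resolution"] ** 2


if __name__ == "__main__":
    # a * b^2 * c^2 is close to 2, so FLOPS roughly double per unit of phi.
    print(f"a*b^2*c^2 = {flops_factor(1):.3f}")
    for phi in range(4):  # illustrative phi values only
        scaled = compound_scale(phi)
        print(phi, {name: round(value, 3) for name, value in scaled.items()})
```

With these constants the product a·b²·c² is about 1.92, which is the sense in which the constraint holds only approximately.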

Segmentation
In the literature, there are many variants of segmentation models based on U-net. In order to use the best performing one, two variants, the original U-Net [25] and the Modified U-Net [31], were investigated in this study. The design of the original U-Net and the Modified U-Net is shown in Figure 1. The original U-Net consists of a contracting path and an expanding path. The contracting path consists of the repeated application of two 3 × 3 convolutions (unpadded convolutions), each followed by a ReLU, and a 2 × 2 max pooling operation with stride 2 for downsampling. Each step of the expanding path consists of an upsampling of the feature map, a 2 × 2 convolution ("up-convolution") that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3 × 3 convolutions, each followed by a ReLU. The network employs a total of 23 convolutional layers.
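The layer count and the effect of unpadded convolutions can be checked with simple bookkeeping. The sketch below assumes four down-sampling steps, a two-convolution bottleneck, and a final 1 × 1 convolution, as in the original U-Net design [25]; the function name `unet_trace` is a hypothetical helper introduced only for this illustration.

```python
def unet_trace(input_size: int, depth: int = 4):
    """Trace spatial size and conv-layer count through the original U-Net.

    Assumes unpadded 3x3 convolutions (each trims 2 pixels), 2x2 max pooling
    with stride 2, and 2x2 up-convolutions that double the spatial size.
    """
    size, convs = input_size, 0
    # Contracting path: two 3x3 convolutions, then 2x2 max pooling.
    for _ in range(depth):
        size -= 4
        convs += 2
        if size % 2:
            raise ValueError("feature map size must be even before pooling")
        size //= 2
    # Bottleneck: two 3x3 convolutions without pooling.
    size -= 4
    convs += 2
    # Expanding path: one up-convolution, then two 3x3 convolutions.
    for _ in range(depth):
        size *= 2
        convs += 1
        size -= 4
        convs += 2
    convs += 1  # final 1x1 convolution
    return size, convs


if __name__ == "__main__":
    print(unet_trace(572))
```

For a 572 × 572 input this bookkeeping gives a 388 × 388 output and 23 convolutional layers, matching the count stated above. Inputs whose intermediate sizes become odd raise an error here, which is one reason implementations that resize to other input sizes often use padded convolutions instead.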
The Modified U-Net [31], a U-Net model with a small variation in its decoding part, is also utilized. In the U-Net model, a contracting path with four encoding blocks is followed by an expanding path with four decoding blocks. Each encoding block is made up of two consecutive 3 × 3 convolutional layers, followed by a downsampling max pooling layer with a stride of 2. Each decoding block is made up of a transposed convolutional layer for upsampling, concatenation with the corresponding feature map from the contracting path, and two 3 × 3 convolutional layers. In the Modified U-Net architecture, the decoding block uses three convolutional layers instead of two: an upsampling layer is followed by two 3 × 3 convolutional layers, a concatenation layer, and another 3 × 3 convolutional layer. Batch normalization and ReLU activation are applied to all convolutional layers. At the final layer, a 1 × 1 convolution maps the output of the last decoding block to a two-channel feature map, and a pixel-wise SoftMax then assigns each pixel to one of two classes, background or leaf.
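The final pixel-wise SoftMax step can be sketched in plain Python. This is only an illustration of the operation on two flat lists of logits, one per output channel; the function name and the 0.5 decision threshold are assumptions, not details taken from the paper.

```python
import math


def pixelwise_softmax_mask(background_logits, leaf_logits):
    """Turn two-channel logits into leaf probabilities and a binary mask."""
    probabilities, mask = [], []
    for b, l in zip(background_logits, leaf_logits):
        m = max(b, l)  # subtract the maximum for numerical stability
        exp_b, exp_l = math.exp(b - m), math.exp(l - m)
        p_leaf = exp_l / (exp_b + exp_l)
        probabilities.append(p_leaf)
        mask.append(1 if p_leaf > 0.5 else 0)  # assumed threshold
    return probabilities, mask


if __name__ == "__main__":
    probs, mask = pixelwise_softmax_mask([2.0, 0.0, -1.0], [0.0, 0.0, 3.0])
    print([round(p, 3) for p in probs], mask)
```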

Visualization Techniques
Visualization techniques have been developed as a result of increased interest in the internal mechanisms of CNNs and in the logic behind a network's particular decisions. These techniques aid the interpretation of CNN decision-making by providing a visual representation of it. They also improve the model's transparency by showing the reasoning behind the inference in a way that is readily understood by humans, thus raising trust in the neural network's outcomes. Among the numerous visualization techniques available, such as SmoothGrad [32], Grad-CAM [33], Grad-CAM++ [34], and Score-CAM [35], the recently proposed Score-CAM was used in this study due to its promising output. Score-CAM eliminates the dependency on gradients by obtaining the weight of each activation map through its forward-passing score on the target class; the final result is a linear combination of the weights and activation maps. Figure 2 shows a sample image visualization with Score-CAM, with the heat map showing that the leaf regions dominated the decision making in the CNN. This can help users understand how the network makes decisions and can also raise end-user trust if it can be verified that the network always makes decisions based on the leaf area.
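The weighting scheme described above can be sketched as follows. This is a simplified illustration, not the implementation used in the study: images and activation maps are flat lists, the maps are assumed to be already upsampled to the image size, `score_fn` is a hypothetical stand-in for a forward pass that returns the target-class score, and the raw score is used rather than the increase over a baseline input.

```python
import math


def normalize(values):
    """Min-max scale a flat activation map to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score_cam(image, activation_maps, score_fn):
    """Combine activation maps using forward-pass scores instead of gradients."""
    scores = []
    for amap in activation_maps:
        mask = normalize(amap)
        masked = [p * m for p, m in zip(image, mask)]
        scores.append(score_fn(masked))  # forward pass on the masked input
    weights = softmax(scores)
    combined = [
        sum(w * amap[i] for w, amap in zip(weights, activation_maps))
        for i in range(len(image))
    ]
    return [max(0.0, v) for v in combined]  # ReLU keeps positive evidence


if __name__ == "__main__":
    # Toy "model" that only responds to the first pixel.
    heat = score_cam([1.0] * 4, [[1.0, 0, 0, 0], [0, 0, 0, 1.0]], lambda x: x[0])
    print([round(h, 3) for h in heat])
```

In the toy example the map that overlaps the pixel the model responds to receives the larger weight, so the heat map is highest there.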

Pathogens of Tomato Leaves
The most common plant pathogens are fungi, which can cause a variety of diseases such as early blight, septoria leaf spot, target spot, and leaf mold. Fungi can invade plants from a variety of sources, including infected soil and seeds. Animals, humans, equipment, and contaminated soil may all spread fungal infections from one plant to another. The fungus that causes early blight in tomato plants affects the leaves. Collar rot, stem lesion, and fruit rot are the terms for the rot that affects the seedlings' basal stems, adult plant stems, and fruits, respectively [36,37]. The most important methods for controlling early blight are cultural control, which involves effective soil, nutrient, and crop management, and the use of fungicidal chemicals. Septoria leaf spot in tomato plants is caused by a fungus [38,39] that releases the enzyme tomatinase, which speeds up the degradation of the tomato steroidal glycoalkaloid α-tomatine [40,41]. Target spot in tomato plants is also caused by a fungus [42,43]. Its symptoms are necrotic lesions with a light brown center [44,45]. Early defoliation occurs when the lesions spread into a wider blighted leaf region [44,45]. Target spot also damages the fruit directly by penetrating the pulp [44,45]. Leaf mold is likewise a fungal disease [46,47]; it develops when the leaves remain damp for a long time.
Bacteria are a major plant pathogen as well. They enter plants through wounds caused by insect bites, pruning, and cuts, as well as through natural openings such as stomata. Temperature, humidity, soil conditions, nutrient availability, weather conditions, and airflow are all important factors in bacterial growth on plants and the harm it causes. Bacterial spot is a plant disease caused by bacteria [48,49].
Molds are also a significant contributor to plant disease. Mold causes late blight in tomato and potato plants [50,51]. Its symptoms include dark, uneven blemishes on leaf tips and plant stems.
The Tomato Yellow Leaf Curl Virus (TYLCV) is a disease-causing virus that affects tomatoes. It is spread by an insect. Studies have shown that, besides tomato, the virus can infect a number of other plants, including beans, peppers, tobacco, potatoes, and eggplants [52,53]. Owing to the disease's rapid spread in recent decades, the focus of research has shifted to damage control of yellow leaf curl disease [54][55][56][57]. Tomato mosaic virus (ToMV) is another virus that directly affects tomato plants. This virus is present all over the world and affects not only tomatoes but also other plants. Twisted and fern-like stems, infected fruit with yellow patches, and necrotic blemishes are all symptoms of ToMV infection [58,59].

Methodology
The overall methodology of the study is summarized in Figure 3. This study used tomato leaf data from the Plant Village dataset [60,61], which provides tomato leaf images and the corresponding segmented tomato leaf masks. As explained earlier, the paper comprises three studies, all performed using segmented leaf images: (i) binary classification of healthy and unhealthy leaves; (ii) six-class classification of healthy leaves and five groups of diseased leaves; and (iii) ten-class classification of healthy leaves and nine types of diseased leaves. The paper also explored different variants of U-net segmentation models to identify the best network for segmenting leaves from the background. The benefit of using segmented tomato leaf images in the classification is further verified with the Score-CAM visualization technique, which has been found very reliable in different applications. The classification is done using EfficientNet networks, which have performed well in previous publications by the authors.

Datasets Description
In this study, the Plant Village tomato leaf images and corresponding leaf mask dataset were used [60,61], in which 18,161 tomato leaf images and the corresponding segmented leaf masks are available. The dataset was used for training both the tomato leaf segmentation models and the classification models. All images are divided into 10 classes, of which one is healthy and the other nine are unhealthy (bacterial spot, early blight, leaf mold, septoria leaf spot, target spot, two-spotted spider mite, late blight mold, mosaic virus, and yellow leaf curl virus); the nine unhealthy classes are grouped into five subgroups (namely bacterial, viral, fungal, mold, and mite disease). Some sample tomato leaf images for the healthy and different unhealthy classes, together with leaf masks from the Plant Village dataset, are shown in Figure 4. A detailed breakdown of the number of images in the dataset is given in Table 2, which is used for the classification tasks discussed in detail in the next section.

Preprocessing
Resizing and Normalizing: The various CNN networks (for both segmentation and classification experiments) have input image size requirements. Thus, the images were resized to 256 × 256 for the U-net segmentation networks and to 224 × 224 for EfficientNet (EfficientNet-B0, EfficientNet-B4, and EfficientNet-B7). The images were normalized with z-score normalization using the mean and standard deviation of the images in the dataset.
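As a small illustration of z-score normalization, the sketch below computes the mean and standard deviation of a list of pixel values and standardizes them. In practice the statistics would typically be computed per colour channel over the whole training set; the single flat list, the function names, and the small `eps` term are simplifying assumptions.

```python
import math


def channel_stats(pixels):
    """Mean and population standard deviation of a flat list of pixel values."""
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return mean, math.sqrt(variance)


def z_score(pixels, mean, std, eps=1e-8):
    """Standardize pixels; eps guards against division by zero."""
    return [(p - mean) / (std + eps) for p in pixels]


if __name__ == "__main__":
    values = [2, 4, 4, 4, 5, 5, 7, 9]
    mean, std = channel_stats(values)
    print(mean, std, [round(z, 2) for z in z_score(values, mean, std)])
```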
Augmentation: The dataset is imbalanced, i.e., it does not have a similar number of images in the different categories, and training with an imbalanced dataset can produce a biased model. Data augmentation can help to obtain a similar number of images in the various classes, which can provide reliable results, as stated in many recent publications [6][7][8][9][11]. In this study, three augmentation strategies (rotation, scaling, and translation) were utilized to balance the training images. Rotation was done by rotating the images clockwise and counterclockwise by an angle between 5 and 15 degrees. Scaling is the magnification or reduction of the frame size of the image; image magnifications of 2.5% to 10% were used in this work. Translation was done by shifting the images horizontally and vertically by 5% to 20%.
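The augmentation ranges above can be turned into a simple parameter sampler. This is a sketch under stated assumptions: the function names are hypothetical, the sign conventions (negative for clockwise rotation, signed shifts) are illustrative, and balancing every class up to the size of the largest class is one plausible strategy rather than the procedure confirmed by the paper. The class counts in the example are made up.

```python
import random


def sample_augmentation(rng: random.Random) -> dict:
    """Draw one random rotation, scale, and translation within the stated ranges."""
    return {
        # 5 to 15 degrees, clockwise (negative) or counterclockwise (positive)
        "rotation_deg": rng.choice([-1, 1]) * rng.uniform(5.0, 15.0),
        # 2.5% to 10% magnification
        "scale": 1.0 + rng.uniform(0.025, 0.10),
        # 5% to 20% shift, expressed as a fraction of the image size
        "translate_x": rng.choice([-1, 1]) * rng.uniform(0.05, 0.20),
        "translate_y": rng.choice([-1, 1]) * rng.uniform(0.05, 0.20),
    }


def copies_needed(class_counts: dict) -> dict:
    """Augmented images needed per class to match the largest class (assumed)."""
    target = max(class_counts.values())
    return {name: target - count for name, count in class_counts.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_augmentation(rng))
    print(copies_needed({"class_a": 300, "class_b": 1000}))  # hypothetical counts
```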

Experiments
Leaf Segmentation: Different U-net models were applied separately to the Plant Village tomato leaf images and leaf mask dataset to identify the best performing segmentation model for leaf segmentation. Five-fold cross-validation was used, where in each fold 80% of the 18,161 tomato leaf images and their corresponding ground truth masks were randomly selected and used for training, and the remaining 20% were used for testing (Table 3). The class distribution in the test set is similar to that in the training set. Of the 80% training data, 90% was used for actual training and 10% for validation, which helps to avoid overfitting. Three different loss functions (Negative Log-Likelihood (NLL) loss, Binary Cross-Entropy (BCE) loss, and Mean-Squared Error (MSE) loss) were compared to identify the best tomato leaf segmentation model. Moreover, an early stopping criterion of a maximum of five epochs with no improvement in validation loss was used, as reported in some recent works [9,11].
Tomato leaf disease classification: The study investigated a deep learning architecture based on a recent convolutional neural network called EfficientNet to classify segmented tomato leaf disease images. Three different classification experiments were carried out in this study. Table 4 summarizes the details of the images used in the three classification experiments with segmented leaf images. The parameters of the classification and segmentation experiments are summarized in Table 5. All the experiments were conducted using the PyTorch library with Python 3.7 on an Intel® Xeon® CPU E5-2697 v4 @ 2.30 GHz with 64 GB RAM and a 16 GB NVIDIA GeForce GTX 1080 GPU.
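The data split and the early stopping rule described above can be sketched as follows. The helper names are hypothetical, the split ignores class stratification for brevity, and the resulting counts come from simple rounding, so they may differ slightly from those in Table 3.

```python
def fold_sizes(n_images: int, k: int = 5):
    """Sizes of k nearly equal folds."""
    base, extra = divmod(n_images, k)
    return [base + (1 if i < extra else 0) for i in range(k)]


def split_sizes(n_images: int, fold: int, k: int = 5, val_fraction: float = 0.10):
    """Train/validation/test sizes when `fold` is held out for testing."""
    test = fold_sizes(n_images, k)[fold]
    remaining = n_images - test
    val = round(remaining * val_fraction)
    return {"train": remaining - val, "val": val, "test": test}


class EarlyStopping:
    """Stop after `patience` consecutive epochs without validation-loss improvement."""

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    print(split_sizes(18161, fold=0))
    stopper = EarlyStopping(patience=5)
    for epoch, loss in enumerate([1.0, 0.9, 0.95, 0.95, 0.95, 0.95, 0.95]):
        if stopper.step(loss):
            print("stop at epoch", epoch)
            break
```

A stratified splitter would be needed to keep the class distribution of the test set similar to that of the training set, as the text requires.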

Performance Metrics
The performance metrics for the segmentation experiment (accuracy, IoU, and Dice score) and for the classification experiments (accuracy, sensitivity, and specificity) are computed from the following quantities. Here, true positive (TP) is the number of correctly classified healthy leaf images and true negative (TN) is the number of correctly classified unhealthy leaf images. False positive (FP) and false negative (FN) are the misclassified healthy and unhealthy leaf images, respectively.
Moreover, segmentation and classification networks were also compared in terms of the testing time per image, i.e., the time taken by each network to segment or classify an input image, as given in Equation (10):

$$T = t'' - t' \tag{10}$$

where t' is the starting time for a network to segment or classify an image I, and t'' is the end time when the network has segmented or classified the same image I.
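For illustration, the classification metrics and the per-image testing time can be computed as in the sketch below. The formulas are the standard confusion-matrix definitions and are assumed here rather than recovered from the paper's own equations; the function names and the counts in the example are hypothetical.

```python
import time


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard confusion-matrix metrics (assumed definitions)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def timed(fn, image):
    """Return fn(image) and the elapsed time T = t'' - t' in seconds."""
    t_start = time.perf_counter()
    output = fn(image)
    t_end = time.perf_counter()
    return output, t_end - t_start


if __name__ == "__main__":
    print(classification_metrics(tp=90, tn=95, fp=5, fn=10))  # made-up counts
    result, elapsed = timed(lambda x: x * 2, 3)  # stand-in for a network
    print(result, elapsed >= 0.0)
```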

Results
The performance of various networks in the different experiments is reported in this section.

Tomato Leaf Segmentation
In this study, two different segmentation models, the original U-net [25] and the Modified U-net [31], were trained, validated, and tested for the segmentation of tomato leaf images. Table 6 shows the comparative performance of the two segmentation models with three different loss functions (namely, the NLL, BCE, and MSE loss functions). It can be noted that the Modified U-net with the NLL loss function outperformed the original U-net in segmenting the leaf region from the whole image, both quantitatively and qualitatively. The test loss, test accuracy, IoU, and Dice score for the segmentation of tomato leaves using the Modified U-net with the NLL loss function were found to be 0.0076, 98.66%, 98.5%, and 98.73%, respectively. Figure 5 shows some example test tomato leaf images, the corresponding ground truth masks, and the segmented leaf images generated by the Modified U-net model with the NLL loss function for the Plant Village dataset.
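The segmentation scores reported above are pixel-level quantities. The following sketch computes pixel accuracy, IoU, and Dice for a pair of flat binary masks using the standard definitions, which are assumed here; the function name and the convention that two empty masks score 1.0 are illustrative choices.

```python
def segmentation_metrics(pred, truth):
    """Pixel-wise accuracy, IoU, and Dice for flat binary masks (1 = leaf)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    union = tp + fp + fn
    return {
        "accuracy": (tp + tn) / len(truth),
        # Two empty masks are treated as a perfect match (assumed convention).
        "iou": tp / union if union else 1.0,
        "dice": 2 * tp / (2 * tp + fp + fn) if union else 1.0,
    }


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0]
    ground_truth = [1, 0, 0, 0, 1, 1]
    print(segmentation_metrics(predicted, ground_truth))
```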

Tomato Leaf Disease Classification
In this study, three different experiments were conducted with segmented tomato leaf images. The comparative performance of three members of the EfficientNet family (EfficientNet-B0, EfficientNet-B4, and EfficientNet-B7) for the three classification schemes with segmented leaf images is shown in Table 7. It is apparent from Table 7 that all the evaluated pre-trained models perform very well in classifying healthy and unhealthy tomato leaf images in the two-class, six-class, and ten-class problems. The performance was also better than that obtained with non-segmented images (see Supplementary Table S1).
Among the networks trained with leaf images, with and without segmentation, for the two-class, six-class, and ten-class problems, EfficientNet-B7 outperformed the other trained models, except for the ten-class problem, where EfficientNet-B4 was slightly better than EfficientNet-B7. It can also be seen that as the EfficientNet model is scaled up, the testing time (T) increases due to the scaled depth, width, and resolution of the network. The authors tried the different versions of EfficientNet and observed that as the network is scaled in depth, width, and resolution, the performance improves. However, as the classification scheme becomes more complicated, the performance does not improve much with the scaled versions of EfficientNet.
For segmented tomato leaf images, EfficientNet-B7 outperforms the others: for the two-class and six-class problems it showed accuracy, sensitivity, and specificity of 99.95%, 99.95%, and 99.77%, and of 99.12%, 99.11%, and 99.81%, respectively. In contrast, EfficientNet-B4 produced the best result for the ten-class problem, with accuracy, sensitivity, and specificity of 99.89%, 99.44%, and 99.94%, respectively. It is evident from Figure 5 that network performance improves slightly with more parameters for the 2-class, 6-class, and 10-class problems; however, deeper networks provide a larger performance gain for the 2-class and 6-class problems. Figure 6 shows the receiver operating characteristic (ROC) curves for the two-class, six-class, and ten-class problems using segmented tomato leaf images. The confusion matrices of the best performing networks for the different classification problems using tomato leaf images are shown in Figure 7. It can be noticed that even with the best performing network for the two-class problem, EfficientNet-B7, 6 out of 16,570 unhealthy tomato leaf images were misclassified as healthy and 4 out of 1591 healthy tomato leaf images were misclassified as unhealthy.
For the six-class problem, which consisted of one healthy class and five different unhealthy classes, only 3 out of 1591 healthy tomato leaf images were misclassified as unhealthy, and 159 out of 16,570 unhealthy tomato leaf images were misclassified as healthy or as another unhealthy class. For the ten-class problem, which consisted of one healthy class and nine different unhealthy classes, the best performing network was EfficientNet-B4; only 4 out of 1591 healthy tomato leaf images were misclassified as unhealthy, and 105 out of 16,570 unhealthy tomato leaf images of nine different categories were misclassified as healthy or as another unhealthy class.

Visualization Using Score-CAM
In this study, the reliability of the trained networks was investigated using visualization techniques. Score-CAM heat maps of segmented tomato leaf images were generated for the ten-class problem. Figure 8 shows the original tomato leaf samples along with the heat maps on the segmented leaves. As can be seen from Figure 8, the networks learn from the leaf region of the segmented images, which makes the network decision more reliable. This helps to counter the criticism that CNNs take decisions from non-relevant regions and are therefore not reliable [62]. It can also be seen in Figure 9 that segmentation has helped classification because the network learns from the region of interest. This reliable learning has helped in correct classification. In addition, the authors have also carried out experiments to confirm that segmentation helps in learning and taking decisions from relevant areas compared to non-segmented images (see Supplementary Figure S1).

Discussion
Plant diseases are a major threat to global food security. The latest technologies need to be applied to the agriculture sector to curb diseases. Artificial intelligence-based technologies are being investigated extensively for plant disease detection. Computer vision-based disease detection systems are popular for their robustness, ease of acquiring data, and quick results. This research investigates how CNN architectures obtained by model scaling perform against each other in two tasks, i.e., segmentation and classification of tomato leaf images. The study was divided into three sub-studies: 2-class classification (Healthy and Unhealthy), 6-class classification (Healthy, Fungi, Bacteria, Mold, Virus, and Mite), and 10-class classification (Healthy, Early blight, Septoria leaf spot, Target spot, Leaf mold, Bacterial spot, Late blight mold, Tomato Yellow Leaf Curl Virus, Tomato Mosaic Virus, and Two-spotted spider mite). Overall, the EfficientNet-B7 model outperformed every other model in binary and 6-class classification with segmented images, whereas the EfficientNet-B4 model outperformed the others in 10-class classification. In the binary classification of healthy and diseased tomato leaves, EfficientNet-B7 showed an overall accuracy of 99.95% with segmented images. In 6-class classification, EfficientNet-B7 showed an overall accuracy of 99.12% with segmented images. Furthermore, in 10-class classification, EfficientNet-B4 showed an overall accuracy of 99.89% with segmented images. The results in the paper are comparable to the state-of-the-art results, which are summarized in Table 8. Although the Plant Village dataset used in this study contains images taken in diverse environmental conditions, the dataset was collected in a specific region and covers specific breeds of tomatoes. A study conducted using a dataset containing images of other breeds of tomato plants from different regions of the world may result in a more robust framework for early disease detection in tomato plants. Furthermore, lighter CNN architectures with non-linearity in the feature extraction layers might be worth investigating for portable solutions.

Conclusions
In this work, we developed a deep convolutional neural network (CNN) based on the recently developed EfficientNet CNN model. The model was fine-tuned and trained for the detection of healthy and different unhealthy tomato leaf images. The results show that our model outperforms some recent deep learning techniques on the most popular publicly available dataset, Plant Village [60,61]. It was also found that the Modified U-net was best suited for segmenting leaf images from the background and that EfficientNet-B7 was better at extracting discriminative features from images than the other architectures. In addition, the performance of the networks generally improved further when they were trained with more parameters. The trained models can be used in the early automatic detection of plant diseases. Experts need years of training and knowledge to detect diseases early by visual inspection, but our model can be used by anybody who is not an expert. A new user will have the network working in the background, taking input from a visual camera and immediately informing the user of the output so that the necessary action can be taken. Thus, preventive actions can be taken earlier. This work can be beneficial in the early and automatic disease detection of tomato crops, enabled by the latest technologies such as smartphones, drone cameras, and robotic platforms. The proposed framework can be combined with a feedback system that gives valuable suggestions, remedies, disease management, and control strategies, thus ensuring better crop yields. The authors will work on an extension of this work to validate the performance of the proposed solution in a real-time application in which microcontrollers with cameras will be used. The future work will involve a much more diverse environment, and the authors are confident that the approach will also perform well there.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/agriengineering3020020/s1, Figure S1: Score-CAM visualization to confirm how segmentation has helped in classification even incorrectly classified images, Table S1: Summary of the tomato leaf disease classification performance using non-segmented original leaf images.