Potato Leaf Disease Detection Based on a Lightweight Deep Learning Model

Chang, Chao-Yun; Lai, Chih-Chin

doi:10.3390/make6040114

by Chao-Yun Chang and Chih-Chin Lai *

Department of Electrical Engineering, National University of Kaohsiung, Kaohsiung 811726, Taiwan

* Author to whom correspondence should be addressed.

Mach. Learn. Knowl. Extr. 2024, 6(4), 2321-2335; https://doi.org/10.3390/make6040114

Submission received: 26 August 2024 / Revised: 12 October 2024 / Accepted: 12 October 2024 / Published: 14 October 2024

(This article belongs to the Section Learning)

Abstract

Traditional methods of agricultural disease detection rely primarily on manual observation, which is not only time-consuming and labor-intensive, but also prone to human error. The advent of deep learning has revolutionized plant disease detection by providing more accurate and efficient solutions. The management of potato diseases is critical to the agricultural industry, as these diseases can lead to substantial losses in crop production. The prompt identification and classification of potato leaf diseases are essential to mitigating such losses. In this paper, we present a novel approach that integrates a lightweight convolutional neural network architecture, RegNetY-400MF, with transfer learning techniques to accurately identify seven different types of potato leaf diseases. The proposed method not only enhances the precision of potato leaf disease detection but also reduces the computational and storage demands, with a mere 0.40 GFLOPs and a model size of 16.8 MB. This makes it well-suited for use on edge devices with limited resources, enabling real-time disease detection in agricultural environments. The experimental results demonstrated that the accuracy of the proposed method in identifying seven potato leaf diseases was 90.68%, providing a comprehensive solution for potato crop management.

Keywords:

agriculture; deep learning; lightweight convolutional neural network; transfer learning

1. Introduction

Potatoes are an important food crop around the world and are grown extensively to meet the basic nutritional needs of billions of people. The potato is the fourth most widely cultivated food crop in the world, following rice, wheat, and maize. It is grown in more than 150 countries, spanning a range of climatic zones from temperate regions to tropical highlands. The high nutritional value of the potato has led to its designation as a staple food in numerous countries, providing essential carbohydrates, vitamins, and minerals such as potassium. Moreover, the relatively short growth cycle and high yield potential of the potato make it a pivotal crop for addressing population growth and food shortages, which is of paramount importance for global food security [1]. However, potato production is severely threatened by a variety of diseases, including those caused by viruses, bacteria, fungi, insect pests, and nematodes, as shown in Figure 1. These diseases not only reduce the yield and quality of potato crops, but also result in significant economic losses for growers. Therefore, the early detection and effective management of these diseases are critical to protecting potato crops and increasing agricultural productivity. In addition, research focused on potato diseases is essential for improving crop yields and ensuring the stability of the global food supply [2].

Traditional methods for identifying plant diseases depend on manual inspections requiring specialized knowledge to accurately diagnose the type of disease. This approach requires considerable time and labor and incurs significant costs. In addition, the failure to detect diseases in a timely manner can lead to their uncontrolled spread among plants, causing significant damage. The rapid development of deep learning technologies, particularly convolutional neural networks (CNNs), has led to significant advances in plant disease diagnosis [3,4,5,6]. CNNs are also effective in detecting abnormalities in plant leaves, such as spots, damage, and color changes. In recent years, an increasing number of studies have used these technologies for automated plant disease detection. The application of deep learning models has the potential to significantly improve the accuracy of potato leaf disease diagnosis while reducing the time required for diagnosis. This will allow growers to take timely action and effectively minimize the losses caused by these diseases. Numerous studies have focused on the detection of leaf diseases in crops such as potatoes [7,8,9,10], peanuts [11], jasmine [12], citrus, cucumber, grape, and tomato [13,14]. Consequently, the use of deep learning for plant leaf disease detection has become an important development direction in agricultural technology and can help address critical challenges in global agricultural production.

The use of a tri-CNN architecture, consisting of three deep convolutional neural networks (DenseNet169, Inception, and Xception), was proposed for disease identification [11]. The model was initially pre-trained on the ImageNet dataset and later trained and tested on the Soa comprehensive dataset, achieving an accuracy of 98.46%. However, the high computational complexity of the tri-CNN architecture may limit its deployment on resource-constrained devices.

A novel MobileNetV3-based classifier specifically designed for classifying jasmine plant leaf spot disease was proposed [12]. The classifier incorporates a data enhancement technique using a conditional generative adversarial network (cGAN). Evaluation using the jasmine plant leaf spot disease dataset demonstrated strong performance, with a training accuracy of 97%, highlighting its effectiveness. Real-world testing under varying conditions, such as extreme camera angles and different lighting conditions, showed the model’s flexibility, with its test accuracies ranging from 94% to 96%. However, since the model was developed exclusively for jasmine, further testing is required to assess its applicability to other crops.

Sharma et al. [15] introduced a novel deep learning architecture called the Deeper Lightweight Convolutional Neural Network (DLMC-Net) for detecting leaf diseases across a wide variety of crops. Unlike many traditional models that are limited to specific crops, DLMC-Net features an optimized architecture capable of generalizing across multiple crops, including citrus, cucumber, grape, and tomato. The model incorporates integrated block and channel layers, which enhance feature propagation and reuse while addressing the gradient vanishing problem. It achieved accuracies of 93.56% on citrus, 92.34% on cucumber, 99.50% on grape, and 96.56% on tomato. While DLMC-Net offers advantages in terms of its lightweight design and high accuracy, there is still potential for improvement. Future research should focus on testing the model’s robustness under different environmental conditions to ensure its effectiveness in real-world agricultural scenarios.

Sholihati et al. [16] employed the Visual Geometry Group 16-layer network (VGG-16) in conjunction with data augmentation techniques to detect potato leaf diseases using the PlantVillage dataset [17], which encompasses five categories of diseases. The VGG-16 model leverages the stacking of multiple convolutional layers to effectively extract complex image features. The model ultimately achieved a classification accuracy of 91% on 1220 test images from the PlantVillage dataset. However, VGG-16 is known for its high computational intensity due to its deep layers, making it unsuitable for real-time applications or deployment on resource-constrained edge devices. As more efficient models become available, it would be beneficial for future studies to explore alternatives that can deliver comparable or superior accuracy while reducing the computational load.

Pandiri et al. [18] introduced an optimized deep convolutional neural network (ODNet) specifically designed for the identification of potato leaf diseases. The performance of ODNet was enhanced through the adjustment of hyperparameters using the slime mold optimization (SMO) algorithm. The model was tested on the Potato Leaf Disease dataset [19], which includes three categories: healthy, early blight, and late blight. The ODNet model achieved a classification accuracy of 96.67%, demonstrating its effectiveness in accurately identifying these diseases. However, the tests were limited to only three disease types, which restricts the model’s ability to generalize to a broader dataset. Additionally, the study did not evaluate the effectiveness of ODNet in real-world conditions. Disease behavior in actual farm fields is likely to be more diverse and complex than in controlled datasets.

Shaheed et al. [20] introduced a composite model, EfficientRMT-Net, which integrates EfficientNet, vision transformer (ViT), depth-wise convolution (DWC), and ResNet architectures for the detection of potato leaf diseases using images from the PlantVillage dataset. To address the issue of class imbalance within the dataset, they applied data augmentation techniques. Additionally, transfer learning was employed to enhance the classification efficiency of the proposed approach. In their experiments, the individual architectures (EfficientNet, ViT, DWC, and ResNet) achieved accuracies of 76%, 84.1%, 84.6%, and 83.5%, respectively, whereas the integrated EfficientRMT-Net architecture achieved a recognition accuracy of 99.12% on the PlantVillage dataset. Despite this accuracy, the integration of multiple architectures makes the model complex and computationally expensive, which could pose significant challenges in real-world applications, particularly in resource-constrained agricultural settings. Moreover, the study did not validate the model’s performance in a realistic agricultural environment.

Mahum et al. [21] employed the DenseNet model to classify potato leaf diseases using data primarily collected from the PlantVillage dataset. In addition to the existing categories of early blight (EB) and late blight (LB) within the PlantVillage dataset, the authors manually collected additional data for categories such as potato leafroll virus (PLR), potato verticillium wilt (PVW), and healthy potato (PH). The classification of potato leaf diseases was conducted using a pre-trained efficient DenseNet model. To address the issue of data imbalance within the training set, the authors utilized a reweighted cross-entropy loss function to stabilize the training process. The model achieved a classification accuracy of 97.2% on the customized PlantVillage dataset. Nevertheless, the model’s efficacy on real-world data remains uncertain, especially considering the expansion and manual adjustment of the dataset. While the reweighted cross-entropy loss function addresses data imbalances, it is unclear whether the model will perform equally well on more diverse datasets or on previously unseen disease classes.

Catal Reis et al. [22] proposed the multi-head attention mechanism depthwise separable convolution inception reduction network (MDSCIRNet) in combination with machine learning algorithms such as support vector machine (SVM), logistic regression (LR), Random Forest (RF), and adaptive boosting (AdaBoost). The approach is intended to overcome common issues encountered during the training of deep neural networks, such as gradient vanishing, underfitting, and overfitting. This method achieved an accuracy of up to 99.33% in identifying potato leaf diseases on the Potato Leaf Disease dataset. Nevertheless, conventional machine learning algorithms such as SVM and RF, while effective for small to medium-sized datasets, may limit scalability and performance when applied to larger and more complex datasets. Furthermore, the study did not report the size or computational efficiency of the model, which is crucial for practical agricultural applications.

Shabrina et al. [23] proposed a novel dataset for potato leaf diseases collected in uncontrolled environments, with images captured in real agricultural fields under various weather conditions. The dataset comprises 3076 images categorized into seven distinct classes. The authors conducted experiments using five different deep learning models, with the EfficientNetV2B3 model attaining the highest accuracy of 73.63% among the models compared. The creation of datasets in uncontrolled environments is a valuable contribution, as it more accurately reflects real-world scenarios where leaf diseases can be more challenging to detect. However, the relatively low accuracy rates suggest that the existing deep learning models may struggle to generalize effectively under such challenging conditions. Further research into more sophisticated data augmentation techniques or advanced model architectures may be necessary to enhance performance on realistic datasets.

As of early 2024, most research using deep learning for potato leaf disease identification has focused on datasets with three or five disease categories and has demonstrated commendable performance. However, studies using datasets with seven categories still show considerable room for improvement in classification accuracy. Furthermore, the deep learning models discussed in the literature typically require substantial computational resources and storage capacity. This poses challenges for deployment in resource-constrained environments, such as portable mobile devices (e.g., smartphones and tablets), which may not provide the computational power these models require. To address these practical challenges, this study proposes the use of the RegNetY-400MF model [24] for potato leaf disease detection. RegNetY-400MF is a lightweight and efficient deep learning architecture that achieves high accuracy while significantly reducing computational and storage requirements. This makes it well suited for use on portable mobile devices, making disease detection more accessible and practical in agricultural settings.

2. The Proposed Approach

2.1. DL Model

This study employed deep learning techniques to address the classification of potato diseases, with the training process depicted in Figure 2. Initially, the images are preprocessed by standardizing them to ensure uniformity and facilitate their use in subsequent models. Following preprocessing, the images are fed into the deep learning model for training and validation. Using the transfer learning technique [25], knowledge learned from a pre-trained model is integrated into the proposed approach to improve model performance.

We selected the RegNetY deep learning model, an optimized variant of the RegNet model [26]. Both the RegNet and RegNetY models are distinguished by their design of efficient and flexible CNNs using a network architecture known as AnyNet. As depicted in Figure 3a, AnyNet comprises three structural components: a stem, body, and head. The stem is responsible for the initial feature extraction and dimensionality reduction of the input image, where r represents the image resolution and w denotes the channel width of the feature matrix. The head includes a global average pooling layer and a fully connected layer for classification, with n indicating the number of predicted classes. The body consists of four stages, as illustrated in Figure 3b, where the resolution r gradually decreases with each successive stage. Each stage consists of a series of blocks, shown in Figure 3c, which allow for flexible adjustments between the depth (d_i) and width (w_i) of the network, where d_i and w_i represent all possible combinations. This flexibility facilitates the creation of various RegNet models that can be optimally configured based on specific application requirements, thereby enabling superior performance across diverse environments with varying demands.

The main difference between RegNetY and RegNet is that the RegNetY model incorporates the squeeze-and-excitation (SE) channel attention mechanism [27] in each block to enhance the model’s expressiveness. The SE module works by first compressing the feature map via a global average pooling operation, converting it into a 1 × 1 × C vector. This vector is then passed through two fully connected layers and two activation functions, ReLU and sigmoid. The first fully connected layer, consisting of C × SE Ratio neurons, receives an input of 1 × 1 × C and produces an output of 1 × 1 × C × SE Ratio, followed by a ReLU activation function. The second fully connected layer, containing C neurons, takes an input of 1 × 1 × C × SE Ratio and produces an output of 1 × 1 × C, which is then passed through a sigmoid activation function. The resulting values are then applied to the appropriate channels to generate a new feature map. The primary goal of the SE module is to capture the color features between the channels and to learn the relationships between the channels, thereby increasing the expressiveness of the model, as shown in Figure 4.
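The squeeze, excitation, and scaling steps described above can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the tensor size, the SE ratio of 0.25, and the random weights are assumptions chosen only for illustration, and bias terms are omitted for brevity.

```python
import math
import random

def squeeze_excite(feature_map, w1, w2):
    """Apply a squeeze-and-excitation gate to a C x H x W feature map (nested lists)."""
    # Squeeze: global average pooling reduces each channel to one scalar (1 x 1 x C).
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation, first fully connected layer (C -> C * SE ratio), followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, second fully connected layer (C * SE ratio -> C), followed by sigmoid.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: every value in channel c is multiplied by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch] for ch, g in zip(feature_map, gates)]
    return scaled, gates

# Hypothetical sizes and weights, for illustration only.
rng = random.Random(0)
C, H, W, se_ratio = 8, 4, 4, 0.25
hidden_units = int(C * se_ratio)
x = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(hidden_units)]
w2 = [[rng.uniform(-1, 1) for _ in range(hidden_units)] for _ in range(C)]

y, gates = squeeze_excite(x, w1, w2)
print([round(g, 3) for g in gates])  # one weight in (0, 1) per channel
```

The output feature map has the same shape as the input; only the relative strength of the channels changes.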

RegNetY is divided into a series of models of different sizes based on the number of floating-point operations (FLOPs). In the preliminary tests, the smaller variant, RegNetY-200MF, demonstrated a significantly lower accuracy (76.53%) compared to RegNetY-400MF, which achieved 81.03% accuracy. Although the accuracy of the larger variant, RegNetY-800MF, increased to 83.39%, the improvement was not substantial, and its computational requirements were nearly doubled. Since our research focuses on the computational complexity of the RegNet series for resource-constrained devices, RegNetY-400MF was chosen for its optimal balance between model size, computational efficiency, and accuracy.
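The trade-off behind this choice can be made explicit with a little arithmetic. The sketch below uses the preliminary accuracies quoted above and, as an assumption, takes the nominal budget in each variant's name (200, 400, 800 MFLOPs) as a proxy for its computational cost.

```python
# Preliminary accuracies (%) from the text, keyed by the nominal MFLOPs in the variant name.
prelim = {200: 76.53, 400: 81.03, 800: 83.39}

gains = []
sizes = sorted(prelim)
for small, large in zip(sizes, sizes[1:]):
    gain = prelim[large] - prelim[small]   # accuracy gain in percentage points
    extra = large - small                  # additional nominal MFLOPs
    gains.append((small, large, gain, extra))
    print(f"{small}MF -> {large}MF: +{gain:.2f} points for +{extra} MFLOPs "
          f"({gain / extra * 100:.2f} points per 100 MFLOPs)")
```

Under this proxy, the step from 200MF to 400MF yields roughly four times as much accuracy per additional 100 MFLOPs as the step from 400MF to 800MF, which is consistent with the choice of RegNetY-400MF.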

2.2. Transfer Learning

We applied transfer learning techniques using the pre-trained RegNetY-400MF model weights on the ImageNet database [28]. The ImageNet database contains over ten million labeled images across a thousand different categories. The pre-trained model learns to identify different features in images, such as edges, textures, and shapes, through training on this vast amount of data, providing a solid starting point for our research and effectively speeding up the model’s convergence process for specific tasks. In addition, this approach can improve the generalization ability of the model, allowing it to perform well even with a small amount of labeled data.

Transfer learning is particularly beneficial in scenarios where labeled data or computational resources are scarce, as it can significantly improve model performance and recognition rates. By fine-tuning pre-trained models according to the specific requirements of a task, researchers can adapt them for particular applications, making this technique widely applicable in various areas of machine and deep learning, including image classification, object detection, and speech recognition.

Although not using transfer learning may allow for greater model customization and avoidance of inherited biases or irrelevant features from pre-trained models, this approach generally requires extensive data and computational resources, resulting in longer training times. The decision to use transfer learning in this study was primarily due to its ability to improve model performance while conserving resources and reducing training time, making it a practical and necessary choice.

3. Experimental Results

3.1. Experimental Platform and Model Parameters

The experimental environment consisted of a computer with an 8-core Intel Core i7-10700 CPU, an NVIDIA GeForce RTX 2070 Super GPU, and 16 GB of memory. In terms of software, the operating system was Windows 11 Professional, and the development environment was Anaconda 24.3.0 with Python 3.11.8, using the PyTorch 2.2.2 deep learning framework together with CUDA 12.2 and cuDNN 9.0.0 for GPU acceleration.

The initial parameters used in the experiment were as follows: the learning rate and batch size were set to 0.0001 and 64, respectively, and the model was trained for 50 epochs. The optimizer used was Adam, a widely used optimizer in deep learning that speeds up model convergence. For the loss function, cross entropy was chosen due to its effectiveness in calculating the loss between predicted probability distributions and true labels in classification tasks. Compared to other loss functions, such as mean squared error (MSE), the cross-entropy function is better suited to classification problems, allowing for faster and more accurate model convergence.
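To illustrate why cross entropy is usually preferred over MSE for classification, the following sketch compares the two losses on a seven-class example. The logits are hypothetical values chosen for illustration, not outputs of the model in this study.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[target])

def mse(probs, target):
    """Mean squared error between the probabilities and the one-hot label."""
    return sum((p - (1.0 if i == target else 0.0)) ** 2
               for i, p in enumerate(probs)) / len(probs)

target = 2  # index of the true class (hypothetical)
logits_right = [0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0]   # confident and correct
logits_wrong = [4.0, 0.0, -2.0, 0.0, 0.0, 0.0, 0.0]  # confident and wrong

for name, logits in [("right", logits_right), ("wrong", logits_wrong)]:
    p = softmax(logits)
    print(f"{name}: cross entropy = {cross_entropy(p, target):.3f}, MSE = {mse(p, target):.3f}")
```

With one-hot labels over seven classes, the MSE can never exceed 2/7, whereas the cross-entropy loss grows without bound as the probability of the true class approaches zero, so confident mistakes are penalized much more strongly.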

3.2. Dataset Construction

In order to accurately assess the identification performance of the proposed method, we used 10-fold cross-validation to evaluate the effectiveness of the system [29], using the Potato Leaf Disease in Uncontrolled Environment dataset as a test subject. This dataset consists of 3076 images of potato leaf diseases categorized into seven classes. The images have a resolution of 1500 × 1500 pixels and were captured under varying weather conditions, including sunny, cloudy, and partly cloudy skies, between 8 a.m. and 3 p.m. The distances from the subject ranged from approximately 5 to 15 cm, due to differences in photographic positioning. As a result, the backgrounds and occlusions vary, making the dataset a realistic representation of potato leaf diseases. Detailed information about this dataset is provided in Table 1.

The classes in this dataset are imbalanced. The primary reason for this imbalance is the nature of data collection. Since agricultural research and datasets often focus on identifying and addressing disease problems that significantly threaten crop yield and quality, more emphasis is placed on collecting images of diseased plants, resulting in a higher representation of disease categories in the dataset.

3.3. Performance Metrics

In this study, we used several evaluation metrics, including accuracy, precision, recall, and F1 score. Accuracy served as the primary metric for assessing the overall performance of the model across the seven categories. Precision measures the model’s ability to minimize false positives, while recall assesses its effectiveness in identifying all positive cases. The F1 score, calculated as the harmonic mean of precision and recall, provides a balanced assessment of these two metrics.

In the context of these metrics, true positive (TP) refers to cases where a positive sample is correctly predicted to be positive; a false positive (FP) occurs when a negative sample is incorrectly predicted to be positive; a false negative (FN) indicates that a positive sample is incorrectly predicted to be negative; and true negative (TN) refers to cases where a negative sample is correctly predicted as negative. The equations used to calculate the accuracy, precision, recall, and F1 score are as follows:

\text{Test Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1)

\text{Precision} = \frac{TP}{TP + FP} \quad (2)

\text{Recall} = \frac{TP}{TP + FN} \quad (3)

\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (4)
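As a minimal sketch of how Equations (1) to (4) can be evaluated for a multi-class problem, the code below treats each class in turn as the positive class (one-vs-rest) and derives TP, FP, FN, and TN from a confusion matrix. The 3 x 3 matrix is a hypothetical example, not a result from this study.

```python
def per_class_metrics(cm):
    """cm[i][j] is the number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # class k predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp   # other classes predicted as class k
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({"tp": tp, "fp": fp, "fn": fn, "tn": tn,
                        "precision": precision, "recall": recall, "f1": f1})
    return results

def overall_accuracy(cm):
    """Fraction of all samples that lie on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)

# Hypothetical confusion matrix for three classes.
cm = [[50, 3, 2],
      [4, 40, 6],
      [1, 5, 44]]

metrics = per_class_metrics(cm)
for k, m in enumerate(metrics):
    print(f"class {k}: precision={m['precision']:.3f} "
          f"recall={m['recall']:.3f} f1={m['f1']:.3f}")
print(f"overall accuracy={overall_accuracy(cm):.3f}")
```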

3.4. The Impact of Transfer Learning

In this experiment, we compared the performance of the model with and without the application of transfer learning to assess its impact. The results are shown in Table 2. The data show that with the same parameters, transfer learning resulted in significant improvements in all performance metrics. Consequently, we decided to include transfer learning in the subsequent experiments.

3.5. Different Combinations of Training Parameters

In this experiment, we evaluated the impact of different batch sizes and learning rates on the model performance by testing different combinations of these parameters. The results are shown in Table 3. The data show that a batch size of 16 and a learning rate of 0.001 gave the best performance in terms of both training stability and accuracy. In addition, the results suggest that batch sizes that are too large or learning rates that are too small can negatively affect model performance. Therefore, we chose a batch size of 16 and a learning rate of 0.001 as the optimal parameter combination for our experiments.

3.6. Data Augmentation

To further improve the performance of the model, we applied several data augmentation techniques during training, including random cropping, brightness adjustment, horizontal and vertical flipping, and random rotation. These techniques increase the diversity of the training data, thereby improving the generalization ability of the model in different environments. The experimental results are shown in Table 4: with all other parameters unchanged, data augmentation led to significant improvements in all evaluation metrics of the model during testing.
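The following pure-Python sketch shows what such transformations do to an image. It is a simplified illustration under several assumptions: the image is a small grayscale grid of integers, rotation is restricted to multiples of 90 degrees, and the crop size, brightness range, and flip probabilities are hypothetical values rather than the settings used in this study.

```python
import random

def random_crop(img, size, rng):
    """Cut a size x size window at a random position."""
    top = rng.randint(0, len(img) - size)
    left = rng.randint(0, len(img[0]) - size)
    return [row[left:left + size] for row in img[top:top + size]]

def adjust_brightness(img, factor):
    """Scale pixel values and clamp them to the 0..255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]

def hflip(img):
    return [row[::-1] for row in img]

def vflip(img):
    return [row[:] for row in img[::-1]]

def rotate90(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def augment(img, crop_size, rng):
    """Apply the chain of random transformations to one image."""
    out = random_crop(img, crop_size, rng)
    out = adjust_brightness(out, rng.uniform(0.8, 1.2))
    if rng.random() < 0.5:
        out = hflip(out)
    if rng.random() < 0.5:
        out = vflip(out)
    for _ in range(rng.randrange(4)):
        out = rotate90(out)
    return out

# Hypothetical 8 x 8 grayscale image.
img = [[(r * 8 + c) * 4 % 256 for c in range(8)] for r in range(8)]
sample = augment(img, 6, random.Random(0))
print(len(sample), len(sample[0]))
```

Because a new random variant is drawn each time an image is used, the model rarely sees exactly the same input twice, which is how augmentation increases the effective diversity of the training set.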

To assess the impact of image enhancement and data diversity on performance, we analyzed the validation loss function curve. The experimental results are shown in Figure 5. In Figure 5a, the blue curve converges close to 0, while the validation accuracy shown in Figure 5b shows no improvement, suggesting overfitting. Overfitting typically occurs when the training data lack sufficient diversity, causing the model to learn specific features of the training data rather than general patterns, which affects its performance on the validation data [30]. This observation highlights the effectiveness of data augmentation techniques in improving model performance and increasing training robustness by mitigating overfitting.

3.7. Performance Comparison with Other Methods

The optimal combinations of batch size and learning rate identified in our initial experiments were then integrated with various data augmentation techniques. To ensure robust results, we ran ten tests and calculated average performance metrics. This approach yielded favorable results for several key performance indicators, including test accuracy, precision, recall, and F1 score. In particular, the highest test accuracy achieved was 90.68%, as shown in Table 5.

In order to more effectively assess the true performance of the model, this experiment used 10-fold cross-validation to compare with other lightweight models. The 10-fold cross-validation method is a technique used to evaluate the performance of machine learning models by randomly dividing the dataset into 10 equally sized subsets. Each time, one subset is chosen as the validation set, while the remaining 9 subsets are used to train the model. This process is repeated 10 times, resulting in 10 best validation accuracies, from which the mean and standard deviation are calculated as indicators to evaluate the overall performance of the model. In situations with limited datasets, this method allows a more uniform use of all samples in the dataset. For a fair comparison, we selected several mainstream lightweight models, including SqueezeNet 1.1 [31], MNASNet1_3 [32], ShuffleNetV2_x2_0 [33], and EfficientNet_B1 [34]. The experimental results are shown in Table 6.
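The splitting and summary steps of 10-fold cross-validation can be sketched as follows. The random seed, the helper names, and the example scores are assumptions for illustration, and the sample standard deviation is used here although the text does not state which variant was computed.

```python
import random
import statistics

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled list gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

def summarize(scores):
    """Return the mean and sample standard deviation of the per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)

splits = list(k_fold_indices(3076, k=10))
print([len(val) for _, val in splits])

# Hypothetical best validation accuracies (%), one per fold.
mean_acc, std_acc = summarize([90.0, 88.0, 92.0])
print(mean_acc, std_acc)
```

With 3076 images, six folds contain 308 images and four contain 307, so every image is used for validation exactly once.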

The experimental results show that the RegNetY-400MF model achieved the highest average accuracy of all the models evaluated. It also exhibited a significantly lower standard deviation compared to other lightweight models, indicating greater consistency in its performance.

4. Discussion

This section presents a comparative analysis of the RegNetY-400MF model against other lightweight models, based on various evaluated metrics. The number of floating-point operations (FLOPs) required for a single forward pass was used as a measure of computational complexity. FLOPs, typically expressed in gigaflops (GFLOPs), serve as a key indicator of computational demand—an essential factor for applications requiring real-time processing. Additionally, the model’s memory footprint was measured in megabytes (MB), reflecting the amount of storage space the model occupies. The number of parameters in the model is another crucial performance indicator, as an increase in parameters corresponds to greater model complexity, higher memory usage, and increased computational requirements. A detailed comparison of the experimental results is presented in Table 7.
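To make these quantities concrete, the sketch below counts the multiply-accumulate operations of a single convolution layer and converts a parameter count into storage size. The layer dimensions are hypothetical, 32-bit floating-point weights are assumed, and the convention of reporting multiply-accumulates as FLOPs varies between papers and tools.

```python
def conv2d_macs(c_in, c_out, kernel, h_out, w_out, groups=1):
    """Multiply-accumulate operations of one convolution layer (bias ignored)."""
    return (c_in // groups) * c_out * kernel * kernel * h_out * w_out

def weights_size_mb(n_params, bytes_per_param=4):
    """Storage needed for the weights in megabytes (10**6 bytes), assuming float32."""
    return n_params * bytes_per_param / 1e6

# Hypothetical layer: 3 x 3 convolution, 48 -> 104 channels, 28 x 28 output.
macs = conv2d_macs(48, 104, 3, 28, 28)
print(f"{macs / 1e9:.4f} G multiply-accumulates for this layer")

# Storage for a hypothetical model with one million parameters.
print(f"{weights_size_mb(1_000_000):.1f} MB")

# Rough parameter count implied by a 16.8 MB float32 weight file (file overhead ignored).
print(f"about {16.8e6 / 4 / 1e6:.1f} million parameters")
```

Summing the per-layer counts over the whole network gives the total for one forward pass, which is the figure that the 400MF in the model name refers to (about 400 million, or 0.40 G).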

Based on the experimental results in Table 7, RegNetY-400MF was second only to SqueezeNet 1.1 in the lightweight metrics evaluation. However, when we consider the results in Table 6, the average accuracy of RegNetY-400MF was not only the highest, but also significantly greater than that of SqueezeNet 1.1. This demonstrates that RegNetY-400MF has a clear advantage in balancing model size and performance, making it the best choice among the lightweight models for resource-constrained environments.

However, the results also suggest that there is potential for further performance improvement. To facilitate a deeper analysis of the model’s performance, we generated a confusion matrix of the test results, as shown in Figure 6. This matrix provides additional insights that could guide future improvements.

The confusion matrix shows that the model we used achieved relatively high accuracy in identifying both the Bacteria disease category and the Healthy category. The model also performed well in identifying the Fungi disease category, with only four images misclassified as belonging to the Pest disease category. On closer inspection of the images where Fungi images were misclassified as Pest, it is apparent that the visual characteristics of these two disease categories are quite similar, as shown in Figure 7 and Figure 8. This similarity of features is likely contributing to the misclassification. In the Pest disease category, 55 samples were correctly classified and 4 were misclassified as Fungi. This further suggests that the overlapping features between the Pest and Fungi categories can lead to confusion, causing the model to occasionally misclassify between these two categories.

For the detection of Phytophthora symptoms, 5 out of 35 samples were misclassified as Fungi, as shown in Figure 9. The possible reason for this is that Fungi images have several characteristics that can easily lead to the misclassification of other disease categories as this type. The characteristics of the Fungi category include “circular patterns or slightly sunken leaf spots along leaf margins with yellow edges and a concentric ring appearance, or powdery spots on yellow leaves”. As these characteristics are not necessarily all present at the same time, this has a negative impact on the training of classification models, leading to identification errors.

Most of the identification errors were concentrated in the Virus disease category. Of the 54 test images, 9 were misidentified, with the most common misclassification being Virus disease images misidentified as Healthy. The characteristics of virus diseases—such as reduced leaf size, wrinkling, slight spotting or variegation, and necrosis—overlap significantly with those of other diseases. This overlap highlights the inherent difficulty in distinguishing virus diseases from other categories, leading to frequent misclassification. In addition, characteristics such as reduced leaf size and light spotting are subtle and may not be readily apparent, increasing the likelihood of these cases being misclassified as Healthy, as shown in Figure 10.

To further improve the classification performance of the model, we propose focusing on the attention mechanism. Incorporating the efficient channel attention (ECA) mechanism could allow the model to concentrate more effectively on the critical characteristics of the input data, thereby improving the accuracy of plant disease recognition. ECA computes a weight for each channel, enabling the model to automatically emphasize the features that have the greatest impact on classification, and it can also capture the subtle changes caused by disease. These improvements are expected to make the model more sensitive to the differences between similar diseases, reducing misclassification and ultimately improving accuracy.
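
To make the channel-weighting idea concrete, the following is a minimal pure-Python sketch of an ECA-style forward pass: global average pooling per channel, one shared 1-D convolution across neighbouring channels, a sigmoid, and rescaling of each channel. It is an illustration under stated assumptions, not the implementation planned for this model. The function names are hypothetical, the kernel values are placeholders (in practice they are learned), and the kernel-size rule with `gamma=2` and `b=1` follows the defaults commonly used for ECA.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Odd 1-D kernel size derived from the channel count (assumed defaults)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca_channel_weights(channel_means, kernel):
    """Apply one shared odd-length 1-D kernel across channels, then a sigmoid."""
    pad = len(kernel) // 2
    # Zero-pad so that every channel gets a full window of neighbours.
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    weights = []
    for c in range(len(channel_means)):
        s = sum(k * padded[c + j] for j, k in enumerate(kernel))
        weights.append(1.0 / (1.0 + math.exp(-s)))
    return weights


def eca_forward(feature_map, kernel):
    """feature_map: list of C channels, each an H x W nested list of floats."""
    # Global average pooling: one scalar descriptor per channel.
    means = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    weights = eca_channel_weights(means, kernel)
    # Rescale every activation by the weight of its channel.
    return [[[v * w for v in row] for row in ch]
            for ch, w in zip(feature_map, weights)]


# Toy input: 16 channels of 2 x 2 activations.
fmap = [[[0.1 * (c + 1)] * 2 for _ in range(2)] for c in range(16)]
kernel = [0.1, 0.5, 0.1]  # placeholder values; learned during training in practice
assert len(kernel) == eca_kernel_size(len(fmap))
out = eca_forward(fmap, kernel)
print(len(out), len(out[0]), len(out[0][0]))
```

Unlike a fully connected squeeze-and-excitation block, this design adds only as many weights as the kernel has taps, which is why it is attractive for a lightweight model.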

5. Conclusions

Traditional deep learning models typically require large amounts of computational resources and storage space. In agricultural settings, geography and the difficulty of setting up computing environments often limit the available resources, so applying such models can be a significant challenge. This study applied a lightweight deep learning model, RegNetY-400MF, to potato leaf disease detection. Through transfer learning, an optimized combination of training parameters, and data augmentation, the model achieved high accuracy and efficiency. In the experiments, the model reached a best test accuracy of 90.68% on the Potato Leaf Disease in Uncontrolled Environment dataset, which shows that the method can discriminate between different types of potato leaf diseases. Compared with the other lightweight models tested, the proposed approach achieved the highest average accuracy (89.87%) and the lowest standard deviation (1.28), indicating consistent performance. The model’s compact size of 16.8 MB and low computational requirement of 0.40 GFLOPs make it well suited for deployment on resource-constrained devices, enabling real-time disease detection in field conditions.

In future work, we aim to explore combining the model with sensor data to enhance its utility in realistic agricultural environments. Deploying cameras, drone monitoring, and sensors for temperature, humidity, and soil moisture would enable real-time data collection on crop-growing conditions, thereby improving our ability to predict and understand disease occurrence. We will also expand the dataset and test the model on other plant diseases to validate its applicability. Future studies will additionally report the area under the receiver operating characteristic curve (ROC-AUC) to provide a more comprehensive assessment of the model’s performance, and we will measure inference time once the model has been ported to an edge device, which will give a more accurate picture of its performance in deployment. To further improve the model, we will refine its architecture, focusing on the attention mechanism. We are considering replacing the current squeeze-and-excitation (SE) channel attention with the Efficient Channel Attention (ECA) method, which could lead to more accurate disease predictions and support timely farm management advice for farmers. Ultimately, these improvements should contribute to higher productivity and better crop health.

Author Contributions

Conceptualization, C.-Y.C.; methodology, C.-Y.C.; software, C.-Y.C.; validation, C.-Y.C.; formal analysis, C.-Y.C.; investigation, C.-Y.C.; resources, C.-Y.C. and C.-C.L.; data curation, C.-Y.C.; writing—original draft preparation, C.-Y.C.; writing—review and editing, C.-Y.C. and C.-C.L.; visualization, C.-Y.C. and C.-C.L.; supervision, C.-C.L.; project administration, C.-C.L.; funding acquisition, C.-C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original data presented in the study are openly available in Mendeley Data at DOI: 10.17632/ptz377bwb8.1.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Sample images of the seven potato leaf categories. (a) Virus, (b) Phytophthora, (c) Nematode, (d) Fungi, (e) Bacteria, (f) Pest, and (g) Healthy.

Figure 2. The training process of the proposed approach.

Figure 3. RegNet model (AnyNet). (a) Main architecture, (b) body, and (c) block.

Figure 4. Architecture of the SE channel attention mechanism.

Figure 5. The impact of image enhancement on performance. (a) Training loss curve and (b) validation accuracy curve.

Figure 6. The confusion matrix of our approach for the 7 disease categories.

Figure 7. (a) Real image of Pest category; (b) real image of Fungi category incorrectly predicted as Pest.

Figure 8. (a) Real image of Fungi category; (b) real image of Pest category incorrectly predicted as Fungi.

Figure 9. (a) Real image of Fungi category; (b) real image of Phytophthora category incorrectly predicted as Fungi.

Figure 10. (a) Real images of Healthy category; (b) real images of Virus category incorrectly predicted as Healthy.

Table 1. Distribution of the potato leaf disease dataset.

Class	Number of Images
Virus	532
Phytophthora	347
Nematode	68
Fungi	748
Bacteria	569
Pest	611
Healthy	201

Table 2. Comparison of results with and without transfer learning.

Transfer Learning	Accuracy (%)	Precision (%)	Recall (%)	F1 Score (%)
No	45.98	44.86	45.98	44.66
Yes	81.03	81.91	81.03	80.64

Table 3. Performance comparison of different parameter combinations.

Batch Size	Learning Rate	Validation Accuracy (%)	Test Accuracy (%)
8	0.0001	88.04	85.21
8	0.001	85.14	83.60
16	0.0001	83.03	83.60
16	0.001	89.89	87.78
32	0.0001	83.39	83.28
32	0.001	86.64	86.17
64	0.0001	83.75	81.03
64	0.001	87.00	87.14
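
The rows of Table 3 can be compared programmatically. The following is a minimal sketch that picks the combination with the highest validation accuracy. It assumes validation accuracy is the selection criterion, and the variable names are illustrative only.

```python
# (batch size, learning rate, validation accuracy %, test accuracy %) from Table 3
results = [
    (8, 0.0001, 88.04, 85.21),
    (8, 0.001, 85.14, 83.60),
    (16, 0.0001, 83.03, 83.60),
    (16, 0.001, 89.89, 87.78),
    (32, 0.0001, 83.39, 83.28),
    (32, 0.001, 86.64, 86.17),
    (64, 0.0001, 83.75, 81.03),
    (64, 0.001, 87.00, 87.14),
]

# Select on validation accuracy so that the test set is not used for tuning.
best = max(results, key=lambda row: row[2])
batch_size, learning_rate, val_acc, test_acc = best

print(f"best: batch size {batch_size}, learning rate {learning_rate}, "
      f"validation {val_acc}%, test {test_acc}%")
```

The selected row also has the highest test accuracy in the table, so both criteria agree for these results.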

Table 4. Comparison of results with and without image enhancement.

Image Enhancement	Accuracy (%)	Precision (%)	Recall (%)	F1 Score (%)
No	81.03	81.91	81.03	80.64
Yes	87.78	88.51	87.78	87.80

Table 5. Experimental results of 10 tests and their average.

No. of Test	Accuracy (%)	Precision (%)	Recall (%)	F1 Score (%)
1	90.35	90.77	90.35	90.31
2	90.68	91.22	90.68	90.70
3	89.39	89.74	89.39	89.37
4	89.07	89.41	89.07	89.03
5	90.03	90.36	90.03	90.02
6	90.35	90.95	90.35	90.34
7	90.03	90.28	90.03	89.95
8	89.07	89.45	89.07	89.02
9	90.03	90.32	90.03	89.93
10	89.71	89.93	89.71	89.64
Average	89.87	90.24	89.87	89.83
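
The Average row can be reproduced from the ten individual runs. The following is a small sketch using only the values listed in Table 5. The dictionary layout and variable names are illustrative.

```python
from statistics import mean

# Per-run metrics (%) copied from Table 5, in test order 1..10.
runs = {
    "accuracy": [90.35, 90.68, 89.39, 89.07, 90.03,
                 90.35, 90.03, 89.07, 90.03, 89.71],
    "precision": [90.77, 91.22, 89.74, 89.41, 90.36,
                  90.95, 90.28, 89.45, 90.32, 89.93],
    "recall": [90.35, 90.68, 89.39, 89.07, 90.03,
               90.35, 90.03, 89.07, 90.03, 89.71],
    "f1": [90.31, 90.70, 89.37, 89.03, 90.02,
           90.34, 89.95, 89.02, 89.93, 89.64],
}

# Average of each metric over the ten runs, rounded as in the table.
averages = {name: round(mean(values), 2) for name, values in runs.items()}

# 1-based index of the run with the highest accuracy.
best_run = max(range(10), key=lambda i: runs["accuracy"][i]) + 1

print(averages)
print("best run:", best_run, "accuracy:", runs["accuracy"][best_run - 1])
```

The best single run found this way corresponds to the best test accuracy reported in the conclusions.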

Table 6. Comparison of performance measures for different models.

Model	Average Accuracy (%)	Std. Dev.
SqueezeNet 1.1	86.11	2.49
MNASNet1_3	87.31	2.31
ShuffleNetV2_x2_0	88.06	1.80
EfficientNet_B1	89.66	2.05
RegNetY_400MF	89.87	1.28

Table 7. Comparison of lightweight model performance metrics.

Model	GFLOPs	Size (MB)	Number of Parameters
SqueezeNet 1.1	0.35	4.7	1,235,496
MNASNet1_3	0.53	24.2	6,282,256
ShuffleNetV2_x2_0	0.58	28.4	7,393,996
EfficientNet_B1	0.69	30.1	7,794,184
RegNetY_400MF	0.40	16.8	4,344,144

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Chang, C.-Y.; Lai, C.-C. Potato Leaf Disease Detection Based on a Lightweight Deep Learning Model. Mach. Learn. Knowl. Extr. 2024, 6, 2321-2335. https://doi.org/10.3390/make6040114
