Article

A Robust Ensemble Model for Plant Disease Detection Using Deep Learning Architectures

Department of Computer Science and Engineering, Qatar University, Doha 2713, Qatar
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
AgriEngineering 2025, 7(5), 159; https://doi.org/10.3390/agriengineering7050159
Submission received: 16 March 2025 / Revised: 29 April 2025 / Accepted: 9 May 2025 / Published: 19 May 2025

Abstract

This study explores advanced methods for plant disease classification by integrating pre-trained deep learning models and leveraging ensemble learning. After a comprehensive review of deep learning methods in this area, the InceptionResNetV2, MobileNetV2, and EfficientNetB3 architectures were identified as promising candidates, as they have been shown to achieve high accuracy and efficiency in various applications. The proposed approach strategically combines these architectures to leverage their unique strengths: the advanced feature extraction capabilities of InceptionResNetV2, the lightweight and efficient design of MobileNetV2, and the scalable, performance-optimized structure of EfficientNetB3. By integrating these models, the approach aims to improve classification accuracy and robustness and overcome the multiple challenges of plant disease detection. Comprehensive experiments were conducted on three datasets—PlantVillage, PlantDoc, and FieldPlant—representing a mix of laboratory and real-world conditions. Advanced data augmentation techniques were employed to improve model generalization, while a systematic ablation study validated the efficacy of key architectural choices. The ensemble model achieved state-of-the-art performance, with classification accuracies of 99.69% on PlantVillage, 60% on PlantDoc, and 83% on FieldPlant. These findings highlight the potential of ensemble learning and transfer learning in advancing plant disease detection, offering a robust solution for real-world agricultural applications.

1. Introduction

In recent years, the prevalence of plant diseases has increased significantly, negatively affecting agricultural production and food security globally. Agriculture is fundamental to our society, and ensuring crop health is vital to sustaining food security. Unfortunately, the impact of plant diseases is often underestimated, leading to substantial consequences for farmers and consumers [1]. Early detection of these diseases is crucial to prevent widespread outbreaks and to enable timely treatment, which is essential for effective agricultural management [2]. Symptoms often manifest as distinctive marks or lesions on leaves, flowers, or fruits, with leaf images serving as the primary source of diagnosis [2].
Many farmers are unaware of the various plant diseases that affect their crops, which are susceptible to infections caused by fungi and molds and aggravated by temperature fluctuations, humidity, and rain. These infections can cause significant financial losses and threaten the livelihoods of agricultural workers, further contributing to food shortages [3]. Traditionally, plant disease identification relies on visual inspections by trained experts or machine detection using image processing techniques [2]. However, visual inspections are time-consuming, costly, and prone to human error, particularly in developing countries where expert access is limited [4]. This highlights the urgent need for automated and efficient plant disease recognition systems that can monitor large fields and quickly detect symptoms [5]. Recent advancements, such as the integration of deep learning with IoT and edge computing technologies, offer scalable and real-time solutions for plant disease detection, transforming traditional agricultural practices into more automated and efficient systems [6].
Deep learning techniques present promising solutions to the limitations of traditional methods. Although conventional machine learning approaches can be effective, they often require manual feature extraction and significant expert input [4]. In contrast, deep transfer learning can automatically recognize hierarchical features in images, greatly enhancing the accuracy and efficiency of plant disease identification. This advancement reduces reliance on skilled labor and facilitates real-time monitoring and diagnosis of plant health, providing a practical and cost-effective strategy to protect agricultural productivity [4].
As interest in machine learning and deep learning grows, advancements in computer technology and data processing capabilities have become vital for the effective detection and diagnosis of plant diseases [7]. Although machine learning has been used to categorize diseases, it often struggles with generalization [8]. The effectiveness of plant disease classification relies heavily on feature extraction and the choice of classifiers. Some algorithms, like random forests, require extensive parameter tuning, leading to increased computation time [9]. The rise of convolutional neural networks (CNNs) has significantly improved accuracy in agricultural applications, eliminating the need for intricate pre-processing and feature extraction. However, training large neural networks demands considerable time and access to large datasets, which can be challenging when generating labelled data is costly and difficult. Transfer learning emerges as a valuable technique in such situations, allowing knowledge transfer from pre-trained models to new models, thereby reducing the requirements for extensive training data and computational resources [8].
In this paper, we propose an ensemble approach that combines three state-of-the-art transfer learning models: InceptionResNetV2, MobileNetV2, and EfficientNetB3, as shown in Figure 1. This ensemble model aims to utilize the strengths of each individual model, enhancing overall performance and robustness in plant disease detection. InceptionResNetV2 is chosen for its ability to merge the strengths of the Inception and ResNet architectures, efficiently learning rich feature representations. MobileNetV2 is included for its lightweight design, making it suitable for deployment in resource-constrained environments without sacrificing performance. EfficientNetB3 is known for its scalability, balancing depth and width to achieve high accuracy with fewer parameters. By combining these models in an ensemble, we aim to achieve improved classification accuracy and robustness compared to using each model individually. Moreover, our research utilizes three carefully chosen datasets for model training: PlantVillage [10], PlantDoc [11], and FieldPlant [12]. PlantVillage is a well-known dataset offering a wide range of plant diseases, while PlantDoc features real-world images. The FieldPlant dataset is the most recent of the three, providing fresh insights into real-world conditions and variations in plant diseases. This combination ensures comprehensive coverage, enhancing the robustness of our models. The main objectives of this study can be summarized as follows:
  • Perform a detailed analysis and comparison of deep learning models applied to plant disease detection, focusing on their performance, scalability, and suitability for deployment in real-world agricultural settings. This includes analyzing how well different models handle challenges such as class imbalance and complex disease symptoms across multiple plant species.
  • Design and develop a robust ensemble framework by integrating three state-of-the-art convolutional neural network architectures—InceptionResNetV2, MobileNetV2, and EfficientNetB3. The goal is to create a unified model that utilizes the detailed feature extraction of InceptionResNetV2, the computational efficiency of MobileNetV2, and the scalability of EfficientNetB3, resulting in improved accuracy and robustness in both controlled and real-world environments.
  • Improve the model’s ability to generalize across unseen data by applying sophisticated data augmentation techniques. These include random rotations, zooming, horizontal and vertical flips, and rescaling, all of which introduce greater variability into the training process and simulate real-world image capture conditions. This helps ensure that the model performs well even in challenging or unfamiliar scenarios.
  • Validate the effectiveness and adaptability of the proposed ensemble model using three diverse datasets—PlantVillage, PlantDoc, and FieldPlant—each offering different characteristics ranging from clean laboratory images to complex, real-world field conditions. Evaluation is carried out through multiple performance metrics, including accuracy, precision, recall, F1-score, and confusion matrices, along with comparative analysis against individual base models and existing approaches in the literature.
The remainder of this paper is structured as follows: Section 2 gives an overview of previous research on the topic. Section 3 describes the datasets used in this study. Section 4 provides detailed information about the proposed approach, including the evaluation metrics used to measure the performance of the proposed model. The results and findings are presented in Section 5. Lastly, Section 6 concludes the paper by discussing the advantages and limitations of the study, along with potential directions for future research, contributing to ongoing efforts in plant disease classification.

2. Literature Review

Researchers have explored various deep learning models and innovative techniques to enhance plant disease detection accuracy, often combining or adapting methods for optimal performance. Several studies focused on CNN-based models for classification, emphasizing transfer learning and fine-tuning. H. Hong compared models like ResNet50, Xception, MobileNet, ShuffleNet, and DenseNet121_Xception on tomato leaf datasets [8]. DenseNet121_Xception achieved the highest accuracy of 97.1%, though it required more parameters, whereas ShuffleNet, using fewer parameters, reached 83.68% accuracy. Similarly, Ref. [13] applied ResNet-50 to classify plant diseases with an accuracy of 98.98%, showcasing the effectiveness of transfer learning when handling large image datasets. A recent study utilized transfer learning with deep convolutional neural networks, such as VGG19 and InceptionV3, to classify diseases in strawberry plants, training on a dataset collected from various sources that exhibited class imbalance [14]. By fine-tuning these pre-trained models, the authors were able to recognize the unique characteristics of strawberry diseases with a relatively small labeled dataset. They also implemented four data augmentation techniques—RandomFlip, ColorJitter, Cutmix, and cropping—to enhance model generalization and mitigate overfitting. The highest accuracies recorded were 88.65% for VGGNet, 96.25% for InceptionV3, 98.13% for ResNet50, and 91.06% for InceptionResNetV2 [15]. Another notable contribution is DeepPlantNet, a CNN architecture specifically developed for plant leaf disease classification. It was evaluated using the PlantVillage dataset and consists of twenty-eight learnable layers—twenty-five convolutional and three fully connected—designed for both accuracy and computational efficiency. The model employs fire modules to reduce parameters, batch normalization for training stability, and Leaky ReLU activation to address the dying ReLU issue. DeepPlantNet achieved over 94% accuracy across multiple class setups, showcasing its potential as a lightweight and effective solution for plant disease detection. However, despite its strong performance in controlled conditions, its adaptability to real-world agricultural environments with varying backgrounds and lighting remains to be fully validated [16].
Other studies explored advanced architectures like hybrid and ensemble models to enhance classification accuracy. For example, G. Sachdeva’s team utilized a deep CNN (DCNN) combined with Bayesian learning, achieving 98.8% accuracy by improving pixel dependency and utilizing conditional probabilities in classification [5]. T. Anandhakrishnan also employed a DCNN, achieving a comparable accuracy of 98.4% and demonstrating robustness in challenging conditions like low light [17]. S. K. Sahu took a different route with a hybrid random forest multi-class support vector machine (HRF-MCSVM), achieving 98.9% accuracy by combining classifiers to minimize error rates and optimize feature extraction [9]. These approaches underscore the benefits of combining multiple models or methodologies to address the complexity of plant disease classification. In addition, studies have highlighted the effectiveness of multi-stage detection networks like Faster R-CNN for higher accuracy, and one-stage networks like SSD and YOLO for faster detection speeds, depending on the practical application needs [6].
Data augmentation and synthetic data generation were key strategies for improving model performance. A. Abbas utilized conditional generative adversarial networks (C-GANs) to generate synthetic images, thus expanding the dataset and preventing overfitting during DenseNet121 training [18]. Similarly, A. O. Anim-Ayeko applied data augmentation techniques with ResNet-9, achieving a notable accuracy of 99.25% by increasing the dataset size and optimizing hyperparameters [19]. These approaches underline the importance of enriched training data for achieving higher classification accuracy. For grape disease detection, J. Chen proposed a novel SegCNN approach that combined image segmentation and classification to reach 93.75% accuracy [1]. This method used an enhanced neural network-based segmentation to isolate diseased regions, outperforming traditional models like LeNet, AlexNet, and GoogleNet. By focusing on precise region extraction, SegCNN effectively handled challenges such as poor lighting and noise. A. Bansal’s work with ResNeXt-50 for classifying cucumber leaf disease severity highlighted the use of cardinality over depth to improve accuracy and robustness. With an accuracy of 97.81%, ResNeXt-50 surpassed other models like U-Net and YOLOv5, making it a reliable tool for handling diverse leaf image datasets [20].
Overall, these studies show the evolution of plant disease detection methods, using CNN variations, hybrid approaches, and data augmentation techniques. The focus on transfer learning, enhanced data utilization, and model combinations demonstrates a shared goal of improving accuracy and robustness in complex agricultural applications. Table 1 lists the different kinds of classifiers and their corresponding accuracies across various datasets. Transfer learning models represent a paradigm shift in machine learning, enabling models trained on large datasets to be adapted for specific tasks with limited data. This approach significantly reduces training time, computational costs, and the data requirements for high-performance results. Popular architectures such as InceptionResNetV2, MobileNetV2, and EfficientNetB3 exemplify transfer learning’s power, offering scalable, efficient, and accurate solutions for a wide range of applications, including plant disease detection. This adaptability makes transfer learning essential for solving complex problems across domains. In this section, we analyze these three models for plant disease classification, examining their performance across different datasets and scenarios.
InceptionResNetV2: InceptionResNetV2 [29] represents a deep learning model that merges the strengths of Inception and ResNet designs. Engineered by Google researchers, this architecture utilizes residual skip connections—borrowed from ResNet—to mitigate vanishing gradient problems and stabilize training dynamics. Simultaneously, it preserves the Inception network’s ability to hierarchically discern features across multiple scales [30]. The model’s structure is organized into three primary components (Blocks A, B, and C), each culminating in a dimensionality reduction phase. These modules compress spatial resolution (height and width) while augmenting channel depth, optimizing computational efficiency [29]. Empirical evaluations, summarized in Table 2, demonstrate the model’s classification accuracy on varied datasets, alongside architectural refinements. This comparative analysis highlights its adaptability and effectiveness in diverse data environments.
MobileNetV2: MobileNetV2 is a neural network designed specifically for mobile and low-power devices, focusing on being fast and efficient without losing accuracy. Its key feature is the use of inverted residuals with linear bottlenecks, which first increase the size of the input, use depthwise convolutions to extract features, and then reduce the size back down. Depthwise convolutions are a type of lightweight convolution operation that processes each input channel separately, unlike standard convolutions that mix all channels together. This significantly reduces the number of calculations required, allowing the model to run faster and use less memory, especially useful for devices like smartphones or edge hardware. This helps save both processing power and memory, making it perfect for mobile applications [37].
To further boost efficiency, MobileNetV2 replaces standard convolution layers with a combination of 3 × 3 depthwise and 1 × 1 pointwise convolutions, cutting the computational cost by eight to nine times while only slightly reducing accuracy. It also uses linear bottlenecks in narrow layers to keep the model powerful and effective at recognizing features [38]. It can be used in performance-constrained applications, delivering both high accuracy and low computational demands. Table 3 provides a summary of the accuracy scores achieved by the modified MobileNetV2 model across different datasets, showcasing its performance and architectural adjustments.
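To make the savings concrete, the following minimal Keras sketch compares the parameter count of a standard 3 × 3 convolution with its depthwise separable equivalent; the layer sizes (64 to 128 channels on a 56 × 56 feature map) are illustrative assumptions and are not taken from MobileNetV2 itself:

```python
from tensorflow.keras import layers, Sequential

# Standard 3x3 convolution mapping 64 -> 128 channels
standard = Sequential([
    layers.Conv2D(128, 3, padding="same", input_shape=(56, 56, 64)),
])

# Depthwise separable equivalent: per-channel 3x3 filtering, then 1x1 channel mixing
separable = Sequential([
    layers.DepthwiseConv2D(3, padding="same", input_shape=(56, 56, 64)),
    layers.Conv2D(128, 1),
])

print(standard.count_params())   # 73,856
print(separable.count_params())  # 8,960 -> roughly 8x fewer, matching the 8-9x figure
```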
EfficientNetB3: EfficientNetB3 [42] is a convolutional neural network (CNN) that is part of the EfficientNet family, which ranges from EfficientNetB0 to EfficientNetB7. The model is notable for its use of compound scaling, a technique that balances the scaling of the model’s width, depth, and resolution to enhance performance. Instead of making the model deeper, wider, or using larger input images separately, compound scaling increases all three—depth (number of layers), width (number of features per layer), and resolution (input image size)—in a balanced way. This coordinated scaling allows the model to achieve higher accuracy while keeping computational costs low, making it both powerful and efficient for practical use.
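Formally, compound scaling as defined in [42] ties network depth, width, and input resolution to a single compound coefficient $\phi$:

$$d = \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi}, \qquad \text{s.t. } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \quad \alpha, \beta, \gamma \geq 1$$

Here $\alpha$, $\beta$, and $\gamma$ are constants found by a small grid search on the baseline network, and increasing $\phi$ by one roughly doubles the FLOPS budget; EfficientNetB3 corresponds to a larger compound coefficient than the B0–B2 variants.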
EfficientNetB3 utilizes mobile inverted bottleneck convolution blocks (MBConv) with kernel sizes of 3 × 3 and 5 × 5, significantly reducing the computational demands compared to traditional convolutional methods. With 210 layers and around 11.1 million parameters, the model accepts input sizes of 300 × 300 × 3. Its deeper architecture allows for a better understanding of complex features, making it effective for classification tasks [42].
In addition to its architecture, EfficientNetB3 includes advanced features like Swish activation (a type of activation function), Squeeze-and-Excitation (SE) blocks (which enhance the model’s focus on important features), and dropout layers (which help prevent overfitting by randomly deactivating some neurons during training) [43]. Table 4 summarizes the accuracy scores of the EfficientNetB3 model across various datasets, highlighting its performance and the modifications made to its architecture.
To summarize, the decision to combine InceptionResNetV2, MobileNetV2, and EfficientNetB3 in this study stems from their complementary strengths and the desire to leverage the unique advantages of each architecture for plant disease detection. InceptionResNetV2 combines the deep feature extraction capabilities of the Inception modules with the efficiency of residual connections, making it highly effective at capturing complex patterns in large datasets. MobileNetV2, on the other hand, is optimized for resource-constrained environments due to its lightweight structure and efficient depthwise separable convolutions, ensuring faster inference without significant loss of accuracy. EfficientNetB3 introduces a compound scaling approach that balances depth, width, and resolution to maximize performance while minimizing computational costs. By integrating these models, we aim to leverage their individual strengths (depth, efficiency, and scalability) in a unified framework that enables robust and accurate detection of plant diseases under different conditions and constraints.

3. Datasets

The study utilizes three datasets—PlantVillage, PlantDoc, and FieldPlant—selected for their diverse features and significance in advancing plant disease detection. These datasets provide a mix of laboratory and real-world images, covering various plant species and disease types, to ensure comprehensive model evaluation. Figure 2 illustrates sample images from these datasets, showcasing the variety of healthy and diseased plant specimens included.
PlantVillage dataset: The PlantVillage [10] dataset contains 54,323 images of both healthy and diseased leaves from 14 different crop species, such as Apple, Blueberry, Cherry, Corn, Grape, and Orange, among others. It includes images of seventeen fungal diseases, four bacterial diseases, two mold diseases, two viral diseases, and one disease caused by a mite, as well as healthy leaves that show no visible signs of disease. All images are carefully curated and labeled by plant pathology experts, ensuring high-quality data for research and development in plant health diagnostics.
PlantDoc dataset: PlantDoc [11] is a dataset specifically created for visual plant disease detection, containing a total of 2598 images from 13 plant species across 27 classes, including both diseased and healthy specimens. This dataset was developed to tackle the challenges of early plant disease detection, particularly in uncontrolled environments that are typical in real-world agricultural scenarios. Its goal is to enable the use of computer vision techniques in agriculture, especially for farmers with low-end mobile devices. Research has demonstrated that using this dataset to fine-tune models can enhance classification accuracy by up to 31%, making it an asset for researchers and professionals in plant pathology and agricultural technology.
FieldPlant dataset: The FieldPlant [12] dataset consists of 5170 images of plant diseases captured directly from plantations, specifically created for identifying and classifying plant diseases using deep learning techniques. It features manual annotations of individual leaves, totalling 8629 annotated leaves across 27 disease classes. The annotations were conducted under the guidance of plant pathologists to ensure the quality of the data. This dataset aims to offer researchers a resource for developing models that can accurately identify and classify plant diseases in real-world conditions, overcoming the limitations of earlier datasets like PlantVillage and PlantDoc, which primarily relied on laboratory images. In refining the dataset, we removed non-leaf disease classes, such as Cassava root rot (78 images) and Corn charcoal, to enhance its relevance. We also excluded images containing fewer than 50 instances to ensure high dataset quality. In cases where images featured multiple classes, such as healthy and diseased tomato leaves, we focused exclusively on the diseased instances by removing the healthy ones.
Figure 3, Figure 4 and Figure 5 illustrate the image distribution across the three datasets. The graph clearly shows that the PlantDoc dataset maintains a more balanced class distribution relative to PlantVillage and FieldPlant. Including these varied datasets demonstrates our dedication to rigorously evaluating the performance and reliability of our method. Leveraging PlantVillage, PlantDoc, and FieldPlant data allows us to account for dataset-specific biases and variability. This approach of combining multiple datasets improves the adaptability of our method and reinforces its robustness in real-world applications.

4. Proposed Approach

The following section presents the proposed model architecture and the data augmentation techniques employed in the study. Figure 1 illustrates the framework for the plant leaf disease detection and classification approach utilized in this paper.

4.1. Data Augmentation

To prepare image data for training, augmentation techniques are crucial for improving model generalization and resilience. We employed ImageDataGenerator to implement these transformations. The augmentation process includes random rotations, zoom variations, and both horizontal and vertical flips, which artificially expand the training dataset’s diversity. This increased variability helps the model develop stronger feature recognition capabilities while reducing its tendency to overfit to specific training examples [33]. Additionally, all images undergo pixel value normalization through rescaling, maintaining uniform input dimensions for the model.
Each dataset used for model training and evaluation is divided into three parts: training, validation, and testing. Both the training and validation data come from the same source and are split before model training begins. For the PlantVillage and PlantDoc datasets, the data were already pre-divided into training and testing sets; therefore, 20% of the training data was further split off to create a validation set. However, for the FieldPlant dataset, the images were initially organized into separate folders for each disease class without predefined training and testing splits. As a result, the dataset was restructured by manually creating new training and testing sets, and then 20% of the newly formed training set was set aside for validation.
While only the training dataset undergoes augmentation and rescaling, this design choice is intentional and follows standard practice to ensure a fair and realistic evaluation of the model. Applying transformations such as flipping or rotation to validation or test data could introduce artificial variance that does not reflect real-world deployment scenarios. By keeping the validation and test sets unchanged, we preserve the original data distribution and ensure that performance metrics accurately reflect how the model generalizes to unseen data. The normalization step (rescaling pixel values to [0, 1]) is consistently applied across all datasets, maintaining input consistency while reserving augmentation solely for enhancing training diversity. Table 5 shows the data augmentation methods and the corresponding values applied to increase the diversity and robustness of the training dataset.
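As a concrete illustration, this pipeline can be sketched with the Keras ImageDataGenerator; the augmentation ranges and directory paths below are illustrative assumptions rather than the exact values listed in Table 5:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Training data: augmentation + rescaling (ranges here are illustrative)
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalize pixel values to [0, 1]
    rotation_range=20,        # random rotations
    zoom_range=0.2,           # random zoom
    horizontal_flip=True,
    vertical_flip=True,
    validation_split=0.2,     # hold out 20% of the training data for validation
)

# Validation data: rescaling only, so metrics reflect the original distribution
val_datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)

train_gen = train_datagen.flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=16,
    class_mode="categorical", subset="training")

val_gen = val_datagen.flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=16,
    class_mode="categorical", subset="validation")

test_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/test", target_size=(224, 224), batch_size=16,
    class_mode="categorical", shuffle=False)  # keep order for evaluation
```

Because validation_split partitions files deterministically by order, drawing the training subset from the augmenting generator and the validation subset from a rescale-only generator keeps the two subsets disjoint while ensuring that only the training images are augmented.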

4.2. Model Architecture

The model architecture is designed to extract complex features from images using pre-trained networks, specifically InceptionResNetV2, MobileNetV2, and EfficientNetB3. These networks are trained on large datasets like ImageNet and are known for their ability to capture high-level representations. For the current task, the models have been adapted by freezing their initial blocks while allowing fine-tuning of the last 10 blocks. This ensures that the deeper layers are trainable, enabling the models to adjust to task-specific patterns and features.
Each architecture applies its own preprocessing to the input image, resizing it to the resolution the network expects (see Section 4.3), before feeding it into the respective backbone. The initial blocks of each pre-trained model are frozen to retain their generalized feature extraction capability, while the last 10 blocks are unfrozen and trainable. This selective unfreezing enables the networks to learn task-specific nuances without losing their prior knowledge.
After extracting the features from the fine-tuned layers, a global average pooling (GAP) layer is applied to reduce the spatial dimensions of the feature maps and aggregate the extracted features. After feature extraction using global average pooling, the outputs from all three architectures are concatenated. This concatenation combines the rich features learned by each model, ensuring that diverse patterns are captured. The combined feature map is then passed to additional layers designed to optimize performance.
To enhance the capacity for learning intricate patterns, we included a dense layer with 512 units and a rectified linear unit (ReLU) activation function. This layer allows the model to extract high-level representations from the concatenated features. To stabilize the training process and speed up convergence, a batch normalization layer was added after the dense layer, which helps mitigate gradient issues and improves learning efficiency.
Furthermore, a dropout layer with a 50% dropout rate was implemented to reduce overfitting. Randomly dropping out neurons during training makes the model more robust and better equipped for generalization. The final output layer is a dense layer with N units, corresponding to the number of target classes, and uses a softmax activation function to output class probabilities. The combination of pre-trained feature extractors, feature concatenation, and carefully designed additional layers ensures a robust and efficient model for classification. As shown in Figure 6, this architecture is well-suited for tasks requiring high accuracy and adaptability to task-specific features.
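A minimal sketch of this architecture in the Keras functional API follows. It is an approximation under stated assumptions: a shared input is resized per branch to each backbone's preferred resolution, the last 10 layers (rather than architectural blocks) of each backbone are unfrozen for simplicity, and the class count is a placeholder:

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import InceptionResNetV2, MobileNetV2, EfficientNetB3

NUM_CLASSES = 27  # set to the number of disease classes in the dataset

def frozen_backbone(backbone_fn, size, unfreeze_last=10):
    """Load an ImageNet-pretrained backbone and unfreeze only its last layers."""
    base = backbone_fn(include_top=False, weights="imagenet",
                       input_shape=(size, size, 3))
    base.trainable = True
    for layer in base.layers[:-unfreeze_last]:
        layer.trainable = False
    return base

inputs = layers.Input(shape=(224, 224, 3))

features = []
for fn, size in [(InceptionResNetV2, 299), (MobileNetV2, 224), (EfficientNetB3, 300)]:
    x = layers.Resizing(size, size)(inputs)      # match the backbone's input resolution
    x = frozen_backbone(fn, size)(x)
    x = layers.GlobalAveragePooling2D()(x)       # aggregate spatial feature maps
    features.append(x)

x = layers.Concatenate()(features)               # fuse features from the three backbones
x = layers.Dense(512, activation="relu")(x)      # high-level representation
x = layers.BatchNormalization()(x)               # stabilize training
x = layers.Dropout(0.5)(x)                       # reduce overfitting
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = Model(inputs, outputs)
```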
The models are trained with accuracy as the primary metric, limited to 15 epochs. We utilize callback functions, such as early stopping and reducing the learning rate on plateau, to ensure a robust and efficient training process.

4.3. Parameter Settings

The study was conducted using Google Colab’s cloud-based platform, which provided a Python 3.7 environment with GPU acceleration through Google’s computational infrastructure and 12.7 GB of available RAM. Key software libraries employed were TensorFlow for deep learning operations, Keras for model architecture, NumPy for numerical computations, and Matplotlib (version 3.8.2) for visualization. Input image resolutions were architecture-specific: 299 × 299 pixels for InceptionResNetV2, 224 × 224 for MobileNetV2, and 300 × 300 for EfficientNetB3. All experiments used consistent training parameters, including a batch size of 16 across 15 epochs.
Hyperparameter selection critically influences model generalization capability. Our architecture contains 69,600,000 total parameters, comprising 69,383,417 trainable weights and 216,583 fixed parameters, with a cumulative memory footprint of 265.59 megabytes.
For optimization, we implemented the Adam optimizer [46], which dynamically adjusts learning rates using gradient moment estimates. Model training employed categorical cross-entropy [47] to quantify prediction errors relative to ground-truth labels. The learning rate was initialized at 10⁻⁴, and a softmax activation [19] converts the final outputs to class probabilities. To regularize the network, we applied 50% neuron dropout during training.
The training protocol incorporated two callback mechanisms: early stopping monitored validation loss with a 5-epoch patience to terminate unproductive training, while ReduceLROnPlateau scaled the learning rate down by a factor of 0.2 whenever validation metrics plateaued for two consecutive epochs (with a minimum learning rate threshold of 10⁻²²). These mechanisms collectively enhanced training efficiency while preventing overfitting. Table 6 presents a comprehensive list of all the parameters used in the model configuration.
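Under these settings, the training configuration might look like the following sketch; restore_best_weights and the train/validation generators are assumptions, and the batch size of 16 is set in the generators rather than in fit:

```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam

model.compile(
    optimizer=Adam(learning_rate=1e-4),   # initial learning rate of 10^-4
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

callbacks = [
    # Stop if validation loss fails to improve for 5 consecutive epochs
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    # Scale the learning rate by 0.2 after a 2-epoch plateau
    ReduceLROnPlateau(monitor="val_loss", factor=0.2, patience=2, min_lr=1e-22),
]

history = model.fit(train_gen, validation_data=val_gen,
                    epochs=15, callbacks=callbacks)
```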

4.4. Evaluation Measures

To assess and refine the performance of our model, we employ a comprehensive set of evaluation measures. Precision, recall, F1 score, and accuracy serve as important metrics, each offering insights into the model’s efficacy. These measures contribute to a thorough evaluation and ensure a nuanced understanding of the model’s strengths and areas of improvement.
Accuracy is the overall correctness of predictions, considering both true positives and true negatives. Although it is the headline metric, accuracy alone can be misleading on imbalanced datasets, so it should be complemented by the F1 score, precision, and recall for a comprehensive evaluation.
$$\text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total Predictions}}$$
Precision is the ratio of correctly identified diseased samples to the total number of samples predicted as diseased. A higher precision indicates a lower rate of false positives.
$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$
Recall, also known as sensitivity or the true positive rate, is the ratio of correctly identified diseased samples to the total number of actually diseased samples. For our application, it is crucial to capture all instances of disease; a higher recall indicates the model’s effectiveness in reducing false negatives.
$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$
The F1 score is the harmonic mean of precision and recall and provides a balanced assessment of the model’s performance. It is useful in situations where the dataset is imbalanced. A higher F1 score indicates that the model achieved a balance between precision and recall.
$$F_1\ \text{Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
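In practice, all four metrics can be computed per class directly from the test predictions, for example with scikit-learn; the sketch below assumes the model and the test generator defined in earlier sketches (created with shuffle=False so labels align with predictions):

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_prob = model.predict(test_gen)      # softmax probabilities per image
y_pred = np.argmax(y_prob, axis=1)    # predicted class indices
y_true = test_gen.classes             # ground-truth indices (valid since shuffle=False)

# Per-class precision, recall, and F1 score, plus overall accuracy
print(classification_report(y_true, y_pred,
                            target_names=list(test_gen.class_indices)))
print(confusion_matrix(y_true, y_pred))
```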

5. Results and Discussion

The following section presents the model’s performance metrics, including the loss and accuracy evaluated on the above-mentioned datasets, along with the recall, precision, and F1 score for each class, confusion matrices depicting the model’s classification performance across different classes, and confidence values indicating the model’s certainty in predictions on the test data.

5.1. Loss and Accuracy

The training and validation accuracy graphs (Figure 7) illustrate the performance of the proposed model on three datasets. For the PlantVillage dataset (Figure 7a), the model demonstrates rapid convergence, with training accuracy rising sharply within the first few epochs and reaching nearly 100% by the 12th epoch. Validation accuracy also shows consistent improvement, stabilizing around 98%, with minimal divergence from the training accuracy. This reflects the model’s strong ability to generalize on this dataset, likely due to its relatively high quality and balanced data distribution. For the PlantDoc dataset (Figure 7b), the model’s training accuracy steadily increases, reaching approximately 90% by the 12th epoch, indicating effective learning on the training data. However, the validation accuracy progresses more slowly and stabilizes around 65–70%, resulting in a noticeable gap between training and validation accuracy. In the FieldPlant dataset (Figure 7c), training accuracy improves rapidly, reaching nearly 95% by the 6th epoch, yet validation accuracy remains low, stabilizing around 60% with limited improvement, pointing to overfitting. The fast convergence on the FieldPlant training data combined with low validation performance indicates that, despite the high training accuracy, this dataset’s class imbalance makes generalization considerably harder for the model.
The training and validation loss graphs (Figure 8) indicate that the proposed model performs well on the training data for all three datasets, as shown by the decreasing training losses. The training loss decreases steadily for the PlantVillage dataset (Figure 8a), reaching a very low value by the 12th epoch. The validation loss also shows a declining trend, stabilizing after the fifth epoch with minimal fluctuations. The small gap between training and validation losses indicates minimal overfitting, suggesting that the model generalizes well to unseen data. In the PlantDoc dataset (Figure 8b), the training loss approaches zero by the 12th epoch, while the validation loss stabilizes around 1.5, suggesting overfitting as the model performs better on training data than on unseen data. For the FieldPlant dataset (Figure 8c), training loss also declines significantly, but validation loss remains high and even increases after an initial drop, indicating further overfitting and difficulty generalizing to the validation set.
This overfitting behavior suggests that while the model effectively learns the training data, it struggles to generalize across diverse and noisy real-world examples present in the FieldPlant and PlantDoc datasets. To mitigate this, future improvements could include implementing stronger regularization techniques such as L2 weight decay and increasing dropout rates in the fully connected layers. Additionally, incorporating early stopping with more stringent patience settings and using more extensive data augmentation (e.g., brightness and contrast variation, random cropping) may help reduce overfitting. Training with a larger and more balanced dataset or using techniques like synthetic data generation with GANs [48] could also improve generalization across underrepresented classes.
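As a rough illustration of these mitigation ideas, the classification head could be retrained with L2 weight decay, a higher dropout rate, and brightness augmentation; all hyperparameter values below are illustrative, and the 4352-dimensional input is our assumption for the concatenated GAP features (1536 + 1280 + 1536):

```python
from tensorflow.keras import layers, regularizers, Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stronger augmentation: add brightness variation on top of the existing transforms
aug = ImageDataGenerator(rescale=1.0 / 255, rotation_range=20, zoom_range=0.2,
                         horizontal_flip=True, vertical_flip=True,
                         brightness_range=(0.7, 1.3))

# Regularized classification head (illustrative hyperparameters)
head = Sequential([
    layers.Input(shape=(4352,)),              # assumed concatenated GAP feature size
    layers.Dense(512, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 weight decay
    layers.BatchNormalization(),
    layers.Dropout(0.6),                      # raised from the 0.5 used in the main model
    layers.Dense(27, activation="softmax"),
])
```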
Table 7 presents a comparative analysis of the test accuracy achieved by the proposed ensemble approach against other accuracies achieved by various papers that employ distinct approaches across three datasets. The proposed ensemble approach demonstrated improved accuracy for the FieldPlant and PlantVillage datasets, showcasing its ability to combine the strengths of multiple architectures. However, for the PlantDoc dataset, the proposed model did not surpass the state-of-the-art results, indicating areas for further optimization and refinement.
The lower accuracy on the PlantDoc and FieldPlant datasets can be explained by their challenging real-world nature. Unlike the well-structured PlantVillage dataset, these datasets include images captured under uncontrolled conditions, with variations in lighting, background, and image quality. Furthermore, class imbalance and limited samples for certain disease categories make it difficult for the model to learn clear distinguishing patterns, resulting in more frequent misclassifications and reduced overall accuracy.
As seen in Figure 9 for the FieldPlant dataset, a set of randomly selected test images was passed to the model for prediction. The confidence level, i.e., the probability the model assigns to its predicted class, is calculated and overlaid on each image, together with the actual and predicted labels and the leaf image itself. This provides visual confirmation of the model’s performance and showcases its capacity to identify various classes accurately.
Most predictions show high confidence, such as “Cassava Mosaic” cases with 100% confidence, indicating the model’s strong performance in correctly identifying certain diseases. However, in some cases, like the “Cassava Healthy” sample with a confidence of 0.67 and a “Cassava Mosaic” misclassified as “Cassava Brown Leaf Spot” with 0.56 confidence, the lower confidence levels suggest areas where the model may struggle, particularly with visually similar classes. Overall, this visual confirmation highlights the model’s strengths and limitations, demonstrating its ability to classify with high confidence in most cases but also identifying cases where misclassification or lower confidence occurs.
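A minimal way to reproduce this kind of spot check, under the same generator and model assumptions as in Section 4, is to predict a random test image and report the maximum softmax probability as the confidence:

```python
import numpy as np

images, labels = next(test_gen)          # one batch of (rescaled) test images
i = np.random.randint(len(images))       # pick a random image from the batch

probs = model.predict(images[i][np.newaxis, ...])[0]
class_names = list(test_gen.class_indices)

print(f"actual:    {class_names[int(np.argmax(labels[i]))]}")
print(f"predicted: {class_names[int(np.argmax(probs))]} "
      f"(confidence: {probs.max():.2f})")
```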

5.2. Precision, Recall, and F1 Score Analysis

The classification performance of the proposed model is evaluated based on precision, recall, and F1 score metrics across three datasets. The following analysis highlights the key observations and challenges identified from the results.
The performance of the model varied across the three datasets. On the PlantVillage dataset (Table 8), the results show outstanding classification ability, with most classes achieving precision, recall, and F1 scores close to 100%. Classes such as Apple Scab, Black Rot, Cedar Apple Rust, and Tomato Mosaic Virus were classified perfectly. Even in slightly challenging cases like Corn Cercospora Leaf Spot, the model maintained an F1 score above 94%, demonstrating excellent generalization on clean and structured images captured under controlled conditions.
In comparison, results on the PlantDoc dataset reveal greater variability in performance, as depicted in Table 9. While some classes, such as Bell Pepper Leaf, Raspberry Leaf, and Squash Powdery Mildew Leaf, achieved high F1 scores, others, like Corn Gray Leaf Spot and Tomato Leaf Bacterial Spot, showed significant drops. The lower precision and recall for certain categories highlight the challenges posed by real-world images, where factors such as varying lighting, background clutter, and occlusion affect model predictions. The confusion between healthy and diseased leaves, particularly in field images, contributed to these misclassifications.
Performance on the FieldPlant dataset showed a middle ground between PlantVillage and PlantDoc (Table 10). Classes like Cassava Mosaic and Tomato Brown Spots achieved high F1 scores above 90%, indicating strong detection ability for some diseases. However, several classes, especially Corn Healthy and Tomato Healthy, reported moderate F1 scores around 46% to 54%. The reduced performance for healthy classes suggests that the model occasionally struggles to distinguish healthy leaves from mildly infected or visually ambiguous samples in complex field environments.
Overall, the model achieved exceptional results on clean datasets but faced reduced accuracy on field-acquired images with uncontrolled variations. The trend observed across the three datasets confirms the model’s strong capability in structured settings and also highlights areas where further improvements, such as advanced data augmentation, domain adaptation techniques, and the integration of multimodal inputs, could be explored to enhance robustness in real-world agricultural scenarios.

5.3. Evaluation Measures for Each Class

A confusion matrix is generated for each class, aiding in the identification of predicted and actual values for each class. The diagonal elements of the matrix represent the count of correctly predicted values.
The confusion matrices for the PlantVillage, PlantDoc, and FieldPlant datasets reveal the model’s ability to accurately classify most classes, as indicated by the strong diagonal dominance. For the PlantVillage dataset (Figure 10), the model demonstrates excellent performance, with high accuracy in several classes. However, some misclassifications are observed, such as Tomato Leaf Curl Virus being misclassified as Tomato Bacterial Spot, and Tomato Septoria Leaf Spot confused with Blueberry Healthy, likely due to visual similarities or overlapping features.
Similarly, the PlantDoc dataset (Figure 11) shows strong performance, with accurate predictions in classes like Apple Scab Leaf (10 correct), Bell Pepper (61 correct), and Corn Leaf Blight (57 correct). However, errors such as Cherry Leaf being confused with Bell Pepper Leaf Spot (14 incorrect) and Grape Leaf Black Rot misclassified as Grape Leaf (3 incorrect) highlight challenges stemming from visual resemblances and class imbalances.
The FieldPlant dataset (Figure 12) also exhibits strong performance in certain classes, such as Cassava Mosaic (204 correct) and Corn Leaf Blight (206 correct). Nevertheless, significant confusion is noted between visually similar diseases, particularly within cassava plants, where Cassava Brown Leaf Spot is often misclassified as Cassava Healthy or Cassava Mosaic. Additionally, underrepresented classes, such as Tomato Leaf Yellow Virus, suffer from poor performance due to class imbalance. Overall, the confusion matrices emphasize the model’s robustness while identifying areas where visual similarities and class imbalances contribute to misclassification.

5.4. Ablation Study

To validate the proposed model architecture, we conducted an ablation study using the PlantDoc dataset. The study was designed to evaluate the impact of different architectural layers and combinations of pre-trained models on the overall accuracy of the classification task.
The effect of adding layers was tested on the three-model ensemble architecture, which combines features from InceptionResNetV2, MobileNetV2, and EfficientNetB3. Starting with no additional layers, dense, batch normalization, and dropout layers were added incrementally. Results, summarized in Table 11, show that the inclusion of all three layers—dense layer, batch normalization, and dropout—achieved the highest accuracy of 60.1%, demonstrating the effectiveness of this combination in improving feature learning and reducing overfitting.
Additionally, the performance of individual models, pairwise combinations, and the three-model ensemble was evaluated with all three layers. Each pre-trained model was tested individually, followed by combinations of two models, and finally the three-model ensemble. As shown in Table 12, the three-model ensemble approach consistently outperformed other configurations, achieving the highest accuracy.
The results validate the effectiveness of the proposed architecture. The use of pretrained models as feature extractors, combined with carefully chosen additional layers and the ensemble strategy, ensures robust and adaptable performance. The three-model ensemble with dense, batch normalization, and dropout layers was identified as the optimal configuration for achieving the highest accuracy.

6. Conclusions

Plant disease detection is a critical area of research aimed at enhancing agricultural productivity and ensuring food security. To address this challenge, we proposed an ensemble approach combining InceptionResNetV2, MobileNetV2, and EfficientNetB3. By leveraging the strengths of each model together, we achieved better accuracy and reliability in identifying plant diseases. The integration of data augmentation techniques further enhanced model performance by mitigating overfitting and enabling effective learning across diverse datasets.
Our study utilized three datasets—PlantVillage, PlantDoc, and the FieldPlant Dataset—each contributing a unique characteristic. This combination ensured robustness and highlighted the potential of deep learning in tackling the complexities of plant disease detection. Our results show that transfer learning and ensemble modeling hold great potential for agricultural use, especially in resource-limited areas that need real-time monitoring and diagnosis. Despite these advancements, challenges remain, such as the need for larger and more diverse datasets, the computational demands of training deep learning models, and the practical deployment of such systems.
Future research directions could focus on creating a localized dataset by collecting plant images from Qatar, enabling region-specific disease detection. Developing models capable of handling multi-label classification, where multiple diseases or symptoms appear on a single plant, would further improve practical applicability. Lightweight architectures optimized for mobile and edge devices, such as smartphones or drones, could facilitate real-time and resource-efficient deployments. Time-series analysis and environmental data, including temperature and humidity, could be incorporated to predict disease outbreaks. Additionally, explainable AI techniques should be explored to make deep learning models more interpretable for farmers and agricultural experts, bridging the gap between technology and end-users.

Author Contributions

Conceptualization, F.Z.; methodology, F.Z.; validation, F.Z., M.S. and S.A.M.; formal analysis, F.Z.; investigation, F.Z.; resources, F.Z.; writing—original draft preparation, F.Z.; writing—review and editing, Y.A.; supervision, Y.A., M.S. and S.A.M.; funding acquisition, S.A.M. All authors have read and agreed to the published version of the manuscript.

Funding

The research reported in this publication was supported by the Qatar Research Development and Innovation Council [ARG01-0513-230141]. The Qatar National Library provides Open Access funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used during this study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chen, J.; Chen, J.; Zhang, D.; Nanehkaran, Y.A.; Sun, Y. A cognitive vision method for the detection of plant disease images. Mach. Vis. Appl. 2021, 32, 31. [Google Scholar] [CrossRef]
  2. Fan, X.; Luo, P.; Mu, Y.; Zhou, R.; Tjahjadi, T.; Ren, Y. Leaf image based plant disease identification using transfer learning and feature fusion. Comput. Electron. Agric. 2022, 196, 106892. [Google Scholar] [CrossRef]
  3. Tiwari, V.; Joshi, R.C.; Dutta, M.K. Dense convolutional neural networks based multiclass plant disease detection and classification using leaf images. Ecol. Inform. 2021, 63, 101289. [Google Scholar] [CrossRef]
  4. Faisal, S.; Javed, K.; Ali, S.; Alasiry, A.; Marzougui, M.; Khan, M.A.; Cha, J.H. Deep transfer learning based detection and classification of citrus plant diseases. Comput. Mater. Contin. 2023, 76, 895–914. [Google Scholar] [CrossRef]
  5. Sachdeva, G.; Singh, P.; Kaur, P. Plant leaf disease classification using deep Convolutional neural network with Bayesian learning. Mater. Today Proc. 2021, 45, 5584–5590. [Google Scholar] [CrossRef]
  6. Shoaib, M.; Sadeghi-Niaraki, A.; Ali, F.; Hussain, I.; Khalid, S. Leveraging deep learning for plant disease and pest detection: A comprehensive review and future directions. Front. Plant Sci. 2025, 16, 1538163. [Google Scholar] [CrossRef] [PubMed]
  7. Ahmed, N.; Asif, H.M.S.; Saleem, G.; Younus, M.U. Image quality assessment for foliar disease identification (AgroPath). arXiv 2022, arXiv:2209.12443. [Google Scholar]
  8. Hong, H.; Lin, J.; Huang, F. Tomato disease detection and classification by deep learning. In Proceedings of the 2020 International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), Fuzhou, China, 12–14 June 2020; pp. 25–29. [Google Scholar]
  9. Sahu, S.K.; Pandey, M. An optimal hybrid multiclass SVM for plant leaf disease detection using spatial Fuzzy C-Means model. Expert Syst. Appl. 2023, 214, 118989. [Google Scholar] [CrossRef]
  10. Hughes, D.; Salathé, M. An open access repository of images on plant health to enable the development of mobile disease diagnostics. arXiv 2015, arXiv:1511.08060. [Google Scholar]
  11. Singh, D.; Jain, N.; Jain, P.; Kayal, P.; Kumawat, S.; Batra, N. PlantDoc: A dataset for visual plant disease detection. In Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, Hyderabad, India, 5–7 January 2020; pp. 249–253. [Google Scholar]
  12. Moupojou, E.; Tagne, A.; Retraint, F.; Tadonkemwa, A.; Wilfried, D.; Tapamo, H.; Nkenlifack, M. FieldPlant: A dataset of field plant images for plant disease detection and classification with deep learning. IEEE Access 2023, 11, 35398–35410. [Google Scholar] [CrossRef]
  13. Islam, M.M.; Adil, M.A.A.; Talukder, M.A.; Ahamed, M.K.U.; Uddin, M.A.; Hasan, M.K.; Sharmin, S.; Rahman, M.M.; Debnath, S.K. DeepCrop: Deep learning-based crop disease prediction with web application. J. Agric. Food Res. 2023, 14, 100764. [Google Scholar] [CrossRef]
  14. Karki, S.; Basak, J.K.; Tamrakar, N.; Deb, N.C.; Paudel, B.; Kook, J.H.; Kang, M.Y.; Kang, D.Y.; Kim, H.T. Strawberry disease detection using transfer learning of deep convolutional neural networks. Sci. Hortic. 2024, 332, 113241. [Google Scholar] [CrossRef]
  15. Khan, I.; Sohail, S.S.; Madsen, D.Ø.; Khare, B.K. Deep transfer learning for fine-grained maize leaf disease classification. J. Agric. Food Res. 2024, 16, 101148. [Google Scholar] [CrossRef]
  16. Ullah, N.; Khan, J.A.; Almakdi, S.; Alshehri, M.S.; Al Qathrady, M.; El-Rashidy, N.; El-Sappagh, S.; Ali, F. An effective approach for plant leaf diseases classification based on a novel DeepPlantNet deep learning model. Front. Plant Sci. 2023, 14, 1212747. [Google Scholar] [CrossRef] [PubMed]
  17. Anandhakrishnan, T.; Jaisakthi, S. Deep Convolutional Neural Networks for image based tomato leaf disease detection. Sustain. Chem. Pharm. 2022, 30, 100793. [Google Scholar] [CrossRef]
  18. Abbas, A.; Jain, S.; Gour, M.; Vankudothu, S. Tomato plant disease detection using transfer learning with C-GAN synthetic images. Comput. Electron. Agric. 2021, 187, 106279. [Google Scholar] [CrossRef]
  19. Anim-Ayeko, A.O.; Schillaci, C.; Lipani, A. Automatic blight disease detection in potato (Solanum tuberosum L.) and tomato (Solanum lycopersicum, L. 1753) plants using deep learning. Smart Agric. Technol. 2023, 4, 100178. [Google Scholar] [CrossRef]
  20. Bansal, A.; Sharma, R.; Sharma, V.; Jain, A.K.; Kukreja, V. Detecting Severity Levels of Cucumber Leaf Spot Disease using ResNext Deep Learning Model: A Digital Image Analysis Approach. In Proceedings of the 2023 4th International Conference for Emerging Technology (INCET), Belgaum, India, 26–28 May 2023; pp. 1–6. [Google Scholar]
  21. Harakannanavar, S.S.; Rudagi, J.M.; Puranikmath, V.I.; Siddiqua, A.; Pramodhini, R. Plant leaf disease detection using computer vision and machine learning algorithms. Glob. Transit. Proc. 2022, 3, 305–310. [Google Scholar] [CrossRef]
  22. Pallathadka, H.; Ravipati, P.; Sajja, G.S.; Phasinam, K.; Kassanuk, T.; Sanchez, D.T.; Prabhu, P. Application of machine learning techniques in rice leaf disease detection. Mater. Today Proc. 2022, 51, 2277–2280. [Google Scholar] [CrossRef]
  23. Alessandrini, M.; Rivera, R.C.F.; Falaschetti, L.; Pau, D.; Tomaselli, V.; Turchetti, C. A grapevine leaves dataset for early detection and classification of esca disease in vineyards through machine learning. Data Brief 2021, 35, 106809. [Google Scholar] [CrossRef]
  24. Hemalatha, A.; Vijayakumar, J. Automatic tomato leaf diseases classification and recognition using transfer learning model with image processing techniques. In Proceedings of the 2021 Smart Technologies, Communication and Robotics (STCR), Sathyamangalam, India, 9–10 October 2021; pp. 1–5. [Google Scholar]
  25. Agarwal, M.; Singh, A.; Arjaria, S.; Sinha, A.; Gupta, S. ToLeD: Tomato leaf disease detection using convolution neural network. Procedia Comput. Sci. 2020, 167, 293–301. [Google Scholar] [CrossRef]
  26. Rizvee, R.A.; Orpa, T.H.; Ahnaf, A.; Kabir, M.A.; Rashid, M.R.A.; Islam, M.M.; Islam, M.; Jabid, T.; Ali, M.S. LeafNet: A proficient convolutional neural network for detecting seven prominent mango leaf diseases. J. Agric. Food Res. 2023, 14, 100787. [Google Scholar] [CrossRef]
  27. Prathiksha, B.; Kumar, V.; Krishnamoorthi, M.; Poovizhi, P.; Sowmiya, D.; Thrishaa, B. Early Accurate Identification of Grape leaf Disease Detection using CNN based VGG-19 model. In Proceedings of the 2024 International Conference on Cognitive Robotics and Intelligent Systems (ICC-ROBINS), Coimbatore, India, 17–19 April 2024; pp. 263–269. [Google Scholar]
  28. Paymode, A.S.; Malode, V.B. Transfer learning for multi-crop leaf disease image classification using convolutional neural network VGG. Artif. Intell. Agric. 2022, 6, 23–33. [Google Scholar] [CrossRef]
  29. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar]
  30. Hassan, S.M.; Maji, A.K.; Jasiński, M.; Leonowicz, Z.; Jasińska, E. Identification of plant-leaf diseases using CNN and transfer-learning approach. Electronics 2021, 10, 1388. [Google Scholar] [CrossRef]
  31. Chellapandi, B.; Vijayalakshmi, M.; Chopra, S. Comparison of pre-trained models using transfer learning for detecting plant disease. In Proceedings of the 2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), Greater Noida, India, 19–20 February 2021; pp. 383–387. [Google Scholar]
  32. Krishnamoorthy, N.; Prasad, L.N.; Kumar, C.P.; Subedi, B.; Abraha, H.B.; Sathishkumar, V. Rice leaf diseases prediction using deep neural networks with transfer learning. Environ. Res. 2021, 198, 111275. [Google Scholar]
  33. Naveenkumar, M.; Srithar, S.; Kumar, B.R.; Alagumuthukrishnan, S.; Baskaran, P. InceptionResNetV2 for plant leaf disease classification. In Proceedings of the 2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, India, 11–13 November 2021; pp. 1161–1167. [Google Scholar]
  34. Islam, M.A.; Shuvo, M.N.R.; Shamsojjaman, M.; Hasan, S.; Hossain, M.S.; Khatun, T. An automated convolutional neural network based approach for paddy leaf disease detection. Int. J. Adv. Comput. Sci. Appl. 2021, 12. [Google Scholar] [CrossRef]
  35. Hridoy, R.H.; Afroz, M.; Ferdowsy, F. An Early Recognition Approach for Okra Plant Diseases and Pests Classification Based on Deep Convolutional Neural Networks. In Proceedings of the 2021 Innovations in Intelligent Systems and Applications Conference (ASYU), Elazig, Turkey, 6–8 October 2021; pp. 1–6. [Google Scholar]
  36. Sharma, M.; Kumar, C.J.; Deka, A. Early diagnosis of rice plant disease using machine learning techniques. Arch. Phytopathol. Plant Prot. 2022, 55, 259–283. [Google Scholar] [CrossRef]
  37. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520. [Google Scholar]
  38. Bir, P.; Kumar, R.; Singh, G. Transfer learning based tomato leaf disease detection for mobile applications. In Proceedings of the 2020 IEEE International Conference on Computing, Power and Communication Technologies (GUCON), Greater Noida, India, 2–4 October 2020; pp. 34–39. [Google Scholar]
  39. Elfatimi, E.; Eryigit, R.; Elfatimi, L. Beans leaf diseases classification using MobileNet models. IEEE Access 2022, 10, 9471–9482. [Google Scholar] [CrossRef]
  40. Mehedi, M.H.K.; Hosain, A.S.; Ahmed, S.; Promita, S.T.; Muna, R.K.; Hasan, M.; Reza, M.T. Plant leaf disease detection using transfer learning and explainable AI. In Proceedings of the 2022 IEEE 13th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 12–15 October 2022; pp. 166–170. [Google Scholar]
  41. Tambe, U.Y.; Shanthini, A.; Hsiung, P.A. Integrated Leaf Disease Recognition Across Diverse Crops through Transfer Learning. Procedia Comput. Sci. 2024, 233, 22–34. [Google Scholar] [CrossRef]
  42. Tan, M.; Le, Q.V. EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv 2019, arXiv:1905.11946. [Google Scholar]
  43. Yasid, A.; Wahyuningrum, R.T.; Ni’Mah, A.T.; Ayani, I.H. Rice Leaf Diseases Classification using Deep Learning Based on EfficientNetB3 Architecture with Transfer Learning. In Proceedings of the 2023 International Conference on Technology, Engineering, and Computing Applications (ICTECA), Semarang, Indonesia, 20–22 December 2023; pp. 1–6. [Google Scholar]
  44. Adnan, F.; Awan, M.J.; Mahmoud, A.; Nobanee, H.; Yasin, A.; Zain, A.M. EfficientNetB3-adaptive augmented deep learning (AADL) for multi-class plant disease classification. IEEE Access 2023, 11, 85426–85440. [Google Scholar] [CrossRef]
  45. Yaswanth, D.; Manoj, S.S.; Yadav, M.S.; Chowdary, E.D. Plant Leaf Disease Detection Using Transfer Learning Approach. In Proceedings of the 2024 IEEE International Students’ Conference on Electrical, Electronics and Computer Science (SCEECS), Bhopal, India, 24–25 February 2024; pp. 1–6. [Google Scholar]
  46. Saleem, M.H.; Potgieter, J.; Arif, K.M. Plant disease classification: A comparative evaluation of convolutional neural networks and deep learning optimizers. Plants 2020, 9, 1319. [Google Scholar] [CrossRef] [PubMed]
  47. Singh, P.P.; Kaushik, R.; Singh, H.; Kumar, N.; Rana, P.S. Convolutional neural networks based plant leaf diseases detection scheme. In Proceedings of the 2019 IEEE Globecom Workshops (GC Wkshps), Waikoloa, HI, USA, 9–13 December 2019; pp. 1–7. [Google Scholar]
  48. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  49. Kumar, D.; Ishak, M.K.; Maruzuki, M.I.F. EfficientNet based Convolutional Neural Network for Visual Plant Disease Detection. In Proceedings of the 2022 19th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Prachuap Khiri Khan, Thailand, 24–27 May 2022; pp. 1–4. [Google Scholar]
Figure 1. Framework for plant leaf disease detection and classification.
Figure 2. Datasets: (a) PlantVillage, (b) PlantDoc, (c) FieldPlant.
Figure 3. PlantVillage dataset distribution.
Figure 4. PlantDoc dataset distribution.
Figure 5. FieldPlant dataset distribution.
Figure 6. Proposed model using InceptionResNetV2, MobileNetV2, and EfficientNetB3.
Figure 7. Training and validation accuracy of (a) PlantVillage, (b) PlantDoc, and (c) FieldPlant.
Figure 8. Training and validation loss of (a) PlantVillage, (b) PlantDoc, and (c) FieldPlant.
Figure 9. Confidence scores—FieldPlant dataset.
Figure 10. Confusion matrix—PlantVillage dataset.
Figure 11. Confusion matrix—PlantDoc dataset.
Figure 12. Confusion matrix—FieldPlant dataset.
Table 1. Classifier performance on various datasets: a comparative analysis.

| Reference | Dataset | Classifier | Accuracy (%) |
|---|---|---|---|
| [21] | Village database of tomato leaf—6 disorders | SVM, CNN, KNN | 88, 97, 99.6 |
| [22] | PlantVillage—color and gray-scale | SVM, CNN, Naïve Bayes | 96.2, 91.3, 78.8 |
| [23] | Grapevine images from three cameras | CNN (1280 × 720), CNN (320 × 180), CNN (80 × 45) | 99, 99, 96 |
| [8] | Leaf pictures of 9 tomato diseases (13,112 images) | DenseNet_Xception, Xception, ResNet50, MobileNet, ShuffleNet | 97.10, 93.17, 86.56, 80.11, 83.68 |
| [24] | 400 tomato images | DenseNet | 99.688 |
| [1] | 4062 grape leaf images from PlantVillage | SegCNN | 93.75 |
| [5] | 20,639 images of tomato, potato, and bell pepper | Deep CNN | 98.9 |
| [17] | 18,160 images of tomato leaves from PlantVillage | Deep CNN | 98.4 |
| [16] | PlantVillage | Deep CNN | 94 |
| [20] | 50,000 cucumber leaf images | ResNeXt-50 | 97.81 |
| [9] | 54,303 images of various crops | Hybrid random forest, multiclass SVM | 98.9 |
| [18] | 16,012 images of tomato plants | DenseNet121 (5 classes, 7 classes, 10 classes) | 99.5, 98.65, 97.11 |
| [25] | 50,000 images of 14 crops—PlantVillage | CNN | 91.2 |
| [19] | PlantVillage | ResNet-9 | 99.25 |
| [15] | PlantVillage | VGGNet, InceptionV3, ResNet50, InceptionResNetV2 | 88.65, 96.25, 98.13, 91.06 |
| [26] | MangoLeafBD | LeafNet, AlexNet | 98.55, 98.25 |
| [13] | PlantVillage | CNN, VGG-16, VGG-19, ResNet-50 | 98.60, 92.39, 96.15, 98.98 |
| [27] | Grape disease dataset from Kaggle | VGG19 | 98 |
| [14] | Strawberry dataset | ResNet50, DenseNet121 (fine-tuned) | 93.9, 93.5, 94.4, 94.1 |
| [28] | PlantVillage | VGG16 | 95.71 |
| [18] | PlantVillage (with C-GAN augmentation) | DenseNet121 | 99.51 |
Table 2. Accuracies of InceptionResNetV2 for various datasets.

| Reference | Dataset | Accuracy (%) | Model Modification |
|---|---|---|---|
| [30] | PlantVillage dataset—54,205 images | 99.11 | Standard convolution in InceptionResNet-A block replaced with depthwise convolution |
| [31] | PlantVillage dataset—54,205 images | 98 | InceptionResNet-C block replaced by a 3 × 1 and 1 × 3 structure; global average pooling layer, batch normalization layer, and a dense layer with 38 units |
| [32] | Rice leaf images from Kaggle—5200 images | 95.67 | Global average pooling layer, dropout (0.3), and softmax activation |
| [33] | 1540 field images from Nilgiris and images from an image data repository | 95 | Original architecture |
| [34] | 984 paddy leaf images from Kaggle and machine learning repository | 92.68 | Original architecture |
| [35] | 124,760 images of Okra dataset | 98.16 | 2 convolution layers, 3 dense layers, 2 dropout layers, max pooling, and softmax activation |
| [36] | 1108 images of rice leaves (3 classes) | 98.9 | Original architecture |
Table 3. Accuracies of MobileNetV2 for various datasets.

| Reference | Dataset | Accuracy (%) | Model Modification |
|---|---|---|---|
| [39] | 1296 field images from iBean | 97 | Original architecture |
| [4] | Citrus plant dataset | Unaugmented: 93.81; augmented: 97.91 | Fully connected layer replaced with five nodes (matching the number of classes in the dataset) and a softmax activation function |
| [40] | New Plant Diseases dataset: 38 diseases of 14 different plants | 98.86 | Flattening layer and softmax activation function |
| [30] | PlantVillage dataset | 97.02 | Activation layer, batch normalization layer, and dropout layer (different values) |
| [41] | New Plant Diseases dataset | 91.98 | Original architecture |
Table 4. Accuracies of EfficientNetB3 for various datasets.

| Reference | Dataset | Accuracy (%) | Model Modification |
|---|---|---|---|
| [4] | Citrus plant dataset | Unaugmented: 92.78; augmented: 99.58 | Fully connected layer replaced with five nodes (matching the number of classes in the dataset) and a softmax activation function |
| [44] | 59,809 images—58 classes of healthy and unhealthy plants (Kaggle) | 98.71 | A convolutional layer, max pooling, replaced final layers, with batch normalization, regularization, and a dense layer |
| [45] | New Plant Diseases dataset (augmented) | 99.9 | Batch normalization layer, dense layer with 256 neurons, dropout layer (0.45), and a final dense layer with softmax activation |
| [43] | Rice leaf dataset from Kaggle | 79.43 | Original architecture |
Table 5. Data augmentation methods and values.

| Data Augmentation Method | Value |
|---|---|
| Image Size | 224 × 224 × 3 |
| Zoom Range | 0.2 |
| Rotation Range | 40 |
| Horizontal Flip | True |
| Vertical Flip | True |
| Rescaling Factor | 1/255 |
| Validation Split | 0.2 |
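For illustration, the settings in Table 5 map directly onto Keras' ImageDataGenerator. The sketch below is a minimal reading of the table, assuming a TensorFlow/Keras pipeline; the dataset path "data/plantvillage" and the one-sub-folder-per-class layout are hypothetical, not details from the paper.

```python
# Minimal sketch of the Table 5 augmentation pipeline (assumptions noted above).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (224, 224)  # images resized to 224 x 224 x 3

datagen = ImageDataGenerator(
    rescale=1.0 / 255,      # rescaling factor
    zoom_range=0.2,         # zoom range
    rotation_range=40,      # rotation range, in degrees
    horizontal_flip=True,
    vertical_flip=True,
    validation_split=0.2,   # 80/20 train-validation split
)

# Hypothetical on-disk layout: one sub-folder per class under "data/plantvillage".
train_gen = datagen.flow_from_directory(
    "data/plantvillage", target_size=IMG_SIZE, batch_size=16,
    class_mode="categorical", subset="training",
)
val_gen = datagen.flow_from_directory(
    "data/plantvillage", target_size=IMG_SIZE, batch_size=16,
    class_mode="categorical", subset="validation",
)
```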
Table 6. Model parameters and values.

| Parameter | Value |
|---|---|
| Optimizer | Adam |
| Epochs | 15 |
| Initial learning rate | 0.0001 |
| Loss function | Categorical cross-entropy |
| Batch size | 16 |
| Activation function | Softmax and ReLU |
| Dropout | 0.5 |
| Early stopping | Monitor metric = validation loss, patience = 5 |
| Reduce LR on plateau | Monitor metric = validation loss, patience = 2, factor = 0.2, minimum learning rate = 1 × 10−22 |
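The parameters in Table 6 likewise translate into a standard Keras training setup. The following sketch assumes the generators from the previous example and an already-built `model` (the ensemble sketched after Table 12); it is a hedged reading of the table, not the authors' exact code.

```python
# Hedged sketch of the Table 6 training configuration; `model`, `train_gen`,
# and `val_gen` are assumed from the other sketches in this article.
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam

model.compile(
    optimizer=Adam(learning_rate=1e-4),   # initial learning rate 0.0001
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

callbacks = [
    # stop when validation loss has not improved for 5 epochs
    EarlyStopping(monitor="val_loss", patience=5),
    # shrink the learning rate by factor 0.2 after 2 stagnant epochs;
    # min_lr mirrors the minimum learning rate printed in Table 6
    ReduceLROnPlateau(monitor="val_loss", patience=2, factor=0.2, min_lr=1e-22),
]

history = model.fit(train_gen, validation_data=val_gen,
                    epochs=15, callbacks=callbacks)  # batch size 16 is set in the generators
```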
Table 7. Comparative test accuracies of proposed approach across various methodologies.

| Dataset | Reference | Model | Accuracy |
|---|---|---|---|
| FieldPlant | [12] | MobileNet | 82.9% |
| | | VGG16 | 80.54% |
| | | InceptionResNetV2 | 81.81% |
| | | InceptionV3 | 82.54% |
| | | Proposed Approach | 83.00% |
| PlantDoc | [49] | MobileNetV2 | 40.00% (validation accuracy) |
| | | EfficientNetV2 | 28.00% |
| | | Xception | 81.53% |
| | | Proposed Approach | 60% |
| PlantVillage | [19] | ResNet-9 | 99.25% |
| | [15] | VGGNet | 88.65% |
| | | InceptionV3 | 96.25% |
| | | ResNet50 | 98.13% |
| | | InceptionResNetV2 | 91.06% |
| | [30] | MobileNetV2 | 97.02% |
| | [18] | DenseNet121 with C-GAN augmentation | 99.51% |
| | | Proposed Approach | 99.69% |
Table 8. Precision, recall, and F1 score for PlantVillage dataset.

| Class | Precision | Recall | F1 Score |
|---|---|---|---|
| Apple apple_scab | 100.00 | 100.00 | 100.00 |
| Apple black_rot | 100.00 | 100.00 | 100.00 |
| Apple cedar_apple_rust | 100.00 | 100.00 | 100.00 |
| Apple healthy | 100.00 | 99.70 | 99.85 |
| Blueberry healthy | 99.34 | 100.00 | 99.67 |
| Cherry_(including_sour) powdery_mildew | 100.00 | 100.00 | 100.00 |
| Cherry_(including_sour) healthy | 100.00 | 99.41 | 99.70 |
| Corn_(maize) Cercospora_leaf_spot_gray_leaf_spot | 93.33 | 95.15 | 94.23 |
| Corn_(maize) common_rust_ | 99.58 | 100.00 | 99.79 |
| Corn_(maize) northern_leaf_blight | 97.42 | 95.94 | 96.68 |
| Corn_(maize) healthy | 99.57 | 99.14 | 99.35 |
| Grape black_rot | 100.00 | 99.58 | 99.79 |
| Grape esca_(black_measles) | 99.64 | 100.00 | 99.82 |
| Grape leaf_blight_(isariopsis_leaf_spot) | 100.00 | 100.00 | 100.00 |
| Grape healthy | 100.00 | 100.00 | 100.00 |
| Orange Haunglongbing_(citrus_greening) | 100.00 | 100.00 | 100.00 |
| Peach bacterial_spot | 100.00 | 100.00 | 100.00 |
| Peach healthy | 100.00 | 100.00 | 100.00 |
| Pepper_bell bacterial_spot | 100.00 | 100.00 | 100.00 |
| Pepper_bell healthy | 100.00 | 100.00 | 100.00 |
| Potato early_blight | 100.00 | 100.00 | 100.00 |
| Potato late_blight | 100.00 | 100.00 | 100.00 |
| Potato healthy | 96.88 | 100.00 | 98.41 |
| Raspberry healthy | 100.00 | 100.00 | 100.00 |
| Soybean healthy | 100.00 | 99.90 | 99.95 |
| Squash powdery_mildew | 100.00 | 100.00 | 100.00 |
| Strawberry leaf_scorch | 100.00 | 100.00 | 100.00 |
| Strawberry healthy | 100.00 | 100.00 | 100.00 |
| Tomato bacterial_spot | 99.53 | 100.00 | 99.77 |
| Tomato early_blight | 98.03 | 99.50 | 98.76 |
| Tomato late_blight | 100.00 | 98.95 | 99.47 |
| Tomato leaf_Mold | 100.00 | 100.00 | 100.00 |
| Tomato septoria_leaf_spot | 100.00 | 99.43 | 99.72 |
| Tomato spider_mites_two-spotted_spider_mite | 100.00 | 98.51 | 99.25 |
| Tomato target_Spot | 98.94 | 100.00 | 99.47 |
| Tomato tomato_yellow_leaf_curl_virus | 99.44 | 99.81 | 99.63 |
| Tomato tomato_mosaic_virus | 100.00 | 100.00 | 100.00 |
| Tomato healthy | 100.00 | 100.00 | 100.00 |
Table 9. Precision, recall, and F1 score for PlantDoc dataset.

| Class | Precision | Recall | F1 Score |
|---|---|---|---|
| Apple Scab Leaf | 62.50 | 100.00 | 76.92 |
| Apple Leaf | 42.11 | 88.89 | 57.14 |
| Apple Rust Leaf | 100.00 | 50.00 | 66.67 |
| Bell Pepper Leaf | 75.00 | 75.00 | 75.00 |
| Bell Pepper Leaf Spot | 44.44 | 44.44 | 44.44 |
| Blueberry Leaf | 50.00 | 36.36 | 42.11 |
| Cherry Leaf | 57.14 | 40.00 | 47.06 |
| Corn Gray Leaf Spot | 12.50 | 25.00 | 16.67 |
| Corn Leaf Blight | 53.85 | 58.33 | 56.00 |
| Corn Rust Leaf | 100.00 | 60.00 | 75.00 |
| Peach Leaf | 85.71 | 66.67 | 75.00 |
| Potato Leaf Early Blight | 30.77 | 50.00 | 38.10 |
| Potato Leaf Late Blight | 25.00 | 25.00 | 25.00 |
| Raspberry Leaf | 87.50 | 100.00 | 93.33 |
| Soybean Leaf | 80.00 | 50.00 | 61.54 |
| Squash Powdery Mildew Leaf | 100.00 | 100.00 | 100.00 |
| Strawberry Leaf | 100.00 | 100.00 | 100.00 |
| Tomato Early Blight Leaf | 50.00 | 22.22 | 30.77 |
| Tomato Septoria Leaf Spot | 43.75 | 63.64 | 51.85 |
| Tomato Leaf | 100.00 | 37.50 | 54.55 |
| Tomato Leaf Bacterial Spot | 20.00 | 22.22 | 21.05 |
| Tomato Leaf Late Blight | 66.67 | 80.00 | 72.73 |
| Tomato Leaf Mosaic Virus | - | 0.00 | - |
| Tomato Leaf Yellow Virus | 100.00 | 83.33 | 90.91 |
| Tomato Mold Leaf | 36.36 | 66.67 | 47.06 |
| Grape Leaf | 85.71 | 100.00 | 92.31 |
| Grape Leaf Black Rot | 100.00 | 87.50 | 93.33 |
Table 10. Precision, recall, and F1 score for FieldPlant dataset.

| Class | Precision | Recall | F1 Score |
|---|---|---|---|
| Cassava Brown Leaf Spot | 62.86 | 57.89 | 60.27 |
| Cassava Healthy | 75.00 | 66.00 | 70.21 |
| Cassava Mosaic | 89.08 | 95.33 | 92.10 |
| Corn Brown Spots | 92.59 | 73.53 | 81.97 |
| Corn Healthy | 60.00 | 50.00 | 54.55 |
| Corn Streak | 86.21 | 69.44 | 76.92 |
| Corn Stripe | 94.12 | 76.19 | 84.21 |
| Corn Yellowing | 90.00 | 88.52 | 89.26 |
| Corn Leaf Blight | 84.43 | 94.06 | 88.98 |
| Tomato Brown Spots | 99.38 | 84.29 | 91.22 |
| Tomato Blight Leaf | 50.91 | 59.57 | 54.90 |
| Tomato Healthy | 39.47 | 55.56 | 46.15 |
| Tomato Leaf Yellow | 50.00 | 53.85 | 51.85 |
Table 11. Impact of additional layers on the three-model ensemble.

| Layers Included in the Model | Accuracy |
|---|---|
| No additional layer | 58.4% |
| One layer—dense layer | 56.7% |
| Two layers—dense layer and batch normalization | 58.8% |
| Three layers—dense layer, batch normalization, and dropout | 60.1% |
Table 12. Performance of individual models and model combinations.

| Model | Accuracy |
|---|---|
| InceptionResNetV2 | 57.6% |
| MobileNetV2 | 43.2% |
| EfficientNetB3 | 12.2% |
| InceptionResNetV2 and MobileNetV2 | 58% |
| MobileNetV2 and EfficientNetB3 | 52.5% |
| InceptionResNetV2 and EfficientNetB3 | 54.2% |
| All three models | 60.1% |
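To make the ensemble behind Tables 11 and 12 concrete, the sketch below assembles the three backbones by concatenating their pooled features and attaching the dense + batch-normalization + dropout head that Table 11 shows performing best. It is a minimal sketch under stated assumptions, not the authors' exact architecture: the dense-layer width (256), frozen ImageNet-pretrained backbones, and feature concatenation as the fusion mechanism are all assumptions.

```python
# Hedged sketch of a three-backbone ensemble with a dense + batch-norm +
# dropout head (per Table 11); fusion details are assumptions.
from tensorflow.keras import Input, Model, layers
from tensorflow.keras.applications import (EfficientNetB3, InceptionResNetV2,
                                           MobileNetV2)

NUM_CLASSES = 38  # e.g., PlantVillage (Table 8 lists 38 classes); set per dataset

inputs = Input(shape=(224, 224, 3))  # image size from Table 5

backbones = [
    InceptionResNetV2(include_top=False, weights="imagenet", pooling="avg"),
    MobileNetV2(include_top=False, weights="imagenet", pooling="avg"),
    EfficientNetB3(include_top=False, weights="imagenet", pooling="avg"),
]
features = []
for net in backbones:
    net.trainable = False        # transfer learning: keep pretrained weights frozen
    features.append(net(inputs)) # pooled feature vector from each backbone

x = layers.Concatenate()(features)            # fuse the three feature vectors
x = layers.Dense(256, activation="relu")(x)   # dense layer (width is an assumption)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.5)(x)                    # dropout 0.5 per Table 6
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = Model(inputs, outputs)
```

Consistent with Table 12, a head of this kind sits on top of all three backbones at once; dropping any one backbone from the `backbones` list reproduces the two-model variants in that table.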
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
