Article

Detection of Leaf Diseases in Banana Crops Using Deep Learning Techniques

by Nixon Jiménez 1, Stefany Orellana 1, Bertha Mazon-Olivo 2,*, Wilmer Rivas-Asanza 2 and Iván Ramírez-Morales 3
1 Department of Information Technology, Facultad de Ingeniería Civil, Universidad Técnica de Machala, 5.5 km Pan-American Av., Machala 070150, Ecuador
2 AutoMathTIC Research Group, Facultad de Ingeniería Civil, Universidad Técnica de Machala, 5.5 km Pan-American Av., Machala 070150, Ecuador
3 DINTA Research Group, Facultad de Ciencias Agropecuarias, Universidad Técnica de Machala, 5.5 km Pan-American Av., Machala 070150, Ecuador
* Author to whom correspondence should be addressed.
Submission received: 10 January 2025 / Revised: 8 March 2025 / Accepted: 14 March 2025 / Published: 17 March 2025
(This article belongs to the Special Issue Artificial Intelligence in Agriculture)

Abstract: Leaf diseases such as Black Sigatoka and Cordana represent a growing threat to banana crops in Ecuador. These diseases spread rapidly, impacting both leaf and fruit quality, so early detection is crucial for effective control measures. Recently, deep learning has proven to be a powerful tool in agriculture, enabling the more accurate analysis and identification of crop diseases. This study applied the CRISP-DM methodology, consisting of six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. A dataset of 900 banana leaf images was collected: 300 of Black Sigatoka, 300 of Cordana, and 300 of healthy leaves. Three pre-trained models (EfficientNetB0, ResNet50, and VGG19) were trained on this dataset and demonstrated the ability to identify leaf diseases in bananas, with accuracies of 88.33%, 88.90%, and 87.22%, respectively. To improve performance, data augmentation techniques were applied using the TensorFlow Keras ImageDataGenerator class, expanding the dataset to 9000 images. Due to the high computational demands of ResNet50 and VGG19, only EfficientNetB0 was trained on this augmented set, reaching 87.83% accuracy, so augmentation did not significantly improve its performance. These findings highlight the value of deep learning techniques for early disease detection in banana crops, enhancing diagnostic accuracy and efficiency.

Graphical Abstract

1. Introduction

The banana is one of Ecuador’s main export products and plays a crucial role in its economy, representing 18% of its non-oil exports. According to Ciancio et al. [1], around 400 million people choose to consume bananas due to their health benefits and significant nutritional composition. Additionally, approximately 70 million people earn income from banana production, primarily in some countries in Africa, Asia, and Latin America. The trade of certified bananas under sustainability standards represents a significant part of the economy in several tropical countries, with an estimated market value of USD 2.4 billion [2].
However, the productivity of banana plantations is threatened by various foliar diseases, such as Black Sigatoka (Pseudocercospora fijiensis) and Cordana (Cordana musae), which have a direct impact on fruit quality and plant longevity [3]. Although other pathologies also affect the leaves of this crop, such as Panama disease (Fusarium oxysporum f. sp. cubense), Moko disease (Ralstonia solanacearum race 2), Yellow Sigatoka (Mycosphaerella musicola), Erwinia, and mosaic (Tobacco mosaic virus), our research focuses mainly on the two diseases mentioned. This is due to the low incidence of the other diseases in our area, which would have made it difficult to collect the images needed to create the dataset.
Black Sigatoka is one of the most aggressive diseases affecting banana crops, reducing photosynthesis by damaging leaves and causing the premature ripening of the fruit, thus affecting its quality [4]. This disease can decrease crop yields by up to 50%; to prevent its spread, fungicides must be applied up to 40 times a year, increasing the production costs [5]. In the case of Cordana musae, lesions develop as oval spots, characterized by a yellowish edge and a grayish center, exhibiting a pattern of concentric rings. This pathology, attributed to the fungus Cordana musae, presents visual characteristics similar to those of Sigatoka, complicating its identification [6].
In the agricultural context, the implementation of advanced technologies like deep learning enables the early detection of plant diseases, helping farmers to make informed decisions and reduce costs. The timely detection of these diseases is crucial to prevent them from worsening, thus protecting both banana production and exports, while minimizing their impacts on producers in the country [7]. Research related to this topic is limited; some important studies are described below. Salehin et al. [8] focused on the detection of two diseases, Fusarium wilt (Panama disease) and Black Sigatoka, achieving accuracy of 99% using complex models like ResNet152 and Inception V3. The use of only two classes fails to represent the real complexity of banana crops, where multiple diseases can coexist, each requiring differentiated diagnosis.
Elinisa and Mduma [9] implemented a CNN model with four layers to classify two diseases, Fusarium wilt and Black Sigatoka, achieving accuracy of 91.17% with a dataset of 27,360 images. Thiagarajan et al. [10] focused on detecting diseases such as Black Sigatoka, Fusarium wilt (Panama disease), and mosaic disease in 937 images of banana leaves [11], using artificial neural networks (ANN) combined with feature extraction techniques such as the scale-invariant feature transform (SIFT), histogram of oriented gradients (HOG), and local binary patterns (LBP). The applied neural networks were convolutional (CNN) and pre-trained models like ResNet50 and Inception V3, achieving accuracy of 98.9% and 95.5%, respectively.
Rajalakshmi et al. [12] identified banana diseases such as Cordana and Pestalotiopsis in India with a dataset of 768 images. The proposed model, called 8C-DCNN, was a deep convolutional neural network that achieved average accuracy of 98.92%.
The models used offer high levels of precision but require significant computational capacities, which can be a practical limitation. Additionally, while the reported accuracy seems very high, it may not reflect the models' true generalization, as the authors do not evaluate how they perform in a multiclass environment. Furthermore, analyses of confusion or errors between classes are often not exhaustive, hindering a comprehensive assessment of model performance in realistic scenarios.
After reviewing the existing studies, a significant lack of research focused on the detection of foliar diseases in banana crops within the Latin American context was identified.
The main contributions of this work are as follows:
  • We propose an approach for early disease detection using pre-trained deep learning models;
  • In response to the lack of banana leaf datasets contextualized to Latin America, a tailored dataset was collected and created specifically for this environment;
  • We provide a comparison and evaluation of the performance of the models used (ResNet50, EfficientNetB0, VGG19);
  • We describe the implementation of deep learning models in a mobile application to facilitate practical use.
Initially, the models were trained with a dataset of 900 images, achieving accuracy of 85% with ResNet50, 84% with EfficientNetB0, and 88% with VGG19. The EfficientNetB0 model stands out not only for its good accuracy but also for its efficiency in resource use, both computationally and in time, completing its training in approximately 30 min. In contrast, ResNet50 and VGG19 had longer training times, requiring around 1.5 h and 7 h, respectively, using computational resources such as an Intel Core i7 11th generation CPU with 32 GB of RAM and the Jupyter Notebook tool (6.4.12) with the Python language (3.9.13).
Given this difference in training times, it was decided not to train ResNet50 and VGG19 with the dataset of 9000 images due to the inability to perform multiple iterations of fine-tuning and comparative evaluation. Therefore, EfficientNetB0 was the only model trained with the complete set of images, achieving accuracy of 88%. The application of deep learning techniques not only facilitates the early detection of foliar diseases in bananas but also offers a tool for farmers, allowing them to apply corrective measures in time, improving the quality of the final product and reducing economic losses.

2. State of the Art

In addition to the studies reviewed in the Introduction section, related to the application of deep learning in banana disease detection, the work of Sanga, Mero, and Machuve [13] stands out, focusing on binary classification models; however, it does not include a multiclass analysis or a comparison between deep learning models. Nevertheless, related research on the application of deep learning in other crops has been identified, which is described below.
Blanc-Talon et al. [14] focused on applying deep learning techniques, presenting a deep meta-architecture and a refinement filter bank to address class imbalances and false positives. They evaluated their method on a dataset of tomato diseases, achieving improvements in disease detection under real field conditions.
Abbas et al. [15] proposed a method for the detection of diseases in tomato plants using transfer learning and synthetic image generation through conditional generative adversarial networks (C-GAN). To enhance the model’s generalization, synthetic images were generated and combined with real data to train the DenseNet121 model. They focused on the multiclass classification of tomato diseases, achieving accuracy of 97.11% across 10 classes from the PlantVillage dataset [16]. This approach demonstrates the effectiveness of combining transfer learning and synthetic data to improve the performance in agricultural settings.
Atila et al. [17] utilized the EfficientNet model to classify diseases in maize, grape, orange, peach, potato, raspberry, soybean, strawberry, tomato, and pumpkin leaves, employing the PlantVillage dataset [16]. They trained different versions of EfficientNet (B0 to B7), comparing their performance with state-of-the-art models such as ResNet50, VGG16, and Inception V3. The EfficientNetB4 and B5 models achieved average accuracy of 99.97% and 99.91%, respectively, outperforming the other models. This study highlights the efficacy of EfficientNet for the multiclass classification of plant diseases and its potential for early detection.
Jadhav et al. [18] focused on identifying soybean plant diseases using pre-trained convolutional neural networks (CNN), specifically AlexNet and GoogleNet, with transfer learning to classify three common diseases in 649 images of soybean leaves, achieving accuracy of 98.75% with AlexNet and 96.25% with GoogleNet.
Anh et al. [19] concentrated on detecting foliar diseases in tomato, maize, potato, rice, wheat, soybean, cotton, grape, apple, and citrus using deep learning models on resource-limited devices like a Raspberry Pi 3, especially when utilizing large datasets. They found that MobileNetV3 was the most suitable for field implementation, achieving accuracy of 96.58%, with inference and initialization times of 127 ms and 11 ms, respectively, and requiring only 7.4 MB of memory.
Similarly, Andrew et al. [20] addressed the detection of foliar diseases in plants using deep learning models, specifically pre-trained convolutional neural networks (CNN) such as DenseNet-121, ResNet50, VGG-16, and Inception V4. The dataset used was PlantVillage [16], which consists of 38 classes and 54,305 images. They applied hyperparameter tuning and regularization techniques to improve the accuracy and reduce overfitting. DenseNet121 achieved the highest accuracy (99.81%) in the multiclass classification of diseases in agricultural crops, standing out as the most suitable model for this task.

3. Literature Review

In the search for information related to this topic, various authors have been identified who have provided relevant analyses regarding the methods and models applied in the detection of foliar diseases using deep learning.
Liu and Wang [21] present a review of disease and pest detection in plants using deep learning techniques, focusing on classifying, detecting, and segmenting images of diseases in crops. They compare these approaches with traditional methods and highlight the advantages, challenges, and potential solutions of deep neural networks in this field. Additionally, the study addresses the available datasets and evaluates the performance of different models in agricultural disease detection, providing a detailed analysis of the use of these technologies in various agricultural crops, although not specifically in bananas.
Hussein and Mousa [22] categorize and classify convolutional neural network (CNN) models, identifying tools and platforms while highlighting some implementation challenges in plant disease detection. The following subsections review the state of the art in deep learning, which served as the basis for the associated terminology and supported the development of this work.

3.1. Deep Learning

Deep learning, a subfield of machine learning and artificial intelligence, employs artificial neural networks with multiple layers. These architectures allow for the extraction of hierarchical representations from data, thereby optimizing the automation of complex tasks such as image recognition, natural language processing, and time series prediction. In the scientific literature, there has been a notable increase in research implementing deep learning techniques across various sectors, including the detection of plant diseases, such as convolutional neural networks (CNN), recurrent neural networks (RNN), and transfer learning [9,17,18].

3.2. Convolutional Neural Networks

Convolutional neural networks (CNNs) are models specifically designed to process data in image format. These networks have proven to be essential in the fields of computer vision and digital image processing due to their ability to model and adapt representations of relevant visual features [23]. The architecture is based on convolutional layers that perform convolution operations to extract hierarchies of spatial features, such as edges, textures, and shapes, at different levels of abstraction [24]; this design allows CNNs to excel in tasks such as image classification, object detection, and semantic segmentation, as well as in specific applications like facial recognition, autonomous vehicles, and medical diagnostics [25,26,27,28].
The structure of a CNN operates on data organized in matrices (images) through the implementation of convolutional layers, pooling layers, and fully connected layers. The convolutional layers are trained using a set of specifically optimized filters (kernels) for edge and texture detection, which are fundamental for visual interpretation [29]. The rectified linear unit (ReLU) activation function is incorporated in the hidden layers to introduce non-linearity into the model, thereby facilitating the learning of complex patterns in data representations. The pooling process is essential for dimensionality reduction while preserving the most important information from the detected features [30]. Finally, the fully connected layers perform the prediction stage based on the feature representations extracted in the previous layers, completing the inference process [31].
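As a minimal illustration of this layered structure, the following Keras sketch (with arbitrary, illustrative layer sizes, not an architecture from this study) stacks convolutional, ReLU, pooling, and fully connected layers:

```python
from tensorflow.keras import layers, models

# Minimal CNN sketch: convolution + ReLU extract local features,
# pooling reduces spatial resolution, dense layers perform the prediction.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu",
                  input_shape=(224, 224, 3)),     # edge/texture filters on RGB input
    layers.MaxPooling2D((2, 2)),                  # dimensionality reduction
    layers.Conv2D(64, (3, 3), activation="relu"), # higher-level patterns
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                             # 2-D feature maps -> vector
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),        # e.g., three leaf classes
])
model.summary()
```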
Throughout the development of CNNs [32,33], various architectures have emerged, designed to improve their performance in terms of accuracy, efficiency, and depth. For this research, three models have been considered, namely VGG, ResNet, and EfficientNet, due to their wide use in image classification tasks and their ability to extract relevant features in the analysis of crop diseases. The choice of these models was due to their performance in the detection of complex visual patterns, their adaptability to different image conditions, and their balance between accuracy and computational efficiency [34].

3.3. EfficientNet B0

EfficientNetB0 is a convolutional neural network that efficiently optimizes the depth, width, and resolution by using a composite coefficient. Unlike traditional approaches that adjust these dimensions without a defined criterion, this model uses preset scaling coefficients; this strategy allows the network to add more layers and channels as the input image resolution increases, which in turn expands the receptive field and facilitates the capture of more detailed patterns in the image [35,36].
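Concretely, the compound scaling rule from the original EfficientNet formulation ties the network depth $d$, width $w$, and input resolution $r$ to a single compound coefficient $\varphi$:

$$d = \alpha^{\varphi}, \quad w = \beta^{\varphi}, \quad r = \gamma^{\varphi}, \qquad \text{subject to } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,\ \alpha \geq 1,\ \beta \geq 1,\ \gamma \geq 1$$

where $\alpha$, $\beta$, and $\gamma$ are constants obtained by a small grid search on the baseline network; EfficientNetB0 is that baseline, from which the larger B1 to B7 variants are scaled.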

3.4. ResNet50

ResNet50 is a convolutional neural network with 50 layers, and its main advantage lies in its lower memory consumption compared to other convolutional neural network models. This is due to the use of a layer called GlobalAveragePooling in the classification phase, instead of using dense layers. This layer transforms the two-dimensional feature maps generated in the last feature extraction stage into a vector of n classes, which is used to calculate the probability of belonging to each category [37,38,39].
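A one-line illustration of this pooling step (the tensor shapes are those ResNet50 produces for 224 × 224 inputs):

```python
import tensorflow as tf

# GlobalAveragePooling2D collapses each 7x7 feature map to its mean,
# turning ResNet50's final (batch, 7, 7, 2048) features into a
# (batch, 2048) vector, with no trainable parameters.
features = tf.random.normal((1, 7, 7, 2048))
pooled = tf.keras.layers.GlobalAveragePooling2D()(features)
print(pooled.shape)  # (1, 2048)
```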

3.5. VGG19

VGG19 is a convolutional neural network with a deep architecture that improves feature extraction from images. It alternates convolutional layers with non-linear activations and MaxPooling layers that reduce the dimensionality. In addition, it uses ReLU activations to select the most representative values, which optimizes image processing by preserving key features and reducing the number of parameters [40,41].

3.6. Transfer Learning

Transfer learning reuses pre-trained models from a source domain to solve tasks in a related target domain, mitigating overfitting in small datasets. In convolutional neural networks (CNNs), it optimizes the feature selector of the target domain, adjusting only the fully connected layers, which reduces the complexity and computational costs without compromising the accuracy [30].
The implementation and development of advanced deep learning architectures primarily apply to large and diverse datasets and require high-performance computational platforms. This is particularly true for convolutional neural networks (CNNs), which require a substantial number of labeled image data to carry out the model training process for classification. These models possess a large number of parameters, whose optimal adjustment occurs through complex matrix operations during the training phase. Therefore, the adoption of specialized hardware, such as graphics processing units (GPUs), is common practice to significantly increase the efficiency and speed of these processes [30,31,32]. If such capabilities are not available for model construction, the training process may become unfeasible. A reduced volume of data may prevent models from generalizing to new cases; similarly, if the computational capacity is insufficient, training may require excessive time, hindering the effective use of the models [42].
In light of the inherent complexity of the issues at hand, the concept of “transfer learning” emerged, based on the premise that one task can leverage the results and knowledge acquired from another task, thereby facilitating the learning process and optimizing the transferred knowledge [43]. In the field of image processing, the most recognized transfer learning technique was introduced by Yosinski [44]. The proposed methodology facilitates the transfer of learned attributes from one model to another distinct model. However, to carry out this adaptation of the recipient model, a fine-tuning process must be executed on the interconnected layers of the latter [45].

3.7. Confusion Matrix and Associated Metrics for Classification Problems

In the context of convolutional neural networks (CNNs), the confusion matrix is a tool used to assess a classifier’s performance by analyzing the relationships between classes within a dataset. It is generated from a test dataset with a known ground truth, where each class is compared to the others to identify the number of misclassified samples [46]. For instance, as mentioned in the work by Chipindu et al. [47], in an image detection scenario, the confusion matrix can be applied to binary classification tasks, such as distinguishing between two outputs: 0 (no abortion) and 1 (abortion).
The confusion matrix is used in neural networks to evaluate feature attribution methods, classifying data as “relevant” or “not relevant”. Metrics such as precision, recall, and the F1 score are adopted, as visualized in Figure 1, where true positives represent instances correctly classified as belonging to the target class and false positives represent incorrect classifications.
This approach allows for a quantitative evaluation of the effectiveness of attribution methods, thereby facilitating the selection of more reliable and effective techniques to explain model behavior [48].

3.8. Performance Metrics for CNN Models

Zapeta Hernández et al. [49] define performance metrics used to quantify the accuracy and precision in image classification through convolutional neural networks. These metrics facilitate the identification of errors, such as incorrect predictions, underfitting, and overfitting during the machine learning process, thereby optimizing the model’s ability to perform correct classification. The evaluation metrics include indicators such as accuracy, precision, recall (or completeness), specificity, and the F1 score, as well as the rates of true positives and false positives [50].
As a first point, it is essential to understand the meanings of the variables involved, which are described below.
  • TP: True Positives
  • TN: True Negatives
  • FP: False Positives
  • FN: False Negatives
Accuracy is defined as the proportion of predictions that a model correctly classifies and is represented by Equation (1). This metric is valuable due to its direct interpretation; however, it can be misleading in unbalanced datasets, as it may present high levels of accuracy even if the model does not adequately classify minority classes [51].
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
Precision measures the accuracy of the model in correctly classifying a positive class and is calculated using Equation (2).
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$
Specificity, also known as the true negative rate (TNR) and represented by Equation (3), is defined as the proportion of true negatives that are correctly identified compared to the total number of actual negative outcomes [51].
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{3}$$
Recall, represented by Equation (4), indicates how many of the truly positive examples were correctly identified. A model with high recall is effective in recognizing the majority of the positive cases in the dataset. On the other hand, a model with low recall has limitations in identifying all positive cases present in the data [52].
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{4}$$
The F1 score seeks a balance between precision and recall, as it penalizes both false positives and false negatives, and is calculated using Equation (5):
$$F_1\ \mathrm{score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{5}$$
In addition to the previously mentioned metrics, there are advanced metrics that expand the evaluation of convolutional neural networks. Among these, the geometric precision, decisiveness, and robustness stand out, aimed at analyzing the model’s confidence in different probability regions. Additionally, the index balanced accuracy (IBA) and G_mean allow for the assessment of its performance in unbalanced datasets. Finally, metrics such as the NetScore and information density integrate precision with architectural and computational complexity, being especially relevant for applications on resource-constrained devices [44,45,46].
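As a worked illustration of Equations (1)–(5), the short sketch below computes the metrics from raw counts (the TP/TN/FP/FN values here are invented for the example, not results from this study):

```python
# Illustrative metric computation from TP/TN/FP/FN counts
# (example counts, not taken from our experiments).
TP, TN, FP, FN = 57, 115, 4, 3

accuracy    = (TP + TN) / (TP + TN + FP + FN)                 # Equation (1)
precision   = TP / (TP + FP)                                  # Equation (2)
specificity = TN / (TN + FP)                                  # Equation (3)
recall      = TP / (TP + FN)                                  # Equation (4)
f1_score    = 2 * precision * recall / (precision + recall)   # Equation (5)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"specificity={specificity:.3f} recall={recall:.3f} f1={f1_score:.3f}")
```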

3.9. Loss Function

The loss function used by a model is undoubtedly a key element in evaluating its performance. Algorithms adjust their parameters through this function, which measures the difference between the model’s predictions and the actual data. When the loss function yields a high value, it indicates a significant discrepancy between the predictions and the observed results [53].
The categorical cross-entropy loss function is extensively used in classification tasks due to its effectiveness in quantifying the divergence between the predictions generated by the model and the true labels represented in one-hot format. Hui and Belkin [54] state that, although this function is prominent in widely used implementations, such as those in Hugging Face Transformers, its superiority is not clearly defined across all domains. They also highlight the limitation of this function in generating well-calibrated probabilities, indicating that the softmax activation at the end of the model does not always adequately represent the actual confidence probability. Other loss functions are suited to different purposes, such as preserving local structures in images or optimizing the perceived visual quality [55,56].
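As a brief numeric illustration (the probability values are chosen arbitrarily), categorical cross-entropy compares a softmax output against a one-hot label; a confident wrong prediction yields a much larger loss than a confident correct one:

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and a softmax output."""
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)))

y_true = np.array([0.0, 1.0, 0.0])               # one-hot label, e.g. "Cordana"

confident_right = np.array([0.05, 0.90, 0.05])   # low loss
confident_wrong = np.array([0.90, 0.05, 0.05])   # high loss

print(categorical_cross_entropy(y_true, confident_right))  # approx. 0.105
print(categorical_cross_entropy(y_true, confident_wrong))  # approx. 3.0
```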

4. Materials and Methods

The methodology applied in this work is CRISP-DM, which consists of six phases [57]: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
Each of the phases of the applied methodology is described below:

4.1. Business Understanding

The understanding of the problem in this work focuses on identifying the need to improve the early detection of leaf diseases in banana crops using deep learning models, as described in Section 1.

4.2. Data Understanding

As we can see in Figure 2, 900 images of banana leaves were collected, classified into three categories: 300 images of leaves affected by Black Sigatoka, 300 images of those affected by Cordana, and 300 images of healthy leaves.
In order to collect images, we visited several banana farms in the province of El Oro. With the collaboration of an agronomist, we were able to identify and classify the previously mentioned diseases, ensuring that the collected images were representative and relevant for the study.

4.3. Data Preparation

Next, data preparation was carried out, which involved data cleaning to remove images that contained noise or where the leaf was not clearly visible due to blurriness or poor quality. Additionally, the images were separated into three distinct classes: healthy, affected by Black Sigatoka, and affected by Cordana.
Once classified, the images were resized to a standard size of 224 × 224 pixels to ensure consistency in the model’s input. Regarding data augmentation, although 900 images were used initially (300 per class), it was decided to increase the number of images using data augmentation techniques with the TensorFlow Keras ImageDataGenerator class. The transformations included rotations of up to 45 degrees, horizontal and vertical shifts of up to 20%, shear transformations with a range of 0.2, random zoom of up to 20%, horizontal flipping with a probability of 50%, and brightness adjustments ranging from 50% to 150%. These techniques increased the variability of the images, generating an expanded dataset of 9000 images to improve the model’s representativeness.
To ensure that the model could generalize well, the dataset was divided into two parts: the training set, comprising 80% of the images, and the validation set, containing the remaining 20%. This division allowed the model to train on a wide variety of examples while maintaining a separate dataset to evaluate its performance on images not seen during training. This strategy helped to obtain a more accurate measure of the model’s generalization ability, preventing the model from overfitting the training data and promoting a more objective assessment of its performance.
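A sketch of this augmentation and split, assuming the images are arranged in one folder per class under a hypothetical dataset/ directory (validation images are left unaugmented, as is standard practice):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation with the parameters described above; validation_split
# reserves 20% of the images for validation in both generators.
train_datagen = ImageDataGenerator(
    rotation_range=45,            # rotations of up to 45 degrees
    width_shift_range=0.2,        # horizontal shifts of up to 20%
    height_shift_range=0.2,       # vertical shifts of up to 20%
    shear_range=0.2,              # shear transformations with a range of 0.2
    zoom_range=0.2,               # random zoom of up to 20%
    horizontal_flip=True,         # random horizontal flipping (p = 0.5)
    brightness_range=(0.5, 1.5),  # brightness from 50% to 150%
    validation_split=0.2,         # 80/20 training/validation split
)
val_datagen = ImageDataGenerator(validation_split=0.2)  # no augmentation

train_gen = train_datagen.flow_from_directory(
    "dataset/", target_size=(224, 224), batch_size=64,
    class_mode="categorical", subset="training",
)
val_gen = val_datagen.flow_from_directory(
    "dataset/", target_size=(224, 224), batch_size=64,
    class_mode="categorical", subset="validation",
    shuffle=False,  # keep file order fixed for later evaluation
)
```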

4.4. Modeling

A deep learning analysis was conducted, focusing on convolutional neural networks (CNNs) applied to the classification of leaf diseases. These architectures have been proven to be robust in handling large volumes of data, allowing us to develop generalizable systems [58,59,60].
In this study, not only were three main models evaluated, but additional architectures were also tested, such as NasNetLarge, Inception ResNetV2, AlexNet, and MobileNet. However, these latter models had to be discarded due to training problems: their accuracy remained at around 33% (chance level for three classes) without any improvement. As a result, the three models that demonstrated better performance and generalization capabilities were selected.
In the following sections, the architectures of the deep learning models used in this study are described in detail. Their structural components are analyzed, from the input layers to the output layers, emphasizing the specific configurations of each model and the modifications implemented to optimize the classification of leaf diseases in bananas.
As shown in Figure 3, the architecture begins with the input of images sized 224 × 224 pixels and progresses through a series of convolutional layers, highlighting the use of MBConv mobile layers. These layers progressively reduce the spatial resolution while increasing the depth of the extracted features, facilitating the learning of patterns present in the leaves.
The network consists of seven MBConv blocks applied repetitively, designed to enhance the detection of visually similar features. Subsequently, a flattening layer transforms the features into a one-dimensional vector, followed by a dense layer with 32 neurons, responsible for classification. To reduce the risk of overfitting, a dropout layer is included, and the architecture concludes with a softmax layer that classifies the inputs into three categories.
While Figure 3 describes the data input and the development of the architecture, the model presented in Figure 4 maintains the general structure previously described.
However, unlike its predecessor, this version consists of five convolutional blocks. The model begins with the conv1 layer, which has 64 filters, and progresses through the conv2, conv3, conv4, and conv5 layers, progressively increasing the number of filters through 128, 256, and 512, while the spatial resolution gradually decreases.
Similarly, as observed in Figure 3 and Figure 4, the model in Figure 5 presents an analogous structure; however, it differs at the start by incorporating the conv1 layer with 64 filters.
From this layer, the model progresses through conv2, conv3, conv4, and conv5, gradually increasing the number of filters to 128, 256, and 512. This modification allows for the capture of more complex features in the leaf images, while the optimized design of the blocks, with the inclusion of multiple filters, enhances the model’s learning capacity.
Next, the training and comparison of the models were conducted using three pre-trained deep learning architectures, namely ResNet50, EfficientNetB0, and VGG19, all with initial weights pre-trained on ImageNet. To ensure a fair comparison, the hyperparameter configuration (Table 1) was kept constant across all three cases, varying only the pre-trained neural network.
A sequential architecture was used, excluding the original top layer, which was replaced with task-specific layers. The layers of the pre-trained network were kept frozen; in other words, their weights were not updated during training, preserving the knowledge previously acquired with the ImageNet dataset. Following the pre-trained network, custom layers were integrated: a flatten layer to flatten the data, followed by a dense layer with 32 neurons and a ReLU activation function, to which L2 regularization was applied to prevent overfitting. Subsequently, a dropout layer with a rate of 50% was added to further reduce the risk of overfitting. Finally, a dense layer with softmax activation was responsible for classifying the inputs into the three corresponding categories: healthy, Black Sigatoka, and Cordana.
The model was compiled using the Adam optimizer with a learning rate of 0.001 and the loss function (categorical_crossentropy), suitable for multi-class classification problems. Training was conducted for 100 epochs, with a batch size of 64. The training results were monitored using TensorBoard, allowing for the visualization of loss and accuracy curves for both training and validation data, thus facilitating a detailed analysis of the model’s performance.
In addition, an input size of 224 × 224 pixels was chosen, in line with standard pre-trained models such as EfficientNetB0, ResNet50, and VGG19. This choice allowed us to leverage pre-trained ImageNet weights, improving the model’s performance and accelerating convergence by providing better parameter initialization.
A batch size of 64 was selected for training, balancing stability and computational efficiency without demanding excessive memory usage. One hundred epochs were set to ensure adequate convergence and minimize the risk of premature overfitting.
To optimize the training process, the Adam optimizer was used, known for dynamically adjusting the learning rates of each parameter, facilitating more efficient convergence. The learning rate of 0.001 was chosen based on empirical evidence showing its effectiveness for these models, ensuring a balance between rapid convergence and stability.
The top layer of the pre-trained model was deactivated (Include_top = False) to allow for a custom final classification layer tailored to the specific task. A flatten layer was used for dimensionality reduction before classification, preserving key spatial information without losing important features.
Finally, a softmax activation function was employed in the classification layer, which is ideal for multi-class classification problems, enabling the efficient handling of multiple categories.
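Under these settings, the training pipeline can be reconstructed roughly as follows. This is a sketch based on the description above, shown for EfficientNetB0 (ResNet50 or VGG19 is obtained by swapping the base network); the L2 regularization factor is an assumption, as it is not stated in the text:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

# Frozen pre-trained base; include_top=False removes the original
# ImageNet classification layer, as described above.
base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)
base.trainable = False  # preserve the knowledge acquired on ImageNet

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(32, activation="relu",
                 kernel_regularizer=regularizers.l2(0.01)),  # L2 factor assumed
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),  # healthy / Black Sigatoka / Cordana
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# The batch size of 64 is set in the generators; TensorBoard logs the curves.
# history = model.fit(
#     train_gen, validation_data=val_gen, epochs=100,
#     callbacks=[tf.keras.callbacks.TensorBoard(log_dir="logs/")],
# )
```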

5. Results

5.1. Evaluation

The results obtained from the three convolutional neural network models (EfficientNetB0, ResNet50, and VGG19) used in this project are presented below. To evaluate their performance in classifying leaf diseases in banana crops, key evaluation metrics such as accuracy, recall, and the F1 score were used; these metrics reflect the effectiveness of each model in correctly classifying the images, providing an objective basis for comparing their ability to detect diseases such as Black Sigatoka and Cordana.
Presenting these metrics is essential to identify which of the evaluated architectures offers better performance in this specific task, considering both their accuracy and generalization abilities.
In Figure 6, the confusion matrix of the ResNet50 model is presented, which illustrates its performance in classifying healthy banana leaves and those affected by Cordana and Black Sigatoka. The model successfully classified 57 out of 60 leaves with Cordana, with three samples incorrectly classified as Black Sigatoka. In the case of healthy leaves, the performance was very good, with 59 out of 60 correct classifications and a single misclassification as Cordana. However, the performance in classifying leaves affected by Black Sigatoka was lower, achieving 44 out of 60 correct predictions, while 16 leaves were incorrectly classified as Cordana, indicating significant confusion between these two diseases.
In general terms, the confusion matrix observed in Figure 6 reveals that most of the model’s errors are concentrated in the classification of Black Sigatoka. This suggests that the features extracted by the ResNet50 model may not be sufficiently discriminative to differentiate between Cordana and Black Sigatoka. This difficulty could be related to the visual similarity of the lesions caused by both diseases, especially in the advanced stages, which significantly complicates their differentiation.
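A confusion matrix such as the one in Figure 6 can be obtained from a trained model and the validation generator along these lines (a sketch reusing the hypothetical `model` and `val_gen` from the previous listings):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# Predict on the validation set and compare against the ground-truth labels.
# val_gen must have been created with shuffle=False so that
# val_gen.classes lines up with the prediction order.
val_gen.reset()
probs = model.predict(val_gen)       # softmax probabilities per image
y_pred = np.argmax(probs, axis=1)    # predicted class indices
y_true = val_gen.classes             # ground-truth class indices

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred,
                            target_names=list(val_gen.class_indices)))
```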
Figure 7 shows the evolution of the accuracy per epoch for the ResNet50 model during training and validation on the dataset of leaf images. The curves reflect how the accuracy changes over 100 epochs: the training accuracy gradually increases, reaching nearly 90%, but decreases after epoch 60.
The validation accuracy stabilizes at around 90% from epoch 30, showing more consistent behavior than the training curve. Visually, the discrepancy between the two curves suggests that the model memorizes the training data rather than generalizing correctly.
Figure 8 illustrates the evolution of the loss for the ResNet50 model during training and validation over 100 epochs. A rapid decrease in loss is observed in the early epochs, indicating that the model learns effectively at the beginning of the process. From epoch 10, the loss stabilizes at low values, remaining close to that of validation, suggesting a model without signs of overfitting.
The validation loss follows a similar pattern to the training loss, with a rapid decrease in the early epochs. Subsequently, both curves show slight fluctuations but remain close to each other, indicating the consistent evolution of the model. From epoch 50, both curves stabilize and present low and similar loss values, suggesting that the model has adequately converged on both the training and validation data.
The training process lasted approximately one and a half hours, indicating that the model achieved good optimization in a reasonable time. In conclusion, the overall behavior of the loss suggests that the model was well fitted, as the loss curves for training and validation were similar throughout the process. Additionally, there were no signs of overfitting, as the validation loss did not show significant increases in the final epochs.
The confusion matrix in Figure 9 reveals a significant pattern of errors in classifying leaves with diseases, particularly in the classes of Cordana and Black Sigatoka. The EfficientNetB0 model shows a tendency to confuse these two classes, indicating the inherent difficulty of the model in identifying distinctive features between both diseases. This confusion is likely due to the visual similarity in the symptoms that they present, such as leaf spots, which complicates their differentiation. Nevertheless, the model achieves the accurate classification of healthy leaves, indicating that it can clearly recognize the absence of symptoms.
Figure 10 shows the evolution of the accuracy of the EfficientNetB0 model during training and validation over 100 epochs. The model’s accuracy improves rapidly in the first 10 epochs, reaching a value close to 95% in the training set, where it stabilizes with slight fluctuations, indicating that the model learns effectively from the training dataset.
The validation accuracy also experiences a rapid increase during the early epochs but stabilizes at around 88%, showing consistent performance, although slightly lower than that obtained in training. This suggests that the model generalizes well, although there is a slight gap between the two accuracy values.
Similarly, Figure 11 shows the evolution of the loss of the EfficientNetB0 model during training and validation over 100 epochs. The loss drops quickly in the first 10 epochs, reaching low values of around 0.5, which suggests that the model learns efficiently from the training data. After this point, the loss stabilizes, with slight fluctuations around this value, indicating that the model has achieved a good fit on the training data. The validation loss also decreases rapidly at first but stabilizes around higher values, close to 0.8, reflecting somewhat lower performance compared to that on the training data.
The EfficientNetB0 model shows overall good performance and a proper fit to the training data, completing the process in about half an hour. Likewise, in Figure 10 and Figure 11, we observe that the accuracy and loss functions over the 100 epochs demonstrate the favorable evolution of learning. The accuracy in both the training and validation sets converges at around 90%, while the loss function decreases rapidly in the early epochs before stabilizing, indicating an effective fit to the data.
The confusion matrix for the VGG19 model, shown in Figure 12, reflects a trend similar to that observed in the previous models, where the highest number of errors is concentrated in classifying the Black Sigatoka class. Specifically, the model struggles to differentiate between Cordana and Black Sigatoka, resulting in a considerable amount of confusion between these two categories. This may be related to the visual similarity of the lesions on the leaves affected by both diseases. However, it is important to highlight that the VGG19 model demonstrates high accuracy in classifying healthy leaves, as it shows no errors in this category, indicating that the model is effective in detecting the absence of symptoms.
Figure 13 shows the evolution of the accuracy of the VGG19 model during training and validation over 100 epochs. The accuracy increases rapidly during the first 10 epochs, reaching values close to 88%. From this point on, the accuracy stabilizes with slight fluctuations, indicating a good fit to the training data.
The validation accuracy follows a similar pattern to that of the training, with a rapid improvement in the first 10 epochs. Subsequently, it remains stable at around 87%, with some minor variations, suggesting that the model generalizes well to unseen data.
Figure 14 shows the evolution of the loss of the VGG19 model during training and validation over 100 epochs. The loss decreases rapidly during the early epochs, dropping from over 4 to values close to 0.5, suggesting that the model learns effectively from the training data. After epoch 10, the loss stabilizes, remaining at low values, indicating the proper fit of the model to the training data.
The validation loss also decreases quickly at first but stabilizes at around 1 starting from epoch 10, with minor fluctuations throughout the training. This suggests that the model generalizes well to the validation data, although it shows a greater loss compared to the training data.
In Figure 13 and Figure 14, we can observe that the accuracy and loss functions over the 100 epochs show the favorable evolution of learning. It is important to highlight the training time, which was around 7 h, reflecting the time required for this model due to its greater complexity and the size of the data. The accuracy in both the training and validation sets converges at around 90%, while the loss function decreases rapidly in the early epochs before stabilizing, indicating an effective fit to the data.

5.2. Evaluation of the Best Model

In this study, although an expanded dataset of 9000 images was available, obtained by applying data augmentation techniques to an original set of 900 images, the complete training of all models using the entirety of these images was not carried out due to processing time limitations. Initial tests showed that the training time varied significantly among the selected models. EfficientNetB0, being a lighter and more efficient model, completed its training in a relatively short time, approximately 30 min, allowing the model to be used with the complete set of 9000 images.
In contrast, ResNet50 and VGG19 exhibited significantly longer training times, with ResNet50 taking about 1.5 h and VGG19 nearly 7 h when using the initial set of 900 images. Given this situation, it was concluded that training both models with the 9000 images would not be feasible within the project’s time frame, as this would have considerably extended the total processing time, impacted the study’s efficiency, and limited the opportunity for multiple fine-tuning iterations and comparative evaluations.
Therefore, the EfficientNetB0 model was the only one trained with the complete set of images due to its shorter training time, making it suitable for working with larger datasets without affecting the total experiment time. This strategy facilitated quicker results and allowed for a more thorough evaluation of its performance in a larger data context. In contrast, the other models were trained using a reduced set of 900 images, enabling a general comparison of their performance without compromising the available processing time.
Figure 15 presents the confusion matrix showing the model’s performance in classifying leaves affected by Cordana and Black Sigatoka and healthy leaves. The model correctly identified 71.67% of the leaves with Cordana, although it confused 26.67% of cases with Black Sigatoka. In classifying healthy leaves, the model performed excellently, with accuracy of 99.5%. Regarding Black Sigatoka, the model correctly classified 92.33% of the leaves but confused 6% of cases with Cordana. Overall, the model showed good performance, with high accuracy in identifying healthy leaves, although it struggled to differentiate between Cordana and Black Sigatoka, suggesting possible visual similarities between the diseases.
Figure 16 illustrates the accuracy achieved during the training and validation process of the EfficientNetB0 model, using a dataset that was expanded through augmentation techniques. This analysis covered a total of 100 epochs, where the vertical axis represents the accuracy and the horizontal axis indicates the corresponding epochs.
Two curves can be observed in the graph. The blue line denotes the accuracy obtained on the training set. Initially, there is a notable increase in accuracy, which stabilizes at around 0.85, showing slight fluctuations as the training progresses. The pink line represents the accuracy on the validation set. This curve consistently remains above the training curve, reaching values close to 0.89. This behavior suggests the robust performance of the model on data that were not used during training.
At the end of the training, the accuracy achieved was 87.83% on the training set and 86.32% on the validation set. These results indicate that the model learned effectively, with no significant evidence of overfitting, as the difference between these accuracy values was moderate. The training process was completed in approximately 3.13 h.
Figure 17 illustrates the evolution of the loss of the EfficientNetB0 model during the training and validation process on the augmented dataset, over a total of 100 epochs. In this context, the vertical axis represents the loss value, while the horizontal axis indicates the epochs.
The blue line reflects the loss on the training set, showing a rapid decrease during the initial stages of the process and stabilizing around a value of 0.65, with moderate fluctuations throughout the epochs. This behavior suggests that the model effectively reduces its prediction errors in the early phases, although the rate of improvement slows over time.
On the other hand, the pink line represents the loss on the validation set, which consistently remains lower than the training loss. This indicates that the model performs better on the validation data, with the loss values stabilizing near 0.57.
The final results reveal a training set loss of 0.6444 and a validation set loss of 0.5735. Both values demonstrate a significant reduction from the start of the training process, reflecting the model’s effective learning. It is worth noting that the training process was completed in approximately 3.13 h.

5.3. Deployment

Following the principles of the CRISP-DM methodology [61], the final stage of the project focused on developing a mobile application designed for the deployment of the previously trained neural network. This step allowed the model’s results to be transferred to a practical environment, evaluating its functionality under conditions similar to a production environment and demonstrating its adaptability in more dynamic scenarios. The main interface (Figure 18) of the application includes a scanner that enables the capture of leaf images for analysis, classifying them into three categories: Black Sigatoka, Cordana, and healthy leaves.
The application was developed using Angular as the framework to build a modular and efficient user interface, while Firebase was utilized for data management, authentication, and hosting. Additionally, a section of the application (Figure 19) provides specific recommendations based on the results of the disease prediction.
The pre-trained neural network was integrated into the application, enabling predictions to be executed directly from a mobile device. This not only validates the model’s performance under conditions closer to a production environment but also demonstrates the possibility of implementing artificial intelligence solutions outside controlled settings, such as laboratories.
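The article does not specify the mechanism by which the trained network was embedded in the Angular application; one plausible route, sketched below under that assumption, is converting the Keras model to TensorFlow.js format so that it can run in the browser or a mobile WebView:

```python
# Hypothetical conversion step (the integration mechanism is not stated
# in the paper): export the trained Keras model to TensorFlow.js format
# so an Angular client can run predictions on the device.
import tensorflowjs as tfjs  # assumption: pip install tensorflowjs

model.save("banana_model.h5")                           # persist the trained model
tfjs.converters.save_keras_model(model, "tfjs_model/")  # model.json + weight shards

# In the Angular client, the model could then be loaded with
# tf.loadLayersModel('assets/tfjs_model/model.json') from @tensorflow/tfjs.
```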
This deployment marks a significant step within the research framework by transitioning the model’s capabilities to a production environment, while focusing on evaluating and documenting its behavior, accuracy, and potential limitations in applied scenarios.

6. Discussion

6.1. Discussion of the Results Obtained

The models ResNet50, EfficientNetB0, and VGG19 were evaluated in terms of their accuracy, loss, generalization capabilities, and training time, yielding diverse results that can help in selecting the most suitable model according to the specific requirements of the leaf disease detection system.
In Table 2, the results in terms of all metrics used during the training of each neural network model are summarized.
Table 2 shows the comparative performance of three deep learning models (ResNet50, EfficientNetB0, and VGG19) in classifying three categories of banana leaves: Black Sigatoka, Cordana, and healthy leaves. They were evaluated using key metrics such as precision, recall, the F1 score, and the overall accuracy.
ResNet50 stands out for its high precision in identifying healthy leaves and cases of Black Sigatoka, meaning that it correctly classifies the detected cases. However, its recall is lower for the Black Sigatoka class, indicating that it may fail to detect all existing cases of this disease. Overall, this model achieves the highest global accuracy (88.90%), positioning it as the most accurate model among the three.
EfficientNetB0 exhibits balanced performance, with more consistent metrics across all classes. Although its global accuracy is slightly lower than that of ResNet50 (88.33%), it maintains a better balance between precision and recall, indicating that it can detect more cases without significantly compromising correct classifications. This suggests it is a versatile and reliable model for this type of classification.
VGG19 demonstrates acceptable performance across all metrics, excelling in detecting healthy leaves, where it achieves perfect recall (1.00). However, its global accuracy is the lowest among the three models (87.22%), indicating that it is less effective in overall classification.

6.2. Training and Validation Precision and Loss

The VGG19 model achieved the best training set accuracy, reaching values close to 88%, with a loss that rapidly decreased in the initial epochs and stabilized at around 0.44.
EfficientNetB0, on the other hand, showed more balanced performance, with the validation accuracy converging at around 88% and the loss stabilizing at approximately 0.80 for validation and 0.45 for training. This behavior indicates a lower tendency to overfit and a higher capacity for generalization.
In comparison, ResNet50 demonstrated a good fit on the training set but showed a significant discrepancy in the validation accuracy and loss, revealing challenges in generalizing, particularly in the classification of Black Sigatoka.

6.3. Generalization Capabilities

In terms of generalization, EfficientNetB0 proved to be the most robust model, with minimal differences between the training and validation curves, indicating a good balance between both. VGG19 exhibited the largest gap between the loss curves, signaling lower generalization capabilities. Similarly, ResNet50 displayed a tendency toward overfitting compared to the other models, which primarily impacted the classification of leaves affected by Black Sigatoka.

6.4. Training Time

The training time was a critical factor in the model comparison. EfficientNetB0 emerged as the most efficient, completing training in approximately 30 min, making it a viable option for systems requiring rapid implementation without sacrificing accuracy. In contrast, VGG19 took nearly 7 h to complete training, which, despite achieving high precision, may not be ideal in terms of the computational cost. ResNet50 required 1.5 h of training, representing a middle ground in terms of time but with inferior generalization performance compared to EfficientNetB0.

6.5. Disease Classification Performance

Regarding the disease classification performance, VGG19 excelled in identifying healthy leaves and those affected by Cordana. However, like ResNet50, it faced challenges in correctly distinguishing leaves affected by Black Sigatoka. EfficientNetB0, although also prone to errors in Black Sigatoka classification, demonstrated a more consistent balance across the classes, highlighting its ability to handle complex datasets and reduce the confusion among similar categories.

6.6. Comparison with the Results of Other Studies

The use of neural networks for the identification of diseases such as Cordana and Black Sigatoka in banana crops has generated growing interest in the scientific community. However, studies in this area still face significant challenges, which underlines the need to continue refining these methodologies for their application in real agricultural scenarios. A fundamental aspect to consider is the similarity in the appearance of the spots caused by both diseases, which can complicate accurate differentiation through deep learning models. Below are some of the main challenges identified in the literature:
  • The scarcity of specific studies on the detection of Cordana and Black Sigatoka using neural networks;
  • Few works that compare the performance of different neural network architectures in the detection of these pathologies;
  • Complexity in the collection and annotation of high-quality images, considering the similarity in the foliar lesions of these diseases.
In the following section, a comparative analysis is presented between the results obtained in this study and those reported in previous research. Table 3 summarizes these findings, allowing an assessment of the progress and the main limitations identified in this field.
In our study, the pre-trained architectures ResNet50, EfficientNetB0, and VGG19 achieved accuracy scores of 88.90%, 88.33%, and 87.22%, respectively, with ResNet50 showing the best performance. To improve the generalizability of the model, data augmentation with the TensorFlow Keras ImageDataGenerator was applied. However, due to the long processing times, EfficientNetB0 was chosen, which, with 88.33% accuracy, offered a balance between performance and computational efficiency. Compared to the DCNN model of Rajalakshmi et al. [12], which reports 99% accuracy, it is important to note that this study does not show evidence of class balancing or techniques to avoid underfitting, which could affect the model’s generalizability. Our results confirm that pre-trained models optimized with data augmentation strategies and efficient architecture selection can offer competitive and viable performance for field applications.

7. Conclusions

This study evidences the effectiveness of deep learning techniques in the detection of foliar diseases in banana crops, focusing on Black Sigatoka and Cordana. Among the models analyzed, EfficientNetB0 proved to be the most efficient, reaching accuracy of 88.33% and completing its training in only 30 min. Although ResNet50 showed slightly higher accuracy (88.90%), its training time was significantly longer (1.5 h), as was that of VGG19, which achieved 87.22% accuracy in 7 h. These results position EfficientNetB0 as the most suitable choice for early disease diagnosis systems, thanks to its ability to balance accuracy and computational costs. This finding underlines the importance of efficient models in the agricultural context, enabling the implementation of scalable and sustainable solutions for disease management in banana crops.
This research focused on the diseases Black Sigatoka and Cordana; however, its approach can be extended to other diseases, such as Moko, Panama disease, or mosaic. Additionally, the implementation of advanced techniques can be considered, such as coarse-to-fine double-constraint networks for camouflaged object detection, as well as the investigation of algorithms applied to multispectral images.
Looking ahead, innovative approaches such as multispectral and near-infrared (NIR) imaging could be considered, opening up new possibilities for the detection of diseases that are not visible in the traditional spectrum and improving diagnostic accuracy. In addition, the integration of IoT sensors to monitor key environmental variables, such as temperature and humidity, could contribute to the more accurate prediction of disease outbreaks by correlating environmental conditions with the appearance of pathologies in real time. Finally, the development of autonomous monitoring platforms, such as drones with advanced sensors, would allow the automated, real-time monitoring of large plantation areas, optimizing diagnosis and information management.

Author Contributions

Conceptualization, N.J. and S.O.; methodology, S.O.; software, N.J.; validation, N.J. and S.O.; investigation, N.J. and S.O.; resources, N.J. and S.O.; data curation, N.J.; writing—original draft preparation, S.O.; writing—review and editing, B.M.-O. and W.R.-A.; supervision, B.M.-O., W.R.-A. and I.R.-M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Universidad Técnica de Machala, AutoMathTIC Research Group, Research Project with Resolution No. 0244-2024-CU-SO-13: “Adopción de las Tecnologías de Inteligencia Artificial e Internet de las Cosas en el Sector Agropecuario de la Provincia de El Oro”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available at https://github.com/NixonJimenez02/deep-learning-banana-diseases (accessed on 10 March 2025).

Acknowledgments

We thank the Universidad Técnica de Machala for funding this research work.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CNN     Convolutional Neural Network
ResNet  Residual Network
VGG     Visual Geometry Group
PPV     Positive Predictive Value
NPV     Negative Predictive Value
TPR     True Positive Rate
TNR     True Negative Rate
TP      True Positives
TN      True Negatives
FP      False Positives
FN      False Negatives

References

  1. Ciancio, A.; Rosso, L.C.; Lopez-Cepero, J.; Colagiero, M. Rhizosphere 16S-ITS Metabarcoding Profiles in Banana Crops Are Affected by Nematodes, Cultivation, and Local Climatic Variations. Front. Microbiol. 2022, 13, 855110. [Google Scholar] [CrossRef] [PubMed]
  2. Voora, V.; Larrea, C.; Bermudez, S. Global Market Report: Bananas; International Institute for Sustainable Development: Winnipeg, MB, Canada, 2020. [Google Scholar]
  3. Esguera, J.G.; Balendres, M.A.; Paguntalan, D.P. Overview of the Sigatoka Leaf Spot Complex in Banana and Its Current Management. Trop. Plants 2024, 3, e002. [Google Scholar] [CrossRef]
  4. Strobl, E.; Mohan, P. Climate and the Global Spread and Impact of Bananas’ Black Leaf Sigatoka Disease. Atmosphere 2020, 11, 947. [Google Scholar] [CrossRef]
  5. Yonow, T.; Ramirez-Villegas, J.; Abadie, C.; Darnell, R.E.; Ota, N.; Kriticos, D.J. Black Sigatoka in Bananas: Ecoclimatic Suitability and Disease Pressure Assessments. PLoS ONE 2019, 14, e0220601. [Google Scholar] [CrossRef]
  6. Mathew, D.; Kumar, C.S.; Anita Cherian, K. Classification of Leaf Spot Diseases in Banana Using Pre-Trained Convolutional Neural Networks. In Proceedings of the 2023 International Conference on Control, Communication and Computing (ICCC), Thiruvananthapuram, India, 19–21 May 2023; pp. 1–5. [Google Scholar]
  7. Lin, H.; Zhou, G.; Chen, A.; Li, J.; Li, M.; Zhang, W.; Hu, Y.; Yu, W.T. EM-ERNet for Image-Based Banana Disease Recognition. J. Food Meas. Charact. 2021, 15, 4696–4710. [Google Scholar] [CrossRef]
  8. Salehin, Y.; Siddique, A.; Nafisa, A.T.; Jahan, I.; Priyanka, M.M.; Ul Islam, R.; Hasan, M.; Rahman, A.; Rashid, M.R.A. A Comparative Analysis on Transfer Learning Models to Classify Banana Diseases- Fusarium Wilt and Black Sigatoka. In Proceedings of the 2024 2nd International Conference on Advancement in Computation & Computer Technologies (InCACCT), Gharuan, India, 2–3 May 2024; pp. 208–213. [Google Scholar]
  9. Elinisa, C.A.; Mduma, N. Mobile-Based Convolutional Neural Network Model for the Early Identification of Banana Diseases. Smart Agric. Technol. 2024, 7, 100423. [Google Scholar] [CrossRef]
  10. Thiagarajan, J.D.; Kulkarni, S.V.; Jadhav, S.A.; Waghe, A.A.; Raja, S.P.; Rajagopal, S.; Poddar, H.; Subramaniam, S. Analysis of Banana Plant Health Using Machine Learning Techniques. Sci. Rep. 2024, 14, 15041. [Google Scholar] [CrossRef]
  11. Banana Leaf Spot Diseases (BananaLSD) Dataset. Available online: https://www.kaggle.com/datasets/shifatearman/bananalsd (accessed on 19 December 2024).
  12. Rajalakshmi, N.R.; Saravanan, S.; Arunpandian, J.; Mathivanan, S.K.; Jayagopal, P.; Mallik, S.; Qin, G. Early Detection of Banana Leaf Disease Using Novel Deep Convolutional Neural Network. J. Data Sci. Intell. Syst. 2024. [Google Scholar] [CrossRef]
  13. Sanga, S.L.; Machuve, D.; Jomanga, K. Mobile-Based Deep Learning Models for Banana Disease Detection. Eng. Technol. Appl. Sci. Res. 2020, 10, 5674–5677. [Google Scholar] [CrossRef]
  14. Blanc-Talon, J.; Delmas, P.; Philips, W.; Popescu, D.; Scheunders, P. (Eds.) Advanced Concepts for Intelligent Vision Systems: 20th International Conference, ACIVS 2020, Auckland, New Zealand, 10–14 February 2020, Proceedings; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2020; Volume 12002, ISBN 978-3-030-40604-2. [Google Scholar]
  15. Abbas, A.; Jain, S.; Gour, M.; Vankudothu, S. Tomato Plant Disease Detection Using Transfer Learning with C-GAN Synthetic Images. Comput. Electron. Agric. 2021, 187, 106279. [Google Scholar] [CrossRef]
  16. PlantVillage Dataset. Available online: https://www.kaggle.com/datasets/abdallahalidev/plantvillage-dataset (accessed on 20 November 2024).
  17. Atila, Ü.; Uçar, M.; Akyol, K.; Uçar, E. Plant Leaf Disease Classification Using EfficientNet Deep Learning Model. Ecol. Inform. 2021, 61, 101182. [Google Scholar] [CrossRef]
  18. Jadhav, S.B.; Udupi, V.R.; Patil, S.B. Identification of Plant Diseases Using Convolutional Neural Networks. Int. J. Inf. Technol. 2021, 13, 2461–2470. [Google Scholar] [CrossRef]
  19. Anh, P.T.; Duc, H.T.M. A Benchmark of Deep Learning Models for Multi-Leaf Diseases for Edge Devices. In Proceedings of the 2021 International Conference on Advanced Technologies for Communications (ATC), Ho Chi Minh City, Vietnam, 14–16 October 2021; pp. 318–323. [Google Scholar]
  20. PlantVillage. Available online: https://www.kaggle.com/datasets/mohitsingh1804/plantvillage (accessed on 20 November 2024).
  21. Liu, J.; Wang, X. Plant Diseases and Pests Detection Based on Deep Learning: A Review. Plant Methods 2021, 17, 22. [Google Scholar] [CrossRef] [PubMed]
  22. Hamed, B.S.; Hussein, M.M.; Mousa, A.M. Plant Disease Detection Using Deep Learning. Int. J. Intell. Syst. Appl. 2023, 15, 38–50. [Google Scholar] [CrossRef]
  23. Amjoud, A.B.; Amrouch, M. Object Detection Using Deep Learning, CNNs and Vision Transformers: A Review. IEEE Access 2023, 11, 35479–35516. [Google Scholar] [CrossRef]
  24. Bangalore Vijayakumar, S.; Chitty-Venkata, K.T.; Arya, K.; Somani, A.K. ConVision Benchmark: A Contemporary Framework to Benchmark CNN and ViT Models. AI 2024, 5, 1132–1171. [Google Scholar] [CrossRef]
  25. Rivas-Asanza, W.; Mazon-Olivo, B.; Tusa-Jumbo, E. Reconocimiento de Patrones En Imágenes. In Redes Neuronales Artificiales Aplicadas al Reconocimiento de Patrones; Rivas-Asanza, W., Mazon-Olivo, B., Eds.; Universidad Técnica de Machala: Machala, Ecuador, 2018; pp. 61–126. ISBN 978-9942-24-100. [Google Scholar]
  26. Moya, E.; Campoverde, E.; Tusa, E.; Ramirez-Morales, I.; Rivas, W.; Mazon, B. Multi-Category Classification of Mammograms by Using Convolutional Neural Networks. In Proceedings of the 2017 International Conference on Information Systems and Computer Science (INCISCOS), Quito, Ecuador, 23–25 November 2017; pp. 133–140. [Google Scholar]
  27. Rivas-Asanza, W.; Mazon-Olivo, B.; Mejía-Peñafiel, E. Generalidades de Las Redes Neuronales Artificiales. In Redes Neuronales Artificiales Aplicadas al Reconocimiento de Patrones; Rivas-Asanza, W., Mazon-Olivo, B., Eds.; Universidad Técnica de Machala: Machala, Ecuador, 2018; pp. 11–35. ISBN 978-9942-24-100. [Google Scholar]
  28. González, Y.; Rivas-Asanza, W.; Mazon-Olivo, B.; Tusa, E. Revision of Classification Schemes in Tissue Carcinogenic Mammography Images. Alternativas 2018, 19, 72–83. [Google Scholar] [CrossRef]
  29. Turay, T.; Vladimirova, T. Toward Performing Image Classification and Object Detection with Convolutional Neural Networks in Autonomous Driving Systems: A Survey. IEEE Access 2022, 10, 14076–14119. [Google Scholar] [CrossRef]
  30. Chen, L.; Li, S.; Bai, Q.; Yang, J.; Jiang, S.; Miao, Y. Review of Image Classification Algorithms Based on Convolutional Neural Networks. Remote Sens. 2021, 13, 4712. [Google Scholar] [CrossRef]
  31. Diwan, T.; Anirudh, G.; Tembhurne, J.V. Object Detection Using YOLO: Challenges, Architectural Successors, Datasets and Applications. Multimed. Tools Appl. 2023, 82, 9243–9275. [Google Scholar] [CrossRef]
  32. Elaraby, A.; Hamdy, W.; Alruwaili, M. Optimization of Deep Learning Model for Plant Disease Detection Using Particle Swarm Optimizer. Comput. Mater. Contin. 2022, 71, 4019–4031. [Google Scholar] [CrossRef]
  33. Ali, A.H.; Yaseen, M.G.; Aljanabi, M.; Abed, S.A.; Gpt, C. Transfer Learning: A New Promising Techniques. Mesopotamian J. Big Data 2023, 2023, 29–30. [Google Scholar] [CrossRef]
  34. Zhang, X.; Gao, J. Measuring Feature Importance of Convolutional Neural Networks. IEEE Access 2020, 8, 196062–196074. [Google Scholar] [CrossRef]
  35. Amin, H.; Darwish, A.; Hassanien, A.E.; Soliman, M. End-to-End Deep Learning Model for Corn Leaf Disease Classification. IEEE Access 2022, 10, 31103–31115. [Google Scholar] [CrossRef]
  36. Khomidov, M.; Lee, J.-H. The Novel EfficientNet Architecture-Based System and Algorithm to Predict Complex Human Emotions. Algorithms 2024, 17, 285. [Google Scholar] [CrossRef]
  37. Yapici, M.M.; Tekerek, A.; Topaloglu, N. Performance Comparison of Convolutional Neural Network Models on GPU. In Proceedings of the 2019 IEEE 13th International Conference on Application of Information and Communication Technologies (AICT), 23–25 October 2019; IEEE: Baku, Azerbaijan; pp. 1–4. [Google Scholar]
  38. Singh, V.; Baral, A.; Kumar, R.; Tummala, S.; Noori, M.; Yadav, S.V.; Kang, S.; Zhao, W. A Hybrid Deep Learning Model for Enhanced Structural Damage Detection: Integrating ResNet50, GoogLeNet, and Attention Mechanisms. Sensors 2024, 24, 7249. [Google Scholar] [CrossRef]
  39. J., A.; Eunice, J.; Popescu, D.E.; Chowdary, M.K.; Hemanth, J. Deep Learning-Based Leaf Disease Detection in Crops Using Images for Agricultural Applications. Agronomy 2022, 12, 2395. [Google Scholar] [CrossRef]
  40. Nguyen, T.-H.; Nguyen, T.-N.; Ngo, B.-V. A VGG-19 Model with Transfer Learning and Image Segmentation for Classification of Tomato Leaf Disease. AgriEngineering 2022, 4, 871–887. [Google Scholar] [CrossRef]
  41. Awan, M.J.; Masood, O.A.; Mohammed, M.A.; Yasin, A.; Zain, A.M.; Damaševičius, R.; Abdulkareem, K.H. Image-Based Malware Classification Using VGG19 Network and Spatial Convolutional Attention. Electronics 2021, 10, 2444. [Google Scholar] [CrossRef]
  42. Rodrigues, I.; Santos, G.L.; Sadok, D.F.H.; Endo, P.T. Classifying COVID-19 positive X-ray using deep learning models. IEEE Lat. Am. Trans. 2021, 19, 884–892. [Google Scholar] [CrossRef]
  43. Iman, M.; Arabnia, H.R.; Rasheed, K. A Review of Deep Transfer Learning and Recent Advancements. Technologies 2023, 11, 40. [Google Scholar] [CrossRef]
  44. Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How Transferable Are Features in Deep Neural Networks? In Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS’14), Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
  45. Ardalan, Z.; Subbian, V. Transfer Learning Approaches for Neuroimaging Analysis: A Scoping Review. Front. Artif. Intell. 2022, 5, 780405. [Google Scholar] [CrossRef] [PubMed]
  46. Ahmad, S.; Ansari, S.U.; Haider, U.; Javed, K.; Rahman, J.U.; Anwar, S. Confusion Matrix-Based Modularity Induction into Pretrained CNN. Multimed. Tools Appl. 2022, 81, 23311–23337. [Google Scholar] [CrossRef]
  47. Chipindu, L.; Mupangwa, W.; Mtsilizah, J.; Nyagumbo, I.; Zaman-Allah, M. Maize Kernel Abortion Recognition and Classification Using Binary Classification Machine Learning Algorithms and Deep Convolutional Neural Networks. AI 2020, 1, 361–375. [Google Scholar] [CrossRef]
  48. Arias-Duart, A.; Mariotti, E.; Garcia-Gasulla, D.; Alonso-Moral, J.M. A Confusion Matrix for Evaluating Feature Attribution Methods. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 3709–3714. [Google Scholar]
  49. Zapeta Hernández, A.; Galindo Rosales, G.A.; Juan Santiago, H.J.; Martínez Lee, M. Métricas de rendimiento para evaluar el aprendizaje automático en la clasificación de imágenes petroleras utilizando redes neuronales convolucionales. Cienc. Lat. Rev. Cient. Multidiscip. 2022, 6, 4624–4637. [Google Scholar] [CrossRef]
  50. Machine Learning for Brain Disorders; Colliot, O., Ed.; Neuromethods; Springer: New York, NY, USA, 2023; Volume 197, ISBN 978-1-07-163194-2. [Google Scholar]
  51. Strelcenia, E.; Prakoonwit, S. Improving Classification Performance in Credit Card Fraud Detection by Using New Data Augmentation. AI 2023, 4, 172–198. [Google Scholar] [CrossRef]
  52. George, C.A.; Barrera, E.A.; Nelson, K.P. Applying the Decisiveness and Robustness Metrics to Convolutional Neural Networks. arXiv 2020, arXiv:2006.00058. [Google Scholar]
  53. Wong, A. NetScore: Towards Universal Metrics for Large-Scale Performance Analysis of Deep Neural Networks for Practical On-Device Edge Usage. In Proceedings of the Image Analysis and Recognition: 16th International Conference, ICIAR 2019, Waterloo, ON, Canada, 27–29 August 2019. [Google Scholar]
  54. Hui, L.; Belkin, M. Evaluation of Neural Architectures Trained with Square Loss vs Cross-Entropy in Classification Tasks. arXiv 2020, arXiv:2006.07322. [Google Scholar]
  55. Zhao, H.; Gallo, O.; Frosio, I.; Kautz, J. Loss Functions for Neural Networks for Image Processing. arXiv 2018, arXiv:1511.08861v3. [Google Scholar]
  56. Lopez-Betancur, D.; Bosco Duran, R.; Guerrero-Mendez, C.; Zambrano Rodríguez, R.; Saucedo Anaya, T. Comparación de arquitecturas de redes neuronales convolucionales para el diagnóstico de COVID-19. Comput. Sist. 2021, 25, 601–615. [Google Scholar] [CrossRef]
  57. Schröer, C.; Kruse, F.; Gómez, J.M. A Systematic Literature Review on Applying CRISP-DM Process Model. Procedia Comput. Sci. 2021, 181, 526–534. [Google Scholar] [CrossRef]
  58. Zhang, Y.; Wa, S.; Liu, Y.; Zhou, X.; Sun, P.; Ma, Q. High-Accuracy Detection of Maize Leaf Diseases CNN Based on Multi-Pathway Activation Function Module. Remote Sens. 2021, 13, 4218. [Google Scholar] [CrossRef]
  59. Balafas, V.; Karantoumanis, E.; Louta, M.; Ploskas, N. Machine Learning and Deep Learning for Plant Disease Classification and Detection. IEEE Access 2023, 11, 114352–114377. [Google Scholar] [CrossRef]
  60. Mahouachi, D.; Akhloufi, M.A. Recent Advances in Infrared Face Analysis and Recognition with Deep Learning. AI 2023, 4, 199–233. [Google Scholar] [CrossRef]
  61. Linero-Ramos, R.; Parra-Rodríguez, C.; Espinosa-Valdez, A.; Gómez-Rojas, J.; Gongora, M. Assessment of Dataset Scalability for Classification of Black Sigatoka in Banana Crops Using UAV-Based Multispectral Images and Deep Learning Techniques. Drones 2024, 8, 503. [Google Scholar] [CrossRef]
  62. Yan, K.; Shisher, M.K.C.; Sun, Y. A Transfer Learning-Based Deep Convolutional Neural Network for Detection of Fusarium Wilt in Banana Crops. AgriEngineering 2023, 5, 2381–2394. [Google Scholar] [CrossRef]
Figure 1. Confusion matrix and model evaluation metrics.
Figure 2. Authors’ own dataset, collected from Orense farms.
Figure 3. Integrated EfficientNetB0 architecture with custom extension layers.
Figure 4. Integrated architecture of ResNet50 with custom extension layers.
Figure 5. Integrated architecture of VGG19 with custom extension layers.
Figure 6. Confusion matrix for ResNet50.
Figure 7. Learning evolution curve for the ResNet50 model: fuchsia line (training) vs. yellow line (validation).
Figure 8. Loss evolution curve for the ResNet50 model: fuchsia line (training) vs. yellow line (validation).
Figure 9. Confusion matrix for EfficientNetB0.
Figure 10. Learning evolution curve of the EfficientNetB0 model: green line (training) vs. orange line (validation).
Figure 11. Loss evolution curve of the EfficientNetB0 model: green line (training) vs. orange line (validation).
Figure 12. Confusion matrix for VGG19.
Figure 13. Learning evolution curve of the VGG19 model: orange line (training) vs. dark blue line (validation).
Figure 14. Loss evolution curve of the VGG19 model: orange line (training) vs. dark blue line (validation).
Figure 15. Confusion matrix of the EfficientNetB0 model trained with 9000 images.
Figure 16. Learning curve of the EfficientNetB0 model with the 9000-image dataset: sky blue line (training) vs. fuchsia line (validation).
Figure 17. Loss curve of the EfficientNetB0 model with the 9000-image dataset: sky blue line (training) vs. fuchsia line (validation).
Figure 18. Scanning system for the identification of two diseases (Black Sigatoka and Cordana) and healthy leaves.
Figure 19. Recommendations for managing the detected disease based on its classification.
Table 1. Parameter configuration.

Parameter                 Value
Input size                224 × 224
Batch size                64
Number of epochs          100
Optimizer                 Adam
Learning rate             1 × 10−3
Include_top               False
Weights                   ImageNet
Pooling                   Flatten
Classes                   3
Classifier activation     Softmax
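For reproducibility, the following is a minimal sketch of how the Table 1 configuration could map onto a Keras transfer-learning model; the "Pooling: Flatten" entry is interpreted here as a Flatten layer on top of the frozen base, and the 128-unit dense layer in the custom head is an illustrative assumption:

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB0

# Base network per Table 1: ImageNet weights, no classification head.
base = EfficientNetB0(include_top=False, weights="imagenet",
                      input_shape=(224, 224, 3))
base.trainable = False  # freeze pre-trained features (transfer learning)

# Custom extension layers; the Flatten corresponds to the
# "Pooling: Flatten" entry, and the 128-unit layer is assumed.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(3, activation="softmax"),  # 3 classes, softmax activation
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# Training would then run for the 100 epochs listed in Table 1, e.g.:
# model.fit(train_generator, epochs=100)
```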
Table 2. Evaluation metrics for the models.

Class             Metric      ResNet50    EfficientNetB0    VGG19
Black Sigatoka    Accuracy    94%         84%               82%
                  Recall      73%         82%               78%
                  F1 score    82%         83%               80%
Cordana           Accuracy    73%         82%               78%
                  Recall      82%         83%               80%
                  F1 score    77%         86%               82%
Healthy           Accuracy    100%        94%               97%
                  Recall      98%         98%               100%
                  F1 score    99%         96%               98%
Global            Accuracy    88.90%      88.33%            87.22%
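The per-class values in Table 2 can be computed directly from each model's confusion matrix. A minimal sketch is shown below; scikit-learn is assumed here purely for illustration, and the label arrays are toy data rather than the study's predictions:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Toy label arrays for illustration only (0 = Black Sigatoka,
# 1 = Cordana, 2 = Healthy); a real evaluation would use the
# model's predictions on the held-out test set.
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 0, 1, 1, 2, 2, 2, 2])

# Rows = true classes, columns = predicted classes.
print(confusion_matrix(y_true, y_pred))

# Per-class precision, recall, and F1 score, as reported in Table 2.
print(classification_report(
    y_true, y_pred,
    target_names=["Black Sigatoka", "Cordana", "Healthy"]))
```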
Table 3. Comparative table of various studies using convolutional neural network models applied to banana diseases.

Authors                      Classes Considered       # Images    Model Used          Accuracy
Linero-Ramos et al. [61]     Total images             3180        EfficientNetV2B3    87.33%
                             Black Sigatoka           1890        VGG19               83.94%
                             Healthy                  1290        MobileNetV2         77.20%
Sanga et al. [13]            Fusarium wilt race 1     3000 *      ResNet152           99.20%
                             Black Sigatoka                       Inceptionv3         95.41%
Yan et al. [62]              Total images             156         ResNet50            98.00%
                             Fusarium wilt race       72
                             Healthy                  84
Elinisa, Mduma [9]           Fusarium wilt race       27,360 *    CNN model           91.17%
                             Black Sigatoka
                             Healthy
Rajalakshmi et al. [12]      Total images             803         DCNN                98.92%
                             Cordana                  86
                             Healthy                  164
                             Pestalotiopsis           131
                             Sigatoka                 422
Our proposed work            Total images             900         EfficientNetB0      88.33%
                             Black Sigatoka           300         ResNet50            88.90%
                             Cordana                  300         VGG19               87.22%
                             Healthy                  300
* Does not specify the size of the data subset for each class.
