GLD-Det: Guava Leaf Disease Detection in Real-Time Using Lightweight Deep Learning Approach Based on MobileNet

The guava plant is widely cultivated in various regions of the Sub-Continent and Asian countries, including Bangladesh, due to its adaptability to different soil conditions and climate environments. The fruit plays a crucial role in providing food security and nutrition for the human body. However, guava plants are susceptible to various infectious leaf diseases, leading to significant crop losses. To address this issue, several heavyweight deep learning models have been developed in precision agriculture. This research proposes a transfer learning-based model named GLD-Det, which is designed to be both lightweight and robust, enabling real-time detection of guava leaf disease using two benchmark datasets. GLD-Det is a modified version of MobileNet, featuring additional components with two pooling layers such as max and global average, three batch normalisation layers, three dropout layers, ReLU as an activation function with four dense layers, and SoftMax as a classification layer with the last lighter dense layer. The proposed GLD-Det model outperforms all existing models with impressive accuracy, precision, recall, and AUC score with values of 0.98, 0.98, 0.97, and 0.99 on one dataset, and with values of 0.97, 0.97, 0.96, and 0.99 for the other dataset, respectively. Furthermore, to enhance trust and transparency, the proposed model has been explained using the Grad-CAM technique, a class-discriminative localisation approach.


Introduction
The guava is a popular tropical fruit widely consumed in both urban and rural areas of Bangladesh. Belonging to the Myrtaceae family, this fruit originated in the American tropics and found its way to Portugal during the 17th century. Guava is rich in essential nutrients such as Vitamin C, Calcium, Iron, Nicotinic Acid, Vitamin B6, Magnesium, and Phosphorus, and is notably free of cholesterol. The fruit is known for its beneficial effects on various health conditions including diarrhoea, dysentery, blood pressure, diabetes, and immune system support [1,2]. Bangladesh is recognised as one of the largest guava-producing countries globally, with extensive cultivation covering approximately 0.3 million hectares and yielding around 3.7 million metric tons annually [2]. The guava plant is highly adaptable to diverse climates and soil conditions, making it commercially significant in both subtropical and tropical regions. However, the quality and quantity of guava production are significantly impacted by the vulnerability of guava plants to leaf diseases. These diseases not only increase production costs but also hinder food supply in the national and global markets [3]. Currently, the prevailing method for detecting guava leaf diseases in Bangladesh relies on manual labour and visual observation based on farmers' experiences. This conventional approach is time-consuming, expensive, and unreliable. Moreover, incorrect disease detection can lead to misinterpretation of the situation [1,3]. Alternatively, modern techniques, such as spectrometers [4] and molecular methods like [...] geographical locations, Pakistan and Bangladesh, to guarantee the robustness of the GLD-Det architecture. The preprocessing technique is introduced to streamline feature extraction from both datasets to enhance the classifier accuracy.
The performance of the proposed model is compared with that of several existing models on the same datasets using standard evaluation metrics, and the Grad-CAM technique is then used to support the trust in and transparency of the proposed model. Hence, this approach presents an ideal and sustainable disease detection system for guava leaves, particularly when integrated into smartphones.
The study's overall contributions are as follows:
1. This paper presents GLD-Det, an advanced detection framework for guava leaf diseases that offers enhanced speed, robustness, precision, accuracy, efficiency, cost-effectiveness, and lightweight design. This framework can effectively identify various types of guava leaf diseases.
2. This paper demonstrates the efficacy of incorporating several components: additional max and global average pooling layers to reduce the spatial dimensions of feature maps; dropout layers, a regularisation technique, to tackle the overfitting problem; batch normalisation layers to accelerate the training process; dense layers with a ReLU activation function to provide non-linearity with efficient computation and better training stability; and a lighter dense layer with a SoftMax activation function for multiclass classification. This study evaluates the impact and effectiveness of these elements in the proposed architecture.
3. This paper suggests a customised MobileNet architecture based on transfer learning, designed to be device-friendly and deployable on portable and resource-constrained computational devices in the future. This approach will enable the implementation of the model directly on devices like smartphones, eliminating the need for a cloud-based prediction system.
4. This paper includes a comparative analysis between the proposed model and various existing base models, including EfficientNetV2B2, EfficientNetB0, EfficientNetB2, EfficientNetB1, and MobileNetV2.
5. This paper introduces a model capable of analysing guava leaf images in dynamic and unpredictable real-time settings. Two benchmark datasets are used to ensure the robustness of the proposed model, which does not require leaf segmentation and utilises the ReLU function, resulting in a lighter and more user-friendly solution for farmers. Furthermore, the Grad-CAM technique is applied to elucidate the functioning of the model.
The remaining sections of the paper are structured as follows: Section 2 provides a comprehensive literature review, Section 3 presents the materials and method of the proposed GLD-Det architecture, Section 4 contains the results and discussion of the models, and Section 5 concludes the study and outlines future work.

Literature Review
In the past, traditional machine learning techniques, such as random forest (RF), K-nearest neighbour (KNN), and support vector machine (SVM), were utilised for detecting leaf diseases [11,12]. As an example, Song et al. [12] employed an SVM for corn leaf disease detection on a multiclass dataset and achieved a detection accuracy of 89.6%. Similarly, Abirami et al. [11] employed SVM and KNN-based classifiers to detect guava leaf diseases using 125 sample images, achieving detection accuracies of 97.2% and 92%, respectively. However, these proposed methods exhibited relatively low detection accuracies and faced limitations in managing large datasets that contain various essential features.
In recent years, DL has demonstrated remarkable performance and has gained significant popularity in the field of precision agriculture [3]. DL techniques demonstrate proficiency in managing large databases and are adept at executing intricate tasks, including image and disease classification, relation extraction, and pattern analysis [13]. For instance, Ji et al. [14] utilised the PlantVillage dataset and proposed a method based on CNN to detect grape plant diseases. Their approach achieved validation and test accuracies of 99.17% and 98.57%, respectively. Jiang et al. [15] utilised a real-time DL approach to classify apple leaf diseases using the Apple Leaf Diseases Dataset (ALDD). Their CNN-based model achieved a processing speed of 23.13 frames per second (FPS) and a mean average precision (mAP) of 78.80%. A VGG-16 architecture was developed by Xu et al. [16] for corn disease identification, employing transfer learning techniques. They achieved an impressive accuracy of 95.33% on a relatively small dataset containing corn disease images taken against challenging field backgrounds. For recognising anthracnose in walnut leaves, Anagnostis et al. [17] proposed a CNN-based method that achieved an accuracy within the range of 92.40% to 98.70%. To identify nine types of tomato diseases, Maeda-Gutiérrez et al. [18] developed an ensemble model combining AlexNet, GoogleNet, and InceptionV3, which achieved an impressive accuracy of 99.72%. For corn leaf disease identification, Wenxia et al. [19] proposed an improved CNN model with an accuracy of 95.74%. Furthermore, Wang et al. [20] devised the AT-AlexNet architecture by incorporating a new down-sampling attention module with the Mish activation function, resulting in an accuracy of 99.53% for corn disease identification. Compared to traditional ML methods, DL approaches have demonstrated higher accuracy in detecting plant diseases.
However, DL models tend to have large parameter sizes and longer runtimes, making them less suitable for development on mobile terminals.
In very recent years, the You Only Look Once (YOLO) pipeline has gained popularity for real-time object detection, including applications in fruit and leaf disease detection and classification [3]. The YOLO architecture [21] is preferred due to its faster processing speed, robustness, and comparatively higher accuracy when compared to other object detection pipelines. For instance, Kateb et al. [21] proposed a model called FruitDet, designed on the YOLO pipeline, for the detection of multiple fruits in real-time within orchards. Their model utilised five benchmark datasets comprising eight different fruit classes. FruitDet employed the densely connected CNN (DenseNet) as the backbone architecture and incorporated attentive feature aggregation. This approach outperformed YOLOv3 and also provided a better performance than YOLOv4. To enhance detection performance, the method introduced blackout regularisation, which disregards the object size for head detection mapping, leading to improved results. Xu et al. [8] introduced YOLO-Jujube, a CNN-based method for jujube fruit detection and ripeness inspection using the YOLO architecture. YOLO-Jujube achieved a detection performance with 11.7 giga floating point operations per second (GFLOPs), an average precision (AP) of 88 [...] speed of 245 frames per second (FPS). The model was trained and evaluated on three recorded jujube video datasets and a total of 1959 images, outperforming the YOLOv-tiny family, particularly YOLOv3, v4, v5, and v7. Fu et al. [22] proposed YOLO-Banana, a CNN-based rapid detection model for banana bunches and stalks. YOLO-Banana achieved high detection performance with an AP of 98.4% for banana bunches and 85.98% for banana stalks, resulting in an overall mean Average Precision (mAP) of 92.19%. To detect four types of defects in real-time for kiwi fruit, Yao et al. [23] developed a modified version of YOLOv5 using a specifically curated dataset. Their method achieved a mean average precision (mAP) of 94.7% ± 0.5, demonstrating an effective defect detection performance. To identify different diseases on a single apple leaf, Roy et al. [24] proposed a real-time DL method using YOLOv4. Their approach achieved a mean Average Precision (mAP) of 92.2%, a speed of 56.9 FPS, and an F1 score of 95.9% for disease detection performance. While the YOLO pipeline offers faster FPS speed and higher accuracy based on mAP and AP compared to traditional DL methods, it does have some limitations. The YOLO framework is computationally heavy, with a larger number of trainable parameters, which results in higher time and space complexity. This makes it impractical to embed the pipeline in mobile devices and inconvenient for farmers. Additionally, the YOLO pipeline may not be suitable for detecting all types of plant leaf diseases.
Researchers have conducted several studies on the detection of guava leaf diseases, though such studies remain few. Al Haque et al. [25] developed a CNN-based DL method to detect multiple guava diseases such as rot, canker, and anthracnose. The model achieved 95% accuracy. A novel approach was proposed by Howlader et al. [26] for classifying guava leaf diseases using Deep CNN (DCNN), with a 98% accuracy rate. They used a dataset with multiple classes: rust, whitefly, algal spot, and healthy. Based on Red-Green-Blue (RGB) images, Almadhor et al. [27] proposed a model to detect four types of diseases on guava leaves and fruits. The model was built on an AI-driven architecture. They utilised five classifiers, including cubic SVM, bagged tree, fine KNN, boosted tree, and complex tree. Among them, the bagged tree provided the best result with 99% accuracy. However, the main limitation of this method is the small dataset, consisting of only 393 sample images. Perumal et al. [28] proposed an SVM-based approach for detecting a single disease on a guava leaf, achieving an accuracy of 98.17%. Their method focused on identifying a specific disease in individual guava leaves. Mostafa et al. [1] introduced a DCNN-based method for guava fruit disease detection. They utilised five different CNNs and achieved a classification accuracy of 97.74% using ResNet-101. Rashid et al. [3] proposed a model to detect guava leaf disease that utilises a hybrid DL framework based on the YOLOv5 model. The method incorporates several components, including a modified MobileNetv2 and U-Net with the leaf segmentation method. The researchers collected two datasets specifically for this study. The model achieved 92% accuracy for U-Net, 73% precision, 73% recall, and 71% ± 0.5 mAP for detection performance.
The aforementioned studies primarily focused on the development of large-sized models without proper consideration for model optimisation or reducing their size for practical application on end-user devices. However, there have been some research efforts in this direction. Yang et al. [29] utilised transfer learning techniques and proposed a model based on MobileNet and InceptionV3 architectures for plant leaf disease identification on mobile phones. They developed two crop disease classification models with a focus on mobile device implementation. To detect crop disease, Yu et al. [30] introduced the CDCNNv2 model, which is based on the ResNet50 architecture. This architecture can be embedded into Android applications. Similarly, Fan et al. [31] constructed a model by using an improved version of VGG16 architecture combined with transfer learning. Their model was developed for the identification of grape leaf diseases specifically designed for Android mobile phones. Overall, these studies aimed to optimise models and adapt them for mobile device applications, taking into account factors such as model size, computational efficiency, and compatibility with specific mobile operating systems. Notably, Howard et al. [10] proposed a streamlined DL architecture called MobileNets in 2017 based on depth-wise separable CNN, initially developed for mobile and embedded vision applications by building a lightweight deep neural network.
Therefore, there is a crucial need for additional research to gain a deeper understanding and knowledge of lightweight CNN-based architectures that can effectively detect leaf diseases. By reducing the model's complexity and parameters, improved accuracy can be achieved. Inspired by this concept, this research has developed the GLD-Det model, based on a modified MobileNet, which is a lightweight CNN. This proposed model offers enhanced speed, efficiency, and robustness, and can accurately identify various types of diseases present on guava leaves, even in complex scenarios.

Materials and Methods
This research proposed a transfer learning-based DL method to detect leaf disease from guava orchards in real-time scenarios. The methodology pertains to the overall approach and rationale of a research project. It involves familiarizing oneself with the techniques and theories employed in the field to develop a strategy that aligns with the research objectives.
The chosen approach and techniques should be carefully considered, ensuring that they are well-suited to the research goals and are capable of producing valid and dependable outcomes. By doing so, the methodology can provide clarity regarding the decision-making process behind the research design, demonstrating its appropriateness and potential for generating authentic and trustworthy findings. For this purpose, a 7-stage module has been constructed, which is depicted in Figure 1.

Data Collection
A dataset is comprised of a collection of unprocessed statistics and analytical materials. For this research, two guava leaf disease datasets have been chosen, constructed by experienced researchers from Pakistan and Bangladesh. To reduce the complexity, this research named these datasets Dataset D1 and Dataset D2, respectively, to track and describe these datasets throughout this research. Both datasets are benchmarked and accessed from Mendeley Data. Dataset D1 consists of 1842 images. Dataset D2 consists of 2243 images. The reason for choosing multiple datasets is to ensure the robustness of this research.

Dataset Description
Dataset D1 has four distinct guava leaf disease classes (canker, dot, mummification, and rust) and a healthy class, as shown in Table 1. This dataset originates from the tropical areas of Pakistan and was created and supervised by experienced researchers in this field during early 2021. The pixel size of all the images is 6000 × 4000 with a resolution of 300 dpi. Dataset D2 has been divided into original image and augmented image types. Each type has four distinct disease classes (phytophthora, red rust, scab, and styler end rot) and a disease-free class, as shown in Table 2. The pixel size of all the images is 512 × 512. This dataset originates from Bangladesh and was obtained from a large guava garden in early 2022. The data collection was carried out by a proficient team from Bangladesh Agricultural University, located in Mymensingh, Dhaka. They used a Nikon D3200 digital single-lens reflex (SLR) camera with an F mount. The camera has a focal-length multiplier of approximately 1.5× and a 23.2 × 15.4 mm CMOS sensor, which allows an efficient field of view. During the image-capturing process, they set the frame rate to 4 fps with manual focus, the shutter speed to 1/250 s, and default values for other settings. The sample images of Dataset D1 and Dataset D2 are shown in Figure 2 with five distinct classes.

Data Preprocessing
This research resized the images of Datasets D1 and D2 from 6000 × 4000 and 512 × 512 pixels, respectively, to smaller dimensions. This procedure reduced the training time and made training more efficient. For EfficientNetB2 and EfficientNetV2B2, the input size was set to 260 × 260 × 3, and for EfficientNetB1 to 240 × 240 × 3. The input image size was set to 224 × 224 × 3 for the proposed architecture, EfficientNetB0, and MobileNetV2. To prepare data for a machine learning model, it is common to convert the raw images into pixel arrays. Prior to model training, the images in the dataset underwent preprocessing to streamline feature extraction. This process also enhanced the classifier accuracy. The image data were represented by RGB coefficients, with values ranging from 0 to 255. However, working with values this large complicates training. To overcome this issue, a scaling factor of 1/255 was employed to normalise the images in both datasets. As a result, all pixel values were transformed to a range between 0 and 1.
Data augmentation is an essential step in data preprocessing that involves generating additional training examples by applying different transformations to existing ones. The main purpose of data augmentation is to artificially increase the size of the training dataset, thereby enhancing its diversity. This is achieved without the necessity of collecting new data, which can be expensive and time-consuming. In this research, for both the base models and the proposed model, a moderate data augmentation scheme combining several transformations was applied, with parameters set as "width_shift_range" = 0.2, "height_shift_range" = 0.2, "rotation_range" = 0.2, "vertical_flip" = True, and "horizontal_flip" = False. The shuffle option was set to "True" during training and "False" during validation and testing.
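As a minimal sketch of the rescaling and augmentation settings described above, the snippet below collects the reported parameters in a dictionary whose keys match keyword arguments of Keras' `ImageDataGenerator`, and reproduces the 1/255 rescaling and the vertical flip in NumPy. The names `AUGMENTATION_CONFIG`, `rescale`, and `vertical_flip` are illustrative assumptions, and the random array merely stands in for a resized leaf photograph; this is not the authors' code.

```python
import numpy as np

# Augmentation settings as reported in the text. The keys match keyword
# arguments of tf.keras.preprocessing.image.ImageDataGenerator; gathering
# them in a dict is an illustrative choice, not the authors' code.
AUGMENTATION_CONFIG = {
    "rescale": 1.0 / 255,
    "width_shift_range": 0.2,
    "height_shift_range": 0.2,
    "rotation_range": 0.2,   # Keras interprets this value in degrees
    "vertical_flip": True,
    "horizontal_flip": False,
}


def rescale(image_uint8, factor=AUGMENTATION_CONFIG["rescale"]):
    """Map 8-bit RGB values in [0, 255] to floats in [0, 1]."""
    return image_uint8.astype(np.float32) * factor


def vertical_flip(image):
    """Flip a (height, width, channels) image top to bottom."""
    return image[::-1, :, :]


# Random stand-in for one 224 x 224 RGB image (hypothetical data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
scaled = rescale(image)
flipped = vertical_flip(scaled)
```

The shifts and rotation are left to the library in practice; only the two deterministic operations are spelled out here because they are easy to verify by hand.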
The datasets were split into two parts for training and testing purposes. This research randomly selected 75% of images for training and 25% of images for testing from Dataset D1, and 80% and 20%, respectively, from Dataset D2. Dataset D1 consists of 1842 images in total, of which 1377 images were chosen for training and 465 images for testing. Dataset D2 consists of 5426 images, including 4899 augmented images. As this research already applied its own data augmentation procedure, only the 527 original images were considered, of which 422 and 105 images were chosen for training and testing purposes, respectively. Tables 3 and 4 show the training and testing splits of Datasets D1 and D2, respectively.
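The split proportions can be checked directly from the reported image counts. The short sketch below does this arithmetic; the dictionary layout and the helper name `split_fractions` are illustrative assumptions.

```python
# Image counts as reported in the text.
splits = {
    "D1": {"train": 1377, "test": 465},
    "D2": {"train": 422, "test": 105},
}


def split_fractions(counts):
    """Return (train_fraction, test_fraction) for one dataset."""
    total = counts["train"] + counts["test"]
    return counts["train"] / total, counts["test"] / total


for name, counts in splits.items():
    train_frac, test_frac = split_fractions(counts)
    print(f"{name}: train {train_frac:.1%}, test {test_frac:.1%}")
```

The counts give roughly a 75/25 split for Dataset D1 and an 80/20 split for Dataset D2, with the larger share used for training in both cases.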

Environment Setup
This research used Python for the detection mechanisms. The TensorFlow [32] and Keras libraries, which are open-source and freely available tools for data flow and DL models, were utilised for training all the pre-trained models. This research used Anaconda as the Python environment and Jupyter Notebook as the editor. The input size, augmentation parameters, batch size, number of epochs, optimiser and learning rate, activation function, Explainable Artificial Intelligence (XAI) tool, and evaluation metrics were all configured in this environment. The following settings were used across all experiments, for both the base models and the proposed model: Optimiser = "Adam", Batch Size = "16", Learning Rate = "1 × 10⁻⁵", Loss = "categorical_crossentropy", Activation Function = "ReLU", Epochs = "80", Patience = "3", and Performance Metrics = "Accuracy, Precision, Recall, AUC". For early stopping, this research set monitor = "val_loss" and mode = "min". The "restore_best_weights" option was set to True in all implementations to counter overfitting. Grad-CAM was used as the XAI tool for the proposed model only. The input size was set according to each model's requirements. The environmental setup details of this research are summarised in Table 5.
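The early-stopping rule described above (monitoring the validation loss, a patience of 3, and restoring the best weights) can be illustrated without any DL framework. The following is a simplified sketch of that logic, assuming zero-based epoch indices; the function name `early_stopping` and the loss values are hypothetical and serve only as an illustration.

```python
def early_stopping(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Mirrors monitor="val_loss" with mode="min": training stops once the
    loss has failed to improve for `patience` consecutive epochs, and the
    weights from `best_epoch` are the ones that would be restored.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience was never exhausted: training ran through every epoch.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation-loss curve, for illustration only.
losses = [0.90, 0.70, 0.65, 0.66, 0.68, 0.67, 0.60]
stop_epoch, best_epoch = early_stopping(losses, patience=3)
```

With this curve, training halts at epoch index 5 and the weights from epoch index 2 are kept, so the later improvement at index 6 is never reached. A small patience such as 3 therefore trades possible late gains for shorter training.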

Proposed GLD-Det Architecture
This research introduced transfer learning to construct a model called GLD-Det for guava leaf disease detection from Dataset D1 and Dataset D2. The model is based on a modified MobileNet, obtained by adding extra layers after the MobileNet backbone. When data availability is limited, transfer learning methods can be valuable in reducing both training time and computational expenses. This research therefore modified a well-established and robust pretrained model called MobileNet [10], which was originally trained on the ImageNet dataset. MobileNet comprises 28 convolution layers, serving as the foundational feature extraction component of the model. MobileNet utilises depth-wise separable convolutions, a type of factorised convolution that decomposes a regular convolution into two parts: a 3 × 3 depth-wise (dw) convolution and a 1 × 1 point-wise convolution. This factorisation uses 8 to 9 times less computation than a standard convolution [10]. By employing width and resolution multipliers, MobileNet becomes smaller and faster at the cost of a moderate loss of accuracy. This trade-off allows for a reduction in size and latency while still maintaining reasonable performance. Approximately 95% of MobileNet's computation time is allocated to 1 × 1 convolutions, which, in turn, account for about 75% of the model's parameters. The computational time for MobileNet per layer is shown in Table 6. The block of two combined convolutions, the first of which is a depth-wise convolution, is repeated five times. Applying the width multiplier to any model structure enables the creation of a smaller model with a balanced trade-off between accuracy, latency, and size. The computational cost for MobileNet associated with a depth-wise separable convolution, considering a width multiplier of α, is defined below.
$$D_K \cdot D_K \cdot \alpha M \cdot D_F \cdot D_F + \alpha M \cdot \alpha N \cdot D_F \cdot D_F$$
where $\alpha \in (0, 1]$, with typical settings of 1, 0.75, 0.5, and 0.25, and $\alpha = 1$ is the baseline MobileNet; $D_K$ is the spatial dimension of the kernel, assumed to be square; $M$ is the number of input channels; $D_F$ is the spatial width and height of a square input feature map; and $N$ is the number of output channels.
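To make the saving concrete, the sketch below evaluates the cost of a depth-wise separable convolution against that of a standard convolution, whose cost is $D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$ [10]. The function names and the layer shape are illustrative assumptions and are not taken from the paper's tables.

```python
def standard_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of a standard convolution: D_K^2 * M * N * D_F^2."""
    return d_k * d_k * m * n * d_f * d_f


def separable_conv_cost(d_k, m, n, d_f, alpha=1.0):
    """Multiply-adds of a depth-wise separable convolution with width multiplier alpha."""
    m_a, n_a = alpha * m, alpha * n
    depthwise = d_k * d_k * m_a * d_f * d_f   # one D_K x D_K filter per input channel
    pointwise = m_a * n_a * d_f * d_f         # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


# Illustrative layer shape (an assumption, not a value from the paper).
d_k, m, n, d_f = 3, 512, 512, 14
ratio = separable_conv_cost(d_k, m, n, d_f) / standard_conv_cost(d_k, m, n, d_f)
half_width = separable_conv_cost(d_k, m, n, d_f, alpha=0.5)
```

For α = 1 the ratio simplifies to $1/N + 1/D_K^2$, which for 3 × 3 kernels is slightly above 1/9; this is where the 8 to 9 times reduction comes from. Because the point-wise term scales with α², halving the width cuts the cost by well over half.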
In the proposed GLD-Det architecture, the input data enter MobileNet and then pass through the additional layers that have been added after it. After every convolution and separable convolution layer, batch normalisation is applied. The workflow of the proposed GLD-Det architecture is shown in Figure 3. Furthermore, Table 7 displays the parameters of the additional layers of the proposed GLD-Det architecture.
At first, this research resized the Dataset D1 and Dataset D2 images to 224 × 224 pixels and split all images for testing and training purposes. Dataset D1 consists of 1842 images in total, of which 1377 images were chosen for training and 465 for testing. Dataset D2 consists of 527 images in total, of which 422 were chosen for training and 105 for testing. Data augmentation was then used to reduce overfitting and to ensure better prediction accuracy. To address the five-class problem, the default classification layer was removed. Subsequently, the flatten layer was replaced by max pooling. Max pooling reduces the spatial dimensions of the input feature maps, making subsequent layers computationally less expensive. This reduction helps control the growth of model complexity and memory usage. Additionally, global average pooling was incorporated after the first dense and batch normalisation layers, as it aligns better with the convolutional structure by establishing connections between feature maps and categories. Moreover, these pooling techniques reduce the architecture's parameter count, which helps mitigate overfitting. This research also explored various regularisation techniques to enhance the model's performance. One such technique involved applying dropout regularisation with a rate of 0.3 [33]. During training, dropout randomly deactivates a fraction of the neurons, which led to enhanced accuracy and decreased loss as the training progressed. Figure 4 provides a visual representation of how dropout operates. In the GLD-Det architecture, this research incorporated four dense layers with a ReLU activation function after each batch normalisation layer. With the incorporation of these dense layers, the model efficiently categorises the abundant features extracted from the convolutional layers.
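The three operations discussed above (max pooling, global average pooling, and dropout) can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the GLD-Det implementation: the 2 × 2 pooling window, the toy feature-map shape, and all function names are hypothetical, since the actual layer parameters are given in Table 7.

```python
import numpy as np


def max_pool_2x2(feature_maps):
    """2x2 max pooling with stride 2 on an (H, W, C) array; H and W must be even."""
    h, w, c = feature_maps.shape
    blocks = feature_maps.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))


def global_average_pool(feature_maps):
    """Average each channel over all spatial positions: (H, W, C) -> (C,)."""
    return feature_maps.mean(axis=(0, 1))


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of units and rescale the rest."""
    if not training:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
features = rng.random((8, 8, 16))              # toy stand-in for backbone feature maps
pooled = max_pool_2x2(features)                # spatial size halves: (4, 4, 16)
vector = global_average_pool(pooled)           # one value per channel: (16,)
dropped = dropout(vector, rate=0.3, rng=rng)   # rate 0.3 as reported in the text
```

Both pooling steps shrink the tensor that later dense layers must process, which is how they lower the parameter count. Dropout is active only during training and is a no-op at inference time.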
In the GLD-Det model, the inclusion of dense layers enhances the network's capability to organise and utilise the extracted elements more efficiently. The ReLU function is a popular choice in DL models as it aids in overcoming optimisation challenges and promotes non-linearity, which makes it capable of learning complex patterns and features in the data, and enhances the network's ability to learn and generalise from data effectively. By eliminating negative values, ReLU ensures that the network remains sparse and efficient. It reduces the risk of encountering the vanishing gradient problem, which has the potential to impede the training process. This activation function is described below.
$$f(x) = \max(0, x)$$
where $x$ is the input to the function and $\max(0, x)$ is the maximum of 0 and $x$: if $x \geq 0$, the function returns $x$ itself; if $x < 0$, the function outputs 0. Moreover, in this research, batch normalisation [34] layers were introduced to normalise the activation of hidden layers. This method expedites the training process while also resolving internal covariate shift problems by ensuring that the input for each layer is distributed around a consistent mean and standard deviation. Batch normalisation stabilises and regularises the network during training, promoting faster convergence and improved performance. The mathematical formulation is shown below.
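$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} X_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} \left(X_i - \mu_B\right)^2$$
where $X_i$ = input over a minibatch; $m$ = minibatch size; $\mu_B$ = mean; $\sigma_B^2$ = variance. The samples are then normalised to have zero mean and unit variance. To ensure numerical stability and prevent division by zero, a small constant $\epsilon$ is introduced in the denominator. This adjustment helps in maintaining a stable and effective normalisation process, where $\hat{X}_i$ is the normalised activation vector, as shown below.
$$\hat{X}_i = \frac{X_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$$
At the end, this research used the following formula:
$$y_i = \gamma \hat{X}_i + \beta$$
where $y_i$ = output; $\gamma$ = scale parameter learned during the training process; $\beta$ = shift parameter learned during the training process.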
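The ReLU and batch normalisation formulas above translate almost line for line into NumPy. The sketch below shows training-mode batch normalisation over a minibatch followed by ReLU. The value of `eps` matches the Keras default of 1e-3 but is an assumption here, as are the function names and the random input.

```python
import numpy as np


def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0.0, x)


def batch_norm(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalisation over the batch axis of an (m, features) array."""
    mu = x.mean(axis=0)                     # per-feature minibatch mean
    var = x.var(axis=0)                     # per-feature minibatch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, approximately unit variance
    return gamma * x_hat + beta             # learned scale and shift


rng = np.random.default_rng(0)
batch = rng.normal(loc=2.0, scale=3.0, size=(16, 4))   # m = 16, the reported batch size
normalised = batch_norm(batch, gamma=np.ones(4), beta=np.zeros(4))
activated = relu(normalised)
```

With γ = 1 and β = 0 the output has a mean very close to 0 and a standard deviation very close to 1 per feature, and ReLU then zeroes every negative entry. At inference time, frameworks replace the minibatch statistics with running averages, which this sketch omits.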
To create a classification layer with five classes (canker, dot, mummification, rust, healthy for Dataset D1 and phytophthora, red rust, scab, styler end rot, disease-free for Dataset D2), this research utilised a dense layer consisting of five neurons. A SoftMax activation function [35] is also applied in this dense layer. The SoftMax function is frequently used for multiclass classification tasks. This function assigns probabilities to each class, and it makes sure that the probabilities are within the range of 0 to 1. The formula of this function is described below.
$$\mathrm{SoftMax}(z)_i = \frac{\exp(z_i)}{\sum_{j=1}^{K} \exp(z_j)}$$
where $z$ = the vector of inputs to the output layer, with one entry $z_i$ per output neuron; $K$ = the number of neurons of the output layer; exp = the exponential function, a non-linear transformation. In this research, the proposed model utilised Adam as an optimiser. The learning rate is set to 1 × 10⁻⁵. Given that the research involved a multi-class detection problem, the loss function is set as categorical cross-entropy. It is specifically designed for multi-class detection problems with SoftMax output units. The formula of the categorical cross-entropy is shown below.
$$\mathrm{Loss} = -\sum_{i}\sum_{j} t_{ij} \log\left(p_{ij}\right)$$
where $p$ = prediction; $t$ = targets; $i$ = data points; $j$ = class. This research used a confusion matrix and other metrics to evaluate the performance of all models, including proposed and base models. The results showed that the GLD-Det architecture outperformed all the base models for both Dataset D1 and Dataset D2. Moreover, the proposed architecture achieved a lower loss compared to the base models. Additionally, Grad-CAM was utilised to provide explanations and insights into how the proposed model operates.
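As an illustration of how the SoftMax output layer and the categorical cross-entropy loss fit together, the NumPy sketch below computes class probabilities for one sample with five classes and the resulting loss. The logits, the clipping constant `eps`, and the function names are hypothetical. The loss is returned as a plain sum over data points and classes; DL frameworks typically average it over the batch instead.

```python
import numpy as np


def softmax(z):
    """Row-wise SoftMax; subtracting the row maximum avoids overflow in exp."""
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def categorical_cross_entropy(targets, predictions, eps=1e-12):
    """Sum of -t_ij * log(p_ij) over data points i and classes j."""
    clipped = np.clip(predictions, eps, 1.0)   # guard against log(0)
    return -np.sum(targets * np.log(clipped))


# One sample, five classes (hypothetical logits and one-hot target).
logits = np.array([[2.0, 1.0, 0.1, -1.0, 0.5]])
target = np.array([[1.0, 0.0, 0.0, 0.0, 0.0]])
probabilities = softmax(logits)
loss = categorical_cross_entropy(target, probabilities)
```

Because the target is one-hot, only the probability assigned to the true class contributes to the loss, so the loss approaches 0 as that probability approaches 1.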

Model Explainability Using Grad-CAM
In this research, gradient-weighted class activation mapping (Grad-CAM) [36] was incorporated to provide visual explanations for the proposed GLD-Det model, ensuring trust and confidence in its predictions. Grad-CAM is a technique used to visualise the attention of a CNN by highlighting the important regions of an original image. The algorithm computes the gradients of the score for a target class with respect to the feature maps of a chosen convolutional layer. These gradients are global-average-pooled to obtain a weight for each feature map, and the weighted combination of the feature maps is passed through a ReLU, resulting in a heatmap that highlights the most salient regions of the image. Grad-CAM offers an interpretable way to comprehend the internal workings of deep neural networks, aiding in debugging, model enhancement, and the communication of research findings. The mathematical details of the Grad-CAM are provided below.
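$$\alpha_k^c = \frac{1}{Z}\sum_{i}\sum_{j} \frac{\partial y^c}{\partial A_{ij}^k}$$
$$L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\left(\sum_{k} \alpha_k^c A^k\right)$$
where $y^c$ = the score of class $c$ in a network before the SoftMax activation; $A^k$ = feature map activations; $\alpha_k^c$ = neuron weights; $Z$ = number of pixels in the feature map. For this research, the Grad-CAM images are described and analysed in the Results section.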
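Given the feature maps of a convolutional layer and the gradients of a class score with respect to them, the Grad-CAM heatmap follows from the two formulas above in a few lines of NumPy. In the sketch below both arrays are random placeholders, since obtaining real gradients requires a trained model and an automatic differentiation framework. The function name `grad_cam` and the final scaling to [0, 1] for display are assumptions.

```python
import numpy as np


def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from A^k and dy^c/dA^k, both shaped (H, W, K)."""
    # alpha_k^c: global average of the gradients over the H x W positions.
    alpha = gradients.mean(axis=(0, 1))                  # shape (K,)
    # Weighted sum of the feature maps over k, followed by ReLU.
    heatmap = np.maximum(0.0, (feature_maps * alpha).sum(axis=-1))
    # Scale to [0, 1] for display (an illustrative extra step).
    peak = heatmap.max()
    return heatmap / peak if peak > 0 else heatmap


rng = np.random.default_rng(0)
feature_maps = rng.random((7, 7, 32))       # placeholder activations A^k
gradients = rng.normal(size=(7, 7, 32))     # placeholder gradients of y^c
heatmap = grad_cam(feature_maps, gradients)
```

In practice the coarse heatmap is upsampled to the input resolution and overlaid on the leaf image, so that the highlighted regions can be compared against the visible lesions.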

Evaluation Metrics
For evaluating the results of all models, this research used various evaluation metrics. Furthermore, a confusion matrix was employed as a visualisation tool. The confusion matrix compares the actual labels (ground truth) with the predicted labels from the model. Key metrics are derived from the information provided by the confusion matrix shown in Table 8, enabling a comprehensive evaluation of the model's performance.
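As a hedged illustration of how such metrics can be derived from a confusion matrix, the sketch below computes accuracy together with macro-averaged precision and recall. It assumes that rows hold the true classes and columns the predicted classes, and that every class is predicted at least once. The matrix values are hypothetical, and the averaging scheme may differ from the one behind the paper's reported figures.

```python
import numpy as np


def metrics_from_confusion(cm):
    """Accuracy, macro precision, and macro recall from a square confusion matrix.

    Assumes rows are true classes and columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                 # correctly classified samples per class
    fp = cm.sum(axis=0) - tp         # predicted as the class but actually another
    fn = cm.sum(axis=1) - tp         # belonging to the class but predicted as another
    accuracy = tp.sum() / cm.sum()
    precision = (tp / (tp + fp)).mean()
    recall = (tp / (tp + fn)).mean()
    return accuracy, precision, recall


# Hypothetical three-class confusion matrix, for illustration only.
confusion = [[8, 1, 1],
             [0, 9, 1],
             [1, 0, 9]]
accuracy, precision, recall = metrics_from_confusion(confusion)
```

The diagonal holds the correct predictions, so a matrix whose mass is concentrated on the diagonal yields values close to 1 for all three metrics.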

Results and Discussion
This research introduces the GLD-Det model, employing the transfer learning method to detect guava leaf disease in real-time using Dataset D1 and Dataset D2. To assess its robustness, the proposed model is compared with five existing CNN models: EfficientNetV2B2, EfficientNetB0, EfficientNetB2, EfficientNetB1, and MobileNetV2. Accuracy, precision, recall, and AUC values are calculated to evaluate the models' effectiveness. For Dataset D1, the base model EfficientNetV2B2 achieved the best performance among the base models with an accuracy of 0.85, whereas the base model MobileNetV2 achieved an unsatisfactory performance with an accuracy of 0.56. For Dataset D2, however, MobileNetV2 achieved the best performance among the base models with an accuracy of 0.84, whereas EfficientNetV2B2 achieved an unsatisfactory performance with an accuracy of 0.74. The EfficientNet family is heavyweight in comparison with the MobileNet family. This research aims to propose a model that is lightweight, provides better accuracy and precision, and is also robust, in order to detect guava leaf disease in real-time. Here, robustness means that a model performs consistently well in different settings and on different types of datasets. Taking these considerations into account, MobileNet was chosen for further modification by adding layers. As guava leaf disease is very infectious and lethal for guava plants, it is important to detect the disease accurately, precisely, and robustly. The GLD-Det architecture was constructed after several modifications of the base MobileNet model, and it outperformed all existing models. For Dataset D1, the proposed model achieved an accuracy of 0.98, a precision of 0.98, a recall of 0.97, and an AUC of 0.99; for Dataset D2, it achieved values of 0.97, 0.97, 0.96, and 0.99, respectively.
It is also noteworthy that the loss value of the proposed model is the smallest for both Dataset D1 and Dataset D2, compared to the base models. The GLD-Det architecture extends MobileNet with two additional pooling layers (max and global average), three batch normalisation layers, three dropout layers, ReLU as the activation function for four dense layers, SoftMax as the classification layer with the last, lighter dense layer, and Adam as the optimiser; this configuration provided the best performance. Tables 9 and 10 show the results of the models for Dataset D1 and Dataset D2, respectively. The confusion matrices for Dataset D1 and Dataset D2 are shown in Figure 5. The diagrams display a deep blue colour on the diagonal, indicating the number of instances correctly predicted by the model with respect to their ground truth labels. For further clarification of the proposed model, the AUC outputs of Dataset D1 and Dataset D2 are shown in Figure 8.
The visualisation of the AUC graphs from Figure 8 reveals that the proposed model achieved an impressive AUC score, approaching 1 for both Dataset D1 and Dataset D2.
A higher AUC value indicates that the proposed GLD-Det model has a strong ability to distinguish between the different classes. As a result, the model's performance in the detection task was outstanding. The graphs in Figures 6 and 7 show that the validation curves for accuracy, precision, and recall consistently outperformed the corresponding training curves. While some minor underfitting was observed at the beginning of the recall curve in Figure 6 and of the precision and recall curves in Figure 7, the proposed model's performance improved over the epochs as the gap between the training and validation lines decreased. To tackle both underfitting and overfitting concerns, this research implemented several measures. To prevent overfitting, early stopping was applied by monitoring the validation loss with a patience of three, so that training stopped once the validation loss had failed to improve for three consecutive epochs. The loss graphs show that the training loss was higher than the validation loss for both datasets, which indicates that the proposed model performed well and did not overfit either dataset. Likewise, in the precision and recall graphs for both datasets, the training curves did not exceed the validation curves, which is again consistent with the absence of overfitting. Overall, these measures contributed to the robustness and performance of the proposed GLD-Det model during training and testing.
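The early stopping rule with a patience of three can be sketched as follows. This is a framework-independent illustration, not the code used in this research; the helper name `early_stopping_epoch` and the validation loss values are hypothetical. Deep learning frameworks usually provide an equivalent mechanism, for example the Keras `EarlyStopping` callback with its `monitor` and `patience` arguments.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch index at which training would stop, or None.

    Training stops once the validation loss has not improved on its best
    value for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses: the best value (0.60) occurs at epoch 2,
# and epochs 3, 4 and 5 bring no improvement, so training stops at epoch 5.
stop_epoch = early_stopping_epoch([0.90, 0.70, 0.60, 0.65, 0.62, 0.61, 0.50])
print(stop_epoch)
```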
The parameter counts and floating point operations (FLOPs) of EfficientNetV2B2 [37], EfficientNetB0, EfficientNetB2, EfficientNetB1 [38], MobileNetV2 [9], and MobileNet [10] are shown in Table 11, based on the ImageNet dataset. MobileNetV2 has the lowest parameter count, 3.4 M, with 0.30 B FLOPs, whereas MobileNet has the second lowest parameter count, 4.2 M, but twice the FLOPs of MobileNetV2, at 0.60 B. EfficientNetV2B2 has the highest parameter count and FLOPs, with values of 10.1 M and 1.7 B, respectively. The parameter count of EfficientNetV2B2 is more than double that of either MobileNetV2 or MobileNet. This research aims to construct a model using transfer learning to detect guava leaf disease that is both lightweight and robust, has a faster computational speed, and can be deployed on mobile devices in the future. The MobileNets architectures were originally designed for mobile and embedded vision applications [10]. Thus, it is clear from Table 11 that MobileNetV2 and MobileNet meet these criteria. However, MobileNetV2 performed inconsistently across Dataset D1 and Dataset D2, which come from two different geographical regions, Pakistan and Bangladesh, respectively. Hence, in terms of robustness, MobileNetV2 performed inadequately. After several considerations, MobileNet was chosen for further modification in the proposed GLD-Det architecture, as it has a low parameter count and a favourable computational cost of 0.60 B FLOPs. The proposed GLD-Det architecture outperformed all existing models for both Dataset D1 and Dataset D2.
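The relative model sizes stated above can be checked with a short calculation. The sketch below uses only the three parameter counts and FLOPs values quoted in the text; the dictionary name `table_11_subset` and the helper `ratio` are illustrative.

```python
# Parameter counts (millions) and FLOPs (billions) as quoted in the text
table_11_subset = {
    "MobileNetV2": {"params_m": 3.4, "flops_b": 0.30},
    "MobileNet": {"params_m": 4.2, "flops_b": 0.60},
    "EfficientNetV2B2": {"params_m": 10.1, "flops_b": 1.7},
}


def ratio(model_a, model_b, key):
    """Return how many times larger model_a is than model_b for the given key."""
    return table_11_subset[model_a][key] / table_11_subset[model_b][key]


# EfficientNetV2B2 versus the two MobileNet variants (parameter count)
print(round(ratio("EfficientNetV2B2", "MobileNetV2", "params_m"), 2))
print(round(ratio("EfficientNetV2B2", "MobileNet", "params_m"), 2))
# MobileNet versus MobileNetV2 (FLOPs)
print(round(ratio("MobileNet", "MobileNetV2", "flops_b"), 2))
```

Both parameter ratios exceed two, and MobileNet requires about twice the FLOPs of MobileNetV2, in line with the comparison drawn from Table 11.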
In this research, the proposed model was explained using Grad-CAM to analyse various convolutional layers. Grad-CAM is a technique that provides insights into how the model makes classifications based on specific areas of an input image. By generating a heatmap, Grad-CAM visualises the crucial regions in the image that influence the model's decision-making process. This visualisation aids in making informed decisions and understanding the model's focus on important areas within the image. To reduce complexity, only the Grad-CAM images of Dataset D1 are shown, in Figure 9. Four convolution layers, conv1, conv_pw_5, conv_pw_10, and conv_pw_13_relu, were selected to generate Grad-CAM images for Dataset D1, which has five classes. In Figure 9, the first column shows the input images. The second column shows the output image of conv1, the third that of conv_pw_5, the fourth that of conv_pw_10, and the fifth that of conv_pw_13_relu. In the initial convolution layer, the visualisation shows that the model focuses on detecting contours and borders in the images. Through the subsequent convolution layers, it becomes evident that the layers attempt to identify different concepts and features present in the images. To understand how well the GLD-Det model detects the relevant parts of the image, this research focused on the last convolution layer (conv_pw_13_relu). In this layer, Grad-CAM highlighted the infected parts of the leaves clearly, which indicates that the GLD-Det architecture detects leaf disease by attending to the most highlighted areas in the image.
Overall, the proposed GLD-Det model demonstrated substantial enhancements, achieving the highest accuracy and precision for both Dataset D1 and Dataset D2. The use of Grad-CAM further supported trust and transparency in the model's predictions. However, more extensive research is required to explore lightweight guava leaf disease detection further and to compare the robustness of different models, providing valuable insights for future advancements in this domain. It is important to highlight that the types of diseases affecting guava leaves vary across regions, which adds complexity to their detection using CV-based image processing. The limited availability of guava leaf disease datasets further compounds the challenges faced by researchers in training their DL models. Therefore, the creation of additional guava leaf disease datasets is imperative within this domain. Nevertheless, the GLD-Det architecture, constructed from a modified MobileNet using the transfer learning technique, provided an outstanding performance in detecting guava leaf disease on two benchmark datasets from distinct geographical locations (Pakistan and Bangladesh). The MobileNets architectures [10] were originally created for use in mobile and embedded vision applications. Therefore, a potential direction for future research is to integrate the proposed GLD-Det model into mobile devices. This would enable farmers to detect guava leaf disease using their smartphones without relying on cloud services, thereby providing them with direct benefits.
Figure 9. The Grad-CAM images for Dataset D1, which has five classes. The first (top) row is for canker, the second row is for dot, the third row is for healthy, the fourth row is for mummification, and the fifth row is for rust. The first column is for the input image, and the other four columns are for the output heatmap images generated by Grad-CAM.

Conclusions
Guava leaf disease poses a significant threat to the health of guava plants, as it is highly infectious and can lead to plant death. Furthermore, it has a negative impact on both the quality and quantity of guava fruit. Utilising deep learning for early leaf disease detection can aid in mitigating these issues and assist farmers in achieving a successful harvest. However, detecting guava leaf disease is challenging due to factors such as illumination variation, leaf obstruction, changing brightness, and overlapping leaves. Existing leaf disease detection models often rely on heavyweight neural networks and leaf segmentation, which can be resource-intensive. In response to these challenges, this research explored a transfer learning-based approach to detect guava leaf disease in real-time. The aim was to create a lightweight yet robust model that delivers an improved performance. Two benchmark datasets of guava leaf disease from Pakistan and Bangladesh were used for experimentation. Various pre-trained models, including EfficientNetV2B2, EfficientNetB0, EfficientNetB2, EfficientNetB1, and MobileNetV2, were tested to achieve optimal results. However, all of these models yielded unsatisfactory outcomes and did not exhibit robustness when tested on the two datasets. Following numerous experiments, this research introduced the GLD-Det model by incorporating various enhancements into the base MobileNet model. These enhancements comprised two pooling layers (max and global average), three batch normalisation layers, three dropout layers, ReLU as the activation function for four dense layers, and SoftMax as the classification layer with the last, lighter dense layer. These supplementary blocks proved to be robust, enabling the extraction of more informative features, faster model training, and improved overall performance.
The proposed GLD-Det model outperformed all the base models it was compared with in terms of the evaluation metrics for both datasets. Subsequently, the proposed model was further explained using Grad-CAM to enhance its transparency and provide deeper insights into its decision-making process.
Future work will concentrate on integrating the proposed model into mobile devices, enabling marginal farmers to detect guava leaf disease in real-time using their smartphones, without relying on any cloud service. Additionally, the research will explore and compare another explainability method to further enhance the transparency of the detection system.

Data Availability Statement:
There is no statement regarding the data.