An Improved Detection Method for Crop & Fruit Leaf Disease under Real-Field Conditions

Abstract: Deep learning-based tools have been used in agriculture for the automatic detection of plant leaf diseases for many years. However, optimizing their use against the specific background of an agricultural field, in the presence of other leaves and the soil, is still an open challenge. This work presents a deep learning model based on YOLOv6s that incorporates (1) the Gaussian error linear unit (GELU) in the backbone, (2) efficient channel attention (ECA) in the basic RepBlock, and (3) the SCYLLA-Intersection over Union (SIoU) loss function to improve the detection accuracy of the base model in real-field background conditions. Experiments were carried out on a self-collected dataset containing 3305 real-field images of cotton, wheat, and mango (healthy and diseased) leaves. The results show that the proposed model outperformed many recent state-of-the-art models, including the base YOLOv6s, in terms of detection accuracy. This improvement was achieved without any significant increase in computational cost. Hence, the proposed model is an effective technique for detecting plant leaf diseases in real-field conditions without added computational burden.


Introduction
Crops, fruits, and vegetables are the economic backbone of a country. Pakistan's GDP relies on a few major cash crops for domestic use, and it also draws a hefty share from exports of seasonal fruits [1]. Outdated agricultural practices and inefficient use of technology have drastically affected local utilization and foreign trade. Pathogenic infestation hinders growth and consequently reduces the yield of agricultural products [2].
Wheat and cotton are Pakistan's most important cash crops and are extensively grown in the Punjab province. However, the yield of the wheat crop has been adversely affected by various factors in the last few years [3]. The environmental conditions of Sindh and Punjab favor the growth of fungal pathogens (Puccinia and Urocystis tritici) that are considered among the more destructive agents for wheat; they cause rust and smut, which affect all parts of the plant and stunt its growth [4,5]. The country is also the fourth-largest producer of mangoes in the world [6]. Anthracnose is considered the most devastating disease for the growth and yield of mango. It starts from the leaves and twigs and spreads to all parts of the mango plant [7].
Timely remedial actions can prevent the spread of disease. Over the past few years, the low production of these crops and fruits has led agriculturists to adopt modern agricultural methods [8]. One such method is the timely identification of disease on the plant and the recommendation of an effective treatment for it. In conventional farming, farmers depend on visual symptoms to identify the specific disease type and its stage. However, visual symptoms can be similar for multiple diseases and can be influenced by weather conditions. Accurately distinguishing between biotic pathogen attacks and abiotic nutritional deficiencies requires considerable technical expertise or assistance from pathologists [9].
For that reason, machine learning and deep learning techniques are extensively used for stress and disease identification in plants, owing to their ability to learn deep, intricate features from image datasets with better speed and accuracy [10]. With the growth of sustainable agriculture, the need to employ computer vision-based techniques has increased. Object detection-based methods have been improved in the past few years to advance weed identification, pest control, and plant disease detection [11]. One of the challenges faced by researchers is to locate and classify the type of stress in real-world scenarios [12]. The more robust and accurate a model is, and the larger the dataset it is trained on, the more useful it is for agricultural applications. Such trained models can be deployed on hardware platforms or embedded systems to develop automatic plant disease detection platforms [13]. Convolutional neural network-based techniques [10,14] are extensively used to extract relevant information from the diseased area. Unlike traditional machine learning-based methods [15], they do not classify the diseased area based on the lesion's colour, background, size, and shape. Such traditional methods cannot be used for large datasets or real-field agricultural applications.
Recent studies have extensively used computer vision-based object detection techniques as automated tools to locate and classify the plant stress type [9,16]. Researchers worldwide have strived to find an accurate and efficient model to localize and classify the diseased part in an image. Saleem et al. [17] trained and fine-tuned different meta-architectures, namely SSD, RFCN, and Faster RCNN, to detect 26 diseased and 12 healthy parts of the plants. A training accuracy of 73.07% was obtained for the SSD model with the Adam optimizer. The authors in [18] proposed a novel SSD-based model that fuses an attention mechanism with a VGG feature extractor; improved accuracy was observed on the PlantVillage dataset. In another work [19], the authors localized and classified diseased areas of plants using a novel deep-learning framework. The improved RefineDet model achieved a remarkable accuracy of 99.994% on the PlantVillage dataset. Attaining better accuracy at low cost has been the prime objective of researchers in the past few years. Chowdhury et al. [20] analyzed the performance of lightweight models on plain and segmented images of tomato leaves. The EfficientNet-B4 model achieved 99.89% accuracy in classifying 10 healthy and diseased tomato classes. Wang et al. [21] proposed a lightweight YOLOv5 model to attain improved detection results on public and self-collected datasets. The model weight reduction was achieved using GhostNet and the weighted box fusion method.
Most of these works showcased near-ideal accuracies on datasets whose images have plain or clear backgrounds or contain a single leaf. The detection ability of such trained models is limited when they are tested on images captured in varying and difficult environmental conditions. A novel bidirectional transposition feature pyramid network [22] was proposed to detect apple leaf diseases in complex real-field conditions with remarkable accuracy; a cross-attention module was used to detect relevant feature information. In another notable work, Zhao et al. [23] integrated a coordinate attention module into the backbone of the You Only Look Once (YOLOv5s) model to effectively detect small buds and occluded flowers in real-field conditions.
Although research on deep learning-based plant disease detection is quite mature, locating and identifying the type of stress in real-field conditions remains challenging for computer vision experts. The main aim of the proposed work is to extract diseased areas on the respective plant under unconstrained environmental conditions. Images captured in real-field conditions may vary in scale because of the inconsistent distance between the target leaf or plant and the camera. A diseased plant in the background of the target leaf can also cause scale variation. Moreover, varying lighting conditions, complex and similar backgrounds, and variability of disease symptoms pose further challenges for the detection model. A few images from the self-collected dataset, shown in Figure 1, illustrate these typical scenarios. In this paper, we propose an improved deep-learning framework to localize and classify various plant diseases in real-field conditions. The major contributions of the proposed work are as follows:
• A fine-tuned efficient model (based on YOLOv6s) was trained and optimized using the Gaussian error linear unit (GELU) in the backbone of the model, which improved the model's generalization in detecting small and complex objects.
• Efficient channel attention was introduced in the basic RepBlock in the neck region of the base model (YOLOv6s) to improve the accuracy and recall of the detection model without any significant additional computational cost.
• To improve the regression accuracy, the Generalized-IoU (GIoU) loss in the base YOLOv6s model is replaced with the SCYLLA-IoU (SIoU) loss function in the proposed model.
• The authors present a self-collected dataset comprising 3305 images captured in real-field conditions.

Object Detection
When it comes to classifying and localizing a particular class in a complex scene, object detection algorithms play a vital role [18]. Deep learning-based object detectors have achieved breakthrough performance in computer vision [24]. These models fall into single-stage and two-stage detectors. Two-stage detectors first generate region proposals and then classify and refine them, so they generally offer superior accuracy [25]. Single-stage detectors, in contrast, directly classify and form a bounding box around the object in one pass, thus making the detection process faster, although accuracy may be compromised. Most recent studies have preferred single-stage detectors over two-stage detectors while working on methods to improve their classification and detection accuracy [18,26,27]. Also, the use of anchor-free detectors [28] has made the inference process simpler and more generalized compared to anchor-based methods. This study focuses on a fast and accurate detection model that can be employed for real-time detection; hence, the YOLOv6 model was employed for plant disease detection on our specific dataset due to its better accuracy compared to other detectors with similar inference speed [29].

YOLOv6 Model
YOLOv6 [28] is a deep learning-based one-stage object detector. The model was chosen for its better baseline performance on our specific dataset compared with more recent state-of-the-art object detection models. It outperforms the others in inference speed, model convergence, and accuracy. The backbone, neck, and head are the basic parts of the YOLOv6 model. The reparameterized VGG-style backbone and anchor-free detection suit several hardware-based real-time applications. In the anchor point-based paradigm, the regression branch predicts the distances from the anchor point to the four sides of the bounding box. The model employs Varifocal Loss (VFL) [30] and Distribution Focal Loss (DFL) for detection.
The YOLOv6 model comes in several versions, such as YOLOv6-L, YOLOv6-M, YOLOv6-S, and YOLOv6-Nano. The authors selected YOLOv6-S due to its reasonable accuracy and low computational cost. Features are extracted in the backbone and passed to the neck structure, which aggregates low-level and high-level semantic information. The reparameterized backbone and neck incorporate VGG-style networks and skip connections. The resulting RepBlock [31] combines the effective classification performance of VGG with the better accuracy of ResNets. The backbone and neck structures are GPU-friendly, and the model can also be used for hardware applications.
YOLOv6-S uses EfficientRep and RepBiFPAN as its backbone and neck structures. Multiscale features from the reparameterized blocks are aggregated using the PAN structure in the neck. The BiFusion block makes the low-level feature concatenation more effective. The aggregated features are passed to the Efficient Decoupled Head, which performs the classification and regression tasks separately. This decoupled head strategy reduces complexity and enhances accuracy, as in the YOLOX model [9]. VFL is used as the classification loss along with an IoU (Intersection over Union) loss for regression. Additionally, DFL is used to improve bounding box localization in the large and medium YOLOv6 models [28]. Unlike YOLOX, the model uses Task Alignment Learning (TAL) as the label assignment technique instead of SimOTA; the latter is considered to be a slower technique used with anchor-free detectors.

Efficient Channel Attention
The attention mechanism enhances the focus on important features in image processing applications [27]. Recently, squeeze-and-excitation networks (SENet) have shown great success in improving the accuracy of various models. Channel attention mechanisms like SENet have been employed to improve performance in image classification and segmentation [32]. The spatial dimensions of the input are squeezed into channel-wise information to attain better accuracy. Attention can also be obtained by aggregating important features or by combining channel and spatial attention [33].
However, all such methods come with considerable computational cost. Efficient Channel Attention (ECA) performs better because it captures local cross-channel interaction at low cost. An adaptive kernel strategy is adopted to capture the interaction between each channel and its neighbours [34]. After channel-wise global average pooling, the kernel size k is adaptively determined and a 1D convolution is performed. A sigmoid activation is applied afterward to generate the channel attention weights.
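The four ECA steps described above (global average pooling, adaptive kernel size, 1D convolution across channels, sigmoid gating) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names are hypothetical, nested lists stand in for tensors, the convolution weights are passed in by the caller rather than learned, and the kernel-size rule with gamma = 2 and b = 1 is assumed from the commonly used ECA defaults.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size: nearest odd value of |(log2(C) + b) / gamma|.

    gamma and b are assumed defaults, not values reported in this paper.
    """
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca_attention(feature_map, conv_weights):
    """Apply ECA to a feature map stored as feature_map[c][i][j].

    conv_weights is the shared 1D kernel (odd length k). In a real network
    these weights are learned; here they are supplied by the caller.
    """
    channels = len(feature_map)
    k = len(conv_weights)
    pad = k // 2

    # Step 1: global average pooling, one scalar per channel.
    pooled = [
        sum(sum(row) for row in plane) / (len(plane) * len(plane[0]))
        for plane in feature_map
    ]

    # Step 2: 1D convolution across neighbouring channels (zero padding).
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(conv_weights[j] * padded[c + j] for j in range(k))
        for c in range(channels)
    ]

    # Step 3: sigmoid gives one attention weight per channel.
    weights = [1.0 / (1.0 + math.exp(-m)) for m in mixed]

    # Step 4: rescale every element of each channel by its weight.
    scaled = [
        [[weights[c] * v for v in row] for row in feature_map[c]]
        for c in range(channels)
    ]
    return weights, scaled
```

Because the same k weights are shared by all channels, the attention branch adds only k parameters per block, which is why the mechanism is described as lightweight.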
In our proposed model, we have added the ECA attention mechanism to the RepBlock to form a RepEA block, discussed in Section 2.4.1. The computational process of efficient channel attention [34] is described for an input feature map of dimensions H × W × C, where H is the height, W is the width, and C is the number of channels. The average value of each channel is calculated as shown in Equation (1).
Local cross-channel interaction is implemented with a 1D convolution of kernel size k, in which all channels share the same parameters, as formulated in Equation (2).
The channel weight factor G_ECA(A_c, θ) is obtained by applying the sigmoid σ to the mapped values. The weighted output feature map is obtained by element-wise multiplication of the original input feature map A_cij with the obtained channel weight, as shown in Equation (3),
where ⊙ denotes element-wise multiplication. The overall structure of the ECA net is shown in Figure 2.
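For reference, a form of Equations (1) to (3) that is consistent with the symbols used above and with the usual ECA formulation [34] can be sketched as follows. The pooled descriptor \bar{A}_c, the 1D convolution operator C1D_k, and the output symbol \tilde{A}_{cij} are notational assumptions and may differ from the authors' exact notation; θ denotes the k shared convolution weights.

```latex
\begin{align}
\bar{A}_c &= \frac{1}{H W} \sum_{i=1}^{H} \sum_{j=1}^{W} A_{cij} \tag{1} \\
G_{ECA}(A_c, \theta) &= \sigma\!\left( \mathrm{C1D}_k(\bar{A})_c \right) \tag{2} \\
\tilde{A}_{cij} &= G_{ECA}(A_c, \theta) \odot A_{cij} \tag{3}
\end{align}
```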

The Proposed Methodology
An improved YOLOv6s model is proposed in which the RepEA block is used in the modified neck structure, as discussed in Section 2.4.1. The incorporation of GELU activation in the RepBlock in place of ReLU, which improves the model's non-linear fitting, is discussed in Section 2.4.2. The proposed model is fine-tuned on our self-collected dataset, described in Section 3.

RepEA Block
The main building block of YOLOv6 is the reparameterized block, namely RepBlock [35]. In the small model, the EfficientRep backbone consists of these RepBlocks. A RepBlock is a stack of RepVGG blocks. Each reparameterized VGG-style block is a parallel addition of a 3 × 3 convolution, a 1 × 1 convolution, and a batch normalization (BN) operation. The results are aggregated and passed through a ReLU non-linearity [35]. In our modified RepEA block, however, GELU activation provides this non-linearity. Channel attention is obtained with minimal complexity by adding an ECA layer to the RepBlock; it generates channel attention using a 1D convolution, as shown in Figure 2. The RepEA block is the modified RepBlock used in the neck region of YOLOv6s.
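The idea behind a reparameterized block, that parallel branches used during training can be merged into one 3 × 3 convolution for inference, can be checked numerically with a small sketch. This is a simplified illustration, not the YOLOv6 code: it uses a single channel, assumes that batch normalization has already been folded into the branch weights, treats the third branch as a plain identity, and all function names are hypothetical.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (same output size)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * image[y][x]
            out[i][j] = acc
    return out


def multi_branch(image, k3, k1_weight):
    """Training-time form: 3x3 branch + 1x1 branch + identity branch, summed."""
    a = conv2d_same(image, k3)
    b = conv2d_same(image, [[k1_weight]])
    return [
        [a[i][j] + b[i][j] + image[i][j] for j in range(len(image[0]))]
        for i in range(len(image))
    ]


def fuse_branches(k3, k1_weight):
    """Inference-time form: fold the 1x1 and identity branches into the 3x3 centre."""
    fused = [row[:] for row in k3]
    fused[1][1] += k1_weight + 1.0  # 1x1 weight plus identity (weight 1)
    return fused
```

Applying `conv2d_same(image, fuse_branches(k3, k1_weight))` gives the same output as `multi_branch(image, k3, k1_weight)` up to floating-point error, which is why the multi-branch structure adds no inference cost once fused.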

Gaussian Error Linear Unit (GELU)
The nonlinear activation function enables the model to learn intricate input features and establish a meaningful transformation between input and output data [36]. The GELU non-linearity [37], expressed in Equation (5), is a smooth and differentiable alternative to the ReLU function shown in Equation (4). It is smoother than ReLU because it weights inputs by their percentile rather than gating them by their sign. GELU is therefore popular in vision transformers and NLP models.
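For reference, the standard definitions of the two activations are given below, where Φ is the cumulative distribution function of the standard normal distribution. The exact-erf form of GELU is assumed here; the authors' Equation (5) may instead use the common tanh approximation.

```latex
\begin{align}
\mathrm{ReLU}(x) &= \max(0, x) \tag{4} \\
\mathrm{GELU}(x) &= x \, \Phi(x) = \frac{x}{2} \left[ 1 + \operatorname{erf}\!\left( \frac{x}{\sqrt{2}} \right) \right] \tag{5}
\end{align}
```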
ReLU activation effectively imparts non-linearity, but it is non-differentiable at zero and has zero gradient for negative inputs, which can lead to the problem of vanishing gradients. The proposed model uses gradient-based optimization, and GELU provides a well-defined, non-zero gradient for backpropagation over a wider input range. As shown in Figure 3, GELU is curved and non-monotonic for values less than zero and approaches the identity for large positive values.
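The difference in gradient behaviour can be seen in a short numerical sketch. It assumes the exact (erf-based) GELU and estimates derivatives with a central difference; the helper names are illustrative only.

```python
import math


def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def numeric_grad(fn, x, eps=1e-6):
    """Central-difference estimate of the derivative of fn at x."""
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)


if __name__ == "__main__":
    for x in (-3.0, -1.0, -0.5, 0.5, 1.0, 3.0):
        print(
            f"x={x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):.4f}  "
            f"d_relu={numeric_grad(relu, x):.4f}  d_gelu={numeric_grad(gelu, x):.4f}"
        )
```

For negative inputs the ReLU gradient is exactly zero, whereas the GELU gradient is small but non-zero, so units with negative pre-activations can still receive updates.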

Hyper-Parameter Tuning
The learning process in deep learning models relies greatly on the values of the hyperparameters that govern the training procedure [38]. For the improved YOLOv6 model, we iteratively varied the hyperparameters to attain better accuracy and convergence. The model was trained for 100 epochs while adjusting various augmentations such as mixup, flipping, scaling, and mosaic. The base learning rate was lowered and the momentum was adjusted to improve accuracy consistently during training. Distribution Focal Loss (DFL) is not adopted, as it has no significant effect on the performance of the small, lightweight variant. The list of hyperparameters is shown in Table 1.
The two main steps in an object detection model are to accurately predict the bounding box around the ground truth and to correctly classify the object in the bounding box. The bounding box regression loss defines the penalty between the ground truth box (B_gt) and the predicted box (B). The loss is evaluated based on various parameters, i.e., the aspect ratio of the boxes, the distance between their centres, and their overlapping area. The intersection over union (IoU) metric is used to evaluate the overlap of the predicted and ground truth boxes. An object detector's mean average precision (mAP) is in turn based on IoU. The loss is progressively reduced during training, leading to better detection and classification. Commonly used losses include the CIoU, DIoU, and GIoU loss functions. In the proposed work, however, the SIoU loss function [39] is used, which refines these metrics by incorporating angle cost information to determine the direction of mismatch between B and B_gt.
The RepEA block replaces the RepBlock in the neck to improve the detection of small targets under complex background conditions. The neck structure contains a CSP-styled stack of RepBlocks. The RepEA block is added after the BiFusion block, where the low-level, high-level, and current features are fused. Adding an attention mechanism enhances the fused features aggregated by the path aggregation network (PAN) at different scales.
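The overlap measure underlying the bounding box regression losses discussed above can be made concrete with a short sketch. It computes IoU and the GIoU used by the base model for boxes given as (x1, y1, x2, y2) corners; the box format and function names are assumptions made for illustration. SIoU, which additionally uses the angle cost mentioned above, is not implemented here.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes have zero area."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection_and_union(pred, gt):
    """Return (intersection area, union area) of two corner-format boxes."""
    inter_box = (
        max(pred[0], gt[0]),
        max(pred[1], gt[1]),
        min(pred[2], gt[2]),
        min(pred[3], gt[3]),
    )
    inter = box_area(inter_box)
    union = box_area(pred) + box_area(gt) - inter
    return inter, union


def iou(pred, gt):
    """Intersection over union of the predicted and ground truth boxes."""
    inter, union = intersection_and_union(pred, gt)
    return inter / union if union > 0 else 0.0


def giou(pred, gt):
    """Generalized IoU: IoU minus the empty fraction of the enclosing box."""
    inter, union = intersection_and_union(pred, gt)
    enclosing = box_area((
        min(pred[0], gt[0]),
        min(pred[1], gt[1]),
        max(pred[2], gt[2]),
        max(pred[3], gt[3]),
    ))
    if union <= 0 or enclosing <= 0:
        return 0.0
    return inter / union - (enclosing - union) / enclosing
```

A regression loss is then typically taken as `1 - iou(pred, gt)` or `1 - giou(pred, gt)`. Unlike plain IoU, GIoU still varies when the boxes do not overlap, which gives the optimizer a usable signal in that case.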

The Self-Collected Crop & Fruit Disease Dataset
A challenging dataset is of prime importance for training a deep-learning model. Evaluating the detection model's performance on smartphone images taken in challenging environmental conditions could pave the way for a smartphone-assisted application intended for future use by farmers. For this purpose, a dataset comprising 1353 images, and 3305 images after augmentation, was collected. Various wheat, cotton, and mango diseases were captured in the southern Punjab region using the 5-megapixel camera of a Samsung SM-A217F smartphone. The dataset is composed of 10 classes of wheat, mango, and cotton leaves. Images were captured in various lighting situations, from various angles, and against different backgrounds. Additionally, the distance between the camera and the plants was deliberately changed to introduce variation. The images were collected from various fields and orchards from March 2023 to August 2023. The diseased plants were located and photographed after the symptoms had been identified in the presence of a plant pathologist. After careful filtering, 87 blurred or improperly captured images were discarded. Some classes of wheat and cotton were further enriched with public datasets available on Kaggle [40], the CGIAR dataset [41], and the CoSEV dataset (the authors' public dataset) [9], respectively. A selection of images was also sourced from the internet, using Google Images and Bing, to enhance the dataset's diversity. Collecting images from various sources makes the dataset more challenging and helps train a more generalized deep-learning model. The number of images captured by smartphone, sourced from the internet, and taken from public datasets in each class is given in Table 2.
Apart from the 3 healthy classes, the dataset comprises wheat yellow rust, wheat brown rust, wheat stem rust, wheat smut, mango anthracnose, mango nutrient deficiency, and cotton leaf curl. A snapshot of the 10 classes of the collected dataset is shown in Figure 5. The number of images (y-axis) contained in each class (x-axis) is shown in Figure 6. We introduced several augmentations to enhance the effectiveness of the detection model and address the challenges posed by images taken in varying lighting, angle, and zoom conditions. These include vertical and horizontal image flipping, 25° image rotation, and a ±25% brightness adjustment. The effect of the data enhancement process is shown in Figure 7. As a result of these augmentations, the number of training images was increased threefold.
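When images are flipped, their bounding box labels must be transformed in the same way. The sketch below shows this for the flips and the brightness adjustment listed above, assuming labels in the normalized (class, cx, cy, w, h) form and 8-bit grayscale pixel values; the function names are hypothetical and the rotation case, which requires recomputing an enclosing box, is left out.

```python
def hflip_box(box):
    """Horizontal flip: mirror the normalized box centre about the vertical axis."""
    cls, cx, cy, w, h = box
    return (cls, 1.0 - cx, cy, w, h)


def vflip_box(box):
    """Vertical flip: mirror the normalized box centre about the horizontal axis."""
    cls, cx, cy, w, h = box
    return (cls, cx, 1.0 - cy, w, h)


def adjust_brightness(pixels, factor):
    """Scale 8-bit pixel values by factor and clip the result to [0, 255].

    A +/-25% adjustment corresponds to a factor between 0.75 and 1.25.
    Box labels are unchanged by a brightness adjustment.
    """
    return [
        [min(255, max(0, round(v * factor))) for v in row]
        for row in pixels
    ]
```

For example, a box centred at cx = 0.25 moves to cx = 0.75 after a horizontal flip, while its width and height stay the same.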

Experimentation
The model is trained, validated, and tested using Colab Pro and Python 3.10, with Nvidia T4 V100 GPU acceleration. CUDA 11.0 is used with the PyTorch deep learning framework. Several metrics are used to verify the model's effectiveness in detecting and classifying disease symptoms. These evaluation metrics include precision (P), recall (R), mean average precision (mAP), and detection time. Precision is the ratio of correct predictions to the total predictions made by the model. The mean average precision can be expressed as shown in Equation (6),
where AP_i is the average precision value of each class. mAP@50% is a commonly used metric in object detection models; the higher its value, the more accurate the detection after training. Recall is the ratio of correct predictions to the number of ground truth objects, as shown in Equation (7).
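The metrics defined above can be computed as in the following sketch. It assumes that true positives (TP), false positives (FP), and false negatives (FN) have already been counted at a fixed IoU threshold, and it uses all-point interpolation for the per-class average precision; the evaluator actually used with YOLOv6 may interpolate differently, so the function names and the interpolation rule are assumptions.

```python
def precision(tp, fp):
    """Correct detections divided by all detections made."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Correct detections divided by all ground truth objects."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def average_precision(recalls, precisions):
    """Area under the precision-recall curve (all-point interpolation).

    recalls must be sorted in ascending order, with precisions aligned to it.
    """
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # Make the precision envelope monotonically non-increasing.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum(
        (mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1)
    )


def mean_average_precision(ap_per_class):
    """mAP: the mean of the per-class AP_i values."""
    return sum(ap_per_class) / len(ap_per_class)
```

With 10 classes, `mean_average_precision` would receive the 10 per-class AP values computed at IoU = 0.5 to give mAP@50%.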
The proposed model is pre-trained on the COCO 2017 dataset.
Initially, the mAP score was improved by fine-tuning the hyperparameters listed in Table 1. Moreover, the GELU activation used in the RepBlock effectively captures intricate feature details under varying light conditions. To improve the detection process further, the placement of the ECA block was evaluated by adding it at various locations in the backbone and neck regions. Finally, the best accuracy with better recall was obtained after adding the RepEA block in the neck region of the model.

Results
Experiments were carried out to evaluate the performance of the YOLOv6 variants on the basis of accuracy, recall, training time, and computational cost. The YOLOv6s version was chosen for optimization for the specific problem addressed by this work due to its comparable performance at lower computational cost. As can be seen in Table 3, YOLOv6m took more time to train on our dataset for only slightly improved performance metrics. The detection results of the proposed model are also compared with state-of-the-art object detection models, namely YOLOv5s, YOLOv7 (base version), YOLOv8s, YOLO-NAS-s, YOLOS-s, and EfficientDet.
All models are trained for 100 epochs using the default image size settings. A comparison of accuracy, parameters, and training times is given in Table 3. As can be seen, our proposed model outperforms the other techniques overall on the performance metrics considered for comparison. YOLOv8s shows better recall, but at the cost of increased training time and inferior mAP values at IoU = 50%. The proposed model has 17.2 M parameters and 22 GFLOPs (giga floating-point operations), significantly lower than the other models, with the exception of EfficientDet, whose parameter count is much lower but at the cost of poor detection performance. To meet the requirements of real-time applications, the authors considered only the small versions of all models. The model is trained for 100 epochs with 2 warmup epochs. A cosine learning rate schedule with an initial learning rate of 0.0036 is used with a batch size of 32.
The declining loss curves for both the classification and IoU losses are shown in Figure 8. They show that as training continues, the targeted class is correctly localized and classified. The IoU loss decreases gradually as training continues, whereas the green line shows that the classification loss falls sharply before the 20th epoch and between the 90th and 100th epochs. Classification loss measures the correctness of the class assigned to an object in the bounding box. The improved performance of the proposed model can be seen in Figure 9, where the red line, showing the average precision over all classes at an IoU threshold of 0.5, is compared with the green line of mAP values for the proposed model. As can be seen, the detection ability improves greatly after the 20th epoch and finally converges to better values than the baseline model.
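The training schedule stated above (100 epochs, 2 warmup epochs, cosine decay, initial learning rate 0.0036) can be sketched as a simple function of the epoch. The linear warmup from zero and the final learning-rate fraction of 0.1 are assumptions not given in the text, and real training frameworks usually update the rate per iteration rather than per epoch.

```python
import math


def learning_rate(epoch, total_epochs=100, warmup_epochs=2,
                  lr0=0.0036, final_fraction=0.1):
    """Learning rate at a (possibly fractional) epoch.

    final_fraction is a hypothetical value: the rate decays from lr0
    to lr0 * final_fraction over the cosine phase.
    """
    if epoch < warmup_epochs:
        # Linear warmup from 0 up to lr0.
        return lr0 * epoch / warmup_epochs
    # Cosine decay over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return lr0 * (final_fraction + (1.0 - final_fraction) * cosine)


if __name__ == "__main__":
    for e in (0, 1, 2, 25, 50, 75, 100):
        print(f"epoch {e:3d}: lr = {learning_rate(e):.6f}")
```

The rate peaks at the end of warmup and then falls smoothly, which is consistent with the gradual decline of the loss curves described for Figure 8.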
Figure 10 shows the confusion matrix of the proposed model on the test dataset. As can be seen, the stem rust and wheat healthy classes are misclassified because of their high similarity to the real-field background. The small lesion area of stem rust is another reason for its misclassification. A few images of the mango nutrient-deficient class were misclassified as mango anthracnose due to their similar symptoms. Wheat smut is the class most often missed by the proposed detector, probably because of its smaller sample size and smaller target area on the affected leaf. In some instances, brown rust is also misclassified as yellow rust, because the symptoms of the two diseases become very similar under varying lighting conditions. Mango healthy and mango anthracnose are also confused with each other in the presence of cluttered backgrounds containing green leaves of the plant(s). Figure 11 shows the precision-recall curves of all classes at an IoU threshold of 50%. As the IoU threshold increases, the detection accuracy of some classes decreases. The curves of stem rust and healthy (wheat) are more affected due to the similarity of their symptoms to other classes. Cotton curl and cotton healthy obtain higher precision and recall scores, as most of their images have high contrast. The precision and average recall of each class are also shown in Table 4. Almost all the classes are detected with reasonably large values of mAP. The stem rust and wheat healthy classes are either not correctly detected or missed by the model, owing to the similarity of the disease symptoms to the background.

Ablation Experiments
To verify the effectiveness of the proposed model, several ablation studies were conducted. The experimental settings and dataset version were kept the same across the ablation experiments to maintain comparability. The symbol √ in Table 5 indicates that the corresponding method is used. As can be seen in Table 5, the training accuracy improves after fine-tuning the baseline model. A further improvement is achieved by changing the bounding box regression loss from GIoU loss to SIoU loss. A 2.06% increase in mAP is obtained by replacing the ReLU non-linearity in the model backbone with GELU. Better convergence is obtained because GELU is non-convex, non-monotonic, and not linear in the positive domain, in contrast to the ReLU activation function. A further increase in accuracy is observed after integrating efficient channel attention in the neck of the YOLOv6s model.
Consequently, compared to the baseline model, the improved model showed an overall increase of 7.92% in the mAP@50% score.

Discussions on Results
Some sample instances where the proposed deep learning model detects the leaf diseases are given in Figure 12.
In Figure 12b, the targeted class is not only detected with a better confidence score, but some targets that were treated as background in Figure 12a are also detected. In Figure 12d, the target is detected with better confidence and, in turn, with improved mAP performance. In Figure 12f, target localization is further improved due to the use of the SIoU regression loss.
In Figure 12g, the diseased area is not detected, as it is similar to the background and the lighting conditions make the situation more difficult. As can be seen in Figure 12h, however, the leaf rust is detected with an improved confidence score, and the leaf rust class present in the background is also detected by the proposed model. As shown in Figures 10 and 12, there are several missed and false detections, which happen mostly in images with similar or cluttered background conditions and in low-resolution images with blurry ground truth. Nevertheless, the detection performance of the model was found to be superior to that of the other models. The accuracy and recall of YOLOv8s were the closest at a comparable training time, but the proposed model still performed better. It can hence be concluded that, when dealing with real-field images captured via smartphone, almost all models exhibit poor performance, which is why the problem remains a challenge for researchers.

Conclusions & Future Work
This study aims to introduce an enhanced approach for identifying diseased areas on plants. Numerous studies in the field utilize various computer vision techniques to classify and locate plant diseases on public datasets. However, advanced recognition models often struggle to detect symptoms in intricate field environments. Variable lighting conditions, complex and similar backgrounds, variable lesion or diseased areas, and low contrast make the detection process particularly challenging. In this regard, we have proposed an improved model that integrates the efficient channel attention mechanism into the baseline YOLOv6 model. The regression and localization task is further improved via fine-tuning and the use of the SIoU loss function. To improve the detection performance further, the GELU function is incorporated as the non-linearity. The proposed model achieves an mAP score of 81.2% and an average recall of 73.2% after 1.56 h of training. A robust real-time detection model requires better accuracy in a shorter time and at a lower computational cost. The results obtained using the proposed model are also compared with other recent small and/or baseline versions of various models and are found superior in terms of recall, accuracy, and training time in complex environmental conditions. As the proposed dataset comprises images of varying resolution, the model is more robust than other models in detecting small lesion areas.
However, the detection accuracy suffered due to the imbalance of the dataset. Objects of several classes are left undetected due to the low contrast of the images. Wheat smut images are few in number, and this imbalance resulted in low precision. Moreover, due to varying lighting conditions and disease severity, a few instances of yellow rust are falsely detected as brown leaf rust.
In the future, we intend to enhance our dataset in terms of the number of images and classes to make it richer and closer to real-field conditions. To make the model widely applicable to different plant disease detection tasks, we wish to extend the dataset to cover a wide variety of crops. In addition, further studies will focus on gathering environmental information, such as humidity, temperature, and soil data, to construct a multisource fusion model that predicts favorable conditions for a particular pathogen. That will make the early diagnosis of infected crops easier.

Figure 1. Sample images taken from the dataset showing difficult field conditions: (a) similar background; (b) shadow interference; (c) varying light and complex background; (d) variability of diseased symptoms; (e) multiple objects in varying light.

Figure 2. The structure of the RepEA block with efficient channel attention embedded in the RepBlock of YOLOv6.

Figure 3. The GELU function used in place of ReLU.

Figure 4 gives the block diagram of our proposed model. Feature extraction is performed in the backbone using RepBlock, RepConv, and the cross-stage partial spatial pyramid pooling (CSPSPPF) block, which further enhances the network's learning ability. The vanishing gradient problem is addressed by the CSPSPPF block, which splits the features into two parts; important spatial features are extracted after applying the pooling operation, and the split features are then merged via a cross-stage hierarchy. The RepBlock uses GELU in place of ReLU as the non-linearity in the backbone of the proposed model. The YOLOv6s version of the network is used in this work; the relevant code can be found on GitHub. The model uses the EfficientRep structure as its backbone.

Figure 4. Proposed model for crop & fruit leaf disease detection.

• 5 classes of wheat, namely yellow rust, brown rust, stem rust, smut, and healthy wheat
• 3 classes of mango leaves, namely anthracnose, nutrient deficient, and healthy leaf
• 2 classes of cotton, namely cotton leaf curl and healthy leaf

Figure 5. Self-collected dataset: (a) wheat healthy; (b) wheat brown rust; (c) wheat yellow rust; (d) wheat stem rust; (e) wheat smut; (f) mango healthy; (g) mango anthracnose; (h) mango nutrient deficient; (i) cotton healthy; (j) cotton curl.
Diseased and healthy classes were manually annotated with bounding boxes. The Roboflow online tool was used to label the images using rectangular boxes. The labeling format for YOLOv6 is somewhat different from other versions of YOLO. The annotations of each image are saved in .txt format. The annotation file contains the class and the width, height, and coordinates of the bounding box. Labeled images were randomly split into train, test, and validation sets in a ratio of 88%:5%:7%. The directory structure containing the train, test, and validation images is linked in a .yaml file.
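The annotation and splitting steps described above can be sketched as follows. The sketch assumes the widely used YOLO text layout of one object per line, `class cx cy w h`, with coordinates normalized to the image size; since the text notes that the YOLOv6 labeling format differs somewhat from other YOLO versions, this layout is an assumption, as are the function names and the fixed random seed.

```python
import random


def parse_label_line(line):
    """Parse one annotation line of the assumed form 'class cx cy w h'."""
    parts = line.split()
    cls = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:5])
    return cls, cx, cy, w, h


def to_pixel_corners(label, img_w, img_h):
    """Convert a normalized (class, cx, cy, w, h) label to pixel corners."""
    _, cx, cy, w, h = label
    x1 = (cx - w / 2.0) * img_w
    y1 = (cy - h / 2.0) * img_h
    x2 = (cx + w / 2.0) * img_w
    y2 = (cy + h / 2.0) * img_h
    return x1, y1, x2, y2


def split_dataset(items, ratios=(0.88, 0.05, 0.07), seed=0):
    """Randomly split items into (train, test, validation) by the given ratios."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * ratios[0])
    n_test = round(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]
    return train, test, val
```

With 100 labeled images, `split_dataset` would return 88 training, 5 test, and 7 validation items, matching the stated ratio.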

Figure 6. Visualizing the distribution of images in each class.

Figure 8. IoU and classification loss curves during training of the proposed model.

Figure 10. Confusion matrix of the proposed model on the test dataset.

Figure 12. Detection results on the test dataset: (a,c,e,g) results of the default YOLOv6 model; (b,d,f,h) results of the improved YOLOv6 model.

Table 1. A summary of hyperparameters used for training.

Table 2. A summary of the number of images in the dataset.

Table 3. Comparison of performance metrics with different detection models.

Table 4. A comprehensive summary of validation results of the proposed model.

Table 5. Results of an ablation study conducted while training the model with varying schemes.