A Novel Deep Learning Model for Detection of Severity Level of the Disease in Citrus Fruits

Abstract: Citrus fruit diseases have a severe impact on both the quality and quantity of citrus fruit production and on the market. Automatic detection of severity is essential for the high-quality production of fruit. In the current work, a citrus fruit dataset is preprocessed by rescaling and establishing bounding boxes with image-labeling software. Then, a selective search, which combines the capabilities of both an exhaustive search and graph-based segmentation, is applied. The proposed deep neural network (DNN) model is trained to detect targeted areas of the disease with its severity level using citrus fruits that have been labeled with the help of a domain expert with four severity levels (high, medium, low and healthy) as ground truth. Transfer learning using VGGNet is applied to implement a multi-classification framework for each class of severity. The model predicts the low severity level with 99% accuracy, and the high severity level with 98% accuracy. The model demonstrates 96% accuracy in detecting healthy conditions and 97% accuracy in detecting medium severity levels. The results show that the proposed approach is valid and efficient for detecting citrus fruit disease at four levels of severity.


Introduction
According to the FAO (FAOSTAT 2019) [1], world citrus fruit production is estimated at 157.98 million tons, with oranges accounting for more than half of the total. Producers seek to grow superior fruits at a lower cost that are free of any disease, insects and pathogens; this task can be accomplished through the use of appropriate mechanized standards and predictive maintenance techniques [2]. Fruit diseases pose a substantial danger to modern citrus farming. The citrus sector needs early and automatic identification of diseases during post-harvesting, since a few contaminated fruits might spread the disease to the entire batch during processing or shipment. The severity of the disease is a crucial parameter for determining the extent of the disease, and it affects yield. The ability to diagnose disease severity quickly and accurately would help to prevent production deficits; disease severity has previously been determined by trained professionals visually inspecting plant tissues. The high cost and limited efficiency of human disease assessment stymie the rapid progress of modernized agriculture [3]. This paper presents deep learning models for the image-based automatic diagnosis of citrus fruit disease severity levels, addressing the determination of severity in a multi-classification framework. Section 1 presents the introduction and contributions of the paper. The rest of the paper is organized as follows: Section 2 provides the literature review. Section 3 presents the proposed algorithm for the disease and severity detection of citrus fruits and a detailed description of the materials and methodology used by the model. The evaluation of the results is presented in Section 4. Finally, the paper is concluded in Section 5.

Contributions of the Paper
The objective of this paper is to develop a deep learning model that classifies the disease according to its severity level and identifies the disease-affected area of the citrus fruit. The proposed model can recognize and classify the infected areas of citrus fruits. It is a powerful approach for automatically identifying citrus fruit disease severity and can be further extended to support a unified citrus disease identification system for real-world applications. The current study helps to mitigate and prevent fruit disease at the initial stages and can help to control the cost of the disease while safeguarding the surroundings globally.

Literature Review
Effective surveillance and diagnosis of resistant cultivars is critical for disease control and prevention for healthy yields. Using watershed segmentation, a novel machine vision system for the automatic identification of diseases was proposed. Two kinds of diseases, i.e., yellow rust and Septoria, were accurately detected using the proposed approach [4]. The severity of leaf rust disease can result in a reduction in sugar production. As a result, signs of illness must be discovered as soon as possible, and appropriate actions should be taken to prevent the disease from spreading or progressing. A faster region-based convolutional neural network (Faster R-CNN) framework was constructed by altering the parameters of the model for the detection of leaf spot infestation in sugar. The technique provided for severity detection of disease with image-based systems was trained on 155 images, and classification accuracy of 95.48% was obtained [5]. The citrus industry is still working on developing technologies for automatically identifying deterioration in citrus fruit throughout quality control. Using three distinct manifold learning approaches, the viability of reflectance spectroscopy in the visible and near-infrared regions was tested for the early identification of rot caused by Penicillium digitatum in citrus fruit [6]. Controlling the spread of disease requires its diagnosis and then destroying the cause, particularly for citrus huanglongbing (HLB)-infected trees. Ground investigation is an arduous and time-consuming task, and large-area analysis tools for citrus orchards with excellent efficiency are rare. The possibility of large-area monitoring of citrus HLB using low-altitude remote sensing was explored [7]. Nowadays, citrus fruit exports to international markets are significantly hampered by fruit disorders such as citrus canker, black spot and scab.
As a result, thorough procedures must be performed prior to the transportation of fruits to mitigate the presence of citrus damaged by them. A model based on a feature selection method with a classifier trained on quarantine disease for disease detection is being deployed [8]. Among the most significant components used for enhancing agricultural products, scalability and waste reduction are considered to be criteria for evaluating quality. An optimized convolutional neural network system was developed to identify visible flaws in sour lemon, evaluate them and devise a better solution. To detect and characterize abnormalities, lemon images were taken and divided into two categories, i.e., healthy and impaired. Following preprocessing, the images were classified using an improved CNN model. To improve the outcomes, a stochastic pooling mechanism with augmentation techniques was implemented [9]. A machine vision system to detect irregularities in citrus peel and evaluate the nature of the defect was designed. The image is segmented into defective zones using the Sobel gradient. Following this, color and texture features are retrieved, some of which are associated with high-order statistics [10]. Disease detection is currently conducted manually by domain experts using harmful ultraviolet rays on fruits. The utilization of hyperspectral imaging technologies allows for the advancement of systems for the automatic detection of disease. A methodology was proposed to develop a multi-classification system using the receiver operating characteristic curve to detect fungal infections in citrus fruits. The developed system helped in reducing the set of features and achieved an accuracy rate of 89% [11].

Materials and Methods
The proposed model for detecting affected areas and the severity levels of the citrus fruit disease comprises five modules, as shown in Figure 1. The first module targets the collection of citrus fruit images. The second module is used to label the healthy and infected images by using expert knowledge. For labeling the images, an open-source tool is used [12]. Labeling is the process of annotating the images and drawing the bounding boxes for object detection. Annotations of the images are stored as XML files in Pascal VOC format; the process of annotating the images is further explained in Section 3.2. The third module combines graph-based segmentation and object detection to produce region proposals that are independent of the class. The similarity between regions is calculated and the most similar regions are grouped together, which is further explained in Section 3.4. In the fourth module, a CNN using transfer learning extracts a fixed-length feature map for each region. The last module represents the implementation of multi-class sequential CNN models that determine the severity level of the citrus fruit disease using a softmax function, as explained in Section 3.6.

Dataset
Fruit diseases severely affect the product quality, market segment and revenue. Citrus is an important source of vitamins A and C. Citrus illnesses, on the other hand, have a negative impact on citrus fruit output and quality [11]. Citrus plants such as lemons, oranges, grapefruit and limes are susceptible to a variety of citrus diseases, such as anthracnose, HLB, scab, black spot and other fungal infections [13]. Adequate datasets are necessary for object detection and the classification process using deep learning. All the images collected for the dataset were downloaded from online datasets and collected from the sources, i.e., PlantVillage and Kaggle [14,15]. After taking the images from the publicly available source, the images were prepared for obtaining the severity of the disease with the help of a domain expert.

Annotation
Image annotation is an essential preprocessing step before training a model. During the training phase, a model learns the labeled features. As a result, the quality of the trained model is strongly influenced by the precision of the feature labeling. As several types of disease appear relatively similar, knowledge of the different types of fruit diseases helps the machine to learn the traits that distinguish them. A horticulture scientist helped with the data annotation. The expert considered the diameter, color features, shape and surface area of the affected portion of the fruit in the image in order to determine the extent of damage. The labeling only included the exterior features visible in the image, while interior damage was not considered. Image annotation required the labeling of disease locations in the image, and its outcome was a set of bounding boxes and their coordinates. LabelImg is a free graphical image annotation tool that is used to locate and categorize the disease severity in an image and to store it as an XML file with the matching xmin, xmax, ymin and ymax data for each bounding box [16,17]. There is one XML file in the Annotation folder for each JPEG file in the JPEG Images folder, and each object's bounding box is saved in that XML file. It is difficult to work with annotation data when each image has a separate file. Therefore, we used the pandas library to combine these XML files into one CSV file. Annotations were first collected in a pandas DataFrame called "df anno", which was then saved as a CSV file. The CSV file, containing the annotated data of citrus fruits, was then segregated into four disease severity categories: healthy, medium, high and low. We then built an object for each class of severity. Next, we iterated over each row of an object to extract the image name and URL from the object file and read the image. Then, on each category's object, the accuracy of object detection was measured. Table 1 represents the total number of citrus samples taken for training and testing.
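The conversion from per-image Pascal VOC files to a single table, followed by the split into severity categories, can be sketched as follows. This is a minimal illustration and not the authors' code: it uses only the Python standard library instead of pandas so that it runs without dependencies, and the function names, the sample file name and the sample XML are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["filename", "severity", "xmin", "ymin", "xmax", "ymax"]

def voc_to_rows(xml_text):
    """Return one dict per <object> found in a Pascal VOC annotation."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append({
            "filename": filename,
            "severity": obj.findtext("name"),
            "xmin": int(box.findtext("xmin")),
            "ymin": int(box.findtext("ymin")),
            "xmax": int(box.findtext("xmax")),
            "ymax": int(box.findtext("ymax")),
        })
    return rows

def rows_to_csv(rows):
    """Serialize all annotation rows into one CSV document (as a string)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def group_by_severity(rows):
    """Split the combined rows into one list per severity category."""
    groups = {}
    for row in rows:
        groups.setdefault(row["severity"], []).append(row)
    return groups

# Hypothetical annotation with two labeled regions on one fruit image.
SAMPLE_XML = """<annotation>
  <filename>orange_001.jpg</filename>
  <object><name>high</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>140</ymax></bndbox>
  </object>
  <object><name>low</name>
    <bndbox><xmin>5</xmin><ymin>6</ymin><xmax>50</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>"""

rows = voc_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
groups = group_by_severity(rows)
```

In practice, `voc_to_rows` would be called once per XML file in the annotation folder and the resulting rows concatenated before writing the CSV file.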

Proposed Algorithm for Detecting Severity Regions of the Citrus Diseases
Input: the colored image (Img).
(1) Perform BoundingBox(Img) and annotate the image, i.e., Annotate(Img), where BoundingBox(Img) is used to create boundary coordinates on affected areas of the image and the Annotate(Img) function is used to create and extract the annotated image as an XML file for each image.
(2) Create an object for each category (i.e., healthy, low, medium and high).
The precision of object detection strongly affects the disease and severity recognition accuracy, so a robust automatic detection system is proposed using image processing techniques. This algorithm was used to perform the preprocessing and object identification tasks for the different disease locations and severity levels present in citrus fruits. Graph-based segmentation was implemented to obtain the region proposals of each image. The above steps of the algorithm were implemented to obtain the region proposals, after which object detection was performed.

Steps of Selective Search to Obtain the Region Proposal
Initial regions were generated using Felzenszwalb's graph-based segmentation approach. The results after implementation are represented in Figure 2. The next step was to add labels to the segmented regions of the image [18]. Visualization of labels output after Felzenszwalb segmentation is shown in Figure 3.
After segmentation, many of the generated labels are redundant or belong to a single object. The next step is to group the labels that belong to one object based on the most similar regions. For this grouping, Local Binary Pattern (LBP) was implemented [19]. To capture the texture similarities of the initial regions, LBP features were calculated for each initial region. The texture gradient was computed for the entire image, and the results are shown in Figure 4.

Next, we collected the RGB values on a scale of 0 to 1, the highest and lowest RGB values, as well as the difference between them, by following Equations (1) to (6):

$$R' = R/255 \tag{1}$$
$$G' = G/255 \tag{2}$$
$$B' = B/255 \tag{3}$$
$$C_{max} = \max(R', G', B') \tag{4}$$
$$C_{min} = \min(R', G', B') \tag{5}$$
$$\Delta = C_{max} - C_{min} \tag{6}$$

The Hue Saturation Value (HSV) format models how paints of different colors blend together, with the saturation component representing different intensities of vibrantly colored paints and the value component representing the combination of each of these paints with different ratios of black or white paints [20]. Figure 5 represents an HSV image with calculated min-max values.

The sum of the histogram intersection of color, $s_{color}(r_i, r_j)$, was calculated to measure the color similarity between two regions $r_i$ and $r_j$. One-dimensional color histograms were derived for the individual color channels of each region using 25 bins, which was found to be effective. Three RGB color channels resulted in a color histogram $C_i = \{c_i^1, \ldots, c_i^d\}$ with dimensions d = 75 for each region $r_i$. The L1 norm was used to normalize the color histograms. The histogram intersection was used to determine the similarity using Equation (7):

$$s_{color}(r_i, r_j) = \sum_{k=1}^{d} \min(c_i^k, c_j^k) \tag{7}$$
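The color histograms can be efficiently propagated through the hierarchy by using Equation (8), where $C_i$ and $C_j$ are the color histograms of the two regions $r_i$ and $r_j$ being merged, $size(r)$ is the number of pixels in region $r$ and $C_t$ is the color histogram of the merged region $r_t$:

$$C_t = \frac{size(r_i) \cdot C_i + size(r_j) \cdot C_j}{size(r_i) + size(r_j)} \tag{8}$$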
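The color similarity and the histogram propagation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: pixels are assumed to be 8-bit (R, G, B) tuples, the histogram over all three channels is L1-normalized as a whole, and all function names are hypothetical.

```python
def channel_histogram(values, bins=25):
    """Count 8-bit channel values (0..255) into equal-width bins."""
    hist = [0] * bins
    for v in values:
        idx = min(int(v * bins / 256), bins - 1)
        hist[idx] += 1
    return hist

def color_histogram(pixels, bins=25):
    """Concatenate the three channel histograms (d = 3 * bins), then L1-normalize."""
    hist = []
    for ch in range(3):
        hist.extend(channel_histogram([p[ch] for p in pixels], bins))
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def histogram_intersection(h1, h2):
    """Color similarity: sum of bin-wise minima of two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def merge_histograms(h1, size1, h2, size2):
    """Size-weighted average giving the histogram of the merged region."""
    total = size1 + size2
    return [(size1 * a + size2 * b) / total for a, b in zip(h1, h2)]

# Two toy regions of four pixels each: pure red and pure blue.
red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
h_red = color_histogram(red)
h_blue = color_histogram(blue)
similarity = histogram_intersection(h_red, h_blue)
merged = merge_histograms(h_red, len(red), h_blue, len(blue))
```

The propagation step matters because the merged histogram is obtained from the two stored histograms alone, without revisiting the pixels of the merged region.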
The sum of the histogram intersection of texture, $s_{texture}(r_i, r_j)$, was calculated to measure the texture similarity between two regions $r_i$ and $r_j$. The L1 norm was adopted to normalize the texture histograms $T_i = \{t_i^1, \ldots, t_i^n\}$. In Equation (9), the histogram intersection is used to determine similarity:

$$s_{texture}(r_i, r_j) = \sum_{k=1}^{n} \min(t_i^k, t_j^k) \tag{9}$$

Next, we calculated the size similarity $s_{size}(r_i, r_j)$, which promotes the rapid fusion of tiny regions. This constrains the size of regions in S, i.e., regions that have not yet been merged, throughout the procedure. This is also advantageous since it enables the generation of object locations at all scales throughout the image. For instance, it inhibits an individual region from devouring most other regions one after the other, giving all scales exclusively at the location of this developing region.

$s_{size}(r_i, r_j)$ is defined in terms of the fraction of the image that $r_i$ and $r_j$ collectively inhabit, whereas $size(im)$ specifies the image's pixel size in Equation (10):

$$s_{size}(r_i, r_j) = 1 - \frac{size(r_i) + size(r_j)}{size(im)} \tag{10}$$

Following this, we computed the fill similarity throughout the image. $s_{fill}(r_i, r_j)$ determines how effectively the regions $r_i$ and $r_j$ fit together. The goal is to fill up the gaps: if $r_i$ is included in $r_j$, it is reasonable to merge them first to prevent any gaps. If $r_i$ and $r_j$ are barely touching one another, they would most certainly form an odd region and should not be combined. Only the sizes of the regions and the enclosing boxes are incorporated in order to ensure a quick evaluation. In particular, we defined $BB_{ij}$ as the compact bounding box encompassing $r_i$ and $r_j$. $s_{fill}(r_i, r_j)$ is therefore based on the proportion of the image in $BB_{ij}$ that is not covered by the regions $r_i$ and $r_j$ in Equation (11):

$$s_{fill}(r_i, r_j) = 1 - \frac{size(BB_{ij}) - size(r_i) - size(r_j)}{size(im)} \tag{11}$$
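The size and fill similarities only need region sizes and bounding boxes, which a short sketch makes concrete. The code below is an illustration under assumptions: boxes are (xmin, ymin, xmax, ymax) tuples with exclusive maxima, region sizes are pixel counts, and the function names are hypothetical.

```python
def bounding_box(box_a, box_b):
    """Tight box enclosing both input boxes."""
    return (min(box_a[0], box_b[0]), min(box_a[1], box_b[1]),
            max(box_a[2], box_b[2]), max(box_a[3], box_b[3]))

def box_area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

def size_similarity(size_i, size_j, image_size):
    """Close to 1 when both regions are small relative to the image."""
    return 1.0 - (size_i + size_j) / image_size

def fill_similarity(size_i, box_i, size_j, box_j, image_size):
    """Close to 1 when the two regions leave little empty space in their joint box."""
    joint = box_area(bounding_box(box_i, box_j))
    return 1.0 - (joint - size_i - size_j) / image_size

# A 100 x 100 image with three 10 x 10 square regions.
IMAGE_SIZE = 100 * 100
left, right, far = (0, 0, 10, 10), (10, 0, 20, 10), (90, 90, 100, 100)

s_size = size_similarity(100, 100, IMAGE_SIZE)
s_fill_adjacent = fill_similarity(100, left, 100, right, IMAGE_SIZE)
s_fill_distant = fill_similarity(100, left, 100, far, IMAGE_SIZE)
```

Adjacent regions fill their joint bounding box completely and obtain the maximum fill score, whereas regions in opposite corners leave almost the whole image uncovered and score close to zero.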
Then, we retrieve a list of regions that intersect. We calculate the similarities between each pair of neighboring regions and then produce the sum of the regions' similarities using Equation (12):

$$s(r_i, r_j) = s_{color}(r_i, r_j) + s_{texture}(r_i, r_j) + s_{size}(r_i, r_j) + s_{fill}(r_i, r_j) \tag{12}$$

This gives the total similarity of two regions, which is a composite of the four types of similarity mentioned previously.
We next calculate the similarity of all regions using Equation (13).
Next, we merge the regions, remove the already merged regions and calculate new similarity values. The regions are merged in order of similarity $s(r_i, r_j)$ over the region set $R$ through the following steps:
(1) Retrieve the pair of regions with the highest degree of similarity from the similarity dictionary.
(2) Merge the region pair and add the merged region to the dictionary of regions.
(3) Eliminate from the similarity dictionary all pairs of regions in which one of the regions was retrieved in step 1.
(4) Determine the degree of similarity between the newly combined region and the regions that intersected the merged regions (an intersecting region is one whose pair was deleted in step 3).
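The four merge steps above can be sketched as a greedy loop over a similarity dictionary. This is a simplified illustration, not the authors' implementation: only the size similarity stands in for the full four-part similarity, regions are reduced to their pixel counts, and the function names and data layout are assumptions.

```python
def similarity(regions, i, j, image_size):
    """Stand-in similarity: size similarity only (assumption for brevity)."""
    return 1.0 - (regions[i]["size"] + regions[j]["size"]) / image_size

def hierarchical_grouping(initial_regions, neighbor_pairs, image_size):
    """Greedily merge the most similar neighboring regions until none remain."""
    regions = {k: dict(v) for k, v in initial_regions.items()}
    sims = {tuple(sorted(p)): similarity(regions, p[0], p[1], image_size)
            for p in neighbor_pairs}
    next_id = max(regions) + 1
    merge_order = []
    while sims:
        # (1) Retrieve the pair with the highest similarity.
        i, j = max(sims, key=sims.get)
        # (2) Merge the pair and store the new region.
        t = next_id
        next_id += 1
        regions[t] = {"size": regions[i]["size"] + regions[j]["size"]}
        merge_order.append((i, j, t))
        # (3) Remove every pair that involves one of the merged regions,
        #     remembering which other regions they touched.
        touching = set()
        for pair in list(sims):
            if i in pair or j in pair:
                touching.update(pair)
                del sims[pair]
        touching -= {i, j}
        # (4) Compute similarities between the new region and those neighbors.
        for n in touching:
            sims[(n, t)] = similarity(regions, n, t, image_size)
    return regions, merge_order

# Three toy regions in a row: 0 touches 1, and 1 touches 2.
initial = {0: {"size": 10}, 1: {"size": 20}, 2: {"size": 200}}
all_regions, order = hierarchical_grouping(initial, [(0, 1), (1, 2)], image_size=1000)
```

Every region created along the way, not only the final one, is kept in the region dictionary, which is what yields candidate object locations at multiple scales.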

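Intersection over Union on Overlapped Region
To train a classifier using CNN features as input, we require ground truth labels for each candidate region. However, it is unclear how to label a region that only partially overlaps the fruit. To address this issue, an overlap threshold value is used, below which regions are regarded as negatives. Intersection over Union (IoU) is a frequently used metric for determining the similarity of the predicted bounding box to the ground truth bounding box using Equations (14)-(16). The aim is to compare the area of overlap between two boxes to the combined area of the two boxes [21,22]. Figure 6 shows the region of Intersection over Union. For two boxes $a$ and $b$ with top-left corners $(a_{x1}, a_{y1})$, $(b_{x1}, b_{y1})$ and bottom-right corners $(a_{x2}, a_{y2})$, $(b_{x2}, b_{y2})$, the corners of the overlapping region are

$$(x_1, y_1) = (\max(a_{x1}, b_{x1}), \max(a_{y1}, b_{y1})) \tag{14}$$
$$(x_2, y_2) = (\min(a_{x2}, b_{x2}), \min(a_{y2}, b_{y2})) \tag{15}$$

and, with width $= x_2 - x_1$ and height $= y_2 - y_1$,

$$\text{Overlapping region} = \text{width} \times \text{height} \tag{16}$$

The IoU is the overlapping region divided by the sum of the two box areas minus the overlapping region.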
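The IoU computation can be written as a short function. The sketch below assumes boxes given as (xmin, ymin, xmax, ymax) tuples and clamps the overlap to zero for disjoint boxes; the function name is illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    # Corners of the overlapping region.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Negative extents mean the boxes do not overlap.
    width = max(0, x2 - x1)
    height = max(0, y2 - y1)
    overlap = width * height
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - overlap
    return overlap / union if union > 0 else 0.0

identical = iou((0, 0, 10, 10), (0, 0, 10, 10))
half_shift = iou((0, 0, 10, 10), (5, 0, 15, 10))
disjoint = iou((0, 0, 10, 10), (20, 20, 30, 30))
```

Shifting a 10 × 10 box by half its width leaves an overlap of 50 pixels against a union of 150 pixels, so the IoU drops to one third even though half of each box is shared.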
Training features are created and the ground truth is divided into four pickled objects that contain candidate regions with an IoU > 0.75. The same object can have a large number of small candidate regions that hardly provide new information, so, for each object, only the candidate region will be chosen. Other pickled objects correspond to the particular object captured in the first object. The remaining two pickled objects contain all the candidate regions that do not contain a citrus fruit object, i.e., IoU < 0.4, and information regarding the particular object that was not captured in the first object.
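The thresholding of candidate regions can be sketched as follows. The two thresholds come from the text; treating regions whose IoU falls between them as ignored is an assumption of this sketch, as are the function names and the example values.

```python
def label_candidate(iou_value, pos_threshold=0.75, neg_threshold=0.4):
    """Assign a training label to a candidate region from its IoU."""
    if iou_value > pos_threshold:
        return "positive"
    if iou_value < neg_threshold:
        return "negative"
    return "ignored"  # assumption: in-between regions are not used for training

def split_candidates(candidates):
    """candidates: list of (region_id, iou_value) pairs."""
    split = {"positive": [], "negative": [], "ignored": []}
    for region_id, iou_value in candidates:
        split[label_candidate(iou_value)].append(region_id)
    return split

# Hypothetical candidate regions with their IoU against the ground truth.
example = [("r1", 0.82), ("r2", 0.10), ("r3", 0.55), ("r4", 0.76)]
split = split_candidates(example)
```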

Warp the Regions Proposed by the Selective Search
To calculate features for a region proposal, the image samples in the region must be transformed into a form that is compatible with the CNN [23]. All pixels in a tight bounding box around the candidate region are warped to the desired size irrespective of the region's size or aspect ratio. We dilate the tight bounding box before warping so that there are exactly p pixels of warped image context around the original box (we use p = 16). VGG16 specifies that the image must have the dimensions (height, width, Nchannel) = (224, 224, 3). The region proposal given by the selective search often does not have a height and width of 224. Thus, all pixels in the region proposal need to be warped to the CNN's input size.
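The dilation before warping can be made concrete with a little arithmetic. The sketch below assumes that the original box should occupy the central (224 − 2p) pixels of the warped image, so the padding in original pixels is p scaled back by the warp factor; the function names are hypothetical, and clipping to the image border is left out for brevity.

```python
def dilate_box(box, target=224, p=16):
    """Pad a (xmin, ymin, xmax, ymax) box so that, after warping to
    target x target, p pixels of context surround the original box."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    inner = target - 2 * p          # warped extent of the original box
    pad_x = p * w / inner           # context in original pixels (x)
    pad_y = p * h / inner           # context in original pixels (y)
    return (x1 - pad_x, y1 - pad_y, x2 + pad_x, y2 + pad_y)

def warp_scale(box, target=224):
    """Per-axis scale factors that map the box to target x target."""
    x1, y1, x2, y2 = box
    return target / (x2 - x1), target / (y2 - y1)

# A 192 x 96 proposal: the aspect ratio is not preserved by the warp.
proposal = (100, 100, 292, 196)
dilated = dilate_box(proposal)
scale_x, scale_y = warp_scale(dilated)
```

For this proposal the padding is 16 pixels horizontally and 8 pixels vertically in the original image, which becomes 16 warped pixels on every side once the dilated box is stretched to 224 × 224.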

Feature Extraction
Using VGGNet16, a 4096-dimensional feature vector is extracted from each region proposal. VGGNet offers advanced and efficient identification capabilities, and it is frequently used for transfer learning due to its portability. VGGNet uses only 3 × 3 convolutions, but with a large number of filters [24]. It has 16 layers, each with its own set of trainable weights, and it is widely used for obtaining features from images. VGGNet's pre-trained weights are publicly available. VGGNet is only used for feature extraction and not for classification purposes. For classification, the last three layers were removed from the network. Features are computed by forward propagating a mean-subtracted 224 × 224 RGB image through the five convolutional blocks and the fully connected dense layers.

Transfer Learning
Transfer learning is a powerful approach in machine learning in which a CNN trained for one goal is repurposed as the foundation for a model on a different task. Instead of initiating the training from scratch with arbitrarily instantiated weights, the weights can be initialized from a network pre-trained on large labeled datasets such as public datasets [25]. The ImageNet project is a massive visual database designed for use in the development of visual object recognition [26]. In this article, a model pre-trained on the enormous ImageNet dataset is leveraged and then trained on the citrus fruit dataset to obtain the severity level. The proposed model using transfer learning is shown in Figure 7, and the key processes of the transfer learning technique are as follows. The first step is to determine the base network of the transfer learning and assign the network's weights by using the pre-trained CNN model. These weights are available for download from an online source. Then, we reconstruct the network structure by manipulating the bottom layers of the network, which yields a new modified network structure. The newly constructed network can then be fine-tuned in order to minimize the loss function using the dataset and associated labels. Specifically, the Adaptive Moment Estimation (Adam) algorithm is used to optimize the weights, with sparse categorical cross-entropy as the loss function. Thus, for transfer learning, a VGGNet model pre-trained on ImageNet was used, and a sequential CNN model was used to train the newly updated neural network using the citrus fruit dataset. The method combines the features of VGGNet with a sequential CNN: the initial layers, from the first convolutional layer to FC1 (Dense), are taken from VGGNet, and the subsequent dense layers are substituted with the sequential CNN model. Lastly, a softmax classifier is used for multi-classification of the severity classes of citrus disease. Thus, the new model consists of two sections, in which the first section is the pre-trained model and the second section contains the appended layers applied to a multi-scale feature vector for multi-classification. Table 2 lists the parameters of the implemented deep learning model.
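The output stage of the classifier can be illustrated without a deep learning framework. The sketch below shows how a softmax turns four raw scores into severity probabilities and how sparse categorical cross-entropy scores them against an integer label. It is not the trained network: the class order and the example scores are assumptions made for illustration.

```python
import math

# Assumed class order; integer labels index into this list.
CLASSES = ["healthy", "low", "medium", "high"]

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(probs, label_index):
    """Loss for one sample whose true class is given as an integer index."""
    return -math.log(probs[label_index])

def predict(logits):
    """Return the most probable severity class and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]

uniform = softmax([0.0, 0.0, 0.0, 0.0])
uniform_loss = sparse_categorical_crossentropy(uniform, 2)
label, confidence = predict([0.1, 3.0, 0.2, 0.5])  # hypothetical scores
```

"Sparse" refers only to the label format: the true class is passed as a single integer rather than a one-hot vector, and the loss value is the same as for ordinary categorical cross-entropy.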

Result Analysis
The training accuracy is the percentage of correctly classified data samples in the training set. Similarly, the validation accuracy is the percentage of correctly classified data samples in the held-out validation set. The dataset is divided into two sets, one comprising images for training and the other for validation. The 80-20 cross-validation process is used to train and validate the model. For validation, multiple investigations are carried out with shuffled images [26]. New, randomly selected images are used to test the efficiency of the model. Sparse categorical cross-entropy was used as the loss function to determine the classification model's performance. The overall training accuracy achieved by the model is 95%. The Adam optimizer is selected for the model to optimize the cross-entropy function [27]. The result of the implemented convolutional neural network model on randomly selected test images was analyzed and represented as a confusion matrix, as shown in Table 3. Figure 8 depicts the classification accuracy and loss obtained during the training and validation of the model.

Table 3. Confusion matrices for all levels of severity of disease present in citrus fruits.

Out of the four levels of disease severity of the citrus fruits, the model is able to predict the low severity level with an accuracy of 99%, precision of 100%, recall of 84% and an F1 score of 91%. For the high severity level of the disease, our model recorded an accuracy of 98%. For the detection of healthy conditions, the model displays 96% accuracy, and it shows 97% accuracy in the case of the medium severity level. The accuracy, precision, recall and F1 score calculated for each severity level of the citrus fruit disease are listed in Table 4. Figure 9 depicts some of the graphical outcomes of the proposed automatic disease recognition system. The results demonstrate that the accuracy of the disease severity level of citrus fruits was assessed as low severity (95.9%), high severity (99.7%), medium severity (95.6%) and healthy (99.7%). As demonstrated in Figure 9, our system can efficiently diagnose the image dataset with four severity levels of disease, and it has been compared to expert manual evaluation. The results reveal that disease severity identification is quite accurate and falls within the domain experts' acceptable range.
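The per-class figures reported above follow from a confusion matrix by treating each severity level as one class against the rest. The sketch below shows that computation; the matrix is a made-up example and not the data of Table 3, rows are assumed to be true classes and columns predicted classes, and the function names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def per_class_metrics(matrix, labels):
    """One-vs-rest accuracy, precision, recall and F1 for every class."""
    total = sum(sum(row) for row in matrix)
    metrics = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # true class k, predicted other
        fp = sum(row[k] for row in matrix) - tp   # predicted k, true class other
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[label] = {
            "accuracy": (tp + tn) / total,
            "precision": precision,
            "recall": recall,
            "f1": f1_score(precision, recall),
        }
    return metrics

LABELS = ["healthy", "low", "medium", "high"]
# Hypothetical confusion matrix (rows: true class, columns: predicted class).
EXAMPLE = [
    [10, 0, 0, 0],
    [1, 8, 1, 0],
    [0, 0, 10, 0],
    [0, 0, 0, 10],
]
metrics = per_class_metrics(EXAMPLE, LABELS)

# Consistency check on the reported low-severity figures:
# precision 100% and recall 84% give an F1 score of about 91%.
reported_low_f1 = f1_score(1.00, 0.84)
```

This also explains how a class can reach 99% accuracy with a recall of only 84%: the one-vs-rest accuracy counts every correctly rejected sample of the other three classes as a true negative.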

Conclusions
Fruit diseases are among the most serious threats to global agricultural progress, and they have a strong influence on food safety. As a result, automatic diagnosis of citrus fruit diseases is increasingly desirable. Deep learning approaches, specifically CNNs, have demonstrated an encouraging ability to resolve the majority of difficult classification problems. Transfer learning for deep CNNs is investigated in this research with the goal of improving the ability to learn the severity level, and a sequential VGGNet16 architecture is developed for the diagnosis of four severity levels of the disease present in citrus fruit. The pre-trained VGGNet16 is updated by substituting its bottom layers with an extended layer block that includes a dense layer with ReLU activation, and sparse categorical cross-entropy is used as the loss function to determine the classification model's performance. The Adam optimizer is selected for the model to optimize the cross-entropy function. Lastly, a fully connected softmax layer was inserted as the classification layer in order to obtain the four severity levels of the disease. The test accuracy achieved on randomly selected images for the healthy, low, high and medium levels of disease was 96%, 99%, 98% and 97%, respectively.