Classification of Apple Disease Based on Non-Linear Deep Features

Abstract: Diseases in apple orchards (rot, scab, and blotch) cause substantial losses in the agricultural industry worldwide. Traditional hand-picking methods rely on subjective human effort. Conventional machine learning methods for apple disease classification depend on hand-crafted features that are complex and not robust. Advanced artificial intelligence methods such as Convolutional Neural Networks (CNNs) have become a promising way to achieve higher accuracy, although they need a high volume of samples. This work investigates the application of Deep CNNs (DCNNs) to apple disease classification using deep generative images to obtain higher accuracy. To achieve this, our work progressively modifies a baseline model into an end-to-end trained DCNN model that has fewer parameters and better recognition accuracy than existing models (i.e., ResNet, SqueezeNet, and MiniVGGNet). We have performed a comparative study with state-of-the-art CNNs as well as conventional methods proposed in the literature, and the comparative results confirm the superiority of our proposed model.


Introduction
The apple is one of the most important tree fruits, ranking second in world fruit production [1,2]. In 2017, annual worldwide apple production reached 83.1 million tons, consumed heavily around the world [1,3]. The high consumption of apples is due to their low cost and numerous healthful properties, i.e., a high content of fiber, minerals, vitamins, and antioxidants. In addition, their flavor offers the possibility of consuming them fresh or using them for innumerable derived products. It is estimated that approximately 33% of apples produced worldwide are processed into juices, ciders, applesauce, alcoholic beverages, and dried apples, among other products [4].
In recent years, the apple industry has faced significant losses due to diseases that degrade product quality. Careful observation with the naked eye can distinguish a diseased apple from the rest; however, human analysis is highly subjective and prone to error. Therefore, an accurate and timely diagnosis of diseases is a fundamental and extremely critical process to avoid future losses. There are many apple diseases according to phytopathology datasheets [5,6], but the most common diseases are Blotch, Rot, and Scab.
Several studies suggest that visual inspection through hand-picking is the most widely used method of fruit disease diagnosis; however, it is a slow and problematic process [7,8]. Conventional methods such as Polymerase Chain Reaction (PCR) require detailed molecular sampling, making them non-cost-effective [9]. In recent years, Artificial Intelligence (AI) has been used to help experts in the automatic diagnosis of diseases that affect plants and trees. AI-based methods are faster, less expensive, and more efficient [10][11][12].
Machine learning (ML) is a branch of AI that automates the construction of analytical models so that systems can learn from large amounts of data, identify patterns, and make decisions [13][14][15]. Nevertheless, in most cases, traditional ML approaches applied to complex images involve non-automatic feature extraction steps [16,17], reducing effectiveness, making the process more time-consuming, and potentially yielding unsatisfactory accuracy [18][19][20]. In contrast, Deep Learning (DL) is an advanced form of ML that allows systems to train themselves and improve classification accuracy through a series of computations based on multiple layers of non-linear processing units [21][22][23]. The advantage of DL is its ability to exploit raw data and extract relevant features without hand-crafted features or prior knowledge [24][25][26][27].
Numerous studies have been carried out to identify and classify apple diseases. For instance, Goel et al. [28] proposed a method to classify healthy apples and three types of diseases (Blotch, Rot, and Scab). The authors hybridized three metaheuristic algorithms to segment apple images, using the between-group maximization function for clustering and segmentation. Local Binary Pattern (LBP)-based features are extracted from the segmented images for classification using a Multiclass Support Vector Machine (MSVM). Li et al. [29] introduced an apple disease classification technique using a back-propagation-based Artificial Neural Network (ANN). The model is trained on healthy and fungal-infected apple images, to which background elimination, segmentation of apple defects, and identification of the calyx and stem are applied to obtain the features for the network. In 2012, Dubey et al. [7] presented a solution to detect the Blotch, Rot, and Scab apple diseases using an MSVM classification technique. The apple images are segmented using the K-means clustering technique, followed by feature extraction using the Global Color Histogram (GCH), Color Coherence Vector (CCV), LBP, and Complete Local Binary Pattern (CLBP). These features are fed to the MSVM to classify the different apple diseases.
Later, in 2016, Dubey et al. [30] investigated the same set of apple images using a single combined feature descriptor instead of separate feature descriptors. First, the images are segmented using the K-means clustering technique to obtain color, texture, and shape features of the apples. Then, apple diseases are classified using MSVM with an average accuracy of 95.6%. In 2019, Ayyub et al. [31] obtained an accuracy of 96.29% classifying the same apple images, using a method that extracts features with the Improved Summation and Difference Histogram (ISADH), Complete Local Binary Patterns (CLBP), and Zernike Moments (ZM).
In addition to the works discussed above (i.e., classical/conventional ML), many researchers have turned to DL techniques such as Convolutional Neural Networks (CNNs), a popular approach to image recognition that has demonstrated outstanding ability in image processing and classification [24,32]. Wang et al. [33] evaluated the performance of transfer learning with pre-trained DL models to classify images of healthy apple leaves and apple leaf black Rot across three stages (i.e., early, middle, and final stage). According to the experimental results, the highest performance obtained is 90.4% with the VGG16 model. Furthermore, Alharbi and Arif [25] collected 800 images for each of four classes (i.e., Blotch, Rot, Scab, and healthy). They further augmented the dataset using basic operations such as flipping, scaling, cropping, and illumination changes to generate 3200 images per class.
Several other similar studies have been conducted by different researchers. For instance, Nachtigall et al. [34] proposed a method to detect and classify nutritional deficiencies and herbicide damage in apple trees using leaf images and AlexNet. Liu et al. [35] worked on the identification of four types of apple leaf diseases (i.e., Mosaic, Rust, Brown spot, and Alternaria leaf spot) based on AlexNet. Al-Shawwa and Abu-Naser [36] implemented a method for the classification of 13 different apple species using a Deep Convolutional Neural Network (DCNN). Turkoglu et al. [37] applied a hybrid method combining the Long Short-Term Memory (LSTM) architecture with pre-trained DCNN models; features are extracted with transfer learning and fed to the LSTM to detect pests and diseases with an average accuracy of 99.2%. The higher accuracies are obtained using DL approaches for the classification of apple leaf diseases. One of the most prominent advantages of the DCNN is its ability to extract non-linear features [38,39]. A DCNN automatically detects the essential features without any human supervision [40,41] and is computationally efficient, since it uses particular convolution and pooling operations and performs parameter sharing. Additionally, DCNNs can solve difficult applications such as classification, image segmentation, and location inspection [42][43][44][45][46][47][48][49].
To the best of our knowledge, very few existing approaches directly handle the apple disease classification problem addressed in this article. Therefore, this paper proposes an approach to classify apple diseases based on two deep architectures: a Deep Convolutional Generative Adversarial Network (DCGAN) for data generation and a DCNN for classification. The DCGAN model is used to overcome the limited availability of validated apple disease images. Thus, the DCGAN generates new images (to some extent similar to the originals), which helps train the DCNN model to obtain higher accuracy in apple disease classification.
The rest of the paper is structured as follows: Section 2 explains the materials and methods used in this study; Section 3 presents the experimental results and discussion; finally, Section 4 concludes the paper with possible future research directions.

Materials and Methods
This section explains the dataset used in this study and the generation of synthetic images using deep convolutions. Furthermore, this section also discusses the deep learning architecture used for the classification of apple diseases.

Dataset Description
The original dataset used in this work contains 319 images of apples (80 each for healthy, Blotch, and Rot, and 79 for Scab); some show healthy apples, while the rest show apples affected by one of the three diseases: Blotch, Rot, or Scab. These images are obtained from a subset of the dataset used in [7,9,30,31], which is publicly available at the Kaggle repository. This small dataset is then used to create 4000 synthetic images through the Deep Convolutional Generative Adversarial Network (DCGAN) architecture, with 1000 images generated for each of the four categories to construct an optimal DL solution.

Deep Convolutional Generative Adversarial Network (DCGAN)
As explained in Section 2.1, the DCGAN architecture is used to generate synthetic images to train a DL model. A DCGAN implicitly learns the distribution of the data contained in a set of sample images and creates new images drawn from the learned distribution. DCGAN is faster than the traditional GAN architecture because it uses a DCNN instead of an ANN, which increases stability and convergence [50]. The model used in this study consists of two DCNNs that are trained simultaneously: the first is the generator and the second is the discriminator, and both work with only convolutional layers to learn spatial upsampling and downsampling, respectively. The main aim of the generator is to transform noise into an image that fools the discriminator into classifying it as real, while the discriminator aims to identify whether an image is fake or real. At the end of training, the generator can produce images that are indistinguishable from real data, recreating the original data distribution [51].
The discriminator model is a convolutional process that eliminates the fully connected layer and uses LeakyReLU as an activation function to compress an image into a feature vector. The generator model is a deconvolution process in which all activation functions are LeakyReLU, except for the output layer, which uses tanh. The overall network is trained through Equation (1) [52]:

$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$ (1)

where G and D represent the generator and the discriminator, $x$ is a real sample, and $G(z)$ is a sample the generator produces from the noise vector $z$. D defines the training target for G by distinguishing between real samples drawn from $p_{\mathrm{data}}(x)$ and generated ones drawn from $p_z(z)$; the generator, in turn, tries to confuse the discriminator into predicting that the generated data is real. Figure 1 shows the DCGAN structure used in this work to increase the size of the dataset. The generative network consists of three deconvolution layers of 128, 64, and 3 filters, respectively, each with a 5 × 5 kernel. The first two layers use the LeakyReLU activation function and Batch Normalization (BN), which normalizes the layer input by adjusting and scaling the activations, while the last layer uses tanh as its activation function. The discriminative network consists of two convolutional layers of 64 and 128 filters with 5 × 5 kernels, both using the LeakyReLU activation function. Further information regarding the layer-wise settings can be found in Table 1.
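To make this recipe concrete, the following Keras sketch builds the two networks with the layer settings stated above (three 5 × 5 deconvolution layers of 128, 64, and 3 filters for the generator; two 5 × 5 convolution layers of 64 and 128 filters for the discriminator). The latent dimension, the generator's dense/reshape stem, the discriminator's final real/fake unit, and the optimizer choice are our assumptions, since only the convolutional stages are specified in the text:

    from tensorflow.keras import layers, models

    def build_generator(latent_dim=100):
        # Three deconvolution layers (128, 64, 3 filters; 5x5 kernels): the
        # first two use Batch Normalization + LeakyReLU, the output uses tanh.
        # The latent size and the dense/reshape stem are assumptions.
        return models.Sequential([
            layers.Dense(8 * 8 * 128, input_dim=latent_dim),
            layers.Reshape((8, 8, 128)),
            layers.Conv2DTranspose(128, (5, 5), strides=2, padding="same"),
            layers.BatchNormalization(),
            layers.LeakyReLU(0.2),
            layers.Conv2DTranspose(64, (5, 5), strides=2, padding="same"),
            layers.BatchNormalization(),
            layers.LeakyReLU(0.2),
            layers.Conv2DTranspose(3, (5, 5), padding="same", activation="tanh"),
        ])

    def build_discriminator(input_shape=(32, 32, 3)):
        # Two convolutional layers (64 and 128 filters; 5x5 kernels) with
        # LeakyReLU; the single sigmoid real/fake unit is an assumption.
        return models.Sequential([
            layers.Conv2D(64, (5, 5), strides=2, padding="same",
                          input_shape=input_shape),
            layers.LeakyReLU(0.2),
            layers.Conv2D(128, (5, 5), strides=2, padding="same"),
            layers.LeakyReLU(0.2),
            layers.Flatten(),
            layers.Dense(1, activation="sigmoid"),
        ])

    # Standard adversarial assembly: the discriminator is frozen while the
    # generator is updated through the combined model.
    disc = build_discriminator()
    disc.compile(optimizer="adam", loss="binary_crossentropy")
    gen = build_generator()
    disc.trainable = False
    gan = models.Sequential([gen, disc])
    gan.compile(optimizer="adam", loss="binary_crossentropy")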

Deep Convolutional Neural Network
Nowadays, Deep Convolutional Neural Networks (DCNNs) have been explored for a variety of 2D and 3D datasets [26]. AlexNet [53], ResNet, Mini-VGGNet, and SqueezeNet are considered state-of-the-art DL networks for medical imaging, food processing, and fruit disease detection tasks. Therefore, in this work, the AlexNet architecture is used as the base model; it applies convolutional operations to the input data using 2D kernels, as presented in Equation (2), to extract the feature maps (i.e., output maps) [54]:

$A_{nm}^{xy} = f\left(b_{nm} + \sum_{c=0}^{C-1} \sum_{r=0}^{R-1} K_{nm}^{cr}\, A_{(n-1)m}^{(x+c)(y+r)}\right)$ (2)

where $A_{nm}^{xy}$ is the output feature at position $[x, y]$, $n$ indexes the layer, $m$ indexes the feature map, $b_{nm}$ is the bias, and $K_{nm}^{cr}$ gives the value at $(c, r)$ of the kernel connected to the $m$-th feature map, with $C$ and $R$ being the height and width of the kernel. Finally, $f(\cdot)$ is the activation function (ReLU in our case). Normally, the disease region is smaller than the rest of the apple, as shown in Figure 2 and Table 1. Given this, we first automatically focus feature extraction on the disease region and confine a fully connected feature map at the FC (fully connected) layer for classification. Further information regarding the layer-wise settings can be found in Table 1.
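As a worked illustration of Equation (2), the following NumPy sketch (with made-up values) evaluates a single output activation for one 3 × 3 kernel position:

    import numpy as np

    # One output value of a feature map: a 3x3 input patch convolved with a
    # 3x3 kernel (C = R = 3), plus a bias, passed through ReLU.
    patch = np.array([[0.2, 0.5, 0.1],
                      [0.9, 0.4, 0.3],
                      [0.7, 0.6, 0.8]])
    kernel = np.array([[ 0.1, -0.2,  0.0],
                       [ 0.3,  0.5, -0.1],
                       [-0.4,  0.2,  0.1]])
    bias = 0.05

    # A^{xy}_{nm} = f(b_{nm} + sum_{c,r} K^{cr}_{nm} * A^{(x+c)(y+r)}_{(n-1)m})
    a_xy = max(0.0, bias + float(np.sum(kernel * patch)))  # ReLU
    print(round(a_xy, 2))  # 0.33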

Optimizer and Loss Function
In DL, the step size, also referred to as the learning rate, is one of the most critical hyper-parameters. For example, a large step size may cause training to diverge instead of converge, while a small one may make the network take longer to converge. For these reasons, several optimization algorithms were considered during training, including Adam, AdaGrad, Adamax, and Nadam, among others. In this study, categorical cross-entropy is employed as the loss function for multi-class classification. The comparison within the classes can be computed through Equation (3) [55]:

$CE = -\sum_{i=1}^{N} p_i \log(P_i)$ (3)

where $P_i$ and $p_i$ represent the expected (predicted) and the target values for each of the $N$ classes, respectively.
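As an illustration of Equation (3), the following sketch evaluates the loss for a single sample over the four apple classes; the probabilities are illustrative, not results from this study:

    import numpy as np

    # One-hot target p and softmax prediction P over N = 4 classes
    # (healthy, Blotch, Rot, Scab).
    p = np.array([0.0, 1.0, 0.0, 0.0])      # target (true class: Blotch)
    P = np.array([0.10, 0.75, 0.10, 0.05])  # expected (predicted) values

    loss = -np.sum(p * np.log(P))           # categorical cross-entropy
    print(round(loss, 4))                   # -log(0.75) = 0.2877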

Evaluation Metrics
Overall accuracy (OA) is used rigorously for comparative analysis in the literature; this work uses the same metric to analyze the generalization performance of our proposed model. The OA can be computed as follows:

$OA = \frac{\sum_{i=1}^{C} TP_i}{N}$

where $TP_i$ is the number of true positives for class $i$, $C$ is the total number of classes, and $N$ is the total number of test samples. In addition to the OA, we also performed a statistical test, the z-test. In this test, the confidence interval is a type of statistical estimation in which an interval is associated with a level of confidence concerning the true parameters of the proposed model. The confidence interval is obtained from the given observations, i.e., it contains the true underlying parameter with a stated probability. There are many possible confidence levels; a 90% confidence interval, for example, refers to hypothetical repeated data collection and estimates the population parameter accordingly. It is therefore required to choose an appropriate confidence level before examining the data. In a nutshell, a 95% confidence level is used in this work, although confidence levels of 99% or 90% are also often used for several applications. The confidence interval is then computed in the following steps.

1. Compute the sample mean $\bar{x}$ of the $n$ observations.
2. Identify whether the standard deviation $\sigma$ is known; otherwise, compute the sample standard deviation, i.e., $\delta$.
• If the standard deviation is known, the critical value is $z^* = \Phi^{-1}(1 - \frac{\alpha}{2})$, where $C = 100(1 - \alpha)\%$ is the confidence level and $\Phi$ is the cumulative distribution function of the standard normal distribution.
• If the standard deviation is unknown, the $t$ distribution is used for the critical value, which depends on the confidence level $C$ and the degrees of freedom (DoF). The DoF is found by subtracting one from the number of observations, i.e., $(n - 1)$. The critical values are as follows: $C = 99\%$, $z^* = 2.576$; $C = 98\%$, $z^* = 2.326$; $C = 95\%$, $z^* = 1.96$; and $C = 90\%$, $z^* = 1.645$. Thus, the critical value can be expressed as $t^* = t_\alpha(r)$, where $r$ is the degrees of freedom and $\alpha = \frac{1 - C}{2}$.
3. Plug the values into the appropriate equation:
• For a known standard deviation: $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
• For an unknown standard deviation: $\bar{x} \pm t^* \frac{\delta}{\sqrt{n}}$
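As a quick numerical illustration of the known-standard-deviation case, the sketch below computes a 95% confidence interval; the mean accuracy, deviation, and number of observations are placeholders rather than values from this study:

    import math

    mean_acc = 0.96   # sample mean (e.g., mean accuracy over folds)
    sigma = 0.02      # known standard deviation
    n = 10            # number of observations
    z_star = 1.96     # critical value for C = 95%

    margin = z_star * sigma / math.sqrt(n)
    print(f"{mean_acc:.3f} +/- {margin:.3f}")  # 0.960 +/- 0.012
    # For an unknown sigma, replace z_star with t_alpha(n - 1) and sigma
    # with the sample standard deviation.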

Results and Discussion
The dataset was divided into 70% for training and 30% for blind testing. The 70% training portion is further divided 90/10% into training and validation sets using a 10-fold cross-validation process. Therefore, the DCGAN-DCNN is trained and validated on 3023 images and tested on the remaining 1295 images, each with a spatial size of 32 × 32 × 3.
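The partitioning described above can be sketched as follows; the arrays are random stand-ins for the actual images and labels, and the stratification and seeding are our assumptions:

    import numpy as np
    from sklearn.model_selection import StratifiedKFold, train_test_split

    # Stand-in data: 4319 images (319 original + 4000 generated) of size
    # 32x32x3 with four integer class labels.
    X = np.random.rand(4319, 32, 32, 3).astype("float32")
    y = np.random.randint(0, 4, size=4319)

    # 70% training / 30% blind testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=42)

    # 10-fold cross-validation within the training portion; each fold is a
    # 90/10 train/validation split of the 70% partition.
    kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    for train_idx, val_idx in kfold.split(X_train, y_train):
        X_tr, y_tr = X_train[train_idx], y_train[train_idx]
        X_val, y_val = X_train[val_idx], y_train[val_idx]
        # ...train on (X_tr, y_tr), validate on (X_val, y_val)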
This section further illustrates the experimental evaluation of apple disease classification through several experiments along with a statistical test. All the listed experiments are performed on the online platform Google Colab using the Jupyter environment as the back-end [56]. The run-time environment is a GPU with a Python 3 notebook, 25 GB of Random Access Memory (RAM), and 358.27 GB of cloud storage for data computation.

Initially, the experiments are performed on the original data (without the synthetic images) with a size of 350 × 350, and then further analyzed on the DCGAN dataset with a size of 32 × 32. Some examples of the generated images are shown in Figure 3. There are similar features between the generated and the original images. However, it is important to highlight that the generated images are of comparatively low resolution due to the filter sizes, which is a consequence of limited resources. Larger filter sizes may produce more accurate images with higher resolution; however, as explained earlier, this would require more powerful computational resources.

The structure of the DCNN model used in this study is a sequential model, since it has equivalent dimensions for each input and output. A 2D kernel of size 3 × 3 passes the filters throughout the image for the convolution operation in each convolutional layer. To stride down and reduce noisy features from the segmented images, a max-pooling layer is used with ReLU. Further information regarding the layer-wise settings can be found in Table 1. After the convolutional operators, a flatten layer with 205,056 features is used with a dropout of 0.25 to cope with over-fitting. These flattened features are then passed through dense (FC) layers to yield the four class labels for apple disease detection with softmax, using several optimizers (i.e., Adam, Adamax, Nadam, and Adadelta).
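A sketch of this sequential DCNN is given below. The 3 × 3 kernels, max pooling, dropout of 0.25, and four-class softmax follow the description above, and the filter counts follow the feature maps in Figure 5 (128, 64, 64, 32, 32); the pooling placement and dense-layer sizes are assumptions, since the exact layer-wise settings are in Table 1:

    from tensorflow.keras import layers, models

    def build_dcnn(input_shape=(32, 32, 3), num_classes=4):
        # Five 3x3 convolutional layers with ReLU; pooling placement and
        # dense sizes are assumptions (exact settings are in Table 1).
        return models.Sequential([
            layers.Conv2D(128, (3, 3), padding="same", activation="relu",
                          input_shape=input_shape),
            layers.MaxPooling2D((2, 2)),
            layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
            layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
            layers.MaxPooling2D((2, 2)),
            layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
            layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
            layers.Flatten(),
            layers.Dropout(0.25),                     # as described above
            layers.Dense(256, activation="relu"),     # assumed size
            layers.Dense(128, activation="relu"),     # assumed size
            layers.Dense(num_classes, activation="softmax"),  # decision layer
        ])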
For DL models, the learning rate is sensitive; therefore, in this work, we set a standard step size for each optimizer during training: Adam = 0.001, Adadelta = 1.0, Nadam = 0.002, and Adamax = 0.002. The Beta 1 and Beta 2 parameters are kept close to 1 for Adam, Nadam, and Adamax; similarly, the ρ parameter for Adadelta is greater than 0, with no decay parameter. Moreover, for the DCNN model, the loss function is compiled as categorical cross-entropy to separate the diseases into four classes. The DCGAN model works on the small spatial size of each class, i.e., 32 × 32 × 3, and thus converges in fewer iterations, i.e., 10 epochs for each optimizer, requiring some tuning of its structure, as shown in Figure 4; for the larger size, the original dataset is compiled and trained using 35 epochs.

Compared with conventional methods that mostly rely on hand-crafted features, the DCNN model achieved higher statistical significance and accuracy. This is because deep models provide non-linear features, as shown in Figure 5, that preserve significant spatial information about the object; a rotten apple image (Figure 5a) is used as a visual example. Figure 5b-f presents the feature maps learned by applying 128, 64, 64, 32, and 32 filters, respectively. As mentioned above, DL extracts features in greater depth, such as edges, color, corners, and shape, rather than relying on conventional segmentation- and clustering-based methods. For instance, Figures 6a, 7a and 8a present the input image, whereas Figures 6b, 7b and 8b show the results obtained using a binary segmentation process. Meanwhile, Figures 6c, 7c and 8c present the output obtained through multilevel segmentation. For all these experiments, Otsu's global thresholding method is used for binary segmentation, whereas multilevel segmentation is obtained through three-point K-means clustering for each RGB color sequence. Multilevel segmentation is commonly used to highlight defects by partitioning the images into different clusters [7,30]. All these results are compared with the DCNN model, as shown in Figures 6d, 7d and 8d. From these results, one can conclude that the proposed model obtains significantly better results than binary and multilevel segmentation, as well as several other hand-crafted feature-based classification methods.

In previous studies, the accuracy of apple disease classification was examined with multiple techniques. These techniques extract features to form a feature descriptor and achieve accuracy rates of 93% to 95.6%. Classification has also been analyzed through color, texture, and shape-based features. All of these studies used an MSVM as the baseline classifier to distinguish the different apple diseases. The comparative study shown in Table 2 reports the accuracy achieved by the DCNN with several optimizers. Adam's learning rate converges more effectively for this task than the other optimizers and produces remarkable results compared to several conventional hand-crafted feature-based classification techniques.
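The per-optimizer configuration described here can be expressed as follows; the optimizer names and learning rates are taken from the text, and the model builder is the sketch above:

    from tensorflow.keras.optimizers import Adadelta, Adam, Adamax, Nadam

    # Learning rates as reported above; one optimizer is chosen per run.
    optimizers = {
        "Adam": Adam(learning_rate=0.001),
        "Adadelta": Adadelta(learning_rate=1.0),
        "Nadam": Nadam(learning_rate=0.002),
        "Adamax": Adamax(learning_rate=0.002),
    }

    model = build_dcnn()  # from the sketch above
    model.compile(optimizer=optimizers["Adam"],
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])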
Moreover, state-of-the-art deep learning models such as ResNet (https://www.kaggle.com/yadavsarthak/residual-networks-and-mnist (accessed on 25 February 2021)), SqueezeNet (https://www.kaggle.com/somshubramajumdar/squeezenet-for-mnist (accessed on 26 February 2021)), and MiniVGGNet (https://www.pyimagesearch.com/2019/02/11/fashion-mnist-with-keras-and-deep-learning (accessed on 11 February 2019)) have been analyzed in comparison with the proposed DCGAN-DCNN model. All of these models involve several convolutional operations and require careful configuration of the numbers of kernels and filters. The layer-wise settings for these models, along with the proposed model, are presented in Table 3. All the competing methods are analyzed with different optimizers through a 10-fold cross-validation process. From the experimental results, one can conclude that the proposed DCGAN-DCNN model outperforms the other state-of-the-art models with an overall accuracy of 99.99%. The complete pipeline comparison with the abovementioned methods can be seen in Table 2.

Conclusions
This study proposed a DCGAN-DCNN model for the classification of apple diseases, i.e., Blotch, Rot, and Scab. The DCNN structure consists of five convolutional layers, two dense layers, and one decision vector layer to classify the apple disease. Experimental results reveal that the proposed model outperformed several conventional and state-of-the-art deep models. However, the learning rate and optimizer have a strong influence; therefore, an appropriate selection of these two essential hyper-parameters is critical to obtain better results. Future research entails incorporating soft and hard attention mechanisms into deep models for apple disease classification.

Data Availability Statement: The dataset can be found at https://www.kaggle.com/kaivalyashah/apple-disease-detection.

Conflicts of Interest:
The authors declare no conflict of interest.