An Automated Hyperparameter Tuning Recurrent Neural Network Model for Fruit Classification

Abstract: Automated fruit classification is a challenging problem in the fruit-growing and retail industrial chain, as it assists fruit growers and supermarket owners to recognize the variety of fruits and the status of the container or stock in order to increase business profit and production efficacy. As a result, intelligent systems using machine learning and computer vision approaches have been explored for ripeness grading, fruit defect categorization, and identification over the last few years. Recently, deep learning (DL) methods for classifying fruits have led to promising performance, effectively extracting the features and carrying out end-to-end image classification. This paper introduces an Automated Fruit Classification using Hyperparameter Optimized Deep Transfer Learning (AFC-HPODTL) model. The presented AFC-HPODTL model employs contrast enhancement as a pre-processing step, which helps to enhance the quality of the images. For feature extraction, the Adam optimizer with the deep transfer learning-based DenseNet169 model is used, in which the Adam optimizer fine-tunes the initial values of the DenseNet169 model. Moreover, a recurrent neural network (RNN) model is utilized for the identification and classification of fruits. At last, the Aquila optimization algorithm (AOA) is exploited for the optimal hyperparameter tuning of the RNN model in such a way that the classification performance is improved. The design of the Adam optimizer and AOA-based hyperparameter optimizers for the DenseNet and RNN models shows the novelty of the work. The performance validation of the presented AFC-HPODTL model is carried out utilizing a benchmark dataset, and the outcomes report promising performance over recent state-of-the-art approaches.


Introduction
Automatic fruit classification is an intriguing challenge in the fruit-growing and retail industrial chain, since it helps fruit producers and supermarkets to recognize various fruits and their condition in containers or stock, with a view to improving manufacturing effectiveness and business revenue [1]. Thus, intelligent systems making use of machine learning (ML) approaches and computer vision (CV) have been applied to fruit defect recognition, ripeness grading, and classification in the last decade [2].
In automated fruit classification, two main methods have been investigated: conventional CV-related methodologies and deep learning (DL)-related methodologies. The conventional CV-oriented methodologies first derive low-level features and then perform image classification through conventional ML approaches, while the DL-related techniques derive the features efficiently and execute end-to-end image classification [3]. In conventional image processing and CV approaches, image features such as shape, texture, and color are utilized as the input units for fruit classification.
Previously, fruit processing and sorting depended on manual techniques, leading to considerable waste of labor [4]. Moreover, such techniques require costly devices (various kinds of sensors) and professional operators, and their overall precision is typically less than 85% [5]. With the rapid advancement of 4G communication and the widespread adoption of mobile Internet devices, individuals have created a large number of videos, sounds, images, and other data, and image identification technology has gradually matured [6].
Image-based fruit recognition has gained the interest of researchers because of its inexpensive devices and extraordinary performance [7]. At the same time, automated tools must be designed to handle unplanned scenarios such as the accidental mixing of fresh products, fruit placement in unusual packaging, different lighting conditions, or spider webs on the lens; such situations may also cause uncertainty in the model results. The intelligent recognition of fruit can be utilized not only in the earlier fruit-picking stage but also in the subsequent processing and sorting phase [8]. Fruit identification technology based on DL can substantially enhance the performance of fruit identification and has a positive impact on fostering the advancement of smart agriculture. In comparison with techniques combining hand-crafted features and conventional ML, DL derives features automatically and achieves superior outcomes, gradually emerging as the general methodology of intelligent recognition [9]. In particular, the convolutional neural network (CNN) is one of the vital DL models utilized for image processing. It is a type of artificial neural network (ANN) that uses the convolution operation in at least one of its layers. Recently, CNNs have received significant attention in the image classification process. Specifically, in the agricultural sector, CNN-based approaches have been utilized for fruit classification and fruit detection [10].
This paper introduces an Automated Fruit Classification using Hyperparameter Optimized Deep Transfer Learning (AFC-HPODTL) model. The presented AFC-HPODTL model employs contrast enhancement as a pre-processing step, which helps to improve the quality of the images. Next, the Adam optimizer is used with the deep transfer learning-based DenseNet169 model. Moreover, the Aquila optimization algorithm (AOA) with a recurrent neural network (RNN) model is utilized for the identification and classification of fruits. The performance validation of the presented AFC-HPODTL model is carried out using a benchmark dataset, and the results are examined under different aspects. In summary, the contribution of the paper is as follows:

•
An intelligent AFC-HPODTL model comprising pre-processing, Adam with DenseNet169-based feature extraction, RNN classification, and AOA-based hyperparameter tuning is presented. To the best of our knowledge, the AFC-HPODTL model has never been presented in the literature.

•
Hyperparameter tuning of the DenseNet169 and RNN models takes place using the Adam optimizer and AOA techniques, respectively, which considerably enhances the fruit classification performance and shows the novelty of the work.

•
The performance of the proposed AFC-HPODTL model is validated on two open databases, and the results demonstrate better performance over other DL models.
The rest of the paper is organized as follows. Section 2 offers a detailed literature review of existing fruit classification models. Next, Section 3 introduces the proposed AFC-HPODTL model and Section 4 provides the experimental result analysis. Finally, Section 5 concludes the study.

Related Works
In [11], the authors suggest an effective structure for fruit classification with the help of DL. Most importantly, the structure depends on two distinct DL architectures: one is a proposed light model of six CNN layers, and the other is a fine-tuned Visual Geometry Group-16 (VGG-16) pretrained DL method. Rojas-Aranda et al. [12] provide an image classification technique based on a lightweight CNN for the purpose of speeding up the checkout process in shops. A novel image dataset is presented with three types of fruits, with and without plastic bags. The input units are the RGB histogram, the RGB centroid obtained from K-means clustering, and a single RGB color. In [13], the researchers suggested a new fruit classification method that uses Long Short-Term Memory (LSTM) and RNN structures together with CNN features. Type-II fuzzy enhancement was further utilized as a preprocessing device for improving the images. Furthermore, TLBO-MCET was used to tune the hyperparameters of the suggested method.
In [14], the researchers advanced a hybrid DL-based fruit image classification structure called attention-based densely connected convolutional network with convolutional auto-encoder (CAE-ADN), which employs a CAE for pretraining on the images and leverages an attention-based DenseNet for extracting the image features. In the opening portion of the structure, an unsupervised technique with a group of images is applied to pretrain the greedy layer-wise CAE. In the next portion of the structure, the supervised ADN with the ground truth is applied, and the structure's last portion estimates the classes of the fruits. Kumari and Gomathy [15] recommend a classical method that utilizes texture and color features for fruit classification. The conventional fruit classification technique relies on manual work based on visual ability. The classification is performed with the help of a Support Vector Machine (SVM) classifier based on co-occurrence and statistical features extracted from the wavelet transform.
In [16], a 13-layer CNN was devised. Three categories of data augmentation methods are employed: noise injection, image rotation, and Gamma correction. The researchers compared average pooling and max pooling. Stochastic gradient descent with momentum is utilized for training the CNN with a minibatch size of 128. In [17], a fruit image classification technique based on the lightweight neural network MobileNetV2 and a transfer learning (TL) method is employed for recognizing fruit images. They leveraged a MobileNetV2 network pretrained on the ImageNet dataset as a base system after replacing the topmost layer of the base system with a Softmax classifier and a conventional convolution layer, and applied dropout to the newly added convolution layer to reduce overfitting. The pretrained MobileNetV2 is utilized for extracting features, and the Softmax classifier is utilized for classifying them.
The researchers in [18] provide an extensive review of the hyperparameter tuning of CNN models by the use of nature-inspired algorithms, giving an overview of various CNN approaches utilized for image classification, segmentation, and styling. Next, in [19], the mathematical relationship between four hyperparameters, namely the learning rate, batch size, dropout rate, and convolution kernel size, was investigated in detail. A generalized multi-parameter mathematical correlation approach was derived, showing that the hyperparameters play a vital part in the efficiency of NN models. Guo et al. [20] introduced a distributed particle swarm optimization (DPSO) algorithm for hyperparameter tuning of CNN models. Compared with manual designs based on historical experience and personal preference, the DPSO algorithm effectually chooses the hyperparameters of the CNN model. In addition, the DPSO algorithm has shown significant improvement over the conventional PSO algorithm.
Several fruit classification models exist in the literature. Despite the development of ML and DL models in previous works, it is still necessary to boost fruit classification performance. Due to the continual deepening of the models, the number of parameters of DL models increases rapidly, resulting in model overfitting. Moreover, different hyperparameters have a significant impact on the efficiency of the CNN model; in particular, the selection of the epoch count, batch size, and learning rate is important for achieving effective results. As trial-and-error hyperparameter tuning is a tedious and error-prone process, metaheuristic algorithms can be applied. Therefore, in this work, we employ the Adam optimizer and the AOA for the parameter selection of the DenseNet169 and RNN models, respectively.

The Proposed Model
In this study, a new AFC-HPODTL model was developed for the automatic identification and classification of fruits. The presented AFC-HPODTL model comprises a series of processes namely pre-processing, DenseNet169 feature extraction, Adam optimizer, RNN classification, and AOA hyperparameter optimization. Figure 1 illustrates the overall process of the AFC-HPODTL algorithm.

Contrast Enhancement
Initially, the presented AFC-HPODTL model employs contrast enhancement as a pre-processing step, which helps to improve the quality of the image. Contrast-limited adaptive histogram equalization (CLAHE) differs from AHE in that it takes care of the over-amplification of contrast. CLAHE operates on small regions of the image, called tiles, rather than on the entire image. The adjacent tiles are then combined using bilinear interpolation to remove the artificial boundaries. This technique is executed to improve the contrast of the images.
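The tile-wise, clip-limited equalization described above can be sketched as follows. This is a simplified NumPy-only illustration: the tile size and clip limit are illustrative assumptions, and the bilinear blending of adjacent tile mappings that full CLAHE performs is omitted for brevity.

```python
import numpy as np

def clipped_hist_equalize(tile, clip_limit=40, n_bins=256):
    """Histogram-equalize one tile, clipping bin counts to limit
    over-amplification of contrast (the 'CL' in CLAHE)."""
    hist, _ = np.histogram(tile, bins=n_bins, range=(0, 256))
    excess = np.maximum(hist - clip_limit, 0).sum()
    hist = np.minimum(hist, clip_limit) + excess // n_bins  # redistribute excess
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1) * 255.0
    return cdf[tile].astype(np.uint8)

def clahe_like(image, tile_size=64, clip_limit=40):
    """Apply clipped equalization tile by tile. Full CLAHE would blend
    adjacent tile mappings with bilinear interpolation to remove the
    artificial tile boundaries; that step is omitted in this sketch."""
    out = np.empty_like(image)
    h, w = image.shape
    for y in range(0, h, tile_size):
        for x in range(0, w, tile_size):
            t = image[y:y + tile_size, x:x + tile_size]
            out[y:y + tile_size, x:x + tile_size] = clipped_hist_equalize(t, clip_limit)
    return out
```

In practice, a library implementation (e.g., an existing CLAHE routine) would be preferred, since the interpolation step is what removes the visible tile seams.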

Feature Extraction
To extract feature vectors from the pre-processed fruit images, the DenseNet169 model is employed. CNN structures have two bases, namely the convolution and classification bases. The convolution base contains three important kinds of layers, namely the convolution, activation, and pooling layers [21]. These layers are utilized for discovering the fundamental features of the input images, called feature maps (FMs). An FM is obtained by applying convolutional procedures to the input images or prior features using linear filtering and the integration of a bias term. Afterward, this FM is passed through a nonlinear activation function such as Sigmoid or ReLU. Conversely, the classification base comprises dense layers integrated with an activation layer, which convert the FMs into a 1D vector to expedite the classification task using several neurons. Generally, one or more dropout layers are used along with the classification base to minimize the overfitting that CNN structures encounter and to enhance their generalization. Adding a dropout layer to the classification base introduces a new hyperparameter, the dropout rate, which is usually fixed in the range of 0.1-0.9.
DenseNet is one of the more recent additions to the neural networks used for visual object detection. DenseNet169 is a member of the DenseNet family [22], which is designed for image classification, and DenseNet169 performs better than the rest of the family. Typically, a DenseNet is trained on every image; however, the model can also be trained on the ImageNet image database and stored, then tested by loading the saved model rather than retraining on ImageNet. In DenseNet, the output of each earlier layer is concatenated with the input of the subsequent layers. DenseNet has been shown to counteract the accuracy degradation in very deep networks caused by vanishing gradients, where the long path between the input and output layers makes the information vanish before reaching its target. Based on recent findings, a convolutional network can be more efficient and accurate when the connections between layers close to the input and layers close to the output are shorter. DenseNet therefore connects all the layers in a feed-forward fashion. Generally, a classical convolutional network with L layers has L connections, one between each layer and its following layer.
This yields L(L + 1)/2 direct connections in the network. Each layer uses the feature maps of all preceding layers as input, and its own feature maps are used as input to every subsequent layer. Several benefits are obtained from DenseNet: it alleviates the vanishing gradient problem, strengthens feature propagation, encourages feature reuse, and decreases the number of parameters. The presented structure is evaluated on the highly competitive ImageNet image recognition benchmark and also makes use of the save and load functions. Concatenation of layers is feasible only when the feature map dimensions match exactly at the time of concatenation or addition. DenseNet is therefore divided into DenseBlocks with different numbers of filters, while within a block the dimensions remain the same. Batch normalization (BN) is executed together with down-sampling in the transition layers between the DenseBlocks, which is assumed to be a vital stage of the CNN. The growth rate, denoted by k, governs how the number of channels increases across the DenseBlocks and plays an important role in generalizing the l-th layer. The count of feature maps available as input to the l-th layer is measured by

k_l = k_0 + k × (l − 1),

where k_0 is the number of channels in the input layer. Here, the Adam optimizer fine-tunes the initial values of the DenseNet169 model. We employ Adam, an optimization approach used as a substitute for the traditional stochastic gradient descent algorithm, for updating the network weights on the training dataset [23]. Adam is derived from AdaGrad as a more adaptive technique; it effectively combines the ideas of AdaGrad and momentum.
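The dense connectivity and growth-rate behavior described above can be illustrated with a toy NumPy sketch, in which a random linear map stands in for the BN-ReLU-Conv operations of a real dense block (all names and sizes below are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def n_connections(L):
    """Direct connections in a dense block of L layers: L(L+1)/2."""
    return L * (L + 1) // 2

def dense_block(x, n_layers, growth_rate, rng):
    """Toy dense block on an (H, W, C) array: each 'layer' produces
    growth_rate new channels from the concatenation of ALL previous
    outputs, so the channel count grows as k_0 + k * l."""
    for _ in range(n_layers):
        c_in = x.shape[-1]
        w = rng.standard_normal((c_in, growth_rate)) / np.sqrt(c_in)
        new = np.maximum(x @ w, 0.0)           # ReLU on the new features
        x = np.concatenate([x, new], axis=-1)  # dense connectivity
    return x
```

After 4 layers with k_0 = 16 input channels and growth rate k = 12, the output carries 16 + 4 × 12 = 64 channels, matching the k_l formula above.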
Given the weights w(t) and loss L(t), where the index t specifies the present training iteration, the parameter update in Adam is given by

m_t = β_1 m_{t−1} + (1 − β_1) ∇L(t),
v_t = β_2 v_{t−1} + (1 − β_2) (∇L(t))^2,
w(t + 1) = w(t) − η m̂_t / (√v̂_t + ε), with m̂_t = m_t / (1 − β_1^t) and v̂_t = v_t / (1 − β_2^t),

where β_1 and β_2 denote the forgetting factors for the gradient and the second moment of the gradient, η is the learning rate, and ε is a small scalar utilized for preventing division by zero.
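A minimal NumPy sketch of the Adam update above follows; the default hyperparameter values are the commonly used ones, and this is an illustration rather than the exact training code of the paper:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: bias-corrected first and second moment
    estimates drive the parameter step. Returns (w, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

For instance, repeatedly applying this step to the gradient of f(w) = w^2 drives w toward the minimum at 0.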

Fruit Classification
In the final stage, the RNN model is utilized for the identification and classification of fruits. The presented technique makes use of the LSTM model, a special kind of RNN. In an RNN, the neurons are interconnected with one another through a directed cycle [24]. The RNN model processes the data sequentially, since it uses internal memory to process a series of inputs or words. The RNN applies the same operation to every element, with the output depending on each preceding input and the remembered state. Figure 2 depicts the structure of the RNN. Equation (7) characterizes the typical RNN structure:

h_t = f_w(h_{t−1}, x_t), (7)

where h_t indicates the new state at time t, f_w denotes a function with parameters w, h_{t−1} represents the older (preceding) state, and x_t signifies the input vector at time t.
Given that the activation function is tanh, the hidden-state weight is w_h, and the input vector is x_t, the state update can be written as h_t = tanh(w_h h_{t−1} + w_x x_t). Exploding or vanishing gradient problems arise while the gradient of the learning model is back-propagated through the network. A special kind of RNN model called LSTM is utilized for handling the vanishing gradient problem. The LSTM preserves long-term dependencies in an efficient manner by using three different gates, explained by the following expressions:

i_t = σ(W_i [h_{t−1}, x_t] + b_i),
f_t = σ(W_f [h_{t−1}, x_t] + b_f),
c̃_t = tanh(W_c [h_{t−1}, x_t] + b_c),
c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t,
o_t = σ(W_o [h_{t−1}, x_t] + b_o),
h_t = o_t ⊙ tanh(c_t).

In these formulas, b characterizes the bias vectors, W the weight matrices, and x_t the input vector at time t, whereas i, f, c, and o represent the input gate, forget gate, cell memory, and output gate, respectively.
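A single LSTM step with the gates above can be sketched in NumPy as follows. The stacked weight layout and the toy sizes are assumptions made for illustration, not the paper's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4*H, D+H), stacking the input (i),
    forget (f), candidate (g), and output (o) weights; b is (4*H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    i = sigmoid(z[0 * H:1 * H])   # input gate
    f = sigmoid(z[1 * H:2 * H])   # forget gate
    g = np.tanh(z[2 * H:3 * H])   # candidate cell memory
    o = sigmoid(z[3 * H:4 * H])   # output gate
    c_t = f * c_prev + i * g      # cell state update
    h_t = o * np.tanh(c_t)        # hidden state
    return h_t, c_t
```

Because h_t is the product of a sigmoid output and a tanh, every component of the hidden state lies strictly inside (−1, 1).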

Hyperparameter Tuning
In this study, the AOA is exploited to tune the hyperparameters of the RNN model, such as the learning rate, number of hidden layers, weight initialization, and decay rate. The AOA is a recent swarm intelligence approach [25]. The Aquila has four hunting strategies and can flexibly switch between them for different types of prey, using its fast speed together with its claws and sturdy feet to attack. The mathematical expressions of these strategies are summarized in the following steps.
Step 1: Expanded exploration (X_1): high soar with a vertical stoop. Here, the Aquila flies high above ground level and widely explores the search space; once it identifies the prey region, it takes a vertical dive. This behavior can be mathematically expressed as

X_1(t + 1) = X_best(t) × (1 − t/T) + (X_M(t) − X_best(t)) × r_1,

where X_best(t) signifies the best location obtained so far and X_M(t) represents the average location of all Aquilas in the present iteration. t and T indicate the current iteration and the maximum number of iterations, respectively, N denotes the population size, and r_1 refers to an arbitrary number in the range [0, 1].
Step 2: Narrowed exploration (X_2): contour flight with a short glide attack. This is the most popular hunting method of the Aquila: it descends within the designated area, flies around the prey, and applies a short glide to attack. The updated location is given by

X_2(t + 1) = X_best(t) × LF(D) + X_R(t) + (y − x) × r_2, (15)

where X_R(t) refers to an arbitrary location of an Aquila, D indicates the dimension size, and r_2 represents an arbitrary number in the range [0, 1]. LF(D) signifies the Levy flight function, given by

LF(D) = s × (u × σ) / |v|^(1/β), σ = [Γ(1 + β) × sin(πβ/2) / (Γ((1 + β)/2) × β × 2^((β−1)/2))]^(1/β),

where s and β are constant values equal to 0.01 and 1.5, respectively, and u and v stand for random numbers in the range [0, 1]. y and x describe the spiral shape in the search space, computed as

y = r × cos(θ), x = r × sin(θ), r = r_3 + U × D_1, θ = −ω × D_1 + 3π/2, (18)

where r_3 is the number of search cycles in the interval [1, 20], D_1 consists of integer numbers from 1 to the dimension size D, U is a small constant (0.00565 in the original AOA), and ω is equal to 0.005.
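The Levy flight function LF(D) used above can be sketched in NumPy as follows. The Mantegna-style sampling below draws u and v from normal distributions, which is the usual implementation; s and β follow the values stated above:

```python
import math
import numpy as np

def levy_flight(dim, s=0.01, beta=1.5, rng=None):
    """Levy flight step LF(D), sampled via the Mantegna method:
    step = s * u / |v|^(1/beta), with u ~ N(0, sigma^2), v ~ N(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u = rng.standard_normal(dim) * sigma
    v = rng.standard_normal(dim)
    return s * u / np.abs(v) ** (1 / beta)
```

The heavy-tailed steps occasionally jump far from the current position, which is what lets the narrowed exploration escape local regions.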
Step 3: Expanded exploitation (X_3): low flight with a slow descent attack. Here, once the prey region has been broadly identified, the Aquila descends vertically to execute a preliminary attack, using the designated region to get close to and attack the prey. This behavior can be mathematically modeled as

X_3(t + 1) = (X_best(t) − X_M(t)) × α − r_4 + ((UB − LB) × r_5 + LB) × δ, (19)

where X_best(t) represents the optimally attained location and X_M(t) indicates the average value of the present positions. α and δ signify exploitation fine-tuning parameters set to 0.1, UB and LB denote the upper and lower limits, and r_4 and r_5 refer to arbitrary values in the interval (0, 1).
Step 4: Narrowed exploitation (X_4): grabbing and walking prey. Here, the Aquila chases the prey along its escape trajectory and then attacks it on the ground. The mathematical expression of this behavior is

X_4(t + 1) = QF(t) × X_best(t) − (G_1 × X(t) × r_6) − G_2 × LF(D) + r_7 × G_1, (20)

where X(t) indicates the present location and QF(t) characterizes the quality function value that balances the search strategy. G_1 = 2r_8 − 1 represents the movement parameter of the Aquila while tracking prey, an arbitrary number in the range [−1, 1], and G_2 = 2 × (1 − t/T) signifies the flight slope while chasing prey, which linearly decreases from 2 to 0. r_6, r_7, and r_8 are arbitrary numbers in [0, 1].
The AOA system computes a fitness function (FF) for achieving higher classifier efficiency. It returns a positive value, with smaller values demonstrating better candidate outcomes. In this case, the classifier error rate to be minimized is assumed as the FF, given in Equation (21):

FF = ClassifierErrorRate = (number of misclassified samples / total number of samples) × 100. (21)
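The error-rate fitness of Equation (21) and one AOA position update (the expanded exploration of Step 1) can be sketched as follows. This is an illustrative fragment of how the tuning loop scores candidates, not the full AOA implementation, and the function names are our own:

```python
import numpy as np

def fitness(y_true, y_pred):
    """Classifier error rate used as the AOA fitness (Equation (21));
    smaller values indicate better candidate hyperparameters."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)) * 100.0)

def expanded_exploration_step(X, X_best, t, T, rng):
    """AOA Step 1 (high soar): move every candidate toward the best
    solution, with the pull shrinking as iterations progress (1 - t/T)."""
    X_mean = X.mean(axis=0)                    # average location X_M(t)
    r1 = rng.random((X.shape[0], 1))           # per-candidate random factor
    return X_best * (1 - t / T) + (X_mean - X_best) * r1
```

In a full tuner, each row of X would encode one candidate configuration (learning rate, hidden-layer count, etc.), decoded, used to train/evaluate the RNN, and scored with `fitness`.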

Result Analysis on Dataset 1
Dataset 1 (D1) is an openly accessible fruit and vegetable dataset comprising 15 classes, as shown in Table 1. Each class involves at least 75 images, resulting in 2633 images in total. These images were gathered at a resolution of 1024 × 768 pixels on distinct dates and times. The dataset is freely accessible in [26]. A few sample images from dataset 1 are showcased in Figure 3.

Table 2 reports the overall fruit classification results of the AFC-HPODTL model obtained on dataset 1. The results indicate that the AFC-HPODTL model obtained effective classification results on all datasets. For instance, on the entire dataset, the AFC-HPODTL model classified the 15 classes with average accuracy, precision, recall, F-score, MCC, and kappa score of 99.85%, 98.90%, 98.84%, 98.85%, 98.78%, and 98.76%, respectively. Afterward, with 70% of the training (TR) data, the AFC-HPODTL approach classified the 15 classes with average accuracy, precision, recall, F-score, MCC, and kappa score of 99.85%, 98.95%, 98.88%, 98.80%, 98.83%, and 98.77%, respectively. Similarly, with 30% of the testing (TS) data, the AFC-HPODTL algorithm classified the 15 classes with average accuracy, precision, recall, F-score, MCC, and kappa score of 99.84%, 98.72%, 98.77%, 98.70%, 98.64%, and 98.73%, respectively.

The training accuracy (TA) and validation accuracy (VA) attained by the AFC-HPODTL approach on dataset 1 are demonstrated in Figure 5. The experimental outcome shows that the AFC-HPODTL methodology gained maximal values of TA and VA; specifically, the VA appeared to be higher than the TA. The training loss (TL) and validation loss (VL) achieved by the AFC-HPODTL system on dataset 1 are shown in Figure 6. The experimental outcome infers that the AFC-HPODTL approach achieved the lowest values of TL and VL; specifically, the VL appeared to be lower than the TL.

Figure 7 provides a comprehensive comparison study of the AFC-HPODTL model with existing models [28] on dataset 1.
The results show that the NASNetMobile and MobileNetV1 models gave the worst fruit classification results, while the Inception v3 model gained a slightly increased classification outcome. The DenseNet121, VGG-16, and MobileNetV2 models reported moderately closer classification results. However, the AFC-HPODTL model gained the maximum performance, with accuracy, precision, recall, F1-score, and kappa score of 99.84%, 98.72%, 98.77%, 98.70%, and 98.73%, respectively.

Table 3. Comparative analysis of the AFC-HPODTL approach with existing algorithms on dataset 1 [28].

Result Analysis on Dataset 2
Dataset 2 (D2) is an Indian fruit dataset that involves 12 classes, as illustrated in Table 4. This is a balanced dataset, where each class has 1000 images, resulting in 12,000 images in total. All the images were obtained under various lighting, angle, and background conditions. The dataset is openly accessible in [27]. A few sample images from dataset 2 are demonstrated in Figure 8.

Table 5 demonstrates the overall fruit classification outcomes of the AFC-HPODTL approach obtained on dataset 2. The outcomes show that the AFC-HPODTL model obtained effectual classification outcomes on all datasets. For instance, on the entire dataset, the AFC-HPODTL approach classified the 12 classes with average accuracy, precision, recall, F-score, MCC, and kappa score of 99.63%, 97.79%, 97.78%, 97.78%, 97.58%, and 97.57%, respectively. Next, with 70% of the TR data, the AFC-HPODTL algorithm classified the 12 classes with average accuracy, precision, recall, F-score, MCC, and kappa score of 99.61%, 97.70%, 97.68%, 97.68%, 97.47%, and 97.47%, respectively. Similarly, with 30% of the TS data, the AFC-HPODTL methodology classified the 12 classes with average accuracy, precision, recall, F-score, MCC, and kappa score of 99.67%, 97.99%, 98.02%, 98.00%, 97.82%, and 97.82%, respectively.

The TA and VA attained by the AFC-HPODTL approach on dataset 2 are demonstrated in Figure 10. The experimental outcome shows that the AFC-HPODTL methodology gained maximal values of TA and VA; specifically, the VA appeared superior to the TA. The TL and VL achieved by the AFC-HPODTL system on dataset 2 are shown in Figure 11. The experimental outcome exposed that the AFC-HPODTL approach achieved the lowest values of TL and VL; specifically, the VL appeared to be lower than the TL.

Table 6 and Figure 12 illustrate a comprehensive comparison analysis of the AFC-HPODTL algorithm with existing approaches on dataset 2 [28].
The outcomes demonstrate that the NASNetMobile and MobileNetV1 techniques gave the worst fruit classification results, while the Inception v3 model gained somewhat superior classification outcomes. Likewise, the DenseNet121, VGG-16, and MobileNetV2 approaches reported moderately closer classification results. Eventually, the AFC-HPODTL system showed the highest performance, with accuracy, precision, recall, F1-score, and kappa score of 99.67%, 97.99%, 98.02%, 98.00%, and 97.82%, respectively.

Table 6. Comparative analysis of the AFC-HPODTL approach with existing algorithms on dataset 2 [28].

From the detailed results and discussion, it is apparent that the AFC-HPODTL model accomplished the maximum fruit classification results over the other models.

Conclusions
In this study, a new AFC-HPODTL model was developed for the automatic identification and classification of fruits. The presented AFC-HPODTL model comprises a series of processes, namely pre-processing, DenseNet169 feature extraction, the Adam optimizer, RNN classification, and AOA hyperparameter optimization. For feature extraction, the Adam optimizer with the deep transfer learning-based DenseNet169 model is used, and the AOA-RNN model is utilized for the classification of fruits. The performance validation of the presented AFC-HPODTL model was carried out using benchmark datasets, and the results reported promising performance over recent state-of-the-art approaches, with maximum accuracies of 99.84% and 99.67% on datasets 1 and 2, respectively. The results demonstrated that the presented model is effective in classifying fruits in real time. As part of the future scope, hybrid DL models can be integrated into the AFC-HPODTL model for enhanced classification performance. In addition, the presented model can be extended to fruit quality assessment in the future. Moreover, the computational complexity of the proposed model will be examined in our future work.