Nondestructive Estimation of the Chlorophyll b of Apple Fruit by Color and Spectral Features Using Different Methods of Hybrid Artificial Neural Network

Nondestructive estimation of the physicochemical properties of foods such as fruits and vegetables could bring about a dramatic development in the food industry, because such estimation is nondestructive, online, and, most importantly, fast. Given these advantages, many researchers have focused on nondestructive estimation of the physicochemical properties of various foods. Three main goals were pursued in this article: 1. nondestructive estimation of the chlorophyll b content of red delicious apple using color features and the hybrid artificial neural network-cultural algorithm (ANN-CA); 2. nondestructive estimation of the chlorophyll b content of red delicious apple using spectral data (in a range around 680 nm) and the hybrid artificial neural network-biogeography-based optimization algorithm (ANN-BBO); 3. nondestructive estimation of the chlorophyll b content of red delicious apple using different groups of effective wavelengths selected by the hybrid artificial neural network-differential evolution algorithm (ANN-DE). For each method, 1000 replications were performed to evaluate the reliability of the hybrid artificial neural networks. The results indicated that the average coefficient of determination over 1000 replications was 0.882 for the ANN-CA and 0.932 for the ANN-BBO. The results also showed that, among the different groups of effective features, the highest coefficient of determination (0.93) was obtained for the group with 10 wavelengths.


Introduction
Fruits are rich in vitamins and are consumed by people all over the world. Different fruits grow depending on the water, air, and soil of each region of the planet. Therefore, standards must be considered for distributing these fruits within a country or exporting them to other countries, delicious apple (due to peel color changes at different ripening stages). The first method is based on color features and the hybrid artificial neural network-cultural algorithm (ANN-CA), and the second method is based on spectral data and the hybrid artificial neural network-biogeography-based optimization algorithm (ANN-BBO).

Materials and Methods
As stated, in this study the color and spectral methods were used for nondestructive estimation of chlorophyll b. Figure 1 shows the flowchart of the stages of nondestructive estimation of chlorophyll b content using the color and spectral methods.

Samples Used
For color and spectral analysis, 42 red delicious apple samples at various ripening stages were randomly selected from different orchards in Kermanshah, Iran (longitude: 7.03° E; latitude: 4.22° N). Of the 42 samples, 10 were at the unripe stage (135 days after flowering), 9 at the half-ripe stage (145 days after flowering), 12 at the ripe stage (155 days after flowering), and 11 at the overripe stage (165 days after flowering). Figure 2 shows a few examples of the test apples. To extract spectral features, the samples were transferred to Shahid Beheshti University. The samples were then transferred to the Agricultural Engineering Technical Research Center to extract the color features and measure chlorophyll b. Color and spectral data were measured in five directions on each sample, and their average was taken as the final value.

Development of Visible and Near-Infrared Light Spectroscopy System
Configuring the spectroscopy system is one of the steps in measuring the spectrum. Figure 3 shows the configuration of the visible and near-infrared spectroscopy system. The measurement mode in this research is reflectance. An EPP200NIR spectrometer (StellarNet, USA) with an indium gallium arsenide (InGaAs) detector, a range of 200 nm to 1100 nm, and a resolution of 1 to 3 nm was used; it was connected to a computer via a USB 2.0 cable. The light source was an SLI-CAL (StellarNet, USA) with a 20 W tungsten halogen lamp. A laptop with an Intel Core i3 330M processor at 2.13 GHz, 4 GB of RAM, and Windows 10, running SpectraWiz software, was used to store the acquired spectra. A bifurcated (two-branch) optical fiber was used to guide light from the light source to the apples and from the apples to the spectrometer. Because of intense noise, the first 200 and the last 100 wavelengths were eliminated, so the final spectral range was 400 nm to 1000 nm.
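The wavelength trimming described above can be sketched as follows. This is a minimal sketch that assumes the spectrometer reports one value per nanometer from 200 to 1100 nm (901 points), which is the spacing under which dropping the first 200 and the last 100 points yields exactly 400 to 1000 nm; the function name `trim_spectrum` and the placeholder reflectance values are hypothetical.

```python
def trim_spectrum(wavelengths, values, n_start=200, n_end=100):
    """Drop the first n_start and last n_end points of a spectrum (noisy detector edges)."""
    if len(wavelengths) != len(values):
        raise ValueError("wavelengths and values must have the same length")
    stop = len(wavelengths) - n_end
    return wavelengths[n_start:stop], values[n_start:stop]


# Assumed 1-nm sampling grid from 200 to 1100 nm inclusive (901 points).
wl = list(range(200, 1101))
refl = [0.5] * len(wl)  # placeholder reflectance values, not measured data
wl_trim, refl_trim = trim_spectrum(wl, refl)
```

With a coarser or irregular grid, the cut should be made by wavelength value rather than by point count.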

Extracting Color Features
The three color components L*, a*, and b* were measured on the peel of each apple sample using a CR-400 colorimeter (Konica Minolta, Japan). From these three components, the chroma or color purity (C*) and the hue angle (h°) were calculated using Equations (1) and (2) [34].
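Equations (1) and (2) are not reproduced in this text. Assuming they follow the standard CIELAB definitions, chroma is C* = (a*² + b*²)^(1/2) and hue angle is h = arctan(b*/a*). The sketch below computes both, using `atan2` so that the quadrant of (a*, b*) is resolved and the angle falls in [0°, 360°); the function names are hypothetical, and the paper's exact convention for h may differ.

```python
import math


def chroma(a_star, b_star):
    """Chroma (color purity) C* = sqrt(a*^2 + b*^2)."""
    return math.hypot(a_star, b_star)


def hue_angle(a_star, b_star):
    """Hue angle h = arctan(b*/a*) in degrees; atan2 resolves the quadrant, result in [0, 360)."""
    return math.degrees(math.atan2(b_star, a_star)) % 360.0


# Hypothetical peel reading (not data from the study).
c_star = chroma(30.0, 15.0)
h_deg = hue_angle(30.0, 15.0)
```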

Extraction of Chlorophyll b
Because chlorophyll b content changes during the ripening stages of fruits, especially red delicious apple, its nondestructive estimation is useful for predicting the ripening stage [16,35]. The actual chlorophyll b content was measured using the method of Ncama et al. [20], in which chlorophyll b is calculated with Equation (3),
where A is the absorbance of the sample at the wavelength given in its subscript; for example, A646.8 is the absorbance of the sample at 646.8 nm.
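Equation (3) is not reproduced in this text. As a hedged illustration only, the sketch below assumes it has the widely used two-wavelength form for 80% acetone extracts (Lichtenthaler), Chl b = 21.50 A646.8 - 5.10 A663.2 in µg/mL of extract. Both the 663.2 nm term and the coefficients are assumptions and should be replaced with those of Equation (3) as given by Ncama et al. [20]; the function name is hypothetical.

```python
def chlorophyll_b(a646_8, a663_2, k646_8=21.50, k663_2=5.10):
    """Chlorophyll b (ug/mL of extract) from absorbances at 646.8 nm and 663.2 nm.

    ASSUMPTION: default coefficients are the common 80% acetone values,
    not necessarily those of Equation (3) in this study.
    """
    return k646_8 * a646_8 - k663_2 * a663_2


# Hypothetical absorbance readings (not data from the study).
chl_b = chlorophyll_b(0.5, 0.3)
```

Converting the extract concentration to a per-mass value would additionally require the extract volume and the sample mass, which are not given here.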

Different Hybrids of Artificial Neural Networks Used for Selecting Effective Features and Predicting Chlorophyll b
The hybrid artificial neural network-particle swarm optimization algorithm (ANN-PSO) was used to select the effective color features among the five extracted color features. Similarly, the hybrid artificial neural network-differential evolution algorithm (ANN-DE) was used to select the effective wavelengths among the 120 extracted wavelengths. The two hybrids follow the same procedure and differ only in their optimization algorithm.
To predict chlorophyll b from color features and from spectral data, the hybrid ANN-CA and the hybrid ANN-BBO were used, respectively. The CA and BBO algorithms determine the best ANN structure for predicting chlorophyll b. For the spectral method, 120 wavelengths in the region around 680 nm were extracted and used as inputs to the hybrid ANN-BBO.

Hybrid Neural Networks Used to Select the Most Effective Color Features
After the color features were extracted from each sample, the hybrid artificial neural network-particle swarm optimization algorithm (ANN-PSO) was used to select the effective features that serve as inputs to the hybrid artificial neural network-cultural algorithm (ANN-CA) for estimating chlorophyll b content. The particle swarm optimization algorithm is a metaheuristic that emulates the collective movement of bird flocks to solve optimization problems; it was originally proposed by Kennedy and Eberhart [36]. Each candidate solution is represented as a particle that continuously searches the solution space. The motion of each particle depends on three factors: 1. its current position; 2. the best position it has found so far; and 3. the best position found so far by the whole swarm. In the feature selection procedure, the PSO algorithm first considers a vector containing all extracted features and then smaller feature vectors, for example with 2, 3, or 4 members, which are sent as inputs to an ANN whose hidden layer characteristics are shown in Table 1. The input of the network is the feature vector proposed by the PSO algorithm, and its output is the chemically measured chlorophyll b content. The network divides the data into 70% for training, 15% for validation, and 15% for testing. Each time a feature vector is sent to the ANN, the network's mean squared error (MSE) is recorded; finally, the vector with the lowest MSE is selected as the optimal vector, and its members are taken as the effective features. In this research, among the five extracted color features, a* and C* were selected as the effective features.
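The wrapper-style feature selection described above can be sketched with a binary particle swarm, in which each particle is a 0/1 mask over the five color features. This is a minimal sketch, not the authors' implementation: it uses the sigmoid-transfer variant of binary PSO, the function name `binary_pso_select` and all parameter values are hypothetical, and the ANN's mean squared error is replaced by a toy error function (`toy_fitness`) so the example runs on its own. In practice, `fitness` would train the network of Table 1 on the selected columns and return its MSE.

```python
import math
import random


def binary_pso_select(n_features, fitness, n_particles=20, iters=50,
                      w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO: each particle is a 0/1 mask; fitness(mask) is an error to minimize."""
    rng = random.Random(seed)

    def score(mask):
        return fitness(mask) if any(mask) else math.inf  # empty subsets are invalid

    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_err = [score(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_err[i])
    gbest, gbest_err = pbest[g][:], pbest_err[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the particle's best + pull toward the swarm's best.
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                # Sigmoid transfer: the velocity sets the probability that the bit is 1.
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-vel[i][d])) else 0
            err = score(pos[i])
            if err < pbest_err[i]:
                pbest[i], pbest_err[i] = pos[i][:], err
                if err < gbest_err:
                    gbest, gbest_err = pos[i][:], err
    return gbest, gbest_err


COLOR_FEATURES = ["L*", "a*", "b*", "C*", "h"]
TARGET = [0, 1, 0, 1, 0]  # hypothetical "ideal" subset, used only for this demo


def toy_fitness(mask):
    # Stand-in for the ANN's MSE: number of bits that differ from TARGET.
    return sum(m != t for m, t in zip(mask, TARGET))


best_mask, best_err = binary_pso_select(len(COLOR_FEATURES), toy_fitness)
selected = [name for name, bit in zip(COLOR_FEATURES, best_mask) if bit]
```

The same wrapper structure, with a different search algorithm driving the masks, applies to the ANN-DE wavelength selection.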

Neural Network Hybrid Used to Estimate the Amount of Chlorophyll b Using Color Features
To estimate chlorophyll b content from color features, the hybrid artificial neural network-cultural algorithm (ANN-CA) was used. Like the genetic algorithm, the cultural algorithm is a population-based optimizer. However, whereas the genetic algorithm models natural, biological evolution, the cultural algorithm models cultural evolution and the influence of the cultural and social space, which ultimately yields a model for solving an optimization problem. In a society, well-known individuals have, directly and indirectly, the greatest impact on cultural evolution; for example, they affect the way people talk, walk, and dress. The goal of this algorithm is to find such elites and use them to drive cultural evolution [37]. In this section, the task of the cultural algorithm is to determine the optimal values of the parameters of the multilayer perceptron neural network. The network has five adjustable parameters, and it performs best when these parameters take their optimal values. These parameters are the number of neurons, the number of hidden layers, the transfer function, the backpropagation network training function, and the backpropagation weight/bias learning function. In this study, the number of neurons in each layer could range from 0 to 25, and the number of layers could range from 1 to 3. The transfer function of each layer is selected from tansig, logsig, purelin, hardlim, compet, hardlims, netinv, poslin, radbas, satlin, satlins, softmax, and tribas. The backpropagation network training function is selected from trainlm, trainbfg, trainrp, traincgb, traincgf, traincgp, trainscg, trainoss, traingda, traingdx, trainb, trainbfgc, trainbr, trainbuwb, trainc, traingdm, trainr, and trains.
Finally, the backpropagation weight/bias learning function is selected from learngdm, learngd, learncon, learnh, learnhd, learnis, learnk, learnlv1, learnlv2, learnos, learnp, learnpn, learnsom, learnsomb, and learnwh. The cultural algorithm first considers a vector whose length matches the number of parameters described above, that is, a vector with at least four and at most eight members, each of which represents one parameter. For example, the vector x = [8, 14, 21, tribas, satlins, logsig, trainrp, learnp] describes a network with three hidden layers of 8, 14, and 21 neurons, with tribas, satlins, and logsig as the transfer functions of the first, second, and third layers, trainrp as the backpropagation training function, and learnp as the weight/bias learning function. The result of each vector sent to the neural network is measured by the mean squared error, and the vector with the lowest mean squared error is finally used as the optimal vector for setting the parameters of the multilayer perceptron neural network.
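The variable-length encoding above (two entries per hidden layer, namely its neuron count and transfer function, plus the training and learning functions, hence 4, 6, or 8 members for 1 to 3 layers) can be made concrete with a small decoder. This is a hypothetical helper written for illustration: it only parses and validates a candidate vector against option lists copied from the text (MATLAB neural network function names) and does not build or train a network.

```python
TRANSFER_FUNCTIONS = {
    "tansig", "logsig", "purelin", "hardlim", "compet", "hardlims", "netinv",
    "poslin", "radbas", "satlin", "satlins", "softmax", "tribas",
}
TRAINING_FUNCTIONS = {
    "trainlm", "trainbfg", "trainrp", "traincgb", "traincgf", "traincgp",
    "trainscg", "trainoss", "traingda", "traingdx", "trainb", "trainbfgc",
    "trainbr", "trainbuwb", "trainc", "traingdm", "trainr", "trains",
}
LEARNING_FUNCTIONS = {
    "learngdm", "learngd", "learncon", "learnh", "learnhd", "learnis", "learnk",
    "learnlv1", "learnlv2", "learnos", "learnp", "learnpn", "learnsom",
    "learnsomb", "learnwh",
}


def decode_candidate(vector, min_neurons=0, max_neurons=25):
    """Split [n_1..n_L, tf_1..tf_L, train_fn, learn_fn] into named MLP settings."""
    if len(vector) not in (4, 6, 8):
        raise ValueError("expected 4, 6, or 8 members (1 to 3 hidden layers)")
    n_layers = (len(vector) - 2) // 2
    neurons = list(vector[:n_layers])
    transfer = list(vector[n_layers:2 * n_layers])
    training, learning = vector[-2], vector[-1]
    for n in neurons:
        if not (isinstance(n, int) and min_neurons <= n <= max_neurons):
            raise ValueError(f"invalid neuron count: {n!r}")
    for name in transfer:
        if name not in TRANSFER_FUNCTIONS:
            raise ValueError(f"unknown transfer function: {name!r}")
    if training not in TRAINING_FUNCTIONS:
        raise ValueError(f"unknown training function: {training!r}")
    if learning not in LEARNING_FUNCTIONS:
        raise ValueError(f"unknown learning function: {learning!r}")
    return {"neurons": neurons, "transfer": transfer,
            "training": training, "learning": learning}


# The example vector from the text.
config = decode_candidate([8, 14, 21, "tribas", "satlins", "logsig", "trainrp", "learnp"])
```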

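The cultural algorithm itself can be sketched with a population space and a belief space that holds situational knowledge (the best solution found so far) and normative knowledge (the per-dimension range spanned by accepted elites). This is a minimal sketch under stated assumptions: it minimizes a continuous test function (`sphere`) rather than the network MSE over the mixed discrete settings above, and the function name, acceptance size, and Gaussian influence step are hypothetical choices.

```python
import random


def cultural_algorithm(f, bounds, pop_size=30, n_accept=6, iters=200, seed=0):
    """Minimize f over a box with a minimal cultural algorithm.

    Belief space:
      - situational knowledge: best individual found so far
      - normative knowledge: per-dimension (min, max) of the accepted elites
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = list(min(pop, key=f))
    for _ in range(iters):
        # Acceptance: the n_accept fittest individuals update the belief space.
        elites = sorted(pop, key=f)[:n_accept]
        if f(elites[0]) < f(best):
            best = list(elites[0])
        norm = [(min(e[d] for e in elites), max(e[d] for e in elites))
                for d in range(len(bounds))]
        # Influence: sample each child around the situational best, with a
        # step size given by the width of the normative range.
        new_pop = []
        for parent in pop:
            child = []
            for d, (lo, hi) in enumerate(bounds):
                width = norm[d][1] - norm[d][0]
                value = best[d] + rng.gauss(0.0, 1.0) * width
                child.append(min(max(value, lo), hi))  # clip to the search box
            # Keep whichever of parent and child has the lower objective value.
            new_pop.append(child if f(child) < f(parent) else parent)
        pop = new_pop
    best = list(min(pop + [best], key=f))
    return best, f(best)


def sphere(x):
    # Toy objective with its minimum (0) at the origin.
    return sum(v * v for v in x)


best_ca, value_ca = cultural_algorithm(sphere, [(-5.0, 5.0)] * 2)
```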
Neural Network Hybrid Used to Select Effective Wavelengths
A hybrid artificial neural network-differential evolution algorithm (ANN-DE) was used to select effective wavelengths. The differential evolution algorithm, proposed by Storn and Price [38], is, like many optimization algorithms, population-based and stochastic. It consists of two main steps: initialization and evolution. Because no prior information about the optimization problem is available, a random population is created first; the population members are then improved through mutation, recombination, and selection until the optimization terminates. The procedure of this hybrid is similar to that of the hybrid artificial neural network-particle swarm optimization algorithm. Table 2 shows the characteristics of the hidden layers of the multilayer perceptron neural network used in this section. To estimate chlorophyll b content from spectral data, the hybrid artificial neural network-biogeography-based optimization algorithm (ANN-BBO) was used. The biogeography-based optimization algorithm is inspired by how animal and plant species are distributed across different parts of the world [39]. The stages of the biogeography-based optimization algorithm are given in Algorithm 1. The procedure of this hybrid is similar to that of the hybrid artificial neural network-cultural algorithm.
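The differential evolution steps named above (random initialization, then mutation, recombination, and selection) can be sketched in the classic DE/rand/1/bin form. This is a minimal sketch: the objective is a toy continuous function, and the function name and the settings F = 0.5 and CR = 0.9 are assumptions rather than values reported in the study.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9, generations=100, seed=0):
    """DE/rand/1/bin: minimize f over a box."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialization: no prior information, so start from a random population.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct members other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # Binomial recombination; j_rand guarantees at least one mutant gene.
            j_rand = rng.randrange(dim)
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                gene = mutant[d] if (rng.random() < CR or d == j_rand) else pop[i][d]
                trial.append(min(max(gene, lo), hi))
            # Selection: the trial replaces its parent if it is at least as good.
            trial_cost = f(trial)
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


def sphere(x):
    # Toy objective with its minimum (0) at the origin.
    return sum(v * v for v in x)


best_de, value_de = differential_evolution(sphere, [(-5.0, 5.0)] * 2)
```

For wavelength selection, one common encoding (assumed here, not taken from the paper) keeps each component in [0, 1] and selects wavelength d when its value exceeds 0.5, with the ANN's MSE as the objective.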

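Algorithm 1 is not reproduced in this text. The sketch below shows a generic biogeography-based optimization loop in its usual form: habitats are ranked by suitability (lower cost is better), good habitats have high emigration and low immigration rates under a linear migration model, features migrate from roulette-selected source habitats, random mutation adds diversity, and the best habitats are kept by elitism. The function name, rates, and test objective are assumptions; the study applied BBO to the ANN settings rather than to a toy function.

```python
import random


def bbo(f, bounds, n_habitats=30, generations=200, p_mutate=0.05, n_elite=2, seed=0):
    """Minimal biogeography-based optimization (minimization)."""
    rng = random.Random(seed)
    n = n_habitats
    habitats = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    # Linear migration model by rank: rank 0 (best) emigrates most, immigrates least.
    mu = [(n - k) / n for k in range(n)]   # emigration rates
    lam = [1.0 - m for m in mu]            # immigration rates
    for _ in range(generations):
        habitats.sort(key=f)               # best (highest suitability) first
        elites = [h[:] for h in habitats[:n_elite]]
        new = []
        for k, h in enumerate(habitats):
            child = h[:]
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < lam[k]:
                    # Roulette-wheel choice of the source habitat, weighted by emigration rate.
                    src = rng.choices(range(n), weights=mu)[0]
                    child[d] = habitats[src][d]
                if rng.random() < p_mutate:
                    child[d] = rng.uniform(lo, hi)  # random mutation
            new.append(child)
        # Elitism: the worst habitats are replaced by the previous elites.
        new.sort(key=f)
        habitats = new[:n - n_elite] + elites
    best = min(habitats, key=f)
    return best, f(best)


def sphere(x):
    # Toy objective with its minimum (0) at the origin.
    return sum(v * v for v in x)


best_bbo, value_bbo = bbo(sphere, [(-5.0, 5.0)] * 2)
```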
Parameters Used to Evaluate the Performance of Proposed Methods for Estimating the Amount of Chlorophyll b
To evaluate the performance of the models that predict chlorophyll b content with the various hybrid artificial neural networks, the coefficient of determination (R²), sum squared error (SSE), mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE) are used [40,41]. Figure 4 shows the response of red delicious apples to light in the range of 400 to 1000 nm, as both reflectance and absorbance spectra of the samples. The absorbance spectrum of each sample was obtained using the relation log(1/R), where R is the reflectance spectrum. As marked by the box in Figure 4, there is a peak between 680 and 700 nm, which is related to chlorophyll absorption [42,43].
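The five evaluation criteria and the log(1/R) transformation can be written out as below. This sketch assumes R² = 1 - SSE/SST (some studies instead report the squared Pearson correlation between measured and predicted values) and a base-10 logarithm for absorbance; the function names are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """R2, SSE, MAE, MSE, and RMSE for paired measured/predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(r * r for r in residuals)
    mae = sum(abs(r) for r in residuals) / n
    mse = sse / n
    mean_t = sum(y_true) / n
    sst = sum((t - mean_t) ** 2 for t in y_true)
    if sst == 0:
        raise ValueError("R2 is undefined when all measured values are equal")
    return {"R2": 1.0 - sse / sst, "SSE": sse, "MAE": mae,
            "MSE": mse, "RMSE": math.sqrt(mse)}


def absorbance(reflectance):
    """Apparent absorbance log10(1/R) for each reflectance value (R > 0)."""
    return [math.log10(1.0 / r) for r in reflectance]


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
abs_values = absorbance([1.0, 0.1])
```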

Estimation of Chlorophyll b Using Color Features
As described in the Materials and Methods section, among the five extracted color features, a* and C* are used as inputs of the hybrid ANN-CA to estimate chlorophyll b content. Table 3 shows the optimal structure of the hidden layers of the multilayer perceptron neural network set by the cultural algorithm to estimate chlorophyll b content. As can be seen, the optimal network has three hidden layers, and the characteristics of each layer are listed in the table. After the best artificial neural network structure is determined, 1000 repetitions are used to evaluate the reliability of the predictive method. Table 4 shows the mean, standard deviation, and best performance of the hybrid ANN-CA over 1000 repetitions on color data. The criteria used to evaluate the predictive method are the coefficient of determination, sum squared error, mean absolute error, mean squared error, and root mean squared error. As the table shows, the mean coefficient of determination is 0.882 and the best coefficient of determination exceeds 0.99, which indicates the high performance of the method used to estimate chlorophyll b content. Figure 5 shows the regression analysis (scatter plot) between the mean estimated value and the actual (measured) chlorophyll b content of red delicious apple (test set) using color data. Each replicate contains 13 test samples, so 1000 replicates yield 13,000 test predictions; since there are only 42 samples, each sample is tested more than 309 times on average (13,000/42 ≈ 309.5), and the mean of these predictions is reported.
As can be seen, the regression coefficient between the mean predicted and measured values exceeds 0.977, which indicates the high performance of the proposed method for nondestructive estimation of chlorophyll b content. Figure 6 shows a graphical comparison between the actual chlorophyll b content of the apple samples and the mean value estimated from color data over 1000 replicates. As can be seen, for most samples the actual and mean estimated values of chlorophyll b overlap, indicating an acceptable prediction by the method used.

Table 5 shows the optimal structure of the hidden layers of the multilayer perceptron neural network set by the biogeography-based optimization algorithm to estimate chlorophyll b content from spectral data. The best artificial neural network structure has three hidden layers. After the optimal structure was determined, 1000 replications were conducted to evaluate the reliability of the hybrid ANN-BBO for estimating chlorophyll b content. Table 6 shows the mean, standard deviation, and best performance of the hybrid ANN-BBO over 1000 repetitions on spectral data. As can be seen, the mean error values and their standard deviations are small, and in the best case the errors are close to zero, indicating the high performance of this method. Figure 7 shows the regression analysis (scatter plot) between the mean estimated value over 1000 replicates and the actual (measured) chlorophyll b content of red delicious apple (test set) using spectral data. The regression coefficient in this case exceeds 0.991, which indicates the high performance of the proposed method. Finally, Figure 8 shows a visual comparison between the actual chlorophyll b content of the apple samples and the mean value estimated from spectral data over 1000 replicates. As can be seen, for most samples the mean estimated values of chlorophyll b are close to the actual values, which indicates an acceptable prediction by the method used.
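The averaging over replicates described above can be sketched as repeated random splits in which each sample's test-set predictions are collected and then averaged. This is a minimal sketch: `dummy_predict` is a hypothetical stand-in for retraining the hybrid network on each split and predicting the held-out samples, and drawing the 13 test samples uniformly at random in each replicate is an assumption.

```python
import random
from collections import defaultdict


def repeated_split_means(n_samples, n_test, n_reps, predict, seed=0):
    """Collect each sample's test-set predictions over repeated random splits and average them."""
    rng = random.Random(seed)
    collected = defaultdict(list)
    for rep in range(n_reps):
        test_idx = rng.sample(range(n_samples), n_test)  # held-out samples for this replicate
        for i in test_idx:
            collected[i].append(predict(rep, i))
    counts = {i: len(v) for i, v in collected.items()}
    means = {i: sum(v) / len(v) for i, v in collected.items()}
    return counts, means


def dummy_predict(rep, i):
    # Hypothetical predictor: "true" value i plus a small replicate-dependent offset.
    return i + (0.1 if rep % 2 else -0.1)


counts, means = repeated_split_means(42, 13, 1000, dummy_predict)
```

Because the test set is resampled in every replicate, the number of predictions per sample varies around the average of 13,000/42.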

Analyzing the Performance of Chlorophyll b Predictive Systems Based on Color and Spectroscopy Methods
Figures 9 and 10 show box plots of the differences between the actual chlorophyll b content and its values estimated from color and spectral data. Each replicate contains 13 test samples, so 1000 replicates yield 13,000 test predictions; since there are only 42 samples, each sample is tested more than 309 times on average. For most samples the box plots are compact, meaning that the predictive methods give close results across replicates, which indicates the reliability of the methods used. Finally, Tables 7 and 8 show the actual value and the mean, standard deviation, and predicted value of the chlorophyll b content of the 42 red delicious apples on the test data set using color and spectral data.

Effective Wavelengths Selected by the Hybrid Artificial Neural Network-Differential Evolution Algorithm
Chlorophyll b content could be estimated by an online multispectral system that uses only a few wavelengths (2 to 10). For this reason, in this section, groups of effective wavelengths of different sizes were selected by the hybrid artificial neural network-differential evolution algorithm (Table 9).

The Performance of the Chlorophyll b Estimation System Based on the Effective Wavelengths Selected
After the different groups of effective wavelengths were selected, they were sent as inputs to the hybrid ANN-BBO to estimate chlorophyll b content. Table 10 shows the mean, standard deviation, and best values of the performance evaluation parameters of the hybrid ANN-BBO on the data of the selected effective wavelengths over 1000 replicates. As can be seen, the highest coefficient of determination is obtained for the input set with 10 effective features, although the coefficients of determination of the different groups of effective features differ only slightly. Similarly, Figures 11 and 12 show box plots of the error criteria and of the regression and determination coefficients of the hybrid artificial neural network-biogeography-based optimization algorithm for nondestructive estimation of chlorophyll b content over 1000 replicates.
In these figures, the subscript numbers indicate the number of effective features. As can be seen, the box plots are all compact, the error criteria have low values, and the regression and determination coefficients are close to 1. Together, these three observations indicate the high performance of the proposed method. Finally, to compare the performance of the methods proposed in this study, the results of four other studies that performed nondestructive prediction of chlorophyll content were used. This comparison is shown in Table 11 in terms of regression coefficients. As can be seen, the color and spectral methods proposed in this study have higher regression coefficients than the other methods.
Table 10. Mean, standard deviation and best results of evaluation parameters of performance of the hybrid artificial neural network-biogeography-based algorithm on the data of selected effective wavelengths in 1000 replicates.
Table 11. Comparison of the performance of proposed methods in this study with other methods in terms of non-destructive estimation of chlorophyll b content.

Conclusions
In this paper, color and spectral methods based on different hybrid artificial neural networks were used for nondestructive estimation of the chlorophyll b content of red delicious apple. The most important results of this research are as follows:
1. The cost of configuring and setting up a spectroscopy system is very important for real-time applications. To reduce this cost, a small window around 680 nm could be used instead of the entire visible/near-infrared range.
2. The largest peak of the spectra in the visible region is related to chlorophyll absorption; accordingly, chlorophyll b content was predicted with a high coefficient of determination using the spectral data of this region.
3. There is a relationship between the color features of the apple and its chlorophyll b content, so chlorophyll b can be estimated from these color features with a coefficient of more than 0.996.
4. The spectral method outperforms the color method in terms of the determination and regression coefficients as well as the error criteria.
5. When the effective wavelengths selected by the hybrid artificial neural network-differential evolution algorithm are used as inputs to the hybrid artificial neural network-biogeography-based optimization algorithm, high regression and determination coefficients are obtained.