Non-Destructive Prediction of Titratable Acidity and Taste Index Properties of Gala Apple Using Combination of Different Hybrids ANN and PLSR-Model Based Spectral Data

Non-destructive estimation of the internal properties of fruits and vegetables is very important because it enables better management of subsequent operations. Researchers around the world are focusing on non-destructive methods because, once developed and commercialized, they could substantially change the food industry. In this regard, this paper presents a non-destructive method based on Vis-NIR spectral data. The stages of the proposed algorithm are: (1) collection of Gala apple samples, (2) spectral data extraction by spectroscopy, (3) pre-processing of the spectral data, (4) measurement of the chemical properties titratable acidity (TA) and taste index, (5) selection of key wavelengths using a hybrid artificial neural network-firefly algorithm (ANN-FA), and (6) non-destructive estimation of the properties using two methods: a hybrid ANN-particle swarm optimization algorithm (ANN-PSO) and partial least squares regression (PLSR). To assess the reliability of the methods, the prediction was repeated over 300 iterations. For TA, the mean ± standard deviation of the correlation coefficient and the root mean square error (RMSE) were 0.9095 ± 0.0175 and 0.0598 ± 0.0064 for hybrid ANN-PSO, and 0.834 ± 0.0313 and 0.0761 ± 0.0061 for PLSR. For the taste index, the corresponding values were 0.918 ± 0.02 and 3.2 ± 0.39 for ANN-PSO, and 0.836 ± 0.033 and 4.09 ± 0.403 for PLSR. Therefore, the hybrid ANN-PSO performed better than PLSR for the non-destructive prediction of both chemical properties. Overall, the proposed method can predict TA and taste index non-destructively, which is useful for mechanized harvesting and the management of post-harvest operations.


Introduction
Apples belong to the Rosaceae family. According to Forsline et al. [1], apples originated in Asia, particularly in the Republic of Kazakhstan. Apples are now grown worldwide in temperate, subtropical and tropical climates. Today, the growing demand for food products makes the need for automation in agriculture increasingly apparent. On the other hand, many developed countries […] 975 nm. The coefficient of determination (R2) of the model for SSC and DMC was 0.922-0.946 and 0.910-0.933, respectively.
As this overview shows, many studies have applied non-destructive methods to detect different properties of fruits. In this regard, this study presents a non-destructive algorithm based on two hybrid artificial neural networks, ANN-FA and ANN-PSO, together with partial least squares regression (PLSR), using spectral data to estimate the titratable acidity (TA) and taste index of the Gala cultivar. Figure 1 shows a flowchart of the stages of the proposed algorithm for non-destructive estimation of TA and taste index. Hybrid ANN-FA was used to select key wavelengths, and hybrid ANN-PSO and PLSR were used to predict the two properties.

Materials and Methods
Plants 2020, 9, x FOR PEER REVIEW

Data Collection
To train the proposed algorithm, Gala apple samples were collected from orchards in Kermanshah province. Gala apples are generally ripe and ready for harvest about 140 days after flowering; in this study, the ripening time was determined in consultation with local growers, based on their experience over the past few years. A total of 150 samples were then collected in three stages: 20 days before ripening, 10 days before ripening, and at ripening, with 50 fruits per stage. After each collection, the samples were transported to the laboratory for spectral data extraction and chemical measurements.

Hardware of System Used
A hardware system with several components was used to extract the spectral data. These components are: (A) a laptop with an Intel Core i3-330M CPU at 2.13 GHz, 4 GB of RAM and Windows 10.
Spectra Wiz software was installed on the laptop to record the spectral data of each sample.

Pre-Processing of Spectral Data
After the spectral data are extracted from each sample, they may contain noise caused by factors such as ambient light, an uneven sample surface, heating of the spectrometer and differences in sample size, all of which distort the spectral information and introduce errors. For this reason, the raw spectral data must be pre-processed. In this research, pre-processing was performed in three stages: (1) conversion of reflectance spectra to absorbance spectra (see Equation (1)), (2) light-scatter and baseline correction using the standard normal variate (SNV), and (3) smoothing with a median filter, following Rossel [26].
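The three pre-processing stages can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: it assumes that Equation (1) is the common transform A = log10(1/R), and the median-filter window of 5 points is a hypothetical choice, since the text does not state one.

```python
import math
import statistics


def reflectance_to_absorbance(reflectance):
    """Convert reflectance values (0 < R <= 1) to absorbance A = log10(1/R).

    Assumption: Equation (1) is taken to be this standard transform.
    """
    return [math.log10(1.0 / r) for r in reflectance]


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and
    divide by its own sample standard deviation."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(x - mean) / sd for x in spectrum]


def median_filter(spectrum, window=5):
    """Running median; the window shrinks at the edges of the spectrum.

    window=5 is a hypothetical default, not a value from the paper.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(spectrum)
    return [statistics.median(spectrum[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def preprocess(reflectance, window=5):
    """Stage 1: absorbance, stage 2: SNV, stage 3: median smoothing."""
    return median_filter(snv(reflectance_to_absorbance(reflectance)), window)


# Hypothetical reflectance values for one fruit at a few wavelengths.
print(preprocess([0.52, 0.50, 0.47, 0.49, 0.55, 0.61, 0.58]))
```

SNV is applied per spectrum (row-wise), which is what distinguishes it from column-wise standardization across samples.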

Chemical Properties of TA
As fruits approach ripening, their acidity decreases and their taste becomes sweeter. Therefore, TA can be used as an indicator of the ripening time of Gala apples. TA was measured destructively by titration, following the method of James (1998).

Properties of Taste Index
The taste index is defined as the ratio of total soluble solids (TSS) to titratable acidity (TA). It is used to characterize taste, which depends on the fruit's stage of ripening, according to Wongkhot et al. [27]. Therefore, this index can be used to determine the ripening stage of Gala apples.
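As a worked illustration of the definition, the snippet below computes the taste index from TSS and TA. The sample values are hypothetical and only show the direction of the effect: at constant TSS, a fall in TA during ripening raises the index.

```python
def taste_index(tss, ta):
    """Taste index = total soluble solids (TSS) / titratable acidity (TA)."""
    if ta <= 0:
        raise ValueError("titratable acidity must be positive")
    return tss / ta


# Hypothetical fruit: same TSS, acidity falling towards ripening.
for ta in (0.6, 0.5, 0.4):
    print(f"TA = {ta}: taste index = {taste_index(12.0, ta):.1f}")
```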

Selection of Key Wavelengths in the Range of 200-1100 nm
For portable devices that estimate physicochemical properties such as TA and taste index non-destructively, device size and price are key design considerations. For this reason, only the spectral information at key wavelengths, rather than the full 200-1100 nm range, should be used. In this study, the hybrid ANN-FA was used to select key wavelengths. The firefly algorithm is inspired by the flashing-light communication between fireflies. It can be considered a form of swarm intelligence, in which the cooperation (and possibly competition) of simple, low-intelligence members produces a higher level of intelligence than any single member could achieve, according to Yang [28] and Ncama et al. [29]. In the hybrid method, the firefly algorithm passes candidate vectors of spectral data at different wavelengths as inputs to an artificial neural network with the structure shown in Table 1, and the network's performance is recorded as the mean squared error (MSE). The outputs of the network are the chemical properties mentioned above.
Key wavelength selection was performed separately for each chemical property. The input vector that produced the lowest MSE was considered optimal, and the wavelengths in that vector were taken as the key wavelengths. The chemical properties of TA and taste index were then estimated using hybrid ANN-PSO and PLSR, in order to compare a method based on artificial intelligence with a statistical one. To evaluate the reliability of these methods across replications, the prediction was repeated 300 times.
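A firefly-algorithm search over wavelength subsets can be sketched as below. This is an illustrative sketch under several assumptions not stated in the text: each firefly encodes up to k wavelength indices as continuous coordinates that are rounded to the nearest index, attraction follows the standard form beta0·exp(-gamma·r²) described by Yang, and `cost` is a hypothetical stand-in for training the ANN of Table 1 on the selected wavelengths and returning its MSE. The 10 nm wavelength grid and the toy cost are made up for the demonstration.

```python
import math
import random


def decode(position, n_wavelengths):
    """Round a continuous firefly position to a sorted set of wavelength indices."""
    return sorted({min(n_wavelengths - 1, max(0, round(p))) for p in position})


def firefly_select(cost, n_wavelengths, k, n_fireflies=15, n_iter=50,
                   beta0=1.0, gamma=None, alpha=2.0, seed=0):
    """Search for the subset of at most k wavelength indices with the lowest cost.

    cost(indices) stands in for training the ANN on those wavelengths and
    returning its MSE (hypothetical here). Lower cost means a brighter firefly.
    """
    rng = random.Random(seed)
    hi = float(n_wavelengths - 1)
    if gamma is None:
        gamma = 1.0 / (hi * hi)  # scale light absorption to the search range
    swarm = [[rng.uniform(0.0, hi) for _ in range(k)] for _ in range(n_fireflies)]
    costs = [cost(decode(x, n_wavelengths)) for x in swarm]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if costs[j] < costs[i]:  # firefly j is brighter, so i moves towards it
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    swarm[i] = [min(hi, max(0.0, a + beta * (b - a)
                                            + alpha * (rng.random() - 0.5)))
                                for a, b in zip(swarm[i], swarm[j])]
                    costs[i] = cost(decode(swarm[i], n_wavelengths))
        alpha *= 0.95  # shrink the random step as the search converges
    best = min(range(n_fireflies), key=costs.__getitem__)
    return decode(swarm[best], n_wavelengths), costs[best]


# Toy usage: a hypothetical 10 nm grid and a made-up cost that favours ~950 nm.
wavelengths = list(range(200, 1101, 10))


def toy_cost(indices):
    return sum((wavelengths[i] - 950) ** 2 for i in indices) / len(indices)


selected, best_cost = firefly_select(toy_cost, len(wavelengths), k=5)
print([wavelengths[i] for i in selected], best_cost)
```

Because duplicate indices collapse after rounding, a firefly may select fewer than k wavelengths; a fixed-size or binary encoding would be an equally plausible design.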

Hybrid Artificial Neural Network-Particle Swarm Optimization Algorithm (ANN-PSO)
A multilayer perceptron artificial neural network has several adjustable parameters whose optimal setting ensures high performance: the number of layers, the number of neurons, the transfer function, the back-propagation network training function and the back-propagation weight/bias learning function. The particle swarm optimization (PSO) algorithm, first proposed by Kennedy and Eberhart [30], is a meta-heuristic that mimics the flocking of birds to solve optimization problems. Each candidate solution is represented as a particle that continually searches and moves through the search space. According to Kennedy and Eberhart [30], the motion of each particle depends on three factors: (1) its current position, (2) the best position it has found so far, and (3) the best position found so far by the whole swarm. In this study, the number of layers could range from 1 to 3; the first layer could have 1 to 25 neurons, and the other layers 0 to 25. The transfer function was selected from 13 functions (e.g., tansig, logsig and purelin), the back-propagation network training function from 19 functions (e.g., trainbfg, trainrp and traingd), and the back-propagation weight/bias learning function from 15 functions (e.g., learncon, learnh and learnhd). The input of the network is the spectral data and the output is the chemical property being estimated. The PSO algorithm proposes candidate network configurations, the error of each candidate network is recorded as the MSE, and the configuration with the minimum MSE is selected as optimal.
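The PSO search over network configurations can be sketched as follows. The bounds mirror the ranges given above, but the remaining details are assumptions: function choices are encoded as indices into lists the text does not fully enumerate, continuous particle positions are rounded and clipped, the inertia and acceleration constants (w = 0.7, c1 = c2 = 1.5) are common textbook values rather than the authors' settings, and `toy_fitness` is a hypothetical stand-in for training the network and returning its MSE.

```python
import random

# One (low, high) pair per particle dimension, mirroring the ranges in the text.
BOUNDS = [
    (1, 3),   # number of layers
    (1, 25),  # neurons in layer 1
    (0, 25),  # neurons in layer 2 (0 = layer unused)
    (0, 25),  # neurons in layer 3 (0 = layer unused)
    (0, 12),  # index into 13 transfer functions (tansig, logsig, purelin, ...)
    (0, 18),  # index into 19 training functions (trainbfg, trainrp, traingd, ...)
    (0, 14),  # index into 15 weight/bias learning functions (learncon, learnh, ...)
]


def decode(position):
    """Round and clip a continuous particle into an integer network configuration."""
    cfg = [min(hi, max(lo, round(p))) for p, (lo, hi) in zip(position, BOUNDS)]
    layers = [n for n in cfg[1:1 + cfg[0]] if n > 0]
    return {"layers": layers, "transfer": cfg[4], "train": cfg[5], "learn": cfg[6]}


def pso(fitness, n_particles=20, n_iter=40, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise fitness(config) with a global-best particle swarm."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(n_particles)]
    vel = [[0.0] * len(BOUNDS) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # best position of each particle
    pbest_f = [fitness(decode(p)) for p in pos]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]      # best position of the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(BOUNDS):
                # Inertia + pull towards own best + pull towards swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            f = fitness(decode(pos[i]))
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return decode(gbest), gbest_f


# Hypothetical fitness: prefers two layers with 15 neurons in total.
def toy_fitness(cfg):
    return abs(len(cfg["layers"]) - 2) + abs(sum(cfg["layers"]) - 15) + cfg["transfer"]


best_cfg, best_f = pso(toy_fitness)
print(best_cfg, best_f)
```

Encoding categorical choices as rounded indices is the simplest option; it implicitly treats neighbouring indices as "close", which has no physical meaning for function lists.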
The data were randomly divided into 60% for training, 10% for validation and 30% for testing.
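The random 60/10/30 partition, repeated over many iterations and summarized as mean ± standard deviation, can be sketched as follows. The `evaluate` callback is hypothetical: in practice it would train a model on the training indices, use the validation indices (for example, for early stopping, which is an assumption) and return a test-set criterion such as R or RMSE.

```python
import random
import statistics


def split_60_10_30(n_samples, rng):
    """Shuffle sample indices and cut them into 60% train, 10% validation, 30% test."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_train = round(0.6 * n_samples)
    n_val = round(0.1 * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def repeat_evaluation(evaluate, n_samples=150, n_repeats=300, seed=0):
    """Call evaluate(train, val, test) on n_repeats fresh random splits and
    return the mean and sample standard deviation of the returned scores."""
    rng = random.Random(seed)
    scores = [evaluate(*split_60_10_30(n_samples, rng)) for _ in range(n_repeats)]
    return statistics.fmean(scores), statistics.stdev(scores)


# With 150 fruits, each split has 90 training, 15 validation and 45 test samples.
train, val, test = split_60_10_30(150, random.Random(1))
print(len(train), len(val), len(test))
```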

Partial Least Squares Regression Method
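Partial least squares regression (PLSR) is a statistical method related to principal component regression. Whereas principal component regression builds its components from the predictors alone, PLSR extracts latent variables that maximize the covariance between the predictors (here, the spectra) and the response (the chemical property). According to Rossel [26], it is a non-parametric method that is not sensitive to sample size and does not require data normalization.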
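To make the latent-variable idea concrete, here is a from-scratch PLS1 sketch (the NIPALS algorithm for a single response) in pure Python. It illustrates the technique only; the text does not say which PLSR implementation or how many latent components were used, so `n_components` is a free, hypothetical parameter. New samples are predicted by projecting and deflating them with the stored weights and loadings, which is equivalent to using the PLS regression coefficients.

```python
import statistics


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class PLS1:
    """Single-response partial least squares regression fitted with NIPALS."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X, y):
        self.x_mean = [statistics.fmean(col) for col in zip(*X)]
        self.y_mean = statistics.fmean(y)
        Xr = [[v - m for v, m in zip(row, self.x_mean)] for row in X]  # centred X
        yr = [v - self.y_mean for v in y]                              # centred y
        self.weights, self.loadings, self.y_loadings = [], [], []
        for _ in range(self.n_components):
            cols = list(zip(*Xr))
            w = [_dot(col, yr) for col in cols]      # direction of max covariance
            norm = _dot(w, w) ** 0.5
            if norm < 1e-12:
                break                                # y is already fully explained
            w = [v / norm for v in w]
            t = [_dot(row, w) for row in Xr]         # latent scores
            tt = _dot(t, t)
            p = [_dot(col, t) / tt for col in cols]  # X loadings
            q = _dot(yr, t) / tt                     # y loading
            # Deflate: remove the part of X and y explained by this component.
            Xr = [[v - ti * pj for v, pj in zip(row, p)] for row, ti in zip(Xr, t)]
            yr = [v - q * ti for v, ti in zip(yr, t)]
            self.weights.append(w)
            self.loadings.append(p)
            self.y_loadings.append(q)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            xr = [v - m for v, m in zip(x, self.x_mean)]
            y_hat = self.y_mean
            for w, p, q in zip(self.weights, self.loadings, self.y_loadings):
                t = _dot(xr, w)
                xr = [v - t * pj for v, pj in zip(xr, p)]
                y_hat += q * t
            preds.append(y_hat)
        return preds


# Hypothetical data: 6 samples x 3 "wavelengths".
X = [[1, 2, 0.5], [2, 1, 1.5], [3, 4, 2.0], [4, 3, 0.0], [5, 7, 1.0], [6, 5, 3.0]]
y = [2 * a - b + 0.5 * c + 1 for a, b, c in X]
print(PLS1(n_components=2).fit(X, y).predict(X))
```

With as many components as the rank of the centred spectra, PLS1 reproduces ordinary least squares; fewer components trade some fit for stability on collinear spectral data.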

Performance Evaluation of Methods for Estimating the Chemical Properties of TA and Taste Index
To evaluate the performance of hybrid ANN-PSO and PLSR, the correlation coefficient (R), coefficient of determination (R2), mean squared error (MSE), root mean squared error (RMSE) and mean absolute error (MAE) were used, following Sabzi and Arribas [31]. Table 2 gives the mean measured TA and taste index of the Gala cultivar at the three harvest stages. Since there was no significant difference between the three harvest stages, the regression was performed on all 150 samples together; had there been a significant difference, a separate regression would have been required for each stage. The results of the ANOVA and of the Tukey and LSD mean-comparison tests are given in Tables 3 and 4.
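The five criteria can be computed as below. One assumption is worth flagging: R2 is computed here as 1 - SS_res/SS_tot, whereas some studies report the square of R; the two coincide for least-squares fits evaluated on their training data but can differ on a test set.

```python
import math


def regression_metrics(measured, predicted):
    """Return R, R2, MSE, RMSE and MAE for paired measured/predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    errors = [p - m for m, p in zip(measured, predicted)]
    sse = sum(e * e for e in errors)
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    sxy = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    syy = sum((p - mean_p) ** 2 for p in predicted)
    mse = sse / n
    return {
        "R": sxy / math.sqrt(ss_tot * syy),  # Pearson correlation coefficient
        "R2": 1.0 - sse / ss_tot,            # assumption: 1 - SS_res / SS_tot
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
    }


print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
```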

ANOVA Analysis for Property of Taste Index
The results of the ANOVA and of the Tukey and LSD mean-comparison tests are given in Tables 5 and 6.
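For reference, the one-way ANOVA F statistic behind such a comparison of harvest stages can be computed as follows. This sketch covers only the F statistic and its degrees of freedom; the p-value and the Tukey and LSD post-hoc comparisons would come from a statistics package and are not shown. With three stages of 50 fruits, the degrees of freedom are (2, 147).

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic and its (between, within) degrees of freedom."""
    means = [sum(g) / len(g) for g in groups]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the stage means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual fruits around their own stage mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical values for three harvest stages.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```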

Student's t-Test for the TA Property
A Student's t-test was performed to compare the destructively measured TA with the non-destructive predictions of ANN-PSO and PLSR. The results are given in Tables 7 and 8.
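A paired version of the test, which treats each fruit's measured and predicted values as a matched pair, can be sketched as below. Whether the authors used a paired or an independent two-sample test is not stated, so the paired form is an assumption; the p-value would be read from the t distribution with n - 1 degrees of freedom.

```python
import math
import statistics


def paired_t(measured, predicted):
    """Paired Student's t statistic for measured vs predicted values of the same fruits.

    Returns (t, degrees_of_freedom); a small |t| gives no evidence of a
    systematic difference between the destructive and non-destructive values.
    """
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    diffs = [m - p for m, p in zip(measured, predicted)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return statistics.fmean(diffs) / se, n - 1


# Hypothetical measured and predicted TA values for four fruits.
print(paired_t([1, 2, 3, 4], [0, 2, 2, 3]))
```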

Student's t-Test for the Taste Index Property
A Student's t-test was performed to compare the destructively measured taste index with the non-destructive predictions of ANN-PSO and PLSR. The results are given in Tables 9 and 10.
Table 11 gives the structure of the hidden layers of the hybrid ANN-PSO algorithm for estimating TA; the optimal neural network has three layers with the listed properties. Table 12 gives the optimal values of the adjustable parameters of the artificial neural network; as can be seen, the optimal structure has two hidden layers.

Chemical Properties of Taste Index
The key wavelengths selected by the hybrid ANN-FA algorithm for taste index estimation are 942, 958, 967, 974 and 982 nm. Figure 2 shows the scatter plot and correlation between the mean estimated and measured TA of the Gala cultivar (test set), obtained by the hybrid ANN-PSO algorithm from spectral data at the key wavelengths over 300 replications. The correlation coefficient is close to 0.92. Figure 3 shows box plots of five criteria for evaluating the performance of the ANN-PSO algorithm (R, R2, MSE, MAE and RMSE) for the Gala cultivar over the same 300 replications. More compact box plots indicate more similar results across iterations and hence a more reliable method. The mean squared error is below 0.01 in all iterations, and the correlation coefficient exceeds 0.91 in more than half of the iterations.
Figure 4 shows the correlation plot between the mean estimated and measured taste index of Gala apples over 300 iterations. Based on the key wavelengths, the correlation coefficient between the mean estimated and measured values is about 0.93. Figure 5 evaluates the performance of ANN-PSO for estimating the taste index using three error criteria together with the correlation and determination coefficients. RMSE is below 3 in more than half of the iterations, and the correlation coefficient is above 0.85 in all iterations but one.
Figure 6 shows the correlation plot between the mean TA estimated by PLSR over 300 iterations and the destructively measured TA. The correlation coefficient is slightly above 0.82, lower than that of ANN-PSO. Figure 7 uses five criteria (MSE, RMSE, MAE, correlation coefficient and coefficient of determination) to evaluate the performance of PLSR for estimating TA. The correlation coefficient ranges from above 0.74 to close to 0.91.
Figure 8 shows the scatter plot and correlation between the mean estimated and measured taste index of Gala apples (test set) obtained by PLSR from spectral data at the key wavelengths over 300 iterations. The correlation coefficient in this case is close to 0.85. Figure 9 shows box plots of the error criteria and the correlation and determination coefficients used to evaluate PLSR for estimating the taste index of the Gala cultivar over 300 iterations. The coefficient of determination ranges from 0.57 to above 0.85.
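The figures above plot a mean estimated value per sample and box plots of each criterion across 300 iterations. One plausible way to build both is sketched below; it is an assumption rather than the authors' documented procedure: average each sample's predictions over the iterations in which it fell in the test set, and summarize each criterion's per-iteration values by their quartiles. Statements that a criterion is below a threshold in more than half of the iterations then correspond to the median lying below that threshold.

```python
import statistics
from collections import defaultdict


def mean_test_predictions(runs):
    """Average each sample's predictions over the runs in which it was a test sample.

    runs: iterable of (test_indices, predictions) pairs, one pair per iteration.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for test_idx, preds in runs:
        for i, p in zip(test_idx, preds):
            sums[i] += p
            counts[i] += 1
    return {i: sums[i] / counts[i] for i in sums}


def box_summary(values):
    """Five-number summary of one criterion across iterations, as drawn in a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


# Hypothetical: two iterations with different test samples.
print(mean_test_predictions([([0, 1], [1.0, 2.0]), ([1, 2], [4.0, 6.0])]))
print(box_summary([2.8, 3.1, 2.6, 3.4, 2.9]))
```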

Comparison of the Performance of Hybrid ANN-PSO Algorithm and PLSR for Non-Destructive Estimation of Chemical Properties of TA
Figure 10 compares the two methods using scatter plots of the mean estimated versus measured TA for the hybrid ANN-PSO algorithm and PLSR. With ANN-PSO, the mean estimated values of most samples are closer to the measured values than with PLSR, indicating better performance of the hybrid ANN-PSO. Table 13 compares the mean and standard deviation of the evaluation criteria for ANN-PSO and PLSR in estimating TA over 300 iterations using spectral data at the key wavelengths. In the best iteration of ANN-PSO, the RMSE and correlation coefficient are 0.0017 and 0.991, respectively; for PLSR, they are 0.0040 and 0.9045. Hence, the hybrid ANN-PSO method performed better than PLSR for estimating TA.

Comparison of the Performance of Hybrid ANN-PSO Algorithm and PLSR for Non-Destructive Estimation of Chemical Properties of Taste Index
Figure 11 and Table 14 compare ANN-PSO and partial least squares regression for non-destructive estimation of the taste index using scatter plots and several evaluation criteria. The ANN-PSO method performs better than partial least squares regression.
Table 15 compares the performance of the proposed algorithm (using hybrid ANN-PSO and PLSR) with other studies on non-destructive estimation of TA. The regression coefficient of the proposed algorithm is much higher than those of the other methods.

Conclusions
In this paper, the TA and taste index of Gala apples at different stages of maturity were estimated using a non-destructive method. A hybrid ANN-FA method was used to select key wavelengths, and two methods, namely ANN-PSO and PLSR, were used to estimate the two properties. The most important results of this research are listed below:
1. The key wavelengths selected by ANN-FA for estimating TA and taste index lie in the near-infrared region, outside the visible range. This is due to the third overtone of N-H, the second overtone of O-H and the third overtone of C-H, which produce spectral peaks (carrying useful spectral information) in the range of 870-960 nm [22].
2. The hybrid ANN-PSO algorithm produced estimates close to the measured TA and taste index; for example, the correlation coefficient and RMSE of ANN-PSO were 0.9491 and 0.042 for TA and 0.963 and 5.27 for the taste index, respectively. Since spectral data acquisition is more costly at longer wavelengths, it can be concluded that a low-cost portable device based on the key wavelengths could be developed for non-destructive estimation of these two properties.
3. The superiority of the hybrid ANN-PSO algorithm over PLSR may be related to the non-linear nature of the artificial neural network and the optimal adjustment of its parameters by the particle swarm optimization algorithm.
Funding: This research received no external funding.