Inner Properties Estimation of Gala Apple Using Spectral Data and Two Statistical and Artificial Intelligence Based Methods

Fruits provide various vitamins to the human body. The chemical properties of fruits give researchers useful information, such as the ripening time of fruits and any lack of nutrients in them. Conventional methods for determining the chemical properties of fruits are destructive and time-consuming, and they are unsuitable for online operation. Researchers have therefore studied non-destructive methods, which are currently at the research and development stage. The present paper focuses on a non-destructive method based on spectral data in the 200–1100-nm region for estimating total soluble solids (TSS) and BrimA in Gala apples. The work steps included: (1) collecting samples of Gala apples at different stages of maturity; (2) extracting spectral data from the samples and preprocessing them; (3) measuring the chemical properties TSS and BrimA; (4) selecting optimal (effective) wavelengths using a hybrid artificial neural network-simulated annealing algorithm (ANN-SA); and (5) estimating the chemical properties using partial least squares regression (PLSR) and a hybrid artificial neural network-imperialist competitive algorithm (ANN-ICA). To investigate the validity of the methods, the estimation algorithm was repeated 500 times. The results showed that, in the best training run, ANN-ICA predicted TSS and BrimA with correlation coefficients of 0.963 and 0.965 and root mean squared errors of 0.167% and 0.596%, respectively.


Introduction
Marketing of food products can be better managed by improving cost-effective, non-destructive quality control systems in the food industry. Near-infrared (NIR) spectroscopy has been successfully used to measure the physicochemical features of food and agricultural products non-destructively [1][2][3][4]. The internal properties of various fresh fruits have been successfully evaluated for several decades using NIR spectroscopy [5][6][7][8][9][10][11][12].
The feasibility of predicting the soluble solid content (SSC) of citrus was investigated by Tian et al. [23] using portable Vis/NIR spectroscopy (550-1100 nm). The original spectra were preprocessed to improve the data. The applications of Vis/NIR spectroscopy for quality evaluation of oilseeds were reviewed by Li et al. [18], who also studied the ability of spectroscopy to identify the geographical origin of oilseeds and edible oils. Maniwara et al. [19] estimated the soluble solid content (SSC), titratable acidity (TA), and pulp content (PC) of purple berry fruit using NIR spectroscopy. They developed prediction models based on partial least squares (PLS) over the NIR spectral range. A rapid, non-destructive method was proposed by Xia et al. [17] to determine sesamin and sesamolin in sesame using NIR spectroscopy. Sesame samples were collected from three different regions of China and modeled with partial least squares (PLS). Huang et al. [24] used spatially resolved (SR) spectroscopy to evaluate the quality of tomatoes (in the range of 550-1650 nm). The results were compared with those of two conventional single-point (SP) spectrometers in the ranges of 400 to 1000 nm and 900 to 1300 nm. The partial least squares (PLS) method was used to predict SSC and pH. A partial least squares (PLS) model was developed by Nascimentoa et al. [25] to measure the SSC and firmness of peach fruits with low and healthy frost, and to investigate the effect of maturity stage. FT-NIR spectra were obtained at three stages of maturity. They evaluated the performance of the model through R² and root mean squared error. The PLS method did not successfully classify fruits based on red and white skin color, maturity stage, or harvest season. A combination of a support vector machine with a feature extraction algorithm and X-ray computed tomography was presented by Looverbosch et al. [26] for the successful detection of internal malformations of pear. Wang et al. [27] evaluated kiwifruit for chilling injury, a physiological disorder caused by cold storage. Water core is a symptom of this disorder. Early signs appear only inside the fruit, so early detection is very helpful. There was a significant difference between damaged and healthy fruits.
As the studies above show, considerable research has been carried out on non-destructive methods for estimating the physicochemical properties of fruits as well as their internal defects. This study therefore presents a non-destructive method based on hybrid ANN-ICA and PLSR using NIR spectroscopy to estimate TSS and BrimA in Gala cultivar apples.
The innovations of the present study are (1) adjusting the parameters of the artificial neural network optimally so that it performs well in predicting total soluble solids (TSS) and BrimA (TSS − K × TA); (2) selecting the optimal wavelengths using the simulated annealing algorithm; and (3) running the proposed algorithm 500 times to assess its reliability. The main difference between the current study and other research is that most other studies use either linear statistical methods or a simple ANN without any adjustment of parameters, and so often incur errors on complex data. Moreover, in most studies the algorithms are executed only once, so their reliability cannot be assessed in the same way. Figure 1 shows the work stages of the proposed method for predicting TSS and BrimA of Gala apples. There are six main steps for training the proposed algorithm.

Collecting the Samples Used to Train the Proposed Algorithm
Gala apples were harvested at three different stages of growth from different orchards in Karaj, Iran (latitude 35.83266, longitude 50.99155). The harvest time of the Gala apples was identified according to the gardener's experience. Samples were then collected in three steps, i.e., 14 days before harvest time, at harvest time, and 7 days after harvest time. At each step, 50 samples were collected and transferred to the laboratory for extracting the spectral data and measuring the chemical properties TSS and BrimA.

Obtaining the Spectral Data
The hardware components for extracting spectral data include a laptop (Intel Core i5, 500 M, 4 GB of RAM at 2.13 GHz, Windows 7) equipped with SpectraWiz software, a spectrometer (EPP200NIR, StellarNet, Tampa, FL, USA) covering the 200-1100-nm region (VIS-NIR) with a resolution of 2 nm, an optical fiber, and a light source (tungsten halogen). Spectral data (5 scans per sample) were extracted from intact apple samples, and the average of these 5 scans was used for analysis. Spectral data often contain noise for various reasons, such as disturbing ambient light and unevenness of the apple surface [28]. Thus, in this study, three preprocessing steps were used to obtain clean data, namely conversion of reflectance spectra to absorption spectra by Equation (1), light scatter and baseline correction, and a smoothing operation using a wavelet filter [29].
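The first two preprocessing steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Equation (1) is not reproduced in the text, so the common conversion A = log10(1/R) is assumed; standard normal variate (SNV) is used as one widely used scatter correction, although the text does not name the method; and wavelet smoothing is omitted because it needs a dedicated wavelet library. All function names and the reflectance values are hypothetical.

```python
import math
from statistics import mean, stdev


def average_scans(scans):
    """Average repeated scans (equal-length lists) wavelength by wavelength."""
    return [mean(values) for values in zip(*scans)]


def reflectance_to_absorbance(reflectance):
    """Assumed form of Equation (1): A = log10(1 / R), with R in (0, 1]."""
    return [math.log10(1.0 / r) for r in reflectance]


def snv(spectrum):
    """Standard normal variate: centre and scale a single spectrum.

    Assumed stand-in for the scatter correction step.
    """
    m, s = mean(spectrum), stdev(spectrum)
    return [(x - m) / s for x in spectrum]


# Five made-up reflectance scans of one apple at four wavelengths.
scans = [
    [0.50, 0.40, 0.25, 0.10],
    [0.52, 0.41, 0.24, 0.10],
    [0.48, 0.39, 0.26, 0.10],
    [0.50, 0.40, 0.25, 0.10],
    [0.50, 0.40, 0.25, 0.10],
]
mean_reflectance = average_scans(scans)
absorbance = reflectance_to_absorbance(mean_reflectance)
corrected = snv(absorbance)
```

After SNV, each spectrum has zero mean and unit standard deviation, which removes multiplicative scatter effects between samples before modelling.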

Destructive Measurement of the Chemical Properties of TSS and BrimA
Sugars are the major soluble solids of fruit. As the fruit ripens, acid is converted to sugar and the accumulation of sugars in the fruit increases. Therefore, the soluble solid content typically determines the ripening and harvesting time of the crop. In addition, because both sugar and acid affect taste, the BrimA index is calculated from the amounts of soluble solids and acid. This index is also used to determine the time of ripeness in apples.

TSS Content of Gala Apples
Since the amount of acid decreases and the amount of sugars increases as fruits ripen, the ripening stage of fruits, including Gala apples, can be identified from TSS values, as applied by [30]. Normally, soluble solids determine the ripening and harvesting time of the crop. Soluble solids can be measured in a small sample of fruit juice with a refractometer. The refractometer reports the refractive index, which indicates how much light rays are bent as they pass through the juice. Some refractometers have a conversion scale, while others report the amount of soluble solids directly in degrees Brix.

Property of BrimA
This property is used to grade fruits based on taste and serves as an indicator of the ripening stage. It is calculated by Equation (2), BrimA = TSS − K × TA, where TA is the titratable acidity. To measure titratable acidity (TA), 5 mL of the extract obtained from the sample is taken and diluted with 45 mL of distilled water. The diluted extract is titrated with 0.1 N sodium hydroxide (NaOH), and the acid content in percent is calculated using Equation (3).
K indicates the sensitivity of the tongue to the acid-sugar ratio and is usually taken to be 5, depending on the sugars and acids of the individual fruit [31].
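As a worked illustration of the BrimA calculation, the sketch below computes TA and then BrimA = TSS − K × TA with K = 5. Equation (3) is not reproduced in the text, so the TA formula used here is an assumption: the common titration formula with a malic acid milliequivalent weight of 0.067 g/meq, which is a typical choice for apples. The function names, the titration volume, and the TSS value are hypothetical.

```python
def titratable_acidity(naoh_ml, naoh_normality=0.1, sample_ml=5.0, meq_weight=0.067):
    """Titratable acidity in percent.

    Assumed formula (not taken from the paper):
    TA% = V_NaOH * N_NaOH * meq_weight * 100 / V_sample,
    with meq_weight = 0.067 g/meq for malic acid.
    """
    return naoh_ml * naoh_normality * meq_weight * 100.0 / sample_ml


def brima(tss, ta, k=5.0):
    """BrimA = TSS - K * TA, with K = 5 as stated in the text."""
    return tss - k * ta


# Hypothetical measurement: 2.5 mL of 0.1 N NaOH consumed, TSS of 13.0 degrees Brix.
ta = titratable_acidity(naoh_ml=2.5)
index = brima(tss=13.0, ta=ta)
print(f"TA = {ta:.3f} %, BrimA = {index:.3f}")
```

For these made-up inputs, TA is 0.335% and BrimA is 11.325; a more acidic fruit with the same TSS would score lower.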

Choosing the Key Wavelengths Using Hybrid ANN-SA Algorithm
Using spectral data over the full 200-1100 nm range has limitations: the high volume of data increases computational complexity, and both the cost of developing portable devices and the computing time are important factors. Determining a small set of effective (key) wavelengths is therefore helpful. In this study, the key wavelengths were selected by the ANN-SA algorithm. The simulated annealing (SA) algorithm simulates the annealing of metals. Annealing is used to bring a material to its lowest-energy, most stable state, and it reaches this goal only when the temperature drops slowly enough. The material is first melted, and the temperature is then decreased step by step until the material solidifies. In contrast, if the material cools rapidly, it settles into a suboptimal state that does not have the minimum energy [32]. In this study, a neural network was used to select the effective wavelengths (Table 1). The inputs of this ANN were vectors of spectral data selected by the SA algorithm, and the outputs of the network were TSS and BrimA. The mean squared error (MSE) of the neural network was calculated and saved for each vector, and the wavelengths in the vectors with the lowest MSE were selected as key wavelengths.
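The selection loop described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: in the study the cost of a candidate wavelength subset is the MSE of an ANN trained on it, whereas here a toy cost function stands in for that MSE so the example stays self-contained. The function name, the cooling schedule, and all parameter values are assumptions.

```python
import math
import random


def simulated_annealing(n_wavelengths, n_select, cost, t_start=1.0, t_end=1e-3,
                        cooling=0.95, moves_per_temp=20, seed=0):
    """Search for a low-cost subset of wavelength indices."""
    rng = random.Random(seed)
    current = rng.sample(range(n_wavelengths), n_select)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            # Neighbour: replace one selected wavelength with an unselected one.
            candidate = list(current)
            unselected = [w for w in range(n_wavelengths) if w not in current]
            candidate[rng.randrange(n_select)] = rng.choice(unselected)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Accept improvements always, worse moves with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = list(current), current_cost
        t *= cooling  # slow cooling, as in physical annealing
    return sorted(best), best_cost


# Toy stand-in for the ANN's MSE: distance of the subset to three
# made-up "informative" wavelength indices.
informative = [3, 17, 42]


def toy_cost(subset):
    return sum(min(abs(w - target) for w in subset) for target in informative)


selected, selected_cost = simulated_annealing(50, 3, toy_cost)
print(selected, selected_cost)
```

At high temperature the search accepts many worse subsets and explores widely; as the temperature falls it accepts fewer, which mirrors the slow-cooling argument in the text.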

Non-Destructive Estimation of TSS and BrimA
Two methods, ANN-ICA and PLSR, were used to estimate TSS and BrimA. To assess the validity of these methods, the algorithm was executed 500 times. In each iteration, 60% of the data were allocated to training, 10% to validation, and 30% to testing.
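The repeated random partition can be sketched as below. The 60/10/30 proportions and the 500 repetitions come from the text, and the sample count of 150 follows from three collection steps of 50 apples each; the function name and the seed are hypothetical, and the model training that would happen inside each repetition is left out.

```python
import random


def split_indices(n_samples, rng, train=0.6, validation=0.1):
    """Shuffle sample indices and cut them into train, validation and test parts."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(train * n_samples)
    n_val = round(validation * n_samples)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


rng = random.Random(42)  # fixed seed so the sketch is reproducible
splits = [split_indices(150, rng) for _ in range(500)]

train_idx, val_idx, test_idx = splits[0]
print(len(train_idx), len(val_idx), len(test_idx))
```

Drawing a fresh partition for each repetition is what allows the spread of R, MSE, and the other criteria to be reported as box plots and as mean and standard deviation.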

Hybrid Artificial Neural Network-Imperialist Competitive Algorithm (ANN-ICA)
The multilayer perceptron artificial neural network has five adjustable parameters: the number of layers, the number of neurons, the transfer function, the back-propagation network training function, and the back-propagation weight/bias learning function. Optimal adjustment of these parameters is needed for high performance. In this paper, the imperialist competitive algorithm (ICA) has the task of setting these parameters optimally. ICA is an algorithm based on a mathematical model and simulation of human social and political evolution. The algorithm attempts to solve the problem by finding a global optimum [33].
ICA first builds candidate vectors whose length equals the number of adjustable parameters listed above, with each element representing one parameter. An ANN is configured from each vector and its mean squared error is measured. The vector with the lowest mean squared error is taken as the optimal ANN structure.
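The general mechanics of ICA can be sketched as follows. This is a deliberately simplified variant and not the authors' implementation: it optimises a continuous toy function instead of the discrete ANN settings, it ranks empires by the cost of the imperialist alone, and it always hands the weakest colony of the weakest empire to the strongest empire rather than drawing the winner at random. In the study, the cost of a vector would be the MSE of the ANN built from it. All names and parameter values are assumptions.

```python
import random


def ica_minimise(cost, dim, bounds, n_countries=30, n_imperialists=3,
                 n_decades=100, beta=2.0, p_revolution=0.1, seed=0):
    """Simplified imperialist competitive algorithm for minimisation."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_country():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    countries = sorted((random_country() for _ in range(n_countries)), key=cost)
    # The strongest countries become imperialists; the rest are dealt out as colonies.
    empires = [{"imp": c, "colonies": []} for c in countries[:n_imperialists]]
    for i, colony in enumerate(countries[n_imperialists:]):
        empires[i % n_imperialists]["colonies"].append(colony)

    history = []
    for _ in range(n_decades):
        for empire in empires:
            for j, colony in enumerate(empire["colonies"]):
                if rng.random() < p_revolution:
                    colony = random_country()  # revolution: random jump
                else:
                    # Assimilation: move the colony towards its imperialist.
                    colony = [min(hi, max(lo, c + beta * rng.random() * (t - c)))
                              for c, t in zip(colony, empire["imp"])]
                # A colony that beats its imperialist takes over the empire.
                if cost(colony) < cost(empire["imp"]):
                    colony, empire["imp"] = empire["imp"], colony
                empire["colonies"][j] = colony
        # Competition: the weakest empire loses a colony to the strongest one.
        if len(empires) > 1:
            empires.sort(key=lambda e: cost(e["imp"]))
            weakest = empires[-1]
            if weakest["colonies"]:
                weakest["colonies"].sort(key=cost)
                empires[0]["colonies"].append(weakest["colonies"].pop())
            else:
                # An empire with no colonies collapses and is absorbed.
                empires[0]["colonies"].append(weakest["imp"])
                empires.pop()
        history.append(min(cost(e["imp"]) for e in empires))

    best = min((e["imp"] for e in empires), key=cost)
    return best, cost(best), history


def sphere(x):
    """Toy cost standing in for the ANN's mean squared error."""
    return sum(v * v for v in x)


best, best_cost, history = ica_minimise(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_cost)
```

For an expensive cost such as ANN training, the repeated `cost` calls in this sketch would normally be cached, and each vector element would be decoded into a discrete setting such as a layer count or a transfer function.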

Partial Least Squares Regression (PLSR)
Partial least squares regression is a non-parametric method that does not assume normally distributed data [29]. Furthermore, it retains high statistical power even when some data are missing. It can also model several independent and dependent variables simultaneously.

Parameters Measuring the Performance of ANN-ICA and PLSR
To investigate the performance of ANN-ICA and PLSR, the following evaluation criteria were used: the correlation coefficient (R), the coefficient of determination (R²), the mean squared error (MSE), the root mean squared error (RMSE), and the mean absolute error (MAE) [34].
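The five criteria can be computed as in the sketch below. The definitions are the standard ones and are an assumption here, since the formulas are not given in the text; in particular, R² is taken as 1 − SS_res/SS_tot, which need not equal the square of R for an arbitrary model. The function name and the example values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R, R2, MSE, RMSE and MAE for paired true and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    # Pearson correlation between true and predicted values.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    ss_t = sum((t - mean_t) ** 2 for t in y_true)
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)
    r = cov / math.sqrt(ss_t * ss_p)

    # Coefficient of determination (assumed definition): 1 - SS_res / SS_tot.
    r2 = 1.0 - sum(e * e for e in errors) / ss_t

    return {"R": r, "R2": r2, "MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}


# Made-up TSS values for four apples.
metrics = regression_metrics([12.0, 13.0, 14.0, 15.0], [12.5, 12.5, 14.5, 14.5])
print(metrics)
```

Note that RMSE is in the units of the predicted property while MSE is in squared units, so the two should not be compared directly.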

Optimally Tuned ANN Structure Based on ICA
The best structure of the ANN for predicting the chemical properties TSS and BrimA is given in Table 2.
Figure 2 shows the correlation plot between the mean predicted and true values of TSS of Gala apples (test set) for the ANN-ICA and PLSR methods based on spectral data over 500 iterations. The correlation coefficients for ANN-ICA and PLSR are above 0.95 and 0.88, respectively, which are acceptable values for estimating TSS.
Figure 3a shows box plots of several criteria for evaluating the efficiency of ANN-ICA in predicting TSS of Gala apples based on spectral data at the effective wavelengths over 500 iterations. Compact box plots mean that the results are close across iterations. Among all iterations, only one has a mean squared error greater than 0.1. Additionally, in more than half of the iterations, the MSE was less than 0.06 and the correlation coefficient was above 0.9. Figure 3b shows the performance of the PLSR method in predicting TSS. In most iterations, the RMSE is between 0.55 and 0.65, and in the best training run it is less than 0.45.
Figure 4 shows the correlation plots between the mean predicted and true values of BrimA for the ANN-ICA and PLSR methods over 500 iterations. According to Figure 4a, the correlation coefficient obtained for ANN-ICA is above 0.94, which is an acceptable value for non-destructive estimation. Figure 4b shows the correlation plot between the mean predicted and true values of BrimA of Gala apples for the PLSR method. The correlation coefficient obtained in this case is above 0.86.
Figure 5 examines the performance of the two methods, ANN-ICA and PLSR, using five criteria: R, R², MSE, RMSE, and MAE. In the error criteria, the distance between the first quartile and the median is smaller than the distance between the median and the third quartile, so it can be concluded that, in most iterations, the results had smaller error values.
Table 3 compares the efficiency of the ANN-ICA and PLSR methods for predicting TSS of Gala apples over 500 iterations using the mean and standard deviation. ANN-ICA has lower values than PLSR on the error criteria. Moreover, the coefficients of correlation and determination of ANN-ICA are higher than those of PLSR. Therefore, it can be concluded that the ANN-ICA method is more efficient than the PLSR method for non-destructive prediction of TSS.
Figure 6 shows box plots of the difference between the true (measured) and predicted values of TSS over 500 iterations, used to evaluate the efficiency of ANN-ICA and PLSR in predicting TSS of Gala apples. More compact box plots indicate that the predicted values are closer to the true values.
Table 4 compares the efficiency of ANN-ICA and PLSR for predicting BrimA of Gala apples over 500 iterations using the mean and standard deviation. As can be seen, the two methods perform similarly on the error-related criteria. However, based on the coefficients of correlation and determination, the ANN-ICA method performs better than the PLSR method. Figure 7 also examines the performance of these two methods based on a graphical analysis.

Comparing the Efficiency of Proposed Methods in This Study with Other Studies
The results show that the method proposed in this article can predict TSS content with a correlation coefficient above 0.96. The studies of Tian et al. [23] and Maniwara et al. [19] support this finding. Tian et al. [23] studied the prediction of TSS of citrus. Three apparent absorption peaks were found at 710, 810, and 915 nm in the original spectrum curve. The key wavelengths were then determined, and prediction models were created based on the entire wavelength range and on the key wavelengths. The results showed that the optimal prediction model had a correlation coefficient, a root mean squared error, and a residual prediction deviation of 0.965, 0.584, and 512, respectively. Maniwara et al. [19] evaluated purple berry fruit using NIR spectroscopy; the results indicated a significant relationship between the estimated and actual values (0.84, 0.91, and 0.99 for SSC, TA, and PC, respectively).

Conclusions
Sugars are the major soluble solids in fruit. This paper proposed the prediction of TSS and BrimA of Gala cultivar apples using ANN-ICA and PLSR based on spectral data at key wavelengths. A low coefficient of determination in non-destructive estimation of soluble solids can be related to an inappropriate spectral range, light scattering effects caused by changes in detector distance, changes in sample size, surface roughness, noise caused by a rise in spectrometer temperature, software error, and similar factors.
The most important result of comparing the two methods is that the ANN-ICA method predicted TSS with a correlation coefficient of 0.963 and an RMSE of 0.167, while the correlation coefficient and RMSE of the PLSR method were 0.953 and 0.432, respectively. The same conclusion was reached for BrimA.
Finally, it is suggested that these properties be predicted non-destructively in the range of 1000 to 2500 nm and compared with the results of the present research. At longer wavelengths, the rays penetrate further into the fruit and provide more spectral information, so new results may be obtained. It should be noted, however, that developing a portable device based on wavelengths in the range of 200-1100 nm is less complex than one based on 1000-2500 nm and is likely to be more practical.