Prediction of Soluble Solids Content in Green Plum by Using a Sparse Autoencoder

The soluble solids content (SSC) affects the flavor of green plums and is an important parameter during processing. In recent years, hyperspectral technology has been widely used in the nondestructive testing of fruit ingredients. However, the prediction accuracy of most models can hardly be improved further. The rapid development of deep learning technology has laid a foundation for improving such models. A new hyperspectral imaging system aimed at measuring the green plum SSC was developed, and a sparse autoencoder (SAE)-partial least squares regression (PLSR) model was built to further improve the accuracy of component prediction. The results of the experiment show that the SAE-PLSR model, which achieved a correlation coefficient of 0.938 and a root mean square error of 0.654 on the prediction set, predicts the SSC of green plums better than three traditional methods. In this paper, integrated approaches combining three different pretreatment methods with PLSR were also used to predict the SSC in green plums. The SAE-PLSR model showed good prediction performance, indicating that the proposed model can effectively detect the SSC in green plums.


Introduction
Green plum, also known as sour plum, is a traditional fruit that has been cultivated for thousands of years in China. Green plums contain many vitamins, trace elements (iron, phosphorus, potassium, copper, calcium, and zinc), and 17 kinds of amino acids. They can aid digestion, stimulate the appetite, relieve fatigue, protect the liver, and fight aging.
Green plums that are not mature enough contain a large amount of organic acids, which results in sour-tasting flesh when eaten raw. Therefore, green plums are usually processed into plum essence and plum wine. Mature green plums contain more sugar and are used for jam or crisp fruit. Therefore, the content of acid and sugar in plums greatly affects their subsequent processing.
At present, the form of the green plum industry in China is relatively simple. Most green plums are sold raw, and the degree of processing is low, which leads to a low comprehensive utilization value for green plums, so the income of fruit farmers cannot be increased. During processing, green plums are sorted on the basis of their defects, hardness, acidity, and sugar content. When measuring the sugar content of fruits, the soluble solids content (SSC) [1,2], which refers to soluble sugars, including monosaccharides, disaccharides, and polysaccharides, is usually used as the indicator. At present, the sorting of green plums depends on manual observation and classification by experienced workers. However, large differences exist among individual green plums during sorting due to factors such as lighting and variety, and the cost of manual sorting is high. In this study, the SSC prediction result of each green plum is also visualized to facilitate the subsequent sorting of green plums.

Green Plum Samples
The green plums were purchased from Dali (Yunnan, China). Green plums that were extremely small or had large areas of bad spots and rot were removed. The samples were stored in a laboratory refrigerator maintained at 4 °C. Before each test, samples were randomly selected and placed at room temperature. The spectral data collection and the physicochemical experiments were performed once the fruit temperature had equalized with the room temperature.

Equipment
The GaiaField-V10E-AZ4 visible near-infrared hyperspectral camera of Shuanglihepu Company (Sichuan, China), the main instrument used in the experiments, had a spectral imaging range of 400-1000 nm and a spectral resolution of 2.8 nm.
On the basis of this instrument, a new green plum hyperspectral imaging system was developed. The system consisted of a camera, a light source, a conveyor belt, and a computer, as shown in Figure 1a. The computer controlled the camera and the conveyor belt. The speed of the conveyor belt was matched to the camera's shooting speed, ensuring that the sample passed smoothly through the field of view of the camera and that the obtained data were closer to reality. The light emitted by the halogen light source fell on the sample through the reflection of the dome, whose interior is evenly coated with Teflon. This ensured that the illumination on the sample was as uniform as possible and that the light in the camera's field of view remained constant, which guaranteed reliable subsequent data processing, as shown in Figure 1b. The SSC was measured using the PAL-1 handheld Brix refractometer, as shown in Figure 1c.


Hyperspectral Data Acquisition
Before testing, the hyperspectral imaging system was turned on and warmed up for 30 min, and the best spectra acquisition parameters were determined through a pretest. The exposure time and the moving speed of the conveyor belt were 1.2 ms and 0.6 cm/s, respectively. After scanning the entire green plum, the obtained hyperspectral data were calibrated using standard white and dark reference images. The white reference image was obtained from the 99% standard reflectance plate, whose surface is made of Teflon and which was provided by the camera manufacturer. The dark reference image was obtained with the camera lens cap on. The calibrated image A0 is defined as:

A0 = (A − AD) / (AW − AD)

where A0 is the green plum spectral reflectance data after the calibration, A is the raw green plum spectral data, AD is the dark field spectral reflectance data, and AW is the 99% reflectance plate spectral data.
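As an illustration, this black-and-white calibration is applied per pixel and per band. The following NumPy sketch (not the authors' Matlab code; array shapes are chosen only for illustration) shows the computation:

```python
import numpy as np

def calibrate(raw, dark, white):
    """Black/white reflectance calibration: A0 = (A - AD) / (AW - AD)."""
    return (raw - dark) / (white - dark)

# Toy example: a 2-pixel, 3-band slice of a hyperspectral cube.
raw = np.array([[120.0, 200.0, 90.0],
                [110.0, 180.0, 95.0]])
dark = np.full_like(raw, 10.0)    # dark reference (lens cap on)
white = np.full_like(raw, 210.0)  # 99% reflectance plate
a0 = calibrate(raw, dark, white)  # reflectance in [0, 1]
```

In practice the dark and white references are full images averaged over several frames; broadcasting applies the same formula to every pixel and band.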


Green Plum SSC Testing
After the spectral data were collected, juice was immediately squeezed from each green plum. The sample well of the Brix refractometer was cleaned with distilled water and wiped dry, an appropriate amount of green plum juice was placed in the sample well, and the SSC value was recorded. The SSC of each sample was measured three times, and the average was taken as the SSC value of the sample for subsequent data processing.
A total of 366 samples were selected and sorted according to their SSC values. One of every four samples was randomly selected for the prediction set, and the remaining samples were used for training. Finally, 274 samples constituted the calibration set and 92 samples the prediction set. Table 1 shows the measured SSC values of the plum samples.
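The split described above (sort by SSC, then draw one of every four consecutive samples) can be sketched as follows; with 366 samples it yields exactly 274 calibration and 92 prediction samples. The function name and seed are illustrative, not from the paper:

```python
import random

def split_by_ssc(ssc_values, seed=0):
    """Sort sample indices by SSC, then from every consecutive group of
    four randomly move one index into the prediction set; the rest go
    into the calibration set."""
    rng = random.Random(seed)
    order = sorted(range(len(ssc_values)), key=lambda i: ssc_values[i])
    calib, pred = [], []
    for start in range(0, len(order), 4):
        group = order[start:start + 4]
        pick = rng.choice(group)          # one sample per group predicts
        pred.append(pick)
        calib.extend(i for i in group if i != pick)
    return calib, pred

rng = random.Random(1)
ssc = [rng.uniform(5.0, 15.0) for _ in range(366)]  # made-up SSC values
calib, pred = split_by_ssc(ssc)
```

Sorting before sampling keeps the calibration and prediction sets covering the same SSC range, which is why the paper sorts first.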

Image Processing
The hyperspectral imaging system was used to collect the green plum images. Figure 2 shows the pseudocolor image of the samples. The region-of-interest tool in the ENVI 5.3 software was used to select the green plum region from each image, and the average spectral reflectance was calculated. The Matlab 2016b software was used for pretreatment, modeling, and green plum spectral data analysis. Figure 3 shows the original spectral reflectance curves of all the green plum samples.



SAE
The autoencoder, which is usually used for pretraining original high-dimensional data to reduce the data dimension and remove some useless information, is an unsupervised deep learning model [23,24]. Pretraining reduces the burden of subsequent model training and improves training accuracy. The autoencoder consists of an encoder and a decoder.
The autoencoder was used to reconstruct the input data at the output layer, making the output signals [X1′, X2′, X3′, …, Xn′] as close as possible to the input signals [X1, X2, X3, …, Xn]. The encoding and the decoding processes of the autoencoder are defined as:

h = fe(W1X + b1)

X′ = fd(W2h + b2)

where h is the encoded hidden layer feature parameter; W1 and b1 are the weight and the offset, respectively, of the encoder; W2 and b2 are the weight and the offset, respectively, of the decoder; and fe and fd are the activation functions of the encoder and the decoder, respectively. The commonly used activation functions are sigmoid, tanh, and ReLU.
The loss function of the autoencoder is defined so as to minimize the difference between the input and the output of the autoencoder:

L = (1/n) Σ (i = 1 to n) (xi − xi′)²

where n is the number of input samples, and xi and xi′ represent the input and the output, respectively, of sample i. The output of the autoencoder was almost equal to the input, and the intermediate hidden layer obtained by encoding the input can restore the original features after decoding. Therefore, the hidden layer implements an abstraction of the raw data, that is, another way of representing them. When the number of neurons in the hidden layer was greater than or equal to the dimension of the input layer, the data were embedded in the neural network mechanically, and no features were extracted. When the number of neurons in the hidden layer was less than the dimension of the input layer, the autoencoder can extract features. Therefore, on the basis of the traditional autoencoder, a sparsity restriction was added to the hidden layer neurons to form the SAE [25,26]. Figure 4 shows the structure of the SAE. Most neurons of the hidden layer in the SAE were suppressed, shown as H1 and Hm in Figure 4.
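The encode-decode pass and its reconstruction loss can be sketched in a few lines of NumPy (an illustrative sketch with random weights, not the authors' code; the 119-to-90 dimensions follow the paper's setup):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_forward(x, W1, b1, W2, b2):
    """Encode then decode: h = fe(W1 x + b1), x' = fd(W2 h + b2)."""
    h = sigmoid(W1 @ x + b1)        # hidden feature vector
    x_rec = sigmoid(W2 @ h + b2)    # reconstruction of the input
    return h, x_rec

rng = np.random.default_rng(0)
n_in, n_hidden = 119, 90            # 119 bands compressed to 90 features
W1 = rng.normal(0, 0.1, (n_hidden, n_in)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, (n_in, n_hidden)); b2 = np.zeros(n_in)

x = rng.random(n_in)                # one (made-up) spectral curve
h, x_rec = autoencoder_forward(x, W1, b1, W2, b2)
loss = np.mean((x - x_rec) ** 2)    # reconstruction loss for one sample
```

Training would minimize this loss over all samples by gradient descent; only the forward pass is shown here.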

The sigmoid is an example of an activation function. A neuron was considered activated when its output was close to 1 and suppressed when its output was close to 0. The sparsity parameter ρ was introduced to suppress neuron activity: ρ was set close to 0 so that the average activation degree of the hidden layer neurons, ρ̂j, stayed as close to ρ as possible. The sparsity restriction was added to the original autoencoder as an additional penalty factor:

P = Σ (j = 1 to s) KL(ρ‖ρ̂j) = Σ (j = 1 to s) [ρ ln(ρ/ρ̂j) + (1 − ρ) ln((1 − ρ)/(1 − ρ̂j))]

where j stands for each neuron in the hidden layer, s is the number of hidden layer neurons, ρ̂j is the average activation degree of hidden layer neuron j, and ρ is the sparsity parameter. After adding the sparsity restriction, the cost function of the SAE is defined as:

Jsparse = J + βP

where J is the reconstruction loss defined above and β is the weight of the sparsity penalty factor. The sparsity restriction suppressed most of the neurons in the hidden layer. Therefore, the encoding process of the SAE extracted low-dimensional feature vectors from the high-dimensional data. Through self-learning, the SAE can obtain effective features from raw data, reduce the data dimension and the interference factors of the original information, and avoid the overfitting caused by excessively high dimensionality and problems such as collinearity in the raw data.
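The KL-divergence sparsity penalty takes only a few lines to evaluate. The following NumPy sketch (illustrative, not the authors' code) computes it for a vector of average hidden activations and shows that activations near ρ incur almost no penalty while large activations are penalized heavily:

```python
import numpy as np

def kl_sparsity_penalty(rho_hat, rho=0.01):
    """Sum over hidden units j of KL(rho || rho_hat_j)."""
    rho_hat = np.clip(rho_hat, 1e-8, 1 - 1e-8)  # avoid log(0)
    return float(np.sum(rho * np.log(rho / rho_hat)
                        + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))))

# 55 hidden units, as in the paper's second SAE layer.
near = kl_sparsity_penalty(np.full(55, 0.011))  # activations near rho
far = kl_sparsity_penalty(np.full(55, 0.5))     # activations far from rho
```

The penalty is zero exactly when every ρ̂j equals ρ and grows quickly as the average activations drift away, which is what drives most hidden neurons toward suppression.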

SAE-PLSR
Aiming at predicting the SSC in green plums, a multilayer network model, SAE-PLSR, was proposed, as shown in Figure 5. The input layer of SAE1 served as the input layer of SAE-PLSR, and the hidden layers of SAE1 and SAE2 served as the first and second hidden layers of SAE-PLSR. PLSR formed the third hidden layer of SAE-PLSR. The output of the SAE2 hidden layer was passed to the PLSR model for training. As a statistical regression method, the PLSR model has been widely used in spectra-based prediction research [27,28]. The result of the third hidden layer was passed directly to the output layer as the output of SAE-PLSR.
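The layered forward pass described above can be sketched as follows. This is a minimal illustration with random weights: the final linear layer merely stands in for the fitted PLSR model, and the 119-to-90-to-55 dimensions follow the paper's design:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sae_plsr_predict(x, enc1, enc2, reg_w, reg_b):
    """Hypothetical forward pass: two stacked SAE encoders, then a
    linear layer standing in for the fitted PLSR model."""
    W1, b1 = enc1
    W2, b2 = enc2
    h1 = sigmoid(W1 @ x + b1)        # 119 -> 90 (first SAE encoder)
    h2 = sigmoid(W2 @ h1 + b2)       # 90 -> 55 (second SAE encoder)
    return float(reg_w @ h2 + reg_b) # 55 -> SSC estimate

rng = np.random.default_rng(42)
enc1 = (rng.normal(0, 0.1, (90, 119)), np.zeros(90))
enc2 = (rng.normal(0, 0.1, (55, 90)), np.zeros(55))
reg_w, reg_b = rng.normal(0, 0.1, 55), 8.0   # made-up regression layer
ssc = sae_plsr_predict(rng.random(119), enc1, enc2, reg_w, reg_b)
```

In the actual model the encoder weights come from unsupervised SAE pretraining and the regression layer from PLSR fitting, after which the whole stack is fine-tuned end to end.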
The SAE uses an unsupervised training method whose purpose is to reconstruct the input data. In order to improve accuracy, supervised fine-tuning was added [29,30]: the results of SAE-PLSR were compared with the actual values, the error was calculated and backpropagated, and the parameters in the network were adjusted.

Before network training, the green plum spectral data were pre-trained by the SAEs. The weights and offsets of the first and second hidden layers of SAE-PLSR were initialized with the weights and offsets obtained by the SAEs. During the network training process, the W3 and B3 calculated by the PLSR were updated by the PLSR itself. The weights and offsets are updated according to the back propagation principle, so they can be obtained as follows:

Wi = Wi − η ∂E/∂Wi = Wi − η δi+1 Xi

Bi = Bi − η ∂E/∂Bi = Bi − η δi+1

where E is the deviation between the output result and the actual result, Wi is the weight transferred from layer i to layer i + 1, Bi is the offset transferred from layer i to layer i + 1, Xi is the input of layer i, η is the gradient descent proportion coefficient (the learning rate), and δi+1 is calculated by:

δi+1 = (Wi+1 δi+2) f′(Wi Xi + Bi)

The δ4 of the output layer is calculated from the prediction error:

δ4 = (Y′ − Y) f′(W3 X3 + B3)

where Y′ is the model output and Y is the measured value. The result of the PLSR was directly transferred to the output layer, so its activation function is equivalent to Y = X and its derivative is:

f′(Wi Xi + Bi) = 1

The activation function of the SAEs is the sigmoid function, so its derivative is:

f′(Wi Xi + Bi) = f(Wi Xi + Bi)(1 − f(Wi Xi + Bi))

Since the SAE adds a penalty term to the cost function, δi+1 is updated as:

δi+1 = [(Wi+1 δi+2) + β(−ρ/ρ̂j + (1 − ρ)/(1 − ρ̂j))] f′(Wi Xi + Bi)
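A single back propagation update of the kind described above can be sketched as follows (an illustrative NumPy fragment with made-up numbers, not the authors' training code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_deriv(z):
    """f'(z) = f(z) * (1 - f(z)) for the sigmoid activation."""
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop_update(W, B, x, delta_next, eta=0.01):
    """One gradient step: W <- W - eta * delta * x^T, B <- B - eta * delta."""
    W_new = W - eta * np.outer(delta_next, x)
    B_new = B - eta * delta_next
    return W_new, B_new

# Tiny demo: a 3-input, 2-output layer with a made-up error signal delta.
W = np.ones((2, 3)); B = np.zeros(2)
x = np.array([1.0, 2.0, 3.0])
delta = np.array([0.5, -0.5])
W_upd, B_upd = backprop_update(W, B, x, delta, eta=0.1)
```

For the identity activation of the PLSR layer the derivative factor is simply 1, so the delta passes through unchanged; for the sigmoid layers it is damped by f(z)(1 − f(z)), which peaks at 0.25.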

Performance Analysis of SAE-PLSR Model
The input of SAE-PLSR was the 119-dimensional hyperspectral characteristic curve of a green plum together with the corresponding SSC value. The numbers of neurons in the two SAE hidden layers were set to 90 and 55, and the sparsity parameter ρ was set to 0.01. The SAEs were used to pre-train the raw data. The hidden layer outputs of the two SAEs are shown in Figure 6.
The correlation coefficient (R) and the root mean squared error (RMSE) were used to evaluate the performance of the models. RC (correlation coefficient on the calibration set) and RMSEC (root mean squared error on the calibration set) were used to evaluate the performance on the calibration set, whereas RP (correlation coefficient on the prediction set) and RMSEP (root mean squared error on the prediction set) were used to evaluate the performance on the prediction set. After training, the prediction performance of the SAE-PLSR model on green plums was quantified: the RC, RMSEC, RP, and RMSEP values were 0.957, 0.542, 0.938, and 0.654, respectively.
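The two evaluation metrics can be computed as follows; the NumPy sketch uses made-up numbers, and the function name is illustrative:

```python
import numpy as np

def correlation_and_rmse(y_true, y_pred):
    """Correlation coefficient R and root mean squared error RMSE."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    r = np.corrcoef(y_true, y_pred)[0, 1]
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return r, rmse

# Four made-up (measured, predicted) SSC pairs.
r, rmse = correlation_and_rmse([8.0, 9.5, 11.0, 12.5],
                               [8.2, 9.4, 11.3, 12.2])
```

Applied to the calibration and prediction sets, these give RC/RMSEC and RP/RMSEP respectively.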
In order to evaluate the performance of the present model, the SAE-PLSR model was compared with three traditional methods, namely, the BP neural network [31], support vector regression (SVR) [32], and PLSR [33]. Among these methods, Li et al. [34] used a PLSR model to predict the SSC of plums from short-wave near-infrared hyperspectral images and obtained good prediction results. Figure 7 shows the prediction results of the different models, and Table 2 shows the comparison between the SAE-PLSR model and the other traditional models.
The results of the comparison showed that the PLSR model had good prediction performance on the calibration set. However, its performance on the prediction set was not good enough: compared with the calibration set, the prediction set had an 82.6% increase in RMSEP and a 7.5% decrease in RP, indicating that the robustness of the PLSR model was poor or that overfitting occurred. The SVR and the BP models were inferior to the PLSR model in prediction performance: the RP for the prediction sets of the SVR and the BP models decreased by 1.5% and 2.5%, respectively, and the RMSEP increased by 1.4% and 9.2%, respectively. The prediction performance of the SAE-PLSR model improved greatly compared with that of the PLSR model, although its calibration set showed a 1.1% decrease in RC and a 22.6% increase in RMSEC. The main reason is that the input of the PLSR model was the raw spectral pattern of green plum, which contained 119-dimensional data, whereas the PLSR module of the SAE-PLSR model received only 55-dimensional data. Therefore, the calibration set of the SAE-PLSR model was not as effective as that of the PLSR model. However, the prediction set of the SAE-PLSR model performed better than that of the PLSR model: the RP increased by 4.8%, and the RMSEP decreased by 19.0%.
Meanwhile, the PLSR model within SAE-PLSR was replaced with BP and SVR for prediction and analysis. The experiment found that the prediction performance of these two models was not as good as that of SAE-PLSR: the RP of the SAE-BP and the SAE-SVR models decreased by 0.9% and 1.2%, respectively, and the RMSEP increased by 2.1% and 3.8%, respectively. These results indicated that the SAE-PLSR model performed well in green plum SSC prediction.

Performance Analysis of Feature Extraction Methods
The signal-to-noise ratio (SNR) of some bands in the hyperspectral data was relatively low because of the influence of noise. At the same time, linear correlations or redundant information may exist between the data in different bands. Modeling with the whole band range may therefore lower the prediction accuracy of the model, whereas extracting feature wavelengths can eliminate some of the band information and improve the prediction performance of the model [35]. The training process of the SAE was itself a process of extracting features from the raw data, and the output of the hidden layer also performed dimension reduction on the basis of feature extraction. PLSR was used as the prediction model in this study. Three feature wavelength extraction methods, namely, the successive projections algorithm (SPA), competitive adaptive reweighted sampling (CARS), and the genetic algorithm (GA), were selected and integrated with PLSR. The three integrated models were compared with the SAE-PLSR model. Figure 8 shows the prediction results of the different models, and Table 3 shows the results of their comparison.
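As a rough illustration of feature-wavelength selection, the following sketch works in the spirit of SPA: it greedily adds the band whose column has the largest norm after projecting out the span of the bands already chosen. This is a heavily simplified assumption-laden sketch, not the exact SPA, CARS, or GA implementations used in the paper:

```python
import numpy as np

def spa_select(X, n_select, first=0):
    """Greedy band selection: repeatedly add the band with the largest
    residual norm after projecting out the already-chosen bands."""
    X = np.asarray(X, float)
    chosen = [first]
    for _ in range(n_select - 1):
        P = X[:, chosen]                                 # chosen bands
        coef, *_ = np.linalg.lstsq(P, X, rcond=None)     # project all bands
        residual = X - P @ coef                          # orthogonal part
        norms = np.linalg.norm(residual, axis=0)
        norms[chosen] = -1.0                             # exclude chosen
        chosen.append(int(np.argmax(norms)))
    return sorted(chosen)

# Made-up data: 40 samples x 119 bands, select 10 feature wavelengths.
rng = np.random.default_rng(0)
bands = spa_select(rng.random((40, 119)), n_select=10)
```

The selected band indices would then index the spectra before fitting PLSR, replacing the full 119-band input.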
Table 3 shows that the CARS method performed poorly on the green plum SSC prediction; its prediction accuracy was lower than that of the plain PLSR model. Compared with the PLSR model, the prediction set of the CARS method had a 5.7% decrease in RP and a 20.1% increase in RMSEP. The SPA and the GA methods performed well in reducing the data dimension and improving prediction performance.
The RP for the prediction sets of the SPA and the GA methods increased by 2.6% and 1.2%, respectively, and the RMSEP decreased by 11.3% and 5.6%, respectively. Compared with these two methods, the SAE-PLSR model still had better prediction performance: in terms of feature wavelength extraction, the prediction set of the SAE-PLSR model had a 2.2% increase in RP and an 8.7% decrease in RMSEP compared with that of SPA-PLSR. The green plum SSC prediction results thus verified the effectiveness of the SAE-PLSR model.

Influence of Sparsity Parameter ρ on Prediction Results
The sparsity parameter ρ plays a key role in model training: different sparsity parameters determine the degree of neuron activation. To explore the influence of the sparsity parameter on the prediction results of the model, three sparsity parameters, 0.01, 0.1, and 1, were selected for modeling. Figure 9 shows the prediction results of the different models, and the results are shown in Table 4. When the sparsity parameter is 1, the neurons are activated without any restriction, and the model has RP and RMSEP values of 0.910 and 0.750, respectively. When the sparsity parameter is 0.1, the model prediction performance is improved compared with that of 1, with a 2.2% increase in RP and a 12.1% decrease in RMSEP. When the sparsity parameter is 0.01, the prediction performance is improved further compared with that of 1, with a 3.1% increase in RP and a 12.8% decrease in RMSEP. This shows that the sparsity parameter affects the prediction performance of the SAE-PLSR model. The reason may be that, once the sparsity restriction is added, most of the useless features and the features that are harmful to the model prediction are inhibited, while the influence of the useful features is amplified [36], so the prediction accuracy of the model is improved. In this paper, a sparsity parameter of 0.01 is used to effectively improve the model prediction performance.
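In a standard sparse autoencoder, the sparsity restriction is commonly implemented as the KL divergence between the target sparsity ρ and each neuron's mean activation ρ̂; the paper does not spell out its exact penalty form, so the sketch below is the conventional formulation. It illustrates why a smaller ρ imposes a stronger constraint for the same activation level.

```python
import numpy as np

def kl_sparsity_penalty(rho, rho_hat):
    """KL divergence between a target sparsity rho and a mean activation rho_hat."""
    rho_hat = np.clip(rho_hat, 1e-8, 1.0 - 1e-8)
    return (rho * np.log(rho / rho_hat)
            + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))

# For a neuron whose mean activation is 0.2, a smaller target rho
# yields a larger penalty, pushing the activation further down.
for rho in (0.01, 0.1):
    print(rho, kl_sparsity_penalty(rho, 0.2))
```

The penalty is zero only when ρ̂ equals ρ, which is consistent with ρ = 1 placing essentially no restriction on neurons that are fully activated.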

Conclusions
SSC is one of the important indicators in the processing of green plums, and hyperspectral technology is used to achieve its nondestructive testing. On the basis of the newly developed hyperspectral imaging system, the spectroscopic data of green plums are collected. Experimental analysis has proven that visible-infrared spectroscopy can effectively predict the SSC in plums. The spectral information of 400–1000 nm is collected, and the SSC prediction is performed using the PLSR model, which has RP and RMSEP values of 0.895 and 0.807, respectively. The SAE-PLSR model is proposed on the basis of traditional unsupervised learning models to improve the prediction accuracy of green plum SSC: the autoencoder is improved, the multilayer SAEs are connected, and the PLSR module is used for regression prediction to back-propagate the prediction error, adjust the model parameters, and improve the model prediction performance. The experimental results prove that, compared with the traditional PLSR model, the SAE-PLSR prediction set has a 4.8% increase in RP and a 19.0% decrease in RMSEP. The model also has advantages over the traditional feature extraction and regression prediction methods: by comparison, the prediction set has a 2.2% increase in RP and an 8.7% decrease in RMSEP. The SAE-PLSR model thus shows good prediction performance on green plum SSC. However, the training time of the model has increased because of the multiple back-propagation processes; the model can still be improved in later work to reduce its training time.
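The two evaluation metrics used throughout, RP (correlation coefficient of the prediction set) and RMSEP (root mean square error of prediction), are straightforward to compute, and the reported relative improvements can be cross-checked against the absolute values. A minimal sketch (the function name is illustrative, not from the paper):

```python
import numpy as np

def rp_and_rmsep(y_true, y_pred):
    """Correlation coefficient (RP) and root mean square error of prediction (RMSEP)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rp = np.corrcoef(y_true, y_pred)[0, 1]
    rmsep = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    return rp, rmsep

# Cross-check the reported relative improvements against the absolute values:
# PLSR (RP = 0.895, RMSEP = 0.807) vs. SAE-PLSR (RP = 0.938, RMSEP = 0.654).
print(round(0.895 * (1 + 0.048), 3))  # -> 0.938
print(round(0.807 * (1 - 0.190), 3))  # -> 0.654
```

The arithmetic confirms that the stated 4.8% RP gain and 19.0% RMSEP reduction over PLSR are consistent with the absolute figures of 0.938 and 0.654 for the SAE-PLSR prediction set.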