Prediction Models for the Content of Calcium, Boron and Potassium in the Fruit of ‘Huangguan’ Pears Established by Using Near-Infrared Spectroscopy

An imbalance in the proportions of mineral elements in the pulp and peel of ‘Huangguan’ pears, especially calcium, boron and potassium, has been reported to be an important factor that can seriously affect the pears’ appearance quality and economic benefits. The objective of this study was to predict the content of calcium, boron and potassium in the pulp and peel of ‘Huangguan’ pears nondestructively and conveniently by using near-infrared spectroscopy (900–1700 nm). Firstly, 12 algorithms were used to preprocess the original spectral data. Then, based on the original and preprocessed spectral data, full-band prediction models were established by using Partial Least Squares Regression and Gradient Boosting Regression Tree. Finally, the characteristic wavelengths were extracted by Genetic Algorithms to establish the characteristic wavelength prediction models. According to the prediction results, the determination coefficients (R²) of the prediction sets of the best prediction models for the three elements all reached ideal levels, and the corresponding values of the relative percent deviation (RPD) were also high. Therefore, a micro near-infrared spectrometer combined with machine learning can predict the content of calcium, boron and potassium in the pulp and peel of ‘Huangguan’ pears accurately and quickly. The results also provide a scientific basis for further research on the degradation of the quality of ‘Huangguan’ pears caused by a lack of nutrients.


Introduction
'Huangguan' pears (Pyrus bretschneideri Rehd. cv. Huangguan), which are known for their excellent quality in the Chinese and even the world pear market, have been introduced and planted in many places with good performance and have good development prospects [1,2]. In actual production, 'Huangguan' pears are often bagged to prevent pests and diseases. However, brown spots often appear on the surface of bagged 'Huangguan' pears and seriously affect the appearance quality and economic benefits of the fruit [3,4]. A large number of studies have indicated that the contents of calcium, boron and potassium may be important factors affecting the appearance quality of bagged 'Huangguan' pears [5][6][7][8][9]. The traditional detection of mineral nutrient content is mostly based on laboratory physical and chemical analysis, including inductively coupled plasma mass spectrometry (ICP-MS), atomic absorption spectrometry and UV-visible spectrophotometry [10][11][12]. Although the results are accurate, destructive sampling, long analysis times, high labor intensity and high costs limit the study of pear fruit mineral nutrition. If the mineral nutrient content in the fruit of pears can be conveniently and nondestructively detected by modern technology during the critical period of the growth and development of 'Huangguan' pears,

Near-Infrared Spectrum Data Acquisition
The near-infrared spectrum acquisition instrument used was a reflective handheld near-infrared spectrometer, model 'NIR-S-G1', produced by the Shenzhen Ugreen Group, Ltd. (Shenzhen, China) (Figure 1). The detected spectral wavelength range was from 900 nm to 1700 nm, with 228 spectral acquisition bands. The spectral resolution was 3.89 nm, and the signal-to-noise ratio (SNR) was 5000:1.
Spectral measurements are inevitably affected by environmental factors, and a single acquisition can carry a large error. Therefore, to make the measurement results as accurate as possible, we took five points on the surface of each pear sample and collected spectral data once at each point, giving five scans per pear. The specific operations were as follows: As shown in Figure 2, before collecting spectral data, an ellipse with a short axis of about 5 cm and a long axis of about 8 cm was marked near the equator of each pear, and the area within the ellipse was used as the scope of subsequent spectral data collection and sampling. Then, we held the scanning window at the front end of the spectrometer against the designated area on the pear surface and scanned once at each of the five acquisition points. Finally, the average of the reflectance spectra of the 5 scans of each pear sample was taken as the original modelling spectral data.
Before each measurement, the spectrometer was calibrated with black and white references. The material used for the white reference board was polytetrafluoroethylene. The instrument was held closely against the white board, and the light emitted by the instrument and reflected by the board was captured and recorded as the brightness value (W) of the white board. For the black correction, we turned off the emission light source of the instrument and recorded the brightness value captured with no light source as the black brightness value (B). After calibration, the light source window of the instrument was held close to the pear sample, and the reflected light obtained was recorded as the brightness value (R) of the pear surface. The spectral reflectance of the sample can be calculated by Formula (1). The app 'Instagram' on a mobile phone was connected with the instrument by Bluetooth before use, and then the spectral data were collected.
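Formula (1) itself is not reproduced in this text. The following is a minimal sketch of the calibration and averaging steps, assuming the formula is the usual black/white correction, reflectance = (R - B) / (W - B), applied band by band. The function names and the brightness values are hypothetical and chosen only for illustration.

```python
import numpy as np

def reflectance(raw, white, dark):
    """Black/white corrected reflectance per band: (R - B) / (W - B).

    ASSUMPTION: this standard correction stands in for Formula (1),
    which is not shown in the text.
    """
    raw, white, dark = (np.asarray(a, dtype=float) for a in (raw, white, dark))
    return (raw - dark) / (white - dark)

def mean_spectrum(scans, white, dark):
    """Average the corrected reflectance of all scans taken on one pear."""
    return reflectance(scans, white, dark).mean(axis=0)

# Hypothetical brightness values: 5 scans x 4 bands
# (the instrument described above records 228 bands).
white = np.array([1000.0, 1000.0, 1000.0, 1000.0])  # W, white board
dark = np.array([100.0, 100.0, 100.0, 100.0])       # B, light source off
scans = np.array([[550.0, 460.0, 640.0, 280.0]] * 5)  # R, pear surface
scans[0] += 90.0  # one scan reads slightly high
scans[1] -= 90.0  # another reads slightly low

pear_spectrum = mean_spectrum(scans, white, dark)
```

Averaging the five corrected scans cancels the opposite deviations of the first two scans, which is the motivation given above for not relying on a single acquisition.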

Detection of Mineral Elements in the Fruit of Pears
After the spectral data collection was completed, the pulp and peel of each pear were sampled within the designated range, packed in self-sealing bags and marked with numbers. The contents of the mineral elements calcium, boron and potassium in the samples were determined at the Anhui Institute of Geological Experiments. The detection basis was GB 5009.268-2016, and the main instrument used was an inductively coupled plasma emission spectrometer/B1497.

Near-Infrared Spectrum Data Information Preprocessing and Characteristic Wavelength Extraction
Since there is no literature indicating which preprocessing method is the best, trial and error is usually used to determine the appropriate spectral data preprocessing method [28]. Therefore, after obtaining the original spectral curves, we preprocessed the original spectral data with six single algorithms, namely first derivative (FD), second derivative (SD), Multiplicative Scatter Correction (MSC), Standard Normal Variate (SNV), Savitzky-Golay Convolution Smoothing (SG) and Logarithmic Transformation (LG); with two combinations of two single methods, SG+MSC and SG+SNV; and with four combinations of three single methods, SG+MSC+SD, SG+MSC+FD, SG+SNV+FD and SG+SNV+SD. We then compared the impact of the different preprocessing methods on the later modelling accuracy.
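The preprocessing chains can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the SG window length (11) and polynomial order (2) are hypothetical, the derivatives are taken by finite differences, and LG is omitted because its exact form is not specified in the text. The synthetic spectra are invented for the demonstration.

```python
import numpy as np
from scipy.signal import savgol_filter

def snv(X):
    """Standard Normal Variate: centre and scale each spectrum (row)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def msc(X, reference=None):
    """Multiplicative Scatter Correction against the mean spectrum."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    out = np.empty_like(X)
    for i, row in enumerate(X):
        slope, intercept = np.polyfit(ref, row, 1)  # row ~ intercept + slope * ref
        out[i] = (row - intercept) / slope
    return out

def sg(X, window=11, order=2):
    """Savitzky-Golay smoothing (window and order are assumed values)."""
    return savgol_filter(np.asarray(X, dtype=float), window, order, axis=1)

def derivative(X, n=1):
    """n-th derivative along the wavelength axis by finite differences."""
    X = np.asarray(X, dtype=float)
    for _ in range(n):
        X = np.gradient(X, axis=1)
    return X

# The combined methods named in the text, built by chaining single steps.
PIPELINES = {
    "SG+MSC": lambda X: msc(sg(X)),
    "SG+SNV": lambda X: snv(sg(X)),
    "SG+MSC+FD": lambda X: derivative(msc(sg(X)), 1),
    "SG+MSC+SD": lambda X: derivative(msc(sg(X)), 2),
    "SG+SNV+FD": lambda X: derivative(snv(sg(X)), 1),
    "SG+SNV+SD": lambda X: derivative(snv(sg(X)), 2),
}

# Synthetic spectra: one base curve with a different offset and gain per sample.
wavelengths = np.linspace(900.0, 1700.0, 228)
base = 0.5 + 0.1 * np.sin(wavelengths / 100.0)
X = np.array([0.00 + 1.0 * base, 0.05 + 1.2 * base, -0.02 + 0.9 * base])

processed = {name: chain(X) for name, chain in PIPELINES.items()}
```

Because the synthetic samples differ only by an offset and a gain, MSC and SNV each map them onto a single common curve, which is the scatter-removal behaviour these two methods are chosen for.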
After preprocessing, we used Genetic Algorithms (GA) to extract the characteristic wavelengths of the full-band models that met the prediction standard, and then established the characteristic wavelength models to further optimize the models [29,30].
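The GA configuration is not described in this text, so the following is only a minimal sketch of GA-based wavelength selection. It assumes a binary mask per individual, tournament selection, one-point crossover, bit-flip mutation and elitism, and it scores a mask by the validation RMSE of an ordinary least-squares fit (a simplification; a PLSR-based fitness would be closer to the models used here). All names, hyperparameters and data are hypothetical.

```python
import numpy as np

def fitness(mask, X_cal, y_cal, X_val, y_val):
    """Negative validation RMSE of a least-squares fit on the selected bands."""
    if not mask.any():
        return -np.inf  # an empty selection cannot predict anything
    A_cal = np.column_stack([np.ones(len(X_cal)), X_cal[:, mask]])
    A_val = np.column_stack([np.ones(len(X_val)), X_val[:, mask]])
    coef, *_ = np.linalg.lstsq(A_cal, y_cal, rcond=None)
    return -float(np.sqrt(np.mean((A_val @ coef - y_val) ** 2)))

def ga_select(X_cal, y_cal, X_val, y_val, pop_size=30, generations=40,
              p_mut=0.02, seed=0):
    """Return the best band mask found and the best fitness per generation."""
    rng = np.random.default_rng(seed)
    n_bands = X_cal.shape[1]
    pop = rng.random((pop_size, n_bands)) < 0.5  # random initial masks
    history = []
    for gen in range(generations):
        scores = np.array([fitness(m, X_cal, y_cal, X_val, y_val) for m in pop])
        pop = pop[np.argsort(scores)[::-1]]  # fittest individual first
        history.append(scores.max())
        if gen == generations - 1:
            break
        children = [pop[0].copy()]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            # Tournament of two: the lower index is fitter because pop is sorted.
            pa = pop[rng.integers(0, pop_size, 2).min()]
            pb = pop[rng.integers(0, pop_size, 2).min()]
            cut = rng.integers(1, n_bands)                # one-point crossover
            child = np.concatenate([pa[:cut], pb[cut:]])
            child ^= rng.random(n_bands) < p_mut          # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    return pop[0], history

# Synthetic demonstration: 20 bands, of which only bands 3, 7 and 12 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 20))
y = 2.0 * X[:, 3] - 1.0 * X[:, 7] + 0.5 * X[:, 12] + rng.normal(scale=0.1, size=60)
mask, history = ga_select(X[:45], y[:45], X[45:], y[45:])
selected_bands = np.flatnonzero(mask)
```

Elitism guarantees that the best fitness never decreases from one generation to the next. Since the mask is tuned on the validation split, a separate prediction set is still needed to judge the final model.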

Establishment of Prediction Models for the Content of Mineral Elements in the Fruit of 'Huangguan' Pears
Partial Least Squares Regression (PLSR) is a classical statistical method; it combines the advantages of principal component analysis, multiple linear regression and canonical correlation analysis, and can handle the redundancy and collinearity of spectral data [31,32]. In this respect PLSR is superior to the general least squares method, so we selected PLSR as one of the modelling methods. Gradient Boosting Regression Tree (GBRT) uses the boosting ensemble strategy to integrate and improve on traditional algorithms. It combines the advantages of multiple algorithms and has great advantages in nonlinear fitting. However, its parameters must be adjusted continuously during model training to achieve a reasonable training effect [33]. We selected GBRT as the other modelling method. The model evaluation indicators in this paper are the determination coefficient (R²) and the Relative Percent Deviation (RPD). R² normally ranges from 0 to 1; the closer R² is to 1, the smaller the error, the more stable the model and the better the prediction effect. RPD is an important indicator for evaluating the quality of a model. This paper evaluates the prediction abilities of the models according to the RPD grades proposed by Chang et al. [34]: when RPD ≥ 2, the prediction effect of the model is very good, and the model belongs to level A and can be applied to the corresponding quantitative prediction; when 1.4 ≤ RPD < 2, the model has moderate predictive ability and belongs to level B; when RPD < 1.4, the prediction effect of the model is relatively poor, and the model belongs to level C and cannot be used for quantitative prediction.
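The two indicators and the grading rule can be written compactly. The text does not give a formula for RPD, so this sketch assumes the commonly used definition: the standard deviation of the measured values divided by the root mean square error of the predictions. The function names and the example values are hypothetical.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Determination coefficient: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rpd(y_true, y_pred):
    """RPD, ASSUMED here to be SD(measured) / RMSE(predicted)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return y_true.std(ddof=1) / rmse

def rpd_grade(value):
    """Grades after Chang et al. as quoted above: A (>= 2), B (1.4 to 2), C (< 1.4)."""
    if value >= 2.0:
        return "A"
    if value >= 1.4:
        return "B"
    return "C"

# Hypothetical measured and predicted contents.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
example_r2 = r2_score(measured, predicted)
example_rpd = rpd(measured, predicted)
example_grade = rpd_grade(example_rpd)
```

Under this definition R² computed on a prediction set can fall below 0 when a model predicts worse than the mean of the measured values, which is why a negative R² marks a model as invalid.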

Analysis of Original Spectral Curve
It can be seen from Figure 3 that the overall shape of the spectral curves of all 'Huangguan' pear fruit samples was similar, with clear peaks at wavelengths of 1050 nm, 1260 nm and 1680 nm, and clear troughs at 980 nm, 1200 nm, 1450 nm and 1660 nm. According to the literature, 980 nm and 1450 nm are the main water absorption peaks, corresponding to the first overtone of the O-H bond stretching vibration of related carbohydrates, and about 1660 nm corresponds to the first overtone of the C-H bond stretching vibration in methyl [35]. The main chemical components in pears include flavones, triterpenes, phenolic acids, luminol esters, polysaccharides, etc. Polyphenols, including chlorogenic acid, arbutin and coumaric acid, are the most abundant of these components and are also very valuable secondary metabolites in pears [36]. At the same time, phenolic acids, flavones and triterpenes are usually more abundant in pear peels than in pear pulp [37]. Because of the complex compounds in pear fruits, the spectral curve, which reflects the overtone and combination-band information of different chemical bond vibrations, was complicated.

Analysis of Spectral Data Preprocessing Results
The spectral curves obtained by the twelve pretreatment methods are shown in Figure 4. The different pretreatment methods had different effects. Firstly, it can be seen from Figure 4a,b that the baseline offset was reduced to a certain extent by the derivative methods, but the noise increased and the signal-to-noise ratio decreased accordingly. The second derivative algorithm amplified the noise even more than the first derivative algorithm. Secondly, the spectral curves after being processed by MSC and SNV are shown in Figure 4c,d. MSC and SNV did not obviously change the shape of the original spectral curve. MSC mainly eliminated the scattering effect caused by uneven particle distribution and different particle sizes on the surface of the samples, and SNV was mainly used to eliminate the influence of scattering and optical path changes on the spectrum. Thirdly, it can be seen from Figure 4e that the overall shape of the spectral curve preprocessed by SG was similar to the original spectral curve. SG removed some irrelevant noise from the original spectral data, making the spectral curve smoother. Fourthly, LG reduced the absolute value of the spectral data to a certain extent, which was conducive to later calculation. In general, the curves processed by MSC, SNV, SG, SG+MSC and SG+SNV were similar to the original spectral curves. In addition, the curves pretreated by FD, SG+MSC+FD and SG+SNV+FD were similar, and the curves pretreated by SD, SG+MSC+SD and SG+SNV+SD were also similar.

Characteristic Analysis of Sample Detection Data
From the data shown in Table 1, the contents of calcium, boron and potassium in the peel were higher than those in the pulp, and the difference was obvious. In both the pulp and the peel, the content of potassium was much higher than that of calcium and boron. It can be seen from Figure 5 that, except for a few samples far from the reference line, the samples were closely distributed around the reference line, where the x-axis value is the actual detection value and the corresponding y-axis value is the expected normal value.
Table 2 shows the comparison of the parameters R² and RPD of each prediction model for the calcium, boron and potassium content in pear pulp based on the full-wave band.

Modelling and Prediction Analysis of the Content of Calcium in Pear Pulp
(1) Full-band based modelling Based on the original spectral data and the spectral data after 12 kinds of pretreatment, 13 PLSR models and 13 GBRT models were established to predict the content of calcium in pear pulp. Among the 13 PLSR models, Raw-PLSR (prediction set R² = 0.716, RPD = 1.431) and SG-PLSR (prediction set R² = 0.717, RPD = 1.433) met level B. However, the full-wave band GBRT models did not predict the content of calcium in the pear pulp well and could not be used for this purpose.
(2) Characteristic wavelength extraction Based on GA, 125 and 119 characteristic wavelengths were extracted from the spectral curves of RAW-PLSR and SG-PLSR, respectively. As shown in Figure 6, the characteristic wavelengths extracted for Raw-GA-PLSR were concentrated at 1050-1300 nm and 1400-1680 nm, while those extracted for SG-GA-PLSR were concentrated at 950-1300 nm and 1400-1680 nm. Although the specific values of the characteristic wavelengths extracted for Raw-GA-PLSR and SG-GA-PLSR were not the same, their distribution ranges on the spectral curves were similar, which indicates that the characteristic wavelength points corresponding to the compounds related to the content of calcium in pear pulp were roughly concentrated in the ranges above.
(3) Modelling of characteristic wavelengths According to the modelling results presented in Table 3 and Figure 7, after GA treatment, the two models for predicting the content of calcium in pear pulp, Raw-PLSR and SG-PLSR, showed a slight 'underfitting' state (the prediction set performed slightly better than the modelling set), which may have been due to the sample size, but the prediction performance was improved to level A. This further shows that the feature extraction process can, to a certain extent, remove wavelengths that are not relevant to later modelling and keep the information-rich features, improving the predictive effect of the models.
(4) Modelling effect evaluation After comparative analysis, among the established models for predicting the content of calcium in the pulp of 'Huangguan' pears, Raw-GA-PLSR showed the best prediction effect and met level A, with a modelling set R² and RPD of 0.987 and 6.110, respectively.

Modelling and Prediction Analysis of the Content of Boron in Pear Pulp
(1) Full-band based modelling A total of 65 samples of pear pulp boron content were randomly divided into a modelling set of 52 and a prediction set of 13, and then subsequent modelling analyses were carried out. Among the 13 PLSR models established, only RAW-PLSR and SG-PLSR met the prediction standards; the R² of the modelling and prediction sets of SG-PLSR increased slightly and reached level B, whereas the R² of the modelling and prediction sets of the remaining 11 models decreased to varying degrees, and SD-PLSR, SG+MSC+SD-PLSR and SG+SNV+SD-PLSR failed. Among the 13 GBRT models established, except LG-GBRT, the R² and RPD of the modelling sets of the other 11 models based on pretreatment spectra were improved compared with RAW-PLSR. Although the R² and RPD of the prediction sets of SNV-GBRT, MSC-GBRT, SD-GBRT, SG+MSC-GBRT, SG+SNV-GBRT and SG+MSC+FD-GBRT were improved compared with RAW-GBRT, they still did not meet the prediction standard.
(2) Characteristic wavelength extraction Among the 26 models for predicting the content of boron in pear pulp based on the full-wave band, characteristic wavelengths were extracted for RAW-PLSR and SG-PLSR, which initially met the prediction standards. The genetic algorithm extracted 116 and 118 characteristic wavelengths from the spectral curves of RAW-PLSR and SG-PLSR, respectively. As shown in Figure 8, for Raw-GA-PLSR the extracted characteristic wavelengths were concentrated at 900-1100 nm, 1200-1400 nm and 1500-1700 nm; meanwhile, for SG-GA-PLSR the extracted characteristic wavelength points were mainly distributed at 950-1100 nm and 1200-1680 nm. The dense distribution areas of the characteristic wavelengths of the two models were similar, indicating that the compounds related to the content of boron in pear pulp mainly responded in the range of 950-1100 nm, 1200-1400 nm and 1500-1680 nm.
(3) Modelling of characteristic wavelengths After GA treatment, the predictive effects of RAW-PLSR and SG-PLSR for predicting the content of boron in pear pulp were significantly improved, as shown in Table 4. The scatter plots of the prediction results shown in Figure 9 verify the good prediction effects of RAW-GA-PLSR and SG-GA-PLSR. The R² of the prediction sets of both models exceeded 0.9, reaching a relatively accurate prediction level, while the R² of the modelling sets was slightly lower than that of the respective prediction sets. Although the models showed a slight 'underfitting' state, it did not affect the overall predictive effects.

Modelling and Prediction Analysis of the Content of Potassium in Pear Pulp
LG-GBRT, SG+MSC+SD-GBRT and SG+SNV+SD-GBRT did not meet the prediction standards. On the whole, the established GBRT models were not suitable for the prediction of the content of potassium in 'Huangguan' pear pulp, and the reasons may be as follows: On one hand, GBRT is an additive model. In each round, the residual between the actual value and the value predicted in the previous round is used as the label for the next fit, and the results of all rounds are finally added together, which may lead to cases where R² is negative or the prediction is invalid. On the other hand, the number of samples, data detection errors and other factors may also have contributed.
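The additive, residual-fitting behaviour of GBRT can be illustrated with a deliberately small sketch. It assumes squared-error loss, a single input variable and one-split regression trees (stumps), whereas the models in this study were fitted to full spectra; the names, data and settings are hypothetical.

```python
import numpy as np

def fit_stump(x, residual):
    """Least-squares one-split regression tree fitted to the residuals."""
    best = None
    for threshold in np.unique(x)[:-1]:  # keep both sides of the split non-empty
        left = x <= threshold
        left_mean, right_mean = residual[left].mean(), residual[~left].mean()
        sse = np.sum((residual - np.where(left, left_mean, right_mean)) ** 2)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def boost(x, y, rounds=50, learning_rate=0.1):
    """Each round fits a stump to the residuals left by the previous rounds."""
    prediction = np.full(len(y), y.mean())
    stumps, train_mse = [], [np.mean((y - prediction) ** 2)]
    for _ in range(rounds):
        residual = y - prediction  # label for this round
        threshold, left_mean, right_mean = fit_stump(x, residual)
        prediction = prediction + learning_rate * np.where(
            x <= threshold, left_mean, right_mean
        )
        stumps.append((threshold, left_mean, right_mean))
        train_mse.append(np.mean((y - prediction) ** 2))
    return y.mean(), stumps, train_mse

def predict(x, base, stumps, learning_rate=0.1):
    """Sum the base value and the scaled outputs of all stumps."""
    out = np.full(len(x), base)
    for threshold, left_mean, right_mean in stumps:
        out = out + learning_rate * np.where(x <= threshold, left_mean, right_mean)
    return out

# Synthetic modelling set of 52 samples with a nonlinear response.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 52)
y = np.sin(6.0 * x) + rng.normal(scale=0.1, size=52)
base, stumps, train_mse = boost(x, y)
```

The error on the modelling set can only go down as rounds are added, but nothing in the procedure constrains the error on unseen samples, which is consistent with GBRT models that fit the modelling set well and still fail on the prediction set.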
(2) Characteristic wavelength extraction The spectral curves of Raw-PLSR and SG-PLSR, whose full-wave band models met the prediction standards for predicting the content of potassium in pear pulp, were optimized to extract the characteristic wavelengths. Then, GA extracted 118 and 114 characteristic band points from the original spectral curve and the spectral curve preprocessed by SG, respectively. As shown in Figure 10, on the original spectral curve, the extracted characteristic wavelengths were densely distributed in the range of 900-1100 nm, 1200-1300 nm and 1500-1680 nm. Meanwhile, on the spectral curve after SG pretreatment, the extracted characteristic wavelengths were most densely distributed in the range of 1020-1150 nm and 1430-1650 nm. It can be inferred that the response of the compounds related to the content of potassium in pear pulp on the spectral curve was mostly concentrated in the range of 1000-1100 nm and 1500-1650 nm.
(3) Modelling of characteristic wavelengths
According to the results shown in Table 5, after GA treatment, the prediction abilities of RAW-PLSR and SG-PLSR were both greatly improved, and the model levels also rose from level B to level A. The scatter plots of the prediction results are shown in Figure 11, and they show more intuitively that both Raw-GA-PLSR and SG-GA-PLSR can accurately predict the content of potassium in the pulp of 'Huangguan' pears.

(4) Modelling effect evaluation
After discussion and analysis, SG-GA-PLSR had the best performance in predicting the content of potassium in pear pulp, with R² = 0.834 and RPD = 1.810 for the modelling set and R² = 0.985 and RPD = 5.833 for the prediction set.

Table 6 shows the comparison of the parameters R² and RPD of each prediction model for the calcium, boron and potassium content in pear peel based on the full-wave band.

Modelling and Prediction Analysis of the Content of Calcium in Pear Peel
(1) Full-band based modelling Among the 13 PLSR models established, the prediction levels of RAW-PLSR, SNV-PLSR, MSC-PLSR, SG-PLSR, LG-PLSR and SG+SNV-PLSR were all B. Although the R² and RPD of the modelling sets of SNV-PLSR, MSC-PLSR and SG+SNV-PLSR decreased slightly compared with the original models, the R² and RPD of the prediction sets of these three models were improved, which means that they may be better able to predict the content of calcium in the peel of 'Huangguan' pears. Furthermore, among the 13 established GBRT models, although MSC-GBRT and SG+MSC-GBRT performed outstandingly on the modelling sets, the R² and RPD of their prediction sets did not meet the prediction standards.
(2) Characteristic wavelength extraction As shown in Figure 12, GA extracted 113, 109, 118, 136, 105 and 104 characteristic wavelengths from the spectral curves processed by SNV, MSC, SG, LG, SG+MSC and SG+SNV, respectively. The characteristic wavelengths extracted from the spectral curve processed by SNV were roughly concentrated at 1000-1100 nm, 1200-1350 nm and 1550-1680 nm; those from the curve processed by MSC at 900-1100 nm, 1200-1300 nm and 1500-1650 nm; those from the curve processed by SG at 900-1080 nm, 1150-1380 nm and 1500-1650 nm; those from the curve processed by LG at 900-1350 nm and 1500-1600 nm; those from the curve processed by SG+MSC at 1000-1100 nm and 1450-1650 nm; and those from the curve processed by SG+SNV at 1000-1300 nm and 1400-1680 nm. Overall, the characteristic wavelengths extracted from the six spectral curves were mostly concentrated in 1000-1100 nm, 1200-1300 nm and 1500-1600 nm, indicating that the information reflected by the organic matter related to the content of calcium in the peel was mostly concentrated in these three bands.
(3) Modelling of characteristic wavelengths After the GA treatment, the prediction abilities of the six models for predicting the content of calcium in pear peel were improved. As shown in Table 7 and Figure 13, both the modelling sets and the prediction sets of MSC-GA-PLSR and SG+MSC-GA-PLSR reached level A. Although the prediction sets of SNV-GA-PLSR, SG-GA-PLSR, LG-GA-PLSR and SG+SNV-GA-PLSR met level A, the level of the modelling sets remained at level B; moreover, the R² and RPD of the modelling sets were slightly lower than those of the models without feature processing. Firstly, this may have been because, in the process of feature wavelength extraction, the eliminated wavelengths may have contained useful information that was closely related to later modelling, and the lack of this information had a certain impact on the accuracy of the models. Secondly, it may have also been related to the data structure characteristics of the original modelling sets.
(4) Modelling effect evaluation Through comparative analysis, LG-GA-PLSR was the best model for predicting the content of calcium in pear peels, with R² = 0.93 and RPD = 2.671 in the modelling set and R² = 0.989 and RPD = 6.951 in the prediction set, which also met level A.

Modelling and Prediction Analysis of the Content of Boron in Pear Peel
(1) Full-band based modelling Among the 12 PLSR models based on the spectrum after pretreatment, SNV-PLSR, MSC-PLSR, SG+MSC-PLSR and SG+SNV-PLSR significantly improved the R² and RPD of the prediction sets compared with RAW-PLSR, and reached level B. Among the 13 GBRT models for predicting the content of boron in pear peels, although the modelling sets of FD-GBRT, SD-GBRT, LG-GBRT, SG+MSC+FD-GBRT, SG+MSC+SD-GBRT and SG+SNV performed well, with R² exceeding 0.8 and RPD exceeding 1.4, their prediction sets did not meet the basic prediction criteria.
(2) Characteristic wavelength extraction
Among the 26 established full-band models, SNV-PLSR, MSC-PLSR, SG+MSC-PLSR and SG+SNV-PLSR, which initially met the prediction standards, were selected for characteristic wavelength extraction. GA extracted 116, 109, 108 and 124 characteristic wavelengths from the spectral curves preprocessed by SNV, MSC, SG+MSC and SG+SNV, respectively. It can be seen from Figure 14 that on the spectral curve after SNV, the extracted characteristic wavelengths were most densely distributed in the ranges of 900-1070 nm, 1200-1350 nm and 1450-1650 nm; on the spectral curve after MSC, in the ranges of 910-1090 nm, 1200-1300 nm and 1400-1680 nm; on the spectral curve after SG+MSC, in the ranges of 940-1060 nm, 1130-1260 nm and 1490-1680 nm; and on the spectral curve after SG+SNV, in the ranges of 960-1090 nm, 1150-1300 nm and 1450-1680 nm. The dense distribution areas of the extracted characteristic wavelengths were similar, which indicates that the compounds strongly correlated with the content of boron in the pear peels mainly responded in the ranges of 900-1050 nm, 1200-1300 nm and 1500-1650 nm.
(3) Modelling of characteristic wavelengths
As shown in Table 8 and Figure 15, after the GA treatment, the performances of SNV-GA-PLSR, MSC-GA-PLSR, SG+MSC-GA-PLSR and SG+SNV-GA-PLSR for predicting the content of boron in the pear peel were improved to varying degrees. The scatter plots of the prediction results in Figure 15 further illustrate the predictive effects of the models.
Table 8. Modelling prediction results of the content of boron in pear peel based on the selected characteristic wavelengths.
(4) Modelling effect evaluation
Through comparative analysis, SG+MSC-GA-PLSR had the best effect on predicting the content of boron in pear peels, with R² = 0.870 and RPD = 2.027 for the modelling set, and R² = 0.984 and RPD = 5.598 for the prediction set.
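The GA-based characteristic wavelength extraction described in this section can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cv_rmse` and `ga_select`, the population size, the number of generations, the mutation rate and the synthetic data are all hypothetical, and an ordinary least-squares fit stands in for PLSR as the fitness model so that the example depends only on NumPy. Each individual is a boolean mask over the wavelengths, and a lower cross-validated RMSE means a fitter mask.

```python
import numpy as np

def cv_rmse(X, y, mask, n_folds=5):
    """Cross-validated RMSE of a least-squares fit on the selected columns.
    Assumption: OLS replaces PLSR here to keep the sketch dependency-light."""
    if not mask.any():
        return np.inf                                   # empty selection is invalid
    Xs = np.column_stack([np.ones(len(y)), X[:, mask]])  # add intercept column
    folds = np.arange(len(y)) % n_folds                  # interleaved folds
    sq_err = 0.0
    for k in range(n_folds):
        train, test = folds != k, folds == k
        coef, *_ = np.linalg.lstsq(Xs[train], y[train], rcond=None)
        sq_err += np.sum((Xs[test] @ coef - y[test]) ** 2)
    return float(np.sqrt(sq_err / len(y)))

def ga_select(X, y, pop_size=30, n_gen=40, p_mut=0.02, seed=0):
    """Return (best wavelength mask, best RMSE per generation)."""
    rng = np.random.default_rng(seed)
    n_wl = X.shape[1]
    pop = rng.random((pop_size, n_wl)) < 0.5             # random initial masks
    history = []
    for _ in range(n_gen):
        scores = np.array([cv_rmse(X, y, ind) for ind in pop])
        pop = pop[np.argsort(scores)]                    # best individual first
        history.append(float(scores.min()))
        children = [pop[0].copy()]                       # elitism: keep the best
        while len(children) < pop_size:
            # Tournament selection: the lower index is the fitter parent.
            a = pop[rng.integers(0, pop_size, 2).min()]
            b = pop[rng.integers(0, pop_size, 2).min()]
            cut = rng.integers(1, n_wl)                  # single-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_wl) < p_mut            # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    scores = np.array([cv_rmse(X, y, ind) for ind in pop])
    best = int(np.argmin(scores))
    history.append(float(scores[best]))
    return pop[best], history

# Synthetic demonstration: 60 "spectra" with 40 "wavelengths",
# of which only columns 5 and 20 carry information about y.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(60, 40))
y_demo = 2.0 * X_demo[:, 5] - X_demo[:, 20] + 0.1 * rng.normal(size=60)
mask, history = ga_select(X_demo, y_demo)
print("selected wavelengths:", int(mask.sum()), "final CV-RMSE:", history[-1])
```

Because the best individual is carried over unchanged in every generation, the best cross-validated RMSE cannot get worse from one generation to the next. In a real application, the fitness function would be replaced by the PLSR model and validation scheme actually used.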

Modelling and Prediction Analysis of the Content of Potassium in Pear Peel
(1) Full-band based modelling
As shown in Table 5, among the 13 PLSR models established for predicting the content of potassium in pear peels, SNV-PLSR and SG+SNV-PLSR met the predictive standards, and the R² and RPD of the SG+SNV-PLSR prediction set were slightly higher than those of the SNV-PLSR prediction set. This may be because the SG+SNV pretreatment further smoothed the curve and improved the modelling accuracy based on the SNV. At the same time, among the 13 GBRT models, SNV-GBRT, MSC-GBRT, SD-GBRT, SG+MSC-GBRT, SG+SNV-GBRT, SG+MSC+SD-GBRT and SG+SNV+SD-GBRT failed. The remaining six GBRT models belonged to level C and could not be used to predict the content of potassium in the peel of 'Huangguan' pears.
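To make the SG+SNV pretreatment mentioned above concrete, the following is a minimal sketch of Savitzky-Golay smoothing followed by standard normal variate scaling. The window length (5 points), the polynomial order (quadratic), the edge padding and the order of the two steps are assumptions for illustration; this section does not state the parameters actually used, and the function names `sg_smooth5` and `snv` are hypothetical.

```python
import numpy as np

def sg_smooth5(spectra):
    """5-point quadratic Savitzky-Golay smoothing along the wavelength axis.
    Edges are handled by repeating the end values (a simplification)."""
    kernel = np.array([-3.0, 12.0, 17.0, 12.0, -3.0]) / 35.0
    padded = np.pad(spectra, ((0, 0), (2, 2)), mode="edge")
    return np.array([np.convolve(row, kernel, mode="valid") for row in padded])

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

# Two synthetic spectra sampled at 9 evenly spaced wavelengths (900-1700 nm).
wl = np.linspace(900, 1700, 9)
raw = np.vstack([0.5 + 1e-6 * (wl - 1300) ** 2,
                 0.8 + 2e-6 * (wl - 1300) ** 2])
smoothed = sg_smooth5(raw)
processed = snv(smoothed)      # SG smoothing first, then SNV
print(processed.round(3))
```

After SNV, every spectrum has zero mean and unit standard deviation, which removes baseline offsets and multiplicative scaling between samples; the smoothing step only suppresses high-frequency noise and leaves a locally quadratic curve unchanged away from the edges.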
(2) Characteristic wavelength extraction
SNV-PLSR and SG+SNV-PLSR, which preliminarily met the prediction standards, were selected for characteristic wavelength extraction. GA extracted 122 and 133 characteristic wavelengths from the spectral curves pretreated by SNV and SG+SNV, respectively. As shown in Figure 16, on the spectral curve pretreated by SNV, the extracted characteristic wavelengths were mainly concentrated in 1160-1280 nm and 1400-1650 nm; on the spectral curve pretreated by SG+SNV, they were mainly concentrated in 1050-1110 nm, 1200-1300 nm and 1450-1680 nm. Therefore, the response intervals of the compounds related to the content of potassium in the pear peel were mainly 1200-1300 nm and 1450-1650 nm.
(3) Modelling of characteristic wavelengths
It can be seen from Table 9 and Figure 17 that the prediction ability of the models built after GA processing was improved. Although the R² and RPD of the SNV-GA-PLSR and SG+SNV-GA-PLSR modelling sets decreased slightly, the R² and RPD of their prediction sets were higher than those before feature extraction, and the prediction level of the models increased to level A.
(4) Modelling effect evaluation
From the discussion and comparison above, SG+SNV-GA-PLSR had the best ability to predict the content of potassium in pear peel, with R² = 0.888 and RPD = 2.175 for the modelling set, and R² = 0.973 and RPD = 4.294 for the prediction set.

Discussion
Previous studies have pointed out that the brown spots that seriously affect the appearance quality of 'Huangguan' pears are caused by a physiological disease [38] and are largely related to the content and proportion of mineral elements in the pulp and peel of pears. In previous studies, the mineral elements calcium, boron and potassium were shown to be important factors affecting the appearance quality of 'Huangguan' pears, and the content of mineral elements in the pulp and peel of diseased pears may be correlated to a certain degree, with the appearance varying with the severity of the disease [6]. The timely detection of the content of calcium, boron and potassium in fruits, and the adjustment of fertilization and bagging based on the detected results, are of great significance to the improvement of fruit quality in the later stage. Compared with the traditional, expensive laboratory-based physical and chemical detection methods, near-infrared spectroscopy offers several potential advantages and has been applied to the study of soluble solids, titratable acids and the mineral element content of pears [20][21][22][23]. Therefore, in this paper, based on near-infrared (900-1700 nm) spectroscopy technology, the NIR-S-G1 micro spectrometer was used to collect spectral data, and, combined with 12 pretreatment methods, the content of calcium, boron and potassium in the fruit of 'Huangguan' pears (including peel and pulp) was predicted by PLSR and GBRT. For the models that met the prediction standard, GA was used to extract the characteristic wavelengths of their spectral curves to further optimize the models.
The full-band modelling results showed that the overall prediction abilities of the PLSR models were stronger than those of the GBRT models. The GBRT ensemble algorithm combines many weak learners, but its performance in processing high-dimensional data with small samples was not as good as that of the traditional single regression model PLSR. This is broadly consistent with previous studies [39][40][41], although there are also differences. Overall, the models built on characteristic wavelengths extracted by the genetic algorithm predicted better than the full-band models without feature extraction. Although the specific characteristic wavelengths extracted for each model were not the same, their distribution ranges on the spectral curve were similar. This may be because the compounds associated with a given mineral element are similar and therefore share similar chemical bond vibration patterns. Previous studies have shown that the chemical components in the pulp and peel of pears are complex, and the main chemical components include flavonoids, triterpenes, phenolic acids, etc. Generally, the peel of a pear contains more phenolic acids, flavonoids and triterpenes than the pulp. The shape of the spectral curves reflects the overtone (frequency doubling) and combination (frequency combining) bands of different chemical bond vibrations. In the range of 900-1700 nm, some chemical bonds respond strongly, especially X-H bonds, and some other chemical bonds are also active, such as the C=O bond of ester and the C=N bond of amine. This paper may provide some theoretical basis for the study of characteristic wavelength modelling of pear fruit.
The results of this study showed that the best models for predicting the content of calcium in the pulp and peel of 'Huangguan' pears were, respectively, RAW-GA-PLSR (modelling set: R² = 0.898, RPD = 2.271; prediction set: R² = 0.987, RPD = 6.110) and LG-GA-PLSR (modelling set: R² = 0.93, RPD = 2.671; prediction set: R² = 0.989, RPD = 6.951). The prediction ability of LG-GA-PLSR was slightly stronger than that of RAW-GA-PLSR, which may have occurred because the compounds related to the content of calcium in pear peels are more sensitive in the range of 900-1700 nm. Secondly, the best models for predicting the content of boron in the pulp and peel of 'Huangguan' pears were, respectively, SG-GA-PLSR (modelling set: R² = 0.846, RPD = 1.878; prediction set: R² = 0.986, RPD = 6.033) and SG+MSC-GA-PLSR (modelling set: R² = 0.870, RPD = 2.027; prediction set: R² = 0.984, RPD = 5.598). The prediction effect of SG-GA-PLSR was slightly stronger than that of SG+MSC-GA-PLSR, which may have occurred due to the pretreatment method, or because the compounds related to the content of boron in the pulp have a stronger response within the spectral range examined in this paper. Thirdly, the best models for predicting the content of potassium in the pulp and peel of 'Huangguan' pears were, respectively, SG-GA-PLSR (modelling set: R² = 0.834, RPD = 1.810; prediction set: R² = 0.985, RPD = 5.833) and SG+SNV-GA-PLSR (modelling set: R² = 0.888, RPD = 2.175; prediction set: R² = 0.973, RPD = 4.294).
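As a reminder of how the two evaluation indices compared above can be computed, the following minimal sketch uses common textbook definitions: R² as one minus the ratio of the residual to the total sum of squares, and RPD as the standard deviation of the reference values divided by the RMSE of the predictions. These definitions, the function name `r2_and_rpd` and the example values are assumptions for illustration; the exact formulas used for the tables are not restated in this section, so the numbers produced this way need not match the reported values.

```python
import math

def r2_and_rpd(y_true, y_pred):
    """Return (R^2, RPD) for measured values y_true and predictions y_pred."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual SS
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total SS
    r2 = 1.0 - ss_res / ss_tot
    sd = math.sqrt(ss_tot / (n - 1))   # sample standard deviation of references
    rmse = math.sqrt(ss_res / n)       # root mean square error of prediction
    return r2, sd / rmse

# Hypothetical measured and predicted element contents (arbitrary units).
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, rpd = r2_and_rpd(measured, predicted)
print(f"R2 = {r2:.3f}, RPD = {rpd:.3f}")
```

Both indices grow as the prediction error shrinks relative to the spread of the reference values, which is why they tend to rise and fall together in the comparisons above.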

Conclusions
From the discussion above, it can be concluded that near-infrared spectroscopy can be used for the effective, accurate and nondestructive prediction of the content of calcium, boron and potassium in the pulp and peel of 'Huangguan' pears. Furthermore, the precision of the prediction of the mineral element content in the pulp was slightly higher than that in the peel, so the occurrence of the skin browning of 'Huangguan' pears can be more accurately predicted. The research in this paper provides a theoretical basis for the nondestructive prediction of the element content affecting the appearance quality of 'Huangguan' pears. The next stage is method validation, that is, demonstrating the suitability of the method by analyzing random samples, further enriching the modelling samples from different dimensions, and optimizing the model by combining it with artificial intelligence so that the research results can be applied in the field.