Spectral Estimation Model Construction of Heavy Metals in Mining Reclamation Areas

The study reported here examined, as the research subject, surface soils in the Liuxin mining area of Xuzhou, and explored the heavy metal content and spectral data by establishing quantitative models with Multivariable Linear Regression (MLR), Generalized Regression Neural Network (GRNN) and Sequential Minimal Optimization for Support Vector Machine (SMO-SVM) methods. The study results are as follows: (1) the estimations of the spectral inversion models established based on MLR, GRNN and SMO-SVM are satisfactory, and the MLR model provides the worst estimation, with R2 of more than 0.46. This result suggests that the stress sensitive bands of heavy metal pollution contain enough effective spectral information; (2) the GRNN model can simulate the data from small samples more effectively than the MLR model, and the R2 between the contents of the five heavy metals estimated by the GRNN model and the measured values are approximately 0.7; (3) the stability and accuracy of the spectral estimation using the SMO-SVM model are obviously better than that of the GRNN and MLR models. Among all five types of heavy metals, the estimation for cadmium (Cd) is the best when using the SMO-SVM model, and its R2 value reaches 0.8628; (4) using the optimal model to invert the Cd content in wheat that are planted on mine reclamation soil, the R2 and RMSE between the measured and the estimated values are 0.6683 and 0.0489, respectively. This result suggests that the method using the SMO-SVM model to estimate the contents of heavy metals in wheat samples is feasible.


Introduction
Land reclamation in mining areas is a priority for agricultural production in China [1]. However, the characteristics of the reclamation process and materials (i.e., filling the depressions with coal gangue), and the complexity of the reclamation environment often result in heavy metal pollution in the soil, which may directly or indirectly threaten human health by direct contact, through the food chain or in other ways [2][3][4][5][6]. Due to the complex spatial heterogeneity of soil, the chemical analysis methods that have been traditionally used to detect the heavy metal content are found to be laborious, inefficient in terms of time required, and not suitable for large-scale monitoring [7][8][9]. Therefore, how to monitor soil heavy metal pollution quickly and accurately has become an important research topic in the field of mine reclamation.
Reflectance spectral characteristics are basic soil characteristics. Near-infrared reflectance spectroscopy was first used to estimate the heavy metal content in lake sediments to prove its 2 of 18 feasibility in investigating heavy metal content [10]. Using a remote sensing spectral analysis method to estimate the content of heavy metals in soil can overcome the shortcomings of traditional sampling methods and monitor the heavy metal pollution in soil dynamically and quickly on a large-scale. Many studies have already been done on the spectral characteristics of soil. The early focus was mainly on soil classification and the factors influencing the soil spectrum [11][12][13][14][15][16][17][18][19]. With continuous research development, researchers began to combine mathematical analysis methods with spectral characteristics, focusing on the quantitative investigation of the physical and chemical properties of soils [20][21][22][23][24][25][26][27]. Although heavy metals have limited effect on the soil spectral curve because they are not the main components of soil, a suitable spectral inversion model can be established and effectively used to estimate the content of heavy metals in soil based on the information-rich spectral characteristics of soil. Currently, many inversion models have been developed, which can be mainly divided into two types-statistical analysis models and machine learning models. The statistical analysis models have the characteristics of simple structure, few parameters and a relatively easy process. For example, multiple linear regression can describe the linear connection between several independent variables and a dependent variable; and the partial least squares method, first proposed in 1983 [28,29], is a type of deformation of multiple linear regression models focusing on the characteristics of principal component analysis, canonical correlation analysis and linear regression analysis methods in the modeling process [30,31]. The machine learning models mainly refer to the artificial neural network (ANN) and support vector machine (SVM) methods. The artificial neural network is a system that imitates the structure and function of nerve cells in human brains [32]. It has been widely used in many fields and achieved satisfactory results due to its ability to perform highly nonlinear mapping [33]. The support vector machine was first proposed in 1995 and its main advantages are that it can solve small sample, nonlinear and high-dimensional pattern recognition problems very well, and can be applied to other machine learning problems, e.g., function fitting [34,35].
Taking the artificial reclamation areas (coal gangue reclamation area and fly-ash reclamation area) of the Liuxin mining area in Xuzhou, China as the case study, this paper establishes quantitative models to simulate the connection between heavy metal contents in mine reclamation soil and characteristic spectral remote sensing parameters, compares the estimation results of different models, and selects the optimal model for the estimation of Cd content in wheat planted in the reclaimed mine soil. The paper further proposes a quick and efficient method that is suitable for large-range monitoring of the heavy metal pollution in mine reclamation soils, as well as the technical support required for the regulation of heavy metal pollution and food security in mining areas.

Study Area
The study area is in the Liuxin national reclamation demonstration area of Tongshan County, which is located 20 km northwest of Xuzhou, China. The area has a temperate, humid to semi-humid continental monsoon climate with an average annual precipitation of 800-930 mm, 56% of which falls in July and August. The annual average air temperature is 14˝C with lowest temperature ranging from´9˝C to´13˝C and highest temperature from 36˝C to 39˝C. The climate conditions are suitable for the growth of crops. There are five large coal mines and two large power plants, Chacheng and Huarun, in the demonstration area, and also large areas of subsided lands affected by many years of mining ( Figure 1). The coal mining subsidence was used for agriculture production after filling the reclamation area with mainly coal gangue and fly ash. The collapsed area in the coal gangue reclamation area (Zone A), backfilled in 1998, was directly filled with coal gangue of different block sizes and then covered with 40-45 cm soil for subsequent planting. The collapsed area in the fly-ash reclamation area (Zone B), backfilled in 1999, was directly filled with power plant fly ash and then covered with 40-50 cm soil for subsequent planting. In addition, an area with a soil depth greater than 1 m was used as the experimental control soil (Zone C). The control area has the same climatic

Spectral Data Collection
Field sampling was completed in March 2012 and April 2012. The surface soil samples in the coal gangue reclamation area, fly-ash reclamation area and control area were designated and collected according to the plum blossom stationing method. Ten sampling points were determined in each area, and a total of 30 soil samples were collected.

Heavy Metal Content Measurement
A traditional chemical detection method was used to measure the contents of heavy metals in the soil samples. The soil sample pretreatment included air drying, grinding, mesh screening, digesting and constant volume adjustment. After pretreatment, the heavy metal contents in the soil samples were detected by an Inductively Coupled Plasma-Mass Spectrometry (ICP-MS, Agilent, Palo Alto, CA, USA).

Outdoor Spectral Data Collection
The outdoor spectral data of the soil samples were collected from 11:00 a.m. to 2:00 p.m. using an ASD FieldSpec 3 Spectrometer (ASD Inc., Boulder, CO, USA). A standard white reflection plate was used to calibrate the instrument before collecting each soil sample spectrum. The sampling points were identified with GPS to precisely fix the positions, and 10 spectral curves were collected for each soil sample. The average of the 10 spectral curves for each soil sample was used as the actual reflectance spectrum of the soil samples [36].

Indoor Spectral Data Collection
The soil samples were stored in a darkroom for indoor spectral data collection. The spectrometer was preheated to stability, and the standard white reflection plate was used for instrument calibration. The soil samples were placed on a plain black velvet, the surfaces of the soil samples were flattened before recording the spectral data in dishes. Finally, 10 spectral curves were collected for each soil sample. The arithmetic average of the 10 spectral curves for each soil sample was used as the actual reflectance spectrum of the soil sample.

Spectral Data Collection
Field sampling was completed in March 2012 and April 2012. The surface soil samples in the coal gangue reclamation area, fly-ash reclamation area and control area were designated and collected according to the plum blossom stationing method. Ten sampling points were determined in each area, and a total of 30 soil samples were collected.

Heavy Metal Content Measurement
A traditional chemical detection method was used to measure the contents of heavy metals in the soil samples. The soil sample pretreatment included air drying, grinding, mesh screening, digesting and constant volume adjustment. After pretreatment, the heavy metal contents in the soil samples were detected by an Inductively Coupled Plasma-Mass Spectrometry (ICP-MS, Agilent, Palo Alto, CA, USA).

Outdoor Spectral Data Collection
The outdoor spectral data of the soil samples were collected from 11:00 a.m. to 2:00 p.m. using an ASD FieldSpec 3 Spectrometer (ASD Inc., Boulder, CO, USA). A standard white reflection plate was used to calibrate the instrument before collecting each soil sample spectrum. The sampling points were identified with GPS to precisely fix the positions, and 10 spectral curves were collected for each soil sample. The average of the 10 spectral curves for each soil sample was used as the actual reflectance spectrum of the soil samples [36].

Indoor Spectral Data Collection
The soil samples were stored in a darkroom for indoor spectral data collection. The spectrometer was preheated to stability, and the standard white reflection plate was used for instrument calibration. The soil samples were placed on a plain black velvet, the surfaces of the soil samples were flattened before recording the spectral data in dishes. Finally, 10 spectral curves were collected for each soil sample. The arithmetic average of the 10 spectral curves for each soil sample was used as the actual reflectance spectrum of the soil sample.

Heavy Metal Contents in the Soil Samples
The heavy metal contents in the soil samples from the three areas detected using the traditional chemical detection methods are shown in Table 1. The heavy metal contents in the coal gangue reclamation area are relatively close to the values in the fly-ash reclamation area, and the heavy metal contents in both reclamation areas are obviously higher than those of the control area. Therefore, taking the spectral characteristics and the sampling point quantity into account, the soil samples from the coal gangue and fly-ash reclamation areas are both regarded as the mining area reclamation soil in this research. The original outdoor and indoor spectral curves of the soil samples in the mining area are shown in Figure 2. There are two obvious reflection peaks in the outdoor spectral curve at 1400 nm and 1900 nm, which are influenced by the light intensity, air suspended particles, wind speed, gas and polarization interference of the surrounding targets during the process of the outdoor spectral data collection. In addition, missing or overlapping information in the soils' typical reflection peaks and absorption valley bands is the most serious problem of the spectral data collected outdoors in this study. It is generally considered that 600 nm and 815 nm are typically the reflection peak and the second reflection peak of the soil organic matter, respectively, while the weak absorption peak of Fe 2+ and Fe 3+ is near 900 nm and the absorption peak near 1000 nm is the characteristic spectral band of iron hydroxides in soil. The outdoor spectral data largely lose these spectral features. The indoor spectral curve is relatively smooth, and there are no abnormal reflection peaks. Because of the large differences in soil conditions between the control area and the reclamation area, this paper selected the indoor spectral data of the two mine reclamation soils for the characteristics extraction and the spectral estimation model construction. iron hydroxides in soil. The outdoor spectral data largely lose these spectral features. The indoor spectral curve is relatively smooth, and there are no abnormal reflection peaks. Because of the large differences in soil conditions between the control area and the reclamation area, this paper selected the indoor spectral data of the two mine reclamation soils for the characteristics extraction and the spectral estimation model construction.
(a) (b) Figure 2. Original outdoor (a) and indoor (b) spectra of mine reclamation soils.

Multivariable Linear Regression
Multivariable linear regression, MLR, is used to predict the independent variable by establishing the regression equation model with the optimal combination of multiple dependent variables X. The MLR model is a classical statistical analysis method based on the least squares method. The basic form is: where Y represents the characteristics to be analyzed; represents the jth independent variable; represents the regression coefficient corresponding to the jth independent variable; and represents a random error of the regression equation, subject to the normal distribution with a mean of zero, where ( ) = 0; and represents the number of independent variables used for the modeling.
First, the overall parameters = ( , , ⋯ , ) are estimated based on the least square method.
method. The estimated quantity of β is denoted as = ( , , ⋯ , ) ; therefore, the estimated quantity of Y is: Since the difference between the estimated quantity and the original vector Y must be minimized, the least squares method is used again to calculate the overall parameters of the least squares estimated quantity: Thus, the least squares estimated quantity of Y is obtained: In Equation (4), X = , , , ⋯ , , where the rank of X is n, does not have a complete correlation.

Multivariable Linear Regression
Multivariable linear regression, MLR, is used to predict the independent variable Y by establishing the regression equation model with the optimal combination of multiple dependent variables X. The MLR model is a classical statistical analysis method based on the least squares method. The basic form is: where Y represents the characteristics to be analyzed; X j represents the jth independent variable; β j represents the regression coefficient corresponding to the jth independent variable; and ε represents a random error of the regression equation, subject to the normal distribution with a mean of zero, where E pεq " 0; and n represents the number of independent variables used for the modeling. First, the overall parameters β " pβ 0 , β 1 ,¨¨¨, β n q are estimated based on the least square method. The estimated quantity of β is denoted as B " pb 0 , b 1 ,¨¨¨, b n q; therefore, the estimated quantity of Y is: Since the difference between the estimated quantityŶ and the original vector Y must be minimized, the least squares method is used again to calculate the overall parameters of the least squares estimated quantity: Thus, the least squares estimated quantity of Y is obtained: In Equation (4), X " X 1 , X 2 , X 3 ,¨¨¨, X n , where the rank of X is n, does not have a complete correlation.

Generalized Regression Neural Network
The generalized regression neural network, GRNN, is usually used for function approximation. The entire network consists of four layers: input layer, pattern layer, summation layer and output layer.
The network input is X " rx 1 , x 2 ,¨¨¨, x n s T , and its output is Y " ry 1 , y 2 ,¨¨¨, y k s T . The number of neurons in the input layer is equal to the dimension m of the input vector in the model sample. Each neuron can be viewed as a simple distribution unit, which can directly transmit the input variable into the hidden layer. The number of neurons in the model layer is equal to the number n of the model sample, and each neuron in this layer corresponds to a model sample. The transfer function of the ith neuron in the model layer is: In Equation (5), X represents the input variable of the network; X i represents the corresponding training sample of the ith neuron; and σ represents smoothing parameter. The ith output neuron is the exponential form of the Euclidean distance square between the input variable X and the corresponding training sample X i : The summation layer contains two types of neurons. One type performs the arithmetic summation of all the output of the model layer, and the connection weight between each neuron and this neuron in the model layer is 1. The transfer function is: The other type performs a weighted summation of all the output of the pattern layer. The connection weight between the ith neuron in pattern layer and the jth neuron in the summation layer is the jth element y ij of the ith output sample Y i . The transfer function of the neurons in the summation layer is: The number of neurons in the output layer is equal to the dimension k of the output vector in the model sample. The output of each neuron is obtained by dividing the two different types of neuron outputs in the summation layer: Therefore, it can be determined that the structure and weight of the GRNN are fully determined after the selection of the model sample. As a result, the GRNN is more convenient than the other neural networks.

Sequential Minimal Optimization for Support Vector Machines
Sequential minimal optimization for support vector machines, SMO-SVM, is a statistical learning method based on the statistical theory of the VC dimension (for Vapnik-Chervonenkis dimension) theory and the structural risk minimization principle. The basic principle of a standard support vector machine is mapping the n-dimensional sample vector from the original R n space into the feature space F through the nonlinear mapping ψ and constructing the optimal linear decision function using the structural risk minimization principle: In Equation (10), x P R n , ω P F, and b represents the threshold. For the standard support vector machine, the optimization problem is: The expressions define the formula; C represents the penalty coefficient; ξ i , ξi represent the slack variable; and ε represents an insensitive parameter. A Lagrange function is established as: In Equation (12), a i represents the Lagrange multiplier factor. The specific steps of the sequential minimal algorithm are as follows: (1) Select two updated elements a 1 and a 2 , order F " w¨x i´yi , and calculate the upper bound H and lower bound L; (2) Update the element a 2 : In Equation (13), k " k px 1 , x 1 q`k px 2 , x 2 q´2k px 1 , x 2 q. (3) If ∆a 2 is less than the threshold, the update fails; otherwise, the update element a 1 is as follows: ( 14) In Equation (14), s " y 2 y 1 . (4) Update all the F i : (5) Calculate the error E of the function output and the target classification: (6) The algorithm ends, if E is less than the threshold; otherwise, repeat Equations (1)-(5).

Stress Sensitive Band Selection of Heavy Metal Pollution
This study carried out a correlation analysis between the content of heavy metal in soils and the spectral reflectance obtained after performing a first-order differential transformation, envelope elimination and inverse logarithmic transformation. Eight maximum correlated bands were selected as pollution stress sensitive bands for the heavy metals Cd, Cr, Cu, Pb and Zn.
Compared with the first-order differential transformation and envelope elimination, the correlation coefficient between the Cd content in the mine reclamation soil and the spectrum obtained after performing the inverse logarithmic transformation was significantly reduced, and no significant correlation to the selection of feature bands was found. The selection results of the five heavy metals' stress sensitive spectral bands are shown in Table 2.

Establishment of the MLR Estimation Model
According to the results of the stress sensitive band selection, the spectral reflectance values obtained through the spectral transformation of the sensitive band were defined as a set of independent variables X 1 , X 2 , X 3 ,¨¨¨, X n , and the heavy metal content in the soil of the mining area was treated as dependent variable Y to establish the model.
The stress sensitive bands with a significant correlation were selected as related factors. The data for 6 sample points (12 soil samples, accounting for 60% of the total samples) from the fly-ash and coal gangue reclamation areas were used as the training data, and the remaining eight soil samples (accounting for 40% of the total samples) were used as the testing data. The regression coefficients of the MLR model were calculated, and the stability and forecast accuracy of the model was predicted. The MLR equations for the contents of the five heavy metals are as follows:  (21) This paper evaluated the stability of the regression model with the coefficient of determination (R 2 ) and the accuracy with the root mean square error using the following formulas, respectively: In Equations (22) and (23), Y m and Y p represent the measured and predicted values of the heavy metal content, respectively; Y represents the average of the measured values of the heavy metal contents; N c , N p and N represent the number of modeling samples, the number of forecast modeling samples and the number of total samples, respectively; and k represents the number of the independent variables in the model. R 2 indicates the stability of the model. The closer the value is to 1, the more stable the model is. RMSE indicates the accuracy of the model. The smaller the value, the higher the accuracy of the model. The results for estimating the heavy metal contents in the soil samples of the mining area using the MLR model are shown in Table 3. As shown in Table 3, the MLR model was used to estimate the heavy metal contents in the soil of the mining area. Among them, the Cr estimation result is the best, and the value of its R 2 reaches 0.5976. As shown in Table 3, the MLR model was used to estimate the heavy metal contents in the soil of the mining area. Among them, the Cr estimation result is the best, and the value of its R 2 reaches 0.5976. The estimated values of Cu and Cd are the second best, with R 2 values more than 0.5. The estimated values of Pb and Zn are the worst, with their R 2 values reaching 0.4851 and 0.4687, respectively.
The distribution of the measured values and the estimated values based on the MLR model is shown in Figure 3. Compared to the prediction efforts of a single element, the distribution of the five types of heavy metal elements is closer to the 1:1 line. Cd, Cr and Zn all have different degrees to which the maximum or minimum values deviate from the 1:1 line. Because the differences between these extreme values and the majority of the sample are great, there are insufficient training samples to support the modeling in the corresponding multi-dimensional space. As a result, the generalization ability of the model is weak, and the model performs unsatisfactorily when the model's value is a maximum or minimum. Among the five heavy metals, the distribution of Cr is the closest to the 1:1 line, and the trend is more consistent with the 1:1 line. Relative to Cr, the distributions of Cd and Cu appear to be loose in the vicinity of the 1:1 line, and the overall trend is consistent with the 1:1 line. The distribution of Pb and Zn show that the estimation of those two heavy metals are not very impressive using the model. These results indicate that the best prediction ability of the model is for Cr, followed by Cu and Cd, while the worst is for Pb and Zn.

Establishment of the GRNN Estimation Model
A smoothing factor σ has a significant effect on the ability for the forecasting and generalization of the GRNN. Therefore, the emphasis rests on improving the accuracy of the network forecasting by selecting and using a suitable smoothing factor as the parameter in this network. The experiment adopted a cross validation approach to conduct the parameter optimization of smoothing factor σ. The steps of the parameter optimization are given as follows: (1) Set an initializing smoothing factor.
(2) Divide the modeling sample into four equal parts, with the first part as the modeling sample and the rest for establishing the GRNN. According to the selection results of the sensitive bands, the spectral reflectance values obtained through the spectral transformation of the sensitive bands were defined as the independent variables X 1 , X 2 , X 3 ,¨¨¨, X n , and the heavy metal content in the soil of the mining area is treated as the dependent variable Y to establish the model.
The stress sensitive bands with a significant correlation were selected as the related factors. The data of six sample points (12 soil samples, accounting for 60% of the total samples) from the fly-ash and coal gangue filling sites were considered as the training data, and the remaining eight soil samples (accounting for 40% of the total samples) were considered as the testing data.
The parameter optimization of the smoothing factor σ for Cr in the GRNN is shown in Figure 4. When σ is 1.7, the generalization ability of the network is the best and thus the parameter σ participates in the subsequent process of the regression forecast. The optimization of the other parameters used the same method. Among the five heavy metals, the distribution of Cr is the closest to the 1:1 line, and the trend is more consistent with the 1:1 line. Relative to Cr, the distributions of Cd and Cu appear to be loose in the vicinity of the 1:1 line, and the overall trend is consistent with the 1:1 line. The distribution of Pb and Zn show that the estimation of those two heavy metals are not very impressive using the model. These results indicate that the best prediction ability of the model is for Cr, followed by Cu and Cd, while the worst is for Pb and Zn.

Establishment of the GRNN Estimation Model
A smoothing factor has a significant effect on the ability for the forecasting and generalization of the GRNN. Therefore, the emphasis rests on improving the accuracy of the network forecasting by selecting and using a suitable smoothing factor as the parameter in this network. The experiment adopted a cross validation approach to conduct the parameter optimization of smoothing factor . The steps of the parameter optimization are given as follows: (1) Set an initializing smoothing factor.
(2) Divide the modeling sample into four equal parts, with the first part as the modeling sample and the rest for establishing the GRNN. According to the selection results of the sensitive bands, the spectral reflectance values obtained through the spectral transformation of the sensitive bands were defined as the independent variables , , , ⋯ , , and the heavy metal content in the soil of the mining area is treated as the dependent variable to establish the model. The stress sensitive bands with a significant correlation were selected as the related factors. The data of six sample points (12 soil samples, accounting for 60% of the total samples) from the fly-ash and coal gangue filling sites were considered as the training data, and the remaining eight soil samples (accounting for 40% of the total samples) were considered as the testing data.
The parameter optimization of the smoothing factor for Cr in the GRNN is shown in Figure  4. When is 1.7, the generalization ability of the network is the best and thus the parameter participates in the subsequent process of the regression forecast. The optimization of the other parameters used the same method. The results using the GRNN model for estimating the five types of heavy metal contents in the reclamation soil of the mining area are shown in Table 4. As shown in Table 4 The results using the GRNN model for estimating the five types of heavy metal contents in the reclamation soil of the mining area are shown in Table 4. As shown in Table 4

Establishment of SMO-SVM Estimation Model
This experiment selected the Radial Basis Function (RBF) as the kernel function of SMO-SVM. RBF has the characteristics of a good classification effect and easily adjustable parameters. The computational formula is as follows: ( 24) On selecting the SVM parameters when adopting RBF, the internationally commonly-used method confines the penalty parameter C and the kernel function parameter g to a certain range of values, uses cross validation to obtain the optimum parameters C and g, which have the highest accuracy of modeling and validation, and selects the two parameters to participle in the subsequent SVM model establishment. Due to the problem of two-parameter selection, there may be a situation in which there are multiple-unit combinations of C and g that correspond to the highest accuracy of the validation. Therefore, the strategy of selecting the combination of C and g can reach the highest validation accuracy when the parameter C is minimized. If g corresponding to the minimum C has several groups, the first g that was searched for as the optimal parameter is selected.
The stress sensitive bands with a significant correlation were selected as the related factors. The data of six sample points (12 soil samples, accounting for 60% of the total samples) from the fly-ash and coal gangue filling sites were used as the training data, and the remaining eight soil samples (accounting for 40% of the total samples) were used as the testing data.
The map of parameter optimization of the SVM network's parameters C and g for Cr is shown in Figure 6. When C " 0.050766 and g " 51.9842, the generalization ability of the support vector machines is the best and thus the parameter combination participates in the subsequent process of the regression forecast. The optimization methods for the rest of the elements are the same.

Establishment of SMO-SVM Estimation Model
This experiment selected the Radial Basis Function (RBF) as the kernel function of SMO-SVM. RBF has the characteristics of a good classification effect and easily adjustable parameters. The computational formula is as follows: On selecting the SVM parameters when adopting RBF, the internationally commonly-used method confines the penalty parameter C and the kernel function parameter g to a certain range of values, uses cross validation to obtain the optimum parameters C and g, which have the highest accuracy of modeling and validation, and selects the two parameters to participle in the subsequent SVM model establishment. Due to the problem of two-parameter selection, there may be a situation in which there are multiple-unit combinations of C and g that correspond to the highest accuracy of the validation. Therefore, the strategy of selecting the combination of C and g can reach the highest validation accuracy when the parameter C is minimized. If g corresponding to the minimum C has several groups, the first g that was searched for as the optimal parameter is selected.
The stress sensitive bands with a significant correlation were selected as the related factors. The data of six sample points (12 soil samples, accounting for 60% of the total samples) from the fly-ash and coal gangue filling sites were used as the training data, and the remaining eight soil samples (accounting for 40% of the total samples) were used as the testing data.
The map of parameter optimization of the SVM network's parameters C and g for Cr is shown in Figure 6. When = 0.050766 and = 51.9842, the generalization ability of the support vector machines is the best and thus the parameter combination participates in the subsequent process of the regression forecast. The optimization methods for the rest of the elements are the same. The results of using the SMO-SVM model for estimating the five types of heavy metal contents in the reclamation soil of the mining area are shown in Table 5.  The results of using the SMO-SVM model for estimating the five types of heavy metal contents in the reclamation soil of the mining area are shown in Table 5. As shown in Table 5 As shown in Table 5

Comparison of the Estimation Results of Heavy Metals Based on Different Models
All of the spectral estimation results of heavy metals in the mine reclamation soils based on the three models are shown in Table 6. The SMO-SVM model provides the best estimation of the heavy metal content in the mine reclamation soils, and the stability and accuracy of its estimation results are higher than those of the GRNN and MLR models. Among the five heavy metals, the estimation for Cr using the SMO-SVM model is the best, and its R 2 value reaches 0.8628. The estimation for Cd is second best, and its R 2 reaches 0.8532. The estimation for Zn is the worst, but its R 2 reaches 0.7988. The results show that the estimations of the Cd and Cr contents in the soil samples are relatively good.

Comparison of the Estimation Results of Heavy Metals Based on Different Models
All of the spectral estimation results of heavy metals in the mine reclamation soils based on the three models are shown in Table 6. The SMO-SVM model provides the best estimation of the heavy metal content in the mine reclamation soils, and the stability and accuracy of its estimation results are higher than those of the GRNN and MLR models. Among the five heavy metals, the estimation for Cr using the SMO-SVM model is the best, and its R 2 value reaches 0.8628. The estimation for Cd is second best, and its R 2 reaches 0.8532. The estimation for Zn is the worst, but its R 2 reaches 0.7988. The results show that the estimations of the Cd and Cr contents in the soil samples are relatively good.

Demonstration of the Optimization Model Used in Estimating the Content of Cd in Wheat
Mine reclamation, especially mine soil reclamation, has been given high priority in agricultural production in China. The heavy metal elements in mine reclamation soil can enter the human body through food crops, therefore, it is significant to study the heavy metal content in crops planted on mine reclamation soils and to evaluate their food safety implications. To demonstrate the feasibility of spectral estimation for the heavy metal contents in crops using the SMO-SVM method, Cd in the wheat planted on mine reclamation soils was selected as the study case.

Determination of the Content of Cd in Wheat
Wheat samples that were planted on the soil sample points were collected at the same time as the collection of the soil samples. The stalk of wheat was collected as the research sample, and 10 wheat samples were collected on each of the three areas. The contents of Cd in the wheat samples are shown in Table 7 according to a traditional chemical analysis method.

Selection of Stress Sensitive Bands of Cd Pollution in Wheat
Because the effect of the spectral inverse logarithmic transformation is relatively unsatisfactory, the original spectral reflectance of the wheat samples was pre-processed by the first-order differential transformation and envelope elimination methods. Correlation analysis was carried out between the content of heavy metal Cd in the wheat samples and the spectral reflectance obtained by performing the first-order differential transformation and envelope elimination on the corresponding 350-2500 nm wavelength range in the spectral data of the samples. The curve of the correlation coefficient varying with wavelength is shown in Figure 8. Figure 8a shows the fluctuation of the relativity between the content of Cd and the spectral reflectance of the wheat after performing the first-order differential transformation varying with wavelength. It presents strong negative correlations at 380, 390, 400, 670, 880, 890, 900 and 2200 nm. All of their values are larger than the maximum correlation coefficient obtained by performing the envelope elimination in Figure 8b. Therefore, the aforementioned eight wavelengths were selected as the stress sensitive bands for Cd pollution in wheat. varying with wavelength is shown in Figure 8. Figure 8a shows the fluctuation of the relativity between the content of Cd and the spectral reflectance of the wheat after performing the first-order differential transformation varying with wavelength. It presents strong negative correlations at 380, 390, 400, 670, 880, 890, 900 and 2200 nm. All of their values are larger than the maximum correlation coefficient obtained by performing the envelope elimination in Figure 8b. Therefore, the aforementioned eight wavelengths were selected as the stress sensitive bands for Cd pollution in wheat.
(a) (b) Figure 8. Correlation coefficients of the Cd concentration in the wheat planted on mine reclamation soils with transformed spectra. The spectral reflectance were obtained by performing the first-order differential transformation (a) and envelope elimination (b) on the corresponding 350-2500 nm wavelength range in the spectral data of the samples.

Spectral Estimation of the Cd Content in the Wheat Planted in the Reclamation Soil of Mining Areas
The stress sensitive bands of Cd pollution in wheat were selected as the related factors. The data of six sample points (12 soil samples) from the coal gangue and fly-ash reclamation areas were used as the training data, and the remaining eight samples were used as the testing data. The model was established by using the SMO-SVM method. When the penalty parameter C and the kernel function parameter g are 0.1768 and 2.8284, respectively, through cross validation, the corresponding RMSE reaches the minimum value of 0.0446. In this case, the generalization of the support vector machines is the best and thus the parameter combination participates in the subsequent process of the regression forecast. The result is shown in Figure 9. Correlation coefficients of the Cd concentration in the wheat planted on mine reclamation soils with transformed spectra. The spectral reflectance were obtained by performing the first-order differential transformation (a) and envelope elimination (b) on the corresponding 350-2500 nm wavelength range in the spectral data of the samples.

Spectral Estimation of the Cd Content in the Wheat Planted in the Reclamation Soil of Mining Areas
The stress sensitive bands of Cd pollution in wheat were selected as the related factors. The data of six sample points (12 soil samples) from the coal gangue and fly-ash reclamation areas were used as the training data, and the remaining eight samples were used as the testing data. The model was established by using the SMO-SVM method. When the penalty parameter C and the kernel function parameter g are 0.1768 and 2.8284, respectively, through cross validation, the corresponding RMSE reaches the minimum value of 0.0446. In this case, the generalization of the support vector machines is the best and thus the parameter combination participates in the subsequent process of the regression forecast. The result is shown in Figure 9.
as the training data, and the remaining eight samples were used as the testing data. The model was established by using the SMO-SVM method. When the penalty parameter C and the kernel function parameter g are 0.1768 and 2.8284, respectively, through cross validation, the corresponding RMSE reaches the minimum value of 0.0446. In this case, the generalization of the support vector machines is the best and thus the parameter combination participates in the subsequent process of the regression forecast. The result is shown in Figure 9.  Using the regression model based on the SMO-SVM method to estimate the Cd's content in the wheat that were planted in the reclamation soils of the mining area, the correlation coefficient R 2 and root mean square error RMSE between the measured and the estimated values were found to be 0.6683 and 0.0489, respectively, suggesting that it is feasible to use spectral data to estimate the heavy metal content in the wheat planted in the reclamation soils of mining areas. However, the estimation compared to the soil is relatively unsatisfactory, which may attribute to the fact that the spectral reflectance are affected by multiple components in crop samples, i.e., moisture, chlorophyll, protein and lignin. Further study is needed to develop a more scientific and reasonable method for the spectral estimation of heavy metals in crops.

Conclusions
In this research, the MLR, GRNN and SMO-SVM models were established to estimate the heavy metal contents in the reclamation soil of mining areas, and the optimal model was found and used to demonstrate its use in estimating Cd in the wheat planted on mine reclamation soils. The spectral inversion models for heavy metals in soil established based on the MLR, GRNN and SMO-SVM perform satisfactory to a certain degree. The R 2 values of the MLR model, which provides the worst estimates, remain above 0.46. This result suggests that the stress sensitive bands of the heavy metal pollution selected by the treatment method of the spectral data and correlation analysis contain sufficient effective spectral information. The established model based on the GRNN has an advantage for modelling small samples in this study. Compared with the MLR model, the GRNN model estimations are superior for the five types of heavy metals in the reclamation soils of mining areas. The stability and accuracy of the SMO-SVM model are the best, compared to those of the GRNN and MLR models. Among the five types of heavy metals, the estimations of Cd and Cr are the best using the SMO-SVM model. Using the optimal SMO-SVM model to estimate the Cd content in the wheat planted in the reclamation soils of mining areas, the R 2 and RMSE values between the measured and estimated values are relatively good, suggesting that the method is feasible.