Estimation Methods for Soil Mercury Content Using Hyperspectral Remote Sensing

Mercury is one of the five most toxic heavy metals to the human body. To identify a high-precision method for predicting soil mercury content using hyperspectral techniques, 75 soil samples were collected in Guangdong Province; their mercury content was determined by chemical analysis and their hyperspectral data were acquired in an indoor hyperspectral experiment. Multiple linear regression (MLR), a back-propagation neural network (BPNN), and a BPNN optimized by a genetic algorithm (GA-BPNN) were used to establish the relationship between the hyperspectral data and the soil mercury content and to predict the soil mercury content. The feasibility and modeling performance of the three methods were compared and discussed. The results show that the GA-BPNN provided the best soil mercury prediction model: the modeling R² is 0.842, the root mean square error (RMSE) is 0.052, and the mean absolute error (MAE) is 0.037; the testing R² is 0.923, the RMSE is 0.042, and the MAE is 0.033. Thus, of the three methods compared, the GA-BPNN is the best for predicting soil mercury content, and the results provide a scientific basis and technical support for the hyperspectral inversion of soil mercury content.


Introduction
Mercury (Hg) is a toxic metal contaminant that is released into the environment through natural and anthropogenic emissions [1]. It is strongly neurotoxic and teratogenic; moreover, because mercury is not easily decomposed by microorganisms, it accumulates. Mercury not only affects water quality through the leaching of soil water but is also a toxic compound that can affect the growth of crops and eventually accumulate in animal and human bodies through the food chain. It may then damage the central nervous system, heart, and immune system, leading to large-scale outbreaks of disease [2,3]. Therefore, soil mercury pollution is of wide concern worldwide and it is very important to monitor the soil mercury content.
The traditional monitoring method for soil heavy metals is field sampling followed by chemical analysis of the sampled soil in the laboratory, with the soil heavy metal content over the area then obtained by geostatistical interpolation [4]. This method has high precision but is time-consuming, labor-intensive, costly, and inefficient [5]. It is difficult to monitor the heavy metal content accurately and quickly over large areas [6]. Remote sensing technology has the advantages of rapid, large-scale dynamic monitoring and plays a unique role in the investigation, evaluation, monitoring, and management of large-scale open-air agricultural production. Many spectral studies have found that the spectral curves of soils containing heavy metals differ from those of uncontaminated soils [7]. Furthermore, with the development of hyperspectral techniques, spectral analysis methods can overcome the shortcomings of traditional monitoring methods, and the soil heavy metal content can be determined accurately, efficiently, non-destructively, and on a large scale [8].
Current research shows that most hyperspectral inversion models for heavy metals fall into two categories: statistical analysis models and machine learning models. Statistical analysis models include the following. (1) Single-variable regression establishes a one-variable model using the spectral index or band that has the highest correlation with the heavy metal content. Because there is only one independent variable, the model is simple, but its accuracy is lower than that of multivariate models [9]. (2) Multiple linear regression (MLR) usually uses multiple spectral indices or bands to establish a linear model; although the accuracy is improved, there is a high degree of collinearity between the variables. This problem can be mitigated by using stepwise regression [10] and enter regression [11]. In this improved approach, variables enter the model incrementally, so that meaningful variables are introduced and meaningless variables are eliminated. (3) Principal component regression (PCR) combines principal component analysis and MLR. Although a few uncorrelated factors can represent a large number of variables when the MLR is established, the extracted principal components often lack a realistic physical interpretation [12]. (4) Partial least squares regression (PLSR) is a multivariate statistical analysis method that combines the advantages of three methods (principal component analysis, correlation analysis, and MLR analysis). It is well suited to cases in which the variables are highly intercorrelated and the number of samples is smaller than the number of variables; it is by far the most commonly used inversion method for determining heavy metal content [13][14][15].
Machine learning models include the following. (1) An artificial neural network (ANN) is a mathematical model based on the behavioral characteristics of animal neural networks that performs distributed parallel information processing. It is self-organizing, self-learning, and self-adapting, but it easily falls into a local minimum and the model is very complex [16,17]. (2) A support vector machine (SVM) is a statistical learning method based on structural risk minimization. It finds an optimal separating hyperplane for a set of training data subject to a given error; it requires few samples, handles nonlinear relationships, and is suitable for high-dimensional problems, but it is difficult to apply to a large number of training samples [18,19]. With the development of artificial intelligence algorithms, an increasing number of data mining techniques are being used in heavy metal inversions, such as genetic algorithms, random forests, and multivariate adaptive regression splines.
The objective of this study is to determine the optimal soil mercury content simulation method by comparing the results of three methods: a commonly used statistical method (MLR) and two machine learning methods, namely the back-propagation neural network (BPNN) and the BPNN optimized by a genetic algorithm (GA-BPNN). The goal is to address existing problems in the current hyperspectral estimation of heavy metal content using statistical analysis models and machine learning methods.

Study Area
Guangdong Province in southern China is located at 20°13′-25°31′N and 109°39′-117°19′E. An overview of the study area is shown in Figure 1. The area of the province is 179,700 km². The northern region is mostly hilly and has a relatively high elevation. The southern coastal area has a relatively low altitude and relatively flat terrain. Guangdong Province has a subtropical monsoon climate; the average sunshine duration in the province is 1745.8 h, the annual average temperature is 22.3 °C, and the average annual precipitation is between 1300 and 2500 mm. It is one of the regions with the most abundant light, heat, and water resources in China. Since the reform and opening up, Guangdong Province has achieved rapid economic development and high levels of urbanization. As the economy has developed, the level of industrialization has also risen continuously and, as a result, Guangdong has become one of the provinces in China where soil heavy metal pollution is relatively serious.

Acquisition and Processing of Soil Data
A total of 75 soil samples were collected in Guangdong Province and located using GPS; sampling was conducted at a depth of 0-20 cm and each sample weighed about 300 g. Field samples were collected on a 50 km × 50 km sampling grid, and sample points in densely populated and possibly contaminated areas were mainly collected on a 30 km × 30 km sampling grid. The sample locations are shown in Figure 1. The soil samples were taken back to the laboratory and air-dried, and gravel and animal and plant residues were removed. After grinding and sieving to 0.2 mm, each sample was divided into two parts, one for determining the soil heavy metal content and one for measuring the soil spectral reflectance. To determine the soil mercury content, 0.2 g of sample was digested with H2SO4-HNO3-KMnO4 and measured by cold atomic absorption. The descriptive statistics of the mercury content of the 75 soil samples are shown in Table 1. The maximum soil mercury content in the study area was 0.615 mg/kg, the minimum was 0.018 mg/kg, and the average was 0.139 mg/kg, which is 1.782 times the background value of Guangdong. The coefficient of variation of the mercury content in the soil samples is 84.89%. The coefficient of variation reflects the degree of dispersion, and a value between 10% and 100% is generally taken to indicate moderate variability; the soil mercury content in the study area therefore has moderate variability. To make the model establishment and verification clear, the advantages and disadvantages of each model are explained in detail. In the subsequent data processing, the 75 samples are arranged in descending order of mercury content; every third sample (25 samples) is used as a test sample and the remaining 50 samples are used as modeling samples. This ensures that the modeling and test samples are consistent and evenly distributed, as shown in Figure 1.
Table 1. Descriptive statistics of the soil mercury content.
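The modeling/test split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_every_third`, the choice of the 3rd, 6th, 9th, ... ranked samples as the test set, and the mercury values are all assumptions made for the example.

```python
def split_every_third(values):
    """Rank samples by descending value and hold out every third one as test data."""
    # Sample indices ordered from highest to lowest mercury content.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    # Assumption: the 3rd, 6th, 9th, ... ranked samples form the test set.
    test_idx = order[2::3]
    test_set = set(test_idx)
    train_idx = [i for i in order if i not in test_set]
    return train_idx, test_idx


# Hypothetical mercury contents (mg/kg) for 75 samples, for illustration only.
hg = [0.018 + 0.008 * i for i in range(75)]
train_idx, test_idx = split_every_third(hg)
```

Because the held-out samples are spread evenly along the ranked list, the modeling and test sets cover a similar range of mercury content.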

Collection and Processing of Soil Spectral Data
The soil spectral reflectance was measured using an AvaField portable spectrometer manufactured by Avantes, Holland. The band range was 340.316-2511.179 nm, the spectral sampling interval was 0.6 nm, and the measurement light source was a 50-W halogen lamp; the light source was connected to the probe via an optical fiber and the field-of-view angle was 10°. The soil sample was placed in a sample dish with a diameter larger than 10 cm and a depth greater than 5 cm, and the spectral data were collected with the probe aligned perpendicular to the soil sample. A standard whiteboard calibration was performed prior to each collection. Each soil sample was measured five times, and 10 data points were collected automatically. The AvaReader software was used to eliminate anomalous data, and the mean spectral reflectance was used as the reflectance value of the sample.
Spectral measurements are easily influenced by many factors, such as the observation angle, illumination, and the surface roughness of the sample; these effects result in a relatively low signal-to-noise ratio (SNR) of the spectral data. Therefore, the original spectral data need to be transformed to suppress background noise, enhance differences, and highlight the absorption and reflection characteristics of the spectral curve. In this study, the Savitzky-Golay smoothing filter was used to smooth the spectral curve. The smoothed curve retains the information of the data, as shown in Figure 2a. Based on the smoothed spectral curves, continuum removal (CR), first-order differential (FD), and reciprocal logarithmic (RL) processing were performed separately. The results are shown in Figure 2b-d.
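Two of the transformations named above can be sketched in a few lines. The sketch assumes that FD is a finite difference between adjacent bands and that RL means log10(1/R); the source does not give either formula, and the function names and reflectance values are hypothetical. Savitzky-Golay smoothing and continuum removal are not shown.

```python
import math


def first_derivative(wavelengths, reflectance):
    """Finite-difference first derivative dR/dλ between adjacent bands."""
    return [
        (reflectance[i + 1] - reflectance[i]) / (wavelengths[i + 1] - wavelengths[i])
        for i in range(len(reflectance) - 1)
    ]


def reciprocal_log(reflectance):
    """Reciprocal logarithm log10(1/R); assumes reflectance values in (0, 1]."""
    return [math.log10(1.0 / r) for r in reflectance]


# Hypothetical smoothed reflectance at a 0.6 nm sampling interval.
wl = [400.0, 400.6, 401.2, 401.8]
refl = [0.10, 0.13, 0.19, 0.22]
fd = first_derivative(wl, refl)
rl = reciprocal_log(refl)
```

The derivative has one value fewer than the input spectrum, so band positions need to be tracked accordingly when FD bands are later matched to wavelengths.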


Feature Band Selection
A relationship exists between the spectral reflectance and the soil mercury content. We used a correlation analysis at the p = 0.01 significance level to identify the bands with high correlation coefficients; the variance inflation factor (VIF) was used to check the collinearity between the bands. The larger the VIF, the greater the collinearity. As an empirical rule, VIF values between 0 and 10 indicate no multicollinearity, values between 10 and 100 indicate high multicollinearity, and values greater than 100 indicate very high multicollinearity. In this study, Pearson's correlation coefficient was used to describe the relationship between the soil spectral characteristics and the soil mercury content. Pearson's correlation coefficient reflects the degree of linear correlation between two variables and is one of the most widely used measures of association. It is defined as the covariance of the two variables divided by the product of their standard deviations [20], as shown in Equation (1):
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{1}$$
where $x_i$ is the reflectance of the $i$th soil sample in the band under consideration, $y_i$ is the mercury content of the $i$th soil sample, $\bar{x}$ is the average band reflectance, $\bar{y}$ is the average soil mercury content, and $n$ is the number of samples. The range of the coefficient is [-1, 1]. When the sign is positive, the two variables are positively correlated, and vice versa. The greater the absolute value of the coefficient, the stronger the linear correlation between the two variables.
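The correlation screening step can be illustrated with a short sketch. It computes Pearson's coefficient for each band and keeps the bands whose absolute correlation exceeds a threshold. The function names, the toy spectra, and the threshold value passed in are assumptions for the example; the VIF check is not shown.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_bands(spectra, hg, threshold):
    """Return (band index, r) for bands with |r| above the threshold.

    spectra[s][b] is the reflectance of sample s at band b.
    """
    selected = []
    for b in range(len(spectra[0])):
        r = pearson_r([row[b] for row in spectra], hg)
        if abs(r) > threshold:
            selected.append((b, r))
    return selected


# Toy data: 5 samples, 3 bands (hypothetical values).
hg = [0.1, 0.2, 0.3, 0.4, 0.5]
spectra = [
    [0.2, 0.3, 0.9],
    [0.4, 0.1, 0.7],
    [0.6, 0.3, 0.5],
    [0.8, 0.1, 0.3],
    [1.0, 0.3, 0.1],
]
selected = select_bands(spectra, hg, threshold=0.26)
```

In the toy data the first band rises with mercury content, the third falls, and the second is uncorrelated, so only the first and third bands pass the screen.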

MLR Method for Determination of the Soil Mercury Content
The MLR method was first proposed by Francis Galton in the late 19th century and was used for model prediction in the early days. It is a classical statistical analysis method based on the least squares method and is used to establish a linear equation that explains the relationship between two or more independent variables and a dependent variable [8,21]. The general form of the model is shown in Equation (2):
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon \tag{2}$$
where $Y$ is the soil mercury content, $X_j$ is the reflectance of the $j$th feature band, $\beta_j$ is the $j$th regression coefficient ($\beta_0$ being the intercept), $n$ is the number of feature bands, and $\varepsilon$ is the random error. The matrix expression of the equation is shown in Equation (3):
$$Y = X\beta + \varepsilon \tag{3}$$
where $X$ is a full-rank matrix and $\beta$ is estimated by the least squares method. The estimated value is calculated by Equation (4):
$$\hat{\beta} = (X^{T}X)^{-1}X^{T}Y \tag{4}$$
Therefore, according to Equations (3) and (4), the predicted value of the soil mercury content is calculated by Equation (5):
$$\hat{Y} = X\hat{\beta} \tag{5}$$
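As a rough illustration of the least squares estimate, the sketch below forms the normal equations (XᵀX)β = XᵀY and solves them by Gaussian elimination. It assumes a leading column of ones for the intercept; the function names and the toy data are hypothetical and are not the authors' implementation.

```python
def solve_linear_system(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest pivot into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(features, y):
    """Return [b0, b1, ..., bn] solving the normal equations."""
    X = [[1.0] + list(row) for row in features]  # leading 1 for the intercept
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_mlr(beta, features):
    """Predicted values for each row of features."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in features]


# Toy data generated from y = 0.5 + 2*x1 - 1*x2 (hypothetical values).
features = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [0.5, 3.5, 2.5, 5.5, 4.5]
beta = fit_mlr(features, y)
y_hat = predict_mlr(beta, features)
```

Solving the normal equations directly is adequate for a handful of well-conditioned feature bands; with strongly collinear bands, XᵀX becomes ill-conditioned, which is one reason the collinearity check matters.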

BPNN Method for Determination of the Soil Mercury Content
In this study, a BPNN is used to predict the soil mercury content. The structure of the BPNN is shown in Figure 3; it can be regarded as a "black box" model. The BPNN is trained by supervised learning and can approximate an arbitrary nonlinear relationship between input and output variables. The learning process consists of forward propagation of the input signal and back-propagation (BP) of the error. Training continuously adjusts the connection weights until the output error reaches a required standard [22].
Sustainability 2018, 10, x FOR PEER REVIEW
The input layer to the hidden layer is expressed as shown in Equation (6):
$$o_j = f_i\left(\sum_{i}\omega_{ji}\,o_i + \theta_j\right) \tag{6}$$
where $o_i$ is the input layer information, which in this case is the reflectance of the feature band; $o_j$ is the hidden layer information; $\omega_{ji}$ is the weight from the input layer to the hidden layer; $f_i$ is the transfer function from the input layer to the hidden layer, which is generally the logistic sigmoid function, although the tansig (hyperbolic tangent sigmoid) function is used in this study; and $\theta_j$ is the threshold of the hidden layer.
The hidden layer to the output layer is expressed as shown in Equation (7):
$$o_k = f_j\left(\sum_{j}\omega_{kj}\,o_j + \theta_k\right) \tag{7}$$
where $o_k$ is the output layer information, which in this case is the predicted value of the soil mercury content; $\omega_{kj}$ is the weight from the hidden layer to the output layer; $f_j$ is the transfer function from the hidden layer to the output layer, for which the purelin function is selected in this study; and $\theta_k$ is the threshold of the output layer.
If there is a large difference between the predicted value and the measured value, this discrepancy is passed to the error propagation process. The BP process uses the Levenberg-Marquardt algorithm to correct the connection weights from the output layer back to the input layer so as to reduce the mean squared error, as shown in Equation (8):
$$E = \frac{1}{N}\sum\left(o - o_k\right)^2 \tag{8}$$
where $o$ is the measured soil mercury content, $N$ is the number of training samples, and the sum runs over the $N$ training samples.
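The forward pass and the error measure described above can be sketched as follows. The sketch assumes that tansig is the hyperbolic tangent, that purelin is the identity, and that the thresholds are added as bias terms; the sign convention for the thresholds is an assumption, as are the function names and the toy weights. Levenberg-Marquardt training is not shown.

```python
import math


def forward(x, w_hidden, theta_hidden, w_out, theta_out):
    """Forward pass of a single-hidden-layer network with one output.

    Hidden layer: tansig, i.e. tanh. Output layer: purelin, i.e. identity.
    Assumption: thresholds enter as additive bias terms.
    """
    hidden = [
        math.tanh(sum(w * v for w, v in zip(weights, x)) + theta)
        for weights, theta in zip(w_hidden, theta_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + theta_out


def mse(measured, predicted):
    """Mean squared error over the training samples."""
    return sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)


# Toy network: 2 inputs, 2 hidden neurons, 1 output (hypothetical weights).
w_hidden = [[0.5, -0.5], [1.0, 1.0]]
theta_hidden = [0.0, 0.0]
w_out = [1.0, 1.0]
theta_out = 0.1

pred = forward([0.0, 0.0], w_hidden, theta_hidden, w_out, theta_out)
err = mse([0.2], [pred])
```

With all inputs at zero the hidden activations vanish, so the prediction reduces to the output threshold, which makes the example easy to check by hand.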

GA-BPNN Method for Determination of the Soil Mercury Content
A GA is a stochastic global search and optimization method that mimics the biological evolution mechanism in nature.It is robust, does not easily fall into a local optimum, and can be used for parallel distributed processing [23].Therefore, we combined the GA and BPNN by using the population search method to optimize the weights and thresholds of the NN; the structure of the GA-BPNN is shown in Figure 3.
We use a real-number coding method to transform the initial weights and thresholds of the BPNN into chromosomes in the GA. The code length is calculated using Equation (9):
$$S = i \times j + j \times k + j + k \tag{9}$$
where $i$ is the number of input layer neuron nodes, which in this case is the number of feature bands; $k$ is the number of output layer neuron nodes, with $k = 1$ because the output layer consists only of the soil mercury content; and $j$ is the number of hidden layer neuron nodes. Then, a random population of chromosomes is generated. The BPNN is used to obtain the sum of the absolute values of the errors between the predicted and measured values of the training data as the individual fitness value, as shown in Equation (10):
$$F = \sum_{k=1}^{N}\left|y_k - o_k\right| \tag{10}$$
where $y_k$ is the measured value of the mercury content in the $k$th soil sample, $o_k$ is the predicted value of the mercury content in the $k$th soil sample, and $N$ is the number of training samples. The larger the fitness value, the larger the error; therefore, the reciprocal of the fitness value is used in the selection operation. Evolutionary operations such as roulette selection, real-valued crossover, and mutation are then performed until the training target is met or the maximum number of iterations is reached. The optimal solution of the GA is used as the initial weights and thresholds of the BPNN, that is, $\omega$ and $\theta$ in Equations (6) and (7); subsequently, the BPNN is trained to obtain the optimal solution.
To effectively determine the optimal soil mercury content simulation method, we present a flow chart which is shown in Figure 4.
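The GA building blocks described in this section can be sketched briefly. The sketch assumes that a chromosome stores every weight and threshold of a three-layer network and that roulette selection draws individuals with probability proportional to the reciprocal of the fitness; the function names and toy values are hypothetical, and crossover and mutation are omitted.

```python
import random


def chromosome_length(n_in, n_hidden, n_out):
    """Number of weights and thresholds in a three-layer network."""
    return n_in * n_hidden + n_hidden * n_out + n_hidden + n_out


def fitness(measured, predicted):
    """Sum of absolute errors; a larger value means a worse individual."""
    return sum(abs(y - o) for y, o in zip(measured, predicted))


def roulette_select(population, fitness_values, rng):
    """Roulette selection weighted by the reciprocal of the fitness.

    Assumes every fitness value is strictly positive.
    """
    weights = [1.0 / f for f in fitness_values]
    return rng.choices(population, weights=weights, k=len(population))


# 13 feature bands, 13 hidden neurons, 1 output.
length = chromosome_length(13, 13, 1)

# Toy population of three labelled individuals (hypothetical fitness values).
population = ["a", "b", "c"]
fitness_values = [0.15, 0.30, 0.60]
parents = roulette_select(population, fitness_values, random.Random(0))
```

For the 13-13-1 network used in this study, the chromosome would hold 196 real numbers under this encoding.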

Feature Band Selection Results
The Pearson's correlation analysis was performed using the four spectral indices (i.e., the smoothed spectral reflectance, CR spectral reflectance, FD spectral reflectance, and RL spectral reflectance) and the soil mercury content; the result is shown in Figure 5.
It can be seen that the absolute value of the correlation coefficient between the spectral reflectance and the soil mercury content was greater than 0.260 (significance level of p = 0.01) for the smoothed spectral reflectance in the wavelength ranges of 350-695 nm and 2216-2228 nm, for the CR spectral reflectance in the ranges of 356-685 nm and 2200-2228 nm, and for the RL spectral reflectance in the ranges of 355-674 nm and 2171-2500 nm, which means that the correlation is significant. The correlation coefficient between the FD spectral reflectance and the soil mercury content was considerably higher than the coefficients for the other three spectral indices, and the number of bands with a correlation coefficient greater than 0.260 was larger; in addition, the absolute values of the correlation coefficients were significantly higher. The highest positive correlations of the FD spectral reflectance occurred at 465.351 nm, 799.18 nm, 1373.48 nm, and 2114.978 nm, and the strongest negative correlations occurred at 587.705 nm, 1035.788 nm, and 1975.4 nm; the absolute values of these correlation coefficients are all greater than 0.300. Therefore, to predict the soil mercury content, we selected the bands with the highest positive or strongest negative correlation coefficients that showed no collinearity. A total of 13 bands were selected as the feature bands, as shown in Table 2.


MLR Model Prediction Results of Soil Mercury Content
In this study, the 13 feature bands were used as independent variables and the corresponding soil mercury content was used as the dependent variable in a regression analysis based on Equation (2). The MLR model is shown in Equation (11). The predicted values were obtained and compared with the measured values. The result is shown in Figure 6; the X-coordinates are the measured values and the Y-coordinates are the predicted values.
In this study, three indicators are selected to test the accuracy of the model: the coefficient of determination (R²), the root mean squared error (RMSE), and the mean absolute error (MAE), as shown in Equations (12)-(14). The range of R² is [0, 1]; the larger the R² value, the stronger the linear relationship between the measured and predicted values and the more stable the model. The smaller the RMSE and MAE, the better the predictive ability of the model.
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2} \tag{12}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2} \tag{13}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right| \tag{14}$$
where $y$ is the measured value of the soil mercury content, $\hat{y}$ is the predicted value of the soil mercury content, $\bar{y}$ is the mean of the measured soil mercury content, and $n$ is the number of samples.
The predicted values of the MLR model differ considerably from the measured values. Many points exhibit large differences, although a trend is apparent, and the points deviate from the 1:1 line to varying degrees. The modeling R² is 0.665, the RMSE is 0.076, and the MAE is 0.059; the testing R² is 0.665, the RMSE is 0.087, and the MAE is 0.063.

BPNN Model Prediction Results of Soil Mercury Content
A three-layer BPNN with a single hidden layer was used to predict the soil mercury content in this study.An arbitrary nonlinear mapping is achieved by adjusting the number of neurons in the hidden layer.The input layer of the network was composed of the reflectance of the 13 feature bands and the output layer was the soil mercury content.Through several experiments, it was finally determined that the number of neurons in the hidden layer was 13, the learning rate was 0.1, the training frequency was 1000, and the expected error was 0.0001.The result is shown in Figure 7.In the BPNN model, the points are located close to the 1:1 line but a few points exhibit slight deviations.The modeling  2 is 0.797, the RMSE is 0.059, and the MAE is 0.032; the testing  2 is 0.826, the RMSE is 0.063, and the MAE is 0.048.
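The network configuration described above can be illustrated with a minimal NumPy sketch of a 13-13-1 back-propagation network. Only the layer sizes, the learning rate (0.1), the iteration limit (1000), and the expected error (0.0001) come from the text. The tanh hidden activation, the linear output, full-batch gradient descent, the weight initialization, and the synthetic data are assumptions made for illustration; the function names `train_bpnn` and `predict` are hypothetical and do not reproduce the software used in the study.

```python
import numpy as np

def train_bpnn(X, y, n_hidden=13, lr=0.1, max_epochs=1000,
               target_mse=1e-4, seed=0):
    """Train a single-hidden-layer network with batch back-propagation."""
    rng = np.random.default_rng(seed)
    n_samples, n_in = X.shape
    # Assumed initialization: small random weights, zero thresholds (biases).
    W1 = rng.normal(0.0, 0.3, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.3, (n_hidden, 1))
    b2 = np.zeros(1)
    target = y.reshape(-1, 1)
    history = []
    for _ in range(max_epochs):
        # Forward pass: tanh hidden layer, linear output neuron.
        H = np.tanh(X @ W1 + b1)
        err = H @ W2 + b2 - target
        mse = float(np.mean(err ** 2))
        history.append(mse)
        if mse < target_mse:  # stop once the expected error is reached
            break
        # Backward pass: gradients of 0.5 * MSE.
        d_out = err / n_samples
        d_hid = (d_out @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return (W1, b1, W2, b2), history

def predict(params, X):
    """Return the network output for each row of X."""
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()

# Synthetic stand-in data (NOT the study's measurements): 60 samples,
# 13 reflectance-like inputs in [0, 1], and a smooth nonlinear target.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.uniform(0.0, 1.0, (60, 13))
y_demo = 0.2 + 0.1 * np.sin(3.0 * X_demo[:, 0]) + 0.05 * X_demo[:, 1] * X_demo[:, 2]
params_demo, history_demo = train_bpnn(X_demo, y_demo)
print("epochs run:", len(history_demo), "final training MSE:", history_demo[-1])
```

In this sketch, training stops either when the mean squared error falls below the expected error or when the iteration limit is reached, whichever comes first.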

GA-BPNN Model Prediction Results of Soil Mercury Content
To allow a fair assessment of the effect of the GA optimization, the network structure and parameter configuration were the same as in the BPNN. The number of generations was set to 100, the population size to 64, the crossover probability to 0.4, and the mutation probability to 0.07. The result is shown in Figure 8. In the GA-BPNN model, the points are located closest to the 1:1 line and the trend is more consistent with the 1:1 line. The modeling R² is 0.842, the RMSE is 0.052, and the MAE is 0.037; the testing R² is 0.923, the RMSE is 0.042, and the MAE is 0.033.
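The idea of using a GA to search for good initial weights and thresholds of the network can be sketched as follows. The population size (64), number of generations (100), crossover probability (0.4), and mutation probability (0.07) are taken from the text. Everything else is an assumption for illustration: real-valued chromosomes, binary tournament selection, arithmetic crossover, per-gene Gaussian mutation, elitism, the tanh network, and the synthetic data. The names `ga_initial_weights`, `unpack`, and `mse` are hypothetical.

```python
import numpy as np

N_IN, N_HID = 13, 13
# One chromosome holds all weights and thresholds: W1, b1, W2, b2.
N_GENES = N_IN * N_HID + N_HID + N_HID + 1

def unpack(chrom):
    """Split a flat chromosome into the network parameters."""
    i = N_IN * N_HID
    W1 = chrom[:i].reshape(N_IN, N_HID)
    b1 = chrom[i:i + N_HID]
    W2 = chrom[i + N_HID:i + 2 * N_HID]
    b2 = chrom[i + 2 * N_HID]
    return W1, b1, W2, b2

def mse(chrom, X, y):
    """Fitness: mean squared error of the network encoded by chrom."""
    W1, b1, W2, b2 = unpack(chrom)
    pred = np.tanh(X @ W1 + b1) @ W2 + b2
    return float(np.mean((pred - y) ** 2))

def ga_initial_weights(X, y, pop_size=64, generations=100,
                       p_cross=0.4, p_mut=0.07, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, (pop_size, N_GENES))
    best, best_err, trace = None, np.inf, []
    for _ in range(generations):
        errs = np.array([mse(c, X, y) for c in pop])
        k = int(np.argmin(errs))
        if errs[k] < best_err:
            best, best_err = pop[k].copy(), float(errs[k])
        trace.append(best_err)
        # Selection (assumed): binary tournament, lower error wins.
        a = rng.integers(0, pop_size, pop_size)
        b = rng.integers(0, pop_size, pop_size)
        parents = pop[np.where(errs[a] < errs[b], a, b)]
        children = parents.copy()
        # Crossover (assumed): arithmetic blend of consecutive pairs.
        for j in range(0, pop_size - 1, 2):
            if rng.random() < p_cross:
                w = rng.random()
                children[j] = w * parents[j] + (1.0 - w) * parents[j + 1]
                children[j + 1] = w * parents[j + 1] + (1.0 - w) * parents[j]
        # Mutation (assumed): each gene is perturbed with probability p_mut.
        mask = rng.random(children.shape) < p_mut
        children[mask] += rng.normal(0.0, 0.1, int(mask.sum()))
        children[0] = best  # elitism: keep the best chromosome found so far
        pop = children
    return best, trace

# Synthetic stand-in data (NOT the study's measurements).
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.uniform(0.0, 1.0, (40, 13))
y_demo = 0.2 + 0.1 * np.sin(3.0 * X_demo[:, 0])
best_demo, trace_demo = ga_initial_weights(X_demo, y_demo, generations=20)
print("best MSE after GA search:", trace_demo[-1])
```

In a GA-BPNN workflow of this kind, the best chromosome would then serve as the starting point for ordinary back-propagation training, so that the gradient descent begins from a better region of the weight space than a purely random initialization.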

Comparison of Models
The model accuracy indicators are shown in Table 3. It is evident that the MLR model is inferior to the BPNN and GA-BPNN models in both modeling accuracy and testing accuracy. This shows that there is a clear non-linear relationship between the selected feature bands and the soil mercury content.
[...] results are in agreement with those of previous studies [15,24], which implies that the selected bands were reliable.
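The accuracy indicators reported for the three models in the preceding sections can be collected and compared directly. The short sketch below uses only the values stated in the text; the dictionary layout and the helper name `best_by_metric` are illustrative choices.

```python
# Reported indicators as (R2, RMSE, MAE) for the modeling and testing sets.
reported = {
    "MLR":     {"modeling": (0.665, 0.076, 0.059), "testing": (0.665, 0.087, 0.063)},
    "BPNN":    {"modeling": (0.797, 0.059, 0.032), "testing": (0.826, 0.063, 0.048)},
    "GA-BPNN": {"modeling": (0.842, 0.052, 0.037), "testing": (0.923, 0.042, 0.033)},
}

def best_by_metric(split):
    """Return the best model per indicator: highest R2, lowest RMSE and MAE."""
    return {
        "R2": max(reported, key=lambda m: reported[m][split][0]),
        "RMSE": min(reported, key=lambda m: reported[m][split][1]),
        "MAE": min(reported, key=lambda m: reported[m][split][2]),
    }

for split in ("modeling", "testing"):
    print(split, best_by_metric(split))
```

On the testing set, the GA-BPNN has the best value for all three indicators. On the modeling set, it has the best R² and RMSE, while the BPNN has a slightly lower MAE (0.032 versus 0.037).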
Hyperspectral prediction models of the soil mercury content were established using MLR, BPNN, and GA-BPNN. A comparison of the results of the three methods showed that the GA-BPNN model was the best model for predicting the mercury content in soil; the modeling R² was 0.842 and the modeling RMSE was 0.052. The superiority of the GA-BPNN model was attributed to the optimization of the initial BPNN parameters (thresholds and weights) by the GA; this approach does not have the problem of low accuracy common in MLR methods.
The data in Table 4 indicate that larger errors were observed for the soil samples with high soil mercury content because few samples with high soil mercury content were used to train the MLR, BPNN and GA-BPNN models.Thus, in order to improve the prediction accuracy of the soil mercury content, more soil samples with high mercury content have to be collected to develop soil mercury models in the future.
In addition, the study was limited to individual sample points. In order to determine the soil mercury content at the regional scale, hyperspectral images should be combined with the models.

Figure 1. Study area and the distribution of the soil sampling points.

Figure 2. The spectral reflectance curves of the soil samples. (a) Smoothed spectral curves; (b) continuum removal spectral curves; (c) first-order differential spectral curves; (d) reciprocal logarithmic spectral curves.

Figure 3. The structure of the GA-BPNN.

Figure 4. Flow chart for determining the optimal soil mercury content simulation method.

Figure 5. Correlation coefficients between soil spectral indices and soil mercury content. (a) The soil mercury content and the smoothed spectral reflectance; (b) the soil mercury content and the CR spectral reflectance; (c) the soil mercury content and the FD spectral reflectance; (d) the soil mercury content and the RL spectral reflectance.

Figure 6. Measured and MLR predicted values of the soil mercury content.

Figure 7. Measured and BPNN predicted values of the soil mercury content.

Figure 8. Measured and GA-BPNN predicted values of the soil mercury content.

Table 1. Descriptive statistics of the soil mercury content.