Rapid Determination of Low Heavy Metal Concentrations in Grassland Soils around Mining Using Vis–NIR Spectroscopy: A Case Study of Inner Mongolia, China

Proximal sensing offers a novel means for determination of the heavy metal concentration in soil, facilitating low-cost and rapid analysis over large areas. In this respect, spectral data and model variables play an important role. Thus far, no attempts have been made to estimate soil heavy metal content using continuum-removal (CR), different preprocessing and statistical methods, and different modeling variables. Considering the adsorption and retention of heavy metals in spectrally active constituents in soil, this study proposes a method for determining low heavy metal concentrations in soil using spectral bands associated with soil organic matter (SOM) and visible–near-infrared (Vis–NIR). To rapidly determine the concentration of heavy metals using hyperspectral data, partial least squares regression (PLSR), principal component regression (PCR), and support vector machine regression (SVMR) statistical methods and 16 preprocessing combinations were developed and explored to determine an optimal combination. The results showed that the multiplicative scatter correction and standard normal variate preprocessing methods evaluated with the second derivative spectral transformation method could accurately determine soil Cr and Ni concentrations. The root-mean-square error (RMSE) values of Vis–NIR model combinations with PLSR, PCR, and SVMR were 0.34, 3.42, and 2.15 for Cr, and 0.07, 1.78, and 1.14 for Ni, respectively. Soil Cr and Ni showed strong spectral responses to the Vis–NIR spectral band. The R² value of the Vis–NIR-based PLSR model was higher than 0.99, and the RMSE value was 0.07–0.34, suggesting higher stability and accuracy. The results were more accurate for Ni than Cr, and PLSR showed the best performance, followed by SVMR and PCR. This perspective has critical implications for guiding quantitative biogeochemical analysis using proximal sensing data.


Introduction
Although coal mining promotes local economies, it also causes serious environmental pollution [1][2][3]. Heavy metals in coal and coal spoil can enter soil through various routes, leading to the contamination of soil around mining areas [4,5]. Soil heavy metal contamination not only increases food safety risks, but also directly threatens human health [6]. In particular, heavy metals in the human body can undergo a latent accumulation process, and when their content exceeds the maximum capacity of the human body, various diseases may arise. Heavy metal poisoning increases the likelihood of liver, kidney, stomach, and nerve tissue damage, leading to teratogenesis, carcinogenesis, and mutagenesis, in serious cases. Therefore, with increasing focus on environmental issues and ecological conservation, the real-time monitoring of soil around mining areas has become an urgent requirement.
A critical aspect of the effective prevention and control of soil heavy metal pollution is rapidly acquiring accurate information on the concentration and spatial distribution of heavy metals. However, traditional methods of monitoring and identifying soil heavy metals involve field collection and lab analysis of samples [7]. Although such methods provide highly accurate results, they are laborious, costly, and time-consuming in large-scale monitoring of soil heavy metal concentrations. Therefore, it is difficult to describe dynamic changes of pollution elements on a large scale using traditional methods because they have spatial and temporal limitations. With the advantages of rapidity, non-destructivity, and high spectral resolution, hyperspectral proximal sensing has momentous functions in quantitative soil monitoring [8][9][10]. Considering its research value and practical significance, hyperspectral proximal sensing was introduced into the rapid determination of soil heavy metal concentration around mining areas. Vis-NIR has been used to determine heavy metal concentrations in soils since 1997 [11]. The Vis-NIR reflectance of soil can provide information on the accumulation properties of heterogeneous combinations of organic matter (OM), soil moisture, particle size and distribution, iron oxide, soil mineralogy, and parent material.
The accuracy of models based on hyperspectral data for determining soil heavy metals is affected by the physicochemical properties of different soil types, differences in heavy metal content, the data preprocessing methods, the spectral resolution, the band ranges used, and the forms of transformation applied. In most instances, preprocessing can effectively reduce or eliminate multicollinearity and randomness between spectral bands, improving the accuracy and stability of the model [12]. Current approaches to improving modeling accuracy can be broadly classified as follows: (1) Band combination approaches, which draw on the comprehensive information in spectral signals and transform multiband reflectance through mathematical operations to highlight major information and minimize minor information. This approach can be applied to eliminate the effect of multicollinearity among variables, improve the effective signal-to-noise ratio (SNR), and eliminate background interference, thus enhancing useful information and suppressing interference [13,14]. (2) Spectral pretreatment approaches, which account for the fact that the response of spectral bands varies widely among soil properties. Many researchers have pretreated raw soil spectra to remove noise generated during spectral analysis and, to a certain extent, the effects of baseline and overlap, and the resulting models performed well [15,16]. All preprocessing techniques aim to reduce unmodeled variability in the data, which is necessary for enhancing spectral information [17,18].
Another important factor affecting the predictive capacity of models is band selection [19]. Soil reflectance is only loosely associated with the concentration of transition elements [20]. At low concentrations, heavy metals in soil cannot be identified directly with Vis-NIR reflectance [21,22]. Studies have demonstrated that Fe oxides, clays, and OM exhibit spectral activity in Vis-NIR spectra [23,24]. Therefore, soil spectral reflectance can reflect the concentration of heavy metals in soil according to the correlation between contaminant elements and active spectral components in soil [8,22,25]. Heavy metals and soil components, such as soil organic matter (SOM), clay minerals, and Ferromanganese (Fe-Mn) oxide, exhibit prominent adsorption characteristics, enabling the indirect prediction of heavy metal concentration from soil reflectance [26,27]. The adsorption and retention of heavy metals by spectrally active components in soil vary with the contamination elements and soil conditions. Some scholars used the adsorption relationship of SOM, clay minerals, and heavy metals in soil to indirectly establish an inversion model for heavy metals in soil [28][29][30][31]. Via simultaneous adsorption-desorption analyses of Cd, Cr, Cu, Ni, Pb, and Zn, researchers found that OM has stronger adsorption for Ni, and clays containing kaolinite have strong retention for Ni [32]. Moreover, studies investigating the behaviors of Ni and Zn in adsorption and desorption experiments have found that Ni binds to clay and SOM with relatively high intensity [33,34]. Although heavy metals with low concentrations have no spectral characteristics in the Vis-NIR region, the concentrations of non-characteristic elements in soil can be predicted by their correlations with OM, clay minerals, and iron oxides [22,35,36]. The determination of heavy metal concentration using hyperspectral proximal sensing is affected not only by the spectral band, but also by the original spectral noise. 
As a consequence, it is necessary to select specific treatment methods and modeling variables according to the spectral characteristics of the soil.
Spectroscopy is applied by establishing a mathematical relationship between spectral data and soil properties through a calibration model. Once a calibration model has been developed, it can be used to predict the chemical or physical properties of unknown samples. Different multivariate statistical methods can be used for this purpose. The most commonly used methods include multiple linear regression (MLR) [37], principal component regression (PCR) [38], partial least squares regression (PLSR) [39], artificial neural networks (ANNs) [40], support vector machine regression (SVMR) [41], and regression trees [42]. No single method is best, because each has its advantages and drawbacks. For example, PCR and PLSR handle data multicollinearity better than MLR, but they can only estimate linear relationships between spectral data and soil properties. In contrast, more recent techniques such as ANNs and SVMR can handle the nonlinear behavior of soil reflectance [23]. In particular, SVMR is based on statistical learning theory [43] and performs well when calibration models are trained with few samples. However, there is no definitive conclusion regarding the most effective and accurate method.
This study aimed to rapidly determine the concentration of heavy metals in soil using spectral bands associated with SOM and the Vis-NIR range, taking different grassland soils around two coal mining areas as the research objects. The objective was to evaluate the predictability of Cr and Ni concentrations using Vis-NIR spectroscopy by considering both the entire reflectance spectrum (350-2500 nm) and only the region related to SOM absorption (600-800 nm). To achieve this, the PLSR, PCR, and SVMR statistical modeling methods and 16 preprocessing combinations were tested to determine an optimal combination that provides accurate estimation models. The findings of this study will provide a reference for future related research.

Materials and Methods
A method using Vis-NIR and spectral bands associated with OM is proposed for determining low heavy metal concentrations in soil. The influence of different preprocessing and statistical methods on the accuracy of the determination model was investigated to identify the most suitable combination. To this end, 201 absorption spectral bands associated with SOM and 2150 Vis-NIR spectral bands were extracted as independent variables to establish estimation models for soil Cr and Ni concentrations using PLSR, PCR, and SVMR. The coefficient of determination (R²) and the RMSE were used to represent the stability and accuracy of the estimation model, respectively. Roughly three-quarters of the measured soil reflectance spectra were grouped into a calibration set, and the remaining one-quarter were used as validation samples; the calibration and validation sets comprised 27 and 10 samples, respectively. Data from 9 other sampling points in the study area were used to validate the PLSR estimation model for Cr and Ni concentrations.
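The two evaluation metrics can be illustrated with a minimal sketch. It assumes that R² is computed as 1 − SS_res/SS_tot (some software reports the squared correlation coefficient instead, and the text does not say which was used); the function names and the numbers are hypothetical and are not data from this study.

```python
import math

def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted concentrations."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def r_squared(measured, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Hypothetical concentrations in mg/kg, for illustration only.
measured = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]

print(round(rmse(measured, predicted), 4))       # 0.7071
print(round(r_squared(measured, predicted), 4))  # 0.9
```

A lower RMSE indicates a more accurate model, and an R² closer to 1 indicates a more stable one, which is how the two quantities are read in the results.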

Study Area
In this study, the Huolinhe open cast coal mine and Baiyinhua coal mine were selected as the research objects. Figure 1 presents a schematic diagram of the study area. The base map was the Landsat 8 OLI image of the study area, which was downloaded from the geospatial data cloud [44]. Study area 1 is the Huolinhe coalfield, which is located in Tongliao City, Inner Mongolia Autonomous Region. It is the largest open cast coal mine with the highest production among modern coal mines in China; it has a reserve of 13.28 Gt. The Huolinhe coalfield was the first modern open cast coal mine in Asia, with an annual production capacity of 10 Mt. The coalfield is 9 km wide and 60 km long, with a total area of 540 km². There are 9 minable coal seams, with a total thickness of 81.7 m. It stores 13.1 Gt of high-quality lignite, which is 9-fold greater than that of the Fushun Coal Mine, and 4-fold greater than that of the Datong coal mine, and has achieved an annual production capacity of 15 Mt. The geographical coordinates are 119

Sample Collection and Processing
In October 2018, soil samples were collected from grasslands around the two coal mining areas. The plum blossom point distribution method was used to arrange points around the mining area [45]. Soil samples were collected from 0 to 10 cm of the soil layer at five points in each sampling site. The location of each sampling site was recorded using a handheld Global Positioning System (GPS). Approximately 1 kg of each soil sample was collected in a clean plastic bag, sealed, and numbered; a total of 37 soil samples were collected. The samples were dried, pulverized, and sieved (100 mesh sieve). Each sample was divided into two parts, one for chemical analysis of SOM, heavy metals, and water content, and another for spectral analysis in the laboratory.
Soil pH was measured using a pH meter in 1:2.5 (mass to volume ratio) soil and deionized water suspensions. SOM was determined using the potassium dichromate method. For sample preparation, a microwave acid digestion apparatus was used, and the samples were digested with HNO3-HF-HClO4 before analysis. The metal concentration in the samples was determined through inductively coupled plasma atomic emission spectrometry (ICP-AES, Optima 2000DV) [46][47][48].

Acquisition of Indoor Spectral Data of Soil Samples
In this study, an ASD FieldSpec4 spectroradiometer was used for spectral data acquisition. The wavelength range was 350-2500 nm, the spectral resolutions were 3 nm at 700 nm, 30 nm at 1400 nm, and 30 nm at 2100 nm, and the sampling intervals were 1.4 nm at 350-1000 nm and 2 nm at 1000-2500 nm. Soil samples were measured directly with a hand-held soil probe with an embedded light source, a 50 W halogen lamp. The spectrometer was calibrated against a standard white BaSO4 panel before measurement. Each sample was placed in a dish 6 cm in diameter and 1.5 cm deep, and spectral reflectance was measured after scraping the soil surface. During measurements, the sample dish was rotated by 90° three times. Ten replicate spectral curves were collected from each soil sample, and their mean was taken as the final reflectance; the white BaSO4 panel calibration was repeated every 15 min. The spectrometer resampled the spectral data to 1 nm intervals on output [49].

Data Processing

Continuum-Removal Method
The following process was applied to the resampled data. The CR method is a spectroscopic analysis approach for removing unrelated background features and enhancing absorption characteristics of interest [50]. The CR method can normalize the spectral reflectance to 0-1 while maintaining the same background, effectively highlighting absorption valleys and reflection peaks of the spectral curve. Therefore, the resampled data were first CR processed.
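The CR step can be sketched as follows, assuming the common definition in which the continuum is the upper convex hull of the reflectance curve and the CR spectrum is the reflectance divided by that hull. The text does not state which implementation was used, so the function names and the five-band spectrum are hypothetical.

```python
def upper_hull(x, y):
    """Indices of the upper convex hull of (x[i], y[i]); x must be increasing."""
    hull = []
    for i in range(len(x)):
        # Drop the last hull point while it lies on or below the line
        # joining the point before it to the new point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return hull

def continuum_removed(x, y):
    """Divide reflectance by the hull (continuum) interpolated at each band."""
    hull = upper_hull(x, y)
    out = []
    k = 0  # index of the hull segment that contains band i
    for i in range(len(x)):
        while hull[k + 1] < i:
            k += 1
        a, b = hull[k], hull[k + 1]
        continuum = y[a] + (y[b] - y[a]) * (x[i] - x[a]) / (x[b] - x[a])
        out.append(y[i] / continuum)
    return out

# Hypothetical reflectance values around an absorption valley near 700 nm.
wavelengths = [600, 650, 700, 750, 800]
reflectance = [0.30, 0.28, 0.20, 0.30, 0.34]
cr = continuum_removed(wavelengths, reflectance)
print([round(v, 3) for v in cr])  # [1.0, 0.903, 0.625, 0.909, 1.0]
```

Points on the hull map to 1 and absorption valleys fall below 1, which is why the absorption features become more pronounced after CR.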

Spectral Data Preprocessing and Transformation
The reflectance (R) and CR spectra were first smoothed with the Savitzky-Golay (SG) filter (polynomial order: 2, window width: 9) [51]. Spectral preprocessing can be applied to remove the effects of scattering between soil samples. Spectral transformation methods can suppress noise in the spectral data, highlight spectral valleys and peaks, and enhance the response of heavy metal elements in soil spectra. After SG smoothing, the R and CR spectra were preprocessed using the normalization (NOR), multiplicative scatter correction (MSC) [52], and standard normal variate (SNV) [53] methods. Finally, the processed data were subjected to First Derivative (FD), Second Derivative (SD), and Reciprocal Logarithm (log(1/R)) spectral transformations. In this manner, 16 methods of preprocessing were evaluated, as shown in Table 1.
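The individual preprocessing steps can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation used in the study: the SG weights are the standard tabulated values for a 9-point, second-order fit, edge points are left unsmoothed, the second derivative uses simple central differences, and log(1/R) is taken to base 10. All function names are hypothetical.

```python
import math

# Tabulated Savitzky-Golay smoothing weights (9-point window, order 2),
# normalised by their sum, 231.
SG9 = [-21, 14, 39, 54, 59, 54, 39, 14, -21]

def sg_smooth(spectrum):
    """Smooth interior bands; the 4 bands at each edge are left unchanged."""
    half = len(SG9) // 2
    out = list(spectrum)
    for i in range(half, len(spectrum) - half):
        window = spectrum[i - half:i + half + 1]
        out[i] = sum(w * v for w, v in zip(SG9, window)) / 231.0
    return out

def snv(spectrum):
    """Standard normal variate: scale one spectrum by its own mean and std."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / std for v in spectrum]

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum of the set."""
    n_bands = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_bands)]
    ref_mean = sum(ref) / n_bands
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_bands
        # Least-squares fit: s = intercept + slope * ref
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected

def second_derivative(spectrum, step=1.0):
    """Central-difference second derivative (interior bands only)."""
    return [(spectrum[i - 1] - 2 * spectrum[i] + spectrum[i + 1]) / step ** 2
            for i in range(1, len(spectrum) - 1)]

def log_inverse(spectrum):
    """Reciprocal logarithm log10(1/R); reflectance must be positive."""
    return [math.log10(1.0 / v) for v in spectrum]

# Hypothetical 12-band spectra: the second is a scatter-distorted copy.
raw = [0.2 + 0.001 * i * i for i in range(12)]
distorted = [0.05 + 1.2 * v for v in raw]
msc_pair = msc([sg_smooth(raw), sg_smooth(distorted)])
msc_sd = [second_derivative(s) for s in msc_pair]
```

In this toy case MSC maps both spectra onto the same corrected curve, which is the additive and multiplicative scatter removal that the MSC-SD and SNV-SD combinations rely on.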

Extraction of Absorption Spectral Band Associated with Organic Matter
Previous studies have shown that the main components of soils, such as SOM and clay minerals, have distinct absorption characteristics, and much work has been conducted on their quantitative determination [54][55][56]. The impact of SOM was mainly reflected in the Vis-NIR wavelengths, with the greatest impact in the 600-800 nm band [57]. The raw spectral curves of soil in Figure 2a showed prominent absorption valleys at 1400 and 1900 nm, i.e., water absorption bands, which are usually considered to be related to soil water content. The absorption band was extracted based on the CR spectra, and the absorption band was more pronounced after CR (Figure 2b). The maximum absorption band and absorption width were determined according to the absorption depth, and the SOM characteristic band was extracted at a half-width interval in the absorption region to ensure that the selected spectral band had a strong absorption capacity. Therefore, the absorption spectra at 600-800 nm were considered to be associated with SOM.

Partial Least Squares Regression (PLSR)
In this study, PLSR was used for predicting heavy metal concentrations in soil. PLSR is widely applied in many fields and can be regarded as a reference method. It is a multivariate statistical regression method that integrates canonical correlation analysis, principal component analysis, and multiple linear regression analysis. The method can use all effective data to construct a model and extract the maximum information reflecting data variation; moreover, it has a good prediction function [58] and a unique advantage in handling variables with high internal correlation. Therefore, PLSR has been receiving increasing attention in the field of hyperspectral proximal sensing. This method is well established in the construction of predictive models relating spectra to crop physicochemical parameters and soil information.
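To make the PLSR idea concrete, the following is a minimal sketch of single-response PLS (the NIPALS PLS1 algorithm) in plain Python. It is not the Unscrambler implementation used in this study; the function names and the toy data are hypothetical. Each latent variable is chosen along the direction of maximum covariance between the centered spectra and the centered concentration, after which both are deflated.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; X is a list of spectra (rows), y a list of values."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    comps = []
    for _ in range(n_components):
        # Weight vector: covariance of every band with the y residual.
        w = [dot([E[i][j] for i in range(n)], f) for j in range(p)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # nothing left in y to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                         # scores
        tt = dot(t, t)
        load = [dot([E[i][j] for i in range(n)], t) / tt for j in range(p)]
        q = dot(f, t) / tt                                     # y loading
        # Deflate X and y by the part explained by this latent variable.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps

def pls1_predict(model, x):
    """Predict one sample by repeating the deflation steps on it."""
    x_mean, y_mean, comps = model
    e = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in comps:
        t = dot(e, w)
        y_hat += q * t
        e = [v - t * l for v, l in zip(e, load)]
    return y_hat

# Hypothetical toy data: 6 samples, 3 bands, exactly linear response.
X = [[1, 2, 0], [2, 1, 1], [3, 5, 2], [4, 3, 0], [5, 8, 1], [6, 4, 3]]
y = [2 * r[0] - r[1] + 0.5 * r[2] + 1.0 for r in X]
model = pls1_fit(X, y, n_components=3)
```

With as many latent variables as the rank of the centered data, the fit coincides with ordinary least squares; in practice the number of latent variables is kept small and chosen by validation, which is what makes the method usable with thousands of collinear bands and only a few dozen samples.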

Principal Component Regression (PCR)
PCR combines principal component analysis, an unsupervised pattern recognition algorithm, with linear regression. When a multiple linear regression equation is established, multicollinearity among the variables makes the coefficients of some independent variables extremely unstable. When variables are added or removed, the coefficients of the independent variables may change significantly and may even take signs inconsistent with the actual situation, leading to inconsistencies in the established regression equation. The PCR algorithm reduces the dimension of the independent variables in order to solve the multicollinearity problem among them, which can enhance relevant information about components and filter out some noise signals that cause interference [59]. The algorithm extracts the principal components containing the basic information of the samples and uses a linear transformation to project the original high-dimensional data into a lower-dimensional space. The new principal components obtained by the transformation are uncorrelated with each other, and there are significant differences between them. The larger the eigenvalue of a component, the greater the proportion of the original data that the corresponding new variable expresses.
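A minimal PCR sketch in plain Python is given here for comparison with PLSR. It assumes a power-iteration estimate of each principal component followed by regression of the concentration on the component scores; this is an illustrative stand-in rather than the software routine used in the study, and all names and data are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def leading_direction(E, n_iter=1000, tol=1e-13):
    """Unit loading vector of the first principal component of E."""
    n, p = len(E), len(E[0])
    v = max(E, key=lambda row: dot(row, row))  # start from the longest row
    norm = math.sqrt(dot(v, v))
    v = [a / norm for a in v]
    for _ in range(n_iter):
        t = [dot(row, v) for row in E]                           # E v
        new = [sum(t[i] * E[i][j] for i in range(n)) for j in range(p)]  # E^T t
        norm = math.sqrt(dot(new, new))
        new = [a / norm for a in new]
        delta = max(abs(a - b) for a, b in zip(new, v))
        v = new
        if delta < tol:
            break
    return v

def pcr_fit(X, y, n_components):
    """Principal components from X alone, then regression of y on the scores."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]
    comps = []
    for _ in range(n_components):
        if sum(dot(row, row) for row in E) < 1e-18:  # X fully explained
            break
        v = leading_direction(E)
        t = [dot(row, v) for row in E]
        q = dot(f, t) / dot(t, t)  # scores are orthogonal, so fit one at a time
        E = [[E[i][j] - t[i] * v[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((v, q))
    return x_mean, y_mean, comps

def pcr_predict(model, x):
    x_mean, y_mean, comps = model
    e = [a - m for a, m in zip(x, x_mean)]
    y_hat = y_mean
    for v, q in comps:
        t = dot(e, v)
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, v)]
    return y_hat

# Hypothetical toy data: 6 samples, 3 bands, exactly linear response.
X = [[1, 2, 0], [2, 1, 1], [3, 5, 2], [4, 3, 0], [5, 8, 1], [6, 4, 3]]
y = [2 * r[0] - r[1] + 0.5 * r[2] + 1.0 for r in X]
model = pcr_fit(X, y, n_components=3)
```

The component directions here depend only on the spectra and never on the concentration, so a component can capture a large share of spectral variance while carrying little information about the target element.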

Support Vector Machine Regression (SVMR)
SVMR is the regression counterpart of the support vector machine (SVM), which was originally formulated as a generalized linear classifier for binary classification. Whereas an SVM classifier seeks the hyperplane that maximizes the distance between the classes and the nearest sample points, SVMR treats all samples as a single class and seeks the hyperplane that minimizes the distance to the farthest sample point [60]. It improves generalization ability through the principle of structural risk minimization and is well suited to practical problems involving small samples, nonlinearity, high dimensionality, and local minima. It has become a powerful tool for dealing with traditional problems such as the "curse of dimensionality" and "overlearning" (overfitting) [61].
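Two ingredients that are commonly associated with support vector regression can be sketched briefly: the epsilon-insensitive loss, which ignores residuals inside a tube of half-width epsilon, and the radial basis function kernel, which supplies the nonlinearity. Whether the software used in this study follows exactly this formulation is an assumption; the function names, epsilon, gamma, and the numbers are hypothetical.

```python
import math

def epsilon_insensitive_loss(measured, predicted, epsilon):
    """Mean loss; residuals smaller than epsilon cost nothing."""
    total = sum(max(0.0, abs(m - p) - epsilon)
                for m, p in zip(measured, predicted))
    return total / len(measured)

def rbf_kernel(a, b, gamma):
    """Radial basis function similarity between two spectra."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical values, for illustration only.
measured = [10.0, 12.0, 14.0]
predicted = [10.05, 12.5, 13.0]
print(round(epsilon_insensitive_loss(measured, predicted, 0.1), 4))  # 0.4333
print(rbf_kernel([0.2, 0.3], [0.2, 0.3], gamma=1.0))                 # 1.0
```

Only samples whose residuals reach or exceed the tube act as support vectors, which is one reason the method can be trained on small calibration sets.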
Unscrambler X 10.4 (Unscrambler version X 10.4, CAMO, Trondheim, Norway) and Origin 2021 (for mapping and processing) were used for elemental concentration analysis and monitoring of soil heavy metal contamination.

Description of Soil Samples
The soil was alkalized meadow soil with a pH of approximately 8-8.5. Descriptive statistical analyses of the calibration/validation set (Table 2), including the calculations of mean, standard deviation (std), kurtosis, skewness, coefficient of variation (CV), maximum values, and minimum values, were performed to analyze the soil in the study area. The average values of Cr, Ni, SOM, and water content were 16.59, 5.78, 2.93, and 5.06, respectively. The concentrations of heavy metals were higher than background values in only a few instances, and all mean concentration values were below the national secondary standard values [62]. The concentration ranges of Cr and Ni were 8.02-24.12 mg·kg⁻¹ and 0.01-10.22 mg·kg⁻¹, respectively. The maximum values of Cr and Ni were 1.14- and 1.01-fold greater than their background values, respectively, indicating a certain enrichment of heavy metals in surface soil. The K-S test indicated that the soil data followed a normal distribution. The skewness values of Cr and Ni were negative at −0.24 and −0.56, respectively, indicating that high-frequency ranges occurred in areas of high concentrations. The kurtosis values of Cr and Ni were positive at 0.11 and 0.70, respectively, indicating that they were more concentrated than the normal distribution.
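The descriptive statistics listed above can be computed as in the following sketch. It assumes the sample standard deviation (n − 1) for std and CV, moment-based skewness, and excess kurtosis (0 for a normal distribution); statistical packages differ in these conventions, and the text does not state which was used. The input values are hypothetical.

```python
import math

def describe(values):
    """Mean, std, CV (%), skewness, excess kurtosis, minimum and maximum."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # central moments
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(m2 * n / (n - 1))              # sample standard deviation
    return {
        "mean": mean,
        "std": std,
        "cv_percent": 100.0 * std / mean,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
        "min": min(values),
        "max": max(values),
    }

# Hypothetical concentrations, for illustration only.
stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])
print(stats["mean"], round(stats["std"], 4), round(stats["excess_kurtosis"], 2))
# 3.0 1.5811 -1.3
```

Under these conventions, a negative skewness means the longer tail lies toward low concentrations, and a positive excess kurtosis means a sharper peak than the normal distribution, which matches the reading given for Cr and Ni.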

Estimation Model Based on R and CR Spectral Data
Taking the NOR, MSC, and SNV preprocessing methods and the FD, SD, and log(1/R) spectral transformation data as modeling variables, a heavy metal estimation model was developed using the PLSR, PCR, and SVMR methods. Figures 3 and 4 show plots of R² and RMSE for the determination of Cr and Ni concentrations for the entire dataset (37 samples) on the basis of R and CR spectra, in which the circle symbol line represents CR and the square symbol line represents R. CR can effectively enhance the spectral reflectance characteristics of different land types [63]. The stability and accuracy of the model based on CR spectra were found to be significantly higher than those of the model based on R. In general, the R² of the two elements in the CR-based model was higher than that of the R-based model, while the RMSE of the CR-based model was lower than that of the R-based model. The results showed that CR can enhance the spectral characteristics and improve the determination accuracy. Therefore, CR data were selected as the basic spectral data in this study.

Estimation Models Based on Different Preprocessing Methods
Taking the NOR, MSC, and SNV preprocessing and the FD, SD, and log(1/R) spectral transformation data of CR spectra as modeling variables, the PLSR, PCR, and SVMR methods were applied to establish a model for determining soil heavy metal concentration. Tables 3-6 show the determination results of Cr and Ni concentrations with different spectral preprocessing and spectral datasets, respectively. The results of the three spectral transformations showed that the SD transformation is more suitable for the model. Among the three preprocessing methods, the MSC and SNV groups had a significant impact on the determination ability of the model. In general, in terms of model stability, the R² values of the two elements were higher for the models based on MSC and SNV than for those based on NOR, and higher for the models based on SD than for those based on FD and log(1/R). In terms of model accuracy, the RMSE values of Cr and Ni were lower in the models based on MSC and SNV than in those based on NOR, and lower in the models based on SD than in those based on FD and log(1/R). The optimal model for Cr based on the Vis-NIR dataset and PLSR, PCR, and SVMR is the combination of MSC-SD, SNV-SD, and SNV-SD, respectively. The optimal model for Cr based on the SOM dataset and PLSR, PCR, and SVMR is the combination of SNV-SD, MSC-SD, and SNV-SD, respectively. The optimal model for Ni based on the Vis-NIR dataset and PLSR, PCR, and SVMR is the combination of SNV-SD, MSC-SD, and SNV-SD, respectively. The optimal model for Ni based on the SOM dataset and PLSR, PCR, and SVMR is the combination of MSC-SD, SNV-SD, and MSC-SD, respectively.

Estimation Model Based on Different Modeling Variables
Based on the abovementioned analysis and the spectral bands associated with SOM (600-800 nm) and Vis-NIR after CR treatment, the MSC-SD and SNV-SD preprocessing methods were applied to establish models for the determination of soil heavy metal concentrations. Table 7 shows the determination accuracies of the calibration and validation models based on spectral bands associated with SOM and Vis-NIR for Cr and Ni concentrations. The lower RMSE values of the Vis-NIR-based model indicate its higher accuracy over the SOM-based model. The model for Cr and Ni was sensitive to the Vis-NIR spectral band. The R² value of the PLSR model with Vis-NIR was stable above 0.55 (p > 0.05) and the RMSE value was between 0.38 and 1.56. The model had a strong ability to determine the concentrations of the two elements, and the model exhibited greater ability for Cr than Ni. In contrast, the accuracy of determination using the spectral bands associated with SOM was lower. As shown in Table 7, the model accuracies of the different modeling variables were balanced. Models based on the Vis-NIR spectral band were more accurate for Cr and Ni. Stable and highly accurate determination is key to the application of spectroscopy for the determination of soil heavy metal concentration.

Estimation Model Based on Different Statistical Methods
Based on the abovementioned analysis and the Vis-NIR spectral band after CR treatment, the MSC-SD and SNV-SD preprocessing methods were applied to establish models for the determination of soil heavy metal concentration. Table 7 shows the determination accuracies of the calibration and validation models based on different statistical methods for Cr and Ni concentrations.
Regarding model stability, the R² values of the PLSR-based model for Cr and Ni were higher than those of the PCR- and SVMR-based models, and SVMR showed higher values than PCR. In terms of model accuracy, PLSR, PCR, and SVMR for Cr showed RMSEC values of 0.46, 3.75, and 3.81 and RMSEV values of 1.56, 2.06, and 4.27, respectively. For Ni, PLSR, PCR, and SVMR showed RMSEC values of 0.38, 1.76, and 2.27 and RMSEV values of 1.28, 1.99, and 2.52, respectively. The lower RMSE values of the PLSR-based model indicate its higher accuracy over the PCR- and SVMR-based models. The models for Cr and Ni were sensitive to the PLSR and SVMR statistical methods. The constructed PLSR model was stable, with calibration and validation R² values (R²C and R²V) above 0.55 (p > 0.05), and highly accurate, with RMSEC and RMSEV values between 0.38 and 1.56. The model had a strong determinative ability for these elements, and the proposed approach can be used to predict the concentrations of these elements with satisfactory precision. The determinative abilities of the three statistical methods follow the order PLSR > SVMR > PCR. In addition, the PCR statistical method showed the lowest accuracy. As shown in Table 7, the model accuracies of the different statistical methods were balanced. The results showed that the models based on PLSR and SVMR were more stable for Cr and Ni concentrations.
Through the statistics obtained from the abovementioned analysis, the Vis-NIR dataset and PLSR model were validated. Furthermore, data from nine sampling points in the study area were used to validate the PLSR estimation model for Cr and Ni concentrations (as shown in Table 8). Regarding model stability, the R² values for Cr and Ni were 0.54 (p > 0.05) and 0.57 (p > 0.05), respectively. In terms of model accuracy, the RMSEP values for Cr and Ni were 2.02 and 0.02, respectively. The results showed that the PLSR model constructed using Vis-NIR spectra had good quantitative prediction ability.

Discussion
Preprocessing of soil spectral data is an essential and efficient means for improving the accuracy of hyperspectral modeling [64]. Preprocessing methods exhibit varying performances with different modeling approaches. In this study, taking the NOR, MSC, and SNV preprocessing and the FD, SD, and log(1/R) spectral transformation data of CR spectra as modeling variables, a model for determining soil heavy metal concentration was established. Among the three preprocessing methods, the MSC and SNV groups significantly affected the determination ability of the model. Ren et al. constructed PCR and PLSR prediction models of As and Fe concentrations and OM content using the Vis-NIR spectra of farmland soil in a mining area together with laboratory measurements of the pollutant concentration and the Fe and OM content. Their research showed that the prediction ability of the model could be significantly improved through MSC, SNV, and CR preprocessing [65]. Riedel et al. used 203 soil samples from the German Saxony soil monitoring program covering the period 1998-2013 to test the potential of Vis-NIR and mid-infrared (MIR) spectroscopy in the quantitative prediction of soil properties. They showed that spectroscopy can provide reliable information on soil metal content in a rapid manner, and that two preprocessing methods, MSC and SNV transformation, can improve the performance of the model [66]. Zheng et al. used the PLSR method to establish the relationship between reflectance spectra and As content in soil. Compared with other methods, they showed that MSC provides a more accurate prediction (R² = 0.711, RMSE = 1.613) [67]. Wu et al. found that baseline smoothing and MSC pretreatment of MIR spectral data significantly improve the prediction ability of the model for heavy metal content in off-site soil samples [68] by eliminating the influence of light scattering and sample thickness. The results of this study are very close to those of Ren, Riedel, Zheng, and Wu [65][66][67][68].
The prediction ability of different soil elements based on different preprocessing at different study areas was investigated. MSC and SNV transformation were found to improve the performance of the model. Light scattering effects and baseline shifts of the spectra are among the main factors affecting the spectroradiometer signal in the Vis-NIR [69]. By effectively reducing systematic errors and background noise of the whole sample, the MSC and SNV methods improve the SNR [70].
The limitations of statistical models vary among different soil types, different methods of data preprocessing, different spectral resolutions, different band ranges used, or different forms of transformations, leading to large differences in the accuracy of the same model or different best models for determination. In general, the PLSR algorithm is superior to PCR and SVMR and can monitor the concentration of heavy metals in soil with good results. Compared with the SVMR and PCR algorithms, PLSR firstly extracts principal component information of both spectral band and heavy metal concentration variable matrices and uses a constraint equation in the process of dimensionality reduction to ensure the maximum correlation between spectral band and heavy metal concentration variable component information. Although PCR also involves the extraction of principal components to reduce dimensionality, it only extracts the information of the spectral band variable matrix, without considering the information of the heavy metal concentration variable matrix and does not reduce the dimensionality of the heavy metal concentration variable matrix. Therefore, further optimization operations are required. Some scholars [71] also found that the PLSR method provides better results than the PCR method because the latent variable of PLSR contains information about the OM content. The SVMR method is a nonlinear modeling method, while the PLSR and PCR methods are linear methods. In this study, radial basis functions were mainly used for nonlinear modeling, but the results were not satisfactory in combination with the experimental data, mainly because the RMSE values were large. Choe et al. [72] monitored heavy metal pollution in river sediments in Rodalquilar, southeastern Spain; using a combination of geochemistry, ground spectral parameters, and hyperspectral remote sensing, they obtained parameters from spectral changes related to heavy metals in soil. 
Ground spectral parameters derived from the spectral absorption characteristics were found to have potential for analyzing the spatial distribution of heavy metal elements, even though the spectral characteristics of the soil were not obvious. In terms of evaluation scores, PLSR modeling is highly advantageous for making predictions. Kooistra et al. successfully predicted the composition and heavy metal content of beach soil using a PLSR model built on soil Vis-NIR spectra, and pointed out that the PLSR method is an effective approach for predicting the heavy metal content of soil using spectral methods [8].
Compared with the SVMR and PCR methods, the PLSR method uses fewer latent variables, yet the resulting model fits better, is more stable, and has stronger determinative ability, indicating that the latent variables used by the PLSR method contain more soil physicochemical information. Wang [73] used the PLSR method to compare and analyze various spectral indices, and showed that the reciprocal logarithm spectra had the best determinative ability, with the detection accuracy for Cd and Pb exceeding 0.82. McDowell et al. also found that spectral characteristic variables related to various organic components and silicate minerals were fully utilized in the PLSR modeling and determination process [74]. Malley [75] pointed out a linear relationship between the absorbance of the NIR spectrum and the concentration of substances. However, some scholars have reported different findings. Shao et al. found that the least squares support vector machine (LS-SVM) gave better determination results than PLSR when using NIR spectra to determine soil NPK [76]. It is speculated that LS-SVM uses the nonlinear information in the spectral data to improve the determination accuracy. Across the spectral datasets and statistical methods evaluated here, PLSR modeling was found to be very beneficial for predicting soil composition and heavy metal concentration. Nevertheless, no modeling method is universal, and a model that performs well in one application may not be suitable for another. Therefore, when using spectral data to determine soil properties, the optimal regression method varies across study areas, spectral ranges, and target components.
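The difference between PCR and PLSR described above can be sketched with a small synthetic example in Python. This is an illustration under stated assumptions, not a reproduction of the models in this study: the data are simulated, the names `scatter_shape`, `metal_shape`, and `one_component_r2` are hypothetical, only the first component of each method is compared, and the fit is evaluated in-sample. The first PLS weight vector is taken proportional to the product of the centered spectra (transposed) and the centered concentrations, which is the standard single-response PLS choice that maximizes the covariance between scores and concentrations, whereas the first PCR direction is the direction of largest spectral variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_bands = 500, 60
bands = np.linspace(0.0, 1.0, n_bands)

# Hypothetical band loadings: a flat baseline shift (nuisance) and a narrow
# absorption feature (signal), made orthogonal to the baseline and unit length.
scatter_shape = np.ones(n_bands) / np.sqrt(n_bands)
peak = np.exp(-((bands - 0.5) / 0.05) ** 2)
metal_shape = peak - peak.mean()
metal_shape /= np.linalg.norm(metal_shape)

# The nuisance factor varies more strongly than the signal factor.
scatter = 3.0 * rng.normal(size=n_samples)
metal = rng.normal(size=n_samples)

X = (np.outer(scatter, scatter_shape)
     + np.outer(metal, metal_shape)
     + 0.02 * rng.normal(size=(n_samples, n_bands)))
y = 2.0 * metal + 0.1 * rng.normal(size=n_samples)

# Center spectra and concentrations.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# PCR: first principal direction of the spectra alone (ignores y).
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
w_pcr = vt[0]

# PLS (single response): first weight vector proportional to Xc^T yc.
w_pls = Xc.T @ yc
w_pls /= np.linalg.norm(w_pls)


def one_component_r2(weights):
    """In-sample R^2 of regressing yc on the scores along one direction."""
    scores = Xc @ weights
    coef = (scores @ yc) / (scores @ scores)
    residual = yc - coef * scores
    return 1.0 - (residual @ residual) / (yc @ yc)


r2_pcr = one_component_r2(w_pcr)
r2_pls = one_component_r2(w_pls)
print(f"one-component PCR R^2: {r2_pcr:.3f}")
print(f"one-component PLS R^2: {r2_pls:.3f}")
```

In this setup the first principal component mostly follows the high-variance baseline shift, which carries no concentration information, while the PLS direction is pulled toward the absorption feature. With real spectra, more components and cross-validation would be needed before drawing any conclusion about which method is preferable.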
Soil heavy metals and components, such as SOM, clay minerals, and iron and manganese oxides, exhibit obvious spectral characteristics [23,24]. Heavy metals are significantly correlated with spectrally active soil constituents, such as OM, clay, and Fe [8,20]. Therefore, these constituents may play a bridging role in the determination of soil heavy metal concentrations using Vis-NIR reflectance. By selecting characteristic bands, the original spectral information can be well retained and the relationship between soil spectral characteristics, SOM, and heavy metals can be reflected more accurately. According to crystal field theory [77], transition elements with unfilled d-shells, such as Ni, Cu, and Cr, can exhibit absorption characteristics in the Vis-NIR spectral regions. Iron oxides, clay minerals, water content, and SOM are also active in the Vis-NIR spectral regions [21,22]. The results in Table 7 show that the models for Cr and Ni are sensitive to the Vis-NIR spectral band. The PLSR model based on Vis-NIR exhibited stable R² values above 0.99 and RMSE values ranging from 0.07 to 0.34, suggesting a strong determinative ability for Cr and Ni. These results confirm that the Vis-NIR technique can improve the accuracy of Cr and Ni estimation models and has strong potential for the simultaneous monitoring and estimation of different heavy metals in soils, providing an effective method for large-scale and long-term monitoring of soil heavy metal contamination. Future studies could consider other factors, such as Fe-Mn oxides, and extract multi-factor characteristic bands to construct multi-spectral transformation indices and estimation models. In the future, the SNV-SD-PLSR method can be verified and promoted through application to other study areas and data sources, from field spectra to UAV and satellite remote sensing data.
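To make the SD step of the SNV-SD-PLSR chain concrete, the Python sketch below applies a second-derivative transformation to a synthetic spectrum and shows that it is insensitive to a constant offset and a linear baseline slope. The assumptions are loud ones: the function name `second_derivative`, the 10 nm sampling grid, and the Gaussian absorption dip at 700 nm are hypothetical, and a plain central finite difference is used rather than whatever smoothing derivative filter may have been applied in this study.

```python
import numpy as np


def second_derivative(spectra, step_nm):
    """Central second difference along the wavelength axis.

    Assumes uniformly spaced bands; the first and last band are dropped.
    """
    spectra = np.asarray(spectra, dtype=float)
    return np.diff(spectra, n=2, axis=-1) / step_nm ** 2


# Hypothetical 10 nm grid and a single absorption dip centered at 700 nm.
wavelengths = np.arange(400.0, 1001.0, 10.0)
step = wavelengths[1] - wavelengths[0]
feature = -0.05 * np.exp(-((wavelengths - 700.0) / 40.0) ** 2)
clean = 0.4 + feature

# Same spectrum with an added offset and a linear baseline slope.
tilted = clean + 0.1 + 2e-4 * (wavelengths - 400.0)

sd_clean = second_derivative(clean, step)
sd_tilted = second_derivative(tilted, step)

# The offset and slope vanish under the second derivative.
print(np.allclose(sd_clean, sd_tilted))

# The absorption dip appears as a positive peak at its center wavelength.
center = wavelengths[1:-1][np.argmax(sd_clean)]
print(center)
```

Because differencing amplifies high-frequency noise, a derivative computed this way on measured spectra would normally be combined with smoothing; that choice is outside the scope of this sketch.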

Conclusions
This study evaluated three preprocessing methods (NOR, MSC, and SNV), three spectral transformations (FD, SD, and LOG), and three statistical methods (PLSR, PCR, and SVMR). This approach can enhance variable information, reduce model errors, and improve the accuracy and stability of the model. The mechanism of determining heavy metal concentration was systematically analyzed, the relationship between heavy metal concentration and soil spectra around a mining area was determined, and different preprocessing and statistical methods were compared to provide scientific support for heavy metal pollution research. The absorption band at 600-800 nm was considered to be associated with SOM. The CR data were selected as the basic spectral data, and MSC-SD and SNV-SD were found to be the best among the 16 preprocessing combinations for determining Cr and Ni concentrations. The estimation models for Cr and Ni were sensitive to the Vis-NIR spectral band. The R² value of the PLSR model built using Vis-NIR was stable above 0.99, the RMSE value was between 0.07 and 0.34, and the model had a strong ability to determine the concentration of the two elements, in the order of Ni > Cr. In contrast, the accuracy of determination using the spectral bands associated with SOM was lower. The three statistical methods ranked PLSR > SVMR > PCR, with PCR giving the lowest accuracy of determination. The estimation models based on the PLSR and SVMR statistical methods were more stable for Cr and Ni concentrations. In the future, the SNV-SD-PLSR method could be applied to other study areas and data sources, from field spectra to UAV and satellite remote sensing data, for verification and promotion.