Improved Mapping of Potentially Toxic Elements in Soil via Integration of Multiple Data Sources and Various Geostatistical Methods

: Soil pollution by potentially toxic elements (PTEs) has become a core issue around the world. Knowledge of the spatial distribution of PTEs in soil is crucial for soil remediation. Portable X-ray ﬂuorescence spectroscopy (p-XRF) provides a cost-saving alternative to the traditional laboratory analysis of soil PTEs. In this study, we collected 293 soil samples from Fuyang County in Southeast China. Subsequently, we used several geostatistical methods, such as inverse distance weighting (IDW), ordinary kriging (OK), and empirical Bayesian kriging (EBK), to estimate the spatial variability of soil PTEs measured by the laboratory and p-XRF methods. The ﬁnal maps of soil PTEs were outputted by the model averaging method, which combines multiple maps previously created by IDW, OK, and EBK, using both lab and p-XRF data. The study results revealed that the mean PTE content measured by the laboratory methods was as follows: Zn (127.43 mg kg − 1 ) > Cu (31.34 mg kg − 1 ) > Ni (20.79 mg kg − 1 ) > As (10.65 mg kg − 1 ) > Cd (0.33 mg kg − 1 ). p-XRF measurements showed a spatial prediction accuracy of soil PTEs similar to that of laboratory analysis measurements. The spatial prediction accuracy of di ﬀ erent PTEs outputted by the model averaging method was as follows: Zn (R 2 = 0.71) > Cd (R 2 = 0.68) > Ni (R 2 = 0.67) > Cu (R 2 = 0.62) > As (R 2 = 0.50). The prediction accuracy of the model averaging method for ﬁve PTEs studied herein was improved compared with that of the laboratory and p-XRF methods, which utilized individual geostatistical methods (e.g., IDW, OK, EBK). Our results proved that p-XRF was a reliable alternative to the traditional laboratory analysis methods for mapping soil PTEs. The model averaging approach improved the prediction accuracy of the soil PTE spatial distribution and reduced the time and cost of monitoring and mapping PTE

PTEs in the soil are caused by various sources, such as waste emission, smelting, mining activity, traffic exhaust, atmospheric deposition, wastewater irrigation, fertilizers, and pesticide application [33][34][35][36][37]. Regulation and remediation of soil PTE pollution is a challenging and urgent task.
Information on the soil PTE spatial patterns forms the basis for effective management and soil remediation. Accurate identification of the extent of polluted areas could contribute to the more efficient management of soils contaminated by PTEs and reduce health risks and financial costs. Interpolation methods such as inverse distance weighting (IDW) and ordinary kriging (OK) have been widely employed to explore the spatial distribution of PTEs [38][39][40][41][42][43]. Recently, a new empirical Bayesian kriging (EBK) method developed by Krivoruchko (2012) [44] has been introduced to estimate the spatial pattern of target variables in the fields of public health [45], meteorology [46], and geology [47]. EBK has the advantage of representing the stochastic spatial process locally as a stationary or non-stationary random field, where the parameters of the locally defined random field vary across space. To the best of our knowledge, no application of EBK to soil PTE mapping has been published.
All geostatistical methods mentioned above require the input of information from soil observations. Typically, many soil samples are needed to ensure estimation accuracy, especially for surveys conducted at large spatial scales. However, this effort is usually hindered by financial and time constraints because soil properties significantly vary at both spatial and temporal scales, requiring extensive surveys and ample soil sample sizes [48][49][50][51]. Therefore, a trade-off has to be made between minimizing the sampling size and maximizing the estimation accuracy.
Portable X-ray fluorescence spectroscopy (p-XRF) is a rapid, low-cost, non-destructive, and easily applicable analysis method that can measure soil PTEs and has been previously used to monitor PTE pollution in soil [52][53][54][55]. p-XRF allows for significantly reducing the cost and time for the measurement of PTEs, as it does not require chemical analysis procedures (e.g., acid digestion). These advantages of p-XRF could potentially enhance the spatial prediction of soil PTEs, with some attempts previously reported. For example, in a study conducted in a former smelting area, Kim et al. (2019) [56] proved that using p-XRF measurements as an auxiliary for COK improved the estimation of the PTE spatial distribution even with a smaller sample size. In another study, Xia et al. (2019) [57] reported that using p-XRF measurements as covariates in COK improved the PTE mapping accuracy in agricultural soils.
Scientists have produced multiple maps based on both laboratory and p-XRF data, as well as by different interpolation methods (i.e., IDW, OK), but until now, no single method has shown better performance than others for the whole range of soil PTEs [38]. However, no method has shown much better performance for different PTEs. Therefore, we suggest combining the results of different methods to obtain better performance results. Herein, we assumed that a PTE distribution map produced using different methods and data sources was complementary and, thus, more accurate. We merged different maps using the model averaging method [58]. Previous researchers have improved estimation accuracy for estimating the spatial distribution of different soil properties, such as soil pH [59] and soil carbon [60], by applying model averaging. However, no study has been reported yet on employing the model averaging method to improve the mapping of soil PTEs using different data sources and interpolation methods.
To fill these gaps, we aimed to: (1) compare the performance of different interpolation methods (i.e., IDW, OK, and EBK) to estimate the PTE spatial distribution based on the laboratory and p-XRF measurements; (2) test the feasibility of using p-XRF as an alternative to traditional laboratory analyses to map soil PTEs; and (3) evaluate the performance and gain of model averaging based on a map produced using different data sources and interpolation methods.

Study Area and Soil Sampling
This survey was conducted in Fuyang County, Hangzhou, South China. The study area covers an area of 1821.03 km 2 (latitude 29 • 44 45"-30 • 11 58.5"N and longitude 119 • 25 -120 • 19 30"E) ( Figure 1). Fuyang County is characterized by a subtropical monsoon climate, with a mean annual precipitation of approximately 1487 mm and a mean annual temperature of 17.8 • C. We collected 293 surface soil (0-20 cm) samples from cropland in the survey region using a stainless steel shovel, where each soil sample was mixed with five other soil samples within a radius of 10 m. All the soil samples were packed into polyethylene bags and brought back to the lab. The soil samples were air-dried in the laboratory. Subsequently, after stones and roots were removed from the samples, we grinded and sieved the samples to less than 2 mm before sending them for chemical analysis. Fuyang County is marked by a long history of mining activities and PTEs smelting activities, as well as important agricultural grain production regions [53]. Previous studies have revealed that mining activities, paper mills, cement factory, and metallurgic activities, as well as agricultural activities, are the main sources of PTEs in soil in Fuyang County [53,61,62].
Remote Sens. 2020, 12, x FOR PEER REVIEW  3 of 25 To fill these gaps, we aimed to: (1) compare the performance of different interpolation methods (i.e., IDW, OK, and EBK) to estimate the PTE spatial distribution based on the laboratory and p-XRF measurements; (2) test the feasibility of using p-XRF as an alternative to traditional laboratory analyses to map soil PTEs; and (3) evaluate the performance and gain of model averaging based on a map produced using different data sources and interpolation methods.

Study Area and Soil Sampling
This survey was conducted in Fuyang County, Hangzhou, South China. The study area covers an area of 1821.03 km 2 (latitude 29°44′45′′-30°11′58.5′′ N and longitude 119°25′-120°19′30′′ E) ( Figure  1). Fuyang County is characterized by a subtropical monsoon climate, with a mean annual precipitation of approximately 1487 mm and a mean annual temperature of 17.8 °C. We collected 293 surface soil (0-20 cm) samples from cropland in the survey region using a stainless steel shovel, where each soil sample was mixed with five other soil samples within a radius of 10 m. All the soil samples were packed into polyethylene bags and brought back to the lab. The soil samples were air-dried in the laboratory. Subsequently, after stones and roots were removed from the samples, we grinded and sieved the samples to less than 2 mm before sending them for chemical analysis. Fuyang County is marked by a long history of mining activities and PTEs smelting activities, as well as important agricultural grain production regions [53]. Previous studies have revealed that mining activities, paper mills, cement factory, and metallurgic activities, as well as agricultural activities, are the main sources of PTEs in soil in Fuyang County [53,61,62].

Chemical Analysis
In this study, the soil pH was measured by a pH meter using a water/soil mixture with a ratio of 2.5:1. The pre-treatments included weighing portions of 0.2 g, microwave digestion (4 mL HNO3, 2 mL HCl, and 2 mL HF) at a temperature of 120 °C for 45 min, liquid transition, and filtration [63]. Thereafter, the total concentration of all PTEs was determined by liquid chromatography-inductively coupled plasma mass spectrometry (LC-ICP-MS) (NexION 300X, PerkinElmer Inc., 940 Winter Street. Waltham, MA 02451, USA). The total concentration of PTEs was measured twice in approximately 15% of soil samples to ensure the accuracy and repeatability of the method [63].

Chemical Analysis
In this study, the soil pH was measured by a pH meter using a water/soil mixture with a ratio of 2.5:1. The pre-treatments included weighing portions of 0.2 g, microwave digestion (4 mL HNO 3 , 2 mL HCl, and 2 mL HF) at a temperature of 120 • C for 45 min, liquid transition, and filtration [63]. Thereafter, the total concentration of all PTEs was determined by liquid chromatography-inductively coupled plasma mass spectrometry (LC-ICP-MS) (NexION 300X, PerkinElmer Inc., 940 Winter Street. Waltham, MA 02451, USA). The total concentration of PTEs was measured twice in approximately 15% of soil samples to ensure the accuracy and repeatability of the method [63].

p-XRF
We used the "Soils Mode" of the portable Niton XL2 GOLDD XRF Analyzer (Thermo Fisher Scientific Inc., Billerica, MA, USA) to obtain p-XRF spectra information. All the soil samples were tested three times in the same place in the laboratory, with a scanning time of 1.5 min [63]. The Cubist algorithm was used to predict the total content of As, Cd, Cr, Cu, Ni, and Zn, based on the p-XRF spectra data. Detailed information on the processes of PTE estimation using p-XRF spectra data is provided in our previous studies [63]. The R 2 of the PTEs predicted by p-XRF spectra data is 0.50, 0.69, 0.96, 0.87, and 0.97 for As, Cd, Cu, Ni, and Zn, respectively.

IDW Method
IDW is one of the most popularly used deterministic interpolation methods in environmental and soil sciences [64,65]. IDW estimates the target variable value using the value from the nearby sampled sites. The weights assigned to the nearby sites are determined by the inverse of their distance from the estimated location. The IDW formula used in this study is as follows: where Z(x 0 ) expresses the estimated value of the interpolated site; N means the number of nearby samples used for interpolation. In this study, when we conducted the IDW interpolation, neighbor samples within the distance of 14.5 km were used for the estimation of PTEs in unvisited locations and the number of N varied between 10 and 15; x i represents the value of ith samples; and h ij is the distance between the estimated point and sampled locations.

OK Method
OK is another popular method for exploring spatial patterns of environmental variables [66][67][68][69][70]. The experimental semivariogram was used to reveal the spatial dependence of the variables. The corresponding formula is as follows: where γ * (h) represents the semi-variance; N(h) represents the pairs of sample numbers separated by distance h; Z(x i ) means the observed value at location i; and Z(x i + h) expresses the observed value at location i + h. Based on the experimental semivariogram, a model was then fitted to employ interpolation. Detailed information on the OK method was reported by Webster and Oliver (2007) [66].

EBK Method
EBK was developed by Krivoruchko (2012) [44] and aimed to resolve the drawbacks of classical geostatistical models. In EBK, the stochastic spatial process is represented locally as a stationary or non-stationary random field, and the corresponding parameters vary across space. EBK comprises two geostatistical models: the intrinsic random function kriging and the linear mixed model. The result of EBK is a robust non-stationary algorithm for spatial interpolation of geophysical corrections. When the data coverage is sufficient, the EBK algorithm helps extend local trends. When the data coverage Remote Sens. 2020, 12, 3775 5 of 26 is poor, EBK integrates an a priori background mean. EBK differs from other kriging methods by including the error introduced by estimating the underlying semivariogram [71]. Detailed information on EBK can be found in Krivoruchko (2012) [44].

Model Averaging Using the Granger-Ramanathan Algorithm
The Granger-Ramanathan (GR) method was developed by Granger and Ramanathan in 1984 [72]. This method assumes that better predictions are produced by a linear combination of different individual predictions with the weight of each prediction determined by the ordinary least squares (OLS) algorithm. In this study, we fitted a linear regression model between the measured PTE content of the validation dataset and the PTEs predicted by the map obtained using different methods and data sources. The PTE GR outcomes produced using the GR method were obtained as follows: where α i represents the regression coefficient; PTE i is the PTE prediction of the i-th PTE map (n = 3 in this study); and β is the intercept. The values of α and β are calculated using the OLS method. The sum of weights α i does not need to be equal to 1.

Assessment of Model Performance
We divided the collected soil samples into a training dataset (n = 195) and an independent validation dataset (n = 98) with a ratio of 2:1. The performance of the map produced by the single method or model averaging was assessed through independent validation. Four indices, including the coefficient of determination (R 2 ), root mean square error (RMSE), relative root mean square error (RRMSE), and bias, were estimated to evaluate the accuracy of different maps.

Data Analysis
The summary statistics were conducted using R Studio [73]. The variograms of all PTEs were fitted by GS+ software (version 9.0; Gamma Software Design, 2008). Then, the parameters of variograms were recorded and used for ordinary interpolation performed using The Granger-Ramanathan (GR) method was developed by Granger and Ramanathan in 1984 [72]. This method assumes that better predictions are produced by a linear combination of different individual predictions with the weight of each prediction determined by the ordinary least squares (OLS) algorithm. In this study, we fitted a linear regression model between the measured PTE content of the validation dataset and the PTEs predicted by the map obtained using different methods and data sources. The PTEGR outcomes produced using the GR method were obtained as follows: where represents the regression coefficient; is the PTE prediction of the -th PTE map (n = 3 in this study); and is the intercept. The values of and are calculated using the OLS method. The sum of weights does not need to be equal to 1.

Assessment of Model Performance
We divided the collected soil samples into a training dataset (n = 195) and an independent validation dataset (n = 98) with a ratio of 2:1. The performance of the map produced by the single method or model averaging was assessed through independent validation. Four indices, including the coefficient of determination (R 2 ), root mean square error (RMSE), relative root mean square error (RRMSE), and bias, were estimated to evaluate the accuracy of different maps.

Data Analysis
The summary statistics were conducted using R Studio [73]. The variograms of all PTEs were fitted by GS+ software (version 9.0; Gamma Software Design, 2008). Then, the parameters of variograms were recorded and used for ordinary interpolation performed using

Summary Statistics
In Table 1, we summarize the descriptive statistics of the As, Cd, Cu, Ni, and Zn contents measured by laboratory analysis (LC-ICP-MS) and p-XRF. The mean content of Cd was higher than the risk screening value for soils (0.3 mg kg −1 ) provided in National Standards of China (GB15618−2018), while the mean content of other PTEs was lower than the corresponding risk screening value. This indicates that soil in the study area underwent Cd pollution. Generally, the PTEs measurements by p-XRF were consistent with those by LC-ICP-MS. However, the coefficient of variation (CV) of the PTE content determined by p-XRF was lower than that of LC-ICP-MS.

Mapping of PTEs Using Data from Individual Sources
We fitted the semivariogram of different PTEs to analyze the spatial structure of different PTEs. The nugget/sill ratio defined by Cambardella et al. (1994) [74] was calculated to analyze the spatial dependence of different PTEs. As presented in Figure 3 and Table 2, the nugget/sill ratio values were 47.27%, 7.60%, 20.85%, 27.16%, and 18.27% for As, Cd, Cu, Ni, and Zn measured by the LC-ICP-MS method, respectively. Meanwhile, the nugget/sill ratio values were 49.98%, 21.05%, 20.90%, 37.48%, and 18.53% for As, Cd, Cu, Ni, and Zn measured by the p-XRF method, respectively. These values indicate strong spatial dependence for Cd, Cu, and Zn and moderate spatial dependence for As and Ni. With regard to the range, Cu has the largest spatial dependence range. Considering the background information of the study area and parameters of the semivariogram, the spatial dependence of Cu and Ni in Fuyang County is mainly controlled by intrinsic variation in soil characteristics such as texture and mineralogy, while for As, Cd, and Zn, it is controlled by both intrinsic variation and extrinsic variations such as industrial activities and mining activities, as well as agricultural actions [74].   The map of the content of As, Cd, Cu, Ni, and Zn determined by LC-ICP-MS and p-XRF was outputted using the IDW, OK, and EBK methods (Figures 4-8). The results showed that the As values were lower than the risk screening values regulated by the Chinese National Standards (GB15618-2018) in most of the study area ( Figure 4). The elevated As values in the soil were predominantly detected in the northwestern and central parts of Fuyang County. The low As values were concentrated in the southeast of Fuyang County. Many mines are located in the northwest of the study area, whereas its central part is represented by the flood plain-an important agricultural production area. Long-term application of chemical fertilizers and active mining may have contributed to a relatively high As content in the soil in the northwestern and central parts of the survey region. Although the spatial pattern of As content was well illustrated in Figure 4b,d,f compared with Figure 4a,c,e, the As content would be overestimated in areas with a relatively low value of As, while underestimated in regions with a relatively high value of As in Figure 4b,d,f because the As measured by p-XRF has a narrower range compared with LC-ICP-MS with a larger minimum and smaller maximum of As content ( Table 1).
The content of Cd showed a spatial pattern similar to As ( Figure 5). Some regions revealed pollution by Cd, where the Cd content in the soil was higher than the risk screening value regulated by the Chinese National Standards (GB15618-2018). High Cd values in the soil were primarily located in the northwestern, north, and central parts of Fuyang County. We attribute Cd pollution to the long-term application of chemical fertilizers, mining activity, and industrial activity in these regions, especially considering that agricultural and mining activities have been previously identified as important sources of Cd and As in Fuyang County [75]. The relative smaller spatial dependence range of Cd measured by p-XRF could explain the appearance of artificial lines apparently crossing the  The map of the content of As, Cd, Cu, Ni, and Zn determined by LC-ICP-MS and p-XRF was outputted using the IDW, OK, and EBK methods (Figures 4-8). The results showed that the As values were lower than the risk screening values regulated by the Chinese National Standards (GB15618-2018) in most of the study area ( Figure 4). The elevated As values in the soil were predominantly detected in the northwestern and central parts of Fuyang County. The low As values were concentrated in the southeast of Fuyang County. Many mines are located in the northwest of the study area, whereas its central part is represented by the flood plain-an important agricultural production area. Long-term application of chemical fertilizers and active mining may have contributed to a relatively high As Remote Sens. 2020, 12, 3775 9 of 26 content in the soil in the northwestern and central parts of the survey region. Although the spatial pattern of As content was well illustrated in Figure 4b,d,f compared with Figure 4a,c,e, the As content would be overestimated in areas with a relatively low value of As, while underestimated in regions with a relatively high value of As in Figure 4b,d,f because the As measured by p-XRF has a narrower range compared with LC-ICP-MS with a larger minimum and smaller maximum of As content (Table 1)   The results showed that the Cu content in most areas of the survey region was lower than the risk screening value regulated by the Chinese National Standards (GB15618-2018) ( Figure 6). The elevated Cu content was only found in the southeast of Fuyang County, downstream of the Fuchun River. We argue that Cu contained in chemical fertilizers and pesticides applied in other regions of the study area could have been transported by the Fuchun River and then accumulated in the soil of this region by irrigation, which could also be one of the main sources of As and Cd accumulated in soil in the southeast of Fuyang County (Figures 4 and 5). The Ni content in most parts of the survey region was lower than the risk screening value regulated by the Chinese National Standards (GB15618-2018) (Figure 7). A high Ni value was located only in the northwest and north of Fuyang County. Waste discharge from mining activity was likely the primary source of Ni accumulated in the soil of this area. The Zn content in most parts of the survey region was lower than the risk screening value regulated by the Chinese National Standards (GB15618-2018) (Figure 8). High values of Zn were mainly found in the north, northwest, and southeast of Fuyang County. Zn in these regions was likely mainly sourced from mining activities. As presented in Figures 4-8, the maps of the PTEs generated by different interpolation methods and measurement methods showed very similar spatial patterns. However, the maps generated by the IDW method revealed more local variations of the PTEs, while the maps produced by OK provided the smoothest spatial patterns of PTEs. The local high values of PTEs in the center of the study area are mainly rooted in industrial activities, waste emission of factories, and mining activities with a small scale.
It is worth noting that there are some artificial lines in Figures 4d, 6d, and 7d which were produced by the OK method, and that dramatic changes of PTE content occur in the areas located on the different sides of these artificial lines. Adjusting the parameters such as the search window and number of neighbor samples may resolve this issue and could obtain smoother maps of soil PTEs. In addition, improving the sample size is also expected to avoid the appearance of these artificial lines. The content of Cd showed a spatial pattern similar to As ( Figure 5). Some regions revealed pollution by Cd, where the Cd content in the soil was higher than the risk screening value regulated by the Chinese National Standards (GB15618-2018). High Cd values in the soil were primarily located in the northwestern, north, and central parts of Fuyang County. We attribute Cd pollution to the long-term application of chemical fertilizers, mining activity, and industrial activity in these regions, especially considering that agricultural and mining activities have been previously identified as important sources of Cd and As in Fuyang County [75]. The relative smaller spatial dependence range of Cd measured by p-XRF could explain the appearance of artificial lines apparently crossing the study area presented in Figure 5b,d,f.
The results showed that the Cu content in most areas of the survey region was lower than the risk screening value regulated by the Chinese National Standards (GB15618-2018) ( Figure 6). The elevated Cu content was only found in the southeast of Fuyang County, downstream of the Fuchun River. We argue that Cu contained in chemical fertilizers and pesticides applied in other regions of the study area could have been transported by the Fuchun River and then accumulated in the soil of this region by irrigation, which could also be one of the main sources of As and Cd accumulated in soil in the southeast of Fuyang County (Figures 4 and 5).
The Ni content in most parts of the survey region was lower than the risk screening value regulated by the Chinese National Standards (GB15618-2018) (Figure 7). A high Ni value was located only in the northwest and north of Fuyang County. Waste discharge from mining activity was likely the primary source of Ni accumulated in the soil of this area.
The Zn content in most parts of the survey region was lower than the risk screening value regulated by the Chinese National Standards (GB15618-2018) (Figure 8). High values of Zn were mainly found in the north, northwest, and southeast of Fuyang County. Zn in these regions was likely mainly sourced from mining activities.
As presented in Figures 4-8, the maps of the PTEs generated by different interpolation methods and measurement methods showed very similar spatial patterns. However, the maps generated by the IDW method revealed more local variations of the PTEs, while the maps produced by OK provided the smoothest spatial patterns of PTEs. The local high values of PTEs in the center of the study area are mainly rooted in industrial activities, waste emission of factories, and mining activities with a small scale.
It is worth noting that there are some artificial lines in Figures 4d, 6d and 7d which were produced by the OK method, and that dramatic changes of PTE content occur in the areas located on the different sides of these artificial lines. Adjusting the parameters such as the search window and number of neighbor samples may resolve this issue and could obtain smoother maps of soil PTEs. In addition, improving the sample size is also expected to avoid the appearance of these artificial lines.

Spatial Modeling of PTEs Using Model Averaging
In this study, model averaging was performed to produce the final PTE map using the six maps constructed by a combination of LC-ICP-MS measurements with IDW, OK, and EBK and p-XRF measurements with IDW, OK, and EBK. As presented in Figure 9, in general, the maps produced by model averaging method show similar spatial patterns to the maps produced by IDW, OK and EBK.

Spatial Modeling of PTEs Using Model Averaging
In this study, model averaging was performed to produce the final PTE map using the six maps constructed by a combination of LC-ICP-MS measurements with IDW, OK, and EBK and p-XRF measurements with IDW, OK, and EBK. As presented in Figure 9, in general, the maps produced by model averaging method show similar spatial patterns to the maps produced by IDW, OK and EBK.

Comparison of Performance of Different Mapping Methods
In this study, we validated all the models by comparing their estimations with lab measurements (LC-ICP-MS measurements) using the independent validation dataset ( Table 3, Figures S1-S4). We estimated the accuracy of the measuring and interpolation methods using their different combinations ( Table 3). The absolute values of the RMSE of the different interpolation methods follow a decreasing order: Zn > Cu > Ni > As > Cd (Table 3). With regard to the RRMSE, the interpolation models clearly have a lower RRMSE value for Ni and Zn compared with As, Cd, and

Comparison of Performance of Different Mapping Methods
In this study, we validated all the models by comparing their estimations with lab measurements (LC-ICP-MS measurements) using the independent validation dataset ( Table 3, Figures S1-S4). We estimated the accuracy of the measuring and interpolation methods using their different combinations ( Table 3). The absolute values of the RMSE of the different interpolation methods follow a decreasing order: Zn > Cu > Ni > As > Cd (Table 3). With regard to the RRMSE, the interpolation models clearly have a lower RRMSE value for Ni and Zn compared with As, Cd, and Cu. In addition, the LC-ICP-MS spatial estimation outperformed p-XRF for As, Ni, and Zn with a smaller value of the RMSE and RRMSE (Table 3) [76,77]. Note: a. the RRMSE means relative root mean square error. It is equal to the ratio of the mean of the square root of residuals squared with the mean content of PTEs [76,77].
As listed in Table 3, generally for As, Cu, Ni, and Zn, all the methods overestimated PTEs content when mapping the spatial distribution of PTEs with a positive value of bias. The results presented in Figures S1-S4 indicate a general trend of overestimation of low PTEs concentration and underestimation of high levels of PTEs concentration when mapping all the elements using different methods. In terms of Cd, the results produced by different methods using LC-ICP-MS measurements slightly underestimated the value of Cd, while using p-XRF measurements slightly overestimated the value of Cd.
Overall, our results revealed that the accuracy of the LC-ICP-MS spatial estimation was higher than that of p-XRF for all PTEs, as also reported by others related studies [78,79]. However, the accuracy of the p-XRF spatial prediction of Cd, Cu, Zn, and Ni contents was very similar to that of the laboratory measurements. The best spatial prediction of different PTEs was obtained from various combinations of measuring methods and algorithms, highlighting the necessity of using model averaging to improve the prediction accuracy even further.
As for As in the soil, the most accurate prediction results were produced by a combination of the LC-ICP-MS measurements and the EBK algorithm, with an R 2 of 0.44, RMSE of 4.37 mg kg −1 , RRMSE of 35.10%, and a bias of 0.34 mg kg −1 . The best prediction for Cd was obtained by the combination of the LC-ICP-MS measurements and the IDW method, with an R 2 of 0.65, RMSE of 0.18 mg kg −1 , RRMSE of 43.90%, and a bias of −0.01 mg kg −1 . With regard to Cu, the most accurate estimation was acquired by the combination of the LC-ICP-MS measurements and the OK method, with an R 2 of 0.60, RMSE of 13.53 mg kg −1 , RRMSE of 43.17%, and a bias of 1.00 mg kg −1 . For Ni, the combination of the LC-ICP-MS measurements and the EBK method provided the best prediction results, with an R 2 of 0.64, RMSE of 5.32 mg kg −1 , RRMSE of 25.59%, and a bias of 0.53 mg kg −1 . As for Zn, using the LC-ICP-MS measurements and the IDW method produced the best estimation accuracy with an R 2 of 0.67, RMSE of 35.39 mg kg −1 , RRMSE of 27.77%, and a bias of 10.51 mg kg −1 .

PTE Mapping Using Model Averaging
Herein, we applied the GR algorithm to map the spatial distribution of PTEs based on the individual maps. These maps were produced by the IDW, OK, and EBK methods using the LC-ICP-MS and p-XRF measurements. Then, we compared the results produced by model averaging with optimized results produced by individual interpolation methods and measurement methods.
As shown in Table 4, the PTEs map produced by model averaging showed minor improvement compared with the map with the highest accuracy produced based on individual data sources (i.e., LC-ICP-MS or p-XRF measurements) using the IDW, OK, or EBK algorithms. Compared with the IDW, OK, and EBK algorithms, the maps of all PTEs produced by model averaging had higher accuracy and were characterized by a higher R 2 , and a lower value of the RMSE and RRMSE. Especially, compared with individual interpolation methods and measurement methods, model averaging essentially reduced the bias value of the prediction results. Thus, model averaging can be used to improve the estimation of the spatial distribution of soil PTEs, especially of As.

Spatial Prediction Accuracy Using Different Interpolation Methods
In this study, the prediction accuracy of different interpolation methods for estimating the PTE content measured by the LC-ICP-MS and p-XRF methods is summarized in Tables 3 and 4. The interpolation methods used herein had a good performance for all PTEs, except for As. The bad performance of the interpolation method for As may mainly be attributed to two sources. On the one hand, as we have described in Section 2.3, the content of soil As measured by p-XRF clearly has lower accuracy compared with others elements, which then also leads to lower accuracy for the interpolation results. On the other hand, as presented in Figure 10, the histograms of the content of As measured by both LC-ICP-MS and p-XRF clearly deviated from a normal distribution even when a log10() transformation was employed. However, a normal distribution of data was usually expected to employ kriging interpolation. This could also explain the relative bad performance of the interpolation results for As.
Remote Sens. 2020, 12, x FOR PEER REVIEW 17 of 25 As measured by both LC-ICP-MS and p-XRF clearly deviated from a normal distribution even when a log10() transformation was employed. However, a normal distribution of data was usually expected to employ kriging interpolation. This could also explain the relative bad performance of the interpolation results for As. EBK provides the highest spatial prediction accuracy for As distribution, IDW provides the highest spatial prediction accuracy for Cd distribution, OK provides the highest spatial prediction accuracy for Cu distribution, IDW provides the highest spatial prediction accuracy for Ni distribution, and ENK provides the highest spatial prediction accuracy for Zn distribution. The EBK method did not perform better than IDW and OK, according to the study results. These findings confirmed that spatial interpolation algorithms are data-specific and variable-specific [64], and there exists no interpolation method that always shows the best performance for all the variables. However, different methods used herein did not output significantly different estimations of PTEs content in the study area. We attribute this fact to the relatively high sampling density in this study. As stated elsewhere, when the sampling density is high, most interpolation methods tend to output similar results [80]. Dirks et al. (1998) [81] found that when using a high-density dataset, kriging methods did not show any major improvement in prediction accuracy compared with simple methods (e.g., IDW, nearest neighbors).
With regard to the PTE spatial distribution, different interpolation methods produced very similar results. However, local variations in PTEs became apparent in the map produced by the IDW algorithm. The map produced by the OK method smoothed out most of the local details. These findings were consistent with those of previous studies [82][83][84]. In the PTE estimation results produced by the EBK algorithm, we detected a more dramatic PTE variation in the study area. This is because, unlike in OK, in EBK, the stochastic spatial process is represented locally as a stationary or non-stationary random field, so the prediction varies more strongly [71,85]. In addition, the sample size is another important factor to affect the interpolation accuracy [86][87][88]. In this study, we collected a total of 293 soil samples. Although this sample size has proved to be able to capture the spatial variation of soil heavy metals at the county level [89][90][91][92][93], it is still far from an extensive survey and more soil samples are expected to improve the interpolation accuracy. Therefore, in further study, EBK provides the highest spatial prediction accuracy for As distribution, IDW provides the highest spatial prediction accuracy for Cd distribution, OK provides the highest spatial prediction accuracy for Cu distribution, IDW provides the highest spatial prediction accuracy for Ni distribution, and ENK provides the highest spatial prediction accuracy for Zn distribution. The EBK method did not perform better than IDW and OK, according to the study results. These findings confirmed that spatial interpolation algorithms are data-specific and variable-specific [64], and there exists no interpolation method that always shows the best performance for all the variables. However, different methods used herein did not output significantly different estimations of PTEs content in the study area. We attribute this fact to the relatively high sampling density in this study. As stated elsewhere, when the sampling density is high, most interpolation methods tend to output similar results [80]. Dirks et al. (1998) [81] found that when using a high-density dataset, kriging methods did not show any major improvement in prediction accuracy compared with simple methods (e.g., IDW, nearest neighbors).
With regard to the PTE spatial distribution, different interpolation methods produced very similar results. However, local variations in PTEs became apparent in the map produced by the IDW algorithm. The map produced by the OK method smoothed out most of the local details. These findings were consistent with those of previous studies [82][83][84]. In the PTE estimation results produced by the EBK algorithm, we detected a more dramatic PTE variation in the study area. This is because, unlike in OK, in EBK, the stochastic spatial process is represented locally as a stationary or non-stationary random field, so the prediction varies more strongly [71,85]. In addition, the sample size is another important factor to affect the interpolation accuracy [86][87][88]. In this study, we collected a total of 293 soil samples. Although this sample size has proved to be able to capture the spatial variation of soil heavy metals at the county level [89][90][91][92][93], it is still far from an extensive survey and more soil samples are expected to improve the interpolation accuracy. Therefore, in further study, more soil samples need to be collected to analyze the effects of the sampling density on the interpolation accuracy.

p-XRF Measurements as an Alternative Means for PTE Mapping
The summary statistics extracted from the p-XRF measurements of As, Cd, Cu, Ni, and Zn in the soil were only slightly worse compared to the laboratory analysis (LC-ICP-MS) results, proving the potential of p-XRF to be used as an alternative method for measuring As, Cd, Cu, Ni, and Zn contents in soil, which could reduce the time and economy cost for the measurement of PTEs content in the laboratory. With regard to the spatial prediction, the prediction accuracy produced by different methods using p-XRF was close to that of the traditional laboratory analysis, except for As. The application of the p-XRF method reduces the time, labor, and financial costs associated with soil sampling, soil sample pre-processing, and laboratory analysis [94]. As such, it enables us to collect and analyze more soil samples with a higher sampling density, which is crucial for conducting soil surveys at large spatial scales. Another potential application of p-XRF is to help understand the general status of soil PTE pollution and provide prior knowledge for designing more efficient soil sampling schemes.

Potential of Model Averaging for PTE Mapping
Our study results demonstrated the superiority of using model averaging (GR) to improve the mapping of all soil PTEs (Figure 8, Table 3). As shown in Table 4, using the model averaging method reduced the RRMSE value of all PTEs. Especially, the model averaging method obviously reduced the bias value which confirms the potential of model averaging for improved mapping of soil PTES. The degree of improvement varied for different elements. The most significant improvement when employing model averaging was shown for As, where the R 2 of the prediction of model averaging was significantly improved, while the RMSE, RRMSE, and bias decreased compared with other interpolation methods. In addition, the prediction results of model averaging retained more local variations in soil PTEs compared with the OK method, while some of the extremely abrupt variations in soil PTEs were smoothed out compared with EBK, making the results seem closer to the actual situation. In this study, we only tested the feasibility of GR for mapping soil PTEs using model averaging. In our future work, some other model averaging approaches, such as variance weighted, Bayesian model averaging, piecewise linear decision tree, and weighted model averaging, will be tested to map soil PTEs with larger expected improvement [95][96][97].

Method Limitations and Potential Improvements
Herein, we proved the potential of p-XRF to serve as an alternative method for laboratory analysis and the potential of model averaging to map soil PTEs. However, some limitations of the application of p-XRF and model averaging still need to be overcome.
First, PTEs from the obtained soil samples were measured using p-XRF after soil sample pre-processing (i.e., we air-dried the samples in the laboratory, removed stones and roots, and, finally, grounded and sieved the samples to be less than 2 mm in size). There are still some limitations to realize in situ and on-the-go measurements of soil PTEs because the soil spectra are extremely susceptible to interference from the external environment (e.g., soil moisture and particle size) and reduce the prediction accuracy [98,99]. Therefore, p-XRF should have a better possibility of application if we could reduce the negative effects of the environmental factors and then realize in situ and even on-the-go measurements [100]. Advanced pre-processing algorithms, such as external parameter orthogonalization and direct standardization, can be used to deal with this issue [101,102].
Second, as reported by Chen et al. (2020) [96], the sample size substantially affects the accuracy of model averaging methods. Therefore, it is necessary to compare the accuracy of spatial interpolation algorithms with that of the model averaging method using datasets with different sizes or sampling densities. This could help achieve a better trade-off between prediction accuracy and cost.
After that, the sampling design and spatial distribution of the samples also have essential effects on the prediction results. The sampling design could affect the reliability of the variogram [64,103]. The sampling design also has a significant influence on the prediction accuracy with irregular designs preferred to regular ones [88]. Sample spacing must relate to the scale or scales of the variation of the primary variable in a region, otherwise samples might be too sparsely spaced to identify a correlation and could result in a pure nugget [66]. Spatial clustering of samples affects the accuracy of the estimations, but the effects vary among different interpolation methods [64,66]. Data collected at a suitable range of separations to capture changes in the scales of the variation deserve higher prediction accuracy [104]. Therefore, more attention should be paid to the soil sampling design and spatial distribution of soil samples in the future.
Finally, the performance of the model averaging method may vary among different survey regions and soil properties. Thus, extended applications are needed to confirm the feasibility of the model averaging approach for mapping soil and other PTEs in different areas and at variable spatial scales. Moreover, other types of model averaging methods, such as variance weighted, Bayesian model averaging, piecewise linear decision tree, and weighted model averaging, could also be tested to improve the accuracy of PTE mapping.

Conclusions
This study is the first to compare the mapping of soil PTEs using individual geostatistical methods (i.e., IDW, OK, and EBK) and a model averaging approach based on the laboratory and p-XRF measurements. The feasibility of p-XRF as an alternative method of laboratory analysis for mapping soil PTEs was confirmed. The main study conclusions are as follows: 1.
The average content of all PTEs was lower than the corresponding risk screening value regulated by the Chinese National Standards (GB15618-2018), except for Cd.

2.
The EBK method showed no clear advantage over IDW and OK, according to the study results. Moreover, no method regarded herein consistently outperformed other methods when interpolating different PTEs.

3.
The p-XRF spatial prediction accuracy of Cd, Cu, Ni, and Zn was similar to that of the laboratory measurements, confirming that p-XRF can be used as a reliable alternative to traditional laboratory analysis for mapping Cd, Cu, Ni, and Zn in the soil. 4.
The model averaging method improved the mapping accuracy of all soil PTEs regarded herein compared with individual geostatistical algorithms (i.e., IDW, OK, and EBK). The improvement in mapping accuracy was the clearest for As.
Our study proved that p-XRF and model averaging provide a cost-effective, reliable, and highly accurate alternative to map soil PTEs, which could essentially reduce the time and economy cost for soil sampling and monitoring. This advantage is more obvious when soil PTEs pollution monitoring and remediation are carried out at a large spatial scale. This study provides an implication for more efficient monitoring and remediation of soil PTE pollution.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/12/22/3775/s1, Figure S1: Scatter plot of spatial prediction values using IDW Vs PTEs content measured by LC-ICP-MS method, Figure S2: Scatter plot of spatial prediction values using OK Vs PTEs content measured by LC-ICP-MS method, Figure S3: Scatter plot of spatial prediction values using EBK Vs PTEs content measured by LC-ICP-MS method, Figure S4: Scatter plot of spatial prediction values using model averaging Vs PTEs content measured by LC-ICP-MS method.
Author Contributions: F.X.: conceptualization, methodology, software, writing-review and editing, resources. B.H.: conceptualization, methodology, data curation, writing-review and editing. Y.Z.: supervision, resources. W.J.: writing-review and editing, software. S.C.: writing-review and editing. D.X.: data curation. Z.S.: conceptualization, supervision, writing-review and editing, resources. All authors have read and agreed to the published version of the manuscript.