The Optimization Strategy of the Existing Urban Green Space Soil Monitoring System in Shanghai, China

High concentrations of potentially toxic elements (PTE) create global environmental stress due to the crucial threat of their impacts on the environment and human health. Therefore, determining the concentration levels of PTE and improving their prediction accuracy by sampling optimization strategy is necessary for making sustainable environmental decisions. The concentrations of five PTEs (Pb, Cd, Cr, Cu, and Zn) were compared with reference values for Shanghai and China. The prediction of PTE in soil was undertaken using a geostatistical and spatial simulated annealing algorithm. Compared to Shanghai’s background values, the five PTE mean concentrations are much higher, except for Cd and Cr. However, all measured values exceeded the reference values for China. Pb, Cu, and Zn levels were 1.45, 1.20, and 1.56 times the background value of Shanghai, respectively, and 1.57, 1.66, 1.91 times the background values in China, respectively. The optimization approach resulted in an increased prediction accuracy (22.4% higher) for non-sampled locations compared to the initial sampling design. The higher concentration of PTE compared to background values indicates a soil pollution issue in the study area. The optimization approach allows a soil pollution map to be generated without deleting or adding additional monitoring points. This approach is also crucial for filling the sampling strategy gap.


Introduction
The quality of the urban ecosystem depends on the green space soil quality. Soil quality refers to the soil's ability to ensure biological productivity, maintain environmental quality, and promote organism health functions within the limit of ecosystems [1]. Rapid urbanization, industrialization [2] and greenery development will affect the soil quality in urban areas [3]. Therefore, urban areas become the sources of various pollutant elements that can be accumulated for an extended period of time in the soil [4][5][6]. Studies on the concentrations of potentially toxic elements (PTE) in urban soils, previously known as heavy metals, started in the 1960s and identified massive heavy metals sources of urban soil pollution [4,5]. The origins of PTE in urban soils are natural and anthropogenic. The pedogenesis processes are considered the natural source of PTE in the soil [7]. The anthropogenic factors are the crucial sources of PTE in soils and predominantly result from urban development and urbanization [8], the distribution of vehicles and the types of fuels [9], emission from industries and transportation [10], smelting, manufacturing, mining, and coal-burning [11]. Based on these factors, urban soils are enriched with a high level of PTE compared to threshold values [12][13][14].
Numerous studies about PTE in urban soils have been conducted in many cities around the world, including Glasgow [15], London [16], Hong Kong [17], New Orleans [18], of the most extensive coastal cities in eastern China, which plays a crucial role in its main economic, financial, trade, and shipping center, with the most important industrial centers in China. The town covers about 6340.5 km 2, of which 6218.65 km 2 is the land, and the rest is water, and it covers 0.06% of China's total territory [46]. The soil types mainly include paddy soil, fluvial-aquic soil, and coastal saline soil [46]. The entire green spaces in 2015 were about 3593.5 km 2 [47]. The city has characterized the subtropical monsoon climate, with an annual mean temperature of 16 °C and yearly average precipitation is approximately 1200 mm.

Soil Sampling and Chemical Analysis
A total of 460 surface soil (0-20 cm) samples were collected from different green spaces in 2018. The locations were recorded using a global positioning system (GPS) and displayed in Figure 1. Five random soil samples were collected using a soil corer (2.5 cm diameter) and then pooled into one composite sample. The composite samples were air-dried, cleared of visible plant roots and residues. In order to ensure the complete digestion of soil samples, the air-dried soils were ground and passed through a 0.15 mm nylon mesh sieve. For each sample, 0.5 g of soil was digested with a concentrated mixture of HNO3, HF, and HClO4 as stated in the EPA 3052 method [48]. Mixed acid digestion makes the soil digestion more complete. Therefore, compared with the aqua regia digestion, the result of mixed acid digestion becomes higher, which is closer to the actual concentrations of PTE in the soil. The five PTE, including Cu, Zn, Cd, Cr, and Pb contents, were measured using Inductively Coupled Plasma Mass Spectrometry (ICP-MS, NexION 300X, Spectralab Scientific Inc. Markham, ON L3R 3V6, Canada). The limit of detection (LOD) and limit of quantification (LOQ) for the different metals were determined. The LOD for analysis of Cr, Cu, Zn, Cd, and Pb were 0.47 mg kg −1 , 0.25 mg kg −1 , 0.70 mg kg −1 , 0.01 mg kg −1 , and 0.30 mg kg −1 , respectively. The LOQ of the above five PTE

Soil Sampling and Chemical Analysis
A total of 460 surface soil (0-20 cm) samples were collected from different green spaces in 2018. The locations were recorded using a global positioning system (GPS) and displayed in Figure 1. Five random soil samples were collected using a soil corer (2.5 cm diameter) and then pooled into one composite sample. The composite samples were air-dried, cleared of visible plant roots and residues. In order to ensure the complete digestion of soil samples, the air-dried soils were ground and passed through a 0.15 mm nylon mesh sieve. For each sample, 0.5 g of soil was digested with a concentrated mixture of HNO 3 , HF, and HClO 4 as stated in the EPA 3052 method [48]. Mixed acid digestion makes the soil digestion more complete. Therefore, compared with the aqua regia digestion, the result of mixed acid digestion becomes higher, which is closer to the actual concentrations of PTE in the soil. The five PTE, including Cu, Zn, Cd, Cr, and Pb contents, were measured using Inductively Coupled Plasma Mass Spectrometry (ICP-MS, NexION 300X, Spectralab Scientific Inc. Markham, ON L3R 3V6, Canada). The limit of detection (LOD) and limit of quantification (LOQ) for the different metals were determined. The LOD for analysis of Cr, Cu, Zn, Cd, and Pb were 0.47 mg kg −1 , 0.25 mg kg −1 , 0.70 mg kg −1 , 0.01 mg kg −1 , and 0.30 mg kg −1 , respectively. The LOQ of the above five PTE was four times their respective LOD. Certified soils (GSS series, China) were used as standard reference materials to verify the accuracy of the method, and the recovery rate of all measured PTE was 95-105%. All tested glass and blanks were soaked in HNO 3 , rinsed, and Milli-Q water to prevent contamination of the testing instrument.

Geostatistical Methods
Geostatistics is an extension tool in GIS that describes the spatial variation and carries out spatial interpolations [49]. The semivariance function and the kriging interpolations were used to produce the initial interpolation map on green spaces soil [50].
Semivariogram is equal to one-half of the expected value of the squared differences between values of X at locations (i) and (i + h) [51], where m(h) is the number of pairs of observations separated by distance h, Z(x i ) is the sample value of the variable Z at location x i , and Z(x i + h) is the sample value of the variable Z at location x i + h. The ordinary Kriging interpolation is one of the most frequently used geostatistics tools to estimate unknown values using the sample data [52], whereẑ(x 0 ) is the value to be estimated at the location of x 0 ; and z(x i ) is the known value at the sampling site x i ; y i represents constant values of each local neighborhood. While, n represents the number of sites or sampling points within the search neighborhoods used for the estimation. The existing monitoring points were visualized and analyzed using exploratory spatial data analysis (ESDA) tools. ESDA was used to assess the degree of spatial association and examine how the data are normally distributed [53,54]. The spatial clusters and outliers of existing data sets were identified using Local Moran's I [55] and Global Moran's I statistic [56].

Prediction Accuracy Improvement Procedures
The prediction accuracy improvements can normally be achieved by optimizing sample locations over the geographical areas [57]. Optimization usually consists of adding, removing, and moving stations or sampling points [58]. One of the optimization algorithms used to add, remove, and transfer stations to generate optimized sampling sizes and designs is called SSA [42]. The SSA algorithm uses the mean kriging variance (MKV) as the objective function to obtain an optimal sample layout. In this case, the initial design was optimized by moving existing spatial points to the given study surface areas using soil Pb data as an example. Sample optimization by SSA also considers the kriging prediction and fitting variogram models [59]. Then, data were log-transformed before spatial optimization analysis was undertaken. The detailed optimization and evaluation techniques were explained as follows.

Perturb Initial Sampling Design by SSA and Evaluations
A 100 m × 100 m grid size overlaid on the study greens spaces areas, and an initial (before optimized) kriging soil Pb predictions and MKV were produced. Then, 50 to 200 random existing sample points were perturbed using 10,000 times iterations by the SSA algorithm. A new combination is generated, and the MKV values are compared with the initial sampling layout's value. The combination is accepted if the change has improved the MKV values. The maximum perturbed sampling points were decided based on the improved MKV values. The process continued until the prediction variance became constant or higher. The best-improved MKV combination was chosen, and a kriging prediction map and sampling distributions were generated. Finally, to evaluate prediction accuracy improvement, cross-validations were performed.

Statistical Analysis Software and Tools
Spatial sampling optimization and descriptive statistics were performed using the R Statistical Software (version 4.0.2) [60,61]. The spatial clusters and outliers of existing data sets were analyzed using the software GeoDa (version 1.14.0) [62]. Arc GIS (10.4 version) is also used to produce the kriging prediction maps.

Mean Concentrations and Summary Statistics of Potentially Toxic Elements
The summary statistics and mean concentration of the five PTE in urban green space soils are indicated in Table 1. The highest and lowest mean concentrations were found for Zn, and Cd, respectively. The soil mean background values in Shanghai [63] and China [64] are used as reference values to compare the present study's values. Compared to Shanghai's background values, the mean concentrations of PTE in urban green spaces soil are much higher, except for Cd and Cr. All measured mean values exceed China's reference values (Table 1)  Similarly, the coefficients of variation (CV, %) for Pb, Cu, Zn, and Cd were higher, meaning more significant variations among the urban green spaces soils ( Table 1). The high CV of Pb, Cu, Zn, and Cd suggests soil pollution sources in urban green spaces are from anthropogenic sources [65]. On the contrary, the Cr CV is low, which means both natural and anthropogenic factors govern its spatial distribution. The lower CV value of Cr is consistent with many other studies [66][67][68].
The present study is consistent with the previous findings on the park and roadside green spaces in Shanghai [69], but inconsistent with results found on road-greenbelts, except for Pb [70]. The average values of Zn and Cr were significantly higher than the values reported in the western city of Urumqi in China [71]. The mean concentrations of the majority of the five pollutants considered in our study were lower than those found in studies that reported about ten years ago in roadside soil, dust, and sediment in eastern cities in China, including Shanghai [46], Guangzhou [72], and Hangzhou [73] (Table 2). Compared to the average concentrations in worldwide studies, Pb, Cu, and Zn values of our study were much lower than reported values from Spain, Mexico, Turkey, Sweden, and Cuba, but Cr concentration was much higher than the study from Turkey and Sweden (Table 2). In this study, the Cr concentration value was 2.7 and 5.2 times higher than the concentrations values found from Sweden, and Turkey, respectively ( Table 2). For all investigated pollutants, the mean concentration values were higher than those observed in the City of Pensacola, USA ( Table 2). The differences in results between this study and other studies could be due to the test method, level of urbanization in the city, the management strategies on urban green space soils [8], and the sources for variation of PTE [7], such as emissions from industry and transportation [10], smelting, manufacturing, mining, and coal-burning [11].

Optimization Strategy and Evaluation of Existing Monitoring Points
The spatial interpolation in kriging is undertaken by accounting for the following assumptions [49]: (1) Data with a normal distribution, (2) data are stationary, and (3) data fit a variogram and spatial autocorrelation. Prior to carrying out the optimization strategy and the evaluation of prediction accuracy, these assumptions should be assessed and evaluated.

Spatial Patterns of Existing Monitoring Points
The spatial patterns and distribution of each PTE are shown in Table 3. All variables revealed a clustered spatial distribution with a statistical significance (p value < 0.01) and a positive spatial autocorrelation in the existing data sets. The most clustered positive spatial autocorrelation pattern was observed for Pb and Cd (Table 3). Global Moran's I Index values confirmed that the spatial points are clustered and non-randomness. Similarly, the kurtosis and skewness values for all pollutants, except Cr, were higher, which means the data are not normally distributed ( Table 3). The higher Kurtosis values showed many outlier data sets, and the majority of them are clustered at relatively low values. However, it does not state which spatial location features are spatial clustering [80]. Spatial outliers or local outliers are neighboring values that are spatially located at a certain distance [81]. Local Indicators of Spatial Association (LISA), known as Anselin's Local Moran's I, were used to visualize and identify the degree of spatial instability and outliers of the given data set [55]. The results of univariate Local Moran's I scatter plots of the four PTE in the soil at 12905 m threshold distance divided into four association neighborhood layouts are indicated in the supporting data files, Figure 1 HL). Spatial outlier values that include HL and LH values and spatial clusters that include HH and LL values are also indicated. For example, for soil Pb data sets, a 45 feature has neighboring features with values above the mean surrounded by HH values, and one features surrounded by LL values, which is the part of a cluster or pattern data set ( Figure 2). In contrast, 19 data points have nearby features with different values (low high and high low), and this feature is a spatial outlier. Spatial outliers are the values that are different from the values recorded in their surrounding location, while spatial patterns often exhibit spatial continuity and autocorrelation with nearby samples [81]. These spatial outliers influence the spatial structure modeling and prediction of soil pollutant concentrations in urban green spaces. Therefore, the outliers were excluded, and data were transformed before the optimization strategy was undertaken.  HL). Spatial outlier values that include HL and LH values and spatial clusters that include HH and LL values are also indicated. For example, for soil Pb data sets, a 45 feature has neighboring features with values above the mean surrounded by HH values, and one features surrounded by LL values, which is the part of a cluster or pattern data set ( Figure 2). In contrast, 19 data points have nearby features with different values (low high and high low), and this feature is a spatial outlier. Spatial outliers are the values that are different from the values recorded in their surrounding location, while spatial patterns often exhibit spatial continuity and autocorrelation with nearby samples [81]. These spatial outliers influence the spatial structure modeling and prediction of soil pollutant concentrations in urban green spaces. Therefore, the outliers were excluded, and data were transformed before the optimization strategy was undertaken.

Spatial Structures and Dependency
The theoretical Semivariogram models are used to kriging interpolation and optimizing the existing points. The best-fitting Semivariogram models were selected based

Spatial Structures and Dependency
The theoretical Semivariogram models are used to kriging interpolation and optimizing the existing points. The best-fitting Semivariogram models were selected based on root mean square error (RMSE), average standard error (ASE), and root mean square standardized error (RMSSE) values, indicated in Table 4. The best-fitted model is considered to be the one with the smallest value of RMSE, the absolute values of mean errors near to zero, the mean square error (MSE) near zero, and the RMSSE closest to 1 [82]. Based on these criteria, the fitted semivariograms models for each soil element are summarized in Table 5. The best-fit spatial model of Pb and Cr was spherical, whereas Zn and Cu were best-fitted using the Gaussian model. The Cd was fitted with the exponential model. In the semivariograms, the nugget values represent the variability of the measured variables at a certain distance. The spatial dependence and variation of soil properties can be categorized based on the Nugget/Sill ratio values. Suppose the Nugget/Sill ratio is less than 25%, between 25% and 75%, and greater than 75%, the variable has strong, moderate, and weak spatial dependence [83], respectively. All studied elements had a moderate-to-strong spatial dependency, and fit the assumptions around spatial autocorrelation ( Table 5). The Nugget/Sill ratio also indicated predominant sources or soil PTE factors, either natural or anthropogenic factors. Strong spatial dependence can be attributed to intrinsic factors, and weak spatial dependence can be attributed to extrinsic factors [83].

Prediction Accuracy Improvement by Optimization Strategy
A kriging interpolation surface of the study green spaces soil before optimized hereafter refers to the initial sampling design shows a predicted Pb MKV of 131.74 mg kg −1 . The kriging concentration of Pb in the initial sampling design displays spatial heterogeneity with a high prediction hotspot, which is located in the high clustered sampling points and low concentration at the edge segment, since these are the sparse and lacking in sampled areas (Figure 3a). It is also clearly noted that there are many non-sampled green spaces areas at the initial sampling design, which leads to high spatial prediction variance (131.74 mg kg −1 ). In this study, the MKV as the objective function was used in the SSA algorithm to optimize the initial sampling design [84,85]. Each SSA iteration step only involves moving one random sampling point, and the row and column of the covariance matrix are changed. As Figure 3b shows, after optimization, soil Pb sampling points were placed with a better uniformity over the study area than the initial sampling design. The MKV values also were calculated after the initial sampling design is perturbed by SSA (10000 iterations). The initial soil Pb MKV (131.7 mg kg −1 ) decreased to 128.9 mg kg −1 under 50 random spatial samplings perturbed and 102.3 mg kg −1 by 200 random spatial samples perturbed (Table 6). This means the existing soil Pb sampling points captured 22.4% of the total kriged variance improvement and increased the accuracy of un-sampled green spaces without extra sampling points. To evaluate the prediction accuracy and improvements in the initial sampling design, we performed a cross-validation comparison based on prediction RMSE, RMSSE, and ASE ( Figure 4). The values identified for RMSE, RMSSE, and ASE of 20.63, 1.006, and 21.12, respectively, before the initial sampling design was optimized; the values were 19.22, 0.99, 19.99, respectively, after the initial sampling design optimized by SSA. The better prediction accuracy could be found in the smaller values of RMSE, the closer values of RMSE with ASE, and the values of RMSSE approximate to one (Figure 4). In contrast, the value of RMSSE is higher than one for the initial sampling design, which explains the underestimation of the variability of soil Pb predictions on green spaces soil. Figure 3a also shows a higher variability of soil Pb predication concentration in the study areas by comparing the optimized sampling configuration. Many studies confirmed that the initials sampling design samples, optimized by SSA methods, provided closer pre- The MKV values also were calculated after the initial sampling design is perturbed by SSA (10,000 iterations). The initial soil Pb MKV (131.7 mg kg −1 ) decreased to 128.9 mg kg −1 under 50 random spatial samplings perturbed and 102.3 mg kg −1 by 200 random spatial samples perturbed (Table 6). This means the existing soil Pb sampling points captured 22.4% of the total kriged variance improvement and increased the accuracy of un-sampled green spaces without extra sampling points. To evaluate the prediction accuracy and improvements in the initial sampling design, we performed a cross-validation comparison based on prediction RMSE, RMSSE, and ASE ( Figure 4). The values identified for RMSE, RMSSE, and ASE of 20.63, 1.006, and 21.12, respectively, before the initial sampling design was optimized; the values were 19.22, 0.99, 19.99, respectively, after the initial sampling design optimized by SSA. The better prediction accuracy could be found in the smaller values of RMSE, the closer values of RMSE with ASE, and the values of RMSSE approximate to one (Figure 4). In contrast, the value of RMSSE is higher than one for the initial sampling design, which explains the underestimation of the variability of soil Pb predictions on green spaces soil. Figure 3a also shows a higher variability of soil Pb predication concentration in the study areas by comparing the optimized sampling configuration. Many studies confirmed that the initials sampling design samples, optimized by SSA methods, provided closer prediction results to the actual value and the lowest value of mean-variance of spatial prediction [84,[86][87][88][89][90].

Conclusions
The current work has been carried out to investigate the five potentially toxic element concentrations and identify a method to improve prediction accuracy in the non-sampled locations in urban green space soils. The mean concentrations of five pollutants in urban green areas are much higher than Shanghai's background values, except for Cd and Cr. However, all measured values exceed the mean reference values in China. The concentrations of Pb, Cu, and Zn were 1.45, 1.2, 1.56 times the background value of Shanghai, respectively, and 1.57, 1.66, 1.91 times the background values of China, respectively. The higher values, in comparison to the background values, may indicate the presence of soil pollution in the study areas. Similarly, the higher CV means more significant variation exists among urban green spaces soils.
The second objective was to improve the prediction values of non-sampled locations by optimized limited sampling points in the SSA algorism. As a result, an improvement in prediction accuracy by 22.4% was found for spatial prediction in non-sampled locations. Similarly, the lower mean-variance values of spatial prediction were comparable to those the initial sampling design. Therefore, this optimization approach ensures good quality of soil pollution predictions without deleting or adding monitoring points.

Conclusions
The current work has been carried out to investigate the five potentially toxic element concentrations and identify a method to improve prediction accuracy in the non-sampled locations in urban green space soils. The mean concentrations of five pollutants in urban green areas are much higher than Shanghai's background values, except for Cd and Cr. However, all measured values exceed the mean reference values in China. The concentrations of Pb, Cu, and Zn were 1.45, 1.2, 1.56 times the background value of Shanghai, respectively, and 1.57, 1.66, 1.91 times the background values of China, respectively. The higher values, in comparison to the background values, may indicate the presence of soil pollution in the study areas. Similarly, the higher CV means more significant variation exists among urban green spaces soils.
The second objective was to improve the prediction values of non-sampled locations by optimized limited sampling points in the SSA algorism. As a result, an improvement in prediction accuracy by 22.4% was found for spatial prediction in non-sampled locations. Similarly, the lower mean-variance values of spatial prediction were comparable to those the initial sampling design. Therefore, this optimization approach ensures good quality of soil pollution predictions without deleting or adding monitoring points.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.