Spatio-Temporal Analysis of Heavy Metals in Arid Soils at the Catchment Scale Using Digital Soil Assessment and a Random Forest Model

Predicting the spatio-temporal distribution of absorbable heavy metals in soil is needed to identify the potential contaminant sources and develop appropriate management plans to control these hazardous pollutants. Therefore, our aim was to develop a model to predict soil adsorbable heavy metals in arid regions of Iran from 1986 to 2016. Soil adsorbable heavy metals were measured in 201 samples from locations selected using the Latin hypercube sampling method in 2016. A random forest (RF) model was used to determine the relationship between a suite of geospatial predictors derived from remote sensing and digital elevation model data with georeferenced measurements of soil absorbable heavy metals. The trained RF model from 2016 was used to reconstruct the spatial distribution of soil absorbable heavy metals at three historical timesteps (1986, 1999, and 2010). Results indicated that the RF model was effective at predicting the distribution of heavy metals with coefficients of determination of 0.53, 0.59, 0.41, 0.45, and 0.60 for Fe, Mn, Ni, Pb, and Zn, respectively. The predicted maps showed high spatio-temporal variability; for example, there were substantial increases in Pb (the 1.5–2 mg/kg−1 class) where its distribution increased by ~25% from 1988 to 2016—similar trends were observed for the other heavy metals. This study provides insights into the spatio-temporal trends and the potential causes of soil heavy metal contamination to facilitate appropriate planning and management strategies to prevent, control, and reduce the impact of heavy metal contamination in soils.


Introduction
Soil contaminants have important implications in the natural environment where they negatively impact plant and animal biodiversity and have the potential to cause human health problems [1]. Key pollutants in the soil are heavy metals due to their nondegradability, toxicity, cumulative effects, and carcinogenic properties [2][3][4][5]. Heavy metals are a part of the Earth's crust, insoluble, and can enter the human body through drinking water, air, and food [6,7]. The exploitation of previously buried heavy metals has led to a spatial distribution of heavy metals in the soil that is continuously expanding with concentrations increasing. This occurs through mining activities, combustion of fossil fuels for power generation and transportation, and other human activities, such as rapid urbanization and industrialization around the world [1,8]. Unlike organic pollutants such as Aldrin and Kerodan, these heavy metals are not biodegradable by microorganisms and often remain unchanged in chemical form for long periods of time [9,10]. The chemical form of a heavy metal is key to its bioavailability to humans, plants, and animals. Metal contamination is of great importance due to its abundance, persistence, and toxicity in the environment [11]. Heavy metals and their different chemical forms show differences in their behavior in terms of mobility, adsorption capacity, leaching from the soil, and absorption by plants [12]. This can lead to the gradual accumulation of heavy metals in the soil. The process of accumulating heavy metals in the soil is very slow, and its effects can be observed after several decades [1,11,[13][14][15][16].
The accumulation of heavy metals in the soil is often caused by industrialization; it has attracted much attention from researchers [1,3,5,11,[15][16][17][18] and has become a serious environmental challenge in many parts of the world-especially in developing countries [19]. Although heavy metals in the soil may be produced from activities related to urbanization and mining, intense agricultural and industrial activities are also considered to be major producers of heavy metal contamination in the environment [4,11,12,[20][21][22][23].
Due to the complex nature of soil and the many sources responsible for the enrichment of heavy metals, the spatial patterns of heavy metals are highly heterogeneous within the soil [3,5,15,18,22,24,25]. Knowledge of the spatial distribution of heavy metals in soil is a prerequisite for identifying the potential sources of contaminants and for developing appropriate management plans to control these hazardous pollutants [26]. The traditional methods for investigating the chemical contamination of soils are based on field sampling and chemical analysis, which are very labor-intensive and expensive; however, digital soil mapping (DSM) approaches provide a suite of tools that are more efficient and less expensive [27].
DSM approaches involve the spatial prediction of soil properties based on numerical models [27,28]. Here, data are acquired through advanced technologies such as geographic information systems, global positioning systems, spectral waves, field scanners, digital elevation models (DEMs), remote sensing (RS) data, and the use of a range of computational methods for interpolation and data mining [29][30][31][32]. DSM techniques are used to predict the spatial distributions of soil classes and soil properties, and can be used to evaluate prediction uncertainty [33][34][35]. Generally, DSM has been recognized as the most effective approach for providing spatial estimation on soil properties and related uncertainty predictions [12,14,[36][37][38][39].
The first requirement in DSM is the acquisition of the environmental covariates (e.g., RS data and terrain attributes) for the prediction of soil heavy metals [13,24,26,40,41]. Among a wide range of covariates, RS data have been widely used as a rapid and practical source of environmental information for understanding the spatial distribution of soil heavy metals [6,[41][42][43][44]. For example, Choe et al. [43] successfully used satellite image-derived spectral parameters to map heavy metal pollution in stream sediments in southeastern Spain. Similarly, Shi et al. [6] also confirmed the importance of RS-based spectral information to evaluate and predict soil heavy metals.
Although there are many studies that have predicted the spatial distribution of soil properties using DSM approaches, such as soil moisture [37], soil texture fractions [45], soil salinity [46,47], soil quality index [47], and soil organic carbon [29][30][31]48,49]; to the best of our knowledge, studies have yet to investigate the spatio-temporal prediction of heavy metals-especially in arid environments. Therefore, this study explores the spatiotemporal patterns of heavy metals in soils of the central arid regions of Iran using DSM techniques and the random forest (RF) model. Specifically, the spatial distribution of iron (Fe), manganese (Mn), nickel (Ni), lead (Pb), and zinc (Zn) in soils from 1986 to 2016 are predicted using the RF model and a wide range of environmental covariates for the Yazd-Ardakan Plain, Yazd Province, Iran.

Materials and Methods
This study was conducted in four main steps ( Figure 1). In Step 1, a wide range of environmental covariates was obtained from various sources and was calculated for four time periods (1986, 1999, 2010, and 2016). In Step 2, the 201 sample locations were identified within the study area using the Latin hypercube sampling (LHS) approach. Here, soil samples were collected and the adsorbable heavy metals were measured. In Step 3, the relationships between soil heavy metals and covariates were quantified using a RF model. In Step 4, the constructed RF model was applied to the covariates acquired for 1986, 1999, 2010, and 2016 to predict and map the spatial distribution of soil heavy metals for the four time periods.

Study Area
The study area is in the central plateau of Iran and has an area of 482,900 hectares. The location of the central part of the province is between 53 • 08 36 and 54 • 85 32 E longitude and 31 • 21 59 and 32 • 61 02 N latitude. The elevation of Yazd Province ranges from 997 to 2677 meters above mean sea level. Precipitation is low and infrequent in the region, averaging 59.2 mm per year; furthermore, evaporation rates are high, averaging 2200 to 3200 mm per year. The study area has no permanent rivers-the rivers of the plain only flow during periods of flooding ( Figure 2). The population density of the region is high, with nearly 75% of the population of the Yazd Province living on the plain and working in the numerous factories and mines. Major production units of Yazd include the production of tiles and ceramics, and this is concentrated along the Ardakan-Meybod-Yazd route. The ceramics industry provides more than 50% of the ceramic tiles for Iran and thus, the factories and workshops are a major source of industrial pollutants throughout the region. This region also features geological features with a vast range in age, from the oldest (Precambrian) to the youngest (Holocene) (Figure 3).

Environmental Covariates
This study used 69 environmental covariates, which were derived from DEMs, remote sensing data, land use data, geological maps, groundwater quality data, and other geospatial datasets. It should be added that all raster-based covariates were calculated for the four timesteps from 1999 to 2016. The covariates, summarized in Table A1, were resampled (aggregation or disaggregation) to a common spatial resolution of 30 m. Four examples of covariates are presented in Figure 3.

Digital Elevation Model
A pre-processed SRTM-derived DEM [50], with a 30 m spatial resolution, was used to derive 12 terrain attributes using SAGA GIS [51]. These attributes (e.g., slope gradient, and valley depth) are commonly used in DSM studies to represent topography (Table A2).

Remote Sensing Data
Landsat images were used to prepare the covariates derived from remote sensing data, which were acquired for each of the timesteps. Remote sensing data were acquired from Landsat 4 (Multispectral Scanner System; MSS) for October 7, 1986; Landsat 5 (Thematic Mapper; TM) for August 19, 1999; Landsat 7 (Enhanced Thematic Mapper Plus; ETM+), for September 3, 2010; and Landsat 8 (Operational Land Imager; OLI) for September 10, 2016. After performing geometric and radiometric corrections on the Landsat images, a total of 36 covariates were calculated. These covariates included the original Landsat bands, their principal components (PC1 to PC11), vegetative indices (e.g., normalized difference vegetation index, ratio vegetation indices), soil salinity indices, bare soil index (BI), and soil moisture indices (Table A2).  Table A2).

Land Use Map
Using the Landsat imagery described in Section 2.2.2, land use maps were developed using an object-based, controlled, classification method with seven user-defined categories (afforested, agricultural land and garden, barren land, poor rangeland, residential land, rocky land, and sand dune) [52] (Table A2).

Geological Map
Geological information was obtained through field operations, a review of case reports, and the integration of geological maps. Studies show that the area has rocks and sediments from both the oldest (Precambrian) and youngest (Holocene) geological organizations. The Precambrian formations consist of various types of metamorphic rocks, as well as igneous rocks, in which even layers of gypsum can be observed (Table A2).

Groundwater Quality Parameters
To investigate the quality of groundwater samples and their potential effect on the heavy metal concentrations in the soil, data were acquired from 103 wells from a 1986-2016 water monitoring survey. The well water data included measurements of several groundwater quality parameters such as bicarbonate chloride (Cl − ; mg L −1 ) and sodium (Na + ; mg L −1 ) ( Table A2). The inverse distance weighted (IDW) interpolation method was used to create the maps of groundwater quality parameters for the four time periods.

Other Geospatial Data
One of the main reasons for increased concentrations of heavy metals in soils can be due to rapid population growth, which affects natural resources and the environment [40]. Therefore, to investigate the effect of population change and its impact on heavy metal concentrations in soil, 10 cities (e.g., Mehriz and Yazd) were identified and a population layer was prepared. Furthermore, 478 residential points were identified in the area and the distance from these residential areas was determined. Furthermore, the distance to the nearest road and the distance to the nearest river were also included as model inputs. The major production units of the Yazd tile and ceramic industry are located along the Ardekan-Meybod-Yazd road. In the study area, 282 mines and factories were identified; hence, a layer representing the distance to the nearest mines and factories was included.
To explore the effects of changes in precipitation, data from rain gauge stations located in Kharanagh, Mohammadabad, Yazd, Mahdiabadabadar, Ali Abadchehl-e Ghazi, Nir, Yazd, Meybod, Ardekanm, and Nodooshan were obtained and precipitation maps were prepared using IDW interpolation.

Soil Data and Laboratory Analysis
Topsoil samples from the 0-20 cm depth increment were collected at 201 locations using the LHS method [53] (Figure 2). Based on the understanding of the factors affecting the soil heavy metals distribution in the study area [47,52,54] and a literature review [11,13,24,26,41,55], the selected covariates (i.e., geological layers, land use, rainfall, the first eleven principal components, vegetative indices derived from remote sensing data, and Tasseled Cap differencing) were used as inputs for the Latin hypercube method. Concentrations (mg/kg −1 ) of Fe, Mn, Ni, Pb, and Zn were measured from an air-dried <2 mm fraction of the soil and via the nitric acid extraction method [56]. The total concentration of each element in the soil extract was determined using an Analytic Jena-novAA300 Atomic Absorption Spectrophotometer.

Random Forest
The aim of machine learning is to obtain a useful approximation of a function that is based on the relationship between input variables and the desired outputs [57]. Here, the RF model was used to find the relationship between the environmental covariates and heavy metals in the soil. The RF approach was first proposed in Breiman [58] and consists of an ensemble of individual decision trees, which may be applied for both regression and classification problems. Decision trees are a hierarchical model, which consists of a series of nodes and leaves (i.e., terminal nodes). Each node may be considered as an if-then statement, whereby the model generates a node-splitting rule that is intended to maximize the within-node homogeneity and between-node heterogeneity within the measured values. Due to the instability of an individual tree, whereby a small change in the dataset could result in a large change in the structure of decision tree, the RF learner generates a diverse set of decision trees and aggregates the models using an averaging function (when applied to regression problems). These decision trees are generated through a bootstrap sample (with replacement) of the training data and further randomness is incorporated in the model by randomly selecting a subset of covariates that are tested when establishing the splitting rules at each node. After each tree is generated, the trees are ensembled together using an averaging function. The model ensemble provides outputs that are more accurate, more stable, and less sensitive to overfitting in comparison to individual decision trees. Due to the use of bootstrap sampling, the training data that were not selected for the model (out-of-bag sample) are then used by the RF model for variable importance analysis. In this study, three main hyperparameters of the RF model were optimized, namely, ntree, mtry, and nodesize. The ntree parameter controls the number of trees within the RF; the mtry parameter controls the number of variables that are randomly selected when establishing the node-splitting rules; and the nodesize parameter controls the minimum size of the terminal nodes.
A 10-fold-cross-validation procedure was utilized to assess the performance of a RF model for predicting heavy metals. The root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R 2 ), and the percentage of error (RMSE/average of each heavy metal) were calculated as the main validation criteria. The randomForest package in the R environment [59] was used to implement the RF model.

Historical Spatial Distributions of Soil Heavy Metals
In this study, a RF model was used to find the relationship between the environmental covariates and soil heavy metals in 2016. Then, the trained RF model for 2016 was applied to the covariates that were generated for the 1986, 1999, and 2010 periods in order to reconstruct the spatial distribution of soil heavy metals of the past [47]. The number of covariates and their spatial resolutions for the 1986, 1999, 2010, and 2016 periods were the same (Table A2).

Soil Data
The summary statistics for soil absorbable heavy metals are provided in Table 1. Ni had the highest coefficient of variation (1.53) while Mn had the lowest (0.75). The correlations between the absorbable heavy metals and the other soil attributes were weak but significant in some cases (Table 2). For example, Table 2 shows that Cl and EC had a positive correlation with Fe and Ni (p < 0.01); SOC had a positive correlation with Zn (p < 0.01); shear strength was positively correlated with Ni (p < 0.01) and Zn (p <0.05); bulk density had a positive correlation with Fe (p < 0.05); and HCO 3 − also had a positive correlation with Zn (p < 0.01).

Environmental Covariates
The covariate importance analysis of the RF model for 2016 is illustrated in Figure 4, whereby higher values of increase-in-node-purity (IncNodePurity) represent higher importance of the covariate. As can be seen, the urbanization and road construction predictors were important covariates for predicting soil heavy metals. Therefore, it can be concluded that human activities might be the main factor affecting the spatial distribution of soil heavy metals in the study area. Furthermore, it was also observed that groundwater, vegetation indices, satellite bands, and topographic attributes derived from the DEM were also important predictors. Previous studies have shown that heavy metals have strong reflectance in the visible and near-infrared regions of the electromagnetic spectrum, and their reflectance characteristics could be useful for identifying these elements [43]. Here, the red and infrared bands had high correlations with the heavy metals in the soil (Figure 4). Dana and Badea [44] showed that satellite bands, which were transformed using PCA, were an effective predictor when used for detecting contaminated farmlands. The results also confirm our original notion of the importance of human activities on the distribution of heavy metal concentrations; for example, the concentration of Pb was higher in soils that were in close proximity to roads. Similarly, Karrari et al. [40] reported higher Pb contamination of soils in the vicinity of polluted and industrialized cities in Iran. This also was aligned with the findings of Kheir et al. [60], who showed the influence of roads as being the third most important factor affecting heavy metal accumulation in the soils after urban margins and waste disposal areas.

Current Distributions of Heavy Metals (2016)
Upon optimizing the RF model parameters for 2016, the model was validated using the MAE, RMSE, R 2 , and %Error metrics, which were calculated based on the actual and the predicted heavy metal values for the soil ( Table 3). The validation results indicated that the RF models performed reasonably well in terms of the MAE, RMSE, R 2 , and %Error metrics, with values ranging from 0.11 to 0.47, 0.15 to 0.60, 0.41 to 0.60, and 25 to 50, respectively. The effectiveness of the RF models was further confirmed in Figure 5, which shows a moderate correlation between the measured and predicted heavy metal concentrations. It should be highlighted that R 2 values > 70% are uncommon and values of R 2 < 50% are common in DSM studies [54,61,62]. The %Error values in Table 3 indicated that the RF model had the highest accuracy in predicting Pb, sequentially followed by Zn, Mn, Fe, and Ni. Similar to the results of this study, Amirian-Chakan et al. [36], Nabiollahi et al. [63], and Taghizadeh-Mehrjardi et al. [45] also used the RF model for predicting different soil properties in various regions of Iran and reported acceptable levels of accuracy (Table S1).

Historical Distributions of Heavy Metals (1986, 1999, and 2010)
Comparing the results of total concentrations of heavy metal distributions with the quality standards for soil resources in Iran, it was observed that the average concentration of heavy metals in the study area was lower than the standards. Using the environmental datasets that represented the historical conditions, the 2016 RF model was used to estimate the heavy metal concentrations for 1986, 1999, and 2010. As a result, the heavy metal maps of the soil were classified according to their conditions and their composition in the study area. Figures 6-10 show the maps of estimated heavy metal distribution in soils of the Yazd-Ardekan plain for 1986, 1999, and 2010. These findings suggest that some of the spatial variation of these heavy metals is partially due to factors such as mining operations, extraction, and activities. This further suggests that the industrial materials and smoke caused by the passing service vehicles and vehicles on the adjacent highway and Bandar Abbas-Tehran road should be controlled. Although the point sources of pollution are intuitively linked to the spatial distribution of heavy metals, other factors such as wind speed and direction are also important factors and should therefore not be neglected.

Comparison of Heavy Metal Distributions (1986 to 2016)
After the maps of soil heavy metals were prepared for each time step, the areal extent of each class for every heavy metal was calculated and presented in Figure 11. According to Figure 11, the change in Fe levels in the area between 1986 and 2016 showed that the area of the 0-1 mg/kg −1 class decreased by 144,683 ha and the areas of the 1-2, 2-3, and >3 mg/kg −1 classes increased by 94,169 ha, 43,023 ha, and 7,491 ha, respectively. Between 1986 and 2016 (Figures 6 and 11), the greatest percentage of Fe in the 0-1 mg/kg −1 class was in 1986, while the areal extent of the class >3 mg/kg −1 was very small. In 2016, however, the areal of extent of the 0-1 mg/kg −1 class decreased by more than 30% and the areal extent of the 1-2 mg/kg −1 increased by more than 19.5%, indicating an overall increase in Fe concentrations for the region. There were also substantial increases in Mn concentrations where the 2-4, 4-6, and >6 mg/kg −1 classes increased by 22% (106,090 ha), 9.80% (47,339 ha), and 1.76% (8519 ha), respectively; in comparison, the areal extent of the 0-2 mg/kg −1 class decreased by 33.53% (161,948 ha) between 1986 and 2016 ( Figure 11). Mn concentrations in the Yazd-Ardekan plain have historically increased from 1986 to 2016 due to mining activities and factories. According to the statistics from the Yazd Province Mining Industry Organization, by 2017, there were 566 active mines in Yazd. According to the Mn map (Figures 7 and 11), there were moderate (2-4 mg/kg −1 ) concentrations across the whole region and these concentrations need to be managed. Although the main source of naturally occurring Mn in the soil is from the Earth's crust, the anthropogenic sources of contamination are linked to urban wastewater discharge, sewage sludge, combustion of fossil fuels, mines, and factories [64].  20% (116,908 ha). Overall, the highest Ni concentrations were found in pastures where the soil quality is low (Figures 8 and 11). Contrary to expectations, the location of mines and factories, agricultural lands, gardens, and residential lands had little influence on Ni concentrations, indicating that the element is not related to land use. Studies such as Mico et al. [2] have shown that the Ni concentrations in the soil are related to parent materials in the Mediterranean region. Based on Figure 8 and the land use map (Figure 3), the amount of Ni seems to have little to do with land use. With respect to Pb concentrations, the areal extent of the 1-1.5, 1.5-2, and >2.5 mg/kg −1 classes increased by 8.75% (42,258 ha), 25.18% (121,633 ha), and 2.18% (10,562 ha), respectively, while the areal extent of the 0-1 mg/kg −1 class decreased by 36.56% (176,565 ha) ( Figure 11). Furthermore, the Pb concentration maps indicated that the distribution of Pb was focused at the center to the southern and western regions of the study area (shown in dark brown- Figures 9 and 11). In these areas, high concentrations of Pb were due to its accumulation from mines, factories, agricultural lands, gardens, and residential lands. The reason for this may be the tendency of this element to accumulate in smaller soil particles (clay and silt) [65]. Our results were in accordance with the findings of Facchinelli et al. [66], who identified that the sources of Pb in the soil were controlled by human activities.  Figure 11). Furthermore, the predicted maps indicated that in the areas where mines, factories, and roads were located, there was a corresponding accumulation of Zn that was moderately high (Figures 10 and 11). In a study on surface soils in Spain [23], human activities and the type of parent materials were linked to increased Zn concentrations in the agricultural soils of the region. Similar findings were shown in Chen et al. [41], where there were also links between Zn concentrations, human activities, and parent materials.

Conclusions
It is important to recognize that the challenges related to heavy metal pollution of soil are a by-product of development and industrialization efforts and are therefore difficult to address. There is a need for continuous modeling and monitoring of potentially contaminated areas. This study integrates geochemical analysis, remote sensing data, and predictive modeling as basic steps in monitoring the soils of heavily contaminated areas. Soil monitoring is necessary to identify possible contamination sources and appropriate corrective and remedial actions; and sustainable land use management practices in relation to mining, agriculture, and urban development. Here, the RF model was applied for the spatio-temporal prediction of Fe, Mn, Ni, Pb, and Zn concentrations in the central arid region of Iran. The overall trends indicated that the concentration and spatial distribution of these heavy metals have historically increased from 1986 to 2016. This study provides fundamental knowledge on heavy metal contamination in soils and will facilitate decision making practices, support soil remedial practices for contaminated areas, and assist in targeted conservations measures.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10 .3390/rs13091698/s1, Table S1: Validation results for soil heavy metal concentrations predicted using machine learning models.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions.

Conflicts of Interest:
The authors declare no conflict of interest.