Spatial Distribution and Mobility Assessment of Carcinogenic Heavy Metals in Soil Proﬁles Using Geostatistics and Random Forest, Boruta Algorithm

: In third world countries, industries mainly cause environmental contamination due to lack of environmental policies or oversight during their implementation. The Sheikhupura industrial zone, which includes industries such as tanneries, leather, chemical, textiles, and colour and dyes, contributes massive amounts of untreated efﬂuents that are released directly into drains and used for the irrigation of crops and vegetables. This practice causes not only soil contamination with an excessive amount of heavy metals, but is also considered a source of toxicity in the food chain, i.e., bioaccumulation in plants and ultimately in human body organs. The objective of this research study was to assess the spatial distribution of the heavy metals chromium (Cr), cadmium (Cd), and lead (Pb), at three depths of soil using geostatistics and the selection of signiﬁcant contributing variables to soil contamination using the Random Forest (RF) function of the Boruta Algorithm. A total of 60 sampling locations were selected in the study area to collect soil samples (180 samples) at three depths (0–15 cm, 15–30 cm, and 60–90 cm). The soil samples were analysed for their physico-chemical properties, i.e., soil saturation, electrical conductivity (EC), organic matter (OM), pH, phosphorus (P), potassium (K), and Cr, Cd, and Pb using standard laboratory procedures. The data were analysed with comprehensive statistics and geostatistical techniques. The correlation coefﬁcient matrix between the heavy metals and the physico-chemical properties revealed that electrical conductivity (EC) had a signiﬁcant ( p ≤ 0.05) negative correlation with Cr, Cd, and Pb. The RF function of the Boruta Algorithm employed soil depth as a classiﬁer and ranked the signiﬁcant soil contamination parameters (Cr, Cd, Pb, EC, and P) in relation to depth. The mobility factor indicated the leachate percentage of heavy metals at different vertical depths of soil. The spatial distribution pattern of Cr, Cd, and Pb revealed spatial variability regarding subsoil horizons. Signiﬁcant contamination was discovered near the Deg drain and the Bed Nallah irrigated area that indicated a high Cr topsoil contamination, and in a homogenous pattern in Cd and Pb ( p < 0.05). Consequently, different soil management strategies can be adopted in an industrial irrigated area to reduce the contamination load of heavy metals in soil.


Introduction
Increasing population growth, large-scale industrialization, and exceptional urbanization during the last few decades are responsible for the degradation of the quality of environment and soil [1]. Soil is one of the basic environmental media, it can be contaminated through the accumulation of heavy metals through the discharge of effluents from mine tailings, coal combustion, sewage sludge, disposal of high metal wastes, paints and leaded gasoline, land application of fertilizers, and animal manures [2]. The ill-defined group of inorganic compounds and hazardous chemicals that industries Some analyses performed methodical investigations to determine the presence of heavy metals and other elements in soil sediments. They employed conventional statistical techniques to classify soil properties and other factors using a cluster analysis (CA) and a principal component analysis (PCA) [20]. CA and PCA are multivariate analysis techniques and are commonly employed to extract hidden patterns and structures from multi-dimensional data from different fields of the social sciences, natural sciences, and soil sciences [21]. The combinations of these methods are applied for the evaluation and classification of the elements and heavy metals present in soil [22]. However, these methods entail some complexities regarding data analysis from different dimensions with multiple soil depths. Moreover, they require the normalization of skewed results and spatial data distribution patterns that are influenced by many principal components [23]. Therefore, a comprehensive method is required to replace the conventional multivariate statistical methods for the selection of soil parameters with spatial heterogeneity in soil depths. The Random Forest (RF) classifier can accomplish reliable and robust estimations [24] of essential soil parameters at different soil depths. No studies reported a reliable method of assessment for the technique of soil and the ranking of significant soil pollution variables at different depths using the RF classifier of the Boruta Machine Learning Algorithm. In this regard, this article introduces a new assessment of soil parameters at multiple lithological classes of soil depths.
Furthermore, exponential and spherical variogram models have been employed by many experts for the analysis of the spatial variability of heavy metals in soil [25]. For this purpose, apart from the spatial variability of heavy metals at large distances described by the lag distance, the practical interest of variability at short distances is also required for improved characterisation. The Matern constitutes a flexible variogram model with a behaviour of the smoothness and fits the semivariogram close to the origin, supporting the differentiability and continuity of the random variables and covering the local smoothness [26] to plot the sample concentration of soil parameters.
Traditional interpolation methods, such as the Generalized Kriging Method, Polynomial Method, and the Inverse Distance Weighting Method (IDW) have been extensively employed to transform the experimental point data of sample locations into thematic maps that show the spatial distribution [27]. The polynomial method and IDW method are comparatively easy and quick to evaluate, but the spatial variability and relations between the nodes cannot be estimated with them, moreover, these methods allow a partial accuracy of estimation. In comparison, the Kriging Interpolation Method employs the variogram model to analyse the structure of the spatial variation of estimated values and considers the spatial autocorrelation in modelling the surface [16]. The kriging technique comprises a collection of interpolation methods based on stochastic approaches, including ordinary kriging (OK), cokriging, universal kriging, simple kriging, residual, and regression methods [28].
This study applied traditional statistical and geospatial methods, and in addition a novel machine learning approach, i.e., the RF Wrapper function of the Boruta Algorithm, which had not been previously employed in soil contamination studies in the literature. These techniques of analysis were employed to explore the spatial variability of heavy metals not only at the surface soil, but also at the subsurface and deep-soil, and to further assess the mobility factor due to the leaching of heavy metals in the lower horizons of the soil profile.

Study Area
The study area was located in the industrial zone of Sheikhupora (31 • 20 N to 33 • 05 N and 73 • 37 E to 74 • 41 E) ( Figure 1). A total of 220 major industries are located in this area, which includes tannery and leather, shoes, textile, chemical industries, batteries and tires, rubber, pipe, food, glass, oil and polyester, flour mills, steel mills, rice mils, and fertilizer industries. These industries generate about 395 cusecs of effluent sewage that is released directly into municipal sewers and open surface drains, and ultimately end up in the river Ravi [29]. The discharged effluent contains carcinogenic and hazardous substances and a high level of heavy metals, aromatic dyes, inorganic salts, and organic material [30]. and hazardous substances and a high level of heavy metals, aromatic dyes, inorganic salts, and organic material [30].

Soil Sampling and Lab Analysis
A random soil sampling technique was employed for the collection of soil samples to make sure that all sites in the study area were represented in the soil sampling [31]. A total of 60 sites were sampled at three depths (0-15 cm, 15-30 cm, and 60-90 cm) and a total of 180 (60 × 3) soil samples were collected with a tube auger ( Figure 1). Each soil sample was represented by three replicates that were composited as a single sample. The spatial coordinates of the sampling locations were recorded using a handheld Garmin Global Positioning System (GPS).
The collected soil samples were placed in labelled paper bags and transported to the lab for further analysis. The soil samples were dried, and subsequently, crushed and passed through a fine sieve of 2 mm [32]. Soil chemical properties including EC, pH, P, K + , and saturation were determined in soil samples using the standard procedures [33,34]. Soil organic matter (OM) was determined through Tyurin's method [35,36]. The acid digestion method followed by USEPA standards 3051A and 3050B was employed to evaluate the total concentrations of selected heavy metals in the soil samples [37,38]. Further, the digested soil samples were analysed for the selected heavy metals (Cr, Pb, and Cd) using the Graphite Furnace Atomic Absorption Spectrometer (GFAAS) by means of their respective wavelengths [39,40]. The metal concentrations were expressed in units of mg kg −1 of the dry soil. The concentrations of all examined physiochemical properties and selected heavy metals for the three depths of soil were arranged along with their spatial coordinates in ArcGIS Geodatabases. The soil data was further explored using exploratory data analysis techniques.

Descriptive Statistics and the Correlation Matrix of Soil Parameters
The heavy metals present in the soil and other chemical properties were statistically analysed using the SPSS software (IBM Corp., Armonk, NY, USA) [41] to determine the descriptive statistics (max. min., standard deviation, kurtosis and skewness) and coefficient of correlation. The distribution pattern of all parameters for the three depths of soil was estimated based upon the SD

Soil Sampling and Lab Analysis
A random soil sampling technique was employed for the collection of soil samples to make sure that all sites in the study area were represented in the soil sampling [31]. A total of 60 sites were sampled at three depths (0-15 cm, 15-30 cm, and 60-90 cm) and a total of 180 (60 × 3) soil samples were collected with a tube auger ( Figure 1). Each soil sample was represented by three replicates that were composited as a single sample. The spatial coordinates of the sampling locations were recorded using a handheld Garmin Global Positioning System (GPS).
The collected soil samples were placed in labelled paper bags and transported to the lab for further analysis. The soil samples were dried, and subsequently, crushed and passed through a fine sieve of 2 mm [32]. Soil chemical properties including EC, pH, P, K + , and saturation were determined in soil samples using the standard procedures [33,34]. Soil organic matter (OM) was determined through Tyurin's method [35,36]. The acid digestion method followed by USEPA standards 3051A and 3050B was employed to evaluate the total concentrations of selected heavy metals in the soil samples [37,38]. Further, the digested soil samples were analysed for the selected heavy metals (Cr, Pb, and Cd) using the Graphite Furnace Atomic Absorption Spectrometer (GFAAS) by means of their respective wavelengths [39,40]. The metal concentrations were expressed in units of mg kg −1 of the dry soil. The concentrations of all examined physiochemical properties and selected heavy metals for the three depths of soil were arranged along with their spatial coordinates in ArcGIS Geodatabases. The soil data was further explored using exploratory data analysis techniques.

Descriptive Statistics and the Correlation Matrix of Soil Parameters
The heavy metals present in the soil and other chemical properties were statistically analysed using the SPSS software (IBM Corp., Armonk, NY, USA) [41] to determine the descriptive statistics (max. min., standard deviation, kurtosis and skewness) and coefficient of correlation. The distribution pattern of all parameters for the three depths of soil was estimated based upon the SD and skewness values. Datasets of a variable with skewness in the range −1 and 1 were considered to be normally distributed [42].

Selection of Soil Parameters Using the Random Forest Classifier
In order to classify the soil parameters according to their variability or consistent distribution patterns at different soil depths, a feature selection of algorithm Boruta [43] was applied with the Classifier (RF) function. The RF classifier is a machine learning function of the wrapper method through the Boruta algorithm [44]. The wrapper function of RF identifies relevant parameters by performing multiple runs of the provided classification factors that test the performance of different subsets of the input parameters [45]. The RF function is a suitable alternative for the analysis of soil parameters at different depths, as it can be employed without extensive parameter tuning and returns an estimate of the feature's importance (Z-score) [46]. This parameter selection method is different from other dimensionality reduction techniques, such as PCA or other methods. The purpose was to identify the compound linkages of the concentration of heavy metals in the geographical context of the soil profiles. Unlike the PCA, this method attempts to reduce the number of attributes in the dataset, without reducing the dimensionality [47].
In the present scenario, the soil's physio-chemical parameter (soil saturation, EC, OM, pH, P & K) and heavy metals (Cr, Cd, and Pb) were evaluated using the RF, while soil depths were assigned as the classification factor. Therefore, it was likely to identify the effective random soil parameters that may explain some of the variability in the targeted classes of soil profile by obtaining the positive Z-score by a single run of the RF classifier (Equation (1)).
where PPV * represents the approximate positive predicted values, N relevant (X contrast ), and N relevant (X original ) constitute a number of contrast and original variables respectively, which the algorithm considered significant. The importance scores of the original features were tested against these random variables and only parameters with a positive higher Z-score were retained as significant soil variability factors. The analysis was repeated for each depth of soil (0-15 cm, 15-30 cm, and 30-90 cm) to check the robustness of the parameters for the contamination risk at the subsoil level. This may examine the possibility of selected soil heavy metals for quantification according to their mobility at different depths.

Metal Mobility Ratio in Subsoil Profiles
A metal mobility factor (Equation (2)) was utilised to differentiate the natural concentration of heavy metals at different soil depths [48] from the anthropogenic contamination. The metal mobility factor describes the possible movement of the selected heavy metals out of the soil profiles [49].
where MMR denotes the metal mobility ratio, and M represents a concentration of metals (Mt1, Mt2, and Mt3 indicate total heavy metal concentration in the soil, Ma4, Ma5, Ma6 indicate available heavy metal enrichment for plants) at the three different fractions of soil depths. The mobility ratio of selected heavy metals (Cr, Cd, and Pb) denotes the vertical distribution of soil contamination parameters at different depths; however, to assess the variability and spatial distribution of heavy metals in the soil, the geostatistical approach was further employed to map the geographical extent of heavy metals.

Geostatistical Analysis and Variogram Modelling for Soil Heavy Metals
The spatial distribution of selected soil heavy metals (Cr, Cd, Pb) at different depths was assumed as a normal random distribution and was assessed using variogram modelling. Spatial distribution maps were generated to depict the spatial variability of heavy metals. Four variogram models (Matern, Spherical, Exponential, and Gaussian) were tested. The Matern (Ste) variogram, which had the smallest root mean square error (RMSE), was optimally suited and selected. Experimental variograms were estimated and fitted with the theoretical models. If a variogram reaches a certain limit that results in a certain effective range, the spatial structure and conditions for inherent spatial assumption can be achieved. The Matern variogram models were calculated [25] using the given Equation (3).
where σ 2 > 0 represents the variance, v > 0 is the shape parameter, α > 0 represents the scale (or range) of parameter, K v (.) forms the modified Bessel function of the second kind order, Γ(.) represents the gamma function, while |h| and v constitute the norm of vector h, v = 0.5. Further, the Matern model is the exponential model, while the Gaussian model represents a limiting case of the Matern model when n tends to infinity [50]. The variograms were calculated for the nine different parameters of soil at the three soil depths. The maximum sample variogram distance, taken from 1/3 of the range parameter, sill displayed the spatial autocorrelation component. The nugget parameter was assumed as the mean of the first three sample variogram values. The partial sill was assumed to be the mean of the last five sample values of the variogram and scaled through variance values. Furthermore, based on estimation and the smallest RMSE, the Matern variogram was considered as a suitable model to predict the spatial surfaces of heavy metals' concentration, except for Cr (15-30 cm), for which the spherical variogram was selected as the suitable model.

Kriging Interpolation
Spatial prediction to assess the variability of parameters implies the process of estimation of a target quantity's values at un-sampled locations. Ordinary kriging (OK) interpolation is the geostatistical method employed for the spatial distribution of heavy metals' concentration (Cr, Cd, and Pb) in soil at three depths. It entails the superior method for the estimation of interpolation weights and can provide error information for the creation of the thematic maps. The following formula was used to estimate the OK:Ẑ whereẐ(S 0 ). represents the interpolated or predicted value for S 0 point, and λ i . is the value for the equivalent weight of the Z(S i ) values, while Z(S i ) indicates the known value to satisfy the condition and ∑ N i = 1 λ i = 1. The SPSS software was utilised for the statistical analysis, GeoR (version 3.3.3) [51] the gstat package was used for the geostatistical analysis, and ArcGIS (ESRI, Redlands, CA, USA) [52] was applied for the interpolation of the distribution maps for heavy metals in soil [53,54].  Table 1 presents the descriptive statistics of all the soil variables. All the variables display a normal distribution, except potassium (K) (skewness value > 2) and EC (skewness = 1 in topsoil and subsoil; < 1 in deep-soil). The mean concentration of EC for the topsoil was significantly (p < 0.05) higher than that for the subsoil and deep-soil horizon. The mean pH (8.2-7.8) and EC values (4.9-4 µS cm -1 ) exhibit a decreasing trend from the topsoil to the subsurface soil. The order of heavy metal concentration in the soil was Cr > Pb > Cd, with a decreasing trend from the surface to the subsurface soil ( Table 1). The heavy metals' assimilation can be greatly influenced by the physiochemical properties of soil that, ultimately, also affect the root growth and mobility of the contaminants at different soil horizons [55].

The Exploratory Analysis of Soil Variables
The P concentration significantly increased from the topsoil to the subsoil (range 17.3-20.3 mg kg -1 ). Andersson found high P concentrations at the subsoil, as compared to the topsoil, that indicated that the subsoil acted as a sink or source for P leaching. Therefore, depending on the P concentration and the soil saturation level, the subsoil properties could be considered for the adoption of mitigation measures to reduce the P leaching in soil horizons [56].  Table 1 presents the descriptive statistics of all the soil variables. All the variables display a normal distribution, except potassium (K) (skewness value > 2) and EC (skewness = 1 in topsoil and subsoil; < 1 in deep-soil). The mean concentration of EC for the topsoil was significantly (p < 0.05) higher than that for the subsoil and deep-soil horizon. The mean pH (8.2-7.8) and EC values (4.9-4 µS cm -1 ) exhibit a decreasing trend from the topsoil to the subsurface soil. The order of heavy metal concentration in the soil was Cr > Pb > Cd, with a decreasing trend from the surface to the subsurface soil ( Table 1). The heavy metals' assimilation can be greatly influenced by the physiochemical properties of soil that, ultimately, also affect the root growth and mobility of the contaminants at different soil horizons [55].

The Exploratory Analysis of Soil Variables
The P concentration significantly increased from the topsoil to the subsoil (range 17.3-20.3 mg kg -1 ). Andersson found high P concentrations at the subsoil, as compared to the topsoil, that indicated that the subsoil acted as a sink or source for P leaching. Therefore, depending on the P concentration and the soil saturation level, the subsoil properties could be considered for the adoption of mitigation measures to reduce the P leaching in soil horizons [56].  Table 2 presents the results of the correlation coefficient for various soil properties. A positive significant (p ≤ 0.01) correlation coefficient (r = 0.55) was found between pH and EC. However, a non-significant (p ≥ 0.05) relationship (r = 0.12) was found between OM and pH, and a negative correlation was found between all heavy metals and pH. Many laboratory experiments and studies demonstrated the existence of a negative correlation coefficient between the mobility of heavy metals and pH [57]. A significant correlation was assessed between all three heavy metals and P, while OM indicated a positive relationship with Cr and Pb, but a negative non-significant (p ≥ 0.05) correlation with Cd. A negative correlation between Cd and Pb was discovered with soil saturation, while a positive correlation was found with Cr. The Cr with its oxidation states may affect greater downward mobility in the presence of high soil saturation [58]. Generally Cr is highly mobile in soil, however, Cr(VI), due to its anionic nature, can be adsorbed by clays, iron and manganese oxides and hydroxides in acidic soil because of an increase in positively charged sites at mineral surfaces [59]. The retention of Cr(VI) decreases with increasing pH, resulting in higher mobility in the subsurface soil due to weak adsorption in the alkaline to neutral conditions [60]. Similarly, Pb and Cd solubility and movement is highly dependent on soil pH and the type of soil separates. Both metals are highly available naturally at low pH, but since the study site is repeatedly irrigated with the wastewater enriched with the heavy metals, the concentration will govern its bioavailability and movement in the soil [61]. The metal to metal correlation indicated the mutual variability in metal accumulation and their bioavailability in soil [62]. The random pattern of correlation among each soil parameter is observed from each layer of soil. Therefore, to check the relevance and importance of each soil parameter with regard to soil depth, a wrapper algorithm with the RF classifier [63] was employed using the R package Boruta [46]. * p < 0.05, ** p < 0.01, ± the mean concentration of all soil parameters was assessed for correlation analysis.

Assessment of Soil Contaminant Parameters
Further investigation of the soil parameters was required to sort and denote the soil contamination factors with regard to the downward soil profile at the random spatial scale. Therefore, a combination of multiple parameters (parameters = 9 and depths = 3) were tested and assessed with the wrapper RF Boruta algorithm that performed three iterations based on different classes of soil depths to attain reliable soil contamination plots for the study area. The outcomes associated with this method confirmed the relative significance values of all soil parameters (soil saturation, EC, OM, pH, P & K, and Cr, Cd, and Pb) in this study. In Figure 3, the Y-axis represents the Z-score values to discriminate the importance of each soil property (X-axis) for the given random samples. The significance of the soil parameters was evaluated with regard to spatial distribution and downward availability from the topsoil to subsoil profiles, and the relative importance factor was assessed as Z-scores ≥ 1. The blue boxes represent the shadow variables. In RF, randomness is assigned to the data, and the shuffle copies of all parameters are created as shadow features. At every iteration of soil depth, whether a real soil parameter had a higher critical score than its shadow feature was determined.

Assessment of Soil Contaminant Parameters
Further investigation of the soil parameters was required to sort and denote the soil contamination factors with regard to the downward soil profile at the random spatial scale. Therefore, a combination of multiple parameters (parameters = 9 and depths = 3) were tested and assessed with the wrapper RF Boruta algorithm that performed three iterations based on different classes of soil depths to attain reliable soil contamination plots for the study area. The outcomes associated with this method confirmed the relative significance values of all soil parameters (soil saturation, EC, OM, pH, P & K, and Cr, Cd, and Pb) in this study. In Figure 3, the Y-axis represents the Z-score values to discriminate the importance of each soil property (X-axis) for the given random samples. The significance of the soil parameters was evaluated with regard to spatial distribution and downward availability from the topsoil to subsoil profiles, and the relative importance factor was assessed as Z-scores ≥ 1. The blue boxes represent the shadow variables. In RF, randomness is assigned to the data, and the shuffle copies of all parameters are created as shadow features. At every iteration of soil depth, whether a real soil parameter had a higher critical score than its shadow feature was determined.  . The Relevance feature selection of different parameters in soil by using the RF Boruta Algorithm in machine learning (n = 180, Classes = 3 depths of Soil) * p < 0.05 (Green represents the important parameters for contamination at three soil depth classes; blue presents the shadow variables through RF machine learning; yellow indicates the tentative variables; red represents significant variability with reference to soil depths); (X-axis = soil parameters; Y-axis = importance of parameters at depth in Z-score).
Large Z-scores of a parameter indicated significant (p ≤ 0.05) importance. The distribution of Pb was found to be random with substantial importance (Z ≥ 30) which showed significant permutations in comparison to other heavy metals (Cr ≤ 30 and Cd ≥ 20). This indicates that the concentration of heavy metals has higher Z-scores based on permutations of spatial variability and the heterogeneity of metals at different soil depths [64]. The importance of P was also found to be significant (p ≤ 0.05, Z-score ≤ 10) in the consideration of the soil analysis with RF, it is the critical nutrient in soil and vastly used in agricultural soil as a fertilizer [65] and showed significant leaching from topsoil to subsoil (p ≤ 0.05). The RF importance score for EC with Z-score ≥ 1.0 implies the least significance (10%) as compared to other significant soil parameters. The importance of score K is estimated as a tentative parameter by a simple comparison of the median attribute, which was not confirmed as significant. The saturation (Sat), OM, and pH were indicated as not confirmed by the RF (p ≥ 0.05), which may indicate some assortment of soil horizons with the least important Z-score (≤1.0). Furthermore, it is evident that the soil parameters that invariably yielded the high importance Z-scores in each class of soil depth during the individual RF runs were selected as important. The classification performance and assortment of soil parameters was directly affected by the permutation process of the RF. The relatively high importance factor revealed its permutations by using the RF algorithm. Since the whole process is dependent on permutation without reducing the dimension, dimensionality reduction techniques such as PCA [66] were replaced by RF to explore the distribution of soil parameters at depth. Additionally, the mobility assessment of important soil heavy metals at the individual level is necessary for further investigations at subsoil horizons.

Estimation of Metal Mobility and Variability at Vertical Soil Profile
For the RF classifier, the heavy metals were selected as essential contamination factors with significant speciation at different soil depths. The degree of contamination and downward percolation of heavy metals in soil was assessed using the metal mobility ratio. The mobility ratio of the total heavy metals' concentrations and environmentally available metals' concentrations at different depths were assessed. The metal mobility ratio was calculated for the leachate of Cr, Cd, and Pb at three different depths of soil. Figure 4 revealed that the topsoil contained >50% mobility for Cr, while for the middle soil, the value of metal mobility ratio (MMR) was 30%, and 0-20% MMR for subsoil. The mobility of Cd was 40% at the topsoil profile, while there was >45% mobility in the middle soil depth and >20% mobility at subsoil ( Figure 5): this indicates the highest speciation of Cd below the surface soil. Pb displayed the highest mobility at the topsoil horizon, with >60% MMR, while it was 20% in the middle and subsoil MMR. This demonstrates a low leachate of Pb in the deep-soil ( Figure 6). The concentration of heavy metals that can pose a danger to the ecosystem and plants was represented as the environmentally available level. The comparison of the available and total concentrations of heavy metals in the three soil depths indicated that the mobility ratio of Pb is higher than that of Cr and Cd. Furthermore, the percentage of water-soluble Cd concentration in the subsoil was higher than that obtained for the effluent irrigated topsoil. The concentration of Cr indicated more water-soluble, exchangeable, and carbonated bound forms in the topsoil profile [67]. The heavy metals' speciation and their mobility associated with toxicity at soil depths were critical to understand and evaluate the bioavailability of environmental risks. However, various variogram models for each heavy metal were evaluated for all three depths before generating their respective spatial interpolation maps.

Assessment of Appropriate Variogram Models
The spatial analysis of heavy metals in soil is considered as a prerequisite for designing effective strategies for the improved management of contaminated soil [68]. For that purpose, the selection of the best-fitted variogram model to explore the spatial structure, and ultimately, generate accurate perdition maps at the un-sampled location is required. The variogram models for the soil heavy metals were estimated based on cross-validation. The best-fit parameters of the variogram models are presented in Table 3. The best-fit experimental variogram models were selected based on the lowest RMSE values obtained by the GeoR software.
The spatial correlation of the soil heavy metals was assessed through the estimation of the ratio of the nugget effect in relation to the sill. A variable was considered to be strongly spatially dependent if the ratio was ≤ 25%; moderately spatially dependent if the ratio ranged between 25% and 75%; and weakly dependent if the ratio exceeded 75%. The heavy metals' concentration was indicated to have a moderate to weak dependency for the spatial variability on the topsoil to subsoil profiles. Therefore, it indicated that the concentration of heavy metals at the subsoil profile is affected by the slow leachate process. High soil saturation, with less permeability and a high porosity of soil at the deep-soil level, may also affect the variability [69]. The variogram of Cd (60-90 cm soil depth) (Figure 7ii,iii) showed some fluctuations with a small upward trend with a low sill, suggesting that there is less spatial autocorrelation. This may lead to heterogeneity in the spatial distribution of Cd. The variograms for the concentrations of Pb (Figure 8i-iii) and Cr (Figure 6i-iii) are normally well structured with small nugget effects, demonstrating that the sampling density is acceptable for the spatial structures with moderate spatial dependency. According to the nugget/sill ratios of heavy metals' (Cr, Cd, Pb) concentration, the selected variograms were found to be suitable for the adoption of interpolation techniques to predict attribute values at un-sampled locations using related information.

Assessment of Appropriate Variogram Models
The spatial analysis of heavy metals in soil is considered as a prerequisite for designing effective strategies for the improved management of contaminated soil [68]. For that purpose, the selection of the best-fitted variogram model to explore the spatial structure, and ultimately, generate accurate perdition maps at the un-sampled location is required. The variogram models for the soil heavy metals were estimated based on cross-validation. The best-fit parameters of the variogram models are presented in Table 3. The best-fit experimental variogram models were selected based on the lowest RMSE values obtained by the GeoR software.
The spatial correlation of the soil heavy metals was assessed through the estimation of the ratio of the nugget effect in relation to the sill. A variable was considered to be strongly spatially dependent if the ratio was ≤ 25%; moderately spatially dependent if the ratio ranged between 25% and 75%; and weakly dependent if the ratio exceeded 75%. The heavy metals' concentration was indicated to have a moderate to weak dependency for the spatial variability on the topsoil to subsoil profiles. Therefore, it indicated that the concentration of heavy metals at the subsoil profile is affected by the slow leachate process. High soil saturation, with less permeability and a high porosity of soil at the deep-soil level, may also affect the variability [69]. The variogram of Cd (60-90 cm soil depth) (Figure 7ii,iii) showed some fluctuations with a small upward trend with a low sill, suggesting that there is less spatial autocorrelation. This may lead to heterogeneity in the spatial distribution of Cd. The variograms for the concentrations of Pb (Figure 8i-iii) and Cr (Figure 6i-iii) are normally well structured with small nugget effects, demonstrating that the sampling density is acceptable for the spatial structures with moderate spatial dependency. According to the nugget/sill ratios of heavy metals' (Cr, Cd, Pb) concentration, the selected variograms were found to be suitable for the adoption of interpolation techniques to predict attribute values at un-sampled locations using related information.

Spatial Distribution of Heavy Metals in Soil
The spatial extent of the Cr concentration reveals a homogenous trend in the topsoil ( Figure 7A) and subsoil ( Figure 7B) around the tanning industries in the (northern) N and (western) W part of the study area. In the topsoil, high concentrations of Cr in the eastern (E) part of the study area exhibited trans-boundary chromium discharge in drains from the tanning industry in the neighbouring areas [70]. An extensive amount of untreated effluent containing chromium is discharged by the tanning industries and released directly into the drains for irrigation purposes [71]. Figure 7C reveals a different spatial trend for Cr with E to W, with a maximum concentration of 20-24 mg kg -1 in deeper soil profiles that indicate Cr leached into the deep-soil from the anthropogenic source. The Cr percolation may vary with different valent stats of metal through the processes of oxidation and reduction [72]. The spatial extent of Cd displays a higher concentration in the central part of the study area in patches while it covers greater spatial extent of higher values in subsoil and deep-soil ( Figure 8A-C). It showed the highest Cd concentrations, ≥8 mg kg -1 in Figure 8A topsoil, ≥7 mg kg -1 in Figure 8B subsoil, and ≥5 mg kg -1 in Figure 8C deep-soil at the central part, with a south (S) and S-E trend of Cd around the industrial zone of the study area. While, a medium concentration (≤5 mg kg -1 ) was found in the extreme eastern part at all soil depths. Low concentrations (≤4 mg kg −1 -≥2 mg kg −1 ) were found in the W parts for the topsoil and the subsoil. While a ≥2 mg kg -1 Cd concentration was found in the N-W part and ≥3 mg kg −1 in western parts of the deep-soil. There was a narrow outlier of high Cd concentrations in the extreme W portion of the study area that may be associated with the industrial point source pollution, such as the Cd battery production plants indicated in Figure 8C. The Cd concentration decreased spatially from top to deep-soil layer, however, the subsoil formed a transitional layer with similar distribution patterns while the deep-soil demonstrated more stable and consistent deposition of Cd, and the same was reported by Onweremadu [73]. The Pb in all three depths of soil revealed higher concentration in the N-E quadrant ( Figure 9A) of the study area, while a small-scale discontinuity pattern of Pb was found with a concentration level of 6 mg kg −1 ( Figure 9B) in the subsoil. The patterns of medium and low concentrations in Figure 9C were observed under the lead emission drain that form slow sinks of Pb in the deep-soil profiles [74].
The overall heavy metals' concentration maps revealed that high amounts of Cr, Cd, and Pb were found in soil profiles in the north direction and the central part of the study area. This area is subject to the discharge of different drains that carry industrial effluents that are used to irrigate the agricultural fields. This causes heavy metal emission in the soil, with a continuous uptake of contamination [75]. Clusters of many industries were discovered in the study area based on our survey that constitutes the source of untreated wastewater. The industries and the discharged effluent waste form the major possible sources of the leading extrinsic factors responsible for a high concentration of heavy metals in soil in this area.
This research would be instrumental in the assessment of heavy metals' contamination of soils and their remediation strategies. The reclamation of contaminated soil with heavy metals at different depths can be achieved with an appropriate understanding of the trends, availability, and leaching of metals downward. Further, heavy metal contamination can be assessed at subsequent groundwater levels, in cultivated plants, and humans. Scientific information regarding the spatial variance of measured soil properties is essential for the formulation of sustainable soil and environmental management decisions [76]. extreme eastern part at all soil depths. Low concentrations (≤ 4 mg kg −1 -≥ 2mg kg −1 ) were found in the W parts for the topsoil and the subsoil. While a ≥ 2 mg kg -1 Cd concentration was found in the N-W part and ≥ 3 mg kg −1 in western parts of the deep-soil. There was a narrow outlier of high Cd concentrations in the extreme W portion of the study area that may be associated with the industrial point source pollution, such as the Cd battery production plants indicated in Figure 8C. The Cd concentration decreased spatially from top to deep-soil layer, however, the subsoil formed a transitional layer with similar distribution patterns while the deep-soil demonstrated more stable and consistent deposition of Cd, and the same was reported by Onweremadu [73]. The Pb in all three depths of soil revealed higher concentration in the N-E quadrant ( Figure 9A) of the study area, while a small-scale discontinuity pattern of Pb was found with a concentration level of 6 mg kg −1 ( Figure 9B) in the subsoil. The patterns of medium and low concentrations in Figure 9C were observed under the lead emission drain that form slow sinks of Pb in the deep-soil profiles [74]. The overall heavy metals' concentration maps revealed that high amounts of Cr, Cd, and Pb were found in soil profiles in the north direction and the central part of the study area. This area is subject to the discharge of different drains that carry industrial effluents that are used to irrigate the agricultural fields. This causes heavy metal emission in the soil, with a continuous uptake of contamination [75]. Clusters of many industries were discovered in the study area based on our survey that constitutes the source of untreated wastewater. The industries and the discharged effluent waste form the major possible sources of the leading extrinsic factors responsible for a high concentration of heavy metals in soil in this area.
This research would be instrumental in the assessment of heavy metals' contamination of soils and their remediation strategies. The reclamation of contaminated soil with heavy metals at different depths can be achieved with an appropriate understanding of the trends, availability, and leaching of metals downward. Further, heavy metal contamination can be assessed at subsequent groundwater levels, in cultivated plants, and humans. Scientific information regarding the spatial variance of measured soil properties is essential for the formulation of sustainable soil and environmental management decisions [76].

Conclusions
Continuous effluent discharge contaminates the soil in the vicinity of industrial areas. It was discovered in this study that soil is being contaminated with the continuous loading of heavy metals at different levels. The EC values were found to be low from the topsoil to the subsoil and exhibited negative correlations between heavy metals, while they shared a positive correlation with pH. However, pH and EC were found to have a negative correlation with all the heavy metals. The RF function was found to be extremely useful for assortments that possess a significant variability in subsoil horizons without changing or reducing the parameters. In the RF EC, P, Cr, Cd and Pb were found as significant (p ≤ 0.05 and Z score ≥ 2) soil parameters across all soil depths. The soil mobility ratio identified the different percentages of heavy metal leachate with a downward vertical scale. The spatial extent and geographical distribution pattern of the heavy metals were assessed through the geostatistical technique of kriging, based upon the most suitable variogram model. The variogram modelling for the soil contamination sorted at three different depths of soil and the spatial distribution of heavy metals provided a comprehensive depiction of soil pollution in the industrial area. The N-W and central part of the study exhibited maximum contamination at all soil depths.

Conclusions
Continuous effluent discharge contaminates the soil in the vicinity of industrial areas. It was discovered in this study that soil is being contaminated with the continuous loading of heavy metals at different levels. The EC values were found to be low from the topsoil to the subsoil and exhibited negative correlations between heavy metals, while they shared a positive correlation with pH. However, pH and EC were found to have a negative correlation with all the heavy metals. The RF function was found to be extremely useful for assortments that possess a significant variability in subsoil horizons without changing or reducing the parameters. In the RF EC, P, Cr, Cd and Pb were found as significant (p ≤ 0.05 and Z score ≥ 2) soil parameters across all soil depths. The soil mobility ratio identified the different percentages of heavy metal leachate with a downward vertical scale. The spatial extent and geographical distribution pattern of the heavy metals were assessed through the geostatistical technique of kriging, based upon the most suitable variogram model. The variogram modelling for the soil contamination sorted at three different depths of soil and the spatial distribution of heavy metals provided a comprehensive depiction of soil pollution in the industrial area. The N-W and central part of the study exhibited maximum contamination at all soil depths. This, therefore, constituted the major effluent irrigated area for raising crops and vegetables. It is recommended that agriculture crops and vegetables are not irrigated with waste water, to eliminate the toxic effects that it may have on humans. Instead, it should be used for non-food crops. Further research is needed to assess the concentration of heavy metals in the groundwater, field crops, and their impact on human health.