Spatiotemporal Variability Assessment of Trace Metals Based on Subsurface Water Quality Impact Integrated with Artificial Intelligence-Based Modeling

Increasing anthropogenic emissions due to rapid industrialization have triggered environmental pollution and pose a threat to the well-being of the ecosystem. In this study, the first scenario involved the spatio-temporal assessment of topsoil contamination with trace metals in the Dammam region, with a focus on two zones: the industrial (ID) and the agricultural (AG) areas. For this purpose, more than 130 spatially distributed samples of topsoil were collected from residential, industrial, and agricultural areas. Inductively coupled plasma optical emission spectroscopy (ICP-OES) was used to analyze the samples for various trace metals. The second scenario involved the creation of different artificial intelligence (AI) models, namely an artificial neural network (ANN) and a support vector regression (SVR), for the estimation of zinc (Zn), copper (Cu), chromium (Cr), and lead (Pb) using feature-based input selection. The experimental outcomes showed that the average concentration levels of the heavy metals (HMs) were as follows: chromium (Cr) (31.79 ± 37.9 mg/kg), copper (Cu) (6.76 ± 12.54 mg/kg), lead (Pb) (6.34 ± 14.55 mg/kg), and zinc (Zn) (23.44 ± 84.43 mg/kg). The modelling accuracy, based on different evaluation criteria, showed that the agricultural and industrial stations achieved goodness-of-fit ranges of 51–91% and 80–99%, respectively. This study concludes that AI models could be successfully applied for the rapid estimation of soil trace metals and related decision-making.


Introduction
Heavy metals (HMs) and trace elements (TEs) are among the most critical environmental contaminants. They occur in soil, water, and air, and they aggravate water scarcity and pose a severe threat to water quality and groundwater [1,2]. In recent years, HM contamination of the environment has become a growing environmental and public health problem around the world. Furthermore, human exposure has increased considerably because of the exponential expansion in the use of HMs in agriculture, industry, technology, and urban applications [1,2]. According to the World Health Organization (WHO), rapid industrialization, agricultural activity, and urbanization have endangered the ecosystem through changes in several physicochemical parameters, such as dissolved oxygen (DO), total organic carbon (TOC), pH, biological oxygen demand (BOD), electrical conductivity (EC), total dissolved solids (TDS), temperature, total suspended solids (TSS), turbidity, total alkalinity, chemical oxygen demand (COD), and nutrients, as well as through HMs [3]. For decades, technical research has addressed system identification for science and engineering problems [21][22][23][24][25][26][27]. The superiority of data-driven models is attributed to factors such as the way the models are built, the type of learning, the data type, and basin characteristics. Hence, modelling complex processes such as those governing HMs requires both black-box and white-box expertise to handle the stochastic and experimental aspects [28][29][30][31]. A number of technical studies have simulated HMs using AI-based models such as artificial neural networks (ANN), adaptive neuro-fuzzy inference systems (ANFIS), and support vector machines (SVM); other works on the application of AI in the area of HMs include [12,16,32,33]. Recently, Yaseen [3] conducted a comprehensive review of soft computing models for modelling HMs.
In this regard, the works published from 2000 to 2021 indicate clear interest in this domain (HM simulation) around the world.
Although various studies of AI-based HM simulation have been published, several aspects remain to be explored with respect to the long-term viability of modelling HMs. The current discussion is centered on the limitations of traditional and chemometric methodologies. There is a need for new data pre-processing, such as feature selection and spatiotemporal linkage using remote sensing (RS) and geographical information systems (GIS), to understand the processes of chemical reaction and energy balance. Although Saudi environmental regulations require environmental impact assessment (EIA) studies and monitoring programs, only a few soil pollution studies in the Kingdom have been reported. The objectives of the current study are: (i) to assess the geochemical condition of HM contamination of topsoil in the Dammam region, Eastern Saudi Arabia, based on spatially distributed samples from the industrial and agricultural areas; and (ii) to employ different AI-based models, built on a dependency-based feature selection approach, for four essential HMs (Zn, Cu, Cr, and Pb). The principal motivation of this study is to conduct a sensitivity analysis that inspects the potential influence of various parameters on the target variables, which will ease attribute selection.

Study Area and Sample Locations
Dammam is located between latitudes 26°20′ and 26°32′ N and longitudes 49°49′ and 50°09′ E. It is an important port on the Arabian Gulf in the east of Saudi Arabia and the largest city in the eastern region, with a population of over a million. Some of the world's most important centers for the production and refining of petroleum are located close by. The city is also surrounded by many farms that produce dates and other fruits and vegetables, and it has two main industrial cities for small to medium-sized industries. Dammam has recently experienced migration, sub-urbanization, and rapid industrialization. The geological map and the AG and ID areas considered are presented in Figure 1. The topsoil samples were collected at 132 locations from four different zones in the Dammam area over a two-month period (February to March 2014). The procedure used is considered a powerful acid digestion process capable of dissolving all elements that are naturally widespread in the environment.

Proposed Methodology
The modelling was carried out using the experimental data on trace metal levels in the topsoil of the Dammam area, Saudi Arabia. The data summary includes the average and standard deviation of each factor under consideration for each zone. Each trace metal concentration is expressed in milligrams per kilogram of soil (mg/kg). Data (inputs and outputs) were prepared for each of the eleven (11) elements under study, namely arsenic (As), barium (Ba), cadmium (Cd), chromium (Cr), copper (Cu), mercury (Hg), nickel (Ni), lead (Pb), titanium (Ti), vanadium (V), and zinc (Zn). This work investigated the viability of using soft computing to estimate trace metals. Selecting the best methodology or the most appropriate solution for a particular situation is challenging for forecasters. Data pre-processing, including normalization, outlier removal, cleaning, and detection of missing data, was carried out for all inputs and outputs before model development, and cross-validation was employed to guard against overfitting and underfitting in the training and testing data. The data were split into 70% for calibration and 30% for verification [34][35][36]. Furthermore, 10-fold cross-validation was employed during the modelling. In this technique, the data are split into k sets of equal size. On the first trial, the first set is used as the test data, while the remaining sets are used to train the model. On the subsequent trial, the second set is used as the test set, and so on until every set has served once as the test set. In practice, determining whether one feature selection method is superior to another is challenging. Hence, this study employed correlation-based feature selection to understand the relationships among the variables at the AG and ID stations (Figure 2a,b).
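The data partitioning described above can be illustrated with a minimal sketch. It assumes NumPy and 33 samples per zone (the per-zone count reported for the sampling campaign); the function names, the shuffling, and the random seed are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def train_test_split_indices(n_samples, test_fraction=0.3, seed=0):
    """Shuffle sample indices and split them into calibration/verification sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)  # fold sizes differ by at most one sample
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Assumption: 33 samples in one zone.
calib_idx, verif_idx = train_test_split_indices(33)   # roughly 70% / 30%
folds = list(k_fold_indices(33, k=10))                # 10-fold cross-validation
```

With 33 samples the folds cannot be exactly equal, so `np.array_split` produces folds of three or four samples each.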

Analysis of Soil Sampling
Experimental samples were prepared according to USEPA Method 3050B for the digestion of soil, sediment, and sludge [37]. Subsequently, they were analyzed using an inductively coupled plasma optical emission spectrometer (ICP-OES) from SPECTRO Analytical Instruments, Germany [37]. The chemical reagents employed complied with the standards of the ACSCAR (American Chemical Society's Committee on Analytical Reagents). The reagents included distilled water (DI), concentrated nitric acid (HNO3), concentrated hydrochloric acid (HCl), and 30% hydrogen peroxide (H2O2). Because the digestion required the use of acid, it was done in a fume hood under the supervision of an expert and with certified and recommended laboratory safety equipment. The equipment was calibrated using a multi-element standard solution. To confirm the equipment's appropriateness and accuracy, six working standard samples and one blank were used. Each batch of processed samples was also subjected to quality control techniques. Each batch had 20 samples: one duplicate, two spiked samples, two blank samples, and two standard samples.
For sampling purposes, the Dammam area was divided into four (4) zones: residential (R), industrial (ID), agricultural (AG), and background (BG) areas; however, for this study, only the AG and ID areas were considered, as shown in Figure 3. The background area was selected west of Dammam, away from any known industrial, agricultural, or residential activities. From each zone, 33 representative samples were collected, for a total of 132 samples. Representative soil samples of a uniform soil type in each sampling zone were collected using an auger at a depth of 10–15 cm. The geographical coordinates of all sample locations were recorded with a handheld Global Positioning System (GPS) instrument (Garmin eTrex 20). In addition, Google Maps, Google Street View, and satellite imagery were used to decide on the best sample locations and to avoid repeated sampling. The collected samples were then stored in polythene bags, placed in a hard box casing, and transported to the environmental laboratory at the Geosciences Department of KFUPM.

Artificial Neural Network (ANN)
The ANN is a form of artificial intelligence inspired by the study of biological neurons that simulates how the human brain processes information [38]. It is a computational model that maps the received inputs to outputs through several processing elements, each applying a predefined activation function. It can analyze the relationships between inputs from multiple sources in an intuitive way [39] (Figure 4). In soil applications, several researchers have used the ANN to predict soil properties with reasonable accuracy. Licznar and Nearing [40] used the ANN to predict soil erosion and runoff. Ramadan et al. [41] applied the ANN to estimate the percentage of some soil properties (clay, sand, silt, and organic carbon) from a microbial community DNA dataset. Zhao et al. [42] produced high-resolution maps of soil properties (soil texture, soil organic carbon, and soil drainage) based on DEM-generated topo-hydrological data.
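To make the idea of processing elements and activation functions concrete, the following is a minimal sketch of the forward pass of a one-hidden-layer network. The layer sizes, the tanh activation, and the random weights are illustrative assumptions; the text does not specify the architecture used in the study.

```python
import numpy as np

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> tanh hidden layer -> linear output."""
    hidden = np.tanh(x @ w_hidden + b_hidden)  # activation of each hidden node
    return hidden @ w_out + b_out              # one predicted value per sample

rng = np.random.default_rng(0)
n_inputs, n_hidden = 3, 5                      # hypothetical layer sizes
w_hidden = rng.normal(size=(n_inputs, n_hidden))
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(size=(n_hidden, 1))
b_out = np.zeros(1)

x = rng.normal(size=(4, n_inputs))             # four synthetic input samples
y_hat = forward(x, w_hidden, b_hidden, w_out, b_out)
```

In training, the weights and biases would be adjusted iteratively to reduce the error between `y_hat` and the measured concentrations; that step is omitted here.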

Support Vector Regression (SVR)
Support vector regression (SVR) is the regression form of the support vector machine (SVM), a machine learning technique that is widely used for classification, regression analysis, and pattern recognition [43,44]. It was developed by Vapnik [45]; its structure is shown in Figure 5. The SVR has been used as a standalone technique or combined with other machine learning techniques (e.g., ANN) to predict, map, and model soil properties (e.g., soil moisture, infiltration rate, soil salinity, organic content, and total hydrocarbon) in several studies, such as [46][47][48][49][50][51]. Several researchers have also used SVR models to estimate the concentration of trace and heavy metals in soil [52][53][54]. Equation (1) represents the SVR function, where x_k denotes the support vectors and m their number, while the bias term (b) and the Lagrange coefficients α_k need to be determined analytically for optimal SVR network identification.
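Equation (1) itself is not reproduced in this text. For orientation, a commonly used form of the SVR function that is consistent with the notation described above is sketched here; the kernel function K is an assumption, since the text does not name the kernel that was used.

```latex
f(x) = \sum_{k=1}^{m} \alpha_k \, K(x, x_k) + b
```

Here x is the input vector, x_k are the m support vectors, α_k are the Lagrange coefficients, and b is the bias term.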

Top Soil's Trace Metal Impact on Subsurface Water Quality
In this regard, the issue of groundwater interaction with polluted soil is equally crucial. This research intends to map the spatial concentrations of HMs in the region's topsoil, identify the "hot spot" areas, and establish the regional distribution of the pollutants, allowing the authorities to monitor company operations and offer a decent quality of life for the general population. The presence of high quantities of trace metals in soil can contaminate groundwater. These substances can enter aquifers through a variety of pathways. Examples include contact of groundwater with geological formations that contain HMs; contact of underground water with surfaces that contain such metals; percolation of precipitation water carrying dissolved, colloidal, and suspended materials; and direct access from the land surface via wells. For instance, HMs include elements such as arsenic (As), which occurs naturally in the earth's crust. Volcanic ash, degradation of As-containing minerals, and ores dissolved in groundwater are some of the naturally occurring exposure mechanisms. HMs are found in food, water, soil, and air [8,12,54].
The water footprint (WF) indicates the amount of freshwater used by consumers or products in direct or indirect ways [55]. It is an accepted international indicator that reflects the human impact on the quantity and quality of water resources [56]. The water footprint consists of three components: the green water footprint (GWF), the blue water footprint (BWF), and the grey water footprint (grey WF). The GWF indicates the volume of rainwater consumed to produce a product or service. In other terms, it relates to the quantity of rainfall that, after being held in the root zone of the soil (called green water), is either lost by evaporation and transpiration or absorbed by plants. The BWF, by contrast, refers to the amount of water obtained from groundwater and surface water bodies. This water can originate from several sources, including shallow and deep aquifers, lakes, rivers, and wetlands, and the BWF indicates the amount of groundwater and surface water used in producing a product or service [57]. The grey WF refers to the quantity of water needed to dilute a certain amount of contamination in order to meet the required standard. The grey WF was introduced to account for ambient water quality standards by considering the water volume needed to dilute discharges from commercial, agricultural, mining, and municipal activities, covering both point and non-point sources of pollution. To meet the needs of several international health bodies and to maintain environmental sustainability as stated by the sustainable development goals, these three WF indicators have emerged as essential environmental indicators (Figure 6).
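As a rough illustration of how the three components combine, the sketch below uses the dilution-volume definition of the grey WF (pollutant load divided by the difference between the maximum acceptable and the natural background concentration), as given in the general water footprint literature. All numerical values are hypothetical and are not taken from this study.

```python
def grey_water_footprint(load_kg, c_max_kg_m3, c_nat_kg_m3):
    """Water volume (m3) needed to dilute a pollutant load to the ambient standard."""
    if c_max_kg_m3 <= c_nat_kg_m3:
        raise ValueError("standard must exceed the natural background concentration")
    return load_kg / (c_max_kg_m3 - c_nat_kg_m3)

def total_water_footprint(green_m3, blue_m3, grey_m3):
    """Total WF is the sum of the green, blue, and grey components."""
    return green_m3 + blue_m3 + grey_m3

# Hypothetical inputs, for illustration only.
grey = grey_water_footprint(load_kg=50.0, c_max_kg_m3=0.010, c_nat_kg_m3=0.002)
total = total_water_footprint(green_m3=1000.0, blue_m3=500.0, grey_m3=grey)
```

The grey component grows quickly as the natural background concentration approaches the standard, because less dilution capacity remains.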

Results and Discussion
The overall well-being of individuals and groups has attracted the global community's interest for decades. Such well-being cannot be measured purely by income and jobs; it also depends on the sustainability of the built environment and the inhabitants' physical and mental health. Dammam has seen significant urbanization and industrialization due to the discovery and production of oil and gas, as well as petrochemical and other industries. The area has become more urbanized due to industrialization, necessitating an evaluation of the topsoil in the area because many of the industries in the vicinity have the potential to release harmful HMs into the ecosystem. As mentioned in Section 1, the complex nature of HMs and associated elements motivates the use of emerging soft computing techniques and the Internet of Things. This study aims to assess the concentrations and spatial distributions of different trace elements in the selected AG and ID zones. Additionally, machine learning is applied for the simulation and modelling of four HMs: Zn, Cu, Cr, and Pb. Other trace elements can also be simulated using the same approach.

Spatiotemporal Analysis of Trace Metals
The American Association of State Highway and Transportation Officials (AASHTO) sieve analysis procedure was adopted to assess the concentrations and spatial distributions of different trace elements in the AG and ID areas. This provided a systematic and thorough methodology for the soil investigation. The grain size distribution and soil consistency were used to classify soil characteristics. The topsoil was classified using the Unified Soil Classification System. The geochemical map of the sample sites was created using AutoCAD (Autodesk: San Rafael, CA, USA) and ArcMap (Esri: Redlands, CA, USA) software. During sample collection, a portable GPS was used to record sample locations. The geochemical spatial distribution map of each element detected in the different regions was then created using Surfer 8 software (Golden Software: Golden, CO, USA). After analysis, the trace metal content at each location was combined with the geographical coordinates recorded during sample collection. Metal levels at the sampled locations are presented below in parentheses in mg/kg, as a central value (median or mean, as labelled) followed by the maximum and, where available, the minimum detected. Since the Kingdom of Saudi Arabia does not have well-defined guidelines regarding the limits of trace metals in soil, the Canadian Environmental Soil Quality Guidelines (CESQG) standards for the Protection of Environment and Human Health (PEHH) were adopted for comparison purposes.
The results of the study showed that barium (Ba) was higher in the industrial areas (Median = 120.90 mg/kg, Max = 1966.50 mg/kg, Min = 0.00 mg/kg), followed by the agricultural (Median = 32.78 mg/kg, Max = 100.65 mg/kg, Min = 7.80 mg/kg) and residential areas (Median = 33.77 mg/kg, Max = 98.55 mg/kg, Min = 0.33 mg/kg) (Figure 7). Some of the samples from the industrial areas exceeded the allowable limit of 500 mg/kg. The elevated levels of Ba in industrial areas can be associated with the use of Ba compounds or oxides in several industrial activities [58]. Barium nitrate, for example, is used in fireworks to give them a green colour.
Chromium (Cr) was highest in the industrial soil samples (Median = 26.99 mg/kg, Max = 247.60 mg/kg, Min = 0.12 mg/kg), followed by the residential area (Median = 25.65 mg/kg, Max = 120.20 mg/kg, Min = 0.07 mg/kg) and the agricultural areas (Median = 23.12 mg/kg, Max = 74.70 mg/kg, Min = 3.03 mg/kg). Some of the samples from each of the three sampled areas measured above the allowable limit of 74 mg/kg. The elevated levels of Cr in agricultural soil may be attributed either to natural sources or to atmospheric deposition of Cr-containing compounds, as presented in Figure 7.
Close analysis also indicated that lead (Pb) was highest in the industrial area (Median = 1.92 mg/kg, Max = 100.25 mg/kg, Min = 0.04 mg/kg), followed by the agricultural (Median = 4.61 mg/kg, Max = 52.35 mg/kg, Min = 0.90 mg/kg) and residential areas (Median = 1.89 mg/kg, Max = 25.60 mg/kg, Min = 0.08 mg/kg). None of the samples exceeded the allowable limit of 140 mg/kg. Lead occurs naturally in the environment in very small amounts.
The results also showed that low levels of As, Cd, Hg, and V were detected in the topsoil samples collected in the study. As was highest in the industrial area (Mean = 1.58 mg/kg, Max = 4.56 mg/kg), followed by the agricultural area (Mean = 1.52 mg/kg, Max = 3.14 mg/kg), while the lowest level was in the residential area (Mean = 0.97 mg/kg, Max = 2.22 mg/kg). None of the samples exceeded the threshold of 12 mg/kg. Cadmium (Cd) was highest in the industrial area (Median = 0.05 mg/kg, Max = 28.69 mg/kg, Min = 0.00 mg/kg), followed by the residential area (Median = 0.03 mg/kg, Max = 23.01 mg/kg, Min = 0.00 mg/kg) and the agricultural area (Median = 0.03 mg/kg, Max = 1.14 mg/kg, Min = 0.00 mg/kg). One sample from each of the industrial and residential areas measured above the allowable limit of 10 mg/kg, as the reported maxima indicate. Mercury (Hg) was highest in the industrial area (Median = 0.05 mg/kg, Max = 1.44 mg/kg, Min = 0.01 mg/kg), followed by the agricultural (Median = 0.05 mg/kg, Max = 1.212 mg/kg, Min = 0.00 mg/kg) and residential areas (Median = 0.03 mg/kg, Max = 0.59 mg/kg, Min = 0.00 mg/kg). None of the samples exceeded the allowable limit of 6.6 mg/kg.
Vanadium (V) was highest in the industrial area (Median = 14.78 mg/kg, Max = 20.42 mg/kg, Min = 0.09 mg/kg), followed by the agricultural (Median = 12.43 mg/kg, Max = 21.89 mg/kg, Min = 1.64 mg/kg) and residential areas (Median = 7.62 mg/kg, Max = 17.73 mg/kg, Min = 0.02 mg/kg). None of the samples exceeded the allowable limit of 130 mg/kg.
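The comparison against guideline limits can be expressed as a simple screening step. The sketch below uses only the allowable limits and the maximum concentrations quoted in this section for the industrial and agricultural areas; the dictionary layout and function name are illustrative assumptions.

```python
# Allowable limits (mg/kg) as quoted in the text.
limits = {"Ba": 500.0, "Cr": 74.0, "Pb": 140.0, "As": 12.0,
          "Cd": 10.0, "Hg": 6.6, "V": 130.0}

# Maximum concentrations (mg/kg) reported for each zone in the text.
industrial_max = {"Ba": 1966.50, "Cr": 247.60, "Pb": 100.25, "As": 4.56,
                  "Cd": 28.69, "Hg": 1.44, "V": 20.42}
agricultural_max = {"Ba": 100.65, "Cr": 74.70, "Pb": 52.35, "As": 3.14,
                    "Cd": 1.14, "Hg": 1.212, "V": 21.89}

def exceeded(max_levels, limits):
    """Return the metals whose maximum level is above the allowable limit."""
    return sorted(m for m, value in max_levels.items() if value > limits[m])

industrial_exceeded = exceeded(industrial_max, limits)      # ['Ba', 'Cd', 'Cr']
agricultural_exceeded = exceeded(agricultural_max, limits)  # ['Cr']
```

Screening on the maximum only shows whether at least one sample exceeded the limit; counting exceedances would require the individual sample values, which are not listed here.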

Simulation Using AI-Based Models
Classical techniques have been adopted for the analytical exploration, extraction, and quantification of trace metals despite several limitations and their unrealistic way of predicting trace metals. As a result of AI-based technological developments, Industry 4.0, and the IoT, a more reliable and interpretable estimation of trace metals can now be achieved. To achieve the AI-based objective of this paper, a widely used AI model (ANN) and a more recently employed machine learning regression (SVR) are explored to simulate four different HMs (Zn, Cu, Cr, and Pb) in the AG and ID regions of Dammam, Saudi Arabia (Figure 8). The sensitive nature of the data and sampling sites has been a focus of global attention recently; AI-based models, for their part, provide an efficient and economic advantage that can support strong policies related to trace elements. According to Yaseen [3], Zn, Cu, Cr, and Pb are the HMs most explored using soft computing techniques, owing to their hazardous nature. The selection of input features is crucial for any computational development and can play an influential role in increasing the learning and robustness of the models; this study used a Pearson-based input combination, as mentioned above. For the ANN model, modelling was carried out using several trial-and-error runs to find the best hyperparameters, such as the number of hidden nodes, the number of iterations, the momentum parameter, and the activation constant. Similarly, the SVR model was fine-tuned to optimize the results. The performance of the models was evaluated using statistical indicators: the Nash-Sutcliffe efficiency (NSE), the mean squared error (MSE), and the root mean squared error (RMSE). In addition to the analysis of the reliability of the predictive models, the quantitative and visual presentations of the findings provide an in-depth understanding of the impact and significance of each parameter of the proximate analysis regarding this determination.
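A Pearson-based input ranking of the kind mentioned above can be sketched as follows. The data are synthetic and the variable names (Ni, V, and Ba as candidate inputs for Zn) are purely illustrative; the actual input combinations used in the study are not listed in this section.

```python
import numpy as np

def pearson_rank(X, y, names):
    """Rank candidate inputs by the absolute Pearson correlation with the target."""
    scores = []
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]   # Pearson correlation coefficient
        scores.append((name, float(r)))
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)

# Synthetic example: the target depends strongly on the first input only.
rng = np.random.default_rng(1)
n = 33
ni, v, ba = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
zn = 2.0 * ni - 0.5 * v + 0.1 * rng.normal(size=n)

X = np.column_stack([ni, v, ba])
ranking = pearson_rank(X, zn, ["Ni", "V", "Ba"])
```

Inputs at the top of `ranking` would be tried first when building input combinations; Pearson correlation only captures linear dependence, so weakly ranked inputs may still carry nonlinear information.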
The outcomes of the simulated models for the AG and ID sample stations are discussed and evaluated in this section. The overall results for both calibration and verification are presented in Table 1. Various statistical performance indicators and visualizations are used to analyze and evaluate the prediction models. The results are further illustrated in the form of spider plots in Figure 9. The figure shows the variation of the NSE value, which is directly related to the determination coefficient; the NSE expresses the relative magnitude of the noise or residual variance compared with the variance of the experimental data. For calibration and verification, respectively, the NSE values range over 51–91% and 51–87% for AG and over 80–99% and 79–99% for ID. Based on the spider plots, the AI-based models (ANN and SVR) are promising techniques for capturing the nonlinear patterns of HMs. Almost all the ID station models performed well, with NSE values above 80%. Some of the results for the AG station are marginal, which is a warning sign for the modelling at the agricultural station.
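The evaluation criteria named in this section (NSE, MSE, and RMSE) can be computed as in the following sketch, which uses their standard definitions. The observed and simulated values are made up for illustration and are not results from this study.

```python
import numpy as np

def mse(obs, sim):
    """Mean squared error between observed and simulated values."""
    return float(np.mean((obs - sim) ** 2))

def rmse(obs, sim):
    """Root mean squared error, in the same units as the data."""
    return mse(obs, sim) ** 0.5

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    residual = np.sum((obs - sim) ** 2)
    spread = np.sum((obs - np.mean(obs)) ** 2)
    return float(1.0 - residual / spread)

# Illustrative values only (mg/kg).
obs = np.array([10.0, 20.0, 30.0, 40.0])
sim = np.array([12.0, 18.0, 33.0, 37.0])

scores = {"MSE": mse(obs, sim), "RMSE": rmse(obs, sim), "NSE": nse(obs, sim)}
# MSE = 6.5, RMSE ≈ 2.55, NSE = 0.948
```

An NSE of 1 means a perfect match, while an NSE of 0 means the model is no better than predicting the mean of the observations.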
For a better examination of the computational results, a graphical visualization was performed using scatter and time series plots. The two types of graph can be used to evaluate the precision of the developed models. Based on the NSE, the scatter plots of the ANN and SVR models are displayed in Figure 10 for (a) AG and (b) ID. For most of the models in Figure 10, the data points cluster closely around the 45° line in the scatter plots. The scatter plot is a critical means of evaluating the performance of an ML model because it shows the degree of deviation from the ideal line.
Furthermore, as seen in the time series plots, the trend of the predicted HMs closely matches the pattern of the individual experimental HMs (Figure 10). This supports the strong correlation between the HMs predicted by these models and the experimental HMs, particularly at the ID station. As a result, the models are accurate and consistent in their predictions. The promising capability of ANN and SVR models is supported by previous work: Keshavarzi et al. [59] used an ANN to estimate soil phosphorus by combining satellite-based topography and vegetation data with field-based soil data. The ANN has also been used to estimate the soil water retention curve based on field-based soil data [60]. Recently, Pham et al. [61] used the ANN to predict the soil coefficient of consolidation, a mechanical parameter that defines the compaction or consolidation status of the soil. Khan et al. [62] established an ANN-predictive model of soil temperature as a geotechnical property of clay-rich soil using a field dataset. Most of these studies have used an ANN to predict the temporal variation of the heavy and trace metals in soil [63]. The SVR is widely used owing to its optimization objective, since it aims to minimize the generalized error bound instead of the sum of squared errors between the actual and predicted outputs that is characteristic of polynomial regression. Figure 11 depicts the time series plot for the models.
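The difference between the SVR objective and a sum-of-squared-errors objective can be illustrated with a small sketch. It assumes the common epsilon-insensitive formulation of SVR, in which residuals smaller than a tolerance epsilon incur no penalty and larger residuals are penalized linearly; the SVR variant and settings used in the study are not specified here, and the function names and values below are hypothetical.

```python
def epsilon_insensitive_loss(observed, predicted, epsilon):
    """Sum of residual magnitudes beyond the tolerance band of width epsilon."""
    return sum(max(0.0, abs(o - p) - epsilon) for o, p in zip(observed, predicted))


def squared_error_loss(observed, predicted):
    """Sum of squared residuals, as minimized in least-squares regression."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Hypothetical values for illustration only; the last prediction is a large miss.
observed = [10.0, 12.0, 15.0]
predicted = [10.5, 12.0, 11.0]

eps_loss = epsilon_insensitive_loss(observed, predicted, epsilon=1.0)
sq_loss = squared_error_loss(observed, predicted)
print(f"epsilon-insensitive loss = {eps_loss}")
print(f"squared-error loss       = {sq_loss}")
```

In this sketch the small residual of 0.5 falls inside the tolerance band and contributes nothing, while the large residual grows linearly rather than quadratically, which makes the fit less sensitive to isolated extreme samples. In the full SVR formulation this loss is combined with a regularization term on the model weights.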
It is clear that the AI-based model implementation outperformed the traditional models in capturing the pattern of the system. Additionally, careful examination of the error plot in Figure 12 indicates that the ANN model generated the smallest RMSE at both the AG and ID stations. The discrepancies between the observed and predicted values in most of the models are within the acceptable limit, despite some variations observed for ID-SVR-Cr in the verification phase. The error evaluation criteria depict how close the predicted results are to the observed values; lower values indicate a better fit of the developed model. It is important to note that various factors influence the concentration of trace elements, including not only physicochemical but also hydrological, climatic, and lithological factors. As a result, examining the AI models' capabilities with restricted input data is more advantageous for low-income and developing nations that lack the resources to collect a broader range of input variables, such as hydrological and meteorological variables.

For a fair comparison with the current literature, the outcomes are in line with those of Pyo et al. [8], who employed an ANN, a convolutional neural network (CNN), and random forest regression (RFR) for estimating heavy metals in soil (Pb, Cu, and As).
It was observed in that study that the R² value ranged from 0.6 to 0.94 for both Pb and Cu. In addition, Bazoobandi et al. [7] applied ANN, ANFIS, and MLR models, and the outcomes showed that R² ranged over 0.6–0.89 and 0.3–0.93 for ANN and ANFIS, respectively. Because HMs are widely used in agriculture, industry, and other areas, they have become part of the environment, increasing the risk of the metals having a harmful influence on the ecosystem. The United States, for example, has enacted legislation outlawing the use of items suspected of elevating trace metal levels in the soil. Likewise, the United States Environmental Protection Agency (USEPA) has established specified HM thresholds that must be met in biosolids before authorization for land spreading can be given. HM contamination has undeniable consequences for our environment and human health. Although there have been several studies on pollution in Saudi Arabia, such research has evaluated metal content in marine, coastal, and air environments, with limited research on topsoil.

Conclusions
The primary objective of this study was to use an integrated scenario: (i) integrating GIS to map the spatial distribution of HMs in the Dammam area of the Eastern Province, which was accomplished successfully, with the field and laboratory tests revealing that some collected samples exceeded the standard range; and (ii) employing AI-based models, namely ANN and SVR, to understand the feasibility of simulating trace metals in the topsoil. The concentrations of ten essential trace metals in the soils of the Dammam area followed a general trend for almost all metals: they were highest in the samples taken from the soils of industrial areas, followed by agricultural and residential areas. Only in a few of the samples were the maximum levels higher than the allowable limits; the mean concentrations of all metals did not exceed the allowable thresholds. These findings provide relief that the concentrations of all the metals studied were within acceptable limits and pose no immediate threat to the environment. Ultimately, however, natural environmental resources, as well as animal and human health, might be exposed to the risks and hazards associated with these metals. The modelling approach indicated that ANN and SVR models are capable of estimating the HMs with high accuracy, especially at the ID stations, whereas the AG stations fall within the range of marginal-to-good accuracy. This shows that more robust models, such as adaptive neuro-fuzzy inference systems (ANFIS), Elman neural networks, extreme learning machines (ELM), hybrid models, and optimization algorithms, need to be explored to boost the accuracy of the predictions.