Spatiotemporal Prediction and Mapping of Heavy Metals at Regional Scale Using Regression Methods and Landsat 7

: Soil contamination by heavy metals is of particular concern, due to the direct negative impact on crop yield, food quality and human health. Although the conventional approach to monitor heavy metals relies on ﬁeld sampling and lab analysis, the proliferation in the use of portable spectrometers has reduced the cost and time of investigation. However, discrepancies in spectral data from different spectrometers increase the modeling time and undermine the model accuracy for spatial mapping. This study, therefore, took advantage of the readily accessible Landsat 7 data to predict and map the spatiotemporal distribution of ten heavy metals (i.e., Sb, Pb, Ni, Mn, Hg, Cu, Cr, Co, Cd and As) over a 640 km 2 area in Belgium. The Land Use/Cover Area Frame Survey (LUCAS) database of a region in north-eastern Belgium was used to retrieve variation in heavy metals concentrations over time and space, using the Landsat 7 imagery for four single dates in 2009, 2013, 2016 and 2020. Three regression methods, namely, partial least squares regression (PLSR), random forest (RF) and support vector machine (SVM) were used to model and predict the heavy metal concentrations for 2009. By comparing these models unbiasedly, the best model was selected for predicting and mapping the heavy metal distributions for 2013, 2016 and 2020. RF turned out to be the optimal model for 2009 with a coefﬁcient of determination of prediction (R 2P ) and residual prediction deviation of prediction (RPD P ) ranging from 0.62 to 0.92, and 1.23 to 2.79, respectively. The measured heavy metal distributions along the river ﬂoodplains, at the highlands and in the lowlands, were generally high, compared to their RF spatiotemporal predictions, which decreased over time. Increasing moisture contents in the ﬂoodplains adjacent to the river channels and the lowlands were the primary contributors to the reduction in the satellite reﬂectance spectra. However, topsoil erosion from rainfall, snowmelt as well as wind into the lowlands could have inﬂuenced the reduction in heavy metal spatiotemporal predicted values over time in the highlands. The spatiotemporal prediction maps produced for the heavy metals for the four different years revealed a good spatial similarity and consistency with the measured maps for 2009, which indicates their stability over the years.


Introduction
Soil resources of Europe are being threatened by contaminants, particularly heavy metals (HMs) and petroleum hydrocarbons. Main sources of HM contamination in Euro-prediction models were determined by comparing their prediction accuracy. The results from this investigation will be applied as a reference for future delineation of management zones in the study area for risk assessment analyses.

Study Area and Topsoil Database
The study was conducted over an estimated area of 640 km 2 in Northeastern Belgium between the cities of Ghent and Antwerp (Figure 1). A total of 435 soil sampling locations selected from LUCAS database were considered, with a 1 km sampling interval. Ten HM concentrations (i.e., Sb, Pb, Ni, Mn, Hg, Cu, Cr, Co, Cd and As) were extracted from the Land Use/Cover Area Frame Survey (LUCAS) database, which is an EU-wide project that monitors changes in the management and character of the land surface [21]. The LUCAS topsoil survey provides possibilities to obtain detailed information on soil cover in Europe, including HMs. With its sampling density (1site/200 km 2 ), it is possible to create continuous maps for reliable spatial representation at 1 km resolution of HMs in topsoil of Europe [21]. Details of the soil sampling protocols and soil tests for the HM contents are available in the study of Tóth et al. [1]. The dominant reference soil groups across the study area are Cambisols, Arenosols and Anthrosols ( Figure 2). Podzols and Gleysols occur mostly in the northwest and Retisols in the southeast.

Satellite Data Acquisition and Processing
In this study, satellite data (Landsat 7 imagery with 8 spectral bands) were used as independent variables for building prediction models. From USGS EarthExplorer, four single dates of the Landsat 7 imagery with zero cloud cover were downloaded in the order of 29-05-2009, 05-03-2013, 19-07-2016 and 31-07-2020. The digital number (DN) value of pixels corresponding to the soil sampling locations in each band was converted to spectral reflectance, using ArcMap 10.8.1.

Data Preparation and Modelling
Exploratory data analyses on the HM concentrations were performed on the 435 extracted points from the LUCAS dataset to identify outliers. Using the quartile-quartile approach, 4.8% of the dataset was detected as an outlier and removed. The new dataset was partitioned (training = 70% and test set = 30%) using the Kennard-Stone algorithm before modeling. The PLSR, RF and SVM models were developed for the soil HM concentrations and the spectral features of the Landsat 7 images. We iterated random 5-fold cross-validation 10 times to counteract both bias and overfitting. In training the models, the coefficient of determination (R 2 C ), the root-mean-square error of calibration (RMSEC), residual prediction deviation (RPD C ), and the ratio of performance to the inter-quartile range (RPIQ C ) were derived to assess how well the regressions fit the training set. The established models were validated using the test set in terms of R 2 P , RMSEP, RPD P , and RPIQ P . To compare the prediction performance of PLSR, RF and SVM models, the model classification scheme based on RPD and proposed by Viscarra Rossel et al. [22] was used. The best performing modeling method for 2009 was subsequently applied to the spectral features for 2013, 2016 and 2020 to assess its transferability and robustness, and allow evaluation of spatiotemporal variation in the studied HMs. Reference soil groups of the study area. World reference base (WRB) 40k indicates the soil map of the study area according to the international soil classification system is of a scale of 1:40,000.
As a linear multivariate regression analysis, PLSR was used. It is a popular multivariate regression method that has a good capacity for estimating attributes resulting from the spectral characteristics of the soil [23]. It is a bilinear modeling method, where information in the original x data is projected onto a small number of underlying ("latent") variables called PLS components [24]. The y data are actively used in estimating the "latent" variables to ensure that the first components are those that are most relevant for predicting the y variables. Interpretation of the relationship between x data and y data is then simplified, as this relationship is concentrated on the smallest possible number of components. More detailed information about the PLS can be found in the work of Martens and Naes [25]. To determine the optimal number of latent variables, leave-one-out cross-validation (LOOCV) was used [26] to prevent over-or under-fitting the data, which may produce models with poor performance. Generally, a model with the highest cross-validated R 2 C value and lowest RMSEC value was selected.
As a linear method, SVM is a kernel-based learning method originated from statistical learning theory [27]. These learning methods use an implicit mapping of the input data into a high dimensional feature space defined by a kernel function [28]. The model reduces the complexity of the training data to a significant subset of so-called support vectors. In the current study, two kernels were adopted (second order polynomial and linear kernel) using the R package e1071, and a grid search was conducted by 10-fold cross-validation [29]. The optimal parameters of SVM were adopted so as to produce the best performing model that produced the smallest RMSEC.
RF is a nonparametric and nonlinear classification and regression algorithm first proposed by Ho [30] and further developed by Breiman [31]. It is based on a kind of learning strategy (ensemble learning) that generates many classifiers and aggregates their results. According to its algorithm, RF does not need any data pretreatment, which is one of its main advantages. Tree diversity guarantees RF model stability, which is achieved by two means: (1) a random subset of predictor variables is chosen to grow each tree, or (2) each tree is based on a different random data subset, created by bootstrapping, i.e., sampling with replacement [32]. The RF models were developed using the randomForest package in R with the optimal number of trees to be grown (ntree) and the number of predictor variables used to split the nodes at each partitioning (mtry) set to 500 and 3-times the default mtry value, respectively. The three modeling approaches (PLSR, RF and SVM) were developed with R [33], using the packages 'prospectr' [34], 'e1071 (Meyer et al., 2015), 'pls' [35], 'randomForest' [36], and 'chemometrics' [37].

Geostatistical Prediction Method
Ordinary kriging (OK) in ArcGIS software (ArcGIS version 10.7.1, ESRI, Redlands, CA, USA) was employed as the spatial prediction method for mapping the HM distribution in the study area. The input data for the OK were the predicted values of HMs in 2009, 2013, 2016 and 2020, using the best performing models of 2009. Additionally, OK was employed to develop maps of measured HMs. As a geostatistical tool, OK uses the distance between two points to predict the semivariance of the dependent variable. The inter-point semivariances of the spatial data from a measured grid can be used to create a system of linear equations to interpolate the prediction at unmeasured points as a linear function of the measured points. Therefore, for an unmeasured point, linear weights are derived between the unmeasured point and all measured points in the network [38].
In order to explain the variations in HMs over time and space, the study area was classified into three categories: lowlands denoted by 'L' (i.e., inland areas), highlands denoted by 'H' [i.e., points with elevation >15 m above the sea level (asl)] and river floodplains denoted by 'R' (i.e., all points located along the river channels in the study area on the southeast-northeast (SE-NE) boundary and close to the west boundary) ( Figure 3 and Table A1).

Soil Heavy Metal Contents
Summary of descriptive statistics of HM concentrations from LUCAS database shows that the coefficient of variation varies between 0.24 and 0.33. Except for Sb, the distribution of HMs concentration showed heavy tails (kurtosis > 3), as compared to the univariate normal distribution (kurtosis = 3). Table 1 summarizes the statistics for all ten HMs. The Pearson correlations between the ten HMs at a significant level of 0.05 are shown in Figure 4. There are significant positive correlations between all HMs. However, apart from Cr (r = 0.65) and Co (r = 0.66), As reveals a weak positive correlation with the other HMs (r ≤ 0.47). Thus, except for As, there exist strong to very strong correlations (0.60 ≤ r ≤ 1.0) between all HMs.

Comparison in Model Prediction Performance for 2009
The comparison of model prediction performance for PLSR, RF and SVM (with the Gaussian kernel) for 2009 are presented in Figure 5. The two-machine learning algorithms (i.e., RF and SVM) considerably outperformed PLSR in the prediction of all ten HMs that were investigated. In comparison with SVM and PLSR, RF was the best performing model for all ten HMs (0.62 ≤ R 2 P ≤ 0.92; 1.63 ≤ RPD P ≤ 3.47) followed by SVM with radial kernel (0.33 ≤ R 2 P ≤ 0.87; 1.23 ≤ RPD P ≤ 2.79) and PLSR (0.06 ≤ R 2 P ≤ 0.85; 1.12 ≤ RPD P ≤ 2.57). Therefore, RF was selected and used for spatiotemporal prediction and mapping of the HMs for three subsequent years (i.e., 2013, 2016 and 2020). The independent model validation results for PLSR, RF and SVM are detailed in the Appendix A (Table A2).

Spatiotemporal Prediction Performance of RF
The performance of the optimal model (i.e., RF) for 2009 in predicting the HM concentrations for 2013, 2016 and 2020 using only the satellite band reflectance for these three years were analyzed. Figure 6 compares graphically the spatiotemporal prediction performance for the three years as well as 2009. The prediction model statistics (values) are detailed in the Appendix A (Table A3). Apart from Sb, Cd and As, the optimal model results of 2009 for Pb, Ni, Mn, Hg, Cu, Cr and Co outperformed those of 2013, 2016 and 2020. For the seven HMs (Pb, Ni, Mn, Hg, Cu, Cr and Co), the RF prediction performance was decreased over time ( Figure 6).

Spatiotemporal Distribution of Heavy Metals (HMs)
Along the river channel on the SE-NE boundary (Figures 1 and 3), the measured concentrations of Sb, Pb, Ni, Mn, Hg, Cu, Cr, Cd and As increases steadily until after point R-8, where the concentrations sharply decrease. However, the concentration of Co dips sharply at points R-5 and R-10, both of which are located outside the meander of the river. The measured HM concentrations in 2009 at the respective points along the river flood plains exceeded the RF spatiotemporal predicted values, and the latter gradually decreased over time.
In the highlands (Figure 3), the measured concentrations of Pb, Cu and Cd were higher than those recorded in the lowlands (Figure 7). The measured concentrations of Mn and Cd at all the selected points in the highlands randomly alternated in concentrations between high and low. In contrast, high concentrations of Sb, Ni, Mn, Cr, Co and As were recorded in the lowlands, compared to the highlands. However, in the highlands, the RF predicted concentrations of Sb, Ni, Mn Hg, Cr and Co exceeded the measured concentrations in 2009 and increased over time from 2013 to 2020.
The RF predicted concentrations of Cu and As were lower than those measured in the highlands and also decreased over time. Furthermore, the predicted concentrations of Pb and Mn in the lowlands also exceeded the measured values, although the former increased over time from 2013 to 2020, while the latter was generally constant. However, the measured concentrations of Hg, Cu and As exceeded the RF predicted values in the lowlands, although there was no significant variation in the predicted values over time, particularly at L6, L7 and L8, which were located at low elevations (<4.5 m asl) and near the base of the highlands.

Comparison of Spatiotemporal Distribution Maps
Maps of measured HMs (raw HM extracted data from the LUCAS raster file) and predicted HMs concentration were produced, using OK for interpolation for 2009, 2013, 2016 and 2020 ( Figures A1 and A2). Comparison between the measured and predicted maps shows a general decline in the similarity of the HM distribution over time indicated by the kappa values ( Figure 8). However, anomalous observations were detected for Sb (kappa = 0.968) and Cd (kappa = 0.984) in 2016, during which a higher similarity (kappa) in the maps was realized.

Soil Heavy Metal Contents
In this study, we predicted and mapped the distribution of HM concentrations, using readily accessible Landsat 7 data over a 640 km 2 area for years 2009, 2013, 2016 and 2020. The Land Use/Cover Area Frame Survey (LUCAS) database of the study area in the Flemish region of Belgium was used to retrieve the variation in HMs distribution over time and space. The summary of descriptive statistics shows that the coefficient of variation (CV = sd/mean) varies between 0.24 and 0.33 (Table 1), indicating a moderate variation in the HM concentrations, according to Hu et al. [39]. Apart from Sb, the HMs distribution was leptokurtic (kurtosis > 3), as compared to a univariate normal distribution (kurtosis = 3). It indicates the impact of anthropogenic activities on the distribution of HMs in the study area. Factors that affect the HMs distribution in the study area could be pedological factors, agricultural activities, industrial pollution and other anthropogenic activities. Lv et al. [40] reported that anthropogenic activities influence HMs concentration in the soil. Ballabio et al. [41] reported an average concentration of Cu in European topsoil of 16.85 mg/kg, where the main source of Cu in an agricultural soil was the intensive use of pesticides. These results were in line with those of present study, where the average Cu concentration was 13.57 mg/kg. Similarly, Ballabio et al. [42] reported a median Hg concentration of 0.038 mg/kg, which was lower than the 0.055 mg/kg median concentration of Hg in the present study at the non-polluted sites. These differences could be due to spatial variation in HMs distribution and variation in rate of anthropogenic activities at different sites. In another study, Temmerman et al. [43] reported a mean Pb concentration of 21.0 mg/kg and a mean Cu concentration of 10.3 mg/kg in the Flanders region of Belgium, which did not show a significant difference with the measured concentrations of Pb (22.4 mg/kg) and Cu (12.95 mg/kg) in the present study.
Significantly positive correlations were observed between HMs (except for As), which indicate the similar source and origin of HMs in the study area ( Figure 3). These results are in line with the findings of previous studies [44,45]. Cr, Co, Ni and Fe have siderophile affinity [46], among which Martín et al. [47] found high correlations. In another study, based on significant positive correlations among Cd, Pb, Co, Mn, Cr and Hg, Lv et al. [40] suggested the similar source of HM pollution in the soil. Positive correlation also explains the comparable modeling results obtained in this work.

Comparison of Heavy Metals Prediction Performance of Models
Comparison of prediction performance of PLSR, RF and SVM (with the Gaussian kernel) for the dataset of 2009 showed that the RF and SVM considerably outperformed the PLSR in HMs prediction ( Figure 5). This result indicates the non-linear relationships between spectra and HMs. However, SVM is a linear regression algorithm, but kernels make it work for nonlinear scenarios. Functions of kernel in the SVM model return the inner product between two points in a suitable feature space; therefore, it defines the notion of similarity, even in the high-dimensional spaces [48]. On the other hand, RF is a non-linear regression method; therefore, it can adequately address the non-linear relationships, compared to PLSR [22]. It makes several different trees by the repeated selection of the subset, and each tree's deviation is small with large variance. The effect of variance in the overall model is reduced by summing the trees [49]. However, PLSR is a linear regression model, which establishes a linear regression between independent and dependent variables by incorporating principal component matrices. It can partially remove the correlations among variables [50]. In this study, among all three models, RF showed the best performance for all HMs (0.62 ≤ R 2 P ≤ 0.92; 1.63 ≤ RPD P ≤ 3.47). The suitability of a model for HM detection mainly depends on the inner relationships between spectra and soil HMs [51]. In a previous study, SVM and RF were reported as being the best prediction models for different soil characteristics, as compared to PLSR [52]. In contrast, a study on HM prediction by using Landsat 8 imagery showed that PLSR models performed better than the non-linear regression model (SVM and ANN) [13], which could be due to different soil properties and land use types.
We used RF for the spatiotemporal prediction of HMs for three subsequent years (i.e., 2013, 2016 and 2020), due to its best prediction performance recorded, as compared to SVM and PLSR ( Figure 5). Apart from Sb, Cd and As, the prediction efficiency of the RF model for the dataset of 2009 was higher than the datasets for 2013, 2016 and 2020. This outcome was reasonable because the RF model was developed using the dataset of 2009; hence, the RF model achieved the best results for that year. It also could be due to the changes in land use, soil moisture content over time and acquisition date (in 2013, 2016 and 2020) that directly affect the Landsat reflectance spectra [53]. Therefore, it can be assumed that reflectance values of 2009 were different than those obtained for 2013, 2016 and 2020, which can affect the model performance. This means that for the farthest year from the sampling date, the lowest HM models' prediction performance was recorded. However, the RF spatiotemporal prediction of Sb and Cd for 2016, and As for 2013 were the best, in contrast to 2009 and 2020, which in turn could be attributed to enhanced soil reflectance signatures from the Landsat 7 data during these periods ( Figure 5).

Spatiotemporal Distribution of Heavy Metals and Comparison of Maps
The spatiotemporal distribution of HMS in the study area was classified into three categories: lowlands denoted (L), highlands (H) and river floodplains (R) (Figure 3). Measured concentrations of Sb, Pb, Ni, Mn, Hg, Cu, Cr, Cd and As increased steadily along the river floodplain before point R-8, where the HMs concentration sharply decreased. Compared to L and H, the concentration of Cd at R was relatively higher, which could be due to organic and mineral particles sedimented during frequent flooding periods with low flowrates [54]. Other sources of Cd distribution in the study area could be the application of phosphate fertilizers and atmospheric deposition [55]. Cd input through fertilizer depends on the rate of fertilizer application and the Cd:P 2 O 5 ratio. In 2009, the average atmospheric Cd deposition in Belgium soil was reported to be 0.2 g ha −1 year −1 [56]. Moreover, since the 19th century, several smelting industries, such as Zn smelting, were located in the Flanders region, which produced emissions, causing diffused pollution of HMs in soil [57]. While the concentration of Co sharply decreased at R-5 and R-10, both these points are located outside of the river meander. At these points, sedimentation is low, and the transportation of sediments is high, due to the high flow velocity of the river, which possibly accounts for the low Co concentration. Thus, the increase in SB, Pb, Ni, Mn, Hg, Cu, Cr, Cd and As inside the meander could be due to the low velocity of water and high sedimentation. In previous studies, the significant impact of water flow velocity and sedimentation on HMs transportation was reported [58,59]. Sediment parts (containing HMs) transported to lowlands are transferred with sediment fluxes to the North Sea, which can pollute the marine environment, endanger marine organisms and consequently negatively affect the marine food chain [60].
The measured HM concentrations in 2009 at the respective points along the river channel exceeded the RF spatiotemporal predicted values, and the latter gradually decreased over time. Seasonal variation in the river volume directly affects the soil moisture content in the floodplains. Particularly, since the prediction was based primarily on Landsat 7 reflectance spectra as the independent variable, an increase in the river volume will also increase the soil moisture content in the adjacent floodplains, thus reducing the reflectance spectra. Therefore, the most probable reason accounting for the gradual decrease in the RF predicted HM concentrations along the river channel over time could be an increase in the river volume during the period when the Landsat 7 images for 2013, 2016 and 2020 were taken.
In the highlands, the measured concentrations of Pb, Cu and Cd were higher than those recorded in the lowlands (Figure 7). The measured concentrations of Mn and Cd at all the selected points in the highlands alternated in concentration between high and low. In the study area, there were four gas stations located in the highlands and close to the points X-8, X-20 and X-21. Moreover, since Pb, Cu, Cd and Mn are trace metals in petroleum, a high concentration of these metals in the highlands originated most likely from soil contamination by the petroleum from the gas stations. In contrast, high concentrations of Sb, Ni, Mn, Cr, Co and As were recorded in the lowlands, compared to the highland, which could be due to erosion, transportation and the deposition of petroleum-contaminated soils from the highland areas into the lowlands aided by rainfall, snowmelt and wind. In the highlands, the RF-predicted concentrations of Sb, Ni, Mn Hg, Cr and Co exceeded the measured concentrations in 2009 and increased over time from 2013 to 2020 due to the enhanced reflectance spectra, which could be attributed to a reduction in the soil moisture content in the highlands over time. The sun's azimuth in the study area decreased from 156.6 • (in 2013), 149.9 • (in 2016) to 148.6 • (in 2020), but in 2009 it was 146.6 • . Weather patterns during data acquisition could be another reason. A decrease in the sun's azimuth is reciprocal to an increase in solar insolation and, hence, an increase in evapotranspiration, which will eventually result in a reduction in moisture content. Therefore, a reduction in the soil moisture content in the highlands over time enhanced the soil reflectance spectra, which could account for the increase in RF-predicted concentrations of HMs [61]. The measured concentrations of Cu and As in the highlands exceeded the RF-predicted concentrations. In the lowlands, the measured concentrations of Pb and Mn were lower than the RF-predicted values. It could be due to an increase in the reflectance spectra within the lowlands as a result of the decrease in the sun's azimuth and the decrease in moisture content, and perhaps partially due to a decline of natural drainage. In the lowlands, RF-predicted values of Hg, Cu and As were lower, compared to the measured concentrations, and did not show significant differences among the years. Particularly, L6, L7 and L8 were located at low elevations (<4.5 m asl) and near the base of the highlands. Such locations are prone to sedimentation from the highlands during rainfall and snowmelt when the topsoil is eroded, transported and deposited into the lowlands. Thus, the major driving factors of HMs accumulation in the region could be changes in moisture content due to evapotranspiration, variation in waterflow velocity, affecting HMs transportation from highlands to lowlands, sedimentation and changes in weather conditions over time, i.e., rainfall. However, apart from climate and topographic factors, anthropogenic activities, such as agricultural and mining activities and land use management, could be among major factors of HM accumulation in the study area. For example, the application of Cu-containing fungicide in agricultural soil is considered one of the major sources of Cu accumulation in European soils [62]. In another study, Qaswar et al. [63] found that the long-term application of fertilizers increased Cd, Hg and Cr concentrations in agricultural soils. Land use change alters the hydrological process and influences the transportation of HMs in soil [64]. Moreover, soil pH and organic matter distribution significantly affect the accumulation of HMs in soils [42].
Maps were produced for measured and predicted concentrations of HMs, using OK interpolation method (Figures A1 and A2). The predicted maps show that similarity of HMs distribution was decreased over the time in the study area. These variations in the spatial distribution are attributed to changes in the weather conditions over time that change the Landsat 7 spectral reflectance [65]. Moreover, spatial similarity, evaluated according to Kappa values of 2009 were higher than those obtained for 2013, 2016 and 2020 (Figure 8). That might be due to the high accuracy of the RF prediction of the 2009 dataset, compared to the corresponding predictions in 2013, 2016 and 2020, using the 2009 models. Thus, the RF models for spatiotemporal prediction of HMs using Landsat 7 spectral reflectance could be regarded as the best method that can be used for monitoring and management of HMs contaminated sites. This will allow observing spatiotemporal differences and understand the reasons of these differences.

Conclusions
The testing of different models (including linear and non-linear) and model selection as well as unbiased evaluation of the models are imperative to achieve and improve the prediction accuracy of HMs in soils when using satellite data. In this study, we found that Landsat 7 satellite data coupled with RF show great potential for predicting and mapping the spatiotemporal distribution of Sb, Pb, Ni, Mn, Hg, Cu, Cr, Co, Cd and As in soils at regional scale. The R 2 P and RPD P achieved by RF for all the HMs ranged from 0.62 to 0.92, and 1.23 to 2.79, respectively. This could be classified as a high performing model for spatiotemporal prediction in contrast to PLSR and SVM.
High concentrations of the measured HMs were detected inside the meanders of the floodplains. These locations are characterized by low river flow velocity, which creates a conducive environment for high river sedimentation. Therefore, sediments with HMs adsorbed onto their surface were readily deposited inside the meander, resulting in high HM concentrations. The RF predicted concentrations along the river channel decreased over time (from 2009, 2013, 2016 to 2020), largely due to the seasonal increase in the river volume, which also increased the moisture content in the floodplains and hence, reduced the reflectance spectra from the floodplains. In the highlands, the measured concentration of HMs exceeded the RF-predicted concentration over time, which is attributed to reduced Landsat 7 spectral reflectance, due to increases in the moisture content in the highlands. Moreover, high concentration of measured HMs in the highlands also could be due to the presence of gas stations in the highlands. Similarly, high concentrations of measured HMs were detected in the lowlands, but these decreased over time when predicted with the RF model. Comparison between the measured and predicted maps showed a general decline in the similarity of the HM spatial distribution over time in the study area. Thus, we concluded that RF model for spatiotemporal prediction of HMs using Landsat 7 spectral reflectance could be regarded as an efficient method, which can be helpful to monitor the large HM contaminated areas for mitigation strategies. Future research will need to consider multiple years before 2009 to ascertain the robustness of the RF model as well as assess the variations and similarities in the spatiotemporal distribution of the HMs in the study area.