Machine Learning Algorithms for Modeling and Mapping of Groundwater Pollution Risk: A Study to Reach Water Security and Sustainable Development (Sdg) Goals in a Mediterranean Aquifer System

: Groundwater pollution poses a severe threat and issue to the environment and humanity overall. That is why mitigative strategies are urgently needed. Today, studies mapping ground-water risk pollution assessment are being developed. In this study, ﬁve new hybrid/ensemble machine learning (ML) models are developed, named DRASTIC-Random Forest (RF), DRASTIC-Support Vector Machine (SVM), DRASTIC-Multilayer Perceptron (MLP), DRASTIC-RF-SVM, and DRASTIC-RF-MLP, for groundwater pollution assessment in the Saiss basin, in Morocco. The performances of these models are evaluated using the Receiver Operating Characteristic curve (ROC curve), precision, and accuracy. Based on the results of the ROC curve method, it is indicated that the use of hybrid/ensemble machine learning (ML) models improves the performance of the individual machine learning (ML) algorithms. In effect, the AUC value of the original DRASTIC is 0.51. Furthermore, both hybrid/ensemble models, DRASTIC-RF-MLP (AUC = 0.953) and DRASTIC-RF-SVM, (AUC = 0.901) achieve the best accuracy among the other models, followed by DRASTIC-RF (AUC = 0.852), DRASTIC-SVM (AUC = 0.802), and DRASTIC-MLP (AUC = 0.763). The results delineate areas vulnerable to pollution, which require urgent actions and strategies to improve the environmental and social qualities for the local population.


Introduction
Groundwater resources represent a precious source for the life of both humans and animals [1,2]. According to UNICEF [3], it is estimated that about 2.2 billion people globally still have no access to drinking water supply [4]. Groundwater pollution is a worldwide challenge, specifically in arid and semi-arid regions [5,6].
Groundwater pollution is associated with the uncontrolled and irrational use of agrochemicals for agricultural activities [7,8]. Therefore, the applicability of pesticides leads to dangerous effects on both human life and environmental ecosystems [9]. Thus, the use of pesticides for agricultural practices presents a double-edged sword and farmers should be sensitize about following an integrated approach of crop insurance and farmers' In this ecosystem, previous efforts have been made by other authors. For instance, Sadkaoui et al. [34] used the DRASTIC, GOD, and PRK methods to assess groundwater vulnerability. They reported the sensitivity of water resources due to the pollution coming from the ground surface. Moreover, a recent article developed by Lahjouj et al. [35] used random forest (RF) to map groundwater vulnerability with encouraging outcomes. With respect to the above cited articles, the work presented here takes a step further in the field of hybridization of machine learning (ML) models and statistical tests. In this case, we proposed five new ensemble models, DRASTIC-RF (random forest), DRASTIC-SVM (support vector machine), DRASTIC-MLP (multilayer perceptron), DRASTIC-RF-SVM, and DRASTIC-RF-MLP for groundwater vulnerability pollution risk assessment in the Saiss basin.

Study Area
This research was conducted in the Saiss basin, located in the Fez-Meknes region, Morocco (Figure 1), between latitudes 33°38′ N and 34°4′ N and longitudes 5°49′ W and 4°53′ W. The basin occupies an area of about 2100 km 2 . The elevation ranges from 212 to 1047 m. According to Koppen Climate Classification, the study area is characterized by Mediterranean climate and it has an average annual temperature of 17.2 °C, recorded in Meknes station. The average annual precipitation is about 589.3 mm. According to the census report of 2014, the population is about 2.3 million inhabitants.  From the geological point of view, the Saiss comprises several formations extending from the Palaeozoic to the Quaternary, the majority of which are Pliocene Lake limestones and fawn sands [37]. The hydrographic network of the basin consists of four main wadis, which are Oued El Kell, Oued Mikkes, Oued R'Dom, and Oued Fes. According to Essahlaoui et al. [37], the Saiss aquifer system presents a complex structure of two aquifers: 4 of 19 the Plioquaternary phreatic aquifer composed of Pliovillafranchian sandstone, conglomerated sand and lacustrine limestone, and the liassic aquifer composed of dolomitic limestone. In this aquifer, as agriculture is still going to be one of the main sources of the local population, to meet the livelihood's resource requirements, more comprehensive studies should be carried out to promote the sustainable use of natural resources for the proper management and planning of them.

Data Used and Methodology
The methodology followed in this work is composed of the following stages. The first step of this study was the use of DRASTIC method to map and assess groundwater vulnerability to pollution. Next, the bivariate statistic frequency ratio was applied to modify the result of DRASTIC method. Then, five new hybrid/ensemble machine learning models, namely DRASTIC-RF, DRASTIC-SVM, DRASTIC-MLP, DRASTIC-RF-SVM, and DRASTIC-RF-MLP, were proposed and developed in Python and the Jupyter Notebook Environment. Finally, groundwater vulnerability pollution risk maps were carried out.

DRASTIC Method and Parameters Description
DRASTIC method is a widely applied approach to assess groundwater vulnerability [21]. It is based on seven factors (Figure 2), including depth to groundwater (D), net recharge (R), aquifer media (A), soil media (S), topography (T), impact of the vadose zone (I), and hydraulic conductivity (C). Following the classification established by Aller et al. [16], different rates (r) and weights (w) are assigned to each class of each DRASTIC parameter. The DRASTIC method was calculated using Equation (1): where r is the parameter rate and w is the parameter weight.

• Depth to groundwater (D)
According to Khosravi [21], the (D) parameter presents the distance measured from ground surface to water table. It is considered a limiting factor for groundwater vulnerability, because it conditions the transfer process of pollutant and its possibility of degradation [17]. The greater the depth of water level, the lower the risk of groundwater vulnerability to pollution [16]. In our case, the depth to water map (Figure 2a) was derived from the Digital Elevation Model (DEM) with a pixel size of 30 m and the piezometric map. Based on the DRASTIC framework range value, the map generated was classified into 7 classes, including 0-1.5 m, 1.5-4.5 m, 4.5-9 m, 9-15 m, 15-33 m, 23-31 m, and >31 m.
The recharge is a hydrological process and corresponds to the amount of water that infiltrates through the surface of the ground and contributes to the recharge of the aquifer [16]. The increase in net recharge leads to an increase in the risk of contamination of groundwater. This parameter is related to the topography and the nature of the geological formations. Net Recharge of the Plio-Quaternary aquifer of the Saiss basin is mainly contributed by precipitation and infiltration of irrigation water, as well as by the drainage of the Liasic aquifer of the Middle Atlassic Causse in the southern part of the Saiss basin [34]. The equation for calculating the net recharge was given by Scanlon et al. [38] as: where S y represents the specific yield, ∆h represents the differences of the water-table height for the highest and lowest tables, and ∆t represents the interval time for those tables. The net recharge map obtained was divided into 3 classes, including 0-50, 50-100, and 100-180 mm (Figure 2b).  According to Khosravi [21], the (D) parameter presents the distance measured from ground surface to water table. It is considered a limiting factor for groundwater vulnerability, because it conditions the transfer process of pollutant and its possibility of degradation [17]. The greater the depth of water level, the lower the risk of groundwater vulnerability to pollution [16]. In our case, the depth to water map (Figure 2a •

Recharge (R)
The recharge is a hydrological process and corresponds to the amount of water that infiltrates through the surface of the ground and contributes to the recharge of the aquifer [16]. The increase in net recharge leads to an increase in the risk of contamination of groundwater. This parameter is related to the topography and the nature of the geological formations. Net Recharge of the Plio-Quaternary aquifer of the Saiss basin is mainly contributed by precipitation and infiltration of irrigation water, as well as by the drainage of the Liasic aquifer of the Middle Atlassic Causse in the southern part of the Saiss basin [34]. The equation for calculating the net recharge was given by Scanlon et al. [38] as:

•
Aquifer media (A) The aquifer environment or the saturated zone (see in Figure 2c) influences the vulnerability to pollution because its properties make it possible to control the concentration of pollutants by diluting them. The Saiss aquifer consists of lacustrine formations, conglomerates, sandstones, and sands. These formations are extracted from the Hydrogeological map of the Meknes-Fes basin obtained from Morocco's National Irrigation Office, Directorate General of Studies (1/100,000).

Soil (S)
Soil texture affects the amount of infiltration from ground surface. In this study, this parameter was constructed using the pedological map of central Morocco obtained from the national institute of agronomic research, physical environment department (1/500,000). The soils of this study area were divided into 3 classes (see Figure 2d): clay, clay loam, and sand. Areas with sand are characterized by high permeability, whereas clay areas are characterized by lower permeability.
• Topography (T) Topography parameter plays an important role in the infiltration at the ground surface. A lower slope results in more infiltration and therefore a higher potential for contamination. The slope of the Saiss basin (see in Figure 2e) was extracted from Digital Elevation Model with pixel size 30 m and it ranges from 0% to 112%.

•
Impact of Vadose zone (I) The vadose zone (see in Figure 2f) represents the zone located between the surface of the earth and the upper part of the aquifer; using its property of permeability, it can determine the potential of groundwater contamination. In this research, this layer is constituted by Quaternary formation, such as: impermeable clays, alluviums, travertines, and basalts.
Geological data are obtained from the Hydrogeological map of the Meknes-Fes basin obtained from Morocco's National Irrigation Office, Directorate General of Studies (1/100,000).

•
Hydraulic conductivity (I) Hydraulic conductivity or permeability presents the capacity of an aquifer to transmit water in a porous medium. The higher C is, the faster the pollutant will be transported. In our case, the hydraulic conductivity is given by Equation (3) [39]: where K represents the hydraulic conductivity (m/s), t represents the transmissivity (m 2 /s), and b is the aquifer thickness (m). The hydraulic conductivity values in the Sais basin ranged from 0 to 30 m/day ( Figure 2g). These values were classified into 4 classes: 0.04-4, 4-12, 12-29, and 29-41 (m/s).

Frequency Ratio
Due to its ease of applicability and understanding [40], the bivariate statistic test frequency ratio is widely used in several hazard monitoring studies [41]. In this study, it was used to understand the spatial correlation between each parameter of DRASTIC method and the nitrate points.
where S f represent the area of each class for each DRASTIC parameter; ∆S f represent the total area of each parameter; N f represent concentration of nitrate occurrence in the class of each parameter; εN represent the number of total nitrates in the study area. The correlation between the independent variable, i.e., DRASTIC parameter and the target, i.e., nitrate sample is highest if the FR is >1, whereas the correlation is low if the FR is <1. The FR was calculated for the rate of each class of each DRASTIC parameter and reclassified using natural breaks method in spatial analyst tools in ArcGIS 10.5 (ESRI; Redlands, CA, USA). Thus, we proposed an improved result for groundwater vulnerability based on statistical approach ( Table 1). The frequency ratio of each DRASTIC parameter was normalized between 0 and 1 using Equation (5): where x is the current value of the variable and x' is the normalized value. In this study, these seven FR-DRASTIC parameters served as the explicative variables for groundwater vulnerability modeling. Thus, the dataset was randomly split into 70 points which served as training sample (70%) and 30 points which served as validation sample (30%). The seven DRASTIC parameters were applied as the model inputs and the nitrate values used as the target of the model after normalization process were randomly split into 70% training data and 30% validation data.

Preparation of Nitrate Locations' Data and Validation
In this study, a total of 100 well samples ( Figure 3) were selected to collect nitrate concentration. They were taken from the Sebou Hydraulic Basin Agency (SHBA). For the analysis, the inverse distance weighted (IDW) interpolation was performed for the samples. The result of nitrate mapping is presented in Figure 3. Following the recommendations of the WHO, i.e., an aquifer receiving a concentration of nitrate exceeding 50 mg/L is considered polluted. To validate the machine learning models, nitrate concentration data, which served as target in this study, were divided into 2 groups: locations with nitrate concentrations higher than 50 mg/L were classified as polluted areas, whereas those with nitrate concentrations less than 50 mg/L were classified as unpolluted areas. We randomly separated groundwater polluted areas and groundwater unpolluted areas; out of 100 samples of nitrate concentrations, 70% were used as training dataset and 30% were used as validation dataset. nitrate concentration data, which served as target in this study, were divided into 2 groups: locations with nitrate concentrations higher than 50 mg/L were classified as polluted areas, whereas those with nitrate concentrations less than 50 mg/L were classified as unpolluted areas. We randomly separated groundwater polluted areas and groundwater unpolluted areas; out of 100 samples of nitrate concentrations, 70% were used as training dataset and 30% were used as validation dataset.

Support Vector Machine (SVM)
SVM was developed for the first time by Vapnik with colleagues [42,43]. It is used for classification and problems. The goal of SVM algorithm is to find a hyperplane in an N-dimensional space ( Figure 4). SVM is widely used with successful results in  SVM was developed for the first time by Vapnik with colleagues [42,43]. It is used for classification and problems. The goal of SVM algorithm is to find a hyperplane in an N-dimensional space ( Figure 4). SVM is widely used with successful results in groundwater potential mapping [44,45], landslide mapping [46], and flood susceptibility assessment [47,48]. In this study, the kernel function was used due to its robustness in previous studies [49,50].
Remote Sens. 2022, 14, x FOR PEER REVIEW 9 of 21 groundwater potential mapping [44,45], landslide mapping [46], and flood susceptibility assessment [47,48]. In this study, the kernel function was used due to its robustness in previous studies [49,50]. Developed for the first time by Breiman [51], RF is a supervised machine learning algorithm, based on ensemble technique [52], used for both regression and classification tasks [47]. According to Rahmati et al. [6], running the RF algorithm requires two optimized parameters: the number of variables/factors to be used in each tree-building process (mtry) and the number of trees to be built in the forest to run it (ntree). Developed for the first time by Breiman [51], RF is a supervised machine learning algorithm, based on ensemble technique [52], used for both regression and classification tasks [47]. According to Rahmati et al. [6], running the RF algorithm requires two optimized parameters: the number of variables/factors to be used in each tree-building process (mtry) and the number of trees to be built in the forest to run it (ntree).
In this research, the number of trees was considered as 123 with one random division variable, which led to the highest precision.
RF algorithm was used to evaluate many natural environmental hazard studies, because of its reliability and efficiency [29,53]. RF algorithm was used to evaluate many natural environmental hazards studies, such as gully erosion modelling [54], forest fire mapping [53], and flood susceptibility prediction [55,56].

Multilayer Perceptron-Neural Network (MLP-NN)
Multilayer Perceptron-Neural network MLP-NN is a class of Neural Networks algorithms (ANN) with a structure similar to biological neurons. The structure of MLP-NN models consists of three layers: an input layer, an output layer, and a hidden layer [57,58]. The input layers represent the factors conditioning the risk of pollution, the output layers are considered as classified results that infer the classes of pollution or non-pollution by nitrates, whereas the hidden layer is located between these two, in which the function applies weights. Meanwhile, the hidden layers perform non-linear transformations by applying weights to the inputs and directing them through an activation function as output. The training of this algorithm takes place in three steps: feedforward, backpropagation, and weight adjustment [59,60].
In the current study, in order to achieve the highest performance and to avoid the overfitting problem, two hidden layers were used with 37 neurons in each layer, the Linear Unit Rectification (Relu) was used as activation function, and the optimizer was set to Adam, which was developed by Kingma and Ba [61].

Validation of Groundwater Vulnerability Models
Evaluating the performance of machine learning models is a mandatory step in the modeling research [52,62]. Common statistical approaches were used by several researchers in related studies [15,55].
In this article, using the training and validation datasets, the receiver operating curve (ROC) was applied to evaluate the performance of the groundwater pollution risk models developed in this study. The ROC is a graphical presentation in which the specificity is plotted on the X-axis and the sensitivity is plotted on the Y-axis at different cut-off thresholds classifications [15,63]. To quantitatively validate the models, the area under the curve (AUC) is often used. It is defined as "the probability of a classifier to correctly anticipate the occurrence or non-occurrence of predefined events".
In addition, other statistical measures were used, including Sensitivity, Specificity, Accuracy (ACC), and Precision: Specificity= TN TN + FP , where FP (false positive) is the number of groundwater points incorrectly predicted and considered as polluted and FN (false negative) is the number of groundwater points considered and predicted as non-polluted; TP (true positive) is the number of nitrate pollution samples classified correctly; and TN (true negative) is the number of non-pollution samples classified correctly. According to Pham et al. [64], to measure the quality of the model, a threshold classification can be applied to classify the area under the ROC curve (AUC) into five classes, including: values ranging from 0.5 to 0.6, indicating that the prediction of the model is poor, 0.6 to 0.7, indicating a fair quality of prediction, 0.7 to 0.8, indicating a good quality of prediction, 0.8 to 0.9, indicating a very good quality of prediction, and 0.9 to 1, indicating an excellent quality of prediction.

DRASTIC Vulnerability
After multiplying each of the seven DRASTIC index maps by their standard ratings and weights, the overlays of these seven parameters of the DRASTIC index using Equation (1) were observed to range from 53 to 143.
Following the classification provided by Aller et al. [16], the generated map was divided into three classes of groundwater vulnerability, including very low (14%), low (83%), and medium (3%) ( Figure 5). It can be seen from Figure 5 that the least vulnerable areas are located in the eastern part of the study area. Whereas the south and the west parts of the study area were considered to be without risk, probably due to the high depth (>15) and clay soil, the northeast and the center parts of the basin were considered to be the most vulnerable areas. Due to the low slope (0-12), low depth (0-4.5 m), and high hydraulic conductivity (12-41 m/day), the vadose zone in this area is represented by sandstone and conglomerate.

Frequency Ratio
Using the frequency ratio bivariate statistic test, the correlations between groundwater pollution risk and each DRASTIC parameter were calculated. As can be seen from Table 1, the net recharge class, ranging from 50 to 100 mm, has the highest FR (0.88). The classes 0-1.5 and 1.5-4.5 of depth to water table and the class 29-41 of hydraulic conductivity present the FR value of zero, i.e., no probability of groundwater pollution with nitrate. It can be seen from Figure 5 that the least vulnerable areas are located in the eastern part of the study area. Whereas the south and the west parts of the study area were considered to be without risk, probably due to the high depth (>15) and clay soil, the northeast and the center parts of the basin were considered to be the most vulnerable areas. Due to the low slope (0-12), low depth (0-4.5 m), and high hydraulic conductivity (12-41 m/day), the vadose zone in this area is represented by sandstone and conglomerate.

Frequency Ratio
Using the frequency ratio bivariate statistic test, the correlations between groundwater pollution risk and each DRASTIC parameter were calculated. As can be seen from Table 1, the net recharge class, ranging from 50 to 100 mm, has the highest FR (0.88). The classes 0-1.5 and 1.5-4.5 of depth to water table and the class 29-41 of hydraulic conductivity present the FR value of zero, i.e., no probability of groundwater pollution with nitrate.

Groundwater Pollution Risk Maps
The spatial distribution of groundwater pollution risk maps (GPRMs) for the study area were produced by the DRASTIC method and the applied machine learning models. Using equal interval reclassification within the ArcGIS environment, the generated maps were categorized into five different classes, including very high, high, moderate, low, and very low (Figures 6 and 7).  It can clearly be seen from Figure 7 that the west and the center parts of the basin are at risk of being more contaminated, whereas the eastern part is considered an unpolluted area. For the DRASTIC-MLP model, the pollution risk map (GPRM) elaborated showed that 53% of the basin was characterized as having very low vulnerability risk. Whereas low, medium, high, and very high contributed 15%, 15%, 12%, and 6%, respectively (Figure 7a). For the DRASTIC-SVM model, the GPRM elaborated showed that 52% of the basin was characterized as having very low vulnerability risk. Whereas low, medium, high, and very high contributed 13%, 15%, 12%, and 7%, respectively ( Figure 7b). As can be seen from Figure 7c, the generated groundwater pollution risk map (GPRM) produced by RF model showed that 41% of the study area was classified as very low vulnerability, followed by the class of low vulnerability (15%). Whereas medium, high, and very high contributed 13%, 17%, and 14%, respectively. The GPRM generated by the DRASTIC-FR-SVM was divided into five classes: very low vulnerability (38%), low vulnerability (12%), moderate vulnerability (9%), high vulnerability (11%), and very high vulnerability (30%) (Figure 7d). The GPRM generated by the DRASTIC-FR-MLP was divided into five classes: very low vulnerability (36%), low vulnerability (10%), moderate vulnerability (9%), high vulnerability (15%), and very high vulnerability (30%) (Figure 7e).

Validation
Using the training and the validation datasets, the performance of the developed models in this study was evaluated. The DRASTIC-RF-MLP ensemble outperformed all other models developed in this study. Thus, based on the training dataset (Table 2), the It can clearly be seen from Figure 7 that the west and the center parts of the basin are at risk of being more contaminated, whereas the eastern part is considered an unpolluted area. For the DRASTIC-MLP model, the pollution risk map (GPRM) elaborated showed that 53% of the basin was characterized as having very low vulnerability risk. Whereas low, medium, high, and very high contributed 15%, 15%, 12%, and 6%, respectively (Figure 7a). For the DRASTIC-SVM model, the GPRM elaborated showed that 52% of the basin was characterized as having very low vulnerability risk. Whereas low, medium, high, and very high contributed 13%, 15%, 12%, and 7%, respectively ( Figure 7b). As can be seen from Figure 7c, the generated groundwater pollution risk map (GPRM) produced by RF model showed that 41% of the study area was classified as very low vulnerability, followed by the class of low vulnerability (15%). Whereas medium, high, and very high contributed 13%, 17%, and 14%, respectively. The GPRM generated by the DRASTIC-FR-SVM was divided into five classes: very low vulnerability (38%), low vulnerability (12%), moderate vulnerability (9%), high vulnerability (11%), and very high vulnerability (30%) (Figure 7d). The GPRM generated by the DRASTIC-FR-MLP was divided into five classes: very low vulnerability (36%), low vulnerability (10%), moderate vulnerability (9%), high vulnerability (15%), and very high vulnerability (30%) (Figure 7e).

Validation
Using the training and the validation datasets, the performance of the developed models in this study was evaluated. The DRASTIC-RF-MLP ensemble outperformed all other models developed in this study. Thus, based on the training dataset ( following values were obtained: Accuracy = 0.957, Precision = 0.943, Sensitivity = 0.971, and Specificity = 0.944. Based on the validation dataset, the following statistical values were obtained: Accuracy = 0.7952, Precision = 0.969, Sensitivity = 0.939, and Specificity = 0.966. The DRASTIC-RF-SVM ensemble performed second best, with the training dataset revealing the following values shown in Table 2 (Table 2). Thus, for the validation dataset: Accuracy= 0.871, Precision= 0.857, Sensitivity = 0.875, and Specificity = 0.867.
The performances obtained by the DRASTIC-SVM ensemble were as follows for the training dataset (Table 2)

Variable Importance
In all modeling research purposes, variables' importance should be applied to find the suitable predictive variables for the modelling research [65,66]. In this study, the RF model was used to evaluate the importance of the DRASTIC parameters. Our findings showed that both the depth of groundwater (1.8) and the hydraulic conductivity (1.6) had the highest importance in groundwater vulnerability assessment, followed by net recharge (1.49), aquifer media (1.38), topography (1.27), soil (1.21), and the impact of the vadose zone (0.97).

Discussion
Groundwater pollution is one of the challenging issues in the world, especially in arid and semi-arid ecosystems [20]. To overcome this issue, researchers have deployed great efforts to prevent and manage water contamination. The DRASTIC method is a widely applied approach to assess groundwater pollution.
Our results showed that the north and central parts tended to have a moderate risk of pollution, whereas the least vulnerable areas are located in the eastern parts of the study area. The south and west parts of the study area are considered to be without risk. This could be due to the hydrological background of the study area (i.e., low slope, lowest depth, highest hydraulic conductivity, and the lithological nature) [67]. Similar to our results, previous research reported that a higher groundwater vulnerability was observed in downslope areas [68], meaning the higher the hydraulic conductivity, the higher the risk of groundwater [67]. According to Baghapour et al. [68], the slope has an important impact. In addition, [21] indicated that the lowest depth is more affected by groundwater vulnerability.
Due to its uncertainty, many researchers have modified the original DRASTIC method using statistical techniques and machine learning models to increase its accuracy. In this study, five new individual models and their ensemble machine learning models were developed, namely DRASTIC-RF, DRASTIC-SVM, DRASTIC-MLP, DRASTIC-RF-SVM, and DRASTIC-RF-MLP, for groundwater pollution risk assessment in the Saiss basin, in Morocco.
Through its ability to handle large datasets, its low aptitude to overfitting, and its ability to learn non-linear relationships between the nitrate polluted samples and the

Variable Importance
In all modeling research purposes, variables' importance should be applied to find the suitable predictive variables for the modelling research [65,66]. In this study, the RF model was used to evaluate the importance of the DRASTIC parameters. Our findings showed that both the depth of groundwater (1.8) and the hydraulic conductivity (1.6) had the highest importance in groundwater vulnerability assessment, followed by net recharge (1.49), aquifer media (1.38), topography (1.27), soil (1.21), and the impact of the vadose zone (0.97).

Discussion
Groundwater pollution is one of the challenging issues in the world, especially in arid and semi-arid ecosystems [20]. To overcome this issue, researchers have deployed great efforts to prevent and manage water contamination. The DRASTIC method is a widely applied approach to assess groundwater pollution.
Our results showed that the north and central parts tended to have a moderate risk of pollution, whereas the least vulnerable areas are located in the eastern parts of the study area. The south and west parts of the study area are considered to be without risk. This could be due to the hydrological background of the study area (i.e., low slope, lowest depth, highest hydraulic conductivity, and the lithological nature) [67]. Similar to our results, previous research reported that a higher groundwater vulnerability was observed in downslope areas [68], meaning the higher the hydraulic conductivity, the higher the risk of groundwater [67]. According to Baghapour et al. [68], the slope has an important impact. In addition, [21] indicated that the lowest depth is more affected by groundwater vulnerability.
Due to its uncertainty, many researchers have modified the original DRASTIC method using statistical techniques and machine learning models to increase its accuracy. In this study, five new individual models and their ensemble machine learning models were developed, namely DRASTIC-RF, DRASTIC-SVM, DRASTIC-MLP, DRASTIC-RF-SVM, and DRASTIC-RF-MLP, for groundwater pollution risk assessment in the Saiss basin, in Morocco.
Through its ability to handle large datasets, its low aptitude to overfitting, and its ability to learn non-linear relationships between the nitrate polluted samples and the input layers [15], random forest has shown greater accuracy in previous studies [26,53]. In addition, Ref. [44] confirms that RF is one of the successful machine learning models for groundwater mapping.
Hybrid/ensemble approaches have been proposed recently to improve the performance of both individual machine learning models and bivariate statistical techniques [24,42,51]. In the current research, RF was used to improve the accuracy of the other applied models. In term of the accuracy, our results show that the DRASTIC-RF-MLP hybrid model provided the best accuracy in comparison to the other models. MLP has been used in several studies with successful results because of its ability to deal with complex non-linear problems and ease in processing large numbers of input data [24].
In terms of the DRASTIC-RF-SVM model, the SVM algorithm has significant advantages in solving linear and non-linear problems and works well in high dimensional spaces [48]. It has already revealed great potential for groundwater mapping [45,52].
It should be highlighted that selecting the best model is not an easy task because all the aforementioned models present their own advantages and drawbacks. In addition, we can add that the outcomes of this research are limited and related to the study area background and the data used. Thus, it can be concluded that all the above discussed models can be used for any hazard environment monitoring studies, including groundwater pollution risk mapping, for other environment backgrounds with promising results.
Our findings showed that both depth of groundwater and hydraulic conductivity had the highest importance in groundwater vulnerability assessment, followed by net recharge, aquifer media, topography, soil, and impact of the vadose zone. Similar findings were also found by [52,69]. From our results, depth of groundwater is the most important variable; this is because the groundwater could easily be contaminated by surface runoff and contaminants. These findings agree with Pham et al. [52], who also confirm that depth of groundwater is the most significant factor for groundwater potential mapping. Hydraulic conductivity is considered an important factor for groundwater management strategies because it controls the contaminants' migration rate from the source to the aquifer.
With the rapid population growth in the Fes-Meknes region, urbanization and extensive use of soils for agriculture activities have become serious challenges for environmental agency, leading to the groundwater quality deterioration in the Saiss basin. In the same geographical area, in a recent work El Hafyani et al. [70] reported that the increase in water consumption is linked to Meknes city's urban growth and the agricultural activities' system adopted. In addition, the impact of climate change on natural resources has been investigated in this study area [71]. Laraichi et al. [72] demonstrated a weakening in the transmission of information and communication about groundwater, which might lead to several issues including overexploitation and pollution. For instance, Benaabidate et al. [73] reported in their study a decrease in piezometric level, which is mainly due to several factors, including the decrease in precipitation, the reduced natural aquifer recharge, and the increased pumping, mainly for irrigation. Likewise, a recent work by Berni et al. [9] highlighted that a negative effect of pesticides in the Saiss basin was explained by farmers' safety behavior. Additionally, expert interpretations confirmed that a clear link between groundwater vulnerability risk and geological background of the Saiss basin was shown, highlighting that Meknes city is more susceptible to groundwater pollution, which is probably due to the existence of permeable Paleocene sands in this area.
Our results indicate that the most vulnerable areas are located in the northeast and the center of the basin, because of low depth, low slope, high recharge, and high hydraulic conductivity, whereas the high depth, low recharge, and low conductivity of the western areas of the Saiss cause this part to be considered as without risk. These findings are in line with previous works [9,35]. Thus, the delineation of highly affected areas to pollution is urgently needed to avoid the deterioration of groundwater resources. In this aquifer, as agriculture is still going to be one of the main sources of the local population, to meet the livelihood's resource requirements, more comprehensive studies should be carried out to promote the sustainable use of natural resources for the proper management and planning of them. Therefore, the proposed methodology could be a newer, effective tool and an emergent path to decision making for assessing groundwater pollution based on the performance accuracy of hybrid machine learning algorithms.
Marking vulnerable areas to pollution based on the opportunities of the developed models will be very helpful in encouraging the ongoing efforts to develop a geoportal platform in the framework of the VLIR-UOS project through a collaborative effort with scientists and different environmental agencies (https://www.vliruos.be/en/projects/ project/ (accessed on 14 December 2021).Furthermore, this work contributes to the aim of the Sustainable Development Goals (SDGs) framework on its sixth goal, which is dedicated to water sustainability and different indicators related to water quality.

Conclusions
In this study, five new hybrid models were developed based on the modified DRASTIC method by the frequency ration bivariate statistic test and the random forest machine learning algorithm, namely DRASTIC-RF, DRASTIC-SVM, DRASTIC-MLP, DRASTIC-RF-SVM, and DRASTIC-RF-MLP, for groundwater pollution assessment in the Saiss basin. Furthermore, three conclusions can be highlighted.

-
The results obtained indicate that the most vulnerable areas are located in the west and the center parts of the basin, because of the low depth, low slope, and high hydraulic conductivity, whereas the high depth, low recharge, and low conductivity of the western areas of the Saiss basin mean that this area is considered to be without risk; -As expected, the locations subject to high vulnerability risk are associated with a high concentration of nitrate; - The spatial distribution of groundwater pollution risk maps (GPRMs) for the study area show that the west and the center parts of the basin are the most vulnerable areas; - The results highlight that the hybrid/ensemble machine learning (ML) model outperforms the individual based model.
It should be noted that the overall goal beyond this research is to implement a machine learning algorithm to understand groundwater pollution. Furthermore, the output will help the authorities and government agencies in designing appropriate decisionmaking strategies.
Finally, in a vision to protect our environment, the methodology developed here could be applied in other case studies with similar background.