A Novel Approach to Harmonize Vulnerability Assessment in Carbonate and Detrital Aquifers at Basin Scale

The DRASTIC (D: Depth to water; R: Net recharge; A: Aquifer media; S: Soil media; T: Topography; I: Impact of vadose zone; C: Hydraulic conductivity) index is usually applied to assess intrinsic vulnerability in detrital and carbonate aquifers, although it does not take into account the particularities of karst systems as the COP (C: Concentration of flow; O: Overlying layers above water table; P: precipitation) method does. In this paper we aim to find a reasonable correspondence between the vulnerability maps obtained using these two methods. We adapt the DRASTIC index in order to obtain reliable assessments in carbonate aquifers while maintaining its original conceptual formulation. This approach is analogous to the hypothesis of “equivalent porous medium”, which applies to karstic aquifers the numerical solution developed for detrital aquifers. We applied our novel method to the Upper Guadiana Basin, which contains both carbonate and detrital aquifers. Validation analysis demonstrated a higher confidence in the vulnerability assessment provided by the COP method in the carbonate aquifers. The proposed method solves an optimization problem to minimize the differences between the assessments provided by the modified DRASTIC and COP methods. Decision trees and spatial statistics analyses were combined to identify the ranges and weights of DRASTIC parameters to produce an optimal solution that matches the COP vulnerability classification for carbonate aquifers in 75% of the area, while maintaining a reliable assessment of the detrital aquifers in the Basin.


Introduction
Groundwater pollution is a widespread problem affecting most aquifers all over the world due to the increasing agricultural and industrial human activity [1,2]. It is the result of the interaction between the anthroposphere and the hydrosphere, where substances from the different land uses penetrate the groundwater, leading to impacts on environmental water quality and human health.
The protection of the groundwater resources has become a priority due to the importance of groundwater for human supply, irrigation and dependent ecosystems, especially in semi-arid regions [3][4][5]. The degree of protection depends on intrinsic groundwater vulnerability, which is defined as the susceptibility of aquifers to pollution arising from anthropogenic activity [6]. Several methods have been proposed to assess intrinsic vulnerability by using conceptual groundwater flow models and taking transport processes into account [7,8]. The most frequently employed methods are the index-based approaches [9,10] of which DRASTIC (D: Depth to water; R: Net recharge; A: Aquifer media; S: Soil media; T: Topography; I: Impact of vadose zone; C: Hydraulic conductivity) [9] is the most common. DRASTIC has been in the vulnerability assessment would allow the dissimilarities provided by different vulnerability methods to be dealt with. Although in the literature we can find several attempts to adapt the DRASTIC method to the particularities of each case study, none of them have proved the applicability of the modified DRASTIC in different aquifer typologies.
In this study, an optimization problem is solved to establish a correspondence between DRASTIC and COP by maintaining the original formulation of the DRASTIC method. It aims to find the weights and ranges of the DRASTIC parameters that maximize the correspondence between the qualitative vulnerability classes obtained using DRASTIC and COP methods. The optimization problem is solved through a new approach that combines spatial statistics analysis and data mining (decision trees). Although decision trees have been applied to predict the sensitivity to contaminants based on groundwater vulnerability [50,51], they have not been used to adapt/improve the DRASTIC method.
The optimization methodology was applied to five carbonate aquifers in the Upper Guadiana Basin, where the validation analysis demonstrated that COP method produces better vulnerability assessment than the original DRASTIC one. The optimal solution obtained for carbonate aquifers (O-DRASTIC) was also tested in the three detrital aquifers within the basin and the validation analysis also shows significant improvement in the results comparing with the original DRASTIC. A sensitivity analysis of the changes introduced to define the optimum DRASTIC reveals the influence of the various intrinsic characteristics on the severity of vulnerability for different aquifer typologies.

Study Area and Data
The Upper Guadiana Basin ( Figure 1) is located in the central part of Spain. It is composed of eight groundwater bodies including five unconfined mixed (carbonate and detrital) and three unconfined detrital aquifers extending over approximately 14,000 km 2 .
Water 2020, 12, x FOR PEER REVIEW 3 of 26 literature we can find several attempts to adapt the DRASTIC method to the particularities of each case study, none of them have proved the applicability of the modified DRASTIC in different aquifer typologies.
In this study, an optimization problem is solved to establish a correspondence between DRASTIC and COP by maintaining the original formulation of the DRASTIC method. It aims to find the weights and ranges of the DRASTIC parameters that maximize the correspondence between the qualitative vulnerability classes obtained using DRASTIC and COP methods. The optimization problem is solved through a new approach that combines spatial statistics analysis and data mining (decision trees). Although decision trees have been applied to predict the sensitivity to contaminants based on groundwater vulnerability [50,51], they have not been used to adapt/improve the DRASTIC method.
The optimization methodology was applied to five carbonate aquifers in the Upper Guadiana Basin, where the validation analysis demonstrated that COP method produces better vulnerability assessment than the original DRASTIC one. The optimal solution obtained for carbonate aquifers (O-DRASTIC) was also tested in the three detrital aquifers within the basin and the validation analysis also shows significant improvement in the results comparing with the original DRASTIC. A sensitivity analysis of the changes introduced to define the optimum DRASTIC reveals the influence of the various intrinsic characteristics on the severity of vulnerability for different aquifer typologies.

Study Area and Data
The Upper Guadiana Basin ( Figure 1) is located in the central part of Spain. It is composed of eight groundwater bodies including five unconfined mixed (carbonate and detrital) and three unconfined detrital aquifers extending over approximately 14,000 km 2 . The climate in this area is typically continental and semiarid. Mean annual precipitation over the period 1974-2015 was 445 mm and the mean annual temperature is 14.7 °C. Mean potential evapotranspiration is 700 mm/year. The area is predominantly flat, bounded by mountain landscapes to the north (Sierra de Altomira) and south (Campo de Montiel).
Connectivity between groundwater bodies is structurally complex, with strong natural interaction between groundwater and surface water. Under natural conditions groundwater flow discharges into the central aquifer of the system (Mancha Occidental Aquifer) [52] which flows eastwards. The climate in this area is typically continental and semiarid. Mean annual precipitation over the period 1974-2015 was 445 mm and the mean annual temperature is 14.7 • C. Mean potential evapotranspiration is 700 mm/year. The area is predominantly flat, bounded by mountain landscapes to the north (Sierra de Altomira) and south (Campo de Montiel).
Connectivity between groundwater bodies is structurally complex, with strong natural interaction between groundwater and surface water. Under natural conditions groundwater flow discharges into the central aquifer of the system (Mancha Occidental Aquifer) [52] which flows eastwards.
The predominantly dry climate and the prevalence of irrigated agriculture means that the Upper Guadiana Basin has been intensively pumped, and this has led to some of its aquifers being declared overexploited [53]. The water table is highly variable, lying at less than 1 m to more than 250 m, though over most of the basin it lies below 15 m.
Rainfall is the main source of aquifers' recharge. Mean annual recharge varies between 45 and 70 mm/year, although there is disagreement about these values [52][53][54].
The soils in the basin mainly belong to the cambisol group according to the FAO (Food and Agriculture Organization of the United Nations) classification [54,55]. Regosol and others, such as luvisol and podzol, can be found in the southeast [54]. Soil texture in the northern part of the basin is predominantly silty-loam, whereas the soil in the southern part it is peaty.
The geology is complex, including mixed carbonate-detrital aquifers. In the southern half of the Upper Guadiana Basin, the aquifers are predominantly composed of limestones, with many karstified zones [53]. In general, the karst is not very developed and there are no swallow holes in these aquifers. Many parts of the central aquifer are formed by Tertiary detrital deposits [53]. The northern aquifers are more heterogeneous. There are no large karstified areas and other formations of metamorphic materials can be found. The detrital aquifers are mainly composed of Tertiary and Quaternary alluvial materials.
The unsaturated zone is formed by poorly permeable lithologies in the northern part of the basin, with higher permeabilities in the southern part [54].
Conductivity in the Upper Guadiana Basin varies widely. To the north, conductivity is low (below 1.5 × 10 −4 m/s); in the central part there are zones with values higher than 5 × 10 −4 m/s, while conductivity to the southern is mainly in the range 3.5 × 10 −4 -5 × 10 −4 m/s [56].

DRASTIC and COP Vulnerability Maps
DRASTIC and COP vulnerability maps were calculated in the five mixed aquifers following the proposal made in [9,10], respectively ( Figure 2). A detailed explanation of these methods can be found in the Appendix A. Tables A4 and A5 (Appendix B) summarizes the data sources and the methodology applied to calculate the parameters of DRASTIC (hereinafter, we will call it "original DRASTIC") and COP, respectively.
The "original DRASTIC" values vary between 41 and 171 within the study area. The final index was reclassified into five qualitative classes (Very low: <100; Low: 100-120; Moderate: 120-160; High: 160-180; Very high: ≥180) by grouping the original categories proposed in [9] to obtain the same number of classes as the COP vulnerability map. The values of parameters Aquifer media, Soil media, Impact of vadose zone and Conductivity ( Figure 2(a3,a4,a6,a7)) show significant heterogeneity, with clear differences between the northern and central-southern areas. These differences are finally reflected in the vulnerability map ( Figure 2(a8)). Figure 2b shows the results of applying the COP method as proposed by [10]. The protection conferred by Overlaying the layers (Figure 2(b2)) generally decreases from north to south. It is very low in the southern part due to the presence of limestones and dolomites outcrops. The surface features related to the Concentration of flow (Figure 3(b1)) produce only a slight reduction of protection over 80% of the area, except in the southern Campo de Montiel aquifer, where the protection from Overlying layers is greatly reduced.
While the highest values of the DRASTIC index are located in the center and southern part of the basin (classified as "Moderate vulnerability"), the highest values of the COP index appear in the southern zone (Campo de Montiel aquifer). Both indices give their lowest values in the north area (Sierra de Altomira and Lillo Quintanar aquifers).
The percentage overlap between the vulnerability classes obtained using COP and DRASTIC is 55.75%. There is a significant coincidence in the "Very low" and "Low" vulnerability classes, which cover nearly 80% of the area in the COP map. However, the coincidence in "High" and "Very high" classes is almost null. A misclassification of the highest vulnerability areas of groundwater dependent ecosystems such as the Upper Guadiana Basin would lead to erroneous planning and management decisions, possibly leading to significant environmental impacts.

Validation of Vulnerability Maps
Contaminant loads are linked to human activities and land use (LU). Therefore, in addition to aquifer vulnerability, these are a key factor in determining groundwater contamination [3,57]. In the study area, the intensive agriculture and the use of nitrogen fertilizers has provoked high levels of nitrates in many groundwater areas. Since nitrate is not naturally found in groundwater, it is considered to be a good indicator of contamination by human impact, especially in agricultural zones.

Validation of Vulnerability Maps
Contaminant loads are linked to human activities and land use (LU). Therefore, in addition to aquifer vulnerability, these are a key factor in determining groundwater contamination [3,57]. In the study area, the intensive agriculture and the use of nitrogen fertilizers has provoked high levels of nitrates in many groundwater areas. Since nitrate is not naturally found in groundwater, it is considered to be a good indicator of contamination by human impact, especially in agricultural zones. The DRASTIC vulnerability map is validated by adding an LU factor to the DRASTIC index (Equation (1)), as proposed in other studies [27,58,59]. Making an analogy, the index defining protection against pollution could be based on the COP vulnerability (Equation (2)). In this case the inverse of the LU factor is considered since a higher COP index indicates lower vulnerability levels. Since the principal land use is agriculture, the pollution index and the protection-against-pollution index will maintain the factor of 5 used to weight the LU term, as was originally included in the DRASTIC pollution index by other authors [27,60]: The rates assigned to LU [3,27,33,34] are shown in Table 1. The correlations (R-squared coefficient) with the pollution index derived from DRASTIC and COP methods ( Figure 3) were determined from the 214 observation points where nitrate concentration data was available (1974 to 2015 provided by the Spanish Geological Survey and the River Basin Authority). The DRASTIC pollution index ( Figure 3a) is weakly correlated with the mean nitrate concentration (R 2 = 0.04) in the carbonate aquifers. However, the protection-against-pollution index linked to the COP map ( Figure 3b) shows strong correlation (R 2 = 0.78). Accordingly, in this validation analysis, the COP vulnerability assessment for the study area provides a more reliable assessment than the DRASTIC index one.
As can be observed in Figures 2 and 3, both DRASTIC and COP indices suggest low vulnerability in some zones with high nitrate concentration (in the western part of the basin) that results from the intensive agricultural exploitation since early 1970's. Thus, groundwater contamination not only depends on intrinsic vulnerability, but also the land use, type of crops, type of irrigation, etc. For this reason, the most contaminated areas do not always correspond to the most vulnerable ones [31,32].

Methodology: Optimization of DRASTIC Method
In this paper we adapt the DRASTIC method to minimize differences with the COP vulnerability map for the five carbonate aquifers in the Upper Guadiana Basin, which has been proven to provide better results in those aquifers according to the validation analysis. The main objective is to obtain a harmonized method that allows vulnerability in carbonate and detrital aquifers to be assessed in a homogenous way, in a system comprising varying geological formations. The harmonization of criteria to assess groundwater vulnerability will make the results comparable at basin scale. To this end, the DRASTIC index is recalculated by varying the ranges and weights of its parameters (fulfilling certain constraints to maintain a rational definition), in order to minimize the differences with the COP vulnerability map for the five mixed (carbonate and detrital) aquifers in the Upper Guadiana Basin. Data mining techniques are then applied to identify the ranges and weights that provide an optimal solution. Figure 4 shows the flowchart of the proposed optimization problem and the methodology applied. Two objective functions (O.F.) were tested: (A) to maximize the percentage of spatial coincidence (area, Si) between DRASTIC and COP vulnerability classes; and (B) to minimize the distance (di; see Equation (3)) between the vulnerability classes in DRASTIC and COP. The decision variables in this optimization problem are (1) the ranges (of the DRASTIC index and its parameters) and (2) the weights of the DRASTIC parameters.
Water 2020, 12, x FOR PEER REVIEW 8 of 26 coincidence (area, Si) between DRASTIC and COP vulnerability classes; and (B) to minimize the distance (di; see Equation (3)) between the vulnerability classes in DRASTIC and COP. The decision variables in this optimization problem are (1) the ranges (of the DRASTIC index and its parameters) and (2) the weights of the DRASTIC parameters. The ranges of the DRASTIC parameters and index are proposed based on the available data in the study area and covering a wide range of hypothetical cases. The following constraints regarding the ranges proposal are imposed: • The ranges of non-continuous parameters (A, S and I) were assumed to be invariant and we classify these parameters as proposed in [9]. We modify the DRASTIC index classification (DR) and the ranges of only three numerical parameters in DRASTIC: Depth to water (D), Topography (T) and Conductivity (C). The proposed ranges are shown in Table 2. Due to the narrow variability of the data in the study area the recharge ranges were not modified.

•
The number of classes and rates adopted for all parameters are as proposed in [9]. We change only the distribution of numerical data within the ranges in order to adapt them to the characteristics of the case study. The ranges of the DRASTIC parameters and index are proposed based on the available data in the study area and covering a wide range of hypothetical cases. The following constraints regarding the ranges proposal are imposed: • The ranges of non-continuous parameters (A, S and I) were assumed to be invariant and we classify these parameters as proposed in [9]. We modify the DRASTIC index classification (DR) and the ranges of only three numerical parameters in DRASTIC: Depth to water (D), Topography (T) and Conductivity (C). The proposed ranges are shown in Table 2. Due to the narrow variability of the data in the study area the recharge ranges were not modified.

•
The number of classes and rates adopted for all parameters are as proposed in [9]. We change only the distribution of numerical data within the ranges in order to adapt them to the characteristics of the case study.

•
The modification of ranges is based on statistical criteria according to the distribution of data in the study area (quantile method, equal intervals, natural clusters in ArcGis and increasing the limit of the ranges by a constant value) [32,61].

•
We established a minimum range amplitude of 5 for the DRASTIC vulnerability classes. We also established certain constraints in the weights of parameters: • Only weights between 1 to 5 are considered; • The sum of the weights had to be 23 for each combination (as the original proposal in [9]).  Low  80  90  100  90  100  100  100  110  110  115  120  130  120  130  130  140  Moderate  90  110  130  100  120  110  115  130  120  125  140  140  130  140  145  150  High  100  130  160  110  140  120  130  150  130  140  160  150  140  150  160 160 For each combination of parameter ranges and weights used to generate a DRASTIC map, the distance (d i ) or difference between vulnerability classes reported by DRASTIC and COP is calculated by Equation (3): where • α jk is the total area of "j" vulnerability class of COP overlapping with the "k" vulnerability class of DRASTIC; • x jk is a weight from 0 to 4 depending on the number of jumps from one vulnerability class to another.
In order to establish a dimensionless threshold to quantify the distance (d i ), it is expressed as a percentage of the maximum calculated distance (d max ) over all calculated DRASTIC indices: In order to reduce the number of calculations, we employ data mining techniques (decision trees) to select the values of the variables domain to be tested. The optimal solution is sought following two steps:

1.
Ranges optimization: i. First, DRASTIC vulnerability maps are calculated modifying the ranges of parameters and the classification of the DRASTIC index. The weights of parameters are the same as proposed in [9]. ii.
All the DRASTIC indices are evaluated through the objective functions and the results are classified intro three categories (Table 3). iii.
Decision trees are applied in order to find out the ranges for each parameter that gives the highest coincidence (Max(S i )) and a lowest distance (Min(d i )) between vulnerability classes assigned using DRASTIC and COP.

2.
Weights optimization: i. In this second step, the weights of parameters are introduced as new variables to compute all the feasible combinations of weights and parameter ranges selected in the previous step. The DRASTIC index is calculated for all the combinations of weights and selected classifications in step 1. ii.
The new set of DRASTIC maps are evaluated by means of the objective functions. iii.
For each parameter, decision trees are applied again to determine the weight that yields greatest similarity between the DRASTIC and COP maps. Table 3. Classification criteria of objective functions for the decision trees algorithm.

Objective Function Value Class
S i (spatial coincidence) The main objective of decision trees in this study is to identify the ranges and weights for each DRASTIC parameter involved in any combination that yields the maximum spatial coincidence (S i ) and the minimum distance between vulnerability classes (d i ) (S i = 3 or d i = 1 according to Table 3). We establish these threshold values according to the distribution of results in the first set of combinations. Decision trees reduce the computational cost of the optimization problem and allow the most relevant variables to be identified in the vulnerability assessment in carbonate aquifers.
The CHAID (Chi-square Automatic Interaction Detection) algorithm [62,63] is applied in decision trees, considering the objective functions of the optimization problem (S i and d i ) as dependent variables. The proposed ranges and weights of parameters and classifications of DRASTIC are the independent variables, and the chi-squared test of significance is used as the splitting criterion in the CHAID algorithm. Each dataset generated in steps 1 and 2 is partitioned into a training set (70%) and a testing set (the remaining 30%) in order to assess the performance of the models. The goodness-of-fit of the classification is evaluated using the precision index [51,64]: where: Decision trees can represent the relationship between variables and output class using specific rules following the pathway from the root node to the terminal node [63,65,66]. A priori, the number of terminal nodes on the tree can determine the number of rules but it may generate a large number of irrelevant pieces of information. Only the most relevant variables (rules with the highest population for each decision tree with precision above 50%) whose terminal nodes in the tree are classified as S i = 3 or d i = 1 are selected to generate all the feasible combinations to compute the DRASTIC indices.

Ranges Optimization
In the first optimization step, we obtained a total of 11,520 different DRASTIC maps. The spatial coincidence (S i ) of the vulnerability classes between the DRASTIC and COP maps rose to 61.31% (from 55.75% in "original DRASTIC") and the minimum distance between vulnerability classes (d i ) fell to 20.85% (d original DRASTIC = 24.72%).
The ranges proposed in [9] were selected for parameters D and T in the decision tress. For conductivity, the selected classifications (C7 and C8) assign lower rates to conductivity values.

Weights Optimization
The selected ranges for parameters and the DRASTIC index were combined, varying the weights of parameters between one and five. Due to a large number of generated combinations, we first considered weights one, three and five. This resulted in 35,700 new DRASTIC maps. The spatial coincidence (S i ) between vulnerability classes increased to 70.34% and the minimum distance between vulnerability classes (d i ) fell to 8.45%.
Decision trees were applied again to determine the optimum weight for each parameter. We aimed to determine if the value of the weight for each parameter provides relevant information in the optimization problem. We found that a weight equal to five did not appear in relevant rules in parameters D and T, whereas a weight equal to one did not appear for parameters R and S. All weights (one, three, and five) were found in rules for parameters A, I and C.
A new set of combinations of ranges and weights were computed to find an optimum between the gaps left due to constraints. We introduced the mid-weights for each parameter, discarding those not found in the relevant rules. We included the following weights for each parameter: W D = 2; W R = 4; W A = 2 and 4; W S = 4; W T = 2; W I = 2 and 4; W C = 2 and 4. The total of combinations provide two optimal solutions: The second solution was considered the best solution because the gain in S i was higher than the loss in d i . Finally, the best solution was refined by adjusting the classification of the DRASTIC index to increase the spatial coincidence with the COP map, obtaining the optimum DRASTIC. The classification of the optimum DRASTIC does not match with the ranges proposed originally in [9]. The objective functions take the following values for the optimum DRASTIC: S i = 76.75%; d i = 10.92%. Figure 5 shows the dot-plot of the all the DRASTIC indices calculated. It reveals the efficacy of using decision trees in the methodology to reduce the number of combinations to be tested when seeking the optimal solution. Each set of combinations improves the objective functions. Black dots show those including the mid-weights for each parameter that produce improvement in the objective function "d i ", but not in "S i ". Red dots indicate the DRASTIC indices that provide Max(S i ) and Min(d i ).

Weights Optimization
The selected ranges for parameters and the DRASTIC index were combined, varying the weights of parameters between one and five. Due to a large number of generated combinations, we first considered weights one, three and five. This resulted in 35,700 new DRASTIC maps. The spatial coincidence (Si) between vulnerability classes increased to 70.34% and the minimum distance between vulnerability classes (di) fell to 8.45%.
Decision trees were applied again to determine the optimum weight for each parameter. We aimed to determine if the value of the weight for each parameter provides relevant information in the optimization problem. We found that a weight equal to five did not appear in relevant rules in parameters D and T, whereas a weight equal to one did not appear for parameters R and S. All weights (one, three, and five) were found in rules for parameters A, I and C.
A new set of combinations of ranges and weights were computed to find an optimum between the gaps left due to constraints. We introduced the mid-weights for each parameter, discarding those not found in the relevant rules. We included the following weights for each parameter: The second solution was considered the best solution because the gain in Si was higher than the loss in di. Finally, the best solution was refined by adjusting the classification of the DRASTIC index to increase the spatial coincidence with the COP map, obtaining the optimum DRASTIC. The classification of the optimum DRASTIC does not match with the ranges proposed originally in [9]. The objective functions take the following values for the optimum DRASTIC: Si = 76.75%; di = 10.92%. Figure 5 shows the dot-plot of the all the DRASTIC indices calculated. It reveals the efficacy of using decision trees in the methodology to reduce the number of combinations to be tested when seeking the optimal solution. Each set of combinations improves the objective functions. Black dots show those including the mid-weights for each parameter that produce improvement in the objective function "di", but not in "Si". Red dots indicate the DRASTIC indices that provide Max(Si) and Min(di).

Analysis of Optimum DRASTIC (O-DRASTIC)
O-DRASTIC keeps the ranges proposed in [9] for parameters D and T. Only ranges of parameter C change in O-DRASTIC (C8 from Table 2). In general, the value of the conductivity rates is reduced in O-DRASTIC. The spatial distribution of conductivity ranges shows slight changes. The weights (W) of parameters of O-DRASTIC are shown in Table 4, compared with the original DRASTIC weights. Results of O-DRASTIC reveal that Depth to water and Conductivity have no a significant impact on vulnerability for our case study area. The ranges of parameter C in O-DRASTIC also support this conclusion given that the new classification of this parameter reduces the rate assigned to conductivity values. This conclusion is confirmed by the single-parameter sensitivity analysis of O-DRASTIC, in which the parameter C takes a mean effective weight of 1.7% (compared to its empirical weight 4.3%). In general, in the study area, the carbonate aquifers have a scarcely developed karst. In fact, approximately half the study area shows lower rates in the conductivity map. Although conductivity takes high values in some zones, the ranges of this parameter change in O-DRASTIC with respect to the original DRASTIC. The new classification in O-DRASTIC assigns lower rates to the conductivity values. Therefore, this parameter losses influence, which is also reflected in the weight. Moreover, conductivity can be considered as implicit in the aquifer media parameter (A), which increased in weight in O-DRASTIC in our case study. The reduced weight of parameter D may be due to the large depth to water table over most of the study area. The mean effective weight obtained in the sensitivity analysis was 3.2% (empirical weight = 4.3%) Aquifer media and Soil media gained great significance in the vulnerability assessment, as well as Recharge, albeit by a reduced amount. Impact of vadose zone continues to be an important factor in O-DRASTIC.
The weights in O-DRASTIC are consistent with the concept of COP methodology, where the vulnerability assessment is based mainly on the degree of protection afforded by overlying layers and the way in which the water percolates (recharges) into the aquifer.
The main changes between the original and optimum DRASTIC vulnerability maps mostly occur in areas of "Very low" and "Moderate" vulnerability of the original DRASTIC, since these are the predominant classes in this vulnerability map. Remarkably, those changes do not always occur in one direction. "Moderate" vulnerability shifts towards "Low" or "Very high" vulnerability depending on the zone, whereas other "Moderate" vulnerability areas remain with the same class. In the same way, some "Very low" vulnerability zones jump up a vulnerability class to "Low", while other maintain the same class. We analyzed the differences in the distribution of the parameter rates in these areas. Figure 6a shows that the main factor causing the vulnerability to change from "Moderate" to "Low" is Depth to water. Since the weight of this parameter changed from five to one in O-DRASTIC, zones with higher rates of D report a greater fall in the vulnerability value, leading the vulnerability class to drop by one level. This graph also shows how some areas with rates of three, five and seven change from "Moderate" to "High" vulnerability, which demonstrates that other parameters influence the "Very high" vulnerability class.
On the other hand, we can observe in Figure 6a that the rate of parameter S (soil media) is equal to eight over more than 80% of the area where vulnerability increased from "Moderate" to "Very high", corresponding with soils with a high organic content, and outcrops of limestone and dolomites. Furthermore, the rate of S is equal to four over nearly 90% of the area where vulnerability fell from "Moderate" to "Low". Zones where no change in vulnerability class was observed have soils with medium values.
A similar conclusion regarding Soil media is drawn in the areas where vulnerability rose from "Very low" to "Low" (Figure 6b). In these zones we can also observe higher rates of parameters A (aquifer media) and I (impact of vadose zone). These zones correspond to karstified areas and highly permeable layers.
The results highlight the influence of Aquifer media, Soil media and Impact of vadose zone in the vulnerability of carbonate aquifers.
These conclusions are supported by the single-parameter sensitivity analysis carried out for O-DRASTIC, which yields higher effective weights in parameters A, S and I and lower effective weights for parameters D and C. On the other hand, we can observe in Figure 6a that the rate of parameter S (soil media) is equal to eight over more than 80% of the area where vulnerability increased from "Moderate" to "Very high", corresponding with soils with a high organic content, and outcrops of limestone and dolomites. Furthermore, the rate of S is equal to four over nearly 90% of the area where vulnerability fell from "Moderate" to "Low". Zones where no change in vulnerability class was observed have soils with medium values.
A similar conclusion regarding Soil media is drawn in the areas where vulnerability rose from "Very low" to "Low" (Figure 6b). In these zones we can also observe higher rates of parameters A (aquifer media) and I (impact of vadose zone). These zones correspond to karstified areas and highly permeable layers.
The results highlight the influence of Aquifer media, Soil media and Impact of vadose zone in the vulnerability of carbonate aquifers.  (Figure 7b). The vulnerability classes in O-DRASTIC overlap with COP over 76.75% of the basin, representing an improvement of 20% relative to the original DRASTIC. This spatial coincidence is particularly improved in the class of "Very high" vulnerability, with a 90% coincidence between COP and O-DRASTIC.
• "Moderate": 130-138; • "High": 138-146; • "Very high": ≥146; Figure 7a shows a better distribution of vulnerability values in O-DRASTIC within the COP vulnerability classes (compared to the original DRASTIC). Close similarities can also be appreciated between the O-DRASTIC and COP vulnerability maps (Figure 7b). The vulnerability classes in O-DRASTIC overlap with COP over 76.75% of the basin, representing an improvement of 20% relative to the original DRASTIC. This spatial coincidence is particularly improved in the class of "Very high" vulnerability, with a 90% coincidence between COP and O-DRASTIC.  O-DRASTIC was validated using the pollution index (Equation (1)) and the correlation with nitrate concentration (R 2 = 0.653) in carbonate aquifers improved with respect to the original DRASTIC).
Lastly, we assessed the vulnerability of the three detrital aquifers in the Upper Guadiana Basin by applying original DRASTIC and O-DRASTIC, and we validated the maps following Equation (1). Original DRASTIC showed a good correlation (R 2 = 0.765) in detrital aquifers but O-DRASTIC gave a significantly better correlation (R 2 = 0.862). Both methods (original DRASTIC and O-DRASTIC) perform a better vulnerability assessment in detrital aquifers than in carbonate aquifers. classes. This area is characterized by a higher recharge rate and a karstified aquifer. In the mid-basin (including Mancha Occidental I, Mancha Occidental II, Consuegra Villacañas and Rus-Valdelobos), vulnerability decreases generally by one class, although most of the area remains unchanged; in the northern zone (Lillo-Quintanar, Sierra de Altomira and La Obispalía) only small areas jump up a vulnerability class, and by only one level. The largest difference in vulnerability class assigned is for carbonate aquifers. More than 17% of the carbonate aquifers area rises by two classes or more vulnerability classes. Only 7% of the detrital aquifers area jumps by this much, while 62% of the detrital aquifers show no change in the vulnerability class.

Discussion and Conclusions
This paper demonstrates that the DRASTIC method can be adapted to assess vulnerability in carbonate aquifers by undertaking some simple modifications of the weights and ranges of the parameters. Most recent studies to adapt the DRASTIC method to carbonate aquifers have aimed to transform DRASTIC by including and/or removing parameters that take the karst characteristics into account [12,28,29]. Other studies modified the classic assessment according to nitrate concentrations [27,30], though the most highly contaminated groundwater does not always imply higher vulnerability [31,32]. Instead of using either of these previous approaches, our study establishes a The largest difference in vulnerability class assigned is for carbonate aquifers. More than 17% of the carbonate aquifers area rises by two classes or more vulnerability classes. Only 7% of the detrital aquifers area jumps by this much, while 62% of the detrital aquifers show no change in the vulnerability class.

Discussion and Conclusions
This paper demonstrates that the DRASTIC method can be adapted to assess vulnerability in carbonate aquifers by undertaking some simple modifications of the weights and ranges of the parameters. Most recent studies to adapt the DRASTIC method to carbonate aquifers have aimed to transform DRASTIC by including and/or removing parameters that take the karst characteristics into account [12,28,29]. Other studies modified the classic assessment according to nitrate concentrations [27,30], though the most highly contaminated groundwater does not always imply higher vulnerability [31,32]. Instead of using either of these previous approaches, our study establishes a correspondence between DRASTIC and a vulnerability method that was specifically developed for karstic aquifers, the COP method, and therefore we avoid making conceptual changes in the original definition of DRASTIC method.
Our methodology is based on an optimization approach that identifies the ranges and weights of DRASTIC parameters that minimize the differences in the vulnerability assessment compared with the COP method, which was developed specifically for karstic aquifers. It allows us to identify the significance of the different DRASTIC parameters in the vulnerability assessment in karstic aquifers. Decision trees and spatial statistics analyses are combined to identify the ranges and weights of parameters that provide the optimal DRASTIC classes in terms of coincidence and distance with the COP vulnerability map. This optimization approach could be applied by minimizing the differences with respect to any other reference method for assessing vulnerability, whose results has been previously validated and considered as reasonable. Although many data mining techniques have been applied in groundwater vulnerability assessments [39,40,42], only a few studies have used decision trees in vulnerability studies [50,51]. They have been mainly used to assess other groundwater quality problems [65,67,68].
Our proposed method was applied in the Upper Guadiana Basin, an agricultural area that overlies carbonate and detrital aquifers. The socio-economic and hydrogeological particularities of the basin highlight the need to establish unified management measures at basin scale, not only regarding groundwater exploitation but also in terms of protecting the good quality of the groundwater resource. Therefore, harmonization of criteria to assess groundwater vulnerability would allow comparison at basin scale and overcome the issue of dissonant results provided by contrasting vulnerability methods. Our approach assumes that the COP method provides a better approximation of the vulnerability in the case study. This assumption was tested by means of a validation analysis in which we show that the "protection-against-pollution" index (derived from COP and LU data) more closely correlates (R 2 = 0.78) with nitrate concentration than the DRASTIC "pollution index". The results of the validation show that the pollution index derived from the original DRASTIC for carbonate aquifers is not correlated with nitrates (R 2 = 0.04), whereas in detrital aquifers there is a close correlation ((R 2 = 0.76). Other authors have previously pointed out that the original DRASTIC method does not significantly correlate with nitrate concentration in agricultural areas [45,59,69]. In contrast, the pollution index derived from O-DRASTIC shows significant correlation with nitrates for both, carbonate and detrital aquifers (R 2 = 0.65 and R 2 = 0.86, respectively).
The optimal solution (O-DRASTIC) obtained in this optimization problem shows that changing the range of the Conductivity parameter produces a small drop in the vulnerability class compared with the original DRASTIC but does not lead to a significant improvement in the objective functions (coincidence and distance between vulnerability classes). The reduced significance of Conductivity is also confirmed by the much-reduced weight assigned to this parameter in O-DRASTIC. Other DRASTIC adaptation for carbonate aquifers also pointed the reduced significance of Conductivity to assess the potential "protection-against-pollution" in karstic systems [12]. We also observed a reduced significance of the Depth to water table in our case study (WD = 1). Other research studies that modified the weights of DRASTIC parameters [3,69,70] also found the Depth to water parameter to be insignificant. Specifically in karstic aquifers Depth to water is not so relevant in protecting an aquifer from contamination because of the high transit velocity through the vadose zone [12]. Moreover, the low significance of in our case study may be due to fact that the water table lies below 30 m over most of the basin. The same argument was also stated in another case study [3]. The reduced significance of Depth to water and Conductivity is confirmed by a sensitivity analysis. Topography and Impact of vadose zone maintain their weights (W T and W I ) as defined in the original DRASTIC, while the remaining parameters (Recharge, Aquifer media and Soil media) are given maximum weights (W R = W A = W S = 5). The large increments in the weights given to Aquifer media and Soil media in O-DRASTIC show that they are the most significant factors controlling the vulnerability in karstic aquifers. Other authors concur that these parameters are the most significant [3,70,71]. The principal change in O-DRASTIC with respect to the original DRASTIC is in the weights of parameters, which highlights that these parameters embrace most of the uncertainty in the DRASTIC vulnerability assessment [72].
Our optimal solution provides an improvement of 20% in terms of coincidence between the vulnerability classes assigned by DRASTIC and COP. This improvement was achieved by applying spatial statistical analyses and decision trees, which allowed potential solutions to be obtained by exploring only a 0.1% of the total dimensionality of the defined optimization problem. The proposed method also helps to achieve a better understanding of the parameters and variables of the "equivalent detrital approach" that really influence vulnerability in this karstic system. This optimal solution was tested for the carbonate and detrital aquifers in our case study, but it should also be tested in other different aquifers with similar hydrogeological characteristics in order to prove its applicability in a broader context under different management framework.
In summary, results show that COP and O-DRASTIC report higher vulnerability classes than the original DRASTIC method over 36% of the total area overlying carbonate aquifers. The greatest differences between the original DRASTIC and O-DRASTIC are produced for the carbonate aquifers rather than the detrital aquifers. This confirms that the reliability of DRASTIC vulnerability assessment is significantly better for detrital aquifers than for karst aquifers. These underestimations of vulnerability in karstic aquifers when applying the classic DRASTIC is due to the physical particularities of these aquifers and their greater sensitivity to pollution [1,12,31].

Hypotheses, Limitations and Future Works
We have demonstrated the applicability of the method in the case of carbonate aquifers where the karst is not highly developed. Its applicability to well-developed karst aquifers also needs to be tested. The main adopted assumptions and limitations of the general methodology adopted are: • The ranges of categorical non-continuous parameters (Aquifer media, Soil media and Impact of vadose zone) are not modified in this optimization procedure. We consider that the Delphi criteria proposed in [9] can be applied to establish the relative significance of each range with respect to potential pollution. • Other algorithms and/or techniques (for example, a Random Forest algorithm) could be employed to achieve the goal in a more efficient way.

•
We have not studied the whole domain of potential solutions, and a wider spectrum of parameter ranges could be tested to find other optimal solutions. Moreover, the optimization procedure provides local optimum solutions.

•
Although decision trees help to reduce the dimensionality of the optimization problem, the methodology involves a large number of calculations, which might handicap extending the method to other case studies.

•
The proposed methodology requires a previous validated vulnerability assessment in the study area in order to optimize the DRASTIC method.
Future work should be developed to verify the applicability of O-DRASTIC in other case studies, including aquifers with different physical (climatic and hydrogeological settings) and management particularities in order to withdrawal more generalized conclusions.

Appendix A. DRASTIC and COP Methods
Appendix A.1. DRASTIC Method DRASTIC method was developed by [9] to assess intrinsic groundwater vulnerability in any type of aquifer.
This method considers that there are seven parameters/variables influencing the vulnerability to contamination: Depth to water table (D), Net recharge (R), Aquifer media (A), Soil media (S), Topography (T), Impact of vadose zone (I) and Hydraulic conductivity (C). A rate of importance is assigned to the parameters according to the value or characteristics of each parameter (Table A1).
The values of the parameters are weighted to obtain the DRASTIC index, which is calculated following Equation (A1): The DRASTIC index was originally classified into eight vulnerability levels according to some color codes [9] (Table A2) although it is usually grouped into five vulnerability classes that do not match with the original proposal in [9].  The COP method was developed by [10] to assess intrinsic groundwater vulnerability in carbonate aquifers.
This method considers the properties of layers overlying the water table (O factor), the concentration of flow (C factor) and precipitation (P factor) as the main parameters influencing groundwater vulnerability in carbonate aquifers. The concept of this method is to assess the natural protection of groundwater determined by the overlying soils and the unsaturated zone, which may be modified by the infiltration process and climatic conditions. Each factor is divided into subfactors, whose formulation is detailed explained in [10]. The factors are classified in ranges and the COP index calculated as the product of the three factors following the Equation (A2): COP index = C × O × P (A2) The ranges of factors and the COP index are shown in Table A3.   Classification of each lithology according to COP methodology and determination of thickness of vadose zone from 3D flow model. P

PQ-Precipitation quantity
Precipitation data from SPAIN02 [73] in the grid within the Upper Guadiana Basin (mean rainfall taking into account data above 0.5 mm/day).
Reclassification of the precipitation values into the PQ subfactor values, taking into account the average rainfall in the wet years. Precipitation series from SPAIN02 between 1974 and 2015 were used to extract the mean annual precipitation for wet years.

PI-Temporal distribution
Precipitation data from SPAIN02 (number of rainy days in the grid within the Upper Guadiana Basin).
Counting of the number of rainy days above 0.5 mm for each cell in the SPAIN02 grid. For the estimate of the rainy days per year, meteorological historical series between 1974 and 2015 from SPAIN02 were analysed.