Article

A Data-Driven Method for Determining DRASTIC Weights to Assess Groundwater Vulnerability to Nitrate: Application in the Lake Baiyangdian Watershed, North China Plain

1
Institute of Geographical Sciences, Hebei Academy of Sciences, Shijiazhuang 050011, China
2
Hebei Technology Innovation Center for Geographic Information Application, Shijiazhuang 050011, China
3
Hebei Province Collaborative Innovation Center for Sustainable Utilization of Water Resources and Optimization of Industrial Structure, Hebei GEO University, Shijiazhuang 050031, China
4
Key Laboratory of Agricultural Water Resources, Hebei Key Laboratory of Agricultural Water-Saving, Center for Agricultural Resources Research, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Shijiazhuang 050001, China
*
Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(5), 2866; https://doi.org/10.3390/app15052866
Submission received: 17 January 2025 / Revised: 25 February 2025 / Accepted: 3 March 2025 / Published: 6 March 2025

Abstract

Nitrate pollution from agricultural activities challenges the management of groundwater resources. DRASTIC is the most popular technique for groundwater vulnerability assessment, but the subjectivity it introduces has long been questioned, and the determination of the rating scores and weights of its parameters has become the main difficulty in DRASTIC applications. In this paper, a new data-driven weighting method based on the Monte Carlo method or a genetic algorithm was developed. The new method considers both single factors and the relationships among factors, overcomes the subjectivity of weight determination, and is theoretically applicable to various hydrogeological environments as a general weight determination method. In addition, a new method for verifying the evaluation results on a temporal scale was established, which is based on changes in the nitrate concentration over the past 20 years. To verify and test these methods, they were used to evaluate groundwater vulnerability to nitrate in the plain area of the Baiyangdian watershed in the North China Plain and compared with other commonly used methods. The Pearson correlation coefficient increased by 15%. From a time perspective, the changes in nitrate concentration confirmed that the correctness of the assessment is 88%. In this study, revising the rating ranges markedly improved the evaluation results. Future work should therefore focus on determining the rating ranges and their rating scores, and on whether the corresponding data-driven weights yield more reliable results.

1. Introduction

Groundwater is an important natural resource that is used as a drinking water supply and for industrial and agricultural production, especially in areas with scarce surface water resources. Because of significant agricultural production activities and population growth, the vulnerability of groundwater is constantly increasing. Generally, groundwater vulnerability assessment is an effective and feasible tool for groundwater pollution prevention and control. Groundwater vulnerability mapping can be used to evaluate the groundwater sensitivity to contamination by distinguishing vulnerability extents and regions. The assessment results will help managers in developing sound plans for socioeconomic activities. They also provide recommendations for the mitigation or control of pollution in combination with vulnerability control factors [1,2,3,4,5].
Existing vulnerability assessment methods can be divided into three categories: (1) process-based methods, (2) statistical models, and (3) overlay and index methods. In process-based methods, groundwater flow and contaminant transport are solved numerically to calculate the vulnerability distribution. Because these methods are complex and require large datasets, they are primarily applicable to small areas. Statistical models use either simple statistics of the concentration of specific contaminants or sophisticated regression analyses that simultaneously incorporate the effects of several explanatory variables. They generally express vulnerability as the probability of occurrence of a pollution event [6], which the user then uses to interpret the results. Overlay and index methods superpose several indexes for different hydrogeological parameters to calculate composite indexes reflecting the degree of vulnerability. They are widely used for vulnerability assessment because they are easy to use, require minimal data, and explain groundwater vulnerability clearly and reliably [7,8,9]. The DRASTIC model is one of the most widely used overlay and index methods [10,11].
The DRASTIC model is composed of seven hydrogeological parameters: depth to water table (D), net recharge (R), aquifer media (A), soil media (S), topography (T), impact of the vadose zone (I), and hydraulic conductivity (C). Each of the parameters is divided into ranges with a rating score (1–10) and assigned a weight (1–5) according to its relative importance [11]. Subsequently, the DRASTIC index is computed using Equation (1):
$$DI = D_r D_w + R_r R_w + A_r A_w + S_r S_w + T_r T_w + I_r I_w + C_r C_w, \tag{1}$$
where DI is the DRASTIC index, D, R, A, S, T, I, and C are the hydrogeological parameters, and the subscripts r and w represent the rating score and weight of each parameter, respectively. The higher the DRASTIC index is, the greater is the vulnerability. The DRASTIC model is designed for two cases, that is, common and pesticide cases. The former is used for intrinsic vulnerability and the latter is used for specific vulnerability. The pesticide DRASTIC assigns higher weights to the soil media and topography to account for the natural attenuation of contaminants.
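As a minimal sketch of Equation (1), the index of one raster cell is a weighted sum of its seven rating scores. The weight dictionaries below are the values commonly tabulated for Aller's common and pesticide variants [11]; the rating scores of the example cell and the function name `drastic_index` are hypothetical.

```python
# Weights commonly tabulated for the common and pesticide DRASTIC variants [11].
COMMON_WEIGHTS = {"D": 5, "R": 4, "A": 3, "S": 2, "T": 1, "I": 5, "C": 3}
PESTICIDE_WEIGHTS = {"D": 5, "R": 4, "A": 3, "S": 5, "T": 3, "I": 4, "C": 2}


def drastic_index(ratings, weights):
    """Return the weighted sum of rating scores (Equation (1)) for one cell."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing rating scores for: {sorted(missing)}")
    return sum(ratings[p] * weights[p] for p in weights)


# Hypothetical rating scores (1-10) for a single raster cell.
cell = {"D": 7, "R": 6, "A": 8, "S": 6, "T": 10, "I": 8, "C": 4}
di_common = drastic_index(cell, COMMON_WEIGHTS)        # 157
di_pesticide = drastic_index(cell, PESTICIDE_WEIGHTS)  # 183
```

The same cell scores higher in the pesticide case because the soil media and topography ratings carry more weight there.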
Despite its popularity, the application of the DRASTIC method has been questioned. Criticism focuses on parameter selection and on the reliance on expert judgment in determining the weights and rating scores for the parameters. In some studies, depending on the specific hydrogeological setting of the study area, the DRASTIC method has been improved by removing factors [12,13,14] or including additional factors [15,16,17,18,19] and by employing the Analytic Hierarchy Process (AHP) [20,21] or statistical techniques [22,23,24,25] to determine the weights and rating scores.
Improvements of the DRASTIC model mainly aim to overcome the subjectivity of expert judgment. Several approaches have been proposed for the determination of rating scores, such as AHP, statistical techniques, and artificial intelligence methods [26]. Among these approaches, a method based on simple statistical procedures is prominent [12,22]. This method revises the rating score of each parameter according to the mean nitrate concentration of every range defined in Aller’s DRASTIC model. It is simple, and it reclassifies the ranges into groups with statistically different concentrations based on the Wilcoxon Rank Sum Test. In previous studies, the modification of the DRASTIC method generally included the following procedures: (1) modification of the rating scores for each parameter, (2) modification of the factor weights, and (3) addition and removal of parameters based on the correlation between the parameter and the contaminant concentration [6]. However, the revisions of the rating scores and weights are interrelated. The modification of the rating scores will inevitably affect the modification of the weights. If the rating scores have been determined, the corresponding weights have theoretically also been determined. At present, the weights are determined either by averaging the heterogeneity of hydrogeological parameters or by considering the correlation between a single parameter and the concentration of the pollutant. Both approaches ignore the correlation between parameters and are therefore not necessarily applicable to various hydrogeological settings.
The accuracy of vulnerability mapping is also important for the evaluation of the effectiveness of the assessment methods. In previous studies, correlation coefficients between the degree of vulnerability and concentration of specific contaminants have been used to verify vulnerability maps [2,12]. Another method is based on the difference between groundwater vulnerability and nitrate concentration levels [27]. Both methods can be used to effectively verify the consistency of vulnerability assessment results with the current pollution situation. In other words, these methods are suitable for the verification of static vulnerability assessment results. In several studies, it has been pointed out that the time trend should be considered for the groundwater vulnerability [28,29,30,31]. However, from a dynamic point of view, it needs to be verified whether the vulnerability assessment results reflect long-term changes in the pollution conditions.
In this paper, a novel data-driven weight determination method is developed, which is based on the correlation between the parameters and the pollutant concentration and uses the Monte Carlo (MC) method or a genetic algorithm (GA). A new method for verifying the feasibility and accuracy of the groundwater-specific vulnerability on a temporal scale is also proposed. These methods are introduced and compared with DRASTIC’s common, pesticide, and modified versions with respect to the assessment of the groundwater vulnerability for the Lake Baiyangdian watershed in the North China Plain (NCP). The objectives of this study were twofold: (1) to identify a common weighting methodology that is both objective and applicable to a wide range of hydrogeological situations, and (2) to propose a procedure that can verify whether the results of the vulnerability assessment reflect long-term changes in the state of contamination.

2. Methodology

2.1. Modification of the DRASTIC Model

Aller’s DRASTIC model considers seven hydrogeological parameters. It determines the rating scores for the aquifer media, soil media, and vadose zone based on their types, which represent qualitative data. Based on the research of Zhong [32], explanation of soil media by Aller et al. [11], and effect of different geomorphic units on the nitrate concentration reflected by statistical results [33], these parameters were replaced by the aquifer thickness, soil infiltration coefficient, and mean nitrate concentration of samples belonging to various geomorphic units, respectively, in this paper. In addition, land use (L) was used as a parameter to obtain more precise results [33]. In brief, eight parameters were used in this study for specific vulnerability assessment. The vulnerability indexes (VI) were evaluated using Equation (2):
$$VI = DI + L_r L_w, \tag{2}$$
where DI is the DRASTIC index calculated according to Equation (1), L is the land use and the subscripts r and w represent the rating score and the weight of the land use, respectively. Each input parameter and the output result are expressed in the form of a raster layer. Although the DRASTIC model expressed by Equation (2) is different from Aller’s DRASTIC model, the rating scores and weights are the same as those in Aller’s model. These values are referred to as “Aller’s common and pesticide DRASTIC models” in this paper and are intended to be distinguished from modified models that use simple statistics and different weighting techniques to obtain rating scores and weights.

2.2. Revision of Rating Scores and Weights

The rating score for each parameter was calculated based on the data obtained from the Wilcoxon Rank Sum Test. The weights of parameters were determined with the newly developed method. The subsequent section is dedicated to the description of the prevailing weight modification methods that are currently available, with the objective of facilitating meaningful comparisons with data-driven methods.

2.2.1. Common Weighting Methods

Currently, popular techniques for the determination of weights include single-parameter sensitivity analysis (SPSA), correlation analysis [22], and logistic regression analysis [26].
The SPSA technique was used to evaluate the influence of the rating and weighting values assigned to each parameter and to judge the significance of subjectivity elements [34]. Today, it is being used as a weighting technique [26]. It might be the most commonly used technique for the determination of DRASTIC parameter weights. Based on this approach, the modified weight for parameter j can be calculated with Equation (3):
$$\bar{w}_j = \frac{1}{n}\sum_{i=1}^{n}\frac{W_j \times R_{ij}}{V_i}, \tag{3}$$
where $\bar{w}_j$ and $W_j$ are parameter j’s modified and original weights, respectively, $R_{ij}$ is the rating score of parameter j in the ith subarea of the corresponding input parameter layer, $V_i$ is the vulnerability index in the ith subarea of the output result layer, and n is the number of subareas.
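The averaging in Equation (3) can be sketched as follows. The snippet assumes that the vulnerability index of each subarea is the weighted sum of the same rating scores and original weights; the function name `effective_weights` and the two-parameter toy data are hypothetical.

```python
def effective_weights(rating_rows, weights):
    """Average, over all subareas, each parameter's share of the vulnerability index.

    rating_rows: one list of rating scores per subarea (same order as weights).
    weights: original weights W_j.
    """
    n = len(rating_rows)
    totals = [0.0] * len(weights)
    for row in rating_rows:
        v = sum(r * w for r, w in zip(row, weights))  # vulnerability index V_i
        for j, (r, w) in enumerate(zip(row, weights)):
            totals[j] += w * r / v
    return [t / n for t in totals]


# Two subareas, two parameters (toy values).
rows = [[10, 5], [2, 8]]
original = [5, 1]
eff = effective_weights(rows, original)
```

Under this assumption the shares of one subarea sum to one, so the modified weights also sum to one and must be rescaled before they are compared with the original 1–5 weights.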
Correlation analysis can also be used to revise the weight of each parameter [22]. In this method, the quantitative correlation between the rating scores of every parameter and the nitrate concentration is computed. Because the rating scores of the parameters vary on an interval scale, the quantitative correlation is expressed using Spearman’s ρ and Kendall’s τ correlation coefficients. The results are assumed to be the modified weights; they are called CA in this paper.
Logistic regression has been successfully applied to evaluate groundwater’s vulnerability to contamination [6,12,35,36]. Unlike the DRASTIC model, this method provides a probability map to depict aquifer vulnerability, which has also been used to determine the weights of the parameters [25]. Two steps are necessary to implement this approach: (1) the nitrate concentration is reclassified into two classes (0 for a concentration that is below a predefined threshold, and 1 for others), (2) the coefficients of the logistic regression equation are calculated. The logistic regression equation can then be written as Equation (4) [25]:
$$P = \frac{e^{b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_j X_j}}{1 + e^{b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_j X_j}}, \tag{4}$$
where P is the probability of a nitrate concentration over the given threshold, $X_j$ is the rating score of parameter j, and $b_j$ is the coefficient for the corresponding parameter. These coefficients can be estimated by maximum likelihood estimation. The log-likelihood of the regression model is formulated as Equation (5) [25]:
$$l = \sum_{i=1}^{n}\left[y_i \ln P_i + (1 - y_i)\ln(1 - P_i)\right], \tag{5}$$
where l is the log-likelihood, $y_i$ is the binary-coded value (0 or 1) for the nitrate concentration of sample i, $P_i$ is the predicted probability based on Equation (4), and n is the number of samples. Because $P_i \in (0, 1)$, both $\ln P_i$ and $\ln(1 - P_i)$ are negative, so l is also negative. Lastly, the fitted coefficients are taken as the corresponding weights.
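The two expressions above can be evaluated in a few lines. This is only a sketch of the objective that a maximum likelihood routine would maximize, not a fitting procedure; the rating scores, labels, and coefficient values are hypothetical.

```python
import math


def predict_probability(coeffs, x):
    """Equation (4): coeffs = [b0, b1, ..., bj], x = [X1, ..., Xj]."""
    z = coeffs[0] + sum(b * xi for b, xi in zip(coeffs[1:], x))
    return 1.0 / (1.0 + math.exp(-z))  # algebraically equal to e^z / (1 + e^z)


def log_likelihood(coeffs, samples, labels):
    """Equation (5): sum of y*ln(P) + (1 - y)*ln(1 - P) over all samples."""
    total = 0.0
    for x, y in zip(samples, labels):
        p = predict_probability(coeffs, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Toy rating scores for two parameters; label 1 means nitrate above the threshold.
samples = [[1, 2], [2, 1], [8, 9], [9, 8]]
labels = [0, 0, 1, 1]
ll_null = log_likelihood([0.0, 0.0, 0.0], samples, labels)   # every P_i = 0.5
ll_fit = log_likelihood([-5.0, 0.5, 0.5], samples, labels)   # closer to zero
```

Both values are negative, and the coefficient set that separates the two classes gives the larger (less negative) log-likelihood.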

2.2.2. Modification of the Weights Based on the Data-Driven Method

Many studies used the correlation between the hydrogeological parameter and nitrate concentration to determine weight [3,12,22,25]. The mechanism of weighting based on correlation analysis and logistic regression is the following: if there is a stronger correlation between the parameter and nitrate concentration, the parameter will have a stronger weight. Thus, the correlation between the vulnerability indexes and nitrate concentration is enhanced and a more accurate result is obtained.
The problem of optimization of the parameter weights can be formulated into a mathematical problem: the vulnerability index is a linear combination of parameters, and the weights are unknown coefficients between zero and five, as defined by the common DRASTIC model, which can be optimized to maximize the correlation coefficient. A weight of zero means that the parameter should be excluded. In this study, Pearson’s ρ correlation coefficient was used to represent correlations and to optimize weights (other correlation coefficients are also suitable criteria). The application of Pearson’s ρ correlation assumes a normal distribution of the nitrate concentration, which is not satisfied based on the dataset of this study area. A logarithmic transformation of the nitrate concentration depicted by Equation (6) was used to satisfy this condition:
$$C_n' = \log(1 + C_n), \tag{6}$$
where $C_n$ and $C_n'$ are the nitrate concentration and the value of the logarithmic transformation of the nitrate concentration, respectively, and n is the number of samples. The problem is expressed in mathematical terms as follows (in this case):
Let $C_{n1}, C_{n2}, \ldots, C_{nk}$ be the values of the logarithmic transformation of the nitrate concentration of all k observations, denoted by $C_N = [C_{n1}\ C_{n2}\ C_{n3}\ \cdots\ C_{nk}]^T$. The matrix A consists of all hydrogeological parameters at the corresponding sample locations:
$$A = \begin{bmatrix} d_1 & r_1 & a_1 & s_1 & t_1 & i_1 & c_1 & l_1 \\ d_2 & r_2 & a_2 & s_2 & t_2 & i_2 & c_2 & l_2 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ d_k & r_k & a_k & s_k & t_k & i_k & c_k & l_k \end{bmatrix},$$
where d, r, a, s, t, i, c, and l are the parameters of the hydrogeological setting and subscripts 1, 2, ..., and k represent the different sample locations. The unknown weights to be solved are denoted by $W = [W_d\ W_r\ W_a\ W_s\ W_t\ W_i\ W_c\ W_l]^T$ and the vulnerability indexes evaluated by DRASTIC at these sample locations are denoted by $V = [V_1\ V_2\ V_3\ \cdots\ V_k]^T$, which can be calculated by matrix multiplication, that is, $V = A \times W$. The optimal weights are the solution of Equation (7):
$$\text{maximize}\quad \rho_{C_N,V} = \frac{\mathrm{COV}(C_N, V)}{\sigma_{C_N}\,\sigma_V} \quad \text{subject to}\quad W_i \in [0, 5],\ i = d, r, \ldots, l, \tag{7}$$
where $\rho_{C_N,V}$ is the Pearson correlation coefficient, $\mathrm{COV}(C_N, V)$ is the covariance of $C_N$ (nitrate concentrations) and V (vulnerability indexes), $\sigma_{C_N}$ and $\sigma_V$ are the standard deviations of the two variables, and $W_i$ is the weight of parameter i, where i = d, r, …, l denotes the parameters of the hydrogeological setting.
Equation (7) does not only consider the relationship between the vulnerability indexes and nitrate concentration, but also the effects between the parameters, thus making up for the shortcomings caused by neglecting the relationship between the parameters. The MC and GA methods can be used to solve this equation and are described in the following subsections.
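The objective of Equation (7) can be written as a small function of the weight vector. The sketch below is a plain-Python illustration; the function names and the toy data are hypothetical, and returning 0 for a constant series is a convention chosen here, not part of the method. The base of the logarithm in Equation (6) does not affect the result, because Pearson's coefficient is unchanged by positive rescaling.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant (a convention)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def objective(weights, rating_matrix, nitrate):
    """Correlation between log(1 + nitrate) and V = A x W for one weight vector."""
    log_nitrate = [math.log(1.0 + c) for c in nitrate]          # Equation (6)
    v = [sum(r * w for r, w in zip(row, weights)) for row in rating_matrix]
    return pearson(log_nitrate, v)


# Toy data: the first column equals log(1 + nitrate), the second is unrelated.
nitrate = [math.exp(k) - 1.0 for k in (1, 2, 3, 4)]
rating_matrix = [[1, 3], [2, 1], [3, 4], [4, 2]]
rho_first_only = objective([1.0, 0.0], rating_matrix, nitrate)
rho_equal = objective([1.0, 1.0], rating_matrix, nitrate)
```

Setting the weight of the uninformative column to zero yields a correlation of essentially one, which illustrates why a weight of zero amounts to excluding a parameter.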

2.2.3. Monte Carlo Method

The essential idea of the MC method is the use of randomness to obtain numerical results of problems that might be deterministic in principle [37]. It is often used for the optimization of mathematical problems [38] and is most useful when it is difficult or impossible to use other approaches. Generally, the MC method follows a pattern: (1) define a domain of possible inputs, (2) randomly generate inputs from a probability distribution over the domain, (3) perform a deterministic input computation, and (4) aggregate the results.
In this study, one million random weight vectors were produced and each component of every vector was uniformly distributed across the range of 0–5. One million Pearson’s ρ correlation coefficients were calculated; then, the maximum coefficient was selected, and the corresponding weight vector was regarded as the solution.
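A minimal sketch of this random search, using only the Python standard library. The data are hypothetical toy values with two parameters instead of eight, and the example call uses far fewer draws than the one million reported above so that it runs quickly.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant (a convention)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def monte_carlo_weights(ratings, log_nitrate, n_draws=1_000_000, seed=0):
    """Draw weight vectors uniformly from [0, 5] and keep the best-correlated one."""
    rng = random.Random(seed)
    n_params = len(ratings[0])
    best_w, best_rho = None, -math.inf
    for _ in range(n_draws):
        w = [rng.uniform(0.0, 5.0) for _ in range(n_params)]
        v = [sum(r * wi for r, wi in zip(row, w)) for row in ratings]
        rho = pearson(log_nitrate, v)
        if rho > best_rho:
            best_w, best_rho = w, rho
    return best_w, best_rho


# Toy data: the first parameter tracks log-nitrate, the second does not.
ratings = [[1, 3], [2, 1], [3, 4], [4, 1], [5, 5], [6, 9]]
log_nitrate = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
weights, rho = monte_carlo_weights(ratings, log_nitrate, n_draws=2000)
```

Because Pearson's coefficient does not change when all weights are multiplied by the same positive constant, the search effectively determines the ratios between the weights.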

2.2.4. Genetic Algorithm

The GA is a metaheuristic method, which was inspired by the process of natural selection [39]. It is commonly used to generate high-quality solutions to optimization and search problems by bio-inspired operators such as mutation, crossover, and selection [40]. In the GA, a population of candidate solutions, called individuals, to an optimization problem is evolved toward better solutions [41]. The evolution is an iterative process, which starts from a population of randomly generated individuals and terminates when a maximum number of iterations has been produced or when the specified fitness function converges to a stable value [42].
To obtain the optimum solution, the GA was implemented as follows: (1) One thousand individuals were randomly generated, whereby each individual had eight genes, each encoded as an array of bits and representing a weight in the range of 0–5; (2) The parameters of the algorithm were set, that is, 1000 as the maximum number of generations, 0.8 as the crossover probability, 0.001 as the mutation probability, and 0.0001 as the tolerance for the convergence difference; the selection scheme for mating was the roulette wheel selector; and (3) Pearson’s ρ correlation coefficient was set as the objective function. The best individual of the last generation was the solution, that is, the best weight vector.
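The steps above can be sketched with the standard library alone. Several details are assumptions made for illustration, since the text does not specify them: 10 bits per gene, single-point crossover, bit-flip mutation, elitism, and a `patience` parameter that stops the run after a number of generations without an improvement larger than the tolerance. The toy data and the reduced population size are hypothetical.

```python
import math
import random

BITS = 10  # bits per gene; an assumption, the resolution is not given in the text


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant (a convention)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def decode(chrom, n_params):
    """Map each BITS-long slice of the chromosome to a weight in [0, 5]."""
    weights = []
    for g in range(n_params):
        gene = chrom[g * BITS:(g + 1) * BITS]
        value = int("".join(str(bit) for bit in gene), 2)
        weights.append(5.0 * value / (2 ** BITS - 1))
    return weights


def fitness(chrom, ratings, log_nitrate):
    w = decode(chrom, len(ratings[0]))
    v = [sum(r * wi for r, wi in zip(row, w)) for row in ratings]
    return pearson(log_nitrate, v)


def genetic_weights(ratings, log_nitrate, pop_size=1000, generations=1000,
                    p_cross=0.8, p_mut=0.001, tol=1e-4, patience=20, seed=0):
    rng = random.Random(seed)
    n_params = len(ratings[0])
    length = BITS * n_params
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best_prev, stall = -math.inf, 0
    for _ in range(generations):
        scores = [fitness(c, ratings, log_nitrate) for c in pop]
        best_score = max(scores)
        elite = pop[scores.index(best_score)][:]
        # Convergence test: stop after `patience` generations without a gain > tol.
        stall = stall + 1 if best_score - best_prev < tol else 0
        best_prev = max(best_prev, best_score)
        if stall >= patience:
            break
        # Roulette-wheel selection; fitness is shifted so every weight is positive.
        low = min(scores)
        wheel = [s - low + 1e-9 for s in scores]
        children = [elite]  # elitism (an assumption) keeps the best individual
        while len(children) < pop_size:
            a, b = (p[:] for p in rng.choices(pop, weights=wheel, k=2))
            if rng.random() < p_cross:  # single-point crossover
                cut = rng.randrange(1, length)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for k in range(length):
                    if rng.random() < p_mut:  # bit-flip mutation
                        child[k] ^= 1
                children.append(child)
        pop = children[:pop_size]
    scores = [fitness(c, ratings, log_nitrate) for c in pop]
    best = scores.index(max(scores))
    return decode(pop[best], n_params), scores[best]


# Toy data: the first parameter tracks log-nitrate, the second does not.
ratings = [[1, 3], [2, 1], [3, 4], [4, 1], [5, 5], [6, 9]]
log_nitrate = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
weights, rho = genetic_weights(ratings, log_nitrate, pop_size=60, generations=40)
```

The default arguments mirror the settings listed above; the example call shrinks the population and the number of generations only to keep the run short.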

2.3. Harmonization of Scales

Because the modified rating scores and weights calculated by different methods are on different scales, the vulnerability indexes derived from the DRASTIC rating scores and weights cannot be compared with the vulnerability indexes derived from the new scores and weights. The scales of the modified rating scores and weights must be harmonized to ensure a fair comparison. All modified rating scores and weights were scaled to the range of 1 to 10 or 1 to 5. Equations (8) and (9) were used for this purpose:
$$R_{ij}^{*} = \frac{CN_{i,j}}{\max(CN_{i,j})} \times 10, \tag{8}$$
$$W_i^{*} = \frac{W_i}{\max(W_i)} \times 5, \tag{9}$$
where $R_{ij}^{*}$ is the harmonized rating score of the jth group of the ith parameter, $CN_{i,j}$ is the mean nitrate concentration corresponding to the jth group of the ith parameter, $W_i^{*}$ is the harmonized weight of the ith parameter, and $W_i$ is the weight of the ith parameter calculated using various techniques.
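Both rescalings have the same form, so a single helper is enough for a sketch. The function name `harmonize` and the input values are hypothetical, and the maximum is assumed to be taken over the groups of one parameter for the rating scores and over all parameters for the weights.

```python
def harmonize(values, upper):
    """Rescale positive values so that the largest one equals `upper`."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("the maximum must be positive")
    return [v / peak * upper for v in values]


# Mean nitrate concentration of three rating groups of one parameter (toy values).
mean_nitrate = [12.0, 30.0, 60.0]
# Weights produced by some weighting technique (toy values).
raw_weights = [0.8, 2.0, 4.0]

ratings = harmonize(mean_nitrate, 10)  # Equation (8): about [2.0, 5.0, 10.0]
weights = harmonize(raw_weights, 5)    # Equation (9): about [1.0, 2.5, 5.0]
```

The largest value always maps to the top of the scale; the smallest value reaches 1 only if it is one-tenth (ratings) or one-fifth (weights) of the maximum.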

2.4. Verification of the Vulnerability Assessment Performance

The correlation of the vulnerability indexes based on actual pollution occurrences was used for the validation of the accuracy of groundwater vulnerability maps [12,43]. The Pearson and Spearman correlation coefficients between the vulnerability indexes and nitrate concentration were adopted to verify the assessment results.
The Spearman correlation coefficient is defined as the Pearson correlation coefficient between ranked variables. The vulnerability index Vi and nitrate concentration Cn of the samples are converted into ranks and calculated for the correlation coefficient.
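This definition translates directly into code: rank both variables, then apply the Pearson formula. The sketch assigns tied values their average rank, which is the usual convention and an assumption here; the input values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant (a convention)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relation between index and concentration (toy values).
vulnerability = [1, 2, 3, 4, 5]
nitrate = [1, 4, 9, 16, 100]
rho_pearson = pearson(vulnerability, nitrate)
rho_spearman = spearman(vulnerability, nitrate)
```

For a monotonic but nonlinear relation, the Spearman coefficient equals one while the Pearson coefficient stays below one, which is why both are reported.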
As mentioned above, the Pearson and Spearman correlation coefficients can be used to verify the static vulnerability assessment performance. However, it is necessary to evaluate the vulnerability results from a time perspective. To achieve this goal, the method reported in the work of Stigter et al. [44] was modified. From a temporal perspective, the difference between the groundwater vulnerability level and the level of nitrate concentration changes is a good validation criterion. The vulnerability indexes and nitrate concentration changes are strictly divided into the same number of categories and the difference in the levels is then calculated by computing the absolute difference between the two maps.
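The class-difference comparison can be sketched as follows. The classification rule is an assumption: equal-count classes are used here for brevity, whereas any scheme works as long as both maps are split into the same number of classes. The cell values and function names are hypothetical.

```python
from collections import Counter


def classify(values, n_classes):
    """Assign classes 1..n_classes by equal-count ranking (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, idx in enumerate(order):
        classes[idx] = rank * n_classes // len(values) + 1
    return classes


def class_differences(vulnerability, nitrate_change, n_classes=5):
    """Absolute difference between the two class maps, cell by cell."""
    a = classify(vulnerability, n_classes)
    b = classify(nitrate_change, n_classes)
    return [abs(x - y) for x, y in zip(a, b)]


# Ten cells (toy values): nitrate change either follows or opposes vulnerability.
vi = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
change_same = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
change_opposite = list(reversed(change_same))

agree = Counter(class_differences(vi, change_same))
disagree = Counter(class_differences(vi, change_opposite))
```

A difference of zero means that a cell falls into the same class on both maps; the distribution of the differences summarizes how well the assessment matches the observed changes.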
The two nitrate datasets obtained in the study area (separated by almost 20 years) provide an opportunity to measure the effects of the vulnerability assessment using nitrate concentration changes. Two maps of nitrate concentration in shallow groundwater of the study area can be attained by spatial interpolation and the variations can be calculated by subtraction of the two maps. The magnitude of the changes can be compared with the vulnerability indexes and the consistency of the two graphs is a good measurement criterion. Since there is no obvious trend, the ordinary kriging (OK) was used instead of the universal kriging (UK) to obtain the spatial nitrate concentration map. The spatial interpolation of the nitrate concentration includes the following steps: (1) the nitrate concentration was logarithmically transformed, as shown in Equation (6); (2) kriging interpolation was performed; and (3) the interpolation result was converted to the original form.
The data-driven weighting method proposed in this paper, which is based on the Monte Carlo method or a genetic algorithm, considers both single factors and the relationships between factors. It overcomes the subjectivity of weight determination and is theoretically applicable, as a generalized weight determination method, to a variety of hydrogeological environments.

3. Case Study

3.1. Study Area

The NCP is one of the most important agricultural production areas in China. Owing to the expanded use of inorganic fertilizers since the 1970s [45] and the increase in irrigation [46], the wheat and corn production of the NCP agricultural production system accounted for 61% and 39% of China’s total agricultural production in 2012, respectively [47]. The annual chemical nitrogen fertilizer input was estimated to be as high as 400–600 kg·hm⁻² [48]. Excessive application of nitrogen fertilizers is one of the important factors leading to the increasing nitrate concentration in groundwater in the NCP region. The Lake Baiyangdian watershed is a typical NCP region. The main crops in the region are cotton, vegetables, fruit trees, winter wheat, and summer maize, which are rotated annually. The water deficit is balanced by groundwater pumping. Because of intensive agricultural activities and fertilizer application, nitrate has become the main chemical pollutant input in farmland. To mitigate the increasing nitrate pollution of the groundwater, it is important to assess groundwater vulnerability.
The plain area of the Baiyangdian watershed (16,102.6 km²) is in the western part of the NCP (114°18′45″–116°29′59″ E, 38°01′18″–39°51′27″ N; Figure 1). The climate is semiarid continental monsoon with a perennial mean precipitation of 503 mm; 75% of the rainfall occurs from June to September. The annual evaporation is 1369 mm [49]. The topography of the plain area of the Baiyangdian watershed inclines from west to east and includes a piedmont plain (composed of alluvial fans and plains) and a depression. The entire area is flat and the slope of the terrain only changes on dry riverbeds or gullies. The alluvial fans and paleochannels in the study area were formed by rivers and lake depressions in the Late Pleistocene [50].
Unconsolidated Quaternary sediments constitute the main stratigraphy of the plain area (Supplementary Figures S3 and S4). The sediments include cobbles, gravel, sand, laminated or lensed clay, and interbedded sand and clay. Based on the stratigraphic features, the aquifers can be divided into four aquifer groups: the first and second aquifer groups (I, II), that is, Holocene Qh aquifers with a depth of 10–50 m and upper Pleistocene Q3p aquifers with a depth of 80–160 m; the third aquifer group (III), that is, the mid-Pleistocene Q2p with a depth of 330–400 m; and the fourth aquifer group (IV), that is, the lower Pleistocene Q1p aquifer group [49]. Shallow groundwater, which was selected as the target for the assessment of the vulnerability to nitrate contamination, includes the first and second aquifer groups (I, II). It is the major stratum used for the supply of water for domestic and agricultural use in the area.

3.2. Dataset and Computation

3.2.1. Nitrate Concentration Data Collection

To assess the vulnerability and validate the results, nitrate concentrations in shallow groundwater were taken from a previous study [33], which describes the sampling methodology employed in the study area. The ion balance errors of these samples are less than 5%. The data were gathered from the literature for 168 samples that satisfied the information requirements of the vulnerability assessment. Additionally, nitrate concentration data from the literature for the region in 1998 were used to validate the results of this study (Figure 1).
The spatial distribution of the nitrate concentration was estimated by the OK. The mean error (ME) and kriged root-mean-square error (KRMSE) of the 1998 and 2018 data, which can be used to estimate the performance of the kriging model, are 0 and 0.98 and −0.02 and 1.13, respectively. The closer the ME is to 0 and the closer the KRMSE is to 1, the better the model is [51].

3.2.2. Hydrogeological Parameter Dataset

To obtain the vulnerability indexes, eight hydrogeological setting and land use datasets were prepared (see Supplementary Figures S5–S12). Armengol et al. [24] indicated the use of geostatistics (kriging) as the best alternative for DRASTIC to assess the vulnerabilities based on the amount and type of available data. In this work, the kriging was used to obtain the depth to the water table and aquifer thickness.
The depth to the water table can be calculated as the difference in the elevation of the ground surface and water table. The altitude of the study area is based on the Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) [52], which has a spatial resolution of 1 arc second or 30 m and is highly accurate [53,54]. The water table was interpolated from the water table of 103 shallow wells in the study area (Supplementary Figure S1). For the interpolation, the universal kriging (UK) with a second-order polynomial (on x and y coordinates of the wells) trend surface was used, considering the influence of anisotropy. The ME and the KRMSE are 0.086 and 1.003, respectively. Due to the lack of data close to the boundary of the area, the kriging variance increases in that area and the depth to the water table data near the boundary were replaced with data derived from the “Atlas of Groundwater Sustainable Utilization in the North China Plain” [55]. The water depth varies from 0 to >50 m.
The net recharge data were derived from the work of Cao et al. [56]. The values vary from 0 to 300 mm/a. The rating division slightly differed from that of the original scheme. According to the findings of Aller et al. [11], the rating scores for net recharge exhibited a nearly linear trend. Consequently, the range from 150 to 254 mm was segmented into two sub-ranges and assigned scores of 7 and 8, respectively. The rating scores are listed in Table 1.
Aller’s common DRASTIC model uses the aquifer media type to assess vulnerability, which is a qualitative parameter. Based on the work of Zhong [32] and Kazakis and Voudouris [34], aquifer thickness was used instead of aquifer media in this paper. The data were obtained by subtracting the water table from the bottom elevation of the shallow aquifer. The bottom elevation was derived from UK interpolation of data from 277 boreholes distributed in the study area (Supplementary Figure S2). The UK model shows a second-order trend on the x and y coordinates of the boreholes but is not anisotropic. The ME and the KRMSE are −0.006 and 0.983, respectively. The thickness of the aquifer varies from 0 to >200 m.
The soil, landform, and hydraulic conductivity parameters were all obtained from the “Atlas of Groundwater Sustainable Utilization in the North China Plain” [55]. Soil media affect the transport of water and pollutants from the surface to the aquifer. Although physical processes, chemical reactions, and biodegradation take place in the soil, the rating scores appear to be based on the ability of the contaminants to migrate to the aquifer. Infiltration coefficients were used instead of the soil media type. The infiltration coefficients vary from 0.2 to 0.65. The coefficients were linearly scaled to the range of 1 to 10, which represents the rating scores for soil media (Table 1).
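The linear scaling of the infiltration coefficients can be sketched as a min-max mapping. It is assumed here that the smallest coefficient (0.2) maps to a rating score of 1 and the largest (0.65) to 10; the function name `linear_rating` is hypothetical.

```python
def linear_rating(value, low=0.2, high=0.65, r_min=1.0, r_max=10.0):
    """Map an infiltration coefficient in [low, high] linearly onto [r_min, r_max]."""
    if not low <= value <= high:
        raise ValueError("coefficient outside the range observed in the study area")
    return r_min + (value - low) / (high - low) * (r_max - r_min)


score_low = linear_rating(0.2)     # about 1
score_mid = linear_rating(0.425)   # about 5.5
score_high = linear_rating(0.65)   # about 10
```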
The vadose zone of the study area exhibits one lithology and is mainly composed of clay, silt, and sand, which generally exhibit a range from coarser textures in the piedmont region and in paleochannels to finer textures in plains [56]. The lithological changes in the vadose zone can therefore be characterized by the landform. The study area can be divided into five types of geomorphic units: proluvial fans, alluvial-proluvial fans, alluvial and flood plains, lakes and depressions and river belts (Supplementary Figure S10). The rating scores of the different geomorphic units can be determined by calculating the average nitrate concentration in the samples of different geomorphological units (Table 2).
The hydraulic conductivity varies from 15 to 300 m/d.
Topography refers to the slope of the land surface, which was derived from the DEM. The slope ranges from 0% to >100%.
The impact of human activities on groundwater vulnerability is also reflected in the land use. The land use types were used as a contamination indicator in this study. The land use data were derived from the work of Wang et al. [57] and Min et al. [58] and include six land use types: winter wheat/summer maize, vegetables, orchards, woods, cotton, and residential area (village). The mean nitrate concentration in the groundwater under each land use type was used as the criterion for its rating score. No samples were obtained from the cotton growth area. However, according to the survey, cotton planting uses the least chemical fertilizer, that is, 40% of the amount used for wheat/maize. Therefore, the lowest score (1, the same score as that used for woods) was assigned to the cotton area (Table 2).

4. Results

4.1. Application of the Common and Pesticide DRASTIC Models

The rating scores for the depth to the water table, topography, and hydraulic conductivity agree with the values reported by Aller et al. [11]; the scores for the aquifer were taken from the work of Zhong [32]. The scores for the other parameters are listed in Table 1 and Table 2 and follow the original approach. Because of the characteristics of the study area and the available datasets, these scores differ from those of Aller’s DRASTIC model only for some values. The weights of all parameters, except for land use, are consistent with those of Aller’s common and pesticide DRASTIC models. The importance of land use is considered to be equal to that of the depth to the water table in the study area. Therefore, the weight of land use was set to 5. Table 3 shows the weights of Aller’s common and pesticide DRASTIC models. The vulnerability indexes for Aller’s common and pesticide DRASTIC models are in the ranges of 38–216 and 39–247, respectively. The vulnerability indexes were reclassified into five classes using the natural breaks method and are shown in Figure 2a,b. The Pearson and Spearman correlation coefficients calculated for Aller’s common and pesticide DRASTIC models are 0.513 and 0.478 and 0.502 and 0.457, respectively.

4.2. Optimization of the DRASTIC Model with Respect to the Vulnerability to Nitrate

4.2.1. Revision of the Rating Scores

The revision of the rating scores was carried out using the Wilcoxon Rank Sum Test described in Section 2.2. The parameters, except for landform and land use, represent continuous data. Although land use itself is categorical data, its use as a pollution indicator in this study is also considered to be continuous. For landform, that is, the only noncontinuous data, all categories were maintained regardless of the statistical result. The topography of the different groups, presented as a slope percentage, does not significantly differ, so the original rating ranges were maintained. The scores of each parameter were calculated using Equation (8). Table 4 shows the new range, mean nitrate concentration, and modified rating scores for every factor.
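The statistical comparison of two rating ranges can be sketched as follows. This is a minimal two-sided Wilcoxon rank sum test using the normal approximation without tie correction; the function names and the nitrate samples are hypothetical, and the rule for merging ranges is the one given in Section 2.2, which is not reproduced here.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value
    return z, p


# Hypothetical nitrate samples (mg/L) from two adjacent rating ranges.
range_a = [12.0, 18.5, 22.1, 30.4, 35.0]
range_b = [41.2, 55.0, 63.3, 70.8, 88.1]
z, p = rank_sum_test(range_a, range_b)
print(round(z, 3), round(p, 4))
```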
After the rating scores were revised, the assessment was performed using the new scores and the weights of Aller’s common DRASTIC. The Pearson and Spearman coefficients are 0.647 and 0.643, respectively.
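The two validation coefficients can be computed as in the sketch below. It is a plain-Python illustration: the Spearman coefficient is obtained as the Pearson coefficient of the average ranks, and the vulnerability indexes and nitrate values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical vulnerability indexes and nitrate concentrations (mg/L).
index = [95, 120, 134, 150, 171, 188]
nitrate = [8.0, 15.5, 12.0, 40.2, 55.1, 90.3]
print(round(pearson(index, nitrate), 3), round(spearman(index, nitrate), 3))
```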

4.2.2. Revision of the Weights

To obtain better results, the weights of the parameters must be modified. The modification process and results are as follows.
(1)
Using single-parameter sensitivity analysis
Based on the SPSA, the effective weights of the D, R, A, S, T, I, C, and L factors are 0.17, 0.24, 0.09, 0.05, 0.03, 0.14, 0.11, and 0.16, respectively. Based on the use of Equation (9) to scale the values to the range of 1 to 5, the equivalent weights are 3.61, 5, 1.96, 1.14, 0.7, 2.91, 2.43, and 3.46, respectively. The Pearson and Spearman correlation coefficients calculated using both the modified rating scores and weights are 0.646 and 0.635, respectively.
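The effective weights can be illustrated with a short sketch. It assumes the usual definition in single-parameter sensitivity analysis, in which the effective weight of a parameter in a cell is the share of its rating score times its weight in the vulnerability index of that cell, averaged over all cells. The function name and the cell ratings are hypothetical, and the rescaling by Equation (9) is not reproduced.

```python
def effective_weights(cells, weights):
    """Mean share of each parameter in the vulnerability index over all cells."""
    totals = {name: 0.0 for name in weights}
    for ratings in cells:
        index = sum(weights[name] * ratings[name] for name in weights)
        for name in weights:
            totals[name] += weights[name] * ratings[name] / index
    return {name: totals[name] / len(cells) for name in weights}


# Aller's common weights plus land use (L = 5); ratings are hypothetical.
weights = {"D": 5, "R": 4, "A": 3, "S": 2, "T": 1, "I": 5, "C": 3, "L": 5}
cells = [
    {"D": 7, "R": 8, "A": 4, "S": 3, "T": 10, "I": 5, "C": 6, "L": 9},
    {"D": 3, "R": 8, "A": 6, "S": 5, "T": 10, "I": 4, "C": 2, "L": 4},
]
for name, share in effective_weights(cells, weights).items():
    print(name, round(share, 3))
```

By construction the shares sum to 1, which agrees with the effective weights reported above (they sum to 0.99 after rounding).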
(2)
Using correlation analysis
Table 5 lists the results of the correlation analysis, which includes the Spearman ρ and Kendall τ correlation coefficients between the nitrate concentration and rating scores for different parameters. The sp and kp columns present the p-values of the two correlation coefficients, respectively. Based on the statistical significance, the Pearson and Spearman correlation coefficients were calculated from the modified scores and weights using seven parameters (excluding topography) and six parameters (excluding topography and net recharge). The Pearson coefficients are 0.633 and 0.634 and the Spearman coefficients are 0.624 and 0.63 for the seven- and six-parameter calculations, respectively.
(3)
Using logistic regression
The threshold for the conversion of nitrate concentration into a binary variable is 50 mg/L because it has a conventional meaning, that is, the potable limit. Table 5 shows the b coefficients, p-values, and weights. The Pearson and Spearman coefficients based on the new weights calculated by LRA and the modified rating scores are 0.591 and 0.593, respectively.
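The binary conversion and the regression step can be sketched as follows. This is an illustration only: the logistic model is fitted by plain gradient ascent rather than by a statistics package, the data and function names are hypothetical, and the conversion of the b coefficients into weights (Table 5) is not shown because it is not described in this section.

```python
import math

THRESHOLD_MG_L = 50.0  # potable limit used as the binary cut-off


def binarize(nitrate, threshold=THRESHOLD_MG_L):
    """1 if the nitrate concentration exceeds the threshold, else 0."""
    return [1 if value > threshold else 0 for value in nitrate]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, targets, learning_rate=0.02, n_iter=5000):
    """Return [intercept, b1, b2, ...] fitted by gradient ascent."""
    n, p = len(rows), len(rows[0])
    b = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, target in zip(rows, targets):
            z = b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))
            err = target - sigmoid(z)
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        b = [bj + learning_rate * g / n for bj, g in zip(b, grad)]
    return b


# Hypothetical rating scores (land use, topography) and nitrate (mg/L).
X = [[1, 5], [2, 1], [3, 5], [4, 1], [5, 5],
     [6, 1], [7, 5], [8, 1], [9, 5], [10, 1]]
nitrate = [10, 20, 60, 30, 45, 70, 55, 40, 90, 120]
y = binarize(nitrate)
coefficients = fit_logistic(X, y)
print([round(c, 3) for c in coefficients])
```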
(4)
Using data-driven methods
The MC method and GA were used to calculate the weights. For a fair comparison with CA, the calculation for weighting was performed three times: (1) using eight parameters, (2) using seven parameters (excluding topography), and (3) using six parameters (excluding both topography and net recharge). The results, which were also used for comparison with SPSA and LRA, are shown in Table 5.
To make the results clearer and easier to compare, Table 6 lists the correlation coefficients for vulnerability and nitrate concentration calculated by different methods.
Figure 2c,d shows the results of the evaluation using MC and GA. The results are similar.

4.2.3. Verification of the Assessment Performance Based on Nitrate Concentration Changes

In addition to using correlation coefficients to assess the performance, the changes in the nitrate concentration from 1998 to 2016–2019 were used to assess the accuracy of the vulnerability assessments. The vulnerability indexes derived from the GA method were divided into five classes using natural breaks. As with the classification of vulnerability indexes, these changes were also divided into five classes using natural breaks (Figure 3a). The level differences are the absolute values of the differences between the levels of change and the degrees of the vulnerability indexes. The differences range from 0 to 4. The area percentages corresponding to the level differences of 0, 1, 2, 3, and 4 are 45.34%, 42.6%, 10.2%, 1.71%, and 0.15%, respectively (Figure 3b). When the difference is 0 or 1, the assessment is correct. When the difference is 2 or 3, the evaluation is overestimated; when the difference is 4, the assessment is extremely overestimated. The correct, overestimated, and extremely overestimated areas account for 87.94%, 11.91%, and 0.15% of the area, respectively. The inappropriately assessed area thus accounts for 12% of the area.
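The level-difference bookkeeping can be sketched in a few lines. The function names are hypothetical and the sketch assumes cells of equal area; the last part simply re-adds the area percentages reported above.

```python
from collections import Counter


def level_difference_shares(vulnerability_classes, change_classes):
    """Percentage of cells for each absolute class difference (0 to 4)."""
    diffs = [abs(v - c) for v, c in zip(vulnerability_classes, change_classes)]
    counts = Counter(diffs)
    return {d: 100.0 * counts.get(d, 0) / len(diffs) for d in range(5)}


def summarize(shares):
    """Group the level differences into the three categories used in the text."""
    return {
        "correct": shares[0] + shares[1],
        "overestimated": shares[2] + shares[3],
        "extremely overestimated": shares[4],
    }


# Area percentages reported for level differences 0, 1, 2, 3, and 4.
reported = {0: 45.34, 1: 42.6, 2: 10.2, 3: 1.71, 4: 0.15}
summary = summarize(reported)
print({name: round(value, 2) for name, value in summary.items()})
```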

5. Discussion

5.1. Impact of Revising Rating Scores

Table 4 shows that the number of parameter groups for the depth to the water table, net recharge, aquifer thickness, infiltration coefficient, hydraulic conductivity, and land use is two. This number is smaller than that reported in similar studies. The soil texture in this area is relatively simple and the vadose zone and aquifer lithology are monotonic, which are the main reasons for the smaller number of parameter groups. Except for the topography factor, which shows an insignificant trend due to the flat terrain (drastic slope changes only occur in dry riverbeds or gullies), all parameters show an ascending or descending trend with respect to both the mean nitrate concentration and rating scores, which is consistent with the theory. From this perspective, the results are better than those obtained in similar studies [6,7]. The Pearson and Spearman correlation coefficients calculated using the modified rating scores and Aller’s common DRASTIC weights are 0.647 and 0.643, respectively (Table 6). They are more than 10% higher than those of Aller’s common DRASTIC model. It can therefore be concluded that this simple statistical method is effective with respect to the range division of the parameters in the study area.

5.2. Impact of Revising Weights

When using the modified rating scores, the weights of Aller’s common DRASTIC model result in Pearson and Spearman coefficients of 0.647 and 0.643, respectively. None of the weight vectors derived from the SPSA, CA, and LRA provide better correlation coefficients. The worst results were obtained by using the weights of the LRA calculation (Table 6). The SPSA results are better than those obtained with LRA; the Pearson and Spearman correlation coefficients are 0.646 and 0.635, respectively (Table 6). However, they are not better than the results obtained by using the modified rating scores with Aller’s original weights. When using only six parameters for CA, the best Pearson and Spearman correlation coefficients are 0.634 and 0.63, respectively (Table 6). This means that the results did not improve. Based on the correlation coefficients, these three methods are not suitable for the study area.
In the following, the rationality of the weights obtained by each method is considered. The weights for the LRA method are shown in Table 5. The p-value column shows that only the land use parameter is statistically significant and that the weights are anomalous. Although the use of logistic regression for the determination of the weights is not advisable in the study area, the results obtained by using the new weights calculated by LRA and the modified rating scores are better than those obtained with Aller’s common DRASTIC model. This implies that one of the key points in assessing the groundwater vulnerability in a similar hydrogeological environment is the determination of the rating scores. In other words, if the rating scores are reasonable, the results are almost acceptable, even if the weights are not appropriate.
Based on SPSA, the equivalent weights of the D, R, A, S, T, I, C, and L factors are 3.61, 5, 1.96, 1.14, 0.7, 2.91, 2.43, and 3.46, respectively. The weight for topography is only 0.7, which implies that this parameter should be ignored. The highest weight was assigned to the recharge. However, the variation in the recharge across this area is the smallest (Table 4); therefore, the weights obtained by this method are not reasonable.
Table 5 displays the results of the CA method. The relationship of the topography with the nitrate concentration is statistically insignificant, which implies that this parameter should be excluded from Equation (2). This phenomenon has been reported in previous studies [2]. The Pearson and Spearman correlation coefficients calculated using the seven parameters (excluding the topography parameter) and modified rating scores and weights are 0.633 and 0.624, respectively. The correlation coefficient between the net recharge and nitrate concentration is weakly negative, which can be explained by the approximately equal nitrate concentrations listed in Table 4. Therefore, it is also reasonable to exclude the net recharge parameter from Equation (2). The results of the calculation using the remaining six parameters are better than the results obtained by using seven parameters but worse than the results of the SPSA. Based on this study, it can be concluded that CA is not an effective method for weight determination. It is also worth noting that, although the results of the SPSA are the best among these three methods, the highest weight is assigned to the recharge, which is negligible in the CA. Thus, the results of the two methods are contradictory and unreasonable. This again confirms that the SPSA and CA methods are ineffective in determining the weights of the parameters in the study area.
The MC and GA methods are both data-driven but follow different theories and mechanisms. Both methods are used to solve Equation (7). If a unique optimal solution exists, the two methods should give the same results. To prevent the solution from converging to a local optimum, six threads were started in sequence at 2–3 s intervals. Each thread used its start time as a random seed to generate one million random weight vectors for the calculation, completed the calculation in approximately 10 s, and then repeated it using the new time as the random seed. The six threads ran continuously for 24 h and tested approximately 50 billion random weight vectors. A large number of these runs gave results consistent with those of a single calculation, which indicates that the MC method converges to the global optimal solution.
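The random search described above can be sketched as follows. Equation (7) is not reproduced in this section, so the sketch assumes that the objective is to maximize the Pearson correlation between the vulnerability index and the nitrate concentration and that the weights are sampled uniformly between 1 and 5. The function names and the synthetic data are hypothetical, and fixed integer seeds replace the clock-based seeds so that runs are reproducible.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def objective(weights, cells, nitrate):
    """Assumed objective: correlation of the weighted index with nitrate."""
    index = [sum(w * r for w, r in zip(weights, cell)) for cell in cells]
    return pearson(index, nitrate)


def mc_search(cells, nitrate, n_trials=2000, seed=0, low=1.0, high=5.0):
    """Keep the best of n_trials uniformly sampled weight vectors."""
    rng = random.Random(seed)
    best_r, best_w = -2.0, None
    for _ in range(n_trials):
        w = [rng.uniform(low, high) for _ in range(len(cells[0]))]
        r = objective(w, cells, nitrate)
        if r > best_r:
            best_r, best_w = r, w
    return best_r, best_w


# Synthetic two-parameter data: nitrate is driven mostly by the first rating.
cells = [[(7 * i) % 10 + 1, (3 * i) % 10 + 1] for i in range(20)]
nitrate = [5 * a + b for a, b in cells]

# Independent restarts with different seeds, as a convergence check.
for seed in range(3):
    r, w = mc_search(cells, nitrate, seed=seed)
    print(seed, round(r, 4), [round(v, 2) for v in w])
```

Because the Pearson coefficient does not change when all weights are multiplied by the same positive constant, weight vectors from different runs are best compared after rescaling, for example so that the largest weight equals 5.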
The results obtained from the MC and GA methods are almost identical (Table 5), which supports the existence and uniqueness of the optimal solution. Therefore, from a mathematical point of view, the optimum weights exist in the solution space and are the optimum solutions of Equation (7), which can be searched as accurately as possible by the two methods. Based on this result and perspective, it can be concluded that the data-driven approach is, in principle, applicable to various hydrogeological settings.
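A genetic algorithm for the same problem can be sketched in a similar way. This is not the implementation used in the study: it is a minimal real-coded GA with tournament selection, arithmetic crossover, Gaussian mutation, and elitism, under the same assumed objective (maximizing the Pearson correlation between index and nitrate concentration) and the same hypothetical synthetic data as a random-search example would use.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fitness(weights, cells, nitrate):
    """Assumed objective: correlation of the weighted index with nitrate."""
    index = [sum(w * r for w, r in zip(weights, cell)) for cell in cells]
    return pearson(index, nitrate)


def ga_search(cells, nitrate, pop_size=30, generations=40, seed=0,
              low=1.0, high=5.0, mutation_sd=0.3):
    """Return (best fitness, best weights, best fitness per generation)."""
    rng = random.Random(seed)

    def score(w):
        return fitness(w, cells, nitrate)

    def tournament(population):
        return max(rng.sample(population, 3), key=score)

    n_params = len(cells[0])
    pop = [[rng.uniform(low, high) for _ in range(n_params)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=score)
        history.append(score(elite))
        next_pop = [elite[:]]                      # elitism: keep the best
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            a = rng.random()                       # arithmetic crossover
            child = [a * u + (1 - a) * v for u, v in zip(p1, p2)]
            child = [min(high, max(low, g + rng.gauss(0.0, mutation_sd)))
                     for g in child]               # mutation, clipped to bounds
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    return score(best), best, history


# Synthetic two-parameter data: nitrate is driven mostly by the first rating.
cells = [[(7 * i) % 10 + 1, (3 * i) % 10 + 1] for i in range(20)]
nitrate = [5 * a + b for a, b in cells]
best_r, best_w, history = ga_search(cells, nitrate)
print(round(best_r, 4), [round(v, 2) for v in best_w])
```

Elitism guarantees that the best fitness never decreases from one generation to the next, which makes the convergence history easy to inspect.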
Unlike the CA method, both the MC and GA methods show that using all eight parameters yields better results than the use of fewer parameters. This result differs from those of many previous studies [2,13,14] but is consistent with the conclusion of Rosen [59]. The reason for this is that all parameters are related rather than independent; even if a parameter itself does not significantly differ, it may amplify the difference within or between other parameters with minor differences. The weights obtained by both methods are derived from objective data, which overcomes the subjectivity, and indicate that the most important factors are the depth to the water table and land use type, followed by the hydraulic conductivity, topography, infiltration coefficients, aquifer thickness, and landform. The least important factor is the net recharge, which agrees with the result of Turkeltaub et al. [60]. This is different from all results derived from the other methods used in this study. SPSA, CA, and LRA each consider only single factors, regardless of the relationship between the factors. In contrast, both the MC and GA consider not only the individual factors but also the relationship between the factors. Therefore, the MC and GA methods yield the highest Pearson correlation coefficient.
Table 6 shows a comparison of the performances of the methods used in this study. Aller’s common and pesticide DRASTIC models are the simplest methods for the determination of the vulnerability indexes because only parameter data need to be prepared and the calculation is simple. The modification of the rating scores and weights requires more preparation and calculations. Certainly, modifying only the rating scores is easier and faster than modifying both the rating scores and weights. With respect to the determination of reasonable weights, SPSA is easier than CA and LRA. The former requires only simple statistics, while the latter methods require the preparation of more data, complex computations, and more time. Unlike CA and LRA, the data-driven methods (MC and GA) are simple. They are common methods for solving optimization problems of this kind. The time required to calculate weights using MC and GA is less than ten seconds and hundreds of microseconds, respectively. The difference between the MC and GA lies in the computational efficiency and implementation difficulty. To achieve the same accuracy, more calculations and time are needed for the MC method. Conversely, the implementation of GA is more difficult than that of MC. Nonetheless, both methods are very fast and efficient. Based on this study, the data-driven approaches are the best approaches.
Table 6 shows that the MC and GA methods give the best results, better than those obtained with the modified rating scores and original weights. This indicates that SPSA, CA, and LRA are not suitable for various hydrogeological settings. A reasonable explanation is that these methods only consider every single factor individually, regardless of the relationship among the factors. The use of these methods requires careful consideration. Table 5 also shows that the most important factors controlling the vulnerability in the study area are the depth to the water table and land use. This explains why using only the modified rating scores already gives good results: the weights assigned by Aller’s common DRASTIC model to the depth to the water table and land use are equal to the weights obtained from the MC and GA methods.
Just by modifying the rating scores, the Pearson correlation coefficient increases by over 15%. This suggests that the key for the assessment of groundwater vulnerability is the division of the parameter levels. Once reasonable rating scores are determined, the best weights can be quickly and efficiently determined by means of the GA or MC methods and a more reliable evaluation can be obtained. Many modifications of the DRASTIC model reported in the literature involve the revision of the rating scores and weights [2,12,22,25]. Based on the above-mentioned analysis, only reasonable rating scores are necessary when using data-driven weighting methods.

5.3. Verification of the Assessment Performance

Figure 2a,b shows the assessment results based on Aller’s common and pesticide DRASTIC models, which are consistent with each other. Based on the comparison of the two figures, it can be concluded that Aller’s common DRASTIC version is superior to the pesticide version. This conclusion is similar to that of a previous study [34]. The two versions differ mainly in the weights of the soil and the topography. In the pesticide version, these weights are significantly higher than in the original version because physical, chemical, and biodegradation reactions occurring in the soil can effectively degrade pollutants, and significant topographical changes can lead to the loss of pollutants through surface runoff and reduce the pollution potential. Due to the flat terrain of the study area, the impact of the topography weight is negligible. Owing to its chemical and physical properties, nitrate cannot be adsorbed by the soil, so the reduction in nitrate in the soil is very limited. As a result, the higher soil and topography weights are not justified for nitrate, and the better performance of the common version is reasonable. Figure 2c,d shows the results of the improved DRASTIC model evaluation using the MC and GA methods for the weight determination. The results notably differ from those of Aller’s common and pesticide DRASTIC models. The consistency between the distribution of the vulnerability indexes and the distribution of the black circle symbols (symbol size represents the nitrate concentration) may indicate the accuracy of the evaluation results. The results derived from the data-driven methods are better than those obtained with Aller’s common and pesticide DRASTIC models, which overestimate the areas with high and very high risks. Many pixels with high and very high risks based on the two unmodified versions are assigned to medium- or low-risk areas based on the new results (Table 7). The correlation coefficients can also be used to confirm the assessment performance.
Based on the comparison of Figure 2d and Figure 3a, the distribution of the changes in nitrate concentration is consistent with the distribution of the vulnerability indexes, which implies that the results of the assessment are accurate on a time scale. Figure 3b quantitatively confirms this result, which again shows that the GA and MC methods are effective. Based on the long-term changes in nitrate concentration and consistency between the assessment results and changes in nitrate concentration, it can be concluded that the level differences can be used to verify the accuracy of the vulnerability assessment in terms of time. Therefore, the accuracy of the current evaluation results can be verified by the correlation coefficients and the long-term reliability of the vulnerability assessment can be determined using the level differences.

6. Conclusions

In this study, the Wilcoxon rank sum test was employed to ascertain rating scores for hydrogeologic parameters, and a data-driven methodology for determining parameter weights was developed. This novel method was then compared with SPSA, CA, and LRA. The Pearson and Spearman correlation coefficients between the vulnerability index and nitrate concentration were also utilized to validate the assessment results. The results demonstrated that the correlation between the vulnerability index and nitrate concentration improved significantly after the revision of the rating scores. The findings further underscore the conclusion that SPSA, CA, and LRA are not universally applicable to all hydrogeologic environments, particularly within the confines of the present study area. The primary conclusions derived from this study are as follows: (1) The primary limitation of Aller’s general DRASTIC model and the pesticide DRASTIC model pertains to the high degree of subjectivity inherent in determining the rating scores and weights of the parameters. (2) The data-driven weighting method proposed in this paper (solved by MC and GA) accounts for both individual factors and the relationships between factors, thereby enabling the optimal weights to be obtained. (3) The discrepancy between the level of change in nitrate concentrations over extended periods and the outcomes of the vulnerability assessment can be employed to ascertain the reliability and practicality of the assessment results from a temporal perspective. (4) The key to a comprehensive groundwater vulnerability assessment is the rationalization of the rating ranges of parameters and their corresponding rating scores, which should be the focus of future work.
This paper established a new, effective data-driven method based on the Monte Carlo and Genetic Algorithm methods to determine weights. In this way, the revision of the rating scores and weights will no longer be a separate step but a continuous and simple process. This paper also proposed a new method to verify the accuracy of the assessment from a time perspective based on changes in nitrate concentration. These methods are applicable to a variety of hydrogeological environments. It was found that the establishment of rating scores is a key process that can effectively improve results. All of this will raise awareness of groundwater-specific vulnerability assessments and strengthen the protection and management of groundwater quality.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app15052866/s1.

Author Contributions

Investigation, Y.Z. (Yan Zhang), Y.W. and W.F.; Data curation, L.P. and Y.Z. (Yuan Zhang); Writing—original draft, X.H.; Visualization, H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Pilot Project of Basic Research Operating Expenses System of Hebei Academy of Sciences, China (2025PF05).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article and Supplementary Material.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. He, H.; Li, X.; Li, X.; Cui, J.; Zhang, W.; Xu, W. Optimizing the DRASTIC method for nitrate pollution in groundwater vulnerability assessments: A case study in China. Pol. J. Environ. Stud. 2017, 27, 95–107. [Google Scholar] [CrossRef] [PubMed]
  2. Javadi, S.; Hashemy, S.; Mohammadi, K.; Howard, K.; Neshat, A. Classification of aquifer vulnerability using K-means cluster analysis. J. Hydrol. 2017, 549, 27–37. [Google Scholar] [CrossRef]
  3. Douglas, S.; Dixon, B.; Griffin, D.W. Assessing intrinsic and specific vulnerability models ability to indicate groundwater vulnerability to groups of similar pesticides: A comparative study. Phys. Geogr. 2018, 39, 487–505. [Google Scholar] [CrossRef]
  4. Pacheco, F.; Martins, L.; Quininha, M.; Oliveira, A.S.; Fernandes, L.S. Modification to the DRASTIC framework to assess groundwater contaminant risk in rural mountainous catchments. J. Hydrol. 2018, 566, 175–191. [Google Scholar] [CrossRef]
  5. Voutchkova, D.D.; Schullehner, J.; Rasmussen, P.; Hansen, B. A high-resolution nitrate vulnerability assessment of sandy aquifers (DRASTIC-N). J. Environ. Manag. 2021, 277, 111330. [Google Scholar] [CrossRef]
  6. Antonakos, A.; Lambrakis, N. Development and testing of three hybrid methods for the assessment of aquifer vulnerability to nitrates, based on the DRASTIC model, an example from NE Korinthia, Greece. J. Hydrol. 2007, 333, 288–304. [Google Scholar] [CrossRef]
  7. Huan, H.; Wang, J.; Teng, Y. Assessment and validation of groundwater vulnerability to nitrate based on a modified DRASTIC model: A case study in Jilin City of northeast China. Sci. Total Environ. 2012, 440, 14–23. [Google Scholar] [CrossRef]
  8. Barbulescu, A. Assessing groundwater vulnerability: DRASTIC and DRASTIC-like methods: A review. Water 2020, 12, 1356. [Google Scholar] [CrossRef]
  9. Fannakh, A.; Farsang, A. DRASTIC, God, and SI approaches for assessing groundwater vulnerability to pollution: A review. Environ. Sci. Eur. 2022, 34, 77. [Google Scholar] [CrossRef]
  10. Aller, L.; Bennet, T.; Leher, J.H.; Petty, R.J.; National Water Well Association. DRASTIC: A Standardized System for Evaluating Groundwater Pollution Potential Using Hydrogeologm Settings; EPA/600/2 85/018_296; United States Environmental Protection Agency: Ada, OK, USA, 1985. [Google Scholar]
  11. Aller, L.; Bennet, T.; Leher, J.H.; Petty, R.J. DRASTIC: A Standardized System for Evaluating Ground Water Pollution Potential Using Hydrogeological Settings; US Environmental Protection Agency: Washington, DC, USA, 1987; pp. 1–455. [Google Scholar]
  12. Rupert, M. Calibration of the DRASTIC ground water vulnerability mapping method. Ground Water 2001, 39, 625–630. [Google Scholar] [CrossRef]
  13. Wu, C.; Yin, S.; Liu, H.; Chen, H. Groundwater vulnerability assessment and feasibility mapping under reclaimed water irrigation by a modified DRASTIC model. Water Resour. Manag. 2014, 28, 1219–1234. [Google Scholar] [CrossRef]
  14. Khan, A.; Khan, H.; Umar, R.; Khan, M. An integrated approach for aquifer vulnerability mapping using GIS and rough sets: Study from an alluvial aquifer in north India. Hydrogeol. J. 2014, 22, 1561–1572. [Google Scholar] [CrossRef]
  15. Wei, A.; Bi, P.; Guo, J.; Lu, S.; Li, D. Modified DRASTIC model for groundwater vulnerability to nitrate contamination in the Dagujia river basin, China. Water Supply 2021, 21, 1793–1805. [Google Scholar] [CrossRef]
  16. Pourkhosravani, M.; Jamshidi, F.; Sayari, N. Evaluation of groundwater vulnerability to pollution using DRASTIC, composite DRASTIC, and nitrate vulnerability models. Environ. Health Eng. Manag. 2021, 8, 129–140. [Google Scholar] [CrossRef]
  17. Guo, X.; Xiong, H.; Li, H.; Gui, X.; Hu, X.; Li, Y.; Cui, H.; Qiu, Y.; Zhang, F.; Ma, C. Designing dynamic groundwater management strategies through a composite groundwater vulnerability model: Integrating human-related parameters into the DRASTIC model using Lightgbm regression and SHAP analysis. Environ. Res. 2023, 236, 116871. [Google Scholar] [CrossRef] [PubMed]
  18. Gupta, T.; Kumari, R. Assessment of groundwater nitrate vulnerability using DRASTIC and modified DRASTIC in upper catchment of Sabarmati basin. Environ. Earth Sci. 2023, 82, 344. [Google Scholar] [CrossRef]
  19. Tian, H.; Xiao, C.; Xu, H.; Liang, X.; Zhao, H.; Zhao, Q.; Qiao, L.; Zhang, W. Groundwater vulnerability assessment for nitrate pollution based on modified DRASTIC method: A case study in southwest China. Appl. Ecol. Environ. Res. 2024, 22, 2339–2358. [Google Scholar] [CrossRef]
  20. Saaty, T.L. How to make a decision: The analytic hierarchy process. Eur. J. Oper. Res. 1990, 48, 9–26. [Google Scholar] [CrossRef]
  21. Thirumalaivasan, D.; Karmegam, M.; Venugopal, K. AHP-DRASTIC: Software for specific aquifer vulnerability assessment using DRASTIC model and GIS. Environ. Model. Softw. 2003, 18, 645–656. [Google Scholar] [CrossRef]
  22. Panagopoulos, G.P.; Antonakos, A.K.; Lambrakis, N.J. Optimization of the DRASTIC method for groundwater vulnerability assessment via the use of simple statistical methods and GIS. Hydrogeol. J. 2006, 14, 894–911. [Google Scholar] [CrossRef]
  23. Chen, S.-K.; Jang, C.-S.; Peng, Y.-H. Developing a probability-based model of aquifer vulnerability in an agricultural region. J. Hydrol. 2013, 486, 494–504. [Google Scholar] [CrossRef]
  24. Armengol, S.; Sanchez-Vila, X.; Folch, A. An approach to aquifer vulnerability including uncertainty in a spatial random function framework. J. Hydrol. 2014, 517, 889–900. [Google Scholar] [CrossRef]
  25. Pacheco, F.; Pires, L.; Santos, R.; Fernandes, L.S. Factor weighting in DRASTIC modeling. Sci. Total Environ. 2015, 505, 474–486. [Google Scholar] [CrossRef]
  26. Fijani, E.; Nadiri, A.A.; Moghaddam, A.A.; Tsai, F.T.-C.; Dixon, B. Optimization of DRASTIC method by supervised committee machine artificial intelligence to assess groundwater vulnerability for Maragheh-Bonab plain aquifer, Iran. J. Hydrol. 2013, 503, 89–100. [Google Scholar] [CrossRef]
  27. Stigter, T.Y.; Ribeiro, L.; Dill, A.M.M.C. Evaluation of an intrinsic and a specific vulnerability assessment method in comparison with groundwater salinisation and nitrate contamination levels in two agricultural regions in the south of Portugal. Hydrogeol. J. 2006, 14, 79–99. [Google Scholar] [CrossRef]
  28. Stuart, M.; Chilton, P.; Kinniburgh, D.; Cooper, D. Screening for long-term trends in groundwater nitrate monitoring data. Q. J. Eng. Geol. Hydrogeol. 2007, 40, 361–376.
  29. Wang, L.; Stuart, M.E.; Bloomfield, J.P.; Butcher, A.S.; Gooddy, D.C.; McKenzie, A.A.; Lewis, M.A.; Williams, A.T. Prediction of the arrival of peak nitrate concentrations at the water table at the regional scale in Great Britain. Hydrol. Process. 2012, 26, 226–239.
  30. Stevenazzi, S.; Masetti, M.; Nghiem, S.V.; Sorichetta, A. Groundwater vulnerability maps derived from a time-dependent method using satellite scatterometer data. Hydrogeol. J. 2015, 23, 631–647.
  31. Wang, L.; Stuart, M.E.; Lewis, M.; Ward, R.; Skirvin, D.; Naden, P.; Collins, A.; Ascott, M. The changing trend in nitrate concentrations in major aquifers due to historical nitrate loading from agricultural land across England and Wales from 1925 to 2150. Sci. Total Environ. 2016, 542, 694–705.
  32. Zhong, Z. A discussion of groundwater vulnerability assessment methods. Earth Sci. Front. 2005, 12, 3–13.
  33. Feng, W.; Wang, S.; Hu, C.; Li, L. Landform sedimentary contributed to groundwater nitrate vulnerability in multi-alluvial fan aquifer systems in a watershed. Environ. Earth Sci. 2023, 82, 232.
  34. Kazakis, N.; Voudouris, K.S. Groundwater vulnerability and pollution risk assessment of porous aquifers to nitrate: Modifying the DRASTIC method using quantitative parameters. J. Hydrol. 2015, 525, 13–25.
  35. Teso, R.R.; Poe, M.P.; Younglove, T.; McCool, P.M. Use of logistic regression and GIS modeling to predict groundwater vulnerability to pesticides. J. Environ. Qual. 1996, 25, 425–432.
  36. Tesoriero, A.; Voss, F. Predicting the probability of elevated nitrate concentrations in the Puget Sound basin: Implications for aquifer susceptibility and vulnerability. Ground Water 1997, 35, 1029–1039.
  37. Gentle, J.E. Random Number Generation and Monte Carlo Methods; Springer: New York, NY, USA, 2003.
  38. Fishman, G.S. Monte Carlo: Concepts, Algorithms, and Applications; Springer: New York, NY, USA, 2013.
  39. Holland, J.H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence; The MIT Press: Cambridge, MA, USA, 1992.
  40. Mitchell, M. An Introduction to Genetic Algorithms, 5th ed.; MIT Press: Cambridge, MA, USA; London, UK, 1999.
  41. Whitley, D. A genetic algorithm tutorial. Stat. Comput. 1994, 4, 65–85.
  42. Eiben, A.E.; Smith, J.E. Introduction to Evolutionary Computing; Springer: Berlin/Heidelberg, Germany, 2015.
  43. McLay, C.; Dragten, R.; Sparling, G.; Selvarajah, N. Predicting groundwater nitrate concentrations in a region of mixed agricultural land use: A comparison of three approaches. Environ. Pollut. 2001, 115, 191–204.
  44. Stigter, T.; Ribeiro, L.; Dill, A. Building factorial regression models to explain and predict nitrate concentrations in groundwater under agricultural land. J. Hydrol. 2008, 357, 42–56.
  45. Zhu, Z.; Chen, D. Nitrogen fertilizer use in China—Contributions to food production, impacts on the environment and best management strategies. Nutr. Cycl. Agroecosyst. 2002, 63, 117–127.
  46. Li, H.; Liu, Z.; Zheng, L.; Lei, Y. Resilience analysis for agricultural systems of North China Plain based on a dynamic system model. Sci. Agric. 2011, 68, 8–17.
  47. Brauns, B.; Bjerg, P.L.; Song, X.; Jakobsen, R. Field scale interaction and nutrient exchange between surface water and shallow groundwater in the Baiyang lake region, North China Plain. J. Environ. Sci. 2016, 45, 60–75.
  48. Wang, M.; Liu, D.G.; Wu, L.W.; Bao, Y.; Liu, N.W. Prediction of agriculture derived groundwater nitrate distribution in North China Plain with GIS-based BPNN. Environ. Geol. 2006, 50, 637–644.
  49. Wang, S.; Tang, C.; Song, X.; Yuan, R.; Wang, Q.; Zhang, Y. Using major ions and δ15N-NO3− to identify nitrate sources and fate in an alluvial aquifer of the Baiyangdian lake watershed, North China Plain. Environ. Sci. Process. Impacts 2013, 15, 1430–1443.
  50. Wu, C. Landform Environment and Its Formation in North China; Science Press: Beijing, China, 2008. (In Chinese)
  51. Pardo-Iguzquiza, E.; Dowd, P. Empirical Maximum Likelihood Kriging: The General Case. Math. Geol. 2005, 37, 477–492.
  52. Farr, T.G.; Rosen, P.A.; Caro, E.; Crippen, R.; Duren, R.; Hensley, S.; Kobrick, M.; Paller, M.; Rodriguez, E.; Roth, L.; et al. The Shuttle Radar Topography Mission. Rev. Geophys. 2007, 45, 1–33.
  53. Hirt, C.; Filmer, M.S.; Featherstone, W.E. Comparison and Validation of the Recent Freely Available ASTER-GDEM ver1, SRTM ver4.1 and GEODATA DEM-9S ver3 Digital Elevation Models over Australia. 2010. Available online: https://espace.curtin.edu.au/bitstream/20.500.11937/43846/2/137777_137777.pdf (accessed on 28 July 2009).
  54. Rexer, M.; Hirt, C. Comparison of free high resolution digital elevation data sets (ASTER GDEM2, SRTM v2.1/v4.1) and validation against accurate heights from the Australian National Gravity Database. Aust. J. Earth Sci. Int. Geosci. J. Geol. Soc. Aust. 2014, 61, 213–226.
  55. Zhang, Z.; Fei, Y. (Eds.) Atlas of Groundwater Sustainable Utilization in North China Plain; China Cartographic Publishing House: Beijing, China, 2009.
  56. Cao, G.; Scanlon, B.R.; Han, D.; Zheng, C. Impacts of thickening unsaturated zone on groundwater recharge in the North China Plain. J. Hydrol. 2016, 537, 260–270.
  57. Wang, H.; Pan, X.; Luo, J.; Luo, Z.; Chang, C.; Shen, Y. Using remote sensing to analyze spatiotemporal variations in crop planting in the North China Plain. Chin. J. Eco Agric. 2015, 23, 1199–1209.
  58. Min, L.; Shen, Y.; Pei, H.; Wang, P. Water movement and solute transport in deep vadose zone under four irrigated agricultural land-use types in the North China Plain. J. Hydrol. 2018, 559, 510–522.
  59. Rosen, L. A study of the DRASTIC methodology with emphasis on Swedish conditions. Ground Water 1994, 32, 278–285.
  60. Turkeltaub, T.; Jia, X.; Zhu, Y.; Shao, M.-A.; Binley, A. Recharge and nitrate transport through the deep vadose zone of the Loess Plateau: A regional-scale model investigation. Water Resour. Res. 2018, 54, 4332–4346.
Figure 1. Location of the study area and distribution of the groundwater samples. The nitrate data are cited from the work of Feng et al. [33].
Figure 2. Comparison of the nitrate concentration distribution with the vulnerability index maps of (a) Aller’s common DRASTIC; (b) pesticide DRASTIC; (c) Monte Carlo DRASTIC; and (d) Genetic Algorithm DRASTIC.
Figure 3. Maps of (a) nitrate concentration changes from 1998 to 2018 and (b) level difference.
Table 1. Rating scores for the recharge and soil media for Aller’s common and pesticide DRASTIC models.

| Net recharge (according to [11]): DRASTIC range (mm) | Net recharge: range in this paper (mm) | Net recharge: score | Soil media: infiltration coefficient | Soil media: score |
|---|---|---|---|---|
| 0–51 | 0–50 | 1 | 0.2 | 1 |
| 51–102 | 50–100 | 3 | 0.226 | 1.52 |
| 102–178 | 100–150 | 6 | 0.275 | 2.5 |
| | 150–200 | 7 | 0.285 | 2.7 |
| 178–254 | 200–250 | 8 | 0.3 | 3 |
| >254 | >250 | 9 | 0.4 | 5 |
| | | | 0.45 | 6 |
| | | | 0.55 | 8 |
| | | | 0.65 | 10 |

Table 2. Rating scores for geomorphic units and land use types of Aller’s common and pesticide DRASTIC models.

| Landforms: geomorphic units | Landforms: NO3 (mg/L) | Landforms: score | Land use: types | Land use: NO3 (mg/L) | Land use: score |
|---|---|---|---|---|---|
| River belts | 12.06 | 1 | Cotton | | 1 |
| Alluvial and flood plains | 13.21 | 1.24 | Woods | 12.19 | 1 |
| Lakes and depressions | 21.12 | 2.9 | Residential area | 16.32 | 1.88 |
| Alluvial-proluvial fans | 44.44 | 7.78 | Orchards | 21.67 | 3.02 |
| Proluvial fans | 55.03 | 10 | Wheat/maize | 29.91 | 4.78 |
| | | | Vegetables | 54.33 | 10 |

Table 3. Weights for Aller’s common and pesticide DRASTIC [11].

| Version | D | R | A | S | T | I | C | L |
|---|---|---|---|---|---|---|---|---|
| original | 5 | 4 | 3 | 2 | 1 | 5 | 3 | 5 |
| pesticide | 5 | 4 | 3 | 5 | 3 | 4 | 2 | 5 |

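
As a minimal sketch of how the weights in Table 3 are used, the snippet below computes a vulnerability index as the weighted sum of rating scores, which is the standard additive DRASTIC form. It assumes that the land use parameter L enters the sum in the same way as the seven classical parameters. The function name `drastic_index` and the ratings in `example_cell` are hypothetical and serve only as an illustration; they are not values from the study.

```python
# Weights copied from Table 3 (parameter order D, R, A, S, T, I, C, L).
WEIGHTS = {
    "original": {"D": 5, "R": 4, "A": 3, "S": 2, "T": 1, "I": 5, "C": 3, "L": 5},
    "pesticide": {"D": 5, "R": 4, "A": 3, "S": 5, "T": 3, "I": 4, "C": 2, "L": 5},
}


def drastic_index(ratings, weights):
    """Return the weighted sum of rating scores over all weighted parameters."""
    missing = set(weights) - set(ratings)
    if missing:
        raise KeyError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[p] * ratings[p] for p in weights)


# Hypothetical rating scores for one grid cell (illustrative only).
example_cell = {"D": 10, "R": 6, "A": 10, "S": 2.5,
                "T": 10, "I": 7.78, "C": 10, "L": 4.78}

index_original = drastic_index(example_cell, WEIGHTS["original"])
index_pesticide = drastic_index(example_cell, WEIGHTS["pesticide"])
print(index_original, index_pesticide)
```

Because the two versions differ only in the weights for S, T, I and C, the same cell can rank differently under each scheme even though its rating scores are unchanged.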
Table 4. Modified rating scores for the DRASTIC model parameters.

| Depth to the water table (D): range (m) | NO3 (mg/L) | Score | Net recharge (R): range (mm) | NO3 (mg/L) | Score | Aquifer thickness (A): range (m) | NO3 (mg/L) | Score |
|---|---|---|---|---|---|---|---|---|
| 0–30.5 | 47.37 | 10 | 0–102 | 29.64 | 9.87 | 0–45 | 55.64 | 10 |
| >30.5 | 20.45 | 4.32 | >102 | 30.02 | 10 | >45 | 25.67 | 4.61 |

| Infiltration coefficient (S): range | NO3 (mg/L) | Score | Hydraulic conductivity (C): range (m/d) | NO3 (mg/L) | Score | Land use (L): types | NO3 (mg/L) | Score |
|---|---|---|---|---|---|---|---|---|
| 0–0.275 | 22.92 | 3.61 | 0–40.7 | 11.8 | 2.8 | All types except for vegetables | 27.71 | 5.1 |
| >0.275 | 63.45 | 10 | >40.7 | 42.07 | 10 | Vegetables | 54.33 | 10 |

| Topography slope (T): range (%) | NO3 (mg/L) | Score | Landform (I): geomorphic units | NO3 (mg/L) | Score |
|---|---|---|---|---|---|
| 0–2 | 22.66 | 4.31 | River belts | 12.06 | 2.19 |
| 2–6 | 32.61 | 6.2 | Alluvial and flood plains | 13.21 | 2.4 |
| 6–12 | 25.06 | 4.76 | Lakes and depressions | 21.12 | 3.84 |
| >12 | 52.59 | 10 | Alluvial-proluvial fans | 44.44 | 8.08 |
| | | | Proluvial fans | 55.03 | 10 |

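
The scores in Table 4 appear to be consistent with a simple proportional scaling, in which the class with the highest mean nitrate concentration receives a score of 10 and every other class is scored in proportion to its mean concentration. This is an inference from the tabulated values and is not stated in the table itself, so the sketch below should be read as a consistency check under that assumption. The function name `scale_scores` is hypothetical.

```python
def scale_scores(mean_nitrate, top_score=10.0):
    """Scale class means so that the largest mean maps to `top_score`.

    Assumed rule (inferred from Table 4): score = top_score * mean / max(mean).
    """
    peak = max(mean_nitrate.values())
    return {name: round(top_score * value / peak, 2)
            for name, value in mean_nitrate.items()}


# Mean NO3 concentrations (mg/L) for the landform classes, copied from Table 4.
landform_no3 = {
    "River belts": 12.06,
    "Alluvial and flood plains": 13.21,
    "Lakes and depressions": 21.12,
    "Alluvial-proluvial fans": 44.44,
    "Proluvial fans": 55.03,
}

landform_scores = scale_scores(landform_no3)
for name, score in landform_scores.items():
    print(f"{name}: {score}")
```

Under this assumption the computed landform scores agree with the Score column of Table 4 to two decimal places.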
Table 5. Results of the correlation analysis, logistic regression analysis and MC and GA.

| Parameter | Correlation analysis: Spearman’s ρ_S | p a | Kendall’s τ_k | p b | Weight | Logistic regression analysis: b coefficient c | p-value d | Weight | Weight (MC) 1 | Weight (MC) 2 | Weight (MC) 3 | Weight (GA) 1 | Weight (GA) 2 | Weight (GA) 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| D | 0.413 | 0 * | 0.34 | 0 * | 4.225 | 0.08223 | 0.3312 | 0.093 | 4.993 | 4.855 | 4.802 | 5 | 4.787 | 4.839 |
| R | −0.163 | 0.039 * | −0.134 | 0.039 * | 1.666 | 4.42904 | 0.3144 | 5 | 0.086 | 0.03 | | 0.171 | 0.232 | |
| A | 0.316 | 0 * | 0.26 | 0 * | 3.231 | 0.05602 | 0.6178 | 0.063 | 2.252 | 2.414 | 2.446 | 2.251 | 2.491 | 2.5 |
| S | 0.426 | 0 * | 0.351 | 0 * | 4.358 | 0.14877 | 0.0778 | 0.168 | 2.624 | 2.522 | 2.467 | 2.483 | 2.474 | 2.452 |
| T | 0.113 | 0.151 | 0.09 | 0.15 | 1.14 | 0.17678 | 0.5116 | 0.2 | 2.749 | | | 2.808 | | |
| I | 0.47 | 0 * | 0.36 | 0 * | 4.638 | 0.14543 | 0.1753 | 0.164 | 1.759 | 1.379 | 1.431 | 1.664 | 1.391 | 1.419 |
| C | 0.488 | 0 * | 0.403 | 0 * | 5 | 0.16229 | 0.1341 | 0.183 | 3.864 | 3.8 | 3.797 | 3.889 | 3.848 | 3.848 |
| L | 0.207 | 0.007 * | 0.17 | 0.008 * | 2.116 | 0.35728 | 0.0113 * | 0.403 | 5 | 5 | 5 | 4.998 | 5 | 5 |

a p-value for Spearman’s ρ. b p-value for Kendall’s τ. c Coefficients for logistic regression. d p-value for b coefficients. * statistically significant (confidence level > 95%). 1: using all parameters; 2: using seven parameters; 3: using six parameters.

Table 6. Comparison of the correlation coefficients for vulnerability and nitrate concentration calculated by different methods and the performance of these methods.

| Method | Parameter set | Pearson’s correlation coefficient | Spearman’s correlation coefficient | Difficulty level | Computational efficiency |
|---|---|---|---|---|---|
| C-DRASTIC a | | 0.513 | 0.478 | easy | high |
| P-DRASTIC b | | 0.502 | 0.457 | easy | high |
| M-DRASTIC c | | 0.647 | 0.643 | medium | medium |
| SPSA d | | 0.646 | 0.635 | medium+ | medium− |
| LRA e | | 0.591 | 0.593 | difficult | lower than CA |
| CA f | 1 | - | - | difficult | low |
| CA f | 2 | 0.633 | 0.624 | difficult | low |
| CA f | 3 | 0.634 | 0.630 | difficult | low |
| MC g | 1 | 0.658 | 0.638 | medium | medium+ |
| MC g | 2 | 0.656 | 0.632 | medium | medium+ |
| MC g | 3 | 0.656 | 0.634 | medium | medium+ |
| GA h | 1 | 0.658 | 0.638 | medium | medium+ |
| GA h | 2 | 0.656 | 0.628 | medium | medium+ |
| GA h | 3 | 0.656 | 0.635 | medium | medium+ |

a C-DRASTIC is Aller’s common DRASTIC. b P-DRASTIC is the pesticide DRASTIC. c M-DRASTIC is the DRASTIC model with modified rating scores and Aller’s common weights. d SPSA is single-parameter sensitivity analysis. e LRA is logistic regression analysis. f CA is correlation analysis. g MC is the Monte Carlo method. h GA is the Genetic Algorithm method. 1: using all parameters; 2: using seven parameters; 3: using six parameters.

Table 7. Results of the evaluation of Aller’s common and GA DRASTIC models.

| Vulnerability level | Aller’s common DRASTIC: percentage (%) | Aller’s common DRASTIC: cumulative percentage (%) | GA DRASTIC: percentage (%) | GA DRASTIC: cumulative percentage (%) |
|---|---|---|---|---|
| Very Low | 28.26 | 28.26 | 27.37 | 27.37 |
| Low | 25.62 | 53.89 | 39.35 | 66.72 |
| Medium | 23.54 | 77.43 | 17.81 | 84.53 |
| High | 16.83 | 94.26 | 10.88 | 95.41 |
| Very High | 5.74 | 100 | 4.59 | 100 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hou, X.; Peng, L.; Zhang, Y.; Zhang, Y.; Wang, Y.; Feng, W.; Yang, H. A Data-Driven Method for Determining DRASTIC Weights to Assess Groundwater Vulnerability to Nitrate: Application in the Lake Baiyangdian Watershed, North China Plain. Appl. Sci. 2025, 15, 2866. https://doi.org/10.3390/app15052866
