Groundwater Simulations and Uncertainty Analysis Using MODFLOW and Geostatistical Approach with Conditioning Multi-Aquifer Spatial Covariance

This study presents an approach for obtaining limited sets of realizations of hydraulic conductivity (K) of multiple aquifers using simulated annealing (SA) simulation and spatial correlations among aquifers to simulate realizations of hydraulic heads and quantify their uncertainty in the Pingtung Plain, Taiwan. The proposed approach used the SA algorithm to generate large sets of natural logarithm hydraulic conductivity (ln(K)) realizations in each aquifer based on spatial correlations among aquifers. Moreover, small sets of ln(K) realizations were obtained from large sets of realizations by ranking the differences among cross-variograms derived from the measured ln(K) and the simulated ln(K) realizations between the aquifer pair Aquifer 1 and Aquifer 2 (hereafter referred to as Aquifers 1–2) and the aquifer pair Aquifer 2 and Aquifer 3 (hereafter referred to as Aquifers 2–3), respectively. Additionally, the small sets of realizations of the hydraulic conductivities honored the horizontal spatial variability and distributions of the hydraulic conductivities among aquifers to model groundwater precisely. The uncertainty analysis of the 100 combinations of simulated realizations of hydraulic conductivity was successfully conducted with generalized likelihood uncertainty estimation (GLUE). The GLUE results indicated that the proposed approach could minimize simulation iterations and uncertainty, successfully achieve behavioral simulations when reduced between calibration and evaluation runs, and could be effectively applied to evaluate uncertainty in hydrogeological properties and groundwater modeling, particularly in those cases which lack three-dimensional data sets yet have high heterogeneity in vertical hydraulic conductivities.


Introduction
Numerical simulations of groundwater flow are crucial to accurately predicting fluid movements in aquifer systems.One of the most influential parameters in groundwater simulation is hydraulic conductivity (K), which is a soil property that measures the ability of soil to transmit fluid through pore spaces and fractures [1] (e.g., the ratio of velocity to hydraulic gradient).Previous studies [2][3][4][5] have demonstrated that hydrogeological parameters, such as transmissivity and hydraulic conductivity, exhibit log-normal distributions.Kupfersberger and Deutsch [6] indicated that a deterministic distribution of hydraulic conductivity can only be acquired by measuring hydraulic conductivity at each location in an aquifer.However, though hydraulic conductivity in soil and rock exhibits large spatial heterogeneity, with likely variations of several orders of magnitude within a short distance [7,8], only a small number of measurements can be made due to the prohibitively expensive cost.It is thus suggested that hydraulic conductivity should be addressed within a stochastic framework, and the uncertainty should be quantified in groundwater simulations in multiple aquifers [1,7,9].
Hydraulic conductivity is one of the most critical and uncertain parameters in numerous groundwater models [1].The high degree of spatial variability in hydraulic conductivity and the complexity of causes of this variability are widely recognized as major impediments to making precise predictions [8].Stochastic methods are widely applied to estimate, simulate, and delineate representative heterogeneous field distributions of hydrogeological properties for groundwater models based on a limited number of groundwater samples [1,[9][10][11].Stochastic analysis allows a quantitative evaluation of the effects of variability and thereby provides a means of addressing uncertainty in the resultant heads and flows that are caused by uncertainty in the hydraulic conductivity field [10,12].Monte Carlo simulation is commonly applied in groundwater modeling, where hundreds-or even thousands-of simulations are performed using generated parameters such as hydraulic conductivity, from which the probability of occurrence of each simulated response can be computed based on statistical data [9,11].Typically, the parameters are generated by integrating random instances of parameter values from user-defined probability distribution functions or by generating a spatial distribution of parameters using such approaches as geostatistical simulation [13,14].In groundwater modeling, the flow and transport equations are then solved numerically for each simulated realization.Most authors acknowledge that the estimation of hydrological parameters from a limited number of parameters and hydraulic head measurements is an ill-posed problem that leads to non-unique solutions [15].All of these non-unique solutions satisfy a geostatistical characterization of the model domain, as encapsulated in a semivariogram, and all yield nearly identical model-generated heads at the locations of the observed hydraulic heads [15].Geostatistical and stochastic methods applied to modeling geological heterogeneity are common practice, based on the complexity of subsurface and the relative scarcity of data on media properties [3].
Geostatistical simulation, including lower upper (LU) decomposition, turning bands, sequential gaussian (SG) simulation and simulated annealing (SA) simulation, is a well-known technique for modeling the spatial uncertainty of a regionalized variable [1,2,10,11,[16][17][18] such as hydraulic conductivity.The typical geostatistical simulation is performed using a semivariogram model and a simulation algorithm to generate many realizations (conditional or unconditional) of the random function model [18].Geostatistical conditional simulation techniques can be applied to generate numerous multiple realizations of hydraulic parameters.However, the realizations likely match sample statistics, while conditioning data provide a quantitative measure of spatial uncertainty [19].The consensus is that no single algorithm is most suitable in all cases [17,20].The SA algorithm is a well-known adaptation of the Metropolis-Hastings algorithm, which was developed by Metropolis et al. [21] and subsequently generalized by Hastings [22].Basically, in geostatistics, SA simulation is one of Monte Carlo simulation techniques that perturbs or updates pixelwise an initial pixel grid over numeral iterations until the pixel values honor the given histogram and semivariogram model [23].As shown in Lin et al. [2], the SA simulation consistently simulated spatial distributions of transmissivity in their case.For this reason, the SA simulation was adopted in this study.
Although numerous recent studies have applied geostatistical simulation techniques to simulate hydrogeological properties [1][2][3]10,11,18,24], most of these simulated or estimated hydraulic conductivities of aquifers were not spatially correlated for a multi-aquifer subsurface.A set of alternative realizations is particularly useful in assessing uncertainty in the spatial distribution of attribute values [20,25].Yet, the fact that only a limited number of realizations is usually generated makes characterizing the spatial uncertainty difficult [20].However, a large number of realizations usually ensures that extreme scenarios are bounded in the response distribution [20].Accordingly, Water 2017, 9, 164 3 of 17 questions relate to the number of realizations needed to specify this space [20].Additional information can be used in conditional simulations to generate small but accurate realization sets representing hydrogeological parameters.Therefore, conditioning information, such as spatial correlations among measured hydraulic conductivity values of paired aquifers, can be considered in geostatistical simulations of horizontal spatial distributions in large multiple aquifer subsurfaces.The simulated hydraulic conductivity conditioning semivariogram, histogram and cross-semivariogram then can be used to model hydraulic heads in multi-aquifer systems when the parameters among the aquifers displayed are spatially correlated.
The generalized likelihood uncertainty estimation (GLUE) procedure [26] is an extension of Monte Carlo random sampling that incorporates the goodness-of-fit of each simulation [27], and has been widely used to quantify uncertainty based on model calibration results [28].The GLUE procedure comprehensively estimates the likelihood of all possible outcomes for a specific distribution of inputs, and practically determines behavioral parameter sets of a model [28,29].Based on the results of the GLUE procedure in assessments of multiple possibilities for generating simulations, a set of parameter sets, which lead to acceptable model realizations rather than only one "optional" calibrated parameter set, is determined.This determination refers to "equifinality" for model uncertainty analysis [30].It is noted that most relevant hydrological and groundwater studies have tended to use GLUE for their uncertainty analysis, such as Chu et al. [29] and Mirzaei et al. [31] in hydrological modeling; Hassan et al. [27], Jackson et al. [32], and Marchant et al. [33] in groundwater modeling; and Wang et al. [34] and Huang et al. [28] in wetland modeling.
The major purpose of this study was to develop an approach that uses SA simulation and spatial correlations among aquifers, obtain limited sets of realizations of hydraulic conductivity of multiple aquifers, and simulate realizations of hydraulic heads and quantify their uncertainty in a study case.We developed semivariogram and cross-semivariogram models of measured natural log hydraulic conductivity (ln(K)) for three aquifers, Aquifers 1, 2, and 3, in the Pingtung Plain of southern Taiwan.This study generated the average standard deviations for 500 sets of ln(K), statistics of ln(K) simulations, experimental semivariogams of ln(K) simulations, and GLUE analysis with threshold 0.6 to rank realizations which were great enough to capture and simulate distributions of K and groundwater levels.The fitted semivariogram model-based simulation was performed to generate 500 realizations of ln(K) for each aquifer, respectively.A rank-and-select process is proposed to consider the spatial relations between aquifer pairs.The pairs considered in this study are Aquifer 1 and 2 and the second pair Aquifer 2 and 3, hereafter referred to as Aquifers 1-2 and Aquifers 2-3 respectively.The top 100 combinations of ln(K) realizations of Aquifers 1, 2, and 3 were selected, with consideration given to the spatial relations among aquifer pairs, and used as input for MODFLOW to simulate hydraulic heads.GLUE analysis was employed to assess the global uncertainty of the simulated hydraulic heads based on the top 100 combinations.

Figure 1a
,b depicts the study process with GLUE analysis as flowcharts.In Figure 1a, GS+, SASIM (Simulated Annealing SIMulation) and GAM3 (a program of geostatistical) [13] is integrated with measured hydraulic conductivity (designated "measured K") and applied to generate 100 combinations of K which fit the spatial-correlation of measured data.In Figure 1b, the 100 combinations and one set of an interpolation of K are modeled in a well-calibrated MODFLOW simulation.The simulated head with different sets of K is then compared to demonstrate the uncertainty caused by different sets of K.

Study Area
The Pingtung Plain is the largest alluvial plain in the region in southern Taiwan.To its east are the Central Mountains of Taiwan, to the north and west are low hills of quaternary sediment, and to the south is the Taiwan Strait.The Pingtung Plain has an area of approximately 1140 km 2 , running about 60 km from north to south and 20 km from east to west (Figure 2a).Groundwater in the Plain is one important water source in southern Taiwan.Figure 2b,c shows the hydrogeological formation A'-A and B'-B, respectively.In Figure 2b,c, for example, F1 refers to Aquifer 1, T1 is Aquitard 1, and B1 is the boundary of the conceptual layer between Aquitard 1 and Aquifer 2. As shown in Figure 2b,c, the aquifer system includes Aquifer 1, Aquifer 2, and Aquifer 3, respectively.Aquifer 1, Aquifer 2, and Aquifer 3 monitoring wells are denoted by a cross, an empty square, and a black dot, respectively, in Figure 2a.The monitoring wells are part of the Taiwan government's Groundwater Monitoring Network Integrated Plan.After the monitoring wells are installed, pumping tests for achieving hydraulic conductivity (measured K) are required.The number of monitoring wells (i.e., number of measured K) are 41, 32, and 38 for Aquifer 1, Aquifer 2, and Aquifer 3, respectively.

Study Area
The Pingtung Plain is the largest alluvial plain in the region in southern Taiwan.To its east are the Central Mountains of Taiwan, to the north and west are low hills of quaternary sediment, and to the south is the Taiwan Strait.The Pingtung Plain has an area of approximately 1140 km 2 , running about 60 km from north to south and 20 km from east to west (Figure 2a).Groundwater in the Plain is one important water source in southern Taiwan.Figure 2b,c shows the hydrogeological formation A'-A and B'-B, respectively.In Figure 2b,c, for example, F1 refers to Aquifer 1, T1 is Aquitard 1, and B1 is the boundary of the conceptual layer between Aquitard 1 and Aquifer 2. As shown in Figure 2b,c, the aquifer system includes Aquifer 1, Aquifer 2, and Aquifer 3, respectively.Aquifer 1, Aquifer 2, and Aquifer 3 monitoring wells are denoted by a cross, an empty square, and a black dot, respectively, in Figure 2a.The monitoring wells are part of the Taiwan government's Groundwater Monitoring Network Integrated Plan.After the monitoring wells are installed, pumping tests for achieving hydraulic conductivity (measured K) are required.The number of monitoring wells (i.e., number of measured K) are 41, 32, and 38 for Aquifer 1, Aquifer 2, and Aquifer 3, respectively.

Study Area
The Pingtung Plain is the largest alluvial plain in the region in southern Taiwan.To its east are the Central Mountains of Taiwan, to the north and west are low hills of quaternary sediment, and to the south is the Taiwan Strait.The Pingtung Plain has an area of approximately 1140 km 2 , running about 60 km from north to south and 20 km from east to west (Figure 2a).Groundwater in the Plain is one important water source in southern Taiwan.Figure 2b,c shows the hydrogeological formation A'-A and B'-B, respectively.In Figure 2b,c

Spatial Correlations
In geostatistics, the semivariogram is widely used for quantifying the spatial relationships between sample values [2,13].In this study, the experimental semivariogram ( ) zz h γ , which is the half-mean squared difference between pairs of sample points by the lag distance h , is represented by: where ( ) zz h γ is the semivariance, h is the lag distance that separates pairs of points, ( ) Z x is the hydraulic conductivity at location x, ( ) Z x h + is the hydraulic conductivity at location x h + and ( ) N h is the number of point pairs separated by lag distance h , respectively.The spatial relationships between the two second-order stationary regionalized variables (e.g., the hydraulic conductivities in two aquifers), ( ) , which is expressed as: According to Equation (2), a cross-semivariogram between Aquifer 1 and Aquifer 2 is calculated based the natural log measured K which exists for both Aquifer 1 and Aquifer 2 at locations ( i x ).
Semivariogram modelling, which is needed for kriging and geostatistical simulation procedures, was performed on the experimental semivariograms and cross-semivariograms of measured natural log hydraulic conductivity (ln(K)).Two widely used semivariogram models, the spherical model and the Gaussian model are represented below by Equations ( 3) and ( 4), respectively, below.

Spatial Correlations
In geostatistics, the semivariogram is widely used for quantifying the spatial relationships between sample values [2,13].In this study, the experimental semivariogram γ zz (h), which is the half-mean squared difference between pairs of sample points by the lag distance h, is represented by: where γ zz (h) is the semivariance, h is the lag distance that separates pairs of points, Z(x) is the hydraulic conductivity at location x, Z(x + h) is the hydraulic conductivity at location x + h and N(h) is the number of point pairs separated by lag distance h, respectively.The spatial relationships between the two second-order stationary regionalized variables (e.g., the hydraulic conductivities in two aquifers), Z(x) and Y(x), can be characterized by the experimental cross-semivariogram γ xy (h), which is expressed as: According to Equation ( 2), a cross-semivariogram between Aquifer 1 and Aquifer 2 is calculated based the natural log measured K which exists for both Aquifer 1 and Aquifer 2 at locations (x i ).
Semivariogram modelling, which is needed for kriging and geostatistical simulation procedures, was performed on the experimental semivariograms and cross-semivariograms of measured natural log hydraulic conductivity (ln(K)).Two widely used semivariogram models, the spherical model and the Gaussian model are represented below by Equations ( 3) and (4), respectively, below.
Water 2017, 9, 164 6 of 17 In Equations ( 3) and ( 4), C is the nugget, h is the lag distance, a is the range, and S is the partial sill, respectively.In the case of the Gaussian model, the effective range A = 3 0.5 a, which is the distance at which the sill (equal to the sum of partial sill (S) and nugget (C)) is within 5% of the asymptote (the sill never meets the asymptote in the Gaussian or exponential models) [35].
The sill is the upper limit of a semivariogram that tends to level out at large distances (i.e., the range of a semivariogram model) [13].The variability of the investigated variable can be measured by a sill of the semivariogram model.A higher sill corresponds to greater variability [13].The range of a semivariogram model reveals the distance of two sites above which hydraulic conductivity becomes independent and is more likely to be random.Therefore, a greater range of the semivariogram model corresponds to better continuity of hydraulic conductivity in this study.In this study, the experimental semivariograms and cross-semivariograms models of measured ln(K) were calculated with the GAM3 program from the geostatistical software library (GSLIB) [13].The variography of the measured ln(K) experimental semivariograms and cross-semivariograms were performed by GS + software [35].

Geostatistical Simulation
The SA algorithm was used to simulate the spatial distributions of hydraulic conductivity realizations of ln(K).In this study, the modified simulated annealing simulation (SASIM) program, which is a SA routine of GSLIB [13], was used to generate ln(K) realizations.Execution steps of the SASIM program are described as follows.Firstly, the SA program assigned random values in the simulation grids from a given histogram and then swapped the pairs of data values (i.e., ln(K)) at random locations in the simulation grids.After each swap, the object function (O) in Equation ( 5), which is defined as an average squared difference between the experimental semivariogram of ln(K) and given semivariogram, was employed to measure the goodness of fit.
where γ(h) is the prespecified semivariogram, and γ * (h) is the semivariogram calculated from the simulated realizations of ln(K).To avoid trapping within the local minima of the object function O, a temperature function (i.e., a Boltzman distribution) in the SASIM program controls the speed at which the optimization function is reduced by allowing swaps that increase the optimization function [13].In the temperature function, t represents the temperature in the annealing procedure.As temperature increases, the probability that an unfavorable swap will be accepted increases [13].The temperature function used in this study is shown in Equation (6).
To obtain combinations of realizations for a multi-aquifer system which considers the semivariograms of each aquifer and cross-semivariograms of aquifer combinations simultaneously, a two-step approach is proposed.First, the geostatistical simulation SASIM program which conditionally simulates a complete 3-D field with simulated-annealing, was run in 316 cells each with 2.0 km × 2.0 km dimensions.A 2.0 km × 2.0 km cell size was suggested by Chang and Liu [36] and the grid systems of the entire study area are shown in Figure 2d.The objective function of SASIM is to minimize the squared difference between the desired semivariogram and the actual semivariogram by generating realizations that are guaranteed to fit the actual semivariogram.Previous studies [2][3][4] have demonstrated that hydrogeological parameters, such as transmissivity and hydraulic conductivity, exhibit log-normal distributions.Therefore, we applied the SASIM program to generate 500 realizations of ln(K) for Aquifer 1, Aquifer 2, and Aquifer 3, respectively.The candidate number of aquifer combinations is 500 3 (125,000,000).In order to consider the spatial relationship between aquifer pairs Aquifers 1-2 and Aquifers 2-3 for producing the combinations of simulated realizations of ln(K), the cross-semivariograms of ln(K) for Aquifers 1-2 and Aquifers 2-3 were ranked to select the appropriate ln(K) in our study area.Therefore, we calculated the differences of cross-semivariograms derived from the measured ln(K) and the simulated realizations of ln(K) for Aquifers 1-2 and Aquifers 2-3, respectively.The differences were ranked and the top 100 combinations of simulated realizations of hydraulic conductivity were selected for MODFLOW modeling.

Modular Groundwater Flow Model
The MODFLOW model was reported by McDonald and Harbaugh [37].The MODFLOW model is a physical finite-difference numerical flow model developed by the U.S. Geological Survey (USGS) to solve three-dimensional partial-differential equations for ground-water flow through a porous medium using a finite-difference method [38,39].Before running MODFLOW, the study area should be discretized into cells for which parameters are assigned to simulate confined or unconfined flows and saturated flows in one, two, or three dimensions.The groundwater flow process solves the partial differential equation using a finite-difference method in which the groundwater flow system is divided into a grid whereby the hydraulic head is calculated.The details of MODFLOW can be found in the MODFLOW and MODFLOW-2000 guides edited by McDonald and Harbaugh [38] and Harbaugh et al. [39], respectively.
In this study, the Pingtung Plain was divided into 316 cells, each with a cell size of 2.0 × 2.0 km (Figure 2d).The top ranked 100 combinations of simulated realizations of hydraulic conductivity were input for MODFLOW to simulate steady-state hydraulic heads.Because one of the aims of this study is to develop an approach to simulate spatial distributions of hydraulic conductivity honoring measured heads, the steady-state groundwater model is sufficient.Moreover, no flow boundary conditions of the study plain were imposed along the eastern, western, and northern boundaries of the Pingtung Plain, and a constant head equal to mean sea level datum was specified along the southern domain to simulate the coastal boundary (Figure 2d).Only the main river, Kaoping River, and its upstream branch, including the Chishan River, Laonung River, and Ailiao River, are considered in the groundwater model.These river sections are connected to the top aquifer (Aquifer 1), though the model considered the sink or source terms (type II) instead of the interaction between rivers and the top aquifer.The quantities for river exchange (between three major rivers), recharge (plain area), and pumpage (from urban and suburban areas) were determined in an inverse procedure, and have been calibrated.

Generalised Likelihood Uncertainty Estimation (GLUE)
The performance of the top ranked 100 combinations of simulated realizations of hydraulic conductivity for hydraulic head simulation to simulate the hydraulic heads was evaluated by a likelihood function of the Nash-Sutcliffe efficiency (NSE) index (Equation ( 7)), which is a normalized measure (−Inf to 1.0) that calculates the relative measure of the residual variance compared to the measured data variance [28,40,41].The likelihood function is calculated as follows where the L(θ i |Y) is the likelihood measure for each parameter set θ, N is the total number of measured heads, Y i is the measured head of the ith monitoring well, Y is the average of measured heads of the monitoring wells and Ŷi is the corresponding simulated head of MODFLOW output.
If the simulated head MODFLOW predicts the measured head perfectly, then Y i will equal to Ŷi , which implies that the square of the differences between the model simulations and the measured heads is 0 and the NSE index equals 1.A NSE value of 0 suggests that the model performs only as well as the use of the mean measuring heads Y in prediction.A NSE index less than 0 indicates that the residual variance is larger than the measured heads variance, which implies that the mean of measured heads is a better predictor than the model.Therefore, we prefer the NSE values to be larger than 0.0 and approaching 1.0.In their ground water studies, Jackson et al. [32] and Marchant et al. [33] selected an NSE value of 0.5 to represent the threshold of determination of non-behavioral simulations and behavioral simulations.Ritter and Muñoz-Carpena [42] determined an NSE threshold for performance evaluation of hydrological models at 0.65.In this study, the GLUE analysis with an NSE index threshold larger than 0.6 was used to assess the uncertainty and behavioral parameter sets on the selected realizations of the hydraulic conductivities associated with the simulated hydraulic heads.

Semivariogram and Cross-Semivariogram Analyses
Three experimental semivariograms of measured hydraulic conductivity ln(K) for Aquifer 1, Aquifer 2, and Aquifer 3, respectively, and three cross-semivariograms of measured ln(K) between aquifers pairs Aquifers 1-2, Aquifers 2-3, and Aquifers 1-3, were computed by the geostatistical software GS+.In order to perform the semivariogram modelling on the three experimental semivariograms and three cross-semivariograms, a relatively consistent set of best-fit models was generated using the least-squares fitting technique with minimum reduced sum of squares (RSS), and maximum R 2 in GS+.The theoretical models used for fitting the three semivariograms of measured ln(K) and the three cross-semivariograms of measured ln(K) were the spherical and Gaussian models, respectively.
Table 1 lists the parameters of fitted spherical models and Gaussian models for the corresponding semivariograms and cross-semivariograms.A trial and error procedure is applied to decide the model type of semivariograms and cross-semivariograms. Semivariograms for Aquifer 1, 2, and 3 can be fitted better with spherical models and cross-semivariograms for Aquifers 1-2 and Aquifers 2-3 can be fitted better with Gaussian models.It is noted that the cross-semivariogram of Aquifers 1-3 can only be fitted with a pure nugget model (not shown in Table 1) which indicates no spatial correlation of ln(K) between Aquifer 1 and Aquifer 3. The small-scale variations can be represented as the ratio of the nugget effect to the sill, which is the sum of the variogram models' partial sill and nugget variance.As observed in Table 1, the small-scale variations (nugget effects) of the fitted spherical model for Aquifer 1, Aquifer 2, and Aquifer 3 are 29.4% (0.990/3.370), 24.3% (0.350/1.440) and 23.9% (=0.390/1.630),respectively.The small-scale variations of the fitted Gaussian model for Aquifers 1-2 and Aquifers 2-3 are 0.3% (0.001/0.392) and 0.4% (0.001/0.261), respectively.Ranges of the fitted spherical model for Aquifer 1, Aquifer 2, and Aquifer 3 are 32,110.0m, 32,130.0m and 10,280.0m, respectively.Ranges of the fitted Gaussian model for Aquifers 1-2 and Aquifers 2-3 are 5850.0m and 5050.0 m, respectively.The comparison of ranges of the fitted spherical and Gaussian models indicates that the ln(K) values at monitoring wells exhibit greater spatial continuity in the single aquifer than in the pairs of aquifers.These spatial correlations are explained by the characteristic horizontal distribution of hydraulic conductivity in a hydrogeological range and those among aquifers in the study plain-particularly between Aquifers 1-2.
Water 2017, 9, 164 9 of 17 The cross-semivariogram results also show that hydraulic conductivities of aquifers are spatially correlated and that such correlations can be treated as additional information in the estimation and simulation of spatial distributions of hydraulic conductivity and groundwater in a hydrogeological region when spatial correlations of ln(K) among aquifers is exhibited.Moreover, small-scale discontinuity, heterogeneity, and spatial variability of ln(K) also cause poor values of R 2 and high values of RSS when fitting semivariogram models of ln(K) in this study.

Conditional Geostatistical Simulations
We applied the SASIM program in GSLIB to generate 500 sets of ln(K) realizations for Aquifer 1, Aquifer 2, and Aquifer 3 in the study area.The combinations of simulated realizations of ln(K) for the three aquifers were ranked with the consideration of the spatial relationships between pairs of aquifers.Figure 3 shows the impact of the number of the realizations on the average standard deviations of the top 500 realizations.As observed in Figure 3, the average standard deviation of hydraulic conductivity in Aquifer 1, Aquifer 2, and Aquifer 3 reaches a plateau when the realization numbers approach 500.We noted that the average standard deviation gradually became stable by realization number 100.Thus, the top 100 combinations of simulated realizations of hydraulic conductivity for Aquifer 1, Aquifer 2, and Aquifer 3 selected for MODFLOW modeling.
hydrogeological region when spatial correlations of ln(K) among aquifers is exhibited.Moreover, small-scale discontinuity, heterogeneity, and spatial variability of ln(K) also cause poor values of R 2 and high values of RSS when fitting semivariogram models of ln(K) in this study.

Conditional Geostatistical Simulations
We applied the SASIM program in GSLIB to generate 500 sets of ln(K) realizations for Aquifer 1, Aquifer 2, and Aquifer 3 in the study area.The combinations of simulated realizations of ln(K) for the three aquifers were ranked with the consideration of the spatial relationships between pairs of aquifers.Figure 3 shows the impact of the number of the realizations on the average standard deviations of the top 500 realizations.As observed in Figure 3, the average standard deviation of hydraulic conductivity in Aquifer 1, Aquifer 2, and Aquifer 3 reaches a plateau when the realization numbers approach 500.We noted that the average standard deviation gradually became stable by realization number 100.Thus, the top 100 combinations of simulated realizations of hydraulic conductivity for Aquifer 1, Aquifer 2, and Aquifer 3 selected for MODFLOW modeling.The kriged estimates of measured ln(K) of Aquifer 1, Aquifer 2, and Aquifer 3 were produced using kriging with a polynomial trend three-dimensional model (KT3D) program in GSLIB and processed using MODFLOW.Table 2 presents the descriptive statistics of ordinary kriging and SA results for ln(K).Compared with measured ln(K), the statistics shown in Table 2 reveal that the realizations of ln(K) simulated by SA capture the statistical characteristics of ln(K) data for Aquifer 1, Aquifer 2 and Aquifer 3.This suggests that the selected top three sets of realizations are able to characterize the spatial variability and heterogeneous properties of the hydraulic conductivity.The kriged estimates of measured ln(K) of Aquifer 1, Aquifer 2, and Aquifer 3 were produced using kriging with a polynomial trend three-dimensional model (KT3D) program in GSLIB and processed using MODFLOW.Table 2 presents the descriptive statistics of ordinary kriging and SA results for ln(K).Compared with measured ln(K), the statistics shown in Table 2 reveal that the realizations of ln(K) simulated by SA capture the statistical characteristics of ln(K) data for Aquifer 1, Aquifer 2 and Aquifer 3.This suggests that the selected top three sets of realizations are able to characterize the spatial variability and heterogeneous properties of the hydraulic conductivity.

Comparison of Simulated Heads Derived from Spatial Distributions of Ln(K) of Kriged Estimate and of the Selected Top Three Sets of Realizations
Figures 5-7 show the simulated heads derived from kriged estimates of ln(K) and from the selected top three sets of realizations of ln(K) for Aquifer 1, Aquifer 2, and Aquifer 3, respectively.The contours and the grey cells in the Figures 5-7 represent the simulated heads and ln(K), respectively.As shown in Table 2, the standard deviation of kriged estimate of ln(K) is much smaller than that of measured ln(K) and simulated ln(K), which can be observed by the smoother surfaces in Figures 5a, 6a and 7a.The comparisons of semivariograms are based on the analysis of 500 realizations each for Aquifers 1, 2, and 3.For cross-semivariogram cases, the number of possible combinations is 500 3 .A rank-and-select process is also applied to eliminate the worst combinations and to reduce the number to 100.Comparing measured semivariograms with the top three ranked simulated semivariograms for each aquifer, the sums of absolute differences between measured and simulated experimental semivariograms divided by the experimental semivariogram of measured data of hydraulic conductivity were all <0.143 for Aquifer 1, <0.147 for Aquifer 2, and <0.073 for Aquifer 3. The mean absolute relative error (MARE) between measured and realizations of experimental cross-semivariograms of hydraulic conductivity were all <0.115 for Aquifers 1-2, and <0.117 for Aquifers 2-3.The SASIM-generated realizations and the top 100 combinations are well-fitted to the Water 2017, 9, 164 11 of 17 measured semivariograms and cross-semivariograms, which can represent the characteristic of spatial statistic provided by measured data.On the other hand, estimations using ordinary kriging did not capture the semivariograms of the measured K values.The semivariogram and statistical analyses of estimates and simulations show that the realizations of ln(K) produced by the SASIM and the proposed two-step approach generated results that were consistent with spatial correlations and global statistics of ln(K) measurements.-7 show the simulated heads derived from kriged estimates of ln(K) and from the selected top three sets of realizations of ln(K) for Aquifer 1, Aquifer 2, and Aquifer 3, respectively.The contours and the grey cells in the Figures 5-7 represent the simulated heads and ln(K), respectively.As shown in Table 2, the standard deviation of kriged estimate of ln(K) is much smaller than that of measured ln(K) and simulated ln(K), which can be observed by the smoother surfaces in Figures 5a, 6a  and 7a.

Comparison of Simulated Heads Derived from Spatial Distributions of Ln(K) of Kriged Estimate and of the Selected Top Three Sets of Realizations
Figures 5-7 show the simulated heads derived from kriged estimates of ln(K) and from the selected top three sets of realizations of ln(K) for Aquifer 1, Aquifer 2, and Aquifer 3, respectively.The contours and the grey cells in the Figures 5-7 represent the simulated heads and ln(K), respectively.As shown in Table 2, the standard deviation of kriged estimate of ln(K) is much smaller than that of measured ln(K) and simulated ln(K), which can be observed by the smoother surfaces in Figures 5a, 6a and 7a.

Uncertainty Analysis of Spatial Distributions of ln(K) in Multiple Aquifers
For ln(K), 93 of 100 combinations were used for the uncertainty analysis because 7 of 100 ln(K) failed to adequately simulate heads.The calculated NSE index for the three aquifers, Aquifer 1, Aquifer 2, and Aquifer 3 are demonstrated in Figure 8. Table 3 lists the number of non-behavioral parameter sets for the three aquifers, Aquifer 1, Aquifer 2, and Aquifer 3. As observed in Figure 8 and Table 3, the number of non-behavioral parameter sets for NSE index less than 0.60, 0.65, and 0.7 are less than 7 for the three aquifers Aquifer 1, Aquifer 2, respectively.However, we noted that the number of non-behavioral parameter sets for NSE index less than 0.65 and 0.7 exceeds 10 sets for Aquifer 3. The above results suggest that MODFLOW consistently simulated hydraulic heads based on the top 100 ln(K) combinations.The NSE values mostly ranged from 0.75 to 0.95 in the calibration for groundwater levels in Aquifers 1, 2, and 3 except Aquifer 3 wherein the semivariogram presented a pure nugget.

Uncertainty Analysis of Spatial Distributions of in Multiple Aquifers
For ln(K), 93 of 100 combinations were used for the uncertainty analysis because 7 of 100 ln(K) failed to adequately simulate heads.The calculated NSE index for the three aquifers, Aquifer 1, Aquifer 2, and Aquifer 3 are demonstrated in Figure 8. Table 3 lists the number of non-behavioral parameter sets for the three aquifers, Aquifer 1, Aquifer 2, and Aquifer 3. As observed in Figure 8 and Table 3, the number of non-behavioral parameter sets for NSE index less than 0.60, 0.65, and 0.7 are less than 7 for the three aquifers Aquifer 1, Aquifer 2, respectively.However, we noted that the number of non-behavioral parameter sets for NSE index less than 0.65 and 0.

Semivariogram and Cross-Semivariogram Analyses
This study took the log-normal value of K as previous studies [3][4][5]43] have demonstrated that hydrogeological parameters (i.e., hydraulic conductivity) exhibit log-normal distributions.The variability of ln(K) was determined using geostatistical measurements of relevant values, such as mean, variance (sill), and range [44].A good fit between a semivariogram model and a field-sampled experimental semivariogram typically means that the model is a good representation of a field system and that geostatistical approaches are valid [3].The semivariogram range of ln(K) in Aquifer 3 was approximately 0.32 times that of the ranges in Aquifer 1 and Aquifer 2. As indicated by Sampler [45], the field test errors and small-scale variability could possibly cause nugget effects.The results of the semivariogram analysis revealed that the distributions of hydraulic conductivity in Aquifer 1 and Aquifer 2 are more spatially continuous than in Aquifer 3 (e.g., greater ranges of semivariograms).Additionally, the distribution of hydraulic conductivity in Aquifer 3 is more heterogeneous and complex than those in Aquifer 1 and Aquifer 2. This spatial continuity and heterogeneity of hydraulic conductivity is confirmed by the hydrogeological formation characteristics in each aquifer, as shown in Figure 2b,c.The variography results are similar to the conclusions of Simo et al. [4] namely, nugget effects are probably related to strong local variations at a short scale in the heterogeneous formations.Differing from the approach of Simo et al. [4] who used thickness-weighting, the proposed approach is suitable in cases lacking three-dimensional data sets, and cases where spatial correlation (or high heterogeneity) in vertical hydraulic conductivities to simulate hydraulic conductivities is lacking.The variography results indicated that small sets of ln(K) realizations were successfully obtained from large sets of realizations by ranking the differences cross-variograms derived from the measured ln(K) and the simulated ln(K) realizations for Aquifers 1-2 and Aquifers 2-3.However, it should be noted that when using our proposed approach, cross-semivariogram calculation problems arise if samples taken from paired aquifer locations do not correspond with each other.

Conditional Geostatistical Simulations
Much of the research on stochastic groundwater hydrology has focused on evaluating the effects of variability of hydraulic conductivity on dependent variables such as hydraulic head [8].Physical-based numerical groundwater flow models such as MODFLOW are frequently applied to refine hydrogeological characterizations and make informed groundwater management decisions [9].This study did not pursue a single, slickly varying, expected value of hydraulic conductivity that honors both hydraulic conductivity and hydraulic head measurements.Rather, it sought sets of equally likely realizations of hydraulic conductivity, all realizations of which are plausible representations of reality as they are conditional upon available data and exhibit the same horizontal variability aquifers as observed in the field.Stochastic methods are popular techniques for estimating, simulating and delineating representative heterogeneous field distributions of hydraulic conductivity with a limited number of groundwater samples [9].If hydraulic conductivity is stochastically represented, then the governing equations of physical numerical groundwater flow models become stochastic partial differential equations [6].All realizations of hydraulic conductivity are processed through a groundwater flow and transport simulator [6].In this study, unlike the studies of Bianchi et al. [10] and Ko et al. [5], SA algorithm conditionally generated multiple realizations of ln(K) for all aquifers and provided probable combinations of simulated ln(K) for all aquifers where ln(K) values were spatially correlated.The results indicated that the conditioned simulated hydraulic conductivity of multiple aquifers honored the spatial correlations (semivariograms) of measured hydraulic conductivity, and honored the spatial relations among the hydraulic conductivities of aquifers as additional information used for simulating the hydraulic conductivity in multiple aquifers.

Assessment of Ranking Realizations and Uncertainty of Groundwater Modeling
In this study, GLUE was used to assess the results of a stochastic groundwater flow model similar to the study of Hassan et al. [27].Differing from the studies of Hassan et al. [27] and Jackson et al. [32], this study's GLUE results with a NSE of 0.6 (>0.5 in Jackson et al. [32]; >0.65 in Ritter and Muñoz-Carpena [42]) indicated that most of the top 100 realizations based on the above simulation approach, which were selected by spatial relations among the hydraulic conductivities of aquifers, were behavioral parameter sets.The results highlight the importance of the spatial relations among the hydraulic conductivities of aquifers in the simulated hydraulic conductivities ln(K).When the values of the NSE increased from 0.6 to 0.7, the number of non-behavioral parameter sets increased slightly, except for Aquifer 3 wherein the semivariogram presented a pure nugget which is evidence of a highly heterogeneous physical formation at Aquifer 3. The use of GLUE with our proposed simulation approach in assessing and conditioning groundwater modeling results and uncertainties is suitable, however, GLUE approaches have been criticized and evaluated for not being formally Bayesian [27,[31][32][33]46].
In stochastic modeling, flow and transport equations are solved numerically for each realization.The practical number of conditional realizations that is required to characterize true field conditions varies according to the specific conditions of the problem [24].Goovaerts [25] used a set of 50 alternative realizations generated by SA algorithm to evaluate a measure of uncertainty in the spatial distribution of permeability.Goovaerts [20] stated that using more than 20 of 100 realizations generally only slightly increased the size of the spatial uncertainty.Recently, Warner et al. [24] applied turning bands and kriging methods to generate 30 realizations to represent uncertainty in hydraulic conductivity in their case.Ko et al. [5] indicated that when the number of realizations was greater than 30, their groundwater simulation means and standard deviations were similar.The common approach is to generate many realizations which are then processed using a transfer function, and the responses are then used to rank realizations of attribute values from the most to the least optimistic [20].The GLUE analysis results indicated that top 100 combinations using the proposed approach consistently represented hydraulic conductivities and groundwater models in the study area that are similar to the results of the above studies.The proposed approach could minimize the number of groundwater simulations needed to successfully achieve behavioral simulations when reduced between calibration and evaluation runs.However, Ritter and Muñoz-Carpena [42] indicated that the NSE of "very good" or "good" model efficiencies are 0.75 and 0.65, respectively.The number of combinations that presented behavioral parameter sets in the study case was 72 when the NSE value was greater than 0.70.Further studies should assess thresholds of NSE values for GLUE analysis.

Conclusions
This study presented a framework that integrates geostatistical simulation with conditioning spatial correlations of hydrogeological parameters among aquifers, as well as a physical groundwater numerical model to generate spatial distributions of hydraulic conductivities and groundwater for aquifers in the Pingtung Plain, Taiwan.The proposed approach provides 500 sets of equally likely realizations of hydraulic conductivity, all of which are plausible representations of reality as they are conditional on available data, and present the same horizontal variability and variability among aquifers, as observed in the field.The approach successfully obtained 100 sets of hydraulic conductivity based on spatial statistics, and physically simulated hydraulic heads based on simulated hydraulic conductivity in a way that honored the measured head data.The GLUE results indicated that MODFLOW consistently simulated hydraulic heads based on the top 100 ln(K) combinations using the proposed simulation approach, which implies that the small set of realizations of hydraulic conductivity should be considered in future groundwater modeling and uncertainty analysis.The proposed approach could minimize the number of simulations, and effectively achieve behavioral simulations when reduced between calibration and evaluation runs.Additionally, the proposed approach is suitable for cases lacking three-dimensional data sets, and cases lacking spatial correlation (i.e., with high heterogeneity) in vertical hydraulic conductivities so as to simulate hydraulic conductivities and groundwater levels.For future studies, it is essential to develop an integrated process or software program based on our proposed approach, a more applicable and original option than the current procedure.Moreover, improved validation techniques are needed.
, for example, F1 refers to Aquifer 1, T1 is Aquitard 1, and B1 is the boundary of the conceptual layer between Aquitard 1 and Aquifer 2. As shown in Figure 2b,c, the aquifer system includes Aquifer 1, Aquifer 2, and Aquifer 3, respectively.Aquifer 1, Aquifer 2, and Aquifer 3 monitoring wells are denoted by a cross, an empty square, and a black dot, respectively, in Figure 2a.The monitoring wells are part of the Taiwan government's Groundwater Monitoring Network Integrated Plan.After the monitoring wells are installed, pumping tests for achieving hydraulic conductivity (measured K) are required.The number of monitoring wells (i.e., number of measured K) are 41, 32, and 38 for Aquifer 1, Aquifer 2, and Aquifer 3, respectively.

Figure 2 .
Figure 2. (a) Locations of groundwater monitoring wells in the study area; (b) hydrogeological formation A'-A; (c) hydrogeological formation B'-B; (d) grid systems and boundary conditions.Note: "F1" refers to Aquifer 1, and "B1" is the conceptual layer boundary between Aquifer 1 and Aquifer 2.

Figure
Figure 4a-c show the experimental semivariograms of measured, kriged, and simulated values of ln(K) for Aquifer 1, Aquifer 2, and Aquifer 3, respectively.Figure 4d,e demonstrates the cross-semivariograms of measured, kriged, and simulated values of ln(K) for Aquifers 1-2 and Aquifers 2-3. Figure 4f shows the cross-semivariograms of measured, kriged, and simulated values of ln(K) between Aquifer 1 and Aquifer 3, and shows no spatial relation between Aquifer 1 and Aquifer 3. As observed in Figure 4, the experimental semivariograms and cross-semivariograms of measured ln(K) are consistent with the experimental semivariograms and cross-semivariograms of the top three ranked simulated ln(K)s for Aquifer 1, Aquifer 2, and Aquifer 3, respectively.Water 2017, 9, 164 11 of 17

3. 3 .
Figures5-7show the simulated heads derived from kriged estimates of ln(K) and from the selected top three sets of realizations of ln(K) for Aquifer 1, Aquifer 2, and Aquifer 3, respectively.The contours and the grey cells in the Figures5-7represent the simulated heads and ln(K), respectively.As shown in Table2, the standard deviation of kriged estimate of ln(K) is much smaller than that of measured ln(K) and simulated ln(K), which can be observed by the smoother surfaces in Figures5a, 6aand 7a.

Figure 5 .
Figure 5. Spatial distributions ln(K) and simulated heads of (a) kriged estimate; (b) rank1 of simulation; (c) rank2 of simulation; (d) rank3 of simulation based on measured heads in Aquifer 1. Note: unit of hydraulic conductivity (K) is m/day.

Figure 6 .
Figure 6.Spatial distributions ln(K) and simulated heads of (a) kriged estimate; (b) rank1 of simulation; (c) rank2 of simulation; (d) rank3 of simulation based on measured heads in Aquifer 2. Note: unit of hydraulic conductivity (K) is m/day.

Figure 6 .
Figure 6.Spatial distributions ln(K) and simulated heads of (a) kriged estimate; (b) rank1 of simulation; (c) rank2 of simulation; (d) rank3 of simulation based on measured heads in Aquifer 2. Note: unit of hydraulic conductivity (K) is m/day.

Figure 7 .
Figure 7. Spatial distributions ln(K) and simulated heads of (a) kriged estimate; (b) rank1 of simulation; (c) rank2 of simulation; (d) rank3 of simulation based on measured heads in Aquifer 3. Note: unit of hydraulic conductivity (K) is m/day.

Figure 7 .
Figure 7. Spatial distributions ln(K) and simulated heads of (a) kriged estimate; (b) rank1 of simulation; (c) rank2 of simulation; (d) rank3 of simulation based on measured heads in Aquifer 3. Note: unit of hydraulic conductivity (K) is m/day.

Table 3 .
The number of non-behavioral parameter sets for the three aquifers, Aquifer 1, Aquifer 2, and Aquifer 3.NSE Index Aquifers 1,

Table 3 .
The number of non-behavioral parameter sets for the three aquifers, Aquifer 1, Aquifer 2, and Aquifer 3.