Surrogate Model Application to the Identification of Optimal Groundwater Exploitation Scheme Based on Regression Kriging Method—A Case Study of Western Jilin Province

This paper introduces a surrogate model to identify an optimal exploitation scheme, while the western Jilin province was selected as the study area. A numerical simulation model of groundwater flow was established first, and four exploitation wells were set in the Tongyu county and Qian Gorlos county respectively so as to supply water to Daan county. Second, the Latin Hypercube Sampling (LHS) method was used to collect data in the feasible region for input variables. A surrogate model of the numerical simulation model of groundwater flow was developed using the regression kriging method. An optimization model was established to search an optimal groundwater exploitation scheme using the minimum average drawdown of groundwater table and the minimum cost of groundwater exploitation as multi-objective functions. Finally, the surrogate model was invoked by the optimization model in the process of solving the optimization problem. Results show that the relative error and root mean square error of the groundwater table drawdown between the simulation model and the surrogate model for 10 validation samples are both lower than 5%, which is a high approximation accuracy. The contrast between the surrogate-based simulation optimization model and the conventional simulation optimization model for solving the same optimization problem, shows the former only needs 5.5 hours, and the latter needs 25 days. The above results indicate that the surrogate model developed in this study could not only considerably reduce the computational burden of the simulation optimization process, but also maintain high computational accuracy. This can thus provide an effective method for identifying an optimal groundwater exploitation scheme quickly and accurately.


Introduction
Simulation and optimization is the main technical method for mechanism analysis, simulation prediction and scheme optimization of actual groundwater systems at present [1][2][3]. In general, the simulation model is used to simulate the actual groundwater system using a numerical method, which can not only express the intrinsic regularity of the actual groundwater system, but also describes the principles and regularities of physics, chemistry and biology followed by the actual groundwater system. As a replica of the actual groundwater system, it can also depict the response relationship between input and output. In a word, the simulation model is applied to predict the results of changes of natural conditions and human activities [4,5].
The simulation model can solve prediction problems in a given decision-making input, but it cannot indicate to us which decision-making input is able to obtain optimal response. Fortunately, the optimization model (operational research model) can solve the optimization problem, by which the optimal decision-making scheme should be obtained by optimizing the decision-making input scheme under the given objectives and constraints.
It should be noted that the optimization model must be based on the simulation model so as to ensure that the optimization process is followed by the intrinsic principle and regularity of the actual groundwater system (presented as simulation model). Therefore, the simulation model needs to be embedded in the optimization model by certain methods to make it become a part of the optimization model. Embedding method, response matrix method and state transition equation method are three methods commonly used to deal with how to embed and invoke the simulation model in the optimization model. However, all of them have their limitations. For example, the embedding method is used to solve small-scale problems generally, but problems of dimension disaster may be produced when using it to solve large-scale problems; the response matrix method is just used to solve linear problems. Comparing these two methods, the computational burden could be reduced to some extent when using the state transition equation method to solve large-scale, multi-period and nonlinear problems, but it still cost a great deal of time to solve the actual problems. Consequently, it is inappropriate for us to use these three methods to solve the large-scale, multi-period and nonlinear groundwater problems [6].
In recent years, some scholars have proposed a surrogate model of the simulation model, where the surrogate model is invoked by the optimization model directly in the process of the iterative solution of the optimization model, which not only solves large-scale nonlinear groundwater problems, but also reduces the huge computational burden and maintains a high computational accuracy. The frequently used surrogate models include the BP neural network model, the RBF neural network model, the regression kriging model, the support vector machine model and so on [7][8][9][10][11], which have been proved to substitute the simulation model. However, researches using the surrogate model of the simulation model to solve the groundwater flow optimization problem in reality are sparse, and both the high computational accuracy and low computational burden need to be verified in the process of the surrogate model invoked by the optimization model. The interpolation results of the regression kriging method have been proved to be very effective because it is unbiased and has minimum estimation variance [12,13], thus the regression kriging method is generally selected in the surrogate model of the simulation model [14][15][16][17][18].
Groundwater is one of the most important water resources for domestic water and agricultural water in western Jilin province, and exploiting groundwater excessively could cause geological environmental disasters including land collapse, soil salinization, desertification and so on. Therefore groundwater should be exploited in a reasonable way in life, which requires the search for an optimal exploitation scheme with consideration of allowable groundwater withdrawal and economic benefits.
This paper selected the western Jilin province as the study area and first established a numerical simulation model of groundwater flow in order to reduce the continuous decline of the groundwater table of Daan county in western Jilin province in recent years. Then four exploitation wells were set in the Tongyu county and Qian Gorlos county respectively so as to supply water to Daan county. Utilizing the LHS method in the above eight exploitation wells to extract 40 groups of exploitation schemes, they were then introduced into a numerical simulation model of groundwater flow so as to obtain a drawdown dataset of the groundwater table. A surrogate model of the numerical simulation model of the groundwater flow was established by the regression kriging method using the above exploitation schemes and drawdown dataset of the groundwater table. Using the LHS method again 10 groups of exploitation schemes were obtained which were introduced into the groundwater flow numerical simulation model and surrogate model simultaneously so as to verify the computational accuracy of the surrogate model. In view of geological environmental disasters on continuous decline of the groundwater table and the difference of the groundwater exploitation costs of the eight exploitation wells, an optimization model was established to search an optimal groundwater exploitation scheme using the minimum drawdown of groundwater table and minimum cost of groundwater exploitation as multi-objective functions. The optimal exploitation scheme was achieved in the process of the surrogate model invoked by the optimization model.

Study Area
The western Jilin province is located in the southwest of Songnen Plain where it is situated in the transitional zone from semi-humid to semi-arid area. The geographic coordinate lies between 123°09′~124°22′ east longitude and 44°57′~45°46′ north latitude. The whole region is mainly affected by the inland climate of Inner Mongolia and has typical features of the continental climate. The annual average air temperature is 4.6 °C, and the temperature in the southwest is higher, while the north and east are relatively lower. The average annual precipitation in this region is about 400-500 mm, and its temporal-spatial distribution is extremely uneven due to the influences of geographical location and topography.
The study area is a huge aquifer system which has multiple aquifers, including pore unconfined aquifers and pore confined aquifers (shallow and middle-deep), pore-fracture aquifers of Daan and Taikang formations of Neogene (deep), fracture-pore aquifers of lower and upper cretaceous (deep). The recharge sources of groundwater include precipitation infiltration, river leakage, irrigation, infiltration, and lateral groundwater runoff, in which precipitation infiltration is in the dominant position. The discharge of groundwater includes groundwater evaporation, discharge of river, lateral groundwater runoff and artificial exploitation, in which the groundwater evaporation and artificial exploitation are in the dominant position.

Latin Hypercube Sampling Method
LHS is a kind of homogeneous stratified sampling method, developed from the monte-carlo method [19,20]. The basic principle of this method is to divide the whole sample space into several subintervals, and choose randomly a sample in these subintervals. In this way, the sampling results can cover the entire sample space and will be more representative [21][22][23][24]. The detailed sampling process is described as follows: Suppose the dimension of random variable is κ, , x are the lower and upper limits of th i variable respectively. Then the process of stratified sampling for a multi-dimensional random variable is described as follows [25][26][27][28]: (1) Determining the sampling scale of random variable (N).
(2) Dividing each variable into N equiprobable intervals, and the probability of each interval is 1/N.

Regression Kriging Method
The kriging method is a geostatistics technique which has many different types [29][30][31], and each type has its own special features. The regression kriging method is a type of kriging method, which was first introduced as a surrogate model by Sacks [32][33]. Many researchers now use the regression kriging method to establish the surrogate model.
The form of regression kriging model is [34][35][36][37] is the term of deterministic functions, βj refers to the coefficients of the deterministic function, and fj(x) is a known regression function of p-order, which is usually a polynomial function of 0-order, 1-order, or 2-order. 0-order: z (x)is a stochastic process with zero-mean, variance σ 2 , and covariance where R (xi, xj)is the correlation function, depending only on the distance vector of xi and xj, not on their locations. The common types of correlation functions are as follow [38]: Cubic-spline function: ( ) where ξk are the unknown parameters, k i x and k j x are the k th component of sample points x . The Gauss function had been proved feasible in many researches [15,39], thus it was selected as a correlation function in this paper.
The Equation (10) is a general form of regression kriging model, which is established by the following steps: (1) r, the correlation matrix between m samples and prediction points x, and R, the correlation matrix between m samples, are calculated by Equation (8).
(2) f (X), referring to the known regression functions of p-order, is calculated through equation 5.
(3) β * is the estimated value of β, which is obtained by the generalized least-squares method.
( ) (4) The estimated value of σ 2 is obtained by the following equation.
(5) The parameter θ k is obtained when the following equation achieves its maximum value, and this method is named as the maximum likelihood estimation method. The basic idea of this method (Maximum Likelihood, ML) is that the most reasonable parameter estimator is determined when extracting an n group sample observation value from the sample population of the model randomly and making the n group sample observation value selected from the overall model have a maximum probability.

Numerical Simulation of Groundwater Flow
The aimed for aquifer located in western Jilin Province is a pore aquifer which is composed of unconfined aquifer and confined aquifer. In the middle of these two aquifers, there is a weakly permeable clayey soil layer.
The top of the simulation area is the unconfined aquifer's upper boundary where such actives pertaining to water exchange mainly occur as precipitation infiltration, irrigation leakage, evaporation, artificial exploitation, etc. The bottom boundary of the simulation area is the floor of the confined aquifer which is a clayey soil layer and almost has no water exchange. The lateral boundary is generalized based on the boundary of unconfined aquifer (Figure 1), because the unconfined aquifer is found relatively thicker. The groundwater flow system of the simulation area can be generalized as non-homogeneous, isotropic, and two-dimensional unsteady flow system, which can be shown as follows [41,42]: where H(x, y, t) is the groundwater table (m) , H0(x, y) is the initial water table (m), Zb is the elevation of aimed for aquifer floor (m), k is the hydraulic conductivity (m·d −1 ), μ is the specific yield (dimensionless), W is the vertical recharge, discharge strength of unconfined aquifer (m·d −1 ), 1 Γ is the boundary of Dirichlet condition, 2 Γ is the boundary of Newman condition, q (x, y, t) is the recharge and discharge quantity of aquifer per unit width (m·d −1 ), n  is the direction of outward normal on the boundary, D is the area for simulation computation. The groundwater flow direction and parameters partitions of the study area are shown in Figure 2, in which the study area is divided into 13 subareas, and the parameters values of study subareas are in Table 1.  Hydraulic conductivity (m/d)  14  135  27  17  20  26  9  11  12  13  28  44  15 Specific yield 0.09 0.23 0.10 0.12 0.18 0.08 0.08 0.10 0.11 0.08 0.09 0.10 0.15 Specific storage (m −1 ) 0.008 0.008 0.009 0.008 0.008 0.008 0.009 0.008 0.008 0.008 0.007 0.008 0.008 The Groundwater Modeling System (GMS) is made of several modular (MODFLOW, FEMWATER, MT3DMS, RT3D and so on) designed by Environmental Model Laboratory of Brigham Yong University and Test Station of America Army Drainage Engineering. It was used to model groundwater flow and groundwater quality widely [43,44]. MODFLOW modular of GMS (version 9.2.2) software is used to solve the numerical simulation model of groundwater flow, and the algorithm of MODFLOW is a finite difference method [45][46][47].

Genetic Algorithm
The genetic algorithm (GA) is a computational model on the basis of Darwin's biological evolution theory genetic mechanism, used to search for the optimal solution by simulating natural evolution. It includes three genetic operators of selection, crossover and mutation [48][49][50]. A flowchart for solving a general problem through the genetic algorithm is shown in Figure 3.

Numerical Simulation of Groundwater Flow
The calibration phase of simulation was selected in the dry season for 181 days from October 1, 2006 to March 31, 2007, taking into consideration that less source and sink are beneficial to identify hydrogeology parameters. The verification phase was selected in the wet season for 182 days from

Surrogate Model of Numerical Simulation Model of Groundwater Flow
The four exploitation wells were set in the Tongyu county and Qian Gorlos county respectively so as to supply water to Daan county shown in Figure 8. The Latin hypercube sampling method was used to obtain 40 and 10 groups of exploitation schemes which were introduced into the numerical simulation model of the groundwater flow to obtain groundwater table drawdown datasets respectively ( Table 2).
The q is the groundwater exploitation quantity and s is the groundwater table drawdown under the exploitation schemes in Table 2.
MATLAB (2013a) procedure was compiled according to the principle of the regression kriging method. Training samples were used to establish the surrogate model (regression kriging model) and validation samples were used to verify the computational accuracy of the surrogate model. The θ are a series of coefficients of gauss functions which determine the precision of the surrogate model. The θ are calculated by a genetic algorithm through Equation (16) in Table 3. Table 3. Parameters of the surrogate model. To investigate the validity of the surrogate model, the validation samples were introduced into the groundwater flow numerical simulation model and the surrogate model respectively. Then, the results of the surrogate model and the numerical simulation model of the groundwater flow were compared with the evaluation indexes including relative error and root mean square error. The value and relative error of the groundwater table drawdown of the simulation model and surrogate model are shown in Figure 9, the mean relative error and root mean square error between the simulation model and the surrogate model are shown in Table 4.  From Figure 9, the relative error of each well of each scheme is between 0.01% and 4.93%, less than 5%, and the mean relative error of each scheme is between 0.99% and 2.60%. The mean relative error of the 10 validation schemes is 1.87%, which shows that the computed groundwater table drawdown of each well of each scheme by the kriging model is very close to the simulation model. The root mean square error of each scheme is between 1.06% and 2.93%, and the root mean square error of the 10 validation schemes is 2.27%. The results show that the computed groundwater table drawdown of each well of each scheme by the kriging model is not significantly different close to the simulation model, and each scheme also has no significant difference. The above description demonstrates that the surrogate model could substitute the groundwater flow numerical simulation model effectively.

Optimization Model
Considering that the eight pumping wells were used to supply water for Daan county water simultaneously, the exploitation quantity of each well needed to be distributed reasonably according to the minimum average groundwater table drawdown of the eight exploitation wells. However, the cost of water supply of each exploitation well is different. We selected an optimal exploitation scheme which could make the average drawdown of the groundwater table and the cost of the groundwater exploitation less.
To optimize the conditions of water supply for the scheme of groundwater exploitation, a nonlinear multi-objective optimizations model was developed using the minimum average drawdown of the groundwater table and the minimum cost of groundwater exploitation as multi-objective functions, with the exploitation rates as decision-making variables. The optmization model was constructed as follows: where S is the average groundwater table drawdown (m), si is the groundwater table drawdown of the i th well (m), n is the numbers of exploitation wells, M is the water cost ($), xi are cost coefficients in the Table 5 ($·d·m -3 ), qi are the exploitation rates of the th i well (m 3 d −1 ).  In order to solve the nonlinear multi-objective optimization model, the surrogate model (regression kriging program) was loaded into the genetic algorithm and linked with the exploitation rates. The optimal groundwater exploitation strategy through invoking the surrogate model is in the Table 6. The main computational burden of the simulation optimization process is the repeated running of the numerical simulation model. The optimization for the study area requires 55 s of CPU time to run every simulation model on a 2.93 GHz Inter CPU and 2 GB RAM PC platform. A conventional simulation optimization model requires 40000 runs of the simulation model. Thus, it would require 2200000 s (25 days) of CPU time to process the overall simulation model run. However, in this study replacing the simulation model with the surrogate model in the optimization process could reduce the process to only 50 simulation model runs during the training and validation of the surrogate model. Thus, it only required 19800 s (5.5 h) of CPU time to complete the simulation model and optimization model run. According to this study, replacing the simulation models with surrogate models could reduce the huge computational burden effectively and maintain considerably high accuracy so as to obtain an optimal exploitation scheme.