Data-Driven Approach to Assess Spatial-Temporal Interactions of Groundwater and Precipitation in Choushui River Groundwater Basin, Taiwan

: The scarcity of groundwater and precipitation stations has limited accurate assessments of basin-scale groundwater systems. This study proposes a workﬂow that integrates satellite and on-site observations to improve the spatial and temporal resolution of the groundwater level and enable recharge estimations for the Choushui River groundwater basin (CRGB) in Western Taiwan. The workﬂow involves multiple data processing steps, including analysis of correlation, evaluation of residuals, and geostatistical interpolation based on kriging methods. The observed groundwater levels and recharge are then the basis to assess spatial-temporal interactions between groundwater and recharge in the CRGB from 2006 to 2015. Results of correlation analyses show the high correlation between the groundwater level and the land surface elevation in the study area. However, the multicollinearity problem exists for the additional precipitation data added in the correlation analyses. The correlation coe ﬃ cient, root mean square error, and normalized root mean square parameters indicate that the Regression Kriging (RK) performs better the groundwater variations than the Ordinary Kriging (OK) dose. The data-driven approach estimates an annual groundwater recharge of approximately 1.40 billion tons, representing 37% of the yearly precipitation. The correlation between groundwater levels and groundwater recharge exhibits low or negative correlation zones in the groundwater basin. These zones might have resulted from multipurpose pumping activities and the river and drainage networks in the area. The event-based precipitation and groundwater level have shown strong recharge behavior in the low-land area of the basin. Artiﬁcial weir operations at the high-land mountain pass might considerably inﬂuence the groundwater and surface water interactions.


Introduction
Groundwater resources play an important role in water resource management issues because of the highly variable climate conditions that significantly influence water availability. The main

Data Used
This study used several types of data, including groundwater levels (GWLs), precipitation, land use, land surface elevation, soil types, and evapotranspiration. The monthly averaged GWLs at 31 groundwater wells were provided by the Water Resource Agency (WRA) of Taiwan from 2006 to 2015. There were two sets of precipitation used in this study. The first set of precipitation data was obtained based on 9 rain gauges of the Taiwan Water Resource Agency (WRA) from 2006 to 2015. The second set of precipitation data relied on satellite processing, which was available from The In the study area, the wet months are typically from May to October, and dry months are from November to April in the following years [24]. There are special conditions in the proximal fan of the CRGB. There is a mountain pass between the Pakuashan tableland and Douliu hill ( Figure 1).
Based on the well logs obtained from Taiwan Central Geological Surveys (CGSs), the mountain pass has a relatively shallow aquifer thickness, which varies from 30 to 50 m, and low permeable bedrock bounds the aquifer bottom. Field investigations have shown that groundwater and surface water interactions in the east and the west of the mountain pass are different because of the narrow and shallow mountain pass that allows the limited space for groundwater flow from upstream to the downstream. The specific landform near the mountain pass provides excellent conditions for the storage of surface water. The development of the Jiji weir at the mountain pass was one of the important infrastructures for the conjunction used of water resources in the CRGB.
The land use map in Figure 1b shows that agricultural lands (green color) are distributed uniformly in the study area, especially in the distal fan. The urban area shown with the dark red color is mainly located in the areas close to the Peikang River and along the coastline. Wetlands are mostly located along the coastline. Land use has an essential role in groundwater recharge. Building and other artificial structures from concrete will reduce the infiltration process. Vegetation cover over the area will support the recharge because vegetation presence increases infiltration by reducing the compacted soil and the surface runoff. Vegetation with coarse roots should facilitate groundwater recharge in the study area. The rainfall sensitivity to the rainfall in the distal fan is probably caused by vegetation over the distal fan along the Choushui River. The discharge decreases with less vegetation, indicating the impact of increased evapotranspiration due to more vegetation [4,5].

Data Used
This study used several types of data, including groundwater levels (GWLs), precipitation, land use, land surface elevation, soil types, and evapotranspiration. The monthly averaged GWLs at 31 groundwater wells were provided by the Water Resource Agency (WRA) of Taiwan from 2006 to 2015. There were two sets of precipitation used in this study. The first set of precipitation data was obtained based on 9 rain gauges of the Taiwan Water Resource Agency (WRA) from 2006 to 2015. The second set of precipitation data relied on satellite processing, which was available from The Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) [25]. The CHIRPS data were created by the National Aeronautics and Space Administration (NASA) and the National Oceanic and Atmospheric Administration (NOAA). The CHIRPS provides global coverage of daily precipitation with a 5.5 × 5.5-km spatial resolution. CHIRPS combined several satellite data and validated by station data. The land use data we obtained from sensor of the MODIS Land Cover Type product (MCD12Q1) provided by the United States Geological Survey (USGS) have a 450 × 450-m spatial resolution (Figure 1b). The supervised classification was performed to generate 18 different types of land uses and land covers. For the data processing, the land use data were reorganized from 18 classes to seven classes, namely: Water, Forest, Grassland/Savanna, Wetlands, Croplands, Urban area, and Open space or bare soils. The 30-m Shuttle Radar Topographic Mission (SRTM) data were used as the elevation data. Besides land use data, we also have soil type data provided by the Agricultural Committee of Taiwan. There were six classes of soil types, namely: (0) fine alluvial soil of tableland with well drainage, (1) limestone-and shale-type medium-coarse alluvial soil with poor drainage, (2) soaking nonlimestone-type fine alluvial soil, (3) nonlimestone-type medium alluvial soil with well drainage, (4) Poor drainage fine red soil, (5) andesite-type fine alluvial soil with poor drainage. In this study, the monthly evapotranspiration data for the analysis were obtained from the Water Resource Agency (WRA) of Taiwan. This study also analyzed the relationship between GWL, GWR, and precipitation based on an hourly rainfall event to assess the impact of the rainfall event on the groundwater responses. Due to the limited hourly evaporation data, we used the hourly evaporation and surface runoff obtained from ERA-Interim data that were produced by the European Centre for Medium-Range Weather Forecasts (ECMWFs).

Methods
In this study, assessing the relationship among the GWLs, GWR, and precipitation will focus on two topics: the interpolation of GWL and GWR estimation. In the first topic, we compared OK and RK for analyzing the spatial-temporal distribution of the GWLs over the study area. In principle, OK used only the GWL as the single variable for interpolation. However, the RK used the elevation and "modified" precipitation as the covariates for interpolating the GWLs. The closeness between the predicted and observed values has been presented using selected criteria such as Coefficient Correlation (r), Root Mean Square Error (RMSE), and Normalized Mean Square Error (NMSE). The OK and RK methods produced 240 maps for the monthly averaged GWLs. The discussion of the results will focus on different time-scales that represent the significant variations of GWLs induced by short events and seasonal and annual variations. The second topic will focus on the estimation of basin-scale GWR. This study employed the concept of water balance for calculating GWR. In the analysis, the water cycle components include the regional precipitation (P), evapotranspiration (ET), and surface runoff (R). Results of GWR and the associated GWLs and precipitation play an essential role in characterizing the interactions of surface water and groundwater in the CRGB.

Modified Precipitation (MoP)
The MoP aims to reduce the over-and underestimations of the precipitation obtained from the CHIRPS satellite data. Over the years, research efforts have focused on improving the temporal and spatial accuracy of the CHIRPS data for site-specific problems [25]. It was recognized that the accuracy of the CHIRPS data could vary significantly based on the available rain gauge stations used for conditioning the CHIRPS observations at the rain gauge stations. For this study, it was assumed that the rain gauge stations in the CRGB might also play an essential role in predicting regional precipitation. Therefore, the residual precipitation between rain gauge observations and CHIRPS satellite data were then used to integrate and assess the interactions between the precipitation and groundwater. There are 12 rain gauge stations in the study area (see Figure 1a). Because of the limited rain gauge stations to formulate the geostatistical structure for kriging interpolation, the residual maps for different times were interpolated based on the Inverse Distance Weighting (IDW) method. Adding the residual maps back to the CHIRPS data can yield the MoP at different times in the study area. The temporal scales of the precipitation for our analyses can be a month or a day. Therefore, the daily results of MoP were further accumulated to obtain different time intervals of MoP for our analyses.

Linear Regression Analysis
This study implemented both simple linear regression and multiple linear regression algorithms to assess the relationship between the independent variables (i.e., precipitation and elevation) and the GWL. The simple linear regression aims to determine the relationship between the precipitation and GWL and the land surface elevation versus the GWL. We combined the land surface elevation and precipitation as the independent variables for the analysis for the multiple linear regression. The correlation squared (r 2 ) has a special meaning in linear regressions. It represents the proportion of variation in GWL explained by independent variables [26].
Results of the linear regression analysis provide general insight into the use of the precipitation and land surface elevation for estimating the GWL. In the study, the multiple linear regression model has the general formula [26]:ẑ =β 0 +β 1 x 1 +β n x n + ε whereẑ is the dependent variable. The notation x n is the independent variable andβ n is the regression coefficient. The ε on the right-hand side represents the error from the regression.

Geostatistical Interpolation Methods
This study explored two geostatistical interpolation methods, including the OK and the RK methods for a data-driven approach [23]. Kriging is a geostatistical interpolation method frequently used for various implementations [18]. The kriged estimates rely on the weighted linear sum of the observations. The unique feature of the kriging is to consider the geostatistical structure of the observations. The weights are derived based on the selected variogram or covariance models to represent the spatial variability of the variables [25]. Kriging methods provide an estimate that is more sophisticated than other deterministic interpolation methods such as the nearest-neighbor or the IDW method.
Ordinary kriging is the most commonly used type of kriging method [26]. OK creates an estimator used in cases where there are unknown global means. The prediction model of OK can be described by: whereẐ (s 0 ) is the predicted value at an unsampled location s 0 , µ is the constant stationary function (global mean), and ε (s 0 ) is the spatially correlated stochastic part of the variation. The first step in OK is to calculate a semivariogram from the scatter point set to be interpolated. A semivariogram is a tool that can be used to quantify the changes of correlation among the measurements by the increasing distance [17]. The variograms can be described by [17]: where γ(h) is the semivariogram with a fixed distance h. The Z(s) is the value of the variable at the location s and Z(s + h) is the value at a location separated by the distance h. Determining a variogram model requires the information of the properties, the sill, the range, and the nugget, fitted based on the data pairs in the experimental semivariogram [27]. Regression kriging is a modified kriging method that combines a regression of the dependent auxiliary variables (such as terrain parameters, remote sensing imagery, and thematic maps) with the kriging of the regression residuals. It is mathematically equivalent to the interpolation method variously called universal kriging and kriging with external drift, where the auxiliary predictors are used directly to solve the kriging weights [19]. In the study, the concept of the regression algorithm for the RK has the following mathematical expressions [17]: wherem(s 0 ) is the fitted drift andε(s 0 ) is the interpolated residual. Notationsβ k andβ 0 are the estimated drift model coefficients and the estimated intercept, respectively. q k (s 0 ) represents the values of the auxiliary variables at the target location s 0 and λ i represents the kriging weights determined by the spatial dependence structure of the residual ε(s i ) at the measurement location s i .

Assessment Criteria
This study assesses the results obtained from the OK and RK. The closeness of the predicted values to the observed values relies on the calculated statistics such as the Coefficient Correlation (r), Root Mean Square Error (RMSE), and Normalized Mean Square Error (NMSE). The Pearson correlation coefficient, r, indicates how far away all these data points are to this line of best fit (i.e., how well the data points fit this new model or the line of the best fit) [26]. The RMSE is a measure of the error size, but it is sensitive to outliers. The RMSE should be as low as possible to produce better modeling. The Normalized Mean Square Error (NMSE) indicates information on the deviations and therefore always giving positive values. The increasing NMSE means the increase in deviation between predicted and observed values. If a model has a very low NMSE, it indicates good predictions in space and time.

Estimations of GWR Based on Land Cover and Land Use
The calculation of the GWR is critical to quantify the input of an aquifer system. In this study, the mass balance concept was employed to estimate the spatial-temporal GWR in the CRGB. Based on the available observations from the Water Resource Agency and Central Weather Bureau of Taiwan, the recharge can be determined by the following water balance equation [28]: where W is the recharge, P is atmospheric precipitation, ET is the evapotranspiration, and R is the surface runoff. The MoP is determined for the source of the precipitation data for the analysis. The monthly evapotranspiration used the data from the Central Weather Bureau of Taiwan. For the calculation of surface runoff, this study used the Soil Conservation Service (or SCS) curve number (CN) model to link the relationships between land cover, hydrologic soil classes, and curve numbers [29]. The SCS-CN model is simple to apply, easy to understand, and relatively stable to the solution. Furthermore, this method includes most runoff characteristics in watersheds, such as hydrologic conditions, land use, moisture condition, and soil type. The SCS-CN model was initially developed for its use on small agricultural watersheds and was extended and applied to rural, forest, and urban watersheds. Since the inception of the model, it has been used in a wide range of environmental implementations. The Natural Resource Conservation Service classifies four Hydrologic Soil Groups (HSGs) based on the runoff potential of different soils. Therefore, the first step to calculate the runoff is to define the soil types based on the hydrologic soil groups. Details of this classification can be found in the study of Abraham et al. [30].
The second step is to classify the land use based on the curve number groups of the U.S. Department of Agriculture (USDA) in 1986 [30]. In this study, we used the land use data derived by the MODIS satellite. Specifically, the land use is derived based on the land cover classifications proposed by Anderson et al. [31] and Congalton and Green [32].

Results and Discussion
The presentation of the results will focus on comparing different supporting variables (i.e., MoP and surface elevation) to predict the GWL in the CRGB. There are long-term data collected for this study. The data set in January 2006 was used for illustration purposes. Other data sets follow the same workflow to obtain the detailed temporal variations from February 2006 to December 2015. With the developed workflow applied to the groundwater basin, the details of the GWR were obtained. The variations of GWR enable the assessment of long-term interactions between groundwater and surface water in the CRGB. Figure 2 shows an example of the results for the calculation of the MoP. The map for January 2006 was selected to illustrate how the MoP was obtained in the study. Figure 2a shows the original CHIRPS data for January 2006, while Figure 2b presents the MoP in the study area. The CHIRPS observations give a maximum and minimum monthly precipitation of 33.52 and 7.37 mm for January 2006. Furthermore, the CHIRPS observations in Figure 2a show that the high precipitation areas were in the east region of the CRGB. With the observations at the rain gauge stations, the MoP shows different patterns and magnitudes of the precipitation in the CRGB. In general, the magnitude of the precipitation is reduced based on the field observations at the rain gauge stations. The variations of the precipitation in local areas are also high as compared with the original CHIRPS data in Figure 2a. We found significant differences between the original CHIRPS data and the MoP results. The large difference can be from the resolution of CHIRPS data and the field observations at the stations. Before the use of CHIRPS, we applied a linear regression analysis for CHIRPS data and field observations. The Choushui River located in Southwestern Taiwan, and the correlation coefficient shows a strong correlation (r > 0.50) between CHIRPS data and field observations. However, the performance of the CHIRPS data was not always consistent through the observation times [25]. The difference between CHIRPS data and field observations at the station can also produce results of MoP different from the original CHIRPS data.

Linear Regression Analysis
A linear regression analysis can provide indications to assess whether the additional variables have a linear correlation with the GWL. In this study, there were three combinations of models tested by the linear regression method. The results are the criteria for selecting secondary variables in the RK interpolation of the GWL. As an example, we showed the results of the linear regression analyses for each model in January 2006.

GWL and Land Surface Elevation (DEM)
The first model evaluates the relationship between GWL and the land surface elevation in the CRGB. According to the regression model result, using land surface elevation as the additional variable produced an r-squared (r 2 ) value of 0.95, and the correlation coefficient of the regression model was 0.97 (Table 1). The correlation coefficient (r) between the GWL and elevation is closer to +1, indicating a positive and strong linear relationship between the rain gauges and elevation. Such results imply that the land surface elevation can describe 95% of GWL variation in the study área. Additionally, the increase in land surface elevation is associated with the increase in the GWL in the study area. The statistics of the two data sets show that the p-value is smaller than 0.05, indicating the significance of the land surface elevation as the additional variable. The significance of the land surface elevation suggests that the elevation data obtained from DEM SRTM can be included as an additional covariate for improving the RK interpolation of the GWL. We found significant differences between the original CHIRPS data and the MoP results. The large difference can be from the resolution of CHIRPS data and the field observations at the stations. Before the use of CHIRPS, we applied a linear regression analysis for CHIRPS data and field observations. The Choushui River located in Southwestern Taiwan, and the correlation coefficient shows a strong correlation (r > 0.50) between CHIRPS data and field observations. However, the performance of the CHIRPS data was not always consistent through the observation times [25]. The difference between CHIRPS data and field observations at the station can also produce results of MoP different from the original CHIRPS data.

Linear Regression Analysis
A linear regression analysis can provide indications to assess whether the additional variables have a linear correlation with the GWL. In this study, there were three combinations of models tested by the linear regression method. The results are the criteria for selecting secondary variables in the RK interpolation of the GWL. As an example, we showed the results of the linear regression analyses for each model in January 2006.

GWL and Land Surface Elevation (DEM)
The first model evaluates the relationship between GWL and the land surface elevation in the CRGB. According to the regression model result, using land surface elevation as the additional variable produced an r-squared (r 2 ) value of 0.95, and the correlation coefficient of the regression model was 0.97 (Table 1). The correlation coefficient (r) between the GWL and elevation is closer to +1, indicating a positive and strong linear relationship between the rain gauges and elevation. Such results imply that the land surface elevation can describe 95% of GWL variation in the study área. Additionally, the increase in land surface elevation is associated with the increase in the GWL in the study area. The statistics of the two data sets show that the p-value is smaller than 0.05, indicating the significance of the land surface elevation as the additional variable. The significance of the land surface elevation suggests that the elevation data obtained from DEM SRTM can be included as an additional covariate for improving the RK interpolation of the GWL.

GWL and MoP
The second regression model considers the MoP to be the additional variable for predicting the GWL. In the analysis of the GWL and the MoP, the linear regression model shows an R-squared value of 0.49, and the correlation coefficient (r) for the model is 0.70 ( Table 1). The result shows a strong and positive linear relationship between the GWL and MoP. Furthermore, as the additional variable, the p-value of MoP for the GWL is also lower than 0.05 (i.e., p-value < 0.05). This p-value implies that adding the MoP into the regression model might improve the GWL prediction. Compared with the result obtained from land surface elevation analysis, the MoP has a relatively low linear relationship with the GWL. In the case of January 2006, MoP can be used as an additional variable for predicting the GWL. However, the highly nonuniform precipitation in CRGB might lead to a low correlation between the GWL and the MoP. Based on the results obtained from the 10-year data, the overall performance of the correlation coefficient was r = 0.23, showing a weak positive correlation. In general, the prediction of the monthly GWL could include the MoP as the additional variable. However, the linear regression analysis for each month is required because of the inconsistency of the linear relationship between the GWL and MoP for each month.

GWL, Land Surface Elevation (DEM), and MoP
The third regression model considers the land surface elevation and MoP simultaneously. The linear regression results show that the correlation coefficient (r) for the third model is 0.97 (Table 1). This correlation coefficient shows a strong linear correlation among GWLs, land surface elevation, and the MoP. However, the p-value of the analysis indicates that adding land surface elevation and MoP in the model leads the MoP from being significant to insignificant. The multicollinearity of data characteristic induces the MoP to become an insignificant variable in the third model. The data multicollinearity represents the independent variables in a multiple regression model that might closely correlate to others. In the third model, we obtained a correlation coefficient values of 0.68 for land surface elevation and MoP. A similar correlation coefficient value (r = 0.70) was obtained for the GWL and MoP in the CRGB ( Table 1). The results give a general insight into the use of the additional variable for RK interpolation. A higher correlation coefficient (i.e., closer to 1) between the independent and dependent variables will lead to better results of RK interpolation. Therefore, this study considered the land surface elevation to be the single variable used for improving the interpolation of the GWL in the CRGB.

Geostatistical Interpolation Methods
The kriging methods require the geostatistical structure of the data for site-specific problems. In most cases, the geospatial analysis can produce the geostatistical structure for specific parameters. Figure 3 shows an example of the experimental variogram and the fitted Gaussian variogram model for January 2006. The results in Figure 3 used the monthly averaged GWLs obtained from the first layer of the groundwater monitoring network in the CRGB. Note that the isotropic Gaussian model and nonlinear least-squares fitting algorithm were used for the analysis. The study of Kumar and Remadevi (2008) found that the Gaussian variogram model is appropriate for the interpolation of the GWL [33]. Three important properties are indicated in the (semi) variograms, including the sill, range, and nugget [27]. In Figure 3, the fitted variogram model yields sill, nugget (c 0 ), and range values of 1094, 85, and 15,295 m, respectively. In this study, the data set for other months follows the same workflow to obtain the geostatistical structure as the kriging interpolation methods. The data processing and the fitting algorithm were developed with the R programming language. In this study, there are 120 sets of monthly averaged data for the OK and RK interpolations. The variogram parameters for different months might show differently because the groundwater variations in the CRGB might be influenced by natural precipitation and human activities.
Water 2020, 12, x FOR PEER REVIEW 10 of 22 and nugget [27]. In Figure 3, the fitted variogram model yields sill, nugget (c0), and range values of 1094, 85, and 15,295 m, respectively. In this study, the data set for other months follows the same workflow to obtain the geostatistical structure as the kriging interpolation methods. The data processing and the fitting algorithm were developed with the R programming language. In this study, there are 120 sets of monthly averaged data for the OK and RK interpolations. The variogram parameters for different months might show differently because the groundwater variations in the CRGB might be influenced by natural precipitation and human activities.   Figure 5 further presents the kriging standard deviation (i.e., variance) for the different interpolation methods. The results in Figures 4 and 5 show that the RK associated with the regression of land surface elevation can reproduce more details of the groundwater variations. It is clear that the GWLs near the Choushui River had been resolved, and the GWL varies with the land surface elevation gradually from the east to the west of the CRGB. A considerable difference was obtained near the coastal line in the north portion of the CRGB. With relatively coarse monitoring wells located in the area (see Figure 1a), the OK result shows the smooth and slightly overestimated GWL. The kriging error variance accounts for the uncertainty that might be introduced by the interpolation methods. The greater the error variance, the higher the uncertainty obtained from the interpolation method ( Figure 5).
By definition, the values of the error variance depend on the locations of the observations and the nugget effect in the variogram model. The locations with observation data (i.e., the conditioning points) lead to the smallest variances. Away from the observation locations, the kriging interpolation employs the geostatistical structure to derive the kriging weightings for interpolations. In Figure 4b, the RK result has shown that the additional variable from the land surface elevation can significantly improve the interpolation. The improvement is especially evident in the areas near the coastal line, the south boundary, and the north boundary of the CRGB. Note that the result shown here is for the month-averaged GWL in January 2006. The overall quantitative evaluation of the interpolation results will be discussed later in the following sections.   Figure 5 further presents the kriging standard deviation (i.e., variance) for the different interpolation methods. The results in Figures 4 and 5 show that the RK associated with the regression of land surface elevation can reproduce more details of the groundwater variations. It is clear that the GWLs near the Choushui River had been resolved, and the GWL varies with the land surface elevation gradually from the east to the west of the CRGB. A considerable difference was obtained near the coastal line in the north portion of the CRGB. With relatively coarse monitoring wells located in the area (see Figure 1a), the OK result shows the smooth and slightly overestimated GWL. The kriging error variance accounts for the uncertainty that might be introduced by the interpolation methods. The greater the error variance, the higher the uncertainty obtained from the interpolation method ( Figure 5).
By definition, the values of the error variance depend on the locations of the observations and the nugget effect in the variogram model. The locations with observation data (i.e., the conditioning points) lead to the smallest variances. Away from the observation locations, the kriging interpolation employs the geostatistical structure to derive the kriging weightings for interpolations. In Figure 4b, the RK result has shown that the additional variable from the land surface elevation can significantly improve the interpolation. The improvement is especially evident in the areas near the coastal line, the south boundary, and the north boundary of the CRGB. Note that the result shown here is for the month-averaged GWL in January 2006. The overall quantitative evaluation of the interpolation results will be discussed later in the following sections.

Assessment Criteria
The comparison of cross-validation results between OK and RK for 10-year data are shown in Figure 6. The study employed the Leave One-Out Cross-Validation (LOOCV) method to quantify the performance of different kriging interpolation methods. The selected criteria for the evaluation are Normalized Mean Square Error (NMSE), Root Mean Square Error (RMSE), and the correlation coefficient (r). Again, the high NMSE value indicates the increase in deviation between predicted and observed values [34]. The parameter NMSE is very sensitive to differences between predicted and measured values. A model with a low NMSE represents well prediction in space and time [34]. The RMSE deals with the measurement error size, but it is sensitive to outliers [35]. The RMSE should be as low as possible to produce better modeling results [36]. The Pearson product-moment correlation coefficient (or Pearson correlation coefficient, for short) is a measure of the strength of a linear association between two variables [35].

Assessment Criteria
The comparison of cross-validation results between OK and RK for 10-year data are shown in Figure 6. The study employed the Leave One-Out Cross-Validation (LOOCV) method to quantify the performance of different kriging interpolation methods. The selected criteria for the evaluation are Normalized Mean Square Error (NMSE), Root Mean Square Error (RMSE), and the correlation coefficient (r). Again, the high NMSE value indicates the increase in deviation between predicted and observed values [34]. The parameter NMSE is very sensitive to differences between predicted and measured values. A model with a low NMSE represents well prediction in space and time [34]. The RMSE deals with the measurement error size, but it is sensitive to outliers [35]. The RMSE should be as low as possible to produce better modeling results [36]. The Pearson product-moment correlation coefficient (or Pearson correlation coefficient, for short) is a measure of the strength of a linear association between two variables [35].

Assessment Criteria
The comparison of cross-validation results between OK and RK for 10-year data are shown in Figure 6. The study employed the Leave One-Out Cross-Validation (LOOCV) method to quantify the performance of different kriging interpolation methods. The selected criteria for the evaluation are Normalized Mean Square Error (NMSE), Root Mean Square Error (RMSE), and the correlation coefficient (r). Again, the high NMSE value indicates the increase in deviation between predicted and observed values [34]. The parameter NMSE is very sensitive to differences between predicted and measured values. A model with a low NMSE represents well prediction in space and time [34]. The RMSE deals with the measurement error size, but it is sensitive to outliers [35]. The RMSE should be as low as possible to produce better modeling results [36]. The Pearson product-moment correlation coefficient (or Pearson correlation coefficient, for short) is a measure of the strength of a linear association between two variables [35]. By accumulating the results for 10-year observations, the average performance for each month was obtained. In general, the RK shows a relatively better performance when estimating the GWL in the CRGB. However, the differences are not evident if one compares the average performance for each month. Figure 6 shows that the interpolation performance varies with the seasonal fluctuation of the groundwater observations. In the dry seasons from February to June, GWL estimations are generally less accurate than those in the wet seasons from July to October in a year. A particular case in April shows that the RK method obtains an NMSE value higher than that obtained by using the OK method. This result implies that the land surface elevation added in the RK interpolation might not improve the interpolation results. A possible factor that influences the inconsistency of the RK interpolation could be the human activities introduced in the CRGB. In April, the dry season will lead to excessively high groundwater pumping for agriculture activities (especially the paddy fields for the first crop season) in the CRGB. Local groundwater pumping will lead to a local GWL decrease, and the local GWL reduction might not directly correlate to the land surface elevation.

GWR
The GWR is relevant to numerous factors, including infiltration capacity, the stochastic character of precipitation, and other site-specific climate conditions [37]. Previous investigations have shown that the spatial and temporal distribution of the regional precipitation is the main factor that dominates the natural variations of the GWR [37]. Table 2 summarizes the soil groups identified in the CRGB. In the study, the Composite Curve Number (CCN) number weights were modified based on the studies of USDA (1986) and Hong and Adler (2008). Table 1 also lists the areas for the land use types. It is recognized that crop lands play an important role in regional water interactions. The agriculture activities associated with the water demand might produce a challenging task for water resource management. The results indicate that the type A of hydrogeological soil group occupied the maximum area in the CRGB. There is more than 60% of soil type A obtained for the study area. The total area was then calculated by CN of type A which multiplied by area per land use (CN x area). By accumulating the results for 10-year observations, the average performance for each month was obtained. In general, the RK shows a relatively better performance when estimating the GWL in the CRGB. However, the differences are not evident if one compares the average performance for each month. Figure 6 shows that the interpolation performance varies with the seasonal fluctuation of the groundwater observations. In the dry seasons from February to June, GWL estimations are generally less accurate than those in the wet seasons from July to October in a year. A particular case in April shows that the RK method obtains an NMSE value higher than that obtained by using the OK method. This result implies that the land surface elevation added in the RK interpolation might not improve the interpolation results. A possible factor that influences the inconsistency of the RK interpolation could be the human activities introduced in the CRGB. In April, the dry season will lead to excessively high groundwater pumping for agriculture activities (especially the paddy fields for the first crop season) in the CRGB. Local groundwater pumping will lead to a local GWL decrease, and the local GWL reduction might not directly correlate to the land surface elevation.

GWR
The GWR is relevant to numerous factors, including infiltration capacity, the stochastic character of precipitation, and other site-specific climate conditions [37]. Previous investigations have shown that the spatial and temporal distribution of the regional precipitation is the main factor that dominates the natural variations of the GWR [37]. Table 2 summarizes the soil groups identified in the CRGB. In the study, the Composite Curve Number (CCN) number weights were modified based on the studies of USDA (1986) and Hong and Adler (2008). Table 1 also lists the areas for the land use types. It is recognized that crop lands play an important role in regional water interactions. The agriculture activities associated with the water demand might produce a challenging task for water resource management. The results indicate that the type A of hydrogeological soil group occupied the maximum area in the CRGB. There is more than Water 2020, 12, 3097 13 of 22 60% of soil type A obtained for the study area. The total area was then calculated by CN of type A which multiplied by area per land use (CN x area). Using the CCN values enables the calculation of the S and Ia for the entire groundwater basin. The value of S and Ia are the key parameters to derive the surface runoff (R) for the study area. With the distribution of S and Ia in the study area, the groundwater recharge (W) can be calculated with different temporal scales. Figure 7 shows the annual average GWR from 2006 to 2015 in the CRGB. In general, the GWR pattern varies with the land surface elevation and gradually decreases from the east to the west in the CRGB. This spatial pattern can also be found in the individual year of GWR. The distribution of the GWR is consistent with the distribution of precipitation in a year. However, in local areas, the GWR variations are high, especially in the distal fan area near the coastal zone. In the proximal fan area in the CRGB, we obtained the highest GWR, and the recharge zone is stable throughout the 10-year analysis. The Taiwan Central Geological Survey (Taiwan CGS) had defined the area as the recharge sensitivity zone for the CRGB. Conducting intensive groundwater monitoring is required to protect the groundwater resource in the proximal fan area.  Using the CCN values enables the calculation of the S and Ia for the entire groundwater basin. The value of S and Ia are the key parameters to derive the surface runoff (R) for the study area. With the distribution of S and Ia in the study area, the groundwater recharge (W) can be calculated with different temporal scales. Figure 7 shows the annual average GWR from 2006 to 2015 in the CRGB. In general, the GWR pattern varies with the land surface elevation and gradually decreases from the east to the west in the CRGB. This spatial pattern can also be found in the individual year of GWR. The distribution of the GWR is consistent with the distribution of precipitation in a year. However, in local areas, the GWR variations are high, especially in the distal fan area near the coastal zone. In the proximal fan area in the CRGB, we obtained the highest GWR, and the recharge zone is stable throughout the 10-year analysis. The Taiwan Central Geological Survey (Taiwan CGS) had defined the area as the recharge sensitivity zone for the CRGB. Conducting intensive groundwater monitoring is required to protect the groundwater resource in the proximal fan area. Using the raster calculation can obtain the total volume of the GWR in a year. The data-driven approach yields a total volume of 1.40 × 10 9 m 3 GWR for the CRGB. This value represents approximately 37% of the total precipitation in a year, which has a total volume of 3.77 × 10 9 m 3 on average in a year. The results in this study agree with many previous investigations that used numerical models to evaluate the GWR in the CRGB. In these studies, the annual average recharge Using the raster calculation can obtain the total volume of the GWR in a year. The data-driven approach yields a total volume of 1.40 × 10 9 m 3 GWR for the CRGB. This value represents approximately 37% of the total precipitation in a year, which has a total volume of 3.77 × 10 9 m 3 on average in a year. The results in this study agree with many previous investigations that used numerical models to evaluate the GWR in the CRGB. In these studies, the annual average recharge in the CRGB was approximately 1.238 × 10 9 m 3 [38]. Our result slightly overestimated the GWR in the CRGB.

Spatial-Temporal Variations of GWL and GWR
In this study, overlaying the land use map in the study area found negative GWLs obtained in the bare soil and urban regions, especially in the west of the CRGB. The over-extraction of the groundwater might cause the local negative GWLs in these areas. Understanding the interactions between the GWL and MoP is crucial in managing basin-scale water resources. This study then conducted correlation analyses to characterize the interactions between the normalized GWL and the GWR. Figure 8 shows the time series of the normalized GWL and GWR by integrating the 10-year data. Note that the normalized data had considered the standard deviation of GWL and GWR for presentation purposes. We used three selected groundwater monitoring stations, including 9100211, 1,004011, and 9180211 (; shown in Figure 1a) in the proximal, middle, and distal fans to illustrate the analysis. In this study, the land surface elevations for these monitoring stations in the proximal, middle, and distal fans are 179, 46, and 5 m, respectively.
Water 2020, 12, x FOR PEER REVIEW 14 of 22

Spatial-Temporal Variations of GWL and GWR
In this study, overlaying the land use map in the study area found negative GWLs obtained in the bare soil and urban regions, especially in the west of the CRGB. The over-extraction of the groundwater might cause the local negative GWLs in these areas. Understanding the interactions between the GWL and MoP is crucial in managing basin-scale water resources. This study then conducted correlation analyses to characterize the interactions between the normalized GWL and the GWR. Figure 8 shows the time series of the normalized GWL and GWR by integrating the 10-year data. Note that the normalized data had considered the standard deviation of GWL and GWR for presentation purposes. We used three selected groundwater monitoring stations, including 9100211, 1,004011, and 9180211 (; shown in Figure 1a) in the proximal, middle, and distal fans to illustrate the analysis. In this study, the land surface elevations for these monitoring stations in the proximal, middle, and distal fans are 179, 46, and 5 m, respectively. In Figure 8, the results for July and August show similar trends for the GWL and the GWR in the entire CRGB. Estimated correlation coefficients between GWL and GWR show approximately 0.86, 0.59, and 0.82 in proximal, middle, and distal fan areas in the high precipitation months (i.e., July and August in a year). This finding agrees with the conclusion proposed by Tsai et al. (2015) that the upstream area of the Choushui River has a higher recharge rate than other areas [39]. There are significant decreases in the GWR from September to October because of less precipitation in the study area (see Figure 8a-c). In Figure 8d-f, the results from December to February show clear negative In Figure 8, the results for July and August show similar trends for the GWL and the GWR in the entire CRGB. Estimated correlation coefficients between GWL and GWR show approximately 0.86, 0.59, and 0.82 in proximal, middle, and distal fan areas in the high precipitation months (i.e., July and August in a year). This finding agrees with the conclusion proposed by Tsai et al. (2015) that the upstream area of the Choushui River has a higher recharge rate than other areas [39]. There are significant decreases in the GWR from September to October because of less precipitation in the study area (see Figure 8a-c). In Figure 8d-f, the results from December to February show clear negative correlations between the GWL and the GWR. The cases in January are the worse scenarios that show high deviations for proximal, middle, or distal fan areas. This period is a new crop season for the CRGB. With the shortage of available surface water, human activities might induce additional decreases in the GWL. The conditions can lead to the complexity of water resource management.
Notice that the selected well 9,100,211 is located in the proximal fan area. We found that the pattern of the correlation between GWL and GWR in the dry months is different from the other two wells in the middle and distal fans. The results in Figure 8d show a negative correlation in dry months. However, the GWL gradually increases from November to February, which is different from the results in Figure 8e,f. The increase in the GWL at the well might be influenced by the controlling lateral recharge from the Choushui River. As we have known in the proximal fan area, there is the Jiji weir built at the mountain pass for conjunction use of water resources. The operations of the weir might influence the natural behavior of the GWL near the upstream of the proximal fan area.
For understanding the impact of precipitation on the GWR and GWL, the Pearson correlation coefficient maps were derived using the 10-year data (i.e., 2006-2015) of the GWL and GWR. Based on the precipitation data, the high precipitation months are generally from May to October, representing the wet months. The low precipitation months are from November to April, which represent the dry months. The Pearson correlation values are generally various in the study area, but most higher correlation values are shown in the proximal fan area. Figure 9 shows the annual average of the estimated correlation between GWL and GWR. The results are the basis to assess the regional interactions between the GWL and GWR.
In the upstream area, there are low correlation zones for the GWL and GWR (Figure 9a). Principally, low correlations occur when disagreement trends exist in the same months in different years at the grid points. The mechanism is complicated because the long-term data set covered the climate and anthropogenic conditions through the collection data. The pattern of the results in Figure 9 can provide the fundamental elements to assess the outcome of the regional water interactions. In this study, the GWR estimation relied on the calculation of surface runoff using the data from precipitation, land use, evaporation, and soil types. The various land use patterns and soil types have shown a high impact on the estimated GWR variations. In Figure 9a, the spots with high correlations typically represent the areas without vegetation, and those areas consist of well drainage soil types. Therefore, these areas have a consistent high GWR and GWL in the same period and produce a relatively higher correlation between the GWR and GWL. The areas with low correlation or negative correlation could be influenced by land use and human activities. Figure 9a shows the correlation between GWL and GWR by averaging all data sets from 2006 to 2015. The high annual correlation values between GWL and GWR are mainly located in the proximal and distal fan areas. There are negative correlation zones in the central regions of the CRGB. The correlation values in these zones can reach −0.5, indicating the disturbance of GWL has dominated the character of surface water and groundwater interactions. Figure 9b,c show the correlation between GWL and GWR for the wet and dry months. There are considerable differences in the patterns of the correlation distribution. In the wet months, the positive correlation areas dominate the CRGB, indicating a similar GWL and GWR trend. There are also negative correlation areas located along the Choushui River in the center of the CRGB and along the Peikang River in the south of the CRGB. The behavior is not apparent based on the analyses of the correlation between the GWL and GWR. Figure 9c shows the correlation map for dry months. The map shows a comparison with the result in Figure 9b. The negative correlation dominates in the dry months. We found that the positive correlation areas in the dry months are also along the river system. The weak correlation near the river system could be the overall character that is induced by pumping activities and controlled by the river stages.
Jiji weir built at the mountain pass for conjunction use of water resources. The operations of the weir might influence the natural behavior of the GWL near the upstream of the proximal fan area.
For understanding the impact of precipitation on the GWR and GWL, the Pearson correlation coefficient maps were derived using the 10-year data (i.e., 2006-2015) of the GWL and GWR. Based on the precipitation data, the high precipitation months are generally from May to October, representing the wet months. The low precipitation months are from November to April, which represent the dry months. The Pearson correlation values are generally various in the study area, but most higher correlation values are shown in the proximal fan area. Figure 9 shows the annual average of the estimated correlation between GWL and GWR. The results are the basis to assess the regional interactions between the GWL and GWR. In the upstream area, there are low correlation zones for the GWL and GWR (Figure 9a). Principally, low correlations occur when disagreement trends exist in the same months in different years at the grid points. The mechanism is complicated because the long-term data set covered the climate and anthropogenic conditions through the collection data. The pattern of the results in Figure  9 can provide the fundamental elements to assess the outcome of the regional water interactions. In  Figure 10 shows the selected distributions of the correlation map for February and July, representing the dry month and wet month, respectively. Note that the correlation maps considered the data sets obtained from 2006 to 2015. The averaged correlation maps can provide variations of the water interaction dynamics in the CRGB. Comparing the results in Figure 9b,c with the results of February and July in Figure 10, we have observed identical distributions of the correlation, suggesting that the behavior in July and February can represent the interactions between the GWL and GWR in the wet and dry months. Except for the selected months in Figure 10, the proposed approach can estimate correlation maps for different months. During the monsoon season (May and June), the precipitation in the season varies significantly from one year to another. Therefore, the correlation between GWL and GWR during the transition month (May and June) is not consistent with other wet months (May-October). The unstable variations of the precipitation might directly influence the overall correlation between GWL and GWR. However, in the dry months (i.e., November to April), the correlation of the GWL and GWR is not clear. Characterizing the mechanism of water interactions becomes a challenging task because the variations of the correlation are very high from month to month. Human activities in the CRGB might need to be involved in the analysis to address the issue. Figure 11 shows the even-based estimation of the GWR and compares the GWL at wells in the proximal, middle, distal fan areas. This study selected a 3-day rainfall event from 22 May 2015 to 24 May 2015 to assess the interactions of the GWL and GWR. In the comparison case, we employed the hourly precipitation and GWL for the analysis. This period was determined based on a continuous rain event, enabling us to focus on the typical rain event. We calculated the GWR using the mass balance concept shown in Equation (6). Due to limited evaporation data, the GWR was estimated by using hourly evaporation and runoff data from ERA-Interim data produced by the European Centre for Medium-Range Weather Forecasts (ECMWFs). ERA-Interim is a global atmospheric reanalysis that combines observations and a numerical model to create a synthesized data estimate. Evaporation and runoff data of ERA-Interim have often been used to analyze hydrological problems in many cases [40]. Except for the selected months in Figure 10, the proposed approach can estimate correlation maps for different months. During the monsoon season (May and June), the precipitation in the season varies significantly from one year to another. Therefore, the correlation between GWL and GWR during the transition month (May and June) is not consistent with other wet months (May-October). The unstable variations of the precipitation might directly influence the overall correlation between GWL and GWR. However, in the dry months (i.e., November to April), the correlation of the GWL and GWR is not clear. Characterizing the mechanism of water interactions becomes a challenging task because the variations of the correlation are very high from month to month. Human activities in the CRGB might need to be involved in the analysis to address the issue. Figure 11 shows the even-based estimation of the GWR and compares the GWL at wells in the proximal, middle, distal fan areas. This study selected a 3-day rainfall event from 22 May 2015 to 24 May 2015 to assess the interactions of the GWL and GWR. In the comparison case, we employed the hourly precipitation and GWL for the analysis. This period was determined based on a continuous rain event, enabling us to focus on the typical rain event. We calculated the GWR using the mass balance concept shown in Equation (6). Due to limited evaporation data, the GWR was estimated by using hourly evaporation and runoff data from ERA-Interim data produced by the European Centre for Medium-Range Weather Forecasts (ECMWFs). ERA-Interim is a global atmospheric reanalysis that combines observations and a numerical model to create a synthesized data estimate. Evaporation and runoff data of ERA-Interim have often been used to analyze hydrological problems in many cases [40].  Figure 11 shows that the rainfall event in the CRGB produced a similar rainfall pattern but was slightly different in the rainfall intensity. However, the two GWL observations in the proximal fan area (one in the east of the mountain pass and one in the west of the mountain pass) show significantly different patterns. The GWL in the east of the proximal fan area shows a gradual  Figure 11 shows that the rainfall event in the CRGB produced a similar rainfall pattern but was slightly different in the rainfall intensity. However, the two GWL observations in the proximal fan area (one in the east of the mountain pass and one in the west of the mountain pass) show significantly different patterns. The GWL in the east of the proximal fan area shows a gradual decrease (see Figure 11a). The effect of the rainfall event on the GWL is limited. With the heavy rainfall after the third day (i.e., after 48 h), the GWL decreased dramatically. However, the GWL in the west of the mountain pass in the proximal fan shows a pattern similar to the GWL in the middle and distal fan areas. The GWL in the west increases with the increase in precipitation in the rainfall event. The GWL observations in the east and the west of the proximal fan show significantly different responses to the same rainfall event. This different behavior could be influenced by the high variation of aquifer thickness at the mountain pass between Pakuashan tableland and Douliu hill (see Figure 1a). The mountain pass could constrain the groundwater flow. Additionally, the Jiji weir was built across the mountain pass (Figure 1a). The operations of the weir could also induce the local interactions of groundwater and surface water. With a flood event, the water levels in the weir might be reduced to protect the infrastructure. The GWL in the middle and distal fan areas generally follows the precipitation pattern, directly contributing to the GWR. Our study found that the rainfall in the middle fan area is relatively sensitive to the GWR as compared to the GWL in the distal fan area.

Conclusions
This study proposes a data-driven approach that enables the integration of multiple data sources to produce a higher spatial and temporal resolution of the GWL. The results provide useful information to assess the basin-scale water interactions. Numerous evaluation criteria were used to determine the accuracy of the interpolated GWLs and evaluate the performance when additional variables were added in the interpolation. Based on the MoP obtained from the data processing, this study employed the CCN method to estimate the GWR and evaluate the correlation between GWL and GWR. The interactions between GWL and GWR are systematically discussed in the study.
The MoP was obtained based on reducing the residuals between CHIRPS and field observations from the rainfall stations. The results of the precipitation modification showed significant improvement of the accuracy as compared to the original CHIRPS data. The linear regression analysis showed that the precipitation and DEM data could not be included together as the additional variables for the GWL interpolation because of the obtained multicollinearity condition. With the DEM data included in the RK, the interpolated GWL showed considerable improvement in spatial and temporal domains. This study calculated the GWR by using the relationship that involves observations from the precipitation, evapotranspiration, and surface runoff in the study area. By the raster calculation, the volume of GWR in a year is about 1.40 billion m 3 /year and the value agrees with previous numerical studies. This value represents 37% of the annual precipitation (3.77 billion m 3 /year).
The high annual correlation values between the GWL and GWR are mainly located in the proximal and distal fan areas. There are negative and positive correlation zones that coexist in the central regions of the CRGB. The correlation values in these zones can reach −0.5, indicating the disturbance of GWL has dominated the character of surface water and groundwater interactions. In the wet months, the positive correlation areas dominate the CRGB, showing a similar GWL and GWR trend. The positive correlation between GWL and GWR are along with the river system during the dry years. The weak correlation near the river system could be the overall character that is induced by pumping activities and controlled by the river stages.
We have observed that the behavior in July and February can represent the GWL and GWR interactions in the wet and dry months. In general, the wet months show relatively high correlations between GWL and GWR. However, in the dry months, the correlation between GWL and GWR is not clear based on the available observation data. The operations of the Jiji weir at the mountain pass can also induce the local interactions of groundwater and surface water. More information relevant to anthropogenic activities such as groundwater use and weir operations is required to quantify the influence of the anthropogenic activities on the regional water interactions.
The approach in this study was developed to improve the spatio-temporal resolution of the observations obtained from the satellite and field observations. The RK is robust to interpolate the groundwater levels supported by surface elevation as the second variable. The available satellite data allow for evaluating basin-scale groundwater recharge by using the modified precipitation and land use. However, the temporal resolution has been restricted by the available satellite images, although the field observations from field stations have relatively high sampling rates. The inconsistency of the temporal resolution will limit the use of the near real-time monitoring of near-surface hydrology and groundwater dynamics.