Numerical Representation of Groundwater-Surface Water Exchange and the Effect on Streamﬂow Contribution Estimates

Groundwater-Surface Water Streamﬂow Abstract: The effects of streams and drainage representation in 3D numerical catchment scale models on estimated streamﬂow contribution were investigated. MODFLOW-USG was used to represent complex geology and a stream network with two different conceptualizations—one with equal cell discretization in the entire model domain and another with reﬁned cell discretization along stream reaches. Both models were calibrated against a large data set including hydraulic heads and streamﬂow measurements. Though the optimized hydraulic parameters and statistical performance of both model conceptualizations were comparable, their estimated streamﬂow contribution differed substantially. In the conceptualization with equal cell discretization, the drainage contribution to the streamﬂow was 13% compared to 41% in the conceptualization with reﬁned cell discretization. The increase in drainage contribution to streamﬂow was attributed to the increase in drainage area in proximity to the stream reaches arising from the reﬁned discretization. e.g., the cell reﬁnement along stream reaches reduced the area occupied by stream cells allowing for increased drain area adjacent to the stream reaches. As such, an increase in drainage area equivalent to 7% yielded a 146% increase in drainage contribution to streamﬂow. In-stream ﬁeld measurements of groundwater-surface water exchange ﬂuxes that were qualitatively compared to calculated ﬂuxes from the models indicated that estimates from the reﬁned model discretization were more representative. Hence, the results of this study accentuate the importance of being able to represent stream and drain ﬂow contribution correctly, that is, to achieve representative exchange ﬂuxes that are crucial in simulating groundwater–surface water exchange of both ﬂow and solute transport in catchment scale modeling. To that end, the in-stream measurements of exchange ﬂuxes showed the potential to serve as a proxy to numerically estimate drainage contribution that is not readily available at the catchment scale.


Introduction
In catchment scale hydrological modeling, the numerical discretization needed to cover large areas, often in scales > 100 km 2 , instigates challenges of representing field measurements conducted on the meter scale. For groundwater (GW)-surface water (SW) exchanges, instream and streambed measurements provide valuable information for testing large-scale models that quantify both the magnitude and direction of exchange fluxes and the variability within the meter scale along stream reaches [1][2][3][4]. Such measurements are often not utilized in catchment scale modeling [5], for good reasons. Measured variability in GW-SW exchanges along stream reaches is difficult to represent in models where the spatial discretization

Study Area and Setting
The study area is situated in western Denmark and comprises a 108 km 2 headwater catchment including two lakes and 52 km streams. Numerous field investigations and local scale modeling studies of GW-SW exchanges have been conducted within the catchment in the past decade (e.g., [3,4,6,16,[24][25][26]), and a wealth of borehole data with geological information and measured water levels exist in the Danish National well database, JUPITER [27]. Based on water level observations from JUPITER and measured water levels from the many field investigations, the groundwater catchment was established from average head observations ( Figure 1). The catchment consists of marine Miocene silt and sand layers overlain by Quaternary clay, sand, and gravel deposited during three glacial-interglacial periods [28,29]. The main stationary line of Weichselian glaciation runs through the catchment [29], dividing the landscape into flat outwash plain at the western part and moraine hills in the south-eastern area. Streamflow is from east to west. The main stream, Holtum Stream, is a second-order 15-km-long perennial stream with increasing width near the headwaters from around 0.4 m to around 8 m at the catchment outlet. From the hydraulic head contours, it is evident that Holtum Stream is a gaining stream with the largest gradients towards the stream occurring at the downstream reaches.
FLOW-USG capabilities, allowing inclusion of complex geology and refined cell discret zation in the vicinity of streams in a headwater catchment to investigate: (i) the effect o numerical discretization on the flow balance, and (ii) GW-SW exchange estimates alon stream reaches. The MODFLOW-USG results are compared with common modeling ap proaches using regular discretization in the entire modeling domain.

Study Area and Setting
The study area is situated in western Denmark and comprises a 108 km 2 headwate catchment including two lakes and 52 km streams. Numerous field investigations an local scale modeling studies of GW-SW exchanges have been conducted within the catch ment in the past decade (e.g., [3,4,6,16,[24][25][26]), and a wealth of borehole data with geolog ical information and measured water levels exist in the Danish National well database JUPITER [27]. Based on water level observations from JUPITER and measured water lev els from the many field investigations, the groundwater catchment was established from average head observations ( Figure 1). The catchment consists of marine Miocene silt an sand layers overlain by Quaternary clay, sand, and gravel deposited during three glacia interglacial periods [28,29]. The main stationary line of Weichselian glaciation run through the catchment [29], dividing the landscape into flat outwash plain at the wester part and moraine hills in the south-eastern area. Streamflow is from east to west. The mai stream, Holtum Stream, is a second-order 15-km-long perennial stream with increasin width near the headwaters from around 0.4 m to around 8 m at the catchment outlet. From the hydraulic head contours, it is evident that Holtum Stream is a gaining stream with th largest gradients towards the stream occurring at the downstream reaches.

Geological Setting
Geological model was built using GeoScene 3D and ArcGIS. Lateral extension and top elevation of the layers were interpreted in GeoScene 3D based on the data from 3664 boreholes from the JUPITER database, 20 electrical resistivity tomography (ERT) profiles, two SkyTem data sets, Danish soil map, and Digital Elevation Model ( Figure 2). The highest borehole density was in the western part of the groundwater catchment. Seventythree percent of the boreholes had a depth <20 m and 0.01% had a depth >100 m with the deepest borehole (334 m) located south-east of Lake Hampen. ERT profiles were measured Water 2021, 13, 1923 4 of 13 in the area of Lake Ejstrup and Lake Hampen. SkyTem data sets covered the eastern and central parts of the catchments. The top elevation of the layers was interpolated to a 50 × 50 m grid in ArcGIS following the shape of geological structures and discontinuities in the layer extensions. The interpolated elevations were adjusted to ensure no mutual crossing of the layers.
Geological model was built using GeoScene 3D and ArcGIS. Lateral extension and top elevation of the layers were interpreted in GeoScene 3D based on the data from 3664 boreholes from the JUPITER database, 20 electrical resistivity tomography (ERT) profiles, two SkyTem data sets, Danish soil map, and Digital Elevation Model ( Figure 2). The highest borehole density was in the western part of the groundwater catchment. Seventy-three percent of the boreholes had a depth <20 m and 0.01% had a depth >100 m with the deepest borehole (334 m) located south-east of Lake Hampen. ERT profiles were measured in the area of Lake Ejstrup and Lake Hampen. SkyTem data sets covered the eastern and central parts of the catchments. The top elevation of the layers was interpolated to a 50 × 50 m grid in ArcGIS following the shape of geological structures and discontinuities in the layer extensions. The interpolated elevations were adjusted to ensure no mutual crossing of the layers.  . Geological cross-section through the study area (for location see Figure 2). There is a buried valley cutting through Miocene deposits in the NE area of the groundwater catchment. Top elevations of the Miocene formations are at the greater depths east of the main stationary line.

Conceptual Model
Simulations were performed with MODFLOW-USG [23] and two steady-state model conceptualizations were used; a horizontally regular grid (REG) with a discretization of 100 × 100 m and a horizontally irregular grid (IRREG) with a base size of 10 m along with all stream segments. As such, the IRREG had a numerical discretization of 6.25 × 6.25 m Geological model was built using GeoScene 3D and ArcGIS. Lateral extension and top elevation of the layers were interpreted in GeoScene 3D based on the data from 3664 boreholes from the JUPITER database, 20 electrical resistivity tomography (ERT) profiles, two SkyTem data sets, Danish soil map, and Digital Elevation Model ( Figure 2). The highest borehole density was in the western part of the groundwater catchment. Seventy-three percent of the boreholes had a depth <20 m and 0.01% had a depth >100 m with the deepest borehole (334 m) located south-east of Lake Hampen. ERT profiles were measured in the area of Lake Ejstrup and Lake Hampen. SkyTem data sets covered the eastern and central parts of the catchments. The top elevation of the layers was interpolated to a 50 × 50 m grid in ArcGIS following the shape of geological structures and discontinuities in the layer extensions. The interpolated elevations were adjusted to ensure no mutual crossing of the layers.  . Geological cross-section through the study area (for location see Figure 2). There is a buried valley cutting through Miocene deposits in the NE area of the groundwater catchment. Top elevations of the Miocene formations are at the greater depths east of the main stationary line.

Conceptual Model
Simulations were performed with MODFLOW-USG [23] and two steady-state model conceptualizations were used; a horizontally regular grid (REG) with a discretization of 100 × 100 m and a horizontally irregular grid (IRREG) with a base size of 10 m along with all stream segments. As such, the IRREG had a numerical discretization of 6.25 × 6.25 m Figure 3. Geological cross-section through the study area (for location see Figure 2). There is a buried valley cutting through Miocene deposits in the NE area of the groundwater catchment. Top elevations of the Miocene formations are at the greater depths east of the main stationary line.

Conceptual Model
Simulations were performed with MODFLOW-USG [23] and two steady-state model conceptualizations were used; a horizontally regular grid (REG) with a discretization of 100 × 100 m and a horizontally irregular grid (IRREG) with a base size of 10 m along with all stream segments. As such, the IRREG had a numerical discretization of 6.25 × 6.25 m along the stream cells increasing to 100 × 100 m away from the stream. Flow in the streams was simulated using the SFR2 package [30], where stream width and elevation were specified according to data from Karan et al. [31]. The streambed thickness is assumed to be 1 m. A general head boundary (GHB) was used to define the lakes and time-averaged measured lake stages were specified as head stage. Drains were assigned in the remaining modeling area in a depth of 1 m to account for drainage from agricultural land use (around 54%), riparian lowlands, as well as ditches and smaller tributaries not represented separately due to lacking information. This is common practice in many regional scale models (e.g., [20,21]), and drainage is not activated unless the simulated water level exceeds the specified drain elevation. Unsaturated zone processes were not in focus in the current study, and recharge was based on average streamflow measurements from the gauging station near the stream outlet ( Figure 1). The recharge flux was calculated from the average streamflow divided by the catchment area. It is thus assumed that the average streamflow represents steady-state flow conditions and that all recharge is captured by the stream (and drains to streams). Hence, the outer boundary of the model domain was specified as no-flow.

Model Calibration
The parameter estimation code PEST [32] was used for calibration of the models, and these were calibrated against 419 hydraulic head data representing seven aquifer units. The hydraulic head data were a mix of single measurements and the average time series of head observations. The uncertainty of the hydraulic head data was calculated based on procedures in Henriksen et al. [33] and yielded a standard deviation of around 3 m. As the clay units of Quaternary and Miocene origin are not present throughout the model domain, the sand aquifers are not clearly separated. For simplicity, the hydraulic head data were treated equally, i.e., the weighting of the data within each of the six sandy geological layers representing aquifers was the same. The average streamflow was also used in the calibration and the weight was specified to assure that the contribution to the total objective function was similar for both hydraulic head and streamflow data. The average streamflow is based on observations from 2007 to 2016. The streamflow measurements have an estimated uncertainty of 10%. The average streamflow was compared to the sum of simulated streamflow and drainage. In MODFLOW, the simulated drainage (using the drain package) is treated as water leaving the model domain, and therefore the drain contribution must be added to the simulated streamflow when comparing to observed flow.

Geology
The catchment comprises Miocene and Quaternary gravel, sand, silt, and clay layers ( Figure 2). The bottom of the geological model is at the top of a fine grained, low permeable Vejle Fjord Formation. The Miocene sequence consists of alternating silt and sand layers locally overlain by Maarbaek clay formation. The total thickness of the Miocene deposits increases towards the east and reaches up to 266 m. Miocene formations were partially eroded during glaciations and a buried valley structure reaching −120 m below sea level was created at the eastern boundary of the catchment. The buried valley is mostly filled with sand and gravel covered by a thick clay layer. Miocene and the buried valley deposits are overlain by Quaternary sand and gravel with clay lenses. The thickness of the Quaternary sediments outside of the buried valley structure is up to 100 m. The uppermost Quaternary layer in the major part of the catchment consists of sand and gravel deposited at an outwash plain. In the south-eastern part of the catchment, east from the main stationary line, the sedimentary sequence is covered with moraine clay forming a hilly landscape.

Model Calibration
Each of the interpreted geological units was assigned a horizontal hydraulic conductivity and an anisotropy ratio for the horizontal and vertical hydraulic conductivity (K h /K v ). Since the aquifers were not clearly separated (Figure 3), the parameters for the Quaternary and Miocene sand and clay layers were tied. This left four geological units to be included in the calibration. The stream network was divided to represent one streambed vertical hydraulic conductivity (K v streambed) for Holtum Stream and one for all its tributaries. PEST was initially used to find the most sensitive parameters plotted as relative composite sensitivities in Figure 4. A sensitive parameter is here defined as having a sensitivity ≥ 3% of the maximum sensitivity. During the sensitivity analyses, different initial values were tested, which showed that with high conductance values for drains and lake GHB (generally > 1 1/d), the parameter became insensitive. Due to correlation between the two lake GHB conductances, these were also tied during the calibration. Likewise, the anisotropy ratio was fixed to 10, because of the correlation to K h . For both model conceptualizations, the K h Quaternary clays were insensitive and therefore not included in the calibration and set to 0.1 m/d similar to values previously used to simulate Quaternary clays in the area [6]. A lower K h for the Quaternary clays was tested and although the model converged successfully, the K h value of 0.1 m/d was maintained due to longer model run times with lower values, which was deemed acceptable as the K h Quaternary clays were insensitive. be included in the calibration. The stream network was divided to represent one streambed vertical hydraulic conductivity (Kv streambed) for Holtum Stream and one for all its tributaries. PEST was initially used to find the most sensitive parameters plotted as relative composite sensitivities in Figure 4. A sensitive parameter is here defined as having a sensitivity ≥ 3% of the maximum sensitivity. During the sensitivity analyses, different initial values were tested, which showed that with high conductance values for drains and lake GHB (generally > 1 1/d), the parameter became insensitive. Due to correlation between the two lake GHB conductances, these were also tied during the calibration. Likewise, the anisotropy ratio was fixed to 10, because of the correlation to Kh. For both model conceptualizations, the Kh Quaternary clays were insensitive and therefore not included in the calibration and set to 0.1 m/d similar to values previously used to simulate Quaternary clays in the area [6]. A lower Kh for the Quaternary clays was tested and although the model converged successfully, the Kh value of 0.1 m/d was maintained due to longer model run times with lower values, which was deemed acceptable as the Kh Quaternary clays were insensitive. During the calibrations, it was evident that for the IRREG model, Kv for the streambed for the tributaries could not be uniquely optimized, resulting in large confidence intervals. Hence, for simplicity, in both model conceptualizations, Kv for the streambed was optimized for a single parameter representing tributaries and Holtum. However, as final the calibration results were deemed satisfactory based on calculated root mean square error (RMSE), coefficient of determination (R 2 ), and mean absolute error (MAE), the selected approach was maintained.
For both the REG and IREG models, root mean square error (RMSE) for hydraulic heads was well below the estimated uncertainty of 3 m and coefficient of determination (R 2 ) close to one ( Figure 5). The two models represent equally well the mean hydraulic heads over the range from 50 to 117 m. This was also supported by the mean absolute error (MAE) of 1.80 m and 1.77 m for the REG and IRREG models, respectively. For both models, the simulated streamflow was within 0.1% and thus well within the uncertainty of 10% for the flow observation.  During the calibrations, it was evident that for the IRREG model, K v for the streambed for the tributaries could not be uniquely optimized, resulting in large confidence intervals. Hence, for simplicity, in both model conceptualizations, K v for the streambed was optimized for a single parameter representing tributaries and Holtum. However, as final the calibration results were deemed satisfactory based on calculated root mean square error (RMSE), coefficient of determination (R 2 ), and mean absolute error (MAE), the selected approach was maintained.
For both the REG and IREG models, root mean square error (RMSE) for hydraulic heads was well below the estimated uncertainty of 3 m and coefficient of determination (R 2 ) close to one ( Figure 5). The two models represent equally well the mean hydraulic heads over the range from 50 to 117 m. This was also supported by the mean absolute error (MAE) of 1.80 m and 1.77 m for the REG and IRREG models, respectively. For both models, the simulated streamflow was within 0.1% and thus well within the uncertainty of 10% for the flow observation.
The optimized parameters for the models yield comparable values for the REG and IRREG models ( Table 1). The calibrated hydraulic conductivities for geological layers and the streambed were well constrained with relatively narrow confidence intervals. For the lake GHB's, the calibrated conductance was only well constrained in the IRREG model, while the GHB-and drain conductances in the REG model were not. Several calibrations were performed with different initial values for both conceptualizations and weights adjusted to secure equal contribution of the observation groups to the total objective function. Still, it was not possible to obtain a well constrained parameter value for the drain conductance in the REG model. Since the drain conductance was insensitive with values >1 1/d and the drain conductance in the IRREG model calibration kept reaching the upper bound of the specified parameter range (which was also tested), it was decided to fix the parameter in the model. Therefore, based on the final REG model calibration of the drain conductance, a fixed drain conductance of 0.05 1/d was used in the calibration of the IRREG model (Table 1).  The optimized parameters for the models yield comparable values for the REG and IRREG models ( Table 1). The calibrated hydraulic conductivities for geological layers and the streambed were well constrained with relatively narrow confidence intervals. For the lake GHB's, the calibrated conductance was only well constrained in the IRREG model, while the GHB-and drain conductances in the REG model were not. Several calibrations were performed with different initial values for both conceptualizations and weights adjusted to secure equal contribution of the observation groups to the total objective function. Still, it was not possible to obtain a well constrained parameter value for the drain conductance in the REG model. Since the drain conductance was insensitive with values >1 1/d and the drain conductance in the IRREG model calibration kept reaching the upper bound of the specified parameter range (which was also tested), it was decided to fix the parameter in the model. Therefore, based on the final REG model calibration of the drain conductance, a fixed drain conductance of 0.05 1/d was used in the calibration of the IR-REG model (Table 1).

Stream Flow Contribution and Groundwater-Surface Water Exchange
The flow balance relative to the total outflow showed that the main difference between the two conceptualizations was related to the simulated discharge to streams and drainage to streams ( Table 2). As the total estimated streamflow was represented by the sum of simulated streamflow and drainage, the drain contribution to the total streamflow was equivalent to 13% and 31% for the REG-and IRREG model, respectively. Simulated groundwater fluxes to Holtum stream were extracted from the model results and compared to estimated groundwater discharge rates from previous field measurements ( Figure 6). For the REG model, the simulated groundwater discharge ranged from 0.3 to 3.5 m/d with an average of 1.41 m/d, while in the IRREG model, the groundwater discharge ranged from 0.01 to 2.6 m/d with an average of 0.9 m/d. Hence, the groundwater discharge to the stream in the REG model was substantially larger than in the IRREG model. The average of GW-SW exchange fluxes from field measurements was 0.41 m/d and ranged from 0.06 to 1.73 m/d mainly around 3 km of the most downstream part of the stream. Overall, both models showed decreasing groundwater discharge moving downstream. At the upstream part, from around 1700 m to 6500 m, where the largest groundwater discharge was calculated, there was a pattern of areas being flooded in both model conceptualizations. The sudden drops in calculated groundwater discharge in both models, for instance at around 5000 m, coincided with stream cells defined in areas represented by clay.

Flow Budget
Horizontally Regular Grid (REG) Simulated groundwater fluxes to Holtum stream were extracted from the model results and compared to estimated groundwater discharge rates from previous field measurements ( Figure 6). For the REG model, the simulated groundwater discharge ranged from 0.3 to 3.5 m/d with an average of 1.41 m/d, while in the IRREG model, the groundwater discharge ranged from 0.01 to 2.6 m/d with an average of 0.9 m/d. Hence, the groundwater discharge to the stream in the REG model was substantially larger than in the IRREG model. The average of GW-SW exchange fluxes from field measurements was 0.41 m/d and ranged from 0.06 to 1.73 m/d mainly around 3 km of the most downstream part of the stream. Overall, both models showed decreasing groundwater discharge moving downstream. At the upstream part, from around 1700 m to 6500 m, where the largest groundwater discharge was calculated, there was a pattern of areas being flooded in both model conceptualizations. The sudden drops in calculated groundwater discharge in both models, for instance at around 5000 m, coincided with stream cells defined in areas represented by clay. Figure 6. Calculated groundwater discharge along Holtum stream from the two model conceptualizations and previously estimated values [4,7,16,31,34].

Parameter Calibration
The calibrated K h for the Quaternary sands was higher compared to the Miocene sands with a ratio of approximately 2:1. Kidmose et al. [6] found a similar ratio in a model investigating the GW-SW exchange at Lake Hampen, which is situated in the north-eastern part of the catchment (Figure 1). Kidmose et al. [6], however, attained slightly higher K h values for the Quaternary sand of around 70 m/d. The calibrated K h of Quaternary sand is in good agreement with measured K h of surface-near aquifers in the riparian zones. Karan et al. [7] reported values ranging from 0.2 to 19.6 m/d, Karan et al. [25] values ranging from 16 to 69 m/d, and Steiness et al. [16] values ranging 0.2 to 50 m/d. Further, the calibrated K h for Quaternary sand was well constrained with relatively narrow confidence intervals (Table 1). In addition, different initial values were tested for the parameters in the calibrations to ensure that no local minima were encountered, and the optimizations yielded similar values for the calibrated parameters (not shown). Thus, the calibrated K h for Quaternary sand is deemed representative. Though no measured values of K h for the Miocene sand exist for the area, the ratio correspondence to previous studies [6] and the well constrained parameter value from the calibrations ( Table 1) indicate K h to be representative for Miocene sands. Likewise, the calibrated K h of Miocene clays were also well constrained and comparable to previously calibrated values for the Miocene clay in the area [6]. Further, the calibrated K h of Miocene clays supports the assumed 0.1 m/d for Quaternary clays.
The calibrated K v for the streambed was similar in the two model conceptualizations with well constrained confidence intervals (Table 1). Compared to previous studies [2,4] of the streambed hydraulic conductivities in Holtum stream, the calibrated values were comparable although in the lower range. In a detailed investigation of the horizontaland vertical streambed hydraulic conductivities, Sebok et al. [4] estimated K v streambed in the range of 0.01 to 8.4 m/d. Karan et al. [2] used measured horizontal streambed hydraulic conductivities in a numerical estimation of the anisotropy ratio. Based on these ratios, the estimated K v for the streambed would range from 0.14 to 13.6 m/d. As shown by Sebok et al. [4], K v can be subject to spatial and temporal variations (scouring and sedimentation). Here, the entire stream network with Holtum stream encompassing 15 km was represented with a single K v . Therefore, a calibrated K v value in the lower range of previously estimated values is likely representative in the applied conceptualizations. A more sophisticated approach of dividing the stream into multiple segments each having its own K v and varying the weighting schemes of the observations to contribute differently to the total objective function could potentially lead to a better optimization with well constrained K v for the individual stream reaches. However, as the final calibration results were deemed satisfactory based on calculated RMSE, R 2 , and MAE ( Figure 5), the selected approach was maintained. In future work, the spatially distributed K v will be further investigated by dividing the stream network into multiple reaches to enable calibration of K v streambed at the reach level rather than in the entire lengths of the stream.
The conductances for drain and GHB were not well constrained in the REG model and although several initial values were tested, it was not possible to reach unique estimates of the parameters. For the IRREG model, where the drain conductance was fixed, the calibrated lake GHB conductance was well constrained (Table 1), and thus, there seemed to be a correlation between the two parameters. Nevertheless, the calibrated values for drain conductance were comparable to the National regional hydrological model, also covering the study area [35], where drain conductance in sandy areas was estimated to 0.04 1/d.

Streamflow Contribution of Flow Balance Components
The calibration of the two model conceptualizations yielded similar results in terms of calibration statistics with slightly better performance of the IRREG model ( Figure 5). However, the resulting contribution to streamflow differed between the two conceptualizations where the drain contribution was substantially larger in the IRREG model ( Table 2). The drainage contribution to streamflow was 31% in the IRREG model compared to 13% in the REG model. The reason for the large difference in the drain flow contribution is due to the numerical discretization along the streams. Due to the finer discretization in the IRREG model, the drainage area increased by almost 8 km 2 as it was possible to specify a drain coverage closer to the stream cells (compare a cell area of 6.25 × 6.25 m for the IRREG model to a cell area of 100 × 100 m in the REG model). An example of the different drainage and stream areas is given in Figure 7. The additional drainage area in the IRREG model was equivalent to a 7% increase compared to the REG model but lead to a 146% increase in the drainage contribution to streamflow in the IRREG model. These results clearly showed the effect of refining the discretization along stream segments and accentuate the importance of how streams and drains in stream valleys are represented in numerical models. Substantial drainage contribution to the total streamflow has been shown in other studies. The drainage contribution to streamflow was 41% in the study by King et al. [14] or as high as 67% in the field investigations by Steiness et al. [16]. It is well known that riparian lowlands, although representing a small fraction of a catchment area, their ability to remove e.g., nitrate, is very high [36,37]. The most abrupt changes in topography and hydraulic gradients are often at the hill slope of stream valleys bordering riparian lowlands and their adjacent upland areas. Thus, it can be the ratio of catchment to lowland area that defines the split between seepage and drain. Therefore, with too large discretization, the likelihood of misrepresenting the stream and drainage areas is eminent in catchment scale models, which will affect the split between how much of the regional groundwater input reaches the stream by drains or seepage through the streambed. In conjunction, modeling of non-point nutrient dynamics with such models would be prone to misinterpretation given the potential erroneous flow balance contribution to streamflow. merical models. Substantial drainage contribution to the total streamflow has been shown in other studies. The drainage contribution to streamflow was 41% in the study by King et al. [14] or as high as 67% in the field investigations by Steiness et al. [16]. It is well known that riparian lowlands, although representing a small fraction of a catchment area, their ability to remove e.g., nitrate, is very high [36,37]. The most abrupt changes in topography and hydraulic gradients are often at the hill slope of stream valleys bordering riparian lowlands and their adjacent upland areas. Thus, it can be the ratio of catchment to lowland area that defines the split between seepage and drain. Therefore, with too large discretization, the likelihood of misrepresenting the stream and drainage areas is eminent in catchment scale models, which will affect the split between how much of the regional groundwater input reaches the stream by drains or seepage through the streambed. In conjunction, modeling of non-point nutrient dynamics with such models would be prone to misinterpretation given the potential erroneous flow balance contribution to streamflow.

GW-SW Exchange Fluxes
At the downstream part of Holtum stream, the groundwater fluxes to the stream estimated in the field were comparable with calculated groundwater fluxed from the two conceptualizations ( Figure 6). Moving upstream from around 7500 m to the headwaters, the calculated fluxes-up to 3.5 m/d and 2.6 m/d for the REG and IRREG model-are somewhat higher than the measured fluxes ( Figure 6) and the previously largest simulated fluxes of around 1.6 m/d [2]. As these relatively large calculated groundwater fluxes coincided with areas being flooded, it is concluded that the GW-SW exchange is not correctly represented in these areas within both models. This was supported by the interpreted hydraulic head contour map, Figure 1. Here, there were no strong gradients towards the stream that would indicate gaining reaches in the upstream part of Holtum stream. As such, the drainage contribution to the streamflow could be larger than estimated in these conceptualizations (Table 2). Thus, with a larger contribution from drainage in the upper part of Holtum stream, estimated groundwater fluxes could also be correspondingly lower and similar to measured groundwater flux ranges. Hansen et al. [15] showed that including drainage in a prior undrained area reduced groundwater fluxes substantially. Compared to the REG model, the calculated range and average exchange flux in the IRREG model yielded values in closer correspondence with the measured values. The lowest GW-SW exchange fluxes from the measurements, 0.06 m/d, were also captured in the IRREG model with values as low as 0.01 m/d, while the lowest calculated flux in the REG model was 0.3 m/d. The calculated groundwater fluxes in stream reaches with measurements showed a general pattern of being larger ( Figure 6). However, it is noted that the measured groundwater fluxes do not represent steady-state but represent a snapshot in time of transient fluxes [2,4,24].
GW-SW exchange fluxes from especially the upstream area could likely be better represented if a more rigorous calibration was performed by using regularization such as pilot points [38] for K h in the different geological units, dividing the stream reaches into several segments representing different K v streambed values, and/or by varying the drainage conductance throughout the catchment. Although this was not part of the scope for the current study, the worth of incorporating measured GW-SW exchange fluxes is clear. With large differences in especially upstream calculations of GW-SW exchange fluxes, measured exchange fluxes could be used to constrain the models and thus aid in achieving representative flow balance contributions to streamflow. A common approach for assessing groundwater-stream exchange in catchment models is evaluating the simulations against observed hydraulic heads and stream flows [5,39]. Schilling et al. [5] argue that incorporating unconventional data such as GW-SW exchange fluxes in addition to hydraulic heads and streamflow in systematic model calibration could enhance the model predictability in terms of flow and mass balance.

Conclusions
The results of this study showed that the discretization used to specify the streams and drainage in their proximity had a substantial impact on simulated flow balance contribution to streamflow. As these differences in flow balance contribution consequently lead to differences in exchange fluxes that are crucial in simulating GW-SW exchange of flow and nutrients dynamics, it is paramount that future 3D numerical modeling can accurately estimate flow balance contribution to streamflow. A key challenge is to gather measurements of the drainage contribution on the catchment scale [18]. Here, we showed that in-stream measurements of GW-SW exchange fluxes are readily applicable for comparison with model calculated GW-SW exchange fluxes. Therefore, with the flexibility in calibration tools (e.g., PEST) to incorporate GW-SW exchange fluxes as calibration targets [5] and numerical codes allowing for flexible implementation of complex geology and discretization (e.g., MODFLOW-USG), more research is needed to investigate if and how GW-SW exchange fluxes can also be used to serve as a proxy to estimate the drainage contribution to streamflow on the catchment scale.
Author Contributions: S.K.; conceptualization, methodology, investigation, formal analysis, visualization, writing-original draft preparation, writing-review and editing. M.J.; investigation, formal analysis, writing-review and editing. J.K.; investigation, formal analysis, visualization, writing-review and editing. J.A.R.-G.; investigation, formal analysis, writing-review and editing. T.B.; investigation, formal analysis, writing-review and editing. P.E.; investigation, formal analysis, writing-review and editing. All authors have read and agreed to the published version of the manuscript.