Evaluation of Baseflow Modeling with BlueM.Sim for Long-Term Hydrological Studies in the German Low Mountain Range of Hesse, Germany

Abstract: So far, research with the hydrological model BlueM.Sim has been focused on reservoir management and integrated river basin modeling. BlueM.Sim is part of the official toolset for estimating immissions into rivers in Hesse (Germany) via long-term continuous modeling. Dynamic runoff modeling from rural catchments is permitted within the Hessian guidelines, but in practice, a constant flow or low flow is used. However, due to increasing water stress in the region caused by climate change, the dynamic modeling of runoff from rural catchments will become necessary. Therefore, dynamic baseflow modeling with BlueM.Sim is of the greatest importance. This study evaluated baseflow modeling with BlueM.Sim in a representative hard-rock aquifer in the German Low Mountain range. Two model setups (Factor Approach (FA): CN method + monthly baseflow; Soil Moisture Approach (SMA): physical soil moisture simulation) were calibrated (validated) for a 9-year (5-year) period. The FA achieved an NSE of 0.62 (0.44) and an LnNSE of 0.64 (0.60) for the calibration and validation periods. The selection of a solution for the successful validation of the FA was challenging and required a selection that overestimated baseflow in the calibration period. This is due to the major disadvantage of the FA, namely, that baseflow can only vary according to an estimated yearly pattern of monthly baseflow factors. However, the data requirements are low, and the estimation of monthly baseflow factors is simple and could potentially be regionalized for Hesse, leading to a better representation of baseflow than in current practice. The SMA achieved better results with an NSE of 0.78 (0.75) and an LnNSE of 0.72 (0.78). The data requirements and model setup are extensive and require the estimation of many parameters, which are limitations to its application in practice. Furthermore, a literature review has shown that a single linear reservoir, as in BlueM.Sim, is not optimal for modeling baseflow in hard-rock aquifers. However, for detailed climate change impact studies in the region with BlueM.Sim, the SMA should be preferred over the FA. It is expected that BlueM.Sim would benefit from implementing a more suitable model structure for baseflow in hard-rock aquifers, resulting in improved water balance and water quality outcomes.


Introduction
Baseflow is the portion of streamflow originating from groundwater and other delayed discharge sources, like lakes and snowpacks [1,2]. On an annual scale, it is often considered equal to the groundwater recharge of a catchment [3]. Baseflow is influenced by factors such as the aquifer type, the geology and geomorphology of the catchment [4,5] and land use [4,6-9]. Therefore, quantifying baseflow is complex: it is commonly not readily measurable [10,11], and the data quality is often poor [10]. In southern Germany, there have been very few years with above-average precipitation since 2003 [12], and studies have shown that in many regions in Germany, a decline in groundwater recharge can be expected [13]. This will lead to prolonged low-flow periods, ecohydrological problems and socio-economic conflicts [14]. Refs. [15,16] found that, in particular, catchments with fractured hard-rock aquifers, such as the German Low Mountain range, are more sensitive to decreases in seasonal precipitation and are consequently more at risk of seasonal low flows. A study in the German Low Mountain range [17] found that, in the future, "despite wetter winters, extended droughts persist. Water stress intensifies in summer and autumn, with a projected 68% increase in dry period duration." [17] (p. 1). Therefore, the future holds many challenges, e.g., regarding reservoir management, freshwater distribution, freshwater ecology and the protection of groundwater resources, to name a few. To address these challenges, especially with regard to the future, computer models are used to study baseflow trends under different scenarios, e.g., changes in temperature and precipitation due to climate change. Due to the complexity inherent in the quantification of baseflow, many different approaches to hydrological modeling exist. The following is not intended to be a comprehensive review of the literature regarding baseflow modeling but serves as a quick comparison of the model BlueM.Sim [18], which was used in this study, with other well-known models.
Approaches to modeling baseflow range from fixed discharge rates modified by a monthly factor to account for seasonal variations, e.g., HEC-HMS [19] and BlueM.Sim [18]; semi-distributed conceptual models consisting of one or more linear or nonlinear reservoirs, e.g., HEC-HMS [19], BlueM.Sim [18], SWAT [20] and HBV [21]; and distributed physically based groundwater models, e.g., MODFLOW [22]. The modeling of baseflow as a linear or nonlinear reservoir is an ongoing debate within the hydrological community. Several studies have found a linear reservoir to be adequate, e.g., [23,24]. Other studies found a single linear reservoir to be inadequate. Ref. [5] found parallel linear reservoirs to be the best model structure for fractured hard-rock aquifers. Several studies, including [25-28], have found that nonlinear reservoirs are better suited for baseflow modeling.
The focus of this study is on the hydrological simulation model BlueM.Sim and its ability to model baseflow in the long-term continuous simulation of a research basin in the federal state of Hesse in Germany. BlueM.Sim (version 0.9.7) is part of the software package BlueM [18], which consists of BlueM.Wave (version 2.1.0, visualization and timeseries analysis), BlueM.Opt (version 1.8.7, autocalibration and sensitivity analysis) and the hydrological simulation core BlueM.Sim. The software package is freely available at https://bluemodel.org/ (accessed on 30 September 2023).
BlueM.Opt has been applied to, e.g., the optimization of reservoir management [29,30], the multi-objective optimization of urban wastewater systems [31] and model predictive control [32].
BlueM.Sim has been used in a variety of hydrological fields, e.g., the integrated modeling of rural and urban catchments [33], reservoir management [29,34] and sediment transport [35]. Additionally, it is part of the official toolset in the guideline regarding the identification of critical ecological impacts on rivers due to waste- and stormwater discharge in the federal state of Hesse [36]. To our knowledge, BlueM.Sim has not been used for a study focusing on baseflow. However, baseflow plays a significant role in the Hessian guideline, and therefore, the evaluation of BlueM.Sim with regard to baseflow modeling is necessary.
The toolset consists of BlueM.Sim, which models rural catchment hydrology, flow routing and water quality modeling in rivers, and the pollutant load simulation model SMUSI [37], which depicts urban catchment hydrology, flow routing, overflow and water quality in the sewer system.
In recent years, the guideline DWA-A/M 102/BWK-A/M 3 [38], which is relevant to all of Germany, has been introduced. It applies similar methods to the Hessian guideline, and therefore, further interest in BlueM.Sim is warranted. The Hessian guideline [36] is outlined in the following. It consists of two simulation scenarios, named the hydraulic and pollutant scenarios. The hydraulic scenario is concerned with limiting the hydraulic impact, e.g., shear stress, of overflows from sewer systems during rain events. The pollutant scenario assesses the impact of overflows and outflows from wastewater treatment plants (WWTP) on the oxygen and ammonia nitrogen concentrations in a river. Critical conditions in this scenario typically arise during the summer months, when low-flow conditions and high temperatures are more likely [36]. Long-term continuous simulations are performed for both scenarios. However, both scenarios include rural catchment hydrology only at the most basic level. In the pollutant scenario, a constant discharge rate representing a mean low flow (baseflow) with a constant temperature of 20 °C (daily average temperature in summer in Hesse) is assigned to each rural catchment and remains constant over the entire simulation. This represents a worst-case scenario, as these unfavorable conditions are naturally not present 365 days per year [36]. This guideline, as well as the national guideline, does allow for a more physically based modeling approach for baseflow and total runoff from a catchment; however, it is common practice to apply the minimal representation of catchment hydrology. Changes in climate boundary conditions will significantly alter flow conditions, especially low-flow conditions in the summer [16]. Therefore, assessing the impact of combined sewer overflows and WWTP effluents on freshwater ecology under climate change will need a significantly improved representation of baseflow over the currently applied statistically derived mean low flow. However, given the long-term simulations required by both the Hessian and national guidelines, their obligatory application for all peri-urban catchments [39] in Germany, the data demand, data complexity and computing time demands, a physically based soil moisture simulation seems unfeasible for the time being. Therefore, the baseflow-modeling capabilities of BlueM.Sim were examined regarding long-term continuous modeling in this study. BlueM.Sim offers two approaches to baseflow modeling: a constant discharge rate modified by monthly factors to represent seasonal variation and a detailed physical soil moisture simulation, with the seepage of the lower soil layer being routed through a single linear reservoir representing the baseflow. This study was conducted on a research basin in southern Germany in the federal state of Hesse, which is representative of the crystalline basement complex in the German Low Mountain range [40] and is therefore well suited for model testing.
The following research questions will be addressed:
- How does BlueM perform in the long-term continuous simulation of baseflow, as it has not been previously used for this purpose?
- While it is likely that the physically based soil moisture simulation will outperform the constant discharge rate with monthly factors, is there any potential for the simpler approach to be used as part of the Hessian guideline and, potentially, the nationwide guideline?
The findings are significant for a more physically based pollutant scenario, as baseflow plays a crucial role in water quality. This is particularly critical during summers characterized by high temperatures and low-flow conditions, which typically have a substantial impact on water quality. In a broader context, the results are of interest as southern Germany has experienced water stress in the last several years [13]. The results are potentially of interest to stakeholders in catchments with similar geological and climatological conditions, especially with regard to modeling baseflow to study the future impacts of climate change.

Study Site
The Fischbach catchment (FIS) is southeast of the city of Darmstadt in the federal state of Hesse, Germany. The catchment upstream of the flow-gauging station, which marks its outlet, covers an area of 35.6 km². Its elevation ranges from 600 m a.s.l. to 160 m a.s.l., and it is part of the Odenwald crystalline basement complex in the German Low Mountain range [41]. The crystalline basement complex mainly consists of granite and diorite and has a low water storage capacity [42]. The aquifer in the crystalline Odenwald is a fractured hard-rock aquifer that is partly overlain by weathered rock, acting as a shallow aquifer with better permeability [42]. Lateral interflow makes up most of the direct runoff, which is the sum of surface runoff and interflow [42]. Within the catchment, two main river branches exist. These are the Fischbach itself and the Rodauer Bach, which merges with the Fischbach shortly before the gauging station. Land uses comprise 51.7% forests and vegetation, 41.8% agricultural land and 6.5% settlements [40]. More detail on the Fischbach catchment can be found in [16,17,41]. A map of FIS is given in Figure 1.

Data Availability
Three long-term rain-gauging stations are located outside of the catchment itself (see Figure 1). The mean annual precipitation at these stations is 887.5 mm, 928.6 mm and 815.4 mm for the Lautertal/Odenwald-Reichenbach, Reichelsheim and Reinheim stations, respectively, for the time period 1989-2002 [43]. The mean actual evapotranspiration is approximately 650 mm/a [42].
Discharge data (m³/s) are available from the Hessian Agency for Nature Conservation, Environment and Geology (HLNUG) at a daily resolution starting from 1974 [44]. The long-term mean observed flow and mean observed low flow at the gauging station are 0.33 m³/s and 0.091 m³/s, respectively, for the period 1975-2017 [45]. In this study, gauging data from 1989 to 2002 were used because climate and precipitation data are continuously available at a daily resolution for this period. This period encompasses varying hydrological conditions, which include extreme low-flow periods (0.023 m³/s), as well as the highest recorded discharge (6.89 m³/s). Daily temperature (°C) is available from the nearest climate station in Lindenfels-Winterkasten [43]. An overview of the available time series is given in Table 1. Land use and soil data are available from the CORINE [46], ATKIS [47] and BFD50 [48] datasets.

Methods
Precipitation and potential evapotranspiration: Mean daily precipitation in the catchment was estimated using the Thiessen Polygon Method [49].
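As a minimal sketch of the Thiessen weighting described above, the catchment mean for one day can be written as an area-weighted average of the station values. The area shares and the daily values below are purely hypothetical placeholders (the actual polygon shares for FIS are not given in the text), and the station keys are shortened for illustration.

```python
def areal_precipitation(station_precip_mm, area_weights):
    """Thiessen-weighted mean precipitation (mm) for one time step.

    station_precip_mm: {station: precipitation in mm}
    area_weights: {station: share of catchment area in its Thiessen polygon}
    """
    total_weight = sum(area_weights.values())
    weighted_sum = sum(station_precip_mm[s] * w for s, w in area_weights.items())
    return weighted_sum / total_weight


# Hypothetical polygon shares and daily values, for illustration only.
weights = {"Reichenbach": 0.5, "Reichelsheim": 0.3, "Reinheim": 0.2}
daily_mm = {"Reichenbach": 10.0, "Reichelsheim": 5.0, "Reinheim": 0.0}
catchment_mm = areal_precipitation(daily_mm, weights)
```

Dividing by the sum of the weights keeps the result correct even if the shares are entered as areas in km² rather than as fractions.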
Daily potential evapotranspiration was calculated according to [50]. Ref. [50] analyzed climate stations in the federal state of Hesse and derived a semi-empirical equation to calculate daily potential evapotranspiration, which is scaled by the mean annual potential evapotranspiration in a catchment.
Mean baseflow and monthly baseflow factors: Mean baseflow was estimated using the Kille method [51], in which monthly low flows are sorted in ascending order. The plot is typically S-shaped for catchments with a pronounced relief [52], such as FIS. In this case, a nearly linear section of the plot is identified and is extrapolated to the beginning and end of the plot. Mean baseflow is then calculated as the height of the center of the area below the entire linear section. The calculated mean baseflow is then used to determine BFI_FIS, the baseflow index of FIS, which is defined as the ratio of mean baseflow to mean total observed flow.
To determine the monthly factors, a daily baseflow series was separated using the Eckhardt filter [53]. BFImax was set to BFI_FIS. The authors of [41] found this method suitable for the estimation of the BFI of a catchment in the German Low Mountain range. The separated daily baseflow series was aggregated to a monthly time step, and the average monthly baseflow for each month was determined. The monthly factor fi was calculated as the ratio of the average monthly baseflow to the mean baseflow determined using the Kille method.
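The separation and factor estimation can be sketched as follows. The recursion is the standard two-parameter form of the Eckhardt filter; the initial value, the function names and the sample flows are assumptions made for illustration, and the monthly averaging is simplified to a mean over all days of each calendar month rather than a mean of monthly means. The parameter values BFImax = 0.46 and a = 0.976 are those reported for FIS in the results of this study.

```python
def eckhardt_filter(q, bfi_max, a):
    """Separate daily baseflow from total flow q with the Eckhardt filter.

    b[k] = ((1 - BFImax) * a * b[k-1] + (1 - a) * BFImax * q[k]) / (1 - a * BFImax),
    capped so that baseflow never exceeds total flow.
    """
    if not q:
        return []
    b = [bfi_max * q[0]]  # assumed start value; other initializations are possible
    for k in range(1, len(q)):
        bk = ((1.0 - bfi_max) * a * b[-1] + (1.0 - a) * bfi_max * q[k]) / (1.0 - a * bfi_max)
        b.append(min(bk, q[k]))
    return b


def monthly_factors(baseflow, months, mean_baseflow):
    """Ratio of mean baseflow per calendar month (1..12) to the overall mean baseflow."""
    sums, counts = {}, {}
    for bf, m in zip(baseflow, months):
        sums[m] = sums.get(m, 0.0) + bf
        counts[m] = counts.get(m, 0) + 1
    return {m: (sums[m] / counts[m]) / mean_baseflow for m in sums}


# Hypothetical daily flows in m3/s, for illustration only.
q = [0.30, 0.90, 0.60, 0.40, 0.30, 0.25]
b = eckhardt_filter(q, bfi_max=0.46, a=0.976)
```

For a constant total flow, the recursion settles at BFImax times that flow, which is a quick plausibility check for any implementation.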
The mean baseflow and the monthly factors serve both as input and as calibration parameters for modeling baseflow with BlueM.Sim.
The BlueM software package was used for visualization and simulation, as well as calibration and validation.BlueM.Sim is the hydrological simulation core, and BlueM.Opt is a sensitivity analysis and calibration tool [18].
In the following, relevant aspects of BlueM.Sim and BlueM.Opt will be described following the documentation at https://wiki.bluemodel.org/index.php/Main_Page (accessed on 30 September 2023).

BlueM.Sim-Hydrological Simulation
BlueM.Sim encompasses the runoff generation, runoff concentration and flow-routing processes in the catchment, as well as channel flow routing. The catchment is made up of subcatchments and river segments in order to take land uses, soil types and slopes into account.
Figure 2 depicts the two model setups that were used in this study. The first approach, the Factor Approach (FA) (see Figure 2a), divides a catchment into subcatchments and river segments. It utilizes a modified curve number method [54] to calculate excess precipitation and the resulting direct runoff, which consists of surface runoff and interflow. The modified curve number method [54] allows for continuous simulation and calculates initial losses for each new time step, taking antecedent precipitation into account. The event-dependent initial loss hva is a function of the catchment-specific initial loss Ia (mm), the 21-day antecedent precipitation VN (mm) and the curvature parameter CVW (-). Ia is formulated as the product of the abstraction factor a and the storage capacity S of the considered area, as in the original curve number method [55]. However, the constant abstraction factor a was set to 0.05 instead of 0.2, as [54] found this to be more representative of the considered climatic region. Finally, the runoff coefficient Ψ is calculated from the event-dependent initial loss hva and hN,E, the accumulated sum of precipitation during a rain event.
Excess precipitation he is then computed from Ψ and hN, the precipitation amount in each time step. For the governing equations and more details on the modified CN method, refer to [54]. The direct runoff is then routed to the outlet of a subcatchment using a parallel linear reservoir cascade.
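To illustrate the effect of the reduced abstraction factor, the catchment-specific initial loss Ia = a · S can be sketched as below. The relation between S and CN used here is the standard metric form of the original curve number method and is an assumption as far as the modified method of [54] is concerned; the CN value of 70 is hypothetical, and the event-dependent loss hva is deliberately not reproduced.

```python
def storage_capacity_mm(cn):
    """Potential maximum retention S (mm) from the curve number (standard SCS metric form)."""
    return 25400.0 / cn - 254.0


def initial_loss_mm(cn, a=0.05):
    """Catchment-specific initial loss Ia = a * S (mm)."""
    return a * storage_capacity_mm(cn)


# Hypothetical curve number, for illustration only.
cn = 70
ia_modified = initial_loss_mm(cn, a=0.05)  # abstraction factor used in this study
ia_original = initial_loss_mm(cn, a=0.2)   # abstraction factor of the original method
```

With a = 0.05, the initial loss is a quarter of the value obtained with a = 0.2, so runoff starts after considerably less precipitation.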
The baseflow out of a subcatchment is modeled by multiplying the mean baseflow by a monthly factor following a yearly pattern. Finally, the routed direct runoff and baseflow are superimposed to form the total discharge of a subcatchment. This total discharge is then routed downstream by the Kalinin-Miljukov method [56] to the connecting river segments.
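The FA baseflow rule amounts to a lookup by calendar month, as in the following sketch. The mean baseflow and the twelve factors are hypothetical values chosen for illustration, not the calibrated ones.

```python
def fa_baseflow(mean_baseflow, factors, months):
    """Baseflow series for the Factor Approach: mean baseflow times the factor of each month."""
    return [mean_baseflow * factors[m] for m in months]


# Hypothetical mean baseflow (m3/s) and monthly factors, for illustration only.
mean_qb = 0.13
factors = {1: 1.4, 2: 1.5, 3: 1.6, 4: 1.3, 5: 1.0, 6: 0.8,
           7: 0.6, 8: 0.5, 9: 0.5, 10: 0.7, 11: 0.9, 12: 1.2}

months_of_year = list(range(1, 13))
dry_year = fa_baseflow(mean_qb, factors, months_of_year)
wet_year = fa_baseflow(mean_qb, factors, months_of_year)
```

Because precipitation does not enter the function, a dry and a wet year receive exactly the same baseflow series, which is the structural limitation of the FA.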
The second approach, the Soil Moisture Approach (SMA) (see Figure 2b), divides a catchment into subcatchments and river segments as well. However, subcatchments are made up of hydrological response units (HRUs) based on the slope, land use and soil type. It calculates excess precipitation and the resulting runoff components (surface runoff, interflow and baseflow) using a physically based soil moisture routine. The unsaturated soil layer in an HRU is represented by three layers: the infiltration, root and transport layer, respectively. The water balance equation is solved for each layer, taking relevant processes into account (see Figure 3). Surface runoff is routed to the catchment outlet via a parallel linear reservoir cascade, whereas interflow is routed through a single linear reservoir. Seepage from the transport layer is passed on to a linear reservoir, which models baseflow routing in the catchment. The mathematical formulation of the complete soil moisture routine is given in [33,57,58]. The three flow components are superimposed at the outlet of a subcatchment to form its total discharge. This total discharge is then routed downstream by the Kalinin-Miljukov method [56] in the connecting river segments.
The different flow-routing approaches are depicted in Figure 4 to further illustrate the difference between the FA and SMA. For direct runoff (FA) and surface runoff (SMA), excess precipitation is distributed into two parallel branches of linear reservoirs. The first branch receives β and the second branch receives 1-β of excess precipitation as input. Each branch consists of two linear reservoirs. The recession constants of the reservoirs are k1 and k2 for the first and second branches, respectively. k3 represents the recession constant of the linear reservoir applied to impervious areas. k_interflow and k_baseflow, which are denoted by kI and kb in Figure 4b, are the recession constants of the linear reservoir of interflow and baseflow, respectively.
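The routing structure can be sketched with a generic linear reservoir. The update below is the exact solution of a linear reservoir for inflow held constant within a time step; whether BlueM.Sim uses this particular numerical scheme is an assumption, and the nonlinear exponent, the impervious-area reservoir (k3) and all parameter values are left out or hypothetical.

```python
import math


def linear_reservoir(inflow, k, dt=1.0):
    """Route an inflow series through one linear reservoir with recession constant k.

    k and dt must share the same time unit; inflow is assumed constant within a step.
    """
    c = math.exp(-dt / k)
    q, outflow = 0.0, []
    for i in inflow:
        q = q * c + i * (1.0 - c)
        outflow.append(q)
    return outflow


def parallel_cascade(excess, beta, k1, k2, dt=1.0):
    """Split excess precipitation into two branches of two linear reservoirs each."""
    branch1 = [beta * x for x in excess]
    branch2 = [(1.0 - beta) * x for x in excess]
    for _ in range(2):  # two reservoirs in series per branch
        branch1 = linear_reservoir(branch1, k1, dt)
        branch2 = linear_reservoir(branch2, k2, dt)
    return [a + b for a, b in zip(branch1, branch2)]


# Hypothetical pulse of excess precipitation and parameters, for illustration only.
excess = [10.0] + [0.0] * 1999
routed = parallel_cascade(excess, beta=0.6, k1=5.0, k2=20.0)
```

The same `linear_reservoir` function with k_interflow or k_baseflow would represent the single reservoirs used for interflow and baseflow in the SMA.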

BlueM.Opt-Calibration and Validation
BlueM.Opt [18] is used for the autocalibration of the BlueM.Sim model. Calibration is performed by applying a Parametric Evolution Strategy (PES). The PES is a multicriteria evolutionary strategy that was originally developed by [59-62]. For details on the implementation and modifications in BlueM.Opt, refer to [63]. The general process is as follows: The first generation of individuals is generated randomly with parameter sets within specified boundaries. The best individuals (parameter sets) regarding one or more objective functions are then chosen to form the parents for the next generation of individuals. The new individuals are then generated by the mutation and recombination of the parameter sets of the parent generation [64]. Finally, "The solutions known as […] Pareto-optimum solution are those which are superior to all other possible solutions in the search space when all objectives are considered" [65]. This means that no other solution can be found that improves one objective function without worsening at least one other. For example, once the set of Pareto-optimal solutions has been found, a new solution that improves objectives 1 and 2 must perform worse regarding objective 3.
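The notion of Pareto optimality can be made concrete with a small dominance check. This is a generic sketch for objectives that are all minimized; it is not the PES implementation of BlueM.Opt, and the objective values are hypothetical.

```python
def dominates(a, b):
    """True if solution a is no worse than b in every objective and better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(solutions):
    """Return the solutions that are not dominated by any other solution."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical pairs of two minimized objective values, for illustration only.
candidates = [(1, 5), (2, 2), (5, 1), (3, 3), (6, 6)]
front = pareto_front(candidates)
```

Here (3, 3) and (6, 6) are dropped because (2, 2) is better in both objectives, while the three remaining solutions each trade one objective against the other.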

Objective Functions
BlueM.Opt offers several different objective functions for calibration. Objective functions can be marked as primary or secondary. A secondary objective function will be evaluated; however, it will not be taken into consideration for determining the best individuals of a generation.
For the calibration, two objective functions were chosen: the SSE, computed from the observed and simulated flows Qobs and Qsim (m³/s), and the APD, computed from the maximum observed and simulated flows Maxobs and Maxsim (m³/s).
The APD is chosen to reduce the flattening out of the simulated hydrograph. The SSE is quickly reduced when the APD is small and differences in peaks are reduced. The remaining reductions in the SSE consequently lead to better fitting of the curves in general.
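A minimal sketch of the two calibration objectives follows. It assumes that the SSE is the usual sum of squared errors and that the APD is the absolute difference between the maximum observed and maximum simulated flows; the exact formulations used in BlueM.Opt may differ, and the sample flows are hypothetical.

```python
def sse(q_obs, q_sim):
    """Sum of squared errors between observed and simulated flows."""
    return sum((o - s) ** 2 for o, s in zip(q_obs, q_sim))


def apd(q_obs, q_sim):
    """Assumed peak criterion: absolute difference of the maximum flows (same unit as the flows)."""
    return abs(max(q_obs) - max(q_sim))


# Hypothetical flows in m3/s, for illustration only.
q_obs = [1.0, 2.0, 4.0]
q_sim = [1.0, 3.0, 3.0]
sse_value = sse(q_obs, q_sim)
apd_value = apd(q_obs, q_sim)
```

Note that the APD as sketched compares only the magnitudes of the two maxima and ignores when they occur, so it complements the SSE rather than replacing it.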
Additionally, further objective functions were considered as secondary objective functions, including the NSE and the LnNSE. The LnNSE reduces the emphasis on peak flow compared to SSE and NSE [66]. Consequently, the LnNSE better measures the fit of average-to-low flows [66].
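The goodness-of-fit measures referred to in this study can be sketched as follows. The NSE and its logarithmic variant follow their common definitions; the volume error ∆V is assumed here to be the relative difference of simulated and observed volumes in percent, with negative values indicating underestimation. The sample flows are hypothetical.

```python
import math


def nse(q_obs, q_sim):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    mean_obs = sum(q_obs) / len(q_obs)
    residual = sum((o - s) ** 2 for o, s in zip(q_obs, q_sim))
    variance = sum((o - mean_obs) ** 2 for o in q_obs)
    return 1.0 - residual / variance


def ln_nse(q_obs, q_sim):
    """NSE of log-transformed flows; requires strictly positive flows."""
    return nse([math.log(o) for o in q_obs], [math.log(s) for s in q_sim])


def volume_error_percent(q_obs, q_sim):
    """Assumed definition of the volume error: (sum sim - sum obs) / sum obs * 100."""
    return 100.0 * (sum(q_sim) - sum(q_obs)) / sum(q_obs)


# Hypothetical flows in m3/s, for illustration only.
q_obs = [1.0, 2.0, 3.0]
q_sim = [0.9, 1.8, 2.7]
dv = volume_error_percent(q_obs, q_sim)
```

The log transform compresses large values, which is why the LnNSE weights recession and low-flow periods more strongly than the NSE.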
Finally, a visual inspection of the Pareto-optimal solutions, the consideration of the secondary objective functions and the consideration of the flow duration curve (FDC) of the observed and simulated hydrographs are factored into the selection of the chosen solution. The FDC is a graphical representation of the magnitude and frequency of streamflow [67]. Flows are ordered by their probability of exceedance. The magnitude of the flow is typically assigned to the Y-axis, and the probability of exceedance is assigned to the X-axis. The visual comparison of the FDC of observed and simulated daily streamflows is an indicator of how well the distributions of observed and simulated streamflows match. However, the FDC does not contain temporal information as to when potential differences occur.
The classification of the APD, NSE and ∆V as unsatisfactory, satisfactory, good or very good in the text follows the performance ratings given in [68]. The performance rating of the NSE is applied to the LnNSE as well. The authors are not aware of an existing classification of SSE performance. Naturally, a lower SSE indicates better performance.
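An FDC can be derived from a flow series as in the sketch below. The exceedance probability uses the Weibull plotting position rank/(n + 1), which is one common convention and an assumption here; the sample flows are hypothetical.

```python
def flow_duration_curve(flows):
    """Return (exceedance probability in %, flow) pairs, sorted from highest to lowest flow."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    return [(100.0 * (rank + 1) / (n + 1), q) for rank, q in enumerate(ranked)]


# Hypothetical daily flows in m3/s, for illustration only.
fdc = flow_duration_curve([3.0, 1.0, 2.0])
```

Sorting discards the dates of the flows, which is why two hydrographs with poorly matched peak timing can still produce nearly identical curves.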

Preliminary Studies
Ref. [69] set up the initial FA model and divided the catchment into 8 subcatchments based on topography, land use and size. Ref. [70] set up the soil moisture model (SMA) based on the work of [69] and derived the hydrological response units for the SMA. The preliminary results of [70] indicated a better performance of the SMA. Ref. [71] focused on the analysis of the SMA setup with regard to hydrological and meteorological inputs, model deficits and first calibration attempts. Ref. [71] found that with a more accurate estimation of elements of the water balance cycle (precipitation, evapotranspiration, surface runoff, etc.), the calibration results could be improved. In this study, the FA model was calibrated for the same time period as that for which the SMA model was initially calibrated in [71]. Furthermore, the calibration of the SMA was improved upon.
Starting parameters, parameter ranges and calibrated parameters: The starting parameters of the CN, soil parameters, etc., are given in Table 2. The monthly factors fi by which the baseflow Qb is modified in the FA are depicted in Figure 5. In Figure 5, the calibrated monthly factors are the factors of the chosen Pareto-optimal calibration solution of the FA (see Section 3.2). The recession constants k1 and k2 for direct runoff, as well as the distribution factor β, were estimated according to [54]. k3 was not included in the calibration, as it was found to be a non-sensitive parameter. The starting values and ranges of soil parameters for the SMA are based on the Bodenkundliche Kartieranleitung [72], a standard of soil parameters in Germany. The parameters of the SMA shown in Figure 3 that are not included in Table 2 were found to not be sensitive in [71]. The variation ranges of Qb and k_baseflow are based on the findings of [41]. The monthly factors fi were estimated to be between 0.431 and 1.619 using the daily series of baseflow separated by the Eckhardt filter for the years 1989-1997. The bottom limit was set to 0.2, and the maximum limit was increased to 2.0. The lower boundary was chosen to allow for more options for BlueM.Opt to explore the solution space. The maximum was increased to account for the maximum separated daily baseflow being nearly 0.6 m³/s. Setting the maximum factor to 2.0 and multiplying it by the maximum value of the range for Qb (5.6 L/(s·km²)) allows for a maximum monthly baseflow of about 0.4 m³/s. Test calibrations indicated that monthly factors fi > 2.0 were not typically within the best solutions. The variation range of the curve number (CN) was based on the findings of [69]. Similarly, the variation ranges of k1, k2 and β were chosen based on the estimation of these parameters for each subcatchment according to [54]. A variation range of 1-96 h for k_interflow was based on [42], where it is noted that interflow typically reaches the river within a few days in the German Low Mountain range. The nonlinear exponent nlin was varied between 0.7 and 1.0, as [41] noted some nonlinearity in the analysis of recession periods for FIS. It should be noted that nlin only applies to the parallel reservoir cascade of direct runoff and surface runoff in the FA and SMA, respectively. The snow-related parameters (for the meanings of the abbreviations, refer to Table 2) Ts, NSD, TD, MR and MRIS were estimated according to [73]. The potential evapotranspiration (ETP) was varied within a range of 580-750 mm/a, as preliminary tests showed that the range of ETP of 542-612 mm/a, compiled by [71], resulted in too much runoff occurring due to too little actual evapotranspiration (ETA). The ETP scaling factors a_f, f_f and p_f were varied within the ranges according to [74].
Notes to Table 2 (fragment): … [73]; *** density at which the snowpack releases water; **** arithmetic mean and range of soil parameters are given over all soil types and not for individual soil types. Note: plausibility of parameters was considered by using constraints, e.g., tpv > fc > w, within soil, as well as between soils, e.g., sand and silt.
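The upper bound on monthly baseflow quoted above can be checked with a short unit conversion. The sketch assumes that the specific baseflow applies to the full catchment area of 35.6 km² given in the study site description.

```python
AREA_KM2 = 35.6          # catchment area at the gauging station
QB_MAX_L_S_KM2 = 5.6     # upper end of the range for specific baseflow Qb
FACTOR_MAX = 2.0         # upper limit of the monthly factors

# L/(s*km2) * km2 gives L/s; dividing by 1000 converts to m3/s.
max_monthly_baseflow_m3s = QB_MAX_L_S_KM2 * AREA_KM2 * FACTOR_MAX / 1000.0
```

The result is roughly 0.4 m³/s, which matches the value stated in the text and lies below the maximum separated daily baseflow of nearly 0.6 m³/s.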
Calibration was carried out over 100 generations, with each generation consisting of 100 individuals generated from 15 parents. Therefore, for both the FA and SMA, 10,000 simulations were carried out. In the following, solutions (simulation results) will be addressed by their ID number: e.g., solution 9790 is the 9790th simulation result of the FA. Goodness-of-fit criteria (GOFC) were calculated for each individual by comparing the observed total streamflow with the simulated total streamflow. The simulated total streamflow was evaluated with respect to baseflow by observing the tail end of the FDC and the LnNSE. Additionally, BlueM.Sim outputs surface runoff, interflow (or direct runoff for FA) and baseflow time series in its result file. With the time series of simulated baseflow, a comparison of annual statistics found in the literature and monthly statistics derived from the separated daily series of baseflow can be made.

Precipitation Data Analysis and Baseflow Separation
A mean baseflow of approximately 0.13 m³/s (red line in Figure 6) was estimated for 1989-1997 using the Kille method, which corresponds to a BFI of 0.46. The baseflow separated using the Eckhardt filter with BFImax = 0.46 and the filter constant a = 0.976 [41] is shown in Figure 6 (light-blue line) for the entire available time period (1989-2002). The calculated BFI for the calibration period is 0.45. The depicted baseflow pattern (green line) was estimated for the calibration period 1989-1997. The dry and wet periods in the precipitation data are visible and are marked by yellow and blue boxes, respectively, in Figure 6. The monthly factors of the annual baseflow pattern are given in Figure 5.

Calibration of FA
The calibration results of the FA regarding the goodness-of-fit criteria (GOFC) are given in Table 3 for all Pareto-optimal solutions, as well as the chosen favorite solution, solution 9790. Calibrated parameters are given in Table 2. Solution 9790 was chosen based on the GOFC, FDC, plausibility of the calibrated parameters and visual impression of the plots of simulated vs. observed flow. The FDC for this solution is shown in Figure 7, and the plot of simulated versus observed hydrographs is shown in Figure 8. The FDC for the calibration period shows very good agreement for flow greater than the mean observed flow of 0.33 m³/s. For flows smaller than the mean observed flow, solution 9790 tends to overestimate flows when compared to observed flows.
Table 3. Goodness-of-fit criteria for all Pareto-optimal solutions of FA, as well as the chosen favorite solution for FA.

The NSE of 0.62 is low compared to the very good agreement in the FDC. This means that the distribution of simulated flows tends to match that of the observed flows. However, this does not mean that the hydrographs of simulated and observed flows match as well. This is also evident when examining the plot in Figure 8. While peak values match according to the FDC, the timing of the peaks is not very accurate. This means that peak values are commonly over- or underestimated, but this is masked in the FDC, in which the temporal component is not included. The timing of the peak flows for the two highest flows is modeled well, whereas peaks below 4.0 m³/s are commonly over- or underestimated (see Figure 9). Furthermore, the stepwise baseflow pattern is discernible and overestimates baseflow and total flow. This is especially the case in the dry years in the first half of the calibration period, e.g., 1991 (see Figure 10). For the later years of the calibration period, the baseflow pattern fits the observed flows well. Other solutions reproduced the observed flows in the FDC more accurately for values below the mean observed flow. Choosing one solution solely based on either the FDC or one of the GOFC is difficult. For example, the FDCs of solutions 9107, 9790 and 9934 are plotted together with the GOFC in Figure 7.
Based solely on the FDC, solution 9934 would seem to be the best solution. However, as the validation period consists of normal-to-wet years, with presumably higher baseflows, and due to the fixed nature of the FA for baseflow modeling, a solution was chosen that tended to overestimate low flows in the calibration. Therefore, solution 9790 was chosen as the favorite solution. In other words, the monthly baseflow pattern derived from the daily baseflow series of the Eckhardt filter is always tightly correlated with the period in which baseflow is separated, and the optimal calibrated monthly factors can be less well suited for periods that are wetter or drier. Precipitation is omitted, as it is not a driving factor for baseflow in FA.

Validation of FA
The validation results of solution 9790 of the FA are given in Table 4 and are shown in Figure 11 and Figure 12. In general, the GOFC deteriorate in the validation period for solution 9790. The notable exception is the LnNSE, which is only slightly less than in the calibration period. Most of the deficits mentioned in Section 3.2 are still present. The time of the peak and the peak value are simulated poorly. However, the general baseflow pattern matches the observed hydrograph almost as well as in the calibration period. Notably, the model now tends to underestimate flows less than 0.33 m³/s rather than overestimate them. Figure 11 depicts the FDC and GOFC for solutions 9107, 9790 and 9934. Here, too, solution 9790 lies between the other two solutions. While the overall result is poor, the fit of the baseflow can be considered satisfactory, as it is comparable to the calibration result regarding the LnNSE, which was expected, as solution 9790 represents a compromise solution, as described in Section 3.2.

Comparison of Calibration and Validation Results of FA
The FA proved capable of reproducing the baseflow pattern in the calibration period, with an LnNSE of 0.64, indicating a satisfactory fit. In the months of May to August, the calibrated monthly factors were lower than estimated, which led to a better fit of simulated and observed flows. This is due to the calibration period containing a distinct dry period from 1990 to 1994 and low flows mainly occurring in the summer months [16]. The calibrated factors are lower for January, March and November and are higher for February, April, October and December. The lower factors can be explained by the dry period as well. Figure 8 reveals that the baseflow component is commonly overestimated in these months in this period. A reduction in the monthly factors benefited the calibration in these years and did not cause significant underestimation in the remaining years. The higher factors can be explained by considering Figure 6. In the dry years, the estimated mean monthly baseflows (green line) tend to be higher than the separated daily baseflows (light-blue line) in these months, and they are significantly lower than the separated daily baseflows in the years 1994 and 1995. The higher factors helped the calibration follow the separated daily baseflow more closely. Validation with solution 9790 proved satisfactory regarding baseflow, with the LnNSE only deteriorating slightly to 0.60. The FDC of the validation period indicates a general underestimation. This is also reflected in the negative ∆V (−11.36%). The water balance is presented in Table 5, from which the inherent limitation of the FA is apparent: the annual baseflow amount always remains the same and does not vary even though the main driving factor, precipitation, differs greatly between the calibration and validation periods.
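The fixed annual baseflow of the FA follows directly from its structure, as the following sketch suggests. It assumes that the monthly baseflow is the product of a monthly factor, the mean baseflow discharge rate and the catchment area. The rate of 3.65 L/(s km²) and the factors for March (1.619) and September (0.431) are the values reported in this study; the catchment area, the remaining ten factors and the two precipitation totals are hypothetical placeholders.

```python
import numpy as np

MEAN_BASEFLOW_RATE = 3.65  # L/(s km^2), Kille estimate reported in the study
AREA_KM2 = 36.0            # hypothetical catchment area, not taken from the study
DAYS = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])

# Hypothetical monthly factors (Jan..Dec); only March (1.619) and
# September (0.431) are values reported in the study.
FACTORS = np.array([1.4, 1.5, 1.619, 1.4, 1.0, 0.7,
                    0.55, 0.5, 0.431, 0.6, 0.9, 1.4])

def fa_monthly_baseflow(factors, rate_l_s_km2, area_km2):
    """Monthly baseflow in m^3/s: factor * mean specific rate * area."""
    return factors * rate_l_s_km2 * area_km2 / 1000.0

def annual_depth_mm(q_monthly, area_km2):
    """Annual baseflow volume expressed as a depth over the catchment (mm)."""
    volume_m3 = np.sum(q_monthly * DAYS * 86400.0)
    return volume_m3 / (area_km2 * 1e6) * 1000.0

q_monthly = fa_monthly_baseflow(FACTORS, MEAN_BASEFLOW_RATE, AREA_KM2)
annual_mm = annual_depth_mm(q_monthly, AREA_KM2)

# Precipitation never enters the calculation, so a dry and a wet year
# (hypothetical totals) receive exactly the same baseflow depth.
for label, precip_mm in [("dry", 600.0), ("wet", 900.0)]:
    print(f"{label}: P = {precip_mm:.0f} mm/a, baseflow = {annual_mm:.1f} mm/a")
```

In this formulation, the only way to change the annual baseflow is to change the factors or the mean rate, which is what the calibration had to compensate for across dry and wet years.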

Calibration of SMA
From the set of Pareto-optimal solutions, solution 3468 was chosen. The calibrated parameters of this solution are given in Table 2, and the GOFC are given in Table 6. The FDC and the plot of observed and simulated flows are shown in Figure 13 and Figure 14. As in Section 3.2, solution 3468 was chosen based on the plausibility of the calibrated parameters, GOFC, FDC and visual impression of the plot of observed vs. simulated flows.

Table 6. Goodness-of-fit criteria for all Pareto-optimal solutions of SMA, as well as the chosen favorite solution for SMA.

Overall, the GOFC are better when compared to solution 9790 of the FA. The exception is the APD, which is greater: 1.03 m³/s versus 0.35 m³/s for the SMA and FA, respectively. However, the FDC of observed and simulated flows agree very well over the entire range of flows. Peak flows tend to be underestimated, but the general timing of the peaks is well matched, and accordingly, solution 3468 has a higher NSE (0.78) than solution 9790 of the FA (0.62). The fit of recession and low-flow periods is good, as indicated by the LnNSE (0.72). In the dry period from 1990 to 1991, some overestimation of the baseflow component is apparent, whereas in the year 1996, the baseflow component is underestimated. The LnNSE of solution 3468 (0.72) is higher than the best LnNSE achieved with the FA (0.66) and is indicative of the better modeling capabilities of the SMA regarding baseflow. The generally good fit of the baseflow component is reflected in the good LnNSE, the FDC and the plotted hydrograph (Figure 14 and Figure 15).

Validation of SMA
The validation results of solution 3468 of the SMA are given in Table 7 and shown in Figure 16 as well as Figure 17. The GOFC of the validation period are comparable to the GOFC of the calibration period. Even though the LnNSE is higher than in the calibration period, both the plot of the hydrographs and the FDC indicate a poorer fit of the baseflow component when compared to the calibration period. In general, the baseflow component is still modeled well. The timing of peak flows matches well; however, the peak flow values can only be considered satisfactory (see Figure 17).

Comparison of Calibration and Validation of SMA
In comparison to the FA, the SMA achieved better results in the calibration period regarding baseflow. In the calibration period, an LnNSE of 0.72 indicates a good fit. The FDC of observed and simulated flows match very well, and in Figure 14, the recession periods tend to match well. In 1990 and 1991, there is a visible overestimation of the baseflow component. This can be attributed to too much seepage reaching the baseflow reservoir in some situations and outflow from the baseflow reservoir declining too gradually. The slow decline in the baseflow component is especially apparent in the years 1999 and 2002 in the validation period. On the other hand, a shorter calibrated recession constant would have resulted in a greater underestimation in low-flow periods, e.g., 1996. The modeled BFI is 0.60 for the calibration period and 0.62 for the validation period. Both values are greater than the estimated BFI of 0.46, which further indicates that the amount of baseflow is overestimated. The comparison of monthly statistics between the modeled baseflow and the separated baseflow is shown in Figure 18. In this figure, the greater BFI and slower decline in baseflow are evident as well. However, in general, the forms of the mean monthly baseflows do agree. The range of Min/Max values of the Eckhardt filter is greater in the validation period than the range of the modeled baseflows. The median monthly baseflow is shifted toward April rather than March. A more detailed overview of the climatic and hydrological values of solution 3468 is presented in Table 8. The discussed differences are in relation to the 30-year means given in [42]. The calibration period has less precipitation per year compared to the 30-year mean. Therefore, the total runoff, direct runoff (surface runoff and interflow) and groundwater recharge (baseflow) are lower in this period. The validation period is close to the 30-year mean with regard to precipitation (see Table 8). However, the model underestimates the actual evapotranspiration (ETA) by about 43 mm/a. The total runoff is 41 mm/a greater, and the direct runoff is 26 mm/a less. Baseflow is 67 mm/a greater and thereby accounts for the difference in total runoff and nearly half of the difference in direct runoff. Based on the characteristics of the German Low Mountain range, the direct runoff consists mostly of interflow [42], which is the case with the SMA. The high amount of baseflow in the validation period can be partially attributed to the wetter conditions compared to the calibration period but is mainly a result of the underestimation of ETA and interflow.
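The trade-off in the recession constant described above can be made concrete with a generic single linear reservoir. The sketch below is not the BlueM.Sim implementation; it assumes storage S = k·Q, a daily time step with constant seepage within each step, and hypothetical values for the seepage pulse and for the two recession constants.

```python
import numpy as np

def linear_reservoir(recharge, k_days, q0=0.0, dt=1.0):
    """Route seepage (mm/d) through one linear reservoir with S = k * Q.

    Uses the exact solution for input held constant within each time step.
    """
    c = np.exp(-dt / k_days)
    q = np.empty(len(recharge))
    q_prev = q0
    for i, r in enumerate(recharge):
        q_prev = q_prev * c + r * (1.0 - c)
        q[i] = q_prev
    return q

# Hypothetical seepage: ten wet days of 2 mm/d followed by 90 dry days.
recharge = np.concatenate([np.full(10, 2.0), np.zeros(90)])

q_fast = linear_reservoir(recharge, k_days=30.0)  # hypothetical recession constant
q_slow = linear_reservoir(recharge, k_days=60.0)  # hypothetical recession constant

for k, q in [(30.0, q_fast), (60.0, q_slow)]:
    half_life = k * np.log(2.0)  # days for the outflow to halve during recession
    print(f"k = {k:.0f} d: peak = {q.max():.3f} mm/d, "
          f"end = {q[-1]:.3f} mm/d, half-life = {half_life:.1f} d")
```

A larger k sustains more flow late in the recession but lowers and delays the response to seepage, while a smaller k does the opposite. A single constant therefore cannot serve both the slow declines of 1999 and 2002 and the low flows of 1996.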

Discussion
Selection of calibration and validation periods. Ref. [75] recommends including observed data from wet, average and dry years for good model calibration and validation. The analysis of annual precipitation amounts showed that within the 14 years of available data, hydrological conditions varied significantly. The period 1989-1997 was chosen for calibration, and it contained distinctive dry periods and average years. It included both the lowest (1992) and highest (1995) recorded flows. All remaining years (1998-2002) were used for model validation and consisted of average-to-wet years. In general, it can be concluded that the available 14 years encompassed dry, average and wet conditions and are therefore well suited for model calibration and validation.
Baseflow analysis from observed flow. Ref. [41] compared six different baseflow separation methods in FIS. Following the findings of [41], the Kille method and Eckhardt filter were used to derive the BFI and daily baseflow time series, respectively, in the calibration period. The BFI of 0.46 fits well into the range of plausible BFIs (0.4-0.5), as determined by [41]. The corresponding mean baseflow discharge rate is 3.65 L/(s km²). The separated daily baseflow values range from 0.02 to 0.6 m³/s. The highest separated baseflows typically occur in January-April. The lowest separated baseflows occur in June-October. The estimated monthly baseflow factors reflect this pattern, with the highest factor being determined for March (1.619) and the lowest obtained for September (0.431). Ref. [76] found that low flows in southwest Germany are typical in late autumn. The possible range of monthly baseflows using the estimated mean baseflow discharge rate is 0.06-0.21 m³/s. The narrower variation range is the result of aggregation from daily to monthly time steps.
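The separation and aggregation steps described above can be sketched as follows. The recursion is the standard two-parameter Eckhardt filter; the parameter values `a` and `bfi_max`, the initialization of the first value, the synthetic flow series and the normalization of the monthly factors by the overall mean baseflow are all assumptions for illustration and do not reproduce the settings of this study.

```python
import numpy as np

def eckhardt_filter(q, a=0.98, bfi_max=0.5):
    """Two-parameter Eckhardt recursive filter; baseflow is capped at total flow."""
    q = np.asarray(q, float)
    b = np.empty_like(q)
    b[0] = q[0] * bfi_max  # assumed initialization
    for t in range(1, q.size):
        b_t = ((1.0 - bfi_max) * a * b[t - 1]
               + (1.0 - a) * bfi_max * q[t]) / (1.0 - a * bfi_max)
        b[t] = min(b_t, q[t])
    return b

def monthly_factors(baseflow, months):
    """Mean baseflow of each calendar month divided by the overall mean (assumed)."""
    overall = baseflow.mean()
    return np.array([baseflow[months == m].mean() / overall for m in range(1, 13)])

# Two synthetic years of daily flow (m^3/s): seasonal cycle plus regular storm peaks.
days = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
months = np.tile(np.repeat(np.arange(1, 13), days), 2)
n = months.size
doy = np.arange(n) % 365 + 1
seasonal = 0.15 + 0.10 * np.cos(2.0 * np.pi * (doy - 60) / 365.0)
storms = np.where(np.arange(n) % 20 == 0, 0.5, 0.0)
q_total = seasonal + storms

baseflow = eckhardt_filter(q_total)
bfi = baseflow.sum() / q_total.sum()
factors = monthly_factors(baseflow, months)

print("BFI:", round(bfi, 2))
print("monthly factors:", np.round(factors, 2))
```

Averaging the daily series by calendar month smooths out individual recessions, which is consistent with the narrower range of monthly baseflows compared to the separated daily values.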
Factor Approach. In the calibration period, the FA was able to achieve an NSE and LnNSE in ranges of 0.44-0.69 and 0.39-0.66, respectively. Following the classification in [68], it therefore achieved unsatisfactory to good results for both NSE and LnNSE. ∆V ranged from −6.03 to 19.26%, or satisfactory to good. The APD, expressed as the percentage error in peak flow, that is, the absolute error between observed and simulated peak flows in percent, ranged from 0 to 36%, or satisfactory to very good. The SSE ranged from 103.3 to 187.9 (m³/s)².
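The three volume- and peak-related criteria can be written compactly. The sketch below assumes ∆V as the relative difference between simulated and observed volumes (negative for underestimation, in line with the sign used in this study), the percentage peak error as the absolute difference between the largest observed and the largest simulated flow regardless of when they occur, and the SSE as the sum of squared daily errors. The flow values are synthetic.

```python
import numpy as np

def volume_error_pct(obs, sim):
    """Delta V: relative volume difference in percent (negative = underestimation)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return (sim.sum() - obs.sum()) / obs.sum() * 100.0

def peak_error_pct(obs, sim):
    """Absolute difference of the maximum flows in percent; timing is ignored."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return abs(obs.max() - sim.max()) / obs.max() * 100.0

def sse(obs, sim):
    """Sum of squared errors, in (m^3/s)^2 for flows in m^3/s."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sum((obs - sim) ** 2))

# Synthetic flows (m^3/s): the simulated peak has the right size but is one day early.
obs = np.array([0.2, 0.3, 1.0, 0.5])
sim = np.array([0.2, 1.0, 0.3, 0.4])

dv = volume_error_pct(obs, sim)
apd = peak_error_pct(obs, sim)
err = sse(obs, sim)
print(f"Delta V = {dv:.1f} %, peak error = {apd:.1f} %, SSE = {err:.2f} (m^3/s)^2")
```

In this example, the peak error is zero although the peak is simulated on the wrong day, whereas the SSE is dominated by exactly that mismatch. A favorable peak error therefore has to be checked against the hydrograph before it is interpreted.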
The ratings of the favorite solution 9790 for calibration and validation are summarized in the following: -NSE (0. The decline in the NSE and the rise in the APD and SSE can mainly be attributed to the poor peak-flow simulation. Even though the APD is classified as very good, this criterion must be rejected as informative for the validation period, as the plot of observed and simulated hydrographs clearly shows that the peak flow timing is poor and that absolute peak flows do not occur at the same time. This is also reflected in the much lower NSE and higher SSE. A satisfactory LnNSE and a good ∆V in the validation period indicate that the FA can model baseflow in a satisfactory manner. The calibrated monthly factors fi follow the estimated fi well (see Figure 5).
However, the limitations of this approach are apparent because to achieve a satisfactory validation result, a solution had to be chosen that overestimated the baseflow component in the calibration period. In other words, a solution that is closer to reproducing average baseflow conditions had to be favored even though there were solutions (9934, see Figure 7) that achieved a similar LnNSE, had a better ∆V and better reproduced the FDC in the calibration period. The major drawback of the FA is that baseflow is not linked to a driving factor and therefore cannot respond to significantly different hydrological conditions.
The main strength of the FA is that it is relatively simple to implement. The data requirements are low, and monthly baseflow factors can potentially be regionalized using available flow gauges in the German Low Mountain range. This is already the case for mean low flow, mean flow and flood flows for specific return periods [36].
Soil Moisture Approach. In the calibration period, an NSE of 0.58-0.80 (satisfactory to very good) and LnNSE of 0.59-0.76 (satisfactory to very good) were achieved with the SMA. ∆V was within the range of 0.42-11.29%, corresponding to good to very good performance. The range of APD, again expressed as the percentage error of peak flow, was 0-52% (unsatisfactory to very good). The SSE ranged from 69.03 to 108.02 (m³/s)². The favorite solution 3468 achieved the following performance ratings for the calibration and validation periods, respectively: -NSE (0. the APD, the same conclusion must be reached as for the FA. Despite the good classification, the timing of the absolute peak values is poor. However, compared to the FA, the SMA proves to be robust in simulating both the calibration and validation periods with regard to the baseflow simulation. This is reflected in the similar LnNSE, ∆V and the FDC. In general, regarding the FDC, the SMA can reproduce the flow regime in the calibration and validation periods. The water balance components were found to be within plausible ranges compared to [42], and a comparison of the monthly baseflow statistics of simulated and separated baseflows showed that they agreed well.
The main strength of the SMA is that it can reproduce the flow regime under different hydrological conditions, especially as baseflow is linked to driving factors. The major disadvantage of the SMA is that the model setup is labor-intensive, the data requirements are extensive, and numerous parameters must be estimated and calibrated. These factors potentially limit the use of the SMA for the Hessian guideline.
Comparison with the results of other studies. The results of this study are compared to the results of several other studies in Table 9. The NSE and LnNSE were reported most frequently, and therefore, the comparison is made for these two GOFC. The studies included in Table 9 explicitly focus either on baseflow and/or on long-term continuous simulation with well-known hydrological models. The exception is the model GWN-BW used by [5], which is a spatially distributed soil-water balance model used in the federal state of Baden-Württemberg, Germany. From Table 9, it can be concluded that the NSE and LnNSE of the SMA are comparable to the achieved NSE and LnNSE of the other studies. The FA would be ranked at the lower end of the achieved goodness of fit but would still be within the range of the reported GOFC.
Future development of BlueM.Sim for baseflow modeling in hard-rock aquifers. While [41] concluded that a linear reservoir could be reasonably used in FIS, some nonlinearity was noted. Therefore, it is proposed that, in the ongoing development of BlueM.Sim, parallel linear reservoirs be implemented, as [5] found this to be the better model structure for baseflow in hard-rock aquifers. Ref. [79] modified the SWAT model to use parallel linear reservoirs and thereby improved the baseflow modeling results in the Manas River basin in the Tianshan Mountains, Northwest China. Ref. [82] found a two-reservoir structure to be best suited for fractured hard-rock aquifers as well. Furthermore, it is proposed to allow the use of the nonlinear reservoir, according to [57], for baseflow and interflow modeling, as it is currently only used for surface runoff. Hence, BlueM.Sim would be equipped with a wide range of possible model options and could thereby increase the interest in the BlueM software package.
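The difference between a single and a parallel reservoir structure can be sketched generically. The code below is not taken from BlueM.Sim, SWAT or any of the cited studies; the split fraction, the recession constants and the seepage pulse are hypothetical, and each reservoir is solved exactly for input held constant within a daily step.

```python
import numpy as np

def linear_reservoir(recharge, k_days, dt=1.0):
    """Outflow of one linear reservoir (S = k * Q), starting empty."""
    c = np.exp(-dt / k_days)
    q = np.empty(len(recharge))
    q_prev = 0.0
    for i, r in enumerate(recharge):
        q_prev = q_prev * c + r * (1.0 - c)
        q[i] = q_prev
    return q

def parallel_reservoirs(recharge, k_fast, k_slow, frac_fast):
    """Split seepage between a fast and a slow linear reservoir and sum the outflows."""
    r = np.asarray(recharge, float)
    return (linear_reservoir(frac_fast * r, k_fast)
            + linear_reservoir((1.0 - frac_fast) * r, k_slow))

# Hypothetical seepage: ten wet days of 2 mm/d followed by 90 dry days.
recharge = np.concatenate([np.full(10, 2.0), np.zeros(90)])

q_single = linear_reservoir(recharge, k_days=45.0)
q_parallel = parallel_reservoirs(recharge, k_fast=10.0, k_slow=120.0, frac_fast=0.5)

# Day-to-day recession ratios Q[t+1] / Q[t] after the seepage has stopped.
ratios_single = q_single[11:] / q_single[10:-1]
ratios_parallel = q_parallel[11:] / q_parallel[10:-1]

print("single:   first/last ratio =", round(ratios_single[0], 4), round(ratios_single[-1], 4))
print("parallel: first/last ratio =", round(ratios_parallel[0], 4), round(ratios_parallel[-1], 4))
```

The single reservoir recedes at a constant rate, whereas the parallel structure recedes quickly at first and slowly later. This is the kind of nonlinearity a single constant cannot reproduce.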

Limitations of this study
Neither the model nor the method for estimating inputs for the FA has been applied to other catchments specifically for the purpose of baseflow modeling. Therefore, future studies with BlueM.Sim should be conducted to verify the results of this study. However, as [40] concluded that FIS is a typical catchment for the German Low Mountain range, the general findings of this study are expected to hold for this region. Furthermore, other models, such as SWAT or HEC-HMS, should be tested in FIS to enable a direct comparison with BlueM.Sim.

Conclusions
To our knowledge, baseflow modeling with BlueM.Sim was evaluated for the first time in this study. The evaluation was conducted for a 14-year period in the Fischbach catchment (FIS) in the German Low Mountain range in Hesse, Germany. BlueM.Sim is part of an official toolset in the Hessian guideline regarding immissions into rivers. Two possible approaches were evaluated. The first approach (FA) combines a modified curve number method with a mean baseflow discharge rate that follows a yearly pattern on a monthly basis. The mean baseflow discharge was estimated using the Kille method. A daily baseflow time series was separated from observed flows using the Eckhardt filter. The average yearly pattern of the baseflow variation in FIS was estimated from this time series. The methods used to estimate the mean baseflow and the monthly factors for the FA proved successful in FIS and are applicable to the German Low Mountain range, as FIS is a representative catchment of this region. The FA is not recommended for future studies in which gradual impacts on baseflow due to climate change or land use are of interest, as there is no link between driving factors and the modeled baseflow. For the Hessian pollutant scenario, the calibration of simulated hydrographs is typically not possible due to the lack of observed data. Representing an average baseflow pattern would therefore be sufficient, and the inclusion of variations in baseflow throughout the year would lead to a more realistic representation of flow conditions in rivers.
The second approach (SMA) uses a physically based soil moisture routine in which seepage from the bottom soil layer is the input for baseflow modeling via a single linear reservoir. The SMA performed better than the FA over the entire 14-year period and achieved good results in baseflow modeling in FIS.
The SMA can be used for climate change and land use change impact studies in the German Low Mountain range, as baseflow is linked to driving factors via the soil moisture routine. The SMA would represent the most realistic baseflow representation with BlueM.Sim in the Hessian guideline. However, its practical application is hindered by extensive data requirements and the large number of parameters that need to be estimated and require calibration.

Figure 1. Map of FIS and locations of climate-measuring stations and rain and flow gauges.

Figure 5. Monthly factors fi used in the FA.

Figure 6. Separated daily baseflow using the Eckhardt filter and estimated baseflow pattern. Orange represents the mean observed flow, and red indicates the mean baseflow determined by the Kille method for the calibration period. The light-blue line is the daily baseflow series, as separated by the Eckhardt filter from the daily observed total flow in dark blue. The green line is the monthly mean baseflow pattern. Yellow and blue boxes mark dry and wet periods, respectively.

Figure 7. FA: comparison of three solutions from the set of Pareto-optimal solutions, calibration period.

Figure 8. FA: solution 9790 - plot of observed and simulated total flows, calibration period.

Figure 10. FA: solution 9790 - simulated baseflow vs. observed total flow in the first half of 1991. Precipitation is omitted, as it is not a driving factor for baseflow in FA.

Figure 11. FA: comparison of three solutions from the set of Pareto-optimal solutions, validation period.

Figure 12. FA: solution 9790 - plot of observed and simulated total flows, validation period.

Table 1. Overview of data availability and data resolution.

Table 2. Parameters selected for calibration, including starting values, variation ranges and calibrated parameters of selected solutions for FA and SMA.

Table 5. FA: solution 9790 - comparison of the water balance components with literature values.

Table 8. SMA: solution 3468 - comparison of the water balance components with literature values.

Table 9. GOFC of studies found in literature compared with GOFC in this study. * Baseflow/low-flow modeling was the aim of the study.