Qom—A New Hydrologic Prediction Model Enhanced with Multi-Objective Optimization

: The efﬁcient calibration of hydrologic models allows experts to evaluate past events in river basins, as well as to describe new scenarios and predict possible future ﬂoodings. A difﬁculty in this context is the need to adjust a large number of parameters in the model to reduce prediction errors. In this work, we address this issue with two complementary contributions. First, we propose a new lumped rainfall-runoff hydrologic model—called Qom—which is featured by a limited set of continuous decision variables associated with soil moisture and direct runoff. Qom allows to separate and quantify the volume of losses and excesses of the rainwater falling in a hydrographic basin, while a Clark’s model is used to determine output hydrograms. Second, we apply a multi-objective optimization approach to ﬁnd accurate calibrations of the model in a systematic and automatic way. The idea is to formulate the process as a bi-objective optimization problem where the Nash-Sutcliffe Efﬁciency coefﬁcient and percent bias have to be minimized, and to combine the results found by a set of metaheuristics used to solve it. For validation purposes, we apply our proposal in six hydrographic scenarios, comprising river basins located in Spain, USA, Brazil and Argentina. The proposed approach is shown to minimize prediction errors of simulated streamﬂows with regards to those observed in these real-world basins.


Introduction
The modeling over time of the volumes of rainfall collected in a watershed in terms of losses and excesses, quantifying the accumulation on the surface and the movement by gravity superficially to topographically lower areas is currently an indispensable task in hydrology, since it allows experts to efficiently project water management infrastructures and design ecological strategies. Examples of these applications include the generation of hydroelectric power facilities, water reservoirs for human consumption, agricultural and livestock exploitation, water quality management, and so forth. It can also be used as a base and support for other studies, such as the dissolution of pollutants in a river, the erosion and sedimentation in riverbeds and basins, the impact of urbanization on the increase of the waterproof surface, as well as the formulation of contingency plans for emergencies due to deficiencies (droughts) and excesses of water (floods).
However, realistic hydrologic models often require many parameters to be properly tuned to reduce prediction errors with regards to observed scenarios. Finding the most appropriate settings of these parameters can be formulated as a complex optimization problem [1]. This was outlined in early studies [2,3] and more recently in Reference [4], where the importance of using bio-inspired multi-objective approaches was emphasized due to their ability to explore the search space generated for each problem river basin scenario and for several objectives at the same time. The main contribution of this work is the combination of a new hydrologic model-called Qom-with the application of a multi-objective approach to find efficient calibrations of the model when applied to real-world scenarios. The motivation driving us is to provide an efficient tool, able to provide accurate predictions when applied to heterogeneous river basin real scenarios, considering different time periods and climatological conditions. Compared to other models, Qom is featured by requiring a small number of parameter to be tuned-maximum storage water volume on surface, maximum storage volume in the soil and volumetric conductivity coefficient. Qom allows the efficient separation and quantification of the volume of losses and excesses of the rainwater falling in a hydrographic basin. Qom considers antecedent moisture and it is applicable to urban and rural watersheds, for permeable and impermeable soils. In addition, it determines the runoff in three layers (two superficial ones and a groundwater one) and it considers simple and continuous rainfall events for long periods of time of several years, including dry and wet periods with very short time records from several minutes to one day. A Clark's [5] model is then used to obtain predicted output hydrograms that include the transit of excesses with the effects of delays and storage.
The calibration of Qom is carried out by formulating it as a bi-objective optimization problem, which is solved with the combination of a number of bio-inspired algorithms belonging to metaheuristics, a family of non-exact optimization techniques [6]. We have taken seven algorithms that are representative of the state-of-the-art, namely NSGAII [7], OMOPSO [8], SMSEMOA [9], MOEAD [10], AbYSS [11], SMPSO [12] and MOCell [13]. These algorithms are provided by the jMetal framework for multi-objective optimization with metaheuristics [14], an open-source project, developed in the Java programming language, and released under an MIT (Massachusetts Institute of Technology) licence. Qom has also been implemented in Java to be integrated into jMetal as a black-box optimization problem, with vectors of decision variables as inputs and objective function values as outputs. The version of jMetal including Qom as well as all the utilities and scripts (most of them implemented in R) we have used in this work, are freely available to the community of interested users.
For validation purposes, six realistic hydrographic scenarios comprising river basins, located in different regions of Spain, USA, Brazil and Argentina, have been modeled and evaluated with different size conditions, climates, topographies, heterogeneous soils and with a series of very short time periods of 10, 15, 30 and 60 min and 24 h. The results are mathematically validated by three gauges parameters and exponential functions that relate evapotranspiration, surface stored water, water stored in the soil with the infiltration and percolation processes. The conducted experimentation shows that the combination of Qom with the multi-objective based calibration procedure can provide experts with successful trade-off solutions, which minimize the prediction errors of simulated streamflows with regards to those observed in real-world basins.
The remainder of this paper is organized as follows. Section 2 provides the algorithmic background. Section 3 presents a series of related works in the state-of-the-art. In Section 4, the proposed Qom hydrologic model is detailed and formulated in the form of multi-objective optimization problem. Experimental results and discussions are presented in Section 5. In Section 6, a series of discussions have been included. Finally, conclusions and future work are commented in Section 7.

Background on Multi-Objective Optimization with Metaheuristics
This section is devoted to providing a basic background of multi-objective optimization with metaheuristics, which is needed to understand the calibration scheme we propose.
Many real-world optimization problems can be formulated as the maximization and/or minimization of two or more conflicting functions or objectives at the same time. These problems are known as multi-objective optimization problems, and they can be found in many disciplines such as civil engineering [15], economics [16], telecommunications [17], bioinformatics [18,19], agriculture [20], and so forth. A multi-objective optimization problem can be defined as follows (we assume minimization of all the functions, without loss of generality): Definition 1 (Multi-objective optimization problem). A multi-objective optimization problem is a tuple (S, f , g, h), where S = ∅ is called the solution space (or search space), f = [ f 1 , f 2 , . . . , f k ], with k >= 2, is a vector of functions, where f i : S → R, are the objective functions and g = [g 1 , g 2 , . . . , g m ] and h = [h 1 , h 2 , . . . , h p ] are also vectors of functions, where g i : S → R and h i : S → R are the constraint functions. Thus, solving an optimization problem consists of finding a set of solutions X * ⊆ S such that, for all x * ∈ X * : for some 1 ≤ j ≤ k, subject to: where g i , h j : S → R, i = 1, ..., m, j = 1, ..., p are the constraint functions of the problem.
The set of all values satisfying all the equality and inequality constraints defines the feasible region Ω, so any point x ∈ Ω is a feasible solution. For simplicity, in the following definitions we will consider Ω = S.
A key difference between mono-and multi-objective optimization is that in the latter, vectors of values (instead of a single value) must be compared to determine which is the best of them. In this sense, a basic concept in multi-objective optimization is Pareto dominance, which is defined as follows: Definition 2 (Pareto dominance). Given two vectors x, y ∈ R k , we say that x ≤ y if x i ≤ y i for i = 1, ..., k, and that x dominates y (denoted by x ≺ y) if x ≤ y and x = y.
Given two solutions in objective space, Pareto dominance indicates whether one dominates the other one or, on the contrary, that the compared solutions are non-dominated, that is, none of them is strictly better than the other in all the objective values. This way, solving the multi-objective problem can be viewed as the process of finding the set of solutions that dominates every other point in the solution space and all the solutions of that set is said to be Pareto optimal for that problem. Formally:

Definition 3 (Pareto Optimality).
A solution x * ∈ S is Pareto optimal if it is non-dominated by any other solution x ∈ S.
The set of solutions composed of all the Pareto optimal solutions is known as Pareto optimal set (or Pareto set): Definition 4 (Pareto Optimal Set). The Pareto Optimal Set P * is defined as: The solutions in the Pareto set belong to the variable space (S), and their correspondence in the objective space (R k ) is a set known as Pareto front: Definition 5 (Pareto Front). The Pareto Front P F * is defined by: The solutions in the Pareto front are usually referred to as non inferior, acceptable or efficient. The Pareto front is also known in some contexts as efficient frontier.
From these definitions, we can conclude that the main goal of multi-objective optimization is to find the Pareto set of a given problem so that an expert in the problem domain can choose one or more solutions from the corresponding Pareto front according to some preferences. However, when dealing with real-world problems finding the Pareto front can be unpractical for a number of reasons [21] affecting the objective functions, such as non-linear linearity, NP-hard complexity, epistasis, and so forth. Furthermore, a Pareto front can contain a very large (or even an infinite) number of solutions. So, in practice, the goal usually is to find a high quality approximation of the true Pareto front containing a limited number of solutions (e.g., 50, 100, 500). The quality of these approximations is measured in terms of two properties-convergence, which implies that they must be as close as possible to the Pareto front, and diversity, which means they are uniformly spread.
The concept of Pareto front approximation is illustrated in Figure 1, which includes the solutions found by an optimization algorithm when solving a multi-objective problem having two objective functions (named f1 and f2 in the figure). They are trade-off solutions in the sense that neither is better than the other in the two objectives. The true Pareto front in the example, depicted as a continuous line, has a convex shape and it can be observed that some of the obtained solutions do not completely overlap with the line (i.e., they have not fully converged to the Pareto front, so they are not optimal solutions) and the set of solutions covers all the Pareto front although it does not have a perfect spread out. The most popular techniques to deal with multi-objective problems are metaheuristics [22], a family of non-exact, stochastic optimization algorithms which do not guarantee to find the optimum of a problem, but do usually provide high quality solutions (near optimal and sometimes optimal) within a reasonable amount of time and resources. There are different ways of classifying metaheuristics, one of which distinguishes between bioinspired (or nature-inspired) and non-bioinspired algorithms [6]. The first category is by far the most well known and used, including techniques such as evolutionary algorithms, particle swarm optimization, or ant colony optimization. The reference algorithm in multi-objective optimization, NSGA-II [7], is an evolutionary algorithm.
A metaheuristic typically follows an iterative process in which a set of tentative solutions P is manipulated somehow by a number of variation operators, aiming at progressively generating better solutions, as shown in the pseudo-code in Algorithm 1. In the case of evolutionary algorithms, the solutions and the set P are named, respectively, individuals and population, and the variation operators are crossover and mutation. The new produced solutions are evaluated to compute their corresponding objective function values and they are used to update the population. The main loop of the algorithm is executed until a stopping condition is met.
Algorithm 1 Template of a metaheuristic. 1: Set control parameters 2: P(0) ← GenerateInitialSolutions() 3: t ← 0 4: Evaluate(P(0)) 5: while not StoppingCriterion( ) do 6: Q(t) ← Variation(P(t)) 7: Evaluate(Q(t)) 8: t ← t + 1 10: end while 11: Return P(N) A reason for the popularity of metaheuristics is the development of software frameworks that provide algorithms that are representative of the state of the art plus benchmark problems and utilities for performance assessment [23]. As commented in the introduction, in this work we use jMetal, a Java-based framework for multi-objective optimization with metaheuristics.

Related Work
Since the last decade, the optimal parameter calibration of hydrologic models has been tackled with mono-and multi-objective metaheuristic algorithms, usually providing successful results for different approaches of decision variables and objective functions. In particular, two early surveys [2,3] outlined the importance of using bio-inspired multi-objective approaches, as they are specially adapted to work as black-box optimizers with capabilities to explore the search space generated for each problem river basin scenario and for several objectives at the same time. In this regard, an interesting proposal was presented in Reference [24], where a sensitivity analysis and automatic calibration of a rainfall-runoff NedborAfstromnings model (NAM) [25] was carried out with nine decision variables to measure the quantitative and qualitative variation of results. In this work, an Elitist Non-dominated Sorting Differential Evolution (NSDE) [26] algorithm was used to optimize two objective functions consisting on Average Root Mean Squared-Error (RMSE) of peak and low flow Events.
In Reference [27], a strategy that combined the principles of multi-objective optimization and depth-based sampling was presented with the title of Multi-Objective Robust Parameter Estimation (MOROPE). This algorithm applied multi-objective optimization to identify non-dominated robust model parameter vectors for the calibration of a distributed hydrologic model with a focus on flood events in a small pre-alpine and fast responding catchment in Switzerland. A thorough comparison was conducted in Reference [28] with a series of popular multi-objective algorithms (Borg, AMALGAM, GDE3, NSGA-II, OMOPSO, and SPEA2) for the calibration of the HBV (Hydrologiska Byråns Vattenbalansavdelning) conceptual rainfall-runoff model [29] with 14 real-valued decision variables and considering four objective functions. In this study, Borg and OMOPSO obtained, in overall, the best performance for the tackled instances. Also in 2013, the work presented in Reference [30] was centered on a river basin according to the distributed hydrologic model HEC-HSM (Hydrologic Engineering Center -Hydrologic Modeling System, https://www.hec.usace.army.mil/), for which a number of 10 decision variables and 4 objectives were approached with a multi-objective particle swarm optimization algorithm.
More recently, several variable balancing approaches for the exploration and exploitation of the NSGA-II, in the automatic parameter calibration of a HYdrological MODel (HYMOD) were evaluated in Reference [31]. These balancing approaches were compared with traditional static balancing methods (the two values are fixed during optimization) in a benchmark hydrological calibration problem for the Leaf River (1950 km 2 ) near Collins, Mississippi.
Another related study is found in Reference [32], where authors analyzed how adding daily Total Water Storage (dTWS) derived from the Gravity Recovery And Climate Experiment (GRACE) as extra observations, besides the traditionally used runoff data, improved the calibration of a conceptual hydrological model within the Danube River Basin. As calibration approach, four popular evolutionary optimization techniques (NSGA-II, MOPSO, PESA-II, SPEA2) were tested to calibrate the model. Results indicated that NSGA-II performed better than the other techniques for calibrating GR4J using GRACE dTWS and in situ runoff data.
A common aspect in all these works is the use of already existing multi-objective optimizers (NSGA-II, SPEA2, MOPSO, etc.) with well-known hydrologic models usually requiring large number of parameters (HBV, HEC-HSM, GRACE, etc.), trying to find the best performing algorithm. Our proposal differs not only in the definition of a novel hydrologic model (Qom), but also in how the calibration with metaheuristics is conducted. Instead of looking for a particular technique, we take the practical approach of combining the results of seven algorithms, configured with commonly used settings, with the idea of combining the best solutions provided by all of them.

Hydrologic Calibration Strategy
In this section, the complete calibration strategy of the hydrologic model is detailed. First, the Qom model is explained with the main parameters and formulations. Then, the model calibration is formulated as a multi-objective optimization problem, so its objective functions and constraints are defined. Finally, the optimization approach is described from the perspective of software and computational issues.

Qom Model
The Qom model considers moisture, evapotranspiration, surface stored water and infiltration, and percolation processes, that is, those actions that cause water storage in the soil. It is applicable to permeable and impermeable soils and it determines the runoff in three layers, two of them superficial (permeable and impervious soils) and one another of underground flow. In order to determine the outlet flows in a given place in the basin in response to rainfall, we separate the total volumes of rainfall into volumes of losses and excesses discreetly over time, simulating for excess water the way in which runoff occurs, considering the transit and delay in the basin.
Therefore, besides the rainfall data and morphometric characteristics of the basin, the model needs three initial tuning parameters, namely the maximum storage water volume on surface Rmax, the maximum storage volume in the soil Smax and the volumetric conductivity coefficient VC, which represents the volume of water that would move in three-dimensions in the soil towards the exit per unit of surface in a certain time, expressed in volume of water sheet per unit of time. These variables are emphasized in red color in the flow chart of Figure 2, which represents the mathematical model that corresponds to the equations commented in lines (1) to (12) of the this figure, involving the hydrologic natural processes of Qom. The computation cycle starts obtaining those required data from the problem specification given from the scenario instance, as well as the parameters (decision variables), to calculate the soil moisture states. A part of the rainfall defined with the variable P in the unity of the volume of the water sheet. Retentions in the basin are produced by surface depressions of the soil R, by the infiltration of water in the porosity of the soil S and percolates in permeable soils D. Another part of the volume is lost in the atmosphere through evaporation and plant transpiration EVT. The rest of the volumes would leave the basin superficially by a portion of impermeable areas Vi and permeable ones V p. In addition, another part Vg of the volume can be percolated into deep layers remaining in the basin as soil moisture or as a mass of water that will be added to the underground outflow of the river.
Clark's method (runoff Vi, Vp, Vg) CT,Shape,Eq Flow Translation (Synthetic Histogram) Flow Attenuation (Simple Linear Reservoir) Potential evapotranspiration cover with excess water on the surface WITHOUT RUNOFF Total evapotranspiration of surface water Surface water accumulation with runoff Surface water accumulation without runoff  Drainage volumes: Vi on impermeable soil (1st layer) Vp in permeable soil (2nd layer). Vg of groundwater (3rd layer) or according to consideration storage of humidity in soil (2nd layer).
If there are more records, the process is repeated.
False True An additional fraction in evapotranspiration is considered WITH RUNOFF When the Qom process is finished, the Vi, Vp and Vg volumes are passed to model the hydrograms. Clark's model is then used for modeling direct runoff of watershed rainfall response. It is based on the concept that instantaneous unit hydrograph can be derived by routing the unit excess rainfall in the form of a time-area histogram (translation) through a single linear reservoir (attenuation). As a consequence of the application of this model, a new set of decision variables is taken into account for the calibration of the model, comprising: Ki, Kp, Kg (emphasized in blue in Figure 3) and additionally the variables CT, SC and Eq can be enabled and considered (emphasized in grey); which leads to obtaining the solution vector Qe that includes the estimated water flow as formalized in Equation (4).
A description of all the terms and factors used in equations of Qom model can be found in Table 1. The coding of the solution consists of a real value vector that includes each decision variable as a parameter for calibration, three decision variables namely: Rmax, Smax and VC, which represent the soil moisture estimated by Qom. Finally, in order to estimate the output flow Qe through Clark, we need 3 other decision variables: Ki, Kp, Kg, alternatively CT, Shape, Eq. Actual data values Qo and those estimated by the computation of Qe are used to calculate prediction errors in the two objective functions that will be optimized, as explained next. Table 1. Description of the terms in flowchart of Figure 2 and in Equations.

Multi-Objective Formulation
To measure the quality of solutions obtained thorough the optimization process, a widely used statistic metric in hydrology is the Nash-Sutcliffe Efficiency coefficient (NSE) [33], which is computed by means of Equation (5). It is a normalized metric that determines the relative magnitude of the residual variance compared to the measured data variance, evaluating the model fit primarily during high-flow periods. NSE ranges between −∞ and 1.0, with NSE = 1.0 being the optimal value. Values between 0.0 and 1.0 are generally viewed as acceptable levels of performance, whereas values < 0.0 indicate that the mean observed value is a better predictor than the simulated one, hence leading to unacceptable performance.
Based on this standard measure, our multi-objective approach is focused on minimizing two objective functions f 1 and f 2 , by taking into account the time-series data from both, observed and estimated water flows in a given hydrologic scenario. The first objective function is based on NSE, although using the formulation f 1 = 1 − NSE, to promote minimization (optimum in 0.0, jMetal assumes that all the objective functions are to be minimized.) The second objective function is based on the percent bias (PBI AS) [34], which is formulated as f 2 = Abs(PBI AS) and measures the average tendency of the estimated data to be larger or smaller than their observed counterparts. PBI AS is calculated as shown in Equation (6), with optimum value in 0.0. Low-magnitude, positive, and negative values indicate, respectively, accurate model estimation, model underestimation bias and model overestimation bias.
The problem formulation includes constraints based on the relative error in terms of the percentages RE, with AE being the absolute error between observed and estimated values according to Equation (7). A constraint considers the mass balance function MB (water balance or soil moisture accounting) that measures the river basin drainage with Equation (8) [ UNESCO, 1971] for bounding the relative error MBRE. Other two constraints are defined for bounding the fitting errors in hyetograms as well as in losses and excesses volumes. In this work, solutions are constrained to

Optimization Approach
Once we have formulated the calibration of the Qom model as a multi-objective optimization problem, the next step is to solve it to find an accurate approximation of the Pareto front and then take from it that solution providing the best overall result. Determining the best metaheuristic for a given problem instance is far from being a trivial process; thus, the No Free Lunch Theorem [35] states that all optimization algorithms that search for a particular problem of a class perform the same when averaged over all the problems in the class.
The experimental methodology we follow is not aimed at finding or designing a particular metaheuristic for the automatic calibration of Qom, but to make use of a set of existing algorithms and use all of them to get a Pareto front approximation. We want to avoid spending time in tuning the control parameters of the algorithms, so they will be configured with standard settings. Our approach is based on the assumption that each algorithm can efficiently explore a particular region of the search space, so running all of them independently and joining the results they provide is likely to yield a satisfactory set of solutions. To this end, we have selected a set of representative multi-objective metaheuristics which are included in the jMetal framework, namely: NSGAII [7], OMOPSO [8], SMS-EMOA [9], MOEA/D [10], AbYSS [11], SMPSO [12] and MOCell [13].
The scheme we follow is illustrated in Figure 3. The starting point is, for each riven basin, a dataset comprising 5 data files. A file with extension *.basin contains morphometric features with static parameters to characterize it, such as area, shape, percentage of permeable area %p, initial surface and soil moisture (Ro and So, respectively), start and end dates of analysis, time slot between registers ∆t, and so forth. File *.var includes decision variables representing calibration parameters, as well as upper and lower bounds for algorithmic search. Information about total precipitations P is set in file *.rain, output flow data Qo are in file with extension *.Qo and potential evapotranspiration data PEVT are in file *.pevt. These files register observed data for each time period ∆t. For each river basin, a problem instance is defined according to two or more series of data. One of these series is used for multi-objective optimization (calibration) and the remaining ones are used for validation, with regards to optimized solutions. All the used algorithms are configured to find a bounded set of solutions after performing a maximum number of evaluations over the problem calibration data. When they finish, the results are stored in two files, named VAR and FUN, which contain, respectively, the found solutions (i.e., the values of the decision variables) and the corresponding Pareto front approximation (i.e., the objective function values). A number of 30 independent runs for configuration have been executed, and all the solutions produced by all the algorithms have been joined, so after removing the dominated solutions a set of non-dominated solutions is obtained. This set is used as a reference front.
After the optimization process, the next step is the decision making, consisting in the selection of one solution in the reference front that will allow the expert to compute an optimized prediction of the soil moisture state, volumes of looses and excesses and hydrographs. This step is described in the next section.

Experimentation
In this section, we first describe the scenarios used for validating our model. In particular, we have taken six problem instances involving river basins in Spain, USA, Brazil and Argentina. One of the goals of our proposal is to make it applicable to a wide range of scenarios. For that reason, the model has been tested with a number of river basins with different features of topography, land use in the region, and weather conditions. Then we detail the strategy followed to select a solution from the reference front obtained in the optimization phase. Finally, we analyze the results obtained when the model is configured with certain solution values selected (decision variables) and it is applied to the problem instances.

Problem Instances
Next we describe the river basin problems, including information about the datasets (a description of the main features of the river basins is included in Appendix A): 1.
Asua river basin, Bizkaia, Spain (73.44 km 2 , ∆ t = 24 h). The time period of registered data comprises from 6 June 2005 to 30 September 2009 and required the split in 4 different series due to the discontinuity of these data. Therefore, one of these subsets was used for model calibration, while the other three were employed for validation as independent test sets. These validation data sum up 2462 registries and they are annotated as continuous samples with time slot ∆ t = 24 h.
Problem instances and their data subseries were identified by an alphanumeric code of 4 characters. The first number in this code indicates the river basin with specific morphometric features; the second number (separated with dot) is the data series of different time periods; the third character could be "C" to indicate continuous samples with several storming events or "S" meaning a single event (one storm); the fourth character could be "C" to indicate this series is used for calibration (optimization) or "V" to specify the data series is used for validation as external test set. For example, instance 1.1CC corresponds to Asua river basin data series 1 with continuous samples that are used for calibration.

Solution Selection Strategy
As commented before, the results of optimizing each problem with the seven multi-objective metaheuristics constitute a reference front composed of the non-dominated solutions produced by all the algorithms. Now, we proceed to describe and apply a solution strategy to select from the reference front that solution providing the best trade-off, according to the two objective functions, to properly set the Qom model with a high accurate prediction capability. Figure 4a,b show the reference fronts obtained for problem instances 1 and 2, respectively, including information about the number and percentage of solutions contributed by each algorithm to the reference front. We use a different combination of symbols and colors for each solution to associate to the algorithm that produced it.
We start by analyzing the Asua river basin (a) with calibration series 1.1CC. We observe that all the algorithms have contributed to the reference front, with SMS-EMOA being the one contributing with more solutions by far (70.81%), followed by MOEA/D (12.69%), NSGA-II (6.25%), AbYSS (5.26%), MOCell (2.76%), OMOPSO (1.64%), and SMPSO (0.58%). However, it is worth noting that the solutions in the extreme regions of the reference front have been obtained by algorithm AbYSS (green points with symbol ×). We observe also that the solutions of MOEA/D (blue points with symbol ) are concentrated on the bottom part of the front, which corresponds to higher values of the first objective (NSE) and lower values of the second objective function (PBIAS). The interesting fact is that all the algorithms complement each other, as the reference front has no gaps.
We focus now on the numerical dispersion of results obtained for the calibration series with regards to validation ones on the Asua river data. Figure 5 includes the boxplots of distribution of results for each objective function and four data series of the problem. We observe that there are not high differences between calibration and validation. For objective function f1 (1-NSE measure), quarterlies of series 1.1CC, 1.3CV and 1.4CV are in the range of 0.9 and 0.7 (NSE = 1 − f1) and only 1.2CV shows higher dispersion with quarterlies 0.57 and 0.47, but without outliers. These results are classified in the reference literature [38] from very good to excellent. In the case of PBIAS (f2), series 1.1CC, 1.2CV and 1.3CV show predicted results between 0% and 25% also without outliers, whereas for validation series 1.4CV, results alternate from 45% to 60%, but with low dispersion between maximum and minimum limits. As AbYSS yielded the extreme solutions in the calibration, we have outlined the solutions having the lowest values for the two objectives in both, the calibration and validation results. We observe that AbYSS also gets the lowest values in all the cases but one, the 1.3CV dataset with the second objective function.   Keeping this in mind, we can now plot the different prediction error distributions of the two objective function values (1-NSE versus PBIAS), to show the influence of calibrated variables when applied to validation data series. This way, as shown in Figure 6 for problem instance 1 (Asua river basin), calibrated solutions remain accurate when applied to the corresponding validation data, although better results are obtained in general when selecting the best solution in 1.1CC according to f2 (Figure 6 right). It can be argued that, for this specific problem instance, we can select the vector of decision variables (solution) that sets the Qom model with higher precision for three validation series. This strategy has been followed for all the problem instances worked in this study with similar results, hence making the solution selection easier in the decision making process. We now analyze instance 2 (Artivai river basin) for series 2.1CC, the reference front of which is plotted in Figure 4b. In this case, SMS-EMOA is again the multi-objective calibrator with a higher percentage of solutions contributing to this front approximation with 98.97% and also with low dispersion in objective functions (variation of f 1 ∈ [0, 0.08]). However, according to the previous strategy, after validation of series 2.1CV the most accurate solution is provided by MOEA/D, which only contributes with 1.03% to the reference front. This is an interesting result that can be explained by checking decision variables Rmax, Smax and VC, as shown in Figure 7. In these graphs, the ranges of these two variables are plotted with regards to the corresponding solutions in the reference front, so for SMS-EMOA they do not exceed a value of 0.5 mm. Conversely, for MOEA/D solutions, the maximum values of Rmax and Smax are 3 mm and 30 mm, respectively. This way, the spectrum of accurate solutions is expanded to those located at the extremes of the reference front, as plotted in Figure 4b. This leads us to suggest that this solution selection strategy does not necessarily depend on a given outperforming algorithm (SMS-EMOA in this case), but on a validation methodology with an external series of data, that provide trade-off values in decision variables.

Results After Choosing Solutions
Once the resulting solutions are selected after calibration and validation, their corresponding decision variables are registered and analyzed in terms of the hyetographs and hydrograms, where the volumes of losses and excesses, as well as the fitting error of observed versus estimated data series are plotted. These decision variables are shown in Table 2 for each problem instance, which have been used to set the Qom model.

Problem Instance
Rmax (mm) Smax (mm) V C (mm/h) 1 The corresponding NSE and PBIAS obtained by using these variables on the different calibration and validation data series are shown in Table 3, where additional information concerning the time period of observation, time slot of registration (10 min, 15 min, 1 h, and 24 h) and the number of registries is also provided. In general, values of NSE ≥ 0.75 and PBIAS ≤ 10% are obtained for calibration series, while NSE ≥ 0.6 and PBIAS ≤ 25% can be observed for validation series of data.
In accordance with reference studies in the literature [39], where registration periods are conducted monthly, values of NSE ≥ 0.50 and PBIAS ≤ 25% can be considered as satisfactory. In the light of these results, we can claim that the prediction errors obtained by our proposed Qom model based on multi-objective calibration are more than satisfactory, since it is able to deal with complex problem instances with observation time periods significantly lower than 1 month, as done in Reference [39]. More in depth, when focusing on a specific solution for a river basin, a series of hydrologic results are extracted, which leads the expert to define a hydrology engineering project. An example of these values are shown in Tables 4 and 5, which include the specific mass balance obtained by Qom-Clark and estimated peak values with relative errors for problem instance 6.2CV (Rio Bermejo, Argentina), and data period 1971 09:00 to 3 April 2014 09:00 step 24.0 h. From a visual perspective, it is possible to check the quality of the hydrographic models obtained by plotting the hyetographs, which represent the humidity losses (HL) and excesses (HE) values from input data, together with the hydrographs, with observed (QO) and estimated (QE) outputs. In this way, Figure 8 shows the resulting rainfall and streamflow by means of the hyetograph and hydrograph of the 1.1CC (Asua river basin) calibration data series, where we can observe the high correspondence between water losses/excesses and the predicted series (estimated data), which indeed fits the curve of observed data. For validation data series 1.3CV, a similar plot is made in Figure 9 to better show the precision in estimated streamflow (QE), so even for external data (not used in calibration) a high prediction can be reached with the proposed approach.  A similar observation can be extracted from problem instance 2 (Artibai reiver basin), for which Figures 10 and 11 show the corresponding hyetographs and hydrographs of calibration and validation. In this case, although the amount of data used for calibration is lower than for validation, the precision power of the model is satisfactory, as changes in streamflow are usually detected and modeled for several years.
With respect to the San Antonio River, two hydrometric stations were considered, in Loop 410 and Goliad. The calibration and validation hietograms and hydrograms for the two stations are represented in Figures 12-15, respectively.
It is worth noting that for San Antonio and Juquiá problem instances, similar hydrographs are obtained (see Figures 16 and 17), which lead us to suggest that Qom is useful and accurate when applied to multiple and heterogeneous river basins and time periods.
The hyetograph and hydrograph of the basin that belongs to Balapuca (Bermejo river) are shown in Figure 18. In this regard, an interesting comparison can be also extracted from the duration curve of the observed and estimated flows in Figure 19. It is another way to evaluate the behavior of the simulation model and the parameters, allowing to appreciate for which groups of flows better adjustments were obtained.    Resulting hydrographs in Figures 8 and 19 reveal that observed streamflows are properly fitted by estimated ones in different problem instances and time periods.

Discussion
The main goal driving us in this work has been to provide a practical tool for hydrologic prediction based on the combination of two components-the Qom model and its automatic calibration using multi-objective optimization algorithms. The results reported in the previous section indicate that the Qom model is stable for a set of heterogeneous scenarios with different conditions and it does not produce large changes to small variations of the parameters. Therefore, it can be considered an adequate model for its use in the basin hydrological risk estimation, in basin climate change assessments, and in changes by use or soil practices.
A question that can arise is why have we proposed a new hydrologic model instead of using an existing one? The reason is related to the expertise of the paper authors, comprising hydraulic and computer science engineers. We are the creators of the jMetal software framework for multi-objective optimization metaheuristics. jMetal includes a large amount of algorithms representative of the state of the art but, as pointed out in Section 4.3, choosing which is the most promising one to solve a given optimization problem is not a easy task, in particular for experts in the application domain, which are usually not experts in the optimization techniques.
In this sense, a solution selection strategy is also suggested for guiding the expert in hydrology in the decision making process, which comprises the use of independent data series for calibration, as well as for validation, each of them involving different time periods of a given problem instance. This leads the proposal to select successful solutions that could be generated by calibration algorithms with low contribution in the Pareto reference front approximation, but with very stable results in terms of practical validation.
Our starting point was to use jMetal for the calibration of a hydrologic model; although this algorithmic library is implemented in the Java programming language, it hinders using existing models; for example, the Sacramental Soil Moisture Accounting (SAC-SMA) model is available in FORTRAN, Matlab, and R, but to the best of our knowledge there is no a Java version. Consequently, from our experiences in water resources we decided to define a new model-Qom-based on a reduced number of parameters to be adjusted, to implement it in Java, and integrating it into jMetal to use a multi-objective approach for automatic calibration, all aimed at providing a generic and accurate software tool that is freely available to the community.
Our objective in this paper has not been to compare Qom with existing models, nor to compare the automatic calibration scheme with existing strategies. These issues remain open and are included as lines of future work in the next section.

Conclusions
In this work, a novel hydrologic model called Qom is proposed to efficiently separate and quantify the volumes of losses and excesses of the rainwater falling in a hydrographic basin. In combination with it, an evolutionary multi-objective approach is used for parameter calibration focusing on indicators to qualify the adjustments between the observed and estimated hydrograms. A number of multi-objective algorithms in the scope of the jMetal framework are used to deal with the hydrologic model as a black-box optimization problem, with a vector of decision variables as inputs and objective function values as outputs.
The Qom model, featuring a reduced set of parameters, together with the possibility of using existing multi-objective optimizers without the need to tune each of them, just combining the solutions they produce, make this proposal of special interest for both academy and industry communities.
For testing purposes, six realistic hydrographic scenarios, comprising river basins located in Spain, USA, Brazil and Argentina, have been modeled and evaluated with different size conditions, climates, topographies, heterogeneous soils, and with series of very short time periods of 10, 15, 30 and 60 min and 24 h. The results indicate that the calibration of Qom of each problem allow to predict future storm flows as it is possible of accurately reproduce the phenomenon rain-runoff (cause-effect), thus fulfilling the goal of our work.
In general, the NSE and PBIAS obtained by Qom involved highly successful values of roughly 75% and 10% (respectively) in calibration, as well as in validation. This could be explained in terms of low errors of the predicted series with regards to observed ones, which can be classified as very satisfactory according to Reference [39]. Moreover, these results could even be improved if larger time periods of registration are considered, as we measure changing conditions from 10 min to 24 h, whereas references in the literature worked within monthly periods [39].
As lines of future work, an open issue to explore is the analysis of the Qom model when compared with other hydrological models, such as SWAT, VIC or SAC-SMA. Our proposal to combine different multi-objective metaheuristics can be time consuming, as a large number of independent runs of configurations of the algorithm/problem must be carried out. It would be more efficient to explore the use of automatic parameter tuning tools to determine if a single algorithm could be good enough. In this sense, a further step would be to self-generate an ad-hoc metaheuristic suited for the calibration of Qom. In both cases, the idea would always be to use automatic tools to avoid the expert to deal with algorithmic issues. The performance of the resulting method would be validated by comparing it with other parameter estimation techniques.
Author Contributions: G.R.Z. has contributed to the conceptualization, methodology, software development and validation; J.G.-N. has contributed to methodology, formal analysis, writing-original draft preparation, writing-review, and editing and visualization; A.J.N. has contributed to supervision, software development, writing-review, and editing. All authors have read and agreed to the published version of the manuscript.
Funding: This work has been partially funded by Grants TIN2017-86049-R (Spanish Ministry of Education and Science). José García Nieto is the recipient of a Post-Doctoral fellowship of "Captación de Talento para la Investigación" Plan Propio at University of Málaga.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. River Basins Description
A description of the main features of the studied river basins is included next: • Asua river basin, Bizkaia, Spain. It is located in the north of the Iberian Peninsula. The climate is temperate, oceanic, with frequent and abundant rainfall, especially in autumn and winter, with an annual average of 1200 mm. It is a rural basin comprising a drainage area equal to 50.8 km 2 , corresponding to the Derio (meteorological) and San Groniz (capacity) stations, with slopes not exceeding 1%. It is one of the most open valleys in Bizkaia. It is bordered by mountains of low altitude without exceeding 360 m in height. The lands through which the courses of this basin run are mainly made up of marl and limestone, crossing the main river alluvial lands from the middle stretch to the mouth.The shape can resemble a rectangular figure of proportions 1.5 to 1 with respect to length and width. • Artibai, Bizkaia, Spain. The Artibai river is about 20 km long, and extends in a S-NE direction. Its origin are two groups of streams from mountains between 1029 m 793 m altitude. The geological substratum of the basin is predominantly limestone at the head and later sandstone and clay. The fluvial bed is stony with a predominance of blocks or boulders both in the streams and in the main channel. The shape can resemble a rectangular figure of proportions 2 to 1 with respect to length and width. • San Antonio river sub-basin, station Loop 410, Texas, USA. It is a sub-urban basin, having an average slope of 0.2%, and a climate that alternates between dry and humid, summers are hot and winters vary from mild to cold; the average annual rainfall is 738 mm. It can resemble a rectangular shape of proportions 2 to 1 in relation to length and width. • San Antonio river basin, station Goliad, Texas, USA. The morphometry of the San Antonio River basin includes the contribution to the Goliad station. The basin has been considered regional as a single unit of drainage area, most of its surface being permeable. It has slopes from 0.2% to 3%, has an elongated shape from northwest to southeast, the climate is varied, alternating between humid and predominantly dry. Being an extensive drainage area of the regional type, the area of incidence of storms has been distributed spatially. The shape is very elongated with a ratio of 5 to 1 with respect to the width. • Juquiá river basin, station 4F-018R, Ribeira Do Iguapé, São Pablo, Brasil: We have considered the sub-basin of the Juquiá River in the hydrometeorological station 4F-018R, which includes a drainage area equal to 4360.0 km 2 , code 81679000, basin of the Ribeira Do Iguapé River, São Pablo, Southeast Atlantic Hydrographic Region, Brazil. The topography is mountainous with the characteristics of a somewhat flowing and turbulent river. The region has a hot tropical climate, with high temperatures in summer. The high rainy season occurs in winter with mild temperatures. Formed by two sub-basins, its shape is elongated at the headwaters and widens at the exit.

•
Bermejo river upper basin, station Balapuca, Argentina-Bolivia. The basin is located in the north of Argentina, province of Salta, sharing a part with Bolivia; the sub-basin belongs to the Bermejo river basin. It has the mountainous ecosystem of the Andes Mountains with an average slope of 35%. The hydrological regime of the rivers is purely pluvial with a well-defined seasonal variety, characterized by a period of important flows in the rainy season. The shape of the basin can be assimilated to a square shape of relation 1 to 1 with respect to the length and width.
The basins are depicted in Figure A1.