Currently, extreme events such as heavy rainfall, droughts, and floods are becoming more common and less exceptional [1
], even within a particular territory [6
], resulting in growing variability in certain hydrological processes [8
], in addition to pressure on, and uncertainty regarding, water resources [10
], which is especially noticeable using runoffs series [13
]. In this sense, the frequency of droughts in Southern Europe is increasing significantly [14
]. This is making it necessary to improve the knowledge on the temporal behavior of rivers [16
], as not all the reasons that explain this increasing variability are new [16].
As such, there is a strong need to develop new analytical approaches capable of capturing the induced and widespread effects that these new hydrological phenomena are causing on water resource availability [19
]. This is essential not only for planning and development of effective water resource management strategies [18
], but also for the optimal dimensioning of hydraulic infrastructure, such as reservoirs [13].
The temporal behavior of rivers has traditionally received special attention from the scientific and engineering communities [16
]. This topic has been developed through previous studies and approaches that are now benchmarks and through the concept of persistence, which is strongly related to the measurement of the long-term memory of time series using the Hurst coefficient [23
], as well as storage and drought statistics [24
]. This issue has mainly been addressed by models of: (1) the interactions between multiple physical factors, such as meteorological, geological, and hydrogeological factors [13
]; and (2) the analysis of hydrological records [24
]. Furthermore, two points should be noted: first, any model is an abstraction of reality [25
], which is only partially known; second, the scarcity of data has been a constant issue in hydrological research [26].
This has led to diverse points of view [22
] on how to address these issues, which are essentially classified into two main approaches, deterministic and stochastic [27
], as was detailed by Molina et al. [16
]. Moreover, the growing global demand for water resources [29
], negative scientific predictions of their availability due to climate change [31
], and the partial knowledge of the underlying relationships in complex natural systems such as water systems [33
] give this research special relevance.
Deterministic models, which are often complex, are characterized by (1) being accurate, (2) needing large amounts of data and even inputs from other models, and (3) time-consuming data processing that is frequently based on a unique time series [16
]. The above advantages allow a better understanding of the modeling process through the abstraction (i.e., simplification) of complex simulated natural phenomena [16
]. This is especially useful in the case of controlled and gauged river basins that are barely affected by climate change [27
]. Moreover, this general method is currently applied using software such as HEC-HMS [37
] (which is widely accepted in the field of hydrological engineering [16
]) by most water administrations [38
]. However, the main weakness of these models is their limited capacity for analysis of certain relevant aspects, such as the temporal dependence or persistence of the basin's memory [16
], in addition to not being viable for basins with limited data [39].
In recent decades, stochastic-model-based approaches have emerged in the development of hydrological models as a result of advances in computer science [16
]. These models are fundamentally different from the previous ones in how they deal with the uncertainty inherent in any hydrological process [27
]. This distinctive feature makes them suitable for hydrological modeling [41
]. Some relevant examples are autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models, multivariate adaptive regression splines (MARS), and causal reasoning (CR). The theoretical backgrounds of all these approaches were covered in depth by Molina et al. [16].
In this sense, CR, as an artificial intelligence (AI) technique, is a powerful stochastic approach and is especially relevant to this research for the following reasons: (1) it uses raw data directly [42
]; (2) it does not require “a priori” knowledge of the process [43
]; (3) it handles a large amount of information from dynamic and non-linear systems [36
]; (4) it is able to define sophisticated relationships in complex natural systems [44
]; and (5) it is a powerful tool for discovering causal structures in raw statistical data [45].
Additionally, as shown by the studies by Molina et al. [20
] and Zazo [22
], the potential for using causality to deal with temporal river behavior has recently begun to be explored using CR. This involves coupling traditional and novel methodologies through an ARMA model and Bayesian causal modeling (BCM). This has given rise to an active research area focused on increasing the knowledge of water resources and based on extracting the hidden logical time dependency structure that inherently underlies hydrological series. A comprehensive and in-depth theoretical and mathematical background and the main hydrological contributions to this research area are covered in [13].
The sustainability of a hydrological system as assessed through the characterization of basin memory is very relevant, especially in the short term. Consequently, this research work aims to capture that temporal signature through a hybrid causal framework, which combines the advantages of deterministic and stochastic models. This is mainly highlighted through the relationship between an input signal (rainfall) and an output signal (runoff).
Additionally, this work aims to overcome the lack of suitable and available hydrological data for CR analysis, as causal models need to be populated by as much information as possible to be representative [16
]. This scarcity is largely due to several reservoirs having been built in the study basin (Almodóvar, Celemín, and Barbate reservoirs) at the end of the 20th century in order to solve the problems related to recurrent droughts, floods, and water availability in this study zone.
In order to solve these challenges, a hybrid framework called the hybrid causal–hydrological (HCH) approach is developed in this study. This is done by hybridizing a BCM (stochastic platform) with a rainfall–runoff model (RRM) (deterministic module), which is the data source for the BCM platform.
Here, the RRM is the Témez model (henceforth referred to as T-RRM) [46
]. This is an aggregate and semi-distributed model of parameters that is suitable for homogeneous basins with a reduced amount of data. The T-RRM model is well-established in hydrological engineering, particularly in Spain [15
], and is supported here by EvalHid software [54
]. The T-RRM can provide reliable and long-term natural regime runoff time series, which are used here to generate synthetic series using a parsimonious and unconditioned ARMA model (first stochastic approach). Then, these equiprobable time series are used for causal reasoning (second stochastic approach) and the results are analyzed in depth.
After this introductory section, this manuscript is organized as follows. A case study, dataset description, and the applied methodology are discussed in Section 2
. Section 3
provides the main experimental results from the research. Lastly, Section 4
discusses the results in detail and presents the general conclusions drawn from the study.
3.1. T-RRM Outputs
The T-RRM model produced the monthly natural-regime runoff for the sub-basins of the three studied reservoirs (Barbate, Celemín, and Almodóvar) over the study period (1951–2017). Figure 5
shows the results of the backward validation technique and Table 2
displays the results of the calibration process for the maximum moisture (Hmax), coefficient of runoff (C), maximum infiltration (Imax), and aquifer depletion curve (α) parameters. The modeled runoff fits very well with the observed data for the validation period (Figure 5 and Figure 6).
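To make the role of the four calibrated parameters concrete, the following is a minimal sketch of one monthly step of a Témez-type water balance, written from the commonly published formulation of the model. It is an illustration under stated assumptions, not the EvalHid implementation: the function name, argument names, the mid-month recharge convention for the aquifer, and all numerical values are hypothetical.

```python
import math

def temez_step(p, pet, h_prev, v_prev, h_max, c, i_max, alpha, dt=1.0):
    """One monthly step of a Temez-type balance (depths in mm, alpha in 1/month).

    Assumed formulation (not taken from the paper): textbook Temez equations
    with aquifer recharge concentrated at mid-month.
    """
    delta = h_max - h_prev + pet          # maximum possible losses this month
    p0 = c * (h_max - h_prev)             # rainfall threshold below which no surplus forms
    if p <= p0:
        surplus = 0.0
    else:
        surplus = (p - p0) ** 2 / (p + delta - 2.0 * p0)

    available = h_prev + p - surplus      # water left for evapotranspiration and soil storage
    evap = min(available, pet)            # actual evapotranspiration
    h = max(0.0, available - pet)         # soil moisture at the end of the month

    infiltration = i_max * surplus / (surplus + i_max)   # recharge, bounded by i_max
    surface = surplus - infiltration                      # surface runoff

    # Linear-reservoir aquifer with depletion coefficient alpha
    v = v_prev * math.exp(-alpha * dt) + infiltration * math.exp(-alpha * dt / 2.0)
    baseflow = v_prev - v + infiltration                  # groundwater discharge

    return {"surplus": surplus, "evap": evap, "h": h, "infiltration": infiltration,
            "surface": surface, "v": v, "baseflow": baseflow,
            "runoff": surface + baseflow}

# Hypothetical month: 120 mm of rain, 40 mm of potential evapotranspiration
out = temez_step(p=120.0, pet=40.0, h_prev=50.0, v_prev=10.0,
                 h_max=150.0, c=0.3, i_max=80.0, alpha=0.1)
```

In this sketch Hmax and C control how much rainfall becomes surplus, Imax caps the share of the surplus that recharges the aquifer, and α sets how quickly the stored groundwater drains into the river.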
An average difference between the estimated and the observed contributions of 3.3 Hm3/month was detected in the case of the Barbate reservoir, of 1 Hm3/month for Celemín, and of 0.2 Hm3/month for the Almodóvar reservoir. Considering the maximum contribution of each reservoir, the average deviation was 2.9% for Barbate, 3.9% for Celemín, and 3.5% for Almodóvar.
shows that the monthly average contribution values ranged between 0.5 and 10.8 Hm3/month, implying annual values of between 6 and 130 Hm3. The greatest contributions are from the Barbate sub-basin, owing to its physiography and more abundant rainfall, which enable greater surface runoff. In contrast, the Almodóvar basin gave the lowest contributions, mainly due to the reduced surface area of its basin (16.6 km2, see Table 1).
shows the regression adjustments and provides linear correlation coefficients (R) ranging between 0.90003 and 0.9618, with determination coefficients (R2) ranging between 0.8106 and 0.92514.
Finally, Figure 7
displays the annual flow hydrographs for the three studied sub-basins. The storage capacity of the Barbate reservoir is 228 Hm3, for Celemín it is 45 Hm3, and for the Almodóvar reservoir it is 5.7 Hm3. Considering the average contributions, it can be deduced that the only infrastructure with multiannual regulation capacity within the basin (approximately 2 years of storage) is the Barbate reservoir. Moreover, for 11 of the 68 years of study (16%), the inputs exceeded its storage capacity. This situation accounts for 26% of the years in the case of Celemín and 46% for Almodóvar, showing the reduced capacity of the latter. In addition, the hydrographs also show pronounced dry periods. The first took place in the 1950s, while the two most extreme periods in terms of length and volume took place in the 1990s and in the 2000s. During these decades, there were also contribution peaks, which is evidence of greater irregularity in the precipitation events in recent decades.
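The regulation figures quoted above come down to two simple ratios, sketched below. The function names are hypothetical, the mean annual inflow of 110.98 Hm3 is borrowed from the Barbate statistics reported in Section 3.2 purely for illustration, and the annual inflow list is a toy series built only to reproduce the 11-in-68 count.

```python
def regulation_years(capacity_hm3, mean_annual_inflow_hm3):
    """Years of average inflow that the reservoir can store."""
    return capacity_hm3 / mean_annual_inflow_hm3

def exceedance_share(annual_inflows_hm3, capacity_hm3):
    """Fraction of years in which the annual inflow exceeds the storage capacity."""
    exceeded = sum(1 for q in annual_inflows_hm3 if q > capacity_hm3)
    return exceeded / len(annual_inflows_hm3)

# Barbate: 228 Hm3 of storage against an assumed mean annual inflow of 110.98 Hm3
barbate_years = regulation_years(228.0, 110.98)

# Toy series: 11 wet years above capacity and 57 years below it (68 years in total)
toy_inflows = [250.0] * 11 + [100.0] * 57
barbate_share = exceedance_share(toy_inflows, 228.0)
```

Under these assumptions the storage ratio is about 2.05 years and the exceedance share rounds to 16%, consistent with the values stated in the text.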
3.2. Stochastic Module: Statistical Parameters and Design of Bayesian Causal Modeling (BCM)
shows the main statistical parameters of the considered time series (long and short series), as well as the Hurst coefficient as a dependence indicator. It can be clearly observed that the ARMA (1
) model is able to preserve the main statistical parameters of the historical time series. Equally remarkable are the differences in the statistical parameters of the short series (obtained from the T-RRM and observed records) in the case of the Barbate sub-basin (mean: 110.98 Hm3 and 88.82 Hm3; standard deviation: 69.92 Hm3 and 56.08 Hm3), which is in agreement with the results shown in the previous section.
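As an illustration of how a parsimonious ARMA model can generate equiprobable synthetic series that preserve the mean and standard deviation of a historical record, the sketch below simulates an ARMA(1,1) process with NumPy. The model order, the coefficients `phi` and `theta`, the Gaussian innovations, and the function name are assumptions made for this example; the text does not specify them. The mean and standard deviation are the first pair of Barbate values quoted above.

```python
import numpy as np

def synth_arma11(mean, std, phi, theta, n_years, n_series, seed=0, burn_in=200):
    """Generate n_series synthetic annual series from a Gaussian ARMA(1,1) model.

    z_t = phi * z_{t-1} + eps_t + theta * eps_{t-1}, scaled so that Var(z) = 1,
    then rescaled to the target mean and standard deviation.
    """
    rng = np.random.default_rng(seed)
    # Innovation std that gives the standardized process unit variance
    sigma_eps = np.sqrt((1.0 - phi**2) / (1.0 + 2.0 * phi * theta + theta**2))
    total = n_years + burn_in
    eps = rng.normal(0.0, sigma_eps, size=(n_series, total))
    z = np.zeros((n_series, total))
    for t in range(1, total):
        z[:, t] = phi * z[:, t - 1] + eps[:, t] + theta * eps[:, t - 1]
    # Drop the burn-in so the start-up value does not bias the series
    return mean + std * z[:, burn_in:]

# Hypothetical coefficients; 67 years per series to mirror 1951-2017
series = synth_arma11(mean=110.98, std=69.92, phi=0.3, theta=0.2,
                      n_years=67, n_series=2000)
```

Because the innovations are Gaussian, a sketch like this can produce negative runoff values; in practice the series is often transformed (for example, normalized or log-transformed) before fitting and back-transformed afterwards.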
Regarding the Hurst coefficient obtained from the T-RRM time series, in the case of the long series (1951–2017), all sub-basins had a value of 0.66, while the short series (2000–2015) gave less homogeneous results in the range [0.61, 0.69] (0.65 Barbate/Q1; 0.61 Celemín/Q2; 0.69 Almodóvar/Q3). However, the average value (H = 0.65) was practically the same. In contrast, this homogeneity was not observed for the gauging data (short series exclusively). In this case, the variability of the results was larger ([0.57, 0.77]; 0.74 Barbate/Q1; 0.77 Celemín/Q2; 0.57 Almodóvar/Q3), although the average value was similar to the other two (H = 0.69 versus 0.66 and 0.65, respectively). Furthermore, it should be noted that all H coefficients are greater than 0.50, implying a positive correlation and a persistent trend (long-term memory).
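For readers unfamiliar with the Hurst coefficient, the sketch below implements the classical rescaled-range (R/S) estimator. It is one standard way of estimating H and is not necessarily the estimator used to obtain the values above; the function name, the window scheme, and the test signals are assumptions.

```python
import numpy as np

def hurst_rs(series, min_window=8):
    """Estimate the Hurst coefficient with classical rescaled-range (R/S) analysis."""
    x = np.asarray(series, dtype=float)
    n = x.size
    log_w, log_rs = [], []
    w = min_window
    while w <= n // 2:                        # window sizes double at each step
        rs_vals = []
        for start in range(0, n - w + 1, w):  # non-overlapping windows
            chunk = x[start:start + w]
            dev = np.cumsum(chunk - chunk.mean())   # cumulative departures from the mean
            r = dev.max() - dev.min()               # range of the departures
            s = chunk.std()                         # standard deviation of the window
            if s > 0:
                rs_vals.append(r / s)
        if rs_vals:
            log_w.append(np.log(w))
            log_rs.append(np.log(np.mean(rs_vals)))
        w *= 2
    # H is the slope of log(R/S) against log(window size)
    slope, _intercept = np.polyfit(log_w, log_rs, 1)
    return float(slope)

rng = np.random.default_rng(1)
noise = rng.normal(size=4096)        # no memory: H expected near 0.5
walk = np.cumsum(noise)              # strongly persistent signal
h_noise = hurst_rs(noise)
h_walk = hurst_rs(walk)
```

Values of H above 0.5 indicate persistence, as noted in the text. The R/S estimator is known to be biased upward for short records, so estimates from annual series of a few decades should be read as indicative.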
On the other hand, Figure 8
summarizes the conceptual scheme of the design of BCM using HUGIN© software, which may also be seen as a result in itself. On the left side, the learning and preprocessing processes are shown. Here, the synthetic data were discretized into five intervals of the same length. The right side shows the developed hierarchical structure from top to bottom (initial to final year). Here, each decision variable is connected in such a way that it can influence the previous and following one in a natural way (trivial relationships). This defines the structure constraints process, in which the main “a priori” relationship among variables is considered as the natural behavior, i.e., between consecutive years. Subsequently, by means of the analysis of the “a posteriori” probability distributions, non-trivial dependence relationships (time lag > 1) were extracted, owing to the power of analysis that CR supported by DMG offers. This information is implicitly present in hydrological data.
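The discretization step described above (five intervals of the same length) can be sketched with NumPy as follows. This is an illustrative equal-width binning, not the HUGIN preprocessing itself; the function name and the sample values are hypothetical.

```python
import numpy as np

def equal_width_bins(values, n_bins=5):
    """Discretize values into n_bins intervals of equal length between min and max."""
    v = np.asarray(values, dtype=float)
    edges = np.linspace(v.min(), v.max(), n_bins + 1)
    # Using only the interior edges yields labels 0..n_bins-1,
    # with the maximum value assigned to the last interval
    labels = np.digitize(v, edges[1:-1])
    return edges, labels

# Hypothetical annual runoff values (Hm3)
edges, labels = equal_width_bins([0.0, 10.0, 25.0, 49.0, 50.0, 75.0, 100.0])
```

Each synthetic annual value is thus replaced by the index of the interval it falls in, which is the kind of discrete state a Bayesian network node needs.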
3.3. Runoff Basin Memory Assessment through Hybrid Causal–Hydrological (HCH) Modeling
Given the availability of both the T-RRM results and gauging data and inspired by the backward validation technique, the temporal behavior was validated. This was possible thanks to the qualitative approach that DMGs offer and was focused on the short series (time period ranging from 2000 to 2015). Furthermore, both a reliability analysis of the T-RRM results and a suitability analysis of the hybrid causal–hydrological (HCH) approach were performed. Figure 9
shows a comparative analysis of the results based on both DMGs. In this sense, it is worth noting that the determination coefficients (R2) of the resulting mathematical functions were almost 1.00 (0.99 in all cases), demonstrating the robustness of the adjustment process.
In general, all the graphs are markedly asymmetric, with the least asymmetry obtained for the Celemín sub-basin from the T-RRM (time lag = 0: 127.27 versus −161.72; see Figure 9
b), which is highlighted through a dominant wrap-around function (basically W-MAX). In addition, the behavior trends are maintained (convergence to 0 on X axis-temporal horizon), with a practically equal dependence propagation range. The difference in the Almodóvar sub-basin (see Figure 9
c, T-RRM [0, 3] versus gauging [0, 4] ranges) is not significant, because the relative percentage of change for time lag 3 is practically 0 in both cases.
Although the DMGs show differences in absolute values, (1) the relationship between W-MAX and W-MIN, (2) the temporal horizons, and (3) the range of relative percentage change are homogeneous and essentially maintain the observed trends. In particular, in the case of the Barbate sub-basin (Figure 9
a), the relationship between W-MAX and W-MIN remains practically constant (3.3/1 versus 3.1/1; T-RRM = [+500.21, −150.90]; gauging records = [+861.50, −276.89]), and the temporal horizon is also the same ([0, 4]). In the Celemín sub-basin (Figure 9
b), the dependence propagation range is the same ([0, 3]), as is the pattern of mitigation, with a strong dependence on the interval [0, 1] and symmetry between W-MAX and W-MIN in the remaining time lags. Finally, the Almodóvar sub-basin (Figure 9
c) is the case with the smallest difference. Here, even the range is roughly the same (T-RRM: 740.99 versus 686.72 from gauging records).
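The W-MAX to W-MIN relationships quoted for the Barbate sub-basin can be checked directly from the reported envelope values. The helper name below is hypothetical; the four numbers are those given in the text.

```python
def envelope_ratio(w_max, w_min):
    """Ratio between the magnitudes of the upper (W-MAX) and lower (W-MIN) envelopes."""
    return abs(w_max) / abs(w_min)

# Barbate sub-basin, relative percentage of change at the envelope extremes
ratio_trrm = envelope_ratio(500.21, -150.90)      # T-RRM short series
ratio_gauging = envelope_ratio(861.50, -276.89)   # gauging records
```

Rounded to one decimal place, these give 3.3 and 3.1, so the absolute values differ considerably between the two data sources while the shape of the asymmetry is almost unchanged.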
Therefore, a general analysis shows a more than reasonable concordance between the resulting DMGs (Figure 9
), which validates both the T-RRM results and the suitability of the HCH approach.
shows the results based on long series. As in the case of the short series, all DMGs display asymmetric graphs with distinctly dominant wrap-around functions (basically W-MAX versus W-MIN), except in the case of the Celemín sub-basin, where this general trend is reversed and the graph shows a certain symmetry (see Figure 9
b), thus evidencing clearly dependent behavior. In addition, all determination coefficients (R2) for the W-MAX and W-MIN mathematical functions (polynomial in all cases) show values close to 1.00, with 0.98 being the lowest value (Figure 10
a), demonstrating the excellent fit of the mathematical functions.
As mentioned above, the convergence of all series and W-MAX and W-MIN to 0 on the X axis (time lags) defines the temporal horizon of the dependence influence. In the case of the long series, the temporal horizons are mainly focused in the short term, with dependence propagation values ranging between [0, 4] (Figure 10
a,b) and [0, 5] (Figure 10
c). In this case, all W-MIN values present a practically constant convergence trend, whilst W-MAX values display different behavior.
Similarly, based on the DMG gradient, two regions can be clearly observed—the first region, where there is rapid mitigation (high gradient) and therefore greater dependence; and the second region, in which the gradient is lower, characterized by a gradual dissipation or mitigation of the dependence up to a relative percentage of change of 0.
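The two ideas used in this reading of the DMGs (the temporal horizon as the lag at which the envelope reaches 0, and the split into a high-gradient and a low-gradient region) can be sketched as follows. The envelope values, the tolerance, the 50% steepness threshold, and the function names are all hypothetical choices made for illustration; the text does not state a numerical criterion.

```python
import numpy as np

def temporal_horizon(envelope, tol=1.0):
    """First time lag at which the envelope magnitude falls to (almost) zero."""
    env = np.abs(np.asarray(envelope, dtype=float))
    below = np.nonzero(env <= tol)[0]
    return int(below[0]) if below.size else None

def high_gradient_end(envelope, share=0.5):
    """Last lag of the initial region whose steepness is at least share * max steepness."""
    env = np.asarray(envelope, dtype=float)
    steep = np.abs(np.diff(env))        # steep[k] is the change between lag k and lag k + 1
    strong = steep >= share * steep.max()
    end = 0
    for k, flag in enumerate(strong):
        if not flag:
            break
        end = k + 1
    return end

# Hypothetical W-MAX envelope (relative percentage of change) for lags 0..4
w_max = [500.0, 120.0, 40.0, 10.0, 0.0]
horizon = temporal_horizon(w_max)       # lag where the dependence has dissipated
split = high_gradient_end(w_max)        # boundary between the two regions
```

For this toy envelope the high-gradient region is [0, 1] and the dependence dissipates at lag 4, which mirrors the qualitative pattern described for the long series.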
Barbate Q1 and Celemín Q2 sub-basins (Figure 10
a,b) present a similar pattern of behavior, both with a temporal horizon of four (4) years (time lags), whereby the first region of rapid mitigation is located in time lags [0, 1] and the second between [1
]. In contrast, the Almodóvar Q3 sub-basin (Figure 10
c) displays slightly different behavior, with a temporal horizon of up to five (5) years, a high-gradient region situated in the interval [0, 2], and a low-gradient region between [2
Generally, the temporal patterns in all cases are clearly dependent, in line with the values of the H coefficients, which reveal a high degree of dependence. This dependence is mainly concentrated in the [0, 1] interval, which has a similar range for the relative percentage of change (maximum 1353.60 and minimum −345.77) and an average relationship that is four times greater for W-MAX than W-MIN (relationship 4:1).
Furthermore, the analytical power and suitability of BCM for discovering, extracting, and modeling hidden relationships is highlighted through the two regions of the DMGs (based on an analysis of the gradient). BCM was used to define a short-term qualitative indicator of behavior (general) and also a very short-term indicator through the greater influence region (particular). This characteristic is especially useful in the current context of the increasing variability of water resources.
4. Discussion and Conclusions
The effective inclusion of uncertainty in hydrological modeling remains a challenging topic, despite previous attempts to address it. In this context, it seems obvious and rational to merge physical rainfall–runoff models with stochastic developments.
In this sense, this paper presents the hybridization process of traditional hydrological rainfall–runoff modeling (T-RRM) with hydrological Bayesian causal modeling (BCM), which is a prominent, innovative, and productive research field. This research work has generated a tool that the authors have termed the “hybrid causal–hydrological” (HCH) method, which can provide the hydrological response of a basin and can stochastically characterize the studied hydrological processes.
This research paper has addressed the methodological process involved in this combination of models and produced results through its implementation in the Barbate River Basin, located in SW Spain. Regarding the first point, the aforementioned iterative process was technically satisfactory, given the outcome obtained from the HCH tool. Regarding the results, from a hydrological perspective, in the very short term (within the first year lag), the long series obtained from the T-RRM model had more dependent hydrological behavior, as the relative percentage of change in the DMG was higher than for the short series generated from the T-RRM model and gauging (observed) records. The gauging data had more dependent behavior than the short series produced by the T-RRM, except in the Almodóvar sub-basin. This can be attributed to several factors, such as the insufficient amount of consistent data in the hydrological records and climate change processes, which might alter the general hydrological behavior of a basin when a long record period is analyzed. The results for the temporal propagation of the dependence (hydrological memory) are quite similar for the three analyses (long series, short series, and gauging series) as well as for the three sub-basins. In addition, the symmetry or asymmetry of the DMGs remains quite constant across the nine causal models. This outcome should be seen as a validation in itself of the usefulness and reliability of the T-RRM and the hydrological causal modeling based on BCM, and therefore of the new HCH tool.
Questions remain regarding the optimal dimensioning of hydraulic infrastructure and efficient water resource planning, for which management approaches can successfully assist in providing dynamic and continuous analysis of the temporal dependence of a basin's runoff.
There are upcoming research topics based on this work, one of which is especially important given the similarity of the topic to the one addressed here. This topic involves the multitemporal analysis of hydrological behavior across river basins. Ingeniería y Gestión del Agua (IGA) Research Group and their associates are currently studying the optimum parameters from hydrological series, such as the length of the analysis period or the volume of data required for the BCM.