1. Introduction
Climate change and intensive human activities extensively influence hydrological regimes and water resource systems. Consequently, many of the world’s rivers face water quality and quantity problems. Rivers are considered among the resources most vulnerable to human disturbances, e.g., land cover changes, agricultural irrigation, and reservoir construction [1,2,3,4]. Assessing changes in streamflow regimes is therefore critical for understanding hydrologic mechanisms and, in turn, for improving water resource management.
Many water resource studies have applied stochastic methods to characterize temporal uncertainties. Early time series models assumed that the series came from a stationary or cyclostationary process [5]. These models performed well for hydrological data without signs of long-term memory or nonlinear dependence [6,7,8,9]. However, as record lengths increased, low-frequency climate structures were identified in hydrologic time series. These structures became an essential feature in hydrological analysis, especially in streamflow analysis, due to the extremely non-uniform temporal distribution of global runoff. In addition, several studies have associated changes in hydrological records with natural climate variability, particularly low-frequency climate indices such as the Pacific Decadal Oscillation (PDO) and the Atlantic Multidecadal Oscillation (AMO) [5,10,11]. Thus, recognizing variability patterns and shifts linked to climate variability and to human activities such as dam construction and water withdrawal remains a significant challenge. Low-frequency variability modifies the occurrence of extreme events, such as droughts and floods, at decadal and multidecadal time scales. Consequently, the risk of extreme events is dynamic and changes from one decade to another.
Over the years, several studies have focused on the changes in streamflow at various spatial and temporal scales, including studies in Brazilian basins [4,12,13,14,15,16]. Climate fluctuations at decadal and interdecadal time scales control water availability, affect ecosystems, and modulate higher-frequency variability, thus having a significant social and economic impact [17,18]. Proper assessment of this type of variability is important in medium- and long-range water resource management, particularly for hydropower generation planning.
The recent intensive drought (2010–2017) in the northeast region of Brazil (NEB) significantly affected hydroelectric production. Variations in hydrological systems therefore require regime shift detection and forecasting for better planning and management of water resources. This is particularly relevant for the case study, where the Brazilian National Operator of the Electrical System (ONS) seeks to estimate water availability from inflow forecasts to optimize the interconnected hydroelectric system. The models commonly used by the ONS belong to the class of autoregressive models, which treat the series as stationary [19]. Consequently, these models are not the best suited forecasting methods, given the climatic variability that exerts pressure on water systems. The case study is the Sobradinho dam, one of the largest hydroelectric power plants of the NEB. Several studies have investigated trends and different characteristics of the Sobradinho streamflow time series [4,20,21]. However, studies have not analyzed changes in the low-frequency streamflow regime of the Sobradinho dam, including their possible causes. Further motivation for analyzing the low-frequency variability comes from inspection of the filtered streamflow series in the chosen region, which shows that this variability contributes significantly to the total variability.
Climate variables can exhibit low-frequency and regime-switching variability at multiple time scales, which are likely to cause risks associated with climate extremes to vary in time. To analyze the shifts in hydrological variables, which are directly influenced by climate change and anthropogenic activities, four different methods were applied to identify the shifts in the low-frequency streamflow: (i) the Standardized Runoff Index (SRI), which is extensively used for monitoring hydrological drought due to its simplicity in computation and relatively low data requirements; (ii) the Pruned Exact Linear Time (PELT) algorithm used as a typical search method for segmentation; (iii) the breaks for additive seasonal and trend (BFAST) method, which has the advantage of decomposing the time series, not only for performing trend analysis but also for detecting the regime shift of seasonal data; and (iv) the hidden Markov model (HMM), which has been successfully used in modeling regime-like behavior.
As extreme climatic events become more frequent and threatening, it becomes critical to assess the sensitivity of watersheds to climatic change and its impact on aspects important to human life, such as energy and water supply. Much knowledge could be gained from detecting shifts and predicting low-frequency streamflow regimes while considering dynamic variations over time. Accurate and reliable streamflow prediction models are tools of great importance in the management and optimal allocation of water systems. This is especially the case in systems under significant stress due to surface and groundwater scarcity [22]. Therefore, an HMM is proposed to forecast the low-frequency streamflow. The HMM is a doubly embedded stochastic process model, a feature that fits well with the complicated occurrence and development of nonlinear hydrological phenomena. The objectives of this study were (i) to develop a methodology for investigating low-frequency streamflow shifts and (ii) to predict the system’s states in the low-frequency time frame. Identifying the current state of low frequency allows the assessment of the risk of extreme events and an accurate forecast of streamflow.
2. Study Area and Data
The Sobradinho reservoir is located in the São Francisco River Basin, Brazil (Figure 1). The basin has an area of approximately 639,219 km², equivalent to 8% of the country. The São Francisco is the longest river that runs entirely in Brazilian territory. It runs through six Brazilian states (Minas Gerais, Goiás, Bahia, Pernambuco, Alagoas, and Sergipe) and the Federal District, originating in Minas Gerais and flowing 2863 km to the Atlantic Ocean [23].
The water and climate characteristics of the basin are highly variable. The Sobradinho reservoir experiences critical periods of prolonged drought due to low rainfall and high evapotranspiration. The rainy season starts approximately in November and ends in April [23,24]. The average annual flow of the São Francisco River is 2846 m³/s, and, until 2013, the water withdrawn was 278 m³/s [25]. Irrigation accounts for 77% of the region’s total water demand.
The São Francisco Region plays an essential role in generating electricity, with an installed capacity of 10,708 MW in 2013, coming from 28 small plants and 12 large plants (12% of the country’s total). The hydroelectric exploitation of the São Francisco River represents the NEB’s energy supply base [24]. There are several large dams along the São Francisco River, namely Três Marias, Sobradinho, Itaparica, Moxotó, Paulo Afonso I, II, III, and IV, and Xingó, which were constructed between 1962 (Três Marias) and 1994 (Xingó). The Sobradinho dam (coordinates: 9°25’49” S, 40°49’37” W; construction: 1973–1979; beginning of operations: 1979–1982) is located 742 km from the river mouth, in Bahia state. The dam has a height of 41 m and a length of 12.5 km. The reservoir has a maximum length of 320 km, a surface area of 4214 km², and a storage capacity of 34.1 × 10⁹ m³. The Sobradinho reservoir was built for multiple uses, such as multiannual flow regulation for hydropower, navigation, irrigation, and flood control for the riverine communities of the São Francisco River Basin.
Monthly naturalized flows measured at a gauging station along the São Francisco River were obtained from the ONS. The naturalized flow of a hydroelectric plant is the flow that would be observed at that gauge with the river in its natural condition, that is, assuming that there is no reservoir regulating the flow and no impact from human activity. The streamflow time series ranges from January 1931 to December 2016 without any missing values. The gauging station is located approximately 95 km upstream of the Sobradinho dam. This station is likely to be only slightly affected by the Três Marias reservoir, the largest reservoir upstream of Sobradinho, located about 1087 km from Sobradinho [4].
3. Materials and Methods
This section provides a brief description of the methods and steps used for shift identification and projection of the low-frequency streamflow time series. All experiments were carried out in the R environment for statistical computing [26]. In this study, (i) the wavelet transform was used to decompose and reconstruct the low-frequency streamflow from the streamflow time series and (ii) a classical method (SRI) and three state-of-the-art methods (PELT, BFAST, and HMM) were used to identify the shifts in the low-frequency streamflow series. After identifying the shifts present in the series and how they recurred, (iii) an HMM was used to project the low-frequency streamflow. The main advantage of applying this method to streamflow time series is its ability to simulate long persistence and regime-switching behavior [27]. Then, the degree of improvement in prediction accuracy from using the HMM as the projection method for low-frequency streamflow time series was assessed.
Figure 2 illustrates the flowchart describing the key steps in this study.
3.1. Time Series Decomposition Using Wavelet Transform
A widely used method for extracting the low-frequency part of a time series is the wavelet transform [28,29,30], which decomposes the series in the time-frequency domain and identifies the dominant modes of variability. A wavelet transform decomposes a time series into a set of functions, known as “daughter wavelets”, derived from the translation in time and scaling of the “mother wavelet”. The choice of the mother wavelet is important and depends on the type of output information needed. There are many mother wavelet functions from which to choose, such as the Haar, Daubechies, Mexican Hat, and Morlet wavelets. This study applied the Morlet wavelet, which is commonly used for hydrological time series because it describes the series adequately and provides good time-frequency localization [31,32].
The signal component of the time series is identified by a significance test at the 90–95% level, using white noise as the null hypothesis, together with interpretation of the wavelet power spectrum. The identified significant component is then extracted from the original series using the reconstruction function. Reconstruction of the original time series over a set of periods can be obtained as follows:
$$x_t = \frac{\delta j\,\delta t^{1/2}}{C_\delta\,\psi_0(0)} \sum_{s} \frac{\mathrm{Re}(\mathrm{Wave}(s))}{s^{1/2}} \tag{1}$$
where $C_\delta$ is the reconstruction factor; $\delta j$ and $\delta t$ are the scale and time factor, respectively; $\psi_0(0)$ is the factor that removes the energy scaling for the Morlet wavelet function; $\mathrm{Re}(\mathrm{Wave}(s))$ is the real part of the wavelet transform; and $s$ is the scale parameter.
We used the WaveletComp package [33] to decompose the time series and reconstruct the low-frequency streamflow. More details of the wavelet technique can be found in [5,28,30].
3.2. Regime Shift Detection
In this study, we applied the PELT algorithm and the HMM to detect shifts in the annual low-frequency streamflow time series, and the BFAST model and the SRI to detect shifts in the monthly low-frequency streamflow series.
3.2.1. SRI
Drought indices, such as the SRI, are used to identify droughts and describe their intensity. The SRI is based on the concept of the Standardized Precipitation Index [34], discussed by the authors of [35]. Although the indices show similarities, the SRI incorporates hydrological processes that control the seasonal loss in streamflow due to the climate’s influence, thus being able to describe the hydrological aspects of droughts.
First, a long-term record was fitted to a probability distribution. A variety of probability distributions (e.g., gamma, lognormal, generalized extreme value, log-logistic, and generalized Pareto) have been used to fit monthly observations of different hydro-climatic variables for calculating drought indices [36,37]. Then, the cumulative distribution function (CDF) of the fitted marginal distribution was transformed into a standard normal variate Z.
After the low-frequency streamflow was decomposed and reconstructed using the wavelet transform over a significant variability frequency range, we tested three different distributions and chose the best-fitting one for the reconstructed time series. The adjusted SRI (Ad-SRI), as we call the SRI of the low-frequency series, is not the same as the classical SRI because the two present different information; consequently, their variability ranges differ. The Ad-SRI uses the mean over a 12-month time scale to characterize the system’s state in each year. Afterward, the values are classified: years with negative values were termed State 1 and years with positive values were termed State 2.
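The standardization and classification described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the R workflow used in this study: the fitted parametric distribution is replaced by an empirical CDF (Weibull plotting positions), and the function names `standardized_index` and `annual_states` are hypothetical.

```python
from statistics import NormalDist, mean


def standardized_index(values):
    """Convert values to standard normal scores through an empirical CDF.

    Assumption: Weibull plotting positions rank / (n + 1) stand in for the
    CDF of a fitted parametric distribution (e.g., gamma or lognormal).
    Tied values receive consecutive ranks in their original order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = NormalDist().inv_cdf(rank / (n + 1))
    return scores


def annual_states(monthly_index, months_per_year=12):
    """Label each complete year by the sign of its mean index value."""
    states = []
    last_start = len(monthly_index) - months_per_year
    for start in range(0, last_start + 1, months_per_year):
        annual_mean = mean(monthly_index[start:start + months_per_year])
        # Assumption: a mean of exactly zero is assigned to State 2.
        states.append(1 if annual_mean < 0 else 2)
    return states


# Synthetic two-year example: a dry year followed by a wet year.
flows = [float(v) for v in range(1, 25)]
print(annual_states(standardized_index(flows)))
```

A parametric fit would replace only the body of `standardized_index`; the sign-based classification of annual means would stay the same.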
3.2.2. PELT Algorithm
The detection of breakpoints was based on the method presented in [38]. A data set is defined as $y_{1:n} = (y_1, \ldots, y_n)$. In a model with multiple breakpoints ($m$) and their positions $\tau_{1:m} = (\tau_1, \ldots, \tau_m)$, each position is an integer between 1 and $n - 1$. It is defined that $\tau_0 = 0$ and $\tau_{m+1} = n$. Consequently, the $m$ breakpoints split the data into $m + 1$ segments. A common approach to detecting multiple breakpoints is to minimize the following cost function:
$$\sum_{i=1}^{m+1} \left[ C\left(y_{(\tau_{i-1}+1):\tau_i}\right) \right] + \beta f(m) \tag{2}$$
where $C$ is a cost function for a segment, e.g., the negative log-likelihood, and $\beta f(m)$ is a penalty to guard against overfitting (a multiple breakpoint version of the threshold $c$).
Several breakpoint algorithms are implemented in the R package changepoint [39] used in this study, such as binary segmentation, segment neighborhood, and PELT [38]. PELT offers speed gains and increased accuracy relative to the other methods. The method is an adaptation of optimal partitioning; for computational efficiency, it removes candidate points that can never be minima from the minimization performed at each iteration [39]. We assessed the annual low-frequency streamflow with the PELT algorithm and divided the series into two states based on the breakpoint found by the algorithm. Negative values of low-frequency streamflow were termed State 1, whereas positive values were termed State 2.
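The pruning idea behind PELT can be sketched in a few lines of Python. This is an illustrative sketch rather than the changepoint package implementation: it assumes a change-in-mean model with a squared-error segment cost, a penalty that is linear in the number of breakpoints, and a pruning constant of zero; the function name `pelt_mean_shift` and the synthetic series are hypothetical.

```python
def pelt_mean_shift(y, penalty):
    """Return breakpoint positions (last index of each segment, 1-based count)."""
    n = len(y)
    # Prefix sums give the squared-error cost of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(a, b):
        """Sum of squared deviations from the mean for y[a:b], with b > a."""
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)   # best[t]: minimal penalized cost of y[:t]
    best[0] = -penalty
    last = [0] * (n + 1)     # last[t]: optimal last breakpoint before t
    candidates = [0]
    for t in range(1, n + 1):
        trials = [(best[tau] + cost(tau, t) + penalty, tau) for tau in candidates]
        best[t], last[t] = min(trials)
        # Pruning: keep tau only if it could still be optimal at a later time.
        candidates = [tau for value, tau in trials if value - penalty <= best[t]]
        candidates.append(t)

    # Backtrack through the stored last breakpoints.
    breakpoints = []
    t = n
    while last[t] > 0:
        breakpoints.append(last[t])
        t = last[t]
    return sorted(breakpoints)


# Synthetic series: ten values near -1 followed by ten values near +1.
low = [-1.0, -0.8, -1.2, -0.9, -1.1, -1.0, -0.7, -1.3, -1.0, -1.0]
high = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 1.0, 1.0]
print(pelt_mean_shift(low + high, penalty=3.0))
```

The candidate list stays short when breakpoints are frequent, which is the source of the speed gain; the penalty value controls how many breakpoints are reported and would need to be chosen for the data at hand.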
3.2.3. BFAST
The BFAST method, proposed by the authors of [40], is a decomposition method that integrates the iterative decomposition of a time series into trend, seasonal, and remainder components to examine changes (i.e., trends and breakpoints) within the time series. The method has been applied to detect long-term seasonal changes in satellite image time series [40]. The general model is described as follows:
$$Y_t = T_t + S_t + e_t, \quad t = 1, \ldots, n \tag{3}$$
where $Y_t$ is the observed data at time $t$, $T_t$ is the trend component, $S_t$ is the seasonal component, and the remainder component $e_t$ denotes the remaining variation in the data beyond that in the seasonal and trend components.
Assuming that the entire time series has $m$ breakpoints $\tau_1^{*}, \ldots, \tau_m^{*}$ in the trend component $T_t$, the segment-specific slopes and intercepts can be calculated on each segment. The trend component can be expressed as follows:
$$T_t = \alpha_i + \beta_i t, \quad \tau_{i-1}^{*} < t \le \tau_i^{*} \tag{4}$$
where $i = 1, \ldots, m + 1$ and we define $\tau_0^{*} = 0$ and $\tau_{m+1}^{*} = n$. The intercept $\alpha_i$ and slope $\beta_i$ can be used to assess the magnitude and direction of the abrupt change.
Similarly, a harmonic model is applied to parameterize the seasonal component, which is fixed between breakpoints. Given that the time series has $p$ seasonal breakpoints $\tau_1^{\#}, \ldots, \tau_p^{\#}$, the seasonal component $S_t$ for $\tau_{j-1}^{\#} < t \le \tau_j^{\#}$ can be calculated as follows:
$$S_t = \sum_{k=1}^{K} a_{j,k} \sin\left(\frac{2 \pi k t}{f} + \delta_{j,k}\right) \tag{5}$$
where the unknown parameters are the segment-specific amplitude $a_{j,k}$ and phase $\delta_{j,k}$, which must be estimated, and $K$ is the number of harmonic terms. The known frequency $f$ is equal to 12 for the monthly observations used here. The moving sum test based on ordinary least squares residuals (OLS-MOSUM) is applied to detect whether one or more breakpoints occur [41]. If the test indicates a significant change ($p$ < 0.05), the breakpoints are estimated using Bai and Perron’s method. The authors of [42] argue that the Akaike information criterion (AIC) usually overestimates the number of breaks and that the Bayesian information criterion (BIC) is a more suitable procedure in many situations. Thus, the number of breaks was determined by the BIC, and the date of each break and its confidence interval were estimated. The BFAST model parameters were estimated by iterating the following steps:
Step 1: If the OLS-MOSUM test shows that breakpoints occur in the trend component, then the number and positions of the breakpoints in the trend component, $\tau_1^{*}, \ldots, \tau_m^{*}$, are estimated through least squares from the seasonally adjusted data $Y_t - \hat{S}_t$. For a specific segment, the trend component can be estimated by Equation (4). Then, the trend coefficients $\alpha_i$ and $\beta_i$ are calculated for the different segments using a robust regression method based on M-estimation to account for potential outliers.
Step 2: Similarly, if the OLS-MOSUM test indicates that breakpoints occur in the seasonal component, then the number and positions of the breakpoints in the seasonal component, $\tau_1^{\#}, \ldots, \tau_p^{\#}$, are estimated from the detrended data $Y_t - \hat{T}_t$. The parameters $a_{j,k}$ and $\delta_{j,k}$ for each segment are calculated using a robust regression method based on M-estimation. We applied the BFAST model to monthly data and identified the shifts in the low-frequency streamflow.
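To make the structure of the additive model concrete, the following Python sketch evaluates a piecewise-linear trend and a harmonic seasonal term for given parameters. It only evaluates the model; it does not attempt the OLS-MOSUM test, the breakpoint estimation, or the robust regression of the steps above. The function names and the parameter values are hypothetical, and a single seasonal segment is assumed for brevity.

```python
import math


def piecewise_trend(t, breakpoints, intercepts, slopes):
    """Trend alpha_i + beta_i * t on the segment that contains time t.

    `breakpoints` must be sorted; segment i covers
    breakpoints[i - 1] < t <= breakpoints[i], so there is one more
    (intercept, slope) pair than there are breakpoints.
    """
    segment = sum(1 for b in breakpoints if t > b)
    return intercepts[segment] + slopes[segment] * t


def harmonic_season(t, amplitudes, phases, f=12):
    """Sum of harmonics a_k * sin(2 * pi * k * t / f + delta_k), k = 1..K."""
    return sum(
        a * math.sin(2.0 * math.pi * k * t / f + d)
        for k, (a, d) in enumerate(zip(amplitudes, phases), start=1)
    )


def additive_series(n, breakpoints, intercepts, slopes, amplitudes, phases):
    """Noise-free series: trend plus season for t = 1..n (remainder omitted)."""
    return [
        piecewise_trend(t, breakpoints, intercepts, slopes)
        + harmonic_season(t, amplitudes, phases)
        for t in range(1, n + 1)
    ]


# Hypothetical parameters: one trend break after month 24, one harmonic.
series = additive_series(48, [24], [10.0, 40.0], [0.5, -0.5], [2.0], [0.0])
print(round(series[0], 3), round(series[-1], 3))
```

In this parameterization an abrupt change shows up as a jump between the values of consecutive segments at the breakpoint, which is what the intercept and slope of each segment are used to quantify.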
3.2.4. HMM
The HMM [43] is a statistical model in which the observed time series is treated as the realization of a process driven by an unobserved (hidden) Markov process [44,45]. A Markov process is a random process whose future probabilities are determined by its most recent values. The HMM was developed for speech recognition and has been successfully used in many fields, including hydrology [27,44,46].
An HMM consists of state ($S_t$) and observation ($O_t$) variables. The distribution of $O_t$ can be written as $f(O_t \mid S_t)$, and the marginal distribution for a discrete number of states can be described as a mixture distribution with $n$ components [45,46,47]. The equation is written as follows:
$$f(O_t) = \sum_{i=1}^{n} P(S_t = i)\, f_i(O_t)$$
where $P(S_t = i) \ge 0$, $\sum_{i=1}^{n} P(S_t = i) = 1$, and $f_i(\cdot)$ is the conditional distribution of the data.
The transition between states is governed by probabilities described as transition probabilities. They are denoted by the matrix $A = \{a_{ij}\}$, where the first row ($a_{1j}$) contains the probabilities of moving from state 1 to state $j$. In addition to the transition parameters in $A$, one must define the initial state or prior probabilities $\pi$ that determine where the process begins.
The forward algorithm is used for calculating the joint likelihood [48]. The expectation–maximization algorithm is used to optimally estimate the parameters of the HMM. In this algorithm, the parameters are obtained by maximizing the expected joint log-likelihood given the observations and states through an iterative process [45,49]. The Viterbi algorithm is applied to decode the observation sequences into hidden state sequences.
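The forward and Viterbi algorithms named above can be sketched for a small Gaussian HMM as follows. This is a minimal Python illustration with hypothetical parameter values, not the implementation used in this study; it assumes Gaussian state-dependent distributions, strictly positive entries in π and A (so that logarithms are defined), and observations close enough to the state means that the densities do not underflow. Parameter estimation by expectation–maximization is not shown.

```python
import math


def log_gaussian(x, mean, sd):
    """Log density of a normal distribution."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def forward_loglik(obs, pi, A, means, sds):
    """Log-likelihood of the observations via the scaled forward recursion."""
    n = len(pi)
    alpha = [pi[i] * math.exp(log_gaussian(obs[0], means[i], sds[i])) for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n))
                * math.exp(log_gaussian(obs[t], means[j], sds[j]))
                for j in range(n)
            ]
        scale = sum(alpha)           # rescale at every step to avoid underflow
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


def viterbi(obs, pi, A, means, sds):
    """Most likely hidden state sequence (0-based state labels)."""
    n = len(pi)
    score = [math.log(pi[i]) + log_gaussian(obs[0], means[i], sds[i]) for i in range(n)]
    back = []
    for x in obs[1:]:
        pointers, new_score = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + math.log(A[i][j]))
            pointers.append(best_i)
            new_score.append(
                score[best_i] + math.log(A[best_i][j]) + log_gaussian(x, means[j], sds[j])
            )
        back.append(pointers)
        score = new_score
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical two-state model: a low state (mean -1) and a high state (mean +1).
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.1, 0.9]]
means, sds = [-1.0, 1.0], [0.5, 0.5]
obs = [-1.1, -0.9, -1.0, 1.2, 0.8, 1.0]
print(viterbi(obs, pi, A, means, sds), forward_loglik(obs, pi, A, means, sds))
```

The decoded path plays the role of the state sequence used for shift identification: a change in the decoded state from one year to the next marks a regime shift.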
3.2.5. HMM for Projection
For the simulation of the model, the low-frequency streamflow was fitted to different HMMs, varying the number of hidden states. The model with the lowest BIC was chosen. Then, the model’s parameters were estimated, including the mean and standard deviation of each state and the transition matrix of the model.
The prediction method was based on the work proposed by the authors of [50], which is divided into three steps:
Step 1: The HMM parameters are calibrated using the training data, and the probability of the observed data is calculated.
Step 2: Based on similar data sets in the past data, we find parts of the training data with a similar likelihood.
Step 3: The difference between the streamflow of the previous year and the streamflow of the consecutive year is calculated to forecast the future streamflow.
When predicting the streamflow at time step T + 1 of the time series, a sub-series of length D is chosen as the training data and used to calibrate the HMM’s parameters, λ = (π, A, B). Streamflow patterns similar to the current year are then located in the past data. Considering that the predicted value must follow a similar pattern from the training data, the difference between a year’s streamflow and the next year’s value is calculated. The difference between years is estimated by the summation of the probability of being in a previous state multiplied by the mean of its respective states.
Similarly, to predict the streamflow at time T + 2, new training data are formed by adding the predicted value, and the three-step prediction process is repeated for time step T + 2 and so on. Data length and quality are limitations of the method because shorter records cannot adequately capture the low-frequency variability needed in multidecadal projections.
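The three steps and the recursive extension to later time steps can be sketched in Python as follows. This is a loose illustration of the similar-likelihood idea, not the procedure of [50] or the implementation used in this study: the likelihood function is passed in as an argument, the toy `toy_loglik` merely stands in for a calibrated HMM log-likelihood, the change is taken directly from the matched past window rather than from state probabilities and means, and all names are hypothetical.

```python
def similar_pattern_forecast(series, window, loglik):
    """One-step forecast from the past window whose likelihood is closest
    to that of the most recent window."""
    if len(series) <= window:
        raise ValueError("series must be longer than the window")
    target = loglik(series[-window:])
    best_end, best_gap = None, float("inf")
    # Every candidate window series[end - window:end] has a known successor.
    for end in range(window, len(series)):
        gap = abs(loglik(series[end - window:end]) - target)
        if gap < best_gap:
            best_end, best_gap = end, gap
    # Change between the matched year and the year that followed it.
    change = series[best_end] - series[best_end - 1]
    return series[-1] + change


def multi_step_forecast(series, window, loglik, steps):
    """Append each prediction to the data and repeat the one-step forecast."""
    extended = list(series)
    for _ in range(steps):
        extended.append(similar_pattern_forecast(extended, window, loglik))
    return extended[len(series):]


def toy_loglik(values):
    """Stand-in for an HMM log-likelihood: independent standard normal
    terms with the additive constant dropped."""
    return -0.5 * sum(v * v for v in values)


history = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0]
print(multi_step_forecast(history, window=2, loglik=toy_loglik, steps=2))
```

Because each prediction is fed back into the data, errors can accumulate with lead time, which is one reason longer projection horizons are harder to forecast.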
3.2.6. Model Performance Metrics
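The projection performance of the model for the low-frequency streamflow time series is estimated by comparing observations and predictions. The root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (R) are used to estimate the performance of the HMM, as defined in Equations (8)–(10):
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left(Q_t^{o} - Q_t^{p}\right)^2} \tag{8}$$
$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left|Q_t^{o} - Q_t^{p}\right| \tag{9}$$
$$R = \frac{\sum_{t=1}^{N} \left(Q_t^{o} - \bar{Q}^{o}\right)\left(Q_t^{p} - \bar{Q}^{p}\right)}{\sqrt{\sum_{t=1}^{N} \left(Q_t^{o} - \bar{Q}^{o}\right)^2 \sum_{t=1}^{N} \left(Q_t^{p} - \bar{Q}^{p}\right)^2}} \tag{10}$$
where $N$ is the number of input samples; $Q_t^{o}$ and $Q_t^{p}$ represent the observed and predicted runoff at time $t$, respectively; and $\bar{Q}^{o}$ and $\bar{Q}^{p}$ represent the averages of the observed and predicted runoff, respectively.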
In the evaluation, the target values are R close to 1 and MAE and RMSE close to 0. Because it squares the errors, the RMSE is sensitive to large deviations and is therefore suited to evaluating the fit at high streamflow values, whereas the MAE provides a more balanced measure of overall errors [22].
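The three metrics named above can be computed with a few lines of Python. This is a small sketch using the standard definitions of RMSE, MAE, and the Pearson correlation coefficient; the function names and the example values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def correlation(observed, predicted):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# A prediction with a constant bias: perfectly correlated, but with nonzero error.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(rmse(observed, predicted), mae(observed, predicted), correlation(observed, predicted))
```

The example shows why the metrics are reported together: a constant bias leaves R at 1 while RMSE and MAE both register the error.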
5. Discussion and Conclusions
The nonlinearity of hydrological systems has been recognized for many years. Recent advances in computational power and data acquisition have provided tools and new methods to study temporal and spatial variability in hydrological variables [20]. Different studies [4,20] provided evidence that the time series of the Sobradinho reservoir presents nonlinear behavior, particularly after its construction. Therefore, the wavelet transform was applied in this paper as a pre-processing tool to extract significant features such as the low-frequency variability and to gain insight into time-varying characteristics that the streamflow time series may present.
Previous studies have indicated that a hydrological regime shift may occur due to anthropogenic activities, such as dam construction. In this context, the analysis of the low-frequency streamflow using the BFAST model showed a shift in 1986, whereas the other models placed the start of the shift in 1985. We also observed a reduction in the low-frequency streamflow peaks from 1985 onward. These alterations in the low-frequency streamflow regime might indicate the influence on the river regime of the Sobradinho dam, which was constructed between 1973 and 1979 and began operating between 1979 and 1982. According to the authors of [52], areas downstream of the Sobradinho dam presented significant changes in the annual seasonality of floods. Although more events that contributed to increases in precipitation occurred during the 1986–2006 period, this was not reflected in the flood data; the authors concluded that dam impacts combined with other water withdrawals, particularly for agriculture, are the main reasons for changes in floods along the São Francisco River [52]. Other concerns raised for the region are the growing volume of water removed from the São Francisco River due to the expansion of irrigated farming and illegal water removal [53]. These anthropogenic actions are difficult to measure and have a large impact on flow rates.
Some authors [54,55] observed a systematic change in the hydrological time series of the NEB in the late 1970s and early 1980s that is associated with natural fluctuations such as decadal variability. Several studies point out that precipitation records over South America exhibit decadal and interdecadal variability. In the NEB, this precipitation variability has been associated with sea surface temperature (SST) anomaly patterns such as the PDO [20,56] and the AMO [57]. Thus, although there was no apparent correlation between the climate indices and the analyzed low-frequency streamflow, these variables may influence some periods of the low-frequency streamflow behavior and the shifts present in hydrological variables.
Extreme events such as droughts are another significant perturbation that can cause changes in the hydrological system. The recent intensive drought that developed in the NEB during 2010–2017 has been one of the worst in recent decades. We can see an apparent decrease in streamflow over this period; however, in the low-frequency streamflow, the decrease is not as intense. This may be because the drought was associated with the strong 2015–2016 El Niño, which brought warmer weather and SSTs [58]. That influence may be present in the high-frequency part of the time series (2- to 8-year band in Figure 4).
Future projections indicate that temperatures in the NEB should increase, while rainfall could decline by approximately 25–50% in semi-arid areas. Consequently, flow rates are expected to decrease for various rivers in the NEB [58,59]. The hydroelectric potential of the São Francisco River Basin is expected to be reduced by more frequent and intense climate-induced droughts. Water allocation and appropriate land management are necessary for the region [52]. Consequently, more realistic projections can help to improve water management in the region.
In this study, we also proposed a model to forecast the nonlinear low-frequency streamflow series. Three projection scenarios were modeled and evaluated using three statistical performance measures (RMSE, MAE, and R). The forecasts for the 5- and 10-year periods presented high values of R. Although the value for the 15-year period was low, the result is still notable when compared with the results provided in [7], whose authors used a climate hidden Markov model and presented an R² of approximately 0.26 for a lead time of 15 years. The model improves the information available for the dry period. Low-frequency streamflow (1931–2016) has a higher probability of average low flow than the test data set, and the model prediction can capture this behavior. This information is relevant for water resource planning, in particular for drought planning. Furthermore, as extreme climatic events become more frequent and threatening, it is essential to assess watersheds and prepare strategies for those situations.
Identifying different state periods also reveals the impact of low frequency in the streamflow time series. Due to the clear separation of states in the analysis, we observed that the patterns have different probability distributions through the CDF plots; thus, the low-frequency variability conditions the flows of a given year. The model comparison in this paper provides an insight into a modified version of a classical method such as the SRI and state-of-the-art methods such as BFAST, PELT, and HMM available for identifying shifts in the time series. Assessing the current state of low-frequency streamflow allows the assessment of the dynamic risk of extreme events and an accurate forecast of streamflow. The HMM forecast model is a tool to aid in the management and operation of this reservoir.