Shift Detection in Hydrological Regimes and Pluriannual Low-Frequency Streamflow Forecasting Using the Hidden Markov Model

Improved water resource management relies on accurate analyses of the past dynamics of hydrological variables. The presence of low-frequency structures in hydrologic time series is an important feature. It can modify the probability of extreme events occurring in different time scales, which makes the risk associated with extreme events dynamic, changing from one decade to another. This article proposes a methodology capable of dynamically detecting and predicting low-frequency streamflow (16–32 years), which presented significance in the wavelet power spectrum. The Standardized Runoff Index (SRI), the Pruned Exact Linear Time (PELT) algorithm, the breaks for additive seasonal and trend (BFAST) method, and the hidden Markov model (HMM) were used to identify the shifts in low frequency. The HMM was also used to forecast the low frequency. As part of the results, the regime shifts detected by the BFAST approach are not entirely consistent with results from the other methods. A common shift occurs in the mid-1980s and can be attributed to the construction of the reservoir. Climate variability modulates the streamflow low-frequency variability, and anthropogenic activities and climate change can modify this modulation. The identification of shifts reveals the impact of low frequency in the streamflow time series, showing that the low-frequency variability conditions the flows of a given year.


Introduction
Climate change and intensive human activities extensively influence hydrological regimes and water resource systems. Consequently, many world rivers have suffered quality and quantity problems. Rivers are considered one of the most vulnerable resources to human disturbances, e.g., land cover changes, agricultural irrigation, and reservoir construction [1][2][3][4]. Further assessing the changes in streamflow regimes is a critical challenge for understanding hydrologic mechanisms, which can improve water resource management.
Many water resource studies have applied stochastic methods to identify temporal uncertainties. The early time series models assumed that the time series came from a stationary or a cyclostationary process [5]. These models performed well for hydrological data without signs of long-term memory or nonlinear dependence [6][7][8][9]. However, as records length increased, low-frequency structures of climate were associated with hydrologic time series. These structures became an essential feature in the hydrological analysis, especially in streamflow analysis, due to the extremely non-uniform temporal distribution of global runoff. In addition, several studies have associated the changes in hydrological records with the effects of natural climate variability, particularly from low-frequency climate indices such as the Pacific Decadal Oscillation (PDO) and the Atlantic Multidecadal Oscillation The water and climate characteristics of the basin are highly variable. The Sobradinho reservoir presents critical periods of prolonged droughts due to low rainfall and high evapotranspiration. The rainfall season starts approximately in November and ends in April [23,24]. The average annual flow of the São Francisco River is 2846 m 3 /s, and, until 2013, the water withdrawn was 278 m³/s [25]. One of the uses of water resources in the São Francisco River is irrigation, whose withdrawal is 77% of the region's total demand.
The São Francisco Region plays an essential role in generating electricity, with a potential installed in 2013 of 10,708 MW, coming from 28 small plants and 12 large plants (12% of the country's total). The hydroelectric exploitation of the São Francisco River represents the NEB's energy supply base [24]. There are several large dams along the São Francisco River, namely Três Marias, Sobradinho, Itaparica, Moxotó, Paulo Afonso I, II, III, and IV, and Xingó, which were constructed between 1962 (Três Marias) and 1994 (Xingó). The Sobradinho dam (coordinates: 9°25'49" S, 40°49 '37" W; construction: 1973-1979; the beginning of operations: 1979-1982) is located 742 km from the mouth, in the Bahia state. The dam has a height of 41 m and length of 12.5 km. The reservoir has a maximum length of 320 km, a surface area of 4214 km², and a storage capacity of 34.1 × 10 9 m³. The Sobradinho reservoir was built to achieve multiple uses such as multiannual flow regulation for hydropower, navigation, irrigation, and flood control management for the riverine communities of the São Francisco River Basin.
Monthly naturalized flows measured at a gauging station along the São Francisco River were obtained from the ONS. The naturalized flow of a hydroelectric plant is the flow that would be observed in that measurement gauge considering the river in its natural condition, that is, assuming that there is no reservoir regulating the flow and no human activity impacts. The streamflow time series range from January 1931 to December 2016 without any missing values. The gauge station is located approximately 95 km upstream of the Sobradinho dam. This station is likely to suffer minor The water and climate characteristics of the basin are highly variable. The Sobradinho reservoir presents critical periods of prolonged droughts due to low rainfall and high evapotranspiration. The rainfall season starts approximately in November and ends in April [23,24]. The average annual flow of the São Francisco River is 2846 m 3 /s, and, until 2013, the water withdrawn was 278 m 3 /s [25]. One of the uses of water resources in the São Francisco River is irrigation, whose withdrawal is 77% of the region's total demand. The São Francisco Region plays an essential role in generating electricity, with a potential installed in 2013 of 10,708 MW, coming from 28 small plants and 12 large plants (12% of the country's total). The hydroelectric exploitation of the São Francisco River represents the NEB's energy supply base [24]. There are several large dams along the São Francisco River, namely Três Marias, Sobradinho, Itaparica, Moxotó, Paulo Afonso I, II, III, and IV, and Xingó, which were constructed between 1962 (Três Marias) and 1994 (Xingó). The Sobradinho dam (coordinates: 9 • 25'49" S, 40 • 49'37" W; construction: 1973-1979; the beginning of operations: 1979-1982) is located 742 km from the mouth, in the Bahia state. The dam has a height of 41 m and length of 12.5 km. The reservoir has a maximum length of 320 km, a surface area of 4214 km 2 , and a storage capacity of 34.1 × 10 9 m 3 . The Sobradinho reservoir was built to achieve multiple uses such as multiannual flow regulation for hydropower, navigation, irrigation, and flood control management for the riverine communities of the São Francisco River Basin.
Monthly naturalized flows measured at a gauging station along the São Francisco River were obtained from the ONS. The naturalized flow of a hydroelectric plant is the flow that would be observed in that measurement gauge considering the river in its natural condition, that is, assuming that there is no reservoir regulating the flow and no human activity impacts. The streamflow time series range from January 1931 to December 2016 without any missing values. The gauge station is Water 2020, 12, 2058 4 of 16 located approximately 95 km upstream of the Sobradinho dam. This station is likely to suffer minor effects of the Três Marias reservoir, the largest reservoir upstream of Sobradinho, and located about 1087 km from Sobradinho [4].

Materials and Methods
This section provides a brief description of the methods and the steps used for shift identification and projection of the low-frequency streamflow time series. All experiments were carried out in the R environment for statistical computing [26]. In this study, (i) the wavelet transform was used to decompose and reconstruct the low-frequency streamflow from the streamflow time series and (ii) a classical method (SRI) and three state-of-the-art methods (PELT, BFAST, and HMM) were used to identify the shifts in the low-frequency streamflow series. After the identification of the shifts present in the series and knowing how they repeated themselves, (iii) an HMM was used to project the low-frequency streamflow. The main advantage of applying this method to streamflow time series is its ability to simulate long persistence and regime-switching behavior [27]. Then, the degree of improvement in prediction accuracy from using the HMM as the projection method for low-frequency streamflow time series was assessed. Figure 2 illustrates the flowchart describing the key steps in this study.

Materials and Methods
This section provides a brief description of the methods and the steps used for shift identification and projection of the low-frequency streamflow time series. All experiments were carried out in the R environment for statistical computing [26]. In this study, (i) the wavelet transform was used to decompose and reconstruct the low-frequency streamflow from the streamflow time series and (ii) a classical method (SRI) and three state-of-the-art methods (PELT, BFAST, and HMM) were used to identify the shifts in the low-frequency streamflow series. After the identification of the shifts present in the series and knowing how they repeated themselves, (iii) an HMM was used to project the lowfrequency streamflow. The main advantage of applying this method to streamflow time series is its ability to simulate long persistence and regime-switching behavior [27]. Then, the degree of improvement in prediction accuracy from using the HMM as the projection method for lowfrequency streamflow time series was assessed. Figure 2 illustrates the flowchart describing the key steps in this study.

Time Series Decomposition Using Wavelet Transform
An extensive method used in extracting the low-frequency part of a time series is the wavelet transform [28][29][30], which decomposes the series in the time-frequency domain and identifies the dominant modes of variability. A wavelet transform decomposes a time series into a set of functions, also known as the "daughter wavelet", derived from the translation in time and scaling of the "mother wavelet". The choice of the mother wavelet is a significant one, where the kind of wavelet transform chosen depends on the type of output information needed. There are many mother wavelet functions from which to choose, such as Haar wavelet, Daubechies wavelet, Mexican Hat wavelet, Morlet wavelet, and others. This study applied the Morlet wavelet, which is commonly used in hydrological time series because of its power to describe the time series adequately and provide a better time-frequency localization [31,32].

Time Series Decomposition Using Wavelet Transform
An extensive method used in extracting the low-frequency part of a time series is the wavelet transform [28][29][30], which decomposes the series in the time-frequency domain and identifies the dominant modes of variability. A wavelet transform decomposes a time series into a set of functions, also known as the "daughter wavelet", derived from the translation in time and scaling of the "mother wavelet". The choice of the mother wavelet is a significant one, where the kind of wavelet transform chosen depends on the type of output information needed. There are many mother wavelet functions from which to choose, such as Haar wavelet, Daubechies wavelet, Mexican Hat wavelet, Morlet wavelet, and others. This study applied the Morlet wavelet, which is commonly used in hydrological time series because of its power to describe the time series adequately and provide a better time-frequency localization [31,32].
The signal component of the time series is identified by the 90-95% significance test using the white noise as a null hypothesis and the interpretation of the wavelet power spectrum. The identified significant component is then extracted from the original series using the reconstruction function. Reconstruction of the original time series over a set of periods can be obtained as follows: where C is the reconstruction factor; d j and dt are the scale and time factor, respectively; ψ(0) is the factor that removes the energy scaling for the Morlet wavelet function; Re(Wave(s)) is the real part of the wavelet transform; and s is the scale parameter. We used the WaveletComp package [33] to decompose the time series and reconstruct the low-frequency streamflow. More details of the wavelet technique can be found in [5,28,30].

Regime Shift Detection
In this study, we applied the PELT algorithm and the HMM to detect shifts in the annual low-frequency streamflow time series, and the BFAST model and the SRI to detect shifts in the monthly low-frequency streamflow series.

SRI
Drought indices, such as SRI, are used for drought identification and the description of its intensity. The SRI is based on the concept of the Standardized Precipitation Index [34], discussed by the authors of [35]. Although the indices show similarities, the SRI incorporates hydrological processes that control the seasonal loss in streamflow due to the climate's influence, thus being able to describe the hydrological aspects of droughts.
First, a long-term record was fitted to a probability distribution. A variety of probability distributions (e.g., gamma, lognormal, generalized extreme value, log-logistic, and generalized Pareto) have been used to fit monthly observations of different hydro-climatic variables for calculating drought indices [36,37]. Then, the cumulative distribution function (CDF) of the fitted marginal distribution was transformed into a standard normal variate Z.
After the low-frequency streamflow was decomposed and reconstructed using the wavelet transform to a significant variability frequency range, we tested three different distributions, where the best one was chosen to fit the reconstructed time series. The adjusted SRI (Ad-SRI), as we are calling the SRI of the low frequency, is not the same as the classical SRI as they present different information. Consequently, their variability range is different. The Ad-SRI uses the mean of a 12-month time scale to characterize the system's state in that year. Afterward, the values are classified. Years with negative values were termed State 1 and years with positive values were termed State 2.

PELT Algorithm
The detection of breakpoints was based on the method presented in [38]. A data set is defined as being y 1:n = (y 1 , . . . , y n ). In a model with multiple breakpoints (m) and their positions τ 1:m = (τ 1 , . . . , τ m ), each point is an integer between 1 and n − 1. It is defined that τ 0 = 0 and τ m+1 = n. Consequently, the m breakpoints will split the data into m+1 segments. A common approach in the methodology to detect multiple breakpoints is to minimize the cost function as follows: Water 2020, 12, 2058 where C is a cost function for a segment, e.g., negative log-likelihood, and βf (m) is a penalty to guard against overfitting (a multiple breakpoint version of the threshold c).
Many breakpoints algorithms are implemented in the R package changepoint [39] used in this study, such as the binary segmentation algorithm, segmentation neighborhood, and the PELT algorithm [38]. The PELT shows speed gains and increased accuracy over the other methods. The method is an adaptation of the optimal partitioning, and for computational efficiency it removes points that can never be minima from the minimization performed at each iteration by the cost function [39]. We assessed the annual low-frequency streamflow with the PELT algorithm and divided the shifts into two states based on the breakpoint found by the algorithm. Negative values of low-frequency streamflow were termed State 1, whereas positive values were termed State 2.

BFAST
The BFAST method, proposed by the authors of [40], is a decomposition method that integrates the iterative decomposition of a time series into trend, seasonal, and remainder components to examine changes (i.e., trends and breakpoints) within the time series. The method has been applied to detect long-term seasonal changes in satellite image time series [40]. The general model is described as follows: where Y t is the observed data at time t, T t is the trend component, S t is the seasonal component and the remainder component ξ t denotes the remaining variation in the data beyond that in the seasonal and trend components.
Assuming that the entire time series has m breakpoints τ 1 , . . . , τ m in the trend component T t , then the segment-specific slopes and intercepts can be calculated on each segment. The trend component can be expressed as follows: where i = 1, . . . , m and we define τ 0 = 0 and τ m+1 = n. The intercept α i and slope β i can be used to assess the magnitude and direction of the abrupt change.
Similarly, a harmonic model is applied to parameterize the seasonal component. The seasonal component is fixed between breakpoints. Given the time series has p seasonal breakpoints t 1 , . . . , t p , then the seasonal component S t can be calculated as follows: where the unknown parameters are the segment-specific amplitude γ k and phase δ k , which must be estimated. The known frequency f is equal to 12 for the monthly observations used here. The moving sum test based on ordinary least squares residuals (OLS-MOSUM) is applied to detect whether one or more breakpoints occur [41]. The breakpoints are estimated using Bai and Perron's method if the test indicates a significant change (p < 0.05). [42] argues that the Akaike information criterion (AIC) usually overestimates the number of breaks, and the Bayesian information criterion (BIC) is a more suitable procedure in many situations. Thus, in the method, the number of breaks was determined by the BIC, and the date and confidence interval of the date for each break were estimated. The BFAST model parameters were estimated by iterating the following steps: Step 1: If the OLS-MOSUM test shows that breakpoints occur in the trend component, then the number and positions of the breakpoints in the trend component τ 1 , . . . , τ m are estimated through least squares from the seasonally adjusted data Y t − S t . For a specific segment, the trend component can be estimated by Equation (4). Then, the trend coefficient α i and β i are calculated for different segments using robust regression method based on M-estimation to account for potential outliers.
Step 2: Similarly, if the OLS-MOSUM test indicates that breakpoints occur in the seasonal component, then the number and positions of the breakpoints in the seasonal component t 1 , . . . , t p are estimated from the detrended data Y t − S t . The parameters γ k and δ k for each segment are calculated using a robust regression method based on M-estimation. We applied the BFAST model to monthly data and identified the shifts in the low-frequency streamflow.

HMM
HMM [43] is a statistical model in which the realizations from an unobserved Markov process represent the observed time series [44,45]. A Markov process is a random process whose future probabilities are determined by its most recent values. The HMM was developed for speech recognition and has been successfully used in many knowledge areas, including hydrology [27,44,46].
An HMM consists of state (S 1:T ) and observation (O 1:T ) variables. The distribution of O t can be written as f i (O t ) = f (O t |S t = i) and the marginal distribution for a discrete number of states can be described as mixture distribution with n components [45][46][47]. The equation is written as follows: where n i = 1 p i = 1, p i ≥ 0 and f i () is the conditional distribution of the data.
The transition between states is governed by probabilities described as transition probabilities. They are denoted by the matrix A(t), where the first row (a 1 j ) has the probabilities from moving from state S t = 1 to S t+1 . When dealing with the transition's parameters in A, one must define the initial state or the prior probabilities π that define where the process begins.
The forward algorithm is used for calculating the joint likelihood [48]. The expectation-maximization algorithm is used to optimally estimate the parameters of the HMM. In this algorithm, the parameters are obtained with the maximization of the expected joint log-likelihood given the observations and states through an iterative process [45,49]. The Viterbi algorithm is applied to decode the observation sequences into hidden state sequences.

HMM for Projection
For the simulation of the model, the low-frequency streamflow was fitted to different HMMs, varying the number of hidden states. The model with the lowest BIC was chosen. Then, the model's parameters were estimated, including the mean and standard deviation of each state and the transition matrix of the model.
The prediction method was based on the work proposed by the authors of [50], which is divided into three steps: Step 1: The HMM parameters are calibrated using the training data, and the probability of the observed data is calculated.
Step 2: Based on similar data sets in the past data, we find parts of the training data with a similar likelihood.
Step 3: The difference between the streamflow of the previous year and the streamflow of the consecutive year is calculated to forecast the future streamflow.
When predicting the streamflow at a time step T + 1 of the time series, a part-time series of length D is chosen to be the training data, used to calibrate the HMM's parameters, λ (π,A,B). Consequently, streamflow patterns similar to the current year are located in the past data. Considering that the predicted value must assume a similar pattern from the training data, the difference of a year's streamflow and the next year's value is calculated. The difference between years is estimated by the summation of the probability of being in a previous state multiplied by the mean of its respective states.
Similarly, to predict the streamflow at time T + 2, new training data is used by adding the predicted value and repeating the three-step-prediction process for the time step T + 2 and so on. Data length and quality are limitations in the method because shorter records cannot show the low-frequency variability efficiently needed in multidecadal projections.

Model Performance Metrics
The projection performance of the model for the low-frequency streamflow time series is estimated by comparing the observation and prediction. The root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (R) are used to estimate the performance of the HMM, as defined in Equations (8)- (10).
N is the number of input samples; y i and y * i represent the observed and predicted runoff at time t, respectively; and y and y * represent the averages of the observed and predicted runoff, respectively.
In the evaluation, the target values are R close to 1, MAE and RMSE close to 0. The RMSE is an ideal error index used to evaluate the global fitness of high streamflow values, whereas the MAE provides a more balanced measure of overall errors [22].

Time Series Decomposition
A wavelet transform is used to decompose and extract the low frequency from the time series of Sobradinho's inflow. Figure 3a illustrates the annual streamflow time series, while Figure 3b shows the wavelet power spectrum and the average wavelet power over that period. The red points represent the significance levels and show which regions of the spectrum meet the significance level. The frequency chosen in this study was between 16 and 32 years since it presented significance in its power spectrum and the average wavelet power. The 16-to 32-year frequency represents 10.20% of the explained variance of the streamflow time series. Figure 4 illustrates the reconstructed time series using the wavelet transform decomposition. The series was reconstructed for high (2-8 years), medium (8-16 years), low (16-32 years) frequency, and the residue, which is represented with the frequency higher than 32 years.
represent the significance levels and show which regions of the spectrum meet the significance level. The frequency chosen in this study was between 16 and 32 years since it presented significance in its power spectrum and the average wavelet power. The 16-to 32-year frequency represents 10.20% of the explained variance of the streamflow time series. Figure 4 illustrates the reconstructed time series using the wavelet transform decomposition. The series was reconstructed for high (2-8 years), medium (8-16 years), low (16-32 years) frequency, and the residue, which is represented with the frequency higher than 32 years.    The y-axis represents the different frequencies selected for a band-pass filter of wavelets for the reconstruction of the time series. The first series is the high-frequency (2-8 years), next is the medium-frequency (8-16 years), followed by the low-frequency (16-32 years) and the residue, which is the frequency higher than 32 years.

Low-Frequency Streamflow Shift Detection
The identification of shifts in time series is still the subject of different methods that try to comprehend changes in hydrological records due to the effect of natural climatic variability and human activities. An analysis of the low-frequency streamflow time series was made using Ad-SRI, PELT, BFAST, and HMM, aiming to identify states in the time series.
In the Ad-SRI analysis, we tested three distributions (generalized Pareto, Pearson type III, and

Low-Frequency Streamflow Shift Detection
The identification of shifts in time series is still the subject of different methods that try to comprehend changes in hydrological records due to the effect of natural climatic variability and human activities. An analysis of the low-frequency streamflow time series was made using Ad-SRI, PELT, BFAST, and HMM, aiming to identify states in the time series.
In the Ad-SRI analysis, we tested three distributions (generalized Pareto, Pearson type III, and gamma), and the best fit for the low-frequency streamflow was Pearson type III. The Pearson type III distribution has also been widely used for flow frequency analyses [51]. Although the authors of [51] reported an excessive frequency of negative values, the analyzed low-frequency streamflow seems to present bias for positive values. The BFAST was also used to assess the shifts in a monthly time frame. The model did not present any seasonal breakpoints, only trend breakpoints. The PELT algorithm and the HMM with two states were applied to evaluate the low-frequency streamflow annually. Figure 5 illustrates the shifts in the low-frequency streamflow dynamics. The first shift from State 1 to State 2 occurred in the beginning of the 1950s for most models, whereas for the BFAST it started five years later and lasted for approximately three years more than the other models. The BFAST showed a similar classification as the other models from 1985 onward.
The low-frequency streamflow shift models showed a clear separation of the states. Then, the CDF for each state was calculated using the series' data without being decomposed by the wavelet transform to verify if those periods followed a similar statistical distribution. Figure 6 illustrates that the two states of the series without being decomposed by the wavelet transform do not follow the same distribution in the CDF plot. When comparing the CDFs across the four methods, the BFAST is unable to distinguish states well. Results of the Ad-SRI, PELT, and HMM were similar. State 2 presented a probability of higher streamflow values than in State 1 in all models. Hence, the wet periods have a higher likelihood of occurring than the dry periods. The low frequency shows a clear effect in the pattern of the time series, which justifies its study as one of the influencing components of the time series behavior. Knowing that the states have a different probability distribution and identifying the states in which the system presents itself give the modeler great insight on how to project the time series.  Figure 5 shows that the classification of the Ad-SRI, HMM, and PELT were very similar, presenting a cycling behavior of seven to nine years duration up until the mid-1980s. In that period, State 1 of the Ad-SRI model started later compared with the PELT and the HMM. The BFAST classification produced longer cycles, of approximately 11 to 16 years, compared with the other models in this study.
The first shift from State 1 to State 2 occurred in the beginning of the 1950s for most models, whereas for the BFAST it started five years later and lasted for approximately three years more than the other models. The BFAST showed a similar classification as the other models from 1985 onward.
The low-frequency streamflow shift models showed a clear separation of the states. Then, the CDF for each state was calculated using the series' data without being decomposed by the wavelet transform to verify if those periods followed a similar statistical distribution. Figure 6 illustrates that the two states of the series without being decomposed by the wavelet transform do not follow the same distribution in the CDF plot. When comparing the CDFs across the four methods, the BFAST is unable to distinguish states well. Results of the Ad-SRI, PELT, and HMM were similar. State 2 presented a probability of higher streamflow values than in State 1 in all models. Hence, the wet periods have a higher likelihood of occurring than the dry periods. The low frequency shows a clear effect in the pattern of the time series, which justifies its study as one of the influencing components of the time series behavior. Knowing that the states have a different probability distribution and identifying the states in which the system presents itself give the modeler great insight on how to project the time series. Water 2020, 12, x FOR PEER REVIEW 11 of 16

Projection of the Low-Frequency Streamflow with the HMM
The method applied to predict the low-frequency streamflow time series of the Sobradinho reservoir was based on the work proposed by the authors of [50]. In this case, the lowest BIC was the criterion for the number of states in the HMM. The model was used to predict periods of 5, 10, and 15 years. Figure 7 shows the CDFs of the predicted low-frequency streamflow, observed lowfrequency streamflow for the test period, and the observed low-frequency streamflow from 1931 to 2016 in the (a) 5-year, (b) 10-year, and (c) 15-year projection.
Although the models overestimated the low-frequency streamflow values, they were able to accompany the shifts in the streamflow regime. Compared with the mean of the training periods (1931-2012, 1931-2007, and 1931-2001), the models were able to follow the dry and wet periods better than the mean values. Figure 7 presents an improvement in the information of the dry period. Lowfrequency streamflow (1931-2016) has a high probability of average low flow than the observed-test, and the model prediction can model this behavior. The entire time series (observed low-frequency streamflow from 1931 to 2016) presents a variance that is greater than the observation. The model prediction has the same scale as the variance of observation of the test period.

Projection of the Low-Frequency Streamflow with the HMM
The method applied to predict the low-frequency streamflow time series of the Sobradinho reservoir was based on the work proposed by the authors of [50]. In this case, the lowest BIC was the criterion for the number of states in the HMM. The model was used to predict periods of 5, 10, and 15 years. Figure 7 shows the CDFs of the predicted low-frequency streamflow, observed low-frequency streamflow for the test period, and the observed low-frequency streamflow from 1931 to 2016 in the (a) 5-year, (b) 10-year, and (c) 15-year projection.
Although the models overestimated the low-frequency streamflow values, they were able to accompany the shifts in the streamflow regime. Compared with the mean of the training periods (1931-2012, 1931-2007, and 1931-2001), the models were able to follow the dry and wet periods better than the mean values. Figure 7 presents an improvement in the information of the dry period. Low-frequency streamflow (1931-2016) has a high probability of average low flow than the observed-test, and the model prediction can model this behavior. The entire time series (observed low-frequency streamflow from 1931 to 2016) presents a variance that is greater than the observation. The model prediction has the same scale as the variance of observation of the test period.  Table 1 shows the statistical accuracy of the predictions by different models, indicated by RMSE, MAE, and R. Smaller values of RMSE and MAE suggest more accurate predictions. High values of R indicate a better fit between prediction and observation. Table 1 shows that the period of 15 years had the lowest RMSE and MAE. However, it presented the lowest value of R as well. The RMSE and MAE values are still close to those of 5 years and 10 years, with an increase of 8% and 17%, respectively, for the RMSE. Furthermore, there was a decrease of 63% for the R in the 15-year period compared with the other projections.

Discussion and Conclusions
The nonlinearity of hydrological systems has been recognized for many years. The recent development of computational power and data acquisition provided us with tools and new methods to study temporal and spatial variability in hydrological variables [20]. Different studies [4,20] provided evidence that the time series of the Sobradinho reservoir presents nonlinear behavior, particularly after its construction. Therefore, the wavelet transform was applied in this paper as a pre-processing tool to extract significant features such as the low-frequency variability and to gain insight into time-varying characteristics that the streamflow time series may present.
Previous studies have indicated that a hydrological regime shift may occur due to anthropogenic activities, such as dam construction. In this context, the analysis of the low-frequency streamflow using the BFAST model showed a shift in 1986, and for the other models, the shift started in 1985. We also observed a reduction in the low-frequency streamflow peaks from 1985. These alterations in the  Table 1 shows the statistical accuracy of the predictions by different models, indicated by RMSE, MAE, and R. Smaller values of RMSE and MAE suggest more accurate predictions. High values of R indicate a better fit between prediction and observation. Table 1 shows that the period of 15 years had the lowest RMSE and MAE. However, it presented the lowest value of R as well. The RMSE and MAE values are still close to those of 5 years and 10 years, with an increase of 8% and 17%, respectively, for the RMSE. Furthermore, there was a decrease of 63% for the R in the 15-year period compared with the other projections.

Discussion and Conclusions
The nonlinearity of hydrological systems has been recognized for many years. The recent development of computational power and data acquisition provided us with tools and new methods to study temporal and spatial variability in hydrological variables [20]. Different studies [4,20] provided evidence that the time series of the Sobradinho reservoir presents nonlinear behavior, particularly after its construction. Therefore, the wavelet transform was applied in this paper as a pre-processing tool to extract significant features such as the low-frequency variability and to gain insight into time-varying characteristics that the streamflow time series may present.
Previous studies have indicated that a hydrological regime shift may occur due to anthropogenic activities, such as dam construction. In this context, the analysis of the low-frequency streamflow using the BFAST model showed a shift in 1986, and for the other models, the shift started in 1985.
We also observed a reduction in the low-frequency streamflow peaks from 1985. These alterations in the low-frequency streamflow regime might indicate the influence on the river regime of the Sobradinho dam, which was constructed between 1973 and 1979 and started functioning between 1979 and 1982. According to the authors of [52], downstream areas of the Sobradinho dam presented significant changes in the annual seasonality of floods. Although more events that contributed to increases in precipitation occurred during the 1986-2006 period, this was not reflected in the flood data, concluding that dam impacts combined with other water withdrawals, particularly for agriculture, are the main reasons for changes in floods along the São Francisco River [52]. Other concerns raised for the region are the increase in the amount of water removed from the São Francisco River due to the increase in irrigation farming and illegal water removal from the river [53]. These anthropogenic actions are tough to measure and carry a great impact on flow rates.
Some authors [54,55] observed a systematic change in the hydrological time series of the NEB in the late 1970s and early 1980s that are associated with the natural fluctuations such as decadal variability. Several studies point out that precipitation records over South America exhibit decadal and interdecadal variability. This precipitation variability has been associated with sea surface temperature (SST) anomalies such as the PDO [20,56] and the AMO [57] in the NEB. Thus, although there was no apparent correlation between the climate indices and the analyzed low-frequency streamflow, these variables may influence some periods of the low-frequency streamflow behavior and the shifts present in hydrological variables.
Another significant perturbation that can cause changes in the hydrological system is extreme events such as droughts. The recent intensive drought developed in the NEB region for the period 2010-2017 has been one of the worst in the last decades. We can see an apparent decrease in low-frequency streamflow. However, in the low-frequency streamflow, the decrease is not that intense. That phenomenon may be because the drought was associated with the strong 2015-2016 El Niño, which brought warmer weather and SSTs [58]. That influence may be present in the high frequency of the time series (2-to 8-year frequency in Figure 4).
Future projections state that temperatures in the NEB should increase, while rainfall could decline by approximately 25-50% in semi-arid areas. Consequently, flow rates will be reduced for various rivers in the NEB [58,59]. Hydroelectric potential in the São Francisco River Basin will be reduced due to more frequent and intense climate-induced droughts. Water allocation and appropriate land management are necessary for the region [52]. Consequently, more realistic projections can help to improve water management in the region.
In this study, we also proposed a model to forecast the nonlinear low-frequency streamflow series. Three scenarios of projection were modeled and evaluated using three statistical performance evaluation measures (RMSE, MAE, and R). The forecasts for 5-and 10-year periods presented remarkably high values of R. Although the value for the 15-year period was low, the result is significant when compared with the results provided in [7], whose authors used a climate hidden Markov model and presented an R 2 of approximately 0.26 for a lead time of 15 years. The model presents an improvement in the information of the dry period. Low-frequency streamflow (1931-2016) has a higher probability of average low flow than the test data set, and the model prediction can capture this behavior. This information is relevant for water resource planning, in particular for drought planning. Furthermore, as extreme climatic events become more frequent and threatening, it is essential to assess watersheds and prepare strategies for those situations.
Identifying different state periods also reveals the impact of low frequency in the streamflow time series. Due to the clear separation of states in the analysis, we observed that the patterns have different probability distributions through the CDF plots; thus, the low-frequency variability conditions the flows of a given year. The model comparison in this paper provides an insight into a modified version of a classical method such as the SRI and state-of-the-art methods such as BFAST, PELT, and HMM available for identifying shifts in the time series. Assessing the current state of low-frequency streamflow allows the assessment of the dynamic risk of extreme events and an accurate forecast of streamflow. The HMM forecast model is a tool to aid in the management and operation of this reservoir.