Entropy 2019, 21(2), 132; https://doi.org/10.3390/e21020132
Application of Entropy Spectral Method for Streamflow Forecasting in Northwest China
College of Water Resources and Architectural Engineering, Northwest A&F University, Yangling 712100, China
Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Yangling 712100, China
Department of Water Resources Management and Agricultural-Meteorology, Federal University of Agriculture, PMB 2240, Abeokuta 110282, Nigeria
Author to whom correspondence should be addressed.
Received: 5 November 2018 / Accepted: 21 January 2019 / Published: 1 February 2019
Streamflow forecasting is vital for reservoir operation, flood control, power generation, river ecological restoration, irrigation and navigation. Although monthly streamflow time series are stochastic, they also exhibit seasonal and periodic patterns. Using maximum Burg entropy spectral analysis (BESA), maximum configurational entropy spectral analysis (CESA) and minimum relative entropy spectral analysis (RESA), forecasting models for monthly streamflow series were constructed for five hydrological stations in northwest China. Average relative error (RE), root mean square error (RMSE), correlation coefficient (R) and determination coefficient (DC) were selected as performance metrics. Results indicated that the RESA model had the highest forecasting accuracy, followed by the CESA model. However, the BESA model had the highest forecasting accuracy in the low-flow period, whereas the RESA and CESA models were more accurate in the flood season. In future research, these entropy spectral analysis methods can be applied to other rivers to verify their applicability to monthly streamflow forecasting in China.
Keywords: Burg entropy; configurational entropy; relative entropy; spectral analysis; streamflow forecasting
Accurate streamflow forecasting is vital for flood control, reservoir management, restoration of river environment, irrigation, and navigation, among other uses. Moreover, it can provide guidelines for policy makers in the utilization and management of water resources and the formulation of water environmental health protection plans. The simulation of monthly streamflow is an active topic among hydrologic researchers, but it is still under exploration and development because of the limitations of forecasting methods. Traditional time series models such as autoregressive (AR) or autoregressive moving average (ARMA) models are often used to simulate streamflow, but they cannot address the seasonality that exists in monthly streamflow series. Entropy spectral analysis, coupled with time series analysis, can extract significant information from the streamflow process and forecast monthly streamflow accurately. The spectral method has been successfully used by some researchers for monthly streamflow forecasting with different types of entropy, including Burg entropy, configuration entropy [1,2], and minimum relative entropy [4,5].
Burg proposed Burg entropy theory (BET) in the frequency domain and then further developed the maximum Burg entropy spectral method (BESA) for time series forecasting. As a classic method for hydrologic forecasting, BESA has been widely used in groundwater level forecasting, flood forecasting, and streamflow forecasting and has shown an advantage for long-term streamflow forecasting. However, BESA has lower resolution in determining multi-peak spectra, and monthly streamflow rarely contains only one period. The maximum configuration entropy spectral method (CESA) is an alternative for forecasting series with multi-peak spectra.
The concept of CESA was initially proposed by Frieden in the identification of images. Thereafter, Gull and Daniell applied the concept in the field of astronomy for image reconstruction. In the field of time series analysis, CESA performs better than BESA in determining the spectral density function for the ARMA model and the MA model but has no practical advantage for the AR model. CESA has been applied to streamflow forecasting by Cui and has shown better reliability than BESA.
As an extension of BESA, minimum relative entropy spectral analysis (RESA), proposed by Shore [12,13], was also applied to time series forecasting. In RESA, the spectral power was considered as a random variable. Tzannes et al. and Woodbury and Ulrych developed RESA and extended the theory and practice of minimum relative entropy. The RESA spectra have higher resolution and are more accurate in detecting peak location than other methods for spectral computation. The RESA method has been used for monthly streamflow forecasting [4,5,16] and has smaller errors than the other two entropy spectral methods.
However, there is very little research that has reported the application of these methods in streamflow forecasting in China. Moreover, not many researchers have given attention to the selection of streamflow length for a training period. Therefore, the main objectives of this paper are (1) to use three entropy spectral methods for monthly streamflow forecasting in Northwest China, (2) to select the appropriate training period for the models, and (3) to compare the three models and select the best model for streamflow forecasting in Northwest China.
Suppose there is a streamflow series, y(t), which is converted to the frequency domain with frequency f. If f is considered a random variable, the normalized spectral density function can be treated as a probability density function. Burg entropy can then be expressed as:
where W = 1/(2Δt) is the Nyquist fold-over frequency and Δt is the sampling period.
The definition of configuration entropy is similar to Burg entropy and is defined as:
With the given prior spectral density function q(f), the relative entropy can be defined as:
The prior spectral density acts like background noise against which the peaks of the observed periodicity stand out. When the prior spectral density is uniform, the relative entropy reduces to the configuration entropy up to sign and an additive constant.
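For reference, the three entropies described above are commonly written in the following form. This is a sketch based on standard definitions, not a reproduction of the paper's own displays: the symbol p(f) for the normalized spectral density and the positive sign chosen for Burg entropy are assumptions, and sign conventions differ between authors.

```latex
\begin{align}
H_B(f) &= \int_{-W}^{W} \ln p(f)\, df, \\
H_C(f) &= -\int_{-W}^{W} p(f) \ln p(f)\, df, \\
H_R(f) &= \int_{-W}^{W} p(f) \ln \frac{p(f)}{q(f)}\, df .
\end{align}
```

With a uniform prior q(f) = 1/(2W) and p(f) integrating to one, these forms give H_R = -H_C + ln(2W), so minimizing the relative entropy is then equivalent to maximizing the configuration entropy.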
The processes shown in Figure 1 mainly include (1) calculating parameters; (2) determining spectral density function; (3) extending autocorrelation function; and (4) forecasting streamflow and comparing the three methods for the selection of the most appropriate method.
2.1. Deriving Spectral Density Function
In order to obtain the least biased spectral density under the given constraints, the Burg and configuration entropies are maximized while the relative entropy is minimized before spectral density estimation. According to the relationship between the spectral density function and the autocorrelation function, the constraints can be given as:
where the constrained quantity is the autocorrelation function at the n-th lag, and N usually equals 1/4 to 1/2 of the length of the streamflow series.
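The autocorrelation values that serve as constraints can be estimated directly from the data. The sketch below is one plausible way to do this in plain Python; the function name `autocorrelation`, the biased estimator, and the synthetic series are illustrative assumptions rather than the authors' procedure.

```python
import math
from statistics import fmean


def autocorrelation(series, max_lag):
    """Return the sample autocorrelation rho(0), ..., rho(max_lag).

    Uses the biased estimator (every lag is divided by the full length n).
    """
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(series)")
    mean = fmean(series)
    centered = [x - mean for x in series]
    c0 = sum(d * d for d in centered) / n
    if c0 == 0:
        raise ValueError("constant series: autocorrelation is undefined")
    rho = []
    for lag in range(max_lag + 1):
        c_lag = sum(centered[t] * centered[t + lag] for t in range(n - lag)) / n
        rho.append(c_lag / c0)
    return rho


# Hypothetical standardized monthly series: ten years with a pure annual cycle.
flows = [math.sin(2 * math.pi * t / 12) for t in range(120)]
max_lag = len(flows) // 4  # lower end of the 1/4 to 1/2 range quoted above
rho = autocorrelation(flows, max_lag)
```

For this synthetic series the autocorrelation is strongly positive at lag 12 and strongly negative at lag 6, which is the signature of a 12-month cycle.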
Subject to the constraints, entropy can be maximized or minimized using the Lagrangian function, which can be formulated as:
where λn is the Lagrangian multiplier and H(f) is the entropy function. The partial derivative of L with respect to the spectral density is taken and set to zero. The least biased spectral densities obtained by maximizing Burg entropy, maximizing configuration entropy, and minimizing relative entropy, respectively, are expressed as follows:
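As a worked sketch of this step, assume the constraints take the usual Wiener–Khinchin form and write p(f) for the spectral density, ρ(n) for the lag-n autocorrelation and q(f) for the prior. These symbols, the entropy definitions H_B = ∫ ln p df, H_C = -∫ p ln p df, H_R = ∫ p ln(p/q) df, and the sign attached to the multipliers are all assumptions made for illustration. Setting the derivative of the Lagrangian to zero then gives spectral densities of the following general form:

```latex
\begin{align}
\rho(n) &= \int_{-W}^{W} p(f)\, e^{i 2\pi f n \Delta t}\, df, \qquad -N \le n \le N, \\
L &= H(f) - \sum_{n=-N}^{N} \lambda_n \left[ \int_{-W}^{W} p(f)\, e^{i 2\pi f n \Delta t}\, df - \rho(n) \right], \\
p_B(f) &= \frac{1}{\sum_{n=-N}^{N} \lambda_n e^{i 2\pi f n \Delta t}}, \\
p_C(f) &= \exp\!\left( -1 - \sum_{n=-N}^{N} \lambda_n e^{i 2\pi f n \Delta t} \right), \\
p_R(f) &= q(f) \exp\!\left( -1 - \sum_{n=-N}^{N} \lambda_n e^{i 2\pi f n \Delta t} \right).
\end{align}
```

For the relative entropy, which is minimized rather than maximized, the multipliers enter with the opposite sign in the Lagrangian; the form of p_R above absorbs that sign into λn. With a constant prior, p_R has the same exponential form as p_C, because the constant can be absorbed into λ0.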
2.2. Calculating Lagrangian Multipliers
The methods of calculating the Lagrangian multipliers differ because the spectral densities take different forms. For Burg entropy, the Levinson–Burg algorithms [6,17] are applied to determine the Lagrangian multipliers. For configuration entropy and relative entropy, cepstrum algorithms are applied instead. By taking the inverse Fourier transform of the log magnitude of Equation (8), we can obtain:
Let eq(n) and ep(n) denote the prior and posterior cepstra of the autocorrelations, which are transformed from the prior and posterior spectral densities, respectively:
Then Equation (9) can be abbreviated as:
where δn is a delta function.
For relative entropy, the Lagrangian multipliers can be solved from the N linear equations given by Equation (12):
For configuration entropy, eq(n) = 0, and the Lagrangian multipliers can then be calculated by:
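The cepstrum step can be illustrated under the assumption that the relative-entropy spectral density has the exponential form p(f) = q(f) exp(-1 - Σ λn e^{i2πfnΔt}). This is a standard form assumed here for illustration, not quoted from Equation (8), and the multipliers are taken to be symmetric in n so that the sign convention of the Fourier transform does not matter. Taking logarithms and then the inverse Fourier transform term by term gives:

```latex
\begin{align}
\ln p(f) &= \ln q(f) - 1 - \sum_{n=-N}^{N} \lambda_n e^{i 2\pi f n \Delta t}, \\
e_p(n) &= e_q(n) - \delta_n - \lambda_n, \\
\lambda_n &= e_q(n) - e_p(n) - \delta_n .
\end{align}
```

Under the same assumptions, setting e_q(n) = 0 for the configuration entropy leaves λn = -e_p(n) - δn, so each multiplier follows directly from the posterior cepstrum.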
2.3. Forecasting Streamflow
BESA extends the autocorrelation as a linear combination of previous autocorrelation values weighted by prediction coefficients, which can be expressed as:
where aj is the prediction coefficient obtained using the reflection recursion method proposed by Burg.
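The extension of the autocorrelation by prediction coefficients can be sketched as follows. Note the substitution: this example obtains the coefficients with the Levinson–Durbin recursion applied to the autocorrelations, not with Burg's data-based reflection recursion cited in the text, and the convention ρ(k) = Σ aj ρ(k - j) as well as the function names are assumptions made for illustration.

```python
def levinson_durbin(rho, order):
    """Solve the Yule-Walker equations for AR coefficients a_1..a_order.

    rho is an autocorrelation sequence with rho[0] == 1.
    Convention: rho(k) = sum_{j=1..order} a_j * rho(k - j) for k >= 1.
    """
    if not 0 < order < len(rho):
        raise ValueError("order must satisfy 0 < order < len(rho)")
    a = []          # a[j] holds a_{j+1} of the current model order
    err = rho[0]    # prediction error variance of the current order
    for m in range(1, order + 1):
        acc = rho[m] - sum(a[j] * rho[m - 1 - j] for j in range(m - 1))
        k = acc / err  # reflection coefficient of order m
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a


def extend_autocorrelation(rho, a, steps):
    """Append `steps` extrapolated lags using rho(n) = sum_j a_j * rho(n - j)."""
    if len(a) > len(rho):
        raise ValueError("need at least len(a) known autocorrelation values")
    ext = list(rho)
    for _ in range(steps):
        ext.append(sum(a[j] * ext[-1 - j] for j in range(len(a))))
    return ext


# Hypothetical AR(1)-like autocorrelation: rho(k) = 0.5 ** k.
rho_known = [0.5 ** k for k in range(6)]
coeffs = levinson_durbin(rho_known, order=2)
rho_extended = extend_autocorrelation(rho_known, coeffs, steps=3)
```

For this input the recursion recovers a first coefficient of 0.5 and a second of zero, and the extended lags continue the geometric decay of the known ones.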
For the configurational and relative entropies, the autocorrelations are extended as:
and
According to the extended autocorrelation functions, the forecasting equations of the three entropy spectral methods are obtained as follows:
where Cp(T + k) is the cepstrum of the streamflow series, and it always equals . m is the order of the model, which is determined by the Bayesian information criterion (BIC):
where N is the length of the streamflow series and the variance term is the variance of the residuals between observed and forecasted streamflow.
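Order selection by BIC can be sketched as below. The criterion is written in one common form, BIC(m) = N ln(residual variance) + m ln(N), which is an assumption here and may differ from the expression used in the paper. The residual variances in the example are made-up numbers, not results from this study.

```python
import math


def bic(residual_variance, n_obs, order):
    """One common form of the Bayesian information criterion for order `order`."""
    if residual_variance <= 0 or n_obs <= 0:
        raise ValueError("residual_variance and n_obs must be positive")
    return n_obs * math.log(residual_variance) + order * math.log(n_obs)


def select_order(residual_variances, n_obs):
    """Pick the model order with the smallest BIC.

    residual_variances maps each candidate order to the variance of the
    residuals between observed and forecasted streamflow for that order.
    """
    return min(
        residual_variances,
        key=lambda m: bic(residual_variances[m], n_obs, m),
    )


# Hypothetical residual variances for candidate orders 1 to 4,
# with N = 312 monthly values (26 years).
candidate_variances = {1: 0.50, 2: 0.30, 3: 0.298, 4: 0.297}
best_order = select_order(candidate_variances, n_obs=312)
```

In this made-up case the small variance reductions beyond order 2 do not offset the penalty term, so order 2 is selected.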
2.4. Evaluating the Precision of Forecasting Results
In this paper, we selected average relative error (RE), root mean square error (RMSE), correlation coefficient (R) and determination coefficient (DC) as evaluation indicators for the forecasted results. The RE, RMSE, R and DC are expressed as:
where x(t) is the observed streamflow and f(t) is the forecasted streamflow, their average values are taken over the forecasting period, and n is the length of the forecasting period (months). According to the “forecasting norm for hydrology intelligence”, the determination coefficient (DC) is classified into three levels as shown in Table 1.
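The four indicators can be computed as in the sketch below. The exact formulas are assumptions based on their usual definitions: RE as the mean of |f - x| / x, R as the Pearson correlation coefficient, and DC in the Nash–Sutcliffe form 1 - Σ(x - f)² / Σ(x - mean of x)². The sample values are hypothetical.

```python
import math
from statistics import fmean


def evaluate(observed, forecast):
    """Return RE, RMSE, R and DC for paired observed/forecast values."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    x_bar = fmean(observed)
    f_bar = fmean(forecast)
    pairs = list(zip(observed, forecast))
    # Average relative error (as a fraction; observed values must be non-zero).
    re = sum(abs(f - x) / x for x, f in pairs) / n
    sse = sum((x - f) ** 2 for x, f in pairs)
    rmse = math.sqrt(sse / n)
    ss_x = sum((x - x_bar) ** 2 for x in observed)
    ss_f = sum((f - f_bar) ** 2 for f in forecast)
    cov = sum((x - x_bar) * (f - f_bar) for x, f in pairs)
    r = cov / math.sqrt(ss_x * ss_f)
    dc = 1.0 - sse / ss_x
    return {"RE": re, "RMSE": rmse, "R": r, "DC": dc}


# Hypothetical observed and forecasted monthly flows (m3/s).
observed = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 39.0]
scores = evaluate(observed, forecast)
```

Because RE weights every month by its own flow while RMSE and DC weight squared absolute errors, the two groups of indicators can rank models differently when errors are concentrated in high-flow or low-flow months.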
3.1. Data Preprocessing
Observed streamflow data from five hydrological stations, Yingluoxia, Zamusi, Jiutiaoling, Xiangtang and Tangnaihai, in Northwest China were selected to verify these three spectral entropy methods. These five hydrological stations are located in the Yellow River, Heihe River and Shiyang River basins. Tangnaihai station is located on the mainstream of the Yellow River, while Xiangtang is located on a tributary of the Yellow River. Zamusi and Jiutiaoling stations are situated on the Shiyang River. Yingluoxia station is located on the Heihe River and marks the boundary between the upstream and middle reaches. Basic information on the five hydrological stations is shown in Figure 2 and Table 2.
The entropy spectral analysis model belongs to the autocorrelation methods, and the input data should be a standardized stationary random sequence. To meet this requirement, the streamflow sequences were transformed using the Box–Cox method. The Box–Cox transformation can reduce data skewness and bring the data closer to a normal distribution. In addition, a standardized transformation was also performed on the sequences.
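The preprocessing described above can be sketched in plain Python as follows. The Box–Cox parameter `lam` is assumed to be given; in practice it is usually estimated by maximum likelihood, for example with `scipy.stats.boxcox`, which returns a fitted value when `lmbda` is left as `None`. Standardizing the whole series at once is also an assumption, since the paper does not state whether standardization was done globally or month by month.

```python
import math
from statistics import fmean, pstdev


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("constant series cannot be standardized")
    return [(v - mu) / sigma for v in values]


# Hypothetical positively skewed monthly flows (m3/s).
raw_flows = [8.0, 9.5, 12.0, 20.0, 45.0, 110.0, 160.0, 140.0, 70.0, 30.0, 15.0, 10.0]
transformed = standardize(box_cox(raw_flows, lam=0.0))  # lam = 0 is the log transform
```

The transformed series has zero mean and unit variance, which is the form of input the entropy spectral models require.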
To test whether the transformed sequences were stationary, we tested each sequence for a unit root. If a unit root exists in the sequence, it is not a stationary random sequence, and vice versa. The adftest function in the Econometrics Toolbox of MATLAB 2010b (2010b, MathWorks, Beijing, China, https://ww2.mathworks.cn/products/matlab-online.html) was used to test whether a unit root exists. The null hypothesis of adftest is that a unit root exists in the sequence. If the null hypothesis is rejected, the logical value of H is 1, and the sequence is regarded as stationary at the reported confidence level. If the null hypothesis cannot be rejected, the logical value of H is 0. The test results for all transformed streamflow sequences at the five hydrological stations show that all the sequences are stationary and homogeneous (Table 3).
3.2. Determining Training Period
In previous research, the training periods were always less than 100 months, and very few papers discussed the influence of the training period on the forecasted results. In this paper, we selected the observed streamflow data from 2008 to 2012 as the validation period. Additionally, observed data from 3 to 40 months were selected as a training period to evaluate the influence of the training period on forecasted results. To determine the training period of the models, the relationship between the training period and the optimal order of the models is shown in Figure 3. As seen in Figure 3, when the training period is short, the optimal order of the models is low, and the optimal order tends to stabilize as the training period increases.
Beyond that, the relationship between the training period and the DC of the validation period was explored (Figure 4). The forecasting performance is weak and unstable when the training period is less than 15 years. As the training period increases, the DC increases and stabilizes. In order to make use of the expert opinion to increase the forecasting precision, the calibration period was determined as 26 years.
3.3. Estimating Spectral Density
Spectral analysis is a powerful method for checking periodicity by locating the frequencies of spectral peaks. The spectral densities estimated by the three entropy spectral methods were compared with the spectral density estimated by fast Fourier transform (FFT) (Figure 5). Five representative rivers were chosen to show the ability of BESA, CESA, and RESA to estimate the spectral densities. For RESA, a prior spectral density function was hypothesized from data information. The determination process of prior spectral density functions is described in Appendix A. It can be seen from Figure 5 that all of the stations displayed a peak at frequency 1/12, which corresponds to a 12-month period. In addition, there were other peaks near frequencies 1/4 and 1/6 in the spectral density at Zamusi, Jiutiaoling and Tangnaihai stations.
For single-peak streamflow series, BESA, CESA, and RESA can check the periodicity as well as FFT. However, for multi-peak streamflow series, BESA did not perform as effectively in detecting the principal periodicity. In contrast, CESA and RESA correctly identified the most significant peak at frequency 1/12. However, CESA always neglects all secondary spectral peaks to keep the peak at 1/12 most significant. RESA also detected the less significant peaks, and was consistent with the FFT results.
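The way a spectral estimate reveals the annual cycle can be illustrated with a plain periodogram. The sketch below uses a direct discrete Fourier transform in pure Python as a stand-in for the FFT, and a synthetic series with an annual and a weaker semiannual component; neither the function names nor the data come from the paper.

```python
import cmath
import math


def periodogram(series):
    """Return (frequencies, power) for frequencies k/n, k = 1..n//2.

    Frequencies are in cycles per sample (cycles per month for monthly data).
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        freqs.append(k / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def dominant_frequency(series):
    """Frequency at which the periodogram is largest."""
    freqs, power = periodogram(series)
    best = max(range(len(power)), key=power.__getitem__)
    return freqs[best]


# Hypothetical ten-year monthly series: annual cycle plus a weaker 6-month cycle.
flows = [
    20 + 5 * math.sin(2 * math.pi * t / 12) + 2 * math.sin(2 * math.pi * t / 6)
    for t in range(120)
]
peak = dominant_frequency(flows)
```

Here the dominant frequency is 1/12 cycles per month, while the 6-month component appears as a secondary peak at 1/6, which is the kind of multi-peak structure discussed above.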
In order to examine whether this variation would affect the forecasting precision, we used these three methods to forecast streamflow in five hydrological stations for selecting the optimal model in northwest China.
3.4. Streamflow Forecasting Analysis
Streamflow was forecasted using the three spectral entropy methods for five hydrological stations (Figure 6 and Figure 7) with a validation period of five years. The results indicated that the forecasting accuracy was worst at Tangnaihai station, where the DC is less than 0.6 (Table 4) and belongs to level C, compared with the other four stations. The reason for this may be that the catchment area of Tangnaihai station is much larger than that of the other stations. Moreover, intensive anthropogenic activities might also have a severe impact on the streamflow at Tangnaihai station. Therefore, it is difficult to forecast streamflow accurately from the streamflow of previous months alone using autoregression-based models.
By comparing the forecasting accuracy of the three models for five hydrological stations during the validation period, we found that the rank of forecasting accuracy according to DC, RMSE and R was RESA > CESA > BESA for Yingluoxia station (Table 4). However, for Zamusi station on the Shiyang River, the accuracy of the CESA model was higher than that of the other models (Table 4, Figure 6). For the remaining three hydrological stations, the accuracy was similar for the three models, and the RESA model was more accurate than the CESA and BESA models according to DC, RMSE and R. However, the RE between the observed and forecasted streamflow using BESA was smaller than that of the other methods. The reason for this is that RE reflects the linear error between observed and forecasted values, while RMSE, R and DC reflect the quadratic error between observed and forecasted values. When the forecasting error of the flood season is smaller, RMSE, R and DC are better. However, when the forecasting error of the non-flood season is smaller, RE is better.
To verify this conjecture, each year was divided into a flood season from July to October and a low-flow season covering January to June, November, and December. We extracted the forecasted streamflow of the non-flood season and compared it with the observed streamflow at the five stations (Table 5). As shown, BESA performs better than the other methods. During the low-flow season, the advantage of BESA over the others was significant, and the forecasted streamflow was close to the observations. However, the overall forecasting accuracy of the RESA model and the CESA model was higher. At the same time, because streamflow forecasting serves the optimal allocation of water resources, the prediction of annual or flood-season runoff is more meaningful. As a whole, the RESA model is better suited to streamflow forecasting for the five hydrological stations in northwest China. Combining precipitation as a predictor, selecting one or more models with high accuracy in the flood season, and using the entropy spectrum model and its combination to forecast streamflow could be a future research direction.
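The seasonal split used for this comparison can be sketched as follows. The month boundaries follow the text (flood season July to October); the function name and the assumption that the series starts in January unless stated otherwise are illustrative.

```python
FLOOD_MONTHS = {7, 8, 9, 10}  # July to October, as defined in the text


def split_by_season(monthly_values, start_month=1):
    """Split a monthly series into (flood_season, low_flow_season) values.

    start_month is the calendar month (1-12) of the first value.
    """
    if not 1 <= start_month <= 12:
        raise ValueError("start_month must be between 1 and 12")
    flood, low_flow = [], []
    for i, value in enumerate(monthly_values):
        month = (start_month - 1 + i) % 12 + 1
        if month in FLOOD_MONTHS:
            flood.append(value)
        else:
            low_flow.append(value)
    return flood, low_flow


# Hypothetical two-year series whose values simply count the months 1..24.
series = list(range(1, 25))
flood_values, low_flow_values = split_by_season(series, start_month=1)
```

Applying the same split to both the observed and the forecasted series allows the evaluation indicators to be computed separately for each season.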
In this paper, the BESA, CESA, and RESA models were applied for spectral analysis and streamflow forecasting in northwest China using monthly streamflow data from five hydrological stations. The estimated spectral density and prediction accuracy of the three methods were compared based on the optimal length of the training period. The spectral density functions of BESA, CESA and RESA were smoother than that of FFT, and all of them can clearly estimate the 12-month primary period of the monthly streamflow sequence without deviation.
However, the spectral density function of BESA could not detect the other significant secondary periods, while that of CESA could detect the secondary periods for multi-period sequences despite a certain degree of leakage. By comparing these three entropy spectral methods, we discovered that all of these methods could forecast streamflow accurately. Among them, the RESA model has the highest prediction accuracy, followed by the CESA model.
Due to the lack of data, this paper only applied the entropy spectral theory to the monthly streamflow forecasting of a few rivers in northwest China. In future research, the three entropy spectral analysis methods can be applied to other rivers to verify their applicability to monthly streamflow forecasting in China.
G.Z. designed the study and wrote the manuscript. Z.Z. performed the experiments. X.S. and O.O.A. reviewed and edited the manuscript. All the authors have read and approved the final manuscript.
We are grateful for the grant support from the National Natural Science Fund in China (Project No. 51879222) and The National Key Research and Development Program of China (Project No. 2016YFC0401306).
We wish to thank the editor and anonymous reviewers for their valuable comments and constructive suggestions, which were used to improve the quality of this manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Determination of the Prior Spectral Density Function for RESA Method
In this paper, the spectral density of the monthly streamflow sequence at each hydrological station was estimated to determine the major periods and their intensity. The prior spectral density for RESA was determined based on the estimated spectral density of CESA. Through the spectrum analysis of BESA and CESA, it can be found that all sequences have a significant period of 12 months. Minor periods of 6 months and 4 months are present at most hydrological stations. Based on the spectral estimation values of each hydrological station, the prior spectral density functions mainly consist of six types. The specific settings are shown in Table A1.
Table A1. Hypothesis on the prior spectral density.
|Number||Period||Spectral Density Function|
|Assumption 2||12 months||,|
|Assumption 3||12 months, 6 months||, ,|
|Assumption 4||12 months, 4 months||, ,|
|Assumption 5||12 months, 4 months, 6 months||, ,|
|Assumption 6||12 months, 4 months, 6 months||, , ,|
To find the optimal prior spectral density function for RESA, the Itakura–Saito (I–S) distance between the CESA spectral density and each candidate prior spectral density for RESA is calculated. The formula for the I–S distance between two discrete sequences is as follows:
where the two arguments are the spectral functions of two different sequences and DI–S represents the I–S distance between the two spectral functions.
The smaller the I–S distance is, the smaller the difference between the two spectral density functions is. For each hydrological station, the hypothesis with the minimum I–S distance is the closest to the spectral density estimated by CESA and should be taken as the prior spectral density function. The distances between the estimated CESA spectral function of each hydrological station and each assumed prior spectral density are shown in Table A2.
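The selection of a prior by minimum I–S distance can be sketched as below. The discrete form used, the mean of P/Q - ln(P/Q) - 1 over the frequency grid, is the usual definition and is assumed here; which spectrum goes in the numerator is also an assumption, since the distance is not symmetric. The spectra in the example are made-up values.

```python
import math


def itakura_saito(p, q):
    """Discrete Itakura-Saito distance between two positive spectra."""
    if len(p) != len(q) or not p:
        raise ValueError("spectra must be non-empty and of equal length")
    if any(v <= 0 for v in p) or any(v <= 0 for v in q):
        raise ValueError("spectral values must be strictly positive")
    return sum(a / b - math.log(a / b) - 1.0 for a, b in zip(p, q)) / len(p)


def closest_prior(estimated, candidates):
    """Return the name of the candidate spectrum nearest to `estimated`."""
    return min(candidates, key=lambda name: itakura_saito(estimated, candidates[name]))


# Hypothetical spectra on a four-point frequency grid.
cesa_spectrum = [1.0, 4.0, 1.0, 2.0]
prior_candidates = {
    "single_peak": [1.0, 4.0, 1.0, 1.0],
    "flat": [2.0, 2.0, 2.0, 2.0],
}
chosen = closest_prior(cesa_spectrum, prior_candidates)
```

The distance is zero only when the two spectra coincide, so the candidate that reproduces the dominant peak is preferred over the flat one in this example.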
Table A2. Itakura–Saito distance between CESA spectral density and each hypothesis spectral density for RESA.
|Hydrologic Station||Assumption 1||Assumption 2||Assumption 3||Assumption 4||Assumption 5||Assumption 6|
Note: Boldface represents the optimal spectral density functions for RESA in five hydrologic stations.
- Zhou, Z.; Ju, J.; Su, X.; Singh, V.; Zhang, G. Comparison of Two Entropy Spectral Analysis Methods for Streamflow Forecasting in Northwest China. Entropy 2017, 19, 597.
- Cui, H.; Singh, V.P. Configurational entropy theory for streamflow forecasting. J. Hydrol. 2015, 521, 1–17.
- Cui, H.; Singh, V.P. Maximum entropy spectral analysis for streamflow forecasting. Phys. A 2016, 442, 91–99.
- Cui, H.; Singh, V.P. Minimum relative entropy theory for streamflow forecasting with frequency as a random variable. Stoch. Environ. Res. Risk Assess. 2016, 30, 1545–1563.
- Cui, H.; Singh, V.P. Application of minimum relative entropy theory for streamflow forecasting. Stoch. Environ. Res. Risk Assess. 2017, 31, 587–608.
- Burg, J.P. Maximum Entropy Spectral Analysis. Ph.D. Thesis, Stanford University, Stanford, CA, USA, 1975.
- Huo, C.; Chen, N.; Liu, H. Application of time series free regression model in dynamic simulation and prediction of groundwater in irrigation area. Geotech. Investig. Surv. 1990, 1, 36–38.
- Krstanovic, P.F.; Singh, V.P. A real-time flood forecasting model based on maximum-entropy spectral analysis: I. Development. Water Resour. Manag. 1993, 7, 109–129.
- Frieden, B.R. Restoring with maximum likelihood and maximum Entropy. J. Opt. Soc. Am. 1972, 62, 511–518.
- Gull, S.F.; Daniell, G.J. Image reconstruction from incomplete and noisy data. Nature 1978, 272, 686–690.
- Nadeu, C. Finite length cepstrum modelling—A simple spectrum estimation technique. Signal Proc. 1992, 26, 49–59.
- Shore, J.E. Minimum Cross-Entropy Spectral Analysis; Naval Research Laboratory: Washington, DC, USA, 1979.
- Shore, J.E. Minimum cross-entropy spectral-analysis. IEEE Trans. Acoust. Speech 1981, 29, 230–237.
- Tzannes, M.A.; Politis, D.; Tzannes, N.S. A general method of minimum cross-entropy spectral estimation. IEEE Trans. Acoust. Speech 1985, 33, 748–752.
- Woodbury, A.; Ulrych, T.J. Minimum relative entropy and probabilistic inversion in groundwater hydrology. Stoch. Hydrol. Hydraul. 1998, 12, 317–358.
- Singh, V.P.; Cui, H. Entropy Theory for Streamflow Forecasting. Environ. Process. 2015, 2, 449–460.
- Box, G.E.; Cox, D.R. An analysis of transformations. J. R. Stat. Soc. B 1964, 26, 211–252.
- Burg, J.P. Maximum entropy spectral analysis. In Proceedings of the 37th Meeting of Society Exploration Geophysics, Oklahoma City, OK, USA, 31 October 1967; pp. 34–41.
- Jain, D.; Singh, V.P. A comparison of transformation methods for flood frequency analysis. J. Am. Water Resour. 1986, 22, 903–910.
Figure 1. The flow chart of streamflow forecasting using entropy spectral method. RE: average relative error; RMSE: root mean square error; R: correlation coefficient; DC: determination coefficient.
Figure 2. Location of hydrologic stations in Northwest China.
Figure 3. The model order corresponding to the different calibration periods. (a) Yingluoxia station; (b) Jiutiaoling station; (c) Zamusi station; (d) Xiangtang station; (e) Tangnaihai station.
Figure 4. Evaluation index (DC) corresponding to different lengths of calibration period. (a) Yingluoxia station; (b) Jiutiaoling station; (c) Zamusi station; (d) Xiangtang station; (e) Tangnaihai station.
Figure 5. Spectral density estimated by BESA, CESA, RESA and fast Fourier transform (FFT) method for five hydrological stations in Northwest China. (a) Yingluoxia station; (b) Jiutiaoling station; (c) Zamusi station; (d) Xiangtang station; (e) Tangnaihai station.
Figure 6. Streamflow forecasting using entropy spectral methods for five hydrological stations. (a) Yingluoxia station; (b) Jiutiaoling station; (c) Zamusi station; (d) Xiangtang station; (e) Tangnaihai station.
Figure 7. Comparison between observed and forecasted streamflow. (a) Yingluoxia station; (b) Jiutiaoling station; (c) Zamusi station; (d) Xiangtang station; (e) Tangnaihai station.
Table 1. Model forecasting accuracy rating.
Table 2. Basic information of streamflow data for selected hydrologic stations.
|Hydrologic Station||Longitude||Latitude||River||Catchment Area (km²)||Control Area (km²)||Annual Runoff (m³/s)|
|Yingluoxia||100°11′ E||38°48′ N||Hei River||130,000||10,009||51|
|Zamusi||102°34′ E||37°42′ N||Zamu River||851||851||8|
|Jiutiaoling||102°03′ E||37°52′ N||Xiying River||1120||1077||10|
|Xiangtang||102°51′ E||36°22′ N||Datong River||15,133||15,126||88|
|Tangnaihai||100°09′ E||35°30′ N||Yellow River||752,443||121,972||633|
Table 3. Adftest test results of monthly streamflow in each hydrologic station.
|Confidence coefficient (%)||99.9||99.9||99.9||99.9||99.9|
Table 4. Performance metrics of the three models at each selected hydrological station.
Table 5. Performance metrics of the three models in the non-flood season at each selected hydrological station.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).