Article

Gaussian Process Regression and Cooperation Search Algorithm for Forecasting Nonstationary Runoff Time Series

1 Pearl River Water Resources Research Institute, Guangzhou 510610, China
2 Key Laboratory of the Pearl River Estuary Regulation and Protection of Ministry of Water Resources, Guangzhou 510610, China
3 College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China
4 Key Laboratory of Water Security Guarantee in Guangdong-Hong Kong-Macao Greater Bay Area of Ministry of Water Resources, Guangzhou 510610, China
* Author to whom correspondence should be addressed.
Water 2023, 15(11), 2111; https://doi.org/10.3390/w15112111
Submission received: 4 May 2023 / Revised: 27 May 2023 / Accepted: 29 May 2023 / Published: 2 June 2023

Abstract
In the hydrology field, hydrological forecasting is regarded as one of the most challenging engineering tasks, as runoff has significant spatial–temporal variability under the influences of multiple physical factors from both climate events and human activities. As a well-known artificial intelligence tool, Gaussian process regression (GPR) possesses satisfactory generalization performance but often suffers from local convergence and sensitivity to initial conditions in practice. To enhance its performance, this paper investigates the effectiveness of a hybrid GPR and cooperation search algorithm (CSA) model for forecasting nonstationary hydrological data series. The CSA approach avoids the premature convergence defect in GPR by effectively determining suitable parameter combinations in the problem space. Several traditional machine learning models are established to evaluate the validity of the proposed GPR-CSA method at three real-world hydrological stations in China. In the modeling process, statistical characteristics and expert knowledge are used to select input variables from the runoff observed in previous periods. The experimental results show that the developed GPR-CSA model can accurately predict nonlinear runoff and outperforms the traditional models in terms of various statistical indicators. Hence, a CSA-trained GPR model can provide satisfactory training efficiency and robust simulation performance for runoff forecasting.

1. Introduction

Precise and timely runoff prediction is crucial for reducing flood damage, managing water resources and optimizing reservoir scheduling [1,2,3]. Moreover, as multiple large reservoirs on major rivers come into operation in succession, the practical requirements for accurate runoff forecasting increase significantly [4,5,6]. Researchers and engineers are devoted to establishing more comprehensive forecasting models that facilitate the scientific management of limited water resources under a changing environment [7,8,9]. However, the natural runoff process exhibits strong nonlinearity and nonstationarity owing to the combined influences of multiple factors, such as meteorological events, natural geography and watershed features [10,11,12,13]. Thus, accurate runoff forecasting remains an important but challenging research topic for hydrology experts and scientists.
In the last few years, various runoff forecasting methods have been successfully established, which can be broadly categorized into two groups [14,15,16]: process-based approaches and data-driven approaches. Process-based approaches often involve intricate models, thorough knowledge of the physical mechanisms underlying runoff processes, sufficient hydrometeorological data, and expert judgment. These strict requirements pose application limitations that may lead to poor prediction performance and uncertainty. To address this problem, data-driven approaches have become the primary means of producing reliable forecasting information for reservoir operation and hydropower energy management. Statistical techniques have been widely employed to forecast nonstationary hydrological data series [16,17,18]. Although such models identify the linear relationship between predictors and predicted values well, they cannot provide satisfactory prediction results because the highly nonlinear characteristics inherent in runoff series are not given full consideration. With the advancement of intelligent computing, numerous artificial intelligence techniques have been widely used in runoff prediction [19,20]. Compared with conventional regression approaches, machine learning methods have demonstrated significant improvements in prediction accuracy.
As a classical machine learning approach, Gaussian process regression (GPR) relies on Bayesian theory and statistical learning theory [21,22,23]. By replacing the basis function used in Bayesian linear identification, GPR can address complicated regression problems with small sample sizes and high dimensionality [24,25,26]. Compared to traditional forecasting models, GPR has the advantages of easy implementation, adaptive acquisition of hyperparameters, and probabilistic outputs. GPR has gained significant attention in regression problems such as runoff, wind power, and solar power forecasting. Generally, the conjugate gradient method is used to obtain GPR hyperparameters. However, it suffers from a high dependence on the initial values and difficulty in determining the number of iterations. In other words, the GPR model may suffer from local convergence and strong parameter dependence, which obviously limit its practicality and interpretability in runoff prediction. To bridge this research gap, it is essential to find more practical methods to improve GPR's performance for forecasting nonlinear runoff series [27,28,29].
Recently, a novel meta-heuristic cooperation search algorithm (CSA) was developed to resolve intricate engineering optimization problems [30]. In the foundational concept of the CSA method, each solution can be viewed as a staff member in a teamwork setting and multiple solutions form the swarm for evolutionary computing. The swarm converges gradually towards the promising search regions around the optimal solution with three operators, including the team communication operator for improving global search ability, the reflective learning operator for facilitating local search ability, and the internal competition operator for ensuring the survival of elite solutions. The CSA approach has been used to resolve numerical optimization problems. Despite its potential, there is little research on applying the CSA to promote the GPR performance in runoff forecasting. Thus, the paper proposes a hybrid GPR-CSA model that leverages the CSA algorithm to enhance the generalization ability of GPR for runoff forecasting. The experiments show that compared with conventional models, the GPR-CSA approach offers better prediction accuracy for different applications. In summary, this paper contributes multiple effective models for forecasting the real-world runoff data series; moreover, a novel GPR-CSA method with better forecasting accuracy can be applied for nonlinear regression tasks, including runoff prediction, in various scenarios. The application of the GPR-CSA to three hydrological stations in China demonstrates that the GPR-CSA is able to fully identify the high-dimensional relationship between predictors and predicted values, providing an effective artificial intelligence model for addressing hydrologic forecasting problems.
The layout structure of this article is given as follows: Section 2 outlines the specifics of our methodology for runoff forecasting; Section 3 gives four evaluation criteria; Section 4 examines the applicability of the proposed method; and Section 5 gives the conclusion.

2. Methods

2.1. Gaussian Process Regression (GPR)

Suppose that y and t are the dependent and independent variables, both belonging to the real number set R. The regression problem that further incorporates noisy information can be expressed as:
$$ y = f(t) + \varepsilon \quad (1) $$
Here, $\varepsilon \sim N(0, \sigma^2)$ is the measurement error and $f$ is an unknown function. Under the Gaussian process assumption, $f$ is a stochastic function with a mean function $u(\cdot)$ and a covariance function $k(\cdot, \cdot)$. The relationship between different instances can be formulated as below:
$$ k(t, t'; \theta) = \mathrm{Cov}\big(f(t), f(t')\big) \quad (2) $$
where $\theta$ stands for the hyperparameters requiring estimation.
For an observation set $D = \{(t_1, y_1), (t_2, y_2), \ldots, (t_n, y_n)\}$, the model can be expressed as the following equation:
$$ y_i = f(t_i) + \varepsilon_i \quad (3) $$
where $\varepsilon_i\ (i = 1, 2, \ldots, n)$ are the random disturbances following a normal distribution with mean 0 and variance $\sigma^2$. Thus, the joint distribution of $y_1, y_2, \ldots, y_n$ obeys a multivariate normal distribution, which can be expressed as follows:
$$ \mathbf{y} = (y_1, y_2, \ldots, y_n)^T \sim N_n(\boldsymbol{\mu}, \Psi) \quad (4) $$
where $\mu_i = u(t_i)$ are the entries of the mean vector $\boldsymbol{\mu}$, and $\Psi$ denotes an $n \times n$ matrix whose $(i, j)$th element is specified by
$$ \Psi_{ij} = \mathrm{Cov}(y_i, y_j) = k(t_i, t_j; \theta) + \sigma^2 \delta_{ij} \quad (5) $$
where $\delta_{ij}$ denotes the Kronecker delta.
Let $t^*$ be a testing point and $y^*$ its possible response value. Given the training set $D$, the conditional distribution of $y^*$ is a normal distribution whose mean and variance are computed as follows:
$$ E[y^* \mid D] = u(t^*) + \psi^T(t^*)\, \Psi^{-1} (\mathbf{y} - \boldsymbol{\mu}), \qquad \mathrm{Var}[y^* \mid D] = k(t^*, t^*; \theta) + \sigma^2 - \psi^T(t^*)\, \Psi^{-1} \psi(t^*) \quad (6) $$
where $\psi(t^*) = \big(k(t^*, t_1; \theta), \ldots, k(t^*, t_n; \theta)\big)^T$ is the covariance between $f(t^*)$ and $\mathbf{f} = (f(t_1), \ldots, f(t_n))^T$, and $\Psi$ is the covariance matrix of $(y_1, y_2, \ldots, y_n)^T$. In GPR, the parameters are the hyperparameters $\theta$ in the covariance function, the noise variance $\sigma^2$, and any other coefficients (denoted collectively by $\beta$) in the mean function $u$. These parameters are obtained by maximizing the marginal log-likelihood function [31,32,33]:
$$ l(\theta, \sigma^2, \beta \mid D) = -\frac{1}{2} \log\big(\det(\Psi)\big) - \frac{1}{2} (\mathbf{y} - \boldsymbol{\mu})^T \Psi^{-1} (\mathbf{y} - \boldsymbol{\mu}) - \frac{n}{2} \log(2\pi) \quad (7) $$
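As a concrete illustration of the GPR workflow above (posterior mean, posterior variance, and hyperparameter selection by maximizing the marginal log-likelihood), the following sketch uses scikit-learn, which the case study later adopts. The synthetic data and the RBF-plus-white-noise kernel are illustrative assumptions, not the paper's configuration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic 1-D regression data: y = f(t) + eps, eps ~ N(0, sigma^2)
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 10, 60)).reshape(-1, 1)
y = np.sin(t).ravel() + rng.normal(0, 0.1, 60)

# RBF kernel k(t, t'; theta) plus a white-noise term playing the role of sigma^2
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t, y)

# Posterior mean E[y*|D] and standard deviation derived from Var[y*|D]
t_star = np.linspace(0, 10, 5).reshape(-1, 1)
mean, std = gpr.predict(t_star, return_std=True)

# Hyperparameters were chosen by maximizing the marginal log-likelihood, Eq. (7)
print(gpr.log_marginal_likelihood_value_)
```

Internally, `fit` performs exactly the maximization in Equation (7) over the kernel's log-hyperparameters, which is also the step the paper replaces with the CSA optimizer.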

2.2. Cooperation Search Algorithm (CSA)

The cooperation search algorithm (CSA) is a novel and effective meta-heuristic tool for tackling complex optimization problems [34,35,36]. In the CSA optimizer, the target problem is regarded as a growing company. Each solution is perceived as an employee, while a group of solutions forms a team. The supervisor committee consists of the personal best-known staff members, while the executive committee is composed of an elite staff member set that contains M global best-known solutions. The chairman-in-office is picked from the executive committee in a random manner. The CSA methodology starts with the random initialization of solutions, and then all solutions are dynamically updated to gradually discover high-quality solutions through three operators. Specifically, the team communication operator determines the solution’s probability of being influenced by leader solutions, whereas the reflective learning operator determines whether to learn from its own best-known position or that of its supervisor. The internal competition operator selects solutions with better fitness values to compete for leadership positions, promoting distribution diversity and global search ability of the swarm. By the above procedures, the CSA method has been widely applied to resolve the complicated optimization problems in different engineering fields.
Figure 1 shows the schematic diagram of the CSA method. Then, the technical details that are crucial for solving the multivariable optimization problems are as below.
(1) Team building phase. The initial positions of all staff members in the team are determined by Equation (8). Based on the fitness values, M elite solutions form the external leader set.
$$ x_{i,j}^{k} = \phi(\underline{x}_j, \bar{x}_j), \quad i \in [1, I],\ j \in [1, J],\ k = 1 \quad (8) $$
where $I$ is the number of solutions and $J$ is the number of decision variables. The jth value of the ith solution at the kth iteration is represented by $x_{i,j}^{k}$. A uniformly distributed random number in $[L, U]$ is denoted by $\phi(L, U)$. $\underline{x}_j$ and $\bar{x}_j$ are the lower and upper bounds of the jth variable, respectively.
(2) Team communication operator. Each solution has the opportunity to acquire fresh insights from the leader staff members. As shown in Equation (9), the team communication operator combines three components: the expertise A from the chairman, the collective knowledge B from the leaders on the board of directors, and the collective knowledge C from the leaders on the board of supervisors. The chairman is selected randomly from the M global best-known solutions, whereas all directors and supervisors play equal roles. The detailed equations are given below:
$$ u_{i,j}^{k+1} = x_{i,j}^{k} + A_{i,j}^{k} + B_{i,j}^{k} + C_{i,j}^{k}, \quad i \in [1, I],\ j \in [1, J],\ k \in [1, K] \quad (9) $$
$$ A_{i,j}^{k} = \log\!\big(1 / \phi(0, 1)\big) \cdot \big(gBest_{ind,j}^{k} - x_{i,j}^{k}\big) \quad (10) $$
$$ B_{i,j}^{k} = \alpha \cdot \phi(0, 1) \cdot \Big(\frac{1}{M} \sum_{m=1}^{M} gBest_{m,j}^{k} - x_{i,j}^{k}\Big) \quad (11) $$
$$ C_{i,j}^{k} = \beta \cdot \phi(0, 1) \cdot \Big(\frac{1}{I} \sum_{i=1}^{I} pBest_{i,j}^{k} - x_{i,j}^{k}\Big) \quad (12) $$
where $u_{i,j}^{k+1}$ is the jth element of the ith group agent at iteration k + 1; $pBest_{i,j}^{k}$ is the jth element of the ith agent's best-known position at iteration k; $gBest_{ind,j}^{k}$ is the jth element of the ind-th global best-known agent, with ind an integer chosen randomly from {1, 2, …, M}; $A_{i,j}^{k}$ denotes the expertise contributed by the chairman; $B_{i,j}^{k}$ and $C_{i,j}^{k}$ denote the mean expertise contributed by the M global best-known and I personal best-known staff members, respectively; and $\alpha$ and $\beta$ are the learning parameters.
(3) Reflective learning operator. In addition to studying from elite agents, each agent can also acquire new information by reflecting on its own experiences and observations, which can be represented as follows:
$$ v_{i,j}^{k+1} = \begin{cases} r_{i,j}^{k+1} & \text{if } u_{i,j}^{k+1} \ge c_j \\ p_{i,j}^{k+1} & \text{if } u_{i,j}^{k+1} < c_j \end{cases}, \quad i \in [1, I],\ j \in [1, J],\ k \in [1, K] \quad (13) $$
$$ r_{i,j}^{k+1} = \begin{cases} \phi\big(\bar{x}_j + \underline{x}_j - u_{i,j}^{k+1},\ c_j\big) & \text{if } \big|u_{i,j}^{k+1} - c_j\big| < \phi(0, 1) \cdot \big|\bar{x}_j - \underline{x}_j\big| \\ \phi\big(\underline{x}_j,\ \bar{x}_j + \underline{x}_j - u_{i,j}^{k+1}\big) & \text{otherwise} \end{cases} \quad (14) $$
$$ p_{i,j}^{k+1} = \begin{cases} \phi\big(c_j,\ \bar{x}_j + \underline{x}_j - u_{i,j}^{k+1}\big) & \text{if } \big|u_{i,j}^{k+1} - c_j\big| < \phi(0, 1) \cdot \big|\bar{x}_j - \underline{x}_j\big| \\ \phi\big(\bar{x}_j + \underline{x}_j - u_{i,j}^{k+1},\ \bar{x}_j\big) & \text{otherwise} \end{cases} \quad (15) $$
$$ c_j = 0.5\,\big(\bar{x}_j + \underline{x}_j\big) \quad (16) $$
where $v_{i,j}^{k+1}$ is the jth element of the ith reflective agent at iteration k + 1.
(4) Internal competition operator. By guaranteeing retention of the high-quality agents, the competitiveness of the swarm is gradually enhanced by the following equation:
$$ x_{i,j}^{k+1} = \begin{cases} u_{i,j}^{k+1} & \text{if } F\big(u_i^{k+1}\big) \le F\big(v_i^{k+1}\big) \\ v_{i,j}^{k+1} & \text{if } F\big(u_i^{k+1}\big) > F\big(v_i^{k+1}\big) \end{cases}, \quad i \in [1, I],\ j \in [1, J],\ k \in [1, K] \quad (17) $$
where $F(x)$ is the fitness score associated with staff $x$.
Based on the fitness values, the board of directors and the board of supervisors are updated by the following equations:
$$ pBest_i^{k+1} = \begin{cases} pBest_i^{k} & \text{if } F\big(pBest_i^{k}\big) \le F\big(x_i^{k+1}\big) \\ x_i^{k+1} & \text{if } F\big(pBest_i^{k}\big) > F\big(x_i^{k+1}\big) \end{cases}, \quad i \in [1, I] \quad (18) $$
$$ gBest^{k+1} = \arg\min_M \Big(F\big(pBest_1^{k+1}\big), F\big(pBest_2^{k+1}\big), \ldots, F\big(pBest_I^{k+1}\big)\Big) \quad (19) $$
where $pBest_i^{k+1}$ is the ith personal best-known solution at iteration k + 1, and $gBest^{k+1}$ is the board of directors containing the M global best-known solutions.
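The CSA loop described above can be sketched in a few dozen lines. This is a simplified illustration, not the authors' implementation: the team communication terms A, B, and C follow Equations (10)–(12), but the reflective learning step is reduced here to a randomized mirror about the box centre rather than the full piecewise rule of Equations (13)–(16):

```python
import numpy as np

def csa_minimize(f, lb, ub, n_agents=20, n_leaders=3, n_iter=200,
                 alpha=0.1, beta=0.15, seed=0):
    """Simplified Cooperation Search Algorithm sketch (minimization)."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = lb.size

    # Team building: random initial staff positions, Eq. (8)
    x = rng.uniform(lb, ub, (n_agents, dim))
    fit = np.apply_along_axis(f, 1, x)
    pbest, pfit = x.copy(), fit.copy()
    order = np.argsort(pfit)[:n_leaders]
    gbest, gfit = pbest[order].copy(), pfit[order].copy()

    for _ in range(n_iter):
        for i in range(n_agents):
            # Team communication: chairman A, directors B, supervisors C
            ind = rng.integers(n_leaders)               # random chairman
            A = np.log(1.0 / rng.uniform(1e-12, 1, dim)) * (gbest[ind] - x[i])
            B = alpha * rng.uniform(0, 1, dim) * (gbest.mean(axis=0) - x[i])
            C = beta * rng.uniform(0, 1, dim) * (pbest.mean(axis=0) - x[i])
            u = np.clip(x[i] + A + B + C, lb, ub)

            # Reflective learning (simplified): noisy mirror about box centre
            v = np.clip(lb + ub - u + rng.uniform(-0.1, 0.1, dim) * (ub - lb),
                        lb, ub)

            # Internal competition, Eq. (17): keep the better of u and v
            fu, fv = f(u), f(v)
            x[i], fit[i] = (u, fu) if fu <= fv else (v, fv)

            if fit[i] < pfit[i]:                        # pBest update
                pbest[i], pfit[i] = x[i].copy(), fit[i]

        order = np.argsort(pfit)[:n_leaders]            # gBest update
        gbest, gfit = pbest[order].copy(), pfit[order].copy()
    return gbest[0], gfit[0]
```

Calling `csa_minimize(lambda z: float(np.sum(z ** 2)), [-5, -5], [5, 5])` drives the swarm toward the origin of the sphere function, illustrating how the three operators cooperate.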

2.3. Proposed Runoff Forecasting Method

The GPR method offers satisfactory learning efficiency and generalization ability compared to traditional forecasting models. However, the standard GPR method may yield undesirable hydrologic forecasting outcomes in actual application scenarios because unsatisfactory parameter combinations lead to the local convergence problem. To overcome this problem efficiently, this paper establishes a hybrid evolutionary artificial intelligence model in which the first-rank parameter combination of the GPR model is determined by the CSA method, and the optimized GPR model is then used to predict the runoff data series in the coming periods. By combining the advantages of both GPR and CSA, this study offers a more robust runoff forecasting model with higher compactness than the traditional GPR model. As shown in Figure 2, the specific process of the proposed model is given as follows:
Step 1: Preparatory work. The computation parameters of the proposed GPR-CSA method are set before calculation, such as the maximum number of iterations K, solutions I, leaders M in the CSA method, and the kernel function in the GPR model.
Step 2: Parameter optimization. Based on the training data, the detailed procedures of the CSA method to determine the GPR parameters are given as follows:
Step 2.1: Define the counter k = 1. Then, use Equation (8) in the team building phase to create the initial population in the feasible zone.
Step 2.2: Evaluate the fitness value of all staff members to update the optimal position of each staff member and the globally best-known position of the swarm.
Step 2.3: Use the team communication operators defined in Equations (9)–(12) to enhance global exploration, while the reflective learning operators in Equations (13)–(16) improve local exploitation. Then, the internal competition operator in Equation (17) selects the better solutions for iteration k + 1.
Step 2.4: Increment the counter k by 1. If k is smaller than the maximum iteration, go to Step 2.2; otherwise, the globally best-known position of the CSA represents the ideal GPR model parameters.
Step 3: Operational prediction. Utilize the optimized GPR model to forecast the potential predicted values of the new predictors in the testing dataset.
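The three steps above can be sketched end to end. In this illustration, a plain random search stands in for the CSA optimizer (any black-box minimizer slots into the same objective), and the toy runoff series, the RBF-plus-rational-quadratic kernel family, and the search bounds are assumptions for demonstration only:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, RationalQuadratic

rng = np.random.default_rng(1)

# Step 1: toy "runoff" series and lagged predictors (t-1, t-2, t-3 -> t)
q = np.sin(np.arange(300) * 0.1) * 50 + 100 + rng.normal(0, 2, 300)
X = np.column_stack([q[2:-1], q[1:-2], q[:-3]])
y = q[3:]
split = int(0.7 * len(y))
Xtr, ytr, Xte, yte = X[:split], y[:split], X[split:], y[split:]

def objective(theta):
    """Step 2 fitness: test RMSE of a GPR with candidate hyperparameters."""
    ls, alpha_rq = np.exp(theta)          # decode log-scale parameters
    kernel = RBF(length_scale=ls) + RationalQuadratic(alpha=alpha_rq)
    # optimizer=None: the external search fully controls the hyperparameters
    gpr = GaussianProcessRegressor(kernel=kernel, optimizer=None,
                                   alpha=1e-6, normalize_y=True).fit(Xtr, ytr)
    pred = gpr.predict(Xte)
    return float(np.sqrt(np.mean((yte - pred) ** 2)))

# Step 2: black-box search over log-hyperparameters (random-search stand-in)
best_theta, best_rmse = None, np.inf
for _ in range(30):
    theta = rng.uniform([-2, -2], [4, 2])
    rmse = objective(theta)
    if rmse < best_rmse:
        best_theta, best_rmse = theta, rmse

# Step 3: best_theta would parameterize the operational forecasting model
print(best_theta, best_rmse)
```

The key design choice is `optimizer=None`, which disables scikit-learn's internal gradient-based marginal-likelihood optimization so that the external metaheuristic alone determines the kernel hyperparameters, mirroring the role the CSA plays in the proposed GPR-CSA model.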

3. Performance Evaluation Criteria

In this section, four evaluation criteria are used to assess the performance of the developed models in hydrologic forecasting: root mean squared error (RMSE), mean absolute error (MAE), correlation coefficient (R), and Nash–Sutcliffe efficiency (NSE). In practical applications, a reliable and robust model should produce lower RMSE and MAE values, together with larger R and NSE values. The equations for these evaluation criteria are provided below:
$$ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \big(y_i - \tilde{y}_i\big)^2} \quad (20) $$
$$ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \big|\tilde{y}_i - y_i\big| \quad (21) $$
$$ R = \frac{\sum_{i=1}^{n} (y_i - y_{avg})(\tilde{y}_i - \tilde{y}_{avg})}{\sqrt{\sum_{i=1}^{n} (y_i - y_{avg})^2} \sqrt{\sum_{i=1}^{n} (\tilde{y}_i - \tilde{y}_{avg})^2}} \quad (22) $$
$$ \mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} (y_i - \tilde{y}_i)^2}{\sum_{i=1}^{n} (y_i - y_{avg})^2} \quad (23) $$
where $y_i$ and $\tilde{y}_i$ are the ith points in the recorded and estimated datasets, $y_{avg}$ and $\tilde{y}_{avg}$ are the averages of the recorded and estimated data, and $n$ is the number of data points being evaluated.
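The four criteria translate directly into code; a minimal sketch:

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(y, float) - np.asarray(y_hat, float)) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_hat, float) - np.asarray(y, float))))

def r(y, y_hat):
    """Pearson correlation coefficient between recorded and estimated values."""
    return float(np.corrcoef(np.asarray(y, float), np.asarray(y_hat, float))[0, 1])

def nse(y, y_hat):
    """Nash-Sutcliffe efficiency: 1 minus error variance over data variance."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))
```

Note that NSE equals 0 when the model is no better than predicting the recorded mean, and 1 for a perfect match.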

4. Case Studies

4.1. Engineering Background

The presented GPR-CSA model is used to predict the nonlinear and nonstationary runoff data series of three representative hydrological stations in China: the SX station with daily runoff data, and the LYX and TNH stations with weekly runoff data. A comparative analysis is then conducted using the runoff data of the SX, LYX and TNH stations. Figure 3 offers a comprehensive overview and statistical information of the recorded runoff data. The recorded runoff data are partitioned into two distinct subsets, with the first 70% reserved for training and validation and the last 30% for testing.

4.2. Model Development

Several runoff forecasting models were applied to check the effectiveness of the proposed model, including linear regression (LR), artificial neural network (ANN), recurrent neural network (RNN), Gaussian process regression (GPR), and long short-term memory network (LSTM). It is worth noting that appropriate inputs play an important role in improving the performance of machine learning models. Thus, expert knowledge and partial autocorrelation functions were used to identify suitable inputs. For the SX, LYX and TNH stations, the input variables are the three antecedent runoff values at periods t − 1, t − 2, and t − 3, which are used to forecast the runoff at period t + τ, where τ is the forecasting period. Moreover, the model parameters were determined as follows: for the ANN, RNN and LSTM models, the activation function was set as the sigmoid function and the Adam optimizer was used for parameter tuning; for GPR, the combined RBF and rational quadratic kernel was used, with the default parameter configurations of the scikit-learn toolbox in Python 3.10 adopted for parameter tuning.
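The lagged-input construction described above (runoff at t − 1, t − 2, t − 3 predicting period t + τ) can be sketched as follows; the toy series and the 70/30 split mirror the experimental setup, while the helper name `make_lagged_dataset` is purely illustrative:

```python
import numpy as np

def make_lagged_dataset(q, lags=(1, 2, 3), lead=0):
    """Build predictors q[t-1], q[t-2], q[t-3] and target q[t + lead].

    lead = tau corresponds to (tau+1)-step-ahead forecasting relative to the
    most recent predictor.
    """
    q = np.asarray(q, dtype=float)
    max_lag = max(lags)
    t = np.arange(max_lag, len(q) - lead)    # valid target indices
    X = np.column_stack([q[t - l] for l in lags])
    y = q[t + lead]
    return X, y

# Example: one-step-ahead dataset with a 70/30 chronological split
q = np.sin(np.arange(100) * 0.2) * 40 + 90   # toy runoff series
X, y = make_lagged_dataset(q)
split = int(0.7 * len(y))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]
```

Keeping the split chronological (rather than shuffled) matters for runoff series, since the testing period must lie strictly after the training period.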

4.3. Experiment Results

4.3.1. Case 1: One-Step-Ahead Prediction Outcomes

This study develops several runoff forecasting models and then compares their performances at different forecasting horizons using four statistical evaluation indicators, namely RMSE, MAE, R, and NSE. Table 1 lists the detailed statistical indicators of the one-step-ahead prediction results of the GPR-CSA method and several control methods. The results show that the SX station presents the largest forecasting biases in both the training and testing sets, while the TNH and LYX stations present lower forecasting biases than the SX station. This difference stems from the higher runoff magnitudes at the SX station, showing the significant impact of the runoff dataset at hand. Table 1 also supports the following conclusions: (a) the GPR-CSA model displays superior forecasting ability, with the lowest RMSE and MAE values coupled with the highest R and NSE indicators; (b) the standard GPR method outperforms the LR model in terms of fitting ability and overall performance, highlighting the importance of the employed model structure; (c) the comparison of the GPR and GPR-CSA models confirms the validity of the CSA algorithm in identifying feasible parameters, demonstrating its superior ability in optimizing multivariable combinations.
To illustrate the prediction ability of the developed model, Figure 4 shows the one-step-ahead prediction results for the analyzed runoff data series. The runoff prediction curves of the GPR-CSA model approximate the original runoff curves more closely than those of the other models at the three hydrological stations. Figure 5 depicts the radar plots of R and NSE for the one-step-ahead forecasting results. The R and NSE values of the GPR-CSA method in the radar plots are the farthest from the center zone, proving the effectiveness of GPR-CSA in providing satisfactory prediction results. Thus, the GPR-CSA model can trace the complex features of runoff data, thereby leading to highly satisfactory forecasting outcomes. These findings serve to prove the feasibility of the model for hydrological forecasting.

4.3.2. Case 2: Two-Step-Ahead Prediction Outcomes

The refined model boasts adequate forecasting accuracy in the above one-step-ahead runoff prediction. In real-world scenarios, the forecasting model’s performance at various horizons is also critical to promote water resource utilization. Consequently, the two-step-ahead runoff predicting results are compared. As outlined earlier, Table 2 gives the statistical indicators of the predicting outcomes by various models. For both the training and testing data, the statistical data fully highlight the superiority of the GPR-CSA model compared to other control methods. Thus, this section provides further evidence of the engineering feasibility of the hydrological forecasting approach.
Figure 6 illustrates the scatter plots of two-step-ahead predicting results for the testing dataset derived through several techniques. It shows that the proposed model exhibits superior prediction accuracy in comparison to other models, as it attains the largest correlation between the recorded and predicted runoff in all simulations. Figure 7 shows the bar graphs of the RMSE and MAE for the two-step-ahead predicting results at the testing phase. It shows that GPR-CSA has smaller RMSE and MAE values compared with other forecasting methods, demonstrating the superiority of CSA in forecasting nonstationary runoff series. Thus, incorporating artificial intelligence and metaheuristic optimization can effectively meet the practical needs of hydrological forecasting tasks.

4.3.3. Case 3: Three-Step-Ahead Prediction Outcomes

Table 3 displays the statistical metrics of three-step-ahead predicting results obtained through multiple models. Figure 8 illustrates the statistical indicators of the three-step-ahead estimated results for the testing dataset. The data show that the standalone models yield limited forecasting results, while the evolutionary algorithm substantially improves the achieved outcomes in different cases. Compared to other models, the developed GPR-CSA model attains the best forecasting performance for the testing datasets at the three stations. Thus, the introduced parameter optimization strategy can significantly enhance the forecasting effectiveness of a standalone model for runoff forecasting.

4.4. Simulation Discussion

4.4.1. Analysis of the Kernel Function

The experiments are executed to show the influences of different kernel functions on the prediction results at the three stations. Table 4 gives the statistical indicators of the one-step-ahead prediction results using different kernel functions, where kernel1 represents the radial basis function kernel, kernel2 represents the rational quadratic kernel, and kernel3 represents the compound kernel combining the radial basis function kernel and the rational quadratic kernel. Figure 9 plots the correlation coefficients of the prediction results with different kernels at the SX station. The following phenomena can be observed: (1) compared with the standard GPR model, GPR-CSA achieves better prediction results regardless of the employed kernel function; (2) for the same station, the prediction results change with the kernel function, and thus, it is necessary to carefully select the kernel function through experiments according to the actual runoff situation; (3) with the extension of the forecasting period, the prediction performances of the three kernel functions decrease gradually at the SX station. In all cases, the GPR-CSA model is always superior to the GPR model, demonstrating the superiority of the CSA method in finding suitable computation parameters of the GPR model. Thus, the GPR-CSA method is an effective tool for providing accurate hydrological forecasting information.
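A kernel comparison of this kind is straightforward to sketch with scikit-learn. The synthetic series below is an assumption, and a white-noise term is added to each kernel to model the noise variance σ²; kernel1, kernel2, and kernel3 correspond to the RBF, rational quadratic, and compound kernels of Table 4:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, RationalQuadratic, WhiteKernel

rng = np.random.default_rng(2)
t = np.linspace(0, 20, 150).reshape(-1, 1)
y = np.sin(t).ravel() + 0.1 * rng.normal(size=150)

# Interleaved train/test split so both sets span the same input range
t_tr, y_tr = t[::2], y[::2]
t_te, y_te = t[1::2], y[1::2]

kernels = {
    "RBF": RBF() + WhiteKernel(),                           # kernel1
    "RationalQuadratic": RationalQuadratic() + WhiteKernel(),  # kernel2
    "RBF+RationalQuadratic": RBF() + RationalQuadratic() + WhiteKernel(),  # kernel3
}
results = {}
for name, k in kernels.items():
    gpr = GaussianProcessRegressor(kernel=k, normalize_y=True,
                                   random_state=0).fit(t_tr, y_tr)
    pred = gpr.predict(t_te)
    results[name] = float(np.sqrt(np.mean((y_te - pred) ** 2)))
print(results)
```

Each fit re-optimizes its own hyperparameters by marginal likelihood, so the resulting RMSE values isolate the effect of the kernel family itself.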

4.4.2. Analysis of the Forecast Errors

Figure 10 plots the forecast errors of the various methods with different leading times at the SX station. The prediction performances of the three methods worsen with increasing leading time, while the forecast errors of the GPR-CSA model are always smaller than those of the two other models (LR and GPR) regardless of the leading time. Thus, the proposed GPR-CSA model is effective in providing satisfactory runoff forecasting results.

4.4.3. Analysis of the Model Robustness

To analyze the robustness of the GPR-CSA model, it was run 50 times for the one-step-ahead prediction at the three stations. Figure 11 shows the RMSE of the one-step-ahead prediction results from the GPR-CSA and ANN models in the testing phase across the different runs. Compared with the conventional ANN method, the prediction results of the GPR-CSA model on the testing dataset are stable with little fluctuation, which demonstrates the reliable forecasting performance of the GPR-CSA method. Thus, the GPR-CSA model has an outstanding parameter optimization ability and a robust performance for runoff prediction.

5. Conclusions

Accurate hydrological prediction is critical for the effective management of water energy resources. To address practical demands, this article proposes a hybrid artificial intelligence model for predicting runoff under uncertainty. For the first time, the cooperation search algorithm (CSA) is used to find suitable parameter combinations for the classical Gaussian process regression (GPR) model. Through three well-designed operators, the CSA tool effectively overcomes the local convergence defects associated with traditional gradient-based methods. To validate its efficacy, the GPR-CSA model is used to predict the nonlinear runoff data of three hydrological stations. The simulations indicate that the CSA method achieves a balance between global search and local search when optimizing the computational parameters of the traditional GPR model. Moreover, the results of the GPR-CSA method are better than those of several control models on both the training and testing datasets. Thus, a novel hybrid artificial intelligence model is proposed for forecasting nonstationary and nonlinear streamflow. Combining the advantages of the GPR model and the CSA algorithm, the proposed model shows superior forecasting ability and robust prediction results in addressing complex hydrologic forecasting tasks. The research presented in this paper offers innovative outcomes for the application of artificial intelligence methods in the field of hydrological forecasting. The findings can facilitate early warnings of flood disasters in river basins and the efficient utilization of water resources, making them a significant contribution to this field.

Author Contributions

Conceptualization, S.W. and Z.F.; methodology, S.W., H.G. and J.G.; software, W.L. and J.G.; validation, W.L., H.G. and J.G.; writing—original draft preparation, S.W. and Z.F.; writing—review and editing, W.L., H.G. and J.G.; funding acquisition, S.W., Z.F., J.G. and W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is financially supported by the National Key Research and Development Program of China (2022YFC3202300, 2021YFC3001000), the Open Research Fund of Key Laboratory of Water Security Guarantee in Guangdong-Hong Kong-Macao Greater Bay Area of Ministry of Water Resources (Grant number WSGBA-KJ202301) and the Fundamental Research Funds for the Central Universities (B210201046).

Data Availability Statement

The data supporting the findings of this paper are available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to thank the editors and reviewers for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Schematic of the CSA method.
Figure 2. Schematic of the proposed GPR-CSA model.
Figure 3. Overview of the studied streamflow data. (a) SX Station; (b) LYX Station; (c) TNH Station.
Figure 4. One-step-ahead prediction results of the various methods in the testing phase. (a) SX; (b) LYX; (c) TNH.
Figure 5. R and NSE of the one-step-ahead prediction results in the testing phase. (a) SX; (b) LYX; (c) TNH.
Figure 6. Scatter plots of the two-step-ahead prediction results of the various methods in the testing phase. (a) SX; (b) LYX; (c) TNH.
Figure 7. RMSE and MAE of the two-step-ahead prediction results in the testing phase. (a) SX; (b) LYX; (c) TNH.
Figure 8. RMSE and NSE of the three-step-ahead prediction results of the various methods. (a) SX; (b) LYX; (c) TNH.
Figure 9. Correlation coefficients of the prediction results with different kernel functions at the SX station.
Figure 10. Forecast errors of the various methods at different lead times at the SX station. (a) one-step-ahead; (b) two-step-ahead; (c) three-step-ahead.
Figure 11. RMSE values of the one-step-ahead prediction results in the testing phase across different runs. (a) SX; (b) LYX; (c) TNH.
Table 1. Statistical indicators of the one-step-ahead runoff prediction results of the different methods.
| Station | Method | Training RMSE | Training MAE | Training R | Training NSE | Testing RMSE | Testing MAE | Testing R | Testing NSE |
|---|---|---|---|---|---|---|---|---|---|
| SX | LR | 2179.8125 | 1087.0266 | 0.9779 | 0.9564 | 2067.5276 | 967.7999 | 0.9735 | 0.9477 |
| SX | ANN | 2141.9027 | 1081.5492 | 0.9789 | 0.9579 | 2023.4121 | 953.2562 | 0.9750 | 0.9499 |
| SX | RNN | 2132.8611 | 1092.9085 | 0.9790 | 0.9582 | 2025.3833 | 959.7314 | 0.9750 | 0.9498 |
| SX | LSTM | 2116.3733 | 1059.8781 | 0.9792 | 0.9589 | 2032.6758 | 931.1892 | 0.9746 | 0.9494 |
| SX | GPR | 2133.3407 | 1074.1451 | 0.9789 | 0.9582 | 2021.9179 | 933.6396 | 0.9748 | 0.9499 |
| SX | GPR-CSA | 2090.5676 | 1040.6587 | 0.9797 | 0.9599 | 2014.4942 | 910.2308 | 0.9750 | 0.9503 |
| LYX | LR | 146.5653 | 84.6916 | 0.9491 | 0.9008 | 224.2732 | 132.8247 | 0.9433 | 0.8885 |
| LYX | ANN | 145.0672 | 85.5849 | 0.9503 | 0.9028 | 223.3189 | 131.6143 | 0.9440 | 0.8895 |
| LYX | RNN | 143.9679 | 81.9808 | 0.9509 | 0.9042 | 222.0857 | 128.5580 | 0.9456 | 0.8907 |
| LYX | LSTM | 145.0215 | 83.2984 | 0.9502 | 0.9028 | 226.2795 | 130.8222 | 0.9435 | 0.8865 |
| LYX | GPR | 146.1233 | 84.2844 | 0.9494 | 0.9013 | 223.8168 | 132.3147 | 0.9436 | 0.8890 |
| LYX | GPR-CSA | 143.7597 | 81.5556 | 0.9511 | 0.9045 | 220.4119 | 127.9265 | 0.9460 | 0.8923 |
| TNH | LR | 156.1520 | 90.5924 | 0.9491 | 0.9008 | 225.4124 | 128.9411 | 0.9385 | 0.8794 |
| TNH | ANN | 157.7914 | 93.3550 | 0.9482 | 0.8987 | 227.0591 | 134.6547 | 0.9373 | 0.8776 |
| TNH | RNN | 159.8438 | 92.2203 | 0.9466 | 0.8961 | 231.0292 | 133.9100 | 0.9355 | 0.8733 |
| TNH | LSTM | 155.8957 | 90.6369 | 0.9493 | 0.9011 | 225.7327 | 130.1097 | 0.9382 | 0.8791 |
| TNH | GPR | 155.6946 | 90.1865 | 0.9494 | 0.9014 | 224.8368 | 128.4849 | 0.9388 | 0.8800 |
| TNH | GPR-CSA | 153.7759 | 88.1168 | 0.9507 | 0.9038 | 221.4671 | 125.6045 | 0.9407 | 0.8836 |
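For reference, the four indicators reported in Tables 1–4 (RMSE, MAE, Pearson correlation coefficient R, and Nash–Sutcliffe efficiency NSE) can be computed from paired observed and predicted series as in the following sketch. This is a minimal plain-Python illustration of the standard definitions, not the authors' code, and the toy series are invented:

```python
import math

def metrics(obs, sim):
    """Return RMSE, MAE, Pearson R, and Nash-Sutcliffe efficiency (NSE)."""
    n = len(obs)
    errors = [o - s for o, s in zip(obs, sim)]
    sse = sum(e * e for e in errors)          # sum of squared errors
    rmse = math.sqrt(sse / n)
    mae = sum(abs(e) for e in errors) / n
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    r = cov / math.sqrt(var_o * var_s)        # Pearson correlation
    nse = 1.0 - sse / var_o                   # 1 = perfect fit
    return rmse, mae, r, nse

# Illustrative data, not from the study:
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
rmse, mae, r, nse = metrics(observed, predicted)
```

Lower RMSE and MAE, and R and NSE closer to 1, indicate better agreement, which is the reading convention used in the tables.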
Table 2. Statistical indicators of the two-step-ahead runoff prediction results.
| Station | Method | Training RMSE | Training MAE | Training R | Training NSE | Testing RMSE | Testing MAE | Testing R | Testing NSE |
|---|---|---|---|---|---|---|---|---|---|
| SX | LR | 3942.6985 | 2057.7949 | 0.9260 | 0.8574 | 3687.9804 | 1836.8307 | 0.9131 | 0.8334 |
| SX | ANN | 3796.7594 | 1985.0689 | 0.9316 | 0.8678 | 3556.6630 | 1734.4789 | 0.9204 | 0.8451 |
| SX | RNN | 3785.2346 | 1965.4362 | 0.9320 | 0.8686 | 3558.4027 | 1721.2091 | 0.9203 | 0.8449 |
| SX | LSTM | 3772.2812 | 1952.1839 | 0.9325 | 0.8695 | 3553.0890 | 1712.4415 | 0.9205 | 0.8454 |
| SX | GPR | 3921.7630 | 2042.2414 | 0.9268 | 0.8589 | 3674.1653 | 1820.1443 | 0.9138 | 0.8347 |
| SX | GPR-CSA | 3717.3493 | 1926.0842 | 0.9345 | 0.8733 | 3509.4283 | 1690.3785 | 0.9222 | 0.8492 |
| LYX | LR | 244.1420 | 151.6329 | 0.8515 | 0.7250 | 386.6832 | 232.6207 | 0.8226 | 0.6686 |
| LYX | ANN | 242.7675 | 149.0092 | 0.8534 | 0.7281 | 384.8749 | 235.3607 | 0.8268 | 0.6717 |
| LYX | RNN | 237.6878 | 144.4694 | 0.8599 | 0.7393 | 384.0141 | 231.8109 | 0.8279 | 0.6732 |
| LYX | LSTM | 237.6402 | 143.2128 | 0.8599 | 0.7394 | 385.2297 | 231.1241 | 0.8308 | 0.6711 |
| LYX | GPR | 242.9637 | 150.3755 | 0.8530 | 0.7276 | 385.4811 | 231.8700 | 0.8238 | 0.6707 |
| LYX | GPR-CSA | 236.6189 | 142.8215 | 0.8612 | 0.7417 | 379.1407 | 228.6581 | 0.8330 | 0.6814 |
| TNH | LR | 257.0790 | 161.1098 | 0.8553 | 0.7315 | 373.6444 | 226.9604 | 0.8222 | 0.6687 |
| TNH | ANN | 254.3070 | 156.0087 | 0.8587 | 0.7373 | 370.6303 | 227.0324 | 0.8258 | 0.6740 |
| TNH | RNN | 254.4948 | 158.7407 | 0.8584 | 0.7369 | 370.3550 | 226.7389 | 0.8251 | 0.6745 |
| TNH | LSTM | 254.8199 | 156.2293 | 0.8581 | 0.7362 | 368.1236 | 229.1052 | 0.8289 | 0.6784 |
| TNH | GPR | 255.8652 | 160.0530 | 0.8568 | 0.7340 | 372.2691 | 226.1210 | 0.8235 | 0.6711 |
| TNH | GPR-CSA | 250.4350 | 154.1625 | 0.8633 | 0.7452 | 364.5931 | 222.4808 | 0.8313 | 0.6845 |
Table 3. Statistical indicators of the three-step-ahead runoff prediction results.
| Station | Method | Training RMSE | Training MAE | Training R | Training NSE | Testing RMSE | Testing MAE | Testing R | Testing NSE |
|---|---|---|---|---|---|---|---|---|---|
| SX | LR | 4793.3394 | 2588.2535 | 0.8885 | 0.7895 | 4525.6364 | 2391.3104 | 0.8660 | 0.7492 |
| SX | ANN | 4563.0691 | 2454.2423 | 0.8997 | 0.8092 | 4334.6286 | 2254.3684 | 0.8798 | 0.7699 |
| SX | RNN | 4542.2538 | 2438.6045 | 0.9005 | 0.8109 | 4321.9372 | 2237.3552 | 0.8807 | 0.7713 |
| SX | LSTM | 4532.9915 | 2452.7204 | 0.9010 | 0.8117 | 4300.6186 | 2247.8933 | 0.8817 | 0.7735 |
| SX | GPR | 4548.5324 | 2464.9372 | 0.9002 | 0.8104 | 4325.5558 | 2261.5996 | 0.8799 | 0.7709 |
| SX | GPR-CSA | 4492.1895 | 2424.8873 | 0.9028 | 0.8151 | 4259.7887 | 2204.6138 | 0.8838 | 0.7778 |
| LYX | LR | 306.6492 | 203.0209 | 0.7524 | 0.5662 | 474.2153 | 306.9632 | 0.7212 | 0.5021 |
| LYX | ANN | 292.4584 | 187.3703 | 0.7781 | 0.6054 | 468.7709 | 295.1098 | 0.7445 | 0.5135 |
| LYX | RNN | 290.3365 | 184.7736 | 0.7818 | 0.6111 | 466.1477 | 292.4331 | 0.7454 | 0.5189 |
| LYX | LSTM | 292.5871 | 186.6877 | 0.7779 | 0.6050 | 467.8508 | 294.4701 | 0.7477 | 0.5154 |
| LYX | GPR | 295.2664 | 189.9242 | 0.7732 | 0.5978 | 468.1844 | 297.8071 | 0.7380 | 0.5147 |
| LYX | GPR-CSA | 279.4007 | 172.0490 | 0.7999 | 0.6398 | 463.4016 | 289.1666 | 0.7514 | 0.5245 |
| TNH | LR | 325.8074 | 214.8081 | 0.7541 | 0.5686 | 459.6293 | 298.6075 | 0.7175 | 0.4992 |
| TNH | ANN | 315.2876 | 201.6766 | 0.7721 | 0.5960 | 449.3503 | 292.8334 | 0.7376 | 0.5214 |
| TNH | RNN | 315.5895 | 204.7256 | 0.7716 | 0.5953 | 448.3488 | 293.8328 | 0.7339 | 0.5235 |
| TNH | LSTM | 316.0069 | 204.4948 | 0.7709 | 0.5942 | 448.1143 | 294.4648 | 0.7371 | 0.5240 |
| TNH | GPR | 314.9528 | 203.3613 | 0.7726 | 0.5969 | 446.4302 | 291.9147 | 0.7374 | 0.5276 |
| TNH | GPR-CSA | 310.9775 | 200.1707 | 0.7794 | 0.6070 | 443.6498 | 288.7442 | 0.7468 | 0.5334 |
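The one-, two-, and three-step-ahead settings in Tables 1–3 correspond to predicting the runoff one, two, or three periods after the last observed input. A minimal sketch of how such lagged training samples can be built from a runoff series follows; the helper name and lag count are illustrative assumptions, not the authors' implementation:

```python
def make_samples(series, n_lags, lead):
    """Build (input, target) pairs: n_lags consecutive past values
    predict the value `lead` steps after the last input."""
    X, y = [], []
    for i in range(len(series) - n_lags - lead + 1):
        X.append(series[i:i + n_lags])          # lagged inputs
        y.append(series[i + n_lags + lead - 1])  # lead-step-ahead target
    return X, y

# Illustrative series; two-step-ahead forecasting with three lagged inputs.
X, y = make_samples([1, 2, 3, 4, 5, 6], n_lags=3, lead=2)
# X = [[1, 2, 3], [2, 3, 4]], y = [5, 6]
```

Longer leads shrink the usable sample set slightly and, as the tables show, degrade accuracy, since the target is farther from the information contained in the inputs.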
Table 4. Statistical indicators of the one-step-ahead prediction results with different kernel functions.
| Station | Kernel | Method | Training RMSE | Training MAE | Training R | Training NSE | Testing RMSE | Testing MAE | Testing R | Testing NSE |
|---|---|---|---|---|---|---|---|---|---|---|
| SX | kernel1 | GPR | 2176.7778 | 1082.6186 | 0.9780 | 0.9565 | 2064.6345 | 966.2568 | 0.9736 | 0.9478 |
| SX | kernel1 | GPR-CSA | 2090.7676 | 1040.6310 | 0.9797 | 0.9599 | 2015.0046 | 910.6999 | 0.9750 | 0.9503 |
| SX | kernel1 | Improvement | 3.95% | 3.88% | 0.18% | 0.35% | 2.40% | 5.75% | 0.15% | 0.26% |
| SX | kernel2 | GPR | 2133.3397 | 1073.5378 | 0.9789 | 0.9582 | 2021.9094 | 933.6112 | 0.9748 | 0.9499 |
| SX | kernel2 | GPR-CSA | 2090.8223 | 1040.6587 | 0.9797 | 0.9599 | 2015.0183 | 910.7315 | 0.9750 | 0.9503 |
| SX | kernel2 | Improvement | 1.99% | 3.06% | 0.09% | 0.17% | 0.34% | 2.45% | 0.02% | 0.04% |
| SX | kernel3 | GPR | 2133.3407 | 1074.1451 | 0.9789 | 0.9582 | 2021.9179 | 933.6396 | 0.9748 | 0.9499 |
| SX | kernel3 | GPR-CSA | 2090.5676 | 1040.6587 | 0.9797 | 0.9599 | 2014.4942 | 910.2308 | 0.9750 | 0.9503 |
| SX | kernel3 | Improvement | 2.00% | 3.12% | 0.09% | 0.17% | 0.37% | 2.51% | 0.02% | 0.04% |
| LYX | kernel1 | GPR | 146.4337 | 84.6313 | 0.9492 | 0.9009 | 224.3537 | 132.7793 | 0.9433 | 0.8884 |
| LYX | kernel1 | GPR-CSA | 143.7545 | 81.5415 | 0.9511 | 0.9045 | 220.3405 | 127.8677 | 0.9460 | 0.8924 |
| LYX | kernel1 | Improvement | 1.83% | 3.65% | 0.20% | 0.40% | 1.79% | 3.70% | 0.29% | 0.45% |
| LYX | kernel2 | GPR | 146.2409 | 84.4382 | 0.9493 | 0.9012 | 224.1000 | 132.5332 | 0.9434 | 0.8887 |
| LYX | kernel2 | GPR-CSA | 143.7740 | 81.5776 | 0.9511 | 0.9045 | 220.4450 | 127.9491 | 0.9460 | 0.8923 |
| LYX | kernel2 | Improvement | 1.69% | 3.39% | 0.18% | 0.37% | 1.63% | 3.46% | 0.27% | 0.41% |
| LYX | kernel3 | GPR | 146.1233 | 84.2844 | 0.9494 | 0.9013 | 223.8168 | 132.3147 | 0.9436 | 0.8890 |
| LYX | kernel3 | GPR-CSA | 143.7597 | 81.5556 | 0.9511 | 0.9045 | 220.4119 | 127.9265 | 0.9460 | 0.8923 |
| LYX | kernel3 | Improvement | 1.62% | 3.24% | 0.18% | 0.35% | 1.52% | 3.32% | 0.26% | 0.38% |
| TNH | kernel1 | GPR | 156.0139 | 90.5336 | 0.9492 | 0.9010 | 225.3258 | 128.9740 | 0.9385 | 0.8795 |
| TNH | kernel1 | GPR-CSA | 153.7880 | 88.1002 | 0.9507 | 0.9038 | 221.3923 | 125.5130 | 0.9407 | 0.8837 |
| TNH | kernel1 | Improvement | 1.43% | 2.69% | 0.16% | 0.31% | 1.75% | 2.68% | 0.23% | 0.47% |
| TNH | kernel2 | GPR | 155.8151 | 90.3433 | 0.9493 | 0.9012 | 225.0535 | 128.7429 | 0.9387 | 0.8798 |
| TNH | kernel2 | GPR-CSA | 153.7588 | 88.0945 | 0.9507 | 0.9038 | 221.3747 | 125.5502 | 0.9407 | 0.8837 |
| TNH | kernel2 | Improvement | 1.32% | 2.49% | 0.14% | 0.29% | 1.63% | 2.48% | 0.22% | 0.44% |
| TNH | kernel3 | GPR | 155.6946 | 90.1865 | 0.9494 | 0.9014 | 224.8368 | 128.4849 | 0.9388 | 0.8800 |
| TNH | kernel3 | GPR-CSA | 153.7759 | 88.1168 | 0.9507 | 0.9038 | 221.4671 | 125.6045 | 0.9407 | 0.8836 |
| TNH | kernel3 | Improvement | 1.23% | 2.29% | 0.13% | 0.27% | 1.50% | 2.24% | 0.20% | 0.41% |
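The "Improvement" rows in Table 4 appear to be the relative change of GPR-CSA with respect to plain GPR, expressed as a percentage of the GPR value: a reduction for the error metrics (RMSE, MAE) and a gain for the goodness-of-fit metrics (R, NSE). Small discrepancies can arise from rounding of the reported values. A sketch of this assumed calculation:

```python
def improvement(gpr_value, csa_value, lower_is_better):
    """Relative improvement of GPR-CSA over GPR, as a percentage of GPR."""
    if lower_is_better:  # error metrics such as RMSE and MAE
        return (gpr_value - csa_value) / gpr_value * 100.0
    return (csa_value - gpr_value) / gpr_value * 100.0  # R and NSE

# SX station, kernel1, training RMSE and MAE from Table 4:
rmse_gain = improvement(2176.7778, 2090.7676, lower_is_better=True)  # ~3.95
mae_gain = improvement(1082.6186, 1040.6310, lower_is_better=True)   # ~3.88
```

Both values reproduce the corresponding "Improvement" entries in Table 4 to two decimal places.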
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, S.; Gong, J.; Gao, H.; Liu, W.; Feng, Z. Gaussian Process Regression and Cooperation Search Algorithm for Forecasting Nonstationary Runoff Time Series. Water 2023, 15, 2111. https://doi.org/10.3390/w15112111

