1. Introduction
One critical factor in the planning, design, operation, and management of a water distribution system (WDS) is meeting water demand with adequate quality and at reasonable pressure [1,2,3]. An accurate hydraulic model of a WDS helps water utilities operate and manage the system effectively. Because WDS hydraulics are driven by consumer demands, these demands must be estimated before any hydraulic evaluation is performed [4]. Water demand at a given future time is usually related to historical water consumption and meteorological factors such as humidity, air temperature, and wind velocity [5]. Water demand forecasting plays an important role in WDS activities such as water production, pump station operation, real-time modeling, and other strategic water management decisions [1,6].
Water demand forecasting models can be categorized into long-term and short-term models according to the forecast horizon (i.e., the time period over which water demand is forecast) and the forecast frequency (i.e., the time step at which forecasts are made within that period) [7]. Long-term forecasting models (forecast horizons of 1 to 10 years) are mainly concerned with the planning and design of WDSs. Short-term forecasting models (forecast horizons of 1 day to 1 month) target the real-time water demands of existing WDSs and are generally used for the daily operation of water plants and pump stations [8]. This study focuses on the short-term model. An accurate short-term water demand forecasting model, with a forecast frequency ranging from daily to sub-hourly, is essential support for optimal scheduling and better decision making in WDS management [9].
Many studies have proposed models for short-term water demand forecasting, which can be broadly classified into traditional methods and learning algorithms [9]. Early works addressed this problem with traditional statistical models, such as linear regression, exponential smoothing, and the autoregressive integrated moving average (ARIMA) model [7]. These models have been widely applied in practice because they are simple to understand and implement. However, traditional models cannot always accurately predict the nonlinear changes in water demand. Recently, more sophisticated models based on machine learning algorithms and artificial intelligence have been used to address this problem. Models based on machine learning algorithms are typically data-driven nonlinear models, which rely mainly on historical data to establish the relationships between water demand and related variables (e.g., previous water consumption, air humidity, and temperature).
A number of data-driven models that use machine learning algorithms have been developed for short-term water demand forecasting, such as artificial neural network (ANN) models [10,11,12], support vector machine (SVM) models [13,14,15,16], projection pursuit regression models [1,17], and random forests [18]. Herrera et al. [1] compared these models and found that the SVM model gave the most accurate results. Khan and Coulibaly [15] compared SVM, ANN, and a seasonal autoregressive model for forecasting lake water levels, and their results indicated that the SVM model outperformed the other two. The main reason is that SVM has an inherent advantage in formulating its cost function: it uses the structural risk minimization principle instead of the empirical risk minimization used by ANN [19].
SVM maps the nonlinear trends of the input space to linear trends in a higher-dimensional space and recognizes subtle patterns in complex datasets by using a learning algorithm [20]. The least squares support vector machine (LSSVM) is an extension of SVM that uses equality constraints instead of inequality constraints and works with a least squares cost function [21,22]. Because of the equality constraints, the LSSVM reduces computational complexity by solving a set of linear equations rather than the quadratic programming problem of the standard SVM. Chen and Zhang [14], Herrera et al. [1], and Praveen and Bagavathi [23] established LSSVM-based models to forecast hourly water demand and found that the LSSVM model has better generalization ability than ANN. Other examples of LSSVM applications include river flow estimation [24], discharge-suspended sediment estimation [25], and pipeline network failure estimation [26]. When forecasting water demand with an LSSVM-based model, Chen and Zhang [13] used the Bayesian framework to determine the model parameters (namely, the regularization constant and the width of the RBF kernel). Their case study showed that parameter determination by the Bayesian method is faster than by cross-validation [26,27].
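As a minimal illustration of the linear system mentioned above, the following Python sketch fits an LSSVM regressor with an RBF kernel using NumPy. It follows the standard textbook LSSVM formulation and is not the implementation used in the cited studies; the function names (rbf_kernel, lssvm_fit, lssvm_predict), the synthetic sine data, and the values of gamma (regularization constant) and sigma (kernel width) are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def lssvm_fit(X, y, gamma, sigma):
    """Solve the LSSVM linear system for the bias b and multipliers alpha.

        [ 0        1^T       ] [ b     ]   [ 0 ]
        [ 1   K + I / gamma  ] [ alpha ] = [ y ]
    """
    n = X.shape[0]
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = 1.0
    system[1:, 0] = 1.0
    system[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(system, rhs)
    return solution[0], solution[1:]


def lssvm_predict(X_train, b, alpha, X_new, sigma):
    """Evaluate the fitted model: sum_i alpha_i * k(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


# Synthetic data (hypothetical): a noisy sine wave, not water demand data.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 60)[:, None]
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(60)

gamma, sigma = 100.0, 0.5  # assumed values, normally tuned
b, alpha = lssvm_fit(X, y, gamma, sigma)
y_hat = lssvm_predict(X, b, alpha, X, sigma)
```

Training reduces to a single call to a linear solver, which is the computational advantage over the quadratic program of the standard SVM. In practice, gamma and sigma would be selected by a procedure such as cross-validation or the Bayesian framework discussed above.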
Both the traditional models and the learning algorithms have achieved promising results in their own linear or nonlinear domains; however, none of them is universally suitable for all circumstances. To improve forecasting performance, some studies have developed hybrid models that combine two or more different algorithms or models. Zhang [28] established a hybrid model with ANN and ARIMA to forecast time series, in which the ARIMA model was first used to predict the linear part of the data, and the ANN was then used to model the errors between the linear part and the observed data (i.e., the nonlinear part of the data). Results on three benchmark time series showed that the hybrid model achieved higher forecasting accuracy than either model used independently. Odan and Reis [7] combined the Fourier series (FS) with ANN for hourly water demand forecasting. The ANN was used to model the errors of the FS forecast (i.e., the difference between the FS model and the observed data). Brentan et al. [29] proposed a hybrid model based on SVM and adaptive FS, in which the SVM first provided the initial forecast and the adaptive FS was then used to model the errors between the initial forecast and the observed data. Thus, the nonlinear and periodic behavior of water demand can be captured by the SVM and the FS model, respectively.
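The error correction idea shared by these hybrid models can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of any cited study: the initial forecast is a deliberately imperfect stand-in, the error model is an ordinary least squares Fourier series (not the adaptive FS of [29]), and the synthetic demand signal, the period of 96 steps per day, and all function names are hypothetical.

```python
import numpy as np


def fourier_design(t, period, n_harmonics):
    """Design matrix: a constant column plus one cos/sin pair per harmonic."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * t / period
        cols.extend([np.cos(angle), np.sin(angle)])
    return np.column_stack(cols)


def fit_error_model(t, errors, period, n_harmonics):
    """Least squares fit of a Fourier series to the initial-forecast errors."""
    design = fourier_design(t, period, n_harmonics)
    coef, *_ = np.linalg.lstsq(design, errors, rcond=None)
    return coef


def correct_forecast(initial, t, coef, period, n_harmonics):
    """Final forecast = initial forecast + predicted error."""
    return initial + fourier_design(t, period, n_harmonics) @ coef


def rmse(a, b):
    """Root mean square error between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic demand (hypothetical): 4 days at a 15-min step, 96 steps per day.
period = 96
t = np.arange(4 * period)
observed = (
    100
    + 20 * np.sin(2 * np.pi * t / period)
    + 5 * np.cos(4 * np.pi * t / period)
)
# Stand-in initial forecast that misses the half-day component.
initial = 100 + 20 * np.sin(2 * np.pi * t / period)

# Fit the error model on the first three days, correct the fourth day.
train = slice(0, 3 * period)
test = slice(3 * period, 4 * period)
coef = fit_error_model(
    t[train], observed[train] - initial[train], period, n_harmonics=3
)
final = correct_forecast(initial[test], t[test], coef, period, n_harmonics=3)

rmse_initial = rmse(initial[test], observed[test])
rmse_final = rmse(final, observed[test])
```

Because the synthetic errors are exactly periodic, the correction removes them almost completely here. Real error series are noisier, so the gain from an error model of this kind depends on how much regular structure the errors contain.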
In addition to FS, the chaotic time series method makes it possible to detect the instability hidden behind seemingly random phenomena, and it has been widely used in short-term time series forecasting of rainfall, traffic, and other fields. For example, Dhanya et al. [30] examined the chaotic characteristics of daily rainfall data from the Malaprabha basin, India, and established a daily rainfall prediction model based on the theory of chaotic time series. Liu et al. [31] combined chaos theory with SVM to perform short-term prediction of network traffic. Yang et al. [32] proposed an improved fuzzy neural system based on chaotic reconstruction technology for short-term load forecasting of electric power systems, and the application showed that the chaotic technology-based model performed better than the conventional neural network model. So far, chaotic time series have rarely been used to forecast water demand, and their performance in this field is unknown.
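Chaotic time series methods typically start from a phase-space reconstruction of the observed series. A minimal sketch of time-delay embedding, one common reconstruction technique, is given below; the function name delay_embed and the values of the embedding dimension (dim) and delay are assumptions for illustration and are not taken from the cited studies.

```python
import numpy as np


def delay_embed(series, dim, delay):
    """Build phase-space vectors and one-step-ahead targets.

    Each row is [x(t), x(t - delay), ..., x(t - (dim - 1) * delay)],
    and the matching target is x(t + 1).
    """
    series = np.asarray(series, dtype=float)
    start = (dim - 1) * delay
    if len(series) <= start + 1:
        raise ValueError("series too short for this dim and delay")
    t_idx = np.arange(start, len(series) - 1)
    vectors = np.column_stack([series[t_idx - j * delay] for j in range(dim)])
    targets = series[t_idx + 1]
    return vectors, targets


# Toy series 0, 1, ..., 9 with assumed dim=3 and delay=2.
vectors, targets = delay_embed(np.arange(10.0), dim=3, delay=2)
# vectors[0] is [4., 2., 0.] and targets[0] is 5.0
```

The resulting pairs of vectors and targets can serve as training inputs and outputs for a regression model. In practice, the delay and the embedding dimension are estimated from the data rather than fixed in advance.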
As mentioned above, by correcting the errors of the initial forecast, hybrid models can perform better than any individual model [7,28,29]. Therefore, it is worthwhile to integrate the chaotic time series method into the hybrid forecasting model and investigate its performance. This paper aims to achieve better predictions of short-term water demand by presenting a hybrid forecasting model that couples the chaotic time series with LSSVM in the error correction module. Specifically, it will:
Present the framework, methods, and performance indicators of the hybrid forecasting model,
Test the hybrid model’s accuracy based on case studies of three real-world district metering areas (DMAs) in Beijing WDSs,
Verify the effectiveness of the model by comparing it with the results of other models, including ARIMA, LSSVM without error correction, and LSSVM using Fourier series for error correction.
4. Conclusions
Short-term water demand forecasting with horizons ranging from sub-hourly to daily plays an important role in the optimal operation of pump stations and the online hydraulic simulation of water distribution systems. To obtain more accurate predictions, this study proposes a hybrid framework with an error correction module that uses chaotic time series, and investigates the performance of the framework in short-term water demand forecasting one day ahead with a 15-min time step. The hybrid framework integrates two modules, namely, the initial forecasting module and the error correction module. The initial forecasting model is established with the least squares support vector machine (LSSVM). In the error correction module, the error forecasting model is established with LSSVM using the chaotic time series of error data from the initial forecasting.
The hybrid model is applied to water demand forecasting for three actual district metering areas (DMAs) in Beijing, China, and its results are compared with those of two other models: the forecasting model without error correction and the hybrid model using Fourier series for error correction. From the case study results, the following conclusions can be drawn:
In most instances, the hybrid models perform better than the forecasting model without error correction. The error correction module performs better in short-term water demand forecasting for DMAs whose customer composition is simple. A simple customer composition indicates a simple water consumption pattern and fewer peak fluctuations in the water consumption curves.
Owing to its capability of detecting the underlying instability characteristics of time series, the error correction module using chaotic time series performs better than the one using Fourier series in predicting a complex, disordered time series of errors.
For periods with frequent and disordered peak fluctuations in the error time series, the error correction module performs poorly, and the error forecasting model based on Fourier series may produce unreasonable forecasts by applying misleading corrections to the initial forecast. As a result, more attention should be paid to the features of the error time series when using the error correction module.
In the present study, the hybrid forecasting framework is tested on three actual DMAs in Beijing with different characteristics. Further work on other DMAs is needed to test and verify the robustness of the hybrid forecasting framework, and much more effort is needed to test the performance of chaotic methods in mining the characteristics of data with disordered peak fluctuations. This study only tested the proposed model for the 24 h forecast horizon; however, the hybrid forecasting framework is not limited to a forecast horizon of one day, and the model could potentially be applied with a much longer forecast horizon and time step, such as one week ahead with a time step of 6 h. In that case, the feature data for model training obtained from the historical data set should be adjusted accordingly.