Short-Term Water Demand Forecasting Model Combining Variational Mode Decomposition and Extreme Learning Machine

: Accurate water demand forecasting is essential to operate urban water supply facilities efﬁciently and ensure water demands for urban residents. This study proposes an extreme learning machine (ELM) coupled with variational mode decomposition (VMD) for short-term water demand forecasting in six cities (Anseong-si, Hwaseong-si, Pyeongtaek-si, Osan-si, Suwon-si, and Yongin-si), South Korea. The performance of VMD-ELM model is investigated based on performance indices and graphical analysis and compared with that of artiﬁcial neural network (ANN), ELM, and VMD-ANN models. VMD is employed for multi-scale time series decomposition and ANN and ELM models are used for sub-time series forecasting. As a result, ELM model outperforms ANN model. VMD-ANN and VMD-ELM models outperform ANN and ELM models, and the VMD-ELM model produces the best performance among all the models. The results obtained from this study reveal that the coupling of VMD and ELM can be an effective forecasting tool for short-term water demands with strong nonlinearity and non-stationarity and contribute to operating urban water supply facilities efﬁciently. resampling techniques, VMD, and machine learning models can be suggested as a future study for improving short-term water demand forecasting.


Introduction
Many urban areas around the world are confronted with stresses associated with water supply due to natural and social factors including economic growth, overpopulation, and climate change [1][2][3][4][5].
To solve the problems, accurate water demand forecasting and the expansion and efficient operation of water supply and distribution facilities are essential. Especially, reliable and accurate water demand forecasting is essential to develop reliable water supply expansion strategies at the lowest cost. However, water demand forecasting is still a challenging task due to the availability of data, various influencing factors, various forecasting periods, and the nonlinearity and non-stationarity of data [1,6].
Water demand forecasting can be classified as short-term, medium-term, and long-term forecasting in terms of forecast horizon although there is no general rule for the classification. According to Tiwari and Adamowski [7], hourly forecasting (up to 48-h lead time), daily forecasting (up to 14-day lead time), and weekly forecasting (up to 26-week lead time) are classified as short-term forecasting. Medium-term forecasting includes monthly forecasting with up to 24-month lead time, whereas annual and decadal forecastings are considered as long-term forecasting. Decision problems in tactical planning level including revenue forecast and investment planning are treated based on medium-term forecasting, whereas long-term forecasting is used for decision problems such as capacity expansion in strategic planning level. This study is focused on short-term forecasting based on daily water demand time series. Short-term forecasting can be utilized for dealing with decision problems in this study for decomposing an original water demand time series into sub-time series. VMD has been successfully applied for the analysis of hydrological variables and exhibited better performance compared with conventional methods including DWT and ensemble empirical mode decomposition (EEMD) [17,18]. DWT has drawbacks that mother wavelets, decomposition level, and edge effect should be considered for time series decomposition, and empirical mode decomposition (EMD) has disadvantages including mode mixing, stopping criteria, and end-point effect. EEMD was developed for mitigating the problems of EMD (especially the mode mixing) but the problems were not completely resolved. On the other hand, VMD decomposes an original signal into multiple modes and then updates them based on Wiener filtering. By utilizing Wiener filtering, VMD can be more robust to sampling and noise and yield narrow-banded modes (see Dragomiretskiy and Zosso [19] and Theorem 1 of Polyak and Pearlman [20]). This helps to alleviate the effect of mode mixing and extract the time-frequency features accurately. In addition, VMD can be implemented faster and more efficiently compared with EMD and EEMD since VMD is a non-recursive algorithm based on augmented Lagrangian method and alternate direction method of multipliers (ADMM) (see Algorithm 1 of Dragomiretskiy and Zosso [19]).
Meanwhile, ELM is selected as a forecasting model for the decomposed water demands due to its ease of modeling and excellent performance [18,21,22]. Conventional gradient descent-based ANN models require many iterative learning steps for obtaining the optimal learning performance. This increases the computational time and the possibility of being trapped in the local minima. Recurrent neural network (RNN) is known as a powerful model for time series modeling. However, in practice, RNN has the disadvantage that it is difficult to be trained properly due to the vanishing gradient and exploding gradient problems [23]. Deep learning models require many training datasets and very long computational time since they have many weights and biases that should be trained due to their structural complexity. Further, optimization algorithms need to be used for determining the optimal learning parameters since deep learning models have many learning parameters that should be selected in advance. Moreover, deep learning models require many computational resources and long computational time, even using graphics processing unit (GPU)-based deep learning models. In practice, one may get just as good performance with non-deep learning models using far less computational time and resources although this is dependent on the given problems and datasets [24]. On the other hand, for the ELM model, the input weights and biases are assigned randomly, and the smallest norm least-squares solution for the output weights is calculated analytically using Moore-Penrose generalized inverse (see Definitions 2.1-2.2 and Theorem 2.1 of Huang et al. [25]). This can increase the learning speed extremely and provide better generalization compared with conventional ANN models. Thus, in this study, the performance of VMD-ELM model for short-term water demand forecasting in six cities (Anseong-si, Hwaseong-si, Pyeongtaek-si, Osan-si, Suwon-si, and Yongin-si), South Korea is investigated and compared with that of ANN, ELM, and VMD-ANN models, based on performance indices and graphical analysis. Figure 1 shows the locations of water purification plant and water supply areas. The Suji water treatment plant, which is located in the north of study area, supplies water to six cities, Anseong-si, Hwaseong-si, Pyeongtaek-si, Osan-si, Suwon-si, and Yongin-si, which are located in the southern region of Gyeonggi-do, South Korea. The plant has the capacity of 916,000 m 3 /day and has been operated by the K-water (http://www.kwater.or.kr, accessed on 12 August 2018). Table 1 shows the area, population, and water demand of study area. The total area is 2460.51 km 2 with a total population of 3,840,907. Especially, Suwon-si is a city with the largest population among local governments, South Korea and has the largest population density in the study area. Yongin-si is the fastest growing city in South Korea, which has the second largest population in the study area. Suwon-si and Yongin-si have the largest water demand among the cities in the study area. Domestic water demand is higher in Osan-si, Suwon-si, and Yongin-si, whereas agricultural water demand is higher in Anseong-si, Hwaseong-si, and Pyeongtaek-si.

Data Used
In this study, daily water demand data between 2008 and 2017 provided by K-water were used for developing water demand forecasting models. For training the models efficiently [26], the water demand data were scaled into the range of [0, 1]. There is no clear criterion to divide the entire data into training and testing data. However, the length of the training data is typically set at 70-90% of the total data length [27,28]. Thus, in this study, the scaled data were then partitioned into training (2008-2014, data length = 2557 (70%)) and testing datasets (2015-2017, data length = 1096 (30%)). water demand is higher in Osan-si, Suwon-si, and Yongin-si, whereas agricultural water demand is higher in Anseong-si, Hwaseong-si, and Pyeongtaek-si. In this study, daily water demand data between 2008 and 2017 provided by K-water were used for developing water demand forecasting models. For training the models efficiently [26], the water demand data were scaled into the range of [0,1]. There is no clear criterion to divide the entire data into training and testing data. However, the length of the training data is typically set at 70-90% of the total data length [27,28]. Thus, in this study, the scaled data were then partitioned into training (2008-2014, data length = 2557 (70%)) and testing datasets (2015-2017, data length = 1096 (30%)).

Variational Mode Decomposition (VMD)
VMD, which is developed by Dragomiretskiy and Zosso [19], is a non-recursive and adaptive time-frequency analysis method. The VMD decomposes an original time series f into K sub-time series called intrinsic mode functions (IMFs). The IMFs can be obtained by updating Equations (1)-(3) until convergence [19].

Variational Mode Decomposition (VMD)
VMD, which is developed by Dragomiretskiy and Zosso [19], is a non-recursive and adaptive time-frequency analysis method. The VMD decomposes an original time series f into K sub-time series called intrinsic mode functions (IMFs). The IMFs can be obtained by updating Equations (1)-(3) until convergence [19].û n+1 k Hydrology 2018, 5, 54 whereû n k = the kth mode in nth iteration,f = the Fourier transform of f, ω n k = the kth center frequency in nth iteration,λ n = the Lagrange multiplier in nth iteration, α = the quadratic penalty factor, τ = the time step of dual ascent. For the theoretical details of VMD, one can refer to Dragomiretskiy and Zosso [19].

Artificial Neural Network (ANN)
ANN is an artificial intelligence (AI) model for solving various problems including classification and regression [29]. Multilayer perceptron (MLP) is a kind of feedforward ANN organized into layers with multiple neurons [30], which has been widely applied in hydrological fields. Figure 2 represents the general structure of three-layer MLP. The neurons of the input layer receive the input vectors (lagged water demands), and output values (forecasted water demands) are yielded from the neuron of the output layer. The hidden layer, which is the middle layer of MLP, connects the neurons of the input and output layers. The MLP can be represented as Equation (4) [31]: where x j andŷ j = the input and output vectors, w i and β i = the weights for the hidden and output layers, b i and β 0 = the biases for the hidden and output layers, L = the number of the neurons of hidden layer, N = the length of time series data, and h = the transfer function. The determination of the parameters (weights and biases) of MLP, which is called model learning, is conducted through learning algorithms such as backpropagation (BP). The detailed theoretical background of ANN can be found in Alpaydin [32].
Hydrology 2018, 5, x FOR PEER REVIEW 5 of 19 where ˆn k u = the kth mode in nth iteration, f = the Fourier transform of f, n k  = the kth center frequency in nth iteration, ˆn  = the Lagrange multiplier in nth iteration,  = the quadratic penalty factor,  = the time step of dual ascent. For the theoretical details of VMD, one can refer to Dragomiretskiy and Zosso [19].

Artificial Neural Network (ANN)
ANN is an artificial intelligence (AI) model for solving various problems including classification and regression [29]. Multilayer perceptron (MLP) is a kind of feedforward ANN organized into layers with multiple neurons [30], which has been widely applied in hydrological fields. Figure 2 represents the general structure of three-layer MLP. The neurons of the input layer receive the input vectors (lagged water demands), and output values (forecasted water demands) are yielded from the neuron of the output layer. The hidden layer, which is the middle layer of MLP, connects the neurons of the input and output layers. The MLP can be represented as Equation (4) [31]: where j x and ˆj y = the input and output vectors, i w and i β = the weights for the hidden and

Extreme Learning Machine (ELM)
ELM is a least square-based single-hidden layer feed-forward neural networks (SLFNs) [33]. In terms of training speed and generalization capability, ELM is known to be superior to conventional ANN [34]. The SLFN with the weights and biases generated randomly can be expressed as Equation (5) and written compactly as Equation (6).
where H = h(w i · x j + b i ) = the matrix comprised of the outputs from hidden neurons, h = the transfer function, w i = [w i1 , w i2 , · · · , w in ] T = the ith weight vector between input and hidden neurons, . , x jn ] T = the jth input vectors, b i = the ith bias for hidden neurons, L and N = the number of hidden neurons and the data length, respectively, comprised of weights between hidden and output neurons, β i = [β i1 , β i2 , · · · , β im ] T = the ith weight vectors between hidden and output neurons, and Unlike conventional ANN, the β is analytically determined by the method of least squares as Equation (7).β = H † T where H † = PH T is the Moore-Penrose generalized inverse of H, and P = H T H −1 is the inverse of the covariance matrix of H. Figure 3 represents the general structure of ELM model used in this study. For detailed theoretical background on the ELM, one can refer to Huang et al. [33].

Extreme Learning Machine (ELM)
ELM is a least square-based single-hidden layer feed-forward neural networks (SLFNs) [33]. In terms of training speed and generalization capability, ELM is known to be superior to conventional ANN [34]. The SLFN with the weights and biases generated randomly can be expressed as Equation (5) and written compactly as Equation (6).
vectors between hidden and output neurons, and Unlike conventional ANN, the β is analytically determined by the method of least squares as Equation (7). †  β HT (7) where † T  H PH is the Moore-Penrose generalized inverse of H, and Figure 3 represents the general structure of ELM model used in this study. For detailed theoretical background on the ELM, one can refer to Huang et al. [33].

VMD-Based Water Demand Forecasting
VMD-based machine learning models (VMD-ANN and VMD-ELM) for water demand forecasting are to couple VMD and single machine learning models (ANN and ELM), respectively.

VMD-Based Water Demand Forecasting
VMD-based machine learning models (VMD-ANN and VMD-ELM) for water demand forecasting are to couple VMD and single machine learning models (ANN and ELM), respectively. VMD was adopted as a time series decomposition method for decomposing water demand time series into IMFs, and ANN and ELM models were used for model learning and forecasting for each IMF. Figure 4 shows the flowchart of VMD-based water demand forecasting consisting of the following four steps: Step 1. Decomposition: Water demand time series is decomposed into IMFs using VMD.
Step 2. Model learning: ANN and ELM models are learned for each IMF.
Step 3. IMF forecasts: ANN and ELM models produce forecasted values for each IMF.
Step 4. Final forecasts: Summing the forecasted IMFs produces the final water demand forecasts. VMD was adopted as a time series decomposition method for decomposing water demand time series into IMFs, and ANN and ELM models were used for model learning and forecasting for each IMF. Figure 4 shows the flowchart of VMD-based water demand forecasting consisting of the following four steps: Step 1. Decomposition: Water demand time series is decomposed into IMFs using VMD.
Step 2. Model learning: ANN and ELM models are learned for each IMF.
Step 3. IMF forecasts: ANN and ELM models produce forecasted values for each IMF.
Step 4. Final forecasts: Summing the forecasted IMFs produces the final water demand forecasts.

Performance Evaluation Indices
In this study, multiple performance indices were used for evaluating the model performances. Mathematical expressions and ranges for the performance indices are presented in Table 2. Since there is no universal performance index, it is desirable to choose the indices that are appropriate for a particular application. Furthermore, it is common to use multiple performance indices since there are pros and cons in each index [35,36]. The performance indices used in this study are categorized as absolute, relative, and dimensionless errors. The absolute errors include the mean absolute error (MAE), the root mean squared error (RMSE), and the fourth root mean quadrupled error (R4MS4E). RMSE and R4MS4E are sensitive to forecasting errors for peak and high data values, and thus can be good performance indices for high data values. In contrast, MAE is evaluated based on all deviations without being weighted to lower and higher data values. MAE, RMSE, and R4MS4E have the advantage that they can represent the size of a typical error effectively since they are evaluated in the same unit as the original data. For a perfect model, MAE, RMSE, and R4MS4E would be zero [36]. The mean absolute relative error (MARE) and median absolute percentage error (MdAPE) belong to relative errors. MARE is a good index for low data values since it is more affected to forecasting errors for low data values. MdAPE is similar to MARE but it has the advantage that it is less sensitive to skewed error distributions and outliers. For a perfect model, MARE and MdAPE would be zero [36].

Original time series
Variational Mode Decomposition (VMD)

Performance Evaluation Indices
In this study, multiple performance indices were used for evaluating the model performances. Mathematical expressions and ranges for the performance indices are presented in Table 2. Since there is no universal performance index, it is desirable to choose the indices that are appropriate for a particular application. Furthermore, it is common to use multiple performance indices since there are pros and cons in each index [35,36]. The performance indices used in this study are categorized as absolute, relative, and dimensionless errors. The absolute errors include the mean absolute error (MAE), the root mean squared error (RMSE), and the fourth root mean quadrupled error (R4MS4E). RMSE and R4MS4E are sensitive to forecasting errors for peak and high data values, and thus can be good performance indices for high data values. In contrast, MAE is evaluated based on all deviations without being weighted to lower and higher data values. MAE, RMSE, and R4MS4E have the advantage that they can represent the size of a typical error effectively since they are evaluated in the same unit as the original data. For a perfect model, MAE, RMSE, and R4MS4E would be zero [36]. The mean absolute relative error (MARE) and median absolute percentage error (MdAPE) belong to relative errors. MARE is a good index for low data values since it is more affected to forecasting errors for low data values. MdAPE is similar to MARE but it has the advantage that it is less sensitive to skewed error distributions and outliers. For a perfect model, MARE and MdAPE would be zero [36]. The dimensionless errors include the modified versions of the coefficient of efficiency (CE) and the index of agreement (IOA). The CE and IOA have the drawback that they can overestimate the model performance for peak data values. The modified CE (MCE) and modified IOA (MIOA) for j = 1 can reduce the overestimation for peak data values and represent better overall evaluation, whereas larger values of j can be used for the performance evaluation of high data values [35]. For a perfect model, MCE and MIOA would be one [35,36]. Table 2. Mathematical expressions and ranges for performance indices.

Performance Indices Equations Ranges
Absolute errors Dimensionless errors Q i : forecasted water demand, Q i : observed water demand; Q: mean of observed water demand; and N: data length.

Development of Single and VMD-Based Forecasting Models
To develop VMD-based water demand models, water demand time series should first be decomposed using the VMD. The decomposition results depend on the quadratic penalty factor (α) and the number of IMFs (K). In this study, the values of α and K were determined based on the following trial-and-error method: Step 1. Step 4. The sets of K and α values corresponding to r ≈ 1 are selected.
Step 5. The optimal values of K and α are selected based on the performances of VMD-based forecasting models. Figure 5 represents the correlation coefficients between the water demand time series and the reconstructed time series for different K and α values. Based on Figure 5, the sets of K and α values corresponding to r ≈ 1 were selected and then K = 4 and α = 5 producing the best performance of VMD-based forecasting models were selected as the optimal values. Using the K and α values, the original water demand time series was decomposed into four sub-time series (IMF 1-IMF 4) as seen in Figure 6. VMD-based forecasting models were selected as the optimal values. Using the K and α values, the original water demand time series was decomposed into four sub-time series (IMF 1-IMF 4) as seen in Figure 6. VMD-based forecasting models were selected as the optimal values. Using the K and α values, the original water demand time series was decomposed into four sub-time series (IMF 1-IMF 4) as seen in Figure 6. For developing forecasting models using ANN and ELM, the potential influencing input variables should be selected in advance. Various lag times for daily water demand time series were considered in this study. The optimal lag time was determined by autocorrelation function (ACF), partial autocorrelation function (PACF), and average mutual information (AMI) [37]. The structure of input and target variables for the ANN and ELM models is as follows: , and f = the ANN and ELM models. In the same manner, the structure of input and target variables for VMD-ANN and VMD-ELM models is determined as follows: Figure 7 shows correlation coefficients between IMFs obtained from the entire water demand time series and ones from the partial water demand time series with different lengths. From Figure  7, it was seen that time series decomposition by VMD was dependent on the length of water demand time series. When the length of the partial time series was more than 500, the decomposition result was almost the same as the result for the entire time series. Thus, for real-time forecasting, VMD should be executed whenever new observations are acquired, and at least 500 previous time series data should be used for time series decomposition. For developing forecasting models using ANN and ELM, the potential influencing input variables should be selected in advance. Various lag times for daily water demand time series were considered in this study. The optimal lag time was determined by autocorrelation function (ACF), partial autocorrelation function (PACF), and average mutual information (AMI) [37]. The structure of input and target variables for the ANN and ELM models is as follows: where Q t−j = the input water demand time series lagged by j days (j = 0, 1, ......, 5), Q t+i = the target water demand time series for lead time i day (i = 1, 2, ......, 7), and f = the ANN and ELM models. In the same manner, the structure of input and target variables for VMD-ANN and VMD-ELM models is determined as follows: Figure 7 shows correlation coefficients between IMFs obtained from the entire water demand time series and ones from the partial water demand time series with different lengths. From Figure 7, it was seen that time series decomposition by VMD was dependent on the length of water demand time series. When the length of the partial time series was more than 500, the decomposition result was almost the same as the result for the entire time series. Thus, for real-time forecasting, VMD should be executed whenever new observations are acquired, and at least 500 previous time series data should be used for time series decomposition.
In ANN and ELM modeling, selecting the optimal number of hidden neurons is a critical step since it affects the model performance. A trial-and-error approach [37] was utilized for determining the optimal number in this study. In most hydrological fields, the logistic sigmoid function has been widely applied to ANN modeling in order to compute the output of each neuron [26]. The function is also applied to ANN and ELM modeling in this study. Furthermore, the ANN model was learned by BP algorithm and the default values of learning parameters (learning and momentum rates) presented by Zell et al. [38] were used. When considering both processing speed and accuracy, it is more efficient to use the appropriate small values of learning parameters than to determine them using a trial-and-error approach [39]. More information on this can be found in Rumelhart et al. [40], Pao [41], and Seo et al. [18]. time series and ones from the partial water demand time series with different lengths. From Figure  7, it was seen that time series decomposition by VMD was dependent on the length of water demand time series. When the length of the partial time series was more than 500, the decomposition result was almost the same as the result for the entire time series. Thus, for real-time forecasting, VMD should be executed whenever new observations are acquired, and at least 500 previous time series data should be used for time series decomposition.

Performance Evaluation
In this study, absolute error indices (MAE, RMSE, and R4MS4E), relative error indices (MARE and MdAPE), and dimensionless error indices (MCE and MIOA) were employed in order to evaluate and compare model performances for testing data. MCE and MIOA with j = 1 (MCE 1 and MIOA 1 ) were used to evaluate overall performances, and MCE and MIOA with j = 2 and 3 (MCE 2 , MCE 3 , MIOA 2 , and MIOA 3 ) were employed to evaluate performances for high water demands. The results of performance evaluation for testing data are summarized in Table 3.  3 . These indicated that ELM model represented better performance for higher water demands compared with ANN model, and ANN model produced worse performance than a "no knowledge" model. Consequently, it was found from the results of single forecasting models that ELM model performed better over the entire range of testing data compared with ANN model.
For VMD-based forecasting models (VMD-ANN and VMD-ELM), VMD-ELM model yielded better performance indices than VMD-ANN model. For example, the MAE values of VMD-ELM model were lower than those of VMD-ANN model, and the MCE 1 and MIOA 1 values of VMD-ELM model were higher than those of VMD-ANN model. These represented that the overall performance of VMD-ELM model was better compared with VMD-ANN model. Furthermore, the values of MARE and MdAPE for VMD-ELM model were smaller than those for VMD-ANN model, and VMD-ELM produced lower values of RMSE and R4MS4E and higher values of MCE 2 , MIOA 2 , MCE 3 , and MIOA 3 compared with VMD-ANN models. These results represented that VMD-ELM model performed better than VMD-ANN model for low and high water demands. Consequently, from the results of VMD-based forecasting models, it was found that VMD-ELM model outperformed VMD-ANN model over the entire range of testing data.
These results are due to the theoretical characteristics of ANN and ELM models. For the ANN model, gradient descent-based learning algorithms such as BP algorithm adjust all the parameters iteratively and require many iterative steps for obtaining satisfactory performance. Thus, the learning process is very slow, and it is difficult for the ANN model to reach the minimum training error due to infinite training iteration and local minima. In addition, the ANN model may be over-trained by gradient descent-based learning and produce worse generalization performance. Gradient descent-based learning algorithms only ty to reach the smallest training error, and do not consider the magnitude of the weights. These characteristics make it difficult for the ANN model to achieve the best generalization performance [25]. On the other hand, the ELM model is considered as a linear system after the input weights and the biases of the hidden layer are assigned randomly. Namely, the input weights and the biases of the hidden layer do not need to be tuned and any iterative learning process is also not required. The output weights of the ELM model can be determined analytically by finding the smallest norm least-squares solution for the linear system using the Moore-Penrose generalized inverse. Thus, the ELM model can produce the minimum training error, the smallest norm of weights, and the best generalization performance [25].
From the comparison of single and VMD-based forecasting models, VMD-ANN and VMD-ELM models represented better performance compared with ANN and ELM models. For instance, VMD-ANN and VMD-ELM models yielded lower MAE and higher MCE 1 and MIOA 1 values than ANN and ELM models, which represented better overall performance for VMD-ANN and VMD-ELM models. Furthermore, VMD-ANN and VMD-ELM models yielded lower MARE and MdAPE values than ANN and ELM models. For VMD-ANN and VMD-ELM models, the values of RMSE and R4MS4E were lower and the values of MCE 2 , MIOA 2 , MCE 3 , and MIOA 3 were higher compared with ANN and ELM models. These results represented that VMD-ANN and VMD-ELM models produced better performance than ANN and ELM models for low and high water demands. Therefore, from the comparison of single and VMD-based forecasting models, it was found that VMD-ANN and VMD-ELM models outperformed ANN and ELM models over the entire range of testing data, VMD-ELM model produced the best performance, and the performances of ANN and ELM models were able to be improved by VMD.
In general, water demand time series is a combination of different time-frequency features and has strong nonlinearity and non-stationarity. In this study, VMD decomposes an original water demand time series into multiple modes (low-and high-frequency modes), and then the modes are updated by Wiener filtering [42]. Since the modes represent simpler patterns than the original time series, the ANN and ELM models can be learned more effectively using the modes as training datasets than using the original time series. Especially, since low-frequency mode (IMF 1), which represents the trend component of water demand time series, can be modeled very accurately by the single forecasting models, the performance of water demand forecasting can be improved significantly by VMD. Thus, for water demand time series with strong nonlinearity and non-stationarity, VMD-based forecasting models can produce better performance than single forecasting models. Figure 8 shows scatter diagrams for forecasted and observed water demands. It was observed from Figure 8 that the scatter points of VMD-ANN and VMD-ELM models were closer to 1:1 slope line than those of ANN and ELM models. Based on scatter diagrams, VMD-ELM model produced the most accurate result among all other models. Although VMD-ANN and VMD-ELM models represented similar dispersion around 1:1 slope line, VMD-ANN model yielded underestimated values for high water demands. ELM model represented better accuracy than ANN model. Especially, the ANN model produced greatly underestimated result for high water demands. Although VMD was able to improve the underestimation of ANN model for high water demands, the underestimation was not resolved completely even in VMD-ANN model. Consequently, VMD-ELM model represented better accuracy compared with all other models, in terms of under-and overestimation and dispersion around 1:1 slope line. It was also found from these results that the accuracy of ANN and ELM models was able to be improved significantly by VMD. series, the ANN and ELM models can be learned more effectively using the modes as training datasets than using the original time series. Especially, since low-frequency mode (IMF 1), which represents the trend component of water demand time series, can be modeled very accurately by the single forecasting models, the performance of water demand forecasting can be improved significantly by VMD. Thus, for water demand time series with strong nonlinearity and nonstationarity, VMD-based forecasting models can produce better performance than single forecasting models. Figure 8 shows scatter diagrams for forecasted and observed water demands. It was observed from Figure 8 that the scatter points of VMD-ANN and VMD-ELM models were closer to 1:1 slope line than those of ANN and ELM models. Based on scatter diagrams, VMD-ELM model produced the most accurate result among all other models. Although VMD-ANN and VMD-ELM models represented similar dispersion around 1:1 slope line, VMD-ANN model yielded underestimated values for high water demands. ELM model represented better accuracy than ANN model. Especially, the ANN model produced greatly underestimated result for high water demands. Although VMD was able to improve the underestimation of ANN model for high water demands, the underestimation was not resolved completely even in VMD-ANN model. Consequently, VMD-ELM model represented better accuracy compared with all other models, in terms of under-and overestimation and dispersion around 1:1 slope line. It was also found from these results that the accuracy of ANN and ELM models was able to be improved significantly by VMD.   Figure 9 shows Taylor diagram for single and VMD-based forecasting models. The diagram provides a graphical summary for the agreement of observed and forecasted patterns, based on correlation coefficient (CC), standard deviation (SD), and centered root-mean-square difference (CRMSD) [43]. In Figure 9, the gray contour lines and black arc represent the values of CRMSD and SD for observed pattern, respectively. The closer a model is to the black hollow circle marker, the closer the forecasted pattern is to the observed pattern. From Figure 9, it was observed that VMD-ANN and VMD-ELM models yielded higher CC and lower CRMSD than ANN and ELM models, and ELM and VMD-ELM models produced SD values closer to the observed pattern compared with ANN and VMD-ANN models. Especially, it was shown that VMD-ELM models were located the closest to the black hollow circle marker among all other models. Consequently, based on Taylor diagram, it can be said that VMD-ANN and VMD-ELM models outperformed ANN and ELM models, and VMD was able to improve the accuracy of ANN and ELM models.  Figure 9 shows Taylor diagram for single and VMD-based forecasting models. The diagram provides a graphical summary for the agreement of observed and forecasted patterns, based on correlation coefficient (CC), standard deviation (SD), and centered root-mean-square difference (CRMSD) [43]. In Figure 9, the gray contour lines and black arc represent the values of CRMSD and SD for observed pattern, respectively. The closer a model is to the black hollow circle marker, the closer the forecasted pattern is to the observed pattern. From Figure 9, it was observed that VMD-ANN and VMD-ELM models yielded higher CC and lower CRMSD than ANN and ELM models, and ELM and VMD-ELM models produced SD values closer to the observed pattern compared with ANN and VMD-ANN models. Especially, it was shown that VMD-ELM models were located the closest to the black hollow circle marker among all other models. Consequently, based on Taylor diagram, it can be said that VMD-ANN and VMD-ELM models outperformed ANN and ELM models, and VMD was able to improve the accuracy of ANN and ELM models.  Figures 10 and 11 show residual boxplots and time series plots for single and VMD-based forecasting models, respectively. It was observed from Figure 10 that the ranges (maximumminimum) and interquartile ranges (third quartile-first quartile) of residuals for VMD-ANN and VMD-ELM models were smaller than those for the ANN and ELM models, and the median values of residuals for the ANN and ELM models were biased in the negative direction from zero compared with VMD-ANN and VMD-ELM models, respectively. Furthermore, it was observed from Figure 11 that VMD-ANN and VMD-ELM models forecasted the observed pattern more accurately than ANN and ELM models. Especially, compared with VMD-ANN and VMD-ELM models, the ANN and ELM models showed greater over-and underestimation for low and high water demands, respectively. These results indicated that VMD was able to reduce the residuals of ANN and ELM models significantly, improve under-and overestimation represented in ANN and ELM models, and thus enhance the accuracy of ANN and ELM models.  Figures 10 and 11 show residual boxplots and time series plots for single and VMD-based forecasting models, respectively. It was observed from Figure 10 that the ranges (maximum-minimum) and interquartile ranges (third quartile-first quartile) of residuals for VMD-ANN and VMD-ELM models were smaller than those for the ANN and ELM models, and the median values of residuals for the ANN and ELM models were biased in the negative direction from zero compared with VMD-ANN and VMD-ELM models, respectively. Furthermore, it was observed from Figure 11 that VMD-ANN and VMD-ELM models forecasted the observed pattern more accurately than ANN and ELM models. Especially, compared with VMD-ANN and VMD-ELM models, the ANN and ELM models showed greater over-and underestimation for low and high water demands, respectively. These results indicated that VMD was able to reduce the residuals of ANN and ELM models significantly, improve under-and overestimation represented in ANN and ELM models, and thus enhance the accuracy of ANN and ELM models.      From the results obtained from this study, VMD-based forecasting models produced better performances than single forecasting models in daily water demand forecasting. Similar results have also been reported from previous studies dealing with DWT-based hybrid modeling for water demand [7][8][9] and VMD-based hybrid modeling for river stage [17] and rainfall-runoff [18]. VMD decomposes water demand time series, which is a mixture of complex and irregular components, into multiple sub-time series (IMFs). Machine learning-based forecasting models are then constructed for each sub-time series. Since the sub-time series represent simpler temporal patterns than the original time series, the machine learning models can be trained more effectively, and thus water demand time series with strong nonlinearity and non-stationarity can be forecasted more accurately. Furthermore, by applying Wiener filtering, VMD can be more robust to noise affecting the decomposition accuracy of water demand time series negatively and yield narrow-based modes. Due to these characteristics, the VMD not only alleviates the mode mixing but also provides accurate timefrequency features. In addition, since VMD is a non-recursive signal decomposition algorithm, error propagation which inevitably occurs in iterative algorithms does not occur. These characteristics enable the VMD to produce more accurate time-frequency features compared with other signal decomposition methods such as EMD [42,44].
Meanwhile, VMD-based water demand forecasting models can be a more reliable forecasting tool by combining with resampling techniques including simple bootstrap, block bootstrap, Gaussian process regression bootstrap, Bayesian bootstrap, and maximum entropy bootstrap. Hybrid modeling similar to this concept was carried out successfully by Tiwari and Adamowski [7]. They developed a hybrid model (WBNN) coupling simple bootstrap, DWT, and ANN in order to forecast short-term urban water demand and concluded that WBNN model was able to decrease the forecasting uncertainty and thus provide more accurate and reliable water demand forecasts. Therefore, the combined model of resampling techniques, VMD, and machine learning models can be suggested as a future study for improving short-term water demand forecasting. From the results obtained from this study, VMD-based forecasting models produced better performances than single forecasting models in daily water demand forecasting. Similar results have also been reported from previous studies dealing with DWT-based hybrid modeling for water demand [7][8][9] and VMD-based hybrid modeling for river stage [17] and rainfall-runoff [18]. VMD decomposes water demand time series, which is a mixture of complex and irregular components, into multiple sub-time series (IMFs). Machine learning-based forecasting models are then constructed for each sub-time series. Since the sub-time series represent simpler temporal patterns than the original time series, the machine learning models can be trained more effectively, and thus water demand time series with strong nonlinearity and non-stationarity can be forecasted more accurately. Furthermore, by applying Wiener filtering, VMD can be more robust to noise affecting the decomposition accuracy of water demand time series negatively and yield narrow-based modes. Due to these characteristics, the VMD not only alleviates the mode mixing but also provides accurate time-frequency features. In addition, since VMD is a non-recursive signal decomposition algorithm, error propagation which inevitably occurs in iterative algorithms does not occur. These characteristics enable the VMD to produce more accurate time-frequency features compared with other signal decomposition methods such as EMD [42,44].
Meanwhile, VMD-based water demand forecasting models can be a more reliable forecasting tool by combining with resampling techniques including simple bootstrap, block bootstrap, Gaussian process regression bootstrap, Bayesian bootstrap, and maximum entropy bootstrap. Hybrid modeling similar to this concept was carried out successfully by Tiwari and Adamowski [7]. They developed a hybrid model (WBNN) coupling simple bootstrap, DWT, and ANN in order to forecast short-term urban water demand and concluded that WBNN model was able to decrease the forecasting uncertainty and thus provide more accurate and reliable water demand forecasts. Therefore, the combined model of resampling techniques, VMD, and machine learning models can be suggested as a future study for improving short-term water demand forecasting.

Conclusions
This study investigates the performances of VMD-based water demand forecasting models. VMD and two machine learning models, the ANN and ELM, were employed for the decomposition of water demand time series and the forecasting of sub-time series, respectively. The performances of single forecasting models (ANN and ELM) and VMD-based forecasting models (VMD-ANN and VMD-ELM) are evaluated based on performance indices and graphical analysis. For single forecasting models, the ELM model outperforms the ANN model. VMD-ANN and VMD-ELM models perform better compared with ANN and ELM models, and the VMD-ELM model achieves the best performance among all the models. From the results of this study, it is revealed that VMD can reduce forecasting error and improve model performance in machine learning-based water demand forecasting. Therefore, VMD-based machine learning models can be effective forecasting tools for short-term water demand with strong nonlinearity and non-stationarity and contribute to operate urban water supply facilities efficiently. This study is limited to short-term water demand forecasting for a specific area. Further studies need to be performed with regard to water demand forecasting for other areas which have demographic, economic, and weather conditions similar to the area where forecasting models are developed.