A Novel Dual-Scale Deep Belief Network Method for Daily Urban Water Demand Forecasting

Water demand forecasting provides data support for the scheduling and decision-making of urban water supply systems. In this study, a new dual-scale deep belief network (DSDBN) approach for daily urban water demand forecasting is proposed. The original daily water demand time series is decomposed into several intrinsic mode functions (IMFs) and one residue component with the ensemble empirical mode decomposition (EEMD) technique. Stochastic and deterministic terms are reconstructed by analyzing the frequency characteristics of the IMFs and the residue using the generalized Fourier transform. A deep belief network (DBN) model is then used to predict each of the two feature terms, and the outputs of the two DBNs are summed to give the final forecasting result. Historical daily water demand datasets from an urban waterworks in Zhuzhou, China, were investigated with the proposed DSDBN model. The mean absolute percentage error (MAPE), normalized root-mean-square error (NRMSE), correlation coefficient (CC) and determination coefficient (DC) were used as evaluation criteria. The results were compared with those of the autoregressive integrated moving average (ARIMA) model, the feed forward neural network (FFNN) model and the support vector regression (SVR) model, each combined with EEMD, and with those of a single DBN model. The results obtained in the test period indicate that the proposed model has the smallest MAPE and NRMSE values of 1.291099 and 0.016625, respectively, and the largest CC and DC values of 0.976528 and 0.953512, respectively. Therefore, the proposed DSDBN method is a useful tool for daily urban water demand forecasting and outperforms other models in common use.


Introduction
Short-term water demand prediction is the basis for optimal operation scheduling and decision-making in urban water supply systems. It can guide the optimal scheduling of pumping stations, reduce the energy consumption of water production and decrease the economic costs of water supply. Because domestic water cannot be stored for extended periods, accurate short-term water demand prediction makes it possible to balance water production and supply and to guarantee the quality of the water supply.
In the past few decades, urban water demand forecasting has attracted considerable research attention. House-Peters and Chang [1] and Donkor et al. [2] reviewed the various available methods and models of water demand forecasting. In general, the two main approaches to water demand forecasting are knowledge-driven modeling and data-driven modeling. The former can contain a detailed description of the factors that affect urban water demand, such as population, price, income, temperature, rain and other factors. Jain and Ormsbee [3] used such methods for forecasting daily water demand. Gato et al. [4] proposed a novel daily urban water demand model that merges temperature and rainfall thresholds. Di et al. [5] integrated weather forecasting information into the water demand model to improve short-term urban water demand prediction. The latter approach uses only historical time series data and is widely employed in water demand forecasting; examples include support vector regression (SVR) [6], artificial neural network (ANN) [7], autoregressive integrated moving average (ARIMA) [8], and random forest regression [9] models. In addition, Tiwari and Adamowski [10] applied a mixed wavelet-bootstrap neural network method to short-term urban water demand prediction. Bai et al. [11] proposed a multi-scale relevance vector regression (RVR) method that combines the stationary wavelet transform with RVR to predict daily urban water demand. According to the literature [2], ANNs and integrated models are more suitable than other approaches for short-term water demand prediction.
Energies 2018, 11, 1068
The deep belief network (DBN), developed by Hinton et al. [12], is a probabilistic generative model that uses a greedy layer-wise unsupervised learning algorithm. A DBN possesses numerous hidden layers that extract latent features by layer-wise learning, thereby realizing powerful nonlinear expressive capacity. DBNs can handle the problems of overfitting, local minima, and poor global search capability better than ANNs [13]. Successful applications have been achieved in the fields of acoustic modeling [14], natural language understanding [15], image classification [16,17], fault diagnosis [18,19], exchange rate forecasting [20,21], reservoir inflow forecasting [22], electricity load forecasting [23], building energy consumption prediction [24], and time series forecasting [13,25]. However, the literature on the use of DBNs for daily urban water demand forecasting is limited.
As a new data preprocessing method, the empirical mode decomposition (EMD) technique developed by Huang et al. [26] is a self-adaptive decomposition approach that does not require a priori knowledge. EMD is based on the assumption that data may consist of different coexisting modes of oscillation, which can be expressed as intrinsic mode function (IMF) components [26,27]. However, a mode mixing problem occurs when EMD is applied, which seriously affects the accuracy of short-term forecasting [28]. Wu and Huang [29] proposed ensemble EMD (EEMD) to eliminate this disadvantage of the EMD technique by adding finite white noise to the original signal. In recent years, EEMD has proven more advantageous than EMD in signal decomposition and has been successfully applied in various fields. Liu et al. [28] used a sub-section particle swarm optimization model based on EEMD for short-term load prediction and found that the method improves prediction accuracy. Wang et al. [30] implemented an ANN model based on EEMD to forecast medium and long-term runoff; the results indicate that EEMD can markedly improve forecasting accuracy. Wang et al. [31] applied a hybrid of EEMD and a genetic algorithm-back propagation (GA-BP) neural network to wind speed prediction. Compared with the traditional GA-BP model, the GA-BP model based on the EMD technique, and the wavelet neural network model, the proposed hybrid model exhibited higher prediction accuracy.
This paper proposes a novel dual-scale deep belief network (DSDBN) method for daily urban water demand forecasting. The original time series of daily urban water demand is decomposed into a set of IMFs and a residue component using the EEMD technique. Subsequently, the generalized Fourier transform (GFT) is applied to analyze the frequency characteristics of each component. The components are reconstructed into stochastic and deterministic terms according to these frequency characteristics. Then, each term is used as input data to build a DBN forecasting model. The final urban water demand forecast is obtained by superposing the forecasting results of the two DBNs. The remainder of the paper is arranged as follows. Section 2 presents the methodologies, including the EEMD, GFT, DBN, and DSDBN models. Section 3 describes the study area and the dataset. The forecasting results, comparisons and discussion are presented in Section 4. Finally, Section 5 gives the conclusions.

Methodology
In general, a time series can be decomposed into a combination of several feature terms, including the stochastic, periodic, and trend terms [32]; the latter two can be combined into a deterministic term. An original time series of daily water demand $x(t)$ can be represented as:

$$x(t) = x_{sto}(t) + x_{per}(t) + x_{tre}(t) = x_{sto}(t) + x_{det}(t) \tag{1}$$

where $x_{sto}(t)$, $x_{per}(t)$, $x_{tre}(t)$, and $x_{det}(t)$ are the stochastic, periodic, trend and deterministic terms, respectively. The stochastic term $x_{sto}(t)$ carries the random interference and noise information of the time series, and the deterministic term $x_{det}(t)$ comprises the periodic term $x_{per}(t)$ and the trend term $x_{tre}(t)$, which indicate the seasonal and long-term change rules in the time series.

EEMD and GFT
EEMD, proposed by Wu and Huang [29], is an adaptive method developed from EMD [26] with a noise-assisted analysis technique. EMD is a self-adaptive approach that extracts a set of IMFs from the original data. The IMFs stand for the natural oscillatory modes existing in the original signal, each with a corresponding frequency and physical meaning and with the following properties [26]: (1) the number of zero crossings and the number of extrema in the entire dataset must either be equal or differ at most by one; and (2) at any point, the average value of the envelopes defined by the local minima and local maxima is zero.
Following the traditional definition, the original daily water demand data $x(t)$ can be decomposed through the EMD sifting process, which can be briefly described as follows [26]:
Step 1: All local minima and maxima of the given data $x(t)$ are identified.
Step 2: The lower envelope $e_L(t)$ and the upper envelope $e_H(t)$ are obtained by cubic spline interpolation of the minima and maxima, respectively.
Step 3: The average value $a(t)$ of the lower and upper envelopes is calculated:

$$a(t) = \frac{e_L(t) + e_H(t)}{2} \tag{2}$$

Step 4: The first component $d(t)$ is derived by calculating the difference between the given data $x(t)$ and the average value $a(t)$:

$$d(t) = x(t) - a(t) \tag{3}$$

Step 5: Steps 1 to 4 are repeated, as described in [26], until the extracted component satisfies the two IMF properties and no further IMF can be extracted from the remaining signal.
At the end of the EMD sifting process, the original time series $x(t)$ can be expressed as a linear combination of $m$ IMFs and one residue, ordered from high to low frequency:

$$x(t) = \sum_{i=1}^{m} c_i(t) + r_m(t) \tag{4}$$

where $m$ represents the number of IMFs, $c_i(t)$ denotes the $i$th IMF at time $t$, and $r_m(t)$ is the final residue of the time series $x(t)$. All $c_i(t)$ have zero means and are nearly orthogonal to each other.
However, the EMD approach suffers from a mode mixing problem and then cannot correctly decompose the original time series: signals of different scales are present in one IMF, or signals of similar scale are present in different IMFs, so that the individual IMFs lose their physical meaning. To eliminate the mode mixing phenomenon, Wu and Huang [29] introduced the EEMD method. The EEMD procedure is briefly described as follows:
Step 1: A random white noise signal with zero mean and a given standard deviation $e$ is added to the original time series $x(t)$.
Step 2: The original time series x(t) with added white noise signal is decomposed into IMFs and residue component.
Step 3: Steps 1 and 2 are repeated M times using a different white noise signal each time.
Step 4: The average of corresponding IMFs resulting from decompositions is calculated as the final result.
The effect of the added white noise signal decreases according to the following rule [29]:

$$e' = \frac{e}{\sqrt{M}} \tag{5}$$

where $M$ denotes the ensemble number, $e$ represents the amplitude of the added noise signal, and $e'$ denotes the final standard deviation of error, defined as the difference between the original data $x(t)$ and the corresponding IMFs.
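The EMD sifting steps and the EEMD ensemble loop can be illustrated with a minimal NumPy sketch. It is not the implementation used in this study: linear interpolation stands in for the cubic spline of Step 2 so that only NumPy is needed, each IMF receives a fixed sifting budget instead of a formal stopping criterion, and the names `local_extrema`, `sift_once`, `emd`, `eemd`, `n_sift` and `noise_amp` are hypothetical.

```python
import numpy as np

def local_extrema(x):
    """Return indices of strict interior local maxima and minima of x."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima

def sift_once(x, maxima, minima):
    """EMD Steps 2-4: subtract the mean of the two envelopes from x."""
    t = np.arange(len(x))
    # Assumption: linear interpolation stands in for the cubic spline.
    upper = np.interp(t, maxima, x[maxima])
    lower = np.interp(t, minima, x[minima])
    return x - (upper + lower) / 2.0

def emd(x, n_imfs, n_sift=10):
    """Extract up to n_imfs components; unused rows stay zero."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = np.zeros((n_imfs, len(residue)))
    for i in range(n_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema left: the rest is the residue
        d = residue.copy()
        for _ in range(n_sift):  # assumption: fixed sifting budget per IMF
            maxima, minima = local_extrema(d)
            if len(maxima) < 2 or len(minima) < 2:
                break
            d = sift_once(d, maxima, minima)
        imfs[i] = d
        residue = residue - d
    return imfs, residue

def eemd(x, n_imfs=6, ensemble=100, noise_amp=0.2, seed=0):
    """EEMD Steps 1-4: average the EMD results of `ensemble` noisy copies."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    sigma = noise_amp * np.std(x)  # noise std relative to the data std
    imf_sum = np.zeros((n_imfs, len(x)))
    res_sum = np.zeros(len(x))
    for _ in range(ensemble):
        noisy = x + rng.normal(0.0, sigma, size=len(x))  # Step 1
        imfs, residue = emd(noisy, n_imfs)               # Step 2
        imf_sum += imfs
        res_sum += residue
    return imf_sum / ensemble, res_sum / ensemble        # Step 4
```

Because every noisy copy is split exactly into its components and residue, the averaged components sum to the original series plus the mean of the added noise, whose standard deviation shrinks with the square root of the ensemble size, in line with the rule of Wu and Huang [29].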
The GFT of the $i$th IMF component $c_i(t)$ is defined in [33] in terms of a real-valued phase transform function $s_0(t)$ that details the behavior of $c_i(t)$. The component $c_i(t)$ may be recovered by the inverse GFT, where $f_0 = f(t) - s_0(t)$ is the expectation of the GFT frequency traces.
The IMFs and the residue component are then summed according to their frequency characteristics to reconstruct the stochastic term $x_{sto}(t)$ and the deterministic term $x_{det}(t)$. This approach thereby reduces the number of prediction steps and the computational complexity of the algorithm without increasing the prediction error.

Deep Belief Network
A DBN is built by stacking multiple restricted Boltzmann machine (RBM) layers; their structures are shown in Figure 1. As seen in Figure 1a, an RBM is composed of one visible layer and one hidden layer. Symmetrically weighted connections exist between the units of the two layers, and no connections exist between neurons in the same layer [34]. As shown in Figure 1b, a DBN formed by stacking $l$ RBMs has one input layer (also called the visible layer), multiple hidden layers and one output layer. The visible layer of the first RBM acts as the DBN input layer; the hidden layer of the first RBM is the visible layer of the second RBM and the first hidden layer of the DBN. Then, the hidden layer of the second RBM is the visible layer of the third RBM and the second hidden layer of the DBN. This layer-by-layer stacking continues until a DBN with multiple hidden layers is established.
Hinton et al. [12] proposed a fast, greedy, unsupervised learning method for DBNs. The learning process trains one layer at a time. The main concept of the algorithm is that input samples are randomly selected to train the first RBM, whose hidden units acquire the important features of the input samples. The hidden layer of this RBM is used as the first hidden layer of the DBN, and the learned features are used as input samples to train the next RBM. This feature-learning process is repeated until all layers of the DBN have been trained.
[Figure 1. Structure of (a) an RBM, with visible units and hidden units, and (b) a DBN.]
The classic RBM model can only address binary-valued data. A continuous restricted Boltzmann machine (CRBM) was proposed by Chen and Murray [35], in which the neurons take continuous state values, so the model can address real-valued data. Given that daily urban water demand data are continuous, we used multiple CRBMs to construct the DBN, which can therefore process real-valued data and be applied to daily urban water demand forecasting. The model training process is described as follows:
Step 1: The states of the units and the weight matrix are randomly initialized. Let $s_v$ and $s_h$ denote the states of the visible units $v$ and hidden units $h$, and let $w_{vh}$ be the interconnecting weight.
Step 2: A set of training samples is randomly selected as input; the states $s_h$ of the hidden units are updated based on the following formula:

$$s_h = \varphi_h\Big(\sum_{v} w_{vh} s_v + \sigma N_h(0,1)\Big) \tag{8}$$

with

$$\varphi_h(x) = \theta_{min} + (\theta_{max} - \theta_{min}) \frac{1}{1 + \exp(-a_h x)} \tag{9}$$

where $\sigma$ represents a constant between 0 and 1, $N_h(0,1)$ denotes a Gaussian random unit with unit variance and zero mean, $\varphi_h(x)$ is a sigmoid function with asymptotes at $\theta_{min}$ and $\theta_{max}$, and $a_h$ controls the gradient of the sigmoid function and the nature of the unit's random behavior [36].
Step 3: $s_h$ is used in computing the states $s'_v$ of the visible units:

$$s'_v = \varphi_v\Big(\sum_{h} w_{vh} s_h + \sigma N_v(0,1)\Big) \tag{10}$$

Step 4: $s'_v$ is used in computing the states $s'_h$ of the hidden units:

$$s'_h = \varphi_h\Big(\sum_{v} w_{vh} s'_v + \sigma N_h(0,1)\Big) \tag{11}$$

Step 5: The next set of training samples is randomly selected and Step 2 commences. When all training samples have been used, the following formulas are used in updating the weight and noise control parameters:

$$\Delta w_{vh} = \eta_w \big(\langle s_v s_h \rangle - \langle s'_v s'_h \rangle\big) \tag{12}$$

$$\Delta a_h = \frac{\eta_a}{a_h^2} \big(\langle s_h^2 \rangle - \langle s_h'^2 \rangle\big) \tag{13}$$

where $\eta_w$ and $\eta_a$ are the learning rates; $s'_v$ and $s'_h$ are the single-step sampled states of units $v$ and $h$, respectively; and $\langle \cdot \rangle$ denotes the mean over the training samples.
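One pass of the CRBM training steps can be sketched as follows. This is a simplified, batched illustration based on the CRBM of Chen and Murray [35], not the code used in this study: bias terms are omitted, the visible-unit slope `a_v` is held fixed, all samples are processed in one matrix operation, and the function and argument names are hypothetical.

```python
import numpy as np

def phi(x, a, theta_min, theta_max):
    """Sigmoid with asymptotes theta_min and theta_max; a sets the slope."""
    return theta_min + (theta_max - theta_min) / (1.0 + np.exp(-a * x))

def crbm_epoch(V, W, a_h, a_v, sigma, eta_w, eta_a, theta_min, theta_max, rng):
    """One pass of Steps 2-5 over all samples V of shape (n_samples, n_visible).

    W has shape (n_visible, n_hidden); a_h has shape (n_hidden,).
    Returns the updated weight matrix and hidden-unit slope parameters.
    """
    n, n_v = V.shape
    n_h = W.shape[1]
    # Step 2: hidden states driven by the data plus Gaussian noise
    H = phi(V @ W + sigma * rng.standard_normal((n, n_h)),
            a_h, theta_min, theta_max)
    # Step 3: one-step sampled visible states
    V1 = phi(H @ W.T + sigma * rng.standard_normal((n, n_v)),
             a_v, theta_min, theta_max)
    # Step 4: one-step sampled hidden states
    H1 = phi(V1 @ W + sigma * rng.standard_normal((n, n_h)),
             a_h, theta_min, theta_max)
    # Step 5: updates use means over the training samples
    dW = eta_w * (V.T @ H - V1.T @ H1) / n
    da = eta_a / a_h**2 * ((H**2).mean(axis=0) - (H1**2).mean(axis=0))
    return W + dW, a_h + da
```

Stacking follows the greedy scheme described above: after one CRBM has been trained, its hidden states would serve as the training input `V` of the next CRBM.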

The DSDBN Forecasting Model
The daily urban water demand time series is a typical nonlinear and non-stationary signal. We used EEMD to extract the IMFs and the residue component from the original time series of daily water demand. We reconstructed the stochastic term and the deterministic term in accordance with the frequency characteristics, thereby decreasing the non-stationarity of the original data. Then, we used DBNs to predict the two feature terms. Finally, the prediction results of the two feature terms were superimposed. The modeling flow chart of the DSDBN prediction model is illustrated in Figure 2. As seen in Figure 2, the six main steps of the DSDBN forecasting method are as follows:
Step 1: The daily water demand time series is collected, and data outliers are eliminated.
Step 2: EEMD is utilized to decompose the daily water demand time series into the combination of m IMFs and one residue component.
Step 3: GFT is performed to analyze the spectral characteristics of each IMF and the residue. The stochastic term $x_{sto}(t)$ and the deterministic term $x_{det}(t)$ are reconstructed in accordance with the frequency characteristics.
Step 4: The structures of the DBN models that predict the stochastic term $x_{sto}(t)$ and the deterministic term $x_{det}(t)$ are designed, including selecting the number of input units, hidden layers and hidden units and setting other parameters.
Step 5: The prediction results of the stochastic term and the deterministic term are combined. The sum is used as the final prediction result for the original daily water demand time series.
Step 6: The final result is compared with those of peer models, and the performance of the proposed DSDBN model is evaluated.
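The dual-scale idea of Steps 4 and 5, forecasting each term with its own model and adding the two outputs, can be sketched as below. To keep the example short and runnable, a least-squares linear autoregression stands in for the DBN; this substitution, and the names `fit_ar`, `predict_ar` and `dual_scale_forecast`, are assumptions made for illustration. The default window lengths of 10 and 4 follow the numbers of input nodes reported for the two DBNs in this study.

```python
import numpy as np

def fit_ar(series, q):
    """Fit a linear autoregression of order q by least squares.

    Assumption: this simple model is only a placeholder for the DBN.
    """
    series = np.asarray(series, dtype=float)
    X = np.array([series[n:n + q] for n in range(len(series) - q)])
    y = series[q:]
    A = np.column_stack([X, np.ones(len(X))])  # lagged values plus intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ar(coef, window):
    """One-step-ahead prediction from the last q observations."""
    return float(np.dot(coef[:-1], window) + coef[-1])

def dual_scale_forecast(x_sto, x_det, q_sto=10, q_det=4):
    """Forecast the two terms separately, then superimpose them (Step 5)."""
    x_sto = np.asarray(x_sto, dtype=float)
    x_det = np.asarray(x_det, dtype=float)
    f_sto = predict_ar(fit_ar(x_sto, q_sto), x_sto[-q_sto:])
    f_det = predict_ar(fit_ar(x_det, q_det), x_det[-q_det:])
    return f_sto + f_det
```

Keeping the two forecasters separate lets each use its own input length and structure, which is the design choice that distinguishes the dual-scale model from a single model trained on the raw series.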

Study Area and Data
We collected 1462 daily water demand records from 1 January 2012 to 1 January 2016 from an urban waterworks (latitude 27.824564° N, longitude 113.158037° E) in Zhuzhou, China. This urban waterworks has a capacity of 100,000 m³ of filtered water per day and supplies water to approximately 0.4 million people and to enterprises in Lusong District, an area of approximately 200 km². Among the 1462 daily records, the first 1224 records, from 1 January 2012 to 8 May 2015, were used as the training dataset, and the other 238 records were retained for testing. The original daily urban water demand data are plotted in Figure 3. As shown in Figure 3, the daily water demand time series exhibits a certain regularity over time. Water demand decreases toward an annual minimum in winter and then increases, reaching its highest value in summer. The winter minimum occurs because most enterprises, which are the main water consumers, are on holiday during China's Spring Festival and reduce their water demand.
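The record counts and the split date can be checked with a few lines of date arithmetic, assuming one record per calendar day with both end dates included (the variable names are illustrative):

```python
from datetime import date, timedelta

start, end = date(2012, 1, 1), date(2016, 1, 1)
n_records = (end - start).days + 1            # both end dates are included
n_train = 1224
last_train_day = start + timedelta(days=n_train - 1)
n_test = n_records - n_train

print(n_records, last_train_day, n_test)      # prints: 1462 2015-05-08 238
```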

Performance Evaluation
To validate and evaluate the forecasting performance of the proposed models, we used four widely used criteria in this study.


Mean Absolute Percentage Error (MAPE)
MAPE is an unbiased estimator for evaluating the forecasting capability of a model and is often applied in practice because of its intuitive interpretation in terms of relative error [37]. MAPE evaluates the model from the perspective of the prediction error:

$$MAPE = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{Y(t) - \hat{Y}(t)}{Y(t)} \right| \times 100\%$$

where $Y(t)$ and $\hat{Y}(t)$ denote the observed and predicted values at time $t$, respectively, and $N$ is the number of observed values. A low value indicates good predictive performance of the model.

Normalized Root-Mean-Square Error (NRMSE)
A lower NRMSE value indicates less residual variance and stronger agreement between the observed and predicted values.

Correlation Coefficient (CC)
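CC shows the linear correlation between the observed and predicted values [38]:

$$CC = \frac{\sum_{t=1}^{N} \big(Y(t) - \bar{Y}\big)\big(\hat{Y}(t) - \bar{\hat{Y}}\big)}{\sqrt{\sum_{t=1}^{N} \big(Y(t) - \bar{Y}\big)^2 \sum_{t=1}^{N} \big(\hat{Y}(t) - \bar{\hat{Y}}\big)^2}}$$

where $\bar{Y}$ and $\bar{\hat{Y}}$ are the means of the observed and the predicted values, respectively. A CC value approaching 1 indicates a good fit between the observed and predicted values and superior predictive capability of the model.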

Determination Coefficient (DC)
DC shows the unconformity between the observed and predicted values and the number of points close to the bisector in the scatter plot of two variables [39].
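The four criteria can be computed as in the following sketch. MAPE and CC follow their standard definitions. The NRMSE and DC forms shown here are assumptions chosen for illustration: the RMSE is normalized by the mean of the observations, and DC is written in the Nash-Sutcliffe form; other normalizations are also in common use.

```python
import numpy as np

def mape(y, y_hat):
    """Mean absolute percentage error, in percent."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def nrmse(y, y_hat):
    """RMSE divided by the mean observation (assumed normalization)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2)) / np.mean(y)

def cc(y, y_hat):
    """Pearson correlation coefficient between observations and predictions."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    yc, pc = y - y.mean(), y_hat - y_hat.mean()
    return np.sum(yc * pc) / np.sqrt(np.sum(yc ** 2) * np.sum(pc ** 2))

def dc(y, y_hat):
    """Determination coefficient in the Nash-Sutcliffe form (assumed)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
```

For a perfect forecast these functions return 0, 0, 1 and 1, respectively, which matches the direction of "better" stated for each criterion.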

Decomposing and Reconstructing Water Demand Time Series
When EEMD is used to decompose the original water demand time series, two key parameters need to be set, namely, the ensemble number $M$ and the amplitude of the white noise $e$. In connection with Equation (5), Wu and Huang [29] set the amplitude of the added white noise signal to 0.2 times the standard deviation of the data after numerous empirical experiments. They found that the mode mixing phenomenon is difficult to eliminate at extremely small values of $e$, whereas an excessively large $e$ produces several extra IMF components, leading to misjudgment of the results. The EEMD parameter settings used in different application areas [40][41][42] follow the method proposed by Wu and Huang [29]. In this study, the ensemble number $M$ and the standard deviation of the added white noise signal $e$ of the EEMD were set to 100 and 0.2, respectively.
As described above, the original daily urban water demand time series is decomposed into nine independent IMFs $c_1(t)$-$c_9(t)$ and one residue $r_9(t)$ by using the EEMD method. The decomposition results are shown in Figure 4, where the components $c_1(t)$-$c_9(t)$ and $r_9(t)$ are ordered from high to low frequency, and $r_9(t)$ reflects the overall trend of the original daily water demand time series. The EEMD decomposition has clear physical meaning, transforms the nonlinear and non-stationary series into stationary series and facilitates improved forecasting performance.
The frequency spectra of each IMF and the residue component obtained using GFT are shown in Figure 5. Two main frequency components, namely, $f_1 = 0.000488$ (1/day) and $f_2 = 0.002930$ (1/day), are present in the original daily water demand time series. Different frequency components are present in $c_1(t)$-$c_4(t)$; similar frequency components related to $f_2$ are present in $c_5(t)$-$c_7(t)$ ($2f_2$, $3f_2$ and $6f_2$ in $c_5(t)$; $0.7f_2$, $f_2$ and $2f_2$ in $c_6(t)$; and $f_2$ in $c_7(t)$); and the same frequency component $f_1$ is present in $c_8(t)$, $c_9(t)$ and $r_9(t)$. According to the EEMD and GFT analysis results, the nine IMFs and one residue component can be reconstructed into three categories, namely, $\sum_{i=1}^{4} c_i(t)$, $\sum_{i=5}^{7} c_i(t)$ and $c_8(t) + c_9(t) + r_9(t)$. The first category can be regarded as the stochastic term, whereas the latter two categories are treated as the deterministic term, that is,

$$x_{sto}(t) = \sum_{i=1}^{4} c_i(t), \qquad x_{det}(t) = \sum_{i=5}^{9} c_i(t) + r_9(t)$$
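A frequency-based grouping of this kind can be sketched as follows. The sketch is an approximation only: an ordinary zero-padded FFT stands in for the GFT, and a single cut-off frequency `f_cut` replaces the inspection of Figure 5; both are assumptions, as are the function names. The default transform length of 2048 is a guess prompted by the fact that 1/2048 ≈ 0.000488 and 6/2048 ≈ 0.002930.

```python
import numpy as np

def dominant_frequency(c, n_fft=2048):
    """Frequency (1/day) of the largest non-zero-frequency spectral peak."""
    c = np.asarray(c, dtype=float)
    n = max(n_fft, len(c))                      # zero-pad, never truncate
    spectrum = np.abs(np.fft.rfft(c - c.mean(), n=n))
    freqs = np.fft.rfftfreq(n, d=1.0)           # d = 1 day between samples
    return freqs[1:][np.argmax(spectrum[1:])]

def reconstruct_terms(components, f_cut):
    """Sum components above f_cut into x_sto and the rest into x_det.

    `components` holds the IMFs and the residue as rows of equal length.
    """
    comps = np.asarray(components, dtype=float)
    f_dom = np.array([dominant_frequency(c) for c in comps])
    high = f_dom > f_cut
    x_sto = comps[high].sum(axis=0)
    x_det = comps[~high].sum(axis=0)
    return x_sto, x_det, f_dom
```

Whatever rule is used to assign the components, the two terms always add up to the sum of all components, so no information from the decomposition is discarded.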

Modeling Method of DBN
The numbers of input nodes, hidden layers and hidden nodes are the key issues in designing the DBN structure. The number of input nodes corresponds to the number of previous observed values that are correlated with future values and determines the autocorrelation structure of the daily water demand time series. The hidden layers and nodes capture the nonlinear patterns and complex intrinsic relationships in the daily water demand time series. At present, no mature theory exists for choosing these numbers. In this study, the numbers of input nodes, hidden layers and hidden nodes of the DBN were determined experimentally.
Assuming that the training dataset is $x(n)$, we used one-step-ahead forecasting and set up the DBN with $q$ input nodes and one output node, with $q$ ranging from 1 to 10. Each training sample takes $X(n) = [x(n), x(n+1), \ldots, x(n+q-1)]$ as the inputs and $Y(n) = x(n+q)$ as the output. The number of hidden layers was varied from 1 to 3, and the number of hidden nodes was set to 4, 8, 12, 16 and 20. We used the four evaluation criteria described in Section 3.2 to evaluate the learning capability of the DBN with different parameters and to select the number of hidden layers and nodes with optimal learning performance. Researchers have found that the forecasting performance of networks with multiple hidden layers is superior to that of networks with only one layer [43], and that the number of hidden nodes has a significant effect on the forecasting performance of a neural network [44].
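The construction of one-step-ahead training samples can be sketched as below, assuming that each sample holds exactly $q$ consecutive observations (one per input node) and that the target is the observation that immediately follows them. The function name `make_samples` is illustrative.

```python
import numpy as np

def make_samples(x, q):
    """Build (inputs, targets) for one-step-ahead forecasting.

    Row n of X holds q consecutive values starting at x[n];
    Y[n] is the value that immediately follows that window.
    """
    x = np.asarray(x, dtype=float)
    X = np.array([x[n:n + q] for n in range(len(x) - q)])
    Y = x[q:]
    return X, Y

# Usage: a series of 10 values with q = 3 gives 7 samples.
X, Y = make_samples(np.arange(10), q=3)
print(X.shape, Y.shape)   # prints: (7, 3) (7,)
```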
Other parameters need to be set during the DBN training process. First, we set the initial values and the update method of $w_{vh}$ in Equation (8): we used a group of random initial values in the first CRBM and repeatedly adjusted the weight matrix until it stabilized. Then, the weight matrix of the next CRBM was initialized using the weight matrix of the previous CRBM, and layer-by-layer training was conducted until all CRBMs had been trained. The parameters $\theta_{max}$ and $\theta_{min}$ in Equation (9) were set to the maximum and minimum values of the entire set of training samples, respectively. The parameters $\sigma$ in Equations (10) and (11), $\eta_w$ in Equation (12), and $\eta_a$ and $a_h$ in Equation (13) were determined by the 5-fold cross-validation method. To approach the global optimal solution, we trained every DBN 20 times using 20 groups of random initial values, and the mean of those 20 runs was used as the training result of the DBN.
The stochastic term $x_{sto}(t)$ and the deterministic term $x_{det}(t)$ were modeled using the method described above. The DBN structure with the best learning performance was selected according to the four performance evaluation criteria, namely, MAPE, NRMSE, CC and DC. The structures of the DBNs and their forecasting performance on the training dataset are listed in Table 1. The optimal architecture of the DBN for the stochastic term $x_{sto}(t)$ is 10-8-12-1, that is, a DBN with 10 input nodes, 2 hidden layers with 8 and 12 nodes in the first and second hidden layers, respectively, and 1 output node. For the deterministic term $x_{det}(t)$, the optimal structure of the DBN is 4-16-1 (input layer: 4 nodes, hidden layer: 16 nodes and output layer: 1 node).

Forecasting Result
The forecasting results of the stochastic term $x_{sto}(t)$ and the deterministic term $x_{det}(t)$ are plotted in Figure 6. Figure 6a shows the forecasting results of the stochastic term $x_{sto}(t)$ using the DBN with structure 10-8-12-1, and Figure 6b plots the forecasting results of the deterministic term $x_{det}(t)$ using the DBN with structure 4-16-1. The final forecasting results of the original daily water demand are obtained by superimposing the forecasting results of the stochastic term $x_{sto}(t)$ and the deterministic term $x_{det}(t)$. Figure 7 displays the final forecasting results. Figure 7a shows the comparison between the predicted and observed values, and Figure 7b shows the scatter plot of the predicted and observed values. In Figure 7a, the forecast values follow the changes in the observed daily water demand. The correlation between the predicted and the observed values in Figure 7b indicates that both are highly consistent. The results of the four performance evaluation criteria are MAPE = 1.291099, NRMSE = 0.016625, CC = 0.976528 and DC = 0.953512, which indicate that the proposed approach exhibits good learning performance and forecasting capability.

Comparison Experiment
To assess the performance of the proposed DSDBN model, the ARIMA model, the feed forward neural network (FFNN) model and the SVR model were employed for comparison using the same training and testing samples. The three comparison models also use the EEMD method to decompose the daily water demand time series, and the stochastic and deterministic terms are reconstructed using the same method as described in Section 4.1. In addition, to assess the effect of EEMD on predictive performance, a single DBN model was developed that uses the original daily water demand time series directly. In ARIMA modeling, the KPSS test is used to check the stationarity of the two feature terms, and the Akaike information criterion [45] is employed to identify the best-fitting model. The FFNN model uses the same numbers of nodes and layers as the DSDBN model, sigmoid and linear activation functions in the hidden and output layers, respectively, and the back propagation algorithm for training. In SVR modeling, the linear kernel is selected, the grid search method is used to determine the optimal values of the parameters C and $\gamma^2$, and the parameter ε of the insensitive loss function is set to 0.1. The single DBN model uses the same method as the DSDBN model to select the number of nodes and hidden layers, and its structure is finally set to 6-4-12-1.
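A structure string such as 6-4-12-1 can be read as six input nodes, hidden layers of 4 and 12 nodes, and one output node. The sketch below parses such a string and builds training samples under the assumption, not stated in this section, that the input nodes receive the most recent consecutive daily values and the output is the next day's value. Both function names are hypothetical.

```python
def parse_structure(spec):
    """Split a structure string such as '6-4-12-1' into (inputs, hidden, outputs)."""
    sizes = [int(s) for s in spec.split("-")]
    if len(sizes) < 3:
        raise ValueError("expected at least input, one hidden and output size")
    return sizes[0], sizes[1:-1], sizes[-1]


def make_lagged_samples(series, n_inputs):
    """Build one-step-ahead samples from a time series.

    Each input holds n_inputs consecutive values; the target is the value
    that immediately follows them (assumed input layout).
    """
    if n_inputs < 1 or len(series) <= n_inputs:
        raise ValueError("series must be longer than n_inputs")
    count = len(series) - n_inputs
    inputs = [list(series[i:i + n_inputs]) for i in range(count)]
    targets = [series[i + n_inputs] for i in range(count)]
    return inputs, targets


n_in, hidden, n_out = parse_structure("6-4-12-1")
X, y = make_lagged_samples(list(range(10)), n_in)
```

Using the same sample construction for every model keeps the comparison on identical training and testing samples, as the text requires.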
The prediction results of the four comparison models are shown in Table 2. In Table 2, the DSDBN model presents the smallest MAPE and NRMSE values and the largest CC and DC values among the models. Compared with the EEMD-ARIMA, EEMD-FFNN, EEMD-SVR and DBN models, it reduces MAPE by approximately 76.48%, 34.56%, 69.8% and 20.36%, respectively; reduces NRMSE by 71.53%, 40.44%, 66.34% and 18.38%, respectively; increases CC by 1.24%, 2.3%, 1.13% and 0.85%, respectively; and increases DC by 123.67%, 9.73%, 61.73% and 2.52%, respectively. These results illustrate that the proposed DSDBN model is superior to the other four models in forecasting daily urban water demand. Moreover, the findings indicate that the DSDBN model, which is based on the method of "decomposing and reconstructing", improves the prediction accuracy compared with the single DBN model, and that it is superior to the ARIMA, FFNN and SVR models in predicting nonlinear and non-stationary daily urban water demand.

The stochastic term is $x_{sto}(t) = \sum_{i=1}^{j} c_i(t)$ and the deterministic term is $x_{det}(t) = \sum_{i=j+1}^{m} c_i(t) + r_m(t)$.
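The reconstruction of the two feature terms from the IMFs $c_i(t)$ and the residue $r_m(t)$ can be sketched as follows. This is a minimal illustration assuming that the first j IMFs form the stochastic term and that the remaining IMFs plus the residue form the deterministic term; the function name and the list-of-lists data layout are hypothetical.

```python
def reconstruct_terms(imfs, residue, j):
    """Return (x_sto, x_det) for a split index j.

    x_sto(t) = sum of the first j IMFs
    x_det(t) = sum of the remaining IMFs + residue
    """
    m = len(imfs)
    n = len(residue)
    if not 0 <= j <= m:
        raise ValueError("j must lie between 0 and the number of IMFs")
    if any(len(c) != n for c in imfs):
        raise ValueError("all IMFs must have the same length as the residue")
    x_sto = [sum(c[t] for c in imfs[:j]) for t in range(n)]
    x_det = [sum(c[t] for c in imfs[j:]) + residue[t] for t in range(n)]
    return x_sto, x_det


# Toy components, not data from the study.
imfs = [[1.0, -1.0, 1.0], [0.5, 0.5, -0.5], [2.0, 2.0, 2.0]]
residue = [10.0, 11.0, 12.0]
x_sto, x_det = reconstruct_terms(imfs, residue, j=1)
```

Because every component is assigned to exactly one term, the two terms always add back up to the original series, which is what allows the two forecasts to be summed at the end.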

$w_{vh}$ is the interconnected weight.
Step 2: A set of training samples is randomly selected as input; the states $s_h$ of the first hidden units are updated based on the following formula:

are the single-step sampled states of units $v$ and $h$, respectively; and related to the mean over the training samples.
Step 6: Step 2 is repeated, and the next round of training is conducted until the preset number of training rounds is reached or the weight change matrix is sufficiently small, that is, $|\Delta w(k)| < \varepsilon$, at which point the training of the first CRBM is complete.
Step 7: The output of the first CRBM is considered as the input of the second CRBM. Steps 1-6 are repeated for training the second CRBM, and so on until all CRBMs of the DBN are trained and the training is completed.
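The stopping rule of Step 6 and the layer-by-layer stacking of Step 7 can be sketched independently of the CRBM update itself. In the sketch below the update rule is passed in as a callback, so the code shows only the control flow; the names `train_layer` and `train_stack`, the dictionary keys, and the reading of $|\Delta w(k)| < \varepsilon$ as "largest absolute entry below ε" are all assumptions made for illustration.

```python
def train_layer(update_step, weights, max_epochs, eps):
    """Apply update_step until max_epochs is reached or max|dw| < eps.

    update_step(weights) is a hypothetical callback that returns the
    weight-change matrix (list of lists) for one training round.
    Returns the updated weights and the number of rounds performed.
    """
    if max_epochs < 1:
        raise ValueError("max_epochs must be at least 1")
    for epoch in range(1, max_epochs + 1):
        delta = update_step(weights)
        for i, row in enumerate(delta):
            for k, d in enumerate(row):
                weights[i][k] += d
        if max(abs(d) for row in delta for d in row) < eps:
            break
    return weights, epoch


def train_stack(layers, inputs, max_epochs=100, eps=1e-4):
    """Greedy layer-wise training: each trained layer feeds the next one.

    Each layer is a dict with hypothetical keys:
      'weights': list of lists, modified in place
      'update' : f(weights, inputs) -> weight-change matrix
      'forward': f(weights, inputs) -> outputs passed to the next layer
    """
    epochs_used = []
    for layer in layers:
        step = lambda w, x=inputs, f=layer["update"]: f(w, x)
        _, n = train_layer(step, layer["weights"], max_epochs, eps)
        epochs_used.append(n)
        inputs = layer["forward"](layer["weights"], inputs)
    return epochs_used
```

Passing the update rule as a callback keeps the convergence test and the stacking logic the same regardless of how the weight change of a single round is computed.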

Figure 2. Modeling structure flow chart of the DSDBN model.


Step 3: GFT is performed to analyze the spectral characteristics of each IMF and the residue. The stochastic term $x_{sto}(t)$ and the deterministic term $x_{det}(t)$ are reconstructed in accordance with the frequency characteristics.
Step 4: The structure of the DBN model is designed to correspondingly predict the stochastic term $x_{sto}(t)$ and the deterministic term $x_{det}(t)$.
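The frequency analysis of Step 3 can be illustrated with a small sketch that finds the dominant frequency of each IMF and derives a split index between high-frequency and low-frequency components. Several assumptions apply: a plain discrete Fourier transform is used here in place of the GFT, the IMFs are assumed to be ordered from highest to lowest frequency, high-frequency components are assumed to belong to the stochastic term, and the single threshold `freq_threshold` is a hypothetical decision rule rather than the criterion used in the study.

```python
import cmath
import math


def dominant_frequency(x):
    """Frequency (cycles per sample) with the largest DFT magnitude.

    The zero-frequency bin is skipped so that the mean does not dominate.
    """
    n = len(x)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k / n


def choose_split(imfs, freq_threshold):
    """Count the leading IMFs whose dominant frequency is at least the threshold.

    The returned index j separates the high-frequency IMFs (first j)
    from the lower-frequency ones (the rest).
    """
    j = 0
    for imf in imfs:
        if dominant_frequency(imf) < freq_threshold:
            break
        j += 1
    return j


# Toy components: 16, 4 and 1 cycles over 64 samples (not data from the study).
n = 64
imfs = [[math.sin(2 * math.pi * k * t / n) for t in range(n)] for k in (16, 4, 1)]
j = choose_split(imfs, freq_threshold=0.1)
```

The direct DFT above costs O(n²) per component, which is acceptable for a short illustration; a fast Fourier transform would be the usual choice for multi-year daily series.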

Figure 3. Original daily urban water demand time series for 1 January 2012-1 January 2016.


Figure 5. Frequency spectrum of each IMF and residue component using GFT.


Figure 6. Forecasting results of the two feature terms: (a) stochastic term $x_{sto}(t)$; and (b) deterministic term $x_{det}(t)$.

Figure 7. Final forecasting results of the daily water demand: (a) comparison between the predicted value and the observed value; and (b) the scatter of the predicted value and the observed value.

According to the EEMD and GFT analysis results, the nine IMFs and one residue component can be reconstructed into three categories. The first category can be regarded as the stochastic term $x_{sto}(t)$, whereas the latter two categories are treated as the deterministic term $x_{det}(t)$.

Table 1. DBN structure and the forecasting result of the training dataset.

Table 2. Comparison of the predictive performances by using different forecasting models.