Nonlinear Dynamic Modeling of Urban Water Consumption Using Chaotic Approach (Case Study: City of Kelowna)

This study investigated urban water consumption complexity using chaos theory to improve forecasting performance to help optimize system management, reduce costs and improve reliability. The objectives of this study were to (1) investigate urban water distribution consumption complexity and its role in forecasting technique performance, (2) evaluate forecasting models by periodicity and lead time, and (3) propose a suitable forecasting technique based on operator applications and performance through various time scales. An urban consumption dataset obtained from the City of Kelowna (British Columbia, Canada) was used as a test case to forecast future consumption values using varying lead times under different temporal scales to identify models which may improve forecasting performance. Chaos theory techniques were employed to inform model optimization. This study attempted to address the paucity of studies on chaos theory applications in water consumption forecasting. This was accomplished by applying non-linear approximation, dynamic investigation, and phase space reconstruction for input variables, to improve the accuracy in various periodicity and lead time. To reconstruct the phase space, lag time was calculated using average mutual information for daily resolution as 17 days to reconstruct the phase space. The optimum embedding dimension and correlation exponent for the phase space were 18 and 3.5, respectively. Comparing the results, the non-linear local approximation model provided the best performance. The forecasting horizon for the models was 122 days. Moreover, phase space reconstruction improved the accuracy of the models for the different lead times. The findings of this study may improve forecasting performance and provide evidence to support further investigation of the chaotic behaviour of water consumption values over different time scales.


Introduction
Global water scarcity concerns are increasing due to climate change, urban development, population growth, industrial development, economic expansion, and the cost of drinking water [1]. It is imperative that governments invest in integrated management plans that address consequences of water problems such as scarcity of available water resources, sufficient distribution and pipeline maintenance. Conflicts resulting from water scarcity are becoming more prevalent and severe than other natural resource scarcity due to rapid increases in population without access to reliable sources of fresh water in recent decades [1,2]. There were 153 water-related conflicts reported in the 19th century while 279 conflicts have been reported within the last decade [3]. Therefore, we highlight the necessity of a management plan that improves water consumption efficiency.
A robust operation of urban drinking water supply systems requires future water consumption values to inform the development of an efficient water consumption management plan to mitigate anticipated stressors on the system. The reliability of water distribution systems may be improved through the accurate simulation of hydraulic conditions in pipeline systems based on future water consumption forecasting. In other words, water consumption forecasting provides public suppliers with the necessary future consumption information to ensure consumption needs can be met [4,5]. Water consumption forecasting is a dynamic process as predictions are essential for the optimum operation and sustainable growth and development of urban water supply [6]. The reason for categorizing water consumption forecasting as a dynamic process is because of the influential variables in forecasting that are not stable and can change under varying conditions. Therefore, considering reliable forecasting horizon can be beneficial in any management plan. Moreover, forecasting consumption values in short-, mid-, and long-term (i.e., less than a week, a week to a month, a month to a year or more, respectively) time periods play a crucial role in water distribution systems' (WDS) daily operation basis by informing important factors such as optimized pumping, pipeline maintenance, minimizing energy and water supply cost, improving system reliability and the quality of allocated water [7][8][9]. Recent studies have improved the understanding of the nonlinearity and complexity of water consumption factors; however, more study on these concepts is required. The available studies related to these factors are limited and require (1) accurate estimation and forecasting of water consumption and (2) determination of the degree of nonlinearity among the influential variables in water consumption. Accurate estimation and forecasting data are influenced by the availability of high-resolution temporal scale datasets (e.g., daily and hourly). Additionally, data availability for other influential factors, such as holidays, humidity, peak consumption duration in a 24-h period, air temperature, population growth, and consumers' income, is important to accurately forecast consumption data.
Over the past three decades, two thoughts of deterministic and probabilistic methods have been proposed to forecast urban water consumption. The deterministic approach is solely based on input variables and their initial conditions, whereas a probabilistic model relies on modeling uncertainties and randomness of input variables. Researchers have confirmed the ability of nonlinear deterministic approaches in modeling the events with complexity in effective variables [10,11]. Individual consumption habits are an important factor in modeling and forecasting future consumption. However, there are a few studies that aimed to forecast water consumption at the individual household scale which is a technique to analyze consumption habits [12]. The authors are not aware of any recent studies that considered consumers' habits in total recorded consumption values when modeling urban water consumption in large scale of an urban district.
However, an individual's habits will influence consumption and thus, future values in the system will be highly influenced by individual consumer habits, resulting in a highly complex system. In a complex system, "the behavior of the integral part of the system is simple, but the behavior of the overall system is complex" [13]. Given the significant challenges and complexity of probabilistic methods, deterministic methods can provide a useful approximation to their probabilistic counterparts. Therefore, investigation of the complexity of effective variables that are used as input variables of water consumption models can provide information towards employing a reliable modeling technique. Therefore, this study investigates both the dynamic characteristics, and dynamic explanatory variables in water consumption within the City of Kelowna (British Columbia, Canada). Furthermore, this research focuses on a deterministic approach to forecast short-term and mid-term lead times under different temporal scales. To calibrate the forecasting models, urban dimension of the test data is considered (i.e., the whole urban scale, rather than neighborhoods or single buildings). This decision may have implications on usability of the model information for consumption and supply management.

Background
Information about future water consumption helps related authorities develop an integrated, efficient plan that reduces long-term supplier and consumer water stress. Accurate estimation of drinking water consumption and availability of resources is considered one of the solutions to reduce water stress [9,[14][15][16]. Moreover, having an accurate estimation of future consumption values improves sustainability for planning purposes. Unlike forecasting for other water resources-related parameters of interest (e.g., river discharge, sedimentation, rainfall, etc.), where influential factors in forecasting the goal values are mostly correlated (e.g., rainfall factors in modeling and simulating of river discharge), influential variables in forecasting water consumption are more complex with a weaker correlation. Moreover, applying large datasets with many input variables can lead to overfitting issues, which can misrepresent model performance through overly confident forecasting model predictions in addition to adding model complexity without improving accuracy. Hence, consumption forecasting requires a balance of optimizing the quantity of input variables with improvements in forecasting accuracy. It can improve the accuracy of the models while reducing the number of input variables by defining the model's input with active variables. Input variable selection techniques (e.g., Garson Equation, Influential correlation) are employed to highlight the active variables in a system [17].
There are different types of variables affecting consumption values such as climatic (e.g., temperature, relative humidity, rainfall, etc.) and socioeconomic (e.g., population and income) [18]. Commonly climatic variables are used in short-term and mid-term forecasting while socioeconomic variables are used to forecast long-term consumption values [19][20][21]. Common climatic factors considered in recent studies are temperature, precipitation, and previously recorded consumption values [20,22,23]. The number of studies on short-and mid-term forecasting of water consumption is considerable; however, limited studies have specifically investigated impact of climatic factors on consumption forecasting in the same period [24][25][26].
Influential factors in complex water consumption systems may benefit from chaos theory techniques to improve forecasting methods. As mentioned above, characteristics of influential factors in water consumption are characterized as complex systems. Chaos theory techniques may improve the application of forecasting models to estimate future values in complex systems and improve the prediction of nonlinearity within dynamic systems [13,50]. Descriptions of chaotic behavior have been used in various engineering applications. Chaotic systems are defined and characterized by large changes in behavior resulting from small changes in initial conditions of the system [51]. Using the 'extent of complexity' of a chaotic system (defined primarily in the context of the variability of relevant data), chaotic systems are classified as low-, medium-or high-dimensional [52]. Moreover, the availability of noise in time series increases the complexity of the data and results in a high embedding dimension [53]. In water distribution systems, factors such as temperature, humidity, precipitation, the economic condition of consumers, population size, holidays, etc., have an impact on water consumption. By aggregating data at increasing temporal scales, the effects of scaling time series on deterministic chaos can be found [54,55] and any missing data can be generated [56,57]. This approach has been used in solving problems in various fields of study such as river discharge [58][59][60], sedimentation [61][62][63], climate [64], lake level variability [65,66], rainfall [67][68][69][70], traffic speed [71], finance [72], image processing [73] and ship motion prediction [74]. However, there is a paucity of study on the application of chaos theory on water consumption forecasting methods. Oshima [13] developed a real-time forecasting system for water consumption based on the chaos theory. To support operations with labor saving, facility maintenance and efficient use of energy, he introduced "information integration type chaos theory-based consumption forecasting". The report showed the application of chaos-based techniques in improving the accuracy of the results.
Evolutionary techniques became popular in modeling and optimizing fields. Genetic programming (GP) and gene expression programming (GEP) place on heuristic algorithms that are found in Darwin's evolution theory [66]. Evolutionary techniques adjust the population of a specific solution. In each stage, individuals are selected randomly from the current population. Then, the chosen individual plays the parent's role to be reproduced for another population for the next generation of solutions. This reproduction goes toward an optimal answer which has been defined as the goal values. The evolutionary techniques are used for optimization problems where standard methods are not suitable, such as discontinuity, nondifferentiable, stochastic, or highly nonlinear objective functions. Among evolutionary techniques, GP and GEP are used for modeling problems with the same abilities as GA such as informing data gaps and forecasting time series [18,[75][76][77]. Aytek and Kishi [78] used the GP technique to model suspended sediment load in the Tongue River (United States) and reported better performance from GP compared to sediment rating curves and multiple linear regressions (MLR). Since GEP provides a tree-structure scheme, it makes GEP more convenient to interpret the results in comparison with GP. Moreover, GEP presents mathematical equations that clarify the relationship between input and output variables by a factor of 100-10000 [63,79,80]. The superiority of GEP and the advantages of this technique interested researchers to develop more sophisticated models with hybrid methods such as combining the extended Kalman filter [81], clustering the consumption values [82], Wavelet decomposition [39], and phase space reconstructed GEP (PSR-GEP) [42] in forecasting urban drinking water consumption. The results showed that GEP models are highly sensitive to wavelet decomposition when attempting to improve the performance of the models.

Problem Statement
Growth in peak water consumption, which overburdens urban water resources [30], requires an efficient management plan. This importance motivated the application of soft-computing techniques into urban water consumption forecasting methods to develop methods that are applicable in forecasting problems. Many techniques have been proposed to forecast water consumption under differing time scales; however, there has been limited investigation on performance comparisons between models to inform model selection under various conditions [5]. Moreover, non-stationary, non-linear, and inherent stochasticity of water consumption data makes forecasting problems more challenging in this field [83]. Donkor et al. [84] reported that periodicity and forecasting horizon influence the performance of the methods in short-term and long-term forecasting, such as artificial neural networks and econometric models, respectively. Therefore, understanding the forecasting horizon, which is dependent on the dataset considered, will help to categorize the performance of the developed models. It can be considered as a classification for short-, mid-, and long-term forecasting models, which help operators select the appropriate model to implement considering the available dataset [9,20,85]. Forecasting horizon can improve the reliability of consumption forecasting methods by informing which method is most useful for consumption forecasting under specified time frames; however, in general, there is currently no acknowledgement of time frame for these forecasting horizons [84]. This study focuses on the forecasting horizon for the considered models to inform performance under various periodicity (daily, 2-day, 4-day, weekly, bi-weekly, and monthly) and lead time.
Based on previous studies, common influential variables in water consumption forecasting include observed consumption values (e.g., historical recorded consumption values over various periodicity), climatic variables (e.g., temperature, humidity, levels of snow and rainfall) and socioeconomic variables (e.g., population growth rate, economic factors such as income and water cost) [5]. Although these variables are used in the literature, there are fewer studies that used socioeconomic variables in long-term forecasting than studies that employed the above-mentioned variables. Moreover, extrapolation models in water consumption forecasting are based on the recorded data and its error terms [84]. The limitation of these models is based on their dependence of past trends that will likely be observed in the future, but do not consider the role socioeconomic variables and consumer habits may play in influencing future consumption values. Recently published studies investigating these factors are limited and require (1) accurate estimation and forecasting of water consumption considering different periodicity and forecasting horizons to classify the models based on their performance, and (2) determination of the degree of nonlinearity among the influential variables to consider exogenous variables (e.g., socioeconomic). The role of these factors in the value of water consumption in the future is highlighted. Therefore, investigating the complexity of input data can be helpful in employing different techniques with various abilities to model forecasting water consumption problems.

Objective
The techniques presented in the literature are often considered for short-term forecasting, which is based on one-day-ahead lead time. In light of the previous discussion on model limitations, it has been reported that the weaknesses of the forecasting models are related to the lack of consideration for the complex behavior of water consumption datasets. Much of the existing research ignores the importance of dynamic behavior of consumption datasets although many pre-processing techniques have been introduced to improve model accuracy. This study investigates the dynamics of a dataset to detect chaotic behavior of water consumption time series. Then, a non-linear approximation technique is applied to forecast water demand consumption. The objective of this study was to (1) investigate system complexity and its role in forecasting technique performance, (2) evaluate selected forecasting models by periodicity and lead time, and (3) propose a suitable forecasting technique based on operator applications and performance through different time scales.
This study improves the understanding of nonlinear local approximation (NLA) performance in comparison with previously introduced methods in water consumption forecasting. Moreover, forecasting horizon is introduced as a factor to evaluate the performance of forecasting models. This study attempts to address gaps in previous studies by applying NLA, dynamic investigation, phase space reconstruction for input variables, and improving the accuracy in various periodicity and lead times. Figure 1 provides a schematic description of the research methodology. Average mutual information (AMI) is used to calculate the lag time. Then, the existence of chaotic behavior in the test case is investigated by the correlation dimension method. Four different methods are employed to forecast the short-and mid-term consumption values. Non-Linear local approximation method is considered as the forecasting method in the condition of the test data that has chaotic behavior. The performance of selected models is evaluated in different lag and lead times.

Phase Space Reconstruction
This research employed the phase-space concept to better understand the dynamic nature of a municipal water consumption dataset for the City of Kelowna. The dynamics of a water consumption system are represented by data points along a trajectory, whereby each position in time represents a system state. The lag-embedding technique can be used on deterministic, dynamic systems such as the present water consumption dataset to reconstruct phase-space from time series. The fundamental dynamics of a system can be studied by reconstructing an m-dimensional phase-space of Dt that is defined by [86,87]: where Dt is a vector of the consumption data of {dt}t=1,…, N, N is the number of recorded consumption data points, is the lag time and m is the number of embedding dimension that generally varies from 1 to 10 or 1 to 20 [61,66,88,89]. In the case when m is greater than the minimum embedding dimension, the trajectory of reconstructed vectors can display the true state of the chaotic system. Indeed, the lag time ( ) is arbitrary as the data are often assumed to have infinite precision. The lag time should not be too small given the difference between various elements of the delay vectors and, it should not be too large as this can result in low coordinate correlation [90]. If the dynamics of the system can be reduced to a set of deterministic laws, trajectories will converge towards a subset of the phase-space with a fractional dimension called the attractor [89]. The lag-embedding method is sensitive to both embedding parameters of and m. Average mutual information (AMI) is a wellknown method for estimating the lag time [91,92]. This research employed AMI to estimate the lag time [91]: where the sum is extended over the total number of samples in the time series, P(dt) and P(dt+τ) are the marginal probabilities for measurements dt and dt+τ and P (dt, dt+τ) is their joint probability. The optimal τ minimizes the value of the function I (dt, dt+τ) for t = τ. AMI considers the first local minimum as the lag time [91,93].

Correlation Dimension
The dimension of a system informs its complexity and indicates the number of required variables that specify a deterministic system. Kermani [59] classified different dimensions in a system including topological, Hausdorf, box counting, point-wise, and correlation dimensions. Additionally, the correlation dimension is used as an indicator of a deterministic or stochastic process [59,94]. The function below calculates the correlation integral value [95]:

Phase Space Reconstruction
This research employed the phase-space concept to better understand the dynamic nature of a municipal water consumption dataset for the City of Kelowna. The dynamics of a water consumption system are represented by data points along a trajectory, whereby each position in time represents a system state. The lag-embedding technique can be used on deterministic, dynamic systems such as the present water consumption dataset to reconstruct phase-space from time series. The fundamental dynamics of a system can be studied by reconstructing an m-dimensional phase-space of Dt that is defined by [86,87]: where D t is a vector of the consumption data of {d t } t=1, . . . ,N , N is the number of recorded consumption data points, τ is the lag time and m is the number of embedding dimension that generally varies from 1 to 10 or 1 to 20 [61,66,88,89]. In the case when m is greater than the minimum embedding dimension, the trajectory of reconstructed vectors can display the true state of the chaotic system. Indeed, the lag time (τ) is arbitrary as the data are often assumed to have infinite precision. The lag time should not be too small given the difference between various elements of the delay vectors and, it should not be too large as this can result in low coordinate correlation [90]. If the dynamics of the system can be reduced to a set of deterministic laws, trajectories will converge towards a subset of the phase-space with a fractional dimension called the attractor [89]. The lag-embedding method is sensitive to both embedding parameters of τ and m. Average mutual information (AMI) is a well-known method for estimating the lag time [91,92]. This research employed AMI to estimate the lag time [91]: where the sum is extended over the total number of samples in the time series, P(d t ) and P(d t+τ ) are the marginal probabilities for measurements d t and d t+τ and P (d t , d t+τ ) is their joint probability. The optimal τ minimizes the value of the function I (d t , d t+τ ) for t = τ. AMI considers the first local minimum as the lag time [91,93].

Correlation Dimension
The dimension of a system informs its complexity and indicates the number of required variables that specify a deterministic system. Kermani [59] classified different dimensions in a system including topological, Hausdorf, box counting, point-wise, and correlation dimensions. Additionally, the correlation dimension is used as an indicator of a deterministic or stochastic process [59,94]. The function below calculates the correlation integral value [95]: where N is the number of data points, H is the Heaviside step function (H (u) = 1 for u > 0, H (u) = 0 for u ≤ 0 and u = r − X i − X j , X i is the ith state vector, and r is the radius of a sphere with the content of X i or X j as the center. C m (r) is proportional to r for stochastic time series, whereas for chaotic time series, it scales with r as where c e is the correlation exponent defined by approximating the slope of log C m (r) versus log (r) in logarithmic scale. If the calculated c e is unchanged by increasing the number of embedding dimensions, C e can be considered as the correlation dimension of the attractor in the system. But, if c e is not stable as a function of embedding dimensions, this system can be considered non-chaotic [54,96,97].

Nonlinear Local Approximation
Nonlinear local approximation (NLA) can forecast a system's future without development of an analytical model [51]. This research applied NLA to (1) test the chaotic nature of consumption data and (2) forecast the consumption in the case study. This was done by reconstructing the phase space of the dataset. It is critical in a chaos analysis to reconstruct the multi-dimensional phase space that provides a conceptual pattern for the time series. The presence of attractors in phase space indicates the possibility of chaos in the dataset. A phase-space reconstruction in a dimension m facilitates an interpretation of the underlying dynamics in the form of an m-dimensional map, f T , by where X j is the vector of dimension m at the current state of the system at time j and X j+T is the vector of dimension m at the future state of the system at time j + T. NLA entails the subdivision of the f T domain into many subsets. In other words, the dynamics of the system were described step-by-step locally in the phase-space [98]. In an m-dimensional space, estimating the change of trajectory with time would lead to forecasting. Considering the relation between two states → X t and → X t+p , the behaviour at a future time (p) on the attractor was forecasted by the mapping → F as [66]: where the evolving dynamic of state → X t is influenced by nearby states. The future state → X t+p was determined by the first-order polynomial mapping While the mapping → f is linear, the forecasted value is nonlinear [100]. This is because every state on the trajectory belongs to a different subset defined by different expressions for → F .

Largest Lyapunov Exponent
The rate of convergence or divergence in different dimensions is measured using the Lyapunov exponent. A deterministic system contains at least one positive Lyapunov exponent [101]. This research Water 2020, 12, 753 8 of 22 employed the approach proposed by Rosenstein et al. [101] to extract the value of Lyapunov exponent for the test case. After reconstruction of phase-space, a point Y n0 was selected and all the points in the neighborhood of Y n , with the closer distance of r to that point were found. r is the radius of a sphere with the content of Y n as the center. The procedure was repeated for N points in the route to find a stretch factor S: where u Y n0 is the number of neighbors to point Y n0 . The plot of S verses N consists of linear and nonlinear components. The literature provides two methods for calculating the largest Lyapunov exponent (λ max ). Shang et al. [102] determined λ max as the slope of the linear part of the curve, while Rosenstein et al. [101] determined λ max as the average of the slope of the first part and that of the second part [89,102]. This research applied the second approach to study different temporal scales. The prediction horizon (∆t) is a time in which the consumption dataset sustains its dynamics in the most accurate forecasting; ∆t was obtained as the inverse of the largest Lyapunov exponent:

Gene Expression Programming
Evolutionary soft computing techniques may be applied to solve engineering problems. Among these evolutionary techniques, genetic algorithm (GA), genetic programming (GP), and gene expression programming (GEP) are considered, which were originally inspired from Darwin's theory of evolution [79,80,103,104]. GA and GP work with a string of numbers and a specific length that is known as a "chromosome." GEP defines an equation that shows the relationship between input and output values. Moreover, the process of the algorithm's learning starts with the generation of chromosomes for a given random raw dataset that works with chromosomes and expression trees. Expression trees demonstrate the relationship among variables connecting with arithmetic operators. Chromosomes contain genes that provide a series of symbols of two parts, head and tail, which have functions, terminals, and only terminals, respectively. GEP follows the genome restructuring process by mutation, recombination, transposition, and gene duplication randomly. This random reputation will deliver the best model to be selected. This cycle will be continued by the reproduction of randomly generated chromosomes to reach the model with satisfying results based on defined evaluation criteria. In this research, 30 chromosomes with the head size of 8 and 3 genes are reproduced with the linking functions of +, −, ×, x, x 2 , √ x, log x . The fitness functions used as the selection criteria are root mean square error (RMSE), correlation coefficient (CC) and mean absolute error (MAE). All models were selected with a fair assessment by considering stopping condition of 5000 generations. Figure 2 shows the scheme of the process with GEP method. where 0 is the number of neighbors to point Yn0. The plot of S verses N consists of linear and nonlinear components. The literature provides two methods for calculating the largest Lyapunov exponent (λmax). Shang et al. [102] determined λmax as the slope of the linear part of the curve, while Rosenstein et al. [101] determined λmax as the average of the slope of the first part and that of the second part [89,102]. This research applied the second approach to study different temporal scales. The prediction horizon (Δt) is a time in which the consumption dataset sustains its dynamics in the most accurate forecasting; Δt was obtained as the inverse of the largest Lyapunov exponent:

Gene Expression Programming
Evolutionary soft computing techniques may be applied to solve engineering problems. Among these evolutionary techniques, genetic algorithm (GA), genetic programming (GP), and gene expression programming (GEP) are considered, which were originally inspired from Darwin's theory of evolution [79,80,103,104]. GA and GP work with a string of numbers and a specific length that is known as a "chromosome." GEP defines an equation that shows the relationship between input and output values. Moreover, the process of the algorithm's learning starts with the generation of chromosomes for a given random raw dataset that works with chromosomes and expression trees. Expression trees demonstrate the relationship among variables connecting with arithmetic operators. Chromosomes contain genes that provide a series of symbols of two parts, head and tail, which have functions, terminals, and only terminals, respectively. GEP follows the genome restructuring process by mutation, recombination, transposition, and gene duplication randomly. This random reputation will deliver the best model to be selected. This cycle will be continued by the reproduction of randomly generated chromosomes to reach the model with satisfying results based on defined evaluation criteria. In this research, 30 chromosomes with the head size of 8 and 3 genes are reproduced with the linking functions of +, −,×, , , √ , log . The fitness functions used as the selection criteria are root mean square error (RMSE), correlation coefficient (CC) and mean absolute error (MAE). All models were selected with a fair assessment by considering stopping condition of 5000 generations. Figure 2 shows the scheme of the process with GEP method.

Multiple Linear Regression
The common application of the linear regression (LR) modeling method is to model the goal variable (Y) with a single input variable (X). Multiple linear regression (MLR) describes the relationship between the goal value (Y) and the number of input variables (X1, X2, …, Xn). In other words, MLR finds a relationship with a linear combination of independent explanatory variables (e.g., reconstructed phase space with a different number of embedding dimensions for recorded

Multiple Linear Regression
The common application of the linear regression (LR) modeling method is to model the goal variable (Y) with a single input variable (X). Multiple linear regression (MLR) describes the relationship between the goal value (Y) and the number of input variables (X 1 , X 2 , . . . , X n ). In other words, MLR finds a relationship with a linear combination of independent explanatory variables (e.g., reconstructed phase space with a different number of embedding dimensions for recorded values of water consumption) and response variable (time ahead values of water consumption) by For i = n recorded values. Y i is the output consumption value, X i is the explanatory input variables, dependent upon the number of embedding dimensions, where α 0 is constant, α p are slope coefficients for each input variables and is the unexplained residual value.

Models Selection Criteria
Correlation Coefficient (CC), root mean square error (RMSE) and mean absolute error (MAE) are fitness functions considered in this study to evaluate model performance and accuracy.
Variables in the above equations include the number of values, N t , and the recorded and forecasted values, R and F, respectively. R and F are the mean values of the recorded and forecasted water consumption, respectively. The range of CC is −1 and 1, where larger positive value of CC and smaller value for RMSE and MAE indicates better model performance in forecasting accuracy.

Test Case
Water consumption data from the City of Kelowna (British Columbia, Canada) were considered as a test case for this study. The water supply for the region comes from various resources, which include Lake Okanagan, Mission Creek, Mill Creek, Scotty Creek, Hydraulic Creek and numerous wells [105]. There is one primary distribution system that services 99% of the population via Poplar Point, Eldorado (seasonal intake uses only utilized during peak consumption) and Cedar Creek pump stations [105] plus Swick road pump station which services approximately 300 residents. This study considered Poplar Point, Eldorado and Cedar Creek stations. The SCADA system obtains information remotely from sensors installed at the intake locations. The software platform facilitates tracking of historical system performance for auditing and future decision making to optimize the system [105]. This study used six temporal scales of the consumption, including daily, 2-, 4-, 7-, 14-and 30 days from 1st of January 2010 to 30th December 2016. The test data are the whole scale of consumption values as the spatial scale for City of Kelowna. Table 1 shows the characteristics of the dataset in the test case.

Results
Lag Time: Figure 3a,b presents the time variation of water consumption dataset for a six-year period (from January 2011 to December 2016) and the average 24-h consumption based on City of Kelowna's Utility report. The data were parsed into hourly maximum, minimum, and mean values for a 24-h period (Figure 3b). Figure 3b also presents a boxplot of the total mean, minimum, and maximum consumption value distributions for the six years period. Figure 3b indicates that the average consumption and minimum values have low frequency, demonstrating the highly deterministic behavior of the data. To calibrate the models, the data splinted into two folds; (1) 80% for training period (1st January 2016 to 31st December 2015); (2) 20% for test period (1st January 2016 to 31st December 2016). Water consumption estimation and forecasting of the average and minimum values are not as complex as compared to the peak consumption values. However, the maximum values exhibit a non-linear behavior which does not follow a specific pattern (i.e., appears to be random) unlike average and minimum values. Further, the maximum consumption values in a water distribution system are very important. This is because of estimation of peak consumption to supply customers' consumption and optimize WDS pipeline to make WDS more reliable, such as managing pipeline failures, improving peak pressure, reducing leakage, etc.
Water 2020, 12, x FOR PEER REVIEW 10 of 23

Results
Lag Time: Figure 3a,b presents the time variation of water consumption dataset for a six-year period (from January 2011 to December 2016) and the average 24-h consumption based on City of Kelowna's Utility report. The data were parsed into hourly maximum, minimum, and mean values for a 24-h period (Figure 3b). Figure 3b also presents a boxplot of the total mean, minimum, and maximum consumption value distributions for the six years period. Figure 3b indicates that the average consumption and minimum values have low frequency, demonstrating the highly deterministic behavior of the data. To calibrate the models, the data splinted into two folds; (1) 80% for training period (1st January 2016 to 31st December 2015); (2) 20% for test period (1st January 2016 to 31st December 2016). Water consumption estimation and forecasting of the average and minimum values are not as complex as compared to the peak consumption values. However, the maximum values exhibit a non-linear behavior which does not follow a specific pattern (i.e., appears to be random) unlike average and minimum values. Further, the maximum consumption values in a water distribution system are very important. This is because of estimation of peak consumption to supply customers' consumption and optimize WDS pipeline to make WDS more reliable, such as managing pipeline failures, improving peak pressure, reducing leakage, etc. Average mutual information (AMI) was used to identify the proper lag time in this study. Figure  4a presents the first local minimum lag time of τ = 17 days for daily time series as a function of lag time. The first minimum values of AMI were at lag times of 17, 12, 10, 6, 3 and 2 for daily, 2-, 4-, 7-, 14-and 30-days (monthly) data series of the water consumption, respectively. Table 2 summarizes the results for all other timescales. By using the lag time of τ = 17 days, the phase-space was reconstructed, as presented in Figure 4b for daily scale. The vertical axis shows the time series and the other two axes show the same time series delayed by 17-(τ) and 34-days (2τ). This figure shows reconstructions in three dimensions; the projection attractor on the plane , , with the lag time of 17 days. When comparing the two PSR graphs in Figure 4b, the presence of attractors becomes clear for τ = 17 days (black line) comparing to τ = 1 (blue dots). Indeed, the phase space is more spread for 1-day lag time and more concentrated on attractors for 17-days lag time. Based on the figure, PSR can make a consistent set of inputs for the model that makes better results in comparison with other sets having less consistency (e.g., peak consumption represented in Figure 3b). Average mutual information (AMI) was used to identify the proper lag time in this study. Figure 4a presents the first local minimum lag time of τ = 17 days for daily time series as a function of lag time. The first minimum values of AMI were at lag times of 17, 12, 10, 6, 3 and 2 for daily, 2-, 4-, 7-, 14-and 30-days (monthly) data series of the water consumption, respectively. Table 2 summarizes the results for all other timescales. By using the lag time of τ = 17 days, the phase-space was reconstructed, as presented in Figure 4b for daily scale. The vertical axis shows the time series and the other two axes show the same time series delayed by 17-(τ) and 34-days (2τ). This figure shows reconstructions in three dimensions; the projection attractor on the plane x i , x i+τ , x i+2τ with the lag time of 17 days. When comparing the two PSR graphs in Figure 4b, the presence of attractors becomes clear for τ = 17 days (black line) comparing to τ = 1 (blue dots). Indeed, the phase space is more spread for 1-day lag time and more concentrated on attractors for 17-days lag time. Based on the figure, PSR can make a consistent set of inputs for the model that makes better results in comparison with other sets having less consistency (e.g., peak consumption represented in Figure 3b). Correlation Dimension: Figure 5a plots the results for correlation integral versus log (r) for different dimensions (m) in the range of 1 to 20 for daily consumption time series. The correlation exponents, Ce (m), were determined and the results were plotted in Figure 5b to evaluate the presence of chaos in the dataset. As demonstrated in Figure 5b, the correlation exponent increases with the embedding dimension up to a certain value and then remains steady. The figure reveals that the slope of larger embedding would become constant for all temporal scales. The embedding dimension of 17 appears to explain the dynamics of the system (1-and 2-day consumption). The correlation exponents of 3.5, 3.37, 3.74, 3.94, 3.83 and 3.49 were determined for daily, 2-, 4-, 7-, 14-and 30-day, respectively. Regarding the concept of PSR, rounding up the correlation exponent to the nearest integer informs the number of dominant variables. Thus, it can be interpreted as the number of variables that dominantly govern the temporal dynamics of water consumption, which is approximately 4 for all temporal scales. Table 2 presents the results of lag time and correlation exponent for all temporal scales. Moreover, the condition of Ce < 2 logN (with N as the number of data point) was satisfied for the chaotic temporal scales [106].  Correlation Dimension: Figure 5a plots the results for correlation integral versus log (r) for different dimensions (m) in the range of 1 to 20 for daily consumption time series. The correlation exponents, C e (m), were determined and the results were plotted in Figure 5b to evaluate the presence of chaos in the dataset. As demonstrated in Figure 5b, the correlation exponent increases with the embedding dimension up to a certain value and then remains steady. The figure reveals that the slope of larger embedding would become constant for all temporal scales.    The embedding dimension of 17 appears to explain the dynamics of the system (1-and 2-day consumption). The correlation exponents of 3.5, 3.37, 3.74, 3.94, 3.83 and 3.49 were determined for daily, 2-, 4-, 7-, 14-and 30-day, respectively. Regarding the concept of PSR, rounding up the correlation exponent to the nearest integer informs the number of dominant variables. Thus, it can be interpreted as the number of variables that dominantly govern the temporal dynamics of water consumption, which is approximately 4 for all temporal scales. Table 2 presents the results of lag time and correlation exponent for all temporal scales. Moreover, the condition of C e < 2 logN (with N as the number of data point) was satisfied for the chaotic temporal scales [106].
Nonlinear Local Approximation and Largest Lyapunov Exponent: The first objective in this research was to investigate the impact of PSR in the accuracy of forecasted values. Forecasted values were evaluated for the embedding dimensions (ranging from 2 to 10) at lag times 1 and 17 days to forecast 1-day-ahead and highlight the effect of embedding dimension and performance of models. For m = 2, for a lag time of 1 day, two variables D t and D t−1 , and for m = 2, for a lag time of 17 days, two variables D t and D t−17, were used as input variables to forecast 1-day ahead (D t+1 ). For m = 3, for the lag time of 1 and 17 days, three variables D t , D t−1 , D t−2 and D t , D t−17 , D t−34 were used as input variables to predict 1-day-ahead. Moreover, phase space was reconstructed for m > 3. Table 3 presents a summary of the 1-day ahead forecasted values that reconstructed phase-space in dimensions ranging from 2 to 20 for τ = 1 and 17 days. The overall average of fitness values for all embedding dimensions CC > 0.96, RMSE < 4200 (m3/day) (8% of average daily consumption) and MAE < 49. These results can be considered as reasonable performance of the model. Table 3 Figure 6 compares the NLA with observed data using bolded values in Table 3.
Water 2020, 12, x FOR PEER REVIEW 12 of 23 Nonlinear Local Approximation and Largest Lyapunov Exponent: The first objective in this research was to investigate the impact of PSR in the accuracy of forecasted values. Forecasted values were evaluated for the embedding dimensions (ranging from 2 to 10) at lag times 1 and 17 days to forecast 1-day-ahead and highlight the effect of embedding dimension and performance of models. For m = 2, for a lag time of 1 day, two variables Dt and Dt−1, and for m = 2, for a lag time of 17 days, two variables Dt and Dt−17, were used as input variables to forecast 1-day ahead (Dt+1). For m = 3, for the lag time of 1 and 17 days, three variables Dt, Dt−1, Dt−2 and Dt, Dt−17, Dt−34 were used as input variables to predict 1-day-ahead. Moreover, phase space was reconstructed for m > 3. Table 3 presents a summary of the 1-day ahead forecasted values that reconstructed phase-space in dimensions ranging from 2 to 20 for τ = 1 and 17 days. The overall average of fitness values for all embedding dimensions CC > 0.96, RMSE < 4200 (m3/day) (8% of average daily consumption) and MAE < 49. These results can be considered as reasonable performance of the model. Table 3 reveals the optimum embedding dimension for the most accurate forecasted value in bold. Using CC, RMSE, and MAE, the optimum embedding dimension (mopt) was found to be 18 and 19 for the lag time of 1 and 17 days, respectively. The results of nonlinear local approximation (NLA) identify the closest results of correlation exponent in optimum embedding dimension (m = 17 for correlation dimension of daily value). The similar values for the embedding dimension of daily values imply that m = 17 or 18 is the optimum dimension of the system to be used as the models' input. The second objective was to evaluate the performance of selected m and τ. The optimum models for m = 18, τ = 1 and m = 19, τ = 17 have been applied to forecast 2-, 4-, 7-, 14-, 30-and 60-day lead time water consumption. Figure 6 compares the NLA with observed data using bolded values in Table 3.    Three different values of λ max are considered because of the embedding dimensions range that were determined by CD, FNN, NLA and PSR-NLA, which were between 17 and 19. Note that m = 18 and 19 were determined by NLA and PSR-NLA, respectively, which gave a higher accuracy than the other embedding dimensions (see Table 3). The forecasting horizons (∆t = 1/λ max ) were 72, 98 and 122 days for m = 17, 18 and 19, respectively. Moreover, m = 17 with forecasting horizon value of 72 was determined by CD. Figure 7b shows the frequency of the data is smooth from day 1 to 72. Even with the high data frequency between days 98 and 122, PSR-NLA increased the forecasting horizon from 98 days (NLA) to 122 days. Considering the results of CD, NLA, PSR-NLA, and forecasting horizon, it is reasonable to select m = 17 as the system's optimum dimension.

Phase Space Reconstructed GEP (PSR-GEP) and Multiple Linear Regression (MLR) Models:
One of the most important steps in developing an accurate model is the selection of the input variables. Input variable selection challenges the effect of the number of inputs in models' performance. In other words, there are diminishing returns on performance based on the number of input variables selected [42]. The combinations were selected in a way that included daily consumption data with lag times of τ = 1 and 17 days. Different combinations of the time series of daily consumption were used to structure a policy for input dataset criteria. Combinations of D t , D t−1 , D t−2 , . . . , D t−20 variables were used as input data with D t+1 (1-day time ahead) as output of the GEP and MLR model and combinations of D t , D t−τ , D t−2τ , . . . , D t−20τ variables were used as input data with D t+1 as output of the PSR-GEP and PSR-MLR model (forecasting of 1 day time ahead). The combinations of arithmetic functions of +, −, ×, x, x 2 , √ x, log x and {+, −, ×,} were used for PSR-GEP and PSR-MLR models, respectively. Ultimately, the best combination was selected using the criteria of CC, RMSE, and MAE. Table 4 reveals the 20 combinations of inputs and their performance with GEP and PSR-GEP models. In the table, the three criteria indicate the fourth combination of inputs (m = 4) with the lag time of 1-day, and (m = 8) with the lag time of 17 days as the best combinations of input data (reconstructed phase space) for GEP and PSR-GEP. The study reveals the following relationship for both GEP and PSR-GEP, respectively: The results were not found to be more sensitive to lag time calculated by AMI (17 days) than to the 1-day lag time. However, PSR made the performance of the model better than m = 4 with 1-day lag time. Equation (14) used all four variables made by m = 4 (D t−1 , D t−2 , D t−3 , D t−4 ), while equation (15) only used two of the variables out of 8 variables (m = 8; D t−τ , D t−4τ ). Table 4 compares further details for GEP and PSR-GEP models for different lead times forecasted values. However, the results of both GEP and PSR-GEP are satisfactory, although they do not forecast as accurately as NLA and PSR-NLA. Table 5 presents the results for the MLR model that was used as a test to compare the results of the techniques studied. The application of PSR in improving accuracy is shown in the table. Surprisingly, the best embedding dimension for MLR is 17, which was identified by CD and NLA. Additionally, as expected, PSR-MLR results are more accurate than MLR when forecasting for different lead times.

Discussion
A steady value of the correlation exponent in a certain embedding dimension is an indication of chaotic behavior in all six temporal scales. The correlation exponent became steady after an embedding dimension of 17 for the temporal scales. The correlation exponent results show that the 17th embedding dimension may be considered the optimum embedding dimension for this system; however, we conducted an additional investigation to better understand the optimum embedding dimension for the test case. Moreover, the saturation and constant slope of correlation exponent for all temporal scales revealed a highly chaotic behaviour exhibited by the whole system. Many studies that investigated the chaotic behaviour of the system concluded that high-resolution timescales demonstrate chaotic behaviour while the low-resolution timescale of the same time series does not exhibit chaotic behaviour.
Regarding the fractal law, it remains uncertain whether a chaotic system should have chaotic behaviour in all temporal scales; we speculate that this is an important factor to consider under this application. Based on the presented results, this study suggests that to investigate the availability of chaos in any dataset, investigation for different temporal scales is needed to indicate if the system has a chaotic behaviour, as this conclusion supports the findings presented by other researchers [54]. The present investigation of chaotic behaviour in different time scales using the test case provides enough information to evaluate the objectives of this research. However, additional study is required to make definitive conclusions regarding the findings of this study and more generally, the chaotic behaviour of consumption data. Nevertheless, the study findings provide encouraging evidence to further investigate the chaotic behaviour of water consumption values over different time scales of a dataset.
The results indicate that forecasting was more sensitive to optimum embedding dimension and lag time calculated by AMI than to the reconstructed phase space by 1-day lag time. However, the difference between the results of NLA and PSR-NLA was not considerable; the performance of PSR-NLA was better than NLA to forecast consumption values in different lead time. Moreover, m opt = 19 provided more accurate predictions than m opt = 18 (Table 3). Figure 8 compares observed values to the forecasted results, which highlights the negligible error in the results of PSR-NLA and GEP during the test period. Regarding the average fitness results for all dimensions, the performance of GEP is better than PSR-GEP, while the application of PSR is shown in forecasting with equal dimension and different lead times. Figure 8a shows the greatest accuracy from the results shown by PSR-NLA compared to the other models. Figure 8b shows the value of residual for forecasted consumption by PSR-NLA in two forecasting horizons. The performance of PSR-NLA in forecasting high-frequency values is shown in Figure 8b. The figure reveals that the optimum embedding dimension of 19 was more acceptable than the embedding dimension of 17 determined by the correlation function ( Figure 5b). demonstrate chaotic behaviour while the low-resolution timescale of the same time series does not exhibit chaotic behaviour.
Regarding the fractal law, it remains uncertain whether a chaotic system should have chaotic behaviour in all temporal scales; we speculate that this is an important factor to consider under this application. Based on the presented results, this study suggests that to investigate the availability of chaos in any dataset, investigation for different temporal scales is needed to indicate if the system has a chaotic behaviour, as this conclusion supports the findings presented by other researchers [54]. The present investigation of chaotic behaviour in different time scales using the test case provides enough information to evaluate the objectives of this research. However, additional study is required to make definitive conclusions regarding the findings of this study and more generally, the chaotic behaviour of consumption data. Nevertheless, the study findings provide encouraging evidence to further investigate the chaotic behaviour of water consumption values over different time scales of a dataset.
The results indicate that forecasting was more sensitive to optimum embedding dimension and lag time calculated by AMI than to the reconstructed phase space by 1-day lag time. However, the difference between the results of NLA and PSR-NLA was not considerable; the performance of PSR-NLA was better than NLA to forecast consumption values in different lead time. Moreover, mopt = 19 provided more accurate predictions than mopt = 18 (Table 3). Figure 8 compares observed values to the forecasted results, which highlights the negligible error in the results of PSR-NLA and GEP during the test period. Regarding the average fitness results for all dimensions, the performance of GEP is better than PSR-GEP, while the application of PSR is shown in forecasting with equal dimension and different lead times. Figure 8a shows the greatest accuracy from the results shown by PSR-NLA compared to the other models. Figure 8b shows the value of residual for forecasted consumption by PSR-NLA in two forecasting horizons. The performance of PSR-NLA in forecasting high-frequency values is shown in Figure 8b. The figure reveals that the optimum embedding dimension of 19 was more acceptable than the embedding dimension of 17 determined by the correlation function (Figure 5b).  The results in Figure 8b cannot reject the optimum embedding dimension that was determined from other methods in this study. It can be concluded that the results of the 17th dimension are more reliable than those of the 19th dimension. However, the 19th dimension will offer a longer forecasting horizon. Therefore, based upon the primary goal of modeling, either dimension may be a candidate for optimum forecast modeling. The importance of embedding dimension is demonstrated when considering the small difference among the forecasted values by different embedding dimensions in all models. Therefore, it is recommended to test the optimum embedding dimension by various methods. However, the model results are quite similar; the overview of approximation for all the techniques for multiple dimensions concluded that the optimum embedding dimension is between m = 17 and m = 19, demonstrating reliability of the methods. To compare the models, Table 6 shows the comparison of statistics of each forecasted value with observed data. The check marks belong to the models where the results are closer to the observed values' statistics. The results reveal the PSR technique had a positive impact on the accuracy of all models at all temporal scales, especially in high forecasting horizons. As Figure 9 shows, the performance of NLA and PSR-NLA was approximately similar for daily and 2-days ahead forecasting, but when considering the lead time with longer period, PSR-NLA was more accurate than NLA in lead time of 1-and 2-months ahead. The value of correlation coefficient for NLA and PSR-NLA shows 43% improvement. Following CC, the values of RMSE and MAE showed the performance of PSR-NLA was better than NLA in forecasting for longer lead time. Nevertheless, the slope of the fitted line for the fitness functions shows that PSR-NLA can forecast more accurate values in comparison with NLA. The results of this application of PSR may also serve useful for application in long-term forecasting, which was previously identified in the literature as a topic requiring further study in drinking water consumption modeling.

Conclusions
This study presented a novel approach to improve the accuracy of models in forecasting residential daily water consumption values. The residential water consumption dataset from the City of Kelowna (British Columbia, Canada) was used as a test case to forecast future consumption values using varying lead times under different temporal scales to identify models which may improve forecasting performance. A chaos approach based nonlinear local approximation was compared with previously studied forecasting methods to assess and consider the applicability of the chaotic behavior of urban consumption values to better improve forecasting performance. To investigate the existence of chaos in test data, different temporal scales were considered. The results showed all temporal scales, daily at 2-, 4-, 7-, 14-and 30-days, have chaotic behavior with similar correlation exponent values for all scales. The value of the largest Lyapunov Exponent showed the reliable time period for the models as a forecasting horizon. Dynamic investigation of explanatory variables and phase space reconstruction of the input variables based on optimum correlation dimension was considered to improve the accuracy in various periodicity and lead-times. The findings suggest that considering the chaotic behavior of consumption values by taking the forecasting horizon into account, will give insight about the behavior of data. It was recommended to investigate the performance of the models in forecasting future values with different lead-times. Considering the fitness values for the models by different lead-times within the forecasting horizon, substantial unstudied criteria remain to organize different techniques based on the behavior of the models input data. Moreover, the high results of the optimum embedding dimension in reconstructed phase space

Conclusions
This study presented a novel approach to improve the accuracy of models in forecasting residential daily water consumption values. The residential water consumption dataset from the City of Kelowna (British Columbia, Canada) was used as a test case to forecast future consumption values using varying lead times under different temporal scales to identify models which may improve forecasting performance. A chaos approach based nonlinear local approximation was compared with previously studied forecasting methods to assess and consider the applicability of the chaotic behavior of urban consumption values to better improve forecasting performance. To investigate the existence of chaos in test data, different temporal scales were considered. The results showed all temporal scales, daily at 2-, 4-, 7-, 14-and 30-days, have chaotic behavior with similar correlation exponent values for all scales. The value of the largest Lyapunov Exponent showed the reliable time period for the models as a forecasting horizon. Dynamic investigation of explanatory variables and phase space reconstruction of the input variables based on optimum correlation dimension was considered to improve the accuracy in various periodicity and lead-times. The findings suggest that considering the chaotic behavior of consumption values by taking the forecasting horizon into account, will give insight about the behavior of data. It was recommended to investigate the performance of the models in forecasting future values with different lead-times. Considering the fitness values for the models by different lead-times within the forecasting horizon, substantial unstudied criteria remain to organize different techniques based on the behavior of the models input data. Moreover, the high results of the optimum embedding dimension in reconstructed phase space revealed that to improve the models, considering a high number of input variables is not beneficial. Therefore, this research suggested investigating the dynamic behavior of explanatory variables in modeling and forecasting problems to improve the reliability and accuracy of the water consumption models.
Author Contributions: P.Y. Analyzed the data, defined the models, investigated the accuracy of the models and wrote the first draft of the manuscript; G.C. contributed in writing and analyzing; conceived and designed the experiments; G.N. contributed in writing and editing, analyzing the data and models; H.M. contributed in writing and editing. All authors have read and agree to the published version of the manuscript.
Funding: This research received no external funding.