Urban Water Demand Prediction for a City that Suffers from Climate Change and Population Growth: Gauteng Province case study. Water.

The proper management of municipal water systems is essential to sustain cities and support the water security of societies. Estimating urban water demand has always been a challenging task for managers of water utilities and policymakers. This paper applies a novel methodology that combines data pre-processing with an Artificial Neural Network (ANN) optimized with the Backtracking Search Algorithm (BSA-ANN) to estimate monthly water demand from previous water consumption. Historical data of monthly water consumption in the Gauteng Province, South Africa, for the period 2007–2016 were selected for the creation and evaluation of the methodology. Data pre-processing techniques played a crucial role in enhancing the quality of the data before creating the prediction model. The BSA-ANN model yielded the best result, with a root mean square error of 0.0099 megaliters and a coefficient of efficiency of 0.979. It also proved more efficient and reliable than the ANN optimized with the Crow Search Algorithm (CSA-ANN), based on the scale of error. Overall, this paper presents a new application of the hybrid BSA-ANN model, which can be successfully used to predict water demand with high accuracy in a city that suffers heavily from the impact of climate change and


Urban water security is essential for a resilient environment in smart cities, particularly under the stress of climate change and socio-economic factors [1,2]. Also, cities located close to water resources are driven by all kinds of industries; hence, water scarcity is considered a classic problem for decision-makers [3,4]. Gradual changes in freshwater resources have been observed since the last century [5]. Recent studies have shown that climate change plays a key role in freshwater resources due to the potential decrease in rainfall [6]. Specifically, it has been shown that climate change adversely impacts freshwater resources in the centre of cities, which in turn impacts the sustainable development of water availability and, consequently, socio-economic activities [7]. In addition, several studies have shown that freshwater resources are generally adversely affected by pollution [8,9].

Different regions of the world have been facing water scarcity, which implies that the gap between water supply and demand is likely to increase in the future. The European Environment Agency reported in 2010 that municipal water consumption is driven by complicated interactions between anthropogenic and natural system factors at multiple spatial and temporal scales [10,11,12]. In the Gauteng Province, Republic of South Africa, the municipal water delivered has been less than the demand. This imbalance is due to the impact of climate change and rainfall reduction, as well as human-related factors such as economic expansion and population growth. The lack of freshwater resources and the increase in water demand have put pressure on the municipal water supply system. Hence, predicting water demand is an important and effective approach for optimizing the operation and management of the system, or for planning future expansion or reduction under the variability of climate and socio-economic factors [2,13,14].

House-Peters and Chang [15], Donkor et al. [16], Ghalehkhondabi et al. [17] and de Souza Groppo et al. [18] stated that different methods and models have been applied in previous studies to predict municipal water demand, including traditional, Artificial Intelligence (AI), and hybrid AI models. Traditional models, such as time-series analysis and regression [19,20], were the first to be employed in water demand simulation. However, traditional approaches lacked accuracy when forecasting water demand, which can cause significant issues in the operation and management of the water supply system. Additionally, the growing impact of climate change and urbanization causes high uncertainty, making prediction and forecasting more complex, which has also motivated researchers to further develop their models [21], including through the use of AI techniques.

Data-driven techniques have wide-ranging applications, such as wastewater [22,23]

From the application viewpoint, another significant consideration is the selection of the best model inputs that drive the dependent variable [50,51]. Several techniques have been applied in different studies, such as principal component analysis (PCA) [52,53], variance inflation factor (VIF) [21,35] and mutual information (MI) [54,55]. In this study, the mutual information technique is used to select the best scenario of model inputs from several historical observed water consumption data.
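As an illustration of how mutual information can rank candidate lags of a consumption series, the following is a minimal sketch. The histogram (plug-in) estimator, the choice of 8 bins, the helper names and the synthetic seasonal series are all assumptions made for this example; they are not the estimator or the data used in the study.

```python
import math
import random
from collections import Counter


def make_lagged(series, lag):
    """Pair each value with the value `lag` steps earlier: (x[t-lag], x[t])."""
    return series[:-lag], series[lag:]


def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of I(X;Y) in nats for two equal-length lists."""
    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # avoid zero width for constant input
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = to_bins(x), to_bins(y)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) written with raw counts
        mi += p_xy * math.log(p_xy * n * n / (px[i] * py[j]))
    return mi


# Synthetic monthly series with a 12-month cycle (illustrative only).
random.seed(0)
demand = [100 + 10 * math.sin(2 * math.pi * t / 12) + random.gauss(0, 1)
          for t in range(120)]

# Score a few hypothetical candidate lags; higher MI suggests a more informative input.
scores = {lag: mutual_information(*make_lagged(demand, lag))
          for lag in (1, 2, 3, 6, 12)}
for lag, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"lag {lag:2d}: MI = {score:.3f} nats")
```

A plug-in estimate of this kind is biased upwards for short series, so in practice the scores are better used to compare lags with each other than as absolute values.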

According to the literature review, another significant consideration is that most studies focus on short-term water demand estimation, while only a few deal with medium- to long-term prediction. Lately, various studies such as [33,56,57,58] have employed historical water consumption data as a single input in their short-term prediction models.

However, a challenge still exists for managers of water utilities and policymakers because of the uncertainty about the capacity of the water system under potentially rapid growth in urban water demand as a consequence of socio-economic, demographic and climate factors. Also, as mentioned previously, only a few studies have considered medium-term municipal water demand based on previous water consumption. These problems motivated us to propose an approach that refines the existing approaches, providing managers with scientific, more accurate insights about future water demand and reducing the uncertainty.

The main objectives of this research study are:

Based on the literature review, this research is thought to be the first study to use this novel combined methodology, which includes data pre-processing and automated machine learning, to forecast municipal water demand using several lagged values of water consumption as model input.

As such, the approach accounts for the effect of all climate, demographic and socio-economic factors.

Gauteng province is the economic powerhouse of the Republic of South Africa, which has eight metropolitan municipalities. This city has faced water stress resulting from climate change, with the average annual rainfall below the world's average of 363 mm, and from human-related factors such as population growth and economic expansion. More than 60% of the population lives in the urban regions of South Africa, and Gauteng province receives most of the migrants in this country. For this city, it is anticipated that water demand will outstrip the water delivered by 2025. For more than a century, the company Rand Water has delivered municipal water to more than 9 million people and

Historical monthly data of municipal water consumption (in megaliters, ML) over ten years, from 2007 to 2016, were provided by Rand Water and used to build and assess the model. Two pre-tests were applied to these data using the SPSS (24) package: the Kolmogorov-Smirnov test to assess normality and a box-whisker test to check for outliers. The results showed that the data are normally distributed (the significance value is 0.2 > 0.05) and free of outliers (the data lie within ±1.5 IQR). These results increase confidence in the quality of the data received from the company. Figure 1 shows the municipal water consumption: a) monthly time
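The box-whisker outlier check described above can be sketched as follows. This is a minimal illustration, assuming the usual fences at 1.5 IQR beyond the first and third quartiles; the function names are hypothetical, and the quartile convention (Python's "inclusive" method) may differ slightly from the one SPSS applies.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return the (lower, upper) box-whisker fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """List the values that fall outside the box-whisker fences."""
    low, high = iqr_fences(values, k)
    return [v for v in values if v < low or v > high]


# Illustrative numbers only, not the Rand Water data.
clean = [10, 11, 12, 13, 14, 15, 16, 17, 18]
print(find_outliers(clean))         # no value lies outside the fences
print(find_outliers(clean + [40]))  # the extreme value 40 is flagged
```

A series is treated as free of outliers when `find_outliers` returns an empty list.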

The proposed methodology can be divided into four parts: data pre-processing, Artificial Neural Network, Backtracking Search Algorithm and model evaluation.

Artificial Neural Network (ANN)
ANN is a method inspired by the way the human brain processes data, and it emulates the brain's functionality by using operations and connectivity similar to those of a biological neural system [29,30,68].

Recently, ANN models have been widely utilised in water resources and hydrology applications because of their ability to extract the complex nonlinear relationships that exist within hydrological data [30,31].

In this study, the multilayer perceptron (MLP) is applied to simulate municipal water demand.

MLP has been frequently and successfully used for forecasting in water resources and hydrology applications. Its architecture and hyperparameters (as shown in Table 1) [...] simulated urban water reaches its minimum. The data were split randomly into three sets: 70% for training, 15% for testing and 15% for validation, as previously done by Zubaidi et al. [21] and Zubaidi et al. [35]. As in Gharghan et al. [36], cross-validation was used to ensure the generalization capabilities of the model and avoid overfitting, and the stopping criterion for training used the root mean square error (RMSE) as an objective function (i.e., an error no greater than the RMSE value in the testing stage). This procedure was also used successfully by Zubaidi et al. [37,48].
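The random 70/15/15 split and the RMSE objective can be sketched as below. This is only an illustration under stated assumptions: the function names, the fixed seed and the rounding of the subset sizes are choices made for the example, so the resulting subset sizes need not match those reported in the study.

```python
import math
import random


def split_70_15_15(samples, seed=0):
    """Shuffle the samples and split them into training, testing and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(0.70 * n)
    n_test = round(0.15 * n)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains (about 15%)
    return train, test, validation


def rmse(observed, simulated):
    """Root mean square error between two equal-length sequences."""
    squared = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))


# Example with 116 placeholder cases (indices stand in for input/target pairs).
train, test, validation = split_70_15_15(range(116))
print(len(train), len(test), len(validation))
print(rmse([0.0, 0.0], [3.0, 4.0]))
```

During training, `rmse` would be evaluated on the testing set after each update so that training can stop once the error no longer improves.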
where F controls the amplitude of the search-direction matrix. It can be obtained by applying Equation (6), where randn is a standard normal random number.
where mixrate is the mix-rate parameter, which controls the number of elements that will be altered.

A boundary control mechanism is applied using Equation (9)
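The mutation, crossover and boundary-control steps described above can be illustrated with one simplified BSA generation. This sketch follows the commonly published form of the algorithm and rests on several assumptions: the amplitude is taken as F = 3·randn, out-of-range elements are regenerated uniformly within the bounds, a single scalar bound pair is used for all dimensions, and the sphere function is a hypothetical stand-in for the fitness. It is not the authors' implementation.

```python
import math
import random


def bsa_generation(pop, old_pop, fitness, low, high, mixrate=1.0, rng=random):
    """Run one simplified BSA generation (minimisation); return (new_pop, old_pop)."""
    dim = len(pop[0])
    # Selection-I: occasionally refresh the historical population, then shuffle it.
    if rng.random() < rng.random():
        old_pop = [ind[:] for ind in pop]
    old_pop = old_pop[:]
    rng.shuffle(old_pop)

    # Mutation amplitude, assumed here to be F = 3 * randn.
    f = 3.0 * rng.gauss(0.0, 1.0)
    new_pop = []
    for ind, old in zip(pop, old_pop):
        mutant = [x + f * (o - x) for x, o in zip(ind, old)]
        # Crossover: mixrate limits how many elements take the mutant value.
        if rng.random() < rng.random():
            k = min(dim, math.ceil(mixrate * rng.random() * dim))
            altered = set(rng.sample(range(dim), k))
        else:
            altered = {rng.randrange(dim)}
        trial = [mutant[j] if j in altered else ind[j] for j in range(dim)]
        # Boundary control: regenerate any element that left the search space.
        trial = [t if low <= t <= high else rng.uniform(low, high) for t in trial]
        # Selection-II: keep the trial only if it improves the fitness.
        new_pop.append(trial if fitness(trial) < fitness(ind) else ind)
    return new_pop, old_pop


def sphere(v):
    """Hypothetical test objective: sum of squares, minimised at the origin."""
    return sum(x * x for x in v)


rng = random.Random(1)
pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(20)]
old = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(20)]
start_best = min(map(sphere, pop))
for _ in range(200):
    pop, old = bsa_generation(pop, old, sphere, -5, 5, rng=rng)
best = min(map(sphere, pop))
print(start_best, best)
```

When the algorithm is used to tune an ANN, each individual would encode candidate hyperparameters and the fitness would be the prediction error of the network trained with them.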

More details about the BSA algorithm can be found in Civicioglu [73]. In our research study, we

Boxplots of the normalised and denoised data are shown in Figure 3. It can be seen that neither dataset contains outliers. Additionally, both have almost the same median and upper and lower quartiles, while the upper and lower extremes of the denoised data are smaller than those of the normalised data because of noise elimination. Moreover, the distribution of the denoised data is closer to a normal distribution than that of the normalised data.

In this research, the number of cases is N = 116, which is more than the 82 needed and therefore complies with the proposition of Tabachnick and Fidell [61].
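The threshold of 82 is consistent with the widely quoted rule of thumb N ≥ 50 + 8m, where m is the number of model inputs. Both the form of the rule and the value m = 4 are assumptions made here because they reproduce the stated figure; the text above does not give them.

```latex
N \ge 50 + 8m, \qquad m = 4 \;\Rightarrow\; N_{\min} = 50 + 8 \times 4 = 82 \le 116
```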

After the data pre-processing methods were applied, the data were split into three datasets (training, testing and validation), as presented in Table 2. The table lists four statistical measures for each dataset, namely maximum consumption (Cmax), minimum consumption (Cmin), mean consumption (Cmean) and standard deviation (Cstd), together with the total sample size of each dataset (T). The outcomes show that all sets have broadly the same characteristics.

The ANN technique was designed to estimate the effect of using the BSA algorithm in conjunction with the ANN, and to validate the results of the combined model. Consequently, extensive trial-and-error scenarios were implemented to determine the ANN model's factors (LR, N1, and N2) that offer the most precise prediction. The outcomes show that the values of LR, N1, and N2 are 0.3, 7, and 10, respectively.

To explore the capability and accuracy of the combined model for generalization, the coefficient of determination (R²) was estimated between the observed and simulated water demand for the training, testing and validation sets, as presented in Figure 6. The measured municipal water consumption is shown on the x-axis and plotted against the simulated water demand on the y-axis. Also, the dataset of the testing stage was employed to plot a regression calibration curve between the observed and simulated water consumption time series, with a 95% confidence interval (CI). The figure shows neither irregular data nor a particular trend pattern, and it shows high levels of consistency between the observed and simulated data. Also, the hybrid model yielded significant R² values of 0.97, 0.97, and 0.98 for the training, testing, and validation datasets, respectively. These results support the capability of the BSA-ANN model to accurately generalise to unseen data (i.e. a dataset that was not considered in the training and testing stages).
Figure 6. The performance of the combined model in the training, testing and validation stages.
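For readers who want to reproduce this kind of check, the coefficient of determination between observed and simulated series can be computed as sketched below. The sketch assumes R² is taken as the squared Pearson correlation, which is a common convention in hydrological model evaluation; the function name and the sample numbers are illustrative.

```python
def r_squared(observed, simulated):
    """Squared Pearson correlation between observed and simulated values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov * cov / (var_o * var_s)


# Illustrative values only.
print(r_squared([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))  # perfectly linear relation
print(r_squared([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))            # weaker relation
```

Because this measure ignores systematic bias (the first example scores a perfect value although the simulated series is twice the observed one), it is usually read together with error-based criteria.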

The coefficient of determination (R²) criterion was utilised again to evaluate the accuracy of ANN

Although the coefficients of determination for the training and testing stages are slightly larger than that for the validation stage, this is not considered a problem, as was also discussed in Dawson et al. [78]. Hence, we can confidently say that this statistical criterion supports [...] Table 3. According to Dawson et al. [78], the results of these four statistical criteria indicate the ability of both models, BSA-ANN and ANN (stand-alone), to simulate municipal water demand accurately. However, the capability of the BSA-ANN model to generalise to data in the validation stage is still better than that of the ANN (stand-alone) model (e.g. CE = 0.979 for BSA-ANN is better than CE = 0.931 for the ANN (stand-alone) model).
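The coefficient of efficiency (CE) used for this comparison can be sketched as follows, assuming it is the Nash-Sutcliffe form, CE = 1 − Σ(observed − simulated)² / Σ(observed − mean observed)². The function name and the numbers are illustrative and are not taken from Table 3.

```python
def coefficient_of_efficiency(observed, simulated):
    """Nash-Sutcliffe coefficient of efficiency (1 is a perfect fit)."""
    mean_o = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - sse / sst


observed = [1.0, 2.0, 3.0, 4.0]
print(coefficient_of_efficiency(observed, observed))              # perfect simulation
print(coefficient_of_efficiency(observed, [1.5, 2.0, 3.0, 4.0]))  # small error in one value
print(coefficient_of_efficiency(observed, [2.5] * 4))             # no better than the mean
```

Under this definition a model that always predicts the observed mean scores 0, so values close to 1, such as those quoted above, indicate that the simulated series explains most of the variation in the observations.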

In this manuscript, the performance of a novel combined model that includes signal pre-treatment, mutual information and the BSA-ANN technique was assessed for estimating the monthly municipal water needed based on previous water consumption. Historical data of monthly water consumption over ten years from the Gauteng province, South Africa, were utilised to build and evaluate the predictive model. The outcomes show that data pre-processing is a crucial step that enhances the quality of the data before they are fed into the model, by denoising the time series and selecting the best scenario of [...] factors that drive water demand). Hence, these results can accurately inform Rand Water (i.e. its decision-makers and managers), helping this water utility company to better manage the existing municipal water system and to better plan for extensions in response to increasing consumption, which would lead to better service and better management of resources in the Gauteng province.

Therefore, taking into consideration all the benefits mentioned above, we recommend that additional studies be conducted in other regions with similar or different climatic and socio-economic factors, or in regions that lack data on climatic and socio-economic factors but have reliable water consumption data.

Also, based on the outputs of the current study, we recommend exploring different data pre-processing techniques and several hybrid models for simulating municipal water demand from historical water consumption in other cities of the world, because no single method surpasses all other models for predicting water demand.

Acknowledgements: The authors are grateful to the Rand Water company for providing the historical municipal water data for this study.

Author Contributions: Each of the authors contributed to the design, analysis, and writing of the study.

Conflicts of Interest: The authors declare no conflict of interest.