Water Demand Prediction Using Machine Learning Methods: A Case Study of the Beijing–Tianjin–Hebei Region in China

Predicting water demand helps decision-makers allocate regional water resources efficiently, thereby preventing water waste and shortage. The aim of this study is to predict water demand in the Beijing–Tianjin–Hebei region of North China. Explanatory variables associated with economy, community, water use, and resource availability were identified. Eleven statistical and machine learning models were built using data covering the 2004–2019 period. Interpolation and extrapolation scenarios were conducted to find the most suitable predictive model. The results suggest that the gradient boosting decision tree (GBDT) model demonstrates the best prediction performance in both scenarios. The model was further tested on three other regions in China, and its robustness was validated. Water demand for 2020–2021 was also predicted. The results show that the identified explanatory variables were effective in water demand prediction. The machine learning models outperformed the statistical models, with the ensemble models being superior to the single predictor models. The best predictive model can also be applied to other regions to help forecast water demand and ensure sustainable water resource management.


Introduction
Water scarcity is an issue for many cities globally [1]. By 2030, nearly half of the world's population is expected to live in water-stressed areas [2]. Water demand prediction is crucial for the sustainable management of water distribution systems [3]. It is also a factor that is considered by infrastructure decision-makers to ensure effective water usage plans and schedules, especially in light of the ongoing urban expansion, where consumers are encouraged to reduce their energy and resource consumption [4][5][6].
The economy of the Beijing-Tianjin-Hebei region has been developing rapidly, and the demand for urban expansion from Beijing to Tianjin and Hebei has increased. This rapid urban expansion has strained the region's ability to provide adequate water resources. Under the existing water distribution conditions, regional water resource management faces uncertainties and challenges such as water shortages, overall growth in water demand, seasonal demand peaks due to climate change, regional economic competition, and public health requirements.
This study focuses on water demand prediction in the Beijing-Tianjin-Hebei region to support water resource planning and management. Explanatory variables associated with economy, community, water use, and resource availability were derived. A historical dataset covering the period from 2004 to 2019 was collected from the Annual Water Resources Reports [7][8][9] and the China Statistics Yearbook [10]. Eleven prediction models, comprising both statistical and machine learning models, were designed. Two prediction scenarios were considered. In the interpolation prediction scenario (IPS), the training and test samples were randomly selected from the entire dataset. In the extrapolation prediction scenario (EPS), the historical dataset was used for training to predict future water demand.
Table 1. Main explanatory variables found by other studies to affect water demand.

Models for Predicting Water Demand
Choosing an appropriate water demand prediction model is a challenge because water demand depends on many factors, such as technology, population, society, economy, climate, and public policy [3,22,23]. Predictive models can be broadly divided into two groups: statistical and machine-learning models. Statistical models employ probability theory and mathematical statistics to obtain the functional relationship between different variables. Commonly used statistical models for regression include linear regression, ridge regression, and lasso regression. In contrast, machine learning models do not require the relationship between the dependent and explanatory variables to be specified in advance [5,24]. Instead, they employ algorithms such as support vector machine, decision tree, and random forest to learn patterns from training data and use them to predict future outcomes.
Statistical models are widely used for water demand prediction [25][26][27][28]. The main limitation of statistical models is that they must have a predetermined structure [29], making it difficult to find one mathematical function that would work well on different data [30]. Furthermore, statistical models often fail to effectively deal with complex data relationships; their prediction accuracy also decreases with an increase in the amount of data [31]. Other methods should be employed when dealing with big and complex data [32]. For example, Rozos et al. [33] employed an integrated system dynamics and cellular automata model to predict water demand under alternative approaches, including distributed water infrastructure.
Machine learning models can be further divided into single predictors and ensemble algorithms, according to the number of employed predictors. A single predictor contains only one predictor (or algorithm), such as neural network, support vector machine, or decision tree. Ensemble algorithms such as random forest, AdaBoost, bagging, and gradient-boosting tree aggregate a number of predictors, all contributing to the final prediction result.
Ensemble learning is becoming increasingly popular [45,46]. It uses statistical sampling principles to train multiple models, each of which separately predicts a new sample. These individual predictions are then combined into the final prediction for the new sample, e.g., by majority voting for classification tasks or by averaging for regression tasks. In other words, ensemble learning transforms the different hypotheses provided by single predictors into one hypothesis.
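The sketch below shows how an ensemble can turn several single-predictor outputs into one prediction. It assumes the base models have already been trained. The function names and numbers are hypothetical. Majority voting applies to class labels; for a continuous target such as water demand, the outputs are usually averaged instead.

```python
from collections import Counter
from statistics import mean


def aggregate_regression(predictions):
    """Combine the base models' numeric predictions for one sample by averaging."""
    return mean(predictions)


def aggregate_classification(labels):
    """Combine the base models' class labels for one sample by majority vote."""
    # most_common(1) returns [(label, count)] for the most frequent label;
    # labels with equal counts are ordered by first occurrence.
    return Counter(labels).most_common(1)[0][0]


# Hypothetical outputs of three base models for one new sample.
print(aggregate_regression([10.0, 9.0, 11.0]))
print(aggregate_classification(["high", "low", "high"]))
```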
In the field of predicting water resources, Lee et al. [30] investigated 12 statistical and machine-learning models to predict daily household water usage in response to residential water demand. Pesantez et al. [5] applied random forest, artificial neural network, and support vector machine to smart-meter data to predict the hourly water demand of 90 accounts. Parisouj et al. [47] employed support vector regression, artificial neural network with backpropagation, and extreme learning machine to predict the monthly and daily streamflows of four river basins in the United States. Villarin and Rodriguez-Galiano [32] used classification and regression trees and random forest to establish a multivariate prediction model for water demand in Seville, Spain. Sengupta et al. [48] used support vector machine, artificial neural network, and random forest to predict changes in stream channel morphology.
China's Beijing-Tianjin-Hebei region is an area with a severe water shortage. Water demand prediction can help its decision-makers to achieve more efficient water resource allocation. Existing studies have used one or a few models to predict water demand. In contrast, this study provides, for the first time, a comprehensive comparative analysis of several statistical and machine-learning models under IPS and EPS. The training and test data refer to the same period of time under IPS, whereas models are built using historical training data and then applied to predict future water demand under EPS.

Research Design
This study was designed to investigate the performance of 11 modeling techniques under two water demand prediction scenarios in the Beijing-Tianjin-Hebei region. The research design was divided into five steps: data preprocessing, modeling, model training, cross-validation (CV), and model testing (Figure 1). As the first step, data preprocessing reduced the sensitivity of the models to differences in data scale. The following two prediction scenarios were considered (more details are provided in Sections 3.4-3.6):
(1) Interpolation prediction scenario (IPS): For each model, a 10-fold CV was applied to randomly selected training samples accounting for 80% of all the data. The fitted models were then tested on the remaining 20% of the data to verify their prediction performance.
(2) Extrapolation prediction scenario (EPS): For each model, a 10-fold CV was applied to the training data covering the period from 2004 to 2018. The fitted models were then tested on the 2019 data.
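The two scenarios differ only in how the samples are split, which the following sketch illustrates. It assumes each sample is keyed by (subregion, year), giving 3 × 16 = 48 rows as in the Dataset section. The helper names and the seed are illustrative. The source does not say how the random 80/20 split was drawn, for example whether it was stratified by subregion.

```python
import random

# Hypothetical sample keys: one row per (subregion, year), 3 x 16 = 48 rows.
SUBREGIONS = ["Beijing", "Tianjin", "Hebei"]
YEARS = range(2004, 2020)
samples = [(region, year) for region in SUBREGIONS for year in YEARS]


def ips_split(rows, train_fraction=0.8, seed=0):
    """Interpolation scenario: random train/test split over all years."""
    shuffled = list(rows)                  # copy so the caller's order is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def eps_split(rows, test_year=2019):
    """Extrapolation scenario: train on the years before test_year, test on test_year."""
    train = [row for row in rows if row[1] < test_year]
    test = [row for row in rows if row[1] == test_year]
    return train, test


ips_train, ips_test = ips_split(samples)
eps_train, eps_test = eps_split(samples)
print(len(ips_train), len(ips_test))  # 38 10
print(len(eps_train), len(eps_test))  # 45 3
```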

Data Preprocessing
Feature scaling was applied consistently across all the studied models to ensure the comparability of their prediction results. Normalization was chosen as the feature scaling method. Let D = {X, y} denote the training set, where X = (x_1, x_2, …, x_n) is the n-dimensional explanatory space and y represents the dependent variable. The normalization of x_i can be expressed as
$$x_i' = \frac{x_i - \min(x)}{\max(x) - \min(x)}$$
where x_i' denotes the normalized value of an explanatory variable x for the ith sample, and max(x) and min(x) denote the maximum and minimum values of x, respectively. Both explanatory and dependent variables were normalized in this study.
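The following is a minimal sketch of min-max normalization in plain Python. It assumes that min(x) and max(x) are taken from the training data and reused for the test data, which the text does not state explicitly. Because the dependent variable is normalized too, an inverse transform is included to map predictions back to their original units. All names and values are hypothetical.

```python
def fit_min_max(column):
    """Return (min, max) of one variable, computed on the training data."""
    return min(column), max(column)


def normalize(value, lo, hi):
    """Min-max normalization: x' = (x - min(x)) / (max(x) - min(x))."""
    if hi == lo:                 # constant variable: avoid division by zero
        return 0.0
    return (value - lo) / (hi - lo)


def denormalize(scaled, lo, hi):
    """Invert the scaling, e.g. to report predicted water demand in original units."""
    return lo + scaled * (hi - lo)


# Hypothetical annual precipitation values (mm) for four training samples.
train_precip = [450.0, 600.0, 525.0, 750.0]
lo, hi = fit_min_max(train_precip)
scaled = [normalize(v, lo, hi) for v in train_precip]
print(scaled)  # [0.0, 0.5, 0.25, 1.0]
```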

Modeling
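The following 11 models were introduced in this study to predict water demand:
(1) Statistical models: linear regression (LR), ridge regression (Ridge), lasso regression (Lasso), kernel ridge regression (KRR), and Bayesian ridge regression (BRR);
(2) Single predictors: backpropagation neural network (BPNN), decision tree (DT), and support vector machine (SVM);
(3) Ensemble methods: random forest (RF), adaptive boosting (AdaBoost), and gradient-boosting decision tree (GBDT).
RF is a parallel ensemble algorithm whose trees are built independently of one another. AdaBoost and GBDT, by contrast, are serial ensemble algorithms, in which each new learner depends on the learners built before it.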
The Python Scikit-Learn library [49,50] was used to implement the models. The prediction models for each method listed in Section 3.3 were built separately. The default hyperparameter values of each algorithm were used to train the models. Each algorithm is briefly described below.
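The text states only that scikit-learn defaults were used. The sketch below therefore assumes the regression variant of each estimator and a mapping from the paper's abbreviations to scikit-learn classes; this mapping is our assumption and is not confirmed by the source. Some scikit-learn defaults differ from the textbook descriptions that follow. For example, KernelRidge defaults to a linear kernel, and MLPRegressor defaults to ReLU units trained with the Adam optimizer.

```python
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import BayesianRidge, Lasso, LinearRegression, Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def build_models():
    """The 11 regressors, each with scikit-learn's default hyperparameters."""
    return {
        # Statistical models
        "LR": LinearRegression(),
        "Ridge": Ridge(),
        "Lasso": Lasso(),
        "KRR": KernelRidge(),
        "BRR": BayesianRidge(),
        # Single-predictor machine-learning models
        "BPNN": MLPRegressor(),
        "DT": DecisionTreeRegressor(),
        "SVM": SVR(),
        # Ensemble models
        "RF": RandomForestRegressor(),
        "AdaBoost": AdaBoostRegressor(),
        "GBDT": GradientBoostingRegressor(),
    }


models = build_models()
print(sorted(models))
```

Each estimator would then be trained with `fit(X_train, y_train)` and evaluated with `predict(X_test)`. Several of these estimators are randomized, for example MLPRegressor and RandomForestRegressor. Passing a fixed `random_state` would make their results reproducible, but the source does not report such a setting.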

Linear Regression
LR expresses the relationship between the explanatory variable(s) x and the dependent variable y through a linear function. It fits a linear model to minimize the residual sum of squares between the actual and estimated values of the dependent variable. The coefficients are estimated using the ordinary least squares (OLS) method.
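For a single explanatory variable, the OLS estimates have a closed form, illustrated below. With several variables, the same least-squares criterion is solved in matrix form through the normal equations. The function name and data are hypothetical.

```python
def ols_fit(x, y):
    """OLS for one explanatory variable: minimize sum((y - b0 - b1 * x) ** 2)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = y_mean - b1 * x_mean    # intercept: the fitted line passes through the means
    return b0, b1


x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]         # generated exactly by y = 1 + 2x
b0, b1 = ols_fit(x, y)
print(b0, b1)  # 1.0 2.0
```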

Ridge and Lasso Regression
Ridge regression and lasso regression are variations of LR. Ridge regression [51] is an improvement of the OLS method. It gives up the unbiasedness of OLS, accepting a small bias in exchange for lower variance, which makes it a more stable and reliable regression algorithm [52]. An L2 regularization term (the sum of the squared coefficients multiplied by a penalty parameter) is added to the sum of squared errors (SSE) to control the trade-off between variance and bias.
Lasso regression was proposed by Tibshirani [53]. An important characteristic of this algorithm is that it completely eliminates the least important features by setting their weights to zero [52]. Lasso regression therefore performs feature selection automatically and outputs a sparse model, in which many coefficients are exactly zero. It replaces the L2 regularization term with an L1 regularization term (the sum of the absolute values of the coefficients).
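The different effects of the two penalties can be seen in the one-variable case, where both estimates have closed forms. This sketch assumes a centered explanatory variable with no intercept and uses the objectives written in the docstrings. scikit-learn's Lasso scales the squared-error term by 1/(2n), so its alpha values are not directly comparable with the ones used here. In the output, ridge only shrinks the coefficient, whereas lasso sets it exactly to zero once the penalty is large enough.

```python
def ridge_coef(x, y, alpha):
    """Minimize sum((y - b * x) ** 2) + alpha * b ** 2 for one centered feature."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + alpha)


def soft_threshold(z, alpha):
    """sign(z) * max(|z| - alpha, 0): the proximal operator of the L1 penalty."""
    if z > alpha:
        return z - alpha
    if z < -alpha:
        return z + alpha
    return 0.0


def lasso_coef(x, y, alpha):
    """Minimize 0.5 * sum((y - b * x) ** 2) + alpha * |b| for one centered feature."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return soft_threshold(sxy, alpha) / sxx


# Centered toy data with a weak relationship between x and y.
x = [-2.0, -1.0, 1.0, 2.0]
y = [-0.5, 0.1, -0.1, 0.5]
for alpha in (0.0, 1.0, 5.0):
    print(alpha, ridge_coef(x, y, alpha), lasso_coef(x, y, alpha))
```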

Kernel and Bayesian Ridge Regression
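KRR is an extension of ridge regression that works well on nonlinear data. A kernel implicitly maps the data to a new feature space in which a linear model can fit them, and ridge regression is then performed in that space. BRR combines ridge regression with Bayesian statistics: Gaussian priors are placed on the coefficients, and the regularization strength is estimated from the data rather than fixed in advance.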
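The following is a minimal kernel ridge regression sketch with one explanatory variable and an RBF kernel. It assumes the usual dual solution a = (K + λI)^(-1) y, with predictions f(x) = Σ_i a_i k(x_i, x). The kernel choice, the λ and γ values, and all function names are illustrative; scikit-learn's KernelRidge, for instance, defaults to a linear kernel. The small linear solver is included only to keep the example dependency-free.

```python
import math


def rbf(a, b, gamma=1.0):
    """RBF kernel for scalars: k(a, b) = exp(-gamma * (a - b) ** 2)."""
    return math.exp(-gamma * (a - b) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):     # back-substitution
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def krr_fit(xs, ys, lam=1e-3, gamma=1.0):
    """Dual coefficients a solving (K + lam * I) a = y."""
    n = len(xs)
    k = [[rbf(xs[i], xs[j], gamma) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(k, ys)


def krr_predict(xs, coef, x_new, gamma=1.0):
    """Prediction f(x_new) = sum_i a_i * k(x_i, x_new)."""
    return sum(a * rbf(xi, x_new, gamma) for xi, a in zip(xs, coef))


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [math.sin(v) for v in xs]          # toy nonlinear relationship
coef = krr_fit(xs, ys)
print([round(krr_predict(xs, coef, v), 3) for v in xs])
```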

Backpropagation Neural Network
BPNN, also known as a multilayer perceptron (MLP), comprises an input layer, one or more hidden layers, and an output layer. Its signal propagates forward from the input layer to the output layer through the hidden layers, while the error propagates backward from the output layer to the input layer [54].
In BPNN, each neuron computes a weighted sum of its inputs plus a bias and passes the result through an activation function; the output of the final layer is the predicted value of the dependent variable. Commonly used activation functions are the threshold, sigmoid, and hyperbolic tangent functions. The objective function is the SSE, which is minimized by adjusting the weights with the gradient descent method.
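The forward and backward passes described above can be sketched for a small network. It has one input, one hidden layer of sigmoid units, and a linear output, and it is trained by full-batch gradient descent on the SSE. The architecture, learning rate, epoch count, and toy data are assumptions for illustration. By comparison, scikit-learn's MLPRegressor defaults to ReLU units and the Adam optimizer.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_bpnn(xs, ys, hidden=4, lr=0.01, epochs=3000, seed=0):
    """Train a 1-input, one-hidden-layer network by gradient descent on the SSE.

    Returns the SSE recorded at the start of every epoch.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                                    # hidden biases
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                               # output bias
    history = []
    for _ in range(epochs):
        gw1, gb1, gw2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden
        gb2, sse = 0.0, 0.0
        for x, y in zip(xs, ys):
            # Forward pass: weighted sum plus bias, then the activation function.
            h = [sigmoid(w1[j] * x + b1[j]) for j in range(hidden)]
            out = sum(w2[j] * h[j] for j in range(hidden)) + b2   # linear output
            err = out - y
            sse += err * err
            # Backward pass: propagate d(SSE)/d(out) = 2 * err towards the input.
            d_out = 2.0 * err
            gb2 += d_out
            for j in range(hidden):
                gw2[j] += d_out * h[j]
                d_z = d_out * w2[j] * h[j] * (1.0 - h[j])  # sigmoid'(z) = h * (1 - h)
                gw1[j] += d_z * x
                gb1[j] += d_z
        history.append(sse)
        # Gradient descent step on all weights and biases.
        for j in range(hidden):
            w1[j] -= lr * gw1[j]
            b1[j] -= lr * gb1[j]
            w2[j] -= lr * gw2[j]
        b2 -= lr * gb2
    return history


xs = [i / 10 for i in range(11)]   # inputs already normalized to [0, 1]
ys = [x * x for x in xs]           # toy nonlinear target
history = train_bpnn(xs, ys)
print(history[0], history[-1])
```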

Decision Tree
DT is a classic machine-learning algorithm. It builds a binary tree based on sample features. The process of arriving at the prediction result is easy to understand and interpret since the resulting decision tree structure is similar to a flowchart, where leaf nodes correspond to the predicted values.
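The following is a compact regression-tree sketch. Each split is chosen to minimize the total squared error of the two child nodes, and each leaf predicts the mean of its samples. Using observed feature values as thresholds and stopping at max_depth or min_samples are simplifying assumptions, not details from the source.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth=0, max_depth=2, min_samples=2):
    """Grow a binary regression tree; each leaf predicts the mean of its samples."""
    if depth == max_depth or len(y) < min_samples:
        return {"leaf": sum(y) / len(y)}
    best = None
    for f in range(len(X[0])):
        # Every observed value except the largest is a candidate threshold,
        # so both children are always non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:  # all rows identical: no split is possible
        return {"leaf": sum(y) / len(y)}
    _, f, t, left, right = best

    def child(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_samples)

    return {"feature": f, "threshold": t, "left": child(left), "right": child(right)}


def predict(tree, row):
    """Follow the flowchart-like path from the root to a leaf."""
    while "leaf" not in tree:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["leaf"]


X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.0, 1.0, 5.0, 5.0]
tree = build_tree(X, y)
print(tree["threshold"], predict(tree, [1.5]), predict(tree, [3.5]))
```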

Support Vector Machine
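SVM finds a hyperplane that fits y based on x with the largest margin. In regression (support vector regression), deviations smaller than a margin ε are ignored, and the fitted function is kept as flat as possible. SVM uses a kernel function to map the original training set with nonlinear features to a high-dimensional feature space. There, the data become linearly separable or, in regression, can be fitted by a linear function [55]. Popular kernel functions are the linear, polynomial, and radial basis function (RBF) kernels.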
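The three kernels named above can be written as plain functions, together with the ε-insensitive loss used in support vector regression. The parameter names gamma, coef0, and degree mirror those of scikit-learn's SVR, but the defaults here are arbitrary. The explicit degree-2 feature map shows that one kernel evaluation equals an inner product in a higher-dimensional space.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def linear_kernel(a, b):
    """k(a, b) = <a, b>"""
    return dot(a, b)


def polynomial_kernel(a, b, gamma=1.0, coef0=0.0, degree=3):
    """k(a, b) = (gamma * <a, b> + coef0) ** degree"""
    return (gamma * dot(a, b) + coef0) ** degree


def rbf_kernel(a, b, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2)"""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def phi_degree2(a):
    """Explicit feature map with <phi(a), phi(b)> = <a, b> ** 2 for 2-D inputs."""
    x1, x2 = a
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """SVR loss: deviations inside the epsilon-tube cost nothing."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


a, b = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(a, b), polynomial_kernel(a, b, degree=2), rbf_kernel(a, a))
# Same value as polynomial_kernel(a, b, degree=2), up to rounding:
print(dot(phi_degree2(a), phi_degree2(b)))
```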

Random Forest
RF, first proposed by Breiman [56], is an ensemble learning algorithm based on bagging. Sampling with replacement (bootstrap sampling) selects the samples from the original data used to train each tree. Sampling without replacement selects the random subset of features considered when constructing each tree; in Breiman's formulation, a new subset is drawn at each split. For each new test sample, the predictions of all trained trees in the ensemble are combined, by majority voting in classification or by averaging in regression, to obtain the final result.
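The following is a simplified random-forest sketch for regression. Each member is trained on a bootstrap sample (drawn with replacement) and a feature subset (drawn without replacement), and the members' outputs are averaged. To stay short, it uses depth-1 trees (stumps) and draws the feature subset once per tree. Breiman's algorithm and scikit-learn instead grow deeper trees and redraw candidate features at every split. All names and parameters are illustrative.

```python
import random


def fit_stump(X, y, features):
    """Best single split (a depth-1 regression tree) over the given features."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            cost = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or cost < best[0]:
                best = (cost, f, t, lm, rm)
    if best is None:                      # no split possible: predict the mean
        m = sum(y) / len(y)
        return lambda row: m
    _, f, t, lm, rm = best
    return lambda row: lm if row[f] <= t else rm


def fit_random_forest(X, y, n_trees=50, max_features=1, seed=0):
    """Bagged stumps, each restricted to a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]   # bootstrap: with replacement
        feats = rng.sample(range(p), max_features)    # features: without replacement
        trees.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return trees


def predict_forest(trees, row):
    """Regression forest output: the average of the trees' predictions."""
    return sum(tree(row) for tree in trees) / len(trees)


# Feature 0 drives the target; feature 1 is noise.
X = [[float(i), float((i * 7) % 5)] for i in range(10)]
y = [0.0] * 5 + [10.0] * 5
forest = fit_random_forest(X, y)
print(predict_forest(forest, [8.0, 0.0]), predict_forest(forest, [1.0, 0.0]))
```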

AdaBoost
AdaBoost is a well-known boosting algorithm [57] in which each sample is assigned an initial weight, and the weighted dataset is used to train a model. The weight of a sample is increased if its prediction is wrong (in regression, if its prediction error is large). The next model is then trained on the reweighted data, so it focuses on the samples that were poorly predicted. This process is repeated for a number of rounds to obtain a sequence of weak learners, which are then combined into a strong learner [58].
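For regression, the reweighting step is commonly done with AdaBoost.R2 (Drucker, 1997), the variant implemented by scikit-learn's AdaBoostRegressor. The sketch below shows one round with the linear loss, assuming that variant. The function name is hypothetical, and fitting the weak learner itself is left out.

```python
import math


def adaboost_r2_update(weights, y_true, y_pred):
    """One AdaBoost.R2 reweighting round with the linear loss.

    Returns (new normalized sample weights, vote weight of this learner).
    """
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    max_err = max(errors)
    if max_err == 0.0:                         # perfect learner: nothing to reweight
        return list(weights), math.inf
    losses = [e / max_err for e in errors]     # linear loss scaled to [0, 1]
    avg_loss = sum(w * l for w, l in zip(weights, losses)) / sum(weights)
    if avg_loss == 0.0:                        # no weighted error at all
        return list(weights), math.inf
    if avg_loss >= 0.5:                        # too weak: AdaBoost.R2 stops boosting
        return list(weights), 0.0
    beta = avg_loss / (1.0 - avg_loss)         # beta < 1 for a useful learner
    # Each weight is multiplied by beta ** (1 - loss): well-predicted samples shrink,
    # while the worst sample (loss 1) is multiplied by 1 and keeps its weight.
    new = [w * beta ** (1.0 - l) for w, l in zip(weights, losses)]
    total = sum(new)
    return [w / total for w in new], math.log(1.0 / beta)


weights, vote = adaboost_r2_update(
    [0.25] * 4, y_true=[1.0, 2.0, 3.0, 4.0], y_pred=[1.0, 3.0, 5.0, 4.0]
)
print([round(w, 4) for w in weights], round(vote, 4))
```

After normalization, the sample with the largest error carries the largest weight into the next round. AdaBoost.R2 then combines the learners' predictions by a weighted median, using the returned vote weights log(1/β).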

Gradient Boosting Decision Tree
GBDT, first developed by Friedman [59], uses gradient boosting to improve performance. The model's performance is evaluated using a loss function, and the model is updated in the direction of gradient descent on this loss. Each new decision tree is fitted to the negative gradient of the loss with respect to the current predictions; for the squared-error loss, this is simply the residuals left by all previous trees. GBDT therefore operates as an additive combination of decision trees, and the final result is the aggregation (sum) of the results obtained by all decision trees [60].
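The following is a minimal gradient-boosting sketch for the squared-error loss, where the negative gradient is simply the residual. It assumes one explanatory variable and depth-1 trees. The values learning_rate=0.1 and n_estimators=100 match scikit-learn's GradientBoostingRegressor defaults, although that class uses depth-3 trees by default. Names are illustrative.

```python
def fit_stump(x, r):
    """Depth-1 regression tree on one feature, fitted by least squares to r."""
    best = None
    for t in sorted(set(x))[:-1]:        # needs at least two distinct x values
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        cost = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or cost < best[0]:
            best = (cost, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm


def fit_gbdt(x, y, n_estimators=100, learning_rate=0.1):
    """Squared-error gradient boosting: each new tree is fitted to the residuals."""
    f0 = sum(y) / len(y)                  # initial model: the mean of y
    pred = [f0] * len(y)
    trees, mse_history = [], []
    for _ in range(n_estimators):
        # The negative gradient of 0.5 * (y - F)^2 with respect to F is y - F.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(x, residuals)
        trees.append(tree)
        pred = [pi + learning_rate * tree(xi) for xi, pi in zip(x, pred)]
        mse_history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))
    return f0, trees, mse_history


def predict_gbdt(f0, trees, xi, learning_rate=0.1):
    """Initial value plus the scaled outputs of all trees (same rate as in fitting)."""
    return f0 + learning_rate * sum(tree(xi) for tree in trees)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.0, 1.2, 1.1, 3.0, 3.2, 3.1, 5.0, 5.1]
f0, trees, history = fit_gbdt(x, y)
print(history[0], history[-1])
```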

Model Training
In IPS, random samples accounting for 80% of the entire 2004-2019 dataset constituted the training data. In EPS, the training data were the data from 2004 to 2018.

Cross-Validation
When a model is fitted on the training set and validated only on the test set, it can be prone to overfitting, i.e., it performs well on seen data but poorly on unseen data. To mitigate overfitting, a 10-fold CV was applied to the training data. The 10-fold CV randomly divides the training data into ten subsets of approximately equal size, each constituting one fold. Each model is then trained on nine folds and validated on the remaining fold. This is repeated ten times, with a different validation fold in each iteration. The mean and standard deviation of the ten resulting scores are then calculated. The same CV process was adopted for both IPS and EPS.
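The fold construction and score aggregation can be sketched as follows. The score_fold callback is a hypothetical stand-in for "fit the model on the training indices and return its R² on the validation indices". The population standard deviation and the seed are assumptions, because the source does not specify them.

```python
import random
from statistics import mean, pstdev


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and cut them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds receive one extra sample.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds


def cross_validate(n, score_fold, k=10, seed=0):
    """Use each fold once as validation data; return the mean and std of the k scores."""
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, val in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(score_fold(train, val))
    return mean(scores), pstdev(scores)


def demo_score(train, val):
    """Placeholder scorer: checks the split and returns the validation-fold size."""
    assert not set(train) & set(val)
    return float(len(val))


print(cross_validate(38, demo_score))
```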

Model Testing
In IPS, the remaining random samples, accounting for 20% of the entire 2004-2019 dataset, constituted the test data. In EPS, the test data were the data for 2019 only. To evaluate each model's performance, the following three metrics were employed: mean squared error (MSE), mean absolute error (MAE), and the coefficient of determination (R²). These metrics were chosen because they are widely used in studies on water demand prediction [5,30-32]. MSE and MAE were used to measure the difference between the actual and predicted values for each sample as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
where ŷ_i denotes the predicted value of the ith sample, y_i denotes the corresponding actual value, and n denotes the number of samples. Lower values of MSE and MAE indicate a better fit. R² indicates the goodness of fit. It measures the proportion of variance explained by the explanatory variables used in a model, as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
where ŷ_i and y_i denote the predicted and corresponding actual values of the ith sample, respectively, and ȳ denotes the average of all actual values, $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$. When computed on the test data, the proportion of explained variance reflects how well unseen samples are predicted. The best possible value of R² is 1.0.
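The three metrics translate directly into code. This sketch follows the same definitions as scikit-learn's mean_squared_error, mean_absolute_error, and r2_score for a single output. It does not handle the degenerate case in which all actual values are equal (a zero denominator in R²).

```python
def mse(y_true, y_pred):
    """Mean squared error: (1/n) * sum((y_i - yhat_i) ** 2)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: (1/n) * sum(|y_i - yhat_i|)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(mse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```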

Study Region
The Beijing-Tianjin-Hebei region is located in the north of China, comprising Beijing, Tianjin, and Hebei. It has a population of 110 million and covers an area of 21.67 million ha [61]. The Beijing-Tianjin-Hebei region is adjacent to China's most water-short river basin. The region suffers serious water scarcity problems [61][62][63].
The total amount of water resources in the Beijing-Tianjin-Hebei region in 2019 was 14.62 billion m³ [10]. The per capita water resources of Beijing, Tianjin, and Hebei were 114.02, 51.79, and 149.5 m³ per person, respectively. These values are far below the internationally recognized extreme water shortage threshold of 500 m³ per person. Although the region's total water resources amount to less than 1% of the national total, it accounts for 8% of the country's total GDP and population.

Dataset
The following eleven explanatory variables were chosen to characterize water demand based on the literature review presented in Section 2.1:

1. Economy: GDP, per capita GDP, added value of primary industry, added value of secondary industry, added value of tertiary industry, and per capita disposable income;
3. Water use: agriculture water consumption and irrigated area;
4. Resource availability: total water resources and annual precipitation.
Agriculture water consumption and annual precipitation for the Beijing-Tianjin-Hebei region from 2004 to 2019 were obtained from the Annual Water Resources Reports [7][8][9], while the data for the same period for the remaining variables came from the China Statistics Yearbook [10]. The explanatory variable dataset had 48 rows and 11 columns. The 48 rows represent 16 years of historical data for each of Beijing, Tianjin, and Hebei, and the 11 columns represent the explanatory variables. The water demand of the Beijing-Tianjin-Hebei region is the sum of the demands of its subregions. Table A1 lists the basic statistics of the variables.

Training and CV Results
The blue and red bars in Figure 2 represent the training and CV R² scores for all 11 models, respectively, with the black circles and lines representing the mean and standard deviation of the CV scores, respectively. LR was considered the baseline model among the statistical models. The CV scores of all the models were higher than 95% in IPS. Relatively large standard deviations were observed for KRR, BPNN, and SVM. This indicates that their R² values were highly dispersed across the ten folds, i.e., their performance in individual folds deviated significantly from their average performance. In addition to having lower standard deviations, the LR, BRR, AdaBoost, and GBDT models ranked in the top four by CV score. These four models were considered for the test set.
Similarly, the CV scores of all the models were also higher than 95% in EPS. Lasso, KRR, BPNN, SVM, and RF showed higher standard deviations in this scenario. The models with the best CV performance were again LR, BRR, AdaBoost, and GBDT, and these four models were promoted to the test set.
The CV performance of the models in EPS was slightly lower than that in IPS. This is mainly because EPS focuses on future prediction, for which high prediction accuracy is much more difficult to achieve. By contrast, IPS predicts data from the same historical period as the training data, so the uncertainty is lower than in EPS.

Test Results
Table 2 shows the results achieved by the models on the test set under IPS and EPS. The best values for each metric are highlighted in bold.
The GBDT model outperformed the other models in both scenarios, achieving the lowest MSE and MAE, and R² scores of 99.9999% and 99.9578% in IPS and EPS, respectively. In IPS, the R² scores of the LR, BRR, and AdaBoost models were lower than that of GBDT by 0.0531%, 0.0529%, and 0.0002%, respectively. In EPS, AdaBoost came second, with BRR third and LR fourth. The test results confirm that the machine-learning models outperform the statistical models, with the ensemble models being superior to the single predictor models. The best model for predicting water demand in these two scenarios is GBDT.

Model Robustness
Because the GBDT model exhibited the best performance in both IPS and EPS, it was further applied to three other regions in China to verify its robustness. However, the other three models (i.e., LR, BRR, and AdaBoost; see Table 2) performed only slightly worse, so all four models were tested for robustness.
The Harbin-Changchun region is located in northeastern China and comprises Heilongjiang and Jilin. It covers an area of 263,640.92 km² and has an estimated population of 65 million. The Central Plains urban region is located in the central and eastern parts of China and comprises Henan, Shanxi, Hebei, Shandong, and Anhui. It covers an area of 287,000 km² and has an estimated population of 374 million. The Chengdu-Chongqing region is located in southwestern China and comprises Sichuan and Chongqing. It covers an area of 185,000 km² and has a population of 115 million. These three regions differ greatly in geographical location (from the north to the south of China) and in climate and economic conditions. The yearly explanatory variables and the dependent variable from 2004 to 2019 were collected from the related reports [64][65][66][67][68][69][70][71]. The dataset sizes were 32 × 11, 80 × 11, and 32 × 11 for the Harbin-Changchun region, the Central Plains urban region, and the Chengdu-Chongqing region, respectively. The explanatory variable data were fed into the LR, BRR, AdaBoost, and GBDT models to obtain the predicted annual water demand. Figure 3 compares the actual and predicted values of water demand in the three regions. The LR model's predictions are not shown in Figure 3 because the model failed to produce effective predictions. This indicates that the LR model, although effective for predicting water demand in the Beijing-Tianjin-Hebei region, cannot be extended to other regions in China.
In contrast, the BRR, AdaBoost, and GBDT models produced effective predictions. The prediction accuracy (R²) of the GBDT model reached 93.1503% (Harbin-Changchun region), 83.1108% (Central Plains urban region), and 93.3015% (Chengdu-Chongqing region). GBDT also obtained the highest prediction accuracy for the Central Plains urban region, exceeding the accuracy of the BRR and AdaBoost models by 8% and 35%, respectively. The three models showed only minor differences (accuracy differences below 5%) for the Harbin-Changchun and Chengdu-Chongqing regions. In these two regions, the BRR model achieved the best results. Its accuracy was 3% higher than that of both other models for the Harbin-Changchun region. For the Chengdu-Chongqing region, it was 2% and 3% higher than that of the AdaBoost and GBDT models, respectively. The AdaBoost model performed poorly for the Central Plains urban region, with an accuracy of only 47.93%, whereas an accuracy higher than 50% is required for robust and acceptable model performance [72,73]. Overall, the GBDT and BRR models proved to be robust. Of these two models, GBDT achieved 3% higher accuracy than BRR across the three regions.
Good agreement was not obtained for the Chengdu-Chongqing region before 2012. Economic growth in this region was relatively slow in the early years, and its growth rate could not match that of the Beijing-Tianjin-Hebei region. Furthermore, the region experienced a magnitude-8 earthquake in 2008, and its reconstruction was not completed until the end of 2011 [74]. To accelerate economic development, the Chengyu Economic Zone was established in May 2011, and its construction promoted regional economic development. As seen in Figure 3, the differences between the predicted and actual values decreased after 2012.

Prediction for the Next Two Years
Water demand in the Beijing-Tianjin-Hebei region for the next two years was also predicted. Because the relevant explanatory variables are difficult to obtain in a timely manner, they cannot be used directly to predict water demand for the next two years. We therefore adjusted the forecast period to predict the next two years of water demand.
The next two years' prediction (NTYP) was similar to EPS. The EPS model was trained using the data of the past 15 years (i.e., 2004 to 2018) and was tested on the water demand of the 16th year (i.e., 2019). NTYP was trained using the data of the past 12 years to predict the future 12 years' water demand. The GBDT model was used in NTYP because it achieved the best performance in EPS. The actual and predicted values of water demand are shown in Figure 4.
construction of an underground utility tunnel were completed in 2019. In the same year, the construction of theme parks, resorts, theaters, libraries, and museums started. The large-scale city subsidiary-center construction caused a significant increase in water demand in 2019. Tianjin began to adjust its GDP calculation method in 2017, resulting in an overall decrease of about 30% in Tianjin's economy [75]. GDP is an economic explanatory variable. Owing to the adjustment of statistical data, the predicted values for 2018 and 2019 were lower than the actual values. Hebei realized that the proportion of tertiary industry surpassed the proportion of secondary industry in 2018, which was a turning point in its economic development; hence, the actual value of its water demand was lower than the predicted value. There was little difference between the actual and predicted values in 2019.

Conclusions
Water scarcity has become a problem of concern for many cities in the world. Predicting water demand helps policymakers and water suppliers maintain the balance between the supply and demand of urban water resources, thereby preventing water wastage and shortage.
This study presents an analysis of water demand in the Beijing-Tianjin-Hebei region of China between 2004 and 2019. Eleven statistical and machine-learning models were built for predicting water demand based on economy, community, water use, and resource availability in IPS and EPS. Models were trained using 10-fold CV. The best four models were evaluated on the test data by the metrics of MSE, MAE, and R 2 score. The R 2 scores from 2010 to 2019 were 86.62% (Beijing), 94.94% (Tianjin), and 95.57% (Hebei). The water demand in the Beijing-Tianjin-Hebei region was the sum of the demands of its subregions. For the next two years, Beijing's water demand remains stable, while Tianjin's water demand first decreases and then increases. Hebei's water demand remains stable, with a slight decrease. Water demand in the Beijing-Tianjin-Hebei region shows an overall downward trend.
This trend was shaped by the "Beijing–Tianjin–Hebei Coordinated Development Plan" announced in 2015. Half of Beijing's manufacturing and advanced manufacturing production was relocated to Hebei. Hebei is also a major agricultural province with a large population. Together, these factors pushed Hebei's water demand upward, and as the effects of the policy played out, its water demand gradually leveled off. Tianjin focuses on finance, commerce, and technology, and its relatively advanced industrial structure contributed to the downward trend in its water demand.
We further discuss the differences between the actual and predicted values. For 2018–2019, the predicted values were lower than the actual values for Beijing and Tianjin but not for Hebei (Figure 4). In support of the Beijing–Tianjin–Hebei Coordinated Development Plan, Beijing's Tongzhou District released a plan to build a city subsidiary center at the end of 2018. The demolition work for the subsidiary center and the construction of the substructure of an underground utility tunnel were completed in 2019. In the same year, construction of theme parks, resorts, theaters, libraries, and museums began. This large-scale subsidiary-center construction caused a significant increase in water demand in 2019. Tianjin began adjusting its GDP calculation method in 2017, which reduced its reported GDP by about 30% overall [75]. Because GDP is one of the economic explanatory variables, this statistical adjustment caused the predicted values for 2018 and 2019 to be lower than the actual values. In Hebei, the share of tertiary industry surpassed that of secondary industry in 2018, a turning point in its economic development; hence, Hebei's actual water demand in 2018 was lower than the predicted value. In 2019, there was little difference between the actual and predicted values.
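The Tianjin under-prediction follows from a short monotonicity argument. As an illustrative assumption not stated in the study, let f be the fitted model, G the GDP level it learned from, and x the remaining explanatory variables held fixed, and suppose the prediction is non-decreasing in GDP. A GDP figure revised down by about 30% then cannot raise the prediction:

```latex
G' \approx 0.7\,G < G
\;\Longrightarrow\;
\hat{y}' = f(G', \mathbf{x}) \le f(G, \mathbf{x}) = \hat{y}
```

If actual water demand did not fall along with the revised GDP figure, the prediction ŷ' lands below the observed value, which is the pattern reported for Tianjin in 2018 and 2019.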

Conclusions
Water scarcity has become a concern for many cities worldwide. Predicting water demand helps policymakers and water suppliers balance the supply and demand of urban water resources, thereby preventing water wastage and shortage.
This study presents an analysis of water demand in the Beijing–Tianjin–Hebei region of China between 2004 and 2019. Eleven statistical and machine-learning models were built to predict water demand from explanatory variables related to the economy, community, water use, and resource availability, under the interpolation (IPS) and extrapolation (EPS) prediction scenarios. The models were trained using 10-fold cross-validation (CV), and the four best-performing models were evaluated on the test data using the mean squared error (MSE), mean absolute error (MAE), and R² score.
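The evaluation protocol described above (10-fold CV followed by MSE, MAE, and R²) can be sketched in plain Python. This is a minimal sketch, not the authors' code: the `fit` callable interface, the one-feature toy model `fit_line`, and the toy data are assumptions. The metrics are computed once on pooled out-of-fold predictions because, with a short annual series such as 2004–2019 (16 years), 10-fold CV leaves only one or two years in each fold, where a per-fold R² is undefined or unstable.

```python
import random
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[float], float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 once, deal the indices into k folds, and return (train, test) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
        for i in range(k)
    ]


def out_of_fold_predictions(
    fit: Callable[[List[float], List[float]], Predictor],
    x: List[float], y: List[float], k: int = 10, seed: int = 0,
) -> List[float]:
    """Predict every sample with a model that never saw it during training."""
    preds = [0.0] * len(y)
    for train_idx, test_idx in k_fold_indices(len(y), k, seed):
        predict = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            preds[i] = predict(x[i])
    return preds


def fit_line(xs: List[float], ys: List[float]) -> Predictor:
    """Hypothetical one-feature least-squares model standing in for any regressor."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


# Hypothetical toy data: 16 "years" with an exactly linear response.
x = [float(i) for i in range(16)]
y = [2.0 * v + 1.0 for v in x]
oof = out_of_fold_predictions(fit_line, x, y, k=10)
print(mse(y, oof), mae(y, oof), r2(y, oof))
```

Any of the candidate models can be plugged in through `fit`, so the same folds and metrics are shared by every model being compared.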
According to the results, the gradient boosting decision tree (GBDT) model performed best among all the models considered, achieving the lowest errors and R² scores of 99.9999% in IPS and 99.9578% in EPS. A comparison of model performance also showed that the machine-learning models outperformed the statistical models. Among the machine-learning models, the ensemble models appeared to be superior to the single predictor models.
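To make the ensemble idea concrete, here is a minimal from-scratch sketch of gradient boosting for squared loss; it is not the authors' implementation. It uses depth-1 trees (stumps) as weak learners for brevity, and because the study's tree depth, number of estimators, and learning rate are not given in this section, `n_estimators=100` and `learning_rate=0.1` are hypothetical. Each new tree is fitted to the current residuals, which are the negative gradient of the loss ½(y − F)², and its output is added to the ensemble after shrinkage by the learning rate.

```python
from typing import List, Optional, Tuple

# (feature index, threshold, value if row[j] <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]
Model = Tuple[float, float, List[Stump]]


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Choose the single split that minimises squared error on the residuals."""
    best_err, best = float("inf"), None  # type: float, Optional[Stump]
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for a, b in zip(values, values[1:]):
            thr = (a + b) / 2  # midpoint between consecutive distinct values
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best_err:
                best_err, best = err, (j, thr, lv, rv)
    if best is None:  # every feature is constant: fall back to predicting the mean
        mean = sum(residuals) / len(residuals)
        best = (0, float("inf"), mean, mean)
    return best


def predict_stump(stump: Stump, row: List[float]) -> float:
    j, thr, lv, rv = stump
    return lv if row[j] <= thr else rv


def fit_gbdt(X: List[List[float]], y: List[float],
             n_estimators: int = 100, learning_rate: float = 0.1) -> Model:
    """Gradient boosting with squared loss: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)  # initial constant prediction
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_estimators):
        residuals = [t - p for t, p in zip(y, pred)]  # negative gradient of 0.5 * (y - F)^2
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * predict_stump(stump, row) for p, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict_gbdt(model: Model, row: List[float]) -> float:
    base, lr, stumps = model
    return base + lr * sum(predict_stump(s, row) for s in stumps)


# Hypothetical toy data: a step change that a single stump can capture.
X = [[float(i)] for i in range(10)]
y = [0.0] * 5 + [10.0] * 5
model = fit_gbdt(X, y)
print(round(predict_gbdt(model, [0.0]), 3), round(predict_gbdt(model, [9.0]), 3))  # 0.0 10.0
```

The small learning rate means no single tree dominates; the ensemble approaches the target gradually over many rounds, which is one reason boosted ensembles tend to generalize better than a single deep tree.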
The GBDT model was further validated in three other regions of China with different geography, climates, and economies. The prediction accuracy exceeded 80% in all cases, supporting the robustness of the GBDT model. Datasets with the same explanatory variables can therefore be applied to water demand prediction in other areas. In addition, water demand for the next two years (2020–2021) was predicted by adjusting the training and test datasets, and the resulting trends and their causes were discussed.
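The difference between the two scenarios, and the later 2020–2021 forecast, comes down to which years are held out. The sketch below is one plausible reading, stated as an assumption because the exact splits are not given in this section: the interpolation scenario (IPS) tests on years inside the training period, the extrapolation scenario (EPS) tests on the most recent years, and forecasting 2020–2021 corresponds to training on all observed years from 2004 to 2019. The function names and test sizes are hypothetical.

```python
import random
from typing import List, Tuple


def interpolation_split(years: List[int], n_test: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hold out interior years only, so every test year lies inside the training period."""
    ordered = sorted(years)
    interior = ordered[1:-1]  # first and last years always stay in training
    test = sorted(random.Random(seed).sample(interior, n_test))
    train = [yr for yr in ordered if yr not in test]
    return train, test


def extrapolation_split(years: List[int], n_test: int) -> Tuple[List[int], List[int]]:
    """Hold out the most recent years, so the model must predict beyond the training period."""
    ordered = sorted(years)
    return ordered[:-n_test], ordered[-n_test:]


years = list(range(2004, 2020))  # study period 2004-2019
train, test = extrapolation_split(years, n_test=2)  # hypothetical test size
print(train[-1], test)  # 2017 [2018, 2019]
```

Extrapolation is the harder and more realistic test for forecasting, since tree-based models such as GBDT cannot predict outside the range of target values seen in training.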
The results of this study can help municipalities and utilities combine their own databases with the needs of industry to predict water demand, balancing water supply and demand and reducing the costs of scheduling water under uncertainty. In the future, we plan to further subdivide water demand and analyze the key drivers of changes in water demand in the Beijing–Tianjin–Hebei region. Multiscenario water demand forecasting should also be designed and optimized jointly with water supply data. Because this study used a relatively small dataset, larger water demand datasets covering all provinces and cities in China should be considered. Water reused from wastewater and polluted water could also be included in the predictive model, since it reduces overall demand. In addition, 2020 differs from other years owing to the impact of COVID-19, so it is worth comparing the 2020 water demand data with those of previous years. The difference between the actual and predicted values of water demand can also be examined once the 2020 data are released.

Data Availability Statement:
Publicly available datasets were analyzed in this study. These data can be found in [7][8][9][10]. All data, models, and code generated or used during the study are available from the corresponding author upon request (qings@bjtu.edu.cn).

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.