Comparison of Bottom-Up and Top-Down Procedures for Water Demand Reconstruction

This paper presents a comparison between two procedures for the generation of water demand time series at both single user and nodal scales, a top-down and a bottom-up procedure respectively. Both procedures are made up of two phases. The top-down procedure adopted includes a non-parametric disaggregation based on the K-nearest neighbours approach. Therefore, once the temporal aggregated water demand patterns have been defined (first phase), the disaggregation is used to generate water demand time series at lower levels of spatial aggregation (second phase). In the bottom-up procedure adopted, demand time series for each user and for each time step are generated applying a beta probability distribution with tunable bounds or a gamma distribution with shift parameter (first phase). Then, a copula-based re-sort is applied to the demand time series generated to impose existing rank cross-correlations between users and at all temporal lags (second phase). For the sake of comparison, two case studies were considered, both of which are related to a smart water network in Naples (Italy). The results obtained show that the bottom-up procedure performs significantly better than the top-down procedure in terms of rank cross-correlations at fine scale. However, the top-down procedure showed a better performance in terms of skewness and rank cross-correlation when the aggregated demands were considered. Finally, the level of aggregation in nodes was found to affect the performance of both the procedures considered.


Introduction
Accurate estimates of nodal demands are of fundamental importance for numerical simulation of water distribution networks (WDNs) behavior. Accurate estimates lead to consistent results in terms of nodal outflows and pressure-heads [1]. The assessment of nodal demands is usually carried out through two different approaches [2], the top-down and the bottom-up approach respectively.
According to the first approach, the total water demand pattern of the whole WDN is defined first, and the total demand is then disaggregated among the individual nodes of the model. In its most common deterministic application, once the water demand pattern has been defined at high levels of spatial aggregation, the nodal water demand patterns are usually obtained by disaggregating the total amount of water supplied in proportion to the average demand at each node. Therefore, this approach considers neither the random character of water demands nor their variability. Indeed, it is assumed that all nodes are characterized by an identical demand pattern, implying a maximum correlation in space between the temporal patterns. However, the importance of taking into account the variability of water demands was highlighted by several studies [3][4][5][6]. Therefore, with the aim of taking into account the random nature of the water demand, stochastic

Materials and Methods
In the following sections two procedures for generating synthetic time series are presented. The first procedure presented is based on the top-down non-parametric disaggregation developed by Nowak et al. [16]. Once the temporal water demand pattern at a high level of spatial aggregation has been defined, the disaggregation procedure allows generation of water demand time series at lower levels of spatial aggregation.
The second procedure is the bottom-up procedure developed by Creaco et al. [25] for the reconstruction of consistent demand time series at WDN users starting from the measured demand time series from a smart meter district.

Top-Down Procedure
Let us assume a district with $N$ nodes (or users). After subdividing the generic day into $N_{\Delta t}$ time steps of length $\Delta t$, the first procedure proposed in this paper allows generation of the water demand time series $q_i^j$ of the generic $j$-th node (or user) in the generic $i$-th $\Delta t$, starting from the total amount of water $Q_i$ supplied at the $i$-th time step. The procedure is made up of two phases [26]. The first phase consists of using a stochastic [27] or non-parametric algorithm [28] to generate the total water demand time series of the area (i.e., one demand time series for each $\Delta t$ of the day). In the present work, the total demand at the generic time step of the day is sampled from a beta probability distribution with tunable bounds, which enables preserving the mean, variance and skewness of the total demand time series [25]. A copula re-sorting is applied to the generated time series to preserve the temporal correlations of the total demand at all temporal lags. These correlations are derived from the measured time series through the Spearman index [29]. Specifically, a multivariate normal probability distribution, with means and standard deviations equal to 0 and 1 respectively, is used as copula [30]. The multivariate normal distribution is then used to generate time series expressing the rank cross-correlations to be imposed on demand time series between users at all temporal lags.
Therefore, the parameterization of the procedure requires the mean $\mu$, standard deviation $\sigma$ and skewness $\gamma$ of each total demand time series to be assessed ($3 \times N_{\Delta t}$ parameters). Furthermore, the minimum value of each total demand time series is needed ($N_{\Delta t}$ parameters) in order to implement the beta probability distribution (see Creaco et al. [25] for further explanation). Finally, the temporal cross-correlations $\rho$ between total demand time series must be evaluated. In this respect, two aspects should be considered: first, each time series is fully correlated with itself; second, the correlation between two time series is symmetrical, which leaves $N_{\Delta t} \times (N_{\Delta t} - 1)/2$ parameters. In total, the number of parameters to be assessed adds up to $4 \times N_{\Delta t} + N_{\Delta t} \times (N_{\Delta t} - 1)/2$.
In the second phase, a spatial disaggregation is applied to generate the water demand time series of each node. In the present work the non-parametric disaggregation proposed by Nowak et al. [16] is used. Assuming an hourly time step, the nodal demand is generated by random resampling from the conditional probability density function $f(q_h \mid Q_h)$, where $Q_h$ and $q_h$ are the random variables representing the aggregate demand and the disaggregated demand in the $h$-th hour, respectively. In the model presented in Nowak et al. [16], the conditional density function is estimated using a K-nearest neighbours (K-NN) approach applied on the basis of the observed aggregate series. Specifically, let us assume that the lengths of the generated and measured aggregated time series are equal to $n_{d,g} \times N_{\Delta t}$ and $n_{d,m} \times N_{\Delta t}$ respectively, where $n_{d,g}$ and $n_{d,m}$ are the numbers of days of generated and measured time series. The K nearest neighbours of each generated value of the aggregated series ($Q^{gen}_{n,h}$, with $n = 1, \dots, n_{d,g}$) are identified among the measured aggregate demands related to the same hour $h$ ($Q^{mea}_{m,h}$, with $m = 1, \dots, n_{d,m}$). According to a heuristic approach, the optimal number $K$ is equal to $\sqrt{n_{d,m}}$ [28]. The neighbours are identified based on the absolute value of the difference between the observed and generated aggregate values ($\Delta$): the $K$ values with the smallest $\Delta$ are selected.
Then, after being reordered from the nearest to the farthest, the K nearest neighbours are each assigned a weight $W_j$ according to their position $j$ in the reordered vector [28]. One of the K nearest neighbours is selected based on a weighted resampling, and the corresponding proportions $p_{j,d,h}$ for each of the $N$ nodes for the $h$-th hour of the $d$-th day are calculated on the basis of measured demands:

$$p_{j,d,h} = \frac{q^{mea}_{j,d,h}}{Q^{mea}_{d,h}}$$

where $q^{mea}_{j,d,h}$ is the generic disaggregated measured value of the $j$-th node for the $h$-th hour of the $d$-th day and $Q^{mea}_{d,h}$ is the associated aggregated value. Finally, the obtained proportions are multiplied by the generated aggregated value $Q^{gen}_{d,h}$ to provide the generated disaggregated values $q^{gen}_{j,d,h}$:

$$q^{gen}_{j,d,h} = p_{j,d,h} \, Q^{gen}_{d,h}$$
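The disaggregation step for a single hour can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function name `knn_disaggregate` and the toy data are hypothetical, and the weighting kernel (weights proportional to the inverse of the neighbour position) is an assumption, being a common choice in K-NN resampling; the weights actually adopted are those of [28].

```python
import math
import random

def knn_disaggregate(Q_gen, Q_mea, q_mea, rng):
    """Disaggregate one generated total Q_gen for a given hour.

    Q_mea[d]    : measured total demand on day d (same hour)
    q_mea[d][j] : measured demand of node j on day d (same hour)
    """
    n_days = len(Q_mea)
    K = max(1, round(math.sqrt(n_days)))  # heuristic K = sqrt(n_d,m)
    # Order measured days by |Q_mea - Q_gen| and keep the K closest.
    nearest = sorted(range(n_days), key=lambda d: abs(Q_mea[d] - Q_gen))[:K]
    # ASSUMPTION: weight of the neighbour in position j is proportional to 1/j.
    weights = [1.0 / pos for pos in range(1, K + 1)]
    d = rng.choices(nearest, weights=weights, k=1)[0]
    # Proportions of the selected measured day, rescaled to the generated total.
    proportions = [q / Q_mea[d] for q in q_mea[d]]
    return [p * Q_gen for p in proportions]

# Toy data: 4 measured days, 3 nodes (illustrative values only).
q_mea = [[1.0, 2.0, 3.0], [2.0, 2.0, 4.0], [0.5, 1.5, 2.0], [3.0, 1.0, 2.0]]
Q_mea = [sum(row) for row in q_mea]
q_gen = knn_disaggregate(7.0, Q_mea, q_mea, random.Random(0))
```

Because the proportions of any measured day sum to one, the generated nodal demands always add up to the generated total, whichever neighbour is drawn.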

Bottom-Up Procedure
In this work the procedure developed by Creaco et al. [25] was applied as the bottom-up procedure, enabling the generation of demand time series for each of the $N$ nodes (or users) considered for any time step $\Delta t$.
The procedure is made up of two phases. In the first phase it is assumed that the daily demand $q_i^j$ of the generic $j$-th user in the $i$-th $\Delta t$ follows the beta probability distribution with tunable bounds.
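As a rough illustration of the first phase, the Python sketch below samples demands from a beta distribution rescaled to a bounded interval. It is a simplification made under stated assumptions: the bounds are fixed by hand and the shape parameters are obtained by matching the mean and standard deviation only, whereas the procedure of Creaco et al. [25] tunes the bounds so that the skewness is preserved as well. The function names and the numerical values are hypothetical.

```python
import random
import statistics

def beta_shapes(mean, std, lo, hi):
    """Method-of-moments shape parameters of a beta distribution on [lo, hi]."""
    m = (mean - lo) / (hi - lo)      # mean rescaled to [0, 1]
    v = (std / (hi - lo)) ** 2       # variance rescaled to [0, 1]
    common = m * (1.0 - m) / v - 1.0
    if common <= 0:
        raise ValueError("variance too large for a beta distribution on these bounds")
    return m * common, (1.0 - m) * common

def sample_demand(mean, std, lo, hi, n, rng):
    """Draw n demand values with the requested mean and standard deviation."""
    a, b = beta_shapes(mean, std, lo, hi)
    return [lo + (hi - lo) * rng.betavariate(a, b) for _ in range(n)]

# Illustrative values: mean 10, standard deviation 2, bounds [0, 30].
demands = sample_demand(10.0, 2.0, 0.0, 30.0, 20000, random.Random(1))
sample_mean = statistics.fmean(demands)
sample_std = statistics.stdev(demands)
```

In this simplified form the skewness is whatever the fixed bounds imply; adjusting `lo` and `hi` is the degree of freedom that a bound-tuning scheme would use to match it.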
The generated demand time series respect the basic statistics (mean, variance, and skewness). However, they fail to preserve the existing rank cross correlations between users and at various temporal lags. Therefore, in the second phase, the generated demand time series are re-sorted through a copula to impose existing rank cross-correlations. These latter are derived from the measured time series through the Spearman index [29]. Specifically, a multivariate normal probability distribution, with means and standard deviations equal to 0 and 1 respectively, is used as copula [30]. The multivariate normal distribution is then used to generate time series expressing the rank cross correlations to be imposed on demand time series between users at various lags.
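The idea of the copula-based re-sort can be sketched in Python for the simplest case of two series and a single target rank correlation. This is an illustrative reduction, not the procedure of [25], which handles all users and temporal lags at once through a multivariate normal distribution. The helper names are hypothetical, and the conversion from the Spearman to the Pearson coefficient, valid for bivariate normal variables, is an assumption about how the copula would be parameterized.

```python
import math
import random

def ranks(values):
    """Rank of each element (0 = smallest); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def resort_like(sample, reference):
    """Re-sort `sample` so that its ranks follow those of `reference`."""
    sorted_sample = sorted(sample)
    return [sorted_sample[r] for r in ranks(reference)]

def copula_resort(x, y, rho_s, rng):
    """Re-sort x and y so that their Spearman correlation is close to rho_s."""
    # ASSUMPTION: Spearman -> Pearson conversion for a bivariate normal copula.
    rho = 2.0 * math.sin(math.pi * rho_s / 6.0)
    z1 = [rng.gauss(0.0, 1.0) for _ in x]
    z2 = [rho * a + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0) for a in z1]
    return resort_like(x, z1), resort_like(y, z2)

def spearman(x, y):
    """Spearman index for series without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

rng = random.Random(42)
x = [rng.expovariate(1.0) for _ in range(2000)]   # first-phase series, user 1
y = [rng.uniform(0.0, 1.0) for _ in range(2000)]  # first-phase series, user 2
x_sorted, y_sorted = copula_resort(x, y, 0.6, rng)
```

The re-sort only permutes the values of each series, so the marginal statistics obtained in the first phase are untouched while the rank correlation between the series moves towards the target.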
At the generic $i$-th of the $N_{\Delta t}$ time steps, the total demand $Q_i$ is finally obtained as the sum of the $N$ values $q_i^j$ obtained after the copula-based re-sorting. As regards the parameterization of the procedure, the assessment of the mean $\mu$, standard deviation $\sigma$, skewness $\gamma$ and minimum value of each demand time series is required ($4 \times N_{\Delta t} \times N$ parameters). Moreover, the cross-correlations $\rho$ between demand time series must be evaluated, under the same assumptions as in the previous procedure ($N_{\Delta t} \times N \times (N_{\Delta t} \times N - 1)/2$ parameters). Finally, the number of parameters to be assessed is equal to $4 \times N_{\Delta t} \times N + N_{\Delta t} \times N \times (N_{\Delta t} \times N - 1)/2$.
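To give a sense of scale, the two parameter counts (four statistics per series plus the pairwise cross-correlations) can be evaluated with a short Python sketch. The function names are hypothetical; the values 24, 100 and 20 are the number of daily time steps, users and nodes considered in the first case study of this work.

```python
def n_params_top_down(n_steps):
    """4 statistics per total-demand series plus pairwise correlations."""
    return 4 * n_steps + n_steps * (n_steps - 1) // 2

def n_params_bottom_up(n_steps, n_nodes):
    """Same count, applied to one series per node and per time step."""
    n_series = n_steps * n_nodes
    return 4 * n_series + n_series * (n_series - 1) // 2

top_down = n_params_top_down(24)              # 372
bottom_up_users = n_params_bottom_up(24, 100) # 2888400
bottom_up_nodes = n_params_bottom_up(24, 20)  # 116880
```

The quadratic growth of the correlation term with the number of series explains why grouping users into nodes reduces the parameterization burden of the bottom-up procedure so sharply.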

Case Studies
Two case studies were analysed in this work. For both case studies the hourly consumption data from a smart water network located in a suburban area of Naples (Italy) were considered. In this area, called Soccavo, the municipal water company "Acqua Bene Comune Napoli" (ABC) implemented a smart WDN replacing almost 5000 traditional water meters with smart meters, aiming to reconstruct the total district consumption. For the sake of comparison with the results obtained in the previous work carried out by Creaco et al. [25], in the first case study (Case study 1) the data of 100 users for 31 days, from 1 January 2018 to 31 January 2018, were considered.
The second case study (Case study 2) is made up of 1000 users from the same smart water district, monitored from 1 October 2017 to 31 October 2017. Therefore, the case studies essentially differ because of the number of users. It must be noted that in both case studies only one month was considered since the parameters of the procedures presented are characterized by monthly variations.
The daily patterns of aggregated measured hourly demand in the 31 days considered in both case studies are shown in Figure 1, highlighting similar characteristics of water consumption in the network.
In both case studies, the generated demand time series were obtained by assuming the typical day of generation to be subdivided into $N_{\Delta t} = 24$ time steps with $\Delta t = 1$ h. Furthermore, the generation of demand time series was performed for a number of days $n_{days} = 93$ and was repeated 500 times for both procedures.

Results
In the following sections the results obtained applying both the top-down and bottom-up procedures to both case studies are reported. In the comparison between basic statistics of measured and generated demand time series the average values over the 500 iterations were considered for the latter series.

Results-Case Study 1
For the sake of comparison with the results obtained by Creaco et al. [25], the top-down procedure was applied to Case study 1 in two applications. In the first application (Application 1), the top-down procedure was applied in order to generate single user and aggregated water demand time series. In the second application (Application 2), nodal demands were considered by grouping the users into 20 nodes. In Application 2, the measured demand time series of the generic node were estimated as the sum of the demand time series of the related users. Therefore, the procedure was parameterized based on the measured single user and nodal demand time series in Application 1 and Application 2, respectively.
As regards the single user demands of Application 1, Figure 2a shows the comparison between measured and generated hourly demands in terms of mean values, highlighting a perfect fit ($R^2 = 1$). Almost the same result ($R^2 = 0.95$) was obtained for the standard deviation values, as shown in Figure 2b. The fit in terms of skewness values (Figure 2c) is almost perfect ($R^2 = 0.96$) as well. As regards the cross-correlations, in Figure 2d the dots representing the rank cross-correlations at lag 0, i.e., the spatial correlations between the $N$ user demands in the same hour, are differentiated from the others. Indeed, the values of the rank cross-correlations at lag 0 show a good fit ($R^2 = 0.89$). However, the top-down procedure failed to preserve the rank cross-correlations between users at various temporal lags ($R^2 = 0$). As already shown by Alvisi et al. [26], the non-parametric approach is unable to preserve the correlations between the demands associated with one hour and those of the previous hour, since for each hour the data are disaggregated independently of those related to other hours.
As shown in Figure 3a,b,d, the comparison between measured and generated aggregated hourly demands in terms of mean, standard deviation and rank cross-correlation values highlights a perfect agreement ($R^2 = 1$). As regards the skewness values (Figure 3c), the fit is again almost perfect ($R^2 = 0.94$).
The graphs in Figure 4 show the results obtained in Application 2 for generated hourly demands at single node level, leading to similar considerations to Application 1. Indeed, though respecting the basic statistics in terms of mean, variance and skewness, the top-down procedure was unable to reproduce the existing rank cross-correlations between nodes at various temporal lags.
Finally, the comparison of measured and generated aggregated hourly demands is reported in Figure 5. The performance on the aggregated scale is excellent, as in Application 1, demonstrating the effectiveness of the procedure in total district consumption reconstruction.
Table 1 summarizes the results discussed above along with the results obtained by Creaco et al. [22].
Table 1. Comparison of mean values $\mu$, standard deviation values $\sigma$, skewness $\gamma$, rank cross-correlations $\rho$ and $\rho$ at lag 0 of measured and generated hourly demands, evaluating the fit in terms of $R^2$ at both single and aggregated scales, for both applications to the first case study and for both the top-down and the bottom-up procedures.

Results-Case Study 2
Both procedures were applied to Case study 2 in three applications: the users were grouped into 10 (Application 1), 50 (Application 2), and 100 (Application 3) nodes, allocating 100, 20, and 10 users to each node, respectively.
The results in terms of $R^2$ of the fit obtained by the application of the top-down procedure are shown in Table 2.
For all three applications, the performance on mean, standard deviation, skewness and cross-correlations at lag 0 at single node level is excellent. However, the fit of rank cross-correlations at lag 0 seems to improve slightly with the increasing level of aggregation in nodes, i.e., with the increasing number of users allocated to each node. In this respect, even though Application 2 and Application 3 confirm the inability of the top-down procedure to reproduce the cross-correlations at various temporal lags, better results are obtained in the case of 10 nodes. Indeed, for Application 1, $R^2$ reaches a value of 0.54. It can therefore be stated that a high level of aggregation in nodes can improve the performance in terms of cross-correlations for the top-down procedure.
As regards the aggregated demands, the fit is always perfect for all three applications, confirming the effectiveness of the procedure in total district consumption reconstruction.
Table 3 shows the results obtained applying the bottom-up procedure. For all three applications, the bottom-up procedure performs better than the top-down procedure in terms of rank cross-correlations at single node level. Indeed, the fit is always perfect ($R^2 = 1$), attesting to the effectiveness of the copula-based re-sort during the second phase of the procedure. However, on the aggregated scale the performance in terms of skewness is better when the top-down procedure is applied. As already stated by Creaco et al. [25], the deterioration of the fit in terms of skewness when aggregated demand is considered is due to both the parameterization, which was performed on the single user scale, and the approximations inherent in the modelling. However, the fit of the skewness values seems to improve with the increasing level of aggregation in nodes. Indeed, the maximum value of $R^2$ is reached in the case of 10 nodes ($R^2 = 0.64$).
For explanatory purposes, the graphs in Figure 6 report, for measured and generated aggregated demand time series, the daily temporal patterns of the mean $\mu$ and their intervals $\mu \pm 0.5\mu$ for both procedures in Application 2. The means for the measured demand time series were obtained starting from the aggregated measured demands, while those for the generated demand time series are the average values over 500 generations. These graphs constitute further evidence of the goodness of the fit at aggregated scale for both procedures.

Discussion
In this section some considerations about the results reported above are made. Case study 1 was previously used by Creaco et al. [25] for the application of the bottom-up procedure. Therefore, a comparison between the top-down and the bottom-up procedures can be carried out. The results reported in Table 1 show the inability of the top-down procedure to reproduce the cross-correlations at various temporal lags at both single user and nodal scales. However, according to the work of Creaco et al. [25], in both cases (single users and single nodes), the performance in terms of rank cross-correlation of the bottom-up procedure is excellent. Indeed, the $R^2$ of the fit between measured and generated values is equal to 1 at both single user and nodal scales.
However, as regards the aggregated demands better results were obtained by applying the top-down procedure. As mentioned above, for the bottom-up procedure the parameterization performed on the single user scale leads to the deterioration of the fit in terms of skewness and rank cross-correlation when the aggregated demands are considered. Furthermore, better results were obtained applying the bottom-up procedure starting from the nodes of the WDN, rather than from single users.
In this respect, the results obtained in Case study 2 give further evidence of the effectiveness of the bottom-up procedure in generating consistent synthetic patterns of nodal demands.
Account must be taken of the higher burden of parameterization of the bottom-up procedure. Indeed, the top-down procedure requires definition of a lower number of parameters. However, the parameters required by the bottom-up procedure can be easily estimated when smart meter readings are widely available over the WDN. Also, the computational burden is lower for the top-down procedure.
As regards the choice of $\Delta t$, an hourly time step was considered for both procedures in the present work. However, the procedures are expected to be effective for other values of $\Delta t$ as well. In any case, the bottom-up procedure is mainly based on the application of rank cross-correlations to demand time series. Therefore, its use is more suitable for $\Delta t$ values at which these correlations are significant, i.e., starting from the hourly time step [31]. Furthermore, the bottom-up procedure neglects the pulsed nature of demand, which becomes predominant when $\Delta t$ is small, i.e., of the order of minutes or seconds.

Conclusions
In the present paper, two procedures for the generation of demand time series at both single user and nodal scale were presented. The first procedure consists of a top-down approach based on the disaggregation developed by Nowak et al. [16]. According to this procedure, once the temporally aggregated water demand patterns have been defined, the disaggregation is applied to generate water demand time series at lower levels of spatial aggregation.
The second procedure is a bottom-up procedure, in which a copula-based re-sort is applied to first-attempt demand time series, generated through a beta or gamma probability distribution, to impose the existing rank cross-correlations.
Two case studies with different numbers of users were then considered performing various application types.
While the reproduction of the mean and standard deviation of demand time series at single users (nodes) and for the total demand is satisfactory for both procedures, differences arise in demand cross-correlations and skewness.
As expected, the top-down procedure is better at reproducing the total demand, especially in terms of skewness and temporal correlations. The bottom-up procedure, instead, prevails in terms of cross-correlations between single users (nodes).
Indeed, the top-down procedure proved poorly capable of reproducing the cross-correlations at various temporal lags though results slightly improve in the case of high levels of aggregation in nodes.
As regards the aggregated demands, the top-down procedure showed a better performance in terms of skewness and rank cross-correlation. However, it was found that the application of the bottom-up procedure for the generation of nodal demands, rather than single users' demands, has positive impacts in respecting rank cross-correlations in the aggregated consumption. Therefore, it can be successfully applied to the generation of nodal demands in a WDN.
Author Contributions: Conceptualization and methodology, D.F. and E.C.; software, E.C.; validation and resources, F.D.P. and M.G.; writing-original draft preparation, D.F.; writing-review and editing, D.F. and E.C.; supervision, E.C., F.D.P. and M.G. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.