Bottom-Up Generation of Peak Demand Scenarios in Water Distribution Networks

This paper presents a two-step methodology for the stochastic generation of snapshot peak demand scenarios in water distribution networks (WDNs), each scenario consisting of a single combination of demand values at the WDN nodes. The methodology describes the hourly demand at both nodal and WDN scales through a beta probabilistic model, which is flexible enough to suit both small and large demand aggregations in terms of mean, standard deviation, and skewness. The first step of the methodology generates the peak demand samples separately at the WDN nodes. Then, in the second step, the nodal demand samples are consistently re-ordered to build snapshot demand scenarios for the WDN, while respecting the rank cross-correlations at lag 0. The applications concerned a one-year-long dataset of demand values from about 1000 users in the district of Soccavo, Naples (Italy). Best-fit scaling equations were constructed to express the main statistics of peak demand as a function of the average demand value on a long time horizon, i.e., one year. The results of applications to four case studies proved the methodology effective and robust for various numbers and sizes of users.


Introduction
The last three decades have seen the spread of physically based models, which are used by water utility staff for replicating the nonlinear behavior of water distribution networks (WDNs) with specific objectives, such as contingency planning, network optimization, and strategy planning [1,2]. Among the critical inputs of WDN modelling, nodal demands play a major role [3]. Therefore, numerous research efforts have recently been dedicated to the accurate assessment of nodal demands, while benefitting from increasingly large datasets provided by smart water metering technologies [4].
A viable option to reconstruct demand lies in the application of the bottom-up stochastic approach [5], which is the topic of the present paper. This approach aims to reconstruct the whole WDN demand starting from the single nodes upward, while representing the single nodal demands as stochastic variables.
If the focus is the representation of demand at very fine spatial and temporal scales, pulse generation models can be used (e.g., see [6] for a review of these models), which can reproduce the arrival time, duration, and intensity of the demand pulses generated by each faucet present in any WDN node. The output of these models can be used as an input to unsteady flow models, with a large computational overhead, as shown in [3].
Other kinds of approaches can be used when larger time steps are considered. If the focus is to reproduce nodal demand time series with values aggregated at temporal steps ranging from minutes to hours, methodologies such as those proposed in [7][8][9][10] can be used. The output of these methodologies can be used as an input to extended period simulation, which represents the behavior of the WDN as a succession of steady states [5]. The overall pulse model of Gargano et al. [8] extends the concept of pulse from the representation of the instantaneous demand of a single faucet to the representation of the overall demand of one user or one node at a larger time step. In the methodology of Alvisi et al. [7], the demand associated with the generic time step is represented as a random variable through the multinomial probability distribution, with predetermined mean and standard deviation. Furthermore, cross-correlations between time series at lag 0 and lag 1 are considered. Compared to [7], the methodology of Creaco et al. [10], based on the beta probability distribution, preserves the skewness in demand time series and enforces correlations between time series at all lags. Furthermore, Creaco et al. [10] also presented a procedure to reconcile the generated demand time series with demand pulses generated at a fine time step, thus enabling the reconstruction of demand at any time step. The methodology of Kossieris et al. [9] is based on a mixed-type distribution for the representation of demand time series, while yielding a performance similar to that of the methodology of Creaco et al. [10].
In various cases, the modelling objective has focused on exploring the performance and reliability of the WDN only in peak demand scenarios, with no need to investigate the temporal pattern of demand [5]. Starting from the early decades of the twentieth century [11], the peak demand has been the subject of numerous works aimed at estimating the daily peak demand coefficient, defined as the ratio of maximum to daily average demand, as a function of the number of users. While the first works described this coefficient as a deterministic variable (e.g., see [11][12][13][14]), Tricarico et al. [15] were among the first to present the daily peak demand coefficient as a random variable. The probabilistic approach in [15] was later refined by Gargano et al. [16], who successfully described the daily peak demand coefficient using three probability distributions, namely the log-normal, Gumbel, and log-logistic distributions. Furthermore, Gargano et al. [16] used statistical analysis to express the parameters of these distributions as a function of the number of users and of the temporal step used for demand representation. Balacco et al. [17] used a different probabilistic framework, based on the exponential and Gumbel distributions, to develop a relationship between the above-mentioned daily peak demand coefficient and the number of users. The applications of Balacco et al. [17] concerned several towns in Puglia. Finally, Del Giudice et al. [18] provided a methodological framework to investigate the main features of hourly peak coefficients of water demand, with the objective of estimating the sample mean of hourly peak factors, the associated standard error (allowing definition of confidence bands), and its probability distribution. While considering the sample means normally distributed, they also provided empirical relations to express the sample mean and standard error as a function of the number of users.
Though providing formulations and relationships that can be profitably used by practitioners to estimate the overall peak demand coefficient in a WDN, the methodologies in [15][16][17][18] fail to describe how the peak demand must be allocated among the WDN nodes. In fact, in a generic peak scenario, the adoption of a single peak demand coefficient at all nodes, according to the deterministic top-down approach [5], may lead to an unrealistic representation of the peak demand in the WDN. One of the few works aimed at reconstructing the snapshot peak demand scenario in an entire WDN, as the combination of peak demands at all the nodes, is that of Magini et al. [19]. In the methodology presented in [19], combinations of simultaneous nodal demands are first generated through the gamma probability distribution, with parameters expressed as a function of the number of nodal users by means of scaling laws. Then, the nodal demand series are re-sorted to preserve the correlation between nodes through the Iman-Conover method [20]. Though the methodology of Magini et al. [19] performs effectively at reconstructing peak demands through the bottom-up approach, the following aspects in [19] require further analysis and are therefore explored in the present paper:

•	Being based on two parameters, the gamma probability distribution used in [19] enables respecting statistics of user demands only up to the second order, i.e., mean and standard deviation. Therefore, the question arises whether more complex distributions, such as the beta distribution with tunable bounds [10], can be used to respect bounds and statistics of user demands up to the third order, i.e., including the skewness;
•	The scaling laws used in [19] for expressing the moments of the peak demand are based on the number of users as the independent variable. However, demand may differ significantly from one user to another, being affected by the number and kind of household occupants, which is often unknown and/or variable over the year. Therefore, the average demand value on the long run, i.e., one year, is a more significant variable than the number of nodal users and can be adopted as the independent variable for better-founded scaling equations;
•	In some cases, there may not be enough demand data to calibrate the scaling equations for estimating rank cross-correlations between nodal peak demands. Therefore, the question arises whether an alternative procedure can be conceived to impose cross-correlations in the WDN.
The rest of the paper is organized as follows. Section 2 contains the methodology. Section 3 contains the applications, based on datasets from the district of Soccavo, Naples, Italy [10,18,21], and the results. Sections 4 and 5 contain the discussion and conclusions, respectively.

Methodology
The methodology adopted in this paper consists of two steps, i.e., step 1 and step 2, which follow a preliminary step 0. As the flow chart in Figure 1 shows, step 0 operates on the smart metering data of measured consumption at the users of a WDN. It aims to create aggregated datasets of measured peak demands associated with growing values of the average demand on the long run.
Step 1 concerns parameterization of a beta probability distribution model for the bottom-up generation of peak demand samples as a function of the average demand.
Step 2 concerns the re-sorting of the generated demand samples to preserve the expected degrees of rank cross-correlation in a WDN. In the following subsections, the preliminary step 0 and steps 1 and 2 of the methodology are described.

Step 0-Creation of the Dataset
The objective of Step 0 is to obtain samples of measured peak demand (for different aggregation levels) associated with various values of the average demand on the long run. As is explained below, these samples are obtained by aggregating the peak demand measurements at the users of a WDN. However, the level of aggregation at which the generic average demand value on the long run is obtained is not relevant for the statistical analyses of Steps 1 and 2.
Step 0 starts from the recorded time series of demand at a high number n_u of users of a WDN, with a temporal resolution of one hour, which corresponds to the typical time step used for WDN modelling, covering a period of at least one year. This dataset takes the form of the demand data d_i^j, with subscript i and superscript j associated with the i-th of the n_u users and with the j-th of the 8760 h in the year (8760 = 24 h × 365 days), reported as a matrix in Table 1. It must be noted that the number 8760 for the hours is used here for explicative reasons, without loss of generality. Longer durations of the time series could also be adopted. For the time series recorded at the i-th user, the long-time average d_i^y can be easily calculated (final row of the matrix in Table 1). If the series duration is one year, as is assumed implicitly hereinafter, d_i^y represents the yearly average. The first operation to carry out on Table 1 is the selection of the data to be used for the statistics, based on a certain criterion, e.g., the data of a certain hour of the day in a certain period of the year, such as the morning peak in working weekdays. In this context, let us assume that N_tot hours of the year (with N_tot < 8760), which are considered homogeneous from the statistical viewpoint, are selected, leading to the dataset shown in Table 2. The yearly average value in Table 2 is kept the same as that in Table 1, since it does not depend on the criterion chosen to select the N_tot hours of the year. Then, the spatial aggregation is performed by randomly summing the columns of Table 2 from left to right, to obtain the aggregated demands D_i^j and the aggregated yearly averages D_i^y (see Table 3), calculated by the two following formulas, respectively:

D_i^j = ∑_(m=1)^i d_m^j    (1)

D_i^y = ∑_(m=1)^i d_m^y    (2)

where the users are indexed according to a random permutation. The spatial aggregation can be reiterated various times, considering a different (random) sequence of users each time. This will result in a different sequence of aggregated demands.
Of course, the last aggregations (for i slightly lower than n_u) will be potentially very similar, while the very last ones (for i = n_u) will be identical to each other. In fact, for i = n_u, all demands at time instant j are considered in all sequences within the aggregation. It is worth noting that reiterating Step 0 n times produces n versions of Table 3, where in each column (i.e., for a given aggregation), different values are found, related to the appropriate yearly average D_y. At the end of Step 0, the n tables obtained can be concatenated horizontally. The demand samples associated with the non-aggregated measured demand (i.e., Table 2) are also concatenated at the end, to include numerous demand data associated with low values of D_y that would not be generated in the aggregation process. Finally, the columns of the final dataset are sorted in ascending order of D_y, and any duplicate columns are removed. As a result, the final dataset contains peak demand samples for growing values of D_y, independently of the aggregation level.
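The random sequential aggregation of Step 0 can be sketched in a few lines of code. The following Python fragment is an illustrative sketch, not the authors' implementation: the demand matrix is a synthetic stand-in for Table 2, and the duplicate-column removal described above is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for Table 2: N_tot selected peak hours (rows) by
# n_u users (columns); real data would come from the smart-meter records.
n_u, N_tot = 8, 228
d = rng.gamma(shape=2.0, scale=0.004, size=(N_tot, n_u))  # demands (L/s)
d_y = d.mean(axis=0)                                      # per-user long-run averages

def aggregate_once(d, d_y, rng):
    """One random left-to-right aggregation pass (Eqs. (1)-(2)):
    shuffle the user order, then take cumulative sums across users."""
    order = rng.permutation(d.shape[1])
    D = np.cumsum(d[:, order], axis=1)   # aggregated peak demands D_i^j
    D_y = np.cumsum(d_y[order])          # aggregated yearly averages D_i^y
    return D, D_y

# Reiterate the aggregation n times, concatenate the tables horizontally,
# append the unaggregated data, and sort the columns by ascending D_y.
n = 3
tables = [aggregate_once(d, d_y, rng) for _ in range(n)] + [(d, d_y)]
D_all = np.hstack([t[0] for t in tables])
Dy_all = np.hstack([t[1] for t in tables])
idx = np.argsort(Dy_all)
D_all, Dy_all = D_all[:, idx], Dy_all[idx]
```

Whatever the random user order, the fully aggregated column (i = n_u) equals the row-wise sum of all users, which is why the last aggregations of different passes coincide.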

Step 1-Beta Probability Distribution with Tunable Bounds
The beta probability distribution with tunable bounds was chosen in this work to represent the generic aggregated demand D i in each column of Table 3. This choice was due to its capability of reproducing the statistics of demand series in terms of mean, standard deviation, and skewness, as was shown by Creaco et al. [10].
The density function f of the beta-distributed random variable D_i, representing the water demand of a generic aggregation (see Table 3), is [22]:

f(D_i) = (D_i − a)^(α−1) (b − D_i)^(β−1) / [B(α, β) (b − a)^(α+β−1)]    (3)

in which B and α and β are the beta function and the positive shape parameters, respectively. Finally, a ≥ 0 and b > a are the lower and upper bounds, respectively. Parameters α and β, and bounds a and b, can be related to the mean D_mean and standard deviation σ of D_i through the following Equations (4)-(7):

α = μ c    (4)

β = (1 − μ) c    (5)

where:

μ = (D_mean − a) / (b − a)    (6)

c = μ (1 − μ) (b − a)^2 / σ^2 − 1    (7)

The skewness γ of the beta probability function can be evaluated as a function of α and β as:

γ = 2 (β − α) (α + β + 1)^0.5 / [(α + β + 2) (α β)^0.5]    (8)

For any pair of values of a (< D_mean) and b (> D_mean), α and β can be obtained from Equations (4) and (5) while respecting the mean D_mean and standard deviation σ of demand D_i. This occurs if D_mean and σ, in Equations (6) and (7), respectively, are set equal to the mean and standard deviation of the measured demand variable D_i (method of moments). However, each a, b pair determines different values of the shape parameters α and β, and subsequently of γ. In this work, a was set at the minimum observed value D_min of D_i (which was therefore an additional parameter of the probability distribution model), while b was suitably modified. In practice, an increase (or decrease) in b results in a lengthening (or shortening) of the right tail of the probability distribution, and therefore in an increase (or decrease) in the value of γ. The best value of b can be found by numerically solving the implicit equation obtained by setting γ equal to the observed value γ_obs in Equation (8), after replacing α, β, μ, and σ as they are expressed in Equations (4)-(7). In this work, a local optimization based on the active-set method [23], aimed at minimizing the absolute difference between γ in Equation (8) and γ_obs, was used to this end. Summing up, parameters α and β, and bounds a and b, can be univocally identified starting from the minimum D_min, mean D_mean, standard deviation σ, and skewness γ of the measured demand data.
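The parameterization described above can be sketched as follows. This Python fragment is an illustrative sketch, not the authors' code: it applies the method of moments for a beta distribution on [a, b] and tunes b to match the observed skewness; a bounded scalar search stands in for the active-set optimizer of [23], and the statistics passed at the bottom are purely illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import beta as beta_dist

def shape_params(mean, std, a, b):
    """Method of moments for a beta distribution on [a, b] (Eqs. (4)-(7))."""
    mu = (mean - a) / (b - a)                        # standardized mean, Eq. (6)
    c = mu * (1 - mu) * (b - a) ** 2 / std ** 2 - 1  # Eq. (7)
    return mu * c, (1 - mu) * c                      # alpha, beta: Eqs. (4)-(5)

def beta_skewness(alpha, beta):
    """Skewness of the beta distribution (Eq. (8)); independent of the bounds."""
    return (2 * (beta - alpha) * np.sqrt(alpha + beta + 1)
            / ((alpha + beta + 2) * np.sqrt(alpha * beta)))

def fit_bounds(d_min, d_mean, sigma, gamma_obs):
    """Fix a = D_min and tune b so that the model skewness matches gamma_obs
    (a bounded scalar search is used here in place of the active-set method)."""
    a = d_min
    b_lo = d_mean + 1.01 * sigma ** 2 / (d_mean - a)  # smallest b giving valid shapes
    def loss(b):
        al, be = shape_params(d_mean, sigma, a, b)
        if al <= 0 or be <= 0:
            return np.inf
        return abs(beta_skewness(al, be) - gamma_obs)
    b = minimize_scalar(loss, bounds=(b_lo, d_mean + 50 * sigma),
                        method="bounded").x
    al, be = shape_params(d_mean, sigma, a, b)
    return al, be, a, b

# Purely illustrative statistics (D_min, D_mean, sigma, gamma) for one node:
al, be, a, b = fit_bounds(0.002, 0.03, 0.012, 0.8)
sample = beta_dist.rvs(al, be, loc=a, scale=b - a, size=100_000, random_state=1)
```

The last two lines show the intended use: once the four statistics are known, the fitted beta distribution can be sampled directly for the node.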
Best-fit scaling equations can be constructed to relate D_min, D_mean, σ, and γ of the random variable D_i, i.e., the demand in the peak hour of working weekdays, to the yearly average demand D_i^y. This yields a probability model to generate samples of peak demand values for any value of the yearly average demand, when single or aggregated users have characteristics similar to those considered for model parameterization (available data). Specifically, the probability distribution model can be applied to generate a sample of N_g peak demand values D_i for the generic node i (with a known value of the yearly average demand D_i^y) of a WDN. It can also be applied to generate a sample of N_g demand values D_WDN for the whole WDN, which has a known value of the yearly average demand D_WDN^y. However, if rank cross-correlations between nodes are neglected, the peak demand estimate D_WDN* obtained for the whole WDN by summing all nodal demand samples, i.e., D_WDN* = ∑_(i=1)^(N_n) D_i with N_n equal to the number of nodes in the WDN, may not respect the expected statistics of standard deviation and skewness for the yearly average demand of the whole WDN. The next subsection deals with the re-sorting of the demand samples generated for each WDN node to preserve the Spearman rank cross-correlations at lag 0.

Step 2-Re-Sorting of Generated Demand Samples
The algorithm used in this work to enforce rank cross-correlation is based on a modified version [24] of the Gaussian copula [25]. Let us assume that Step 1 of the methodology has been used to generate uncorrelated samples of N_g peak demands D_i^j for the N_n nodes of a WDN, where subscript i and superscript j range from 1 to N_n and from 1 to N_g, respectively. The method consists of generating N_n samples of N_g values y_i^j, drawn from random variables with mean equal to 0 and standard deviation equal to 1 through the multivariate normal distribution characterized by a correlation matrix of size N_n × N_n, in which the generic element ρ_k,l represents the expected rank cross-correlation between the peak demand D_k at the generic node k and the peak demand D_l at the generic node l. The rank cross-correlation matrix is symmetric, with diagonal elements equal to 1. To enforce the expected rank cross-correlations ρ_k,l, the values D_i^j are then reordered using the same sorting as y_i^j.
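The re-sorting step can be sketched as follows. This Python fragment is an illustrative interpretation, not the modified copula algorithm of [24]: it draws correlated standard normals and re-orders each nodal sample to follow their ranks; the 2·sin(πρ/6) map from rank to product-moment correlation is a standard Gaussian-copula refinement assumed here.

```python
import numpy as np

def copula_resort(D, R, rng):
    """Re-order the columns of D (N_g x N_n uncorrelated nodal samples) so
    that they exhibit the target Spearman rank cross-correlation matrix R.
    Sketch of a Gaussian-copula re-sort: draw correlated standard normals,
    then give each demand column the same ordering as the matching normal
    column, which preserves every marginal exactly."""
    n_g, n_n = D.shape
    # Pearson correlation to impose in normal space so that the resulting
    # rank (Spearman) correlation equals R: the classical 2*sin(pi*R/6) map.
    R_n = 2.0 * np.sin(np.pi * np.asarray(R, dtype=float) / 6.0)
    np.fill_diagonal(R_n, 1.0)
    y = rng.multivariate_normal(np.zeros(n_n), R_n, size=n_g)
    out = np.empty_like(D)
    for i in range(n_n):
        ranks = y[:, i].argsort().argsort()   # rank of each normal value
        out[:, i] = np.sort(D[:, i])[ranks]   # demand value with the same rank
    return out
```

Because only the ordering of each column changes, the beta marginals generated in Step 1 are untouched; only the joint behavior of the nodes is modified.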
If datasets of peak demand are available, such as those obtained from Step 0 of this methodology, the values of rank cross-correlation can be calculated by comparing the k-th column of one dataset with the l-th column of another. Then, a best-fit scaling equation can be constructed to relate the rank cross-correlations ρ_k,l to the yearly average demands of nodes/aggregations k and l. By applying this equation, the rank cross-correlation between two generic WDN nodes with peak demand characteristics similar to those of the datasets can be evaluated.
If the peak demand datasets are not numerous enough to enable robust best-fit equations, the following simplified procedure can be applied. This procedure is based on a numerical optimization and considers a single value of correlation ρ in the WDN. In other words, ρ_k,l = ρ for any pair of nodes k and l in the network. The optimal value of ρ is searched for by applying a local optimization based on the active-set method [23], so as to obtain a cumulative distribution of D_WDN* close to that of D_WDN. Summing up, with reference to the uncorrelated samples of N_g peak demands D_i^j for the N_n nodes and to the sample of N_g peak demands D_WDN^j for the whole WDN, both available from Step 1, the following instructions are carried out for each value of ρ proposed by the optimizer:

1. Re-sort the N_n samples of N_g peak demands D_i^j following the ρ-based sorting obtained through the copula algorithm;
2. Sum the re-sorted nodal samples to obtain the sample of N_g values of D_WDN* and evaluate the 1-norm distance between its cumulative distribution and that of D_WDN.

When the optimizer converges, the best value of ρ, i.e., the one that minimizes the 1-norm distance between the two cumulative distributions (the objective function), is obtained.
At the end of Step 2, N_g scenarios of peak demand will be available. The generic j-th of the N_g scenarios contains the re-sorted peak demand values D_i^j, with i ranging from 1 to the number N_n of nodes of the WDN.
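The single-ρ variant of Step 2 can be sketched as follows, again as an illustrative Python fragment rather than the authors' implementation: a bounded scalar search stands in for the active-set optimizer, and the 1-norm distance between the two empirical cumulative distributions is approximated through matched order statistics.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def resort_uniform(D, rho, rng):
    """Gaussian-copula re-sort of the columns of D to a single, uniform
    rank cross-correlation rho between every pair of nodes."""
    n_g, n_n = D.shape
    R = np.full((n_n, n_n), 2.0 * np.sin(np.pi * rho / 6.0))
    np.fill_diagonal(R, 1.0)
    y = rng.multivariate_normal(np.zeros(n_n), R, size=n_g)
    out = np.empty_like(D)
    for i in range(n_n):
        out[:, i] = np.sort(D[:, i])[y[:, i].argsort().argsort()]
    return out

def find_rho(D_nodes, D_wdn):
    """Search the single rho whose re-sorted nodal sum best matches the
    WDN-scale sample D_wdn (both samples have the same size N_g)."""
    target = np.sort(D_wdn)
    def loss(rho):
        # A fixed inner seed keeps the objective deterministic across calls;
        # the 1-norm distance is taken between matched order statistics.
        s = resort_uniform(D_nodes, rho, np.random.default_rng(1)).sum(axis=1)
        return np.abs(np.sort(s) - target).sum()
    return minimize_scalar(loss, bounds=(0.0, 0.95), method="bounded").x
```

Freezing the inner random seed is a design choice: it makes the objective a deterministic function of ρ, which a scalar optimizer can minimize reliably.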

Data
The applications of this work were carried out using the user demand data collected with a smart metering system in the Soccavo district of Naples, Italy. These data have already been used in the scientific literature [10,18,21]. They include the demand time series of n_u = 1067 residential users with a one-hour resolution, for a total period of one year (8760 values for each user), from 20 March 2017 (h 0-1) to 19 March 2018 (h 23-24). The demand time series available from the monitoring of the Soccavo district are characterized by yearly average demands d_y ranging from 3.47 × 10^-4 to 0.014 L/s. Further details can be found in the work of Fiorillo et al. [21]. The application of the first part of the preliminary Step 0 of the methodology (see Tables 1 and 2) led to the selection of N_tot = 228 h in the year, corresponding to the morning peak from 8 am to 9 am on working weekdays in all months of the year except August, which showed a very different behavior from the rest of the year. Then, in the second part of Step 0, two datasets were randomly generated for Step 1 of the methodology. Each dataset was obtained by iterating the aggregation procedure based on Equations (1) and (2) (see Tables 2 and 3) n = 50 times. As subsets of these two datasets, four smaller datasets (two with n = 1 and two with n = 2) were used for Step 2 of the methodology, that is, for the characterization of the rank cross-correlation. The sizes of these datasets were chosen to obtain sufficiently representative groups of data for assessing the parameters of the beta probability distribution and the rank cross-correlations, respectively. In the following subsections, first the results of Step 1 and Step 2 are described, followed by explicatory applications of the methodology for demand generation in WDNs.
For clarity's sake, it is worth recalling at this stage that each dataset is organized as in Table 3, with each column reporting peak demand values D associated with a certain value of the yearly average demand D_y.

Results of Step 1
Since the parameters α and β, and bounds a and b of the beta probability distribution can be evaluated only if the minimum value D_min, mean D_mean, standard deviation σ, and skewness γ have been assigned, best-fit scaling equations were constructed to relate D_min, D_mean, σ, and γ of the peak demand D to the yearly average demand D_y. To this end, the two datasets constructed in Step 0 were used for constructing/calibrating and for validating the best-fit equations, respectively. In the graphs in Figure 2, D_min, D_mean, σ, and γ are plotted against D_y for the peak demands D in the calibration dataset, independently of the aggregation level. These graphs show very regular increasing patterns of the first three parameters as a function of D_y. Though all peak demands have a clearly positive asymmetry, the pattern of γ shows some irregularities, perhaps because D_y alone is unable to explain the variability in γ. Linear best-fit models were used to express D_min and D_mean. Due to the change in slope in the pattern σ(D_y) at around D_y = 0.02 L/s, a combination of two linear models was chosen for σ. Finally, an exponentially decreasing model was chosen for γ. The best-fit scaling equations are reported in Table 4, pointing out very high fitting performance in terms of R² for the first three parameters and a sufficient performance for γ (see also the lines in the graphs in Figure 2).

[Figure 2. D_min, D_mean, σ, and γ of peak demand D as a function of the yearly average demand D_y, as obtained from the calibration dataset (grey dots) and from the best-fit equations (black lines).]

[Table 4. Best-fit equation obtained for each of the four parameters of the probability model and for the rank cross-correlation; R² in calibration and validation.]

A last verification of the beta probability model, with parameters estimated through the best-fit scaling equations for D_min, D_mean, σ, and γ, is shown in the graphs in Figure 4.
Each graph allows comparing the histogram of normalized frequencies of the measured peak demand for a certain value of the yearly average demand D_y with the beta probability density function associated with the same value of D_y. Globally, Figure 4 points out a satisfactory agreement for both low (graphs a and b) and high (graphs c and d) values of D_y. Incidentally, it must be remarked that the low and high values of the yearly average demand D_y determined different patterns for the probability density function of the peak demand D, since the probability of values of D close to 0 is high for the former and null for the latter. Furthermore, the beta probability model enabled preservation of the skewness values obtained with the best-fit equation reported in Table 4. In fact, in the four cases shown in Figure 4, it provided γ values equal to 1.7595, 1.6148, 0.2156, and 0.2156, respectively. If applied in the same context and parameterized only based on D_mean and σ, the gamma probability model described in [19] would provide γ values equal to 1.9560, 1.9560, 0.1760, and 0.1605. Therefore, it would overestimate or underestimate the skewness for the low or high values of D_y, respectively. Another benefit of the beta probability model is that it fits the minimum value D_min of D better than the gamma probability model, for which D_min = 0 for any value of D_y.
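The calibration of a scaling equation of this kind can be illustrated with a short fragment. In the following Python sketch, the data and coefficients are synthetic placeholders mimicking the decreasing pattern of γ against D_y; the calibrated Soccavo coefficients are those of Table 4 and are not reproduced here.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)

# Synthetic stand-in data mimicking the decreasing pattern of gamma(D_y);
# the real pairs would come from the Step-0 calibration dataset.
D_y = np.sort(rng.uniform(0.001, 0.2, 120))
gamma_obs = 0.2 + 1.5 * np.exp(-60 * D_y) + rng.normal(0.0, 0.05, D_y.size)

def exp_model(x, c0, c1, c2):
    """Exponentially decreasing law, gamma = c0 + c1 * exp(-c2 * x)."""
    return c0 + c1 * np.exp(-c2 * x)

# Least-squares calibration and coefficient of determination R^2.
p, _ = curve_fit(exp_model, D_y, gamma_obs, p0=(0.2, 1.5, 50.0))
resid = gamma_obs - exp_model(D_y, *p)
r2 = 1.0 - resid.var() / gamma_obs.var()
```

The same pattern (choose a model form, fit by least squares, report R² on calibration and validation data) applies to the linear laws for D_min and D_mean and to the two-segment law for σ.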

Results of Step 2
The four datasets constructed in Step 0 to explore rank cross-correlation were used to construct/calibrate and validate best-fit scaling equations. Calibration was carried out using one dataset with n = 1 and one dataset with n = 2. Validation was carried out using the other dataset with n = 1 and the other dataset with n = 2. In either phase, rank cross-correlations (for different combinations of k and l columns) were calculated between the peak demand samples in the dataset with n = 1 and those in the dataset with n = 2. Values of the rank cross-correlation equal to 1, due to the presence of identical peak demand samples, were discarded from the analysis. The rank cross-correlations between the most aggregated peak demands were also removed, because the random sequential aggregation leads to overly similar peak demand samples at the last iterations and, therefore, to misleading rank cross-correlations very close to 1, as noted above. The analysis of the rank cross-correlation ρ between two samples of peak demand showed that this variable tends to increase when the product prod of the yearly average demands grows. Under the same value of prod, ρ tends to increase when the ratio of the minimum (low) to the maximum (high) yearly average demand, hereinafter ratio, grows from 0 to 1. The relationship between ρ and prod was expressed by a polynomial model and a logarithmic model, valid for low and high values of prod, respectively. Both models were multiplied by a power of ratio to introduce the dependency of ρ on ratio. The calibrated best-fit equations are reported in Table 4, together with the high fitting performance in terms of R² in both calibration and validation. The graphs in Figure 5 compare the ρ values obtained from the datasets with those obtained through the best-fit scaling equations as a function of prod, attesting to the goodness of fit of the latter equations.
In both graphs, the clouds of dots associated with the measured and calculated values of ρ as a function of prod overlap satisfactorily. Then, some tests were carried out on the numerical optimization procedure for sample re-sorting. For various pairs of values of D_y, peak demand samples were generated and re-sorted by using the optimization procedure described above and by enforcing the correlation degree obtained through the best-fit formula. In all cases, the optimization-based re-sorting yielded rank cross-correlations similar to those suggested by the formula. Other tests of Step 2 of the methodology can be found in the following subsection.

Peak Demand Generation in WDNs
Four case studies, numbered as 1a, 1b, 2a, and 2b, were considered for peak demand generation, to test the methodology under different demand conditions. All case studies were assumed to have similar peak demand features to the Soccavo district, thus legitimizing the applicability of the relationships in Table 4. Obviously, the relationships would need to be recalibrated under different peak demand conditions.
Case study 1a was represented by the WDN serving a small fictitious town quarter in Italy. Further details can be found in [26]. The quarter was assumed to host 850 inhabitants, linked to 183 nodes, 170 of which have a known nonzero average demand. The yearly average demand D_y of the whole WDN was about 1.57 L/s. Case study 1b differed from case study 1a due to the amplification of each nodal demand by a factor of 2.5, resulting in a yearly average demand D_y of the whole WDN of about 3.93 L/s. Case study 2a was a skeletonized version of case study 1a. Further details can be found in [26]. In particular, the allocation of consumers in the skeletonized WDN was carried out according to the criterion of proximity of the connection to the node. To this end, the consumers located along the upstream part of the generic pipe were allocated to the upstream node in the skeletonized layout, and vice versa. This resulted in 12 nodes associated with known nonzero demand values. Case study 2b differed from case study 2a due to the amplification of demand by a factor of 2.5. As expected, the values of D_y for the whole WDN in case studies 2a and 2b were equal to those in case studies 1a and 1b, respectively. The D_y values of each demanding node in case studies 2a and 2b are reported in Table 5. By applying Step 1 of the methodology with the best-fit equations for D_min, D_mean, σ, and γ reported in Table 4 (starting from the D_y values at the nodes), samples of peak demand D were generated for the WDN nodes in all case studies using the corresponding beta probability distributions. Specifically, one sample with N_g = 100,000 peak demand values was generated for each node of given yearly average D_y. Then, the WDN peak demand D_WDN* was obtained by summing the nodal peak demands without re-sorting. Obviously, it was made up of a sample of N_g = 100,000 values.
As the graphs in Figure 6 show, the cumulative distribution of D_WDN* may be far from the theoretical cumulative distribution of D_WDN obtained by applying the beta probability model related to the yearly average demand of the whole WDN, especially for larger nodal values of the yearly average demand (case studies 1b, 2a, and 2b). Both variants of re-sorting in Step 2 of the methodology, based on the best-fit equation and on the optimization, respectively, brought the cumulative distribution of D_WDN* very close to the theoretical cumulative distribution of D_WDN (see Figure 6).
The single values of the rank cross-correlation ρ obtained through the optimization in the four case studies were equal to 0.002, 0.0063, 0.1025, and 0.3381, respectively. As expected, ρ grew when the average value of D_y at the WDN nodes increased. In each case study, the ρ_k,l values obtained through the best-fit equation changed from one pair of nodes to another, but lay around the single value obtained through the optimization. As further evidence of the effectiveness of the methodology, the graphs in Figure 7 compare, for case study 2a, the histogram of normalized frequencies of D_WDN* with the theoretical beta probability density function of the peak demand D_WDN in the whole WDN, proving that only by re-sorting the nodal demand samples can the sum of nodal demands represent the expected demand for the whole WDN. The theoretical mean, standard deviation, and skewness values, and the values obtained by summing the generated peak nodal demand samples, are reported in Table 6, showing that the only statistic preserved without re-sorting was the mean. This is because the mean is simply additive. Both best-fit equation-based and optimization-based re-sorting yielded significant improvements in the preservation of the standard deviation and skewness.

Discussion
The best-fit scaling equations obtained in the present work to estimate D mean , σ, and ρ, shown in Table 4, can be compared with the scaling laws derived through a theoretical framework, e.g., [19,27]. These laws are based on the number of consumers at each node and on the statistical features of the water demand of the typical single user, the unitary user. To put them on the same footing as the best-fit equations of this work, the scaling laws were reformulated through algebraic manipulations as a function of the yearly average demand, instead of the number of users (see Appendix A). The calibration of the scaling laws on the Soccavo dataset yielded Equations (9)-(11). Equation (9) for estimating D mean is identical to the best-fit equation obtained in the present work (see Table 4). Equation (10) for estimating σ fits the calibration data similarly well (R 2 = 0.994) compared with the best-fit equation obtained in the present work (see graphs a and c in Figure 8). However, it must be remarked that the calibration of the scaling law causes a significant underestimation of σ for low values of D y (see graphs b and d in Figure 8). Equation (11) for estimating the rank cross-correlation has a globally lower goodness of fit (R 2 = 0.897, see Figure 9) and tends to underestimate the variability in the value of ρ.
Figure 9. As a function of the product prod of the yearly average demands D y , comparison of the rank cross-correlations ρ between peak demands D obtained from the calibration datasets (grey dots) with the values obtained through (a) the best-fit equations of this work (black crosses) and (b) the calibration of the theoretical scaling law on the Soccavo district (black crosses).
As a result of the comparison shown in Figures 8 and 9, best-fit scaling equations with a better goodness of fit than the calibrated theoretical scaling laws can be obtained to relate the statistical parameters of consumption to the yearly average demand.
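The construction of a best-fit scaling equation can be sketched with a generic power-law fit in log-log space. The data pairs and the functional form below are illustrative assumptions; the actual forms and coefficients used in the paper are those of Table 4, calibrated on the Soccavo dataset.

```python
import numpy as np

# Hypothetical calibration pairs (D_y in L/s, sigma in L/s) and a generic
# power-law scaling model sigma = c * D_y**e, fitted by least squares on logs.
D_y = np.array([0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 4.0])
true_c, true_e = 0.8, 0.7  # synthetic "ground truth" for the sketch
noise = np.exp(0.02 * np.random.default_rng(1).normal(size=D_y.size))
sigma = true_c * D_y**true_e * noise

# Linear regression in log-log space recovers exponent and prefactor
e_hat, log_c_hat = np.polyfit(np.log(D_y), np.log(sigma), 1)
c_hat = np.exp(log_c_hat)
print(round(c_hat, 2), round(e_hat, 2))
```

The same procedure applies to any of the statistics (mean, standard deviation, skewness) once a parametric form linking them to D y is chosen.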
Another issue to be discussed is the comparison of the bottom-up approach for generating peak demand scenarios with the traditional deterministic top-down approach, which is based on applying amplificative peak coefficients to the yearly average demands D y at the WDN nodes. In case study 2a, with nodal demands reconstructed using the optimization-based re-sorting, the peak demand coefficient was evaluated at the WDN nodes as the ratio of the peak demand to the yearly average demand. This was done in 20 demand scenarios featuring similar values of the cumulative frequency F, close to 0.5, evaluated on the whole network demand (see graph c in Figure 6). As Table 7 shows, the peak demand coefficient in the whole WDN was about 1.81 in all scenarios, whereas the nodal demand coefficients were quite variable around this value.
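The distinction between nodal and WDN-scale peak coefficients amounts to a simple computation, sketched below with made-up demand values: the nodal coefficients are per-node ratios, while the aggregate coefficient, which the top-down approach would assign uniformly, is the ratio of the sums.

```python
import numpy as np

# Illustrative snapshot: yearly average demands D_y (L/s) at three nodes and
# one generated scenario of nodal peak demands (values are hypothetical).
D_y = np.array([0.20, 0.35, 0.10])
D_peak = np.array([0.41, 0.58, 0.22])

# Nodal peak coefficients vary from node to node ...
c_nodal = D_peak / D_y

# ... while the WDN-scale coefficient is a single aggregate ratio, which the
# deterministic top-down approach would apply to every node.
c_wdn = D_peak.sum() / D_y.sum()
print(np.round(c_nodal, 2), round(c_wdn, 2))
```

In this sketch the aggregate coefficient sits between the smallest and largest nodal coefficients, mirroring the variability reported in Table 7.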
To better explore the difference between the bottom-up approach presented in this paper and the deterministic top-down approach, Figure 10 presents a comparison in terms of histograms of frequencies of the peak demand coefficient at node 1 in case study 2a. Graph (a) shows the histogram of frequencies obtained through the bottom-up approach, whereas graph (b) shows that obtainable through the deterministic top-down approach, which applies to all WDN nodes the same demand coefficient as the whole WDN. The comparison points out that the deterministic top-down approach leads to an underestimation of the range of variation in the demand coefficient. The results shown in Table 7 and in Figure 10 indicate that the bottom-up generation presented in this work enables a much more realistic peak demand allocation than the standard top-down approach used in engineering practice, which assigns the peak coefficient of the whole WDN to the single nodes. In this context, it must be noted that this limitation is not inherent in the top-down approach but rather in the way practitioners apply it. In fact, there are notable examples in the scientific literature [28,29] where this limitation does not appear in the context of the top-down generation of demand time series. However, the top-down generation of peak demand scenarios has not been much explored so far.

Conclusions
The main conclusions of this work are the following:

1. The beta probability distribution model with tunable bounds is effective at representing the peak demand at different degrees of aggregation, while successfully preserving bounds and statistics up to the third order;
2. By making use of available data of measured consumption at WDN users, best-fit scaling equations can be constructed to relate the mean, standard deviation, skewness, and rank cross-correlation of peak demands to the yearly average values;
3. An optimization-based re-sorting of the data samples generated at each WDN node with the beta probability distribution can be used when there are not enough data to calibrate the scaling equations for estimating the rank cross-correlations between nodal peak demands;
4. The bottom-up approach presented in this work yields more realistic demand scenarios than the deterministic top-down approach used in engineering practice, which is based on applying a single amplificative factor to all WDN nodes.
Author Contributions: Conceptualization, E.C. and M.F.; methodology, E.C. and G.G.; investigation, E.C. and G.G.; writing-original draft preparation, E.C.; writing-review and editing, G.G., A.C., and M.F. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data, models, or codes that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgments: This research was conducted using the funds supplied by the Universities of Pavia, Catania and Ferrara.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A
The theoretical scaling laws are equations derived through a theoretical framework based on the assumption that the water flow in a meter, corresponding to the water consumption of a unit user, is a random variable or a realization of a stationary stochastic process; e.g., see [19,27]. These laws relate the statistics D mean , σ, and ρ of the aggregated users to the corresponding statistics D mean,1 , σ 1 , and ρ 1 of the typical unitary user, which must be calibrated based on monitored demands.
The theoretical scaling law for the average demand D mean is:
D mean = n u D mean,1
in which n u is the number of aggregated users. The following manipulations, involving the introduction of the yearly average demand D y of the aggregated users, lead to Equation (9):
D mean = (D y /D y ) n u D mean,1 = (n u D mean,1 /D y ) D y = par 1 D y .
The theoretical scaling law for the rank cross-correlation ρ k,l between the demands at nodes k and l is:
ρ k,l = n u,k n u,l ρ 1 / sqrt( n u,k [1 + ρ 1 (n u,k − 1)] n u,l [1 + ρ 1 (n u,l − 1)] ) (16)
in which n u,k and n u,l are the numbers of users at node k and at node l, respectively. The following manipulations, involving the introduction of the yearly average demand D y for both nodes k and l, lead to Equation (11). In the reformulated theoretical scaling laws, the parameters par 1 , par 2 , par 3 , par 4 , and θ must be calibrated in the field based on monitored demands.
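The aggregation structure behind Equation (16) can be checked numerically under an assumed equicorrelated model: if all unitary users share a pairwise correlation ρ 1, the correlation between the two nodal sums follows the scaling law. The sketch below uses Gaussian unit demands and the Pearson correlation of the sums, for which the law holds exactly; the user counts and ρ 1 value are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

n_k, n_l, rho_1 = 10, 25, 0.2  # hypothetical user counts and unit correlation
n = n_k + n_l

# All unitary users share pairwise correlation rho_1 (equicorrelated model)
cov = np.full((n, n), rho_1)
np.fill_diagonal(cov, 1.0)
users = rng.multivariate_normal(np.zeros(n), cov, 200_000)

# Aggregate demands at the two nodes and their sample correlation
D_k = users[:, :n_k].sum(axis=1)
D_l = users[:, n_k:].sum(axis=1)
rho_mc = np.corrcoef(D_k, D_l)[0, 1]

# Theoretical scaling law, Equation (16)
rho_theory = (n_k * n_l * rho_1) / np.sqrt(
    n_k * (1 + rho_1 * (n_k - 1)) * n_l * (1 + rho_1 * (n_l - 1))
)
print(round(rho_mc, 3), round(rho_theory, 3))
```

The Monte Carlo estimate agrees with the closed-form value, confirming the term-by-term structure of the law: the numerator counts the n u,k x n u,l cross-pairs, while each bracketed factor in the denominator is the variance of a sum of equicorrelated users.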