Simulating Marginal and Dependence Behaviour of Water Demand Processes at Any Fine Time Scale

Uncertainty-aware design and management of urban water systems lies on the generation of synthetic series that should precisely reproduce the distributional and dependence properties of residential water demand process (i.e., significant deviation from Gaussianity, intermittent behaviour, high spatial and temporal variability and a variety of dependence structures) at various temporal and spatial scales of operational interest. This is of high importance since these properties govern the dynamics of the overall system, while prominent simulation methods, such as pulse-based schemes, address partially this issue by preserving part of the marginal behaviour of the process (e.g., low-order statistics) or neglecting the significant aspect of temporal dependence. In this work, we present a single stochastic modelling strategy, applicable at any fine time scale to explicitly preserve both the distributional and dependence properties of the process. The strategy builds upon the Nataf’s joint distribution model and particularly on the quantile mapping of an auxiliary Gaussian process, generated by a suitable linear stochastic model, to establish processes with the target marginal distribution and correlation structure. The three real-world case studies examined, reveal the efficiency (suitability) of the simulation strategy in terms of reproducing the variety of marginal and dependence properties encountered in water demand records from 1-min up to 1-h.


Introduction
In the planning and management of urban water systems, the temporal and spatial variability of residential water demand is one of the most influential sources of uncertainty [1,2]. Conventional modelling approaches often neglect this aspect, by involving on the one hand, identical multiplier patterns to provide a coarse representation of the diurnal variation of water demand (typically from hour-to-hour) and on the other hand, simplified allocation techniques to express the variation of demand across different points (e.g., nodes) of the network. Despite its practical simplicity, this approach fails to capture adequately the high spatio-temporal variability and uncertainty of water demand (e.g., [3][4][5][6][7]), which becomes more and more intense as the scales of analysis become lower [8]. Furthermore, such a modelling concept does not allow the design, evaluation and management of water distribution systems (WDS) within an uncertainty-aware framework since it implies the use of deterministic water demand patterns as inputs in, typically, deterministic simulation models. A way to address some of tail behaviour and hence the reproduction of extremes. This is crucial especially in the design of WDS that requires a proper characterisation of peak flows at different temporal resolutions (e.g., [2,40,44]). This is further highlighted in the work of Kossieris and Makropoulos [39] who demonstrate that the use of some low-order moments (i.e., up to third moment) is not sufficient, since different distributions, having the same moments, result in totally different behaviour in terms of maximum water demands. Additionally, a complete description of the statistical behaviour (e.g., in terms of marginal distribution) of residential water demand is also important in WDS quality modelling applications where different flows, further to maximum values, are used in hydraulic models (e.g., see [3] and references herein).
The importance of appropriately reproducing the temporal and spatial dependencies, i.e., stochastic structure, encountered in water demand process has also been highlighted by many researchers [3-7, 24,37,38,45,46]. More specifically, autocorrelation structure expresses the dynamics of water demand process in time, affecting many parameters in the modelling of WDS, e.g., peak flows, stagnation time, velocities in the pipes, restoration time after pipe breaks etc (e.g., see [37] and references herein). On the other hand, cross-correlation expresses how water demands co-vary across different nodes of the network, and it is also associated with many aspects of the system, e.g., system's performance on the basis of the main statistics of the pressure heads and cost, as well as water quality (e.g., [5,7]). The influence of both auto-and cross-correlation structures have been also studied in the context of bottom-up approaches (see [2]). In this case, the main objective is for the aggregated data to exhibit the desired statistical properties at the temporal and spatial scales under study. As Alvisi et al. [24] pointed out, in order to reproduce the variance of the observed aggregated series, the auto-and cross-correlation structures of the lower-level data should be adequately captured.
In the present work we propose a stochastic modelling strategy for the simulation of residential water demand at any fine time scale, i.e., from 1-min up to 1-h. The proposed strategy combines the widely-used class of linear stochastic models (e.g., autoregressive models) with the concept of Nataf's joint distribution model [47] to enable the explicit reproduction of the marginal distribution and the dependence structure of the process. Nataf-based schemes have been applied in the field of operational research (e.g., [48][49][50]), while very recently this approach was transferred to the hydrological domain for the stochastic simulation of physical processes [51][52][53][54][55][56]. Based on these developments, here, we transfer this approach in the domain of residential water demand (a non-physical process) employing, for the first time, linear stochastic models, rather than pulse-based schemes, to explicitly account for the peculiarities of that process at different fine time scales (i.e., intermittency, skewed distributions, periodicity and strong temporal dependence). Furthermore, in the framework of this approach, various innovative modelling aspects are introduced and discussed, for the first time, in the field of water demand simulation, such as, the use of theoretical autocorrelation functions instead of empirical estimates as well as the use of mixed-type distributions with any arbitrary marginal distribution to describe the continuous part of the process. Especially regarding the latter, we also extend the above mentioned Nataf-based schemes by employing three-component mixed-type distributions to capture adequately both the discrete-continuous character (i.e., intermittent nature) of residential water demand as well as its tail behaviour.
A detailed description of the proposed modelling strategy can be found in Section 2 which is further subdivided into eight subsections presenting the components of the method. Specifically, Section 2.1 provides a literature review of the Nataf-based approach and a high-level description of its rational both in terms of random vectors and stochastic processes, while Sections 2.2 and 2.3 present the theoretical background of the methodology. Section 2.4 provides insights on the probability distributions that can be used to describe the marginal properties of residential water demand, while Section 2.5 illustrates an example of the method. Moving from the field of random variables to stochastic processes, Section 2.6 describes the auxiliary linear stochastic models. Next, the use of theoretical autocorrelation functions is presented in Section 2.7. Lastly, Section 2.8 summarises the overall approaches giving the steps of the generation procedure. The method was evaluated in terms of reproducing the marginal distribution and stochastic structure of residential water demand at three Water 2019, 11, 885 5 of 32 different temporal levels, i.e., fine, medium and high (Section 3). Finally, Section 4 provides a synopsis and discussion of the presented modelling approach.

Methodology and Key Components
Nataf-based schemes enable the explicit preservation of the marginal and stochastic properties of a process or, in other words, its distribution functions and correlation structure. They are based upon the simple, yet pivotal, idea of Nataf [47] according to which the joint distribution of random variables with any target arbitrary marginal distributions can be obtained by mapping an auxiliary multivariate standard Gaussian distribution via the inverse cumulative distribution functions (ICDF). Following the same rational, by mapping an auxiliary (stationary or cyclostationary) Gaussian process with zero mean and unit variance (e.g., simulated by a linear stationary or cyclostationary autoregressive model), we can obtain a stochastic process with a target marginal distribution and correlation structure.

Modelling Rationale and Literature Review
In the core of the described methodology lies the problem of generating random variables with arbitrary marginal distributions and specific correlation matrix. According to Nataf's pivotal work, correlated random vectors can be obtained on the basis of multivariate standard normal variables. This general approach is known as the Nataf's joint distribution model (NDM) or Nataf transformation [57], while later the work of Cario et al. [49] established the term NORmal To Anything (NORTA) to describe a generalized procedure that builds upon and extends the NDM rationale to account also for random vectors comprised by continuous and discrete marginal distributions.
Conveniently, NDM can be regarded as a two-step procedure: first the multivariate normal variables are mapped to uniform domain, and then the multivariate uniform variables are mapped to the desire distributions using their inverse cumulative function. Since this procedure implies the establishment of the variables' joint distribution through the joint distribution of normal variables passing also from uniform domain, it is closely associated with copula theory. This connection was first noted by Cario et al. [49], as well as by Tsoukalas et al. [51], while an extensive discussion on the relation of NDM with Gaussian copula was provided later by Lebrun et al. [58].
The main challenge in NDM is the identification of the so-called equivalent correlations of normal variables that will result in the target correlations after applying the mapping procedure. The suitability of NDM to describe a wide range of correlations was investigated by Liu et al. [57] who also developed empirical formulas that relate the equivalent and target correlations for specific types of continuous marginal distributions. Further to this classical work, various, yet often cumbersome, methods have been proposed to establish a relationship between equivalent and target correlations (e.g., [59][60][61][62]).
Further to the modelling of random vectors, NDM has been also studied and applied in the field of stochastic processes for the generation of synthetic time series which requires, among others, to account for dependencies in time (i.e., stochastic structure). In this case, a Gaussian process is enrolled to generate serially correlated Gaussian random variables which are mapped according to NDM to provide the target process. In this concept, Cario et al. [48] combined an autoregressive linear model with the idea of NDM to simulate auto-correlated univariate stationary processes with arbitrary marginal distributions. This model, known as AutoRegressive To Anything (ARTA), was further extended for the generation of multivariate time series by Biller et al. [50] who developed the Vector AutoRegressive To Anything (VARTA) procedure.
Recently, NDM-based approaches have drawn the attention of hydrological community, since they provide the tools for the reproduction of the peculiarities of hydrometeorological variables, i.e., cyclostationarity, highly skewed character, intermittency, short or long-range auto-dependence structures as well as spatial dependency. In this concept, Tsoukalas et al. [51,54] developed the Stochastic Periodic AutoRegressive To Anything (SPARTA) scheme that is a generalization of ARTA and VARTA models for the simulation of univariate and multivariate cyclostationary (i.e., periodic) processes with arbitrary marginal distributions. Furthermore, the Symmetric Moving Average To Anything (SMARTA) model [52] combines NDM with the symmetric moving average model [63] to simulate non-Gaussian processes that exhibit any-range dependence structure. Papalexiou [53], using autoregressive models, proposed a unified approach for the stochastic modelling of hydroclimatic processes characterised by different correlation structures and marginal distributions, with focus on the use of mixed-type distributions to model intermittency.
Further to the above models, as Tsoukalas et al. [51] pointed out, many other well-known simulation schemes for hydrological variables (e.g., [64][65][66]), with the most characteristic being the so-called Wilks' type weather generators [67], actually share a common rationale, and can be retrospectively viewed as NDM-based approaches.

The bivariate Nataf Distribution Model
As discussed above, NDM approach was initially developed for the generation of correlated random variables (RVs) with arbitrary marginal distributions, while next it was applied in the simulation of stochastic processes after certain extensions and modifications. Although this work focuses on the latter case, in order to keep things simple, we prefer to present NDM's theoretical background on the basis of a bivariate problem that studies the generation of two random variables with predefined marginal distributions and correlation. Essentially, this problem can be regarded as representative since it is also involved when we aim to apply NDM for the modelling of stochastic processes using linear models, due to the fact that the latter are also based on the Pearson's correlation coefficient which is a two-point dependence measure.
Given that our target is to generate correlated RVs X 1 and X 2 with predefined target marginal distributions F X 1 (x 1 )P(X 1 ≤ x) and F X 2 (x 2 )P(X 2 ≤ x), respectively, and target correlation ρ X 1 X 2 Corr[X 1 , X 2 ] which is the Pearson's correlation coefficient between the two variables, hereinafter abbreviated as ρ 1,2 .
Let us initially to define two auxiliary correlated RVs Z 1 and Z 2 which both have the standard normal marginal distribution and specific Pearson correlation coefficient ρ Z 1 Z 2 Corr[Z 1 , Z 2 ], herein after termed as equivalent correlation, for reasons revealed below, and abbreviated as ρ 1,2 . The joint distribution of the two auxiliary variables is the bivariate normal with zero mean, unit variance and correlation ρ 1,2 .
The RVs X 1 and X 2 can be obtained by mapping the auxiliary normal variables to the target distributions, according to the following operations: where F −1 X 1 (·) and F −1 X 2 (·) denote the ICDF of the two target distributions and Φ(·) stands for the standard Normal cumulative distribution function (CDF).
Recall that since U = Φ(·) is uniformly distributed in the interval (0, 1), the use of the ICDF of the target distribution as a mapping function ensures that the final variables will have the desired marginal properties. This concept, as mathematically expressed in Equation (1), is based upon the lemma of probability integral transformation which allows the representation of any random variable as a transformation of a uniform random variable, i.e., if U ∼ U [0,1] , then the random variable X = F −1 (U) has the distribution F X (e.g., see, ( [68], p. 139); and ( [69], p. 36).
Having said the above, one may assume that the preservation of the target correlation matrix is also satisfied by setting ρ 1,2 = ρ 1,2 . However, since ICDF typically imposes a nonlinear and monotonic transformation, it is not possible to preserve the linear correlation as expressed by the Pearson correlation coefficient and hence, ρ 1,2 will typically differ to ρ 1,2 after the mapping of Equation (1). This transformation usually leads to underestimated correlation coefficients, while as the target distribution deviates from the Normal case, the larger will be the underestimation. This implies that in order to attain the desire target correlation coefficient ρ 1,2 , we need to technically assign an inflated value to ρ 1,2 . Therefore, the key challenge of the methodology is to determine the equivalent correlation ρ 1,2 that, after applying the mapping procedure, will result to the desired target correlation ρ 1,2 .
The establishment of the relationship between the target ρ 1,2 and the equivalent correlation ρ 1,2 is based on the theoretical background of Nataf's joint distribution model. Elaborating, given Equation (1), the correlation of two RVs X 1 and X 2 can be written as: According to the definition of Pearson correlation coefficient, the following equation also stands: where E[X 1 ], E[X 2 ] and Var[X 1 ], Var[X 2 ] are the mean and variance of X 1 and X 2 , respectively. Since the associated marginal distributions are already known (and have finite variance), these quantities can be directly estimated and hence our attention is restricted to adjusting E[X 1 X 2 ]. Taking the first cross-product moment of X 1 and X 2 and using Equation (1), we obtain: where ϕ 2 ( 1 , 1 ; ρ 1,2 ) is the standard bivariate normal probability density function (PDF) with correlation ρ 1,2 . By substituting Equation (4) to Equation (3) we obtain: In Equation (5) we see that the target correlation ρ 1,2 is a function of the equivalent correlation ρ 1,2 , given the target marginal distributions F X 1 (x 1 ) and F X 2 (x 2 ). The latter equation can be compactly expressed as: where T (·) is the abbreviation of the function defined in Equation (5). By inverting Equation (6), we express the problem as stated initially, i.e., what ρ 1,2 will give the desired ρ 1,2 after applying the mapping procedure: Equation (6) has analytical solutions only for a few cases where the variables have the same target marginal distributions, such as the Uniform [60], Exponential [49] and Log-Normal [70]. In the general case, numerical schemes are required to solve the integral in Equation (5) in order to establish a relationship among ρ 1,2 and ρ 1,2 (see Section 2.3 for more details). In order to avoid tedious calculations and achieve a more efficient numerical search, the fundamental properties of Equation (6) are typically employed in the numerical procedures. The key properties, as provided by Liu and Der Kiureghian [57] and Cario and Nelson [48], are: Lemma 1. ρ 1,2 is a strictly increasing function of ρ 1,2 . Lemma 2. ρ 1,2 = 0 for ρ 1,2 = 0 and ρ 1,2 ≥ (≤)0 if ρ 1,2 ≥ (≤) 0. Lemma 3. ρ 1,2 |≤| ρ 1,2 , where the equality stands when ρ 1,2 = 0 or when both marginal distributions are normal. Lemma 4. The feasible minimum and maximum values for ρ 1,2 , which can be obtained for a given set of target distributions, are given for ρ 1,2 = −1 and ρ 1,2 = 1, respectively.

Establishing the Target-Equivalent Correlation Relationship
In the core of the NDM approach lies the problem of establishing a relationship between the target and equivalent correlation, ρ 1,2 and ρ 1,2 , respectively, given the target marginal distributions of the RVs. In other words, to define T (·), or alternatively T −1 (·), that will give the value of ρ 1,2 so as to attain the desired ρ 1,2 after mapping Gaussian variables to the real domain (i.e., Equation (1)). To solve the double integral appearing in Equation (5) and establish T (·) numerical schemes are typically employed, since an analytic solution is feasible only for some few special cases. Among others, crude search procedures [48], root-finding techniques [60,62] as well as quadrature methods and Monte-Carlo procedures [59] have been used. These methods are often based on iterative methods while some of them are specifically designed for certain types of distributions (e.g. continuous).
Recently, Tsoukalas et al. [51] and Papalexiou [53] proposed generic hybrid schemes that share a common notion regarding the establishment of a suitable relationship between ρ 1,2 and ρ 1,2 . These schemes can be regarded as generic methodologies since on the one hand, they aim to capture the whole form of T (·), and not to provide specific point estimates of ρ 1,2 , and on the other hand, they are easily-extendable and applicable irrespective of the type of marginal distributions of studied variables (i.e., continuous, discrete or mixed-type of distributions).
More specifically, in these methods, Equation (6) is solved for a specific set of ρ 1,2 values, and the corresponding target ρ 1,2 values are obtained. To solve the double integral of Equation (5) typical integration techniques [53] or Monte-Carlo [51] methods are employed. In this step, the methods also take advantage of the properties of Equation (6), e.g., Lemma 2, to define the range of the ρ 1,2 values that should be studied (e.g., if the target correlation is positive then ρ 1,2 values will range from 0 up to 1). This procedure leads to a set of (ρ 1,2 , ρ 1,2 ) anchor points upon which a suitable function (e.g., polynomial or parametric) is fitted to establish an approximation of the true T (·). This relationship can be used as an "interpolator" that provides the equivalent ρ 1,2 given the target ρ 1,2 . Papalexiou [53] proposes various parametric functions to establish directly T −1 (·), depending on the type of RVs. On the other hand, Tsoukalas et al. [51] uses a polynomial function to approximate the T (·), while the equivalent correlation ρ 1,2 , given a target correlation ρ 1,2 , is obtained by inverting the fitted polynomial.

Modelling the Marginal Behaviour of Water Demand
As described above, the NDM approach requires the definition of probability distributions to describe the marginal properties of correlated random variables or stochastic processes (Section 2.6). Essentially, the method aims to reproduce the target distributions which have assigned a priori to the variables under study. Tsoukalas et al. [51,71] highlight the importance and benefits of this approach against the classical stochastic modelling of hydrometeorological processes, which typically focuses on the resemble of a series of specific statistical characteristics. As discussed in Section 1, this is also crucial in residential water demand modelling given that WDS applications require a proper reproduction of various characteristics (e.g., maximum demands) which cannot derive explicitly by the preservation of some low-order statistics.
Another key advantage of the NDM approach is its flexibility regarding the selection and fitting of the target distributions, given that the only prerequisite of the method regarding the latter is to have finite variance. In this respect, several candidate distributions, fitted with alternative methods (e.g., classical moments, L-moments, maximum likelihood estimation, weighted moments), can be evaluated and the most suitable for the variables under study can be identified on the basis of certain criteria.
Residential water demand series with a fine time step (i.e., at daily and mainly sub-daily scales) are characterised by the presence of individual or grouped zero values, whose proportion becomes more and more higher as the metering resolution is getting lower, implying more extensive time intervals with no demand. To describe probabilistically such an intermittent process, a mixed-type (also known as discrete-continuous or zero-inflated) distribution can be employed, composed by a probability mass concentrated at zero (i.e., discrete part) and a continuous part that describes the nonzero positive demand magnitudes. This modelling approach, i.e., use of a unique distribution to model the whole water demand process, differs from that of the pulse-based schemes (see Section 1) which incorporate distributions for the individual components of demand events (i.e., occurrence, duration and intensity) without any reference to the marginal properties of the whole process.
More specifically, the cdf of a mixed-type distribution is given by: where p ND := P(X = 0) is the probability of no demand (probability zero), while G X and g X are the cdf and pdf, respectively, of values greater than zero, i.e., G X F X|X 0 = P(X ≤ x|X 0). Therefore, the establishment of F X (x) or f X (x) requires: (i) the estimation of p ND which can be directly estimated from the observed data as the ratio of the number of zero values over the total number of observations, and (ii) the selection and fit of a continuous distribution G X that describes adequately the nonzero values. As mentioned above, any probability distribution, defined on the real positive axis, could be a candidate to model the continuous part of the process. At the same time, residential water demand at fine time scales is characterised by high variability and skewness and hence, distributions that will be able to capture these characteristics should be employed. In this respect, Kossieris and Makropoulos [39] recently examined the suitability of 10 distributions to describe the nonzero 15-min and hourly water demand, showing that Weibull, Gamma and Lognormal distributions have a good performance in terms of reproducing the statistical behaviour of the under study records, while the latter two perform also well in terms of maxima. Further to that, the probability distributions employed by the pulse-based schemes to model pulse intensities could be potential candidates to represent G X , i.e., the Exponential [8,24,26,27], Weibull [18], Normal [19] and Lognormal [36] distribution.
In several cases, the above classical distributions appear as suitable models to describe the entire probabilistic behaviour of the under study continuous variable. On the other hand, as indicated by many hydrological studies, the probabilistic structure of observed data (e.g., rainfall, flows) in several cases is more complex (e.g., multimodal behaviour) and, hence, cannot be entirely captured with the use of a single distribution. An alternative solution to this consists the use of nonparametric (e.g., [76]) or mixture distributions. The former, being a data-driven approach, has extrapolation limitations and hence is inappropriate to model tail behaviour. Furthermore, mixture type distributions provide enhanced flexibility, at a cost of few parameters, to account simultaneously for low, moderate and high amounts of the under study variable. This approach has found fertile ground to adequately model hydrological extremes which is of high importance in many engineering applications (e.g., [77][78][79]). Furthermore, as discussed in Section 1, the whole range of water demand flows, further to the peak values, are often involved in WDS applications, and thus the reproduction of the entire probabilistic behaviour of residential water demand consists a conditio sine qua non.
To this end, in the present study we also employ a two-component mixture distribution for the description of nonzero water demand values. This enables to capture the main body of observed data as well as a more suitable representation of extreme behaviour, in cases where classical distribution models are found inadequate. Particularly, the employed mixture model for nonzero water amounts reads [80]: where G 1 (x) denotes a cdf, with parameter vector θ 1 , which describes the body of the data up to threshold c with corresponding quantile u c . Furthermore, G 2 (x) stands for the cdf of the Generalised Pareto distribution (GPD), with vector parameter θ 2 , which consists the model of the upper tail. The above mixture model has been implemented with the use of various continuous parametric models, such as those mentioned above, to represent G 1 (x), e.g., Weibull, Gamma and Lognormal distributions [81][82][83], while nonparametric methods have also been applied [80]. Furthermore, the use of GPD consists a well-established modelling approach for extremes in many disciplines, e.g., in hydrological domain this model has been used to probabilistically describe rainfall maxima. This model has the flexibility to represent different types of upper tail behaviour depending on the value of shape parameter γ (the cdf of GPD is given in Appendix A, Table A1). More specifically, for γ > 0 the GPD has a heavy (sub-exponential) upper tail, for γ = 0, it convergences to the exponential tail, while for γ < 0 the distribution has a finite upper bound at u c − β/γ, where β and u c denote the scale and location (threshold) parameter, respectively.
In the framework of the present study, the evaluation of different probabilistic models in terms of adequately representing the entire marginal distribution of the finely resolved observed water demand records showed that, in several cases, the latter was not possible to be achieved with the use of a single distribution. More specifically, the investigated single distributions, though resembling the main statistical characteristics upon which they are fitted (e.g., via the method of moments or L-moments), diverge significantly from the empirical marginals both in the body as well as the right tail of the distribution. On the contrary, when those models are combined with a GPD to formulate a mixture type distribution, i.e., Equation (9), a much better fitting is achieved. This is further illustrated graphically in Figure 1 that depicts the empirical distribution of the nonzero 1-min water demand observations, recorded in time interval 20:00-21:00 (see Section 3.1 for more information on the dataset used in the present case studies), along with the corresponding theoretical distributions of three single models (i.e., Gamma, Weibull, Lognormal) and one mixture model composed by Gamma and GPD. The graph presents in double logarithmic plot the probability of exceedance, i.e., G X (x) = P(X > x) = 1 − G X (x). The single distribution models (their cdf are given in Appendix A, Table A1) were fitted with the method of L-moments [72,84], while the parameters of the mixture model were obtained via the maximum likelihood method (see [80,83] for further details). As Table 1 shows, single distribution models perfectly resemble the observed statistical characteristics upon which they are fitted (i.e., L-mean and L-variation), providing also a good approximation to observed L-skewness and L-kurtosis which are not involved in the fitting procedure. Despite this, Figure 1 reveals that these models show a significant divergence from the empirical distribution of the observed data for values greater than the quantile x = 7 L/min. On the contrary, the Gamma-GPD mixture model provides a much better approximation both to the body and the tail of the observed distribution, resembling also well the under study statistics (see Table 1). In the present study, we exploit the flexibility of Nataf-based models, allowing the use of any distribution model, and we employ this mixture-type distribution, in cases where simpler models cannot provide a good fit. Further to this, to describe the entire marginal behaviour of water demand process at fine time scales via a single cdf model, we embed Equation (9) within Equation (8) to account simultaneously for intermittency (discrete part) as well as the body and tail behaviour of the continuous part (see Section 3).
approximation to observed L-skewness and L-kurtosis which are not involved in the fitting procedure. Despite this, Figure 1 reveals that these models show a significant divergence from the empirical distribution of the observed data for values greater than the quantile = 7 L/min. On the contrary, the Gamma-GPD mixture model provides a much better approximation both to the body and the tail of the observed distribution, resembling also well the under study statistics (see Table 1).
In the present study, we exploit the flexibility of Nataf-based models, allowing the use of any distribution model, and we employ this mixture-type distribution, in cases where simpler models cannot provide a good fit. Further to this, to describe the entire marginal behaviour of water demand process at fine time scales via a single cdf model, we embed Equation (9) within Equation (8) to account simultaneously for intermittency (discrete part) as well as the body and tail behaviour of the continuous part (see Section 3).

Demonstrating NDM Approach
Prior to studying the application of NDM for the simulation of water demand stochastic processes, which is the main subject of the present work, here we provide a comprehensive demonstration of the entire approach, as described in Sections 2.2 and 2.3, via a numerical example. More specifically, we examine the problem of generating RVs X 1 and X 2 which both follow the Weibull distribution and have target correlation ρ 1,2 = 0.75. We assume that the parameter values for the distribution of both variables are β = 1.01 and γ = 0.54 (see Table A1 in Appendix A for more details on Weibull distribution).
Given the target marginal distribution and correlation, initially, the equivalent correlation ρ 1,2 of the auxiliary standard normal variables Z 1 and Z 2 is obtained. This is achieved by first establishing T (·) that provides the relationship between the target and equivalent correlations for the specified distributions. In the present work, we employed the numerical scheme proposed by Tsoukalas et al. [51] according to which the true T (·) is approximated by a polynomial function (see Section 2.3 for more information). In the present example, the following quadratic function is derived: The established relationship between target and equivalent correlations is further illustrated in Figure 2a. The equivalent correlation ρ 1,2 of the Gaussian variables can be then obtained by solving the simple Equation (10) for the given target correlation. For this case, for ρ 1,2 = 0.75, we get ρ 1,2 = 0.834. Next, we generated 100,000 values from the bivariate standard normal distribution having the specified equivalent correlation. The scatter plot as well as the marginal distributions of these auxiliary variables are presented in Figure 2b. These variables are first mapped to the uniform domain via the cdf of the standard normal distribution Φ(·), providing correlated values that have uniform marginal distributions (see Figure 2c). The correlated uniformly distributed variables are then mapped to the actual domain (see Figure 2d), according to Equation (1), to provide variables with the target marginal distributions and correlation. As explained above, the use of the ICDF of the target distribution (the Weibull in this example) as a mapping function ensures that the final variables will have the desired marginal properties, while the assignment of a technically inflated value to ρ 1,2 allows to attain the target correlation ρ 1,2 after the mapping operation.

Moving from Random Variables to Stochastic Processes
As already mentioned above, NDM can be also applied to generate synthetic time series with the desired marginal distributions and stochastic structure (in terms of autocorrelation and crosscorrelation coefficients). In this case, a Gaussian process is employed to generate auxiliary time series which are then mapped via the ICDF of the target distributions to obtain the target process. Although Gaussian process can be regarded as an intermediate auxiliary step in the whole procedure, its role is crucial since its structure determines that of the target process, e.g., to simulate a stationary autocorrelated process then a stationary Gaussian process should be employed, while the generation of cyclostationary time series requires the use of a cyclostationary Gaussian model as auxiliary. Summarising, the choice of the auxiliary process that is going to be implemented in the NDM concept is a matter of modelling requirements and convenience.
As mentioned in Section 2.1, different auxiliary models have been used in the existing Natafbased schemes. In fact, as discussed in Tsoukalas et al. [52], any linear stochastic model could be used to play this role. For instance, Papalexiou [53] used the sum of independent autoregressive processes (e.g., see [85][86][87]) to generate Gaussian time series.
In the present work, taking into account the peculiarities of residential water demand process at the timescales under study (i.e., hourly and sub-hourly scales) and without loss of generality, we employ stationary and cyclostationary autoregressive models to simulate the auxiliary Gaussian process. Prior describing these models, let us introduce some notations that allow the transition from random variables to processes.
Beginning from the stationary case, let be a target process, where = 1, 2, …, is the time index, with target marginal distribution and autocorrelation structure ≔ Corr[ , + ], where is the time lag. Furthermore, let be the stationary Gaussian auxiliary process with zero mean

Moving from Random Variables to Stochastic Processes
As already mentioned above, NDM can be also applied to generate synthetic time series with the desired marginal distributions and stochastic structure (in terms of autocorrelation and cross-correlation coefficients). In this case, a Gaussian process is employed to generate auxiliary time series which are then mapped via the ICDF of the target distributions to obtain the target process. Although Gaussian process can be regarded as an intermediate auxiliary step in the whole procedure, its role is crucial since its structure determines that of the target process, e.g., to simulate a stationary auto-correlated process then a stationary Gaussian process should be employed, while the generation of cyclostationary time series requires the use of a cyclostationary Gaussian model as auxiliary. Summarising, the choice of the auxiliary process that is going to be implemented in the NDM concept is a matter of modelling requirements and convenience.
As mentioned in Section 2.1, different auxiliary models have been used in the existing Nataf-based schemes. In fact, as discussed in Tsoukalas et al. [52], any linear stochastic model could be used to play this role. For instance, Papalexiou [53] used the sum of independent autoregressive processes (e.g., see [85][86][87]) to generate Gaussian time series.
In the present work, taking into account the peculiarities of residential water demand process at the timescales under study (i.e., hourly and sub-hourly scales) and without loss of generality, we employ stationary and cyclostationary autoregressive models to simulate the auxiliary Gaussian process. Prior describing these models, let us introduce some notations that allow the transition from random variables to processes.
Beginning from the stationary case, let X t be a target process, where t = 1, 2, . . . , is the time index, with target marginal distribution F X and autocorrelation structure ρ τ Corr[X t , X t+τ ], where τ is the time lag. Furthermore, let Z t be the stationary Gaussian auxiliary process with zero mean and unit variance and autocorrelation structure ρ τ Corr[Z t , Z t+τ ]. In order to align the bivariate NDM, as presented in Section 2.2 for the case of random variables, with stationary stochastic processes, we have to set throughout Equations (1)-(7), X 1 X t and X 2 X t+τ for the target process, as well as Z 1 Z t and Z 2 Z t+τ for the auxiliary, accordingly.
The most prominent linear stochastic model, used in a variety of applications and scientific fields, is the autoregressive model of order p (i.e., AR(p)). This can be attributed to its parsimony and simplicity as well as to its intuitive structure since the model implies that the present value of the process is obtained as the weighted sum of p past values and a random component. According to AR(p), a standard Gaussian process Z t ∼ N(0, 1), with specific autocorrelation structure ρ τ , is obtained by the following recursive formula: where ε t are i.i.d variables (white noise) that follow the standard Normal distribution, i.e., ε t ∼ N(0, 1), while a i , with i = 1, . . . , p, and β are model parameters. The parameters a i are derived analytically on the basis of the Yule-Walker system, given the values of ρ τ up to lag p. This reads as: where: In order for an AR(p) process to be stationary, certain conditions for the parameters a i should be fulfilled (e.g., see [88]).
Parameter β essentially expresses the variance of the random component in Equation (11) and it can be estimated by: In the AR(p) model, since ε t are generated from the Normal distribution, the random process Z t will be also Gaussian. Furthermore, this model ensures the preservation of a given autocorrelation structure up to lag p, whereas for higher lags the model gives a gradually decay autocorrelation structure, i.e., at any lag τ ≥ p + 1 the modelled autocorrelation is given by ρ AR τ = p i=1 a i ρ τ−i . The stationary AR(p) model can be easily adjusted to account for cyclostationarity, i.e., in cases where the distribution and correlation structure of the process vary periodically from season-to-season. In this case, the model is termed as periodic autoregressive model of order p (i.e., PAR(p)), and further to the time index t, it is convenient to employ another index s = 1, . . . , S that denotes the season (e.g., S = 12 for processes exhibiting month-to-month variation or S = 24 for hour-to-hour variation). Subsequently, the cyclostationary process reads as Z s,t . Accordingly, ρ s,s−τ now represents the correlation between season s and s − τ. Reasonably, model parameters will also depend on season s. Particularly, in the case of PAR(1) model, the model generating equation reads (the index t is omitted for simplicity): where a s = ρ s,s−1 , β s = 1 − a s 2 and ε s ∼ N(0, 1). As presented in Section 3, in the present work, AR(p) was used as an auxiliary model in the simulation of residential water demand processes at sub-hourly time scales (see Sections 3.1 and 3.2), while to capture the hour-to-hour variation of hourly water demand (see Section 3.3) we employ the PAR(1) model.

Modelling the Dependence Behaviour of Water Demand
Further to the marginal distributions, Nataf-based models aim to capture and reproduce the dependence behaviour of a process as expressed by its stochastic structure. As explained above, the use of Pearson's correlation coefficient, as linear measure of dependence, arises naturally since linear stochastic models are employed to establish dependence in time. However, as it is known the estimation of empirical correlations, as well as of the other second-order statistics, subjects to important bias (e.g., [89,90]), especially in cases of small samples, large lags and intense (persistent) autocorrelation structures, i.e., large autocorrelation values that extend for large lags. This issue has been studied in the domain of hydrological variables (e.g., [63,91]), while Dimitriadis and Koutsoyiannis [92] compare the suitability of other tools to identify the auto-dependence structure of a process. These sources of bias are also encountered in residential water demand processes since the available observed records are typically short, while their autocorrelation structure exhibits a more persistent behaviour as the time scale becomes finer. Due to this, a theoretical autocorrelation structure (ACS) is typically preferred over the direct use of the empirical sample estimates. Further to the above, the use of a suitable ACS is favored for a series of other reasons. Firstly, it ensures the stability of the linear stochastic models given that a properly defined ACS provides positive definite autocorrelation structures, which is a prerequisite so as the latter to be valid (see [68] for more details). Furthermore, it can be regarded as a parsimonious procedure since the whole structure is described via a small set of parameters, enabling the extrapolation of correlation coefficients for lags as high as desired (1/3 of the sample size is the typically suggested as the maximum lag up to which empirical correlations can be properly estimated). Lastly, it enables the transferability of ACS parameters in cases where the sample size does not allow for a proper identification of the correlation structure and stable estimation of coefficients. The latter is of high importance in water demand modelling since observed records at fine time scales are typically limited both in terms of the number of meters and the length of records. In the present work, it is the first time that an ACS is employed to characterise the auto-dependence properties of residential water demand processes.
Various theoretical models to represent the correlation structure of stationary processes can be found in the literature (e.g., [53,86,89,90]). Here, we employ the Cauchy-type autocorrelation structure (CAS) proposed by Koutsoyiannis [63] as a simple and parsimonious model, able to capture a wide range of short-and long-range dependence structures. The power-type CAS, for positive ACF, is given by: where β ≥ 0 and κ > 0 are model parameters that control the form of the function and hence the degree of dependence. For β = 0, CAS provides a short-range autocorrelation structure, i.e., an exponential structure that decays rapidly and diminishes after few time lags. On the contrary, for β > 1, CAS resembles long-range autocorrelation structures, i.e., slowly decreasing structure that extends for large lags. For other values of parameters, the function provides the flexibility to represent autocorrelation structures between, or even outside of, the two cases explained above (see [63] for further details). The estimation of CAS parameters can be obtained by minimizing the error between the sample and theoretical correlation coefficients, on the basis of the least squares method. In the present study, our target was to achieve a good overall fit to a specific number of autocorrelations but with an extra care regarding the preservation of empirical lag-1, i.e., by setting a larger weight in the objective function for this quantity.
The above modelling strategy has been recently employed within NDM-based approaches for hydrological process modelling. In the NDM concept, Tsoukalas et al. [52] used CAS model, while Papalexiou [53] proposed a series of parametric ACS models exploiting their structural similarity with the form of common complementary cumulative distribution functions.

Summary of the Modelling Approach Step-by-Step
In the present section, we summarise the stochastic modelling framework for residential water demand processes via the following steps:

1.
Select and fit the suitable marginal distributions (i.e., continuous, mixed; see Section 2.4) and autocorrelation function (see Section 2.7) that better capture the distributional properties and autocorrelation structures, respectively, of the observed time series.

2.
Given the target autocorrelation structure ρ τ (derive either from a theoretical autocorrelation model or as it is estimated directly from the data), establish the correlation transformation function and estimate the equivalent correlation coefficient ρ τ up to the maximum specified lag τ (see Section 2.3).

3.
Identify the suitable auxiliary Gaussian linear stochastic model (e.g., AR(p) and PAR(p); see Section 2.6) and fit it on the basis of ρ τ .

4.
Generate a realization z t of the auxiliary process Z t at the Gaussian domain.

5.
Use the ICDF of the fitted distribution to map the synthetic time series z t to the actual domain (i.e., Equation (1)) and obtain the final realization x t of the target process X t .
The above described steps, summarising the methods and tools of Section 2, have been implemented in the R programming language [93], in the form of an R package, named anySim (the package is available at: http://www.itia.ntua.gr/en/softinfo/33/; [94]).

Case Studies
The stochastic modelling strategy was applied and evaluated in the simulation of residential water demand at three different temporal scales which are typically involved in WDS applications, i.e., 1-min (Section 3.1), 15-min (Section 3.2) and 1-h (Section 3.3). Furthermore, these three case studies allow the assessment of the modelling approach on the basis of a variety of different marginal behaviours and correlation structures which are usually observed in the water demand records at these time scales. This further demonstrates that a single modelling strategy suffices for the reproduction of the peculiarities of residential water demand at any time scale, without requiring the use of ad-hoc techniques or scale-specific adjustments.
The observed data used in the present study concerns a total water demand record of 1-min resolution from a single household in Athens (Greece). The data was collected via a smart metering equipment installed in the framework of the iWIDGET project [95] and concern an observation period of 5 months, spanning from 1 October 2014 to 31 March 2015. This record was selected due to the quality of data (i.e., small percentage of missing values) and the regular consumption of water, while a shorter part of this dataset has been also used in past studies [22,27]. The employed dataset was initially examined for negative or unrealistically large values due to smart meter malfunctions. All negative values and the values greater than 50 L/min were excluded from the analysis. The 15-min and hourly records were obtained by aggregating temporally the 1-min data at the corresponding time intervals, flagging as missing and excluding from the analysis the values with at least one missing 1-min value.
The performance of the modelling strategy was evaluated by comparing several relevant properties and characteristics of the simulated series with the observed one. Since the stochastic methodology enables the explicit reproduction of the marginal distribution and the dependence structure of the process under study, we initially examine the capability of the model to resemble the empirical cumulative distribution functions (both probability of no demand and the distribution of nonzero values) and the Pearson's autocorrelation coefficients of the observed series. To further assess and verify the model as a fully operational tool, we employed a series of different empirical characteristics which are not directly involved in the model's fitting, but their reproduction is regarded of high importance in WDS applications. These characteristics have been also used in the evaluation of pulse-based schemes [18,28,33] and concern: (1) the total water demand per day, V(L/d), which is the total volume of water consumed within a day; (2) the maximum flow per day which is the highest consumption of the day at the temporal scale of measurement, e.g., Q max (L/min) or Q max (L/15-min); and (3) the maximum hourly demand, Q max (L/h), which is the highest hourly consumption of the day.

Simulation of 1-min Water Demand
The first application concerns the simulation of residential water demand at a fine temporal scale (1-min resolution). At this scale the process is characterised by especially high proportion of zero values p ND , highly skewed distributions and strong autocorrelation structures. Furthermore, water demand characteristics exhibit a significant variation within the day, and, thus, the analysis and modelling is typically conducted by subdividing the day into shorter time intervals (e.g., typically with 1 or 2 h length) where the process can be assumed as stationary. In the present study, we treat 1-min water demand of each hour as a separate stationary stochastic process, and hence we apply the modelling strategy twenty-four times, varying the distribution function and autocorrelation structure from hour-to-hour. This implies that the series of each hour of the day has a different distribution and correlation function, while for the sake of simplicity and parsimony we do not assume different processes for working and weekend days.
Following the steps of the modelling strategy described in Section 2.8, initially, the most suitable probabilistic models to describe the marginal properties of 1-min water demand records at each time interval is identified. Due to the intermittent nature of the process, a mixed model (Section 2.4), given in Equation (8), was employed to describe simultaneously both the discrete and continuous part of the marginal distribution. The discrete part of the model, i.e., probability of no demand p ND , was obtained directly by dividing the number of zero values with the total number of recorded data. Regarding the continuous part G X of the mixed model, in the present study we used the two-parameter Gamma (G), Weibull (W) and Lognormal (LN) distributions as well as their combination with the Generalised Pareto distribution (GPD), i.e., the mixture model of Equation (9). Furthermore, the three-parameter Generalised Gamma distribution (GG) was also employed. The above models were fitted to the nonzero 1-min water demand data (either via the method of L-moments or maximum likelihood; see Section 2.4) and the most suitable to describe the record of each individual time interval was identified in terms of approximating adequately the entire empirical distribution of the observed data. These distributions were selected for the demonstration purposes of the present case study and they could be replaced by any other model as implied by the flexible structure of the modelling strategy. The estimated p ND as well as the empirical probability distributions of the fitted models along with their parameters for the 1-min demand records of each hour of the day are given in Figure A1 of Appendix B.
Further to the marginal distributions, to model the autocorrelation structures we employed the power-type CAS given in Equation (16). The parameters of CAS were obtained for each time interval by minimizing the mean square error (MSE) between the sample and theoretical correlation coefficients. The empirical autocorrelation structures along with the fitted CAS and its parameters are graphically presented in Figure A2. Given the target autocorrelation structure ρ τ and the marginal distributions, the T (·), given in Equation (7), was established following the numerical scheme proposed by Tsoukalas et al. [51] and the equivalent correlation coefficients ρ Z were obtained for each time interval. The visual inspection of autocorrelation structures of 1-min water demand records showed that ρ τ is practically equal to zero for τ > 10 in most hourly intervals and, thus, an AR(10) model was selected and fitted to ρ τ , for τ = 1, 2, . . . , 10, to generate the auxiliary time series. The final time series was then obtained by mapping the Gaussian series to the selected distribution for each time interval. In the present case study, we generated synthetic data of 4 000 days length (4 000 × 24 × 60 values).
The application of the methodology is summarised in Figure 3 that presents the results of the simulation of 1-min water demand of three representative hourly intervals of the day, i.e., a night, a morning and an evening hour. More specifically, plots Figure 3a,b depict the observed and synthetic 1-min water demand data of a typical day. The efficiency of the method in terms of reproducing exactly the target distributions of the nonzero demand values as well as the probability of no demand is illustrated in the panels (c) and (e) of Figure 3 as well as in the probability plots of Figure A1. Furthermore, panels (f)-(h), along with plots in Figure A2, present graphically the capability of the model to resemble the target autocorrelation structures, while panels (i)-(k) depict the relationships between the target and equivalent correlation coefficients for the mixed-type model as well as for the distribution fitted to the nonzero demands. As we see, the presence of zero values has as a result the intense divergence of the relationship from linearity.  Furthermore, Figure 4 presents the performance of the stochastic methodology in a series of important characteristics which are not explicitly modelled by the former. More specifically, as we see in Figure 4a the cdf of the entire simulated 1-min water demand series resembles the observed one, while as Figure 4b shows that the model achieved a very good performance in terms of reproducing the observed daily volumes. Regarding the reproduction of extremes (Figures 4c,d), the Furthermore, Figure 4 presents the performance of the stochastic methodology in a series of important characteristics which are not explicitly modelled by the former. More specifically, as we see in Figure 4a the cdf of the entire simulated 1-min water demand series resembles the observed one, while as Figure 4b shows that the model achieved a very good performance in terms of reproducing the observed daily volumes. Regarding the reproduction of extremes (Figure 4c,d), the cdfs of both nonzero maximum 1-min demand per day, Q max (L/min), as well as maximum hourly demand per day, Q max (L/h), produced by the model are in high agreement with the observed one, overestimating slightly the amounts with very low probability of exceedance.

Simulation of 15-min Water Demand
The second case study involves the simulation of residential water demand at a medium resolution scale, i.e., 15-min. This is the typical metering resolution of the available energy-efficient, cost-effective and with long lifetime smart metering devices, and it consists the typical temporal scale of many WDS applications. In the present case study, we followed the same approach as in the simulation of 1-min water demand (see Subsection 3.1) regarding the effect of seasonality, treating the process of each hour of the day as stationary and hence varying the distribution function and autocorrelation structure from hour-to-hour.
The mixed model of Equation (8) was employed to describe the marginal distribution of the intermittent process, while the distributions described in previous case study were also examined as candidates to model the nonzero 15-min demands of each time interval. The estimated as well as the empirical probability distributions of the fitted models along with their parameters for the 15min demand records of each hour of the day are given in Figure B3 Appendix B. Again, the powertype CAS, given in Equation (16), was used to model the autocorrelation structure and (•), given in Equation (7), was established to provide the equivalent correlation coefficients ̃ for each time interval. In the simulation of 15-min demand, an AR(3) model was selected to generate the Gaussian series, since the analysis of autocorrelation structures of the records showed is practically equal to zero for ≥ 4 in most hourly intervals.
For demonstration, we generated synthetic data of 4 000 days length (4 000 × 24 × 4 values), conducting a similar analysis with the above case study. The results are summarised in Figure 5 that depicts three representative hourly intervals of the day, i.e., a night, a morning and an evening hour. More specifically, Figures 5a,b present the observed and synthetic 15-min water demand data, respectively, of a typical day. In the plots (c)-(e) of Figure 5 along with probability plots of Figure B3, we can see that the method reproduces exactly the target distributions of the nonzero demand values as well as the probability of no demand. Additionally, the capability of the model to resemble the target autocorrelation structures is graphically illustrated in panels (f)-(h) and in the plots of Figure

Simulation of 15-min Water Demand
The second case study involves the simulation of residential water demand at a medium resolution scale, i.e., 15-min. This is the typical metering resolution of the available energy-efficient, cost-effective and with long lifetime smart metering devices, and it consists the typical temporal scale of many WDS applications. In the present case study, we followed the same approach as in the simulation of 1-min water demand (see Section 3.1) regarding the effect of seasonality, treating the process of each hour of the day as stationary and hence varying the distribution function and autocorrelation structure from hour-to-hour.
The mixed model of Equation (8) was employed to describe the marginal distribution of the intermittent process, while the distributions described in previous case study were also examined as candidates to model the nonzero 15-min demands of each time interval. The estimated p ND as well as the empirical probability distributions of the fitted models along with their parameters for the 15-min demand records of each hour of the day are given in Figure A3 Appendix B. Again, the power-type CAS, given in Equation (16), was used to model the autocorrelation structure and T (·), given in Equation (7), was established to provide the equivalent correlation coefficients ρ τ for each time interval. In the simulation of 15-min demand, an AR(3) model was selected to generate the Gaussian series, since the analysis of autocorrelation structures of the records showed ρ τ is practically equal to zero for τ ≥ 4 in most hourly intervals.
For demonstration, we generated synthetic data of 4 000 days length (4 000 × 24 × 4 values), conducting a similar analysis with the above case study. The results are summarised in Figure 5 that depicts three representative hourly intervals of the day, i.e., a night, a morning and an evening hour. More specifically, Figure 5a,b present the observed and synthetic 15-min water demand data, respectively, of a typical day. In the plots (c)-(e) of Figure 5 along with probability plots of Figure A3, we can see that the method reproduces exactly the target distributions of the nonzero demand values as well as the probability of no demand. Additionally, the capability of the model to resemble the target autocorrelation structures is graphically illustrated in panels (f)-(h) and in the plots of Figure A4, while panels (i)-(k) depict the established relationships between the target and equivalent correlation coefficients for the mixed model as well as for the distribution fitted to the nonzero demands.   As it is further demonstrated in Figure 6a,b, the cdf of the entire simulated 15-min water demand series as well as the cdf of synthetic daily volumes resemble with precision the observed ones. Furthermore, the model has the capability to reproduce the behaviour of observed extremes as revealed by Figure 6c,d that present the cdf of the nonzero maximum 15-min demand per day, Q max (L/15-min), and the cdf of maximum hourly demand per day, Q max (L/h), respectively.

Simulation of 1-hour Water Demand
The final case study concerns the simulation of residential water demand process at hourly scale, which is the typical temporal scale of analysis of many real-world WDS applications and distribution network simulation models. At this time scale, the process is characterised by strong seasonality and hence it is treated as cyclostationary with marginal distributions and correlation structures that vary periodically from season-to-season. Having said this, we follow the same approach as in the previous two case studies treating hourly water demand at each hour of the day as a separate stochastic process with common marginal distribution, while in the present case the correlation coefficients express the dependence of hourly demand among successive hours. To capture this hour-to-hour variation of hourly water demand and establish this stochastic structure we built upon the SPARTA approach [51,54], and particularly, we employ a 24-season PAR(1) model (see Subsection 2.6) for the generation of auxiliary Gaussian series. To demonstrate the stochastic modelling strategy in this concept, we generated synthetic hourly water demand data of 4 000 days length (4 000 × 24 values), performing a similar analysis as in the previous two case studies.
Regarding the marginal properties of the hourly water demands, the process still have an intermittent nature, despite the fact that the observed values are much smaller compared to those of 1-min and 15-min scales, while the distributional behaviour of the nonzero values deviates significantly from Gaussianity. Due to this, the cyclostationary stochastic model is coupled with the mixed-type distribution of Equation (8) to describe the entire marginal behaviour of the process. More specifically, we fit and evaluate the two-parameter Gamma, Weibull and Lognormal distributions as well as the three-parameter Generalised Gamma distribution (GG) to model the positive demand quantities. Figure 7 displays the observed and simulated values as well as the observed, theoretical and simulated empirical probability plots of the nonzero hourly water demand records of each hour of the day. As we see, the mapping operation, i.e., Equation (1), employed by the method ensures the perfect reproduction of the hourly-varying probabilities of no demand as well as the perfect resemblance of the assumed target theoretical distribution models by the simulated data.
Furthermore, the performance of the stochastic model in terms of reproducing the target hourto-hour correlation coefficients is graphically presented in Figure 8. As shown, the model, further to the marginal distributions, captures perfectly the great variety of lag-1 correlations of hourly water demand which is observed within a day.

Simulation of 1-h Water Demand
The final case study concerns the simulation of residential water demand process at hourly scale, which is the typical temporal scale of analysis of many real-world WDS applications and distribution network simulation models. At this time scale, the process is characterised by strong seasonality and hence it is treated as cyclostationary with marginal distributions and correlation structures that vary periodically from season-to-season. Having said this, we follow the same approach as in the previous two case studies treating hourly water demand at each hour of the day as a separate stochastic process with common marginal distribution, while in the present case the correlation coefficients express the dependence of hourly demand among successive hours. To capture this hour-to-hour variation of hourly water demand and establish this stochastic structure we built upon the SPARTA approach [51,54], and particularly, we employ a 24-season PAR(1) model (see Section 2.6) for the generation of auxiliary Gaussian series. To demonstrate the stochastic modelling strategy in this concept, we generated synthetic hourly water demand data of 4 000 days length (4 000 × 24 values), performing a similar analysis as in the previous two case studies.
Regarding the marginal properties of the hourly water demands, the process still have an intermittent nature, despite the fact that the observed p ND values are much smaller compared to those of 1-min and 15-min scales, while the distributional behaviour of the nonzero values deviates significantly from Gaussianity. Due to this, the cyclostationary stochastic model is coupled with the mixed-type distribution of Equation (8) to describe the entire marginal behaviour of the process. More specifically, we fit and evaluate the two-parameter Gamma, Weibull and Lognormal distributions as well as the three-parameter Generalised Gamma distribution (GG) to model the positive demand quantities. Figure 7 displays the observed and simulated p ND values as well as the observed, theoretical and simulated empirical probability plots of the nonzero hourly water demand records of each hour of the day. As we see, the mapping operation, i.e., Equation (1), employed by the method ensures the perfect reproduction of the hourly-varying probabilities of no demand as well as the perfect resemblance of the assumed target theoretical distribution models by the simulated data. Additionally, the capability of the model to capture the cdf of the entire observed hourly water demand series and the series of the observed total daily volumes is graphically presented in Figures  9c,d, respectively. It is noted that the model adequately captures the empirical distribution of the whole series, although not directly modelled by it. A fact that can be attributed to the preservation of the cyclostationary behavior of the process, in terms of both marginal and dependence properties.  Furthermore, the performance of the stochastic model in terms of reproducing the target hour-to-hour correlation coefficients is graphically presented in Figure 8. As shown, the model, further to the marginal distributions, captures perfectly the great variety of lag-1 correlations of hourly water demand which is observed within a day.

Summary and Conclusions
Due to the key role of water demand processes in the uncertainty-aware planning and management of urban water systems, the pursuit of adequately representing and simulating such processes has been a challenge that occupied many researchers over the years and resulted in the development of numerous simulation schemes. Depending on the time scale of analysis (which is typically a fine one), water demand processes exhibit a variety of marginal and dependence (expressed via auto-or cross-correlations) characteristics (e.g., non-Gaussianity, intermittency, autodependence) that should be reproduced by the stochastic model. However, already existing stochastic models provide a partial solution to the problem since they focus either on the marginal behavior of a process or on its dependence structure, and do not consider simultaneously both aspects.
Aiming to provide a remedy, and eventually a satisfactory modelling and simulation strategy for water demand processes, this work employs for the first time in the WDS modelling domain, the Finally, for demonstration purposes, Figure 9a,b depict the time series of a sequence of 10 randomly selected consecutive days of observed and synthetic hourly demands, respectively. Additionally, the capability of the model to capture the cdf of the entire observed hourly water demand series and the series of the observed total daily volumes is graphically presented in Figure 9c,d, respectively. It is noted that the model adequately captures the empirical distribution of the whole series, although not directly modelled by it. A fact that can be attributed to the preservation of the cyclostationary behavior of the process, in terms of both marginal and dependence properties.

Summary and Conclusions
Due to the key role of water demand processes in the uncertainty-aware planning and management of urban water systems, the pursuit of adequately representing and simulating such processes has been a challenge that occupied many researchers over the years and resulted in the development of numerous simulation schemes. Depending on the time scale of analysis (which is

Summary and Conclusions
Due to the key role of water demand processes in the uncertainty-aware planning and management of urban water systems, the pursuit of adequately representing and simulating such processes has been a challenge that occupied many researchers over the years and resulted in the development of numerous simulation schemes. Depending on the time scale of analysis (which is typically a fine one), water demand processes exhibit a variety of marginal and dependence (expressed via auto-or cross-correlations) characteristics (e.g., non-Gaussianity, intermittency, auto-dependence) that should be reproduced by the stochastic model. However, already existing stochastic models provide a partial solution to the problem since they focus either on the marginal behavior of a process or on its dependence structure, and do not consider simultaneously both aspects.
Aiming to provide a remedy, and eventually a satisfactory modelling and simulation strategy for water demand processes, this work employs for the first time in the WDS modelling domain, the so-called Nataf-based stochastic models that entail the coupling of linear stochastic models with quantile mapping procedures. This type of models, recently employed to address challenging stochastic simulation problems for non-Gaussian hydrometeorological processes [51][52][53], build upon the notion of Nataf's joint distribution model [47] which provides a sound theoretical background, as well as the necessary flexibility to account for a wide range of processes with arbitrary marginal distributions and correlation structures.
The generality, as well as the flexibility provided by the Nataf-based modelling strategy (detailed in Section 2 and summarized in Section 2.8) is stress-tested through three distinct simulation studies that are of particular interest for urban water systems (Section 3). These concern the stochastic simulation of water demand processes at three characteristic time scales (i.e., 1-min, 15-min and 1-h), which arguably exhibit a variety of peculiarities, thus emphasizing different aspects of the method.
The main outcome from the simulation studies is that the employed modelling strategy is sufficient for the simulation of water demand process (stationary or cyclostationary) at any time scale (from 1 h down to 1 min), proving capable of reproducing simultaneously both the marginal and dependence properties of the processes without sacrificing neither precision nor parsimony. Particularly, it is highlighted that depending on the observed process characteristics, the method can be combined with a three-component mixed-distribution model (an additional contribution of this work) to provide an explicit representation of the entire target marginal distribution, accounting also for intermittency and tail behavior. Further to this, we move beyond the reproduction of empirical temporal correlations (typically involving few time lags) to a more complete and parsimonious description of the process's temporal dependence properties through the use of theoretical correlation structures.
We argue that this work provides an interesting alternative to the prominent pulse-based methods (that mainly provide a moments-based representation of a process) for simulating water demand processes, introducing, for the first time within water demand process simulation, classic and easy to implement linear stochastic models. In our view, this approach, exploiting the flexibility (in terms of admissible distributions and correlations structures) of Nataf's model, as well as knowledge derived from large scale studies (e.g., in the spirit of Kossieris and Makropoulos [39]) which allow for a better understanding of the probabilistic laws and correlations that describe water demand, can greatly contribute to the widespread use of alternative stochastically generated water demand realizations, for a more uncertainty-aware design and management of urban water systems. Potential domains of future research could focus on the simulation of spatio-temporal water demand processes at node or household levels, thus accounting also for spatial dynamics. Furthermore, the proposed methodology could be also extended and studied in a disaggregation framework to allow the capturing of probabilistic and stochastic behaviour of water demand across multiple time scales, as well as for short-term and long-term forecasting. Water 2019, 11, x FOR PEER REVIEW 25 of 33 Figure B1: Observed, theoretical and simulated empirical probability plots (Weibull plotting positions and in logarithmic scale) of the nonzero 1-min water demand for each hour of the day. Each plot displays the parameters of the best fitted distribution as well as the observed and simulated ̂ probability of no demand. Figure A1. Observed, theoretical and simulated empirical probability plots (Weibull plotting positions and in logarithmic scale) of the nonzero 1-min water demand for each hour of the day. Each plot displays the parameters of the best fitted distribution as well as the observed p ND and simulatedp ND probability of no demand.