Exploring the Statistical and Distributional Properties of Residential Water Demand at Fine Time Scales

: Residential water demand consists one of the most uncertain factors posing extra difﬁculties in the efﬁcient planning and management of urban water systems. Currently, high resolution data from smart meters provide the means for a better understanding and modelling of this variable at a household level and ﬁne temporal scales. Having this in mind, this paper examines the statistical and distributional properties of residential water demand at a 15-minute and hourly scale, which are the temporal scales of interest for the majority of urban water modeling applications. Towards this, we investigate large residential water demand records of different characteristics. The analysis indicates that the studied characteristics of the marginal distribution of water demand vary among households as well as on the basis of different time intervals. Both month-to-month and hour-to-hour analysis reveal that the mean value and the probability of no demand exhibit high variability while the changes in the shape characteristics of the marginal distributions of the nonzero values are signiﬁcantly less. The investigation of performance of 10 probabilistic models reveals that Gamma and Weibull distributions can be used to adequately describe the nonzero water demand records of different characteristics at both time scales.


Introduction
Residential water demand is characterized by high temporal and spatial variability consisting of the most influential source of uncertainty that poses extra difficulties in the efficient planning and management of urban water systems [1][2][3]. In the conventional modeling of such systems, this variability is typically neglected and the attention is mainly concentrated on the appropriate representation of other, also uncertain, parameters of the system such as the exact roughness, diameter, and length of the pipes [1,4,5]. In this case, a constant multiplier pattern that provides a coarse representation of the variation of demand within the day (typically with hourly time steps) is used to determine the water demand for every type of user in each part of the studied system (node). This deterministic approach is known as "top-down" allocation and has been widely applied in urban water applications [6] even though it fails to adequately reproduce the high variability of residential water demand [7][8][9][10].
Recently, the rising deployment of smart metering technologies, making available large amount of new water demand data at very fine temporal (down to 1 s) and spatial (household or even water appliance) scales, provides new perspectives towards a better analysis, understanding, and modeling of the water demand process (for a comprehensive review of smart urban systems see References [11] and references herein). This dataflow enables the deployment of stochastic simulation techniques, 1.
How the main statistical characteristics of the nonzero 15-minute and hourly water demand vary from household-to-household? 2.
Which statistical characteristics vary the most at different temporal levels (i.e., monthly, daily, hourly basis) and which can be regarded as essentially invariant?

3.
Which are the most suitable probabilistic models to describe the 15-minute and hourly water demand of each household at each temporal level? 4.
Whether there is a single model able to capture the process of all households, all hours and all months, both at a 15-minute and an hourly scale? 5.
What are the typical values of parameters for the studied models?
To provide answers to the above questions, we present a general framework to preliminary screen different distributions as well as to fit and evaluate the most suitable of them. This framework is easily applied and extendable to analyze other large water demand datasets such as that studied in Reference [33] to detect water demand patterns on the basis of behaviors and life habits of residences.
Although this analysis seems to have local character, the size and the nature of the records allows for general conclusions and insights regarding the appropriate probabilistic representation of the residential water demand at the two understudy time scales. Especially the evaluation of the performance of several distributions and the discrimination of the most suitable of them consist of valuable prior knowledge regarding which of them should be considered as the first choice in the modeling of residential water demand. Additionally, the derived set of model parameters can be regarded and used either as default values in cases where no recorded data is available for a household or as prior estimations if the sample size is small and, subsequently, the sufficient fitting of a model is highly uncertain. Lastly, the insights on the seasonal variation of the main statistical characteristics at different temporal levels directly inform and determine the structure (e.g., stationary or cyclo-stationary) of the simulation models that should be employed to capture the variability at these time scales.
The paper is structured as follows: Section 2 provides an overview of the "iWIDGET dataset" and describes the modeling framework and presents a coarse analysis of the main statistical characteristics of the 15-minute and hourly water demand records (answering 1). Section 3 examines the seasonal variation of the marginal distribution of water demand at different temporal levels (answering 2). Section 4 includes the preliminary screening of the candidate models as well as the final identification, fitting, and evaluation of the most suitable to describe the marginal distribution of the understudy variables (answering 3, 4, and 5). Lastly, Section 5 concludes the present analysis summarizing the key findings.

Preparation and Overview of the Dataset
The present study is based on a set of residential water consumption data collected within the framework of the iWIDGET project [32]. The monitoring program concerned the measurement of total water demand at 20 households with different socio-demographic characteristics and consumption profile in the city of Athens, Greece. The installation of smart meters was implemented in September-October 2014 and the first phase of the measuring campaign was completed at the end of October 2015, which is coincident with the completion of the iWIDGET project. As a post-project activity, 10 of the households were selected for further monitoring until October 2016.
The installed equipment logged the instantaneous 15-minute water demand, i.e., it records the total amount of water consumed within a 15-minute time interval. The collected data were transmitted and stored into a central database on a daily basis, which is also accessible by the householders via a web-application [34,35].
Considering the peculiarities of the conducted analysis, we focus on the water consumption records of 11 households that have appropriate quality standards, i.e., (a) total record length larger than 1 year, (b) missing values of 15-minute time series less than 20% for each hour of the day. Unfortunately, the remaining nine time series of the "iWIDGET dataset" do not meet the above criteria either due to a high number of missing values or other special conditions that distort the real consumption profile of the household (i.e., physical absence of the residents for a long time that had as a result no recorded consumption for extended periods). In order to ensure the quality of the selected data set, the 15-minute time series were initially examined for negative or unrealistically large values due to smart meter malfunctions. All negative values and all values greater than 300 L/15 min were flagged as "missing" and were excluded from the analysis. The time series at an hourly time scale were obtained by summing up the 15-minute data of each hour, flagging as "missing" and excluding from the analysis the hours with at least one missing 15-minute value.
Further to the above criteria, our target was the development of a concrete and representative dataset that is composed by households with different socio-demographic characteristics and, hence, consumption profiles. This enables the analysis of time series properties on the basis of the key water consumption determinants (factors) that have direct and strong influence on the per capita water consumption of a household. According to the literature (e.g., see indicatively [36,37]), these factors are: the type (i.e., detached house, apartment) and size of the property, the number of people living in the household (occupancy), the household makeup, and the existence of kids in the family as well as the existence of water activity related to garden watering. The main characteristics of the understudy households as well as the length of hourly records are given in Table 1. The dataset is mainly composed by small families with three or less people while there are also four large families including two of them with kids. The age of adult residents varies from 20 to 70 years old. All households belong to the middle-income to high-income category while the educational level of adult residents is high (i.e., high school or higher level studies). Additionally, all of them are owners of their property working full-time or part-time while retired persons are living in five households (with House ID #2, 3,7,10,11). Lastly, it should be mentioned that, in all households, the same typical water-related activities and appliances are experienced, i.e., washing machine, dishwasher, toilet (WC), bath, shower, and kitchen sink while seven of them use water for garden irrigation. No leakages or bursts were reported by the householders during the period of the collection of data while this was further verified by investigating the recorded series for extensive periods without zero values (indication for a persistent leakage) or suspicious high values for successive time intervals (indication for a pipe burst).
The present analysis was conducted on the basis of per capita consumption rather than the total consumption as it is recorded from the smart meters. Subsequently, the hourly and 15-minute time series, which are analyzed here, is derived by dividing the total water consumption records of the household by its corresponding occupancy. This simple transformation enables the direct comparison of consumption characteristics among different households without affecting the shape characteristics of marginal probability distributions and the dependence structure of the series (i.e., autocorrelation coefficients). Further to that, the analysis and parameterization of stochastic models for the simulation of residential water demand at fine time scales is typically conducted on the basis of per capita values rather on total volumes (e.g., [5,12,14,22]).  Yes  9167  3  Apartment  105  3  No  Yes  8903  4  Detached house  100  3  No  No  19186  5  Apartment  145  3  Yes  Yes  9167  6  Detached house  85  3  Yes  No  11024  7  Apartment  80  2  No  Yes  10727  8  Apartment  110  3  Yes  No  18239  9  Apartment  100  2  No  No  9167  10  Detached house  200  3  No  Yes  18239  11  Apartment  105  4 No Yes 18191

The Modeling Strategy
Residential water demand at fine temporal scales (e.g., at daily and mainly sub-daily scales) occurs as an intermittent process with the presence of zero values (indicating no-demand) in the observed series becoming more and more extensive as the temporal resolution of metering is getting lower. Such a process is described probabilistically via a mixed-type (also known as "discrete-continuous" or "zero-inflated") marginal distribution, composed by a probability mass concentrated at zero (i.e., discrete part), and a continuous part for the positive nonzero values. Given that p ND := P(x = 0) is the probability of no demand (probability zero), then the probability density function (pdf) and the cumulative distribution function (cdf) of x are given by the equations below.
where G x and g x are the cdf and pdf, respectively, of values greater than zero, i.e., G x := F x|x>0 = P(x|x >0). Subsequently, the formulation of F x (x) or f x (x) for an intermittent process such as water demand is based on two distinct parts: (a) the estimation of the discrete part, p ND , calculated directly from observed data as the ratio of the number of zero values over the total number of values and (b) the selection and fit of a suitable continuous distribution G x to describe the probabilistic nature of nonzero values (e.g., [26,29]). This modeling approach has been followed in statistical analysis and stochastic simulation [38] of hydrometeorological processes with intermittent nature (e.g., wind speed and rainfall intensities at fine time scales, discharge of intermittent flows) while a mixed-type model to simulate the total daily water demand of the group of households was formulated by Gargano et al. [39]. On the contrary, such an approach had never been adopted in the simulation of residential water demand at a single-household level since the already developed schemes either model the individual components of demand events (i.e., occurrence, duration, and intensity [5,12,13,[15][16][17][18]) or reproduce the main summary statistics of the process [14,[19][20][21][22] without posing any prerequisite to model and capture the whole marginal distribution of the process.
In the present study, taking into account the above along with recent advances in the stochastic simulation field [26][27][28][29][30], which alleviate the barriers [31] in the deployment of typical linear stochastic models under non-Gaussian assumptions, we treat and examine residential water demand as a random variable described by a mixed-type distribution. Therefore, our focus is on the study of the statistical properties and peculiarities of the positive magnitudes of the records (i.e., continuous part), but also on the discrete part of the process as it is expressed by probability of no demand.
The main statistics that are examined in the present analysis are: (a) the mean value, (b) the standard deviation, (c) the skewness, (d) the L-variation, (e) the L-skewness, and (f) the probability of no demand in a specific time interval. The latter statistic indicates whether there is a probability mass concentrated at zero and is calculated as the ratio of the number of zero values over the total number of values. The mean value is a typical measure of central tendency of the random variable and the standard deviation is used to quantify the amount of dispersion of the data values about their mean. Skewness is a typical measure of the asymmetry of the random variable about its mean. Statistics of higher order (i.e., kurtosis) were not included in the analysis since their estimation is highly uncertain.
During the last decades, L-moments [40] have gained great popularity in the analysis and statistical characterization of environmental data especially in cases of small sample sizes [41,42]. They are linear combinations of order statistics analogous to conventional product moments and they can be used to estimate quantities analogous to standard deviation, skewness, and kurtosis, which are termed the L-scale, L-skewness, and L-kurtosis, respectively. The L-mean is identical to the conventional mean. The sampling properties for L-moments statistics unlike product moments are nearly unbiased even in small samples and are near normally distributed [43]. In the present study, we use the L-variation, τ 2 = λ 2 /λ 1 , and L-skewness, τ 3 = λ 3 /λ 2 , defined as ratios of L-moments λ i . L-variation is analogous to the conventional coefficient of variation (defined as the ratio of standard deviation to the mean value) and expresses the variability of a sample. For positive random variables such as water demand L-variation takes values in the range [0, 1]. L-skewness is a dimensionless measure of asymmetry analogous to a skewness coefficient and can take positive or negative values in the range [−1, 1]. L-skewness equals to zero implies a symmetric distribution while the distribution is right or left skewed for positive and negative values, respectively. Both τ 2 and τ 3 can be regarded as measures of distributional shape since they depend only on the shape parameter of a distribution.

Statistical Characteristics of the Records
To obtain a general idea on the characteristics and peculiarities of the understudy variables, we initially examine the entire per capita 15-minute and 1-h records. Further to the analysis on the basis of the individual household, we also studied the properties of a unified series composed by merging the records of all 11 households. This series, indicated with "All data" in the next figures and tables, can be regarded as a "parent household" that comprises a large number of different realizations of the same random variable irrespective of the household and, hence, the determinants and consumption generation mechanisms.
The representative statistics (mean, standard deviation, skewness, L-variation, L-skewness, probability of no demand) of the nonzero 15-minute and hourly water demand records of each household are presented in Table 2 while a graphical representation is provided via the box plots of Figure 1. In all box plots presented in this paper, the top and bottom lines of the boxes represent the 25% and 75% empirical quantile points, respectively, which defines the empirical interquartile range (IQR) that contains 50% of the central values. The middle horizontal line of the box indicates the median of the sample. Typically, a box plot also displays outliers, i.e., data values out of the range of whiskers (in the standard boxplot graph, the upper and lower fences of the whiskers extend 1.5 × IQR away from the top and bottom lines of the box, respectively). However, in the box plots of the present analysis, in order to provide a more clear and indicative representation of the sample, we choose to not present outliers and define the upper and lower fences of the whiskers as the 5% and 95% empirical quantile points (or the 90% empirical confidence interval (ECI)). Additionally, the mean values (indicated by the blue points) are also displayed in the box plots. As Table 2 and Figure 1 reveal, mean nonzero water demand varies among households while the households composed of a family with kids (House #1, 5, 8) as well as those with water needs for garden irrigation (House #1, 3, 5, 11) have clearly greater values especially at an hourly scale. These households are also characterized by lower percentages of zero values, p ND , values, which implies, in general, not only more intense but also more frequent use of water. Generally, in the great majority of records, the probability of no demand is especially high and varies around 50% at a 15-minute scale and around 30% at an hourly scale. As Figure 1 shows, the households with a greater mean value (blue points in the graph) have also wider IQR as well as 90% ECI, which indicates a greater dispersion of the nonzero data. However, as Table 2 presents, all nonzero water demand records, irrespective of their mean value, are characterized by a considerable high variability, which is expressed by the large L-variation values that vary among households in a narrow range, i.e., from 0.61 to 0.78 and from 0.52 to 0.71 at a 15-minute scale and hourly, respectively. (a) (b) Figure 1. Box plots depicting the nonzero 15-min (a) and hourly (b) per capita water demand of the 11 households and the unified series (indicated as "All data"). The blue points represent the mean value of the samples where dot is used for a "family with kids," triangle is used for a "family of adults," a square is used for "adults," and an inverse triangle is used for the mean of the unified series.
A small variation among households is also noted in the median of the nonzero records with the estimated values being significantly smaller than the corresponding mean. Since it is evident in the above box plots, in most records, about 50% of the recorded 15-min and hourly nonzero values are concentrated in the narrow intervals (0, 1.5) and (0, 3.5), respectively, which indicates the prevalence of small water consumption uses in the sample.
As we can further note, in all box plots, the median is much closer to the bottom line of the box than the top line, which indicates the much higher concentration of data between the 25% and 50% quantile. Additionally, in all cases, the upper whisker is much longer than the lower one, which is not even distinguishable in some records. These indicate the high positive skewness of nonzero water demand at both understudy time scales. Furthermore, especially the box plots of the records with a higher mean value are characterized by long upper whiskers, which far exceeds the length of the box (IQR) and indicates a heavy-tailed distribution. The level of asymmetry of the nonzero water demand records, in terms of L-skewness, is given in Table 2. As we see, L-skewness varies irregularly among households from 0.40 to 0.67 and from 0.31 to 0.54 at 15-min and hourly scales, respectively.
As is already known, residential water demand exhibits significant seasonal variation at different temporal scales (i.e., daily, weekly, monthly), which is often attributed to various coexisting factors such as, among others, household makeup, the characteristics (frequency, duration, and volume) of individual uses, and climatological conditions [6]. This observation typically concerns the mean per capita water demand (including zero values) with the relevant studies examining how the latter varies from month-to-month and/or throughout the day (diurnal pattern of demand) against certain consumption habits.
However, whether and how the other characteristics (i.e., higher order moments) including the mean value of nonzero magnitudes of the marginal distribution vary have never been examined before. This is of high importance since such an analysis reveals which statistics vary the most and which can be regarded as invariant at different time scales, which determine directly the type of statistical and stochastic models that should be employed to model properly residential water demand.
Having said that, the aim of the present section is twofold. On the one hand, to provide insights Figure 1. Box plots depicting the nonzero 15-minute (a) and hourly (b) per capita water demand of the 11 households and the unified series (indicated as "All data"). The blue points represent the mean value of the samples where dot is used for a "family with kids", triangle is used for a "family of adults", a square is used for "adults", and an inverse triangle is used for the mean of the unified series.
A small variation among households is also noted in the median of the nonzero records with the estimated values being significantly smaller than the corresponding mean. Since it is evident in the above box plots, in most records, about 50% of the recorded 15-minute and hourly nonzero values are concentrated in the narrow intervals (0, 1.5) and (0, 3.5), respectively, which indicates the prevalence of small water consumption uses in the sample.
As we can further note, in all box plots, the median is much closer to the bottom line of the box than the top line, which indicates the much higher concentration of data between the 25% and 50% quantile. Additionally, in all cases, the upper whisker is much longer than the lower one, which is not even distinguishable in some records. These indicate the high positive skewness of nonzero water demand at both understudy time scales. Furthermore, especially the box plots of the records with a higher mean value are characterized by long upper whiskers, which far exceeds the length of the box (IQR) and indicates a heavy-tailed distribution. The level of asymmetry of the nonzero water demand records, in terms of L-skewness, is given in Table 2. As we see, L-skewness varies irregularly among households from 0.40 to 0.67 and from 0.31 to 0.54 at 15-minute and hourly scales, respectively.
As is already known, residential water demand exhibits significant seasonal variation at different temporal scales (i.e., daily, weekly, monthly), which is often attributed to various coexisting factors such as, among others, household makeup, the characteristics (frequency, duration, and volume) of individual uses, and climatological conditions [6]. This observation typically concerns the mean per capita water demand (including zero values) with the relevant studies examining how the latter varies from month-to-month and/or throughout the day (diurnal pattern of demand) against certain consumption habits.
However, whether and how the other characteristics (i.e., higher order moments) including the mean value of nonzero magnitudes of the marginal distribution vary have never been examined before. This is of high importance since such an analysis reveals which statistics vary the most and which can be regarded as invariant at different time scales, which determine directly the type of statistical and stochastic models that should be employed to model properly residential water demand.
Having said that, the aim of the present section is twofold. On the one hand, to provide insights on how the understudy statistical characteristics of 15-minute and hourly water demand records (i.e., the probability of no demand as well as the mean value, the L-variation, and the L-skewness of nonzero data) vary among households from month-to-month, hour-to-hour, and day-to-day. On the other hand, to support the identification and selection of suitable distributions for the modeling of understudy variables, Section 4 is presented. The present section directly answers the questions 1 and 2, as posed in Section 1.

Variation from Month-to-Month
To examine the variation of nonzero water demand from month-to-month for each household including the "parent household", we formed 12 individual records (one record per month). The main statistical characteristics of these records are displayed in Figures 2 and 3 for a 15-minute and hourly scale, respectively. Indicatively, the variation of the nonzero demand records of House #4 on a monthly basis is presented in the form of boxplots in Figure A1 in Appendix A.
As observed, the mean value of the nonzero water demand in most households at both time scales (Figures 2b and 3b) exhibits a small variation among months while a significant increase during the summer months is observed in two of the households (House #1, 3), which is attributed to the intense use of water for garden irrigation. Deviation from this pattern is noted in House #5 where the continuous supply of grass with small amounts of water has as a result the lower mean values, i.e., small values prevail in the sample and significantly lower proportion of zero values during the summer months.
As Figures 2a and 3a depict, the households which are characterized by a small monthly change in the mean value (House #4, 7, 10) exhibit, in general, small variation in the p ND while the slightly higher values observed in the summer months and especially in August is due to the absence of householders for vacations. This also stands for the variation pattern of p ND of the "parent household". Despite the fact that, in all cases, the major changes in p ND values are noted in the summer months, which is a remarkable differentiation in the patterns of this variation studied along the mean value, and is observed. There are cases where the pattern of variation of the two statistics are consistent (House #5, 8,11). The cases where the patterns show a contrasting behavior to each other (House #1) as well as cases where p ND varies without any variation is noted in the corresponding mean values (House #2, 3,6). This highlights the difficulty to associate the mechanism of variation of these statistics with specific water consumption habits even in households with similar characteristics and prevailing uses such as garden irrigation since this is, obviously, the result of different aspects of the water demand process, which are interrelated at these time scales.
Water 2017, 9, x FOR PEER REVIEW  8 of 34 other hand, to support the identification and selection of suitable distributions for the modeling of understudy variables, Section 4 is presented. The present section directly answers the questions 1 and 2, as posed in Section 1.

Variation from Month-to-Month
To examine the variation of nonzero water demand from month-to-month for each household including the "parent household," we formed 12 individual records (one record per month). The main statistical characteristics of these records are displayed in Figures 2 and 3 for a 15-min and hourly scale, respectively. Indicatively, the variation of the nonzero demand records of House #4 on a monthly basis is presented in the form of boxplots in Figure A1 in Appendix A.
As observed, the mean value of the nonzero water demand in most households at both time scales (Figures 2b and 3b) exhibits a small variation among months while a significant increase during the summer months is observed in two of the households (House #1, 3), which is attributed to the intense use of water for garden irrigation. Deviation from this pattern is noted in House #5 where the continuous supply of grass with small amounts of water has as a result the lower mean values, i.e., small values prevail in the sample and significantly lower proportion of zero values during the summer months.
As Figures 2a and 3a depict, the households which are characterized by a small monthly change in the mean value (House #4, 7, 10) exhibit, in general, small variation in the while the slightly higher values observed in the summer months and especially in August is due to the absence of householders for vacations. This also stands for the variation pattern of of the "parent household." Despite the fact that, in all cases, the major changes in values are noted in the summer months, which is a remarkable differentiation in the patterns of this variation studied along the mean value, and is observed. There are cases where the pattern of variation of the two statistics are consistent (House #5, 8,11). The cases where the patterns show a contrasting behavior to each other (House #1) as well as cases where varies without any variation is noted in the corresponding mean values (House #2, 3,6). This highlights the difficulty to associate the mechanism of variation of these statistics with specific water consumption habits even in households with similar characteristics and prevailing uses such as garden irrigation since this is, obviously, the result of different aspects of the water demand process, which are interrelated at these time scales. Regarding τ2 and τ3, as Figures 2c,d and 3c,d present, a small fluctuation is noted from monthto-month in most of the households (note that y-axis of these plots spans the range covered by the estimated values to provide a more clear picture). It is worth noting that the two households with the significant higher mean value in the summer months (House #1, 3) do not exhibit any significant changes in terms of the two shape statistics. By comparing the variation of the two statistics, it is interesting to note that, at both time scales, they follow an identical pattern in all households with peaks appearing in the same months.  Regarding τ 2 and τ 3 , as Figure 2c,d and Figure 3c,d present, a small fluctuation is noted from month-to-month in most of the households (note that y-axis of these plots spans the range covered by the estimated values to provide a more clear picture). It is worth noting that the two households with the significant higher mean value in the summer months (House #1, 3) do not exhibit any significant changes in terms of the two shape statistics. By comparing the variation of the two statistics, it is interesting to note that, at both time scales, they follow an identical pattern in all households with peaks appearing in the same months. Regarding τ2 and τ3, as Figures 2c,d and 3c,d present, a small fluctuation is noted from monthto-month in most of the households (note that y-axis of these plots spans the range covered by the estimated values to provide a more clear picture). It is worth noting that the two households with the significant higher mean value in the summer months (House #1, 3) do not exhibit any significant changes in terms of the two shape statistics. By comparing the variation of the two statistics, it is interesting to note that, at both time scales, they follow an identical pattern in all households with peaks appearing in the same months.  Summarizing the above findings, we can say that, at both time scales, the statistic with a higher monthly variation is p ND followed by the mean value. The changes in these statistics as well as in the two shape characteristics are observed mainly during the summer months.
Furthermore, we examined if there are any significant changes of the above discussed statistical characteristics from day-to-day since there is evidence in the literature that residential water consumption exhibits changes especially during week-end days [6,44]. Towards this, seven individual records (one record per day) were formed for each household as well as the "parent household". Figures A2 and A3, provided in Appendix A, show that the understudy statistics of all households remain almost invariant among days at both time scales. This is also illustrated in Figure A1 that presents the variation of the nonzero 15-minute and hourly water demand records, respectively, of House #4.

Variation from Hour-to-Hour (Diurnal Pattern)
As is widely known, the total residential water demand exhibits significant variation throughout the day as the result of the different consumption habits experienced during the day. Typically, the diurnal consumption pattern is composed by low flows during night hours, a sharp morning peak and a moderate flow during the day until an evening and/or early night peak [6,44]. However, this fluctuation concerns the mean water demand (including zero values) while the diurnal pattern of other statistical characteristics including the mean of nonzero values have never been examined.
The analysis of previous section showed that the understudy statistics especially the mean value and the p ND of some households exhibit a non-negligible variation especially during the summer months. This observation lead us to initially examine whether the diurnal pattern and the statistics also varies month-to-month. Towards this, for each month, we formed 24 individual records (one record per hour) by assuming that, within a specific hour, the main statistical characteristics of a household remain invariant (i.e., 12 × 24 records for each household). The individual record of each hour without monthly discrimination was also formed (i.e., 24 records for each household assuming no variation of the diurnal pattern among months) and the statistics of these records were estimated and plotted to reveal any changes of the diurnal pattern of the studied statistics from month-to-month and per household. An indicative example of such plots is provided in Figure A4 of Appendix A, which present the diurnal pattern of variation of the four studied statistics for the above 15-minute records of House #1.
The visual analysis of these plots shows that only the four households (House #1, 2, 3, 5) whose statistics exhibit a significant differentiation among winter and summer months (see Section 2.1) exhibit a differentiation in the diurnal pattern of the statistics between the low and high consumption periods. To take into account this differentiation, the hour-to-hour analysis for these households was conducted by assuming two different periods: a winter and a summer (i.e., with irrigation use) diurnal pattern of variation of the statistics in which the statistical characteristics of each specific hour correspond to the same process. The "summer period" for House #1 extends from April to November and for House #2 from June to August., from May to October for House #3 and from June to August for House #5. Regarding the remaining households (House #4, 6,7,8,9,10,11), the analysis will be conducted assuming that the marginal distribution of each specific hour remains invariant among the months since no significant deviation of the statistics of the individual month records from those without monthly separation is noted.
According to the above, for the seven households without monthly fluctuation, we formed 24 individual records (one record per hour) using the data of all months, assuming that within a specific hour the main statistical characteristics remain invariant. Regarding the remaining four households, the 24 individual records were formed separately for the winter and summer periods. The understudy statistics of these records at 15-minute and hourly scales are graphically presented in Figures 4 and 5, respectively, where the dotted lines in the plots correspond to the data of the summer months. Figures 4b and 5b present the diurnal pattern of variation of the mean value of the nonzero records at 15-minute and hourly records, respectively. It is interesting to note that, in almost all households, this fluctuation deviates from the typical diurnal pattern of the mean demand (i.e., series including zero values, which is described in the beginning of Section 4). This concerns the small difference in the mean values of nonzero data between daily and night hours and, mainly, the fact that, within these periods, this statistic does not exhibit significant shifts. On the contrary, p ND (see Figures 4a and 5a) is in closer agreement, although contrasting with the known diurnal consumption pattern and exhibits higher hour-to-hour variability with sharper shifts during the daily hours. The combined study of these two statistics reveals that, in most understudy households, the hour-to-hour variation of total demand is the result of the fluctuation of the frequency of water use, as expressed via p ND , rather of the intensity of use, as expressed via the mean of nonzero values.
The extremely large mean values observed during the night hours of the summer period for House #1 is attributed to the extensive grass irrigation while the extensive presence of large nonzero values in this sample has as a result low L-variation and L-skewness values with the latter becoming negative (i.e., left-skewed data).
Studying the diurnal pattern of the shape statistics, it is evident that the main differentiation is observed between the daily and night hours. As we see in Figures 4c and 5c, most values of L-variation are concentrated in a narrow interval, which range from 0.6 to 0.8 and from 0.5 to 0.7 for 15-minute and hourly records, respectively, while the variation from hour-to-hour is not very large especially during the daily hours (note that the y-axis of these plots spans the range covered by the estimated values to provide a more clear picture). Regarding L-skewness (see Figures 4d and 5d), most of the estimated values are gathered in a wider range that spans from 0.4 to 0.7 and from 0.2 to 0.6 for 15-minute and hourly records, respectively. values, which is described in the beginning of Section 4). This concerns the small difference in the mean values of nonzero data between daily and night hours and, mainly, the fact that, within these periods, this statistic does not exhibit significant shifts. On the contrary, (see Figures 4a and 5a) is in closer agreement, although contrasting with the known diurnal consumption pattern and exhibits higher hour-to-hour variability with sharper shifts during the daily hours. The combined study of these two statistics reveals that, in most understudy households, the hour-to-hour variation of total demand is the result of the fluctuation of the frequency of water use, as expressed via , rather of the intensity of use, as expressed via the mean of nonzero values.
The extremely large mean values observed during the night hours of the summer period for House #1 is attributed to the extensive grass irrigation while the extensive presence of large nonzero values in this sample has as a result low L-variation and L-skewness values with the latter becoming negative (i.e., left-skewed data).
Studying the diurnal pattern of the shape statistics, it is evident that the main differentiation is observed between the daily and night hours. As we see in Figures 4c and 5c, most values of Lvariation are concentrated in a narrow interval, which range from 0.6 to 0.8 and from 0.5 to 0.7 for 15mins and hourly records, respectively, while the variation from hour-to-hour is not very large especially during the daily hours (note that the y-axis of these plots spans the range covered by the estimated values to provide a more clear picture). Regarding L-skewness (see Figures 4d and 5d), most of the estimated values are gathered in a wider range that spans from 0.4 to 0.7 and from 0.2 to 0.6 for 15-min and hourly records, respectively.    Summarizing the above, we can say that, at both time scales, the statistic with higher hour-tohour variation is while the mean values do not exhibit any significant changes during the hours of the day. Significant changes in these statistics as well as in the two shape characteristics are mainly observed between the daily and night hours.

Probabilistic Models for Residential Water Demand
The analysis of previous sections revealed that the main statistical characteristics of the nonzero 15-min and hourly water demand vary among households while significant variation was also noted on hourly and, in some cases, on a monthly basis. Further to that, as discussed in Section 1, the already developed simulation schemes for water demand incorporate different distributions to model either the individual components of demand events (i.e., occurrence, duration, and intensity) or the total demand magnitudes [5,[12][13][14][15][16][17][18]22]. However, it is worth mentioning that these studies focus on the demonstration of the structure of the simulation model, typically fitted and validated on a short series from one household, rather on the provision of general insights regarding the suitability of specific distributions to model water demand and, hence, they have a local character in terms of statistical characterization of this process. Additionally, these models address the simulation problem at superfine time scales (i.e., 1-s up to 1-min), which are not the temporal resolution of interest of the present study.
The above may lead one to assume that different probabilistic models should be employed to capture adequately the statistical behavior of water demand process at different households and time intervals (i.e., for different hours of the day or even months). Having said that, it is of high importance to provide rigorous insights regarding which distributions, from the vast amount available in the literature, can be considered more suitable to describe the understudy variables based on a large Summarizing the above, we can say that, at both time scales, the statistic with higher hour-to-hour variation is p ND while the mean values do not exhibit any significant changes during the hours of the day. Significant changes in these statistics as well as in the two shape characteristics are mainly observed between the daily and night hours.

Probabilistic Models for Residential Water Demand
The analysis of previous sections revealed that the main statistical characteristics of the nonzero 15-minute and hourly water demand vary among households while significant variation was also noted on hourly and, in some cases, on a monthly basis. Further to that, as discussed in Section 1, the already developed simulation schemes for water demand incorporate different distributions to model either the individual components of demand events (i.e., occurrence, duration, and intensity) or the total demand magnitudes [5,[12][13][14][15][16][17][18]22]. However, it is worth mentioning that these studies focus on the demonstration of the structure of the simulation model, typically fitted and validated on a short series from one household, rather on the provision of general insights regarding the suitability of specific distributions to model water demand and, hence, they have a local character in terms of statistical characterization of this process. Additionally, these models address the simulation problem at super-fine time scales (i.e., 1-s up to 1-min), which are not the temporal resolution of interest of the present study.
The above may lead one to assume that different probabilistic models should be employed to capture adequately the statistical behavior of water demand process at different households and time intervals (i.e., for different hours of the day or even months). Having said that, it is of high importance to provide rigorous insights regarding which distributions, from the vast amount available in the literature, can be considered more suitable to describe the understudy variables based on a large dataset. This process, from a practical point-of-view, informs and facilitates future water demand modeling studies by restricting the search of candidate distributions to a small set and by highlighting which of them should be considered the first choice. Further to that, to provide a representative set of model parameters for the most suitable distributions, it is of high importance since these sets can be regarded and used either as default values in cases where no recorded data is available for a household or as prior estimations if the sample size is small and, subsequently, the sufficient fitting of a model is highly uncertain. The present section provides answers to the questions 3, 4, and 5, as posed in Section 1.

Preliminary Screening of Models
A widely used approach for the preliminary discrimination of most consistent distributions to describe the understudy variable is the so-called "moment-ratio diagrams". This type of analysis has been widely applied in the probabilistic study of hydrological variables especially in cases where a large number of records are examined since it provides a quick and direct evaluation of the suitability of several distributions from a statistical point-of-view by using just a single graphical instrument [38,41,42,45]. In the present work, we first transfer this framework in the analysis of large datasets of water demand records.
The moment-ratio diagrams provide a graphical comparison between the sample statistics, calculated from the records, and the theoretical ones as they are given by the formulas of the candidate parametric distributions. The proximity of the sample statistics to the theoretical locus (i.e., point, curve, or area) of a particular distribution provides an indication of the appropriateness of this distribution to describe the understudy records, which implies whether a specific set of a distribution parameter exists so as to reproduce the understudy moments. In such diagrams, a coefficient of variation vs. a coefficient of skewness or a coefficient of skewness vs. a coefficient of kurtosis are typically depicted while any pair of standardized moments could be also used. A comprehensive review of the conventional product moment-ratio diagrams summarizing 37 theoretical probabilistic models can be found in Vargo and Leemis [46].
During the last decades, the L-moments ratio diagrams, which were first introduced by Hosking [40], have gained popularity against conventional ones and have been widely used in many hydrological applications due to the better statistical properties of L-moments [42]. L-variation vs. L-skewness and L-skewness vs. L-kurtosis pairs are mostly used while the latter is more popular since L-variation is not well defined for distributions with a mean value equals zero or is negative. The theoretical locus of a distribution in L-ratio diagrams as well as in conventional moments diagrams can be a point, a curve, or an area depending on the number of parameters of the distribution. In L-skewness vs. L-kurtosis diagram, the type of the locus depends on the number of shape parameters, i.e., distributions without a shape parameter form a point (e.g., Normal distribution) with one shape parameter forming a curve (e.g., Gamma distribution) and with two shape parameters forming an area (e.g., Burr Type XII). On the contrary, the theoretical locus of distributions in an L-variation vs. an L-skewness diagram depends also on the location and scale parameters (if they are adjusted to take any value) and, subsequently, the number of distributions whose locus is a point and is lower. The present analysis was based on the L-variation vs. L-skewness diagram for positive random variables such as nonzero water demand is well defined and more robust than L-kurtosis [38].
The special characteristics of the 15-minute and an hourly nonzero demand process, as derived from the above analysis, determine the initial set of candidate probabilistic models that are going to be examined as more suitable to describe the understudy variables. First, the studied distributions should be defined in the real positive axis. Additionally, the high L-skewness and L-variation coefficients along with the great variability of those shape characteristics from household-to-household and hour-to-hour imply the use of distributions whose shape parameters support a range of values for those statistics. Further to that, we took into account the probabilistic models that are used by the simulation schemes to model water demand at super-fine time scales (e.g., the Exponential [1, [19][20][21][22], Weibull [5], Lognormal [15,17,47], and Normal [14]) as well as in the simulation of hydrological variables with similar characteristics.
Having said that, as candidate models to describe the understudy variables, we selected the Gamma (G) distribution, the Generalized Logistic (GL) bounded at zero from below, the Generalized Pareto (GP) bounded at zero from below, the Generalized Extreme Value (GEV) distribution bounded at zero from below, the Lognormal (LN) distribution and the Weibull (W) distribution. The Normal (N) and Exponential (EXP) distribution were also included in the analysis since they are very popular and widely used models. As we see in the L-variation vs. L-skewness diagrams of Figure 6, the locus of all understudy distributions is a curve except from that of the Exponential distribution, which is controlled via just one scale parameter and subsequently it has constant L-variation (τ 3 = 0.50) and L-skewness (τ 3 = 1/3). Normal distribution is symmetric about the mean, i.e., τ 3 = 0, and their theoretical locus is a horizontal line since L-variation can be adjusted to take any value by controlling the location and scale parameters. Having said that, as candidate models to describe the understudy variables, we selected the Gamma (G) distribution, the Generalized Logistic (GL) bounded at zero from below, the Generalized Pareto (GP) bounded at zero from below, the Generalized Extreme Value (GEV) distribution bounded at zero from below, the Lognormal (LN) distribution and the Weibull (W) distribution. The Normal (N) and Exponential (EXP) distribution were also included in the analysis since they are very popular and widely used models. As we see in the L-variation vs. L-skewness diagrams of Figure 6, the locus of all understudy distributions is a curve except from that of the Exponential distribution, which is controlled via just one scale parameter and subsequently it has constant L-variation (τ3 = 0.50) and Lskewness (τ3 = 1/3). Normal distribution is symmetric about the mean, i.e., τ3 = 0, and their theoretical locus is a horizontal line since L-variation can be adjusted to take any value by controlling the location and scale parameters. Following the analysis of the variation of the two shape characteristics of the nonzero water demand at different temporal levels of analysis (i.e., monthly, daily, hourly), the preliminary identification of most suitable probabilistic models as well as their fitting and further performance evaluation was conducted on the records used in the hour-to-hour analysis, i.e., 24 individual records for the seven households without significant monthly variation and a different set of individual records for the winter and summer period for the remaining four households (i.e., 24 × 7 + 24 × 2 × 4 = 360 records). Since L-variation and L-skewness do not exhibit significant variation from hour-tohour especially among successive hours, the following analysis could be conducted for forming larger individual records composed of 15-min and hourly data of more than one hour intervals (e.g., assuming a variation between night and daily hours). However, we chose to keep the seasonal discretization of the previous section in order to study a larger cluster of L-points.
Subsequently, the evaluation of the suitability of various distributions was not conducted on the basis of 11 series of households but on the basis of 360 individual records.
The observed L-points (τ2,τ3) of the understudy nonzero 15-min and hourly water demand records along with the theoretical locus of the examined distributions are presented in Figure 6. Visually, it is apparent that Gamma and Weibull distribution provide the best approximation to the great majority of observed L-points at both time scales while a smaller cluster of points is concentrated closer to the Lognormal distribution especially at an hourly scale. As we see, the average position of the sample 15-min L-points (0.63,0.50) lies upon the theoretical curve of Weibull distribution while, at the hourly scale, the average point (0.58,0.40) is among the curves of Gamma and Weibull distributions close to the point of their interception. It is worth noting that the sample L-points, which are out of the main body of the cluster (i.e., τ2 ≤ 0.50 and τ3 ≤ 0.30) and away to the curves of the above distributions, correspond to the records of night hours, i.e., 12:00 to 8:00 am, Following the analysis of the variation of the two shape characteristics of the nonzero water demand at different temporal levels of analysis (i.e., monthly, daily, hourly), the preliminary identification of most suitable probabilistic models as well as their fitting and further performance evaluation was conducted on the records used in the hour-to-hour analysis, i.e., 24 individual records for the seven households without significant monthly variation and a different set of individual records for the winter and summer period for the remaining four households (i.e., 24 × 7 + 24 × 2 × 4 = 360 records). Since L-variation and L-skewness do not exhibit significant variation from hour-to-hour especially among successive hours, the following analysis could be conducted for forming larger individual records composed of 15-minute and hourly data of more than one hour intervals (e.g., assuming a variation between night and daily hours). However, we chose to keep the seasonal discretization of the previous section in order to study a larger cluster of L-points.
Subsequently, the evaluation of the suitability of various distributions was not conducted on the basis of 11 series of households but on the basis of 360 individual records.
The observed L-points (τ 2 ,τ 3 ) of the understudy nonzero 15-minute and hourly water demand records along with the theoretical locus of the examined distributions are presented in Figure 6. Visually, it is apparent that Gamma and Weibull distribution provide the best approximation to the great majority of observed L-points at both time scales while a smaller cluster of points is concentrated closer to the Lognormal distribution especially at an hourly scale. As we see, the average position of the sample 15-minute L-points (0.63,0.50) lies upon the theoretical curve of Weibull distribution while, at the hourly scale, the average point (0.58,0.40) is among the curves of Gamma and Weibull distributions close to the point of their interception. It is worth noting that the sample L-points, which are out of the main body of the cluster (i.e., τ 2 ≤ 0.50 and τ 3 ≤ 0.30) and away to the curves of the above distributions, correspond to the records of night hours, i.e., 12:00 to 8:00 am, where the probability of zero demand is high and the sample size of nonzero records is smaller. Since the variability of statistics is closely associated with sample size, it is expected that these points would be closer to the main scatter of L-points and closer to the distributions if a greater number of records was available.
The preliminary selection of the three above discussed distributions (i.e., Gamma, Weibull, and Lognormal) as more suitable to describe nonzero demand is further supported by the complementary L-variation vs. L-skewness diagrams provided in the Appendix B. Figure A5 present the sample L-points (τ 2 ,τ 3 ) of the individual records of each hour assuming also monthly separation (i.e., 12 × 24 = 288 points for each household), at a 15-minute and hourly scale, respectively. As shown, the sample L-plots are mostly concentrated around the three above mentioned distributions while the average L-point lies upon the curve of Weibull distribution. Additionally, Figure A5 present the L-points (τ 2 ,τ 3 ) of the entire nonzero record of each household (12 points) without assuming seasonal discrimination at 15-minute and hourly time scales, respectively, while Figure A5 contain the L-points of the records of each hour of the day including the records of all households at 15-minute and hourly scale, respectively (24 points). In all cases, the sample L-points are very close to the curve of either Gamma or Weibull distribution.

Evaluation of the Models
The preliminary evaluation from the previous section showed that Gamma and Weibull distribution seem to be the most suitable probabilistic models to describe at least in terms of the first three L-moments. The great majority of the nonzero 15-minute and hourly water demand records of all households and all hours of the day. At the same time, it seems that a smaller percentage of records is better captured by the Lognormal distribution. The performance of the three distributions is further evaluated and compared after their actual fitting to the above described records. In this section, we present the evaluation of the three models on the basis of fitting error measures while the fitting procedure along with the analysis of the values of distribution parameters are presented in the next section.
The three probabilistic models, which are examined, have two parameters and more specifically one scale and one shape parameter with the latter controlling the asymptotic behavior of the distribution tail. The two-parameter Gamma distribution is one of the most commonly used probabilistic models in hydrological applications due to its special characteristics, i.e., it is defined only for non-negative values of the variable and is positively skewed. Its pdf is given by the equation below.
where β > 0 is the scale parameter and γ > 0 is the shape parameter. For γ < 1, the pdf is J-shaped with an infinite ordinate at x = 0 and its right tail decreases slightly faster than the exponential tail. For γ = 1 the distribution is identical with exponential whereas, for γ > 1, the pdf is bell-shaped and it has a tail that decreases slightly slower than the exponential one. For large values of shape parameters (γ > 15) Gamma distribution approaches the normal. The Weibull distribution is another popular probabilistic model and its pdf is given by the equation below.
where β > 0 is the scale parameter and γ > 0 is the shape parameter. Like Gamma, Weibull distribution is positively skewed and defined for non-negative values of the variable while the value of shape parameter controls the form of the pdf. For γ < 1, the pdf is J-shaped with an infinite ordinate at x = 0 while, for γ = 1, the pdf tends to 1/γ as x approaches 0 and then decreases exponentially. For γ > 1, the pdf tends to 0 at x = 0 from above, increases until the mode, and decreases after the mode. Regarding the asymptotic behavior of the tail, for γ < 1, the Weibull distribution has a heavier tail (i.e., tending to zero less rapidly) than the exponential one while, for γ > 1, the tail decreases faster than the exponential. Lastly, the pdf of Lognormal distribution is given by the equation below.
where β > 0 is the scale parameter and γ > 0 is the shape parameter. A random variable x is lognormally distributed if random variable y = ln(x) is normally distributed. Due to this transformation, Lognormal distribution is defined for non-negative values of the variable, is positively skewed, and has a bell-shaped pdf. It is worth noting that the Lognormal distribution is commonly parameterized by using the mean and standard deviation of the distribution on the log scale. In the present work, we use the shape and scale parameterization concept to facilitate the direct comparison with the other two understudy distributions. Regarding extremes, Lognormal distribution is considered a heavy-tailed distribution.
The three probabilistic models are characterized by different tail behaviors, which have been have been extensively assessed and compared in the reproduction of extremes of different hydrometeorological variables such as rainfall and runoff (e.g., see Reference [48] and the literature herein). Summarizing the above, further to the Lognormal distribution that is considered heavy-tailed distribution, Weibull can be both heavy-tailed and light-tailed depending on the shape parameter while Gamma is considered essentially as an exponential tail distribution. The ordering of these distributions from a heavier to a lighter tail is Lognormal, Weibull with γ < 1, Gamma and Weibull distribution with γ > 1.
It is worth noting that the three above described distributions as well as exponential distributions are special cases of the Generalized Gamma distribution [49]. This distribution is very flexible but more complicated. It is comprised of one scale and two shape parameters that govern the form of the pdf and the behavior of the left and right tail. Although this model can be clearly regarded as a "universal model" for the 15-minute and hourly nonzero water demand, we prefer in this scenario to focus our study on two parameter models for the sake of parsimony and simplicity. Furthermore, the estimation of three parameters instead of two is a more uncertain, sensitive, and complicated procedure, requiring very long records of data. The appropriate reproduction of extremes is also of high importance in the sufficient modeling of residential water demand since the correct estimation of peak flow demand plays a crucial role in the efficient design, planning, and management of urban water systems [6,50]. Subsequently, a naïve selection of the probabilistic model may result in the overestimation or underestimation of maximum water demand magnitudes. In this respect, further to the standardized mean square error (MSE) that was used as the main "goodness-of-fit" measure to evaluate the overall fitting performance of the above three distributions, two additional error measures that indicate the performance in terms of extreme values were employed. A similar set of measures were employed by Papalexiou and Koutsoyiannis [38] to evaluate the distributions' performance on daily precipitation.
The MSE considers the relative error between the empirical and the theoretical quantile values.
where x u is the value predicted by the theoretical quantile function of the distribution where u equals the empirical probability p i of x (i) (the i indicates the position of the ith element in the sample of length n, ranked in ascending order), according to the Weibull plotting position, i.e., p i = i/(n + 1). In this case, taking into account the observations on the form of empirical pdf of nonzero water demand records, we preferred to use a standardized measure instead of the classical MSE in order to avoid the dominance of large values in the estimation of the error. The fitted distribution with the smallest ER 1 value is considered as better.
Complementary to the MSE, the first measure, focused on the extremes, considers the relative error between the m largest sample values and the corresponding theoretical values.
In the present work, ER 2 was estimated on the 10 largest values for each of the understudy record and distribution while the better fitted distribution is that with the smallest ER 2 value. The second measure assesses the percentage difference between the maximum sample value and the predicted maximum value.
The negative or positive percentage difference in ER 3 implies, respectively, underestimation or overestimation of the maximum value by the fitted distribution. This error measure along with ER 2 provides a clear indication on which type of tail behavior of the three understudy probabilistic is in better agreement with the observed water demand extremes.
The three distributions were fitted on the individual records described in the previous section as well as on the entire record of each household (i.e., 11 series) and each hour of the day (i.e., data from all households for each hour of the day, 24 series) for 15-minute and hourly time scale. Since all three distributions that are studied here are positively skewed, it was not possible to obtain a set of parameters for the individual records with negative or zero L-skewness. As discussed in Section 3.2, this is the case for seven 15-minute records and 10-h records, which correspond mainly to night hours (02:00-07:00 am) during the summer period. These records were excluded from the below evaluation of fitting, which is flagged as "NA" (i.e., Not Available) in Tables A1 and A2 in Appendix B that present the best fitted distribution on the understudy 15-minute and hourly nonzero water demand records, respectively, according to ER 1 measures. Figure 7 presents, indicatively, the empirical probability plots per hour of the three fitted distributions to the nonzero 15-minute water demand values of House #4. All three distributions fit well the sample data while the different right tail behavior of each distribution is clearly illustrated with a Lognormal distribution having a heavier tail than the other two. Similar behavior was observed on the plots of the remaining households at both time scales.
The fitting performance of G, W, and LN distribution on the individual records are graphically illustrated in terms of box plots (Figure 8) while Table A3 in Appendix B summarizes the main statistical characteristics of the three error measures along with the percentage of records in which each distribution exhibits a better performance. Figure 8 and Table A3 show that G and W distributions perform much better than the LN. According to the ER 1 measure that assesses the overall fitting, the former two distributions achieve a very good fit to the great majority of individual 15-minute and hourly records (i.e., 85% of 15-minute records and 95% of hourly records) while the analysis of individual values revealed that they fitted equally well with the LN distribution in the records in which the latter is appears to be superior (i.e., mainly in the records of House #5 and #2). As we see in Figure 8a,b, the LN distribution is characterized by very high mean ER 1 values, i.e., 8.78 and 25.81 in 15-minute and hourly records, respectively, along with much wider 90% ECI. In Tables A1 and A2, it becomes evident that the three study distributions do not dominate in specific time intervals or households in terms of the ER 1 measure since G and W distributions are the best fitted models in the records of all households.   Focusing on the fitting measures regarding extremes, G and W distributions perform equally well with mean ER equal to 5.92 and 5.77, respectively, in 15-min records, and 7.85 and 7.93, respectively, in hourly records. On the contrary, the LN distribution has higher mean values such as 11.53 and 10.6 at 15-min and hourly scale, respectively, and wider 90% ECI. However, its performance seems to be improved in the hourly records. In terms of the ER measure that evaluates the difference between the predicted and observed maximum value, LN distribution, which is considered a heavy-tailed model, tends to overestimate the maximum on average 74.31 and 32.76 at 15-min and hourly scale, respectively, which have very wide 90% ECI, ranging from −29.73% to 214.49% and from −38.26% to 132.62, respectively. In contrast, both G and W distributions are able to produce maximum values that are in closer agreement with the observed one with mean ER equals Focusing on the fitting measures regarding extremes, G and W distributions perform equally well with mean ER 2 equal to 5.92 and 5.77, respectively, in 15-minute records, and 7.85 and 7.93, respectively, in hourly records. On the contrary, the LN distribution has higher mean values such as 11.53 and 10.6 at 15-minute and hourly scale, respectively, and wider 90% ECI. However, its performance seems to be improved in the hourly records. In terms of the ER 3 measure that evaluates the difference between the predicted and observed maximum value, LN distribution, which is considered a heavy-tailed model, tends to overestimate the maximum on average 74.31 and 32.76 at 15-minute and hourly scale, respectively, which have very wide 90% ECI, ranging from −29.73% to 214.49% and from −38.26% to 132.62, respectively. In contrast, both G and W distributions are able to produce maximum values that are in closer agreement with the observed one with mean ER 3 equals to −11.27 and 7.72, respectively, in 15-minute records and −10.65 and −4.45, respectively, in hourly records while the 90% ECI range of variation is much narrower than the corresponding LN distribution. Figure 8e shows that G distribution tends to underestimate the predicted maximum value of 15-minute records while the ER 3 values of W distribution have a uniform dispersion around the median that is very close to zero. Regarding hourly records, as Figure 8 presents, the two distributions show similar performance, underestimating the maximum. Taking into account the strictness of the last assessment criterion that examines values with exceedance probability very close to 0 along with the results of the ER 2 measure, the general performance of G and W distribution in terms of extremes can be characterized as exceptionally good at both time scales.
Further to individual records, the three distributions were also fitted and evaluated on the entire record of each household as well as on the data of the "parent household" (i.e., 12 series). Table A4 in the Appendix B reveals that G and W distributions outperform LN in terms of all error measures in all cases and achieve an equally good fitting to the observed data. Focusing especially on ER 2 and ER 3 measures, it is evident that the fitting of LN distribution to the sample tail is poor, overestimating the maximum value. Since the entire records of each household are much larger than the individual one allowing a more accurate estimation of parameters, this can be regarded as an additional important indication of the non-suitability of this heavy-tailed distribution to model the nonzero 15-minute and hourly demand values.
The above analysis provided insights in the two of the questions posed in the beginning of this sections, i.e., which are the most suitable models to simulate the understudy variables and whether there is one or a set of models that prevail the others. As shown G, W and LN distributions are able to describe the great majority of nonzero water demand records, while the more thorough evaluation of the three models revealed the superiority of the two former models compared to the LN distribution. At the same time, G and W distributions show an almost equally good performance, at least on the basis of three error measures.
However, it is of high importance to investigate whether G and W distributions are complementary in the sense that one model achieves a good fitting on the records in which the other model has poor performance and vice versa or the two models have similar performance in the same records. This directly answers the third question posed in the beginning of Section 4, i.e., whether we can use just one probabilistic model to reproduce the characteristics of nonzero water demand records of all households and all hours of the day, indicating if there is just one distribution that should be the first choice in the modeling of nonzero water demand at these time scales.
A direct comparison of the G and W distribution is provided in the scatter plots of Figure 9 that depict the error measures of the fitting of the two models on individual records. In the plots corresponding to ER 1 and ER 2 measures, a point above the diagonal black dashed line implies better performance for G distribution for this record while the points below the line corresponds to records in which W distribution achieved a better fit. In the plots depicting ER 3 values, the points with both coordinates greater than zero imply an overestimation of the maximum from the two models and can be interpreted as the plots of the other two error measures while the opposite stands for the points with both coordinates lower than zero.
As is apparent in all box plots, the performance of G and W distributions follows a similar pattern in the sense that they exhibit a poor fit in the same few records, which correspond to the nigh hours of summer and winter period of House #1 while an almost identical good fitting was achieved for the remaining records. Similar scatter plots presenting the points of the LN distribution show that the fitting error for the "poor-fitted" points is even higher. Furthermore, we can see that especially the ER 2 and ER 3 points at the hourly scale are dispersed closely around the diagonal line, which implies an almost identical fitting performance.
The above observations provide strong evidence that the G and W distributions cannot be regarded as complementary models but as essentially equally good models. However, if there was a necessity for the discrimination of just one model, then Weibull distribution should be the first choice to represent the nonzero water demand data both at 15-minute and hourly time scales due to its ability to better reproduce the observed extremes.

Fitting and Parameters of the Models
In the present section, we describe the fitting procedure of the G, W, and LN distributions to the nonzero records water demand records, providing the actual values of their scale and shape parameters. From a practical point-of-view, the derive sets of parameters consist of valuable prior information, e.g., for instance, using the mean values of the parameter as the initial estimation especially in cases where no recorded data is available for a household or the sample size is small and, subsequently, the sufficient fitting of a model is highly uncertain.
To fit the three models, the method of L-moments was followed by using the analytical formulas for the first three L-moments for each distribution. This method is analogous to the usual method of moments, according to which the parameter estimates are obtained by equating the first p sample Lmoments to the corresponding distribution moments. Then, by solving the system of equations, we

Fitting and Parameters of the Models
In the present section, we describe the fitting procedure of the G, W, and LN distributions to the nonzero records water demand records, providing the actual values of their scale and shape parameters. From a practical point-of-view, the derive sets of parameters consist of valuable prior information, e.g., for instance, using the mean values of the parameter as the initial estimation especially in cases where no recorded data is available for a household or the sample size is small and, subsequently, the sufficient fitting of a model is highly uncertain.
To fit the three models, the method of L-moments was followed by using the analytical formulas for the first three L-moments for each distribution. This method is analogous to the usual method of moments, according to which the parameter estimates are obtained by equating the first p sample L-moments to the corresponding distribution moments. Then, by solving the system of equations, we obtain the shape and scale parameters of the distribution. The first three L-moments for the three understudy distribution (e.g., [40]) as a function of scale parameter β and shape parameter γ are provided in Table A5 in Appendix B. The method of L-moments were implemented in the R programming environment [51] with the use of the "lmom" package [52]. Scale and shape parameters were obtained for the nonzero 15-minute and hourly time scales and the individual records described in the previous section as well as on the entire record of each household (i.e., 11 series) including each hour of the day (i.e., data from all households for each hour of the day, 24 series). As explained in the previous section, the records with negative or zero skewness on which it was not possible to fit the three distributions were excluded from the analysis. The scale and shape parameters for 15-minute and hourly individual records are presented in the form of box plots in Figure 10 while their summary statistics are given in Table A6 in Appendix B. With regard to G and W distributions, we also included in the present analysis the shape and scale parameters for the LN distribution. obtain the shape and scale parameters of the distribution. The first three L-moments for the three understudy distribution (e.g., [40]) as a function of scale parameter β and shape parameter γ are provided in Table A5 in Appendix B. The method of L-moments were implemented in the R programming environment [51] with the use of the "lmom" package [52]. Scale and shape parameters were obtained for the nonzero 15-min and hourly time scales and the individual records described in the previous section as well as on the entire record of each household (i.e., 11 series) including each hour of the day (i.e., data from all households for each hour of the day, 24 series). As explained in the previous section, the records with negative or zero skewness on which it was not possible to fit the three distributions were excluded from the analysis. The scale and shape parameters for 15-min and hourly individual records are presented in the form of box plots in Figure 10 while their summary statistics are given in Table A6 in Appendix B. With regard to G and W distributions, we also included in the present analysis the shape and scale parameters for the LN distribution. As evident in Figure 10, the shape parameters of the two former models and especially those of W distributions vary in a very narrow range at both time scales (i.e., consider that the theoretical range of this parameter is (0,∞)). The values of shape parameters for both W and G distribution is lower than 1 for the great majority of 15-min records, i.e., the 90% ECI is (0.44,0.96) and (0.21,0.93), which indicates J-shaped density functions. Regarding the hourly records, a greater number of records results in bell-shaped densities with γ > 1 since the 90% ECI of γ for W and G distribution is (0.51,1.32) and (0.28,1.63), respectively. As is also apparent in Table A7, the variability of the shape parameter of the fitted distributions to the entire records of each household is even lower with all values being significantly less than 1. As evident in Figure 10, the shape parameters of the two former models and especially those of W distributions vary in a very narrow range at both time scales (i.e., consider that the theoretical range of this parameter is (0,∞)). The values of shape parameters for both W and G distribution is lower than 1 for the great majority of 15-minute records, i.e., the 90% ECI is (0.44,0.96) and (0.21,0.93), which indicates J-shaped density functions. Regarding the hourly records, a greater number of records results in bell-shaped densities with γ > 1 since the 90% ECI of γ for W and G distribution is (0.51,1.32) and (0.28,1.63), respectively. As is also apparent in Table A7, the variability of the shape parameter of the fitted distributions to the entire records of each household is even lower with all values being significantly less than 1.
As expected, the scale parameters vary in a wider range since this type of parameter depends on the unit of measurement and the values of a variable. The effect of the scale parameter is to stretch out the pdf of the distribution. As we can see in general, the scale parameters of hourly records are greater than the corresponding 15-minute records while, for both time scales, the empirical 90% ECI of the β parameter of Weibull distribution, i.e., (0.34,5.31) and (0.81,16.99) for 15-minute and hourly records, respectively, is much narrower than that of Gamma distribution, i.e., (1.71,13.41) and (2.67,24.77) for 15-minute and hourly records, respectively.

Discussion and Conclusions
The present study examines the statistical characteristics and peculiarities of 15-minute and hourly residential water demand based on the "iWIDGET dataset" consisting of 11 long records (i.e., from one up to two years data) from single-households in Greece. The dataset is composed by representative households with different socio-demographic characteristics allowing the statistical characterization of water demand under different consumption patterns and profiles. Furthermore, in order to provide more rigorous and concrete insights on the distributional properties of water demand, the analysis was conducted on the basis of 360 individual records, which are derived after the study of seasonal variation of the main statistical characteristics of 15-minute and hourly water demand data.
In the analysis, water demand was treated and examined as an intermittent process with a mixed-type marginal distribution, which is composed by a discrete part and mathematically expressed by the probability of no demand and a continuous part that describes probabilistically the nonzero demand magnitudes. Following this modeling strategy, we examined, (a) which statistical characteristics of the marginal distribution of water demand (i.e., probability of no demand as well as the mean value, L-variation, and L-skewness of the nonzero values) vary the most at different temporal levels (i.e., monthly, daily, hourly basis) and which can be regarded as essentially invariant and (b) which probabilistic models are considered more suitable to describe the continuous part (i.e., nonzero values) of the distribution.
Both questions are of high importance within the context of probabilistic modeling and stochastic simulation of residential water demand. From a practical point-of-view, the first question provides insights regarding the appropriate assumptions on the structure (e.g., stationary or cyclo-stationary) of the simulation scheme that should be employed to model the variability at these time scales while the second one determines the type of models from the vast amount available in the literature that should be considered as the first choice in the probabilistic description of residential water demand.
A first coarse comparison of the above described statistical characteristics on the basis of households indicated that the statistics with higher variation include the mean value and the probability of no demand, following the typical patterns observed in the literature, i.e., households with kids or water needs for garden irrigation are characterized by higher mean nonzero water demand and lower probabilities of no demand. Regarding the shape characteristics, the records of all households are characterized by especially high variability (in terms of L-variation) and positive skewness (in terms of L-skewness) irrespective of the corresponding mean value. As the analysis revealed, these two statistics vary in a very narrow range despite the differences in the socio-demographic characteristics of the understudy households.
Furthermore, we studied the seasonal variation of the above characteristics for all the records of all households on a monthly, a daily, and an hourly basis. The analysis on a monthly basis showed that significant changes in the statistics are observed between the summer and winter months especially in the households with irrigation needs while, within the two periods, no significant variation is observed.
At both time scales, the statistic with a higher monthly variation is the probability of no demand, p ND , followed by the mean value while the changes of L-variation and L-skewness is much less.
The studied statistics of the demand records of all households remain almost invariant from day-to-day at both time scales, which support the assumption of stationarity at these temporal levels.
Lastly, we examined the diurnal pattern (hour-to-hour) of variation of the above properties. As on the monthly basis, p ND exhibits the highest fluctuation with sharper shifts during the daily hours. This observation along with the fact that, in most cases, the mean value of nonzero demand remains almost invariant during the day indicates that the widely known diurnal pattern of water demand (including zero values) is the result of the fluctuation of the frequency of water use, as expressed via p ND , rather than of the intensity of use, as expressed via the mean of nonzero values. On the contrary, the two shape characteristics shift between the daily and night hours.
Next, we examined the suitability of different probabilistic models to describe the nonzero water demand of all households at a 15-minute and an hourly scale. Towards this, we use the L-moments ratio diagrams to preliminary screen 10 distributions in terms of reproducing the shape characteristics of the understudy records. This analysis highlighted the superiority of Gamma and Weibull distributions in terms of the first three L-moments to model the great majority of the nonzero 15-minute and hourly water demand records of all households and all hours of the day. At the same time, a smaller percentage of records is better described by the Lognormal distribution.
The performance of the three distributions were further evaluated on the basis of three error measures. The comparison of models' performance shows that Weibull and Gamma distribution achieve a very good fit to the great majority of 15-minute and hourly records even in the cases where the Lognormal is appeared as superior. Focusing on the fitting measures regarding extremes, Gamma and Weibull distributions perform equally well while Lognormal distribution, which is considered a heavy-tailed model, tends to overestimate the observed extremes.
A more detailed comparison of the performances of the two distributions revealed that the Gamma and Weibull distributions cannot be regarded as complementary models but as equally good models. If there is a necessity for a clear indication of just one model, the better performance of Weibull distribution in terms of the maximum value consists of the latter model as more suitable to describe the 15-minute and hourly nonzero water demand records of all households and time intervals.
As a rule of thumb, Weibull distribution can be the first choice in the probabilistic modeling of water demand records at these time scales.
The actual fit of the above distributions to the understudy records provided ranges for the parameters of the models. In the great majority of 15-minute records, the shape parameter γ of Weibull and Gamma distribution is much lower than 1 indicating J-shaped density functions while, at an hourly scale, a greater number of records have bell-shaped densities with γ > 1. It is worth noting that the range of variation of Weibull parameters is much narrower than the corresponding of Gamma distribution, which adds an extra argument to favor the former model in the statistical modeling of water demand. The derived set of model parameters with a relative small variation of shape statistics among households and time intervals can be regarded and used either as prior estimations in cases where no recorded data is available for a household or the sample size is small and, subsequently, the sufficient fitting of a model is highly uncertain. In such cases, the mean parameter values can be used, i.e., for a 15-minute time scale, γ = 0.67 and γ = 0.5 for the Weibull and the Gamma distributions, respectively, and for the hourly time scale, γ = 0.79 and γ = 0.85 for the two models, respectively.  N o 245652). All analyses were implemented in the R programming environment [51] while all graphs were produced with the use of ggplot2 package [53]. Lastly, we would like to sincerely thank I. Tsoukalas for his valuable comments and suggestions, which substantially improved the present paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Appendix A presents complementary graphs associated with the analysis of seasonal variation of nonzero water demand, which is presented in Section 3.
Water 2017, 9, x FOR PEER REVIEW 25 of 34

Appendix A
This Appendix presents complementary graphs associated with the analysis of seasonal variation of nonzero water demand, which is presented in Section 3.

Appendix A
This Appendix presents complementary graphs associated with the analysis of seasonal variation of nonzero water demand, which is presented in Section 3.

Appendix B
This Appendix includes complementary material associated with the screening, evaluation, and fitting of probabilistic models to nonzero water demand presented in Section 4.

Appendix B
Appendix B includes complementary material associated with the screening, evaluation, and fitting of probabilistic models to nonzero water demand presented in Section 4.

Appendix B
This Appendix includes complementary material associated with the screening, evaluation, and fitting of probabilistic models to nonzero water demand presented in Section 4.   Table A5. L-moments of Gamma, Weibull, and Lognormal distribution.