Urban Water Consumption at Multiple Spatial and Temporal Scales. A Review of Existing Datasets

Over the last three decades, the increasing development of smart water meter trials and the rise of demand management has fostered the collection of water demand data at increasingly higher spatial and temporal resolutions. Counting these new datasets and more traditional aggregate water demand data, the literature is rich with heterogeneous urban water demand datasets. They are characterized by heterogeneous spatial scales (from urban districts to households or individual water fixtures) and temporal sampling frequencies (from seasonal/monthly up to sub-daily, i.e., minutes or seconds). Motivated by the need to track the existing datasets in this rapidly evolving field of investigation, this manuscript is the first comprehensive review effort of the state-of-the-art urban water demand datasets. This paper contributes a review of 92 water demand datasets and 120 related peer-reviewed publications compiled in the last 45 years. The reviewed datasets are classified and analyzed according to the following criteria: spatial scale, temporal scale, and dataset accessibility. This research effort builds an updated catalog of the existing water demand datasets to facilitate future research efforts and encourage the publication of open-access datasets in water demand modelling and management research.


Introduction
Population growth, urbanization, and climate change are expected to increase the stress on freshwater resources and the burden on urban water systems [1][2][3]. Adaptive planning and management strategies are thus needed to address seasonal or prolonged water scarcity in drought-prone areas and meet water demands with reduced operational expenditure, overall increasing the resilience of critical urban water network infrastructure systems [4].
In the last decades, demand-side management has increasingly emerged as a key approach to complement traditional water supply operations [5]. Different water demand management strategies (WDMS) have been proposed in the literature to foster water conservation and more efficient water demands [6,7]. These include technological, financial, legislative, maintenance, and educational interventions [8]. The rise of demand-side water management has motivated the development of more and more sophisticated technologies and mathematical models to monitor, characterize, and predict water demands at different spatial and temporal scales, and capture the existing relationships between water demand and its potential climatic and socio-demographic determinants [9][10][11].
At the coarser urban and suburban scales, the state-of-the-art literature is rich with studies focused on improving the efficiency of water distribution network (WDN) operations (e.g., [12][13][14]). In these studies, water demands are often considered as a stationary or seasonal input to the hydraulic model of the WDN, with a spatial level of aggregation referred to the city or the district scale. Such spatial scales are typically relevant for infrastructure planning, WDN design, and WDN partitioning. More recently, various techniques for water demand forecasting have also been proposed in the literature. They include regression analysis, time series analysis, and techniques based on black box models, including different Artificial Neural Network architectures (e.g., [15]). Demand prediction models have been developed at different spatial and temporal scales, with the majority of the studies focusing on urban and suburban scales, and temporal resolutions spanning from hourly to monthly intervals (e.g., [16][17][18]).
A disruptive phase in the development of water demand studies is represented by the advent of smart metering technologies [8,19]. The development of smart meters allowed gathering water demand data with an unprecedented level of spatiotemporal detail. Water demand data became potentially available at the spatial scale of individual households and data logging intervals of a few seconds [20]. While understanding the full range of potential benefits of smart meters for water utilities and customers is still a topic for active discussion [21], the variety of studies in the literature based upon smart meter data demonstrates the diversity of data-driven opportunities that high-resolution smart meter data opened up in the context of water demand modelling and management. These include, e.g., water demand profiling and customer segmentation [22], post-meter leak detection and water loss management [23], end use studies for fixture-level water demand breakdown and detailed demand forecasting [24], and behavioral studies [25].
The continuously increasing number of smart meter trials and demand modelling and management studies since the middle of the 1990s [8] suggests that several high-resolution water demand datasets have been recently compiled. The availability of high-resolution datasets opens up several opportunities for advanced applications, including the development of water end use disaggregation algorithms and machine learning techniques for user profiling. Such applications could benefit from open datasets to enhance comparative applications and benchmarking, and to facilitate the development of general algorithms trained on combined datasets with water consumption data from different sources and locations. High-resolution datasets, considered in combination with the more traditional water demand datasets gathered at coarser spatial and temporal resolutions, would represent a valuable resource for researchers and scientific efforts targeting the development and validation of mathematical models of water demand at different spatial and temporal scales, or the development of advanced smart metering analytics.
Yet, information and metadata on individual water demand datasets are scattered in the literature, and to the authors' knowledge, a comprehensive review of the existing datasets is still missing. Existing data are frequently difficult to access or use, and existing literature reviews on urban water consumption focus on demand modelling or other data-driven applications, rather than on analyzing the heterogeneity of existing datasets, their spatial and temporal scales, and accessibility. Motivated by the recent development and availability of datasets gathered with increasingly high spatial and temporal resolution, the aim of this paper is to gather information on the datasets to identify current trends and gaps and help future data-driven research, along with research benchmarking and reproducibility.
This review contributes the first effort of classification and analysis of 92 water demand datasets and 120 related peer-reviewed publications that have been compiled in the last 45 years to monitor urban water consumption data at different spatial and temporal scales and provide data for water demand modelling and management studies. We characterize the reviewed datasets according to their heterogeneous spatial and temporal scales, and investigate their accessibility. Moreover, since digital disruption transformed the electricity industry earlier and some lessons learned may also apply in the water or multi-utility sectors [26], we additionally explore similarities and differences between the reviewed sub-set of high-resolution water demand datasets and 57 comparable high-resolution electricity demand datasets.
We thus analyze the reviewed datasets and publications to address these five research questions (see Figure 1):
Q1. How are the existing urban water demand datasets distributed across different spatial scales?
Q2. How are the existing urban water demand datasets distributed across different temporal scales?
Q3. What are the main domains of application of the reviewed datasets, within water demand modelling and management studies?
Q4. What is the access policy for the reviewed datasets?
Q5. Is there any synergy with comparable datasets in the electricity sector?
The ultimate goal of this review is to compile an updated catalog of the existing water demand datasets and facilitate future research efforts in this rapidly evolving field of investigation. Researchers performing water demand studies could refer to this review to identify data readily available in formats, spatial scales, and temporal scales that suit their research needs. Finally, this review will also help identify water demand datasets that are accessible free of charge, in the attempt to promote further publication of open-access datasets to foster reproducible research, benchmarking, and the development/validation of existing software tools to generate reliable and realistic synthetic data [27][28][29].
The paper is structured as follows. The dataset review methods and the considered spatial and temporal scales are presented in Section 2; an overview of the dataset search outcomes is presented in Section 3; Sections 4-6 analyze the reviewed datasets in terms of (i) spatial scales, (ii) temporal scales, and (iii) accessibility; Section 7 analyzes similarities and synergies between some of the reviewed water demand datasets and comparable electricity demand datasets; finally, Section 8 draws some final remarks and directions for follow-up research.

Datasets Review Methods
To address the research questions formulated in Figure 1, we searched for water demand datasets collected at different spatial and temporal scales and referenced in the peer-reviewed scientific literature on water demand modelling and management. We searched on different web search engines and scientific databases, namely, Google Scholar (https://scholar.google.com/), Mendeley (https://mendeley.com/), Mendeley Data (https://data.mendeley.com/), and data.world (https://data.world/datasets/). We adopted the following 3-step procedure:

1. We searched for the following combinations of keywords on Google Scholar and Mendeley: Water demand/Water consumption/Household water demand/Residential end use water/Residential water consumption/Residential water demand/Water demand data/Water demand dataset/Water demand data set/Water demand forecasting/Water demand city/Water demand district/Water end-use/Water consumption patterns/Domestic water use/Urban water demand/Water use behavior/District water demand.
2. We searched for the following combinations of keywords on Mendeley Data and filtered the obtained results to include only two data types, i.e., "Dataset" and "Data Repositories": Water demand/Water consumption/Household water consumption/End use water consumption/Urban water consumption/Urban water demand/District water demand/Water supply demand.
3. We searched for open datasets in data.world, an online catalog for data and analysis. We restricted our search to datasets included in the data topic "water" and selected only datasets mentioned in peer-reviewed articles. More specifically, we searched for the following combinations of keywords: Water demand/Water consumption/Residential water consumption/Domestic water demand/Demand management.
In addition to the datasets retrieved with the above search, we included in this review other high-resolution datasets retrieved from two articles strongly focused on residential water demand, i.e., [30,31].
After compiling an inventory with the datasets and related publications retrieved with the above search methods, we reviewed, classified, and critically analyzed the inventory according to three main criteria: (i) spatial scale (Section 4), (ii) temporal scale (Section 5), and (iii) dataset accessibility/access policy (Section 6).

Spatial and Temporal Scales of Interest
Depending on the spatial scale of interest, different metering and monitoring tools for water consumption data gathering can be adopted. For instance, end use metering usually requires ad hoc, customized solutions [20,32], while household or district water consumption can be monitored with commercial flow meters [33]. Datasets collected at different spatial scales will thus represent different levels of aggregation of water demand and will possibly have implications on data privacy and ownership (e.g., water utilities vs individual water consumers). Numerous benefits can derive from high-resolution data, both for water utilities and water consumers [21]. Such data enable, for instance, accurate modelling of water demand patterns, peaks, and anomalies (e.g., leaks) [28]. However, large, high-resolution datasets also imply several potential drawbacks, e.g., privacy concerns and the need for cloud resources for data storage and new skills for data analytics [34]. We identified four scales of interest for urban water consumption monitoring and analysis, from the coarsest to the finest:
• City. We refer to a city as an urban centre with its own government and administration. The city scale can be composed of multiple districts and it includes the whole water distribution network.
• District. A district is a component of an urban center. The district spatial scale refers to a group of residential buildings in one or more municipalities. In many cases, districts coincide with the water network district meter areas (DMAs), i.e., sub-regions of a water network delimited by closing boundary valves. In the case of small cities or villages, the district and city scale can coincide.
• Household. The household scale implies a single dwelling, or a single-family residential building connected to an individual water meter. In this category we also include multi-family homes, when connected to one water meter. Depending on the type of household, its water consumption can be attributed to indoor usage only or both indoor and outdoor usage.
• End use. The end use scale refers to an individual water fixture within a single apartment/household. End uses can refer to indoor (e.g., shower, dishwasher, toilet, etc.) or outdoor uses (e.g., garden, swimming pool, etc.).
In this review, we take into account the spatial scale dependencies of the reviewed datasets and classify them according to the three suburban scales included in the city level: District, Household, and End Use. In the literature, the spatial scale of interest is related to the type of application that requires water demand datasets (WDDs). WDDs at the district scale, for instance, are mainly used to investigate water network partitioning [35,36], compute water balances [37], assess the hydraulic performance of the network system [38], and perform leakage identification and localization [39,40]. The level of aggregation of these WDDs depends on the network configuration and/or DMA design, and often refers to water demands at network nodes [41,42]. At the household scale, WDDs represent domestic water demands and are primarily used to build descriptive and predictive models of water demand, estimate demand peak timing and magnitude to inform water network operations, and inform conservation campaigns and demand management interventions [43,44]. Finally, at the end use scale, WDDs are used to improve our understanding of residential water consumption behaviors, develop disaggregation models to estimate the share of household water consumption of individual fixtures, develop customized water demand management strategies and billing reports, and overall increase customer engagement and help water utilities and customers promote efficient water usage [45,46]. In keeping with the different spatial and temporal scales considered in this study, this review includes both water consumption data retrieved with digital water meters and data measured with low resolution meters or retrieved from water bills [47][48][49]. Furthermore, when a dataset or publication considers multiple spatial scales, we classify it according to the finest level of data granularity.
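The rule that a multi-scale dataset is filed under its finest scale can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the tooling used for the review; the `SpatialScale` enum and the `finest_scale` helper are hypothetical names introduced here.

```python
from enum import IntEnum


class SpatialScale(IntEnum):
    """Spatial scales named in the review, ordered from coarsest to finest."""
    CITY = 0
    DISTRICT = 1
    HOUSEHOLD = 2
    END_USE = 3


def finest_scale(scales):
    """Return the finest scale among those covered by a dataset.

    Hypothetical helper mirroring the rule that a dataset spanning several
    spatial scales is classified by its finest level of granularity.
    """
    scales = list(scales)
    if not scales:
        raise ValueError("a dataset must cover at least one spatial scale")
    return max(scales)


# A dataset with both household and end use data is filed under end use.
label = finest_scale([SpatialScale.HOUSEHOLD, SpatialScale.END_USE])
print(label.name)
```

Encoding the scales as an ordered enumeration keeps the "finest wins" rule to a single comparison and makes the ordering explicit.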
Besides the spatial dimension, we also explore how datasets differ in terms of temporal scale (or time sampling frequency). Previous literature has shown that water demand data gathered at monthly or quarterly resolution is mainly used to inform strategic regional planning and to calculate water bills [11], while a number of additional applications, including post-meter leak detection and water end use disaggregation, can be enabled by sub-daily data (e.g., recorded with a time sampling frequency of 1 h or a few minutes/seconds) [28]. Here, we characterize the datasets collected at the district, household, and end use scales according to their time sampling resolution, with primary focus on daily and sub-daily frequencies. We consider datasets to have a low resolution when they include data with a daily or lower time sampling frequency (e.g., monthly). In turn, we consider as high resolution datasets those gathered with a sub-daily frequency (e.g., hourly, 1 min, 10 s).
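The low/high resolution definition above amounts to a single threshold at one day. A minimal sketch, assuming the sampling interval is available as a `datetime.timedelta` (the function name `temporal_resolution_class` is hypothetical):

```python
from datetime import timedelta

ONE_DAY = timedelta(days=1)


def temporal_resolution_class(sampling_interval):
    """Classify a dataset by the interval between consecutive readings.

    Follows the definition in the text: daily or coarser sampling is
    "low" resolution, sub-daily sampling is "high" resolution.
    """
    if sampling_interval <= timedelta(0):
        raise ValueError("sampling interval must be positive")
    return "high" if sampling_interval < ONE_DAY else "low"


print(temporal_resolution_class(timedelta(seconds=10)))  # high
print(temporal_resolution_class(timedelta(hours=1)))     # high
print(temporal_resolution_class(timedelta(days=1)))      # low
print(temporal_resolution_class(timedelta(days=30)))     # low (roughly monthly)
```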

Overview of Dataset Search Outcome
As an outcome of the dataset search explained above, we retrieved information on 92 unique datasets referenced in 120 scientific works, which in the last 45 years contributed to the literature on water demand modelling and management. The complete catalogue of the datasets and publications reviewed in this study is publicly available at [50]. We have also stored it in a public GitHub repository where pull requests can be submitted, so that our dataset collection can be collaboratively updated as more datasets become available (the repository is accessible at https://github.com/AnnaDiMauro/WDDreview).
A general overview of the reviewed datasets (Figure 2) suggests that, first, the majority of the reviewed datasets contain water consumption data at high spatial resolutions (i.e., end use and household). Second, the temporal distribution of the reviewed publications (Figure 3) is concentrated in recent years, with a major increase of household and end use studies after 2010. This is likely due to the increasing development of smart meter technologies during the period 2011-2015 [8], following the pioneering studies and prototypes that first appeared in the 1990s (the first end use study reviewed dates back to the 1991-1995 interval in Figure 3).
Finally, the worldwide geographical distribution of the reviewed publications (Figure 4) shows an uneven spatial distribution, with more than 50% of the reviewed studies located either in the USA or Europe: 28% USA, 25% EU, 17% Australia and New Zealand, 13% United Kingdom, 9% Asia, 6% Canada, 2% Africa.
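As a quick arithmetic check of the shares quoted above, the snippet below sums the regional percentages and the combined USA and EU share. The dictionary simply restates the figures from the text; the variable names are illustrative.

```python
# Shares of reviewed studies by region, in percent, as quoted in the text.
shares = {
    "USA": 28,
    "EU": 25,
    "Australia and New Zealand": 17,
    "United Kingdom": 13,
    "Asia": 9,
    "Canada": 6,
    "Africa": 2,
}

total = sum(shares.values())
usa_plus_eu = shares["USA"] + shares["EU"]

print(total)        # 100
print(usa_plus_eu)  # 53
```

The USA and EU shares alone already exceed 50%, before counting the United Kingdom, which is listed separately.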
A more detailed analysis of the distribution of the reviewed datasets across spatial and temporal scales, along with a critical analysis of their accessibility, is presented in the next sections.

Dataset Spatial Scales
To answer the first research question reported in Figure 1, we here investigate the distribution of the 92 reviewed datasets across different spatial resolutions, along with their implications for demand modeling and management.
As already reported in Figure 2, we identify only 20 datasets at the district scale. Water demand data collected at this scale relate to specific areas of a water distribution network. They are primarily used to monitor aggregate water demand patterns in the network, or to provide input information to simulation models of water distribution systems. Among these datasets, it is worth highlighting the presence of comprehensive, multi-network datasets, such as the WDSRD database for research applications [51]. This dataset includes data for over 40 different distribution networks, collected by the ASCE Task Committee on Research Databases for Water Distribution Systems for the water distribution system community to develop and test new algorithms for network design, analysis, and operations. A typical problem that requires this type of data is the optimal sensor placement in a partitioned water distribution network [52]. This problem, consisting of finding the optimal sensor location that minimizes the economic costs, while maximizing the amount of information required for network operations and diagnosis, still represents an open challenge for utilities and researchers [53,54]. The datasets classified in the district spatial scale are generally gathered by water utilities for ad hoc analysis on specific case studies within their controlled water network facilities. As the data ownership belongs to water utilities, such data is generally not released to the public, but only released to researchers under non-disclosure agreements. If demand data come from individual household-scale water meters, privacy-protection schemes, e.g., data anonymization, are usually required before data are actually shared.
The majority of the reviewed datasets was collected at the household (31 datasets) or end use (41 datasets) scale. Datasets at such high spatial resolutions have been emerging in the literature in the last 20-30 years, driven by the increasing scientific interest towards smart water metering technology. Smart meters can be defined as digital sensors able to measure, store, and transmit water use data at the household level and with a sub-daily temporal sampling resolution, down to a few seconds [28,55]. Mining smart meter information with advanced data analytics is enabling new opportunities also for developing automatic tools to estimate the water consumption of individual fixtures in a household [56,57], quantify the impact of individual and collective human behaviors on residential water consumption and water conservation [58], and acquire a better understanding of which socio-demographic determinants primarily drive residential water consumption in different geographical contexts [59,60]. Water data at the household/end use scale are of great interest for behavioral studies and provide key information for fostering water conservation, designing water tariffs, promoting more sustainable uses of the resource, characterizing water demand during peak hours, and improving demand forecasting and management capabilities [61]. These topics have been already extensively reviewed in the literature, and several comprehensive reviews analyzed the usage and benefits of smart metering for data collection and detailed water demand modelling and management [8,21,62,63].
We report a detailed summary of the metadata of the datasets identified at the district, household and end use scales in Tables 1 (district), 2 (household) and 3 (end use), sorted in chronological order. These metadata include the year when the dataset first appeared in the literature, its size (number of districts/households), time series length, time sampling resolution, access policy (classified in Open (O), Restricted (R), Not Available (NA)), and main goals and dataset applications in the related publications. When a dataset is found to be open access, we include the link to the repository where it is stored at the time of this review.
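To make the metadata layout concrete, one row of such a table could be represented as a small record type. The sketch below is an assumption about a convenient in-memory representation, not the schema of the published catalog; the class name, field names, and the two placeholder records are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    """One catalog row; field names are illustrative, not the catalog's schema."""
    reference: str                      # citation key of the related publication
    first_year: int                     # year the dataset first appeared in the literature
    spatial_scale: str                  # "district", "household", or "end use"
    size: Optional[int]                 # number of districts/households, if reported
    series_length: Optional[str]        # time series length as reported
    sampling_resolution: Optional[str]  # e.g. "monthly", "15 min", "10 s"
    access: str                         # "O" (Open), "R" (Restricted), or "NA"
    applications: str                   # main goals and dataset applications
    url: Optional[str] = None           # repository link, expected only for open datasets

    def __post_init__(self):
        if self.access not in {"O", "R", "NA"}:
            raise ValueError(f"unknown access policy: {self.access!r}")


def sort_chronologically(records):
    """Order records as in the tables: by year of first appearance."""
    return sorted(records, key=lambda r: r.first_year)


# Placeholder values for illustration only; they do not describe real datasets.
records = [
    DatasetRecord("[b]", 2015, "household", 50, None, "1 h", "R", "forecasting"),
    DatasetRecord("[a]", 1998, "district", 3, None, "monthly", "NA", "network design"),
]
print([r.reference for r in sort_chronologically(records)])
```

Validating the access code at construction time catches typos early when a catalog is updated collaboratively.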
Some common features and trends can be identified from the information reported in the three tables. First, there is an inverse correlation between the dataset size (or the time series length) and the time sampling resolution. Datasets comprising hundreds or thousands of homes (e.g., [48,49,64,65,66]) generally include data collected with a monthly or daily time sampling resolution, while datasets with a sub-daily time sampling resolution only include a few units or tens of homes (e.g., [67][68][69]). This may be attributed to the experimental extent of most high-resolution studies, their usually short-term duration, and the costs of deploying large-scale smart metering systems. Second, while datasets collected at the district scale have been primarily used for WDN optimization, WDN design, understanding the effects of socio-economic determinants on aggregate water demand, and leak detection, we identify four categories of state-of-the-art studies that have used, so far, datasets at the household scale listed in Table 2. These four categories, defined based on the scope of the listed studies, are: water demand forecasting, water demand pattern recognition, water conservation and customer awareness, and water end use disaggregation. The problem of water demand forecasting has been investigated for decades with different modelling techniques. Several recent applications exploit Artificial Neural Networks and other machine learning techniques to predict future water demands [44,66,70] and use this information to optimize water network operations or design water use efficiency programs [49,71,72,73]. Eight studies can be included in this category, among those listed in Table 2. A second category of studies (e.g., [31,74,75,76]) exploited household-scale water demand data combined with pattern recognition techniques to inform effective water allocation and reduce water demand to enhance urban water service infrastructure. Nine other studies from those in Table 2 can be included in this category. Third, 11 datasets among those in Table 2 were gathered as part of water conservation and customer awareness research efforts and projects, including [65,77,78,79]. These studies investigate the potential of smart meter technologies, often coupled with data analytics and digital platforms, for data communication to water consumers, to increase users' awareness of water consumption and sustainable water usage behaviors. Finally, 3 household-scale datasets were primarily used for water demand disaggregation to estimate water use at individual fixture levels with a non-intrusive approach, i.e., coupling the data from a single-point smart meter with a disaggregation algorithm and avoiding the installation of several intrusive sensors to directly monitor the water consumption of each end use [64,80,81].
Water end use disaggregation can be identified as the link between WDDs at the household and the end use level. Since intrusive smart meter installations at the end use level are costly and unlikely to be accepted by water consumers, and thus non-viable for large-scale deployments, non-intrusive techniques represent a valid alternative. Yet, non-intrusive end use disaggregation algorithms require ground truth data collected at the fixture level, at least for a limited time span, for algorithm training, validation, and performance assessment. For this reason, the majority of the reviewed WDDs classified in the end use spatial scale (see Table 3) has been used to develop and train different end use disaggregation algorithms, including machine learning-based algorithms (see, for instance, [67,68,82,83]). Unlike the WDDs at the household scale, end use datasets feature a short time series duration (a few days or weeks) and a high time sampling resolution, with data collected primarily with a sampling frequency of 5-10 s. These datasets, mainly collected in the last 10 years, usually include samples collected in two heterogeneous periods (e.g., summer and winter) to account for the seasonal variations of some end uses, e.g., outdoor water demand for irrigation. Whereas developing and testing end use disaggregation methods remains the main purpose of collecting water demand data at the end use level, some of the WDDs listed in Table 3 have also been used to evaluate water consumer behaviors and attitudes toward individual residential water uses (e.g., [84,85]), or test the effectiveness of water conservation strategies based on appliance retrofit and efficiency upgrades [86,87], customized tariffs [88], and awareness campaigns [89,90].

Dataset Temporal Scales
In this section, we address Q2 (see Figure 1) by analyzing the temporal scale of the 92 reviewed WDDs, i.e., we investigate which time sampling resolutions characterize the datasets spatially gathered at the district, household, and end use scales.
As defined in Section 2, water demand data can be recorded with a low resolution characterized by daily or monthly time sampling frequency, or with high resolution, when sub-daily measurements are recorded. The sampling represents a limiting factor for the type of analysis that can be performed [28,115]. Considering the 92 WDDs included in this review, the datasets gathered at the district scale mainly include data collected with a low temporal resolution. These data, recorded with a daily or, more often, monthly or coarser temporal resolution, consist of measurements obtained from billing reports or periodic meter observations. This is consistent with the main needs of the studies using such datasets for, e.g., the estimation of aggregate water demand for water network design, the resolution of optimal sensor placement problems, and the optimization of water network operations. Only some exceptions include data with a time sampling resolution of 15 min (e.g., [94,100,107]). In turn, the household and end use datasets include data gathered with higher time sampling resolution. The classification of these datasets based on their time sampling resolution (Figure 5) reveals that the majority of the end use-scale datasets contain data gathered with a sub-minute resolution, while most of the household-scale datasets contain data recorded with a time frequency of 15 min to 1 day. The distribution of the end use datasets in Figure 5 is an empirical validation of the findings of a previous study by Cominola et al. [28], which demonstrated that only data gathered with time sampling resolutions of a few seconds or, at most, 1 min, can be used to accurately estimate the contribution, peak, and time of use of individual water fixtures, especially when multiple end uses are active. Besides facilitating accurate end use disaggregation [67][68][69][156][157][158], such high resolution data also allow a detailed characterization of consumer behaviors [77,155,159,160], and the design of customized water demand strategies [88,123,142,161,162].
Conversely, the distribution of the household-scale datasets in Figure 5 confirms that data sampled with lower frequency suffice for water demand pattern analysis at the household level, i.e., with no detailed end-use analysis. Sub-daily resolutions still allow extracting water use patterns and recurring routines [28,66,76], identifying anomalies [163], and forecasting water demand [49,104].
Cross-correlating information on the time sampling resolution with the metadata previously described in Tables 2 and 3, a trade-off between the time sampling resolution and the size of a dataset emerges.

Data Accessibility
Open and free access to scientific datasets can provide valuable support to more reproducible and reusable research [164]. The availability of benchmark datasets accessible by different researchers worldwide would, for instance, help minimize redundant experiments, facilitate benchmarked numerical results on common datasets, and foster reproducibility and incremental research, which in turn drive innovation [165,166]. Yet, data accessibility presents significant challenges in many research fields, due to data ownership, sharing limitations, privacy concerns, technical data management, and security risks [167]. Furthermore, currently available data often lack a standardized format or organized database structure [167,168], or they might not be explicitly referenced in scientific publications, and thus, can be hard to track. Considering the literature on urban water demand modelling and management, WDDs are usually collected as part of large-scale scientific projects carried out by research groups or water utilities at the national and international level [77,86,99,169], or from spatially-constrained experimental settings deployed with the main purpose of creating open-access datasets to be shared for research activities [24,135,145,170].
Here, we aim to answer Q4 (see Figure 1) and distinguish three main categories of data accessibility to categorize the reviewed water consumption datasets, namely open, restricted, and not available:
• Open WDDs are those available in the literature and downloadable from the web free of charge (when available, the link to each dataset classified as open is reported in Tables 1-3).
• Restricted datasets are those WDDs that are available online either only for purchase, or by privately contacting authors/water utilities that own/have direct access to the data.
• Not available WDDs are those used and/or cited in the literature (primarily in papers published in the 1970s/80s/90s), but with no information on how to access them.
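The three categories above can be read as a simple decision rule. The following sketch paraphrases those definitions; the `AccessPolicy` enum, the `classify_access` function, and its boolean parameters are hypothetical names introduced for illustration, and the actual classification in the review was done by inspecting each dataset and publication.

```python
from enum import Enum


class AccessPolicy(Enum):
    OPEN = "O"
    RESTRICTED = "R"
    NOT_AVAILABLE = "NA"


def classify_access(free_download, purchasable=False, on_request=False):
    """Map how a dataset can be obtained to one of the three categories.

    Hypothetical decision rule paraphrasing the definitions in the text:
    - free download from the web            -> Open
    - purchase, or contact authors/utility  -> Restricted
    - no information on how to access       -> Not Available
    """
    if free_download:
        return AccessPolicy.OPEN
    if purchasable or on_request:
        return AccessPolicy.RESTRICTED
    return AccessPolicy.NOT_AVAILABLE


print(classify_access(free_download=True).value)                     # O
print(classify_access(free_download=False, on_request=True).value)   # R
print(classify_access(free_download=False).value)                    # NA
```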
For the datasets reviewed in this paper, a trade-off emerges between dataset creation and data availability. While there is an increasing amount of water demand data collected at different spatial and temporal scales and related publications (see Figure 3), we found that dataset accessibility is mostly restricted. The datasets we reviewed at the district scale are usually provided by water utilities for specific projects or case studies. As they are owned by water utilities and only released to scientists with non-disclosure agreements for the duration of the related project, their accessibility is usually restricted or not available. Conversely, the datasets reviewed at the household and end use scales include at least some open and many accessible, but restricted, datasets. Data anonymization, access restriction, or access control filters are usually implemented to protect water consumers' privacy [171]. While for many years synthetic household and end use data generation methods have been developed because of limited data availability (e.g., [27,172]), there is an increasing trend of open and restricted household/end use datasets, visible from the number of datasets and access type over time in Figures 6 and 7. The sample of datasets and studies suggests that digital technologies and experimental research are two factors that can foster data availability. Indeed, the majority of the datasets that we classified as Restricted or Open access have been collected as part of experimental smart meter trials. In such a context, data are often collected from a sample of volunteer households and are made available by design as part of the research; thus, they are not prevented from further usage by utility regulations or ownership rights. Figures 6 and 7 are discussed in detail in the following sections.

Household-Scale Datasets Accessibility
At the household scale (see Figure 6), there is a more than linear increase in dataset creation. While the few datasets gathered between 1975 and 1995 are not available, almost all those created between 1996 and the time of this review are accessible with restrictions. This may be motivated by the utilities' and researchers' need to protect sensitive customer data, even if they are usually anonymized, or by the interest in controlling access to a potentially high-value asset constituted by a limited resource (household/smart meter data, in this case). Only a few datasets gathered in the last 10 years are openly accessible to the scientific community and the public. We found that this limited set of data is usually composed of datasets delivered as outputs of specific research projects in the European area, e.g., the EU-funded SmartH2O project [77] and the studies in London and the Thames Valley [49,173].

End Use-Scale Dataset Accessibility
Consistently with the household-scale datasets, the majority of end use-scale datasets have restricted access. Yet, some open end use datasets have existed since the end of the 1990s. As reported in Figure 7, it also seems that the last 5 years have witnessed an increase in open-access datasets, compared to the total number of end use datasets. While datasets collected at the household scale are usually owned by utilities, end use datasets are usually collected by researchers as part of experimental research efforts and smart meter/end use studies. This is one of the reasons why more end use-scale datasets are open access, compared with household-scale datasets. According to the experience of the authors, even those datasets declared open are often not easy to access (e.g., download link is broken, website is not updated), but some encouraging preliminary publications, e.g., [24,170], suggest that further detailed high-resolution open datasets, collected in controlled environments and provided with ground truth end use labels, will soon be available for research.
All the 41 end use-scale datasets reviewed in this paper have been referenced in at least one peer-reviewed publication on water demand analysis or end use disaggregation. However, a detailed analysis of the usage frequency of the different end use datasets (see Figure 8) reveals that, after excluding those datasets with no identification name and used only for ad hoc individual case studies and trial applications ("no name" datasets in Figure 8), only two datasets were used in more than 5 publications, namely the SEQ and the GOLD COAST datasets. The SEQ dataset has dominated the scientific scene in recent years and contains the largest collection of sub-minute resolution data estimated for different water end uses. It is the output of a residential end-use study carried out in Australia, i.e., the South East Queensland Residential End Use Study (SEQREUS) [135]. The SEQREUS project aimed to quantify and characterise the main water end uses in a sample of 250 single homes. The SEQ dataset contains water demand with a resolution of 5 s obtained through the installation of smart meters at the household level. Moreover, end use water demand estimations were achieved using a mixed disaggregation method combining information on the smart metering equipment, household stock inventory surveys, and flow trace analysis [127,144]. Three separate water end use analyses occurred during the SEQREUS project. The first reading campaign was conducted in the winter (14-28 June 2010); the second one was carried out in the summer (1 December 2010-21 February 2011); the third one in winter 2011 (1-15 June). The SEQ dataset has been so far used in the scientific community to investigate pattern recognition of water usage [174], assess the impact of user awareness on water conservation [89], develop end use disaggregation algorithms [175], and develop demand side management programs [83].
Similarly, the GOLD COAST dataset includes data from the Gold Coast Watersaver End Use Project that was conducted in winter 2008 [84]. It includes data for 151 homes located in the Gold Coast, Australia. The project aimed to explore the degree of influence of household socioeconomic features on end uses. The GOLD COAST dataset contains water demand with a time sampling resolution of 10 seconds, obtained with high-resolution water meters and data loggers to enable the identification of heterogeneous water end uses.

Nexus Considerations: Outlook and Comparison with Datasets in the Electricity Sector
Motivated by the strong link between water and energy flows in the urban metabolism [176], as well as by the digital transformation of both the water and the energy industry, coordinated actions that account for the water-energy nexus are receiving increasing attention to achieve sustainable resource management [177,178] and foster the development of integrated multi-utility services driven by digital transformation [26]. An increasing number of research studies investigated water and electricity correlations to perform customer segmentation analysis and end use classification of residential water-electricity demand data [22,69,145,179]. Most of these studies and other research efforts on water end use disaggregation and water demand profiling were inspired by previous advances in the electricity sector. With a more advanced and consolidated development of smart metering and Internet of Things (IoT) technologies in the electricity sector, high-resolution household and end use electricity datasets became available earlier than similar datasets in the water sector. Indeed, smart meter developments in the water and electricity sectors have so far followed two different timelines and speeds of deployment. They also present some technological differences affecting data gathering. The dependence of smart water meters on their battery, for instance, limits their operating life and their data streaming frequency, while electricity meters are fed by a power source by design.
Yet, we recognize some similarities, e.g., also in the electricity sector the availability of end use datasets was pushed by research efforts on building, training, and testing different end use disaggregation algorithms [180,181]. Moreover, while traditional energy system modelling focuses on the national/international scale to assist utilities and authorities in managing the electricity grid, smart electricity metering at the building level is aimed at improving users' awareness and promoting sustainable behaviours and energy savings possibilities [182,183], similarly to water conservation and demand management in the water sector. Also, similarly to the water sector, the temporal scale for electricity demand data gathering is strictly related to the spatial scale. Daily or monthly electricity data are usually required for demand modelling at national scale, while sub-daily resolution is usually adopted for smart metering at building scale. At this fine scale, both water and electricity data are used to enhance the efficiency of consumer behaviors, improve demand forecasting, foster money/resources-saving opportunities, investigate different customer segments, and potentially design customized billing schemes [184,185].
Acknowledging that water and electricity demand modelling and management present both differences and synergies, here we address the research question Q5 listed in Figure 1. We cross-compare the accessibility of water and electricity datasets to assess differences and similarities in data availability, while we do not aim to compare tools for water/electricity modelling. Adopting similar research criteria to those explained in the dataset review methods (Section 2), we retrieved 57 electricity datasets gathered at the household or end use scale. Complete information on these datasets is reported in Supplementary Tables S1 and S2.
We then compared them with the water datasets discussed in the previous section on data accessibility. The outcome of this comparison is represented in Figure 9. The figure reveals that, first, there is a slight majority of electricity datasets gathered at the end use level. This is consistent with what emerges from the reviewed water datasets. Second, the bar plot in Figure 9 shows that most of the electricity end use datasets we retrieved are open. It is worth noting that this might have been facilitated by the availability of low-cost and easy-to-install devices, such as smart sockets and Wi-Fi smart plugs, which allow direct end use data gathering [186]. Moreover, the community of researchers working on electricity Non-Intrusive Load Monitoring (NILM) has been very active and open in recent years. The availability of many open end use datasets has been pushed by the need to benchmark the increasing number of NILM algorithms on common datasets [187][188][189], as well as by individual initiatives of some researchers making available data retrieved from their household, or an experimental site equipped with appliance-level sensors, e.g., [145]. Overall, we consider the research efforts in household and end use electricity data collection and analysis as precursors of the trend that has been developing in the water sector in recent years. We expect that further developments in the water sector will help fill the gap between available open electricity and water data at the household and end use scales. Similar research will also foster the portability of algorithms and data analytics originally developed for electricity applications to water or combined water-energy applications [190,191].

Discussion and Conclusions
In the last decades, demand-side water management emerged as a key strategy to pursue efficient water demands and complement supply-side interventions to enhance the overall resilience of urban water systems. The rise of demand-side water management, coupled with the development of digital water metering technologies, has fostered the collection of water demand data at increasingly higher spatial and temporal resolutions. The availability of water demand data at the spatial scale of individual households or end uses, and with a time sampling resolution of a few seconds or minutes, opened up unprecedented opportunities to improve our understanding of water consumer behaviors and modelling water demand. As a consequence of this transformative process, the literature is now rich with urban water demand datasets collected over time with different spatial and temporal resolutions, and archived with different levels of accessibility.
In this paper, we reviewed 92 water demand datasets and 120 related peer-reviewed publications compiled over the last 45 years. We analyzed the datasets and classified them according to their spatial scale, temporal scale, and level of accessibility. Moreover, we analyzed their domains of application within water demand modelling and management studies, and compared them with similar datasets in the electricity sector. As a result of this review and classification effort, we can summarize the following takeaways and address the research questions introduced in Figure 1.
Q1. How are the existing urban water demand datasets distributed across different spatial scales? We found that the majority of the reviewed datasets was collected at the household (31 datasets) or end use scale (41 datasets). Only 20 datasets were identified at the district scale. This is likely due to the increasing number of water demand studies that developed after the advent of digital water meters. Moreover, the datasets gathered at the district scale are usually owned by water utilities, which make them available to researchers usually only temporarily and for ad hoc case study analyses.
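As a quick consistency check on the counts reported for Q1, the following sketch recomputes the total and the share of datasets per spatial scale. The counts are those stated above; the variable names are chosen here for illustration.

```python
# Dataset counts per spatial scale, as reported in the answer to Q1.
counts = {"district": 20, "household": 31, "end use": 41}

total = sum(counts.values())  # should match the 92 reviewed datasets

# Percentage share of each spatial scale, rounded to one decimal place.
shares = {scale: round(100 * n / total, 1) for scale, n in counts.items()}

# Combined share of the two finest scales (household and end use).
fine_scale_share = round(100 * (counts["household"] + counts["end use"]) / total, 1)

for scale, share in shares.items():
    print(f"{scale}: {counts[scale]} datasets ({share}%)")
print(f"household + end use: {fine_scale_share}% of {total} datasets")
```

The three counts add up to the 92 datasets in the catalog, with the household and end use scales together accounting for roughly four in five of them.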
Q2. How are the existing urban water demand datasets distributed across different temporal scales? Focusing on the finest spatial scales analyzed, i.e., the household and end use scales, we found that most of the analyzed datasets contain data sampled with a time frequency in the range of 1 s to 1 day. Yet, differences exist: most of the end use-scale datasets contain data gathered with a sub-minute resolution, while household-scale data are characterized by time sampling resolutions of 15 min to 1 day. This is primarily due to the high temporal resolution required by residential water end use disaggregation models.
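To illustrate how the temporal scales discussed for Q2 relate to each other, the sketch below aggregates a synthetic 5 s volume series into 15 min totals, i.e., from a typical end use-scale resolution to a typical household-scale resolution. The data are made up for illustration, and the function name `aggregate_volumes` and the (timestamp, volume) record layout are assumptions of this sketch, not a format used by any of the reviewed datasets.

```python
from collections import defaultdict


def aggregate_volumes(readings, bin_seconds):
    """Sum volumes into fixed-width time bins.

    readings: iterable of (timestamp_in_seconds, volume_in_litres) pairs.
    bin_seconds: width of the target time step, in seconds.
    Returns a dict mapping each bin start time (s) to the total volume (L).
    """
    totals = defaultdict(float)
    for timestamp, volume in readings:
        bin_start = (timestamp // bin_seconds) * bin_seconds
        totals[bin_start] += volume
    return dict(sorted(totals.items()))


# Synthetic example: 30 minutes of readings every 5 s, 0.1 L per reading.
readings = [(i * 5, 0.1) for i in range(360)]

quarter_hourly = aggregate_volumes(readings, 15 * 60)
for start, litres in quarter_hourly.items():
    print(f"bin starting at {start:>4d} s: {litres:.1f} L")
```

Aggregation of this kind is one-directional: coarse totals can always be derived from fine-resolution data, whereas the sub-minute detail needed for end use disaggregation cannot be recovered from 15 min or daily readings.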
Q3. What are the main domains of application of the reviewed studies, within water demand modelling and management studies? Our review reveals that the datasets reviewed at district level are mainly used to estimate aggregate demand patterns used in water distribution network models to investigate water network partitioning, hydraulic performance, network anomalies, and leakage detection. Household-scale datasets have been primarily used to develop data-driven models for water demand forecasting, as well as for explorative analysis to identify water demand determinants. Consistently with our findings for Q2, end use datasets are primarily gathered to develop, train, and validate end use disaggregation algorithms. Both household and end use datasets have also been used to inform water conservation/demand management programs and monitor their effectiveness in changing water demand patterns.
Q4. What is the access policy for the reviewed datasets? Most of the reviewed datasets are not open access. Usually, they have restricted access, i.e., they are available for purchase, or can only be obtained by contacting the researchers or water utilities that compiled and own the dataset. However, some household- and end use-scale datasets became openly available, primarily in the last 5 years. This is an encouraging signal for future data sharing and research reproducibility.
Q5. Is there any synergy with comparable datasets in the electricity sector? Similarities exist in the spatial and temporal scales of interest for both the water and the electricity sector, and the number of reviewed datasets is comparable. Yet, the datasets in these two domains are still very different with regard to their accessibility. Open access datasets are more easily available in the electricity sector, primarily because of the extensive research efforts developed in the last three decades on the problem of electricity end-use disaggregation.
Overall, this paper can provide researchers in the water demand modelling and management sector with useful information to identify data readily available in formats and spatial and temporal scales that suit their research needs. We also identify a roadmap of priorities to enable a complete disclosure of the information value of urban water demand datasets. First, the scientific community would benefit from increased accessibility to open data. We acknowledge that water demand data are sensitive and that anonymization and privacy-protection measures need to be undertaken before they can be made openly available. Sharing high-resolution data, consumer data, and sensitive digital data implies potential risks for the privacy and security of private or personal information. Sensitive datasets could potentially be used by third parties for profit and intimidation, or to intrusively track private activities [168]. In response to privacy and security concerns, data protection regulations such as the General Data Protection Regulation (GDPR) implemented by the EU in 2018 and other policies initiated after it in other countries worldwide should be established at the regulatory level [192]. When guaranteed in compliance with privacy protection and data security frameworks, an increasing availability of open access datasets would guarantee better reproducible research, create opportunities for research benchmarking, and foster more transparent and possibly collaborative development and validation of analytic tools.
Second, this review is focused solely on water demand datasets, with primary focus on the household and the end use scales, and only a general overview of possible applications at different temporal and spatial resolutions is provided. Future work could look at systematically reviewing the different goals of existing urban water demand studies at different suburban and urban scales, including those focused on outdoor water use [193], urban landscape water conservation [194], economics and price influences [91], socioeconomic factors and drivers of water demand [195], and metropolitan water planning [196]. Especially these last categories of studies and applications entail cross-domain analyses that combine water consumption data with data from other sources (e.g., socio-economics, climate, behavioral data). Besides requiring proper analytic tools for data analysis, proper data management and sharing frameworks and protocols should be designed to facilitate data fusion among private/public water utilities and the other stakeholders involved in these inter-sectoral studies.
Third, the reviewed datasets are unevenly spread worldwide (some geographical hot spots in USA, Europe, and Australia were identified) and come with different spatial and temporal resolutions. Research efforts aimed at quantitatively comparing water demand data (water consumption volumes, peaks, patterns) gathered across different scales and geographical contexts would advance the generalization of water demand models and contribute to upscaling the findings from currently localized water demand studies. In addition, important aspects related to the use of water consumption data from different meters include data standardization and meter accuracy. Data from various sources need a standardized format to facilitate and improve the use of WDDs and increase data portability, interoperability, and overall data quality [197,198]. Moreover, future research could focus on assessing and comparing datasets in the catalogue we have built in this work in terms of measurement precision and accuracy.
Finally, we expect that the current challenges posed to the resilience of interconnected critical infrastructure will foster efforts aimed at overcoming data silos and encourage the development and transfer of multi-sectoral analytic tools to inform resilience planning across sectors (e.g., smart electricity grids, green infrastructure), and scales [26].
Supplementary Materials: The complete catalog with the 92 state-of-the-art water demand datasets and 120 publications reviewed in this paper is available on Zenodo (https://doi.org/10.5281/zenodo.4390460 [50]) and in this public GitHub repository: https://github.com/AnnaDiMauro/WDDreview. The complete list and metadata of the additional 57 electricity datasets at the end use and household scales that we reviewed in this paper are reported in Supplementary Tables S1 (end use scale) and S2 (household scale). The following are available online at https://www.mdpi.com/2073-4441/13/1/36/s1.

Conflicts of Interest:
The authors declare no conflict of interest.