1. Introduction
Regional water use patterns result from the combination of individual water user behaviors. Knowledge of water use behavior at the household level is required to understand and manage these regional patterns through a combination of supply-side and demand management strategies. Availability of widespread high temporal resolution water use data can help achieve urban water management sustainability goals and expand our knowledge about residential water use [1,2]. High temporal resolution data (i.e., observations recorded with a time interval <1 min) enable detection, characterization, and classification of water end uses. An end-use event represents a water using occurrence (e.g., a toilet flush). Most residential water meters in operation today are not capable of collecting this type of data. Additional dataloggers are commonly used to collect high temporal resolution data on top of magnetically driven meters [3,4,5]. These dataloggers count magnetic pulses (rotations of a magnet within the meter’s measuring element), with each pulse representing a fixed volume of water passing through the meter. High resolution data are typically recorded by aggregating the number of pulses that occur within each time step of a selected temporal resolution. The pulse data are then processed and analyzed to generate end-use information.
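The aggregation of pulses into fixed time steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code of any particular datalogger: the function name `aggregate_pulses`, the pulse timestamps, the 4 s interval, and the 0.03 L volume per pulse are all hypothetical values chosen for the example.

```python
from collections import Counter


def aggregate_pulses(pulse_times_s, interval_s, volume_per_pulse_l):
    """Return (interval_start_s, pulse_count, volume_l) for each time step."""
    if not pulse_times_s:
        return []
    # Assign every pulse to the time step in which it occurred.
    counts = Counter(int(t // interval_s) for t in pulse_times_s)
    last_step = max(counts)
    # Emit one record per time step, including steps with zero pulses.
    return [
        (step * interval_s, counts[step], counts[step] * volume_per_pulse_l)
        for step in range(last_step + 1)
    ]


# Hypothetical pulse timestamps (seconds since the start of logging).
pulse_times = [0.5, 1.2, 1.9, 2.6, 3.3, 9.1, 9.8]
records = aggregate_pulses(pulse_times, interval_s=4, volume_per_pulse_l=0.03)
for start_s, count, volume_l in records:
    print(f"t={start_s:>3} s  pulses={count}  volume={volume_l:.2f} L")
```

In this sketch, the timing of individual pulses within each 4 s step is discarded; only the count per step survives aggregation.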
Water use events are usually identified in recorded data as periods of non-zero flow, and several event features are calculated for use in classifying events into a corresponding end-use category (e.g., a toilet, shower, faucet). Average, mode, and maximum flow rate; duration; time of occurrence; volume; and the number of vertices within the shape of an event’s trace (vertices are defined at the change points where the flow rate transitions from one value to another) are the most commonly used features for event classification [6,7,8,9]. Most of these features are influenced by the temporal resolution at which data are recorded and by the volumetric pulse resolution of the meter (i.e., the volume of water that each pulse represents). The volumetric resolution of the pulses is constant across meters of the same size and brand, but its magnitude can vary significantly across different meter sizes and brands [3]. For example, the volumetric pulse resolution of a 5/8-inch (in) Neptune T-10 meter is approximately 0.03 liters (L) [3], whereas the pulse resolution for a 1 in Master Meter Bottom Load meter is approximately 0.16 L [3]. Datalogger devices used for high temporal resolution water use data collection have no control over this parameter (except for counting multiple rotations as a single pulse), which leads to inconsistency in collected data, even when a consistent temporal resolution is used.
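To illustrate why the volumetric pulse resolution matters even at a fixed temporal resolution, the sketch below converts a pulse count within one recording interval into a flow rate. The function name `flow_rate_l_per_min` and the 4 s interval are assumptions made for this example; the two pulse resolutions are the approximate values quoted above. With one pulse per interval as the smallest non-zero observation, the coarser meter yields a much larger flow rate increment.

```python
def flow_rate_l_per_min(pulse_count, volume_per_pulse_l, interval_s):
    """Flow rate implied by a pulse count recorded in one time interval."""
    return pulse_count * volume_per_pulse_l / interval_s * 60.0


INTERVAL_S = 4  # assumed recording interval for this example

# Approximate pulse resolutions quoted in the text (L per pulse).
meters = {"5/8 in meter": 0.03, "1 in meter": 0.16}

# Smallest non-zero flow rate each meter can register in one interval,
# which is also the step between adjacent representable flow rates.
flow_steps = {
    name: flow_rate_l_per_min(1, resolution, INTERVAL_S)
    for name, resolution in meters.items()
}
for name, step in flow_steps.items():
    print(f"{name}: {step:.2f} L/min per pulse per {INTERVAL_S} s interval")
```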
The temporal resolution of data collection has not been consistent across residential end-use studies, which have collected data at different temporal resolutions (aggregating all water use within a fixed time interval): 10 s [10,11,12,13], 5 s [14,15], and more recently 4 s [3,5,16,17]. Cominola et al. [2] assessed the impact of temporal resolution on end-use disaggregation and classification accuracy using a stochastic model and found that accuracy increases for data at higher temporal resolutions. However, the highest temporal resolution simulated was 10 s [2], as the model relied on a dataset collected at this resolution [10]. Despite the number of end-use studies reported in the literature, no recommended temporal resolution has emerged as a standard.
Accurately identifying simultaneous events (i.e., two different water use events occurring at the same time) and differentiating events that occur at similar flow rates are highly dependent on the temporal resolution of the data. Data temporal resolution also affects the accuracy of calculated event features. For example, the estimated duration of an event depends on data temporal resolution because the start and end of an event can occur at any moment within a data recording interval, leading to uncertainty at the beginning and end of an event, especially with longer recording intervals. The duration of an event, usually calculated as the number of recorded time intervals for which there is non-zero flow, has an impact on the average flow rate, which is often calculated by dividing an event’s volume by its duration. The accuracy with which event features can be estimated, in turn, impacts the methods that can be used for event classification and the accuracy of classification results.
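The dependence of duration and average flow rate on the recording interval can be shown with a small worked example. The sketch below assumes a single, isolated event; the function name `event_features`, the synthetic pulse train (one pulse every 0.5 s for 10 s), and the 0.03 L volume per pulse are hypothetical. Duration is computed as described above, as the number of recorded intervals with non-zero flow multiplied by the interval length.

```python
def event_features(pulse_times_s, interval_s, volume_per_pulse_l):
    """Duration (s), volume (L), and average flow (L/min) of one event
    as they would be estimated from time-aggregated data."""
    # Intervals that contain at least one pulse, i.e., non-zero flow.
    nonzero_steps = {int(t // interval_s) for t in pulse_times_s}
    duration_s = len(nonzero_steps) * interval_s
    volume_l = len(pulse_times_s) * volume_per_pulse_l
    avg_flow_l_per_min = volume_l / duration_s * 60.0
    return duration_s, volume_l, avg_flow_l_per_min


# Hypothetical event: 20 pulses, one every 0.5 s, starting at t = 3.0 s.
pulses = [3.0 + 0.5 * i for i in range(20)]

results = {
    interval_s: event_features(pulses, interval_s, volume_per_pulse_l=0.03)
    for interval_s in (1, 4, 10)
}
for interval_s, (duration_s, volume_l, avg_flow) in results.items():
    print(f"{interval_s:>2} s interval: duration={duration_s} s, "
          f"volume={volume_l:.2f} L, average flow={avg_flow:.2f} L/min")
```

In this example the volume is identical at every interval, but the estimated duration grows from 10 s to 16 s to 20 s as the interval lengthens, so the average flow rate falls accordingly.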
Water use events can be mechanical (those where the resident has no direct control over the flow rate, the duration, or both, i.e., toilet, clothes washer, dishwasher, and automated irrigation events) or user-regulated (those where the resident has control over the flow rate and/or duration, i.e., shower, faucet, bathtub, and manual hose irrigation events). Mechanical events are typically classified using their features, including duration, volume, flow rate, or cycle information [7,8,9]. However, different approaches have been used to classify user-regulated events. For example, after identifying and classifying mechanical events at a residence, Nguyen et al. [9] used a rules-based procedure to label all user-regulated events with a volume less than 15 L as faucet events. They then identified events using more than 15 L as either shower or irrigation events. In contrast, Attallah et al. [7] classified all types of events using a procedure that relies on training a classification model based on the features of a set of events manually labeled by a water user, indicating that it is possible to classify all types of events based on their features. However, the ability to accurately discriminate between events of different types based on their features clearly requires accurate estimates of event feature values. Furthermore, the temporal resolution of recorded data and subsequent processing of time-aggregated data using filtering techniques can remove distinct event features (resulting from flow rate fluctuations) that could otherwise facilitate the classification process.
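A volume-threshold rule of the kind described above for user-regulated events can be sketched in a few lines. This is only an illustration of the idea, not the procedure implemented by the cited authors: the function name `label_user_regulated` and the label strings are hypothetical, the treatment of an event of exactly 15 L is an assumption, and the further step that separates shower from irrigation events is not described here and is therefore left as a combined label.

```python
FAUCET_MAX_VOLUME_L = 15.0  # threshold quoted in the text


def label_user_regulated(volume_l):
    """Label a user-regulated event from its volume alone.

    Assumption: an event of exactly 15 L falls in the larger category;
    the text does not specify this boundary case.
    """
    if volume_l < FAUCET_MAX_VOLUME_L:
        return "faucet"
    return "shower_or_irrigation"


# Hypothetical event volumes in liters.
event_volumes = [0.8, 6.5, 42.0, 310.0]
labels = [label_user_regulated(v) for v in event_volumes]
print(list(zip(event_volumes, labels)))
```

A rule like this depends entirely on the accuracy of the volume estimate for each event, which is one reason the accuracy of event features matters for classification.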
There are currently no general methods for filtering raw pulse data, disaggregating overlapping events, and classifying events that have been tested and proven to work across the different temporal resolutions that have been used for data collection in past residential end-use studies. While a generalized approach would be very useful, it remains impractical given the data collection capabilities of current smart water meters and dataloggers (i.e., in many cases, data collection is constrained by available metering and/or data logging technology). Furthermore, a comprehensive characterization of how the temporal resolution at which data are recorded affects the values of event features, and hence our ability to classify events, has not been possible until now given the lack of data at a sufficient temporal resolution to enable this analysis.
In this study, we sought to evaluate how the temporal resolution of residential water use data affects our ability to identify end-use events, calculate features of individual events, and classify events by end use. While we tested some of the same data aggregation intervals tested by Cominola et al. [2], we also explored multiple data recording intervals with temporal resolutions higher than the finest resolution they used (10 s) to explore data resolutions used in more recent end-use studies [14,17]. We employed a datalogger device designed specifically to collect water use data on a residential water meter by recording all magnetic pulses generated by the meter as they happen, producing what we term “full pulse resolution data”. These data record water use at the highest possible temporal resolution (i.e., the full pulse resolution of the meter) and represent data not previously collected or analyzed. We then used these data to address the following research questions: (a) How does the temporal aggregation interval of recorded data affect the ability to identify, classify, and calculate attributes of individual events and the data volumes generated? (b) What unique features can be extracted for events derived from full pulse resolution data that can be used to identify and classify end-use events, including cases when simultaneous events occur? We analyzed full pulse resolution data collected using an innovative data collection method and then aggregated the data to simulate different temporal resolutions to generate insights into event features that answer these questions. This paper shows that collecting full pulse resolution data has several advantages over collecting temporally aggregated data, a key contribution to the field of water demand management and water end-use studies.
4. Conclusions
In this paper, we compared residential water use data at different temporal resolutions with full pulse resolution data collected using a specialized datalogger. To answer our first research question about how the temporal aggregation interval of recorded data affects the ability to identify, classify, and calculate attributes or features of individual events, we demonstrated that as data temporal resolution decreases, the number of detected end-use events decreases. We also showed how estimates of event features and the shape of overlapping events were affected by decreasing temporal resolution (e.g., as data temporal resolution decreases, estimated event duration increases and average flow rate decreases). Our results show that temporally aggregating pulse data reduces the ability to accurately estimate event features and generates oscillations in the data that require filtering techniques to remedy. However, those same filtering techniques can remove or mask important event features that could be used for event classification.
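One way such oscillations can arise is sketched below, using a synthetic example rather than data from this study. The function name `counts_per_interval`, the perfectly steady pulse train (one pulse every 1.5 s), and the 4 s aggregation interval are assumptions. Because the pulse spacing does not divide the interval evenly, the per-interval count alternates even though the underlying flow is constant.

```python
from collections import Counter


def counts_per_interval(pulse_times_s, interval_s):
    """Number of pulses falling in each consecutive time interval."""
    if not pulse_times_s:
        return []
    counts = Counter(int(t // interval_s) for t in pulse_times_s)
    return [counts[step] for step in range(max(counts) + 1)]


# Hypothetical steady flow: one pulse exactly every 1.5 s for 24 s.
steady_pulses = [1.5 * i for i in range(16)]
aggregated = counts_per_interval(steady_pulses, interval_s=4)
print(aggregated)  # -> [3, 3, 2, 3, 3, 2]
```

The spacing between pulses is constant in the input, so the variation in the aggregated counts is purely an artifact of the fixed time steps in this example.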
Regarding the volume of data generated, the final component of our first research question, pulse data captured a larger, and more accurate, number of events at each of the sites without negatively impacting the volume of data generated when compared to time aggregated data collected at temporal resolutions most suitable for end-use identification, disaggregation, and classification (i.e., up to 10 s resolution). Additionally, when overlapping events occur, time aggregation of the data can mask the features of such events, whereas pulse data provide a much cleaner trace that would better facilitate disaggregating overlapping events.
We observed that the values of features calculated for events changed as the temporal resolution decreased, which will negatively impact any classification algorithm or methodology that uses those features. Key event features, such as the mode flow rate, the average flow rate, and the duration, vary as the temporal resolution decreases, leading to more overlap in the distributions of these values and less power in using these features to discriminate event types (e.g., for classification). These variations in the number of identified events and their features have implications for the accuracy of any analyses based on event frequency or event features. For example, estimates of the technical performance of water using fixtures are affected by data temporal resolution and would best be made using pulse data.
Regarding our second research question, the pulse spacing values within events provide unique features that could be used to more accurately identify and classify end-use events. In our controlled experiment, events of different types exhibited unique behavior at the beginning and end of events, and the median pulse spacing for events of different types shows great promise as a discriminating feature for classification purposes. These results also argue for using meters with higher pulse resolution (i.e., smaller volume per pulse), which would provide greater detail in the trace of individual events and reduce the likelihood of “zero-pulse” events (i.e., events having volume smaller than the pulse resolution of the meter) that are registered as part of the subsequent event. While it may not be practical to replace existing meters for this reason, and in some cases may be impossible given requirements for safe and accurate meter operation at higher flow rates (e.g., those seen at homes with automated irrigation systems), the pulse resolution of the meter may be an important consideration when installing new meter networks or retrofitting existing ones.
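The pulse spacing feature discussed above can be computed directly from pulse timestamps. The following is a minimal sketch under stated assumptions, not the analysis code used in this study: the function names, the example timestamps, and the 0.03 L volume per pulse are hypothetical. The conversion from spacing to flow rate follows from each pulse representing a fixed volume.

```python
from statistics import median


def pulse_spacings(pulse_times_s):
    """Time elapsed between consecutive pulses within one event (s)."""
    return [later - earlier
            for earlier, later in zip(pulse_times_s, pulse_times_s[1:])]


def median_pulse_spacing(pulse_times_s):
    """Median spacing (s), or None if the event has fewer than two pulses."""
    spacings = pulse_spacings(pulse_times_s)
    return median(spacings) if spacings else None


# Hypothetical event: slower pulses at the start and end, steady in between.
event_pulses = [0.0, 1.0, 1.5, 2.0, 2.5, 3.0, 4.5]
spacing_s = median_pulse_spacing(event_pulses)

# Flow rate implied by the median spacing, assuming 0.03 L per pulse.
VOLUME_PER_PULSE_L = 0.03
implied_flow_l_per_min = VOLUME_PER_PULSE_L / spacing_s * 60.0
print(spacing_s, implied_flow_l_per_min)
```

The wider spacings at the start and end of this synthetic event stand in for the kind of start-up and shut-off behavior that is lost when pulses are aggregated into fixed time steps.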
While we evaluated data from only two single family residential properties, the data were similar for both newer, single-lever-type fixtures and older dual-knob-type fixtures, indicating that the uniqueness of event features from different water use fixtures we observed in the pulse data (e.g., flow rate, pulse spacing, event shape, and unique behavior at the beginning and ending of events) will exist across water using fixtures at any property. Thus, collecting pulse data could provide a generalized capability that not only supplies temporally aggregated data for existing operational purposes (e.g., regular billing) but also supplies more detailed information and discriminating event features for use in end-use studies. More discriminating features could, in turn, make end-use disaggregation and classification algorithms simpler and more computationally efficient. This could change smart metering technology by enabling more efficient computation of end-use information directly on the meter using edge computing techniques such as those demonstrated by Attallah et al. [5]. This would open the door for more real-time applications of the data, including customer feedback portals, in-home displays, and leak detection and alerting.
The benefits of pulse data are clearly illustrated here and warrant consideration in future data collection efforts. While differences in the volumetric pulse resolutions of different meter brands, models, and sizes will still exist, collecting full pulse resolution data would eliminate differences among data collected with different temporal resolutions, leading to greater standardization of data collection and analysis methods. Full pulse resolution data clearly identified a larger number of end-use events, contributed to more accurate and less ambiguous calculation of event features, reduced or eliminated the need for data filtering prior to calculating event features, more clearly captured the complexity of overlapping events, and provided new event features that are highly discriminatory among events of different types, all without increasing the volume of data that have to be recorded, transmitted, stored, and analyzed. These benefits bring opportunities for smart-metering manufacturers to adopt similar data collection strategies, which can lead to better information about water use, faster analytics, and more accurate user feedback.