Water End Use Disaggregation Based on Soft Computing Techniques

Abstract: Disaggregating residential water end use events through the available commercial tools needs a great investment in time to manually process smart metering data. Therefore, it is extremely difficult to achieve a homogenous and sufficiently large corpus of classified single-use events capable of accurately describe residential water consumption. The main goal of the present paper is to develop an automatic tool that facilitates the disaggregation of the individual water consumptions events from the raw flow trace. The proposed disaggregation methodology is conducted through two actions that are iteratively performed: first, the use of an advanced two-step filter, whose calibration is automatically conducted by the Elitist Non-Dominated Sorting Genetic Algorithm NSGA-II; and second, a cropping algorithm based on the filtered water consumption flow traces. As a secondary goal, yet complementary to the main one, a semiautomatic massive classification process has been developed, so that the resulting single-use events can be easily categorized in the different water end uses in a household. This methodology was tested using water consumption data from two different case studies. The characteristics of the households taken as reference and their occupants were unequivocally dissimilar from each other. In addition, the monitoring equipment used to obtain the consumption flow traces had completely different technical specifications. The results obtained from the processing of the two studies show that the automatic disaggregation is both robust and accurate, and produces significant time saving compared to the standard manual analysis.


Introduction
Since the Brundtland Report [1] was presented, sustainability in the use of water resources has been a steady concern in designing water policies [2][3][4].This is a problem with many different faces, from the source (surface or ground water, desalination, reclamation, etc.) to the use (agriculture, residential, industrial, environmental preservation, etc.).All of them are relevant, but bearing in mind that most of the human population lives in cities, urban water management becomes an issue of paramount importance.Therefore, accounting urban water consumption and knowing about end-uses at each customer's household is not only key because of the amount of water resource that is used and/or can be saved, but also because of many other considerations.In this regard, there are well-founded reasons for the research currently being conducted on residential end-uses, such as reduction in treatment costs linked to water consumption; improvement and better effectiveness of conservation measures in the urban environment; conservation of energy linked to water consumption; design optimization of indoor piping systems; improved demand forecasting models; etc.
As an essential tool for enhancing urban water management, the new technologies being implemented today in smart meters are making possible a significant leap forward in recording and characterizing domestic water consumption.Further than the traditional monthly volume read, new meters may provide hourly consumption time patterns or a volume-flow pattern.They may also send alarms when a leak, a forgotten open faucet or a continuous back flow is detected, and all that information may be immediately sent through an AMR (automatic meter reading) system.However, the above-mentioned capabilities are only the first tier when considering all the real possibilities current smart meters may yield.Though feasible today, a second, more advanced tier is not fully developed because of its notable complexity.It consists of high frequency monitoring-duration, volume and flow rate-of domestic water consumption, so that every single use in a household (hh) is accurately registered.Then, and after a detailed analysis, all consumption events can be categorized into the different end uses present in the household [5].
As soon as this will be soundly achieved on a large scale, new improvements in efficient water management strategies will be within reach.To name a few, from the consumer's perspective, water conservation measures could be tailored to each individual consumer [6], thus maximizing the saving potential in each case, or the variable term in the tariff could be designed according to consumer's characteristics to guarantee the balance between equity and income [7].Furthermore, from the utility's view, water demand prediction models could be reliably produced from a more accurate bottom-up approach [8,9].
Nowadays, few commercial tools allow for this water end use analysis exercise-Trace Wizard ® [10], Identiflow ® [11] and BuntBrainForEndUses ® [12].However, any of these tools involves a great investment in time and human resources, as a significant part of the data processing work requires human intervention.Furthermore, the results from the analysis are unavoidably affected by arbitrary and constantly changing human criteria.
Alternatively, an automatic prototype based on machine learning algorithms was proposed by Nguyen et al. [13][14][15][16] to disaggregate and classify water consumption events.Unfortunately, the universal usability and compatibility of the tool is limited by the fact that the algorithms were trained with data originated from a specific water meter/data logger combination.In addition, all data were collected in the same geographical area from consumers sharing very similar water consumption habits and water appliances.Furthermore, the set of data employed for the training of the proposed machine learning tool has been obtained using Trace Wizard ® software, which has limited capabilities for disaggregating overlapped consumption events [13].Following a similar approach, Piga et al. [17] proposed an automated water and energy end use disaggregation, which has only been tested against electric energy data.Also, several start-ups claim to have developed software to automatically classify residential water consumption events into various uses [18][19][20].Unfortunately, in this case there is not official or public information available about the processing tools and algorithms used, and the real performance achieved in the water end use classification for various types of households.
This paper presents a novel methodology to substitute manual water end use disaggregation and to produce more accurate sets of classified single end use events that can be employed as training sets for automatic recognition algorithms.Figure 1 depicts the general structure of the methodology, comprising two main processes: disaggregation, which is fully automatic and the main objective of this paper, and classification, which is semiautomatic at its current stage.
The proposed disaggregation process focuses on an advanced two-stage filtering based on the algorithm described in Pastor et al. [21], which is calibrated using the Elitist Non-Dominated Sorting Genetic Algorithm NSGA-II [22], and a new cropping algorithm having as input the filtered water consumption flow traces.
The contribution of this methodology can be summarized in two main aspects.The first one is the integration of a universal two-stage filtering algorithm that can be used to simplify, with a minimum loss of information, the flow traces originated in most commercial metering and logging equipment available in the market.The second one is the reduction of human intervention by automatically disaggregating overlapped water consumption events (as the one used as an example in Figure 1) into single-use events (examples in Appendix A), which are associated with individual uses of water through different appliances.Both features facilitate and improve the processing of flow traces generated during a long-term metering campaign.
of water through different appliances.Both features facilitate and improve the processing of flow traces generated during a long-term metering campaign.The classification process corresponds to the use of unsupervised techniques to solve the classification of single-use events into the various water end use categories.This step is complementary to the disaggregation process and makes unnecessary the intervention of human analysts, except for validating the results, to classify thousands of automatically obtained single-use events.The paper presents a basic fully operational, yet semiautomatic, version to show the capabilities of these techniques for massive classification of consumption events.
As a case study, data originated from two independent water end uses studies conducted in distinct geographical areas were used to test the proposed filtering and disaggregation methodologies.The characteristics of the households and their occupants were unequivocally dissimilar from each other.Moreover, and in order to test the universal applicability of the methodology, the water consumption flow traces analyzed were obtained from monitoring equipment having completely different technical specifications.
BuntBrainForEndUses ® was the commercial online software employed for the water end use analysis.This software offers the possibility of exporting raw flow traces and importing them back after some manipulation has been carried out by the user.This feature is particularly useful for the study conducted as the filters and disaggregating algorithms can be developed with a specialized The classification process corresponds to the use of unsupervised techniques to solve the classification of single-use events into the various water end use categories.This step is complementary to the disaggregation process and makes unnecessary the intervention of human analysts, except for validating the results, to classify thousands of automatically obtained single-use events.The paper presents a basic fully operational, yet semiautomatic, version to show the capabilities of these techniques for massive classification of consumption events.
As a case study, data originated from two independent water end uses studies conducted in distinct geographical areas were used to test the proposed filtering and disaggregation methodologies.The characteristics of the households and their occupants were unequivocally dissimilar from each other.Moreover, and in order to test the universal applicability of the methodology, the water consumption flow traces analyzed were obtained from monitoring equipment having completely different technical specifications.
BuntBrainForEndUses ® was the commercial online software employed for the water end use analysis.This software offers the possibility of exporting raw flow traces and importing them back after some manipulation has been carried out by the user.This feature is particularly useful for the study conducted as the filters and disaggregating algorithms can be developed with a specialized external analysis software, completely independent from BuntBrainForEndUses ® , and then have the results displayed and corrected in the online application.For the methodology presented, the filtering and disaggregation algorithms were programed in R statistics [23].

Materials and Methods
The methodology proposed is divided into two processes (Figure 1).The first one, disaggregation, works on the original, and generally overlapped, consumption events in an iterative way until all the resulting subevents are either single-use (the most) or uncertain (a few) events (examples in Appendix A).By means of this methodology, the resulting subevents are more homogenous than the ones obtained by manual processing of flow traces through, for example, BuntBrainForEndUses ® .This software application follows the same analysis procedure and allows human analysts to graphically crop water consumption flow traces into its various individual components.
The second process of the methodology, classification, assigns a specific water end use category to each single-use event.In the case study developed, this classification is done by identifying homogeneous subsets of events by means of a non-supervised learning technique and assigning a water end use to each one of them.Whatever the classification technique used, its effectiveness is increased by the fact that the subevents generated by the disaggregating algorithms consistently create subevents using homogeneous and well-defined criteria.

Disaggregation Process
Figure 2 shows the flow chart of the proposed disaggregation process, which breaks down the unprocessed events defined by the raw flow trace into simpler consumption events, and classifies the resulting subevents as single-use or uncertain.
The reliability of the process strongly relies on the first analysis stage: filtering of the original flow trace.The filter is controlled by 10 parameters [21], and their calibration is automatically solved per consumption event by the Elitist Non-Dominated Sorting Genetic Algorithm NSGA-II [22] (R package mco).There are three objective functions to be minimized in this calibration: (a) number of points that describe the filtered flow trace; (b) total accumulated volume difference between raw and filtered flow trace (Figure 3(a2,b2), Input and Output, respectively); (c) maximum on the curve of accumulated volume difference.The first objective function (FO 1 ) leads NSGA-II algorithm to solutions that simplify the filtered flow trace, whereas the other two (FO 2 and FO 3 ) focus on improving its fitting quality respect the original raw flow trace.
The next steps are followed to calculate the curve of accumulated volume difference: (1) given a raw water flow trace, demonstrated as vector qr = (qr 1 , qr 2 , . . .qr i . . ., qr m ), and its corresponding filtered water flow trace qf = (qf 1 , qf 2 , . . ., qf i , . . ., qf m ), both expressed in litres per hour (L/h) and recorded at time t i in seconds (s), two new synchronized time series are generated by linear interpolation, qrs and qfs, for the set of unique t i that belong to qr and qf; (2) vector of time window tw and vector of reference flow qref are defined as: (3) the curve of accumulated volume difference is defined for those components of tw i greater than 0.001 s (all flow rate jumps that take place in the raw water flow trace have this duration, as it can be seen in Figure 3(a1), Input) as follows: where V qr and V qf are the accumulated volume along the raw and the filtered water flow traces, respectively, until t = t j + tw j .On the other hand, q th is a user-defined threshold that has been established to properly process the events with a continuous low-flow water leak.In these cases, small differences in leakage flow rate between the raw and the filtered water flow traces can be maintained over a long period of time, resulting in large accumulated volume differences that decrease the representativeness of FO 2 and FO 3 .To avoid this, the accumulated volume difference is divided by tw i when qref j is less than the maximum level of leakage flow rate q th defined by the user.In the same way, the curve of accumulated volume difference is defined for those components of tw i greater than 0.001 s to reduce the appearance of noise when the flow rate is below q th .
Water 2018, 10, 46 5 of 21 established to properly process the events with a continuous low-flow water leak.In these cases, small differences in leakage flow rate between the raw and the filtered water flow traces can be maintained over a long period of time, resulting in large accumulated volume differences that decrease the representativeness of FO2 and FO3.To avoid this, the accumulated volume difference is divided by twi when qrefj is less than the maximum level of leakage flow rate qth defined by the user.In the same way, the curve of accumulated volume difference is defined for those components of twi greater than 0.001 s to reduce the appearance of noise when the flow rate is below qth.Regarding NSGA-II parameters, processing time was the most constraining factor to select the population size and the number of generations.The genetic algorithm achieves good results for the most intricate cases-long duration events with a great degree of overlapping, which typically come from households with leaks and high average daily consumption-with 24 individuals and 10 generations in a reasonable computing time.In relation to crossover and mutation probabilities, the default values taken were 0.7 and 0.2, respectively.Regarding NSGA-II parameters, processing time was the most constraining factor to select the population size and the number of generations.The genetic algorithm achieves good results for the most intricate cases-long duration events with a great degree of overlapping, which typically come from households with leaks and high average daily consumption-with 24 individuals and 10 generations in a reasonable computing time.In relation to crossover and mutation probabilities, the default values taken were 0.7 and 0.2, respectively.
The result of this calibration process is a Pareto Front, and the chosen solution is the one for which the following expression is minimized: where the maximum value reached in each objective function within the Pareto Front (max(FO 1−PF ), max(FO 2−PF ) y max(FO 3−PF )) was taken to standardize the corresponding term.A conservative criterion for Filter-A, which prioritizes the simplification of the flow traces, establishes the weights for each objective function (w1 = 0.8; w2 = 0.1; w3 = 0.1).This is necessary for subsequent disaggregation processes, which can only be applied if certain requirements are satisfied based on a strong filtering of the raw flow trace.Additionally, the calibration time can be limited by a user-defined threshold.
In case a solution is not found within the established time limits (only happening in less than 0.01% of the sample events in the case studies below), the default values for the filter parameters will be used (p1 = 150 (h*ms)/L; p2 = 0.16 L; p3 = 80 L/h; p4 = 40 degrees; p5 = 5%; p6 = 100 L/h; p7 = 10,000 ms; p8 = 6000 ms, p9 = 5%, p10 = 10 degrees).These default values are the result of the authors' experience while developing and applying this process in several projects around the world.In any case, a future sensitivity analysis is already planned to improve the filter performance and improve the speed of the calibration procedure.
Water 2018, 10, 46 6 of 21 for Filter-A, which prioritizes the simplification of the flow traces, establishes the weights for each objective function (w1 = 0.8; w2 = 0.1; w3 = 0.1).This is necessary for subsequent disaggregation processes, which can only be applied if certain requirements are satisfied based on a strong filtering of the raw flow trace.Additionally, the calibration time can be limited by a user-defined threshold.
In case a solution is not found within the established time limits (only happening in less than 0.01% of the sample events in the case studies below), the default values for the filter parameters will be used (p1 = 150 (h*ms)/L; p2 = 0.16 L; p3 = 80 L/h; p4 = 40 degrees; p5 = 5%; p6 = 100 L/h; p7 = 10,000 ms; p8 = 6000 ms, p9 = 5%, p10 = 10 degrees).These default values are the result of the authors' experience while developing and applying this process in several projects around the world.In any case, a future sensitivity analysis is already planned to improve the filter performance and improve the speed of the calibration procedure.Once the consumption event defined by the raw flow trace has been filtered for the first time (Filter-A in Figure 2) the event can be classified as simple, constituted by only four vertexes, or overlapped.Simple events are analyzed by an additional filtering process (Filter-B in Figure 2).This process prioritises how accurately the filtered flow trace matches the original one.In this case, the solution selected from the Pareto Front should attain a value for the KGE index ( [24]; R package hydroGOF) higher than 0.8 with a maximum w1 weight.Figure 3  Once the consumption event defined by the raw flow trace has been filtered for the first time (Filter-A in Figure 2) the event can be classified as simple, constituted by only four vertexes, or overlapped.Simple events are analyzed by an additional filtering process (Filter-B in Figure 2).This process prioritises how accurately the filtered flow trace matches the original one.In this case, the solution selected from the Pareto Front should attain a value for the KGE index ( [24]; R package hydroGOF) higher than 0.8 with a maximum w1 weight.Figure 3 compares the resulting filtered flow traces after going through the first and second filtering processes.Only if the final filtered flow trace (after Filter-B) is formed by only four vertexes, the event is finally classified as a single-use event.Otherwise, it will be classified as an uncertain event.The inclusion of a second filter significantly reduces the classification errors of single-use events compared to a one-stage filter approach.
An additional step of the disaggregation process is the analysis of the minimum/baseline flow (Q base ) in the filtered (Filter-A) events that have not been classified as simple events.The existence of a base event is considered when more than one horizontal section of the event satisfies the following two conditions: (i) the flow rate falls within a specified range {Q base − tolerance, Q base + tolerance}, where the tolerance is defined by the user; (ii) the volume associated with the horizontal section of the event is greater than a specified threshold.When these two conditions are met (Figure 4a), the events between horizontal sections are cropped from the base event (Figure 4b).If only the second condition is not satisfied (Figure 4a), the section is processed as a fictitious union, so that the events are separated and the union removed (Figure 4b).Fictitious unions appear when processing the raw flow traces.They do not actually correspond to any real water consumption (they are a distortion in the flow trace caused by the data-acquisition equipment).The end or starting times of the previous and following subevents are then recalculated to account for the volume removed from the fictitious union.Typically, this volume corresponds to one or two pulses from the pulse emitter of the water meter.The resulting subevents generated by these cropping operations are individually analyzed through the whole process again.
Water 2018, 10, 46 7 of 21 two conditions: (i) the flow rate falls within a specified range {Qbase − tolerance, Qbase + tolerance}, where the tolerance is defined by the user; (ii) the volume associated with the horizontal section of the event is greater than a specified threshold.When these two conditions are met (Figure 4a), the events between horizontal sections are cropped from the base event (Figure 4b).If only the second condition is not satisfied (Figure 4a), the section is processed as a fictitious union, so that the events are separated and the union removed (Figure 4b).Fictitious unions appear when processing the raw flow traces.They do not actually correspond to any real water consumption (they are a distortion in the flow trace caused by the data-acquisition equipment).The end or starting times of the previous and following subevents are then recalculated to account for the volume removed from the fictitious union.Typically, this volume corresponds to one or two pulses from the pulse emitter of the water meter.The resulting subevents generated by these cropping operations are individually analyzed through the whole process again.
(a) (b) Finally, if the event does not fall into any of the previous categories, a gradient analysis of the filtered flow trace is carried out (Figure 2).Only in case that the event has three major slopes, being the first one positive and other two negative or vice versa, it is considered that it is constituted by two or more different events that are overlapped in time, which begin or end within the same time range (Figure 5a).In this case, the events are cropped (Figure 5b), and the two new events are classified as single-use events.On the contrary, if the number of major slopes in the event is greater than three, it is directly categorized as an uncertain event.The key aspect of this separation process is to correctly identify the start or the end, depending on the case, of the second major slope on the raw flow trace (Figure 5a, point 2) and the flow rate at the analogous instant in the filtered flow trace (Figure 5a, point 1).
It should be highlighted as an important contribution of the proposed methodology that all previous separation processes are implemented on the raw flow trace.Thus, signal smoothing in the filtered flow trace do not generate any loss of information in the resulting subevents obtained.This Finally, if the event does not fall into any of the previous categories, a gradient analysis of the filtered flow trace is carried out (Figure 2).Only in case that the event has three major slopes, being the first one positive and other two negative or vice versa, it is considered that it is constituted by two or more different events that are overlapped in time, which begin or end within the same time range (Figure 5a).In this case, the events are cropped (Figure 5b), and the two new events are classified as single-use events.On the contrary, if the number of major slopes in the event is greater than three, it is directly categorized as an uncertain event.The key aspect of this separation process is to correctly identify the start or the end, depending on the case, of the second major slope on the raw flow trace (Figure 5a, point 2) and the flow rate at the analogous instant in the filtered flow trace (Figure 5a, point 1).
It should be highlighted as an important contribution of the proposed methodology that all previous separation processes are implemented on the raw flow trace.Thus, signal smoothing in the filtered flow trace do not generate any loss of information in the resulting subevents obtained.This particular feature can be clearly observed in Figure 5, where the separated consumption events (Figure 5b) maintain the details of the original flow trace (Figure 5a).The presence of these details can be used in a later stage to improve the effectiveness of the automatic classification tools that can be developed.

Classification Process
At the end of the disaggregation process, all the events have been classified into two categories: single-use and uncertain events.As discussed below, the first group is the most numerous in any of the households analyzed, typically ranging from 75% to 92% depending on the amount of water consumption in the household.In addition, the amount of single-use events generated by the methodology is higher if the uncertain events having less than 3 L in volume are assumed to be singleuse events (in this case the percentage of single-use events will range between 85% and 95%).The cropping and classification of the uncertain event group, corresponding to intricate events with high flow rate variability, will be the aim of future research.In this sense, it should be noted that high flow rate variability is not always associated to water consumption overlapping from different uses.Occasionally, the so-called uncertain events may be originated by pressure fluctuations, or the user changing the opening of a faucet for convenience or to adjust water temperature.Cropping and classification of uncertain events is not an easy task when accounting for the previous considerations and the fact that the overlapping of two uses may not produce a consumption flow rate that is the sum of the individual flow rates.The reason for this effect can be found in the pressure losses caused by the plumbing.Depending on the sizing of the pipes, the consumption flow rate of a given use may be significantly reduced by the appearance of additional water usages within the household.
The working hypothesis to categorizing single-use events is the following-those events with similar physical characteristics correspond to the same end use.In accordance with this, an initial clustering analysis is conducted by an unsupervised learning machine, Partition Around Mediods (PAM; [25]; R package cluster).One advantage of this partition clustering technique relies on the fact that it can work with different similarity measurements other than the Euclidean distance.In this work, the Gower distance has been chosen as the similarity measure, since it scales all the variables considered and allow for the definition of individual weights for the variables.On the other hand, the characteristics of the events taken into account as input data are the total volume and the average flow rate.Event duration was rejected due to the considerable noise generated by long water uses, which impeded clusters identification.This effect was observed in households with leakage, showing long single-use events with low consumption flow rates.However, in these cases the duration of the event is a variable used as a preliminary filter to allow the clustering analysis to solely focus on the

Classification Process
At the end of the disaggregation process, all the events have been classified into two categories: single-use and uncertain events.As discussed below, the first group is the most numerous in any of the households analyzed, typically ranging from 75% to 92% depending on the amount of water consumption in the household.In addition, the amount of single-use events generated by the methodology is higher if the uncertain events having less than 3 L in volume are assumed to be single-use events (in this case the percentage of single-use events will range between 85% and 95%).The cropping and classification of the uncertain event group, corresponding to intricate events with high flow rate variability, will be the aim of future research.In this sense, it should be noted that high flow rate variability is not always associated to water consumption overlapping from different uses.Occasionally, the so-called uncertain events may be originated by pressure fluctuations, or the user changing the opening of a faucet for convenience or to adjust water temperature.Cropping and classification of uncertain events is not an easy task when accounting for the previous considerations and the fact that the overlapping of two uses may not produce a consumption flow rate that is the sum of the individual flow rates.The reason for this effect can be found in the pressure losses caused by the plumbing.Depending on the sizing of the pipes, the consumption flow rate of a given use may be significantly reduced by the appearance of additional water usages within the household.
The working hypothesis to categorizing single-use events is the following-those events with similar physical characteristics correspond to the same end use.In accordance with this, an initial clustering analysis is conducted by an unsupervised learning machine, Partition Around Mediods (PAM; [25]; R package cluster).One advantage of this partition clustering technique relies on the fact that it can work with different similarity measurements other than the Euclidean distance.In this work, the Gower distance has been chosen as the similarity measure, since it scales all the variables considered and allow for the definition of individual weights for the variables.On the other hand, the characteristics of the events taken into account as input data are the total volume and the average flow rate.Event duration was rejected due to the considerable noise generated by long water uses, which impeded clusters identification.This effect was observed in households with leakage, showing long single-use events with low consumption flow rates.However, in these cases the duration of the event is a variable used as a preliminary filter to allow the clustering analysis to solely focus on the bulk of single-use events.Once the clustering analysis is finished (Figure 6), the application allows the user to visualize random subsamples of events from each cluster and associate them with an end use according to their physical characteristics.The subset of categorized single-use events and the subset of uncertain events are written in a flat CSV file at the end of the process.This file can be read by BuntBrainForEndUses ® , a web application for manual processing and editing of water end uses.In this way, the user can correct misclassifications and further edit the uncertain events that have not been properly analyzed by the algorithms.

Case Study
The water consumption data utilized for testing the methodology was sourced from two different water end uses studies conducted in geographically distant regions.One of the main differences between the studies is the type of monitoring equipment employed.In the first study (R1), the smart meters installed for water consumption monitoring were ELSTER Y250 single-jet (maximum flow rate of 5 m 3 /h) or ELSTER Y250M multi-jet (maximum flow rate of 7 m 3 /s) depending on the type of residential household.These meters produce a pulse every 0.04 L or 0.06 L of water consumed, respectively.Specially designed data loggers calculated and recorded the average consumption flow rate at approximately 3-s intervals.This recording mode was chosen to optimize the file size while preserving the quality of the flow trace.Files were periodically sent (twice per day) to the server via GPRS/GSM.On the other hand, in the second study (R2) a piston type volumetric water meter was used (Aquadis+, ITRON (WA, United States)), which generates a pulse every 0.1 L. The data logger used, recorded the occurrence time of every pulse with a resolution of 0.02 s.
For this analysis, a selection of significant households of both studies was conducted according to the average daily consumption and the presence or not of continuous leakage.The final selected sample was composed by 20 households-10 from R1 and 10 from R2-for which two-week period of monitoring data were selected.For the first study, the data corresponds to consumption made during autumn 2015, while for the second study the data were collected during autumn 2016.In total 19,858 sampled events were analyzed during the period considered.Figure 7 shows the general characteristics of the selected households and events associated with them.
As shown in Figure 7, there is a considerable difference between the households and events characteristics of these two studies.In the first study, the average daily consumption of the sample was close to 1600 L/hh/day, while in the second it is less than 400 L/hh/day.The number of daily events are also completely different: 110 events/hh/day vs. 35 events/hh/day.The dissimilarities of The subset of categorized single-use events and the subset of uncertain events are written in a flat CSV file at the end of the process.This file can be read by BuntBrainForEndUses ® , a web application for manual processing and editing of water end uses.In this way, the user can correct misclassifications and further edit the uncertain events that have not been properly analyzed by the algorithms.

Case Study
The water consumption data utilized for testing the methodology was sourced from two different water end uses studies conducted in geographically distant regions.One of the main differences between the studies is the type of monitoring equipment employed.In the first study (R1), the smart meters installed for water consumption monitoring were ELSTER Y250 single-jet (maximum flow rate of 5 m 3 /h) or ELSTER Y250M multi-jet (maximum flow rate of 7 m 3 /s) depending on the type of residential household.These meters produce a pulse every 0.04 L or 0.06 L of water consumed, respectively.Specially designed data loggers calculated and recorded the average consumption flow rate at approximately 3-s intervals.This recording mode was chosen to optimize the file size while preserving the quality of the flow trace.Files were periodically sent (twice per day) to the server via GPRS/GSM.On the other hand, in the second study (R2) a piston type volumetric water meter was used (Aquadis+, ITRON (WA, United States)), which generates a pulse every 0.1 L. The data logger used, recorded the occurrence time of every pulse with a resolution of 0.02 s.
For this analysis, a selection of significant households of both studies was conducted according to the average daily consumption and the presence or not of continuous leakage.The final selected sample was composed by 20 households-10 from R1 and 10 from R2-for which two-week period of monitoring data were selected.For the first study, the data corresponds to consumption made during autumn 2015, while for the second study the data were collected during autumn 2016.In total 19,858 sampled events were analyzed during the period considered.Figure 7 shows the general characteristics of the selected households and events associated with them.
As shown in Figure 7, there is a considerable difference between the households and events characteristics of these two studies.In the first study, the average daily consumption of the sample was close to 1600 L/hh/day, while in the second it is less than 400 L/hh/day.The number of daily events are also completely different: 110 events/hh/day vs. 35 events/hh/day.The dissimilarities of the households considered in the analysis conducted in this paper emphasize the reliability of the methodology, which can be used independently of the data sources, as long as the quality of the flow traces allows for end use disaggregation.

Results and Discussion
The proposed methodology has been applied to 19,858 unprocessed water consumption events.After applying the disaggregation process (filtering and cropping) to the flow signal, the total number of events increased to 46,721, being the average number of cropping operations per day equal to 121 and 58 for the studies R1 and R2, respectively (Table 1).The average processing time consumed per each one of these operations is 21.8 s, using an Intel Core i5-4440 processor.Per study, the average cropping time is equal to 18.6 s for R1 and 28.3 s for R2.The calibration of the filtering algorithm is the task requiring more processing time, which increases with the density per unit time of data points in the raw flow trace.For this reason, it takes longer to carry out a cropping operation in the case of an event belonging to the study R2, since in this study flow data were recorded with a lower temporal resolution (0.02 s vs. approx.3 s).Currently, the research team is working in optimizing the calibration algorithms and reducing the required processing time.The strategies proposed for this optimization are: (1) developing a methodology to calibrate the filter per household rather than per event, without a considerable loss of filtering accuracy; (2) finding the filter parameters through an algorithm that combines heuristic and guided search methods.

Results and Discussion
The proposed methodology has been applied to 19,858 unprocessed water consumption events.After applying the disaggregation process (filtering and cropping) to the flow signal, the total number of events increased to 46,721, being the average number of cropping operations per day equal to 121 and 58 for the studies R1 and R2, respectively (Table 1).The average processing time consumed per each one of these operations is 21.8 s, using an Intel Core i5-4440 processor.Per study, the average cropping time is equal to 18.6 s for R1 and 28.3 s for R2.The calibration of the filtering algorithm is the task requiring more processing time, which increases with the density per unit time of data points in the raw flow trace.For this reason, it takes longer to carry out a cropping operation in the case of an event belonging to the study R2, since in this study flow data were recorded with a lower temporal resolution (0.02 s vs. approx.3 s).Currently, the research team is working in optimizing the calibration algorithms and reducing the required processing time.The strategies proposed for this optimization are: (1) developing a methodology to calibrate the filter per household rather than per event, without a considerable loss of filtering accuracy; (2) finding the filter parameters through an algorithm that combines heuristic and guided search methods.Analyzing in detail the consumption events selected from study R1 (Figure 8), the result of applying Filter-A to all 14,648 unprocessed events, was that 8768 events were classified as simple, whereas the remaining 5880 were classified as overlapped.As to the simple events, most of them were, as expected, single-use events (6394).The remaining 2374 simple events were not simple enough and were classified as uncertain events.None of these uncertain events originated from simple events could be cropped or further processed; however, most of them correspond to single-use events with unsteady consumption flows (for example, a faucet that is adjusted to the desired flow rate).
Overlapped events correspond to events that could not be simplified as four-nodes events after Filter-A.These events have a considerable degree of complexity due to the overlapping of water uses.The algorithm proposed is capable of separating into single-use events most of these overlapped events by accurately cropping the flow traces.The initially identified 5880 overlapped events were separated into 24,024 subevents after the disaggregation process.Most of these new subevents were classified as single-use events (17,908), and the rest (6116) as uncertain.In total, the analysis of the worst case scenario of study R1 produced 24,302 single-use events (74.1%) and 8490 uncertain events (25.9%).From the uncertain events, 4224 had a volume of less than 3 L. These, because of their low volume, could be also be added to the single-use group.
Water 2018, 10, 46 11 of 21 be cropped or further processed; however, most of them correspond to single-use events with unsteady consumption flows (for example, a faucet that is adjusted to the desired flow rate).
Overlapped events correspond to events that could not be simplified as four-nodes events after Filter-A.These events have a considerable degree of complexity due to the overlapping of water uses.The algorithm proposed is capable of separating into single-use events most of these overlapped events by accurately cropping the flow traces.The initially identified 5880 overlapped events were separated into 24,024 subevents after the disaggregation process.Most of these new subevents were classified as single-use events (17,908), and the rest (6116) as uncertain.In total, the analysis of the worst case scenario of study R1 produced 24,302 single-use events (74.1%) and 8490 uncertain events (25.9%).From the uncertain events, 4224 had a volume of less than 3 L. These, because of their low volume, could be also be added to the single-use group.For study R2, showing simpler flow traces typical of water consumption profiles of a European family, the results are even more positive.As shown in Figure 8, 91.5% of 13,929 resulting events were considered to be single-use events and only 8.5% were catalogued as uncertain events.The difference between studies is mainly due to the characteristics of the households: the flow traces belonging to the sample R1 are notably more complex, obtained from households with high average daily consumption and frequent overlapping of water uses.For study R2, showing simpler flow traces typical of water consumption profiles of a European family, the results are even more positive.As shown in Figure 8, 91.5% of 13,929 resulting events were considered to be single-use events and only 8.5% were catalogued as uncertain events.The difference between studies is mainly due to the characteristics of the households: the flow traces belonging to the sample R1 are notably more complex, obtained from households with high average daily consumption and frequent overlapping of water uses.
Overall, the methodology generated a number of single-use events equal to 37,042 events (79.3% of the 46,721 resulting events), of which 75.8% has been obtained through the proposed disaggregation process.Consequently, human intervention to crop and generate single-use events has been significantly minimized, with subsequently large human working time savings.In raw numbers, the total automatic disaggregation process has taken less than 4 days (96 computing hours); whereas, according to the authors' experience, the same work would have required about 45 human-working days (360 h).
The distribution of the physical characteristics of the single-use and uncertain events for both studies is presented in Figure 9.Some outliers, with a duration of more than 8 min, have been removed to improve the readability of the basic statistics (median, first and third quartile).As expected, the heterogeneity of the uncertain events is significantly larger than the one obtained for single-use events.In addition, the average duration, volume and flow rate of uncertain events are greater than those for single-use events.It should also be mentioned that some of the events considered here as uncertain correspond to single uses, and their flow rate variability can be caused by adjusting the faucet or variations in the input pressure.The distribution of the physical characteristics of the single-use and uncertain events for both studies is presented in Figure 9.Some outliers, with a duration of more than 8 min, have been removed to improve the readability of the basic statistics (median, first and third quartile).As expected, the heterogeneity of the uncertain events is significantly larger than the one obtained for single-use events.In addition, the average duration, volume and flow rate of uncertain events are greater than those for single-use events.It should also be mentioned that some of the events considered here as uncertain correspond to single uses, and their flow rate variability can be caused by adjusting the faucet or variations in the input pressure.After the disaggregation process, single-use events could be categorized by clustering analysis or any other classification algorithm [15].Similar methodologies have already been used in other fields, like non-intrusive electric load data disaggregation [26][27][28][29].As an example of the results that can be achieved by these techniques, Figure 10 shows the findings for one of the most complex households, HH-06 of R1, showing an indoor leak and a high average daily consumption.The unsupervised learning technique used, allows to identify different types of water consumption uses: Cluster 2 in Figure 10 mostly includes events corresponding to toilets, while Cluster 6 is composed of washing machine and shower events.After the disaggregation process, single-use events could be categorized by clustering analysis or any other classification algorithm [15].Similar methodologies have already been used in other fields, like non-intrusive electric load data disaggregation [26][27][28][29].As an example of the results that can be achieved by these techniques, Figure 10 shows the findings for one of the most complex households, HH-06 of R1, showing an indoor leak and a high average daily consumption.The unsupervised learning technique used, allows to identify different types of water consumption uses: Cluster 2 in Figure 10 mostly includes events corresponding to toilets, while Cluster 6 is composed of washing machine and shower events.Figure 11 shows the final results after assigning a water end use to each cluster for the household under study.Given the same monitoring period, the outcome of proposed methodology is compared with the one obtained manually.It can be observed that the physical characteristics of the events in each end use category tend to be similar.Nevertheless, there is a significant deviation, especially with respect to the mean flow rate and duration of the events: the average flow rate for each end use tends to be higher in the presented approach, while the average duration is generally shorter as more cropping operations are conducted through the automatic disaggregation process.In addition, both parameters-volume and duration of single-use events-are less dispersed when the flow traces are automatically cropped.This is directly related to the inherent defects of manual editing that can be seen in Figure 12 (additional examples in Appendix B): the automatic algorithms recognize the leakage event by means of a volume check (Figure 12b), whereas the analyst has subjectively decided in this specific case to ignore it (Figure 12a) and add the volume to the toilet use.When a human analyst processes the consumption data, the resulting average duration of faucets and toilets is larger and the average flow rate smaller.Additionally, for the same reason, a greater number of leakage events have been identified and separated from other consumption through the proposed methodology.These findings demonstrate that automatic disaggregation tools can generate a standardized corpus of processed data, which is more homogeneous and reliable because it is obtained as a result of cropping operations based on solid and well established criteria.Therefore, the single-use events obtained from the automatic disaggregation algorithms developed in this study are significantly more reliable, in terms of duration, average flow rate and shape than those resulting from a manual processing and cropping the water consumption flow traces.These results have direct implications in the probability functions used to characterize water consumption events frequency, duration and intensity [30][31][32][33][34]. Figure 11 shows the final results after assigning a water end use to each cluster for the household under study.Given the same monitoring period, the outcome of proposed methodology is compared with the one obtained manually.It can be observed that the physical characteristics of the events in each end use category tend to be similar.Nevertheless, there is a significant deviation, especially with respect to the mean flow rate and duration of the events: the average flow rate for each end use tends to be higher in the presented approach, while the average duration is generally shorter as more cropping operations are conducted through the automatic disaggregation process.In addition, both parameters-volume and duration of single-use events-are less dispersed when the flow traces are automatically cropped.This is directly related to the inherent defects of manual editing that can be seen in Figure 12 (additional examples in Appendix B): the automatic algorithms recognize the leakage event by means of a volume check (Figure 12b), whereas the analyst has subjectively decided in this specific case to ignore it (Figure 12a) and add the volume to the toilet use.When a human analyst processes the consumption data, the resulting average duration of faucets and toilets is larger and the average flow rate smaller.Additionally, for the same reason, a greater number of leakage events have been identified and separated from other consumption through the proposed methodology.These findings demonstrate that automatic disaggregation tools can generate a standardized corpus of processed data, which is more homogeneous and reliable because it is obtained as a result of cropping operations based on solid and well established criteria.Therefore, the single-use events obtained from the automatic disaggregation algorithms developed in this study are significantly more reliable, in terms of duration, average flow rate and shape than those resulting from a manual processing and cropping the water consumption flow traces.These results have direct implications in the probability functions used to characterize water consumption events frequency, duration and intensity [30][31][32][33][34].       Obviously, more accurate classification techniques can be developed as processing experience is gained and larger and more reliable data sets are available for training the algorithms.The work presented should be considered as an important first stage to produce sets of individual events that are built consistent and accurately, which can be used to improve the training of automatic recognition algorithms.Therefore, the main contribution of the proposed methodology is mainly related to the quality of the single-use events obtained through an automatic separation technique that can be easily used for developing faster and better performing classification algorithms.

Conclusions
The work presented intends to be a step forward to the main objective of understanding in detail how water is consumed through end uses, and the reasons behind it.It proposes a new fully-automatic disaggregation process for water consumption events that is based on a two-step filtering and event cropping algorithm.An additional advantage achieved by the flexibility of the filter is that the whole process can be universally applied to different type of customers and monitoring equipment.
The disaggregation process presented is divided into two main stages: (a) The raw water consumption events are filtered and categorized as simple or overlapped to facilitate subsequent operations.The filtering relies on an advanced algorithm that is automatically calibrated for each water consumption event by means of NSGA-II genetic algorithm.Simple events are then characterized as single-use, which correspond to actual individual water uses, or uncertain events.(b) On the other hand, overlapped events, originated by simultaneous water uses, are cropped and separated into simpler single-use events.All cropping operations are implemented on the raw flow trace, and potential distortions in the filtered signal do not generate any loss of information in the resulting subevents.In other words, all the subevents created maintain the characteristics of the original flow trace.This particular feature increases the amount of information available for the classification algorithms that can be developed in the future, improving their effectiveness.
Finally, as a case study, an example of the way in which the events generated during the previous stages can be easily categorized into various end uses by a semiautomatic algorithm is added to the work presented.Single-use events are massively classified into various water end use categories by means of clustering analysis.
Regarding the performance analysis of the first and second stages for the case study presented, the following conclusions were raised: The original raw flow traces, of the 20 households belonging to the studies R1 and R2, covering a monitoring period of 15 days per household, contain 19,858 events.After the filtering and separation process, the number of subevents grew to 46,721, of which 79.3% (37,042 events) are single uses.In other words, the number of water consumption events increased by 130%, and 26,863 new events were created.Up to 75.8% of the single-use events that can be classified, have been obtained through the disaggregation process defined in this work.Therefore, the methodology proposed solves most of the cropping operations that need to be performed and reduces significantly the human intervention required to disaggregate the overlapped consumption events into single-use events, with significant time savings.
Finally, by comparing the manual and the automatically disaggregated events, it was observed that the characteristics of the events originated from the algorithms proposed are more homogeneous and consistent than the ones obtained by manual cropping.This result can be easily justified by the fact that the automatic separation algorithms always apply the same criteria, while a human analyst may change the cropping criteria while conducting the analysis.Furthermore, the inherent subjectivity of manual separation introduces dispersion in the physical characteristics of the events belonging to a specific water end use.More dispersion regarding the physical characteristics of the consumption events associated to an end use, unavoidably lead to poorer performance of whatever automatic classification technique that could be applied.This is why the single-use events obtained by the methodology proposed constitute a more reliable corpus for training and developing end use classifications algorithms.

Figure 1 .
Figure 1.General structure of the proposed methodology.

Figure 1 .
Figure 1.General structure of the proposed methodology.

Figure 2 .
Figure 2. Flow chart of the disaggregation process and sorting of events as single-use or uncertain.

Figure 2 .
Figure 2. Flow chart of the disaggregation process and sorting of events as single-use or uncertain.

Figure 4 .
Figure 4. (a) Example of an overlapped event.(b) Resulting events obtained after disaggregating fictitious unions and conducting a baseline flow analysis.

Figure 4 .
Figure 4. (a) Example of an overlapped event.(b) Resulting events obtained after disaggregating fictitious unions and conducting a baseline flow analysis.

Figure 5 .
Figure 5. (a) Overlapped event in which the two subevents start at the same time.(b) Two resulting single-use events after the cropping operation.

2 Figure 5 .
Figure 5. (a) Overlapped event in which the two subevents start at the same time.(b) Two resulting single-use events after the cropping operation.

Water 2018, 10 , 46 9 of 21 Figure 6 .
Figure 6.Result of Partition Around Mediods (PAM) algorithm with a number of clusters equal to 7 and similarity matrix based on Gower distance.

7 Figure 6 .
Figure 6.Result of Partition Around Mediods (PAM) algorithm with a number of clusters equal to 7 and similarity matrix based on Gower distance.

Water 2018, 10 , 46 10 of 21 Figure 7 .
Figure 7. General characteristics of the analyzed households and the events associated with them.

Figure 7 .
Figure 7. General characteristics of the analyzed households and the events associated with them.

Figure 8 .
Figure 8. Classification of events after filtering and cropping (disaggregation process).

Figure 8 .
Figure 8. Classification of events after filtering and cropping (disaggregation process).

Water 2018 ,
10, 46 12 of 21 according to the authors' experience, the same work would have required about 45 human-working days (360 h).

Figure 9 .
Figure 9. Distribution of the physical characteristics of the events classified as single-use and uncertain per study.

Figure 9 .
Figure 9. Distribution of the physical characteristics of the events classified as single-use and uncertain per study.

Figure 10 .
Figure 10.Display of 20 randomly selected individuals from clusters 2 and 6 for household HH-06 that belongs the R1 study.

Figure 11 .
Figure 11.The manual vs proposed methodology final results of complete disaggregation processing for the household HH-06 of the R1 study.

Figure 11 .
Figure 11.The manual vs proposed methodology final results of complete disaggregation processing for the household HH-06 of the R1 study.

Figure 11 .Figure 12 .
Figure 11.The manual vs proposed methodology final results of complete disaggregation processing for the household HH-06 of the R1 study.

Figure A3 .
Figure A3.Examples of Manual vs. Automatic Disaggregation from R2 study.

Table 1 .
General statistics about performance of separation process.

Table 1 .
General statistics about performance of separation process. ). ).