With an increasing need for a secure water supply, a better understanding of water consumption behavior is beneficial. This can be achieved through end-use classification, i.e., identifying end-uses such as toilets, showers or dishwashers from water consumption data. Previously, both supervised and unsupervised machine learning (ML) techniques have been employed, demonstrating accurate classification results on particular datasets. However, a comprehensive comparison of ML techniques on a common dataset is still missing. Hence, this study aims at a quantitative evaluation of various ML techniques on a common dataset. For this purpose, a stochastic water consumption simulation tool that closely models real-world water consumption patterns is applied to generate residential data. Subsequently, unsupervised clustering methods, such as dynamic time warping, k-means, DBSCAN, OPTICS and Hough transform, are compared to supervised methods based on support vector machines (SVM). The quantitative results demonstrate that supervised approaches are capable of classifying common residential end-uses (toilet, shower, faucet, dishwasher, washing machine, bathtub and mixed water-uses) with accuracies of up to 0.99, whereas unsupervised methods fail to detect those consumption categories. In conclusion, clustering techniques alone are not suitable for separating end-use categories fully automatically. Hence, accurate labels are essential for the end-use classification of water events, and crowdsourcing and citizen science approaches offer feasible ways of obtaining them.
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.