The Use of Cluster Analysis to Evaluate the Impact of COVID-19 Pandemic on Daily Water Demand Patterns

Abstract: Proper determination of unit water demand and the diurnal distribution of water consumption (water consumption histogram) provides the basis for designing, dimensioning, and all analyses of water supply networks. It is important in the case of mathematical modelling of flows in the water supply network, particularly during the determination of nodal water demands in the context of Extended Period Simulation (EPS). Considering the above, the analysis of hourly water consumption in selected apartment buildings was performed to verify the justification of the application of grouping by means of k-means clustering. The article presents a detailed description of the adopted methodology, as well as the obtained results in the form of synthetic distributions of hourly water consumption, and the effect of the COVID-19 pandemic on their change.


Introduction
Research on and analyses of water demand for the purposes of, e.g., housing, services, or public buildings have become increasingly popular in recent years. This is primarily due to the development of technology that allows water flow to be measured, or water meters to be read automatically (automatic meter reading, AMR), more frequently than before (once a month, quarter, or even year, depending on the adopted billing system). It is also due to the growing use of computer networks in the operation of modern water distribution systems, which allows large amounts of data to be collected. This progress permits the measurement of water flow at any time interval [1].
The preparation of an accurate diagram of water demand is essential from the point of view of mathematical modelling of water distribution systems. It also provides the basis for appropriate designing and dimensioning of water supply networks, connections, and selection of the water meter [1,2].
Water demand is determined by a number of variables depending on the type of object to which water is supplied. In housing, the primary factors determining the course of water consumption during the day include daily behaviours of residents, their lifestyle, and routine activities involving the consumption of supplied water. Another important aspect is the day of the week, i.e., whether it is a working day or a holiday (weekend or a bank holiday). The announcement of the state of epidemic emergency in the country, caused by the virus SARS-CoV-2, could have considerably affected daily behaviours of

Background
Interest in processing databases of water meter records has been increasing in recent years due to the increasingly frequent application of computer hydraulic modelling in the management and dimensioning of water supply networks. Hydraulic modelling of water supply networks is performed with the application of EPS, which determines the quasi-dynamic behaviour of the system in time by calculating the state of the system for a series of steady-state simulations in which water demand and boundary conditions change in time. For a hydraulic model to be credible in any analysis of the operation of the distribution system, it must be appropriately calibrated [5]. This process requires accurate measurement of water consumption with the application of AMR throughout the water supply system, and then defining boundary conditions for consumption, not only in the form of mean diurnal consumption in computational nodes, but also its changes in time. Models of water consumption that have already been developed can be found in the literature. For example, Obradović and Lonsdale [6] devoted a considerable part of their publication to models of water consumption for an exceptionally broad range of recipients, including a hospital, university, school, church, bank, hotel, authority office, military barracks, prison, industrial factories by type, and many others. In the introduction, the authors emphasised, however, that an attempt to determine "typical patterns" of water consumption is doomed to fail due to the overlapping of a number of factors, often difficult to identify, that affect water consumption [6]. Therefore, the proposed form of processing data involved averaged hourly models of water consumption with their maximum and minimum envelopes.
As mentioned above, such models can also be obtained by means of AMR and further analysis of its results. It should be emphasised that this type of solution is not possible for all recipients and has to be limited to designating so-called reference recipients, for whom hourly patterns of water consumption are identified and then ascribed to particular nodes of the hydraulic model [5,6].
Although the technique of allocating time-varying water consumption to particular computational nodes based on selected reference recipients appears simple, it is unfortunately not common. The conventional method, described as the top-down approach, is applied considerably more frequently. It assumes that water consumption is determined only for entire water supply zones and then ascribed to all nodes of the network with consideration of correlation coefficients [7]. This not only simplifies the development of computer models of water supply networks, but also allows the computational algorithms to converge quickly in the calibration process. The resulting image of water flows, however, can carry a significant error.
Considering the above, Blokker et al. [7] proposed an entirely different method of estimating water consumption in computational nodes of the network, described as the bottom-up approach. They developed a stochastic model of water consumption by the final recipient (for a single water supply connection), based only on statistical information on water users such as: number of residents, their age, frequency of water consumption, duration and rate of water flow per water consumption episode, and frequency of occurrence of different types of water consumption for different purposes during the day (e.g., for flushing the toilet, washing clothes, and washing hands). The model is called SIMDEUM (Simulation of water Demand; an End-Use Model) and is based on results of earlier research by Buchberger and Wells [8], who demonstrated that water consumption patterns for household purposes can be described by means of a non-homogeneous Poisson rectangular pulse (PRP) process.
The PRP model has been applied to modelling water consumption in, among others, the USA [9], Spain [10], and Mexico [11]. In each case, the implementation required conducting a measurement campaign, and the obtained parameters of the PRP model differed between the studies. The SIMDEUM model appears to have more practical application, because it does not require conducting research on water flows and is limited to the identification of statistical parameters of end consumption. This offers the possibility of generating models of water consumption also for areas that do not yet exist and are planned for expansion. Moreover, the model can be applied not only to water consumption for household purposes, as in the case of the PRP model, but also to recipients such as hotels, office buildings, or social care facilities. Both models generate consumption patterns with high resolution for the individual recipient, whereas hydraulic models of water supply networks are calculated with a 1 h time step. Therefore, Alvisi et al. [12] proposed a special spatial-temporal procedure for aggregating synthetic water consumption originally generated at the level of a single recipient and a small time step (e.g., 1 min, as in SIMDEUM and PRP) into synthetic series of consumption related to a group of recipients with a temporal resolution of 1 h.
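The change of resolution mentioned above can be illustrated with a minimal sketch. It only sums hypothetical 1 min series over recipients and over each hour; it is not the procedure of Alvisi et al. [12], which is designed to preserve the statistical properties of the series. The function name and the sample data are assumptions made for illustration.

```python
def aggregate_to_hourly(minute_series_per_recipient):
    """Sum 1-min demand series over recipients, then over each block of 60 minutes."""
    n_minutes = len(minute_series_per_recipient[0])
    if any(len(s) != n_minutes for s in minute_series_per_recipient):
        raise ValueError("all series must have the same length")
    if n_minutes % 60 != 0:
        raise ValueError("series length must be a whole number of hours")
    # Spatial aggregation: total demand of the group in every minute.
    group = [sum(values) for values in zip(*minute_series_per_recipient)]
    # Temporal aggregation: total volume consumed in each hour.
    return [sum(group[h * 60:(h + 1) * 60]) for h in range(n_minutes // 60)]


# Two hypothetical recipients, two hours of 1-min volumes (dm3 per minute).
recipient_a = [1.0] * 60 + [0.0] * 60
recipient_b = [0.5] * 120
hourly = aggregate_to_hourly([recipient_a, recipient_b])
print(hourly)  # [90.0, 30.0]
```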
Apart from the models discussed above, Aksela and Aksela [2] proposed another solution employing a probabilistic model for generating patterns of water demand in single-family and semi-detached housing, with a temporal resolution of 1 h. The first two steps of the computation procedure are: forecasting weekly water consumption based on a linear regression model, and dividing the analysed water recipients into four separate classes based on the recorded mean weekly consumption. The division was performed by means of cluster analysis, specifically k-means clustering. Then, the probability distribution of water consumption in time was modelled with the application of Gaussian mixture models, and finally, probabilistic water consumption models were developed by sampling from the probability distributions determined earlier.
The aforementioned examples of publications focusing on the issue of modelling of water consumption patterns reveal the current availability of specialised computer models allowing for the determination of the course of water consumption by individual recipients, primarily in housing. This issue, however, still enjoys great interest in the scientific circles, which provided the basis for this paper. For this purpose, a water consumption measurement campaign was conducted in three apartment buildings with a temporal resolution of 1 h. The temporal resolution resulted from the possibility of data registration by the existing AMR system. The determination of patterns of hourly demand for water in these buildings employed the cluster analysis, namely k-means clustering.

Formulation of the Model
Cluster analysis is a discipline in multi-dimensional statistics that includes a group of methods for the identification of uniform subsets of elements. Based on variables characterising the elements, cluster analysis finds groups (clusters) of elements that are similar to the other elements of the same cluster and simultaneously differ from those in the remaining clusters [13]. Importantly, depending on the adopted method, the set can be divided into a number of groups that is or is not defined a priori. The obtained groups are subsets of the analysed population that satisfy the conditions of disjointness and completeness [13]. The division of the set of water consumption diagrams initially involved the application of the hierarchical agglomeration method and then the non-hierarchical method, k-means clustering.
The hierarchical agglomeration method initially assumes that each element is a separate cluster. Then, it gradually combines the mutually nearest (most similar) elements into new clusters, until a single cluster is obtained. The determination of sufficient similarity of two clusters requires defining measures of distance between the elements and the developed clusters, and the rules for combining them. A review of available distance measures and their characteristics can be found in the publication by Stanisz [13] and the cited source literature [14]. The analyses employed the most popular distance metrics, also considered the most natural: the Euclidean distance and the squared Euclidean distance. The Euclidean distance d(x,y) of elements x and y is represented by the following Formula (1) [13,15]:
$$ d(x, y) = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2} \qquad (1) $$
where $x = (x_1, \ldots, x_p)$ and $y = (y_1, \ldots, y_p)$. For p = 2 and p = 3, Formula (1) is equivalent to the distance between two points x and y on a plane and in space, respectively.
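As a small worked example of the two distance measures named above, the following sketch computes the Euclidean and squared Euclidean distance between two vectors. The function names and the sample vectors are illustrative assumptions, not part of the study.

```python
import math


def squared_euclidean(x, y):
    """Sum of squared coordinate differences of two equally long vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of coordinates")
    return sum((a - b) ** 2 for a, b in zip(x, y))


def euclidean(x, y):
    """Euclidean distance, i.e., the square root of the squared distance."""
    return math.sqrt(squared_euclidean(x, y))


# Two hypothetical points in three-dimensional space (p = 3).
x = [3.0, 4.0, 0.0]
y = [0.0, 0.0, 0.0]
print(squared_euclidean(x, y))  # 25.0
print(euclidean(x, y))          # 5.0
```

For daily histograms of hourly consumption, x and y would each hold 24 values, one per hour.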
A broad range of linkage methods is also available, primarily including: single linkage, complete linkage, unweighted pair-group average, weighted pair-group average, unweighted pair-group centroid, weighted pair-group centroid, and Ward's method. The analysis of water consumption employed the popular unweighted pair-group average method, in which the differences (distances) between all pairs of elements belonging to two different clusters are calculated. The average of these differences is adopted as the measure of distance between the two clusters. This computation method allows for determining which elements are similar to each other and can be included in the same cluster, and to what degree particular clusters are similar to each other and can be combined into larger clusters. As a result, characteristic "chains" of similar elements, combined into sequences that form clusters, can be observed. This permits easy identification of mutually strongly similar elements. These chains are clearly visible in the dendrograms resulting from agglomeration, which graphically illustrate the structure of a set of elements ordered by decreasing similarity between its elements (and therefore increasing linkage distance).
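The agglomeration with the unweighted pair-group average rule can be sketched in a few lines. This is a minimal, unoptimised illustration under the assumption of Euclidean distance; it is not the software used in the study, and the function names and the one-dimensional sample data are hypothetical.

```python
import math
from itertools import combinations


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_linkage_distance(cluster_a, cluster_b, points):
    """Mean distance over all pairs with one element in each cluster."""
    total = sum(euclidean(points[i], points[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(points):
    """Merge the two nearest clusters until one remains; return the merge history."""
    clusters = [(i,) for i in range(len(points))]  # every element starts alone
    history = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage_distance(pair[0], pair[1], points),
        )
        distance = average_linkage_distance(a, b, points)
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        history.append((a, b, distance))
    return history


# Four hypothetical one-dimensional elements forming two obvious groups.
points = [[0.0], [1.0], [10.0], [12.0]]
history = agglomerate(points)
for a, b, distance in history:
    print(a, b, distance)
```

The merge distances (1.0, 2.0, 10.5 for the sample data) are the linkage distances that a dendrogram would show on its vertical axis.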
The methods presented above lead to a dendrogram in which lower-grade clusters are included in higher-grade clusters. K-means clustering is considerably different. It is a non-hierarchical method resulting in a partition in which no cluster is a sub-cluster of another cluster. In k-means clustering, k clusters are designated that differ from each other to the highest possible degree. It is necessary to assume a priori a specific number k of clusters into which the set of elements is partitioned. Therefore, k subsets are developed, and then objects are moved between them so that the distances between objects within a subset are as small as possible and the distances between clusters are as large as possible. The transfer procedure is repeated until the clusters are distinguished as effectively as possible. It can be traced in detail based on the calculation example presented in the publication by Larose [15]. In the study, the use of k-means clustering was preceded by analysis with the application of the agglomeration method. As a result, a potential range of the number of clusters was determined, to be considered when partitioning the patterns of hourly water demand in the object. Then, from that range, the optimal number of clusters was determined by analysing the total within sum of squares (wss) and the Caliński and Harabasz Index (CHIndex) values for different numbers of clusters. The total within sum of squares wss is calculated based on Formula (2) below:
$$ wss = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - m_i \rVert^2 \qquad (2) $$
where k is the number of clusters, x is an element of the cluster, $C_i$ is the i-th data cluster, $m_i$ is the centroid of cluster i, and $\lVert x - m_i \rVert^2$ is the squared Euclidean distance between the two vectors.
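The iterative reassignment described above, together with the total within sum of squares, can be sketched as follows. This is a plain Lloyd-type k-means written for illustration, with random initial centroids drawn from the data; the study does not state which k-means variant or software was used, so the function names, the seed parameter, and the sample data are assumptions.

```python
import random


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def centroid(rows):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def k_means(rows, k, seed=0, max_iter=100):
    """Return (labels, centroids) after alternating assignment and update steps."""
    rng = random.Random(seed)
    centroids = [list(row) for row in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: every element joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # no element changed cluster, so the partition is stable
        labels = new_labels
        # Update step: recompute each centroid from its current members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = centroid(members)
    return labels, centroids


def total_wss(rows, labels, centroids):
    """Sum of squared distances of all elements to the centroid of their cluster."""
    return sum(squared_distance(row, centroids[lab])
               for row, lab in zip(rows, labels))


# Four hypothetical elements forming two well-separated groups.
rows = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centroids = k_means(rows, k=2)
print(labels, total_wss(rows, labels, centroids))
```

For the sample data each element lies 0.5 from its centroid, so wss equals 4 × 0.25 = 1.0.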
The CHIndex is calculated based on the following Formula (3) [16]:
$$ CHIndex = \frac{SS_B / (k - 1)}{SS_W / (N - k)} \qquad (3) $$
where k is the number of clusters, N is the total number of observations (elements of the set), $SS_B$ is the total variance between clusters (trace of the interclass covariance matrix), and $SS_W$ is the total within-cluster variation (trace of the intraclass covariance matrix). It is assumed that the optimal number of clusters is the one for which the CHIndex reaches its maximum value. A high value of the index corresponds to a high ratio of $SS_B$ to $SS_W$, which means that particular clusters show considerable differences, and elements of the set grouped in particular clusters show strong similarity (are relatively weakly differentiated). The wss value is also important. It naturally decreases with an increase in the number of clusters, and after reaching the optimal number of clusters, the rate of the decrease slows substantially [16].
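The index can be computed directly from a labelled data set. The sketch below uses the usual definition of the Caliński and Harabasz index, with the between-cluster sum of squares weighted by cluster size; the function names and the sample data are assumptions made for illustration.

```python
def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def centroid(rows):
    return [sum(column) / len(rows) for column in zip(*rows)]


def calinski_harabasz(rows, labels):
    """(SS_B / (k - 1)) / (SS_W / (N - k)) for a given partition."""
    n = len(rows)
    cluster_ids = sorted(set(labels))
    k = len(cluster_ids)
    overall = centroid(rows)
    ss_b = 0.0  # between-cluster sum of squares
    ss_w = 0.0  # within-cluster sum of squares (the wss)
    for c in cluster_ids:
        members = [row for row, lab in zip(rows, labels) if lab == c]
        m = centroid(members)
        ss_b += len(members) * squared_distance(m, overall)
        ss_w += sum(squared_distance(row, m) for row in members)
    # Note: the index is undefined for k = 1 and for ss_w = 0.
    return (ss_b / (k - 1)) / (ss_w / (n - k))


# Hypothetical one-dimensional data with two natural groups.
rows = [[0.0], [2.0], [10.0], [12.0]]
good = calinski_harabasz(rows, [0, 0, 1, 1])  # groups match the data
bad = calinski_harabasz(rows, [0, 1, 0, 1])   # groups cut across the data
print(good, bad)
```

The partition that matches the structure of the data gives a much higher index (50.0) than the one that cuts across it (about 0.08), which is the behaviour used to choose k.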
After adopting the number of clusters and running the k-means clustering algorithm, it is possible not only to assign specific patterns to the adopted clusters, but also to prepare diagrams of averaged histograms of accumulated water demand for such clusters. The clustering algorithm was run three times by means of the bootstrap method, used for estimating the distribution of estimation errors by means of repeated random sampling with replacement. This means that the clustering algorithm was performed each time for random samples from the entire set of water demand diagrams. Results obtained in subsequent iterations are comparable, permitting the determination of the bootmean parameter values. The bootmean parameter is calculated as the average value of the Jaccard index (Jaccard similarity coefficient) for each cluster. The Jaccard coefficient itself measures similarity between two sets and is determined as the quotient of the cardinality of the intersection of the sets and the cardinality of their union. It is assumed that the value of the bootmean parameter should be higher than 0.6, because above this value the designated clusters are presumed not to include a random cluster, i.e., a cluster that includes patterns deviating from the remaining clusters but at the same time mutually dissimilar [16].
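The Jaccard coefficient and a cluster-wise bootstrap mean can be sketched as follows. The exact resampling scheme used in the study is not described in detail, so this sketch assumes a common formulation: for every bootstrap run, each original cluster (restricted to the resampled elements) is compared with the most similar cluster found in that run, and the similarities are averaged over the runs. The function names and the sample clusterings are hypothetical.

```python
def jaccard(a, b):
    """Cardinality of the intersection divided by the cardinality of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def bootmean(original_clusters, bootstrap_runs):
    """
    original_clusters: list of sets of element ids from the full data set.
    bootstrap_runs: list of (sampled_ids, clusters) pairs, where sampled_ids is
    the set of ids drawn in that run and clusters is the list of sets found in it.
    Returns one mean Jaccard value per original cluster.
    """
    means = []
    for cluster in original_clusters:
        scores = []
        for sampled_ids, boot_clusters in bootstrap_runs:
            restricted = cluster & sampled_ids  # only ids present in this run
            scores.append(max(jaccard(restricted, bc) for bc in boot_clusters))
        means.append(sum(scores) / len(scores))
    return means


# Hypothetical clustering of six days and two hypothetical bootstrap runs.
original = [{0, 1, 2}, {3, 4, 5}]
runs = [
    ({0, 1, 3, 4, 5}, [{0, 1}, {3, 4, 5}]),
    ({0, 1, 2, 3, 4}, [{0, 1}, {2, 3, 4}]),
]
values = bootmean(original, runs)
print(values)
print(all(v > 0.6 for v in values))  # True
```

In this example both clusters obtain a mean of 5/6, above the 0.6 threshold, so neither would be regarded as a random cluster.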

Characteristics of the Studied Objects
The research on water demand covered 3 mutually similar apartment buildings constructed in the 1980s within the same housing estate in Bydgoszcz, which is located in northern Poland. With a population of approximately 348,000, Bydgoszcz is the eighth largest city in the country [17].
The buildings chosen for research are composed of 5 floors and are divided into 4 or 5 staircases, depending on the number of apartments. One of them features a total of 60 apartments, and the remaining ones feature 50 apartments each. All apartments are equipped with the basic sanitary facilities, i.e., washbasin taps, kitchen sink taps, toilet flushes, showers or bath tubs, and washing machines. Moreover, each building has a discharge valve for watering greenery. A list of the facilities with the indication of the number of apartments and water meter diameter is presented in Table 1.
The buildings are supplied with water primarily through water supply service connections with a diameter of DN50, fitted with volumetric water meters with diameters of DN32 and DN40, characterised by high sensitivity to very low flows. Moreover, due to the principle of their operation, their installation does not require straight pipe sections upstream and downstream of the water meter. They can also be mounted in any position, because this does not affect the metrological properties of the device. Their only disadvantage is high sensitivity to solid particles occurring in water [18]. Table 2 below presents the basic parameters of each water meter. Data from the water meters were read by means of transceivers or a recording attachment, permitting the recording of hourly water consumption.

Study Period
The study of hourly water demand in the discussed objects by means of the aforementioned devices commenced on 16 May 2019 and lasted, with intervals, until 6 October 2020, covering a total of 464 days of measurement. Breaks in measurements resulted, among others, from the necessity to replace water meters (on 14 January, a water meter with a reading panel in the form of a recorder was replaced, causing temporary problems with data reading) and from the outbreak of the COVID-19 pandemic, more specifically from the 'total lockdown' from March 2020 and all the resulting restrictions that made readings temporarily impossible.
Therefore, the analysis of hourly water demand in each of the three buildings covered 464 days, of which:
1. 331 days were business days,
2. 133 days were days free from work (weekends).
Moreover, the data set for 464 days was partitioned into two separate sets to allow for the analysis of differences in water use by recipients caused by the announcement of the state of epidemic emergency in the country. Therefore, further in the paper, the following division is applied:
1. for the period before the COVID-19 pandemic:
a. 197 days were business days,
b. 79 days were days free from work (weekends);
2. for the period during the COVID-19 pandemic:
a. 134 days were business days,
b. 54 days were days free from work (weekends).
It was also necessary for the data sets to take account of holidays. For the period before the COVID-19 pandemic, their number was 8, and they all occurred on business days; for the period during the pandemic, their number was 5, and 3 of them occurred on business days. After consideration of this additional division, the data sets adopted for analysis were as follows:
1. for the period before the COVID-19 pandemic:
a. 189 days were business days,
b. 87 days were days free from work (weekends and holidays),
c. i.e., 68.5% and 31.5% of all days, respectively;
2. for the period during the COVID-19 pandemic:
a. 131 days were business days,
b. 57 days were days free from work (weekends and holidays),
c. i.e., 69.7% and 30.3% of all days, respectively.

Calculations
Data on hourly water consumption in each object, initially obtained from the AMR system, were presented in a table with one column specifying the date and time of reading, and the other specifying the total hourly flow through the water meter in dm³/h. The data were partitioned into two periods: before the outbreak of the COVID-19 pandemic and during the pandemic. Then, the water flow data were processed to obtain data matrices in which rows represented subsequent days of the study, and columns represented the percent contribution of hourly water consumption to total diurnal water demand. It should be emphasised that each row of the developed matrices formed an individual histogram of water consumption for a given day. A total of 6 data matrices were prepared. They were used in subsequent stages to perform classification by means of the cluster analysis.
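The conversion of a two-column table of readings into a matrix of daily histograms can be sketched as follows. The function name and the sample readings are hypothetical, and skipping days with fewer than 24 readings or with zero total consumption is an assumption of this sketch, not a rule stated in the study.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_percent_histograms(readings):
    """
    readings: iterable of (timestamp, hourly volume in dm3/h).
    Returns {date: [24 values]}, each value being the percent share of the
    given hour in the total consumption of that day.
    """
    by_day = defaultdict(dict)
    for stamp, flow in readings:
        by_day[stamp.date()][stamp.hour] = flow
    matrix = {}
    for day, hours in sorted(by_day.items()):
        total = sum(hours.values())
        if len(hours) != 24 or total <= 0:
            continue  # assumption: drop incomplete days and days without consumption
        matrix[day] = [100.0 * hours[h] / total for h in range(24)]
    return matrix


# One hypothetical day: 2 dm3/h in every hour except a peak of 10 dm3/h at 07:00.
start = datetime(2019, 5, 16)
readings = [(start + timedelta(hours=h), 10.0 if h == 7 else 2.0)
            for h in range(24)]
matrix = daily_percent_histograms(readings)
row = matrix[start.date()]
print(len(row), round(sum(row), 6))
```

Each row sums to 100%, so days with different total consumption become directly comparable in the cluster analysis.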
The first stage of the cluster analysis employed the hierarchical agglomeration method, in which the distance between elements and the resulting clusters was measured with the Euclidean distance metric, and the linkage rule was the unweighted pair-group average. The agglomeration of the set resulted in a graphical illustration of the structure of the set of elements ordered by decreasing similarity between its elements (i.e., increasing linkage distance). Dendrograms for each of the objects with division into measurement periods are presented in Figures 1-6. They suggest the occurrence of clusters, i.e., groups of demand patterns with similar histograms. Strong outlier elements (histograms) were also identified, for example, day number 124 for apartment building 1 (Figure 1).
It should be emphasised that the cluster analysis is very sensitive to strong outliers. Therefore, it is recommended to remove them from the set of elements. Before further analyses based on the dendrograms (Figures 1-6), the strongest outlier histograms were eliminated from the set. Then, agglomeration of data was performed again, resulting in new structures of the set of elements. Days with anomalies in water consumption were excluded for buildings 2 and 3 in the period before the pandemic and for buildings 1 and 2 in the period during the pandemic. In the case of apartment building 1 for the period before the COVID-19 pandemic, as well as apartment building 3 for the period during the COVID-19 pandemic, the current dendrogram was kept, because it contained only single outlier days that did not considerably affect the further computations. Therefore, Figures 7 and 8 present dendrograms for buildings 2 and 3 from the period before the pandemic, the structures of which changed considerably. The set is evidently divided into two clusters, with single histograms deviating from these two clusters.
Figures 9 and 10 for buildings 1 and 2 from the period during the COVID-19 pandemic also show changed structures. In their case, the partitioning into clusters is ambiguous.
The analyses of data agglomeration resulting in dendrograms permitted the determination of the potential range of the number of clusters that should be considered when partitioning patterns of hourly water consumption in buildings. The optimal number of clusters required for k-means clustering was determined based on the analysis of the total wss and the CHIndex for different numbers of clusters. Both of these analyses were again performed and presented in graphic form for each object separately, with division into the period before the pandemic (Figures 11-13) and the period during the COVID-19 pandemic (Figures 14-16).
For all objects in the study period before the COVID-19 pandemic (Figures 11-13), the optimal number of clusters k should be adopted as 3. This results from obtaining the maximum CHIndex (CHIndex = 17.38; 41.07; 44.59, respectively) and from the course of the wss diagram, whose rate of decrease evidently slows from the same point. In the case of apartment building 1 (Figure 11), a second evident decrease in the wss value is observed for a higher number of clusters, equal to 13. For 13 clusters, however, the CHIndex value is not the highest in the analysed range from 2 to 20 clusters.
Sustainability 2021, 13, x FOR PEER REVIEW 12 of 23
Figure 11. Value of the CHIndex and total within sum of squares (wss) for a set of 276 days of normalised histograms of hourly flows for apartment building 1 from the period before the COVID-19 pandemic, depending on the adopted number of clusters k.
Figure 13. Value of the CHIndex and total within sum of squares (wss) for a set of 263 days of normalised histograms of hourly flows for apartment building 3 from the period before the COVID-19 pandemic, depending on the adopted number of clusters k.
Figure 15. Values of the CHIndex and total within sum of squares (wss) for a set of 183 days of normalised histograms of hourly flows for apartment building 2 from the period during the COVID-19 pandemic, depending on the adopted number of clusters k.
Figure 16. Values of the CHIndex and total within sum of squares (wss) for a set of 188 days of normalised histograms of hourly flows for apartment building 3 from the period during the COVID-19 pandemic, depending on the adopted number of clusters k.
For objects in the period during the COVID-19 pandemic (Figures 14-16), the optimal number of clusters k differs depending on the analysed building. For buildings 1 and 2, the optimal number of clusters k should be 7, and for apartment building 3, it should be 5. This again results from obtaining the maximum value of the CHIndex and from the course of the wss diagram, whose rate of decrease evidently slows from the same point.
It should be emphasised that in this study period the CHIndex did not reach a very high value. This means that particular clusters do not differ considerably from each other, and the elements of the set grouped in particular clusters are mutually similar (relatively weakly variable).
After the optimal number of clusters was determined for each set of normalised histograms of accumulated water consumption in the 3 objects, with consideration of the study period (Table 3), k-means clustering was performed. The results and their analysis are presented in the following section.
Table 3. The adopted optimal number of clusters for particular sets.
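The two selection criteria used above can be illustrated with a short sketch. It assumes the standard Calinski-Harabasz definition of the CHIndex (between-cluster dispersion over within-cluster dispersion, each divided by its degrees of freedom) and takes cluster labels from any k-means run. The function name `wss_and_ch` and the toy data are illustrative only and do not come from the study.

```python
def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def wss_and_ch(rows, labels):
    """Return (total within sum of squares, Calinski-Harabasz index).

    rows   : one vector per day (e.g. 24 normalised hourly values)
    labels : cluster label of each row, e.g. from a k-means run
    Requires 2 <= k < n and wss > 0.
    """
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    n, k = len(rows), len(groups)
    overall = centroid(rows)
    wss = bss = 0.0
    for members in groups.values():
        c = centroid(members)
        wss += sum(sq_dist(r, c) for r in members)   # spread inside the cluster
        bss += len(members) * sq_dist(c, overall)    # spread between clusters
    return wss, (bss / (k - 1)) / (wss / (n - k))


# Toy data (hypothetical): four 2-D points in two obvious groups.
rows = [[0, 0], [0, 2], [10, 0], [10, 2]]
wss, ch = wss_and_ch(rows, [0, 0, 1, 1])
print(wss, ch)  # 4.0 50.0
```

In practice the function would be evaluated for every k in the analysed range (2 to 20 in this study), and k would be chosen where the CHIndex peaks and the wss curve flattens.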

Results and Discussion
In the k-means clustering analysis, the clustering algorithm was run multiple times with the application of the bootstrap method. This permitted the determination of the bootmean parameter, used to determine the probability that elements belong to a given cluster. The value of the parameter should be higher than 0.6, because from this value upwards it is presumed that a designated cluster is not a random cluster, i.e., a cluster that includes outlier patterns that are mutually dissimilar. Table 4 presents values of the bootmean parameter for particular clusters determined during k-means clustering. Cluster numbers were assigned automatically by the computational algorithm. The bootmean parameter value did not exceed 0.6 for every cluster. Values above or close to that threshold were obtained primarily for the study period before the outbreak of the COVID-19 pandemic. This means that fewer incidental and outlier events occurred in that period. During the COVID-19 pandemic, the use of water by recipients showed a degree of chaos that made it impossible to classify the data credibly, i.e., a large portion of the histograms of diurnal water consumption was mutually dissimilar.
Then, the clusters obtained in the cluster analysis, i.e., averaged histograms of accumulated water consumption in buildings (consumption patterns), were analysed in terms of the type of days classified into them. The results of this procedure for all objects during and before the pandemic are shown in Table 5. The data show an evident division of histograms into patterns for business days and days free from work. For example, for building 2, 99 business days and only two days free from work were classified into pattern No. 1; therefore, the pattern can be recognised as typical of business days. In the case of apartment building 1, pattern No. 3 covers 74 business days and 35 days free from work. Recognising it as typical of days free from work is only possible after analysing the course of the diagram.
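The text does not spell out how the bootmean parameter is computed. A minimal sketch, assuming it is the cluster-wise mean Jaccard similarity between each original cluster and its best-matching cluster in bootstrap resamples (a common definition of bootstrap cluster stability), could look as follows. The helper names `kmeans` and `bootstrap_stability`, the number of resamples, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, rng, n_iter=100):
    """Plain Lloyd's algorithm; returns a cluster label for every row."""
    # Start from k distinct points so that no two centres coincide.
    centres = [list(p) for p in rng.sample(sorted(set(map(tuple, rows))), k)]
    labels = None
    for _ in range(n_iter):
        new = [min(range(k), key=lambda j: sq_dist(r, centres[j])) for r in rows]
        if new == labels:
            break
        labels = new
        for j in range(k):
            members = [r for r, l in zip(rows, labels) if l == j]
            if members:  # an emptied cluster keeps its previous centre
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def bootstrap_stability(rows, k, n_boot=50, seed=0):
    """Mean Jaccard similarity of each original cluster over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(rows)
    base = kmeans(rows, k, rng)
    original = [{i for i in range(n) if base[i] == j} for j in range(k)]
    totals, counts = [0.0] * k, [0] * k
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample days with replacement
        boot = kmeans([rows[i] for i in idx], k, rng)
        found = [{i for i, l in zip(idx, boot) if l == j} for j in range(k)]
        drawn = set(idx)
        for j in range(k):
            target = original[j] & drawn  # original cluster, restricted to drawn days
            if not target:
                continue
            totals[j] += max(len(target & f) / len(target | f) for f in found if f)
            counts[j] += 1
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]


# Hypothetical toy data: two well-separated groups of 20 "days" each.
rows = [[0.1 * i, 0.0] for i in range(20)] + [[100 + 0.1 * i, 0.0] for i in range(20)]
stability = bootstrap_stability(rows, k=2)
```

Under this assumption, a cluster that is recovered in nearly every resample scores close to 1, whereas a cluster made up of mutually dissimilar outlier days is rarely recovered and falls below the 0.6 threshold used in the text.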
Pattern No. 2 for building 1 also draws attention. It covered only 10 days among all days included in the analysis. This means that only 3.6% of the analysed days strongly deviated from water consumption diagrams typical of the object. This is also confirmed by the bootmean parameter value, which, for the remaining patterns, is higher than 0.7, i.e., histograms for these clusters are strongly similar.
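The shares quoted for individual patterns can be reproduced with a few lines of arithmetic. The sketch below uses the day counts reported in the text for building 1 (276 days in the pre-pandemic set). The function name `describe_pattern` and the cut-off `min_share` are hypothetical, since no numerical rule for calling a pattern typical of one day type is stated.

```python
def describe_pattern(business_days, free_days, total_days, min_share=0.9):
    """Summarise one cluster by the type of days it contains.

    min_share is a hypothetical cut-off for calling a pattern typical of one
    day type; the study itself relies on inspecting the diagrams.
    """
    size = business_days + free_days
    business_share = business_days / size
    if business_share >= min_share:
        label = "business days"
    elif 1 - business_share >= min_share:
        label = "days free from work"
    else:
        label = "undecided (inspect the diagram)"
    return {
        "share_of_all_days": round(100 * size / total_days, 1),  # in percent
        "business_share": round(100 * business_share, 1),        # in percent
        "label": label,
    }


# Building 1, pattern No. 3: 74 business days and 35 days free from work.
print(describe_pattern(74, 35, total_days=276))
# Building 1, pattern No. 2: 10 of 276 days, 70% of them business days.
print(describe_pattern(7, 3, total_days=276))
```

With these counts, pattern No. 2 accounts for 3.6% of the analysed days, and neither pattern can be assigned to a day type from the counts alone, which is why the course of the diagram has to be examined.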
Diagrams of accumulated water consumption obtained in the cluster analysis for the period before the pandemic, partitioned into patterns for business days and days free from work, are shown in Figures 17 and 18. It was observed that although patterns obtained for three different objects are compared, their course is similar, and they differ only in the volume of water consumption in a given hour. Due to the characteristics of their course, histograms generated for business days (Figure 17) are distributions of diurnal water demand typical of that type of day. They reach two maxima: the first in the morning hours (7:00-9:00), when people prepare for work or school, and the second, considerably greater, in the evening hours (19:00-22:00), when residents spend time preparing meals and bathing.
In the case of days free from work (Figure 18), averaged histograms of water consumption have a somewhat different course in comparison to business days. The first difference is a shift of maximum water consumption in the morning hours, starting only at 9:00 and lasting until 12:00. This suggests that recipients rest longer on these days. Moreover, maximum water consumption before noon is considerably greater than in the evening. This suggests that water is consumed not only for daily hygiene activities but also for the purposes of, e.g., cleaning or preparing meals [19].
Figure 17. Averaged histograms of accumulated water consumption in buildings for business days in the study period before the COVID-19 pandemic.
Figure 18. Averaged histograms of accumulated water consumption in buildings for days free from work in the study period before the COVID-19 pandemic.
As was mentioned above, pattern No. 2 for building 1 showed strongly untypical diagrams of diurnal water consumption by recipients. This is also confirmed by the course of the averaged histogram for 10 days classified for that cluster (Figure 19). Due to its characteristics, i.e., maximum water demand around noon (between 9:00 and 14:00), the pattern could be classified as days free from work. According to Table 5, however, the cluster primarily covers business days (70%). Moreover, due to the low number of days in the cluster and low value of the bootmean parameter (0.2810), it cannot be considered representative for analyses, e.g., in mathematical modelling of water supply networks. These histograms should be interpreted individually.
Figure 19. Averaged histogram of accumulated water consumption-pattern 2 for building 1 in the study period before the COVID-19 pandemic.
In the case of research conducted after the announcement of the state of epidemic emergency, 7 synthetic patterns of water demand were generated for buildings 1 and 2, and 5 patterns for building 3. The entire data set was divided between particular clusters in a practically even way. No cluster includes a large majority of classified days (Table 6). Moreover, bootmean parameters for the designated clusters reached a value close to or higher than 0.6 only in isolated cases: for pattern No. 2 in building 1 and for patterns No. 1 and 3 in building 3 (Table 4). This shows a high level of uniqueness in the way recipients consume water. When the classification results are verified in terms of the type of days assigned to particular clusters, however, the number of particular days in those sets does not, in the majority of cases, allow them to be assigned to a specific type of day. It was also observed that some clusters cover a number of days that is low in comparison to the entire set. These include, among others, pattern No. 7 for building 1, including 8 days (which constitutes 4.3%), and pattern No. 5 for building 3, including only 3 days (1.6%).
Presentation of the obtained synthetic diagrams of water demand during the COVID-19 pandemic in the buildings was attempted in Figures 20 and 21, with consideration of the partitioning into patterns for business days and days free from work. The obtained patterns showed different courses. Moreover, they differ from those before the pandemic in, among others, higher water consumption during the day and at night, between 2:00 and 4:00. Due to reaching two maxima (in the morning and evening), histograms generated for business days (Figure 20) constitute typical distributions of diurnal water demand. Only water consumption during the day is atypical. It is probably the effect of the introduction of remote work in the majority of companies and remote learning at schools.
For days free from work (Figure 21), averaged histograms of water consumption are somewhat closer to those generated for the period before the pandemic, i.e., with the morning maximum in water consumption shifted to 9:00, and water consumption in the evening considerably lower than in the morning. Each synthetic pattern, however, has its own individual course, and they show no similarities. The synthetic diagrams of water demand presented in Figures 20 and 21 therefore confirm a certain type of disturbance in water use by residents in the 3 study objects.

Conclusions
This paper's objective was to develop a methodology supporting clustering and generation of synthetic distributions of diurnal water demand in apartment buildings for the purposes of mathematical modelling, which is increasingly applied in the management and dimensioning of water supply networks [5]. It particularly involves hydraulic computations with the application of EPS, i.e., for longer temporal horizons, allowing for the understanding of the hydraulics of the distribution system, tracing changes in flows in time, or designation of zones of water mixing. This requires the investigation of the temporal and spatial dynamics of water consumption in particular nodes. A currently accepted simplification is the application of the top-down approach in the mathematical model, in which consumption in line with that recorded in measurement points or pumping stations is imposed. A number of publications and scientific studies show that only the bottom-up approach is appropriate. It involves determining nodal consumption from the level of an individual recipient. This approach, however, requires knowledge of patterns of water demand in different objects [20,21].
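The bottom-up idea can be sketched in a few lines: hourly demand at a node is the sum, over the recipients connected to it, of each recipient's daily volume distributed according to its demand pattern. The sketch assumes that a pattern is expressed as 24 hourly fractions of the daily volume summing to 1; the function name `nodal_demand`, the two patterns, and the volumes are hypothetical and serve only as an illustration.

```python
def nodal_demand(recipients):
    """Sum hourly demand at a node from its individual recipients.

    recipients: list of (daily_volume, pattern) pairs, where pattern holds 24
    hourly fractions of the daily volume (assumed to sum to 1).
    """
    hourly = [0.0] * 24
    for daily_volume, pattern in recipients:
        if len(pattern) != 24 or abs(sum(pattern) - 1.0) > 1e-6:
            raise ValueError("pattern must hold 24 fractions summing to 1")
        for hour, fraction in enumerate(pattern):
            hourly[hour] += daily_volume * fraction
    return hourly


# Hypothetical patterns: a flat one and one with morning and evening peaks.
flat = [1 / 24] * 24
peaked = [0.02] * 24
for hour in (7, 8, 19, 20, 21):
    peaked[hour] = 0.124

# Daily volumes in m3/day are illustrative values, not measured data.
demand = nodal_demand([(12.0, flat), (30.0, peaked)])
```

In an EPS model, the patterns passed to such a routine would be the cluster-averaged histograms selected according to the type of day being simulated.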
Research on diurnal time series of water consumption was conducted in three mutually similar apartment buildings constructed in the 1980s in the same housing estate in Bydgoszcz. The recording of hourly water consumption covered a total of 464 days, of which 276 days occurred before the announcement of the state of epidemic emergency in the country, and the other 188 days were recorded during the COVID-19 pandemic [22]. The results were analysed in terms of the possibility of applying cluster analysis for clustering and generation of synthetic distributions of daily water consumption. It should be emphasised, however, that before performing computations with the k-means method, it was necessary to eliminate outlier days from the data set. This procedure was performed based on dendrograms developed by means of the hierarchical agglomeration method.
The application of cluster analysis (k-means clustering) permitted the development of characteristic patterns of hourly water demand with division into business days and days free from work and holidays. The division was particularly unambiguous for the period before the pandemic. In the process, three synthetic patterns of hourly water consumption were generated. Two of them were patterns typical of business days, and the third one was a pattern for days free from work with a low number of business days, predominantly close to weekends (it probably results from the so-called "long weekends" in the holiday period). The exception was pattern 2 for building 1. It covered only 10 days that deviated completely from the remaining ones and were simultaneously dissimilar to one another; these should be interpreted individually. The resulting averaged histograms of water consumption for particular buildings could be used for the determination of nodal water consumption in mathematical modelling of water supply networks.
A somewhat different situation was observed in the case of data recorded during the COVID-19 pandemic. Considerably more patterns of water demand were generated, and no cluster included a large majority of recorded data. Moreover, the majority of the designated clusters were too similar to each other or included histograms with a random character, as evidenced by the bootmean parameter value. A number of atypical behaviours in terms of water consumption were also observed, for example water consumption at night, between 2:00 and 4:00, or increased and varied water consumption during the day. The change in water consumption by recipients probably results from the introduction of remote work in the majority of companies and remote learning at schools. This suggests that credible classification for this data set is impossible due to the high uniqueness of water consumption by residents. Using the data in a mathematical model of water supply networks would require partitioning the set into smaller subsets and performing the clustering again. It may turn out that the generated clusters include a single day or a low number of days. There is no doubt, however, that the COVID-19 pandemic has greatly influenced daily water demand patterns worldwide [3]. Many studies [19,23-30] show that during the COVID-19 pandemic, water consumption patterns have followed other notable trends of the new normal 'stay at home' life. Families are getting their day started later, with peak morning consumption shifting two hours later. In addition, comparing pre-COVID-19 to current residential usage patterns, it is clear that the highest increase in water usage is happening in the afternoon, as stay-at-home schoolers and workers are taking a break, getting up to use the restroom, washing their hands, and prepping meals.
In general, people on average are using the bathroom at home three times more each day, and they are flushing the toilet five times more per day than before the pandemic [24,27,29,30]. They are also showering almost three more times per week than they did. In addition to the increase in the number of showers, the time of day when people are showering has also changed, further reflecting a later start to their day: midday and evening showers increased while morning showers shifted later in the day [29,30].
To sum up, this paper presents the method of clustering and generating synthetic diagrams of diurnal water consumption for the purposes of mathematical modelling. It also points to the variability among the generated patterns and the way the COVID-19 pandemic affected water consumption. Therefore, hydraulic analyses with the application of mathematical models require continuous records of water consumption by recipients for the purpose of updating the synthetic patterns of water consumption.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the privacy restrictions.