Multivariate Analysis of Water Quality Data for Drinking Water Supply Systems

Vulnerability of drinking water supply systems (DWSSs) depends on different factors such as failures, loss of security, man-made threats, and the change and deterioration of supply-water quality. Currently, the lifespan of several DWSSs worldwide has been exceeded, exasperating these issues. The monitoring activity and the transparency of information on water availability and quality are becoming increasingly important in accordance with the national regulations and standards, and with guidelines of the World Health Organization (WHO). These activities can be considered as support and guidance tools for identifying health-related risks, for building a safe management of drinking water supply systems, and for improved user confidence in the consumption of tap water. In this context, in the present work an analysis of the quality monitoring data of DWSSs was carried out using multivariate techniques. The analysis considered several chemical–physical parameters collected in the period 2013–2020 for some DWSSs in the Emilia-Romagna region, Italy. Principal component analysis (PCA) and cluster analysis (CA) methods were used to process and reduce the dimensionality of the data, to highlight the parameters that have the greatest influence on the qualitative state of the supplied water and to identify clusters.


Introduction
Water resources are vital to human life and activities. Considering the economic, social, environmental, and public health aspects of water resources, important is the sustainable management of these resources together with the management aspects of drinking water supply systems (DWSSs) [1]. Design and management of DWSSs involve several aspects of research related to optimization models for design, pipe failures, vulnerability, quality, robustness, and resilience [2][3][4][5][6].
Currently, in numerous countries, many DWSSs have exceeded their lifespan and the age of these networks is a factor, contributing to water quality deterioration within the system.
Water quality is monitored through sampling and analysis of a certain number of physical, chemical, and bacteriological parameters at different sampling points; the number and type of monitored parameters, frequency and method of sampling, and monitoring plans are defined according to government regulations.
In EU, the Drinking Water Directive (DWD)-Council Directive 98/83/EC laid down the water quality standards at EU level. On 16 December 2020, the European Parliament formally adopted the revised Drinking Water Directive. The Directive entered in force on January 2021, and Member States will have two years to transpose it into national legislation. Among the innovations present in the revised DWD, there is the assessment of risks associated with distribution, including the domestic distribution, and the effective and transparent communication to users regarding the water quality in order to promote the consumer confidence. An important aspect of the revised DWD is also the need to provide users with simple and accessible information on the availability and quality of tap water, thus improving trust in tap water use and favoring the reduction of bottled water consumption.
For managers and water-supply companies, data analysis and interpretation are important elements for proper DWSSs management. The monitoring data constitute an n-dimensional space characterized by parameters measured and interpreted separately. In this context, the methods of multivariate statistical analysis, such as principal component analysis (PCA), allow for establishing correlations between parameters, and to interpret and analyze the parameters together.
Methods of multivariate statistical analysis have been mainly used for monitoring the quality of surface water [7][8][9][10][11][12][13] and groundwater [14][15][16][17][18][19], although some studies were also carried out on drinking water and DWSSs. Praus (2005) [20] analyzed by multivariate methods a set of data relating the quality monitoring in 126 drinking water samples taken from a city water network in North Moravia, Czech Republic. Through PCA, the dimensionality of the dataset was reduced and the clusters sorted the drinking water samples according to their groundwater and surface water origin. Using principal component analysis and cluster analysis, Jankowska et al. (2017) [21] studied the multivariate correlation among chemical parameters in 11 WDNs of Siedlce County in Poland, and the water networks were grouped based on similar water quality aspects. Quality assessment of three types of drinking water sources in Guinea-Bissau was carried out by [22], considering microbiological and physicochemical parameters. Six water samples were collected from 22 different points considering piped water distribution systems, deep tubewells, and shallow hand-dug wells. Euclidean distance and principal component analysis were used to assess relationship between the microbiological, physical, and chemical parameters considered in the study. Tiouiouine et al. (2020) [23] used PCA to analyze data extracted from the SISE-Eaux database archived since 1990 by the French Regional Health Agency. The authors selected a 10-year period (2006-2016) from the Provence-Alpes-Côte d'Azur region database. PCA was used to synthesize the information and to separate the independent sources of variability.
In this work, a multivariate statistical analysis was carried out considering some DWSSs of the Emilia-Romagna region, Italy. In Italy, the governance of integrated water services has been characterized in recent decades by various regulatory provisions and an evolving political, legal, and institutional framework. Over the years, the fragmentation of the management of this service has not allowed a systematic and organized collection of data on the integrated water service and on the water quality monitoring. Therefore, the lack of such data did not allow for carrying out in Italy many studies on the aspects of water quality and monitoring for DWSSs. Emilia-Romagna, on the other hand, is a region that has implemented over the years a reorganization of the integrated water service by creating a single optimal territorial area governed by an entity, endowed with legal personality under public law and administrative, accounting, and technical autonomy. The management is entrusted to some companies. The reorganization and management of integrated water services in this region has favored the collection and availability of data. In this context, the interest of this work deals with the possibility to perform a multivariate statistical analysis using the sampling data of the regular monitoring carried out by one water-supply company for seven territorial operational districts of the region, at municipal scale. The samplings refer to a large of time period, 2013-2020, and several physical and chemical parameters were considered.
It is worth highlighting that the water supplied is of good quality for all districts and in accordance with standard values. The aim of this paper is use PCA to analyze and reduce the dimensionality of the data acquired during the regular monitoring of the water quality of the DWSSs and to identify correlations between the chemical and physical parameters taken into consideration. Results are used to better interpret these data in support of the DWSSs management activities and investment. Clusters have also been identified using the k-means clustering method. Using this approach, managers can have a global view of water quality in urban areas and identify the parameters that present maximum variance and could be benefit from expanded monitoring.

Case Study Area
The Emilia-Romagna region has a population of about 4.5 million inhabitants and a territorial extension of 22,500 km 2 . The quality data of the DWSSs used for the work presented here concern the regular monitoring carried out by the Hera group, whose basic activity in the water sector concerns management of the Integrated Water Service for many municipalities of the Emilia-Romagna region [24]. The territorial operational districts, the provinces of which the districts are part, the number of municipalities, and some characteristics of the DWSSs managed by Hera S.p.A., whose quality data were used in this study, are indicated in the following Overall, the quality monitoring data relating to the networks of 164 municipalities were analyzed. Figure 1 shows the Emilia-Romagna region and all the municipalities of the seven territorial operational districts analyzed.

Data
Monitoring data were collected from 2013 to 2020 according to Italian and European standards. The regulatory framework consists of: The results of 18 physical and chemical parameters were considered and the samples collected from various points of DWSSs from sources to the distribution system, and in drinking-water treatment plants (means of the measurements) [26]. Analyses were carried out according to the actual standardized ISO and EN methods. All the monitoring parameters and standard values are described in Table 2.
Among these parameters, for each operational district, a preliminary selection of the variables carrying the most useful information was carried out. The criterion used for the selection of the parameters was based on the relative deviation standard (RDS) together with the evaluation and disposal of variables showing weak correlation with the others.

Data
Monitoring data were collected from 2013 to 2020 according to Italian and European standards. The regulatory framework consists of: The results of 18 physical and chemical parameters were considered and the samples collected from various points of DWSSs from sources to the distribution system, and in drinking-water treatment plants (means of the measurements) [26]. Analyses were carried out according to the actual standardized ISO and EN methods. All the monitoring parameters and standard values are described in Table 2. Among these parameters, for each operational district, a preliminary selection of the variables carrying the most useful information was carried out. The criterion used for the selection of the parameters was based on the relative deviation standard (RDS) together with the evaluation and disposal of variables showing weak correlation with the others.

Methods
Among multivariate statistical techniques, PCA is probably one of the most popular and used in many different fields. PCA was first introduced by Pearson [27] and its present formalization was developed by Hotelling [28]. PCA allows for the analysis of large datasets that represent observations of several variables that are generally interrelated [29]. Goals of PCA are to extract the most important information from the dataset, reduce dimensionally the dataset, and simplify it, thereby minimizing information loss, allowing for analysis of the structure of observations and variables. This reduction is achieved by transforming original variables to a new set of variables, the so-called principal components, PCs, which are uncorrelated and ordered so that the first few retain most of the variation present in all of the original variables [30]. Computation of the principal components reduces to the solution of an eigenvalue-eigenvector problem for a positive-semidefinite symmetric matrix [29]. PCA can be considered as a rotation of the axes of the original variable coordinate system to new orthogonal axes, called principal axes, such that the new axes coincide with directions of maximum variation of the original observations. PCA can be performed using a method based on the variance-covariance matrix or the correlation matrix. A correlation PCA occurs when variables are measured with different units and a standardization of each variable to the unit norm is carried out.
Considering a data matrix X, PCA provides an approximation of X in terms of the product of two matrices T and P'. These matrices, T and P', capture the essential data patterns of X. Plotting the columns of T gives a picture of the dominant "object patterns" of X and, analogously, plotting the rows of P' shows the complementary "variable patterns" [31]: The columns in T, ta, are the score vectors and the rows in P', pa, are defined as loading vectors. E is defined as the residual matrix.
For the study described in this paper, the data were normalized to the z score before applying PCA, while the scree plot was used to select the number of PCs [32]. The scree plot shows the number of eigenvalues ordered from largest to smallest, and the number of components defined as the number that appears prior to the elbow of the curve.
Regarding CA, k-means clustering algorithm was used. This algorithm allows for partitioning the data into a certain number of clusters such that a clustering criterion is optimized. The objective of k-means clustering is to minimize the squared error function [33].
To measure the datasets adequacy, the Kaiser-Meyer-Olkin (KMO) test was carried out [34]. A KMO value of 0.5 is considered the smallest value acceptable for PCA.
PCA and CA were performed using the Octave 5.2.0 software, while the KMO test was performed by Python 3.7.9.

Results and Discussion
The PCA multivariate method and the k-means clustering were applied to the dataset of each district of the study case. This section describes the results obtained.
For each district, the correlation matrix, the table of eigenvalues of each principal component, the scree plot of PCs, and the georeferenced map of the district with clusters highlighted are included as Supplementary Materials (Annex I-VII).

District of the Metropolitan City of Bologna
As previously mentioned, for this study there were 42 reference municipalities in the area of the metropolitan city of Bologna.
A value of 0.69 was obtained by performing the KMO test.
Regarding PCA analysis, two PCs were identified, which explained about 89% of the total variability. The first component PC1 is associated with calcium, chloride, pH, conductivity, water hardness, magnesium, and dry residue. PC2 is mainly composed of bicarbonate alkalinity, total alkalinity, nitrate, and potassium.
Concerning the clustering, two clusters were identified based on PCA scores. The first cluster includes 17 municipalities while the second comprises 25 municipalities. The difference between the clusters is related to the supply systems; the municipalities of the first cluster are served by the same DWSS with supply from surface water, groundwater, and springs, while the municipality of the second cluster are almost all served by the same DWWS as the first cluster, but also by other supply systems (springs, groundwater, and surface water from a lake). Figure 2 shows the 2D-Score plot considering PC1 vs. PC2 with the two clusters highlighted.
component, the scree plot of PCs, and the georeferenced map of the district with clusters highlighted are included as Supplementary Materials (Annex I-VII).

District of the Metropolitan City of Bologna
As previously mentioned, for this study there were 42 reference municipalities in the area of the metropolitan city of Bologna.
A value of 0.69 was obtained by performing the KMO test.
Regarding PCA analysis, two PCs were identified, which explained about 89% of the total variability. The first component PC1 is associated with calcium, chloride, pH, conductivity, water hardness, magnesium, and dry residue. PC2 is mainly composed of bicarbonate alkalinity, total alkalinity, nitrate, and potassium.
Concerning the clustering, two clusters were identified based on PCA scores. The first cluster includes 17 municipalities while the second comprises 25 municipalities. The difference between the clusters is related to the supply systems; the municipalities of the first cluster are served by the same DWSS with supply from surface water, groundwater, and springs, while the municipality of the second cluster are almost all served by the same DWWS as the first cluster, but also by other supply systems (springs, groundwater, and surface water from a lake). Figure 2 shows the 2D-Score plot considering PC1 vs. PC2 with the two clusters highlighted.

District of the Province of Ferrara
Regarding the eleven municipalities of this district managed by Hera group, 11 parameters were considered: bicarbonate alkalinity, total alkalinity, calcium, pH, conductivity at 20 °C, water hardness, magnesium, nitrate (NO3), potassium, dry residue at 180 °C , and sulfate. A value of 0.61 was obtained by performing the KMO test.

District of the Province of Ferrara
Regarding the eleven municipalities of this district managed by Hera group, 11 parameters were considered: bicarbonate alkalinity, total alkalinity, calcium, pH, conductivity at 20 • C, water hardness, magnesium, nitrate (NO 3 ), potassium, dry residue at 180 • C, and sulfate. A value of 0.61 was obtained by performing the KMO test.
For this district, two PCs were identified which explained 91% of the total variability. PC1 explained 80% of the total variability and is mainly associated with bicarbonate alkalinity, total alkalinity, calcium, conductivity, water hardness, magnesium, potassium, dry residue, and sulfate. PC2 (11%) is associated with pH and nitrate.
Regarding the clustering analysis, two clusters were identified; the first groups almost all the municipalities while the second cluster consists of only one municipality (Figure 3).
For this district, two PCs were identified which explained 91% of the total variability. PC1 explained 80% of the total variability and is mainly associated with bicarbonate alkalinity, total alkalinity, calcium, conductivity, water hardness, magnesium, potassium, dry residue, and sulfate. PC2 (11%) is associated with pH and nitrate.
Regarding the clustering analysis, two clusters were identified; the first groups almost all the municipalities while the second cluster consists of only one municipality (Figure 3).

District of Forlì -Cesena
Regarding the thirty municipalities of the district of Forlì-Cesena, 11 parameters were considered. For this district, the KMO test value is 0.62 and the parameters used were bicarbonate alkalinity, total alkalinity, calcium, pH, conductivity at 20 °C, water hardness, fluoride, magnesium, potassium, dry residue at 180 °C , and sulfate.
With the 90% of the total variability explained, for this district two PCs were identified. PC1 (about 83% of total variability) is composed of bicarbonate alkalinity, total alkalinity, calcium, conductivity, water hardness, magnesium, potassium, dry residue, and sulfate, while PC2 (about 7%) is associated with pH and fluoride.
Regarding the clustering, two clusters were identified; the first cluster groups six municipalities while the second consists of 24. For the largest cluster, most of the water availability (about 63%) derives from the same water source, Lake Ridracoli. For the remaining part, the water supply is also provided by municipal and intermunicipal DWSSs. Regarding the first cluster, the water supply is mainly provided by municipal and intermunicipal DWSSs. These considerations justify the difference between the two clusters and the results obtained. Figure 4 shows the 2D-Score plot considering PC1 vs. PC2; the same figure also shows the clusters.

District of Forlì-Cesena
Regarding the thirty municipalities of the district of Forlì-Cesena, 11 parameters were considered. For this district, the KMO test value is 0.62 and the parameters used were bicarbonate alkalinity, total alkalinity, calcium, pH, conductivity at 20 • C, water hardness, fluoride, magnesium, potassium, dry residue at 180 • C, and sulfate.
With the 90% of the total variability explained, for this district two PCs were identified. PC1 (about 83% of total variability) is composed of bicarbonate alkalinity, total alkalinity, calcium, conductivity, water hardness, magnesium, potassium, dry residue, and sulfate, while PC2 (about 7%) is associated with pH and fluoride.
Regarding the clustering, two clusters were identified; the first cluster groups six municipalities while the second consists of 24. For the largest cluster, most of the water availability (about 63%) derives from the same water source, Lake Ridracoli. For the remaining part, the water supply is also provided by municipal and intermunicipal DWSSs. Regarding the first cluster, the water supply is mainly provided by municipal and intermunicipal DWSSs. These considerations justify the difference between the two clusters and the results obtained. Figure 4 shows the 2D-Score plot considering PC1 vs. PC2; the same figure also shows the clusters.

District of the Province of Modena
Regarding this district, 11 parameters were considered. The result of the KMO test obtained is not very suitable as it is just above the accept-

District of the Province of Modena
Regarding this district, 11 parameters were considered. The result of the KMO test obtained is not very suitable as it is just above the acceptability limit and equal to 0.51, although the variance explained is about 91%. Therefore, the result obtained for the multivariate analysis, in terms of clusters and grouping by similar features, resulted in compatibility with DWWSs schemes and the origin of water sources.
Parameters considered are bicarbonate alkalinity, total alkalinity, calcium, pH, conductivity, magnesium, nitrate, potassium, dry residue, sodium, and sulfate. PC1 (about 77%) is composed of calcium, pH, conductivity, magnesium, nitrate, potassium, and dry residue, while PC2 explained about 15% of the total variability and is associated with bicarbonate alkalinity, total alkalinity, sodium, and sulfate.
Regarding the clustering, three clusters were identified; the first cluster groups nine municipalities while the second consists of 12 municipalities, and the last cluster is composed of five municipalities. For the first and second cluster, all municipalities are served by the same two DWSSs; similarly, in the case of the third cluster, almost all the municipalities are served by the same DWSS. It is worth noting that two municipalities of the first cluster are also interconnected with DWSSs of the third cluster.
The 2D-Score plot considering PC1 vs. PC2 with highlighted clusters is shown in Figure 5.

District of the Province of Rimini
Regarding this district, 15 parameters were considered. The result of the KMO test is 0.6.
Parameters considered are bicarbonate alkalinity, total alkalinity, calcium, free residual chlorine, chloride, pH, conductivity, water hardness, fluoride, magnesium, potassium, dry residue, sodium, and sulfate. PC1 explained about 72% of the total variability and it is associated with conductivity, water hardness, magnesium, potassium, dry residue, sodium, and sulfate; PC2 explained about 10% of the total variability and it is composed of bicarbonate alkalinity, total alkalinity, calcium, free residual chlorine, chloride, pH, fluoride, and nitrate. Three clusters were identified; the first cluster groups nine municipalities, the second consists of two municipalities, and the last cluster groups 13 municipalities. In general, the entire area is served by two large DWSSs with supply from two lakes (one of which is used only in summer). In addition to these supplies, the water supply for the various municipalities is also integrated with wells. The differences between the three clusters are due to the different wells that provide water resources to the municipalities of each cluster. Figure 6 shows the 2D-Score plot considering PC1 vs. PC2 and clusters.

District of the Province of Rimini
Regarding this district, 15 parameters were considered. The result of the KMO test is 0.6.
Parameters considered are bicarbonate alkalinity, total alkalinity, calcium, free residual chlorine, chloride, pH, conductivity, water hardness, fluoride, magnesium, potassium, dry residue, sodium, and sulfate. PC1 explained about 72% of the total variability and it is associated with conductivity, water hardness, magnesium, potassium, dry residue, sodium, and sulfate; PC2 explained about 10% of the total variability and it is composed of bicarbonate alkalinity, total alkalinity, calcium, free residual chlorine, chloride, pH, fluoride, and nitrate. Three clusters were identified; the first cluster groups nine municipalities, the second consists of two municipalities, and the last cluster groups 13 municipalities. In general, the entire area is served by two large DWSSs with supply from two lakes (one of which is used only in summer). In addition to these supplies, the water supply for the various Water 2021, 13, 1766 9 of 14 municipalities is also integrated with wells. The differences between the three clusters are due to the different wells that provide water resources to the municipalities of each cluster. Figure 6 shows the 2D-Score plot considering PC1 vs. PC2 and clusters.
Water 2021, 13, x FOR PEER REVIEW 10 of 15 Figure 6. District of Rimini 2D-Score plot showing PC1 vs. PC2 and clusters: first cluster in red, second cluster in blue, and third cluster in green.

District of the Province of Ravenna
For this area, 14 parameters were considered: bicarbonate alkalinity, total alkalinity, calcium, chloride, pH, conductivity, water hardness, magnesium, manganese, nitrate, potassium, dry residue, sodium, and sulfate.
The KMO test value is 0.62 while regarding PCA analysis, three PCs were identified, which explain about 97% of the total variability. PC1 is associated with conductivity, water hardness, magnesium, potassium, dry residue and sulfate, while PC2 is composed by bicarbonate alkalinity, total alkalinity, calcium, manganese, and nitrate. Finally, PC3 is associated with chloride, pH, and sodium.
Regarding the clustering, two clusters were identified; in the largest cluster, the six municipalities are mainly served by the same DWSS while the two municipalities of the other cluster differ. In addition to being served by the same DWSS, it is served by about 60% through another water supply scheme.
The 2D-Score plot considering PC1 vs. PC2, PC1 vs. PC3, and PC2 vs. PC3, and the 3D-score plot of the three principal components and clusters are shown in Figure 7.

District of the Province of Ravenna
For this area, 14 parameters were considered: bicarbonate alkalinity, total alkalinity, calcium, chloride, pH, conductivity, water hardness, magnesium, manganese, nitrate, potassium, dry residue, sodium, and sulfate.
The KMO test value is 0.62 while regarding PCA analysis, three PCs were identified, which explain about 97% of the total variability. PC1 is associated with conductivity, water hardness, magnesium, potassium, dry residue and sulfate, while PC2 is composed by bicarbonate alkalinity, total alkalinity, calcium, manganese, and nitrate. Finally, PC3 is associated with chloride, pH, and sodium.
Regarding the clustering, two clusters were identified; in the largest cluster, the six municipalities are mainly served by the same DWSS while the two municipalities of the other cluster differ. In addition to being served by the same DWSS, it is served by about 60% through another water supply scheme.
The 2D-Score plot considering PC1 vs. PC2, PC1 vs. PC3, and PC2 vs. PC3, and the 3D-score plot of the three principal components and clusters are shown in Figure 7.

District of Imola-Faenza
Regarding this district, PCA was carried out considering a total of 14 parameters: bicarbonate alkalinity, total alkalinity, calcium, chloride, pH, conductivity, water hardness, fluoride, magnesium, nitrate, potassium, dry residue, sodium, and sulfate. For this district, as for that of Modena, the KMO test value of 0.56 is not suitable, although, also in this case, the total explained variability is equal to 94%.
Using PCA, the 14 parameters were reduced to three PCs. PC1 is mainly associated with bicarbonate alkalinity, total alkalinity, calcium, conductivity, water hardness, magnesium, and dry residue. PC2 is associated with chloride, fluoride, potassium, and sulfate while PC3 is related to pH and nitrate.
Using the k-means clustering method, two clusters were identified of which the first cluster groups 19 municipalities while the second consists of only four municipalities.
Regarding this cluster, it is worth highlight that three municipalities of the first cluster use waters from the same drinking water treatment plant and one also uses water resources from the DWSS of the largest municipality, the city of Imola. Within the second cluster, the values obtained for one municipality also differ more in relation to the use of water from wells located in the same municipality territory.

District of Imola-Faenza
Regarding this district, PCA was carried out considering a total of 14 parameters: bicarbonate alkalinity, total alkalinity, calcium, chloride, pH, conductivity, water hardness, fluoride, magnesium, nitrate, potassium, dry residue, sodium, and sulfate. For this district, as for that of Modena, the KMO test value of 0.56 is not suitable, although, also in this case, the total explained variability is equal to 94%.
Using PCA, the 14 parameters were reduced to three PCs. PC1 is mainly associated with bicarbonate alkalinity, total alkalinity, calcium, conductivity, water hardness, magnesium, and dry residue. PC2 is associated with chloride, fluoride, potassium, and sulfate while PC3 is related to pH and nitrate.
Using the k-means clustering method, two clusters were identified of which the first cluster groups 19 municipalities while the second consists of only four municipalities.
Regarding this cluster, it is worth highlight that three municipalities of the first cluster use waters from the same drinking water treatment plant and one also uses water resources from the DWSS of the largest municipality, the city of Imola. Within the second cluster, the values obtained for one municipality also differ more in relation to the use of water from wells located in the same municipality territory. Figure 8 shows the 2D-Score plot considering PC1 vs. PC2, PC1 vs. PC3, and PC2 vs. PC3, and the 3D-score plot of the three principal components and clusters.  Figure 8 shows the 2D-Score plot considering PC1 vs. PC2, PC1 vs. PC3, and PC2 vs. PC3, and the 3D-score plot of the three principal components and clusters.
It is worth noting that overall, for all of the districts, the discarded parameters were ammonium, arsenic, free residual chlorine, and manganese, followed for almost all the districts by fluoride, chloride, and sodium.
Regarding PCA results, for most districts two PCs were identified, except for two districts in which three PCs were identified. PCs explained high percentages of total variability, between 82% and 97%.
The cluster analysis allowed for grouping the municipalities according to similar features in accordance with DWSSs and source origins of the water in an area where the water supply schemes are complex and interconnected.
The districts of Rimini and Ravenna show similar results in terms of discarded and most influential parameters; in particular, conductivity, water hardness, magnesium, potassium, dry residue, and sulfate results are the most influential parameters for these districts. Similar consideration concerns the districts of Bologna, Ferrara, and Forlì-Cesena. These results are related to the origin of water resources and to the interconnected schemes. It is worth noting that overall, for all of the districts, the discarded parameters were ammonium, arsenic, free residual chlorine, and manganese, followed for almost all the districts by fluoride, chloride, and sodium.
Regarding PCA results, for most districts two PCs were identified, except for two districts in which three PCs were identified. PCs explained high percentages of total variability, between 82% and 97%.
The cluster analysis allowed for grouping the municipalities according to similar features in accordance with DWSSs and source origins of the water in an area where the water supply schemes are complex and interconnected.
The districts of Rimini and Ravenna show similar results in terms of discarded and most influential parameters; in particular, conductivity, water hardness, magnesium, potassium, dry residue, and sulfate results are the most influential parameters for these districts. Similar consideration concerns the districts of Bologna, Ferrara, and Forlì-Cesena. These results are related to the origin of water resources and to the interconnected schemes.

Conclusions
In this study, the water quality of several DWSSs that provide water for 164 municipalities was analyzed using multivariate methods. In particular, PCA and CA were used to carry out a comprehensive evaluation of the water quality monitoring data, to extract the parameters that most affect water quality and its variation, and to identify clusters in

Conclusions
In this study, the water quality of several DWSSs that provide water for 164 municipalities was analyzed using multivariate methods. In particular, PCA and CA were used to carry out a comprehensive evaluation of the water quality monitoring data, to extract the parameters that most affect water quality and its variation, and to identify clusters in relation to similar characteristics. In this paper, the analysis was performed considering seven districts characterized by complex and interconnected DWSSs. In general, considering all the districts, albeit with the necessary differences, the parameters that most influence the variation in the water resource quality are mainly conductivity, water hardness, magnesium, and dry residue, followed by bicarbonate alkalinity, total alkalinity, calcium, and sulfate.
It is worth highlighting that similar results in terms of the most influential parameters are found for the districts of Bologna, Ferrara, and Forlì-Cesena, which are justified by the interconnection of the DWSSs and the similar types of water sources. Similar results and considerations occur for the districts of Rimini and Ravenna.
Considering each district, the dimensionality of the dataset was reduced to two or three PCs, and the use of CA also allowed for the identification of various clusters. These groups are confirmed and justified in relation to the DWSSs that serve the municipalities, and to the origin of the water supply sources.
The study presented here and the results described aimed at illustrating the usefulness of the PCA method for analyzing and better interpreting complex datasets by reducing their dimensionality and obtaining information on the parameters that have the greatest influence on the variation of water quality. In the same way, the use of CA can be useful to create maps and to group monitoring sites, water supply schemes, water supply sources, and, as in the case of this study, cluster municipalities served by the same DWSSs with similar characteristics. The use and the combination of these methods can be a useful tool for managers and companies to better interpret monitoring data, for the management and monitoring activities, and can lead to improved knowledge concerning the DWSSs, especially where there are complex interconnected schemes and, in case of inadequate knowledge, of the entire water system infrastructure. Finally, considering the growing use of SCADA systems and the transition of DWSSs to cyber-physical systems, the interest of multivariate statistical methods also concerns the prospect of being able to use this approach for the analysis and interpretation of the large amount of data made available by this new type of smart technology system.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/ 10.3390/w13131766/s1, Figure S1: Scree plot-District of Bologna, Figure S2: Georeferenced map of the district of Bologna with clusters highlighted, Figure S3: Scree plot-District of Ferrara, Figure S4: Georeferenced map of the district of Ferrara with clusters highlighted, Figure S5: Scree plot-District of Forlì-Cesena, Figure S6: Georeferenced map of the district of Forlì-Cesena with clusters highlighted, Figure S7: Scree plot-District of Modena, Figure S8: Georeferenced map of the district of Modena with clusters highlighted, Figure S9: Scree plot-District of Rimini, Figure S10: Georeferenced map of the district of Rimini with clusters highlighted, Figure S11: Scree plot-District of Ravenna, Figure S12: Georeferenced map of the district of Ravenna with clusters highlighted, Figure S13: Scree plot-District of Imola-Faenza, Figure S14: Georeferenced map of the district of Imola-Faenza with clusters highlighted, Table S1: Correlation matrix-District of Bologna,   Data Availability Statement: Data available online https://www.gruppohera.it/gruppo/attivita_ servizi/business_acqua/qualita/qualita_acqua_hera_qualita_media_comuni (Accessed on 12 January 2021).