Identification of Factors That Influence Energy Performance in Water Distribution System Mains

Abstract: This paper aims to identify the paramount hydraulic factors in the energy dynamics of water mains, using Principal Components Analysis (PCA). The proposed method is applied to two large ensembles of leaky and non-leaky pipes comprising over 40,000 pipes selected from 18 North American water distribution systems, to guarantee the versatility of pipe characteristics and the statistical significance of the explored patterns. PCA mono-plots indicate that energy metrics such as Net Energy Efficiency, Energy Lost to Friction and Energy Lost to Leakage are better suited to distinguishing low-efficiency from high-efficiency pipes. In addition, PCA mono-plots and bi-plots reveal the relative importance of hydraulic parameters and show that average flow rate, hydraulic proximity to major components and average unit headloss can have more tangible effects on the energy dynamics of pipes than leakage and average pressure. Some factors such as elevation, diameter and $C_{HW}$ are not as influential as expected in distinguishing high-efficiency from low-efficiency pipes. Further, a comparison between the approach used in this paper and a simplified common-practice replacement strategy points out the difference energy considerations can make if included in a bigger asset management landscape.


Introduction
Water main aging and deterioration tend to evolve in lockstep with a loss of hydraulic capacity, increased leakage, and a higher pumping energy requirement in a water distribution system [1][2][3][4][5].
There are several factors such as operating pressure and topographical elevation that are known to have a large impact on the energy performance of water mains and water distribution systems in general [6]. Often, detailed hydraulic modeling and/or optimization are needed to fully ascertain the extent to which changes in hydraulic parameters will change the energy use in distribution systems. Unfortunately, advanced energy modeling and optimization are not extensively used in engineering practice nor do most water utilities have the resources to perform these analyses to characterize the energy performance of their systems. Further, the large number of variables in distribution system operation is an important barrier that makes it difficult for water utility managers to gain a clear picture of how system operations and the state of deterioration of pipes act together to affect the energy performance of water mains and their systems.
Previous studies that have considered energy issues in distribution systems have only examined a few case study systems in an ad hoc fashion. Energy indicators to relate system-wide energy efficiency to pump efficiency and reservoir location were developed, without considering leakage impacts [7]. Cabrera et al. [8] presented a set of metrics to characterize the system-wide energy performance that includes losses to friction, leakage, and overpressure. These energy metrics provided a useful set of tools to help water utility managers better understand how far their systems were from an ideal energy-efficient state, but they fell short of being able to identify individual pipes that would be problematic. Building upon their earlier work, the same authors presented additional metrics to assess the energy efficiency of a pressurized system and procedures to prioritize interventions on a system-wide basis [9]. The study in [10] examined the energy dynamics of groups of pipes and pumps in the Toronto distribution system. While these researchers also solved the energy balance to examine the frictional losses in individual pipes of the Toronto system, they did not examine the efficiency, leakage, and other energy characteristics of these pipes. The results of these previous studies pertain only to the distribution systems examined, and it remains an open question whether they are transferable to a wide cross section of real, complex systems.
To address this problem, this research focuses on applying the tools of statistical analysis to large datasets spanning multiple systems to examine the relationships between pipe and hydraulic parameters and energy performance. What motivates the use of large datasets and this statistical approach is that the ensuing results have statistical significance and are transferrable across a wide cross section of large, complex distribution systems [11]. The knowledge gleaned from this research can be used by water utilities to identify water main assets with threshold levels that lead to low energy performance without having to resort to advanced water distribution and energy modeling and optimization techniques.
The aim of this paper is to build on the work of Hashemi et al. [11] to identify the hydraulic parameters that have the largest impact on the energy performance of water main assets in distribution systems. The paper answers three research questions:

1. What hydraulic parameters have the largest influence on the energy performance for water mains data in distribution systems?
2. What combinations of hydraulic parameters can better distinguish highly efficient water mains from those with low efficiency?
3. How aligned are the simplified rehabilitation approaches, for example those based on pipe age or break rate, with energy efficiency in water mains?

A statistical approach is taken to address these three research questions. This paper applies principal components analysis (PCA) to an ensemble of 40,000 water mains across 18 water distribution systems. The choice to use PCA in this paper is motivated by two challenges: First, the high dimensionality (numerous pipe and hydraulic parameters) of the dataset makes it difficult to visualize and identify what parameters drive energy use. Second, the dataset comprises 40,000 data points (40,000 water mains across 18 systems) so the large number of data points requires advanced statistical techniques to fully explore [2,3]. PCA is a proven technique that can simplify a large dataset and identify the most influential parameters that drive energy use in distribution systems. Thus far, there has been little research that has deployed statistical techniques like PCA to examine the energy dynamics of water mains with a large set of data on pipes across numerous systems.
The new knowledge created in this paper has the potential to help water utilities perform a screening-level identification of groups of pipes that are likely to have a low energy performance and follow up with targeted condition assessment and hydraulic modeling to further examine the energy performance of pipes and their candidacy for rehabilitation.

Pipe-Level Energy Metrics
The pipe-level energy metrics developed by the authors [12] were used in conjunction with PCA to examine the relationship between pipe and hydraulic parameters (e.g., average flow (Ave Q), pipe roughness ($C_{HW}$), pipe diameter (D), average unit headloss (UH), pressure (P), elevation (Elv) and hydraulic proximity to major components) and energy use in water mains of distribution systems (see Table 1). The pipe-level metrics and their parameters are defined in Equations (1)-(5). The reader can refer to [12] for more details on these pipe-level metrics.

$$GEE = \frac{E_{delivered}}{E_{supplied}} \tag{1}$$

$$NEE = \frac{E_{delivered}}{E_{supplied} - E_{ds}} \tag{2}$$

$$ENU = \frac{E_{delivered}}{E_{need}} \tag{3}$$

$$ELTF = \frac{E_{friction}}{E_{supplied} - E_{ds}} \tag{4}$$

$$ELTL = \frac{E_{leak} + E_{friction(leak)}}{E_{supplied} - E_{ds}} \tag{5}$$

where GEE, the gross energy efficiency (Equation (1)), compares the energy delivered to the users serviced by a pipe ($E_{delivered}$) to the energy supplied to that pipe ($E_{supplied}$). Each energy component is defined in Table 1. NEE, net energy efficiency (Equation (2)), compares the energy delivered to users serviced by a pipe ($E_{delivered}$) to the net energy in that pipe ($E_{supplied} - E_{ds}$), where net energy is the energy supplied to the pipe ($E_{supplied}$) minus the energy supplied to users located downstream of the pipe and not directly serviced by the pipe ($E_{ds}$). ENU, energy need by user (Equation (3)), compares the energy delivered to the users serviced by a pipe ($E_{delivered}$) against the minimum energy needed by those users ($E_{need}$). The minimum energy needed by a user is defined as $E_{need} = \gamma Q_{min} H_{min} \Delta t$ and is a function of the minimum water use needed by users ($Q_{min}$) and the minimum pressure head required to deliver acceptable water service to users ($H_{min}$). ELTF, energy lost to friction (Equation (4)), compares the magnitude of friction loss in the pipe ($E_{friction}$, incurred to satisfy the demand and leakage at the end of the pipe and the demands downstream of the pipe) to the net energy supplied to the pipe ($E_{supplied} - E_{ds}$). ELTL, energy lost to leakage (Equation (5)), compares the sum of the energy lost directly to leakage and the frictional energy loss along the pipe required to meet the leakage flow, $Q_l$, at the end of the pipe ($E_{leak} + E_{friction(leak)}$) to the net energy supplied to the pipe.
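As a minimal sketch of how the five ratios described above might be evaluated for a single pipe, the following Python snippet takes the energy components as plain numbers. The class name, field names and the example values are hypothetical and chosen only so that the energy balance closes; they do not come from the paper or from [12].

```python
from dataclasses import dataclass


@dataclass
class PipeEnergy:
    """Energy components of one pipe over a time step (any consistent unit)."""
    supplied: float             # E_supplied
    delivered: float            # E_delivered
    downstream: float           # E_ds
    need: float                 # E_need = gamma * Q_min * H_min * dt
    friction: float             # E_friction
    leak: float = 0.0           # E_leak
    friction_leak: float = 0.0  # E_friction(leak)


def energy_metrics(e: PipeEnergy) -> dict:
    """Return the five pipe-level metrics as ratios (multiply by 100 for percent)."""
    net = e.supplied - e.downstream  # net energy in the pipe
    if e.supplied <= 0 or net <= 0 or e.need <= 0:
        raise ValueError("supplied, net and needed energy must be positive")
    return {
        "GEE": e.delivered / e.supplied,
        "NEE": e.delivered / net,
        "ENU": e.delivered / e.need,
        "ELTF": e.friction / net,
        "ELTL": (e.leak + e.friction_leak) / net,
    }


# Made-up example: 100 units supplied, 80 passed on to downstream users.
example = PipeEnergy(supplied=100.0, delivered=18.0, downstream=80.0,
                     need=17.0, friction=1.0, leak=0.8, friction_leak=0.2)
metrics = energy_metrics(example)
```

In this made-up case the net energy is 20 units, so NEE is 0.9 while GEE is only 0.18, which illustrates why the two efficiencies can differ widely for pipes that mostly convey water to downstream users.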

Principal Components Analysis (PCA)
The multivariate analysis of a considerable number of energy indicators and hydraulic factors (a high-dimensional problem) makes PCA a pertinent tool to reduce the dimensionality of a dataset. Using PCA makes it possible to identify what parameters account for most of the variance and scatter in the original dataset [13,14]. In this way, PCA makes it possible to visualize a large dataset and identify what pipes and groups of pipes possess combinations of characteristics that lead to low energy performance.
PCA essentially builds on a correlation matrix to visualize and explore patterns or relationships not captured by correlation analysis. Principal Components (PCs) are linear combinations of the original variables, with weights given by the Eigenvectors of the correlation matrix of the statistical ranks of hydraulic parameters and energy metrics [11]. They are, therefore, orthogonal and uncorrelated. Each data point (pipe) is assigned a score on each PC, and the dataset shows the most variance/scatter along these PCs. This distinguishes pipes or groups of pipes from each other, which is the point of interest in this paper. The first few PCs, corresponding to the larger Eigenvalues of the correlation matrix, describe most of the variance in the dataset and are therefore sufficient to summarize it.
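The procedure described above (rank the variables, build their correlation matrix, extract the leading Eigenvalue and Eigenvector) can be sketched in plain Python as follows. This is an illustrative sketch only: the paper used Matlab, the three variables and their values are invented, and power iteration is swapped in here as a simple stand-in for a full Eigen-decomposition.

```python
import math


def ranks(values):
    """Return 1-based average ranks; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = mean_rank
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_correlation_matrix(columns):
    """Correlation matrix of the ranks of each variable (one list per variable)."""
    ranked = [ranks(col) for col in columns]
    return [[pearson(a, b) for b in ranked] for a in ranked]


def leading_component(matrix, iterations=200):
    """Power iteration: largest Eigenvalue and its unit-length Eigenvector."""
    n = len(matrix)
    vector = [1.0 / (k + 1) for k in range(n)]  # asymmetric start vector
    for _ in range(iterations):
        product = [sum(matrix[r][c] * vector[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    eigenvalue = sum(vector[r] * sum(matrix[r][c] * vector[c] for c in range(n))
                     for r in range(n))
    return eigenvalue, vector


# Invented data for five pipes: flow, unit headloss, and net energy efficiency.
ave_q = [2.0, 5.5, 1.2, 9.8, 4.1]
unit_headloss = [0.4, 1.9, 0.1, 6.3, 1.0]
nee = [0.98, 0.91, 0.99, 0.80, 0.95]

corr = rank_correlation_matrix([ave_q, unit_headloss, nee])
eigenvalue, loadings = leading_component(corr)
variance_share = eigenvalue / len(corr)  # trace of a correlation matrix equals n
```

Because the invented flow and headloss rise together while NEE falls, the first component carries all of the variance in this toy case, and the NEE loading has the opposite sign to the other two. Real pipe data would spread the variance across several components.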

PCA Mono-Plots and Bi-Plots
A "mono-plot" is the representation of the hydraulic factors and energy metrics on the orthogonal axes of the first two PCs. Hydraulic factors or energy metrics with high scores on either axis of the mono-plot tend to be more influential on the variance of the dataset. Parameters that track closely together have similar effects on the dataset, while parameters that diverge from one another have a different influence on the dataset. Moreover, the original observations in the dataset can be presented in a "bi-plot" by transforming their original values into new PC coordinates, as described in Equation (6):

$$(\text{pipe}_i \ \text{score})_j = [\text{pipe}_i]_{1 \times n} \times [PC_j]_{n \times 1} \tag{6}$$

where $(\text{pipe}_i \ \text{score})_j$ is the score of pipe i on the jth PC, $[\text{pipe}_i]_{1 \times n}$ is the vector in the ith row of the matrix of pipes, containing the ranks of all n hydraulic variable values for pipe i, and $[PC_j]_{n \times 1}$ is the Eigenvector corresponding to the jth largest Eigenvalue (the jth PC), containing the scores of all n hydraulic parameters. Therefore, each pipe or observation is assigned one value on each of the new directions or PCs, which makes it possible to visualize the observation in the new, transformed coordinate system. Clusters of pipe scores on the PCA bi-plot can distinguish data groups with similar characteristics. The formation of clusters can help identify the factors that have the most impact on the similarities or dissimilarities in the observations.
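The projection in Equation (6) amounts to a dot product between a pipe's row of ranks and each Eigenvector. A minimal sketch follows; the two Eigenvectors and the rank rows are hypothetical, and the ranks are used directly without centering, which is an assumption based on the wording of the text.

```python
import math


def pipe_scores(rank_rows, components):
    """Score of pipe i on PC j: rank row i dotted with Eigenvector j."""
    return [[sum(r * w for r, w in zip(row, pc)) for pc in components]
            for row in rank_rows]


# Hypothetical orthonormal Eigenvectors for three variables.
pc1 = [1 / math.sqrt(3), 1 / math.sqrt(3), -1 / math.sqrt(3)]
pc2 = [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]

# Hypothetical rank rows (one row per pipe, one column per variable).
rank_rows = [[1, 1, 5], [5, 5, 1], [3, 3, 3]]

scores = pipe_scores(rank_rows, [pc1, pc2])  # bi-plot coordinates (PC1, PC2)
```

Each inner list holds the bi-plot coordinates of one pipe, so pipes with similar ranks on the heavily weighted variables land close together along the first axis.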

Application of Multivariate Statistical Analyses in Large WDSs
To yield robust results in statistical analysis, this paper required a benchmarking dataset representative of the wide variety of characteristics, such as configuration, pipe conditions and age profile, found in different water distribution systems. Eighteen distribution networks were therefore selected from different areas in the states of Kentucky and Ohio in the United States and the province of Ontario in Canada [15][16][17]. The network models in Ohio and Ontario are those used by the corresponding municipalities, while the Kentucky models come from a database developed from GIS files obtained from the Kentucky Infrastructure Authority [15]. This large dataset includes over 40,000 pipes. Seventeen of these systems comprise almost 20,000 pipes without information available on leakage (the non-leaky ensemble), while one system comprises approximately 20,000 pipes with leakage for each pipe estimated by a robust field measurement campaign (the leaky ensemble). The non-leaky ensemble may contain background leakage, or leakage included as part of the demand, but leakage is not represented there as a separate factor. The results of multivariate statistical analyses of the networks with and without leakage were also juxtaposed to understand the importance of considering leakage in the energy dynamics of WDSs. The characteristics of these WDSs are summarized in Table 2. To obtain hydraulic outputs of the distribution systems (such as nodal pressures and pipe flows), EPANET2.0 network models were used [18]. EPANET2.0 hydraulic outputs were then retrieved by a code written in Visual Basic 6.0 to evaluate the energy metrics. Except for average pressure, which is a hydraulic output of EPANET2.0, all other hydraulic characteristics of the systems in Table 2, such as water demand, pipe roughness and sizes, and leakage, are surveyed data that are input to the hydraulic models.
Lastly, Matlab R15 was used to perform matrix algebra calculations to obtain mono-plots and bi-plots [19].

Non-Leaky Ensemble
PCA for the non-leaky systems includes 11 variables, covering both hydraulic parameters and energy metrics, summarized in the correlation matrix (Table 3). Figure 1 indicates the corresponding contribution of each PC, based on the 11 Eigenvalues of the correlation matrix in Table 3. According to Figure 1, in the non-leaky ensemble, the first two PCs describe almost 65% of the variance in the data (47.3% and 16.9%, respectively), while the other nine PCs account for the remaining 36% or so. Hence, the first two PCs are selected to compress and visualize the pipe dataset in a two-dimensional space. It is noted that the PCs do not directly correspond to the parameters and variables in the correlation matrix of Table 3; they are linear combinations of them that introduce new directions on which the correlation is minimized and the variance maximized.

Table 3. Correlation matrix of energy metrics and pipe hydraulic factors [11].

The mono-plots are a visual expression of the importance of hydraulic parameters and metrics with regard to the first two PCs in both the non-leaky and leaky ensembles. Figure 2 shows the mono-plot for the non-leaky ensemble. The x-axis represents $PC_1$, describing the most variation in the pipe dataset (47.3%). The y-axis represents $PC_2$, describing the second most variation in the data (16.9%). Parameters and metrics with scores higher than 0.3 on a PC are regarded as the more influential ones. However, to narrow down the important parameters and metrics for decision making, closer alignment with the PCs is also preferable. According to Figure 2, GEE and NEE track closely, meaning that higher values for one are accompanied by higher values for the other. The $PC_1$ values for these two metrics suggest that they are more influential in describing variance than parameters such as $C_{HW}$, diameter and Elv. On the other hand, ELTF, average flow (Ave Q), headloss and proximity are clustered together. This not only means that they have similar effects on pipes, but also that high values of these parameters result in lower values of GEE and NEE. It is also noted that GEE, NEE, ELTF, proximity, Ave Q and headloss are all well represented with regard to $PC_1$, as their respective vectors are, first, much larger than those of parameters such as $C_{HW}$ and diameter and, second, closely aligned with the $PC_1$ axis. Moreover, along the $PC_2$ axis, ENU and pressure (P) are clustered together and have high vector magnitudes compared to other parameters. Therefore, it can be inferred that these two vectors are highly correlated/aligned with each other, and that they have higher importance than D and $C_{HW}$. However, ENU and P have lower importance or influence than the parameters with higher values along the $PC_1$ axis (ELTF, GEE, NEE, headloss, etc.). This is mainly because $PC_2$ describes less variance (16.9%) than $PC_1$ (47.3%).
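As a short worked check of the variance shares quoted for the non-leaky ensemble, and assuming the standard PCA convention that the share of the jth PC is its Eigenvalue divided by the sum of all Eigenvalues (which equals the number of variables, 11, for a correlation matrix), the figures can be written as follows. The implied Eigenvalues are back-calculated here for illustration and are not reported in the source.

```latex
\begin{align}
\text{share}_j &= \frac{\lambda_j}{\sum_{k=1}^{11} \lambda_k} = \frac{\lambda_j}{11} \\
\text{share}_1 + \text{share}_2 &= 47.3\% + 16.9\% = 64.2\% \\
\sum_{k=3}^{11} \text{share}_k &= 100\% - 64.2\% = 35.8\% \\
\lambda_1 &\approx 0.473 \times 11 \approx 5.20, \qquad
\lambda_2 \approx 0.169 \times 11 \approx 1.86
\end{align}
```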
The $C_{HW}$ and D vectors are not situated close to any other parameters or to each other, which means that they do not affect the dataset in the same way as the other parameters. In addition, these vectors are not well represented on either the $PC_1$ or $PC_2$ axis, both in their magnitude and in their direction, which explains their lower importance in the variance of the ensemble. The Elv vector points away from other parameters, implying that it has a different effect on the variance of the pipe dataset. It can also be inferred that higher elevations are associated with lower pressure, since the Elv and P vectors point in opposite directions.
The Ave Q and unit headloss vectors point in a similar direction and in the opposite direction of the GEE and NEE vectors. From a hydraulic standpoint, this implies that an increase in Ave Q and/or unit headloss in a pipe causes a decrease in GEE and NEE. Moreover, P closely tracks with ENU, confirming that pressure directly influences energy surplus or deficit in a pipe. The correlation matrix in Table 3 also indicates no relationship between these two groups of parameters, supporting the results shown in Figure 2, where the two groups (P and ENU versus GEE, NEE and Ave Q) do not track closely.

Figure 3 shows the mono-plot for the leaky ensemble (indicated as system OH 2 in Table 2, the largest distribution network). PCA for this ensemble includes the same variables as in the non-leaky ensemble plus daily leakage and ELTL, for a total of 13 variables as opposed to 11 in the non-leaky ensemble. Even though the percentage of leakage in this system is fairly small (almost 8%), Figure 3 shows slightly different results from those of Figure 2, which could illustrate the importance of considering leakage. As shown by the axes, $PC_1$ describes 36.5% and $PC_2$ 20% of the variance. The sum of these contributions is slightly smaller than that of the previous mono-plot, mainly because more parameters (including daily leakage and ELTL) are now included in the PCA, which makes each parameter less descriptive with regard to the total variance in the ensemble. According to Figure 3, the influential parameters are GEE, ELTF, proximity, Ave Q and headloss, as they all hold comparatively high scores along $PC_1$ (absolute values above 0.3). Diameter is now more influential and closer to the cluster of ELTF, headloss, proximity, and Ave Q compared to the non-leaky ensemble results. This difference in the relative importance of the parameters may emphasize the importance of considering leakage and larger networks.

Further, ELTL, leakage and NEE are the next most influential vectors, with high $PC_2$ scores (absolute value over 0.40). Elv in the leaky ensemble is a fairly important parameter compared to $C_{HW}$, pressure, and even ENU, because of the length of the corresponding vector in Figure 3. As in the results for the non-leaky ensemble, $C_{HW}$ is a less important parameter: its vector points away from other parameters and has a small magnitude. Compared to the non-leaky ensemble, although ENU and P still track closely, their importance is dwarfed by ELTL and leakage along the $PC_2$ axis. This implies that in systems with leakage, the impact of pressure on the energy dynamics of the system may be lower than that of leakage. Another noticeable difference from the non-leaky ensemble is that the NEE vector direction does not match those of other parameters. This could be due to considering leakage, meaning that NEE seems to be affected by two sources of inefficiency, leakage and friction. This can explain why NEE and GEE do not cluster together as they did in the non-leaky ensemble results. In general, considering leakage in the analysis seems to have shuffled the importance of some of the hydraulic parameters, even though there are still similarities between the two cases.

Leaky Ensemble
For the leaky case, diameter now seems to have gained more influence (with a score of −0.3 on $PC_1$). It is also observed that average unit headloss and diameter can potentially have similar effects on the dataset, which was not captured originally by correlation analysis. This could be interpreted as larger pipes being generally located near the major components, and thus bearing inherently higher unit headloss rates. This interpretation is corroborated by the fact that water main sizes decrease moving away from major components in a system.

Clusters of High Efficiency Versus Low Efficiency Pipes
The results in Figures 2 and 3 show that, of all the variables included in the PCA, the energy metrics have comparatively higher importance in describing the variance in both ensembles, as they are expressed through longer vectors that are closely aligned with the $PC_1$ and $PC_2$ axes. This suggests that the mono-plots represent the energy dynamics landscape of the two ensembles. Therefore, the energy metric values can be used to characterize high- and low-efficiency pipes throughout the whole dataset.
To find clusters of high-efficiency pipes in the non-leaky and leaky ensembles, the thresholds that distinguish these pipes must be defined. The investigation of the relationship between the energy metric values and common-practice thresholds by Hashemi et al. [11] indicates that, for the efficiency metric NEE, values above the 70th percentile of the dataset correspond to high efficiency (NEE > 99.9%). Similarly, for the energy loss metric ELTF, values below the 30th percentile correspond to high-efficiency pipes (ELTF < 0.0018%). Moreover, 100% < ENU < 105% corresponds to the optimal pressure range (approximately 30-50 m) specified in North American guidelines [20][21][22]. In a similar way, for the other efficiency and energy loss metrics (GEE and ELTL), it is assumed that the same percentile thresholds suffice to distinguish high-efficiency pipes. Therefore, GEE > 20% and ELTL < 0.8% are considered highly efficient [11,23,24]. Based on the same investigations, low-efficiency pipes are considered to have metric values below the 30th percentile for GEE and NEE and above the 70th percentile for ELTF and ELTL. ENU values corresponding to the excessive pressure in pipes indicated by standards (approximately 70 m) are considered to indicate low-efficiency pipes for this metric. Therefore, GEE < 15%, NEE < 99.4%, ENU > 113%, ELTF > 0.3% and ELTL > 3% indicate low-efficiency pipes. The threshold values considered for both non-leaky and leaky ensembles are summarized in Table 4.

Table 4. Threshold values to define high and low efficiency pipes by [11,23] in both non-leaky and leaky ensembles.

Figure 4 shows the bi-plot for the ELTF clusters, with ELTF value ranges (represented by different colors) forming bands that run along the y-axis, with the colors changing along the x-axis. Pipes with higher values of ELTF (low efficiency) stratify on the left-hand side, while pipes with lower values of ELTF stratify on the right-hand side of the bi-plot. This is because ELTF has a high score on $PC_1$; therefore, based on Equation (6), pipes with similar ELTF values ($\text{pipe}_{i,n}$) will have similar products of these values and the ELTF score on $PC_1$ ($\text{pipe}_{i,n} \times PC_{n,1}$). As Figure 2 indicated that ELTF is one of the most influential hydraulic factors (with a high score along $PC_1$), the product of pipe parameter values and the ELTF score, based on Equation (6), will be higher and form bands in the direction shown in Figure 4. Based on Table 4, the threshold values in Figure 4 are chosen to distinguish high-efficiency pipes in green, low-efficiency pipes in red and the values in between in light blue. Further, the direction in which the colors of the bands change in Figure 4 is the same as the direction of the ELTF vector in Figure 2; i.e., higher values of ELTF, which are tantamount to low-efficiency pipes in terms of ELTF, cause these pipes to form a band on the left side of Figure 4 (based on the ELTF vector in Figure 2). Similarly, NEE and GEE display clusters of low values (close to zero) on the left-hand side and clusters of higher values (close to 1) on the right-hand side. However, because their visual result is similar to that of the ELTF cluster, they are not presented. As a general rule, pipes with similar values of metrics expressed with larger vectors (GEE, NEE, ENU and ELTF) tend to cluster more visibly in certain areas of the bi-plots.
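A minimal sketch of how a metric value could be assigned to one of the three color bands, using the threshold values quoted in the text for Table 4 (all in percent), is given below. It treats ELTF < 0.0018% as the high-efficiency band, consistent with ELTF > 0.3% marking low efficiency. The function and dictionary names are hypothetical, and any value that meets neither threshold (including ENU below 100%) is simply labeled intermediate here, which is a simplifying assumption.

```python
# Threshold tests, in percent, as quoted in the text for Table 4.
HIGH_EFFICIENCY = {
    "GEE": lambda v: v > 20.0,
    "NEE": lambda v: v > 99.9,
    "ENU": lambda v: 100.0 < v < 105.0,
    "ELTF": lambda v: v < 0.0018,
    "ELTL": lambda v: v < 0.8,
}
LOW_EFFICIENCY = {
    "GEE": lambda v: v < 15.0,
    "NEE": lambda v: v < 99.4,
    "ENU": lambda v: v > 113.0,
    "ELTF": lambda v: v > 0.3,
    "ELTL": lambda v: v > 3.0,
}
# Colors used for the bands in the bi-plots.
BAND_COLORS = {"high": "green", "low": "red", "intermediate": "light blue"}


def efficiency_band(metric, value_percent):
    """Return 'high', 'low' or 'intermediate' for one metric value in percent."""
    if HIGH_EFFICIENCY[metric](value_percent):
        return "high"
    if LOW_EFFICIENCY[metric](value_percent):
        return "low"
    return "intermediate"


band = efficiency_band("ELTF", 0.5)  # a pipe losing 0.5% of net energy to friction
color = BAND_COLORS[band]
```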

Non-Leaky Ensemble
Unlike those of ELTF, NEE and GEE, the ENU stratifications change more closely along the $PC_2$ axis than the $PC_1$ axis, as seen in Figure 5. The direction in which the color of the bands changes is the same as the direction of the ENU vector in Figure 2. Values of ENU > 113%, which correspond to low-efficiency pipes, tend to stratify at the bottom (indicated in red), while those that correspond to 100% < ENU < 105% form a horizontal band closer to the top (indicated in green). The pipes indicated in blue are those that fall between the high-efficiency and low-efficiency ranges. ENU obtains a higher score along $PC_2$, which indicates, first, lower importance compared to GEE, NEE and ELTF (which merit high scores on $PC_1$) and, second, no correlation between the two sets of variables. This implies that the direction in which the ENU values change has no correlation with that of GEE, NEE and ELTF, as $PC_1$ and $PC_2$ are orthogonal. In other words, efficiency in terms of ENU does not seem to have an effect on efficiency in terms of GEE, NEE and ELTF.

Clusters of high- and low-efficiency pipes are formed by combining the high and low values of energy metrics such as GEE, NEE, ENU and ELTF indicated in Table 4. Therefore, the intersection of vertical bands (from metrics with higher scores on $PC_1$) and horizontal bands (from metrics with high scores on $PC_2$) forms smaller clusters of high- or low-efficiency pipes. High-efficiency pipes are defined as summarized in Table 4. However, the purpose of setting thresholds for the metrics is to approximately locate the cluster of high-efficiency pipes on the PCA bi-plots, and not to suggest threshold values for rehabilitation and replacement in practice, as this would be a complex decision task involving multiple factors such as budgetary limitations, risk assessment, water quality, pipe age and break rates, along with energy considerations. The selected thresholds lead to a cluster formed in the top right area of the plot in Figure 6. Similarly, the cluster of low-efficiency pipes is defined as summarized in Table 4. These thresholds create a cluster on the left-hand side of the bi-plot. If stricter or more lenient metric values are desired, the clusters become smaller or larger; however, the location of the clusters remains the same.
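The intersection of bands described above can be sketched as a simple rule: a pipe joins the high-efficiency cluster only if every metric considered falls in its high band, and likewise for the low-efficiency cluster. Requiring all metrics to agree is an assumption about how the intersection is taken; the pipe identifiers and band labels below are invented.

```python
from collections import Counter


def pipe_cluster(bands):
    """Combine per-metric bands ('high', 'low', 'intermediate') into one cluster."""
    labels = set(bands.values())
    if labels == {"high"}:
        return "high-efficiency"
    if labels == {"low"}:
        return "low-efficiency"
    return "intermediate"


# Invented per-metric bands for three pipes of a non-leaky ensemble.
pipes = {
    "P1": {"GEE": "high", "NEE": "high", "ENU": "high", "ELTF": "high"},
    "P2": {"GEE": "low", "NEE": "low", "ENU": "low", "ELTF": "low"},
    "P3": {"GEE": "high", "NEE": "high", "ENU": "low", "ELTF": "high"},
}

clusters = {pipe_id: pipe_cluster(bands) for pipe_id, bands in pipes.items()}
counts = Counter(clusters.values())
```

Under this rule, tightening any single threshold can only shrink a cluster, which matches the observation that stricter values change the size of the clusters but not their location.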

Leaky Ensemble
According to Figure 7, pipes with higher NEE values are located in the top left area, while pipes with smaller values of NEE tend to cluster in the bottom right area of the plot. This arrangement of the clusters is mainly due to the direction of the NEE vector relative to the $PC_1$ and $PC_2$ axes, which makes this set of results different from that of the non-leaky set of pipes. Thresholds of high- versus low-efficiency pipes are based on Table 4. From the mono-plot in Figure 3, it was seen that ELTL has a high score on the $PC_2$ axis (value of 0.42) and its corresponding vector is closely aligned with the $PC_2$ axis. This is reflected in the bi-plot in Figure 8, where ELTL values change almost along $PC_2$. High-efficiency pipes regarding ELTL (based on Table 4) are clustered at the top (indicated in green), low-efficiency pipes at the bottom (indicated in red), and other value ranges (indicated in light blue) are situated in between. Stratifications of metric values for GEE, NEE and ENU for the leaky ensemble resemble those of the non-leaky ensemble considering the same thresholds, and therefore are not presented here.
The combination of high-efficiency values of metrics based on Table 4 results in the cluster of high-efficiency pipes in the top left corner of Figure 9, indicated in green. In a similar way, the intersection of low-efficiency bands of metrics forms the cluster of low-efficiency pipes located in the bottom right corner of Figure 9, indicated in red. In addition, similar to the non-leaky ensemble, choosing stricter or more lenient thresholds can result in smaller or larger clusters of high- versus low-efficiency pipes; however, the location of the clusters will remain the same. The data points indicated in blue correspond to pipes in the ensemble with an efficiency that is in between the two cohorts of high/low-efficiency pipes. Locating high/low-efficiency pipes on the bi-plots of Figures 6 and 9 can help identify which hydraulic factors can better point towards these cohorts (considering the vectors of hydraulic factors in Figures 2 and 3).

Examining Current-Practice Pipe Rehabilitations
To assess how simplified, common-practice rehabilitation plans would perform from an energy efficiency standpoint, the pipe replacement plan for the leaky ensemble proposed by Prosser et al. [16] was compared to the proposed approach in this paper. The approach proposed by Prosser et al. [16] considers thresholds of 25 breaks per 100 km or 100 years of age in pipes as two alternatives to trigger replacement. Figure 10 shows the clusters of high and low-efficiency pipes as in Figure 9. In addition, the pipes earmarked for replacement by Prosser et al. [16] beyond the benchmark date (in this case, from 2013 to 2040) are shown as yellow triangles, while previously replaced pipes are shown as black diamonds. It can be seen that many of the pipes to be replaced do not overlap with the low efficiency cluster, as identified through PCA, nor do these pipes move towards high-efficiency pipes after replacement.
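The comparison described above can be sketched as a simple overlap calculation: flag pipes by the break-rate or age trigger, then check what share of the flagged pipes also falls in the low-efficiency cluster. The pipe records below are invented, the `Pipe` fields are hypothetical, and treating the two triggers as "at or above 25 breaks per 100 km" or "at or above 100 years" is an assumption about how the thresholds in [16] are applied.

```python
from dataclasses import dataclass


@dataclass
class Pipe:
    pipe_id: str
    age_years: float
    breaks_per_100km: float
    low_efficiency: bool  # membership in the PCA low-efficiency cluster


def flagged_for_replacement(pipe, max_breaks=25.0, max_age=100.0):
    """Common-practice trigger: break rate or age at or above its threshold."""
    return pipe.breaks_per_100km >= max_breaks or pipe.age_years >= max_age


def overlap_share(pipes):
    """Share of flagged pipes that also belong to the low-efficiency cluster."""
    flagged = [p for p in pipes if flagged_for_replacement(p)]
    if not flagged:
        return 0.0
    return sum(p.low_efficiency for p in flagged) / len(flagged)


# Invented records for four pipes.
pipes = [
    Pipe("A", age_years=110, breaks_per_100km=5, low_efficiency=False),
    Pipe("B", age_years=40, breaks_per_100km=30, low_efficiency=True),
    Pipe("C", age_years=20, breaks_per_100km=2, low_efficiency=True),
    Pipe("D", age_years=105, breaks_per_100km=28, low_efficiency=False),
]

flagged_ids = [p.pipe_id for p in pipes if flagged_for_replacement(p)]
share = overlap_share(pipes)
```

In this invented example only one of the three flagged pipes is also a low-efficiency pipe, and pipe C is energy-inefficient yet never flagged, which mirrors the kind of mismatch the comparison is meant to expose.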

Discussion
One of the main findings of the analysis was to identify which parameters have the largest influence on describing the variance in energy performance across all pipes in the dataset. Knowing these parameters can help to identify high versus low efficiency pipes.
The PCA results shown through mono-plots in Figures 2 and 3 indicated that of all the hydraulic parameters and energy metrics included in the analysis, NEE, ELTF in both ensembles and ELTL in the leaky ensemble have the highest capacity to describe the variance in the dataset, or, in other words, the energy dynamics landscape, based on their score associated with the two PCs.
Accounting for leakage causes the mono-plot (e.g., Figure 3) and the bi-plots (e.g., Figures 7-9) to take a different shape compared to those of the non-leaky ensemble (e.g., Figures 2 and 4-6). For instance, the stratifications of similar value ranges for metrics (e.g., ELTF and ENU in Figures 4 and 5) are vertical or horizontal in the non-leaky pipe dataset, while the NEE stratifications in the leaky pipes (Figure 7) are diagonal. This result emphasizes the need to consider leakage in energy analysis to fully characterize the complex relationships between friction and flow in individual pipes. Consideration of leakage, as shown in Figure 3, causes NEE to take on a distinct direction compared to other metrics and may potentially identify a different group of low-efficiency pipes.
Having categorized high-efficiency versus low-efficiency pipes, it is of interest to know what characteristics the pipes within each cluster have in common. When planning for asset management or rehabilitation, water utilities need to associate high-efficiency or low-efficiency assets with more familiar decision factors (e.g., pipe size, roughness, unit headloss, pressure, etc.) that are more readily available, given the level of effort needed to calculate energy metrics. However, they would also need to know what combination of the readily available decision factors (hydraulic parameters in this study), and with what priority, could be used to better distinguish low- from high-performance pipes. This objective of the study is achieved by juxtaposing the mono-plots (Figures 2 and 3) and bi-plots (Figures 6 and 9) for each ensemble.
Having located the high and low efficiency categories of pipes on the bi-plot using the criteria in Table 4, comparison of the PCA results for the non-leaky ensemble (Figures 2 and 6) indicates that Ave Q, hydraulic proximity and unit headloss are suitable candidates for identifying the two cohorts of high/low efficiency pipes, based on their vector orientations. It should be noted that, based on the relative importance of these parameters, i.e., their association with the PCs, low efficiency pipes are better characterized by high unit headloss (or high Ave Q) than by pressure. Because of their vector sizes and directions, Elv, D and C HW would not serve as suitable guides to identify high/low efficiency pipes. Although C HW is considered an influential factor for pipe replacement in practice, the results indicate that this parameter alone is not a suitable representative of energy efficiency.
In a similar way, considering the mono-plot (Figure 3) and the bi-plot (Figure 9) in the leaky ensemble, the best hydraulic parameters with which to identify high/low efficiency pipes are revealed. In this case, the most influential parameters are Ave Q, hydraulic proximity and unit headloss in pipes, because of the size of their corresponding vectors as well as their alignment with the most important principal component, PC 1. The next best set of hydraulic parameters includes leakage and pressure, which have relatively less importance in identifying high/low efficiency pipes, because of their alignment with the second most important principal component, PC 2. At the same time, the leakage flow itself seems to be more important than pressure because of its vector size. Therefore, leakage in pipes, if well characterized, would be a better indicator than pressure for identifying energy efficiency in pipes. Similar to the case for the non-leaky pipe ensemble, Elv, C HW and diameter play a less significant role in characterizing energy dynamics in pipes. In the leaky ensemble, however, diameter has seemingly gained more importance, as shown by a longer vector along PC 1 in Figure 3. This perhaps corresponds to the correlation of larger pipe sizes and higher flows (not clearly shown in the non-leaky ensemble), due to more accurate model calibration compared to the KY systems, which make up the majority of the non-leaky ensemble. However, since diameter does not point directly towards the clusters of high/low efficiency pipes on the bi-plot of Figure 9, it would not be nominated among the most influential parameters in the energy dynamics landscape.
By mathematical definition, Ave Q and hydraulic proximity are more directly reflected in GEE and NEE, in the way that pressure and unit headloss are reflected in ENU and ELTF, and that makes these parameters and their corresponding energy metrics highly correlated. Thus, the hydraulic parameters can be used to target high/low efficiency pipes, as corroborated by mono-plots and bi-plots, particularly given that parameters such as unit headloss and Ave Q are more available to decision makers at water utilities. If leakage is known and well-characterized, it can serve alongside unit headloss and Ave Q as the best candidates for enabling decision makers to effectively earmark high efficiency/low efficiency pipes. In fact, a combination of hydraulic parameters in the form of the resultant vector of unit headloss (or Ave Q) and leakage on the mono-plot of the leaky ensemble has an orientation that would better serve to identify low efficiency pipes. Pipes experiencing high unit headloss (or Ave Q) and high leakage would be the most likely candidates in this case.
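As an illustration only, the way vector size, alignment with PC 1 and resultant vectors are read from a mono-plot can be sketched in Python with NumPy. The data below are synthetic, and the column names (AveQ, UnitHL, Leak, Press, Elv) and helper functions are hypothetical placeholders rather than the dataset or software used in this study. Standardizing the variables before PCA is also an assumption of the sketch.

```python
import numpy as np

def pca_loadings(X, n_components=2):
    """Loadings (variables x components) and explained-variance ratios.

    Columns are standardized first (an assumption of this sketch), so each
    loading is the correlation between a variable and a principal component.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s**2 / (Z.shape[0] - 1)
    ratio = eigvals / eigvals.sum()
    loadings = Vt[:n_components].T * np.sqrt(eigvals[:n_components])
    return loadings, ratio[:n_components]

def vector_summary(loadings, names):
    """Vector length and alignment with PC 1 (|PC 1 part| / length) per variable."""
    out = {}
    for name, v in zip(names, loadings):
        length = float(np.hypot(v[0], v[1]))
        out[name] = (length, float(abs(v[0]) / length))
    return out

def resultant(loadings, names, a, b):
    """Sum of two loading vectors, e.g. unit headloss plus leakage."""
    return loadings[names.index(a)] + loadings[names.index(b)]

# Synthetic stand-in data: 2000 hypothetical pipes, 5 hypothetical parameters.
rng = np.random.default_rng(0)
n = 2000
flow = rng.normal(size=n)
unit_hl = flow + 0.3 * rng.normal(size=n)    # headloss made to track flow
press = rng.normal(size=n)
leak = press + 0.3 * rng.normal(size=n)      # leakage made to track pressure
elv = rng.normal(size=n)                     # unrelated to the others

names = ["AveQ", "UnitHL", "Leak", "Press", "Elv"]
X = np.column_stack([flow, unit_hl, leak, press, elv])

loadings, ratio = pca_loadings(X)
summary = vector_summary(loadings, names)
combo = resultant(loadings, names, "UnitHL", "Leak")

for name in names:
    length, align = summary[name]
    print(f"{name:7s} length={length:.2f} alignment with PC 1={align:.2f}")
print("resultant of UnitHL and Leak:", np.round(combo, 2))
```

In this sketch a long vector means the parameter is well represented by the first two components, and a resultant vector gives the direction in which both of the combined parameters increase together. Whether that direction points towards a cluster of low efficiency pipes has to be checked against the bi-plot of the actual data.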
To understand how energy dynamics align with replacement programs based upon pipe age or pipe break rates, Figure 10 places the pipes selected by these methods side by side with the high and low efficiency pipes. The results show that the pipes that would be replaced based upon their age or break rate history are generally not the ones that are the least energy efficient. Although considering energy efficiency alone does not suffice for pipe replacement decision making, the difference in the outcome of the two approaches implies that energy efficiency should be considered in conjunction with other factors such as age, break rate, water quality, payback period and risk assessment to complement the bigger asset management picture, particularly as water utilities seek to become more energy efficient.

Conclusions
The goal of the present paper is to explore the patterns and relationships between energy metrics and hydraulic parameters to better understand which parameters have the greatest influence on energy performance of individual pipes. PCA is used to simultaneously visualize the relationships between energy metrics and hydraulic factors and to prioritize these parameters by their importance. This statistical approach helps to reduce the dimensionality of the dataset to allow for identification of combinations of factors that have significant influence on the energy performance of water mains. Two large ensembles comprising over 40,000 pipes are used to contrast the results for systems with and without leakage, and the comparison highlights the importance of considering leakage in large real-world systems when studying their energy dynamics.
The PCA mono-plots show that parameters such as flow, hydraulic proximity of pipes and unit headloss play more important roles in influencing the energy efficiency of pipes compared to leakage and pressure. Leakage and pressure, in turn, have a greater impact on the energy efficiency of pipes than diameter, C HW and Elv, which are not well-represented on any of the principal components. However, since leakage and C HW could change over time, it would be worthwhile for future studies to consider the time-based evolution of leakage and C HW.
The PCA bi-plots help to visualize and locate low- versus high-efficiency pipes in a two-dimensional space considering all hydraulic parameters and pipe-level energy metrics. In both leaky and non-leaky cases, clusters of high- and low-efficiency pipes are located on two opposite corners of the plot, and, when considered in conjunction with mono-plots, reveal combinations of hydraulic parameters that would more directly point towards these clusters. The hydraulic parameters of unit headloss and average flow, which are more explicitly involved in mathematical definitions of energy metrics, would serve best in the absence of energy metrics.
Overall, energy dynamics along with risk assessment, pipe break rates and age, and water quality, can help to prioritize the replacement and rehabilitation of pipes and should be considered as part of the bigger picture to improve overall water distribution system asset performance. This study has identified several metrics and parameters that could be useful for this purpose moving forward.