## 1. Introduction

Water distribution systems (WDSs) are strategic, infrastructural networks designed for providing potable water to consumers. However, ageing pipes and deteriorating components may often affect the level of service, sometimes also with negative effects on the quality of the resource. As a consequence, water utilities have to pursue a continuous process of monitoring, control and operational management, in order to guarantee the delivery of safe water even under occasional circumstances of high demand or adverse situations arising from climatic or accidental emergencies.

In the few last years, Water Safety Plans (WSPs) have been recognized a valid tool for a thorough identification of the various hazards and related impacts on a WDS [

1,

2], allowing water managers and public institutions to perform global risk assessment and to prioritize control measures and strategic interventions in order to enhance overall reliability [

3,

4]. In this context, many studies have focused on quantitative resilience and robustness evaluation (recent reviews are in [

5,

6]), but less on reliability analysis for water quality (see [

7] for a review).

One problem still unsolved is represented by the optimal design of monitoring systems for the prompt identification of random or targeted contamination events, which are, today, more reliable due to the rapid improvements achieved in sensor technology and data transfer and communications. Both single-objective and multi-objective optimization problems have been addressed [

8].

In the last two decades, Complex Network Theory (CNT) has received increasing attention for the comprehension of a wide spectrum of real systems, ranging from physical infrastructures to social communities [

9,

10]. In particular, the application of CNT to the design and operation of WDSs has attracted a growing number of researchers, because of its inherent capability of unveiling hidden properties not grasped by traditional analyses or modelling approaches [

11].

CNT has been adopted for evaluating the topological characteristics and the resilience of a system [

12], especially for expansion strategies [

13]. Several authors investigated vulnerability-related issues, like node vulnerability under cascading failures [

14,

15,

16], spectral methods to establish vulnerability areas [

17,

18], or methods for evaluating robustness under random or intentional attacks [

19]. Other studies focused on the analysis of the formation of isolated communities [

20] and on the segmentation of WDNs for the identification of District Metered Areas (DMAs) using general metrics and modularity [

21,

22]. The optimal sampling design has also been addressed with the modularity concept [

23] and through a combination of classical optimization and CNT [

24].

More recently, complex network theory has been adopted for the systematic classification of WDSs [

25], and for the optimal design of a water distribution system, adopting a tradeoff between network cost and flow entropy as a measure of network reliability [

26]. All such studies have been mainly focused on the topological aspects of WDSs. A step forward is taken in [

27], where the standard definition of three centrality measures is extended in order to take into account the intrinsic relevance of vertices beyond the mere topology.

In this paper, we adopt a different approach, analyzing the functional relationship within a system through extended period simulations and the source tracing of a contaminant accidentally injected at junctions. In this regard, we recall that random or accidental contamination events can occur in real systems subjected to transients (which sometimes determine situations of negative pressure) or, more frequently, when localized rehabilitation work requires the temporary isolation and emptying of a certain number of pipes, thus sometimes facilitating the intrusion of undesired substances. More specifically, we introduce a methodology for evaluating a synthetic vulnerability index in order to compare different WDSs and their robustness with reference to the number of potentially contaminated nodes.

## 2. Materials and Methods

#### 2.1. Complex Networks

Complex networks are ubiquitous in nature, society and technology [

28], and can be represented by a set of multiple interconnected components (vertices or nodes) and a set of links (or edges). A fundamental distinction may be introduced according to the type of connection: technological (or material) networks are made by physical links, while virtual, conceptual and functional networks are based on several types of relationship between nodes (social networks or the World Wide Web are typical examples). Each network can be identified by an adjacency matrix, from which many characteristics can be derived: degree distribution, centrality measures (degree, closeness and betweenness), spectral properties, metric and topological structure, and global invariants [

29,

30].

In particular, degree distribution is fundamental for obtaining useful insights into the structure of a network, and indeed, some seminal studies classified complex networks according to their degree distribution as random [

31], small-world [

32] and scale-free [

33]. Random and small-world networks are characterized by Poisson or exponential distributions, while scale-free networks exhibit power-law degree distributions, resembling their self-similarity and the fact they are formed by many nodes with a small number of connections and very few nodes with many connections, called “hubs”.

Traditionally, such classification has also been adopted for distinguishing complex networks according to their behavior arising as a consequence of random failures or targeted attacks, i.e., structure affects function [

34]; random networks are less vulnerable to intentional attacks, while scale-free networks are less vulnerable to random failures.

#### 2.2. Cayley Regular Graphs

In order to evaluate the Vulnerability Index, we adopt a reference system represented by a particular kind of regular and self-similar tree-like network, that is, a Cayley regular graph [

29];

Figure 1 shows an example with a bifurcation ratio of three,

X = 3 (five levels are represented). As the number of levels grows, the network resembles a fractal object [

35]. It is an idealized case of a WDS with only one reservoir, supplying a self-similar tree-like network.

This represents a simple case in which the adjacency matrix of the WDS is the same as that of the virtual network, which we can obtain from quali-quantitative simulations, as described in the following section. In this case, the degree distribution of the number of nodes supplying a given number of “downstream” nodes is exactly power-law (of course, such a system is most vulnerable to targeted attacks, meaning that it can be completely compromised with one attack at its main “hub”, or central supply node).

Some analytical expressions can be derived for such a simple system. The total number of nodes,

N_{t}(

l), supplied by a given node that is at level

l is given by:

where the symbol

l represents the number of the level counted from the origin of the graph (as shown in

Figure 1), so

l = 0, 1, 2, 3, 4, while

l’ is given by:

l’ =

L −

l − 1, and can be thought as a “reverse” coordinate with respect to

l.

N_{t}(

l) can also be expressed as:

and in terms of

l’, the following expression holds:

from which it is clearly seen that

N_{t}(

l’) behaves as a power law:

The number of nodes at level

l,

N_{S}(

l), is given by:

while the total number of nodes in the graph can be calculated as:

The number of nodes at level

l, dimensionalized with respect to the total number of nodes in the network, is then expressed as:

which can be written in terms of

l’:

We can easily obtain the cumulative value of

N_{S,a}(

l’), denoted

N_{S,a,cum}(

l’), as:

which is approximated by the following power law:

Thus, by comparing Equation (4) with Equation (10), we can conclude that:

The variable N_{S,a,cum}(l’) represents the quantity we need in order to calculate the distribution function of potentially contaminated nodes. From Equation (11), it is easily seen that N_{S,a,cum}(l’) scales linearly with respect to N_{t}(l’) on a log-log plot (with a negative slope). In a real system, where loops are present and the direction of flows is not easily determined, we need to perform hydraulic simulations. The following section describes the methodology adopted for calculating the distribution function for a generic WDS.

#### 2.3. Methodology for Determining the Virtual Network and Its Distribution Functions

The aim of the methodology is to assess the vulnerability of a WDS with respect to contamination threats. In other words, we qualify a system as highly vulnerable (high values of Vulnerability Index) if there is a high probability that big portions or even all of it could be potentially contaminated should a random attack occur (by random attack we mean an accidental injection of pollutant).

The key issue resides on the construction of a virtual network through quality simulations performed on the WDS, in such a way that we determine, for every junction, all the other “downstream” nodes that are hydraulically connected to it. As we will see, such a virtual network is much more dense than that of the underlying WDS, the former being much more connected and complex than the latter, having a planar and almost regular structure. In this respect, we recall that WDSs are city spatial networks with topologies constrained by several environmental factors, and need to be robust with respect to random failures and intentional threats; for such reasons, WDSs are not generally characterized by scale-free features [

36].

The analysis proceeds with the derivation of the out-degree distribution function of this virtual network, and analytical expressions are adopted for fitting the simulation results. In particular, the overall out-degree distribution is well approximated by a stretched exponential function, given by:

where

A,

B and

C are positive constants. The vulnerability of a WDS may be estimated as the “distance”, in a probabilistic sense, between the out-degree distribution of the corresponding virtual network and that of a “reference” system, which is, by definition, the least vulnerable with respect to random attacks. To this end, it is possible to measure such “distance” by integrating the difference between these two out-degree distribution functions. We will denote, by the

Vulnerability Index, the measure of such difference.

Since Equation (12) in not analytically integrable, we opt for the separate approximation of the out-degree distribution between a power-law (for low-to-medium values of downstream supplied nodes) and an exponential function (for higher values). In this way, an analytical expression of the Vulnerability Index may be derived.

Moreover, we make the following assumptions: (1) contamination events originate only from one substance injection event at nodes; (2) the hydraulic conditions are not altered by the presence of contaminants (or by the volume of the injection); (3) each node may be the source of injection with equal probability (we therefore consider the likely intrusion of contaminants along pipes as if it were at junctions); (4) injection is modeled as steady-state, continuous and constant throughout the simulation; (5) pollutant decay or deposition processes (or even interactions with pipe walls) are neglected.

The whole procedure may be summarized in the following main steps:

Extended period simulations and source tracing analyses are performed with Epanet, one for each junction as a trace node, with sufficient simulation time in order to reach steady-state (or cyclic) conditions.

A directed functional network is built, where the adjacency matrix is characterized by having 1 at position ij (for i ≠ j) if, and only if, the contaminant injected at the i-th node reaches the j-th node, with a predefined concentration threshold (typically, 0.1%).

The out-degree distribution of this directed, complex network is what we have previously called N_{S,a,cum}(l’), and may be plotted on a log-log scale with respect to N_{t}(l’), the number of nodes supplied.

A stretched exponential function is fitted to the results of numerical simulations in such log-log plots. This allows the evaluation of its scaling behavior and can be compared to the power-law distribution describing the tree-like regular network. Accordingly, two separate fittings may also be obtained, namely, a power-law and an exponential function for subsequent analytical derivation.

The analytical formula derived in

Section 2.4 allows an estimate of the vulnerability of the system. Thus, the more “distant” (in a probabilistic sense) the two distributions are, the more different the behavior of the real WDS is from a purely tree-like and auto-similar network, resulting in higher values of the index.

#### 2.4. Analytical Expression of the Vulnerability Index: A Pragmatic Derivation

The relationship between

N_{S,a,cum} and

N_{t} for a generic WDN may be approximated by a stretched exponential distribution on a log-log plot, as emerged from the results (see

Section 3). However, in order to derive an analytical expression for the Vulnerability Index, the first part of the out-degree distribution is approximated by a power-law, and the second part by an exponential fit. If we denote

N_{inf} = 1 and

N_{sup} as the lower and upper limits of the range for which power-law is adopted and

N_{sup} and

N_{max} as the corresponding limits for the exponential approximation, we can say that the following expressions hold (

x represents

N_{t}):

On the other hand, the Cayley graph represented in

Figure 1 is characterized by the following relationship, valid throughout the domain:

In this way, the analytical expression giving the Vulnerability Index

V_{I}—that is, the area between the out-degree distribution of the WDN and the tree-like Cayley graph—can be written as:

Keeping in mind that

N_{inf} = 1, simple manipulation gives the following expression:

## 4. Discussion and Concluding Remarks

The paper presents a new approach based on a synergy between Complex Network Theory and quali-quantitative Epanet simulations. The results obtained from source tracing analyses allow the determination of an adjacency matrix describing a virtual, directed network in which connections are established if, and only if, a substance injected at a given node reaches another one. In this way, a distribution function representing the potential contaminated nodes in a WDS can be derived, and its distance (in a probabilistic sense) from a theoretical benchmark tree-like, auto-similar network may provide an overall assessment of its vulnerability to random contamination threats.

The paper introduces a vulnerability index useful for WDS operators from a practical point of view, since it summarizes with only one number the tendency of a WDS to be resilient with respect to random injections. According to this number, different systems can be directly compared, and WDS operators may be supported in planning interventions in their assets. This could also help decision makers and WDS operators in developing Water Safety Plans for their systems.

It is interesting to note that the diagrams in the preceding figures describe a scaling law which, starting at the peripheral portions of a network, “goes up” towards the main points of supply. In other words, on the left of the plots, the results show what happens at the terminal units of a WDS (small values of N_{t}), while when moving to the right of the diagram, we are virtually going “upstream” in the water distribution system, towards the reservoirs (which supply a large amount of N_{t}).

The analysis of the cumulative distribution functions tells us that at the peripheral portions of a WDS, the scaling law resembles that of a scale-free network, well described by a power-law distribution. However, the overall behavior of a real WDS is actually quite different from that of a scale-free network, being distant from the theoretical line representing the power-law distribution. This is in agreement with what was already observed by other authors [

36,

43], because of the requirement that real systems should be reliable with respect to random failures and intentional threats.

Such results may also be interpreted as a confirmation of the growth and evolution of such man-made systems, where after the initial planning and construction phases (typically regular and carefully designed), an expansion and aggregation of portions follow, in order to progressively connect the peripheral areas to the main system. In this way, while the main skeleton is almost regular in structure (exponential-like distribution), the successive portions are more tree-like and characterized by a more fractal-like behavior, reflected in a power-law out-degree distribution function.

This observation leads us to the possibility of quantifying the overall vulnerability of a given WDS by looking at the difference between its cumulative distribution function and the theoretical line representing a tree-like auto-similar network, the latter described by a power-law distribution.

The vulnerability index is a dimensionless number representing the distance between two probability distributions, one related to a real WDS and the other represented by an ideal, tree-like and auto-similar network with only one supply node (reservoir). The more distant the two distributions are, the more the behavior of the real WDS is different from that of the ideal network. Since the ideal network is highly resilient with respect to random contamination events, the higher the value of the vulnerability index (like WDS G), the less resilient the real WDS is to random events.

In order to derive an analytical expression, we have to split the overall approximation in two separate fittings: the power-law for low-to-medium values of N_{t} and the exponential for higher values.

The results show that different systems can be compared and that a qualitative assessment of the vulnerability may be performed in order to prioritize interventions or for planning purposes when compiling water safety plans. The current limitation of the proposed approach resides in the assumption of the dynamics of contamination event, that is, the hypothesis that the injection lasts for a sufficient time until steady-state conditions are established all over the system. This is the very first step, however, but has the undoubted advantage of unveiling some basic mechanisms of WDS “functioning”, as well as the tendency of some networks to be more prone to diffuse contamination in many areas. The next step will be that of relying on more realistic contamination events as described by transient injections.

One interesting issue to put into evidence is that it is possible to identify some critical portions of a WDS that may be affected by contamination events, not only for failures occurring nearby, but also for accidental injections coming from distant portions of the system. In this regard, it must be remembered that we are dealing with a virtual, directed complex network, and such particular zones could also serve as a further criterion in defining DMAs as being less vulnerable from the safety point of view. Research is currently being done in this direction.