A Novel Graph-Based Vulnerability Metric in Urban Network Infrastructures: The Case of Water Distribution Networks

Abstract: The key contribution of this paper is to embed the analysis of the network in a framework based on a mapping from an input space, whose elements are the nodes of a graph or the entire graph, into an information space whose elements are probability distributions associated with objects in the input space. Specifically, a node is associated with the probability distribution of its node-to-node distances, and the whole graph with the aggregation of these node distributions. In this space two distances are proposed for the analysis: Jensen-Shannon and Wasserstein, based respectively on information theory and optimal transport theory. This representation makes it possible to compute the distance between the original network and the one obtained by the removal of nodes or edges, and to use this distance as an index of the increase in vulnerability induced by the removal. In this way a new characterization of vulnerability is obtained. This new index has been tested on two real-world water distribution networks. The results obtained are discussed alongside those which relate vulnerability to the loss of efficiency and those given by the analysis of the spectra of the adjacency and Laplacian matrices of the network. The models and algorithms considered in this paper have been integrated into an analytics framework which can also support the analysis of other networked infrastructures, including power grids, gas distribution, and transit networks.


Overview and Motivation
The main motivation of this paper is two-fold. The first is theoretical, meaning the introduction of a "rich" representation of a graph underlying a water distribution network as an element in a space of probability distributions. This space can be endowed with different distance measures which allow the computation of a new index of the dissimilarity between networks. The second is to show that the vulnerability index derived from this representation can offer additional insights to those derived from the loss of efficiency and the eigenvalue analysis of the adjacency and Laplacian matrices.
Indeed, the availability of new vulnerability measures is important in the analysis of networked infrastructures such as water, energy, and transport, which have developed similar functional and structural features in their evolution over time: spatial, but also financial, constraints have significantly restricted their connectivity and their capability to deliver their service with failed or damaged components, in short, their robustness. The above constraints have also generated systemic risk and cascading effects, exacerbated by the complexity of infrastructures with a large number of components: pipes, valves, pumping stations, tanks, and consumption points in the case of water distribution networks; generation structures, switching substations, and high-voltage connections in power grids.

The Contributions of This Paper
The main contribution of this paper is to propose a novel vulnerability measure which can be used alongside other measures in order to give additional insight into the structural features of the network.
This result is based on the introduction of a mapping from an "input" space, whose elements are graphs or graph elements like nodes or edges, to a probabilistic space whose elements are probability distributions associated with elements of the input space. The use of a probabilistic distance between elements of the probabilistic space, the Wasserstein distance in particular, can be specialized to discrete distributions and particularly histograms.
Histograms are suitable to represent node-to-node distance distributions in the graph model of the WDN. This allows the introduction of a new set of vulnerability metrics given by the distance between the probability distributions of node-node distances between the original network and that resulting from the removal of nodes/edges. Two such probabilistic measures have been analyzed: Jensen-Shannon (JS), based on information theory, and the Wasserstein (WST) distance, an instance of optimal transport. The computational results confirm that the value of the distances JS and WST is strongly related to the criticality of the removed edges.
There are two major advantages of the Wasserstein distance: the first is that JS might become undefined in many situations while WST distances are generally well defined and provide an interpretable distance metric between distributions.
The second is that, under quite general conditions, the WST distance is a differentiable function of the parameters of the distributions which makes possible its use to assess the sensitivity of the network robustness to distributional perturbations.
A general methodological scheme is proposed connecting different modelling and computational elements, concepts, and analysis tools; it enables an analysis framework suitable for assessing robustness also of other networked infrastructure like energy, gas, and transport. This framework has been designed, implemented, and tested on two real-life urban networks; it can support decision-making both at the design stage, to simulate alternative network layouts of different robustness, and at the operational stage where it is necessary to make a decision about which nodes/edges are to be temporarily removed for maintenance and rehabilitation.

Organization of the Paper
The structure of this paper is as follows: Section 2 gives background notions on graph models and network analysis; Section 3 contains background material on the spectral analysis, including spectral clustering, and the measures of vulnerability based on the notion of efficiency. Section 4 introduces the new methodology based on probabilistic measures of distance between networks. Section 5 describes the different WDNs used in this study, the computational results, and their discussion. Section 6 describes the modeling and algorithmic structure of the analysis framework proposed. Finally, Section 7 contains some conclusions and perspectives.

Mathematical Background
Graph theory is the mathematical basis to provide a unifying language for the study of networks. With this in mind, it is useful to give some basic definitions which will be used in the sequel. For a wide-ranging analysis of the role of graph theory in the analysis of networks the reader is advised to look at [29].

Graph Theory
Let us denote a graph by G = (V, E), where V is the set of nodes and E is the set of edges, with n = |V| and m = |E|. Each edge of G is represented by a pair of nodes (i, j) with i ≠ j and i, j ∈ V. If (i, j) ∈ E, i and j are called adjacent nodes. A graph G is undirected if (i, j) and (j, i) represent the same edge. A graph G is simple if no self-loops are admitted (edges starting from a node and ending on the same node) and at most one edge can exist between each pair of nodes (i, j), with i ≠ j. The adjacency relationship between the nodes of G can be represented through a non-negative n × n matrix A (i.e., the adjacency matrix of G). The entry $a_{i,j}$ of the adjacency matrix A is 1 if i and j are adjacent nodes (i.e., (i, j) ∈ E), and 0 otherwise. Furthermore, $a_{i,j} = a_{j,i}$ if G is undirected, and the diagonal entries $a_{i,i}$ are 0 if G is simple.

• The degree $k_i$ of the node i is the number of edges having i as one of their two end nodes: $k_i = \sum_{j=1}^{n} a_{i,j}$. Any edge having i as one of its nodes is called incident to i. When G is directed, meaning that the order of the two nodes of an edge is relevant for its definition, $k_i$ can be split into out-degree (number of edges having i as first node) and in-degree (number of edges having i as second node).
• A path in a graph is a sequence of nodes connected by edges, and the length of the path is the number of its edges. A connected component is a maximal subgraph in which every node can be reached from every other, maximal meaning that no other node of the graph can be added to it with all the nodes still being connected.
• The shortest path between i and j is the path with the smallest length. This length is called the distance between i and j and is denoted by $d_{i,j}$. The largest distance among all possible pairs of nodes in G is named the diameter D(G).
• The characteristic path length is the average distance over every possible pair of nodes (i, j).
A useful representation is to arrange the distances in the distance matrix $D = [d_{i,j}]$, i, j = 1, ..., n.
The maximum entry of row i, $\max_{j=1,\dots,n} d_{i,j}$, is also known as the eccentricity of node i.
The maximum eccentricity among the nodes is equal to D(G).
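The definitions above can be illustrated with a minimal sketch, assuming an unweighted undirected graph stored as adjacency sets. The toy graph and the helper names (`build_adjacency`, `bfs_distances`) are hypothetical and are not taken from the paper.

```python
from collections import deque

def build_adjacency(n, edges):
    """Adjacency sets of an undirected simple graph with nodes 0..n-1."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj

def bfs_distances(adj, source):
    """Hop distances d_{source,j} to every node reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Hypothetical toy graph: a 4-cycle 0-1-2-3-0 with a pendant node 4 on node 0.
n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4)]
adj = build_adjacency(n, edges)

degree = {i: len(adj[i]) for i in adj}                   # k_i
dist = {i: bfs_distances(adj, i) for i in adj}           # rows of the distance matrix
eccentricity = {i: max(dist[i].values()) for i in adj}   # max entry of row i
diameter = max(eccentricity.values())                    # D(G)
pairs = [(i, j) for i in adj for j in adj if i < j]
char_path_length = sum(dist[i][j] for i, j in pairs) / len(pairs)

print(degree, eccentricity, diameter, char_path_length)
```

Breadth-first search is sufficient here because all edges have unit length; a weighted graph would need a shortest-path algorithm such as Dijkstra's.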
A core concept is centrality, which addresses the question "which are the most important nodes in a network?". There are many centrality measures, from the simplest, like node degree, which can nonetheless be illuminating, to eigenvector-based measures like PageRank.

Network Analysis: The Basic Measures
The density q of the network is the fraction of possible edges which are present in the network:
$$q = \frac{m}{\binom{n}{2}} = \frac{2m}{n(n-1)}.$$
The number of edges is $m = \frac{1}{2}\sum_{i=1}^{n} k_i$. If c is the mean node degree, $c = \frac{1}{n}\sum_{i=1}^{n} k_i$, we get $c = \frac{2m}{n}$ and $q = \frac{c}{n-1}$. The density lies in the range [0, 1]. A cut-set, specifically a node cut-set, is a set of nodes whose removal disconnects two given nodes i and j. A minimum cut-set is the smallest cut-set; edge cut-sets are defined analogously.
The centrality measures address the issue of the relative importance of nodes/edges. The most widely used measures are:
• Closeness centrality: is based on the mean distance from i to j, averaged over all nodes j.
• Betweenness centrality: measures the extent to which a node lies on the paths between other nodes. An edge betweenness, which counts the number of shortest paths that run along the edge, can be similarly defined.
• Link-per-node ratio (e): the number of edges of a graph with respect to the number of its nodes, $e = \frac{m}{n}$.
• Central point dominance $c_b$: based on betweenness centrality, it is a measure for characterizing the organization of a network according to its path-related connectivity:
$$c_b = \frac{1}{n-1}\sum_{i=1}^{n}\left(b_{\max} - b_i\right),$$
where $b_i$ is the betweenness centrality of the node i and $b_{\max}$ is the maximum value of betweenness centrality over all the n nodes of the network.
• The clustering coefficient (CC) is the number of triangles with respect to the overall number of possible connected triples, where a triple consists of three nodes connected by at least two edges while a triangle consists of three nodes connected by exactly three edges:
$$CC = \frac{3 N_{\triangle}}{N_3},$$
where $N_{\triangle}$ is the number of triangles and $N_3$ is the number of connected triples.
There are other definitions of CC, for which the reader is referred to [29]. To compute the centrality indices in this paper, the open-source software Cytoscape has been adopted [30]. This point will be further discussed in Section 3.
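As an illustration of the basic measures above, the following sketch computes the mean degree c, the density q, the link-per-node ratio e, and a clustering coefficient on a small hypothetical graph. The paper itself computes these indices with Cytoscape; the function name `basic_measures` and the triangle-counting convention (three times the number of triangles over the number of connected triples) are assumptions made for this example.

```python
from itertools import combinations

def basic_measures(n, edges):
    """Mean degree, density, link-per-node ratio and clustering coefficient."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    m = len(edges)
    c = 2 * m / n                  # mean node degree
    q = c / (n - 1)                # density
    e = m / n                      # link-per-node ratio
    # Connected triples centred on each node: pairs of distinct neighbours.
    triples = sum(len(adj[i]) * (len(adj[i]) - 1) // 2 for i in adj)
    # Closed triples: neighbour pairs that are themselves adjacent.
    # Each triangle is counted once per centre node, i.e. three times in total.
    closed = sum(
        1
        for i in adj
        for u, v in combinations(sorted(adj[i]), 2)
        if v in adj[u]
    )
    cc = closed / triples if triples else 0.0
    return {"mean_degree": c, "density": q, "link_per_node": e, "clustering": cc}

# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 on node 2.
measures = basic_measures(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
print(measures)
```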

Background on Vulnerability and Spectral Analysis
This section introduces background notions about the analysis of the graph eigen structure and how it is exploited to compute connectivity-based vulnerability measures.

Vulnerability Analysis Based on Efficiency
The performance of the network is often evaluated through the change of its efficiency, defined in [3] as:
$$E(G) = \frac{1}{n(n-1)} \sum_{i \neq j \in V} \frac{1}{d_{ij}}, \qquad (4)$$
where $d_{ij}$ represents the distance between i and j. Normalization by n(n − 1) ensures that E ≤ 1 in the case of an unweighted graph. The maximum value E = 1 is attained if and only if the graph is complete. A way to measure the vulnerability of the network is to use the loss of efficiency [3,31] observed when some nodes/edges are removed. The relative drop in the network efficiency (loss of efficiency) caused by the removal of a node i from the graph is defined as
$$\Delta E(i) = \frac{E(G) - E(G \setminus \{i\})}{E(G)}, \qquad (5)$$
where G\{i} denotes the network G without the node i. The maximum and the mean loss, over all the nodes, are given by:
$$\Delta E_{\max} = \max_{i \in V} \Delta E(i), \qquad (6)$$
$$\overline{\Delta E} = \frac{1}{n} \sum_{i \in V} \Delta E(i). \qquad (7)$$
Analogous formulas can be written for the removal of edges. Another useful reference for the analysis and generalization of efficiency measures is [31].
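A minimal sketch of the efficiency and of the relative loss of efficiency caused by a node removal is given below. Two conventions are assumptions of this example and are not stated in the text: unreachable pairs contribute zero (1/∞), and the efficiency of the reduced graph is normalized by its own number of nodes. The star graph and the function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def efficiency(adj):
    """E(G): mean of 1/d_ij over ordered pairs; unreachable pairs add 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for i in adj:
        for j, d in bfs_distances(adj, i).items():
            if j != i:
                total += 1.0 / d
    return total / (n * (n - 1))

def remove_node(adj, i):
    """Copy of the graph without node i and its incident edges."""
    return {u: {v for v in nbrs if v != i} for u, nbrs in adj.items() if u != i}

def loss_of_efficiency(adj, i):
    """Relative drop in efficiency caused by removing node i."""
    e0 = efficiency(adj)
    return (e0 - efficiency(remove_node(adj, i))) / e0

# Hypothetical toy graph: a star with centre 0 and leaves 1, 2, 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
losses = {i: loss_of_efficiency(star, i) for i in star}
max_loss = max(losses.values())
mean_loss = sum(losses.values()) / len(losses)
print(efficiency(star), losses, max_loss, mean_loss)
```

Under the normalization assumed here, removing a peripheral node can yield a negative loss, because the remaining graph is smaller and relatively better connected; normalizing by the original n(n − 1) instead would avoid this.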

Spectral Analysis
Spectral graph theory studies the eigenvalues of matrices that embody the graph structure. One of the main objectives in spectral graph theory is to deduce structural characteristics of a graph from such eigenvalue spectra.
In the case of undirected graphs, the adjacency matrix A(G) is symmetric and all its eigenvalues are real. The eigenvalues $\mu_1(G) \le \mu_2(G) \le \dots \le \mu_n(G)$ of A(G) are called the spectrum of G. The largest eigenvalue of the adjacency matrix, $\mu_n(G)$, is called the spectral radius of G and is denoted by ρ(G). An important property is given by the following inequality [32]:
$$\rho(G) \le \Delta(G) = \max_{i=1,\dots,n} k_i,$$
which relates the spectral radius with Δ(G), the maximum degree of the nodes.
The spectrum of A(G) allows the definition of the eigenvector centrality of the node i, $x_i = \frac{1}{\rho(G)} \sum_{j=1}^{n} a_{ij} x_j$. Katz centrality and the PageRank algorithm are just parametrized versions of eigenvector centrality [29]. The difference $s(G) = \rho(G) - \mu_{n-1}(G)$ between the spectral radius of G and the second largest eigenvalue of the adjacency matrix A(G) is called the spectral gap of G [33]. A small value of s(G) is usually associated with low connectivity and with the presence of bottlenecks and bridges whose removal cuts the graph into disconnected parts. A spectral distance between two networks has been proposed in [34] as the sum of the absolute differences of the eigenvalues of the adjacency matrix.
The Laplacian matrix of G is the n × n matrix $L(G) = \mathrm{diag}(k_1, \dots, k_n) - A(G)$, which is positive semi-definite in the case of simple graphs. The eigenvalues of L(G) are called the Laplacian eigenvalues of G. The Laplacian eigenvalues $\lambda_1(G) = 0 \le \lambda_2(G) \le \dots \le \lambda_n(G)$ are all real and nonnegative. The smallest eigenvalue is always equal to 0, with multiplicity equal to the number of connected components of G. The second smallest eigenvalue is called the algebraic connectivity of G, which is one of the most widely used measures of connectivity. Larger values of $\lambda_2(G)$ represent higher robustness against efforts to disconnect the graph: the larger it is, the more difficult it is to cut the graph into independent components. An important inequality for the algebraic connectivity [6] is:
$$\lambda_2(G) \le \frac{n}{n-1}\,\delta(G),$$
which relates it with the minimum degree of the nodes $\delta(G) = \min_{i=1,\dots,n} k_i$. In the case of a connected graph, the following inequality can also be proved [35]:
$$\lambda_2(G) \ge \frac{4}{n\,D(G)},$$
which relates the algebraic connectivity with the diameter of the graph. Another spectral distance is based on the analysis of the eigenvectors of the Laplacian [4].
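The spectral quantities discussed above can be computed numerically as in the following sketch, which assumes NumPy is available and uses a hypothetical toy graph (two triangles joined by a single edge). It is only an illustration and not the tool chain used in the paper.

```python
import numpy as np

def adjacency_matrix(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected simple graph."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
n = 6
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
A = adjacency_matrix(n, edges)
degrees = A.sum(axis=1)
L = np.diag(degrees) - A            # Laplacian matrix L(G)

mu = np.linalg.eigvalsh(A)          # adjacency eigenvalues, ascending order
lam = np.linalg.eigvalsh(L)         # Laplacian eigenvalues, ascending order

spectral_radius = mu[-1]            # rho(G)
spectral_gap = mu[-1] - mu[-2]      # s(G)
algebraic_connectivity = lam[1]     # lambda_2(G)

# Bounds quoted in the text: rho <= max degree, lambda_2 <= n/(n-1) * min degree.
print(spectral_radius <= degrees.max() + 1e-9)
print(algebraic_connectivity <= n / (n - 1) * degrees.min() + 1e-9)
print(spectral_gap, algebraic_connectivity)
```

`numpy.linalg.eigvalsh` is used because both matrices are symmetric; it returns real eigenvalues already sorted in ascending order.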

Graph Clustering
Given two disjoint subsets of V, $C_1$ and $C_2$, an n-dimensional vector z, i.e., $z_i = 1$ if $i \in C_1$ and $z_i = -1$ if $i \in C_2$, is used to represent the association of each node to cluster $C_1$ or $C_2$. The graph clustering problem can be formulated as the minimization of the following function f(z):
$$f(z) = \frac{1}{4} \sum_{i,j} L_{ij}\, z_i z_j,$$
where $L_{ij}$ are the entries of the Laplacian matrix. The important feature of spectral clustering methods is that they produce a set of balanced clusters. An elegant solution to the problem, conceptually simple but computationally inefficient, was proposed in [6], which identified the eigenvector corresponding to $\lambda_2$ (usually known as the Fiedler vector) as the vector z which provides the optimal bi-partitioning of the graph.
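A minimal sketch of bi-partitioning by the sign of the Fiedler vector follows, assuming NumPy and a hypothetical toy graph. Thresholding the eigenvector at zero is one common rounding choice and is an assumption of this example.

```python
import numpy as np

def fiedler_bipartition(n, edges):
    """Split the nodes by the sign of the Fiedler vector and list the cut edges."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    L = np.diag(A.sum(axis=1)) - A
    # eigh returns eigenvalues in ascending order with eigenvectors as columns,
    # so column 1 corresponds to lambda_2 (the Fiedler vector).
    _, eigenvectors = np.linalg.eigh(L)
    fiedler = eigenvectors[:, 1]
    c1 = {i for i in range(n) if fiedler[i] >= 0}
    c2 = set(range(n)) - c1
    cut = [(i, j) for i, j in edges if (i in c1) != (j in c1)]
    return c1, c2, cut

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
c1, c2, cut = fiedler_bipartition(6, edges)
print(c1, c2, cut)
```

The overall sign of an eigenvector is arbitrary, so which triangle ends up in `c1` may differ between runs or platforms; the partition itself and the cut edge do not.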
An effective computational scheme, proposed in [36], uses a data representation in the lower dimensional space spanned by the most relevant eigenvectors and has spawned many new methods analyzed in [37].
The basic steps are:
1. Construct an affinity matrix S(G) = I + A(G).
2. Construct the normalized matrix $L_N(G)$ from S(G).
3. Compute the eigenvalues of $L_N(G)$ and the eigenvectors corresponding to the K largest eigenvalues of $L_N(G)$, and denote them by $u_1, u_2, \dots, u_K$.
4. Build the matrix U such that the k-th column of U is $u_k$, and normalize the rows such that each row has unit length.
5. Treat the rows of U as points in the K-dimensional space $\mathbb{R}^K$ and perform K-means clustering of these points into K clusters.
The implementation of this method is given in the plugin ClusterMaker2 of Cytoscape. The main option is to set K manually either before the clustering process or after the eigenvalue calculation.
ClusterMaker2 offers an option to select the number of clusters automatically by evaluating the eigenvalues $\phi_i$, i = 1, ..., n of S(G). K is the smallest integer i such that the ratio $\phi_i / \phi_{i+1} > \varepsilon$. The parameter ε can be tuned; smaller values imply a few larger clusters while larger values generate more fine-grained clusters.

Vulnerability Analysis Using Spectral Analysis
There is no specific formula, contrary to those reported in the previous subsections, linking spectral analysis to a measure of vulnerability related to the removal of a node. However, both algebraic connectivity λ 2 , the second smallest eigenvalue of the Laplacian L(G) and spectral gap s(G), the difference between the spectral radius of G and the second eigenvalue of the adjacency matrix A(G), are indicators of the difficulty of splitting the graph. The larger the algebraic connectivity, the more difficult it is to disconnect the graph. It is also related to the min-cut problem in spectral clustering [26]. A large value of the spectral gap, together with a uniform degree distribution, results in higher robustness against node and link failures. The larger the spectral gap the more robust is the network.

Probabilistic Measure of Distance between Networks
The measures introduced in Section 3 are based on distances and their average values. In this section a new analysis is performed in terms of discrete node-to-node distance distributions: for each node i = 1, ..., n of the graph G = (V, E), $P_i(k)$ is the fraction of nodes which are connected to i at a distance k.
The distance distribution over the whole graph is given by the aggregation of the node distributions:
$$P_G(k) = \frac{1}{n} \sum_{i=1}^{n} P_i(k).$$
Two graphs G and G′ are considered, with their distributions $P_G(k)$ and $P_{G'}(k)$, which will be referred to as p and p′.
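The construction of the aggregate node-to-node distance distribution can be sketched as follows. The sketch assumes a connected, unweighted graph and takes the node distribution as the fraction of the other n − 1 nodes found at each distance, averaged over the nodes; the helper names and the path graph are hypothetical.

```python
from collections import Counter, deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_distribution(adj):
    """Aggregate distribution of node-to-node distances, k = 1, ..., D(G)."""
    n = len(adj)
    aggregate = Counter()
    for i in adj:
        counts = Counter(d for j, d in bfs_distances(adj, i).items() if j != i)
        for k, c in counts.items():
            # Node distribution (fraction of the other nodes at distance k),
            # averaged over the n nodes of the graph.
            aggregate[k] += c / (n - 1) / n
    return dict(sorted(aggregate.items()))

# Hypothetical toy graph: the path 0-1-2-3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
P = distance_distribution(path)
print(P)
```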
In order to give an instance of the computation of the node-to-node distributions, a small synthetic water distribution network, Anytown (Figure 1), is considered [38]. The associated graph G consists of 25 nodes and 44 edges. G′ is the graph without the red edge.
In order to compare two probability distributions one can use many distance and divergence measures [28].
In this paper the following distances are considered:
• The Kullback-Leibler (KL) divergence:
$$D_{KL}(p \,\|\, p') = \sum_{k} p(k) \log \frac{p(k)}{p'(k)}$$
• The Jensen-Shannon (JS) divergence:
$$D_{JS}(p, p') = \frac{1}{2} D_{KL}(p \,\|\, m) + \frac{1}{2} D_{KL}(p' \,\|\, m), \qquad m = \frac{1}{2}(p + p')$$
The supports of $P_G(k)$ and $P_{G'}(k)$ are respectively the integers k = 1, ..., D(G) and k = 1, ..., D(G′). When G′ is derived from G by removing some edges, then D(G′) ≥ D(G). Since the distributions are represented by histograms (Figure 2), one can extend the support of G to that of G′ by setting $P_G(k) = 0$ for k = D(G) + 1, ..., D(G′).
Information-theoretic measures like KL and JS might become undefined if the compared distributions do not have identical support.

• The Wasserstein distance is:
$$W(p, p') = \inf_{\gamma \in \Gamma(p, p')} \mathbb{E}_{(x,y) \sim \gamma}\big[\,|x - y|\,\big],$$
where Γ(p, p′) denotes the set of all joint distributions γ(x, y) whose marginals are respectively p and p′. The Wasserstein distance is also called the Earth Mover (EM) distance. Intuitively, γ(x, y) indicates how much mass must be transported from x to y in order to transform the distribution p into the distribution p′. The Earth Mover's distance is the minimum cost of moving and transforming a pile of sand in the shape of p to the shape of p′. The cost is quantified by the amount of sand moved times the moving distance.
The EM distance is then the cost of the optimal transport plan. More generally, W(p, p′) can be generalized by an index q to become
$$W_q(p, p') = \left( \inf_{\gamma \in \Gamma(p, p')} \mathbb{E}_{(x,y) \sim \gamma}\big[\,|x - y|^q\,\big] \right)^{1/q}.$$
Let us now consider the case of two discrete distributions f(x) and g(x):
$$f(x) = \sum_{i} f_i\, \delta(x - x_i), \qquad g(x) = \sum_{j} g_j\, \delta(x - y_j),$$
where $\sum_i f_i = \sum_j g_j = 1$. The unit cost of transport between $x_i$ and $y_j$ is defined as the q-th power of the Euclidean distance, $c_{ij} = \|x_i - y_j\|^q$. The transport plan $\gamma_{ij}$ represents the mass transported from $x_i$ to $y_j$. The WST distance between the discrete distributions f and g is:
$$W_q(f, g) = \left( \min_{\gamma} \sum_{i,j} \gamma_{ij}\, c_{ij} \right)^{1/q} \quad \text{subject to} \quad \sum_{j} \gamma_{ij} = f_i, \quad \sum_{i} \gamma_{ij} = g_j, \quad \gamma_{ij} \ge 0.$$
The constraints above ensure that the total mass transported from $x_i$ and the total mass transported to $y_j$ match respectively $f_i$ and $g_j$.
There are some particular cases, very relevant in applications, where WST can be written in an explicit form. Let P and P′ be the cumulative distribution functions of the one-dimensional distributions p and p′ on the real line, and $P^{-1}$ and $P'^{-1}$ be their quantile functions. Then
$$W_q(p, p') = \left( \int_0^1 \big| P^{-1}(t) - P'^{-1}(t) \big|^q \, dt \right)^{1/q}.$$
In the case of water distribution networks, the distributions of node-node distances are discrete and 1-dimensional with the same number N of samples, and the computation of WST reduces to the comparison of two 1-dimensional histograms, which can be performed by a simple sorting and the application of Formula (22):
$$W_q(f, g) = \left( \frac{1}{N} \sum_{i=1}^{N} \big| x_i^* - y_i^* \big|^q \right)^{1/q}, \qquad (22)$$
where $x_i^*$ and $y_i^*$ are the sorted samples. In this paper, q = 1. A key advantage of EM over JS is its differentiability with respect to distribution parameters, as shown by the following example.
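Let us consider Z ∼ U(0, 1), the uniform distribution on the unit interval. Let $P_0$ be the distribution of (0, Z) (0 on the x-axis and the random variable Z on the y-axis) and $P_\theta$ the distribution of (θ, Z). In this case $W(P_0, P_\theta) = |\theta|$, which varies continuously with θ, whereas the JS divergence between $P_0$ and $P_\theta$ equals log 2 for every θ ≠ 0 and 0 for θ = 0, so it is discontinuous at θ = 0 and provides no useful gradient. Therefore, Wasserstein provides a smooth measure which is useful for any optimization and learning process using gradient descent [39].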
The Wasserstein distance can be traced back to the works of Gaspard Monge [40] and Lev Kantorovich [41].
Recently it has also been used in generative adversarial networks [42]. Important references are [10,43], which also give an up-to-date survey of numerical methods.
Wasserstein distances are generally well defined and provide an interpretable distance metric between distributions. Computing Wasserstein distances requires in general the solution of a constrained linear optimization problem which has, when the support of the probability distributions is multidimensional, a very large number of variables and constraints.
The square root of the Jensen-Shannon divergence is a metric (D ∈ [0, 1]). The software used is the Wasserstein function from the Python library SciPy.
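The distances of this section can be sketched for two histograms on the integer support k = 1, 2, ... as follows. This is a self-contained illustration under stated assumptions and not the SciPy routine used in the paper: logarithms are taken in base 2 so that the JS divergence lies in [0, 1], the shorter histogram is zero-padded as described above, and the two example histograms are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL divergence in bits; infinite when q is zero where p has mass."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two aligned histograms."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def wasserstein_1(p, q):
    """W1 on unit-spaced integer support: sum of absolute CDF differences."""
    total, cdf_gap = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        total += abs(cdf_gap)
    return total

def wasserstein_1_sorted(x, y):
    """Sorted-sample form with q = 1, for equally sized samples."""
    xs, ys = sorted(x), sorted(y)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

# Hypothetical histograms: p on k = 1, 2 (zero-padded to k = 3), p' on k = 1, 2, 3.
p = [0.5, 0.5, 0.0]
p_prime = [0.5, 0.25, 0.25]

js = js_divergence(p, p_prime)
w1 = wasserstein_1(p, p_prime)
w1_samples = wasserstein_1_sorted([1, 1, 2, 2], [1, 1, 2, 3])
kl_reverse = kl_divergence(p_prime, p)   # infinite: p has no mass at k = 3
print(js, math.sqrt(js), w1, w1_samples, kl_reverse)
```

The samples passed to `wasserstein_1_sorted` encode the same two histograms, so both Wasserstein computations agree. SciPy's `scipy.stats.wasserstein_distance`, called with the support values and the histogram weights, should return the same value for this one-dimensional case.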

The Network Models
In this section, two WDNs are analyzed.
• Neptun is the WDN of the Romanian city of Timisoara (Figure 4), with an associated graph of 333 nodes and 339 edges, analyzed in the European project Icewater [44].
• Abbiategrasso refers to a pressure management zone in Milan (namely, Abbiategrasso) with an associated graph consisting of 1213 nodes and 1391 edges, analyzed in the European project Icewater [44].
In analyzing WDNs one must consider that most of the end-users are supplied by single connections. To avoid a bias in the analysis, a preliminary preprocessing has been performed by cutting the final connections.

Clustering
Graph clustering approaches, such as Spectral Clustering, can be used to identify the specific edges (pipes) whose removal may induce a disconnection of the network. In this paper, Spectral Clustering has been performed (through Cytoscape's plug-in named ClusterMaker2) to identify sub-networks connected by a limited (minimal) number of edges, that is, pipes whose breakage implies the disconnection of some portion of the WDN. The number of clusters K is set according to context information about the districtualization adopted by the water utility. In the following figures these pipes are highlighted; it is important to note that breakages must occur, at the same time, on all the different red edges to imply a hydraulic disconnection. Breakages affecting only one pipe may imply a reduction in the efficiency of the network and an increase in vulnerability.
Table 1 exhibits the basic measures of Neptun (Figure 3). It is evident that WDNs have specific features: the density is very low and correspondingly the link/node ratio is close to one. Critical edges (red) highlight a cut-set in Neptun consisting of 2 edges (bridges) whose simultaneous removal generates a disconnection.
Table 2 exhibits the efficiency-related measures as defined in Section 3.1 by Equations (4), (6), and (7) and the algebraic connectivity defined in Section 3.2.
Table 3 exhibits the probabilistic distances and loss of efficiency. By Jensen-Shannon we mean the distance $D_{JS}(G, G')$. Wasserstein is the distance $W(G, G')$. Loss of efficiency is defined by Equation (5) in Section 3.1.
Table 4 exhibits the basic measures of Abbiategrasso (Figure 4). It is evident that WDNs have specific features: the density is very low and correspondingly the link/node ratio is close to one. Critical edges (red) highlight a cut-set in Abbiategrasso consisting of 3 edges (bridges) whose simultaneous removal generates a disconnection.
Table 5 exhibits the efficiency-related measures as defined in Section 3.1 by Equations (4), (6), and (7) and the algebraic connectivity defined in Section 3.2.
Table 6 exhibits the probabilistic distances and loss of efficiency. By Jensen-Shannon we mean the distance $D_{JS}(G, G')$. Wasserstein is the distance $W(G, G')$. Loss of efficiency is defined by Equation (5) in Section 3.1.

Discussion of the Computational Results
WDNs have their own specific features: the two real-world WDNs analyzed are very sparse (with density q lower than or equal to 0.006). In particular, their degree distribution does not follow a power law, and their connectivity measures, given in Tables 1 and 5 respectively for Neptun and Abbiategrasso, clearly set them apart from other kinds of networks such as transportation, communication, and social networks. The near-planarity of WDNs, as suggested in [11], might be the reason behind this deviation. Indeed, WDNs are sparse near-planar graphs whose structure is the result of urban growth and unplanned expansion.
The computational results show that probabilistic distance measures have a good capacity to discriminate between different networks not only globally but also edge-wise.
They can support critical tasks of WDN management by just using topological and geometric information.
This remarkable result is displayed in Figures 5 and 6, respectively for Neptun and Abbiategrasso. Given the graph G = (V, E) associated to the network, each edge is represented by a pair of adjacent nodes (i, j). The removal of (i, j) yields G\{(i, j)}, for which we compute the aggregate node-node distance distribution p(G\{(i, j)}) and the Wasserstein distance W(p(G), p(G\{(i, j)})), whose value is represented by the color associated to each edge (i, j).
The Wasserstein metric can be regarded as a natural extension of the Euclidean distance to statistical distributions via a single metric while still exploiting all the information present in the distributions. An instance in which the node-to-node distribution carries information not captured by the mean is given in Figure 6.
The graphs G in Neptun (Table 3) and G in Abbiategrasso (Table 5)

The Analysis Framework
The schema in Figure 8 summarizes the proposed analytical framework, and how network theory measures, graph clustering, and probabilistic distance measures are combined with the aim to (i) identify critical links (i.e., the WDN's pipes) and (ii) quantify the impact implied by their individual removal (i.e., pipe breakage) in terms of loss of efficiency and distance between the original and the induced graph (i.e., structural difference between the original WDN and the damaged one). As the main result, this supports the water utility in defining a preventive rehabilitation plan to improve the WDN's robustness, targeting the most relevant WDN components under budget constraints. All the different elements, concepts, and tools are connected to one another to create an analysis framework which has been tested not only on synthetic networks but also on two real-life urban networks.
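The edge-wise part of the framework (remove one edge at a time, recompute the distance distribution, and score the edge by the Wasserstein distance from the original distribution) can be sketched as follows. This is a simplified illustration and not the authors' implementation: the histogram is normalized over reachable ordered pairs, which is an assumption that matters only if a removal disconnects the graph, and the function names and the toy graph are hypothetical.

```python
from collections import Counter, deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_histogram(adj):
    """Normalized histogram of hop distances over reachable ordered pairs."""
    counts = Counter()
    for i in adj:
        for j, d in bfs_distances(adj, i).items():
            if j != i:
                counts[d] += 1
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}

def wasserstein_1(p, q):
    """W1 between two histograms on integer support, via CDF differences."""
    kmax = max(max(p, default=0), max(q, default=0))
    total, gap = 0.0, 0.0
    for k in range(1, kmax + 1):
        gap += p.get(k, 0.0) - q.get(k, 0.0)
        total += abs(gap)
    return total

def rank_edges(n, edges):
    """Score every edge by W1 between the original and the reduced graph."""
    def adjacency(edge_list):
        adj = {i: set() for i in range(n)}
        for i, j in edge_list:
            adj[i].add(j)
            adj[j].add(i)
        return adj

    base = distance_histogram(adjacency(edges))
    scores = {}
    for edge in edges:
        reduced = [f for f in edges if f != edge]
        scores[edge] = wasserstein_1(base, distance_histogram(adjacency(reduced)))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy network: a triangle 0-1-2 and a 4-cycle 2-3-4-5 sharing node 2.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 2)]
ranking = rank_edges(6, edges)
for edge, score in ranking:
    print(edge, round(score, 4))
```

Recomputing all shortest paths for every removed edge is quadratic in the network size per edge; this is acceptable for a sketch but would likely need an incremental or sampled variant on networks with thousands of pipes.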


According to the schema, as a first step, all the relevant network metrics are computed on the graph G associated with the WDN. Then, spectral clustering is performed to identify the links whose removal leads to a disconnection of the graph. Iteratively, each of these critical links is removed from G, leading to an induced graph. Both the network measures for the induced graph and the distance between G and the induced graph are computed and collected to rank the critical links with respect to the impact of their removal from G, in terms of loss of efficiency and distance.
It is important to remark that, for the purposes of this study, the analysis has been performed by considering all the edges of G, to show empirically that the critical ones are actually those whose removal implies the highest loss of efficiency and the highest distance value.

Conclusions
Topological network techniques offer significant answers regarding the structure and functions of WDNs. As remarked in [11] "a realistic assessment of the network structure, efficiency or vulnerability should avoid attempting an exclusive characterization of network structure of using only single (or even a few) network measurements as ultimate indicators." The goal of this paper is to add another characterization of network robustness to the already existing measures.
The main conclusions from the results reported in this paper can be summarized as follows:

•
The main result of this paper is that probabilistic measures based on the probability distribution of node-to-node distances yield a distance between the original network and the one resulting from the removal, which can provide a set of new indicators of the increase in vulnerability.

•
The first such measure is given by the Jensen-Shannon divergence, based on the Kullback-Leibler divergence; the second is the Wasserstein-1 (also called the "Earth-Mover") distance, an instance of optimal transport. The computational results confirm that the values of the JS and WST distances are strongly related to the criticality of the removed edges. The key advantage of the Wasserstein distance is that it is generally well defined and provides an interpretable distance metric between distributions. Moreover, under quite general conditions, it is a differentiable function of the parameters of the distributions. The differentiability of the Wasserstein distance is important to assess the sensitivity of the network robustness.

•
All the different elements, concepts, and tools are connected into an analysis framework which has been tested not only on synthetic networks but also on two real-life urban networks.

•
This analysis framework supports decision-making at the design stage, to simulate alternative network layouts of different robustness, and also at the operational stage, where it should be decided which nodes/edges are to be temporarily removed for maintenance and rehabilitation. Indeed, critical tasks of WDN management can be supported by just using topological and geometric information.

•
The main findings in this paper, as well as the modeling and algorithmic framework developed, can be straightforwardly translated to many networked infrastructures, among which are power grids, transit networks, and also global supply chains, whose vulnerability has been exposed in the recent COVID crisis.
Author Contributions: All authors contributed equally to the paper. All authors have read and agreed to the published version of the manuscript.