A New Evolutionary Approach to Optimal Sensor Placement in Water Distribution Networks

: The sensor placement problem is modeled as a multi-objective optimization problem with Boolean decision variables. A new multi objective evolutionary algorithm (MOEA) is proposed for approximating and analyzing the set of Pareto optimal solutions. The evaluation of the objective functions requires the execution of a hydraulic simulation model of the network. To organize the simulation results a data structure is proposed which enables the dynamic representation of a sensor placement and its ﬁtness as a heatmap. This allows the deﬁnition of information spaces, in which the ﬁtness of a placement can be represented as a matrix or, in probabilistic terms as a histogram. The key element in the new algorithm is this probabilistic representation which is embedded in a space endowed with a metric based on a speciﬁc notion of distance. Among several distances between probability distributions the Wasserstein (WST) distance has been selected: WST has enabled to derive new genetic operators, indicators of the quality of the Pareto set and criteria to choose among the Pareto solutions. The new algorithm has been tested on a benchmark water distribution network with two objective functions showing an improvement over NSGA-II, in particular for low generation counts, making it a good candidate for expensive black-box multi-objective optimization


Introduction
In this paper, a new algorithm is proposed for effectively and efficiently monitoring the spread of "effects" triggered by "events". Many real-world problems fit into this general framework among which water distribution networks (WDNs), where sensing spots are sensors deployed at specific locations (i.e., nodes of the network), events are natural/intended contaminations and effects are spatio-temporal concentrations of the contaminant. This problem is usually known as optimal sensor placement (SP), where the goal is to optimize several objectives such as the amount of contaminated water, the number of inhabitants affected before detection, the detection time or the detection likelihood so that we faced with a multi-objective optimization problem (MOP). Due to potentially conflicting objectives, generally there is not a unique solution to multi-objective optimization problems but a set of solutions representing a trade-off between different objectives. This trade-off is characterized by the notion of dominance. The set of solutions representing the optimal (non-dominated) trade-offs between objectives is referred to as the Pareto set. Its image in the objectives space is called the Pareto front.
Exactly locating the complete Pareto set may be not possible, even for cheap objective functions: indeed, searching for an optimal sensor placement can be shown to be NPhard. Therefore, even for mid-size networks, efficient approximate methods are required, among which evolutionary approaches are often used. The overall goal of multi objective optimization is to generate a good approximation of the Pareto set.
Time to detection is the key element in the optimal sensor placement problem upon whose value the other objectives of a sensor placement can be computed.
The main motivation of this study is that considering only the average value, over all the events, in an objective function could be misleading and entail significant risks. In order to mitigate this risk, the standard deviation of detection time is considered as a companion objective. The idea has been put forward in different papers [1,2].
Evaluating the above objective functions is expensive because it requires to perform the hydraulic simulation for a large set of contamination events, each one associated to a different location where the contaminant is injected.
Simulating the contaminant propagation allows to compute the minimum detection time (MDT) provided by a specific sensor placement (SP) over all the simulated events and its standard deviation.
Literature review: Evolutionary algorithms (EA) have a long history in multi-objective optimization, with many successful applications to sensor placement in water distribution networks (WDN's) since the challenge "Battle of Water Sensors Networks" and many contributions have used NSGA-II [2][3][4]. More recently [5] considers the potential variations of model contamination probabilities within the WDN and introduce a regret based scalarization function based on the Chebyshev distance in the NSGA-II framework. An interesting methodology conditional value at risk (CVar) has been developed in [6], using four objectives and a criterion to choose from the Pareto solution set. Their approach considers quantiles to model extreme losses, in terms of detection time and affected population, caused by uncertain parameters. They also use a specific data structure storing all EPANET outputs for single point injection scenarios and extract from them multi-point water quality matrix based on superposition principles. A closely related paper to ours is [2] in which the conflicting objectives are the number of sensors (as a surrogate for their cost) and the risk of contamination defined as the product of the probability of not detecting the contaminant intrusion and the corresponding consequence expressed as the average volume of water consumed before detection. The algorithm NSGA-II is used to solve the MOP. Optimal sensor placement has been also addressed with methods from mathematical programming as in [7] which uses a mixed integer programming model solved by cPlex and a greedy randomized adaptive search procedure (GRASP) and [8] where a branch and bound sensor algorithm is proposed based on greedy heuristics and convex relaxation.
In this paper, the optimal sensor placement problem is modeled as a multi-objective optimization problem with Boolean decision variables and a new evolutionary algorithm for finding a good approximation of the Pareto set is proposed.
To organize the simulation results in a computationally efficient way a data structure is proposed collecting simulation outcomes for every SP which is particularly suitable for visualization and evolutionary optimization.
This data structure enables to visualize the dynamic representation of the spatiotemporal concentration of the contaminant and the detection time associated to a sensor placement. This structure is revealed through matrices represented by heat-maps which integrate a randomized set of contamination events.
Upon these heat-maps a mapping is built from the search space (where each SP is represented as a binary vector) into an information space, whose elements can be represented as a matrix or, as a histogram in the space of one-dimensional discrete probability distributions. The fitness of a sensor placement is represented probabilistically in the information space as the histogram of a discrete random variable.
The representation of a sensor placement as a histogram is the cornerstone of the new algorithm: it enables to measure the distance between sensor placements as a distance between distributions allowing a deep analysis of the search landscape and a significant speed-up in the optimization.
Among the many distances available the Wasserstein distance (WST) has been used in that provides a smooth and interpretable distance metric between distributions. This Water 2021, 13, 1625 3 of 18 probabilistic representation has allowed to derive new genetic operators and indicators of the quality of the Pareto set.
Distinguishing features of the new algorithm MOEA/WST are in particular the selection operator which drives a more effective diversification and exploration among candidate solutions and a problem specific crossover operator which generates two "feasible by design" children from two feasible parents. This new algorithm, called multi-objective evolutionary algorithm-Wasserstein (MOEA-WST) has been tested on two benchmark and two real-world water distribution networks resulting in significant improvement over a standard algorithm both in terms of hypervolume improvement and coverage, in particular at low generation counts.
After the formulation of the problem in Section 2, Section 3 describes the specific data structure to organize the simulation results in a computationally efficient way which is also suitable for visualization and evolutionary optimization. The mathematical background is explained in two sections: Section 4 covers the methodological background of the probabilistic characterization of the search space, in particular the introduction of the Wasserstein distance. Section 5 covers the basic notions of Pareto analysis of multiobjective optimization and generalizes them in the context of the probabilistic representation. Section 6 leverages all the elements previously introduced into the design of the new algorithm MOEA/WST stressing the role of the new operators. The computational results are presented and discussed in Section 7. The last section "Conclusions" elaborates on the advantages of the algorithm and on how the probabilistic representation has allowed substantial improvements in terms of hypervolume and coverage, requiring a significantly lower number of generations. This makes MOEA/WST a good candidate for expensive multi-objective optimization problems.

Problem Formulation
We consider a graph G = (V, E) We assume a set of possible locations for placing sensors, that is L ⊆ V. Thus, a SP is a subset of sensor locations, with the subset's size less or equal to p depending on the available budget. An SP is represented by a binary vector s ∈ {0, 1} |L| whose components are s i = 1 if a sensor is located at node i, s i = 0 otherwise. Thus, an SP is given by the nonzero components of s.
For a WDN the vertices in V represent junctions, tanks, reservoirs or consumption points, and edges in E represent pipes, pumps, and valves.
Let A ⊆ V denote the set of contamination events a ∈ A which must be detected by a sensor placement s, and d ai the impact measure associated to a contamination event a detected by the i-th sensor.
A probability distribution is placed over possible contamination events associated to the nodes. In the computations we assume-as usual in the literature-a uniform distribution, but in general discrete distributions are also possible. In this paper, we consider as objective functions the detection time and its standard deviation.
We consider a general model of sensor placement, where d ai is the "impact" of a sensor located at Node I when the contaminant has been introduced at Node a.
• α a is the probability for the contaminant to enter the network at Node a. • d ai is the impact for a sensor located at Node i to detect the contaminant introduced at Node a. • x ai = 1 if s i = 1, where i is the first sensor detecting the contaminant injected at Node a; 0 otherwise.
• p is the budget available in terms of number of sensors.
In our study we assume that all the events have the same chance of happening, that is α a = 1/|A|, therefore f 1 (s) is: As a measure of risk, we consider the standard deviation This model can be specialized to different objective functions as: f 1 is the average over the contamination events of the detection time for each event. For each event a and sensor placement s the minimum detection time is defined as MDT a = min Witht a the minimum time step at which concentration reaches or exceeds a given threshold τ for the event a. f 2 is the standard deviation of the sample average approximation of f 1 .

WNTR
The Water Network Tool for Resilience (WNTR) [9] is a Python package designed to simulate and analyze resilience of WDNs. WNTR is based on EPANET 2.0, which is a tool to simulate flowing of drinking water constituents within a WDN.
The simulation is computationally costly as we need one execution for each contamination event.
In our study, each simulation has been performed for 24 h, with a simulation step of 1 h. Assuming L = V and A = V (i.e., the most computationally demanding problem configuration) the time required to run a simulation for the synthetic example called Net1 (available with EPANET and WNTR, and whose associated graph is depicted in Figure 1) is 2 s. The simulation time scales linearly with the inverse of the simulation step. The detection threshold is 10% of the initial concentration.
• is the impact for a sensor located at Node to detect the contaminant introduced at Node .
where is the first sensor detecting the contaminant injected at Node ; 0 otherwise. • is the budget available in terms of number of sensors.
In our study we assume that all the events have the same chance of happening, that is α = 1 | | ⁄ , therefore ( ) is: where ̂ = ∑ ,..,| | is the MDT of event . As a measure of risk, we consider the standard deviation This model can be specialized to different objective functions as: is the average over the contamination events of the detection time for each event. For each event and sensor placement the minimum detection time is defined as = min : .
With ̂ the minimum time step at which concentration reaches or exceeds a given threshold for the event .
is the standard deviation of the sample average approximation of .

WNTR
The Water Network Tool for Resilience (WNTR) [9] is a Python package designed to simulate and analyze resilience of WDNs. WNTR is based on EPANET 2.0, which is a tool to simulate flowing of drinking water constituents within a WDN.
The simulation is computationally costly as we need one execution for each contamination event.
In our study, each simulation has been performed for 24 h, with a simulation step of 1 h. Assuming = and = (i.e., the most computationally demanding problem configuration) the time required to run a simulation for the synthetic example called Net1 (available with EPANET and WNTR, and whose associated graph is depicted in Figure 1) is 2 s. The simulation time scales linearly with the inverse of the simulation step. The detection threshold is 10% of the initial concentration.

Single Sensor and Sensor Placement Matrices
Denote with S ∈ R (K+1)×|A| the so-called "sensor matrix" ( represents the concentration of the contaminant for the event a ∈ A at the simulation step t = 0, . . . , K, with T max = K∆t. (ℓ) , ℓ represents the concentration of the contaminant for the event ∈ at the simulation step = 0, . . , , with = ∆ .
Thus, in our study = 24, ∆ = 1 and = 24. Without loss of generality, we assume that the contaminant is injected at the beginning of the simulation (i.e., = 0).
Analogously, a "sensor placement matrix" (Figure 3), ( ) ∈ ℝ ( ) | | is defined, where every entry ℎ represents the maximum concentration over those detected by the sensors in , for the event a and at time step . Suppose to have a sensor placement consisting of sensors with associated sensor matrices , … , , then ( ) is the matrix with entries ℎ = max ,…, ∀ ∈ . There is a relation between s and the associated ( ) : more precisely, the columns of ( ) having maximum concentration at row = 0 (i.e., injection time) are those associated to events with injection occurring at the deployment locations of the sensors in .
Moreover, ( ) is the basic data structure on which MDT is computed. Indeed, we can now explicit the computation of ̂ in ( ) and ( ): ̂ is the minimum time step at which concentration reaches or exceeds a given threshold τ for the scenario a, that is Thus, in our study T max = 24, ∆t = 1 and K = 24. Without loss of generality, we assume that the contaminant is injected at the beginning of the simulation (i.e., t = 0).
Analogously, a "sensor placement matrix" (Figure 3), H (s) ∈ R (K+1)×|A| is defined, where every entry h ta represents the maximum concentration over those detected by the sensors in s, for the event a and at time step t. Suppose to have a sensor placement s consisting of m sensors with associated sensor matrices S 1 , . . . , S m , then H (s) is the matrix with entries h ta = max

The Probabilistic Characterization of the Search Space
The search space consists of all the possible SPs, given a set of possible locations for their deployment, and resulting feasible with respect to the constraints in (P). Formally, s ∈ Ω = ∈ {0,1} | | : ∑ | | ≤ . In this section, sensor placements will be associated to a discrete probability distribution, specifically histograms, and a distance between them, namely the Wasserstein distance, will be introduced and briefly analyzed.

Histograms
In general terms a histogram is a function that counts the number of n observations of a random variable that fall into each of the disjoint categories (known as bins). Therefore, if k is the number of bins, the histogram satisfies the condition: There is a relation between s and the associated H (s) : more precisely, the columns of H (s) having maximum concentration at row t = 0 (i.e., injection time) are those associated to events with injection occurring at the deployment locations of the sensors in s.
Moreover, H (s) is the basic data structure on which MDT is computed. Indeed, we can now explicit the computation oft a in f 1 (s) and f 2 (s):t a is the minimum time step at which concentration reaches or exceeds a given threshold τ for the scenario a, that iŝ t a = min

The Probabilistic Characterization of the Search Space
The search space consists of all the possible SPs, given a set L of possible locations for their deployment, and resulting feasible with respect to the constraints in (P). Formally, In this section, sensor placements will be associated to a discrete probability distribution, specifically histograms, and a distance between them, namely the Wasserstein distance, will be introduced and briefly analyzed.

Histograms
In general terms a histogram is a function m i that counts the number of n observations of a random variable that fall into each of the disjoint categories (known as bins). Therefore, if k is the number of bins, the histogram m i satisfies the condition: To construct a histogram, the first step is to "bin" the range of values-that is, divide the support of the random variable into a number of intervals-and then compute the "weight" of the bin counting how many values fall into each interval. The bins are usually specified as adjacent, consecutive, non-overlapping intervals and are usually of the same size. If bins are the time subintervals of the simulation horizon and the weights are the number of events detected in each time subinterval (or their relative frequency) the histogram obtained is a rich representation of the information in H (s) about a placement ( Figure 4). We denote the time steps in the simulation cardinality is the number of contamination events, the bins are ∆t i and m i = |A i |.
To each sensor placement s we can associate not only the placement matrix H (s) but also the histogram h (s) whose bins are ∆t i and weights are |A i |.  We have added an "extra" bin (86,400 to 90,000) whose weight | | represents number of contamination events which were undetected during the simulation up to 86,400. In this way ∑ | | = 1.
The "ideal" placement is that in which | | = |A|. The relation between SPs and histograms is many to one: one histogram can be associated to different SPs.
Intuitively, the larger the probability mass in lower Δ the better is the sensor placement (Figure 4 left), the larger the probability mass.in the higher Δ the worse is sensor placement (Figure 4 right). The worst SPs are those for which no detection took place in the simulation horizon.

Wasserstein Distance
Let and ' be two continuous probability distributions. The Wasserstein ( ) distance is a measure of the distance between and ' given by: We have added an "extra" bin (86,400 to 90,000) whose weight |A k+1 | represents number of contamination events which were undetected during the simulation up to 86,400. In this way The "ideal" placement is that in which |A 1 | = |A|. The relation between SPs and histograms is many to one: one histogram can be associated to different SPs.
Intuitively, the larger the probability mass in lower ∆t i the better is the sensor placement (Figure 4 left), the larger the probability mass.in the higher ∆t the worse is sensor placement (Figure 4 right). The worst SPs are those for which no detection took place in the simulation horizon.

Wasserstein Distance
Let p and p be two continuous probability distributions. The Wasserstein (WST) distance is a measure of the distance between p and p given by: where Π(p, p ) denotes the set of all joint distributions γ(x, y) whose marginals are respectively p(x) and p (y).
In general terms, given two continuous random variables X and Y whose joint distribution f (X, Y) is known, then the marginal probability density function can be obtained by integrating the joint probability distribution, over Y, and vice versa. That is: Therefore, the marginal distribution over x adds up to ∑ x γ(x, y) = p (y) and analogously ∑ y γ(x, y) = p(x). The formula can be easily specialized to the case of two discrete random variables, X and Y. The marginal distribution of either variable-X(Y), given the joint probability distribution p(x , y), is given by: The expected cost averaged over all the (x, y) pairs can be computed as: The Wasserstein distance is also called earth mover's (EM) distance from its informal interpretation as the minimum cost of moving and transforming a pile of sand in the shape of the probability distribution p to the shape of the distribution p . The cost is quantified by the amount of sand moved times the moving distance.
If x is the starting point and y the destination the total amount of sand moved is γ(x, y) and the traveling distance is ||x − y|| and thus the cost is γ(x, y)||x − y||. One joint distribution γ(x, y) ∈ Π(p, p ) describes one transport plan: intuitively γ(x, y) indicates how much mass must be transported from x to y in order to transform the distribution p into the distribution p . The minimum among the costs of all sand moving solutions as the EM distance which is the cost of the optimal transport plan.
The Wasserstein (WST) distance fits in the framework of optimal transport theory. It can be traced back to the work of Gaspard Monge [10] and received its modern linear programming formulation by Lev Kantorovich [11]. Recently, the Wasserstein distance has become a key tool in image processing and machine learning, and it has been used also the generation of adversarial networks [12].
The formulation, computation and generalization of the WST distance require sophisticated mathematical models and raise challenging computational problems: important references are [13,14] which also gives an up-to-date survey of numerical methods. In the case of the sensor placement problem, the computation of WST reduces to the comparison of two 1-D histograms which can be done by simple sorting.
Among other probability distances, as Kullback-Leibler or Jensen-Shannon, WST has two key advantages. In addition, in the cases when the distributions are supported in different spaces, even without overlaps, WST can still provide a meaningful representation of the distance between distributions [15]. Another advantage of WST is its differentiability. Both points are illustrated in the following example [16].
Consider Z = U(0, 1) the uniform distribution on the unit interval. Let P be the distribution of (0, Z) (0 on the x axis and the random variable Z on the y axis and P θ = (θ, Z). Therefore, Wasserstein provides a smooth measure which is useful for any optimization and learning process using gradient descent [16].

Search Space and Information Space
Our search space consists of all the possible SPs, given a set L of possible locations for their deployment, and resulting feasible with respect to the constraint in (P). Formally the feasible set is Ω = s ∈ {0, 1} |L| : ∑ |L| i=1 s i ≤ p . The placement matrix H (s) allows the computation of f 1 and f 2 and of the histograms. Denote with π this computational process: We use φ H (s) to stress the fact that the computation is actually performed over H (s) -within the "information space"-and then it generates the observation of the two objectives ( f 1 (s), f 2 (s)) and the histograms h (s) h (s ) .
Each bin of the histogram h (s) represents the number of events that are detected in a specific time range by s. These values can be extracted from the placement matrix H (s) . Indeed, each column of this matrix represents an event, and the detection time of this event is given by the row in which the contaminant concentration exceed a given threshold τ.
A metric in the information space has been introduced in Section 4 using histograms and the WST distance. Let consider two values s, s ∈ {0, 1} |L| such that s 1 = 1 and s i+1 = s i with i = 1, . . . , |L|. If d(., .) is the Hamming distance, we have s, s : d(s, s ) = max x, x ∈{0,1} |L| d(x, x ). We obtain H (s) = H (s ) and ( f 1 (s), f 2 (s)) = ( f 1 (s ), f 2 (s )). This example shows how a distance in Ω can be highly misleading, in that two sensor placements s and s , distant in Ω, might correspond to close values of the placement matrices H (s) and H (s ) , leading to close points in the objective space. This means that the landscape of the problem in the search space might have a huge number of global (not only local) optima, also significantly distant among them in Ω. This information space can be endowed by metrics as matrix distance (e.g., Frobenius). In this paper, histograms are used as elements of the information space which has been endowed with by the Wasserstein distance (WST).
A graphical representation of the computational process π is given in Figure 5. . This example shows how a distance in Ω can be highly misleading, in that two sensor placements and ', distant in Ω, might correspond to close values of the placement matrices ( ) and , leading to close points in the objective space. This means that the landscape of the problem in the search space might have a huge number of global (not only local) optima, also significantly distant among them in Ω.
This information space can be endowed by metrics as matrix distance (e.g., Frobenius). In this paper, histograms are used as elements of the information space which has been endowed with by the Wasserstein distance (WST).
A graphical representation of the computational process is given in Figure 5. MOEA/WST is instantiated in the above framework according to the following steps: 1. The individuals of the population, that are sensor placements, in the evolutionary algorithm are represented as discrete probability distributions, namely histograms. 2. The space of histograms is endowed with a metric given by the WST distance be-

1.
The individuals of the population, that are sensor placements, in the evolutionary algorithm are represented as discrete probability distributions, namely histograms.

2.
The space of histograms is endowed with a metric given by the WST distance between.

3.
The results of WST based computations are mapped back into the search space.

Pareto Analysis and Quality Indicators of the Approximate Pareto Set
The multi-objective optimization problem (MOP) can be stated as follows: Pareto rationality is the theoretical framework to analyze multi-objective optimization problems where m objective functions f 1 (x), . . . , f m (x), where f i (x) :→ R are to be simultaneously optimized in the search space Ω ⊆ R d . Here we use x to be compliant with the typical Pareto analysis's notation, clearly in this study x is a sensor placement s.
Let u, v ∈ R m u is said to dominate v if and only if u i ≥ v i ∀i = 1, . . . , n and u j > v j for at least one index j.
The goal in multi-objective optimization is to identify the Pareto frontier of F(x). A point x * is pareto optimal for Problem 2 if there is no point x such that F(x) dominate F(x * ). This implies that any improvement in a Pareto optimal point in one objective leads to a deterioration in another. The set of all Pareto optimal points is the Pareto set and the set of all Pareto optimal objective vectors is the Pareto front (PF). The interest in finding locations x having the associated F(x) on the Pareto frontier is clear: all of them represent efficient trade-offs between conflicting objectives and are the only ones, according to the Pareto rationality, to be considered by the decision maker.
A fundamental difference between single and multi-objective optimization is that it is not obvious which metric to use to evaluate the solution quality.
To measure the progress of the optimization a natural and widely used metric is the hypervolume indicator [17], that measures the objective space between a non-dominated set and a predefined reference vector. An example of Pareto frontier, along with the reference point to compute the hypervolume, is reported in Figure 6. A fundamental difference between single and multi-objective optimization is that it is not obvious which metric to use to evaluate the solution quality.
To measure the progress of the optimization a natural and widely used metric is the hypervolume indicator [17], that measures the objective space between a non-dominated set and a predefined reference vector. An example of Pareto frontier, along with the reference point to compute the hypervolume, is reported in Figure 6.  The hypervolume indicator is the golden standard in evaluating multi-objective algorithm also because it has strict Pareto compliance. However, it is computationally inefficient.

The Structure of MOEA/WST Algorithm
This section contains the analysis of the new algorithm proposed. It is shown how all the mathematical constructs presented in the previous sections are structured in the MOEA/WST algorithm. Section 6.1 offers a global view of the interplay of all algorithmic components which are described in the following Sections 6.2-6.6.

Chromosome Encoding
In the algorithm, each chromosome (individual) consists in a | |-dimensional binary array that encodes a sensor placement. Each gene represents a node in which a sensor can be placed. A gene assumes value 1 if a sensor is located in the corresponding node, 0 otherwise.

Chromosome Encoding
In the algorithm, each chromosome (individual) consists in a |L|-dimensional binary array that encodes a sensor placement. Each gene represents a node in which a sensor can be placed. A gene assumes value 1 if a sensor is located in the corresponding node, 0 otherwise.

Initialization
The initial population has to be sampled. Our algorithm randomly samples the initial chromosomes. All the individuals in the population have to be different (sampling with-

Initialization
The initial population has to be sampled. Our algorithm randomly samples the initial chromosomes. All the individuals in the population have to be different (sampling without replacement). This method does not guarantee to have only feasible individual. Among this population we select the non-dominated solutions (i.e., the initial approximation of the Pareto set).

Selection
In order to select the pairs of parents to be mated using the crossover operation, we have introduced a problem specific selection method that takes place into the information space ( Figure 9). First, we randomly sample from the actual Pareto set two pairs of individuals (F 1 , If at least one individual of the pair of parents is not feasible (i.e., the placement contains more sensors than the budget ) the constraint violation ( ) is considered instead. Let = [ ] be a generic individual and the budget, the constraint violation is defined as follows:

Crossover
The standard crossover operators applied to sensor placement might generate unfeasible children which might induce computational inefficiency in terms of function evaluations. To avoid this in MOEA/WST, it has been introduced a problem specific crossover which generates two "feasible-by-design" children from two feasible parents Denote with , ∈ Ω two feasible parents and with (FatherPool) and ′ (Mother-Pool) the two associated sets = { : = 1} and ′ = { : ′ = 1}. To obtain two feasible children, and ' are initialized as [0, … ,0]. In turn, and ' samples an index from and from ′, respectively, without replacement. Therefore, the new operator rules out children with more than non-zero components.
Any distance could be considered, for instance the Frobenius distance between two placement matrices H (F i ) and H (M i ) related to F i and M i . In this paper, we used the Wasserstein distance between the histograms corresponding to the sensor placement F i and M i .
If at least one individual of the pair of parents is not feasible (i.e., the placement contains more sensors than the budget p) the constraint violation (CV) is considered instead. Let c = [c i ] be a generic individual and p the budget, the constraint violation is defined as follows: (CV(F i ) + CV(M i )).

Crossover
The standard crossover operators applied to sensor placement might generate unfeasible children which might induce computational inefficiency in terms of function evaluations. To avoid this in MOEA/WST, it has been introduced a problem specific crossover which generates two "feasible-by-design" children from two feasible parents Denote with x, x ∈ Ω two feasible parents and with J (FatherPool) and J (MotherPool) the two associated sets J = {i : x i = 1} and J = {i : x i = 1}. To obtain two feasible children, c and c are initialized as [0, . . . , 0]. In turn, c and c samples an index from J and from J , respectively, without replacement. Therefore, the new operator rules out children with more than p non-zero components.
In Figure 10, an example is shown comparing the behavior of our crossover compared to a typical 1-point crossover.

Mutation
The aim of mutation is to guarantee diversification in the population and to avoid getting trapped into local optima. We consider the bitflip mutation operator, for which a mutation probability is typically used to set the "relevance" of exploration in genetic algorithms. We have been using the bitflip mutation in Pymoo (each gene has a probability of mutation of 0.1).
We have considered the case that the available budget allows to have a maximum of 4 sensors in the optimal SP, that is = 4 in the constraints in (P)-valid also for the proposed bi-objective formulation. The value of has been defined according to a preliminary analysis on the single-objective problem (P): further increasing does not offer any further improvement of ( ).
We remind that an SP is represented through a binary vector with | | components, with at the most components equal to 1. The search space for this example is therefore quite limited, allowing us to solve the problem via exhaustive search. More precisely, only 561 SPs are feasible according to the constraints in (P). Thus, we exactly know the Pareto set, the Pareto frontier and the associated hypervolume.
We have used Pymoo both for implementing MOEA/WST and using directly NSGA-II. In both cases, we used 100 generations and a population size of 40.
After 100 generations, NSGA-II and MOEA/WST obtained the same results in terms of hypervolume, quantile and coverage. Figure 11 shows a significant difference in terms of the rate of convergence of the approximate Pareto to the optimal one between the two algorithms.

Mutation
The aim of mutation is to guarantee diversification in the population and to avoid getting trapped into local optima. We consider the bitflip mutation operator, for which a mutation probability is typically used to set the "relevance" of exploration in genetic algorithms. We have been using the bitflip mutation in Pymoo (each gene has a probability of mutation of 0.1).

Net
The synthetic example Net1 provided by EPANET and WNTR. The graph associated to Net1 is depicted in previous Figure 1. Net1 consists of 1 reservoir (at Location 2), 1 tank (at Location 1) and 11 junctions (nodes). The set of possible locations to deploy sensors are the 11 junctions, therefore the set L is L = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}. The same junctions but 1 and 2 are assumed the nodes where the contaminant can be injected at, therefore A = L\{1, 2}.
We have considered the case that the available budget allows to have a maximum of 4 sensors in the optimal SP, that is p = 4 in the constraints in (P)-valid also for the proposed bi-objective formulation. The value of p has been defined according to a preliminary analysis on the single-objective problem (P): further increasing p does not offer any further improvement of f 1 (s).
We remind that an SP is represented through a binary vector s with |L| components, with at the most p components equal to 1. The search space for this example is therefore quite limited, allowing us to solve the problem via exhaustive search. More precisely, only 561 SPs are feasible according to the constraints in (P). Thus, we exactly know the Pareto set, the Pareto frontier and the associated hypervolume.
We have used Pymoo both for implementing MOEA/WST and using directly NSGA-II. In both cases, we used 100 generations and a population size of 40.
After 100 generations, NSGA-II and MOEA/WST obtained the same results in terms of hypervolume, quantile and coverage. Figure 11 shows a significant difference in terms of the rate of convergence of the approximate Pareto to the optimal one between the two algorithms. Water 2021, 13, x FOR PEER REVIEW 15 of 24 Figure 11. Coverage between the optimal Pareto frontier and the approximate one. Figure 11. Coverage between the optimal Pareto frontier and the approximate one.

Hanoi
Hanoi is a benchmark used in the literature [18]. Figure 12 displays the graph associated to Hanoi water distribution network.

Hanoi
Hanoi is a benchmark used in the literature [18]. Figure 12 displays the graph associated to Hanoi water distribution network.  Table 1 reports the performance metrics considered for the two algorithms, NSGA-II and MOEA/WST.  Figure 13 shows the relation between the detection time and the sensor budget.  Table 1 reports the performance metrics considered for the two algorithms, NSGA-II and MOEA/WST.  Figure 13 shows the relation between the detection time and the sensor budget. Figures 14 and 15 compare the performance of NSGA-II and MOEA-WST in terms of hypervolume and coverage, respectively.

Neptun
Neptun is small WDN in Timisoara, Romania, more specifically it is a district metered area (DMA) of a large WDN, and it was a pilot area of the European project ICe-Water [19]. Figure 16 displays the hypervolume per generation of the two considered algorithms.

Neptun
Neptun is small WDN in Timisoara, Romania, more specifically it is a district metered area (DMA) of a large WDN, and it was a pilot area of the European project ICeWater [19]. Figure 16 displays the hypervolume per generation of the two considered algorithms.

Abbiategrasso
Abbiategrasso is a pressure management zone in Milano with an associated graph of 1213 nodes and 1391 edges and it was also a pilot in the European project ICeWater [20]. Figure 17 displays the hypervolume per generation of the two considered algorithms.

Abbiategrasso
Abbiategrasso is a pressure management zone in Milano with an associated graph of 1213 nodes and 1391 edges and it was also a pilot in the European project ICeWater [20]. Figure 17 displays the hypervolume per generation of the two considered algorithms.

Discussions
The computational results related to the four networks show some interesting facts.
• MOEA/WST provides a better approximation of the Pareto frontier in terms of hypervolume; moreover, it shows that the gain is significantly larger for low generation counts and as the size of the network increases. • MOEA/WST offers a better coverage than NSGA-II. The gain is more significant in the range = 3,4,5 (knee point).
The computational results confirm that MOEA/WST is a promising solution for computationally expensive simulation-optimization multi objective black box optimization problems.

Conclusions
The main result of this paper is the proposal of a new evolutionary algorithms MOEA/WST for optimal sensor placement in water distribution networks. This algorithm has a more general application domain for intrusion detection problems, where the goal is to monitor the spread of "effects" triggered by "events" like, for instance detection of fake news in Web blogs.
The algorithm has been derived in the context of a sensor placement problem in a

Discussions
The computational results related to the four networks show some interesting facts.
• MOEA/WST provides a better approximation of the Pareto frontier in terms of hypervolume; moreover, it shows that the gain is significantly larger for low generation counts and as the size of the network increases. • MOEA/WST offers a better coverage than NSGA-II. The gain is more significant in the range p = 3, 4, 5 (knee point).
The computational results confirm that MOEA/WST is a promising solution for computationally expensive simulation-optimization multi objective black box optimization problems.

Conclusions
The main result of this paper is the proposal of a new evolutionary algorithms MOEA/WST for optimal sensor placement in water distribution networks. This algorithm has a more general application domain for intrusion detection problems, where the