Lost in Optimization of Water Distribution Systems: Better Call Bayes

Abstract: The main goal of this paper is to show that Bayesian optimization can be regarded as a general framework for the data-driven modelling and solution of problems arising in water distribution systems. Scenario-based hydraulic simulation and Monte Carlo methods are key modelling tools in water distribution systems. The related optimization problems fall into a simulation/optimization framework in which objectives and constraints are often black box. Bayesian optimization (BO) is characterized by a surrogate model, usually a Gaussian process but also a random forest or a neural network, and by an acquisition function that drives the search for new evaluation points. These modelling options make BO nonparametric, robust, flexible, and sample efficient, and therefore particularly suitable for simulation/optimization problems. A defining characteristic of BO is its versatility and flexibility, given, for instance, by the choice of different probabilistic models, in particular different kernels, and different acquisition functions. These characteristics of the Bayesian optimization approach are exemplified by two problems: cost/energy optimization in pump scheduling and optimal sensor placement for the early detection of contaminant intrusion. Different surrogate models have been used in both explicit and implicit control schemes, showing that BO can drive the process of learning control rules directly from operational data. BO can also be extended to multi-objective optimization: two algorithms are proposed for multi-objective detection problems using two different acquisition functions.


Introduction
Optimization problems arising in environmental modelling are usually very challenging. One reason is the presence of several, usually conflicting, objectives: instances are the financial cost and resilience in designing a water distribution network, and in general the issue of sustainability. Optimizing the detection time, cost and probability of detection in designing a network to monitor water quality is another instance of multi-objective optimization. Multi-objective (MO) problems typically do not have a single best solution: the goal is to identify the set of Pareto optimal solutions, such that any improvement in one objective means deteriorating another. Another reason is that we are dealing with simulation-optimization problems in a reference scenario where the objectives are expensive-to-evaluate black-box functions with no known analytical expression and no observable gradients. A further challenge is that the system performance has to be optimized under different conditions, which adds one more element of computational complexity.
The reference problem is the minimization of an objective averaged over a set of environmental conditions a ∈ A:
F(x) = Σ_{a∈A} α_a f(x, a)   (2)
when A is discrete, or
F(x) = ∫_A f(x, a) p(a) da   (3)
when A is continuous. Two instances are relevant here.
• Optimal sensor placement in a monitoring network, where x is the placement of a number of sensors over the nodes, the environmental condition a is the injection of a contaminant at a node of a Water Distribution Network, f(x, a) can be the detection time of the contamination event (or, in the analogous social-network setting, of a fake news item), and F(x) is the detection time averaged over all contamination events.
• Optimizing the expected value of systems modelled by a discrete event simulation g(x, ω), where ω is a random variable. In this case f(x, a) = E[g(x, ω) | a] and the objective becomes either (2), if a is discrete, or (3) if a is continuous; we can obtain noisy evaluations of f(x, a) by simulating g(x, ω*), where ω* is drawn from its conditional distribution given a. This is the case, for instance, in the design of water distribution and transmission systems to maximize several performance metrics subject to stochastic patterns of water availability and demands.
Problem (2) also arises in hyperparameter optimization of a machine learning algorithm. In this application f(x, w) is the metric (predictive accuracy, fairness, explainability) on fold w using hyperparameter x, and our goal is to minimize Σ_w f(x, w). In this paper we shall not consider this problem, but it has some water-specific applications, for instance water demand forecasting (Candelieri, 2017; Shabani et al., 2018). Even if the basic models (2) and (3) share some properties, the computational strategies are different and, in this paper, we shall focus on the discrete case. The resulting combinatorial problems are typically NP-hard and cannot be solved exactly in reasonable computing time. Approximation methods have been found to be very efficient in finding near-optimal solutions to a wide range of problem settings. An analysis of multi-objective BO (MOBO) will be given in Sect. 2.3.
The main aim of this paper is to show that the flexibility of BO makes it an effective tool to solve a number of optimization problems in Water Distribution Networks (WDNs): this feature is exemplified by two basic problems.
The first topic addressed is the energy optimization of pump scheduling. BO can be used to model explicit control, where the decision variables are the statuses of the pumps, and implicit control, where the optimization variables are the trigger thresholds in learning the pressure-based control rules. The Bayesian approach can also be used when only SCADA data are available. The problem of black-box constraints can also be handled in Bayesian optimization.
The second topic specifically addressed is the design of a sensor network for the detection of the intrusion of contaminants in a WDN: it is shown that optimal sensor placement in the presence of different contamination scenarios can be formulated as a sample average approximation (SAA) multi-objective optimization problem and is amenable to BO with a specific acquisition function based on hypervolume improvement. This is, to the authors' knowledge, the first case in which hypervolume improvement has been used in the optimal sensor placement problem. The focus of this paper is not on computational performance per se but rather on the analysis of a benchmark problem.

Related works
The literature on energy optimization of pump scheduling is extremely wide; an application of NSGA-II to a large-scale WDN has been reported. Bayesian optimization was first considered in (Candelieri et al., 2018). The BO approach has subsequently been applied towards data-efficient learning of implicit control strategies in Water Distribution Networks. A further step has been outlined in (Candelieri et al., 2021) where active learning of optimal controls for pump scheduling optimization is performed based only on SCADA data. The issue of crash constraints is particularly relevant in the optimization of WDSs: in case of violation of some constraints, the EPANET simulator does not yield any result. A specific method for this problem is proposed in (Tsai et al., 2018), where stochastic optimization for feasibility determination is applied to pump operation in a water distribution network. This approach has subsequently been generalized by modelling the issue, in general terms, as a classification/optimization problem.
The literature on optimal sensor placement is also extremely wide. It is a key problem in the water research literature, studied over the last two decades following different approaches. An early contribution is (Guestrin et al., 2005), which models optimal sensor placement using Gaussian processes; another early work proposes to assess the quality of sensor placement strategies by a resilience index with respect to sensor failures. A new approach to optimal sensor placement has recently been proposed in which a sensor placement is represented as a histogram, which captures, better than the sample average, all the information gathered during the simulation. The Wasserstein (WST) distance between histograms enables the design of a new algorithm, called MOEA/WST, which has shown remarkable performance in solving the optimal sensor placement problem.

Our Contribution
• An analysis of the constraints in pump scheduling optimization, focusing on the black-box case, also when the objective function is undefined outside the feasible region.
• Hydraulic simulation can be embedded in a BO framework, exploiting its sample efficiency to deal with expensive function evaluations.
• BO can drive the learning process of the pressure control rules in implicit models of pump scheduling.
• The sample efficiency is carried over to the multi-objective case, where a hypervolume-based acquisition function is shown to outperform the approach based on Chebyshev scalarization and expected improvement.

Organization of the paper
Sect. 2 outlines the basic notions of BO, explaining the probabilistic model, the acquisition function and the basis of multi-objective optimization.
Sect. 3 outlines the features of BoTorch, the Bayesian optimization library which will be used in the solution of the optimal sensor placement problem.
Sect. 4 contains the data and software resources used in this paper.
Sect. 5 contains two formulations of the problem of energy/cost optimization of pump scheduling and the computational results on two networks.
Sect. 6 gives the multi-objective formulation of the optimal sensor placement problem, the two BO algorithms developed, and the computational results.
Sect. 7 is devoted to conclusions and perspectives.

Bayesian Optimization
This section contains a basic description of the key components of BO for the single objective case (2.1 and 2.2) and the outline of its generalization to the multi-objective problem (2.3).

Figure 1. The general scheme of a problem underlying BO.
Although BO can work for any kind of optimization problem, its substantial computational overhead has made it a de-facto standard in black-box situations (Figure 1), where the objective functions are expensive to evaluate, the analytical form of objectives, constraints and derivatives is not available, and we can evaluate f, possibly with noise, at a sequence of points with the aim of determining a near-optimal approximation after a small number of evaluations.
Bayesian Optimization is a statistically principled approach to adaptively and sequentially select these query points (or batches of query points), enabling an effective trade-off between evaluating f in regions of good observed performance and in regions of high uncertainty, i.e. at points not yet observed. The surrogate model provides a prior distribution over the latent function; the posterior is the distribution over the outputs conditioned on the data observed so far. When using the Gaussian process model the posterior is given explicitly as a multivariate normal distribution.
The choice of the kernel establishes prior assumptions over the structural properties of the underlying (aka latent) function f(x), specifically its smoothness. Moreover, almost every kernel has its own hyperparameters to tune, usually via Maximum Likelihood Estimation (MLE) or Maximum A Posteriori probability (MAP) estimation, for reducing potential mismatches between the prior smoothness assumptions and the observed data. Common kernels for GP regression, considered in this paper, are:
• Squared Exponential: k(x, x') = σ² exp(−‖x − x'‖² / (2ℓ²)), where ℓ is the lengthscale and σ² the output variance.
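As an illustration, the Squared Exponential kernel and the resulting GP posterior can be sketched in a few lines of NumPy (a minimal didactic sketch, not the GPyTorch/mlrMBO implementations used in this paper; function and parameter names are ours):

```python
import numpy as np

def squared_exponential(X1, X2, lengthscale=1.0, variance=1.0):
    """SE kernel: k(x, x') = sigma^2 * exp(-||x - x'||^2 / (2 * l^2))."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, noise=1e-6, lengthscale=1.0, variance=1.0):
    """Posterior mean and variance of a zero-mean GP at the test points."""
    K = squared_exponential(X_train, X_train, lengthscale, variance)
    K += noise * np.eye(len(X_train))  # jitter / observation noise
    Ks = squared_exponential(X_train, X_test, lengthscale, variance)
    Kss = squared_exponential(X_test, X_test, lengthscale, variance)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks.T @ alpha
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mean, np.diag(cov)
```

The posterior variance shrinks to (almost) zero at observed points and reverts to the prior variance far from the data, which is exactly the uncertainty signal exploited by the acquisition function.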

Acquisition functions
The acquisition function is the mechanism implementing the trade-off between exploration and exploitation in BO. In particular, any acquisition function aims to guide the search of the optimum towards points with potentially high values of the objective function, either because the prediction based on the surrogate model is high or because the uncertainty is high (or both). A widely used option is the Expected Improvement (EI), which measures the expected improvement of f(x) over the best value observed so far. The other acquisition function considered here selects the next point to evaluate according to the Confidence Bound concept: Lower Confidence Bound (LCB) and Upper Confidence Bound (UCB) in the minimization and maximization case, respectively. For the case of minimization, LCB is given by LCB(x) = μ(x) − k σ(x), where μ(x) and σ(x) are the posterior mean and standard deviation and k ≥ 0 is the parameter managing the exploration/exploitation trade-off (k = 0 means pure exploitation).
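For example, a minimal LCB selection step over a discrete pool of candidates can be sketched as follows (the posterior means and standard deviations, hypothetical here, would come from the surrogate model):

```python
def lower_confidence_bound(mean, std, k=2.0):
    """LCB(x) = mu(x) - k * sigma(x); for minimization, pick argmin LCB."""
    return mean - k * std

def next_point(candidates, k):
    """Index of the candidate minimizing LCB; candidates are (mean, std) pairs."""
    scores = [lower_confidence_bound(m, s, k) for m, s in candidates]
    return min(range(len(candidates)), key=scores.__getitem__)

# hypothetical posterior (mean, std) for three candidate points:
# with k = 2 the highly uncertain point wins (exploration);
# with k = 0 the point with the best mean wins (pure exploitation)
candidates = [(0.5, 0.1), (0.6, 0.9), (0.4, 0.0)]
```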

Multi-Objective Bayesian Optimization
As far as the surrogate model is concerned, there are two main strategies for generalizing BO to the multi-objective context (Rahat et al., 2017). The first, termed multi-surrogate, models each objective function using a GP; the combination of these models induces a multivariate predictive distribution from which an acquisition function can be derived. The second, mono-surrogate, aggregates the objectives into a scalarized model which is used to compute the acquisition function. The weights of the objectives are sampled from a uniform distribution to construct a single objective function whose optimizers can be shown to span, under wide conditions, the whole Pareto set (Paria et al., 2019). The hypervolume is an indicator of solution quality that measures the region of the objective space between a non-dominated set and a predefined reference vector. Figure 3 displays the mechanism of hypervolume improvement in the case of max F(x) = (f^(1)(x), f^(2)(x)). The situation is further complicated when the constraints are also black-box or even undefined outside the feasible set; in this case MOEA algorithms are not easily made suitable for this kind of problem. A general solution is proposed in (Antonio, 2021).
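In the bi-objective maximization case the hypervolume reduces to a sum of rectangle areas, which can be sketched in a few lines (our own minimal implementation for illustration; BoTorch provides exact, differentiable versions):

```python
def hypervolume_2d(points, ref):
    """Area of the objective space dominated by a set of 2-D points under
    maximization, with respect to a reference point ref dominated by all of
    them. Dominated points contribute nothing."""
    pts = sorted(points, key=lambda p: p[0], reverse=True)  # decreasing f1
    hv, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 > best_f2:  # only non-dominated points extend the area
            hv += (f1 - ref[0]) * (f2 - best_f2)
            best_f2 = f2
    return hv

def hypervolume_improvement(front, new_point, ref):
    """HVI of a candidate: extra hypervolume it would add to the front."""
    return hypervolume_2d(front + [new_point], ref) - hypervolume_2d(front, ref)
```

The acquisition in A2 (Sect. 6) is the expectation of this improvement under the GP posteriors of the two objectives.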

BoTorch
Two libraries are used in this paper: mlrMBO for Pump Scheduling Optimization (Sect. 5) and Ax-BoTorch (Bakshy, 2018; Balandat, 2020) for the optimal sensor placement (Sect. 6). BoTorch is a library for Bayesian Optimization built on top of PyTorch and is part of the PyTorch ecosystem. In this section we present a short analysis of its main features; the reader is referred to (Balandat et al., 2020) for a detailed exposition. BoTorch is particularly suitable to exploit the defining characteristic of BO, i.e. its versatility and flexibility, given by different probabilistic models, in particular different kernels, and different acquisition functions. BoTorch supports a wider set of kernels than the basic set in Sect. 2.1 and therefore a wider set of surrogate models. Beyond the acquisition functions EI and L/U confidence bounds of Sect. 2.2, BoTorch supports the knowledge gradient and max-value entropy search; the knowledge gradient is also available in its multi-fidelity version. BoTorch allows novel approaches that do not admit analytic solutions, including batch acquisition functions and multi-output models with multiple correlated outcomes. In particular, BoTorch uses Monte-Carlo methods in the computation of acquisition functions, lifting restrictive assumptions about the underlying model. It also supports the optimization of acquisition functions over any kind of posterior distribution, as long as it can be sampled from. BoTorch supports batch versions of EI and UCB: in batch BO, q design points are selected for parallel evaluation. BoTorch also supports black-box constraints using variants of the expected improvement in which the improvement of the objective is weighted by the probability of feasibility (Letham et al., 2019). Finally, BoTorch supports the computation of the Pareto front and of the Expected Hypervolume Improvement (EHVI) as acquisition function, computing its gradients by auto-differentiation.

Data resources
Hanoi is a benchmark network widely used in the literature, whose associated graph consists of 32 nodes and 34 edges.

Software resources
In order to implement the BO framework proposed for solving the pump scheduling optimization problem, the R package named "mlrMBO" has been adopted: it is a flexible and comprehensive toolbox for model-based optimization (MBO). This toolbox has been designed for both single- and multi-objective optimization with mixed continuous, categorical and conditional decision variables; infill criteria and infill optimizers are easily exchangeable.
BoTorch is a library for Bayesian Optimization built on top of PyTorch and is part of the PyTorch ecosystem. BoTorch is supported by extensive documentation. BoTorch is best used in tandem with Ax, Facebook's open-source adaptive experimentation platform, which provides an easy-to-use interface for defining, managing, and running sequential experiments. BoTorch provides a modular and easily extensible interface for composing Bayesian optimization primitives, including probabilistic models, acquisition functions, and optimizers. BoTorch organizes the computations in batches of size q. BoTorch provides first-class support for state-of-the-art probabilistic models in GPyTorch, a library for efficient and scalable GPs implemented in PyTorch. As hydraulic simulation software we have used WNTR, a recent Python-based open-source package built on EPANET 2.0, the most widely adopted hydraulic simulation software for pressurized WDNs.

Pump scheduling optimization
Two approaches are possible in optimal pump management: explicit and implicit control. The first assumes that the decision variables are pump statuses/speeds to be set at prefixed times, and it is usually known as Pump Scheduling Optimization (PSO). Thus, the problem is to efficiently search among all the possible schedules (i.e., configurations of the decision variables) to optimize the objective function, typically the minimization of energy-related costs, while satisfying hydraulic feasibility. A plethora of methods have been proposed, such as mathematical programming, metaheuristics, and evolutionary and nature-inspired algorithms. However, explicit control typically implies many decision variables for real-world water distribution networks, increasing with the number of pumps and the number of time intervals for actuating the control. The resulting high dimensionality of the search space consequently implies the evaluation of a huge number of possible solutions (i.e., schedules), each one requiring a hydraulic simulation run.
On the contrary, implicit control aims at controlling pump statuses/speeds depending on some control rules related, for instance, to pressure in the network: a pump is activated if the pressure (at specific locations) is lower than a minimum threshold, it is deactivated if the pressure exceeds a maximum threshold, and otherwise the status/speed of the pump is not modified. These thresholds become the decision variables of the optimal implicit control problem. Compared to explicit control, implicit control approaches allow a significant reduction in the number of decision variables, at the cost of making the search space more complex, due to the introduction of further constraints and conditions among the decision variables (e.g., the pressure value triggering the switch-off of a pump cannot be lower than the pressure value triggering the switch-on of the same pump).

Pump Scheduling Optimization through Bayesian Optimization
The reference paper is (Candelieri et al., 2018), which proposes a BO approach for the PSO problem, also comparing two different probabilistic surrogate models to approximate the black-box objective function, namely a Gaussian Process (GP) regression model and a Random Forest (RF).
The approach considers a simulation-optimization setting, where a simulation run of the software hydraulic model of the WDN, given a certain pump schedule, provides the associated value of the objective function (i.e., energy cost to be minimized) or its non-computability, meaning that some hydraulic constraint has been violated, making it impossible to compute the objective function for that schedule. More precisely, hydraulic feasibility refers to "basic" computational constraints (e.g., the impossibility to have a negative pressure at some location, lack of convergence in the simulation, impossibility to satisfy water demand, etc.) as well as operational constraints which can be set by the user, such as specific min-max operating ranges for pressure at each node. Finally, feasibility also depends on the specific simulated scenario, such as the water demand patterns the user associates to each node according to historical data.
Although "penalizing" infeasible (aka non-computable) schedules proved to be a sufficient workaround (Candelieri et al., 2018), the estimation of the unknown feasibility region has subsequently been addressed in (Tsai et al., 2018) in the case of PSO, and, in later work, in the case of a generic problem with a potentially non-computable black-box objective function.
EPANET 2.0 is the most well-known and widely adopted open-source software for simulating WDNs. More recently, WNTR has been proposed: it is coded in Python and provides a larger set of functionalities than EPANET, especially for structural and resilience analysis of a WDN.

Learning optimal control rules as a black-box optimization problem
Implicit control has been less explored and, when combined with BO, it can successfully unlock data efficiency in learning optimal control strategies, as empirically proved in previous work.
The search space for implicit control has a lower dimensionality than in the explicit case. For instance, if one has to control n pumps, by deciding to simply switch them on/off on an hourly basis over H hours, the entire search space for finding an optimal explicit control (i.e., an optimal pump schedule) consists of 2^(nH) possible pump schedules, within an nH-dimensional space. In contrast, the search space associated to implicit control, for instance consisting of min-max ranges on pumps' pressures, has just 2n dimensions. On the other hand, the structure of the associated search space is typically more complex, due to the constraints/conditions among the values of the decision variables.
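The gap in dimensionality is easy to quantify (a toy computation with hypothetical numbers of pumps and hours):

```python
def explicit_search_space(n_pumps, horizon_hours):
    """On/off schedules over the horizon: 2^(n*H) points in an n*H-dim space."""
    dims = n_pumps * horizon_hours
    return dims, 2 ** dims

def implicit_search_space(n_pumps):
    """Two pressure thresholds (tau1, tau2) per pump: a 2n-dim continuous space."""
    return 2 * n_pumps

# e.g. 3 pumps over 24 hours: 72 binary decision variables
# versus just 6 continuous thresholds in the implicit formulation
```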
In implicit control, the pumps are controlled depending on the pressure at some locations, usually the pressures associated to the pumps themselves. For the sake of simplicity, one can consider the easiest case, where the control rule associated to a pump is based on two thresholds, τ1 and τ2, all day long, respectively the lowest threshold implying the switch-on of the pump and the highest threshold implying its switch-off:
S = ON if p < τ1; S = OFF if p > τ2; S unchanged otherwise,
where p is the current pressure value at the pump and S is the status of the pump (i.e., ON/OFF). Figure 6 shows an example of this simple control rule for a single pump.
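The rule above can be sketched as a small function (our own naming; in practice this rule would be evaluated at every step of the hydraulic simulation):

```python
def pump_status(pressure, tau1, tau2, current_status):
    """Threshold-based implicit control for a single pump.

    tau1: switch-on threshold, tau2: switch-off threshold (tau1 < tau2,
    one of the constraints among decision variables mentioned above).
    """
    if tau1 >= tau2:
        raise ValueError("switch-on threshold must be below switch-off threshold")
    if pressure < tau1:
        return "ON"
    if pressure > tau2:
        return "OFF"
    return current_status  # pressure within [tau1, tau2]: keep the status
```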

Figure 6. A schematic representation of an implicit pump control strategy based on min-max thresholds (red dotted lines) on the pressure value (in blue).
Thus, τ1 and τ2 are the decision variables to be optimized in the implicit control case, with the aim to minimize the associated energy cost while constrained to the satisfaction of the water demand.
Also in this case, the possibility to generate infeasible (aka non-computable) control strategies was taken into account, leading to a simulation-optimization problem with a black-box objective function subject to unknown constraints. Computational results on a real-life WDN also showed that implicit control strategies learned through the proposed BO approach were not only optimal and efficient in terms of the number of simulation runs required to identify them, but also "hydraulically" robust with respect to perturbations in the water demand.
One further step has been taken in the paper "Active learning of pressure control rules of water distribution networks" (EGU'21), in which it is shown that an accurate approximation of the outputs of the hydraulic simulation, like EPANET, can be obtained by training a deep neural network on SCADA data; the effectiveness and feasibility of this approach have been tested on a real-life mid-scale WDN.
The main result is that the hydraulic simulation model can be "replaced" by a Deep Neural Network (DNN): the knowledge hidden in the SCADA data can be leveraged as a low-cost source to train a DNN to predict the relevant outputs (i.e., energy and hydraulic feasibility), avoiding the costs for the design, development, validation and execution of a "virtual twin" of the real-world water distribution network.
Another result is that thresholds-based rules for implicit control can be learned through an active learning approach well-suited for the implicit control setting: the lower dimensionality of the search space, compared to explicit control, substantially improves computational efficiency.

The basic model
We consider a graph G = (V, E). We assume a set L ⊆ V of possible locations for placing p sensors. Thus, a sensor placement (SP) is a subset of sensor locations, with the subset's size less than or equal to p, depending on the available budget. An SP is represented by a binary vector s ∈ {0,1}^|L| whose components are s_i = 1 if a sensor is located at node i, s_i = 0 otherwise. Thus, an SP is given by the nonzero components of s.
For a WDN the vertices in represent junctions, tanks, reservoirs or consumption points, and edges in represent pipes, pumps, and valves.
Let A ⊆ V denote the set of contamination events a ∈ A which must be detected by a sensor placement s, and let d_ai be the detection time of a sensor placed in node i for a contamination event in node a.
A probability distribution is placed over the possible contamination events associated to the nodes. In the computations we assume, as usual in the literature, a uniform distribution, but in general other discrete distributions are also possible. In this paper we consider as objective functions the detection time and its standard deviation.
We consider a general model of sensor placement whose objective is the expected detection time F(s) = Σ_{a∈A} α_a f(s, a), where:
• α_a is the probability for the contaminant to enter the network at node a;
• f(s, a) = d_ai, where i is the first sensor of s detecting the contaminant injected at node a, and f(s, a) = 0 otherwise.
In our study we assume that all the events have the same chance of happening, that is α_a = 1/|A|; therefore f_1(s) is:
f_1(s) = (1/|A|) Σ_{a∈A} d̂_a
where d̂_a is the minimum detection time (MDT) of event a, that is the minimum time step at which the concentration reaches or exceeds a given threshold, for event a, at a node equipped with a sensor.
As a measure of risk, we consider f_2 as the standard deviation of the sample average approximation of f_1.
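The two SAA objectives f_1 and f_2 can be sketched as follows (a minimal illustration with a data layout of our own choosing: each simulated scenario maps the nodes that detect the event to their detection times):

```python
import statistics

def min_detection_time(placement, event_times, no_detection=1e6):
    """MDT of one contamination event: earliest detection over the placed
    sensors; a large penalty is returned if no sensor detects the event."""
    times = [event_times[i] for i in placement if i in event_times]
    return min(times) if times else no_detection

def saa_objectives(placement, scenarios):
    """f1: mean MDT over all simulated events (uniform alpha_a = 1/|A|);
    f2: standard deviation of the same sample, used as the risk measure."""
    mdts = [min_detection_time(placement, ev) for ev in scenarios]
    return statistics.mean(mdts), statistics.pstdev(mdts)
```

In the actual experiments the per-event detection times come from EPANET/WNTR contaminant-transport simulations rather than from hand-written dictionaries.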

Multi-Objective Bayesian Optimization for sensor placement
Two Bayesian optimization algorithms A1 and A2 have been used in this paper for optimal sensor placement.
A1 is mono-surrogate, based on the Chebyshev scalarization of the objectives f_1 and f_2, a GP of the aggregated function, and the Expected Improvement (EI) as acquisition function.
A2 is multi-surrogate, based on GP models of the single objectives and the Expected Hypervolume Improvement (EHVI) as acquisition function.
The authors have implemented them using software components from BoTorch. Both organize the computations in batches of size q=5.
A "helper" function creates the outcomes required for the scalarized objective and applies the scalarization and the constraints; it initializes A1 and A2 and returns the batch (x_1, x_2, …, x_q) along with the observed values. In A1 each candidate is optimized in a sequential greedy fashion using a different random scalarization.
The BO loop for a batch of size q iterates the following steps:
• Given the surrogate, choose a batch of q points.
• Evaluate the objectives at the chosen points.
• Update the surrogate model.
A1 can also be extended to the constrained case by weighting the EI by the probability of feasibility.
In A2 a list of acquisition functions is created, each with different scalarization weights. This list generates one candidate for each acquisition function and conditions the next candidate (and acquisition) on the previously selected candidates.
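The random Chebyshev scalarization at the core of A1 can be sketched as follows (an augmented-Chebyshev variant in plain Python, not the BoTorch implementation; weights are sampled via normalized exponentials, which gives a uniform distribution on the simplex):

```python
import random

def random_weights(m, rng=random):
    """Weights uniformly distributed on the m-simplex (normalized exponentials)."""
    w = [rng.expovariate(1.0) for _ in range(m)]
    total = sum(w)
    return [x / total for x in w]

def chebyshev_scalarization(objectives, weights, rho=0.05):
    """Augmented Chebyshev scalarization of (normalized) objectives, to be
    minimized: max_j w_j f_j + rho * sum_j w_j f_j."""
    terms = [w * f for w, f in zip(weights, objectives)]
    return max(terms) + rho * sum(terms)
```

At each iteration A1 draws fresh weights, fits a single GP to the scalarized values, and maximizes EI on it; different weight draws steer the search towards different regions of the Pareto front.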

Computational results
The plot below (Figure 7) shows, for Net 1 (left) and the Hanoi network (right), the difference between the hypervolume of the true Pareto front and the hypervolume of the observed Pareto front for algorithms A1 and A2.

Figure 7. Difference in hypervolume between the optimal Pareto front and the Pareto front approximated by qParEgo (blue) and qEHVI (red) over the iterations.
It is apparent that the relative performance of A2 improves with the size of the problem.
An effective way to visualize the optimization process is to represent the observations by a color associated with the iteration count of A1 and A2: it is apparent from Figure 8 that A2 (right plot) is much quicker than A1 (left plot) at identifying the Pareto front.

Conclusions
The main conclusion is that Bayesian Optimization offers a comprehensive framework for the solution of a wide range of problems, both for the design and the operation of water distribution networks and in other environmental domains. The evaluation of the objective functions is largely based on the simulation of the underlying system in different contexts and can result in computationally expensive black-box problems. The probabilistic surrogate model enables a quantification of predictive uncertainty which is fed into an acquisition function driving the selection of the next evaluation point. It has been shown how this feature can be exploited in multi-objective problems by incorporating in the acquisition function the concept of hypervolume improvement. Evolutionary algorithms are often used for these problems, also in the multi-objective case, but they are not nearly as sample efficient as BO, albeit their performance can be improved by incorporating a surrogate model. The sample efficiency of BO is particularly important when data and computational resources are severely constrained by the high computational cost of simulation experiments encompassing several different scenarios or of Monte Carlo simulation.