Resilience Assessment of Water Quality Sensor Designs under Cyber-Physical Attacks

: Water distribution networks (WDNs) are critical infrastructure for the welfare of society. Due to their spatial extent and difﬁculties in deployment of security measures, they are vulnerable to threat scenarios that include the rising concern of cyber-physical attacks. To protect WDNs against different kinds of water contamination, it is customary to deploy water quality (WQ) monitoring sensors. Cyber-attacks on the monitoring system that employs WQ sensors combined with deliberate contamination events via backﬂow attacks can lead to severe disruptions to water delivery or even potentially fatal consequences for consumers. As such, the water sector is in immediate need of tools and methodologies that can support cyber-physical quality attack simulation and vulnerability assessment of the WQ monitoring system under such attacks. In this study we demonstrate a novel methodology to assess the resilience of placement schemes generated with the Threat Ensemble Vulnerability Assessment and Sensor Placement Optimization Tool (TEVA-SPOT) and evaluated under cyber-physical attacks simulated using the stress-testing platform RISKNOUGHT, using multidimensional metrics and resilience proﬁle graphs. The results of this study show that some sensor designs are inherently more resilient than others, and this trait can be exploited in risk management practices. Author Contributions: Conceptualization, D.N., C.M., A.O., and E.S.; methodology, D.N., C.M., A.O., and E.S.; software, D.N.; validation, E.S.; formal analysis, D.N.; investigation, D.N.; resources, C.M., A.O., and E.S.; data curation, D.N.; writing—original draft preparation, D.N.; writing—review and editing, E.S. and A.O; visualization, D.N.; supervision, funding


Introduction
Water distribution networks (WDNs) are spatially large and complex systems supplying drinking water to consumers by satisfying multiple objectives, such as maintaining adequate hydraulic pressure, storing water for firefighting, maintaining disinfectant residuals to limit microbial regrowth, and minimizing potential harmful concentrations of substances in the water [1]. Because the requirement of clean drinking water is of utmost importance to public health and societal welfare, WDNs are considered to be critical infrastructure [2]. Indeed, various incidents showcase that severe public hazard, including serious illnesses and deaths and other sociopolitical impacts (e.g., great economic loss), can be the outcome of contamination events in WDNs. Notable incidents include the 1993 cryptosporidium outbreak in Milwaukee [3][4][5], the 2014 Elk River 4-methylcyclohexanemethanol (MCHM) spill in West Virginia [6], the 2016 Beijing misconnection of reclaimed and drinking water pipes, the 2016 lead contamination of Hong Kong's water system [7], the 2019 Askøy water supply system campylobacter outbreak, and the 2019 E. coli outbreak in Long Beach [8].
Due to the distributed size of WDNs even for smaller settlements, and the practical difficulty of protecting the numerous potential contamination entry points, the policy discourse is concerned with the safety of such systems under various threats. These can stem from deliberate, negligent, or accidental actions [9][10][11], including terrorism [10,[12][13][14][15]. Of

Resilience Assessment
The concept of resilience originally stems from an ecological system's ability to persist stress [44] and has since been applied in engineering systems with different interpretations in literature, mainly revolving around "continuity and efficiency of system function during and after failure" [45] with various definitions, e.g., [46][47][48]. We use an operational definition where resilience is defined as "the degree to which an urban water system continues to perform under progressively increasing disturbance", whereas robustness is best defined as "the level of pressure that the system can sustain without failing (or without performance loss)" [43]. With this definition, special types of curves, termed "resilience profile graphs," communicate the performance of the system to meet its objectives on the y-axis as measured by a metric of reliability, whereas on the x-axis ticks lie scenarios of increasing disturbance and is by definition an ordinal scale. Note that these disturbances extend from stresses within design standards to stresses well beyond design standards. Resilience for a given design is the area under the curve i.e., the integral of reliability over the ordinal scale of scenarios. To scale resilience from 0 to 1, we compare this area to the area of a totally reliable and robust design for all scenarios [43].
To adapt the methodology for the CPS system's CWS performance under cyberphysical attacks, we use a resilience profile graph as shown in Figure 1. The figure demonstrates the performance of an ideal, perfectly robust, reliable, and resilient CWS design, as well as two others with different properties. In this example, CWS design 1 is more resilient than CWS design 2, as communicated by the respective areas under their reliability curves. CWS design 1 retains higher reliability with the progressive increase in stress from the scenarios, whereas CWS design 2 rapidly loses performance as stress increases. The scenarios represent combinations of cyber-physical attacks that hinder the CWS's ability to detect contamination while the WDN water is deemed unsafe for consumption. The CWS designs refer to different topologies of placed sensors (different total number and locations monitored) and sets of controls for response and mitigation measures. The reliability metric could be any metric that can describe the CWS performance and the impact on the WDN in a systematic way, such as the detection rate of contamination events.

Cyber-Physical Simulation Platform
RISKNOUGHT is a cyber-physical stress-testing simulation platform that simulates WDNs as integrated CPSs. This is achieved by coupling two interacting models with feedback loops: (a) the cyber model, a customizable SCADA layer that simulates the control logic and behavior of PLCs, sensors, and actuators, and (b) the physical model, a custom

Cyber-Physical Simulation Platform
RISKNOUGHT is a cyber-physical stress-testing simulation platform that simulates WDNs as integrated CPSs. This is achieved by coupling two interacting models with feedback loops: (a) the cyber model, a customizable SCADA layer that simulates the control logic and behavior of PLCs, sensors, and actuators, and (b) the physical model, a custom WDN simulator that leverages the WNTR [35] package (version 0.3 at time of writing) to implement pressure-driven analysis (PDA) [49] and quality simulations using the recent EPANET version 2.2 solver [50]. The RISKNOUGHT model is described in detail in [19], whereas the analysis of WQ cyber-physical simulation is presented in [42]. In the stepwise simulation, the physical layer feeds input data (e.g., tank level, concentration of contaminant at nodes, etc.) from the hydraulic and quality simulation to the cyber layer, which uses the information to resolve statements of the deployed control logic schemes and then passes decisions to the physical layer, affecting the hydraulic state for the subsequent simulation step (e.g., open a pump, close a valve, etc.). With regard to the implementation of control logic to facilitate response and mitigation measures against contamination events, RISKNOUGHT enables the use of district metered area (DMA) isolation commands via isolation valves to minimize the extent of the contamination and activation of flushing units to remove the contaminant from the system. As such, RISKNOUGHT can simulate the impact of contamination events even after being detected and enforcing response measures. The platform incorporates a cyber-physical attack model that simulates a variety of cyberattacks on cyber elements (e.g., denial of service (DoS) attacks, sensor data manipulation, etc.) and physical attacks on elements of the WSN (such as backflow contaminant injections, destruction of pipes, etc.) in stress-testing scenarios.

Generation of Sensor Designs
In order to generate different topologies of sensor designs, we use TEVA-SPOT [51], a well-known software utilized for sensor placement optimization and the assessment of consequences of contamination events in large WDNs. TEVA-SPOT can employ a wide range of objective functions in sensor placement also common in other works in literature, including the time to detection [29], extent of contamination (total length of contaminated pipes) [52], number of failed detections [52] (which is similar to the inverse of detection likelihood [53]), volume of contaminated water consumed, mass of contaminant consumed [54], and population exposed [52], with the ability to differentiate between population killed and population receiving a specific dose and above. TEVA-SPOT also ranks the sensors deployed from a specific optimization scheme from most to least important in terms of reduced impact of contamination events due to its presence. It should be noted that TEVA-SPOT estimates impact based on the assumption that when a contamination is detected, mitigation and response measures will follow, but the software has no means to actually simulate them, other than providing a specified response time. Therefore, other sensor placement optimization tools could also be considered for the generation of sensor designs, such as Chama [55].

Selecting Cyber-Attack Scenarios
For proper resilience assessment and stress-testing, a scenario set of increasing pressure from scenario to scenario should be constructed. As the focus of this work is to assess the resilience of different sensor topologies against contamination attacks and cyberphysical attacks, the increased stress is translated to progressively more sensors being cyber-attacked. Due to the immense topological, hydraulic, and control complexity of large WDNs, standardization is difficult, as different sensor designs differ in WDN coverage and control sets, thus varying wildly in performance during failure. For normalization reasons, the control sets for response and mitigation measures are similar for every sensor design, i.e., we use isolation commands that cut off connections between all DMAs to stop the contamination spread and activate flushers in all DMAs from where the detected con-tamination event could have originated. Note that demands on nodes remain unaffected, i.e., are not reduced to zero, to represent consumer compliance to a do-not-use warning. This could be used as another rule but is not dependent upon the system's control logic rather than consumers' behavior, and we want to avoid the overhead of changing the network's properties during runtime. The cyber-attacks have similar attributes i.e., sensor data manipulation to report normal water quality readings masking the contamination, and to negate the effect of temporal hydraulic variance in detection capability, the cyber-attacks happen from simulation start to end for all scenarios and combinations. To choose which sensor to add to the attacked sensors set in a progressive manner, the TEVA-SPOT ranking is used. Therefore, the least stressful scenario contains a single attack on the most important sensor of the topology, then the second most important sensor is also attacked, and so on, with the worst scenario including attacks on all sensors.

Sampling Contaminant Injection Locations
Because of WDN complexity, a single physical attack combined with a cyber-physical attack scenario cannot capture the general expected impact of cyber-physical attacks, as this injection may happen at a still adequately monitored part of the WDN and have a low impact or on the contrary have a large impact if it coincides with an important part of the WDN's CWS being cyber-attacked. In addition, some injection locations are inherently more topologically critical in terms of reaching other parts of the network, whereas others may pertain only to small branches of the WDN, limiting the impact despite being unnoticed by the CWS. Thus, it is imperative to combine the scenarios with a multitude of different injection locations to calculate the expected outcome of the cyber-physical attacks. Although ideally all injection locations should be considered, the computational time for all the combinations increases drastically. The computational budget may allow for a small number of examined injection locations, so we propose the use of a weighted random choice of nodes from the network, with probabilities of choice not uniform but according to the "fitness" f i of the node i as an injection location. In this work, we use two metrics, equally weighted to calculate f i . The node betweenness centrality (bc) [56] is a metric that is also used to optimally place sensors topologically without hydraulic simulations [57,58] (among other centrality metrics and weighted variations of them by characteristics such as diameters of pipes, etc., which could also be used here). The betweenness centrality bc i for each node i is calculated as where I is the set of discrete node elements, s and e are pairs of network nodes, σ(w, e) is the number of shortest (w, e) paths, and σ(w, e|i ) is the number of those paths passing through node i. A higher value signifies nodes that are more central, in the sense of residing along more of the shortest path between pairs of other nodes. This metric conveys the expected probability for a node to be along the pathway of a contamination spread from any source point to others, and so topologically lower values will be assigned to nodes residing in smaller branches of the network and higher values along the main water pipes. Because tanks and sources in a water distribution graph representation reside in branches, the second metric considers the actual hydraulic traceability of each location and is calculated from trace simulations in EPANET. These trace simulations are fast, as there is no cyber-physical coupling, and, if required for large networks, a much coarser simulation timestep can be used. After performing a trace quality simulation for each node i, we can obtain p ijt , which is the percentage of water originating from i for each other node j and for each timestep t ∈ {1, m}, where m is the number of simulation steps. With n being the number of nodes (i.e., the cardinality of set N), the node reach nr i can be calculated as where This metric conveys the expected max contamination reach of an injection, thus locations primarily near sources and secondarily near tanks have a bigger nr value.
After normalizing metrics bc i and nr i to the range 0 to 1 as bc i and nr i (by subtracting the minimum value of each set and dividing by the respective range) and assigning equal weights w 1 = w 2 = 0.5, fitness f i for each node is calculated as: To get the probability distribution function P for the nodes, the following equation is used: Finally, using P for the weighted random choice (without replacement) we can generate a random set of k nodes for injection locations. For large WDNs and small k many of the sampled nodes may be clustered in some areas. In this case, we can modify the sampling procedure: Each time a sample is selected, we find the shortest paths of length l = 1, · · · , l lim from the sampled node, and the nodes in the paths are excluded from the sampling pool to distribute the sampling selection to a wider area and increase network coverage. The l lim can be estimated empirically or by trial and error. For medium-sized networks, smaller values (i.e., 1 or 2) of l lim should work best. An example is showed in the Section 3.
Furthermore, the actual starting time and duration of an injection can differentiate the outcome even at a single location, due to temporal hydraulic variations (e.g., flow can change direction, a pump could be open, etc.), whereas injected mass always affects node concentrations. Because simulating the attacks for different time conditions linearly increases the required number of total simulations, to confide within a time budget in this work we standardize all injections to have the same characteristics in terms of starting time, duration, and injection mass rate, even though for some physical attacks this may not produce the worst contamination outcome. It should also be noted that other aspects such as physical accessibility of a location and (if any) implemented security measures affect its attractiveness and could be considered when making an informed selection.

Performance Metrics
The simulation-derived data of the WDN with its CWS impacted by a specific combination of physical and cyber-attacks (i.e., contaminant injection and a set of cyber-attacked sensors for the work presented here) can be mapped by a variety of multidimension metrics [59]. These metrics include:

•
Unmet demand (UD), a hydraulic metric that describes the quantity of unsupplied water to consumers during the simulation. For each simulation timestep t let d it be the demand of node i in m 3 s −1 ,d it be the actual supplied quantity in m 3 s −1 , and t step be the simulation step size in s, so that ud i is the unmet demand of node i and UD the integral for all nodes in I: • Mass consumed (MC), which describes the total contaminant mass consumed in g for the entire simulation. Following the notation for UD and letting c it be the concentration in mg L −1 for timestep t, mc i is the mass consumed for node i and MC the integral for all nodes in I: • Nodes affected (NA), a spatial metric that describes the extent of the contamination as a percentage of nodes that are affected by the spread of contamination. For node i the metric na i denotes whether contaminated water has passed through or not and N A is the integral for all nodes in I: • Population affected (PA), which describes the number of consumers affected by consumption of a contaminant. If no census data exist for the nodes of the WDN, we can use the average daily per capita demand pcd to estimate population per affected node pa i and the integral PA for all nodes (assuming that m corresponds to a single day, otherwise this is modified accordingly): • Earliest detection time (EDT), a temporal metric that describes how fast the CWS can detect the contamination event by measuring the earliest time in s that the CWS emits a contaminant warning. For undetected events, we can assign to EDT the value of the simulation duration with the assumption that by then the contamination will be discovered by other means. If o is the number of sensors that detect contamination, and dt s is the detection time for each sensor s, then • Mass consumption before detection (MCBD), which describes the impact of the cyberphysical attack before being detected, therefore before being able to generate any public warnings. This metric can be more representative of the actual impact than MC. This may happen because MC is greatly affected by the mitigation measures in effect after detection, e.g., flushing, isolation of DMAs, preemptive total cutoff of water supply, issuing a general public warning, etc., that can affect both the supply of water and the demand of the consumers.
Within this paper, all these metrics refer to scenarios comprising a single contaminant injection (and a set of cyber-attacks), although they are valid for multiple simultaneous injections as well. For a sampling set of scenarios with various different single contaminant injection locations, a statistical property of the metric set could be used to assess performance, such as the mean, median, or worst value. There is one more interesting performance metric that can be generated from the sampling location set: the detection rate (DR) that describes the performance of the CWS regarding its ability to detect contamination events and can act as a reliability surrogate metric. It should be noted that this simple metric does not differentiate between different injection scenarios and all have the same weight. With J being the size of the injection sampling set, o j being the number of sensors that detected the contamination originating from injection j, and D j the Boolean operator for detection,

Cyber-Physical System
The resilience assessment methodology for CWS under cyber-physical attacks was demonstrated at the C-Town WDN, a benchmark model based on a real medium-sized system used for various studies, e.g., [17,[60][61][62]. We used RISKNOUGHT to create a SCADA for the hydraulic controls, as shown in [19], then updated the cyber and physical layers to include a CWS with: • Four actuators and respective isolation valves that can separate the network into DMAs 1 to 5, placed at pipes P409, P424, P310, P796, and P237; • Five flushing units (one in each DMA in lower elevation nodes) as new nodes adjacent to present nodes J1056, J416, J1208, J185, and J87 and the necessary actuators and isolation valves for operation; and • A variable number of water quality sensors depending on sensor design. Figure 2 shows an example of CWS with these properties. The cyber-physical simulation had duration of 86,400 s, a hydraulic simulation timestep of 600 s, and a water quality timestep of 300 s. The SCADA simulation timestep (how often sensors update and control logic is checked and applied) was matched to the hydraulic step. For the pressure driven demand analysis equations we used for all nodes a minimum pressure of 0 m and a required pressure of 20 m. The pressure exponent was equal to 0.5.
The controls added for response and mitigation measures depended on the placement of the sensors and had the following form:

•
If any sensor detects contamination, isolate all DMAs; and • If a sensor placed within a DMA detects contamination, activate the corresponding flushing unit and the flushing units in the DMAs that are within the possible path of the contamination extent. For example, if contamination is detected in a node in DMA5, activate the flushing units in DMA5 and DMA1 (from where the water originates).

Sensor Designs
As the associated set of controls included the isolation of DMAs to minimize the spread of contamination, the rational choice of objective function in placement optimization in TEVA-SPOT was minimization of the mean contamination extent. We generated 12 sensor designs of size N = 1, 2, . . . , 10, 15, 20, as seen in Table 1 with the sensor names and their rankings, by employing 388 (i.e., for all junctions) injection scenarios. The spatial representation is depicted in Figure 3. Designs of size N = 1 to 10 were selected to explore a progressive increase in sensor number, and N = 15, 20 to explore designs with a relatively high-monitored locations-to-total nodes ratio (roughly 3.8% and 5.1% respectively). Table 1. Sensor designs of size N and the sensor ranking of the design.

Size N
Sensor Ranking Order (1st to nth)

Cyber-Physical Scenarios
For the cyber-attack part, the scenarios had progressively more pressure by increasing the number of k most important sensors being manipulated to show normal water quality readings, with k values in the set {1, 2, 3, 4, 5, 10, 15}, producing 7 scenario sets. Sensor designs with N ≤ k were effectively completely unmonitored. Table 2 presents the sensors selected by each scenario set from each sensor design N. Table 2. Sensors under cyber-attack for each scenario set and sensor design. For the physical attack part, we choose to analyze injections at 15% of the total number of nodes (388), which gave 58 injection locations, sampled with weighted random choice with l lim = 1 based on the node criticality estimation described above. Figure 4 shows the values of each metric, the nodes' weighted probabilities, and the final sampled set of injection locations. Each contaminant injection started at time 0 s and ended at time 3600 s with a mass rate of 1 kg s −1 of a completely conservative contaminant (zero diffusivity and no reactions with water or pipe materials).
For the physical attack part, we choose to analyze injections at 15% of the total number of nodes (388), which gave 58 injection locations, sampled with weighted random choice with lim = 1 based on the node criticality estimation described above. Figure 4 shows the values of each metric, the nodes' weighted probabilities, and the final sampled set of injection locations. Each contaminant injection started at time 0 s and ended at time 3600 s with a mass rate of 1 kg s −1 of a completely conservative contaminant (zero diffusivity and no reactions with water or pipe materials).

Results from Stress Testing
The 58 physical attacks were applied to 12 sensor designs with 7 scenarios of cyberattacks, as described in Table 2, and also accounting for baseline normal operation of the system, and produced a total of 2900 cyber-physical simulations within a total computational budget of roughly 24 h. Figure 5 shows metric curves that aggregate the results with descriptive metrics. Each curve refers to a scenario set with attacked sensors with

Results from Stress Testing
The 58 physical attacks were applied to 12 sensor designs with 7 scenarios of cyberattacks, as described in Table 2, and also accounting for baseline normal operation of the system, and produced a total of 2900 cyber-physical simulations within a total computational budget of roughly 24 h. Figure 5 shows metric curves that aggregate the results with descriptive metrics. Each curve refers to a scenario set with k attacked sensors with points of the curve describing the mean impact of the scenario set (y-axis) to the topology of N sensors (x-axis). Water 2021, 13, x FOR PEER REVIEW 17 of 24

Normal Operation
As expected, sensor topologies with more deployed sensors fared better in every metric when no cyber-attack hindered the operation of the CWS, as communicated by the blue curves in the subplots of Figure 5. There were some irregularities, such as, for example, in the N A and PA metrics where the topology of N = 4 performed slightly worse than the N = 3 (also true for N = 5 and N = 6), but this should be attributed to the small random set of physical attack locations. Had the contaminant injection set included all the possible nodes, the actual performance would have been at least equal. For most of the metrics, performance gains dropped rapidly from N = 8 onwards, whereas N = 6 can be regarded as a good trade-off of performance versus deployment cost. Designs with substantially more sensors (N = 15, N = 20) generally did not offer much improvement in normal operation status, except for the EDT metric. This exception should be expected, as more sensors equal better coverage and faster detection, but metrics such as MC and N A are affected by the mitigation and response measures, which in this case were the exact same for all sensor designs (i.e., isolation of the five DMAs and activation of the flushing units in the contaminated DMAs). It should be noted that for DMAs where contamination was not detected, no flushing took place. In reality, if contamination is detected usually the water utility issues a public do-not-use warning, and that is why we also employed the MCBD metric. In addition, FM plus MC quantities in scenarios generally do not equal the total injected mass (the same is true for cyber-physical attack scenarios as well), as there are instances where a contaminant mass resides in pipes by the end of the simulation and is neither flushed nor consumed, due to pressure deficient conditions.

The Case of a Single Attack on the Most Important Sensor
For a small number of deployed sensors (N ≤ 5), the loss of the most important sensor (curve k = 1) greatly affected most performance metrics because if detection of the contamination were achieved, it would be at a much later stage, as communicated by the EDT metric, which was inflated by the fact that undetected events received the edt i value of simulation duration. The large increase in impact was not true for N A and PA metrics, where the performance drop was smaller, because these were spatial metrics without a temporal dimension, and for small N values the contamination had a greater spread regardless. For N > 5 generally the impact was greatly reduced or barely noticeable. This happened because, as seen in Table 1 and Figure 3, the sensor placement optimization algorithm for N ≥ 5 deployed in all cases sensors that could cover in a redundant way DMA1 (even if not deployed inside, the sensors were close to it), which was the biggest, and it happened that in all sensor designs the most important sensor in ranking was either deployed in J301, J304, or J317, locations on the outskirts of DMA1. This algorithmically unintentional fact gives an insight into the importance of redundancy as a property of resilient design of CWSs.

Effects as k Increased for Designs up to 10 Sensors
Generally, as the strain of the scenarios increased, performance dropped rapidly for most sensor designs. There were a few irregularities for some metrics and curves, such as, for example, mean MC being slightly higher for k = 2 than k = 3 for the topology N = 6, but these can be attributed to the small sample size and some hydraulic conditions that did not alter linearly due to the set of controls for response measures. For metrics MC, MCBD, and EDT, performance was retained for up to k = 3 attacked sensors for sensor designs with N ≥ 8. Interestingly, the expected EDT, MC, and MCBD of N = 8, 9, 10 designs even with the three first ranked sensors taken out were still lower than the N = 5, 6, 7 designs under normal operation conditions, even though the expected spatial extents N A of the contamination events were higher. High performance of these designs up to k = 3 regarding the UD, FO, and FM metrics (i.e., high unmet demand, high volume of flushed water, and high mass of flushed contaminant, respectively) signifies that the response and mitigation measures worked as they should, managing to isolate DMAs and flush significant contaminant mass out of the system before being consumed. Therefore, these sensor designs were robust, but when k > 3, performance rapidly dropped, which was not a desired trait for the CWS, as for resilient designs, failure should be more gradual, i.e., a design that is more safe-to-fail.

Effects as k Increased for Designs with 15 and 20 Sensors
The sensor designs with N = 15 and N = 20 proved to be both more robust and more resilient, as demonstrated by their performance. Although it is expected that topologies with significantly more sensors (i.e., at least 50% and 100% more compared to N ≤ 10 designs) outperform the rest, these designs possess an inherent trait of more gradual performance loss. As demonstrated by the k = 10 curves in Figure 5, even with one quarter and half of the sensors attacked respectively, the performance drop in all metrics was much smaller than the counterparts in other sensor designs. Notably, even with only a quarter of the deployed sensors still operating, N = 20 design still performed comparably to the un-attacked N = 5 design in terms of UD FO and FM metrics, showing that the mitigation measures can still provide significant protection fast enough, as communicated by the expected EDT value, even though the contamination spread more (N A), affecting more people (PA), and contaminant mass consumption was higher before detection (MCBD).

Resilience Profile Graphs
Examining the metrics presented in Figure 5 allows us to fully describe the system's behavior under cyber-physical attacks and thus under failure in a variety of scenarios. In addition, we need a well-defined, straightforward, and comprehensible way to measure the reliability of the system to generate resilience profile graphs. Thus, we used the detection rate of the contamination events metric, DR, as a surrogate for reliability. The resilience profile graphs for each sensor design are presented in Figure 6. It is evident that sensor designs N = 20 and N = 15 were the most resilient and robust. Notably, the resilience score for N = 20 and this particular ordinal set of disturbances was 85.99%, signifying that even under failure it retains most of its ability to detect contamination in most cases, even when this means lower performance in most other metrics. On the contrary, the N = 6 sensor design, which seems the best trade-off between performance and cost in normal operation, under these disturbances (some of which, i.e., k = 10, 15 were well beyond its design capacity) had a much lower resilience score of 48.70%. Even scaled within its design capacity for up to k = 5, the resilience score was 64.90% owing to its much lower normal operation reliability and greater susceptibility due to the lower count of sensors. In addition, seeming like a paradox, pairs (N = 20, k = 15) and (N = 20, k = 10) attained better reliability (detection rate) than (N = 5, k = 0) and (N = 10, k = 0), but the actual design goal was to place sensors to minimize the mean extent of contamination, and this was a surprising side-effect. The remaining working sensors in N = 20 happened to fare in detection rates better than the complete sets of sensors in N = 5 and N = 10, as these were be placed in less central locations (due to their lower ranking) and thus, monitor more locations, increasing network coverage. It should also be noted that designs N = 6 and N = 7 had exactly the same resilience results. This was to be expected, as these designs had the same sensors, except the lowest ranked sensor in N = 7 (placed at J179), which was placed at a position monitoring locations that were a subset of locations monitored by the sixth sensor (placed at J297) in both designs. The resilience scores are presented in Table 3.

Generating Topologies with Other Placement Optimization Strategies
The behavior of the = 20 topology generated by optimizing the placement with the objective function mean contamination extent ( _ ) exhibited, as expected, better performance than other topologies with a lower number of sensors, but also showed robustness and resilience traits. This brings up the question of whether using other topologies comparable in the number of sensors would fortify the network with better performance under loss of function due to cyber-physical attacks. Using TEVA-SPOT, four more topologies of 20 sensors were generated using the objective functions (i) minimize mean mass consumed, _ , (ii) minimize mean volume consumed, _ , (iii) minimize

Generating Topologies with Other Placement Optimization Strategies
The behavior of the N = 20 topology generated by optimizing the placement with the objective function mean contamination extent (obj_ec) exhibited, as expected, better performance than other topologies with a lower number of sensors, but also showed robustness and resilience traits. This brings up the question of whether using other topologies comparable in the number of sensors would fortify the network with better performance under loss of function due to cyber-physical attacks. Using TEVA-SPOT, four more topologies of 20 sensors were generated using the objective functions (i) minimize mean mass consumed, obj_mc, (ii) minimize mean volume consumed, obj_vc, (iii) minimize mean time before detection, obj_tbd, and (iv) minimize mean number of failed detections, obj_n f d. The sensors selected for each objective function are shown in Table 4. The same selecting methodology of k best ranked sensors for cyber-attacks and the same injection location set were used to compare performance. For better comparison purposes we used all k sets for k ∈ {0, 15}. Figure 7 depicts the sensors' locations in the network, color coded by sets of attacks. Table 4. Sensors generated from other objective functions, arranged by rank from most to least critical. mean time before detection, _ , and (iv) minimize mean number of failed detections, _ . The sensors selected for each objective function are shown in Table 4. The same selecting methodology of best ranked sensors for cyber-attacks and the same injection location set were used to compare performance. For better comparison purposes we used all sets for ∈ {0,15}. Figure 7 depicts the sensors' locations in the network, color coded by sets of attacks.

CWS Resilience Results of Various Sensor Placement Strategies
By subjecting the topologies of 20 sensors generated by the _ , _ , _ , and _ to the same stress-testing procedure as the = 20 topology of the _ , we acquired the results presented in Figure 8. As expected, the topology of the design strategy of minimizing the mean detection failure ( _ ) yielded the best reliability score for undisrupted service, near the ideal design at 98.27%. In this state, performance (it should be noted that by performance hereafter we refer to detection rate-not the performance of any other metric) was matched by strategy _ , which employed a very similar sensor set but with different rankings for the exact same sensors ( _ was better in = 7 but worse in = 2,11,12,13), then followed closely by _ with 96.55%, _ exhibiting a high score of 93.10%, and the lowest performance attributed to _ Figure 7. Topologies of 20 sensors as placed by the objective functions used in this work, color coded by the set of k attacks they belong to (some representative sets are selected).

CWS Resilience Results of Various Sensor Placement Strategies
By subjecting the topologies of 20 sensors generated by the obj_mc, obj_vc, obj_tbd, and obj_n f d to the same stress-testing procedure as the N = 20 topology of the obj_ec, we acquired the results presented in Figure 8. As expected, the topology of the design strategy of minimizing the mean detection failure (obj_n f d) yielded the best reliability score for undisrupted service, near the ideal design at 98.27%. In this state, performance (it should be noted that by performance hereafter we refer to detection rate-not the performance of any other metric) was matched by strategy obj_tbd, which employed a very similar sensor set but with different rankings for the exact same sensors (obj_tbd was better in k = 7 but worse in k = 2, 11, 12, 13), then followed closely by obj_mc with 96.55%, obj_vc exhibiting a high score of 93.10%, and the lowest performance attributed to obj_ec at 87.93%.
Interestingly, as the disruption to the system increased with progressively attacking the k most important sensors, the reliability in detecting contaminations of obj_n f d and obj_tbd diminished to 70.68% (a performance delta of 27.59%), taking the last place, whereas obj_ec rose to first place with 75.86% (a smaller performance delta of 12.07%). The two other strategies shared the second rank, where obj_mc and obj_vc demonstrated performance deltas of 22.41% and 18.97%, respectively. The performance curves exhibited different characteristics: obj_tbd lost reliability immediately at k = 1, obj_n f d and obj_vc started to lose reliability at k = 2, and obj_mc weathered the disturbance up to k = 3. The robustness trait of obj_ec to withstand up to k = 8 is remarkable given that while it was not the top performing strategy by design in reliability to detect contamination, by k = 6 it had closed the performance gap with the other strategies and then outperformed them all at every k > 9. Looking at the resilience scores, we got the following scores: obj_ec was the most resilient with a score of 85.23%, then followed obj_mc with 84.54%, obj_vc with 83.85%, and both obj_n f d and obj_tbd sharing the last position with 83.22%. and _ started to lose reliability at = 2, and _ weathered the disturbance up to = 3. The robustness trait of _ to withstand up to = 8 is remarkable given that while it was not the top performing strategy by design in reliability to detect contamination, by = 6 it had closed the performance gap with the other strategies and then outperformed them all at every > 9. Looking at the resilience scores, we got the following scores: _ was the most resilient with a score of 85.23%, then followed _ with 84.54%, _ with 83.85%, and both _ and _ sharing the last position with 83.22%.

Insights from the Case Study for CWS Design
This study is the first to the authors' knowledge in which the resilience of different sensor designs is assessed under complex cyber-physical attacks, including deliberate contamination, where the cyber-physical model employs a series of realistic response and mitigation measures in an integrated control scheme that affects the actual outcome of the simulation, measured with multidimensional measures. The results obtained by this case are transferable to real-world water distribution networks, and signify the interplay between different attributes of sensor designs, such as increased sensor count, redundancy in monitored locations, and control actions. As shown, small variations in sensor size design can have anywhere from unnoticeable to big impacts on performance in metrics when under loss of operation of some elements, with some designs unintentionally proving to

Insights from the Case Study for CWS Design
This study is the first to the authors' knowledge in which the resilience of different sensor designs is assessed under complex cyber-physical attacks, including deliberate contamination, where the cyber-physical model employs a series of realistic response and mitigation measures in an integrated control scheme that affects the actual outcome of the simulation, measured with multidimensional measures. The results obtained by this case are transferable to real-world water distribution networks, and signify the interplay between different attributes of sensor designs, such as increased sensor count, redundancy in monitored locations, and control actions. As shown, small variations in sensor size design can have anywhere from unnoticeable to big impacts on performance in metrics when under loss of operation of some elements, with some designs unintentionally proving to be more resilient or robust. It is self-explanatory that more sensors equal better performance, but even one more sensor placed at a carefully chosen location can provide a great benefit against loss of function and redundancy in the network. Moreover, the examination of the behavior of different sensor designs against the same sets of attacks shows that there are cases where a generally less optimal solution in detecting contamination for normal operation status (e.g., obj_ec versus obj_n f d) is eventually far more robust and resilient against disturbance, perhaps counterintuitively. This insight is also shared with many other engineered systems: A top-performing and reliable system at normal operation status is not always resilient, as there are also robust systems that are not resilient [43,63]. This behavior can only be evaluated by systematic stress-testing procedures that assess the performance of the system under failure, denoting the importance of such resilient assessment methodologies for engineered systems.
Because in this study the cyber-attacks are deliberately chosen from sensor ranking order, the fact that some designs can be more resilient can only be exacerbated by more targeted attacks that can, for example, blind sensors monitoring the same DMA regardless of ranking. As such, we conclude that sensor placement optimization methodologies should generally take into account the possibility of a subset of sensors not operating. In addition, by carefully examining the sensor placement locations of the various strategies employed, it is evident that the strategies obj_n f d and obj_tbd deploy all sensors to the outskirts of the network at terminal nodes, obj_mc places 40% of them, and on the contrary obj_vc and obj_ec none of them. This fact suggests that there is some connection with the choice of placement locations and inherent resilience against disturbances in the capacity to eventually detect contamination due to loss of performance because of deliberate attacks and could be exploited to design algorithms to place sensors with the goal of optimizing CWS resilience. Moreover, the resilience assessment methodology presented here is generally applicable to case studies with simple malfunctions or inability of sensors to detect contamination that may also not be deliberate, but accidental, and thus can be employed outside of cyber-physical design studies without modification.
Although we study the problem from a "what-if" civil engineering perspective, there are two more aspects of the resilience assessment performed here that can be improved upon through interdisciplinary research efforts. The first is the fact that both the contaminant and the sensors are deprived from their chemical nature to simplify things. Obviously, contaminants rarely are as persistent and conservative as in the simulations here and react with water, pipe materials, etc., possibly forming even more dangerous substances. Sensors typically monitor surrogate quality properties of water to detect anomalies, and thus may not always detect a contaminant. A multi-species water quality analysis with an analysis of the reactions and kinetics of the contaminant and the impact on sensor monitor properties can give a more accurate representation of the cyber-physical control scheme and impact assessment of the cyber-physical attack. The second is the IT (information technology) aspect of the cyber-physical system. Although no cyber system is unsusceptible to hacking, there can be solutions implemented that hinder certain types of attacks, or some elements of the system that can be of different technology, and such measures should be addressed in a more realistic representation of the problem and its scenario simulation.

Conclusions
We presented a formal resilience assessment methodology for different water quality sensor designs in cyber-physical water distribution systems under cyber-physical threats. We also demonstrated the use of the methodology through a comprehensive analysis of a medium-sized real-world-based WDN under a multitude of cyber-attacks combined with a wide range of deliberate physical contamination attacks and evaluated its performance with multidimensional metrics. The results show that resilience and robustness are traits of the system that can be assessed only by stress-testing methodologies that evaluate the system's performance under failure. It is suggested that this methodology can prove useful to water utility risk assessment, risk mitigation studies, and strategic planning, as well as inform water quality sensor placement decisions.