Sensor Placement for the Classification of Multiple Failure Types in Urban Water Distribution Networks

Parajuli, Utsav; Magar, Binod Ale; Ghimire, Amrit Babu; Shin, Sangmin

doi:10.3390/urbansci9100413

School of Civil, Environmental and Infrastructure Engineering, Southern Illinois University, 1230 Lincoln Drive, Carbondale, IL 62901, USA

Author to whom correspondence should be addressed.

Urban Sci. 2025, 9(10), 413; https://doi.org/10.3390/urbansci9100413

Submission received: 7 September 2025 / Revised: 1 October 2025 / Accepted: 2 October 2025 / Published: 7 October 2025

(This article belongs to the Special Issue Urban Water Resources Assessment and Environmental Governance)

Abstract

Urban water distribution networks (WDNs) are increasingly vulnerable to diverse disruptions, including pipe leaks/bursts and cyber–physical failures. A critical step in a resilience-based approach against these disruptions is the rapid and reliable identification of failures and their types for the timely implementation of emergency or recovery actions. This study proposes a framework for sensor placement and multiple failure type classification in WDNs. It applies a wrapper-based feature selection (recursive feature elimination) with Random Forest (RF–RFE) to find the best sensor locations and employs an Autoencoder–Random Forest (AE–RF) framework for failure type identification. The framework was tested on the C-town WDN using the failure type scenarios of pipe leakage, cyberattacks, and physical attacks, which were generated using the EPANET-CPA and WNTR models. The results showed higher performance of the framework for single failure events, with an accuracy of 0.99 for leakage, 0.98 for cyberattacks, and 0.95 for physical attacks, while the performance for multiple failure classification was lower but still acceptable, with an accuracy of 0.90. The reduced performance was attributed to the model’s difficulty in distinguishing failure types when they produced hydraulically similar consequences. The proposed framework combining sensor placement and multiple failure identification will contribute to advancing existing data-driven approaches and strengthening urban WDN resilience to conventional and cyber–physical disruptions.

Keywords:

urban water systems; machine learning; feature selection; cyber–physical system; cyber–physical attacks; anomaly detection

1. Introduction

Water distribution networks (WDNs) play a vital role in meeting the growing water demand of cities. This urban water infrastructure needs to be regularly monitored to ensure reliable water supply services [1]. A pioneering strategy may include smart water management, data analytics, automation, and resource conservation. The solutions need to be resilient, sustainable, and responsive to future challenges. Urban WDNs are continuously facing various challenges from water availability under climate change to pipe deterioration and leaks/bursts with aging, as well as cyberattacks, and natural disasters which can cause partial or complete failures in the WDNs’ functions [2,3]. As WDNs are critical infrastructure, they require increased resilience to those challenges to maintain service standards [4].

In this context, the concept and strategies of resilience have attracted significant attention in recent years in the design, operation, and management of WDNs [5]. One resilience strategy in practice is a cyber–physical system (CPS) or smart system approach to the design, operation, and management of WDNs. A CPS is equipped with SCADA (Supervisory Control and Data Acquisition), including sensors/meters, information and communication technologies, and remote controllers [6]. Through water flow and pressure sensors, water CPSs collect and analyze data on the hydraulic and operational conditions of WDNs and detect anomalies in operation and services. This can help system managers to quickly take emergency actions (control modification) to minimize disruptions in water supply services and to implement recovery options to restore the disrupted services. Thus, appropriate sensor placement for effective monitoring and accurate anomaly detection is the starting point for implementing resilience strategies.

In general, adequate sensor placement for a WDN is not straightforward. A WDN consists of hundreds of demand nodes and pipes and multiple water tanks, which can be potential locations to install sensors to measure the hydraulic and operational performances of the WDN. However, only a limited number of sensors can be deployed in WDNs due to budget constraints in sensor installation and maintenance and limited accessibility to selected sensor locations [7]. This will force a trade-off with the WDN monitoring effectiveness achieved. These constraints consequently motivate a strategy to find optimal sensor locations in WDNs while minimizing the number of sensors and maximizing the WDN monitoring performance. Thus, sensor placement is an optimization problem to find the best set of the limited number of sensors and their locations in a WDN [7,8].

Many sensor placement strategies have been developed for the detection of conventional types of failure, such as leakage, pipe burst, and water contamination. For example, ref. [9] used a dynamic prediction graph neural network with genetic algorithms to find the optimal sensor placement for water quality monitoring. Ref. [10] used a hybrid information–entropy approach for pressure sensor placement in leak detection. Similarly, ref. [11] used a multi-objective pressure sensor placement method for burst detection.

In addition to the conventional failures, WDNs are now vulnerable to cyber–physical attacks because of the development of smart infrastructure [12]. There have been several studies on the detection of these attacks. Ref. [13] used deep learning to detect cyber–physical attacks. Ref. [14] used a model approach for detection. Ref. [15] used an artificial neural network for attack detection. These studies focused on the capacity of their respective techniques to detect cyber–physical attacks. The datasets developed for the BATADAL contained a fixed number of sensors, and most of the research used the same sensor locations for its detection approaches [13]. However, few studies have addressed sensor placement strategies for the identification of cyber–physical attacks. In addition, previous studies have paid less attention to sensor placement strategies for distinguishing a specific failure event among various types of potential failures in a WDN, or for detecting disruptions resulting from a combination of multiple failure events. Ref. [16] used support vector machine, random forest (RF), and artificial neural network (ANN) models to identify failure types, along with cyberattacks and physical attacks, in a WDN. That research used the same fixed set of sensors used in the BATADAL for the C-Town network. Research is therefore needed on an approach to determine the optimal sensor locations for identifying diverse failure types and combinations of multiple failure types.

The literature has mainly focused on two approaches for failure detection and sensor placement: model-based [17] and data-based [18]. Model-based processes depend on systemic models [19]. Additionally, a model-based method may not be reliable when constrained by the limited fidelity of hydraulic models, which depend on the accurate estimation of model parameters and the description of operational conditions. An alternative would be to use data-driven strategies to address sensor placement and detection. As a result, failure diagnosis may be accomplished using machine learning; in general, a supervised classification model. In addition, strategies to choose only pertinent features (sensors) can be identified for efficient and reliable failure detection. Ref. [20] applied mixed integer linear programming (MILP) for sensor location optimization in contamination detection with the performance objective of the affected population, time of detection, the quantity of contaminant retained, contaminant extent, and frequency of undiagnosed occurrences. Ref. [21] employed the multi-objective white whale optimization algorithm to develop an optimal layout model for sensor placement. Water pressure correlation, water pressure sensitivity, and water demand of nodal points were taken as objective functions to build the model. Ref. [22] provided a geographical analysis based on graph theory and GIS for improving the placement of acoustic sensors for leak detection. Additionally, they used ground elevation-based pressure sensor optimization to ensure that every district metered area (DMA) contained sensors and physically covered every network node. To detect water loss incidents effectively and profitably, ref. [23] used an entropy-based technique. The strategy used entropy’s maximality, subadditivity, and equivocation features. The greedy search heuristic method was employed to maximize entropy to obtain optimum sensor placement. Ref. [24] treated sensor placement as an integer optimization problem with the goal of minimizing non-isolable leaks following previously proposed isolability criteria. A genetic algorithm was chosen as the strategy for solving the optimization problem due to its size and nonlinear integer nature. Ref. [25] applied information theory. The goal was to increase the number of sensors with relevant information and to reduce the amount of redundant information between the chosen sensors. The solution was produced using a heuristic method. Ref. [26] employed the clustering technique to tackle the sensor optimization problem in leak detection, which involved an unsupervised classification of patterns with a branch and bound search.

Water quality monitoring, leak/burst detection, and localization were the initial goals of sensor deployment research in WDNs. Recent years have seen the development of various cyber–physical intrusion diagnosis models; however, limited attention has been paid to the sensor placement strategy for cyberattack and physical attack detection. Although there have been many studies on the detection of cyber–physical attacks, most of them have used the Battle of the Attack Detection Algorithms (BATADAL) datasets and sensor features regarded as SCADA sensors [13]. Additionally, previous research has concentrated on the sensor placement issue for a single failure event. The strategy for selecting sensors to identify a specific failure event among various types of potential failures has not been adequately established. Furthermore, the performance of existing sensor placement methods for multiple failure events compared to single failure events is as yet unknown. It is equally important to establish whether a sensor placement strategy created for a single failure event, or for identifying one failure type, has the same significance for identifying other failures (multiple failure events). This aids administration and operational management, so that the most critical sensors can be more clearly identified and stronger security protocols can be devised to protect those sensors from physical and cyberattacks.

This paper formulates the sensor placement strategy as a feature selection problem. It uses a wrapper-based feature selection, i.e., recursive feature elimination with the RF model (RF–RFE), to choose sensor locations. Wrapper methods construct and employ the model to score feature subsets produced as part of a heuristic search [27]. The process of selecting relevant features for the dataset by applying specified criteria to a subset of an initial feature set is known as feature selection [28]. Numerous feature selection approaches have been developed through years of research to obtain the best subset possible from the initially generated features. However, irrelevant and redundant features can impair classification performance and can make classifiers ineffective in real-world applications, even after feature selection. Feature selection in machine learning has been used for failure identification and as a sensor placement strategy in other research domains. Ref. [29] conducted the feature selection based on the mutual information metric and performed the outlier detection using principal component analysis. Ref. [30] applied the wavelet denoising technique to extract nine important features and a one-class support vector machine (SVM) for anomaly detection in earth dams. Ref. [31] used recursive feature elimination using RF for feature selection and a deep learning classifier for cyberattack detection in computer networks. Similarly, ref. [32] proposed a sensor selection approach for leak localization based on a classification approach coupled with a hybrid feature selection approach.

The second part of this research proposes to use the ensemble of an autoencoder (AE) and RF (AE–RF) for failure type identification for both a single failure event and multiple failure events. The process is carried out after sensor selection with RF–RFE. Due to the enormous dimensions of many real-world data sets, computational time and space are significantly increased [33]. A possible solution would be to use feature extraction as dimensionality reduction in the preprocessing stage to deal with the high time demands and complexity in the classification phase for failure diagnosis. The relevant research highlights deep learning classifiers that perform well with large amounts of data. The autoencoder, a deep learning neural network, has been extensively studied for unsupervised anomaly detection, as it can be trained on normal datasets alone [34]. Some studies have also made use of its multirole capacity for dimensionality reduction through feature extraction [35,36]. To identify failures in a WDN, this study investigated the capability of AE dimensionality reduction in conjunction with the RF classifier. This study’s goal is to build a failure identification model using an ensemble of AE feature extraction and the RF (AE–RF) model from the sensor features chosen through RF–RFE.

Overall, this research offers a common sensor placement strategy for single failure events and multiple failure events using RF–RFE and proposes a failure identification framework using AE–RF. A similar approach can be used for other urban smart infrastructure systems, such as transportation, traffic management, gas lines, smart agriculture, and flood control. The model created using this method improves urban water systems’ operational management, system resilience, and adaptiveness to future dysfunction, thereby promoting sustainable water management.

2. Materials and Methods

This study presents an integrated framework for a sensor placement strategy by applying a feature selection approach with RF–RFE and failure identification using the ensemble of deep learning AE with the RF classifier (AE–RF). Figure 1 shows the approach presented in this study. Selected sensor locations were tested using the AE–RF to identify failures. First, different types of failures, such as leakage, cyberattack, and physical attack, were considered, and then the sensor location for a single failure event was determined using the feature selection techniques in machine learning. Second, the same approach was applied to identify multiple failure events (chance of any failure from various failure types). Following the creation of sensor placement strategies for each single and multiple failure scenario, the performance of AE–RF to identify failures was evaluated. This study’s approach to find optimal sensor locations and the methods to evaluate its performance for failure identification are described in detail below.

2.1. Feature (Sensor Locations) Selection with RF–RFE for Sensor Placement

2.1.1. Random Forest Classifier

Random forest (RF) is a widely used ensemble classifier composed of multiple decision trees [37]. During the training phase, the bootstrap resampling technique randomly draws a sample subset for each decision tree. The scores from the individual decision trees are combined to produce the RF’s final classification output. The classifier’s performance depends on the correlation between the decision trees and on each tree’s classification capacity. In classification, the voting method provides a probabilistic value for each potential output class. When all the trees’ probabilities are averaged, the class with the highest value is taken as the predicted output.

RF is often preferred over other machine learning methods, such as decision tree (DT) and support vector machine (SVM), because of its unbiased estimator and parallelization, which limits overfitting and reduces the impact of outliers on prediction [38]. Because RF averages the results of many decision trees, it can mitigate the overfitting of individual trees while maintaining prediction and classification performance. One of the main benefits of RF is the feature importance measure, which highlights the significance of each sensor location [39]. Sensor locations with higher importance scores are regarded as more significant than those with lower scores. The feature importance values offer a way to assess each sensor feature’s contribution to failure identification. To determine the feature importance, an out-of-bag error is first calculated for each decision tree. The values of a particular feature are then permuted across the out-of-bag data, and the out-of-bag error is determined again. The difference between the two out-of-bag errors indicates the importance of that feature: the larger the increase in error, the more important the feature.

2.1.2. Recursive Feature Elimination

Any fault detection and diagnosis method relies on background knowledge about the system’s faults, so all known potential faults must first be specified in detail. Once the faults have been described, the next issue is detecting and identifying them in new data. The choice of critical sensor sites is therefore crucial for successfully characterizing failures. This study aims to develop an effective monitoring sensor configuration by using a feature selection approach to choose among all potential sensor placements. Feature selection evaluates the features to decide which of them affect the dataset’s results. High precision is sought by reducing the dimensionality of the high-dimensional data. Recursive feature elimination (RFE), a wrapper-based feature selection technique, evaluates the classifier’s performance while deleting features iteratively [31]. The goal of the RFE employed in this study is to provide a method for placing a specific number of internal pressure sensors in the DMA of a WDN to achieve a sensor configuration with the best possible performance for failure identification.

RFE first provides the classifier with all the features to calculate the model’s performance. The relative importance of each feature used in the classifier is then calculated, and the features are iteratively removed to form subsets [38]. The classifier model is retrained on each subset, and its performance is assessed. The feature ranking or importance values of each subset are also calculated. Depending on the model used, the classifier can calculate importance or ranking values using information gain or the Gini index. Figure 2 shows the RF–RFE method with cross-validation that was used in this study. First, we applied the RF technique to the training data and determined the relative importance of each sensor’s data based on how much it contributed to failure identification. The sensors were then ranked from most important to least important. Next, we removed the least important sensor, retrained the RF model using the updated sensor set, and evaluated classification performance using 10-fold cross-validation. The cross-validation accuracy was then used to assess the effect of removing the weakest sensor. The procedure was repeated, eliminating sensors until the accuracy decreased or stayed the same, to provide the ideal sensor placement.
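The elimination loop described above can be sketched with scikit-learn's `RFECV`, which wraps a classifier, removes the lowest-ranked feature at each step, and scores each subset by cross-validation. This is a minimal illustration and not the authors' code: the data are synthetic, the number of trees and folds are arbitrary choices made to keep the example fast, and `RFECV` keeps the subset size with the best cross-validated score, which is a slightly different stopping rule from the one stated in the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

# Synthetic stand-in for the sensor matrix: rows are time steps, columns are
# candidate sensors (hypothetical data, not the C-town simulations).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=4, n_redundant=2,
    random_state=0,
)

# Assumed hyperparameters, chosen only to keep the sketch quick to run.
rf = RandomForestClassifier(n_estimators=25, random_state=0)

# Drop one sensor per iteration (step=1) and score every subset with
# cross-validated accuracy; cv=5 here, whereas the study reports 10 folds.
selector = RFECV(estimator=rf, step=1, cv=5, scoring="accuracy",
                 min_features_to_select=1)
selector.fit(X, y)

selected_sensors = np.flatnonzero(selector.support_)
print("sensors kept:", selected_sensors)
print("ranking (1 = kept):", selector.ranking_)
```

Columns with rank 1 are the retained sensors; higher ranks were eliminated earlier in the loop.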

2.1.3. AE–RF for Failure Identification

An autoencoder (AE) is a popular deep learning neural network for dimensionality reduction that contains two major components: an encoder and a decoder [38]. The encoder converts the input features into a latent representation, which the decoder uses to recreate the input features at the output. In a typical AE, the number of nodes decreases with each successive encoder layer. The innermost hidden layer is also called the bottleneck layer. Suppose we have normal condition data with a sensor set {x₁, x₂, x₃, …, x_n}. These data can be reconstructed with the AE as {x′₁, x′₂, x′₃, …, x′_n}. The error generated during the reconstruction, as defined in Equation (1), is called the reconstruction error (RE).

E(x, x') = \sum_{i=1}^{d} (x_{i} - x'_{i})^{2}

(1)
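As a small numerical illustration of Equation (1), the sketch below computes the reconstruction error as the sum of squared differences between each input sample and its reconstruction. The sensor readings and the cutoff value are hypothetical and chosen only for illustration.

```python
import numpy as np

def reconstruction_error(x, x_rec):
    """Sum of squared differences between input and reconstruction, per sample."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    x_rec = np.atleast_2d(np.asarray(x_rec, dtype=float))
    return np.sum((x - x_rec) ** 2, axis=1)

# Hypothetical readings from three sensors at two time steps.
x = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
# Hypothetical reconstructions: the first is exact, the second is off.
x_rec = np.array([[1.0, 2.0, 3.0], [0.5, 2.0, 4.0]])

errors = reconstruction_error(x, x_rec)
# An arbitrary cutoff of 1.0 separates the two samples in this toy case.
flags = errors > 1.0
print(errors, flags)
```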

When the AE model is trained on the dataset for normal conditions, the RE for normal data is low. Setting an RE threshold above which a sample is categorized as a failure state allows failure cases to be differentiated using the AE model [38]. Research on failure detection in WDNs has shown that AE is proficient in unsupervised anomaly detection by assessing the reconstruction error [13,40]. This study builds on the AE’s multirole capacity [36] by combining it with the RF classification model for multiple failure type identification. The ensemble model for multiple failure identification comprises the encoder layer, the bottleneck layer, and the supervised classifier (RF) networks, as shown in Figure 3. In the hidden layer, the input features are condensed to fewer neurons. The activation of hidden neuron i is described in Equation (2) as follows:

h_{i} = f_{θ}(x) = s\left(\sum_{j=1}^{n} w_{ij}^{input} x_{j} + b_{i}^{input}\right)

(2)

Here, x is the vector of input features, n is the number of input features, w is the encoder weight, b is the bias, and s is the activation function. After the encoder model is trained, the reduced-dimensional sensor set is used as the input feature for the random forest classification model for failure identification.

The autoencoder used in this model had an input layer of thirty neurons, one per selected feature (sensor), followed by three hidden layers with 24, 18, and 12 neurons, respectively. The tanh activation function was applied in the model. The “Adam” optimizer was used to minimize the reconstruction loss. The model was trained for up to 500 epochs with a batch size (the number of training data samples in one iteration) of 16, and early stopping was employed to avoid overfitting. After the encoder model was developed, the extracted features were used to train the random forest multiclass classifier for failure identification. The random forest classifier used ten decision trees with entropy as the splitting criterion.
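To make the encoder structure concrete, the sketch below applies Equation (2) layer by layer with the layer sizes reported above (30, 24, 18, 12) and the tanh activation. It is only a forward pass with random, untrained weights: the training step (Adam, early stopping) and the downstream RF classifier are omitted, and the helper names `init_encoder` and `encode` are hypothetical. Treating the 12-neuron layer as the bottleneck is an assumption based on it being the smallest layer.

```python
import numpy as np

# Input layer followed by the three hidden layers; the last one (12 neurons)
# is assumed to be the bottleneck whose outputs feed the classifier.
LAYER_SIZES = [30, 24, 18, 12]

def init_encoder(layer_sizes, seed=0):
    """Random (untrained) weights and zero biases, for illustration only."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 0.1, size=(n_out, n_in)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def encode(x, params):
    """Apply h = tanh(W h_prev + b) per layer; x has shape (samples, 30)."""
    h = np.asarray(x, dtype=float)
    for W, b in params:
        h = np.tanh(h @ W.T + b)
    return h

params = init_encoder(LAYER_SIZES)
readings = np.random.default_rng(1).normal(size=(5, 30))  # hypothetical data
latent = encode(readings, params)
print(latent.shape)
```

In a trained model, the 12-column `latent` array would replace the 30 raw sensor columns as the input to the random forest.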

2.2. Evaluation

This study treats failure identification as both a binary and a multiclass classification problem. Accuracy alone cannot be used to evaluate a classifier in the failure identification problem, so other performance indicators for failure diagnosis are also considered [41]. Equations (3)–(6) describe the evaluation criteria used in this study, where a true positive (TP) is a case where the model correctly predicts a positive outcome, a true negative (TN) is a case where the model correctly predicts a negative outcome, a false positive (FP) is a case where the model incorrectly predicts a positive outcome for an actual negative instance, and a false negative (FN) is a case where the model incorrectly predicts a negative outcome for an actual positive instance.

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(3)

Precision = \frac{TP}{TP + FP}

(4)

Recall\ (Sensitivity) = \frac{TP}{TP + FN}

(5)

F1\ score = \frac{2 \times Precision \times Recall}{Precision + Recall}

(6)

In the context of WDN resilience, these indicators have practical significance [42,43]. Accuracy reflects the overall dependability of the classification framework in distinguishing among diverse failure events and normal states. Precision reduces false alarms, thereby avoiding unnecessary maintenance actions and operating expenses. For leak and burst detection, recall is essential since low recall can lead to failures going unnoticed, potentially resulting in significant service disruptions. Finally, the F1-score provides a balanced compromise between recall and precision, which makes it especially suitable for multiple failure scenarios where operational resilience depends on both minimizing false alarms and ensuring critical failures are identified.
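Equations (3)–(6) can be checked with a short worked example. The function below computes the four indicators from confusion-matrix counts; the counts are made up for illustration, and returning 0.0 when a denominator is zero is a convention assumed here rather than one stated in the text.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Zero denominators map to 0.0 (an assumed convention).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts: 90 failures caught, 20 missed, 10 false alarms.
metrics = classification_metrics(tp=90, tn=880, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 0.97 even though about 18% of the failures are missed (recall of roughly 0.818), which illustrates why accuracy alone is not sufficient.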

2.3. Data Generation

2.3.1. Study WDN

In this study, the C-town WDN was used to test the approach described above for sensor placement and failure identification (Figure 4). The C-town WDN has been employed in previous studies to investigate its performance (e.g., reliability, resilience, and water supply) under various failure events, such as cyberattacks, operational disruptions, pipe leaks and bursts, and contaminant intrusion [13,16,40,44,45]. The C-town WDN consists of physical components (388 demand nodes, 432 pipes, 11 pumps, seven tanks, and one actuated valve) and cyber components (SCADA) comprising nine PLCs and a data acquisition server [13]. The PLCs receive data from the system components (e.g., nodal pressure, tank water levels, and pump flow) and transfer the data to the SCADA system, which detects and identifies a disruptive event. Data from all seven tanks and the nodal pressures were used to train the model for sensor selection.

2.3.2. Failure Scenarios

EPANET-based hydraulic simulation models, including EPANET-CPA [46] and WNTR [47], were used to develop realistic scenarios of a cyberattack, a physical attack, leakage, and the normal operating state. Because failure data for the WDN operating under different failure conditions are lacking, failure simulation with WDN models is necessary. This study applied pressure-driven (PD) conditions throughout the hydraulic simulation to analyze hydraulic performance in a pressure-deficient state [48]. The sections below describe the simulation period, scenarios, and duration for each state (normal operation, cyberattack, physical attack, and water leakage).

(a) Normal Operating State

A WDN running under its design conditions without any discrepancy or external intrusion is termed a normal operating state in this research. A WNTR model was used to simulate the normal state. The simulation created a normal condition dataset of 6 months (4320 h). The parameters provided with the C-town WDN, such as demands, demand patterns, and component characteristics (valves, pipes, tanks, and pumps), were used at their default values.

(b) Cyberattack

In earlier research, the cyberattack scenario was described and simulated using EPANET-CPA [46]. In this study, a cyberattack simulation is performed using the same model. The model simulated a cyberattack over six days (144 h). Table 1 provides a detailed description of the cyberattack simulation scenario. In this study, the problem of sensor placement and failure identification is solved using 128 datasets, each with 144 h. Each scenario and attacked component were designed to last 96, 108, 120, and 132 h, respectively.

The attack on the communication route between the tank’s water level and the PLCs is represented by scenario 1 in the table. During the simulation, the following components were attacked: T1/PLC2, T2/PLC3, T3/PLC4, T4/PLC5, T5/PLC7, and T7/PLC9. This attack causes the control rule to be activated and sends the information to the PLC to turn off the pump by having the operator window always show a high tank level regardless of its actual condition. Scenario 2 in Table 1 shows how the PLCs’ control logic was changed, which led to the PLC turning the pump on and off erratically. In this case, PLC1’s pump 1/2, PLC3’s pump 4/5, PLC3’s pump 6/7, and PLC5’s pump 8, pump 10, and pump 11 are the compromised components.

The third scenario involved a denial of service (DoS) attack [6]. With this attack, the attacker can render the system unreachable. It halts the transmission of data, the sending or receiving of commands, and the execution of orders by sensors, actuators, and PLCs. As a result, in the third scenario of this research, the PLC was unable to obtain updates from another PLC on the water levels in the tanks, which led to the pump running continuously and eventually to the tank overflowing. The following attack components were designed for dataset generation: PLC2/PLC1, PLC4/PLC3, PLC9/PLC5, PLC6/PLC3, and PLC7/PLC5. Finally, attack concealment, as a replay attack, was considered for the fourth, fifth, and sixth scenarios by replicating the failure patterns from scenarios 1, 2, and 3. In this attack, the hackers intercept and decode the PLC signal and save the readings of the system’s normal operational status. The adversaries then slightly alter those readings with a random component and use them to replace the data during an attack scenario built similarly to those above.

(c) Physical Attack

Physical attack characterization and simulation for dataset generation were conducted using EPANET-CPA. If the attacker has direct physical access to the system, a physical attack on the WDN is feasible. Any type of physical attack is possible, including contaminant intrusion, sensor damage, manipulation, and control over the operation of actuators, like pumps and valves. This study considers modifying the pump’s operation in opposition to the control rule by gaining direct access to the system as a physical attack. The details of the component that was attacked are provided in Table 2. The physical attack simulation was run for 144 h, and 72 datasets were produced.

(d) Water leakage (conventional physical failure)

The WNTR model is effectively constructed to simulate pipe leaks/bursts in the WDN. In general, previous studies on significant leaks (or pipe bursts) have specified the emitter coefficient for leakage simulation. For modeling the leakage scenario (Table 3), the WNTR model used the relation suggested by [49] as indicated in Equations (7) and (8).

d_{leak} = C_{d} A p^{α} \sqrt{\frac{2}{ρ}}

(7)

d_{leak} = C_{d} A \sqrt{2 g h}, \quad α = 0.5

(8)
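The relationship between Equations (7) and (8) can be verified numerically: with α = 0.5 and the pressure written as p = ρgh, Equation (7) reduces to Equation (8). The sketch below assumes SI units (area in m², pressure in Pa, density in kg/m³, head in m). The function names, the discharge coefficient of 0.75, and the hole size are illustrative assumptions, not values reported in this study.

```python
import math

def leak_demand(area, pressure, discharge_coeff=0.75, alpha=0.5,
                density=1000.0):
    """Equation (7): d = C_d * A * p**alpha * sqrt(2 / rho)."""
    return discharge_coeff * area * pressure ** alpha * math.sqrt(2.0 / density)

def leak_demand_from_head(area, head, discharge_coeff=0.75, g=9.81):
    """Equation (8): d = C_d * A * sqrt(2 * g * h), valid for alpha = 0.5."""
    return discharge_coeff * area * math.sqrt(2.0 * g * head)

# Hypothetical leak: a 5 cm diameter hole under 30 m of pressure head.
area = math.pi * (0.05 / 2) ** 2   # m^2
head = 30.0                        # m
pressure = 1000.0 * 9.81 * head    # Pa, from p = rho * g * h

q_eq7 = leak_demand(area, pressure)
q_eq8 = leak_demand_from_head(area, head)
print(f"Eq. (7): {q_eq7:.5f} m^3/s, Eq. (8): {q_eq8:.5f} m^3/s")
```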

2.3.3. Data Generation for System Performance

The WNTR model was used to generate the leakage dataset in this research. WNTR is an open-source Python package widely used for simulating multiple failure states, resilience strategies, and contamination conditions in WDNs, which makes it appropriate for assessing system performance under normal and extreme conditions. Similarly, EPANET-CPA, a MATLAB 2021a-based toolbox, was used for simulating the cyberattack and the physical attack in the WDN. EPANET-CPA was developed to connect the WDN components, such as tanks, valves, pumps, and pipes, with the network architecture, such as PLCs and SCADA. This integration extends the traditional EPANET capabilities by providing the necessary datasets to evaluate the vulnerability of water systems to threats and operational failures.

3. Results and Discussion

The C-town WDN was used to test the presented framework for sensor placement and failure identification. The RF–RFE model initially considered the pressure sensor data from all nodes in the WDN as input features for selecting the best sensor locations. In the second stage, the number of sensors was constrained to 30. Sensor placement optimization was carried out, followed by failure identification with the AE–RF ensemble. The framework was applied to identify the sensor placement strategies for each failure type independently and for scenarios with multiple failures occurring at once. The performance of failure identification was then evaluated using the AE–RF model. Figure 5 shows the sensor placement strategy developed for the identification of single failure events and multiple failure events (combinations of multiple different failures including pipe leaks/bursts, cyberattacks, and physical attacks).

For leakage identification (Figure 5a), most of the selected sensor locations were positioned at upstream nodes. Of the seven tanks, only the pressure of tank T1, in the upstream vicinity, was selected as one of the thirty sensors. This indicated that junction node pressure was more important for identifying leaks than tank water levels. For cyberattack identification, as shown in Figure 5b, the best sensor locations were more widely distributed. Five of the seven tank water levels and 25 junction nodes were identified as essential sensors. Downstream sensors were more critical for the cyberattack scenarios because most cyberattacks created in this study resulted in tank overflows or insufficient tank water levels. This result indicates that the demand side of the WDN experienced more significant hydraulic repercussions than the supply side.

For physical attack identification, the selected sensor locations are shown in Figure 5c. Six tanks’ water levels, together with sensors at the junction nodes, were important, following a positioning similar to that for cyberattack detection. The quick start–stop operation of the pumps during a physical attack led to tank overflows and water shortages, resulting in hydraulic responses similar to those observed during a cyberattack. Consequently, installing sensors in the downstream part of the WDN was justified to capture the sudden hydraulic effects observed at the downstream junctions.

Finally, the sensor placement strategy for identifying multiple failure events combined upstream and downstream locations across the C-town WDN to detect a failure event and distinguish its type. Water levels from three of the seven tanks were selected as sensor locations (features), indicating that tank water levels play a more significant role here than in leakage identification, where only one tank was selected. The choice of sensor locations for multiple failure identification highlights the trade-off between upstream and downstream sensor placement when the monitoring system must detect and distinguish conventional failures (pipe leaks/bursts) and cyber–physical attacks effectively.

For the second step, the performance of the presented framework (AE–RF model) was evaluated in identifying single failure events and multiple failure events using the sensor placements (Figure 5) developed by the RF–RFE model. Table 4 and Table 5 present four evaluation indicators (precision, recall, F1-score, and accuracy) for the framework performance in identifying each single failure event and multiple failure events, respectively. Sensor placements based on individual failure events achieved an accuracy ranging from 0.95 to 0.99 for identifying the corresponding single failures, while the placement based on multiple failure events produced an accuracy of 0.90 in classifying and distinguishing multiple failures, which is still acceptable. Overall, the sensor placements determined through the proposed framework performed best in leakage identification, with slightly lower performance for cyberattacks and physical attacks.

Under multiple failure events (Table 5), the relatively low performance for physical attack identification (recall = 0.40 and F1-score = 0.47) indicates that these events are more difficult to identify than leakage or cyberattacks. Physical attacks, such as direct pump shutdowns, can lead to a significant nodal pressure drop or rise across the WDN, which makes it relatively easy for anomaly detection to recognize that a failure has occurred. However, the hydraulic consequences of physical attacks often overlap with or resemble those of cyberattacks. For example, both cyberattacks on pump controllers (PLCs) and direct physical attacks on pumps can cause pump shutdowns, leading to a substantial pressure drop. This similarity reduces the classification performance of the AE–RF model, leading to misclassification among the different failure types. As shown in Table 1 and Table 2, the hydraulic responses from physical attacks can also be reproduced by cyberattack scenarios, while cyberattack scenarios include other failure cases that produce hydraulic responses not observed in physical attack scenarios. This difference explains the higher performance of the proposed framework in distinguishing cyberattacks, compared to physical attacks, under multiple failure events.
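The F1-scores in Table 5 can be cross-checked against the reported precision and recall, since the F1-score is their harmonic mean. The sketch below recomputes them; the dictionary layout and function name are illustrative, and the values are copied from Table 5. Because the published inputs are rounded to two decimals, agreement is expected only to within about 0.01.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per failure state, copied from Table 5.
table5 = {
    "Normal": (0.95, 0.99, 0.97),
    "Cyberattack": (0.75, 0.81, 0.78),
    "Leakage": (0.97, 0.98, 0.98),
    "Physical attack": (0.57, 0.40, 0.47),
}

for state, (precision, recall, reported) in table5.items():
    recomputed = f1_score(precision, recall)
    print(f"{state}: recomputed F1 = {recomputed:.3f}, reported = {reported:.2f}")
```

By the definition of recall, the value of 0.40 for physical attacks means that roughly 60% of the physical attack samples were assigned to another class.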

Given the relatively lower accuracy of the multiple failure-based sensor placement in identifying the failure types, a single failure-based sensor placement remains necessary due to its higher accuracy. Lower accuracy can result in a delayed or incorrect response during a critical time, false alarms, or an increased operational risk due to cascading failures. The reduced accuracy for multiple failure identification was also evident when the hydraulic responses under different failures showed similar patterns or consequences, making it challenging for the framework to separate one event from another. For example, nodal pressure drops can be caused by pipe leaks as well as by pump failures, which can in turn be induced by cyberattacks or physical attacks. Although the framework performed well in identifying multiple failure events, with an overall accuracy of 0.90, further enhancements are necessary to more accurately distinguish between failure events that lead to similar hydraulic consequences.

In the earlier phase of this study, sensor placement was constructed to independently identify leakage, cyberattack, physical attack, and multiple failures. It is also essential to understand how the sensor locations selected for each failure identification perform when other failures or multiple failure types occur. Table 6 shows the framework performance for different failure identification goals using the different sensor placement strategies. All sensor placement strategies showed consistent performance in identifying leakage. On the other hand, the sensor placement based on leakage identification showed slightly reduced accuracy in identifying other single failure types and multiple failures. This is likely due to the spatially biased sensor deployment that was determined for leakage identification, which placed more priority on upstream nodes, while sensors for identifying physical attacks and cyberattacks were prioritized on nodes more distributed across the WDN (Figure 5).
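To make the cross-comparison in Table 6 concrete, the sketch below tabulates the reported accuracies and computes how far each placement strategy falls from the best value for each identification goal. The data structure and key names are illustrative; only the numbers are taken from Table 6.

```python
# Accuracy by sensor placement strategy (rows) and identification goal
# (columns), copied from Table 6.
GOALS = ("leakage", "cyberattack", "physical", "multiple")
table6 = {
    "leakage-based": (0.99, 0.97, 0.94, 0.88),
    "cyberattack-based": (0.99, 0.98, 0.95, 0.92),
    "physical-based": (0.99, 0.98, 0.95, 0.92),
    "multiple-based": (0.99, 0.98, 0.94, 0.90),
}

# Best accuracy achieved by any placement for each goal.
best = [max(row[i] for row in table6.values()) for i in range(len(GOALS))]

# Gap to the best placement, rounded to suppress floating-point noise.
gaps = {
    name: tuple(round(b - a, 2) for a, b in zip(row, best))
    for name, row in table6.items()
}

for name, gap in gaps.items():
    print(name, dict(zip(GOALS, gap)))
```

Read this way, the leakage-based placement shows the largest gap, for multiple failure identification, while the cyberattack-based and physical attack-based placements reach 0.92 for that goal, slightly above the 0.90 of the placement designed for it.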

The results from Table 6 indicate that the sensitivity of sensor placements to variations in the sensors’ spatial distribution and failure scenarios needs to be further investigated. For instance, leakage scenarios in this study had a 50 mm leakage diameter. Reducing the leakage size could degrade the performance of the sensor locations focused on leakage identification, because other sensor locations or distributions may not be able to detect the effects of smaller-scale failures. Similarly, the simulated cyber–physical attacks have extended attack periods, which last until the system recognizes them. If physical attacks and cyberattacks occur only for a short time, they may go undetected. In this case, the best sensor locations for other failure types may not be able to classify these failures accurately.

4. Conclusions

Considering the uncertainties in climate and socioeconomic changes that influence urban water systems, the current urban water infrastructure may not fully prevent service disruptions caused by unexpected or uncertain failure events. As a critical component of a resilience-based strategy aimed at minimizing system losses and implementing rapid recovery, the timely and reliable detection of disruptions and the identification of failure types are essential to support effective emergency response and recovery actions. In this context, this study presented a sensor placement framework for urban WDNs integrating a well-established RF–RFE feature selection approach to identify the best sensor locations and the ensemble of the AE–RF (autoencoder with random forest classifier) model to identify the failure types—leakage, cyberattacks, and physical attacks. The demonstration of the framework highlighted upstream sensor nodes for leakage identification, with nodal pressure given greater weight than tank water levels, reflecting the greater hydraulic impact of leakage on upstream nodes. By contrast, downstream sensors and tank water levels were more critical for identifying other failure types. The relevance of each selected sensor location was also highlighted, providing decision-makers with insights to implement additional security measures and to ensure the deployment of sensors resilient to malfunction and disruptions.

The proposed framework also demonstrated reasonable accuracy: 0.99 for leakage, 0.98 for cyberattacks, 0.95 for physical attacks, and 0.90 for multiple failure scenarios. The results indicate that sensor placement using the proposed framework can contribute to enhancing WDN monitoring and supporting rapid and reliable failure detection. Integrated with SCADA systems, this framework can improve real-time decision-making in urban WDN operations by more effectively distinguishing failure events and improving resilience against cyber–physical attacks. The framework performed better in identifying each single failure type than in classifying multiple failure scenarios. Although the framework shows acceptable performance for failure identification in both cases, the relatively lower accuracy for multiple failure classification highlights the challenge of overlapping hydraulic consequences across different failure events.

This study investigated the potential of the RF–RFE and AE–RF framework for sensor placement and failure identification in urban WDNs. While the main focus of this study was to test the feature selection methods and data-driven models currently used in practice, several limitations were identified. Practical deployment of the framework is challenging, considering model dependency, scalability issues, static network operations, and cost constraints. It is believed that a comparison of the proposed framework with existing sensor placement and failure classification approaches (e.g., GA-based optimization methods, entropy-based strategies, single ML models such as SVM and LSTM) is equally important. Thus, future research needs to conduct comparative experiments using the C-town WDN dataset and to evaluate additional performance metrics, including computational efficiency, robustness under sensor data loss, cost-effectiveness and accessibility, and accuracy in failure identification.

The reduced accuracy in identifying multiple failure events was attributed to the similarity of hydraulic consequences across different failure events. In this regard, future studies can advance the framework by incorporating advanced models, such as deep learning models and ensemble models, or hybrid approaches that integrate multiple physics-based models, and by developing a customized framework that considers WDN-specific properties, such as nonlinear hydraulic performance, topological irregularities, and temporal correlations in pressure/flow signals. In addition, these models should be tested on real-world scenarios with multiple failures, which raises the challenge of acquiring data with multiple and varied failure events. Incorporating frequency-based features and redundant sensor data can further improve the ability of a failure detection model to capture multiple failure events. Another way to improve the performance of multiple failure classification is to understand the impact of spatially biased sensor deployment. When sensors are unevenly distributed, failure detection and failure type identification may be unreliable in areas with insufficient sensor coverage. For example, a downstream-focused sensor layout may not fully capture upstream pump disruptions that could be critical for detecting physical attacks. Thus, a sensitivity analysis of spatially biased sensor placement under multiple failure scenarios is recommended to improve the proposed framework.

In this study, the sensor placements created for a single failure type were also tested to identify different failure types. For example, the sensor placement for leakage identification was applied to identify other failure types, showing a modest decrease in performance. However, meaningful comparisons were limited because the database used for this study comprised failure simulations with hydraulic impacts as large as feasible. For instance, the leakage diameter was set to 50 mm, and cyberattacks and physical attacks were simulated as unidentified failures lasting until the end of the simulation. Large or severe failures can produce hydraulic impacts that spread the disruption to many nodes. To prevent the hydraulic consequences of all failure types from spreading to the entire system, and to consider the possibility that small changes may go undetected in the system, future research could investigate smaller-scale failures (subtle disruptions) for all failure types.

Author Contributions

Conceptualization, S.S. and U.P.; methodology, S.S. and U.P.; software, U.P.; validation, U.P., B.A.M. and A.B.G.; formal analysis, S.S. and U.P.; investigation, U.P., B.A.M., A.B.G. and S.S.; resources, S.S.; data curation, U.P., B.A.M. and A.B.G.; writing—original draft preparation, U.P. and S.S.; writing—review and editing, S.S., U.P., B.A.M. and A.B.G.; visualization, U.P.; supervision, S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding authors.

Acknowledgments

This article is based on the author’s master’s thesis entitled “Data-Driven-Modeling-Based Sensor Placement for the Detection and Identification of Failures in Cyber Physical Water Distribution Networks”, completed at Southern Illinois University.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ramos, H.M.; Kuriqi, A.; Besharat, M.; Creaco, E.; Tasca, E.; Coronado-Hernández, O.E.; Pienika, R.; Iglesias-Rey, P. Smart Water Grids and Digital Twin for the Management of System Efficiency in Water Distribution Networks. Water 2023, 15, 1129.
2. Clark, R.M.; Panguluri, S.; Nelson, T.D.; Wyman, R.P. Protecting Drinking Water Utilities from Cyberthreats. J. AWWA 2017, 109, 50–58.
3. Acharya, A.; Liu, J.; Shin, S. Evaluating the Multi-Dimensional Resilience of Water Distribution Networks to Contamination Events. Water Supply 2023, 23, 1416–1433.
4. Ghimire, A.B.; Parajuli, U.; Bhusal, A.; Parajuli, A.; Banjara, M.; Shin, S. Investigating a Diversified and Decentralized Water Distribution System to Enhance Water Supply Resilience to Disruptive Events. In World Environmental and Water Resources Congress 2023: Adaptive Planning and Design in an Age of Risk and Uncertainty, Proceedings of the World Environmental and Water Resources Congress 2023, Henderson, NV, USA, 21–24 May 2023; ASCE Press: Reston, VA, USA, 2023; pp. 941–951.
5. Shin, S.; Lee, S.; Judi, D.R.; Parvania, M.; Goharian, E.; McPherson, T.; Burian, S.J. A Systematic Review of Quantitative Resilience Measures for Water Infrastructure Systems. Water 2018, 10, 164.
6. Taormina, R.; Galelli, S.; Tippenhauer, N.O.; Salomons, E.; Ostfeld, A. Characterizing Cyber-Physical Attacks on Water Distribution Systems. J. Water Resour. Plan. Manag. 2017, 143, 4017009.
7. Yang, G.; Wang, H. Optimal Pressure Sensor Deployment for Leak Identification in Water Distribution Networks. Sensors 2023, 23, 5691.
8. Diao, K.; Emmerich, M.; Lan, J.; Yevseyeva, I.; Sitzenfrei, R. Sensor Placement in Water Distribution Networks Using Centrality-Guided Multi-Objective Optimisation. J. Hydroinform. 2023, 25, 2291–2303.
9. Salem, A.K.; Abokifa, A.A. Optimal Sensor Placement in Water Distribution Networks Using Dynamic Prediction Graph Neural Networks. Eng. Proc. 2024, 69, 171.
10. Khorshidi, M.S.; Nikoo, M.R.; Taravatrooy, N.; Sadegh, M.; Al-Wardy, M.; Al-Rawas, G.A. Pressure Sensor Placement in Water Distribution Networks for Leak Detection Using a Hybrid Information-Entropy Approach. Inf. Sci. 2020, 516, 56–71.
11. Du, K.; Yu, J.; Zheng, F.; Xu, W.; Savic, D.; Kapelan, Z. A Robust Multi-Objective Pressure Sensor Placement Method for Burst Detection in Water Distribution Systems. Water Resour. Res. 2024, 60, e2024WR037258.
12. Bosco, C.; Raspati, G.S.; Tefera, K.; Rishovd, H.; Ugarelli, R. Protection of Water Distribution Networks against Cyber and Physical Threats: The STOP-IT Approach Demonstrated in a Case Study. Water 2022, 14, 3895.
13. Taormina, R.; Galelli, S. Deep-Learning Approach to the Detection and Localization of Cyber-Physical Attacks on Water Distribution Systems. J. Water Resour. Plan. Manag. 2018, 144, 4018065.
14. Housh, M.; Ohar, Z. Model-Based Approach for Cyber-Physical Attack Detection in Water Distribution Systems. Water Res. 2018, 139, 132–143.
15. Abokifa, A.A.; Haddad, K.; Lo, C.; Biswas, P. Real-Time Identification of Cyber-Physical Attacks on Water Distribution Systems via Machine Learning–Based Anomaly Detection Techniques. J. Water Resour. Plan. Manag. 2019, 145, 4018089.
16. Joaquim, S.; João, M.; Alfeu, S.M.; Ricardo, G. Optimal Management of Water Distribution Networks with Simulated Annealing: The C-Town Problem. J. Water Resour. Plan. Manag. 2016, 142, C4015010.
17. Rodriguez, L.; Fernandez, C.; Pantano, N.; Scaglia, G.; Keesman, K.J. Optimizing Sensor Placement for Enhanced Observability in Water Distribution Networks. J. Hydroinform. 2025, 27, 946–959.
18. Mahdi, N.M.; Jassim, A.H.; Abulqasim, S.A.; Basem, A.; Ogaili, A.A.F.; Al-Haddad, L.A. Leak Detection and Localization in Water Distribution Systems Using Advanced Feature Analysis and an Artificial Neural Network. Desalin. Water Treat. 2024, 320, 100685.
19. Geelen, C.V.C.; Yntema, D.R.; Molenaar, J.; Keesman, K.J. Optimal Sensor Placement in Hydraulic Conduit Networks: A State-Space Approach. Water 2021, 13, 3105.
20. Watson, J.-P.; Greenberg, H.J.; Hart, W.E. A Multiple-Objective Analysis of Sensor Placement Optimization in Water Networks. In Critical Transitions in Water and Environmental Resources Management, Proceedings of the World Water and Environmental Resources Congress 2004, Salt Lake City, UT, USA, 27 June 2004; ASCE Press: Reston, VA, USA, 2012; pp. 1–10.
21. Guan, Y.; Lv, M.; Li, S.; Su, Y.; Dong, S. Optimized Sensor Placement of Water Supply Network Based on Multi-Objective White Whale Optimization Algorithm. Water 2023, 15, 2677.
22. Agathokleous, A.; Xanthos, S.; Christodoulou, S.E. Real-Time Monitoring of Water Distribution Networks. Water Util. J. 2015, 10, 15–24.
23. Christodoulou, S.E.; Gagatsis, A.; Xanthos, S.; Kranioti, S.; Agathokleous, A.; Fragiadakis, M. Entropy-Based Sensor Placement Optimization for Waterloss Detection in Water Distribution Networks. Water Resour. Manag. 2013, 27, 4443–4468.
24. Casillas, M.V.; Puig, V.; Garza-Castañón, L.E.; Rosich, A. Optimal Sensor Placement for Leak Location in Water Distribution Networks Using Genetic Algorithms. Sensors 2013, 13, 14984–15005.
25. Santos-Ruiz, I.; López-Estrada, F.R.; Puig, V.; Valencia-Palomo, G.; Hernández, H.R. Pressure Sensor Placement for Leak Localization in Water Distribution Networks Using Information Theory. Sensors 2022, 22, 443.
26. Sarrate, R.; Blesa, J.; Nejjari, F.; Quevedo, J. Sensor Placement for Leak Detection and Location in Water Distribution Networks. Water Supply 2014, 14, 795–803.
27. Jović, A.; Brkić, K.; Bogunović, N. A Review of Feature Selection Methods with Applications. In Proceedings of the 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2015—Proceedings, Opatija, Croatia, 25–29 May 2015; pp. 1200–1205.
28. Cai, J.; Luo, J.; Wang, S.; Yang, S. Feature Selection in Machine Learning: A New Perspective. Neurocomputing 2018, 300, 70–79.
29. Pascoal, C.; De Oliveira, M.R.; Valadas, R.; Filzmoser, P.; Salvador, P.; Pacheco, A. Robust Feature Selection and Robust PCA for Internet Traffic Anomaly Detection. In Proceedings of the 2012 Proceedings IEEE INFOCOM, Orlando, FL, USA, 25–30 March 2012; pp. 1755–1763.
30. Fisher, W.D.; Camp, T.K.; Krzhizhanovskaya, V.V. Anomaly Detection in Earth Dam and Levee Passive Seismic Data Using Support Vector Machines and Automatic Feature Selection. J. Comput. Sci. 2017, 20, 143–153.
31. Ustebay, S.; Turgut, Z.; Aydin, M.A. Intrusion Detection System with Recursive Feature Elimination by Using Random Forest and Deep Learning Classifier. In Proceedings of the International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism, IBIGDELFT 2018—Proceedings, Ankara, Turkey, 3–4 December 2019; pp. 71–76.
32. Soldevila, A.; Blesa, J.; Tornil-Sin, S.; Fernandez-Canti, R.M.; Puig, V. Sensor Placement for Classifier-Based Leak Localization in Water Distribution Networks Using Hybrid Feature Selection. Comput. Chem. Eng. 2018, 108, 152–162.
33. Wang, S.; Ding, Z.; Fu, Y. Feature Selection Guided Auto-Encoder. Proc. AAAI Conf. Artif. Intell. 2017, 31, 2725–2731.
34. Li, L.; Yan, J.; Wang, H.; Jin, Y. Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-Encoder. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 1177–1191.
35. Kunang, Y.N.; Nurmaini, S.; Stiawan, D.; Zarkasi, A.; Jasmir, F. Automatic Features Extraction Using Autoencoder in Intrusion Detection System. In Proceedings of 2018 International Conference on Electrical Engineering and Computer Science, ICECOS 2018, Pangkal, Indonesia, 2–4 October 2019; pp. 219–224.
36. Ditthapron, A.; Banluesombatkul, N.; Ketrat, S.; Chuangsuwanich, E.; Wilaiprasitporn, T. Universal Joint Feature Extraction for P300 EEG Classification Using Multi-Task Autoencoder. IEEE Access 2019, 7, 68415–68428.
37. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
38. Chen, Q.; Meng, Z.; Liu, X.; Jin, Q.; Su, R. Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE. Genes 2018, 9, 301.
39. Rogers, J.; Gunn, S. Identifying Feature Relevance Using a Random Forest. In Subspace, Latent Structure and Feature Selection, Proceedings of the Statistical and Optimization Perspectives Workshop, SLSFS 2005 Bohinj, Slovenia, 23–25 February 2005, Revised Selected Papers; Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2006; Volume 3940, pp. 173–184.
40. Fan, X.; Zhang, X.; Yu, X. Machine Learning Model and Strategy for Fast and Accurate Detection of Leaks in Water Supply Network. J. Infrastruct. Preserv. Resil. 2021, 2, 10.
41. Sokolova, M.; Lapalme, G. A Systematic Analysis of Performance Measures for Classification Tasks. Inf. Process. Manag. 2009, 45, 427–437.
42. Joseph, K.; Shetty, J.; Patnaik, R.; Matthew, N.S.; Van Staden, R.; Liyanage, W.P.; Powell, G.; Bennett, N.; Sharma, A.K. Early Leak and Burst Detection in Water Pipeline Networks Using Machine Learning Approaches. Water 2025, 17, 2164.
43. Jun, S.; Jung, D. Exploration of Deep Learning Leak Detection Model across Multiple Smart Water Distribution Systems: Detectable Leak Sizes with AMI Meters. Water Res. X 2025, 29, 100332.
44. Ghimire, A.B.; Magar, B.A.; Parajuli, U.; Shin, S. Impacts of Missing Data Imputation on Resilience Evaluation for Water Distribution System. Urban Sci. 2024, 8, 177.
45. Noy, K.; Alex, F.; Mashor, H. Detecting Cyber-Physical Attacks in Water Distribution Systems: One-Class Classifier Approach. J. Water Resour. Plan. Manag. 2020, 146, 04020060.
46. Taormina, R.; Galelli, S.; Douglas, H.C.; Tippenhauer, N.O.; Salomons, E.; Ostfeld, A. A Toolbox for Assessing the Impacts of Cyber-Physical Attacks on Water Distribution Systems. Environ. Model. Softw. 2019, 112, 46–51.
47. Klise, K.A.; Hart, D.; Moriarty, D.M.; Bynum, M.L.; Murray, R.; Burkhardt, J.; Haxton, T. Water Network Tool for Resilience (WNTR) User Manual; Sandia National Lab.: Albuquerque, NM, USA, 2017.
48. Muranho, J.; Ferreira, A.; Sousa, J.; Gomes, A.; Marques, A.S. Pressure-Driven Simulation of Water Distribution Networks: Searching for Numerical Stability. Environ. Sci. Proc. 2020, 2, 48.
49. Crowl, D.A.; Louvar, J.F. Chemical Process Safety: Fundamentals with Applications; Pearson Education: London, UK, 2001.

Figure 1. Block diagram for the integrated sensor selection and failure identification approach in a WDN.

Figure 2. Flow chart of recursive feature elimination with cross validation.

Figure 3. The AE–RF model that includes an encoder layer of the AE and RF model for multiclass classification.

Figure 4. The C-town water distribution network [16].

Figure 5. Selected sensor locations and their importance for the identification of (a) leakage, (b) cyberattacks, (c) physical attacks, and (d) multiple failure events.

Table 1. Cyberattack scenario in simulated dataset.

Attack Summary	Compromised Components	Period (h)
Attack on Communication channel (Tank water level and PLC)	Tanks and PLCs	96, 108, 120, 132
Alteration of Control Logic in PLCs	PLCs and pumps	96, 108, 120, 132
Denial of Service (DOS) attacks in between PLCs	PLC to PLC	96, 108, 120, 132
Attack concealment (scenario 1)	Tanks, PLCs, and SCADA	96, 108, 120, 132
Attack concealment (scenario 2)	PLCs, Pumps and SCADA	96, 108, 120, 132
Attack concealment (scenario 3)	PLC to PLC and SCADA	96, 108, 120, 132

Table 2. Physical attack scenario in simulated dataset.

Attack Summary	Compromised Components	Period (h)
Pump turned on physically	Pump 1 to 11	96, 108, 120, 132
Pump turned off physically	Pump 1 to 11	96, 108, 120, 132

Table 3. Leakage scenario in simulated dataset.

Leakage Element	Diameter of Leak (m)	Period (h)
Leakage on every junction (one junction at each node)	0.05	96

Table 4. Identification of leakage/cyberattack/physical attack.

Sensor Selection Option	Failure State	Precision	Recall	F1-Score	Accuracy
Leakage-based sensor placement	Normal	0.98	0.99	0.98	0.99
Leakage-based sensor placement	Leakage	1.00	0.99	1.00	0.99
Cyberattack-based sensor placement	Normal	0.96	0.98	0.97	0.98
Cyberattack-based sensor placement	Cyberattack	0.99	0.98	0.98	0.98
Physical-attack-based sensor placement	Normal	0.92	0.97	0.94	0.95
Physical-attack-based sensor placement	Physical attack	0.97	0.93	0.95	0.95

Table 5. Identification of multiple failure event.

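Sensor Selection Option	Failure State	Precision	Recall	F1-Score	Accuracy
Multiple-failure event-based sensor placement	Normal	0.95	0.99	0.97	0.90
Multiple-failure event-based sensor placement	Cyberattack	0.75	0.81	0.78	0.90
Multiple-failure event-based sensor placement	Leakage	0.97	0.98	0.98	0.90
Multiple-failure event-based sensor placement	Physical attack	0.57	0.40	0.47	0.90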

Table 6. Performance evaluation (accuracy) of sensor placement strategy on multiple failure types.

Sensor Selection Option	Leakage Identification	Cyberattack Identification	Physical Attack Identification	Multiple Failure Event Identification
Leakage-based sensor placement	0.99	0.97	0.94	0.88
Cyberattack-based sensor placement	0.99	0.98	0.95	0.92
Physical attack-based sensor placement	0.99	0.98	0.95	0.92
Multiple-failure event-based sensor placement	0.99	0.98	0.94	0.90

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
