# A Machine Learning-based Algorithm for Water Network Contamination Source Localization

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Water Supply Network Benchmarks

#### 2.2. Algorithmic Framework

#### 2.3. ANN Classifier

#### 2.4. RF Regression

## 3. Results and Discussion

#### 3.1. Net3 Network Contamination Scenario

#### 3.2. Richmond Network Contamination Scenario

#### 3.3. Algorithm Parameters Investigation

## 4. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Abbreviations

Abbreviation | Meaning |
---|---|
BBN | Bayesian Belief Networks |
MC | Monte Carlo |
ANN | Artificial Neural Network |
LVQNN | Learning Vector Quantization Neural Network |
LS-SVM | Least Squares Support Vector Machines |
MCMC | Markov Chain Monte Carlo |
PSVM | Probabilistic Support Vector Machines |
PNN | Probabilistic Neural Networks |
RF | Random Forests |
HPC | High Performance Computing |
CWS | Center for Water Systems |
CPU | Central Processing Unit |
ML | Machine Learning |
SLURM | Simple Linux Utility for Resource Management |
RMSE | Root Mean Square Error |
MLP | Multi-layer Perceptron |

## References

1. Ostfeld, A.; Uber, J.G.; Salomons, E.; Berry, J.W.; Hart, W.E.; Phillips, C.A.; Watson, J.P.; Dorini, G.; Jonkergouw, P.; Kapelan, Z.; et al. The battle of the water sensor networks (BWSN): A design challenge for engineers and algorithms. J. Water Resour. Plan. Manag. **2008**, 134, 556–568.
2. Zhao, Y.; Schwartz, R.; Salomons, E.; Ostfeld, A.; Poor, H.V. New formulation and optimization methods for water sensor placement. Environ. Model. Softw. **2016**, 76, 128–136.
3. Ung, H.; Piller, O.; Gilbert, D.; Mortazavi, I. Accurate and Optimal Sensor Placement for Source Identification of Water Distribution Networks. J. Water Resour. Plan. Manag. **2017**, 143, 04017032.
4. Guidorzi, M.; Franchini, M.; Alvisi, S. A multi-objective approach for detecting and responding to accidental and intentional contamination events in water distribution systems. Urban Water J. **2009**, 6, 115–135.
5. Alfonso, L.; Jonoski, A.; Solomatine, D. Multiobjective optimization of operational responses for contaminant flushing in water distribution networks. J. Water Resour. Plan. Manag. **2010**, 136, 48–58.
6. Hu, C.; Yan, X.; Gong, W.; Liu, X.; Wang, L.; Gao, L. Multi-objective based scheduling algorithm for sudden drinking water contamination incident. Swarm Evol. Comput. **2020**, 55, 100674.
7. Preis, A.; Ostfeld, A. A contamination source identification model for water distribution system security. Eng. Optim. **2007**, 39, 941–947.
8. Zechman, E.M.; Ranjithan, S.R. Evolutionary computation-based methods for characterizing contaminant sources in a water distribution system. J. Water Resour. Plan. Manag. **2009**, 135, 334–343.
9. Kranjčević, L.; Čavrak, M.; Šestan, M. Contamination source detection in water distribution networks. Eng. Rev. **2010**, 30, 11–25.
10. Vankayala, P.; Sankarasubramanian, A.; Ranjithan, S.R.; Mahinthakumar, G. Contaminant source identification in water distribution networks under conditions of demand uncertainty. Environ. Forensics **2009**, 10, 253–263.
11. Xuesong, Y.; Jie, S.; Chengyu, H. Research on contaminant sources identification of uncertainty water demand using genetic algorithm. Cluster Comput. **2017**, 20, 1007–1016.
12. Yan, X.; Zhu, Z.; Li, T. Pollution source localization in an urban water supply network based on dynamic water demand. Environ. Sci. Pollut. Res. **2019**, 26, 17901–17910.
13. Adedoja, O.; Hamam, Y.; Khalaf, B.; Sadiku, R. Towards development of an optimization model to identify contamination source in a water distribution network. Water **2018**, 10, 579.
14. Dawsey, W.J.; Minsker, B.S.; VanBlaricum, V.L. Bayesian belief networks to integrate monitoring evidence of water distribution system contamination. J. Water Resour. Plan. Manag. **2006**, 132, 234–241.
15. De Sanctis, A.; Boccelli, D.; Shang, F.; Uber, J. Probabilistic approach to characterize contamination sources with imperfect sensors. In World Environmental and Water Resources Congress 2008: Ahupua’A; ASCE: Reston, VA, USA, 2008; pp. 1–10.
16. Perelman, L.; Ostfeld, A. Bayesian networks for source intrusion detection. J. Water Resour. Plan. Manag. **2013**, 139, 426–432.
17. Neupauer, R.M.; Ashwood, W.H. Backward probabilistic modeling to identify contaminant sources in a water distribution system. In World Environmental and Water Resources Congress 2008: Ahupua’A; ASCE: Reston, VA, USA, 2008; pp. 1–10.
18. Shen, H.; McBean, E. False negative/positive issues in contaminant source identification for water-distribution systems. J. Water Resour. Plan. Manag. **2011**, 138, 230–236.
19. Huang, J.J.; McBean, E.A. Data mining to identify contaminant event locations in water distribution systems. J. Water Resour. Plan. Manag. **2009**, 135, 466–474.
20. Eliades, D.; Lambrou, T.; Panayiotou, C.G.; Polycarpou, M.M. Contamination event detection in water distribution systems using a model-based approach. Procedia Eng. **2014**, 89, 1089–1096.
21. Liu, L.; Zechman, E.M.; Mahinthakumar, G.; Ranjithan, S.R. Coupling of logistic regression analysis and local search methods for characterization of water distribution system contaminant source. Eng. Appl. Artif. Intell. **2012**, 25, 309–316.
22. Kim, M.; Choi, C.Y.; Gerba, C.P. Source tracking of microbial intrusion in water systems using artificial neural networks. Water Res. **2008**, 42, 1308–1314.
23. Rutkowski, T.; Prokopiuk, F. Identification of the Contamination Source Location in the Drinking Water Distribution System Based on the Neural Network Classifier. IFAC-PapersOnLine **2018**, 51, 15–22.
24. Wang, K.; Wen, X.; Hou, D.; Tu, D.; Zhu, N.; Huang, P.; Zhang, G.; Zhang, H. Application of Least-Squares Support Vector Machines for Quantitative Evaluation of Known Contaminant in Water Distribution System Using Online Water Quality Parameters. Sensors **2018**, 18, 938.
25. Guo, S.; Yang, R.; Zhang, H.; Weng, W.; Fan, W. Source identification for unsteady atmospheric dispersion of hazardous materials using Markov Chain Monte Carlo method. Int. J. Heat Mass Transf. **2009**, 52, 3955–3962.
26. Wade, D.; Senocak, I. Stochastic reconstruction of multiple source atmospheric contaminant dispersion events. Atmos. Environ. **2013**, 74, 45–51.
27. Bashi-Azghadi, S.N.; Kerachian, R.; Bazargan-Lari, M.R.; Solouki, K. Characterizing an unknown pollution source in groundwater resources systems using PSVM and PNN. Expert Syst. Appl. **2010**, 37, 7154–7161.
28. Vesselinov, V.V.; Alexandrov, B.S.; O’Malley, D. Contaminant source identification using semi-supervised machine learning. J. Contam. Hydrol. **2018**, 212, 134–142.
29. Rossman, L.A. EPANET 2: Users Manual; U.S. Environmental Protection Agency: Washington, DC, USA, 2000.
30. Van Zyl, J.E. A Methodology for Improved Operational Optimization of Water Distribution Systems. Ph.D. Thesis, University of Exeter, Exeter, UK, 2001.
31. CWS; UoE. CWS Benchmarks. Available online: http://emps.exeter.ac.uk/engineering/research/cws/downloads/benchmarks/ (accessed on 6 November 2019).
32. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. **2011**, 12, 2825–2830.
33. Breiman, L. Random forests. Mach. Learn. **2001**, 45, 5–32.

**Figure 1.** Net3 water supply network layout with sensor positioning by Preis and Ostfeld [7].

**Figure 2.** Richmond water supply network layout with sensor positioning by Preis and Ostfeld [7].

**Figure 8.** Number of times out of 30 runs for which a node (marked blue) was ranked as first for the Net3 contamination event.

**Figure 10.** Richmond network true contamination event (black) and predicted contamination events (grey).

Successful Runs | $S_m$ | $S_m$ RMSE | $E_m$ | $E_m$ RMSE | $C_m$ | $C_m$ RMSE |
---|---|---|---|---|---|---|
30 | 14:20 h | 48 min | 20:20 h | 4.38 min | 813.7 mg/L | 18.06 mg/L |

min $S_m$ | max $S_m$ | min $E_m$ | max $E_m$ | min $C_m$ | max $C_m$ |
---|---|---|---|---|---|
−3.8 h | +0.2 h | 0.0 h | +0.2 h | −0.31 mg/L | −40.11 mg/L |

**Table 3.** Net3 network contamination event results comparison between true, best and worst of the total 30 runs.

Run | $S_m$ | $E_m$ | $C_m$ |
---|---|---|---|
True | 14:20 h | 20:20 h | 813.7 mg/L |
Best | 14:20 h | 20:20 h | 813.4 mg/L |
Worst | 14:10 h | 20:20 h | 773.6 mg/L |
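
The RMSE columns in these tables summarize how far the predictions of repeated runs fall from the true event parameters. A minimal sketch of that calculation follows, assuming the RMSE is taken over per-run predictions against a single true value; the exact definition used by the authors is not shown in this section, and the helper name `rmse` is hypothetical. Only the best and worst $C_m$ values from Table 3 are used as inputs, so the result is not the 30-run RMSE reported for Net3.

```python
import math


def rmse(predictions, true_value):
    """Root mean square error of repeated predictions against one true value."""
    squared_errors = [(p - true_value) ** 2 for p in predictions]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Values taken from Table 3 (Net3): true concentration and the best and
# worst predicted concentrations. The other 28 runs are not listed in the
# table, so this two-run sample only illustrates the formula.
true_concentration = 813.7  # mg/L
sample_predictions = [813.4, 773.6]  # mg/L

print(f"{rmse(sample_predictions, true_concentration):.2f} mg/L")
```

The same helper applies to the start and end times once they are converted to a common unit such as minutes.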

Successful Runs | $S_m$ | $S_m$ RMSE | $E_m$ | $E_m$ RMSE | $C_m$ | $C_m$ RMSE |
---|---|---|---|---|---|---|
29 | 06:50 h | 6.06 min | 07:40 h | 12.36 min | 837 mg/L | 299.84 mg/L |

**Table 5.** Richmond network minimum and maximum errors ($S_m$, $E_m$ and $C_m$) for 30 runs.

min $S_m$ | max $S_m$ | min $E_m$ | max $E_m$ | min $C_m$ | max $C_m$ |
---|---|---|---|---|---|
0.0 h | +0.4 h | −0.4 h | +0.6 h | −122.81 mg/L | −446.81 mg/L |

**Table 6.** Richmond network contamination event results comparison between true, best and worst of the total 30 runs.

Run | $S_m$ | $E_m$ | $C_m$ |
---|---|---|---|
True | 06:50 h | 07:40 h | 837 mg/L |
Best | 06:50 h | 07:40 h | 714.2 mg/L |
Worst | 06:50 h | 08:00 h | 390.2 mg/L |

m | Successful Runs | Best Rank | Worst Rank | Times Won | $S_m$ RMSE | $E_m$ RMSE | $C_m$ RMSE | Average Run Time |
---|---|---|---|---|---|---|---|---|
20 | 9 | 1 | 7 | 2 | 1.44 h | 1.21 h | 291.89 mg/L | 50 s |
40 | 9 | 3 | 8 | 0 | 1.76 h | 1.11 h | 371.93 mg/L | 70 s |
80 | 10 | 1 | 6 | 1 | 0.75 h | 0.36 h | 283.45 mg/L | 120 s |
100 | 10 | 1 | 6 | 4 | 0.18 h | 0.21 h | 187.39 mg/L | 160 s |
200 | 10 | 1 | 6 | 4 | 0.58 h | 0.18 h | 109.40 mg/L | 270 s |
400 | 10 | 1 | 6 | 5 | 0.83 h | 0.10 h | 39.23 mg/L | 420 s |
800 | 10 | 1 | 5 | 6 | 0.14 h | 0.11 h | 9.43 mg/L | 560 s |
1000 | 10 | 1 | 6 | 9 | 0.32 h | 0.03 h | 15.69 mg/L | 820 s |
1200 | 10 | 1 | 1 | 10 | 0.06 h | 0.00 h | 11.61 mg/L | 980 s |
2000 | 10 | 1 | 1 | 10 | 0.03 h | 0.00 h | 3.57 mg/L | 1400 s |
3000 | 10 | 1 | 1 | 10 | 0.03 h | 0.00 h | 4.46 mg/L | 2100 s |
4000 | 10 | 1 | 1 | 10 | 0.00 h | 0.00 h | 4.09 mg/L | 3200 s |
5000 | 10 | 1 | 1 | 10 | 0.03 h | 0.00 h | 1.41 mg/L | 4000 s |
6000 | 10 | 1 | 1 | 10 | 0.03 h | 0.00 h | 3.59 mg/L | 4800 s |
10,000 | 10 | 1 | 1 | 10 | 0.00 h | 0.00 h | 2.00 mg/L | 6300 s |

**Table 8.** Influence of the tournament group size k on the accuracy and efficiency of the algorithm.

k | Successful Runs | Best Rank | Worst Rank | Times Won | $S_m$ RMSE | $E_m$ RMSE | $C_m$ RMSE | CPUs Used |
---|---|---|---|---|---|---|---|---|
2 | 10 | 1 | 5 | 6 | 0.14 h | 0.11 h | 9.43 mg/L | 46 |
4 | 10 | 1 | 5 | 7 | 0.27 h | 0.08 h | 27.46 mg/L | 23 |
10 | 10 | 1 | 5 | 5 | 0.74 h | 0.17 h | 83.16 mg/L | 9 |
40 | 5 | 1 | 3 | 1 | 0.35 h | 2.19 h | 215.59 mg/L | 2 |
80 | 4 | 1 | 2 | 2 | 0.27 h | 0.53 h | 143.04 mg/L | 2 |

**Table 9.** Influence of the number of tournament loops l on the accuracy and efficiency of the algorithm.

l | Successful Runs | Best Rank | Worst Rank | Times Won | $S_m$ RMSE | $E_m$ RMSE | $C_m$ RMSE | m/l |
---|---|---|---|---|---|---|---|---|
1 | 10 | 1 | 5 | 6 | 0.14 h | 0.11 h | 9.43 mg/L | 800 |
2 | 10 | 1 | 6 | 8 | 0.18 h | 0.08 h | 23.20 mg/L | 400 |
3 | 10 | 1 | 4 | 8 | 0.11 h | 0.04 h | 15.42 mg/L | 267 |
4 | 10 | 1 | 3 | 9 | 0.08 h | 0.03 h | 14.71 mg/L | 200 |
5 | 10 | 1 | 3 | 7 | 0.07 h | 0.08 h | 14.57 mg/L | 160 |
6 | 10 | 1 | 2 | 8 | 0.07 h | 0.05 h | 23.29 mg/L | 134 |
7 | 10 | 1 | 1 | 10 | 0.03 h | 0.05 h | 14.77 mg/L | 115 |
8 | 9 | 1 | 3 | 7 | 0.07 h | 0.07 h | 25.49 mg/L | 100 |
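
The last column of Table 9 is consistent with a fixed budget of m = 800 simulations split evenly over l tournament loops and rounded up to a whole number. The sketch below reproduces that column under this assumption; the rounding rule is inferred from the tabulated values rather than stated in this section, and the function name `simulations_per_loop` is hypothetical.

```python
def simulations_per_loop(m: int, l: int) -> int:
    """Split a budget of m simulations over l loops, rounding up (assumed rule)."""
    if l <= 0:
        raise ValueError("the number of loops must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (m + l - 1) // l


# m = 800 matches the l = 1 row of Table 9.
total_simulations = 800
for loops in range(1, 9):
    print(loops, simulations_per_loop(total_simulations, loops))
```

Under this reading, adding loops keeps the total simulation count roughly constant while shrinking the number of simulations available to each loop.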

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Grbčić, L.; Lučin, I.; Kranjčević, L.; Družeta, S.
A Machine Learning-based Algorithm for Water Network Contamination Source Localization. *Sensors* **2020**, *20*, 2613.
https://doi.org/10.3390/s20092613
