Relevance of Machine Learning Techniques in Water Infrastructure Integrity and Quality: A Review Powered by Natural Language Processing
- The article presents a bibliographic analysis of 1087 English-language articles published from 2015 onward to explore the applications of machine learning (ML) techniques in water infrastructure integrity and quality.
- The study takes a semi-automatic approach to the bibliographic analysis, using BERTopic to improve contextual understanding in topic modeling.
- The article also emphasizes the potential of combining ML techniques with cutting-edge monitoring systems in multiple aspects of water infrastructure and quality.
- The insights drawn from the analysis highlight the instrumental role of ML in enhancing water infrastructure’s integrity and quality, suggesting promising future research directions.
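
As a rough illustration of the topic-description step behind BERTopic, the sketch below computes class-based TF-IDF weights, W(t, c) = tf(t, c) · log(1 + A / tf(t)), as described in the BERTopic paper cited in the reference list. It is not the article's pipeline: it assumes the documents have already been grouped into topics (BERTopic itself does this with embeddings, dimensionality reduction, and clustering), it tokenizes on whitespace only, and the toy abstracts, topic names, and the function names `c_tf_idf` and `top_terms` are hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_topic):
    """Return {topic: {term: weight}} with weight = tf(t, c) * log(1 + A / tf(t))."""
    # tf(t, c): term counts per topic, treating all documents of a topic as one text.
    tf_per_topic = {
        topic: Counter(word for doc in docs for word in doc.lower().split())
        for topic, docs in docs_by_topic.items()
    }
    # tf(t): term counts across all topics.
    tf_total = Counter()
    for counts in tf_per_topic.values():
        tf_total.update(counts)
    # A: average number of words per topic.
    avg_words = sum(tf_total.values()) / len(tf_per_topic)
    return {
        topic: {
            term: tf * math.log(1 + avg_words / tf_total[term])
            for term, tf in counts.items()
        }
        for topic, counts in tf_per_topic.items()
    }


def top_terms(weights, k=3):
    """Return the k highest-weighted terms, breaking ties alphabetically."""
    return sorted(weights, key=lambda term: (-weights[term], term))[:k]


# Hypothetical toy abstracts, grouped into topics by hand for illustration.
toy = {
    "leak_detection": ["leak detection in water pipes", "pipe leak detection with sensors"],
    "water_level": ["forecasting lake water level", "groundwater level forecasting"],
}

scores = c_tf_idf(toy)
for topic, weights in scores.items():
    print(topic, top_terms(weights))
```

Terms that occur in several topics, such as "water" here, receive a lower weight than terms concentrated in a single topic, which is why the top-ranked terms tend to describe what distinguishes each topic.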
2.1. Topic Analysis
2.2. Bigram Analysis
3. BERT Topics and General Bibliometrics
4. Bigram and Traditional Results
4.1. Advancements in Machine Learning for Water Contaminants and Soil Erosion
4.1.1. Bigram Document Analysis
4.1.2. Traditional Analysis
4.2. Assessing Water Quality and Potability
4.2.1. Bigram Document Analysis
4.2.2. Traditional Analysis
4.3. Forecasting Water Levels
4.3.1. Bigram Document Analysis
4.3.2. Traditional Analysis
4.4. Advanced Leak Detection in Water Networks
4.4.1. Bigram Document Analysis
4.4.2. Traditional Analysis
5. Discussion and Future Research Directions
5.1. Advancements in Machine Learning for Water Contaminants and Soil Erosion
5.2. Forecasting Water Levels
5.3. Advanced Leak Detection in Water Networks
5.4. Assessing Water Quality and Potability
Data Availability Statement
Conflicts of Interest
- Hanjra, M.A.; Blackwell, J.; Carr, G.; Zhang, F.; Jackson, T.M. Wastewater irrigation and environmental health: Implications for water governance and public policy. Int. J. Hyg. Environ. Health 2012, 215, 255–269.
- Green, T.R.; Taniguchi, M.; Kooi, H.; Gurdak, J.J.; Allen, D.M.; Hiscock, K.M.; Treidel, H.; Aureli, A. Beneath the surface of global change: Impacts of climate change on groundwater. J. Hydrol. 2011, 405, 532–560.
- Koop, S.H.; van Leeuwen, C.J. Assessment of the sustainability of water resources management: A critical review of the city blueprint approach. Water Resour. Manag. 2015, 29, 5649–5670.
- Marques, A.C.; Veras, C.E.; Rodriguez, D.A. Assessment of water policies contributions for sustainable water resources management under climate change scenarios. J. Hydrol. 2022, 608, 127690.
- Ferreira, D.C.; Graziele, I.; Marques, R.C.; Gonçalves, J. Investment in drinking water and sanitation infrastructure and its impact on waterborne diseases dissemination: The Brazilian case. Sci. Total Environ. 2021, 779, 146279.
- Hussain, M.I.; Muscolo, A.; Farooq, M.; Ahmad, W. Sustainable use and management of non-conventional water resources for rehabilitation of marginal lands in arid and semiarid environments. Agric. Water Manag. 2019, 221, 462–476.
- Wu, B.; Tian, F.; Zhang, M.; Piao, S.; Zeng, H.; Zhu, W.; Liu, J.; Elnashar, A.; Lu, Y. Quantifying global agricultural water appropriation with data derived from earth observations. J. Clean. Prod. 2022, 358, 131891.
- Gao, M.; Zhu, L.; Peh, C.K.; Ho, G.W. Solar absorber material and system designs for photothermal water vaporization towards clean water and energy production. Energy Environ. Sci. 2019, 12, 841–864.
- Mishra, R.; Dubey, S. Fresh water availability and it’s global challenge. Br. J. Multidiscip. Adv. Stud. 2023, 4, 1–78.
- Sohail, M.; Mustafa, S.; Ali, M.; Riaz, S. Agricultural communities’ risk assessment and the effects of climate change: A pathway toward green productivity and sustainable development. Front. Environ. Sci. 2023, 10, 948016.
- Khan, H.F.; Arshad, S.A. Beyond water scarcity: Water (in) security and social justice in Karachi. J. Hydrol. Reg. Stud. 2022, 42, 101140.
- Ajith, J.B.; Manimegalai, R.; Ilayaraja, V. An IoT based smart water quality monitoring system using cloud. In Proceedings of the 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), Vellore, India, 24–25 February 2020; pp. 1–7.
- Panigrahi, N.; Patro, S.; Kumar, R.; Omar, M.; Ngan, T.T.; Giang, N.L.; Thu, B.T.; Thang, N.T. Groundwater Quality Analysis and Drinkability Prediction using Artificial Intelligence. Earth Sci. Inform. 2023, 16, 1701–1725.
- Xu, Z.; Lv, Z.; Li, J.; Shi, A. A novel approach for predicting water demand with complex patterns based on ensemble learning. Water Resour. Manag. 2022, 36, 4293–4312.
- Ayati, A.H.; Haghighi, A.; Ghafouri, H.R. Machine Learning–Assisted Model for Leak Detection in Water Distribution Networks Using Hydraulic Transient Flows. J. Water Resour. Plan. Manag. 2022, 148, 04021104.
- Hofmann, T. Unsupervised learning by probabilistic latent semantic analysis. Mach. Learn. 2001, 42, 177.
- Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
- Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788–791.
- Arora, S.; Ge, R.; Moitra, A. Learning topic models–going beyond SVD. In Proceedings of the 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, New Brunswick, NJ, USA, 20–23 October 2012; pp. 1–10.
- Grootendorst, M. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv 2022, arXiv:2203.05794.
- Garcia, J.; Villavicencio, G.; Altimiras, F.; Crawford, B.; Soto, R.; Minatogawa, V.; Franco, M.; Martínez-Muñoz, D.; Yepes, V. Machine learning techniques applied to construction: A hybrid bibliometric analysis of advances and future directions. Autom. Constr. 2022, 142, 104532.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- McInnes, L.; Healy, J.; Melville, J. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv 2018, arXiv:1802.03426.
- Campello, R.J.; Moulavi, D.; Sander, J. Density-based clustering based on hierarchical density estimates. In Proceedings of the Advances in Knowledge Discovery and Data Mining: 17th Pacific-Asia Conference, PAKDD 2013, Gold Coast, Australia, 14–17 April 2013; Proceedings, Part II 17. Springer: Berlin/Heidelberg, Germany, 2013; pp. 160–172.
- Aria, M.; Cuccurullo, C. bibliometrix: An R-tool for comprehensive science mapping analysis. J. Inf. 2017, 11, 959–975.
- Grivel, L.; Mutschke, P.; Polanco, X. Thematic mapping on bibliographic databases by cluster analysis: A description of the sdoc environment with solis. Knowl. Organ. 1995, 22, 70–77.
- López-Fernández, M.C.; Serrano-Bedia, A.M.; Pérez-Pérez, M. Entrepreneurship and family firm research: A bibliometric analysis of an emerging field. J. Small Bus. Manag. 2016, 54, 622–639.
- Bradford, S.C. Sources of information on specific subjects. Engineering 1934, 137, 85–86.
- Rao, P.; Wang, Y.; Liu, Y.; Wang, X.; Hou, Y.; Pan, S.; Wang, F.; Zhu, D. A comparison of multiple methods for mapping groundwater levels in the Mu Us Sandy Land, China. J. Hydrol. Reg. Stud. 2022, 43, 101189.
- Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36.
- Li, X.; Sha, J.; Wang, Z.L. Comparison of daily streamflow forecasts using extreme learning machines and the Random Forest method. Hydrol. Sci. J. 2019, 64, 1857–1866.
- Schütze, H.; Manning, C.D.; Raghavan, P. Introduction to Information Retrieval; Cambridge University Press: Cambridge, UK, 2008.
- Jurafsky, D.; Martin, J.H. Speech and Language Processing, 3rd ed.; Stanford University: Stanford, CA, USA, 2019.
- Knoll, L.; Breuer, L.; Bach, M. Large scale prediction of groundwater nitrate concentrations from spatial data using machine learning. Sci. Total Environ. 2019, 668, 1317–1327.
- Garosi, Y.; Sheklabadi, M.; Conoscenti, C.; Pourghasemi, H.R.; Van Oost, K. Assessing the performance of GIS-based machine learning models with different accuracy measures for determining susceptibility to gully erosion. Sci. Total Environ. 2019, 664, 1117–1132.
- Tan, K.; Ma, W.; Chen, L.; Wang, H.; Du, Q.; Du, P.; Yan, B.; Liu, R.; Li, H. Estimating the distribution trend of soil heavy metals in mining area from HyMap airborne hyperspectral imagery based on ensemble learning. J. Hazard. Mater. 2021, 401, 123288.
- Mosavi, A.; Sajedi-Hosseini, F.; Choubin, B.; Taromideh, F.; Rahi, G.; Dineva, A.A. Susceptibility mapping of soil water erosion using machine learning models. Water 2020, 12, 1995.
- Mukherjee, A.; Sarkar, S.; Chakraborty, M.; Duttagupta, S.; Bhattacharya, A.; Saha, D.; Bhattacharya, P.; Mitra, A.; Gupta, S. Occurrence, predictors and hazards of elevated groundwater arsenic across India through field observations and regional-scale AI-based modeling. Sci. Total Environ. 2021, 759, 143511.
- Chakraborty, M.; Sarkar, S.; Mukherjee, A.; Shamsudduha, M.; Ahmed, K.M.; Bhattacharya, A.; Mitra, A. Modeling regional-scale groundwater arsenic hazard in the transboundary Ganges River Delta, India and Bangladesh: Infusing physically-based model with machine learning. Sci. Total Environ. 2020, 748, 141107.
- Knoll, L.; Breuer, L.; Bach, M. Nation-wide estimation of groundwater redox conditions and nitrate concentrations through machine learning. Environ. Res. Lett. 2020, 15, 064004.
- Harrison, J.W.; Lucius, M.A.; Farrell, J.L.; Eichler, L.W.; Relyea, R.A. Prediction of stream nitrogen and phosphorus concentrations from high-frequency sensors using Random Forests Regression. Sci. Total Environ. 2021, 763, 143005.
- Messier, K.P.; Kane, E.; Bolich, R.; Serre, M.L. Nitrate variability in groundwater of North Carolina using monitoring and private well data models. Environ. Sci. Technol. 2014, 48, 10804–10812.
- Messier, K.P.; Wheeler, D.C.; Flory, A.R.; Jones, R.R.; Patel, D.; Nolan, B.T.; Ward, M.H. Modeling groundwater nitrate exposure in private wells of North Carolina for the Agricultural Health Study. Sci. Total Environ. 2019, 655, 512–519.
- Ransom, K.M.; Nolan, B.T.; Stackelberg, P.; Belitz, K.; Fram, M.S. Machine learning predictions of nitrate in groundwater used for drinking supply in the conterminous United States. Sci. Total Environ. 2022, 807, 151065.
- Podgorski, J.; Araya, D.; Berg, M. Geogenic manganese and iron in groundwater of Southeast Asia and Bangladesh–Machine learning spatial prediction modeling and comparison with arsenic. Sci. Total Environ. 2022, 833, 155131.
- Kwon, S.; Seo, I.W.; Noh, H.; Kim, B. Hyperspectral retrievals of suspended sediment using cluster-based machine learning regression in shallow waters. Sci. Total Environ. 2022, 833, 155168.
- Giri, S.; Kang, Y.; MacDonald, K.; Tippett, M.; Qiu, Z.; Lathrop, R.G.; Obropta, C.C. Revealing the sources of arsenic in private well water using Random Forest Classification and Regression. Sci. Total Environ. 2023, 857, 159360.
- Alygizakis, N.; Giannakopoulos, T.; Thomaidis, N.S.; Slobodnik, J. Detecting the sources of chemicals in the Black Sea using non-target screening and deep learning convolutional neural networks. Sci. Total Environ. 2022, 847, 157554.
- Raheja, H.; Goel, A.; Pal, M. Prediction of groundwater quality indices using machine learning algorithms. Water Pract. Technol. 2022, 17, 336–351.
- Alipio, M.I. Data-driven IoT-based water quality monitoring and potability classification system in rural areas. In Proceedings of the 2020 International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Republic of Korea, 21–23 October 2020; pp. 634–639.
- Riyantoko, P.A.; Sugiarto; Diyasa, I.G.S.M.; Kraugusteeliana. “FQAM” Feyn-QLattice Automation Modelling: Python Module of Machine Learning for Data Classification in Water Potability. In Proceedings of the 2021 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS), Jakarta, Indonesia, 28–29 October 2021; pp. 135–141.
- Yusuf, H.; Alhaddad, S.; Yusuf, S.; Hewahi, N. Classification of Water Potability Using Machine Learning Algorithms. In Proceedings of the 2022 International Conference on Data Analytics for Business and Industry (ICDABI), Sakhir, Bahrain, 25–26 October 2022; pp. 454–458.
- Priyadarshini, I.; Alkhayyat, A.; Obaid, A.J.; Sharma, R. Water pollution reduction for sustainable urban development using machine learning techniques. Cities 2022, 130, 103970.
- Rivas-Villar, D.; Rouco, J.; Carballeira, R.; Penedo, M.G.; Novo, J. Fully automatic detection and classification of phytoplankton specimens in digital microscopy images. Comput. Methods Programs Biomed. 2021, 200, 105923.
- Alipio, M.I. Towards developing a classification model for water potability in Philippine rural areas. ASEAN Eng. J. 2020, 10, 24–34.
- Dalal, S.; Onyema, E.M.; Romero, C.A.T.; Ndufeiya-Kumasi, L.C.; Maryann, D.C.; Nnedimkpa, A.J.; Bhatia, T.K. Machine learning-based forecasting of potability of drinking water through adaptive boosting model. Open Chem. 2022, 20, 816–828.
- Alomani, S.M.; Alhawiti, N.I.; Alhakamy, A. Prediction of Quality of Water According to a Random Forest Classifier. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 892–899.
- Haq, M.I.K.; Ramadhan, F.D.; Az-Zahra, F.; Kurniawati, L.; Helen, A. Classification of water potability using machine learning algorithms. In Proceedings of the 2021 International Conference on Artificial Intelligence and Big Data Analytics, Bandung, Indonesia, 27–29 October 2021; pp. 1–5.
- He, S.; Wu, J.; Wang, D.; He, X. Predictive modeling of groundwater nitrate pollution and evaluating its main impact factors using Random Forest. Chemosphere 2022, 290, 133388.
- Kouadri, S.; Pande, C.B.; Panneerselvam, B.; Moharir, K.N.; Elbeltagi, A. Prediction of irrigation groundwater quality parameters using ANN, LSTM, and MLR models. Environ. Sci. Pollut. Res. 2021, 29, 21067–21091.
- Sun, X.; Zhang, Y.; Shi, K.; Zhang, Y.; Li, N.; Wang, W.; Huang, X.; Qin, B. Monitoring water quality using proximal remote sensing technology. Sci. Total Environ. 2022, 803, 149805.
- Xu, X.; Liu, Y.; Liu, S.; Li, J.; Guo, G.; Smith, K. Real-time detection of potable-reclaimed water pipe cross-connection events by conventional water quality sensors using machine learning methods. J. Environ. Manag. 2019, 238, 201–209.
- Cao, Q.; Yu, G.; Qiao, Z. Application and recent progress of inland water monitoring using remote sensing techniques. Environ. Monit. Assess. 2023, 195, 1–16.
- Ahmed, S.; Mahzabin, M.; Shahpar, S.; Tonni, S.I.; Rahman, M.S. Assessment of Water Quality in Smart City Environment Leveraging ML-IoT. In Proceedings of the International Conference on Fourth Industrial Revolution and Beyond 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 215–227.
- Zai, C.; El Mechal, C.; El Amrani El Idrissi, N.; Ghennioui, H. Prediction of Water Quality Using Artificial Intelligence (AI) and Statistical Approach. In Proceedings of the Digital Technologies and Applications: Proceedings of ICDTA’22, Fez, Morocco, 28–30 January 2022; Volume 1, pp. 34–42.
- Bajpai, A.; Chaubey, S.; Patro, B.; Verma, A. A Real-Time Approach to Classify the Water Quality of the River Ganga at Mehandi Ghat, Kannuaj. In Proceedings of the 2022 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), Kota Kinabalu, Malaysia, 13–15 September 2022; pp. 1–6.
- Chafloque, R.; Rodriguez, C.; Pomachagua, Y.; Hilario, M. Predictive Neural Networks Model for Detection of Water Quality for Human Consumption. In Proceedings of the 2021 13th International Conference on Computational Intelligence and Communication Networks (CICN), Lima, Peru, 22–23 September 2021; pp. 172–176.
- El-Attar, N.E.; Lotfy, H.R.; Awad, W.A. Performance of Artificial Intelligence Models in Analysis and Prediction of Water Potability. In Proceedings of the 2022 International Telecommunications Conference (ITC-Egypt), Alexandria, Egypt, 26–28 July 2022; pp. 1–6.
- Panahi, M.; Sadhasivam, N.; Pourghasemi, H.R.; Rezaie, F.; Lee, S. Spatial prediction of groundwater potential mapping based on convolutional neural network (CNN) and support vector regression (SVR). J. Hydrol. 2020, 588, 125033.
- Bonakdari, H.; Ebtehaj, I.; Samui, P.; Gharabaghi, B. Lake water-level fluctuations forecasting using minimax probability machine regression, relevance vector machine, Gaussian process regression, and extreme learning machine. Water Resour. Manag. 2019, 33, 3965–3984.
- Páliz Larrea, P.; Zapata-Ríos, X.; Campozano Parra, L. Application of neural network models and ANFIS for water level forecasting of the Salve Faccha Dam in the Andean Zone in Northern Ecuador. Water 2021, 13, 2011.
- Truong, V.H.; Ly, Q.V.; Le, V.C.; Vu, T.B.; Le, T.T.T.; Tran, T.T.; Goethals, P. Machine learning-based method for forecasting water levels in irrigation and drainage systems. Environ. Technol. Innov. 2021, 23, 101762.
- Hikouei, I.S.; Eshleman, K.N.; Saharjo, B.H.; Graham, L.L.; Applegate, G.; Cochrane, M.A. Using machine learning algorithms to predict groundwater levels in Indonesian tropical peatlands. Sci. Total Environ. 2023, 857, 159701.
- Emami, M.; Ahmadi, A.; Daccache, A.; Nazif, S.; Mousavi, S.F.; Karami, H. County-level irrigation water demand estimation using machine learning: Case study of California. Water 2022, 14, 1937.
- Oliveira, L.C.d.; Santos, C.A.G.; de Farias, C.A.S.; da Silva, R.M.; Singh, V.P. Predicting Groundwater Levels in Ogallala Aquifer Wells Using Hierarchical Cluster Analysis and artificial neural networks. J. Hydrol. Eng. 2023, 28, 04022042.
- Shang, Y.; Song, K.; Lai, F.; Lyu, L.; Liu, G.; Fang, C.; Hou, J.; Qiang, S.; Yu, X.; Wen, Z. Remote sensing of fluorescent humification levels and its potential environmental linkages in lakes across China. Water Res. 2023, 230, 119540.
- Demir, V.; Yaseen, Z.M. Neurocomputing intelligence models for lakes water level forecasting: A comprehensive review. Neural Comput. Appl. 2023, 35, 303–343.
- Hu, X.; Han, Y.; Yu, B.; Geng, Z.; Fan, J. Novel leakage detection and water loss management of urban water supply network using multiscale neural networks. J. Clean. Prod. 2021, 278, 123611.
- Bohorquez, J.; Alexander, B.; Simpson, A.R.; Lambert, M.F. Leak detection and topology identification in pipelines using fluid transients and artificial neural networks. J. Water Resour. Plan. Manag. 2020, 146, 04020040.
- Liu, Y.; Ma, X.; Li, Y.; Tie, Y.; Zhang, Y.; Gao, J. Water pipeline leakage detection based on machine learning and wireless sensor networks. Sensors 2019, 19, 5086.
- Sun, C.; Parellada, B.; Puig, V.; Cembrano, G. Leak localization in water distribution networks using pressure and data-driven classifier approach. Water 2019, 12, 54.
- Guo, G.; Yu, X.; Liu, S.; Ma, Z.; Wu, Y.; Xu, X.; Wang, X.; Smith, K.; Wu, X. Leakage detection in water distribution systems based on time–frequency convolutional neural network. J. Water Resour. Plan. Manag. 2021, 147, 04020101.
- Ravichandran, T.; Gavahi, K.; Ponnambalam, K.; Burtea, V.; Mousavi, S.J. Ensemble-based machine learning approach for improved leak detection in water mains. J. Hydroinform. 2021, 23, 307–323.
- Butterfield, J.D.; Meyers, G.; Meruane, V.; Collins, R.P.; Beck, S.B. Experimental investigation into techniques to predict leak shapes in water distribution systems using vibration measurements. J. Hydroinform. 2018, 20, 815–828.
- Fereidooni, Z.; Tahayori, H.; Bahadori-Jahromi, A. A hybrid model-based method for leak detection in large scale water distribution networks. J. Ambient Intell. Humaniz. Comput. 2021, 12, 1613–1629.
- Chen, J.; Feng, X.; Xiao, S. An iterative method for leakage zone identification in water distribution networks based on machine learning. Struct. Health Monit. 2021, 20, 1938–1956.
- Levinas, D.; Perelman, G.; Ostfeld, A. Water leak localization using high-resolution pressure sensors. Water 2021, 13, 591.
- Alves Coelho, J.; Glória, A.; Sebastião, P. Precise water leak detection using machine learning and real-time sensor data. IoT 2020, 1, 474–493.
- Tariq, S.; Bakhtawar, B.; Zayed, T. Data-driven application of MEMS-based accelerometers for leak detection in water distribution networks. Sci. Total Environ. 2022, 809, 151110.
- Cantos, W.P.; Juran, I.; Tinelli, S. Machine-learning–based risk assessment method for leak detection and geolocation in a water distribution system. J. Infrastruct. Syst. 2020, 26, 04019039.
- Mysorewala, M.F.; Cheded, L.; Ali, I.M. Leak detection using flow-induced vibrations in pressurized wall-mounted water pipelines. IEEE Access 2020, 8, 188673–188687.
- Mashhadi, N.; Shahrour, I.; Attoue, N.; El Khattabi, J.; Aljer, A. Use of machine learning for leak detection and localization in water distribution systems. Smart Cities 2021, 4, 1293–1315.
- Tijani, I.; Abdelmageed, S.; Fares, A.; Fan, K.; Hu, Z.; Zayed, T. Improving the leak detection efficiency in water distribution networks using noise loggers. Sci. Total Environ. 2022, 821, 153530.
- Chen, J.; Tang, P.; Rakstad, T.; Patrick, M.; Zhou, X. Augmenting a deep-learning algorithm with canal inspection knowledge for reliable water leak detection from multispectral satellite images. Adv. Eng. Inform. 2020, 46, 101161.
- Yu, T.; Chen, X.; Yan, W.; Xu, Z.; Ye, M. Leak detection in water distribution systems by classifying vibration signals. Mech. Syst. Signal Process. 2023, 185, 109810.
- Vanijjirattikhan, R.; Khomsay, S.; Kitbutrawat, N.; Khomsay, K.; Supakchukul, U.; Udomsuk, S.; Suwatthikul, J.; Oumtrakul, N.; Anusart, K. AI-based acoustic leak detection in water distribution systems. Results Eng. 2022, 15, 100557.
- Bykerk, L.; Valls Miro, J. Detection of Water Leaks in Suburban Distribution Mains with Lift and Shift Vibro-Acoustic Sensors. Vibration 2022, 5, 370–382.
- Gupta, A.; Kulat, K. A selective literature review on leak management techniques for water distribution system. Water Resour. Manag. 2018, 32, 3247–3269.
- Kammoun, M.; Kammoun, A.; Abid, M. LSTM-AE-WLDL: Unsupervised LSTM Auto-Encoders for Leak Detection and Location in Water Distribution Networks. Water Resour. Manag. 2023, 37, 731–746.
| MLR, CART, RF | MAE, RMSE, R². RF was the best, with R² = 0.54, MAE = 5.9, RMSE = 9.19 | Groundwater database GruWaH, which records groundwater quality throughout the state of Hesse | The spatial distribution of nitrate concentration in groundwater. |
| SVM, RF, GAM, NB | MAE, RMSE, AUC. The best performance was RF with AUC = 92.4, RMSE = 0.2874, MAE = 0.082 | Digital map of the spatial distribution of gullies. 130 locations of Ekbatan Dam Basin, Hamedan, western Iran | Create a gully erosion susceptibility map (GESM) in a part of Ekbatan Dam Basin, Hamedan, western Iran. |
| SVM, RF, XGBoost, | R², RMSE, MAE. Of the tested models, XGBoost achieved the best results for all the analyzed metals. | The collection of surface soil samples was essentially synchronized with the acquisition of airborne hyperspectral data from late April and early May of 2017. | The prediction of four heavy metals is studied: arsenic (As), chromium (Cr), lead (Pb), and zinc (Zn). |
| | Accuracy, Kappa, POD. The best model was RF with Accuracy = 91%, Kappa = 82%, POD = 94% | Data included 227 samples of erosion and non-erosion locations through field surveys | Machine learning models for mapping susceptibility to soil erosion by water. |
| RF | Accuracy 82–84%, AUC 88–89%. | The data were acquired from the National Rural Drinking Water Programme (NRDWP) and the Central Ground Water Board (CGWB), both under the Ministry of Jal Shakti of the Government of India. They include 2,611,365 drinking water wells (NRDWP, 2018) and 649 monitoring wells (CGWB, 2018). | Use field observations of arsenic (As) in groundwater with high spatial resolution, with the aim of delineating the regional-scale occurrence of elevated arsenic concentrations in groundwater. |
| RF, Boosted Regression Trees (BRT) and Logistic Regression (LR) | Accuracy, Sensitivity, Specificity. The best model was Random Forest with 82% in all indicators. | A total of 100,358 groundwater arsenic data points were compiled from various databases or published reports from India and Bangladesh: BGS/DPHE (2001), PHED (2006), and BWDB (2013). | Transboundary regional-scale models are used to calculate the probability of arsenic concentrations in groundwater. |
| RF, QRF | R²: 0.53 (RF O2), 0.24 (RF Fe), 0.51 () | The study uses data from the WFD groundwater monitoring network in Germany and selects monitoring sites based on criteria such as metadata, sampling depth, and observation period, excluding concentration outliers. | Applies machine learning techniques to estimate groundwater redox conditions and nitrate concentrations across Germany. |
| RF | MAE, RMSE. For , MAE = 0.075, RMSE = 0.12 | The dataset features nutrient parameters, method detection limits, sensor calibration, and collection intervals ranging from 1 to 15 min, with monthly quality checks for water quality monitoring. | To estimate stream nitrogen (N) and phosphorus (P) concentrations from sensor data in a forested, mountainous drainage area in upstate New York. |
| Kriging, SVM, GBM, | Accuracy, Kappa. The best result was obtained with up-sampling: Accuracy = 0.725, Kappa = 0.369 | Consists of 22,059 groundwater nitrate measurements from private wells in North Carolina, collected and maintained by the NC-DHHS between 1990 and 2011. The data were obtained by Messier et al. (2014) and used for modeling. | Estimate groundwater nitrate concentrations in private wells, aiming to improve exposure estimates for the AHS cohort. |
| XGB | 76 variables were retained in the final model, with an R² of 0.83 for the training data and 0.49 for the hold-out data. The RMSE values were 1.15 and 2.01 for the training and hold-out data, respectively. | The model uses data from 12,082 wells and various predictor variables, providing accurate estimates at both national and regional scales. | The objective of this study is to develop an extreme gradient boosting (XGB) machine learning model to predict the distribution of nitrate in groundwater across the conterminous United States (CONUS). |
| RF, Boosted models | The best-performing model was the Random Forest (RF) model, which achieved an AUC of 0.80 and a Cohen’s Kappa score of 0.43. | The data used in the study consist of over 6000 groundwater measurements of manganese (Mn) and iron (Fe) from Southeast Asia and Bangladesh. These measurements were statistically examined along with other physicochemical parameters. | To use machine learning methods, specifically random forest and generalized boosted regression modeling, to analyze over 6000 groundwater measurements of naturally occurring manganese (Mn) and iron (Fe) in Southeast Asia and Bangladesh. |
| Gaussian mixture model clustering technique, random forest regressor (RFR) | The clustered RFR model yielded improvements of 10.82% in R², 18.57% in RMSEP, 3.03% in MAPE, and 10.81 in TSE compared to the nonclustering case. | The study involves preprocessing hyperspectral images, converting digital numbers to normalized reflectance values, and matching the processed images with the Suspended Sediment Concentration (SSC) dataset. | To present a framework called cluster-based machine learning regression for optical variability (CMR-OV) that aims to overcome the challenges of remote sensing of suspended sediment in shallow waters. |
| Random Forest Classification | The results show that Random Forest Classification achieves a 66% testing accuracy, while Random Forest Regression yields an R² of 0.11 and a 55% accuracy when applying a 0.005 mg/L threshold. | The study employs four NJDEP datasets to analyze factors impacting arsenic concentrations in private wells within a 152.4-m buffer zone, examining LULC types, orchards, contaminated sites, and abandoned mines in west-central New Jersey. | To develop Random Forest Classification and Regression models to identify factors contributing to higher arsenic concentration in private drinking water wells in west-central New Jersey. |
| MLP, a customized CNN, the VGG Net and ResNet | Customized CNN model achieved an F1 score of 0.993 and identified the majority of chemical compounds from the Danube River. | The study used the JBSS dataset from the EU/UNDP EMBLAS II project (2016–2017). Seawater samples were analyzed via LC-HRMS, resulting in 30,489 signals; 35 compounds were tentatively identified using a non-target screening workflow. | The study aims to develop an open-source end-to-end workflow to estimate pollution load from major inflowing rivers and other unidentified sources using a deep learning convolutional neural network classification model. |
| GBM, ANN, XGBoost | The artificial neural network (ANN) model demonstrated the best performance, with R² = 0.989, RMSE = 0.037, and NSE = 0.995. | The study aims to analyze 392 datasets containing 12 hydrochemical parameters and identify the most significant parameters affecting groundwater quality. | The objective of the study is to compare the performance of three machine learning models, DNN, XGBoost, and GBM, in predicting water quality for drinking purposes using two indices, EWQI and WQI, in Haryana state, India. |
|Naive Bayes, KNN,|
|The approach utilizes a model voting|
system to achieve a 97% accuracy rate
|Collected from sensor nodes that|
are portable and able to gather
physicochemical properties of water.
|To present a water quality monitoring and potability|
classification system that utilizes an Internet of Things (IoT) framework
for rural areas in developing countries.
|FEYN and Q-lattice||Accuracy. Q-lattice performed best,|
obtaining 68%.
|Data primarily focusing on minerals|
and pH levels.
|The objective is to explore the use of FEYN and Q-Lattice for classifying|
water potability based on key parameters such as pH value,
sulfate, and chloramines.
|Naive Bayes, Decision|
Tree (DT), KNN,
LR, ANN, SVM
|RF achieved the highest accuracy rate|
of 83.78% and DT achieved 74.98%,
while LR had the lowest accuracy rate.
|Data primarily focusing on minerals|
and pH levels.
|To classify the potability of drinking water, various classification|
algorithms were used.
|RF, DT, SVM, ANN,|
XGB, LR, GB,
|The performance metrics used were|
precision, recall, accuracy, and F1-score.
The RF model achieved the best results,
with values of 0.81, 0.8, 0.91, and 0.85, respectively.
|The study examined data from|
269 cities in China.
|Determine the extent to which industrial water usage exacerbates|
the country’s pollution problem.
|CNN, Gabor Filter,|
|The classification accuracy for target species|
was 87.5%. Recall levels for different species,
including relevant toxic species, were 81.82%,
57.15%, 85.71%, and 95%.
|Input images consisting of single-specimen|
marine phytoplankton images, which can be
found in various public datasets
|The authors propose a novel fully automatic methodology that|
uses digital microscopy images of water samples to perform
phytoplankton analyses.
|KNN, Naive Bayes,|
DT, Regression Tree
|Accuracy 97%||Collected from sensor nodes that are portable|
and able to gather physicochemical properties
|A data-driven water classification model that utilizes sensor nodes|
and machine learning algorithms to monitor water parameters such
as pH, turbidity, total dissolved solids, and temperature wirelessly.
|XGBoost tree, ANN,|
|The study improved the forecast accuracy of|
various machine learning techniques for water
quality classification with an ensemble model
|The dataset for the study was adopted|
|The study presents a machine learning-based model using adaptive|
boosting technique to categorize and evaluate the quality rate of
drinking water.
|RF with PySpark||The model demonstrated exceptional|
performance, achieving a perfect 1.0 score
for accuracy, precision, recall, and F1-score.
|The dataset for the study was adopted|
|Developed a Random Forest model using PySpark classification to|
predict the potability of river water based on ten different features.
|DT, Naive Bayes||Accuracy, 97.23%||The dataset for the study was adopted|
|To compare the performance of two machine learning|
algorithms, the Decision Tree algorithm and the Naive Bayes
algorithm, in predicting drinking water quality.
|RF||MAE, RMSE, R2. The best R2 was achieved|
in 2010, with a value of 0.993, an MAE of 0.132,
and an RMSE of 0.260. The worst performance
occurred in 2005, with an R2 of 0.932, an MAE
of 0.64, and an RMSE of 1.511.
|Remote sensing and GIS methods were|
employed to create training and test sets
|To introduce a study that uses a Random Forest model to predict shallow|
groundwater nitrate concentrations in the Yinchuan Region of central
Yinchuan Plain during four different years.
|The ANN model demonstrated the best performance|
for scenario 1, with R2 values above 0.99 for all
variables (RSC, MH, SAR, PI, SSP, and KI) in both
training and testing. The MLR models showed better
results in scenario 2 compared to the ANN and
|Observations were collected from wells|
within the basin area, utilizing 140 water
samples for this model.
|To develop accurate and reliable machine learning models for predicting|
irrigation water quality parameters, which can help plan irrigation water
and crop management more effectively.
|ANN||MAE, RMSE, R2. On average, across the various|
experiments, R2 achieved a value of 0.83, MAE
obtained a value of 0.22, and RMSE recorded a
value of 0.39.
|Data collected from a high-resolution|
sensor with spatial, temporal, and spectral
(1 nm) capabilities enables continuous
observation and effective long-term
monitoring of inland water quality,
|Introduces proximal remote sensing for inland water quality monitoring,|
presents a high-resolution hyperspectral imager, and demonstrates the
effectiveness of machine learning algorithms in accurately estimating key
water quality parameters such as nitrogen, phosphorus, and
chemical oxygen demand using this new technology.
|Pearson CC-SVM||The AUC of the PCC-SVM-based method|
was approximately 1.
|Taking the dataset into consideration,|
the parameters included residual chlorine,
pH, turbidity, temperature, conductivity,
oxidation-reduction potential, and
chemical oxygen demand.
|To emphasize the increasing risk of cross-connections in potable-reclaimed|
water dual distribution systems, and to highlight the need for reliable,
cost-effective, and real-time online detection methods.
|Remote sensing p|
|The article compares the advantages of various|
remote sensing platforms and inversion models
while discussing hyperspectral monitoring
applications for multiple water quality parameters.
|Comparison of various remote sensing|
platforms, inversion models, and
water quality parameters
|To provide an overview of the development and current applications of|
hyperspectral remote sensing in inland water quality detection.
It compares the merits of various remote sensing platforms, inversion
models, and the monitoring of specific water quality parameters.
|CNN,|
Support vector regression (SVR)
|The results show that the CNN model outperforms|
the SVR model. This is demonstrated by the higher
AUC values for both the training (0.844) and testing
(0.843) datasets for CNN, compared to the AUC
values of 0.75 for both training and testing datasets
for the SVR model
|The study involves creating 140 groundwater|
datasets in South Korea, dividing them into
calibration and testing groups, and using
15 groundwater conditioning factors
for model training.
|To develop groundwater potential maps using|
machine learning algorithms (specifically
SVR and CNN) to aid in the conservation
and management of groundwater resources.
|Minimax Probability|
Machine Regression (MPMR),
Relevance Vector Machine (RVM),
Gaussian Process Regression (GPR),
Extreme Learning Machine (ELM)
|The results showed that the MPMR model performed|
the best among the four models, with the following
metrics: R2 = 0.984, MAE = 0.035, RMSE = 0.044,
Nash–Sutcliffe Efficiency (ENS) = 0.984,
DRefined = 0.995, and Extreme Learning Machine
(ELM) = 0.874.
|Datasets of Lake Huron’s water levels.|
The data spans from 1918 to 2013,
with the period from 1918 to 1993 used
for the training phase, and the remaining
data (from 1994 to 2013) used for testing.
|To evaluate the performance of four advanced|
artificial intelligence models, MPMR, RVM,
GPR, ELM, for forecasting lake level
fluctuations in Lake Huron using
historical datasets.
|Artificial neural network (ANN)|
and adaptive neuro-fuzzy
inference system (ANFIS)
|The best ANN and ANFIS models showed high|
performance with r > 0.95, Nash index > 0.95,
and RMSE < 0.1. The optimal ANN
model was t + 4, while the best ANFIS model was t + 6.
|Daily rainfall and water level data from 1 January|
2012 to 31 December 2019. These data were
collected from stations P68 and C13 and
provided by Empresa Pública Metropolitana de
Agua Potable y Saneamiento de Quito.
|To develop and compare machine learning models,|
ANN and ANFIS models, to forecast the water
level of the Salve Faccha reservoir, which
supplies water to Quito, the capital of Ecuador.
|GTB, XGBoost, SVM, DT, RF,|
AdaBoost, LightGBM, ANN
|The GTB model had the lowest mean squared error|
and the highest R-squared and adjusted R-squared
values in all case studies. Additionally, over 91%
of the total samples had an error rate below
10% between the predicted and observed values.
|The dataset used in this study consists of|
3348 samples collected over a 21-year monitoring
period from the Bac-Hung-Hai catchment,
which is the largest irrigation and drainage
area in Vietnam.
|To explore the application of machine learning|
methods, specifically the GTB model, for
estimating water levels without comprehensive
knowledge of hydrological processes or
complex irrigation system databases.
|Extreme gradient boosting,|
|The results of this study showed that the extreme|
gradient boosting model performed the best,
with R2 = 0.998 and RMSE = 0.048 m, followed
by Random Forest (R2 = 0.997,
RMSE = 0.054 m) and multilinear regression
(R2 = 0.970, RMSE = 0.221 m).
|The dataset, from a Central Kalimantan peat dome,|
contains 2010–2012 groundwater level
measurements, elevation, and precipitation data,
as these factors significantly impact groundwater
|To convey the importance of understanding|
groundwater levels in peatlands, particularly
in Indonesia, which possesses the largest
share of tropical peat carbon.
|XGB, MLR, GWR, SVR,|
RF, IDW, OK, COK.
|The results show that the XGB algorithm|
with the Tweedie loss function achieved
the best performance with an R2 value
of 1.00, and the lowest errors when
compared to other machine learning and
interpolation methods like MLR, GWR,
SVR, RF, IDW, OK, and COK.
|The study used datasets from various sources,|
including precipitation from China
Meteorological Data Service Centre,
topographical factors from SRTM, soil factors
from CSCD, vegetation index from NASA
earth data, auxiliary factors from the National
basic geographic database, land cover data
from ESA, lithology data from the Spatial
Database of Digital Geologic Map
of China, and coordinates from the projection
|To present a framework using the XGB machine|
learning method for learning groundwater
depth in unconfined aquifers in hilly terrain,
where spatial interpolation methods often
face errors.
|Gaussian process regression||The most significant variables in predicting|
irrigation water demand were found to be
irrigated cropped area, air temperature, and
vapor pressure deficit. The Gaussian
process regression model showed high
accuracy, with an R2 higher than 0.97 and
RMSE as low as 0.06 km3, even with
different input variable combinations.
|The study used datasets from California Natural|
Resources Agency and gridMET to predict
annual irrigation water demand. They simplified
400+ commodities into 20 crop categories and
multi-crop areas for modeling.
|To develop machine learning models to predict|
California’s annual, county-level irrigation
water demand using various input variables
over an 18-year time span.
|Hierarchical cluster analysis (HCA),|
artificial neural networks (ANNs)
|The study discovered 30 clusters through|
HCA, with higher groundwater levels in
the western part of the Ogallala Aquifer
that decreased towards the east. The ANN
models accurately predicted even for
non-calibrated wells, and integrating HCA
and ANN allowed for effective annual
groundwater level forecasting for well
|The study is based on the time series of|
groundwater levels in
403 wells of the Ogallala Aquifer,
|To present a study that employs HCA and ANNs|
to predict annual groundwater levels in
403 wells of the Ogallala Aquifer, which is critical
for agricultural irrigation and public water
|LR, RF, SVM, XGBoost||The XGBoost model, with R2 = 0.86 and|
RMSE = 0.29, outperformed other models.
The entire dataset of HIX had a strong
association with Landsat reflectance.
The HIX decreased from 2015 to 2020.
|Two datasets were used in this research:|
Landsat 8 OLI product, which collected
multispectral imagery of the Earth’s
surface to derive HIX in lakes across
China, and 1150 pairs of field samples
to match Landsat surface reflectance
data and select sensitive spectral
variables for machine learning methods.
|To develop a general model based on Landsat 8|
OLI product embedded in Google Earth Engine
(GEE) to derive the humification index (HIX)
based on Excitation-Emission
Matrices (EEMs) in lakes across China.
|DBSCAN, CNN||Mean Per-Class Error = 0||The data source for leakage detection in this|
paper is derived from hydraulic simulations.
|To tackle the issue of water leakage in urban water supply networks, which|
affects water quality, hydraulics, and public health.
|ANN||Distance leak error < 2.32 m|
for 95% of points.
|Transient head traces at the valve|
after its closure
|To introduce a novel methodology for identifying features in water pipelines,|
which, by accurately predicting the presence of junctions and leaks, aims to
enhance the assessment and maintenance of water distribution systems.
|SVM||Accuracy between 80 and|
83% for the testing dataset.
|A wireless sensor network was|
presented, with 4G for transmission
|To develop an efficient water pipeline monitoring system using wireless|
sensor networks and SVM-based leakage identification to conserve
resources and minimize economic losses.
|LDA, ANN||Accuracy 80%, for the|
|Pressure data||Introduce a data-driven approach using limited pressure measurements and|
machine learning classifiers to accurately localize leaks in water distribution
networks, ultimately conserving water resources and reducing costs.
|Time-frequency CNN||Accuracy 99%, for|
the testing dataset.
|Real datasets from Chengdu|
city and synthesized datasets
containing Gaussian white noise
|To propose a leakage spectrogram and time-frequency convolutional|
neural network model for improved accuracy and stability in leak
detection, and to compare its performance with other classification
models under various signal-to-noise ratio conditions.
for the testing dataset.
|Several months from multiple|
cities across North America.
|To provide an overview of the challenges faced by water utilities in|
detecting and managing leaks in aging water infrastructure and to
present various technologies and methods that have been developed
to address this issue.
|RF||Accuracy in different|
contexts. The mean
accuracy was 33.5%.
|The dataset includes 24 features as|
inputs to each model. The leak shapes
were divided into five separate datasets
based on their leak area.
|To present a methodology for predicting leak shapes using vibration|
signals and introduce an innovative signal processing technique that
combines machine learning methods, specifically Random Forest
classifiers, with various signal features.
KNN, Naive Bayes
|Accuracy, Precision, Recall,|
F1-score. Ten scenarios were
simulated; the best
algorithm was Naive Bayes,
with averages of
96.27, 95.94, 95.78, and 93.80, respectively.
|Vitens company dataset that describes|
the water distribution networks of
Leeuwarden. The dataset includes data
on flow, pressure, temperature,
turbidity, conductivity, and acidity.
|A real-time hybrid method that uses AI algorithms and hydraulic|
relations for detecting and locating leaks, as well as identifying the
volume of lost material in Water Distribution Networks.
|RF||Accuracy 95%||The data was collected using 18 pressure|
sensors and 3 flow sensors, with a noise
level of 1.5% MPI
|To propose a method for improving the accuracy of a classifier|
model for leak location in water distribution networks.
|KNN||Accuracy 52%||The dataset is created through a two-step|
process involving hydraulic transient
simulations of a water network. The dataset
consists of pressure head data from all nodes
in the network.
|A novel method for identifying and locating leaking pipes in|
pressurized water distribution systems using transient modeling
and the K-nearest neighbors (K-NN) algorithm.
|Accuracy 75%||Data from smart meters located alongside|
water distribution pipelines +
|To introduce the concept of the Internet of Things in water management. It discusses|
the challenges and limitations of traditional methods of leak detection in pipes and
highlights the potential of using a low-cost sensor network and machine learning
algorithms to monitor and control water leaks more efficiently.
|Accuracy: The most|
outstanding result was
achieved by the Random
Forest (RF) model, which
demonstrated a 100%
|Used the radial sensing direction for signal|
collection with a sampling rate of
3000 samples/s in streaming mode.
|To introduce the use of cost-effective MEMS-based accelerometers for|
leak detection and explain the methodology used, including experiments
on real networks, data analysis, and the development of machine learning
|SVM||Accuracy between 88 and 93%||Vibration measurements using|
|To develop, test, validate, and demonstrate a machine-learning-based risk|
assessment method for early detection of leaks with high likelihood, their
geolocation, and accuracy assessment in the water distribution system at
the University of Lille’s SUNRISE demonstration site in France.
|ANN||Accuracy 97%||Flow data set for pipe||Implementing a machine learning-based risk assessment method enables|
quick detection of highly probable leaks, precise geolocation, and
accurate evaluation within the water distribution system.
Close to 100%.
|Flow and pressure data were|
determined using EPANET software
|To present a study examining the capability of machine learning methods to localize|
leaks in water distribution systems, which is crucial due to the economic losses,
infrastructure damage, and soil contamination caused by water leakage.
|Accuracy 100% in the|
best cases for both metal
and non-metal pipes.
|Features extracted from de-noised|
signals of sound
|Leaks in water distribution networks.|
|Accuracy, Recall, Precision.|
On average, 0.64, 0.65, and 0.62, respectively.
|Eight satellite images and their derived|
parameters for water leak detection
in canal systems.
|The application of this approach lies in the domain of remote sensing and|
infrastructure maintenance, with a particular emphasis on automating the
detection and evaluation of water leaks within extensive canal systems.
|SVM, DT, KNN,|
|Accuracy. The best was|
SqueezeNet, with 95.15%.
|Utilizing piezoelectric accelerometers|
to gather real network data across
multiple cities in China.
|Enhance leak detection efficiency, minimize water losses, mitigate|
structural damage, and bolster public safety by automating the leak
detection process in water distribution systems.
|MLP, CNN, SVM||Accuracy. MLP was the|
best, with 94.89%.
|Comprises both leakage and non-leakage|
sounds, systematically gathered via a
cloud information management system
from confirmed underground leakages in
|Developing an AI-based system to address the challenges of|
Non-Revenue Water in densely populated cities.
|Accuracy, Precision, AUC.|
The TFCNN achieved the
highest performance in
terms of accuracy (97.99%),
precision (95.51%), and
area under the curve
|Various methods, including ground penetrating|
radar, gas injection, hydrophones,
vibro-acoustic noise loggers and correlators,
infrared thermography, and in-line devices,
|To effectively monitor and maintain potable Water Distribution|
Networks in order to ensure a continuous and uninterrupted
water supply for customers.
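The tables above report fit statistics such as R2, MAE, RMSE, and NSE for regression models, and accuracy, precision, recall, and F1-score for classifiers. As a reading aid, the following minimal Python sketch shows the standard textbook definitions of these measures. It is not taken from any of the reviewed studies: the function names and the toy numbers are illustrative assumptions, and R2 is computed here as the squared Pearson correlation, which is only one of the conventions in use (some papers instead report the coefficient of determination, which coincides with NSE).

```python
import math


def rmse(obs, pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - residual sum of squares / variance of observations."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def pearson_r2(obs, pred):
    """R2 taken as the squared Pearson correlation (assumed convention)."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (var_obs * var_pred)


def classification_scores(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy data: predictions carry a constant bias of +0.5.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 2.5, 3.5, 4.5]
    print("RMSE:", rmse(observed, predicted))
    print("MAE:", mae(observed, predicted))
    print("R2 (squared correlation):", pearson_r2(observed, predicted))
    print("NSE:", nse(observed, predicted))
    # Hypothetical confusion-matrix counts for a binary leak/no-leak classifier.
    print("Accuracy, precision, recall, F1:", classification_scores(8, 2, 2, 8))
```

In this toy example the biased predictions are perfectly correlated with the observations, so the squared-correlation R2 equals 1 while NSE drops to 0.8. This is one reason why the two statistics can differ when both are reported for the same model.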
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
García, J.; Leiva-Araos, A.; Diaz-Saavedra, E.; Moraga, P.; Pinto, H.; Yepes, V. Relevance of Machine Learning Techniques in Water Infrastructure Integrity and Quality: A Review Powered by Natural Language Processing. Appl. Sci. 2023, 13, 12497. https://doi.org/10.3390/app132212497