Relevance of Machine Learning Techniques in Water Infrastructure Integrity and Quality: A Review Powered by Natural Language Processing
Abstract
1. Introduction
- The article conducts a bibliographic analysis of 1087 English-language articles published from 2015 onward to explore the applications of machine learning (ML) techniques in water infrastructure integrity and quality.
- The bibliographic analysis follows a semi-automatic approach built on BERTopic, which improves contextual comprehension in topic modeling.
- The article also emphasizes the potential of combining ML techniques with advanced monitoring systems across multiple aspects of water infrastructure and quality.
- The insights drawn from the analysis highlight the instrumental role of ML in enhancing the integrity and quality of water infrastructure and suggest promising future research directions.
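To make the topic-modeling step more concrete, the following is a minimal sketch of the class-based TF-IDF weighting that BERTopic uses to describe each topic, W(t, c) = tf(t, c) · log(1 + A / tf(t)), where A is the average number of words per class. It is not the authors' pipeline: the clusters are hand-assigned toy stand-ins for the embedding, dimensionality reduction, and clustering steps, and the names `class_tfidf`, `top_terms`, and `toy_clusters` are hypothetical.

```python
import math
from collections import Counter


def class_tfidf(clusters):
    """Score terms per cluster with class-based TF-IDF.

    clusters: dict mapping a cluster id to a list of tokenized documents.
    Returns a dict mapping each cluster id to {term: weight}.
    """
    # tf(t, c): all documents of a cluster are merged into one class document.
    tf_per_class = {
        cid: Counter(tok for doc in docs for tok in doc)
        for cid, docs in clusters.items()
    }
    # tf(t): frequency of each term across all classes.
    tf_total = Counter()
    for counts in tf_per_class.values():
        tf_total.update(counts)
    # A: average number of words per class.
    avg_words = sum(tf_total.values()) / len(tf_per_class)
    return {
        cid: {
            term: freq * math.log(1 + avg_words / tf_total[term])
            for term, freq in counts.items()
        }
        for cid, counts in tf_per_class.items()
    }


def top_terms(scores, k=3):
    """Return the k highest-weighted terms per cluster (ties broken alphabetically)."""
    return {
        cid: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for cid, s in scores.items()
    }


# Hypothetical, hand-clustered toy "abstracts" (assumption, not data from the review).
toy_clusters = {
    "leaks": [["leak", "detection", "pipe", "water"],
              ["leak", "pressure", "sensor", "water"]],
    "quality": [["water", "quality", "nitrate"],
                ["groundwater", "quality", "arsenic", "water"]],
}

scores = class_tfidf(toy_clusters)
print(top_terms(scores, k=2))
```

In this toy example the term "water" appears in both clusters, so its weight is lower than that of cluster-specific terms such as "leak" or "quality", which is the behavior that makes the weighting useful for labeling topics.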
2. Methodology
2.1. Topic Analysis
2.2. Bigram Analysis
3. BERT Topics and General Bibliometrics
4. Bigram and Traditional Results
4.1. Advancements in Machine Learning for Water Contaminants and Soil Erosion
4.1.1. Bigram Document Analysis
4.1.2. Traditional Analysis
4.2. Assessing Water Quality and Potability
4.2.1. Bigram Document Analysis
4.2.2. Traditional Analysis
4.3. Forecasting Water Levels
4.3.1. Bigram Document Analysis
4.3.2. Traditional Analysis
4.4. Advanced Leak Detection in Water Networks
4.4.1. Bigram Document Analysis
4.4.2. Traditional Analysis
5. Discussion and Future Research Directions
5.1. Advancements in Machine Learning for Water Contaminants and Soil Erosion
5.2. Forecasting Water Levels
5.3. Advanced Leak Detection in Water Networks
5.4. Assessing Water Quality and Potability
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Hanjra, M.A.; Blackwell, J.; Carr, G.; Zhang, F.; Jackson, T.M. Wastewater irrigation and environmental health: Implications for water governance and public policy. Int. J. Hyg. Environ. Health 2012, 215, 255–269.
- Green, T.R.; Taniguchi, M.; Kooi, H.; Gurdak, J.J.; Allen, D.M.; Hiscock, K.M.; Treidel, H.; Aureli, A. Beneath the surface of global change: Impacts of climate change on groundwater. J. Hydrol. 2011, 405, 532–560.
- Koop, S.H.; van Leeuwen, C.J. Assessment of the sustainability of water resources management: A critical review of the city blueprint approach. Water Resour. Manag. 2015, 29, 5649–5670.
- Marques, A.C.; Veras, C.E.; Rodriguez, D.A. Assessment of water policies contributions for sustainable water resources management under climate change scenarios. J. Hydrol. 2022, 608, 127690.
- Ferreira, D.C.; Graziele, I.; Marques, R.C.; Gonçalves, J. Investment in drinking water and sanitation infrastructure and its impact on waterborne diseases dissemination: The Brazilian case. Sci. Total Environ. 2021, 779, 146279.
- Hussain, M.I.; Muscolo, A.; Farooq, M.; Ahmad, W. Sustainable use and management of non-conventional water resources for rehabilitation of marginal lands in arid and semiarid environments. Agric. Water Manag. 2019, 221, 462–476.
- Wu, B.; Tian, F.; Zhang, M.; Piao, S.; Zeng, H.; Zhu, W.; Liu, J.; Elnashar, A.; Lu, Y. Quantifying global agricultural water appropriation with data derived from earth observations. J. Clean. Prod. 2022, 358, 131891.
- Gao, M.; Zhu, L.; Peh, C.K.; Ho, G.W. Solar absorber material and system designs for photothermal water vaporization towards clean water and energy production. Energy Environ. Sci. 2019, 12, 841–864.
- Mishra, R.; Dubey, S. Fresh water availability and it’s global challenge. Br. J. Multidiscip. Adv. Stud. 2023, 4, 1–78.
- Sohail, M.; Mustafa, S.; Ali, M.; Riaz, S. Agricultural communities’ risk assessment and the effects of climate change: A pathway toward green productivity and sustainable development. Front. Environ. Sci. 2023, 10, 948016.
- Khan, H.F.; Arshad, S.A. Beyond water scarcity: Water (in) security and social justice in Karachi. J. Hydrol. Reg. Stud. 2022, 42, 101140.
- Ajith, J.B.; Manimegalai, R.; Ilayaraja, V. An IoT based smart water quality monitoring system using cloud. In Proceedings of the 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), Vellore, India, 24–25 February 2020; pp. 1–7.
- Panigrahi, N.; Patro, S.; Kumar, R.; Omar, M.; Ngan, T.T.; Giang, N.L.; Thu, B.T.; Thang, N.T. Groundwater Quality Analysis and Drinkability Prediction using Artificial Intelligence. Earth Sci. Inform. 2023, 16, 1701–1725.
- Xu, Z.; Lv, Z.; Li, J.; Shi, A. A novel approach for predicting water demand with complex patterns based on ensemble learning. Water Resour. Manag. 2022, 36, 4293–4312.
- Ayati, A.H.; Haghighi, A.; Ghafouri, H.R. Machine Learning–Assisted Model for Leak Detection in Water Distribution Networks Using Hydraulic Transient Flows. J. Water Resour. Plan. Manag. 2022, 148, 04021104.
- Hofmann, T. Unsupervised learning by probabilistic latent semantic analysis. Mach. Learn. 2001, 42, 177.
- Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
- Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788–791.
- Arora, S.; Ge, R.; Moitra, A. Learning topic models–going beyond SVD. In Proceedings of the 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, New Brunswick, NJ, USA, 20–23 October 2012; pp. 1–10.
- Grootendorst, M. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv 2022, arXiv:2203.05794.
- Garcia, J.; Villavicencio, G.; Altimiras, F.; Crawford, B.; Soto, R.; Minatogawa, V.; Franco, M.; Martínez-Muñoz, D.; Yepes, V. Machine learning techniques applied to construction: A hybrid bibliometric analysis of advances and future directions. Autom. Constr. 2022, 142, 104532.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv 2018, arXiv:1802.03426.
- Campello, R.J.; Moulavi, D.; Sander, J. Density-based clustering based on hierarchical density estimates. In Proceedings of the Advances in Knowledge Discovery and Data Mining: 17th Pacific-Asia Conference, PAKDD 2013, Gold Coast, Australia, 14–17 April 2013; Proceedings, Part II 17. Springer: Berlin/Heidelberg, Germany, 2013; pp. 160–172.
- Aria, M.; Cuccurullo, C. bibliometrix: An R-tool for comprehensive science mapping analysis. J. Informetr. 2017, 11, 959–975.
- Grivel, L.; Mutschke, P.; Polanco, X. Thematic mapping on bibliographic databases by cluster analysis: A description of the sdoc environment with solis. Knowl. Organ. 1995, 22, 70–77.
- López-Fernández, M.C.; Serrano-Bedia, A.M.; Pérez-Pérez, M. Entrepreneurship and family firm research: A bibliometric analysis of an emerging field. J. Small Bus. Manag. 2016, 54, 622–639.
- Bradford, S.C. Sources of information on specific subjects. Engineering 1934, 137, 85–86.
- Rao, P.; Wang, Y.; Liu, Y.; Wang, X.; Hou, Y.; Pan, S.; Wang, F.; Zhu, D. A comparison of multiple methods for mapping groundwater levels in the Mu Us Sandy Land, China. J. Hydrol. Reg. Stud. 2022, 43, 101189.
- Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36.
- Li, X.; Sha, J.; Wang, Z.L. Comparison of daily streamflow forecasts using extreme learning machines and the Random Forest method. Hydrol. Sci. J. 2019, 64, 1857–1866.
- Schütze, H.; Manning, C.D.; Raghavan, P. Introduction to Information Retrieval; Cambridge University Press: Cambridge, UK, 2008.
- Jurafsky, D.; Martin, J.H. Speech and Language Processing, 3rd ed.; Stanford University: Stanford, CA, USA, 2019.
- Knoll, L.; Breuer, L.; Bach, M. Large scale prediction of groundwater nitrate concentrations from spatial data using machine learning. Sci. Total Environ. 2019, 668, 1317–1327.
- Garosi, Y.; Sheklabadi, M.; Conoscenti, C.; Pourghasemi, H.R.; Van Oost, K. Assessing the performance of GIS-based machine learning models with different accuracy measures for determining susceptibility to gully erosion. Sci. Total Environ. 2019, 664, 1117–1132.
- Tan, K.; Ma, W.; Chen, L.; Wang, H.; Du, Q.; Du, P.; Yan, B.; Liu, R.; Li, H. Estimating the distribution trend of soil heavy metals in mining area from HyMap airborne hyperspectral imagery based on ensemble learning. J. Hazard. Mater. 2021, 401, 123288.
- Mosavi, A.; Sajedi-Hosseini, F.; Choubin, B.; Taromideh, F.; Rahi, G.; Dineva, A.A. Susceptibility mapping of soil water erosion using machine learning models. Water 2020, 12, 1995.
- Mukherjee, A.; Sarkar, S.; Chakraborty, M.; Duttagupta, S.; Bhattacharya, A.; Saha, D.; Bhattacharya, P.; Mitra, A.; Gupta, S. Occurrence, predictors and hazards of elevated groundwater arsenic across India through field observations and regional-scale AI-based modeling. Sci. Total Environ. 2021, 759, 143511.
- Chakraborty, M.; Sarkar, S.; Mukherjee, A.; Shamsudduha, M.; Ahmed, K.M.; Bhattacharya, A.; Mitra, A. Modeling regional-scale groundwater arsenic hazard in the transboundary Ganges River Delta, India and Bangladesh: Infusing physically-based model with machine learning. Sci. Total Environ. 2020, 748, 141107.
- Knoll, L.; Breuer, L.; Bach, M. Nation-wide estimation of groundwater redox conditions and nitrate concentrations through machine learning. Environ. Res. Lett. 2020, 15, 064004.
- Harrison, J.W.; Lucius, M.A.; Farrell, J.L.; Eichler, L.W.; Relyea, R.A. Prediction of stream nitrogen and phosphorus concentrations from high-frequency sensors using Random Forests Regression. Sci. Total Environ. 2021, 763, 143005.
- Messier, K.P.; Kane, E.; Bolich, R.; Serre, M.L. Nitrate variability in groundwater of North Carolina using monitoring and private well data models. Environ. Sci. Technol. 2014, 48, 10804–10812.
- Messier, K.P.; Wheeler, D.C.; Flory, A.R.; Jones, R.R.; Patel, D.; Nolan, B.T.; Ward, M.H. Modeling groundwater nitrate exposure in private wells of North Carolina for the Agricultural Health Study. Sci. Total Environ. 2019, 655, 512–519.
- Ransom, K.M.; Nolan, B.T.; Stackelberg, P.; Belitz, K.; Fram, M.S. Machine learning predictions of nitrate in groundwater used for drinking supply in the conterminous United States. Sci. Total Environ. 2022, 807, 151065.
- Podgorski, J.; Araya, D.; Berg, M. Geogenic manganese and iron in groundwater of Southeast Asia and Bangladesh–Machine learning spatial prediction modeling and comparison with arsenic. Sci. Total Environ. 2022, 833, 155131.
- Kwon, S.; Seo, I.W.; Noh, H.; Kim, B. Hyperspectral retrievals of suspended sediment using cluster-based machine learning regression in shallow waters. Sci. Total Environ. 2022, 833, 155168.
- Giri, S.; Kang, Y.; MacDonald, K.; Tippett, M.; Qiu, Z.; Lathrop, R.G.; Obropta, C.C. Revealing the sources of arsenic in private well water using Random Forest Classification and Regression. Sci. Total Environ. 2023, 857, 159360.
- Alygizakis, N.; Giannakopoulos, T.; Thomaidis, N.S.; Slobodnik, J. Detecting the sources of chemicals in the Black Sea using non-target screening and deep learning convolutional neural networks. Sci. Total Environ. 2022, 847, 157554.
- Raheja, H.; Goel, A.; Pal, M. Prediction of groundwater quality indices using machine learning algorithms. Water Pract. Technol. 2022, 17, 336–351.
- Alipio, M.I. Data-driven IoT-based water quality monitoring and potability classification system in rural areas. In Proceedings of the 2020 International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Republic of Korea, 21–23 October 2020; pp. 634–639.
- Riyantoko, P.A.; Sugiarto; Diyasa, I.G.S.M.; Kraugusteeliana. “FQAM” Feyn-QLattice Automation Modelling: Python Module of Machine Learning for Data Classification in Water Potability. In Proceedings of the 2021 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS), Jakarta, Indonesia, 28–29 October 2021; pp. 135–141.
- Yusuf, H.; Alhaddad, S.; Yusuf, S.; Hewahi, N. Classification of Water Potability Using Machine Learning Algorithms. In Proceedings of the 2022 International Conference on Data Analytics for Business and Industry (ICDABI), Sakhir, Bahrain, 25–26 October 2022; pp. 454–458.
- Priyadarshini, I.; Alkhayyat, A.; Obaid, A.J.; Sharma, R. Water pollution reduction for sustainable urban development using machine learning techniques. Cities 2022, 130, 103970.
- Rivas-Villar, D.; Rouco, J.; Carballeira, R.; Penedo, M.G.; Novo, J. Fully automatic detection and classification of phytoplankton specimens in digital microscopy images. Comput. Methods Programs Biomed. 2021, 200, 105923.
- Alipio, M.I. Towards developing a classification model for water potability in Philippine rural areas. ASEAN Eng. J. 2020, 10, 24–34.
- Dalal, S.; Onyema, E.M.; Romero, C.A.T.; Ndufeiya-Kumasi, L.C.; Maryann, D.C.; Nnedimkpa, A.J.; Bhatia, T.K. Machine learning-based forecasting of potability of drinking water through adaptive boosting model. Open Chem. 2022, 20, 816–828.
- Alomani, S.M.; Alhawiti, N.I.; Alhakamy, A. Prediction of Quality of Water According to a Random Forest Classifier. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 892–899.
- Haq, M.I.K.; Ramadhan, F.D.; Az-Zahra, F.; Kurniawati, L.; Helen, A. Classification of water potability using machine learning algorithms. In Proceedings of the 2021 International Conference on Artificial Intelligence and Big Data Analytics, Bandung, Indonesia, 27–29 October 2021; pp. 1–5.
- He, S.; Wu, J.; Wang, D.; He, X. Predictive modeling of groundwater nitrate pollution and evaluating its main impact factors using Random Forest. Chemosphere 2022, 290, 133388.
- Kouadri, S.; Pande, C.B.; Panneerselvam, B.; Moharir, K.N.; Elbeltagi, A. Prediction of irrigation groundwater quality parameters using ANN, LSTM, and MLR models. Environ. Sci. Pollut. Res. 2021, 29, 21067–21091.
- Sun, X.; Zhang, Y.; Shi, K.; Zhang, Y.; Li, N.; Wang, W.; Huang, X.; Qin, B. Monitoring water quality using proximal remote sensing technology. Sci. Total Environ. 2022, 803, 149805.
- Xu, X.; Liu, Y.; Liu, S.; Li, J.; Guo, G.; Smith, K. Real-time detection of potable-reclaimed water pipe cross-connection events by conventional water quality sensors using machine learning methods. J. Environ. Manag. 2019, 238, 201–209.
- Cao, Q.; Yu, G.; Qiao, Z. Application and recent progress of inland water monitoring using remote sensing techniques. Environ. Monit. Assess. 2023, 195, 1–16.
- Ahmed, S.; Mahzabin, M.; Shahpar, S.; Tonni, S.I.; Rahman, M.S. Assessment of Water Quality in Smart City Environment Leveraging ML-IoT. In Proceedings of the International Conference on Fourth Industrial Revolution and Beyond 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 215–227.
- Zai, C.; El Mechal, C.; El Amrani El Idrissi, N.; Ghennioui, H. Prediction of Water Quality Using Artificial Intelligence (AI) and Statistical Approach. In Proceedings of the Digital Technologies and Applications: Proceedings of ICDTA’22, Fez, Morocco, 28–30 January 2022; Volume 1, pp. 34–42.
- Bajpai, A.; Chaubey, S.; Patro, B.; Verma, A. A Real-Time Approach to Classify the Water Quality of the River Ganga at Mehandi Ghat, Kannuaj. In Proceedings of the 2022 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), Kota Kinabalu, Malaysia, 13–15 September 2022; pp. 1–6.
- Chafloque, R.; Rodriguez, C.; Pomachagua, Y.; Hilario, M. Predictive Neural Networks Model for Detection of Water Quality for Human Consumption. In Proceedings of the 2021 13th International Conference on Computational Intelligence and Communication Networks (CICN), Lima, Peru, 22–23 September 2021; pp. 172–176.
- El-Attar, N.E.; Lotfy, H.R.; Awad, W.A. Performance of Artificial Intelligence Models in Analysis and Prediction of Water Potability. In Proceedings of the 2022 International Telecommunications Conference (ITC-Egypt), Alexandria, Egypt, 26–28 July 2022; pp. 1–6.
- Panahi, M.; Sadhasivam, N.; Pourghasemi, H.R.; Rezaie, F.; Lee, S. Spatial prediction of groundwater potential mapping based on convolutional neural network (CNN) and support vector regression (SVR). J. Hydrol. 2020, 588, 125033.
- Bonakdari, H.; Ebtehaj, I.; Samui, P.; Gharabaghi, B. Lake water-level fluctuations forecasting using minimax probability machine regression, relevance vector machine, Gaussian process regression, and extreme learning machine. Water Resour. Manag. 2019, 33, 3965–3984.
- Páliz Larrea, P.; Zapata-Ríos, X.; Campozano Parra, L. Application of neural network models and ANFIS for water level forecasting of the Salve Faccha Dam in the Andean Zone in Northern Ecuador. Water 2021, 13, 2011.
- Truong, V.H.; Ly, Q.V.; Le, V.C.; Vu, T.B.; Le, T.T.T.; Tran, T.T.; Goethals, P. Machine learning-based method for forecasting water levels in irrigation and drainage systems. Environ. Technol. Innov. 2021, 23, 101762.
- Hikouei, I.S.; Eshleman, K.N.; Saharjo, B.H.; Graham, L.L.; Applegate, G.; Cochrane, M.A. Using machine learning algorithms to predict groundwater levels in Indonesian tropical peatlands. Sci. Total Environ. 2023, 857, 159701.
- Emami, M.; Ahmadi, A.; Daccache, A.; Nazif, S.; Mousavi, S.F.; Karami, H. County-level irrigation water demand estimation using machine learning: Case study of California. Water 2022, 14, 1937.
- Oliveira, L.C.d.; Santos, C.A.G.; de Farias, C.A.S.; da Silva, R.M.; Singh, V.P. Predicting Groundwater Levels in Ogallala Aquifer Wells Using Hierarchical Cluster Analysis and artificial neural networks. J. Hydrol. Eng. 2023, 28, 04022042.
- Shang, Y.; Song, K.; Lai, F.; Lyu, L.; Liu, G.; Fang, C.; Hou, J.; Qiang, S.; Yu, X.; Wen, Z. Remote sensing of fluorescent humification levels and its potential environmental linkages in lakes across China. Water Res. 2023, 230, 119540.
- Demir, V.; Yaseen, Z.M. Neurocomputing intelligence models for lakes water level forecasting: A comprehensive review. Neural Comput. Appl. 2023, 35, 303–343.
- Hu, X.; Han, Y.; Yu, B.; Geng, Z.; Fan, J. Novel leakage detection and water loss management of urban water supply network using multiscale neural networks. J. Clean. Prod. 2021, 278, 123611.
- Bohorquez, J.; Alexander, B.; Simpson, A.R.; Lambert, M.F. Leak detection and topology identification in pipelines using fluid transients and artificial neural networks. J. Water Resour. Plan. Manag. 2020, 146, 04020040.
- Liu, Y.; Ma, X.; Li, Y.; Tie, Y.; Zhang, Y.; Gao, J. Water pipeline leakage detection based on machine learning and wireless sensor networks. Sensors 2019, 19, 5086.
- Sun, C.; Parellada, B.; Puig, V.; Cembrano, G. Leak localization in water distribution networks using pressure and data-driven classifier approach. Water 2019, 12, 54.
- Guo, G.; Yu, X.; Liu, S.; Ma, Z.; Wu, Y.; Xu, X.; Wang, X.; Smith, K.; Wu, X. Leakage detection in water distribution systems based on time–frequency convolutional neural network. J. Water Resour. Plan. Manag. 2021, 147, 04020101.
- Ravichandran, T.; Gavahi, K.; Ponnambalam, K.; Burtea, V.; Mousavi, S.J. Ensemble-based machine learning approach for improved leak detection in water mains. J. Hydroinform. 2021, 23, 307–323.
- Butterfield, J.D.; Meyers, G.; Meruane, V.; Collins, R.P.; Beck, S.B. Experimental investigation into techniques to predict leak shapes in water distribution systems using vibration measurements. J. Hydroinform. 2018, 20, 815–828.
- Fereidooni, Z.; Tahayori, H.; Bahadori-Jahromi, A. A hybrid model-based method for leak detection in large scale water distribution networks. J. Ambient Intell. Humaniz. Comput. 2021, 12, 1613–1629.
- Chen, J.; Feng, X.; Xiao, S. An iterative method for leakage zone identification in water distribution networks based on machine learning. Struct. Health Monit. 2021, 20, 1938–1956.
- Levinas, D.; Perelman, G.; Ostfeld, A. Water leak localization using high-resolution pressure sensors. Water 2021, 13, 591.
- Alves Coelho, J.; Glória, A.; Sebastião, P. Precise water leak detection using machine learning and real-time sensor data. IoT 2020, 1, 474–493.
- Tariq, S.; Bakhtawar, B.; Zayed, T. Data-driven application of MEMS-based accelerometers for leak detection in water distribution networks. Sci. Total Environ. 2022, 809, 151110.
- Cantos, W.P.; Juran, I.; Tinelli, S. Machine-learning–based risk assessment method for leak detection and geolocation in a water distribution system. J. Infrastruct. Syst. 2020, 26, 04019039.
- Mysorewala, M.F.; Cheded, L.; Ali, I.M. Leak detection using flow-induced vibrations in pressurized wall-mounted water pipelines. IEEE Access 2020, 8, 188673–188687.
- Mashhadi, N.; Shahrour, I.; Attoue, N.; El Khattabi, J.; Aljer, A. Use of machine learning for leak detection and localization in water distribution systems. Smart Cities 2021, 4, 1293–1315.
- Tijani, I.; Abdelmageed, S.; Fares, A.; Fan, K.; Hu, Z.; Zayed, T. Improving the leak detection efficiency in water distribution networks using noise loggers. Sci. Total Environ. 2022, 821, 153530.
- Chen, J.; Tang, P.; Rakstad, T.; Patrick, M.; Zhou, X. Augmenting a deep-learning algorithm with canal inspection knowledge for reliable water leak detection from multispectral satellite images. Adv. Eng. Inform. 2020, 46, 101161.
- Yu, T.; Chen, X.; Yan, W.; Xu, Z.; Ye, M. Leak detection in water distribution systems by classifying vibration signals. Mech. Syst. Signal Process. 2023, 185, 109810.
- Vanijjirattikhan, R.; Khomsay, S.; Kitbutrawat, N.; Khomsay, K.; Supakchukul, U.; Udomsuk, S.; Suwatthikul, J.; Oumtrakul, N.; Anusart, K. AI-based acoustic leak detection in water distribution systems. Results Eng. 2022, 15, 100557.
- Bykerk, L.; Valls Miro, J. Detection of Water Leaks in Suburban Distribution Mains with Lift and Shift Vibro-Acoustic Sensors. Vibration 2022, 5, 370–382.
- Gupta, A.; Kulat, K. A selective literature review on leak management techniques for water distribution system. Water Resour. Manag. 2018, 32, 3247–3269.
- Kammoun, M.; Kammoun, A.; Abid, M. LSTM-AE-WLDL: Unsupervised LSTM Auto-Encoders for Leak Detection and Location in Water Distribution Networks. Water Resour. Manag. 2023, 37, 731–746.
Techniques | Metrics | Data | Application |
---|---|---|---|
MLR, CART, RF | MAE, RMSE, R². RF performed best with R² = 0.54, MAE = 5.9, RMSE = 9.19 | Groundwater database "GruWaH", which records groundwater quality throughout the state of Hesse | The spatial distribution of nitrate concentration in groundwater [34]. |
SVM, RF, GAM, NB | MAE, RMSE, AUC. The best performance was RF with AUC = 92.4, RMSE = 0.2874, MAE = 0.082 | Digital map of the spatial distribution of gullies. 130 locations of Ekbatan Dam Basin, Hamedan, western Iran | Create gully erosion susceptibility map (GESM) in a part of Ekbatan Dam Basin, Hamedan, western Iran [35]. |
SVM, RF, XGBoost, ExtraTrees, AdaBoost, ELM, MLP | R², RMSE, MAE. Of the tested models, XGBoost achieved the best results for all the analyzed metals. | Surface soil samples were collected at essentially the same time as the airborne hyperspectral data were acquired, in late April and early May 2017. | Prediction of four soil heavy metals: arsenic (As), chromium (Cr), lead (Pb), and zinc (Zn) [36]. |
RF, NB, Gaussian process | Accuracy, Kappa, POD. The best model was RF with Accuracy = 91%, Kappa = 82%, POD = 94% | Data included 227 samples of erosion and non-erosion locations through field surveys | Machine learning models for mapping susceptibility to soil erosion by water [37]. |
RF | Accuracy = 82–84%, AUC = 88–89%. | The data were acquired from the National Rural Drinking Water Programme (NRDWP) and the Central Ground Water Board (CGWB), both under the Ministry of Jal Shakti of the Government of India. They include 2,611,365 drinking water wells (NRDWP, 2018) and 649 monitoring wells (CGWB, 2018). | Use field observations of arsenic (As) in groundwater with high spatial resolution to delineate the regional-scale occurrence of elevated arsenic concentrations in groundwater [38]. |
RF, Boosted Regression Trees (BRT), and Logistic Regression (LR) | Accuracy, Sensitivity, Specificity. The best model was Random Forest, with 82% on all indicators. | A total of 100,358 groundwater arsenic measurements were compiled from various databases and published reports from India and Bangladesh: BGS/DPHE (2001), PHED (2006), and BWDB (2013). | Transboundary regional-scale models are used to calculate the probability of arsenic concentrations in groundwater [39]. |
RF, QRF | R² = 0.53 (RF, O2), 0.24 (RF, Fe), and 0.51 (third model, label missing) | The study uses data from the WFD groundwater monitoring network in Germany and selects monitoring sites based on criteria such as metadata, sampling depth, and observation period, excluding concentration outliers. | Applies machine learning techniques to estimate groundwater redox conditions and nitrate concentrations across Germany [40]. |
RF | MAE, RMSE. For one target (name missing), MAE = 0.075, RMSE = 0.12 | The dataset features nutrient parameters, method detection limits, sensor calibration, and collection intervals ranging from 1 to 15 min, with monthly quality checks for water quality monitoring. | To estimate stream nitrogen (N) and phosphorus (P) concentrations from sensor data in a forested, mountainous drainage area in upstate New York [41]. |
Kriging, SVM, GBM, class-balancing techniques | Accuracy, Kappa. The best result was obtained with up-sampling: Accuracy = 0.725, Kappa = 0.369 | 22,059 groundwater nitrate measurements from private wells in North Carolina, collected and maintained by the NC-DHHS between 1990 and 2011. The data were obtained by Messier et al. (2014) [42] and used for modeling. | Estimate groundwater nitrate concentrations in private wells to improve exposure estimates for the Agricultural Health Study (AHS) cohort [43]. |
XGB | 76 variables were retained in the final model, with an R² of 0.83 for the training data and 0.49 for the hold-out data. The RMSE values were 1.15 and 2.01 for the training and hold-out data, respectively. | The model uses data from 12,082 wells and various predictor variables, providing accurate estimates at both national and regional scales. | To develop an extreme gradient boosting (XGB) machine learning model that predicts the distribution of nitrate in groundwater across the conterminous United States (CONUS) [44]. |
RF, Boosted models | The best-performing model was the Random Forest (RF) model, which achieved an AUC of 0.80 and a Cohen’s Kappa score of 0.43. | The data used in the study consist of over 6000 groundwater measurements of manganese (Mn) and iron (Fe) from Southeast Asia and Bangladesh. These measurements were statistically examined along with other physicochemical parameters. | To use machine learning methods, specifically random forest and generalized boosted regression modeling, to analyze over 6000 groundwater measurements of naturally occurring manganese (Mn) and iron (Fe) in Southeast Asia and Bangladesh [45]. |
Gaussian mixture model clustering technique, random forest regressor (RFR) | The clustered RFR model yielded improvements of 10.82% in R², 18.57% in RMSEP, 3.03% in MAPE, and 10.81 in TSE compared to the nonclustering case. | The study involves preprocessing hyperspectral images, converting digital numbers to normalized reflectance values, and matching the processed images with the Suspended Sediment Concentration (SSC) dataset. | To present a framework called cluster-based machine learning regression for optical variability (CMR-OV) that aims to overcome the challenges of remote sensing of suspended sediment in shallow waters [46]. |
Random Forest Classification and Regression | Random Forest Classification achieves a 66% testing accuracy, while Random Forest Regression yields an R² of 0.11 and a 55% accuracy when applying a 0.005 mg/L threshold. | The study employs four NJDEP datasets to analyze factors impacting arsenic concentrations in private wells within a 152.4-m buffer zone, examining LULC types, orchards, contaminated sites, and abandoned mines in west-central New Jersey. | To develop Random Forest classification and regression models that identify factors contributing to higher arsenic concentrations in private drinking water wells in west-central New Jersey [47]. |
MLP, a customized CNN, the VGG Net and ResNet architectures | The customized CNN model achieved an F1 score of 0.993 and identified the majority of chemical compounds from the Danube River. | The study used the JBSS dataset from the EU/UNDP EMBLAS II project (2016–2017). Seawater samples were analyzed via LC-HRMS, yielding 30,489 signals; 35 compounds were tentatively identified using a non-target screening workflow. | To develop an open-source end-to-end workflow that estimates the pollution load from major inflowing rivers and other unidentified sources using a deep learning convolutional neural network classification model [48]. |
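The regression metrics reported in the table above (MAE, RMSE, R²) can be computed as in the following minimal sketch. It uses the standard definitions, with R² taken as the coefficient of determination; individual studies may define R² differently (for example, as a squared correlation), so this is an assumption. The function name `regression_metrics` and the toy values are hypothetical and not taken from any cited study.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, RMSE, and R² (coefficient of determination) for paired values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]

    mae = sum(abs(e) for e in errors) / n                 # mean absolute error
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    rmse = math.sqrt(ss_res / n)                          # root mean squared error

    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    r2 = 1 - ss_res / ss_tot                              # undefined if observations are constant
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Hypothetical toy concentrations (e.g., mg/L), for illustration only.
toy_observed = [10.0, 20.0, 30.0, 40.0]
toy_predicted = [12.0, 18.0, 33.0, 39.0]
toy_result = regression_metrics(toy_observed, toy_predicted)
print(toy_result)
```

Because MAE and RMSE carry the units of the target variable, values from different rows of the table are comparable only when the targets and units match; R² is unitless but depends on the variance of the observations in each dataset.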
Techniques | Metrics | Data | Application |
---|---|---|---|
GBM, ANN, XGBoost | The artificial neural network (ANN) model demonstrated the best performance: R² = 0.989, RMSE = 0.037, and NSE = 0.995. | 392 datasets containing 12 hydrochemical parameters, analyzed to identify the most significant parameters affecting groundwater quality. | To compare the performance of three machine learning models (DNN, XGBoost, and GBM) in predicting water quality for drinking purposes using two indices, EWQI and WQI, in Haryana state, India [49]. |
Naive Bayes, KNN, CART | The approach utilizes a model voting system to achieve a 97% accuracy rate | Collected from sensor nodes that are portable and able to gather physicochemical properties of water. | The application is to present a water quality monitoring and potability classification system that utilizes an Internet of Things (IoT) framework for rural areas in developing countries [50]. |
FEYN and Q-lattice | Accuracy. The best was Q-lattice obtaining a 68%. | Data primarily focusing on minerals and pH levels. | The objective is to explore the use FEYN and Q-Lattice, for classifying water potability based on the presence of key minerals such as pH value, sulfate, and chloramines [51]. |
Naive Bayes, Decision Tree (DT), KNN, LR, ANN, SVM | RF achieving the highest accuracy rate of 83.78% and DT achieving 74.98%, while LR had the lowest accuracy rate of 48.74%. | Data primarily focusing on minerals and pH levels. | To classify the potability of drinking water, various classification algorithms were used [52]. |
RF, DT, SVM, ANN, XGB, LR, GB, LightGB, HistGB | The performance metrics used were Precision, Recall, accuracy, and F1-score. The RF model achieved the best results with values of 0.81, 0.8, 0.91, and 0.85, respectively. | The study examined data from 269 cities in China. | Determine the extent to which industrial water usage exacerbates the country’s pollution problem [53]. |
CNN, Gabor Filter, RF, SVM | The classification accuracy for target species was 87.5%. Recall levels for different species, including relevant toxic species, were 81.82%, 57.15%, 85.71%, and 95%. | Input images consisting of single-specimen marine phytoplankton images, which can be found in various public datasets | The authors propose a novel fully automatic methodology that uses digital microscopy images of water samples to perform phytoplankton analyses [54]. |
KNN, Naive Bayes, DT, Regression Tree | Accuracy 97% | Collected from sensor nodes that are portable and able to gather physicochemical properties of water | A data-driven water classification model that utilizes sensor nodes and machine learning algorithms to monitor water parameters such as pH, turbidity, total dissolved solids, and temperature wirelessly [55]. |
XGBoost tree, ANN, Ensemble Model | The study improved the forecast accuracy of various machine learning techniques for water quality classification with an ensemble model achieving 96.4%. | The dataset for the study was adopted from Kaggle. | The study presents a machine learning-based model using adaptive boosting technique to categorize and evaluate the quality rate of drinking water [56]. |
RF with PySpark | The model demonstrated exceptional performance, achieving a perfect 1.0 score for accuracy, precision, recall, and F1-score. | The dataset for the study was adopted from Kaggle. | Developed a Random Forest model using PySpark classification to predict the potability of river water based on ten different features [57]. |
DT, Naive Bayes | Accuracy, 97.23% | The dataset for the study was adopted from Kaggle. | To compare the performance of two machine learning algorithms—the Decision Tree Algorithm and the Naive Bayes algorithm—in predicting drinking water quality [58]. |
RF | MAE, RMSE, R2. The best R2 was achieved in 2010, with a value of 0.993, an MAE of 0.132, and an RMSE of 0.260. The worst performance occurred in 2005, with an R2 of 0.932, an MAE of 0.64, and an RMSE of 1.511. | Remote sensing and GIS methods were employed to create training and test sets | To introduce a study that uses a Random Forest model to predict shallow groundwater nitrate concentrations in the Yinchuan Region of central Yinchuan Plain during four different years [59]. |
ANN (MLP, LSTM) and MLR | The ANN model demonstrated the best performance for scenario 1, with R2 values above 0.99 for all variables (RSC, MH, SAR, PI, SSP, and KI) in both training and testing. The MLR models showed better results in scenario 2 compared to the ANN and LSTM models. | Observations were collected from wells within the basin area, utilizing 140 water samples for this model. | To develop accurate and reliable machine learning models for predicting irrigation water quality parameters, which can help plan irrigation water and crop management more effectively [60]. |
ANN | MAE, RMSE, R2. On average, across the various experiments, R2 was 0.83, MAE was 0.22, and RMSE was 0.39. | Data collected from a high-resolution sensor with spatial, temporal, and spectral (1 nm) capabilities that enables continuous observation and effective long-term monitoring of inland water quality. | Introduces proximal remote sensing for inland water quality monitoring, presents a high-resolution hyperspectral imager, and demonstrates the effectiveness of machine learning algorithms in accurately estimating key water quality parameters such as nitrogen, phosphorus, and chemical oxygen demand using this new technology [61]. |
Pearson CC-SVM | The AUC of the PCC-SVM-based method was approximately 1. | Taking the dataset into consideration, the parameters included residual chlorine, pH, turbidity, temperature, conductivity, oxidation-reduction potential, and chemical oxygen demand. | To emphasize the increasing risk of cross-connections in potable-reclaimed water dual distribution systems, and to highlight the need for reliable, cost-effective, and real-time online detection methods [62]. |
Remote sensing platforms | The article compares the advantages of various remote sensing platforms and inversion models while discussing hyperspectral monitoring applications for multiple water quality parameters. | Comparison of various remote sensing platforms, inversion models, and water quality parameters. | To provide an overview of the development and current applications of hyperspectral remote sensing in inland water quality detection. It compares the merits of various remote sensing platforms, inversion models, and the monitoring of specific water quality parameters [63]. |
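The regression rows above quote MAE, RMSE, R2, and NSE side by side. The sketch below shows one common way to compute them in plain Python. It assumes that R2 is the squared Pearson correlation between observed and predicted values, a convention often used in hydrology, while NSE is the Nash–Sutcliffe efficiency; the cited studies may define R2 differently, and the function name and sample values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, RMSE, NSE, and R2 (squared Pearson correlation) for paired series."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    ss_res = sum(e * e for e in errors)                    # squared prediction errors
    ss_obs = sum((o - mean_o) ** 2 for o in observed)      # spread of observations
    ss_pred = sum((p - mean_p) ** 2 for p in predicted)    # spread of predictions
    if ss_obs == 0 or ss_pred == 0:
        raise ValueError("series must not be constant")
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "NSE": 1.0 - ss_res / ss_obs,
        "R2": cov * cov / (ss_obs * ss_pred),
    }


# Made-up values: the prediction is perfectly correlated but biased by +0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(regression_metrics(observed, predicted))
```

In this toy case the correlation-based R2 equals 1 while NSE is 0.8, because NSE penalizes the constant bias. Under these definitions the two scores can legitimately differ within a single row.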
Techniques | Metrics | Data | Application |
---|---|---|---|
Convolutional neural network (CNN), Support vector regression (SVR) | The results show that the CNN model outperforms the SVR model. This is demonstrated by the higher AUC values for both the training (0.844) and testing (0.843) datasets for CNN, compared to the AUC values of 0.75 for both training and testing datasets for the SVR model | The study involves creating 140 groundwater datasets in South Korea, dividing them into calibration and testing groups, and using 15 groundwater conditioning factors for model training. | To develop groundwater potential maps using machine learning algorithms (specifically SVR and CNN) to aid in the conservation and management of groundwater resources [69]. |
Minimax Probability Machine Regression (MPMR), Relevance Vector Machine (RVM), Gaussian Process Regression (GPR), Extreme Learning Machine (ELM) | The results showed that the MPMR model performed the best among the four models, with the following metrics: R2 = 0.984, MAE = 0.035, RMSE = 0.044, Nash–Sutcliffe Efficiency (ENS) = 0.984, DRefined = 0.995, and Extreme Learning Machine (ELM) = 0.874. | Datasets of Lake Huron’s water levels. The data spans from 1918 to 2013, with the period from 1918 to 1993 used for the training phase, and the remaining data (from 1994 to 2013) used for testing. | To evaluate the performance of four advanced artificial intelligence models, MPMR, RVM, GPR, ELM, for forecasting lake level fluctuations in Lake Huron using historical datasets [70]. |
Artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS) | The best ANN and ANFIS models showed high performance with r > 0.95, Nash index > 0.95, and RMSE < 0.1. The optimal ANN model was t + 4, while the best ANFIS model was t + 6. | Daily rainfall and water level data from 1 January 2012 to 31 December 2019, collected from stations P68 and C13 and provided by Empresa Pública Metropolitana de Agua Potable y Saneamiento de Quito. | To develop and compare two machine learning models, ANN and ANFIS, to forecast the water level of the Salve Faccha reservoir, which supplies water to Quito, the capital of Ecuador [71]. |
GTB, XGBoost, SVM, DT, RF, AdaBoost, LightGBM, ANN | The GTB model had the lowest mean squared error and the highest R-squared and adjusted R-squared values in all case studies. Additionally, over 91% of the total samples had an error rate below 10% between the predicted and observed values. | The dataset used in this study consists of 3348 samples collected over a 21-year monitoring period from the Bac-Hung-Hai catchment, which is the largest irrigation and drainage area in Vietnam. | To explore the application of machine learning methods, specifically the GTB model, for estimating water levels without comprehensive knowledge of hydrological processes or complex irrigation system databases [72]. |
Extreme gradient boosting, Random Forest, Multilinear regression | The results of this study showed that the extreme gradient boosting model performed the best, with R2 = 0.998 and RMSE = 0.048 m, followed by Random Forest (R2 = 0.997, RMSE = 0.054 m) and multilinear regression (R2 = 0.970, RMSE = 0.221 m). | The dataset, from a Central Kalimantan peat dome, contains 2010–2012 groundwater level measurements, elevation, and precipitation data, as these factors significantly impact groundwater levels spatiotemporally. | To convey the importance of understanding groundwater levels in peatlands, particularly in Indonesia, which possesses the largest share of tropical peat carbon [73]. |
XGB, MLR, GWR, SVR, RF, IDW, OK, COK. | The results show that the XGB algorithm with the Tweedie loss function achieved the best performance with an R2 value of 1.00, and the lowest errors when compared to other machine learning and interpolation methods like MLR, GWR, SVR, RF, IDW, OK, and COK. | The study used datasets from various sources, including precipitation from China Meteorological Data Service Centre, topographical factors from SRTM, soil factors from CSCD, vegetation index from NASA earth data, auxiliary factors from the National basic geographic database, land cover data from ESA, lithology data from the Spatial Database of Digital Geologic Map of China, and coordinates from the projection coordinate system. | To present a framework using the XGB machine learning method for learning groundwater depth in unconfined aquifers in hilly terrain, where spatial interpolation methods often face errors [29]. |
Gaussian process regression | The most significant variables in predicting irrigation water demand were found to be irrigated cropped area, air temperature, and vapor pressure deficit. The Gaussian process regression model showed high accuracy, with an R2 higher than 0.97 and RMSE as low as 0.06 km3, even with different input variable combinations. | The study used datasets from California Natural Resources Agency and gridMET to predict annual irrigation water demand. They simplified 400+ commodities into 20 crop categories and multi-crop areas for modeling. | To develop machine learning models to predict California’s annual, county-level irrigation water demand using various input variables over an 18-year time span [74]. |
Hierarchical cluster analysis (HCA), artificial neural networks (ANNs) | The study discovered 30 clusters through HCA, with higher groundwater levels in the western part of the Ogallala Aquifer that decreased towards the east. The ANN models accurately predicted even for non-calibrated wells, and integrating HCA and ANN allowed for effective annual groundwater level forecasting for well sets. | The study is based on the time series of groundwater levels in 403 wells of the Ogallala Aquifer. | To present a study that employs HCA and ANNs to predict annual groundwater levels in 403 wells of the Ogallala Aquifer, which is critical for agricultural irrigation and public water supply [75]. |
LR, RF, SVM, XGBoost | The XGBoost model, with R2 = 0.86 and RMSE = 0.29, outperformed other models. The entire dataset of HIX had a strong association with Landsat reflectance. The HIX decreased from 2015 to 2020. | Two datasets were used in this research: Landsat 8 OLI product, which collected multispectral imagery of the Earth’s surface to derive HIX in lakes across China, and 1150 pairs of field samples to match Landsat surface reflectance data and select sensitive spectral variables for machine learning methods. | To develop a general model based on Landsat 8 OLI product embedded in Google Earth Engine (GEE) to derive the humification index (HIX) based on Excitation-Emission Matrices (EEMs) in lakes across China [76]. |
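Forecasting studies in the table above typically hold out the most recent years for testing, as in the Lake Huron study [70], which trains on 1918 to 1993 and tests on 1994 to 2013. A minimal Python sketch of such a chronological split follows. It assumes one record per year purely for illustration (the actual sampling interval is not stated here), uses placeholder values instead of real measurements, and the function name is hypothetical.

```python
def chronological_split(records, last_train_year):
    """Split (year, value) records so that every test year follows every training year."""
    ordered = sorted(records, key=lambda record: record[0])
    train = [record for record in ordered if record[0] <= last_train_year]
    test = [record for record in ordered if record[0] > last_train_year]
    return train, test


# Placeholder levels (0.0) stand in for real measurements, which are not reproduced here.
records = [(year, 0.0) for year in range(1918, 2014)]
train, test = chronological_split(records, last_train_year=1993)
print(len(train), len(test))  # 76 20
```

Splitting by time rather than at random keeps future observations out of the training set, so the reported test scores reflect genuine forecasting rather than interpolation.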
Techniques | Metrics | Data | Application |
---|---|---|---|
DBSCAN, CNN | Mean Per-Class Error = 0 | The data source for leakage detection in this paper is derived from hydraulic simulations. | To tackle the issue of water leakage in urban water supply networks, which affects water quality, hydraulics, and public health [78]. |
ANN | Distance leak error < 2.32 m for 95% of points. | Transient head traces at the valve after its closure | To introduce a novel methodology for identifying features in water pipelines, which, by accurately predicting the presence of junctions and leaks, aims to enhance the assessment and maintenance of water distribution systems [79]. |
SVM | Accuracy between 80 and 83% for the testing dataset. | Data collected through wireless sensor networks and transmitted over 4G. | To develop an efficient water pipeline monitoring system using wireless sensor networks and SVM-based leakage identification to conserve resources and minimize economic losses [80]. |
LDA, ANN | Accuracy 80%, for the testing dataset. | Pressure data | Introduce a data-driven approach using limited pressure measurements and machine learning classifiers to accurately localize leaks in water distribution networks, ultimately conserving water resources and reducing costs [81]. |
Time-frequency CNN | Accuracy 99%, for the testing dataset. | Real datasets from Chengdu city and synthesized datasets containing Gaussian white noise | To propose a leakage spectrogram and time-frequency convolutional neural network model for improved accuracy and stability in leak detection, and to compare its performance with other classification models under various signal-to-noise ratio conditions [82]. |
GBT | Accuracy, 99.8%, for the testing dataset. | Several months from multiple cities across North America. | To provide an overview of the challenges faced by water utilities in detecting and managing leaks in aging water infrastructure and to present various technologies and methods that have been developed to address this issue [83]. |
RF | Accuracy across different contexts, with a mean accuracy of 33.5%. | The dataset includes 24 features as inputs to each mode. The leak shapes were divided into five separate datasets based on their leak area. | To present a methodology for predicting leak shapes using vibration signals and introduce an innovative signal processing technique that combines machine learning methods, specifically Random Forest classifiers, with various signal features [84]. |
RF, DT, KNN, Naive Bayes | Accuracy, Precision, Recall, F1-score. Ten scenarios were simulated; the best algorithm was Naive Bayes, with averages of 96.27, 95.94, 95.78, and 93.80, respectively. | Vitens company dataset that describes the water distribution networks of Leeuwarden. The dataset includes data on flow, pressure, temperature, turbidity, conductivity, and acidity. | A real-time hybrid method that uses AI algorithms and hydraulic relations for detecting and locating leaks, as well as identifying the volume of lost material in Water Distribution Networks [85]. |
RF | Accuracy 95% | The data was collected using 18 pressure sensors and 3 flow sensors, with a noise level of 1.5% MPI | To propose a method for improving the accuracy of a classifier model for leak location in water distribution networks [86]. |
KNN | Accuracy 52% | The dataset is created through a two-step process involving hydraulic transient simulations of a water network. The dataset consists of pressure head data from all nodes in the network. | A novel method for identifying and locating leaking pipes in pressurized water distribution systems using transient modeling and the K-nearest neighbors (K-NN) algorithm [87]. |
RF, ANN, SVM | Accuracy 75% | Data from smart meters located alongside water distribution pipelines + Narrow-Band IoT | To introduce the concept of the Internet of Things in water management. It discusses the challenges and limitations of traditional methods of leak detection in pipes and highlights the potential of using a low-cost sensor network and machine learning algorithms to monitor and control water leaks more efficiently [88]. |
DT, KNN, RF, Adaboost | Accuracy: The most outstanding result was achieved by the Random Forest (RF) model, which demonstrated a 100% accuracy rate. | Used the radial sensing direction for signal collection with a sampling rate of 3000 samples/s in streaming mode. | To introduce the use of cost-effective MEMS-based accelerometers for leak detection and explain the methodology used, including experiments on real networks, data analysis, and the development of machine learning models [89]. |
SVM | Accuracy between 88 and 93% | Vibrations measurements using low-power accelerometers | To develop, test, validate, and demonstrate a machine-learning-based risk assessment method for early detection of leaks with high likelihood, their geolocation, and accuracy assessment in the water distribution system at the University of Lille’s SUNRISE demonstration site in France [90]. |
ANN | Accuracy 97% | Flow data set for pipe | Implementing a machine learning-based risk assessment method enables quick detection of highly probable leaks, precise geolocation, and accurate evaluation within the water distribution system [91]. |
ANN, RF, DT, Logistic Regression | Accuracy, Precision, Recall, F1-score. Close to 100%. | Flow and pressure data were determined using EPANET software | To present a study examining the capability of machine learning methods to localize leaks in water distribution systems, which is crucial due to the economic losses, infrastructure damage, and soil contamination caused by water leakage [92]. |
ANN, DT, SVM, KNN | Accuracy 100% in the best cases for both metal and non-metal pipes. | Features extracted from de-noised signals of sound | Leaks in water distribution networks [93]. |
Physics-Guided Neural Networks (PGNN), CNN | Accuracy, Recall, Precision. On average, 0.64, 0.65, and 0.62, respectively. | 8 satellite images and their derived parameters for water leak detection in canal systems. | The application of this approach lies in the domain of remote sensing and infrastructure maintenance, with a particular emphasis on automating the detection and evaluation of water leaks within extensive canal systems [94]. |
SVM, DT, KNN, CNN (SqueezeNet) | Accuracy. SqueezeNet performed best, at 95.15%. | Utilizing piezoelectric accelerometers to gather real network data across multiple cities in China. | Enhance leak detection efficiency, minimize water losses, mitigate structural damage, and bolster public safety by automating the leak detection process in water distribution systems [95]. |
MLP, CNN, SVM | Accuracy. MLP performed best, at 94.89%. | Comprises both leakage and non-leakage sounds, systematically gathered via a cloud information management system from confirmed underground leakages in urban areas. | Developing an AI-based system to address the challenges of Non-Revenue Water in densely populated cities [96]. |
Time-Frequency CNN (TFCNN), Frequency CNN (FCNN) | Accuracy, Precision, AUC. The TFCNN achieved the highest performance in terms of accuracy (97.99%), precision (95.51%), and area under the curve (AUC) (0.98). | Various methods, including ground penetrating radar, gas injection, hydrophones, vibro-acoustic noise loggers and correlators, infrared thermography, and in-line devices, are explored. | To effectively monitor and maintain potable Water Distribution Networks in order to ensure a continuous and uninterrupted water supply for customers [97]. |
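Most leak detection rows above report accuracy, and several add precision, recall, and F1-score. The sketch below shows how these four figures follow from a set of binary leak/no-leak predictions. The labels are made up for illustration, the function name is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this sketch, not one taken from the cited studies.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class (e.g., 'leak')."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # leaks correctly flagged
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed leaks
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 1 = leak, 0 = no leak.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
```

In this toy example accuracy is 0.75 while precision, recall, and F1 are each 2/3. When leaks are rare relative to normal operation, accuracy alone can therefore overstate how well leaks are detected.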
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
García, J.; Leiva-Araos, A.; Diaz-Saavedra, E.; Moraga, P.; Pinto, H.; Yepes, V. Relevance of Machine Learning Techniques in Water Infrastructure Integrity and Quality: A Review Powered by Natural Language Processing. Appl. Sci. 2023, 13, 12497. https://doi.org/10.3390/app132212497