Machine Learning Analysis of Hydrological and Hydrochemical Data from the Abelar Pilot Basin in Abegondo (Coruña, Spain)
Abstract
1. Introduction
2. Materials and Methods
2.1. Abelar Pilot Basin and Available Data
2.2. ML Methods
2.2.1. Data Preprocessing
2.2.2. Cluster Analysis
2.2.3. Time Series Gaussian Process
2.2.4. Model Inspection Methods
2.2.5. Model Validation
3. Results
3.1. CA Results
3.2. TS-GPR Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Name of Attribute | Description | Units |
---|---|---|
Date | Date of the observation | - |
P | Precipitation | mm |
R | Recharge | mm |
 | Surface runoff | mm |
 | Interflow/subsurface flow | mm |
 | Groundwater flow | mm |
 | Total flow | mm |
K | Potassium concentration | mg/L |
Na | Sodium concentration | mg/L |
Ca | Calcium concentration | mg/L |
Mg | Magnesium concentration | mg/L |
Fe | Iron concentration | µg/L |
Mn | Manganese concentration | µg/L |
Cu | Copper concentration | µg/L |
Zn | Zinc concentration | µg/L |
Al | Aluminium concentration | µg/L |
Vn | Vanadium concentration | µg/L |
Si | Silicon concentration | mg/L |
Cl | Chloride concentration | mg/L |
 | Sulphate concentration | mg/L |
 | Nitrate concentration | mg/L |
Statistic | Cluster 0 | Cluster 1 | Cluster 2 |
---|---|---|---|
count | 179 | 76 | 133 |
mean | 25.156 | 19.802 | 21.333 |
std | 2.860 | 2.037 | 3.296 |
min | 17.700 | 13.200 | 9.809 |
25% quantile | 23.114 | 18.674 | 19.099 |
50% quantile | 25.399 | 20.049 | 21.400 |
75% quantile | 27.500 | 21.124 | 23.779 |
max | 30.699 | 23.499 | 29.099 |
Statistic | Cluster 0 | Cluster 1 | Cluster 2 |
---|---|---|---|
count | 179 | 76 | 133 |
mean | 0.793 | 0.026 | 0.548 |
std | 0.093 | 0.045 | 0.116 |
min | 0.538 | 0 | 0.297 |
25% quantile | 0.730 | 0 | 0.479 |
50% quantile | 0.804 | 0 | 0.545 |
75% quantile | 0.871 | 0.026 | 0.599 |
max | 0.942 | 0.171 | 0.919 |
Statistic | Cluster 0 | Cluster 1 | Cluster 2 |
---|---|---|---|
count | 179 | 76 | 133 |
mean | 0.205 | 0.973 | 0.447 |
std | 0.093 | 0.045 | 0.120 |
min | 0.057 | 0.828 | 0.046 |
25% quantile | 0.127 | 0.973 | 0.398 |
50% quantile | 0.195 | 1 | 0.452 |
75% quantile | 0.269 | 1 | 0.515 |
max | 0.460 | 1 | 0.703 |
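
The per-cluster summaries in the tables above (count, mean, std, quartiles, min, max) follow a standard descriptive-statistics layout. A minimal sketch of how such a summary could be computed from a list of values and their cluster labels is given below. The function name `describe_by_cluster`, the use of the sample standard deviation, and linear interpolation for the quartiles are assumptions made for illustration, not details taken from the paper.

```python
import statistics
from collections import defaultdict


def describe_by_cluster(values, labels):
    """Summarise `values` separately for each cluster label.

    Hypothetical helper: assumes every cluster holds at least two values,
    uses the sample standard deviation, and interpolates quartiles
    linearly between order statistics (method="inclusive").
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    summary = {}
    for label in sorted(groups):
        vals = groups[label]
        q1, q2, q3 = statistics.quantiles(vals, n=4, method="inclusive")
        summary[label] = {
            "count": len(vals),
            "mean": statistics.fmean(vals),
            "std": statistics.stdev(vals),
            "min": min(vals),
            "25%": q1,
            "50%": q2,
            "75%": q3,
            "max": max(vals),
        }
    return summary
```

Calling `describe_by_cluster(concentrations, cluster_labels)` with two equal-length sequences returns one dictionary of statistics per cluster, which can then be laid out as rows of a table like the ones above.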
Metrics of Accuracy

Step | Input Variables | Case | R² | NRMSE | NMAE |
---|---|---|---|---|---|
1 | | Training | 0.07 | 0.19 | 0.15 |
1 | | Testing | 0.13 | 0.20 | 0.15 |
2 | R, , | Training | −0.01 | 0.19 | 0.15 |
2 | R, , | Testing | 0.22 | 0.19 | 0.15 |
3 | R, , with shifting | Training | 0.44 | 0.14 | 0.10 |
3 | R, , with shifting | Testing | 0.45 | 0.16 | 0.11 |
4 | R, , with shifting and K, Ca, Mg, Cl | Training | 0.82 | 0.08 | 0.05 |
4 | R, , with shifting and K, Ca, Mg, Cl | Testing | 0.80 | 0.10 | 0.06 |
Validation | R, , with shifting and K, Ca, Mg, Cl | Validation | 0.85 | 0.15 | 0.12 |
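
The three accuracy metrics reported above can be sketched as follows. This is an illustrative implementation under stated assumptions: the surviving text does not give the formulas, so normalising RMSE and MAE by the range of the observed values is an assumption (normalising by the mean or the standard deviation are common alternatives), and the function name `accuracy_metrics` is hypothetical.

```python
import math


def accuracy_metrics(observed, predicted):
    """Return R2, NRMSE and NMAE for paired observed/predicted values.

    Assumption: NRMSE and NMAE are normalised by the observed range
    (max - min); the paper's exact normalisation is not known here.
    """
    n = len(observed)
    mean_obs = sum(observed) / n

    # Residual and total sums of squares for the coefficient of determination.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)

    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    span = max(observed) - min(observed)

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "nrmse": rmse / span,
        "nmae": mae / span,
    }
```

With this definition R² is not bounded below by zero: a model that fits worse than the mean of the observations yields a negative value, which is consistent with a slightly negative training score such as the one reported for step 2.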
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Samper-Pilar, J.; Samper-Calvete, J.; Mon, A.; Pisani, B.; Paz-González, A. Machine Learning Analysis of Hydrological and Hydrochemical Data from the Abelar Pilot Basin in Abegondo (Coruña, Spain). Hydrology 2025, 12, 49. https://doi.org/10.3390/hydrology12030049