Machine-Learned Emulators for Teleconnection Discovery and Uncertainty Quantification in Coupled Human–Natural Systems
Abstract
1. Introduction

2. Materials and Methods
2.1. Research Site
2.2. Estimation Procedures for Emulators
3. Results
3.1. Machine-Learned Emulators
3.2. Uncertainty Quantification and Teleconnections
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| A2EM | Advanced aquatic ecosystem model |
| AI | Artificial Intelligence |
| CHANS | Coupled human and natural system |
| ChlA | Chlorophyll A |
| CMIP5 | Coupled Model Intercomparison Project Phase 5 |
| CRF | Classifier random forest |
| CLGBM | Classifier Light Gradient Boosting Machine |
| CXGB | Classifier Extreme Gradient Boost |
| GCC | Global climate change |
| GCM | Global climate model |
| HABs | Harmful algal blooms |
| HPC | High performance computing |
| IQR | Interquartile Range |
| LULCC | Land use land cover change |
| ML | Machine Learning |
| N | Nitrogen |
| P | Phosphorus |
| RCA | Row column advanced ecological systems modeling program |
| RCP | Representative concentration pathways |
| RHESSys | Regional hydro-ecologic simulation system |
| RRF | Regression random forest |
| RLGBM | Regression Light Gradient Boosting Machine |
| RXGB | Regression Extreme Gradient Boost |
Appendix A
| Variables | Units | Description | Obs | Mean | St. Dev. | Min | Max |
|---|---|---|---|---|---|---|---|
| RCA Outputs | |||||||
| JPO4 | mg/(m²·day) | Phosphate flux to water column | 3,807,376 | 3.7909 | 7.8466 | −2.6164 | 447.8828 |
| JPON | mg/(m²·day) | Particulate organic nitrogen water–sediment flux | 3,807,376 | 7.2544 | 2.8408 | 0.6824 | 23.419 |
| JPOP | mg/(m²·day) | Particulate organic phosphorus water–sediment flux | 3,807,376 | 1.5116 | 1.1544 | 0.0328 | 44.6003 |
| PO4JRES | g/day | PO4 resuspension flux | 3,807,376 | 0 | 0 | 0 | 0 |
| PO4T2R | mg/kg | Total orthophosphate, layer 2 (anaerobic) | 3,807,376 | 168,874.2 | 41,152.39 | 32,501.23 | 409,603.7 |
| POPR | mg/g | Total particulate organic phosphorus | 3,807,376 | 0.1303 | 0.01 | 0.0794 | 0.1548 |
| RHESSys Outputs | |||||||
| streamflow | m³/s | Total Stream Outflow at Missisquoi River at Swanton (USGS Gauged) | 3,807,376 | 56.421 | 68.3438 | 0.0009 | 1594.72 |
| baseflow | m³/s | Baseflow at Missisquoi River at Swanton (USGS Gauged) | 3,807,376 | 1.5579 | 0.7102 | 0.2391 | 8.2501 |
| %sat_area | m²/m² | Percent Saturated Area over entire watershed | 3,807,376 | 1.2225 | 0.3763 | 0.1078 | 1.7651 |
| evap | mm | Evaporation over entire watershed | 3,807,376 | 1.0902 | 0.8628 | 0.00007 | 4.9778 |
| sat_def | mm of water | Saturation Deficit (volume) over entire watershed | 3,807,376 | 198.6768 | 163.8577 | 36.2512 | 829.4112 |
| snowpack | mm | Snow Water Equivalent (SWE) over entire watershed | 3,807,376 | 4.9031 | 18.5799 | 0 | 274.5981 |
| Derived Hydrology | |||||||
| missisquoi_Q | m³/s | Volume of water entering Missisquoi Bay from the Missisquoi River | 3,807,376 | 68.2336 | 72.6332 | 0.001 | 1627.26 |
| missisquoi_TP | mg/L | Amount of total phosphorus entering Missisquoi Bay from the Missisquoi River | 3,807,376 | 0.04194 | 0.0472 | 0 | 3.215 |
| missisquoi_TN | mg/L | Amount of total nitrogen entering Missisquoi Bay from the Missisquoi River | 3,807,376 | 0.6435 | 0.0652 | 0.5822 | 2.0444 |
| pike_Q | m³/s | Volume of water entering Missisquoi Bay from the Pike River | 3,807,376 | 15.5469 | 16.5493 | 0.0002 | 370.7702 |
| pike_TP | mg/L | Amount of total phosphorus entering Missisquoi Bay from the Pike River | 3,807,376 | 0.0419 | 0.0472 | 0 | 3.215 |
| pike_TN | mg/L | Amount of total nitrogen entering Missisquoi Bay from the Pike River | 3,807,376 | 0.6435 | 0.0652 | 0.5822 | 2.0444 |
| rock_Q | m³/s | Volume of water entering Missisquoi Bay from the Rock River | 3,807,376 | 2.5911 | 2.7582 | 0.00003 | 61.795 |
| rock_TP | mg/L | Amount of total phosphorus entering Missisquoi Bay from the Rock River | 3,807,376 | 0.0419 | 0.0472 | 0 | 3.215 |
| rock_TN | mg/L | Amount of total nitrogen entering Missisquoi Bay from the Rock River | 3,807,376 | 0.6435 | 0.0652 | 0.5822 | 2.0444 |
| GCMs | |||||||
| canesm2.1 | True/False | GCM used | 3,807,376 | 0.0632 | 0.2434 | 0 | 1 |
| ccsm4.1 | True/False | GCM used | 3,807,376 | 0.2439 | 0.4294 | 0 | 1 |
| gfdl-esm2m.1 | True/False | GCM used | 3,807,376 | 0.0843 | 0.2778 | 0 | 1 |
| inmcm4.1 | True/False | GCM used | 3,807,376 | 0.0421 | 0.2009 | 0 | 1 |
| ipsl-cm5a-mr.1 | True/False | GCM used | 3,807,376 | 0.0361 | 0.1866 | 0 | 1 |
| miroc-esm-chem.1 | True/False | GCM used | 3,807,376 | 0.0361 | 0.1866 | 0 | 1 |
| miroc-esm.1 | True/False | GCM used | 3,807,376 | 0.0843 | 0.2778 | 0 | 1 |
| mri-cgcm3.1 | True/False | GCM used | 3,807,376 | 0.2469 | 0.4312 | 0 | 1 |
| noresm1-m.1 | True/False | GCM used | 3,807,376 | 0.1626 | 0.369 | 0 | 1 |
| RCP Used in Each Scenario | |||||||
| rcp26 | True/False | RCP used | 3,807,376 | 0.1174 | 0.3219 | 0 | 1 |
| rcp45 | True/False | RCP used | 3,807,376 | 0.3222 | 0.4673 | 0 | 1 |
| rcp60 | True/False | RCP used | 3,807,376 | 0.2379 | 0.4258 | 0 | 1 |
| rcp85 | True/False | RCP used | 3,807,376 | 0.3222 | 0.4673 | 0 | 1 |
| Phosphorus Reduction Scenario | |||||||
| 0_redux | True/False | 0% reduction in phosphorus input to Missisquoi Bay | 3,807,376 | 0.3313 | 0.4706 | 0 | 1 |
| 20_redux | True/False | 20% reduction in phosphorus input to Missisquoi Bay | 3,807,376 | 0.0873 | 0.2823 | 0 | 1 |
| 40_redux | True/False | 40% reduction in phosphorus input to Missisquoi Bay | 3,807,376 | 0.0873 | 0.2823 | 0 | 1 |
| 60_redux | True/False | 60% reduction in phosphorus input to Missisquoi Bay | 3,807,376 | 0.0512 | 0.2204 | 0 | 1 |
| 64_redux | True/False | 64.3% reduction in phosphorus input to Missisquoi Bay. EPA Total Maximum Daily Load (TMDL) policy scenario. | 3,807,376 | 0.0873 | 0.2823 | 0 | 1 |
| 80_redux | True/False | 80% reduction in phosphorus input to Missisquoi Bay | 3,807,376 | 0.0873 | 0.2823 | 0 | 1 |
| 100_redux | True/False | 100% reduction in phosphorus input to Missisquoi Bay | 3,807,376 | 0.0873 | 0.2823 | 0 | 1 |
| S0_redux | True/False | Baseline P reduction scenario: assumes 0% P reduction until 2021, and then scales up to 96% reduction by 2051. | 3,807,376 | 0.0361 | 0.1866 | 0 | 1 |
| S1_redux | True/False | Early action P reduction scenario: assumes 32.15% phosphorus reduction in 2001 and achieves 96% reduction by 2031. | 3,807,376 | 0.0361 | 0.1866 | 0 | 1 |
| S2_redux | True/False | Early onset P loading scenario: assumes a 32.15% increase in phosphorus load in 2011 and scales to a 128% increase by 2051. | 3,807,376 | 0.0361 | 0.1866 | 0 | 1 |
| S3_redux | True/False | Mild early action P reduction scenario: assumes 32.15% P reduction in 2011, and then scales up to 96% reduction by 2041. | 3,807,376 | 0.0361 | 0.1866 | 0 | 1 |
| Other Input Variables | |||||||
| RandomNum | None | A random number between 0 and 1 | 3,807,376 | 0.4998 | 0.2885 | 4.94 × 10⁻⁷ | 0.9999 |
| Target Variable | |||||||
| CHLAClass | None | 0 for ChlA < 2.6; 1 for 2.6 ≤ ChlA < 10; 2 for 10 ≤ ChlA < 40; 3 for ChlA > 40 | 3,807,376 | 1.600835 | 0.71751 | 0 | 3 |
| ChlAAVESurfaceMean | Micrograms/L | Geographic average ChlA mean for surface layer of the bay | 3,807,376 | 18.3481 | 19.4694 | 0.5687 | 283.7196 |
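
The CHLAClass thresholds listed in the table above can be written as a small binning function. This is a minimal sketch rather than the authors' code: the function name `chla_class` is hypothetical, and assigning a value of exactly 40 to class 3 is an assumption, because the table specifies only "< 40" and "> 40".

```python
def chla_class(chla_ug_per_l: float) -> int:
    """Map a ChlA concentration (micrograms/L) to the CHLAClass codes 0-3."""
    if chla_ug_per_l < 0:
        raise ValueError("ChlA concentration cannot be negative")
    if chla_ug_per_l < 2.6:
        return 0
    if chla_ug_per_l < 10:
        return 1
    if chla_ug_per_l < 40:
        return 2
    # Assumption: a value of exactly 40 is placed in the top class,
    # since the table leaves that boundary unspecified.
    return 3
```

Under these thresholds, the tabulated mean of ChlAAVESurfaceMean (18.3481) falls in class 2, while the tabulated minimum (0.5687) and maximum (283.7196) fall in classes 0 and 3.
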
Appendix B




















| Model | MAE | RMSE | MAPE | R² |
|---|---|---|---|---|
| Random Forest | 2.12 | 4.91 | 13.59 | 0.93 |
| LightGBM | 2.27 | 4.06 | 16.73 | 0.95 |
| XGBoost | 1.72 | 3.49 | 12.45 | 0.96 |
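
For reference, the four metrics reported in the table above have standard definitions that can be computed as follows. This is a sketch using only the Python standard library, not the authors' evaluation code; the function name `regression_metrics` and the toy numbers in the usage note are illustrative assumptions, and expressing MAPE in percent is likewise assumed.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (percent) and R² for two equal-length lists."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE divides by the observed values, so it is undefined if any is zero.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}
```

Calling `regression_metrics([10, 20, 30, 40], [12, 18, 33, 39])` on these made-up values gives an MAE of 2.0 and an R² of 0.964. Because RMSE squares the errors before averaging, it penalizes large misses more heavily than MAE does, which is why two models can rank differently on the two metrics.
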

| Model | Precision (%) | Recall (%) | F1 Score (%) |
|---|---|---|---|
| Random Forest | 93.12 | 93.12 | 93.06 |
| XGBoost | 95.65 | 95.66 | 95.66 |
| LightGBM | 96.15 | 96.16 | 96.15 |

| Predicted Class | Precision | Recall | F1-Score |
|---|---|---|---|
| Classifier Random Forest Model | | | |
| Oligotrophic | 0.9570 | 0.82 | 0.8859 |
| Mesotrophic | 0.9567 | 0.9681 | 0.9623 |
| Eutrophic | 0.9019 | 0.9286 | 0.9151 |
| Hypereutrophic | 0.9138 | 0.8037 | 0.8552 |
| Classifier Light GBM | | | |
| Oligotrophic | 0.9437 | 0.8970 | 0.9198 |
| Mesotrophic | 0.9785 | 0.9806 | 0.9796 |
| Eutrophic | 0.9485 | 0.9547 | 0.9516 |
| Hypereutrophic | 0.9345 | 0.9138 | 0.9240 |
| Classifier XGBoost | | | |
| Oligotrophic | 0.9546 | 0.8937 | 0.9231 |
| Mesotrophic | 0.9756 | 0.9781 | 0.9769 |
| Eutrophic | 0.9421 | 0.9479 | 0.9450 |
| Hypereutrophic | 0.9235 | 0.9043 | 0.9138 |
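
The F1-Score column in the table above is the harmonic mean of the precision and recall columns. The sketch below shows that relationship; the function name `f1_score` is chosen here for illustration and is not taken from the source.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Mesotrophic row of the Random Forest classifier in the table above
f1_mesotrophic_rf = f1_score(0.9567, 0.9681)
```

Because the inputs are the rounded table entries, the result matches the tabulated 0.9623 only to about three decimal places.
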

| No. | Random Forest feature | Importance | LightGBM feature | Importance | XGBoost feature | Importance |
|---|---|---|---|---|---|---|
| 1 | JPO4 | 0.287729 | JPO4 | 0.26522 | 0_redux | 0.10656 |
| 2 | POPR | 0.109358 | snowpack | 0.159382 | JPO4 | 0.077549 |
| 3 | JPON | 0.10622 | JPON | 0.105759 | S4_redux | 0.065863 |
| 4 | sat_def | 0.090995 | POPR | 0.092713 | snowpack | 0.055426 |
| 5 | snowpack | 0.086359 | sat_def | 0.082248 | gfdl-esm2m.1 | 0.044624 |
| 6 | PO4T2R | 0.085574 | PO4T2R | 0.068385 | miroc-esm.1 | 0.040734 |
| 7 | JPOP | 0.059167 | JPOP | 0.06173 | JPON | 0.035666 |
| 8 | baseflow | 0.022333 | evap | 0.022808 | sat_def | 0.033486 |
| 9 | %sat_area | 0.018513 | %sat_area | 0.021349 | POPR | 0.032793 |
| 10 | 0_redux | 0.017991 | baseflow | 0.021025 | mri-cgcm3.1 | 0.031535 |
| 11 | evap | 0.016577 | 0_redux | 0.013934 | noresm1-m.1 | 0.030706 |
| 12 | mri-cgcm3.1 | 0.007068 | pike_TP | 0.007113 | canesm2.1 | 0.029454 |
| 13 | noresm1-m.1 | 0.006373 | mri-cgcm3.1 | 0.006814 | S2_redux | 0.028656 |
| 14 | miroc-esm.1 | 0.006246 | noresm1-m.1 | 0.006297 | PO4T2R | 0.027717 |
| 15 | ccsm4.1 | 0.005241 | streamflow | 0.006265 | JPOP | 0.025247 |
| 16 | gfdl-esm2m.1 | 0.005103 | missisquoi_TP | 0.005444 | ccsm4.1 | 0.0238 |
| 17 | pike_TP | 0.004589 | miroc-esm.1 | 0.005205 | basin_TN_gps | 0.022494 |
| 18 | missisquoi_TP | 0.004564 | ccsm4.1 | 0.004781 | pike_TN | 0.022414 |
| 19 | rock_TP | 0.004527 | trans | 0.004062 | 20_redux | 0.017224 |
| 20 | streamflow | 0.004387 | rcp85 | 0.003764 | rcp45 | 0.015557 |
| 21 | RandomNum | 0.004144 | rock_TP | 0.003586 | rcp85 | 0.014869 |
| 22 | rcp85 | 0.00411 | S4_redux | 0.003427 | missisquoi_TP | 0.013824 |
| 23 | rcp60 | 0.00385 | rcp45 | 0.003059 | pike_TP | 0.013381 |
| 24 | trans | 0.003554 | rcp60 | 0.002743 | rock_TP | 0.012564 |
| 25 | basin_TP_gps | 0.003306 | missisquoi_Q | 0.002709 | rcp60 | 0.01245 |
| 26 | rcp45 | 0.003131 | gfdl-esm2m.1 | 0.00232 | 60_redux | 0.011888 |
| 27 | S4_redux | 0.002717 | pike_TN | 0.001869 | 100_redux | 0.010429 |
| 28 | canesm2.1 | 0.002629 | rock_TN | 0.001729 | %sat_area | 0.010406 |
| 29 | S2_redux | 0.002148 | rock_Q | 0.001712 | missisquoi_Q | 0.010406 |
| 30 | pike_TN | 0.001976 | pike_Q | 0.001681 | S1_redux | 0.009953 |
| 31 | missisquoi_TN | 0.00196 | canesm2.1 | 0.001596 | rcp26 | 0.009949 |
| 32 | missisquoi_Q | 0.001931 | basin_TP_gps | 0.001407 | baseflow | 0.009673 |
| 33 | basin_TN_gps | 0.001913 | basin_TN_gps | 0.001385 | evap | 0.008914 |
| 34 | rock_TN | 0.00189 | S2_redux | 0.001307 | rock_TN | 0.008367 |
| 35 | pike_Q | 0.001856 | rcp26 | 0.001035 | S0_redux | 0.008351 |
| 36 | rock_Q | 0.001845 | missisquoi_TN | 0.001001 | pike_Q | 0.007178 |
| 37 | 20_redux | 0.001678 | S1_redux | 0.000646 | missisquoi_TN | 0.007067 |
| 38 | rcp26 | 0.001372 | RandomNum | 0.000614 | 40_redux | 0.006701 |
| 39 | S1_redux | 0.001217 | inmcm4.1 | 0.000366 | 80_redux | 0.006283 |
| 40 | 40_redux | 0.000733 | 20_redux | 0.000277 | streamflow | 0.005903 |
| 41 | S0_redux | 0.000676 | miroc-esm-chem.1 | 0.000221 | trans | 0.005435 |
| 42 | S3_redux | 0.000535 | 100_redux | 0.000177 | inmcm4.1 | 0.005432 |
| 43 | 60_redux | 0.000406 | 40_redux | 0.00016 | S3_redux | 0.00479 |
| 44 | 64_redux | 0.00037 | S3_redux | 0.00016 | basin_TP_gps | 0.004218 |
| 45 | inmcm4.1 | 0.000323 | ipsl-cm5a-mr.1 | 0.000128 | 64_redux | 0.003713 |
| 46 | 100_redux | 0.000316 | S0_redux | 0.000113 | miroc-esm-chem.1 | 0.003645 |
| 47 | miroc-esm-chem.1 | 0.000194 | 60_redux | 0.0001 | ipsl-cm5a-mr.1 | 0.003206 |
| 48 | ipsl-cm5a-mr.1 | 0.000161 | 64_redux | 0.000094 | rock_Q | 0.002245 |
| 49 | 80_redux | 0.000144 | 80_redux | 0.000078 | RandomNum | 0.001254 |
| 50 | PO4JRES | 0 | PO4JRES | 0 | PO4JRES | 0 |
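
The table above ranks RandomNum, which Appendix A describes as a random number between 0 and 1, alongside the real predictors. One way to read such a column is as a noise baseline: features scoring below it carry no more signal than chance. Whether the authors used it this way is an assumption here, and the helper name `features_above_baseline` is hypothetical. The values are a subset of the Random Forest column.

```python
def features_above_baseline(importances, baseline="RandomNum"):
    """Return names of features scoring above the baseline, highest first."""
    if baseline not in importances:
        raise KeyError(f"baseline feature {baseline!r} not found")
    threshold = importances[baseline]
    kept = [(name, score) for name, score in importances.items()
            if name != baseline and score > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]

# Subset of the Random Forest column from the table above
rf_importance = {
    "JPO4": 0.287729,
    "POPR": 0.109358,
    "snowpack": 0.086359,
    "streamflow": 0.004387,
    "RandomNum": 0.004144,
    "rcp85": 0.00411,
    "PO4JRES": 0.0,
}
selected = features_above_baseline(rf_importance)
```

With this subset, `selected` keeps JPO4, POPR, snowpack, and streamflow, and drops rcp85 and PO4JRES. Impurity-based importances can be diluted across correlated or one-hot encoded features, so a cut of this kind is a screening aid rather than a test of relevance.
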

| No. | Random Forest feature | Importance | LightGBM feature | Importance | XGBoost feature | Importance |
|---|---|---|---|---|---|---|
| 1 | JPO4 | 0.174135 | sat_def | 0.451271 | 0_redux | 0.162563 |
| 2 | snowpack | 0.088807 | PO4T2R | 0.117711 | snowpack | 0.069072 |
| 3 | JPON | 0.087086 | JPON | 0.095367 | JPO4 | 0.056315 |
| 4 | POPR | 0.072338 | snowpack | 0.054473 | gfdl-esm2m.1 | 0.050352 |
| 5 | JPOP | 0.061633 | RandomNum | 0.052135 | canesm2.1 | 0.040594 |
| 6 | evap | 0.059516 | evap | 0.047379 | S4_redux | 0.03954 |
| 7 | sat_def | 0.057572 | JPO4 | 0.037579 | S2_redux | 0.037964 |
| 8 | PO4T2R | 0.043682 | POPR | 0.024385 | JPON | 0.031313 |
| 9 | %sat_area | 0.042902 | JPOP | 0.020665 | POPR | 0.028901 |
| 10 | baseflow | 0.034019 | baseflow | 0.015573 | 60_redux | 0.026905 |
| 11 | pike_TP | 0.018696 | streamflow | 0.015533 | miroc-esm.1 | 0.021996 |
| 12 | rock_TP | 0.018348 | %sat_area | 0.01309 | JPOP | 0.018945 |
| 13 | missisquoi_TP | 0.018347 | pike_TP | 0.012963 | rock_TP | 0.018756 |
| 14 | streamflow | 0.016547 | basin_TP_gps | 0.006362 | mri-cgcm3.1 | 0.018128 |
| 15 | missisquoi_TN | 0.016017 | missisquoi_TP | 0.005235 | inmcm4.1 | 0.017671 |
| 16 | rock_TN | 0.015806 | missisquoi_Q | 0.004666 | PO4T2R | 0.017074 |
| 17 | basin_TN_gps | 0.015592 | rock_TP | 0.003397 | sat_def | 0.01596 |
| 18 | pike_Q | 0.015444 | rock_TN | 0.002987 | ccsm4.1 | 0.015888 |
| 19 | basin_TP_gps | 0.01533 | S4_redux | 0.00249 | noresm1-m.1 | 0.015487 |
| 20 | rock_Q | 0.015297 | missisquoi_TN | 0.002086 | 20_redux | 0.015102 |
| 21 | pike_TN | 0.015199 | trans | 0.001732 | rock_TN | 0.014766 |
| 22 | missisquoi_Q | 0.015091 | 0_redux | 0.001224 | evap | 0.014276 |
| 23 | trans | 0.013491 | rcp85 | 0.00115 | pike_Q | 0.014007 |
| 24 | RandomNum | 0.01264 | rock_Q | 0.001026 | S1_redux | 0.01308 |
| 25 | 0_redux | 0.005702 | rcp45 | 0.000941 | missisquoi_TN | 0.012917 |
| 26 | noresm1-m.1 | 0.005023 | pike_TN | 0.000845 | 100_redux | 0.012649 |
| 27 | mri-cgcm3.1 | 0.005021 | gfdl-esm2m.1 | 0.000809 | missisquoi_TP | 0.012622 |
| 28 | ccsm4.1 | 0.00446 | pike_Q | 0.000806 | pike_TP | 0.012354 |
| 29 | miroc-esm.1 | 0.00426 | ccsm4.1 | 0.000762 | %sat_area | 0.012291 |
| 30 | rcp85 | 0.003901 | mri-cgcm3.1 | 0.000692 | baseflow | 0.0113 |
| 31 | gfdl-esm2m.1 | 0.003872 | basin_TN_gps | 0.000537 | rcp26 | 0.01106 |
| 32 | rcp45 | 0.003719 | noresm1-m.1 | 0.000531 | missisquoi_Q | 0.010336 |
| 33 | rcp60 | 0.003572 | rcp60 | 0.000531 | rcp60 | 0.010291 |
| 34 | canesm2.1 | 0.003333 | 64_redux | 0.000521 | rcp45 | 0.009828 |
| 35 | rcp26 | 0.002196 | miroc-esm.1 | 0.000468 | rcp85 | 0.009625 |
| 36 | S1_redux | 0.001335 | canesm2.1 | 0.000441 | trans | 0.008985 |
| 37 | inmcm4.1 | 0.001283 | 80_redux | 0.000438 | streamflow | 0.00877 |
| 38 | 60_redux | 0.001077 | rcp26 | 0.000242 | S0_redux | 0.008641 |
| 39 | S2_redux | 0.001037 | S2_redux | 0.000167 | pike_TN | 0.008479 |
| 40 | S0_redux | 0.000932 | inmcm4.1 | 0.000157 | rock_Q | 0.008107 |
| 41 | S3_redux | 0.000931 | miroc-esm-chem.1 | 0.000128 | S3_redux | 0.007825 |
| 42 | S4_redux | 0.000833 | 60_redux | 0.000094 | miroc-esm-chem.1 | 0.00776 |
| 43 | 64_redux | 0.000806 | 20_redux | 0.000089 | 40_redux | 0.007487 |
| 44 | 20_redux | 0.000637 | S1_redux | 0.000083 | basin_TP_gps | 0.006696 |
| 45 | 40_redux | 0.000564 | ipsl-cm5a-mr.1 | 0.000082 | 64_redux | 0.006549 |
| 46 | miroc-esm-chem.1 | 0.000529 | S3_redux | 0.000055 | basin_TN_gps | 0.006182 |
| 47 | 80_redux | 0.000487 | 40_redux | 0.000052 | ipsl-cm5a-mr.1 | 0.005993 |
| 48 | 100_redux | 0.000478 | S0_redux | 0.000028 | 80_redux | 0.005598 |
| 49 | ipsl-cm5a-mr.1 | 0.000477 | 100_redux | 0.000023 | RandomNum | 0.003002 |
| 50 | PO4JRES | 0 | PO4JRES | 0 | PO4JRES | 0 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zia, A.; Clemins, P.J.; Adil, M.; Schroth, A.; Rizzo, D.; Oikonomou, P.D.; Wshah, S. Machine-Learned Emulators for Teleconnection Discovery and Uncertainty Quantification in Coupled Human–Natural Systems. Water 2026, 18, 79. https://doi.org/10.3390/w18010079

