Machine Learning-Based Prediction and Interpretability Analysis of Chlorophyll-a and Algal Density Using High-Frequency Water Quality Data
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area and Data Acquisition
2.2. Physicochemical Indicators and Feature Description
| Indicator | Units | Mean | Standard Deviation | Min | 25th Percentile | Median (50th Percentile) | 75th Percentile |
|---|---|---|---|---|---|---|---|
| Water Temperature | °C | 21.97 | 6.90 | 11.60 | 15.20 | 22.90 | 27.98 |
| pH | – | 8.01 | 0.49 | 5.82 | 7.62 | 7.97 | 8.33 |
| Dissolved Oxygen | mg/L | 9.49 | 4.22 | 0.01 | 6.23 | 8.65 | 12.42 |
| Conductivity | μS/cm | 898.81 | 196.68 | 459.23 | 694.20 | 962.05 | 1065.50 |
| Turbidity | NTU | 23.04 | 35.94 | 0.10 | 7.10 | 12.75 | 24.89 |
| COD (Permanganate Index) | mg/L | 4.67 | 0.82 | 3.10 | 4.05 | 4.51 | 5.14 |
| Ammonia Nitrogen | mg/L | 0.092 | 0.104 | 0.001 | 0.027 | 0.059 | 0.115 |
| Total Phosphorus | mg/L | 0.12 | 0.04 | 0.03 | 0.09 | 0.11 | 0.15 |
| COD (Dichromate Index) | mg/L | 21.44 | 8.12 | 11.85 | 16.27 | 18.67 | 24.77 |
| Chlorophyll a | mg/L | 0.0609 | 0.0783 | 0.0010 | 0.0090 | 0.0247 | 0.0820 |
| Algal Density | Cells/L | 80,568.28 | 114,369.10 | 20,617.00 | 25,878.00 | 41,218.25 | 63,409.00 |
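The summary columns in the table above (mean, standard deviation, minimum, and the three quantile columns) can be reproduced for any single indicator with standard-library Python. The sketch below is illustrative only: the `describe` helper and the toy temperature readings are hypothetical, and the inclusive quantile method is an assumption, since the interpolation rule behind the table is not stated.

```python
import statistics


def describe(values):
    """Return mean, sample standard deviation, minimum, and quartiles."""
    # method="inclusive" treats the data as the full population range,
    # which matches plain linear interpolation between sorted points.
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "p25": q25,
        "p50": q50,
        "p75": q75,
    }


# Hypothetical readings in °C, not taken from the monitoring record.
toy_temperature = [12.0, 15.0, 21.0, 24.0, 28.0]
summary = describe(toy_temperature)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Whether the reported standard deviations use the sample (n − 1) or population (n) form is not stated; with a high-frequency record the two would be nearly identical.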
2.3. Data Preprocessing and Quality Control
2.4. Machine Learning Model Development and Optimization
2.5. Model Interpretability and Feature Attribution Framework
3. Results
3.1. Model Performance and Predictive Accuracy
3.2. Identification of Key Driving Factors
3.3. Feature Dependency in High-Concentration Algal Events (Top 25% Quantile)
3.4. Seasonal Dynamics of Environmental Drivers for Algal Density Prediction
4. Discussion
4.1. Different Mechanisms of Physical and Chemical Drivers on Chl-a and Algal Density
4.2. Seasonal Succession of Limiting Factors: From Thermal Triggers to Nutrient Constraints
4.3. Implications for Water Environment Management and Early Warning Systems
4.4. Generalizability and Context Dependence of the DO–COD Dualistic Control Framework
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References

| Index | Target | MSE | R² |
|---|---|---|---|
| 1 | Chlorophyll a (Train) | 0.000332668 (mg²/L²) | 0.947 |
| 2 | Chlorophyll a (Test) | 0.00042915 (mg²/L²) | 0.926 |
| 3 | Algal Density (Train) | 1,038,403,219 (cells²/L²) | 0.921 |
| 4 | Algal Density (Test) | 1,238,666,970 (cells²/L²) | 0.903 |
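Because MSE is expressed in squared units, it is often easier to read the table above after taking the square root, which gives a typical error in the target's own units. The sketch below is a minimal illustration, not the authors' evaluation code: `mse` and `r2` are hypothetical helper names using the standard definitions, the toy arrays are invented, and only the two test-set MSE values are copied from the table.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between observations and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Test-set MSE values as reported in the table above.
reported_test_mse = {
    "chlorophyll_a": 0.00042915,
    "algal_density": 1_238_666_970,
}
# RMSE has the same units as the target variable itself.
reported_test_rmse = {k: math.sqrt(v) for k, v in reported_test_mse.items()}

# Hypothetical toy data, used only to exercise the two helpers.
toy_true = [1.0, 2.0, 3.0, 4.0]
toy_pred = [1.0, 2.0, 3.0, 5.0]
toy_mse = mse(toy_true, toy_pred)
toy_r2 = r2(toy_true, toy_pred)

for target, value in reported_test_rmse.items():
    print(f"{target}: RMSE = {value:.6g}")
```

Read this way, the algal density test error is on the order of 3.5 × 10⁴ cells/L, which can be compared directly with the spread of the observations.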

| Indicator | Spring | Summer | Autumn | Winter |
|---|---|---|---|---|
| Turbidity | 0.0021 | 0.018 | 0.00010 | 0.0023 |
| Ammonia Nitrogen | 0.0037 | 0.0056 | 0.00010 | 0.0023 |
| COD (Dichromate Index) | 0.032 | 0.0031 | 0.00010 | 0.0018 |
| COD (Permanganate Index) | 0.0011 | 0.0019 | 0.00020 | 0.00050 |
| Conductivity | 0.0029 | 0.0049 | 0.0050 | 0.0037 |
| Dissolved Oxygen | 0.0068 | 0.0051 | 0.00010 | 0.032 |
| Water Temperature | 0.079 | 0.0031 | 0.0031 | 0.00080 |
| Total Phosphorus | 0.010 | 0.00030 | 0.0011 | 0.0060 |
| pH | 0.0082 | 0.0033 | 0.00020 | 0.00050 |

| Indicator | Spring | Summer | Autumn | Winter |
|---|---|---|---|---|
| Turbidity | 3941 | 9235 | 770 | 86 |
| Ammonia Nitrogen | 1055 | 1532 | 2051 | 146 |
| COD (Dichromate Index) | 63,332 | 29,959 | 12,565 | 664 |
| COD (Permanganate Index) | 3644 | 56,227 | 12,075 | 315 |
| Conductivity | 1309 | 8738 | 3854 | 51 |
| Dissolved Oxygen | 2460 | 7800 | 793 | 534 |
| Water Temperature | 21,913 | 4904 | 12,827 | 2738 |
| Total Phosphorus | 8214 | 2052 | 4075 | 60 |
| pH | 1358 | 8369 | 1328 | 101 |
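One simple way to read a seasonal table like the one above is to pick, for each season, the indicator with the largest score. The sketch below does this for the values copied from that table; the `dominant_by_season` helper is a hypothetical name, and the sketch makes no assumption about which attribution measure the scores represent beyond "larger means more influential".

```python
SEASONS = ("Spring", "Summer", "Autumn", "Winter")

# Scores copied row by row from the table above (same season order).
scores = {
    "Turbidity": (3941, 9235, 770, 86),
    "Ammonia Nitrogen": (1055, 1532, 2051, 146),
    "COD (Dichromate Index)": (63332, 29959, 12565, 664),
    "COD (Permanganate Index)": (3644, 56227, 12075, 315),
    "Conductivity": (1309, 8738, 3854, 51),
    "Dissolved Oxygen": (2460, 7800, 793, 534),
    "Water Temperature": (21913, 4904, 12827, 2738),
    "Total Phosphorus": (8214, 2052, 4075, 60),
    "pH": (1358, 8369, 1328, 101),
}


def dominant_by_season(table):
    """Return the highest-scoring indicator for each season."""
    result = {}
    for i, season in enumerate(SEASONS):
        # max() over indicator names, keyed on that season's column.
        result[season] = max(table, key=lambda name: table[name][i])
    return result


dominant = dominant_by_season(scores)
for season, indicator in dominant.items():
    print(f"{season}: {indicator}")
```

The autumn column is a close call among three indicators, so a ranking of the top few scores per season may be more informative than a single winner.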
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wang, W.; Hu, X.; Meng, H.; Liu, C.; Wang, Y.; Jiao, T.; Chang, Q.; Lai, B. Machine Learning-Based Prediction and Interpretability Analysis of Chlorophyll-a and Algal Density Using High-Frequency Water Quality Data. Diversity 2026, 18, 282. https://doi.org/10.3390/d18050282

