Machine Learning Models for Chlorophyll-a Forecasting in a Freshwater Lake: Case Study of Lake Taihu
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area
2.2. Data Preprocessing
2.2.1. Data Introduction
2.2.2. Data Preprocessing Steps
2.3. Machine Learning Model
2.3.1. Feature Analysis Methods
- (a) Pearson correlation coefficient (PCC) and principal component analysis (PCA)
- (b) Mutual information (MI) and Spearman rank correlation coefficients (SRCCs)
- (c) SHAP model
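Two of the feature-analysis measures named above, PCC and SRCC, can be illustrated with a minimal sketch. It assumes the standard textbook definitions (SRCC computed as the Pearson correlation of average ranks). The function names and the toy arrays are hypothetical and are not the study's data; MI, PCA and SHAP are not covered here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy, made-up values: a monotonic but non-linear relationship.
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [1.0, 4.0, 9.0, 16.0, 25.0]
print("PCC :", pearson(feature, target))
print("SRCC:", spearman(feature, target))
```

On a monotonic but non-linear relationship such as this one, SRCC reaches its maximum while PCC stays below 1, which is one reason to report both.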
2.3.2. ML Model Construction
- (a) Linear Regression
- (b) Decision Tree
- (c) Support Vector Regression
- (d) Multi-Layer Perceptrons
- (e) Random Forest
- (f) XGBoost
2.3.3. Evaluation Indicators
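The two indicators listed in the abbreviations, R2 (coefficient of determination) and RMSE (root mean square error), can be sketched as follows. This assumes the usual definitions, R2 = 1 − SSE/SST and RMSE = sqrt(SSE/n); the function names and sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observations and predictions."""
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sse / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_true = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - sse / sst


# Made-up observations and predictions, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 3.0, 3.0]
print("R2  :", r2(observed, predicted))
print("RMSE:", rmse(observed, predicted))
```

RMSE is expressed in the units of the target variable, while R2 is dimensionless, so the two are complementary when comparing models.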
3. Results
3.1. The Spatial and Temporal Distribution Features of Algal and Water Quality Datasets
3.1.1. The Spatio-Temporal Distribution of Chl-a Concentrations
3.1.2. Correlation Analysis of Algal Information and Chl-a
3.1.3. Analysis of Water Quality Information and Chl-a
3.2. Machine Learning Prediction of Chl-a Concentration
3.2.1. Model Accuracy
3.2.2. Feature Importance Explanation
Rank | Ref [38] | Ref [38] | Ref [43] | Ref [46] | Ref [47] | Ref [21] | Ref [45] | Ref [45] | Ref [8] | This Study |
---|---|---|---|---|---|---|---|---|---|---|
Site | Juam reservoir | Yeongsan reservoir | Nakdong river | Nakdong river | Lake Shinji | Lake Jordan | Imha reservoir | Lacustrine zone | Han river | Lake Taihu |
Model | SVM | SVM | RF | ANN | RF | Mechanistic model | SVM | SVM | RF | XGBoost |
1 | PO4–P | NH3–N | PO4–P | Wind velocity | NTU | Limiting nutrient | WT | WT | TOC | EC |
2 | NO3–N | NO3–N | DO | EC | CODMn | TN | TSS a | Prep b | TN | WT |
3 | Wind speed | Solar radiation | NH3–N | Alkalinity | SS c | TP | DO | BOD | pH | CODMn |
4. Discussion
4.1. Model Performance and Feature Analysis
4.2. Model Generalization Performance
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
ANN | Artificial neural network |
Chl-a | Chlorophyll-a |
CNN | Convolutional neural network |
CODMn | Permanganate index |
DD | Data-driven |
DL | Deep learning |
DO | Dissolved oxygen |
DT | Decision tree |
EC | Electrical conductivity |
Cyano-HABs | Cyanobacterial harmful algal blooms |
LR | Linear regression |
LSTM | Long short-term memory |
ML | Machine learning |
MLP | Multi-layer perceptron |
NH3-N | Ammonia nitrogen |
NTU | Nephelometric turbidity units |
PB | Process-based |
PCA | Principal component analysis |
PCC | Pearson correlation coefficient |
R² | Coefficient of determination |
RF | Random forest |
RMSE | Root mean square error |
SRCC | Spearman rank correlation coefficient |
SVR | Support vector regression |
TP | Total phosphorus |
TN | Total nitrogen |
WT | Water temperature |
XGBoost | eXtreme Gradient Boosting |
References
1. Lévesque, B.; Gervais, M.C.; Chevalier, P.; Gauvin, D.; Anassour-Laouan-Sidi, E.; Gingras, S.; Fortin, N.; Brisson, G.; Greer, C.; Bird, D. Prospective study of acute health effects in relation to exposure to cyanobacteria. Sci. Total Environ. 2014, 466, 397–403.
2. Rousso, B.Z.; Bertone, E.; Stewart, R.; Hamilton, D.P. A systematic literature review of forecasting and predictive models for cyanobacteria blooms in freshwater lakes. Water Res. 2020, 182, 115959.
3. Guo, L. Doing battle with the green monster of Taihu Lake. Science 2007, 317, 1166.
4. Wang, H.; Zhu, R.; Zhang, J.; Ni, L.; Shen, H.; Xie, P. A Novel and Convenient Method for Early Warning of Algal Cell Density by Chlorophyll Fluorescence Parameters and Its Application in a Highland Lake. Front. Plant Sci. 2018, 9, 869.
5. Recknagel, F. Current scope, case studies and future directions of ecological informatics. J. Environ. Inform. 2013, 21, 3–11.
6. Boyer, J.N.; Kelble, C.R.; Ortner, P.B.; Rudnick, D.T. Phytoplankton bloom status: Chlorophyll a biomass as an indicator of water quality condition in the southern estuaries of Florida, USA. Ecol. Indic. 2009, 9, S56–S67.
7. Yang, J.; Zheng, Y.; Zhang, W.; Zhou, Y.; Zhang, Y. Comparative analysis of machine learning methods for prediction of chlorophyll-a in a river with different hydrology characteristics: A case study in Fuchun River, China. J. Environ. Manag. 2024, 364, 121386.
8. Kim, K.-M.; Ahn, J.-H. Machine learning predictions of chlorophyll-a in the Han river basin, Korea. J. Environ. Manag. 2022, 318, 115636.
9. Qin, B.; Paerl, H.W.; Brookes, J.D.; Liu, J.; Jeppesen, E.; Zhu, G.; Zhang, Y.; Xu, H.; Shi, K.; Deng, J. Why Lake Taihu continues to be plagued with cyanobacterial blooms through 10 years (2007–2017) efforts. Sci. Bull. 2019, 64, 7–9.
10. Shin, Y.; Kim, T.; Hong, S.; Lee, S.; Lee, E.; Hong, S.; Lee, C.; Kim, T.; Park, M.S.; Park, J.; et al. Prediction of Chlorophyll-a Concentrations in the Nakdong River Using Machine Learning Methods. Water 2020, 12, 1822.
11. Fadel, A.; Lemaire, B.J.; Vinçon-Leite, B.; Atoui, A.; Slim, K.; Tassin, B. On the successful use of a simplified model to simulate the succession of toxic cyanobacteria in a hypereutrophic reservoir with a highly fluctuating water level. Environ. Sci. Pollut. Res. Int. 2017, 24, 20934–20948.
12. Elliott, J.A. Is the future blue-green? A review of the current model predictions of how climate change could affect pelagic freshwater cyanobacteria. Water Res. 2012, 46, 1364–1371.
13. Pätynen, A.; Elliott, J.A.; Kiuru, P.; Sarvala, J.; Ventelä, A.M.; Jones, R.I. Modelling the impact of higher temperature on the phytoplankton of a boreal lake. Boreal Environ. Res. 2014, 19, 66–78.
14. Yu, Z.; Yang, K.; Luo, Y.; Shang, C. Spatial-temporal process simulation and prediction of chlorophyll-a concentration in Dianchi Lake based on wavelet analysis and long-short term memory network. J. Hydrol. 2020, 582, 124488.
15. Weijuan, K.; Ronghua, M.A.; Hongtao, D. The neural network model for estimation of chlorophyll-a with water temperature in Lake Taihu. J. Lake Sci. 2009, 21, 193–198.
16. Yi, H.S.; Park, S.; An, K.G.; Kwak, K.C. Algal Bloom Prediction Using Extreme Learning Machine Models at Artificial Weirs in the Nakdong River, Korea. Int. J. Environ. Res. Public Health 2018, 15, 2078.
17. Park, Y.; Lee, H.K.; Shin, J.K.; Chon, K.; Kim, S.; Cho, K.H.; Kim, J.H.; Baek, S.S. A machine learning approach for early warning of cyanobacterial bloom outbreaks in a freshwater reservoir. J. Environ. Manag. 2021, 288, 112415.
18. Zhang, T.L.; He, M.X. A Method to Retrieve the Oceanic Chlorophyll-a Concentrations in Case I Water Based on Artificial Neural Network. Natl. Remote Sens. Bull. 2002, 1, 44–48.
19. Ly, Q.V.; Nguyen, X.C.; Lê, N.C.; Truong, T.D.; Hoang, T.H.; Park, T.J.; Maqbool, T.; Pyo, J.; Cho, K.H.; Lee, K.S.; et al. Application of Machine Learning for eutrophication analysis and algal bloom prediction in an urban river: A 10-year study of the Han River, South Korea. Sci. Total Environ. 2021, 797, 149040.
20. Soranno, P. Factors affecting the timing of surface scums and epilimnetic blooms of blue-green algae in a eutrophic lake. Can. J. Fish. Aquat. Sci. 1997, 54, 1965–1975.
21. Han, Y.; Aziz, T.N.; Del Giudice, D.; Hall, N.S.; Obenour, D.R. Exploring nutrient and light limitation of algal production in a shallow turbid reservoir. Environ. Pollut. 2021, 269, 116210.
22. Cao, H.; Recknagel, F.; Bartkow, M. Spatially-explicit forecasting of cyanobacteria assemblages in freshwater lakes by multi-objective hybrid evolutionary algorithms. Ecol. Model. 2016, 342, 97–112.
23. Liu, J.-Y.; Zeng, L.-H.; Ren, Z.-H. The application of spectroscopy technology in the monitoring of microalgae cells concentration. Appl. Spectrosc. Rev. 2020, 56, 171–192.
24. Liu, J.Y.; Zeng, L.H.; Ren, Z.H.; Du, T.M.; Liu, X. Rapid in situ measurements of algal cell concentrations using an artificial neural network and single-excitation fluorescence spectrometry. Algal Res. 2020, 45, 101739.
25. Chen, Y.; Song, L.; Liu, Y.; Yang, L.; Li, D. A Review of the Artificial Neural Network Models for Water Quality Prediction. Appl. Sci. 2020, 10, 5776.
26. Yang, H.; Kong, J.; Hu, H.; Du, Y.; Gao, M.; Chen, F. A Review of Remote Sensing for Water Quality Retrieval: Progress and Challenges. Remote Sens. 2022, 14, 1770.
27. Lucas, H.R.; Fernandez, R.D. Navigating the dynamic landscape of alpha-synuclein morphology: A review of the physiologically relevant tetrameric conformation. Neural Regen. Res. 2020, 15, 407–415.
28. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
29. Cohen, I.; Huang, Y.; Chen, J.; Benesty, J. Noise Reduction in Speech Processing; Springer: Berlin/Heidelberg, Germany, 2009.
30. Cover, T.M. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 1999.
31. Lubo-Robles, D.; Devegowda, D.; Jayaram, V.; Bedle, H.; Marfurt, K.J.; Pranter, M.J. Machine learning model interpretability using SHAP values: Application to a seismic facies classification task. In Proceedings of the SEG International Exposition and Annual Meeting, Virtual Event, 12–16 October 2020.
32. Seber, G.A.F.; Lee, A.J. Linear Regression Analysis, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2012.
33. Quinlan, J.R. Induction of decision trees. Machine Learning. In Proceedings of the 24th Annual ACM Symposium on the Theory of Computing, Berkeley, CA, USA, 28–30 May 1986.
34. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
35. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
36. Andrews, D.F. A Robust Method for Multiple Linear Regression. Technometrics 1974, 16, 523–531.
37. Vapnik, V.; Golowich, S.; Smola, A. Support vector method for function approximation, regression estimation and signal processing. Adv. Neural Inf. Process. Syst. 1996, 9, 281–287.
38. Park, Y.; Cho, K.H.; Park, J.; Cha, S.M.; Kim, J.H. Development of early-warning protocol for predicting chlorophyll-a concentration using machine learning models in freshwater and estuarine reservoirs, Korea. Sci. Total Environ. 2015, 502, 31–41.
39. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
40. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
41. Wei, B.; Sugiura, N.; Maekawa, T. Use of artificial neural network in the prediction of algal blooms. Water Res. 2001, 35, 2022–2028.
42. Cohen, J. Statistical Power Analysis for the Behavioral Sciences; Routledge: London, UK, 2013.
43. Shin, Y.; Lee, H.; Lee, Y.J.; Seo, D.K.; Jeong, B.; Hong, S.; Kim, J.; Kim, T.; Lee, J.K.; Heo, T.Y. The prediction of diatom abundance by comparison of various machine learning methods. Math. Probl. Eng. 2019, 2019, 5749746.
44. Beretta-Blanco, A.; Carrasco-Letelier, L. Relevant factors in the eutrophication of the Uruguay River and the Río Negro. Sci. Total Environ. 2021, 761, 143299.
45. Mamun, M.; Kim, J.J.; Alam, M.A.; An, K.G. Prediction of algal chlorophyll-a and water clarity in monsoon-region reservoir using machine learning approaches. Water 2019, 12, 30.
46. Kim, H.G.; Hong, S.; Jeong, K.S.; Kim, D.K.; Joo, G.J. Determination of sensitive variables regardless of hydrological alteration in artificial neural network model of chlorophyll a: Case study of Nakdong River. Ecol. Model. 2019, 398, 67–76.
47. Yajima, H.; Derot, J. Application of the Random Forest model for chlorophyll-a forecasts in fresh and brackish water bodies in Japan, using multivariate long-term databases. J. Hydroinformatics 2018, 20, 206–220.
Dataset | WT | pH | DO | NTU | EC | CODMn | NH3-N | TP | TN |
---|---|---|---|---|---|---|---|---|---|
1 | 334 | 334 | 334 | 334 | 334 | 363 | 365 | 365 | 365 |
2 | 320 | 320 | 320 | 320 | 320 | 351 | 351 | 351 | 351 |
3 | 334 | 334 | 334 | 334 | 334 | 363 | 365 | 365 | 365 |
Variable | Unit | Count | Mean | Std | Min | 25% a | 50% b | 75% c | Max |
---|---|---|---|---|---|---|---|---|---|
WT | ℃ | 332 | 18.88 | 8.13 | 5.80 | 11.00 | 19.25 | 25.35 | 34.50 |
pH | - | 332 | 7.22 | 0.41 | 7.00 | 7.00 | 7.00 | 7.00 | 8.00 |
DO | mg/L | 332 | 6.07 | 2.60 | 1.00 | 3.90 | 5.80 | 8.30 | 11.90 |
Turbidity | NTU | 332 | 54.07 | 27.45 | 8.60 | 34.5 | 48.95 | 66.32 | 202.7 |
EC | μS/cm | 332 | 387.51 | 85.12 | 193.7 | 338.58 | 395.15 | 448.6 | 583.6 |
CODMn | mg/L | 332 | 4.12 | 1.11 | 2.00 | 3.38 | 4.00 | 4.60 | 8.30 |
NH3-N | mg/L | 332 | 0.39 | 0.30 | 0.07 | 0.20 | 0.30 | 0.41 | 2.05 |
TP | mg/L | 332 | 0.05 | 0.02 | 0.02 | 0.04 | 0.05 | 0.06 | 0.17 |
TN | mg/L | 332 | 2.34 | 1.43 | 0.43 | 1.29 | 1.76 | 3.71 | 6.50 |
Chl-a | μg/L | 332 | 24.76 | 35.41 | 2.25 | 8.79 | 13.62 | 23.12 | 290.51 |
Model | Hyper-Parameter | Value | R² (Train) | R² (Test) | RMSE (Train) | RMSE (Test) |
---|---|---|---|---|---|---|
LR | Fit intercept | True | 0.45 | 0.3 | 13.46 | 16.23 |
DT | Max depth | 5 | 0.81 | 0.54 | 8.0 | 13.08 |
SVR | Kernel, C, Epsilon | RBF, 10,000, 0.0001 | 0.64 | 0.46 | 14.23 | 10.87 |
MLP | Hidden layer, Node | 3, (128, 512, 128) | 0.71 | 0.58 | 9.86 | 12.46 |
RF | Estimator | 100 | 0.95 | 0.64 | 4.03 | 11.53 |
XGBoost | Estimator | 100 | 1.0 | 0.78 | 0.0 | 8.97 |
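The hyper-parameters in the table above could be expressed roughly as follows with scikit-learn (cited in the reference list) and the `xgboost` package. This is a sketch under assumptions, not the authors' code: the `build_models` helper, the `SEED` value and the use of library defaults for every setting not listed in the table are all hypothetical choices.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

try:  # xgboost is a separate package and may not be installed
    from xgboost import XGBRegressor
except ImportError:
    XGBRegressor = None

SEED = 0  # assumption: fixed seed for reproducibility, not given in the table


def build_models():
    """Return one regressor per table row; unlisted settings use defaults."""
    models = {
        "LR": LinearRegression(fit_intercept=True),
        "DT": DecisionTreeRegressor(max_depth=5, random_state=SEED),
        "SVR": SVR(kernel="rbf", C=10_000, epsilon=0.0001),
        # Three hidden layers with 128, 512 and 128 nodes.
        "MLP": MLPRegressor(hidden_layer_sizes=(128, 512, 128),
                            random_state=SEED),
        "RF": RandomForestRegressor(n_estimators=100, random_state=SEED),
    }
    if XGBRegressor is not None:
        models["XGBoost"] = XGBRegressor(n_estimators=100, random_state=SEED)
    return models
```

Each returned object follows the usual `fit(X, y)` / `predict(X)` interface, so the same training and evaluation loop can be applied to all six models.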
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sun, G.; Zhu, W.; Qian, X.; Wei, C.; Xie, P.; Shi, Y.; Cao, X.; He, Y. Machine Learning Models for Chlorophyll-a Forecasting in a Freshwater Lake: Case Study of Lake Taihu. Water 2025, 17, 1219. https://doi.org/10.3390/w17081219