Machine Learning Models for Chlorophyll-a Forecasting in a Freshwater Lake: Case Study of Lake Taihu
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area
2.2. Data Preprocessing
2.2.1. Data Introduction
2.2.2. Data Preprocessing Steps
2.3. Machine Learning Model
2.3.1. Feature Analysis Methods
- (a) Pearson correlation coefficient (PCC) and principal component analysis (PCA)
- (b) Mutual information (MI) and Spearman rank correlation coefficients (SRCCs)
- (c) SHAP model
2.3.2. ML Model Construction
- (a) Linear Regression
- (b) Decision Tree
- (c) Support Vector Regression
- (d) Multi-Layer Perceptrons
- (e) Random Forest
- (f) XGBoost
2.3.3. Evaluation Indicators
3. Results
3.1. The Spatial and Temporal Distribution Features of Algal and Water Quality Datasets
3.1.1. The Spatio-Temporal Distribution of Chl-a Concentrations
3.1.2. Correlation Analysis of Algal Information and Chl-a
3.1.3. Analysis of Water Quality Information and Chl-a
3.2. Machine Learning Prediction of Chl-a Concentration
3.2.1. Model Accuracy
3.2.2. Feature Importance Explanation
Rank | Ref [38] | Ref [38] | Ref [43] | Ref [46] | Ref [47] | Ref [21] | Ref [45] | Ref [45] | Ref [8] | This Study |
---|---|---|---|---|---|---|---|---|---|---|
Site | Juam reservoir | Yeongsan reservoir | Nakdong river | Nakdong river | Lake Shinji | Lake Jordan | Imha reservoir | Lacustrine zone | Han river | Lake Taihu |
Model | SVM | SVM | RF | ANN | RF | Mechanistic model | SVM | SVM | RF | XGBoost |
1 | PO4–P | NH3–N | PO4–P | Wind velocity | NTU | Limiting nutrient | WT | WT | TOC | EC |
2 | NO3–N | NO3–N | DO | EC | CODMn | TN | TSS a | Prep b | TN | WT |
3 | Wind speed | Solar radiation | NH3–N | Alkalinity | SS c | TP | DO | BOD | pH | CODMn |
4. Discussion
4.1. Model Performance and Feature Analysis
4.2. Model Generalization Performance
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
ANN | Artificial neural network |
Chl-a | Chlorophyll-a |
CNN | Convolutional neural network |
CODMn | Permanganate index |
DD | Data-driven |
DL | Deep learning |
DO | Dissolved oxygen |
DT | Decision tree |
EC | Electrical conductivity |
Cyano-HABs | Cyanobacterial harmful algal blooms |
LR | Linear regression |
LSTM | Long short-term memory |
ML | Machine learning |
MLP | Multi-layer perceptron |
NH3-N | Ammonia nitrogen |
NTU | Nephelometric turbidity units |
PB | Process-based |
PCA | Principal component analysis |
PCC | Pearson correlation coefficient |
R2 | Coefficient of determination |
RF | Random forest |
RMSE | Root mean square error |
SRCC | Spearman rank correlation coefficient |
SVR | Support vector regression |
TP | Total phosphorus |
TN | Total nitrogen |
WT | Water temperature |
XGBoost | eXtreme Gradient Boosting |
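The two evaluation indicators abbreviated above, R2 and RMSE, have standard definitions. The following is a minimal sketch in plain Python, assuming paired lists of observed and predicted Chl-a values. The function names `rmse` and `r2` and the sample numbers are illustrative assumptions, not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error: square root of the mean squared residual."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical Chl-a values in ug/L, for illustration only.
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 37.0]
print(f"RMSE = {rmse(obs, pred):.3f}, R2 = {r2(obs, pred):.3f}")
```

RMSE is expressed in the units of the target variable (μg/L for Chl-a), whereas R2 is dimensionless, so the two indicators are usually read together.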
References
1. Lévesque, B.; Gervais, M.C.; Chevalier, P.; Gauvin, D.; Anassour-Laouan-Sidi, E.; Gingras, S.; Fortin, N.; Brisson, G.; Greer, C.; Bird, D. Prospective study of acute health effects in relation to exposure to cyanobacteria. Sci. Total Environ. 2014, 466, 397–403.
2. Rousso, B.Z.; Bertone, E.; Stewart, R.; Hamilton, D.P. A systematic literature review of forecasting and predictive models for cyanobacteria blooms in freshwater lakes. Water Res. 2020, 182, 115959.
3. Guo, L. Doing battle with the green monster of Taihu Lake. Science 2007, 317, 1166.
4. Wang, H.; Zhu, R.; Zhang, J.; Ni, L.; Shen, H.; Xie, P. A Novel and Convenient Method for Early Warning of Algal Cell Density by Chlorophyll Fluorescence Parameters and Its Application in a Highland Lake. Front. Plant Sci. 2018, 9, 869.
5. Recknagel, F. Current scope, case studies and future directions of ecological informatics. J. Environ. Inform. 2013, 21, 3–11.
6. Boyer, J.N.; Kelble, C.R.; Ortner, P.B.; Rudnick, D.T. Phytoplankton bloom status: Chlorophyll a biomass as an indicator of water quality condition in the southern estuaries of Florida, USA. Ecol. Indic. 2009, 9, S56–S67.
7. Yang, J.; Zheng, Y.; Zhang, W.; Zhou, Y.; Zhang, Y. Comparative analysis of machine learning methods for prediction of chlorophyll-a in a river with different hydrology characteristics: A case study in Fuchun River, China. J. Environ. Manag. 2024, 364, 121386.
8. Kim, K.-M.; Ahn, J.-H. Machine learning predictions of chlorophyll-a in the Han river basin, Korea. J. Environ. Manag. 2022, 318, 115636.
9. Qin, B.; Paerl, H.W.; Brookes, J.D.; Liu, J.; Jeppesen, E.; Zhu, G.; Zhang, Y.; Xu, H.; Shi, K.; Deng, J. Why Lake Taihu continues to be plagued with cyanobacterial blooms through 10 years (2007–2017) efforts. Sci. Bull. 2019, 64, 7–9.
10. Shin, Y.; Kim, T.; Hong, S.; Lee, S.; Lee, E.; Hong, S.; Lee, C.; Kim, T.; Park, M.S.; Park, J.; et al. Prediction of Chlorophyll-a Concentrations in the Nakdong River Using Machine Learning Methods. Water 2020, 12, 1822.
11. Fadel, A.; Lemaire, B.J.; Vinçon-Leite, B.; Atoui, A.; Slim, K.; Tassin, B. On the successful use of a simplified model to simulate the succession of toxic cyanobacteria in a hypereutrophic reservoir with a highly fluctuating water level. Environ. Sci. Pollut. Res. Int. 2017, 24, 20934–20948.
12. Elliott, J.A. Is the future blue-green? A review of the current model predictions of how climate change could affect pelagic freshwater cyanobacteria. Water Res. 2012, 46, 1364–1371.
13. Pätynen, A.; Elliott, J.A.; Kiuru, P.; Sarvala, J.; Ventelä, A.M.; Jones, R.I. Modelling the impact of higher temperature on the phytoplankton of a boreal lake. Boreal Environ. Res. 2014, 19, 66–78.
14. Yu, Z.; Yang, K.; Luo, Y.; Shang, C. Spatial-temporal process simulation and prediction of chlorophyll-a concentration in Dianchi Lake based on wavelet analysis and long-short term memory network. J. Hydrol. 2020, 582, 124488.
15. Weijuan, K.; Ronghua, M.A.; Hongtao, D. The neural network model for estimation of chlorophyll-a with water temperature in Lake Taihu. J. Lake Sci. 2009, 21, 193–198.
16. Yi, H.S.; Park, S.; An, K.G.; Kwak, K.C. Algal Bloom Prediction Using Extreme Learning Machine Models at Artificial Weirs in the Nakdong River, Korea. Int. J. Environ. Res. Public Health 2018, 15, 2078.
17. Park, Y.; Lee, H.K.; Shin, J.K.; Chon, K.; Kim, S.; Cho, K.H.; Kim, J.H.; Baek, S.S. A machine learning approach for early warning of cyanobacterial bloom outbreaks in a freshwater reservoir. J. Environ. Manag. 2021, 288, 112415.
18. Zhang, T.L.; He, M.X. A Method to Retrieve the Oceanic Chlorophyll-a Concentrations in Case I Water Based on Artificial Neural Network. Natl. Remote Sens. Bull. 2002, 1, 44–48.
19. Ly, Q.V.; Nguyen, X.C.; Lê, N.C.; Truong, T.D.; Hoang, T.H.; Park, T.J.; Maqbool, T.; Pyo, J.; Cho, K.H.; Lee, K.S.; et al. Application of Machine Learning for eutrophication analysis and algal bloom prediction in an urban river: A 10-year study of the Han River, South Korea. Sci. Total Environ. 2021, 797, 149040.
20. Soranno, P. Factors affecting the timing of surface scums and epilimnetic blooms of blue-green algae in a eutrophic lake. Can. J. Fish. Aquat. Sci. 1997, 54, 1965–1975.
21. Han, Y.; Aziz, T.N.; Del Giudice, D.; Hall, N.S.; Obenour, D.R. Exploring nutrient and light limitation of algal production in a shallow turbid reservoir. Environ. Pollut. 2021, 269, 116210.
22. Cao, H.; Recknagel, F.; Bartkow, M. Spatially-explicit forecasting of cyanobacteria assemblages in freshwater lakes by multi-objective hybrid evolutionary algorithms. Ecol. Model. 2016, 342, 97–112.
23. Liu, J.-Y.; Zeng, L.-H.; Ren, Z.-H. The application of spectroscopy technology in the monitoring of microalgae cells concentration. Appl. Spectrosc. Rev. 2020, 56, 171–192.
24. Liu, J.Y.; Zeng, L.H.; Ren, Z.H.; Du, T.M.; Liu, X. Rapid in situ measurements of algal cell concentrations using an artificial neural network and single-excitation fluorescence spectrometry. Algal Res. 2020, 45, 101739.
25. Chen, Y.; Song, L.; Liu, Y.; Yang, L.; Li, D. A Review of the Artificial Neural Network Models for Water Quality Prediction. Appl. Sci. 2020, 10, 5776.
26. Yang, H.; Kong, J.; Hu, H.; Du, Y.; Gao, M.; Chen, F. A Review of Remote Sensing for Water Quality Retrieval: Progress and Challenges. Remote Sens. 2022, 14, 1770.
27. Lucas, H.R.; Fernandez, R.D. Navigating the dynamic landscape of alpha-synuclein morphology: A review of the physiologically relevant tetrameric conformation. Neural Regen. Res. 2020, 15, 407–415.
28. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
29. Cohen, I.; Huang, Y.; Chen, J.; Benesty, J. Noise Reduction in Speech Processing; Springer: Berlin/Heidelberg, Germany, 2009.
30. Cover, T.M. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 1999.
31. Lubo-Robles, D.; Devegowda, D.; Jayaram, V.; Bedle, H.; Marfurt, K.J.; Pranter, M.J. Machine learning model interpretability using SHAP values: Application to a seismic facies classification task. In Proceedings of the SEG International Exposition and Annual Meeting, Virtual Event, 12–16 October 2020.
32. Seber, G.A.F.; Lee, A.J. Linear Regression Analysis, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2012.
33. Quinlan, J.R. Induction of decision trees. Machine Learning. In Proceedings of the 24th Annual ACM Symposium on the Theory of Computing, Berkeley, CA, USA, 28–30 May 1986.
34. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
35. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
36. Andrews, D.F. A Robust Method for Multiple Linear Regression. Technometrics 1974, 16, 523–531.
37. Vapnik, V.; Golowich, S.; Smola, A. Support vector method for function approximation, regression estimation and signal processing. Adv. Neural Inf. Process. Syst. 1996, 9, 281–287.
38. Park, Y.; Cho, K.H.; Park, J.; Cha, S.M.; Kim, J.H. Development of early-warning protocol for predicting chlorophyll-a concentration using machine learning models in freshwater and estuarine reservoirs, Korea. Sci. Total Environ. 2015, 502, 31–41.
39. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
40. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
41. Wei, B.; Sugiura, N.; Maekawa, T. Use of artificial neural network in the prediction of algal blooms. Water Res. 2001, 35, 2022–2028.
42. Cohen, J. Statistical Power Analysis for the Behavioral Sciences; Routledge: London, UK, 2013.
43. Shin, Y.; Lee, H.; Lee, Y.J.; Seo, D.K.; Jeong, B.; Hong, S.; Kim, J.; Kim, T.; Lee, J.K.; Heo, T.Y. The prediction of diatom abundance by comparison of various machine learning methods. Math. Probl. Eng. 2019, 2019, 5749746.
44. Beretta-Blanco, A.; Carrasco-Letelier, L. Relevant factors in the eutrophication of the Uruguay River and the Río Negro. Sci. Total Environ. 2021, 761, 143299.
45. Mamun, M.; Kim, J.J.; Alam, M.A.; An, K.G. Prediction of algal chlorophyll-a and water clarity in monsoon-region reservoir using machine learning approaches. Water 2019, 12, 30.
46. Kim, H.G.; Hong, S.; Jeong, K.S.; Kim, D.K.; Joo, G.J. Determination of sensitive variables regardless of hydrological alteration in artificial neural network model of chlorophyll a: Case study of Nakdong River. Ecol. Model. 2019, 398, 67–76.
47. Yajima, H.; Derot, J. Application of the Random Forest model for chlorophyll-a forecasts in fresh and brackish water bodies in Japan, using multivariate long-term databases. J. Hydroinformatics 2018, 20, 206–220.
Dataset | WT | pH | DO | NTU | EC | CODMn | NH3-N | TP | TN |
---|---|---|---|---|---|---|---|---|---|
1 | 334 | 334 | 334 | 334 | 334 | 363 | 365 | 365 | 365 |
2 | 320 | 320 | 320 | 320 | 320 | 351 | 351 | 351 | 351 |
3 | 334 | 334 | 334 | 334 | 334 | 363 | 365 | 365 | 365 |
Variable | Unit | Count | Mean | Std | Min | 25% a | 50% b | 75% c | Max |
---|---|---|---|---|---|---|---|---|---|
WT | ℃ | 332 | 18.88 | 8.13 | 5.80 | 11.00 | 19.25 | 25.35 | 34.50 |
pH | - | 332 | 7.22 | 0.41 | 7.00 | 7.00 | 7.00 | 7.00 | 8.00 |
DO | mg/L | 332 | 6.07 | 2.60 | 1.00 | 3.90 | 5.80 | 8.30 | 11.90 |
turbidity | - | 332 | 54.07 | 27.45 | 8.60 | 34.5 | 48.95 | 66.32 | 202.7 |
EC | μS/cm | 332 | 387.51 | 85.12 | 193.7 | 338.58 | 395.15 | 448.6 | 583.6 |
CODMn | mg/L | 332 | 4.12 | 1.11 | 2.00 | 3.38 | 4.00 | 4.60 | 8.30 |
NH3-N | mg/L | 332 | 0.39 | 0.30 | 0.07 | 0.20 | 0.30 | 0.41 | 2.05 |
TP | mg/L | 332 | 0.05 | 0.02 | 0.02 | 0.04 | 0.05 | 0.06 | 0.17 |
TN | mg/L | 332 | 2.34 | 1.43 | 0.43 | 1.29 | 1.76 | 3.71 | 6.50 |
Chl-a | μg/L | 332 | 24.76 | 35.41 | 2.25 | 8.79 | 13.62 | 23.12 | 290.51 |
Model | Hyper-Parameter | Value | R2 (Train) | R2 (Test) | RMSE (Train) | RMSE (Test) |
---|---|---|---|---|---|---|
LR | Fit intercept | True | 0.45 | 0.3 | 13.46 | 16.23 |
DT | Max depth | 5 | 0.81 | 0.54 | 8.0 | 13.08 |
SVR | Kernel, C, Epsilon | RBF, 10,000, 0.0001 | 0.64 | 0.46 | 14.23 | 10.87 |
MLP | Hidden layer, Node | 3, (128, 512, 128) | 0.71 | 0.58 | 9.86 | 12.46 |
RF | Estimator | 100 | 0.95 | 0.64 | 4.03 | 11.53 |
XGBoost | Estimator | 100 | 1.0 | 0.78 | 0.0 | 8.97 |
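One way to read the table above is to compare training and test scores model by model. The sketch below copies the R2 values from the table into a hypothetical `R2_SCORES` dictionary and computes the train-minus-test difference as a rough indicator of overfitting. This gap is an illustrative diagnostic assumed here for discussion; it is not a metric reported by the authors.

```python
# R2 values copied from the model performance table: (train, test).
R2_SCORES = {
    "LR": (0.45, 0.30),
    "DT": (0.81, 0.54),
    "SVR": (0.64, 0.46),
    "MLP": (0.71, 0.58),
    "RF": (0.95, 0.64),
    "XGBoost": (1.00, 0.78),
}


def generalization_gap(scores):
    """Return {model: train R2 - test R2}, rounded to two decimals."""
    return {name: round(train - test, 2) for name, (train, test) in scores.items()}


gaps = generalization_gap(R2_SCORES)

# Model with the highest test R2, and model with the largest train-test gap.
best_test = max(R2_SCORES, key=lambda name: R2_SCORES[name][1])
largest_gap = max(gaps, key=gaps.get)

# List models from smallest to largest gap.
for name in sorted(gaps, key=gaps.get):
    print(f"{name:8s} test R2 = {R2_SCORES[name][1]:.2f}  gap = {gaps[name]:.2f}")
print("best test R2:", best_test, "| largest gap:", largest_gap)
```

On these numbers XGBoost has the highest test R2 (0.78), while RF shows the largest train-test gap (0.31). A training R2 of 1.0 with a training RMSE of 0.0 also suggests that XGBoost fits the training set almost exactly, so the test scores are the more informative column.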
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sun, G.; Zhu, W.; Qian, X.; Wei, C.; Xie, P.; Shi, Y.; Cao, X.; He, Y. Machine Learning Models for Chlorophyll-a Forecasting in a Freshwater Lake: Case Study of Lake Taihu. Water 2025, 17, 1219. https://doi.org/10.3390/w17081219