Probabilistic Water Quality Monitoring Using Multi-Temporal Sentinel-2 Data: A Situational Awareness Framework for Harmful Algal Bloom Forecasting
Highlights
- A confidence-based water quality monitoring framework integrating Sentinel-2 imagery with XGBoost quantile regression (0.05, 0.50, 0.95 quantiles) and LightGBM temporal forecasting achieved MAPE of 2.9% and 5.7% for 10-day and 20-day harmful algal bloom forecasts, respectively, with 90% prediction intervals.
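- Analysis of 235 data points from Lake Okeechobee revealed a 47.2% bloom frequency. LightGBM gave the lowest 10-day forecast error among the four models compared (RMSE: 0.2333 vs. 0.3017 or higher for XGBoost, Random Forest, and Ridge Regression), whereas at the 20-day horizon XGBoost (RMSE: 0.3928) and Ridge Regression (RMSE: 0.4007) had lower RMSE than LightGBM (0.4309).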
- The probabilistic paradigm transforms water quality monitoring from deterministic predictions to uncertainty-aware decision support, enabling resource managers to implement risk-based responses through categorical classifications (LOW, MEDIUM, HIGH) that accommodate stakeholder-specific risk tolerances.
- Integrating satellite remote sensing with quantile regression provides a scalable operational framework for water quality monitoring and early warning. Proof-of-concept visualization tools help bridge the gap between algorithms and operations by communicating prediction reliability alongside forecasted bloom magnitude, so that managers can act on both.
Abstract
1. Introduction
- A confidence-based prediction framework replacing binary outputs with quantile regression-based uncertainty intervals;
- An integrated architecture combining current condition assessment (XGBoost) with temporal forecasting (LightGBM);
- A novel risk classification scheme incorporating both predicted values and confidence levels for actionable intelligence;
- An interactive dashboard system enabling stakeholder-specific risk interpretation and decision support;
- Empirical validation demonstrating operational utility and multi-horizon forecasting capabilities (10- to 20-day) through a representative case study in a large subtropical freshwater system.
2. Materials and Methods
2.1. Study Area
2.2. System Architecture
- Data Ingestion and Caching Module: This component is responsible for the automated retrieval of Sentinel-2 satellite imagery from the Microsoft Planetary Computer platform, the extraction of relevant spectral bands, and storage of processed data in a structured cache system to avoid redundant downloads and enable efficient temporal analysis. The use of cloud computing platforms for satellite data processing has become increasingly important for large-scale environmental monitoring applications, providing scalable infrastructure for handling multi-temporal datasets [42].
- Feature Engineering Module: This module extracts a comprehensive set of environmental indicators from satellite imagery, including spectral indices (e.g., Normalized Difference Vegetation Index, Chlorophyll Index), band ratios optimized for cyanobacteria detection, and contextual information such as land cover type derived from ancillary geospatial datasets. Spectral feature optimization has proven critical for water quality estimation using remote sensing approaches, with the careful selection of wavelength combinations significantly improving predictive performance [43,44,45,46].
- Cyanobacteria Density Prediction Module: This component employs extreme gradient boosting (XGBoost) models trained with quantile regression loss functions to predict cyanobacteria density at three probability levels (0.05, 0.50, and 0.95 quantiles), providing explicit uncertainty bounds around point estimates. Recent advances in quantile extreme gradient boosting have demonstrated enhanced capability for uncertainty quantification in environmental applications, particularly for capturing both aleatoric and epistemic uncertainties in complex nonlinear systems [47].
- Time Series Forecasting Module: This module utilizes Light Gradient Boosting Machine (LightGBM) algorithms to forecast cyanobacteria density over 10-day and 20-day horizons, incorporating temporal dependencies and seasonal patterns while explicitly modeling prediction uncertainty through quantile regression. LightGBM has shown particular effectiveness in environmental time series forecasting applications, including greenhouse temperature prediction and water quality monitoring, due to its computational efficiency and ability to handle high-dimensional feature spaces [48,49,50,51,52].
- Risk Assessment and Visualization Module: This component synthesizes predicted values and uncertainty estimates into categorical risk classifications (LOW, MEDIUM, HIGH) and generates intuitive visualizations through an interactive web-based dashboard interface (Figure 4). From an operational deployment standpoint, the risk classification scheme accounts for both the magnitude of predicted cyanobacteria density and the width of prediction intervals, enabling integration into existing water management workflows where different agencies may maintain distinct action thresholds based on their regulatory mandates and resource constraints.
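The quantile outputs described for the prediction and forecasting modules (0.05, 0.50, and 0.95) rest on the pinball (quantile) loss. The following is a minimal, library-free sketch of that loss, not the authors' training code; the function names, the candidate grid, and the toy observations are all hypothetical and chosen only to show that minimizing the loss recovers the corresponding empirical quantile.

```python
from typing import Sequence


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], q: float) -> float:
    """Mean pinball (quantile) loss for a quantile level q in (0, 1)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction (diff >= 0) costs q per unit of error;
        # over-prediction costs (1 - q) per unit of error.
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def best_constant(y_true: Sequence[float], q: float, candidates: Sequence[float]) -> float:
    """Return the candidate constant prediction with the lowest pinball loss."""
    return min(candidates, key=lambda c: pinball_loss(y_true, [c] * len(y_true), q))


# Toy observations (hypothetical, not data from the study): 1.0, 2.0, ..., 21.0.
observations = [float(v) for v in range(1, 22)]

lower = best_constant(observations, 0.05, observations)
median = best_constant(observations, 0.50, observations)
upper = best_constant(observations, 0.95, observations)
print(lower, median, upper)
```

A gradient-boosted model trained once per quantile level with this loss yields a lower bound, a median, and an upper bound; the 0.05 and 0.95 outputs together delimit a nominal 90% prediction interval.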
2.3. Data Sources
2.3.1. Sentinel-2 Multispectral Imagery
- Visible and Near-Infrared bands (10 m native resolution): Blue (B02, 490 nm), Green (B03, 560 nm), Red (B04, 665 nm), and Near-Infrared (B08, 842 nm);
- Red Edge and Shortwave Infrared bands (20 m native resolution): Vegetation Red Edge (B05, 705 nm; B06, 740 nm; B07, 783 nm), Narrow Near-Infrared (B8A, 865 nm), and Shortwave Infrared (B11, 1610 nm; B12, 2190 nm);
- Atmospheric and Quality bands (60 m native resolution): Coastal Aerosol (B01, 443 nm), Water Vapor (B09, 945 nm), and SWIR-Cirrus (B10, 1375 nm);
- Auxiliary Products: Scene Classification Layer (SCL) for cloud masking and quality assessment, and Aerosol Optical Thickness (AOT) for atmospheric correction verification.
2.3.2. Training Dataset
2.3.3. Feature Engineering
1. Normalized Difference Vegetation Index (NDVI) variants:
   - NDVI_B04: (B08 − B04)/(B08 + B04);
   - NDVI_B05: (B08 − B05)/(B08 + B05);
   - NDVI_B06: (B08 − B06)/(B08 + B06);
   - NDVI_B07: (B08 − B07)/(B08 + B07).
2. Band ratios:
   - Green/Red ratio: B03/B04;
   - Green/Blue ratio: B03/B02;
   - Red/Blue ratio: B04/B02;
   - Green 95th percentile to blue mean ratio: percentile(B03, 95)/mean(B02);
   - Green 5th percentile to blue mean ratio: percentile(B03, 5)/mean(B02).
3. Percentile-based features:
   - 95th percentile of green band values (green95th);
   - 5th percentile of green band values (green5th).
4. Water classification:
   - Percentage of pixels classified as water using the Scene Classification Layer (percent_water).
5. Band statistics for all 15 Sentinel-2 bands (AOT, B01-B12, B8A, SCL, WVP):
   - Mean values (e.g., B01_mean, B02_mean, …, WVP_mean);
   - Minimum values (e.g., B01_min, B02_min, …, WVP_min);
   - Maximum values (e.g., B01_max, B02_max, …, WVP_max);
   - Range values (e.g., B01_range, B02_range, …, WVP_range).
6. Temporal and metadata features:
   - Month of acquisition;
   - Days before sampling;
   - Land cover classification.
7. The complete feature set includes:
   - Satellite image features (25 features): B01_mean, B02_mean, B03_mean, B04_mean, B05_mean, B06_mean, B07_mean, B08_mean, B09_mean, B11_mean, B12_mean, B8A_mean, WVP_mean, AOT_mean, percent_water, green95th, green5th, green_red_ratio, green_blue_ratio, red_blue_ratio, green95th_blue_ratio, green5th_blue_ratio, NDVI_B04, NDVI_B05, NDVI_B06, NDVI_B07, AOT_range;
   - Satellite metadata features (2 features): month, days_before_sample;
   - Sample metadata features (1 feature): land_cover.
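The index, ratio, and percentile formulas listed above can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' pipeline: it assumes each band arrives as a flat list of pixel values for one sampling window, that the NDVI variants and plain band ratios are computed from band means, and that percentiles use linear interpolation between ranks. The helper names (`percentile`, `safe_ratio`, `spectral_features`) and the toy pixel values are hypothetical.

```python
from statistics import fmean
from typing import Dict, List


def percentile(values: List[float], pct: float) -> float:
    """Percentile with linear interpolation between closest ranks (assumed method)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator != 0 else float("nan")


def spectral_features(bands: Dict[str, List[float]]) -> Dict[str, float]:
    """Compute the listed NDVI variants, band ratios, and green percentiles."""
    mean = {name: fmean(pixels) for name, pixels in bands.items()}
    features = {f"{name}_mean": value for name, value in mean.items()}
    # NDVI variants: (B08 - Bxx) / (B08 + Bxx), here evaluated on band means.
    for band in ("B04", "B05", "B06", "B07"):
        features[f"NDVI_{band}"] = safe_ratio(mean["B08"] - mean[band], mean["B08"] + mean[band])
    green95 = percentile(bands["B03"], 95)
    green5 = percentile(bands["B03"], 5)
    features["green95th"] = green95
    features["green5th"] = green5
    features["green_red_ratio"] = safe_ratio(mean["B03"], mean["B04"])
    features["green_blue_ratio"] = safe_ratio(mean["B03"], mean["B02"])
    features["red_blue_ratio"] = safe_ratio(mean["B04"], mean["B02"])
    features["green95th_blue_ratio"] = safe_ratio(green95, mean["B02"])
    features["green5th_blue_ratio"] = safe_ratio(green5, mean["B02"])
    return features


# Toy reflectance values (hypothetical), five pixels per band.
toy_bands = {
    "B02": [0.03, 0.04, 0.05, 0.06, 0.07],
    "B03": [0.02, 0.04, 0.06, 0.08, 0.10],
    "B04": [0.01, 0.02, 0.03, 0.04, 0.05],
    "B05": [0.02, 0.03, 0.04, 0.05, 0.06],
    "B06": [0.03, 0.04, 0.05, 0.06, 0.07],
    "B07": [0.04, 0.05, 0.06, 0.07, 0.08],
    "B08": [0.07, 0.08, 0.09, 0.10, 0.11],
}
toy_features = spectral_features(toy_bands)
print({name: round(value, 3) for name, value in toy_features.items()})
```

Returning NaN for a zero denominator is one possible design choice; it keeps a fully masked or empty-water window from raising an error and leaves the handling of missing values to the downstream model.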
2.4. XGBoost and LightGBM
2.5. Quantile Regression for Uncertainty Quantification
Uncertainty Propagation Between Modeling Stages
2.6. Risk Classification
1. HIGH risk:
   - Predicted value exceeds the severe bloom threshold (11.5, corresponding to 100,000 cells/mL), or
   - Predicted value exceeds the moderate bloom threshold (10.0, corresponding to 20,000 cells/mL) and bloom probability ≥ 0.7.
2. MEDIUM risk:
   - Predicted value exceeds the moderate bloom threshold and bloom probability ≥ 0.4 but < 0.7, or
   - Predicted value below the moderate bloom threshold but bloom probability ≥ 0.7.
3. LOW risk:
   - All other cases.
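The rules above translate directly into a small decision function. The sketch below is one reading of them, with assumptions labeled: "exceeds" is taken as a strict inequality, a value exactly at the moderate threshold is treated as not exceeding it, and the bloom probability is supplied by the caller because these lines do not state how it is derived. The function and constant names are hypothetical. The thresholds appear to sit on a natural-log scale (ln 100,000 ≈ 11.51 and ln 20,000 ≈ 9.90), although the transform is not stated here.

```python
SEVERE_THRESHOLD = 11.5    # from the text: corresponds to 100,000 cells/mL
MODERATE_THRESHOLD = 10.0  # from the text: corresponds to 20,000 cells/mL


def classify_risk(predicted_value: float, bloom_probability: float) -> str:
    """Map a predicted value and a bloom probability to LOW, MEDIUM, or HIGH."""
    if not 0.0 <= bloom_probability <= 1.0:
        raise ValueError("bloom_probability must lie in [0, 1]")
    # HIGH: severe threshold exceeded, regardless of probability.
    if predicted_value > SEVERE_THRESHOLD:
        return "HIGH"
    above_moderate = predicted_value > MODERATE_THRESHOLD
    # HIGH: moderate threshold exceeded with high bloom probability.
    if above_moderate and bloom_probability >= 0.7:
        return "HIGH"
    # MEDIUM: moderate threshold exceeded with intermediate probability.
    if above_moderate and bloom_probability >= 0.4:
        return "MEDIUM"
    # MEDIUM: value not above the moderate threshold, but bloom probability is high.
    if not above_moderate and bloom_probability >= 0.7:
        return "MEDIUM"
    # LOW: all other cases.
    return "LOW"


# Hypothetical inputs illustrating each branch.
for value, probability in [(12.0, 0.1), (10.5, 0.8), (10.5, 0.5), (9.0, 0.75), (9.0, 0.3)]:
    print(value, probability, classify_risk(value, probability))
```

Keeping the two thresholds as module-level constants makes it straightforward for an agency with a different action level to substitute its own values without touching the decision logic.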
3. Results
3.1. Historical Data Analysis
3.2. Temporal Forecasting Accuracy and Model Comparison
3.3. Predictive Confidence and Temporal Dynamics
3.4. Feature Contributions
3.5. Summary and Implications
3.6. Bloom Risk Assessment
4. Discussion
4.1. Advantages of the Confidence-Based Approach
4.2. Limitations and Challenges
4.3. Implications for Environmental Monitoring and Decision Support
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| AOT | Aerosol Optical Thickness |
| API | Application Programming Interface |
| FDS | Fast Detection Strategies |
| HAB | Harmful Algal Bloom |
| L2A | Level-2A (Sentinel-2 product) |
| LightGBM | Light Gradient Boosting Machine |
| MAE | Mean Absolute Error |
| MAPE | Mean Absolute Percentage Error |
| MODIS | Moderate Resolution Imaging Spectroradiometer |
| MSI | MultiSpectral Instrument |
| R² | Coefficient of Determination |
| RMSE | Root Mean Square Error |
| SA | Situational Awareness |
| SCL | Scene Classification Layer |
| SWIR | Shortwave Infrared |
| XGBoost | Extreme Gradient Boosting |
References
1. Anderson, D.M.; Cembella, A.D.; Hallegraeff, G.M. Progress in understanding harmful algal blooms: Paradigm shifts and new technologies for research, monitoring, and management. Annu. Rev. Mar. Sci. 2012, 4, 143–176.
2. Lega, M.; Napoli, R.M.A. A new approach to solid waste landfill aerial monitoring. WIT Trans. Ecol. Environ. 2008, 109, 193–199.
3. Endsley, M.R. Toward a theory of situation awareness in dynamic systems. Hum. Factors 1995, 37, 32–64.
4. Faucheux, S.; Froger, G.; Noël, J.-F. What forms of rationality for sustainable development? J. Socio-Econ. 1995, 24, 169–209.
5. Lega, M.; Medio, G.; Severino, V.; Casazza, M.; Endreny, T.; Teta, R. Coastal Water Pollution Characterization: Enhanced Situational Awareness Through Multiscale Data Acquisition and Analysis. Int. J. Environ. Impacts 2024, 7, 188–202.
6. Persechino, G.; Schiano, P.; Lega, M.; Napoli, R.M.A.; Ferrara, C.; Kosmatka, J. Aerospace-based support systems and interoperability: The solution to fight illegal dumping. WIT Trans. Ecol. Environ. 2010, 140, 203–214.
7. Mohsen, N.; Lu, J.; Zhang, G. An intelligent situation awareness support system for safety-critical environments. Decis. Support Syst. 2014, 59, 325–340.
8. Schaeffer, B.A.; Schaeffer, K.G.; Keith, D.; Lunetta, R.S.; Conmy, R.; Gould, R.W. Barriers to adopting satellite remote sensing for water quality management. Int. J. Remote Sens. 2013, 34, 7534–7544.
9. Esposito, G.; De Rosa, T.; Di Matteo, V.; Ciccarelli, C.; Ajaoud, M.; Teta, R.; Lega, M.; Costantino, V. Bio-tracking, bio-monitoring and bio-magnification interdisciplinary studies to assess cyanobacterial harmful algal blooms (cyanoHABs)’ impact in complex coastal systems. Sci. Total Environ. 2025, 978, 179480.
10. Mishra, S.; Stumpf, R.P.; Schaeffer, B.A.; Werdell, P.J.; Loftin, K.A.; Meredith, A. Measurement of cyanobacterial bloom magnitude using satellite remote sensing. Sci. Rep. 2019, 9, 18310.
11. Lega, M.; Casazza, M.; Teta, R.; Zappa, C.J. Environmental impact assessment: A multilevel, multi-parametric framework for coastal waters. Int. J. Sustain. Dev. Plan. 2018, 13, 1041–1049.
12. Colkesen, I.; Ozturk, M.Y.; Altuntas, O.Y. Comparative evaluation of performances of algae indices, pixel- and object-based machine learning algorithms in mapping floating algal blooms using Sentinel-2 imagery. Stoch. Environ. Res. Risk Assess. 2024, 38, 1613–1634.
13. Nguyen, H.Q.; Ha, N.T.; Pham, T.L. Inland harmful cyanobacterial bloom prediction in the eutrophic Tri An Reservoir using satellite band ratio and machine learning approaches. Environ. Sci. Pollut. Res. 2020, 27, 9135–9151.
14. Xie, Z.; Lou, I.; Ung, W.K.; Mok, K.M. Freshwater algal bloom prediction by support vector machine in Macau storage reservoirs. Math. Probl. Eng. 2012, 2012, 397473.
15. Pamula, A.S.P.; Gholizadeh, H.; Krzmarzick, M.J.; Mausbach, W.E.; Lampert, D.J. A remote sensing tool for near real-time monitoring of harmful algal blooms and turbidity in reservoirs. JAWRA J. Am. Water Resour. Assoc. 2023, 59, 929–949.
16. Medio, G.; Severino, V.; Teta, R.; Endreny, T.; Lega, M. Hierarchical Monitoring of Water Quality: Coordinating the Spatiotemporal Resolution of Multilayer and Multispectral Sensors to Characterize Pollution. In WIT Transactions on Ecology and the Environment; WIT Press: Southampton, UK, 2022; Volume 257.
17. Bagherian, K.; Fernández-Figueroa, E.G.; Rogers, S.R.; Wilson, A.E.; Bao, Y. Predicting Chlorophyll-a Concentration and Harmful Algal Blooms in Lake Okeechobee Using Time-Series MODIS Satellite Imagery and Long Short-Term Memory. J. ASABE 2024, 67, 619–632.
18. Ameer, S.; Shah, M.A.; Khan, A.; Song, H.; Maple, C.; Islam, S.U.; Asghar, M.N. Comparative analysis of machine learning techniques for predicting air quality in smart cities. IEEE Access 2019, 7, 128325–128338.
19. Kang, Y.; Ozdogan, M.; Zhu, X.; Ye, Z.; Hain, C.; Anderson, M. Comparative assessment of environmental variables and machine learning algorithms for maize yield prediction in the US Midwest. Environ. Res. Lett. 2020, 15, 064005.
20. Mermer, O.; Zhang, E.; Demir, I. Predicting Harmful Algal Blooms Using Ensemble Machine Learning Models and Explainable AI Technique: A Comparative Study. EarthArXiv 2024.
21. Huang, Z.; Ma, R.; Liu, H.; Xue, K.; Hu, M.; Wei, X.; Li, H. Short-term spatial prediction of algal blooms in Lake Taihu via machine learning and GOCI observations. J. Environ. Manag. 2025, 388, 125964.
22. Haynes, K.; Lagerquist, R.; McGraw, M.; Musgrave, K.; Ebert-Uphoff, I. Creating and evaluating uncertainty estimates with neural networks for environmental-science applications. Artif. Intell. Earth Syst. 2023, 2, e220061.
23. Pyo, J.; Park, L.J.; Pachepsky, Y.; Baek, S.S.; Kim, K.; Cho, K.H. Using convolutional neural network for predicting cyanobacteria concentrations in river water. Water Res. 2020, 186, 116349.
24. Teta, R.; Della Sala, G.; Esposito, G.; Stornaiuolo, M.; Scarpato, S.; Casazza, M.; Anastasio, A.; Lega, M.; Costantino, V. Monitoring Cyanobacterial Blooms during the COVID-19 Pandemic in Campania, Italy: The Case of Lake Avernus. Toxins 2021, 13, 471.
25. Esposito, G.; Glukhov, E.; Gerwick, W.H.; Medio, G.; Teta, R.; Lega, M.; Costantino, V. Lake Avernus Has Turned Red: Bioindicator Monitoring Unveils the Secrets of ‘Gates of Hades’. Toxins 2023, 15, 208.
26. Lu, D.; Ye, M.; Hill, M.C. Analysis of regression confidence intervals and Bayesian credible intervals for uncertainty quantification. Water Resour. Res. 2012, 48, W09521.
27. Ajaoud, M.; Ciccarelli, C.; De Mizio, M.; Gargiulo, M.; Parrilli, S.; Savarese, C.; Tufano, F.; Lega, M. Bridging Sustainability and Environmental Impact Assessment: Multi-Scale Bioindication and Remote Sensing for Pollution Monitoring in Agroecosystems. Sustainability 2025, 17, 4115.
28. Singh, G.; Moncrieff, G.; Venter, Z.; Cawse-Nicholson, K.; Slingsby, J.; Robinson, T.B. Uncertainty quantification for probabilistic machine learning in earth observation using conformal prediction. Sci. Rep. 2024, 14, 14954.
29. Vasseur, S.P.; Aznarte, J.L. Comparing quantile regression methods for probabilistic forecasting of NO2 pollution levels. Sci. Rep. 2021, 11, 10394.
30. Verbois, H.; Rusydi, A.; Thiery, A. Probabilistic forecasting of day-ahead solar irradiance using quantile gradient boosting. Sol. Energy 2018, 173, 313–327.
31. Poch, M.; Comas, J.; Rodríguez-Roda, I.; Sànchez-Marrè, M.; Cortés, U. Designing and building real environmental decision support systems. Environ. Model. Softw. 2004, 19, 857–873.
32. Reynolds, N.; Schaeffer, B.A.; Guertault, L.; Nelson, N.G. Satellite and in situ cyanobacteria monitoring: Understanding the impact of monitoring frequency on management decisions. J. Hydrol. 2023, 617, 128884.
33. Neil, C.; Spyrakos, E.; Hunter, P.D.; Tyler, A.N. A global approach for chlorophyll-a retrieval across optically complex inland waters based on optical water types. Remote Sens. Environ. 2019, 229, 159–178.
34. Reichstein, M.; Camps-Valls, G.; Stevens, B.; Jung, M.; Denzler, J.; Carvalhais, N.; Prabhat, F. Deep learning and process understanding for data-driven Earth system science. Nature 2019, 566, 195–204.
35. Hamilton, D.P.; Carey, C.C.; Arvola, L.; Arzberger, P.; Brewer, C.; Cole, J.J.; Gaiser, E.; Hanson, P.C.; Ibelings, B.W.; Jennings, E.; et al. A Global Lake Ecological Observatory Network (GLEON) for synthesising high-frequency sensor data for validation of deterministic ecological models. Inland Waters 2015, 5, 49–56.
36. Havens, K.E.; Hanlon, C.; James, R.T. Seasonal and spatial variation in algal bloom frequencies in Lake Okeechobee, Florida, USA. Lake Reserv. Manag. 1994, 10, 133–143.
37. Phlips, E.J.; Badylak, S.; Nelson, N.G.; Havens, K.E. Hurricanes, El Niño and harmful algal blooms in two sub-tropical Florida estuaries: Direct and indirect impacts. Sci. Rep. 2020, 10, 1910.
38. Lefler, F.W.; Barbosa, M.; Zimba, P.V.; Smyth, A.R.; Berthold, D.E.; Laughinghouse, H.D. Spatiotemporal diversity and community structure of cyanobacteria and associated bacteria in the large shallow subtropical Lake Okeechobee (Florida, United States). Front. Microbiol. 2023, 14, 1219261.
39. Lega, M.; d’Antonio, L.; Napoli, R.M.A. Cultural heritage and waste heritage: Advanced techniques to preserve cultural heritage, exploring just in time the ruins produced by disasters and natural calamities. In Management and the Environment V; Popov, V., Itoh, H., Mander, U., Brebbia, C.A., Eds.; WIT Press: Ashurst Lodge, UK, 2010; pp. 123–134.
40. Lega, M.; Napoli, R.M.A. Aerial infrared thermography in the surface waters contamination monitoring. Desalination Water Treat. 2010, 23, 141–151.
41. Lega, M.; Kosmatka, J.; Ferrara, C.; Russo, F.; Napoli, R.M.A.; Persechino, G. Using advanced aerial platforms and infrared thermography to track environmental contamination. Environ. Forensics 2012, 13, 332–338.
42. Mahdianpari, M.; Salehi, B.; Mohammadimanesh, F.; Homayouni, S.; Gill, E. The first wetland inventory map of Newfoundland at a spatial resolution of 10 m using Sentinel-1 and Sentinel-2 data on the Google Earth Engine cloud computing platform. Remote Sens. 2018, 11, 43.
43. Dorne, E.; Wetstone, K.; Cerquera, T.B.; Gupta, S. Cyanobacteria Detection in Small, Inland Water Bodies with CyFi. Proceedings of the AGU. 2024. Available online: https://proceedings.scipy.org/articles/PDHK7238 (accessed on 12 May 2025).
44. Paneru, B.; Paneru, B. AI for Water Sustainability: Global Water Quality Assessment and Prediction with Explainable AI with LLM Chatbot for Insights. arXiv 2024, arXiv:2409.10898.
45. Shah, F.U.; Khan, A.U.; Khan, A.W.; Ullah, B.; Ali, S.; Ahmad, I.; Shah, S.U. Comparative analysis of ensemble learning algorithms in water quality prediction. J. Hydroinform. 2024, 26, 3041–3058.
46. Van Nguyen, M.; Lin, C.H.; Chu, H.J.; Jaelani, L.M.; Syariz, M.A. Spectral feature selection optimization for water quality estimation. Int. J. Environ. Res. Public Health 2020, 17, 272.
47. Yin, X.; Fallah-Shorshani, M.; McConnell, R.; Fruin, S.; Chiang, Y.Y.; Franklin, M. Quantile extreme gradient boosting for uncertainty quantification. arXiv 2023, arXiv:2304.11732.
48. Cao, Q.; Wu, Y.; Yang, J.; Yin, J. Greenhouse temperature prediction based on time-series features and LightGBM. Appl. Sci. 2023, 13, 1610.
49. Toharudin, T.; Caraka, R.E.; Pratiwi, I.R.; Kim, Y.; Tai, S.K.; Yustiawan, T.; Purnama, A. How to Handle Unbalanced Classification of PM2.5 Concentration Levels by Observing Meteorological Parameters in Jakarta-Indonesia Using AdaBoost, XGBoost, CatBoost, and LightGBM. IEEE Access 2023, 11, 35989–36003.
50. Yu, Z.; Ma, J.; Qu, Y.; Pan, L.; Wan, S. PM2.5 extended-range forecast based on MJO and S2S using LightGBM. Sci. Total Environ. 2023, 873, 162369.
51. Zhang, X.; Jiang, X.; Li, Y. Prediction of air quality index based on the SSA-BiLSTM-LightGBM model. Sci. Rep. 2023, 13, 5550.
52. Zhou, S.; Song, C.; Zhang, J.; Chang, W.; Hou, W.; Yang, L. A hybrid prediction framework for water quality with integrated W-ARIMA-GRU and LightGBM methods. Water 2022, 14, 1322.
53. Microsoft Planetary Computer. Available online: https://ui.adsabs.harvard.edu/abs/2022zndo...7261896O/abstract (accessed on 12 May 2025).
54. Drusch, M.; Del Bello, U.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P.; et al. Sentinel-2: ESA’s optical high-resolution mission for GMES operational services. Remote Sens. Environ. 2012, 120, 25–36.
55. Muller-Karger, F.E.; Hestir, E.; Ade, C.; Turpie, K.; Roberts, D.A.; Siegel, D.; Miller, R.J.; Humm, D.; Izenberg, N.; Keller, M.; et al. Satellite sensor requirements for monitoring essential biodiversity variables of coastal ecosystems. Ecol. Appl. 2018, 28, 749–760.
56. Gupta, S.; Gelbart, E.; Gupta, R.; Wetstone, K.; Dorne, E. Cyanobacteria Aggregated Manual Labels Dataset (NASA and DrivenData); SeaBASS; NASA Ocean Biology Distributed Active Archive Center: Greenbelt, MD, USA, 2024.
57. Adjovu, G.E.; Stephen, H.; James, D.; Ahmad, S. Overview of the application of remote sensing in effective monitoring of water quality parameters. Remote Sens. 2023, 15, 1938.
58. Gholizadeh, M.H.; Melesse, A.M.; Reddi, L. A comprehensive review on water quality parameters estimation using remote sensing techniques. Sensors 2016, 16, 1298.
59. Lu, H.; Ma, X. Hybrid decision tree-based machine learning models for short-term water quality prediction. Chemosphere 2020, 249, 126169.
60. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In KDD ′16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM, Inc.: New York, NY, USA, 2016; pp. 785–794.
61. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3146–3154.
62. DrivenData. Tick Tick Bloom Benchmark Report. 2024. Available online: https://drivendata.co/blog/tick-tick-bloom-benchmark/ (accessed on 12 May 2025).
63. Bian, L.; Xie, H.; Wang, H.; Liu, H.; Meng, J.; Chen, J. Application, interpretability and prediction of machine learning method combined with LSTM and LightGBM-a case study for runoff simulation in an arid area. J. Hydrol. 2023, 625, 130091.
64. Rahmati, O.; Choubin, B.; Fathabadi, A.; Coulon, F.; Soltani, E.; Shahabi, H.; Mollaefar, E.; Tiefenbacher, J.; Cipullo, S.; Bin Ahmad, B.; et al. Predicting uncertainty of machine learning models for modelling nitrate pollution of groundwater using quantile regression and UNEEC methods. Sci. Total Environ. 2019, 688, 855–866.
65. Sun, W.; Tack, F.; Clarisse, L.; Schneider, R.; Stavrakou, T.; Van Roozendael, M. Inferring Surface NO2 Over Western Europe: A Machine Learning Approach With Uncertainty Quantification. J. Geophys. Res. Atmos. 2024, 129, e2023JD040676.
66. Cressie, N.; Calder, C.A.; Clark, J.S.; Ver Hoef, J.M.; Wikle, C.K. Accounting for uncertainty in ecological analysis: The strengths and limitations of hierarchical statistical modeling. Ecol. Appl. 2009, 19, 553–570.
67. World Health Organization. Guidelines for Safe Recreational Water Environments. Volume 1: Coastal and Fresh Waters; World Health Organization: Geneva, Switzerland, 2003; pp. 136–158.
68. Office of Water. Recommendations for Cyanobacteria and Cyanotoxin Monitoring in Recreational Waters; United States Environmental Protection Agency: Washington, DC, USA, 2019; p. 5.
69. Havens, K.E.; Ji, G.; Beaver, J.R.; Fulton, R.S., III; Teacher, C.E. Dynamics of cyanobacteria blooms are linked to the hydrology of shallow Florida lakes and provide insight into possible impacts of climate change. Hydrobiologia 2019, 829, 43–59.
70. Ahmed, N.K.; Atiya, A.F.; Gayar, N.E.; El-Shishiny, H. An empirical comparison of machine learning models for time series forecasting. Econom. Rev. 2010, 29, 594–621.
71. İleri, K. Comparative analysis of CatBoost, LightGBM, XGBoost, RF, and DT methods optimised with PSO to estimate the number of k-barriers for intrusion detection in WSNs. Int. J. Mach. Learn. Cybern. 2025, 16, 543–566.
72. Moreno, J.J.M.; Pol, A.P.; Abad, A.S.; Blasco, B.C. Using the R-MAPE index as a resistant measure of forecast accuracy. Psicothema 2013, 25, 500–506.
73. Ye, L.; Yang, G.; Van Ranst, E.; Tang, H. Time-series modeling and prediction of global monthly absolute temperature for environmental decision making. Adv. Atmos. Sci. 2013, 30, 382–396.
74. Yang, C.; Tan, Z.; Li, Y.; Shen, M.; Duan, H. A comparative analysis of machine learning methods for algal bloom detection using remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 8589–8605.
75. Nong, X.; Zeng, J.; Chen, L.; Wei, J.; Zhang, Y. A novel water quality risk assessment framework for reservoir water bodies coupling key parameter selection and dynamic warning threshold determination. Sci. Rep. 2025, 15, 1242.
76. Carrara, P.; Bordogna, G.; Boschetti, M.; Brivio, P.A.; Nelson, A.; Stroppiana, D. A flexible multi-source spatial-data fusion system for environmental status assessment at continental scale. Int. J. Geogr. Inf. Sci. 2008, 22, 781–799.
77. Chang, N.B.; Bai, K.; Chen, C.F. Integrating multisensor satellite data merging and image reconstruction in support of machine learning for better water quality management. J. Environ. Manag. 2017, 201, 227–240.
78. Chen, B.; Huang, B.; Xu, B. Multi-source remotely sensed data fusion for improving land cover classification. ISPRS J. Photogramm. Remote Sens. 2017, 124, 27–39.
79. Li, Z.; Wang, H.; Zhang, T.; Zeng, Q.; Xiang, J.; Liu, Z.; Yang, R. Multi-Source Precipitation Data Merging for High-Resolution Daily Rainfall in Complex Terrain. Remote Sens. 2023, 15, 4345.
80. He, Q.; Chen, C.; Wang, Y.; Sun, Y.; Liu, Y.; Hu, B. Fusion Method for Multi-Source Remote Sensing Daily Precipitation Data: Random Forest Model Considering Spatial Autocorrelation. J. Geo-Inf. Sci. 2024, 26, 1517–1530.
81. Mak, H.W.L.; Laughner, J.L.; Fung, J.C.H.; Zhu, Q.; Cohen, R.C. Improved Satellite Retrieval of Tropospheric NO2 Column Density via Updating of Air Mass Factor (AMF): Case Study of Southern China. Remote Sens. 2018, 10, 1789.
82. Peterson, K.T.; Sagan, V.; Sloan, J.J. Deep learning-based water quality estimation and anomaly detection using Landsat-8/Sentinel-2 virtual constellation and cloud computing. GIScience Remote Sens. 2020, 57, 735–748.
83. Samadzadegan, F.; Toosi, A.; Dadrass Javan, F.; Asghari, A.; Fathololoumi, S.; Biswas, A. A critical review on multi-sensor and multi-platform remote sensing data fusion approaches: Current status and prospects. Int. J. Remote Sens. 2025, 46, 1327–1402.
84. Yang, J.; Jiang, Y.; Song, Q.; Wang, Z.; Hu, Y.; Li, K.; Sun, Y. An Approach for Multi-Source Land Use and Land Cover Data Fusion Considering Spatial Correlations. Remote Sens. 2025, 17, 1131.
85. Zhang, J. Multi-source remote sensing data fusion: Status and trends. Int. J. Image Data Fusion 2010, 1, 5–24.
86. Cao, W.; Qi, W.; Lu, P. Air quality prediction based on time series decomposition and convolutional sparse self-attention mechanism transformer model. IEEE Access 2024, 12, 156789–156801.
87. Chen, Y.; Chen, X.; Xu, A.; Sun, Q.; Peng, X. A hybrid CNN-Transformer model for ozone concentration prediction. Air Qual. Atmos. Health 2022, 15, 1449–1463.
88. Liu, S.; Hu, Y. Air quality prediction based on factor analysis combined with Transformer and CNN-BILSTM-ATTENTION models. Sci. Rep. 2025, 15, 2156.
89. Kumari, S.; Singh, S.K. Machine learning-based time series models for effective CO2 emission prediction in India. Environ. Sci. Pollut. Res. 2023, 30, 21844–21856.









| Model | RMSE (10 d) | MAE (10 d) | R² (10 d) | RMSE (20 d) | MAE (20 d) | R² (20 d) |
|---|---|---|---|---|---|---|
| LightGBM | 0.2333 | 0.2073 | 0.4760 | 0.4309 | 0.4135 | −0.0007 |
| XGBoost | 0.3017 | 0.2640 | 0.1236 | 0.3928 | 0.3790 | 0.1684 |
| Random Forest | 0.3235 | 0.3006 | −0.0075 | 0.4438 | 0.4283 | −0.0614 |
| Ridge Regression | 0.5482 | 0.4048 | −1.8925 | 0.4007 | 0.3322 | 0.1347 |
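For readers who want to reproduce the kind of error measures reported in the table above, the sketch below computes RMSE, MAE, R², and MAPE from paired observed and forecast values. It is a generic illustration using the standard definitions, not the authors' evaluation script; the function name and the toy series are hypothetical. The second toy forecast is deliberately poor, to show how R² becomes negative when a forecast does worse than simply predicting the mean of the observations, as happens for several entries in the table.

```python
from math import sqrt
from typing import Dict, Sequence


def forecast_metrics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return RMSE, MAE, R^2, and MAPE (in percent) for paired series."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observed value is zero")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # R^2 is undefined (NaN) when the observations have zero variance.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "mape_pct": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


# Toy series (hypothetical values, not results from the study).
observed = [2.0, 4.0, 6.0, 8.0]
close_forecast = [2.5, 3.5, 6.5, 7.5]
poor_forecast = [8.0, 6.0, 4.0, 2.0]

good = forecast_metrics(observed, close_forecast)
bad = forecast_metrics(observed, poor_forecast)
print(good)
print(bad)
```

Because MAPE divides by the observed value, it is only meaningful when the target stays away from zero, which is worth checking before comparing MAPE across horizons.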
| Forecast Period | Predicted Date | Predicted Value | Risk Classification |
|---|---|---|---|
| t + 1 (10 days ahead) | 12 August 2025 | 6.9138 | MEDIUM |
| t + 2 (20 days ahead) | 22 August 2025 | 7.3592 | MEDIUM |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Qamar, M.Z.; Ciccarelli, C.; Ajaoud, M.; Lega, M. Probabilistic Water Quality Monitoring Using Multi-Temporal Sentinel-2 Data: A Situational Awareness Framework for Harmful Algal Bloom Forecasting. Remote Sens. 2026, 18, 959. https://doi.org/10.3390/rs18060959

