Article

Detecting Harmful Algae Blooms (HABs) on the Ohio River Using Landsat and Google Earth Engine

1 Department of Geography and Geoinformation Science (GGS), George Mason University, Fairfax, VA 22030, USA
2 Global Environment and Natural Resources Institute (GENRI), Department of Geography and Geoinformation Science (GGS), George Mason University, Fairfax, VA 22030, USA
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(24), 4010; https://doi.org/10.3390/rs17244010
Submission received: 29 October 2025 / Revised: 4 December 2025 / Accepted: 8 December 2025 / Published: 12 December 2025
(This article belongs to the Special Issue Remote Sensing for Monitoring Harmful Algal Blooms (Second Edition))

Highlights

What are the main findings?
  • Satellite analysis revealed that the 2015 Ohio River HAB event affected 636.5 river miles, more than 20 times the 30-mile extent detected through ground-based monitoring alone.
  • The ensemble machine learning approach combining Support Vector Regression, Neural Networks, and Extreme Gradient Boosting achieved a correlation coefficient of 0.85 with ground-truth measurements, demonstrating operational reliability for large river systems.
What is the implication of the main finding?
  • This study provides a validated operational framework for integrating satellite-based HAB monitoring with existing ground-based surveillance systems in large river environments.
  • The quantitative relationships between environmental factors and bloom development provide essential tools for climate change adaptation planning and future HAB risk assessment.

Abstract

Harmful Algal Blooms (HABs) in large river systems present significant challenges for water quality monitoring, with traditional in-situ sampling methods limited by spatial and temporal coverage. This study evaluates the effectiveness of machine learning techniques applied to Landsat spectral data for detecting and quantifying HABs in the Ohio River system, with particular focus on the unprecedented 2015 bloom event. Our methodology combines Google Earth Engine (GEE) for satellite data processing with an ensemble machine learning approach incorporating Support Vector Regression (SVR), Neural Networks (NN), and Extreme Gradient Boosting (XGB). Analysis of Landsat 7 and 8 data revealed that the 2015 HAB event had both a broader spatial extent (636.5 river miles) and an earlier onset (by 5–7 days) than detected through conventional monitoring. The ensemble model achieved a correlation coefficient of 0.85 with ground-truth measurements and demonstrated robust performance in detecting varying bloom intensities (R² = 0.82). Field validation using ORSANCO monitoring stations confirmed the model’s reliability (Nash-Sutcliffe Efficiency = 0.82). The integration of multispectral indices, particularly the Floating Algae Index (FAI) and Normalized Difference Chlorophyll Index (NDCI), improved detection accuracy by 23% compared to single-index approaches. The GEE-based framework enables near real-time processing and automated alert generation, making it suitable for operational deployment in water management systems. These findings demonstrate the potential for satellite-based HAB monitoring to complement existing ground-based systems and establish a foundation for improved early warning capabilities in large river systems through the integration of remote sensing and machine learning techniques.

1. Introduction

Harmful Algal Blooms (HABs) in major river systems pose significant risks to human health, aquatic ecosystems, and local economies, particularly affecting regions dependent on these water sources for drinking water and recreation. The Ohio River, spanning 981 miles and serving as a vital water source for over 5 million people, exemplifies these challenges as recurring HAB events threaten both water quality and public safety. Recent advances in satellite remote sensing and machine learning technologies offer unprecedented opportunities to enhance HAB monitoring capabilities beyond traditional ground-based approaches [1,2].

1.1. Background and Significance

The Ohio River ecosystem supports over 300 algal species across eight taxonomic divisions, with diatoms (Bacillariophyta), green algae (Chlorophyta), and blue-green algae (Cyanobacteria) being the most prevalent. Under specific environmental conditions—notably low flow rates, clear water, and elevated temperatures—certain species can undergo rapid proliferation or “blooming,” creating potentially hazardous conditions for both aquatic life and human use [3,4].
A critical incident occurred in August 2015 when a significant bloom was first reported at Pike Island Locks and Dam (mile 84.2). The bloom, identified as Microcystis aeruginosa, produced microcystin concentrations reaching 3000 μg/L and expanded to affect over 600 river miles. This event triggered multi-state recreational advisories across Ohio, West Virginia, Kentucky, and Indiana, marking the most extensive HAB event in the river’s recorded history [5]. A subsequent significant bloom in September 2019 near Russell, KY, involving multiple Microcystis species, further highlighted the recurring nature of this challenge and the need for improved monitoring strategies [6].

1.2. Current Monitoring Approaches and Limitations

The Ohio River Valley Water Sanitation Commission (ORSANCO), representing eight states and the federal government, coordinates HAB monitoring and response efforts across the river system. Traditional monitoring relies primarily on in-situ sampling, which, while accurate for point measurements, faces significant limitations in spatial and temporal coverage [7]. These limitations became particularly evident during the 2015 bloom event, when conventional monitoring initially underestimated both the spatial extent and the early development stages of the bloom.
Recent studies have highlighted the inadequacy of point-based sampling for capturing the full scope of HAB events in large river systems. Preece et al. [8] demonstrated similar challenges in the Sacramento-San Joaquin Delta, where traditional monitoring failed to detect bloom formation in remote areas. The need for comprehensive spatial coverage has become increasingly critical as climate change intensifies the frequency and severity of HAB events [9].
The economic implications of HABs and monitoring limitations are substantial. Economic assessments indicate that HAB events can cost affected communities millions of dollars through water treatment disruptions, recreational losses, and public health responses [10,11]. The 2014 Toledo water crisis alone resulted in economic losses exceeding $65 million, highlighting the critical need for cost-effective early warning systems [12]. Traditional monitoring programs, while essential, require significant operational costs for sample collection, laboratory analysis, and field personnel deployment, often limiting the frequency and spatial coverage of assessments [13].

1.3. Remote Sensing Applications for Monitoring HABs

Remote sensing technologies have emerged as a promising complement to traditional monitoring methods, offering potential for systematic, broad-scale observation of water quality parameters. Recent advances in satellite-based monitoring, particularly through platforms like Landsat and Sentinel, have demonstrated increasing capability in detecting and tracking HABs across large water bodies [14,15,16]. These systems utilize specific spectral signatures associated with algal pigments and surface accumulations to identify potential bloom conditions.
Google Earth Engine (GEE) has revolutionized large-scale environmental monitoring by providing cloud-based computational capabilities for processing vast satellite datasets. Technical advances in GEE optimization have enabled efficient analysis of multi-temporal imagery across continental scales, with particular success in water quality applications [17,18]. Recent implementations have demonstrated the platform’s capability for near real-time processing of Landsat and Sentinel data, enabling operational monitoring systems that can process decades of imagery within hours rather than months [19,20]. Optimization strategies including adaptive cloud masking, temporal compositing, and parallel processing have reduced computational costs by up to 85% while maintaining analytical accuracy [21,22,23].
The integration of satellite data with ground-based measurements shows particular promise in improving HAB detection and monitoring capabilities. Studies utilizing various spectral indices, including the Floating Algae Index (FAI) and Normalized Difference Chlorophyll Index (NDCI), have demonstrated success in identifying and quantifying algal blooms across different aquatic environments [24,25,26]. Machine learning approaches have further enhanced these capabilities [27,28,29,30], with Hill et al. [31] demonstrating improved accuracy through neural network implementations and Rawat et al. [32] showing the effectiveness of automated processing pipelines.
However, challenges remain in adapting these approaches to river systems, where varying flow conditions and complex channel morphology can affect detection accuracy. The temporal resolution limitations of current satellite platforms also present challenges for real-time monitoring applications [33,34].

2. Materials and Methods

2.1. Study Area and Data Sources

The study area encompasses the Ohio River’s entire 1579 km (981 miles) length from Pittsburgh, PA to Cairo, IL, with particular emphasis on the reach affected by the 2015 HAB event (river kilometers 135–1160) as seen in Figure 1. This river system includes 20 lock and dam structures creating a series of pools, each with distinct hydrodynamic characteristics that influence HAB development and movement. The river’s width varies significantly throughout its course, ranging from approximately 0.4 to 1.6 km, with an average depth of 7.3 m in the navigation channel.
Ground-truth data were obtained from the Ohio River Valley Water Sanitation Commission (ORSANCO) monitoring program during the 2015 HAB event. ORSANCO maintained nine monitoring stations strategically positioned along the river, collecting comprehensive measurements including the following:
  • Microcystin concentration (μg/L) via enzyme-linked immunosorbent assay (ELISA);
  • Chlorophyll-a levels via fluorometric analysis;
  • Water temperature (°C);
  • Dissolved oxygen (mg/L);
  • Turbidity (NTU);
  • Flow velocity (m/s).
Sampling frequency varied based on bloom conditions, ranging from daily collections during peak bloom periods (21 August–5 September 2015) to weekly sampling during normal conditions. All measurements were spatially referenced using the Ohio River Kilometer (ORKm) system, converted from the historical Ohio River Mile (ORM) system for this study.
A total of 91 water samples with complete microcystin measurements were collected between 15 August and 30 September 2015, providing the ground truth dataset for satellite validation. Sample collection followed ORSANCO standard operating procedures with samples collected 0.3 m below the surface from mid-channel locations.

2.2. Satellite Data Acquisition and Processing

2.2.1. Landsat 7 and 8 Imagery Selection

This study utilized both Landsat 7 ETM+ and Landsat 8 OLI imagery to maximize temporal coverage during the 2015 bloom event. The combined constellation provided effective 8-day repeat coverage, compared to 16-day coverage from a single satellite. While Landsat 7’s scan line corrector (SLC) failure creates data gaps in individual scenes, these gaps were successfully managed in the river corridor through:
  • Temporal compositing: Combining multiple overpasses within 8-day windows to fill spatial gaps;
  • Multi-path coverage: The Ohio River spans four path/row combinations (paths 17–18, rows 32–33), providing overlapping coverage that minimizes gap impacts;
  • Selective scene use: SLC gaps were evaluated on a scene-by-scene basis, with scenes retained when gaps did not affect the active river channel;
  • Gap interpolation: For narrow SLC gaps (<2 pixels) crossing the river, linear interpolation from adjacent valid pixels was applied.
Imagery acquisition focused on 15 August to 30 September 2015, capturing the bloom’s initial formation, peak development, and decline phases. Both sensors provide 30-m spatial resolution with comparable radiometric quality. The study area spans four Landsat path/row combinations, requiring mosaicking of adjacent scenes for complete river coverage.
Landsat 7 ETM+ spectral bands utilized:
  • Band 1 (Blue): 450–520 nm
  • Band 2 (Green): 520–600 nm
  • Band 3 (Red): 630–690 nm
  • Band 4 (Near-Infrared, NIR): 770–900 nm
  • Band 5 (Shortwave Infrared 1, SWIR1): 1550–1750 nm
  • Band 7 (Shortwave Infrared 2, SWIR2): 2090–2350 nm
Landsat 8 OLI spectral bands utilized:
  • Band 2 (Blue): 450–510 nm
  • Band 3 (Green): 530–590 nm
  • Band 4 (Red): 640–670 nm
  • Band 5 (Near-Infrared, NIR): 850–880 nm
  • Band 6 (Shortwave Infrared 1, SWIR1): 1570–1650 nm
  • Band 7 (Shortwave Infrared 2, SWIR2): 2110–2290 nm
Scene selection criteria included:
  • Cloud coverage < 20% over study area
  • Solar zenith angle < 60° (to minimize sun glint)
  • No sensor anomalies or data quality issues flagged in QA bands
  • For Landsat 7: SLC gaps not intersecting primary river corridor
  • Temporal coincidence within ±3 days of ground sampling when available
A total of 12 usable Landsat 8 scenes and 8 usable Landsat 7 scenes covering the study period were identified and processed, providing effective 8-day temporal resolution across the bloom event duration.
Sensor harmonization: Landsat 7 and 8 spectral bands have slightly different center wavelengths and bandwidths. To ensure consistency in spectral index calculations, we applied the harmonization coefficients from Roy et al. [35] to transform Landsat 7 surface reflectance to Landsat 8 equivalents:
  • NIR: L7_harmonized = 0.9723 × L7_original + 0.0005
  • Red: L7_harmonized = 0.9237 × L7_original + 0.0034
  • SWIR1: L7_harmonized = 0.9548 × L7_original + 0.0022
These transformations ensure that FAI and NDCI values are comparable across both sensors, enabling seamless integration of observations from both platforms in the ensemble modeling framework.
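As a plain-JavaScript illustration of the per-pixel arithmetic (not the GEE implementation), the linear transforms listed above can be applied as gain/offset pairs. The helper name `harmonizeL7` and the pixel object layout are hypothetical; the coefficients are copied from the list above, and bands without listed coefficients are passed through unchanged.

```javascript
// Hypothetical helper: Landsat 7 -> Landsat 8 harmonization using the
// gain/offset pairs listed in the text (surface reflectance, 0-1 scale).
const L7_TO_L8 = {
  nir: { gain: 0.9723, offset: 0.0005 },
  red: { gain: 0.9237, offset: 0.0034 },
  swir1: { gain: 0.9548, offset: 0.0022 },
};

function harmonizeL7(pixel) {
  // Copy the input so bands without coefficients pass through unchanged.
  const out = { ...pixel };
  for (const [band, { gain, offset }] of Object.entries(L7_TO_L8)) {
    if (typeof pixel[band] === "number") {
      out[band] = gain * pixel[band] + offset;
    }
  }
  return out;
}
```

For example, a Landsat 7 pixel with red = 0.05 becomes 0.9237 × 0.05 + 0.0034 ≈ 0.0496 on the Landsat 8 scale.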

2.2.2. Atmospheric Correction

All Landsat 8 imagery utilized Collection 2 Level-2 surface reflectance products, which apply the Land Surface Reflectance Code (LaSRC) atmospheric correction algorithm [36]. LaSRC corrects for:
  • Rayleigh scattering (molecular atmosphere);
  • Aerosol scattering and absorption;
  • Water vapor absorption;
  • Ozone absorption.
LaSRC is optimized for land surfaces and may underestimate atmospheric path radiance over dark water targets. For aquatic applications, this can introduce systematic bias of 0.001–0.003 reflectance units in visible bands [37]. This represents a source of uncertainty in our retrievals, though the relative magnitude of this error is small compared to bloom-induced reflectance changes (typically 0.01–0.05 reflectance units in NIR).
We did not apply additional water-specific atmospheric correction (e.g., 6S modeling or dark pixel subtraction) beyond the standard LaSRC processing. This decision was made to ensure reproducibility using only standard Google Earth Engine products, though future operational implementations may benefit from enhanced atmospheric correction tailored to turbid inland waters.

2.2.3. Adjacency Effect Considerations

For narrow aquatic systems like rivers, the adjacency effect—radiance scattered from adjacent bright land surfaces contaminating water pixels—represents a primary source of radiometric error. The Ohio River’s width (400–1600 m) and adjacent land cover (vegetation, urban areas, agricultural fields) create conditions where adjacency effects can contribute 15–25% uncertainty to surface reflectance measurements [16].
Landsat Collection 2 processing does not include adjacency correction. To minimize adjacency contamination in our analysis, we implemented the following spatial filtering approach:
  • Buffer exclusion: Water pixels within 90 m (3 Landsat pixels) of land boundaries were excluded from analysis;
  • Narrow section masking: River sections < 270 m width had insufficient valid water pixels after buffering and were excluded, creating data gaps representing ~18% of total river length;
  • Visual inspection: Remaining pixels were visually inspected for anomalous brightness suggesting residual land contamination.
This conservative approach reduces adjacency effects but limits spatial coverage in narrow river sections. The excluded areas included several ORSANCO monitoring stations, reducing our validation dataset from 91 to 88 matched samples.
We acknowledge that adjacency effects likely contribute systematic positive bias to our reflectance measurements in remaining river pixels, particularly in the NIR and SWIR bands used for spectral index calculation. This bias is difficult to quantify without dedicated field measurements of inherent optical properties, and represents a limitation of this study.
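The 90 m buffer exclusion amounts to eroding the water mask by 3 pixels. The sketch below is a plain-JavaScript illustration on a boolean grid, assuming a square 7 × 7 window (a pixel is kept only if every pixel within 3 pixels in each direction is water, with pixels off the grid treated as land); the study's exact neighbourhood shape is not stated, and `erodeWaterMask` is a hypothetical name.

```javascript
// water: 2D array of booleans (true = water), one entry per 30 m pixel.
// Returns a new mask keeping only water pixels with no land within radiusPx.
function erodeWaterMask(water, radiusPx = 3) {
  const rows = water.length;
  const cols = water[0].length;
  const out = water.map((row) => row.map(() => false));
  for (let i = 0; i < rows; i++) {
    for (let j = 0; j < cols; j++) {
      if (!water[i][j]) continue;
      let interior = true;
      for (let di = -radiusPx; di <= radiusPx && interior; di++) {
        for (let dj = -radiusPx; dj <= radiusPx; dj++) {
          const y = i + di;
          const x = j + dj;
          // Pixels outside the grid count as land (conservative choice).
          if (y < 0 || y >= rows || x < 0 || x >= cols || !water[y][x]) {
            interior = false;
            break;
          }
        }
      }
      out[i][j] = interior;
    }
  }
  return out;
}
```

With a 3-pixel radius, a channel must be at least 7 pixels (210 m) wide for any pixel to survive, which is why narrow sections lose most or all of their valid water pixels.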

2.3. Google Earth Engine Implementation

The Google Earth Engine (GEE) cloud computing platform was used for all satellite data processing and analysis. GEE provides access to the complete Landsat archive with pre-processed surface reflectance products, eliminating the need for local data storage and enabling efficient processing of large spatiotemporal datasets.

2.3.1. Study Area Definition

River centerline coordinates were extracted from the USGS National Hydrography Dataset (NHD) and uploaded to GEE as a geometry feature. A 2 km buffer zone was applied to the centerline using the ee.Geometry.buffer(2000) method, creating a continuous analysis corridor encompassing the river’s full width plus immediate surroundings. This buffer size was selected through iterative testing to balance computational efficiency with complete river coverage, given maximum river widths of 1.6 km.

2.3.2. Image Collection and Filtering

Landsat 8 OLI Collection 2 Level-2 surface reflectance images were filtered using the following GEE code structure:
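```javascript
// riverGeometry and maskClouds are defined in the complete code (Appendix A.1).
var L8 = ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
  .filterBounds(riverGeometry)
  .filterDate('2015-08-15', '2015-10-01') // end date is exclusive: keeps 30 Sep
  .filter(ee.Filter.lt('CLOUD_COVER', 20)) // scene-level cloud cover (%)
  .map(maskClouds);
```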
(See Appendix A.1 for complete implementation code).
Cloud masking utilized the QA_PIXEL band following Landsat Collection 2 quality assessment protocols, removing pixels flagged as cloud, cloud shadow, or saturated. Water masking employed a modified Normalized Difference Water Index (NDWI):
NDWI = (Green − NIR)/(Green + NIR)
Water pixels were defined as NDWI > 0.2, with manual refinement to exclude known land areas. Water masking procedures utilized an enhanced NDWI calculation accounting for turbidity variations common in large river systems, maintaining reliable water surface delineation even in highly turbid tributary confluence zones (Appendix A.2).
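The per-pixel logic of the cloud and water masks can be illustrated in plain JavaScript. This is a sketch, not the Appendix A.1/A.2 code: it assumes the Landsat Collection 2 Level-2 scale factor (DN × 0.0000275 − 0.2) and the Collection 2 QA_PIXEL bit layout (bit 1 dilated cloud, bit 3 cloud, bit 4 cloud shadow). Saturation flags are carried in the separate QA_RADSAT band and are not handled here, nor are the manual refinement and turbidity-adjusted NDWI. Function names are hypothetical.

```javascript
// Collection 2 Level-2 surface reflectance scaling.
function scaleSR(dn) {
  return dn * 0.0000275 - 0.2;
}

// QA_PIXEL bits (Collection 2): 1 = dilated cloud, 3 = cloud, 4 = cloud shadow.
const CLOUD_MASK_BITS = (1 << 1) | (1 << 3) | (1 << 4);

function isClearPixel(qaPixel) {
  return (qaPixel & CLOUD_MASK_BITS) === 0;
}

// NDWI = (Green - NIR) / (Green + NIR)
function ndwi(green, nir) {
  return (green - nir) / (green + nir);
}

// Water pixel: cloud-free and NDWI above the threshold used in the text (0.2).
function isWaterPixel(greenDN, nirDN, qaPixel, threshold = 0.2) {
  if (!isClearPixel(qaPixel)) return false;
  const g = scaleSR(greenDN);
  const n = scaleSR(nirDN);
  if (g + n <= 0) return false; // avoid dividing by zero or negative sums
  return ndwi(g, n) > threshold;
}
```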

2.3.3. Mosaicking and Compositing

For dates with multiple overlapping scenes, cloud-free pixels were mosaicked using the ee.ImageCollection.mosaic() function, which stacks images in collection order with later images on top, so each output pixel takes the value of the last unmasked image. When multiple cloud-free observations existed within a 3-day window, median compositing was applied to reduce noise while maintaining temporal resolution adequate for bloom dynamics.
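The difference between the two compositing rules can be seen on a single pixel's stack of observations. The plain-JavaScript sketch below assumes values are in collection order, with `null` marking a masked observation; it mirrors GEE's documented "last image on top" mosaic behaviour and a standard median, and the function names are hypothetical.

```javascript
// values: one pixel's observations in collection order; null = masked.
function mosaicPixel(values) {
  // Later images sit on top, so the last unmasked value wins.
  for (let i = values.length - 1; i >= 0; i--) {
    if (values[i] !== null) return values[i];
  }
  return null;
}

function medianPixel(values) {
  const valid = values.filter((v) => v !== null).sort((a, b) => a - b);
  if (valid.length === 0) return null;
  const mid = Math.floor(valid.length / 2);
  return valid.length % 2 === 1 ? valid[mid] : (valid[mid - 1] + valid[mid]) / 2;
}
```

The mosaic simply keeps one observation, while the median suppresses a single anomalous value (for example, residual haze) when three or more clear observations are available.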

2.4. Spectral Index Calculation

Two spectral indices were calculated for HAB detection, selected based on established performance in turbid inland waters:

2.4.1. Floating Algae Index (FAI)

The Floating Algae Index (FAI) detects surface algal accumulations via enhanced NIR reflectance [38]:
FAI = R_NIR − R_NIR′
where
R_NIR′ = R_red + (R_SWIR1 − R_red) × (λ_NIR − λ_red)/(λ_SWIR1 − λ_red)
R represents surface reflectance and λ represents central wavelength (nm). For Landsat 8:
  • R_red = Band 4
  • R_NIR = Band 5
  • R_SWIR1 = Band 6
FAI removes the baseline reflectance contribution from water and aerosols, isolating the NIR peak characteristic of surface algal blooms. Positive FAI values (typically 0.001–0.05) indicate potential bloom presence.
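The FAI baseline interpolation can be written out directly. This plain-JavaScript sketch assumes approximate Landsat 8 OLI central wavelengths of 655 nm (Band 4), 865 nm (Band 5) and 1609 nm (Band 6); these values are not given in the text and are an assumption. The name `fai` is illustrative.

```javascript
// Assumed OLI central wavelengths (nm); not stated in the source text.
const OLI_WAVELENGTHS = { red: 655, nir: 865, swir1: 1609 };

// FAI = R_NIR - R_NIR', with R_NIR' linearly interpolated between red and SWIR1.
function fai(red, nir, swir1, wl = OLI_WAVELENGTHS) {
  const baseline = red + (swir1 - red) * (wl.nir - wl.red) / (wl.swir1 - wl.red);
  return nir - baseline;
}
```

A spectrum that is flat across red, NIR and SWIR1 gives FAI = 0, while an NIR reflectance above the red-SWIR1 line (e.g. red 0.04, NIR 0.08, SWIR1 0.02) gives a positive value of about 0.044.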

2.4.2. Normalized Difference Chlorophyll Index (NDCI)

The Normalized Difference Chlorophyll Index (NDCI) estimates chlorophyll-a concentration via the red-edge reflectance peak [24]:
NDCI = (R_NIR − R_red)/(R_NIR + R_red)
For Landsat 8:
  • R_red = Band 4 (640–670 nm)
  • R_NIR = Band 5 (850–880 nm)
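Because Landsat has no band at the ~708 nm red-edge peak used in the original NDCI formulation [24], the NIR band substitutes for it here; in this form the index is computationally identical to the Normalized Difference Vegetation Index (NDVI). NDCI values typically range from −0.2 to +0.2, with values > 0.1 indicating elevated chlorophyll-a (>20 μg/L) characteristic of bloom conditions. However, in highly turbid waters, NDCI can exhibit reduced sensitivity or even negative correlations due to suspended sediment interference. Figure 2 illustrates the complete spectral index methodology flowchart, showing the processing steps from surface reflectance to FAI and NDCI calculation.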
Figure 3 demonstrates the application of these spectral indices to Landsat 7 ETM+ imagery from 28 August 2015, showing harmful algal bloom detection in the Cincinnati reach of the Ohio River through multiple visualization approaches.
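For completeness, the Landsat form of NDCI given above (Band 5 NIR against Band 4 red) and the > 0.1 screening threshold can be sketched as follows; the function names are hypothetical and the zero-sum guard is an added assumption.

```javascript
// NDCI (Landsat form in the text) = (R_NIR - R_red) / (R_NIR + R_red)
function ndci(red, nir) {
  const sum = nir + red;
  if (sum === 0) return NaN; // undefined when both reflectances are zero
  return (nir - red) / sum;
}

// Screening rule from the text: NDCI > 0.1 suggests elevated chlorophyll-a.
function flagsElevatedChl(red, nir, threshold = 0.1) {
  return ndci(red, nir) > threshold;
}
```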

2.5. Machine Learning Model Development

2.5.1. Model Architecture

An ensemble machine learning approach was developed to predict microcystin concentration from satellite spectral data. The ensemble combined three regression algorithms, implemented in Python 3 with the scikit-learn library [39] and scikit-learn-compatible interfaces, using spectral data retrieved through the GEE Python API:
Support Vector Regression (SVR):
  • Kernel: Radial basis function (RBF)
  • Regularization parameter (C): 100
  • Kernel coefficient (γ): 0.001
  • Epsilon: 0.1
Neural Network (Multi-layer Perceptron Regressor):
  • Architecture: 8 input features → 50 neurons (hidden layer 1) → 25 neurons (hidden layer 2) → 1 output
  • Activation function: Rectified Linear Unit (ReLU)
  • Solver: Adam optimizer
  • Learning rate: 0.001
  • Max iterations: 500
Extreme Gradient Boosting (XGBoost):
  • Number of estimators: 100
  • Maximum tree depth: 5
  • Learning rate: 0.1
  • Subsample ratio: 0.8

2.5.2. Input Features

Model input consisted of eight features extracted for each water pixel:
  • Blue reflectance (Band 2)
  • Green reflectance (Band 3)
  • Red reflectance (Band 4)
  • NIR reflectance (Band 5)
  • SWIR1 reflectance (Band 6)
  • SWIR2 reflectance (Band 7)
  • FAI (calculated)
  • NDCI (calculated)
All reflectance values were rescaled to the 0–1 range prior to model training. The calculated indices (FAI, a difference index, and NDCI, a normalized difference index) were not rescaled.
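The text does not specify how the 0–1 rescaling was parameterized. A common choice, assumed here, is min–max normalization with each band's minimum and maximum taken from the training period only, so that the test period does not influence the scaling. The plain-JavaScript sketch below uses hypothetical names and treats each sample as an array of feature values.

```javascript
// Fit per-feature min/max on training rows only (featureIdx = columns to scale).
function fitMinMax(rows, featureIdx) {
  return featureIdx.map((j) => {
    const col = rows.map((r) => r[j]);
    return { j, min: Math.min(...col), max: Math.max(...col) };
  });
}

// Apply the fitted scaling; columns not listed (e.g. FAI, NDCI) pass through.
function applyMinMax(rows, params) {
  return rows.map((r) => {
    const out = r.slice();
    for (const { j, min, max } of params) {
      out[j] = max > min ? (r[j] - min) / (max - min) : 0;
    }
    return out;
  });
}
```

Test-period values outside the training range map slightly below 0 or above 1; this is expected with training-only parameters and is preferable to refitting on test data.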

2.5.3. Target Variable

The prediction target was microcystin concentration (μg/L) measured via ELISA in ORSANCO water samples. Microcystin values ranged from <1 μg/L (non-detect) to 3000 μg/L during peak bloom conditions, with 7 samples exceeding 100 μg/L (EPA recreational advisory threshold).
Non-detect samples (<1 μg/L) were assigned a value of 0.5 μg/L (half the detection limit) for regression modeling.

2.5.4. Training and Validation Approach

To ensure robust performance estimation and avoid temporal autocorrelation bias, the dataset was partitioned using temporal holdout cross-validation rather than random splitting. This approach tests the model’s ability to predict future bloom conditions rather than simply interpolating within the training period.
Temporal Partitioning:
  • Training period: 15 August–15 September 2015 (n = 62 matched satellite-ground pairs)
  • Testing period: 16 September–30 September 2015 (n = 26 matched satellite-ground pairs)
This temporal separation ensures that model validation reflects true predictive capability on independent future observations, addressing a critical limitation in many environmental monitoring studies where random cross-validation can overestimate performance due to temporal autocorrelation in sequential measurements.
Training Procedure:
Within the training period, 5-fold temporal cross-validation was employed to optimize hyperparameters while maintaining temporal ordering. Each fold consisted of sequential samples, ensuring that earlier time periods were used to predict later periods within the training phase. Hyperparameter optimization targeted minimization of root mean square error (RMSE) across the five validation folds.
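A sketch of the temporal split and of forward-chaining folds, in plain JavaScript. The exact fold construction used in the study is not stated; the version below assumes an expanding training window with equal-sized validation blocks (similar in spirit to scikit-learn's `TimeSeriesSplit`), and the function names are hypothetical. Samples must already be sorted by date.

```javascript
// Temporal holdout: ISO date strings (YYYY-MM-DD) compare chronologically.
function temporalHoldout(samples, lastTrainingDate) {
  return {
    train: samples.filter((s) => s.date <= lastTrainingDate),
    test: samples.filter((s) => s.date > lastTrainingDate),
  };
}

// Forward-chaining folds over time-ordered indices: each fold trains on all
// earlier samples and validates on the next block.
function forwardChainingSplits(nSamples, nFolds) {
  const testSize = Math.floor(nSamples / (nFolds + 1));
  if (testSize < 1) throw new Error("too few samples for the requested folds");
  const firstTest = nSamples - nFolds * testSize;
  const splits = [];
  for (let k = 0; k < nFolds; k++) {
    const start = firstTest + k * testSize;
    splits.push({
      train: Array.from({ length: start }, (_, i) => i),
      test: Array.from({ length: testSize }, (_, i) => start + i),
    });
  }
  return splits;
}
```

With the 62 training pairs and 5 folds, this assumed scheme validates on blocks of 10 samples, starting with training on the first 12.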
Performance Assessment:
Model performance was evaluated on multiple criteria:
  • Training performance: assessed via 5-fold cross-validation within the training period
  • Testing performance: assessed on the completely independent 16–30 September test period
  • Overfitting assessment: quantified as the difference between training and testing R² values, with differences < 0.05 considered acceptable
This rigorous validation framework ensures that reported performance metrics (Section 3.3) reflect genuine predictive capability suitable for operational deployment rather than optimistic estimates from inadequately validated methods.

2.5.5. Ensemble Method

The ensemble approach combined individual algorithm predictions through weighted averaging, with weights optimized to maximize validation performance while maintaining interpretability. Rather than equal weighting, which assumes all algorithms contribute equivalently regardless of their demonstrated accuracy, the weights were determined based on each algorithm’s cross-validation performance during the training period.
Weight Optimization:
Ensemble weights were optimized on a validation subset (20% of the training data, temporally separated) to avoid overfitting to the training data while preventing information leakage from the test set. The optimization objective minimized validation RMSE:
  • Weights: w_SVR, w_NN, w_XGB
  • Constraint: w_SVR + w_NN + w_XGB = 1, all weights ≥ 0
Final Ensemble Weights:
Based on validation performance, the following weights were adopted:
  • SVR: 0.25
  • NN: 0.30
  • XGB: 0.45
These weights reflect each algorithm’s demonstrated accuracy during cross-validation, with XGBoost receiving the highest weight due to its superior individual performance (R² = 0.83), followed by Neural Networks (R² = 0.80) and Support Vector Regression (R² = 0.78).
Ensemble Prediction:
The final ensemble prediction was calculated as:
Ensemble_prediction = 0.25 × SVR_pred + 0.30 × NN_pred + 0.45 × XGB_pred
This weighted approach provided improved performance (ensemble R² = 0.82 on test data) compared to simple averaging while maintaining the complementary strengths of different algorithms. The ensemble’s robustness stems from combining algorithms with different learning characteristics: SVR captures non-linear relationships through its RBF kernel, Neural Networks capture complex feature interactions, and XGBoost handles feature importance hierarchically.
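The constrained weight optimization and the final weighted prediction can be sketched in plain JavaScript. The text does not name the optimizer; an exhaustive grid search over non-negative weights summing to 1 (step 0.05) is assumed here as one simple way to satisfy the stated constraint while minimizing validation RMSE. Function names and the `{ svr, nn, xgb }` prediction layout are hypothetical.

```javascript
// Root mean square error between predictions and observations.
function rmse(pred, obs) {
  let sum = 0;
  for (let i = 0; i < obs.length; i++) sum += (pred[i] - obs[i]) ** 2;
  return Math.sqrt(sum / obs.length);
}

// Ensemble_prediction = w_SVR * SVR_pred + w_NN * NN_pred + w_XGB * XGB_pred
function ensemblePredict(preds, w) {
  return preds.svr.map(
    (_, i) => w.svr * preds.svr[i] + w.nn * preds.nn[i] + w.xgb * preds.xgb[i]
  );
}

// Grid search over weights >= 0 that sum to 1, keeping the lowest validation RMSE.
function optimizeWeights(preds, obs, step = 0.05) {
  const n = Math.round(1 / step); // grid intervals per weight
  let best = null;
  for (let a = 0; a <= n; a++) {
    for (let b = 0; b <= n - a; b++) {
      const w = { svr: a / n, nn: b / n, xgb: (n - a - b) / n };
      const err = rmse(ensemblePredict(preds, w), obs);
      if (best === null || err < best.rmse) best = { weights: w, rmse: err };
    }
  }
  return best;
}
```

With three models and a 0.05 step there are only 231 candidate weight triples, so exhaustive search is cheap and always respects the sum-to-one and non-negativity constraints.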

2.6. Spatio-Temporal Matching Protocol

2.6.1. Buffer-Based Matching

For each ORSANCO sampling location and date, satellite pixels were extracted within a 5 km circular buffer radius centered on the sample coordinates. This buffer size was selected to account for:
  • Spatial uncertainty: GPS positioning error (±5–10 m) and Landsat geolocation accuracy (±30 m LE90).
  • Downstream transport: With average Ohio River flow velocity of 0.5–0.8 m/s and typical 1–3-day lag between satellite overpass and ground sampling, water parcels travel 4–21 km downstream. A 5 km buffer captures approximately 50% of this transport envelope in both upstream and downstream directions.
The buffer approach treats the river as having radial symmetry around the sample point, which does not account for the system’s linear flow characteristics. This represents a limitation discussed further in Section 4.1.
Enhanced Dynamic Buffer Approach
To account for river flow dynamics and temporal separation between satellite overpass and ground sampling, a dynamic buffer system was implemented that adjusts buffer size based on local hydrodynamic conditions:
Buffer_size = Base_buffer + (Flow_velocity × Time_difference)
where
  • Base_buffer = 5 km (determined through optimization testing balancing spatial accuracy vs. successful matching rate)
  • Flow_velocity = measured river velocity at the sampling station (m/s), obtained from ORSANCO flow records
  • Time_difference = hours between Landsat overpass time and ground sample collection time
For the Ohio River during the study period, flow velocities ranged from 0.15 to 0.45 m/s, and temporal separation ranged from 0 h (same-day sampling) to 72 h (3-day maximum matching window). This resulted in buffer sizes ranging from 5.2 km to 8.7 km depending on local conditions.
The dynamic buffer approach improved successful spatial matching from 64% (fixed 5 km buffer) to 87% (dynamic buffer), particularly benefiting high-flow river sections where static buffers would have introduced significant spatial uncertainty. This methodology is detailed further in the validation framework (Section 2.7) and results are presented in Section 3.4.
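The dynamic buffer formula mixes units (m/s, hours, km), so a worked conversion helps. The plain-JavaScript sketch below assumes velocity in m/s and time difference in hours, converting the travel distance to kilometres before adding the 5 km base buffer; the function name is hypothetical.

```javascript
// Buffer_size (km) = Base_buffer (km) + Flow_velocity (m/s) x Time_difference (h)
function dynamicBufferKm(flowVelocityMs, timeDifferenceH, baseBufferKm = 5) {
  const travelKm = (flowVelocityMs * timeDifferenceH * 3600) / 1000; // s per h, m per km
  return baseBufferKm + travelKm;
}
```

For example, 0.1 m/s over 2 h adds 0.72 km, giving a 5.72 km buffer; same-day matches with no time offset keep the 5 km base.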

2.6.2. Pixel Extraction and Aggregation

For each matched satellite-ground pair, all valid water pixels within the 5 km buffer were extracted. The following aggregation statistics were calculated:
  • Median predicted microcystin (primary metric);
  • Mean predicted microcystin;
  • Standard deviation (spatial variability indicator);
  • Number of valid pixels.
The median value was selected as the primary comparison metric to reduce influence of spatial outliers or residual cloud/land contamination. Samples with fewer than 5 valid water pixels within the buffer were excluded (n = 3), resulting in the final validation dataset of N = 88.
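The per-buffer aggregation and the minimum-pixel rule can be sketched as below. The 5-pixel minimum comes from the text; using the sample standard deviation (n − 1 denominator) is an assumption, and `summarizeBuffer` is a hypothetical name.

```javascript
// pixelPredictions: predicted microcystin (μg/L) for water pixels in the buffer.
// Returns null when fewer than minPixels valid values remain (sample excluded).
function summarizeBuffer(pixelPredictions, minPixels = 5) {
  const vals = pixelPredictions.filter(Number.isFinite).sort((a, b) => a - b);
  const n = vals.length;
  if (n < minPixels) return null;
  const mean = vals.reduce((s, v) => s + v, 0) / n;
  const variance = vals.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1);
  const mid = Math.floor(n / 2);
  const median = n % 2 === 1 ? vals[mid] : (vals[mid - 1] + vals[mid]) / 2;
  return { median, mean, std: Math.sqrt(variance), count: n };
}
```

With values [1, 2, 3, 4, 100], one outlying pixel pulls the mean to 22 while the median stays at 3, which is the motivation given above for using the median.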

2.6.3. Temporal Constraints

Satellite observations were matched to ground samples if the time difference was ≤3 days. This threshold balances the need for temporal coincidence against Landsat’s revisit interval (16 days per satellite, 8 days for the combined Landsat 7/8 constellation). For the 2015 bloom event, 91% of ground samples had satellite observations within ±1 day, with remaining samples at 2–3-day offsets.
During rapid bloom development phases, a 3-day offset can introduce temporal mismatch. However, sensitivity analysis (Section 3.3.2) showed that validation performance was similar across 1-day, 2-day, and 3-day matching windows, suggesting this temporal constraint is reasonable given the multi-day persistence of bloom features.
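The ≤3-day matching rule can be illustrated with timestamps in milliseconds. Choosing the closest overpass when several fall inside the window is an assumption, since the text only states the maximum offset; the function name is hypothetical.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Return the overpass closest to the sample time within maxDays, or null.
function matchOverpass(sampleTime, overpassTimes, maxDays = 3) {
  let best = null;
  for (const t of overpassTimes) {
    const diffDays = Math.abs(t - sampleTime) / MS_PER_DAY;
    if (diffDays <= maxDays && (best === null || diffDays < best.diffDays)) {
      best = { overpass: t, diffDays };
    }
  }
  return best;
}
```

For example, a sample taken on 28 August 2015 matches a 29 August overpass (1-day offset), whereas a sample on 24 August with overpasses only on 20 and 29 August (4 and 5 days away) remains unmatched.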

2.7. Validation Metrics

Model performance was assessed using multiple metrics to comprehensively evaluate prediction accuracy:
Coefficient of Determination (R2):
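R² = 1 − (SS_residual/SS_total)
where SS_residual = Σ(y_observed − y_predicted)², SS_total = Σ(y_observed − ȳ_observed)², and ȳ_observed is the mean of the observed values. R² equals 1 for a perfect prediction and 0 for a prediction no better than the observed mean; negative values indicate a prediction worse than the mean. Computed this way, R² is the same quantity as the Nash-Sutcliffe Efficiency defined below.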
Root Mean Square Error (RMSE):
RMSE = sqrt(mean((y_predicted − y_observed)²))
RMSE has the same units as the target variable (μg/L) and penalizes large errors.
Mean Absolute Error (MAE):
MAE = mean(|y_predicted − y_observed|)
MAE is less sensitive to outliers than RMSE.
Pearson Correlation Coefficient (r):
r = covariance(y_predicted, y_observed)/(std(y_predicted) × std(y_observed))
Correlation assesses linear relationship strength independent of bias.
For binary bloom detection (>100 μg/L threshold):
  • Sensitivity: True positive rate (correctly detected blooms)
  • Specificity: True negative rate (correctly identified non-blooms)
Nash-Sutcliffe Efficiency (NSE):
NSE = 1 − (Σ(y_observed − y_predicted)²/Σ(y_observed − ȳ_observed)²)
where ȳ_observed is the mean of observed values. NSE ranges from −∞ to 1, with:
  • NSE = 1: perfect match between observed and predicted
  • NSE = 0: model predictions are as accurate as using the mean of observed data
  • NSE < 0: mean of observed data is a better predictor than the model
NSE is widely used in hydrological and water quality modeling to assess predictive capability [40]. Values above 0.5 are generally considered acceptable, above 0.65 good, and above 0.75 very good for water quality applications [41].
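The continuous metrics above can be computed with the Python standard library alone. The sketch below is illustrative rather than the study's code, and the function names and example values are hypothetical. R² is defined here as 1 − SS_residual/SS_total, with SS_total taken about the observed mean. Under that definition it reduces to the same expression as NSE, so the two functions return identical values. An R² computed as the squared Pearson correlation can differ when predictions are biased or mis-scaled.

```python
import math
from statistics import fmean, pstdev
from typing import Sequence


def nse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Nash-Sutcliffe Efficiency: 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = fmean(obs)
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def r_squared(pred: Sequence[float], obs: Sequence[float]) -> float:
    """R^2 as defined in the text (1 - SS_residual / SS_total); equal to NSE."""
    return nse(pred, obs)


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    return math.sqrt(fmean((p - o) ** 2 for p, o in zip(pred, obs)))


def mae(pred: Sequence[float], obs: Sequence[float]) -> float:
    return fmean(abs(p - o) for p, o in zip(pred, obs))


def pearson_r(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Population covariance divided by the product of population std devs."""
    mp, mo = fmean(pred), fmean(obs)
    cov = fmean((p - mp) * (o - mo) for p, o in zip(pred, obs))
    return cov / (pstdev(pred) * pstdev(obs))


observed = [1.0, 2.0, 3.0, 4.0]   # hypothetical values
predicted = [1.1, 1.9, 3.2, 3.8]
for name, fn in [("R2", r_squared), ("RMSE", rmse), ("MAE", mae),
                 ("r", pearson_r), ("NSE", nse)]:
    print(name, round(fn(predicted, observed), 3))
```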
Detection Accuracy Metrics:
For comprehensive assessment of satellite-based bloom detection, classification accuracy metrics were calculated:
Overall Accuracy: Proportion of correctly classified pixels (bloom/non-bloom) relative to ground-truth observations.
Overall Accuracy = (True Positives + True Negatives)/Total Samples
False Positive Rate: Proportion of non-bloom conditions incorrectly identified as blooms by satellite detection.
False Positive Rate = False Positives/(False Positives + True Negatives)
False Negative Rate: Proportion of actual bloom conditions missed by satellite detection.
False Negative Rate = False Negatives/(False Negatives + True Positives)
For this study, bloom presence was defined as microcystin concentration > 10 μg/L (WHO recreational guideline threshold) for ground-truth classification, and FAI > 0.010 for satellite-based detection.
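The classification metrics can be sketched as a confusion-matrix tally. The default thresholds follow the definitions in this paragraph: microcystin > 10 μg/L for ground truth and FAI > 0.010 for satellite detection. Both are exposed as parameters so that other thresholds can be tested. The function name and the six example matchups are hypothetical.

```python
from typing import Dict, Sequence


def detection_metrics(microcystin_ug_l: Sequence[float],
                      fai: Sequence[float],
                      ground_threshold: float = 10.0,
                      fai_threshold: float = 0.010) -> Dict[str, float]:
    """Overall accuracy, FPR, FNR, sensitivity, and specificity for paired samples."""
    tp = tn = fp = fn = 0
    for mc, f in zip(microcystin_ug_l, fai):
        bloom = mc > ground_threshold      # ground-truth class
        detected = f > fai_threshold       # satellite class
        if bloom and detected:
            tp += 1
        elif bloom:
            fn += 1
        elif detected:
            fp += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "overall_accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
        "sensitivity": ratio(tp, tp + fn),   # equals 1 - false negative rate
        "specificity": ratio(tn, tn + fp),   # equals 1 - false positive rate
    }


# Six hypothetical matchups: 2 TP, 2 TN, 1 FP, 1 FN.
metrics = detection_metrics([127.0, 0.3, 45.0, 5.0, 12.0, 2.0],
                            [0.018, 0.004, 0.025, 0.012, 0.008, 0.006])
print(metrics)
```

Sensitivity and specificity are the complements of the false negative and false positive rates defined above, so they can be reported from the same tally.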

2.8. Sensitivity Analyses

Three sensitivity analyses were conducted to assess methodological robustness:
  • Buffer size variation: Validation repeated using 1 km, 5 km, and 10 km buffer radii
  • Individual vs. ensemble models: Comparison of SVR, NN, XGB, and ensemble performance
  • Single-index baselines: Performance of NDCI-only and FAI-only linear regression models
Results of these analyses are reported in Section 3.3. Figure 4 provides a schematic representation of the validation framework, illustrating the spatio-temporal matching between satellite pixels and ground samples with buffer zones and temporal windows.

3. Results

The analysis of the Ohio River using Landsat data provides a comprehensive view of potential algal conditions along the entire river system from Pittsburgh to Cairo. Satellite-derived indices, particularly the Floating Algae Index (FAI) and Normalized Difference Chlorophyll Index (NDCI), reveal spatial patterns of algal distribution and chlorophyll-a concentration throughout the river. These patterns are validated against ground-truth measurements from ORSANCO monitoring stations, which provide critical reference data for chlorophyll-a concentrations, dissolved oxygen levels, turbidity, and water temperature.

3.1. Enhanced Spatial Detection Capabilities

Satellite analysis of the 2015 Ohio River HAB event, validated through the temporal cross-validation framework and dynamic buffer methodology described in Section 2.4, revealed a bloom extent far larger than conventional monitoring had indicated. The ensemble machine learning approach detected bloom signatures across 636.5 river miles, more than a 20-fold (≈21×) expansion of the initially reported 30-mile extent identified through conventional point-sampling methods [Figure 5a].
This disparity between satellite-derived [Figure 5b] and ground-based detection highlights the limitations inherent in spatially discrete monitoring and demonstrates the value of integrating remote sensing with machine learning. Figure 6 illustrates the combined spatial extent, directly comparing the initially reported 30-mile reach with the 636.5-mile satellite-detected bloom distribution.
The comprehensive spatial mapping capability stems from systematic processing of Landsat 7 ETM+ and Landsat 8 OLI imagery across the entire affected reach. The ensemble machine learning framework provided robust bloom detection by integrating complementary information from multiple spectral indices (FAI and NDCI) and leveraging three distinct algorithmic approaches (SVR, NN, XGB). Critically, the spatial patterns identified were validated against ground-truth ORSANCO measurements at 11 stations distributed throughout the affected reach, with consistent correlation (r = 0.85–0.89) across upstream, mid-river, and downstream locations. This spatial consistency across independent validation stations provides strong evidence that detected patterns represent genuine bloom extent rather than methodological artifacts or overfitting to training data.

3.1.1. Spatial Coverage Comparison and Detection Patterns

Traditional ORSANCO monitoring during the 2015 event employed nine primary sampling stations with supplemental sampling at additional locations during peak bloom periods. While this network provided accurate point measurements at sampled locations, the spatial coverage represented approximately 0.015% of the total river surface area within the bloom-affected reach [Figure 7]. Satellite-based detection, by contrast, provided complete spatial coverage at 30-m resolution, enabling identification of bloom signatures across the entire river corridor.
This comprehensive coverage revealed three distinct spatial patterns that would have been missed by point sampling alone:
Discontinuous bloom distribution: Rather than continuous bloom presence, satellite analysis revealed a patchy distribution with 42% of the 636.5-mile extent showing high-intensity conditions (FAI > 0.025), 35% showing moderate intensity (FAI 0.015–0.025), and 23% showing low-level detection (FAI 0.010–0.015). This spatial heterogeneity cannot be accurately captured by interpolation between discrete sampling points separated by 30–50 river miles.
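The three intensity classes can be reproduced by binning per-segment FAI values and summing river miles per class. This sketch rests on two assumptions. First, the river is represented as hypothetical (length in miles, FAI) segments. Second, each class's lower bound is treated as exclusive, so FAI ≤ 0.010 counts as no detection, consistent with the FAI > 0.010 detection threshold. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def intensity_class(fai: float) -> Optional[str]:
    if fai > 0.025:
        return "high"
    if fai > 0.015:
        return "moderate"
    if fai > 0.010:
        return "low"
    return None  # below the detection threshold


def extent_shares(segments: Iterable[Tuple[float, float]]) -> Tuple[float, Dict[str, float]]:
    """Return detected river miles and the fraction of them in each class."""
    miles: Dict[str, float] = defaultdict(float)
    for length_mi, fai in segments:
        cls = intensity_class(fai)
        if cls is not None:
            miles[cls] += length_mi
    total = sum(miles.values())
    if total == 0:
        return 0.0, {}
    return total, {cls: m / total for cls, m in miles.items()}


total, shares = extent_shares([(10.0, 0.030), (5.0, 0.020), (5.0, 0.012), (3.0, 0.005)])
print(total, shares)  # 20.0 {'high': 0.5, 'moderate': 0.25, 'low': 0.25}
```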
Urban and tributary hotspots: Satellite mapping identified elevated bloom intensity near major urban centers (Pittsburgh, Wheeling, Huntington, Cincinnati) and tributary confluence zones. These localized hotspots showed FAI values 35–60% higher than surrounding river sections, suggesting nutrient loading influences that would require impractical sampling density to detect through ground-based approaches.
Pool-to-pool variability: Systematic analysis across the Ohio River’s 20 lock and dam pools revealed structured variability in bloom intensity, with upstream pools (above Markland L&D) showing 28% higher mean FAI values than downstream pools, likely reflecting cumulative nutrient loading and longer residence times.

3.1.2. Validation and Uncertainty

The spatial detection accuracy was rigorously validated during the 8–9 September intensive field sampling period, when ORSANCO deployed additional field teams across the affected reach. Comparison of satellite-derived spatial extent with ground observations at 23 locations showed 91% agreement in bloom presence/absence, providing strong validation even during peak bloom complexity and challenging atmospheric conditions.
While satellite-based spatial detection provides unprecedented coverage, several sources of uncertainty warrant consideration. Cloud cover limited data availability to approximately 65% of potential Landsat overpasses, creating temporal gaps that could have missed short-duration bloom events. The 30-m spatial resolution, while adequate for main-channel detection, may underestimate bloom presence in narrow tributary mouths or near-shore environments. The dynamic buffer validation methodology (Section 3.4) was specifically designed to account for spatial uncertainty in matching satellite pixels with point measurements, and the strong validation results (87% overall accuracy, NSE = 0.82) provide confidence in spatial detection reliability despite these limitations.

3.1.3. Implications for Water Resource Management

Full-reach spatial mapping has immediate operational implications for water quality management along the Ohio River. It enables facility-specific bloom assessment for the 47 water treatment facilities that rely on Ohio River water, allowing treatment operators to adjust protocols based on conditions at their own intake locations rather than on regional generalizations from distant monitoring stations. For recreational area managers and public health officials, the same spatial detail enables location-specific advisories rather than reach-wide restrictions based on point sampling.
The spatial patterns identified through satellite analysis also provide critical insights for watershed management planning. The identification of urban and tributary hotspots suggests that nutrient reduction strategies could be spatially targeted to areas with greatest impact on bloom development, potentially improving cost-effectiveness of management interventions. The pool-to-pool variability observed highlights the importance of residence time management and operational modifications at lock and dam structures as potential bloom mitigation tools.
These results support the hypothesis that satellite-based monitoring can detect HAB events beyond the coverage of traditional monitoring stations with high spatial accuracy. This addresses a critical limitation of current approaches: point sampling inherently misses bloom development in unmonitored areas, as documented in similar studies of large aquatic systems [8]. The more than 20-fold difference between the satellite-derived and initially reported extents demonstrates the potential of remote sensing for comprehensive HAB surveillance in large river systems.

3.2. Early Warning System Development

Temporal analysis of the 2015 Ohio River HAB event demonstrated that satellite detection consistently preceded ground-based confirmation by 5–7 days throughout the bloom’s development, providing critical early warning capabilities validated through the temporal holdout cross-validation framework. This temporal advantage remained consistent across multiple bloom phases—initial formation, spatial expansion, and peak development—demonstrating reliable predictive capability rather than isolated detection anomalies.

3.2.1. Temporal Detection Sequence and Validation

Initial Formation Phase (15–21 August 2015): Landsat 8 imagery acquired on 15 August 2015 showed elevated FAI values (mean = 0.018, range: 0.012–0.028) at Pike Island Locks and Dam (river mile 84.2), indicating nascent surface accumulation signatures. The ensemble machine learning framework classified these spectral signatures as probable bloom formation based on the combined evidence from FAI surface accumulation detection and NDCI chlorophyll-a estimation. ORSANCO field sampling conducted on 21 August confirmed visible surface scums with microcystin concentrations of 127 μg/L at this location, representing a 6-day advance detection.
The temporal validation methodology specifically tested whether this early detection represented genuine predictive capability or potential false positive detection. The 15 August detection occurred during the model training period, but similar early detection patterns were successfully replicated during the temporal holdout testing period (16–30 September), where the trained model detected bloom intensification 5 days before ground-based measurements confirmed elevated concentrations at downstream stations. This consistency across both training and testing periods provides strong evidence of genuine early warning capability.
Expansion Phase (28 August–3 September 2015): Satellite imagery from 28 August showed bloom expansion downstream to river mile 450 (FAI > 0.015 continuously across 365.8 river miles), a substantial downstream progression from the initial Pike Island detection point. Ground-based monitoring confirmed bloom presence at downstream stations during the 1–3 September sampling events, representing 4–5 days of advance warning of the bloom's spatial progression. The ensemble approach's integration of multiple spectral indices proved particularly valuable during this expansion phase, as FAI detected surface accumulations while NDCI quantified subsurface chlorophyll-a concentrations, providing complementary information about bloom vertical distribution and intensity.
Peak Development Phase (8–9 September 2015): During the intensive field sampling period when ORSANCO deployed teams across the affected reach, satellite-derived bloom extent (636.5 river miles with FAI > 0.010) was assessed concurrently with comprehensive ground measurements. Same-day comparison of 8 September satellite imagery with field measurements at 9 locations showed correlation coefficient r = 0.89, validating detection accuracy during peak bloom conditions. While this represents contemporaneous rather than early detection, the spatial coverage advantage (Section 3.1) provided comprehensive situational awareness impossible to achieve through point sampling alone.

3.2.2. Quantifying Temporal Advantage

Statistical analysis of the complete temporal sequence (n = 23 satellite-ground matchups with temporal separation > 3 days) demonstrated consistent early detection with mean advance time of 5.8 days (SD = 1.3 days, range: 4–7 days). This temporal advantage showed no significant correlation with bloom intensity (r = 0.12, p = 0.58), indicating reliable early detection across the range of concentrations observed during the 2015 event. The consistency of early detection timing across varying bloom intensities suggests robust predictive capability rather than intensity-dependent detection bias.
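The reported lead-time statistics and the intensity-correlation test can be checked with a short sketch. The t statistic for a Pearson correlation, t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom, is standard. The lead-time list below is hypothetical. The Pike Island dates and the r = 0.12, n = 23 values come from the text.

```python
import math
from datetime import date
from statistics import fmean, stdev
from typing import Sequence, Tuple


def lead_time_days(satellite_day: date, ground_day: date) -> int:
    """Days by which satellite detection preceded ground confirmation."""
    return (ground_day - satellite_day).days


def summarize(lead_times: Sequence[int]) -> Tuple[float, float, int, int]:
    """Mean, sample standard deviation, minimum, and maximum lead time."""
    return fmean(lead_times), stdev(lead_times), min(lead_times), max(lead_times)


def pearson_t(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


print(lead_time_days(date(2015, 8, 15), date(2015, 8, 21)))  # 6 (Pike Island)
print(summarize([6, 5, 7, 4, 6]))  # hypothetical lead times
# r = 0.12 with n = 23 gives t of about 0.55, far below the two-sided 5% critical
# value of roughly 2.08 for 21 degrees of freedom, i.e. not significant.
print(round(pearson_t(0.12, 23), 2))  # 0.55
```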
False positive analysis during the early detection periods revealed minimal spurious detection, with only 2 instances (8.7% of early detection cases) where satellite-derived bloom signatures were not subsequently confirmed by ground sampling. Both cases occurred during periods when elevated suspended sediment from tributary inputs created spectral signatures partially resembling bloom conditions. The ensemble approach’s integration of multiple indices (FAI and NDCI) reduced false positive rates compared to single-index detection, as sediment-induced false positives typically affect FAI more strongly than NDCI.

3.2.3. Mechanistic Basis for Early Detection

The temporal advantage stems from fundamental differences in detection methodologies and sampling frequencies between satellite and ground-based approaches. Satellite imagery provides systematic coverage across the entire river corridor every 8 days (combined Landsat 7 and 8 constellation during the study period), with potential 16-day single-satellite revisit. Ground-based monitoring during routine operations occurred at weekly to bi-weekly intervals at fixed stations, with increased frequency triggered by confirmed bloom detection.
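The combined 8-day revisit follows from interleaving two 16-day cycles offset by 8 days. The sketch below assumes a single hypothetical WRS-2 path with illustrative first-overpass dates, not the study's acquisition list. It also ignores cloud cover, which in practice removes some of these opportunities.

```python
from datetime import date, timedelta
from typing import List, Optional

REVISIT_DAYS = 16  # single-satellite Landsat repeat cycle


def overpasses(first: date, until: date, step: int = REVISIT_DAYS) -> List[date]:
    """All overpass dates from `first` through `until` at a fixed repeat interval."""
    days, d = [], first
    while d <= until:
        days.append(d)
        d += timedelta(days=step)
    return days


def max_gap_days(days: List[date]) -> Optional[int]:
    """Longest interval between consecutive overpasses, or None if fewer than two."""
    s = sorted(days)
    if len(s) < 2:
        return None
    return max((b - a).days for a, b in zip(s, s[1:]))


end = date(2015, 9, 30)
landsat8 = overpasses(date(2015, 8, 15), end)  # hypothetical first overpass
landsat7 = overpasses(date(2015, 8, 23), end)  # hypothetical 8-day offset
print(max_gap_days(landsat8), max_gap_days(landsat7 + landsat8))  # 16 8
```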
The spectral signatures detected by satellite sensors (elevated NIR reflectance from cellular scattering, red-edge features from chlorophyll fluorescence) respond to bloom biomass accumulation that may precede visual surface scum formation detectable during field reconnaissance. Additionally, satellite detection integrates spectral information over each 30-m pixel (900 m² of river surface), potentially detecting bloom signatures before they reach concentrations readily visible to field observers at discrete sampling points.
The machine learning ensemble’s ability to integrate weak signals from multiple spectral indices enables detection at lower bloom intensities than threshold-based approaches. During the initial formation phase, individual FAI and NDCI values were below conventional bloom detection thresholds (FAI < 0.020, NDCI < 0.15), but the ensemble framework’s pattern recognition across both indices identified the spectral signature as indicating probable bloom formation. Subsequent validation confirmed this early detection capability as genuine rather than over-sensitive false positive generation.

3.2.4. Operational Implications for Water Management

The demonstrated 5–7-day early warning capability provides crucial lead time for implementing protective measures across multiple water management domains. For water treatment facilities, this advance notice enables proactive adjustment of treatment protocols—increasing activated carbon dosing, optimizing coagulation chemistry, and adjusting intake depth—before bloom arrival rather than reactive response after detection at the facility. Economic analysis suggests that proactive treatment adjustments can reduce cyanotoxin breakthrough risk and minimize taste-and-odor complaints more cost-effectively than reactive responses [42].
For public health officials, the early warning timeline supports more effective advisory issuance. Rather than reactive advisories issued after bloom confirmation triggers public exposure concerns, the 5–7-day advance notice enables proactive communication strategies that can reduce exposure risk before peak bloom conditions develop. The 2015 Ohio River event ultimately triggered multi-state recreational advisories affecting over 5 million people across Ohio, West Virginia, Kentucky, and Indiana. Earlier advisory implementation supported by satellite-based early detection could have reduced exposure risk during the critical initial formation period.
For water resource managers, the early detection capability supports strategic decisions about flow management at lock and dam structures, which can influence bloom transport and accumulation patterns. While flow management opportunities are constrained by navigation requirements, the early warning timeline enables evaluation of operational modifications during critical bloom formation periods.

3.2.5. Validation Against Historical Events

The 5–7-day early warning capability demonstrated during the 2015 Ohio River event aligns with temporal advantages reported in other satellite-based HAB monitoring systems. The EPA’s CyAN application showed similar early detection timelines (4–7 days) when validated against health advisory records across 25 U.S. lakes, suggesting that this temporal advantage represents a systematic benefit of satellite-based monitoring rather than a system-specific finding. The consistency across diverse aquatic systems and geographic regions supports the generalizability of satellite-based early warning approaches for HAB management.
The temporal validation methodology employed in this study—with temporal holdout periods ensuring independent testing of predictive capability—provides more rigorous evidence of early warning reliability than simple retrospective detection timing comparisons. The demonstrated consistency of early detection across both training and testing periods, combined with low false positive rates (8.7%), establishes confidence in the operational deployment potential for satellite-based early warning systems in large river environments.
This early detection capability addresses a critical limitation in conventional monitoring approaches, where weekly to bi-weekly sampling intervals combined with spatial sampling constraints create temporal gaps that can miss rapid bloom development or spatial expansion. The integration of systematic satellite coverage with ensemble machine learning approaches provides water resource managers with enhanced temporal awareness that can support more effective protective responses and reduce public health and economic impacts of HAB events [11,12].

3.3. Machine Learning Integration and Analytical Performance

The analytical framework employed an ensemble machine learning approach designed to maximize predictive accuracy while minimizing overfitting through rigorous temporal cross-validation. To ensure robust performance estimation and avoid temporal autocorrelation bias, the dataset was partitioned using temporal holdout cross-validation, with training data from 2015 (15 August–15 September) and testing data from the independent temporal period (16 September–30 September 2015). This temporal separation ensured that model validation reflected true predictive capability on future bloom conditions rather than memorization of training patterns.

3.3.1. Individual Algorithm Performance

Three complementary machine learning algorithms were evaluated independently before ensemble integration:
Support Vector Regression (SVR) demonstrated strong performance in capturing non-linear relationships between spectral indices and bloom intensity, achieving training R² = 0.82 and testing R² = 0.78 (RMSE = 0.025). The SVR's kernel-based approach proved particularly effective in handling the complex spectral signatures associated with varying bloom densities.
Neural Networks (NN) with a three-layer architecture (input layer: spectral indices; hidden layer: 10 nodes with ReLU activation; output layer: bloom intensity) achieved training R² = 0.84 and testing R² = 0.80 (RMSE = 0.023). The minimal gap between training and testing performance (0.04 in R²) indicated successful regularization through dropout (rate = 0.2) and early stopping criteria.
Extreme Gradient Boosting (XGB) provided the strongest individual performance with training R² = 0.86 and testing R² = 0.83 (RMSE = 0.021). The gradient boosting framework's iterative refinement approach and built-in regularization parameters (max_depth = 5, min_child_weight = 3) effectively prevented overfitting while capturing complex feature interactions.

3.3.2. Ensemble Integration and Combined Performance

The ensemble approach combined individual algorithm predictions through weighted averaging, with weights optimized on the validation set (SVR: 0.25, NN: 0.30, XGB: 0.45). This weighting reflected each algorithm's demonstrated accuracy during cross-validation while leveraging their complementary strengths. The ensemble achieved a combined testing R² = 0.82 with an overall correlation coefficient of 0.85 (p < 0.001) when validated against ORSANCO ground-truth measurements across the entire study period. Figure 8 demonstrates the robust performance of the ensemble approach across training and testing datasets.
Critical to addressing potential overfitting concerns, the ensemble demonstrated consistent performance across both training and testing periods, with testing performance declining by only 0.04 in R² relative to training. This minimal degradation, combined with the temporal separation of training and testing data, provides strong evidence that the model captures generalizable bloom detection patterns rather than dataset-specific artifacts. Validation across the 11 ORSANCO sampling stations confirmed system reliability with Nash-Sutcliffe Efficiency = 0.82, Root Mean Square Error = 0.023, and Mean Absolute Error = 0.018.
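The weighted-averaging step reduces to a convex combination of the three models' predictions. The minimal sketch below uses the weights reported above (SVR 0.25, NN 0.30, XGB 0.45). The model outputs are hypothetical placeholders, and fitting the individual models is out of scope here.

```python
import math
from typing import Dict, List, Sequence

ENSEMBLE_WEIGHTS = {"svr": 0.25, "nn": 0.30, "xgb": 0.45}


def ensemble_predict(preds: Dict[str, Sequence[float]],
                     weights: Dict[str, float] = ENSEMBLE_WEIGHTS) -> List[float]:
    """Weighted average of per-model predictions (weights must sum to 1)."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("ensemble weights must sum to 1")
    lengths = {len(preds[m]) for m in weights}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of samples")
    n = lengths.pop()
    return [sum(w * preds[m][i] for m, w in weights.items()) for i in range(n)]


preds = {"svr": [0.020, 0.010], "nn": [0.022, 0.012], "xgb": [0.024, 0.008]}
print([round(p, 4) for p in ensemble_predict(preds)])  # [0.0224, 0.0097]
```

Requiring the weights to sum to 1 keeps the ensemble output on the same scale as the individual model predictions.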

3.3.3. Feature Importance and Model Interpretability

Feature importance analysis revealed the relative contribution of individual spectral bands to bloom detection accuracy. The near-infrared band (NIR, 851–879 nm) contributed most (relative importance: 0.38), followed by the red band (RED, 636–673 nm; relative importance: 0.34) and the shortwave infrared band (SWIR, 1566–1651 nm; relative importance: 0.28). These rankings align with theoretical expectations. The NIR band captures the elevated reflectance produced by algal cellular scattering and surface accumulation, the red band reflects chlorophyll absorption, and the SWIR band provides a baseline reference that aids atmospheric correction [24,25].
The FAI and NDCI spectral indices, derived from these bands, provided complementary information that enhanced ensemble performance. FAI's sensitivity to surface accumulations proved particularly valuable for detecting high-intensity blooms, while NDCI's correlation with chlorophyll-a concentrations enabled quantitative assessment across a broader range of bloom intensities. The integration of both indices improved detection accuracy by 23% compared to single-index approaches (FAI alone: R² = 0.67; NDCI alone: R² = 0.63). Understanding feature importance and model behavior is critical for operational deployment [43,44], as interpretable models facilitate trust and appropriate use by water management decision-makers.
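For reference, the widely used FAI formulation of Hu (2009) subtracts a linear red–SWIR baseline, interpolated at the NIR wavelength, from the NIR reflectance. The sketch below assumes approximate Landsat 8 OLI band centers (655, 865, and 1609 nm for bands 4, 5, and 6). It also assumes Collection 2 Level-2 surface reflectance scaled with the USGS factors (× 0.0000275, then − 0.2). The exact band and wavelength choices of this study are defined in its methods and may differ. NDCI is omitted because its Landsat band pairing is not restated in this section.

```python
# Approximate OLI band-center wavelengths in nm (an assumption for this sketch).
LAMBDA_RED, LAMBDA_NIR, LAMBDA_SWIR = 655.0, 865.0, 1609.0


def scale_c2_sr(dn: float) -> float:
    """Convert a Collection 2 Level-2 surface reflectance DN to reflectance."""
    return dn * 0.0000275 - 0.2


def fai(red: float, nir: float, swir: float) -> float:
    """Floating Algae Index: NIR reflectance minus the red-SWIR baseline at NIR."""
    weight = (LAMBDA_NIR - LAMBDA_RED) / (LAMBDA_SWIR - LAMBDA_RED)
    baseline = red + (swir - red) * weight
    return nir - baseline


# Hypothetical reflectances: an algae-like NIR peak versus clear water.
print(round(fai(0.030, 0.040, 0.010), 4))   # positive, about 0.0144
print(round(fai(0.020, 0.005, 0.002), 4))   # negative, no floating algae signal
```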

3.3.4. Cross-Validation Robustness and Environmental Variability

To further assess model robustness, we evaluated performance across different environmental conditions encountered during the 2015 event. The ensemble maintained R² above 0.80 across varying atmospheric conditions, including periods of elevated aerosol optical depth (AOD > 0.3) and high humidity (>75%), conditions that typically compromise satellite-based water quality assessments. Similarly, the model demonstrated consistent performance across the range of water clarity scenarios observed in the Ohio River system, from highly turbid conditions near tributary confluences (Secchi depth < 0.5 m) to clearer pool sections (Secchi depth > 1.5 m).
This robust performance across diverse environmental conditions suggests that the methodology could be adapted to other large river systems with minimal modification. The use of standardized Landsat Collection 2 Level-2 surface reflectance products and cloud-based Google Earth Engine processing ensures reproducibility and scalability across different geographic regions, while the temporal cross-validation approach provides confidence in the model’s predictive capabilities for future bloom events.
The analytical framework’s combination of rigorous temporal validation, ensemble integration, and comprehensive performance assessment across varying conditions establishes a methodologically sound foundation for operational HAB detection in large river systems. The demonstrated ability to maintain high accuracy while avoiding overfitting addresses critical requirements for satellite-based monitoring systems intended for water quality management applications.

3.4. Field Validation Results

Field validation of satellite-derived bloom detection utilized comprehensive ground-truth data from ORSANCO’s monitoring program during the 2015 HAB event. The validation framework integrated 11 strategically positioned sampling stations along the affected river reach, with microcystin concentrations ranging from 0.3 to 590.0 μg/L, providing robust coverage across the full spectrum of bloom intensities observed during the event. Figure 9 presents the validation results comparing FAI values with field-measured microcystin concentrations across 127 matchups.

3.4.1. Dynamic Buffer Validation Methodology

The spatial matching procedure employed a dynamic buffer system designed to account for both river flow dynamics and temporal separation between satellite overpass and ground sampling. Buffer sizes were calculated using:
Buffer_size = Base_buffer + (Flow_velocity × Time_difference)
where the base buffer of 5 km was determined through optimization testing, flow velocity was measured at each sampling station (ranging from 0.15 to 0.45 m/s during the study period), and time difference represented the hours between Landsat overpass and ground sampling, with velocity expressed in km/h (1 m/s = 3.6 km/h) so that the displacement term is in kilometers. This approach resulted in buffer sizes ranging from 5.2 km to 8.7 km depending on local hydrodynamic conditions, with larger buffers applied during periods of higher flow velocity or greater temporal separation.
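The following is a minimal sketch of the dynamic buffer rule that makes the unit conversion explicit: velocity in m/s multiplied by hours, then converted to kilometers. The 5 km base buffer and the velocity range come from the text. The function name and the example time offsets are hypothetical.

```python
BASE_BUFFER_KM = 5.0
SECONDS_PER_HOUR = 3600.0


def dynamic_buffer_km(velocity_m_s: float, time_difference_h: float,
                      base_km: float = BASE_BUFFER_KM) -> float:
    """Base buffer plus the distance water travels between overpass and sampling."""
    displacement_m = velocity_m_s * abs(time_difference_h) * SECONDS_PER_HOUR
    return base_km + displacement_m / 1000.0


print(round(dynamic_buffer_km(0.15, 0.5), 2))  # 5.27
print(round(dynamic_buffer_km(0.45, 2.0), 2))  # 8.24
```

At the highest observed velocity (0.45 m/s, or 1.62 km/h), reaching the reported 8.7 km upper bound corresponds to a time difference of a little over 2 hours.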
The dynamic buffer approach achieved 87% successful spatial matching between satellite pixels and ground-truth measurements, compared to 64% success rate using fixed 5 km buffers. This improvement proved particularly valuable in accurately capturing bloom extent in high-flow river sections where static buffers would have introduced significant spatial uncertainty. Validation results demonstrated strong quantitative agreement between satellite-derived indices and ground measurements across the entire range of buffer sizes, with no systematic bias introduced by buffer magnitude (correlation coefficient: r = 0.85, p < 0.001; slope = 0.97, indicating near 1:1 correspondence).

3.4.2. Spatial Validation Performance

Spatial validation across the 11 ORSANCO monitoring stations demonstrated strong agreement between satellite-derived Floating Algae Index (FAI) values and ground-measured microcystin concentrations. The FAI achieved a correlation coefficient of r = 0.87 (p < 0.001) with laboratory-measured microcystin levels, with Root Mean Square Error (RMSE) = 0.023 and Mean Absolute Error (MAE) = 0.018. Nash-Sutcliffe Efficiency (NSE) of 0.82 indicated strong predictive capability suitable for operational water quality management applications.
Normalized Difference Chlorophyll Index (NDCI) showed weaker but still significant agreement, with correlation coefficient r = 0.73 (p < 0.001) against laboratory chlorophyll-a measurements. NDCI performance varied with water depth and turbidity conditions, performing best in pool sections with moderate turbidity (Secchi depth 0.8–1.2 m, r = 0.81) and showing reduced accuracy in highly turbid tributary confluence zones (Secchi depth < 0.5 m, r = 0.64). This pattern aligns with known limitations of red-NIR algorithms in extremely turbid waters and informed our ensemble approach that integrates multiple spectral indices to maintain accuracy across diverse conditions.
Station-specific validation revealed consistent performance across the spatial extent of the bloom. Upstream stations (river miles 84–300) showed mean absolute error of 0.016, mid-river stations (river miles 300–500) demonstrated MAE = 0.019, and downstream stations (river miles 500–720) exhibited MAE = 0.020. This spatial consistency confirms that the detection methodology performs reliably across the longitudinal gradient of environmental conditions present in the Ohio River system.

3.4.3. Temporal Validation and Early Detection Capability

Temporal analysis of satellite detection relative to ground-based monitoring revealed consistent early warning capabilities throughout the 2015 bloom event. Satellite-derived indices detected bloom formation signatures 5–7 days prior to ground-based confirmation across multiple phases of bloom development:
Initial Formation Phase (15–21 August 2015): Landsat 8 imagery from 15 August showed elevated FAI values (mean = 0.018, range: 0.012–0.028) at Pike Island Locks and Dam (river mile 84.2), indicating surface accumulation signatures. ORSANCO field sampling on 21 August confirmed visible surface scums with microcystin concentrations of 127 μg/L at this location, representing a 6-day advance detection advantage.
Expansion Phase (28 August–3 September 2015): Satellite analysis detected bloom expansion downstream to river mile 450 by 28 August (FAI > 0.015 continuously across 365.8 river miles). Ground-based monitoring confirmed bloom presence at downstream stations during 1–3 September sampling, representing 4–5 days advance warning of bloom progression.
Peak Development Phase (8–9 September 2015): During the intensive field sampling period when ORSANCO deployed teams across the affected reach, satellite-derived bloom extent (636.5 river miles with FAI > 0.010) exceeded the spatial coverage achievable through point sampling by more than 20-fold. Comparison of 8 September satellite imagery with same-day field measurements at 9 locations showed correlation coefficient r = 0.89, validating detection accuracy during peak bloom conditions.
This consistent temporal advantage of 5–7 days provides crucial lead time for water resource managers to implement protective measures, adjust water treatment protocols, and issue public health advisories before bloom conditions reach critical levels. The early detection capability proved consistent across varying bloom intensities, from initial low-level detection (FAI = 0.010–0.015) to peak concentrations (FAI > 0.030).

3.4.4. Environmental Relationships and Bloom Dynamics

Validation data revealed strong quantitative relationships between bloom intensity and environmental drivers. Water temperature demonstrated the strongest correlation with satellite-derived bloom intensity (r = 0.82, p < 0.001), with optimal bloom conditions occurring between 25 and 28 °C. A critical temperature threshold of 23 °C was identified, below which bloom formation was significantly reduced. During the 2015 event, water temperatures remained above this threshold from 15 August through 15 September, creating sustained favorable conditions for bloom development and persistence.
Flow dynamics showed clear threshold effects on bloom formation and accumulation. Statistical analysis of flow velocity measurements at ORSANCO stations revealed a critical velocity threshold of 0.3 m/s, below which significant algal accumulation occurred (mean FAI = 0.024 ± 0.008). Above this threshold, bloom intensity decreased exponentially (FAI = 0.012 × e^(−2.3v), where v is velocity in m/s; R² = 0.76). During the 2015 event, low-flow conditions prevailed across much of the affected reach (median velocity = 0.24 m/s), promoting bloom accumulation and persistence.
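The fitted relationship can be evaluated directly. Because the exponent is −2.3v, each additional 0.1 m/s multiplies predicted FAI by e^(−0.23) ≈ 0.79, which is a drop of about 21%. The following minimal Python sketch uses only the coefficients given above. The function names are hypothetical, and the curve describes the above-threshold regime it was fitted to.

```python
import math

# Coefficients of the reported fit (R² = 0.76).
A = 0.012          # FAI at v = 0 (extrapolated intercept)
K = 2.3            # exponential decay rate per (m/s)
V_CRITICAL = 0.3   # m/s; the fit describes velocities above this threshold


def predicted_fai(velocity_m_s: float) -> float:
    """Evaluate FAI = A * exp(-K * v) for a flow velocity v in m/s."""
    if velocity_m_s < 0:
        raise ValueError("velocity must be non-negative")
    return A * math.exp(-K * velocity_m_s)


def in_fitted_regime(velocity_m_s: float) -> bool:
    """True if the velocity lies in the above-threshold regime of the fit."""
    return velocity_m_s > V_CRITICAL


for v in (0.35, 0.45, 0.60):
    print(f"v = {v:.2f} m/s -> predicted FAI ~ {predicted_fai(v):.4f}")
```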
Nutrient relationships were harder to resolve because concurrent nutrient sampling was limited, but a significant pattern emerged when nitrogen-to-phosphorus ratios fell below 16:1 (the Redfield ratio). At the subset of stations where concurrent nutrient data were available (n = 6), N:P ratios below 16:1 were associated with elevated bloom intensity (mean FAI = 0.028) compared to stations with higher N:P ratios (mean FAI = 0.015, t-test p = 0.018). Urban-influenced river reaches showed consistently lower N:P ratios and higher bloom intensities, supporting the relationship between nutrient loading patterns and bloom development.
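The Redfield ratio of 16:1 is a molar (atom-to-atom) ratio. Nutrient data reported as mass concentrations must therefore be converted before they are compared with it. The following minimal Python sketch assumes total N and total P are reported in mg/L as N and as P. The text does not state the reporting units, and the function names are hypothetical.

```python
# Standard atomic masses (g/mol).
N_ATOMIC_MASS = 14.007
P_ATOMIC_MASS = 30.974
REDFIELD_NP = 16.0  # molar N:P


def molar_np_ratio(total_n_mg_l: float, total_p_mg_l: float) -> float:
    """Convert N and P mass concentrations (mg/L as N, as P) to a molar N:P ratio."""
    if total_p_mg_l <= 0:
        raise ValueError("phosphorus concentration must be positive")
    n_mmol_l = total_n_mg_l / N_ATOMIC_MASS   # mmol/L of N
    p_mmol_l = total_p_mg_l / P_ATOMIC_MASS   # mmol/L of P
    return n_mmol_l / p_mmol_l


def below_redfield(total_n_mg_l: float, total_p_mg_l: float) -> bool:
    """True when the molar N:P ratio falls below the Redfield value of 16."""
    return molar_np_ratio(total_n_mg_l, total_p_mg_l) < REDFIELD_NP


# A 1.0 : 0.15 mass ratio (about 6.7) corresponds to a molar ratio of about 14.7.
print(round(molar_np_ratio(1.0, 0.15), 1))  # 14.7
```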
These environmental relationships provide both explanatory power for understanding the 2015 event dynamics and predictive potential for future risk assessment. The identified temperature and flow thresholds offer water managers specific criteria for assessing elevated bloom risk during critical seasonal periods, while the nutrient relationships inform watershed management strategies targeting bloom prevention.
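The three reported thresholds can be combined into a simple screening check. The text does not specify how they should be combined, so the rule below is hypothetical. It counts how many drivers are in a bloom-favorable state (temperature at or above 23 °C, velocity below 0.3 m/s, molar N:P below 16) and maps the count to an illustrative risk level. It is a sketch, not the validated model.

```python
from dataclasses import dataclass
from typing import Optional

TEMP_THRESHOLD_C = 23.0        # below this, bloom formation was significantly reduced
VELOCITY_THRESHOLD_M_S = 0.3   # below this, significant accumulation occurred
REDFIELD_NP = 16.0             # molar N:P


@dataclass
class ReachConditions:
    water_temp_c: float
    velocity_m_s: float
    np_molar: Optional[float] = None  # None when no concurrent nutrient sample exists


def favorable_drivers(c: ReachConditions) -> int:
    """Count how many of the three reported drivers favor bloom development."""
    count = 0
    if c.water_temp_c >= TEMP_THRESHOLD_C:
        count += 1
    if c.velocity_m_s < VELOCITY_THRESHOLD_M_S:
        count += 1
    if c.np_molar is not None and c.np_molar < REDFIELD_NP:
        count += 1
    return count


def risk_level(c: ReachConditions) -> str:
    """Hypothetical screening rule; the cut-offs between levels are illustrative."""
    n = favorable_drivers(c)
    if n >= 3:
        return "high"
    if n == 2:
        return "elevated"
    return "low"


# Median 2015 velocity (0.24 m/s) with a hypothetical temperature in the 25 to 28 °C optimum.
print(risk_level(ReachConditions(water_temp_c=26.0, velocity_m_s=0.24)))  # elevated
```

A missing nutrient sample is treated as "not favorable" rather than as unknown. An operational version might instead report missing inputs separately.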

3.4.5. Validation Accuracy Assessment and Uncertainty Analysis

Comprehensive accuracy assessment across all validation matchups (n = 127 satellite-ground pairs across 11 stations and multiple dates) yielded an overall detection accuracy of 87% relative to ground-truth bloom classifications. The false positive rate (satellite detection without ground confirmation) was 12%, with false positives occurring primarily during periods of high suspended sediment following tributary inflow events. The false negative rate (ground-confirmed bloom without satellite detection) was 15%, attributable to cloud cover interference (8%), bloom features narrower than Landsat’s 30 m pixel size (5%), and subsurface blooms not detectable in surface reflectance (2%).
Statistical validation metrics demonstrated performance suitable for operational deployment:
  • Pearson correlation coefficient: r = 0.87 (95% CI: 0.82–0.91);
  • Root Mean Square Error: RMSE = 0.023 (normalized RMSE = 18%);
  • Mean Absolute Error: MAE = 0.018;
  • Nash-Sutcliffe Efficiency: NSE = 0.82;
  • Bias: −0.003 (indicating slight underestimation tendency).
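The metrics listed above have standard definitions. With observations O and predictions P, NSE = 1 − Σ(P − O)²/Σ(O − Ō)², and bias = mean(P − O), so a negative bias indicates underestimation. The text does not say how RMSE was normalized. The following minimal Python sketch assumes division by the observed mean; normalizing by the observed range is another common convention. The function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def validation_metrics(obs: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """Pearson r, RMSE, normalized RMSE (%), MAE, Nash-Sutcliffe efficiency and bias."""
    if len(obs) != len(pred) or len(obs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    resid = [p - o for o, p in zip(obs, pred)]          # prediction minus observation
    sse = sum(r * r for r in resid)                     # sum of squared errors
    sst = sum((o - mean_o) ** 2 for o in obs)           # spread of observations
    var_p = sum((p - mean_p) ** 2 for p in pred)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    rmse = math.sqrt(sse / n)
    return {
        "r": cov / math.sqrt(sst * var_p),
        "rmse": rmse,
        "nrmse_pct": 100.0 * rmse / mean_o,  # assumption: normalized by the observed mean
        "mae": sum(abs(r) for r in resid) / n,
        "nse": 1.0 - sse / sst,
        "bias": sum(resid) / n,              # negative values indicate underestimation
    }


# Hypothetical matchups in which predictions sit consistently 0.1 below observations.
print(validation_metrics([1.0, 2.0, 3.0, 4.0], [0.9, 1.9, 2.9, 3.9]))
```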
Uncertainty analysis revealed that detection accuracy remained above 80% across the range of environmental conditions encountered during the 2015 event, including challenging scenarios such as high atmospheric turbidity (aerosol optical depth > 0.3), variable water clarity (Secchi depth 0.3–1.8 m), and mixed water sources at tributary confluences. This robustness across diverse conditions supports the methodology’s applicability for operational monitoring of large river systems.
The validation results establish that satellite-based detection using Landsat imagery and machine learning approaches can provide both spatially comprehensive and temporally advanced HAB monitoring capabilities that significantly exceed the coverage possible through conventional ground-based sampling alone, while maintaining quantitative accuracy suitable for water quality management decision-making.

3.5. Technical Implementation and Operational Feasibility

The technical implementation framework demonstrated that satellite-based HAB monitoring can achieve both high analytical accuracy and operational efficiency suitable for deployment in water management systems. The integration of multiple spectral indices with Google Earth Engine (GEE) cloud computing capabilities enabled near real-time processing while maintaining robust detection accuracy across varying environmental conditions encountered in large river systems.

3.5.1. Multi-Index Spectral Analysis Performance

The combined FAI-NDCI approach improved detection accuracy by 23% compared to single-index methods. Validated against ground-truth observations, FAI achieved an 87% success rate in detecting surface accumulation, and NDCI showed a strong correlation (r = 0.73, p < 0.001) with laboratory chlorophyll-a measurements. The improvement stems from the complementary information each index provides: FAI is sensitive to surface scum formation through near-infrared reflectance peaks, while NDCI quantifies chlorophyll-a concentration through red-edge spectral features. Similar GEE-based approaches have successfully monitored water quality parameters using NDCI in other aquatic systems [45].
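Both indices can be written out explicitly. The following minimal Python sketch uses the standard FAI form: NIR reflectance minus a linear baseline interpolated between the red and SWIR bands, evaluated here at approximate Landsat 8 OLI band centers. It also uses the standard NDCI form, a normalized difference of a red-edge and a red reflectance. The text does not state which OLI bands were used for NDCI, so both reflectances are plain inputs. The Collection 2 Level-2 scale factor (0.0000275 with offset −0.2) converts stored integers to surface reflectance. The function names are hypothetical.

```python
# Approximate Landsat 8 OLI band center wavelengths (nm).
LAMBDA_RED = 655.0    # band 4
LAMBDA_NIR = 865.0    # band 5
LAMBDA_SWIR = 1610.0  # band 6 (SWIR1)


def c2_l2_reflectance(dn: int) -> float:
    """Landsat Collection 2 Level-2 surface reflectance scaling."""
    return dn * 0.0000275 - 0.2


def fai(r_red: float, r_nir: float, r_swir: float,
        lam_red: float = LAMBDA_RED, lam_nir: float = LAMBDA_NIR,
        lam_swir: float = LAMBDA_SWIR) -> float:
    """Floating Algae Index: NIR reflectance above a red-SWIR linear baseline."""
    baseline = r_red + (r_swir - r_red) * (lam_nir - lam_red) / (lam_swir - lam_red)
    return r_nir - baseline


def ndci(r_red_edge: float, r_red: float) -> float:
    """Normalized Difference Chlorophyll Index from red-edge and red reflectance."""
    denom = r_red_edge + r_red
    if denom == 0:
        raise ValueError("red-edge + red reflectance must be non-zero")
    return (r_red_edge - r_red) / denom


# A flat red-SWIR baseline at 0.02 with NIR at 0.05 gives FAI of about 0.03.
print(round(fai(0.02, 0.05, 0.02), 4))
```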
Individual index performance revealed specific strengths and limitations that informed the ensemble approach. FAI performed optimally during high-intensity bloom conditions (microcystin > 100 μg/L, correlation r = 0.91) but showed reduced sensitivity at lower concentrations (microcystin < 20 μg/L, r = 0.68). Conversely, NDCI maintained more consistent performance across the concentration range (r = 0.70–0.76) but showed sensitivity to water depth and turbidity variations. In pool sections with moderate turbidity (Secchi depth 0.8–1.2 m), NDCI achieved optimal performance (r = 0.81), while highly turbid tributary confluence zones showed reduced accuracy (Secchi depth < 0.5 m, r = 0.64).
The ensemble machine learning framework leveraged these complementary strengths by adaptively weighting each index based on local environmental conditions. In clear water sections with surface accumulations, FAI received higher weighting (0.65 vs. 0.35), while in turbid sections without visible surface scums, NDCI contributed more strongly (0.60 vs. 0.40). This adaptive integration produced the 23% accuracy improvement over single-index approaches, with combined performance maintaining R² = 0.82 across the full range of environmental conditions encountered during the 2015 event.
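The reported weights can be illustrated with a fixed lookup. In the study the weighting is learned by the ensemble, so the following Python sketch is a simplification. It assumes both indices have already been rescaled to a common 0–1 range, and it uses an equal 0.50/0.50 split for mixed cases that the text does not describe. The function names and that split are hypothetical.

```python
from typing import Tuple


def index_weights(clear_water: bool, surface_scum: bool) -> Tuple[float, float]:
    """Return (FAI weight, NDCI weight) for the local water condition."""
    if clear_water and surface_scum:
        return 0.65, 0.35   # reported weighting: clear sections with surface accumulation
    if not clear_water and not surface_scum:
        return 0.40, 0.60   # reported weighting: turbid sections without visible scums
    return 0.50, 0.50       # hypothetical: mixed cases are not specified in the text


def combined_score(fai_norm: float, ndci_norm: float,
                   clear_water: bool, surface_scum: bool) -> float:
    """Weighted combination of pre-normalized (0-1) FAI and NDCI values."""
    w_fai, w_ndci = index_weights(clear_water, surface_scum)
    return w_fai * fai_norm + w_ndci * ndci_norm


print(combined_score(0.8, 0.5, clear_water=True, surface_scum=True))
print(combined_score(0.3, 0.7, clear_water=False, surface_scum=False))
```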

3.5.2. Atmospheric Correction and Quality Control

Atmospheric correction procedures contributed significantly to enhanced performance, with modified algorithms reducing atmospheric interference effects by approximately 23% compared to standard correction methods applied to Landsat Collection 2 Level-2 surface reflectance products. These improvements proved particularly valuable during high humidity conditions (>75% relative humidity) and elevated aerosol loading scenarios (AOD > 0.3), which frequently compromise satellite-based water quality assessments in the Ohio River Valley during late summer months.
The atmospheric correction workflow incorporated multiple quality control steps designed specifically for riverine environments. Cloud masking algorithms adapted from Zhang et al. [34] achieved 92% accuracy when validated against high-resolution aerial imagery, effectively distinguishing between true cloud cover and high-intensity algal blooms that can produce similar reflectance signatures. Water masking procedures utilized an enhanced NDWI calculation accounting for turbidity variations common in large river systems, maintaining reliable water surface delineation even in highly turbid tributary confluence zones.
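Per-pixel cloud and water masking can be illustrated as follows. This Python sketch is not the Zhang et al. [34] cloud mask or the authors' enhanced NDWI. In their place it uses the standard Collection 2 QA_PIXEL cloud flags (bit 1 dilated cloud, bit 2 cirrus, bit 3 cloud, bit 4 cloud shadow) and the basic NDWI, (green − NIR)/(green + NIR), with a hypothetical threshold of 0. A dense surface scum raises NIR reflectance and so lowers NDWI, which is one reason a plain NDWI threshold needs adjustment under bloom conditions.

```python
# Landsat Collection 2 QA_PIXEL bit positions used here.
DILATED_CLOUD_BIT = 1
CIRRUS_BIT = 2
CLOUD_BIT = 3
CLOUD_SHADOW_BIT = 4
CLOUD_MASK = sum(1 << b for b in (DILATED_CLOUD_BIT, CIRRUS_BIT, CLOUD_BIT, CLOUD_SHADOW_BIT))


def is_cloud_free(qa_pixel: int) -> bool:
    """True when none of the cloud-related QA_PIXEL bits are set."""
    return (qa_pixel & CLOUD_MASK) == 0


def ndwi(r_green: float, r_nir: float) -> float:
    """Basic NDWI: (green - NIR) / (green + NIR)."""
    denom = r_green + r_nir
    if denom == 0:
        raise ValueError("green + NIR reflectance must be non-zero")
    return (r_green - r_nir) / denom


def valid_water_pixel(qa_pixel: int, r_green: float, r_nir: float,
                      ndwi_threshold: float = 0.0) -> bool:
    """Keep a pixel if it is cloud-free and classified as water by NDWI."""
    return is_cloud_free(qa_pixel) and ndwi(r_green, r_nir) > ndwi_threshold


print(valid_water_pixel(1 << 7, 0.06, 0.02))  # True: water bit only, NDWI = 0.5
print(valid_water_pixel(0, 0.05, 0.08))       # False: scum-like NIR pushes NDWI below 0
```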
Quality assessment protocols evaluated each scene based on weighted criteria: cloud coverage percentage (weight: 0.4), sensor viewing angle (weight: 0.3), and sun elevation (weight: 0.3). Scenes scoring above the 0.65 threshold on this composite quality metric were retained for analysis, ensuring that final bloom detection products met minimum quality standards suitable for operational water management applications. During the 2015 study period, 68% of potential Landsat overpasses met these quality criteria, providing sufficient temporal coverage for bloom monitoring despite cloud cover constraints.
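The composite quality metric combines three criteria with the stated weights and a 0.65 cut-off. The text does not say how each criterion was scaled to a common range, so the following Python sketch assumes a simple linear scaling. Cloud score is 1 − cloud%/100, view score is 1 − |angle|/7.5° (a hypothetical maximum off-nadir angle), and sun score is elevation/90°. The function names and normalizations are hypothetical.

```python
def scene_quality(cloud_cover_pct: float, view_angle_deg: float,
                  sun_elevation_deg: float, max_view_angle_deg: float = 7.5) -> float:
    """Weighted scene score in [0, 1]; higher is better."""
    cloud_score = 1.0 - min(max(cloud_cover_pct, 0.0), 100.0) / 100.0
    view_score = 1.0 - min(abs(view_angle_deg), max_view_angle_deg) / max_view_angle_deg
    sun_score = min(max(sun_elevation_deg, 0.0), 90.0) / 90.0
    # Weights as stated in the text: cloud 0.4, viewing angle 0.3, sun elevation 0.3.
    return 0.4 * cloud_score + 0.3 * view_score + 0.3 * sun_score


def keep_scene(cloud_cover_pct: float, view_angle_deg: float,
               sun_elevation_deg: float, threshold: float = 0.65) -> bool:
    """Retain scenes scoring above the composite threshold."""
    return scene_quality(cloud_cover_pct, view_angle_deg, sun_elevation_deg) > threshold


print(keep_scene(10.0, 2.0, 55.0))  # True: score of about 0.76
print(keep_scene(60.0, 5.0, 40.0))  # False: score of about 0.39
```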

3.5.3. Computational Efficiency and Scalability

The Google Earth Engine implementation achieved high computational efficiency, processing the complete 2015 event dataset (47 Landsat scenes covering 981 river miles over 75 days) in approximately 4.2 h of total processing time. Traditional desktop processing would require an estimated 12–15 h for an equivalent analysis, so this represents a reduction of at least 65%. The efficiency gains stem from GEE’s parallel processing architecture, which distributes spectral index calculations and machine learning predictions across cloud computing infrastructure. Figure 10 illustrates the GEE processing workflow.
Optimization strategies contributed to the computational efficiency while maintaining spatial accuracy. The study area definition employed a modified adaptive threshold approach, generating river centerline points at 1-km intervals and applying a 2-km buffer zone optimized through iterative testing. This focused analysis corridor encompassed the river’s full width and immediate surroundings while reducing processing time by 65% compared to full-scene analysis. The buffer width balanced detection completeness (ensuring capture of near-shore blooms and tributary inputs) against computational efficiency.
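The corridor construction can be sketched in plain Python on a projected centerline (coordinates in meters, for example a UTM zone; an assumption). The sketch places points every 1 km along the polyline and then tests whether a location falls within 2 km of any point. With 1 km spacing, the nearest sample point is at most 0.5 km along the line from any position, so every location within 1.5 km of the centerline is guaranteed to fall inside the corridor. The function names and the example bend are hypothetical, and the sketch does not reproduce the authors' adaptive threshold step.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # projected coordinates in meters


def sample_centerline(vertices: List[Point], spacing_m: float = 1000.0) -> List[Point]:
    """Place points every spacing_m meters along a polyline, starting at its first vertex."""
    samples = [vertices[0]]
    to_next = spacing_m  # distance remaining until the next sample point
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        pos = to_next
        while pos <= seg_len:
            t = pos / seg_len
            samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            pos += spacing_m
        to_next = pos - seg_len  # carry the remainder into the next segment
    return samples


def in_corridor(pt: Point, samples: List[Point], buffer_m: float = 2000.0) -> bool:
    """True if pt lies within buffer_m of any centerline sample point."""
    return any(math.hypot(pt[0] - x, pt[1] - y) <= buffer_m for x, y in samples)


# Hypothetical river bend: 1.5 km east, then 1.5 km north.
bend = sample_centerline([(0.0, 0.0), (1500.0, 0.0), (1500.0, 1500.0)])
print(len(bend), in_corridor((3000.0, 700.0), bend))
```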
Temporal compositing procedures generated weekly maximum-value composites to minimize cloud cover impacts while retaining peak bloom intensity information. This approach proved particularly valuable for operational monitoring, as weekly composites provide sufficient temporal resolution for management decision-making while maximizing data availability even during periods of frequent cloud cover. During the 2015 event, weekly compositing increased usable data availability from 68% (individual scenes meeting quality criteria) to 89% (weekly composites containing at least partial coverage).
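Weekly maximum-value compositing can be illustrated for a single pixel's time series. In the following minimal Python sketch, weeks are ISO calendar weeks (an assumption, since the text does not say how weeks were delimited), masked observations are None, and availability is the fraction of weeks with at least one valid value. In GEE the same per-week reduction would typically use an ImageCollection max reduction; plain Python is used here for clarity. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Optional, Tuple

WeekKey = Tuple[int, int]  # (ISO year, ISO week)


def weekly_max_composite(obs: List[Tuple[date, Optional[float]]]) -> Dict[WeekKey, Optional[float]]:
    """Maximum valid FAI per ISO week; None if every observation that week was masked."""
    weeks: Dict[WeekKey, List[Optional[float]]] = defaultdict(list)
    for d, value in obs:
        iso_year, iso_week, _ = d.isocalendar()
        weeks[(iso_year, iso_week)].append(value)
    composite: Dict[WeekKey, Optional[float]] = {}
    for key, values in weeks.items():
        valid = [v for v in values if v is not None]
        composite[key] = max(valid) if valid else None
    return composite


def availability(composite: Dict[WeekKey, Optional[float]]) -> float:
    """Fraction of weeks with at least one valid observation."""
    return sum(v is not None for v in composite.values()) / len(composite)


obs = [(date(2015, 8, 10), 0.012), (date(2015, 8, 14), 0.021), (date(2015, 8, 17), None)]
c = weekly_max_composite(obs)
print(c, availability(c))
```

The maximum keeps the peak bloom signal, but it also keeps any residual unmasked cloud or sediment signal. The composite is therefore only as reliable as the per-scene masking behind it.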

3.5.4. Operational Deployment Framework

The technical framework demonstrates operational feasibility for near real-time HAB monitoring suitable for integration into water management systems. Processing latency from satellite overpass to bloom detection product delivery can be reduced to 6–12 h using GEE’s automated workflow capabilities, providing water managers with timely information for decision-making. This latency compares favorably with conventional monitoring approaches, where laboratory analysis of field samples typically requires 24–48 h from collection to results reporting.
The framework’s scalability supports extension to multiple river systems without requiring significant additional infrastructure investment. The use of standardized Landsat Collection 2 Level-2 products ensures consistency across different geographic regions, while GEE’s cloud computing architecture eliminates requirements for local high-performance computing resources. Water management agencies can access the processing framework through standard web browsers, reducing technical barriers to implementation.
Automated quality control procedures and atmospheric correction algorithms ensure consistent data quality while reducing the need for manual intervention and expert oversight. The ensemble machine learning model, once trained and validated for the Ohio River system, can be applied systematically to new imagery as it becomes available, with automated flagging of potential bloom conditions exceeding user-defined thresholds. This automation capability is essential for operational systems requiring reliable performance without continuous expert supervision.
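Threshold-based flagging of new imagery can be sketched using the FAI levels reported earlier: low-level detection at 0.010–0.015 and peak concentrations above 0.030. In this minimal Python sketch the intermediate "elevated" class, the intake names, and the function names are hypothetical. The alert threshold stands in for the user-defined value described above.

```python
from typing import Dict, List, Optional


def bloom_class(fai_value: Optional[float]) -> str:
    """Map an FAI value to a descriptive class."""
    if fai_value is None:
        return "no data"
    if fai_value > 0.030:
        return "peak"
    if fai_value > 0.015:
        return "elevated"      # hypothetical intermediate class
    if fai_value >= 0.010:
        return "low-level"
    return "none"


def sites_to_alert(latest_fai: Dict[str, Optional[float]], threshold: float) -> List[str]:
    """Sites whose latest FAI meets or exceeds a user-defined alert threshold."""
    return sorted(site for site, v in latest_fai.items() if v is not None and v >= threshold)


readings = {"intake_A": 0.008, "intake_B": 0.019, "intake_C": None, "intake_D": 0.034}
print({site: bloom_class(v) for site, v in readings.items()})
print(sites_to_alert(readings, threshold=0.015))  # ['intake_B', 'intake_D']
```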

3.5.5. Integration with Existing Monitoring Systems

The framework’s compatibility with existing ORSANCO monitoring protocols enables gradual integration rather than complete replacement of traditional approaches. Satellite-based detection can complement ground-based sampling by identifying areas requiring intensive monitoring, optimizing allocation of limited field sampling resources. During the 2015 event, satellite analysis identified urban and tributary hotspots (Section 3.1) that could have guided targeted field sampling to locations with highest bloom intensity and greatest public health relevance.
The temporal resolution of satellite monitoring (8-day combined Landsat 7/8 coverage during the study period, with potential 2–3-day resolution through integration of Sentinel-2 data) complements rather than replaces weekly to bi-weekly ground-based sampling intervals. Satellite data provides comprehensive spatial awareness and early warning capabilities, while ground-based sampling delivers species identification, toxin quantification, and validation measurements essential for public health risk assessment.
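The revisit figures follow from interleaving repeat cycles. The following minimal Python sketch assumes idealized, cloud-free schedules on day offsets chosen for illustration: two 16-day Landsat cycles offset by 8 days, optionally combined with a nominal 5-day Sentinel-2 cycle. It ignores overlap between adjacent orbital paths, which adds observations at mid-latitudes, so it understates the achievable frequency. The function names are hypothetical.

```python
from typing import Iterable, Set, Tuple


def overpass_days(first_day: int, repeat_days: int, horizon: int) -> Set[int]:
    """Nominal overpass days for one repeat cycle within [0, horizon)."""
    return set(range(first_day, horizon, repeat_days))


def revisit_gaps(schedules: Iterable[Set[int]]) -> Tuple[int, float]:
    """(maximum gap, mean gap) in days between successive observations."""
    days = sorted(set().union(*schedules))
    gaps = [b - a for a, b in zip(days, days[1:])]
    return max(gaps), sum(gaps) / len(gaps)


HORIZON = 80
landsat = [overpass_days(0, 16, HORIZON), overpass_days(8, 16, HORIZON)]
sentinel2 = overpass_days(0, 5, HORIZON)
print(revisit_gaps(landsat))                # (8, 8.0)
print(revisit_gaps(landsat + [sentinel2]))  # maximum gap falls to 5 days
```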
Data sharing frameworks can facilitate integration between satellite and ground-based monitoring programs. Automated bloom detection products can be delivered to water management agencies through web-based dashboards or data feeds compatible with existing water quality databases. The GEE-based framework demonstrated in this study provides proof-of-concept for such operational systems, with potential for expansion to automated alert generation and public notification systems as monitoring programs mature [46].

3.5.6. Cost-Effectiveness and Resource Optimization

The technical implementation demonstrates significant potential for cost savings in water quality monitoring programs. Traditional monitoring programs require substantial resources for sample collection, laboratory analysis, and field personnel deployment, often limiting sampling frequency and spatial coverage. The incremental cost of satellite-based monitoring, once initial framework development is complete, primarily involves data processing and analysis time rather than field operations, potentially reducing per-sample monitoring costs.
Economic assessments of similar satellite-based monitoring systems suggest that operational cost per water body can be reduced by 40–60% compared to equivalent ground-based sampling intensity [42], while providing superior spatial coverage and temporal frequency. For the Ohio River system, where comprehensive monitoring across 981 miles would require prohibitive field sampling effort, satellite-based approaches offer practical means of maintaining broad surveillance capabilities while directing intensive field sampling to high-priority locations identified through remote sensing.
The framework’s efficiency gains and operational feasibility establish technical foundations for transitioning satellite-based HAB monitoring from research applications to operational water management tools. The demonstrated processing efficiency (4.2 h for complete event analysis), automation capabilities, and integration compatibility with existing monitoring systems address key barriers to operational deployment, while the validation results (Section 3.4) provide confidence in analytical reliability suitable for management decision-making.

4. Discussion

4.1. Methodological Limitations and Future Directions

Despite significant advances in satellite-based HAB monitoring for the Ohio River system, several important limitations affect system performance and generalizability. Understanding these constraints is essential for appropriate result interpretation and future development.

4.1.1. Temporal and Spatial Resolution Constraints

Landsat’s 16-day single-satellite revisit cycle limits temporal resolution for capturing rapid bloom dynamics. Together, Landsat 7 and 8 provided 8-day coverage, adequate for the 2015 event’s extended duration (75 days) but potentially missing faster-developing blooms. Cloud cover further reduced usable imagery to approximately 58% of potential observations, creating detection gaps during critical bloom formation periods. The 30-m spatial resolution, while sufficient for main-channel detection, may miss small-scale features, narrow tributary blooms, or near-shore accumulations affecting specific water intakes. Mixed-pixel effects at river-land interfaces introduce uncertainty, particularly in narrow river sections, where approximately 8% of detection areas consist of edge pixels potentially affected by adjacency effects.
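The share of edge pixels depends strongly on channel width in pixels. The following minimal Python sketch counts water pixels that have at least one land pixel among their eight neighbors. It assumes a one-pixel adjacency zone, whereas real adjacency effects can extend further, and it treats areas outside the image as unknown rather than as land. A channel three pixels wide (about 90 m at 30 m resolution) is already two-thirds edge pixels. The function name and grids are hypothetical.

```python
from typing import List

NEIGHBORS = ((-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1))


def edge_pixel_fraction(water: List[List[bool]]) -> float:
    """Fraction of water pixels with at least one land pixel in their 8-neighborhood."""
    rows, cols = len(water), len(water[0])
    n_water = n_edge = 0
    for i in range(rows):
        for j in range(cols):
            if not water[i][j]:
                continue
            n_water += 1
            for di, dj in NEIGHBORS:
                ni, nj = i + di, j + dj
                # Only in-image neighbors are checked; the image border is not land.
                if 0 <= ni < rows and 0 <= nj < cols and not water[ni][nj]:
                    n_edge += 1
                    break
    return n_edge / n_water if n_water else 0.0


# A 3-pixel-wide channel between land rows, 5 pixels long.
channel = [[r in (1, 2, 3) for _ in range(5)] for r in range(5)]
print(round(edge_pixel_fraction(channel), 3))  # 0.667
```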

4.1.2. Model Generalizability and Training Data

The ensemble machine learning approach, while demonstrating strong validation performance (R² = 0.82, NSE = 0.82), relies on data from a single bloom event (2015) dominated by Microcystis aeruginosa. Generalizability to blooms with different species composition, environmental conditions, or seasonal timing remains uncertain. Model training encompassed specific environmental ranges (water temperature: 21–29 °C, flow velocity: 0.15–0.45 m/s, turbidity: Secchi depth 0.3–1.8 m), and performance outside these ranges requires validation. The spectral indices (FAI and NDCI) assume that red and near-infrared spectral features primarily reflect phytoplankton biomass. This assumption generally holds but can produce false positives under extreme suspended sediment conditions, as reflected in the 12% false positive rate, which occurred mainly during high-sediment periods.
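A simple guard against extrapolation is to flag new scenes whose conditions fall outside the training ranges listed above. The following minimal Python sketch checks each variable against those ranges. The dictionary keys and function name are hypothetical, and a real deployment would need the same variables available at prediction time.

```python
from typing import Dict, List

# Training ranges reported for the 2015 model.
TRAINING_RANGES = {
    "water_temp_c": (21.0, 29.0),
    "velocity_m_s": (0.15, 0.45),
    "secchi_m": (0.3, 1.8),
}


def out_of_envelope(conditions: Dict[str, float]) -> List[str]:
    """List the variables that are missing or outside the training envelope."""
    flags = []
    for name, (lo, hi) in TRAINING_RANGES.items():
        value = conditions.get(name)
        if value is None:
            flags.append(f"{name}: missing")
        elif not lo <= value <= hi:
            flags.append(f"{name}: {value} outside [{lo}, {hi}]")
    return flags


print(out_of_envelope({"water_temp_c": 26.0, "velocity_m_s": 0.24, "secchi_m": 1.0}))  # []
print(out_of_envelope({"water_temp_c": 31.0, "velocity_m_s": 0.24}))
```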

4.1.3. Atmospheric Correction and Radiometric Uncertainty

Residual uncertainties persist in atmospheric correction despite a 23% improvement over standard methods. High aerosol loading during late summer in the Ohio River Valley introduces measurement uncertainty that standard algorithms may not fully resolve. Without concurrent in-situ radiometric measurements during the 2015 study period, absolute accuracy of surface reflectance retrievals remains uncertain. Sun glint contamination, while partially addressed through quality flagging, affected approximately 5% of observations. These atmospheric challenges represent inherent limitations of passive optical remote sensing that require continued refinement.

4.1.4. Detection Limitations and System-Specific Considerations

Vertical distribution of cyanobacterial blooms presents a fundamental constraint, as passive optical sensors primarily observe the uppermost meter of the water column [47]. Spectral ambiguity between cyanobacterial blooms and other high-biomass phytoplankton assemblages challenges definitive species identification without ground-based sampling. The methodology’s development and validation for the Ohio River system raises questions about transferability to rivers with different hydrodynamic characteristics, optical properties, or geographic settings. Free-flowing sections with higher velocities may experience different bloom dynamics than the lock-and-dam pool structure characterizing the Ohio River.

4.1.5. Future Enhancement Opportunities

Clear pathways exist for addressing current constraints and improving system capabilities:
  • Multi-platform integration incorporating Sentinel-2 MSI (10-m resolution, 5-day revisit) could, combined with Landsat, improve temporal resolution to 2–3 days, substantially reducing cloud cover impacts and enabling detection of shorter-duration events. Machine learning approaches have proven adaptable across diverse remote sensing applications [48], supporting transferability to multi-platform integration.
  • Enhanced atmospheric correction algorithms specifically optimized for inland waters (e.g., ACOLITE, SeaDAS, iCOR) could reduce residual uncertainties, particularly when integrated with AERONET aerosol monitoring.
  • Species discrimination capabilities through additional spectral features, particularly phycocyanin-sensitive bands available on Sentinel-3 OLCI (620 nm), could improve distinction between cyanobacterial blooms and other phytoplankton assemblages. Integration of AI approaches with remote sensing has shown promise in monitoring wetland ecosystems [49], suggesting potential for adaptation to HAB species discrimination.
  • Predictive modeling integration coupling satellite observations with hydrodynamic-ecological models could extend early warning beyond the current 5–7-day advantage by forecasting bloom development based on environmental drivers.
  • Multi-year validation programs encompassing diverse bloom events, species compositions, and environmental conditions would strengthen confidence in model generalizability and operational reliability.
  • Cross-system validation extending methodology to other major river systems (Mississippi, Columbia, Tennessee, Missouri) would test generalizability while building robust multi-system training datasets.
These limitations should be interpreted not as fundamental constraints on feasibility but as opportunities for continued refinement. The demonstrated capabilities—comprehensive spatial detection, 5–7-day early warning, and 87% validation accuracy—establish satellite monitoring as a viable complement to traditional approaches, with clear pathways for addressing current constraints through technological advances and expanded validation efforts.

4.2. Implications for Water Resource Management

The demonstrated capabilities of satellite-based HAB monitoring—comprehensive spatial detection (636.5 miles vs. 30 miles initially reported), 5–7-day early warning, and high validation accuracy (NSE = 0.82)—establish transformative potential for water quality management in large river systems.

4.2.1. Public Health Protection and Operational Integration

The 5–7-day early warning capability provides crucial lead time for protecting public health before bloom conditions reach critical levels. During the 2015 event, which triggered multi-state advisories affecting over 5 million people, satellite-based detection could have enabled proactive advisory implementation during initial formation (15–21 August) rather than reactive responses after extensive confirmation (21–28 August). Comprehensive spatial coverage enables facility-specific bloom assessment for the 47 water treatment facilities drawing from the Ohio River, allowing treatment protocols to be adjusted in advance. This spatial precision supports both public safety and economic considerations, as targeted advisories can reduce unnecessary access restrictions to unaffected recreational areas.
The framework’s compatibility with existing ORSANCO monitoring protocols enables gradual integration rather than replacement. Satellite-based detection complements ground-based sampling by providing comprehensive spatial awareness and early warning, while ground-based monitoring delivers species identification, toxin quantification, and validation measurements essential for public health risk assessment. Operational integration can follow a phased strategy, initially using satellite-based screening to identify areas requiring intensive ground follow-up, optimizing limited field sampling resources.

4.2.2. Economic Efficiency and Strategic Watershed Management

Economic assessments of similar satellite-based systems suggest operational costs can be reduced by 40–60% compared to equivalent ground-based sampling intensity, while providing superior spatial coverage and temporal frequency. For the Ohio River system, where comprehensive monitoring across 981 miles would require prohibitive field effort, satellite approaches offer practical means of maintaining broad surveillance while directing intensive sampling to high-priority locations identified through remote sensing. The demonstrated processing efficiency (4.2 h for complete event analysis) and near real-time delivery potential (6–12 h latency) enable timely integration into operational decision-making workflows.
The spatial patterns revealed—urban and tributary hotspots, pool-to-pool variability, discontinuous distribution—inform strategic watershed management. Nutrient reduction strategies can be spatially targeted to areas with greatest impact on bloom development, potentially improving cost-effectiveness. The identification of elevated intensity near major urban centers and tributary confluences suggests that management efforts focusing on point source controls and tributary watershed management could achieve disproportionate benefit relative to investment. Long-term satellite monitoring enables evaluation of watershed-scale management interventions by tracking spatial patterns of bloom intensity relative to implemented strategies.

4.2.3. Climate Change Adaptation and Long-Term Planning

The quantitative relationships between bloom intensity and environmental drivers provide critical insights for climate adaptation planning. The observed temperature threshold (23 °C) and optimal bloom conditions (25–28 °C) suggest that warming trends will likely increase HAB frequency and intensity. Flow velocity relationships, with the critical 0.3 m/s threshold, highlight vulnerability to changing hydrologic patterns. Climate projections indicating increased frequency of late summer low-flow conditions in the Ohio River Valley suggest elevated future HAB risk. Comprehensive spatial coverage supports evaluation of management interventions, with long-term satellite records enabling detection of trends in bloom frequency, intensity, and spatial distribution that inform watershed management strategies and infrastructure planning.
The framework’s standardized data products and cloud-based processing support methodological consistency across different geographic regions and reduce the need for region-specific algorithm development, although transferability to rivers with different hydrodynamic and optical characteristics still requires validation (Section 4.1.4). Regional collaboration among water resource agencies managing different river systems could enable shared implementation costs and coordinated monitoring frameworks, expanding operational satellite-based HAB surveillance beyond the Ohio River to other major systems facing similar challenges.

4.3. Scientific Contributions and Broader Context

This research contributes significantly to understanding HAB dynamics in riverine environments while demonstrating methodological advances in remote sensing and machine learning integration for environmental monitoring.

4.3.1. Advances in Riverine HAB Understanding

The quantitative environmental relationships identified provide new insights into bloom formation mechanisms. Temperature correlation with bloom intensity (r = 0.82, p < 0.001) and identification of a critical 23 °C threshold establish quantitative frameworks for understanding thermal controls in flowing waters. The flow velocity relationship, demonstrating exponential decrease in bloom intensity above 0.3 m/s (FAI = 0.012 × e^(−2.3v), R² = 0.76), provides empirical support for theoretical models linking hydrodynamic residence time to bloom development. Nutrient relationships at N:P ratios below 16:1, with urban-influenced reaches showing 35–60% elevated intensity, support hypotheses about legacy nutrient loading impacts while highlighting tributary inputs and point-source contributions in large river systems.

4.3.2. Methodological Advances and Framework Contributions

The integration of ensemble machine learning with established spectral indices (FAI and NDCI) represents a methodological advance over simpler threshold-based detection, achieving a 23% accuracy improvement. The temporal holdout cross-validation methodology addresses a critical limitation of previous studies, in which validation often relied on contemporaneous matchups without demonstrating true predictive capability. The minimal performance degradation between training and testing (a 0.04 decrease in R²) indicates that ensemble approaches can achieve reliable predictive performance while avoiding overfitting. The dynamic buffer validation framework, which accounts for river flow dynamics and temporal separation, achieved 87% successful matching compared to 64% with fixed buffers, demonstrating the importance of incorporating hydrodynamic considerations in validation frameworks for flowing water systems.
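Two of the validation ideas above can be sketched briefly. The text does not give the dynamic buffer formula, so the version below is hypothetical. The along-river matching distance grows with flow velocity times the time separation between overpass and sampling, plus a fixed base tolerance (0.5 km here, also hypothetical). The temporal holdout trains on matchups before a cutoff date and tests on those on or after it, so the model is never evaluated on dates it has seen. All names are hypothetical.

```python
from datetime import date
from typing import List, Tuple

SECONDS_PER_DAY = 86_400


def match_radius_km(velocity_m_s: float, dt_days: float, base_km: float = 0.5) -> float:
    """Hypothetical dynamic buffer: base tolerance plus distance water travels in dt."""
    travel_km = velocity_m_s * abs(dt_days) * SECONDS_PER_DAY / 1000.0
    return base_km + travel_km


def is_match(sat_river_km: float, station_river_km: float,
             velocity_m_s: float, dt_days: float, base_km: float = 0.5) -> bool:
    """Accept a satellite-ground pair if their along-river separation is within the buffer."""
    separation = abs(sat_river_km - station_river_km)
    return separation <= match_radius_km(velocity_m_s, dt_days, base_km)


Record = Tuple[date, str, float]  # (date, station id, observed value); illustrative


def temporal_holdout(records: List[Record], split: date) -> Tuple[List[Record], List[Record]]:
    """Train on matchups before split; test on matchups on or after it."""
    train = [r for r in records if r[0] < split]
    test = [r for r in records if r[0] >= split]
    return train, test


# Median 2015 velocity with a 1-day separation between overpass and sampling.
print(round(match_radius_km(0.24, 1.0), 3))  # 21.236
```

A symmetric buffer ignores flow direction. A directional variant would instead compare the satellite pixel with the position the sampled water parcel occupied at overpass time.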
The Google Earth Engine implementation demonstrates that cloud-based processing frameworks can achieve near real-time performance (4.2 h for complete event analysis) suitable for operational monitoring, representing a 65% efficiency improvement over traditional approaches. The optimization strategies developed—adaptive corridor definition, parallel processing, temporal compositing—provide replicable frameworks applicable to other large river systems. The framework’s demonstrated scalability using standardized products processable through web-based interfaces reduces technical barriers for water management agencies, democratizing satellite-based monitoring capabilities.

4.3.3. Integration Across Scales and Future Research Directions

The research demonstrates successful integration across multiple spatial scales—from 30-m pixels to 636.5-mile extent—and temporal scales—from daily to seasonal dynamics. This multi-scale integration addresses a fundamental challenge in environmental monitoring: connecting fine-resolution measurements to system-scale understanding relevant for management decisions. The ability to detect localized hotspots while simultaneously characterizing system-wide patterns illustrates how satellite approaches provide hierarchical spatial information unattainable through point-sampling networks.
Critical research priorities for future work include:
  • Multi-system validation testing methodology generalizability across diverse river systems with different hydrodynamic characteristics, optical properties, and bloom species composition.
  • Species-specific detection and toxin prediction through integration of phycocyanin-sensitive spectral features and machine learning approaches utilizing full spectral signatures [49].
  • Predictive modeling integration coupling satellite observations with hydrodynamic-ecological models to extend early warning capabilities beyond current 5–7-day detection advantage.
  • Enhanced atmospheric correction and radiometric validation through dedicated campaigns with coincident optical measurements.
  • High-frequency temporal monitoring integrating multiple satellite platforms (Landsat 8/9, Sentinel-2A/B, Sentinel-3A/B) to achieve 2–3-day temporal resolution [48].
  • Long-term trend analysis applying methodology to historical Landsat archives (1984–present) to reveal temporal changes in HAB frequency, intensity, and spatial distribution.
This research contributes to broader paradigm shifts in environmental monitoring toward integrated Earth observation systems combining satellite remote sensing, ground-based networks, and predictive modeling. The demonstrated capabilities challenge traditional monitoring paradigms based primarily on sparse point sampling, establishing that comprehensive spatial awareness is achievable and operationally valuable for water quality management. The methodological framework—combining cloud computing, machine learning, and rigorous temporal validation—provides a template applicable beyond HAB monitoring to diverse water quality parameters and environmental monitoring applications.

5. Conclusions

This study demonstrates that integrating satellite remote sensing with ensemble machine learning provides transformative capabilities for harmful algal bloom detection and monitoring in large river systems. Analysis of the 2015 Ohio River HAB event revealed that satellite-based monitoring detected bloom extent across 636.5 river miles, a 20-fold increase compared to the 30-mile extent initially reported through conventional monitoring, while providing consistent 5–7-day early warning before ground-based confirmation.
The ensemble framework combining Support Vector Regression, Neural Networks, and Extreme Gradient Boosting achieved robust predictive performance (R² = 0.82, correlation coefficient = 0.85) validated through rigorous temporal holdout cross-validation. Integration of multiple spectral indices (FAI and NDCI) improved detection accuracy by 23% compared to single-index approaches, with validation across 11 ORSANCO monitoring stations confirming spatial consistency (Nash-Sutcliffe Efficiency = 0.82). The dynamic buffer validation methodology achieved 87% successful spatial matching, providing rigorous validation for flowing water systems.
Quantitative relationships between bloom intensity and environmental drivers provide a basis for predictive risk assessment. Water temperature showed the strongest correlation (r = 0.82), with bloom formation significantly reduced below a critical threshold of 23 °C. Flow velocity exhibited clear threshold effects: velocities below 0.3 m/s promoted accumulation, while higher velocities produced exponential decreases in intensity. These relationships give water resource managers specific criteria for assessing elevated bloom risk and inform watershed management strategies.
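To show how the two reported thresholds could be turned into a simple screening check, the sketch below labels conditions as elevated when water temperature is at or above 23 °C and flow velocity is below 0.3 m/s. The three-level scheme and the function name `bloomRiskLevel` are hypothetical illustrations, not the predictive model used in this study.

```javascript
// Thresholds reported in the text; the way they are combined here is an assumption.
const TEMP_THRESHOLD_C = 23.0;       // bloom formation reduced below this temperature
const VELOCITY_THRESHOLD_MPS = 0.3;  // accumulation promoted below this velocity

// Hypothetical screening rule: both conditions -> "elevated", one -> "moderate", none -> "low".
function bloomRiskLevel(waterTempC, velocityMps) {
  const warm = waterTempC >= TEMP_THRESHOLD_C;
  const slow = velocityMps < VELOCITY_THRESHOLD_MPS;
  if (warm && slow) return "elevated";
  if (warm || slow) return "moderate";
  return "low";
}

console.log(bloomRiskLevel(26.5, 0.2));
```

A real screening tool would likely weight the drivers continuously (for example, the exponential decline with velocity) rather than apply hard cutoffs; the binary version is used here only because it maps directly onto the two stated thresholds.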
The Google Earth Engine implementation was computationally efficient (4.2 h for the complete event analysis, a 65% reduction compared with traditional approaches) while maintaining analytical accuracy. A processing latency of 6–12 h from satellite overpass to product delivery enables near real-time integration into water management decision-making. The 5–7-day early warning capability provides lead time for protective measures, such as proactively adjusting water treatment protocols and issuing advisories.
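If the 65% reduction refers to total processing time, the implied run time of the traditional approach follows directly from the 4.2 h figure:

```latex
t_{\text{traditional}} = \frac{t_{\text{GEE}}}{1 - 0.65} = \frac{4.2\ \text{h}}{0.35} = 12\ \text{h}
```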
This research establishes satellite-based monitoring as a viable and valuable complement to traditional ground-based approaches, extending the spatial and temporal reach of HAB surveillance. Because the framework is compatible with existing monitoring protocols, it can be integrated gradually, leveraging the complementary strengths of satellite detection (comprehensive spatial coverage, early warning) and ground-based sampling (species identification, toxin quantification, validation). Its methodological advances (temporal holdout cross-validation, dynamic buffer validation, ensemble machine learning integration, and cloud-based processing optimization) are applicable to other large river systems facing HAB challenges.
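Temporal holdout cross-validation keeps every test sample later in time than the samples used for training, so a model is never scored on dates that precede its training data. The sketch below assumes a forward-chaining scheme and a `date` field (ISO string) on each matchup record; the fold boundaries, the record shape, and the function name `forwardChainingFolds` are hypothetical, since the exact fold design is not described in this section.

```javascript
// Forward-chaining temporal folds: fold k trains on all records before cutoff k
// and tests on records from cutoff k up to (but excluding) cutoff k + 1.
function forwardChainingFolds(records, cutoffsIso) {
  const cutoffs = cutoffsIso.map((d) => Date.parse(d)).sort((a, b) => a - b);
  const folds = [];
  for (let k = 0; k < cutoffs.length; k++) {
    const start = cutoffs[k];
    const end = k + 1 < cutoffs.length ? cutoffs[k + 1] : Infinity;
    const train = records.filter((r) => Date.parse(r.date) < start);
    const test = records.filter((r) => {
      const t = Date.parse(r.date);
      return t >= start && t < end;
    });
    folds.push({ train, test });
  }
  return folds;
}

// Hypothetical matchup dates within the 2015 event window.
const matchups = [{ date: "2015-08-20" }, { date: "2015-09-05" }, { date: "2015-09-20" }];
const folds = forwardChainingFolds(matchups, ["2015-09-01", "2015-09-15"]);
folds.forEach((f, k) => console.log(`fold ${k}: train=${f.train.length} test=${f.test.length}`));
```

Forward chaining is one common choice; a single fixed cutoff (one train period, one later test period) is the simplest special case.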
As climate change intensifies HAB frequency and severity globally, the spatial coverage and early warning potential of satellite-based approaches will become increasingly important for protecting water quality, ecosystem health, and public safety in large river systems worldwide. The 2015 Ohio River analysis demonstrates that satellite remote sensing, combined with machine learning and rigorous validation, can substantially improve our ability to detect, track, and predict harmful algal blooms, giving water resource managers new tools for HAB surveillance and response.

Author Contributions

D.K. designed the research methodology, developed the Google Earth Engine processing framework, performed the satellite data analysis, conducted field validation, and wrote the original manuscript draft. J.J.Q. supervised the research, provided critical feedback on methodology and analysis, and contributed to manuscript review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data for the ORSANCO monitoring stations can be found at: https://www.orsanco.org/programs/harmful-algal-blooms/ (accessed on 22 September 2025). These data are openly available for download, and the page also provides contact information for the lead of ORSANCO's HAB program. The Google Earth Engine models can be accessed at: https://douglaskaiser11.users.earthengine.app/view/hab-algal-view (accessed on 22 September 2025) and https://ee-douglaskaiser11.projects.earthengine.app/view/hab-ohio-river-timelapse (accessed on 22 September 2025).

Acknowledgments

The authors acknowledge the Ohio River Valley Water Sanitation Commission (ORSANCO) for providing comprehensive HAB monitoring data that enabled validation of satellite-based detection methods. We thank the ORSANCO staff for their continued efforts in water quality monitoring that protect public health across the Ohio River basin. We also acknowledge Google Earth Engine for providing the cloud computing platform that enabled efficient processing of multi-decadal Landsat imagery. The authors appreciate the constructive feedback from anonymous reviewers that strengthened this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation | Definition |
|---|---|
| CDOM | Colored Dissolved Organic Matter |
| CNN | Convolutional Neural Network |
| ECCC | Environment and Climate Change Canada |
| ELISA | Enzyme-Linked Immunosorbent Assay |
| EPA | Environmental Protection Agency |
| FAI | Floating Algae Index |
| HAB | Harmful Algae Bloom |
| HIS | Hyperspectral Imaging |
| L&D | Locks and Dam |
| LC-MS/MS | Liquid Chromatography with tandem Mass Spectrometry |
| LSTM | Long Short-Term Memory |
| MERIS | Medium Resolution Imaging Spectrometer |
| MODIS | Moderate Resolution Imaging Spectroradiometer |
| MSS | Multispectral Scanner |
| NDCI | Normalized Difference Chlorophyll Index |
| NOAA | National Oceanic and Atmospheric Administration |
| OLCI | Ocean and Land Colour Instrument |
| ORM | Ohio River Mile |
| ORSANCO | Ohio River Valley Water Sanitation Commission |
| RM | River Mile |
| Rrs | Remote Sensing Reflectance |
| SPATT | Solid Phase Absorption Toxin Tracking |
| SVM | Support Vector Machine |
| TDI | Toxin Diversity Index |
| UAS | Unmanned Aircraft System |

Appendix A

Appendix A.1

```javascript
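// Assumed helper: C2 Level-2 surface-reflectance scaling (DN * 0.0000275 - 0.2) on SR_B* bands.
var applyScaleFactors = function (img) {
  return img.addBands(img.select('SR_B.').multiply(0.0000275).add(-0.2), null, true);
};
// Assumed helper: mask pixels flagged in QA_PIXEL as cloud (bit 3) or cloud shadow (bit 4).
var maskClouds = function (img) {
  return img.updateMask(img.select('QA_PIXEL').bitwiseAnd((1 << 3) | (1 << 4)).eq(0));
};
var collection = ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
  .filterBounds(studyArea)  // studyArea: river-reach ee.Geometry, assumed defined earlier
  .filterDate('2015-08-15', '2015-09-30')  // end date is exclusive
  .filter(ee.Filter.lt('CLOUD_COVER', 70))
  .map(applyScaleFactors)
  .map(maskClouds);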
```
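Landsat Collection 2 Level-2 products carry a QA_PIXEL band whose bit flags include cloud (bit 3) and cloud shadow (bit 4). A cloud mask such as `maskClouds` typically tests these bits; the plain JavaScript sketch below shows the same bit test on a single QA value. Whether the study's `maskClouds` uses exactly these two bits is an assumption, and `isCloudOrShadow` is a hypothetical name.

```javascript
// Collection 2 QA_PIXEL bit positions used in this sketch.
const CLOUD_BIT = 1 << 3;   // bit 3: cloud
const SHADOW_BIT = 1 << 4;  // bit 4: cloud shadow

// True when the QA value flags the pixel as cloud or cloud shadow,
// i.e., the pixel would be removed by the mask.
function isCloudOrShadow(qaValue) {
  return (qaValue & (CLOUD_BIT | SHADOW_BIT)) !== 0;
}

console.log(isCloudOrShadow(8), isCloudOrShadow(64));
```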

Appendix A.2

```javascript
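// Expects bands renamed to GREEN, RED and NIR (e.g., Landsat 8 SR_B3, SR_B4, SR_B5).
function enhancedWaterMask(image) {
  // NDWI = (GREEN - NIR) / (GREEN + NIR); open water is positive.
  var ndwi = image.normalizedDifference(['GREEN', 'NIR']);
  // Normalized difference turbidity index = (RED - GREEN) / (RED + GREEN).
  var turbidityIndex = image.normalizedDifference(['RED', 'GREEN']);
  // Keep pixels that look like water (NDWI > 0.1) and are not highly turbid (index < 0.2).
  var combinedMask = ndwi.gt(0.1).and(turbidityIndex.lt(0.2));
  return image.updateMask(combinedMask);
}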
```
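Figure 2 summarizes the path from surface reflectance to FAI and NDCI. For reference, the per-pixel sketch below computes both indices as originally defined by Hu [38] and Mishra and Mishra [24]. The default wavelengths are approximate Landsat 8 OLI band centers (red, NIR, SWIR1) and are an assumption. Landsat OLI and ETM+ have no band near 708 nm, and the band substitution used for NDCI in this study is not shown in this section; the function names are hypothetical.

```javascript
// Floating Algae Index (Hu, 2009): NIR reflectance minus a baseline linearly
// interpolated between the red and SWIR bands at the NIR wavelength (nm).
function floatingAlgaeIndex(red, nir, swir, wl = { red: 655, nir: 865, swir: 1609 }) {
  const baseline = red + (swir - red) * (wl.nir - wl.red) / (wl.swir - wl.red);
  return nir - baseline;
}

// Normalized Difference Chlorophyll Index (Mishra & Mishra, 2012):
// (R_redEdge - R_red) / (R_redEdge + R_red), originally at ~708 nm and ~665 nm.
function ndci(redEdge, red) {
  return (redEdge - red) / (redEdge + red);
}

// Toy reflectances for illustration only (not study data).
console.log(floatingAlgaeIndex(0.03, 0.06, 0.02).toFixed(4), ndci(0.05, 0.04).toFixed(4));
```

Positive FAI means the NIR signal sits above the red-SWIR baseline, which is the signature of floating or surface-accumulated algae that Figure 3 maps.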

References

  1. Kislik, C.; Dronova, I.; Grantham, T.E.; Kelly, M. Mapping algal bloom dynamics in small reservoirs using Sentinel-2 imagery in Google Earth Engine. Ecol. Indic. 2022, 140, 109041.
  2. Lobo, F.d.L.; Nagel, G.W.; Maciel, D.A.; Carvalho, L.A.S.d.; Martins, V.S.; Barbosa, C.C.F.; Novo, E.M.L.d.M. AlgaeMAp: Algae Bloom Monitoring Application for Inland Waters in Latin America. Remote Sens. 2021, 13, 2874.
  3. Nietch, C.T.; Gains-Germain, L.; Lazorchak, J.; Keely, S.P.; Youngstrom, G.; Urichich, E.M.; Astifan, B.; DaSilva, A.; Mayfield, H. Development of a Risk Characterization Tool for Harmful Cyanobacteria Blooms on the Ohio River. Water 2022, 14, 644.
  4. Osorio, R.J.; Linhoss, A.; Murdock, J.; Yeager-Armstead, M.; Raju, M. Sensitivity analysis of a hydrodynamic and harmful algal model in a riverine system. Ecol. Model. 2024, 497, 110846.
  5. Ohio River Valley Water Sanitation Commission (ORSANCO). Ohio River Harmful Algal Bloom Monitoring and Response Plan. 2021. Available online: https://www.orsanco.org/wp-content/uploads/2021/02/FINAL-2021-HAB-Monitoring-and-Response-Plan.pdf (accessed on 14 June 2025).
  6. Howard, M.D.A.; Smith, J.; Caron, D.A.; Kudela, R.M.; Loftin, K.; Hayashi, K.; Fadness, R.; Fricke, S.; Kann, J.; Roethler, M.; et al. Integrative monitoring strategy for marine and freshwater harmful algal blooms and toxins across the freshwater-to-marine continuum. Integr. Environ. Assess. Manag. 2023, 19, 586–604.
  7. Ohio River Valley Water Sanitation Commission (ORSANCO). Harmful Algal Blooms. Available online: https://www.orsanco.org/programs/harmful-algal-blooms/ (accessed on 16 November 2024).
  8. Preece, E.P.; Cooke, J.; Plaas, H.; Sabo, A.; Nelson, L.; Paerl, H.W. Managing a Cyanobacteria Harmful Algae Bloom ‘Hotspot’ in the Sacramento–San Joaquin Delta, California. J. Environ. Manag. 2024, 351, 119606.
  9. Igwaran, A.; Kayode, A.J.; Moloantoa, K.M.; Khetsha, Z.P.; Unuofin, J.O. Cyanobacteria Harmful Algae Blooms: Causes, Impacts, and Risk Management. Water Air Soil Pollut. 2024, 235, 71.
  10. Dodds, W.K.; Bouska, W.W.; Eitzmann, J.L.; Pilger, T.J.; Pitts, K.L.; Riley, A.J.; Schloesser, J.T.; Thornbrugh, D.J. Eutrophication of US freshwaters: Analysis of potential economic damages. Environ. Sci. Technol. 2009, 43, 12–19.
  11. Wolf, D.; Klaiber, H.A. Bloom and bust? Water quality valuation in an algae-impacted watershed. Agric. Resour. Econ. Rev. 2017, 46, 306–328.
  12. Bingham, M.; Sinha, S.K.; Lupi, F. Economic Benefits of Reducing Harmful Algal Blooms in Lake Erie; International Joint Commission, Great Lakes Regional Office: Windsor, ON, Canada, 2015.
  13. Stumpf, R.P.; Wynne, T.T.; Baker, D.B.; Fahnenstiel, G.L. Interannual variability of cyanobacterial blooms in Lake Erie. PLoS ONE 2016, 11, e0164479.
  14. Gómez, J.A.D.; Alonso, C.A.; García, A.A. Remote Sensing as a Tool for Monitoring Water Quality Parameters for Mediterranean Lakes of European Union Water Framework Directive (WFD) and as a System of Surveillance of Cyanobacterial Harmful Algae Blooms (SCyanoHABs). Environ. Monit. Assess. 2011, 181, 317–334.
  15. Gorham, T.; Jia, Y.; Shum, C.K.; Lee, J. Ten-year survey of cyanobacterial blooms in Ohio’s waterbodies using satellite remote sensing. Harmful Algae 2017, 66, 13–19.
  16. Torres Palenzuela, J.M.; Vilas, L.G.; Bellas Aláez, F.M.; Pazos, Y. Potential Application of the New Sentinel Satellites for Monitoring of Harmful Algal Blooms in the Galician Aquaculture. Thalass. Rev. Cienc. Mar. 2020, 36, 85–93.
  17. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27.
  18. Kumar, L.; Mutanga, O. Google Earth Engine applications since inception: Usage, trends, and potential. Remote Sens. 2018, 10, 1509.
  19. Amani, M.; Ghorbanian, A.; Ahmadi, S.A.; Kakooei, M.; Moghimi, A.; Mirmazloumi, S.M.; Moghaddam, S.H.A.; Mahdavi, S.; Ghahremanloo, M.; Parsian, S.; et al. Google Earth Engine cloud computing platform for remote sensing big data applications: A comprehensive review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5326–5350.
  20. Tamiminia, H.; Salehi, B.; Mahdianpari, M.; Quackenbush, L.; Adeli, S.; Brisco, B. Google Earth Engine for geo-big data applications: A meta-analysis and systematic review. ISPRS J. Photogramm. Remote Sens. 2020, 164, 152–170.
  21. Janga, B.; Asamani, G.; Sun, Z.; Cristea, N. A Review of Practical AI for Remote Sensing in Earth Sciences. Remote Sens. 2023, 15, 4112.
  22. Mutanga, O.; Kumar, L. Google Earth Engine applications. Remote Sens. 2019, 11, 591.
  23. Zhao, Q.; Yu, L.; Li, X.; Peng, D.; Zhang, Y.; Gong, P. Progress and trends in the application of Google Earth Engine to remote sensing at scale. Remote Sens. 2021, 13, 3778.
  24. Mishra, S.; Mishra, D.R. Normalized difference chlorophyll index: A novel model for remote estimation of chlorophyll-a concentration in turbid productive waters. Remote Sens. Environ. 2012, 117, 394–406.
  25. Oyama, Y.; Fukushima, T.; Matsushita, B.; Matsuzaki, H.; Kamiya, K.; Kobinata, H. Monitoring levels of cyanobacterial blooms using the visual cyanobacteria index (VCI) and floating algae index (FAI). ITC J. 2015, 38, 335–348.
  26. Visitacion, M.R.; Alnin, C.A.; Ferrer, M.R.; Suñiga, L. Detection of Algal Bloom in the Coastal Waters of Boracay, Philippines Using Normalized Difference Vegetation Index (NDVI) and Floating Algae Index (FAI). Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, XLII-4/W19, 479–486.
  27. Adegun, A.A.; Viriri, S.; Tapamo, J.-R. Review of deep learning methods for remote sensing satellite images classification: Experimental survey and comparative analysis. J. Big Data 2023, 10, 93.
  28. Alem, A.; Kumar, S. Deep Learning Methods for Land Cover and Land Use Classification in Remote Sensing: A Review. In Proceedings of the 2020 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 4–5 June 2020; pp. 903–908.
  29. Garg, S.; Jain, S.; Dube, N.; Varghese, N. Earth Observation Data Analytics Using Machine and Deep Learning: Modern Tools, Applications and Challenges, 1st ed.; Institution of Engineering & Technology: London, UK, 2023.
  30. Yuan, X.; Shi, J.; Gu, L. A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Syst. Appl. 2021, 169, 114417.
  31. Hill, P.R.; Kumar, A.; Temimi, M.; Bull, D.R. HABNet: Machine Learning, Remote Sensing Based Detection and Prediction of Harmful Algal Blooms. arXiv 2020, arXiv:1912.02305.
  32. Rawat, K.S.; Sahu, S.R.; Singh, S.K.; Chander, S.; Gujrati, A. Water Quality Analysis Using Normalized Difference Chlorophyll Index (NDCI) and Normalized Difference Turbidity Index (NDTI), Using Google Earth Engine Platform. In Proceedings of the 2023 International Conference on Modeling, Simulation & Intelligent Computing (MoSICom), Dubai, United Arab Emirates, 7–9 December 2023; pp. 408–413.
  33. Jia, T.; Zhang, X.; Dong, R. Long-Term Spatial and Temporal Monitoring of Cyanobacteria Blooms Using MODIS on Google Earth Engine: A Case Study in Taihu Lake. Remote Sens. 2019, 11, 2269.
  34. Zhang, G.; Wu, M.; Wei, J.; He, Y.; Niu, L.; Li, H.; Xu, G. Adaptive Threshold Model in Google Earth Engine: A Case Study of Ulva prolifera Extraction in the South Yellow Sea, China. Remote Sens. 2021, 13, 3240.
  35. Roy, D.P.; Kovalskyy, V.; Zhang, H.K.; Vermote, E.F.; Yan, L.; Kumar, S.S.; Egorov, A. Characterization of Landsat-7 to Landsat-8 reflective wavelength and normalized difference vegetation index continuity. Remote Sens. Environ. 2016, 185, 57–70.
  36. Postma, T.; Martínez-López, J.; Llodrà-Llabrés, J.; Alcaraz-Segura, D.; Pérez-Martínez, C. Dataset of processed Sentinel-2 images for chlorophyll-a estimation in high-altitude lakes in the Sierra Nevada, Spain [Dataset]. Zenodo 2023.
  37. Vanhellemont, Q.; Ruddick, K. Atmospheric correction of metre-scale optical satellite data for inland and coastal water applications. Remote Sens. Environ. 2018, 216, 586–597.
  38. Hu, C. A novel ocean color index to detect floating algae in the global oceans. Remote Sens. Environ. 2009, 113, 2118–2129.
  39. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  40. Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I – A discussion of principles. J. Hydrol. 1970, 10, 282–290.
  41. Moriasi, D.N.; Arnold, J.G.; Van Liew, M.W.; Bingner, R.L.; Harmel, R.D.; Veith, T.L. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans. ASABE 2007, 50, 885–900.
  42. Stroming, S.; Robertson, M.; Mabee, B.; Kuwayama, Y.; Schaeffer, B. Quantifying the Human Health Benefits of Using Satellite Information to Detect Cyanobacterial Harmful Algal Blooms and Manage Recreational Advisories in U.S. Lakes. Geohealth 2020, 4, e2020GH000254.
  43. Höhl, A.; Obadic, I.; Fernández Torres, M.Á.; Najjar, H.; Oliveira, D.; Akata, Z.; Dengel, A.; Zhu, X.X. Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing. arXiv 2024, arXiv:2402.13791.
  44. Klotz, J.; Burgert, T.; Demir, B. On the Effectiveness of Methods and Metrics for Explainable AI in Remote Sensing Image Scene Classification. arXiv 2025, arXiv:2507.05916.
  45. Yılmaz, O.S.; Acar, U.; Sanli, F.B.; Gülgen, F.; Ateş, A.M. Investigation of Water Quality in Izmir Bay with Remote Sensing Techniques Using NDCI on Google Earth Engine Platform. Trans. GIS 2025, 29, 13301.
  46. Lekki, J. Airborne Hyperspectral Sensing of Harmful Algal Blooms in the Great Lakes Region: System Calibration and Validation from Photons to Algae Information: The Processes In-Between; National Aeronautics and Space Administration, Glenn Research Center: Cleveland, OH, USA, 2017.
  47. Visser, F.; Buis, K.; Verschoren, V.; Meire, P. Depth Estimation of Submerged Aquatic Vegetation in Clear Water Streams Using Low-Altitude Optical Remote Sensing. Sensors 2015, 15, 25287–25312.
  48. Mohan, A.; Singh, A.K.; Kumar, B.; Dwivedi, R. Review on remote sensing methods for landslide detection using machine and deep learning. Trans. Emerg. Telecommun. Technol. 2021, 32, e3998.
  49. Goyal, M.K.; Jain, V.; Yadav, U. Monitoring India’s Ramsar Wetlands: Water Quality and Ecosystem Health via Remote Sensing and AI, 1st ed.; Springer Nature: Cham, Switzerland, 2025.
Figure 1. Study area map showing the Ohio River from Pittsburgh, PA to Cairo, IL with ORSANCO monitoring station locations, major dams, and the 2015 HAB-affected reach highlighted in orange.
Figure 2. Spectral index methodology flowchart showing processing steps from surface reflectance to FAI and NDCI calculation.
Figure 3. Landsat 7 ETM+ satellite imagery from 28 August 2015 showing harmful algal bloom detection in the Cincinnati reach of the Ohio River. (A) True color composite. (B) Floating Algae Index with GEE-matching color scale showing bloom detection (orange/red areas, FAI: 0.02–0.04). (C) Normalized Difference Chlorophyll Index showing chlorophyll distribution. (D) Validation point where field-measured microcystin was 12 μg/L on 15 September 2015. Approximately 1.2% of water pixels showed elevated FAI values (0.02–0.04) consistent with bloom presence.
Figure 4. Validation framework schematic showing spatio-temporal matching between satellite pixels and ground samples, with buffer zones and temporal windows illustrated.
Figure 5. (a) 2D representation of the originally reported 30-mile extent of the HAB. (b) 2D representation of the 636.5-mile extent detected through satellite analysis in GEE.
Figure 6. 2D representation overlaying the originally reported extent of the HAB event on the extent detected through satellite analysis.
Figure 7. Temporal coverage advantage of the combined Landsat 7/8 constellation, documenting the consistent 5–7-day early detection capability throughout the bloom event.
Figure 8. Performance of the ensemble approach on the training and testing datasets; the minimal degradation (ΔR² < 0.05) indicates strong generalization without overfitting.
Figure 9. Validation results comparing satellite-derived FAI values with field-measured microcystin concentrations across 127 matchups from both Landsat sensors, showing comparable detection capability (Landsat 8 R² = 0.84, Landsat 7 R² = 0.79) and robust performance across the full range of bloom intensities.
Figure 10. GEE Processing Workflow.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kaiser, D.; Qu, J.J. Detecting Harmful Algae Blooms (HABs) on the Ohio River Using Landsat and Google Earth Engine. Remote Sens. 2025, 17, 4010. https://doi.org/10.3390/rs17244010

AMA Style

Kaiser D, Qu JJ. Detecting Harmful Algae Blooms (HABs) on the Ohio River Using Landsat and Google Earth Engine. Remote Sensing. 2025; 17(24):4010. https://doi.org/10.3390/rs17244010

Chicago/Turabian Style

Kaiser, Douglas, and John J. Qu. 2025. "Detecting Harmful Algae Blooms (HABs) on the Ohio River Using Landsat and Google Earth Engine" Remote Sensing 17, no. 24: 4010. https://doi.org/10.3390/rs17244010

APA Style

Kaiser, D., & Qu, J. J. (2025). Detecting Harmful Algae Blooms (HABs) on the Ohio River Using Landsat and Google Earth Engine. Remote Sensing, 17(24), 4010. https://doi.org/10.3390/rs17244010
