Chlorophyll-a in the Chesapeake Bay Estimated by Extra-Trees Machine Learning Modeling

Nezlin, Nikolay P.; Son, SeungHyun; Salem, Salem I.; Ondrusek, Michael E.

doi:10.3390/rs17132151

by Nikolay P. Nezlin ^1,2, SeungHyun Son ^1,3,*, Salem I. Salem ^4 and Michael E. Ondrusek ^1

^1 NOAA/NESDIS Center for Satellite Applications and Research, 5830 University Research Court, College Park, MD 20740, USA

^2 Global Science & Technology, Inc., 7501 Greenway Center Drive, Suite 1100, Greenbelt, MD 20770, USA

^3 Cooperative Institute for Satellite Earth System Studies (CISESS), Earth System Science Interdisciplinary Center (ESSIC), University of Maryland, 5825 University Research Court, College Park, MD 20742, USA

^4 Plymouth Marine Laboratory, Prospect Place, Plymouth PL1 3DH, UK

^* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(13), 2151; https://doi.org/10.3390/rs17132151

Submission received: 2 April 2025 / Revised: 1 June 2025 / Accepted: 16 June 2025 / Published: 23 June 2025

(This article belongs to the Special Issue Validation and Evaluation of Global Ocean Satellite Products (Second Edition))

Abstract

Monitoring chlorophyll-a concentration (Chl-a) is essential for assessing aquatic ecosystem health, yet its retrieval using remote sensing remains challenging in turbid coastal waters because of the intricate optical characteristics of these environments. Elevated levels of colored (chromophoric) dissolved organic matter (CDOM) and suspended sediments (also known as total suspended solids, TSS) interfere with satellite-based Chl-a estimates, necessitating alternative approaches. One potential solution is machine learning, which implicitly incorporates non-Chl-a signals into the models. In this research, we develop machine learning models to predict Chl-a concentrations in the Chesapeake Bay, one of the largest estuaries on North America’s East Coast. Our approach leverages the Extra-Trees (ET) algorithm, a tree-based ensemble method that offers predictive accuracy comparable to that of other ensemble models, while significantly improving computational efficiency. Using the entire ocean color datasets acquired by the satellite sensors MODIS-Aqua (>20 years) and VIIRS-SNPP (>10 years), we generated long-term Chl-a estimates covering the entire Chesapeake Bay area. The models achieve a multiplicative absolute error of approximately 1.40, demonstrating reliable performance. The predicted spatiotemporal Chl-a patterns align with known ecological processes in the Chesapeake Bay, particularly those influenced by riverine inputs and seasonal variability. This research emphasizes the potential of machine learning to enhance satellite-based water quality monitoring in optically complex coastal waters, providing valuable insights for ecosystem management and conservation.

Keywords:

Chesapeake Bay; chlorophyll-a; satellite remote sensing; machine learning; Extra-Trees

1. Introduction

Surface chlorophyll-a concentration (Chl-a) is one of the most widely used ocean color products derived from Earth-orbiting satellites [1]. Chl-a is an indicator of a water body’s trophic condition [2]. It plays a fundamental role in our understanding of how marine ecosystems may respond to climate variability and climate change [3]. Chl-a is also used for assessments of global trends and regime shifts in aquatic ecosystems [4,5] and for monitoring of ecosystem health due to its direct link to aquatic net primary productivity and biomass [6]. The historical importance and availability of multi-decade records of remotely sensed Chl-a motivated its inclusion in the list of Essential Climate Variables (ECVs), a limited set of parameters deemed critical to the characterization of the planet’s climate [7,8].

In shallow, coastal (Case 2) waters, the remote sensing of water color is challenging because, in contrast to the open ocean, the color signal in nearshore waters is affected by significant inputs of dissolved and suspended substances other than phytoplankton [9,10,11]. The resulting reflectance spectra are confounded by non-phytoplankton constituents, which violates the assumptions made for open ocean (Case 1) waters and leads to systematic biases in retrievals of chlorophyll-a and other water-quality parameters when standard NASA ocean color algorithms are used. Retrieving accurate satellite radiances is especially problematic in narrow estuarine systems such as the Chesapeake Bay because of high water opacity and proximity to sources of urban aerosols, both of which complicate the atmospheric correction process [12,13,14].

Since the Chesapeake Bay is a typical optically complex Case 2 aquatic system, there are strong concerns about atmospheric corrections in the ocean color satellite data. Son and Wang [15] demonstrated that an improved atmospheric correction algorithm combining near-infrared and short-wave infrared (NIR-SWIR) methods can improve the reliability of satellite-derived Chl-a in the Chesapeake Bay by reducing the noise and uncertainty introduced by colored dissolved organic matter (CDOM) and total suspended solids (TSS). While the existing Chl-a (OCx) algorithm worked properly in the relatively clear Lower Bay, considerable uncertainties in Chl-a optical algorithms remained in the very turbid Middle and Upper Bays.

Considerable efforts have been focused on the development of methods for the detection of water quality properties in turbid nearshore waters, including regionally tuned algorithms for the Chesapeake Bay [12,16,17,18,19,20,21,22,23]. Several empirical and semi-analytical models have been tailored for that area. Early blue-to-green band-ratio models (e.g., OC4 and OC3) were regionally tuned for SeaWiFS and MODIS-Aqua by Harding et al. [24] and Werdell et al. [12], but their accuracy suffered in Case 2 waters with high concentrations of CDOM and TSS [23]. To address this, approaches were introduced that employ wavebands in the red and NIR regions of the spectrum (650–800 nm), which are less sensitive than traditional blue-to-green (440–550 nm) ratio algorithms to absorption by CDOM and scattering by non-algal particles [12,16,22,25]. Tzortziou et al. [18] showed that the remote sensing reflectance ratio R_rs(677)/R_rs(554) outperforms blue-to-green ratios in mid-Bay waters, and Gitelson et al. [16] and Gilerson et al. [22] developed two- and three-band red/NIR algorithms. These empirical algorithms, while reasonably successful in specific tests, are less robust in Chesapeake Bay’s highly variable, optically complex waters, and they often need frequent re-calibration for different seasons or sub-regions as inherent optical properties shift.

In the absence of reliable algorithms predicting Chl-a in optically complex (Case 2) waters, it seems reasonable to use machine learning (ML) technology for this purpose. The advantage of ML over other methods of numerical prediction is that ML can simulate potential complex nonlinear relationships in the absence of prior knowledge of the internal structure and functioning of the analyzed object. Recent advancements in ML techniques have led to the emergence of ML models designed to estimate Chl-a from ocean color imagery across global and regional contexts [26,27,28,29,30,31].

Many ML models used for the analysis of water color satellite imagery have utilized a neural network (deep learning) approach [26,27,29,32,33], which works well for prediction but has some limitations. Specifically, neural network models are very time-consuming and computationally intensive and require high volumes of data to be effective [34,35]. An alternative approach is tree-based ensemble methods, such as Random Forests models [36], which are quick and easy to train and tune based on their hyperparameters and require much less input preparation [34,35]. Random Forests models are less prone to overfitting because they combine the predictions of many decision trees into a single ensemble model [37]. These attractive features motivated us to use a variant of tree-based ensemble ML models called “Extra-Trees” (ET) [38] for the prediction of Chl-a in the Chesapeake Bay from ocean color satellite imagery collected by two satellite sensors: Moderate Resolution Imaging Spectroradiometer onboard the Aqua platform (MODIS-Aqua) [39] operated by the National Aeronautics and Space Administration (NASA) and Visible Infrared Imaging Radiometer Suite on the Suomi National Polar-orbiting Partnership platform (VIIRS-SNPP) [40] operated by National Oceanic and Atmospheric Administration (NOAA). MODIS-Aqua acquired the lengthiest time series of ocean color imagery (2002–2022); VIIRS-SNPP (launched in 2012) represents the first of the three VIIRS sensors (on the SNPP, NOAA-20, and NOAA-21 platforms, with additional launches of VIIRS planned), which are anticipated to deliver reliable and consistent multi-spectral ocean color imagery for the next decade and into the future.

In this research, we aim to (1) develop machine learning models predicting Chl-a in the Chesapeake Bay from R_rs data collected from in situ measurements and satellite-based sensors on MODIS-Aqua and VIIRS-SNPP; (2) assess the accuracy of these models using in situ Chl-a measurements; (3) using these models, generate time series of Chl-a and composite maps; and (4) analyze spatial and temporal variations of Chl-a in the Chesapeake Bay and recognize the factors affecting these fluctuations.

2. Materials and Methods

2.1. Study Area: Chesapeake Bay

Chesapeake Bay, one of the largest estuaries in North America, is located in a highly populated and economically important nearshore area. Consequently, the bay undergoes regular water quality monitoring, including collaborative efforts involving federal, state, academic, and community organizations and resulting in a rich collection of water quality measurements.

Chesapeake Bay (Figure 1a,b) extends from north to south for approximately 300 km. It varies in width from 8 to 48 km, with an average depth of 8 m and a narrow (1 to 4 km) main stem trench up to 50 m deep [41]. The upper part of the bay is river-dominated, with most of the freshwater input (about 60% of total discharge) coming from the Susquehanna River, the mouth of which is located at the head of the bay in the north (Figure 1b). The entry to the Chesapeake Bay in the south is a 16 km wide strait, and its shallow underwater topography limits water exchange with the Mid-Atlantic Bight. As a result, the bay’s hydrological residence time is as long as 180 days, even in the ocean-dominated lower part of the estuary [42]. In comparison to other estuaries, tidal forcing in the Chesapeake Bay is weak, with a tidal range rarely exceeding 1 m [43,44], and it has little effect on Chl-a concentration even in the southern part of the bay [45].

Phytoplankton biomass in the Chesapeake Bay is high, resulting from long-term eutrophication attributable to riverine nutrient inputs [46,47,48,49,50]. Chesapeake Bay’s watershed extends over a wide area (164,200 km²) and has a lengthy (18,800 km) dendritic shoreline [49] resulting in substantial loads of nutrients fueling primary production. Because of restricted water exchange through the bay entrance, phytoplankton biomass is accumulated within the bay. In contrast to many other estuarine systems, Chesapeake Bay does not resemble a pipe transmitting nutrients directly to the nearby ocean [51,52,53].

The physical and ecological features of the bay are characterized by pronounced north–south gradients. Salinity increases from the north southward with distance from the Susquehanna River mouth [54]. On this basis, some authors distinguish three regions in the Chesapeake Bay (Upper, Middle, and Lower) associated with oligohaline, mesohaline, and polyhaline salinity zones [17,49,55]. Water in the upper part of the estuary and in branch rivers contains increased concentrations of dispersed particulate matter and CDOM [15,54], which, in turn, increases its optical complexity [12].

2.2. Field Data

The machine learning models predicting Chl-a in the Chesapeake Bay from R_rs were trained and tested on a dataset compiled from two sources: (1) pairs of R_rs and Chl-a measured in situ, and (2) in situ Chl-a paired with satellite-based R_rs collected over the same location during the same day.

Synchronized in situ measurements of R_rs and Chl-a were retrieved from the database of Sea-viewing Wide Field-of-view Sensor (SeaWiFS) Bio-optical Archive and Storage System (SeaBASS) NASA bio-Optical Marine Algorithm Dataset (NOMAD) (https://seabass.gsfc.nasa.gov/wiki/NOMAD, accessed on 1 April 2025) [56]. A total of 399 observations were collected within the Chesapeake Bay (Figure 1b) during 1996–2016. The dataset includes Chl-a in the surface layer measured following the standard NASA protocols [57], in situ R_rs at wavelengths 411, 443, 489, 555, and 670 nm, and Chl-a calculated from these R_rs using the NASA standard OC3 algorithms [58,59].

In situ surface measurements of Chl-a synchronized to satellite imagery were obtained from the SeaBASS database (collected 2002–2016) and from the DataHub of the Chesapeake Bay Program (CBP) (https://datahub.chesapeakebay.net, accessed on 22 July 2024). The latter dataset was collected within the scope of the Chesapeake Bay Water Quality program during 2002–2022 at 86 stations (Figure 1b). The data were collected during summer at biweekly intervals and monthly during other seasons. In situ measurements of Chl-a were sampled at the depths 0–1 m and processed using a monochromatic method with correction for pheophytin, meeting the requirements of the American Society for Testing and Materials (ASTM). The procedure incorporates filtration of water samples through a GF/F filter, acetone extraction, centrifuging, and spectrophotometry. The details can be found in the reports [60,61].

As Chl-a tends to be log-normally distributed [62], all Chl-a data were log10-transformed before numerical operations (including both the training of machine learning models and statistical analysis), and the outputs were inverse-transformed.
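The transform described above can be illustrated with a minimal sketch. The function names `to_log10` and `from_log10` and the sample values are hypothetical and serve only to show that averaging in log space and inverse-transforming yields a geometric mean rather than an arithmetic one.

```python
import numpy as np

def to_log10(chl):
    """Log10-transform Chl-a values; inputs must be strictly positive."""
    chl = np.asarray(chl, dtype=float)
    if np.any(chl <= 0):
        raise ValueError("Chl-a must be positive before the log10 transform")
    return np.log10(chl)

def from_log10(log_chl):
    """Inverse transform back to concentration units."""
    return 10.0 ** np.asarray(log_chl, dtype=float)

# Hypothetical Chl-a values (µg/L) spanning almost two orders of magnitude
chl = np.array([0.5, 2.0, 8.0, 32.0])
log_chl = to_log10(chl)

# Averaging in log space and inverting gives the geometric mean
geometric_mean = float(from_log10(log_chl.mean()))
arithmetic_mean = float(chl.mean())
print(geometric_mean, arithmetic_mean)
```

For a right-skewed variable such as Chl-a, the geometric mean is less affected by a few very high values than the arithmetic mean.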

Daily streamflow measurements of the Susquehanna River discharge at station 01578310 in Conowingo, MD, USA (Figure 1b) were retrieved from the US Geological Survey online database [63] and averaged to monthly means for 2000–2023.

2.3. Satellite Data

Ocean color measurements collected by MODIS-Aqua and VIIRS-SNPP satellite sensors were utilized to train and test the machine learning models predicting Chl-a in the Chesapeake Bay. Ocean color parameters, i.e., remote sensing reflectance at multiple wavelengths, R_rs(λ), and the Chl-a concentrations generated using the NASA standard OC3 algorithms [58,59] (hereafter OC3 Chl-a), were acquired from the NASA/Goddard Space Flight Center (GSFC) Ocean Biology Processing Group (OBPG) (http://oceancolor.gsfc.nasa.gov/, accessed on 1 April 2025). These data include all available Level 2 (the latest reprocessing R2022.0) MODIS-Aqua data from July 2002 to December 2022 and VIIRS-SNPP data from January 2012 to December 2023 passing over the Chesapeake Bay. For MODIS-Aqua, the central wavelengths of acquired measurements included 412, 443, 488, 531, 547, and 667 nm; for VIIRS-SNPP, the bands included 410, 443, 486, 551, and 671 nm. The details of Chl-a algorithms can be found at https://www.earthdata.nasa.gov/apt/documents/chlor-a/v1.0, accessed on 1 April 2025.

All MODIS-Aqua and VIIRS-SNPP Level 2 composites were remapped with a cylindrical projection at 0.75 × 0.75-km spatial resolution for the Chesapeake Bay (36.8–39.6°N, 77.0–75.5°W) after applying the following Level 2 flags: high solar and sensor zenith angles and high sun glint. To train and test the models, all non-masked pixels within circles of 10 km radius centered at the locations of in situ data collection were extracted from the MODIS-Aqua and VIIRS-SNPP daily remapped images collected on the same day, and their median was taken as the matchup value. These parameters of matchup selection (10 km radius and the same-day time window) differ from the conventionally used 5 × 5 pixels and ±3 h [64]; they were chosen to increase the number of matchups at the expense of some spatial and temporal smoothing. The total number of cloud-free matchups was 4729 for MODIS-Aqua and 3104 for VIIRS-SNPP.
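The matchup extraction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the use of the haversine great-circle distance, and the synthetic grid are all hypothetical, and masked pixels are assumed to be stored as NaN.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat0, lon0, lat, lon):
    """Great-circle distance (km) from one point to an array of points."""
    lat0, lon0, lat, lon = map(np.radians, (lat0, lon0, lat, lon))
    a = (np.sin((lat - lat0) / 2) ** 2
         + np.cos(lat0) * np.cos(lat) * np.sin((lon - lon0) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def matchup_median(rrs, lat_grid, lon_grid, st_lat, st_lon, radius_km=10.0):
    """Median of non-masked (finite) pixels within radius_km of a station.

    rrs, lat_grid, lon_grid are 2-D arrays of the same shape; masked pixels
    are assumed to be NaN. Returns NaN when no valid pixel is in range.
    """
    dist = haversine_km(st_lat, st_lon, lat_grid, lon_grid)
    inside = (dist <= radius_km) & np.isfinite(rrs)
    if not inside.any():
        return float("nan")
    return float(np.median(rrs[inside]))

# Synthetic daily image on a regular grid (hypothetical values)
lats = np.arange(38.0, 38.5, 0.01)
lons = np.arange(-76.5, -76.0, 0.01)
lon_grid, lat_grid = np.meshgrid(lons, lats)
rrs = np.full(lat_grid.shape, 0.004)
rrs[20:25, 20:25] = np.nan            # a small masked (e.g., cloudy) patch

near = matchup_median(rrs, lat_grid, lon_grid, 38.25, -76.25)
far = matchup_median(rrs, lat_grid, lon_grid, 30.0, -70.0)
print(near, far)
```

Using the median rather than the mean keeps a few anomalous pixels inside the circle from dominating the matchup value.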

To investigate the influence of bottom topography and distance offshore on the accuracy of the models, the bathymetry data from the SeaDAS (SeaWiFS Data Analysis System) software package provided by NASA OBPG were used. For each station where in situ Chl-a was measured, the nearest pixel in the bathymetry file was found and used as the bottom depth at that location. The distance offshore was calculated for each station as the distance (in km) to the nearest pixel with zero depth.

2.4. Extra-Trees Machine Learning

Extra-Trees (ET, also known as Extremely Randomized Trees) belong to the ensemble learning approach, which enhances predictive performance by combining the predictions from multiple models. The most popular algorithms based on this approach are Random Forest (RF) models [36]. An RF model is an ensemble of multiple tree predictors, where every individual tree is constructed using a bootstrap approach (sampling with replacement). During tree construction, a random subset of features from the training dataset is considered for splitting, and the optimal split is chosen at each node. After obtaining results from all the trees, the final prediction is obtained through averaging in the case of a regression model or majority voting in the case of a classification model.

Unlike RF, ET [38] introduces more randomness because split points are selected randomly, rather than by searching for the optimal cut-point in the dataset. Additionally, ET typically uses the entire dataset (without bootstrapping) for training each tree. Thus, in ET the candidate split points are drawn at random for each considered feature rather than optimized against the target values [65]. Since node splits in ET are random, it works significantly faster than RF [66]. Training ET models is a fast process that usually requires little or no tuning to achieve optimal performance [67]. Several studies have demonstrated an advantage in the performance of ET models as compared to RF and other ML methods [68,69,70,71].

ET demonstrates reduced variance as compared with other ensemble learning algorithms due to its highly randomized splitting of nodes, which decreases the correlation between individual trees and ensures that the algorithm is not heavily influenced by certain features or patterns in the dataset [38]. ET maintains higher performance in the presence of noisy features and performs consistently better than other ensemble learning algorithms when there are a few relevant predictors and many noisy ones [66,71].
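The two structural differences between RF and ET discussed above (bootstrapping and randomized split points) are reflected in the defaults of the corresponding scikit-learn estimators. The following sketch uses purely synthetic data with two informative and four noisy features; the data, the ensemble size, and the variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

# Synthetic regression problem: two informative features, four noisy ones
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 6))
y = np.sin(4 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.05, size=300)

et = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# By default, ET trains every tree on the full sample (no bootstrap),
# whereas RF draws a bootstrap sample for every tree.
print("ET bootstrap:", et.bootstrap)
print("RF bootstrap:", rf.bootstrap)

# Because each fully grown ET tree sees all training samples, the ensemble
# reproduces the training targets almost exactly; generalization must
# therefore be judged on held-out data only.
print("ET training R2:", et.score(X, y))
```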

To implement the models in this study, Python’s ‘scikit-learn’ package [72] was used. The models for predicting Chl-a (output variable) were created and trained using R_rs as input variables (features) from three data sources, with or without a group of four additional features characterizing the location and season of sampling (latitude, day of year, water column depth, and distance offshore), resulting in a total of six models. The first dataset comprised in situ Chl-a and R_rs at five wavelengths (411, 443, 489, 555, and 670 nm) (399 samples). The second dataset consisted of in situ Chl-a measurements and MODIS-Aqua imagery matchup R_rs at six wavelengths (412, 443, 488, 531, 547, and 667 nm) (4729 samples), while the third dataset involved in situ Chl-a measurements and satellite R_rs matchups from VIIRS-SNPP imagery at five wavelengths (410, 443, 486, 551, and 671 nm) (3104 samples).

By using the aforementioned lists of input parameters, we imply that non-phytoplankton constituents affecting the optical signal (CDOM, TSS, and atmospheric effects) are treated by the models as noise. Including variables such as CDOM and TSS in the models is an attractive idea, but the lack of in situ measurements of these constituents prevents us from exploring this approach.

The models were created and trained as follows. Each input dataset was randomly partitioned into two subgroups: 80% used in the process of model training and 20% for final testing. The testing subset was kept entirely separate during training to ensure an unbiased evaluation of the model’s performance and to prevent overfitting. For each of the six models, 100 model instances were trained independently. For each instance, the training dataset was randomly split into 80% (64% of the initial dataset) for training and 20% (16% of the initial dataset) for validation. Each model instance was evaluated (using the validation subset and minimum mean absolute error metric; see Section 2.5) and the best instance was selected. For each of the models, the number of decision trees was set to 1000; other hyperparameters were left at their default values. Experiments with tuning hyperparameters (the number of features to select randomly, the minimum number of instances needed to divide a node, etc.) did not improve the accuracy of the ET models (not shown).
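The training procedure above can be sketched with scikit-learn as follows. This is a minimal illustration rather than the authors' code: the function names, the seeding scheme, and the synthetic data are assumptions, and the demonstration call uses far fewer instances and trees than the 100 and 1000 stated in the text so that it runs quickly.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split

def log_mae(y_true_log, y_pred_log):
    """Multiplicative MAE computed from log10-transformed values."""
    return float(10 ** np.mean(np.abs(y_pred_log - y_true_log)))

def train_best_et(X, y_log, n_instances=100, n_trees=1000, seed=0):
    # Outer split: 80% for training/validation, 20% held out for final testing
    X_dev, X_test, y_dev, y_test = train_test_split(
        X, y_log, test_size=0.2, random_state=seed)
    best_model, best_err = None, np.inf
    for i in range(n_instances):
        # Inner split: 64% of all data for training, 16% for validation
        X_tr, X_val, y_tr, y_val = train_test_split(
            X_dev, y_dev, test_size=0.2, random_state=seed + 1 + i)
        model = ExtraTreesRegressor(n_estimators=n_trees,
                                    random_state=seed + 1 + i)
        model.fit(X_tr, y_tr)
        err = log_mae(y_val, model.predict(X_val))
        if err < best_err:                 # keep the instance with minimum MAE
            best_model, best_err = model, err
    # The held-out test subset is used only once, for the selected instance
    test_err = log_mae(y_test, best_model.predict(X_test))
    return best_model, best_err, test_err

# Synthetic stand-in for R_rs features and log10(Chl-a) (hypothetical data)
rng = np.random.default_rng(42)
X = rng.uniform(size=(500, 5))
y_log = 0.5 + 1.5 * X[:, 2] - 1.0 * X[:, 4] + rng.normal(scale=0.1, size=500)

model, val_mae, test_mae = train_best_et(X, y_log, n_instances=5, n_trees=20)
print(val_mae, test_mae)
```

Selecting the best of many instances on the validation subset makes the validation score optimistic, which is why the separate test subset is needed for the final accuracy estimate.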

For all six resulting models, the contribution of each input feature (R_rs at different bands with/without additional features characterizing location and season) was assessed, as shown in Section 3.1, using the permutation feature importance method via the ‘permutation_importance’ function provided by the ‘scikit-learn’ Python package. The approach is based on randomly shuffling the values of a single feature and evaluating the resulting degradation of the model’s score.
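A minimal sketch of the permutation approach with the ‘permutation_importance’ function is given below. The synthetic data, the generic feature names (`f0` to `f4`), and the choice to evaluate importance on a held-out subset are assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: the target depends strongly on f0, weakly on f2,
# and not at all on the remaining features (hypothetical setup)
rng = np.random.default_rng(1)
names = ["f0", "f1", "f2", "f3", "f4"]
X = rng.uniform(size=(400, len(names)))
y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.05, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)
model = ExtraTreesRegressor(n_estimators=100, random_state=1).fit(X_tr, y_tr)

# Each feature is shuffled n_repeats times; the drop in the model score
# (R2 by default for regressors) measures that feature's importance
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=1)

order = np.argsort(result.importances_mean)[::-1]
ranking = [names[i] for i in order]
for i in order:
    print(f"{names[i]}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```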

2.5. Accuracy Metrics and Statistical Methods of Data Analysis

To assess the accuracy of model-derived Chl-a, we used multiplicative mean bias (MMB) and mean absolute multiplicative error (MAE), both calculated from log10-transformed Chl-a, i.e., the metrics based on simple (i.e., not squared) deviations and recommended for non-Gaussian distributions with outliers [73,74,75]. Multiplicative error metrics are most effective for distributions where errors vary in proportion to the magnitude, as is the case of Chl-a analyzed in this study.

The multiplicative mean bias (MMB) measures the systematic dissimilarity or the difference between the model-predicted value (M; in this instance, the Chl-a predicted by the model) and the reference (observed) value (R, in this instance, the in situ Chl-a), which is calculated as follows:

\mathrm{MMB} = 10^{\frac{1}{n} \sum_{i=1}^{n} \left( \log_{10}(M_i) - \log_{10}(R_i) \right)},

(1)

where n is the number of samples. MMB is unitless. The nearer MMB is to one, the smaller the difference between M and R. For example, MMB = 1.1 indicates an M that overestimates R by ~10%, and MMB = 0.9 indicates an M that underestimates R by ~10% (equivalently, R exceeds M by ~11%).

The mean absolute error (MAE) quantifies the error size, demonstrating the absolute difference between the actual and predicted quantities:

\mathrm{MAE} = 10^{\frac{1}{n} \sum_{i=1}^{n} \left| \log_{10}(M_i) - \log_{10}(R_i) \right|}

(2)

Like MMB, MAE is multiplicative and unitless; therefore, MAE = 1.1 indicates that the error of M is ~10% of R.
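Equations (1) and (2) translate directly into a few lines of NumPy. In the sketch below, the function names and the three sample values are hypothetical; the example is chosen so that an overestimate and an underestimate by the same factor cancel in MMB but not in MAE.

```python
import numpy as np

def mmb(model, ref):
    """Multiplicative mean bias, Equation (1)."""
    diff = np.log10(np.asarray(model, float)) - np.log10(np.asarray(ref, float))
    return float(10 ** np.mean(diff))

def mae(model, ref):
    """Multiplicative mean absolute error, Equation (2)."""
    diff = np.log10(np.asarray(model, float)) - np.log10(np.asarray(ref, float))
    return float(10 ** np.mean(np.abs(diff)))

# Hypothetical values: one sample overestimated by a factor of 2,
# one predicted exactly, one underestimated by a factor of 2
ref = np.array([1.0, 10.0, 100.0])
model = np.array([2.0, 10.0, 50.0])

bias = mmb(model, ref)    # the two factor-of-2 errors cancel: 1.0
error = mae(model, ref)   # 2 ** (2 / 3), about 1.59
print(bias, error)
```

An MMB of one therefore does not by itself imply accurate predictions, which is why both metrics are reported together.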

To assess the slope of the linear relationship between the in situ and modeled Chl-a, we used Type 2 regression from the Python package ‘pylr2’. Type 2 regression [76,77], instead of minimizing the vertical deviations of the data points from the linear fit (as in Type 1 regression), minimizes their orthogonal deviations from the linear fit. Type 1 regression assumes that the independent variable (in this case, in situ Chl-a) is measured without errors, whereas in practice, in situ measurements are also affected by uncertainties that are difficult to quantify, which is typical for ocean color data [78]. Using Type 2 regression results in slopes closer to one, as compared with Type 1 regression, which helps to avoid erroneous conclusions about the shape of the relationship between the two variables.
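The difference between the two regression types can be illustrated with a short NumPy sketch. Note that this is not the ‘pylr2’ implementation: the function below is a plain major-axis (orthogonal) fit written for illustration, the function name and synthetic data are hypothetical, and which Type 2 variant was configured in ‘pylr2’ is not stated here.

```python
import numpy as np

def major_axis_fit(x, y):
    """Line minimizing orthogonal distances (major-axis, Type 2 regression)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]       # must be non-zero
    slope = (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = y.mean() - slope * x.mean()
    return float(slope), float(intercept)

def ols_fit(x, y):
    """Ordinary least squares (Type 1): minimizes vertical distances only."""
    slope, intercept = np.polyfit(x, y, 1)
    return float(slope), float(intercept)

# Synthetic example: true slope of 1, equal noise in BOTH variables
rng = np.random.default_rng(0)
truth = rng.normal(size=2000)
x = truth + rng.normal(scale=0.5, size=2000)   # noisy "in situ" values
y = truth + rng.normal(scale=0.5, size=2000)   # noisy "modeled" values

ma_slope, _ = major_axis_fit(x, y)
ols_slope, _ = ols_fit(x, y)
print(ma_slope, ols_slope)   # OLS slope is attenuated toward zero
```

With errors in the independent variable, the Type 1 slope is biased low, whereas the orthogonal fit recovers a slope close to the true value when the error variances are comparable.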

Another metric evaluating the accuracy of Chl-a predicted by different models (ET vs. OC3) is the ratio between the model-predicted and the actual (measured) Chl-a in each sample [79]. Comparing the statistics of these ratios (arithmetic mean, standard deviation, median, and interquartile range) reveals the levels of uncertainties in these predictions.

To facilitate comparison between the results of ET modeling and the assessments of the performance of other models made in previous publications, we also utilized the metrics traditionally used by the ocean color community, such as the coefficient of determination (R²) and the mean square deviation/error (MSE).

For Chl-a predicted by ET models, the metrics were calculated from the test (20%) subsets. For accurate comparison of the results of ET modeling to the accuracy of OC3 Chl-a extracted from Level 2 satellite data, the metrics for OC3 Chl-a were calculated from the same test (20%) subsets and from the entire datasets.

For analysis of the spatiotemporal variability of predicted Chl-a, all remapped satellite data were used to generate monthly composites of R_rs for each satellite. These data were used as input datasets for ET models predicting Chl-a in 246 monthly composites for MODIS-Aqua and 144 monthly composites for VIIRS-SNPP. The pixels where the total number of valid data for the entire satellite sampling period was lower than 200 for MODIS-Aqua and 100 for VIIRS-SNPP were masked as “no data” and not used in statistical analysis. Missing Chl-a in the pixels with the number of valid data above the aforementioned thresholds (1.84% of pixels for MODIS-Aqua and 2.32% of pixels for VIIRS-SNPP) were reconstructed through the Data INterpolating Empirical Orthogonal Functions (DINEOF), a self-consistent and parameter-free statistical method reconstructing missing information in geophysical datasets [80,81,82]. The resulting Chl-a data were aggregated as medians in the 0.75-km latitudinal zones (the rows of the grids) and used for analysis of long-standing variations and time-lagged correlations of Chl-a with Susquehanna River discharge.
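Two of the steps described above, masking pixels with too few valid observations and aggregating Chl-a as medians along grid rows, can be sketched with NumPy. The function name, the array layout (months × rows × columns), and the toy values are assumptions; the DINEOF gap-filling step is not reproduced here.

```python
import warnings
import numpy as np

def zonal_medians(chl_cube, valid_count, min_valid):
    """Row-wise (latitudinal) medians of monthly Chl-a composites.

    chl_cube:    array (n_months, n_rows, n_cols), NaN where data are missing
    valid_count: array (n_rows, n_cols), number of valid observations per
                 pixel over the whole sampling period
    min_valid:   pixels with fewer valid observations are set to "no data"
    """
    chl_cube = np.asarray(chl_cube, dtype=float)
    keep = np.asarray(valid_count) >= min_valid
    masked = np.where(keep[None, :, :], chl_cube, np.nan)
    with warnings.catch_warnings():
        # Rows consisting only of masked pixels yield NaN (with a warning)
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmedian(masked, axis=2)   # shape (n_months, n_rows)

# Toy grid: 3 months, 2 rows, 4 columns (hypothetical values)
base = np.array([[1.0, 2.0, 100.0, 3.0],
                 [5.0, 5.0, 5.0, 5.0]])
cube = np.stack([base + m for m in range(3)])
counts = np.array([[300, 300, 50, 300],     # third pixel is poorly sampled
                   [10, 10, 10, 10]])       # entire row is poorly sampled

zonal = zonal_medians(cube, counts, min_valid=200)
print(zonal)
```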

To reveal long-term variations from the time series of river flow and Chl-a, we removed seasonal variability using the Seasonal-Trend decomposition using LOESS (STL) technique [83]. This filtering method splits a time series into three components—trend, seasonal, and residual—by extracting smoothed estimates of each component using a LOcally Estimated Scatterplot Smoothing (LOESS) method based on local polynomial regressions. A salient feature of STL is that the amplitude in the seasonal component for a given period (e.g., annual) is variable, in contrast to seasonal decomposition methods based on constant seasonal amplitude. As a result, STL helps capture a more significant portion of total variance [84]. LOESS uses “a robust fitting procedure that guards against deviant points distorting the smoothed points” [85]. In each time series, the smoothing factor (the fraction of the data used when estimating each value) was set to default value (2/3 of the entire time series).

Time-lagged cross-correlations between “deseasonalized” (trend plus residuals) time series of Chl-a and Susquehanna River discharge were calculated as Pearson coefficients, with river discharge leading Chl-a by a time lag of 0 to 12 months. We did not assess significance levels directly because all examined time series exhibited strong autocorrelation, which reduces the effective number of independent observations and makes standard significance tests unreliable.
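The lagged correlation analysis can be sketched as follows, assuming two equally long monthly series that have already been deseasonalized. The function name and the synthetic series (in which the response follows the forcing by exactly three months) are hypothetical.

```python
import numpy as np

def lagged_correlations(discharge, chl, max_lag=12):
    """Pearson r between discharge at month t and Chl-a at month t + lag."""
    discharge = np.asarray(discharge, dtype=float)
    chl = np.asarray(chl, dtype=float)
    if discharge.shape != chl.shape:
        raise ValueError("series must have the same length")
    result = {}
    for lag in range(max_lag + 1):
        x = discharge[: len(discharge) - lag]   # discharge leads
        y = chl[lag:]                           # Chl-a follows by `lag` months
        result[lag] = float(np.corrcoef(x, y)[0, 1])
    return result

# Synthetic 20-year monthly series: Chl-a responds to discharge after 3 months
rng = np.random.default_rng(7)
discharge = rng.normal(size=240)
chl = np.roll(discharge, 3) + rng.normal(scale=0.1, size=240)

corr = lagged_correlations(discharge, chl)
best_lag = max(corr, key=lambda k: abs(corr[k]))
print(best_lag, round(corr[best_lag], 3))
```

Taking the lag with the largest absolute correlation allows both negative (flushing) and positive (growth) responses to be identified with the same procedure.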

3. Results

3.1. Accuracy of Extra-Trees Machine Learning Models Predicting Chl-a in the Chesapeake Bay

A comparison between Chl-a generated by the ET machine learning models, the standard OC3 algorithms, and the in situ observations in the Chesapeake Bay demonstrates a significant improvement in the predictive power of ET models compared to OC3 algorithms (Table 1; Figure 2). For both satellites (MODIS-Aqua and VIIRS-SNPP), the error of Chl-a predicted by ET is close to 40% and the multiplicative bias is close to one (MAE = 1.40–1.41; MMB = 0.99; Table 1; Figure 2e,f). The coefficients of determination (R²) of ET models varied within 0.475–0.581, while the same metric for OC3 algorithms was much lower and often below zero (−1.251 to 0.219), indicating that OC3 predictions in these coastal waters can be less accurate than simply using the mean of the observations. The mean squared errors (MSE) of ET models were low (0.036–0.069) as compared to OC3 algorithms (0.077–0.190). At the same time, the slopes of all Type 2 regressions for ET models were less than one (0.639–0.756), in contrast to OC3 algorithms (0.711–1.237).

The means and medians of the ratios between Chl-a predicted by ET models and Chl-a measured in situ are close to one for all three datasets (in situ R_rs, MODIS-Aqua, and VIIRS-SNPP; Table 2), indicating that the prediction of Chl-a by ET model is well balanced, i.e., the chances of overestimation are close to the chances of underestimation. Similar metrics for OC3 Chl-a indicate a higher level of uncertainty, with means varying between 0.941 and 1.375 and medians varying between 0.885 and 1.241 (Table 2). The metrics of variability (standard deviation and interquartile range) for Chl-a predicted by ET models are also substantially lower than for standard OC3 Chl-a, with a standard deviation of 0.286–1.195 for ET models vs. 0.314–1.443 for OC3 and an interquartile range of 0.217–0.245 for ET models vs. 0.338–0.574 for OC3 (Table 2). It is noteworthy that nonparametric statistics (median and interquartile range) are more consistent compared to the statistical metrics based on a normal distribution (mean and standard deviation). The latter assessments vary between different datasets because they are strongly affected by a small number of outliers.

The test of feature importance in the ET models reveals two wavelengths that play a primary role in predicting Chl-a from R_rs: blue–green (488 nm for MODIS-Aqua and 486 nm for VIIRS-SNPP) and red (667 nm for MODIS-Aqua and 671 nm for VIIRS-SNPP) (Figure 3a–c). Including additional features such as latitude (distance from the river mouth), day of year (season), depth, and distance offshore in the ET model only marginally improved the accuracy for satellite data (MAE = 1.36–1.37; MMB = 0.99–1.00; Table 1) and slightly decreased the accuracy for in situ data (MAE = 1.55; MMB = 1.03; Table 1). The tests of feature importance demonstrate that the role of latitude and season is comparable to the role of R_rs (Figure 3d–f), while the role of depth and distance offshore is negligible (Figure 3). Including or excluding depth and distance offshore did not affect the accuracy of the ET models (not shown).

The improvement in the accuracy of ET models predicting Chl-a from a combination of R_rs and position features (latitude and season) was small as compared with the models using R_rs only, indicating a minor role of position features. We therefore calculated monthly Chl-a composites using ET models based solely on R_rs.

Chl-a composites generated using ET machine learning models (Figure 4d,e; monthly climatologies in Supplementary Figures S1 and S2) demonstrate the basic spatial features of Chl-a known from previous studies. The zone of highest Chl-a (>10 µg L⁻¹) is located between 38.5°N and 39.2°N and upstream in the tributaries (e.g., Patuxent, Potomac, and Rappahannock Rivers). To the north of 39.2°N and up to the mouth of the Susquehanna River, Chl-a is substantially lower. To the south of 38.5°N, Chl-a gradually decreases, with higher Chl-a concentrations along the western part of the Bay.

The spatial distribution of the satellite-derived Chl-a generated using OC3 algorithms (Figure 4b,c) is somewhat different from that on the in situ Chl-a map (Figure 4a). Specifically, the OC3 Chl-a is significantly overestimated over most of the study area. Particularly, OC3 Chl-a concentrations in the northern part of the Bay (to the north of 39.2°N) and in the James River are high, while those on the in situ Chl-a map are much lower. At the same time, the satellite-derived Chl-a generated using ET models is quantitatively and spatially quite consistent with the in situ Chl-a map.

3.2. Spatiotemporal Variations of Chl-a Predicted from Satellite Imagery

Long-term variations of Chl-a across the entire bay point to Susquehanna River discharge (Figure 5) as a primary factor regulating phytoplankton dynamics (Figure 6). The observed variations can be explained by both the physical effect of southward displacement by water flow and the complex effect of discharged compounds on phytoplankton dynamics, including nutrients stimulating phytoplankton growth and suspended sediments suppressing photosynthesis by shading algae from sunlight. While the physical flushing effect implies a short time lag, the effect on phytoplankton growth can be delayed because it involves the accumulation of discharged compounds, multiple cycles of nutrient uptake and recycling, etc.

The discharge of the Susquehanna River is highly seasonal, with a spring freshet (the annual surge of water flow in rivers caused by melting snow) and a summer minimum (Figure 5a,b). The decomposition of the river flow time series by the STL method (Figure 5c) revealed three periods when the river flow was higher than normal (2003–2004, 2011, and 2018–2019; Figure 5c and Figure 6a). All these periods coincided with pronounced changes in the north–south distribution of Chl-a: the zone of low Chl-a in the northern part of the bay extended southward (from ~39.2°N to ~39.0°N), and the southern boundary of the zone of high Chl-a (e.g., Chl-a exceeding 8 µg L⁻¹), typically located at about 38.0°N, extended southward almost to the bay entrance (Figure 6b,c).

On a monthly scale, the time-lagged response of Chl-a to river flow (Figure 7) aligns with the pattern described above. In the upper (northern) part of the bay, the correlations between deseasonalized time series are negative, with maximum correlations at time lags of 1–2 months. To the south of 38.8°N, the correlations are positive, with time lags of up to 4 months (Figure 7). The notable delay in the response of Chl-a to river discharge indicates a primary role of discharged compounds in phytoplankton growth, while negative correlations with short time lags indicate an advective flushing effect.
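The kind of time-lagged correlation between deseasonalized monthly series described above can be sketched in Python as follows. This is a simplified illustration on synthetic data: the seasonal cycle is removed by subtracting a monthly climatology rather than by the STL decomposition used in this study, and the function names, series, and three-month response delay are all assumptions made for the example.

```python
import numpy as np

def deseasonalize(monthly):
    """Subtract the mean of each calendar month (assumes a gap-free monthly series)."""
    x = np.asarray(monthly, dtype=float)
    month = np.arange(x.size) % 12
    climatology = np.array([x[month == m].mean() for m in range(12)])
    return x - climatology[month]

def lagged_correlations(driver, response, max_lag):
    """Pearson r between driver[t] and response[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        d = driver[: driver.size - lag]
        r = response[lag:]
        result[lag] = float(np.corrcoef(d, r)[0, 1])
    return result

# Synthetic 20-year monthly series (hypothetical, not the observed data)
rng = np.random.default_rng(1)
n_months = 240
t = np.arange(n_months)
flow_anomaly = rng.normal(0.0, 1.0, n_months)
flow = 10.0 + 5.0 * np.cos(2 * np.pi * t / 12) + flow_anomaly
# Chl-a responds to the flow anomaly three months later (np.roll wraps the first 3 values)
chl = (8.0 + 3.0 * np.sin(2 * np.pi * t / 12)
       + 0.8 * np.roll(flow_anomaly, 3)
       + rng.normal(0.0, 0.3, n_months))

corr = lagged_correlations(deseasonalize(flow), deseasonalize(chl), max_lag=6)
best_lag = max(corr, key=lambda k: abs(corr[k]))
print(best_lag, round(corr[best_lag], 2))
```

Applied separately to each latitudinal zone, a calculation of this type yields the sign and the lag of the strongest response, which is the information summarized in Figure 7.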

4. Discussion

4.1. Model Performance

The results of this research indicate that the ET machine learning approach is a promising method for converting satellite ocean color imagery into Chl-a in nearshore aquatic areas with complex optical properties, such as the Chesapeake Bay. Presumably, a similar approach can be applied to other water quality properties such as CDOM and TSS. The direct inclusion of CDOM and TSS in the models predicting Chl-a in coastal waters could substantially increase their predictive power, but without a sufficient number of in situ measurements of these constituents, we are unable to pursue this approach.

The results confirm that the precision of the conventional OC3 algorithms predicting Chl-a in nearshore waters is unsatisfactory. The poor performance of OC3 models has two explanations. First, OC3 models were trained mostly on data collected in offshore Case 1 waters, whose optical properties depend mostly on phytoplankton concentration [9,10,11]. This optical signal differs from the color properties of the nearshore Case 2 waters of the Chesapeake Bay, where the optical signal is also affected by dissolved and suspended matter in addition to phytoplankton. Second, OC3 models are based on polynomial empirical equations relating satellite R_rs to Chl-a, while nearshore, these relationships are more complex. The main advantage of ML over other methods of numerical prediction is that ML models do not require prior knowledge of the internal structure and functioning of the analyzed objects and, as such, are a better choice for the nearshore environment.
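To make the second point concrete, the general form of an OC3-type band-ratio algorithm can be sketched in Python as below: log10(Chl-a) is a fourth-order polynomial of the log10 of the maximum blue-to-green R_rs ratio. The function name and the coefficient values are placeholders chosen for illustration and are not the operational sensor-specific coefficients, which should be taken from the NASA OBPG algorithm documentation.

```python
import numpy as np

def band_ratio_chl(rrs_blue_a, rrs_blue_b, rrs_green, coeffs):
    """Chl-a from a maximum band ratio via a polynomial in log10 space."""
    ratio = np.maximum(rrs_blue_a, rrs_blue_b) / rrs_green
    x = np.log10(ratio)
    # log10(Chl-a) = a0 + a1*x + a2*x^2 + a3*x^3 + a4*x^4
    log_chl = sum(a * x**i for i, a in enumerate(coeffs))
    return 10.0 ** log_chl

# Placeholder coefficients a0..a4 for illustration only (NOT the operational values)
example_coeffs = (0.25, -2.5, 1.5, 0.0, -1.0)

chl_unit_ratio = band_ratio_chl(0.004, 0.005, 0.005, example_coeffs)  # blue/green ratio = 1
chl_bluer_water = band_ratio_chl(0.008, 0.010, 0.005, example_coeffs)  # blue/green ratio = 2
print(chl_unit_ratio, chl_bluer_water)
```

Because the output depends on a single band ratio, any CDOM or sediment signal that changes that ratio is interpreted as a change in Chl-a, which is one way to see why a fixed polynomial struggles in Case 2 waters.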

It is worth mentioning that the generalization of all empirical models, including ML models, to other regions is affected by the span of the training datasets [86]. ET models trained on limited (<400 samples) in situ R_rs measurements do not perform as well as models trained on larger datasets of satellite R_rs (Table 1). The latter performed better because of a much larger range of training data, although satellite R_rs includes additional distortions resulting from atmospheric correction, which are even more challenging nearshore [12,87,88,89]. Another advantage of training the ML model on contemporaneous satellite-derived data instead of in situ data is that uncertainties in the satellite-derived data are implicitly included in the empirically calculated weights of the created model (i.e., model coefficients), making predictions based on satellite data more accurate [32].

The performance of ET models predicting Chl-a in the Chesapeake Bay (uncertainty about 40%) was close to the requirements of the ocean color community commonly defined in clear, natural waters as 35% [64]. Including additional (position) features like distance from river mouth (latitude) and season (day of year) slightly improved the model’s performance (Table 1). However, including position features makes the ET models site-specific, while the purpose of this study was predicting Chl-a from satellite imagery not only in the Chesapeake Bay but in other coastal regions as well.
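The link between a multiplicative error of about 1.40 and an uncertainty of about 40% can be illustrated with a short Python sketch. It assumes the commonly used log10-based definitions of the multiplicative absolute error (MAE) and multiplicative mean bias (MMB); the function name and the sample values are hypothetical, and the definitions given in the Methods of this paper take precedence if they differ.

```python
import numpy as np

def multiplicative_metrics(observed, predicted):
    """Return (MAE, MMB) computed in log10 space and converted back to ratios."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    log_ratio = np.log10(pred) - np.log10(obs)
    mae = 10.0 ** np.mean(np.abs(log_ratio))  # typical factor of disagreement, >= 1
    mmb = 10.0 ** np.mean(log_ratio)          # 1 means no systematic bias
    return mae, mmb

# Hypothetical Chl-a values: predictions are off by a factor of 1.4, alternately high and low
obs = np.array([2.0, 5.0, 10.0, 20.0])
pred = obs * np.array([1.4, 1 / 1.4, 1.4, 1 / 1.4])

mae, mmb = multiplicative_metrics(obs, pred)
print(round(mae, 2), round(mmb, 2))
```

Under these definitions, MAE = 1.40 means that predictions differ from observations by a factor of 1.4 on average (about 40%), while MMB close to 1 means that overestimates and underestimates balance out.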

In the Chesapeake Bay, both positional parameters (latitude and season) may be linked to the balance between phytoplankton and non-phytoplankton components, primarily, CDOM and total suspended solids (TSS), contributing to water color. The effect of non-phytoplankton components is most significant in the upper part of the estuary and gradually decreases with the distance from the river mouth [15,24] (Figure 4 and Figures S1 and S2). Phytoplankton biomass demonstrates regular seasonal variations, with its maximum in spring–summer [12,90,91] affecting the optical signal acquired by satellite sensors. Another factor affecting Chl-a estimations from water color is the species composition of phytoplankton. Specifically, Nezlin et al. [92] demonstrated significant seasonal differences in red light absorption by chlorophyll supposedly resulting from the effect of pigment packaging in larger cells prevailing in spring [93,94,95].

Including depth and distance offshore in the model did not improve its performance. A possible explanation for depth is that the range of depths in the analyzed samples was too narrow to have a significant effect, because the Chesapeake Bay is comparatively shallow: areas with depths greater than 10 m constitute only 24% of its surface area, and the average depth is just 6.5 m [49].

The lack of an effect of distance offshore on Chl-a prediction indicates that the land adjacency effect (LAE) is small, considering the coarse spatial resolution (0.75 × 0.75 km) of the MODIS-Aqua and VIIRS-SNPP data used in this study. We expect that excluding one to two pixels close to land may improve the model’s performance [29,96], but in the Chesapeake Bay with its narrow tributaries, this operation would substantially shrink the explored area.

The comparative importance of the features included in the models indicates the prevailing role of blue–green and red wavelengths for Chl-a prediction (Figure 3). Many semi-analytic models developed for the Chesapeake Bay and other estuaries, nearshore waters, and inland bodies of water utilized red and NIR wavelengths [12,16,17,18,19,21] because the blue and green wavelengths traditionally used in Chl-a algorithms are sensitive to signal contamination by CDOM and TSS [12,16,22,25]. Some previously reported ML models for predicting Chl-a performed better after the short wavelengths sensitive to CDOM and TSS were excluded from the input features [29,33].

Previous studies, primarily related to the estimation of geophysical parameters from satellite imagery in different areas, demonstrated a comparable or higher performance of ET models compared with RF, deep neural networks (DNN), and other ML algorithms [29,32,67,86,97,98,99]. The ET algorithm is an extension of RF that injects more randomness into tree-splitting (selecting split thresholds at random rather than searching for the optimal cut). This design dramatically reduces computational cost (faster training) while acting as a form of regularization that limits overfitting [99]. This robustness is crucial for the remote sensing of optically complex waters, where the absorption by Chl-a produces a weak reflectance signal that is accompanied by noise from other water constituents such as CDOM and TSS. ET’s randomized splits help it avoid fitting spurious patterns in such high-noise, low-signal feature sets, whereas a more complex DNN could easily over-parameterize and memorize noise without substantially more training data or careful regularization. Saeed et al. [98] reported that an Extra-Trees-based model was markedly noise-resistant and had lower bias and variance errors than both RF and DNN when learning from moderate-sized sensor datasets. In addition, their ET ensemble not only achieved higher predictive accuracy under noisy conditions but also trained much faster than the DNN. The only drawback of the ET approach is that it needs significantly more memory than other ML algorithms [67], but recent and anticipated progress in computer technology is making this issue less critical.
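The difference between the two split rules can be sketched for a single feature at a single tree node. The Python example below contrasts an exhaustive search for the best threshold (as in RF-type trees) with one threshold drawn uniformly at random between the feature minimum and maximum (as in ET, where such a draw is made for each candidate feature and the best of these draws is kept). The function names and the synthetic step-shaped data are assumptions made for illustration; this is not the implementation used in this study.

```python
import numpy as np

def split_sse(x, y, threshold):
    """Sum of squared errors around the child means after splitting at threshold."""
    left, right = y[x <= threshold], y[x > threshold]
    if left.size == 0 or right.size == 0:
        return float("inf")
    return float(((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum())

def best_threshold(x, y):
    """Exhaustive search: evaluate every midpoint between sorted unique values."""
    xs = np.unique(x)
    candidates = (xs[:-1] + xs[1:]) / 2.0
    scores = [split_sse(x, y, c) for c in candidates]
    return float(candidates[int(np.argmin(scores))]), len(candidates)

def random_threshold(x, rng):
    """ET-style: a single threshold drawn uniformly between min(x) and max(x)."""
    return float(rng.uniform(x.min(), x.max())), 1

# Synthetic node data: a step in y at x = 0.6 plus noise (hypothetical values)
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 200)
y = np.where(x > 0.6, 1.0, 0.0) + rng.normal(0.0, 0.1, 200)

t_best, n_eval_best = best_threshold(x, y)
t_rand, n_eval_rand = random_threshold(x, rng)
sse_best = split_sse(x, y, t_best)
sse_rand = split_sse(x, y, t_rand)
print(n_eval_best, round(t_best, 3), round(sse_best, 2))
print(n_eval_rand, round(t_rand, 3), round(sse_rand, 2))
```

A single random cut is never better than the exhaustive optimum at a given node, but it costs one evaluation instead of one per candidate threshold; averaging many such randomized trees recovers accuracy while reducing variance, which is the trade-off described above.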

The models developed in this study utilize standard Level 2 satellite imagery available from the NASA/GSFC/OBPG repository. We intentionally avoided time- and labor-consuming reprocessing involving newly developed methods of atmospheric correction [100,101]. Our future plans include testing the models developed for the Chesapeake Bay in other waters characterized by complex optical signals, as well as applying the simplified approach used in this study, coupled with the easy-to-use and computationally efficient ET ML method, to datasets collected in other nearshore and inland aquatic regions.

4.2. Spatiotemporal Variations of Satellite-Derived Chl-a in the Chesapeake Bay

The regional and yearly variations of Chl-a estimated by the ET models using the imagery of two satellites (MODIS-Aqua and VIIRS-SNPP) reproduced the basic features of Chl-a in the Chesapeake Bay described in many publications. Across different parts of the Chesapeake Bay, Chl-a varied between 5 and 20 µg L⁻¹ (Figure 4 and Figures S1 and S2), which closely matched the Chl-a field measurements reported previously [12,47,102].

The largest Chl-a biomass was detected in the upper part of the bay, excluding the region closest to the Susquehanna River mouth. The zone of low Chl-a near the river mouth was clearly visible in the composite maps of Chl-a based on in situ Chl-a measurements and predicted by ET models (Figure 4 and Figures S1 and S2). At the same time, similar zones of low Chl-a were not observed in the composite maps of Chl-a produced by the standard OC3 algorithms (Figure 4b,c; see also [15]) and by the previously developed regionally tuned Chl-a algorithms, including the Generalized Stacked-Constraints Model (GSCM) [92,103], Green-Red Ocean Color 4 (GROC4) [21], and Red-Green Chlorophyll Index (RGCI) [104] algorithms. This feature indicates that ET models work better than other algorithms in the waters where the optical signal is contaminated by CDOM and TSS.

The Seasonal-Trend decomposition (STL) method applied to the Chl-a time series helped to reveal yearly variations and confirmed the key role of Susquehanna River discharge in controlling phytoplankton growth in the Chesapeake Bay [48,105,106] (Figure 6). The periods when the river flow was higher than normal were characterized by a decrease in Chl-a near the river mouth and an increase in Chl-a farther to the south and throughout the entire bay. An increase in Chl-a is attributed to additional nutrients discharged to the bay with river water and the associated relief of nutrient limitation [24,90,107]. Previous publications indicated that nutrient inputs to the Chesapeake Bay “vary by a factor of 2 between wet and dry years” [108]. During periods of elevated runoff, much of the nutrient load passes through the northern part of the bay without being assimilated, because the residence time of these nutrients is short compared to the slow assimilation rates of phytoplankton [54]. As a result, the periods with warm and wet climatic conditions are characterized by spring blooms that peak in biomass farther south, occur later in spring, and cover a larger area than those in periods with predominantly cool and dry weather conditions [91].

The flow-regulated decrease in Chl-a near the river mouth can be ascribed to several processes. First, the zone of low Chl-a can be extended further south by the advective “flush effect” when the rate of transport of phytoplankton biomass away from the observed area exceeds the rate of phytoplankton growth [109,110,111]. Another explanation is that increased freshwater discharge enhances stratification [49,108], leaving a substantial portion of phytoplankton biomass accumulated in the near-bottom layer below the euphotic zone [105], where it is undetected by optical sensors. Alternatively, during periods of increased river flow, discharged, vertically mixed turbid waters can inhibit photosynthesis through light limitation, resulting in low phytoplankton biomass in the upper part of the bay [91,112,113,114,115].

The extension of the zone of low Chl-a is closely related to the Estuarine Turbidity Maximum (ETM) [116,117], a turbid zone extending 10–50 km downstream from the Susquehanna River mouth (Figure 4 and Figures S1 and S2). The ETM is formed mostly by the local resuspension of sediments from the seafloor and by the “sediment trap” that is formed in the upper sections of the estuary by local circulation [116]. Previous studies demonstrated that the southern boundary of the ETM (i.e., the gradients of water clarity and light penetration) marks the northern edge of the spring bloom and shifts in response to river flow [24,118].

The processes described above are associated with different time lags, which can be seen in the correlations between river discharge and Chl-a in different latitudinal zones (Figure 7). Near the river mouth, the correlations were negative and time lags were short (up to one month), while in the central and southern parts of the bay, the correlations were positive with time lags as long as four months. Short time lags in the zone where river runoff adversely affects phytoplankton near the river mouth can be attributed to the prompt response of Chl-a to both the advective flushing effect and the limited light availability caused by elevated levels of river-transported suspended sediments. Liu and Wang [119] showed that after every substantial river high-flow event, which typically lasts only a few days, high TSS levels in the northern part of the Chesapeake Bay can persist for ~10–20 days, that is, close to the one-month time lags reported in this study. At the same time, in the zone of positive relationship between freshwater discharge and Chl-a, the time lags are long because the effect of freshwater discharge on the growth of phytoplankton biomass is indirect, involving several cycles of nutrient consumption by phytoplankton, grazing by zooplankton, the decay of organic matter, deposition in sediments, resuspension, nutrient recycling, and so on [53,102,105].

Time-lagged correlations between river discharge and Chl-a in the Chesapeake Bay agree with prior research. Du and Shen [42] showed that the Susquehanna River discharge needs two to three months to reach the southern part of the bay and that the water turnover time in the southern part of the bay is approximately four months. Such a long water turnover time was attributed to modest river input to the estuary, in contrast to large estuarine systems such as those of the Amazon, Mississippi, and Changjiang Rivers [120]. At the same time, the detected time lags are much longer than the one-month time lags reported by Jiang and Xia [121], derived from numerical simulations, and by Acker et al. [106], based on satellite ocean color imagery collected in 2002–2003 and comparing very low discharge in 2002 with very high Susquehanna River flow in 2003.

In this study, we found no evident increasing or decreasing trends in Chl-a during the period 2002–2023, i.e., starting with the launch of MODIS-Aqua (Figure 6). Previous publications demonstrated a significant increase in Chl-a concentrations since 1945 (by a factor of 5 to 10 in the southern regions and by a factor of 1.5 to 2 elsewhere) [48], with elevated Chl-a in much of the estuary before the end of the 1980s [122]. Since that time, extensive efforts to reduce nutrient inputs [46,49] have resulted in no further increase being observed, in spite of substantial seasonal and interannual variability [15,47,122]. Turner et al. [123], analyzing remote sensing reflectance in MODIS-Aqua imagery, found a prolonged decline in the concentration of suspended solids and light attenuation in the absence of a consistent decrease in Chl-a level.

5. Conclusions

This study demonstrated that the Extra-Trees (ET) machine learning algorithm is a robust and efficient approach for deriving chlorophyll-a (Chl-a) concentrations from satellite imagery of ocean color. In nearshore waters with complex optical properties, including the Chesapeake Bay, where traditional empirical and semi-analytical methods struggle, the ET models based on remote sensing reflectance effectively captured Chl-a variability while offering significantly improved computational efficiency compared to other machine learning approaches. Including additional features characterizing location and season in the ET models yielded only a minor improvement in their predictive power while making the models site-specific and preventing their use in other regions. The long-term Chl-a composites generated from MODIS-Aqua (>20 years) and VIIRS-SNPP (>10 years) datasets revealed spatial and temporal patterns consistent with established field observations, reinforcing the reliability of the approach. Notably, the results confirmed that freshwater inflow from the Susquehanna River is a key driver of Chl-a distribution in the Chesapeake Bay, highlighting the strong influence of hydrological processes on estuarine productivity. This study underscores the power of machine learning to enhance satellite-based water quality monitoring in turbid coastal waters, providing a scalable and computationally efficient framework for tracking ecosystem dynamics and informing management strategies in estuarine and coastal environments.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs17132151/s1, Figure S1: Monthly climatologies of Chl-a concentrations that show the spatial distributions of Chl-a typical for each month averaged over the entire period of satellite observations in the Chesapeake Bay predicted using Extra-Trees machine learning model from remote sensing reflectances (R_rs) measured by MODIS-Aqua satellite; Figure S2: Monthly climatologies of Chl-a concentrations in the Chesapeake Bay predicted using Extra-Trees machine learning model from remote sensing reflectances (R_rs) measured by VIIRS-SNPP satellite.

Author Contributions

Conceptualization, N.P.N. and S.S.; methodology, N.P.N., S.S., S.I.S. and M.E.O.; software, N.P.N. and S.I.S.; investigation, N.P.N. and S.S.; data curation, S.S.; writing—original draft preparation, N.P.N.; writing—review and editing, N.P.N., S.S., S.I.S. and M.E.O.; visualization, N.P.N.; supervision, S.S.; project administration, S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by NOAA grant NA24NESX432C0001 (Cooperative Institute for Satellite Earth System Studies–CISESS) at the University of Maryland/ESSIC. Part of this work was performed and funded under ST13301CQ0050/1332KP22FNEED0042.

Data Availability Statement

The SeaBASS & NOMAD radiometric data are available at https://seabass.gsfc.nasa.gov/wiki/NOMAD, accessed on 1 April 2025. The MODIS-Aqua and VIIRS-SNPP satellite imagery are available at http://oceancolor.gsfc.nasa.gov/, accessed on 1 April 2025. The Susquehanna River flow data are available at https://waterdata.usgs.gov/nwis, accessed on 1 April 2025. The Python codes used in this study are available from the Coastal Application Knowledge Hub (AKH; https://www.star.nesdis.noaa.gov/socd/coast/resources.html, accessed on 1 April 2025).

Acknowledgments

The authors appreciate the scientists providing radiometric measurements to SeaBASS & NOMAD databases, the Chesapeake Bay Program for the chlorophyll-a measurements, and NASA Ocean Biology Processing Group for MODIS-Aqua and VIIRS-SNPP data. The comments of three anonymous reviewers were thoughtful and greatly improved the final manuscript. These scientific results and conclusions, as well as any views or opinions expressed herein, are those of the author(s) and do not necessarily reflect those of NOAA or the US Department of Commerce.

Conflicts of Interest

Author Nikolay P. Nezlin was employed by the company Global Science & Technology, Inc. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

IOCCG. Why Ocean Colour? The Societal Benefits of Ocean-Colour Technology; International Ocean-Colour Coordinating Group (IOCCG): Dartmouth, NS, Canada, 2008; p. 141.
Carlson, R.E. A trophic state index for lakes. Limnol. Oceanogr. 1977, 22, 361–369.
Muller-Karger, F.E.; Kavanaugh, M.T.; Montes, E.; Balch, W.M.; Breitbart, M.; Chavez, F.P.; Doney, S.C.; Johns, E.M.; Letelier, R.M.; Lomas, M.W.; et al. A Framework for a Marine Biodiversity Observing Network Within Changing Continental Shelf Seascapes. Oceanography 2014, 27, 18–23.
Behrenfeld, M.J.; O’Malley, R.T.; Siegel, D.A.; McClain, C.R.; Sarmiento, J.L.; Feldman, G.C.; Milligan, A.J.; Falkowski, P.G.; Letelier, R.M.; Boss, E.S. Climate-driven trends in contemporary ocean productivity. Nature 2006, 444, 752–755.
Chavez, F.P.; Messié, M.; Pennington, J.T. Marine Primary Production in Relation to Climate Variability and Change. Annu. Rev. Mar. Sci. 2011, 3, 227–260.
Wang, H.; Convertino, M. Algal bloom ties: Systemic biogeochemical stress and Chlorophyll-a shift forecasting. Ecol. Indic. 2023, 154, 110760.
Bojinski, S.; Verstraete, M.; Peterson, T.C.; Richter, C.; Simmons, A.; Zemp, M. The Concept of Essential Climate Variables in Support of Climate Research, Applications, and Policy. Bull. Am. Meteorol. Soc. 2014, 95, 1431–1443.
GCOS. Systematic Observation Requirements for Satellite-Based Products for Climate. 2011 Update; Global Climate Observing System, World Meteorological Organization: Geneva, Switzerland, 2011; p. 138.
Gordon, H.R.; Morel, A.Y. Remote Assessment of Ocean Color for Interpretation of Satellite Visible Imagery: A Review; Springer: Berlin/Heidelberg, Germany, 1983; p. 114.
Morel, A. Optical modeling of the upper ocean in relation to its biogenous matter content (case I waters). J. Geophys. Res. Oceans 1988, 93, 10749–10768.
Morel, A.; Prieur, L. Analysis of variations in ocean color. Limnol. Oceanogr. 1977, 22, 709–722.
Werdell, P.J.; Bailey, S.W.; Franz, B.A.; Harding, L.W.; Feldman, G.C.; McClain, C.R. Regional and seasonal variability of chlorophyll-a in Chesapeake Bay as observed by SeaWiFS and MODIS-Aqua. Remote Sens. Environ. 2009, 113, 1319–1330.
Gordon, H.R.; Wang, M. Retrieval of water-leaving radiance and aerosol optical thickness over the oceans with SeaWiFS: A preliminary algorithm. Appl. Opt. 1994, 33, 443–452.
Gordon, H.R. Atmospheric correction of ocean color imagery in the Earth Observing System era. J. Geophys. Res. Atmos. 1997, 102, 17081–17106.
Son, S.; Wang, M. Water properties in Chesapeake Bay from MODIS-Aqua measurements. Remote Sens. Environ. 2012, 123, 163–174.
Gitelson, A.A.; Schalles, J.F.; Hladik, C.M. Remote chlorophyll-a retrieval in turbid, productive estuaries: Chesapeake Bay case study. Remote Sens. Environ. 2007, 109, 464–472.
Magnuson, A.; Harding, L.W.; Mallonee, M.E.; Adolf, J.E. Bio-optical model for Chesapeake Bay and the Middle Atlantic Bight. Estuar. Coast. Shelf Sci. 2004, 61, 403–424.
Tzortziou, M.; Subramaniam, A.; Herman, J.R.; Gallegos, C.L.; Neale, P.J.; Harding, L.W. Remote sensing reflectance and inherent optical properties in the mid Chesapeake Bay. Estuar. Coast. Shelf Sci. 2007, 72, 16–32.
Le, C.; Hu, C.; Cannizzaro, J.; English, D.; Muller-Karger, F.; Lee, Z. Evaluation of chlorophyll-a remote sensing algorithms for an optically complex estuary. Remote Sens. Environ. 2013, 129, 75–89.
Ondrusek, M.; Stengel, E.; Kinkade, C.S.; Vogel, R.L.; Keegstra, P.; Hunter, C.; Kim, C. The development of a new optical total suspended matter algorithm for the Chesapeake Bay. Remote Sens. Environ. 2012, 119, 243–254.
Abbas, M.M.; Melesse, A.M.; Scinto, L.J.; Rehage, J.S. Satellite Estimation of Chlorophyll-a Using Moderate Resolution Imaging Spectroradiometer (MODIS) Sensor in Shallow Coastal Water Bodies: Validation and Improvement. Water 2019, 11, 1621.
Gilerson, A.A.; Gitelson, A.A.; Zhou, J.; Gurlin, D.; Moses, W.; Ioannou, I.; Ahmed, S.A. Algorithms for remote estimation of chlorophyll-a in coastal and inland waters using red and near infrared bands. Opt. Express 2010, 18, 24109–24125.
Mishra, S.; Mishra, D.R. Normalized difference chlorophyll index: A novel model for remote estimation of chlorophyll-a concentration in turbid productive waters. Remote Sens. Environ. 2012, 117, 394–406.
Harding, L.W.; Magnuson, A.; Mallonee, M.E. SeaWiFS retrievals of chlorophyll in Chesapeake Bay and the mid-Atlantic bight. Estuar. Coast. Shelf Sci. 2005, 62, 75–94.
Dall’Olmo, G.; Gitelson, A.A.; Rundquist, D.C. Towards a unified approach for remote estimation of chlorophyll-a in both terrestrial vegetation and turbid productive waters. Geophys. Res. Lett. 2003, 30, 18.
Gilerson, A.; Malinowski, M.; Agagliate, J.; Herrera-Estrella, E.; Tzortziou, M.; Tomlinson, M.C.; Meredith, A.; Stumpf, R.P.; Ondrusek, M.; Jiang, L.; et al. Development of VIIRS-OLCI chlorophyll-a product for the coastal estuaries. Front. Mar. Sci. 2024, 11, 1476425.
Hieronymi, M.; Müller, D.; Doerffer, R. The OLCI Neural Network Swarm (ONNS): A Bio-Geo-Optical Algorithm for Open Ocean and Coastal Waters. Front. Mar. Sci. 2017, 4, 140.
Pahlevan, N.; Smith, B.; Schalles, J.; Binding, C.; Cao, Z.; Ma, R.; Alikas, K.; Kangro, K.; Gurlin, D.; Hà, N.; et al. Seamless retrievals of chlorophyll-a from Sentinel-2 (MSI) and Sentinel-3 (OLCI) in inland and coastal waters: A machine-learning approach. Remote Sens. Environ. 2020, 240, 111604.
Cao, Z.; Wang, M.; Ma, R.; Zhang, Y.; Duan, H.; Jiang, L.; Xue, K.; Xiong, J.; Hu, M. A decade-long chlorophyll-a data record in lakes across China from VIIRS observations. Remote Sens. Environ. 2024, 301, 113953.
Salah, M.; Salem, S.I.; Utsumi, N.; Higa, H.; Ishizaka, J.; Oki, K. 3LATNet: Attention based deep learning model for global Chlorophyll-a retrieval from GCOM-C satellite. ISPRS J. Photogramm. Remote Sens. 2025, 220, 490–508.
Salah, M.; Higa, H.; Ishizaka, J.; Salem, S.I. 1D Convolutional Neural Network-based Chlorophyll-a Retrieval Algorithm for Sentinel-2 MultiSpectral Instrument in Various Trophic States. Sens. Mater. 2023, 35, 3743–3761.
Chen, S.; Hu, C.; Barnes, B.B.; Wanninkhof, R.; Cai, W.-J.; Barbero, L.; Pierrot, D. A machine learning approach to estimate surface ocean pCO2 from satellite measurements. Remote Sens. Environ. 2019, 228, 203–226.
El-Habashi, A.; Ahmed, S.; Ondrusek, M.; Lovko, V. Analyses of satellite ocean color retrievals show advantage of neural network approaches and algorithms that avoid deep blue bands. J. Appl. Remote Sens. 2019, 13, 024509.
Ahmad, M.W.; Mourshed, M.; Rezgui, Y. Trees vs Neurons: Comparison between random forest and ANN for high-resolution prediction of building energy consumption. Energy Build. 2017, 147, 77–89.
Nawar, S.; Mouazen, A.M. Comparison between Random Forests, Artificial Neural Networks and Gradient Boosted Machines Methods of On-Line Vis-NIR Spectroscopy Measurements of Soil Total Nitrogen and Total Carbon. Sensors 2017, 17, 2428.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Banerjee, M.; Ding, Y.; Noone, A.M. Identifying representative trees from ensembles. Stat. Med. 2012, 31, 1601–1616.
Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42.
Salomonson, V.V.; Barnes, W.L.; Maymon, P.W.; Montgomery, H.E.; Ostrow, H. MODIS: Advanced facility instrument for studies of the Earth as a system. IEEE Trans. Geosci. Remote Sens. 1989, 27, 145–153.
Goldberg, M.D.; Kilcoyne, H.; Cikanek, H.; Mehta, A. Joint Polar Satellite System: The United States next generation civilian polar-orbiting environmental satellite system. J. Geophys. Res. Atmos. 2013, 118, 13463–13475.
Cerco, C.F.; Cole, T. Three-Dimensional Eutrophication Model of Chesapeake Bay. J. Environ. Eng. 1993, 119, 1006–1025.
Du, J.; Shen, J. Water residence time in Chesapeake Bay for 1980–2012. J. Mar. Syst. 2016, 164, 101–111.
Zhong, L.; Li, M. Tidal energy fluxes and dissipation in the Chesapeake Bay. Cont. Shelf Res. 2006, 26, 752–770.
Zhong, L.; Li, M.; Foreman, M.G.G. Resonance and sea level variability in Chesapeake Bay. Cont. Shelf Res. 2008, 28, 2565–2573.
Shi, W.; Wang, M.; Jiang, L. Tidal effects on ecosystem variability in the Chesapeake Bay from MODIS-Aqua. Remote Sens. Environ. 2013, 138, 65–76.
Boesch, D.F.; Brinsfield, R.B.; Magnien, R.E. Chesapeake Bay eutrophication: Scientific understanding, ecosystem restoration, and challenges for agriculture. J. Environ. Qual. 2001, 30, 303–320.
Harding, L.W.; Mallonee, M.E.; Perry, E.S.; Miller, W.D.; Adolf, J.E.; Gallegos, C.L.; Paerl, H.W. Long-term trends, current status, and transitions of water quality in Chesapeake Bay. Sci. Rep. 2019, 9, 6709.
Harding, L.W.; Perry, E.S. Long-term increase of phytoplankton biomass in Chesapeake Bay, 1950–1994. Mar. Ecol. Prog. Ser. 1997, 157, 39–52.
Kemp, W.M.; Boynton, W.; Adolf, J.; Boesch, D.; Boicourt, W.; Brush, G.; Cornwell, J.; Fisher, T.; Glibert, P.; Hagy Iii, J.; et al. Eutrophication of Chesapeake Bay: Historical Trends and Ecological Interactions. Mar. Ecol. Prog. Ser. 2005, 303, 1–29.
Murphy, R.R.; Kemp, W.M.; Ball, W.P. Long-Term Trends in Chesapeake Bay Seasonal Hypoxia, Stratification, and Nutrient Loading. Estuaries Coasts 2011, 34, 1293–1309.
Borum, J. Shallow Waters and Land/Sea Boundaries. In Eutrophication in Coastal Marine Ecosystems; American Geophysical Union: Washington, DC, USA, 1996; pp. 179–203.
Nixon, S.W.; Ammerman, J.W.; Atkinson, L.P.; Berounsky, V.M.; Billen, G.; Boicourt, W.C.; Boynton, W.R.; Church, T.M.; Ditoro, D.M.; Elmgren, R.; et al. The fate of nitrogen and phosphorus at the land-sea margin of the North Atlantic Ocean. Biogeochemistry 1996, 35, 141–180.
Kemp, W.M.; Smith, E.M.; Marvin-DiPasquale, M.; Boynton, W.R. Organic carbon balance and net ecosystem metabolism in Chesapeake Bay. Mar. Ecol. Prog. Ser. 1997, 150, 229–248.
Schubel, J.R.; Pritchard, D.W. Responses of upper Chesapeake Bay to variations in discharge of the Susquehanna River. Estuaries 1986, 9, 236–249.
Gallegos, C.L.; Werdell, P.J.; McClain, C.R. Long-term changes in light scattering in Chesapeake Bay inferred from Secchi depth, light attenuation, and remote sensing measurements. J. Geophys. Res. Oceans 2011, 116, C7.
Werdell, P.J.; Bailey, S.W. An improved in-situ bio-optical data set for ocean color algorithm development and satellite data product validation. Remote Sens. Environ. 2005, 98, 122–140.
Mueller, J.; Fargion, G.; McClain, C.; Pegau, W.; Zanefeld, J.; Mitchell, B.; Kahru, M.; Wieland, J.; Stramska, M. Ocean Optics Protocols for Satellite Ocean Color Sensor Validation, Rev. 4, Vol. IV: Inherent Optical Properties: Instruments, Characterisations, Field Measurements and Data Analysis Protocols; Goddard Space Flight Space Center: Greenbelt, MD, USA, 2003.
Hu, C.; Feng, L.; Lee, Z.; Franz, B.A.; Bailey, S.W.; Werdell, P.J.; Proctor, C.W. Improving Satellite Global Chlorophyll a Data Products Through Algorithm Refinement and Data Recovery. J. Geophys. Res. Oceans 2019, 124, 1524–1543.
O’Reilly, J.E.; Maritorena, S.; Mitchell, B.G.; Siegel, D.A.; Carder, K.L.; Garver, S.A.; Kahru, M.; McClain, C. Ocean color chlorophyll algorithms for SeaWiFS. J. Geophys. Res. Oceans 1998, 103, 24937–24953. [Google Scholar] [CrossRef]
Olson, M. Guide to Using Chesapeake Bay Program Water Quality Monitoring Data (EPA 903-R-12-001); Chesapeake Bay Program: Annapolis, MD, USA, 2012; pp. 1–155. [Google Scholar]
D3731-20; Standard Practices for Measurement of Chlorophyll Content of Algae in Surface Waters. ASTM: West Conshohocken, PA, USA, 2020.
Campbell, J.W. The lognormal distribution as a model for bio-optical variability in the sea. J. Geophys. Res. Oceans 1995, 100, 13237–13254. [Google Scholar] [CrossRef]
US Geological Survey. National Water Information System Data Available on the World Wide Web (USGS Water Data for the Nation); US Geological Survey: Reston, VA, USA, 2016. [Google Scholar] [CrossRef]
Bailey, S.W.; Werdell, P.J. A multi-sensor approach for the on-orbit validation of ocean color satellite data products. Remote Sens. Environ. 2006, 102, 12–23. [Google Scholar] [CrossRef]
Geurts, P.; Louppe, G. Learning to rank with extremely randomized trees. PMLR 2011, 14, 49–61. [Google Scholar]
Camana Acosta, M.R.; Ahmed, S.; Garcia, C.E.; Koo, I. Extremely Randomized Trees-Based Scheme for Stealthy Cyber-Attack Detection in Smart Grid Networks. IEEE Access 2020, 8, 19921–19933. [Google Scholar] [CrossRef]
Smith, D.; Yenduri, S.; Iqbal, S.; Krishna, P.V. An efficient distributed protein disorder prediction with pasted samples. Comput. Electr. Eng. 2018, 65, 342–356. [Google Scholar] [CrossRef]
Adams, S.; Choudhary, C.; De Cock, M.; Dowsley, R.; Melanson, D.; Nascimento, A.C.A.; Railsback, D.; Shen, J. Privacy-preserving training of tree ensembles over continuous data. Proc. Priv. Enhancing Technol. 2021, 2022, 205–226. [Google Scholar] [CrossRef]
Ghazwani, M.; Begum, M.Y. Computational intelligence modeling of hyoscine drug solubility and solvent density in supercritical processing: Gradient boosting, extra trees, and random forest models. Sci. Rep. 2023, 13, 10046. [Google Scholar] [CrossRef] [PubMed]
Götz, M.; Weber, C.; Blöcher, J.; Stieltjes, B.; Meinzer, H.-P.; Maier-Hein, K. Extremely randomized trees based brain tumor segmentation. In Proceedings of the MICCAI Workshop: Brain Tumor Segmentation (BraTS) 2014, Boston, MA, USA, 14 September 2014. [Google Scholar]
Lawson, E.; Smith, D.; Sofge, D.; Elmore, P.; Petry, F. Decision forests for machine learning classification of large, noisy seafloor feature sets. Comput. Geosci. 2017, 99, 116–124. [Google Scholar] [CrossRef]
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
Seegers, B.N.; Stumpf, R.P.; Schaeffer, B.A.; Loftin, K.A.; Werdell, P.J. Performance metrics for the assessment of satellite data products: An ocean color case study. Opt. Express 2018, 26, 7404–7422. [Google Scholar] [CrossRef]
Wynne, T.T.; Mishra, S.; Meredith, A.; Litaker, R.W.; Stumpf, R.P. Intercalibration of MERIS, MODIS, and OLCI Satellite Imagers for Construction of Past, Present, and Future Cyanobacterial Biomass Time Series. Remote Sens. 2021, 13, 2305. [Google Scholar] [CrossRef]
Wynne, T.T.; Tomlinson, M.C.; Briggs, T.O.; Mishra, S.; Meredith, A.; Vogel, R.L.; Stumpf, R.P. Evaluating the Efficacy of Five Chlorophyll-a Algorithms in Chesapeake Bay (USA) for Operational Monitoring and Assessment. J. Mar. Sci. Eng. 2022, 10, 1104. [Google Scholar] [CrossRef]
Glover, D.M.; Jenkins, W.J.; Doney, S.C. Modeling Methods for Marine Science; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
Sokal, R.; Rohlf, F. Biometry: The Principles and Practice of Statistics in Biological Research, 2nd ed.; W.H. Freeman and Company: New York, NY, USA, 2012; Volume 133. [Google Scholar]
Brewin, R.J.W.; Sathyendranath, S.; Müller, D.; Brockmann, C.; Deschamps, P.-Y.; Devred, E.; Doerffer, R.; Fomferra, N.; Franz, B.; Grant, M.; et al. The Ocean Colour Climate Change Initiative: III. A round-robin comparison on in-water bio-optical algorithms. Remote Sens. Environ. 2015, 162, 271–294. [Google Scholar] [CrossRef]
Wang, M.; Son, S.; Shi, W. Evaluation of MODIS SWIR and NIR-SWIR atmospheric correction algorithms using SeaBASS data. Remote Sens. Environ. 2009, 113, 635–644. [Google Scholar] [CrossRef]
Azcarate, A.; Barth, A.; Sirjacobs, D.; Lenartz, F.; Beckers, J.-M. Data Interpolating Empirical Orthogonal Functions (DINEOF): A tool for geophysical data analyses. Mediter. Mar. Sci. 2011, 12, 5–11. [Google Scholar] [CrossRef]
Beckers, J.M.; Rixen, M. EOF Calculations and Data Filling from Incomplete Oceanographic Datasets. J. Atmos. Ocean. Technol. 2003, 20, 1839–1856. [Google Scholar] [CrossRef]
Liu, X.; Wang, M. Global daily gap-free ocean color products from multi-satellite measurements. Int. J. Appl. Earth Obs. Geoinf. 2022, 108, 102714. [Google Scholar] [CrossRef]
Cleveland, R.B.; Cleveland, W.S.; McRae, J.E.; Terpenning, I. STL: A seasonal-trend decomposition procedure based on loess. J. Off. Stat. 1990, 6, 3–73. [Google Scholar]
Vantrepotte, V.; Mélin, F. Temporal variability of 10-year global SeaWiFS time-series of phytoplankton chlorophyll a concentration. ICES J. Mar. Sci. 2009, 66, 1547–1556. [Google Scholar] [CrossRef]
Cleveland, W.S. Robust Locally Weighted Regression and Smoothing Scatterplots. J. Am. Stat. Assoc. 1979, 74, 829–836. [Google Scholar] [CrossRef]
Stock, A. Spatiotemporal distribution of labeled data can bias the validation and selection of supervised learning algorithms: A marine remote sensing example. ISPRS J. Photogramm. Remote Sens. 2022, 187, 46–60. [Google Scholar] [CrossRef]
Wang, D.; Tang, B.-H.; Li, Z.-L. Evaluation of five atmospheric correction algorithms for multispectral remote sensing data over plateau lake. Ecol. Inform. 2024, 82, 102666. [Google Scholar] [CrossRef]
Feng, L.; Hu, C. Land adjacency effects on MODIS Aqua top-of-atmosphere radiance in the shortwave infrared: Statistical assessment and correction. J. Geophys. Res. Oceans 2017, 122, 4802–4818. [Google Scholar] [CrossRef]
Pahlevan, N.; Mangin, A.; Balasubramanian, S.V.; Smith, B.; Alikas, K.; Arai, K.; Barbosa, C.; Bélanger, S.; Binding, C.; Bresciani, M.; et al. ACIX-Aqua: A global assessment of atmospheric correction methods for Landsat-8 and Sentinel-2 over lakes, rivers, and coastal waters. Remote Sens. Environ. 2021, 258, 112366. [Google Scholar] [CrossRef]
Harding, L.W.; Mallonee, M.E.; Perry, E.S.; David Miller, W.; Adolf, J.E.; Gallegos, C.L.; Paerl, H.W. Seasonal to Inter-Annual Variability of Primary Production in Chesapeake Bay: Prospects to Reverse Eutrophication and Change Trophic Classification. Sci. Rep. 2020, 10, 2019. [Google Scholar] [CrossRef]
Miller, W.D.; Harding, J.; Lawrence, W. Climate forcing of the spring bloom in Chesapeake Bay. Mar. Ecol. Prog. Ser. 2007, 331, 11–22. [Google Scholar] [CrossRef]
Nezlin, N.P.; Testa, J.M.; Zheng, G.; DiGiacomo, P.M. Satellite observations estimating the effects of river discharge and wind-driven upwelling on phytoplankton dynamics in the Chesapeake Bay. Integr. Environ. Assess. Manag. 2022, 18, 921–938. [Google Scholar] [CrossRef] [PubMed]
Lohrenz, S.E.; Weidemann, A.D.; Tuel, M. Phytoplankton spectral absorption as influenced by community size structure and pigment composition. J. Plankton Res. 2003, 25, 35–61. [Google Scholar] [CrossRef]
Ciotti, Á.M.; Lewis, M.R.; Cullen, J.J. Assessment of the relationships between dominant cell size in natural phytoplankton communities and the spectral shape of the absorption coefficient. Limnol. Oceanogr. 2002, 47, 404–417. [Google Scholar] [CrossRef]
Bricaud, A.; Claustre, H.; Ras, J.; Oubelkheir, K. Natural variability of phytoplanktonic absorption in oceanic waters: Influence of the size structure of algal populations. J. Geophys. Res. Oceans 2004, 109, C11. [Google Scholar] [CrossRef]
Feng, L.; Hu, C.; Chen, X.; Tian, L.; Chen, L. Human induced turbidity changes in Poyang Lake between 2000 and 2010: Observations from MODIS. J. Geophys. Res. Oceans 2012, 117, C7. [Google Scholar] [CrossRef]
Shen, M.; Luo, J.; Cao, Z.; Xue, K.; Qi, T.; Ma, J.; Liu, D.; Song, K.; Feng, L.; Duan, H. Random forest: An optimal chlorophyll-a algorithm for optically complex inland water suffering atmospheric correction uncertainties. J. Hydrol. 2022, 615, 128685. [Google Scholar] [CrossRef]
Saeed, U.; Jan, S.U.; Lee, Y.-D.; Koo, I. Fault diagnosis based on extremely randomized trees in wireless sensor networks. Reliab. Eng. Syst. Saf. 2021, 205, 107284. [Google Scholar] [CrossRef]
Im, G.; Lee, D.; Lee, S.; Lee, J.; Lee, S.; Park, J.; Heo, T.-Y. Estimating Chlorophyll-a Concentration from Hyperspectral Data Using Various Machine Learning Techniques: A Case Study at Paldang Dam, South Korea. Water 2022, 14, 4080. [Google Scholar] [CrossRef]
Vanhellemont, Q.; Ruddick, K. Atmospheric correction of metre-scale optical satellite data for inland and coastal water applications. Remote Sens. Environ. 2018, 216, 586–597. [Google Scholar] [CrossRef]
Zhao, D.; Feng, L.; Sun, K. Development of a Practical Atmospheric Correction Algorithm for Inland and Nearshore Coastal Waters. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5402515. [Google Scholar] [CrossRef]
Malone, T.C.; Kemp, W.M.; Ducklow, H.W.; Boynton, W.R.; Tuttle, J.H.; Jonas, R.B. Lateral variation in the production and fate of phytoplankton in a partially stratified estuary. Mar. Ecol. Prog. Ser. 1986, 32, 149–160. [Google Scholar] [CrossRef]
Zheng, G.; DiGiacomo, P.M. Detecting phytoplankton diatom fraction based on the spectral shape of satellite-derived algal light absorption coefficient. Limnol. Oceanogr. 2018, 63, S85–S98. [Google Scholar] [CrossRef]
Le, C.; Hu, C.; Cannizzaro, J.; Duan, H. Long-term distribution patterns of remotely sensed water quality parameters in Chesapeake Bay. Estuar. Coast. Shelf Sci. 2013, 128, 93–103. [Google Scholar] [CrossRef]
Malone, T.C.; Crocker, L.H.; Pike, S.E.; Wendler, B.W. Influences of river flow on the dynamics of phytoplankton production in a partially stratified estuary. Mar. Ecol. Prog. Ser. 1988, 48, 235–249. [Google Scholar] [CrossRef]
Acker, J.G.; Harding, L.W.; Leptoukh, G.; Zhu, T.; Shen, S. Remotely-sensed chl a at the Chesapeake Bay mouth is correlated with annual freshwater flow to Chesapeake Bay. Geophys. Res. Lett. 2005, 32, 5. [Google Scholar] [CrossRef]
Miller, W.D.; Kimmel, D.G.; Harding, L.W., Jr. Predicting spring discharge of the Susquehanna River from a winter synoptic climatology for the eastern United States. Water Resour. Res. 2006, 42, 5. [Google Scholar] [CrossRef]
Hagy, J.D.; Boynton, W.R.; Keefe, C.W.; Wood, K.V. Hypoxia in Chesapeake Bay, 1950–2001: Long-term change in relation to nutrient loading and river flow. Estuaries 2004, 27, 634–658. [Google Scholar] [CrossRef]
Qin, Q.; Shen, J. Typical relationships between phytoplankton biomass and transport time in river-dominated coastal aquatic systems. Limnol. Oceanogr. 2021, 66, 3209–3220. [Google Scholar] [CrossRef]
Lucas, L.V.; Deleersnijder, E. Timescale Methods for Simplifying, Understanding and Modeling Biophysical and Water Quality Processes in Coastal Aquatic Ecosystems: A Review. Water 2020, 12, 2717. [Google Scholar] [CrossRef]
Scavia, D.; Field, J.C.; Boesch, D.F.; Buddemeier, R.W.; Burkett, V.; Cayan, D.R.; Fogarty, M.; Harwell, M.A.; Howarth, R.W.; Mason, C.; et al. Climate change impacts on U.S. Coastal and Marine Ecosystems. Estuaries 2002, 25, 149–164. [Google Scholar] [CrossRef]
Fisher, T.R.; Peele, E.R.; Ammerman, J.W.; Harding, L.W., Jr. Nutrient limitation of phytoplankton in Chesapeake Bay. Mar. Ecol. Prog. Ser. 1992, 82, 51–63. [Google Scholar] [CrossRef]
Fisher, T.R.; Gustafson, A.B.; Sellner, K.; Lacouture, R.; Haas, L.W.; Wetzel, R.L.; Magnien, R.; Everitt, D.; Michaels, B.; Karrh, R. Spatial and temporal variation of resource limitation in Chesapeake Bay. Mar. Biol. 1999, 133, 763–778. [Google Scholar] [CrossRef]
Zhang, Q.; Fisher, T.R.; Trentacoste, E.M.; Buchanan, C.; Gustafson, A.B.; Karrh, R.; Murphy, R.R.; Keisman, J.; Wu, C.; Tian, R.; et al. Nutrient limitation of phytoplankton in Chesapeake Bay: Development of an empirical approach for water-quality management. Water Res. 2021, 188, 116407. [Google Scholar] [CrossRef]
Jiang, L.; Xia, M. Wind effects on the spring phytoplankton dynamics in the middle reach of the Chesapeake Bay. Ecol. Model. 2017, 363, 68–80. [Google Scholar] [CrossRef]
Schubel, J.R. Turbidity Maximum of the Northern Chesapeake Bay. Science 1968, 161, 1013–1015. [Google Scholar] [CrossRef]
Zheng, G.; DiGiacomo, P.M.; Kaushal, S.S.; Yuen-Murphy, M.A.; Duan, S. Evolution of Sediment Plumes in the Chesapeake Bay and Implications of Climate Variability. Environ. Sci. Technol. 2015, 49, 6494–6503. [Google Scholar] [CrossRef]
Harding, L. Long-term trends in the distribution of phytoplankton in Chesapeake Bay: Roles of light, nutrients and streamflow. Mar. Ecol. Prog. Ser. 1994, 104, 267–291. [Google Scholar] [CrossRef]
Liu, X.; Wang, M. River runoff effect on the suspended sediment property in the upper Chesapeake Bay using MODIS observations and ROMS simulations. J. Geophys. Res. Oceans 2014, 119, 8646–8661. [Google Scholar] [CrossRef]
Chen, B.; Cai, W.-J.; Brodeur, J.R.; Hussain, N.; Testa, J.M.; Ni, W.; Li, Q. Seasonal and spatial variability in surface CO and air–water CO flux in the Chesapeake Bay. Limnol. Oceanogr. 2020, 65, 3046–3065. [Google Scholar] [CrossRef]
Jiang, L.; Xia, M. Dynamics of the Chesapeake Bay outflow plume: Realistic plume simulation and its seasonal and interannual variability. J. Geophys. Res. Oceans 2016, 121, 1424–1445. [Google Scholar] [CrossRef]
Harding, L.W.; Gallegos, C.L.; Perry, E.S.; Miller, W.D.; Adolf, J.E.; Mallonee, M.E.; Paerl, H.W. Long-Term Trends of Nutrients and Phytoplankton in Chesapeake Bay. Estuaries Coasts 2016, 39, 664–681. [Google Scholar] [CrossRef]
Turner, J.S.; Friedrichs, C.T.; Friedrichs, M.A.M. Long-Term Trends in Chesapeake Bay Remote Sensing Reflectance: Implications for Water Clarity. J. Geophys. Res. Oceans 2021, 126, e2021JC017959. [Google Scholar] [CrossRef]

Figure 1. (a) Position of the Chesapeake Bay in North America; (b) the map of the Chesapeake Bay’s water depth and terrain (color shading), the location (white circle) of the Susquehanna River streamflow monitoring USGS station 01578310, the locations (yellow circles) of in situ radiometric measurements (SeaBASS database), and the locations (red triangles) of the Chesapeake Bay Water Quality monitoring sites where surface chlorophyll-a concentrations were measured during 2002–2023.

Figure 2. Correlations of measured in situ Chl-a (x-axes) with predicted Chl-a (y-axes) calculated from R_rs using (a–c) the NASA standard OC3 algorithm and (d–f) Extra-Trees (ET) machine learning models based solely on R_rs. Predicted Chl-a was calculated from remote sensing reflectance (R_rs) from the in situ SeaBASS database (a,d), from MODIS-Aqua imagery (b,e), and from VIIRS-SNPP imagery (c,f). Red lines and circles indicate test subsets (20%); grey lines and dots in (a–c) indicate the entire OC3 datasets. Black dashed lines indicate a 1:1 relationship. The statistical metrics of the correlations are in Table 1.

Figure 3. Importance of the input features of the Extra-Trees (ET) machine learning models for the in situ (SeaBASS database) (a,d), MODIS-Aqua (b,e), and VIIRS-SNPP (c,f) datasets. Input features of the ET models include either remote sensing reflectance (R_rs) alone (a–c) or R_rs with additional factors (d–f): latitude, day of year (DoY), station depth (Depth), and distance offshore (D_off).
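As a rough illustration of how feature importances of the kind shown in Figure 3 can be obtained, the sketch below fits scikit-learn's ExtraTreesRegressor on synthetic data with a 20% test split. Everything specific here is an assumption made for illustration: the band labels, the synthetic relationship between reflectance and log10 Chl-a, and the hyperparameters are not taken from the study.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_samples = 500

# Hypothetical band labels; the values are random stand-ins for R_rs (sr^-1).
feature_names = ["Rrs_443", "Rrs_488", "Rrs_547", "Rrs_667"]
X = rng.uniform(0.001, 0.02, size=(n_samples, len(feature_names)))

# Synthetic target: log10(Chl-a) driven by a blue/green ratio plus noise.
# This relationship is invented for the demo, not fitted to real data.
y = 1.0 - 2.0 * np.log10(X[:, 0] / X[:, 2]) + rng.normal(0.0, 0.05, n_samples)

# Hold out 20% of the samples as a test subset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameters are arbitrary illustrative choices.
model = ExtraTreesRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = dict(zip(feature_names, model.feature_importances_))
test_r2 = model.score(X_test, y_test)

for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
print(f"R2 on the test subset: {test_r2:.3f}")
```

In this synthetic setup the two bands that enter the ratio should dominate the ranking, which is the kind of pattern an importance plot is meant to reveal. Adding columns such as latitude or day of year to X would extend the sketch to the second feature set.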

Figure 4. Median Chl-a concentrations in the Chesapeake Bay interpolated from field data (a) and predicted from reflectances (R_rs) measured by the satellite sensors MODIS-Aqua (b,d) and VIIRS-SNPP (c,e) using the standard OC3 algorithm (b,c) and Extra-Trees machine learning models (d,e).

Figure 5. Monthly Susquehanna River water discharge (a) (10⁶ m³ day⁻¹) decomposed by the Seasonal-Trend decomposition using LOESS (STL) method into a seasonal component (b), a smoothed trend (c, thick line), and residuals (c, thin line).

Figure 6. Interannual variations of (a) “deseasonalized” (see Figure 5) Susquehanna River discharge and “deseasonalized” Chl-a in the longitudinal zones of the Chesapeake Bay predicted by Extra-Trees machine learning models from MODIS-Aqua (b) and VIIRS-SNPP (c) remote sensing reflectance. The x-axes (time) of the panels (a–c) are aligned to visualize the effects of elevated discharge in 2003/2004, 2011, and 2018/2019. The y-axes of the panels (b,c) and the maps to the left are aligned to indicate the longitudinal zones where Chl-a variations occurred. The location of the Susquehanna River mouth is indicated on the maps (b,c) by a red circle.

Figure 7. Time-lagged correlations in the latitudinal regions of the Chesapeake Bay between Susquehanna River discharge and median Chl-a predicted by Extra-Trees machine learning models from remote sensing reflectance acquired by MODIS-Aqua (a) and VIIRS-SNPP (b) satellites.
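The time-lagged correlations in Figure 7 can be illustrated with a minimal sketch: Pearson correlation between a driver series (e.g., discharge) and a response series (e.g., regional median Chl-a) shifted by increasing lags. The function name, the lag convention (response lags the driver), and the synthetic series are assumptions for illustration; the study's actual time step and preprocessing are not reproduced here.

```python
import numpy as np

def lagged_correlation(driver, response, max_lag):
    """Pearson r between driver[t] and response[t + lag], lag = 0..max_lag.

    Hypothetical helper: both inputs are equally spaced series of the
    same length with no missing values.
    """
    driver = np.asarray(driver, dtype=float)
    response = np.asarray(response, dtype=float)
    if driver.shape != response.shape:
        raise ValueError("series must have the same length")
    result = {}
    for lag in range(max_lag + 1):
        x = driver[: len(driver) - lag]  # driver truncated at the end
        y = response[lag:]               # response shifted forward by lag
        result[lag] = float(np.corrcoef(x, y)[0, 1])
    return result

# Synthetic demo: the response repeats the driver two steps later.
rng = np.random.default_rng(0)
discharge = rng.normal(size=200)
chl = np.roll(discharge, 2)

correlations = lagged_correlation(discharge, chl, max_lag=6)
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 3))
```

Applying the same function to each latitudinal region separately would give one correlation curve per region, which is how a lag-versus-region display can be assembled.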

Table 1. Accuracy metrics of Chl-a in the Chesapeake Bay predicted from R_rs by the standard OC3 algorithm and Extra-Trees (ET) machine learning models vs. in situ Chl-a. For the results of ET modeling, the metrics were calculated from the test (20%) subsets. For the OC3 Chl-a, the metrics were calculated from the same test (20%) subsets (for accurate comparison with the results of ET modeling) and from the entire datasets.

Data Source of R_rs	Chl-a Algorithm	ET Model Features	N ¹	R² ²	MSE ³	MAE ⁴	MMB ⁵	Slope ⁶
In situ—test subset	ET	R_rs	80	0.522	0.047	1.481	1.113	0.639
In situ—test subset	ET	R_rs, Lat ⁷, DoY ⁸, Depth ⁹, D_off ¹⁰	80	0.475	0.069	1.554	1.031	0.720
In situ—test subset	OC3		80	0.219	0.077	1.644	0.830	0.719
In situ—all data	OC3		399	0.100	0.096	1.765	0.763	0.711
MYD ¹¹—test subset	ET	R_rs	946	0.502	0.044	1.395	0.994	0.756
MYD—test subset	ET	R_rs, Lat, DoY, Depth, D_off	946	0.581	0.036	1.362	0.990	0.698
MYD—test subset	OC3		946	−0.925	0.169	2.087	1.562	1.115
MYD—all data	OC3		4729	−0.903	0.165	2.066	1.556	1.057
VII ¹²—test subset	ET	R_rs	621	0.486	0.044	1.409	0.995	0.735
VII—test subset	ET	R_rs, Lat, DoY, Depth, D_off	621	0.557	0.037	1.371	1.004	0.722
VII—test subset	OC3		621	−0.988	0.171	2.060	1.717	1.220
VII—all data	OC3		3104	−1.251	0.190	2.159	1.741	1.237

¹ N—number of observations; ² R²—determination coefficient; ³ MSE—mean squared error of log10-transformed Chl-a; ⁴ MAE—mean multiplicative absolute error (Equation (1)); ⁵ MMB—mean multiplicative bias (Equation (2)); ⁶ Slope—Type 2 linear slope between two log10-transformed Chl-a; ⁷ Lat—latitude (°N); ⁸ DoY—day of year; ⁹ Depth—sampling station depth (m); ¹⁰ D_off—distance between the sampling station and the nearest pixel with zero depth (km); ¹¹ MYD—MODIS-Aqua; ¹² VII—VIIRS-SNPP.
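The metrics in Table 1 can be sketched in a few lines. The helper below assumes the log-space definitions commonly used in ocean color validation: MAE and MMB are obtained by back-transforming the mean absolute and mean signed differences of log10 Chl-a. Two further assumptions are made because the footnotes do not settle them: R² is computed on log10-transformed values as 1 − SS_res/SS_tot (which allows the negative values seen for OC3), and the Type 2 slope is taken as the reduced major axis slope. The function name is hypothetical.

```python
import numpy as np

def log_metrics(predicted, observed):
    """Log-space accuracy metrics for positive Chl-a values (illustrative)."""
    p = np.log10(np.asarray(predicted, dtype=float))
    o = np.log10(np.asarray(observed, dtype=float))
    diff = p - o

    mse = float(np.mean(diff ** 2))             # MSE of log10 Chl-a
    mae = float(10 ** np.mean(np.abs(diff)))    # multiplicative absolute error
    mmb = float(10 ** np.mean(diff))            # multiplicative bias

    # Assumption: coefficient of determination in log space; it becomes
    # negative when predictions are worse than the observed mean.
    ss_res = np.sum(diff ** 2)
    ss_tot = np.sum((o - o.mean()) ** 2)
    r2 = float(1.0 - ss_res / ss_tot)

    # Assumption: Type 2 slope as reduced major axis, sign(r) * sd(p) / sd(o).
    r = np.corrcoef(o, p)[0, 1]
    slope = float(np.sign(r) * p.std(ddof=1) / o.std(ddof=1))

    return {"R2": r2, "MSE": mse, "MAE": mae, "MMB": mmb, "slope": slope}

# Demo: predictions exactly twice the observations.
metrics = log_metrics([2.0, 20.0, 200.0], [1.0, 10.0, 100.0])
print(metrics)
```

Under these definitions both MAE and MMB equal 1 for a perfect match. An MAE of 1.40 corresponds to predictions that typically deviate from observations by a factor of 1.4, while an MMB above or below 1 indicates systematic over- or underestimation.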

Table 2. Statistics of the ratios between Chl-a calculated from R_rs (by the standard OC3 algorithm or by the proposed ET models based solely on R_rs) and Chl-a measured in situ. Metrics are calculated for the test subsets; the numbers of observations are in Table 1.

Data Source of R_rs	OC3 Mean	OC3 St. Dev.	OC3 Median	OC3 IQR ¹	ET Mean	ET St. Dev.	ET Median	ET IQR ¹
In situ	0.941	0.314	0.885	0.338	1.056	0.286	1.002	0.245
MODIS-Aqua	1.375	1.443	1.202	0.499	1.107	1.195	1.000	0.217
VIIRS-SNPP	1.364	1.301	1.241	0.574	1.059	0.780	1.000	0.241

¹ Interquartile range.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
