Monitoring Water Quality Indicators over Matagorda Bay, Texas, Using Landsat-8

Abstract: Remote sensing datasets offer a unique opportunity to observe spatial and temporal trends in water quality indicators (WQIs), such as chlorophyll-a, salinity, and turbidity, across various aquatic ecosystems. In this study, we used available in situ WQI measurements (chlorophyll-a: 17, salinity: 478, and turbidity: 173) along with Landsat-8 surface reflectance data to examine the capability of empirical and machine learning (ML) models in retrieving these indicators over Matagorda Bay, Texas, between 2014 and 2023. We employed 36 empirical models to retrieve chlorophyll-a (12 models), salinity (2 models), and turbidity (22 models) and 4 ML families (deep neural network (DNN), distributed random forest, gradient boosting machine, and generalized linear model) to retrieve salinity and turbidity. We used the Nash-Sutcliffe efficiency coefficient (NSE), correlation coefficient (r), and normalized root mean square error (NRMSE) to assess the performance of the empirical and ML models. The results indicate that (1) the empirical models displayed minimal effectiveness when applied over Matagorda Bay without calibration; (2) once calibrated over Matagorda Bay, the performance of the empirical models improved significantly (chlorophyll-a: NRMSE 0.91 ± 0.03, r 0.94 ± 0.04, NSE 0.89 ± 0.06; salinity: NRMSE 0.24 ± 0, r 0.24 ± 0, NSE 0.06 ± 0; turbidity: NRMSE 0.15 ± 0.10, r 0.13 ± 0.09, NSE 0.03 ± 0.03); (3) the ML models outperformed the calibrated empirical models when used to retrieve turbidity and salinity; and (4) the DNN family outperformed all other ML families when used to retrieve salinity (NRMSE: 0.87 ± 0.09, r: 0.49 ± 0.09, NSE: 0.23 ± 0.12) and turbidity (NRMSE: 0.63 ± 0.11, r: 0.79 ± 0.11, NSE: 0.60 ± 0.20). The developed approach provides a reference context, a structured framework, and valuable insights for using empirical and ML models and Landsat-8 data to retrieve WQIs over aquatic ecosystems.
The modeled WQI data could be used to expand the footprint of in situ observations and improve current efforts to conserve, enhance, and restore important habitats in aquatic ecosystems.


Introduction
Collecting and analyzing water quality (WQ) data are crucial for assessing spatiotemporal trends in the health of aquatic ecosystems. WQ has a substantial impact on the wellbeing of the human population and the species that rely on these ecosystems [1-4]. Water quality indicators (WQIs) are specific parameters used to assess the WQ in various ecosystems [5,6]. These indicators include parameters such as temperature, dissolved oxygen (DO), pH (hydrogen ion concentration), salinity, turbidity, suspended solids, nutrients, and heavy metals [7-11]. The ultimate purpose of monitoring WQIs is to identify changes in aquatic conditions and to provide data to decision makers, which they can use to inform their environmental management and protection practices [12-14].
In situ observations involve collecting measurements of WQIs directly at a location of interest within an aquatic ecosystem. These observations provide a common approach to monitoring WQIs across ecosystems [7,15]. However, they are relatively expensive and time-consuming, and are spatially and temporally limited [2,12,15,16]. Some areas are difficult to access for sampling purposes and/or have very limited resources. In addition, periodic WQ sampling stations might miss short-lived magnitudes, trends, and gradients in WQIs. Moreover, permanent field WQ stations are vulnerable to damage from extreme climate events.
Matagorda Bay, also called the Lavaca-Colorado estuary, is one of seven major estuarine systems located on the Texas coast (Figure 1; inset a). As the second largest estuary in Texas, Matagorda Bay is known for its vital role in supporting diverse ecosystems, serving as a critical habitat for marine life, and providing a foundation for commercial and recreational activities [4,59-61].
In this study, turbidity, chlorophyll-a, and salinity WQIs were retrieved from Landsat-8 images over Matagorda Bay during the period from 2014 to 2023. Because they exhibit complex spatiotemporal patterns and trends, these three WQIs have been established in several studies as significant markers of the health of Matagorda Bay [62,63]. Along with water depth, current conditions, and temperature, several studies have found these WQIs to be the most significant in controlling the growth of oysters in Matagorda Bay [4,61,64,65]. Both empirical and ML models were used. Specific questions to be addressed in this study include the following: (1) Which empirical model(s) perform well in extracting chlorophyll-a, salinity, and turbidity data over Matagorda Bay? (2) Which model(s) experience a significant performance enhancement once recalibrated over Matagorda Bay? (3) Do ML models usually outperform empirical models in extracting WQI data? (4) Which ML model family performs well in extracting salinity and turbidity data over Matagorda Bay?
A total of 36 empirical models, documented in previous research, were applied to retrieve turbidity, chlorophyll-a, and salinity data over Matagorda Bay, Texas (Figure 1). Specifically, 12 models were employed for chlorophyll-a estimation, 2 for salinity, and 22 for turbidity. In addition, four distinct ML algorithm families, namely DNN, GBM, Distributed Random Forest (DRF), and Generalized Linear Model (GLM), were employed to retrieve salinity and turbidity data over Matagorda Bay.

Study Area
Matagorda Bay (area: 1070 km²) comprises a primary bay (Matagorda Bay) and several subsystems, including Lavaca Bay, Tres Palacios Bay, and East Matagorda Bay (Figure 1) [60,63,66]. Matagorda Bay holds diverse ecological significance as an estuarine ecosystem, nurturing a wide range of flora and fauna and functioning as a habitat and breeding ground for various fish, shellfish, and aquatic organisms that enhance local biodiversity and support fisheries [59,67-69]. This ecological richness translates into substantial economic value, driven by thriving fisheries, recreational pursuits, and the potential for shipping and transportation. Both commercial and recreational fishing bolster the local economy, and the bay attracts tourists for activities like boating, birdwatching, and outdoor exploration [60,70,71].
Oysters are commercially harvested in Matagorda Bay [72]. They also play a crucial role in maintaining WQ, supporting local fisheries, and enhancing the overall ecological resilience of Matagorda Bay [59,64,73]. Studies have highlighted the oyster reefs in Matagorda Bay as important nurseries for various commercially valuable species, such as spotted seatrout and red drum [74]. The threats these oyster populations face include habitat degradation, reduced water flow, and disease, which are directly related to the WQ in Matagorda Bay [59,61,64]. Monitoring WQIs is significant for developing effective conservation measures that are crucial to preserving the ecological and economic benefits that oysters in Matagorda Bay provide.

Matagorda Bay's ecosystem is significantly shaped by its connectivity to neighboring water bodies, primarily the Gulf of Mexico for the main bay and river connections for the secondary bays [73]. The estuary is fed by four major rivers: the Colorado River, the Lavaca River, the Carancahua River, and the Tres Palacios River [61,75]. The Colorado River is the primary source of freshwater, supplying an estimated 34% of the total freshwater inflow for the bay system [4,66,73,76-78]. Freshwater input plays a pivotal role in determining the overall health of Matagorda Bay [4,61,73,75,78]; it supplies nutrients to the bay [4,66], regulates salinity [4,61,66,73,77], and is responsible for the transportation of sediment [4,73].

In situ Data
We gathered in situ measurements over Matagorda Bay for the period from 2014 to 2023 (Inset b; Figure 1). These data were collected from the public databases of the Texas Commission on Environmental Quality (TCEQ) [79], Lower Colorado River Association (LCRA) [80], and Texas Wildlife Parks Department (TWPD). TCEQ sampling procedures have been adopted by both LCRA and TWPD [81,82]. For sampling within estuaries, water samples were obtained at a depth of 0.30 m. Salinity was measured using a multiprobe instrument and recorded to the nearest 0.10 psu (practical salinity unit) [83]. Turbidity was measured using a benchtop turbidity meter and reported to the nearest 0.02 NTU (nephelometric turbidity unit) [83,84]. Chlorophyll-a concentrations were measured on collected water samples [83] using a spectrophotometer and recorded in µg/L [85].
The locations of the in situ measurements are shown in Figure 1. A total of 17 measurements were collected for chlorophyll-a, 478 for salinity, and 173 for turbidity. These are the only measurements that align in time with the dates of Landsat-8 acquisition during the investigated period (2014-2023). Table 1 provides statistics for these measurements.

Remote Sensing Data
In this study, we used Landsat-8 (Collection 2, Level 2, Tier 1) images from 2014 to 2023 over Matagorda Bay. These images cover path 26 and row 40 of the Landsat worldwide reference system. Landsat-8 launched on 11 February 2013, as a joint mission between the U.S. Geological Survey and the National Aeronautics and Space Administration [7,86]. Landsat-8 carries two instruments: the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS). These instruments record data over 11 spectral bands. OLI records apparent radiance in nine bands at wavelengths ranging from the visible to the shortwave infrared, at a spatial resolution of 30 m. TIRS records in thermal infrared wavelengths at a 100 m spatial resolution. Landsat-8 has a 16-day temporal resolution and a 185 km swath width. A total of five Landsat-8 images were used to retrieve chlorophyll-a. Salinity required 121 images, while turbidity required 43 images.

Surface Reflectance
Surface reflectance data were extracted from Landsat-8 images in a Google Earth Engine (GEE) environment. In the GEE editor, the scale factor was applied to Landsat-8's band 1 through band 7 (B1-B7) to generate surface reflectance data. These surface reflectance data were corrected for the temporally, spatially, and spectrally varying scattering and absorbing effects of atmospheric gases, aerosols, and water vapor [7,86]. For modeling purposes, surface reflectance data were extracted over the locations of in situ observations (Figure 1) and at times where in situ observations align with the dates of Landsat-8 acquisition during the investigated period (2014-2023). Due to the inherently low surface reflectance of water, some pixels showed negative surface reflectance values [87,88]. These pixels were deleted and not considered for further analyses. Surface reflectance values for pixels in cloud-contaminated areas were also removed.

Figure 2 displays scatterplots depicting the relationship between the surface reflectance data for each band (B1-B7) and the WQIs. The correlation coefficient between chlorophyll-a and surface reflectance, from all bands, was estimated to be 0.89 ± 0.07 (average ± standard deviation). B6 and B7 exhibited the highest positive correlations (0.96 and 0.98, respectively) with chlorophyll-a, while B1 and B2 demonstrated the lowest positive correlations (0.80 and 0.81, respectively). The correlation coefficient between salinity and surface reflectance was weak, estimated at −0.11 ± 0.04. B4 and B5 showed the maximum negative correlation of −0.16, whereas B7 showed the minimum negative correlation (−0.05). The correlation coefficient between turbidity and surface reflectance was estimated to be 0.05 ± 0.04. B4 exhibited the highest correlation of 0.12, whereas B7 demonstrated the lowest correlation of −0.02.
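The band scaling and pixel screening described above can also be sketched outside GEE. The following minimal illustration uses the published Landsat Collection 2 Level-2 scale factor (reflectance = DN × 0.0000275 − 0.2); the pixel values are hypothetical:

```python
import numpy as np

# USGS-documented Collection 2, Level-2 surface reflectance scaling
SCALE, OFFSET = 0.0000275, -0.2

def scale_sr(dn):
    """Convert raw SR_B1..SR_B7 digital numbers to surface reflectance."""
    return np.asarray(dn, dtype=float) * SCALE + OFFSET

def mask_invalid(reflectance):
    """Drop negative reflectance values (common over dark water pixels)."""
    r = np.asarray(reflectance, dtype=float)
    return r[r >= 0]

dn = np.array([7273, 10000, 5000])  # hypothetical pixel digital numbers
sr = scale_sr(dn)                   # third pixel scales to a negative value
valid = mask_invalid(sr)            # negative-reflectance pixel removed
```

Cloud screening would additionally use the QA_PIXEL band flags, which are omitted here for brevity.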
Table 2 presents statistics for the surface reflectance data generated from B1 through B7 during the periods when in situ observations correspond with Landsat-8 acquisition dates. The surface reflectance data used for salinity retrieval displayed mean values ranging from 0.04 to 0.11, with corresponding standard deviations ranging from 0.03 to 0.07. For turbidity, the surface reflectance data revealed average values spanning from 0.06 to 0.12, accompanied by standard deviations ranging from 0.04 to 0.09. The surface reflectance data used for chlorophyll-a retrieval demonstrated average and standard deviation values ranging from 0.07 to 0.15 and 0.09 to 0.16, respectively.

Empirical Models
A total of 36 empirical models, documented in previous studies, were applied to Landsat-8 surface reflectance data to extract WQI data for Matagorda Bay. We considered all published empirical models that have been used to retrieve these WQIs from Landsat-8 data. These models, along with their geographic origins and sources, are presented in Table 3. Salinity retrieval used 2 models (Sn1 and Sn2), turbidity retrieval involved 22 models (T1-T22), and chlorophyll-a estimation used 12 models (C1-C12) (Table 3). The empirical models for chlorophyll-a used B1, B2, B3, B4, B5, B6, and B7. Salinity made use of B1, B2, B3, and B4, while the turbidity models employed B1, B2, B3, B4, and B5. These models were initially used with their original weights (Table 3), i.e., uncalibrated models. Additionally, we performed recalibration specifically tailored to Matagorda Bay, resulting in the generation of new weights for each of these models (i.e., calibrated models). Empirical models over Matagorda Bay were calibrated using the ordinary least squares package in a geographic information system environment [89,90].
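The recalibration step amounts to refitting each model's coefficients by ordinary least squares against the in situ measurements. A minimal sketch (not the GIS package used in the study) with a synthetic two-band example:

```python
import numpy as np

def calibrate_empirical_model(bands, wqi):
    """Refit an empirical model's weights by ordinary least squares.

    bands : (n_samples, n_bands) surface reflectance predictors
    wqi   : (n_samples,) in situ water quality indicator values
    Returns the intercept and band coefficients minimizing squared error.
    """
    X = np.column_stack([np.ones(len(wqi)), bands])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(X, wqi, rcond=None)
    return coef[0], coef[1:]

# Hypothetical example: two reflectance bands, exactly linear synthetic target
rng = np.random.default_rng(0)
bands = rng.random((50, 2))
wqi = 1.5 + 2.0 * bands[:, 0] - 0.5 * bands[:, 1]
intercept, weights = calibrate_empirical_model(bands, wqi)
```

With noise-free synthetic data the fit recovers the generating coefficients; with real in situ data the residuals would feed directly into the NSE, r, and NRMSE measures.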
Table 3. Empirical model coefficients published in previous studies, along with the locations they were applied to and their respective sources. Models over Matagorda Bay were calibrated in this study.

[Table 3 rows omitted: the model equations and coefficients (columns WQI, Model ID, Equation, Location, and Source) did not survive text extraction.]

Machine Learning Models
In this study, the open-source H2O-AML (automated ML) platform was employed (accessible at: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html; accessed on 1 June 2023). H2O-AML provides user-friendly, fully automated supervised learning algorithms, catering to both those well versed in the field and those without expertise [103]. Within the realm of H2O-AML, four ML families were used in this study: DNN, DRF, GBM, and GLM.

ML Model Setup and Structure
For both turbidity and salinity, the input data for the ML models comprised seven Landsat-8 bands (Figure 3). The organization of the input data aimed to facilitate a more straightforward comparison between ML and empirical models. Due to limited data availability, ML models were not created for chlorophyll-a.
To guarantee equal consideration of all input variables, the following equation was employed to normalize model inputs within the range of 0 to 1 (Figure 3):

$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}} \quad (1)$$

where $x_{norm}$ represents the normalized value for a specific input $x$, while $x_{max}$ and $x_{min}$ represent the maximum and minimum recorded values of $x$, respectively.
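The normalization above can be sketched as follows (hypothetical reflectance values):

```python
import numpy as np

def minmax_normalize(x):
    """Scale inputs to [0, 1]: x_norm = (x - x_min) / (x_max - x_min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

band = np.array([0.04, 0.07, 0.11])  # hypothetical surface reflectance values
norm = minmax_normalize(band)        # minimum maps to 0, maximum maps to 1
```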
For each model family (DNN, DRF, GBM, and GLM), the input and target data were divided into training (64%), validation (16%), and testing (20%) sets (Figure 3). The input/target data were split at random, so each model run received a unique set of input values.
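A minimal sketch of such a random 64/16/20 split (index-based; the actual H2O workflow may differ):

```python
import numpy as np

def split_indices(n, seed=None):
    """Randomly split n sample indices into training (64%),
    validation (16%), and testing (20%) sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)              # random order ensures a unique split
    n_train = int(0.64 * n)
    n_valid = int(0.16 * n)
    return (idx[:n_train],
            idx[n_train:n_train + n_valid],
            idx[n_train + n_valid:])

# e.g., the 478 salinity samples used in this study
train, valid, test = split_indices(478)
```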
To prevent overfitting in each model, early stopping criteria were enforced, employing the mean squared error as the stopping metric. This involved setting a stopping-round value of 5 and a stopping tolerance of 0.0001. The tolerance specifies the amount by which an ML model must improve before training ceases. In this case, the moving average for the last six simulation rounds is calculated, where the first moving average is the reference value for comparison with the other five moving averages. The model stops training when the improvement of the best moving average over the reference moving average falls below 0.0001. These stopping options are used to increase model performance by restricting the number of models that are built. To improve simulation accuracy, we ran each simulation for 50 runs. The model with the highest testing performance, based on the Nash-Sutcliffe efficiency (NSE) coefficient, was selected as the optimal model. An optimal model was selected for salinity and for turbidity from each ML model family (i.e., DNN, DRF, GBM, and GLM) (Figure 3).
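The moving-average stopping rule described above can be illustrated with a simplified sketch; the window sizes and the exact comparison are assumptions based on the text, not a reproduction of the H2O internals:

```python
def should_stop(mse_history, stopping_rounds=5, stopping_tolerance=1e-4):
    """Simplified early-stopping rule over a history of per-round MSE values.

    Six overlapping moving averages are computed; the oldest one serves as
    the reference, and training stops when the relative improvement of the
    best moving average over the reference falls below the tolerance.
    """
    window = stopping_rounds + 1            # six moving averages considered
    if len(mse_history) < 2 * window - 1:
        return False                        # not enough rounds recorded yet
    mas = [sum(mse_history[-(window + i):len(mse_history) - i]) / window
           for i in range(window)]          # mas[0] newest, mas[-1] oldest
    reference, best = mas[-1], min(mas)
    improvement = (reference - best) / reference
    return improvement < stopping_tolerance

flat = [0.5] * 12                           # stalled loss -> stop
improving = [1.0 / (i + 1) for i in range(12)]  # steadily improving -> continue
```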

Model Performance Measures
Performance for both empirical and ML models was assessed using the NSE coefficient, correlation coefficient (r), and normalized root mean square error (NRMSE). These measures have been used to evaluate the performance of similar models worldwide [15,91,100,113]. The standard deviation across the 50 ML model simulations was used to quantify errors in each performance metric.

The NSE measures the relative magnitude of the residual variance compared to the variance in the observed data. NSE values range from −∞ to 1, with optimal performance at 1:

$$NSE = 1 - \frac{\sum_{i=1}^{n}\left(X_{obs,i} - X_{pred,i}\right)^2}{\sum_{i=1}^{n}\left(X_{obs,i} - \overline{X}_{obs}\right)^2} \quad (2)$$

The r value measures the strength of the linear relationship between predicted and observed data, with values ranging between −1 and 1. An r value of 0 means there is no correlation, and positive (negative) values mean positive (negative) correlations, with 1 (−1) indicating a perfect positive (negative) correlation between predicted and observed values:

$$r = \frac{\sum_{i=1}^{n}\left(X_{pred,i} - \overline{X}_{pred}\right)\left(X_{obs,i} - \overline{X}_{obs}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_{pred,i} - \overline{X}_{pred}\right)^2}\sqrt{\sum_{i=1}^{n}\left(X_{obs,i} - \overline{X}_{obs}\right)^2}} \quad (3)$$

The NRMSE is the RMSE normalized by the standard deviation of the observed data, with values ranging from 0 to ∞, given by the following equation:

$$NRMSE = \frac{1}{\sigma}\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_{pred,i} - X_{obs,i}\right)^2} \quad (4)$$

where $X_{pred,i}$ and $X_{obs,i}$ represent the predicted and observed (e.g., in situ) values, respectively; $\overline{X}_{pred}$ and $\overline{X}_{obs}$ are the mean predicted and observed values; $n$ represents the input data size; and $\sigma$ is the standard deviation of the observed data.
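The three performance measures follow directly from these definitions; a minimal sketch:

```python
import numpy as np

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - residual variance / observed variance."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

def corr(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    return float(np.corrcoef(obs, pred)[0, 1])

def nrmse(obs, pred):
    """RMSE normalized by the standard deviation of the observations."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)) / obs.std())

obs = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical in situ values
pred = np.array([1.1, 1.9, 3.2, 3.8])  # hypothetical model retrievals
```

A perfect retrieval yields NSE = 1, r = 1, and NRMSE = 0; predicting the observed mean everywhere yields NSE = 0.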
The performance of the uncalibrated empirical salinity models varied across the metrics, with NRMSE ranging from 180.47 to 180.48, r from −0.16 to 0.16, and NSE from −32,638.62 to −32,635.55 (Figure 4b; Table 4). Notably, neither of the uncalibrated empirical salinity models produced statistically significant performances (NRMSE: 180.47 ± 0.01, r: 0.00 ± 0.22, NSE: −32,637.09 ± 2.18). The calibrated empirical salinity model exhibited a slight improvement, with NRMSE, r, and NSE values of 0.24, 0.24, and 0.06, respectively (Figure 4b; Table 4). However, the scatterplot for the calibrated salinity model indicates poor performance (Figure 5b). It is important to note that since the same bands (B1-B4) were used to generate the uncalibrated salinity models, only one calibrated model was generated for salinity.

Applications of Optimal Models
The optimal model for chlorophyll-a was determined to be the calibrated empirical C2 model. However, for salinity and turbidity, the optimal models were identified as DNN models. These models were then used to retrieve WQI data across all of Matagorda Bay. Figure 10 provides examples of the retrieved products for August and November of 2018. Spatial and temporal variations in WQIs are outside the scope of this study. However, turbidity reflects the clarity and suspended particle levels in the water [114], chlorophyll-a is a key indicator of phytoplankton and algae levels [115-117], and salinity is crucial for understanding the bay's conditions and the organisms it can support [61,62,65].
The high spatial resolution of WQI data, such as those produced in this study, provides opportunities to analyze and monitor trends in the health of aquatic systems, significantly contributing to previous conservation and pollution assessment efforts [98,118]. Specifically, these data could be used to (1) expand in situ observational footprints and improve current efforts to conserve, enhance, and restore important aquatic habitats; (2) define areas of "anomalous" WQIs that may need further investigation; (3) demonstrate the use of satellite observations to improve the understanding of factors controlling spatial and temporal variability in WQIs across complex aquatic systems; and (4) help expand and complement standard WQ monitoring protocols with free satellite products.


Discussion
The performances of the implemented empirical and ML models are influenced by the quality of the input datasets. Landsat-8 surface reflectance data showed reasonable quality over Matagorda Bay. Relevant studies indicate a significant correlation (r: 0.70) between Landsat-8-derived and in-situ-derived surface reflectance data [119]. In addition, we removed pixels over cloud-contaminated regions and those with negative surface reflectance. Atmospheric conditions can significantly affect the quality of Landsat-8 imagery. Aerosols, clouds, and water vapor in the atmosphere can scatter and absorb light, leading to errors in the measured surface reflectance values [120,121]. However, the Landsat-8 Level 2 data used in this study are atmospherically corrected [7,86]. Landsat-8, like any satellite, has a calibration process to ensure the accuracy of its radiometric measurements [7,86]. The angle of the sun and the presence of sun glint (sunlight reflecting directly off the water's surface) can impact the reflectance measurements [122-124]. To reduce these effects, we removed the affected pixels from our modeling exercise.
Matagorda Bay is influenced by four major rivers: the Colorado River, the Lavaca River, the Carancahua River, and the Tres Palacios River. Suspended solids and dissolved organic matter from these rivers can directly affect water turbidity, salinity, and reflectance in the area [125]. Colored dissolved organic matter, a yellow substance, has inherent optical properties that affect surface reflectance [126,127]. In addition, previous studies have reported significant negative correlations between salinity and total suspended solids [128]. In this study, correlations of 0.05 ± 0.04 were estimated between salinity and turbidity. Turbidity, dissolved organic matter, and suspended solids directly influence water color and spectral reflectance. This indirect relationship between salinity and the spectral bands motivated the use of Landsat-8 data to estimate salinity, a non-optical WQI. Similar principles have been applied in several studies utilizing Landsat-5 [129-134], MODIS [135], and Landsat-8 [27,97,136] data for salinity retrieval. However, the relatively low correlation between Landsat-8-derived spectral reflectance and salinity might explain the significantly lower performance of the empirical models and the relatively lower performance of the ML models over Matagorda Bay.

Empirical Models
Most uncalibrated empirical models produced poor performance measures. The inadequacies of these models are most likely due to their application to water bodies for which they were not originally designed. These models were initially developed across geographic regions encompassing lakes, rivers, and estuaries, each with distinct surface areas, hydrological conditions, and surface reflectance signatures. These discrepancies between the models' training conditions and the environmental characteristics of Matagorda Bay contribute to their poor performance.
The empirical models, however, demonstrated improved performance when calibrated over Matagorda Bay. The calibrated chlorophyll-a models showed the most significant improvement. The average ± standard deviation of the NRMSE, r, and NSE measures improved from 126.91 ± 424.70, 0.42 ± 0.67, and −1,961,890.27 ± 6,791,336.75 for the uncalibrated models to 0.91 ± 0.03, 0.94 ± 0.04, and 0.89 ± 0.06 for the calibrated models, respectively. The highest performance among the calibrated models was reported for Model C2, which used B1, B3, B5, B6, and B7. These bands were all highly correlated (r > 0.80) with in situ chlorophyll-a values, which likely made it easier for this model to generalize a linear relationship between surface reflectance and chlorophyll-a measurements. The metrics of the calibrated C2 model outperform those published in previous studies (0.74 ± 0.12) [91].
It is notable that the number of data points utilized in creating the empirical models for chlorophyll-a (total: 17) may raise concerns. However, these measurements were the only ones available that coincided with Landsat-8 overpasses over Matagorda Bay. They represent the sole data points temporally synchronized with the dates of Landsat-8 acquisition throughout the 10-year study period (2014-2023). Some of these in situ chlorophyll-a data were gathered in the field when weather conditions allowed, while others were obtained from historical records (e.g., TCEQ, TWPD, and LCRA). The results of our chlorophyll-a models should be used with caution. We acknowledge that a larger sample size would undoubtedly enhance the robustness of the empirical models, and we are actively pursuing efforts to acquire additional data to expand the sample size in future studies. Despite the limited sample size, our primary objective was to evaluate the feasibility of empirical models in estimating chlorophyll-a using the available data.

The empirical salinity models also experienced improvements after calibration over Matagorda Bay. The NRMSE, r, and NSE improved from 180.47 ± 0.01, 0.00 ± 0.22, and −32,637.09 ± 2.18 for the uncalibrated models to 0.24, 0.24, and 0.06 for the calibrated model, respectively. The calibrated salinity models, Sn1 and Sn2, used B1, B2, B3, and B4. These bands indicated a weak inverse relationship (r: −0.16 to −0.09) with in situ salinity measurements. This weak relationship likely hinders the empirical models in establishing meaningful linear relationships between in situ salinity observations and surface reflectance. In addition, the performance of the calibrated salinity model was significantly lower than the performance reported in the original studies. In their original applications, the r values for Sn1 and Sn2 were 0.84 [96] and 0.77 [97], respectively. However, the original studies only used 33 data points, which might have led to overfitting and produced misleading performance measures.
The calibrated turbidity models demonstrated some improvement in performance. The NRMSE, r, and NSE measures improved from 5.86 ± 17.78, 0.07 ± 0.08, and −337.09 ± 1553.51 for the uncalibrated models to 0.15 ± 0.10, 0.13 ± 0.09, and 0.03 ± 0.03 for the calibrated models, respectively. The highest-performing calibrated empirical turbidity model was T12, which used B2, B3, B4, and B5. These bands had a positive, but low, correlation with in situ turbidity (r: 0.05-0.12). Model T12 performed better in its original studies (r: 0.85 ± 0.18) than when calibrated to Matagorda Bay [100]. This might be due to the complex turbidity patterns over Matagorda Bay.

ML Models
ML algorithms performed better than the empirical models for both salinity and turbidity. ML algorithms are considerably more complex and are equipped to ascertain intricate relationships between the input (surface reflectance) and target (salinity and turbidity) datasets [137]. In addition, ML algorithms are better able to map nonlinear relationships, such as those between surface reflectance and salinity or turbidity (correlation between salinity and surface reflectance: −0.10 ± 0.04; correlation between turbidity and surface reflectance: 0.04 ± 0.04).
The DNN, GBM, and DRF families examined in this study demonstrated acceptable performance when used to retrieve turbidity and salinity over Matagorda Bay. Notably, DNN performed better than the other families when simulating both salinity and turbidity. DNN is widely recognized for its efficacy in modeling high-dimensional and complex datasets, particularly those characterized by intricate nonlinear associations between input and target variables [22,138]. DNNs excel at extracting hierarchical features from input and target data, endowing them with stronger modeling capabilities relative to alternative ML families [16,105,138]. The GBM family consistently secured the second-highest model performance when simulating salinity and turbidity; however, no large differences were found between the performances of DRF and GBM. Remarkably, the GLM family consistently produced the poorest-performing models for both salinity and turbidity simulations. GLM models adopt a straightforward linear regression approach and lack the intricacy required to dissect the multifaceted relationships inherent in the multidimensional and nonlinear datasets used in this study. Consequently, these limitations prevent GLM models from effectively capturing the complexities in the data.
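The qualitative behavior of the four families can be reproduced in outline with common open-source stand-ins; the sketch below uses scikit-learn's MLPRegressor, RandomForestRegressor, GradientBoostingRegressor, and LinearRegression in place of DNN, DRF, GBM, and GLM. The hyperparameters and the shared-split protocol are illustrative assumptions, not the study's actual configuration.

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Open-source stand-ins for the four ML families; settings are illustrative.
FAMILIES = {
    "DNN": MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
    "DRF": RandomForestRegressor(n_estimators=200, random_state=0),
    "GBM": GradientBoostingRegressor(random_state=0),
    "GLM": LinearRegression(),
}

def compare_families(X, y, test_size=0.2):
    """Fit each family on a shared split of reflectance (X) and one WQI (y),
    and return the held-out R^2 score per family."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=0
    )
    return {name: model.fit(X_tr, y_tr).score(X_te, y_te)
            for name, model in FAMILIES.items()}
```

Comparing all families on an identical split, as above, keeps the ranking attributable to model capacity rather than to sampling differences.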
Most of the ML models exhibited a slight decrease in performance during the testing phase compared to the training phase. This relatively modest reduction in testing performance can be attributed to the limited number of training samples [139,140]. For instance, the salinity models were trained with only 382 data points, while the turbidity models had a training set of 138 data points. Another potential explanation for the variation in performance between the training and testing sets may stem from differences in the input/target patterns in the testing set compared to those in the training set. The salinity models were tested with 96 data points, while the turbidity models were tested with 35. Attempts to model chlorophyll-a using ML models were unsuccessful, likely due to the lack of in situ data; the chlorophyll-a record comprises only 17 data points. It is likely that this dataset did not provide a large enough target time series to allow the ML algorithms to adequately learn the complex relationship between surface reflectance and the available chlorophyll-a observations.
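The effect of training-set size on the train/test gap can be illustrated with a small synthetic experiment. The sketch below is pure NumPy with invented data standing in for the Landsat-8 bands and a WQI target; it fits an ordinary least-squares model and reports R² on both splits, mirroring the sample counts discussed above only loosely.

```python
import numpy as np

def holdout_gap(n_train, n_test, n_features=7, seed=0):
    """Fit an ordinary least-squares model on synthetic 'reflectance' data
    (invented stand-ins for seven Landsat-8 bands and a WQI target) and
    return (train R^2, test R^2) to show how small training sets widen
    the train/test performance gap."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=n_features)
    X = rng.normal(size=(n_train + n_test, n_features))
    y = X @ w_true + rng.normal(size=n_train + n_test)  # unit observation noise
    X_tr, X_te, y_tr, y_te = X[:n_train], X[n_train:], y[:n_train], y[n_train:]
    w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

    def r2(Xs, ys):
        resid = ys - Xs @ w
        return 1.0 - resid @ resid / ((ys - ys.mean()) @ (ys - ys.mean()))

    return r2(X_tr, y_tr), r2(X_te, y_te)
```

With few training samples relative to the number of input bands, the fit partially absorbs noise, so training scores overstate what the model achieves on held-out data; the gap shrinks as the training set grows.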

Conclusions
The integration of public-domain remote sensing data and modeling techniques has significantly enhanced WQI monitoring capabilities. Landsat-8 offers a unique opportunity to observe the spatiotemporal trends of WQIs within expansive aquatic ecosystems. With advances in modeling techniques, numerous questions have arisen regarding the most effective approaches for modeling WQIs. This study examined the capabilities of both empirical and ML models to determine optimal methods for retrieving salinity, turbidity, and chlorophyll-a from Landsat-8 data over Matagorda Bay, Texas.
The uncalibrated empirical models exhibited poor performance when applied to Matagorda Bay; the bay's unique environmental characteristics contributed to the failure of these models to produce reliable WQIs. However, when calibrated over Matagorda Bay, model performance improved, particularly for chlorophyll-a retrieval. ML models, on the other hand, yielded meaningful results for both salinity and turbidity. Among the implemented ML families, DNN demonstrated the highest performance; it successfully mapped the complex nonlinear relationships between the Landsat-8 reflectance datasets and salinity and turbidity.
Our methodology offers a point of reference, a structured framework, and valuable insights for employing empirical and ML models on Landsat-8 data to characterize WQIs. This approach encourages broader, enhanced use of remote sensing datasets by the scientific community, end-users, and decision makers. Given the higher performance of the calibrated C2 empirical models for chlorophyll-a and the DNN models for both salinity and turbidity, the modeled WQIs could be used to expand the footprint of in situ observations and improve current efforts to conserve, enhance, and restore important habitats in aquatic ecosystems. The developed approach serves as a guide to enhance monitoring procedures for aquatic ecosystems.

Figure 1.
Figure 1. The spatial distribution of Matagorda Bay along the Texas Gulf Coast, and the locations of the in situ WQI measurements used in this study. (a) The location of the study area along the Texas coast. (b) The yearly distribution of in situ WQI measurements during the study period (2014-2023).

Figure 3 .
Figure 3. Input and target variables and structures of the ML families used to retrieve salinity and turbidity data over Matagorda Bay.

Figure 4 .
Figure 4. Performance statistics for (a) chlorophyll-a, (b) salinity, and (c) turbidity for the uncalibrated (solid colors) and calibrated (dashed lines) empirical models. Note the y-axis is limited to ±1 for display purposes; actual higher and lower values are listed in Table 4.

Figure 5 .
Figure 5. Observed and modeled (a) chlorophyll-a, (b) salinity, and (c) turbidity values generated from calibrated empirical models over Matagorda Bay. Red lines indicate a 1:1 relationship.

Figure 6 .
Figure 6. Performance metrics for optimal salinity models derived from different ML model families (e.g., DNN, DRF, GBM, GLM). Metrics are displayed for (a) training and (b) testing phases.

Figure 9 depicts scatterplots that illustrate the relationship between the observed and modeled turbidity values produced by the optimal model from each ML family during both the training and testing phases. All model families tend to overpredict turbidity values during the training phase. The performances of the DNN, GBM, and DRF models are close for both training and testing.

Figure 7 .
Figure 7. The observed and modeled salinity values generated for the optimal model in each ML family during the training and testing phases. The red lines indicate a 1:1 relationship.

Figure 8 .
Figure 8. Performance metrics for different turbidity ML model families (e.g., DNN, DRF, GBM, and GLM). Metrics are displayed for the (a) training and (b) testing phases.

Figure 9 .
Figure 9. The observed and modeled turbidity values generated for the optimal model in each ML family during the training and testing phases. The red lines indicate a 1:1 relationship.

in August 2023 to 25.50 µg/L in September 2019. Salinity ranged from 0.10 psu in May 2021 to 35.91 psu in October 2022. Turbidity ranged from 2.0 NTU in November 2017 to 91.0 NTU in July 2020. The average (± standard deviation) is estimated at 17.37 ± 8.22 psu, 25.18 ± 18.60 NTU, and 5.11 ± 7.55 µg/L for salinity, turbidity, and chlorophyll-a, respectively.
Table 1.
Statistics for in situ WQIs (salinity, turbidity, and chlorophyll-a) collected over Matagorda Bay, at locations shown in Figure 1, from 2014 to 2023 ¹.

Table 2 .
Statistics for surface reflectance data extracted at in situ locations ¹.
¹ As shown in Figure 1, at times when the Landsat-8 acquisition dates and in situ observations align with each other during the investigated period (2014-2023). ² "St. Dev." denotes standard deviation.

Table 4 .
Performance metrics for the 36 uncalibrated and calibrated empirical models (equations shown in Table 3) applied over Matagorda Bay.

Table 5 .
Performance metrics for the optimal ML models generated for salinity and turbidity during the training and testing phases.
