A Six-Year, Spatiotemporally Comprehensive Dataset and Data Retrieval Tool for Analyzing Chlorophyll-a, Turbidity, and Temperature in Utah Lake Using Sentinel and MODIS Imagery
Abstract
1. Summary
2. Data Description
- Date: Daily timestep from 1 January 2019 to 20 March 2025, formatted as mm/dd/yyyy.
- Point_id: A unique integer identifier for each sample point location.
- Latitude: In the WGS 84 geographic coordinate system.
- Longitude: In the WGS 84 geographic coordinate system.
- Dataset: A label indicating the set of sampling points that the data point belongs to: whole-lake, boxes, or clusters (see Section 3.3).
- Category: For the clusters and boxes datasets, a label that indicates which sub-category the data point belongs to. For clusters, either Open Water or Near Shore; for boxes, one of Provo Bay, North Lake, Center Lake, or Goshen Bay.
- In_PB: FALSE if the data point is outside Provo Bay, TRUE if inside.
- Int_flag: FALSE if the data point is from a satellite image, TRUE if the data point is interpolated from surrounding data (see Section 3.4). Note that this flag only applies to chl-a and turbidity data, not MODIS-derived data.
- Parameter: Chl-a, turbidity, dayTemp, or nightTemp. Chl-a and turbidity are derived from Sentinel 2 imagery; dayTemp and nightTemp are derived from MODIS imagery.
- value: Computed parameter value for the image pixel in the location specified by the coordinates on the specified date, or the interpolated value if satellite data for that pixel was not available on that date.
- The units for chl-a, turbidity, and temperature are µg/L, NTU, and °C, respectively.
- There are 5,611,432 rows in total, with 468,400 values for each of the three datasets and four parameters.
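The long format described above (one row per date, point, and parameter) lends itself to filtering and pivoting. The following is a minimal sketch, assuming the dataset is distributed as a CSV file with exactly the column names listed above. The sample rows and their values are invented for illustration and are not taken from the dataset; `load_long_table` and `observed_wide` are hypothetical helper names.

```python
import io

import pandas as pd

# Invented sample rows that mimic the documented columns (not real data).
SAMPLE_CSV = """Date,Point_id,Latitude,Longitude,Dataset,Category,In_PB,Int_flag,Parameter,value
01/01/2019,1,40.20,-111.80,boxes,Provo Bay,TRUE,FALSE,Chl-a,12.5
01/01/2019,1,40.20,-111.80,boxes,Provo Bay,TRUE,FALSE,turbidity,30.0
01/02/2019,1,40.20,-111.80,boxes,Provo Bay,TRUE,TRUE,Chl-a,13.0
01/02/2019,1,40.20,-111.80,boxes,Provo Bay,TRUE,TRUE,turbidity,31.0
"""


def load_long_table(source):
    """Read the long-format table and parse the mm/dd/yyyy dates."""
    df = pd.read_csv(source)  # TRUE/FALSE columns are parsed as booleans
    df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
    return df


def observed_wide(df, dataset):
    """Keep non-interpolated rows of one dataset, one column per parameter."""
    subset = df[(df["Dataset"] == dataset) & (~df["Int_flag"])]
    return subset.pivot_table(
        index=["Date", "Point_id"], columns="Parameter", values="value"
    )


df = load_long_table(io.StringIO(SAMPLE_CSV))
wide = observed_wide(df, "boxes")
print(wide)
```

Because Int_flag applies only to chl-a and turbidity, this filter is not a way to separate observed MODIS temperatures from imputed ones.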
- Date: Daily timestep from 1 January 2019 to 20 March 2025, formatted as mm/dd/yyyy.
- Point_id: A unique integer identifier for each sample point location.
- Red: Value of the red band.
- Green: Value of the green band.
- Blue: Value of the blue band.
- RE1: Value of the red edge 1 band.
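The band table can be used to compute spectral indices per date and sample point. As a hedged illustration, the sketch below computes the normalized difference chlorophyll index (NDCI), a widely used index defined as (RE1 - Red) / (RE1 + Red). This is not necessarily the chl-a model used to build the dataset. The CSV layout, the sample values, and the `add_ndci` helper are assumptions made for this example.

```python
import io

import pandas as pd

# Invented band values that mimic the documented columns (not real data).
BANDS_CSV = """Date,Point_id,Red,Green,Blue,RE1
01/01/2019,1,0.040,0.060,0.050,0.060
01/01/2019,2,0.050,0.055,0.045,0.050
"""


def add_ndci(bands):
    """Return a copy of the band table with an added NDCI column."""
    out = bands.copy()
    denom = out["RE1"] + out["Red"]
    # Mask zero denominators so they yield NaN instead of a division error.
    out["NDCI"] = (out["RE1"] - out["Red"]).div(denom.where(denom != 0))
    return out


bands = pd.read_csv(io.StringIO(BANDS_CSV))
bands["Date"] = pd.to_datetime(bands["Date"], format="%m/%d/%Y")
bands = add_ndci(bands)
print(bands[["Date", "Point_id", "NDCI"]])
```

Since both tables carry Date and Point_id, the result can be joined to the parameter table with `DataFrame.merge(..., on=["Date", "Point_id"])`.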
3. Methods
3.1. Sentinel Image Processing
3.2. Remote Sensing Models
3.2.1. In Situ Data
3.2.2. Combining Physics-Based and Empirical Models
3.2.3. Chlorophyll-a
3.2.4. Turbidity
3.2.5. Temperature
3.3. Image Sampling and Model Application
- Making the data suitable for statistical analyses that assume a random sample.
- Allowing us to extract more usable data from images with partial cloud cover.
- Eliminating the need for complex and imprecise water masking procedures, because we located sample points only in areas where we knew there would be water.
- Providing a smaller, more accessible dataset than the extremely large and complex collection of satellite imagery spanning extended time periods.
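To make the idea of random sample points concrete, the following library-free sketch draws uniformly distributed points inside a polygon by rejection sampling. It is an illustration only: the notebook generates its points in GEE, and the polygon coordinates, function names, and point count here are hypothetical. Sampling uniformly in longitude and latitude is only approximately uniform in area, which is a reasonable simplification for a single lake.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(polygon, n_points, seed=0):
    """Draw points in the bounding box and keep those inside the polygon."""
    rng = random.Random(seed)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    while len(points) < n_points:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points


# Hypothetical (longitude, latitude) outline, not the real lake boundary.
water_polygon = [
    (-111.90, 40.10),
    (-111.75, 40.10),
    (-111.70, 40.25),
    (-111.80, 40.35),
    (-111.90, 40.30),
]
sample_points = random_points_in_polygon(water_polygon, n_points=200)
print(len(sample_points), sample_points[:2])
```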
3.3.1. Whole-Lake Samples
3.3.2. Boxes Sample
3.3.3. Machine Learning Near-Shore, Open-Water Clusters
3.3.4. Model Application
3.4. Imputation of Missing and Anomalous Values
3.4.1. Adjusting Anomalous Near-Shore Temperature Values
3.4.2. Imputing and Interpolating Missing Temperature Data
3.4.3. Interpolating Missing Chl-a and Turbidity Values
4. Jupyter Notebook Implementation
4.1. Notebook Description
4.2. Notebook Outline
- Setup
  - Load packages and set up GEE API;
  - Define the rough lake outline with hand-selected coordinates (to analyze a different water body, the user can supply different coordinates).
- Retrieve and process satellite image collection
  - Load Sentinel 2 data;
  - Apply processing functions that scale bands, perform initial quality assurance, and rename bands for future processing.
- Create sample point collection
  - Whole-lake collection
    - Generate a whole-lake boundary by creating a composite Sentinel 2 image and applying the Modified Normalized Difference Water Index;
    - Export boundary as a GEE asset;
    - Generate sample points inside the lake boundary, add metadata features, and export the feature collection as a GEE asset.
  - Cluster collection
    - Generate cluster boundaries by creating a composite Sentinel 2 image and applying a clustering algorithm to computed band percentiles;
    - Export clusters as polygons to a GEE asset;
    - Generate sample points inside cluster boundaries, add metadata features, and export the feature collection as a GEE asset.
  - Boxes collection
    - Generate box boundaries with user-selected coordinates and export as a GEE asset (to analyze a different water body, the user can supply different coordinates);
    - Generate sample points inside box boundaries, add metadata features, and export the feature collection as a GEE asset.
  - Combine the three feature collections into one, add a point ID for future merging, and export as a GEE asset and as a shapefile for visualizations.
- Obtain Sentinel 2 band values from sampling points
  - Load the combined points collection and extract pixel values from Sentinel 2 images at the specified points;
  - Export pixel data with date, location, and metadata to Google Drive (cannot export to asset due to GEE’s memory limits even with this reduced dataset).
- Obtain MODIS temperature values from sampling points
  - Retrieve and process MODIS imagery collection;
  - Apply processing functions that scale bands and set metadata properties;
  - Extract pixel values from images at the specified points;
  - Export pixel data with date, location, and metadata to Google Drive (cannot export to asset due to GEE’s memory limits);
  - Extract temperature values from the single usable MODIS pixel in Provo Bay and export to Google Drive (not necessary for other water bodies unless there is a similar issue with a small area entirely excluded by the 1 km buffer).
- Process extracted MODIS data
  - Replace values for pixels located in Provo Bay with the value of a single unmixed pixel (not necessary for other water bodies unless there is a similar issue with a small area where only a single MODIS pixel is valid);
  - Replace values of pixels within 1 km of shore with nearest neighbor values;
  - Impute missing data from partially clouded images with daily lake temperature median;
  - Impute missing data from fully clouded images with PCHIP temporal interpolation.
- Apply pre-defined chl-a and turbidity models to Sentinel 2 data
  - Load the exported dataset of band values;
  - Apply band models and filter for valid values;
  - Impute missing data with PCHIP temporal interpolation.
- Combine and export the final dataset
  - Merge processed Sentinel 2 and MODIS datasets and perform additional data cleaning and formatting.
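The two temperature imputation steps in the outline (daily lake median for partially clouded images, PCHIP temporal interpolation for fully clouded ones) can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's code: it assumes a wide table with one row per date and one column per sample point, uses SciPy's `PchipInterpolator`, and the helper names and temperature values are invented.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import PchipInterpolator


def fill_with_daily_median(temps):
    """Fill gaps on partially observed days with that day's lake-wide median."""
    daily_median = temps.median(axis=1)  # stays NaN if the whole day is missing
    return temps.apply(lambda col: col.fillna(daily_median))


def fill_with_pchip(temps):
    """Fill remaining gaps per point by PCHIP interpolation through time."""
    filled = temps.copy()
    t = (temps.index - temps.index[0]).days.to_numpy(dtype=float)
    for col in filled.columns:
        y = filled[col].to_numpy(dtype=float)
        known = ~np.isnan(y)
        if known.sum() < 2:
            continue  # not enough observations to interpolate
        interp = PchipInterpolator(t[known], y[known], extrapolate=False)
        y[~known] = interp(t[~known])
        filled[col] = y
    return filled


# Invented temperatures (deg C): day 2 is partly cloudy, day 3 fully cloudy.
dates = pd.date_range("2019-01-01", periods=5, freq="D")
temps = pd.DataFrame(
    {
        "p1": [10.0, 11.0, np.nan, 13.0, 14.0],
        "p2": [11.0, np.nan, np.nan, 14.0, 15.0],
        "p3": [12.0, 13.0, np.nan, 15.0, 16.0],
    },
    index=dates,
)
filled = fill_with_pchip(fill_with_daily_median(temps))
print(filled)
```

With `extrapolate=False`, gaps before the first or after the last observation are left as NaN rather than extrapolated, which is a deliberate choice in this sketch.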
4.3. Key Outputs
- Waterbody boundary polygon shapefile.
- Random sample point dataset with coordinates.
- Chl-a and turbidity estimates for each point from Sentinel 2.
- Day and night surface temperature estimates for each point from MODIS.
5. User Notes
5.1. Summary Statistics
5.2. Summary Plots
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Dataset | Parameter | Values Imputed with Mean | Values Temporally Interpolated with PCHIP |
---|---|---|---|
Whole-lake | Day Temp | 102,895 (22%) | 623 (0.14%) |
Whole-lake | Night Temp | 97,590 (21%) | 640 (0.15%) |
Boxes | Day Temp | 102,485 (22%) | 722 (0.04%) |
Boxes | Night Temp | 102,522 (22%) | 711 (0.04%) |
Clusters | Day Temp | 104,462 (23%) | 614 (0.04%) |
Clusters | Night Temp | 100,103 (22%) | 426 (0.02%) |
Dataset | Parameter | Min | Max | Standard Deviation | Interquartile Range | Skewness | Kurtosis |
---|---|---|---|---|---|---|---|
Whole-Lake | Chl-a | 0 | 290 | 24 | 21 | 1.86 | 6.30 |
Whole-Lake | Turbidity | 1 | 499 | 34 | 30 | 3.05 | 19.97 |
Whole-Lake | Day Temp | −7 | 45 | 10 | 19 | 0.05 | 1.68 |
Whole-Lake | Night Temp | −20 | 30 | 34 | 30 | −0.12 | 1.92 |
Boxes | Chl-a | 0 | 235 | 28 | 38 | 1.25 | 4.00 |
Boxes | Turbidity | 1 | 500 | 35 | 30 | 3.49 | 25.20 |
Boxes | Day Temp | −7 | 37 | 10 | 19 | 0.05 | 1.70 |
Boxes | Night Temp | −18 | 27 | 9 | 16 | −0.12 | 1.92 |
Clusters | Chl-a | 0 | 301 | 26 | 29 | 1.53 | 5.05 |
Clusters | Turbidity | 1 | 500 | 39 | 35 | 2.94 | 17.85 |
Clusters | Day Temp | −9 | 44 | 10 | 19 | 0.06 | 1.71 |
Clusters | Night Temp | −20 | 28 | 10 | 16 | −0.10 | 1.93 |
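The statistics in the table above can be recomputed for any subset of the data. The sketch below is one way to do so with NumPy, under two assumptions that the table does not state: the standard deviation is the sample standard deviation (`ddof=1`), and kurtosis follows the non-excess (Pearson) convention, in which a normal distribution has a value of 3. The `summarize` helper and the input values are hypothetical.

```python
import numpy as np


def summarize(values):
    """Return min, max, std, IQR, skewness, and kurtosis, ignoring NaNs."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    q1, q3 = np.percentile(x, [25, 75])
    centered = x - x.mean()
    m2 = np.mean(centered**2)  # second central moment
    m3 = np.mean(centered**3)  # third central moment
    m4 = np.mean(centered**4)  # fourth central moment
    return {
        "min": float(x.min()),
        "max": float(x.max()),
        "std": float(x.std(ddof=1)),  # sample standard deviation (assumed)
        "iqr": float(q3 - q1),
        "skewness": float(m3 / m2**1.5),
        "kurtosis": float(m4 / m2**2),  # Pearson (non-excess) convention
    }


# Invented values for illustration, including one missing entry.
summary = summarize([1.0, 2.0, 3.0, np.nan, 4.0, 5.0])
print(summary)
```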
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tanner, K.B.; Cardall, A.C.; Williams, G.P. A Six-Year, Spatiotemporally Comprehensive Dataset and Data Retrieval Tool for Analyzing Chlorophyll-a, Turbidity, and Temperature in Utah Lake Using Sentinel and MODIS Imagery. Data 2025, 10, 128. https://doi.org/10.3390/data10080128