Advancing Scalable Methods for Surface Water Monitoring: A Novel Integration of Satellite Observations and Machine Learning Techniques

Renshaw, Megan; Magruder, Lori A.

doi:10.3390/geosciences15070255

Open Access Article


by Megan Renshaw ^1,2 and Lori A. Magruder ^1,2,*

¹ Department of Aerospace Engineering and Engineering Mechanics, Cockrell School of Engineering, University of Texas at Austin, Austin, TX 78705, USA

² Center for Space Research, University of Texas at Austin, Austin, TX 78759, USA

* Author to whom correspondence should be addressed.

Geosciences 2025, 15(7), 255; https://doi.org/10.3390/geosciences15070255

Submission received: 7 May 2025 / Revised: 19 June 2025 / Accepted: 30 June 2025 / Published: 3 July 2025

(This article belongs to the Section Hydrogeology)


Abstract

Accurate surface water volume (SWV) estimates are crucial for effective water resource management and for the regional monitoring of hydrological trends. This study introduces a multi-resolution SWV estimation framework that integrates ICESat-2 altimetry, Sentinel-1 Synthetic Aperture Radar (SAR), and Sentinel-2 multispectral imagery via machine learning to improve the vertical accuracy of a digital elevation model (DEM) and, in turn, the accuracy of SWV estimates. The machine learning approach significantly improves terrain accuracy relative to the original DEM, reducing RMSE by ~66% and ~78% for the first and second models, respectively. Comparing the resulting SWV estimates with GRACE-FO terrestrial water storage in parts of the Amazon Basin, we found strong correlations and basin-wide drying trends. Notably, the high correlation (r > 0.8) between our surface water estimates and the GRACE-FO signal in the Manaus region highlights the method's ability to resolve key hydrological dynamics. Our results underscore the value of DEMs with improved vertical accuracy for global hydrological studies and offer a scalable framework for future applications. Future work will focus on expanding our DEM dataset, performing further validation, and scaling this methodology for global applications.

Keywords:

ICESat-2; GRACE-FO; machine learning; hydrology; Sentinel-2; surface water volume

1. Introduction

Surface water (SW) describes the total water stored in Earth’s rivers, lakes, reservoirs, and wetlands. Tracking and monitoring SW values is critical to NASA’s strategic goal to enhance our understanding of Earth’s water cycle in the context of a changing climate [1]. For local communities, precise volume estimates of SW are necessary to allocate water for agricultural, industrial, and municipal use. This information is also vital for designing and maintaining flood defenses and water storage facilities that are key to preventing damage during extreme weather events [2]. At a broader scale, SW plays a major role in the global water cycle as a contributor to terrestrial water storage (TWS). TWS refers to the total water stored within the land-based entities of Earth’s hydrological cycle [3]. In addition to SW, it also includes contributions from groundwater (GW), snow water equivalent (SWE), canopy water (CW), and soil moisture (SM). Fluctuations in TWS, which are known as terrestrial water storage anomalies (TWSAs), are indicators of regional water availability, drought, and flood conditions. Tracking these TWSAs is crucial for reconciling water fluxes and maintaining the continental water balance.

The Gravity Recovery and Climate Experiment (GRACE) mission (2003–2017) and its successor, the GRACE Follow-On (GRACE-FO) mission (2018–present), measure changes in Earth's gravity field to monitor TWSAs and provide a global-scale picture of water mass changes [4]. While GRACE and GRACE-FO provide critical insights into TWSAs, their observations are relatively coarse (~1°) and cannot separate the individual TWS components (e.g., surface water, groundwater, and soil moisture). Isolating the SW component therefore requires independent SWV estimates, yet the data needed to produce them accurately are often limited.

Traditional in situ methods for SW observation, such as flumes or weirs, are spatially and temporally limited by their discrete locations and infrequent sampling. These observations can fail to capture the dynamic nature of surface water bodies, especially in remote or inaccessible regions. Gauge stations offer another measurement option, but their coverage is often sparse, leaving vast areas unmonitored. Airborne remote sensing, such as airborne lidar surveys (ALSs), offers broader spatial coverage than in situ methods but can be limited by cost and logistics. Hydrological models, such as HEC-HMS (Hydrologic Engineering Center's Hydrologic Modeling System), simulate water movement and can estimate surface water volume, but their accuracy depends heavily on the quality and availability of input data (e.g., precipitation and topography), and they may not fully capture the complexities of real-world hydrological processes, such as anthropogenic effects.

Space-based remote sensing platforms provide global coverage and frequent observations, addressing many of the limitations that other approaches face in estimating SWV. These platforms enable SWV estimation by combining water extent or water surface area (typically derived from multispectral imagery such as the Sentinel-2 and Landsat missions) with topographic information. Initiatives such as DAHITI (Database for Hydrological Time Series) already leverage these capabilities to provide valuable datasets of water storage changes [5]. However, factors such as cloud cover, topography, and canopy cover still degrade the quality of optical imagery and topographic observations, warranting new methods to improve existing techniques.

Accurate SWV estimates derived from satellite observations depend on the relationship between the surface area of a water body and its underlying topography, which can be derived from a digital elevation model (DEM). The accuracy of this DEM is fundamentally important because vertical error in the terrain elevation propagates directly into the water depth calculation (Water Depth = Water Surface Elevation − Bed Elevation), leading to significant miscalculations of total stored volume. Even minor systematic biases in a DEM, such as those caused by vegetation canopy, can distort the perceived shape of the basin. Recent research has underscored this dependency, demonstrating that the choice of DEM significantly impacts lake volume monitoring and is critical for accurately reconstructing river channel cross-sections for hydrological modeling [6,7]. To address these inaccuracies, machine learning models have emerged as a powerful tool for DEM correction and refinement. For example, studies have demonstrated that both gradient-boosted regression (GBR) trees and convolutional neural networks (CNNs) can effectively estimate elevation errors in DEMs using satellite laser altimetry data as a reference. These models incorporate terrain attributes and land cover to predict and correct elevation biases, particularly in complex environments such as vegetated and water-covered areas [8].
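A short worked derivation makes this propagation explicit. It is an illustration under simplifying assumptions, not part of the original method: a single water surface elevation z_ws over the water body, pixel areas A_i, true bed elevations z_bed,i, and a DEM bed error δ_i at each pixel (these symbols are introduced here only for the example).

```latex
\begin{aligned}
h_i &= z_{\mathrm{ws}} - z_{\mathrm{bed},i}, \qquad
\hat{h}_i = z_{\mathrm{ws}} - \left(z_{\mathrm{bed},i} + \delta_i\right) = h_i - \delta_i,\\
\Delta V &= \sum_i A_i \hat{h}_i - \sum_i A_i h_i = -\sum_i A_i \delta_i,\\
\delta_i \equiv \delta \;\Rightarrow\; \Delta V &= -A\,\delta, \qquad A = \sum_i A_i .
\end{aligned}
```

For example, a canopy-induced bias of δ = +1 m over a 10 km² (10⁷ m²) water body would understate the volume by 10⁷ m³ (0.01 km³). The derivation ignores pixels whose estimated depth becomes negative and would be clipped to zero, and it holds z_ws fixed; if the water level is itself read from the same biased DEM, a uniform bias would largely cancel, so the damaging case is a bias that differs between the shoreline and the bed.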

There exists a research gap in creating a unified framework that can accurately estimate SWV across multiple scales in areas like the Amazon. This gap is driven by two key challenges: (1) standard global DEMs suffer from large vertical errors under dense vegetation canopies, and (2) persistent cloud cover limits the use of optical satellites for consistent water extent mapping.

This manuscript presents a methodology to estimate SWV at both local (~10 m) and regional (~1°) spatial resolution scales, with a focus on the Brazilian Amazon as a proxy for a future solution to global SWV needs. This approach consists of using the following data sources and techniques:

High vertical accuracy lidar-derived DEMs from airborne and spaceborne systems to correct and refine existing topographic data.
Synthetic Aperture Radar (SAR) imagery for water extent mapping, leveraging SAR’s ability to penetrate cloud cover and vegetation, providing all-weather water body detection.
A robust volume estimation technique combining water extent and DEM data through multi-sensor data fusion.

By generating these detailed volume estimates with a new machine learning approach, we ultimately aim to analyze the SW contribution to TWSA and investigate local surface water volume dynamics, including reservoir storage trends and flood inundation patterns. This multi-scale water monitoring framework offers a way to enhance water resource management and deepen our understanding of the Amazon's hydrological processes.

We present the materials and methods in Section 2, focusing on data from space-based missions and airborne systems, as well as the techniques for ML model development and implementation. Section 3 presents the results of the study, including the model assessment and the validation of the resulting SWV estimates at local and regional scales. Section 4 discusses the fidelity of the method's output and future research opportunities to advance these techniques.

2. Materials and Methods

2.1. Volume Estimation Overview

Existing methods for SWV estimation typically combine water extent derived from remote sensing imagery with 3D topographic data. A common approach is pixel-based volume calculation, in which the volume of each water pixel is computed by multiplying its area by the water depth at that location (the water surface elevation minus the DEM bed elevation). The per-pixel volumes are then summed across the entire water body to give the total SWV. Figure 1 illustrates the standard workflow for volume estimation, V = Σ_i A_i h_i, where V is the total volume, A_i is the area of the i-th pixel identified as water, and h_i is the water depth at that pixel, derived from the DEM.
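The pixel-based sum described above (total volume as pixel area times water depth, summed over water pixels) can be sketched in a few lines. This is a minimal illustration, not the authors' code; it assumes a single water surface elevation for the whole water body, a uniform pixel area, and that negative depths (bed above the water surface) are clipped to zero. The function and argument names are hypothetical.

```python
def pixel_volume(water_mask, bed_elevation, water_surface_elevation, pixel_area_m2):
    """V = sum_i A_i * h_i over water pixels, with h_i = WSE - bed elevation.

    Assumes one water surface elevation for the whole water body, a uniform
    pixel area, and clips negative depths (bed above the water surface) to zero.
    """
    volume = 0.0
    for mask_row, bed_row in zip(water_mask, bed_elevation):
        for is_water, bed in zip(mask_row, bed_row):
            if is_water:
                depth = max(water_surface_elevation - bed, 0.0)
                volume += pixel_area_m2 * depth
    return volume


# Hypothetical 2 x 2 example with 10 m pixels (100 m^2 each).
mask = [[1, 1],
        [1, 0]]
bed = [[10.0, 11.0],
       [12.0, 15.0]]
print(pixel_volume(mask, bed, water_surface_elevation=14.0, pixel_area_m2=100.0))  # 900.0
```

The three water pixels have depths of 4, 3, and 2 m, giving 900 m³; the land pixel (bed at 15 m) is excluded by the mask.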

To obtain water levels, we use SAR-derived water masks to delineate the water boundary and extract the DEM elevations along this boundary. Combining the extent and these elevations gives the water level at a specific time and geographical location. Water levels can also be obtained from satellite altimetry platforms such as ICESat-2 (Ice, Cloud, and land Elevation Satellite-2). This approach has been used in flood studies, coastal research, and hydrodynamic simulations [9]. Refined and validated DEM-based water level estimates are particularly necessary where direct altimetry measurements are sparse.
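One way to turn a water mask and a DEM into a single water level is sketched below. It is a minimal sketch under stated assumptions, not the authors' exact procedure: shoreline pixels are taken to be water pixels with at least one in-bounds non-water 4-neighbor, and the median of their DEM elevations is used as the level (chosen here for robustness to a few canopy-contaminated shoreline pixels). All names are hypothetical.

```python
from statistics import median

NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity (an assumption)


def boundary_pixels(mask):
    """Water pixels with at least one in-bounds non-water 4-neighbor."""
    rows, cols = len(mask), len(mask[0])
    edges = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr, dc in NEIGHBORS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and not mask[rr][cc]:
                    edges.append((r, c))
                    break
    return edges


def water_level_from_boundary(mask, dem):
    """Median DEM elevation along the shoreline as a single water-level estimate."""
    elevations = [dem[r][c] for r, c in boundary_pixels(mask)]
    if not elevations:
        raise ValueError("mask has no water/land boundary inside the image")
    return median(elevations)


mask = [[0, 0, 0],
        [0, 1, 1],
        [0, 1, 1]]
dem = [[20.0, 21.0, 22.0],
       [19.0, 14.0, 15.0],
       [18.0, 16.0, 9.0]]
print(water_level_from_boundary(mask, dem))  # 15.0
```

Pixel (2, 2) touches only water and the image edge, so it is not treated as shoreline; its low elevation (9 m) therefore does not pull the level down.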

Other techniques for SWV estimation use relationships between water surface area and volume, often based on empirical relationships or hypsometric curves [10,11]. Hypsometry relates the surface area of a water body to its volume through the basin's hypsometric (area–elevation) curve. This approach assumes a consistent relationship between area and volume, which may not hold in complex floodplains [12].
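Given a tabulated area–elevation curve, the volume below a water level h follows from V(h) = ∫ A(z) dz. The sketch below evaluates that integral with the trapezoidal rule and linear interpolation between tabulated points; it is a generic illustration (the function name and inputs are hypothetical), and it refuses to extrapolate above the top of the curve.

```python
def hypsometric_volume(elevations, areas, level):
    """Volume below `level` as the trapezoidal integral of A(z) dz.

    `elevations` must be strictly increasing and `areas` are the water surface
    areas (m^2) at those elevations (m); linear interpolation is used between them.
    """
    if level > elevations[-1]:
        raise ValueError("level is above the tabulated area-elevation curve")
    volume = 0.0
    for k in range(len(elevations) - 1):
        z0, z1 = elevations[k], elevations[k + 1]
        a0, a1 = areas[k], areas[k + 1]
        if level <= z0:
            break
        top = min(level, z1)
        a_top = a0 + (a1 - a0) * (top - z0) / (z1 - z0)  # interpolated area at `top`
        volume += 0.5 * (a0 + a_top) * (top - z0)
    return volume


# Hypothetical curve: area grows from 0 to 300 m^2 over 2 m of depth.
print(hypsometric_volume([0.0, 1.0, 2.0], [0.0, 100.0, 300.0], 1.5))  # 125.0
```

Because the curve is a single function of elevation, this method inherits exactly the assumption noted above: if the area–volume relationship changes (for example, as a floodplain connects or disconnects), the tabulated curve no longer applies.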

The accuracy of SWV estimates depends fundamentally on the quality of both the water extent delineation and the underlying DEM. In densely vegetated regions like the Amazon, traditional DEMs derived from radar interferometry or photogrammetry can be inaccurate because the vegetation canopy obscures the ground [10]. This can lead to significant errors in water depth estimation and, consequently, in volume calculation. Accurate water extent delineation in vegetated areas is also challenging owing to the spectral mixing of water and vegetation and the presence of cloud cover. To address these challenges, we sought to improve both DEM accuracy and water surface area delineation to generate higher-fidelity SWV estimates, specifically in densely canopied regions where inaccuracies are most pronounced.

2.2. Study Region

Focusing on the Amazon Rainforest, we evaluate this new SWV estimation methodology in parts of Brazil and Peru. The region is characterized by dense, multilayered vegetation canopies and exceptional biodiversity, and it experiences pronounced annual flood cycles with water level fluctuations of up to 12 m. These floods are crucial for nutrient distribution and ecosystem health, creating unique habitats such as várzea (seasonally flooded forests) and igapó (permanently flooded forests) [13]. Flood events in the Amazon continue to increase in frequency and intensity due to large-scale climate patterns such as the El Niño–Southern Oscillation (ENSO) and anthropogenic climate change [14,15]. ENSO events, particularly La Niña, have been linked to increased precipitation and flooding in the Amazon, while human-induced climate change is known to exacerbate the impact of these patterns [16].

Additionally, the Amazon’s seasonal large-scale flooding allows us to monitor meaningful water storage changes at both local and regional levels. TWSA measurements from GRACE-FO reflect these seasonal (wet and dry) and inter-annual variations driven by broader climate patterns. Figure 2 illustrates the TWSA from 2019 to 2024 over South America.

2.3. DEM Generation

Global, publicly available DEMs, such as those derived from the SRTM (Shuttle Radar Topography Mission) or the Advanced World 3D 30 m (AW3D30), are typically limited to a 30 m spatial resolution, which can be insufficient for detailed, local-scale analysis. Additionally, these DEMs are often not bare-earth terrain models but surface models that include canopy height and other features (e.g., built structures) on top of the terrain. This can lead to significant overestimations of water depth, especially in densely vegetated areas, which in turn affects both the water level calculation and the volume estimate. Correcting or refining DEMs for these known issues typically involves removing vegetation, filling data voids, and adjusting known terrain inaccuracies. A corrected DEM better represents terrain heights and provides a continuous terrain surface. Machine learning (ML) models can reduce errors in elevation data by fusing multi-sensor observations to predict a height correction at a given location. To enhance the accuracy of our SWV estimates, we generate a high-resolution DEM with the canopy removed through a two-stage ML approach, combining the high vertical accuracy of ICESat-2 lidar data with the broader coverage of other elevation datasets. Figure 3 illustrates the difference between the lidar and a reference DEM along an along-track profile.

The first-stage ML model is trained on several million ICESat-2 reference elevations, each paired with the corresponding elevation from the initial DEM and other terrain and environmental characteristics. This first stage produces an improved (more accurate) DEM over a broad region, in contrast to the sparse terrain sampling of the ICESat-2 elevation profiles themselves.

Building on the first-stage DEM product, a second ML model is developed to approximate a high-spatial-resolution DEM comparable to those derived from airborne lidar survey (ALS) data. This proxy ALS model uses the initial DEM and environmental parameters along with the first model's predicted heights. This two-stage approach allowed us to generate a DEM with high vertical accuracy and the canopy removed.
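The chaining of the two stages can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: a one-feature least-squares line stands in for the LightGBM and Extra Trees models, ICESat-2 and ALS samples are assumed to be co-located (in practice they are not), and the feature names (`veg`, `slope_deg`) and the synthetic data are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept (a stand-in regressor)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict(coef, x):
    slope, intercept = coef
    return [slope * xi + intercept for xi in x]


def rmse(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a))


def two_stage(dem, veg, slope_deg, icesat, als):
    # Stage 1: learn the correction that maps the canopy-biased DEM onto ICESat-2 terrain.
    c1 = fit_line(veg, [i - d for i, d in zip(icesat, dem)])
    stage1 = [d + corr for d, corr in zip(dem, predict(c1, veg))]
    # Stage 2: stage-1 heights become an input; learn the remaining residual to ALS terrain.
    c2 = fit_line(slope_deg, [a - s for a, s in zip(als, stage1)])
    stage2 = [s + corr for s, corr in zip(stage1, predict(c2, slope_deg))]
    return stage1, stage2


# Hypothetical synthetic data: canopy bias of 30 m x vegetation fraction.
terrain = [100.0 + 2 * k for k in range(8)]
veg = [0.1, 0.9, 0.4, 0.7, 0.2, 0.8, 0.5, 0.3]
slope_deg = [1.0, 4.0, 2.0, 6.0, 3.0, 0.5, 5.0, 2.5]
dem = [t + 30.0 * v for t, v in zip(terrain, veg)]
icesat = terrain
als = [t + 0.2 * s for t, s in zip(terrain, slope_deg)]

stage1, stage2 = two_stage(dem, veg, slope_deg, icesat, als)
print(f"DEM RMSE vs ALS:     {rmse(dem, als):.2f} m")
print(f"Stage-2 RMSE vs ALS: {rmse(stage2, als):.2e} m")
```

The design point the sketch illustrates is that stage 2 does not re-learn the canopy correction; it only models what remains after stage 1, which lets the spatially limited but detailed ALS data refine a broad-coverage product.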

2.3.1. Machine Learning Data Inputs

AW3D30 (Terrain Reference)

This study uses AW3D30, a global DEM derived from the Advanced Land Observing Satellite (ALOS), as the baseline DEM. AW3D30 is a publicly available global dataset developed by the Japan Aerospace Exploration Agency (JAXA). The Panchromatic Remote-sensing Instrument for Stereo Mapping (PRISM) aboard ALOS provides a digital surface model (DSM) that captures vegetation and building features in addition to unobscured terrain heights [17]. While PRISM excels at high-resolution terrain mapping, optical systems are limited in their ability to see through dense vegetation canopies compared with active remote sensing technologies such as lidar or SAR. AW3D30 performs well over complex terrain, but, like most DEMs, its accuracy depends on regional characteristics such as slope and vegetation density [18]. As the baseline DEM, AW3D30 is the starting point for terrain representation over our study areas, and the staged workflow refines and corrects this surface model, a critical step for improving SWV estimate accuracy. We obtained our AW3D30 elevation models from Google Earth Engine (GEE).

ICESat-2 (Terrain Reference)

The first-stage ML model uses ICESat-2 terrain estimates as the reference (target) variable. ICESat-2, launched in 2018, provides near-global coverage (88°N–88°S) with a 91-day revisit period and an ~11 m footprint for each of its six profiling beams [19]. Its onboard Advanced Topographic Laser Altimeter System (ATLAS) instrument provides sub-centimeter vertical accuracy with < 3 m average geolocation knowledge. ATLAS collects measurements over all surfaces, capturing both water surface levels and ground elevations even in the presence of vegetation, which is especially important in the South American study regions of interest. The ATLAS 03 (ATL03) along-track product provides geolocated photon heights relative to the WGS84 ellipsoid, while the higher-level land and vegetation along-track product (ATL08) provides binned photon estimates at variable length scales with canopy and ground classifications that are essential for describing ecosystem structure [20,21]. We used SlideRule, an on-demand science data processing service, to obtain ATL03 data with ATL08 classifications [22]. The ground photon count ("gnd_ph_count") parameter was used to filter ATL03 photon data for high-certainty ground returns. Figure 4 shows ICESat-2 ATL03 elevations from a region of South America as an example of the spatial distribution of available data.

Sentinel-1

Sentinel-1 is a European Space Agency (ESA) C-band Synthetic Aperture Radar (SAR) mission. It provides radar backscatter at high spatial and temporal resolution, with a 10 m spatial resolution and a revisit time of six days until 2021 and twelve days thereafter [23]. The backscatter intensity collected by Sentinel-1 is sensitive to various terrain characteristics, including surface roughness, soil moisture, and vegetation structure. Smooth water surfaces typically exhibit low backscatter and appear dark in SAR imagery, while rougher or more vegetated surfaces have higher backscatter and appear brighter. In this study, Sentinel-1 data were used to map water extent, taking advantage of its all-weather capability and its ability to identify water surfaces. Sentinel-1 also provides input features to the ML models. These data were also obtained from GEE.

Sentinel-2

Sentinel-2 is another ESA mission, providing high-resolution optical imagery from its Multi-Spectral Instrument (MSI) [24]. The MSI acquires data in 13 spectral bands spanning the visible to shortwave infrared regions of the electromagnetic spectrum. These bands have differing sensitivities to land surface characteristics, including vegetation, soil, and water. Sentinel-2's spatial resolution (10 m for its red, green, blue, and near-infrared bands) and multispectral capabilities make it useful for land cover mapping and vegetation monitoring. Sentinel-2 data provided information about vegetation characteristics for the ML models. We used Sentinel-2 scenes from the Harmonized Landsat and Sentinel-2 (HLS) dataset, downloaded from GEE.

Airborne Lidar Surveys (Terrain Reference)

ALSs produce high-resolution point clouds of terrain and vegetation over areas spanning several kilometers. In the Amazon, the Sustainable Landscapes Project has sponsored ALS collections since 2008. These datasets are publicly available and have been georeferenced, noise-filtered, and corrected for misalignment of overlapping flightlines. Their average point density is 10 points per square meter, and many of these surveys were conducted over national reserves. Lidar surveys from 2008 to 2021 were filtered for ground returns and rasterized at 1 m resolution [25,26,27,28,29,30]. The lidar data were originally referenced to the SIRGAS2000 datum in a Universal Transverse Mercator (UTM) projection. Figure 5 shows the distribution of the ALS collections used in the ML architecture for improving SWV estimates.

2.3.2. Initial Assessment

In regions like the Amazon, dense canopy contributes significantly to terrain elevation error in a DEM such as AW3D30. ICESat-2 terrain measurements under vegetation are far less affected. Many studies show that ICESat-2 achieves elevation retrieval accuracy comparable to that of ALSs despite challenges with vegetation [31,32]. However, ICESat-2 estimates have sparse coverage in the across-track direction, a common issue with profiling systems, as shown in Figure 6.

To leverage both the high accuracy of the sparse ICESat-2 elevations and the wall-to-wall coverage of AW3D30, we developed an ML model to predict ICESat-2 terrain-classified elevations across a wider spatial extent, which can then be used to correct the initial AW3D30 DEM. RMSE (Root Mean Square Error), MAE (Mean Absolute Error), and MAPE (Mean Absolute Percentage Error) are used to assess the initial quality of AW3D30 heights relative to the ICESat-2 observations. Equations (1)–(3) define RMSE, MAE, and MAPE, respectively, where n is the number of samples, x_i is the reference (ICESat-2) elevation, and y_i is the corresponding evaluated elevation.

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_{i} - y_{i})^{2}}

(1)

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| x_{i} - y_{i} \right|

(2)

\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{x_{i} - y_{i}}{x_{i}} \right| \times 100

(3)
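Equations (1)–(3) translate directly into code. The sketch below treats x_i as the reference elevation (consistent with Equation (3) dividing by x_i) and uses hypothetical sample values. Because x_i here are absolute elevations, MAPE depends on the vertical datum and on the terrain height itself, so it is best read alongside RMSE and MAE rather than on its own.

```python
import math


def rmse(ref, est):
    """Equation (1): root mean square error."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ref, est)) / len(ref))


def mae(ref, est):
    """Equation (2): mean absolute error."""
    return sum(abs(x - y) for x, y in zip(ref, est)) / len(ref)


def mape(ref, est):
    """Equation (3): mean absolute percentage error, relative to the reference x_i.

    Undefined if any reference value is zero.
    """
    return 100.0 * sum(abs((x - y) / x) for x, y in zip(ref, est)) / len(ref)


icesat_ref = [100.0, 200.0]  # hypothetical reference elevations (m)
dem_est = [110.0, 180.0]     # hypothetical DEM elevations (m)
print(rmse(icesat_ref, dem_est), mae(icesat_ref, dem_est), mape(icesat_ref, dem_est))
```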

The initial RMSE of the baseline AW3D30 DEM relative to ICESat-2 was 34.01 m, and the initial MAE was 22.64 m. These errors are extremely high but expected for the Amazon region, since they are very similar to the average tree height of 30 m [33]. The dynamic range of terrain elevations (~0 to 800 m) is also a contributing factor. The initial MAPE was approximately 14%. Figure 7 shows the initial distribution of differences between ICESat-2 and AW3D30 terrain heights over our region of interest (ROI). Figure 7a indicates a fairly symmetric distribution of errors between ICESat-2 and AW3D30, with a slight negative bias caused by AW3D30 overestimating terrain elevation. Figure 7b shows that the relationship between the two terrain products is mostly linear, but with a wide range of errors both above and below the one-to-one line.

2.3.3. Model Selection and Training

To identify the best model architecture for our task, we evaluated several ML algorithms, including ensemble methods (Random Forest, Extremely Randomized Trees, XGBoost, CatBoost, Gradient Boosting, and LightGBM), known for handling complex relationships and large datasets, as well as regularized regression models (Lasso and Ridge) to capture potential linearity in the terrain elevation estimates. We first trained these models on a subset of the data to assess how well they minimized the RMSE between predicted and ICESat-2 terrain heights. We then selected the best-performing candidate for each stage and used Optuna, a hyperparameter optimization framework, to maximize model performance [34]. Figure 8 outlines our model training pipeline.

The model inputs include ICESat-2 and ALS lidar terrain heights; AW3D30 heights and predictor variables; derived slope; and surface characteristics from Sentinel-1 and Sentinel-2. From Sentinel-1, we used the VV and VH polarization returns and computed their local standard deviations over a 3 × 3 pixel window (N = 9) using Equations (4) and (5), shown here for VV.

\mu_{VV}(x, y) = \frac{1}{N} \sum_{i=-1}^{1} \sum_{j=-1}^{1} I_{VV}(x + i, y + j)

(4)

\sigma_{VV}(x, y) = \sqrt{\frac{1}{N} \sum_{i=-1}^{1} \sum_{j=-1}^{1} \left( I_{VV}(x + i, y + j) - \mu_{VV}(x, y) \right)^{2}}

(5)
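Equations (4) and (5) describe a local mean and a local (population) standard deviation over the 3 × 3 window around each pixel, a simple texture measure. The pure-Python sketch below computes them for one band; it is illustrative only, the function names are hypothetical, and it leaves the one-pixel border undefined rather than assuming a particular edge-padding rule. In practice a vectorized or GEE-side reducer would be used.

```python
import math


def local_mean_std(image, x, y):
    """Equations (4)-(5): mean and population standard deviation of the
    3 x 3 window centered on pixel (x, y). Valid for interior pixels only."""
    window = [image[x + i][y + j] for i in (-1, 0, 1) for j in (-1, 0, 1)]
    n = len(window)  # N = 9
    mu = sum(window) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in window) / n)
    return mu, sigma


def local_std_image(image):
    """Texture layer: sigma for every interior pixel, None on the 1-pixel border."""
    rows, cols = len(image), len(image[0])
    out = [[None] * cols for _ in range(rows)]
    for x in range(1, rows - 1):
        for y in range(1, cols - 1):
            out[x][y] = local_mean_std(image, x, y)[1]
    return out


vv = [[1.0, 2.0, 3.0],
      [4.0, 5.0, 6.0],
      [7.0, 8.0, 9.0]]  # hypothetical VV backscatter values
print(local_mean_std(vv, 1, 1))
```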

For Sentinel-2, we used the Normalized Difference Vegetation Index (NDVI), computed from the near-infrared (NIR) and red bands, and the Enhanced Vegetation Index (EVI), computed from the NIR, red, and blue bands. Both indices characterize vegetation health, but EVI is less sensitive to atmospheric conditions. Equations (6)–(8) give NDVI, EVI, and vegetation cover (the Sentinel-1 VH/VV ratio), respectively. We transformed latitude and longitude into Cartesian coordinates to avoid discontinuities and to establish more consistent spatial relationships between points. Finally, we computed moving averages of each input feature to better characterize the surrounding area.

\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}

(6)

\mathrm{EVI} = 2.5 \times \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + 6 \times \mathrm{Red} - 7.5 \times \mathrm{Blue} + 1}

(7)

\text{Vegetation Cover} = \frac{VH}{VV}

(8)
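The per-pixel features in Equations (6)–(8), plus the latitude/longitude transform mentioned above, can be sketched as follows. This is an illustration under assumptions not stated in the text: reflectances are scaled to [0, 1], backscatter is in linear power units (in dB, the VH/VV ratio becomes the difference VH_dB − VV_dB), and coordinates are mapped onto a unit sphere. The exact Cartesian transform the authors used is not specified, and all function names are hypothetical.

```python
import math


def ndvi(nir, red):
    """Equation (6)."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Equation (7); reflectances assumed to be scaled to [0, 1]."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def vegetation_cover(vh, vv):
    """Equation (8): VH/VV ratio, assuming linear power backscatter."""
    return vh / vv


def latlon_to_xyz(lat_deg, lon_deg):
    """Unit-sphere Cartesian coordinates; removes the +/-180 degree longitude seam."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


# Hypothetical pixel values.
print(ndvi(0.5, 0.1), evi(0.5, 0.1, 0.05), vegetation_cover(0.02, 0.1))
print(latlon_to_xyz(-3.1, -60.0))
```

The Cartesian mapping is what makes longitudes of 179.9° and −179.9° numerically close, which raw latitude/longitude inputs would not be for a tree-based model.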

Table 1 outlines the untransformed model inputs. Because our geospatial data are heterogeneous and skewed, we transformed several inputs using a power transform to improve model performance. We also captured nonlinear relationships among the inputs using polynomial feature transformations. For the second model stage, we additionally incorporated ALS data.

To assess uncertainty in the model’s output, we trained several instances of our model to assess the variability in predictions and estimate confidence intervals through quantile regression. By quantifying uncertainties in the first stage, we gain a better understanding of how initial errors might propagate through to the final elevation estimates.
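Quantile regression fits a model to a chosen quantile q instead of the mean by minimizing the pinball (quantile) loss; a lower and an upper quantile model (for example, q = 0.05 and q = 0.95) then bracket a prediction interval. The sketch below shows only the loss, with hypothetical values; LightGBM, for instance, exposes this via `objective="quantile"` with `alpha` as the quantile level, but the text does not specify how the authors configured it.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss at quantile level q, 0 < q < 1.

    Under-predictions cost q per unit and over-predictions cost (1 - q) per unit,
    so minimizing it drives predictions toward the q-th conditional quantile.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)


# The constant that minimizes the q = 0.5 loss over a sample is its median.
heights = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical values
best = min(heights, key=lambda c: pinball_loss(heights, [c] * len(heights), 0.5))
print(best)  # 3.0
```

The asymmetric penalty is what lets the upper-quantile model sit above most observations and the lower one below, so the gap between them widens where the inputs are less informative.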

2.4. Surface Water Mapping in All-Weather Conditions

A key requirement for our calculations is the ability to detect water in areas obscured by canopy. To achieve this, we used SAR backscatter returns to identify water. Sentinel-1 Ground Range Detected (GRD) products provide backscatter intensity in both the VV (vertical transmit, vertical receive) and VH (vertical transmit, horizontal receive) polarizations. VV polarization is sensitive to surface roughness, while VH is influenced by vegetation. Depending on the reference dataset used for comparison, scenes were processed at either 10 m or 100 m resolution: the finer resolution for detailed local-scale analyses such as floods or reservoir storage, and the coarser resolution for regional-scale assessments. This multi-resolution approach allows water dynamics to be analyzed at various spatial scales.

To obtain temporally coincident estimates for the volume calculations, we used Sentinel-1 GRD scenes acquired in interferometric wide swath (IW) mode through GEE [35]. In GEE, Sentinel-1 data are preprocessed by applying a restituted orbit file, GRD border noise removal, thermal noise removal, and radiometric calibration, followed by orthorectification.

Although water detection has advanced through machine learning and advanced image processing, the size of our dataset and its varying spatial resolutions favored a less computationally intensive approach. We thresholded the VV and VH backscatter returns to detect water. Several studies have used Sentinel-1 radar returns to detect water surfaces under different conditions, such as flooded areas and inland water bodies [36,37,38]. Figures 9 and 10 highlight the performance of Sentinel-1 relative to HLS imagery for a thresholding approach, demonstrating how cloud cover, shadow, built structures, and canopy can hinder accurate water body delineation with optical imagery.

In addition to the VV and VH returns, we employed the Radar Normalized Difference Water Index (R-NDWI) as a primary indicator of water. The R-NDWI is calculated as shown in Equation (9) and has been effective in enhancing water body signatures while minimizing the influence of surface roughness and vegetation. In areas where the R-NDWI was insufficient for clear water discrimination (for example, complex backscatter patterns that affected our thresholding approach), we also incorporated the VH/VV ratio. The VH/VV ratio is useful when distinguishing water from other features due to its sensitivity to changes in vegetation and surface moisture.

\text{R-NDWI} = \frac{VV - VH}{VV + VH}

(9)
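Equation (9) is a normalized difference, so its value depends on the units of VV and VH. The text does not state whether it was computed on decibel or linear values; GEE's standard Sentinel-1 GRD collection is distributed in dB, so the sketch below assumes a conversion to linear power first. That choice, the function names, and the pixel values are assumptions for illustration; the sketch does not decide which R-NDWI values indicate water, which is left to the thresholding step.

```python
def db_to_linear(db):
    """Backscatter in decibels -> linear power ratio, 10^(dB / 10)."""
    return 10.0 ** (db / 10.0)


def r_ndwi(vv, vh):
    """Equation (9): (VV - VH) / (VV + VH), assuming linear power inputs."""
    return (vv - vh) / (vv + vh)


# Hypothetical pixel: VV = -10 dB, VH = -20 dB.
vv, vh = db_to_linear(-10.0), db_to_linear(-20.0)
print(r_ndwi(vv, vh))
```

Applying the same formula directly to the dB values (−10 and −20) would give −1/3 instead of about 0.82, which is why the unit convention must be fixed consistently before thresholds are derived.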

Thresholding Approach

With these indicators, we then used Otsu's method to determine the optimal threshold dynamically and binarize the data for water mask generation [39]. Otsu's method selects the threshold from the pixel value distribution by minimizing the intra-class variance (equivalently, maximizing the between-class variance) of the water and non-water classes. We removed outliers and interpolated missing data using a nearest-neighbor approach. We also removed speckle noise and striping artifacts using a Lee filter and a localized median filter, respectively. The tuning of these filters and of the outlier removal depended on the data resolution. Once the data were preprocessed, we computed R-NDWI and VH/VV from the backscatter returns and used Otsu thresholding to convert these images into binary water masks. We then assessed the agreement between the two indicators and, where needed, removed built structures from the masks using land cover classifications. This methodology allowed us to generate consistent and accurate monthly water masks. Figure 11 depicts the workflow.
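A histogram-based version of Otsu's method is sketched below in pure Python for transparency; a library implementation such as scikit-image's `skimage.filters.threshold_otsu` would normally be used instead. The bin count, the sample values, and the convention that low VV backscatter (in dB) marks water are assumptions for this example; for R-NDWI or VH/VV, which side of the threshold counts as water would need to be checked against reference data.

```python
def otsu_threshold(values, bins=256):
    """Histogram-based Otsu threshold: maximizes between-class variance,
    which is equivalent to minimizing the weighted within-class variance."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo  # a constant image has no meaningful split
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (k + 0.5) * width for k in range(bins)]
    total = len(values)
    sum_all = sum(h * c for h, c in zip(hist, centers))

    w0, sum0 = 0, 0.0
    best_score, best_t = -1.0, lo
    for k in range(bins - 1):
        w0 += hist[k]
        sum0 += hist[k] * centers[k]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_score, best_t = score, lo + (k + 1) * width  # upper edge of bin k
    return best_t


# Hypothetical VV backscatter (dB): four dark (water-like) and four bright pixels.
backscatter_db = [-22.0, -21.5, -23.0, -20.8, -9.0, -8.5, -10.2, -7.9]
t = otsu_threshold(backscatter_db)
water_mask = [v < t for v in backscatter_db]  # low VV backscatter -> water (assumed)
print(t, water_mask)
```

Because the threshold is recomputed from each image's own histogram, the same code adapts to scenes with different overall backscatter levels, which is the main reason to prefer it over a fixed cutoff.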

3. Results

3.1. Validation Regions

We assessed our methodology at local (~10 m) and regional (~1°) scales. For the local assessment, we selected four reservoirs in Brazil and compared SWV variations. These reservoirs were Pires Ferreira, Poço da Cruz, Nova Ponte, and Jacareí, and their volume variations were acquired from the DAHITI database [40,41,42,43]. DAHITI derives its water masks from HLS optical data and its water levels from an array of spaceborne altimeters. In contrast, our methodology uses SAR-derived water masks and DEM-derived water level estimates.

These reservoirs are dynamic, increasing in area from 2016 to present. For Pires Ferreira, there is a 3-year, 3-month overlap; Poço da Cruz has a 3-year, 2-month overlap; Nova Ponte has a 3-year, 3-month overlap, and Jacareí has a 2-year, 11-month overlap between the DAHITI SWVs and our SWVs. Figure 12 shows the location of each reservoir, while Table 2 contains the reservoir name, location, average coverage area, and the timespan overlap of our SWV estimates and DAHITI SWV estimates.

To investigate large-scale water storage anomalies, we selected four regions within the Amazon Basin that contain varying amounts of surface water. Each region covers a 1° × 1° area that we used to compare with GRACE-FO water storage anomaly estimates. Figure 13 and Table 3 provide information on the location of these regions.

We corrected the initial AW3D30 DEMs with our two-stage model approach, obtained monthly water masks, and calculated surface water volume variations for these eight areas. For the reservoirs, we validated our SWV estimates against reservoir estimates from DAHITI [5]. For the regional estimates, we compared the relative surface water storage (relative SWV divided by total area) with storage estimates from GRACE-FO and the Global Land Data Assimilation System (GLDAS) to assess the influence of surface water on regional water storage changes [44]. GLDAS provides storage anomalies for soil moisture, snow water equivalent, and canopy water.
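The conversion from volumes to a storage quantity comparable with GRACE-FO and GLDAS, and the correlation used to compare them, can be sketched as follows. This is an illustration, not the authors' stated procedure: anomalies are taken relative to the series mean, expressed in millimeters of equivalent water height (1 m³ spread over 1 m² is 1 m, or 1000 mm), and the cell area and volumes are hypothetical. Pearson's r is unchanged by constant offsets, so the choice of anomaly baseline does not affect the correlation, although it does matter when comparing magnitudes.

```python
import math


def volume_to_storage_anomaly_mm(volumes_m3, area_m2):
    """SWV time series -> storage anomaly in mm of equivalent water height.

    Anomalies are taken relative to the series mean (an assumed baseline);
    1 m^3 of water spread over 1 m^2 is 1 m = 1000 mm.
    """
    mean_v = sum(volumes_m3) / len(volumes_m3)
    return [(v - mean_v) / area_m2 * 1000.0 for v in volumes_m3]


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical monthly SWVs (m^3) over a hypothetical 1e10 m^2 cell.
sw_mm = volume_to_storage_anomaly_mm([1.0e9, 2.0e9, 3.0e9], 1.0e10)
print(sw_mm)
```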

3.2. ML Model Assessment

To inform model selection, we trained a range of machine learning regression models on a 10% subset of our data. In addition to our evaluation metrics (RMSE, MAE, and MAPE), we considered training runtime and the mean RMSE from k-fold cross-validation (CV RMSE Mean) to assess both performance and model stability. Table 4 summarizes the results of this initial assessment. It includes error metrics on the test set, performance on a spatially distinct holdout region, CV RMSE Mean, and training runtime. For Model 1, which predicts ICESat-2 terrain heights, LGBM achieved the best balance of accuracy and efficiency. For Model 2, which uses a slightly different input–output configuration (including Model 1's output as an input), Extra Trees yielded the best overall performance.
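The two selection criteria above, CV RMSE Mean and training runtime, can be sketched with scikit-learn. This is a minimal illustration on synthetic data. The use of 5 folds and the two candidate models are arbitrary assumptions, and the study's exact fold count and candidate configurations are not given in this section.

```python
import time

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def cv_rmse_mean(model, X, y, n_splits=5, seed=0):
    """Mean RMSE over k folds, plus wall-clock time for the whole CV run."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    start = time.perf_counter()
    # scikit-learn maximizes scores, so RMSE is reported as a negative number.
    scores = cross_val_score(model, X, y, cv=kfold,
                             scoring="neg_root_mean_squared_error")
    elapsed = time.perf_counter() - start
    return float(-scores.mean()), elapsed


# Synthetic stand-in for DEM-derived features and ICESat-2 terrain heights.
rng = np.random.default_rng(42)
X = rng.uniform(0, 100, size=(2000, 4))
y = X[:, 0] + 0.05 * X[:, 1] ** 2 + rng.normal(0, 1.0, size=2000)

candidates = {
    "Linear": LinearRegression(),
    "ExtraTrees": ExtraTreesRegressor(n_estimators=50, random_state=0),
}
results = {name: cv_rmse_mean(m, X, y) for name, m in candidates.items()}
for name, (rmse, secs) in results.items():
    print(f"{name}: CV RMSE mean = {rmse:.3f}, runtime = {secs:.2f} s")
```

Reporting the mean over folds, rather than a single split, is what makes the metric informative about stability. A model whose RMSE swings between folds will show it in a higher mean.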

Based on this assessment, we selected LGBM for Model 1 and Extra Trees for Model 2. Gradient Boosting achieved a lower RMSE, but its runtime of 1100 s on the data subset made it impractical for our full dataset. LGBM offered a good balance of relatively low RMSE, low MAPE, and fast runtime. For Model 2, Extra Trees outperformed all other models in every category except runtime, and it still trained in less than 5 s.

After tuning each model with Optuna, we obtained the following hyperparameters:

Model 1 (LGBM): learning_rate = 0.195, num_leaves = 299, max_depth = 14, min_child_samples = 16, subsample = 0.504, colsample_bytree = 0.910, reg_alpha = 0.051, reg_lambda = 0.223, and n_estimators = 861.

Model 2 (Extra Trees): n_estimators = 596, max_depth = 30, min_samples_split = 3, min_samples_leaf = 1, and max_features = None.

For Model 1, these hyperparameters control how LGBM builds and refines a large ensemble of boosted decision trees.

- The learning rate (0.195) scales the contribution of each new tree, and therefore how quickly the model updates during training.
- num_leaves (299) and max_depth (14) limit the complexity of each tree.
- min_child_samples (16) sets the minimum number of samples in a leaf.
- subsample (0.504) and colsample_bytree (0.910) limit the fraction of data rows and features each tree can use. Together with min_child_samples, these help prevent overfitting.
- The regularization terms reg_alpha (L1) and reg_lambda (L2) further reduce overfitting.
- n_estimators (861) sets the number of trees.

For Model 2, the tuned settings govern how the forest of randomized decision trees splits the data.

- n_estimators (596) is again the number of trees.
- max_depth (30) limits how deep each tree can grow.
- min_samples_split (3) and min_samples_leaf (1) specify the minimum number of samples required to split a node and to form a leaf, respectively.
- Setting max_features = None lets every split consider all features.
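The tuned values above can be recorded as keyword-argument dictionaries. This sketch assumes the scikit-learn-style interfaces `lightgbm.LGBMRegressor` and `sklearn.ensemble.ExtraTreesRegressor`. The dictionary names and the `random_state` value are illustrative choices. LightGBM is not imported, so the snippet runs without it.

One caveat is worth checking. In LightGBM's scikit-learn API, `subsample` (row bagging) only takes effect when `subsample_freq` is greater than 0, and the value used in the study is not reported here.

```python
from sklearn.ensemble import ExtraTreesRegressor

# Model 1 (LightGBM) settings reported in the text. With LightGBM installed,
# pass them as lightgbm.LGBMRegressor(**MODEL1_LGBM_PARAMS).
# Caveat: LightGBM ignores `subsample` unless `subsample_freq` > 0.
MODEL1_LGBM_PARAMS = {
    "learning_rate": 0.195,     # shrinkage applied to each new tree
    "num_leaves": 299,          # max leaves per tree
    "max_depth": 14,            # max tree depth
    "min_child_samples": 16,    # min samples required in a leaf
    "subsample": 0.504,         # fraction of rows per bagging round
    "colsample_bytree": 0.910,  # fraction of features per tree
    "reg_alpha": 0.051,         # L1 regularization on leaf weights
    "reg_lambda": 0.223,        # L2 regularization on leaf weights
    "n_estimators": 861,        # number of boosting rounds (trees)
}

# Model 2 (Extra Trees) settings reported in the text.
MODEL2_EXTRA_TREES_PARAMS = {
    "n_estimators": 596,
    "max_depth": 30,
    "min_samples_split": 3,
    "min_samples_leaf": 1,
    "max_features": None,  # every split considers all features
}

# random_state is an illustrative choice, not a reported setting.
model2 = ExtraTreesRegressor(**MODEL2_EXTRA_TREES_PARAMS, random_state=0)
print(model2.get_params()["n_estimators"])  # 596
```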

Finally, we trained Model 1 (LGBM) on 4 million data points to predict ICESat-2 elevation and validated it against a held-out test set. We then used the Model 1 height predictions as an additional input to Model 2 (Extra Trees) to generate the final DEM. Table 5 lists the RMSE, MAE, and MAPE of the initial and predicted data, along with the percent improvement.

The corrected terrain elevations are, on average, closer to the ICESat-2 estimates, and the errors are proportionally smaller across the range of terrain elevations. Figure 14 shows the post-Model 1 distribution of errors between ICESat-2 and our predicted DEM. Relative to the initial DEM, the corrected DEM agrees more closely with the ICESat-2 terrain estimates.

Similarly, for Model 2, we were able to improve our generated DEM using ALS data as the reference. The quantitative metric improvements are provided in Table 6.
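The RMSE, MAE, and MAPE values and the percent improvement reported in Tables 5 and 6 follow standard definitions. Below is a minimal NumPy sketch. It assumes percent improvement is computed as (initial − corrected) / initial × 100 for each metric, since the exact convention is not stated in this section. The elevations are made up for illustration.

```python
import numpy as np


def error_metrics(pred, ref):
    """RMSE and MAE in the units of the inputs (m); MAPE in percent."""
    ref = np.asarray(ref, float)
    err = np.asarray(pred, float) - ref
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(err / ref)) * 100.0),
    }


def percent_improvement(initial, corrected):
    """Relative reduction of each metric, in percent."""
    return {k: 100.0 * (initial[k] - corrected[k]) / initial[k] for k in initial}


# Illustrative values only (not from the study).
reference = np.array([40.0, 55.0, 70.0, 85.0])  # e.g., ICESat-2 ground heights
initial_heights = reference + np.array([6.0, -4.0, 8.0, -2.0])
corrected_heights = reference + np.array([1.0, -1.0, 2.0, -0.5])

before = error_metrics(initial_heights, reference)
after = error_metrics(corrected_heights, reference)
improvement = percent_improvement(before, after)
print(improvement)  # MAE improves by 77.5% (5.0 m -> 1.125 m)
```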

Unlike the comparison between ICESat-2 and AW3D30, the errors relative to the ALS terrain estimates were less uniformly distributed, as shown in Figure 15. Our Model 2 (Extra Trees) approach was nevertheless able to correct them.

When applying this approach to our regions of interest (ROIs), we validated Model 1 against the available ICESat-2 terrain heights. These results are detailed in Table 7.

Model 1 performed well and consistently across all ROIs, substantially improving elevation estimates even in areas with limited ICESat-2 coverage. RMSE reductions ranged from 55% to over 95%, with the largest improvements in Amazonas and Pires Ferreira despite their low data availability. MAPE also decreased substantially in all regions. Even extreme cases such as Manaus, where near-zero elevations inflated the initial MAPE, showed a large error reduction. These results indicate that, using terrain and land cover features, the model generalized well across diverse landscapes and levels of data availability.
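The Manaus MAPE behavior follows directly from the metric's definition. Each absolute error is divided by the reference elevation, so a few points near 0 m dominate the mean. Here is a small worked example with illustrative numbers:

```python
import numpy as np

reference = np.array([0.5, 50.0, 100.0])  # reference elevations (m); one near zero
errors = np.array([2.0, 2.0, 2.0])        # identical 2 m absolute errors

ape = np.abs(errors / reference) * 100.0  # per-point absolute percentage error
print(ape)             # 400, 4 and 2 percent
print(ape.mean())      # about 135.33 percent: the near-zero point dominates
print(ape[1:].mean())  # about 3 percent once that point is excluded
```

The same 2 m error counts 200 times more heavily at 0.5 m than at 100 m. This is why RMSE and MAE are the more robust indicators in low-lying areas.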

3.3. Water Masking with SAR

We generated monthly water masks using our SAR backscatter methodology and validated them against a composite reference dataset. A composite was necessary because no single ground truth exists for Amazon wetlands, and each individual dataset has methodological limitations that could bias the validation [45]. The composite combined three complementary sources:

- JRC Global Surface Water (JRC GSW) provides long-term temporal coverage from Landsat satellites but is limited in tropical regions by cloud cover and forest canopy [46].
- The MapBiomas water classification provides annual surface water extent maps from Landsat 5 TM, Landsat 7 ETM+, and Landsat 8 OLI imagery, with cloud and shadow masking applied to each scene. It was specifically calibrated for the Amazon [47]. Because it is annual, however, it does not fully capture seasonal water dynamics.
- The LBA Amazon wetlands dataset provides regional-scale wetland mapping from L-band SAR [48].

We combined these datasets using weights to generate a single reference dataset.
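The weighting used to merge the three reference datasets is not specified in this section. As an illustration only, a weighted-vote composite could look like the sketch below. The weights (0.4/0.4/0.2) and the 0.5 agreement threshold are hypothetical placeholders. The inputs are assumed to be co-registered boolean water masks on a common grid.

```python
import numpy as np


def weighted_composite(masks, weights, threshold=0.5):
    """Combine co-registered boolean water masks by weighted vote.

    A pixel is labelled water when the weight-normalized share of datasets
    calling it water is at least `threshold`.
    """
    stack = np.stack([np.asarray(m, dtype=float) for m in masks])
    w = np.asarray(weights, dtype=float)
    score = np.tensordot(w / w.sum(), stack, axes=1)  # weighted water fraction
    return score >= threshold


# Hypothetical weights for JRC GSW, MapBiomas, and LBA wetlands (placeholders).
jrc = np.array([[1, 1, 0], [0, 0, 0]], dtype=bool)
mapbiomas = np.array([[1, 0, 0], [1, 0, 0]], dtype=bool)
lba = np.array([[1, 1, 1], [1, 0, 0]], dtype=bool)
composite = weighted_composite([jrc, mapbiomas, lba], weights=[0.4, 0.4, 0.2])
print(composite.astype(int))
```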

We computed the F1 score, Intersection over Union (IoU), precision, and recall of our SAR-derived masks against this composite reference. We did so for both the wet season (December–May) and the dry season (June–November).

- Precision is the fraction of predicted water pixels that are correct.
- Recall is the fraction of reference water pixels that are detected.
- The F1 score is the harmonic mean of precision and recall and ranges from 0 (poor) to 1 (perfect).
- IoU measures the spatial overlap between the predicted and reference water areas (their intersection divided by their union). Higher values indicate better agreement.

Table 8 shows the comparison results. Because the reference is a composite, the results reflect agreement with a consensus of several different classification approaches.
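The four agreement metrics can be computed from a predicted and a reference binary mask, as below. This minimal NumPy sketch assumes both masks are boolean arrays on the same grid, with no-data pixels already removed. The toy arrays are illustrative.

```python
import numpy as np


def mask_agreement(pred, ref):
    """Precision, recall, F1, and IoU for boolean water masks."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    tp = np.sum(pred & ref)   # water in both
    fp = np.sum(pred & ~ref)  # predicted water, reference land
    fn = np.sum(~pred & ref)  # missed water
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


pred = np.array([1, 1, 0, 0, 1], dtype=bool)
ref = np.array([1, 0, 0, 1, 1], dtype=bool)
m = mask_agreement(pred, ref)
print(m)  # TP = 2, FP = 1, FN = 1: precision = recall = F1 = 2/3, IoU = 0.5
```

This sketch does not guard against empty masks. An ROI with no predicted or no reference water would trigger a division by zero and needs separate handling.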

The SAR-based water detection algorithm performed well in both seasons.

- F1 scores were 0.805 ± 0.075 for the dry season and 0.819 ± 0.081 for the wet season.
- IoU values of 0.679 ± 0.098 (dry) and 0.700 ± 0.100 (wet) indicate good spatial overlap between the SAR-derived masks and the composite reference.
- Precision was consistently high across seasons (0.898 ± 0.102 dry; 0.911 ± 0.091 wet).
- Recall was moderate (0.745 ± 0.111 dry; 0.756 ± 0.102 wet).

Standard deviations of 0.075 to 0.111 across metrics indicate reasonable consistency across ROIs, with room to reduce ROI-to-ROI variability. The wet season outperformed the dry season in every metric, by 1.1 to 2.1 percentage points. Larger discrepancies appeared during the peak dry season, when water extent is at its minimum. These discrepancies were primarily false negatives (FNs), that is, missed water. They were larger in regions containing more surface water and seasonal floodplains. Figure 16 shows classification error maps from the peak wet season (May 2023) and the dry season (November 2023).

The seasonal difference is pronounced in the two error maps. The wet season map (May 2023) shows strong agreement between our classification and the reference data. Major river channels and floodplains are correctly identified as water (true positives), and non-inundated areas are correctly classified as non-water (true negatives). In the dry season map, large portions of the floodplain are classified as missed water (FNs). Because these errors are concentrated along inundation boundaries, we attribute them mainly to a mismatch in timing. Our classifications are monthly, while the reference composite likely reflects wet-season or average water extent.
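An error map like those in Figure 16 assigns each pixel one of four outcome classes. Below is a minimal sketch. The integer codes (0 = TN, 1 = TP, 2 = FP, 3 = FN) are an arbitrary choice for illustration.

```python
import numpy as np

# Arbitrary class codes for plotting.
TN, TP, FP, FN = 0, 1, 2, 3


def error_map(pred, ref):
    """Per-pixel outcome class for boolean predicted vs. reference masks."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    out = np.full(pred.shape, TN, dtype=np.uint8)
    out[pred & ref] = TP
    out[pred & ~ref] = FP
    out[~pred & ref] = FN  # "missed water", dominant in the dry-season map
    return out


pred = np.array([[1, 1, 0], [0, 0, 0]], dtype=bool)  # e.g., a dry-season SAR mask
ref = np.array([[1, 1, 1], [1, 0, 0]], dtype=bool)   # e.g., the composite reference
codes = error_map(pred, ref)
print(codes)
print(np.bincount(codes.ravel(), minlength=4))  # counts of TN, TP, FP, FN
```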

3.4. Volume Estimates

Using our twice-corrected AW3D30 DEM (Model 1 followed by Model 2) and the SAR-derived water masks, we generated volume estimates for our eight regions. These comprise the four reservoirs, used to investigate local performance, and the four 1° × 1° ROIs, used to assess regional performance.

3.4.1. Relative SWV Estimates Comparison at Local Scales

We compared our monthly reservoir volume variations with the DAHITI volume variations for four Brazilian reservoirs over their temporal overlap (roughly two to three years). The two products use different baselines and inputs:

- DAHITI volume variations are relative to their own initial baseline. They are derived from HLS water masks and from water levels measured by multi-mission satellite altimetry.
- Our volume anomalies are relative to the water level represented in our corrected DEM. They are derived from SAR water masks, with the median water surface elevation inferred using a boundary approach.
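The sketch below makes the DEM-plus-mask volume calculation concrete. It estimates a water surface elevation (WSE) as the median DEM height along the water-mask boundary, then integrates depth over the masked pixels. It illustrates the idea under stated assumptions and is not the study's exact implementation:

- The boundary is taken as the masked pixels removed by one binary erosion (`scipy.ndimage.binary_erosion`).
- The pixel size is 10 m.
- Depths are clipped at zero.

```python
import numpy as np
from scipy.ndimage import binary_erosion


def boundary_volume(dem, water, pixel_size_m=10.0):
    """Return (water surface elevation in m, volume in m^3) for one mask.

    dem   : 2-D array of terrain heights (m)
    water : 2-D boolean water mask on the same grid
    """
    water = np.asarray(water, dtype=bool)
    # Boundary pixels: water pixels that disappear after one erosion step.
    boundary = water & ~binary_erosion(water)
    wse = float(np.median(dem[boundary]))
    # Depth below the inferred surface, only inside the mask, never negative.
    depth = np.where(water, np.clip(wse - dem, 0.0, None), 0.0)
    volume_m3 = float(depth.sum() * pixel_size_m ** 2)
    return wse, volume_m3


# Toy basin: 1 m shoreline ring around a 0 m center pixel, 2 m dry land outside.
dem = np.full((5, 5), 2.0)
dem[1:4, 1:4] = 1.0
dem[2, 2] = 0.0
water = np.zeros((5, 5), dtype=bool)
water[1:4, 1:4] = True

wse, vol = boundary_volume(dem, water)
print(wse, vol)  # 1.0 100.0  (1 m depth over one 10 m x 10 m pixel)
```

Monthly volume anomalies would then be differences of the returned volume between months, divided by 1e9 to express them in km³.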

We calculated RMSE and MAE in km³. Because the two volume estimates differ in construction, we also calculated the Pearson correlation coefficient and its p-value. The Pearson correlation coefficient measures the strength and direction of the linear relationship between the two estimates, and the p-value indicates the statistical significance of that correlation. Three of the four reservoirs showed high, statistically significant correlations. The fourth (Jacareí) showed a moderate but statistically significant correlation of ~0.63. The magnitudes of the volume variations differed substantially. This is reflected in the unadjusted RMSE values, which range from 0.0211 km³ for Poço da Cruz to 0.3964 km³ for Jacareí, and in the MAE values, which range from 0.0176 km³ to 0.3891 km³. These magnitude differences are expected given the differing baseline references and volume calculation methodologies.
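The comparison statistics can be computed with SciPy and NumPy, as sketched below. The short series are illustrative, not DAHITI or study values. Pearson's r is insensitive to a constant offset or scale difference between two series. This is why r can be high even when RMSE and MAE are large.

```python
import numpy as np
from scipy.stats import pearsonr


def compare_volumes(ours_km3, reference_km3):
    """RMSE and MAE (km^3) plus Pearson r and its two-sided p-value."""
    ours = np.asarray(ours_km3, float)
    ref = np.asarray(reference_km3, float)
    diff = ours - ref
    r, p = pearsonr(ours, ref)
    return {
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),
        "MAE": float(np.mean(np.abs(diff))),
        "r": float(r),
        "p": float(p),
    }


# Illustrative monthly volume anomalies (km^3) with a constant baseline shift.
reference = np.array([0.10, 0.15, 0.22, 0.18, 0.12, 0.08, 0.11, 0.19])
ours = reference + 0.30
stats = compare_volumes(ours, reference)
print(stats)  # r is ~1 while RMSE and MAE both equal the 0.30 km^3 shift
```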

Due to the high correlation, as well as the differences in baseline water volume, we computed a linear offset and subsequently adjusted our calculated volume anomalies and reevaluated the RMSE and MAE (adjusted). Our approach follows established practices in satellite validation studies where systematic offsets between independent measurement systems are common and expected [49]. The linear offset correction assumes that both methods detect the same underlying hydrological signals but reference different baseline elevations. Given that DAHITI uses satellite altimetry referenced to geoid heights, while our method uses SAR-derived water masks referenced to our corrected DEM elevations, a linear offset is needed.
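This section does not spell out the exact form of the linear adjustment. The sketch below shows two plausible readings, both of which are assumptions:

- A constant offset, which shifts our series by the mean difference and is consistent with a baseline mismatch.
- A least-squares slope-and-intercept fit via `numpy.polyfit`.

The adjusted series can then be passed to the same RMSE and MAE calculation.

```python
import numpy as np


def offset_adjust(ours, reference):
    """Shift `ours` by the mean difference so both share one baseline."""
    ours = np.asarray(ours, float)
    return ours + np.mean(np.asarray(reference, float) - ours)


def linear_adjust(ours, reference):
    """Least-squares fit reference ~ a * ours + b, applied to `ours`."""
    ours = np.asarray(ours, float)
    a, b = np.polyfit(ours, np.asarray(reference, float), deg=1)
    return a * ours + b


def rmse(x, y):
    return float(np.sqrt(np.mean((np.asarray(x) - np.asarray(y)) ** 2)))


# Illustrative series: same temporal pattern, different baseline (km^3).
reference = np.array([0.10, 0.15, 0.22, 0.18, 0.12, 0.08])
ours = reference + 0.25 + np.array([0.01, -0.01, 0.0, 0.01, -0.01, 0.0])

print(rmse(ours, reference))                            # about 0.25 before adjustment
print(rmse(offset_adjust(ours, reference), reference))  # about 0.008 after
```

The offset-only version is the slope-fixed-at-1 special case of the least-squares fit. The two-parameter fit can therefore only match or lower the adjusted RMSE. It also absorbs scale differences, which a pure baseline argument does not require.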

After adjustment, the results show good agreement for all reservoirs except Jacareí. The weaker agreement there is likely due to a higher DEM RMSE or a more complex water body that challenges our boundary-based water level approach. Similar validation challenges have been observed in other South American water monitoring studies, where site-specific factors such as topographic complexity and vegetation interference can affect accuracy. These results are outlined in Table 9. They demonstrate that our SAR-based volume estimation methodology captures realistic temporal variations in reservoir storage, supporting its application to broader surface water monitoring across data-sparse regions.

The strong correlations and a significant reduction in error metrics after linear adjustment suggest that our method effectively captures the temporal patterns and relative variations in reservoir volume, even though there are inherent differences in absolute magnitude and reference baseline compared to the DAHITI product. The generated time series and scatter plots (both original and linearly adjusted) in Figure 17, Figure 18, Figure 19 and Figure 20 visually confirm these findings.

3.4.2. Long-Term SW Storage and GRACE-FO Observations

For the regional observations, we computed the SWV for each month from 2018 to 2024 and divided the monthly volumes by the region area (~12,392 km² for a 1° × 1° region). This yields a surface water storage (SWS) anomaly (SWA), expressed as liquid water equivalent (LWE) height in meters. At this resolution, we assessed our SWA against two other signals: the GRACE-FO-derived TWSA, and the sum of the GLDAS-derived snow water equivalent, soil moisture, and canopy water anomalies (SWEA + SMA + CWA). These anomalies are related by the water balance in Equation (10) below. The groundwater anomaly (GWA) is not represented in the GLDAS data or in our estimates, but it can be estimated as the residual of this water balance.

$$\mathrm{TWSA} = \mathrm{SWEA} + \mathrm{SMA} + \mathrm{CWA} + \mathrm{GWA} + \mathrm{SWA} \tag{10}$$
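Expressing a volume anomaly as liquid water equivalent height is a division by area. The sketch below assumes a spherical Earth for the cell area, A = R² Δλ (sin φ₂ − sin φ₁). With the equatorial radius R = 6378.137 km, this gives about 12,391 km² for a 1° × 1° cell at the equator. The area shrinks roughly with the cosine of latitude. The example volume is illustrative.

```python
import math

R_EQUATORIAL_KM = 6378.137  # assumed sphere radius (WGS 84 semi-major axis)


def cell_area_km2(lat_south, lat_north, dlon_deg, radius_km=R_EQUATORIAL_KM):
    """Area of a latitude-longitude cell on a sphere (km^2)."""
    dlon = math.radians(dlon_deg)
    return radius_km ** 2 * dlon * (
        math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    )


def volume_to_lwe_m(volume_km3, area_km2):
    """Volume anomaly (km^3) spread over an area (km^2) -> LWE height (m)."""
    return volume_km3 / area_km2 * 1000.0  # km -> m


equator_cell = cell_area_km2(0.0, 1.0, 1.0)
print(round(equator_cell))                    # 12391
print(round(cell_area_km2(-4.0, -3.0, 1.0)))  # slightly smaller a few degrees south
print(volume_to_lwe_m(1.0, 12_392.0))         # ~0.0807 m for a 1 km^3 anomaly
```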

To evaluate the relationship between our SWA, the TWSA, and SWEA + SMA + CWA, we constructed a monthly time series for each ROI and processed it as follows.

1. Missing values were filled using time-weighted linear interpolation. This method assumes a linear trend between the two nearest known values and places each missing value along that line in proportion to its distance in time.
2. To avoid long periods of artificial data, we only filled gaps of up to three months.
3. We used a single-month backward and forward fill to populate missing values at the beginning and end of each series.
4. We removed the seasonal cycle from each time series using an additive decomposition model and extracted the residual component to isolate non-seasonal anomalies.
5. We computed an approximate groundwater storage anomaly (GWA) by rearranging the water balance in Equation (10). GWA is the residual of the TWSA after subtracting the SWA and the GLDAS components (SWEA + SMA + CWA).

Figure 21, Figure 22, Figure 23 and Figure 24 show the plots for each ROI.
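The preprocessing chain above can be sketched with pandas and NumPy under several assumptions:

- The series are monthly, with a `DatetimeIndex`.
- Interior gaps longer than three months are left empty rather than partially filled. pandas' `limit` argument alone would partially fill them.
- The decomposition is the classical moving-average additive model with a 12-month period, the approach implemented by statsmodels' `seasonal_decompose`.
- The groundwater anomaly is the simple residual TWSA − SWA − (SWEA + SMA + CWA).

Note that the classical residual also removes the moving-average trend.

```python
import numpy as np
import pandas as pd


def fill_short_gaps(s: pd.Series, max_gap: int = 3) -> pd.Series:
    """Time-weighted interpolation of interior gaps of at most `max_gap` steps.

    Longer interior gaps stay NaN; the value just before the first valid
    point and just after the last valid point get a one-step fill.
    """
    filled = s.interpolate(method="time", limit_area="inside")
    # Length of each run of consecutive missing values in the original series.
    is_na = s.isna()
    run_id = (is_na != is_na.shift(fill_value=False)).cumsum()
    run_len = is_na.astype(int).groupby(run_id).transform("sum")
    filled = filled.mask(is_na & (run_len > max_gap))  # undo long-gap fills
    valid = filled.notna().to_numpy()
    if valid.any():
        first = int(valid.argmax())
        last = len(valid) - 1 - int(valid[::-1].argmax())
        if first > 0:                      # one-step backward fill at the start
            filled.iloc[first - 1] = filled.iloc[first]
        if last < len(filled) - 1:         # one-step forward fill at the end
            filled.iloc[last + 1] = filled.iloc[last]
    return filled


def additive_decompose(s: pd.Series, period: int = 12):
    """Classical additive decomposition: returns (trend, seasonal, residual)."""
    if period % 2 == 0:  # centered 2 x period moving average
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    trend = s.rolling(len(weights), center=True).apply(
        lambda w: float(np.dot(w, weights)), raw=True
    )
    detrended = s - trend
    phase = np.arange(len(s)) % period  # position within the seasonal cycle
    seasonal_means = detrended.groupby(phase).mean()
    seasonal_means -= seasonal_means.mean()  # seasonal component sums to ~0
    seasonal = pd.Series(seasonal_means.to_numpy()[phase], index=s.index)
    residual = s - trend - seasonal
    return trend, seasonal, residual


def groundwater_anomaly(twsa, swa, gldas):
    """Water-balance residual: GWA = TWSA - SWA - (SWEA + SMA + CWA)."""
    return twsa - swa - gldas


idx = pd.date_range("2018-01-01", periods=84, freq="MS")
t = np.arange(len(idx))
twsa = pd.Series(-0.002 * t + 0.3 * np.sin(2 * np.pi * t / 12), index=idx)
twsa.iloc[[5, 6]] = np.nan  # 2-month gap: interpolated
twsa.iloc[30:35] = np.nan   # 5-month gap: left empty
filled = fill_short_gaps(twsa)
trend, seasonal, resid = additive_decompose(filled)
print(filled.isna().sum())  # 5
```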

Quantitatively, we applied least squares linear regression to estimate the slope of each trend (m/yr) and its p-value, which determines statistical significance. To further explore the relationships between these components after removing the wet–dry cycles, we computed Pearson's correlation coefficients for each pair: GRACE-FO and SWA, GRACE-FO and GLDAS, and GLDAS and SWA. Finally, to evaluate how well the sum of SWA and SWEA + SMA + CWA explains the total TWSA signal, we computed the RMSE (m) and the mean bias (m). The trend metrics are presented in Table 10, and the component correlation and bias metrics in Table 11.
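Below is a sketch of the trend and agreement statistics. It assumes monthly series on a shared `DatetimeIndex`, with time expressed in decimal years (days / 365.25). Mean bias is defined here as the mean of (SWA + GLDAS) − TWSA; this sign convention is our assumption. `scipy.stats.linregress` returns the OLS slope together with the p-value for the null hypothesis of zero slope.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress


def trend_m_per_yr(series: pd.Series):
    """OLS slope (m/yr) and p-value for a monthly anomaly series."""
    s = series.dropna()
    years = (s.index - s.index[0]).days.to_numpy() / 365.25
    fit = linregress(years, s.to_numpy())
    return fit.slope, fit.pvalue


def partition_metrics(twsa, swa, gldas):
    """Pairwise Pearson r, plus RMSE and mean bias of (SWA + GLDAS) vs. TWSA.

    Bias sign convention (an assumption): positive means the summed
    components exceed the GRACE-FO TWSA on average.
    """
    df = pd.concat({"twsa": twsa, "swa": swa, "gldas": gldas}, axis=1).dropna()
    diff = df["swa"] + df["gldas"] - df["twsa"]
    return {
        "r_twsa_swa": df["twsa"].corr(df["swa"]),
        "r_twsa_gldas": df["twsa"].corr(df["gldas"]),
        "r_gldas_swa": df["gldas"].corr(df["swa"]),
        "rmse_m": float(np.sqrt(np.mean(diff ** 2))),
        "bias_m": float(diff.mean()),
    }


idx = pd.date_range("2018-01-01", periods=72, freq="MS")
years = (idx - idx[0]).days.to_numpy() / 365.25
twsa = pd.Series(-0.03 * years, index=idx)
slope, p = trend_m_per_yr(twsa)
print(round(slope, 4), p < 0.001)  # -0.03 True
```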

Iquitos experienced a statistically significant drying trend over our study period, with TWS declining at approximately −0.0217 m/yr (p < 0.001). This decline was reflected in all water storage components. GLDAS (SWEA + SMA + CWA) declined at −0.002 m/yr and SWS at −0.001 m/yr, and the estimated GWS declined at −0.029 m/yr. All of these trends were statistically significant. TWSA was moderately correlated with the GLDAS components (r = 0.524) but showed almost no correlation with the SWS signal (r = 0.092). These results suggest that surface water variability is not the primary driver of water storage changes in the Iquitos ROI.

Manaus experienced the most pronounced decrease in TWS, with a statistically significant trend of −0.048 m/yr (p = 0.003). The long-term SWS trend was not statistically significant (p = 0.19), but SWS was highly correlated with the TWS signal (r = 0.821). Of our four regions, Manaus contains the most surface water bodies. The GLDAS components were also highly correlated with TWS (r = 0.818). This suggests that both surface water and soil and canopy water dynamics drive water storage variations in this area. This region also had the highest RMSE and a negative bias, indicating that our components do not fully capture the water storage variation.

In the Acre ROI, which contains the least surface water, TWS and the GLDAS components were extremely strongly correlated (r = 0.96). This region had a drying trend of −0.0346 m/yr (p < 0.001), consistent with the negative trends in the GLDAS components and SWS. Acre also showed a moderate correlation between TWS and SWS (r = 0.59) and had the lowest RMSE.

Amazonas showed a consistent and significant drying trend in each component. TWS declined at −0.0326 m/yr (p < 0.001), SWS at −0.0022 m/yr, and the GLDAS components at −0.0018 m/yr. There was a moderate correlation between SWS and TWS (r = 0.56) and a strong negative correlation (r = −0.70) between TWS and the GLDAS (SWEA + SMA + CWA) components. This likely indicates that both soil and canopy water and surface water contribute to the overall water storage signal.

3.4.3. Temporally Coincident Altimetry for SWV Estimates

We validated our inferred water level estimates against ICESat-2 altimetry data. Monthly averaged ICESat-2 heights were compared with the corresponding monthly water levels within 1 km grid cells. We considered only cells that contained ≥15 ICESat-2 points and were classified as water. The comparison revealed discrepancies between the ICESat-2 and estimated water levels, with maximum errors of up to 20 m and an RMSE of approximately 5 m. The largest errors were concentrated in regions with ephemeral water bodies and sparse ICESat-2 coverage.

Several factors contribute to these discrepancies. First, the monthly temporal resolution used for both datasets proved too coarse to resolve the rapid water level fluctuations of dynamic floodplain environments such as the Amazon Basin. Second, ICESat-2 has a 91-day repeat cycle, so its monthly data for a given cell in our region typically represent the water level on a single day of the month. Our water level estimates, by contrast, are based on SAR images averaged over the month. Because the two acquisitions are not synchronized, the discrepancy between them can be large in a system as variable as the Amazon. Third, ICESat-2 track placement matters. Cells crossed by more tracks over stable water tend to have smaller errors, while cells with few tracks can be dominated by a couple of outlier measurements. Figure 25 illustrates this effect by comparing a dry-season month with a wet-season month.
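The gridded comparison can be sketched with pandas. We assume that ICESat-2 points carry projected coordinates in meters (`x`, `y`), a month label, a height, and a boolean water flag taken from our monthly mask. We also assume the inferred water levels form a table keyed by month and 1 km cell indices. All column names are hypothetical.

```python
import numpy as np
import pandas as pd

CELL_M = 1000.0  # 1 km grid cells
MIN_POINTS = 15  # minimum ICESat-2 points per cell and month


def compare_to_icesat2(points: pd.DataFrame, levels: pd.DataFrame):
    """RMSE and max absolute error between monthly cell-mean ICESat-2 heights
    and inferred water levels, plus the matched table.

    points: columns month, x, y, h_icesat2, is_water (hypothetical names)
    levels: columns month, cx, cy, wse_est (hypothetical names)
    """
    pts = points[points["is_water"]].copy()
    pts["cx"] = np.floor(pts["x"] / CELL_M).astype(int)
    pts["cy"] = np.floor(pts["y"] / CELL_M).astype(int)
    cells = (
        pts.groupby(["month", "cx", "cy"])["h_icesat2"]
        .agg(["mean", "count"])
        .reset_index()
    )
    cells = cells[cells["count"] >= MIN_POINTS]  # drop sparsely sampled cells
    merged = cells.merge(levels, on=["month", "cx", "cy"], how="inner")
    err = merged["wse_est"] - merged["mean"]
    return float(np.sqrt(np.mean(err ** 2))), float(err.abs().max()), merged


points = pd.DataFrame({
    "month": ["2023-05"] * 32,
    "x": [100.0] * 20 + [1500.0] * 12,  # 20 points in cell 0, 12 in cell 1
    "y": [200.0] * 32,
    "h_icesat2": [10.0] * 20 + [4.0] * 12,
    "is_water": [True] * 32,
})
levels = pd.DataFrame({"month": ["2023-05", "2023-05"], "cx": [0, 1],
                       "cy": [0, 0], "wse_est": [12.0, 5.0]})
rmse, max_err, merged = compare_to_icesat2(points, levels)
print(rmse, max_err, len(merged))  # 2.0 2.0 1  (cell 1 has too few points)
```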

4. Discussion

The results of this study highlight the importance of accurate DEMs and water inundation masks for SWV estimation, especially in challenging environments. The Amazon Basin exemplifies these challenges. Standard DEMs show vertical errors relative to ICESat-2- and ALS-classified ground. In addition, spectral mixing between water and vegetation, along with frequent cloud cover, complicates water extent delineation from optical imagery. SAR platforms such as Sentinel-1 are therefore valuable for all-weather water mapping.

Our Sentinel-1 implementation captured major water extent changes well. However, Otsu thresholding struggled in regions with very low water-to-land ratios, such as Acre, which required manual thresholding. Future improvements to our SAR estimates could use adaptive thresholding or machine learning algorithms for water classification. Longer-wavelength SAR, such as L- and P-band, offers superior canopy penetration where available [50]. The upcoming NASA-ISRO Synthetic Aperture Radar (NISAR) mission will provide systematic L-band observations with consistent temporal sampling. These could be incorporated into our masking methodology to further improve water extent mapping [51].
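Otsu's method picks the threshold that maximizes the between-class variance of the backscatter histogram. This presumes a reasonably bimodal distribution. When water pixels are a tiny fraction of the scene, as in Acre, the water mode barely registers and the threshold can drift. The NumPy sketch below is a generic implementation, not the study's code. The 256-bin histogram and the synthetic dB values are assumptions.

```python
import numpy as np


def otsu_threshold(values_db, nbins=256):
    """Otsu threshold for a 1-D array of backscatter values (dB)."""
    v = np.asarray(values_db, float)
    v = v[np.isfinite(v)]
    hist, edges = np.histogram(v, bins=nbins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    omega = np.cumsum(hist) / hist.sum()         # probability of the dark class
    mu = np.cumsum(hist * centers) / hist.sum()  # first moment of the dark class
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b2 = np.nan_to_num(sigma_b2, nan=0.0, posinf=0.0, neginf=0.0)
    k = int(np.argmax(sigma_b2))  # last histogram bin of the dark (water) class
    return float(edges[k + 1])    # values below this edge -> water


# Synthetic scene: dark open water around -20 dB, brighter land around -8 dB.
water_db = np.linspace(-22.0, -18.3, 500)
land_db = np.linspace(-10.0, -6.0, 500)
scene = np.concatenate([water_db, land_db])
t = otsu_threshold(scene)
water_mask = scene < t
print(round(t, 2), int(water_mask.sum()))
```

Rerunning the sketch with, for example, 5 water values and 5000 land values shows how weakly the water class then influences the histogram.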

The temporal limitations of the composite reference mask led to varying validation performance across our monthly water masks. The reference datasets represent different temporal aggregations: annual (MapBiomas), multi-year averages (JRC GSW), and static mapping (LBA wetlands). Validation performance was best during the peak flood season and declined during low water periods. This is because the reference composite captures peak or average water occurrence rather than the monthly extent.

While our SWV estimates and the DAHITI dataset are referenced to different baselines, a strong temporal correlation showed that both methods capture the same underlying hydrological patterns. Applying a linear offset to account for this difference was essential for a direct comparison and validated our methodology’s ability to track dynamic changes in reservoir storage.

For our 1° × 1° regional estimates, we sought to decompose the GRACE-FO TWSA by combining our surface water estimates with soil moisture, snow, and canopy water data from GLDAS. Our findings show that the role of surface water in driving regional water storage changes is highly variable, and our methodology captures this variability.

- Manaus showed a strong correlation (r = 0.821) between SWA and TWSA. This indicates that in water-rich landscapes, surface water dynamics are important drivers of TWSA and must be quantified. The high RMSE and bias in this region imply that our observed components (SWS and GLDAS) do not fully describe TWS changes, which suggests a more substantial role for groundwater storage.
- In the Iquitos ROI, SWA showed almost no correlation with TWSA (r = 0.092). Soil moisture and canopy water from GLDAS, together with the estimated groundwater, drove the decline.
- In Acre, the GLDAS components were likewise the primary drivers of water storage changes, consistent with the relatively small amount of surface water in this region.
- Amazonas presented a mixed case, with significant contributions from both surface water and soil and canopy water.

Overall, this analysis confirms that no single partitioning applies to the TWSAs observed by GRACE-FO, because the dominant components differ from region to region. By generating SWS estimates that are independent of GRACE-FO, we provide a dataset for partitioning the overall signal more accurately. This, in turn, can help isolate groundwater contributions and ultimately improve the calibration of hydrological models for water cycle monitoring.

Estimating water levels from a DEM and water masks is a computationally efficient and scalable approach, well suited to analyses over large regions or extended time periods. However, its accuracy is limited in several ways.

- Water mask delineation is subject to systematic errors along shorelines, where mixed pixels, vegetation interference, and seasonal variation can lead to misclassification.
- DEM accuracy is a critical limiting factor, because vertical errors in the topographic dataset propagate directly into water level estimates. These errors are often amplified in low-relief areas such as floodplains and wetlands. There, small topographic uncertainties can lead to overestimating or underestimating inundated areas and their corresponding water depths.

Altimetry data provide crucial water level measurements for validation. Laser altimetry from ICESat-2 has shown remarkable precision, with reported RMSE values as low as 0.06 m for inland water monitoring. This makes it an invaluable tool for constraining and quantifying water level estimates [52]. To best assess the accuracy of our inferred water levels, we need a comparison methodology specifically designed to bridge the temporal and spatial gaps between the satellite's point-based measurements and our gridded outputs.
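A back-of-the-envelope calculation shows why low relief amplifies DEM errors. On a planar shore of slope s, a vertical error Δz shifts the mapped shoreline horizontally by roughly Δz / s. A uniform bias b over a water area A changes the computed volume by b × A. The slope and error values below are illustrative, not measured.

```python
def shoreline_shift_m(vertical_error_m, slope_m_per_m):
    """Horizontal shoreline displacement on a planar slope (m)."""
    return vertical_error_m / slope_m_per_m


def volume_bias_km3(dem_bias_m, water_area_km2):
    """Volume change from a uniform DEM bias over the water area (km^3)."""
    return dem_bias_m * water_area_km2 * 1e6 / 1e9  # m x m^2 -> m^3 -> km^3


# A 0.5 m error shifts the shoreline ~5 m on a 10% bank slope,
# but ~5 km on a floodplain sloping 10 cm per km.
print(shoreline_shift_m(0.5, 0.1), shoreline_shift_m(0.5, 1e-4))
# A 1 m bias over 100 km^2 of water changes the volume by 0.1 km^3.
print(volume_bias_km3(1.0, 100.0))
```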

SWV plays a non-negligible role in terrestrial water storage, and improving SWV estimates is a crucial first step toward investigating TWS components and the water cycle. Studies of water storage change sometimes omit surface water, yet this component can constitute up to 70% of TWSAs in humid and monsoon-dominated basins such as the Amazon [10,11]. The novel machine learning framework established here provides a robust method for creating accurate, multi-scale (10 m and 1°) time series of surface water storage.

5. Conclusions

This study was motivated by a clear research gap: the need for a unified, scalable framework to estimate surface water volume in complex environments such as the Amazon. There, standard DEMs can suffer from large vertical errors, and cloud cover and canopy limit water mapping from optical satellites. The proposed methodology combines a novel two-stage machine learning approach for DEM enhancement (an LGBM and Extra Trees ensemble) with SAR-based water detection. Our results demonstrate that it provides an effective solution to these interconnected challenges while maintaining computational performance suitable for large-scale applications. Validation with high-precision altimetry and external SWV estimates confirms the effectiveness of this method. It is a valuable contribution to remote water resource management, especially in data-limited regions.

Looking forward, our SWV estimates and the framework that generated them can be further improved for monitoring multi-scale water movement. We plan to continue analysis and validation over diverse regions in the US with additional ALS data and validation sources. For these new areas, we will further refine our processing pipeline and incorporate a sensitivity analysis to investigate error propagation. Ultimately, we will leverage our methodology to investigate water storage changes at local and regional levels.

Author Contributions

Conceptualization: L.A.M. and M.R.; methodology: L.A.M. and M.R.; software: M.R.; validation: M.R.; formal analysis: M.R.; investigation: M.R.; resources: L.A.M. and M.R.; data curation: M.R.; writing—original draft preparation: M.R.; writing—review and editing: L.A.M.; visualization: M.R.; supervision: L.A.M.; project administration: L.A.M.; funding acquisition: L.A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by NASA, through award 80NSSC22K0044.

Data Availability Statement

The original data presented in this study are openly available at the following sources: Sentinel-1 SAR imagery, Harmonized Landsat and Sentinel-2 (HLS) data, AW3D30 digital surface models, and ESA WorldCover land cover classifications were accessed through Google Earth Engine (https://earthengine.google.com/) (accessed on 29 June 2025). Original dataset references are available within the Earth Engine Data Catalog. The ICESat-2 ATL08 (DOI: 10.5067/ATLAS/ATL08.005) and ATL03 (DOI: 10.5067/ATLAS/ATL03.005) datasets were accessed using the SlideRule platform (https://slideruleearth.io/) (accessed on 29 June 2025) and are originally distributed by NASA’s National Snow and Ice Data Center (NSIDC). Airborne LiDAR surveys from the Amazon Sustainable Landscapes Project (2008–2021) are publicly available and were accessed through the REDAPE (Rede Amazônica de Inventários Florestais) portal (https://www.redape.dados.embrapa.br/) (accessed on 29 June 2025) [19,20,21,22,23,24,25,26]. GRACE-FO Terrestrial Water Storage Anomaly data (TELLUS_GRFO_L3_CSR_RL06.3_LND_v04_RL0) are available at NASA PO.DAAC (DOI: 10.5067/GFL30-LN004). GLDAS-derived TWS anomaly data (TELLUS_GLDAS-NOAH-3.3_TWS-ANOMALY_MONTHLY) are available at NASA PO.DAAC (DOI: 10.5067/TELLUS_GLDAS-3.3).

Acknowledgments

The authors would like to thank the research staff at the Center for Space Research for their support. They would also like to thank the scientists and engineers from ICESat-2. Additionally, the authors thank the anonymous reviewers for their comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

National Research Council. Thriving on Our Changing Planet: A Decadal Strategy for Earth Observation from Space; National Academies Press: Washington, DC, USA, 2018; ISBN 978-0-309-46757-5. [Google Scholar]
The United Nations World Water Development Report 2024: Water for Prosperity and Peace-UNESCO Digital Library. Available online: https://unesdoc.unesco.org/ark:/48223/pf0000388948 (accessed on 19 February 2025).
Terrestrial Water Storage (TWS). Available online: https://gcos.wmo.int/site/global-climate-observing-system-gcos/essential-climate-variables/terrestrial-water-storage-tws (accessed on 14 January 2025).
Save, H.; Bettadpur, S.; Tapley, B.D. High-Resolution CSR GRACE RL05 Mascons. J. Geophys. Res. Solid Earth 2016, 121, 7547–7569. [Google Scholar] [CrossRef]
Schwatke, C.; Dettmering, D.; Bosch, W.; Seitz, F. DAHITI–an Innovative Approach for Estimating Water Level Time Series over Inland Waters Using Multi-Mission Satellite Altimetry. Hydrol. Earth Syst. Sci. 2015, 19, 4345–4364. [Google Scholar] [CrossRef]
Yuan, C.; Zhang, F.; Liu, C. A Comparison of Multiple DEMs and Satellite Altimetric Data in Lake Volume Monitoring. Remote Sens. 2024, 16, 974. [Google Scholar] [CrossRef]
Rezende, I.; Fatras, C.; Oubanas, H.; Gejadze, I.; Malaterre, P.-O.; Peña-Luque, S.; Domeneghetti, A. Reconstruction of Effective Cross-Sections from DEMs and Water Surface Elevation. Remote Sens. 2025, 17, 1020. [Google Scholar] [CrossRef]
Guenther, E.; Magruder, L.; Neuenschwander, A.; Maze-England, D.; Dietrich, J. Examining CNN Terrain Model for TanDEM-X DEMs Using ICESat-2 Data in Southeastern United States. Remote Sens. Environ. 2024, 311, 114293. [Google Scholar] [CrossRef]
Cohen, S.; Raney, A.; Munasinghe, D.; Loftis, J.D.; Molthan, A.; Bell, J.; Rogers, L.; Galantowicz, J.; Brakenridge, G.R.; Kettner, A.J.; et al. The Floodwater Depth Estimation Tool (FwDET v2.0) for Improved Remote Sensing Analysis of Coastal Flooding. Nat. Hazards Earth Syst. Sci. 2019, 19, 2053–2065. [Google Scholar] [CrossRef]
Papa, F.; Frappart, F. Surface Water Storage in Rivers and Wetlands Derived from Satellite Observations: A Review of Current Advances and Future Opportunities for Hydrological Sciences. Remote Sens. 2021, 13, 4162.
Wang, S.; Li, J.; Russell, H.A.J. Methods for Estimating Surface Water Storage Changes and Their Evaluations. J. Hydrometeorol. 2023, 24, 445–461.
Di Baldassarre, G.; Schumann, G.; Bates, P.D.; Freer, J.E.; Beven, K.J. Flood-Plain Mapping: A Critical Discussion of Deterministic and Probabilistic Approaches. Hydrol. Sci. J. 2010, 55, 364–376.
Barichivich, J.; Gloor, E.; Peylin, P.; Brienen, R.J.W.; Schöngart, J.; Espinoza, J.C.; Pattnayak, K.C. Recent Intensification of Amazon Flooding Extremes Driven by Strengthened Walker Circulation. Sci. Adv. 2018, 4, eaat8785.
Has Climate Change Already Affected ENSO? NOAA Climate.gov. Available online: http://www.climate.gov/news-features/blogs/enso/has-climate-change-already-affected-enso (accessed on 17 September 2023).
Del Rio Amador, L.; Boudreault, M.; Carozza, D.A. Global Asymmetries in the Influence of ENSO on Flood Risk Based on 1,600 Years of Hybrid Simulations. Geophys. Res. Lett. 2023, 50, e2022GL102027.
Espinoza, J.-C.; Marengo, J.A.; Schöngart, J.; Jimenez, J.C. The New Historical Flood of 2021 in the Amazon River Compared to Major Floods of the 21st Century: Atmospheric Features in the Context of the Intensification of Floods. Weather Clim. Extrem. 2022, 35, 100406.
Tadono, T.; Ishida, H.; Oda, F.; Naito, S.; Minakawa, K.; Iwamoto, H. Precise Global DEM Generation by ALOS PRISM. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2014, II–4, 71–76.
Uuemaa, E.; Ahi, S.; Montibeller, B.; Muru, M.; Kmoch, A. Vertical Accuracy of Freely Available Global Digital Elevation Models (ASTER, AW3D30, MERIT, TanDEM-X, SRTM, and NASADEM). Remote Sens. 2020, 12, 3482.
Magruder, L.; Brunt, K.; Neumann, T.; Klotz, B.; Alonzo, M. Passive Ground-Based Optical Techniques for Monitoring the On-Orbit ICESat-2 Altimeter Geolocation and Footprint Diameter. Earth Space Sci. 2021, 8, e2020EA001414.
Neumann, T.; Brenner, A.; Hancock, D.; Robbins, J.; Saba, J.; Harbeck, K.; Gibbons, A.; Lee, J.; Luthcke, S.; Rebold, T.; et al. ATLAS/ICESat-2 L2A Global Geolocated Photon Data, Version 5; NASA National Snow and Ice Data Center Distributed Active Archive Center: Boulder, CO, USA, 2021.
Neuenschwander, A.; Pitts, K. The ATL08 Land and Vegetation Product for the ICESat-2 Mission. Remote Sens. Environ. 2019, 221, 247–259.
SlideRule v4.3.1 Documentation. Available online: https://slideruleearth.io (accessed on 18 October 2024).
ESA-Sentinel-1. Available online: https://www.esa.int/Applications/Observing_the_Earth/Copernicus/Sentinel-1 (accessed on 21 October 2024).
European Space Agency. Sentinel-2 MSI Level-1C TOA Reflectance; European Space Agency: Paris, France, 2022.
Keller, M.; Batistella, M.; Gorgens, E.B. LiDAR Survey on 250 Hectares in Jari, Amapá, Brazil, 2021; Brazilian Agricultural Research Corporation (EMBRAPA): Brasilia, Brazil, 2024.
Keller, M.; Batistella, M.; Gorgens, E.B. LiDAR Survey on 2400 Hectares in São Sebastião Do Uatumã, Amazonas, Brazil, 2021; SIDALC: Turrialba, Costa Rica, 2024.
Keller, M.; Batistella, M.; Gorgens, E.B. LiDAR Survey on 968 Hectares in GEDI, Mato Grosso, Brazil, 2021; SIDALC: Turrialba, Costa Rica, 2024.
Keller, M.; Batistella, M.; Gorgens, E.B. LiDAR Survey on 765 Hectares in Tumbira, Pará, Brazil, 2021; SIDALC: Turrialba, Costa Rica, 2024.
Dos Santos, M.N.; Keller, M.; Batistella, M. LiDAR Survey on 245.7 Hectares in Manaus, Amazonas, Brasil in 2017; SIDALC: Turrialba, Costa Rica, 2023.
Keller, M.; Batistella, M.; Gorgens, E.B. LiDAR Survey on 1257 Hectares in Reserva Ducke, Amazonas, Brazil, 2021; SIDALC: Turrialba, Costa Rica, 2024.
Dos-Santos, M.N.; Keller, M.M.; Morton, D.C. LiDAR Surveys over Selected Forest Research Sites, Brazilian Amazon, 2008–2018; ORNL DAAC: Oak Ridge, TN, USA, 2019.
Magruder, L.; Leigh, H.; Neuenschwander, A. Evaluation of Terrain and Canopy Height Products in Central African Tropical Forests. Int. J. Remote Sens. 2016, 37, 5365–5387.
Helmer, E.H.; Lefsky, M.A. Forest Canopy Heights in Amazon River Basin Forests as Estimated with the Geoscience Laser Altimeter System (GLAS); USDA Forest Service Proceedings RMRS-P-42CD; U.S. Department of Agriculture, Forest Service, Rocky Mountain Research Station: Fort Collins, CO, USA, 2006.
Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-Generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 2623–2631.
Sentinel-1 Algorithms. Google Earth Engine. Available online: https://developers.google.com/earth-engine/guides/sentinel1 (accessed on 30 October 2024).
White, L.; Brisco, B.; Dabboor, M.; Schmitt, A.; Pratt, A. A Collection of SAR Methodologies for Monitoring Wetlands. Remote Sens. 2015, 7, 7615–7645.
Tarpanelli, A.; Mondini, A.C.; Camici, S. Effectiveness of Sentinel-1 and Sentinel-2 for Flood Detection Assessment in Europe. Nat. Hazards Earth Syst. Sci. 2022, 22, 2473–2489.
Gulácsi, A.; Kovács, F. Sentinel-1-Imagery-Based High-Resolution Water Cover Detection on Wetlands, Aided by Google Earth Engine. Remote Sens. 2020, 12, 1614.
Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66.
Pires Ferreira Reservoir: General Info. Database for Hydrological Time Series of Inland Waters (DAHITI). Available online: https://dahiti.dgfi.tum.de/en/8671/ (accessed on 26 February 2025).
Poço Da Cruz Reservoir: General Info. Database for Hydrological Time Series of Inland Waters (DAHITI). Available online: https://dahiti.dgfi.tum.de/en/8702/ (accessed on 26 February 2025).
Nova Ponte Reservoir: General Info. Database for Hydrological Time Series of Inland Waters (DAHITI). Available online: https://dahiti.dgfi.tum.de/en/10351/ (accessed on 26 February 2025).
Jacarei Reservoir: General Info. Database for Hydrological Time Series of Inland Waters (DAHITI). Available online: https://dahiti.dgfi.tum.de/en/10345/ (accessed on 26 February 2025).
Monthly Gridded Global Land Data Assimilation System (GLDAS) from Noah-v3.3 Land Hydrology Model for GRACE and GRACE-FO over Nominal Months. Available online: https://podaac.jpl.nasa.gov/dataset/TELLUS_GLDAS-NOAH-3.3_TWS-ANOMALY_MONTHLY (accessed on 22 January 2025).
Cooley, S.W.; Smith, L.C.; Stepan, L.; Mascaro, J. Tracking Dynamic Northern Surface Water Changes with High-Frequency Planet CubeSat Imagery. Remote Sens. 2017, 9, 1306.
Pekel, J.-F.; Cottam, A.; Gorelick, N.; Belward, A.S. High-Resolution Mapping of Global Surface Water and Its Long-Term Changes. Nature 2016, 540, 418–422.
MapBiomas Amazonía Project. Collection 6 of Annual Land Cover and Land Use Maps, Version 1. Available online: https://amazonia.mapbiomas.org/ (accessed on 11 May 2025).
Hess, L.L.; Melack, J.M.; Affonso, A.G.; Barbosa, C.; Gastil-Buhl, M.; Novo, E.M.L.M. Wetlands of the Lowland Amazon Basin: Extent, Vegetative Cover, and Dual-Season Inundated Area as Mapped with JERS-1 Synthetic Aperture Radar. Wetlands 2015, 35, 745–756.
Pedreros-Guarda, M.; Abarca-del-Río, R.; Crétaux, J.F.; Paris, A. Long-Term Lake Water Levels in Central Chile Using Satellite Altimetry and Conceptual Hydrological Modelling Approaches. Adv. Space Res. 2025, in press.
Jesus, J.; Kuplich, T. Applications of SAR Data to the Estimate of Forest Biophysical Variables in Brazil. Cerne 2020, 26, 88.
Quick Facts: Mission. Available online: https://nisar.jpl.nasa.gov/mission/quick-facts (accessed on 31 December 2024).
Zhang, Z.; Bo, Y.; Jin, S.; Chen, G.; Dong, Z. Dynamic Water Level Changes in Qinghai Lake from Integrating Refined ICESat-2 and GEDI Altimetry Data (2018–2021). J. Hydrol. 2023, 617, 129007.

Figure 1. Flowchart of volume estimation approach.

Figure 2. Terrestrial water storage anomaly (TWSA) changes measured by GRACE-FO in April (wet season) and October (dry season) from 2019 to 2024 over South America. TWSA is relative to a 2004–2009 mean baseline. Red represents a negative anomaly (less than the baseline), while blue represents a positive anomaly. The consistent water loss in both wet and dry seasons around the southernmost part of the continent is due to melting of the Patagonian Icefields.
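Figure 2 reports the terrestrial water storage anomaly relative to a 2004–2009 mean baseline. The minimal Python sketch below illustrates what that means by subtracting the mean of all baseline-period samples from every sample. It assumes a plain dictionary keyed by hypothetical (year, month) tuples and an unweighted arithmetic mean. The GRACE/GRACE-FO processing behind the figure may define its baseline differently.

```python
from statistics import mean


def anomalies_vs_baseline(series, first_year, last_year):
    """Return {(year, month): value - baseline_mean}.

    series: dict mapping hypothetical (year, month) keys to storage values.
    The baseline mean is the unweighted mean of every sample whose year lies
    in [first_year, last_year], inclusive.
    """
    baseline = [v for (year, _month), v in series.items()
                if first_year <= year <= last_year]
    if not baseline:
        raise ValueError("no samples fall inside the baseline period")
    ref = mean(baseline)
    return {key: value - ref for key, value in series.items()}


# Toy values in metres of equivalent water height (illustrative only).
tws = {(2004, 4): 0.12, (2009, 4): 0.08, (2019, 4): 0.05, (2024, 10): -0.06}
anomalies = anomalies_vs_baseline(tws, 2004, 2009)
```

A positive result corresponds to the blue shading in Figure 2, and a negative result corresponds to the red shading.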

Figure 3. Along-track profile of ICESat-2 ground and canopy returns and AW3D30 heights (read right to left for descending orbit). In areas with dense canopy, AW3D30 fails to reach the ground.

Figure 4. Geospatial distribution of ICESat-2 data points (2-million-point subset). The data are roughly between 2°S and 13°S and between 75°W and 50°W.

Figure 5. Geospatial distribution of airborne lidar surveys. Flight lines are over field research sites in the Brazilian states of Acre, Amazonas, Bahia, Goiás, Mato Grosso, Pará, Rondônia, Santa Catarina, and São Paulo.

Figure 6. (a) AW3D30 raster and (b) available ICESat-2 ground-classified points. The ICESat-2 points are too sparse in certain areas to interpolate without a reference DEM.

Figure 7. (a) Error histogram of ICESat-2 heights relative to AW3D30. (b) Scatter plot of ICESat-2 terrain heights vs. AW3D30 heights. While there is a correlation between the two estimates, the range of errors between the two indicates a need to improve the initial DEM’s terrain estimates.

Figure 8. DEM ML training workflow. We use the output of Model 1 (predictions of ICESat-2 terrain heights) as an input to Model 2 to obtain a DEM with elevations more closely aligned with airborne lidar survey (ALS) heights. All other inputs are used in both models.

Figure 9. Imagery (left) and SAR (right) delineation of the Poço da Cruz Reservoir in 2018. The imagery-derived water indication is hindered by cloud cover (yellow) and shadow (dark blue, paired with clouds). In contrast, the SAR returns do not face such issues.

Figure 10. Multispectral imagery and SAR delineation of water bodies around Iquitos, Peru. In this more canopied region, EVI indicators can miss obscured or miscolored water and may misidentify built structures as water.

Figure 11. Water masking workflow. We use both VV and VH Sentinel-1 returns to best determine water extent.
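Figure 11 combines VV and VH backscatter to determine water extent, and the reference list cites Otsu's histogram-based threshold selection. The minimal sketch below shows one way the two could fit together. Otsu's method picks one threshold per polarization, and a pixel is labelled water only when both VV and VH fall below their thresholds, since smooth open water appears dark in both. Several details are assumptions for illustration and are not the exact Figure 11 workflow: the AND rule, the 256-bin histogram, and inputs that are already calibrated to dB and speckle-filtered.

```python
def otsu_threshold(values, bins=256):
    """Return the threshold that maximises between-class variance (Otsu, 1979).

    values: backscatter samples in dB; the histogram spans their min..max.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))

    best_t, best_var = lo, -1.0
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        # Class 0 = bins 0..i, class 1 = bins i+1..end.
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_var, best_t = between, lo + (i + 1) * width  # upper edge of bin i
    return best_t


def sar_water_mask(vv_db, vh_db):
    """Label a pixel as water when both VV and VH are darker than their Otsu thresholds."""
    t_vv, t_vh = otsu_threshold(vv_db), otsu_threshold(vh_db)
    return [vv < t_vv and vh < t_vh for vv, vh in zip(vv_db, vh_db)]
```

Requiring both polarizations to be dark trades some recall for precision. In vegetated floodplains, double-bounce scattering can brighten VV over inundated forest, so a stricter or looser rule may suit different regions.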

Figure 12. Location of reservoirs for volume comparisons and an example water mask for each.

Figure 13. Validation-case ROIs for GRACE-FO comparison in the Amazon Basin. These are 1° × 1° areas around Iquitos in Peru, as well as Amazonas, Acre, and Manaus in Brazil. They vary in the amount of surface water stored in each ROI, with Acre having the least amount of surface water and Manaus having the most.

Figure 14. (a) Error histogram of ICESat-2 heights relative to predicted values. (b) Scatter plot of ICESat-2 terrain heights vs. predicted ICESat-2 heights.

Figure 15. (a) Distribution of the initial error of ICESat-2 and AW3D30. (b) Final distribution of the error between predicted and actual ALS heights.

Figure 16. Classification error maps for Manaus. In the wet season, there are almost no false negatives (FNs), while they dominate in the dry season.

Figure 17. Pires Ferreira original and linearly adjusted time series comparisons.

Figure 18. Poço da Cruz original and linearly adjusted time series comparisons.

Figure 19. Nova Ponte original and linearly adjusted time series comparisons.

Figure 20. Jacareí original and linearly adjusted time series comparisons.

Figure 21. Iquitos water storage variations from 2018 to 2024. The top plot represents the original anomalies, the middle plot represents the anomalies with seasonal trends removed, and the bottom plots represent the GW anomaly trends derived from the other TWS components.

Figure 22. Amazonas water storage variations from 2018 to 2024. The top plot represents the original anomalies, the middle plot represents the anomalies with seasonal trends removed, and the bottom plots represent the GW anomaly trends derived from the other TWS components.

Figure 23. Acre water storage variations from 2018 to 2024. The top plot represents the original anomalies, the middle plot represents the anomalies with seasonal trends removed, and the bottom plots represent the GW anomaly trends derived from the other TWS components. The data gap observed for Acre exceeds the three-month threshold used for interpolation and is therefore left unfilled. This conservative approach avoids creating unreliable synthetic data over long periods.
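The Figure 23 caption describes a three-month limit on gap interpolation. The minimal sketch below implements that rule under two assumptions: the input is a regular monthly series with `None` marking missing months, and gaps are filled by straight-line interpolation between the bracketing valid months. The interpolation scheme itself is an assumption. Gaps at either end of the series have only one neighbour and are left unfilled.

```python
def fill_short_gaps(values, max_gap=3):
    """Linearly interpolate interior runs of None no longer than max_gap months.

    Longer runs, and runs touching either end of the series, are left as None.
    Returns a new list; the input is not modified.
    """
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        end = i  # first index after the gap
        gap_len = end - start
        if start > 0 and end < n and gap_len <= max_gap:
            left, right = out[start - 1], out[end]
            step = (right - left) / (gap_len + 1)
            for k in range(gap_len):
                out[start + k] = left + step * (k + 1)
    return out
```

For example, `[1.0, None, None, 4.0]` becomes `[1.0, 2.0, 3.0, 4.0]`, while a four-month gap would stay empty, as for Acre.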

Figure 24. Manaus water storage variations from 2018 to 2024. The top plot represents the original anomalies, the middle plot represents the anomalies with seasonal trends removed, and the bottom plots represent the GW anomaly trends derived from the other TWS components.
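Figures 21–24 show anomalies with seasonal trends removed. One common way to do this is to subtract a monthly climatology, meaning the mean of all values that share the same calendar month. The sketch below shows that approach as an assumption, not as the method used for the figures. Fitting annual and semi-annual sinusoids is a frequently used alternative.

```python
from statistics import mean


def remove_seasonal_cycle(months, values):
    """Subtract the mean value of each calendar month (1-12) from the series.

    months: calendar month numbers aligned with values. None values are
    skipped when computing the climatology and remain None in the output.
    """
    by_month = {}
    for m, v in zip(months, values):
        if v is not None:
            by_month.setdefault(m, []).append(v)
    climatology = {m: mean(vs) for m, vs in by_month.items()}
    return [None if v is None else v - climatology[m]
            for m, v in zip(months, values)]
```

This keeps long-term drift in the result, which is the component the trends in Table 10 describe.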

Figure 25. Difference between ICESat-2- and DEM-inferred water levels. The top image shows the height differences for the dry season, while the bottom image shows the height differences for the wet season. The differences between these estimates can be attributed to the availability of ICESat-2 data, water masking performance, and resolution/resampling.
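Figure 25 compares water levels inferred from ICESat-2 with levels inferred from the DEM. The sketch below makes the role of the water level concrete. It takes a DEM grid and a boolean water mask, estimates a DEM-based level as the median elevation of water pixels that border dry land, and integrates the storage between that level and the DEM surface over the mask. Several choices here are illustrative assumptions, and the paper's volume procedure (Figure 1) may differ: the shoreline-median rule, the clipping of negative depths, and the nominal 30 m × 30 m (900 m²) pixel area. An altimetry-derived level can be passed in place of the DEM-based one.

```python
from statistics import median

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def shoreline_level(dem, mask):
    """Median DEM elevation of water pixels that touch at least one dry pixel."""
    rows, cols = len(dem), len(dem[0])
    edge = []
    for i in range(rows):
        for j in range(cols):
            if not mask[i][j]:
                continue
            for di, dj in NEIGHBOURS:
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols and not mask[ni][nj]:
                    edge.append(dem[i][j])
                    break
    if not edge:
        raise ValueError("mask has no water pixels bordering dry land")
    return median(edge)


def water_volume_m3(dem, mask, level, pixel_area_m2=900.0):
    """Integrate (level - DEM) over water pixels, ignoring pixels above the level."""
    depth_sum = 0.0
    for dem_row, mask_row in zip(dem, mask):
        for z, is_water in zip(dem_row, mask_row):
            if is_water:
                depth_sum += max(level - z, 0.0)
    return depth_sum * pixel_area_m2
```

Because the volume is linear in the level over the wetted area, a level offset like those in Figure 25 shifts the estimate by roughly that offset times the inundated area.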

Table 1. Model features.

Feature Name	Source	Description
Ground-Classified Photon Heights	ICESat-2 ATL03	Height measurements of ground-classified photon returns (terrain).
Distance to Coast	AW3D30	Proximity to coastal region.
DEM Height	AW3D30	Height measurement of DEM.
Terrain Slope	AW3D30	Rate of elevation change.
Vegetation Cover	Sentinel-1 GRD	SAR-based classification of vegetation density.
EVI	Sentinel-2 MSI	Enhanced Vegetation Index, assessing vegetation health and coverage.
NDVI	Sentinel-2 MSI	Normalized Difference Vegetation Index, assessing vegetation health and coverage.
VV Standard Deviation	Sentinel-1 GRD	Variability in vertical transmit/vertical receive (VV) polarization, used to assess roughness.
VH Standard Deviation	Sentinel-1 GRD	Variability in vertical transmit/horizontal receive (VH) polarization, used to assess vegetation.
Location (X,Y,Z)	Coordinate data in Cartesian coordinates	Spatial representation of points converted to avoid distortion.
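Several Table 1 features are simple per-pixel or per-point transforms. The sketch below does two things:

- It computes NDVI and EVI from surface reflectances, using the widely used EVI coefficients G = 2.5, C1 = 6, C2 = 7.5, and L = 1. For Sentinel-2, near-infrared, red, and blue typically correspond to bands B8, B4, and B2.
- It converts latitude, longitude, and ellipsoidal height to Earth-centred Cartesian coordinates on the WGS84 ellipsoid.

Both the EVI coefficients and the use of WGS84 ECEF for the "Location (X,Y,Z)" feature are assumptions. Table 1 states only that locations were converted to Cartesian coordinates to avoid distortion.

```python
import math

# WGS84 ellipsoid constants (assumed datum for the Cartesian conversion).
WGS84_A = 6378137.0                  # semi-major axis [m]
WGS84_F = 1 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)   # first eccentricity squared


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from reflectances in 0..1."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with G=2.5, C1=6, C2=7.5, L=1."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def geodetic_to_ecef(lat_deg, lon_deg, h_m=0.0):
    """Convert geodetic latitude/longitude/height to ECEF X, Y, Z in metres."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # Prime-vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + h_m) * math.cos(lat) * math.cos(lon)
    y = (n + h_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + h_m) * math.sin(lat)
    return x, y, z
```

Cartesian coordinates keep neighbouring points close in feature space everywhere on the globe, which avoids the longitude wrap-around and convergence that raw latitude/longitude features introduce.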

Table 2. Reservoir location, area, and temporal overlap for comparison, as well as error metrics compared to our estimates. The reservoirs vary in size from roughly 20 km² to 276 km² [40,41,42,43]. The date overlap column refers to the period of overlap between the DAHITI water volume estimates and our calculated water volume estimates.

Reservoir	Latitude	Longitude	Avg Area [km²]	Date Overlap
Pires Ferreira	−4.225°	−40.4487°	44.454	2016-12 to 2020-03
Poço da Cruz	−8.4973°	−37.681°	22.607	2016-10 to 2019-12
Nova Ponte	−19.1291°	−47.3831°	276.11	2016-12 to 2020-03
Jacareí	−22.968°	−46.3516°	34.718	2016-12 to 2019-11

Table 3. Extents of our four ROIs for GRACE-FO comparison.

ROI	Min Latitude	Max Latitude	Min Longitude	Max Longitude
Iquitos	−4.0°	−3.0°	−74.0°	−73.0°
Amazonas	−5.0°	−4.0°	−69.0°	−68.0°
Acre	−10.0°	−9.0°	−68.0°	−67.0°
Manaus	−4.0°	−3.0°	−60.0°	−59.0°
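Table 3 defines each ROI as a 1° × 1° box. Comparing a surface water volume in km³ against GRACE-FO storage expressed as an equivalent water height in metres (as in Table 10) requires dividing by the ROI area. The sketch below uses the spherical-Earth area of a latitude/longitude box, A = R²·Δλ·(sin φ₂ − sin φ₁), with a mean radius of 6371 km. Both the spherical approximation and this volume-to-height conversion are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation


def box_area_km2(min_lat, max_lat, min_lon, max_lon):
    """Area of a latitude/longitude box on a sphere: R^2 * dlon * (sin lat2 - sin lat1)."""
    dlon = math.radians(max_lon - min_lon)
    dsin = math.sin(math.radians(max_lat)) - math.sin(math.radians(min_lat))
    return EARTH_RADIUS_KM ** 2 * dlon * dsin


def volume_to_ewh_m(volume_km3, area_km2):
    """Spread a volume evenly over an area: km^3 / km^2 = km, times 1000 for metres."""
    return volume_km3 / area_km2 * 1000.0


manaus_area = box_area_km2(-4.0, -3.0, -60.0, -59.0)  # Table 3 extent
```

Under these assumptions, the Manaus box covers about 12,300 km², so 1 km³ of surface water corresponds to roughly 0.08 m of equivalent water height.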

Table 4. Initial assessment of possible model architectures. We found that LGBM performed best for Model 1 (trained to predict ICESat-2 terrain heights), while Extra Trees performed best for Model 2 (trained to predict ALS terrain heights).

Dataset	Model	Points	Test RMSE (m)	Test MAE (m)	Test MAPE (%)	Holdout RMSE (m)	CV RMSE Mean (m)	Runtime (s)
Model 1 (subset)	Gradient Boosting	400,000	24.8143	16.9337	37.1727	32.9318	25.6186	1101.7
	Ridge	400,000	25.1981	16.6244	42.3386	33.1267	27.9791	0.64
	LGBM	400,000	25.4732	17.3843	32.6046	40.1545	21.1549	144.95
	Extra Trees	400,000	25.6704	17.764	32.4216	36.2144	20.7149	200.36
	Lasso	400,000	26.1356	16.984	53.7819	33.3548	28.7234	2.81
	ElasticNet	400,000	26.8239	17.3842	56.0406	33.9216	29.5481	1.99
	CatBoost	400,000	29.5375	20.2909	24.0245	43.4658	18.3456	132.39
	XGBoost	400,000	29.7489	20.2821	25.4909	39.7076	17.827	7.75
	RandomForest	400,000	30.7209	20.9816	35.9711	39.6489	13.7533	3647.28
Model 2	Extra Trees	18,805	1.4684	0.7352	0.1261	1.679	1.8851	4.53
	RandomForest	18,805	1.5718	0.8073	0.1318	1.946	1.9714	45.9
	XGBoost	18,805	1.8431	1.1379	0.2507	2.1834	2.2182	1.01
	LGBM	18,805	2.1521	1.3565	0.354	2.4626	2.3754	2.89
	CatBoost	18,805	2.2152	1.5176	0.4297	2.4711	2.642	19.7
	Gradient Boosting	18,805	2.5266	1.6664	0.4748	2.8236	2.7969	16.84
	Ridge	18,805	6.4828	4.7891	2.1388	6.3496	6.3454	0.13
	Lasso	18,805	8.9525	6.3691	3.1299	9.0292	8.7669	0.44
	ElasticNet	18,805	36.7624	29.1	14.2088	36.0068	36.5794	0.19
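Tables 4–7 report RMSE, MAE, and MAPE. For reference, the minimal sketch below gives the standard definitions, with errors taken as prediction minus reference and MAPE expressed in percent. Because MAPE divides each error by the reference value, references near zero inflate it. This is the effect noted for Manaus in Table 7.

```python
import math


def regression_metrics(y_true, y_pred):
    """RMSE and MAE in the units of the inputs, MAPE in percent."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Division by the reference value: undefined at 0, inflated near 0.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"rmse": rmse, "mae": mae, "mape": mape}
```

For example, a 1 m error against a 0.5 m reference elevation already gives a MAPE of 200%, even though the absolute error is modest.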

Table 5. Performance metrics and improvement percentage for Model 1. We assess the performance of the AW3D30 and corrected AW3D30 DEMs relative to ICESat-2 terrain heights.

Metric	Pre-Model 1	Post-Model 1	Percent Improvement
RMSE (m)	34.01 (11.95%)	11.38 (4%)	66.53%
MAE (m)	22.64 (7.95%)	7.37 (2.59%)	67.43%
MAPE (%)	14.13%	4.24%	70.01%

Table 6. Performance metrics and improvement percentage for Model 2.

Metric	Pre-Model 2	Post-Model 2	Percent Improvement
RMSE (m)	9.32 (1.7%)	1.978 (0.36%)	78.78%
MAE (m)	6.08 (1.11%)	0.93 (0.17%)	84.71%
MAPE (%)	3.46%	0.17%	95.09%
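The Percent Improvement columns in Tables 5 and 6 appear consistent with the relative reduction of each error metric. As a check, the RMSE rows computed from the tabulated (rounded) values give:

$$
\text{Improvement} = \frac{E_{\text{pre}} - E_{\text{post}}}{E_{\text{pre}}} \times 100\%
$$

$$
\text{Model 1: } \frac{34.01 - 11.38}{34.01} \times 100\% \approx 66.5\%, \qquad \text{Model 2: } \frac{9.32 - 1.978}{9.32} \times 100\% \approx 78.8\%
$$

These match the reported 66.53% and 78.78% to within rounding of the tabulated inputs.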

Table 7. Model 1 results for our specific ROIs. * The very large initial MAPE value for Manaus is due to the terrain elevation for this region being close to zero, which inflates the MAPE value.

ROI	ICESat-2 Coverage (%)	Initial RMSE (m)	Final RMSE (m)	Initial MAE (m)	Final MAE (m)	Initial MAPE (%)	Final MAPE (%)
Acre	10.84%	11.06	4.19	7.28	3.00	3.92%	1.73%
Amazonas	2.04%	16.14	2.74	13.82	1.71	12.20%	1.75%
Manaus	23.97%	9.63	2.86	6.99	2.11	226.63% *	3.01%
Iquitos	6.19%	13.43	2.49	10.61	1.63	8.03%	1.39%
Pires Ferreira	0.41%	31.59	1.39	31.53	1.01	17.20%	0.66%
Nova Ponte	9.12%	23.16	3.14	22.65	2.10	2.44%	0.30%
Poço da Cruz	2.77%	19.21	1.21	19.16	0.79	4.23%	0.18%
Jacareí	0.71%	10.50	4.55	8.897	2.63	1.03%	0.30%

Table 8. Comparison metrics for our SAR water mask assessment.

Metric	Season	Mean ± Std
F1 score	Dry	0.805 ± 0.075
F1 score	Wet	0.819 ± 0.081
IoU (Jaccard)	Dry	0.679 ± 0.098
IoU (Jaccard)	Wet	0.700 ± 0.100
Precision	Dry	0.898 ± 0.102
Precision	Wet	0.911 ± 0.091
Recall	Dry	0.745 ± 0.111
Recall	Wet	0.756 ± 0.102
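Table 8 summarises pixel-wise agreement between the SAR water masks and reference masks. The minimal sketch below shows how the four scores are defined for a single pair of boolean masks flattened to lists. The mean ± std values in Table 8 would then be taken across scenes. This section does not state whether "Std" is a sample or a population standard deviation.

```python
def mask_metrics(predicted, reference):
    """Pixel-wise scores for two equal-length boolean water masks (True = water)."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)       # water in both
    fp = sum(1 for p, r in pairs if p and not r)   # false water
    fn = sum(1 for p, r in pairs if r and not p)   # missed water
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}
```

For a single mask pair, IoU = F1 / (2 − F1), so the two rows carry closely related information. Precision above recall, as in Table 8, indicates that missed water (false negatives) is the more common error.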

Table 9. Reservoir comparison metrics.

Reservoir	Initial RMSE (km³)	Adjusted RMSE (km³)	Initial MAE (km³)	Adjusted MAE (km³)	Pearson’s Correlation Coefficient	p-Value
Pires Ferreira	0.2147	0.0461	0.1647	0.0364	0.9478	0.0000
Poço da Cruz	0.0211	0.0102	0.0176	0.0084	0.7914	0.0000
Nova Ponte	0.3215	0.1681	0.2406	0.1216	0.9687	0.0000
Jacareí	0.3964	0.0612	0.3891	0.0506	0.6347	0.0002
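Table 9 and Figures 17–20 compare original and linearly adjusted volume series with DAHITI. The sketch below assumes the adjustment is an ordinary least-squares fit of the DAHITI volumes on our estimates, reference ≈ a·estimate + b. The exact adjustment is not specified in this section. The sketch also shows why a single correlation column suffices: Pearson's r is unchanged by a linear rescaling with a positive slope, while RMSE and MAE are not. p-values such as those in Table 9 can be obtained with, for example, `scipy.stats.pearsonr`.

```python
import math
from statistics import mean


def linear_adjust(estimates, reference):
    """OLS fit reference ~ a * estimate + b; return (adjusted_estimates, a, b)."""
    mx, my = mean(estimates), mean(reference)
    sxx = sum((x - mx) ** 2 for x in estimates)
    sxy = sum((x - mx) * (y - my) for x, y in zip(estimates, reference))
    a = sxy / sxx
    b = my - a * mx
    return [a * x + b for x in estimates], a, b


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(x, y):
    return math.sqrt(mean((u - v) ** 2 for u, v in zip(x, y)))


def mae(x, y):
    return mean(abs(u - v) for u, v in zip(x, y))
```

Because the fit and the evaluation use the same samples, the adjusted RMSE and MAE are in-sample figures. They can only be lower than or equal to the unadjusted values.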

Table 10. Metrics for trend comparison between water storage components.

ROI	Trend GRACE-FO TWS (m/yr)	p-Value GRACE-FO TWS Trend	Trend GLDAS (m/yr)	p-Value GLDAS Trend	Trend SWS (m/yr)	p-Value SWS Trend	Trend Est. GWS (m/yr)	p-Value Est. GWS Trend
Iquitos	−0.0217	0	−0.0022	0	−0.0014	0.0095	−0.0287	0
Manaus	−0.0468	0.0034	−0.0021	0.0299	−0.0032	0.1919	−0.0218	0.0223
Acre	−0.0346	0	−0.0033	0.0034	−0.0013	0	−0.0169	0
Amazonas	−0.0326	0.0008	−0.0018	0.0011	−0.0022	0	−0.0209	0.0135
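Table 10 reports linear trends in metres per year, together with p-values. The minimal sketch below assumes an ordinary least-squares fit of each monthly series against time in decimal years, with missing months skipped. It returns the slope and its t-statistic. The corresponding two-sided p-value follows from a t-distribution with n − 2 degrees of freedom. `scipy.stats.linregress` reports it directly as `pvalue`.

```python
import math
from statistics import mean


def linear_trend(t_years, values):
    """OLS slope (units per year) and its t-statistic; None values are skipped."""
    pairs = [(t, v) for t, v in zip(t_years, values) if v is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three valid samples")
    mt = mean(t for t, _ in pairs)
    mv = mean(v for _, v in pairs)
    stt = sum((t - mt) ** 2 for t, _ in pairs)
    slope = sum((t - mt) * (v - mv) for t, v in pairs) / stt
    intercept = mv - slope * mt
    sse = sum((v - (intercept + slope * t)) ** 2 for t, v in pairs)
    se = math.sqrt(sse / (n - 2) / stt)  # standard error of the slope
    t_stat = slope / se if se > 0 else float("nan")
    return slope, t_stat
```

A negative slope with a large |t| (small p-value) corresponds to the significant drying trends in Table 10. The Manaus SWS trend (p ≈ 0.19) would not pass a 0.05 threshold.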

Table 11. Correlation metrics for water storage components.

ROI	RMSE (GRACE-FO vs. Sum) (m)	Bias (GRACE-FO − Sum) (m)	Corr (GRACE-FO vs. SWA)	Corr (GRACE-FO vs. GLDAS)	Corr (GLDAS vs. SWA)
Iquitos	0.0959	−0.0215	0.0917	0.524	0.1643
Manaus	0.3102	−0.2153	0.821	0.8178	0.4635
Acre	0.0619	0.0065	0.5947	0.9623	0.6303
Amazonas	0.177	0.0303	0.5577	0.7027	0.3569
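Table 11 compares GRACE-FO with a "Sum" of storage components. The sketch below makes three assumptions:

- Sum is GLDAS storage plus the surface water anomaly (SWA) for each month.
- Bias is the mean of GRACE-FO minus Sum.
- RMSE is the root-mean-square of that difference.

The sketch also keeps the monthly difference itself. That difference is the quantity a residual approach would attribute to groundwater, as in the GW panels of Figures 21–24. These definitions are assumptions, and inputs are taken as equivalent water heights in metres on common months.

```python
import math
from statistics import mean


def closure_metrics(grace_tws, gldas, swa):
    """Compare GRACE-FO TWS with Sum = GLDAS + SWA month by month (all in metres)."""
    total = [g + s for g, s in zip(gldas, swa)]
    residual = [tws - tot for tws, tot in zip(grace_tws, total)]
    return {
        "bias": mean(residual),                         # mean(GRACE-FO - Sum)
        "rmse": math.sqrt(mean(r * r for r in residual)),
        "residual": residual,                           # groundwater-like remainder
    }
```

A negative bias, as for Manaus and Iquitos, means the summed components exceed the GRACE-FO signal on average.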

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Renshaw, M.; Magruder, L.A. Advancing Scalable Methods for Surface Water Monitoring: A Novel Integration of Satellite Observations and Machine Learning Techniques. Geosciences 2025, 15, 255. https://doi.org/10.3390/geosciences15070255

AMA Style

Renshaw M, Magruder LA. Advancing Scalable Methods for Surface Water Monitoring: A Novel Integration of Satellite Observations and Machine Learning Techniques. Geosciences. 2025; 15(7):255. https://doi.org/10.3390/geosciences15070255

Chicago/Turabian Style

Renshaw, Megan, and Lori A. Magruder. 2025. "Advancing Scalable Methods for Surface Water Monitoring: A Novel Integration of Satellite Observations and Machine Learning Techniques" Geosciences 15, no. 7: 255. https://doi.org/10.3390/geosciences15070255

APA Style

Renshaw, M., & Magruder, L. A. (2025). Advancing Scalable Methods for Surface Water Monitoring: A Novel Integration of Satellite Observations and Machine Learning Techniques. Geosciences, 15(7), 255. https://doi.org/10.3390/geosciences15070255

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
