Systematic Water Fraction Estimation for a Global and Daily Surface Water Time-Series

: Fresh water is a vital natural resource. Earth observation time-series are well suited to monitor corresponding surface dynamics. The DLR-DFD Global WaterPack (GWP) provides daily information on globally distributed inland surface water based on MODIS (Moderate Resolution Imaging Spectroradiometer) images at 250 m spatial resolution. Operating on this spatiotemporal level comes with the drawback of moderate spatial resolution; only coarse pixel-based surface water quantiﬁcation is possible. To enhance the quantitative capabilities of this dataset, we systematically access subpixel information on fractional water coverage. For this, a linear mixture model is employed, using classiﬁcation probability and pure pixel reference information. Classiﬁcation probability is derived from relative datapoint (pixel) locations in feature space. Pure water and non-water reference pixels are located by combining spatial and temporal information inherent to the time-series. Subsequently, the model is evaluated for different input sets to determine the optimal conﬁguration for global processing and pixel coverage types. The performance of resulting water fraction estimates is evaluated on the pixel level in 32 regions of interest across the globe, by comparison to higher resolution reference data (Sentinel-2, Landsat 8). Results show that water fraction information is able to improve the product’s performance regarding mixed water/non-water pixels by an average of 11.6% (RMSE). With a Nash-Sutcliffe efﬁciency of 0.61, the model shows good overall performance. The approach enables the systematic provision of water fraction estimates on a global and daily scale, using only the reﬂectance and temporal information contained in the input time-series.


Introduction
Global freshwater resources are vital for life and our society. Only 2.5% of the earth's hydrosphere is fresh water [1]. Of this, 0.3% can be found on the surface as lakes, wetlands and rivers. This relatively small fraction is nonetheless highly important, as it constitutes an essential human resource and interacts with climate and biodiversity [2][3][4][5][6]. The quantification of surface water has become a major application in the field of remote sensing, since large and remote regions of interest (ROIs) can be investigated efficiently [7,8]. The utilization of spatial and temporal coherent time-series datasets furthermore offers unique opportunities to analyze variability, dynamics and trends of water bodies [9][10][11][12][13][14]. On a global level, such information can contribute significantly to the understanding of climate change [15] and the effects of human activities [16]. Furthermore, knowledge of water level variations (e.g., by altimetry time-series), in combination with surface water extent dynamics, enables the approximation of waterbody bathymetry, subsequently allowing the rapid inundation changes). Typically, related applications are limited to distinctive sites and have not been applied to comprehensive time-series data. This may be owing to the requirement of prior knowledge concerning target regions (e.g., land cover, a priori water/non-water state, elevation, endmember definition and samples for learning algorithms) in order to apply subpixel methods [78,82,84]. This includes machine learning techniques (e.g., decision trees [34], regression tree models [84], Gaussian Process Regression model [83]) used for water fraction estimation. Thus, the constraint of training samples for model building limits applications to distinct sites. In addition, neural networks have been applied for water fraction estimation [85], facing similar constraints as machine learning approaches. By implementing additional data (e.g., topographic, hydrographic data [60]), the downscaling of coarse resolution images to a finer resolution is possible. Limitations to downscaling techniques are due to the quality of original observations and ancillary data. A further technique is the fusion of moderate-and high-resolution images [86]. However, fusion techniques rely on the underlying assumption of spatiotemporal continuity, which is not given for rapid changes. Many applications feature unmixing methods to estimate water fraction, mainly based on the utilization of a linear mixture model [29,34,42,81,87]. Such approaches rely on the assumption that remotely sensed reflectance of a mixed pixel can be expressed as a linear combination of spectral endmembers. Linear spectral mixture modeling is often applied to hyperspectral data to determine the relative abundance of predefined endmembers. A constraint is given by the prerequisite of a priori endmember definition, which is critical for accurately extracting the fraction of endmembers [88]. Furthermore, a sufficient number of bands is necessary to solve decomposition equations [81]. For the global and systematic application of a linear mixture model, two constraints need to be considered [81,83]. Global endmembers have to be defined dynamically in space and time; otherwise, they are not representative. Additionally, the number of bands provided should be larger than the number of endmembers. MODIS only provides two bands at 250 m resolution; consequently, a rather simple and robust model has to be considered.
In this paper, we aim to systematically derive water fractions on pixel level, to improve the quantification capabilities of a MODIS-based earth observation time-series for inland surface water [31]. To achieve this in a consistent spatiotemporal manner, water fraction is estimated on the basis of classification probability and spatial pixel relations, including temporal coherences. Furthermore, we evaluate the optimal configuration of input features and parameter settings for the global utilization of a linear mixture model [89]. The performance of resulting subpixel water fraction estimates is assessed by comparison to reference data gained from classified higher resolution optical data (Sentinel-2 and Landsat 8). Finally, we show that water fraction estimates are able to enhance quantitative capabilities concerning mixed pixels.

Materials and Methods
For this study, we use reflectance data from MODIS, GWP water/non-water classifications, and higher resolution optical images (Landsat 8 and Sentinel-2) as input data. Input features for the linear mixture model are determined from GWP time-series information and MODIS RED and NIR reflectance. Generated water fraction maps are evaluated utilizing reference data gained from higher resolution remote sensing images. By comparison of model performance using different model parameters, the optimal setting considering all reference data is determined. Outcomes of the final model are then evaluated together with the original GWP time-series on pixel level ( Figure 1).

Data
The Global WaterPack (GWP) inland surface water time-series is based on bi-diurnal moderate resolution acquisitions of MODIS sensors Aqua (MYD09GQ) and Terra (MOD09GQ). A binary water mask is created in a dynamic thresholds-based procedure, utilizing NIR and NIR-RED band reflectance. Additionally, several auxiliary layers (multi spectral information, day-night difference, urban areas, relief shadows, thermal information) are used to refine classification output and mask clouds and ocean (MODIS MYD10A1, MOD10A1). To create a temporally consistent time-series, data gaps (mostly due to cloud cover) are interpolated by a moving window approach, which utilizes the closest past and future valid observations of a pixel. For this paper, we use GWP data of 23 MODIS tiles in conjunction with 32 spatiotemporal matches of higher resolution reference datasets. Suitable reference surface reflectance images are selected from Sentinel-2 (Level-2A) and Landsat 8 (OLI/TIRS Level 2) archives. For this study, ROIs are defined by intersections of MODIS tiles and selected Sentinel-2 /Landsat 8 tiles, which spread across different continents and climate zones to represent global variability ( Figure 2).

Generation of Reference Data
Reference data was generated based on Sentinel-2 and Landsat 8 images (Table A1). Spatiotemporal matches with MODIS data were selected, considering global distribution and data integrity (low cloud coverage (<1%) and completeness). Accordingly, a selection of 32 remote-sensing image pairs enabled the generation of approximately 622,900 km 2 of reference area (15,260,357 reference pixels). Consequently, a wide range of surface cover compositions accounted for method sensibility to variable surface reflectance properties. Higher resolution (30 m for Landsat 8, 10 m for Sentinel-2) water/non-water masks were generated by k-means segmentation [91,92] as well as tasseled cap transformation [93]. After unsupervised classification, all reference images underwent rigorous visual verification to ensure data quality. Subsequently, higher resolution water pixels were allocated to a larger MODIS pixel, in case their pixel centers intersected. To minimize projection related errors in the spatial matching of pixels, a MODIS pixel grid vector version was reprojected from sinusoidal projection to the according spatial reference system of the reference image. Finally, reference water fraction was determined by the ratio of allocated higher resolution water and non-water pixels ( Figure 3). Furthermore, a reference water fraction pixel is only valid if the accumulated pixel area of allocated higher resolution pixels amounts to at least 95% of the MODIS pixel area (~59,375 m 2 ).

Classification Probability
The classification probability of a remote-sensing product depends on the classification methodology (e.g., fuzzy or soft classification). To determine the classification probability of GWP, relative datapoint locations in feature space were utilized. For every daily GWP classification of sensor-specific data (MODIS Aqua and Terra), two thresholds (t NIR and t NIR-RED ) were derived from known water pixels. These water pixels were selected in the original product based on additional tile-specific information from MOD44W (MODIS Terra Land Water Mask) and MOD10A1/MYD10A1 products [31]. The resulting distribution of NIR and NIR-RED values within training areas allowed the determination of daily thresholds. To minimize the day-to-day noise, eventually an 8-day-mean temporal filter was used for smoothing. Accordingly, water and non-water pixels were discriminated based on NIR and RED reflectance. For GWP, classification probability relates to the distance between pixel data points and scene-specific dynamically derived thresholds in the NIR and NIR-RED feature space ( Figure 4).

Linear Mixture Model for Water Fraction Estimation
To facilitate systematic water fraction estimation, the linear mixture model of Sheng et al. [89] was utilized. Accordingly, the reflectance of a mixed pixel (R mix ) for VIS to SWIR wavelengths could be estimated by its fractional water coverage (WF) and the approximated reflectance of pure water (R water ) and non-water (R non-water ) pixels: solved for WF, Since this method was applied to Aqua and Terra data separately, outcomes were combined based on spatial data availability. By using information from both MODIS sensors, generally a higher quality of outcomes could be expected (e.g., MODIS Nadir Bidirectional Reflectance Distribution Function (BRDF)-Adjusted Reflectance dataset MCD43A4 [70]). Thus, in case both sensors were able to provide surface reflectance information for a pixel, the average of model outputs was used; otherwise a single estimate depicted the combined result. The number of included sensors (one, two or none) was stated in an additional reliability band of the output raster.

Feature Selection
R predictor variables in the linear mixture model can be substituted by specific features related to reflectance [34]. Aspiring to model application on a global level, we tested a variety of input parameters to determine the optimal setting, considering all ROI datasets ( Figure 1). As proxies for R mix , R water , and R non-water , eight features based on MODIS reflectance and GWP classification thresholds were derived (Table 1): Table 1. Input features for the linear mixture model.

Feature
RED NIR NIR-RED NIR/RED Distance NIR threshold Distance NIR-RED threshold Distance to threshold border Distance to threshold point 1 1 The threshold point is defined as the point of intersection between t NIR and t NIR-RED .
For feature selection, significant linear correlation (r > 0.5, p < 1%) in the majority of ROIs to known reference water fraction was a precondition. An example for a typical distribution of pixel data points with given reference water fraction is shown in Figure 5.
For application in the linear mixture model, features are expected to show lower values for water than for non-water pixels and vice versa. Consequently, distance features are signed negative for pixels in the GWP water classification window (area beneath t NIR and left of t NIR-RED , Figures 4 and 5).

Selection of Pure Pixel Reference Features
To enable a systematic approximation of pure water and non-water features (R water and R non-water ) we utilized spatial pixel relations and temporal information contained in the time-series. Pure water pixels are typically part of larger coherent waterbodies. According to this spatial relationship, a pixel that has water-covered neighbors is likely to contain pure water coverage itself. The same principle applies for the determination of pure non-water pixels. By considering the 8-pixel neighborhood of an evaluation pixel in the center, pure pixel candidates were evaluated. As optical MODIS observations are often rendered invalid by cloud coverage, we also determined monthly and yearly temporal averages (TAs) based on future and past observations of the evaluation pixel. For this, the ratio of classified water and valid observations for monthly (±15 days) and yearly (±182 days) temporal windows was calculated to yield water probability (pw). To also consider the reliability of averages, the ratio of valid observations and temporal window length (rw) was determined. Consequently, invalid observations in the pixel neighborhood were primarily substituted with monthly TA, given a valid observation in the monthly temporal vicinity. Otherwise, the yearly TA was used ( Figure 6): Finally, a purity probability (pp) and reliability (pr) were determined by averaging of all n pixels in the neighborhood as well as the center pixel: Hereby, valid water or non-water observations contributed with 100% or 0% probability, respectively, and 100% reliability. Subsequently, pixels could be considered as pure if several criteria were met. First, pure water candidate pixels must exhibit a pp of 100% (completely surrounded by neighboring water pixels), with a pr value of at least 95%. Conversely, pure non-water pixels had to show a pp of 0%. Furthermore, only pure pixels that had been classified as water or non-water based on a valid observation on the respective day were selected (Figure 7). Based on this pure pixel selection, samples for typical water/non-water feature values (Table 1) were derived. Subsequently, respective reference statistics, including mean, median, 75th and 25th percentile were determined from these sample feature values. By utilizing different numerical measures, more modest (25th percentile), optimistic (75th percentile) and balanced (mean, median) water fraction estimates were available for evaluation.

Performance Evaluation and Determination of Optimal Settings
Water fractions were evaluated on pixel level in 32 ROIs (Figure 1). Results for respective combinations of features and reference feature statistics were evaluated by determining Pearson's correlation (r), root mean square error (RMSE) and mean absolute error (MAE) with regard to reference data: where e is the difference between model estimate (x) and reference water fraction (y), and n is the number of data points. We chose two error metrics to further investigate error distribution. While the MAE gives the same weight to all errors, the RMSE penalizes higher variability in the distribution of error magnitudes by assigning more weight to larger error values [94]. Typically, the majority of pixels in an ROI exhibit no water coverage, biasing evaluation metrics towards zero-water fraction conditions. To reveal coverage-specific performance, we grouped spatiotemporal matches of model output and reference data in pixel coverage categories. Thus, separate assessments were conducted for reference pixels showing full water coverage, no water coverage and partial water coverage (0% < x < 100%). The distribution of all generated reference pixels is shown in Figure 8: To determine the best performing model configuration for respective coverage categories, all ROI-specific outcomes per MODIS sensor were averaged.

Evaluation According to Water Permanence Types
To assess the influence of pixel cover permanence on model performance, we determined the number of pixel state changes between water and non-water. This was conducted based on the GWP time-series for a time span of 6575 days (2003-2020). The resulting pixel variabilities of all pixels in the ROIs were grouped according to variability percentiles. Therefore, each selected variability range represents 1% of the available data, except the zero percentile (permanent pixels), which includes 94% of the data. Finally, the MAE of the respective model estimates and reference water fractions provides information on the influence of pixel change behavior on model performance.

Results
Best performing settings and corresponding results (lowest RMSE/MAE, highest correlation) are shown in Table 2.
The global evaluation showed that each coverage category favors a specific parameter setting. As most of the data is constituted by pure non-water pixels, the optimal setting for all data is biased towards this coverage category (distance threshold point feature and 25th percentile reference statistic). Larger errors occur in the partial coverage category, whereas pure pixel coverage can be approximated more accurately (average of 13% lower error).
Using the optimal settings for all data (distance to threshold point, 25th percentile), we evaluated model output together with raw (without refinement by auxiliary layers) binary GWP classifications (Figure 9).
The comparison shows higher accuracy of water fraction model estimates for pixels with partial water coverage. For entirely water or non-water covered pixels, binary GWP classifications average to lower MAEs than model estimations, as only falsely classified pixels lead to errors in pure pixel categories. The water fraction model tends to underestimate surface water in the pure water category (average of 16% MAE), and to a smaller extent overestimates water coverage for pure non-water reference pixels (average of 4% MAE). Differences between ROI-specific RMSE and MAE error statistics indicate a generally smaller variance in the distribution of error magnitudes for model estimations than for GWP. This is caused primarily by rare GWP misclassifications of pure pixels, which result in large outlier errors of 100%. Model and GWP performance in the partial water category is dependent on the ratio of pixel water and non-water coverage ( Figure 10).  Larger errors in GWP were found for pixels with a balanced water/non-water ratio, whereas model estimates show lower accuracy for nearly full pixels (0.8 < x ≤ 0.9 water fraction). A general skewedness towards low water fraction indicates a more reliable approximation of minorly water-covered pixels.
To assess the goodness of fit of the proposed approach, we also determined the Nash-Sutcliffe efficiency (NSE), percent bias (PBIAS) and RMSE, considering all datapoints given by the reference data.
A NSE of 0.61 indicates a generally good match between model and reference data [95,96]. With a negative PBIAS of −62.34%, the model tends overestimate actual water fraction [97].
The total RMSE of 0.11 indicates a good overall performance, considering a standard deviation of 0.19 of all reference water fractions [98]. The evaluation according to water permanence types showed larger MAEs for change-intensive pixels than for more permanent ones. In the range of moderate pixel variability (34 to 186 state changes in 6575 days), only minor differences in model performance can be observed. Figure 11 shows MAEs for pixels in specific pixel variability ranges, based on variability percentiles.  To illustrate the improvements achieved by water fraction estimation, Figure 12 shows a comparison of original GWP and higher resolution water masks, model and reference water fractions, as well as corresponding variability and true color information for the Songhua Lake and River.

Discussion
For geospatial raster datasets such as GWP, total surface water extent can be approximated by accumulation of water-classified pixels. Limitations given by the moderate spatial resolution of a product induce quantification errors. A single pixel in the highest spatial resolution of MODIS accounts for approximately 62,500 m 2 of classified area. Accordingly, relatively coarse spatial resolution results in discrete areal quantification steps for the estimation of the actually continuous extent of waterbodies such as lakes, reservoirs or rivers. This can lead to erratic changes in estimated surface water extent, given a change-pixel constitutes a relatively large part of a study site or waterbody. Furthermore, subpixel-scale water extent variations are able to move respective mixed pixel data points just across classification threshold boundaries, thus overemphasizing actual extent changes. As a consequence, derived temporal behavior can be misleading. The extent to which this affects estimations of total surface water area strongly depends on the investigation scale and waterbody characteristics. Accordingly, water fraction estimates become increasingly relevant if extent changes are a matter of small-area variations (few pixels to subpixel scale, e.g., shoreline pixels of relatively stable waterbodies).
We showed that by considering GWP classification probability and spatiotemporal time-series information, subpixel water fractions can be approximated in an automated fashion. Accordingly, subpixel-scale water or non-water surfaces are accounted for; they otherwise would be disregarded. Especially for waterbodies with jagged shoreline characteristics (large shore-to-area ratio) as shown in Figure 12, the approach enables an enhanced estimation of surface water extent compared to binary classification.
To accurately capture the distinct temporal extent dynamics of highly variable waterbodies (e.g., dam regulated reservoir), the residual water or non-water fraction of change-intensive shoreline pixels (Figure 12, left bottom) must not be neglected. Therefore, the presented approach enables a systematic provision of water fraction along with the product's spatiotemporal extent. Nonetheless, several arbitrary factors influence model accuracy. Since the relative pixel location in the NIR/NIR-RED feature space mainly determines its water fraction content, surface types with similar spectral properties in respective wavelengths can lead to an overestimation of water fraction (e.g., urban areas; Figure 12, top right). However, as problematic surface types are already masked by auxiliary layers in GWP, the handling of overestimations caused by urban areas, cloud shadows, burned areas or relief shadows is straightforward. On the other hand, water properties such as depth, in conjunction with ground materials, turbidity and vegetation content, can corrupt water fraction estimation. The other variable parameters in the linear mixture model are the feature reference values for pure water and non-water pixels. As they are determined from a selection of reference pixels, derived values per scene are dependent on the availability (e.g., by cloud cover location) and the current state (e.g., land cover type phenological state) of these pixels. Nevertheless, information for the determination of model input data is collected dynamically in time from a respective large-area MODIS tile. Consequently, the approach is robust to certain randomness, as a relatively large amount of pure pixel reference information is used to derive model parameters. If certain applications demand surface water estimation in more detail, a customization of model settings for specific tiles or even smaller ROIs is feasible. Similar strategies, along with the use of auxiliary data, have led to highly accurate applications in previous studies [29,34,81,82]. The effect of different model settings on the performance regarding specific pixel coverage categories can be seen when comparing the error scores of Table 2 and Figure 9. Here, better results can be achieved in pure water and partial water coverage categories if other parameter settings are chosen than the optimal setting for all data.
The comparison to reference water fractions derived from higher resolution data showed that the proposed model is able to improve quantification accuracy of the product. This results in a better approximation of the true water coverage of mixed water/nonwater pixels. Consequently, the ability to estimate residual contents in otherwise binary classified pixels is gained. The extent to which this improves the accuracy of surface water estimations depends on the relative area of partially water-covered pixels in an ROI, as well as the coverage ratio of those pixels ( Figure 10). Thus, applications focusing on known waterbodies and their areal dynamics in time benefit from more comprehensive consideration of mixed shoreline pixels ( Figure 12). The evaluation of optimal settings resulted in a modest (25th percentile reference statistic) approach, which promotes the estimation of minorly water-covered pixels ( Figure 10). Although GWP surpasses model performance in pure pixel categories ( Figure 9) considering MAE, larger RMSEs on the other hand suggest that the distribution of error magnitudes is more variable. Consequently, model estimations can be expected to not feature as extraordinarily large errors in single cases as GWP.
The evaluation of model performance according to water permanence types showed that higher error occurs for pixels that change their water/non-water state more frequently ( Figure 11). As the model accuracy is generally higher for pixels exhibiting near to complete water or non-water coverage (Figures 9 and 10), these findings correspond with the fact that change-intensive pixels are more likely to be in a transition state and therefore depict considerable water and non-water coverage. Furthermore, such pixels can be expected to not feature deep (blue) water. Consequently, actually (shallow) water-covered pixels do not exhibit distinct water-like reflectance properties, eventually falsifying model predictions.
On the other hand, relatively permanent pixels are more likely to show distinct water or non-water reflectance signatures.
The limitations of the evaluation results have to be considered critically. Clearly, the higher resolution of Landsat 8 and Sentinel-2 as well as the increased number of spectral bands enables a highly accurate classification of reference water and non-water pixels. However, discrepancies can emerge from projection-based deviations, difference in acquisition time (<12 h), pixel-based quantification and not ideally overlapping tile and pixel grids (diverging granules). Furthermore, sampling of reference datasets was very restrictive due to the high requirements to obtain reliable data. Consequently, only a fraction of global spatiotemporal surface composition (land cover, seasonal state) variability was considered.
For the estimation of extent dynamics of coherent waterbodies, a combined use of original binary product and water fraction estimates is recommended. As most inner pixels (not in contact with the shoreline) of a waterbody are completely water covered, higher accuracy can be expected for such pixels using a binary classification. On the other hand, water fraction information on shoreline pixels (outer fringe of the waterbody) can lead to a more accurate approximation of the true overall extent.
Using our approach, we addressed major limitations of systematic global water fraction estimation. Accurate definition of endmembers (pure pixels) in space and time could be achieved by utilization of the binary water/non-water classifications of GWP time-series and inherent temporal information. Consequently, the high temporal resolution of one day enables rapid monitoring of large and diverse areas in an automated fashion. However, this comprehensive spatiotemporal availability comes with the cost of a simplified linear unmixing formulation. We show the feasibility of our approach in diverse ROIs around the globe and that an improved estimation of surface water can be achieved compared to a hard classification.

Conclusions
Earth observation surface water time-series enable the quantification of surface water extent dynamics on a global level. Providing this information in high temporal resolution comes with the constraint of coarse spatial resolution. Consequently, quantification accuracy decreases due to mixed-pixel effects. By utilizing subpixel water fraction information, corresponding errors can be reduced.
We demonstrated a systematic approach to estimate subpixel water fractions for a global, daily surface water time-series dataset. Accordingly, shortcomings due to coarse spatial resolution are addressed. The approach is independent of external datasets, as only time-series inherent information is utilized. Thus, water fraction estimates are made available alongside the spatiotemporal resolution of the original time-series on the pixel level. In the case of the GWP surface water time-series product, this includes daily data provision for the global MODIS tile grid and the full temporal range of mid-2002 to the present day.
Accurate results were achieved in 32 ROIs across the globe by comparison to reference data derived from classified Sentinel-2 and Landsat 8 images. Accordingly, water fractions were validated for a total reference area of approximately 622,900 km 2 , considering various surface type compositions and climate zones. Based on these results, we determined the optimal model configuration for global application. The comparison of model output and original binary product showed that by accounting for the water fraction content of mixed pixels, the overall accuracy of the product on pixel level is enhanced (average of 11.6% lower RMSE).
Given the results of this study, earth observation surface water time-series products can be enhanced by the proposed methodology in order to achieve more detailed information on surface water extent on the subpixel level. For an extension to other remote-sensing surface water time-series, products have to feature inherent classification probability and temporal information. Currently, satellite remote-sensing platforms offer either high spatial or temporal resolution. Many higher spatial resolution datasets are unable to determine true timing or even occurrence of hydrological events (e.g., floods, droughts) due to longer data gaps in specific regions (e.g., tropical climates). By estimating water fractions systematically, as proposed in this paper, from sensors such as MODIS, more detailed results, including actual water coverage of mixed water/non-water pixels, are accessible in shorter intervals. This increased spatiotemporal availability enables the investigation of small-area and/or short-term changes on a global scale. Consequently, improved inferences on temporal surface water dynamics can be achieved.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.