Integrating Global Satellite-Derived Data Products as a Pre-Analysis for Hydrological Modelling Studies: A Case Study for the Red River Basin

Abstract: With changes in weather patterns and intensifying anthropogenic water use, there is an increasing need for spatio-temporal information on water fluxes and stocks in river basins. The assortment of satellite-derived open-access information sources on rainfall (P) and land use/land cover (LULC) is currently being expanded with the application of actual evapotranspiration (ET act) algorithms on the global scale. We demonstrate how global remotely sensed P and ET act datasets can be merged to examine hydrological processes such as storage changes and streamflow prior to applying a numerical simulation model. The study area is the Red River Basin in China and Vietnam, a generally challenging basin for remotely sensed information due to frequent cloud cover. Over this region, several satellite-based P and ET act products are compared, and performance is evaluated using rain gauge records and longer-term averaged streamflow. A method is presented for fusing multiple satellite-derived ET act estimates to generate an ensemble product that may be less susceptible, on a global basis, to errors in individual modeling approaches. Subsequently, monthly satellite-derived rainfall and ET act are combined to assess the water balance for individual subcatchments and types of land use, defined using a global land use classification improved based on auxiliary satellite data. It was found that a combination of TRMM rainfall and the ensemble ET act product is consistent with streamflow records in both space and time. It is concluded that monthly storage changes, multi-annual streamflow and water yield per LULC type in the Red River Basin can be successfully assessed based on currently available global satellite-derived products.

such data can be integrated in hydrological modeling procedures on the regional and global scale. In addition, Hain et al. [37] demonstrated how ET act retrieved from energy balance can be combined with an inferred local water balance to diagnose ancillary sources and sinks of moisture across landscapes, e.g., due to intensive irrigation or agricultural drainage, or access to shallow water tables.
In this paper, we aim to: (1) demonstrate how integrating satellite-derived P, ET act and LULC maps constitutes an important pre-analysis in the first stages of hydrological modeling; (2) show that consistency between hydrological variables is a way to evaluate and compare individual earth observation products, with a focus on five new global ET act products; and (3) evaluate the suitability of global satellite-derived data products for assessing water resources in a basin with challenging conditions for remote sensing. We present our case in the context of the transboundary Red River Basin in Southeast Asia, traditionally a problematic region for remote sensing because of weather patterns, but also a basin with pressing water management issues where limited international data sharing hampers a comprehensive understanding of basin water use and hydrology.

Study Area
The Red River Basin (Figure 1) can be roughly divided into an upstream half situated in the province of Yunnan in southern China and a downstream half in northern Vietnam, with a minor portion of less than 1% located in Lao PDR. Its total surface area is approximately 164,000 km². The Red River has two main tributaries: Da River (Lixian in Chinese) and Lo River (Panlong). The upstream part of the basin is largely forested, mountainous and sparsely populated. The delta of the Red River, downstream of the confluence of the three major branches, is a densely populated area of great importance to Vietnam for its agricultural productivity and economic activity.
Annual rainfall varies substantially across the Red River Basin, with values between 700 and 3000 mm found based on long-term station time series [38,39], while even local annual averages of over 4000 mm/year are reported [40]. Approximately 80% of this rainfall occurs in the months May to October, which comprise the wet season for both the Vietnamese and the Chinese portion of the basin [41]. The resulting variability of river discharge in space and time, as well as population growth, lead to substantial challenges related to flood control and water stress, particularly in Vietnamese territories [42].
Remote Sens. 2016, 8, 279
Water management options across the basin have increased with the construction of five large multi-purpose reservoirs in the Vietnamese Da River and tributaries of the Lo River, as well as manifold smaller hydropower dams in both China and Vietnam. However, this has also increased the need for spatiotemporal data on water availability to support reservoir management [43].
At the tail end of the basin, the Red River Delta has seen many centuries of human water management, from the construction of hydraulic works for protection from floodwaters to the support of irrigation by avoiding inflow of brackish water and enhancing land drainage, making use of tidal influences if possible. Three zones can be distinguished: the lowlands, midlands, and highlands, based on their elevation relative to the water table [44]. The spatial distribution of water resources across the Delta is unequal, with some areas approaching the minimum level of water availability required to "sustain life and agricultural production" [45]. Most of the surface area of the Delta is characterized by rice paddies for a major part of the year. Typically, two rice seasons are observed, an irrigation-dependent spring season and a rainfall-dependent summer season [46]. If irrigation water availability allows, farmers grow a third "dry" crop such as vegetables or maize during the October-February period, particularly in the highlands and midlands. Reuse of drainage water within irrigation schemes is substantial [47]. Still, non-consumed irrigation water is one of the main sources of aquifer recharge, and thus of industrial and domestic water supply [48]. The outflow from the complex stream network of the Red River Delta into the Gulf of Tonkin occurs through nine different outlets [49].

Land Use/Land Cover
The current application requires an accurate and recent LULC map covering northern Vietnam/southern China with a sufficient level of spatial detail, validity for a year within the past 10-15 years and distinguishing between classes relevant for the nature of water use, including a class for irrigated cropland. An overview of existing global LULC maps is provided by Mora et al. [19], with spatial resolutions ranging from mid-resolution (300-500 m) to lower resolution (≥1 km) products. In addition, the first high-resolution Landsat-based global LULC products are now also available [50,51]. The number of classes of the available LULC maps varies from 9 to 37, and years of coverage from 1992 to 2012. Based on the criteria mentioned above, in particular Globcover 2009 [52] and GLCNMO2008 [53] were identified as potentially suitable inputs to this study.
Accuracies of global LULC products were previously found to be in the range of 69%-87% [12]. Ongoing initiatives such as the Global Observation for Forest Cover and Land Dynamics (GOFC/GOLD) of ESA seek to enhance the quality of global LULC products. In the meantime, auxiliary satellite images from the public domain are helpful to enhance LULC maps for a specific region. We derived an optimized LULC map for the Red River Basin from a combination of existing LULC GSDPs and time series of freely available MODerate resolution Imaging Spectroradiometer (MODIS) satellite images [54], a proven methodology for improving the accuracy of LULC maps [55,56]. Regional-scale improvement of global land cover products, incorporating auxiliary data and a priori knowledge, leads to more accurate and actionable water accounting information.
The 300 m Globcover 2009 map was taken as the basis for the new LULC map. Although the spatial distribution of forested and shrubland classes seems in accordance with expert knowledge, the original Globcover 2009 product largely contains rainfed cropland pixels for the Red River Delta. This is erroneous when viewed against the abundant presence of irrigation infrastructure. However, there is a blurred line between rainfed and irrigated agriculture in the region, as the wet-season crop is likely rainfed in both classes, with water coming from rainfall or recession of seasonal floods [57]. The main distinctive feature between locations with a single, exclusively rainfed crop and multi-cropped areas with at least one irrigated cycle is therefore the occurrence of a winter and/or spring crop [58].
According to the Globcover 2009 validation report [59], irrigated pixels are regularly misclassified as other agricultural classes. Therefore, to correct the Globcover 2009 agricultural classes, first all cells containing >50% cropland were merged into a single cropland class. MODIS Normalized Difference Vegetation Index (NDVI) values within the merged cropland class during the spring season were decisive in distinguishing irrigated from rainfed agriculture. Pixels covered by clouds, as indicated by the MODIS pixel reliability layer, were omitted from this analysis. No gapfilling of individual images was performed, in order to only include pixels directly sensed by MODIS with sufficient quality. An average NDVI of at least 0.55 in the months March to May was used as a criterion for identifying irrigation, in accordance with the typical Red River Delta spring cropping cycle. A different cropping calendar was identified from NDVI time series analyses for the northern parts of the basin, with a pronounced peak during January. For this reason, a second precondition of a minimum NDVI of at least 0.55 in January was introduced to account for irrigation in the upstream portion of the basin. The underlying assumption is that an NDVI of 0.55 for cropland in the Red River Basin cannot be achieved in January or March-May by relying solely on rainwater.
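The NDVI-based decision rule described above can be sketched for a single cropland pixel as follows. This is a minimal illustration, not the authors' processing chain: the function name, the dictionary input and the combination of the two criteria with a logical OR are assumptions, as is the treatment of one year of monthly values; only the 0.55 threshold, the March-May and January windows, and the omission of cloud-flagged values without gap-filling come from the text.

```python
from statistics import mean

NDVI_THRESHOLD = 0.55  # threshold stated in the text


def classify_cropland(monthly_ndvi):
    """Label one merged-cropland pixel as 'irrigated' or 'rainfed'.

    monthly_ndvi maps month number (1-12) to an NDVI value, or to None
    where the pixel was cloud-flagged and therefore left out.
    (Hypothetical input layout, chosen for illustration only.)
    """
    # Spring criterion: average NDVI over March-May, skipping cloudy months
    # instead of gap-filling them.
    spring = [monthly_ndvi.get(m) for m in (3, 4, 5)]
    spring = [v for v in spring if v is not None]
    spring_irrigated = bool(spring) and mean(spring) >= NDVI_THRESHOLD

    # January criterion, intended for the upstream cropping calendar.
    january = monthly_ndvi.get(1)
    january_irrigated = january is not None and january >= NDVI_THRESHOLD

    # Assumption: either criterion alone is enough to flag irrigation.
    if spring_irrigated or january_irrigated:
        return "irrigated"
    return "rainfed"
```

A pixel with no cloud-free observations in either window falls back to the rainfed class in this sketch; a real workflow might instead mark such pixels as undetermined.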
In addition to the correction of the Globcover 2009 cropland classification, a visual assessment of the original map against high-resolution satellite imagery indicated an underestimation of urban area in the Red River basin. It was observed that the urban land use class of GLCNMO2008 is more realistic and these cells were therefore introduced to represent built-up area in the improved LULC map. As a final step, isolated pixels were filtered out using a GIS focal majority filter.
MODIS NDVI time series of three major classes in the final LULC map are displayed in Figure 2. While some noise is apparent due to the different cloud masks applied to each of the individual images, distinct temporal patterns are clearly identified. The second annual cropping season in the irrigated class is clearly visible when compared to the rainfed cropland. A third, less pronounced peak of irrigated NDVI values can be observed in the winter months. Year-to-year differences of winter and spring crop NDVI are illustrative of varying water availability. As is to be expected, average NDVI of the merged forested class remains relatively stable and high (>0.5) throughout the entire year.
The final, enhanced LULC map is depicted in Figure 3. Visual comparison with the recently released IWMI map of irrigation in Asia ([60], retrieved 19 November 2015) shows similar spatial distributions of rainfed and irrigated land. As the Red River Delta has been the main focus area of previous studies, availability of validation data is mainly limited to this area. The modifications to the original Globcover 2009 yield a total irrigated area of 869,029 ha in the 10 provinces that together make up the Red River Delta administrative region. Literature sources report irrigated acreages varying from 670,000 to 850,000 ha, although the exact spatial and temporal scope of these figures is not always specified in these studies [43,45,61-63]. These values are all somewhat lower than the acreage found in a recent Advanced Synthetic Aperture Radar (ASAR)-based study, reporting 1,180,000 ha of double-cropped rice for 2007-2011 [58], so some uncertainty persists. Overall, the new LULC map corresponds well with the majority of available information from other sources and suffices for the current purpose.

Rainfall
A spatially distributed monthly rainfall product is required which covers the Red River Basin for the last 10-15 years. Existing rainfall GSDPs with over 10 years of data in the period 2000 to present and a spatial resolution of ≤0.25 degree were downloaded and evaluated: the Tropical Rainfall Measuring Mission monthly best estimate (TRMM 3B43 v7), the global rainfall estimate based on the CPC MORPHing technique (CMORPH) and the Climate Hazards Group InfraRed Precipitation with Station dataset (CHIRPS v1.8). Since no readily available CMORPH monthly product exists, three-hourly data were aggregated to obtain monthly values. Table 1 presents the main characteristics of the rainfall GSDPs evaluated for the Red River Basin.
Table 1. Evaluated rainfall GSDPs for the Red River Basin. The basin-wide mean rainfall (µ) and year-to-year standard deviation (σ) are reported for the overlapping period (January 2003-December 2014). April-September and October-March rainfall statistics are listed separately to reflect the regional seasonality of rainfall.
In order to select the most accurate rainfall product for the target basin, the performance of each of the GSDPs was assessed by means of ground observations. Daily rainfall station data were purchased from the Vietnamese National Center for Hydro-Meteorological Forecasting (NCHMF) and downloaded from the NOAA Global Summary of the Day (GSOD) database, as distributed by the National Climatic Data Center (NCDC). In total, multiple years of rainfall data for 76 gauges were available for GSDP validation. Figure 1 indicates the location and amount of data available for each station. A full list of all rain gauges can be found in Table A1. Data from 62% of these stations are not provided in the public domain and are therefore particularly suitable for validation, since the TRMM and CHIRPS algorithms incorporate a calibration procedure based on open-access rainfall gauge measurements.
Nevertheless, it was decided to also include public GSOD data in this validation exercise as otherwise no validation data from Chinese territories would be available. Figure 4 shows plots of satellite-derived monthly rainfall data against rain gauge measurements. Of the three evaluated products, the TRMM regression line is closest to the line of 1:1 correspondence, followed by CHIRPS and CMORPH respectively. A few outliers are clearly visible, where high gauged rainfall amounts do not correspond with satellite-derived estimates. These were all recorded at the Bac Quang station. It is unclear if this signifies a problem with the measurement station or the GSDPs. However, as these 10 points make up only a minor portion of the total number of monthly rainfall values evaluated (10,368), their impact on further analyses is negligible. The error in monthly rainfall estimates for each of the products is further evaluated in Figure 5. With −5.83 mm, CHIRPS has a slightly smaller mean error than TRMM, while CMORPH monthly rainfall estimates are, on average, furthest from measured values.
It is interesting to note that, although the CHIRPS mean error is lower than the TRMM mean error, the standard deviation of the CHIRPS error is higher as a result of the number of months with large error values. Table 2 lists a number of other commonly used validation statistics. These indicate a favorable performance of TRMM in terms of the relationship between measured and estimated values (r), the relative mean absolute error (RMAE), and the predictive power of the algorithm relative to the gauged mean (Nash-Sutcliffe coefficient).

[Table 2: Indicator | Formula | CHIRPS | TRMM | CMORPH]
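The indicators named for the rainfall validation (mean error, r, RMAE and the Nash-Sutcliffe coefficient) can be sketched as below for paired monthly gauge and satellite values. The formulas of Table 2 are not available here, so the definitions used are assumptions: the error is taken as estimate minus gauge, and RMAE is taken as the mean absolute error divided by the mean gauged value. The function name and the dictionary keys are hypothetical.

```python
from math import sqrt


def validation_stats(observed, estimated):
    """Compare satellite estimates with gauge observations (equal-length lists)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n

    # Assumed sign convention: positive error means the product overestimates.
    errors = [e - o for o, e in zip(observed, estimated)]
    mean_error = sum(errors) / n

    # Sums of squared deviations and cross-products for Pearson's r.
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_est = sum((e - mean_est) ** 2 for e in estimated)
    cross = sum((o - mean_obs) * (e - mean_est)
                for o, e in zip(observed, estimated))
    r = cross / sqrt(ss_obs * ss_est)

    # Assumed RMAE definition: mean absolute error relative to the gauged mean.
    rmae = (sum(abs(err) for err in errors) / n) / mean_obs

    # Nash-Sutcliffe: 1 minus squared error relative to variance around the gauged mean.
    nse = 1.0 - sum(err ** 2 for err in errors) / ss_obs

    return {"mean_error": mean_error, "r": r, "rmae": rmae, "nse": nse}
```

A Nash-Sutcliffe value of 1 indicates a perfect match, while a value of 0 means the product predicts no better than the gauged mean itself.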
Based on the findings discussed above, TRMM was identified as the most suitable GSDP for describing monthly rainfall in the Red River basin. This is in line with earlier findings that TRMM is the most favorable option for satellite-derived rainfall on the monthly scale in an area in southern China [67], and a successful application of TRMM precipitation in a modeling study in central Vietnam [68]. Apparently, for the Red River Basin, the higher spatial resolution of the CHIRPS product does not lead to a more accurate assessment of rainfall when compared to the point scale. It should be noted that some of the GSOD stations used for validation may also have been part of the TRMM and CHIRPS algorithms, whereas CMORPH is uncorrected for station values.
It is often decided to perform a bias-correction of rainfall GSDPs based on ground observations. However, special attention should go to the issue of scale when comparing point measurements of rainfall gauges to coarse pixels [69]. Naturally, a 25 km pixel can be quite heterogeneous, e.g., in terms of topography, and different rainfall rates may occur over short distances within a grid cell. Vernimmen et al. [70] discuss in detail how the presence of multiple ground stations within a grid cell enhances opportunities for validation. In the Red River Basin, five TRMM pixels were identified containing two rainfall stations (Figure A1). The records of these gauges were averaged per pixel and plotted against TRMM values. This resulted in a slope of the fitted line of 0.97 (Figure A2). This increase relative to 0.93 (Figure 4) indicates that performance of TRMM seems satisfactory in terms of representing intra-pixel variability. Although the sample size is insufficient to draw any definitive conclusions, this brief analysis does not provide a reason for assuming that a point-based bias correction would improve the 25-km TRMM rainfall estimate.
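The per-pixel comparison described above, in which gauge records within one TRMM pixel are averaged and a line is fitted against the TRMM values, can be sketched as follows. The text does not state how the line was fitted, so ordinary least squares with an intercept is an assumption here, and both function names are hypothetical.

```python
def pixel_mean_series(gauge_series):
    """Average several gauge records month by month.

    gauge_series is a list of equal-length lists, one per gauge located
    in the same satellite pixel.
    """
    return [sum(values) / len(values) for values in zip(*gauge_series)]


def ols_slope(x, y):
    """Slope of an ordinary least-squares line y = a + b*x (assumed fit type)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Illustrative values only, not data from the study.
gauge_a = [100.0, 200.0, 300.0]
gauge_b = [120.0, 180.0, 320.0]
trmm = [105.0, 185.0, 300.0]
slope = ols_slope(pixel_mean_series([gauge_a, gauge_b]), trmm)
```

A slope close to 1 would indicate that the pixel value tracks the averaged gauges without systematic over- or underestimation.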

Available ET act Products
While the network of rain gauges in the Red River Basin is sufficient to arrive at a well-informed choice of an optimal GSDP for precipitation, this is unfortunately not the case for evapotranspiration. No network of ET act measurements is available for the Red River Basin, limiting the foundation for selecting a single ET act GSDP. We therefore take an ensemble approach to defining ET act across the basin, combining information from multiple GSDPs.
In this study, five ET act products were evaluated with a coverage of the Red River Basin at a spatial resolution of ≤5 km with a time series of over 10 years: the MODIS Global Terrestrial Evapotranspiration Product (MOD16, [22]), the Operational Simplified Surface Energy Balance (SSEBop, [23]), the revised Surface Energy Balance System (SEBS, [25]), CSIRO MODIS Reflectance Scaling actual ET (CMRSET, [24]), and the Atmosphere-Land Exchange Inverse (ALEXI) water and energy budget model [71]. Although these products all use MODIS satellite data to some extent, their fundamental modeling strategies are markedly different. SSEBop and SEBS rely on MODIS land surface temperature (LST) data for determination of the latent heat flux. ALEXI uses a similar approach but integrates a range of different spaceborne data sources. CMRSET combines a vegetation index for estimating photosynthetic activity with shortwave infrared reflections to estimate vegetation water content and presence of standing water. MOD16 follows the Penman-Monteith logic and relies on visible and near-infrared data to account for Leaf Area Index (LAI) variability. The latter is currently the only global product that has been tested and reviewed in a substantial number of scientific articles [29]. For a detailed description of each of the ET act algorithms, the reader is referred to the citations listed in Table 3.
Table 3. Properties of evaluated ET act products for the Red River Basin. The basin-wide mean ET act (µ) and year-to-year standard deviation (σ) are reported for the overlapping period (January 2003-December 2012). Temporal coverages indicate the time series of each product that were (made) available for this study.
ALEXI is the only model for which no preprocessed monthly product was available. Therefore, weekly values were aggregated to monthly maps, with ET act during weeks overlapping two months being proportionally divided over these months.
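The proportional division of weekly ALEXI totals over months can be illustrated with the sketch below. It assumes that "proportionally" means in proportion to the number of days of the week falling in each month, with ET spread evenly over the seven days; the function name and the input layout are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_to_monthly(weekly_et):
    """Aggregate weekly ET totals (mm) to monthly totals (mm).

    weekly_et is a list of (week_start_date, et_mm) pairs, each covering
    seven consecutive days starting at week_start_date.
    """
    monthly = defaultdict(float)
    for start, et_mm in weekly_et:
        daily = et_mm / 7.0  # assumption: ET is uniform within the week
        for offset in range(7):
            day = start + timedelta(days=offset)
            # Each day contributes its share to the month it falls in.
            monthly[(day.year, day.month)] += daily
    return dict(monthly)


# A week starting 29 January 2003 has 3 days in January and 4 in February,
# so a 14 mm weekly total is split into 6 mm and 8 mm.
example = weekly_to_monthly([(date(2003, 1, 29), 14.0)])
```

Applied per pixel, the same weighting would turn a stack of weekly maps into monthly maps.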
Maps of annually averaged ET act for the Red River Basin in 2003-2012 retrieved from the five aforementioned methods can be found in Figure A3. Table 3 lists the basin-averaged ET act according to the individual products. The annual average ET act in 2003-2012 falls within a range of 268 mm, with SSEBop on the low end and SEBS on the high end of the values. It is interesting to note that the standard deviation of seasonal sums in the dry season is higher than in the wet season for all products. This reflects the different ways in which the algorithms simulate evapotranspiration under stressed conditions; during the rainy season, ET act will likely equal ET pot most of the time. None of the retrieved annual ET act amounts conflict with reported values for reference evapotranspiration in the Red River Basin [39], or with the basin annual average potential evapotranspiration (ET pot ) of 1306 mm according to a 1 km global dataset on long-term average monthly ET pot distributed by CGIAR [74].
Karimi and Bastiaanssen [12] report a mean absolute percentage error of 5.4% for remote sensing-based ET act estimations. However, the range of values in Table 3 indicates that algorithms developed for the global scale yield substantially different outlooks on the Red River Basin water balance. This is also visible when comparing the spatial patterns in Figure A3. Specific locations where ET act values of the different products correspond or contradict can be observed in Figure A4, where a spatial depiction of the coefficient of variation (CV) in annual average ET act is provided per pixel. The highest CV values are observed in areas with high elevation along some of the subbasin boundaries, where especially SEBS deviates from the other GSDPs (see Figure A3). A high CV is also found in the coastal zone, possibly caused by differing methodologies for dealing with standing water, or differences in applied land/water masks.
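The per-pixel coefficient of variation across products can be sketched as follows, with each product represented as a small grid of annual average ET act values. The use of the population standard deviation is an assumption, since the text does not specify which estimator was used, and the function names and grid layout are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population form, assumed)."""
    return pstdev(values) / mean(values)


def cv_map(product_grids):
    """Per-pixel CV across several products.

    product_grids is a list of 2-D lists (rows x columns), one grid per
    product, all with the same shape and units (e.g. mm/year).
    """
    n_rows = len(product_grids[0])
    n_cols = len(product_grids[0][0])
    return [
        [coefficient_of_variation([grid[r][c] for grid in product_grids])
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Illustrative values only: three products on a 1 x 2 grid.
grids = [[[800.0, 900.0]], [[1000.0, 900.0]], [[1200.0, 900.0]]]
cv = cv_map(grids)
```

Pixels where all products agree yield a CV of 0, while larger values flag locations where the products diverge.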
Examining the monthly variability of ET act for different LULC classes against a priori knowledge is a way to further evaluate the five models. Figure 6 shows how monthly ET act varies for three major land use types: irrigated cropland, rainfed cropland and the merged forested classes. In general, the different products agree reasonably well in terms of temporal patterns in monthly ET act , and no clear discrepancies are observed in relation to known monthly rainfall patterns. The least temporal variation is observed in CMRSET, and the highest in SEBS followed by SSEBop. Rainfed agriculture has generally the lowest ET act of these three LULC classes, according to all products. It is found that all models compute a reduction in the difference between the rainfed and irrigated classes as the wet season progresses. This is to be expected to a certain extent, as rainfed crops will have access to sufficient water during this period. The difference remains the largest in SSEBop ET act , whereas almost full convergence of the rainfed and irrigated CMRSET curves occurs from July onwards. MOD16 is the only model that predicts ET act to be highest for the forest class throughout the year. ALEXI and CMRSET predict a very similar time series for the forested and irrigated classes, which may seem surprising as the physical conditions of these ecosystems are rather different. However, both forest and irrigated crops have access to ancillary moisture unavailable to rainfed crops (the forests due to deeper rooting depths), and with the current information it is difficult to determine which of the five temporal curves for these LULC types are most realistic. Despite the differences between products, Figure 6 does not provide sufficient basis for excluding any of the ET act models from further analyses. 

Solving the Water Balance to Evaluate ET act
As the information available for the Red River Basin is insufficient to verify the quality of the ET act products independently from rainfall, TRMM data were used in combination with streamflow records to check the closure of the water balance:
Q = P − ET act − ∆S
where Q is measured river discharge and ∆S is the change in catchment storage. Fundamental hydrological principles and the law of mass conservation dictate that, over a number of hydrological years, the rainfall surplus (P − ET act ) should equal Q at the downstream end of a catchment. In this study, the storage change over a period of 10 years is assumed to be negligibly small. Time series of daily river discharge were purchased from the NCHMF for the hydrological stations indicated in Figure 1. Metadata of these stations are provided in [49,75]. First, a preliminary check of the reliability of these Q data was performed by checking the consistency of temporal patterns between upstream and downstream stations in the same river branch (Figure A5). Although the upstream and downstream stations in the Da and Lo basins follow approximately the same pattern, the time series for the Thao River are quite different. For the years 2004 and 2005, hardly any runoff seems to be generated in the largely forested area of 14,000 km² between Lao Cai and Yen Bai, while in 2003-2004 Q measurements downstream are even lower than upstream (in other words, net consumption seems to occur), which is impossible given the size and dominant LULC types of the area. As the Yen Bai discharge curve corresponds well with temporal patterns observed at other stations, it was decided to eliminate Lao Cai from further analyses. Averaged over the overlapping period of records, Yen Bai, Vu Quang and Hoa Binh, the three downstream stations in the subbasins, measure 92.8% of the total runoff at Son Tay. This is according to expectations, with the remaining 7.2% generated in the small intermediate area.
In short, the analysis of streamflow records yields sufficient confidence in all available measurement stations, with the exception of Lao Cai.
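The two reliability checks on the discharge records can be sketched as follows: flagging years in which a downstream station records less flow than its upstream neighbour, and computing the share of outlet discharge accounted for by the tributary gauges. Function names and all discharge values are hypothetical and are not the NCHMF records.

```python
def inconsistent_years(upstream, downstream):
    """Return years in which downstream annual discharge is below upstream.

    Both arguments map year -> annual discharge (any common unit).
    """
    shared = sorted(set(upstream) & set(downstream))
    return [year for year in shared if downstream[year] < upstream[year]]


def gauged_fraction(tributary_flows, outlet_flow):
    """Share of outlet discharge accounted for by the tributary gauges."""
    return sum(tributary_flows) / outlet_flow


# Hypothetical annual discharges (km3/year), not the NCHMF records.
upstream = {2003: 14.0, 2004: 12.5, 2005: 13.0}
downstream = {2003: 13.2, 2004: 12.6, 2005: 16.0}
flagged = inconsistent_years(upstream, downstream)
```

A flagged year signals apparent net consumption between two stations, which then has to be judged against the size and land use of the intermediate area.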
It was decided to use long-term streamflow at one downstream gauging station to assess the area-averaged ET act . Son Tay is the obvious choice, as it is located downstream of the confluence of the main tributaries and upstream of the Red River Delta, the main area of water demand. ET act upstream from Son Tay was compared against TRMM rainfall and measured streamflow in Table 4. Hydrological years were defined from 1 April until 31 March of the subsequent calendar year, in order to include one full wet and dry season. Using this precipitation and streamflow dataset, SSEBop shows the best performance over this basin in terms of accordance with the laws of mass conservation, overestimating P minus Q by only 3.4%. For all other ET act products, values are found to exceed P minus Q with a range of 14.0% (CMRSET) to 34.3% (SEBS).
Table 4. TRMM rainfall (P), measured streamflow at Son Tay (Q) and ET act from each of the products for the overlapping period of hydrological years. Only the area upstream of the gauging station has been considered.
It is important to realize that the aforementioned differences between P minus Q and ET act are not only a product of uncertainties in satellite-derived P and ET act . A variety of factors cause a potentially significant uncertainty in streamflow records, with errors of 10%-20% not uncommon for single observations [76-78]. In the Red River Basin, local stage-discharge relations may become outdated after a number of years, depending on geology, in-stream sand mining and changes in erosion-sedimentation patterns due to reservoir construction. Specifically for the Son Tay gauging station, an error of 10%-15% in streamflow values was reported in 2014 [49]. Since the SSEBop retrieval of ET act falls well within this range of accuracy, we assume that it represents the upstream conditions most accurately in terms of absolute ET act .
Still, the outcomes of such assessments should be regarded as comparative analyses, rather than absolute validation exercises.
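The closure test can be written out as a small sketch: months are assigned to hydrological years running from 1 April to 31 March, and the deviation of ET act from P minus Q is expressed as a percentage of P minus Q. The function names and the input values are hypothetical; they are not the values of Table 4.

```python
def hydrological_year(year, month):
    """Label of the hydrological year (1 April to 31 March) for a month."""
    return year if month >= 4 else year - 1


def closure_error_percent(p_mm, q_mm, et_mm):
    """Percent by which ET act exceeds (positive) or undershoots P minus Q."""
    surplus = p_mm - q_mm
    return 100.0 * (et_mm - surplus) / surplus


# Hypothetical long-term means in mm/year, not the values of Table 4.
error = closure_error_percent(p_mm=1000.0, q_mm=400.0, et_mm=630.0)
```

With these inputs, P minus Q is 600 mm/year and ET act exceeds it by 30 mm/year, i.e., an overestimation of 5%.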

Construction of an Ensemble ET act Product
While P minus Q comparisons provide a means for assessing general reasonability of ET act retrievals at basin scales, they provide no information about the relative model accuracy in spatially distributing ET act . Each of the algorithms incorporates different inputs, procedures and assumptions, leading to substantial differences in spatial patterns between models, which can be viewed in Figures A3 and A4. Previous studies demonstrated that the performance of a certain ET act algorithm is dependent on factors such as LULC type, climate and the presence of mountains [27-29,33,34], meaning that the accuracy of ET act predictions will vary across a basin. An ensemble approach was taken toward generating "best-guess" maps of ET act in the Red River Basin, under the assumption that spatial errors between related yet differing mapping approaches will tend to cancel in the ensemble average. A superior performance of different ET act ensemble products with respect to individual algorithms was previously observed for the Nile Basin [79], where flux towers were available for validation.
To identify models that are spatially most similar, spatial patterns were analyzed in terms of the Pearson correlation coefficient (r) at the pixel level (Table 5). A minimum value of 0.5 was assumed to represent a sufficiently strong spatial correlation to warrant combination in an ensemble ET act product. It was found that the correlation between all pair-wise combinations of ALEXI, MOD16 and SSEBop was above this threshold, whereas CMRSET and SEBS do not achieve this level of correlation with any of the products. Pixel values of monthly ET act for ALEXI, MOD16 and SSEBop were scaled around 1 (the average for each product upstream of Son Tay) and the resulting maps were averaged to create a relative ET act map for each month. Finally, these relative values were multiplied with the SSEBop ET act Son Tay catchment average. In this way, a final monthly ET act product was constructed that is congruent with the basin water balance inferred from P minus Q, as well as with the spatial patterns predicted by the majority of the available ET act GSDPs. The resulting annual ensemble ET act for the Red River Basin is presented in Figure 7.
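The ensemble construction can be sketched in a few lines: pairwise Pearson correlation is used to select products, each selected map is divided by its own mean, the relative maps are averaged per pixel, and the result is rescaled to the SSEBop mean. The sketch assumes that all pixels lie upstream of Son Tay and that maps are flattened to equally ordered lists; the function names and pixel values are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally ordered pixel lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ensemble_map(product_maps, target_mean):
    """Average maps scaled around 1, then rescale to target_mean."""
    relative = [[v / mean(m) for v in m] for m in product_maps]
    pattern = [mean(pixel) for pixel in zip(*relative)]
    return [p * target_mean for p in pattern]


# Hypothetical monthly ET act (mm) at four pixels upstream of the outlet.
alexi = [60.0, 80.0, 100.0, 120.0]
mod16 = [70.0, 85.0, 95.0, 110.0]
ssebop = [50.0, 70.0, 90.0, 110.0]
maps = [alexi, mod16, ssebop]

# Keep the set only if every pair reaches the 0.5 threshold.
pairs = [(0, 1), (0, 2), (1, 2)]
all_similar = all(pearson_r(maps[i], maps[j]) >= 0.5 for i, j in pairs)
ensemble = ensemble_map(maps, target_mean=mean(ssebop))
```

Because each relative map averages to 1, the ensemble retains the SSEBop catchment mean while its spatial pattern reflects all three products with equal weight.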

Results
In this section, the ensemble-averaged ETact is used to study the water budget of the Red River basin. Long-term rainfall surplus is examined to determine the net production and consumption of water resources across the basin, in wet vs. dry seasons, and per LULC class. Subsequently, monthly runoff patterns are investigated for each subcatchment and storage changes are expressed as a function of rainfall surplus.

Rainfall Surplus
Rainfall surplus (P sur ) can be viewed as the total water budget available for generating surface runoff, replenishing aquifers, or recharging soil moisture stores. The partitioning of P sur among different hydrological processes depends on factors such as soil type, slope, and intensity of precipitation. For multi-annual time scales on which ∆S can be neglected, P sur equals the water yield (P − ET act − ∆S), the comprehensive term that is transported downstream through surface and sub-surface pathways to constitute river flow. Figure 8 presents the rainfall surplus in the Red River Basin for 2003-2012. From this map it can be concluded that the Red River Basin in a sense is an atypical river basin, with the upstream part generating relatively little runoff. Particularly the forested areas of the northern portion of the basin have a low P sur over this ten-year period. Rainfall is lower here than in other parts of the basin, and forests likely grow deep roots to tap into aquifers. The highest P sur occurs in the central part of the basin, a transitional area between the low-lying southeast and the mountainous north, with peak values of up to 1300 mm/year. From the perspective of transboundary water management, it is interesting to note that the majority of the average annual P sur occurs in Vietnamese territories (825.4 mm, or ~73 km³), while only 390.3 mm (~30 km³) is produced in China.
Figure 8 shows that the irrigated Red River Delta on average does not consume water on the annual scale. This, however, is not the case when examining the irrigated spring rice season. Figure 9a shows how P sur becomes negative due to water withdrawals during February-April 2010, when a net water consumption of up to 100 mm is observed in the delta. In general, a negative P sur can be partially related to changes of water storage in the unsaturated zone, but a negative value during prolonged periods is indicative of withdrawals. During the rainy summer season, P sur is high in the entire basin (Figure 9b). Within the delta, P sur is observed to be highest in the western part, where drainage is the most challenging due to the low relative altitude in relation to the water level [44].
To evaluate water consumers and producers in the Red River Basin, the spatially distributed P sur assessment was coupled with the improved LULC map (Figure 3). Table 6 provides an overview of water consumption and production by the different LULC classes in the Red River Basin. It is found that, on average, there is no net water-consuming LULC class on the annual scale. The largest amount of water in the Red River Basin is produced by the extensive forest and shrubland ecosystems (an annual total of 62.3 km³). In total, 102.6 km³ (or 621 mm/year) of water is produced on average per year, which can be viewed as an estimation of the total outflow of the complex stream network of the Red River Delta. One of the most striking findings from this analysis is that a relatively large amount of water is produced by areas classified as irrigated cropland, while the opposite is found for the single-cropped rainfed class. Although this may be counterintuitive, it is caused by the geographical concentration of single crop agriculture in areas with a relatively low annual rainfall (±1000-1300 mm).
It is observed that the areas equipped with irrigation infrastructure (particularly the delta) generally receive more rainfall from the Tonkin sea during the rainy season than the zones dominated by rainfed agriculture further inland. Therefore, the observed higher ET act in double- or triple-cropped systems (890.3 mm/year vs. 736.8 mm/year) does not lead to a lower rainfall surplus compared to single crop agriculture. This very high summer rainfall in the delta is a known phenomenon, and the different tributaries and canals essentially serve as drainage canals during this period [80].
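The conversion from an area-averaged rainfall surplus (mm/year) to a water yield per LULC class (km³/year) can be sketched as below. The class areas and P sur values are hypothetical and do not reproduce Table 6; the function name `volume_km3` is likewise an assumption.

```python
def volume_km3(depth_mm, area_km2):
    """Convert a water depth over an area to a volume in km3."""
    return depth_mm * area_km2 / 1.0e6  # 1 mm over 1 km2 equals 1e-6 km3


# Hypothetical classes: (area in km2, mean annual P sur in mm/year).
lulc = {
    "forest": (50000.0, 700.0),
    "rainfed cropland": (20000.0, 450.0),
    "irrigated cropland": (10000.0, 800.0),
}
yield_km3 = {name: volume_km3(psur, area) for name, (area, psur) in lulc.items()}
total_km3 = sum(yield_km3.values())
```

By the same conversion, the reported totals of 102.6 km³ and 621 mm/year correspond to an area of roughly 165,000 km², a figure derived here from those two numbers rather than quoted from the study.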

Runoff Response Patterns and Storage Changes
When considering time scales of a single year or smaller, the change in storage ∆S becomes an essential component of the water balance. By relating the measured Q from different gauging stations to upstream P sur , it is possible to e.g., identify the locations within a river basin where most streamflow originates, and the time periods when water stores in the soil profile and aquifers are replenished.
For different sections of the Red River Basin, measured streamflow and satellite-derived P sur are compared in Figure 10. In the rainy season, streamflow from the catchments of all available stations typically lags behind the increase in P sur by 1 to 2 months, while the decline in both parameters around September occurs simultaneously. This is likely caused by water storage in aquifers and the soil profile, occurring up to the point of saturation after which all P sur will be discharged as surface runoff. River discharge in parts of the Red River Basin is largely managed, as several large man-made reservoirs are present aimed at flood buffering and hydropower generation [81]. Dry season flow is highest at Hoa Binh and Vu Quang, where artificial storage capacity in the upstream catchments is largest.

Figure 10. Graphs of upstream rainfall surplus (P sur ) from remote sensing and measured streamflow (Q) for each of the available discharge stations.
Table 7 lists the long-term Q/P sur values for each of the catchments. Some values deviate substantially from 100%, which indicates that the 2003-2012 ∆S term may not be negligible for these areas. The low 10-year average of 80.9% for Muong Te can be explained by the construction of several dams in the Chinese part of the Da basin. In previous work [82], at least nine hydropower reservoirs were identified that were commissioned in the years 2007-2009. The filling of these reservoirs in the preceding years has caused an average Q/P sur of 56.1% until March 2007, whereas for April 2007 till September 2012 a value of 99% is found, indicating an almost perfect closure of the water balance by satellite-derived P sur . The total volume of water stored in the new Chinese reservoirs in the Da River and its tributaries between April 2003 and March 2007 is estimated at 22.7 km³. Another interesting finding is that annual Q/P sur values for the Hoa Binh catchment continuously exceed 100%, whereas the opposite is observed for the adjacent Yen Bai catchment. In combination with the satisfactory agreement between P, ET act and Q data in other catchments, and in the absence of any notable interbasin transfers (to our knowledge), this phenomenon may be partly explained by groundwater flow from the Thao basin to the Da basin.
To compare monthly Q and P sur , Table 7 lists the slope and R² obtained from linear regression between both variables. For the entire gauged portion of the Red River Basin (upstream of Son Tay), 43% of rainfall surplus is converted to surface runoff. The highest slope of 0.65 is found for the Hoa Binh catchment, whereas only 33% of P sur contributes to surface runoff upstream of Yen Bai. A reason for this difference is likely the catchment topography, with a lower average slope in the upstream catchment of the latter station. In addition, average annual P sur is substantially higher in Hoa Binh with 730 mm as opposed to 460 mm for Yen Bai, increasing the frequency of occurrence of saturated conditions in the soil profile.
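The comparison of Q/P sur before and after reservoir filling amounts to computing the ratio of summed Q to summed P sur over two periods. A minimal sketch is given below; the record layout, the function name `q_over_psur` and all monthly values are hypothetical.

```python
def q_over_psur(records, start, end):
    """Ratio of summed Q to summed P sur over months in [start, end].

    records: list of ((year, month), q_mm, psur_mm) tuples.
    start, end: inclusive (year, month) bounds.
    """
    selected = [(q, ps) for ym, q, ps in records if start <= ym <= end]
    total_q = sum(q for q, _ in selected)
    total_psur = sum(ps for _, ps in selected)
    return total_q / total_psur


# Hypothetical monthly depths (mm); reservoir filling ends after March 2007.
records = [
    ((2006, 7), 60.0, 120.0),
    ((2006, 8), 70.0, 140.0),
    ((2007, 7), 115.0, 120.0),
    ((2007, 8), 145.0, 140.0),
]
before = q_over_psur(records, (2003, 4), (2007, 3))
after = q_over_psur(records, (2007, 4), (2012, 9))
```

A ratio well below 1 in the first period combined with a ratio near 1 afterwards is the pattern that the text attributes to reservoir filling.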
Although the multi-annual Q and P sur are congruent at the subcatchment scale, it is not obvious that a correlation on the monthly scale should be expected. Especially in dry months when the catchment storage is relatively empty, a low and stable Q is observed (likely driven by baseflow) that is not significantly affected by variability in monthly P sur . During wet months, however, the progression of the Q/P sur ratio is representative of the changing response of the catchment to rainfall. Table 7 lists average Q/P sur for each of the months in the rainy season. A similar pattern is observed for all catchments, in which the ratio increases as the rainy season progresses and exceeds 1 at the end of the monsoon in September. This consistent increase of Q/P sur suggests that in the Red River Basin saturation excess processes are dominant in runoff generation, rather than Hortonian runoff occurring during high-intensity precipitation events [83,84]. Monthly Q/P sur values substantially higher than 1 could occur due to groundwater flow between catchments, or due to human actions, e.g., when large volumes of water are released from the reservoirs. These releases occur in particular during the monsoonal months, when flood buffering capacity is required and a maximum water level is maintained [85]. Although these management actions are expected to affect Q/P sur , the natural processes of streamflow generation still appear clearly in the figures in Table 7.
As correlation between Q and P sur is logically weak for specific months, it is not yet feasible to predict Q for every month solely from remote sensing. This could change when ET act GSDPs become available on a daily basis, which will enable a detailed investigation of the relation between cumulative P sur from the start of the hydrological year and the Q/P sur term [84]. However, a clear relation is observed between monthly P sur and ∆S, as the storage capacity of the Red River Basin is not fully satisfied for the major part of the year. Therefore, it is possible to express volumetric ∆S as a function of remotely sensed P sur . Figure 11 depicts a plot of monthly ∆S vs. P sur , upstream of Son Tay. A clockwise hysteresis pattern can be observed. Linear models were derived that enable the prediction of ∆S without the need for ground observations. From December until the start of the rainy season in April, the slope of the models is close to 1 with a relatively stable intercept in the order of 23-29 mm, which can be viewed as the contribution of groundwater to streamflow. The slope of the model decreases as storage fills up and the contribution of P sur to Q increases. As P sur values decrease in September and October due to declining rainfall, the low intercept is representative of the rainwater from previous months that is now taken out of storage to contribute to streamflow. Errors in the derived models for monsoonal months are partly caused by human interventions in Red River water management, and this approach is expected to work even better in more "natural" river basins.
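The monthly linear models can be illustrated with a sketch that takes ∆S as P sur minus Q and fits an ordinary least-squares line per calendar month. The record layout, function names and values are hypothetical, and the sign convention for the intercept is an assumption of this sketch: a steady baseflow appears here as a negative intercept of equal magnitude.

```python
from collections import defaultdict


def fit_line(x, y):
    """Ordinary least squares; returns (slope, intercept, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, my - slope * mx, r_squared


def monthly_storage_models(records):
    """Fit delta S = a * P sur + b separately for each calendar month.

    records: list of (month, psur_mm, q_mm); delta S is taken as P sur - Q.
    """
    by_month = defaultdict(lambda: ([], []))
    for month, psur, q in records:
        by_month[month][0].append(psur)
        by_month[month][1].append(psur - q)
    return {m: fit_line(xs, ys) for m, (xs, ys) in by_month.items()}


# Hypothetical dry-season months with a steady 25 mm of baseflow.
records = [(1, 10.0, 25.0), (1, 30.0, 25.0), (1, 50.0, 25.0)]
models = monthly_storage_models(records)
```

With constant baseflow, the fitted slope is 1 and the intercept equals minus the baseflow, which mirrors the dry-season behaviour described above; in wetter months a growing share of P sur leaves as Q and the slope drops below 1.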

Figure 11. Monthly changes in storage upstream of Son Tay station (∆S) plotted against the rainfall surplus (P sur ). Dashed lines indicate the lines of best fit for each month (colored) and the entire year (black). For each month, the derived linear model is given on the right with its respective coefficient of determination (R²).

Discussion
With the increasing availability of global actual evapotranspiration data in the public domain, in addition to rainfall and land use/land cover, it is now possible to quantify the main components of the water balance for river basins in a distributed manner. This paper shows that rainfall surplus can be successfully computed from global satellite-derived data products for monthly, annual and multi-annual time scales. The total annual water yield of 102.6 km³ computed for the entire Red River Basin is an estimation of long-term river outflow, which is especially valuable because of the lack of streamflow gauges in the Red River Delta [75]. In non-saturated conditions, spatially distributed monthly P sur is strongly related to changes in storage, and monthly ∆S can thus be quantitatively determined from satellite data. These findings demonstrate that assessments of rainfall surplus from satellite-derived P and ET act potentially facilitate sound water accounting in ungauged river basins that was previously impossible due to missing ground data.
It was found that the SSEBop ET act product succeeds in closing the water balance of the Red River Basin with respect to TRMM rainfall and longer-term streamflow records, while the other products seem to have a tendency to overestimate ET act . The range of average annual ET act values according to different products is found to be rather large (268 mm/year) and illustrates the need for a thorough comparison. For areas with frequent cloud cover, a part of this range is likely attributable to the various ways in which the ET act algorithms deal with cloud-covered skies and data gaps. The observed difference between the individual models is somewhat inconsistent with the very low errors in satellite-derived ET act that were found in the review by Karimi and Bastiaanssen [12], which illustrates the current disparity between region-specific ET act estimates, which offer opportunities for parameter tuning, and extractions from global datasets. It was found that for the Red River Basin spatial patterns of MOD16, SSEBop and ALEXI are similar, and this finding has been used to compute the areal ET act patterns from these three ET act products with equal weight. The fundamental differences between a relatively simple, largely LST-based model (SSEBop), an algorithm with more advanced physics incorporating temporal LST variability and a separation between evaporation and transpiration (ALEXI), and a method strongly reliant on LAI (MOD16), support the assumption that the selected models complement each other in terms of performance over a heterogeneous terrain. The consistency between satellite-derived P sur and measured Q in terms of both inter- and intra-annual variability, as well as their agreement for individual subcatchments, lends confidence to the constructed ensemble ET act maps.
Previous studies in other basins have yielded differing outcomes regarding the relative performance of the respective ET act algorithms. Therefore, the appropriate choice of models for basin-scale normalization is expected to vary from basin to basin. In future studies, depending on the properties of the river basin at hand, different types of ensemble products may be suitable. It is advised that future research focus on reviewing the strengths and weaknesses of the ET act GSDPs with respect to different LULC types and climate zones, with the aim of achieving a reliable satellite-derived ET act estimation on the global scale. When doing so, the uncertainties associated with each of the components of the water balance, including streamflow records, should receive sufficient attention. It should be noted that the ET act products applied in this research are in differing stages of development and substantial progress is to be expected in the next few years. For example, future versions of the ALEXI product will implement microwave-based LST [86] to provide estimates of ET act under all-sky conditions, which is particularly important over the Red River Basin during persistently cloudy periods. This use of microwave LST will help constrain estimates of ET act during such periods, which currently rely on gap-filling techniques with high uncertainty and are likely responsible for some of the overestimation of ET act seen in this study.
Analyses of global remote sensing products provide a valuable first outlook on the main hydrological processes within a river basin, especially after verification against the longer-term total river outflow to ensure mass balance and consistency. Hydrological models are capable of providing complementary information, for example on non-linear sub-soil flow processes that determine runoff, infiltration, storage change, percolation and recharge. These processes govern the partitioning of rainfall surplus into groundwater and surface water. Models also facilitate analyses on a daily time scale, for which only a few ET act GSDPs are currently available. It is already common practice to use satellite-derived information, in particular P and LULC, as inputs to hydrological models. However, results of remote sensing-based quantifications of monthly ET act , P sur and ∆S, as well as multi-annual Q, can also be used to train and constrain hydrological models and water management decision tools. Examples are already available in which remotely sensed ET act is used to constrain hydrological models, or for calibration purposes [87][88][89][90][91][92][93]. P minus ET act appears to be highly correlated with the root zone storage capacity [36]. By using satellite-derived information as a reality check, model performance can be improved. This is particularly relevant in areas with abundant water withdrawals, which require many assumptions to simulate but are implicitly included in remotely sensed ET act [94].
Currently, much attention is devoted to the development of global hydrological models (GHMs). Several reviews of the current state of the art were recently published [95][96][97]. There are even ongoing attempts to create the first operational, hyper-resolution GHM [98]. Integration with remote sensing is identified as one of the promising trends in GHMs to reduce uncertainties [97]. The latest generation of GHMs is capable of spatially explicit assessments of the consumed fraction of applied irrigation water, thus no longer requiring an estimate of efficiencies as input [99,100]. However, these models still quantify water withdrawals for irrigation by supplying water until optimal growing conditions are achieved, an approach that is likely to lead to an overestimation of withdrawals [101]. Alternatively, non-physically based statistical methods are used to quantify water withdrawals for different water-using sectors [102,103]. With ET act maps now readily available on the global scale, it is a logical next step to start incorporating these products in GHMs, either as model constraints or in the calibration procedure. This could lead to a more realistic representation of withdrawals [101,104,105], and therefore of non-consumed water and reuse.

Conclusions
This paper demonstrates how an integration of readily available global satellite-derived data products can shed light on river basin hydrology. With the availability of rainfall (P), land use/land cover (LULC) and the newly available actual evapotranspiration (ET act ) data on the global scale, such analyses can now be performed for all river basins as pre-analyses to numerical hydrology studies. The consistency between different P and ET act products and downstream river discharge should first be evaluated by applying the law of mass conservation on the multi-annual scale. Even for a challenging basin in terms of atmospheric conditions such as the Red River Basin, satisfactory and meaningful conclusions were drawn. The total annual water yield of the basin was estimated at 102.6 km³, of which only 29% is generated in China. Forests are the main water producer, and irrigated cropland likewise does not consume water on the annual scale. In addition, it proved possible to model monthly storage changes based solely on satellite-derived P and ET act . The ratio of streamflow (Q) over rainfall surplus (P sur ) was found to increase steadily during the rainy season, signifying the importance of saturation excess processes in runoff generation. This is a first step towards determining the partitioning between fast surface runoff and slow groundwater runoff.
Although our comparison for the Red River shows that the range between values of individual evapotranspiration products is still substantial, it is concluded that there is a large potential for applying monthly remotely sensed ET act , P sur , storage changes and multi-annual Q to constrain or calibrate hydrological models. This facilitates quantification of hydrological processes that take place on the daily or weekly time scale, or processes that cannot be assessed by remote sensing alone, such as withdrawals, non-consumed water and reuse. Further studies are required to examine the performance of the ET act products for different geographical regions, climate zones and land use types, in order to ultimately facilitate the coupling between these products and global hydrological models. In the meantime, it is concluded that the proposed methodology based on spatial correlations among individual ET act products and absolute calibration of longer-term P − Q works well for the conditions encountered in the Red River Basin.

Figure A1. Spatial distribution of TRMM pixels with one (red) or two (green) rainfall gauges.
Figure A2. Comparison of TRMM data with measured monthly rainfall averaged per pixel for gauges in pixels with multiple stations. The red line gives the linear regression best fit with 0 intercept.
Figure A4. Map of the coefficient of variation (CV) of annual average ET act based on five different products.
Figure A5. Runoff in million cubic meters (MCM) generated per hydrological year in each of the subbasins, according to streamflow (Q) records.