Discharge Estimation Using Harmonized Landsat and Sentinel-2 Product: Case Studies in the Murray Darling Basin

Quantifying river discharge is a critical component for hydrological studies, floodplain ecological conservation research, and water resources management. In recent years, a series of remote sensing-based discharge estimation methods have been developed. An example is the use of the near infrared (NIR) band of optical satellite images, with the principle of calculating the ratio between a stable land pixel for calibration (C) and a pixel within the river for measurement (M), applying a linear regression between C/M series and observed discharge series. This study trialed the C/M method, utilizing the Harmonized Landsat and Sentinel-2 (HLS) surface reflectance product on relatively small rivers with 30~100 m widths. Two study sites with different river characteristics and geographic settings in the Murray-Darling Basin (MDB) of Australia were selected as case studies. Two independent sets of HLS data and gauged discharge data for the 2017 and 2018 water years were acquired for modeling and validation, respectively. Results reveal high consistency between the HLS-derived discharge and gauged discharge at both sites. The Relative Root Mean Square Errors are 53% and 19%, and the Nash-Sutcliffe Efficiency coefficients are 0.24 and 0.69 for the two sites. This study supports the effectiveness of applying the fine-resolution HLS for modeling discharge on small rivers based on the C/M methodology, which also provides evidence of using multisource synthesized datasets as the input for discharge estimation.


Introduction
Quantifying river discharge is essential for both scientific and operational applications for water resources management [1]. Historically, river discharge estimation has relied on having a network of in situ gauge stations to measure river levels. Data from the stations are then transferred to a central collection point, usually through some automated transfer system. Establishment of networks is expensive and requires maintenance, and indeed there are many rivers in the world which are not gauged. This shortage in global discharge data can be partially compensated for by the use of remote sensing [2]. Relevant research both in theory and practice is continually promoting methods and applications of discharge estimation regionally and globally [3][4][5].
There are three broad categories of methods for discharge estimation using remote sensing data [6]. The first category correlates remotely sensed water levels or inundation areas acquired at or near a gauge station with simultaneously collected ground data. Discharge is calculated using a site-specific stage-discharge relationship-this method is generally known as the rating curve method out to show the performance of modeling, and to evaluate the discharge estimation while using this high spatial resolution and high temporal resolution synthesized dataset as the input.

Study Area
The Murray-Darling Basin (MDB) in southeastern Australia is considered to be the most important and only fully developed water system in Australia. Most of the basin is relatively flat with long, slow rivers, which exhibit significant variations in annual flow [30]. In this study, hydrological Gauge 410130 in the Murrumbidgee River and Gauge 414200 in the Murray River and their surrounding areas were selected as the two test sites (Figure 1) for discharge estimation. The Murrumbidgee is a significant sub-basin of the Murray-Darling system and flows into the Murray, which flows down through the southern states of Australia to its exit to the sea. Approximate river widths at the gauge locations were measured on Google Earth, 60 m for 410130 and 90 m for 414200. Both sites have significant surrounding vegetation, including some farm-land, grass-land, and forests. No flow regulation structures were observed near the gauge locations.
Remote Sens. 2020, 12, x FOR PEER REVIEW 3 of 12 discharge estimation while using this high spatial resolution and high temporal resolution synthesized dataset as the input.

Study Area
The Murray-Darling Basin (MDB) in southeastern Australia is considered to be the most important and only fully developed water system in Australia. Most of the basin is relatively flat with long, slow rivers, which exhibit significant variations in annual flow [30]. In this study, hydrological Gauge 410130 in the Murrumbidgee River and Gauge 414200 in the Murray River and their surrounding areas were selected as the two test sites (Figure 1) for discharge estimation. The Murrumbidgee is a significant sub-basin of the Murray-Darling system and flows into the Murray, which flows down through the southern states of Australia to its exit to the sea. Approximate river widths at the gauge locations were measured on Google Earth, 60 m for 410130 and 90 m for 414200. Both sites have significant surrounding vegetation, including some farm-land, grass-land, and forests. No flow regulation structures were observed near the gauge locations.

Input Data
Discharge and remote sensing data were acquired for the study. The discharge data covering the 2017 water year (1 July 2017 to 30 June 2018) and the 2018 water year (1 July 2018 to 30 June 2019) were used as the modeling and validation periods, respectively. Observed daily discharge data (obtained in megaliters/day (ML/day) and converted to cubic meters/second (m 3 /s)) were acquired from the New South Wales Office of Water (http://realtimedata.water.nsw.gov.au/) (for Gauge 410130 in the Murrumbidgee site) and the Victorian Department of Environment and Primary Industries (http://data.water.vic.gov.au/) (for Gauge 414200 in the Murray site) (Figure 2).

Input Data
Discharge and remote sensing data were acquired for the study. The discharge data covering the 2017 water year (1 July 2017 to 30 June 2018) and the 2018 water year (1 July 2018 to 30 June 2019) were used as the modeling and validation periods, respectively. Observed daily discharge data (obtained in megaliters/day (ML/day) and converted to cubic meters/second (m 3 /s)) were acquired from the New South Wales Office of Water (http://realtimedata.water.nsw.gov.au/) (for Gauge 410130 in the Murrumbidgee site) and the Victorian Department of Environment and Primary Industries (http://data.water.vic.gov.au/) (for Gauge 414200 in the Murray site) (Figure 2).  Landsat 8 Operational Land Imager (OLI) is the latest sensor of Landsat missions and is one of the most popular sources of satellite data [31]. Sentinel-2 Multi Spectral Instrument (MSI) is another advanced data source with two sensors (2A/2B) working together. Combining Landsat and Sentinel-2 MSI creates a denser time series (less than 3 days, theoretically) which has been processed and made available as the Harmonized Landsat and Sentinel-2 (HLS) surface reflectance product [25], published on https://hls.gsfc.nasa.gov/. The HLS products obtain seamless products from both sensors (OLI and MSI) using a set of algorithms that include atmospheric correction, cloud and cloudshadow masking, spatial co-registration and common gridding, bidirectional reflectance distribution function normalization, and spectral bandpass adjustment. Readers are referred to the user's guide [32] for details of HLS processing. Corresponding bands between Harmonized Landsat-8 and Sentinel-2 are shown in Table 1. As Sentinel-2 of HLS was resampled from 10 to 30 m, band 5 of Landsat-8 and band 8A of Sentinel-2 were used as the input image series of the C/M method. All available images in the 2017 modeling and 2018 validation water years were acquired, and images that are cloud-free on the study sites (about 65%) were retained after visual inspection. Finally, for both sites, 80 images were used for modeling, and 72 images for validation, indicating an average revisit interval of approximately 5 days.  Landsat 8 Operational Land Imager (OLI) is the latest sensor of Landsat missions and is one of the most popular sources of satellite data [31]. Sentinel-2 Multi Spectral Instrument (MSI) is another advanced data source with two sensors (2A/2B) working together. Combining Landsat and Sentinel-2 MSI creates a denser time series (less than 3 days, theoretically) which has been processed and made available as the Harmonized Landsat and Sentinel-2 (HLS) surface reflectance product [25], published on https://hls.gsfc.nasa.gov/. The HLS products obtain seamless products from both sensors (OLI and MSI) using a set of algorithms that include atmospheric correction, cloud and cloud-shadow masking, spatial co-registration and common gridding, bidirectional reflectance distribution function normalization, and spectral bandpass adjustment. Readers are referred to the user's guide [32] for details of HLS processing. Corresponding bands between Harmonized Landsat-8 and Sentinel-2 are shown in Table 1. As Sentinel-2 of HLS was resampled from 10 to 30 m, band 5 of Landsat-8 and band 8A of Sentinel-2 were used as the input image series of the C/M method. All available images in the 2017 modeling and 2018 validation water years were acquired, and images that are cloud-free on the study sites (about 65%) were retained after visual inspection. Finally, for both sites, 80 images were used for modeling, and 72 images for validation, indicating an average revisit interval of approximately 5 days.

Methodology
A flowchart detailing the used inputs and methods in this study is shown in Figure 3. It mainly contains two parts, discharge modeling and result validation. Discharge modeling applies a linear regression between input NIR image series in the 2017 (modeling) water year and corresponding gauge observations to construct the empirical model. Result validation computes the bias between the observed and estimated discharge in the 2018 (validation) water year.

Methodology
A flowchart detailing the used inputs and methods in this study is shown in Figure 3. It mainly contains two parts, discharge modeling and result validation. Discharge modeling applies a linear regression between input NIR image series in the 2017 (modeling) water year and corresponding gauge observations to construct the empirical model. Result validation computes the bias between the observed and estimated discharge in the 2018 (validation) water year.

Discharge Modeling
The principle of discharge modeling using the C/M method is to take advantage of the ratio of reflectance from the NIR band between one stable land pixel (the calibration (C) pixel) and one pixel within the river (the measurement (M) pixel), and establish a relationship between the ratio (C/M) and the observed river discharge, which is then utilized for estimating discharge of another observation period based on remote sensing data. In this study, the C pixel was revised to be the average value of multiple land pixels in order to increase the stability of C observations. M remains to be the reflectance of a single pixel that best reflects in-channel water variation. To improve the efficiency of computation, C and M were selected inside a region of interest (ROI) that is within a certain distance to the gauge. A distance of 450 m was determined after trailing, which makes the ROI for both sites to be a rectangle of 31 × 31 pixels.
The input data include the observed discharge and quality-checked HLS product of the 2017 water year. To select the stable land pixels, a coefficient of variation (CV) was applied to each pixel, with CV defined as the ratio of the standard deviation to the mean computed through the HLS images

Discharge Modeling
The principle of discharge modeling using the C/M method is to take advantage of the ratio of reflectance from the NIR band between one stable land pixel (the calibration (C) pixel) and one pixel within the river (the measurement (M) pixel), and establish a relationship between the ratio (C/M) and the observed river discharge, which is then utilized for estimating discharge of another observation period based on remote sensing data. In this study, the C pixel was revised to be the average value of multiple land pixels in order to increase the stability of C observations. M remains to be the reflectance of a single pixel that best reflects in-channel water variation. To improve the efficiency of computation, C and M were selected inside a region of interest (ROI) that is within a certain distance to the gauge. A distance of 450 m was determined after trailing, which makes the ROI for both sites to be a rectangle of 31 × 31 pixels.
The input data include the observed discharge and quality-checked HLS product of the 2017 water year. To select the stable land pixels, a coefficient of variation (CV) was applied to each pixel, with CV defined as the ratio of the standard deviation to the mean computed through the HLS images in the 2017 water year. The CV value indicates the variability of surface reflectance [33], with lower CV representing more stable land cover in the pixel. Here, pixels with CV values lower than the 5th percentile (which represent relatively high stability) were chosen and averaged as the C pixel of each image [22]. Finding an optimal M pixel was more difficult. All pixels in the ROI were considered as M candidates at first. For each M candidate, a time series of C/M was calculated, whose correlation with observed discharge was quantified by Pearson's correlation coefficient, r, and tested with the Remote Sens. 2020, 12, 2810 6 of 12 corresponding p-value. Finally, the pixel with the highest r value and a p-value < 0.05 (correlation is significant at the 95% confidence level) was chosen to be the optimal location of M. A linear function was then fitted between the final C/M series and observed discharge for each test site, which was then served as the discharge model for this specific site. For detailed information about the C/M method, readers are referred to Tarpanelli et al. [22].

Result Validation
Based on the established discharge estimation model, the HLS product for the 2018 water year was further employed as the input to estimate discharge for that year, which was then compared with the observed discharge for accuracy evaluation. Locations of C and M pixels determined in the modeling process were adopted directly for calculating the C/M time series for the 2018 water year. This C/M time series was then input into the fitted linear model to estimate discharge series. Using observed discharge corresponding to the remote sensing observations in water year 2018 as the reference, the performance of discharge estimation was evaluated by Root Mean Square Error (RMSE) (Equation (1)), Relative RMSE (RRMSE) (Equation (2)), and Nash-Sutcliffe Efficiency coefficient (NSE) (Equation (3)). RMSE represents the general prediction error. RRMSE represents the RMSE value normalized by the observed discharge, which indicates higher modeling accuracy with a lower value [34]. NSE is widely used in hydrological modeling, with a value closer to 1 indicating a more stable and reliable model [35]. In these equations, Q rs and Q g are remotely sensed estimated discharge and corresponding observed discharge, respectively. Q g represents the average value of observed discharge. Q t g and Q t rs are observed and remotely sensed estimated discharge at time t, respectively.

Discharge Modeling Results
Locations of multiple C pixels were selected through CV images and are displayed as green circles in Figure 4a,b. The C series was computed through the method described in Section 2.2.1. Two images of Pearson's correlation coefficient, r, were generated for these two sites in the 2017 water year (Figure 4c,d), together with the corresponding images of p-value (Figure 4e,f), which represent the significance level of correlation. Thus, the pixel with the highest r value at higher than 95% significance level (p < 0.05) was selected as the optimal M pixel for each site (marked as a red circle in Figure 4). Based on the final determined C and M pixels, time series of the C/M ratio were derived from the NIR band of the HLS product. Two linear functions were fitted between the C/M ratio series and observed discharge series for the 2017 water year in both sites ( Figure 5). It is obvious that both sites have significant linear correlation with coefficients of determination (R 2 ) equal to 0.81 for Murrumbidgee Gauge 410130 and 0.77 for Murray Gauge 414200.

Validation
To further validate the discharge model and evaluate the accuracy, another independent set of HLS data and observed discharge acquired in the 2018 water year were employed. In order to

Validation
To further validate the discharge model and evaluate the accuracy, another independent set of HLS data and observed discharge acquired in the 2018 water year were employed. In order to

Validation
To further validate the discharge model and evaluate the accuracy, another independent set of HLS data and observed discharge acquired in the 2018 water year were employed. In order to maintain the stability and improve the applicability of the discharge estimation model, the C and M locations Remote Sens. 2020, 12, 2810 8 of 12 determined for the 2017 dataset were adopted directly for the 2018 dataset, to generate the C/M series at each site. These series were then input into the fitted linear models to estimate discharge, and finally, an estimated discharge series for the 2018 water year was derived, shown together with the observed discharge in Figure 6.
sometimes when the actual discharge is also very small. During a low flow period, the model tends to underestimate the discharge. For Gauge 414200, whose base flow is much higher, the estimated discharge series also got the shape of observed discharge series quite well. For the biggest flow peak event, which occurred in November 2018, the model successfully reflects the rapid increment of river discharge, although it is still a little bit underestimated. To further quantify the accuracy of estimated discharge series, three accuracy indices (RMSE, RRMSE, and NSE) were calculated, with the results reported in Table 2. Gauge 414200 has a RMSE significantly higher than Gauge 410130, which is mainly due to its much higher base flow. After normalizing by the base flow, RRMSE therefore represents a relatively unbiased evaluation on the accuracy. It is obvious that Gauge 414200 has a higher accuracy with RRMSE equaling to 19%, which suggests a good accuracy. The NSE index also suggests that Gauge 414200 has a much higher accuracy than Gauge 410130.

Discussion
The results based on the use of a Harmonized Landsat and Sentinel-2 product in this study suggest that blending multisource NIR bands from different sensors is feasible for discharge estimation by the C/M method. Through involving more remote sensing data sources, it may be possible to achieve daily discharge estimation for most of the relatively smaller rivers all over the world. Although the major restriction of the C/M method is that a series of observed discharge data are required to construct the regression model, this limitation can be alleviated by mounting automatic flow recording devices to collect discharge observation data for a period, as it seems that a relatively short period of observation can help establish a reliable estimation model. Our study An overall consistency was identified for the estimated and observed discharge series at both sites. For Gauge 410130, whose discharge is relatively small, the estimated series sometimes exaggerates the flow peak, for up to 10 m 3 /s. Abnormal negative discharge may be produced sometimes when the actual discharge is also very small. During a low flow period, the model tends to underestimate the discharge. For Gauge 414200, whose base flow is much higher, the estimated discharge series also got the shape of observed discharge series quite well. For the biggest flow peak event, which occurred in November 2018, the model successfully reflects the rapid increment of river discharge, although it is still a little bit underestimated. To further quantify the accuracy of estimated discharge series, three accuracy indices (RMSE, RRMSE, and NSE) were calculated, with the results reported in Table 2. Gauge 414200 has a RMSE significantly higher than Gauge 410130, which is mainly due to its much higher base flow. After normalizing by the base flow, RRMSE therefore represents a relatively unbiased evaluation on the accuracy. It is obvious that Gauge 414200 has a higher accuracy with RRMSE equaling to 19%, which suggests a good accuracy. The NSE index also suggests that Gauge 414200 has a much higher accuracy than Gauge 410130.

Discussion
The results based on the use of a Harmonized Landsat and Sentinel-2 product in this study suggest that blending multisource NIR bands from different sensors is feasible for discharge estimation by the C/M method. Through involving more remote sensing data sources, it may be possible to achieve daily discharge estimation for most of the relatively smaller rivers all over the world. Although the major restriction of the C/M method is that a series of observed discharge data are required to construct the regression model, this limitation can be alleviated by mounting automatic flow recording devices to collect discharge observation data for a period, as it seems that a relatively short period of observation can help establish a reliable estimation model. Our study demonstrates that a regression model established for a single water year is able to generate proper discharge estimation for another independent year, which suggests a possible extended application with the caveat that the surrounding vegetation does not change much. In addition, based on the theory of hydrological analogy [36], basins are hydrologically similar if they have similar geographical conditions, climate, and water sources. The river discharge characteristic of a gauged basin can be applied to estimate the discharge of a hydrologically similar ungauged basin. Based on this theory, Li et al. [24] presented a similarity coefficient of multiplying basin area and multiyear average precipitation, which was then combined with the C/M model at a gauged basin to obtain discharge estimation at its hydrologically similar basin. This confirms the applicability of the C/M method in ungauged basins, as long as a gauged hydrologically similar basin exists.
This study supports the effectiveness of discharge estimation by the C/M method for small rivers applying the HLS product. However, there are cases where the method may not be applicable. Firstly, it was suggested by this study that using a 30 m resolution HLS product to estimate discharge is more suitable for relatively wider river sections with higher base flow, which may lead to an issue of comparing spatial resolution of images and river extent. Thus, when the river is too small, remote sensing data and methods may need improvement. Secondly, resampling the NIR band of Sentinel-2 from 10 to 30 m may weaken the sensitivity in river discharge modeling through generalizing the remote sensing information. Furthermore, optical images are easily contaminated by clouds and their shadows, which further decreases the number of available HLS images. A possible solution for this is to integrate with microwave remote sensing data, such as Sentinel-1 Synthetic Aperture Radar (SAR) data, to eliminate weather effects. Thirdly, sediment content in river channels is likely to influence the model accuracy during flooding as the amount of sediment can be huge, changing the water reflectance significantly [37]. These above facts could detract from being able to capture more accurate and intense discharge variation during flooding events. Besides, it is noted that only two sites each with one year of record were tested in this study. The performances of discharge estimation using the C/M method may vary due to river morphological characteristics, such as river width, floodplain patterns, etc. [5]. More comprehensive studies are needed to further investigate how the C and M locations are affected by river morphological characteristics, which would then benefit the optimal selection of C and M pixels.

Conclusions
This study has demonstrated the feasibility of utilizing high spatial resolution and a high temporal resolution Harmonized Landsat and Sentinel-2 surface reflectance product to estimate river discharge for relatively small rivers with 30~100 m widths based on the C/M method. Two sites with different river characteristics and geographic settings in the MDB of Australia were selected for the study. It was found that for Gauge 410130 at Murrumbidgee River, where the river width is about 60 m, the RMSE, RRMSE, and NSE of the modeled discharge were 4.61 m 3 /s, 53%, and 0.24 respectively, indicating a moderate accuracy. Overestimation happened when estimating flow peaks. For Gauge 414200 at Murray River, whose river width is about 90 m, the modeled accuracy was significantly higher, with RMSE, RRMSE, and NSE of 20.87 m 3 /s, 19%, and 0.69, respectively. Low RRMSE and high NSE values indicate that the HLS-based C/M method works much better on a wider river with larger base flow. The study results are encouraging and sufficient to continue investment in advancing the C/M method, and in the meantime, filling the gap of estimating discharge for relatively smaller rivers by replacing coarse resolution MODIS data with higher resolution HLS data. Since the HLS product has the advantages of both high spatial and temporal resolution, our study provides a detailed investigation regarding the applicability of this product, and results support its application in the field of river discharge estimation. We are in an era of multi-source data explosion, and it is anticipated that many new synthesized products (such as the HLS) will be developed. This study demonstrates the feasibility of using multisource synthesized data as input for discharge estimation. It is also noted that there is significant difference in the estimation accuracy at both sites, which may result from river width or floodplain-specific patterns [5]. Our future work is to quantitatively investigate how these parameters affect the discharge estimation accuracy using the C/M method. To achieve this, more study sites, and many other data sources, including some globally available river width products [38][39][40][41], are to be involved in the future.