Daily River Discharge Estimation Using Multi-Mission Radar Altimetry Data and Ensemble Learning Regression in the Lower Mekong River Basin

Estimating river discharge (Q) is critical for ecosystems and water resource management. Traditionally, estimating Q has depended on a single rating curve or the Manning equation. In contrast to the single rating curve, a previous study linearly combined several rating curves at different locations in an ensemble learning regression method to estimate Q (ELQ) at the Brazzaville gauge station in the central Congo River. In this study, we further tested the proposed ELQ and applied it to three locations in the Lower Mekong River Basin (LMRB): Stung Treng, Kratie, and Tan Chau. Two major advancements for estimating Q with ELQ are presented. First, ELQ successfully estimated Q at Tan Chau, downstream of Kratie, where hydrodynamic complexities exist. Since the hydrologic characteristics downstream of Kratie are extremely diverse and complex in time and space, most previous studies have estimated Q only upstream from Kratie with hydrologic models and statistical methods. Second, we estimated Q over the LMRB using ELQ with water levels (H) obtained from two radar altimetry missions, Envisat and Jason-2, which made it possible to estimate Q seamlessly from 2003 to 2016. By using ELQ with multi-mission radar altimetry data, we have overcome a limitation of the single rating curve: locations for estimating Q have to be close to virtual stations (e.g., within a few tens of kilometers) because the performance of the single rating curve degrades as the distance between the location of Q estimation and a virtual station increases. For this reason, most previous studies had not used Jason-2 data, whose cross-track interval is about 315 km at the equator. In contrast, several H obtained from Jason-2 altimetry were used in this study regardless of their distances from in-situ Q stations, since the ELQ method compensates for the degradation in Q estimation performance caused by poor rating curves at virtual stations far from in-situ Q stations.
In general, the ELQ-estimated Q ($Q_{ELQ}$) was more accurate than Q obtained from a single rating curve. In the case of Tan Chau, the root mean square error (RMSE) of $Q_{ELQ}$ decreased by 1504/1338 m³/s using Envisat-derived H for the training/validation datasets. We successfully applied ELQ with multi-mission radar altimetry data to the LMRB, which is one of the most complex basins in which to estimate Q. Furthermore, our method can be used to obtain finer temporal resolution and enhance the performance of Q estimation with the current altimetry missions, such as Sentinel-3A/B and Jason-3.


Introduction
Inland freshwater is regarded as an essential resource for ecosystems, yet we still have limited knowledge of river discharge (Q) variation due to its heterogeneities in time and space [1,2]. The number of gauging stations has declined primarily due to reductions in government funding for the maintenance of stream gauges [3]. Instead, studies on the estimation of Q have emerged using hydraulic variables (water levels, inundation areas, river widths, and surface water slopes) obtained from remote sensing with a multiplicative method [4,5], such as the Manning equation [6] or a rating curve generated by comparing those hydraulic variables from remote sensing with in-situ Q measurements [7,8]. However, those methods require multiple hydraulic variables that may not be available everywhere or depend on rating curves (at-a-station hydraulic geometry, AHG) [9] that vary with high and low water seasons [10–13].
Recently, Kim et al. [14] introduced the ensemble learning regression for estimating Q (termed ELQ), which has been proven to be an effective method compared to previous methods based on a single rating curve. Details about ELQ can be found in [14]. Its essence is that ELQ, a machine learning method, trains each base learner individually and then combines them linearly. In other words, ELQ generates several base learners, e.g., rating curves, at different locations and combines and harmonizes them using a data-driven approach. As such, ELQ has the potential to create a more accurate function to estimate Q. A brief review of ensemble learning regression and ELQ will be presented in Section 3.
Kim et al. [14] showed that ELQ could generate a strong Q estimation model using altimetry-derived water levels (hereafter "altimetry-derived water levels" are referred to as "H") obtained at two or three different virtual stations. A virtual station (VS) is defined as a location where a ground track of satellite altimetry overpasses a waterbody [15,16]. Traditionally, AHG is used at a specific location and is not transferable to other sites, whereas ELQ can integrate observations at different locations.
Although ELQ is a novel method to estimate Q, it still requires in-situ Q data in order to train the base learners and obtain their weights [14]. Note that Kim et al. [17] applied ELQ to estimate Q from 2003 to 2010 using H and the hydrologic-hydraulic hillslope river routing (HRR) model without the aid of in-situ Q over the central Congo River. However, the HRR model used in [17] was parameterized and compared to mean monthly streamflow determined from the historical Q data from 1903 to 1990. Moreover, ELQ has been applied only to the central Congo River basin in [14,17], and their studies only estimated Q at a single location, the Kinshasa-Brazzaville station (Lat/Lon: 4.2822°S/15.3008°E).
On the other hand, the hydraulic complexities in the Lower Mekong River Basin (LMRB) hinder accurate estimation of Q. One of these complexities is due to the existence of Tonle Sap Lake (TSL, see Figure 1 for its location). The TSL is the largest body of freshwater in Southeast Asia, and it plays a crucial role in Cambodia's food supply and in flood control as a natural reservoir [18]. The TSL has a flow reversal between the wet and dry seasons. The Mekong River flows from the mainstem into the TSL during the wet season (May/June to October/November), while the river flows from the TSL into the mainstem during the dry season (October/November to May/June) [18,19]. The Mekong River Commission (MRC) [18] in 2005 (p. 49) describes the difficulty in estimating Q in the LMRB as: "Downstream from Kratie, seasonal floodplain storage dominates the annual regime and there is significant movement of water between channels over flooded areas, the seasonal refilling of the Great Lake and the flow reversal in the Tonle Sap. There is extreme hydrodynamic complexity in both time and space, and it becomes impossible to measure channel discharge." In addition, tidal intrusion can reach up to Phnom Penh [20,21]. Due to these hydraulic complexities in the LMRB, most previous studies derived Q from a statistical method or a hydrologic model, focusing only on relatively upstream locations of the LMRB. For example, Wang et al. [22] estimated Q using a hydrologic model at six locations of the Mekong River. Among the six locations, the most downstream was Stung Treng (see Figure 1). Mohammed et al. [23] derived Q using a hydrologic model at eight locations, of which Kratie was the most downstream. Likewise, the most downstream location for estimating Q with a statistical method was Kratie in [24].
Currently, an online web-based nowcast of Q along the Mekong River, based on remotely sensed data and a variable infiltration capacity (VIC) hydrologic model (http://depts.washington.edu/saswe), is also available only upstream of Kampong Cham (Lat/Lon: 11.909°N/105.388°E) [25]. In this work, we integrated multi-mission radar altimetry data with ELQ to estimate daily Q from 2003 to 2016 at three locations in the LMRB: Stung Treng and Kratie in the middle reach, and Tan Chau, where hydrodynamic complexities exist (see Figure 1 for their locations). To apply ELQ over the LMRB, we assumed that the three in-situ Q stations had been decommissioned after 31 December 2006. Previous studies have integrated multi-mission radar altimetry data to estimate Q with a hydrologic model [8] or historical in-situ Q [26], but their locations of Q estimation had to be close enough to VSs (e.g., within a few tens of kilometers) since they used a single rating curve. In light of the characteristics of ELQ, which combines several H linearly, we present more accurate estimates of Q compared to those obtained from a single rating curve.

Mekong River
The Mekong River Basin (MRB) is the sixth largest in mean annual Q (16,000 m³/s), and the river flows through six countries: China, Myanmar, Laos, Thailand, Cambodia, and Vietnam [27]. In general, the water level of the river starts to rise in May and reaches a peak in October. The lowest water levels occur in March and April [28].
The in-situ Q data from Stung Treng and Kratie were obtained from the MRC and can be found in the supporting datasets from [22,29]. The in-situ Q data from Tan Chau from 2003 to 2006 and from 2013 to 2016 were provided by the Asian Disaster Preparedness Center (ADPC) and National Center for Water Resources Planning and Investigation (NAWAPI), respectively. Detailed information about in-situ Q datasets is provided in Table 1.

Radar Altimetry Data
We used Envisat and Jason-2 altimetry data. Envisat, launched on 1 March 2002, is the follow-on to ERS-1 and ERS-2. From 2002 to 2010, Envisat Radar Altimeter-2 (RA-2) determined the heights of Earth's surface using the two-way travel time of radar pulses with a 35-day repeat cycle. We used the 18-Hz along-track range data in the Geophysical Data Record (GDR), which is publicly available from the Center for Topographic Studies of the Ocean and Hydrosphere (CTOH; http://ctoh.legos.obs-mip.fr/data/alongtrack-data/datarequest). Jason-2 is the follow-on to TOPEX/Poseidon and Jason-1, and it observed the Earth with a 10-day repeat cycle. We obtained the 20-Hz along-track range data in the GDR product, which is available from the National Centre for Space Studies (CNES) archive (ftp://avisoftp.cnes.fr/AVISO/pub/jason-2/gdr_d).
In this study, we used the ICE-1 retracked range measurements for Envisat, which are considered most suitable for inland waterbodies [30], and ICE retracked range measurements for Jason-2 [31]. The geophysical corrections (solid Earth and pole tides) and dry troposphere correction were applied. The wet troposphere correction using the onboard microwave radiometers for Envisat and Jason-2 is degraded due to land contamination [32]. Thus, the wet troposphere correction was calculated by the French Meteorological Office (FMO) from the European Center for the Medium-Range Weather Forecasts (ECMWF) numerical weather prediction model [31,33]. The ionosphere correction derived from the Global Ionosphere Map (GIM) was used instead of the correction based on the dual-frequency range measurements, also because of land contamination.
Only a handful of VSs can be generated in the LMRB owing to its sparse orbital interval. We used seven VSs from Envisat and four VSs from Jason-2. H were extracted using an automated algorithm described in Okeowo et al. [34]. This algorithm is based on K-means clustering for the detection of outliers without user intervention. It was found that their method is computationally efficient and effective compared to another method, such as the Kalman filter approach described in [35]. More information about the automated algorithm can be found in [34]. Detailed information of VSs is shown in Table 2.

Methods
In Section 3.1, we provide a brief summary of ensemble learning and ELQ. Then, we present detailed methods to generate base learners with Envisat and Jason-2 altimetry data in Section 3.2. In Section 3.3, the integration of the generated base learners is described. Then, combining multiple radar altimetry missions to estimate Q with ELQ is proposed in Section 3.4. Lastly, the performance metrics of ELQ are explained in Section 3.5. Here, we assumed that the three in-situ Q stations had been decommissioned after 31 December 2006 since in-situ Q data are not available from 2007 to 2012 at Tan Chau (see Table 1). We only used in-situ Q from January 2003 to December 2006 for training and validating the ELQ method. Then, the daily Q from January 2007 to September 2016 at the three locations were estimated from Envisat- and Jason-2-derived H without the aid of in-situ Q.

Ensemble Learning and ELQ: A Brief Review
Ensemble learning indicates a series of procedures to train several functions and combine their results based on an integrating rule [36]. Typically, the ensemble process consists of two parts: Ensemble generation and ensemble integration [37]. Some studies [38,39] added an ensemble pruning between the ensemble generation and integration. As it can be seen in Figure 2, in the first step (ensemble generation), a number of candidate functions (base learners) are generated. Then, the ensemble pruning step eliminates some of the generated functions in the first step. Finally, the ensemble integration combines the selected functions to reduce errors.
Kim et al. [14] introduced ELQ, which generates a more accurate estimate of Q by combining several functions obtained from several locations in river reaches. The generalized form of ELQ ($f_{ELQ}$) is:
$$f_{ELQ}(x_i) = \alpha + \sum_{j=1}^{k} w_j f_j(x_{ij}) + \varepsilon_i$$
where $x$ is the observation of the hydraulic variables, $w_j$ is the weight determined by the ELQ integration process, and $f_j$ is the base learner. $i$ indicates the $i$-th observation, and $j$ denotes the $j$-th variable, which represents a location of the obtained variable. α and $\varepsilon$ are the intercept and error term of ELQ, respectively. ELQ has $k$ base learners, but it is appropriate to have two or three base learners to avoid overfitting [14,40].
Here, candidate ensemble functions indicate several rating curves. We used a depth AHG relationship between historical in-situ Q data and H variations. Time series of H were obtained at different VSs using Envisat and Jason-2 altimetry data.

Generating Base Learners
Base learners for ELQ were generated from an empirical relationship (i.e., a rating curve) between H and in-situ Q data. Time series of H obtained from Envisat and Jason-2 altimetry can be found in Figures 3 and 4. The aim of ensemble generation is to produce a set of base learners for the ensemble integration step.
First, before generating the rating curves, the flood travel time in the LMRB should be investigated since different VSs, located upstream and downstream of the in-situ Q stations, were used to estimate Q. Based on previous studies [18,19], the maximum flow occurs in August/September at Phnom Penh, while September/October is the period of maximum flow at Tan Chau and TSL. In-situ water levels from 2003 to 2006 at four locations (see Figure 5), Stung Treng (ST), Phnom Penh (PP), Tan Chau (TC), and TSL, were obtained from ADPC. Using these in-situ water level data, we checked the peak flood dates, and they occurred in the order of ST, PP, TC, and TSL, as shown by the dashed red line in Figure 5. The peak flood in TSL occurs after that in TC, which is located downstream of TSL, because of the role of TSL in the flow reversal of the flood. In other words, the filling of TSL occurs in August/September, and the draining of TSL starts in September/October [18]. The peak flood time interval from ST to PP (blue line on the x-axis in Figure 5) is more or less unchanged from 2003 to 2006. However, the peak flood time intervals from PP to TC (red line on the x-axis in Figure 5) and from TC to TSL are irregular during the same period. These discrepancies arise because the peak flood timing upstream of Phnom Penh depends on the peak flood timing on the mainstem of the Mekong River and on local hydrological characteristics, such as rainfall intensities and contributions from tributaries [19].
Therefore, we conclude that there is not enough evidence to determine the specific range of flood travel time for the LMRB.
Second, it is required to estimate water depth ($d$) to establish the rating curves. Here, $d$ was estimated from the depth-discharge ($d$-$Q$) relationship ($d = cQ^{f}$), generated using altimetry-derived H as in [14]. For details about generating the $d$-$Q$ relationship, readers are referred to [14]; only a brief procedure to obtain $d$ is provided here. Note that $d$ can be obtained from the equation $d = H - H_{min} + d_{min}$, where $H_{min}$ is the minimum H during the altimetry observation period, and $d_{min}$ can be obtained by $H_{min}$ minus $z$ (the height from the reference ellipsoid to the bottom of a cross-sectional wetted area). For Steps (1)-(3), the information used was not absolute H but relative H.
Step (1): Calculate relative water stages ($H_r$); subtract $H_{min}$ from interpolated H.
Step (2): Obtain the coefficient of determination ($R^2$) using the $d$-$Q$ relationship with 0.1-m increments on $d_{min}$.
Step (3): Find the optimum $d_{min}$, where $R^2$ of the $d$-$Q$ relationship is maximized.
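Steps (1)-(3) amount to a one-dimensional grid search over the minimum water depth. The following is a minimal Python sketch of that search, assuming the depth-discharge relationship is fitted by ordinary least squares in log-log space. The function names, the 30-m upper search limit, and the sample values are hypothetical and are not taken from the study.

```python
import math


def r_squared_loglog(depth, discharge):
    """R^2 of the straight-line fit between log(depth) and log(discharge)."""
    x = [math.log(q) for q in discharge]
    y = [math.log(d) for d in depth]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def find_optimum_dmin(H, Q, step=0.1, max_dmin=30.0):
    """Grid-search the minimum depth that maximizes R^2 of the depth-discharge fit."""
    h_min = min(H)
    h_rel = [h - h_min for h in H]                 # Step (1): relative stages
    best_dmin, best_r2 = None, -math.inf
    for k in range(1, int(round(max_dmin / step)) + 1):
        d_min = round(k * step, 10)                # Step (2): 0.1-m increments
        depth = [hr + d_min for hr in h_rel]       # d = H - H_min + d_min
        r2 = r_squared_loglog(depth, Q)
        if r2 > best_r2:                           # Step (3): keep the maximum
            best_dmin, best_r2 = d_min, r2
    return best_dmin, best_r2


# Hypothetical data: depths follow d = 0.05 * Q**0.5 above a bed at 10 m.
Q_obs = [1600.0, 2500.0, 4900.0, 10000.0, 22500.0, 40000.0]
H_obs = [12.0, 12.5, 13.5, 15.0, 17.5, 20.0]
d_min_opt, r2_opt = find_optimum_dmin(H_obs, Q_obs)
print(d_min_opt, round(r2_opt, 6))
```

The search starts at one increment rather than zero because a zero depth cannot be log-transformed.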
During the low water season, the TSL's $d$ is reported to be 0.5 to 0.8 m [18,19]. However, as shown in Table 3, our estimated minimum depths ($d_{min}$) in TSL, for example, at EnvP952, were 28.4, 24.8, and 2.2 m with in-situ Q from Stung Treng, Kratie, and Tan Chau, respectively. The estimated $d_{min}$ at other VSs also showed different values. This shows that different $d_{min}$ at a VS were obtained using in-situ Q from different stations. Therefore, the optimal $d_{min}$ obtained from Steps (1)-(3) above cannot represent the real $d_{min}$ at that VS. Hereafter, $d$ is therefore replaced by $S$, which indicates the stage value obtained from Steps (1)-(3). Note that the $S$-$Q$ relationship can still be effectively used as the base learner, as shown in Figure 6 ($R^2$ > 0.70).
Third, in order to estimate daily Q, H were linearly interpolated. As shown in Figure 3, H showed a strong pattern of seasonality for all seven VSs, and therefore it is reasonable to use the daily interpolated H for base learners. Moreover, Kim et al. [17] demonstrated that degradation of the performance estimating Q due to the use of interpolated H can be mitigated with temporally denser acquisition of the original H by integrating more than two base learners. As previously mentioned, it was assumed that the in-situ Q stations were decommissioned after 31 December 2006.
Finally, we obtained base learners using the power law relationship between $S$ and in-situ Q, such as:
$$f_j(S_{ij}) = a_j S_{ij}^{\,b_j}$$
where $i$ indicates the $i$-th observation, $j$ denotes the virtual stations, $a_j$ and $b_j$ are the coefficients in the power law relationship ($Q = aS^{b}$), $S_{ij}$ is the stage value estimated with specific in-situ Q data, and $f_j$ is the generated base learner. Using the training dataset, unknown parameters $a_j$ and $b_j$ can be estimated.
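The interpolation and base-learner fitting described above can be illustrated with a short Python sketch. It assumes that the power law is fitted by linear least squares on log-transformed values, which is one common choice and not necessarily the fitting procedure used in the study. All names and sample values are hypothetical.

```python
import math


def interpolate_daily(days, values):
    """Linearly interpolate sparse (day, value) samples onto every whole day."""
    daily = {}
    for k in range(len(days) - 1):
        d0, d1 = days[k], days[k + 1]
        v0, v1 = values[k], values[k + 1]
        for day in range(d0, d1):
            daily[day] = v0 + (v1 - v0) * (day - d0) / (d1 - d0)
    daily[days[-1]] = values[-1]
    return daily


def fit_power_law(stage, discharge):
    """Fit Q = a * S**b by least squares on log Q = log a + b * log S."""
    x = [math.log(s) for s in stage]
    y = [math.log(q) for q in discharge]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Hypothetical stage samples (m) at a 35-day repeat cycle, interpolated to daily.
sample_days = [0, 35, 70]
sample_stage = [1.0, 8.0, 1.0]
daily_stage = interpolate_daily(sample_days, sample_stage)

# Hypothetical training pairs generated from Q = 3 * S**1.5.
train_stage = [1.0, 2.0, 4.0, 8.0]
train_q = [3.0 * s ** 1.5 for s in train_stage]
a, b = fit_power_law(train_stage, train_q)

# Daily discharge predicted by the fitted base learner.
daily_q = {day: a * s ** b for day, s in daily_stage.items()}
```

Each virtual station would get its own pair of coefficients, so the same two functions can be reused per station.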

Integrating Base Learners
After generating base learners, the final ensemble function can be obtained by a weighted average method [14]. Here, only two base learners were used for the final ensemble function since a large number of base learners may lead to overfitting [40]:
$$f_{ELQ}(S_i) = \alpha + w_1 f_1(S_{i1}) + w_2 f_2(S_{i2}) + \varepsilon_i$$
where $f_{ELQ}$ is the final ensemble function, $f_1$ and $f_2$ are base learners (generated ensemble functions), $w_1$ and $w_2$ are the weights of ELQ, α and $\varepsilon$ are the intercept and error terms of ELQ, respectively, and $i$ indicates the $i$-th observation ($i = 1, 2, \cdots, n$).
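To make the integration step concrete, the sketch below estimates an intercept and two weights from the outputs of two base learners. It assumes ordinary least squares as the estimator, which is a stand-in chosen for illustration; the actual weighting procedure of ELQ is described in [14]. The helper names and the sample discharges are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_elq(f1, f2, q_obs):
    """Least-squares estimate of (alpha, w1, w2) in Q = alpha + w1*f1 + w2*f2."""
    X = [[1.0, u, v] for u, v in zip(f1, f2)]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    Xty = [sum(row[i] * q for row, q in zip(X, q_obs)) for i in range(3)]
    alpha, w1, w2 = solve_linear(XtX, Xty)
    return alpha, w1, w2


def predict_elq(alpha, w1, w2, f1, f2):
    """Combine two base-learner outputs into the ensemble estimate."""
    return [alpha + w1 * u + w2 * v for u, v in zip(f1, f2)]


# Hypothetical base-learner outputs (m^3/s) at two virtual stations.
f1 = [1000.0, 2000.0, 4000.0, 8000.0, 3000.0]
f2 = [1200.0, 1800.0, 4500.0, 7000.0, 3500.0]
# Hypothetical in-situ Q built from known coefficients so the fit can be checked.
q_obs = [50.0 + 0.6 * u + 0.4 * v for u, v in zip(f1, f2)]

alpha, w1, w2 = fit_elq(f1, f2, q_obs)
q_elq = predict_elq(alpha, w1, w2, f1, f2)
```

With real data the two base learners would be fitted on the training period and the resulting coefficients applied unchanged to the validation and test periods.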
Ensemble pruning can also be performed before the integration process. Zhou [41] claimed that in order to obtain the best result in the ensemble process, the base learners that are less accurate should be excluded. Table 4 shows $R^2$ of the base learners at seven Envisat VSs with in-situ Q datasets from three different stations. Figure 6 shows box plots of those $R^2$ values, in which outliers beyond the upper and lower whiskers were detected. The whisker length was set to half of the interquartile range (IQR), which corresponds to approximately ±1.35σ (where σ is the standard deviation) and 82.3% coverage of normally distributed data [42]. Based on this analysis, we decided to prune less accurate base learners in the ensemble integration process. Therefore, EnvP952 was excluded for Q estimation at Stung Treng and Kratie, and EnvP565B was excluded at Tan Chau.
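The whisker statistics quoted above can be checked numerically, and the same rule can be applied to a list of scores. The Python sketch below does both with the standard library. The function name and the sample scores are hypothetical, and the quartile convention is that of `statistics.quantiles`, which may differ slightly from the one used to draw Figure 6.

```python
from statistics import NormalDist, quantiles

# Whisker position and coverage for a standard normal distribution.
std_normal = NormalDist(mu=0.0, sigma=1.0)
q3 = std_normal.inv_cdf(0.75)            # upper quartile, about 0.674 sigma
iqr = 2.0 * q3                           # interquartile range, about 1.349 sigma
whisker = q3 + 0.5 * iqr                 # whisker end, half an IQR past the quartile
coverage = std_normal.cdf(whisker) - std_normal.cdf(-whisker)


def prune_by_whiskers(scores, whisker_factor=0.5):
    """Split scores into (kept, pruned) using whiskers of whisker_factor * IQR."""
    lower_q, _, upper_q = quantiles(scores, n=4)
    spread = whisker_factor * (upper_q - lower_q)
    low, high = lower_q - spread, upper_q + spread
    kept = [s for s in scores if low <= s <= high]
    pruned = [s for s in scores if s < low or s > high]
    return kept, pruned


# Hypothetical scores of seven base learners (not the values of Table 4).
r2_values = [0.70, 0.82, 0.85, 0.86, 0.88, 0.90, 0.91]
kept, pruned = prune_by_whiskers(r2_values)
print(round(whisker, 3), round(coverage, 3), pruned)
```

The function flags scores beyond either whisker; for pruning weak base learners only the low side matters.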

Combining Multiple Radar Altimetry Missions
Although ELQ needs in-situ Q data to establish the rating curves and to obtain weights of the base learners, it can be used to fill missing Q data for decommissioned stream gauges. In other words, the discontinued Q data can be more accurately estimated using available historical in-situ Q data and the ELQ method. As previously mentioned, Bogning et al. [26] used several altimetry missions, such as ERS-2, Envisat, Satellite with Argos and AltiKa (SARAL/AltiKa), Cryosat-2, and Sentinel-3A, to estimate Q at Lambaréné in the Ogooué River Basin in Gabon, Central Africa, but the locations for estimating Q had to be close to VSs (e.g., within a few tens of kilometers). Therefore, the study did not use Jason-2 data, whose cross-track interval is about 315 km at the equator. In contrast, we used several H obtained from Envisat and Jason-2 altimetry regardless of their distances from in-situ Q stations, since the ELQ method compensates for the degradation in Q estimation performance caused by poor rating curves at VSs far from in-situ Q stations.
Secondly, the trained ELQ can reconstruct Q for the ungauged period (test period) from January 2007 to October 2010. However, since the Envisat satellite moved to a new orbit on 22 October 2010, the trained ELQ cannot be used to reconstruct Q after that date. Therefore, other radar altimetry data are required; in this case, data from the Jason-2 altimeter. In contrast to a previous study [26] that adopted multi-mission radar altimetry data with a single rating curve, consideration of inter-mission biases is not required in this study. Rating curves generated from different altimetry missions in [26] had to account for inter-mission biases because locations for estimating Q had to be close to VSs, and the locations of VSs from several altimetry missions generally do not coincide. On the other hand, since ELQ obtains a relationship between H variations at several different VSs and historical Q data at a specific station, ELQ does not need to consider inter-mission biases. In other words, several rating curves for the ELQ process with multi-mission altimetry data can be obtained not at a single location but at several distant locations. In addition, owing to the overlapping period between the Envisat and Jason-2 altimetry missions, the training step for Jason-2 was separated from the one for Envisat. Therefore, it is not required to consider inter-mission biases in this study.
Finally, Q data from October 2010 to 2016 can be estimated using ELQ with Jason-2-derived H. In this step, ELQ-estimated Q, which were reconstructed using the Envisat data from September 2008 to October 2010 (green box in Figure 7), can be used instead of in-situ Q data for the Jason-2 training step of ELQ. Since the $R^2$ values of the base learners with J001L for Stung Treng and Kratie were 0.46 and 0.52, respectively (see Table 5), these weak base learners were pruned in the integration step. Using the ELQ parameters obtained in the Jason-2 training period, ELQ can now be used to estimate Q from October 2010 to September 2016, which includes the ungauged period, without the aid of in-situ Q data.

Performance Comparison
We compared ELQ-estimated Q (hereafter $Q_{ELQ}$) with estimates of Q obtained from a single rating curve, which is Model 1: $Q = a \cdot (H - z)^{b} + \varepsilon$ (hereafter $Q_S$) [5]. $a$ and $b$ are parameters to be calibrated with in-situ Q data.
Four metrics were adopted to evaluate the performance of $Q_{ELQ}$ and the single-rating-curve estimates ($Q_S$): the mean error (ME), root mean square error (RMSE), relative RMSE (RRMSE), and Nash-Sutcliffe coefficient (NSE) [43,44]. NSE ranges from −∞ to 1, where 1 is the perfect match between the estimated and measured Q:
$$ME = \frac{1}{n}\sum_{i=1}^{n}\left(Q_{e,i} - Q_{m,i}\right)$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q_{e,i} - Q_{m,i}\right)^{2}}$$
$$RRMSE = \frac{RMSE}{\bar{Q}_m} \times 100\ (\%)$$
$$NSE = 1 - \frac{\sum_{i=1}^{n}\left(Q_{m,i} - Q_{e,i}\right)^{2}}{\sum_{i=1}^{n}\left(Q_{m,i} - \bar{Q}_m\right)^{2}}$$
where $Q_m$ is the measured river discharge, $\bar{Q}_m$ is the mean value of $Q_m$, $Q_e$ is the estimated river discharge, and $n$ is the number of observations.
For Stung Treng, Kratie, and Tan Chau, the best results obtained from the training and validation datasets were from the combinations EnvP565A-021B, EnvP021A-565B, and EnvP866-952, respectively. However, the VS combination that gives the best $Q_{ELQ}$ for the test dataset (January 2007 to October 2010) might differ from that of the training and validation datasets because the patterns of H variation differ between the training/validation and test datasets [45]. Since we used the parameters of ELQ obtained from the training dataset, it is advantageous to use a test dataset with a pattern similar to that of the training dataset. Additionally, the correlation coefficient ($r$) between two base learners ($f_1$ and $f_2$) should be small in order to maximize the performance of ELQ [14]. To analyze the best combination of VSs for the test dataset, $r$ obtained from Envisat-derived H was compared between the training and test datasets, as shown in Figure 8. For Stung Treng, the best base learner was obtained from EnvP021B (see Table 6). In order to select the best combination with EnvP021B, $r$ should be as small as possible in both the training and test datasets. Since $r$ obtained from H between EnvP565A and EnvP021B was the smallest, the combination EnvP565A-021B was selected for the test dataset for ELQ. Similarly, the combinations EnvP866-EnvP565A and EnvP866-EnvP952 were selected for the test dataset of ELQ for Kratie (Figure 8b) and Tan Chau (Figure 8c), respectively.
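The four performance metrics used in this comparison (ME, RMSE, RRMSE, and NSE) can be written as short Python functions. This is a minimal sketch that assumes ME is computed as estimated minus measured discharge and that RRMSE is normalized by the mean measured discharge. The function names and sample values are hypothetical.

```python
import math


def mean_error(q_est, q_obs):
    """Mean error; positive values mean overestimation (assumed sign convention)."""
    return sum(e - o for e, o in zip(q_est, q_obs)) / len(q_obs)


def rmse(q_est, q_obs):
    """Root mean square error in the units of the discharge."""
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(q_est, q_obs)) / len(q_obs))


def rrmse(q_est, q_obs):
    """RMSE relative to the mean measured discharge, in percent."""
    mean_obs = sum(q_obs) / len(q_obs)
    return rmse(q_est, q_obs) / mean_obs * 100.0


def nse(q_est, q_obs):
    """Nash-Sutcliffe coefficient; 1 is a perfect match."""
    mean_obs = sum(q_obs) / len(q_obs)
    residual = sum((o - e) ** 2 for e, o in zip(q_est, q_obs))
    variance = sum((o - mean_obs) ** 2 for o in q_obs)
    return 1.0 - residual / variance


# Hypothetical measured and estimated discharges (m^3/s).
q_measured = [100.0, 200.0, 300.0, 400.0]
q_estimated = [110.0, 190.0, 310.0, 390.0]

scores = {
    "ME": mean_error(q_estimated, q_measured),
    "RMSE": rmse(q_estimated, q_measured),
    "RRMSE": rrmse(q_estimated, q_measured),
    "NSE": nse(q_estimated, q_measured),
}
print(scores)
```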
Table 7 compares the performance of $Q_{ELQ}$ and the single-rating-curve estimates ($Q_S$) for the three in-situ Q stations according to the four metrics with the test dataset. Similar to the results from the training and validation datasets, $Q_{ELQ}$ outperformed $Q_S$ for all three locations. It should be noted that in-situ water levels at Tan Chau were used for generating pseudo-in-situ Q for statistics in Table 6 since in-situ Q at Tan Chau does not exist for the test period. Hydrographs for the training and validation datasets are shown in Figure 9. In the case of Stung Treng and Kratie, $Q_{ELQ}$ and $Q_S$ showed a similar pattern. However, $Q_S$ were inaccurate in the high-water season of 2009. These inaccurate $Q_S$ might be attributed to the test dataset, which has a different H pattern compared to the training dataset [46]. In other words, the fluctuating Q pattern in the high-water season of 2009 might affect the performance of $Q_S$, whereas $Q_{ELQ}$ showed a better fit to in-situ Q because ELQ combines two base learners. In the case of Tan Chau, $Q_{ELQ}$ outperformed $Q_S$ since the pattern of Q at Tan Chau is affected by TSL and the Mekong mainstem [18].
Figure 9. $Q_{ELQ}$ data from the test datasets were obtained from the combinations EnvP565A-021B (Stung Treng), EnvP565A-866 (Kratie), and EnvP866-952 (Tan Chau) (see Table 7). $Q_S$ data from the test datasets were obtained from EnvP021B (Stung Treng), EnvP866 (Kratie), and EnvP565A (Tan Chau). The blue vertical lines were added for visual clarity among training, validation, and test datasets.

Estimating River Discharge Using Jason-2-Derived Water Levels
Once $Q_{ELQ}$ from January 2007 to October 2010 are obtained with Envisat-derived H, we can train the base learners with Jason-derived H and $Q_{ELQ}$ in the overlapping period from September 2008 to October 2010. The ensemble generation and integration with the four Jason-derived H were performed for the overlap period. After obtaining base learners with Jason-derived H, $Q_{ELQ}$ were estimated from October 2010 to September 2016. Table 8 compares the performance of $Q_{ELQ}$ and the single-rating-curve estimates ($Q_S$) for the three in-situ Q stations. In contrast to the experiment with Envisat-derived H, the performance difference between $Q_{ELQ}$ and $Q_S$ was marginal for Stung Treng and Kratie. The performance of $Q_S$ was actually slightly better than that of $Q_{ELQ}$ at Stung Treng. This relatively poor performance of $Q_{ELQ}$ is likely due to the lack of diversity among base learners [41,47]; as the correlation coefficient ($r$) between base learners increases, their diversity decreases. On the other hand, as can be seen in Table 8, if we consider only J140 and J179 for the ELQ process, the performance of $Q_{ELQ}$ was better than that of $Q_S$ due to enhanced diversity (i.e., smaller $r$). Similarly, $Q_{ELQ}$ with the combination J001L-J179 ($r$ < 0.60) showed better performance than $Q_S$ with both the training and validation datasets for Tan Chau. Hydrographs of Jason-derived $Q_{ELQ}$ and $Q_S$ are shown in Figure 10. Note that hydrographs for Stung Treng were generated until December 2012 due to the availability of in-situ Q data. Due to the relatively simple pattern of Q at Stung Treng, hydrographs of $Q_{ELQ}$ and $Q_S$ were similar. On the other hand, due to a more complex pattern of Q at Kratie, $Q_{ELQ}$ showed a better fit to in-situ Q in the validation period (see Table 8 and Figure 11). In the case of Tan Chau, in general, $Q_{ELQ}$ agreed better with in-situ Q than $Q_S$.
Figure 10. Hydrographs of Jason-derived $Q_{ELQ}$ and $Q_S$ (see Table 8). $Q_S$ data were obtained from J001U (Stung Treng), J001U (Kratie), and J001L (Tan Chau). Note that hydrographs for Stung Treng were generated until December 2012 due to the availability of in-situ Q data. The blue vertical lines were added for visual clarity between training and validation datasets.

Analysis of ELQ's Performance
In this section, the performance of ELQ was analyzed by three indices: degree of compensation [14], degree of dominance [48], and power of base learner [17]. The analyses were performed with Envisat-derived $Q_{ELQ}$ since the available number of combinations with Jason-derived $Q_{ELQ}$ was small (i.e., <4).
Firstly, it was found that a lower correlation coefficient ($r$) between two VSs can provide better estimates of Q in the ELQ process. Kim et al. [14] introduced the degree of compensation as a performance index of ELQ; it is computed from the combination pairs of the variables and the correlation coefficient between the variables in each combination pair, and its formulation is given in [14]. The degree of compensation ranges from 0 (no compensation) to 1 (perfect compensation). As shown in Figure 11a-c, Q improvements increased when the degree of compensation increased, where Q improvement (m³ s⁻¹) is defined as the RMSE difference between the single-rating-curve estimates ($Q_S$) and $Q_{ELQ}$:
$$Q\ \mathrm{improvement} = RMSE(Q_S) - RMSE(Q_{ELQ})$$
Secondly, Kim et al. [48] developed a performance index for ELQ termed degree of dominance, which is computed from the maximum and minimum weights ($w_{max}$ and $w_{min}$); its formulation is given in [48]. Note that the sum of weights ($w_{max} + w_{min}$) is 1. The degree of dominance ranges from 0 (uniformly distributed dominance) to 1 (skewed dominance).
Since the weights ($w_1$ and $w_2$) are assigned in the weighted average method, these weights are determined by the importance of the respective base learner. The study in [17] analyzed the relationship between the degree of dominance and Q improvement over the central Congo River. They found that Q improvement increases when the degree of dominance decreases ($R^2$ = 0.74). This relationship is also confirmed in this study (Figure 11d-f); $R^2$ between the degree of dominance and Q improvement for Stung Treng, Kratie, and Tan Chau were 0.48, 0.46, and 0.47, respectively. In some cases, the weight is biased toward one variable because the two variables are strongly correlated (e.g., degree of compensation < 0.10) [49]. The biased weight implies that the additional base learner, which has a smaller weight, has limited additional information. Consequently, the performance of $Q_{ELQ}$ becomes comparable to that of traditional methods using a single rating curve.
Thirdly, Kim et al. [17] also introduced another performance index for ELQ termed power of base learner, which is calculated by averaging the NSEs of base learners. In order to investigate its relationship with Q improvement, the degree of compensation, and the degree of dominance, each result was color-coded with the power of base learner, as shown in Figure 11g-i. In contrast to the results in [17], which showed a positive relationship between Q improvement and the power of base learner, these two showed a negative relationship for Stung Treng, Kratie, and Tan Chau. This negative relationship is likely due to a greater power of base learner in the LMRB. In other words, although more base learners were added in the ELQ process, no improvement in Q estimation was made when the performance of each base learner was high enough (e.g., NSE > 0.90). On the other hand, the values with higher Q improvement (e.g., reddish dots in Figure 11i) showed relatively low power of base learner. This demonstrates that Q improvement becomes maximized when relatively weak base learners are integrated [40].
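Three quantities used in this analysis follow directly from their descriptions in the text: the correlation between two base learners, Q improvement as an RMSE difference, and power of base learner as the average NSE of the base learners. The Python sketch below computes only these three; the degree of compensation and degree of dominance are left out because their exact formulations are given in [14] and [48]. All names and sample values are hypothetical, and the ensemble estimate is taken here as a plain average purely for illustration.

```python
import math


def rmse(q_est, q_obs):
    """Root mean square error."""
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(q_est, q_obs)) / len(q_obs))


def nse(q_est, q_obs):
    """Nash-Sutcliffe coefficient."""
    mean_obs = sum(q_obs) / len(q_obs)
    residual = sum((o - e) ** 2 for e, o in zip(q_est, q_obs))
    return 1.0 - residual / sum((o - mean_obs) ** 2 for o in q_obs)


def pearson_r(x, y):
    """Pearson correlation coefficient between two series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def q_improvement(q_single, q_elq, q_obs):
    """RMSE of the single rating curve minus RMSE of the ensemble estimate."""
    return rmse(q_single, q_obs) - rmse(q_elq, q_obs)


def power_of_base_learner(base_estimates, q_obs):
    """Average NSE over the base learners."""
    return sum(nse(est, q_obs) for est in base_estimates) / len(base_estimates)


# Hypothetical discharges (m^3/s): measured, two base learners, and an ensemble.
q_obs = [100.0, 200.0, 300.0, 400.0]
base1 = [110.0, 190.0, 310.0, 390.0]
base2 = [80.0, 220.0, 280.0, 420.0]
q_elq = [(u + v) / 2.0 for u, v in zip(base1, base2)]  # illustration only

r_12 = pearson_r(base1, base2)
gain = q_improvement(base1, q_elq, q_obs)
power = power_of_base_learner([base1, base2], q_obs)
```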

Parsimonious Model of ELQ
In this study, only two base learners were used for ELQ. As described in [14], a large number of base learners can lead to overfitting, and therefore, it is necessary to determine the optimal number of base learners. Similar to [14], we investigated the performance of ELQ using two to six base learners with Envisat-derived H for Tan Chau as an example. As shown in Table 9, with more VSs, the performance metrics were slightly improved in the training dataset, but no significant improvement was made in the validation dataset. The number of effective weights remained at three. This indicates that more than three base learners made no additional contribution because the added base learners might be highly correlated with the existing base learners, or the NSE of the added base learner was worse than that of the existing ones. Therefore, it might be optimal to use three base learners. However, compared to the result obtained from two base learners, improvement in the performance of ELQ was negligible when three were used. Therefore, we decided to use only two base learners in this study (principle of parsimony) [50].

ELQ Versus AMHG?
The strength of the ELQ process is that it can combine several base learners obtained from spatially distributed VSs. Using H from several VSs to estimate Q is a new concept, since a single rating curve has traditionally been generated to estimate Q at a specific location. On the other hand, Gleason and Smith [51] introduced a method for estimating Q without the aid of in-situ Q using the so-called at-many-stations hydraulic geometry (AMHG), which uses log-linearly related parameters of the at-a-station hydraulic geometry (AHG) relationships ($w = aQ^{b}$, $d = cQ^{f}$, $v = kQ^{m}$, where $w$ = width, $d$ = depth, $v$ = velocity, $Q$ = discharge, and $a$, $b$, $c$, $f$, $k$, $m$ are empirical parameters). Among the three log-linear relationships (i.e., $a$-$b$, $c$-$f$, and $k$-$m$), the $a$-$b$ relationship has mainly been selected for their studies since river widths along river reaches can be easily extracted from satellite images. The AMHG approach does not require in-situ Q data. However, the RRMSE of its Q estimates ranges from 20% to 30% when compared with in-situ Q data [51]. Moreover, the AMHG uses a genetic algorithm (GA) to retrieve instantaneous Q for each observation date [51], so this process might be computationally expensive compared to an empirical method. Additionally, since optical satellite images, such as Landsat Thematic Mapper imagery, were adopted to estimate Q using AMHG [51][52][53], estimating Q at a target location that is covered with clouds can be limited.
Both AMHG and ELQ use several AHG relationships generated from many stations, but they differ in the way they combine these relationships. In order to investigate the similarity between AMHG and ELQ, we tested the existence of the AMHG relationship at three locations, Stung Treng, Kratie, and Tan Chau, with Envisat-derived H. As shown in Figure 12, the log-linear relationship between the coefficients and exponents was consistent at all three locations. Our findings indicate that the AMHG formulation is conserved with H even along a relatively long river reach (a few hundred kilometers), although Gleason and Smith [51] assumed that the log-linear relationships exist with many AHG pairs obtained from river reaches only within ~10 km. The primary difference between AMHG and ELQ for the estimation of Q is that AMHG focuses on the correlative relationship between coefficients and exponents among many AHG relationships, whereas ELQ concentrates on the differences among several AHG relationships, which cannot be identified in the log-linear relationship. In other words, AMHG estimates Q using information shared among many AHG relationships without the aid of in-situ Q, while ELQ uses the complementary information identified from several AHG relationships in an ensemble learning process. Therefore, AMHG and ELQ could be used synergistically. A recent finding in [17] showed that combining ELQ with hydrologic model-derived Q, whose daily RRMSE was about 15% to 18%, yielded more accurate Q with an RRMSE of 7% to 10%. Similarly, the accuracy of AMHG-derived Q could be further improved with ELQ.
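A test for the AMHG relationship can be sketched in two steps: fit a power law at each station in log-log space, then regress the fitted exponents on the logarithms of the fitted coefficients. The sketch below uses the width form of AHG with noise-free synthetic data. It assumes that all station curves pass through one common point (`q_c`, `w_c`), which is an illustrative construction that makes the coefficient-exponent relationship exactly log-linear. The study itself works with altimetry-derived H rather than width, and the names and numbers here are hypothetical.

```python
import math

def ols(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_ahg(q, w):
    """Fit w = a * q**b by OLS in log-log space; returns (a, b)."""
    b, log_a = ols([math.log(v) for v in q], [math.log(v) for v in w])
    return math.exp(log_a), b

# Assumed common point shared by all station curves (illustrative values).
q_c, w_c = 5000.0, 400.0
true_b = [0.10, 0.15, 0.20, 0.25, 0.30]            # one exponent per station
discharges = [1000.0, 2000.0, 5000.0, 12000.0, 30000.0]

# Step 1: build noise-free widths per station and fit the AHG parameters.
fitted = []
for b in true_b:
    a = w_c / q_c ** b                              # forces the curve through (q_c, w_c)
    widths = [a * q ** b for q in discharges]
    fitted.append(fit_ahg(discharges, widths))

# Step 2: regress the exponent b on log(a) across stations.
slope, intercept = ols([math.log(a) for a, _ in fitted], [b for _, b in fitted])
print(slope, -1.0 / math.log(q_c))
```

Under this construction the slope of b against log a equals -1/ln(q_c), so the two printed numbers agree. With real data the points scatter around the fitted line, and the strength of that fit is what indicates whether an AMHG relationship holds.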

Conclusions
This study demonstrated that more accurate daily Q estimates can be obtained with ELQ and multi-mission radar altimetry data over the LMRB from 2003 to 2016. In the case of Tan Chau, the RMSE of Q estimated from EnvP866 and EnvP952 decreased by 1504/1338 m³ s⁻¹ with Envisat-derived H for the training/validation datasets. This reduction is comparable in magnitude to the mean annual Q of the Arkansas River (Arkansas, USA; 1128 m³ s⁻¹), a major tributary of the Mississippi River. Once the ELQ model was established in the training period, Q was reconstructed with Envisat-derived H from January 2007 to October 2010. Then, Jason-derived H were trained using the reconstructed Q from September 2008 to October 2010. Finally, we estimated Q from October 2010 to September 2016 with Jason-derived H. Our results showed that ELQ can successfully estimate Q even if only a few VSs exist along a river reach. In other words, since the ELQ method compensates for the degraded performance of Q estimation caused by poor rating curves at VSs far from in-situ Q stations, we could obtain better Q estimates using ELQ and Jason-2 altimetry data than those obtained from a single rating curve.
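The chained procedure described above (train on one mission, reconstruct Q, train the second mission on the reconstructed Q, then estimate forward) can be sketched as follows. A single power-law rating curve fitted in log-log space stands in for the ELQ model here, which is a deliberate simplification. The time indices, the synthetic stage-discharge functions, and the assumption that both missions sample the same epochs during the overlap are all hypothetical.

```python
import math

def fit_power_law(h, q):
    """Fit q = a * h**b by least squares in log-log space; returns (a, b)."""
    lx = [math.log(v) for v in h]
    ly = [math.log(v) for v in q]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(lx, ly))
         / sum((x - mx) ** 2 for x in lx))
    return math.exp(my - b * mx), b

def predict(model, h):
    """Evaluate the fitted power law at the given water levels."""
    a, b = model
    return [a * v ** b for v in h]

# Synthetic truth: each mission's water level is a different power function of Q.
def true_q(t):
    return 12000.0 + 9000.0 * math.sin(2.0 * math.pi * t / 12.0)

def envisat_h(t):
    return (true_q(t) / 300.0) ** (1.0 / 1.8)

def jason_h(t):
    return (true_q(t) / 450.0) ** (1.0 / 1.6)

train_t = range(0, 24)     # in-situ Q available, first mission observing
recon_t = range(24, 48)    # no in-situ Q, first mission still observing
overlap_t = range(36, 48)  # both missions observing (assumed same epochs)
future_t = range(48, 72)   # second mission only

# Step 1: train the first-mission model against in-situ Q.
env_model = fit_power_law([envisat_h(t) for t in train_t],
                          [true_q(t) for t in train_t])
# Step 2: reconstruct Q where in-situ Q is unavailable.
q_recon = dict(zip(recon_t, predict(env_model, [envisat_h(t) for t in recon_t])))
# Step 3: train the second-mission model on the reconstructed Q over the overlap.
jason_model = fit_power_law([jason_h(t) for t in overlap_t],
                            [q_recon[t] for t in overlap_t])
# Step 4: estimate Q beyond the first mission with the second-mission model.
q_future = predict(jason_model, [jason_h(t) for t in future_t])
```

Because the synthetic data are noise-free, the final estimates match the synthetic truth. With real data, errors in the reconstructed Q propagate into the second-mission model, so the length and quality of the overlap period matter.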
Our results are in alignment with [14,17], who concluded that (1) ELQ outperforms the previous method based on a single rating curve, (2) is one of the contributing factors determining the performance of ELQ, and (3) improvement of Q estimation might not be obtained when the performance of each base learner is high enough (e.g., NSE > 0.90).
In a future study, we will also use H from other altimetry missions, such as Sentinel-3A/B and Jason-3, in order to obtain a finer temporal resolution and enhance ELQ's performance. We will also apply ELQ to several other poorly gauged river basins to expand our knowledge of Q variation globally.