A Physically Based Empirical Localization Method for Assimilating Synthetic SWOT Observations of a Continental-Scale River: A Case Study in the Congo Basin

Water resource management has faced challenges in recent decades due to limited in situ observations and the limitations of hydrodynamic modeling. To overcome these challenges, data assimilation techniques have been proposed to improve hydrodynamic model outputs for local rivers (river length ≤ 1500 km) using synthetic observations of the future Surface Water and Ocean Topography (SWOT) satellite mission. However, large-scale data assimilation schemes require computationally efficient filtering techniques, such as the Local Ensemble Transformation Kalman Filter (LETKF). In complex river networks, such as the Congo River, expansion of the assimilation domain to maximize the number of observations is limited by the error covariance caused by the limited ensemble size. Therefore, we tested the LETKF algorithm in a continental-scale river (river length > 1500 km) using a physically based empirical localization method to maximize the observations available while filtering out areas affected by error covariance. Physically based empirical local patches were derived separately for each river pixel, considering spatial auto-correlations. An observing system simulation experiment (OSSE) was performed using empirical localization parameters to evaluate the potential of our method for estimating discharge. We found that our method could improve discharge estimates considerably without being affected by error covariance while fully using the available observations. We compared this experiment with experiments using conventional fixed-shape local patches of different sizes. The empirical local patch OSSE showed the lowest normalized root mean square error of discharge for the entire Congo basin. Extending the conventional local patch without considering spatial auto-correlation results in very large errors in LETKF assimilation due to error covariance between small tributaries.
The empirical local patch method has the potential to overcome the limitations of conventional local patches for continental-scale rivers using SWOT observations.


Introduction
Management of water resources is essential to society, as surface waters are vulnerable to floods and droughts. Although river discharge is the primary focus of water resource assessments [1], stream gauges have inadequate temporal and spatial resolution for detailed assessment of water resources. Recent advances in satellite technology have enabled estimation of river discharge from remote sensing data, complementing data collected with existing in situ gauge networks [2]. The next-generation satellite altimetry mission Surface Water and Ocean Topography (SWOT) is intended to provide simultaneous mapping of inundated areas and inland surface waters (i.e., rivers, lakes, wetlands, and reservoirs) that vary both temporally and spatially using a Ka-band radar interferometer [3,4]. The channel centerline and width (above 50 m) [5], which can be extracted from the dynamic water mask of SWOT [6], can be used to measure water storage changes in terrestrial water bodies and characterize river discharge [7].
Data assimilation methods can be used to extract information from space-borne measurements that is not directly observable [8]. These methods are useful for reducing the uncertainty of hydraulic models and thus facilitating flood monitoring [9]. Andreadis et al. [10] and Biancamaria et al. [5] used synthetic SWOT measurements to correct river hydrodynamic model forecasts in the Ohio (50 km reach) and Ob (1120 km reach) Rivers, respectively. Andreadis et al. [11] proposed a methodology for improved flood forecasting with a hydrodynamic model based on satellite water elevation and water area data (from nadir altimetry, LiDAR, synthetic aperture radar (SAR) imagery and SWOT) in a 500-km reach of the Ohio River. Durand et al. [12] and Yoon et al. [2] developed methods for estimating bathymetry and slope in a 240 km reach of the Amazon River floodplain and a 1580 km reach of the Ohio River, respectively. Pedinotti et al. [13] accurately estimated Manning's coefficient based on virtual SWOT observations of the Niger Basin. However, these studies focused only on local rivers or portions of a main stem without considering its tributaries, and were conducted in basins much smaller than the Congo River.
Large-scale hydrologic data assimilation demands an efficient Kalman filtering technique with a low computational burden. The computationally efficient Local Ensemble Transformation Kalman Filter (LETKF: [14]) has been used extensively for numerical weather prediction (NWP) at the global scale [6][7][8], because it can process a large number of variables efficiently in a local patch. This method has the ability to separately update the forecast for each pixel using the local patch, and this operation can be executed in parallel. Therefore, LETKF has the potential to be an efficient algorithm for estimation of river hydrodynamics with a low computational burden.
River hydrodynamics exhibit a large degree of spatial dependency, and thus increasing patch size enables extraction of information from distant observations. Conventionally, a square-shaped local patch is defined to obtain information from distant observations. However, large conventional local patches introduce large sampling errors and may destabilize LETKF assimilation [15]. Most hydrologic data assimilation studies examine local river sections (river length ≤ 1500 km) without considering tributaries (e.g., Andreadis et al. [10]; Biancamaria et al. [5]; Yoon et al. [2]). Hence, these studies could avoid the error covariance caused by small tributaries. In contrast, when assimilating an entire river basin like that of the Congo River, local patch shape and size must be adaptively selected for each river pixel to maximize the number of observations while reducing error covariance. Although local patch size and shape selection for NWP has been well studied (e.g., Houtekamer et al. [16]; Anderson [17]; Miyoshi et al. [15]), understanding of the effect of local patch size and shape on hydrologic data assimilation remains limited.
When assimilating distant observations, spurious errors can occur due to error covariance caused by the limited ensemble size [18], despite proper local patch shape and size selection. The most common method for reducing such errors is observation or covariance localization, which uses a Gaussian [15,18,19] or fifth-order piecewise rational [2,5,20] weighting function. However, identification of the most appropriate localization parameters using ad hoc methods is a challenge. Most previous studies of hydrologic data assimilation have employed different covariance localization parameters for different river networks; e.g., 10 km [5], 250 km [21], and 200 km [20] for the Ob, Tennessee, and Niger rivers, respectively. Usually, such studies used a constant localization weighting function (constant localization parameter), as they focused on only one river stem. For complex river networks, such as the Congo, no method has yet been proposed to select the localization parameters for hydrologic data assimilation.
The main aim of this study is to examine the potential of an adaptive empirical localization technique (e.g., local patch and observation localization parameter) for assimilating water surface elevation (WSE) to estimate river discharge for a continental-scale river (river length > 1500 km). The empirical local patch was obtained considering the spatial auto-correlations of the simulated WSE for individual river pixels separately. We hypothesize that the empirical local patch proposed here can filter out observations affected by error covariance and thereby obtain information from distant observations more effectively than the conventional local patch. We adopted LETKF [14] to assimilate WSE for the integration of future SWOT observations into a global river hydrodynamic model, Catchment-based Macro-scale Floodplain (CaMa-Flood: [22]), achieving continental-scale data assimilation at a reasonable computational cost. In Section 2, we describe the study area. A detailed description of the assimilation methodology is presented in Section 3 and the experimental conditions are explained in Section 4. Results and discussion are provided in Section 5, followed by our conclusion in Section 6.

Study Area
In this study, we focused on the Congo River (Figure 1a), which is the second-longest river in Africa. The major sources of the Congo River are located in the East African highlands, including Lake Tanganyika and Lake Mweru. The river is approximately 4700 km in length, with a drainage area of 4 million km² and an average discharge of 41,000 m³/s at its mouth. Its major tributaries include the Alima, Aruwimi, Elila, Itimbiri, Kwa, Lomani, Lowa, Lufira, Lukuga, Lulonga, Luvua, Mongala, Sangha, Ruki, and Ubangi. Figure 1a shows a map of the Congo River, with red dots indicating the locations used for results comparisons, which are labeled C1-C6 from upstream to downstream.
Two flow regimes can be observed in the Congo Basin, depending on the geographic location. North of the Equator, with contrasting wet and dry seasons, a single peak is present in the annual discharge distribution. The Oubangui tributary shows a marked maximum discharge between September and November, and strongly reduced discharge between February and April. In the southern Congo basin, the Lualaba, Luvua and Luapula tributaries also show single peaks, but these occur between March and May, with significantly lower discharge between September and November. Thus, the downstream (Kinshasa) reach of the Congo River shows two hydrographic peaks, with reduced flow in June and July. Furthermore, Figure 1b shows a hydrograph at the Kinshasa Global Runoff Data Centre (GRDC) location with two clear peaks: a small peak in March-April and a large peak in November-December.

The Congo basin was selected as our study area for several reasons. First, the Congo Basin is a major river network, i.e., the second longest on the African continent. Second, it is affected by a low frequency of SWOT observations, as SWOT observations are less common near the equator than at higher latitudes. Third, Africa has the poorest in situ river gauging network among continents [23]. Finally, very few SWOT-related studies have been carried out in the Congo basin (e.g., Revel et al. [24]).

Framework of the Virtual Assimilation Experiment
We used an observing system simulation experiment (OSSE) [2,10] to assess the potential estimation of discharge through assimilation of WSE at the continental scale. The OSSE consisted of three separate simulations: the 'true simulation', 'corrupted simulation', and 'assimilated simulation' (Figure 2). The CaMa-Flood hydrodynamic model [22] was used to generate the true, corrupted, and assimilated simulation estimates for the data assimilation framework in this study (see Supplementary Materials).
To create synthetic SWOT observations, we carried out the true simulation to generate the true virtual water state, which was continuous in space and time. In the true simulation, the river hydrodynamic model was forced by true (i.e., assumed to be true) input runoff forcing (or non-corrupted runoff) and true water state data (river discharge, WSE, and water storage) were generated. Then, synthetic SWOT observations were generated by applying a SWOT coverage mask delineated using orbit data [25] to the true WSEs, followed by addition of Gaussian noise. Therefore, we assumed that only a portion of the true water state was known (i.e., WSEs in the SWOT observation area with some observation errors) when data assimilation was performed (creation of virtual SWOT observations is explained in Section 3.4).
The 'corrupted simulation' was carried out to compare the corrupted state of the model with the true and assimilated simulations. The corrupted simulation in this study was executed using corrupted model settings (i.e., corrupted input runoff forcing, corrupted Manning's coefficient) representing errors in both forcing and model parameters. All other parameters (i.e., river channel depth, river width, elevation) in the corrupted simulation were identical to those in the true simulation. Furthermore, noise was added to the corrupted settings (runoff and Manning's coefficient) to generate the ensemble states used in the assimilation procedure.

We executed the 'assimilated simulation' to test the potential utility of SWOT observations for estimating discharge. We used the same model settings employed in the corrupted simulation, but with assimilation of synthetic SWOT observations. At the end of each day, the synthetic SWOT observations were assimilated into the water state forecast, and the initial conditions of the simulation for the following day were updated to reflect the assimilated water state. The assimilation of WSE was carried out using LETKF.

Hydrodynamic Model Description and Implementation
We used the global river hydrodynamic model CaMa-Flood [22,26,27] to propagate the hydrodynamic parameters over time within our data assimilation framework. CaMa-Flood receives runoff data from a land surface model (LSM) as the input forcing (amount of water entering a river from a unit of land area in mm/day), and simulates river and floodplain hydrodynamics (i.e., river discharge, WSE, inundated area, and surface water storage) at the global scale. The spatial resolution of CaMa-Flood (set to 0.25° in this study) is coarser than that of two-dimensional flood inundation models (typically < 1 km) (e.g., Bates et al. [28]). CaMa-Flood calculates river discharge using a local inertial flow equation (a computationally efficient modification of the shallow water equation) [27,28]. Furthermore, the WSE values simulated using CaMa-Flood are directly comparable to WSE observations based on satellite altimetry [26]. Although the 0.25° (~25 km near the equator) resolution of the CaMa-Flood simulation is applicable to large-scale rivers [22,26], comparison between model and observations might be difficult in smaller and steeper rivers. A higher-resolution river model is currently being developed to make full use of satellite altimetry. Therefore, we selected CaMa-Flood as the hydrodynamic core of our data assimilation framework.
We used the runoff output from the Minimal Advanced Treatments of Surface Interaction and Runoff (MATSIRO) [29] LSM as the input runoff forcing for CaMa-Flood. Previous assessments showed that river hydrodynamics were reasonably well represented by the combination of CaMa-Flood and MATSIRO runoff forcing [22,26,27], supporting the use of CaMa-Flood simulations as a 'virtual truth' for the data assimilation framework. For the true simulation, the runoff from MATSIRO [30] was used directly, whereas in the corrupted and assimilated simulations, the runoff forcing was intentionally modified to represent uncertainty in runoff data. We conducted the experiment over 1 year using runoff forcing from 2008. The initial conditions of the true simulation were determined from 2007 true runoff data.
We added an artificial bias to the true runoff forcing to create the corrupted runoff, following previous SWOT assimilation experiments [10]. Thus, corrupted runoff values were generated through addition of a −25% bias to the true runoff forcing. In general, river discharge and WSE in the corrupted simulation are 25% smaller than those of the true simulation due to this bias in the runoff forcing. The initial conditions of the corrupted and assimilated simulations were generated using corrupted runoff data for 2007.
In this study, the ensemble of model simulations was represented using multiple runoff forcing conditions. We used 20 ensemble members in this study, although errors in Monte Carlo sampling decrease with increasing ensemble size [31]. Ensemble size strongly affects the computational cost of data assimilation, as the CaMa-Flood model has a higher computational burden than the data assimilation algorithm. We prepared 20 different runoff forcing conditions by adding a random Gaussian noise variable to the corrupted runoff (Figure 2), to simulate 20 different water state forecasts in the assimilated simulation. The standard deviation of the Gaussian noise was set to 25% of the monthly mean runoff value.
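The runoff perturbation described above can be sketched as follows. This is a minimal illustration rather than the study's code: the function name `make_runoff_ensemble`, the array layout, and the clipping of negative runoff to zero are assumptions, as is the choice of which monthly mean (true or corrupted) scales the noise.

```python
import numpy as np

def make_runoff_ensemble(true_runoff, monthly_mean, n_members=20,
                         bias=-0.25, noise_frac=0.25, seed=0):
    """Build perturbed runoff fields of shape (n_members, *true_runoff.shape).

    true_runoff  : runoff field(s) from the true simulation
    monthly_mean : monthly mean runoff, same shape as true_runoff
    bias         : relative bias applied to create the corrupted runoff
    noise_frac   : noise standard deviation as a fraction of monthly_mean
    """
    rng = np.random.default_rng(seed)
    # Corrupted runoff: true runoff with a -25% bias
    corrupted = (1.0 + bias) * true_runoff
    # One independent Gaussian noise field per ensemble member
    noise = rng.normal(0.0, noise_frac * monthly_mean,
                       size=(n_members,) + true_runoff.shape)
    # Assumption: negative runoff is not physical, so clip at zero
    return np.clip(corrupted + noise, 0.0, None)
```

Each of the 20 members would then force its own CaMa-Flood run to produce one water state forecast.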
Furthermore, artificially corrupted Manning's coefficients were used for the corrupted and assimilated simulations, representing errors in model parameters or formulation. In the true simulation, Manning's coefficient was set to the original CaMa-Flood values (0.03 for river channel flow and 0.1 for floodplains). Meanwhile, for the corrupted and assimilated simulations, Manning's coefficient (river channel flow) was determined by multiplying the original Manning's coefficient by a Gaussian noise term with unit mean and 25% standard deviation. Hence, Manning's coefficient for the river channel is approximately normally distributed around 0.03, with a one-standard-deviation range of 0.0225 to 0.0375. Manning's coefficients for natural streams vary between 0.02 and 0.05 according to Chow [32], Barnes [33], and Akan [34]. In addition, the model error should not be very large; otherwise, the assimilation would always treat the observations as much more accurate than the model. Therefore, we set Manning's coefficient in the assimilated and corrupted simulations to be normally distributed with a standard deviation of 25% of the original value.

Data Assimilation Strategy
A data assimilation scheme is typically used to estimate time-varying model state variables, e.g., hydraulic model states such as discharge or water depth. In this study, we used LETKF [14,15], which is a variation of the Ensemble Kalman Filter (EnKF) [31], an advanced Kalman filter (KF) [35], to simultaneously assimilate WSE from SWOT observations. The computational cost of using an EnKF at the global scale can be reduced with LETKF, which enables global-scale data assimilation.
Our implementation of the data assimilation strategy involves: (1) propagation of the model state variables through time with the CaMa-Flood model, and (2) updating the state variables based on SWOT observations using LETKF. The LETKF analysis equation for the update step is:
$$X^a = X^f + E^f \left[ V D^{-1} V^T (H E^f)^T (R/w)^{-1} (Y^o - H X^f) + \sqrt{m-1}\, V D^{-1/2} V^T \right]$$
where $X^a$ is the posterior state estimator (or assimilator); $X^f$ is the prior state estimator (or forecast); $Y^o$ is the observation (here, WSE); $H$ is the observation operator, which is linearly related to the observation and the state; $m$ is the number of ensemble members; $E^f$ is the prior ensemble perturbation matrix, which represents the prior state error covariance and is obtained directly from the ensemble; $R$ is the observation error covariance, determined from the uncertainty of the measurements; $w$ is the weighting term for the observation localization [15]; and $V D V^T$ is given by:
$$V D V^T = (m-1) I + (H E^f)^T (R/w)^{-1} H E^f$$
where $I$ is the unit matrix of dimension $m$. $V D^{-1} V^T$ and $V D^{-1/2} V^T$ can be calculated from the eigenvalue decomposition of $V D V^T$.
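As an illustration of the update step, the following NumPy sketch applies a standard LETKF analysis to a single local patch. It assumes a diagonal observation error covariance and a linear observation operator; the function name `letkf_update` and its argument layout are hypothetical and are not taken from the study's implementation.

```python
import numpy as np

def letkf_update(Xf, yo, H, r_var, w):
    """Standard LETKF analysis for one local patch (illustrative sketch).

    Xf    : (n, m) forecast ensemble of the local state (m members)
    yo    : (p,)   observations inside the local patch
    H     : (p, n) linear observation operator
    r_var : (p,)   observation error variances (diagonal of R)
    w     : (p,)   localization weights in (0, 1]
    Returns the (n, m) analysis ensemble.
    """
    n, m = Xf.shape
    xf_mean = Xf.mean(axis=1, keepdims=True)
    Ef = Xf - xf_mean                    # forecast perturbations
    HEf = H @ Ef                         # perturbations in observation space
    rinv = w / r_var                     # diagonal of (R / w)^-1
    # V D V^T = (m - 1) I + (H Ef)^T (R / w)^-1 (H Ef)
    A = (m - 1) * np.eye(m) + HEf.T @ (rinv[:, None] * HEf)
    d, V = np.linalg.eigh(A)             # A is symmetric positive definite
    A_inv = V @ np.diag(1.0 / d) @ V.T   # V D^-1 V^T
    A_isqrt = V @ np.diag(d ** -0.5) @ V.T   # V D^-1/2 V^T
    innovation = yo - (H @ xf_mean).ravel()
    w_mean = A_inv @ HEf.T @ (rinv * innovation)   # weights for the mean update
    W = w_mean[:, None] + np.sqrt(m - 1) * A_isqrt  # add perturbation transform
    return xf_mean + Ef @ W
```

Because each target pixel uses only the observations in its own patch, a call like this can be made independently for every river pixel, which is what makes the scheme easy to parallelize.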

Generation of Synthetic SWOT Observations
We generated synthetic SWOT observations at the end of each daily time step using the WSE from the true simulation (Figure 3, left). Generation of synthetic SWOT observations followed three steps: (1) obtaining WSE from the true simulation, (2) delineating SWOT observations using the SWOT coverage mask, and (3) adding observation error (following the basic steps presented in Figure 3). The true simulation was carried out as described in Section 3.1. The SWOT coverage mask was created using SWOT orbit data, which are available online from the website of the National Centre for Space Studies, France [25]. Orbit data provide the path of the 120 km wide observation swath containing a 20 km nadir gap for each day of the 21-day orbit cycle. We converted these path data into a 0.25° observation coverage mask with the same grid coordinate system used by CaMa-Flood (Figure 3: upper middle). If the center point of a 0.25° grid was within the observation coverage of the path data, the grid was considered within the coverage mask. Because the observation area differed daily within the orbit cycle, we prepared 21 coverage masks to generate synthetic SWOT observations for grids containing rivers wider than 50 m within the coverage mask. Moreover, we assumed Gaussian random error with zero mean and standard deviation of 5 cm (Figure 3, lower right), following previous studies [2,10,11]. SWOT data should have error of less than 10 cm for areas greater than 1 km² [11,23]; as the error decreases exponentially with increasing averaging area [10], our assumption about observation error appears valid. The CaMa-Flood grid resolution of 0.25° is about 25 × 25 km near the equator.
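The three steps above (true WSE, coverage mask, observation error) can be sketched as below. The function name `synthetic_swot_obs`, the array shapes, and the use of NaN for unobserved grids are assumptions made for illustration only.

```python
import numpy as np

def synthetic_swot_obs(true_wse, coverage_masks, river_width, day_index,
                       sigma_obs=0.05, min_width=50.0, seed=None):
    """Create one day of synthetic SWOT WSE observations.

    true_wse       : (ny, nx) WSE from the true simulation [m]
    coverage_masks : (21, ny, nx) boolean masks, one per day of the orbit cycle
    river_width    : (ny, nx) river width [m]
    day_index      : day counter since the start of the experiment
    sigma_obs      : standard deviation of the observation error [m]
    Returns (obs, observed): obs is NaN where no observation exists.
    """
    rng = np.random.default_rng(seed)
    # Step 2: pick the mask for this day of the 21-day repeat cycle
    mask = coverage_masks[day_index % len(coverage_masks)]
    # Only grids with rivers wider than 50 m are treated as observable
    observed = mask & (river_width > min_width)
    # Step 3: add zero-mean Gaussian error to the true WSE
    obs = np.full(true_wse.shape, np.nan)
    obs[observed] = true_wse[observed] + rng.normal(
        0.0, sigma_obs, size=int(observed.sum()))
    return obs, observed
```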

Empirical Determination of the Local Patch
Here, we derive adaptive local patch shapes and sizes to filter out observations affected by error covariance as much as possible; such filtering cannot be achieved using the conventional local patch. Empirical local patches were derived from CaMa-Flood-modeled WSE for each river pixel separately. First, CaMa-Flood-modeled WSE for 1980 to 2000 was converted into spatial dependency weights. This spatial dependency weighting was derived from the auto-correlation length, which was obtained from semi-variogram analysis. Calculating spatial dependency weights involved four steps: (1) removing trends, (2) removing seasonality, (3) standardizing, and (4) finding auto-correlation lengths. Then, we derived the empirical local patches by defining the spatial dependency weight threshold for each river pixel separately.
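The first three preprocessing steps, plus an empirical semivariance between a target pixel and every other pixel, can be sketched as follows. This is an assumption-laden illustration: the linear detrending, the day-of-cycle climatology, and the function names `standardize_wse` and `semivariance_to_target` are hypothetical, and the conversion from semivariance to auto-correlation length and spatial dependency weight is not shown because its exact form is not given here.

```python
import numpy as np

def standardize_wse(wse, period=365):
    """Detrend, deseasonalize and standardize WSE series.

    wse    : (n_time, n_pixels) modeled WSE, one column per river pixel
    period : length of the seasonal cycle in time steps
    """
    wse = np.asarray(wse, dtype=float)
    t = np.arange(wse.shape[0])
    # (1) remove a linear trend fitted separately for each pixel
    slope, intercept = np.polyfit(t, wse, deg=1)
    detrended = wse - (np.outer(t, slope) + intercept)
    # (2) remove seasonality: subtract the mean of each phase of the cycle
    phase = t % period
    season = np.zeros_like(detrended)
    for p in range(period):
        sel = phase == p
        if sel.any():
            season[sel] = detrended[sel].mean(axis=0)
    anomalies = detrended - season
    # (3) standardize to zero mean and unit standard deviation
    std = anomalies.std(axis=0)
    std = np.where(std > 0, std, 1.0)   # guard against constant series
    return (anomalies - anomalies.mean(axis=0)) / std

def semivariance_to_target(z, target):
    """Empirical semivariance between the target pixel and every pixel."""
    diff = z - z[:, [target]]
    return 0.5 * np.mean(diff ** 2, axis=0)
```

For standardized series, the semivariance with respect to the target equals one minus the correlation, so low values indicate pixels that vary together with the target.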

Experimental Conditions
We performed three different OSSEs, referred to as the "Empirical", "Zero" and "Fixed" local patch experiments, to examine the efficiency of the assimilation scheme. We examined the potential of the empirical local patch to assimilate distant observations without being affected by the error covariance due to the limited ensemble size. In the empirical patch experiment, empirically derived local patches were used for assimilation. A conventional fixed square-shaped local patch was used in the fixed local patch experiments. We assimilated observations only in the target pixel for the Zero local patch experiment. The details of these experiments are explained in Sections 4.1, 4.2 and 4.3 for the empirical, zero and fixed local patch experiments, respectively. The experimental conditions are summarized in Table 1.


Empirical Local Patch Experiment
Here, we developed physically-based local patches using CaMa-Flood-modeled WSE data for 1980 to 2000. We did not consider the year 2008 (Section 3.2) for local patch derivation. A spatial dependency weight was calculated using CaMa-Flood-modeled WSE transformed into a distribution similar to a normal distribution. The spatial dependency weighting function was derived based on the auto-correlation length, which was obtained from semi-variogram analysis. Then, we derived the local patches by defining the threshold of the spatial dependency weights (Figure 4). Hereafter, we refer to these local patches derived from CaMa-Flood-modeled WSE as "Empirical" local patches. Next, we carried out an OSSE with the localization parameters determined from semi-variogram analysis (hereafter, the "Empirical" local patch OSSE).
We selected 0.6 as the threshold for deriving the local patch for each target pixel using the spatial dependency weight. We examined the sensitivity of the weighting threshold for the local patch by comparing the root mean square error (RMSE) of the assimilated WSE. We found that the threshold of 0.6 performs well with 20 ensemble members and note that stricter threshold values decrease the observation frequency. Therefore, a weighting threshold value of 0.6 was selected for our data assimilation scheme. Furthermore, we used 5 pixels along each river stem as the baseline for the Empirical local patch.
We derived the observation localization weighting factor to inflate the observation error of distant observations in the LETKF algorithm. In this study, we used a Gaussian function to calculate the localization weight [15]:
$$w(r) = \exp\left(-\frac{r^2}{2\sigma^2}\right)$$
where $\sigma$ is the localization length and $r$ is the distance between the target pixel and an observation.
Here, we use a lag distance corresponding to the threshold of the spatial dependency weight (used to define the local patch, which is 0.6 in this study) as the limiting value; the weight drops to zero beyond that lag distance, following a fifth-order piecewise rational function [36].Therefore, we calculated the localization parameter σ as follows: Water 2019, 11, 829 where a is the lag distance, corresponding to the threshold used to define the local patch.We did not assimilate observations beyond the lag distance a from the target pixel.
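As a hedged illustration of how a localization weight of this kind can be evaluated, the sketch below assumes the commonly used Gaussian form exp(-r²/(2σ²)) and the common LETKF convention that the weight is cut off at 2·sqrt(10/3)·σ, the distance at which the matching fifth-order piecewise rational function reaches zero. Both the functional form and the relation between σ and a are assumptions made for this example; the function names and the 100 km lag distance are hypothetical.

```python
import math

# Assumed convention: a Gaussian of length sigma is truncated at
# 2 * sqrt(10/3) * sigma, so sigma follows from the limiting distance a.
CUTOFF_FACTOR = 2.0 * math.sqrt(10.0 / 3.0)


def localization_length(a):
    """Localization length sigma for a limiting lag distance a (assumed relation)."""
    return a / CUTOFF_FACTOR


def localization_weight(r, a):
    """Gaussian weight for an observation at distance r; zero beyond a."""
    if r > a:
        return 0.0  # observations beyond the lag distance are not assimilated
    sigma = localization_length(a)
    return math.exp(-(r * r) / (2.0 * sigma * sigma))


# Hypothetical lag distance of 100 km for one target pixel.
a_km = 100.0
for r_km in (0.0, 25.0, 50.0, 100.0, 150.0):
    print(r_km, localization_weight(r_km, a_km))
```

Because a differs from pixel to pixel, σ is also pixel-specific under this assumption, which is what makes the localization adaptive.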

Zero Local Patch Experiment
In the Zero local patch experiment, a river pixel was assimilated only when a direct observation was available at that pixel, and only the WSE of the observed pixel was updated. Thus, we did not apply any observation localization technique here, as no distant observations were assimilated.

Evaluation Method
We defined the assimilation index (AI) [37] to evaluate the effectiveness of data assimilation in a virtual experiment. AI is calculated from the ratio of river discharge error rates between the assimilated and corrupted simulations. AI describes the similarity between the assimilated and true simulations compared to the corrupted simulation. High AI (near the maximum value of 1) indicates that the assimilated discharge is closer to the true discharge than to the corrupted discharge, while low AI indicates that the assimilated discharge did not have improved accuracy relative to the corrupted discharge. AI is particularly useful for evaluating the effectiveness of data assimilation, as river discharge in the corrupted simulation is generally 25% lower than that in the true simulation for most places and times. AI is a metric representing the relative effectiveness of data assimilation and not a measure of simulation accuracy, such as the Nash-Sutcliffe (NS) coefficient [38]. In addition, AI can be calculated for any time and location during the experiment, enabling analysis of when and where the data assimilation framework effectively estimated river discharge. We excluded days when the true and corrupted discharge values were similar (<10% error) when calculating the annual mean AI.
Furthermore, we compared the fixed and empirical local patch experiments using normalized root mean square error (NRMSE). NRMSE is calculated as the RMSE of discharge with respect to the true discharge, normalized by the true discharge.
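The two metrics can be sketched as follows. The AI expression used here, 1 - |(assimilated - true) / (corrupted - true)|, is an assumed form chosen to match the description above (a ratio of discharge errors with a maximum of 1). Measuring the 10% similarity test relative to the true discharge and normalizing the RMSE by the mean true discharge are likewise assumptions, and all function names and discharge values are hypothetical.

```python
import math


def assimilation_index(q_assim, q_corrupt, q_true):
    """Assumed AI form: 1 minus the ratio of assimilated to corrupted error."""
    return 1.0 - abs((q_assim - q_true) / (q_corrupt - q_true))


def annual_mean_ai(assim, corrupt, true, min_rel_error=0.10):
    """Mean AI over days where the corrupted error exceeds min_rel_error."""
    values = []
    for qa, qc, qt in zip(assim, corrupt, true):
        if abs(qc - qt) / qt < min_rel_error:
            continue  # true and corrupted discharge are too similar; skip day
        values.append(assimilation_index(qa, qc, qt))
    return sum(values) / len(values) if values else float("nan")


def nrmse(assim, true):
    """RMSE against the true discharge, normalized by the mean true discharge."""
    n = len(true)
    rmse = math.sqrt(sum((qa - qt) ** 2 for qa, qt in zip(assim, true)) / n)
    return rmse / (sum(true) / n)


# Hypothetical daily discharge values in m3/s.
q_true = [1000.0, 1200.0, 800.0]
q_corrupt = [750.0, 900.0, 780.0]   # third day is within 10% of the truth
q_assim = [950.0, 1170.0, 790.0]
print(annual_mean_ai(q_assim, q_corrupt, q_true))
print(nrmse(q_assim, q_true))
```

Skipping days with a small corrupted error avoids dividing by a near-zero denominator, which would otherwise depress AI even when the assimilated discharge is accurate.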

Results and Discussion
This section outlines the results derived from the Empirical, Zero, Fixed-Small, and Fixed-Large local patch OSSEs. We demonstrate the effectiveness of hydrological data assimilation using the Empirical local patch in Section 5.1. A comparison of the Empirical local patch assimilation scheme with the other local patch experiments is provided in Section 5.2. Furthermore, we present the details of the assimilation and the computational efficiency of our method in Sections 5.3 and 5.4, respectively.

Empirical Local Patch Experiment
In this section, we discuss the results of the data assimilation scheme developed using empirical local patches and evaluate the potential of a future SWOT mission to estimate river discharge in a situation with 25% negatively biased runoff. The relative effect for the Congo River was determined using AI, considering locations C1-C6.
Hydrographs at locations C1-C6 for the simulation period (366 days) are shown in Figure 6a-f. Red, blue, and black lines indicate the assimilated, corrupted, and true discharge values, respectively. The green line represents AI. When the true and corrupted discharge values are very similar (within 10%), AI is shown as a yellow line. Green and yellow dots on the AI curve represent days with direct observations for the target pixel. Mean AI and percentage bias (pBias) are provided in the upper left corner of the graph for each location, C1-C6.
Figure 6a shows the hydrograph for C1, which is the most upstream location studied. The assimilated (red line) and true (black line) discharge values are similar for most of the simulation period, except in January, the end of June, and the middle of October. Figure 6b presents the time series of discharge at location C2, immediately downstream of the confluence of the Lualaba (the main tributary of the Congo River) and Lindi (a small tributary). The assimilated (red line) and true (black line) discharge values are generally similar, but some low AI values were observed in the beginning of January, the end of May, and the beginning of December. Figure 6c shows the hydrograph of location C3, located on the Oubangui tributary. Low AI values were observed in January, the end of April, and the end of May. Figure 6d shows the time series of discharge at location C4, on the Kasai tributary. AI remained high (>0.8) for most of the simulation period, but was low at the beginning of January, the beginning of August, and the beginning of November. The downstream locations C5 and C6 are presented in Figure 6e and 6f, respectively. The assimilated discharge (red lines) is almost identical to the true discharge (black lines), except in the beginning of January.
Figure 7 shows the spatial distribution of annual mean AI for the Congo Basin in the Empirical local patch experiment, where the annual mean discharge was greater than 500 m³/s. The annual mean AI was computed for each grid cell to compare the effectiveness of data assimilation spatially. Annual mean AI is nearly 1 for most large tributaries (Figure 7), indicating good assimilation. We excluded days from the calculation of annual mean AI if the corrupted discharge was within 10% of the true discharge (e.g., yellow lines in Figure 6). The main stem and large tributaries (Oubangui, Kasai, and Lualaba) of the Congo River have high mean AI values (>0.8), and other tributaries, such as the Lulonga, Sangha, Lomami, Lindi, Kotto, and Uele, also exhibit relatively high mean AI (>0.6). Only the most upstream river pixels show very low AI (<0.3). Although most upstream sites have low efficiency of assimilation, the majority of the Congo River appears to be reasonably well estimated, with river discharge estimates close to the true values.
To our knowledge, this is the first attempt to use physically based localization parameters for hydrological data assimilation with LETKF. Although conversion from the initial corrupted state to the well-assimilated state could take 10-15 days, the peaks, troughs, and time to peak discharge of the respective hydrographs were recreated well. AI was unreasonably depressed when the true and corrupted discharge values were similar, despite the effectiveness of assimilation. AI was improved, even for upstream sites and places where direct observations are unavailable, i.e., locations where river width < 50 m. Thus, local patches enable better estimation of discharge, even without direct observations.

Empirical Local Patches
Examples of local patches for target pixels C1-C6 are shown in Figure 8a-f. The upstream target pixels (Figure 8a-d) have small local patches compared to the downstream target pixels (Figure 8e,f). We derived local patches for each river pixel, which have unique shapes and sizes based on the river's hydrodynamics. Using these empirical local patches, we were able to expand the number of observations while filtering the observations based on error covariance. Most of these local patches consist of a portion of the downstream river stem and a major contributing upstream river stem, which is equivalent to the area considered in small-scale hydrologic data assimilation studies (e.g., Andreadis et al. [10]; Biancamaria et al. [5]; Yoon et al. [2]; Munier et al. [20]). Hence, with the adaptive empirical local patch technique, our data assimilation scheme was able to use the maximum number of observations while minimizing errors due to error covariance caused by the limited ensemble spread.
The size of each empirical local patch is determined from the river hydrodynamics, and therefore empirical local patches derived for different river pixels had different sizes (Figure 9a). In Figure 9, river pixels are classed as large, medium, or small by watershed area, with 5000 km² separating medium from small river pixels. Large river stems show a right-skewed distribution of empirical local patch sizes, with a mean of 80 pixels. Medium and small rivers show relatively small spread, with mean patch sizes of 13 and 5 pixels, respectively. Therefore, small rivers have relatively small local patches (in terms of area), while large rivers have large local patches.

Observation Frequency
Figure 10 shows the number of observations available for assimilation within the derived empirical local patch (indicated by light blue bars) for locations C1-C6. Days with direct SWOT observations are indicated by red stars. Here, we assume that SWOT observations have a similar spatial resolution to the CaMa-Flood model, which is 0.25° for our virtual experiments. With the empirical local patch, the number of observations per SWOT cycle increased to 72, 180, 144, 34, 291, and 277 for C1-C6, respectively. For example, location C4, which has only one direct SWOT observation per SWOT cycle, gained access to 34 observations per cycle. Hence, through derivation of an empirical local patch, the observation frequency increases by a large margin compared to the use of direct observations. We note that the observation frequency is extended using the local patch, enabling the WSE to be updated without direct observations (see Figure 10). Upstream sites and small tributaries that may not be directly observable with SWOT gain the greatest benefit from the extension of the local patch.

Physically Based Observation Localization
Spatial dependency weights follow the hydrodynamics of the river and can be used successfully to derive the covariance among adjacent river pixels. Spatial dependency weights represent features such as sudden topographic changes (i.e., waterfalls; Figure 11b), connecting tributaries carrying large discharge (Figure 11c), and WSE slope changes (i.e., from mild to steep; Figure 11d). Most previous studies have used fifth-order piecewise rational weighting functions to replicate localization errors with a constant localization parameter [5,20,21], which may not represent the actual flow behavior of the river. In addition, some NWP studies have noted the importance of flow-dependent localization in their data assimilation schemes [17,39-41]. We derived observation localization parameters adaptively for each river pixel using a lag distance corresponding to the threshold (0.6 for deriving the empirical local patch) and Equation (4). Thus, we derived observation localization weights that better represent the hydrodynamics of the river.

Comparison among OSSEs
Figure 12a-d shows the hydrographs for the Empirical, Zero, Fixed-Small, and Fixed-Large experiments, respectively, for the GRDC location at Kinshasa on the main stem of the Congo River (see Figure 1). Annual mean AI indicates that the Empirical local patch assimilation outperformed the other assimilation methods. Ensemble spread is lowest in the Empirical local patch experiment and increases in the order of the Zero, Fixed-Small, and Fixed-Large OSSEs. Thus, at Kinshasa, the Empirical local patch scheme is the best of the four schemes in terms of both mean AI and ensemble spread.
We found that the lower assimilation efficiency of the Zero local patch experiment, relative to the Empirical local patch experiment, was caused by its low observation frequency. As noted above, the observation frequency increases with the use of a larger local patch. However, the annual mean AI values of both fixed local patch experiments are smaller than that of the Zero local patch OSSE. This discrepancy is primarily due to two factors: (1) the uneven distribution of observations over the SWOT cycle (Figure 13) and (2) the low spatial dependency on the target pixel of observations in small tributaries, which the empirical local patch did not include (Figure 14). The empirical local patch contains more than 20 observations for each day of the 21-day SWOT cycle (Figure 14a), whereas for the fixed local patches (Fixed-Small and Fixed-Large), observations are available only on days 3, 4, 9, 13, 19, and 20 (Figure 14b,c). Furthermore, spatial dependency along the main stem of the Congo River appears as far away as Kisangani (see Figure 1). In the fixed local patches, such spatial dependency was not considered for assimilation. Thus, the empirical local patch for Kinshasa on the Congo River performs better than the fixed local patches.
In Figure 15a-c, we present the difference in annual mean AI between the Empirical local patch experiment and the Zero, Fixed-Small, and Fixed-Large local patch experiments, respectively. For most places on the Congo River, the annual mean AI of the Zero local patch experiment was approximately 0.1 lower than that obtained from the Empirical local patch experiment. The Fixed-Small and Fixed-Large OSSEs show a similar pattern, where large streams have lower annual mean AI than in the Empirical local patch experiment (Fixed-Small: 0.1-0.3 lower; Fixed-Large: 0.3-0.5 lower). Meanwhile, the upstream sites and smaller tributaries show slightly elevated AI in the fixed local patch experiments. Thus, assimilation with the empirical local patch is better than that with the Zero local patch for most of the Congo River and better than that with the Fixed-Small and Fixed-Large local patches in downstream reaches of large river stems.
We carried out a very large local patch OSSE, using 81 × 81 pixels with 20 ensemble members, in addition to the Zero, Fixed-Large, Fixed-Small, and Empirical local patch experiments. We found that WSE assimilation leads to large errors caused by error covariance between small tributaries due to the limited ensemble size [15]. Several researchers have reported spurious errors caused by error covariance due to limited ensemble size when assimilating distant observations in NWP [15,18,39]. Our adaptive empirical local patches can effectively filter observations with error covariance and extend the local patch to use the maximum possible number of observations.
A conventional fixed local patch is not effective at using information from distant observations due to its fixed shape and size. When extending the size of a fixed local patch to capture significantly correlated areas (such as those included in the empirical local patch; Figure 14a), other areas with non-significant correlations are also included. Therefore, error covariance between small tributaries causes spurious correlations. Stricter localization (Fixed-Small local patch) reduces errors due to areas with non-significant correlations but disregards flow-dependent areas with significant correlations [15]. Consequently, the conventional fixed local patch technique is less effective for using available observations while removing error covariance.
Local patch size can be adjusted according to observation error and ensemble size in the assimilation. In this study, we assumed that observation error could be represented as spatially uncorrelated, following a Gaussian distribution with a mean of zero and standard deviation of about 5 cm, in accordance with previous works [2,10,12]. Moreover, we limited the ensemble size to 20 members. Depending on the observation error and ensemble size, the threshold for identifying the local patch based on the spatial dependency weight can be adjusted; importantly, this will increase the availability of observations.
In the first 2 months (January and February), the lowest NRMSE was observed for the empirical local patch (0.031; Figure 16). This finding indicates that conversion from the initial-corrupted state to the well-assimilated state is efficient in the empirical local patch assimilation scheme. Thus, the empirical local patch OSSE performed well in terms of estimating discharge and efficiently converting the initial-corrupted state to the well-assimilated state.

Assimilation Efficiency
When comparing NRMSE among experiments (Figure 16), the NRMSE values of all experiments other than the Empirical local patch experiment were similar, while the Empirical local patch experiment shows a markedly lower NRMSE. In all experiments aside from the Empirical local patch experiment, the localization weight was calculated along the river, rather than from the spatial distance, using Hubeny's formula. Therefore, weights are qualitatively similar for the same location. NRMSE is calculated using the ensemble mean discharge and does not reflect the ensemble spread of the assimilated discharge. Mean NRMSE is lower for the Empirical local patch OSSE, largely because of the efficient transition from the initial-corrupted state to the well-assimilated state. Figure 16 shows that the NRMSE of the Empirical local patch experiment is lowest in January and February. When sufficient observations were available, NRMSE converged to a similar value in the Zero, Fixed-Small, and Fixed-Large experiments. The latter part of the simulation benefitted from the increased number of direct observations and propagation of the inflow correction from upstream areas. Large NRMSE values (January-March) in the Fixed-Large local patch experiment may be the result of spurious error covariance due to sampling errors caused by the limited ensemble size when assimilating distant observations [2,5,15,20]. Thus, the Fixed-Small local patch can be used as a simplified alternative to Empirical local patches, but it is essential that localization be performed along the river.

Computational Efficiency
As this work represents a major step toward building a global-scale hydrodynamic data assimilation scheme, we used an LETKF-based assimilation technique. In global-scale assimilation schemes, LETKF has been favored among the available assimilation techniques [19,42]. Furthermore, because of localization, each pixel can be assimilated in parallel in the LETKF, which completes the update step in the space spanned by the ensemble members, and therefore the computational cost of LETKF is considerably reduced [42,43]. Our LETKF assimilation for the empirical local patch required less than 20 s per day for the Congo basin (100 × 100 pixels) on a 12-core 2.6-GHz processor using Intel Fortran with the Math Kernel Library (MKL); thus, this method is sufficiently fast for real-time application (for the entire globe, it can be executed in less than 2 min per day of assimilation).
However, assimilating WSE while considering distinct localization parameters for each river pixel requires an extensive data input/output (I/O) process, which must be addressed before this method can be used in near real-time applications. By characterizing the localization parameters (according to elevation, slope of the river bathymetry, etc.), the computational burden of this data I/O process can be reduced greatly.

Conclusions
In this study, we carried out four distinct OSSEs, i.e., Empirical, Zero, Fixed-Small, and Fixed-Large experiments, to evaluate the potential of physically-based localization parameters for use in hydrological data assimilation using LETKF.We conducted semi-variogram analysis to determine the spatial dependency weights and derived local patches by defining a threshold in these weights.A fixed number of grids were used for each of the Zero (1 × 1), Fixed-Small (11 × 11), and Fixed-Large (21 × 21) local patch experiments.Then, we compared the four OSSEs with synthetic SWOT observations for 2008 (366 days) and found that the Empirical local patch experiment estimated river discharge more efficiently than the fixed local patch assimilation methods.
The empirical local patches were derived adaptively for each river pixel, with consideration of spatial auto-correlation. By using empirical local patches, we were able to use the maximum number of observations for assimilation without introducing spurious error covariance caused by the limited ensemble size. Conventional local patches cannot filter observations based on error covariance, which leads to spurious errors from small tributaries. The empirical local patch technique allows the use of distant observations, which cannot be used effectively with the conventional local patch method. Therefore, the limitations of conventional patches can be overcome using empirical local patches.
The Empirical local patch OSSE results suggested that SWOT observations have the potential to improve continental-scale river discharge with the use of physically based spatial dependency parameters. Overall, assimilation was effective for the entire Congo Basin, with high AI values even in upstream river sections where direct observations are unavailable (river width < 50 m). The hydrodynamics of continental-scale rivers can be reasonably estimated by assimilating SWOT observations using an empirical local patch, even when the model formulation and input runoff forcing contain errors. Hence, our study provides a useful technique for improving observation frequency by enlarging the local patch in an effective manner and performing data assimilation at the continental scale with a low computational burden.
In our comparison of the fixed and empirical local patch experiments, we note that the latter OSSE has a lower mean NRMSE over the entire simulation. The Fixed-Large local patch experiment is most strongly affected by sampling errors due to limited ensemble spread when assimilating distant observations. The NRMSE of the initial months (January-February) suggested notable differences between the Fixed-Small and Empirical local patch OSSEs, showing that the transition from the initial-corrupted state to the well-assimilated state is highly effective with the Empirical local patch OSSE.
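For readers who wish to reproduce this kind of comparison, a minimal sketch of an NRMSE calculation from the ensemble-mean discharge is given below. The normalization by the mean true discharge is an assumption (one common convention); the exact normalization used in this study is not restated here, and the function name `nrmse` is hypothetical.

```python
import numpy as np

def nrmse(assimilated_ens, truth, axis=-1):
    """RMSE of the ensemble-mean discharge, normalized by mean true discharge.

    assimilated_ens : (n_members, ...) assimilated discharge ensemble
    truth           : (...) true discharge with matching trailing shape
    axis            : axis of `truth` over which the error is aggregated
    """
    ens_mean = assimilated_ens.mean(axis=0)           # ensemble spread is ignored
    rmse = np.sqrt(np.mean((ens_mean - truth) ** 2, axis=axis))
    return rmse / np.mean(truth, axis=axis)           # assumed normalization
```

Because only the ensemble mean enters the calculation, two experiments with the same mean but very different ensemble spread would receive the same score.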
In this study, we used a simple error structure for both input forcing and model parameters. Future studies should test more complicated error structures, including those that may be spatially correlated. Moreover, we assumed that errors in model parameters or formulation would occur only for Manning's coefficient. However, river bathymetry represents one of the greatest sources of uncertainty in river hydrodynamic modeling. Hence, the empirical local patch can be combined with state parameter estimation techniques [31]. Future studies should focus on developing a hybrid system that combines conventional and empirical localization techniques to overcome the limitations of the present study (i.e., assimilation in upstream areas). This study is an initial step in the development of a more robust global assimilation scheme using physically based localization parameters.

Figure 1. (a) Map of the Congo River. Red dots indicate locations C1-C6 considered in this study. The locations of Kinshasa and Kisangani are represented by black dots. The tributaries Lualaba, Lindi, Lomami, Kasai, and Oubangui are shown. (b) Hydrograph at Kinshasa for 1980-1983 obtained from the Global Runoff Data Centre (GRDC) dataset.

Figure 2. General framework of the observing system simulation experiment (OSSE). The same framework was used for all experiments.

Figure 4. Schematic diagram of the delineation of the empirical local patch for the target pixel indicated by a red star. (a) Spatial dependency weights and (b) local patch. River pixels inside the local patch are shown in blue, while other river pixels are shown in gray.

Figure 5. Schematic diagram of the delineation of the conventional l × l fixed local patch for the target pixel indicated by a red star (l, number of pixels). River pixels inside the local patch are shown in blue, while other river pixels are shown in grey.
Red, blue, and black lines indicate the assimilated, corrupted, and true discharge values, respectively. The green line represents AI. When the true and corrupted discharge values are very similar (within 10%), we used a yellow line to indicate AI. Green/yellow dots on the AI curve represent days with direct observations for the target pixel. Mean AI and percentage bias (pBias) are provided in the upper left corner of the graph for each location, C1-C6.

Figure 6. Hydrographs for the year 2008. (a)-(f) represent locations C1-C6, respectively, along the Congo River in the empirical local patch experiment. True, corrupted, and assimilated discharge values are indicated by black, blue, and red lines, respectively. The thin blue and red lines show the ensembles of corrupted and assimilated discharge, respectively. The assimilation index (AI) is shown in green, and the yellow line indicates the bias of corrupted discharge relative to true discharge. Green/yellow dots represent the times of synthetic SWOT observations. The mean AI and percent bias (pBias) of the assimilated simulation are shown in the left corner of each hydrograph.

Water 2019, 11

Figure 7. Annual mean AI of empirical local patch experiments. Pixels with annual mean discharge > 100 m³/s are shown for visualization purposes.

Figure 8. Local patches for target pixels (red circles) in the Congo River (blue area), with locations C1-C6 shown in (a)-(f), respectively. Red circles indicate the target pixels. Grey denotes major tributaries of the Congo River.

Figure 9. (a) Size of the empirical local patch for the Congo basin (as area, km²). (b) Box plot of the empirical local patch sizes (number of pixels) for large (watershed area ≥ 10⁵ km²), medium (10⁵ km² ≥ watershed area > 5000 km²), and small (watershed area < 5000 km²) river pixels.

5.1.2. Observation Frequency
Figure 10 shows the number of observations available for assimilation within the derived empirical local patch (indicated by light blue bars) for locations C1-C6. Days with direct SWOT observations are indicated by red stars. Here, we assume that SWOT observations have a similar spatial resolution to the CaMa-Flood model, which is 0.25° for our virtual experiments. The number of observations at location C4, which is represented by only one direct SWOT observation per SWOT cycle, increased to 34 per cycle. The number of observations increased to 72, 180, 144, 34, 291, and 277, respectively, for C1-C6. Hence, through derivation of an empirical local patch, the observation frequency increases by a large margin compared to the use of direct observations. We note that the observation frequency is extended using the local patch, enabling the WSE to be updated without direct observations (see Figure 10). Upstream sites and small tributaries that may not be directly observable with SWOT gain the greatest benefit from the extension of the local patch.
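The counting behind these observation frequencies can be sketched as below. This is a minimal illustration, assuming the synthetic SWOT coverage is stored as a daily boolean mask on the model grid and the local patch as a boolean mask for one target pixel; the function name and array layout are hypothetical.

```python
import numpy as np

def count_patch_observations(obs_mask, patch_mask, target):
    """Daily number of observations inside a local patch.

    obs_mask   : (n_days, ny, nx) boolean, True where a pixel is observed
    patch_mask : (ny, nx) boolean local patch of the target pixel
    target     : (iy, ix) index of the target pixel
    Returns (counts, direct): observations per day inside the patch, and a
    boolean series marking days on which the target pixel itself is observed.
    """
    counts = (obs_mask & patch_mask).sum(axis=(1, 2))
    iy, ix = target
    direct = obs_mask[:, iy, ix]
    return counts, direct
```

Summing `counts` over one orbit cycle gives the number of observations usable per cycle, which can then be compared with the number of days flagged in `direct`.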

Figure 10. Number of SWOT observations in the derived local patch (light blue). Red stars indicate days for which direct SWOT observations are available. Panels (a)-(f) correspond to the local patches of locations C1-C6, respectively.

Figure 11. Variation of spatial dependency as a weighting factor (red vertical lines) along each river stem, with target pixels C1-C6 shown in (a)-(f), respectively. The upstream area has the largest correlated lengths. The horizontal axis shows distance from the target pixel, with negative values indicating the upstream direction and positive values downstream. The bathymetric profile is shown in black, average water surface elevation (WSE) is indicated in blue, and average discharge is shown in indigo. Average values are calculated for 1980-2000.

Figure 12. Hydrograph at Kinshasa for the year 2008 in the (a) empirical, (b) zero, (c) fixed-small, and (d) fixed-large local patch experiments. True, corrupted, and assimilated discharge values are indicated by black, blue, and red lines, respectively. The thin blue and red lines show the ensembles of corrupted and assimilated discharge, respectively. The AI is shown in green, and the yellow line indicates the bias of corrupted discharge relative to true discharge. Green/yellow dots represent the times of synthetic SWOT observations. The mean AI and ensemble spread (EnSpr) of the assimilated simulation are shown in the lower-right corner of each hydrograph.

Figure 13. Number of SWOT observations in the (a) Empirical, (b) Fixed-Small, and (c) Fixed-Large local patches (light blue) for Kinshasa. Red stars indicate days for which direct SWOT observations are available.

Figure 14. Local patch for Kinshasa used in the (a) empirical, (b) fixed-small, and (c) fixed-large local patch experiments (blue). Background color indicates the number of SWOT observations per cycle. The Congo River network is shown in black. Red circle indicates the target pixel.

Figure 15. Difference in the annual mean AI between the empirical and (a) zero, (b) fixed-small, and (c) fixed-large local patch experiments. Pixels with annual mean discharge > 100 m³/s are shown for visualization purposes.

Figure 16 presents a comparison of the NRMSE values of different OSSEs. The Zero, Fixed-Small, Fixed-Large, and Empirical local patch experiments have mean NRMSE values of 0.078, 0.073, 0.133, and 0.030, respectively, after rounding to the third decimal point. All experiments have lower NRMSE values than the corrupted state (blue line in Figure 16). In the first 2 months (January and February), the lowest NRMSE was observed for the empirical local patch (0.031). This finding indicates that conversion from the initial-corrupted state to the well-assimilated state is efficient in the empirical local patch assimilation scheme. Thus, the empirical local patch OSSE performed well in terms of estimating discharge and efficiently converting the initial-corrupted state to the well-assimilated state. When comparing NRMSE among experiments (Figure 16), NRMSE values from all experiments other than the empirical local patch were similar, while the empirical local patch experiment shows a markedly lower NRMSE. In all experiments aside from the empirical local patch experiment, the localization weight was calculated along the river, rather than from the spatial distance, using Hubeny's formula. Therefore, weights are qualitatively similar for the same location. NRMSE is calculated using the mean ensemble discharge, and does not reflect the ensemble spread of assimilated discharge. Mean NRMSE is lower for the Empirical local patch OSSE, largely because of

When sufficient observations were available, NRMSE converged to a similar value in the Zero, Fixed-Small, and Fixed-Large experiments. The latter part of the simulation benefitted from the increased number of direct observations and propagation of the inflow correction from upstream areas. Large errors in NRMSE (January-March) in the large local patch experiment may be the result of spurious error covariance due to sampling errors caused by the limited ensemble size for assimilating distant observations [2,5,15,20]. Thus, the Fixed-Small local patch can be used as a simplified substitute for Empirical local patches, but it is essential that localization be performed along the river.
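Where localization weights depend on spatial distance, Hubeny's formula gives an approximate distance between two latitude/longitude points on an ellipsoid. The sketch below assumes the WGS84 ellipsoid, which is not stated in the text, and the Gaussian taper that maps distance to a weight is purely an illustrative assumption; neither the function names nor the length scale come from this study.

```python
import math

WGS84_A = 6378137.0            # semi-major axis [m] (assumed ellipsoid)
WGS84_E2 = 0.00669437999014    # first eccentricity squared (WGS84)

def hubeny_distance(lat1, lon1, lat2, lon2):
    """Approximate distance [m] between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi1 - phi2
    dlam = math.radians(lon1 - lon2)
    phi_m = 0.5 * (phi1 + phi2)                       # mean latitude
    w = math.sqrt(1.0 - WGS84_E2 * math.sin(phi_m) ** 2)
    m = WGS84_A * (1.0 - WGS84_E2) / w ** 3           # meridional radius
    n = WGS84_A / w                                   # prime vertical radius
    return math.hypot(dphi * m, dlam * n * math.cos(phi_m))

def gaussian_weight(distance_m, length_scale_m):
    """Hypothetical taper: weight 1 at zero distance, decaying with distance."""
    return math.exp(-0.5 * (distance_m / length_scale_m) ** 2)
```

For localization along the river, the same taper could be applied to a distance accumulated pixel by pixel along the flow path rather than to the straight-line distance between the two pixels.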

Figure 16. Time series of normalized root mean square error (NRMSE) of assimilated discharge in the zero (magenta), fixed-small (red), fixed-large (violet), and empirical (cyan) local patch OSSEs. Blue line indicates the NRMSE of the corrupted simulation. The fixed-small and fixed-large patches were 11 × 11 and 21 × 21 pixels, respectively. The y-axis has been stretched to enhance the visibility of low NRMSE values.

Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4441/11/4/829/s1: descriptions of (1) development of empirical local patches, (2) selection of fixed local patch sizes, and (3) very large local patch assimilation; Figure S1: (a) Spatial dependency weights and (b) local patch for Kinshasa in the Congo River. River pixels inside the local patch are shown in blue and other river pixels are shown in black; Figure S2: Time series of normalized root mean square error (NRMSE) of assimilated discharge in the 7 × 7 (magenta), 11 × 11 (red), 15 × 15 (violet), 21 × 21 (cyan), and 31 × 31 (blue) local patch OSSEs; Figure S3: (a) Local patch with the number of SWOT observations (colors) and (b) time series of WSE of Kinshasa.

Table 1. Description of experimental conditions.