2.3.1. EnKF
The data assimilation method adopted here was the ensemble Kalman filter (EnKF [46,47,48]), which uses an ensemble of model trajectories that is evolved via the full nonlinear model to describe the flow-dependent forecast error distribution. The EnKF is a Gaussian method and the uncertainty about the state estimate is described via the ensemble mean and the ensemble covariance.
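If $x^{a}_{k,i}$ is the analysis at time step $k$ for the ensemble member $i$, then the forecast $x^{f}_{k+1,i}$ at time step $k+1$ generated with the model $M$ is:
$$x^{f}_{k+1,i} = M\left(x^{a}_{k,i}\right) + \eta_{k,i}, \tag{6}$$
where the random term $\eta_{k,i}$ is a realization of the model error, representing the complete effect of perturbations to forcing and model variables. Observations $y_{k}$ with random error $\epsilon_{k}$ were collected at time steps $k$ and their simulated counterparts $\hat{y}_{k,i}$ (observation predictions) were generated using the nonlinear observation operator $H$:
$$\hat{y}_{k,i} = H\left(x^{f}_{k,i}\right). \tag{7}$$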
The EnKF works in a sequential fashion whereby the ensemble of forecast states, $\{x^{f}_{k,i}\}$, is updated when new observations are available. This process produces the analysis ensemble:
$$x^{a}_{k,i} = x^{f}_{k,i} + K_{k}\left(y_{k} + \epsilon_{k,i} - \hat{y}_{k,i}\right), \tag{8}$$
where $\epsilon_{k,i}$ is the observation perturbation drawn for member $i$ and the Kalman gain matrix, $K$, is given by:
$$K = C_{xy}\left(C_{yy} + R\right)^{-1}, \tag{9}$$
with all entries evaluated at time step $k$ (we dropped the index in the equation for simplicity of notation). The matrix $C_{xy}$ is the sample error covariance (across the ensemble) between the model state forecast $x^{f}$ and the observation prediction $\hat{y}$. Similarly, $C_{yy}$ is the sample error covariance of the observation predictions. $R$ is the observation error covariance.
The ensemble size was set here to
. In the present one-dimensional setting, the model state and the assimilated data are scalar quantities; thus, all matrices in Equation (9) reduce to real scalars. The observations $y$ are given by the ASCAT backscatter $\sigma^{0}$, processed as described in Section 2.1.2. The dynamic state variable $x$ is the surface soil moisture (SSM) from the SCHEME model and the observation operator is given by the WCM. The observation prediction is generated by using, as inputs to the WCM, the SCHEME model SSM forecast $x^{f}$ and the PROBA-V LAI product.
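The scalar form of the update can be illustrated with a minimal sketch. It assumes a perturbed-observation (stochastic) update in which the gain is the sample covariance between state forecasts and observation predictions divided by the sum of the prediction variance and the observation error variance. The function and argument names are hypothetical, and the observation predictions are passed in as plain numbers rather than computed with the WCM.

```python
import math
import random


def sample_cov(a, b):
    """Unbiased sample covariance of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def enkf_update_scalar(x_forecast, y_pred, y_obs, obs_error_var, rng):
    """Stochastic EnKF update for a scalar state and a scalar observation.

    x_forecast    : forecast states, one per ensemble member
    y_pred        : observation predictions, one per ensemble member
    y_obs         : the observation
    obs_error_var : observation error variance (R in the scalar case)
    rng           : random.Random instance used to perturb the observation
    """
    c_xy = sample_cov(x_forecast, y_pred)  # state/prediction covariance
    c_yy = sample_cov(y_pred, y_pred)      # prediction variance
    gain = c_xy / (c_yy + obs_error_var)   # scalar Kalman gain
    obs_sd = math.sqrt(obs_error_var)
    analysis = []
    for x_f, y_hat in zip(x_forecast, y_pred):
        # Each member sees its own perturbed copy of the observation.
        y_perturbed = y_obs + rng.gauss(0.0, obs_sd)
        analysis.append(x_f + gain * (y_perturbed - y_hat))
    return analysis, gain
```

Note that nothing in this update keeps the analysis values inside physical bounds, so a separate bound treatment is needed.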
2.3.2. Perturbations
We worked with the stochastic version of the EnKF [46], whereby random perturbations are applied to precipitation and to the dynamic state variable (SSM) to generate the ensemble of trajectories, as well as to the assimilated observations to prevent an underestimation of the analysis error (see, e.g., [47]). An initial set of perturbations was used to generate the SSM ensemble on the day the ensemble simulation starts; then, at each daily forecast time step, new soil moisture perturbations were introduced using the method developed in [13]. Perturbations were designed to preserve the bounds of SSM by drawing them from truncated distributions. Let $x$ be an SSM value (expressed in the range 0–1) to which we added a noise term $\delta$ drawn from a truncated normal distribution $N(0, \sigma^{2})$. Since both $x$ and $x + \delta$ must belong to the interval $[0, 1]$, we have $-x \le \delta \le 1 - x$. These inequalities define the interval to which $N(0, \sigma^{2})$ has to be restricted, with standard deviation $\sigma$ drawn from the corresponding probability distribution of SSM standard deviations at time step $k$ as in [13].
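As an illustration of the bounded perturbation, the sketch below draws zero-mean normal noise and keeps only draws that leave the perturbed SSM inside the range 0–1. Rejection sampling is used here purely for simplicity and is an assumption of this sketch, not necessarily the sampling method of [13]; the function name and the `max_tries` guard are hypothetical.

```python
import random


def perturb_ssm(x, sigma, rng, max_tries=10_000):
    """Add truncated-normal noise to an SSM value x in [0, 1].

    Noise is drawn from a zero-mean normal with standard deviation sigma
    and redrawn until x + noise lies in [0, 1], which is equivalent to
    restricting the noise to the interval [-x, 1 - x].
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    for _ in range(max_tries):
        candidate = x + rng.gauss(0.0, sigma)
        if 0.0 <= candidate <= 1.0:
            return candidate
    raise RuntimeError("no admissible perturbation found")
```

Because the admissible interval is generally not symmetric around zero, the mean of the truncated noise can differ from zero, especially for SSM values close to a bound.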
In addition to perturbing the model state to mimic the model's internal variability, we also represented the uncertainty in the external forcing (process error) by perturbing the precipitation fields. Precipitation forcing uncertainties are represented by random multiplicative perturbations drawn from a truncated log-normal distribution whose standard deviation varies in time between 0.5 and 5.0 according to the precipitation value. The precipitation perturbations were computed with the same technique as for SSM above, taking into account their multiplicative nature when determining the interval in which they can vary. In this approach, we assumed an upper precipitation limit of 60 mm/day.
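A minimal sketch of such a multiplicative perturbation is given below. It assumes that the stated standard deviation refers to the multiplicative factor itself, that the untruncated factor has unit mean, and that dry days are left unperturbed; none of these details are specified in the text. Truncation is again done by rejection, and all names are hypothetical.

```python
import math
import random

P_MAX = 60.0  # upper precipitation limit in mm/day, as stated in the text


def perturb_precip(precip, sd, rng, p_max=P_MAX, max_tries=10_000):
    """Multiply precip by a log-normal factor so the result stays <= p_max.

    Before truncation the factor has mean 1 and standard deviation sd.
    Draws for which precip * factor exceeds p_max are rejected, which
    restricts the factor to the interval (0, p_max / precip].
    """
    if precip <= 0.0:
        return 0.0  # assumption: no perturbation on dry days
    # Parameters of the underlying normal that give mean 1 and std sd.
    s2 = math.log(1.0 + sd * sd)
    mu = -0.5 * s2
    s = math.sqrt(s2)
    for _ in range(max_tries):
        value = precip * rng.lognormvariate(mu, s)
        if value <= p_max:
            return value
    raise RuntimeError("no admissible perturbation found")
```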
A similar approach was adopted for ASCAT in linear scale as well to avoid unrealistic values compared to the long-term ASCAT observations (2010–2019) after applying the observation perturbations, i.e., an upper limit was set. We worked in linear scale with additive perturbations drawn from a truncated normal distribution. The standard deviation of the observation perturbation distribution is a fraction of the RMSE between the ASCAT observations and the deterministic (calibrated) WCM simulations and varies spatially between and in the linear scale across grid cells. The RMSE calculations were carried out for the period 2010–2014 for each grid cell of SCHEME and separately in each catchment.
The EnKF update is fully Gaussian and contains no physical or dynamical constraints that could prevent the analysis values from falling outside the bounds. This is a well-known issue in EnKF-like methods. Only sufficiently small values of the Kalman gain $K$ could keep the updated soil moisture states inside the bounds; for example, when the covariance $R$ is very large. However, this would imply a very small or negligible impact of DA. One pragmatic solution would be to repeat the observation sampling each time the bounds are exceeded, hoping that, after a reasonable number of iterations, the innovation term becomes sufficiently small. In our case, this approach proved to be very inefficient even after as many as
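iterations. Instead, we used a tolerance parameter $T$ to randomly choose a new updated value when $x^{a}$ exceeded its bounds. If $x^{a} < x_{\min}$ after updating, then
$$x^{a} = x_{\min} + uT, \tag{10}$$
and if $x^{a} > x_{\max}$ after updating, then
$$x^{a} = x_{\max} - uT, \tag{11}$$
where $x_{\min}$ and $x_{\max}$ denote the lower and upper SSM bounds and $u$ is a random number between 0 and 1. $T$ is a small real number used to generate a model state value near the violated bound by applying the previous rule; we set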
.
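The bound treatment can be sketched as follows, assuming that an out-of-range analysis value is replaced by a point at a random distance of at most the tolerance inside the violated bound. The default bounds of 0 and 1 follow the SSM range used in this section; the function name and the tolerance value in the test are hypothetical.

```python
import random


def enforce_bounds(x_analysis, tol, rng, x_min=0.0, x_max=1.0):
    """Replace an out-of-bounds analysis value by a random value near the bound.

    tol is the small tolerance parameter; u is uniform on [0, 1).
    Values already inside [x_min, x_max] are returned unchanged.
    """
    u = rng.random()
    if x_analysis < x_min:
        return x_min + u * tol
    if x_analysis > x_max:
        return x_max - u * tol
    return x_analysis
```

Unlike repeated resampling of the observation perturbation, this rule always terminates in one step, at the cost of concentrating corrected members in a thin layer next to the bound.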
2.3.3. Addressing Bias Issues
The EnKF is formulated under the assumption of unbiased model simulations and observations. These conditions are rarely met in real applications and one has to cope with them by attempting to estimate and remove the biases whenever possible. In the present case, biases between observed ASCAT backscatter and simulated backscatter obtained with deterministic SCHEME estimates of SSM and PROBA-V LAI were minimized by calibrating the WCM parameters. However, the use of ensemble perturbations introduces bias between ensemble backscatter simulations and observations.
More specifically, the perturbations applied to the SCHEME model forcing (precipitation), model state (SSM) and observations (backscatter) have an average of 0 if they act additively, or 1 if they act multiplicatively. Therefore, in the limit of a very large sample, the averages of the perturbed values are indistinguishable from the deterministic ones. However, the model state (SSM) is bounded and the SCHEME model is non-linear; therefore, bias generation during ensemble perturbation (needed for DA) cannot be ruled out.
The magnitude and impact of this bias were assessed by performing DA-free ensemble runs, hereafter referred to as open loop (OL) simulations. In the OL, the precipitation and SSM perturbations are active but the Kalman equation is disabled.
Figure 4 illustrates an example for the areally averaged SSM over the two catchments, Demer and Ourthe, for one year of OL simulations. The reference simulation is the deterministic SCHEME model run without perturbations or DA. Note that, due to areal averaging over the catchments, the reference simulation may sometimes appear completely outside the range of the ensemble (e.g., November 2011) even though, at several places in the catchments, the reference values are inside the ensemble range.
Visual inspection of Figure 4 shows that the perturbations introduce a negative bias in the OL SSM simulations. The bias with respect to the reference (deterministic) simulation was approximately
and
for the Demer and the Ourthe catchments, respectively.
The introduction of SSM bias via ensemble perturbation is a known problem and can be solved by recentering the ensemble around a reference deterministic simulation as in, e.g., [49]. Ideally, this approach assumes that the ensemble spread is created by adding Gaussian noise only to the model state variables, which is not the case here. We therefore opted for an alternative perturbation bias correction, as follows. We performed OL and reference simulations for the period 2010–2015 for both catchments. We then took spatial averages over the catchments to obtain the reference and the OL ensemble SSM time series. Based on the OL ensemble average and the reference time series, we constructed the cumulative distribution functions (CDFs) for SSM shown in Figure 5.
The CDFs can be used to measure the difference between the OL average and the reference soil moisture values at equal probability. This leads to the soil moisture bias estimates (red lines) displayed in Figure 6, to which we fitted a polynomial so that a bias correction can be calculated for any SSM value. In this case, the polynomial was of degree four for both catchments (blue lines in Figure 6).
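The equal-probability comparison can be sketched as a difference of empirical quantiles. The sketch assumes that the bias is defined as OL average minus reference and that it is tabulated against the OL value; both conventions, the number of probability levels and all names are assumptions. The degree-four polynomial fit is not shown; any least-squares routine could be applied to the returned pairs.

```python
def empirical_quantile(sorted_values, p):
    """Linearly interpolated quantile of pre-sorted data for 0 <= p <= 1."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def cdf_matching_bias(ol_mean_series, reference_series, n_levels=99):
    """Return (OL SSM, bias) pairs evaluated at equal probability levels.

    bias = OL quantile minus reference quantile at the same probability.
    """
    ol_sorted = sorted(ol_mean_series)
    ref_sorted = sorted(reference_series)
    pairs = []
    for j in range(1, n_levels + 1):
        p = j / (n_levels + 1)
        q_ol = empirical_quantile(ol_sorted, p)
        q_ref = empirical_quantile(ref_sorted, p)
        pairs.append((q_ol, q_ol - q_ref))
    return pairs
```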
The distributions and the bias functions in Figure 5 and Figure 6 describe the statistical properties of time series obtained by averaging simulated data over the catchments. The objective here was to apply a bias correction to the forecasted SCHEME model state (SSM) at each model time step, with the goal of minimizing the effect of the bias on the OL ensemble (e.g., the bias visible in Figure 4) and on DA performance. More specifically for the DA case, the bias-corrected SSM forecast should also enter as an input in the WCM and the Kalman equation. However, using the bias estimates of Figure 6 to correct SSM forecasts at each time step (with feedback into the model) would produce large correction values in the intermediate SSM range (e.g., between
and
for the Demer or
and
for the Ourthe in
Figure 6). This would quickly lead to water accumulation in the upper soil layer under low evaporation conditions, which prevail in autumn and winter in Belgium. Therefore, the bias correction was adapted as follows to take into account the evaporation effect.
We applied only a fraction of the bias correction term above, depending on the potential evapotranspiration (PET) level. In this approach, two bias sub-functions were defined for different intervals of PET, hence the name piece-wise bias correction (or “p-w BC” for short). More precisely, if $B$ is the SSM bias provided by the bias function (Figure 6) and $\mathrm{SSM}_{\mathrm{bc}}$ is the bias-corrected surface soil moisture, then
The PET threshold is
for Demer and
for Ourthe. The constants
and
adjusting the bias correction were calculated by minimizing the OL SSM bias relative to a deterministic simulation over the period 2010–2015. The values were
,
for the Demer, and
,
for the Ourthe (resulting in a bias of
for the Demer and
for the Ourthe catchment). In short, the correction given by Equation (12) was applied to the forecasted SSM at each model time step in both the OL and DA simulations, ensuring that a bias-corrected SSM enters as an input in the WCM and the Kalman equation.
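The idea of a PET-dependent fraction of the bias correction can be sketched as below. The functional form is an assumption made for illustration only: the bias is taken as OL minus reference, so a fraction of it is subtracted, and one constant applies at or below the PET threshold and another above it. The names and the numbers used in the test are hypothetical and are not the calibrated values for the Demer or the Ourthe.

```python
def piecewise_bias_correction(ssm, bias, pet, pet_threshold, c_low, c_high):
    """Subtract a PET-dependent fraction of the bias from an SSM forecast.

    ssm           : forecasted surface soil moisture
    bias          : bias estimate for this SSM value (OL minus reference)
    pet           : potential evapotranspiration at this time step
    pet_threshold : PET value separating the two sub-functions
    c_low, c_high : fractions applied at or below / above the threshold
    """
    fraction = c_low if pet <= pet_threshold else c_high
    return ssm - fraction * bias
```

Applying a smaller fraction under low PET limits the water added to the upper soil layer when evaporation cannot remove it, which is the motivation given for the piece-wise form.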