Data Assimilation of Satellite Soil Moisture into Rainfall-Runoff Modelling : A Complex Recipe ?

Data assimilation (DA) of satellite soil moisture (SM) observations represents a great opportunity for improving the ability of rainfall-runoff models in predicting river discharges. Many studies have been carried out so far demonstrating the possibility to reduce model prediction uncertainty by incorporating satellite SM observations. However, large discrepancies can be perceived between these studies with the result that successful DA is not only related to the quality of the satellite observations but can be significantly controlled by many methodological and morphoclimatic factors. In this article, through an experimental study carried out on the Tiber River basin in Central Italy, we explore how the catchment area, soil type, climatology, rescaling technique, observation and model error selection may affect the results of the assimilation and can be the causes of the apparent discrepancies obtained in the literature. The results show that: (i) DA of SM generally improves discharge predictions (with a mean efficiency of about 30%); (ii) unlike catchment area, the soil type and the catchment specific characteristics might have a remarkable influence on the results; (iii) simple rescaling techniques may perform equally well to more complex ones; (iv) an accurate quantification of the model error is paramount for a correct choice of the observation error and, (v) SM temporal variability has a stronger influence than the season itself. On this basis, we advise that DA of SM may be not a simple task and one should carefully test the optimality of the assimilation experiment prior to drawing any general conclusions. OPEN ACCESS Remote Sens. 2015, 7 11404


Introduction
It is well recognized that soil moisture (SM) is central to the repartition of the mass and energy fluxes between the land surface and the atmosphere, and, as a consequence, it plays a key role in the hydrological cycle.For instance, in rainfall-runoff modelling, it has been demonstrated that the wetness condition of the catchment before a flood event strongly controls the severity of the response of the catchment in terms of runoff [1][2][3][4].In other words, different initial soil moisture conditions can discriminate between catastrophic and minor effects for the same amount of rainfall.Therefore, an accurate estimate of the initial SM conditions is paramount for reducing the uncertainties in early warning flood forecasting models and may help to mitigate the flood hazard.
Hydrological models are valid solutions for obtaining accurate estimate of the SM conditions and have been used worldwide for diminishing the impact of floods on the population by issuing warnings with appropriate lead-time before floods occur.However, hydrological simulations are "imperfect" [5] in the sense that they contain uncertainties which are mainly related to (i) the quality and quantity of the hydrological data used to drive the models [6][7][8] as well as representativeness errors due to scale incompatibility [9]; (ii) the simple representation of the real physical processes leading to inevitable assumptions and simplifications and thus unavoidably imperfect approximations to the complex reality [5] and (iii) errors in parameter estimates which can result in huge errors in the model outputs [10].As a result, there is an urgent need for techniques that effectively and efficiently assimilate new sources of information into the models to produce less uncertain hydrological predictions.Due to the increased spatial-temporal resolution and good accuracy of new satellite SM products [11][12][13][14], several studies have recently performed SM data assimilation (DA) into hydrological models [15][16][17][18][19][20][21][22].
Beside the choice of the most appropriate DA technique [23] and the type of sensor, DA involves a series of methodological assumptions and choices-not directly related to these above two issues-that might significantly affect the final result.Indeed, in the scientific literature, different studies characterized by a number of conflicting conclusions can be found, thus making a general interpretation of the utility of DA of SM in rainfall-runoff modelling difficult.In particular, different factors were found to affect the DA like the particular model structure, the bias correction technique, the model and observation error quantification, the spatial mismatching between catchment scale processes and observations, the catchment topography and climatology, just to cite a few.The result is that DA of SM in rainfall-runoff modelling still presents a number of controversial issues that seem to be not yet fully understood [17,22].
As an example, the model structure (i.e., the part of the model that links the surface and the root-zone SM) may play an important role in the success of the DA experiment.Chen et al. [20], in assimilating in situ SM observations into the Soil and Water Assessment Tool (SWAT) in the Cobb Creek Watershed (341 km 2 ) in USA through Ensemble Kalman Filter (EnKF) [24] (using both synthetic and real case experiments), concluded that: "Assimilation of actual surface soil moisture data had limited success in the upper layers only and was generally unsuccessful in improving stream flow prediction mainly due to the SWAT decoupling between surface and root-zone layer that limits the ability of the EnKF to update the soil moisture states of deeper layers".Brocca et al. [17], in a small catchment in central Italy, assimilated the Advanced SCATterometer (ASCAT, [11]) SM observations into two layer continuous hydrological model obtaining a great improvement in discharge prediction particularly for the floods occurring during dry to wet transition periods.The authors explored the sensitivity of the assimilation of both surface and root zone satellite SM observations to the model structure via two different assimilation experiments: (i) assimilation of the surface SM observations in the upper layer of the hydrological model, and, (ii) assimilation of the root zone SM observations (obtained by the application of the Exponential filter, [25]) into the lower layer of the model.They showed that the second solution provides much better results because of the larger impact of the root zone SM on runoff simulation for the investigated basin and of the weak and highly non-linear linkage between the SM states of the two layers which prevent the EnKF from updating the state of the deeper layer when the surface layer is assimilated.
The accurate estimate of model and observation error along with its structure is another crucial issue in DA of SM [26][27][28].The importance of model error assumption was initially highlighted by Crow and Van Loon [26] using statistics calculated using filter innovations (i.e., differences between model forecast mapped on the observation space and observations themselves).The authors highlighted that inappropriate model error assumptions can lead to circumstances in which the assimilation of surface SM can degrade the performance of open loop simulations.De Lannoy et al. [29] proposed using two ensemble indexes statistics previously employed in weather forecast analysis for calibrating the model error parameters and for giving an estimate of the model error, while in a recent paper Alvarez-Garreton et al. [22] used a methodology that maximizes the likelihood of observing historical streamflow data given the calibrated model parameters and the model parameter errors.Despite the effort made by these studies, a clear guideline for a proper model error assessment appears to be a long way off from reaching a final solution.
The estimation of an appropriate observation error for DA techniques is also another open question [30].Different studies have tried to characterize the satellite SM error either by comparing satellite SM data with ground observations [31], or by providing an estimate of the error independently [30,32].A quantification of the error associated with the retrieval is also supplied as ancillary data by data providers (e.g., the European Organisation for the Exploitation of Meteorological Satellites, EUMETSAT, within the project "Satellite Application Facility on Support to Operational Hydrology and Water Management", H-SAF, supplies ASCAT SM data along with flags of different types and the associated uncertainties).The difficulty stems from the fact that satellite SM observations have in general higher uncertainty than ground measurements with error characterized by non-stationary characteristics [32], a much coarser spatial resolution, and are related to the top few centimetres of the soil.Additionally, they are affected by the error associated with retrieval algorithms.All these factors represent a challenge not only in the mere quantification of the error but also in the observation error spatial correlation [33], error structure [27] and compatibility between error statistics with the underlying hypothesis of the DA technique [27,28].Indeed, in many cases, SM satellite observation error is treated as an independent Gaussian noise with fixed time variance, and this leads to a number of known weaknesses which can affect the optimality of the DA experiment [28].Despite these issues, a final choice is needed in DA and this choice has been made by different methodologies like a simple preliminary calibration of the error [17], by the Triple Collocation (TC) method [21] as well as by using ancillary data provided with the SM retrieval [34].It has to be noted that these choices may lead to a different error quantification and it is still not clear how they are propagated, in terms of optimality, into the DA experiments.
Moreover, model predictions and satellite observations are characterized by systematic differences between hydrological variables and this is particularly true for SM [31,35].This bias is not desired in the framework of sequential DA techniques and should be removed prior to integrating observations into the models [36].For bias removal, several potential strategies have been adopted in the past [16,37,40,42,43] occasionally leading to contrasting results in terms of their optimality in improving discharge predictions.Indeed, it has been shown that the use of suboptimal rescaling schemes [38] can degrade the performance of the stochastic DA and that an optimal rescaling technique (RT) can be identified for DA (i.e., the TC-based RT).However, even if an optimal RT can be theoretically found, its application to reality might not be straightforward.For instance, the successful application of TC requires sufficiently large numbers of coincident data points from three independent time series and, within the analysis period, homogeneity of their linear relationships and error structures [39]-two requirements not always fulfilled.
Beside purely methodological issues, topography, spatio-temporal resolution and climatology are other important factors that may affect the results of the DA experiments.For instance, Brocca et al. [40] assimilated different soil moisture products (from remote sensing and global modelling) into rainfall-runoff modelling for five catchments located in Italy, France, Luxembourg and US using a simple nudging scheme.The authors found that the reliability differs according to the climatic region and the accuracy of satellite retrievals.In particular, no improvements were seen in mountainous regions and in regions characterized by the presence of snow.Moreover, in very dry climates, the very low SM temporal variability precluded an improvement of the rainfall-runoff simulations when SM was assimilated.Matgen et al. [19], in a tiny catchment in Luxembourg, used the Particle Filter [41] for assimilating both in-situ and satellite SM observations into the Bibmodel [42].They found that, while the assimilation of in-situ data was successful, the use of ASCAT data did not lead to significant advantages and that the assimilation of SM is more efficient for flood forecasting during the transition periods (i.e., spring and fall seasons).However, the authors did not clearly identify the reasons for the low performance of satellite data (e.g., spatial mismatch between catchment and observations, observation error estimation) with respect to the better performance of in-situ data.Alvarez-Garreton et al. [37] assimilated SM data derived from the Advanced Microwave Scanning Radiometer (AMSR-E) into simple hydrological model in a semi-arid catchment though the EnKF.They found that the assimilation improved the prediction of both moderate and major floods; however, the improvement was higher and more consistent for moderate events.Recently, Corato et al. [43] used the Particle Filter [44] for assimilating different satellite SM observations (i.e., two active and one passive products) with different spatial and temporal resolutions in the Severn catchment in UK trying to analyse under what conditions (i.e., sensors characteristics and climatology) the assimilation of SM provided a successful result.The authors found that the performance of the DA depends on the period of the year, and active and passive sensors show different "optimal" assimilation periods probably due to the effect of vegetation on the sensor accuracy.
In summary, large discrepancies have been seen in the studies published so far likely due to the differences in the implementation of the DA techniques.The result is that there exist no well-accepted guidelines about how to optimally choose appropriate settings within a DA experiment based on data and basin characteristics, and the "optimal" DA configuration still appears to be a complex task.
In this paper, making use of an experimental study, we discuss and highlight how some of the choices regarding the observations error, the RT and the climatic and topographic conditions can affect the results of the DA experiments in run-off simulations.It is worth noting that this paper does not attempt to provide general guidelines for "optimal" choices of the settings involved in DA studies; instead, we aim to present to the readers a simple but comprehensive and robust case study which highlights the main difficulties encountered in the setup of a DA experiment.The paper has the general objective to emphasize the reasons why further research in this direction is necessary in order to draw a more general approach for solving the open issues in hydrologic DA.Secondary objectives are: (i) To provide new evidence of the improvement in flood prediction from assimilating real satellite SM in an operational context; (ii) to test different RTs and their impacts in improving discharge prediction; (iii) to explore the effect of an erroneous choice of the observation error through a sensitivity analysis; (iv) to study the effects of different catchments characteristics (e.g., size, land use and morphology) on DA and (v) to analyze the effect of the climatology and SM conditions on the results of the DA.
On this basis, in order to analyze in deep the above targets five sub-catchments of the Upper Tiber River Basin with area ranging from 137-2000 km 2 are considered for which high-quality hydro-meteorological hourly observations are available in the period 2000-2013.
The paper is organized as follow: In Section 2, we give an overview of the study areas and the in-situ and satellite data.Section 3 details the procedure adopted in the analysis, describing the rainfall-runoff model, the assimilation techniques, the procedures for bias correction, the model and the observation error and the measure of performance.In Section 4 we present the results and some considerations are drawn in the discussions.In the last section, perspectives and conclusions are given.

Study Area
The study area is located in Central Italy and includes five sub-catchments of the Upper Tiber River basin (Figure 1).The catchments were selected by considering those characterized by a (i) good quality of rainfall, temperature and discharge data and (ii) size of the water storage not larger than 500 mm (about 150 cm of soil depth considering the average porosity in the area) in order to exclude catchments characterized by too much variance in depth between the satellite observations (few centimeters) and the depth involved in the rainfall-runoff transformation.The latter was identified by a preliminary analysis on the catchments for which data were available in the area and by using information of past studies.Finally, we selected: Chiani at Morrano (CHI-MOR), Marroggia at Azzano (MA-AZ), Niccone at Migianella (NI-MI), Tevere at Ponte Felcino (TV-PF), Nestore at Marsciano (NE-MA).The main features of these catchments are shown in Table 1.Specifically, the drainage area ranges from 137 km 2 to 2040 km 2 and the mean catchment elevation ranges between 350 and 604 m a.s.l.The catchments located in the western part, Nestore and Chiani, are characterized by low slopes, and very similar soil and land use, whereas Marroggia lies in the Apennine reliefs, with higher elevations and slopes (Table 1).The climate is Mediterranean with mean annual precipitation of about 800 mm.Higher monthly precipitation values are generally observed during the autumn-winter period when floods caused by widespread rainfall normally occur.Mean annual temperature ranges from around 12-13 °C.Snowfall represents a low percentage of precipitation and is unusual and ephemeral at altitudes below 500 m a.s.l.
Infiltration rates were calculated based on the soil type.Based on this information, catchments located in the western part (NE-MA, CHI-MO, NI-MI) are characterized by a higher percentage of soil with lower infiltration rate while MA-AZ has a higher percentage of soil with high infiltration rate due a greater presence of debris and conoids typical of mountainous areas.Table 1.Physical parameters of the selected catchments (A = area, P = annual precipitation amount, El = mean elevation, Sl = mean slope) along with the most important calibrated model parameters in the period 2000-2009 (Wmax = maximum water storage, Ks = parameter of drainage, Kc = parameter of evapotranspiration, η = lag time parameter of Equation ( 1)) and the time averaged model error standard deviation m  # SM-OBS4 pixels represents the number of the satellite pixels falling inside the catchment area, SM-OBS4 noise is the spatial average of noise SM information provided as ancillary data with SM-OBS4 product while T refer to value of the characteristic time length of the Exponential filter.α1, α2, α3 represent the coefficients used to multiply the value of the parameters for obtaining their model error standard deviation, while σp is the rainfall error standard deviation.

In Situ Data
In the study area, a dense hydro-meteorological monitoring network has been operating for more than 25 years consisting of 84 rain gauges, 36 thermometers and 43 hydrometric gauges.The data are recorded with a time interval of 30 minutes, here aggregated at hourly time steps.Rainfall and temperature recorded from January 2000 to December 2013 were considered in this study.Rainfall data are aggregated at catchment scale by the Thiessen polygon method while temperature data are simply averaged for the sensors located inside the catchment.As regards streamflow data, reliable rating curves provide discharge values to be used for calibrating and validating the procedure.

Satellite Data
Satellite SM data from the scatterometer ASCAT were used in this study due their relatively good performance in Central Italy [17].This allows to alleviate possible issues related to the general overall quality of satellite observations.In particular, the demonstrational product H25 SM-OBS-4 (Metop ASCAT Soil Moisture Time Series) distributed through H-SAF [45] project was used.The product cover all the globe and it is available from 2007.It is retrieved from the radar backscattering coefficients measured by ASCAT onboard the Metop A satellite using a change detection method, developed at the Research Group Remote Sensing, Department for Geodesy and Geoinformation, Vienna University of Technology.The relative SM data ranging between 0% and 100% are derived by scaling the normalized backscattering coefficients between the lowest/highest values corresponding to the driest/wettest soil conditions.The derived product represents the SM content in the first 5 cm of the soil in relative units between totally dry conditions and total water capacity.The unit is degree of saturation (%), but can be turned into volumetric units with the use of soil porosity information.In this study, the soil porosity information was not used since the product was rescaled using the model climatology (refer to Section 3.3 for further details).The spatial resolution of the product is 25 × 25 km, then resampled at 12.5 km and based on the WARP 5 grid (using a weighted mean with weighting coefficients computed according to the Hamming window function).Global coverage over Europe is achieved in ~1.5 days, while in Italy, measurements are available about once a day.Along with the data, a number of ancillary information is provided, such as porosity, orbit direction, processing and surface state flag, soil moisture, noise, and so on.
For the analysis, only default values of the processing flags were used while frozen points were removed from the time series.Information on soil moisture noise varying between 0 and 100% was used as a reference for evaluating the accuracy of the product in the different catchments.
Given the different catchment extensions and spatial resolutions of the product, only data falling inside the catchment boundaries were selected and spatially averaged by using the inverse distance weighted (IDW) interpolation method with respect to the catchment centroid.Thanks to the property of temporal stability of SM [46] and as shown by other studies [47], the ASCAT product satisfactorily represents the ground SM state in the study area, also for small river basins, and, as a result, they can be considered spatially representative of the real SM variations.Associated soil moisture noise information was averaged as well in the same way and is shown in Table 1.As it can be seen, the values are very similar except MA-AZ for which soil moisture noise is slightly larger.Note that these values are used in the analysis only as a reference to estimate the uncertainty of the satellite observations and will not be directly used in the data assimilation experiments.Moreover, it is expected that when the satellite SM time series are averaged, their noise is somehow reduced, hence, there could be a non-straightforward relationship between the mean value of the SM noise in the catchment and the related real noise of the mean satellite SM time series (actually, this is true only for TV-PF and NE-MA since they contain more than one pixel in the catchment as it can be seen in Table 1).

Hydrological Model
The continuous and distributed rainfall-runoff model (MISDc) is a hydrological model developed for flood forecasting in the Upper Tiber River region at the hourly (or less) time scale.The lumped version of MISDc used in this study couples a Soil Water Balance model (SWB, [1]) to simulate the SM temporal pattern and a routing module [48] for transferring the rainfall excess to the outlet section of the catchment.The two models are linked through an experimentally-derived linear relationship between the soil potential maximum retention S of the Soil Conservation Service-Curve Number (SCS-CN, [49]) and the relative SM at the beginning of a rainfall event.It has to be pointed out that SWB simulates the soil moisture evolution considering a single soil layer (in a previous paper [17], the authors used a two layers model).Here, a single layer structure was used in order to reduce the complexity of the assimilation.
The SWB represents the main processes needed for SM simulation: infiltration, percolation and evapotranspiration.The processes are represented for infiltration through the SCS-CN approach, for drainage by a gravity-driven non-linear relationship and for actual evapotranspiration, by a linear relationship with the potential evapotranspiration calculated through a modified Blaney and Criddle method [50].The SM simulated by the SWB is used to calculate the parameter S by means of an experimentally-derived relationship between S and SM [51].
Three different components contribute to generating discharge: the surface runoff, the saturation excess, and the subsurface runoff component.The first two are summed and routed to the catchment outlet by the Geomorphological Instantaneous Unit Hydrograph (GIUH) while the subsurface runoff is transferred to the outlet section by a linear reservoir approach.
For the three components (i.e., surface runoff, saturation excess and subsurface runoff), the lag time is evaluated through the relationship proposed by Melone et al. [52]: with L being the lag time in hours, A the area of the catchment (km 2 ), and η a parameter to be calibrated.MISDc requires as input data rainfall and air temperature.The model outputs are both the total volume of runoff and the catchment mean relative SM.
In summary, MISDc, contains eight parameters to be estimated, i.e., the maximum water capacity of the soil layer, Wmax, the saturated hydraulic conductivity, Ks, the parameter controlling the fraction of drainage that transforms in subsurface runoff, υ, the pore size distribution index, m, the correction coefficient for the potential evapotranspiration, Kc, the lag-area relationship parameter η, the initial abstraction coefficient, λ, and the parameter a of the relationship between modelled SM and the S of the SCS-CN method.

Ensemble Kalman Filter
The EnKF was introduced by Evensen [24] as an alternative to the traditional Extended Kalman filter which has been shown to be problematic because of the strongly nonlinear dynamics.The EnKF is a sequential DA technique that uses the traditional update equation of the classical Kalman filter methods with the difference that, in the gain calculation, the error covariance is evaluated by using an ensemble of model states.
In particular, being Y(t) the vector of system states at time step tk obtained via a generic model and Zk the observation vector at time tk, then, the optimal updating of Yk, can be expressed as where i k  Y and i k  Y refer to the forecast and analysis states for the ith ensemble member, respectively, Hk is the observation operator that maps the model states to the observations, vk is a synthetically generated error added to the observation Zk and represents the uncertainties of the observation process that is assumed to be a mean-zero Gaussian random variable with variance Rk [53], and Kk is the Kalman gain: In Equation ( 3) k  P is the covariance matrix of the forecast error calculated by propagating an ensemble of model states (obtained by Monte Carlo or other ensemble generation methods) using the updated states from the previous time step.
A single deterministic EnKF prediction (i.e., the "analysis") is calculated by averaging model state predictions, k  Y , at each time step across the N members of the ensemble (i.e., the ensemble size).In Equation ( 4) k  Y denotes the mean of k  Y .For further details on the EnKF and its application to rainfall-runoff modelling, refer to Reichle et al. [16].

Filtering and Rescaling Techniques
While satellite SM observations represent only the top few centimeters of the soil column, the active soil layer involved in the rainfall-runoff transformation may be much deeper.Indeed, the hydrological model used in this study simulates the variation of the storage in a single soil layer that can assume depths spanning from the surface to the root zone depending on the catchment hydrology.To use surface observations in the DA experiments, a solution is to apply the exponential filter proposed by Wagner et al. [25] to the satellite time series in order to estimate the soil water index (SWI) of the root-zone.SWI has been widely used to represent SM at deeper layers based on surface observations through the following equation [54].
where msk is the surface SM observed by the satellite sensor at time tk and SWIk is the Soil Wetness Index representing the profile averaged saturation degree at time tk.The gain Gk at time tk is given by (in a recursive form): where T is the characteristic time length and represents the time scale of SM variations to obtain the SWI.In this study, T was calibrated by maximizing the correlation between SWI obtained through ASCAT observations and modelled SM during the calibration period (i.e., the common period between available satellite observations and model calibration results, 2007-2009).
Satellite SM observations and model predictions are generally characterized by considerable bias, hence, prior to the assimilation, this bias must be corrected in order to optimally merge the two time series [55].The bias removal is in general carried out by rescaling the observations against model predictions [56].In this study, three different rescaling schemes were implemented: linear regression (LR, [38]) mean and variance matching (VM, [17]) and cumulative distribution function matching (CDF, [57]).In the LR approach, the observations are regressed to the model predictions using a least squares regression technique, while for the VM the mean and the variance of the observations are matched to the ones of the model by using a simple formulation as described in [17].VM involves the matching of only the first two moments of the probability distribution associated with the time series, hence as additional RT we used the CDF approach in which the matching is carried out for all the moments of the distribution.It is worth noting that other rescaling techniques could be used such as the CDF applied to the anomalies; however, the ones chosen for this work appear to be the most used in the literature.We have also to remark that we did not consider any effect of a bias seasonality as underlined by [58] and we applied the RTs to all data regardless of the season.The latter issue represents an additional complexity that may affect the results of the assimilation and will be the subject of future studies.
In principle, when the errors associated to observations and modelled time series are small with respect to the signals, these three RTs provide solutions close to optimal ones, however, this is hardly true in hydrological DA; hence, the three rescaling methods may present some problems [38].As a result, an evaluation of their performances in a real DA experiment is beneficial for a better understanding of their behavior [34].
The setup of the rescaling schemes was carried out in order to guarantee the assimilation in an operational mode.That is, the parameters related to rescaling were "calibrated" prior to the assimilation in the calibration period.In this way, rescaling is always based on the past available data, while no "future" information is used.It has to be noted that a more optimal methodology can be used such as carrying out the calibration of the rescaling parameters at each assimilation step by using all the available past data.Although this is easily implementable, here we preferred to set a fixed-long (>2 years) time period in order to reduce additional influences from the calibration of the rescaling parameters on the analysis.

Model Error Representation and Observation Error
The determination of the model error is a crucial step in the DA experiment since model predictions and observations are merged through Equation ( 3)-which expresses the relative weight between the model and observation errors through the Kalman gain.If these errors are not properly represented, the determination of the K can significantly affect the results of the assimilation.As remarked in the Introduction, the main sources of uncertainty in hydrologic models are the errors in the forcing data, the model structure and the incorrect specification of model parameters.A common way to represent these errors is by adding unbiased synthetic noise to forcing datasets (precipitation and evapotranspiration), model state variables and/or model parameters.
In this study, we adopted a multiplicative model error [59] for the rainfall data (assuming no autocorrelation in the rainfall error) and an additive perturbation for temperature [17].In particular, rainfall was perturbed by a log-normally distributed, unit mean, spatially homogeneous and temporally uncorrelated multiplicative random noise σp, while for temperature a zero-mean normally distributed additive error was chosen with temperature standard deviation equal to σT.For the perturbation of model parameters, we first carried out a sensitivity analysis (not shown) in order to select for modelled SM the most sensitive parameters.Then, these parameters were perturbed with a normally distributed additive error with zero mean.In particular, only three parameters were selected: Wmax, Ks and Kc, for which the relative standard deviations were expressed as percentage α of the absolute value of the parameter, i.e., σWmax = α1Wmax, σKs = α2Ks, and σKc = α3Kc.The model error estimation was performed by varying the values of the percentages αi (i = 1,2,3) along with σp (σT was assumed constant) in a way that two ensemble verification measures (Test #1 and Test #2 onward) commonly used in meteorology [29] were satisfied.That is, if the ensemble spread sp is large enough, the temporal mean of the ensemble skill <sk> should be similar to the temporal average of the ensemble spread <sp> and, if the observation is indistinguishable from a member of the ensemble, where In Equation ( 9), mse denotes the ensemble mean squared error, i k Q is the simulated discharge of the ith member at assimilation time tk, Qobs,k is the observed discharge at time tk, Tp is the length of the time period, while <• >is the mean operator.Assuming σT constant, from different values of αi (i = 1,2,3), σp and σT we obtained different ensembles and different values, f1(α1, α2, α3, σp) and f2(α1, α2, α3, σp) of Test #1 and Test #2, respectively.Then, the optimal selection of αi (i = 1,2,3), σp was carried out by picking up the minimum of the following function: The calibration of the model error standard deviations and σp was performed during 2000-2009, then the same perturbations were used during the validation period for generating the ensemble.
Test #1 and Test #2 serve to guarantee appropriate ensemble statistics.Indeed, if the spread of the ensemble is too large the filter tends to overfit the observations.By contrast, if the spread of the ensemble is too little, the filter tends to underuse the observations.Hence, Test#1 ensures that the ensemble is large enough while the Test #2 checks whether the truth is statistically indistinguishable from a member of the ensemble [17].
Another problem with perturbation methods is that perturbing restricted value variables (e.g., soil moisture, rainfall and parameters) may produce unrealistic values of such variables and bias in the ensemble that may affect the optimality of the assimilation [60].To avoid this, we adopted the same techniques used by Turner et al. [60] in which restricted variables are perturbed by taking into account their domain of existence.This procedure guarantees realistic values and absence of a biased ensemble.
As remarked in the Introduction, a correct assumption of observation error is fundamental in sequential DA.In this study, we explore the sensitivity of the assimilation results as a function of the observation error for different RTs and different basins with different areas.Since the assimilation of the observation was carried out by considering the SWI, the observation error is referred to this variable.To this end, different observation error standard deviations σobs, were chosen ranging from 0.5%-20%.Moreover, as a reference we calculated the variance of rescaled SWI for VM, LR and CDF schemes and take the ratio between 2 obs  and 2 SWI  .In particular, as in [27], we considered an upper bound of the rescaled observation error for each catchment, and tested a selection of error variances within the bound.The rationality behind this is that if we express the rescaled observation (

Performance Indexes
The performance of the DA experiments were evaluated using different scores.Each score was calculated by comparing the observed discharge time series (Qobs) against (i) the discharge time series obtained during the calibration and the validation period, Qc and Qv; (ii) the time series obtained by the mean of the different discharge ensemble members of the open loop simulations, (Qol) and (iii) against the mean of the different updated discharge ensemble members obtained in assimilation mode (Qa).
In particular, the Nash-Sutcliffe efficiency index, (NS, [61]) was used in order to evaluate the agreement between Qc, Qv, Qol and Qa (Qsim in Equation ( 11)) and Qobs: where obs Q is the mean value of the observed discharge during the period of analysis Pa.NS was calculated for both the entire time series and for a number of selected events.The event selection was carried out by extracting the events with a continuous rainfall characterized by a total rainfall larger than 10 mm, and no rainfall in the preceding day.Only for discharges related to the selected events, we also calculated the error in volume by the following equation: To evaluate the benefit in DA, we calculated the efficiency, Eff, expressed as and represents the relative improvement in discharge estimation after the assimilation of SM.Eff > 0 denotes an improvement of the discharge estimation while Eff ≤ 0 means no improvement or deterioration.Eff in Equation ( 13) was calculated by comparing Qobs with the open loop (Qa = Qol) and with respect to the validation (Qa = Qv).
To evaluate the specific improvement of the assimilation in low and high flows, we used two indexes as in [62].The first one is a modified version of the Nash-Sutcliffe criterion and it is basically a calculation of the NS in logarithmic discharges in order to give more weight to low flows.
In particular, we calculated the differences ΔNS(logQ), ΔANSE between the results obtained after and prior the assimilation.In Equation ( 14), ε is arbitrarily chosen as a small fraction of the inter-annual mean discharge (e.g., 40 obs Q ) and is introduced to avoid problems with nil observed or simulated discharges.As additional score, we calculated the Normalized Root Mean Squared Error (NRMSE, [22]): NRMSE greater than one denotes a deterioration of the results while the opposite means improvement.As highlighted by Alvarez-Garreton et al. [22], the NRMSE provides information about both the spread of the ensemble and the performance of the ensemble mean, which can be considered as the best estimate of the ensemble prediction.
By changing the time window where NRMSE is calculated, it is possible to evaluate the benefit of the assimilation in time.To this end, NRMSE was calculated over a time window of 90 days in order to analyse a possible seasonal effect on the assimilation of SM.

Model Calibration and Validation
Two distinct periods were considered in the analysis: A calibration period of 10 years ranging from 1 January 2000 to 31 December 2009 and a validation period of four years where we run the model in assimilation mode.The calibration of the hydrological model was performed by a standard gradient-based automatic optimisation method ("fmincon" function in MATLAB ® ).The calibrated parameters (only Wmax, Kc, Ks and η) are shown in Table 1 while the results in terms of NS of the calibration are reported in Table 2.A general good behaviour (in calibration) can be seen in the model for all the catchments.
The values of the parameters denote a homogenous behaviour for all catchments for Wmax and Kc while different values of Ks and η were obtained for MA-AZ.The results obtained in validation-we are referring to a single deterministic run using the calibrated parameters of the previous step-provide relatively good performance with NS always above 0.6 (an example of the discharge time series in validation for TV-PF during 2013 is shown in Figure 2b).The analysis of the results was carried out also by calculating the performance of the model on a number of selected flood events (Table 3) extracted from the time series as specified in Section 3.5.The results for validation are generally satisfactory (NS > 0.7 and error in volume lower than 30%) except for TV-PF and NI-MI.The poor performance of the model in these two cases was mainly determined by a significant mismatch existing between the model simulation and the observed discharge for the most important flood event which occurred in November 2012 (see the results for TV-PF in Figure 2c).Indeed, in this case, a leave overtopping occurred prior to the measurement section determining a flood peak much lower than expected (i.e., there was a significant reduction of the river discharge upstream to the measurement section) and, as a consequence, a strange shape of the flood hydrograph formed.
Overall, the model performances can be assumed satisfactory considering the hourly temporal resolution, and thus the experiment can be considered a good test for the analysis of satellite SM DA.Table 2. Results of the calibration-validation procedures and of the data assimilation experiment for the five selected catchments.NSc is the Nash Sutcliffe efficiency index obtained in calibration from 2000-2009, NSv refers to the same index in validation during 2010-2013, while NSol is the Nash Sutcliffe calculated in the same validation period on the ensemble mean.On the right part of the table, σobs refers to the observation error for which the data assimilation provides the best performance in terms of Nash-Sutcliffe (NSa).Effv and Effol refer to the results in terms of efficiency in the validation and in the ensemble mean, respectively.Gray cells in the table denote a deterioration with respect to the open loop simulations.Table 3. Results of the data assimilation experiment for the five selected catchments on the selected flood events (number of events Nev).NSval is the Nash Sutcliffe efficiency index calculated on the selected flood events from 2000-2009, NSol refers to the same index calculated on the selected flood events on the ensemble mean, σobs refers to the observation error (the same of Table 2) while NSa is the Nash-Sutcliffe efficiency index calculated after data assimilation on the selected flood events.Effv and Effol refer to the results in terms of efficiency with respect to the validation and to the ensemble mean, respectively, calculated for the selected flood event.Gray cells in the

Ensemble Generation and Data Assimilation Experimental Setup
Model parameter errors were calibrated (in the period [2000][2001][2002][2003][2004][2005][2006][2007][2008][2009] by running the open-loop (i.e., the run without assimilation of SM) with 50 ensemble members by varying the coefficient α1 between 30 and 80% along with α2 and α3 between 0% and 30%.At the same time, σp varied between 0.25 and 1, while σT was assumed constant and equal to 3 °C .Figure 3 shows the value of the function F of Equation ( 10) in the α1 − σp domain for different values of α2, α3 for NE-MA catchment (the other catchments lead to the same conclusions and are not shown for brevity).It can be seen that there is a problem of equifinality in which different combinations of error parameters lead to a similar value of F. The main responsible of this behavior is α1 which controls the perturbation of the storage parameter Wmax.In order to make a selection, we ran the same assimilation experiments by using different optimal and close-optimal solutions (e.g., 2/3 solutions for each catchment) obtaining very similar results with some small differences in some cases.It is likely that the model-non-linearity plays a role in this case and allows obtaining similar model error covariance and in turn similar results.Finally, since a choice has to be made (e.g., think to the application of the DA at operational level), we selected the best combination of the error parameters as described in Section 3.4.Such values for all the catchments are shown in Table 1.In addition, it has to be stressed that the selection of the quadruplet was made during the calibration period, and hence, this does not guarantee that the quadruplet is optimal also during the validation phase (2010-2013).This eventuality was tested for all catchments and it was found that the tests are close to the required values except in some cases (e.g., NE-MA yields 0.68 vs. 1 for test #1 and 0.63 vs. 0.71 for test #2).Another issue that might potentially affect the methodology of the selection of the best error parameters is the reliability of the stream flow observations.Indeed, there is the assumption that their random error is much lower than the modelling error.Although the quality of the observations is very high-since they were successfully tested and cross-verified by many other papers in the past (e.g., [17,40,63])-we cannot guarantee this is always true.The above results suggest the complexity of the issue, which would require a separate and detailed analysis specifically addressed to the problem of the estimation of the model error covariance.
Table 1 also shows the mean model error standard deviation m  which ranges between 2% and 5% and does not seem to have any relation with the performance of the model in the validation period.The ensemble of model states generally contains a look similar to the one of TV-PF presented in Figure 2. It has to be noted also larger ensemble sizes were used (e.g., 100, 500) leading to the same overall conclusions.In Equation (2), vk (the error representing the uncertainties of the observation process with variance R and hereafter as σobs) was made variable between 0.5% and 20% (i.e., 0.005, 0.01, 0.03, 0.05, 0.07, 0.09, 0.12, 0.15, 0.18, 0.2) and the corresponding R was then compared against the variance of the rescaled SWI for different RTs as explained in Section 3.4.Assuming that the range of volumetric soil moisture variability is equal to 0.33 [1], these values correspond in volumetric terms to a range variable between 0.002 and 0.06 m 3 /m 3 and are in the range of those found in the area by [31].
The determination of the characteristic time length T and the calibration of the rescaling parameters were performed in the period 1 January 2007 to 31 December 2009, then these values were used for filtering and rescaling as the observations from the satellite were made available.The objective was to reproduce a "real time operational" application of DA by using only information from the past for flood prediction.The results of the calibration of the parameters T are shown in Table 1 while the parameters related to the RTs are not shown.T values are consistent with those found by [64] in the area with larger values obtained for NE-MA.The filter setup with the above configuration was used for all the five selected catchments.

Data Assimilation Experiment
Figure 4 summarizes the results of the DA experiments for the selected catchments ordered from the largest (top) to the smallest (bottom) for the different RTs and different observation errors.It can be seen that the assimilation generally provides positive efficiency indexes for a large range of observation errors except for MA-AZ.The improvement are significant for TV-PF, NI-MI and CHI-MO with maximum performances (Effa between 30% and 40%) obtained for σobs ranging between 0.03 and 0.07.NE-MA shows similar improvements but only after σobs = 0.09.The same efficiency index as a function of the ratio between the variance of the error and the rescaled observation  greater than one.It has to be pointed out that the latter results violate the assumption made in Section 3.4, which assumes the variance of the rescaled observation as an upper bound value of the variance of the observation error; hence, making this assumption may not lead to an optimal solution.For MA-AZ, Effa is always lower than zero although it increases by increasing the observation error.For a better understanding the best configurations obtained in DA for the five catchments, we summarize in Table 2 the experiments characterized by an optimal choice of the observation error (assuming it was possible its identification).Among other performance scores, the table shows the NS obtained during the assimilation run, NSa, and Eff calculated by using Qol and Qv.It can be seen that except for CHI-MO the optimal observation error does not change too much from one RT to another and lower NS and Eff are generally obtained with CDF.
The same analysis of Table 2 was carried out on the selected flood events and it is presented in Table 3.It can be seen a significant improvement of the flood prediction for TV-PF, CHI-MOR, NE-MA and NI-MI while for MA-AZ the assimilation provides a deterioration of the results with respect to the open loop.At the same time, the error in volume reduces for TV-PF, NE-MA and CHI-MO while it increases for the MA-AZ and NI-MI for all the RTs.
To assess the effectiveness of the assimilation specifically for low and high flows we calculated the gain (ΔNS(logQ), ΔANSE) obtained in the NS(logQ) and ANSE indexes with respect to the open loop simulation for the three RTs considering the optimal choice of observation error (the same presented in Table 2).The results are plotted in Figure 5 and show significant improvement for both indexes only for NI-MI.For TV-PF, DA succeeds only for high flows and show contrasting results between the different RTs.MA-AZ always provides poor results while CHI-MO and NE-MA have a general consistent behavior.Overall, the improvement for high flows is more consistent.2.
An example of the results of the DA for TV-PF (LR) is shown in Figure 2. Figure 2a shows the reduction of the uncertainty with respect to the open loop simulation of SM (i.e., a reduction of the ensemble spread) determined by the assimilation of the SWI and the value of gain parameter K in time.As it can be seen, K shows a general increase in the last parts of the year meaning that the filter is disposed to using more observations during dry-to-wet periods.Figure 2b contains the open loop simulation, the validation run and the updated discharge during 2013 (we show only one year for a clearer visualization), while Figure 2c focuses on the selected flood events for the entire validation time period (i.e., 2010-2013).It can be seen that while certain flood events (4 April and 30 November 2010, 27 November 2012) are better predicted with SM assimilation, others, even if close to the previous ones (20-28 November 2010), are inaccurate.Finally, to analyze the effect of the seasonality, we calculated the NRMSE within a time window of 90 days as a function of time (Figure 6) and we relate it with the SM temporal variability (by calculating the variance of the rescaled SWI in the same time window).Figure 6 shows the results obtained for the LS RT and the optimal choice of the observation error (as above).Although NRMSE shows some random characteristics due to the occurrence of flood events of different magnitude during the year-which is much more pronounced for relatively small basins-a similar pattern can be observed for all catchments (except MA-AZ) which highlights positive performances (NRMSE < 1) of the assimilation during periods of higher SM temporal variability and negative during periods with smaller temporal variability (NRMSE > 1).

Discussion
The performed analysis highlights a good general behavior of the hydrological model for all the selected catchments both in calibration and in validation (Table 2).From the hydrological modelling point of view, MA-AZ is characterized by a larger value of the coefficient of drainage, Ks, and a much quicker response (smaller η) with respect to the catchments located in the western part of the study area (NE-MA, CHI-MO, NI-MI) while TV-PF shows an intermediate behavior.In particular, the larger Ks for MA-AZ is somehow consistent with the higher infiltration rate value (i.e., MA-AZ is located in the Apennines relief with high permeable rocks, see Table 1) while the smaller η is likely due to the difficulty of the model to reproduce the larger baseflow component for this catchment (also noted for TV-PF).
The overall results of the assimilation are positive (similar results were obtained by [17,63] in the same area).Improvements of the performance after the assimilation are obtained for all catchments for proper configurations of observation error and RT, while for MA-AZ no pairs (σobs-RT) have been found to lead to an enhancement of the performances.The reason for the poor assimilation results of MA-AZ might be due to three main causes.First, the expected poor quality of the satellite observations (the catchment is located in a mountainous area where ASCAT may suffer from topography issues)-this problem is confirmed by the higher SM-OBS4 soil moisture noise with respect to the other catchments as shown in Table 1.Second, an overestimation of the model error covariance, which prevents the filter from obtaining an improvement in the range of the chosen observation errors.Finally, and most likely, it might be due to the runoff generation mechanism, which is different from the other catchments.Indeed, the higher permeability of MA-AZ catchment determines that water stored in the soil is less retained by surface tension (i.e., gravity is dominant) and water is released in large quantities to the subsurface leading to a larger contribution of base flow and subsurface runoff.This contribution is not negligible with respect to the total runoff in this catchment and is much less controlled by SM variation at the surface (i.e., where we have information of SM from the satellites).On the other hand, catchments located in the western part of the Umbria Region are characterized by less permeable soils with higher retention properties and show larger sensitivity to surface SM.This second hypothesis was tested though a synthetic experiment (not shown) which showed that the efficiencies obtainable for the other catchments were much larger than those obtainable for MA-AZ.Although DA provides positive results for most of the catchments, the improvements also contain a number of contradictions, which are highlighted below.
In the calibration of the model error, it was found that different parameter perturbations lead to the same values of the selected ensemble statistical tests (Figure 3) and the model error does not seem to have a relationship with the model performance (e.g., compare the model error of NI-MI and MA-AZ in Table 1 and their performance in Table 2).This highlights some difficulties in the identification of the optimal model error to perform the DA.This issue may also be exacerbated for a distributed model characterized by a large number of model parameters.It must be said, however, that the entire procedure for the quantification of the model error covariance and the satisfaction of the ensemble statistical tests might be affected by the presence of random errors in the observed discharge time series.As a result, for a better understanding of the problems associated with this methodology, we recommend a more focused study (which makes use of a synthetic experiments) specifically addressed to cope with these problems.
The effect of the catchment area along with other basin characteristics was explored by comparing the results of the assimilation for the different catchments (Table 2, Figure 4).No particular pattern was found in the results of assimilation by changing the size of the catchment.Indeed, improvements were obtained for the biggest and the smallest basin.This raises two main issues related to the spatial mismatch between the catchment size and the spatial resolution of the satellite SM observations.That is, (i) coarse satellite SM observations (e.g., 25 km spatial resolution) can add a significant benefit also for small catchments (e.g., NI-MI), and (ii) the spatial mean of more satellite SM pixels (i.e., taken among the pixels fallen inside the catchment area) can be successfully assimilated in a lumped model (e.g., TV-PF).Thus, the area of the catchment seems to play an insignificant role in the performances, at least for the explored range.
Assuming a correct evaluation of the model error, the success of the DA seems slightly affected by the choice of RT.The pattern of the improvements looks very similar for VM, LR and CDF, and the optimal error is most of the time similar for the different RTs.Some slight differences between the performances in terms of optimality in the choice of the observation error can be observed if we express the quantification of the observation error as a fraction of the variance of the rescaled observations (Figure 4.).In this case, VM and CDF provide almost coincident results and LR yields slightly lower values with similar efficiency.Moreover, it is clear in this case that considering the observation error as a fraction of the rescaled observation may be not always valid.
Once we selected the optimal observation error (the one related to the maximum NSa, Table 2) for the three RTs, we obtained an improvement for all catchments except MA-AZ and a general lower performance for the CDF technique.The reason for the lower performance of CDF might be due to the small sample size, (i.e., the length of the time series) where the CDF technique was calibrated.Similar results were found by [34] in the Murray Darling Basin in Australia who analyzed the impact of different RTs on the DA and showed sub-optimal performance of the CDF due to the relatively short two-year data record.As mentioned earlier, we have to stress the existence of other rescaling techniques (see [57,65] for further details); however, we found that the ones selected in this study are the most used in DA of SM in rainfall runoff modelling.Therefore, we prefer to focus our analysis by using them, and pursue the matter in a future study specifically addressed to analyzing the effect of the RT on the DA.
The analysis of the results in terms of flood events (Table 3) or by considering the indexes better suited for high and low flow conditions (Figure 5) offers an interpretation of the outcomes from another perspective and highlights some conflicting results.Indeed, by considering only the flood events, the assimilation deteriorates the performance in terms of error in volume for NI-MI while the analysis in terms of high (ANSE) and low flows (NS(logQ)) highlights significant improvements for TV-PF for high flows but smaller and even negative scores for low flows.NE-MA, CHI-MO and NI-MI have consistent performances for low and high flows as well as MA-AZ for which the scores are always negative.Overall, a general lower performance is obtained with the CDF technique for high flow conditions, while for low flows CDF demonstrates an unexpected good behavior for TV-PF and provides similar results to the other techniques for the other catchments.As before, this might be due to the larger number of SM values associated to low flow conditions with respect to those in high flow conditions that are used for calibrating the CDF parameters.The above considerations suggest that simpler rescaling techniques (e.g., VM) can be considered equally valid (or better), and, in most of the cases, preferable due to the simplicity of their implementation.
The choice of the optimal observation error for a user interested in performing DA in this area also appears to be a complex task.For instance, the optimal error is very different among adjacent catchments characterized by similar climatology, slopes, rainfall and altitude and similar calibration and validation scores (e.g., NE-MA and CHI-MO).In this case, the improvements are similar but the observation error is very different (20% for NE-MA, 5% for CHI-MO, or twice the rescaled observation for NE-MA and about one third of the rescaled observation for CHI-MO).One could argue that this may depend on the different quality of the satellite observations, but if we consider as reliable the SM-OBS4 soil moisture noise information (see Section 2.2), we find that it is very similar between the two (9% for CHI-MO, 9.5% for NE-MA).A justification of this result might be found in the accuracy of the model error estimation.Indeed, while the model performs very similarly for the two catchments, their model errors are quite different (Table 1); hence, the procedure used in Section 4.3 for the estimation of model error may be not completely reliable.Moreover, in validation (2010-2013), the standard deviations calibrated during 2000-2009 could not lead to the same optimal solutions.The result is that a proper choice of the observation error may strongly depend on a correct model error estimation.This is somehow consistent with what has been already found in the literature (e.g., [28]).
The analysis of the seasonality highlights some other interesting points.In general, it has been observed that the assimilation provides good results when SM temporal variability is higher (except for MA-AZ).Indeed, the periods of positive performances match with the periods of larger SM temporal variability for NI-MI, TV-PF, NE-MA and CHI-MO.The higher variability coincides in Central Italy with the wet season where, in fact, in November-March, the natural increase of SM (due to the seasonality) is accompanied by a high interchange between short-lived wet and dry periods, and the non-existence of a proper rainy season determines soil conditions which hardly remain constantly saturated.In these conditions, SM variability increases and the wetness conditions of the catchment prior to the occurrence of a flood event play a crucial role in determining the magnitude of the flood response.By contrast, in summer, rainfall is scarce and the relatively high temperature of the Mediterranean climate lowers the SM toward wilting point conditions for longer periods with a reduction of the SM variability.However, when SM temporal variability is low in the wet season (e.g., end of 2011), it does not guarantee that positive results are obtained by assimilation of SM.This suggests that rather than considering the season, one should consider the SM temporal variability for a good interpretation of the results.In order to find a mathematical proof of this tendency, we calculated the correlation coefficient between the SM temporal variability and the NRMSE, however, due to the randomness of the flood events especially in relatively small river basins (as explained in the Results Section), we found low absolute correlation values (around −0.3 for all the catchments except MA-AZ which yields +0.45).Although these values are low, the presence of an inverse correlation value (i.e., higher SM variability vs. positive performance) somehow suggests the existence of the tendency of a reduction of the NRMSE with higher SM temporal variability.We expect that using a longer time period of analysis and focusing the study on much larger river basins can provide more convincing results.It should also be pointed out that satellite ASCAT SM observations are affected by larger errors during the dry season in the Mediterranean areas due to the effect of the vegetation and other factors [66,67], and this might have an effect on the result interpretation.The contrasting behavior of MA-AZ catchment (i.e., the lower performance of the DA when the SM temporal variability is higher, considering also the positive correlation between the SM variability and NRMSE), highlights the importance of possible local effects or a much larger observation error, which prevents an improvement in flood predictions.
Table 4 summarizes the main points addressed in the discussion and may facilitate the reader to go through the conclusion section.

Model error
We performed a sensitivity analysis and calibrated the perturbation of the parameters and of the rainfall for calculating the optimal value of the ensemble statistical indexes The results are good but we observed a problem of equifinality in which different perturbations lead to same values of the ensemble statistical indexes.

Catchment area, topography and soil type
We compared the performance of the assimilation for different catchments characterized by different areas (140-2000 km 2 ) and soil type We did not observe any effect of the catchment area but catchment specific characteristics may prevent positive results of the DA

Rescaling technique
We compared the performance of the assimilation for three different RTs (LR, VM, CDF) We obtained a uniform pattern of performance of the DA for LR, VM and CDF.We concluded that simple techniques may be equally valid as easier to implement

Observation error
We chose different observation errors and tested their optimality in the DA experiments We obtained that adjacent and similar catchments have very different optimal observation errors, therefore its appropriate choice may strongly depend on a correct model error estimation

Flood magnitude
We explored the performance of the DA for low and high flows We found larger improvements for high flow conditions.

Seasonality
We performed a seasonal analysis and compared it against the SM temporal variability We obtained that DA performance during the year that has a relation with the SM temporal variability.Thus, we advise that this parameter must be taken into consideration when analyzing possible effects of seasonality in the DA of SM.

Conclusions
In this paper, making use of an experimental study, we discuss and highlight how the implementation of a DA experiment involves a complex setup phase characterized by a number of choices that can have non-negligible effects on the success of the DA.Assuming the validity of the assumptions for the application of the EnKF-which are not the goal of this study-we analyzed the effects of some important factors which can introduce complexities, uncertainties and non-optimality conditions for the application of DA and which can be propagated across the whole DA process.In particular, we found a number of contradictions and optimality issues that make the DA of soil moisture into rainfall-runoff modelling a complex task.In general, we found a general low effect for size but a strong effect for catchment specific characteristics.Apparent insensitivity of the RT to the DA performance might be the result of a lack of analysis specifically focused on high, low flow or singular flood events and simple rescaling techniques may be equally valid to complex ones, with the advantage of an easier implementation.The selection of the observation error also provides some evidence that its optimal choice has a strong connection with an accurate representation of the model error and that the latter might play a central role in the success of the DA experiment.The analysis of the seasonality has provided evidence that SM temporal variability rather than the season is an important parameter to consider for a correct interpretation of the results of the DA of SM in terms of seasonal analysis.
Overall, while we provided new evidence that assimilation of SM can significantly improve discharge predictions, at the same time we showed that a number of factors can lead to non-optimal solutions with the result that DA of SM in rainfall-runoff modelling appears to be still a complex task and seems very much affected by local conditions and methodology issues.To this end, we believe that the setup phase must be tailored considering specific catchment characteristics and it needs a preliminary filter calibration operation.Hence, we encourage that further research in this field not only provides general guidelines or rules for a proper DA everywhere, but also supports catchment-data and application specific solutions, which involve new filter calibration methodologies as part of iterative DA setup improvement.This could be effectively reached through the development of open source tools for an easy and simple setup of the DA experiments, which would allow testing sensors, techniques and DA results worldwide.Only then we will have a clearer picture of whether a sensor or a technique is better than another.
R ASCAT SWI as the sum of the "true" rescaled observation   R ASCAT SWI and the rescaled observation error..), and we assume error orthogonality, we can express the variance of the rescaled observation as R (from here onward not in bold since in the paper configuration it is a scalar) in Equation (3).Under this assumption 2 R ASCAT SWI  can be considered an upper bound for R. Specific settings for the model parameter errors and observation errors are discussed ahead in Section 4.2.

Figure 2 .
Figure 2. Results of the data assimilation experiment for Tiber River at Ponte Felcino using LS as RT and an observation error equal to 7%.(a) Plots the satellite SM time series along with the ensemble forecast and analysis (5%-95% percentiles).In purple, it is also shown the Kalman gain.(b) Plots a small part of the discharge time series: the plot of the entire time series is not shown because of problems of visualization.In the panel, Qobs is the observed discharge, Qv is the discharge obtained in validation, Qol is the ensemble mean and Qa is the updated discharge obtained in assimilation.(c) Plots the discharge values for the selected flood events.

Figure 3 .
Figure 3. Contour plot of the function F of Equation (10) as a function of the standard deviation of the precipitation σp, the percentage α1 of the perturbation of the maximum water storage Wmax, and the percentages α2, α3 of the parameters of drainage, Ks and evapotranspiration Kc, respectively (period 2000-2009, catchment NE-MA).The optimal value of these standard deviations is identified by the red cross and corresponds to the minimum value of the function F representing the error made in the identification of the optimal value of the tests.It can be clearly seen a problem of equifinality in the figure with darker blue areas (smaller F) covering a large part of the domain.

22
of the panels) show similar results with TV-PF, CHI-MOR and NI-MI having higher performance with 2 2 R obs SWI  between 0.1 or less, and 0.5.Consistently, NE-MA shows improvements for

Figure 4 .
Figure 4. Efficiency index calculated with respect to the open loop simulation for the five catchments for the three RTs as a function of the observation error.σobs is the observation error standard deviation, 2 obs  is the variance observation error, 2 R SWI  is the variance of the rescaled observation.Values below zero (black line) denote a deterioration of the results.

Figure 5 .
Figure 5. Differences (ΔNS(logQ), and ΔANSE) between performance scores NS(logQ) and ANSE obtained prior and after the assimilation of SM for the five selected catchments.While (a) NS(logQ) is well suited for highlighting the performance in the reproduction of low flows, (b) ANSE is used for characterizing the agreement of high flows.The related observation errors are the same as selected in Table2.

Figure 6 .
Figure 6.Pattern of the Normalized Root Mean Squared error (NRMSE) and variance of the rescaled Soil Water Index (rescaled with LR technique), 2 R SWI  for the five selected catchments calculated with a time window of 90 days.Values above the red horizontal line (NRMSE = 1) denote deteriorations.Periods of increased 2 R SWI  generally correspond with improvements except for MA-AZ.

Parameters of the Rainfall-Runoff Model W max
table denote a deterioration with respect to the open loop simulations.

Table 4 .
Analyzed factors and findings in the DA of soil moisture in Central Italy.