LDAS-Monde Sequential Assimilation of Satellite Derived Observations Applied to the Contiguous US: An ERA-5 Driven Reanalysis of the Land Surface Variables

Land data assimilation system (LDAS)-Monde, an offline land data assimilation system with global capacity, is applied over the CONtiguous US (CONUS) domain to enhance monitoring accuracy for water and energy states and fluxes. LDAS-Monde ingests satellite-derived surface soil moisture (SSM) and leaf area index (LAI) estimates to constrain the interactions between soil, biosphere, and atmosphere (ISBA) land surface model (LSM) coupled with the CNRM (Centre National de Recherches Météorologiques) version of the total runoff integrating pathways (CTRIP) continental hydrological system (ISBA-CTRIP). LDAS-Monde is forced by the ERA-5 atmospheric reanalysis from the European Center for Medium Range Weather Forecast (ECMWF) from 2010 to 2016 leading to a seven-year, quarter degree spatial resolution offline reanalysis of land surface variables (LSVs) over CONUS. The impact of assimilating LAI and SSM into LDAS-Monde is assessed over North America, by comparison to satellite-driven model estimates of land evapotranspiration from the Global Land Evaporation Amsterdam Model (GLEAM) project, and upscaled ground-based observations of gross primary productivity from the FLUXCOM project. Taking advantage of the relatively dense data networks over CONUS, we have also evaluated the impact of the assimilation against in situ measurements of soil moisture from the USCRN (US Climate Reference Network), together with river discharges from the United States Geological Survey (USGS) and the Global Runoff Data Centre (GRDC). Those data sets highlight the added value of assimilating satellite derived observations compared with an open-loop simulation (i.e., no assimilation). It is shown that LDAS-Monde has the ability not only to monitor land surface variables but also to forecast them, by providing improved initial conditions, which impacts persist through time. LDAS-Monde reanalysis also has the potential to be used to monitor extreme events like agricultural drought. Finally, limitations related to LDAS-Monde and current satellite-derived observations are exposed as well as several insights on how to use alternative datasets to analyze soil moisture and vegetation state.


Introduction
One of the major scientific challenges in relation to the adaptation to climate change is observing and simulating the response of land biophysical variables to extreme events, making land surface models (LSMs) constrained by high-quality gridded atmospheric variables and coupled (i.e., model run without assimilation) is assessed using satellite-driven model estimates of land evapotranspiration from the Global Land Evaporation Amsterdam Model (GLEAM) project and upscaled ground-based observations of gross primary productivity from the FLUXCOM project, together with river discharges from the United States Geophysical Survey (USGS) and the Global Runoff Data Centre (GRDC). Over CONUS, in situ measurements of soil moisture from the USCRN network (US Climate Reference Network) are also used in the evaluation. Section 2 describes the different components of LDAS-Monde as well as the evaluation data sets and strategy. Section 3 provides a set of statistical diagnostics to assess and evaluate the impact of the assimilation. Finally, Section 4 provides perspectives and future research directions.

LDAS-Monde System Components
LDAS-Monde allows sequential assimilation of satellite derived land observations at a global scale. The assimilation is performed into the open-access SURFEX modelling platform of Météo-France (SURFace Externalisée, [18]). It produces offline re-analyses of LSVs using (i) an LSM along with data assimilation techniques, (ii) observations, and (iii) atmospheric forcing. Those components of LDAS-Monde are briefly described below.

The SURFEX Modelling Platform
LDAS-Monde uses the CO2-responsive version of ISBA embedded within the SURFEX platform. The most recent version of SURFEX (version 8.1) is used in this study with the "NIT" plant biomass monitoring option for ISBA. In this configuration, ISBA simulates leaf-scale physiological processes and plant growth [15][16][17]. The dynamic evolution of the vegetation biomass and LAI variables is driven by photosynthesis in response to atmospheric and climate conditions. Photosynthesis enables vegetation growth resulting from the CO2 uptake. During the growing phase, enhanced photosynthesis corresponds to a CO2 uptake, which results in vegetation growth from the LAI minimum threshold (prescribed as 1 m 2 m −2 for coniferous forest or 0.3 m 2 m −2 for other vegetation types). Transfers of water and heat through the soil rely on a multilayer diffusion scheme [29,30]. The ISBA parameters are defined for 12 generic land surface patches. They include nine plant functional types (needle leaf trees, evergreen broadleaf trees, deciduous broadleaf trees, C3 crops, C4 crops, C4 irrigated crops, herbaceous, tropical herbaceous, and wetlands) as well as bare soil, rocks, and permanent snow and ice surfaces.

LDAS-Monde System Components
LDAS-Monde allows sequential assimilation of satellite derived land observations at a global scale. The assimilation is performed into the open-access SURFEX modelling platform of Météo-France (SURFace Externalisée, [18]). It produces offline re-analyses of LSVs using (i) an LSM along with data assimilation techniques, (ii) observations, and (iii) atmospheric forcing. Those components of LDAS-Monde are briefly described below.

The SURFEX Modelling Platform
LDAS-Monde uses the CO 2 -responsive version of ISBA embedded within the SURFEX platform. The most recent version of SURFEX (version 8.1) is used in this study with the "NIT" plant biomass monitoring option for ISBA. In this configuration, ISBA simulates leaf-scale physiological processes and plant growth [15][16][17]. The dynamic evolution of the vegetation biomass and LAI variables is driven by photosynthesis in response to atmospheric and climate conditions. Photosynthesis enables vegetation growth resulting from the CO 2 uptake. During the growing phase, enhanced photosynthesis corresponds to a CO 2 uptake, which results in vegetation growth from the LAI minimum threshold (prescribed as 1 m 2 m −2 for coniferous forest or 0.3 m 2 m −2 for other vegetation types). Transfers of water and heat through the soil rely on a multilayer diffusion scheme [29,30]. The ISBA parameters are defined for 12 generic land surface patches. They include nine plant functional types (needle leaf trees, evergreen broadleaf trees, deciduous broadleaf trees, C3 crops, C4 crops, C4 irrigated crops, herbaceous, tropical herbaceous, and wetlands) as well as bare soil, rocks, and permanent snow and ice surfaces.
This version of ISBA is coupled to the CTRIP river routing model through OASIS-MCT [31] in order to simulate streamflows of the main rivers [32][33][34][35]. Besides, a single-source energy budget of a soil/vegetation composite is computed. SURFEX also involves data assimilation techniques to analyse LSVs from the ISBA LSM.
This study makes use of the simplified version of an extended Kalman filter (SEKF), as already used and described in [13,19,21,36]. The SEKF uses finite differences from perturbed simulations to estimate the linear tangent model linking the model state control variables to the observed variables. Satellite derived surface soil moisture (SSM) and leaf area index (LAI) are simultaneously assimilated to update eight model state control variables (i.e., control variables); LAI and soil moisture from seven layers of soil, from 1 cm to 100 cm. Assimilating SSM and LAI within LDAS-Monde results in updates of the LSM variables in different ways. Model variables corresponding to the observations are first updated through the Kalman gain computed by the SEKF. Secondly, control variables are updated through their sensitivity to the observed variables. For example, the assimilation of LAI impacts LAI itself, but also soil moisture from the seven layers present in the state vector and the assimilation of SSM impacts LAI. Finally, other variables are indirectly modified by the analysis through biophysical processes and feedbacks in the model by updates of the control variables.

ESA CCI Surface Soil Moisture and CGLS Leaf Area Index
In this study the European Space Agency and Climate Change Initiative (ESA CCI) SSM-combined version of the product (v4.1) is assimilated into LDAS-Monde (http://www.esa-soilmoisture-cci.org, last access June 2018). The CCI merges SSM observations from seven different microwave radiometers (SMMR, SSM/I, TMI, ASMR-E, WindSat, AMSR2, SMOS) and four different scatterometers (ERS-1 and 2 AMI, and MetOp-A and B ASCAT) into a single combined data set covering the time period from November 1978 to December 2016. Data are expressed in volumetric (m 3 m −3 ) units and quality flags are provided (i.e., snow coverage or temperature below 0 • and dense vegetation). For a more comprehensive overview of the product, see [24,25]. Topographic relief is known to negatively affect satellite remote sensing retrievals of SSM [37], hence the time series for pixels whose average altitude exceeds 1500 m above sea level were not accounted for. Data on pixels with urban land cover fractions larger than 15% were discarded too, to limit the effects of artificial surfaces. These thresholds were set according to [13,20,38], who processed satellite-based SSM retrievals for data assimilation experiments with the ISBA LSM. Data are available almost every day with a spatial resolution of 0.25 • × 0.25 • . To assimilate SSM data, it is important to rescale the observations such that they are consistent with the model climatology [39,40]. Hence, similarly to previous studies, the ESA CCI SSM product has been transformed into the model-equivalent SSM to address possible misspecification of physiographic parameters, such as the wilting point and the field capacity. The linear rescaling approach described in [41] (using the first two moments of the cumulative distribution function, CDF) was used. It consists of a linear rescaling enabling a correction of the differences in the mean and variance of the distribution. It has been applied at a seasonal scale (i.e., for each specific month) following [13].
The GEOV1 LAI is also assimilated. It is produced by the European Copernicus Global Land Service project (http://land.copernicus.eu/global/, last access June 2018). Ref. [42] proposed an evaluation of this product in the context of Numerical Weather Prediction (NWP). LAI observations are retrieved from the SPOT-VGT (from 1999 to 2014) and then from PROBA-V (from 2014 to present) satellite data according to the methodology proposed by [43]. The 1 km spatial resolution observations are interpolated by an arithmetic average to the 0.25 • × 0.25 • model grid points, if at least 50% of the observation grid points are observed (i.e., half the maximum amount). LAI observations have a temporal frequency of 10 days at best (e.g., in presence of clouds, no observations are available). Both assimilated datasets are illustrated by Figure 1, averaged over 2010-2016.

ERA-5 Atmospheric Reanalysis
ERA-5 [44] is the fifth generation of European re-analyses produced by the ECMWF and a key element of the EU-funded Copernicus Climate Change Service (C3S). ERA-5 important changes relative to ERA-interim former ECMWF's atmospheric reanalysis include (i) a higher spatial and Remote Sens. 2018, 10, 1627 5 of 24 temporal resolution as well as (ii) a more recent version of ECMWF Earth system model physics and data assimilation system (corresponding to ECMWF's cycle CY41R2, https://www.ecmwf.int/en/ forecasts/documentation-and-support/changes-ecmwf-model/ifs-documentation, last access June 2018). It makes it possible to use modern parameterizations of Earth processes compared with older versions used in ERA-interim. For instance, in addition to being applied to satellite observations, a variational bias scheme is also applied to aircraft and surface ozone and pressure data. ERA-5 also benefits from reprocessed data sets that were not ready yet during the production of ERA-interim. Two other important features of ERA-5 are the more frequent model output and improved model spatial resolution, going from six-hourly output in ERA-interim to hourly output analysis in ERA-5, and from 79 km (horizontal dimension) and 60 levels (vertical dimension) to 31 km and 137 levels in ERA-5. Finally, ERA-5 also provides an estimate of uncertainty through the use of a 10-member ensemble of data assimilations (EDA) at a coarser resolution (63 km horizontal resolution) and three-hourly frequency. ERA-5 is foreseen to replace ERA-interim re-analysis. All ERA-5 atmospheric variables were interpolated at 0.25 • × 0.25 • spatial resolution. A bilinear interpolation from the native reanalysis grid to the regular grid was made.

Evaluation Datasets and Methods
The LDAS-Monde analysis impact was assessed with respect to the open-loop model run (i.e., no assimilation). The system was spun-up by running year 2010 twenty times. Table 1 presents the set up of the different experiments used in this study, the open-loop and the analysis, as well as two additional model runs: (i) Ini_model, a 12-month model run starting on 1 January 2016 (initialised by the model, that is, the openloop with no data assimilation, simulation run from 2010 to 2015); and (ii) Ini_analysis, a 12-month model run initialised by initial conditions from the analysis on 1 January 2016. The two above-mentioned assimilated datasets (ESA CCI SSM and LAI GEOV1) were used as a way to check to what extent the assimilation system was able to produce analyses closer to these two datasets that were assimilated than the open-loop. Then, two independent spatially distributed datasets, namely evapotranspiration from the GLEAM project [45,46] and gross primary production (GPP) from the FLUXCOM project [47,48], were used in the evaluation process. Ground based measurements of soil moisture from the USCRN (US Climate Reference Network, [49]) were also used, along with river discharge observations from the United States Geophysical Survey (USGS) and the Global Runoff Data Centre (GRDC).
The ability of LDAS-Monde to represent SSM, LAI, evapotranspiration, and GPP was assessed using the correlation coefficient (R) and root mean square difference (RMSD). These metrics were applied at a seasonal scale (i.e., for each month) over 2010-2016. For ground-based measurements of SSM, R was calculated for both absolute and anomaly time series in order to remove the strong impact from the SSM seasonal cycle on this specific metric (see e.g., [13,28]). Ground measurements at a depth of 5 cm were compared with soil moisture of the third layer of soil (between 4 and 10 cm depth) from both the model and the analysis for months from April to September over the 2010-2016 time period to avoid frozen conditions. Only stations with significant R values for the two experiments (with p-value < 0.05) were kept for the evaluation.  In order to provide an easier measurement of the added value of the analysis, statistics were also normalized with respect to the model. The so-called normalized information contribution index (NIC as in [28,50]) was applied to the correlation coefficient (Equation (1), for both volumetric and anomaly time-series) and to RMSD (Equation (2)) to quantify the improvement or degradation from the analysis with respect to the model.
NIC scores were then classified according to three categories: (i) negative impact from the analysis with respect to the model with values smaller than −3%, (ii) positive impact from the analysis with respect to the model with values greater than +3%, and (iii) neutral impact from the analysis with respect to the model with values between −3% and 3%.
Over the 2010-2016 time period, river discharge from the analysis and model runs were compared with daily streamflow data from USGS and GRDC. Data were selected for sub-basins with rather large drainage areas (10,000 km 2 or greater) because of the low resolution of CTRIP (0.5 • × 0.5 • ) and with a long observation time series (48 months or more). As commonly found in the literature, observed and simulated river discharge (Q) data are expressed in m 3 s −1 . However, given that the observed drainage areas may differ from the simulated ones, specific discharge in mm d −1 (the ratio of Q to the drainage area) was used in this study, similarly to [13,28]. Stations with drainage areas differing by more than 20% from the simulated ones were discarded. Impact on Q was evaluated using the Kling-Gupta Efficiency (KGE, [51]) score: with RE µ and RE σ representing the relative error of simulated or analysed mean and standard deviation (Equations (4) and (5)), respectively; R representing the correlation coefficient between the observed discharges and either the modelled or analysed river discharges.
KGE represents the Euclidean distance from the ideal point in the [RE µ , RE σ , R] score space. RE µ , RE σ , and R constitute a set of mathematically independent metrics quantifying the fit of simulated/analysed discharge time series. At best, Re µ and RE σ are equal to 0 and R is equal to 1 (leading to a perfect KGE value of 1), indicating that simulated or analysed time series are identical to the measured one. NIC (Equation (1)) was applied to KGE (Equation (6)) as well, only for stations with KGE values greater than 0. Finally, RE µ and RE σ metrics were normalised, following Equation (7), as well as Equation (8) to appreciate the added value from the analysis with respect to the model.

Analysis Impact on Assimilated Variables
Being the model equivalents of the assimilated observations, LAI and soil moisture from the second layer of soil are expected to be the two variables most affected by the assimilation. Figure 2 presents a 10-day time series of LAI averaged over the whole domain for the 2010-2016 time period. From Figure 2, one can see that the open-loop simulation tends to overestimate the observed LAI in winter periods and that the senescence phase of vegetation is too late over the autumn when compared with the observations. In that respect, the assimilation is efficiently correcting the model; however, analysed LAI does not reach LAI maximal values of the observations. Figure 3 shows maps of LAI for the model (Figure 3a It is worth mentioning that the good level of scores (prior as well as after assimilation) are linked to the rescaling of the ESA CCI SSM data to the model climatology. Finally, Figure 5 shows maps of analysis increments for 4 (out of 8) control variables averaged over the whole 2010-2016 time period (LAI, second, fourth, and sixth layers of soil from left to right, respectively). It can be noticed that the magnitude of increments is decreasing with depth. It can also be noticed that over almost the whole domain, the analysis tends to add water in the soil near the surface (positive increments), while it dries layers where the roots are mainly located (from layer 4 to 6, negative increments).        Table 2 presents the statistical scores from the evaluation of both the open-loop and the analysis with respect to evapotranspiration and GPP averaged over 2010-2016, as well as the mean of the evaluation data set over the considered area. On average, R increases from 0.80 to 0.81 when comparing evapotranspiration from the model and from the analysis, respectively, to the independent estimates. Average RMSD decreases from 0.89 kg m −2 d −1 to 0.85 kg m −2 d −1 . When compared with GPP estimates, averaged correlations rise from 0.74 to 0.78 and RMSD drops from 2.20 g(C) m −2 d −1 to 1.91 g(C) m −2 d −1 when considering the model or the analysis, respectively.  (Figure 6a,c), blue colours represent an improvement from the analysis regarding RMSDs (i.e., the latter better represents either evapotranspiration or GPP than the model), while for NICR (Figure 6b,d), red colours depict an improvement from the analysis. Figure 6 shows that both evapotranspiration and GPP are improved almost everywhere in terms of correlation and RMSD, and that the impact of the assimilation is stronger for GPP than for evapotranspiration. At the seasonal scale (not shown), the assimilation leads to a positive impact all year long in the representation of GPP in terms of both RMSD and R values. Impact from the assimilation on evapotranspiration is smaller (as seen in Table 2), while RMSD values are slightly improved all year long, R values are slightly improved from April to October and slightly degraded from November to March.    Table 2 (Figure 6a,c), blue colours represent an improvement from the analysis regarding RMSDs (i.e., the latter better represents either evapotranspiration or GPP than the model), while for NICR (Figure 6b,d), red colours depict an improvement from the analysis. Figure 6 shows that both evapotranspiration and GPP are improved almost everywhere in terms of correlation and RMSD, and that the impact of the assimilation is stronger for GPP than for evapotranspiration. At the seasonal scale (not shown), the assimilation leads to a positive impact all year long in the representation of GPP in terms of both RMSD and R values. Impact from the assimilation on evapotranspiration is smaller (as seen in Table 2), while RMSD values are slightly improved all year long, R values are slightly improved from April to October and slightly degraded from November to March.  Table 2 (Figure 6a,c), blue colours represent an improvement from the analysis regarding RMSDs (i.e., the latter better represents either evapotranspiration or GPP than the model), while for NIC R (Figure 6b,d), red colours depict an improvement from the analysis. Figure 6 shows that both evapotranspiration and GPP are improved almost everywhere in terms of correlation and RMSD, and that the impact of the assimilation is stronger for GPP than for evapotranspiration. At the seasonal scale (not shown), the assimilation leads to a positive impact all year long in the representation of GPP in terms of both RMSD and R values. Impact from the assimilation on evapotranspiration is smaller (as seen in Table 2), while RMSD values are slightly improved all year long, R values are slightly improved from April to October and slightly degraded from November to March.

Soil Moisture
The statistical scores for surface soil moisture from the model and the analysis (third layer of soil between 4 and 10 cm depth) over 2010-2016 when compared with ground measurements from the USCRN network (at 5 cm depth) are presented in Table 3. Median R values on volumetric soil moisture time-series (anomaly time series), along with their 95% confidence interval of the median, derived from 10,000-sample bootstrapping are as follows: 0.72 ± 0.02 (0.60 ± 0.02) and 0.74 ± 0.02 (0.60 ± 0.02), while median ubRMSDs are 0.049 ± 0.004 and 0.048 ± 0.004 for the model and the analysis, respectively. Figure 7a,b illustrate correlation values on volumetric and anomaly time-series between the model and the observations, respectively, for each station. Figure 7c,d represent the added value of the analysis expressed through the NIC index (Equation (1)) applied for correlations (NIC R ) values on volumetric and anomaly time-series; large blue circles represent a positive impact from the analysis at NIC R greater that +3 (i.e., R values are better when the analysis is used than when the model is used), large red circles a degradation from the analysis at NIC R smaller than −3, while diamond symbols represent a rather neutral impact at NIC R in between [−3; +3]. While 46% (81%) of the pool of stations present a rather neutral impact for R values on volumetric (anomaly) time series, stations more impacted by the analysis tend to be positively impacted at 46% (18%), to be compared with 8% (1%) of negative impacts. Although differences between the model run and the analysis are rather small, these results underline the added value of the analysis with respect to the model run.

Soil Moisture
The statistical scores for surface soil moisture from the model and the analysis (third layer of soil between 4 and 10 cm depth) over 2010-2016 when compared with ground measurements from the USCRN network (at 5 cm depth) are presented in Table 3. Median R values on volumetric soil moisture time-series (anomaly time series), along with their 95% confidence interval of the median, derived from 10,000-sample bootstrapping are as follows: 0.72 ± 0.02 (0.60 ± 0.02) and 0.74 ± 0.02 (0.60 ± 0.02), while median ubRMSDs are 0.049 ± 0.004 and 0.048 ± 0.004 for the model and the analysis, respectively. Figure 7a,b illustrate correlation values on volumetric and anomaly timeseries between the model and the observations, respectively, for each station. Figure 7c,d represent the added value of the analysis expressed through the NIC index (Equation (1)) applied for correlations (NICR) values on volumetric and anomaly time-series; large blue circles represent a positive impact from the analysis at NICR greater that +3 (i.e., R values are better when the analysis is used than when the model is used), large red circles a degradation from the analysis at NICR smaller than −3, while diamond symbols represent a rather neutral impact at NICR in between [−3; +3]. While 46% (81%) of the pool of stations present a rather neutral impact for R values on volumetric (anomaly) time series, stations more impacted by the analysis tend to be positively impacted at 46% (18%), to be compared with 8% (1%) of negative impacts. Although differences between the model run and the analysis are rather small, these results underline the added value of the analysis with respect to the model run.   0.048 ± 0.004 * 46% (18%) 8% (1%) 46% (81%) * 95% confidence interval of the median derived from a 10,000 samples bootstrapping.

Streamflow
A subset of 258 out of 531 gauging stations was selected for the evaluation according to the criteria described in the methodology section, with KGE scores within the [0, 1] interval. Figure 8 presents the performance of analysed streamflow with respect to the one from the model run for this pool of stations, with a focus on the eastern part of the domain. NIC KGE values are presented following the same classification as NIC R applied to soil moisture. Scores are presented in Table 4. Looking at NIC KGE , 62% of the pool of stations (258 stations) present a rather neutral impact (at NIC KGE between [−3; 3]) and 26% of the stations present a positive impact (at NIC KGE > +3), while only 12% of stations have a negative impact (at NIC KGE < −3). NICR, N REσ , and N REµ follow the same classification (with even a smaller percentage of stations being negatively affected by the analysis; 1%); when the analysis is impacting streamflow representation, it tends to be a positive impact.

Streamflow
A subset of 258 out of 531 gauging stations was selected for the evaluation according to the criteria described in the methodology section, with KGE scores within the [0, 1] interval. Figure 8 presents the performance of analysed streamflow with respect to the one from the model run for this pool of stations, with a focus on the eastern part of the domain. NICKGE values are presented following the same classification as NICR applied to soil moisture. Scores are presented in Table 4. Looking at NICKGE, 62% of the pool of stations (258 stations) present a rather neutral impact (at NICKGE between [−3; 3]) and 26% of the stations present a positive impact (at NICKGE > +3), while only 12% of stations have a negative impact (at NICKGE < −3). NICR, NREσ, and NREμ follow the same classification (with even a smaller percentage of stations being negatively affected by the analysis; 1%); when the analysis is impacting streamflow representation, it tends to be a positive impact.  . NIC KGE greater than 3 (blue large circles) suggest an improvement from the analysis over the model, values smaller than −3 (red large circles) suggest a degradation. For sake of clarity, a factor of 100 has been applied to NIC. Table 4. Analysis impact evaluation against daily streamflow over 2010-2016. The impact from the analysis with respect to the model is assessed through the normalized information contribution (NIC) applied to the Kling-Gupta efficiency (KGE) score, as well as using normalized relative error of simulated or analysed mean (RE µ ) and standard deviation (RE σ ). Scores are classified according to three categories: (i) negative impact from the analysis with respect to the model with values smaller than −3, (ii) positive impact from the analysis with respect to the model with values greater than +3, and (iii) neutral impact from the analysis with respect to the model with values between −3 and 3.

Could LDAS-Monde be Used to Monitor Agricultural Droughts?
The previous section has highlighted the LDAS-Monde ability to enhance the monitoring accuracy for land surface variables. It should then be possible to use it to better represent extreme events like agricultural droughts. Figure 9 represents monthly LAI anomalies averaged over the U.S. corn belt (simplified as a box from 110 • W to 70 • W and 30 • N to 50 • N) with respect to 2010-2016 means from the model, the analysis, and the observations. As shown by Figure 9, for the second part of the year 2012, LAI observations exhibit a strong negative anomaly at this domain scale. While it is also visible in the model, the latter clearly overestimates the intensity of the observed anomaly. The analysed LAI anomaly is closer to the observed one than the model. This extreme drought event is known as the August 2012 U.S. corn belt drought. The U.S. Department of Agriculture (USDA, www.nass.usda.gov, last access June 2018) estimated that corn yield (per acre of planted crop) was 26% below the expectation that they had at the beginning of the 2012 growing season. The 2012 corn yield deficit and the implied climatic impact was classified as a 'historic event' [54]. As visible on Figure 9, spring 2012 presents a positive anomaly for vegetation. Ref. [55] defined spring 2012 as the earliest false spring in the North American record (i.e., a period of weather in late winter or early spring allowing vegetation to be prematurely brought out of dormancy). This false spring has contributed to an earlier dry out of the soil. Figure 10 presents maps of the LAI anomaly for this specific month for the model, observations, and analysis from left to right, respectively. Compared with the observations (Figure 10b), the area affected by the anomaly in the model (Figure 10a) is too large and too intense, while the analysis (Figure 10c) better matches the observed pattern in both space and intensity. This impact is valid when compared with most of the severe droughts events that occurred over CONUS (data from the National Oceanic and Atmospheric Administration (NOAA) state of the climate website, last access April 2018 https://www.ncdc.noaa.gov/sotc/drought/201803, not shown). Hence, LDAS-Monde provides a better tool than the model alone to monitor extreme events like agricultural droughts.

Could LDAS-Monde be Used to Monitor Agricultural Droughts?
The previous section has highlighted the LDAS-Monde ability to enhance the monitoring accuracy for land surface variables. It should then be possible to use it to better represent extreme events like agricultural droughts. Figure 9 represents monthly LAI anomalies averaged over the U.S. corn belt (simplified as a box from 110°W to 70°W and 30°N to 50°N) with respect to 2010-2016 means from the model, the analysis, and the observations. As shown by Figure 9, for the second part of the year 2012, LAI observations exhibit a strong negative anomaly at this domain scale. While it is also visible in the model, the latter clearly overestimates the intensity of the observed anomaly. The analysed LAI anomaly is closer to the observed one than the model. This extreme drought event is known as the August 2012 U.S. corn belt drought. The U.S. Department of Agriculture (USDA, www.nass.usda.gov, last access June 2018) estimated that corn yield (per acre of planted crop) was 26% below the expectation that they had at the beginning of the 2012 growing season. The 2012 corn yield deficit and the implied climatic impact was classified as a 'historic event' [54]. As visible on Figure 9, spring 2012 presents a positive anomaly for vegetation. Ref. [55] defined spring 2012 as the earliest false spring in the North American record (i.e., a period of weather in late winter or early spring allowing vegetation to be prematurely brought out of dormancy). This false spring has contributed to an earlier dry out of the soil. Figure 10 presents maps of the LAI anomaly for this specific month for the model, observations, and analysis from left to right, respectively. Compared with the observations (Figure 10b), the area affected by the anomaly in the model (Figure 10a) is too large and too intense, while the analysis (Figure 10c) better matches the observed pattern in both space and intensity. This impact is valid when compared with most of the severe droughts events that occurred over CONUS (data from the National Oceanic and Atmospheric Administration (NOAA) state of the climate website, last access April 2018 https://www.ncdc.noaa.gov/sotc/drought/201803, not shown). Hence, LDAS-Monde provides a better tool than the model alone to monitor extreme events like agricultural droughts.  It is also worth mentioning that if LDAS-Monde brings a clear improvement in the representation of LAI, reducing the overestimation duration as well as the minimal values, as its model counterpart, it fails capturing the observed LAI peak intensity. Ref. [13] has evaluated the model sensitivity of the observation for Europe over 2000-2012 reflected in the SEKF Jacobians. The Jacobians depend on the model physics and their examination provides useful insight that explains the data assimilation system performances [21,56]. Ref. [13] suggests a seasonal dependency of the model sensitivity to the observed LAI. High sensitivity is found in autumn. Smaller model sensitivity at the time of the year where the peak LAI occurs (late spring) prevails the analysis to match the observations correctly.
As highlighted by [57], who have evaluated the capacity of several LSMs (including ISBA) to accurately simulate observed energy and water fluxes during droughts, there is a need to reexamine existing model components in LSMs to improve simulations of soil hydrological processes and water-plant interactions. It appears from Figure 2 that although the analysis is able to correct the overestimated LAI values in winter, the minimum LAI thresholds used in ISBA has to be revisited. Recently, the satellite derived LAI data have been disaggregated following a Kalman filtering technique developed by [58]. This enables the LAI signal for each vegetation type to be separated within the pixel, which provides a dynamic vegetation-dependent estimate of the assimilated LAI within the pixel [59]. This new dataset will make it possible to modify the minimum LAI thresholds accordingly.

Could LDAS-Monde Provide Accurate Initial Conditions for Vegetation Forecasts?
In the context of NWP, assimilation of satellite observations in atmospheric models has the capacity to mitigate model deficiencies, leading to better estimates of system states. This has been the main driver of the improvement of both weather forecast skill and lead time [60]. Data assimilation is able to produce similar benefits for LSVs forecasting. Seeking to foster a link with applications, LDAS-Monde could not only be used to monitor the LSVs, but could also be integrated into a forecasting system (at different time scales) assuming that it can provide better initial conditions than a model run and that its impact lasts in time. Many applications could benefit from a better representation of the LSVs, from NWP [61], to early warning systems of, for example, agricultural drought and yield forecasts. As a first step towards such early warning systems, Figure  11 shows a comparison between LAI from the two last simulations presented in Table 1: Ini_Model and Ini_Analysis. Figure 11a,b shows monthly RMSD (R) values for the year 2016 for LAI. A strong impact is visible not only from the beginning of the two simulations, but also a few months later (up to April). The four maps of Figure 11c show RMSD differences between Ini_analysis and Ini_model from January to April. All maps are dominated by negative values (in blue), suggesting that Ini_analysis presents a better match with the observed LAI and that analysis effects last in time. Southeastern part of the domain is mainly affected, these areas are dominated by broadleaves forest as well as C3 crops. Large differences between the model and the analysis during winter time (as shown on Figure 2) suggest that the minimum LAI threshold used in the model for these vegetation types has to be revisited. Such results are strongly linked to the time of the year when the It is also worth mentioning that if LDAS-Monde brings a clear improvement in the representation of LAI, reducing the overestimation duration as well as the minimal values, as its model counterpart, it fails capturing the observed LAI peak intensity. Ref. [13] has evaluated the model sensitivity of the observation for Europe over 2000-2012 reflected in the SEKF Jacobians. The Jacobians depend on the model physics and their examination provides useful insight that explains the data assimilation system performances [21,56]. Ref. [13] suggests a seasonal dependency of the model sensitivity to the observed LAI. High sensitivity is found in autumn. Smaller model sensitivity at the time of the year where the peak LAI occurs (late spring) prevails the analysis to match the observations correctly.
As highlighted by [57], who have evaluated the capacity of several LSMs (including ISBA) to accurately simulate observed energy and water fluxes during droughts, there is a need to re-examine existing model components in LSMs to improve simulations of soil hydrological processes and water-plant interactions. It appears from Figure 2 that although the analysis is able to correct the overestimated LAI values in winter, the minimum LAI thresholds used in ISBA has to be revisited. Recently, the satellite derived LAI data have been disaggregated following a Kalman filtering technique developed by [58]. This enables the LAI signal for each vegetation type to be separated within the pixel, which provides a dynamic vegetation-dependent estimate of the assimilated LAI within the pixel [59]. This new dataset will make it possible to modify the minimum LAI thresholds accordingly.

Could LDAS-Monde Provide Accurate Initial Conditions for Vegetation Forecasts?
In the context of NWP, assimilation of satellite observations in atmospheric models has the capacity to mitigate model deficiencies, leading to better estimates of system states. This has been the main driver of the improvement of both weather forecast skill and lead time [60]. Data assimilation is able to produce similar benefits for LSVs forecasting. Seeking to foster a link with applications, LDAS-Monde could not only be used to monitor the LSVs, but could also be integrated into a forecasting system (at different time scales) assuming that it can provide better initial conditions than a model run and that its impact lasts in time. Many applications could benefit from a better representation of the LSVs, from NWP [61], to early warning systems of, for example, agricultural drought and yield forecasts. As a first step towards such early warning systems, Figure 11 shows a comparison between LAI from the two last simulations presented in Table 1: Ini_Model and Ini_Analysis. Figure 11a,b shows monthly RMSD (R) values for the year 2016 for LAI. A strong impact is visible not only from the beginning of the two simulations, but also a few months later (up to April). The four maps of Figure 11c show RMSD differences between Ini_analysis and Ini_model from January to April. All maps are dominated by negative values (in blue), suggesting that Ini_analysis presents a better match with the observed LAI and that analysis effects last in time. Southeastern part of the domain is mainly affected, these areas are dominated by broadleaves forest as well as C3 crops. Large differences between the model and the analysis during winter time (as shown on Figure 2) suggest that the minimum LAI threshold used in the model for these vegetation types has to be revisited. Such results are strongly linked to the time of the year when the simulation is initialised by the analysis, that is, the greater the prior difference between the model and the analysis experiments will be, the stronger the impact. As for LAI, and according to Figure 2a, marked impact would be expected from July to March. It is also very promising that the impact of LAI initialisation lasts in time for several weeks or even months. Those results are in line with findings from [5] who have assimilated biomass and LAI observations at "site-level" in CLM4.5. They found that monthly forecasts (and even longer forecasts range) were improved by data assimilation compared with forecasts without data assimilation.
Remote Sens. 2018, 10, x FOR PEER REVIEW 16 of 24 simulation is initialised by the analysis, that is, the greater the prior difference between the model and the analysis experiments will be, the stronger the impact. As for LAI, and according to Figure  2a, marked impact would be expected from July to March. It is also very promising that the impact of LAI initialisation lasts in time for several weeks or even months. Those results are in line with findings from [5] who have assimilated biomass and LAI observations at "site-level" in CLM4.5. They found that monthly forecasts (and even longer forecasts range) were improved by data assimilation compared with forecasts without data assimilation. Although the ISBA LSM does not directly represent grain yield (GY), it is assumed that the regional-scale simulations of above-ground biomass from a generic LSM can provide the interannual variability as a proxy for GY [62,63]. Refs. [13,64] have also found that the LDAS-Monde analysed above-ground biomass inter-annual variability was in better agreement with that of GY than its open-loop counterpart. These studies have been performed over France using straw cereal GY values from the Agreste French agricultural statistics portal (http://agreste.agriculture.gouv.fr). If more evaluations are required to assess LDAS-Monde capacity to represent GY, it paves the way towards potential productivity and yield forecasting system.

Which Alternative Data to Better Constrain LDAS-Monde?
LDAS-Monde re-analyses presented above were repeated, assimilating only SSM or LAI, the results suggested that most of the skill came from the assimilation of LAI (not shown). Assimilating LAI permits to analyze not only LAI itself, but also the root zone soil moisture. While assimilating SSM does mainly affect the first layers of soil (layer 2, 1 cm to 4 cm and layer 3, 4 cm to 10 cm), assimilating LAI has an impact on deeper layers (up to 60 cm) and is more efficient to analyse the Although the ISBA LSM does not directly represent grain yield (GY), it is assumed that the regional-scale simulations of above-ground biomass from a generic LSM can provide the inter-annual variability as a proxy for GY [62,63]. Refs. [13,64] have also found that the LDAS-Monde analysed above-ground biomass inter-annual variability was in better agreement with that of GY than its open-loop counterpart. These studies have been performed over France using straw cereal GY values from the Agreste French agricultural statistics portal (http://agreste.agriculture.gouv.fr). If more evaluations are required to assess LDAS-Monde capacity to represent GY, it paves the way towards potential productivity and yield forecasting system.

Which Alternative Data to Better Constrain LDAS-Monde?
LDAS-Monde re-analyses presented above were repeated, assimilating only SSM or LAI, the results suggested that most of the skill came from the assimilation of LAI (not shown). Assimilating LAI permits to analyze not only LAI itself, but also the root zone soil moisture. While assimilating SSM does mainly affect the first layers of soil (layer 2, 1 cm to 4 cm and layer 3, 4 cm to 10 cm), assimilating LAI has an impact on deeper layers (up to 60 cm) and is more efficient to analyse the root zone soil moisture too. This has also been suggested by [13], when analysing the ISBA LSM sensitivity to the assimilated observations through the SEKF Jacobians.
However, the LAI product used in this study is available every 10 days at best, making it less efficient to constrain the ISBA LSM, particularly in areas of the world affected by clouds for long periods of time (e.g., areas affected by the monsoon regime). Another caveat is the use of a single LAI value for all vegetation types that are represented in SURFEX. As detailed in [20], the innovation is computed from the difference between the observed LAI and the modelled LAI aggregated over all the vegetation types. Then, the Kalman gain is calculated for each individual vegetation type. The analysis increment is added to the background for each vegetation type, producing a vegetation-dependent analysis update. The vegetation dependence is introduced in the Kalman gain via the Jacobian elements. As mentioned already, the possibility of having LAI estimates for each type of vegetation is under investigation [59] and has the capacity of overcoming the above-mentioned weakness. An extension of this work is under development to assimilate in LDAS-Monde the disaggregated LAI product for each vegetation type independently, with very promising preliminary results [65].
Microwave remote sensing over land has mainly focused on soil moisture retrieval [66,67] and vegetation was mostly considered during the retrieval of surface soil moisture as a by-pass product affecting the signal penetration to the surface [68,69]. The attenuation of the signal (i.e., when passing through the vegetation) depends on the vegetation optical depth (VOD). VOD describes the attenuation of radiation due to scattering and absorption within the vegetation layer, which is caused by the water contained in the vegetation. It is function of the frequency of the microwave sensor, the water content of the plant (trunk, branches, leaves), as well as the biomass (e.g., [70][71][72][73]. VOD can be retrieved from microwave data, for example, from the L-band soil moisture and ocean salinity (SMOS) mission [74] or the C-band advanced scatterometer (ASCAT) mission on-board the meteorological operational satellite A (MetOp-A) [75,76]. VOD can be related to LAI (e.g., [77][78][79][80]. Figure 12 presents a map of temporal correlation coefficient values between modelled LAI and microwave-derived VOD from radar backscatter measurements of ASCAT [75,76] (Figure 12a), as well as their distribution (Figure 12b) for 2010-2016. High correlations values are found in large parts of the domain, with a median value of 0.57. The northern part of the domain shows R values greater than 0.7, while smaller R values (and even negative R values) are found in the southern part of the domain. Over dry soils, sub-surface scattering from the microwave signal potentially affects the VOD estimates (Wolfgang Wagner, TU WIEN, personal communication, April 2018). The same VOD dataset has a higher median R value with the observed LAI, that is, 0.88. Consequently it better correlates with the analysis (median R values of 0.61) than with the model. If a strong statistical relationship between C-band VOD and LAI can be obtained through the use of, for example, machine learning techniques (like neural network techniques, [81], it could enable obtaining a surrogate of LAI based on C-band VOD that would have the advantage of having higher temporal frequency than the current LAI product (low frequency microwave observations are not affected much by clouds and are not affected by solar elevation). Such a product could then be assimilated into LDAS-Monde to better constrain the system. Looking at such a relationship for data assimilation purposes is currently under study at CNRM.
Also, retrieved soil moisture is assimilated in LDAS-Monde, from active radar backscatter (σ o ) observations. Retrieval methods usually make use of land surface parameters and auxiliary information (like vegetation, texture, and temperature), possibly being inconsistent with specific model simulations (which also include these parameters, but potentially from different sources). Also, if retrievals and model simulations rely on similar types of auxiliary information, their errors may be cross-correlated, potentially degrading the system performance [82]. This leads to an increasing tendency towards the direct assimilation of σ o observations (and of passive radiometer brightness temperature, T b , as well) [83][84][85][86][87]. CNRM is also investigating the direct assimilation of σ o . It requires the implementation of a forward model for σ o in the ISBA LSM. Ref. [85] used the Water Cloud Model [88] to relate surface soil moisture from the Global Land Evaporation Amsterdam Model (GLEAM, [45,89]) to σ o for data assimilation purposes. Within LDAS-Monde, both surface soil moisture and leaf area index could be related to the radar backscatter. Also, retrieved soil moisture is assimilated in LDAS-Monde, from active radar backscatter (σo) observations. Retrieval methods usually make use of land surface parameters and auxiliary information (like vegetation, texture, and temperature), possibly being inconsistent with specific model simulations (which also include these parameters, but potentially from different sources). Also, if retrievals and model simulations rely on similar types of auxiliary information, their errors may be cross-correlated, potentially degrading the system performance [82]. This leads to an increasing tendency towards the direct assimilation of σo observations (and of passive radiometer brightness temperature, Tb, as well) [83][84][85][86][87]. CNRM is also investigating the direct assimilation of σo. It requires the implementation of a forward model for σo in the ISBA LSM. Ref. [85] used the Water Cloud Model [88] to relate surface soil moisture from the Global Land Evaporation Amsterdam Model (GLEAM, [45,89]) to σo for data assimilation purposes. Within LDAS-Monde, both surface soil moisture and leaf area index could be related to the radar backscatter.

Conclusions
In this study, LDAS-Monde sequential assimilation of satellite derived surface soil moisture and leaf area index, forced by ERA-5 latest atmospheric re-analysis, was applied to the CONtiguous US domain. Ref. [28] have highlighted the added value of using the ERA-5 atmospheric reanalysis to force the ISBA Land Surface Model over the CONtiguous US for the 2010-2016 period. They found that the use of ERA-5 instead of ERA-interim leads to significant improvements in the representation of the land surface variables linked to the terrestrial water cycle (e.g., surface soil moisture, river discharges, snow depth, and turbulent fluxes), but to a rather neutral impact on land surface variables linked to the vegetation cycle (e.g., evapotranspiration, carbon uptake, and leaf area index). Assimilating satellite derived observations linked to vegetation (LAI in this application) through LDAS-Monde forced by ERA-5 not only leads to a clear improvement in the representation of the vegetation cycle in ISBA, but brings further improvement on the representation of the terrestrial water cycle. The results have highlighted the stronger impact of LAI observations assimilation with respect to soil moisture assimilation. Other vegetation-related observations such as vegetation optical depth could be used, under specific circumstances, as a surrogate of LAI limiting the negative impact of the rather low temporal frequency of the LAI product. LDAS-Monde is a powerful tool to track the evolution of land surface variables and to monitor extreme events such as agricultural drought. Since LDAS-Monde analysis is more accurate than a simple model run, it can be used to initialise a forecast experiment of the land surface variables. Preliminary results suggest that its impact on forecast experiments, in particular with respect to vegetation, is positive and lasts in time. It opens the way towards applications from monitoring to forecasting land surface states. For that purpose, LDAS-Monde will be forced by other, higher spatial resolution, ECMWF atmospheric products like the high resolution forecast

Conclusions
In this study, LDAS-Monde sequential assimilation of satellite derived surface soil moisture and leaf area index, forced by ERA-5 latest atmospheric re-analysis, was applied to the CONtiguous US domain. Ref. [28] have highlighted the added value of using the ERA-5 atmospheric reanalysis to force the ISBA Land Surface Model over the CONtiguous US for the 2010-2016 period. They found that the use of ERA-5 instead of ERA-interim leads to significant improvements in the representation of the land surface variables linked to the terrestrial water cycle (e.g., surface soil moisture, river discharges, snow depth, and turbulent fluxes), but to a rather neutral impact on land surface variables linked to the vegetation cycle (e.g., evapotranspiration, carbon uptake, and leaf area index). Assimilating satellite derived observations linked to vegetation (LAI in this application) through LDAS-Monde forced by ERA-5 not only leads to a clear improvement in the representation of the vegetation cycle in ISBA, but brings further improvement on the representation of the terrestrial water cycle. The results have highlighted the stronger impact of LAI observations assimilation with respect to soil moisture assimilation. Other vegetation-related observations such as vegetation optical depth could be used, under specific circumstances, as a surrogate of LAI limiting the negative impact of the rather low temporal frequency of the LAI product. LDAS-Monde is a powerful tool to track the evolution of land surface variables and to monitor extreme events such as agricultural drought. Since LDAS-Monde analysis is more accurate than a simple model run, it can be used to initialise a forecast experiment of the land surface variables. Preliminary results suggest that its impact on forecast experiments, in particular with respect to vegetation, is positive and lasts in time. It opens the way towards applications from monitoring to forecasting land surface states. For that purpose, LDAS-Monde will be forced by other, higher spatial resolution, ECMWF atmospheric products like the high resolution forecast (HRES, current spatial resolution of~9 km), which also gives daily forecasts up to 10 days ahead and/or the ensemble forecast (ENS, current spatial resolution of~18 km), giving daily forecasts up to 15 days (46 days twice a week).