The Joint Assimilation of Remotely Sensed Leaf Area Index and Surface Soil Moisture into a Land Surface Model

: This work tests the hypothesis that jointly assimilating satellite observations of leaf area index and surface soil moisture into a land surface model improves the estimation of land vegetation and water variables. An Ensemble Kalman Filter is used to test this hypothesis across the Contiguous United States from April 2015 to December 2018. The performance of the proposed methodology is assessed for several modeled vegetation and water variables (evapotranspiration, net ecosystem exchange, and soil moisture) in terms of random errors and anomaly correlation coefﬁcients against a set of independent validation datasets (i.e., Global Land Evaporation Amsterdam Model, FLUXCOM, and International Soil Moisture Network). The results show that the assimilation of the leaf area index mostly improves the estimation of evapotranspiration and net ecosystem exchange, whereas the assimilation of surface soil moisture alone improves surface soil moisture content, especially in the western US, in terms of both root mean squared error and anomaly correlation coefﬁcient. The joint assimilation of vegetation and soil moisture information combines the results of individual vegetation and soil moisture assimilations and reduces errors (and increases correlations with the reference datasets) in evapotranspiration, net ecosystem exchange, and surface soil moisture simulated by the land surface model. However, because soil moisture satellite observations only provide information on the water content in the top 5 cm of the soil column, the impact of the proposed data assimilation technique on root zone soil moisture is limited. This work moves one step forward in the direction of improving our estimation and understanding of land surface interactions using a multivariate data assimilation approach, which can be particularly useful in regions of the world where ground observations are sparse or missing altogether.


Introduction
Vegetation and soil moisture both play very important roles in land surface modeling by supporting critical functions in the global and regional water and energy cycles. Vegetation impacts soil characteristics, e.g., its water content, chemistry, and texture, that have feedback on vegetation characteristics, including productivity and structure [1]. Soil moisture represents the land storage for water and energy, effectively controlling the balance between sensible and latent heat flux at the land-atmosphere interface. Soil moisture content impacts atmospheric processes, such as cloud coverage and rainfall, thus affecting hydrological processes such as runoff and plant transpiration [2,3]. 2 of 20 To represent vegetation structural and functional variables, dynamic vegetation models (DVMs) have been designed [4]. In the past, DVMs and land surface models (LSMs) have been combined to improve the estimation of water, carbon, and energy variables [5,6]. As an example, the Noah Multi-Parameterization LSM (hereinafter Noah-MP), which was developed by Niu et al. [7], combines an LSM with a DVM, using a module that allocates carbon to various parts of vegetation and soil carbon pools. However, land surface models are affected by several uncertainties because of different reasons, such as inaccurate initialization, mis-specified forcing and parameters, or inadequate model physics [8].
As an alternative to model simulations, satellite observations also offer the estimations of several land surface variables over the global domain. For vegetation, satellites produce maps of the leaf area index (LAI), defined as the single-sided leaf surface area estimated over the unit ground area, and normalized difference vegetation index (NDVI), estimated by spectral reflectance measurements received in the red and near-infrared regions [9]. For example, the Moderate Resolution Imaging Spectroradiometer (MODIS), which has been receiving data in 36 spectral bands since 2000 [10], and the Advanced Very High-Resolution Radiometer (AVHRR; [11]) produce global maps of vegetation indices. Satellite observations also provide soil moisture information. For instance, the Soil Moisture Active Passive (SMAP; [12]) mission launched in 2015 provides direct sensing of soil moisture in the top 5 cm of the soil column.
Satellite-based observations are affected by gaps in both spatial and temporal coverages (often because of cloud coverage or instrument malfunctioning). To address the gaps in satellite observations and reduce uncertainties in both models and observations, data assimilation (DA) can be adopted to optimally combine the information from observations and model estimates [13]. Land Data Assimilation Systems (LDASs) have been used in the past successfully for merging satellite observations into LSMs [3,13,14]. DA has the potential to improve model estimates of land surface variables, which is useful for different purposes, including the monitoring and forecasting of extreme hydrological events, crop water management, and ecological studies. DA can be particularly effective in data-poor regions, where ground observations are sparse or not available at all.
Soil moisture has been assimilated in land surface models to generate a superior soil moisture product in numerous experiments, which used field measurements and satellite retrievals [3]. For large-scale soil moisture DA, the Ensemble Kalman Filter (EnKF) has become a method of choice [15,16]. Reichle et al. [17] performed a comparison between the Extended Kalman Filter and the EnKF for soil moisture estimation in a twin experiment over the southeastern United States, and both methods provided satisfactory estimates of soil moisture. In the last decade, real satellite soil moisture observations have been used in several experiments for DA within different land surface models [18][19][20][21][22][23].
More recent studies have applied DA techniques to vegetation observations and vegetation dynamic models. For instance, Kumar et al. [24] performed an experiment to assess the impact of satellite-based LAI DA by applying an EnKF technique to the Noah-MP LSM over the Continental U.S. (CONUS). The results showed that LAI DA improves the estimation of key water budget terms (i.e., soil moisture, evapotranspiration (ET), terrestrial water storage, and streamflow) and carbon fluxes (i.e., gross primary production and net ecosystem exchange) when validated with ground-based reference datasets, but the improvement in surface soil moisture was still limited. In another study, Ling et al. [25] assimilated Global Land Surface Satellite (GLASS) LAI data using an Ensemble Adjustment Kalman Filter technique over the global domain. The results demonstrated that the DA reduced the bias in LAI (from 5 m 2 /m 2 to ±1 m 2 /m 2 ). Two past studies proposed synthetic experiments to assimilate LAI alone within Noah-MP using an EnKF [26] and direct insertion (DI) [9]. The results showed an improvement in a subset of water and vegetation variables and a reduction in the impact of large biases in the model input precipitation. However, the improvement in soil moisture was still limited.
Past work that jointly assimilated vegetation and water variables has shown some promise. As an example, Xie et al. [27] assimilated LAI and surface soil moisture observa-Remote Sens. 2022, 14, 437 3 of 20 tions within the CERES-Wheat model for winter wheat yield estimation, using a particle filter algorithm across the Guan Zhong Plain, China. Another study jointly assimilated LAI and soil moisture from Sentinel-1 and Sentinel-2 data into a crop model for winter wheat yield estimation in Heng Shui city, China. Although promising, these studies focused on regions of limited spatial extension [28]. A study by Bonan et al. [8] investigated the joint assimilation of surface soil moisture and LAI within the Land Data Assimilation System LDAS-Monde over the Euro-Mediterranean Region. They adopted a deterministic ensemble square root filter. This joint assimilation was shown to moderately improve ET and highly improve river discharges. LDAS-Monde was developed for performing data assimilation using the ISBA (Interaction Sol-Biosphère-Atmosphère) land surface model, initially developed for France and then extended to the global domain [29]. ISBA interfaces with an atmospheric and a hydrological model but simulates above-ground plant biomass and LAI with a simplistic plant growth model (ISBA-A-gs) [30]. In another recent work, Albergel et al. [31] performed a joint assimilation of satellite-derived LAI and soil water index into the ISBA model to monitor the impact of extreme events (heat waves and droughts), using the Simplified Extended Kalman Filter. This work stems from the studies discussed above and further investigates the potential of jointly assimilating satellite LAI and soil moisture products. Specifically, for the first time, it jointly assimilates GLASS LAI and SMAP soil moisture within the Noah-MP model over the CONUS domain. Noah-MP uses a more sophisticated dynamic vegetation model scheme compared to other similar models used in the literature (e.g., ISBA) [31]. As past work showed that LAI assimilation alone could not improve the estimation of soil moisture (specifically surface soil moisture), the main goal of this work is to improve our estimation and understanding of soil moisture storages in addition to analyzing the impact on vegetation variables.

Materials and Methods
The Noah-MP LSM is used to test the hypothesis that, by combining satellite LAI and soil moisture observations within a land surface model, the estimation of vegetation and water variables can be improved. The experiment was run from 1 April 2015 to 31 December 2018 over the CONUS domain ( Figure 1).

The Noah-MP Land Surface Model
The Noah-MP model is based on a semi-tile sub grid scheme, where energy and carbon fluxes are computed separately over two tiles: the fractional vegetated area and the fractional bare ground area [7,32]. There are multiple options for surface water infiltration, runoff, groundwater transfer and storage, dynamic vegetation, canopy resistance, and frozen soil physics in the Noah-MP model [33]. For the dynamic vegetation model of Noah-MP, the Ball-Berry scheme [33] is implemented, which allocates carbon to vegetation leaves, stems, wood, and roots and to soil carbon fast and slow pools.
The Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA-2 [18]) dataset is used to force the Noah-MP model. Specifically, atmospheric temperature, radiation, water vapor, precipitation, wind speed, topography, altitude forcing variables from the MERRA-2 dataset are used as input to the Noah-MP model in this study ( Table 1).
The general model parameters in Noah-MP 3.6 are the slope type, vegetation effect on soil heat, soil evaporation, soil heat capacity, surface runoff parameterization, depth of lower boundary, and soil temperature. This model also uses saturation soil moisture content, soil conductivity, soil diffusivity, wilting point of soil moisture, rooting depth, minimum stomatal resistance, minimum and maximum background emissivity, minimum and maximum background albedo, minimum and maximum background roughness, optimum transpiration in air temperature, canopy water capacity [7,34].
The NASA Land Information System (LIS [35]) framework is used in this work to run the Noah-MP model and implement the DA scheme. LIS provides a framework for ensemble land surface modeling on grid points across the global land and can be run at high resolutions. Noah-MP was not recalibrated ad hoc for this experiment.
As past work showed that LAI assimilation alone could not improve the estimation of soil moisture (specifically surface soil moisture), the main goal of this work is to improve our estimation and understanding of soil moisture storages in addition to analyzing the impact on vegetation variables.

Materials and Methods
The Noah-MP LSM is used to test the hypothesis that, by combining satellite LAI and soil moisture observations within a land surface model, the estimation of vegetation and water variables can be improved. The experiment was run from 1 April 2015 to 31 December 2018 over the CONUS domain ( Figure 1).

Satellite-Based Observations
In this work, the satellite-based LAI from the GLASS product is merged within the land surface model. The GLASS LAI product is generated using a general regression neural network approach, which allows a spatially and temporally consistent long-term record of vegetation conditions [36]. Two sources, MODIS and AVHRR, provide input to this dataset. MODIS data are available for 2000-2018 and the AVHRR data are available for 1981-2018. The MODIS-based GLASS LAI V50 product is used in this work, which was preferred over the original MODIS LAI product because of the cloud obstruction gaps present in the latter [37].
The SMAP mission provides direct sensing of soil moisture. The SMAP satellite was launched on 2015 by National Aeronautics and Space Administration (NASA). It provides soil moisture data in the top 5 cm of the soil column during 2015-present. The SMAP satellite is in a 6 AM/6 PM ascending and descending orbit, but retrieving soil moisture from the morning overpasses is the primary objective of SMAP [38]. SMAP Level 1 delivers products from its instrument measurements and Level 2 delivers SMAP's Swath based geographical retrievals [38]. The SMAP Level 3 (SPL3SMP) product, which is used in this work, is a composite based on daily passive radiometer estimates of global land SM at the top 5 cm of the soil that is resampled to a global, cylindrical~36 km Equal-Area Scalable Earth Grid [39]. A summary of the main characteristics of GLASS LAI and the SMAP soil moisture product used in this study is shown in Table 1.

Validation Dataset
Different ground and remotely sensed observations are used as independent datasets to validate the proposed system. These observations are obtained from three main sources: the Global Land Evaporation Amsterdam Model (GLEAM), the FLUXCOM datasets, and the International Soil Moisture Network (ISMN). Details regarding the resolution and the time series availability of these datasets are presented in Table 1.
GLEAM is a set of algorithms that separately estimate the different components of land evaporation (or ET): transpiration, bare-soil evaporation, interception loss, openwater evaporation, and sublimation [40,41]. The GLEAM evapotranspiration data are derived from Priestley and Taylor's evaporation definition and based on remote sensing observations of surface net radiation and near-surface temperature. The GLEAM dataset is freely available at monthly and yearly scales and has been continuously revised and updated since its development in 2011, with a new version introduced in 2017 [40]. This version is used in this study for validating the proposed DA system. GLEAM data have shown good agreement with a number of ground-based observations (e.g., FLUXNET, AU-ASM) [42,43] and have been widely used in the literature [8,24,31,42,43]. The most recent version, GLEAM v3.3b, was chosen in this study to validate the proposed DA system.
The FLUXCOM dataset is derived from a combination of FLUXNET site observations, satellite remote sensing, and meteorological data. A machine learning technique is adopted for estimating the FLUXCOM Gross Primary Production (GPP) and the Net Ecosystem Exchange (NEE). Previous research shows that estimates of seasonal NEE from FLUXCOM provide useful information and are widely used as the validation dataset [44][45][46][47][48]. FLUX-COM is chosen here for its wide availability throughout the CONUS domain during the time span of the experiment and its quality [49,50]. In this work, FLUXCOM NEE has been used to validate the model NEE output.
ISMN is an international global in situ soil moisture database. This network is a joint effort of the Global Energy and Water Exchanges Project (GEWEX), the Committee on Earth Observation Satellites (CEOS), the Global Climate Observing System-Terrestrial Observation Panel for Climate (GCOS-TOPC), the Group of Earth Observation (GEO), and the Global Terrestrial Network on Hydrology (GTN-H) [51]. ISMN soil moisture observations are automatically checked for outliers and transformed into volumetric units. ISMN soil moisture observations are automatically checked for outliers and transformed into volumetric unit. These data have been widely used as a reference in the past, as proven by the existing literature e.g., [8,24,31,52,53]. ISMN provides soil water content at different depths; in this work, observations at 0.1 m depth are considered as surface soil moisture (SSM) and the 1 m depth is considered representative of root zone soil moisture (RZSM).

The Data Assimilation System
EnKF, an approximated version of the traditional Kalman filter based on Monte Carlo simulations, was used as the DA technique in this experiment. In the EnKF, the state probability distribution is represented by a sample (or ensemble), propagated forward in time and updated when a new observation becomes available [14,17,54]. The EnKF technique has flexibility in treating errors in model equations and parameters and is particularly suitable for nonlinear problems, such as soil dynamics [20,55]. Twenty ensemble members are generated by perturbing the atmospheric forcing following the same methodology presented in [24,26]. The precipitation and shortwave radiation are perturbed via multiplicative error models, whereas an additive perturbation is used for longwave radiation.
GLASS LAI observations are perturbed via an additive model with a standard deviation of 0.1. Following Kumar et al. [24], leaf mass is also updated with LAI in this experiment. The observation perturbations for the SMAP product follow an additive model, with zero mean and a standard deviation of 0.04 m 3 /m 3 . Additionally, a cumulative distribution function (CDF) matching was applied prior to the SMAP DA as a scaling strategy to remove any existing bias between the satellite observations and model estimation, following the approach developed by Reichle et al., 2004 [56].
The model was spun up for 15 years (2001-1 April 2015) to set the system to an initial equilibrium. A total of four experiments were run: (i) an open-loop (OL) simulation, in which no assimilation is performed; (ii) a DA run that only merges GLASS LAI; (iii) a DA run that only merges SMAP soil moisture; and (iv) a DA run that jointly merges GLASS LAI and SMAP soil moisture.

System Evaluation
To evaluate the proposed methodology, ET, NEE, and soil moisture from the OL run and the three DA simulations are compared to a set of reference data, described in Section 2.4. Specifically, measurements from the FLUXCOM and GLEAM datasets are used to validate modeled ET and NEE, respectively. The ISMN soil moisture dataset is adopted in this work for validating model simulations of SSM and RZSM. To ease the validation procedure, the validation data were converted to the resolution of the model outputs (0.5 • × 0.625 • ) by simply averaging all points falling within the model grid cell. All three DA experiments were evaluated in terms of ET, NEE, SSM, and RZSM.
Two metrics were used to evaluate the proposed methodology: the Normalized and Centered Root Mean Square Error (NCRMSE) and the Anomaly Correlation Coefficient (ACC). Biases were not evaluated in this study for various reasons. First off, a major assumption of the EnKF is the unbiasedness between model states and observations [57]. Secondly, satellite products, model estimates, and ground observations are different in nature (e.g., point vs. pixel estimate, soil depth for soil moisture) and should therefore be considered as indicators of a specific water/vegetation variable, rather than its perfect approximation.
The NCRMSE is adopted as a measure of the random error [9,26,58,59]: wherex represents the model output (DA or OL), X obs is the related observed variable, and N represents the number of days.
To determine if the DA improves or degrades the estimation of a specific variable with respect to the original (OL) model run, the Normalized Information Contribution (NIC; [24,26]) for NCRMSE is computed as follows: where NIC NCRMSE is the NIC for NCRMSE, NCRMSE DA is the NCRMSE of the DA output, and NCRMSE OL is the NCRMSE for the OL output. A NIC value of 1 corresponds to the DA maximum possible improvement over the OL; a value of zero of NIC means that DA and OL have the same performance; and a negative NIC indicates a model degradation through DA [26]. NIC is a unitless metric. The daily ACC is calculated for the OL and DA model outputs with respect to the reference observations. Anomalies are defined here as the differences between daily values and the yearly climatological average. Each of the anomaly time series is computed relative to the mean of its respective model run. The ACC metric, regardless of potential mean biases or differences in dynamic range, captures the correspondence in phase between model estimates and the reference data [60]. ACC is defined as follows: wherex represents the model output (DA or OL), X obs is the corresponding observed variable from ground or satellite observations, and N represents the number of days.
For investigating the improvement in ACC due to DA, the difference between ACC in the DA run and OL is as follows: If diff ACC is positive (negative), the DA output has higher (lower) correlation with the observed data than the OL output. ACC is unitless.

Impact of Data Assimilation
In order to assess how the proposed DA techniques impact the updated variables, Figure 2 shows the percentage changes between DA and OL for the set of variables of interest in this study: LAI, ET, NEE, SSM, and RZSM. First, all three DA approaches have an impact on the model estimation of these variables. Specifically, the GLASS LAI DA and the joint (GLASS LAI and SMAP soil moisture) DA show higher values of LAI in most areas of CONUS when compared to OL, whereas SMAP DA presents lower values of LAI than the original OL, especially over the central part of CONUS, mostly characterized by short-rooted plants (grassland and cropland). This also translates into larger regions where the joint DA decreases the OL-simulated LAI with larger (negative) differences between the joint DA and the OL (darker red areas in the top right panel of Figure 2).
ET and NEE also show larger differences with the original OL run after the application of GLASS LAI DA and the joint DA (when compared to the SMAP DA). This is due to the fact that these two vegetation variables are directly related to LAI in Noah-MP (and only indirectly related to surface soil moisture). Overall, both ET and NEE decrease after the application of DA, except for NEE in the western portion of CONUS in the SMAP DA experiment and for ET in all three DA experiments. This may be correlated to the land cover type and should be further investigated in the future.
For SSM, in the southwestern region of CONUS (often characterized by severe droughts), the LAI DA shows lower SSM values with respect to the original OL simulation, with stronger differences in the mid-western region. SMAP DA presents larger SSM values in the western US, but slightly smaller SSM values in the Midwest. Once again, the joint DA SSM map looks like a combination of the previous two, with large negative differences in the southwest and moderate positive differences in the central and eastern regions of CONUS.
The model OL produces higher RZSM values than the GLASS LAI DA and the joint DA, especially in the central and eastern US, which is consistent with what was shown for SSM. The impact of SMAP DA on RZSM is limited, as also found in previous works [61,62]. Tangdamrongsub et al. [61] showed that SMAP DA improves the estimation of top soil layer up to 7%, but the improvement in the deeper soil layer is not evident. It is important to reiterate that SMAP can only detect water content in the top few centimeters of the soil column; thus, RZSM is not directly assimilated. Nevertheless, the assimilation of LAI and SMAP observations at the surface is transferred to lower layers, as RZSM is controlled by transpiration, root water uptake, and water content of the above soil layer in the Noah-MP model [24,63]. ET and NEE also show larger differences with the original OL run after the application of GLASS LAI DA and the joint DA (when compared to the SMAP DA). This is due to the fact that these two vegetation variables are directly related to LAI in Noah-MP (and only indirectly related to surface soil moisture). Overall, both ET and NEE decrease after the application of DA, except for NEE in the western portion of CONUS in the SMAP DA experiment and for ET in all three DA experiments. This may be correlated to the land cover type and should be further investigated in the future.
For SSM, in the southwestern region of CONUS (often characterized by severe droughts), the LAI DA shows lower SSM values with respect to the original OL simulation, with stronger differences in the mid-western region. SMAP DA presents larger SSM values in the western US, but slightly smaller SSM values in the Midwest. Once again, the joint DA SSM map looks like a combination of the previous two, with large negative differences in the southwest and moderate positive differences in the central and eastern regions of CONUS.
The model OL produces higher RZSM values than the GLASS LAI DA and the joint DA, especially in the central and eastern US, which is consistent with what was shown for SSM. The impact of SMAP DA on RZSM is limited, as also found in previous works [61,62]. Tangdamrongsub et al. [61] showed that SMAP DA improves the estimation of top soil layer up to 7%, but the improvement in the deeper soil layer is not evident. It is important to reiterate that SMAP can only detect water content in the top few centimeters of the soil column; thus, RZSM is not directly assimilated. Nevertheless, the assimilation

Validation of ET
ET is the amount of water that land surface and plants release to the atmosphere [9]. The ET outputs from the DA and OL runs are evaluated against GLEAM ET according to the NIC of NCRMSE (Figure 3). GLASS LAI DA shows an overall improvement except for some areas in the western region of CONUS. SMAP DA shows limited improvement and some degradation in the random error over the southwest part of CONUS. This region often experiences severe droughts, is located nearby the coast, and characterized by complex terrain. Therefore, the soil moisture observed by the SMAP satellite may not be accurate enough to improve the ET model simulation [64]. Such degradation due to the SMAP DA is also translated into the joint DA run, which nonetheless presents the same improvement provided by the GLASS LAI DA in the remainder of the domain.
Next, the performance of the three DA techniques proposed in this work is evaluated based on the difference in ACC (Figure 4). Similar to results obtained from the analysis of the random error, GLASS LAI shows an overall improvement in ACC compared to the original model simulation. The SMAP DA shows a degradation in the southwestern region but slightly higher correlations (compared to the ones obtained by the OL) in the rest of the study area. The joint DA map again reflects the ACC from both GLASS LAI DA and SMAP SM DA.
ET is the amount of water that land surface and plants release to the atmosphere [9]. The ET outputs from the DA and OL runs are evaluated against GLEAM ET according to the NIC of NCRMSE (Figure 3). GLASS LAI DA shows an overall improvement except for some areas in the western region of CONUS. SMAP DA shows limited improvement and some degradation in the random error over the southwest part of CONUS. This region often experiences severe droughts, is located nearby the coast, and characterized by complex terrain. Therefore, the soil moisture observed by the SMAP satellite may not be accurate enough to improve the ET model simulation [64]. Such degradation due to the SMAP DA is also translated into the joint DA run, which nonetheless presents the same improvement provided by the GLASS LAI DA in the remainder of the domain. Next, the performance of the three DA techniques proposed in this work is evaluated based on the difference in ACC (Figure 4). Similar to results obtained from the analysis of the random error, GLASS LAI shows an overall improvement in ACC compared to the original model simulation. The SMAP DA shows a degradation in the southwestern region but slightly higher correlations (compared to the ones obtained by the OL) in the rest

Validation of NEE
The same analyses shown above for ET are presented for NEE with respect to the FLUXCOM reference data. NEE is defined as the total carbon exchange between the atmosphere and plants [9]. In the Noah-MP model, NEE is computed by summing plant respiration and soil respiration, and by then subtracting the Gross Primary Production. The depletion rate of oxygen and nitrates in the soil rate is influenced by soil water content and soil temperature.
In Figure 5, GLASS LAI DA shows an overall improvement as per NIC of NCRMSE except for some parts of the western region, mostly characterized by shrublands ( Figure   Figure 4. Difference of ACC between DA and OL ET with respect to GLEAM ET from 2016 to 2018. Blue (red) color indicates improvement (degradation) after DA.

Validation of NEE
The same analyses shown above for ET are presented for NEE with respect to the FLUXCOM reference data. NEE is defined as the total carbon exchange between the atmosphere and plants [9]. In the Noah-MP model, NEE is computed by summing plant respiration and soil respiration, and by then subtracting the Gross Primary Production. The depletion rate of oxygen and nitrates in the soil rate is influenced by soil water content and soil temperature.
In Figure 5, GLASS LAI DA shows an overall improvement as per NIC of NCRMSE except for some parts of the western region, mostly characterized by shrublands (Figure 1). SMAP SM DA improves the estimation all over the study domain, except for some small regions in the southwest. The joint DA reflects the improvements and degradations from both the GLASS DA and SMAP DA. Both GLASS LAI DA and the joint DA improve the OL estimation of NEE, with larger anomaly correlation values (dark blue pixels) across CONUS ( Figure 6). However, the SMAP DA has minimal impact on the NEE ACC, which is why the performance of the GLASS LAI and joint DAs results very similar. Both GLASS LAI DA and the joint DA improve the OL estimation of NEE, with larger anomaly correlation values (dark blue pixels) across CONUS ( Figure 6). However, the SMAP DA has minimal impact on the NEE ACC, which is why the performance of the GLASS LAI and joint DAs results very similar. The NIC of NCRMSE for NEE shows a degradation in performance over shrublands for both GLASS LAI DA and the joint DA, with limited impact due to the SMAP DA. The NEE ACCs are smaller in shrubland regions compared to forests, woodlands, and croplands. Nonwoody semiarid ecosystems, e.g., shrubland and grassland, strongly respond to the availability of soil water [65]. However, the NEE estimated by the Noah-MP land surface model not only depends on soil water but also on the growth respiration of root, leaf, and wood. As LAI only provides an indicator of the greenness of the pixel, its assimilation in the model may not be enough to impact the estimation of NEE for small plants in semi-arid regions [66]. On the other hand, FLUXCOM uses greenness observation measures along with the measure of light that is emitted by pigments in the plants that are photosynthetically active (chlorophyll fluorescence), and then uses machine learning to merge the two datasets to estimate net ecosystem exchange [49,65]. This is possibly a reason why the observations of NEE over shrublands are higher than the model output and why the DA performance is poor (i.e., NIC of NCRMSE is negative and ACC is very low) across these vegetation covers compared to forested areas and croplands. The NIC of NCRMSE for NEE shows a degradation in performance over shrublands for both GLASS LAI DA and the joint DA, with limited impact due to the SMAP DA. The NEE ACCs are smaller in shrubland regions compared to forests, woodlands, and croplands. Nonwoody semiarid ecosystems, e.g., shrubland and grassland, strongly respond to the availability of soil water [65]. However, the NEE estimated by the Noah-MP land surface model not only depends on soil water but also on the growth respiration of root, leaf, and wood. As LAI only provides an indicator of the greenness of the pixel, its assimilation in the model may not be enough to impact the estimation of NEE for small plants in semi-arid regions [66]. On the other hand, FLUXCOM uses greenness observation measures along with the measure of light that is emitted by pigments in the plants that are photosynthetically active (chlorophyll fluorescence), and then uses machine learning to merge the two datasets to estimate net ecosystem exchange [49,65]. This is possibly a reason why the observations of NEE over shrublands are higher than the model output and why the DA performance is poor (i.e., NIC of NCRMSE is negative and ACC is very low) across these vegetation covers compared to forested areas and croplands.

Validation of Soil Moisture
Both surface and root zone soil moisture model estimates are validated in this experiment against an independent dataset (i.e., ISMN). In Noah-MP, SSM provides water content estimation at the top 10 cm of the soil layer, whereas RZSM provides estimation in the top 100 cm of the soil layer. ISMN provides data at different depths from 2 cm to 100 cm. The data from 10 cm and 100 cm are compared with SSM and RZSM of model outputs, respectively.
First off, the NIC of NCRMSE for SSM is computed at each ISMN station (Figure 7). The number of stations that present an improvement in NCRMSE due to DA is maximum for the joint assimilation case (39). The NIC of NCRMSE does not show any clear spatial dependence, although the SMAP DA seems to be particularly effective in the western US, which also translates into the joint DA results. Although GLASS LAI DA degrades in a couple of stations in the northwestern US, SMAP DA can improve the performance at those locations, which also shows an improvement after applying the joint DA.

Validation of Soil Moisture
Both surface and root zone soil moisture model estimates are validated in this experiment against an independent dataset (i.e., ISMN). In Noah-MP, SSM provides water content estimation at the top 10 cm of the soil layer, whereas RZSM provides estimation in the top 100 cm of the soil layer. ISMN provides data at different depths from 2 cm to 100cm. The data from 10 cm and 100 cm are compared with SSM and RZSM of model outputs, respectively.
First off, the NIC of NCRMSE for SSM is computed at each ISMN station (Figure 7). The number of stations that present an improvement in NCRMSE due to DA is maximum for the joint assimilation case (39). The NIC of NCRMSE does not show any clear spatial dependence, although the SMAP DA seems to be particularly effective in the western US, which also translates into the joint DA results. Although GLASS LAI DA degrades in a couple of stations in the northwestern US, SMAP DA can improve the performance at those locations, which also shows an improvement after applying the joint DA.  In terms of ACCs, the number of stations improved by the DA technique is higher than the number of stations showing a degradation (Figure 8). GLASS LAI DA increases the SSM ACC at more stations (37) than the one obtained though the joint DA (29). Although some stations only present a slight improvement thanks to the SMAP DA (such as the four located in New Mexico) or even a degradation due to the LAI DA (the northern station in New Mexico), in the joint assimilation run, their ACCs are better than the original OL ones (i.e., blue dots in the bottom panel). In terms of ACCs, the number of stations improved by the DA technique is higher than the number of stations showing a degradation (Figure 8). GLASS LAI DA increases the SSM ACC at more stations (37) than the one obtained though the joint DA (29). Although some stations only present a slight improvement thanks to the SMAP DA (such as the four located in New Mexico) or even a degradation due to the LAI DA (the northern station in New Mexico), in the joint assimilation run, their ACCs are better than the original OL ones (i.e., blue dots in the bottom panel). The same analyses are then performed for RZSM. Figure 9 shows the NCRMSE NIC between OL and DA, with respect to the ISMN stations. Among the 67 stations, SMAP DA shows an improvement at 41 locations, similarly to the joint assimilation (which improves the OL NCRMSE of RZSM at 38 stations). As for SSM, SMAP DA improves most stations in the western US, where GLASS LAI DA cannot fully improve the OL RZSM. Once again, the joint assimilation shows a combination of these results, by improving the model performance in almost all the western locations. This is particularly interesting, The same analyses are then performed for RZSM. Figure 9 shows the NCRMSE NIC between OL and DA, with respect to the ISMN stations. Among the 67 stations, SMAP DA shows an improvement at 41 locations, similarly to the joint assimilation (which improves the OL NCRMSE of RZSM at 38 stations). As for SSM, SMAP DA improves most stations in the western US, where GLASS LAI DA cannot fully improve the OL RZSM. Once again, the joint assimilation shows a combination of these results, by improving the model performance in almost all the western locations. This is particularly interesting, since, as highlighted above, RZSM is not directly assimilated within Noah-MP. Nevertheless, the model physics transfer the SMAP surface soil moisture information to lower layers, which impacts the estimation of root zone soil moisture, by improving the RZSM NCRMSE at more than 60% of the total stations. since, as highlighted above, RZSM is not directly assimilated within Noah-MP. Nevertheless, the model physics transfer the SMAP surface soil moisture information to lower layers, which impacts the estimation of root zone soil moisture, by improving the RZSM NCRMSE at more than 60% of the total stations. Figure 9. Same as Figure 7, but for RZSM. Figure 10 shows ACC differences between DA and OL for RZSM, where correlations are once again computed against the ISMN dataset. Here, the number of stations showing an improvement is the highest after the application of SMAP DA (30), but this number decreases with the application of the LAI and joint DA (25). Moreover, the number of stations at which the ACC improves thanks to DA is smaller than the number of stations where ACC degrades, and this result is consistent for all the DA approaches proposed here.  Figure 10 shows ACC differences between DA and OL for RZSM, where correlations are once again computed against the ISMN dataset. Here, the number of stations showing an improvement is the highest after the application of SMAP DA (30), but this number decreases with the application of the LAI and joint DA (25). Moreover, the number of stations at which the ACC improves thanks to DA is smaller than the number of stations where ACC degrades, and this result is consistent for all the DA approaches proposed here.

Discussion
The results from this work showed that the joint assimilation of GLASS LAI and SMAP SSM has the potential to improve the estimation of vegetation variables (e.g., ET and NEE) and water variables (SSM and RZSM) more efficiently than univariate DA approaches. All three DA techniques have an impact on the model estimation of LAI, ET, NEE, SSM, and RZSM. For all these variables, the joint DA clearly produces a combination of the results obtained by the two univariate DAs.
Although the impact of DA on several land surface variables is often evident, it may be minimal in some cases, as proven by previous studies. For instance, Kumar et al. [24] showed changes in LAI within a range of −0.9 to 0.9 and changes of −0.04 to 0.04 mm 3 /mm 3 in RZSM after the application of LAI assimilation. Another study by Maggioni et al. 2012 [67] showed changes in SSM ACC of the order of 9% due to SSM DA, which are comparable to the improvements discussed in Section 3.4. The impact of SMAP DA on RZSM is even more limited, corroborating findings from several previous studies [61,62]. Tangdamrongsub et al. [61] showed that SMAP DA improves correlations with soil moisture ground observations up to 7% in the top soil layer, but no significant improvements were observed in the 0-30 cm layer.

Discussion
The results from this work showed that the joint assimilation of GLASS LAI and SMAP SSM has the potential to improve the estimation of vegetation variables (e.g., ET and NEE) and water variables (SSM and RZSM) more efficiently than univariate DA approaches. All three DA techniques have an impact on the model estimation of LAI, ET, NEE, SSM, and RZSM. For all these variables, the joint DA clearly produces a combination of the results obtained by the two univariate DAs.
Although the impact of DA on several land surface variables is often evident, it may be minimal in some cases, as proven by previous studies. For instance, Kumar et al. [24] showed changes in LAI within a range of −0.9 to 0.9 and changes of −0.04 to 0.04 mm 3 /mm 3 in RZSM after the application of LAI assimilation. Another study by Maggioni et al. 2012 [67] showed changes in SSM ACC of the order of 9% due to SSM DA, which are comparable to the improvements discussed in Section 3.4. The impact of SMAP DA on RZSM is even more limited, corroborating findings from several previous studies [61,62]. Tangdamrongsub et al. [61] showed that SMAP DA improves correlations with soil mois-ture ground observations up to 7% in the top soil layer, but no significant improvements were observed in the 0-30 cm layer.
The results from this experiment align with previous findings showing an improvement in the random error of several land surface variables thanks to the application of LAI DA, SSM DA, and joint DA. Specifically, GLASS LAI showed an overall improvement of ET compared to the OL simulation, whereas SMAP DA showed a degradation in the southwestern region, where SMAP is known to have detection limitations, as demonstrated by Shellito et al. [64]. Such a degradation translated into the joint DA ET estimation, which, however, presented an improvement in the remainder of CONUS (thanks to the LAI DA).
LAI DA, SMAP DA, and the joint DA were also able to reduce the NEE random error over most of CONUS except for some western regions, mainly characterized by shrublands and grasslands. Furthermore, both LAI DA and the joint DA were able to improve NEE correlations (computed with respect to the FLUXCOM NEE reference) over the OL run across all vegetation types, but SMAP DA presented small impact on NEE anomaly correlations. This was also evident in past studies, such as that of Kumar et al. [24], who showed an improvement in ET and NEE over the central plains thanks to LAI DA.
For soil moisture, the LAI assimilation was shown to be particularly efficient over croplands, as also demonstrated by Kumar et al. [24]. SMAP DA was proven to reduce the random error in SSM and RZSM, which corroborates recent findings over Oklahoma by Rouf et al. [64]. The reduction in SSM random errors was found to be lower in forested areas, compared to locations characterized by short-rooted plants, which aligns with past studies. As an example, Bonan et al. [8] showed that, by using LDAS-Monde: i) the impact of soil moisture data assimilation was more consistent in areas with sparse vegetation and ii) negative correlations were observed between model and observed SSM over arid regions. Additionally, Albergel et al. [31] showed that the joint assimilation of LAI and soil moisture with LDAS-Monde had a small impact on SSM (1-4 cm depth) under severe climatic condition (heat waves and droughts). In terms of correlation coefficients, the proposed DA approaches were able to improve SSM at the majority of the validation sites, but only at a few stations for RZSM. This may be due to the inability of the satellite observations to improve the dynamic range of root zone soil moisture, which is much slower than the SSM one. Analyses based on land cover and soil type may help to shed some light on this topic and temporal aggregations may help to improve such anomaly correlation coefficients.

Conclusions
A major objective of the joint DA approach proposed in this paper is to improve water variables, in addition to vegetation variables. The majority of the ISMN stations (used here as the reference dataset) presented an improvement in the SSM error metrics due to DA, especially when the joint assimilation is adopted. Both in terms of SSM random errors and anomaly correlation coefficients, the number of stations that improved their performance thanks to the DA technique was larger than the number of stations showing a degradation. The joint assimilation was proven to combine the performance of each univariate assimilation and to improve the model performance in almost all the western locations.
This experiment sheds some light on the potential of merging a set of satellite-based observations within a land surface model to improve our characterization of carbon, energy, and water states and fluxes. Additional DA techniques could be investigated in future studies to combine such observations. For instance, Bonan et al. [8] showed that an Ensemble Squared Root Filter performed more efficiently than the Simplified Extended Kalman Filter when jointly assimilating LAI and soil moisture. This DA technique could be used for assimilating GLASS LAI and SMAP SSM and the results could be compared to the findings of this paper.
The SMAP satellite mission provides soil moisture products at two different resolutions (36 km and 9 km). Here, only the 36 km product was used because of computational constraints, but future work could verify the efficiency of assimilating higher resolution data. Future research should also investigate the impact of the proposed DA techniques on other vegetation and water variables across different regions of the world. Data Availability Statement: The original analysis done by the authors is reported in this article. The dataset used for data assimilation and validation can be found in the following links, which are open for the public. GLASS LAI data: http://glass.umd.edu/ (accessed on 9 December 2021); SMAP data: https://earthdata.nasa.gov/ (accessed on 9 December 2021); GLEAM data: https://www.gleam.eu/ (accessed on 9 December 2021); FLUXCOM data: http://fluxcom.org/ (accessed on 9 December 2021); ISMN data: https://ismn.geo.tuwien.ac.at/en/ (accessed on 9 December 2021). Further inquiries can be directed to the corresponding authors.

Conflicts of Interest:
The authors declare no conflict of interest.