Evaluation of Drydown Processes in Global Land Surface and Hydrological Models Using Flux Tower Evapotranspiration

: A key aspect of the land surface response to the atmosphere is how quickly it dries after a rainfall event. It is key because it will determine the intensity and speed of the propagation of drought and also affects the atmospheric state through changes in the surface heat exchanges. Here, we test the theory that this response can be studied as an inherent property of the land surface that is unchanging over time unless the above- and below-ground structures change. This is important as a drydown metric can be used to evaluate a landscape and its response to atmospheric drivers in models used in coupled land–atmosphere mode when the forcing is often not commensurate with the actual atmosphere. We explore whether the speed of drying of a land unit can be quantiﬁed and how this can be used to evaluate models. We use the most direct observation of drying: the rate of change of evapotranspiration after a rainfall event using eddy-covariance observations, or commonly referred to as ﬂux tower data. We analyse the data and ﬁnd that the drydown timescale is characteristic of different land cover types, then we use that to evaluate a suite of global hydrological and land surface models. We show that, at the site level, the data suggest that evapotranspiration decay timescales are longer for trees than for grasslands. The studied model’s accuracy to capture the site drydown timescales depends on the speciﬁc model, the site, and the vegetation cover representation. A more robust metric is obtained by grouping the modeled data by vegetation type and, using this, we ﬁnd that land surface models capture the characteristic timescale difference between trees and grasslands, found using ﬂux data, better than large-scale hydrological models. We thus conclude that the drydown metric has value in understanding land–atmosphere interactions and model evaluation.


Introduction
Soil moisture state and dynamics are key for climate and water resources assessments, particularly in water-limited periods and regions [1]. Soil moisture availability regulates numerous physical processes of the earth system; namely, the partitioning of energy fluxes at the land surface, agricultural growth and ecosystem dynamics, streamflow generation, subsurface drainage, and groundwater recharge [2][3][4][5][6][7]. The memory of soil moisture, understood as the persistence of soil moisture anomalies, can be of the order of weeks to months [8] and, therefore, longer than the memory of atmospheric anomalies (hours to days). This difference in residence times gives soil moisture a buffering or intensifying impact on climate extremes at the surface such as droughts, floods, or heat waves [9], as well as a role in the development of atmospheric processes at shorter timescales [10]. Thus, the soil moisture response to meteorological droughts (periods of no precipitation), and how fast the lack of precipitation leads to hydrological/agricultural droughts (plant stress, groundwater reduction, low 2 of 21 discharge), are crucial factors that we can use to assess model capabilities in a changing climate where droughts are likely to increase in severity and frequency [11].
In this work, we set out to evaluate drydowns, understood as soil drying rates during periods of no precipitation, in global hydrological models (GHMs) and land surface models (LSMs) against observations of evapotranspiration (ET). GHMs and LSMs can work in stand-alone mode, driven by meteorological data, or coupled to the atmosphere in general circulation and earth system models. In these models, soil moisture controls and is affected by the land-atmosphere fluxes of water and, if resolved, energy and carbon. It is strongly model-specific [12] because of the different formulations used to resolve surface processes dependent on soil moisture state, like evaporation or runoff, and to different model parameters. The JULES (Joint U.K. Land Environment Simulator) LSM, for example, does not explicitly consider the residual soil water content (amount of water that cannot be drained from the soil, because it is retained in disassociated pores and immobile films) as a component of soil moisture [13], which may lead to differences in soil water content values when comparing with models of a different nature or field observations as a result of spatial variations of residual water content [14].
Global products of observed surface soil moisture estimation from satellite missions are increasingly becoming available, for example, the works of [15][16][17] and independent studies have shown good correlations with in situ soil moisture data [18,19]. Recent studies use satellite products to characterise observed surface soil moisture drydowns [20], compare with in situ measurements [21], and evaluate model performance [22]. However, the model-specific nature of soil moisture and the depth limitation to these satellite estimates (typically accounting for the top 5-10 cm of soil) limit their usefulness in model evaluation, especially in the presence of vegetation that accesses deeper soil water.
Other studies use in situ observations, for example, the works of [23,24], to investigate soil moisture memory; however, the sparsity and the local nature of point-scale soil moisture observations call for caution on their potential use to evaluate global models' soil moisture dynamics [25].
After a given precipitation event over a vegetated area, a portion of the water input on the land surface is intercepted by vegetation, another portion runs off to surface waters, and the rest infiltrates the soil providing the source water for evapotranspiration (ET). The rate of drying of the soil during following hours and days until the soil moisture reaches the wilting point for the given vegetation in the area has three different regimes [26,27]. These regimes include the following: (1) the drainage stage dominated by gravity while the soil moisture remains above field capacity; (2) the ET first stage when water is held by gravity (no drainage below field capacity) and drying occurs at the rate demanded by the atmospheric potential ET (PET) while soil moisture remains above the critical point (threshold for limited soil water availability for roots); and (3) the ET second stage when the drying occurs at a rate limited by the soil moisture availability for vegetation in the root zone (between the critical point and the wilting point). 'Drydown' will refer hereafter to this latter stage of soil drying under water limitation.
These different timescales of drying are captured using new methods of model evaluation that overcome the limitations of observed soil moisture. For example, a study has focused on the energy response of the surface at different stages through the drying process, rather than the actual soil moisture evolution [28]. These authors evaluate global climate models, analyzing the energy balance response to drought through warming rates of the land surface, using a methodology based on satellite land surface temperature products [29,30].
Another method that takes advantage of the timescales of the drying, but focusing on vegetated areas and water-limited periods, is described by Teuling, et al. [7] and Blyth, et al. [31]. These authors showed that the surface response to soil moisture depletion can be evaluated from time series of ET alone, allowing for a direct drydown assessment using flux tower data. However, only very few sites and single drying curves were used in these studies.
The following challenge remains: Can we use flux tower data to expand this latter idea in order to find a generic drydown metric that would capture the role of vegetation? And how well do models of surface flux exchange, like LSMs and GHMs, represent the drydown process when we apply this metric to evaluate them against flux tower data? Such a parameter will help understand and tackle the problem of model variability and shortcomings in drought propagation affecting surface exchange fluxes with the atmosphere.
In Section 2, we first describe the flux data and models used in this study, then we define a metric of drydowns, and finally we apply this to time series from 35 flux tower observations during periods of no precipitation and to a suite of 10 global model assessments (LSMs and GHMs) from the eartH2Observe project (http://www.earth2observe.eu/). In Section 3, we analyse and compare the model and observations results, generalizing them through the responses under different vegetation types. We then discuss the relationship between the drydown rates and vegetation types and other outcomes of the experiment, in order to come to some conclusions about how to evaluate the models.

Flux Tower Data
We use in situ observations of latent heat flux and meteorological variables (required to compute PET and identify dry periods, see Section 2.3) from eddy-covariance flux towers. The data were obtained through the Protocol for the Analysis of Land Surface Models (PALS; [32]) and come originally from the FLUXNET LaThuile free fair-use subset (fluxdata.org). Both flux and meteorological data have half-hourly temporal resolution, and we averaged them to a daily time step for this study in order to compare with the model daily data. Days where less than two-thirds of the half-hourly time steps were available for latent flux were neglected. Gap filling and quality control of the meteorological variables were previously conducted by PALS [33]. Figure 1 shows the spatial distribution and vegetation types of the sites used for this study, while Table 1 lists the 35 sites in total with some key information and periods available. The sites chosen each have at least one dry event (as defined in Section 2.3) in the time series and represent a global spread of climate and vegetation types. The following challenge remains: Can we use flux tower data to expand this latter idea in order to find a generic drydown metric that would capture the role of vegetation? And how well do models of surface flux exchange, like LSMs and GHMs, represent the drydown process when we apply this metric to evaluate them against flux tower data? Such a parameter will help understand and tackle the problem of model variability and shortcomings in drought propagation affecting surface exchange fluxes with the atmosphere.
In Section 2, we first describe the flux data and models used in this study, then we define a metric of drydowns, and finally we apply this to time series from 35 flux tower observations during periods of no precipitation and to a suite of 10 global model assessments (LSMs and GHMs) from the eartH2Observe project (http://www.earth2observe.eu/). In Section 3, we analyse and compare the model and observations results, generalizing them through the responses under different vegetation types. We then discuss the relationship between the drydown rates and vegetation types and other outcomes of the experiment, in order to come to some conclusions about how to evaluate the models.

Flux Tower Data
We use in situ observations of latent heat flux and meteorological variables (required to compute PET and identify dry periods, see Section 2.3) from eddy-covariance flux towers. The data were obtained through the Protocol for the Analysis of Land Surface Models (PALS; [32]) and come originally from the FLUXNET LaThuile free fair-use subset (fluxdata.org). Both flux and meteorological data have half-hourly temporal resolution, and we averaged them to a daily time step for this study in order to compare with the model daily data. Days where less than two-thirds of the half-hourly time steps were available for latent flux were neglected. Gap filling and quality control of the meteorological variables were previously conducted by PALS [33]. Figure 1 shows the spatial distribution and vegetation types of the sites used for this study, while Table 1 Table 1 for a more complete description.  Table 1 for a more complete description.

Model Data
The model simulations that we evaluate here constitute the eartH2Observe project Water Resources Reanalysis 1 (WRR1) dataset, presented in Schellekens, et al. [34]. WRR1 consists of an ensemble of 10 GHM and LSM water cycle assessments, covering the period 1979-2012, that represents the state of the art in current global hydrological and land surface modeling. A list of the global models and some key information on their nature is given in Table 2. GHMs typically operate at a one-day time step, whereas LSMs work at a shorter time step to allow for a representation and solution of the surface energy balance and the diurnal cycle. All 10 model simulations are driven by the same dataset; namely WATCH (WATer and global CHange) Forcing Data methodology applied to ERA-Interim reanalysis (WFDEI [35]), which provides three-hourly time intervals of the meteorological data that GHMs and LSMs typically use to run when they are uncoupled from atmospheric models (surface pressure, incoming radiation, wind speed, near surface humidity and temperature, and precipitation). The spatial resolution of the meteorological forcing data WFDEI and the WRR1 model outputs is 0.5 • . Table 2. Descriptive information about the 10 Water Resources Reanalysis 1 (WRR1) models. Evapotranspiration scheme as reported in Schellekens, et al. [34], where more detailed information on the WRR1 modelling protocol and different model process schemes is given. The groundwater interactions column refers to whether the groundwater is a dynamic reservoir and can interact with the unsaturated soil above through two-way fluxes (drainage and capillary rise). LSM-land surface model; GHM-global hydrological model; PET-potential evapotranspiration. Benchmarking and signal-to-noise evaluations performed on WRR1 on the different components of the water cycle at the monthly timescale have reported large model uncertainty [34], and a summary of evaluation results on latent heat against flux tower observations is provided in Table S1. Here, we focus on a more process-based evaluation of models' drydowns using daily ET (this is the highest temporal resolution available in WRR1 outputs). No calibration to better represent local conditions was performed by the models for this study, as we aim to evaluate the drydown model skill on large-scale applications such as WRR1.

Defining the Metric
We study drydowns during periods of no precipitation over vegetated areas. Therefore, to do so, we identify our drydowns by looking at the precipitation time series from the flux tower sites and from the model grids. We select periods of 10 or more days of no precipitation in order to have a sufficiently long time series for each drydown.
At the ET second stage, the ET ratio is typically assumed to have a linear dependency with the available soil moisture in the root zone [13,20], that is, as where S(t) is the soil moisture storage in the root zone, S w is the soil moisture at wilting point in the root zone, S cap = S * − S w is the soil moisture capacity or maximum availability for transpiration in the root zone, and S * is the critical soil moisture in the root zone. All soil moisture variables in Equation (1) have units of m 3 m −3 . During this ET second stage of a drydown, there is no precipitation, runoff and drainage are negligible, and thus ET is the only process depleting the root zone soil moisture. The root zone soil water balance can be written as Combining Equations (1) and (2) above, it follows that the ET ratio decays exponentially during the water-limited stage of a period of no precipitation, as 6 of 21 where ET 0 /PET 0 is the ET ratio at t = 0, and τ = S cap /PET. We define τ (days) as the lifetime of a drydown, a physical parameter with units of time that characterises the drydown as a function of the root system of the site/area and the event evaporative demand from the atmosphere. The τ parameter is independent of the initial ET ratio, and thus independent of the initial state of soil moisture when the precipitation stops. A similar exponential curve has been derived and used previously in other studies using observed and simulated time series of ET, for example, the works of [7,31]. However, we have adopted the use of an ET ratio, helping the identification of water-limited drydowns as those events with an ET ratio below one. The method only works when the soil reaches water-limited conditions. To enable this, we introduce a threshold and define a 'dry event' as an event with 0 < τ < 50 days. This threshold indicates that events that take 34 days or more to decrease the ET ratio to a half of its initial value are excluded from the analysis. Such excluded events present very slow drying or no drying at all, which can be the result of no shortage of root zone soil moisture in regions of shallow groundwater table and slow drainage [54], artificial water sourced from irrigation [55], or low radiation supply in winter. We also consider the first day of no precipitation as the starting point of the drydown event. This decision avoids including interception (a process that does not evaporate soil water, but canopy intercepted water) in the water-limited ET, thus effectively assuming that any water intercepted by vegetation is evaporated within 24 h [26]. These constraints are designed to ensure that all the drydowns analysed use data from the ET second stage, and any conclusion drawn from our analysis will be valid for water-limited conditions.
The calculation of PET was performed following the Penman-Monteith approach [56], which includes the effect of stomatal resistance to calculate PET from atmospheric conditions of temperature, wind, incoming radiation, and humidity (as discussed in Robinson, et al. [57]). We specifically used the same parameters as in Equation (4) of the cited paper [57], for a reference crop of 0.12 m height and stomatal resistance of 70 s/m and, therefore, note that it will be biased in winter, as well as for tree sites.
We calculate τ for each drydown using the non-linear least squares interpolation to fit the data to an exponential function in time of the form y = a × exp(−b × t), where y is the time series of daily ET/PET, t is time (days of the dry period), and τ is diagnosed as τ = 1/b.
We are looking to evaluate the response of a landscape to dry periods. Given that we are only considering water-limited conditions, we assign a single drydown exponential curve to all drydown events at a particular site/grid cell. We use the median τ to characterise the landscape.
The assumption that a single drydown metric for a particular site/grid cell is valid is tested by calculating the interquartile range.

Applying the Metric to the Flux Tower Data
We used the in situ measurements of meteorological variables to calculate PET for the observation sites. As a showcase of the methodology, Figure 2 presents four exemplary drydowns identified and analysed from the sites that presented a highest number of dry events per year in their continental area (Audubon in United States, Majadas in Spain, Kruger in South Africa, and Howard in Australia). As a measurement of the tightness of the linear relationship between ET ratio and root zone soil moisture during drydowns, we use the standard deviation error (σ) of the fitting. For the events in Figure 2, σ metrics are given in the caption.

Applying the Metric to the Models
For the models, we calculated a common PET using the WFDEI meteorological dataset that drives all WRR1. We then calculated the drydown metric over the full dataset. Figure 3 shows the median τ values calculated at every grid cell for the 34 year runs of the WRR1 models. This is discussed in Section 4.3.

Applying the Metric to the Models
For the models, we calculated a common PET using the WFDEI meteorological dataset that drives all WRR1. We then calculated the drydown metric over the full dataset. Figure 3 shows the median τ values calculated at every grid cell for the 34 year runs of the WRR1 models. This is discussed in Section 4.3.

Site Level Results and Selection of Representative Sites
The observed median lifetime τ characterizing drydowns for the observation sites, as well as the fraction of events of 10 or more days of no precipitation that fall under our definition of dry event, are shown in Table 3.

Site Level Results and Selection of Representative Sites
The observed median lifetime τ characterizing drydowns for the observation sites, as well as the fraction of events of 10 or more days of no precipitation that fall under our definition of dry event, are shown in Table 3. Table 3. Results of the drydown median lifetimes (τ) and fraction of dry events at the FLUXNET observation sites that present at least one dry event in the available dataset. N dry is the number of dry events identified and N total is the total number of events with 10 or more days of no precipitation. The sites are sorted in order of increasing fraction of events identified as dry events (N dry /N total ≥ 0.5). IQR is the interquartile range of drydown lifetimes distribution for dry events identified in the observations. "-" means that the IQR is non-applicable for sites with only 1 dry event. In addition, Figure 4 shows the identification of the sites as dry (50% or more of the events at the site are dry events) or wet (25% or less of the events at the site are dry). We used the flux tower data for observations (first column) and the modeled data for the 10 WRR1 models at the grid cells containing the sites locations (rest of the columns). A study of these results shows that the sites fall into one of three categories:

1.
Too wet for analysis.

2.
Observations are too wet, but models are dry enough.

3.
Both observations and models are dry enough. Eighteen sites fall into categories 1 or 2. They are found to be wet by the observations (Ndry/Ntotal ≤ 0.25; top sites in Figure 4). However, we can retrieve very useful information in terms of model representation, as the models do not agree in finding some of the locations to be wet (seven sites are in category 2). This is discussed in Section 4.4.
Category 3 includes the sites that can be used for subsequent analysis. On sites with a higher number of dry events and that are found to be dry by observations and by most models (e.g., Tonzi, Figure 4. Results of the dry/wet site identification at the 35 FLUXNET sites by observations and models. Colored squares represent the fraction of events of 10 or more days of no precipitation that are considered as dry events; blue color indicates N dry /N total ≤ 0.25 (wet site), yellow color indicates N dry /N total ≥ 0.5 (dry site), and white color indicates 0.25 < N dry /N total < 0.5.
Eighteen sites fall into categories 1 or 2. They are found to be wet by the observations (N dry /N total ≤ 0.25; top sites in Figure 4). However, we can retrieve very useful information in terms of model representation, as the models do not agree in finding some of the locations to be wet (seven sites are in category 2). This is discussed in Section 4.4.
Category 3 includes the sites that can be used for subsequent analysis. On sites with a higher number of dry events and that are found to be dry by observations and by most models (e.g., Tonzi, Vaira, Majadas, Mopane, Kruger, Audubon), a comparison of observed and modeled drydown characteristics is more comprehensive and significant, providing information about drought response by the land in both observations and models.
The σ metric for all dry events identified at the observation sites by the flux data and by the models is consistently low (see Figure S1) and has no apparent correlation with the ET/PET decrease during the drydown. This weak relationship indicates that the severity of the drydown and/or its duration (factors affecting the decrease in ET ratio) do not have a strong effect on the tightness of the exponential fitting, further supporting the exponential curve assumption for ET/PET decay during drydowns under water-limited conditions. Table 4 and Figure 5 show the computed median τ for observations and all 10 WRR1 models on sites where higher number of dry events are observed (using the threshold of seven events here, see Table 3), ordered from highest to lowest observed τ. Table 4. Results of the drydown median lifetimes (τ [d]) at the 10 FLUXNET sites with higher number of dry events identified and at the grid cells containing the site locations in the WRR1 models. The entries are sorted in order of decreasing τ in the observations. The values given here and their corresponding interquartile range are plotted in Figure 5. The number of dry events in the observations is shown in Table 3, and the number of dry events in the models (not shown here) is greater than that in the observations given the larger data record (34 years In addition to the median τ and its interquartile range for observed and modeled dry events (bar chart inside each plot), Figure 5 represents the hypothetical drydown characterizing each site (orange dash lines for observations and solid color lines for models), following an exponential curve from an ET ratio of 1.0 at the initial day of the dry spell, with a lifetime parameter equal to the median τ. This representation of hypothetical drydowns gives an illustrative idea of the rate of drying represented by observations and models at each site. For these 10 sites in Figure 5, we see a strong model variability. Some features are common to most sites: JULES (yellow) is the slowest (highest τ) of the LSMs, SURFEX-TRIP (red) is the quickest (lowest τ) of the LSMs, and WaterGAP3 (brown) is the quickest model. Across sites, there is greater consistency between the LSM responses than is found between the GHM responses.  Hypothetical drydowns using the median τ from identified dry events by the observations (dashed orange line) and WRR1 models (continuous lines). Ten locations with a higher number of drydowns observed (see Table 3): (a) Rocca 1, (b) El Saler, (c) Tonzi, (d) Mopane, (e) Majadas, (f) Figure 5. Hypothetical drydowns using the median τ from identified dry events by the observations (dashed orange line) and WRR1 models (continuous lines). Ten locations with a higher number of drydowns observed (see Table 3 The mismatch between modeled and observed drydowns in Figure 5 can be explained, in part, by the locality of the flux tower measurements. As observations have a footprint typically below 10 km 2 [58], and given the integrative nature of global models that work at a spatial scale resolution of thousands of km 2 (like WRR1 models; spatial resolution of 0.5 • ), the vegetation 'type' of the flux tower might not match that of the modeled grid cell [34].

Analysis of Results at the Site Level
The mismatch between the modeled vegetation and observations can easily be seen in Figure 5a-d. The models clearly represent drydown rates much better at Tonzi, with six models having a value of median τ within 17% of the observations. However, at the other three sites, most models consistently overestimate the drydown rates (lower lifetimes), meaning that the soil dries quicker in the models than in the observations. In all of these cases, the flux towers and models disagree in the land cover (Table 5); with flux towers saying they are tree based and models saying they are grasses (El Saler and Mopane), or flux towers saying broadleaf trees while the models say shrubs (Rocca 1). The opposite situation occurs at the Vaira site (Figure 5h), where all models underestimate the drydown rates (higher lifetimes), explained in this case by a vegetation cover mismatch in the other direction; grassland at the site and broadleaf trees as dominantly seen by the models (Table 5). At the sites with grass as vegetation cover for both the observation site and the models (Fort Peck and Audubon; Figure 5i,j), the model drydown underestimation is not as widespread and some models (HTESSEL-CaMa, ORCHIDEE, SURFEX-TRIP, WaterGAP3) capture low drydown rates (quick drying).

Generalizing the Results for Model Evaluation
It is clear that vegetation type is an important factor in the drydown metric, as the order of the sites in Table 4 from high to low drydown rates corresponds with decreasing vegetation stature (Table 5). To create a generic metric, we thus aggregate the drydown events according to the vegetation cover at their site or dominant in their model grid cell (Figure S2), enabling a more like-for-like comparison. This allows us to use the whole global dataset in the case of the models, and reach a comprehensive characterization of modeled drydown rates.
In Figure 6, we plot the resulting drydown metric for the four vegetation types in Figure S2 across all climates for all the models. Alongside is the metric from the observations. The relation is very apparent and almost linear in the case of observations, from slowest drying over broadleaf trees (τ~30 d), then needleleaf trees (τ~25 d), and shrubs (τ~22 d), to quick drying over grasses (τ~20 d).
Most models show a similar decrease in median τ from broadleaf trees to grasses, with the exception of the rates for shrubs, where the models (all but SWBM) present quicker rates than those for grasses. This might be explained by the models assuming that shrublands are sparse (low leaf area indexes, low canopy heights, and shallow roots), whereas the flux sites that are categorised as 'shrub' tend to represent zones more intensely vegetated (i.e., the Tonzi site reports an overstory of blue oak trees covering 40% of the total vegetation, with intermittent grey pine trees and understory species including a variety of grasses and herbs [59]). across all climates for all the models. Alongside is the metric from the observations. The relation is very apparent and almost linear in the case of observations, from slowest drying over broadleaf trees (τ~30 d), then needleleaf trees (τ~25 d), and shrubs (τ~22 d), to quick drying over grasses (τ~20 d). Most models show a similar decrease in median τ from broadleaf trees to grasses, with the exception of the rates for shrubs, where the models (all but SWBM) present quicker rates than those for grasses. This might be explained by the models assuming that shrublands are sparse (low leaf area indexes, low canopy heights, and shallow roots), whereas the flux sites that are categorised as 'shrub' tend to represent zones more intensely vegetated (i.e., the Tonzi site reports an overstory of blue oak trees covering 40% of the total vegetation, with intermittent grey pine trees and understory species including a variety of grasses and herbs [59]).

Figure 6.
Drydown rates grouped by plant functional type for observations (red dots) and models (blue dots). The plant functional type representing each event is adopted as reported by the site in the case of observations or dominant at the grid cell in the case of models (see Figure 5). Dots indicate the median τ from identified drydown periods; bottom and top of grey boxplots indicate the 25th and 75th percentiles, respectively. Panels (a) to (j) correspond to comparisons between observations and the 10 models analysed (model indicated in panel legend). Savanah sites (Table 1) are identified as shrubs in this comparison.

Discussion
Global model outputs like the WRR1 product analysed here tend to be evaluated using earth observation datasets for a characterization of the model skill to reproduce observed global or regional water budgets (for example, see the works of [34,36]), whereas flux tower data can be used to evaluate the model surface fluxes exchange (for example, see the work of [60]). In this work, using flux tower data, we look for a process based evaluation that helps characterise the models in their capability to predict drought responses, and thus provide useful information to both model developers that are Figure 6. Drydown rates grouped by plant functional type for observations (red dots) and models (blue dots). The plant functional type representing each event is adopted as reported by the site in the case of observations or dominant at the grid cell in the case of models (see Figure 5). Dots indicate the median τ from identified drydown periods; bottom and top of grey boxplots indicate the 25th and 75th percentiles, respectively. Panels (a-j) correspond to comparisons between observations and the 10 models analysed (model indicated in panel legend). Savanah sites (Table 1) are identified as shrubs in this comparison.

Discussion
Global model outputs like the WRR1 product analysed here tend to be evaluated using earth observation datasets for a characterization of the model skill to reproduce observed global or regional water budgets (for example, see the works of [34,36]), whereas flux tower data can be used to evaluate the model surface fluxes exchange (for example, see the work of [60]). In this work, using flux tower data, we look for a process based evaluation that helps characterise the models in their capability to predict drought responses, and thus provide useful information to both model developers that are constantly working to improve the tools and water managers or other potential users of WRR1 that might be interested in drought response over particular regions. The reasons for the accuracy or shortcomings of the WRR1 models in their simulation of water-limited drydowns, however, will be dependent on particular model configurations and process representations (for instance, only two models incorporate groundwater interactions with the unsaturated zone in the soil).

Role of Vegetation in τ
The analysis in Section 3 shows a strong link between drydown and vegetation cover. We can simplify our analysis to a first order vegetation cover differences using a three-type classification: trees, shrubs, and grasses. For the models, we use the dominant vegetation type map ( Figure S2) to identify any event over tree dominated regions (broadleaf trees, needleleaf trees), shrub dominated regions, and grass dominated regions. For the observation sites, we classify sites with vegetation cover reported of "Grasslands" or "Croplands" (Table 1) as grasses, "Savannah" or "Woody savannah" as shrubs, and the rest of the sites as trees. Under this criterion, we have a simpler characterization of drydown rates by both models and observations ( Figure 7). As discussed in Section 3.3, there is an issue with shrubs as the models dry too quickly (Figure 7b), with SWBM being the only model slower than observations (higher τ). Focusing on trees and grasses (Figure 7a,c), we can see that the models represent better the drydown rates over tree covered regions (six models within 10% of observed median τ: 27.8 days), whereas over grass covered regions, the model variability and differences with observations are higher (only two models, HTESSEL-CaMa and ORCHIDEE, within 10%).
simplify our analysis to a first order vegetation cover differences using a three-type classification: trees, shrubs, and grasses. For the models, we use the dominant vegetation type map ( Figure S2) to identify any event over tree dominated regions (broadleaf trees, needleleaf trees), shrub dominated regions, and grass dominated regions. For the observation sites, we classify sites with vegetation cover reported of "Grasslands" or "Croplands" (Table 1) as grasses, "Savannah" or "Woody savannah" as shrubs, and the rest of the sites as trees. Under this criterion, we have a simpler characterization of drydown rates by both models and observations ( Figure 7). As discussed in Section 3.3, there is an issue with shrubs as the models dry too quickly (Figure 7b), with SWBM being the only model slower than observations (higher τ). Focusing on trees and grasses (Figures 7a,c), we can see that the models represent better the drydown rates over tree covered regions (six models within 10% of observed median τ: 27.8 days), whereas over grass covered regions, the model variability and differences with observations are higher (only two models, HTESSEL-CaMa and ORCHIDEE, within 10%).

Figure 7.
Hypothetical drydowns using the median τ from identified dry periods by the observations (dashed orange line) and WRR1 models (continuous lines). Data grouped by vegetation cover: (a) trees, (b) shrubs, and (c) grasses. Inset bar charts represent the median τ values and error bars represent the interquartile range. Note that whereas the number of drydown events that the median τ represents for the observations is low and very similar for the three vegetation covers (56 for trees, 48 for shrubs, and 56 for grasses), in the case of models, the number of events is much higher and relatively low for shrubs as they are dominant over a smaller fraction of the globe, according to Figure  S2 (of the 10 5 of events per model for trees and grasses and 10 4 for shurbs).
The observations show a reduction in median τ from trees to grasses of 47% (from 27.8 to 14.8 days). However, as shown in Table 6, the models show a smaller reduction depending on whether they are LSMs (between 21% and 38%) or GHMs (22% or below, with the exception of WaterGAP3, which presents very quick drydown rates for all vegetated areas). HTESSEL-CaMa shows this reduction with very good agreement with observations for both trees and grasses. Other two LSMs, ORCHIDEE and SURFEX-TRIP, which are significantly quicker than observations over trees, present a much better agreement over grasses. JULES shows a very good agreement with observations for trees (29.7 days versus 27.8 days in the observations), but the τ reduction for grasses is too small. SWBM also shows very good agreement with observations for trees (30.1 days), but then the The observations show a reduction in median τ from trees to grasses of 47% (from 27.8 to 14.8 days). However, as shown in Table 6, the models show a smaller reduction depending on whether they are LSMs (between 21% and 38%) or GHMs (22% or below, with the exception of WaterGAP3, which presents very quick drydown rates for all vegetated areas). HTESSEL-CaMa shows this reduction with very good agreement with observations for both trees and grasses. Other two LSMs, ORCHIDEE and SURFEX-TRIP, which are significantly quicker than observations over trees, present a much better agreement over grasses. JULES shows a very good agreement with observations for trees (29.7 days versus 27.8 days in the observations), but the τ reduction for grasses is too small. SWBM also shows very good agreement with observations for trees (30.1 days), but then the representation of grasses is highly overestimated with a median τ of 26.5 days versus 14.8 days in the observations (too little reduction of τ from trees to grasses).

Other Factors
We acknowledge that the soil texture and not only the vegetation type should explain part of the observed and modelled variance of the drydown rates, as it determines the soil capacity to hold and conduct water at the surface and through the root zone (i.e., sandy soils with larger pores and lower water suction will evapotranspirate more easily and have quicker drainage down the soil column). This was shown by a clear systematic decrease of drydown rates with sand fraction by McColl, et al. [20] when they analysed surface soil moisture drying. However, when we relate our global modeled data of drydown rates (using ET ratio) with soil water suction data from the Harmonized World Soil Database (HWSD [61]), we do not find a relationship as clear among models as the results show when relating drydown rates with vegetation type ( Figure S3).

Global Maps of Median τ
Our methodology provides global maps of median τ from the WRR1 models, shown in Figure 3. These maps may be useful way to compare the features of the different models and to use satellite data for evaluation.

Identification of Drydowns and Characterization of Sites
The identification of water-limited drydowns detailed in Section 2.3 is a departure from recent studies where the drydown evaluation had to be discretised in bins of antecedent rainfall [28,30]. In addition to the main purpose of this work of evaluating how quickly drydowns occur when the soil water limitation affects surface fluxes, we have used our drydown identification method to characterise sites/regions as dry or wet, considering the ratio of periods of no precipitation that fall on our 'dry event' definition.
A group of sites (Blodgett, El Saler, El Saler 2, Castelporziano, Rocca 2, Amplero, and Espirra) were found to be dry by most models, but considered wet by the observations (Figure 4), even when they were located over semiarid regions, meaning that the models somehow stressed evaporative fluxes during periods of no precipitation, even though such stress was not observed. This points to other model processes being responsible for the observation/model mismatch in the identification of sites as dry, rather than the rate of drying, like irrigation (case of El Saler 2) or intensified soil moisture memory due to shallow groundwater connectivity [62]. Three of these sites (Blodgett, Amplero, and Espirra) were used by Ukkola, et al. [55] to analyse a range of LSMs (JULES and ORCHIDEE included) in their representation of droughts. These authors concluded that the models overestimate intensity and duration, falling in agreement with our findings here. We note that two particular GHMs (PCR-GLOBWB and SWBM) do not find these locations to be dry, however, these models tend to characterise all sites as wet.
Other sites were characterised as wet by both observations and models, as they are located in wet and cold northern latitudes (Hyytiala, Boreas, Degero, Quebecc) or under all year long humid climate conditions (Tumba, Willow).
Therefore, even though other processes beyond soil drying under water-limited conditions, like groundwater connectivity to the root zone, are not within the scope of this analysis as the theory does not hold on such conditions, the simple identification of dry/wet sites already gives some answers to the questions of what sites are actually water-limited during no precipitation periods and whether the models agree with observations in this identification.

On the Methodolody
We acknowledge that the assumption of constant PET during the dry event is a condition in order to obtain Equation (3). Possible oscillations of PET from day to day should be minimal during a dry period, and we have calculated ET/PET on a daily time step, so such oscillations are expected to be smoothed out on the exponential decay adjustment. Further, the restriction to the methodology of only considering events with 0 < τ < 50 days will neglect possible events where PET variations might result in the inaccuracy of the method.
Following this work, but slightly out of the scope of this publication, we consider separating events, for a particular site/region, into different PET categories. This would give us further insights into understanding under what atmospheric conditions the models represent more/less accurately the drydown process.

Summary and Conclusions
In this paper, we study the concept that we can characterise the rate of drydown after a rainfall event as a single value, and whether this is a useful metric to evaluate models. We set out to investigate if such a key quantity could be extracted from several years of direct measurements and whether existing large-scale models have a fixed value of this metric in time.
Ideally, we would be able to observe this quantity from satellite observations so that we could obtain global maps. However, there are problems with the use of satellite soil moisture observations to study this, as only the top surface soil moisture can be seen. In order to overcome the observed soil moisture limitations, other studies have focused on the surface responses rather than the actual soil moisture evolution, for instance, analyzing the energy balance response through warming rates of the land surface, using a methodology based on satellite land surface temperature products [28,30]. In this study, however, we use the most direct observation of evapotranspiration-the eddy-covariance method, commonly referred to as flux tower measurements.
The study reveals that it is possible to quantify the drydown process using the exponential curve of decrease of evapotranspiration (normalised by the evaporative demand or potential evaporation) when we have daily data of evapotranspiration, potential evapotranspiration, and precipitation. We define the lifetime parameter τ of this curve as our drydown metric, and characterise the land using the median τ obtained for all dry events identified within the time series of data available. A comparison of 35 sites showed a marked relation between the vegetation cover at the site and the drydown rates, with tree sites drying slower than grassland sites because of differences in the root system, and thus the soil water that they can reach.
We quantify the drydown metric for a suite of global land surface and hydrological models from the WRR1 assessment [34] run globally at 0.5 • resolution and compare it to the results from flux tower data. We find that a large part of the disagreement between the modeled and observed metric comes from discrepancies in the land cover type specified by the models compared with the actual vegetation at the flux tower. When we group the modeled metric into vegetation cover characterizations, we obtain a more robust metric for the models that correlates well with the findings at the site level. Land surface models show a strong difference between trees and grasses, in agreement with observations, while most large-scale hydrology models show a markedly lower difference.
We conclude that flux tower data can be used to evaluate drydown processes in global models using the median lifetime parameter τ, which is a property of the land and hence independent of the precipitation forcing, as long as the vegetation cover at the sites and in the models is taken into account.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4441/11/2/356/s1. Figure S1: Standard deviation error metric of the exponential curve fittings for all drydowns identified as 'dry events' against the actual decrease of ET/PET ratio during the event duration; Figure S2: Global distribution of dominant land cover or plant functional type (PFT) at the 0.5 • resolution (from data derived from the International Geosphere-Biosphere Programme: http://www.igbp.net/); Figure S3: Drydown rates grouped by soil matric suction at saturation for the WRR1 models. Table 1. Results of FLUXNET sites ( [1]) monthly mean latent heat evaluation for the models evaluated in the paper.