Evaluating Hydrological Processes of the Atmosphere–Vegetation Interaction Model and MERRA-2 at Global Scale

Abstract: Hydrological processes are a key component of land surface models and link to the energy budget and carbon cycle. This study assessed the global hydrological processes of the Atmosphere–Vegetation Interaction Model (AVIM) using multiple datasets, including the Global Land Data Assimilation System (GLDAS), the University of New Hampshire and Global Runoff Data Centre (UNH-GRDC), the European Space Agency (ESA) Climate Change Initiative (CCI), the Global Land Evaporation Amsterdam Model (GLEAM), and the Modern-Era Retrospective Analysis for Research and Applications Version 2 (MERRA-2) datasets. The comparisons showed that the AVIM gives a reasonable spatial pattern for surface soil moisture and surface runoff, but a less satisfactory spatial pattern for evapotranspiration. The AVIM clearly underestimates surface runoff worldwide and overestimates the surface soil moisture in the high latitudes of the Northern Hemisphere, while yielding moderately higher evapotranspiration in arid areas and lower evapotranspiration in low-latitude areas near the equator. The annual cycle of evapotranspiration in the AVIM shows good agreement with the GLEAM dataset, whereas the surface soil moisture in the AVIM has a poor annual cycle relative to the CCI dataset. The AVIM simulates a late start time for snowmelt, which leads to a two-month delay in the peak surface runoff. These results clearly point out the directions required for improvements in the AVIM, which will support future investigations of water–carbon–atmosphere interactions. In addition, the evapotranspiration in the MERRA-2 dataset had an overall good performance comparable with that of the GLEAM dataset, but its surface soil moisture did not perform well when validated against the CCI dataset.


Introduction
Water, vegetation, and the atmosphere interact closely with each other [1,2]. Pitman and McAvaney [3] and Pitman [4] stated that, "The ways in which the hydrology are modeled may be more important than the surface energy balance, at least in second-generation schemes." It has been shown that climate change predictions cannot be improved without a better description of the hydrology [4][5][6][7]. Moreover, changes in hydrological processes strongly affect the ecological system [8,9]. The coupling between ecological and hydrological processes has drawn increasing attention in scientific communities [1].
Soil moisture is a core variable in land-atmosphere interactions [10][11][12] and involves a series of feedbacks on local, regional, and global scales [10,13]. Soil moisture has an essential role in the carbon cycle of terrestrial ecosystems because it affects the uptake of carbon by photosynthesis and the release of carbon by respiration [14]. Soil moisture also affects the partitioning of incoming energy between the latent and sensible heat fluxes [10]. Evapotranspiration returns about 60% of the total precipitation over land back to the atmosphere [15] and may use more than half of the total amount of solar energy absorbed by land surfaces [16]. The coupling of soil moisture and evapotranspiration can influence the generation of precipitation in the afternoon [17][18][19].

The AVIM is a land surface model originally proposed by Ji in 1995 [28] at the Institute of Atmospheric Physics, Chinese Academy of Sciences, China, to simulate the energy-water balance of the atmosphere-vegetation-soil system. The AVIM was developed by adding dynamic vegetation physiology processes to the simple one-dimensional land surface process model of Ji and Hu in 1989 [29], in which vegetation was considered as a horizontally uniform layer coupled with the atmosphere and soil through transfers and exchanges of energy and water. The AVIM has taken part in two international intercomparison projects: the Ecosystem Model-Data Intercomparison (EMDI) and the Carbon Subplan in the Project for Intercomparison of Land-surface Parameterization Schemes (PILPS-C). The AVIM showed the best performance among the models participating in the EMDI in terms of the NPP simulation on a global scale. The AVIM has been successfully coupled into several climate models, for example, the Global Ocean-Atmosphere-Land System (IAP/LASG GOALS) [6,30] model and the Regional Integrated Environment Modeling System (RIEMS) [31].
The AVIM V2.0 with a spatial resolution of 1.0° was used in this study. The AVIM contains one canopy layer and a snowmelt module, but does not simulate groundwater or subsurface runoff. The AVIM has ten soil layers with a total depth of 2.5 m and assumes that the physical and chemical conditions in the deepest soil layer are constant. The time step is 1 h. A one-dimensional Richards equation was used to calculate the movement of soil water between two adjacent layers as follows.
where η is the volumetric soil water content (mm/mm), η_s is the saturated volumetric soil water content (mm/mm), w is the relative soil moisture expressed as η/η_s, D is the permeability of the soil, and Su is the transpiration (m/s) due to the uptake of water by roots.

The surface runoff is controlled by a simple empirical function of the surface soil moisture in the top 0.1 m of the soil, the ground precipitation, and the ground snowmelt water. The ground precipitation is the total precipitation minus the water intercepted by the canopy, plus the canopy drainage. If the temperature is less than zero, then the surface runoff is equal to zero.

where R_s is the surface runoff (m/s), w_1 is the surface soil moisture, S_mg is the ground snowmelt water (m/s), P_g is the ground precipitation (m/s), P is the total precipitation (m/s), P_c is the precipitation intercepted by the canopy (m/s), and D_c is the canopy drainage (m/s).

The AVIM adopts an aerodynamic method to compute the evapotranspiration (E_a), which includes the evaporation from bare soil (E_g), the evaporation from the wet part of the canopy (E_w), and the transpiration from the dry part of the canopy (E_tr). The total evapotranspiration and the different components are expressed as:

where δ_c is the ratio of the wet part of the canopy to the total canopy, ρ_a is the air density, σ_c is the canopy coverage, q_s(T_c) is the saturated specific humidity when the leaf temperature is T_c, q_ac is the specific humidity of the canopy air, and r_b is the resistance at the surface boundary layer of the leaves.

where r_c is the stomatal resistance.

where q*_s(T_c) is the specific humidity at the ground surface, r_a is the air resistance above the canopy, and r_s is the resistance of the soil surface. Detailed information about the AVIM has been published previously [28,32-34].
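To make the roles of the symbols above more concrete, the following Python sketch evaluates the three evapotranspiration components with a generic resistance-based (bulk aerodynamic) formulation. The functional forms are an assumption chosen to be consistent with the symbol definitions; they are not taken from the AVIM source code. The function name, the argument `q_air` (a hypothetical reference-level specific humidity that the text does not define), and all numerical values are illustrative only.

```python
def evapotranspiration_components(rho_a, sigma_c, delta_c, q_sat_leaf, q_ac,
                                  q_ground, q_air, r_a, r_b, r_c, r_s):
    """Generic resistance-form moisture fluxes in kg m^-2 s^-1.

    ASSUMPTION: these are textbook bulk-aerodynamic forms, not the AVIM equations.
    """
    # Wet canopy fraction: evaporation limited only by the leaf boundary-layer resistance
    e_w = delta_c * sigma_c * rho_a * (q_sat_leaf - q_ac) / r_b
    # Dry canopy fraction: transpiration passes through stomatal plus boundary-layer resistance
    e_tr = (1.0 - delta_c) * sigma_c * rho_a * (q_sat_leaf - q_ac) / (r_b + r_c)
    # Bare-soil fraction: evaporation through soil-surface plus aerodynamic resistance
    e_g = (1.0 - sigma_c) * rho_a * (q_ground - q_air) / (r_a + r_s)
    # Total evapotranspiration is the sum of the three components
    e_a = e_g + e_w + e_tr
    return {"E_g": e_g, "E_w": e_w, "E_tr": e_tr, "E_a": e_a}


# Illustrative (made-up) values: densities in kg m^-3, humidities in kg/kg, resistances in s/m
fluxes = evapotranspiration_components(
    rho_a=1.2, sigma_c=0.5, delta_c=0.2,
    q_sat_leaf=0.020, q_ac=0.012, q_ground=0.015, q_air=0.010,
    r_a=50.0, r_b=20.0, r_c=100.0, r_s=150.0,
)
print(fluxes)
```

In this form, a larger stomatal resistance r_c reduces only the transpiration term, while the wet-canopy ratio δ_c shifts water loss between direct evaporation and transpiration.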

Data and Methodology
As the AVIM does not simulate subsurface runoff, the water budget cannot be evaluated, and therefore the AVIM hydrological assessment focused on the surface runoff, soil moisture, and evapotranspiration. We previously evaluated the spatiotemporal characteristics of the runoff generated by GLDAS (versions 1.0 and 2.1) using the UNH-GRDC dataset [35]. In this study, the AVIM global surface runoff was validated directly against the GLDAS surface runoff and indirectly against the UNH-GRDC total runoff data. The soil moisture results from the CCI and MERRA-2 datasets were used to evaluate the surface soil moisture in the AVIM. The evapotranspiration performance of the AVIM was assessed against the GLEAM and MERRA-2 datasets. The CCI and GLEAM datasets served as the primary references for the AVIM surface soil moisture and evapotranspiration, respectively. We used the MERRA-2 dataset to validate the AVIM, but were also interested in validating the surface soil moisture and evapotranspiration of MERRA-2 itself against the CCI and GLEAM datasets, respectively. The global datasets used in this study are presented in Table 1 and introduced below.
(1) GLDAS generates the optimum fields of the land surface states and fluxes by integrating satellite- and ground-based observational data using advanced land surface modeling and data assimilation techniques [36]. Lv et al. [35] summarized the features of the GLDAS land surface models: the Common Land Model (CLM), the Variable Infiltration Capacity (VIC), and the Noah land surface model. Similarly to the AVIM, they are all one-dimensional land surface models with a snowpack module, use the one-dimensional Richards equation to calculate the soil moisture, and do not take groundwater into account.

(2) UNH-GRDC V1.0 with a 0.5° resolution was produced by combining the river discharge observations from the GRDC with a climate-driven water balance model developed by the UNH. It provides annual and monthly climatological global runoff data that preserve the accuracy of the discharge measurements and the spatial and temporal distribution of simulated runoff. Therefore, it is considered the best available global gridded runoff dataset [37][38][39].

(3) The ESA released a global satellite-observed soil moisture dataset as part of its CCI program. The CCI algorithm generated three consistent, quality-controlled, multi-decadal soil moisture datasets from satellite observations, namely, a dataset based on only active microwave data (ACTIVE), a dataset based on only passive microwave data (PASSIVE), and a combined active-passive product (COMBINED) [11,40-42]. The combined product of the CCI V04.7 dataset with a spatial resolution of 0.25° was used in this study.

(4) The GLEAM V3.3a data with a spatial resolution of 0.25° [43,44] were employed here. This dataset is based on reanalysis radiation and air temperature data, a combination of gauge-based, reanalysis, and satellite-based precipitation, and the satellite-based vegetation optical depth.

(5) The MERRA-2 dataset with a spatial resolution of 0.5° × 0.625° is the latest atmospheric reanalysis dataset of the modern satellite era produced by NASA's Global Modeling and Assimilation Office. It replaces the original MERRA reanalysis dataset [45] using an upgraded version of the Goddard Earth Observing System Model, Version 5 data assimilation system. A comprehensive overview of the MERRA-2 dataset can be found in Gelaro et al. [46].
Because the AVIM surface runoff was mainly evaluated against the GLDAS 1.0 results of Lv et al. [35], the AVIM global simulation was forced by the GLDAS 1.0 forcing data, and the study period was chosen as the most recent ten years after 2006, consistent with the period used by Lv et al. [35]. The AVIM was run for 500 years to allow for spin-up. The spin-up was considered sufficient when the monthly series of the global area-mean variables (e.g., LAI, NPP, soil moisture, and surface runoff) showed no obvious trend and fluctuated regularly. The datasets used in this study have different spatial resolutions and were therefore re-gridded to a common spatial resolution of 0.25°. We found that the spatial distribution of missing data in the CCI dataset differs among the 12 months; therefore, each month of soil moisture data in the AVIM and MERRA-2 datasets was masked to the same spatial distribution as the corresponding month in the CCI dataset. The performances of the AVIM and MERRA-2 datasets were validated using three statistical indexes: the pattern correlation coefficient (PCC), the relative bias (RB), and the ratio of the standard deviations (RSD). These metrics are calculated as follows.
Atmosphere 2021, 12, 16

where x_i is the evaluated dataset and x̄ is its average value, y_i is the reference dataset and ȳ is the corresponding average, and N is the number of data points.
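As an illustration of how these three indexes can be computed from two gridded fields, the following Python sketch uses common textbook definitions: the Pearson correlation across grid cells for the PCC, the summed difference divided by the summed reference for the RB, and the ratio of sample standard deviations for the RSD. These definitions, the function name `spatial_metrics`, and the toy arrays are assumptions for illustration; the exact formulas used in the study (for example, any area weighting) may differ. Cells that are missing in either field are excluded, which mirrors the masking to the CCI missing-data pattern described above.

```python
import numpy as np


def spatial_metrics(x, y):
    """Return (PCC, RB in percent, RSD) of evaluated field x against reference y.

    ASSUMPTION: unweighted textbook definitions, not necessarily those of the study.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    # Keep only grid cells that are valid in both fields (missing data are NaN)
    valid = np.isfinite(x) & np.isfinite(y)
    x, y = x[valid], y[valid]
    dx, dy = x - x.mean(), y - y.mean()
    # Pattern correlation coefficient: Pearson correlation over grid cells
    pcc = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    # Relative bias in percent
    rb = 100.0 * np.sum(x - y) / np.sum(y)
    # Ratio of the sample standard deviations (evaluated / reference)
    rsd = np.std(x, ddof=1) / np.std(y, ddof=1)
    return pcc, rb, rsd


# Toy 2 x 2 fields; the NaN cell plays the role of a CCI data gap
reference = np.array([[1.0, 2.0], [3.0, np.nan]])
evaluated = 2.0 * reference
pcc, rb, rsd = spatial_metrics(evaluated, reference)
print(pcc, rb, rsd)
```

In this toy case the evaluated field is exactly twice the reference, so the pattern is perfectly correlated while the mean and the spatial variation are both doubled.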

Surface Runoff
The AVIM generates a weaker surface runoff than the GLDAS models globally (Figures 1 and 2). We found that a large portion (~70%) of precipitation is intercepted by the canopy in the AVIM, which likely accounts for the underestimation of surface runoff. The AVIM global area-weighted mean surface runoff has an annual range of 1.4 to 1.8 mm month⁻¹, whereas the GLDAS model results vary from 2 to 20 mm month⁻¹. This implies that the AVIM underestimates the annual cycle of surface runoff relative to the GLDAS models. The AVIM global mean surface runoff shows two peaks, in March and August, whereas all the GLDAS models show only one peak, occurring in the period March-May. The global annual cycle of the AVIM surface runoff is relatively close to that of the GLDAS Noah subsurface runoff, whose peaks are in March and September. As seen in Figure 1b,e, the August and September peaks in the AVIM global mean surface runoff and the GLDAS Noah global mean subsurface runoff result from their corresponding peaks in the Northern Hemisphere. The annual cycle of the AVIM surface runoff in the Southern Hemisphere matches well with that of the GLDAS 1.0 CLM, with peaks in January and March in both models. Given that the primary peak in the UNH-GRDC total runoff in the Northern Hemisphere occurs in June as a result of the combination of precipitation and snowmelt at high latitudes [35], the annual cycle implies that the snowmelt is delayed in the AVIM, which leads to a surface runoff peak (August) that is 2 months later than that of UNH-GRDC (June). This was confirmed by analyzing the monthly spatial distribution: the AVIM begins to produce an obvious snowmelt runoff from May, whereas this starts in April in the UNH-GRDC dataset [35].
The AVIM generated an overall reasonable spatial distribution of the mean annual surface runoff that is closest to the spatial pattern of the GLDAS1.0 CLM surface runoff among the models in GLDAS 1.0 and 2.1 (Figure 2 only shows the spatial distribution of the GLDAS 1.0 CLM), characterized by a greater surface runoff over low-latitude regions near the equator and snowmelt surface runoff over the northern high-latitude areas. However, the AVIM surface runoff is excessively high in the southern and western Tibetan Plateau.
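The area-weighted means and peak months discussed in this section can be reproduced in outline with the following Python sketch. It assumes a regular latitude-longitude grid, on which the area of a cell is proportional to the cosine of its latitude; the function names, the two-row toy grid, and the synthetic monthly values are hypothetical and serve only to show the calculation.

```python
import numpy as np


def area_weighted_mean(field, lats_deg):
    """Mean of a (lat, lon) field weighted by cos(latitude), ignoring NaN cells."""
    field = np.asarray(field, dtype=float)
    # On a regular lat-lon grid, cell area is proportional to cos(latitude)
    weights = np.cos(np.deg2rad(lats_deg))[:, None] * np.ones_like(field)
    # Missing cells get zero weight so that they do not affect the mean
    weights = np.where(np.isfinite(field), weights, 0.0)
    return np.nansum(field * weights) / weights.sum()


def peak_month(monthly_fields, lats_deg):
    """Return the 1-based month of the maximum and the area-weighted annual cycle."""
    cycle = np.array([area_weighted_mean(f, lats_deg) for f in monthly_fields])
    return int(np.argmax(cycle)) + 1, cycle


# Toy grid with two latitude rows (0 and 60 degrees) and two longitudes
lats = np.array([0.0, 60.0])
example_mean = area_weighted_mean(np.array([[1.0, 1.0], [3.0, 3.0]]), lats)

# Synthetic climatology (made-up values) whose largest value falls in August
values = [1.5, 1.6, 1.7, 1.6, 1.5, 1.6, 1.7, 1.8, 1.6, 1.5, 1.4, 1.4]
monthly = [np.full((2, 2), v) for v in values]
month, cycle = peak_month(monthly, lats)
print(example_mean, month)
```

A hemispheric annual cycle follows from the same functions by passing only the rows with positive (or negative) latitudes.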

Surface Soil Moisture
The spatial patterns of the monthly mean surface soil moisture from the AVIM, CCI, and MERRA-2 datasets are generally consistent. Figure 3 shows that the differences between the AVIM and CCI datasets and those between the MERRA-2 and CCI datasets are spatially similar. However, both the AVIM and MERRA-2 datasets overestimate the surface soil moisture over the high-latitude regions between 65 and 90° N, and the wet bias is much greater in the MERRA-2 dataset than in the AVIM. For example, in the high latitudes of the Northern Hemisphere, the MERRA-2 dataset overestimates the surface soil moisture by >0.2 m³/m³, compared with around 0.12 m³/m³ in the AVIM. Moreover, the AVIM slightly overestimates the surface soil moisture in northern Africa and moderately overestimates the soil moisture on the Brazilian Plateau, the central-southern region of Africa, and northwest Australia. By contrast, the MERRA-2 dataset shows slightly drier soil moisture over these regions. In addition, the AVIM significantly overestimates the surface soil moisture in the southern and western Tibetan Plateau, where there is also a large bias in the surface runoff (Figure 2). The CCI dataset has missing data in some low-latitude regions, and the MERRA-2 dataset is slightly different from the CCI dataset near these regions. We therefore evaluated the AVIM soil moisture in low-latitude regions based on the MERRA-2 data. The comparison with MERRA-2 shows that the AVIM yields slightly drier (wetter) soil over northwestern South America (central Africa) and comparable soil moisture for islands (data not shown).

We also found remarkable differences in the annual cycle of the area-weighted mean soil moisture at the global scale among the CCI, MERRA-2, and AVIM datasets (Figure 5a). Peaks in the CCI dataset appear in July and December, whereas the peaks in the AVIM are in January and April, and the peaks in the MERRA-2 dataset are in May and October. Figure 5a also shows that the AVIM and MERRA-2 datasets exaggerate the monthly variation relative to the CCI dataset, and it again indicates that the soil moisture overestimations in the AVIM and MERRA-2 datasets are most pronounced in April and May, respectively. Figure 5 implies that the difference in the global annual cycle between the CCI and MERRA-2 datasets is mainly caused by the difference in the Northern Hemisphere. The annual cycle of the MERRA-2 dataset in the Southern Hemisphere is consistent with that of the CCI dataset. As shown in Figure 5c, both the CCI and MERRA-2 datasets show the highest soil moisture in March and the lowest in September, but with a slight underestimation by the MERRA-2 dataset (most obvious in September). However, besides the peak in March, the AVIM presents another peak in June, while its lowest soil moisture occurs in October, one month later than in the CCI and MERRA-2 datasets (Figure 5c). Moreover, the difference in magnitude between the AVIM and CCI datasets is more obvious in January-April than in other months (Figure 5c). As seen in Figure 5b, the disparity among the CCI, AVIM, and MERRA-2 datasets is pronounced in the Northern Hemisphere. The CCI dataset shows a peak in November and a trough in March. However, the peaks appear in May and September in the MERRA-2 dataset and in April and November in the AVIM. The trough is in February in the MERRA-2 dataset and in July in the AVIM. In addition, the differences in magnitude in the Northern Hemisphere are more significant than those in the Southern Hemisphere.

Evapotranspiration
Compared with the monthly mean evapotranspiration in the GLEAM dataset, the MERRA-2 dataset slightly overestimates evapotranspiration over most parts of the globe, except for some local areas where the evapotranspiration is slightly underestimated (Figure 6c,d). Furthermore, the difference is about −20 to 20 mm month⁻¹. By contrast, the AVIM has a relatively poor performance in terms of simulating evapotranspiration. Although the AVIM produces acceptable evapotranspiration in the high-latitude regions between 50 and 90° N, it clearly overestimates evapotranspiration in arid areas, including northern Africa, western Asia, Australia, and western North America, where the bias varies from about 40 to 194 mm month⁻¹ (Figure 6a,b). The AVIM moderately underestimates evapotranspiration over low-latitude areas near the equator, where the evapotranspiration should be highest worldwide. Furthermore, the overestimation of the AVIM evapotranspiration in Australia is most apparent from October to March, while the greatest overestimations in northern Africa and western Asia occur from May to August. The degree of underestimation of the AVIM evapotranspiration over low-latitude areas is similar throughout the year.

Figure 7 shows the PCC, RSD, and RB of the global annual mean monthly evapotranspiration derived from the MERRA-2 and AVIM datasets. The AVIM shows a poorer performance than the MERRA-2 dataset in simulating the spatial pattern of evapotranspiration, with an annual mean PCC of 0.37 for the AVIM and 0.91 for the MERRA-2 dataset. The RSD of the AVIM varies from 0.64 (January) to 0.85 (April), indicating that the AVIM moderately underestimates the spatial variation. The MERRA-2 dataset shows a comparable spatial variation to the GLEAM dataset, with the RSD varying from 0.85 to 1.09. The global area-weighted mean evapotranspiration in the AVIM is slightly smaller than that in the GLEAM dataset from November to May, with RBs of −3.02 to −7.59%, but it is a little larger between June and October, with RBs of 2.34-14.49% (Figure 7c). The MERRA-2 dataset produces higher evapotranspiration in all months, showing RBs of 3.97-11.94%. Figure 8 shows that the annual cycle in the AVIM is consistent with both the GLEAM and MERRA-2 datasets. All three datasets produce a peak in July; this peak is higher in the AVIM, and especially in the MERRA-2 dataset, than in the GLEAM dataset, whereas the differences in the other months are not significant.

Discussion
The evaluation presented in this study still contains uncertainties. For example, the AVIM was forced by a single dataset (GLDAS 1.0), which would affect the hydrological simulations. In addition, the satellite and reanalysis data also contain uncertainties, which are another important source of uncertainty. Although previous studies have verified the reliability of the datasets used in this study, these datasets are inevitably subject to some bias.
Wang et al. [47] found that the GLDAS 1.0 provides high-quality daily and monthly precipitation in a semiarid mesoscale basin in … [51] stated that the GLDAS 1.0 forcing data from 1995 to 1997 have high uncertainties. Moreover, the GLDAS 1.0 precipitation has considerable errors in 1996 [49,52] and 2000 [49], and the temperature of GLDAS 1.0 has large errors during 2000 and 2005 [49]. Therefore, we excluded the period from 1995 to 2005 in this study.
Ma et al. [53] compared different satellite surface soil moisture datasets, including the CCI, the Soil Moisture Active Passive (SMAP), the Soil Moisture and Ocean Salinity (SMOS), and the Advanced Microwave Scanning Radiometer 2 (AMSR2), against global ground-based observations, and found that the CCI dataset outperforms the other three datasets in overall performance. An et al. [54] compared CCI with soil moisture results from meteorological stations and the Noah model, and showed that CCI performs better in grassland than in cropland, urban, and built-up regions in China. Chakravorty et al. [55] found that the CCI combined data are more consistent with MERRA-Land than the SMOS Level-3 data over India for the period 2010-2013. Dorigo et al. [56] compared the CCI data with the 1979-2010 ground-based observational soil moisture of 596 sites from 28 historical and active monitoring networks worldwide, and concluded that the CCI quality has an upward trend over time but declines between 2007 and 2010. Dorigo et al. [11] found that the CCI soil moisture generally agrees well with the spatiotemporal distributions obtained by land surface models and in situ observations.
Khan et al. [57] validated the evapotranspiration datasets of GLEAM, GLDAS, and the Moderate Resolution Imaging Spectroradiometer (MODIS) global evapotranspiration project (MOD16) in Asia during the period 2000-2010, and found that all datasets show higher relative uncertainties for low vegetation than for tall vegetation, and that GLEAM performs best in forest. Miralles et al. [58] showed that the global products of GLEAM and the Priestley-Taylor Jet Propulsion Laboratory model (PT-JPL) outperform the Penman-Monteith algorithm behind the official MODIS product (PM-MOD) for most ecosystems and climate regimes. Yang et al. [59] assessed the GLEAM product using the ChinaFLUX evapotranspiration measurements from eight sites over China and found that GLEAM can estimate evapotranspiration with reasonable accuracy at different time scales (daily, monthly, annual) and performs well for most of the land-cover types, but with a higher skill score for grassland than for forest and cropland.
Reichle et al. [60] indicated that the MERRA-2 dataset has a better surface and root zone soil moisture performance than its predecessor MERRA and the ERA-Interim/Land dataset compared with in situ measurements from 220-320 stations in North America, Europe, and Australia. Bosilovich et al. [61] also found that, compared to MERRA, MERRA-2 generates a more accurate surface and root-zone soil moisture that is close to the in situ observations from the 140 sites from the USDA Natural Resources Conservation Service Soil Climate Analysis Network (SCAN) in the US, but it provides higher land evaporation than the NASA Energy and Water Cycle Studies (NEWS) estimate. Bosilovich et al. [62] showed that the land evaporation of MERRA-2 is higher than the NASA Energy and Water Cycle Study observations, but it is closer to the observations than the ocean evaporation of MERRA-2. Draper et al. [63] compared the global energy fluxes of several datasets, such as MERRA-2, GLEAM, and NEWS, and showed that MERRA-2 appears to overestimate the latent heat flux.

Conclusions
We assessed the ability of the AVIM to simulate hydrological processes using the runoff from the GLDAS and UNH-GRDC datasets, the surface soil moisture from the CCI and MERRA-2 datasets, and the evapotranspiration from the GLEAM and MERRA-2 datasets. We also validated the surface soil moisture and evapotranspiration skills in the MERRA-2 dataset.
We found that the AVIM presents a reasonable spatial distribution for surface runoff, but significantly underestimates surface runoff across the globe, which likely results from the overestimation of precipitation intercepted by the canopy. The annual cycle of the AVIM global surface runoff shows different peak times from those in the GLDAS dataset, but is relatively close to the peak times of subsurface runoff in the GLDAS Noah, which are mainly controlled by the peak characteristics in the Northern Hemisphere. The AVIM gives a late start time for snowmelt, and this results in a surface runoff peak that is delayed by 2 months relative to the UNH-GRDC result. The spatial pattern of the surface soil moisture is reasonable in the AVIM. The AVIM overestimates the soil moisture in high latitudes of the Northern Hemisphere, especially in April. The peak times in the annual cycles of the AVIM and CCI datasets differ significantly in both the Northern and Southern Hemispheres. Moreover, the AVIM surface runoff and surface soil moisture both show very high values in the southern and western Tibetan Plateau. The annual cycle of evapotranspiration in the AVIM matches well with that of the GLEAM dataset with respect to the peak time and magnitude. However, the spatial distribution of the AVIM evapotranspiration is relatively poor, particularly in May, and the AVIM overestimates evapotranspiration in arid areas and underestimates evapotranspiration over the low-latitude regions near the equator.
The evapotranspiration in the MERRA-2 dataset matches well overall with the GLEAM dataset in terms of spatial distribution, spatial variation, and annual cycle, but with slightly higher evapotranspiration. However, the MERRA-2 dataset does not perform well for the surface soil moisture compared with the CCI dataset. It overestimates the soil moisture in northern high-latitude regions and shows an even larger bias than the AVIM result. The difference in the annual cycles between the global CCI and MERRA-2 datasets is primarily caused by the difference in the annual cycle in the Northern Hemisphere.
These results clearly show that the hydrological processes in the AVIM need further improvement and point out the specific improvement directions. The evaluation is expected to guide the development of the AVIM model and support future studies of water-carbon interactions and climate change projection, which are both important research topics.