Evaluation of the Common Land Model ( CoLM ) from the Perspective of Water and Energy Budget Simulation : Towards Inclusion in CMIP 6

Land surface models (LSMs) are important tools for simulating energy, water and momentum transfer across the land–atmosphere interface. Many LSMs have been developed over the past 50 years, including the Common Land Model (CoLM), a LSM that has primarily been developed and maintained by Chinese researchers. CoLM has been adopted by several Chinese Earth System Models (GCMs) that will participate in the Coupled Model Intercomparison Project Phase 6 (CMIP6). In this study, we evaluate the performance of CoLM with respect to simulating the water and energy budgets. We compare simulations using the latest version of CoLM (CoLM2014), the previous version of CoLM (CoLM2005) that was used in the Beijing Normal University Earth System Model (BNU-GCM) for CMIP5, and the Community Land Model version 4.5 (CLM4.5) against global diagnostic data and observations. Our results demonstrate that CLM4.5 outperforms CoLM2005 and CoLM2014 in simulating runoff (R), although all three models overestimate runoff in northern Europe and underestimate runoff in North America and East Asia. Simulations of runoff and snow depth (SNDP) are substantially improved in CoLM2014 relative to CoLM2005, particularly in the Northern Hemisphere. The simulated global energy budget is also substantially improved in CoLM2014 relative to CoLM2005. Simulations of sensible heat (SH) based on CoLM2014 compare favorably to those based on CLM4.5, while root-mean-square errors (RMSEs) in net surface radiation indicate that CoLM2014 (RMSE = 16.02 W m−2) outperforms both CoLM2005 (17.41 W m−2) and CLM4.5 (23.73 W m−2). Comparisons at regional scales show that all three models perform poorly in the Amazon region but perform relatively well over the central United States, Siberia and the Tibetan Plateau. Overall, CoLM2014 is improved relative to CoLM2005, and is comparable to CLM4.5 with respect to many aspects of the energy and water budgets. Our evaluation confirms CoLM2014 is suitable for inclusion in Chinese GCMs, which will increase the diversity of LSMs considered during CMIP6.


Introduction
The land surface is an important component of the Earth system, serving as the lower boundary of the atmosphere and regulating exchanges of radiation, heat, water, momentum and greenhouse gases (including CO 2 ) between the land and the atmosphere [1][2][3][4].Terrestrial heat and water storage also constitutes a significant source of memory within the climate system [5].For example, variations in soil moisture substantially influence both ecosystem behavior and boundary layer characteristics, especially in regions where evapotranspiration is a predominantly biophysical process [6].
Land surface models (LSMs) simulate fluxes of radiation, heat, water and momentum at the land-atmosphere interface.The development of LSMs to this point has occurred in three stages.The first stage was the initial development of LSMs during the 1960s, including early versions of the "bucket" model [7,8].In the second stage, LSMs began to consider vegetation physiology.Examples of LSMs developed during this stage include the Biosphere-Atmosphere Transfer Scheme (BATS) [9], the Simple Biosphere model (SiB) [10], and the Simplified Simple Biosphere model (SSiB) [11].In the third stage, models were modified to include representations of the carbon cycle and vegetation biochemical processes.Examples of third-stage models include the SiB2 [12], the Community Land Model (CLM) [13], and the Common Land Model (CoLM) [14,15].
Improvement and development of LSMs are vital steps toward improving climate model simulations and producing more reliable projections of climate change.Several offline multi-model intercomparison projects have been conducted at the international level in recent years, including the Project for the Intercomparison of Land-Surface Parameterization Schemes (PILPS) [16], which focused on improving LSM parameterizations at local to regional scales, and the Global Soil Wetness Project Phase 2 (GSPW-2) [17], a global-scale LSM intercomparison study that produced the equivalent of a land-surface reanalysis containing 10 years of global soil moisture, surface flux and related hydrological data.
The CoLM, which has been developed from the initial version of the CLM, combines the advantages of three well-known models: the National Center for Atmospheric Research (NCAR) land surface model [18], BATS, and the 1994 version of the Institute of Atmospheric Physics LSM (IAP94) [19] developed at the Chinese Academy of Sciences.The CoLM has primarily been developed and maintained by Chinese researchers.To date, two versions of the CoLM have been released: CoLM2005 and CoLM2014.CoLM2005 was adopted for use in the Coupled Model Intercomparison Project Phase 5 (CMIP5) version of the Beijing Normal University Earth System Model (BNU-ESM) [20].CoLM2014 is a newly-released version that has been adopted for use in the CMIP6 versions of the BNU-ESM and the Tsinghua CIESM (Community Integrated Earth System Model).
Some aspects of CoLM2005 have previously been evaluated at regional scales.For example, various studies have shown that CoLM provides reasonable simulations of energy partitioning and the land surface state in the high-altitude arid environments of northwestern China [21] and the Tibetan Plateau [22,23].The CoLM has also been shown to adequately represent basic features of the land surface energy balance at daily and seasonal time scales in mixed forests and artificial coniferous forests in temperate and subtropical regions of China [24].Comparison of CoLM and BATS Version le (BATSle) [9] simulations for cornfields in northeastern China indicates that CoLM produces more realistic latent and sensible heat fluxes than BATSle [25].
Previous evaluations of CoLM2005 have focused almost exclusively on regional scales.Few evaluations of CoLM2005 have extended to the global scale, and CoLM2014 has not been evaluated at the global scale at all.In this study, we assess the ability of CoLM2005 and CoLM2014 to simulate the global water and energy budgets.We focus on improvements in CoLM2014 relative to CoLM2005, and evaluate both relative to the CLM, another third-stage LSM.This study is a preliminary assessment of CoLM models before they are coupled in global climate models (GCMs).The results of this evaluation will help to guide further improvements in CoLM2014.

Community Land Model (CLM)
The CLM [13] is the LSM component of the Community Earth System Model (CESM) [26] developed at NCAR.The CLM simulates biogeophysical, hydrological, and biogeochemical processes in the terrestrial environment, and accounts for many aspects of human influences and natural ecosystem dynamics.The CLM model is used in this study as a baseline.
We use CLM4.5, a state-of-the-art LSM released in June 2013 as the land component of CGCM1.2.This version includes several modifications to the equations governing canopy processes, land surface hydrology, snow cover fraction and lakes, among others.A variety of new physical systems were also introduced in this version of the model, including a surface water reservoir and a vertically-resolved soil biogeochemistry scheme.Other changes include the addition of litter, soil carbon and nitrogen pools and an accounting of nitrification and denitrification processes [27].The fire model was replaced, and the model for biogenic volatile organic compounds was updated to the Model of Emissions of Gases and Aerosols from Nature (MEGAN2.1)[28].

CoLM
Despite their common origin, there are major differences between the CoLM and the initial version of the CLM.For example, CoLM includes a one-layer, two-big-leaf submodel for photosynthesis, stomatal conductance, leaf temperature, and energy flux, and treats the sunlit and shaded parts of canopy separately [29].The sunlit fraction of the canopy is where k b is the direct beam extinction coefficient and L AI the leaf area index of the canopy.This division is important, as sunlit leaves will receive much more intense light fluxes than shaded leaves under sunny conditions.In this two-big-leaf model, shaded leaves receive diffuse solar radiation only, while sunlit leaves receive both direct and diffuse solar radiation.This type of submodel is adopted in both CLM4.5 and two CoLMs.Two versions of the CoLM have been released to date: CoLM2005 and CoLM2014.CoLM2014 is radically different from CoLM2005, particularly with respect to global land surface data, pedotransfer functions for soil hydraulic and thermal parameters, the numerical solution of the Richards equation for soil water content, and the groundwater model [30].The CaMa-Flood river model [31] is coupled to CoLM2014, enabling simulations of the global distribution of river discharge.
CoLM2005 derives seven soil hydraulic and thermal parameters (porosity, specific heat capacity soil solids, saturated hydraulic conductivity, thermal conductivity for saturated and dry soil, saturated matrix potential and exponent B defined in Clapp and Hornberger [32]) from soil sand and clay contents.The FAO (Food and Agriculture Organization) and STATSGO (State Soil Geographic) soil classification data are merged together to provide soil texture data for two soil layers (0-30 cm and 30-100 cm).In CoLM2014, calculation of soil parameters has been improved in three aspects.First, the new BNU global soil texture dataset [33] is used to provide soil properties at eight levels between the surface and 2.3 m depth.Second, the impact of soil organic matter on soil hydraulic and thermal properties is considered.Third, all seven soil hydraulic and thermal parameters have been optimized.Different results have been generated for each parameter (e.g., 18 results for porosity) using different algorithms based on soil texture data derived from the BNU data set.The median of these results has then been used to define the optimal parameter value.
The ground water model also differs between the two versions of CoLM.In CoLM2005, subsurface runoff is given by where K D is the saturated soil hydraulic conductivity for the lowermost soil layers contributing to the base flow (K D = 4 × 10 −2 mm s −1 in CoLM2005), I b is a base flow parameter for the saturated fraction of the watershed (I b = 1 × 10 −5 mm s −1 in CoLM2005), w is soil wetness weighted by soil layer thickness and hydraulic conductivity in the lowermost five soil layers, and B is the exponent defined by Clapp and Hornberger [32].The parameter z w is the mean water table depth calculated as where f z is a water table depth scale parameter, z bot is the bottom depth of the lowest soil layer, s j is are the soil wetness in soil layer j, and ∆z j is the thickness of soil layer j.The liquid soil water content in every soil layer is checked at every time step.If liquid soil water content in any layer exceeds the maximum effective liquid soil water content (porosity minus the volume of ice in the soil layer), then the excess liquid water is added to subsurface runoff (R excess in Equation ( 2)).In CoLM2014, the water table depth changes due to aquifer recharge rate, calculated as where q o is the flux of water out of the lowest soil layer and ∆w is the change in the volume of liquid water contained in the lowest soil layer.A positive value of q corresponds to a rise in the water table.The subsurface runoff is calculated according to the equation where I m is the ice impedance factor calculated as The parameters f ice and ∆z sum represent the total ice content and soil thickness for all soil layers that overlap with the water table, down to the lowest soil layer considered by the model.Two corrections are added to R in CoLM2014.The first of these corrections is equivalent to R excess in CoLM2005.The second is intended to limit irreducible wrapping of liquid water in each soil layer (0.01 kg m −2 in CoLM2014).If a soil layer contains insufficient liquid water, this deficiency is compensated by drawing from aquifer water.

Forcing Data for CoLM
Seven types of data are required to conduct an off-line simulation using CoLM2005 or CoLM2014: downward fluxes of shortwave and longwave radiation (W m −2 ), surface air temperature (K), specific humidity (kg kg −1 ), near-surface wind speed (m s −1 ), surface pressure (Pa), and precipitation rate (mm s −1 ).The forcing data used in these simulations are from the Climate Research Unit-National Centers for Environmental Prediction (CRUNCEP) v.4 data set with Qian-NCEP ocean fill [34,35].CRUNCEP is a time-varying global data set based on a combination of CRU TS3.2 time series data and NCEP-NCAR reanalysis products.The data are provided at 0.5 • × 0.5 • spatial resolution every 6 h during 1901-2010.CRUNCEP data have previously been used as boundary conditions for CLM in studies of vegetation growth, evapotranspiration, gross primary production, and trends in net land-atmosphere carbon exchange over the period 1980-2009 [27].

CLM4.5 Output
Monthly post-processed LSM products from an offline run of the CLM4.5 forced by CRUNCEP v.4 were acquired from the Earth System Grid data archive at NCAR [36].This simulation covers the period 1850-2010, with data provided on a 1.25 • (longitude) × 0.9 • (latitude) spatial grid.

Validation Data
Data sets used for model validation are listed in Table 1.Snow depth and runoff data were obtained from the CGCM Land Model Diagnostics Package.Runoff data are from the Global Composite Runoff Data Set (GRDC, v1.0) provided by the Complex Systems Research Center at the University of New Hampshire [37].Snow depth data are from the United States Air Force (USAF) Environmental Technical Applications Center (ETAC) [38].Both of these data sets are provided at monthly resolution on a 0.5 • × 0.5 • spatial grid, and available online [39].Surface Soil moisture (SM) is taken from the European Centre for Medium-Range Weather Forecasts (ECMWF) Interim reanalysis (ERA-Interim) [40].Land surface temperatures (LST) are based on the MOD11C3 and MYD11C3 products from the Moderate Resolution Imaging Spectroradiometer (MODIS) instruments on the Terra and Aqua satellites.Net radiation (NRad) data are obtained from the NASA/GEWEX Surface Radiation Budget (SRB) Release-3.0data set.We use monthly estimates of global net radiation fluxes on a 1 • × 1 • grid.Global distributions of surface latent (LE) and sensible heat fluxes are based on the model tree ensemble (MTE) upscaling of eddy covariance measurements conducted by Jung et al. [41].These data are provided monthly on a 0.5 • × 0.5 • global grid between 1982 and 2008.

Experimental Design and Variables Evaluated
CRUNCEP v.4 data was first interpolated to the time steps of both CoLM models (30 min) using the same interpolation methods as in the offline CLM4.5 simulation.Six-hourly precipitation rates are assumed to apply for all model time steps within the corresponding 6-hour period.Downward solar radiation was interpolated in time using a diurnal function based on the cosine of the solar zenith angle.The other four forcing variables were linearly interpolated to model time steps.Details of the interpolation methods used for all forcing variables are provided in Chapter 26 of the CLM4.5 technical documentation [27].
In CLM4.5, each grid cell may contain a different number of land units, each land unit may contain a different number of columns, and each column may contain multiple PFTs (Plant Functional Types), where the fractions of different PFTs are derived from MODIS data [27].In both versions of CoLM, each surface grid cell is subdivided into a specified number of tiles, where each tile contains a single land cover type [42].Although CLM4.5 and CoLM use different subgrid hierarchies, the land cover map was derived from MODIS data in both cases.The land states are thus broadly consistent across the three models.This study focuses on historical simulations covering 1980-2009 on a 0.5 • spatial grid.For consistency with previous studies [43], we adopt a static land cover map based on MODIS observations from the year 2000.
Three variables related to the terrestrial water cycle (runoff, snow depth, and soil moisture) are evaluated at the global scale.Four additional variables related to land-atmosphere energy exchange (latent heat, sensible heat, land surface temperature, and net radiation) are evaluated on both global and regional scales.

Data Processing and Evaluation Metrics
All simulation results and validation data were mapped onto a 1.25 • (longitude) × 0.9 • (latitude) spatial grid using area-weighted averages.The spatial patterns produced by the models are evaluated against time-mean fields calculated from the validation data.To ensure a fair comparison, the period over which model outputs are averaged is varied to match the period over which the validation data are available (e.g., for LST, the averaging period is 2000-2009).Quantitative evaluation is based on a variety of metrics, including area-weighted global means, root-mean-square errors (RMSEs), centered pattern correlations (CPCs), and zonal mean distributions.The centered pattern correlation (CPC) is defined as where Here, S est is the spatial pattern produced by the model, S obs is the spatial pattern in the corresponding validation data, the overline indicates the area weighted global mean, and n xy is the number of cells in the spatial grid.
Evaluations of the simulated seasonal cycles are presented for four regions with different land cover types and climates, including the Amazon (10 • S-0 • , 70 • -50 • W), the Central US (30 • -50 • N, 105 • -90 • W), Siberia (50 • -66.5 • N, 60 • -140 • E), and the Tibetan Plateau (30 To evaluate diurnal cycles of latent heat and sensible heat fluxes, we use measurements of latent and sensible heat fluxes collected during 2003 at three flux sites: BR-Sa3, located in the Amazon; RU-Ha1, located in Siberia; and US-Ne3, located in the Central US.No flux site data is available for the Tibetan Plateau.Details of these three sites are provided at fluxdata site list website [44].Figure 1 shows the locations of the four focus regions and three flux sites used in this study. Validations of the mean seasonal cycle are based on three statistical metrics.The mean bias (BIAS) is calculated as: where t is the month and n t is the number of months with both simulated and observed data.BIAS varies between −∞ and +∞, with 0 representing the optimal case.The Nash-Sutcliffe efficiency (NASH) is calculated as: ∑ OBS(t) − OBS 2 (11) where the overline represents the time mean.NASH varies from −∞ to 1, where a value of 1 is optimal, a value of 0 indicates that the model simulates the mean state well but not the variability, and a value less than 0 indicates poor performance in both aspects.The root-mean-square error (RMSE) is calculated as: Smaller values of RMSE indicate closer agreement between simulated and observed variability.

Runoff
Runoff is an important component of the water budget.Variations in runoff are determined by combined variations in precipitation, evapotranspiration, interception of precipitation by the canopy, and infiltration.Figure 2 shows simulated spatial distributions of annual mean runoff (Figure 2a-c) and biases relative to validation data (Figure 2d-f) for CLM4.5, CoLM2005, and CoLM2014.Simulated runoff over Antarctica and most of Greenland are masked for consistency with the observed data.Among these three models, CLM4.5 has the highest CPC (0.80), smallest RMSE (0.76 mm day −1 ), and smallest mean error (−0.22 mm day −1 ) relative to the validation standard.These metrics indicate that CLM4.5 performs slightly better than either version of CoLM with respect to this variable.The two versions of CoLM perform similarly with respect to these global metrics.All three models generally underestimate runoff, especially in the southern Amazon, South China, and Indonesia; however, systematic runoff overestimates are also evident in some regions near the Amazon basin.The spatial distributions of biases relative to the validation standard also show some differences among the three models.Underestimates of runoff in the eastern United States and Indonesia are reduced in CLM4.5 relative to the two versions of CoLM, while the magnitudes of overestimates in the Amazon and underestimates in South China are smaller in CoLM2014 than in the other two models.CoLM2005, by contrast, produces more pronounced underestimates in tropical Africa than CLM4.5 or CoLM2014.These results are broadly consistent with previous evaluations of CLM4.5 [45].
Figure 2g shows zonal mean distributions of simulated and observed runoff.CLM4.5 produces a more realistic simulation of runoff in the deep tropics (15° S-5° N) than either version of CoLM.Although CoLM2005 and CoLM2014 perform similarly in the Southern Hemisphere, runoff in the Northern Hemisphere is more realistic in CoLM2014 than in CoLM2005.CoLM2005 and CoLM2014 use the same surface runoff model, but the groundwater model has been updated in CoLM2014.This change influences simulations of both surface and subsurface runoff, and produces different simulations of total runoff.Overall, all three models systematically underestimate runoff in most latitude bands, as also indicated by the global spatial distributions (Figure 2d-f).

Runoff
Runoff is an important component of the water budget.Variations in runoff are determined by combined variations in precipitation, evapotranspiration, interception of precipitation by the canopy, and infiltration.Figure 2 shows simulated spatial distributions of annual mean runoff (Figure 2a-c) and biases relative to validation data (Figure 2d-f) for CLM4.5, CoLM2005, and CoLM2014.Simulated runoff over Antarctica and most of Greenland are masked for consistency with the observed data.Among these three models, CLM4.5 has the highest CPC (0.80), smallest RMSE (0.76 mm day −1 ), and smallest mean error (−0.22 mm day −1 ) relative to the validation standard.These metrics indicate that CLM4.5 performs slightly better than either version of CoLM with respect to this variable.The two versions of CoLM perform similarly with respect to these global metrics.All three models generally underestimate runoff, especially in the southern Amazon, South China, and Indonesia; however, systematic runoff overestimates are also evident in some regions near the Amazon basin.The spatial distributions of biases relative to the validation standard also show some differences among the three models.Underestimates of runoff in the eastern United States and Indonesia are reduced in CLM4.5 relative to the two versions of CoLM, while the magnitudes of overestimates in the Amazon and underestimates in South China are smaller in CoLM2014 than in the other two models.CoLM2005, by contrast, produces more pronounced underestimates in tropical Africa than CLM4.5 or CoLM2014.These results are broadly consistent with previous evaluations of CLM4.5 [45].
Figure 2g shows zonal mean distributions of simulated and observed runoff.Most runoff underestimates in these three models are located in the tropics and mid-latitudes, with differences in simulated runoff largely reflecting differences in the precipitation forcing [46].Figure 3 shows biases in net precipitation (sum of interception and evapotranspiration subtracted from precipitation) and runoff simulated by CLM4.5 forced by the QIAN data set [47].Relative to CLM4.5 forced by CRUNCEP data (Figure 2d), CLM4.5 forced by QIAN data simulates larger amounts of runoff in the tropics and mid-latitudes (Figure 3b).Moreover, the largest biases in simulated runoff (Figure 2d-f) are generally co-located with larger discrepancies in precipitation between the CRUNCEP and QIAN data sets (Figure 3a).Uncertainties in the CRUNCEP precipitation forcing are thus one possible cause of the relatively large runoff biases produced by Most runoff underestimates in these three models are located in the tropics and mid-latitudes, with differences in simulated runoff largely reflecting differences in the precipitation forcing [46].Figure 3 shows biases in net precipitation (sum of interception and evapotranspiration subtracted from precipitation) and runoff simulated by CLM4.5 forced by the QIAN data set [47].Relative to CLM4.5 forced by CRUNCEP data (Figure 2d), CLM4.5 forced by QIAN data simulates larger amounts of runoff in the tropics and mid-latitudes (Figure 3b).Moreover, the largest biases in simulated runoff (Figure 2d-f) are generally co-located with larger discrepancies in precipitation between the CRUNCEP and QIAN data sets (Figure 3a).Uncertainties in the CRUNCEP precipitation forcing are thus one possible cause of the relatively large runoff biases produced by these three model simulations.Other potential sources of error include parametric uncertainties and other model deficiencies.For example, underestimation in areas with high soil clay contents (e.g., the southern Amazon and Central Africa) is potentially related to a lack of fast water infiltration [48].
Atmosphere 2017, 8, 141 9 of 33 these three model simulations.Other potential sources of error include parametric uncertainties and other model deficiencies.For example, underestimation in areas with high soil clay contents (e.g., the southern Amazon and Central Africa) is potentially related to a lack of fast water infiltration [48].Figure 4 shows seasonal cycles of runoff in the four focus regions based on the three model simulations and the validation data.Statistical metrics summarizing the correspondence between the simulated seasonal cycles of runoff and the validation standard are listed in Table 2.All three models produce reasonable seasonal cycles of runoff in the Amazon but underestimate the amplitude.CLM4.5 and CoLM2005 are broadly similar, while CoLM2014 produces less runoff and higher surface soil moisture (see Section 3.3 below) than the other two models from January to July.CLM4.5 provides the best simulation of the runoff seasonal cycle in the Central US, with the largest NASH (0.687) and the smallest RMSE (0.083).CoLM2014 is slightly improved in this region relative to CoLM2005, particularly during summer (JJA), but with a less realistic simulation of runoff during autumn (SON).CLM4.5 also provides the best performance in Siberia, although all three models underestimate runoff in this region between July and December.CoLM2014 better captures the magnitude of the annual cycle than CoLM2005, but the annual peak in runoff occurs one month too early in both versions of CoLM.None of the models can adequately simulate the seasonal cycle of Figure 4 shows seasonal cycles of runoff in the four focus regions based on the three model simulations and the validation data.Statistical metrics summarizing the correspondence between the simulated seasonal cycles of runoff and the validation standard are listed in Table 2.All three models produce reasonable seasonal cycles of runoff in the Amazon but underestimate the amplitude.CLM4.5 and CoLM2005 are broadly similar, while CoLM2014 produces less runoff and higher surface soil moisture (see Section 3.3 below) than the other two models from January to July.CLM4.5 provides the best simulation of the runoff seasonal cycle in the Central US, with the largest NASH (0.687) and the smallest RMSE (0.083).CoLM2014 is slightly improved in this region relative to CoLM2005, particularly during summer (JJA), but with a less realistic simulation of runoff during autumn (SON).CLM4.5 also provides the best performance in Siberia, although all three models underestimate runoff in this region between July and December.CoLM2014 better captures the magnitude of the annual cycle than CoLM2005, but the annual peak in runoff occurs one month too early in both versions of CoLM.None of the models can adequately simulate the seasonal cycle of runoff on the Tibetan Plateau.CLM4.5 and CoLM2014 produce similar seasonal cycles, but the magnitude is closer to observed in CoLM2014.

Snow Depth
Figure 5 shows global spatial distributions of simulated snow depth (Figure 5a-c), spatial distributions of biases in simulated snow depth relative to the validation standard (Figure 5d-f), and zonal mean estimates of snow depth from simulations and observations (Figure 5g).As with runoff, snow cover over Antarctica and most of Greenland are masked for the purposes of these comparisons.The two versions of CoLM produce very similar results, with CPCs of 0.63 and RMSEs of 0.10 m.Whereas CLM4.5 slightly overestimates annual mean snow depth relative to observations, CoLM2005 and CoLM2014 slightly underestimate annual mean snow depth.However, both versions of CoLM produce larger maximum snow depths than CLM4.5.

Snow Depth
Figure 5 shows global spatial distributions of simulated snow depth (Figure 5a-c), spatial distributions of biases in simulated snow depth relative to the validation standard (Figure 5d-f), and zonal mean estimates of snow depth from simulations and observations (Figure 5g).As with runoff, snow cover over Antarctica and most of Greenland are masked for the purposes of these comparisons.The two versions of CoLM produce very similar results, with CPCs of 0.63 and RMSEs of 0.10 m.Whereas CLM4.5 slightly overestimates annual mean snow depth relative to observations, CoLM2005 and CoLM2014 slightly underestimate annual mean snow depth.However, both versions of CoLM produce larger maximum snow depths than CLM4.5.All three models overestimate snow depths in northern Europe and underestimate snow depths in North America and East Asia.CoLM2005 and CoLM2014 show smaller positive biases in northern Europe relative to CLM4.5, but larger negative biases in East Asia and North America.CoLM2014 produces the most realistic simulations of zonal mean snow depth among the three models, with slight improvements relative to CoLM2005 in most latitude bands.
Figure 6 and Table 3 summarize the model evaluation for regional seasonal cycles of snow depth.CoLM2014 provides the most realistic seasonal cycles of snow depth over Siberia and the All three models overestimate snow depths in northern Europe and underestimate snow depths in North America and East Asia.CoLM2005 and CoLM2014 show smaller positive biases in northern Europe relative to CLM4.5, but larger negative biases in East Asia and North America.CoLM2014 produces the most realistic simulations of zonal mean snow depth among the three models, with slight improvements relative to CoLM2005 in most latitude bands.
Figure 6 and Table 3 summarize the model evaluation for regional seasonal cycles of snow depth.CoLM2014 provides the most realistic seasonal cycles of snow depth over Siberia and the Tibetan Plateau, while CLM4.5 provides the best performance for the Central US.Both versions of CoLM underestimate snow depths throughout the year in the Central US, Siberia, and the Tibetan Plateau, while CLM4.5 generally overestimates snow depths in the same three regions.
CoLM2005 and CoLM2014 calculate snow depth in different ways.Precipitation arrives at the land surface either by direct throughfall or by drainage from the overlying canopy.In CoLM2005, precipitation is divided into snowfall and rainfall after removing interception of precipitation by the canopy.By contrast, in CoLM2014, precipitation is first divided into snowfall and rainfall and then interception is calculated separately for each part.Canopy drainage of fresh snowfall is assumed to be zero in CoLM2014, so this difference will result in differences in the net amount and timing of precipitation arriving at the land surface.Another important change is the fraction of potential interception, which has been changed from a value of 1 in CoLM2005 to a value of 0.25 in CoLM2014.Both models capture seasonal variability relatively well, but CoLM2014 consistently presents slight improvements relative to CoLM2005 in these regions.
Atmosphere 2017, 8, 141 12 of 33 CoLM2005 and CoLM2014 calculate snow depth in different ways.Precipitation arrives at the land surface either by direct throughfall or by drainage from the overlying canopy.In CoLM2005, precipitation is divided into snowfall and rainfall after removing interception of precipitation by the canopy.By contrast, in CoLM2014, precipitation is first divided into snowfall and rainfall and then interception is calculated separately for each part.Canopy drainage of fresh snowfall is assumed to be zero in CoLM2014, so this difference will result in differences in the net amount and timing of precipitation arriving at the land surface.Another important change is the fraction of potential interception, which has been changed from a value of 1 in CoLM2005 to a value of 0.25 in CoLM2014.Both models capture seasonal variability relatively well, but CoLM2014 consistently presents slight improvements relative to CoLM2005 in these regions.

Soil Moisture
Albergel et al. [49] validated ERA-Interim soil moisture products against soil moisture measurements from 117 stations representing a variety of different biomes and climate conditions around the world.They found that ERA-Interim agrees well with observational measurements (the mean correlation coefficient is 0.63) but tends to overestimate soil moisture (the mean bias is 0.079 m 3 m −3 ).
Figure 7 shows the distributions and biases of surface layer soil moisture (0-7 cm for ERA-Interim and 0-5 cm for the three model simulations).Relative to ERA-Interim, global mean soil moisture biases in CLM4.5, CoLM2005, CoLM2014 are 0.01 m 3 m −3 , −0.07 m 3 m −3 , and −0.07 m 3 m −3 , respectively.Leaving aside the spatial distribution, and given that ERA-Interim overestimates soil moisture with a global mean bias of 0.079 m 3 m −3 , the two versions of CoLM provide reasonable simulations of soil moisture on the global scale.
Simulated soil moisture contents in Siberia and northern Canada are noticeably different among the three models.Sugimoto et al. [50] reported in situ measurements of soil moisture at a larch forest site in eastern Siberia (62.15 • N, 129.37 • E).Surface layer (0-15 cm) volumetric soil water contents observed at this site range from 10 to 30% during June-August 2000 (their Figure 4), with most measurements closer to 10%.Simulated mean soil moisture at this site during JJA is 33% based on CLM4.5, 12% based on CoLM2005, 9% based on CoLM2014, and 33% based on ERA-Interim.CLM4.5 and ERA-Interim appear to overestimate soil moisture, while the two versions of CoLM produce values more consistent with observations.The Canadian Experiment for Soil Moisture in 2010 (CanEx-SM10) provided measurements of in situ soil moisture at a boreal forest site (53.59 • N-54.27 • N; 104.32 • W-104.99 • W) in northern Canada [7,51].Observed soil moisture at this site was steady throughout the summer at approximately 0.1 m 3 m −3 .Mean simulated soil moisture for JJA at this boreal forest site is 0.24 m 3 m −3 in CLM4.5, 0.17 m 3 m −3 in CoLM2005, and 0.10 m 3 m −3 in CoLM2014, indicating that CoLM2014 produces the best agreement with observations.
Figure 8 shows reanalysis and simulated seasonal cycles of soil moisture in the four focus regions.Among the three models, CoLM2014 is most similar to ERA-Interim for the Tibetan Plateau, while CoLM2005 is most similar to ERA-Interim for Siberia.Both versions of CoLM underestimate soil moisture relative to ERA-Interim in the Central US, Siberia, and Tibetan Plateau regions.By contrast, CLM4.5 overestimates soil moisture in Siberia and underestimates soil moisture on the Tibetan Plateau.All three models provide reasonable simulations of the seasonal cycle of surface soil moisture in the Amazon, but overestimate the amplitude of the seasonal cycle relative to ERA-Interim.CoLM2014 is the most different from the reanalysis.
In the Central US, the largest NASH (0.287) and smallest RMSE (0.014 m 3 m −3 ) are achieved by CLM4.5 (Table 4), but CoLM2014 provides a noticeably better simulation of surface soil moisture in this region than CoLM2005.In Siberia, the two versions of CoLM produce a qualitatively similar seasonal cycle to that produced by ERA-Interim but tend to underestimate soil moisture content relative to the reanalysis (see also discussion above).CLM4.5 overestimates surface soil moisture in this region in all months except for July and August.In the Tibetan Plateau, CoLM2005 and CLM4.5 produce similar results, neither of which agree well with ERA-Interim.By contrast, the seasonal cycle simulated by CoLM2014 in this region is in good qualitative agreement with that produced by ERA-Interim, despite a low bias in soil moisture content.
Atmosphere 2017, 8, 141 14 of 33     Both versions of CoLMs estimate water flow within soils by the equation where K is the soil hydraulic conductivity, ψ is the soil matric potential, and z is soil depth.Equation (13) indicates two potential sources of differences in simulated soil moisture.First, differences in hydraulic conductivity and matric potential may arise from differences in the prescribed pedo-transfer functions (PTFs) and soil textures between the two versions of CoLM.For example, the inclusion of soil organic matter in CoLM2014 results in a larger saturated hydraulic conductivity relative to CoLM2005 (figure not shown).Second, methods of calculating hydraulic conductivity within a given soil layer differ between the two versions of CoLM.In CoLM2005, hydraulic conductivity is calculated as 0.5 0.5 where θ is liquid volumetric soil moisture, θ SAT is porosity, and j is the index of the soil layer.A different method is used in CoLM2014.First a variable α is introduced, defined as  Both versions of CoLMs estimate water flow within soils by the equation where K is the soil hydraulic conductivity, ψ is the soil matric potential, and z is soil depth.Equation (13) indicates two potential sources of differences in simulated soil moisture.First, differences in hydraulic conductivity and matric potential may arise from differences in the prescribed pedo-transfer functions (PTFs) and soil textures between the two versions of CoLM.For example, the inclusion of soil organic matter in CoLM2014 results in a larger saturated hydraulic conductivity relative to CoLM2005 (figure not shown).Second, methods of calculating hydraulic conductivity within a given soil layer differ between the two versions of CoLM.In CoLM2005, hydraulic conductivity is calculated as where θ is liquid volumetric soil moisture, θ SAT is porosity, and j is the index of the soil layer.A different method is used in CoLM2014.First a variable α is introduced, defined as with ψ the matric potential and ∆z the thickness of the soil layer.The hydraulic conductivity is then calculated by the piecewise function

2B+3
, α < 0 (16)  Figure 10 shows simulated and analyzed annual cycles of latent heat fluxes for the four focus regions.CoLM2005 and CLM4.5 consistently overestimate latent heat fluxes in the Amazon Basin, while CoLM2014 underestimates latent heat fluxes during the summer dry season and overestimates latent heat fluxes during other months.The three simulated annual cycles are similar to each other but substantially different from the FLUXNET MTE data.Errors in simulated surface soil moisture (Figure 8) may contribute to this disagreement.Addressing these errors may involve improving model representations of root dynamics [52].Biases in prescribed boundary conditions may also contribute to these errors.For example, Wang et al. [53] reported that CRUNCEP consistently overestimates monthly mean downward longwave and shortwave radiation relative to observations at a site in the tropical Amazon (3.02 • S, 53.97 • W; their Figure 7).Figure 11 shows the monthly mean simulated and observed diurnal cycles of latent heat flux at three flux sites.Mean diurnal cycles are shown for July and November 2003.All three models broadly capture the mean diurnal cycle of latent heat flux in June at the three selected sites, but with significant low biased between 10:00 and 15:00 local solar time.These low biases indicate that the models underestimate evapotranspiration during this part of the day.The situation is different in November.At BR Sa3, which is located in the Amazon, the timing of the peak latent heat flux produced by CoLM2005 is too early in the day, while that produced by CLM4.5 is too late in the day.CoLM2014 produces a more qualitatively reasonable simulation of the diurnal cycle, but with low biases relative to observed.At RU Ha1, which is located in Siberia, all three models fail to capture the diurnal cycle of latent heat.The observed latent heat flux at this site is positive from 15:00 local solar time through the early morning, but simulated latent heat fluxes are approximately zero after about 16:00 (Local Time).Both versions of CoLM overestimate daytime latent heat fluxes at US-Ne2, which is located in the Central US.By contrast, CLM4.5 underestimates the daytime peak in latent heat flux at this site.

Latent Heat Flux
The soil parameters and PTFs used in CoLM2014 and CoLM2005 differ (Section 2.1.2).These differences have wide-ranging impacts on the model simulations.Differences in soil parameters All three models perform relatively well in Siberia, with NASH efficiency coefficients of 0.983 for CLM4.5, 0.953 for CoLM2005, and 0.946 for CoLM2014 (Table 5).CoLM2014 overestimates latent heat fluxes on the Tibetan Plateau, and underperforms both CLM4.5 and CoLM2005 in this region.In the Central US, the NASH efficiency coefficients are 0.839 for CLM4.5, 0.913 for CoLM2005, and 0.880 for CoLM2014.Discrepancies among the model simulations for this region are larger during summer than during the other three seasons, as are differences between the model simulations and the validation standard.CLM4.5 overestimates latent heat fluxes in the Central US in July.Both versions of the CoLM match the validation standard better than CLM4.5;however, all three models produce an annual maximum in June when FLUXNET MTE places this maximum in late July.Figure 11 shows the monthly mean simulated and observed diurnal cycles of latent heat flux at three flux sites.Mean diurnal cycles are shown for July and November 2003.All three models broadly capture the mean diurnal cycle of latent heat flux in June at the three selected sites, but with significant low biased between 10:00 and 15:00 local solar time.These low biases indicate that the models underestimate evapotranspiration during this part of the day.The situation is different in November.At BR Sa3, which is located in the Amazon, the timing of the peak latent heat flux produced by CoLM2005 is too early in the day, while that produced by CLM4.5 is too late in the day.CoLM2014 produces a more qualitatively reasonable simulation of the diurnal cycle, but with low biases relative to observed.At RU Ha1, which is located in Siberia, all three models fail to capture the diurnal cycle of latent heat.The observed latent heat flux at this site is positive from 15:00 local solar time through the early morning, but simulated latent heat fluxes are approximately zero after about 16:00 (Local Time).Both versions of CoLM overestimate daytime latent heat fluxes at US-Ne2, which is located in the Central US.By contrast, CLM4.5 underestimates the daytime peak in latent heat flux at this site.
The soil parameters and PTFs used in CoLM2014 and CoLM2005 differ (Section 2.1.2).These differences have wide-ranging impacts on the model simulations.Differences in soil parameters cause differences in the simulated states of the soil and surface air, which in turn alter simulations of land surface evapotranspiration.For example, soil evaporation depends on the specific humidity of the surface air, which is in turn controlled by the ground surface temperature and the wetness of the topsoil.Simulated climatological mean ground surface temperatures for Central US in July are 301.2K in CoLM2005 and 302.0 K in CoLM2014, while the simulated mean soil moisture is greater in CoLM2005 than in CoLM2014.These differences in soil moisture and ground surface temperature may cause differences in the simulated rate of soil evaporation.Differences between CLM4.5 and the two versions of CoLM are also worth noting.CLM4.5 uses an empirical function of soil water [54] to calculate soil evaporation.This equation, which is intended to represent molecular processes that connect soil pores to the surface within dry soils, has no equivalent in CoLM.Calculations of latent and sensible heat fluxes for non-vegetated surfaces also differ.For CLM4.5, latent and sensible heat fluxes are calculated separately for bare soil, snow-covered, and water-covered surfaces.The area-weighted mean of these fluxes is then used to represent the net flux from non-vegetated surfaces within the grid cell.Neither version of CoLM considers differences in latent and sensible heat fluxes across these different surface types within non-vegetated surfaces.

Sensible Heat Flux
Figure 12 shows spatial distributions in sensible heat fluxes and biases relative to the validation standard.Overall, distributions of sensible heat fluxes produced by three models are less consistent with FLUXNET MTE than are the corresponding distributions of latent heat fluxes.All three models have similar global CPCs and RMSEs relative to the validation standard, but with some regional differences.CLM4.5 overestimates sensible heat fluxes in the Amazon, whereas both versions of CoLM are closer to the validation standard.CLM4.    Figure 13 shows mean seasonal cycles of sensible heat flux for the four focus domains.In the Amazon, CLM4.5 substantially overestimates sensible heat exchange, while both versions of CoLM overestimate the amplitude of the seasonal cycle.Although CoLM2014 represents an improvement over CoLM2005 and CLM4.5, it still produces large biases in this region.CoLM2005 provides the most realistic simulation of sensible heat fluxes in Siberia with a NASH efficiency coefficient of 0.940 (Table 6).CLM4.5 and CoLM2014 underestimate sensible heat fluxes in Siberia from January to June.All three models indicate that the largest monthly mean sensible heat flux in this region occurs in June; however, the FLUXNET MTE data indicate that this annual maximum occurs one month earlier, in May.

Sensible Heat Flux
All three models underestimate sensible heat fluxes in the Tibetan Plateau but simulate qualitatively reasonable seasonal cycles.CoLM2005 produces the greatest NASH efficiency coefficient (0.874), followed by CLM4.5 (0.720).CoLM2014 significantly underestimates the sensible heat flux, but with a seasonal cycle that is otherwise very similar to that indicated by FLUXNET MTE.In the Central US, all three models fail to estimate the seasonal cycle of sensible heat flux.Overall, there is no noticeable improvement in CoLM2014 relative to CoLM2005.Figure 13 shows mean seasonal cycles of sensible heat flux for the four focus domains.In the Amazon, CLM4.5 substantially overestimates sensible heat exchange, while both versions of CoLM overestimate the amplitude of the seasonal cycle.Although CoLM2014 represents an improvement over CoLM2005 and CLM4.5, it still produces large biases in this region.CoLM2005 provides the most realistic simulation of sensible heat fluxes in Siberia with a NASH efficiency coefficient of 0.940 (Table 6).CLM4.5 and CoLM2014 underestimate sensible heat fluxes in Siberia from January to June.All three models indicate that the largest monthly mean sensible heat flux in this region occurs in June; however, the FLUXNET MTE data indicate that this annual maximum occurs one month earlier, in May.
All three models underestimate sensible heat fluxes in the Tibetan Plateau but simulate qualitatively reasonable seasonal cycles.CoLM2005 produces the greatest NASH efficiency coefficient (0.874), followed by CLM4.5 (0.720).CoLM2014 significantly underestimates the sensible heat flux, but with a seasonal cycle that is otherwise very similar to that indicated by FLUXNET MTE.In the Central US, all three models fail to estimate the seasonal cycle of sensible heat flux.Overall, there is no noticeable improvement in CoLM2014 relative to CoLM2005.Monthly mean diurnal cycles of sensible heat at three sites for two months are displayed in Figure 14.At BR Sa3 in July, CoLM2005 shows significant low biases from 7:00 to 12:00 local solar time, followed by high biases from 12:00 to 15:00.CoLM2014 retains some evidence of these biases, but still gives the most reasonable simulation of the diurnal cycle for this month and location among these three models.All three models overestimate daytime sensible heat fluxes at this site in November.At RU Ha1, CoLM2005 outperforms the other two models in July but produces large errors in November.None of these three models generates a reasonable simulation of the diurnal cycle at this location in November.At US Ne2, all three models fail to simulate the diurnal cycle of sensible heat flux in July.The qualitative characteristics of the simulated diurnal cycles at this location are substantially improved in November, but with quantitative biases that are consistent across the three models.Monthly mean diurnal cycles of sensible heat at three sites for two months are displayed in Figure 14.At BR Sa3 in July, CoLM2005 shows significant low biases from 7:00 to 12:00 local solar time, followed by high biases from 12:00 to 15:00.CoLM2014 retains some evidence of these biases, but still gives the most reasonable simulation of the diurnal cycle for this month and location among these three models.All three models overestimate daytime sensible heat fluxes at this site in November.At RU Ha1, CoLM2005 outperforms the other two models in July but produces large errors in November.None of these three models generates a reasonable simulation of the diurnal cycle at this location in November.At US Ne2, all three models fail to simulate the diurnal cycle of sensible heat flux in July.The qualitative characteristics of the simulated diurnal cycles at this location are substantially improved in November, but with quantitative biases that are consistent across the three models.CoLM2005 simulations of sensible heat exchange in arid and semi-arid areas are governed by three key parameters (surface albedo, surface emissivity, and soil thermal conductivity), along with the parameterizations of aerodynamic roughness length and thermal roughness length [55].Among these five major influences, only soil thermal conductivity differs substantially between CoLM2005 and CoLM2014.As shown in Figure 15, CoLM2014 has a higher dry soil thermal conductivity that allows more energy to penetrate to deeper soil layers.CoLM2005 therefore produces a higher surface skin temperature, leading to larger sensible heat fluxes in relatively arid regions like the Central US.CoLM2005 and CoLM2014 also use a different method for calculating aerodynamic roughness length than that used in CLM4.5.This difference is among the reasons why CLM4.5 produces significantly different sensible heat fluxes than those produced by CoLM2005 and CoLM2014.The method for calculating roughness length is thus a potential target for improvement in CoLM.
Sensible and latent heat fluxes in vegetated grid cells are partitioned into vegetation and ground contributions in both versions of CoLM.The vegetation components of the sensible and latent heat fluxes are related to leaf area index, with the sensible heat flux over vegetated surfaces given by ( ) where ρ atm is the density of surface air, C p is the specific heat capacity of air, and T s and T v are the canopy air and leaf temperatures, respectively.L and S are the leaf and stem area indices, respectively, while r b is the leaf boundary layer resistance.LAI (or L) is calculated by using an empirical model that depends on soil temperature in CoLM2005, while LAI is prescribed based on CoLM2005 simulations of sensible heat exchange in arid and semi-arid areas are governed by three key parameters (surface albedo, surface emissivity, and soil thermal conductivity), along with the parameterizations of aerodynamic roughness length and thermal roughness length [55].Among these five major influences, only soil thermal conductivity differs substantially between CoLM2005 and CoLM2014.As shown in Figure 15, CoLM2014 has a higher dry soil thermal conductivity that allows more energy to penetrate to deeper soil layers.CoLM2005 therefore produces a higher surface skin temperature, leading to larger sensible heat fluxes in relatively arid regions like the Central US.CoLM2005 and CoLM2014 also use a different method for calculating aerodynamic roughness length than that used in CLM4.5.This difference is among the reasons why CLM4.5 produces significantly different sensible heat fluxes than those produced by CoLM2005 and CoLM2014.The method for calculating roughness length is thus a potential target for improvement in CoLM.
Sensible and latent heat fluxes in vegetated grid cells are partitioned into vegetation and ground contributions in both versions of CoLM.The vegetation components of the sensible and latent heat fluxes are related to leaf area index, with the sensible heat flux over vegetated surfaces given by where ρ atm is the density of surface air, C p is the specific heat capacity of air, and T s and T v are the canopy air and leaf temperatures, respectively.L and S are the leaf and stem area indices, respectively, while r b is the leaf boundary layer resistance.LAI (or L) is calculated by using an empirical model that depends on soil temperature in CoLM2005, while LAI is prescribed based on reprocessed MODIS observations (from 2000 to 2009) in CoLM2014 [56].Simulated sensible and latent heat fluxes may differ substantially between the two versions of CoLM in regions for which there exist significant differences in LAI.
Atmosphere 2017, 8, 141 23 of 33 reprocessed MODIS observations (from 2000 to 2009) in CoLM2014 [56].Simulated sensible and latent heat fluxes may differ substantially between the two versions of CoLM in regions for which there exist significant differences in LAI.

Ground Surface Temperature
Figure 16 shows global distributions of land surface temperature and biases relative to MODIS observations.The simulated LSTs are based on monthly means that include the full diurnal cycle (time steps every 30 min).We therefore use the mean of all four LST values from MOD11C3 (equator crossings at approximately 10:30 and 22:30 local solar time) and MYD11C3 (equator crossings at approximately 01:30 and 13:30 local solar time) as the validation standard for LST.
All three models show similar spatial patterns and produce a similar global mean land surface temperature.Relative to MODIS estimates, all three models underestimate temperatures at low and middle latitudes in the Northern Hemisphere.CLM4.5 estimates are close to MODIS through most of the Northern Hemisphere high latitudes but with a negative bias in the eastern part of Russia, while CoLM2005 and CoLM2014 show positive biases throughout the Northern Hemisphere high latitudes.Relative to CoLM2005, CoLM2014 simulations of LST are more consistent with MODIS observations in Siberia, Northwest China and the Sahara Desert, but with larger biases in Canada, equatorial Africa, and the Amazon.
Figure 17 shows seasonal cycles of simulated and observed land surface temperatures in the four focus domains and Table 7 lists statistical parameters for the simulated seasonal cycles in these domains relative to MODIS.All three models agree well with MODIS data in the Central US, Siberia, and Tibetan Plateau regions.Simulations based on CoLM2005 and CoLM2014 are similar in the Central US and the Tibetan Plateau, and match MODIS estimates in these regions slightly better than CLM4.5 does.In Siberia, CLM4.5 produces the largest NASH and smallest RMSE relative to MODIS among the three models.In the Amazon, all three models overestimate land surface temperature throughout the year.Although the simulated seasonal cycles are broadly consistent with that indicated by MODIS, the annual minimum LST is delayed by 1-2 months in the models.In addition to model-based errors, overestimation of LST in this region could also result from diurnal sampling biases in MODIS data, as the overpass times of both MODIS-bearing satellites (Aqua and Terra) miss the hottest part of the day (late afternoon to early evening).

Ground Surface Temperature
Figure 16 shows global distributions of land surface temperature and biases relative to MODIS observations.The simulated LSTs are based on monthly means that include the full diurnal cycle (time steps every 30 min).We therefore use the mean of all four LST values from MOD11C3 (equator crossings at approximately 10:30 and 22:30 local solar time) and MYD11C3 (equator crossings at approximately 01:30 and 13:30 local solar time) as the validation standard for LST.
All three models show similar spatial patterns and produce a similar global mean land surface temperature.Relative to MODIS estimates, all three models underestimate temperatures at low and middle latitudes in the Northern Hemisphere.CLM4.5 estimates are close to MODIS through most of the Northern Hemisphere high latitudes but with a negative bias in the eastern part of Russia, while CoLM2005 and CoLM2014 show positive biases throughout the Northern Hemisphere high latitudes.Relative to CoLM2005, CoLM2014 simulations of LST are more consistent with MODIS observations in Siberia, Northwest China and the Sahara Desert, but with larger biases in Canada, equatorial Africa, and the Amazon.
Figure 17 shows seasonal cycles of simulated and observed land surface temperatures in the four focus domains and Table 7 lists statistical parameters for the simulated seasonal cycles in these domains relative to MODIS.All three models agree well with MODIS data in the Central US, Siberia, and Tibetan Plateau regions.Simulations based on CoLM2005 and CoLM2014 are similar in the Central US and the Tibetan Plateau, and match MODIS estimates in these regions slightly better than CLM4.5 does.In Siberia, CLM4.5 produces the largest NASH and smallest RMSE relative to MODIS among the three models.In the Amazon, all three models overestimate land surface temperature throughout the year.Although the simulated seasonal cycles are broadly consistent with that indicated by MODIS, the annual minimum LST is delayed by 1-2 months in the models.In addition to model-based errors, overestimation of LST in this region could also result from diurnal sampling biases in MODIS data, as the overpass times of both MODIS-bearing satellites (Aqua and Terra) miss the hottest part of the day (late afternoon to early evening).Anomalies relative to MODIS data are shown in Figure 18 to better illustrate the differences between simulated LST and MODIS products.All three models overestimate LST in the Amazon region and underestimate LST on the Tibetan Plateau.However, underestimates over the Tibetan Plateau are reduced in both versions of CoLM relative to those produced by CLM4.5, especially during boreal summer.Both versions of CoLM overestimate LST in Siberia, while CLM4.5 underestimates LST in this region through most of the annual cycle.In the Central US, biases in simulated LSTs relative to MODIS show similar seasonal cycles for all three models.Anomalies relative to MODIS data are shown in Figure 18 to better illustrate the differences between simulated LST and MODIS products.All three models overestimate LST in the Amazon region and underestimate LST on the Tibetan Plateau.However, underestimates over the Tibetan Plateau are reduced in both versions of CoLM relative to those produced by CLM4.5, especially during boreal summer.Both versions of CoLM overestimate LST in Siberia, while CLM4.5 underestimates LST in this region through most of the annual cycle.In the Central US, biases in simulated LSTs relative to MODIS show similar seasonal cycles for all three models.

Net Radiation at the Land Surface
In general, the three models considered here generate smaller net radiation fluxes at the land surface than those indicated by SRB (Figure 19).High biases in net radiation fluxes over the Amazon Basin and equatorial Africa are larger in CLM4.5 than in either version of CoLM (Figure 19d-f).The performance of CoLM2014 is very close to that of CoLM2005, but CoLM2014 has a higher CPC (0.85) and a lower RMSE (16.13 W m −2 ) than CoLM2005 with respect to SRB.Net radiation fluxes simulated by CLM4.5 are smaller than SRB estimates in most regions, apart from some regions of South America where CLM4.5 produces high biases.Both versions of CoLM simulate larger net radiation fluxes into the surface than suggested by SRB in regions north of 50° N, while all three models overestimate net radiation fluxes in the southern part of South America.

Net Radiation at the Land Surface
In general, the three models considered here generate smaller net radiation fluxes at the land surface than those indicated by SRB (Figure 19).High biases in net radiation fluxes over the Amazon Basin and equatorial Africa are larger in CLM4.5 than in either version of CoLM (Figure 19d-f).The performance of CoLM2014 is very close to that of CoLM2005, but CoLM2014 has a higher CPC (0.85) and a lower RMSE (16.13 W m −2 ) than CoLM2005 with respect to SRB.Net radiation fluxes simulated by CLM4.5 are smaller than SRB estimates in most regions, apart from some regions of South America where CLM4.5 produces high biases.Both versions of CoLM simulate larger net radiation fluxes into the surface than suggested by SRB in regions north of 50° N, while all three models overestimate net radiation fluxes in the southern part of South America.

Net Radiation at the Land Surface
In general, the three models considered here generate smaller net radiation fluxes at the land surface than those indicated by SRB (Figure 19).High biases in net radiation fluxes over the Amazon Basin and equatorial Africa are larger in CLM4.5 than in either version of CoLM (Figure 19d-f      Differences in land surface albedo between CoLM2005 and CoLM2014 may contribute to differences in simulated net radiation.Albedo for bare soils is related to soil wetness in the surface soil layer in both models.For example, the albedo of base soil in the visible band is parameterized as min , where α sat is the albedo for saturated soil (based on soil color).This parameter differs between the two versions of CoLM, as these models use different soil color data.Δα(θ l,1 ) represents the increase of albedo with increasing soil dryness, represented as a function of surface soil water volumetric content θ l,1 .Canopy reflectance is divided into direct and diffuse portions, and is calculated by combining the effective albedos of the (infinite) canopy and the underlying ground, as well as the leaf and stem area indices.For example, the contribution to albedo of direct beam reflectance is given by  Differences in land surface albedo between CoLM2005 and CoLM2014 may contribute to differences in simulated net radiation.Albedo for bare soils is related to soil wetness in the surface soil layer in both models.For example, the albedo of base soil in the visible band is parameterized as where α sat is the albedo for saturated soil (based on soil color).This parameter differs between the two versions of CoLM, as these models use different soil color data.∆α(θ l,1 ) represents the increase of albedo with increasing soil dryness, represented as a function of surface soil water volumetric content θ l,1 .Canopy reflectance is divided into direct and diffuse portions, and is calculated by combining the effective albedos of the (infinite) canopy and the underlying ground, as well as the leaf and stem area indices.For example, the contribution to albedo of direct beam reflectance is given by where α v,f is the albedo of an infinite canopy, L SAI is the sum of the leaf and stem area indices, µ is the cosine of the solar zenith angle and ωβ is the fraction of direct sunlight scattered upward.Differences in LAI, saturated soil albedo, and surface soil wetness all contribute to differences in simulated albedo between the two versions of CoLM, which in turn result in different simulated net radiation fluxes at the land surface.

Mean Statistics
In this section, we summarize the performance of each model relative to the other models with respect to both the global annual mean and the regional seasonal cycles.Our assessment is based on the Model Climate Performance Index (MCPI) [57], which averages each model's relative error across all of the fields listed in Table 1.Relative error is defined as where E m f is the RMSE for model m and field f and E f is the median RMSE across all three models for the same field f.The latter is also called the "typical error" The median is used in place of the mean to limit the effects of outliers.Negative values indicate better than average performance, while positive values indicate worse than average performance among the three models.Figure 21 summarizes the results for the global annual means and the regional seasonal cycles.The zero line corresponds to the mean result across the three models.These figures also show the contributions of relative errors in each field f to the total MCPI.For the Central US, the minimum MCPI among the three models is achieved by CLM4.5, which produces the most realistic simulations of total runoff, snow depth, and soil moisture (three variables related to the surface water budget).However, CLM4.5 doesn't produce the same advantages with respect to the energy budget, and particularly with respect to land surface temperature.Both versions of CoLM outperform CLM4.5 with respect to land surface temperature and other aspects of the surface energy budget in this region.CoLM2005 produces much larger errors in soil moisture within the Central US than the other two models, with significant low biases relative to the ERA-Interim validation standard.

Conclusions
In this study, we have compared and evaluated simulations of runoff, snow depth, soil moisture, latent and sensible heat fluxes, land surface temperature, and net surface radiation fluxes For the regional seasonal cycles in the Amazon, CLM4.5 provides the most realistic performance of two water budget related variables (runoff and soil moisture) (Figure 21b), but again produces the largest MCPI among the three models due to large errors in sensible heat.CoLM2005 shows a significantly larger bias in latent heat flux than the other two models in this region, while CoLM2014 generates the worst soil moisture due to persistent high biases through most of the year.
In the Tibetan Plateau, CoLM2014 represents an evident improvement relative to CoLM2005 with respect to simulating snow depth, surface soil moisture, and total runoff.However, CoLM2014 diverges away from the validation standards for latent and sensible heat fluxes due to a high bias in the simulated latent heat flux and a low bias in the simulated sensible heat flux.These biases suggest that the partitioning of surface energy fluxes in CoLM2014 needs to be improved in this region.
Overall, CoLM2005 produces the best simulation of the land surface state in Siberia among the three models, as it ranks first in simulations of surface soil moisture, sensible heat flux, and net radiation.CLM4.5 provides the most accurate simulations of latent heat flux and total runoff, but the least accurate simulations of soil moisture and net radiation.Relative errors in CoLM2005 and CoLM2014 are quite similar in this region.
For the Central US, the minimum MCPI among the three models is achieved by CLM4.5, which produces the most realistic simulations of total runoff, snow depth, and soil moisture (three variables related to the surface water budget).However, CLM4.5 doesn't produce the same advantages with respect to the energy budget, and particularly with respect to land surface temperature.Both versions of CoLM outperform CLM4.5 with respect to land surface temperature and other aspects of the surface energy budget in this region.CoLM2005 produces much larger errors in soil moisture within the Central US than the other two models, with significant low biases relative to the ERA-Interim validation standard.

Conclusions
In this study, we have compared and evaluated simulations of runoff, snow depth, soil moisture, latent and sensible heat fluxes, land surface temperature, and net surface radiation fluxes based on the CLM4.5, CoLM2005, and CoLM2014 land surface models.Global centered pattern correlations and root-mean-square errors in simulations of the 1980-2009 climatological mean have been used to compare the relative performances of these three models at the global scale.These metrics have been supplemented by the BIAS, NASH, and RMSE metrics for evaluating model representations of the seasonal cycle in four focus regions.The main conclusions are as follows: 1.
The CLM4.5 provides the most realistic simulation of total runoff.The two versions of CoLM perform comparably in the global mean, with CoLM2014 representing an improvement relative to CoLM2005 in some areas.All three models underestimate runoff in regions with high soil clay contents.This situation could potentially be improved by including a representation of fast flow-through macropore structure.

2.
All models generate seasonal variations in surface soil moisture that match those produced by ERA-Interim in the four focus regions, but often with large quantitative biases.Both versions of CoLM produce reasonable simulations of surface soil moisture at the global scale.

3.
The two versions of CoLM also produce similar simulations of snow depth.CoLM2014 is most consistent with validation data in the temperate zone between 30 • N and 60 • N, while CLM4.Simulated spatial patterns and global mean land surface temperatures agree well with MODIS observations, but large biases in simulated LST are evident in the Amazon.These biases may arise in part due to diurnal sampling biases in MODIS data relative to model simulations; however, the existence of pervasive errors in multiple variables in the simulated land surface state in this region suggests that the current generation of land surface models cannot yet adequately represent the complexity of land surface processes in the Amazon basin.8.
Simulations of net radiation into the surface are similar to SRB estimates, except in the Amazon where the net radiation flux based on CLM4.5 is significantly larger than net radiation fluxes based on CoLM2005, CoLM2014, or SRB.Despite this quantitative difference, CLM4.5 still produces a qualitatively realistic seasonal cycle, as do the two versions of the CoLM.
Overall, the main advantage of CoLM2014 over CoLM2005 is that it is more stable, with relatively consistent error magnitudes among the fields evaluated here.This consistency may be attributed in part to the inclusion of additional soil parameters in CoLM2014 (such as soil organic carbon content) and the addition of an optimization algorithm for soil properties.Moreover, we find that the performance of CoLM2014 is often comparable to that of CLM4.5.CoLM2014 is a stand-alone LSM with a relatively simple structure that can be easily coupled to other atmospheric or hydrological models, and is therefore a reasonable choice for Chinese GCMs.The inclusion of CoLM2014 in Chinese GCMs will have the benefit of increasing the variety of LSMs contributing to CMIP6.There remains room for improvement in CoLM2014 before it is fully coupled into GCMs for CMIP6 simulation; the results of this evaluation study will help to guide this work.The results presented in this paper will also aid in the interpretation of BNU-GCM and CIGCM model results submitted to CMIP6.
RMSE indicate closer agreement between simulated and observed variability.

Figure 1 .
Figure 1.Locations of the four focus regions (Amazon, Central US, Siberia and Tibetan Plateau) and three flux sites (BR-Sa3, RU-Ha1 and US-Ne2).

Figure 1 .
Figure 1.Locations of the four focus regions (Amazon, Central US, Siberia and Tibetan Plateau) and three flux sites (BR-Sa3, RU-Ha1 and US-Ne2).
CLM4.5 produces a more realistic simulation of runoff in the deep tropics (15 • S-5 • N) than either version of CoLM.Although CoLM2005 and CoLM2014 perform similarly in the Southern Hemisphere, runoff in the Northern Hemisphere is more realistic in CoLM2014 than in CoLM2005.CoLM2005 and CoLM2014 use the same surface runoff model, but the groundwater model has been updated in CoLM2014.This change influences simulations of both surface and subsurface runoff, and produces different simulations of total runoff.Overall, all three models systematically underestimate runoff in most latitude bands, as also indicated by the global spatial distributions (Figure 2d-f).Atmosphere 2017, 8, 141 8 of 33

Figure 3 .
Figure 3. Annual mean (a) net precipitation forcing based on Climate Research Unit-National Centers for Environmental Prediction (CRUNCEP) minus that based on the QIAN data set (mm day −1 ) and (b) simulated runoff in CLM4.5 driven by QIAN data minus Global Composite Runoff Data Set (GRDC) total runoff (mm day −1 ).

Figure 3 .
Figure 3. Annual mean (a) net precipitation forcing based on Climate Research Unit-National Centers for Environmental Prediction (CRUNCEP) minus that based on the QIAN data set (mm day −1 ) and (b) simulated runoff in CLM4.5 driven by QIAN data minus Global Composite Runoff Data Set (GRDC) total runoff (mm day −1 ).
Tibetan Plateau.CLM4.5 and CoLM2014 produce similar seasonal cycles, but the magnitude is closer to observed in CoLM2014.
with ψ the matric potential and Δz the thickness of the soil layer.The hydraulic conductivity is then calculated by the piecewise function

Figure 9 33 Figure 9 .
Figure 9 shows global spatial distributions of simulated latent heat fluxes and biases relative to the model tree ensemble upscaling of fluxnet observations (FLUXNET MTE).Both CLM4.5 and CoLM2005 overestimate latent heat fluxes at low latitudes relative to FLUXNET MTE.CPCs for all three models are greater than 0.9, with RMSEs of 10.89 W m −2 for CLM4.5, 15.28 W m −2 for CoLM2005, and 12.26 W m −2 for CoLM2014.Among the three models, CLM4.5 provides the best representations of the spatial distribution and maximum value of latent heat flux, but CoLM2014 outperforms CoLM2005 in all metrics except for CPC.High biases in tropical latent heat fluxes are substantially improved in CoLM2014 relative to CoLM2005, especially in the Amazon and equatorial Africa.Atmosphere 2017, 8, 141 17 of 33
Figure 12 shows spatial distributions in sensible heat fluxes and biases relative to the validation standard.Overall, distributions of sensible heat fluxes produced by three models are less consistent with FLUXNET MTE than are the corresponding distributions of latent heat fluxes.All three models have similar global CPCs and RMSEs relative to the validation standard, but with some regional differences.CLM4.5 overestimates sensible heat fluxes in the Amazon, whereas both versions of CoLM are closer to the validation standard.CLM4.5 and CoLM2014 underestimate sensible heat fluxes across most of North America, with CoLM2005 closer to the validation standard.CLM4.5 overestimates sensible heat fluxes in Europe, while both versions of CoLM underestimate sensible heat fluxes in the same region.Both versions of CoLM underestimate sensible heat fluxes in Southeast Asia, while CoLM2014 produces the smallest biases.

Figure 12
Figure 12 shows spatial distributions in sensible heat fluxes and biases relative to the validation standard.Overall, distributions of sensible heat fluxes produced by three models are less consistent with FLUXNET MTE than are the corresponding distributions of latent heat fluxes.All three models have similar global CPCs and RMSEs relative to the validation standard, but with some regional differences.CLM4.5 overestimates sensible heat fluxes in the Amazon, whereas both versions of CoLM are closer to the validation standard.CLM4.5 and CoLM2014 underestimate sensible heat fluxes across most of North America, with CoLM2005 closer to the validation standard.CLM4.5 overestimates sensible heat fluxes in Europe, while both versions of CoLM underestimate sensible heat fluxes in the same region.Both versions of CoLM underestimate sensible heat fluxes in Southeast Asia, while CoLM2014 produces the smallest biases.

Figure 17 .
Figure 17.Same as Figure 3, but for land surface temperature (Units: K).

Figure 18 .
Figure 18.Anomalies of simulated LST from MODIS product (Simulation-MODIS) for four focus regions (Units: K).

Figure 17 .
Figure 17.Same as Figure 3, but for land surface temperature (Units: K).

Figure 18 .
Figure 18.Anomalies of simulated LST from MODIS product (Simulation-MODIS) for four focus regions (Units: K).

Figure 18 .
Figure 18.Anomalies of simulated LST from MODIS product (Simulation-MODIS) for four focus regions (Units: K).

Figure 20
Figure20shows seasonal cycles of net radiation at the surface based on the three models and SRB data in the four focus regions.In the Amazon Basin, CoLM2005 and CoLM2014 simulations match SRB data well, while results based on CLM4.5 are much larger.Comparison of the NASH coefficients (Table8) shows that the CoLM2005 simulation (NASH = 0.350) is most consistent with SRB in this region.In Siberia, CLM4.5 and CoLM2014 produce similar results.CoLM2014 produces a slightly higher NASH coefficient (0.887) than CLM4.5 in this region; however, the results based on CoLM2005 are most consistent with SRB (NASH = 0.970).Net radiation flux simulations for the Tibetan Plateau based on all three models deviate from SRB during January to May and September to December.The simulations based on CoLM2005 and CoLM2014 match SRB estimates during the intervening months (June-August) better than the simulation based on CLM4.5 does.All three models perform well in the Central US, with seasonal cycles that closely match that indicated by SRB.Differences between CoLM2005 and CoLM2014 are small in these four regions, although NASH values are consistently larger for CoLM2005 than for CoLM2014.

Figure 20
Figure20shows seasonal cycles of net radiation at the surface based on the three models and SRB data in the four focus regions.In the Amazon Basin, CoLM2005 and CoLM2014 simulations match SRB data well, while results based on CLM4.5 are much larger.Comparison of the NASH coefficients (Table8) shows that the CoLM2005 simulation (NASH = 0.350) is most consistent with SRB in this region.In Siberia, CLM4.5 and CoLM2014 produce similar results.CoLM2014 produces a slightly higher NASH coefficient (0.887) than CLM4.5 in this region; however, the results based on CoLM2005 are most consistent with SRB (NASH = 0.970).Net radiation flux simulations for the Tibetan Plateau based on all three models deviate from SRB during January to May and September to December.The simulations based on CoLM2005 and CoLM2014 match SRB estimates during the intervening months (June-August) better than the simulation based on CLM4.5 does.All three models perform well in the Central US, with seasonal cycles that closely match that indicated by SRB.Differences between CoLM2005 and CoLM2014 are small in these four regions, although NASH values are consistently larger for CoLM2005 than for CoLM2014.

Figure 21 .
Figure 21.Relative errors for (a) annual mean global mean fields and regional seasonal cycles in the (b) Amazon; (c) Tibetan Plateau; (d) Siberia; and (e) Central US focus domains.Black solid circles show total MCPI in each region.Colored triangles show relative errors for each of the variables that contribute to the total MCPI.Negative values indicate better than average agreement with the corresponding validation standard.

Figure 21 .
Figure 21.Relative errors for (a) annual mean global mean fields and regional seasonal cycles in the (b) Amazon; (c) Tibetan Plateau; (d) Siberia; and (e) Central US focus domains.Black solid circles show total MCPI in each region.Colored triangles show relative errors for each of the variables that contribute to the total MCPI.Negative values indicate better than average agreement with the corresponding validation standard.
5 outperforms both versions of CoLM in the high latitude regions from 60 • N to 90 • N. 4. CLM4.5 provides the most accurate simulation of latent heat fluxes at the global scale, but CoLM2014 outperforms CoLM2005.None of the three models can currently capture the observed seasonal cycle of latent heat flux in the Amazon, perhaps due to simplistic representations of dynamic root distributions in this region.Simulated latent heat fluxes based on CLM4.5 outperform those based on either version of CoLM relative to observed values in the Tibetan Plateau, Siberia, and Central US focus domains.5. Validation of simulated sensible heat fluxes indicates that these three models perform comparably at the global scale.CPCs for sensible heat fluxes are substantially smaller than those for latent heat fluxes, indicating that models are less able to capture the sensible heat component of the surface flux.As with latent heat flux, none of the three models can capture the seasonal cycle of sensible heat flux in the Amazon.Simulations based on CoLM2005 are most consistent with the validation standard in the other three regions.Differences between CoLM2005 and CoLM2014 may arise from the inclusion of temperature and specific humidity adjustments in the calculation of sensible heat in CoLM2014.The method for calculating roughness lengths in CoLM should be a target for improvement.6.None of the three models generates realistic diurnal cycles of latent and sensible heat fluxes at the three sites chosen for comparison.Biases between simulated and observed fluxes are consistently large during daytime hours.7.

Table 2 .
Statistical summary of simulated runoff annual cycles.

Table 2 .
Statistical summary of simulated runoff annual cycles.

Table 3 .
Statistical summary of simulated snow depth annual cycles.

Table 3 .
Statistical summary of simulated snow depth annual cycles.

Table 5 .
Statistical summary of simulated latent heat flux annual cycles.

Table 6 .
Statistical summary of simulated sensible heat flux annual cycles.

Table 6 .
Statistical summary of simulated sensible heat flux annual cycles.

Table 7 .
Statistical summary of simulated land surface temperature annual cycles.

Table 7 .
Statistical summary of simulated land surface temperature annual cycles.

Table 8 .
Statistical summary of simulated net surface radiation annual cycles.