Evaluation of Land Surface Models in Reproducing Satellite Derived Leaf Area Index over the High-Latitude Northern Hemisphere. Part II: Earth System Models

Leaf Area Index (LAI) is a key parameter in the Earth System Models (ESMs) since it strongly affects land-surface boundary conditions and the exchange of matter and energy with the atmosphere. Observations and data products derived from satellite remote sensing are important for the validation and evaluation of ESMs from regional to global scales. Several decades’ worth of satellite data products are now available at global scale which represents a unique opportunity to contrast observations against model results. The objective of this study is to assess whether ESMs correctly reproduce the spatial variability of LAI when compared with satellite data and to compare the length of the growing season in the different models with the satellite data. To achieve this goal we analyse outputs from 11 coupled carbon-climate models that are based on the set of new global model simulations planned in support of the IPCC Fifth Assessment Report. We focus on the average LAI and the length of the growing season on Northern Hemisphere over the period 1986–2005. Additionally we compare the results with previous analyses (Part I) of uncoupled land surface models (LSMs) to assess the relative contribution of vegetation and climatic drivers on the correct representation of LAI. Our results show that models tend to overestimate the average values of LAI and have a longer growing season due to the later dormancy. The similarities with the uncoupled models suggest that representing the correct vegetation fraction with the associated parameterizations; is more important in controlling the distribution and value of LAI than the climatic variables.


Introduction
The Leaf Area Index (LAI) is defined as one-sided green leaf area per unit ground area in broadleaf canopies, and as the projected needle leaf area in coniferous canopies [1].LAI is a key parameter in most ecosystem productivity models and global (or regional) models of climate, hydrology, biogeochemistry and ecology [2].
Usually defined as the time evolution of the LAI, leaf phenology depends primarily on the climatic conditions for a given biome [3].It strongly affects land-surface boundary conditions and the exchange of matter and energy with the atmosphere, influencing the surface albedo, roughness, and dynamics of the terrestrial water cycle [4,5].Changes in the phase of LAI may therefore have impacts on climate [6,7], on the terrestrial carbon cycle [8], and on the atmospheric chemistry through the emission and deposition of several compounds [9][10][11][12].Therefore, accurate estimates of canopy phenology are critical to quantifying carbon and water exchange between forests and the atmosphere and its response to climate change [8].
Phenology studies based on field observations [13,14], remote-sensing data [15][16][17][18][19], atmospheric CO 2 observations [20], and biogeochemical models [21] indicate that the vegetation growing season length (GSL) has significantly increased over the past decades [8].Specifically, in the temperate and boreal regions of the Northern Hemisphere, the growing season begins in spring with increasing temperatures and solar radiation, the melting of snow, eventual thawing of the soil organic horizons, and the start of photosynthesis [22].It terminates in autumn as temperatures and solar radiation decrease, soils refreeze, and photosynthesis ceases [23,24].Therefore, temperature anomalies in spring and autumn affect the timing and duration of the growing season [8,25], which in turn control the seasonal onset and ending of the ecosystem carbon uptake period in these regions [8,26,27].
Rising temperatures during recent decades have resulted in a widely reported pattern of earlier and longer-lasting growing seasons from local to continental scales [13,15,16,20,[27][28][29]. The greater rate of change observed in the beginning of the growing season is thought to be a response to rapid spring warming, and earlier snowmelt and soil thaw [15,30], while the smaller change in termination date is likely connected with lower rates of autumn warming [31] and the influence of other environmental effects on autumn phenology and growth cessation [23,32,33].
In the first versions of general circulation models (GCMs) and regional climate models (RCMs) the soil-vegetation-atmosphere transfer (SVAT) schemes [47] were originally designed to simulate exchanges of matter and energy between the land surface and the atmosphere, with vegetation leaf area index as a forcing variable, rather than a prognostic state [6,44,[48][49][50][51][52].
In the last few years a new generation of general circulation models has become available to the scientific community.In comparison to the former model generation, these Earth System Models incorporate additional components describing the atmosphere's interaction with land-use and vegetation, as well as explicitly taking into account atmospheric chemistry, aerosols and the carbon cycle [62].
The inclusion of Earth system components in a climate model has a two-fold benefit.Firstly, it allows a consistent calculation of the impacts of climate change on atmospheric composition or ecosystems [63].Secondly, it allows the incorporation of biogeochemical feedbacks, which can be negative, dampening the sensitivity of the climate to external forcing [64], or positive, amplifying the sensitivity [65].However, adding Earth systems components and processes increases the complexity of the model system, thus a consistent validation of the variables simulated by these models is needed.
The assessment of vegetation phenology using remotely sensed data has a long history [66,67] with more recent studies making use of satellite data to examine the potential effects of climate change on phenology [15,[68][69][70][71][72].In fact, remote sensing has been widely recognised as a valuable tool for the detection and analyses of simulated data, both spatially and temporally.The past decade has seen a particularly rapid increase in the number of launched satellites, as well as an improvement in both spatial and spectral resolution of data they produce.Therefore, the ability to rapidly assess LAI using vegetation indices from remotely sensed imagery provides a means to rapidly assess ESMs' skills at simulating vegetation greenness over a wide geographic area.
The existence of vegetation models that use prescribed climate represents a unique opportunity to compare and contrast the effect of inner climatic variation on ESMs against the effect of differences in the vegetation modules.In other words, comparing different LSMs allows the detection of flaws in the vegetation dynamics, while comparing ESMs allows the identification of climate effect on vegetation processes, and the comparison of the two leads to the weighing of both effects.
In this context, we check the ability of different ESMs to reproduce the spatial and temporal variability of the satellite observed LAI.Specifically, the objective of this study is to assess whether ESMs correctly reproduce the spatial variability of LAI when compared with satellite data, and asses how long the growing season is in the different models compared with the satellite data over the Northern Hemisphere.In fact, as described above, over this area several authors have observed an increase in the growing season length.These changes in LAI, mostly due to an increase in temperature at the beginning of the growing season, have important implications on the global carbon cycle [8] and on atmospheric chemistry [9][10][11][12] simulated by the ESMs.Therefore, obtaining an accurate prediction of the temporal evolution of LAI is imperative not only in predicting the correct LAI seasonal changes, but also because of the feedbacks of LAI with the atmosphere.
In addition, we compare results from uncoupled models from part I [73] with the ESMs to elucidate the weighed role of vegetation and climate on the spatial and temporal evolution of LAI.

Coupled Model Intercomparison Project Phase 5 (CMIP5) Simulations
We analyse output from 11 Coupled Model Intercomparison Project Phase 5 (CMIP5) coupled carbon-climate models that, at the time of our analysis, had been submitted to the Program for Climate Model Diagnosis and Inter-comparison (PCMDI) Earth System Grid (ESG) [74].
The land components of these ESMs differ in their representations of vegetation types, soil properties, human disturbances, carbon and nitrogen pools, as well as in their horizontal resolutions.The models used in this study, along with the main features controlling their terrestrial carbon cycle, are listed in Table 1.Our analysis focuses on the historical period (20th century simulations; CO 2 concentration driven), which was forced by a variety of externally imposed changes such as increasing greenhouse gas and sulphate aerosol concentrations, change in solar radiation, and forcing by volcanic eruptions [62].Considering the historical experiments, in general for most of the CMIP5 models the simulation starts in the year 1850 and ends in 2005.Within this period, we focus only on the last 20 years of the 20th century simulation (1986)(1987)(1988)(1989)(1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005); in fact, although satellite data are available before 1986, we decided to use the same reference period used by [85] in order to be consistent with their analysis and results.
Besides, it is noteworthy that some models have only one realisation, but other models have many runs; these realisations represent climate simulations with different initial conditions.In the next section, we present results only from the first realization for each individual model.
For comparisons and evaluations, we re-grid all model outputs to a common 1° × 1° grid using a bilinear interpolation method.This resolution was chosen to be consistent with the resolution of uncoupled models [73].Although the CMIP5 archive includes daily means for a few variables, to be consistent with uncoupled models analysis [73] we focus here only on the monthly mean model output.

Satellite Data
The LAI data set used in this study (LAI3g) was generated using an Artificial Neural Network (ANN) from the latest version (third generation) of GIMMS AVHRR NDVI data for the period July 1981 to December 2010 at 15-day frequency [86].The ANN was trained with best-quality Collection 5 MODIS LAI product and corresponding GIMMS NDVI data for an overlapping period of 5 years (2000 to 2004) and then tested for its predictive capability over another five year period (2005 to 2009).The accuracy of the MODIS LAI product is estimated to be 0.66 LAI units [86] and the data is for 1-sided LAI.Further details on the LAI3g and the comparison with other satellite products are provided in [86,87].

Leaf Phenology Analysis
Growing season onset, dormancy and length were all calculated based on the LAI seasonal amplitude.In fact, LAI has been shown to have a normal distribution over the year in northern latitudes [71], so we consider the start of the growing season to be 20% of the maximum amplitude.The values of 20% was defined after different tests were conducted using different thresholds ranging between 5% and 30% of the maximum amplitude; we found that this value provided the best results.
Overall, this method has being proven to be more stable for monthly data, compared to an approach based on sudden LAI changes [8].It also should be noted that due to the lack of daily data for the LAI we were unable to use other methods used in previous studies based on the daily LAI variability [8].
In order to analyse changes in the growing season, we mask out regions where there are small changes in LAI over the year (e.g., evergreen forests and mixed forest with a small deciduous component).Therefore, all grid points where the difference between the maximum and minimum LAI amplitude is less than 0.5 are ignored in this analysis.
Considering every grid cell (x,y) we calculated a critical threshold value (CT x,y ) above which we assume the plants to be photosynthetically active: ( )

LAI
represent the minimum and maximum LAI over one year for the grid cell (x,y).This procedure was repeated on each grid cell and for each year for any given CMIP5 model.The length of the growing season was then calculated as the number of months with a value above this threshold; the onset is the first month above that value and the dormancy is the last.Since naturally part of the growing season might occur after the end of the year [88], we included the first three months of the next year in the calculations of the dormancy.Hence the growing season offset can occur on of the following year, having a DOY (day of year) higher than 365.
Finally, mean length, onset and dormancy were calculated as the average over the whole time period.It should be noted that, even when calculated monthly, all results are presented as days; we retrieved the daily values from the monthly data by multiplying all monthly results by the number of days within the given month.
The temporal changes in the mean annual LAI and GSL were estimated by the linear trend value obtained from a least squares fit line computed for period 1986-2005 of satellite and model data.
In order to quantify the mismatch between models and data, we calculate the root mean square errors (RMSE) and the spatial correlation coefficient between each model and the satellite observations.

Mean Leaf Area Index (LAI)
In Figure 1 we present the mean annual LAI (upper panel), the mean annual land precipitation (middle panel), and the mean annual surface temperature (bottom panel) for each model for the period 1986-2005, with the corresponding interannual variability (IAV) and trends.Considering the x-axis of the temperature, models falling at the left of observations (CRU, [89]) indicate a cold bias while models at the right of the reference data have a warm bias.On the y-axis models above the observations have a stronger trend than observations, while the models below the reference value show a lower trend.The same consideration is also valid for the precipitation, namely models falling at the left of observations (CRU) indicate a dry bias (wet bias when falling at the right), while on the y axis models below the observations have a stronger drying trend than observations.It should also be noted that, to be consistent with LAI, we show the precipitation and temperature only over the land points of the Northern Hemisphere.Finally, we selected CRU as reference data for the validation of climatic variables since it has been used as input data in the offline simulations [90].
The evaluation of the simulated precipitation and temperature is needed to assess whether any bias in the simulated LAI can be related to poor performance of the ESMs at reproducing physical variables, or is mainly due to the poor representation of some biogeochemical processes in the land surface models of ESMs.
Looking at the LAI (Figure 1), in general, except CanESM2 and INMCM4, all the models overestimate the mean annual LAI over the Northern Hemisphere.The poorest performance has been found in GFDL-EMS2G, which shows a mean value of 2.7, much larger than the reference value (0.83); all the other models show a mean annual LAI ranging from 1.2 and 1.7.Conversely, the trends are well captured by quite a few models; specifically, many models are clustered around the reference value, and, according to the observations, all the models show a greening in the last 20 years.The only far outlier is BNU-ESM, having a positive trend 6 times larger than the observed value.The interannual variability is in general well captured by most of the models, although an overestimation of the year-to-year variability is found for a few models; the exceptions are CanESM2 and MPI-ESM-MR, which show an interannual variability slightly lower than LAI3g.Also, in this case, BNU-ESM is the only outlier in reproducing the IAV, having a year-to-year variability much larger than the reference value.The large bias found in BNU-ESM could be related in some way to the strong wet bias that this model has in reproducing the observed precipitation.Specifically, whilst the mean annual precipitation as reported by CRU data is 520 mm/yr, BNU-ESM shows a mean value of 802 mm/yr.However, it should also be noted that, except CanESM2, all the other models also have a wet bias.
The wet bias found in all the CMIP5 ESM could explain the LAI overestimation: in fact the best agreement between observed and simulated LAI is found for CanEMS2, this being the only model without a wet bias.Although in the boreal and Arctic region the temperature is the main limiting factor for the carbon assimilation, at mid-latitudes the precipitation plays a pivotal role through its control on the soil moisture [91,92].
The Although models in general show good skills in reproducing the observed climate, we would highlight that this agreement in the mean values over a large region could arise from a compensation between overestimation in some points of the domain and underestimation in other points [85].This suggests that to perform an exhaustive model validation we should look at the spatial patterns (e.g., maps).
For this reason in Figure 2 we show the spatial distribution of the mean annual LAI in the Northern Hemisphere as calculated from the CMIP5 ESMs and observed by satellite over the period 1986-2005.Results are projected over a stereographic projection from the North Pole, with the latitude ranging from 30°N to 90°N.The observed spatial pattern of LAI is characterized by a wide maximum over Northern America and by a negative gradient extending from central Europe to Northern-Eastern Asia, with a broad minimum in the Tibetan plateau due to the sparse vegetation.Although there is an overall overestimation by most of the CMIP5 models, 10 out of 11 models correctly reproduce this pattern: in particular CESM1-BGC, IPSL-CM5A-MR, and NorESM1-ME show a very good agreement with observations in terms of locations of the maximum and minimum values, as well as fairly simulating the gradient over the Eurasian region.This is confirmed by the relatively high value (>0.6) of the spatial correlation computed between the models and the reference data.Conversely, GFDL-ESM2G is not able to reproduce this spatial pattern, and LAI values above 5 are simulated over the whole North America and Asia; for this reason this model exhibits the lowest spatial correlation (0.21).It should be noted that, since ESMs generate their own climate, there is no reason to expect models and observations to agree on the phasing of interannual variations [85]; for this reason we did not account the RMSE for the mean annual LAI.
The seasonal amplitude patterns show large disagreement between the models and the satellite data (Figure 3).Some models (e.g., BNU-ESM and MIROC-ESM) clearly overestimate the mean amplitude, which is particularly evident over the whole North America and Eurasia.Other models (e.g., CESM1-BGC, HadGEM2-CC, MPI-ESM-MR, and NorESM1-ME) show a smaller seasonality than satellite data, while INMCM4, CanESM2 and BCC-CSM1 perform better than the rest of the models in reproducing the satellite-derived observations.The RMSE, indicating the mean error of the models in reproducing a given variable, suggests that CanESM2 has the lowest error: in fact this model, albeit it slightly underestimates the seasonal amplitude over the Russia, has the correct magnitude for the observed seasonal amplitude.The same considerations are also valid for IPSL-CM5A-MR, INMCM4 and BCC-CSM1 which show a RMSE of 1.1.Conversely, BNU-ESM and MIROC-ESM show a larger seasonal amplitude than the satellite data, therefore they have high RMSE values.The spatial correlation, indicating how well models reproduce the observed spatial pattern, confirm that CanESM2, INMCM4 overperform the spatial pattern of the seasonal amplitude compared to other models.

Growing Season
Figure 4 displays the spatial distribution of the mean onset dates of green-up as calculated from the CMIP5 ESMs and satellite observation for the period 1986-2005.As expected the satellite data show that the mean green-up date is progressively delayed with increasing latitude and increasing continentally [93].The latest dates of green-up occur in northern Siberia, northern Canada, and over the Tibetan Plateau, owing to low temperatures.
The growing season onset derived from CMIP5 ESMs shows that quite a few models do correctly reproduce the observed spatial pattern, as confirmed by the relatively high spatial correlation (>0.5).The exceptions are BCC-CSM1 who has some patchy areas of agreement with satellite data and HadGEM2-CC who has an earlier onset in the whole Arctic area than in the temperate regions.In the latter case, the wrong pattern leads also to a negative correlation (−0.13).-90°N).For each model we masked out all the grid points where the seasonal amplitude is less than 0.5 (see Figure 3).In the box the value of root mean square error (in days) and spatial correlation, as computed from mean annual data, are presented for each model against the observations.The models that correctly reproduce the green-up spatial pattern show an overestimation of the onset day, namely these models generally predict later onset values, particularly over the boreal forests of Siberia and Northern America.This leads to the large RMSE values found for the onset in most of the models, ranging from 24 to 38 days.
Considering GFDL-ESM2G, this model shows a larger onset date in the few "non-masked" grid points (Figure 4), while a large area of Eurasia and Northern America shows a seasonal amplitude less than 0.5 (Figure 3).This suggests a problem in the initialization of the vegetation during the spin up phase: in fact the GFDL land model only allows coniferous trees to grow in cold climates, i.e., deciduous trees and grass do not grow in these cold regions.As a result, coniferous trees are established in areas where there should be tundra or cold deciduous trees, and therefore the seasonal amplitude is lower than expected.
Satellite data shows that the dates of vegetation dormancy (Figure 5) occur in reverse order of the green-up onset, namely the green-up wave progresses northwards and dormancy wave progresses southwards [93].Considering all the 11 ESMs only BNU-ESM, INMCM4, GFDL-ESM2G, and CanESM2 have a dormancy distribution similar to the observed pattern, the spatial correlation being larger than 0.4.The remaining models have some patchy area of agreement, mainly over Europe and the temperate region of North America, while we found relevant problems over the whole Arctic area in HadGEM2-CC, IPSL-CM5A-MR and MPI-ESM-MR, with an offset occurring after DOY 365. ).For each model we masked out all the grid points where the seasonal amplitude is less than 0.5 (see Figure 3).In the box the value of root mean square error (in days) and spatial correlation, as computed from mean annual data, are presented for each model against the observations.In these regions, where the dominant vegetation is composed by boreal deciduous forest, modelled dormancy can happen after the end of the year (DOY higher than 365).However, the snow corrupts the satellite signal in those months and this partially explains why the errors are large in the dormancy date.This is particularly evident in IPSL-CM5A-MR that shows a dormancy date around February-March of the new year over the whole Arctic area, while satellite data shows that the offset occurs 5-6 months before.This explains the large RMSE (82 days) found in this model.
Although MIROC-ESM does not show the highest spatial correlation due to a few area of disagreement mainly located in Alaska, it shows one of the lower RMSE (27 days); the other models have errors ranging between 26 days (BCC-CSM1) and almost 3 months (IPSL-CM5A-MR).Compared to other CMIP5 models, GFDL-ESM2G has a smaller RMSE than the average.However it should be noted that it has been computed considering only a few grid points, due to the incorrect representation of the seasonal amplitude.-90°N).For each model we masked out all the grid points where the seasonal amplitude is less than 0.5 (see Figure 3).In the box the value of root mean square error (in days) and spatial correlation, as computed from mean annual data, are presented for each model against the observations.Looking at the satellite data, the growing season length (Figure 6) is found to increase dramatically with decreasing latitude [93].It is the shortest in central and eastern Siberia along the Arctic coast, with a duration of only 4 months.In contrast, most of Europe, Eastern China and Southern North America have long growing seasons.The growing season length shows a good agreement between satellite data and models, although individual models like IPSL-CM5A-MR, and HadGEM2-CC still exhibit large errors in reproducing the observed spatial pattern.
Specifically, the best results are found in BNU-ESM, INMCM4, being the correlation systematically greater than 0.6.Besides, in the few grid points covered by deciduous forests GFDL-ESM2G shows a good agreement with satellite GSL and this explains the relative high correlation and low RMSE compared to other ESMs.Consistent with previous results, HadGEM2-CC shows a negative correlation, indicating the inability of this model to reproduce the observed spatial variability.In addition HadGEM2-CC and IPSL-CM5A-MR also show the highest RMSE for the GSL, the error being more than 2 times larger than the lowest RMSE found in CanESM2.

Temporal Trends
Quite a few models predict an overall increase of LAI with time in most of the Northern Hemisphere, which is consistent with the satellite observations (Figure 7) which show a greening over the whole Eurasia and almost no negative trend in the whole Northern Hemisphere, with a small exception over western North America and few locations in the Eurasian boreal forest.For each model we masked out all the grid points where the seasonal amplitude is less than 0.5 (see Figure 3).The value in the box represents the spatial correlation between modelled and satellite data.
From the whole compendium, BNU-ESM, GFDL-ESM2G, HadGEM2-CC and NorESM1-ME display the highest increase in LAI (see also Figure 1), mostly over the eastern coast of North America, Europe and the boreal forest of Asia.IPSL-CM5A-MR, MIROC-ESM, BCC-CSM1 and INMCM4 have an intermediate signal with the increase shown over the same regions, and some patchy areas where LAI decreased.We found that none of the models were able to reproduce the correct spatial pattern, the spatial correlation being close to 0 for almost all the models, except MIROC-ESM, which shows a positive correlation of 0.15.
The models also show a general increase in the growing season length, with patchy areas where it decreases (Figure 8).It is clear that from the 11 ESMs, those that perform better at calculating the growing season (both on the onset and dormancy) and LAI also do better for the trends, despite there being no spatial correlation between CMIP5 models and satellite data.For each model we masked out all the grid points where the seasonal amplitude is less than 0.5 (see Figure 3).The value in the box represents the spatial correlation between modelled and satellite data.11 Earth System Models (ESMs) and satellite observations over the Northern Hemisphere (30°-90°N) computed using the Gross Primary Production (GPP).For each model we masked out all the grid points where the Leaf Area Index (LAI) seasonal amplitude is less than 0.5 (see Figure 3).In the box the value of root mean square error (in days) and spatial correlation, as computed from mean annual data, are presented for each model against the observations.Although the wet bias found in most of the analysed CMIP5 models could explain the positive bias in LAI, this overestimation of the mean LAI is consistent with results from the uncoupled models (Table 3), suggesting that it is unlikely that differences in climate in the coupled models are solely responsible for this positive bias.This general overestimation could also be explained by a combination of underestimation of observed LAI, likely due to a saturation of satellite instrumentation, particularly on areas with dense vegetation, and by missing parameterizations of disturbances in the models (e.g., pollution, insect attack, nutrient limitation, grazing, fire dynamics), which leads to a larger amount of carbon stored in the biomass, which, in turn, leads to a larger LAI.The combination of these two effects explains why we found a relevant overestimation of simulated LAI in both coupled and uncoupled models.
The geographical pattern of average LAI is also similar between coupled and uncoupled models [73].The overestimation of LAI is found consistently over the boreal forest (55°N) when compared to the satellite observations, with better agreement over areas with scarce vegetation.When comparing models with the same vegetation model (TRIFFID vs. HADGEM2-CC, ORCHIDEE vs. IPSL-CM5A-MR and CLM vs. NorESM1 or CESM1-BGC) there are little differences in the distribution of LAI, suggesting that climatic variations in the coupled models are less important in controlling the distribution of LAI than having the correct vegetation distribution and parameterizations.
The onset patterns are similar among all coupled and uncoupled models, with the latest onset occurring over the boreal region.The similarities are even stronger over the dormancy where all models display a general overestimation over the boreal region, possibly explained by the late leaf shed in all models [94].
These results suggest that both coupled and uncoupled models predict a later dormancy (day) and a longer growing season length in comparison to satellite observation (Table 3).It seems that leaves in the models remain for longer than they should.However the late dormancy is not in line with the vegetation photosynthetic activity: in fact, when the same methodology to calculate the end of the photosynthetic active period was applied to the gross primary productivity (GPP), we found that the dormancy began at 277 ± 7 days in the uncoupled models and 287 ± 13 days in case of CMIP5 models, which is remarkably earlier than previously predicted by LAI, and much closer to the observed value of 295 days.It is evident that all models are keeping inactive leaves for longer than they should, which does not have any impact on the carbon cycle but could potentially modify surface radiation budget and turbulent fluxes, affecting therefore the PBL dynamics, which in turn could lead to potential bias in lower atmospheric dynamics simulated by ESMs.In addition to those ESMs having an interactive tropospheric chemistry component, the presence of inactive leaves could modify the deposition fluxes that strongly depend on the area of the canopy [11].Conversely, the longer offset simulated by offline models does not affect simulation results since the climate is provided as input data and the feedbacks between the land surface and the atmosphere are not taken into account.Looking at the LAI trends, all the coupled models show a clear greening over the whole Northern America and Eurasia, consistent with satellite data, while not all the offline models show the same pattern over the high latitudes of the Northern Hemisphere.Considering all the ESMs, the greening of the high latitudes is likely driven by positive temperature trend (Figure 1) but in some of the offline models we observe a browning over the same region, suggesting that offline modelled LAI is also sensitive to moisture changes, as most of the browning occurs over areas where precipitation shows a decrease (not shown).
The previous similarities between coupled and uncoupled models, similar geographical distribution of LAI with higher values than the satellite data, and an extended growing season mostly driven by a later dormancy all suggests that the correct initialization, distribution, and parameterization of vegetation in the models is the most important feature in the correct representation of LAI.Nevertheless climatic variables, temperature in particular, have proven to be the main drivers of changes over time [95].

Conclusions
We compared LAI as simulated by 11 Earth-System Models against satellite data and uncoupled models [73], for the Northern Hemisphere during the 1986-2005 period.We compared the mean annual LAI, the spatial pattern of LAI and the onset, dormancy and length of the growing season.Our results show that models consistently overestimate the mean value of LAI, although considering the error of the reference data estimated by comparing the satellite data with ground measurements the model-data misfit is significantly reduced.In addition models have an increased growing season, mostly due to a later dormancy, compared to the satellite data.This is consistent with the finding on the uncoupled models.
We conclude that validating LAI in each model against satellite observations should be a fundamental step for all modelling groups.This is essential since LAI is a fundamental variable in all models, required to correctly calculate the hydrological, energetic and carbon fluxes.

Figure 1 .
Figure 1.The x-axis shows the observed and simulated mean annual leaf area index (LAI) (top), annual land precipitation (middle), and mean annual surface temperature over land (bottom).The y-axis shows the temporal trend, while the colorbar reports the interannual variability as computed from the annual standard deviation.
precipitation trends in general are well reproduced by the models, being all scattered around the reference data and all showing a wettening over the last 20 years.The exceptions are INMCM4, which does not show any trend in the land precipitation and GFDL-ESM2G, which has a wet bias two times larger than CRU.The interannual variability of the reference data is about 85 mm/yr and only INMCM4, IPSL-CM5A-MR, GFDL-ESM2G and MIROC-ESM well reproduce this value, while CanESM2 (~80 mm/yr), NorESM1-ME (~80 mm/yr) and BCC-CSM1 (75 mm/yr) have a slightly lower IAV and the remaining models show a larger IAV.It is noteworthy that MPI-ESM-MR has a IAV two times larger than the reference data.Looking at the temperature, all the models are clustered around the reference data and only HadGEM2-CC (cold bias) and MIROC-ESM (warm bias) show a bias greater than 1.5 °C.In addition, all the models predict a warming in the Northern Hemisphere during the last 20 years; the weaker trends have been found in HadGEM2-CC and INMCM4 being about 4 times smaller than the one reported by CRU.The observed temperature interannual variability is about 0.8 °C and only INMCM4 and MIROC-ESM have a similar IAV; all the other models show a larger IAV than CRU with NorESM1-ME having an IAV of about 1 °C.

Figure 2 .
Figure 2. Spatial distribution of mean annual leaf area index (LAI) as simulated by 11 Earth System Models (ESMs) and observed by satellite over the period 1986-2005 in the Northern Hemisphere (30°-90°N).The value in the box represents the spatial correlation between modelled and satellite values.

Figure 3 .
Figure 3. Leaf area index (LAI) Seasonal amplitude as simulated by 11 Earth System Models (ESMs) and satellite observations for the Northern Hemisphere (30°-90°N).Spatial correlation and root mean square errors between the respective model and observations are given in the white box.

Figure 4 .
Figure 4. Mean growing season onset (day) as simulated by 11 Earth System Models (ESMs) and satellite observations over the Northern Hemisphere (30°-90°N).For each model we masked out all the grid points where the seasonal amplitude is less than 0.5 (see Figure3).In the box the value of root mean square error (in days) and spatial correlation, as computed from mean annual data, are presented for each model against the observations.

Figure 5 .
Figure 5. Mean growing season dormancy (day) as simulated by 11 Earth System Models (ESMs) and satellite observations over the Northern Hemisphere (30°-90°N).For each model we masked out all the grid points where the seasonal amplitude is less than 0.5 (see Figure3).In the box the value of root mean square error (in days) and spatial correlation, as computed from mean annual data, are presented for each model against the observations.

Figure 6 .
Figure 6.Mean growing season length (days) for 11 Earth System Models (ESMs) and satellite observations over the Northern Hemisphere (30°-90°N).For each model we masked out all the grid points where the seasonal amplitude is less than 0.5 (see Figure3).In the box the value of root mean square error (in days) and spatial correlation, as computed from mean annual data, are presented for each model against the observations.

Figure 7 .
Figure 7. Observed and simulated Leaf Area Index (LAI) trends (%) computed over the period 1986-2005 for 11 Earth System Models (ESMs) and satellite observations over the Northern Hemisphere (30°-90°N).For each model we masked out all the grid points where the seasonal amplitude is less than 0.5 (see Figure3).The value in the box represents the spatial correlation between modelled and satellite data.

Figure 8 .
Figure 8. Observed and simulated Growing Season Length (GSL) trends (days/year) computed over the period 1986-2005 for 11 CMIP5 ESMs and satellite observations over the Northern Hemisphere (30°-90°N).For each model we masked out all the grid points where the seasonal amplitude is less than 0.5 (see Figure3).The value in the box represents the spatial correlation between modelled and satellite data.

Figure 9 .
Figure 9. Mean growing season dormancy (day) as simulated by 11 Earth System Models (ESMs) and satellite observations over the Northern Hemisphere (30°-90°N) computed using the Gross Primary Production (GPP).For each model we masked out all the grid points where the Leaf Area Index (LAI) seasonal amplitude is less than 0.5 (see Figure3).In the box the value of root mean square error (in days) and spatial correlation, as computed from mean annual data, are presented for each model against the observations.

Table 1 .
Coupled Model Intercomparison Project Phase 5 (CMIP5) Earth System Models used in this study with the associated land models and main features controlling the terrestrial carbon cycle.

Table 3 .
Comparison of coupled and uncoupled ensemble means of Leaf Area Index (LAI) and phenology averaged over the Northern Hemisphere (30°-90°N).