Satellite-Based Models Need Improvements on Simulating Annual Gross Primary Productivity: A Comparison of Six Models for Regional Modeling of Deciduous Broadleaf Forests

: Modeling vegetation gross primary productivity (GPP) is crucial to understanding the land–atmosphere interactions and, hence, the global carbon cycle. While studies have demonstrated that satellite-based models could well simulate intra-annual variation of vegetation GPP, there is a need to understand our ability to capture interannual GPP variability. This study compares the spatiotemporal performance of six satellite-based models in regional modeling of annual GPP for deciduous broadleaf forests across the eastern United States. The 2001–2012 average annual gross primary productivities (AAGPPs) derived from different models have mismatched spatial patterns with divergent changing trends along both latitude and longitude. Evaluation using ﬂux tower data indicates that some models could have considerable biases on a yearly basis. All tested models, despite performing well on the 8-day basis because of the underlying strong seasonality in vegetation productivity, fail to capture interannual variation of GPP across sites and years. Our study identiﬁes considerable modeling uncertainties on a yearly basis even for an extensively studied biome of deciduous broadleaf forest at both site and large scales. Improvements to the current satellite-based models have to be made to capture interannual GPP variation in addition to intra-annual variation. This study compares six satellite-based vegetation photosynthesis models for simulating the GPP of deciduous broadleaf forest. We ﬁnd that the simulated multiyear AAGPPs from different


Introduction
Vegetation gross primary productivity (GPP), the total carbon assimilation rate of vegetation in unit area in unit time, plays a key role in driving terrestrial ecosystem processes [1]. Quantifying and modeling terrestrial GPP is crucial to understand the massive carbon exchanges between the land surface and the atmosphere and, hence, the global carbon cycle [2][3][4].
Satellite remote sensing that provides synoptic observational data of the land surface has become an attractive tool for large-scale studies on terrestrial carbon cycles [5,6]. As the carbon fluxes between the land surface and the atmosphere are not directly measureable over large areas, various methods-mostly modeling methods such as the light use efficiency models, the process-based models, and the data-driven models-have been developed to simulate terrestrial GPP using remote sensing data [7]. The underlying idea of these methods is to simulate vegetation productivity based on satellite-derived leaf area index (LAI) and/or fraction of photosynthetically active radiation (FPAR), given that solar radiation absorption of vegetation leaves is closely related to plant carbon assimilation [8,9]. These models largely differ from each other in terms of model sophistication and consideration for the impacts of environmental factors (e.g., temperature, vapor pressure deficit, and soil water content) that regulate leaf photosynthetic rates [10][11][12][13][14].
Given that various satellite-based vegetation productivity models have become available, the model performance is key for applications in land surface studies. At the site scale, continuous ground measurements of the net energy and material exchanges between ecosystems and the atmosphere at eddy covariance flux towers have been found to be particularly useful in the study of terrestrial ecosystem carbon cycles [15][16][17]. As such, most studies developed GPP models based on in situ observations from flux tower sites and further applied the developed models to characterize the spatiotemporal pattern of terrestrial GPP at regional to global scales [18]. The model performance over large areas is less clearly understood than that at the site scale as there is a lack of appropriate large-scale reference data. A comparison of the model results is useful to identify and understand where current models need improvements. For example, comparisons among multiple models are found to deviate largely in specific regions like the Amazon basin and Africa, where ground observations are also limited [19].
Most existing studies focus on modeling and assessing GPP on the intra-annual (i.e., subdaily, daily, and/or weekly) basis rather than on the interannual basis, likely because evaluating the modeling results needs abundant long-term ground observations that are often unavailable. Deciduous broadleaf forest is one of the most extensively studied biomes given the abundant eddy covariance flux towers established in the midlatitude temperate region, of which the modeling is generally considered accurate with low uncertainties in land surface studies as compared with other biomes [20,21]. Studies have reported that the modeled GPP of deciduous broadleaf forest well agrees with flux tower measurements on the intra-annual basis [22,23]. Data accumulation from flux towers now allows assessment and examination of model performance at yearly time scale, which is still largely a knowledge gap in existing studies.
Given that modeling vegetation productivity on the yearly basis is one key to understanding the long-term dynamics of the land surface, this study compares and evaluates multiple models that use satellite data to simulate GPP of deciduous broadleaf forest across the eastern United States. The study would help improve our knowledge on (1) how current models agree with each other in an extensively studied biome and region and (2) how current models perform on the yearly basis as evaluated using site-level data. Such a study could then provide us a baseline for understanding our abilities of capturing interannual variation of vegetation productivity and point a potential direction in which to improve the current models.

Flux Tower Data
Data for 59 site-year combinations from 9 flux towers of deciduous broadleaf forests ( Table 1) that have Level 2 or Level 4 data of gross primary productivity (GPP) available from the AmeriFlux website (http://ameriflux.ornl.gov/) were used [11]. We used half-hourly gap-filled Level 4 data as priority and half-hourly or hourly Level 2 data as the secondary choice if Level 4 data were missing. Half-hourly or hourly flux tower measurements, such as air temperature, incoming solar radiation, atmospheric pressure, vapor pressure deficit, gross primary productivity, and latent heat, were preprocessed to a daily basis. Extraterrestrial solar radiation, photoperiod, and solar zenith angle (i.e., the angle of the sun away from directly overhead) are calculated as a function of geolocation (i.e., latitude and longitude), the day of year (DOY), and solar time of the day [24]. Diffuse solar radiation, if not measured, was derived based on incoming solar radiation and extraterrestrial solar radiation using an empirical model recently developed for the United States [25]. Atmospheric pressure, if missing, is derived as a function of elevation. All other functions that derive the needed but missing model inputs can be found in detail in [26].

Climate Data
We obtained daily 1000 m climate data from the Daymet datasets [35] as distributed by the Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (http://daymet.ornl.gov/). The Daymet dataset provides daily climate variables that include minimum air temperature, maximum air temperature, precipitation, shortwave radiation, vapor pressure, snow water equivalent, and day length. The climate datasets are reprojected to match the Sinusoidal projection of the Moderate-Resolution Imaging Spectroradiometer (MODIS) data using the tool of the Geospatial Data Abstraction Library. The Daymet dataset was processed to meet the requirements of different models. Daily mean air temperature was calculated as the average of daily maximum and minimum air temperatures. Daily vapor pressure deficits were derived as the difference between average saturated vapor pressure and vapor pressure. Atmospheric pressure was derived from the 1000 m digital elevation maps as obtained from NOAA's Global Land One-kilometer Base Elevation (GLOBE) project (http://www.ngdc.noaa.gov/mgg/topo/). Other climate data required by models, if missing, were calculated the same way as in the processing of the flux tower datasets.

MODIS Data
The 8-day 500 m MODIS leaf area index (LAI) Version 6 products (MOD15A2H; [36]), the 8-day 500 m MODIS surface reflectance Version 6 products (MOD09A1; [37]), and the yearly 500 m MODIS Land Cover type Version 5 products (MCD12Q1; [38]) were downloaded from the Land Processes Distributed Active Archive Center (https://lpdaac.usgs.gov/). Both MOD15A2H and MOD09A1 were used to derive necessary inputs for model simulation and MCD12Q1 was used to obtain areas of deciduous broadleaf forests. The satellite-derived LAI in MOD15A2H could fluctuate unrealistically in the time series due to cloud and/or aerosol contamination [39]. To produce a reasonable LAI time series, we first replaced poor-quality LAI data as derived based on the Quality Control data in MOD15A2H using the median value of a three-point moving window and then filled the gap, if any, using the autoregressive modeling approach [40]. Spikes in the gap-filled time series due to possible outliers were removed using the Hampel filter and then a Savitzky-Golay filter was applied to produce smoothed time series of canopy LAI on an 8-day basis, which were further linearly interpolated to daily time series [41]. The processed 500 m MODIS LAI time series were resampled to match the 1000 m climate datasets based on the averaging method for the pixels that are classified as deciduous broadleaf forests in the MODIS land cover product. The satellite-based LAI time series were extracted for the pixel containing the corresponding flux tower site. The surface reflectance data in MOD09A1 were processed in the same manner as the LAI data and were further used to derive satellite indices such as enhanced vegetation index (EVI; [42]) and land surface water index (LSWI; [43]). The MODIS land cover product of MCD12Q1 was resampled from 500 m to 1000 m resolution based on the majority approach using the Geospatial Data Abstraction Library (GDAL) and was then used to mask out pixels that are not deciduous broadleaf forests. Details on the processing of both climate and MODIS data can also be found in [11].

Methods
Six models, including three light use efficiency models (the MOD17 algorithm [44], the two-leaf light use efficiency (TL-LUE) model [45], and the Vegetation Photosynthesis Model (VPM) [46]), two process-based models (the Breathing Earth System Simulator (BESS) model [47] and the Growing Production Day (GPD) model [26]), and the eddy covariance and MODIS data-driven approach (the EC-MOD model [48]) were studied and compared for the biome of deciduous broadleaf forest. The six models tested here are representative as they have been applied to a broad range of studies and some have already been used to produce routine products. Details on the used datasets and the studied models are described as follows.

The MODIS GPP Product
The MOD17 algorithm is essentially a big-leaf light use efficiency model that uses both remote sensing data and meteorological data to model vegetation GPP. The MOD17 algorithm applies environmental scalar functions to account for the constraints of minimum air temperature and vapor pressure deficit on photosynthesis [12]. The operational GPP product, the 8-day 500 m MODIS GPP Version 6 product (MOD17A2H), was produced using the MOD17 algorithm with large scale from the NASA Global Modeling and Assimilation Office. MOD17A2H was also obtained from the website of the Land Processes Distributed Active Archive Center. Details on the MOD17 algorithm and the MOD17A2H product can be found in the user guide of the MODIS GPP product [39]. Note that the MOD17A2H product provides 46 GPP summations for each 8-day time period in one calendar year, and, therefore, the extracted GPP data were divided by 8 for the first 45 values in a year and by 5 (or 6 in a leap year) for the last value to obtain daily values. The obtained 8-day 500 m GPP was then spatially averaged to 1000 m resolution for subsequent analysis.

Two-Leaf Light Use Efficiency Model
The two-leaf light use efficiency (TL-LUE) model [45] aims to improve the MOD17 algorithm by separating the canopy into sunlit and shaded leaf groups and calculating GPP separately for them with different maximum light use efficiencies. TL-LUE considers the canopy radiative transfer processes and accounts for the different photosynthetic light responses of sunlit and shaded leaves. The basic formulas for the TL-LUE are as follows: where APAR su and APAR sh denote the photosynthetically active radiation absorbed by sunlit and shaded leaves, respectively; ε msu and ε msh denote the maximum light use efficiency for sunlit and shaded leaves, respectively; and TMIN scalar and VPD scalar denote the down-regulation scalars that account for the effects of daily minimum air temperature and vapor pressure deficit, respectively, on plant light use efficiency. Further, where PAR dir and PAR dif denote the direct and diffuse components of incoming photosynthetically active radiation, respectively; PAR dif,u denotes the diffuse photosynthetically active radiation under the canopy and is calculated following [49]; α denotes canopy albedo; β denotes mean leaf-sun angle; θ denotes solar zenith angle; C denotes the contribution of multiple scattering of the total photosynthetically active radiation to the diffuse irradiance per unit leaf area within the canopy; and LAI, LAI sh , and LAI su denote the leaf area index, sunlit leaf area index, and shaded leaf area index, respectively. The sunlit and shaded components of LAI are derived as follows: where LAI, LAI sh , and LAI su denote the leaf area index, sunlit leaf area index, and shaded leaf area index, respectively; and Ω is the canopy clumping index. Following previous studies [45,50,51], key parameters for the biome of deciduous broadleaf forest in TL-LUE are set as follows: ε msu = 0.47 gC MJ −1 , ε msh = 2.06 gC MJ −1 , TMIN min = −8.00 • C, TMIN max = 7.94 • C, VPD min = 0.93 kPa, VPD max = 4.10 kPa, α = 0.18, β = 60 • , and Ω = 0.80.

Vegetation Photosynthesis Model
The Vegetation Photosynthesis Model (VPM) proposed by [46] is a big-leaf light use efficiency model to simulate vegetation GPP. The underlying idea for VPM to improve the GPP modeling is to separate photosynthetically active vegetation (mostly green leaves) and non-photosynthetically active vegetation (mostly senescent leaves, stems, and branches). The formulas of VPM are as follows: where GPP denotes gross primary production; PAR denotes photosynthetically active radiation; FPAR PAV denotes the fraction of photosynthetically active radiation absorbed by photosynthetically active vegetation; ε max denotes the maximum light use efficiency; and T scalar , W scalar , and P scalar denote the down-regulation scalars that account for the effects of air temperature, water, and leaf phenology, respectively, on plant light use efficiency. Further, where EVI denotes Enhanced Vegetation Index as derived from the surface reflectance from near-infrared, red, and blue bands; and a denotes the correlation coefficient between EVI and FPAR PAV . The environmental scalars are derived as follows: from bud burst to leaf full expansion 1 after leaf full expansion (10) where T denotes daily mean air temperature; T max , T min , and T opt denote the maximum, minimum, and optimal air temperature for vegetation photosynthesis, respectively; LSWI denotes the land surface water index; and LSWI max denotes the maximum land surface water index during the vegetative growing season.

The Growing Production Day Model
The Growing Production Day (GPD) model is a process-based model that extends our previous work [52,53] and synthesizes state-of-the-art models to simulate canopy photosynthesis. The GPD model consists of several submodels and each submodel involves a complex system of equations. A brief summary for each submodel is provided in Table 2, while complete descriptions of the GPD model can be found in Xin [26]. The GPD model accounts for five fundamental processes within a canopy: (1) a radiative transfer model that determines the canopy light environment and the leaf radiation absorption; (2) a leaf photosynthesis model that simulates the photosynthetic rates of individual leaves; (3) a leaf conductance model that derives the leaf boundary layer and stomatal conductance; (4) a leaf energy balance model that accounts for leaf temperature and leaf transpiration; and (5) a surface energy balance model that derives soil evaporation rates. The GPD model has already been calibrated for the biome of deciduous broadleaf forests and the parameter setting can be found in [26]. In this study, we calibrated the parameter of CN (i.e., the ratio of leaf carbon to nitrogen) as 38 g C g −1 N and kept all the other parameters unchanging. Table 2. The generalized equations for the growing production day (GPD) model to simulate canopy photosynthesis and evapotranspiration.

Model Name Equations
Canopy radiative transfer Note: PAR l denotes the radiation absorbed by either sunlit or shaded leaves per leaf hemi-surface area at the photosynthetically active radiation wavelength; R ln denotes the net shortwave radiation at the leaf surface; R sn denotes the net shortwave radiation at the soil surface; f denotes the function with input arguments in parentheses; R g denotes daily total incoming solar radiation at the canopy top; R d denotes daily diffuse radiation at the canopy top; θ denotes solar zenith angle; LAI denotes leaf area index; Ω denotes foliage clumping index; A n denotes the leaf net photosynthetic rate; T l denotes daily mean leaf temperature; Photo denotes daily photoperiod for a given day of the year at a given geolocation; SWC denotes the root zone soil water content; P atm denotes daily atmospheric pressure; g s denotes the leaf stomatal conductance; g b denotes the leaf boundary layer conductance; c i denotes the intercellular CO 2 partial pressure; VPD denotes daily vapor pressure deficit; u denotes the wind speed; [CO 2 ] denotes atmospheric CO 2 concentration; λE l denotes the latent heat at the leaf surface; T a denotes daily mean air temperature; and λE s denotes the evaporation rate for the soil surface.

The Breathing Earth System Simulator Product
Breathing Earth System Simulator (BESS) is a process-based vegetation model that couples atmosphere and canopy radiative transfers, canopy photosynthesis, transpiration, and energy balance. BESS incorporates multiple MODIS atmosphere and land products [47] to produce global GPP and evapotranspiration products. The 8-day 1000 m GPP products as derived from BESS were obtained from the Environmental Ecology Lab of Seoul National University (http://environment.snu.ac.kr/). A detailed description of the BESS model can be found in [47] and a comprehensive evaluation of the BESS GPP products across biomes using the FLUXNET 2015 dataset can be found in [18].

The Eddy Covariance and MODIS Data-Driven Model
The eddy covariance and MODIS data-driven model (EC-MOD) applies a data-driven approach that simulates vegetation GPP using the eddy covariance flux tower data, MODIS data streams, micrometeorological reanalysis data, stand age, and aboveground biomass data [54]. The EC-MOD dataset that provides gridded carbon and water fluxes over North America was downloaded from the Global Ecology Group at University of New Hampshire (http://globalecology.unh.edu/). The EC-MOD dataset consists of gross primary productivity (GPP), net ecosystem exchange (NEE), ecosystem respiration (ER), and evapotranspiration (ET) at 1000 m spatial resolution [48], and only yearly data were accessible when the study was conducted.

Model Implementation and Comparison
Regional-scale studies were performed at a spatial resolution of 1000 m for the time period spanning from 2001 to 2012. The TL-LUE, VPM, and GPD models were implemented using the MODIS data and the Daymet meteorological data. The modeled daily GPP at 1000 m resolution was then summed for the calendar year to obtain the corresponding annual GPP. Vegetation GPP from the preprocessed 8-day 1000 m MOD17A2H data and from the 8-day 1000 m BESS product were multiplied by 8 for the first 45 values/year and by 5 (or 6 in a leap year) for the final period and then summed to obtain annual values. The yearly 1000 m GPP data from the EC-MOD product were extracted directly for use. All model results are resampled using the nearest neighborhood approach to the MODIS Sinusoid projection to allow for direct comparisons. To quantitatively compare the spatiotemporal patterns of the modeling results, annual GPP was first temporally averaged for the years from 2001 to 2012 and was then zonally and meridionally averaged for every 1 degree along latitude and longitude, respectively [55].
Site-scale studies were further conducted to understand how the models perform as evaluated against flux tower measurement data on both the 8-day and the yearly bases. For studies on the 8-day basis, daily GPP derived from the TL-LUE, VPM, and GPD models and those obtained from flux tower measurements are temporally averaged for each 8-day time period in a calendar year. All modeled results on both the 8-day and the yearly bases were extracted for the pixel that contains the corresponding flux tower site from the 1000 m datasets. To understand how different climate input data influence the modeling results, the TL-LUE, VPM, and GPD models were also driven using the meteorological data measured at flux tower sites. The commonly used metrics, including the coefficient of determination (R 2 ), the root-mean-squared error (RMSE), and the mean bias error (bias), were used for model assessment and comparisons.

Regional-Scale Model Comparisons
The spatial distributions of 12-year (2001-2012) average annual gross primary productivity (AAGPP) of deciduous broadleaf forests in the eastern United States vary across models and datasets ( Figure 1). The MOD17A2H product, a widely used dataset in land surface studies, shows an east-west gradient in the spatial distribution of AAGPP, where inland areas have low AAGPP. Differently, AAGPP derived from VPM exhibits a south-north gradient, where southern areas have higher AAGPP than northern areas. TL-LUE generally agrees with VPM in terms of the spatial distribution of AAGPP but TL-LUE has much higher AAGPP estimates than VPM in most areas. Among the six tested methods, GPD matches the data-driven dataset of EC-MOD in terms of the spatial distribution and magnitude of AAGPP. Compared with other methods, the EC-MOD data appear to contain a large spatial heterogeneity of AAGPP. Similar to MOD17A2H, the BESS product has lower AAGPP in inland areas but also shows suspicious mosaicking effects in the modeled AAGPP.
The zonally and meridionally averaged AAGPP profiles also vary widely across models (Figure 2). In the south-north direction, zonally averaged AAGPP of MOD17A2H does not change with latitudes (p value = 0.304) while AAGPP derived from all other models decreases as latitude increases ( Table 3). The zonally averaged AAGPP derived from BESS decreases at a much slower rate (−21.962 gC/m 2 /year/degree latitude) along latitude than those derived from the other models. In the west-east direction, meridionally averaged AAGPP derived from MOD17A2H increases as longitude increases (6.797 gC/m 2 /year/degree longitude), meridionally averaged AAGPP derived from BESS does not change with longitude (p value = 0.944), and all the other modeled results have decreased AAGPP with increasing longitudes. Although the zonally and meridionally averaged AAGPP profiles of TL-LUE, VPM, GPD, and EC-MOD have similar trends, the absolute values of AAGPP could vary largely: for example, AAGPP values derived from TL-LUE are nearly 1.4 times higher than those derived from VPM.    Given the mismatched spatial distributions of AAGPP across models, the spatial distribution of average meteorological variables could provide qualitative references to understand the sensitivities of modeled AAGPP to climate conditions. The average LAI, average air temperature, and average VPD have similar spatial distributions, where their values are lower in high-latitude regions than in low latitude regions and are lower in mountainous regions than in low-altitude regions (Figure 3). Note that high radiation absorption (i.e., high LAI and solar radiation) and air temperature mostly enhance vegetation photosynthesis, while high VPD prohibits stomatal openness and negatively impacts vegetation photosynthesis. The simulated AAGPP levels of TL-LUE, VPM, GPD, and EC-MOD generally match those of average LAI and air temperature, while the MOD17A2H data show the opposite, indicating that the current version of the algorithm that generates MOD17A2H is highly  Table 3. Changing rates of zonally and meridionally averaged profiles of 12-year (2001-2012) average annual gross primary productivity (AAGPP) derived from different models along latitude and longitude. Given the mismatched spatial distributions of AAGPP across models, the spatial distribution of average meteorological variables could provide qualitative references to understand the sensitivities of modeled AAGPP to climate conditions. The average LAI, average air temperature, and average VPD have similar spatial distributions, where their values are lower in high-latitude regions than in low latitude regions and are lower in mountainous regions than in low-altitude regions (Figure 3). Note that high radiation absorption (i.e., high LAI and solar radiation) and air temperature mostly enhance vegetation photosynthesis, while high VPD prohibits stomatal openness and negatively impacts vegetation photosynthesis. The simulated AAGPP levels of TL-LUE, VPM, GPD, and EC-MOD generally match those of average LAI and air temperature, while the MOD17A2H data show the opposite, indicating that the current version of the algorithm that generates MOD17A2H is highly sensitive to VPD. The spatial pattern of AAGPP derived from BESS is apparently not related to the distribution of those key meteorological variables. Since both MOD17A2H and BESS data are produced using downscaled coarse-resolution climate data, the uncertainties among different meteorological data could influence the GPP simulation.

Models AAGPP as a Function of Latitude AAGPP as a Function of Longitude
Remote Sens. 2018, 10, x FOR PEER REVIEW 10 of 17 sensitive to VPD. The spatial pattern of AAGPP derived from BESS is apparently not related to the distribution of those key meteorological variables. Since both MOD17A2H and BESS data are produced using downscaled coarse-resolution climate data, the uncertainties among different meteorological data could influence the GPP simulation.

Site-Scale Model Comparisons
While the six tested models are shown to have divergent spatiotemporal patterns of derived AAGPP, model evaluation against the eddy covariance flux tower data shows that the accuracies of different models are all high for studies on the 8-day basis (Figure 4). Even for the MOD17A2H product, which has the lowest modeled accuracy among the tested models, the R² value obtained is as high as 0.744 and the RMSE value is as low as 2.

Site-Scale Model Comparisons
While the six tested models are shown to have divergent spatiotemporal patterns of derived AAGPP, model evaluation against the eddy covariance flux tower data shows that the accuracies of different models are all high for studies on the 8-day basis (Figure 4). Even for the MOD17A2H product, which has the lowest modeled accuracy among the tested models, the R 2 value obtained is as high as 0. However, on the yearly basis, the modeled annual GPP are far from satisfactory when validated against the flux tower data ( Figure 5). Among the tested methods, the best-performing model of TL-LUE only achieved R² values as low as 0.373 and the RMSE values as high as 278.8 gC/m²/year. Both GPD (R² = 0.279, RMSE = 245.6 gC/m²/year) and BESS (R² = 0.224, RMSE = 382.7 gC/m²/year) perform slightly better than the remaining models. The EC-MOD data as derived based on a data-driven approach, however, is leaps and bounds from the flux tower data on the yearly basis, despite the fact that its spatial distribution of AAGPP appears to be reasonable in Figure 1.
To understand the model uncertainties introduced by different climate data, three models (i.e., the TL-LUE, VPM, and GPD models) were also implemented with the meteorological data measured at flux towers. The modeled 8-day GPP as driven using flux tower data ( Figure 6) performs similarly to that driven by the Daymet data (Figure 4), suggesting that the mismatch in meteorological data between flux tower measurements and the Daymet data is not the decisive factor. The modeled yearly GPP can vary largely using different climate input data for the TL-LUE and VPM models ( Figures 5  and 6), indicating that both light-use-efficiency-based models are sensitive to the climate inputs on the yearly basis. The GPD model produces consistent results when modeling using the flux tower climate data and the Daymet data. However, on the yearly basis, the modeled annual GPP are far from satisfactory when validated against the flux tower data ( Figure 5). Among the tested methods, the best-performing model of TL-LUE only achieved R 2 values as low as 0.373 and the RMSE values as high as 278.8 gC/m 2 /year. Both GPD (R 2 = 0.279, RMSE = 245.6 gC/m 2 /year) and BESS (R 2 = 0.224, RMSE = 382.7 gC/m 2 /year) perform slightly better than the remaining models. The EC-MOD data as derived based on a data-driven approach, however, is leaps and bounds from the flux tower data on the yearly basis, despite the fact that its spatial distribution of AAGPP appears to be reasonable in Figure 1.
To understand the model uncertainties introduced by different climate data, three models (i.e., the TL-LUE, VPM, and GPD models) were also implemented with the meteorological data measured at flux towers. The modeled 8-day GPP as driven using flux tower data ( Figure 6) performs similarly to that driven by the Daymet data (Figure 4), suggesting that the mismatch in meteorological data between flux tower measurements and the Daymet data is not the decisive factor. The modeled yearly GPP can vary largely using different climate input data for the TL-LUE and VPM models ( Figures 5 and 6), indicating that both light-use-efficiency-based models are sensitive to the climate inputs on the yearly basis. The GPD model produces consistent results when modeling using the flux tower climate data and the Daymet data.

Discussion
For a widely studied biome like deciduous broadleaf forest and for an extensively studied region like the eastern United States, the mismatches in the spatiotemporal patterns of derived annual GPP among satellite-based models are unexpected. Past satellite-based studies have largely focused on model development and evaluation on the 8-day or shorter time scale. Note that deciduous broadleaf forests have distinctive seasonality of leaf allocation dynamics [56] and, therefore, modeling intra-annual GPP variation is much easier than modeling interannual GPP variation as the influences of seasonality in GPP cancel out on the yearly basis. In addition, studies on the yearly basis require much longer and more persistent observation records than those on the weekly or shorter time scale.
Existing modeling studies paid much attention to error metrics such as R 2 and RMSE but overlooked or even did not report the metric of model biases. For studies on the deciduous broadleaf forests, the MOD17 algorithm achieved an RMSE of 2.32 gC/m 2 /day as evaluated against the BigFoot GPP [57]. The VPM model achieved an R 2 of 0.92 at a 10-day time scale when assessed using flux tower data [46]. A recent study that used the FLUXNET dataset for model evaluation reported that the MOD17 algorithm achieved an R 2 of 0.88 and an RMSE of 12.61 gC/m 2 /8 day (or 1.58 gC/m 2 /day equivalently) and the TL-LUE model achieved an R 2 of 0.90 and an RMSE of 11.21 gC/m 2 /8 day (or 1.40 gC/m 2 /day equivalently) [50]. Comparative studies on global vegetation products using the FLUXNET dataset found that the BESS product achieved an R 2 of 0.72 with an RMSE of 2.59 gC/m 2 /day and the MOD17 product achieved an R 2 of 0.68 with an RMSE of 2.97 gC/m 2 /day for deciduous broadleaf forests [18]. The GPD model was found to achieve an R 2 of 0.787 and an RMSE of 2.25 gC/m 2 /day [26]. The EC-MOD model was reported to achieve an R 2 of 0.74 for the modeled GPP as evaluated using AmeriFlux data across biomes [58], and no biome-specific studies have been found. Despite different datasets and study regions, the model performance on the 8-day or shorter time scale in the above-mentioned studies is close to and comparable to that found in this study. Modeling annual GPP, however, requires paying more attention to the model bias because the cumulative biases could become considerable on a yearly basis, as shown in Figures 1 and 2, even when the model biases are small on the 8-day basis. Our study chooses to use existing parameter settings in the literature and supposes that the models have already been calibrated to achieve their best performance across space and time. Later studies may need to calibrate the models to balance the model performance in terms of bias and RMSE, or, otherwise, the modeling results could be misleading on the yearly basis.
Beside appropriate model calibrations, studies that aim for model improvements need to investigate the potential mechanisms behind the unsatisfactory model performance on the yearly basis. One possible reason is that all models tested here apply biome-specific parameters but the canopy structure and tree species composition could vary greatly across sites and spaces even for the same biome of deciduous broadleaf forest. In our results, all models appear to underestimate GPP for the US-Ha1 and US-Oho sites. One possible reason is that understory LAI could contribute considerable GPP. Both field experiments and modeling studies are needed to understand why the model performance in these two sites is inconsistent with that in other sites.
Accurate modeling of annual GPP greatly influences our understanding of the ecosystem processes as well as the climate-vegetation interactions. Even for deciduous broadleaf forests that have abundant observational sites available among ecosystems, the site-year flux tower data are still limited for comprehensive modeling studies on a yearly basis. To better understand the performance of the state-of-the-art ecosystem models, especially their abilities in simulating interannual variability of vegetation GPP, continuous monitoring of the land surface from both satellite remote sensing and flux tower sites is definitely needed.

Conclusions
This study compares six satellite-based vegetation photosynthesis models for simulating the GPP of deciduous broadleaf forest. We find that the simulated multiyear AAGPPs from different satellite-based models have mismatched spatial patterns with varied changing trends along both latitude and longitude. Model evaluation using flux tower data indicates that all tested models, despite performing well on the 8-day basis, fail to capture the interannual variation of GPP across sites and years. Given that modeling annual GPP greatly influences our ability to predict the carbon exchanges between the land surface and the atmosphere, especially under the scenario of a changing climate, there is a need to develop robust models that are capable of capturing the interannual variation, in addition to intra-annual variation, of vegetation productivity.
Author Contributions: L.L. performed the analysis and wrote the paper; Y.Z. and Y.F. interpreted the results; Q.X. designed the experiments and wrote the paper. All authors commented on and approved the final manuscript.