Annual Gross Primary Production from Vegetation Indices: A Theoretically Sound Approach

.M.) Abstract: A linear relationship between the annual gross primary production (GPP) and a PAR-weighted vegetation index is theoretically derived from the Monteith equation. A semi-empirical model is then proposed to estimate the annual GPP from commonly available vegetation indices images and a representative PAR, which does not require actual meteorological data. A cross validation procedure is used to calibrate and validate the model predictions against reference data. As the calibration/validation process depends on the reference GPP product, the higher the quality of the reference GPP, the better the performance of the semi-empirical model. The annual GPP has been estimated at 1-km scale from MODIS NDVI and EVI images for eight years. Two reference data sets have been used: an optimized GPP product for the study area previously obtained and the MOD17A3 product. Different statistics show a good agreement between the estimates and the reference GPP data, with correlation coefﬁcient around 0.9 and relative RMSE around 20%. The annual GPP is overestimated in semiarid areas and slightly underestimated in dense forest areas. With the above limitations, the model provides an excellent compromise between simplicity and accuracy for the calculation of long time series of annual GPP.


Introduction
The analysis of changes in the long-term estimates of the CO 2 ecosystem-atmosphere fluxes is important to establish the global carbon balance, especially in forest ecosystems, the main sinks of atmospheric carbon in the biosphere. A key factor of this balance is the gross primary production (GPP), which represents the total carbon uptake through photosynthesis per unit of time and per unit of area. Although GPP is the source of carbon for all carbon fluxes in the ecosystem, there is no direct method of measurement. GPP is calculated from in situ net primary production data acquired in eddy covariance (EC) towers after correction for respiration losses [1,2]. Ecosystem models [3] validated against EC data and combined with meteorological and remotely sensed data [4,5] allow for the estimation of GPP across space and time and, hence, for the quantification of carbon fluxes at regional to global scales [2].
The GPP is the rate at which vegetation converts light into chemical energy by photosynthesis. As the photosynthesis is driven by solar radiation, GPP can be estimated using the Monteith equation [6].
GPP = ε f APAR PAR (1) where PAR is the flux density of photosynthetically-active radiation, i.e., the global incident radiation in the range 400-700 nm; f APAR is a structural variable that represents the fraction (between 0 and 1) of incident PAR absorbed by the canopy; and ε (g·MJ −1 ) is the conversion efficiency (also known as light-use efficiency (LUE)). When the quantities in Equation (1) are defined on a daily basis, a subscript i ranging from 1 to 365 is added to them. The units of GPP i and PAR i are g·m −2 ·day −1 and MJ·day −1 , respectively. The annual GPP for the year Y, in kg·m −2 ·yr −1 , is the sum of the corresponding daily values: Taking into account the loss due to autotrophic respiration, the GPP annual can be used to assess the carbon sequestration into dry matter during a year (annual net primary production), and hence to quantify the changes in biomass over long periods of time. However, obtaining long time series of GPP annual from daily GPP values is a time-consuming process that requires a huge number of inputs that might be unavailable, especially in retrospective studies [4].
In this work, a semi-empirical model that avoids the calculation of all the factors in Equation (1) is proposed to estimate the annual GPP. As the next section will show, our approach relies on vegetation indices (VIs) products commonly available such as the Normalized Difference Vegetation Index (NDVI) [7] and the Enhanced Vegetation Index (EVI) [8]. In this case, the two MODIS standard VI products have been used. On the one hand, MODIS NDVI is included since it is referred as the "continuity index" [8] to the existing 30+ year NOAA-AVHRR derived NDVI time series, which can be extended to provide a longer-term data record required in climatic studies. On the other hand, EVI has been shown to improve NDVI sensitivity in dense vegetation regions and a better performance to minimize soil and atmosphere influences [8].

A Theoretically Sound Approach
The incident irradiance at the surface depends on the extraterrestrial irradiance and on the atmospheric conditions. The brightness variation of the Sun due to the sunspot cycle (with a period of ca. 11 years) provokes changes in the extraterrestrial global irradiance smaller than 0.1% [9]. Hence, the extraterrestrial irradiance is practically constant. However, the surface global irradiance at a given location might be highly variable since it is affected by random changes due to atmospheric conditions, which introduce an inter-annual variation. Regarding the long-term trends in the global irradiance at the surface, a general decrease ("global dimming") has been reported during the last half century both in the northern and southern hemisphere, which might be related to a decrease of atmospheric transmittance due to changes in cloud characteristics and in atmospheric aerosols [10,11]. Nevertheless, in the last three decades, these changes have been less significant or even ceased [12].
According to the above considerations, the inter-annual variations of the annual PAR should be negligible. Although PAR Y i in Equation (2) might show random inter-annual variations associated to atmospheric state, it can be approximated (at each location) by a typical or representative value PAR • i that shows the main seasonal periodicity (associated exclusively to geometric factors). This assumption means that the effect of the changes of solar irradiance on annual GPP is less important than that due to vegetation changes (as discussed in Section 4). Radiation explains most of the variation in canopy photosynthesis over short periods, but environmental factors such as water availability become more important over long periods [12][13][14][15]. Since alterations in global solar irradiance only have a small impact on GPP annual , the observed inter-annual variability in GPP annual has to be related to structural changes that affect the f APAR, such as changes in leaf area index (LAI) or leaf area duration.
Remote Sens. 2017, 9,193 3 of 21 The f APAR is related to the vegetation structure and pigment content. It is frequently obtained from remote sensing data using vegetation indices (VIs) [16][17][18][19][20]. Some authors use the VIs as proxies of both ε and f APAR and rewrite Equation (1) as GPP = VI × VI × PAR [21,22]. In the latter case, both broad-band VIs and narrow-band indices can be considered. The Photochemical Reflectance Index (PRI) is a narrow-band VI [23] that is potentially connected with ε [24]. The GPP has also been directly related to PAR through the chlorophyll content, which is in turn closely related to green f APAR [25,26]. In any case, a linear relation with a vegetation index, such as the NDVI and the EVI, is a realistic approximation for the evaluation of f APAR . This has been sufficiently shown by several authors, as mentioned above, and does not require further analysis.
As commented by some authors [27], whereas f APAR,i is related to the vegetation structure and pigment pools, the conversion efficiency is related to the physiology. The daily conversion efficiency ε i can vary spatially between biomes, ecosystems, and plant species, and temporally during the growing season due to environmental and physiological limitations [28][29][30][31][32]. The daily conversion efficiency is generally taken as the product of a biome-specific maximum value (g·MJ −1 ) and several dimensionless factors accounting for the efficiency reduction due to different types of stress, such as the thermal and the water stress. Our study area is characterized by different Mediterranean ecosystems where the water stress introduces most of the inter-annual variability in ε i [4,13]. On a daily scale, the different types of stress must be accounted for at quasi-real time, which is rather difficult and frequently requires actual meteorological data. Nevertheless, it has been shown that after relatively long stress periods, the plant photosynthetic apparatus is damaged and ultimately leads to structural canopy alteration (i.e., a decrease in green LAI and therefore in f APAR ). This alteration can be noticed, with a time delay of one or two months [33,34], in broad-band spectral indices such as the NDVI and narrow-band indices such as the PRI. In particular, numerous studies carried out in different environments have shown that NDVI is sensitive to vegetation water stress [35,36].
It can be safely assumed that the effects of the different down-regulating factors at annual scale affect the f APAR , since stresses acting on the canopy finally produce either less chlorophyll and/or less foliage and reduce the VI value. Hence, at annual scale, the conversion efficiency ε can be considered to be independent of time, and approximated by the ecosystem maximum efficiency. As f APAR,i and ε i can be confounded to varying degrees depending on the underlying dynamic biological processes and on their exact definition [27], with this approach we are avoiding a further insight in these concepts and in their exact operational definition. In fact, this approach takes into account the coordinate response or functional convergence [27] of both variables in the presence of nutrient or water constraints that limit plant growth as well as plant structure and pigment content. Table 1 summarizes the hypotheses concerning the estimation of GPP Y annual (H1 is analyzed in next section). Hence, the variations from one year Y to another are attributed to those of f APAR , i.e., to VI. From Equation (2) and the hypotheses in Table 1, an approximated GPP Y annual can thus be calculated as where That is, C 1 and C 2 are constant for a specific vegetation type, and Remote Sens. 2017, 9,193 4 of 21 This average VI is physically sound as it is an average over the energy received per surface area. Equations (3)-(7) require the series VI Y i and PAR • i (which is the same for all years), and the parameters, a, b and ε max .
In this work, the annual GPP is obtained at 1-km spatial resolution from the MODIS VI standard products (NDVI and EVI) using the semi-empirical model in Equation (4). After a brief analysis of H1, the coefficients C 1 and C 2 are obtained and discussed as a function of the vegetation type. For the sake of clarity, GPP REF annual refers to annual GPP values obtained using Equation (2) since they are considered our "reference values", whereas GPP annual refers to estimated values using the semi-empirical model. Data from 2005 to 2012 are employed in the study. The performance of the model is assessed in its ability to reproduce annual GPP inter-annual variations in close agreement with those shown by the reference data.
The conversion efficiency is independent of time and approximated by the ecosystem maximum efficiency at annual scale.

Study Area
Spain has remarkable landscape diversity and a wide range of ecosystems due to its geographic position in southwestern Europe, climate, geological features and relief, with elevation varying from sea level to 3479 m. A vast plateau surrounded by a number of mountain ranges dominates the central part of Spain. Synoptic air masses create a NW to SE gradient of water availability (total annual precipitation varies from 2000 mm to 120 mm), ranging from Atlantic humid climate zones on the north coast to Mediterranean climate on the east coast, the SE corner becoming the most arid zone of Europe [37]. Spain has a mosaic of land cover that includes significant areas of traditional and newly developed agriculture (49% of land), dominated by non-irrigated land. These areas are embedded in a matrix of natural and semi-natural vegetation (47% of land), mainly occupied by shrublands, and broadleaved or coniferous forests [38]. Vegetation distribution is strongly controlled by aridity, and drought conditions have a marked influence on vegetation cover and activity. The droughts that affect the semiarid areas represent an important factor in environmental degradation. Figure 1 shows a precipitation map obtained averaging the annual precipitation of the eight years analyzed in this study (2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012), an average annual PAR image and the reference annual GPP for 2011 (both products will be described later).

Daily PAR Images
Daily PAR images were obtained from daily irradiation images over Spain. The latter were calculated using two different procedures, previously inter-compared to ensure their homogeneity.

Daily PAR Images
Daily PAR images were obtained from daily irradiation images over Spain. The latter were calculated using two different procedures, previously inter-compared to ensure their homogeneity. Only the first procedure uses a satellite product, the down-welling surface short-wave radiation Remote Sens. 2017, 9,193 6 of 21 flux (DSSF) from SEVIRI/MSG images [39]. Therefore, it is preferred from an operational point of view. In this procedure, the DSSF product was integrated along a day to obtain the daily irradiation. The DSSF data are, in the native geostationary projection, centered at 0 • longitude and with a sampling distance of 3 km at the sub-satellite point. DSSF data were downscaled using the spatial disaggregation of the irradiance with a digital elevation model, DEM [40]. First, these data were re-projected and spatially smoothed by means of a linear interpolation to obtain 1-km lat/lon global irradiance images. This procedure removed the sharp borders or discontinuities between adjacent pixels. When no surrounding data were available (e.g., in the coastline), a fast technique to interpolate the images was developed. A further topographic correction was applied, which included the elevation correction and slope and aspect effects by means of a DEM. This procedure was used from 2007 to 2012. The resulting PAR images at 1-km spatial resolution present a mean absolute error (MAE) ranging from 0.5 MJ·m −2 ·day −1 to 0.9 MJ·m −2 ·day −1 in rugged terrain. Although the DSSF product is preferred from an operational point of view, a second procedure was employed when DSSF images were not available. This second procedure [41] applies artificial neural networks to temperature and precipitation maps generated by ordinary kriging from in situ data. Unlike temperature and precipitation, solar radiation data are recorded only at a limited number of weather stations in Spain and thus solar radiation maps cannot be obtained directly interpolating those data. However, they can be estimated using extraterrestrial irradiation and air temperature and precipitation, which serve to characterize atmospheric transmittance. The MAE for PAR was also rather low in this case, about 1.2 MJ·m −2 ·day −1 . A relationship between both series was found and used to fill the gaps in the first series.
According to climate records (www.aemet.es), the study period (2005-2012) presents remarkable weather heterogeneity. It comprises hot and dry years as well as hot and humid ones. However, these general features depend on each particular region, and the drought events are non-uniform and affect different regions on different years.

Daily GPP Images
Daily GPP images at 1-km resolution were obtained for the period 2005-2012 following a procedure based on the optimization of Monteith's approach, Equation (1), by adjusting the inputs (f APAR , PAR and ε) for the study area. This optimized GPP product is driven by meteorological and satellite data (MODIS/TERRA and SEVIRI/MSG). Daily PAR was obtained as described above. f APAR was calculated by applying to MODIS data (at 1-km spatial resolution and 8-day temporal resolution) the algorithm that is actually used to derive the SEVIRI/MSG f APAR product, delivered by the LSA SAF network (EUMETSAT) (http://landsaf.meteo.pt) at 3.1-km spatial resolution (sub-satellite point) and daily frequency over the geostationary MSG grid. This algorithm uses the Renormalized Difference Vegetation Index, RDVI [42]. Subsequently, the f APAR series were filtered and interpolated using an adapted (iteratively reweighted) local regression filter (LOESS) [43]. This procedure takes into account the data quality flags and has shown a very good performance; it is particularly resistant to outliers and offers an optimal reconstruction even in most extreme situations with long seasonal gaps. In order to assign a value of ε max to the different cover types, a hybrid land-cover map for Spain [44] was used (see below). Meteorological data were used to characterize the ε inter-annual variability due to water stress, which produces the most significant efficiency reduction in the study area. In particular, the water stress factor C ws [5], which accounts for limited photosynthetic activity in case of short-term water stress, was obtained from a local water budget based on actual and potential evapotranspiration.
This daily GPP product was validated for 2008 and 2011 [4] by comparison with in situ GPP estimates from EC data (direct validation) and by inter-comparison with the MODIS GPP product (MOD17A3) [45], which will be also used in this work. The direct validation has evidenced an excellent agreement with correlations up to 0.98 in 2008 and 0.92 in 2011 in some sites. The inter-comparison has shown that the two GPP products are consistent temporally. However, a slightly decrease of the correlation has been observed in high precipitation areas (Northern Spain) and in semiarid areas (Southern Spain). Whereas the discrepancies on the northern region could be explained due to differences in the PAR (which seems to be better estimated in our approach), the discrepancies on the southern region could be related to the different way that the two GPP products account for the water stress. Our results evidence that the water stress factor C ws outperforms the VPD (vapor pressure deficit) used in the official MODIS product. A further analysis on the explanatory power of the optimized GPP in terms of its inputs showed, as expected, that PAR and f APAR are the most relevant inputs. Their relative importance depends on the location of the vegetation annual maximum so that the f APAR plays a major role on the GPP estimation when that maximum is not reached during solar solstice. Nevertheless, ε has to be evaluated accurately in order to explain the GPP inter-annual variability associated with the water shortage typical of Mediterranean landscapes [4].

Daily VI Images
The MODIS standard VI product MOD13A2 [8] includes two gridded vegetation indices (NDVI and EVI) at 1-km spatial resolution and 16-day temporal resolution (https://reverb.echo.nasa.gov/), and quality analysis (QA) with statistical data that indicate the quality of the VI product and input reflectance data. The VI products are corrected for molecular scattering, ozone absorption, and aerosols. The MODIS VI algorithm operates on a per-pixel basis and relies on multiple observations over a 16-day period to generate a composited VI.
The VI time series were filtered and interpolated using an adapted LOESS, the same procedure applied to f APAR series [43], to generate daily NDVI and EVI images.

Vegetation Type Images
Two vegetation classifications were used: (i) a hybrid land-cover map obtained by the synergistic combination of four land-cover classifications (CGL2000, CORINE, IGBP and GlobCover) that improves the performance of individual classifications by reconciling their best characteristics while avoiding their main weaknesses [44]; and (ii) an ecosystem functional type map that characterizes the patterns of diversity and status from functional attributes such as the NDVI, land surface temperature and albedo [38].

Methodology
Hypotheses 2 and 3 are supported by literature (see Section 1) and, therefore, they are accepted as a stating point of our methodology. Although H1 is also a sound hypothesis, its adequacy to our study area is assessed by statistical tests. They allow us to evaluate whether PAR inter-annual variations introduce negligible changes in annual GPP as compared to those due to the inter-annual variability of the VI.
As mentioned at the end of Section 2, once H1 is confirmed, PAR • i is calculated. This is considered as a typical or representative daily PAR series for each pixel of the study area. It is obtained by averaging the daily PAR for the eight-year dataset. images, are analyzed to establish two sets of coefficients C 1 and C 2 . The calibration (obtaining C 1 and C 2 ) and testing of the semi-empirical model are performed simultaneously by means of a cross-validation procedure [46]. This procedure allowed us to calculate and test the model parameters from the eight years data used in this study. The cross-validation method systematically removes one case in a dataset (a year data in our case), derives a forecast from the remaining cases, and tests it on the removed case. The procedure is nonparametric and can be applied to any automated model-building technique.
Remote Sens. 2017, 9,193 8 of 21 To carry out the cross-validation procedure, data are divided into eight partitions corresponding to each one of the eight available annual images of GPP and weighted NDVI. The procedure is as follows. For year Y (=2005-2012), the images of years = Y are used to train a linear model. The model is validated using the images of year = Y. The coefficient of correlation (R), MBE (mean bias error), MAE (mean absolute error), RMSE (root mean square error), and their relative values (rMBE, rMAE, and rRMSE) are calculated. Finally, the mean and standard deviation of the regression coefficients are calculated and considered as the final regression coefficients of the model and their associated errors.
The cross validation is carried out in two ways: (i) considering a constant value of the conversion efficiency (C 1 and C 2 are independent of the land-cover class); and (ii) considering that C 1 and C 2 depend on vegetation, according to the two thematic maps described in Section 3.2. A flowchart for GPP annual estimation using the semi-empirical model is shown in Figure 2.
Remote Sens. 2017, 9,193 8 of 21 systematically removes one case in a dataset (a year data in our case), derives a forecast from the remaining cases, and tests it on the removed case. The procedure is nonparametric and can be applied to any automated model-building technique.
To carry out the cross-validation procedure, data are divided into eight partitions corresponding to each one of the eight available annual images of GPP and weighted NDVI. The procedure is as follows. For year Y (=2005-2012), the images of years ≠ Y are used to train a linear model. The model is validated using the images of year = Y. The coefficient of correlation (R), MBE (mean bias error), MAE (mean absolute error), RMSE (root mean square error), and their relative values (rMBE, rMAE, and rRMSE) are calculated. Finally, the mean and standard deviation of the regression coefficients are calculated and considered as the final regression coefficients of the model and their associated errors.
The cross validation is carried out in two ways: (i) considering a constant value of the conversion efficiency (C1 and C2 are independent of the land-cover class); and (ii) considering that C1 and C2 depend on vegetation, according to the two thematic maps described in Section 3.2. A flowchart for estimation using the semi-empirical model is shown in Figure 2.
year n

H1 Hypothesis Testing
Annual PAR images for the eight years (see Section 3.1) and a coefficient of variation image were calculated (Figure 3a). The coefficient of variation (mean value equal to 0.026) shows the inter-annual variability of the annual PAR throughout the period 2005-2012, which is less than 5%,

H1 Hypothesis Testing
Annual PAR images for the eight years (see Section 3.1) and a coefficient of variation image were calculated (Figure 3a). The coefficient of variation (mean value equal to 0.026) shows the inter-annual variability of the annual PAR throughout the period 2005-2012, which is less than 5%, in the entire study area. Thus, the annual PAR can be considered practically constant. Its variation is ca. 50% lower than that corresponding to the annual mean NDVI and EVI (Figure 3b,c). Annual-mean VI and reference annual GPP series (not shown) present rather similar spatial patterns. A further analysis has been carried to assess the effects of both the daily PAR, NDVI and EVI inter-annual variability on the reference annual GPP.
The results of the correlations GPP/PAR (Figure 3d) show that only the 25% of the pixels present a p-value ≤ 0.05, (i.e., statistically significant, with mean R 2 = 0.62), whereas 64% of the pixels show a p-value ≤ 0.05 for the correlation GPP/NDVI (R 2 = 0.73) and 65% for the GPP/EVI (R 2 = 0.74). Therefore, the NDVI (and EVI) inter-annual variations play a significant role in the GPP inter-annual variations. As the daily PAR series of all eight years show similar characteristic seasonal patterns, hypothesis H1 is accepted. That is, we assume that annual PAR is practically constant (as compared with NDVI and EVI) and each pixel can be characterized by a representative PAR

Calibration and Validation of the Semi-Empirical Model
The density scatterplot of GPP REF annual as a function of both NDVI Y and EVI Y for all the image pixels and all years (Figure 4a,b, respectively) evidences a practically linear correlation. Therefore, in a first step, the regression coefficients C 1 and C 2 (Equation (4), using reference annual GPP data) are obtained assuming a constant conversion efficiency value ε. The cross-validation procedure is applied to these data to obtain C 1 and C 2 using NDVI and EVI (Tables 2 and 3). The corresponding estimated GPP annual values using NDVI and EVI are shown in Figure 4c,d, respectively. Most of the estimates distribute along the 1:1 line, which indicates a good agreement between the reference and the estimated GPP. However, when using the NDVI, there is a cluster located at high levels of GPP that does not lie along the 1:1 line, which indicates an underestimation of the model for high GPP values. The corresponding pixels are located in dense forest areas ( Figure 5). The last column in Table 3 shows the statistics of the correlation between estimated and reference annual GPP using the mean values of C 1 and C 2 .  The consideration of different vegetation types implies adjusting the distributions in Figure  4a,b using different regression lines. The hybrid land-cover map for the study area [44] has been used to apply the cross-validation procedure for each vegetation class: (1) irrigated lands; (2) cultivated areas; (3) cropland mosaic; (4) broad-leaved forest; (5) needle-leaved forest; (6) mixed forest; (7) shrubs; (8) herbaceous; and (9) sparse vegetation. As a result, values of C1 and C2 have been     The consideration of different vegetation types implies adjusting the distributions in Figure 4a,b using different regression lines. The hybrid land-cover map for the study area [44] has been used to apply the cross-validation procedure for each vegetation class: (1) irrigated lands; (2) cultivated areas; (3) cropland mosaic; (4) broad-leaved forest; (5) needle-leaved forest; (6) mixed forest; (7) shrubs; (8) herbaceous; and (9) sparse vegetation. As a result, values of C 1 and C 2 have been obtained for each iteration and for each land-cover type. For the sake of brevity, Tables 4 and 5 summarize the mean value and standard deviation of C 1 and C 2 for each land-cover type, as well as the global validation statistics. The correlation between the estimates and the reference annual GPP using the coefficients in Table 4 is shown in Figure 4e,f for NDVI and EVI, respectively. In Figure 4e, the cluster corresponding to highest GPP pixels distributes also along the 1:1 line and, thus, the global correlation between NDVI estimated and reference annual GPP data increases from 0.87 to 0.90 (last column in Table 5) when considering different values of C 1 and C 2 for each land-cover type. However, when using the EVI the estimates hardly improve. Therefore, with EVI the uncertainties of the estimates are rather good even without using a vegetation map. The different vegetation types present rather similar statistics. Cultivated areas, croplands and herbaceous show high coefficient of correlation values, and irrigated lands and shrubs show the lowest values, though still relatively high. The lower rMBE values correspond to herbaceous and needle-leaved forest. When analyzing the spatial dependence of the estimates uncertainties (average for all the years), for example in terms of the rMBE, it is observed that most of the estimates present a bias around zero (Figure 6), and the different vegetation types present rather similar values. When using constant regressions coefficients, for the estimates from both NDVI ( Figure 6a) and EVI (Figure 6b), 20% of the pixels have |rMBE| ≤ 5% (and 68% of the pixels have |rMBE| ≤ 20%). When using coefficients depending on the vegetation type, the 24% of pixels have |rMBE| ≤ 5% (and 73% have |rMBE| ≤ 20%) for NDVI ( Figure 6c) and EVI (Figure 6d). The rMBE image using both constant and vegetation-type dependent values of C 1 and C 2 , shows that in the second case the red and blue areas (overestimation and underestimation) are reduced ( Figure 6). That is, when considering variable C 1 and C 2 , the bias of the semi-empirical model is slightly reduced. However, when using NDVI some areas located in W (small area with very low vegetation cover) and SE Spain still present overestimates of ca. 100%. Something similar occurs in the semiarid Ebro basin in NE Spain when using instead the EVI. Those areas located in SE and NW Spain comprise semiarid lands with a very low productivity (see Figure 1c). Then, although the bias is important in relative terms, it is not so in absolute terms because the aforementioned areas have very low GPP. Other areas presenting bias between 20% and 40% can also be identified, which are not clearly associated with a vegetation type. The presence of some areas with significant bias might be associated with the thematic map used. However, the use of an ecosystem functional type map [38] produces no improvement. The global statistics for the validation when using this alternative map with NDVI are R = 0.88, rMBE = −0.010, rMAE = 0.18, and rRMSE = 0.2. That is, the correlation between the model estimates and the reference GPP is decreased, and the uncertainties are slightly increased, with respect to those obtained using the hybrid classification map.  As mentioned above, Figure 6 shows average uncertainties for all the years when using the hybrid classification map. For the year 2011, the comparison between reference and estimated annual GPP data (Figure 7) shows a similar spatial pattern, although the model slightly underestimates GPP in highly productive areas corresponding to dense forests ( Figure 5), more noticeable when using NDVI. As mentioned above, Figure 6 shows average uncertainties for all the years when using the hybrid classification map. For the year 2011, the comparison between reference and estimated annual GPP data ( Figure 7) shows a similar spatial pattern, although the model slightly underestimates GPP in highly productive areas corresponding to dense forests ( Figure 5), more noticeable when using NDVI.

NDVI EVI
Remote Sens. 2017, 9,193 15 Finally, as an example, Figure 8 shows the annual GPP images for the 2005-2012 period obtained using the proposed model with EVI and the coefficients in Table 4. The observed inter-annual variability in GPP is influenced mostly by changes in structural parameters of the vegetation, and thus by the EVI, which is connected with precipitation pattern. When precipitation diminishes, for example in 2005 [47], a reduction in GPP is observed. In a wetter year, such as 2010, an increase of GPP is shown. Finally, as an example, Figure 8 shows the annual GPP images for the 2005-2012 period obtained using the proposed model with EVI and the coefficients in Table 4. The observed inter-annual variability in GPP is influenced mostly by changes in structural parameters of the vegetation, and thus by the EVI, which is connected with precipitation pattern. When precipitation diminishes, for example in 2005 [47], a reduction in GPP is observed. In a wetter year, such as 2010, an increase of GPP is shown.  Table 4.

Discussion
The proposed semi-empirical model to estimate GPP does not require actual meteorological data. It exclusively relies on a vegetation index product commonly available-the MOD13A2 in this case-and on a representative PAR.
The underlying assumption of the model is that VI is the main factor determining the GPP inter-annual variations. The use of a representative PAR in our model marks a qualitative distinction from those approaches based on a linear relationship between an annual cumulative or mean VI and the annual GPP [2,48,49]. The rationale of our model is also different from the greenness-radiation models that consider GPP equal to the product of a VI and PAR [50]. Certainly, a key factor accounting for the annual GPP is the annual cumulative or mean VI, but our model calculates a PAR-weighted annual mean VI. The use of the representative daily PAR as a weighting factor is important because VI gives the fraction of PAR absorbed but not the total absolute PAR absorbed. The same VI can produce different GPP depending on the incident radiation, and the latter shows a marked seasonal variability that affects differently the different vegetation types, depending on their phenology.
The calibration/validation process, however, does require a reference GPP product and, obviously, the higher the quality of the reference GPP product, the better the performance of the semi-empirical model. In this study, a daily GPP product optimized for the study area has been used to calculate the reference annual GPP instead of using the annual GPP from MODIS (MOD17A3). Although both products show a temporal consistency [4], our product benefits from: an accurate PAR product, a land-cover map more adapted to the study area and a water stress factor that better accounts for the photosynthesis reduction during the summer water shortage. Nevertheless, the MOD17A3 product can be also used as reference to calibrate the model (and could even be recommended for a global study). In order to further test the methodology robustness, the calibration/validation process has been repeated using MOD17A3 as reference for both VIs and considering both constant regression coefficients and coefficients depending on the vegetation type (Table 6).  Table 6 confirms that the MOD17A3 product can be also used as reference to calibrate the model, which significantly extends the range of applicability of the methodology. The additional requisite is the availability of a representative PAR image that could be estimated at the cost of a coarser spatial resolution from databases such as GMAO (Global Modeling Assimilation Office) or ECMWF (European Center for Medium-Range Weather Forecast).
Since the results have shown a good agreement with reference data, the proposed methodology can be used to generate long series of annual GPP to detect regional changes without requiring actual PAR, f APAR and ε. As commented in the literature [27], the absorbed photosynthetic radiation and ε are somehow linked; in this work, the link is established by means of the vegetation indices NDVI and EVI. The time series of EVI is shorter than the NDVI one and it is limited to sensors with the blue band. Nevertheless, a longer series could be generated using the two-band enhanced vegetation index, EVI2 [51], which is an acceptable and accurate substitute of EVI without the blue band.
Annual GPP could be obtained even when no classification map is available with similar uncertainty when using EVI (and with a slightly higher uncertainty in the case of using NDVI), or even in global studies using lower spatial resolution images such as SEVIRI-MSG. On the other hand, there is no need to work with daily images. It is possible to work directly with the original MODIS products, for example, the NDVI at 16-day temporal resolution, or even with a monthly NDVI without losing functionality. To assess their performance, the annual GPP has been calculated using both products with constant C 1 and C 2 . The validation statistics using the 16-day and 30-day products are exactly the same as those from the daily product shown in Table 3. Only a negligible change in the bias (from −0.008 to −0.007) is observed when using monthly NDVI values.
The model, mainly when it is driven with NDVI, overestimates the annual GPP in semiarid areas and slightly underestimates it in dense forest areas. This conclusion has been reached when using both a vegetation class map and an ecosystem functional type map. The overestimation in semiarid areas when using NDVI might be a consequence of NDVI sensitivity to canopy background variations, especially important in sparse vegetation areas (as bare soil NDVI does not equal to zero, then f APAR is slightly overestimated). This overestimation, although reduced, is not completely removed when using EVI. A light, general, GPP overestimation is found in semiarid areas. It could be a consequence of the photosynthesis reduction due to water stress that is not completely taken into account by the semi-empirical model, which evidences that the hypothesis H3 might be somehow inaccurate, that is, the vegetation index decrease does not fully characterize the photosynthesis reduction by water stress. However, the GPP in these areas is very low and therefore the absolute value of the uncertainty is low.
On the other hand, the underestimation in dense forest areas could be attributed to the NDVI saturation at high LAI levels. Generally, most vegetation indices approach a saturation level asymptotically in high biomass regions and for a certain range of LAI values [8]. The dynamic range and the saturation depend on the particular VI. We found the NDVI to asymptotically saturate in dense forest areas while the EVI remained sensitive to canopy variations (Figure 4b). Thus the model benefits from the wider dynamic range of EVI, slightly improving the GPP estimation over dense forests as previously shown in closed forest stands (for example in Sweden [52]). Despite this improvement, a very light, general underestimation of GPP in dense forest areas is still observed and it might be attributed to the fact that the leaf photosynthetic response to light is a non-linear process. Although the integration over time scales of two weeks or more tends to linearize the photosynthesis response curve to light [15], it seems that the non-linearity has not been completely removed. Boreal forest, much denser than Mediterranean forest, would be potentially more affected by this effect and thus no conclusions can be derived on the performing of our model in these areas.

Conclusions
The annual GPP has been estimated at 1-km spatial resolution from the MODIS standard VI products (NDVI and EVI) and the semi-empirical model, Equation (4), with C 1 and C 2 obtained, as a function of the vegetation type, as the coefficients of the linear regression between the reference annual GPP-calculated using Equation (2) with daily GPP data [4]-and the PAR-weighted average VI (Equation (7)).
The main advantage of the semi-empirical model to estimate annual GPP is that it does not require actual meteorological data, thus solving a problem that is often faced, especially in retrospective studies. The model exclusively relies on a vegetation index product commonly available-the MOD13A2 in this case-and on a representative PAR (calculated for a time period sufficiently long although that might not be the same as the period when GPP is estimated). The calibration/validation process, however, does require a reference GPP product and, obviously, the higher the quality of the reference GPP product, the better the performance of the semi-empirical model.
The performance of the model has been assessed in its ability to reproduce inter-annual variations of GPP in close agreement with those shown by the reference data. A total of eight essays have been performed using all the possible combination among the model driven by: NDVI or EVI, with or without vegetation map, and using as reference a previous GPP product generated by the authors [4] or the MODIS GPP product. The different statistics (Table 6) have shown a good agreement between the estimates and the reference data sets, with correlation coefficient between 0.84 and 0.91 and relative RMSE between 20% and 30% in all the cases, enhancing the robustness of the method and its applicability to other regions and potentially to other sensors with similar o lower spatial and temporal resolutions. However, special attention must be given to its applicability in semiarid areas, where the semi-empirical model overestimates the annual GPP, and in very dense forest areas, where the model slightly underestimates GPP. In the latter case, uncertainties are lower when using EVI instead of NDVI. Besides improving vegetation monitoring over high biomass conditions, EVI offers some other advantage: its performance does not depend on a vegetation map and hence this map is not a crucial requirement to apply the methodology.
With the above limitations, it is concluded that the proposed semi-empirical model provides an excellent compromise between simplicity and accuracy for the calculation of long time series of annual GPP.