Validating and Linking the GIMMS Leaf Area Index ( LAI 3 g ) with Environmental Controls in Tropical Africa

The recent Global Inventory Modeling and Mapping Studies (GIMMS) LAI3g product provides a 30-year global times-series of remotely sensed leaf area index (LAI), an essential variable in models of ecosystem process and productivity. In this study, we use a new dataset of field-based LAITrue to indirectly validate the GIMMS LAI3g product, LAIavhrr, in East Africa, comparing the distribution properties of LAIavhrr across biomes and environmental gradients with those properties derived for LAITrue. We show that the increase in LAI with vegetation height in natural biomes is captured by both LAIavhrr and LAITrue, but that LAIavhrr overestimates LAI for all biomes except shrubland and cropland. Non-linear responses of LAI to precipitation and moisture indices, whereby leaf area peaks at intermediate values and declines thereafter, are apparent in both LAITrue and LAIavhrr, although LAITrue reaches its maximum at lower values of the respective environmental OPEN ACCESS Remote Sens. 2014, 6 1974 driver. Socio-economic variables such as governance (protected areas) and population affect both LAI responses, although cause and effect are not always obvious: a positive relationship with human population pressure was detected, but shown to be an artefact of both LAI and human settlement covarying with precipitation. Despite these complexities, targeted field measurements, stratified according to both environmental and socio-economic gradients, could provide crucial data for improving satellite-derived LAI estimates, especially in the human-modified landscapes of tropical Africa.


Introduction
Leaf area index (LAI) is defined as one half of the total leaf surface area per unit ground surface area (projected on the local horizontal datum). It is a key biophysical vegetation property describing biome-specific canopy structure [1], and an essential variable in models of ecosystem processes and productivity [2,3], crop productivity [4] and hydrology [5]. Prescribing these models with accurate LAI parameters is, however, challenging due to the scarcity of landscape scale LAI measurements for most part of world's vegetated biomes. Data deficiencies are especially acute in tropical regions and across degraded woodlands [6,7].
Earth Observation (EO) data can be exploited to fill these data gaps [8,9]. LAI products derived from EO data describe variation in regional and global vegetation leaf area at 250 m to 6 km spatial resolutions and at 8-day to monthly temporal resolutions. Some are generated via inversion of physically-based models against observations of surface-leaving radiation [10,11]. Others are produced using empirical and semi-empirical relationships between LAI and spectral vegetation indices [12,13].
The Global Inventory Modeling and Mapping Studies (GIMMS) LAI3g product was derived from the Advanced Very High Resolution Radiometers (AVHRR) sensors. It provides one of the longest LAI time-series currently available for canopy structure trend assessments [8], spanning the period from July 1981 to December 2011 at 15-day intervals and at 1/12 degree spatial resolution [14]. Applications to date include the assessment of latitudinal vegetation growth trends [15] and trends in vegetation greenness for pastures [16]. The GIMMS LAI3g product was generated using a feed-forward neural network, using AVHRR third generation normalized difference vegetation index (NDVI3g) between 1981 and 2011. The neural network itself was trained through back-propagation using MODIS LAI between 2001 and 2009, and GIMMS data over the same period [14]. Validation of the GIMMS LAI3g product was carried out using field data from just 29 sites, predominantly in northern latitude cropland, grassland and boreal forest biomes. There have been no validation studies of GIMMS LAI3g in tropical biomes.
Here, we use a new dataset of ground-measured LAI (hereafter, LAI True ) [17] to validate the GIMMS LAI3g product (hereafter, LAI avhrr ) in East Africa. We compare biome-specific LAI values, and test whether both LAI True and LAI avhrr capture similar trends in environment-LAI response. In so doing, we characterize LAI within and between East African biomes, and quantify spatial variation in response to climate, topography and disturbance.

Study Region
Natural vegetation in East Africa (here defined as Tanzania, Kenya and Ethiopia) ranges from very open to very dense shrub (plants ≤ 2 m tall), deciduous bush of varying tree densities (plants 2-7 m tall), deciduous broad-leaved woodlands (plants > 7 m tall) of varying species compositions (e.g., Miombo in Tanzania and Acacia woodlands in Kenya and Ethiopia) and broadleaved semi-deciduous and evergreen forests (hereafter referred to as forests). As elsewhere in the tropics, these natural biomes are increasingly being degraded and converted to other land uses, particularly plantation and cropland. The primary drivers of land use change are high population growth and associated demand for land and resources, nationally and internationally [18][19][20]. Pressure is higher near centers of demand, such as major towns, and in areas that are easier to access, such at low-to mid-altitudes, on flatter slopes, outside protected areas and close to roads [19,21]. Consequently, the majority of forest that remains is in mountainous regions, with villages often clustered around the foothills [18]. Drier, fire-prone woodlands and xeric low to high density bushlands dominate at lower altitudes, and are more susceptible to small-scale resource extraction (e.g., for charcoal production) and pressure from livestock grazing [7].

Ground-Measurements of LAI
We measured LAI True at 274 sites across East Africa during ten field work campaigns, carried out between 2007 and 2012 ( Figure 1, Table 1) following standard protocols [7,22]. At each site, we took high-resolution images through digital cameras equipped with hemispherical (fish-eye) lenses. The cameras were mounted on tripods at 1 m above ground, looking vertically upward from beneath the canopy [23]. The -levelled‖ hemispherical photographs were acquired normal to a local horizontal datum, orienting the optical axis of the lens to local zenith [24]. We measured under overcast conditions whenever possible to minimize anisotropy of the sky radiance [23]. On average, 19 images (median, 13) were taken at each site, distributed within plots according to VALERI design [25] (VAlidation of Land European Remote Sensing Instruments) or along linear north-south transects if field conditions required.
Hemispherical images were pre-processed by first extracting blue-channel pixel brightness values and then applying a threshold algorithm for separating sky from vegetation [26]. Resultant binary images were analyzed using the free canopy analysis software CAN-EYE V6.3.8 [27]. For each site, we derived LAI corrected for foliage element clumping [28], limiting the field of view of the lens to values between 0° and 60° to avoid mixed pixels.

Validation of GIMMS LAI3g
Direct validation of the LAI avhrr product with LAI True was not possible, primarily because of the mismatch in spatial resolution between field data and AVHRR data [29]: each LAI avhrr pixel is likely to be a composite of different biomes, depending on the heterogeneity of the landscape. Instead, we indirectly validated LAI avhrr by comparing its distribution properties across biomes and environmental gradients with those properties for LAI True . We thereby assume that the coarse resolution product is able to capture LAI variation and its underlying drivers at landscape scales. First, we grouped LAI avhrr pixels belonging to a particular biome using MODIS land cover products for the years 2007 to 2011 (MCD12Q1v51 [30]), i.e., the time period of our field data, excluding those pixels that switched biome type during this time. We computed LAI avhrr as the average across the dry season months (December through February, June through August). Biomes of the MODIS land cover product were matched to biome types recorded in the field: evergreen broadleaf forest (=forest), deciduous broadleaf forest and woody savannas (=woodland), savannas (=bushland), closed and open shrublands (=shrubland), croplands and cropland/natural vegetation mosaic (=cropland), evergreen and deciduous needle-leaf forest (=plantation).
The geographic extent was set to the borders of Tanzania, Kenya and the southern part of Ethiopia, to match the regions sampled in the field. We created artificial -plots‖ using a random points generator, limiting the number of points within each biome to 1000. We specified a minimum distance of 8 km between points to minimise spatial autocorrelation and to exceed the pixel resolution of LAI avhrr maps. This algorithm created 80 plots in forest, 624 in woodland, 705 in bush, 995 in shrub and 294 in crop. Plantations are not well delimited within the MODIS land cover product and so the spatial extent of this biome was too low to generate a sufficient number of plots. For plantation only, we therefore increased the geographic extent to the west, thereby increasing the number of plantation -plots‖ to 11.
In a first step, we compared distribution properties of LAI avhrr and LAI True for each biome. Secondly, we used generalised additive and general linear effects models to parameterise the responses of LAI avhrr and LAI True to environmental gradients. Statistical analyses were carried out using R statistical software package [31].

LAI Response to Environmental Gradients
Rainfall and temperature estimates were extracted using WORLDCLIM interpolated climatology [32]. From these, we derived five weakly inter-correlated predictor variables (Table 2): mean annual rainfall (PPT), potential evapotranspiration (PET) [33] and an associated moisture index (MI = PPT/PET), precipitation of the driest quarter (PPT_DQ) and an associated moisture index for the driest quarter (MI_DQ), and annual temperature range (TE_R). For topographic predictors (elevation, ELEVATION, and slope, SLOPE), we used data from the 90 m spatial resolution Shuttle Radar Topographic Mission [34].
As a surrogate for the effects of natural grazing pressures on LAI, we used maps of herbivore browser richness (HERBIVORES) derived from International Union for Conservation of Nature range maps [35]. To quantify the effects of human pressures, we recorded whether a sampling site was located within or outside a protected area using the World Database on Protected Areas [36], calculated the (Euclidean) distance to roads (ROADS) and towns (TOWNS) using AFRICOVER data [37] and Open Street Map projects), and used maps of human population density (AFRIPOP [38,39]) to generate population pressure grids at 1 km spatial resolution (POPPRESS [40]). Pressure on a location i increases linearly with number of people p in a remote location j, such that , where N is the number of locations across which the pressure accumulates.
The weight w exerted by a remote population decreases exponentially with distance d, according to a half-normal distribution: . Increasing values of ơ increase the weight given to relatively distant populations. In exploratory analyses, = 5 yielded the strongest correlation with LAI (testing ơ up to 25) and was used in all subsequent analyses. The sample size (number of artificial plots) used for LAI avhrr was matched to the number of LAI True plots measured for the respective biome in the field, so that statistical power was similar for LAI avhrr and LAI True . LAI avhrr was subsetted for each biome (except forests and plantations) by randomly choosing samples 100 times. We then computed distance correlation statistics, DisR, using the dcor function (R package -energy‖) to test for statistical dependence between LAI True or LAI avhrr and each environmental predictor. We used the mean value of DisR across the 100 subsets of data. These analyses were repeated across and within biomes to account for potential biome-specific constraints masking relationships between biophysical structure and environmental drivers.
For subsequent analyses with general additive and general linear models, LAI True and LAI avhrr were log-transformed to meet assumptions of normality. We included latitude and longitude as predictors to test for landscape-scale spatial autocorrelation. Models were fitted using Maximum Likelihood. Automated model selection was carried out using information theoretic approaches and multi-model averaging [41]. First, we constructed a global model including all predictors described above, except for moisture indices and PPD_DQ which were highly correlated with precipitation ( Table 2). We then used the dredge function (R Package MuMIn v1.9.13), which constructs models using all possible combinations of the explanatory variables supplied in the global model. These models were ranked, relative to the -best‖ model, based on the change in Akaike Information Criterion (delta AIC). A multi-model average was calculated across all models with delta AIC < 2.

LAI Distribution within and between Biomes
LAI True and LAI avhrr are not drawn from the same population, differing significantly in their medians, spread and shape (Mann-Whitney-Wilcox test: p < 0.001). LAI True increased with vegetation height, from shrubland and bushland (similar) to woodland and then to forest (pairwise Wilcoxon tests, Bonferroni adjusted, p < 0.008; Table 3). Plantation had significantly lower LAI True compared to forest, but significantly higher LAI True compared to shrubland and bushland ( Figure 2). LAI avhrr increased significantly with from shrubland to bush to woodland to forest (Table 3, Figure 2). Forest also had significantly higher LAI avhrr compared to plantation and cropland; which had significantly lower LAI avhrr than plantation. When comparing for each biome separately, LAI avhrr was significantly higher than LAI True for forest, bushland, woodland and plantation (p < 0.001; Kruskal-Wallis tests), whilst being similar for cropland and shrubland.

Distribution of LAI Response to Environmental Variation
Smoothed response curves of LAI True and LAI avhrr versus PPT and MI were similar, showing an increase of LAI to global maxima, followed by a decrease at higher values of the environmental driver ( Figure 3). Adhering to an a priori functional form, responses of LAI True and LAI avhrr to PPT and MI were better captured by gamma distributions (Residual Standard Error, LAI True : RSSPPT = 1.574, RSSMI = 0.584; LAI avhrr : RSSMI = 0.584, RSSPPT = 0.567) as opposed to logistic functions (LAI True : RSSPPT = 1.609, RSSMI =1.18; LAI avhrr : did not converge) or linear models (LAI True : RSSPPT = 1.676, RSSMI = 1.238; LAI avhrr : RSSMI = 0.600 and RSSPPT = 0.573).
Coefficients for parameters in the gamma fits (nls function R package -stats‖) differed significantly between LAI estimates (Figure 3): the LAI True response reached its maximum at lower values of the respective environmental driver and declines more strongly thereafter. LAI True increased to 2.23 at MI = 0.76 mm and to 3.19 at PPT = 1400 mm decreasing thereafter, while LAI avhrr increased to 3.13 at MI = 1.58 and to 3.48 at PPT = 2700 mm decreasing thereafter.
Distance correlation statistics imply that both LAI True and LAI avhrr are not completely independent of any of the environmental drivers tested ( Table 4). The strength of the correlation between LAI True and LAI avhrr and environmental drivers differs among biomes. Changes in the correlation strength between LAI and PPT, MI or HERBIVORES followed similar trends for LAI True and LAI avhrr (Table 4).
Yet, complex patterns emerged for other environmental drivers and field data showed more pronounced signals in responses to TOWNS, SLOPE and POPPRESS at biome level. PPT, POPPRESS and their interaction were all significant factors contributing to observed variability of LAI True and LAI avhrr in general additive and general linear models (Table 5). Socio-economic variables contributed similarly to spatial variability in both LAI True and LAI avhrr (Table 5). ROADS and TOWNS were significantly contributing to variability in LAI True and LAI avhrr for both model types. LAI was higher within protected areas compared to unprotected land, and increased with SLOPE. However, general linear models identified fewer environmental predictors as contributing significantly to LAI avhrr variability compared to general additive models, suggesting a dominance of nonlinear relationships between environmental variables and LAI (Table 5). In (c,d) gamma fits (y = A × (x (k − 1) × exp(−x/s))/(s k × gamma(k))) are displayed using a natural logarithmic for the y-scale. LAI True response reaches its maximum at lower values of the respective environmental driver compared to LAI avhrr response and declines more strongly thereafter. Table 4. Response of satellite-derived LAI avhrr (EO) and field-derived LAI True (FI) to environmental gradients in single-predictor models, according to distance correlations (0 < DisR < 1, where DisR~0 indicates independence). Highlights indicate that DisR for LAI True and LAI avhrr differed by more than 0.1 (green) or by more than 0.2 (purple), in which cases we assume that the relationship between LAI True and environmental predictor is different from the relationship between LAI avhrr and the environmental predictor.  We suspect the observed increase of LAI with increasing POPPRESS to be an indirect effect resulting from the increase of POPPRESS with PPT: i.e., P (LAI|PPT ∩ POPPRESS) similar to P (LAI|PPT). To visualize this conditional independence, we performed a triangulation-based natural neighbor interpolation of the 2D scattered data of LAI at PPT × POPPRESS locations using the Matlab function TriScatteredInterp.

PPT
This creates an interpolant that fits a surface of the form LAI = f (PPT, POPPRESS) which passes through the sample values at the point locations. We interpolated LAI on a regular grid of PPT × POPPRESS, which enabled us to look at variations of LAI with one variable whilst fixing the other one. Flat slopes in the rainfall isolines suggested that LAI given POPPRESS and PPT is indeed nearly the same as LAI given PPT only, although the curves indicate that LAI given high PPT is lower for high POPPRESS especially in the case of LAI avhrr (Figure 4). We also suspect the negative relationship between distance to town and LAI (Table 5) to be an indirect effect resulting from a saturation of LAI at higher rainfall, where many towns are located. Flat slopes of PPT isolines in response to TOWNS indicate conditional independence of LAI in response to TOWNS. . Population pressure (POPPRESS) and distance to town (TOWN) isolines of LAI in response to increasing annual rainfall (PPT) for LAI True and LAI avhrr . (a,b) For a given level of rainfall, LAI does not depend on the level of POPPRESS. However, in high rainfall regions, high levels of POPPRESS have a noticeable negative impact on LAI, which is especially apparent in the LAI avhrr response. (c,d) Isolines suggest conditional independence of LAI and TOWN. However, there are no data with the combination of high TOWN and high rainfall, thus conclusions in the range of these parameter combinations are highly uncertain. Table 5. Relative Variable Importance (RVI) in general linear and general additive models (GLM and GAM) of field-and satellite-derived LAI in response to environmental predictors. RVI is calculated for each predictor by summing the AIC weights across all models where the variable occurs (models of interest are determined by delta AIC < 2; see main text). Because of high correlation between moisture indices and precipitation (Table 2), we excluded the former from global models. GLMs are first order unless superscripted (a) in which case second order. Smooth terms in GAMs were allowed two effective degrees of freedom. For second order GLMs, signs for coefficients are given for first (P1) and second-order polynomial (P2).

Discussion
The ultimate objective of linking EO data to biophysical land surface attributes (e.g., vegetation leaf area, biomass and productivity) is to characterize those attributes on large spatial scales and over time with minimal need for further fieldwork [42], so that the logistical, financial and subjective sampling constraints of fieldwork can be overcome. This requires that EO products, such as GIMMS LAI3g, accurately reflect conditions on the ground, both in absolute values and in their variation across space and time [43,44]. In this study, the increase in LAI with vegetation height for natural biomes (from shrubland/bushland to woodland and forest) is captured by both LAI avhrr and LAI True . However, we find that LAI avhrr significantly overestimates LAI relative to LAI True for all biomes except for shrubland and cropland. The coarse-spatial resolution of LAI avhrr may cause a bias towards the global mean of the region. Yet in East Africa, one would expect such a bias to result in lower LAI compared with estimates derived at high spatial resolution such as field-based LAI True , because these landscapes are highly heterogeneous, and increasingly modified towards low LAI biomes such as degraded woodlands and crops. Moreover, LAI avhrr extracted from large tracts of homogeneous forest in Eastern Congo was found to be similar to LAI avhrr extracted for forests in our study area, again suggesting a consistent overestimation in the GIMMS LAI3g product.
In tropical Africa, vegetation phenology is particularly tuned to precipitation and its seasonal shifts [45,46], which may be impacted by rising global temperatures [47]. Rainfall triggers the emergence of green leaves in deciduous biomes, and controls vegetative growth and growth-duration in semi-arid and arid environments [48]. Vegetation indices such as NDVI, as a surrogate of vegetation productivity and phenology, have been linked to rainfall and rainfall and rainfall variability [49] in East Africa via a log-linear relationship. Nonlinear responses of LAI to PPT (and derived MI) in our study area are captured by both LAI avhrr and LAI True by the same functional relationship. LAI increases with water availability up to a maximum, declining at higher values of annual rainfall and moisture. However, the response curves differ significantly, in that field-derived LAI reaches its maximum at lower values of the respective environmental driver, and exhibits a more rapid rate of change.
Further to climatic constraints, our analyses show that socio-economic factors play an important role in observed vegetation structure. LAI was found to be higher with increasing terrain steepness and within protected areas. These findings add evidence, based on biophysical structure, to previous studies showing how inaccessibility functions as a passive protection for woody biomes [18,19]. Our results are especially relevant for environmental management concerned with the maintenance of ecosystem processes and function, which are linked to biophysical properties such as LAI. Curiously, LAI was also found to be higher in regions under greater population pressure. Population pressure is greatest in areas of high rainfall, which are suitable for both crop production and high-LAI forest (of which there is less). Further, associated resource demands on adjacent woodlands and forests mean that, in many areas, these biomes have degraded to bushland and shrubland, which may exhibit higher biome-specific LAI than rainfall-limited bushland and shrubland elsewhere. Thus the observed pattern may reflect historical land use impacts rather than a positive causal relationship between human population pressure and vegetation leaf area. A visual comparison of LAI rainfall isolines along a gradient of increasing population pressure shows primarily flat slopes, supporting the above hypothesis. This pattern is especially apparent in the LAI avhrr response and, if at all, there seems to be a negative effect of increasing population pressure on LAI in regions with high rainfall.
Validating coarse-scale spatial resolution data with field measurements is inherently difficult due to challenges of spatial mismatch [50] and differing spatial scales [51]. LAI distributions inferred from field measurement and satellite retrieval should ideally converge to the true intrinsic distribution of the vegetation class in a given region at a given time [52]. Whilst our findings are encouraging regarding the capacity of EO data to capture spatial variation in LAI along major environmental gradients, they also highlight the need for further field assessments of inter-and intra-annual LAI dynamics, especially in remote woody biomes, in croplands and in plantations. An important issue, arising from our use of MODIS land cover products in the generation of biome-specific random plots, is that plantations are difficult to distinguish from natural forests, especially if consisting of broadleaved evergreen trees. We find that very few plantations are identified by the MODIS product, despite large areas of forests being converted to plantations in both Kenya and Ethiopia. Similarly, spectral similarities complicate the distinction between -woody savanna‖ and savanna, as well as between grassland and cropland [30], particularly when the latter are rotated in seasonal succession rather than being left fallow. It is encouraging, however, that despite these uncertainties in biome-specific LAI avhrr estimates, the increase in LAI with vegetation height is captured by LAI avhrr and LAI True similarly. Furthermore, general additive models identified the same set of environmental predictors having similar impacts on both LAI estimates.

Conclusions
Remotely sensed, global-scale estimates of biophysical structure, such as LAI avhrr effectively capture spatial variation in LAI along major environmental gradients, but some challenges remain. First, LAI avhrr appears to consistently overestimate LAI in woody biomes such as forest. Second, nonlinear responses of LAI to water availability, while captured by both LAI avhrr and LAI True similarly, differ significantly in their parameterization, such that the remotely sensed product reaches its maximum at higher values of rainfall and moisture index than field-derived LAI True suggests may be the case. LAI True increased to 2.23 at MI = 0.76 mm and to 3.19 at PPT = 1400 mm decreasing thereafter, while LAI avhrr increased to 3.13 at MI = 1.58 and to 3.48 at PPT = 2700 mm decreasing thereafter. Third, distance correlation statistics show significant relationships with all environmental drivers tested for both LAI True and LAI avhrr , although the strength of that correlation varies between LAI True and LAI avhrr responses.
More generally, we find that the same set of environmental drivers emerges as significant in models of LAI variability. Beside rainfall, temperature and topography, socio-economic correlates such as population pressure, distance to roads, distance to towns and protection status are important for understanding spatial variation of vegetation biophysical structure in the human-modified landscapes of tropical Africa. The responses to these drivers are largely consistent for LAI avhrr and LAI True except for their responses to potential evapotranspiration. Note though, that protection status, rainfall, temperature range and topography were not significant in general linear models of LAI avhrr variability despite being significant in general additive models suggesting complex underlying relationships.
In particular, higher forest LAI observed within the protected area network suggests a potential mechanism for monitoring efforts to reduce forest degradation (e.g., for carbon conservation or catchment protection). Given the above challenges, targeted field measurements, stratified according to both environmental and socio-economic gradients, will be needed to improve the accuracy of satellite-derived estimates.

Author Contributions
Marion Pfeifer and Philip Platts conceived the study, collected field data, performed the analyses and prepared the manuscript. Veronique Lefebvre contributed to statistical analyses and interpretation of results. Petri Pellikka, Rob Marchant and Alemu Gonsamo contributed field data and helped with manuscript revisions. Dereje Denu contributed data from the Jimma Highlands in Ethiopia.