Combined Spatial and Temporal Effects of Environmental Controls on Long-Term Monthly NDVI in the Southern Africa Savanna

Deconstructing the drivers of large-scale vegetation change is critical to predicting and managing projected climate and land use changes that will affect regional vegetation cover in degraded or threated ecosystems. We investigate the shared dynamics of spatially variable vegetation across three large watersheds in the southern Africa savanna. Dynamic Factor Analysis (DFA), a multivariate time-series dimension reduction technique, was used to identify the most important physical drivers of regional vegetation change. We first evaluated the Advanced Very High Resolution Radiometer (AVHRR)vs. the Moderate Resolution Imaging Spectroradiometer (MODIS)-based Normalized Difference Vegetation Index (NDVI) datasets across their overlapping period (2001–2010). NDVI follows a general pattern of cyclic seasonal variation, with distinct spatio-temporal patterns across physio-geographic regions. Both NDVI products produced similar DFA models, although MODIS was simulated better. Soil moisture and precipitation controlled NDVI for mean annual precipitation (MAP) < 750 mm, and above this, evaporation and mean temperature dominated. A second DFA with the full AVHRR (1982–2010) data found that for OPEN ACCESS Remote Sens. 2013, 5 6514 MAP < 750 mm, soil moisture and actual evapotranspiration control NDVI dynamics, followed by mean and maximum temperatures. Above 950 mm, actual evapotranspiration and precipitation dominate. The quantification of the combined spatio-temporal environmental drivers of NDVI expands our ability to understand landscape level changes in vegetation evaluated through remote sensing and improves the basis for the management of vulnerable regions, like the southern Africa savannas.


Introduction
In southern Africa, long-term changes in the savanna ecosystem productivity and structure are believed to be driven by a combination of biotic (including human) and abiotic drivers [1][2][3][4], potentially representing irreversible landscape degradation [5].Changes labeled "degradation" in the southern African savanna [6] are usually referring to the process of shrub encroachment (specifically, a shift from grass-and tree-dominated landscapes to less biologically productive landscapes dominated by scrub), and has been well documented throughout southern Africa [7][8][9].Given this trend of potential degradation, understanding the temporal and spatial changes in vegetation dynamics and then identifying the dominant drivers responsible for these documented vegetation transitions is critically important for management, particularly in the context of increasing current climate variability and the increased future variability and climate change predicted for this region [10].
The critical importance of water is well recognized across savanna vegetation types (from open grassland savanna to densely wooded savanna) [11,12], although its role differs across vegetation types and biomes [13].These studies have also demonstrated that fire regime, soils, herbivory, nutrients and land use also contribute to local patterns of ecosystem structure.However, a long-term simultaneous temporal and spatial analysis of the relative importance of the coupled relationship among environmental covariates driving savanna vegetation has not yet been performed at a monthly time-step using newly available 30+ year time series data.Thus, simultaneously identifying these spatial and temporal patterns of multiple, dynamic explanatory variables of importance, while accounting for unexplained, but shared, temporally varying trends on this longer time scale, is necessary to help improve our understanding of the controlling factors of vegetation response.
The emergence and development of satellite remote sensing has provided an increasingly powerful instrument to monitor, observe and characterize landscapes, since it offers repeated data of large terrestrial areas at longer temporal scales, now from 1982 to 2010, which is a significantly long period of time to really begin to understand the drivers and changes to these biophysical systems [14][15][16][17][18][19][20].Satellite-derived vegetation indices, such as the Enhanced Vegetation Index (EVI) and the Normalized Difference Vegetation Index (NDVI), have been linked with leaf area index, fraction of green vegetation and vegetation primary production [13,21].Vegetation growth and biomass is impacted by multiple factors, including fire, herbivory, competition, soil properties and climate [22].Among these factors, climate change effects on seasonal activity in terrestrial ecosystems are significant and well documented, especially in the middle and higher latitudes [23][24][25][26].In arid and semiarid ecosystems and seasonally dry tropical climates where water is limited, vegetation growth is primarily constrained by water availability [14][15][16][17]27,28].Rainfall stimulates green-up onset and determines the duration of growth and flowering of some desert plants, and its distribution in the rainy season is critical to vegetation germination, growth and biomass [29].Tropical and sub-tropical regions of southern Africa are dominated by savannas, which are simply defined as the coexistence of trees and grasses.Savanna is considered a relatively productive ecosystem, with the entire African savanna biome accounting for 13.6% of the global net primary production (NPP) [30].
Understanding variations in the growing season that impact vegetation productivity is critical for the management of this particular ecosystem in which woody cover depends largely on mean annual precipitation (MAP) [2,3], but in which other factors, such as herbivory, fire and human activities (mainly grazing activities), can also exert a significant influence that is highly variable in space and time [3,31].
Time series approaches to continuous Earth observation-based variables are being adopted by many scholars to advance the understanding of inter-and intra-annual variations in vegetation and to examine and derive relationships between drivers of change and vegetation growth [2,3,[32][33][34][35][36][37][38][39][40].Application of wavelet analysis [41], Fourier analysis [42,43] and other transformations [44,45] all show promise in elucidating patterns of land cover variation, but rarely are multiple, spatiotemporally variable drivers of landscape change assessed simultaneously.To address this challenge, we will apply Dynamic Factor Analysis (DFA), a powerful multivariate times series dimension reduction technique, to investigate the dynamics of vegetation coverage across three large watersheds in southern Africa and two very different time scales, to identify the most important physical drivers of vegetation cover in the region.DFA was initially applied to economic time series [46][47][48][49] and was later extended to include explanatory variables in ecological systems.Because of its inherent advantages (efficiency, explanatory power, suitable for non-stationary data), it has since been applied successfully across a wide range of environmental systems, from the dynamics of squid populations [50] to variations in groundwater levels and quality [51][52][53][54], from soil moisture dynamics [55][56][57], to air quality [58] and, more recently, to link vegetation and climate dynamics [59][60][61].
In this study, we apply DFA to investigate the spatial and temporal interactions between NDVI and a collection of environmental covariates across the Kwando, Okavango and the upper Zambezi watersheds, which represent a dynamic savanna landscape in southern Africa.The specific objectives of this study were to (1) compare and validate a DFA NDVI model based on newly released Advanced Very High Resolution Radiometer (AVHRR) Global Inventory Monitoring and Modeling System third generation (GIMMS3g) data against an existing model based on the Moderate Resolution Imaging Spectroradiometer MODIS 10-year data and (2) develop a long-term model based on the 30+ years of AVHRR GIMMS3g NDVI data, to identify the driving factors that most fully explain the observed variation in the NDVI time series and how they vary across biophysical gradients.Part I of the paper will address a comparison between an adaptation of an existing decadal scale model, which utilizes MODIS NDVI data [61], and the newly available GIMMS3g NDVI dataset for their overlapping 2001-2010 time period.Part II will extend this validated (from Part I) AVHRR dataset to the full longterm period from 1982 to 2010 to determine the dominant physical drivers of vegetation growth across this four-nation southern Africa study region.

Study Area
The Okavango, Kwando and upper Zambezi catchments cover approximately 681,545 km 2 in tropical and sub-tropical southern Africa (Zambia, Angola, Namibia and Botswana) (Figure 1).Mean annual precipitation (MAP) ranges from under 400 mm to 1,400 mm•yr −1 and is strongly correlated with latitude and elevation, with the highest rainfall in the mountainous north (Figure 1).This gradient straddles a previously noted [2] critical threshold of 650 mm, below which precipitation is believed to dominate savanna vegetation patterns and above which other factors, such as fire and herbivory, are hypothesized to play an important role.Relatively low human population in this region [62] reduces the effect of land use changes associated with major roads and settlements and facilitates identification of the explicit effects of climate on vegetation.The cattle stocking rate in this region is very low, with typical values lower than 10 head per km 2 [63].Within our study area, the southern semi-arid areas are characterized by low MAP and high interannual variability, and both the magnitude and relative reliability of MAP increases as we move towards the north [64].Soils are primarily oxisols at higher elevations in the north and entisols in the south [65].In particular, Kalahari sands characterize the majority of the area.Low topography in the south, especially in Caprivi (Namibia and northern Botswana), makes clear hydrologic separation of the catchments difficult.Inter-basin water historically flowed eastward, but these systems have not connected consistently since the late 1970s.

NDVI
MODIS NDVI data (MOD13A3) were used in Part I of the analysis, along with the AVHRR GIMMS3g NDVI data product.MODIS provides monthly NDVI data at a 1-km spatial resolution in the sinusoidal projection.This product is generated using the 16-day 1-km MODIS VI output (created from between a maximum of 64 and a minimum of 1 observations, with 4 measures possible per day at the equator), temporally aggregated using a weighted average to create a calendar-month composite.Grids contaminated by clouds and those with average growing season NDVI less than 0.1 were excluded from analysis.The latest version of the Global Inventory Monitoring and Modeling System (GIMMS) NDVI dataset, termed NDVI3g (third generation GIMMS NDVI from AVHRR sensors), was used in both Part I and Part II of our study.It was generated from the AVHRR onboard a series of National Oceanic and Atmospheric Administration (NOAA) satellites (NOAA 7,9,11,14,16,17 and 18) and with the goal of improving data quality in the northerly lands, where the growing seasons are short [1].This dataset was previously developed by the Global Inventory Modeling and Mapping Studies (GIMMS) group, and was generated in the framework of the Global Inventory Monitoring and Modeling System (GIMMS) project at the National Aeronautics and Space Administration (NASA) Goddard Space Flight Center.No atmospheric correction is applied to the GIMMS data, except for volcanic stratospheric aerosol periods (1982-1984 and 1991-1994) [2].A satellite orbital drift correction is performed using the empirical mode decomposition/reconstruction (EMD) method, minimizing the effects of orbital drift by removing common trends between time series of solar zenith angle (SZA) and NDVI [3].Additionally, other deleterious issues, such as calibration loss and sensor degradation, were addressed [4].The maximum NDVI value over a 15-day period is used to represent each 15-day interval, and this compositing scheme results in two maximum values of NDVI per month [1].The entire available NDVI3g dataset spans from July 1981 to December 2011, with a spatial resolution of about 8 × 8 km.We aggregated the data into monthly observations using the maximum value composition, which further reduces cloud and other noise effects, and finally, we got the monthly NDVI for the entire study area from 1982 to 2010.

Climate Data
In this study, we utilized the Matsuura and Willmott datasets of monthly precipitation  and monthly mean air temperature , which have a spatial resolution of 0.5 degrees by 0.5 degrees, with grid nodes centered on 0.25 degree (Table 1).These datasets improve upon a previous global mean monthly datasets with a refined Shepard interpolation algorithm and an increased number of neighboring station points included in the analysis [66].Based on the grid nodes included in the three catchments, we interpolated the datasets into continuous surfaces with 1-km spatial resolution using inverse distance weighted interpolation.
Global annual water budget data is available from the Willmott and Matsuura Global Climate Resource page [67].The time series water budget data are available for seven variables: actual evapotranspiration, potential evapotranspiration, snowmelt, surplus, deficit, mid-monthly soil moisture and mid-monthly snow cover.This time series data is available at monthly time steps from 1900 to 2010; in this study, the actual evapotranspiration data was used for the timeframe of 1982 to 2010.Actual evapotranspiration (Ea) is the amount of water delivered to the atmosphere by evaporation and transpiration.It is determined by complex interactions of plant, soil and climatic variables, which, given the high spatial heterogeneity in savanna landscapes, makes such a measure influential.
Potential evapotranspiration (E) impacts the water availability of vegetation and, therefore, can be an important factor influencing savanna ecosystems.The National Center for Environmental Prediction-Department of Energy (NCEP-DOE) Reanalysis II global potential evaporative rate data from 1982 to 2010 were used to represent the environmental demand for evapotranspiration (Table 1, [67][68][69]).NCEP-DOE Reanalysis II provides monthly data at about a 2.5-degree spatial resolution in the Network Common Data Form (NetCDF) format.Since the Gaussian grids are irregular, we converted NetCDF to feature layers by month using the "Make NetCDF Feature Layer" tool in ArcGIS software (ESRI, Readlands, CA, USA) and interpolated to continuous surfaces with a 1-km spatial resolution using an inverse distance interpolation method.Soil moisture (S) is an integrated factor that exerts dominant control on the spatial distribution of trees, shrubs and grasses.The Climate Prediction Center (CPC) global monthly soil moisture data from 1982 to 2010 were used, which provides monthly values at a 0.5-degree spatial resolution with grid nodes centered on 0.25 degrees (Table 1).This dataset is based on the land model, a one-layer "bucket" water balance model, which uses CPC monthly global precipitation data over land and monthly global temperature for global reanalysis as the input fields [70].We interpolated the datasets into continuous surfaces with a 1-km spatial resolution using inverse distance weighted interpolation.
Vapor pressure deficit (V), defined as the difference between the saturation and actual vapor pressure, is an accurate indicator of the actual evaporative capacity of the air.It is an integrative climate variable, which can impact water-sensitive savanna systems.We calculated the monthly V data from 1982 to 2010 from NCEP-DOE Reanalysis II monthly maximum temperature, monthly minimum temperature and monthly mean relative humidity.The calculation procedures for monthly V are shown as follows.The saturation vapor pressure can be calculated from air temperature, which is expressed by: (1) where e 0 (T) is saturation vapor pressure at the air temperature, T (kPa), T is air temperature (°C) and exp is natural logarithm.Due to the non-linearity of the above equation, the mean saturation vapor pressure of a month should be calculated as the mean between the saturation vapor pressure at the mean maximum and minimum air temperatures for that month: where e s is the mean saturation vapor pressure.The actual vapor pressure can be calculated as: where RH mean is the mean relative humidity and e a is the mean actual vapor pressure.The mean V of a month can be computed as: Likewise, the derived V gridded layers at a 1-km spatial resolution were used to extract monthly mean V values of each polygon defined by different precipitation intervals.
The Monthly Southern Oscillation Index (O) data was obtained from the Australian Government Bureau of Meteorology [69].This index gives an indication of the development and intensity of El Nino Southern Oscillation (ENSO) events in the Pacific Ocean at a monthly time-step.The difference in pressure between Tahiti and Darwin is used to calculate O, where significant negative values (<−8) indicate El Niño events and positive values (>8) indicate La Niña events.Such measures are important to models of savanna land cover, as ENSO events can impact the precipitation and wind patterns across southern Africa.

Conceptual Basis for Analysis
Monthly NDVI data are used as the indicator of vegetation dynamics and change and as the response (dependent) variable in the analysis.A suite of climate variables (Table 1) were used as candidate explanatory (independent) variables (CEVs) in the analysis: precipitation (P), mean temperature (T), minimum temperature (m), maximum temperature (M), soil moisture (S), vapor pressure deficit (V), actual evapotranspiration (Ea), potential evapotranspiration (E) and the Southern Oscillation Index (O).These variables were selected, due to their presence in the literature as potential drivers of NDVI or vegetation growth and also for being readily available at the necessary spatial and temporal resolutions for this study.Due to many climate data being used in the calculations of other variables (e.g., temperature and precipitation in evapotranspiration measures, etc.), additional analyses had to be undertaken to remove some potentially overlapping variables, and this also inherently restricted the datasets available for use.Herbivory and fire are additional variables that the literature discusses as having potential impacts in savanna landscapes [2,3,31].Although monthly time series data of herbivory were not available for the study region, the very low cattle stocking rate (<1-10 head/km 2 , compared to >250 head/km 2 in other African savanna regions) [63] and low density of wildlife management areas suggest that herbivory effect is likely minimal at the more extensive polygon level used in this analysis.Savannas are the most frequently burnt ecosystems, and fire is regarded as the dominant process preventing savanna trees from achieving their resource-driven potential [3,11].Since the AVHRR GIMMS project does not contain long-term monthly fire data and it cannot be obtained elsewhere at an appropriate scale, these will not be model inputs, but rather, will be represented by the unexplained variance in the DFA model (common trends) as explained below.
Time series of explanatory and response variables were then aggregated up from a pixel-scale by extracting the mean values over specific polygon regions as defined by 50-mm precipitation intervals.This was done across the landscape subdivided for each of the drainage basins and resulted in the production of 48 individual data polygons (Figure 1).While the dominant precipitation gradient across the study area is from the north (higher elevation) to the south (lower elevation), there are also drivers (topography, oceanic influence, etc.) from east to west.As such, the use of three drainage basins may help to pick up differences in these drivers across the east-west gradient.Ultimately, 48 individual polygons were created based on MAP polygons and drainage basin locations shown in Figure 1, yielding an initial set of 433 time series: 1 response variable, 8 CEVs in each of the 48 polygons and one CEV for the whole domain (O).

Dynamic Factor Analysis (DFA)
DFA is a statistical methodology designed to explore common patterns evident in the relationships and interactions between response and explanatory time series.Therefore, a priori knowledge of the dynamics amid response (NDVI) and explanatory (e.g., precipitation, fire, etc.) variables is not needed to successfully apply this tool [53].Unlike other time series methods, DFA is capable of modeling relatively short, incomplete, non-stationary time series [71].Intrinsically a structural time series technique [48], DFA provides the means to model temporal variation in observed data (response variables) as linear combinations of common trends, explanatory variables, a constant intercept and noise [48,49,71].Together, these parameters represent the unexplained variability (common trends) and explained variability (explanatory variables) to create the following model: where S n (t) is a vector comprised of the set of N response variables (n = 1,…,N), α c (t) is a vector comprised of the C common trends (c = 1,…,C), γ c,n are factor loadings (weighting coefficients) indicative of the weight associated with each of the common trends within the Dynamic Factor Model (DFM), μ n is a constant level parameter, υ k (t) is a vector comprised of the K explanatory variables (k = 0,…,K) and β k,n are regression coefficients representing the relative importance of each explanatory variable.In this study, S n signifies the 48 NDVI time series (one from each polygon in Figure 1).ε n (t) and η m (t) are independent Gaussian noise terms with zero mean and an unknown diagonal or symmetric/non-diagonal covariance matrix.Common trends are obtained using the Kalman filter/smoothing algorithm and the expectation maximization (EM) technique [72][73][74] and are modeled as a random walk [48].The EM technique is also used to calculate level parameters (μ n ) and factor loadings (γ c,n ), which denote the relative importance of the common trends.Regression coefficients (β k,n ), which indicate the relative spatial importance of explanatory variables, are modeled using multi-linear regression [50].Canonical correlation coefficients (ρ c,n ) were employed to quantify the cross-correlation evident between response variables and common trends, with values of ρ c,n close to unity, indicating high correlation.The significance of the relationship between response and explanatory variables was based on the magnitude of the β k,n and the associated standard errors (significant for t-values > 2).
To produce models with a minimum number of common trends, the use of non-diagonal error covariance matrices was employed [75].As with other modeling tools, DFA aims to balance model parsimony while maintaining the goodness-of-fit.By developing different dynamic factor models (DFM), this balance can be evaluated.The performance of each developed DFM was evaluated using the Nash and Sutcliffe [76] coefficient of efficiency (C eff ) with a statistical significance test [77] and the Bayesian information criterion (BIC) [78].C eff compares the variance between predicted and observed data about the 1:1 line, with C eff = 1 indicating that the plot of predicted vs. observed data matches the 1:1 line [76].The BIC is a statistical measure for model selection that rewards goodness-of-fit, but penalizes increases in the number of model parameters.The DFM that minimizes the number of common trends required to achieve the best fit, as determined by the C eff and/or BIC, is considered to be the best model.The addition of appropriate explanatory variables may help to improve the performance of the model, identify environmental factors (if any) that affect the response variables and quantify the spatial distribution of the weight of each factor.To directly compare the relative importance of common trends and explanatory variables across a response variable domain, all response and explanatory variables were normalized (mean subtracted, divided by standard deviation) prior to analysis [50,79].DFA was implemented using the Brodgar Version 2.7.2 statistical package (Highland Statistics Ltd., Newburgh, UK), which uses the statistical software, R, Version 2.9.1 (R Core Development Team, Viena, Austria, 2009).

Dimension Reduction of Candidate Explanatory Variables
Nine available candidate explanatory variables (CEV) were initially considered (Table 1).Based on Campo-Bescós et al. [61], we used the regional weighted-average of 8 CEVs, excluding O, since it was found to capture the major variance of each CEV across all polygons.The variance inflation factor (VIF) was used to quantify the severity of the collinearity of each set of CEVs [75].The authors recommend using a backward selection method to remove one variable at a time (the one with the highest VIF > 10) and recalculate VIF after each iteration, until we obtain a set of non-collinear variables.Thus, in an initial reduction step, CEVs with VIF > 10 were excluded from the analysis [75,80,81].

NDVI Analysis Procedure
The DFA analysis followed a three stage process.( 1

Part I: Comparison of MODIS and AVHRR Models (2001-2010)
Figure 2 shows the overlap time period for both weighted-area average NDVI estimations, AVHRR and MODIS.The time period of overlap is from 2001 to 2010, and the associated correlation coefficient is 0.95.While the correlation and trends (Figure 2) are similar, they exhibit differences during the peaks resulting from differences in sensors, time points within the month (from which maximum NDVI is obtained) and the frequency of measures (MODIS being more frequent, with often daily observations being integrated into the monthly measures).As such, differences in the values measured are to be expected, and although their correlations and overall trends do match well, MODIS provides improved spatial, spectral and radiometric resolutions compared to the AVHRR measures.The DFA model performance for MODIS NDVI and AVHRR NDVI3g was compared for the same suite of CEV and the same model type (multi-linear regression used in a previous study [61]: precipitation (P), mean temperature (T), maximum temperature (M), fire (F), soil moisture (S) and potential evapotranspiration (E).Due to the good performance of Model III (multi-linear regression, after removal of common trends) in the previous study, we compared the consistency of this model type for both datasets.For only one CEV (Figure 3), good consistency is found for all the individual CEVs, where F is the variable that itself explains a higher percentage of NDVI variance, followed by E. However, because the NDVI time series is not identical from both sources, goodness of fit differs.In general terms, MODIS NDVI is better predicted than the AVHRR NDVI3g (C eff ).Incremental improvement of the multi-linear regression model performance with the addition of explanatory variables (Table 2) follows the same pattern for both datasets.The best model includes five CEVs that match those selected in the previous study [61]: F, S, P, T and E (Table 2).For the extended time period analyzed in this study, 1982 to 2010 (Part II), the F time series is not available.Therefore, the best model without F included four CEVs: S, P, T and E (Table 2).Again, MODIS NDVI is better predicted than AVHRR, with a C eff value of 0.86 (0.77-0.91) and 0.72 (0.40-0.88), respectively.The spatial importance of each CEV follows similar patterns in both models (Figure 4).In the drier area (MAP < 750 mm) dominated by grassy vegetation, S is the key factor independent of the NDVI source.In these subregions, ranking of weighting coefficients are equal for both NDVI signals.
For wet MAP regions > 950 mm, dominated by woody vegetation, the NDVI signal is also modeled across the domain with similar factor importance, and as described in the previous study [61], E is the main driver and negatively related with NDVI.In this subregion, T and P exhibit also a stronger weight on NDVI from MODIS than from AVHRR.The goodness-of-fit of NDVI (top panel in Figure 4) from AVHRR reaches the lowest values in the humid region.In the previous study, F was highlighted as one of the main drivers of MODIS NDVI in this humid region [61].Considering the coarser temporal resolution of the raw AVHRR time series, the impact of F can be higher on AVHRR than on MODIS time series, with a higher raw temporal resolution.While monthly MODIS NDVI is based on numerous measurements per month, AVHRR has only two measurements per month.Clouds can affect AVHRR NDVI the same way as fire, especially in the humid area.Therefore, this explains the lower predictions for this humid area in which results need to be interpreted carefully.
In all, the consistent models obtained for both datasets indicate that, in spite of the dataset differences, for the overlapping period, both datasets include similar dynamics that are explained by the same environmental drivers.

Experimental Time Series
Monthly NDVI data collected between 1982 and 2010 showed relatively consistent seasonal cycling, typical for this region, across all data analysis precipitation polygons (Figure 5).Differences of patterns (maximum values) are observed between humid and drier regions.Humid areas (MAP > 950 mm) exhibited some NDVI reductions at the maximum photosynthetic period.This fact can be related to punctual anomalies, such as clouds and fires; for example, for clouds, a period of maximum NDVI is correlated to high precipitation and, therefore, to a high frequency of clouds.

Selection of Candidate Explanatory Variables
Following the procedure described in the previous section, we used the set of variables that do not suffer from collinearity across variables (variance inflation factor, VIF < 10, [75]).Variables with VIF > 10 were removed and the VIF recalculated after each iteration (Step 1-2 in Table 3), until we obtained a set of non-collinear variables.The final set of CEVs resulted in the selection of eight CEVs for further investigation, consisting of monthly time series for 1982-2010 of precipitation (P), mean temperature (T), minimum temperature (m), maximum temperature (M), soil moisture (S), actual evapotranspiration (Ea), potential evapotranspiration (E) and the Southern Oscillation Index (O) (Table 1, Figure 6).We used weighted-average variables as explanatory variables.

DFA Models (I, II and III)
The first model was developed as a baseline using only an increasing number of common trends (Model I; Table 4) and no explanatory variables.The best, Model I, was obtained with one common trend (C = 1), sufficient to minimize the BIC indicator and achieving the highest overall C eff of 0.79 (ranging between 0.34 and 0.95 across precipitation polygons) (Table 4).The single common trend illustrated shared variability across the 48 NDVI time series and may be viewed as a general signature of NDVI across the domain, integrating environmental factors that influence vegetation dynamics.However, high positive correlation (0.59 ≤ ρ c,n ≤ 0.97) between the 48 NDVI response variables and the common trend highlights that the model captures the principal variability of NDVI across the domain.Nevertheless, since this trend explains a latent effect (unexplained variance), it does not reveal which biophysical variables are driving NDVI variations.
Next, in order to reduce model reliance on the unexplained variability in Model I, we developed DFA with explanatory variables and common trends (Model II, Table 4).Combinations of weighted-area average CEV time series were evaluated in a factorial manner to determine the best model fits (255 DFMs were explored between one and four common trends; see details in the Supplementary Material).The best model in order of BIC criteria was Ea, T and S (C = 1 and K = 3, BIC = −8,538; bold text in Table 4), which has one explanatory variable (Figure 6I) and three explanatory variables: actual evapotranspiration (Ea, Figure 6F), mean temperature (T, Figure 6B) and soil moisture (S, Figure 6E).Based on statistical significance (p = 0.01), the model had an acceptable model performance with an overall C eff = 0.77 [77].In comparison with Model I, Model II improves model performance for the worst performing polygons, from 0.34 to 0.46.Finally, the explanation of response variables was shifted from unknown (common trend) to known drivers (explanatory variables), with a decrease of average canonical correlation from 0.89 to 0.06 (Model I and II, respectively).Figure 7 shows the weight of each driver for Model II Ea, T and S and the goodness-of-fit along each MAP gradient across the study area.Over the range of MAP, the importance of the common trend is not statistically significant (t-test < 2), illustrating a latent effect equally distributed across the domain.According to Figure 7, S and Ea are the main drivers of NDVI in the savanna grassland (MAP < 750 mm).However, the rate of change on S and Ea over the savanna grassland is opposite.Thus, Ea rises in importance with MAP, while S decreases.T is the third driving factor of NDVI of importance in this region, with a homogeneous influence across the domain (Figure 7).Over a MAP > 950, Ea overcomes this key factor.Its importance is homogeneous across the humid range.The second driving factor in the more wooded savanna is T. As well as Ea, T exhibits a constant pattern across this region.However, its importance is half or lower than Ea.The multi-linear regression model selection (Model III) is presented in Figure 8, showing incremental improvement of Model III performance with the addition of the explanatory variables.Minimum BIC is achieved with five explanatory variables (BIC = −24,333).Thus, the best model fit is model Ea, T, S, P and M with a C eff of 0.80 (Model III, bold text in Table 4), which relates to the explanatory variables of evapotranspiration, mean temperature, soil moisture, precipitation and maximum temperature.Figure 9 shows the weight of each driver for this best model and the goodness-of-fit along each MAP gradient.This analysis found that soil moisture was the most important variable, followed by actual evapotranspiration, with a more minor impact of temperature when the MAP was below 950 mm.Above 950 mm, soil moisture and mean temperature become insignificant, and actual evapotranspiration and precipitation dominate as drivers of NDVI.Notice that the flat response of maximum temperature (M) does not mean it is not important (or needed).M is adding temporal structure to the model (explains temporal variance), but does so equally across the domain (spatially).4).Symbols are filled in red where the weighting coefficients are significant (t-value > 2).In the image, separate use en dash for ranges, thousands with a comma, and are Ea, M, P, S, T, E, etc., italicized?

Discussion
Zhu and Southworth [13] found a significant positive relationship between mean annual NPP and MAP for the bush/scrub and grassland savanna regions, but no relationship in wooded savannas.Specifically, they found a relationship with NPP and MAP up until around 850-900 mm, above which no relationship existed.In this study, many more variables have been included in determining the controlling factors on NDVI across this landscape.Across a thirty-year time series, which is long enough to detect significant changes and patterns in vegetation cover, the key drivers were soil moisture, followed by actual evapotranspiration, with a more minor impact of mean temperature when MAP was below 950 mm.Above 950 mm, soil moisture and mean temperature become insignificant and actual evapotranspiration and precipitation dominate as drivers of NDVI.This switch, as with the Zhu and Southworth [13] study of NPP patterns, occurs around the transition from shrub/scrub and grassland savannas to more wooded savanna environments.It is important to highlight that for the overlap time period between MODIS and AVHRR (2001-2010), the two variables, Ea and F, have a high correlation coefficient (r = −0.84).In fact, Ea has the highest correlation variable with F, followed by P (r = −0.63).As such, the variable, Ea, seems to encompass some of the unavailable fire variable in this Part II model development and also causes differing patterns and trends with the remaining variables, due to its inclusion.However, at the time scale of 1981-2011, these variables were not correlated to such a degree and, hence, could be included in this model together (in the Part I model, Ea was removed, due to collinearity with other variables).The models developed in Part I and Part II of this paper are therefore not possible to compare fully, as different variables were used in their creation, in accordance with the rules of the model development and variable selection.Additionally, the model developed in Part II did not include fire, as this variable in Part I was derived from the MODIS data itself, which does not extend back pre-2000.
The results of the long-term analysis with AVHRR NDVI highlight the shift in importance of driving forces over the region in areas that receive different amounts of mean annual precipitation (MAP) representing three distinct savanna types (from grass to tree dominated systems).For example, for the subregion, where MAP < 750 mm (primarily grass-dominated savannas), our analysis shows that NDVI was most strongly influenced by soil moisture, maximum temperature and actual evapotranspiration, whose impact increases with increasing precipitation, with much smaller effects of precipitation.The importance of soil moisture found for grass-dominated savannas strongly supports the broader view of savannas proposed by Ward et al. [82] and the need to incorporate soil-climate interactions and not focus so strongly on fire-climate restrictions.In these very low moisture environments, the intermediary of the soil and its ability to hold and store moisture (rather than just the precipitation amount per se) is key.Linhoss et al. [83] also identified soil moisture as a key driver of the environmental dynamics in the Okavango Delta.On the other hand, regions with MAP > ~950 mm (where NDVI and overall biomass tend to increase with an increasing presence of woody vegetation) presents markedly different patterns.In these more humid areas (up to 1,300 mm MAP), actual evapotranspiration dominates, followed in importance by precipitation and mean temperatures, while the importance of soil moisture declines (as vegetation is no longer as water limited).Previous studies suggest that for arid and semi-arid areas, increasing pre-season temperature could reduce water availability by increasing evaporation, thereby delaying the season onset [16].However, if we view the key variables across the region as a group, we actually see that moisture and moisture availability do dominate the controls in this water-limited environment.Specifically, soil moisture dominates below 950 mm, a measure of the actual moisture content (precipitation and soil type/topography combined), and above 950 mm, actual evapotranspiration (available water) and precipitation are dominant.All three of these variables link to water availability as the strongest control on plant growth.As such, this does corroborate the findings of previous researchers, that in arid and semiarid ecosystems and seasonally dry tropical climates where water is limited, vegetation growth is primarily constrained by water availability [14][15][16][17]27,28].This relates to the mechanisms where precipitation stimulates green-up onset, and its distribution in the rainy season is critical to vegetation germination, growth and biomass [29].The results of this study reveal the power of DFA to aid in the interpretation of spatially variable environmental effects on a parameter of interest.In particular, this analysis quantified the spatial pattern of the role of each environmental factor in determining the resultant NDVI across an extensive, heterogeneous domain, specifically highlighting the transition between regions of bush/grass vs. tree savanna, a key class of interest within savanna research.
The removal of common trends from the final DFM in Model III yields a statistical model of NDVI based solely on a set of biophysical parameters.It is therefore possible to make predictions about changing NDVI in the study region based on the relevant environmental drivers and how they may change due to mid-and long-term climate forcing.While application of Model III for this purpose may provide a "first-cut" approximation of the likely response of vegetation to changing climate (via NDVI), it is important to recognize that the models developed herein are statistical; as such, any forecasting model inherently assumes that the multivariate relationships identified in the DFA remain consistent across a different collection of climatic variables (a problematic assumption under non-stationary climate conditions).While the sign of the identified relationships may not change over different boundary conditions, the slope and magnitude almost certainly will.
The lengths of some remote sensing records, such as the newly released AVHRR GIMMS NDVI3g, now permit a more systematic quantification of key relationships between the drivers of vegetation growth and the resulting cover type over a sufficiently extensive spatial and temporal scale to allow us to start to develop models for potential use under future climate change scenarios.In addition, we can reverse the model predictions and use them to determine where the model underperformed and, thus, what type of changes may be occurring in the landscape locations and what their drivers may be that were not included in the larger model.This type of change detection analysis will be used in future research across this landscape.This approach will help in improving the understanding of savanna dynamics for vegetation growth, in identifying potentially missed drivers and also in assessing the likely impacts of future climate variability and change in this very sensitive region of the world.

Conclusions
Dynamic factor analysis was applied to study the variation in NDVI across three co-located extensive watersheds in southern Africa and, more specifically, to identify the driving factors explaining the observed variation in savanna vegetation cover time series and their variation across biophysical gradients.Through the use of such models as NDVI, we can improve our understanding of these savanna systems, better evaluate their drivers and, so, hopefully utilize these studies in future change studies and for management purposes.This improvement in our ability to characterize these savanna landscapes allows for a much more solid footing upon which examinations of the change mechanisms and drivers are based.
We first evaluated AVHRR-vs.MODIS-based NDVI datasets across their overlapping period (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010).NDVI follows a general pattern of cyclic seasonal variation, with distinct spatio-temporal patterns across physio-geographic regions.Both NDVI products produced similar DFA models, although MODIS was simulated better.Soil moisture and precipitation controlled NDVI for MAP < 750 mm, and above this, evaporation and mean temperature dominated.A second DFA with the full AVHRR (1982-2010) data found that for MAP < 750 mm, soil moisture and actual evapotranspiration control NDVI dynamics, followed by mean and maximum temperatures.Above 950 mm, actual evapotranspiration and precipitation dominate.The results of this study confirmed the importance of the spatial distribution of soil moisture [84] and precipitation [3,85,86] as determinants of NDVI, but also point to the potential influence of both mean and maximum temperature and actual evapotranspiration, particularly in regions with MAP > ~750 mm.The spatial distribution of these environmental covariates points to a transition in their importance over NDVI from grass-dominated regions with MAP < 750 mm (dominated by soil moisture) and tree-dominated regions with MAP > 950 mm (dominated by precipitation, actual evapotranspiration and, to a lesser degree, mean temperature).
The quantification of the combined spatio-temporal environmental drivers of NDVI expands our ability to understand landscape level changes in vegetation evaluated through remote sensing and improves the basis for the management of vulnerable regions, like southern Africa savannas.Monthly AVHRR-derived vegetation indices hold considerable promise for the large-scale quantification of complex vegetation-climate dynamics (as discussed in [13][14][15][16][17], this issue) and for regional analyses of landscape change as related to global environmental changes [37,84].The research presented here highlights the utility of the time-series approach, and with the dramatic increase in research in global change, this type of methodology holds significant promise for further development, specifically as linked with such spatially and temporally extensive remotely sensed-based research and within the field of land change science (LCS).

Figure 1 .
Figure 1.Spatial pattern of mean annual Normalized Difference Vegetation Index (NDVI) (September-August of the next year) across the study area, derived from monthly NDVI Global Inventory Monitoring and Modeling System third generation (GIMMS3g) data from 1982 to 2010.The subset polygons correspond to mean annual precipitation intervals.The inset map shows the geographic location of the study area in southern Africa.MAP, mean annual precipitation.
) Model I: DFMs were developed based on increasing numbers of common trends (without CEVs) to establish a baseline satisfactory model performance according to goodness-of-fit indicators.(2) Model II: different combinations of CEVs were incorporated to reduce unexplained variability and improve the goodness-of-fit.(3) Model III: the common trends (unexplained variation) were removed, while optimizing multi-linear models of the best CEVs identified in Model II.Multiple regression code in Matlab (v2013a, The MathWorks, Inc., Natick, MA, USA) was used in the optimization process.Model III permitted refining of the selection of the most important CEVs, since inclusion of trends in Model II may mask the effects of important explanatory variables of the multivariate model (2).When selecting the "best" model, we adopted a multi-criteria objective consisting of: (a) model adequacy (minimizing BIC); (b) global model goodness-of-fit (maximizing average C eff ); (c) improvement in C eff for the worst-performing polygons (minimizing range in C eff ); and (d) reduction in the importance of common trends (i.e., minimizing ρ c,n ).

Figure 2 .
Figure 2. Variability of observed average Normalized Difference Vegetation Index (NDVI) estimated from the Moderate Resolution Imaging Spectroradiometer (MODIS) (black line) and the Advanced Very High Resolution Radiometer (AVHRR) (red line) from 2001 to 2010.

Figure 3 .
Figure 3. Dynamic Factor Analysis (DFA) model performance with one explanatory variable.The response variable, NDVI, is from MODIS (black circles) and from AVHRR (red triangles).The range of C eff is presented across polygons.(C eff , Nash-Sutcliffe efficient coefficient; BIC, Bayesian information criterion).

Figure 4 .
Figure 4.The weight of each driver for the best NDVI model and goodness of fit along the mean annual precipitation (MAP) and ecological gradients.The upper panel shows the variation in the Nash and Sutcliffe coefficient of efficiency (C eff ) for MODIS (in black) and AVHRR (in red).The bars show the mean and range of values across watersheds.The lower panel shows the main trajectories in the weighting coefficients for candidate explanatory variables (CEVs) for predicted NDVI values across the MAP gradient (MODIS in black; AVHRR in red).The weighting coefficients for CEVs are shown by symbols corresponding to the labels.Trend lines represent the main trajectories and highlight the shift in the importance of environmental drivers for each NDVI predicted.

Figure 5 .
Figure 5. Variability of observed NDVI for the study region for two mean annual precipitation (MAP) ranges.Upper and lower extremes values and range for MAP > 750 mm (continuous lines and light shading) and MAP < 750 mm (broken lines and dark shading).

Figure 7 .
Figure 7.The weight of each driver for the best NDVI model and goodness-of-fit along the mean annual precipitation (MAP) gradients for Model II.Lines represent the main trajectories and highlight the shift in the importance of environmental drivers for each NDVI zone predicted (see details in the caption of Figure 4).Symbols are filled in red where the weighting coefficients are significant (t-value > 2).

Figure 8 .
Figure 8.The incremental improvement of Model III performance with the addition of the explanatory variables.The best model is shown in bold; the solid line shows weighted averages and dashed lines the range across the spatial domain.

Figure 9 .
Figure 9.The weight of each driver for the best NDVI model and the goodness-of-fit along the mean annual precipitation (MAP) gradient for Model III.Lines represent the main trajectories and highlight the shift in the importance of environmental drivers for each NDVI zone predicted (see details in the caption of Figure4).Symbols are filled in red where the weighting coefficients are significant (t-value > 2).In the image, separate use en dash for ranges, thousands with a comma, and are Ea, M, P, S, T, E, etc., italicized?

Table 1 .
Initial set of candidate explanatory variables and their sources.CPC, Climate Prediction Center.NCEP-DOE, National Center for Environmental Prediction-Department of Energy.

Table 2 .
Incremental best model, with the addition of the addition of the explanatory variables.C eff presented are calculated for the whole area (values in parenthesis represent range for polygons).(C eff : Nash-Sutcliffe Efficient Coefficient).

Table 3 .
Test of collinearity of explanatory variables.The final set of explanatory variables selected based on variance inflation factor (VIF) < 10 are shown in bold.VIF values were calculated for area-weighted time series for the region.

Table 4 .
Selected results of the dynamic factor analysis of NDVI for the long-term AVHRR-based models.Numbers in bold represent selected models.

333 0.80 (0.53-0.88) +
[77]f : Nash-Sutcliffe coefficient of efficiency.Values presented are calculated for the whole area (values in parenthesis represent the range for polygons).dClassification of goodness-of-fitness results based on statistical significance (p = 0.01) for the whole region: −, Unsatisfactory; +, acceptable; ++, good[77].e ρ 1,n : Canonical correlation coefficients for the first common trend of Models I and II.
a Model I: only common trends (unexplained variability); Model II: with common trends and explanatory variables (explained variability); Model III: only explanatory variables (from multiple regression).b BIC: Bayesian information criterion.c