An Evaluation of Vegetation Health in and around Southern African National Parks during the 21st Century (2000–2016)

: Roughly 65% of the African continent is classiﬁed as savanna. Such regions are of critical importance given their high levels of biological productivity, role in the carbon cycle, structural di ﬀ erences, and support of large human populations. Across southern Africa there are 79 national parks within savanna landscapes. Understanding trends and factors of vegetation health in these parks is critical for proper management and sustainability. This research strives to understand factors and trends in vegetation health from 2000 to 2016 in and around the 79 national parks across southern Africa. A backward stepwise regression was used to understand the factors (e.g., precipitation, population density, and presence of transfrontier conservation areas) a ﬀ ecting the normalized di ﬀ erence vegetation index (NDVI) during the 21st century. There was a statistically signiﬁcant positive ( p < 0.05) relationship between mean annual precipitation and NDVI, and a signiﬁcant negative relationship between population density and NDVI. To monitor vegetation trends in and around the parks, directional persistence, a seasonal NDVI time series-based trend analysis, was used. Directional persistence is the net accumulation of directional change in NDVI over time in a given period relative to a ﬁxed benchmarked period. Parks and bu ﬀ er zones across size classes were compared to examine di ﬀ erences in vegetation health. There was an overwhelmingly positive trend throughout. Additionally, national parks, overall, had higher amounts of positive persistence and lower amounts of negative persistence than the surrounding bu ﬀ er zones. Having higher positive persistence inside of parks indicates that they are functioning favorably relative to the bu ﬀ er zones in terms of vegetation resilience. This is an important ﬁnding for park managers and conservation overall in Southern Africa.


Introduction
Globally, drylands cover around 50% of the earth's terrestrial surface [1]. Of these drylands, savannas cover extensive portions of Africa, South America, and Australia. In southern Africa, over 55% of the landscape is classified as savanna [1,2]. Savannas are a type of dryland, and therefore are water limited systems. They are traditionally defined as grasslands with scattered trees, but practically are defined as a heterogeneous landscape composed of a mix of grasses, shrubs, and trees [1,3,4]. Savannas play a key role in the global carbon cycle and global net primary production [5][6][7][8][9][10]. Due to transfrontier conservation areas are actually working and, as such, to address the landscape ecology theory that the size and connectivity of a protected area impacts the overall quality [37].
Currently, conservation development via parks and protected areas is a global phenomenon. However, such landscapes have rarely been studied on a larger scale explicitly to determine the factors of vegetation greenness or park success. This research examines factors of spatial heterogeneity in NDVI, specifically addressing the role of parks versus buffer zones or the surrounding land matrix. Such potential factors of savanna vegetation within these protected areas and their surrounding buffers include precipitation, human population density, transfrontier conservation areas, elevation, park size and area, land cover, and country. The research objectives of this study are as follows: (1) to understand the factors of vegetation health in national parks and their associated buffer zones in southern Africa; and (2) to determine how savanna vegetation persistence within these national parks and buffer zones of southern Africa varied from 2000 to 2016.

Study Area
For this study, 79 national parks across eight countries (South Africa, Zambia, Zimbabwe, Botswana, Namibia, Mozambique, Malawi, and Angola) in southern Africa were analyzed ( Figure 1). Of the various protected area designations (e.g., national parks, game management areas, private game reserves), national parks were selected for their consistency in definition across countries. These parks were taken from the World Database on Protected Areas [40]. Given the spatial scale of the remote sensing techniques, only parks greater than 312,500 m 2 were studied because five or more pixels at the 250 m spatial resolution were needed to be able to establish trends. For this reason, the national parks of Swaziland and Lesotho were not included in this study. Given the large variation in area across these 79 parks, using a natural breaks statistical technique, parks were broken into three main size classes: small, medium, and large ( Figure 2). The small size class had a sample size of 41 and a mean area of 302 km 2 . The medium size class had a sample size of 30 and a mean area of 4,771 km 2 . The large size class had a sample size of 8 and a mean area of 23,284 km 2 ( Figure 2). Buffer zones with sizes of 10 km, 20 km, and 30 km were created for the small, medium, and large parks, respectively. Buffers were created to be able to compare vegetation persistence inside and outside of the park. Buffer sizes chosen for each class are based on literature suggesting that buffers be approximately three times the size of the park [38]. Across this study region, climate varies greatly and was accounted for by comparing mean annual precipitation from the Global Historical Climate Network [41]. Mean annual precipitation of parks in this study varied from 52 to 1466 mm. Land cover variations were defined as savannas with varying amounts of grass/wood biomass. These differences were taken into account by using the Global Land Project Facility data on global land cover [42] (Table A1). Of these 79 national parks, 31 are considered to be transfrontier conservation areas, which can be defined as continuous conservation areas divided by geopolitical boundaries (namely country boundaries) that are cooperating together, but may still function under different management regimes [43].

Understanding Factors of Vegetation Cover: Stepwise Backward Regression
This section addresses factors used to measure vegetation change and related factors.

Model Selection
A backward stepwise regression was used for this analysis because this method is capable of selecting variables with the most explanatory power from a number of input variables [44]. All assumptions of linear models were tested before employing the model: a linear relationship between dependent and independent variables, statistical independence of the errors, homoscedasticity of the errors, and confirmation that the error distribution was normal. This model has a relatively higher level of robustness for small sample sizes [44] compared to a geographically weighted regression. However, to account for the park's spatial distribution on the landscape, we used variables that address differences across space, such as land cover, elevation, and mean annual precipitation. The model starts with all possible variables, a "full" model, and then runs several times to eliminate variables that are deemed not significant by the model. This is done by starting with all variables and then testing each variable by deleting it from the model. Variables are continuously deleted until the loss gives the least significant deterioration with the highest model fit. The final model of significant variables is referred to as a "reduced" model. The Akaike Information Criterion (AIC) is a technique that is based on in-sample fit to then estimate how well a model does at predicting future values [45]. Between models, the one with the lowest AIC is determined to be the model of best fit. Therefore, between the range of full and reduced models, the lowest AIC was used to select the best model here.

Model Variables
In order to understand the factors of vegetation health in and around national parks of southern Africa, the normalized difference vegetation index (NDVI) was used as a measure of vegetation health and abundance. NDVI is calculated as: NIR−R NIR+R , where R is the red band and NIR is the near infrared band [46]. This index works on the principle that NIR is highly reflective in healthy vegetation and red is highly absorptive in healthy vegetation. NDVI values range from −1 and +1, with any value above 0 indicating vegetation and above 0.9 saturating out in dense healthy vegetation [46]. The NDVI product from the Moderate Resolution Imaging Spectroradiometer (MODIS) MOD13Q1.006 Terra Vegetation Index (250 m spatial resolution) was used in this research. This NDVI product was obtained from Google Earth Engine (GEE) and had been atmospherically corrected on a per-pixel basis and was also pre-masked for water, clouds, heavy aerosols, and cloud shadows. The data has also been through the quality assessment check with the provided pixel reliability band. These products are 16 day MODIS observation composites where the pixel value is equal to the NDVI value of the observation closest to nadir view. This is determined between the top two NDVI observations during a 16  The independent variables in these regressions were: country (X1), land cover (X2), area (X3), mean annual precipitation (X4), human population (X5), elevation (X6), and transfrontier (X7). All of the variables except for country (coded) and transfrontier (yes or no) were at the 1 km spatial resolution, and were extracted at the park/buffer scale. The country variable is designated in the model to account for management type, depending on the country name and boundaries, which were obtained from the Thematic Mapping website [47]. The land cover variable is designated in the model depending on the cover type [42]. Within a park or buffer, the land cover type at a pixel level occurring most often was used as a representation of the respective area. Generally, land covers across these national parks are varying types of savannas [1]. This land cover data was obtained from the Global Land Cover Facility at the University of Maryland [42]. The area variable represented the size of each park and buffer zone. Precipitation, gathered from the Global Historical Climatology Network, represents longer-term climate, and was calculated by creating the mean annual precipitation (MAP) across the parks and buffers over 30 years [41]. The population density variable is measured in people per km 2 and was obtained from the WorldPop 2015 dataset [48]. Elevation, which has a complex relationship with vegetation (at extreme highs there is little vegetation, but at mid-level vegetation trees may be the most productive vegetation types and grasses may be less productive, and shrubs may decline consistently with increasing vegetation) is potentially an important explanatory variable [49]. The average elevation was calculated (m) for each park and buffer with data from the Shuttle Radar Topography Mission [50]. The transfrontier variable is binary and the list of transfrontier parks were taken from the Peace Parks Foundation, which is key in setting up and managing these areas [43].

Vegetation Directional Persistence
In addition to the analysis of NDVI absolute values, we investigated the per-pixel time series NDVI values using a measure called vegetation directional persistence. Directional persistence utilizes a novel time series approach based on vegetation indices, such as NDVI, to determine the overall health and persistence of vegetation cover over time, in this study, from 2000 to 2016 [23]. Directional persistence (D) is a cumulative direction of change over the time period of interest when compared to a benchmark observation of NDVI [16,23]. Using benchmark NDVI values and comparison NDVI values in years after the benchmark has been shown to be an indicator of vegetation trends over time and space [16,23]. A random walk statistic is applied to each pixel highlighting statistically significant increase or decrease in vegetation persistence, or health of vegetation at a pixel level [23]. This statistic is based on the accumulation of consecutive outcomes during a Bernoulli process. During this process the probability of a success (step = +1) and a failure (step = −1) are independent of the position of the random walk at that time so the probabilities of a success or failure are not dependent on previous successes or failures [23]. The MODIS NDVI data, described previously, was used for the directional persistence analysis by season (DJF, MAM, JJA, SON). The benchmark seasonal mean NDVI values were obtained for the start of the time series of MODIS data, from 2000 to 2004. This accounts for any anomalous years and corresponds with the timeline from the regression analysis in the first part of this study. The latter part of the NDVI time series was the comparison period from 2005 to 2016 (n = 12). For each season, the mean NDVI pixel value of each of the comparison years was compared with the mean NDVI pixel value from the benchmark [23], for every pixel in the image. If the value in a given comparison year was higher than the benchmark, the pixel was assigned a value of +1, and if the value in a given comparison year was lower than the benchmark, the pixel was assigned a −1. These values for the 12 years were then accumulated to produce a net value of D for each pixel. Because this method uses a random walk statistic, a statistical significance can be attached to each pixel [23]. Given this is a two-tailed test, for the significance level of 0.05, pixels with values of +/−8 were considered to have a significant trend in either a positive or negative direction. For the purposes of this study, directional persistence values were mapped across the entire region of southern Africa, but overall percentages of positive and negative significant values were extracted inside of parks, as well as inside of buffer zones, so a direct comparison between these two landscapes could be made.

Stepwise Backward Regression
Results of the stepwise backward regression are presented below in Table 1a,b, and Table 2a,b. Each of the models, regardless of season, time period, and park or buffer, had over 60% of the variation explained by the independent variables. This indicates that the models are successful in explaining the majority of spatial variations in NDVI values. However, parks in each season and period had higher R 2 values than buffers. There was also a greater range in park R 2 values relative to buffers. For parks, between 65% and 69% of the NDVI variation was explained by these models across seasons and time periods. For buffers, between 61% and 62% of the NDVI variation was explained by these models across seasons and time periods. The model varied slightly in the number of variables in the best model, but across all models, country (X1) was never used as an important variable. The intercepts in these models were very low (hovering around zero) due to the small range in which NDVI is measured. The Akaike Information Criterion (AIC) was included to demonstrate exactly how much these models differed, but within each of the season, time periods, and parks or buffers, the lowest AIC was used to select the best model. In comparison across parks and buffers, the parks had lower AIC values, also supporting that the park models performed better overall.      A variable results table of the stepwise backward regression are below in Table 1a,b, and Table 2a,b. This shows each of the variables, when they were significant, how they were significant (positive or negative, if applicable), and what statistical significance level (p < 0.001 = ****, 0.001 = ***, 0.01 = **, 0.05 = *, or no * for no significance).
In every model, the most important independent variable was MAP. MAP had a positive relationship with NDVI, meaning that higher precipitation resulted in higher NDVI values. MAP consistently had the highest level of significance (Table 1a,b, and Table 2a,b, <0.001) across each model. This is expected considering that savannas are water-limited systems.
The second most important independent variable was mean population density. Mean population density had a negative relationship with NDVI. This means where there were higher human populations, there were lower NDVI values, likely due to human use of the landscape and potential subsequent degradation. Mean population density was significant across most of the models, but tended to be most significant in the buffer zones (Table 1a,b, and Table 2a,b, <0.001). This is due to higher densities of people in the buffer zones than in the parks. In parks, mean population density was significant at a relatively lower significance level and only in Another important variable was whether or not a park was part of a transfrontier conservation area. Transfrontier areas had a positive relationship with NDVI values, so where there was a transfrontier conservation area, NDVI values were higher. This might be expected because areas of cooperation may lead to better management, with healthier vegetation. This variable had a high level of significance (Table 1a,b, and Table 2a (Table 1a,b,  and Table 2a,b, 0.01), except for buffers in 2013-2016, which was a higher level of significance (Table 1a,b,  and Table 2a,b, 0.001). This result is likely showing a strong significance due to the firm establishment of these transfrontier areas in this period, which helps to mitigate some human impact. This may be combined with the high level of biomass at the end of the rains then, when NDVI values are at a maximum.
Elevation had a complicated relationship with NDVI. Mean elevation had a positive relationship with NDVI only in the buffer zones of DJF in 2000-2004 and 2013-2016 (0.05 level of significance), so with higher elevation there were higher NDVI values. This may be caused by humans not utilizing the landscape at higher elevations. Mean elevation had a negative relationship with NDVI only in parks of MAM in 2000-2004 (0.01) and 2013-2016 (0.05), so with higher elevation there were lower NDVI values. However, this may be due to a lack of vegetation at higher elevation in some parks. Overall, this was a much more mixed and weaker result.
Area was not as significant as was hypothesized prior to the analysis. It was only significant in parks at the 0.05 level in DJF 2005-2008 (Table 1a,b, and Table 2a,b). Area had a negative relationship to NDVI, meaning for this one model, the larger the park the lower the NDVI values, which was unexpected.
Land cover was significant in some cases. In DJF in buffer zones from 2000-2004, 2005-2008, and 2009-2012 land cover was significant at the 0.05 level. In MAM, land cover was significant (varying between 0.05, 0.01, and 0.001) across parks and buffers in all time periods. This suggests that land cover plays a more important role in NDVI values later in the wet season when the NDVI values themselves are much higher.

Vegetation Directional Persistence
Directional persistence varied spatially across southern Africa and across seasons (Figure 3a persistence. In eastern Mozambique and western Angola there was a weak negative trend in persistence. In MAM there was more neutral persistence across this landscape. In northeastern Namibia and in the Kalahari (western Namibia and South Africa), there were strong positive persistence trends. The corner area where the borders of South Africa, Mozambique, and Zimbabwe meet, as well as in central-northern Mozambique, there were strong negative trends. In southwestern Angola and western central South Africa there were also some negative trends. These trends across seasons may be related to the NDVI factors in the first part of this study, such as precipitation. Further investigation is needed to determine if there is a change in precipitation timing or amount during this study period.  The percentages of significant positive and negative persistence for parks and buffers for each of the three park size classes and two seasons are represented in Figure 4a and Figure 4b. These figures highlight only significant persistence and do not account for nonsignificant persistence and therefore do not add up to 100%. Across the wet season (Figure 4a and Figure 4b), in parks and buffer zones, there is an overwhelming significant positive persistence trend. The parks and buffer zones had higher percentages of positive persistence (20%-31%) and lower percentages of negative persistence (3%-4%) in DJF compared to MAM. In MAM, positive persistence ranged from 13% to 26% and negative persistence ranged from 5% to 10%. When comparing parks with buffer zones, in DJF, small and medium parks had higher percentages of positive persistence than buffer zones. Large parks/buffers had similar percentages of positive persistence. In DJF, small and medium parks had lower percentages of negative persistence than buffer zones, while large parks/buffers had similar percentages of negative persistence. When comparing parks with buffer zones, in MAM, small, medium, and large parks had higher percentages of positive persistence than buffer zones. In MAM, small, medium, and large parks had a lower percentage of negative persistence than buffer zones. In MAM there was more neutral persistence across this landscape. In northeastern Namibia and in the Kalahari (western Namibia and South Africa), there were strong positive persistence trends. The corner area where the borders of South Africa, Mozambique, and Zimbabwe meet, as well as in central-northern Mozambique, there were strong negative trends. In southwestern Angola and western central South Africa there were also some negative trends. These trends across seasons may be related to the NDVI factors in the first part of this study, such as precipitation. Further investigation is needed to determine if there is a change in precipitation timing or amount during this study period.
The percentages of significant positive and negative persistence for parks and buffers for each of the three park size classes and two seasons are represented in Figure 4a,b. These figures highlight only significant persistence and do not account for nonsignificant persistence and therefore do not add up to 100%. Across the wet season (Figure 4a,b), in parks and buffer zones, there is an overwhelming significant positive persistence trend. The parks and buffer zones had higher percentages of positive persistence (20%-31%) and lower percentages of negative persistence (3%-4%) in DJF compared to MAM. In MAM, positive persistence ranged from 13% to 26% and negative persistence ranged from 5% to 10%. When comparing parks with buffer zones, in DJF, small and medium parks had higher percentages of positive persistence than buffer zones. Large parks/buffers had similar percentages of positive persistence. In DJF, small and medium parks had lower percentages of negative persistence than buffer zones, while large parks/buffers had similar percentages of negative persistence. When comparing parks with buffer zones, in MAM, small, medium, and large parks had higher percentages of positive persistence than buffer zones. In MAM, small, medium, and large parks had a lower percentage of negative persistence than buffer zones. The percentages of significant positive and negative persistence for parks and buffers for each of the three park size classes and two seasons are represented in Figure 4a and Figure 4b. These figures highlight only significant persistence and do not account for nonsignificant persistence and therefore do not add up to 100%. Across the wet season (Figure 4a and Figure 4b), in parks and buffer zones, there is an overwhelming significant positive persistence trend. The parks and buffer zones had higher percentages of positive persistence (20%-31%) and lower percentages of negative persistence (3%-4%) in DJF compared to MAM. In MAM, positive persistence ranged from 13% to 26% and negative persistence ranged from 5% to 10%. When comparing parks with buffer zones, in DJF, small and medium parks had higher percentages of positive persistence than buffer zones. Large parks/buffers had similar percentages of positive persistence. In DJF, small and medium parks had lower percentages of negative persistence than buffer zones, while large parks/buffers had similar percentages of negative persistence. When comparing parks with buffer zones, in MAM, small, medium, and large parks had higher percentages of positive persistence than buffer zones. In MAM, small, medium, and large parks had a lower percentage of negative persistence than buffer zones.

Discussion and Conclusion
This research supports literature that cites the major drivers in savanna systems as climate, namely precipitation, and humans, namely human population densities that are putting pressure on protected areas [2,32,33]. From this study, we can conclude that across southern African savanna

Discussion and Conclusions
This research supports literature that cites the major drivers in savanna systems as climate, namely precipitation, and humans, namely human population densities that are putting pressure on protected areas [2,32,33]. From this study, we can conclude that across southern African savanna parks and buffers, where the savanna systems are by their very nature water-limited, increases in precipitation lead to higher NDVI. This study supports that precipitation is important in maintaining healthy vegetation, but under an increasingly changing climate, where there are increasing temperatures and more precipitation variability leading to less available moisture, these systems may become significantly altered [7,51]. This can potentially have widespread effects given the extent of drylands globally, and is a reason why such studies, and the development of vegetation persistence metrics, are key.
An unexpected finding of this research was that park size was not a significant factor of determining higher or lower NDVI. The phrase "bigger is better" is often used when describing protected areas because of principles of connectivity in landscape ecology [37], which can be true to a certain threshold, but further nuances in this theory have come about more recently. New information suggests that placement of a protected area is more important than size [38,52,53]. However, "bigger is better" has remained an influence on policy makers. The "Convention on Biological Diversity" and "World Parks Congress" have created initiatives to conserve the maximum area of land as possible, but this may not actually be the best way to conserve biodiversity [52,53]. As DeFries et al. (2007) states, "The relative size and placement of a protected area within the greater ecosystem may be more relevant than absolute size for the ability of a protected area to maintain ecological function" [38]. Therefore, other factors should also be taken into consideration, and one of these is relative location. Protected areas that exist further away from human populations in remote locations and those that have less human use of natural resources tend to function better [38]. Our research supports these previous findings because areas of higher human population density had lower NDVI values across seasons and time periods. Also linked to the relationship between humans and parks is the existence of buffer zones. Buffer zones exist to mitigate the effects of humans on the landscape and wildlife. However, human populations have often been moved out of national parks and into the buffer zones. Therefore, the landscape in buffer zones is often more affected than in parks [54]. This is supported by our findings when comparing vegetation persistence in and out of parks. Vegetation persistence experienced more positive and less negative pixels inside of parks across seasons, time periods, and park sizes when compared to the buffer zones.
This research highlights the importance of the transfrontier conservation areas that have been developed in this century. The principle of transfrontier conservation areas lie in theories of landscape ecology, that being more connected is better [39]. Peace Parks Foundation has begun the effort of creating transfrontier conservation areas across Africa, but several of their pilot sites are in southern Africa and are included in this analysis [43]. Our results further support the validity of setting up these conservation areas across geopolitical borders from the perspective of the positive relationship between the existence of a transfrontier conservation area and higher NDVI values.
This research also supports observed global greening trends in vegetation. Greening of the earth has been observed over the last few decades by NASA and NOAA satellites [24]. One of the largest contributing factors to this greening is CO 2 fertilization, but there are other factors, such as nitrogen levels, land use, temperature, and climate [25][26][27]. Our research found that the overwhelmingly strong trend in significant directional persistence was positive for much of the region, i.e., that by using NDVI as a measure of vegetation greenness, the years 2005-2016 were greener or had higher NDVI relative to the benchmark years in the earlier part of the 21st century. This greening is not necessarily a positive change biologically. It is also possible that this could be a type of shrub encroachment (correlated local factors, such as increased herbivory, or global factors, such as increased carbon dioxide) or this greening could be from invasive species [55]. This is why it is important for local scale, fieldwork-based studies, in conjunction with these national and macro-scale analyses, to better understand the local landscape dynamics.
The advantage of using the metric of directional persistence is that it provides a robust and statistically driven foundation for more in-depth analysis. Unlike more traditional approaches, this does not just describe the conversion of the system, but gives us more information on vegetation trends over time that could be further broken down into monthly time steps if desired. Directional persistence can be used to understand vegetation dynamics and, from this, we can highlight areas of vulnerability based on patterns of positive and negative persistence over time. Another major advantage of this technique is that each pixel has a statistical significance that the user can specify. This rigorous and innovative approach is an important contribution to the study of vegetated systems globally, as it can be used for any vegetation system. One limitation of this metric is that it quantifies the amount of vegetation rather than type of vegetation. In these savanna systems, determining the grass-shrub-tree components is important for further quantifying degradation. Therefore, these products may be more difficult for local managers to utilize, and more in-depth localized field-based studies should be carried out.
In conclusion, from this study we can determine that, of the variables we included, precipitation, human population, and transfrontier conservation areas are the most important factors of vegetation health in these savanna parks and buffers. For the future, refining rainfall data and including burn area index data as a proxy for fire occurrence may improve these models further. Another variable that is much more difficult to quantify, but may improve models in the future, is that of herbivory on the landscape. Future research will work to incorporate these additional factors of savanna park vegetation.
Across different park sizes and seasons, but especially during rainy seasons, positive persistence is the dominant trend. Positive persistence is higher inside of parks than buffers, and negative persistence tends to be higher inside the buffer zones than in the parks. This suggests that parks are functioning as designed from the perspective that they are important stable features in an increasingly dynamic world. These protected areas are important for conserving biodiversity, which helps to promote tourism and increase the socio-economic activity in this region, which is particularly important to communities who have been largely dependent on subsistence agriculture [56]. While these findings are a regionally important tool, managers should also continue to monitor vegetation at the more localized scale, and their expertise and field-based, site-specific knowledge is of paramount importance. The use of both types of data in concert will provide park managers and conservationists with a very powerful set of analysis tools, both now and in the future.  Acknowledgments: These authors would like to thank the Peace Parks Foundation for the data on transfrontier conservation areas.

Conflicts of Interest:
The authors declare no conflict of interest. Table A1. This table includes the park names and the land cover table, based on majority of pixels, from the Moderate Resolution Imaging Spectroradiometer (MODIS) Collection 5 global land cover [42].

Park
Land Cover (Majority Pixels)

Agulhas Deciduous woodland
Augrabies Falls Sparse grassland