Predicting Understory Species Richness from Stand and Management Characteristics Using Regression Trees

Managing forests for multiple ecosystem services such as timber, carbon, and biodiversity requires information on ecosystem structure and management characteristics. National forest inventory data are increasingly being used to quantify ecosystem services, but they mostly provide timber management and overstory data, while data on understory shrub and herbaceous diversity are limited. We obtained species richness and stand management data from relevant literature to develop a regression tree model that can be used to predict understory species richness from forest inventory data. Our model explained 57% of the variation in herbaceous species richness in the coastal plain pine forests of the southeastern USA. Results were verified using field data, and important predictors of herbaceous richness included stand age, forest type, time since fire, and time since herbicide-fertilizer application. This approach can make use of available forest inventories to rapidly and cost-effectively estimate understory species richness for subtropical pine forests.


Introduction
Today's complex and changing global environment has generated increasing interest in the use of forest inventory data for analyzing, monitoring, and assessing natural resource management options. National-level forest inventory systems such as the United States' Forest Inventory and Analysis (FIA), Canada's National Forest Inventory system, and the European Forest Inventory Database have traditionally been used to monitor and evaluate changes in forest ecosystems over time, with an emphasis on timber production [1][2][3]. These inventory systems generally consist of permanent plots from which tree and other ancillary stand data are collected periodically. More recently, these plot-level data have also been used to estimate landscape- and regional-level forest ecosystem services, such as aboveground, understory, and below-ground carbon stocks [4].
As managers begin to use forested ecosystems for purposes other than timber production, we need to assess potential trade-offs and synergies among multiple uses, such as timber, carbon, and biodiversity, and to understand management's role in determining ecosystem structure and the consequent impacts on these functions and services. Although most national forest inventory data are rich in tree-level information, the quality and quantity of data on the understory shrub and herbaceous components are limited [5]. As an important objective of multiple-use forest management, comprehensive biodiversity conservation requires information on the entire forest plant community, including understory shrubs and herbaceous vegetation, but this type of data is currently unavailable in most of the above-mentioned inventory systems, particularly those used in the southeastern United States (US).
The structure and composition of understory plant communities have been associated with overstory plant composition and specific forest management practices [6][7][8][9]. With a robust suite of tree, stand, and management history data, forest inventory systems seem suitable for quantifying the relationship between understory species richness and overstory composition and management. For example, most of the United States Forest Service's FIA data contain information not only on tree biometrics but also on stand management and disturbance history. This information, along with overstory data, can be used to predict herbaceous/understory species richness. Twedt et al. [10] have even used FIA data to predict changes in avian abundance in the southeastern US.
The FIA also collects data on stand age, which can be used to help predict understory plant richness. Pine flatwoods, in their seral stage, showed higher plant diversity than younger stands, while recently established pine plantations exhibited greater herbaceous cover that benefited wildlife [6]. As further evidence of a relationship between stand age and plant richness, Baker and Hunter [7] found that as canopy closure increased in younger forest stands, herbaceous richness decreased because of light limitation and competition for resources, but in older stages, tree mortality from self-thinning opened up the canopy, allowing greater light penetration to the ground and thereby supporting greater herbaceous richness.
Stand origin and ecological disturbance type can further help determine understory plant species richness. Studies comparing unmanaged forests to plantations have found impoverished flora and fauna in plantations [8,9]. Natural disturbances create heterogeneity in ecosystem structure and processes, increasing overall diversity [11]. Among different disturbance types, fire frequency and severity are important predictors of pine flatwoods diversity [12,13]. For example, two prescribed fires in a 50-year-old slash-longleaf mixed flatwood stand reduced overall woody plant biomass but increased herbaceous plant biomass, species richness, and diversity [13]. Similarly, herbaceous species richness and biomass were positively correlated with fire frequency, but midstory species richness and density were negatively correlated [12,14].
Forest inventories also often record past stand treatments and management practices (disking, bedding, fertilizing, thinning, and herbivore grazing), which represent potentially important determinants of richness, especially in pine plantations. Species richness increased or did not change for up to twenty years on clearcut sites after intense ground-disturbing treatments such as disking, bedding, and herbicide applications [15,16]. In some cases, treatments such as roller chopping, bedding, windthrowing, disking, burning, and grazing increased herbaceous species richness and diversity during the first few years after treatment, but forests subsequently returned to pre-disturbance richness and diversity levels [17,18].
Finally, herbicide application and time since herbicide treatment can also help predict herbaceous species richness. Herbicide applications for the control of undesirable woody and herbaceous plants during the first 3-5 years of stand establishment suppressed floristic diversity, even 15 years after treatment [19]. In Georgia and South Carolina, US, pineland herbaceous species richness five years after herbicide treatment was lower than pre-treatment levels [19,20]. On the other hand, a study on a Pinus taeda plantation in Georgia, US, found no differences in either herbaceous richness or diversity between six herbicide treatments and the control 11 years after initial treatment, suggesting a return to pre-treatment levels after a finite period of time [21].
Overall, the above literature indicates a relationship between herbaceous species richness and overstory forest structure, composition, and past and existing management practices. Additionally, these same overstory forest structure variables from forest inventory systems are frequently used to assess forest management scenarios, ecosystem service provisions, and biodiversity conservation objectives [10,22]. However, despite the importance of herbaceous plant species richness in determining overall biodiversity conservation and wildlife habitat quality metrics [23,24], we know of few inventory systems that provide information on understory plant species richness.
Given the lack of available geo-referenced plot-level data on herbaceous plant diversity in pine forests of the southeastern US coastal plain, we develop a quantitative approach to predict herbaceous plant richness using an analysis of the literature together with forest structure and management variables from the USDA Forest Service FIA program's database. Our specific objective is to analyze and model relationships among forest disturbance types and management history characteristics using available plot-level data from the relevant literature, and to use these results to predict understory herbaceous richness in pine forests of the southeastern US. To address this problem, we apply a regression tree approach because of its low computational cost and the relative ease of interpreting its results [25]. Predicting herbaceous plant diversity from plot- and stand-level data in available forest inventories such as FIA provides additional information and variables for analyzing the potential synergies and trade-offs among competing multiple-use forest management objectives, such as timber management, carbon storage, and biodiversity.

Experimental Section
We analyzed the literature on studies from pine forests in the outer, mid, and lower coastal plain ecoregions (outer coastal plain mixed forest province [26]) of the southeastern United States (Figure 1) to collect herbaceous richness information for developing our model. The geographic area covered in our analysis of the literature spanned the coastal plain across the states of North Carolina, South Carolina, Georgia, Mississippi, and Florida. The predominant vegetation consists of pine-dominated forests, which are interrupted by scattered areas of cold-deciduous and evergreen broad-leaved forests. Slash (Pinus elliottii) and longleaf (Pinus palustris) pines prevail in the region, especially in the southern areas, and loblolly pine (Pinus taeda) is common in northern areas. Sand pine (Pinus clausa) occurs in xeric, deep-sand locations of Florida, except for Choctawhatchee sand pine (Pinus clausa var. immuginata), which is found on wetter sites. The longleaf forest ecosystem is commonly cited as having one of the most diverse understory plant communities in the US [27,28]. Oak-gum-cypress forests also occur along flood plains in the region, and localized areas of mostly hardwoods are present, but our study was constrained to pine-dominated and mixed pine-hardwood forests [26].
The coastal plain ecoregion is predominantly flat (irregular or smooth plains), with relief of 3-9.1 m (10-30 ft) on smooth plains and elevation ranging from 25 to 200 m. Soils are mostly Udults and range from well drained to poorly drained, with fine to moderately fine texture. Mean annual rainfall is 102-152 cm (40-60 in), and temperature averages 16-20 degrees Celsius (60-68 degrees Fahrenheit). The growing season extends 200-280 days [26]. Historically, fires, both natural and man-made, have played a vital role in shaping the ecosystems of this region, and regionally common forest types, such as longleaf pine forest and areas dominated by sand pine, are widely recognized as fire-dependent ecosystems [29].
From our analysis of the literature, we created a database to model understory species richness using overstory, disturbance, and management information. First, we searched the published literature on herbaceous species richness in pine forests of the southeastern US coastal plain in Google, Google Scholar, Web of Science, Web of Knowledge, and Cambridge Scientific Abstracts to identify relevant variables, using key words such as: herbaceous richness, understory richness, pine flatwoods, southeastern coastal plain, stand treatments and pine flatwoods, fire and pine flatwoods, herbicide treatments and pine flatwoods, longleaf pine, slash pine, and loblolly pine. Overall, we analyzed more than 100 peer-reviewed publications, USDA Forest Service technical reports, theses, dissertations, and books related to species richness of pine forests in the southeastern coastal plain. Second, we selected 26 studies (Table 1) from North Carolina, South Carolina, Georgia, Mississippi, and Florida. These studies met two criteria: they reported herbaceous species richness per plot (plot size ranged from 0.25 to 2500 m²), and they provided ancillary information on plot condition, management, and disturbance similar to FIA plot-level characteristics (Table 2). From these studies, we compiled a database of 163 observations containing plot-level herbaceous richness and all other variables included in our herbaceous species richness model (Table 2). Since our objective was to predict species richness using forest inventory data, we only used information congruent with the FIA data (Table 2).

Table 1. Physiographic distribution of 26 selected studies that provided data for the herbaceous richness model of the pine forests of the southeastern coastal plain.
$$\log S = \log C + Z \log A \qquad (1)$$

where S = richness, C = intercept, Z = slope, and A = area. Using the mean Z value (0.372) from Fridley et al. [53], we calculated log C for each plot from its area (A) and richness (S). Once the values for all the variables in the equation were derived, we used the log-transformed Arrhenius equation (equation 1) to translate all richness values to the number of understory plant species per 4 m². Quadrat sizes reported in the literature we reviewed varied from 0.01 to 1000 m². We chose 4 m² because it fell in the middle range of the plot sizes used in our literature review, is large enough to incorporate many herbaceous species, and simplifies field measurements.
Once plot values and sizes were normalized, we developed a regression tree model for predicting herbaceous species richness in pine forests of the southeastern US coastal plain. The regression tree approach has frequently been used to model ecological data exhibiting strong non-linear relationships and higher-order interactions [54][55][56]. Breiman et al. [54] provide additional details on classification and regression trees. We used the rpart package for R (R version 2.13.1) to fit our regression tree model; the response variable was species richness per 4 m² (Table 2), and all other variables in Table 2 were used as predictors. The regression tree partitioned the response variable into homogeneous groups, using one independent variable at each split, until additional splits produced no further improvement in the within-node sum of squares. We fit the regression tree model with 20 cross-validations. First, we grew the tree until the complexity parameter (the cost of adding another variable to the model) was zero, and then we pruned the tree and selected the number of splits using the 1-SE criterion [54]. According to the 1-SE criterion, the "best tree" is the smallest one whose estimated cross-validation error is within one standard error of the minimum cross-validation error.
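The two numerical steps described above, rescaling richness to a common 4 m² plot and choosing a tree size by the 1-SE criterion, can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study. The function names, the layout of the cross-validation table (number of splits, cross-validated error, standard error), and all example numbers are hypothetical. Only the slope Z = 0.372 and the 4 m² target area come from the text.

```python
import math

Z_MEAN = 0.372      # mean slope Z reported in the text (Fridley et al. [53])
TARGET_AREA = 4.0   # m^2, the common plot size chosen in the text


def normalize_richness(richness, area, z=Z_MEAN, target_area=TARGET_AREA):
    """Rescale a species count from a plot of `area` m^2 to `target_area` m^2.

    Uses log S = log C + Z log A: solve for log C on the original plot,
    then evaluate the same line at the target area.
    """
    if richness <= 0 or area <= 0:
        raise ValueError("richness and area must be positive")
    log_c = math.log10(richness) - z * math.log10(area)
    return 10 ** (log_c + z * math.log10(target_area))


def one_se_choice(cp_table):
    """Return the number of splits of the smallest tree within 1 SE of the minimum.

    `cp_table` is a list of (n_splits, xerror, xstd) tuples, a hypothetical
    stand-in for the cross-validation table reported by a tree-fitting package.
    """
    best = min(cp_table, key=lambda row: row[1])   # row with minimum CV error
    threshold = best[1] + best[2]                  # minimum error + one SE
    eligible = [row for row in cp_table if row[1] <= threshold]
    return min(eligible, key=lambda row: row[0])[0]


# Illustrative values only (not data from the study).
richness_4m2 = normalize_richness(30, 100.0)   # 30 species on a 100 m^2 plot
table = [(0, 1.00, 0.08), (1, 0.70, 0.07), (3, 0.55, 0.06),
         (6, 0.52, 0.06), (9, 0.54, 0.06)]
chosen_splits = one_se_choice(table)
print(round(richness_4m2, 2), chosen_splits)
```

Because the intercept log C cancels algebraically, the normalization is equivalent to multiplying the observed richness by (4/A)^Z. The sketch keeps the two-step form so that it mirrors the procedure described in the text.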
We evaluated our model's predictions by visually assessing residual plots and by fitting a regression of observed versus predicted species richness. We also validated the model with an independent data set obtained from pine flatwood sites in Georgia and Florida (Figure 1). The validation data set was used to calculate validation statistics: mean prediction error (equation 2), percentage error (equation 3), mean absolute difference (equation 4), and mean square error of prediction (equation 5).
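As a rough illustration of the four validation statistics named above, the following Python sketch computes them for paired observed and predicted values. The definitions are assumptions, since equations 2-5 are not reproduced in this text. Error is taken as observed minus predicted, so a negative mean error indicates over-prediction, and percentage error is taken as the mean error relative to the mean observed value. The function name and the example values are hypothetical.

```python
def validation_stats(observed, predicted):
    """Return four validation statistics for paired observations.

    Assumed definitions (not taken from the paper's equations 2-5):
      error_i = observed_i - predicted_i
      percent error = 100 * mean error / mean observed
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mean_error = sum(errors) / n
    mean_observed = sum(observed) / n
    return {
        "mean_prediction_error": mean_error,
        "percent_error": 100.0 * mean_error / mean_observed,
        "mean_absolute_difference": sum(abs(e) for e in errors) / n,
        "mean_square_error_of_prediction": sum(e * e for e in errors) / n,
    }


# Hypothetical richness values per 4 m^2, for illustration only.
obs = [8, 12, 10, 14]
pred = [10, 11, 12, 13]
stats = validation_stats(obs, pred)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Reporting the signed mean error alongside the mean absolute difference is useful because the former reveals systematic bias while the latter shows the typical size of individual errors, which can be large even when the bias is small.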

Results and Discussion
Our model explained approximately 57% of the variation in herbaceous species richness. The R² value for the observed versus predicted regression was 0.55, and a plot of observed and predicted values (Figure 2), along with residual plots, revealed no systematic over- or under-prediction at different ranges of observed values (homogeneity of prediction). For the independent data, the model's mean prediction error was −1.01, which means that, on average, the model predicted approximately one species more than the number actually observed within a 4 m² plot. The percent error was −9.4%, indicating that prediction error was within 10% of the mean. The mean absolute difference between observed and predicted values was 5 species per 4 m². These statistics indicate that our model predicts richness reasonably well.
According to our regression tree results, the most important variables for predicting richness, among those tested (Table 2), were stand age, forest type, time since fire, and time since herbicide and fertilizer application (Figure 3). In our model, stand age (<45 years) formed the first split, which means that separate processes act to determine richness in younger stands (<45 years; left hand side of the tree) versus older stands (>45 years; right hand side of the tree); thus stand age is an important predictor. On the left hand side, the tree is further divided based on forest type (longleaf pine oak versus others) and time since herbicide and fertilizer application, which indicates the importance of these processes in determining richness in younger forest stands. Similarly, the right hand side of the regression tree (Figure 3) was split based on years since fire (≥2.25), highlighting the role of fire frequency in understory richness in older stands. Further splits illustrate the influence of stand age (≥60 years) and forest type (longleaf pine versus others) in stands with fire intervals longer than 2.25 years, whereas splits along the right hand side (Figure 3) confirm the importance of stand age (≥55 years) and forest type (slash and slash pine hardwood versus others), as well as time since fire (≥1.25 years), in stands that undergo shorter fire intervals.
Our quantitative approach, using relevant data from the literature to fit a regression tree model, also revealed interactions and non-linear relationships between model variables. As seen in Figure 3, stand age determines the first split. The second split in younger stands (<45 years) is based on forest type, but time since fire is important in older stands (>45 years). Similarly, herbicide and fertilizer application is important in younger stands only. This indicates that interactions among forest type, stand age, time since herbicide and fertilizer application, and time since fire determine understory species richness. The model also demonstrated that forest type and time since herbicide and fertilizer application interact with stand age to determine herbaceous richness in younger forests, whereas time since fire and forest type interact with stand age to determine richness in older forests (>45 years). In addition to the diminishing effects of herbicide applications over time, forest stands usually receive fewer fertilizer and/or herbicide applications as they age, and older stands are rarely sprayed. Fire therefore becomes a more important factor than chemical application as stands grow older, which our model also demonstrated.
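To illustrate how a regression tree arrives at a first split such as a stand-age threshold, the following minimal Python sketch searches a single numeric predictor for the threshold that most reduces the sum of squared errors. It is not the rpart algorithm used in the study, which also handles categorical predictors, cross-validation, and pruning. The function names are hypothetical, and the data are invented and deliberately constructed so that the best threshold falls at 45 years.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Find the threshold on one numeric predictor that minimizes child SSE.

    Candidate thresholds are midpoints between consecutive distinct x values.
    Returns (threshold, sse_reduction), or (None, 0.0) if no split helps.
    """
    pairs = sorted(zip(x, y))
    parent = sse([v for _, v in pairs])
    best_threshold, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical predictor values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [v for xv, v in pairs if xv < threshold]
        right = [v for xv, v in pairs if xv >= threshold]
        gain = parent - sse(left) - sse(right)
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain


# Invented stand ages (years) and richness per 4 m^2, for illustration only.
stand_age = [10, 20, 30, 40, 50, 60, 70]
richness = [6, 5, 7, 6, 12, 13, 11]
threshold, gain = best_split(stand_age, richness)
print(threshold, round(gain, 2))
```

A full tree repeats this search over every predictor at every node and applies it recursively to the resulting groups, which is how different predictors can come to matter on different branches.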
The relationships that explained species richness in our model are consistent with earlier findings in the literature. Stand age in particular is an important predictor of understory species richness, with older forests supporting more species (i.e., being more diverse); thus our results are consistent with the theory of stand development [57]. As a stand grows and advances towards canopy closure, herbaceous richness declines because of decreased sunlight and competition from other trees. During older stages, however, mortality caused by self-thinning opens the canopy and allows more light to reach the understory, thereby supporting greater herbaceous richness [7]. Nonetheless, in younger stands (Figure 3, left hand side of the first split), forest type was also important, as the longleaf pine oak forest type displayed higher herbaceous richness than any other forest type. While forest type can be a subjective measure, it is a broadly applied descriptive variable in forest research that consistently defines community type (with clearly defined parameters) based on dominant species and incorporates other associated variables, such as environmental, physiographic, structural, and compositional characteristics, all of which affect species richness. Hedman et al. [12] also found that mean herbaceous richness, adjusted for overstory density, was higher in naturally regenerated Pinus palustris stands than in either Pinus taeda or Pinus elliottii stands. Noss [27] suggested that higher tree planting density, greater site preparation disturbance, and modified fire regimes also help explain lower herbaceous diversity in P. elliottii and P. taeda plantations compared to natural P. palustris forests. Similarly, for forest types other than longleaf pine/oak, time since herbicide application was an important variable in our study: species richness increased as the time since herbicide and fertilizer application increased. This is corroborated by other studies that show an initial decline in herbaceous richness after herbicide application, with a delayed return to pre-disturbance richness levels after some time. In their study of a P. taeda plantation in Georgia, Miller et al. [21] also found no differences in either herbaceous richness or diversity between six herbicide treatments and the control 11 years after treatment. Likewise, Boyd et al. [39] encountered no difference in plant richness and diversity between control and herbicide treatments 7 years after application in P. taeda plantations in Georgia.
In older forests (Figure 3, right hand side of the first split), time since fire was important: the longer the time since fire, the lower the richness. Studies in slash/longleaf mixed, loblolly, and P. taeda flatwoods have shown that prescribed fire is an important tool to restore herbaceous species, and that herbaceous richness declines as time since fire increases [13,14,47]. Older stands were further divided on both right and left model splits (Figure 3) based on stand age, forest type, and time since fire. Our analysis therefore revealed that variables such as forest type, time since fire, stand age, and time since herbicide and fertilizer application could be important predictors of herbaceous richness.
Overall, our study indicates that in younger stands (<45 years), forest type and time since herbicide and fertilizer application are important predictors of understory species richness, whereas in older stands (>45 years), time since the last wild or prescribed fire is a useful predictor of species richness, as are stand age and forest type. In older stands, however, time since herbicide and fertilizer application is not an important predictor. As forest managers aim to maximize the provision of ecosystem services, and in particular biodiversity, in natural and planted forests, our research highlights the need for different management practices and approaches based on stand age and forest type. Our model was developed using plot-level data from the literature, and we believe that managers can apply insights from our results to their own local inventory data, or to data collected through FIA or other sources, as long as the data include the variables identified in this paper as key indicators of understory plant richness (i.e., stand age, forest type, fire intervals, and time since herbicide and/or fertilizer application).
Forests can provide different levels of ecosystem services at different stages of development, and different management and structural variables affect plant diversity depending on forest age and forest type. Our results corroborate the importance of prescribed fire in pine forests of the southeastern US coastal plain as a tool for maintaining and even increasing herbaceous species richness and overall forest biodiversity, especially in older stands. Because herbicide and fertilizer use increases stand productivity but reduces understory diversity in young flatwood stands, our model results make evident the trade-offs among timber, biodiversity, and carbon storage management objectives. However, specific long-term management objectives such as maximizing carbon stocks and timber volume might outweigh the short-term negative impacts on understory species richness. While our regression tree was developed using local forest data, we believe that regional estimates of understory richness are also possible through the use of FIA data, which are plot based (local) but distributed across the landscape (regional).

Conclusions
Ecosystem goods and services such as biodiversity, timber, and carbon storage are important forest management objectives that provide various economic, environmental, and social benefits to society. However, even though carbon storage and timber are often priority forest management objectives, they can conflict with overall biodiversity conservation objectives [58]. Furthermore, geospatial methods and models [2,3,10,22] are increasingly using available forest inventory data to estimate these ecosystem services, but these data are heavily weighted towards overstory tree and stand characteristics. Our approach and model therefore provide a repeatable and rapid means of estimating understory biodiversity, such as herbaceous richness, and of enhancing the methods that quantify the trade-offs among these different, and often conflicting, ecosystem services.
Although integrating detailed understory and herbaceous plant inventories into the FIA database would be ideal, collecting this detail at the regional level is costly and time-consuming. Therefore, this study uses relevant information from the literature to develop a model for estimating understory species richness from available forest inventory data. The model applies overstory, disturbance, and management information that is readily available in the inventory datasets, and validation with an independent data set showed that its estimates were reasonable predictors of species richness. We believe the quantitative approach presented here provides an improved method for accounting for the overall benefits and trade-offs associated with management and policy decisions regarding U.S. forests. As such, forest inventory data are an important source of information for developing specific forest management objectives and alternatives, and for estimating the interactions among ecosystem services and their trade-offs.

Figure 1. Map showing the coastal plain (shaded area) of the southeastern USA where studies that provided data to model herbaceous richness were conducted. Stars indicate two of the sites (Georgia and Florida) that provided data to validate the herbaceous richness model.

Figure 2. Plot of observed versus predicted values from a model that predicts herbaceous richness in pine forests of the southeastern coastal plain.

Figure 3. A regression tree for predicting understory herbaceous richness in pine forests of the southeastern US coastal plain. Sage = Stand age, Ftype = Forest type, Tfire = Time since last fire, Therbfer = Time since herbicide and fertilization. Values at the end of each terminal node are richness per 4 m².

Table 2. Information collected from the literature to develop an herbaceous species richness prediction model for pine forests of the southeastern U.S. coastal plain.