1. Introduction
The climate is changing, with the globe becoming warmer almost every year in recent decades. Risks associated with this warming are high, sometimes manifesting as multiple, broad threats to humanity [1] and the economy [2]. The recent Intergovernmental Panel on Climate Change (IPCC) report on the impacts of global warming of 1.5 °C above pre-industrial levels, and in comparison to impacts of 2.0 °C, describes many ‘Reasons for Concern’ related to efforts to strengthen the global response to the threat of climate change, sustainable development, and efforts to eradicate poverty [3]. Even so, with current pledges in the Paris Agreement on Climate Change, ~2.6–3.2 °C of warming is projected by 2100, though the Agreement aims to limit global warming to “well below 2 °C” and to “pursue efforts” to limit the temperature increase above pre-industrial levels to 1.5 °C [4]. The biodiversity implications of these various levels of warming are huge, as outlined in Warren et al. [5], where climatically determined geographic range losses exceeding 50% were projected for 44%, 16%, and 8% of plants by 2100, corresponding to warming of 3.2, 2.0, and 1.5 °C, respectively. Even though climatically determined range losses do not equate to changes in the actual distributions of plants, because trees live a long time while harboring great genetic diversity, the potential effects of climate change on the biota of the planet are staggering. Meanwhile, the co-benefits of limiting the amount of warming towards the 1.5 °C path are immense.
As a consequence of the range of these potential changes, models are needed to provide a suite of possible outcomes, by species, to help decision makers minimize biological impacts and adapt to the coming changes. Adaptation planning has been accelerating, whether by motivation or mandate. For example, the Northern Institute of Applied Climate Science (NIACS), associated with the USDA Forest Service, has facilitated nearly 300 adaptation demonstrations or projects on forest lands over the last 10 years in the north central and northeastern United States via its Adaptation Workbook [6] (www.forestadaptation.org). Model outputs are critical for understanding vulnerability and evaluating possible adaptation avenues, particularly when considering transitional or facilitated outcomes [7,8,9].
To arrive at reliable and informative models of how tree species may respond to a rapidly changing climate, a diverse and dynamic field has emerged, where continued refinement affords new insights. Statistical models and mechanistic models form a dichotomy in how one approaches predicting future change, and each has its strengths and weaknesses [10,11]. Demographic approaches add another useful dimension to modeling potential futures [12,13], as do paleoecological studies [14,15]. Hybrid approaches, which use a combination of modeling methods, may also provide key insights not otherwise uncovered [16,17,18,19]. Nonetheless, primary themes from all modeling studies indicate the value of forests in the overall climate equation and the high potential for eventual changes in forest composition and productivity [20,21].
With passing time, the evidence is mounting that changes are indeed occurring in forest composition and productivity. Evidence of migration of tree species along elevational gradients (up or down) has been accumulating for some time, along with the ecological explanations for such movements [22,23,24,25,26,27,28,29,30]. However, latitudinal or longitudinal changes in species range are more difficult to document because of wide distributions, limited sample size, and confounding disturbance factors, such as insect pests and succession following harvest, forest clearing, fire exclusion, human introductions, or other disturbance [21,31,32,33,34]. Nonetheless, recent studies conducted with repeated inventory and demography data do provide insights into changes (or not) in range limits. Boisvert-Marsh and others [35,36] found poleward shifts in Quebec, Canada for Acer saccharum Marshall, Acer rubrum L., Fagus grandifolia Ehrh., and Betula alleghaniensis Britt. between 1970–1977 and 2003–2014, mostly attributed to warming of early- or late-season climatic variables. However, they also detected southward shifts of Abies balsamea (L.) Mill., Picea glauca (Moench) Voss, and Picea mariana (Mill.) B.S.P., attributed to natural and human disturbances. Sittaro et al. [37], also working in Quebec, found that the spatial velocity of temperature at range limits exceeded the pace of tree species migration by a factor of two for 14 of 16 species. Woodall and D’Amato [38], in a decadal evaluation of 20 eastern US tree species not extending north of the Canada border, found stability for 85% of the species, regardless of the level of canopy disturbance.
Our modeling approach has been to statistically model potential changes in suitable habitat for a large number of species using Forest Inventory and Analysis (FIA) data and environmental covariates. This approach has evolved along with concomitant large advances in hardware, data, analytical software, and techniques. Our first effort, for 80 common trees, used county-level data and the statistical technique Regression Tree Analysis [39,40]. We then moved to a 20 × 20 km grid, 134 tree species, and the Random Forest technique [41,42,43]; this was our original DISTRIB model, which modeled suitable habitat for 134 tree species in the eastern United States. These models were the basis for several NIACS reports on the vulnerability of forests to climate change in the Mid-Atlantic region [44], the Central Appalachians [45], the Central Hardwoods [46], the Northwoods of Minnesota [47], Michigan [48], and Wisconsin [49], New York and New England [50], and the Chicago Wilderness region [51]. Most recently, we have developed a new set of models based on newer FIA data (www.fia.fs.fed.us), higher resolution soils data [52], and a hybrid lattice composed of 10 × 10 km and 20 × 20 km grids, derived from FIA plot density and described in a subsequent paper. The objective of this paper is to summarize the outputs from the DISTRIB-II model for 125 species of trees in the eastern United States.
2. Materials and Methods
In the effort reported here, we present summaries from our recent revision of the original DISTRIB model, now called DISTRIB-II. The extent of our analysis encompasses the United States east of the 100th meridian. In DISTRIB-II, we developed a hybrid lattice with a mix of 20 × 20 and 10 × 10 km cells. The mixture of cell sizes allowed us to optimize modeling by increasing resolution for cells that were supported by sufficient FIA plots. Some locations, such as large parts of the Corn Belt in the Midwest, had few FIA plots, so we retained the coarser 20-km structure there, while locations with higher densities of FIA plots were evaluated and modeled via a 10-km structure. DISTRIB-II also used completely updated data sets of 45 environmental variables and FIA plot data, along with newer techniques to assign model output values.
2.1. Data
Climate data. We used a range of models and scenarios to capture projections of future temperature and precipitation. Data included current (1981–2010) annual and seasonal mean temperature (°C) and annual and seasonal precipitation totals (mm) based on the Parameter-elevation Regressions on Independent Slopes Model (PRISM) [53], and end-of-century (2070–2099) projected mean values from three General Circulation Models (GCMs) under Representative Concentration Pathways (RCPs) 4.5 and 8.5. Downscaled future projections were obtained from the NASA Earth Exchange U.S. Downscaled Climate Projections (NEX-US-DCP30) project (https://cds.nccs.nasa.gov/nex/), with metadata found at https://cds.nccs.nasa.gov/wp-content/uploads/2014/04/NEX-DCP30_Tech_Note_v0.pdf [54]. These data are derived from GCM runs under the Coupled Model Intercomparison Project Phase 5 (CMIP5) in support of the IPCC Fifth Assessment Report (IPCC AR5). The NEX-US-DCP30 dataset was downscaled to 30 × 30 arcseconds via Bias-Correction Spatial Disaggregation (BCSD) [55]. Future values were derived by adjusting PRISM data with the change (the deltas) between GCM-simulated data for the periods 1981–2010 and 2070–2099, similar to methods described by Monahan et al. [56]. These delta adjustments provided closer alignment to current conditions and minimized exposure to pixel-level artifacts between training and projection climate data. For the climate summaries reported in Table 1, data were aggregated to 10 km across 41,681 cells in the eastern U.S. Three models were used, each with RCP 4.5 and 8.5 [57]: the Community Climate System Model, or CCSM4 (hereafter CCSM45 and CCSM85) [58]; the Geophysical Fluid Dynamics Laboratory (Donner) model, or GFDL-CM3 (GFDL45 and GFDL85) [59]; and the Hadley Global Environment Model-Earth System [60], or HadGEM2-ES (Had45 and Had85) [61]. These climate models and RCPs capture, for the entire eastern U.S. study area, a wide distribution space in projected change (Figure 1, detailed in Table 1). Further, the mean changes across these combinations (Figure 1) fall along a strong temperature gradient, from an estimated annual temperature increase of 2.5 °C with CCSM45 to 6.5 °C with Had85, with an overall mean increase of 4.5 °C. Mean annual precipitation (though precipitation changes have higher uncertainty than temperature changes) was projected to be higher under all scenarios by the end of the century, but for many locations a reduction in future precipitation is forecast (i.e., points below the horizontal zero-change line), especially for Had85 and GFDL85 (Figure 1). Coupled with higher temperatures, these scenarios in particular will likely inflict additional physiological stress on organisms for some future periods (see also [62]). This trend is especially apparent for growing season temperatures, which reach 28.4 °C, an increase of 6.8 °C, for both GFDL85 and Had85. To make matters worse for plant growth, the Hadley model (Had45 and Had85) showed decreases in growing season precipitation by the end of the century, even though annual precipitation was slightly higher (Table 1).
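The delta adjustment described above can be illustrated with a minimal sketch. It assumes a simple additive change applied cell by cell; the function name and the numeric values are hypothetical, and the text does not state whether precipitation deltas were additive or proportional.

```python
def delta_adjust(prism_current, gcm_baseline, gcm_future):
    """Observed baseline plus the GCM-simulated change (additive delta, an assumption)."""
    delta = gcm_future - gcm_baseline
    return prism_current + delta

# Hypothetical values for one cell (mean annual temperature, deg C).
prism_tmean = 11.2        # PRISM observation-based mean, 1981-2010
gcm_tmean_base = 10.0     # GCM-simulated mean, 1981-2010
gcm_tmean_future = 14.5   # GCM-simulated mean, 2070-2099

future_tmean = delta_adjust(prism_tmean, gcm_tmean_base, gcm_tmean_future)
print(round(future_tmean, 1))
```

Anchoring the future value to PRISM in this way keeps the training and projection data on the same observational baseline, so that only the simulated change comes from the GCM.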
Tree Data. As in the earlier effort [42], we used U.S. Forest Service Forest Inventory and Analysis (FIA, www.fia.fs.fed.us) data to derive individual tree species importance values (IV) for each of 84,204 FIA plots. All plots were included with no filtering. The assumption was that if a species already grows at a location, it can grow there. The relative number of stems and relative basal area for each species were weighted equally to calculate IV for each plot. Thus, some species with large numbers of smaller stems (e.g., Ulmus, Acer, Fraxinus spp.) may be calculated as more important than species with fewer, but larger stems (e.g., some Quercus). All 84,204 annualized FIA records sampled during the period 2000–2016 were processed and aggregated to cells with native resolutions of either 10 × 10 km or 20 × 20 km to represent the mean IV within the grid cell. We strove to increase spatial resolution over that of our previous effort wherever the FIA data would support it; to that end, a hybrid lattice was generated through an iterative algorithm to determine whether resolution could be increased to 10 × 10 km (four cells within each 20 × 20 km cell) or maintained at 20 × 20 km. The 10-km resolution was accepted if ≥50% of the four 10-km cells within a 20-km cell contained two or more FIA plots; otherwise the focal 20-km cell was retained. The resulting hybrid lattice for the eastern U.S. had 29,357 cells, 84.7% of which were 10 × 10 km cells, accounting for 2.49 million km², or 58% of the eastern U.S. (Figure 2). The 20 × 20 km cells occupied 1.79 million km², or 42% of the area, and were mostly confined to highly agricultural areas, predominantly in the western portion of the eastern U.S. (Figure 2). To exclude species with too few samples to build a reliable model, species were included only if they had at least 60 grid cells with at least two FIA plots per cell. This filter resulted in a total of 125 species in the analysis.
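As an illustration of the two rules in this paragraph, the following sketch computes a plot-level IV and applies the resolution test to a single 20-km cell. The 0–100 scaling of IV, the function names, and the sample numbers are assumptions made for the example; the iterative procedure used to build the full lattice is not reproduced here.

```python
def importance_values(stems, basal_area):
    """Plot-level IV per species: relative stem count and relative basal
    area weighted equally (scaled here to 0-100, an assumed convention)."""
    total_stems = sum(stems.values())
    total_ba = sum(basal_area.values())
    return {
        sp: 50.0 * stems[sp] / total_stems + 50.0 * basal_area[sp] / total_ba
        for sp in stems
    }

def use_fine_resolution(plots_per_10km_cell, min_plots=2, min_fraction=0.5):
    """True if at least half of the four 10-km cells inside a 20-km cell
    contain two or more FIA plots."""
    supported = sum(1 for n in plots_per_10km_cell if n >= min_plots)
    return supported / len(plots_per_10km_cell) >= min_fraction

# Hypothetical plot: many small maple stems, fewer but larger oaks.
stems = {"Acer rubrum": 30, "Quercus alba": 10}
basal_area = {"Acer rubrum": 3.0, "Quercus alba": 5.0}  # m2, illustrative
iv = importance_values(stems, basal_area)
print(iv)  # the maple scores higher despite its smaller basal area

print(use_fine_resolution([3, 2, 0, 1]))  # two of four cells qualify
print(use_fine_resolution([2, 0, 1, 0]))  # only one of four qualifies
```

The example reproduces the effect noted in the text: a species with many small stems can receive a higher IV than one with fewer, larger stems.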
Environmental Data. A suite of 45 environmental variables was used to predict IV for 125 species across the entire eastern U.S. We used seven climate-related variables, seven elevation-related variables, one solar-related variable (variation in day length), nine soil taxonomic orders, and 21 variables related to soil properties to derive the Random Forest models [41] predicting current species IV (Table 2). These data were acquired from various sources, with most soils information from gSSURGO [52], elevation data from the Shuttle Radar Topography Mission [63], a model of solar radiation via latitude [64], and a model of soil productivity based on soil taxonomy [65]. We then swapped the seven climate-related variables with future (2070–2099) projections of the same variables according to each of the six GCM/RCP combinations (see above), and used Random Forest to predict future IVs for each species. It is important to note that we are not using elevation variables as a proxy for climate; we use them to discriminate species that prefer lower elevation habitats (for example, along the coastal plains or in swamps) from those that prefer more elevated habitats with rugged terrain. Also, in addition to improving model fit, the numerous soil variables help restrain the models’ response under future climates and distinguish species that are mostly climate driven from those that are less so.
2.2. Modeling
Individual tree species IV were modeled using the randomForest library [67] in R version 3.1.1 [68] (hereafter RF), in which 1001 regression trees were trained with eight randomly selected environmental variables evaluated at each node, and grown to include a minimum of 10 observations. To train the models, we used only grid cells within the hybrid lattice (10 × 10 or 20 × 20 km) that had (1) two or more FIA plots (to ensure representation within each cell), (2) ≥5% forest cover as defined by the 2006 NLCD [69] (classes 41, 42, 43, and 90, to exclude very highly agricultural regions), and (3) a mean IV ≤ 1.5 times the inter-quartile range of IVs across all cells (to exclude outliers, because they were unlikely to represent the full 100 or 400 km²). Each of the 1001 regression trees built by RF provides information about the predicted IV, and the default is to report the mean prediction. However, the random sampling of only eight of 45 variables at each node can result in spurious outcomes due to, for example, omission of an entire class of variables (e.g., climate); while these spurious trees rarely influence overall prediction [70], outliers can influence prediction distributions at a given cell [71]. Therefore, we compared the mean predicted value to the median for each cell; if the median = 0 and the coefficient of variation among all 1001 predicted values was ≥2.75, then 0 was used as the predicted IV rather than the mean (which was 0 < IVmean < 8 among all species). This “mean-median” combination is a modification of the approach suggested by Roy and Larocque [71]; it limits the influence of outlier predictions, minimizing the area of modeled low suitability that is due to a few outliers within the 1001 regression trees for each species.
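The “mean-median” rule can be sketched for a single cell as follows. This is an illustration only: the function name is hypothetical, the per-tree predictions are invented, and the use of the population standard deviation in the coefficient of variation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, median, pstdev

def predicted_iv(tree_predictions, cv_threshold=2.75):
    """Return 0 when the median is 0 and the per-tree predictions are highly
    dispersed (CV >= threshold); otherwise return the mean prediction."""
    avg = mean(tree_predictions)
    if avg == 0:
        return 0.0
    cv = pstdev(tree_predictions) / avg  # population SD is an assumption
    if median(tree_predictions) == 0 and cv >= cv_threshold:
        return 0.0
    return avg

# Hypothetical cells, each with 1001 per-tree predictions.
mostly_zero = [0.0] * 950 + [5.0] * 51    # a few outlier trees predict IV = 5
mixed = [0.0] * 600 + [5.0] * 401         # median is 0, but dispersion is modest

print(predicted_iv(mostly_zero))  # set to zero by the rule
print(predicted_iv(mixed))        # mean is retained
```

In the first cell only about 5% of trees predict any habitat, so the small positive mean is treated as an artifact; in the second, enough trees agree that the mean is kept.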
Once the RF model was trained, predictions of IV were made to all 29,357 cells irrespective of cell size within the hybrid lattice, whether or not at least 2 FIA plots were present, or whether percent forest cover was less or more than five percent.
2.3. Model Reliability
We created a model reliability (ModRel) score from a series of five metrics obtained from the performance statistics of each of 125 species. These included (1) a pseudo R² obtained from the RF model (RF R²); (2) a Fuzzy Kappa (FK) metric, which compares outputs of the imputed RF-predicted map to the FIA-derived map [72]; (3) the deviance of the CV (CVdev) among 30 regression trees via bagging [41]; (4) the stability of the top five variables (Top5) from 30 regression trees; and (5) a true skill statistic (TSS) of the imputed RF. The first four were used previously and are described in Iverson et al. [42]. The five metrics were normalized to a 0–1 scale and weighted as follows to arrive at a final ModRel score: 0.33 × RF R² + 0.33 × FK + 0.11 × CVdev + 0.11 × Top5 + 0.11 × TSS, which gives more weight to RF R² and FK, a primary performance metric and a comparison of predicted to observed values, respectively. Then, ModRel scores were assigned to one of four classes: high (ModRel ≥ 0.7), medium (0.7 > ModRel ≥ 0.55), low (0.55 > ModRel ≥ 0.14), and unreliable and excluded from further modeling (ModRel < 0.14).
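The weighting and classification can be written out as a short sketch. The weights are those given in the text; the dictionary keys, function names, and example metric values are hypothetical, and treating 0.55 as the boundary between the medium and low classes is an assumption.

```python
# Weights as stated in the text (they sum to 0.99).
WEIGHTS = {"rf_r2": 0.33, "fk": 0.33, "cvdev": 0.11, "top5": 0.11, "tss": 0.11}

def modrel_score(metrics):
    """Weighted sum of the five metrics, each already normalized to 0-1."""
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)

def modrel_class(score):
    """Assign a reliability class; the 0.55 medium/low boundary is assumed."""
    if score >= 0.7:
        return "high"
    if score >= 0.55:
        return "medium"
    if score >= 0.14:
        return "low"
    return "unreliable"

# Hypothetical normalized metrics for one species.
metrics = {"rf_r2": 0.6, "fk": 0.7, "cvdev": 0.8, "top5": 0.5, "tss": 0.9}
score = modrel_score(metrics)
print(round(score, 3), modrel_class(score))
```

Because RF R² and FK together carry two thirds of the weight, a species with weak values on both cannot reach the high class regardless of the other three metrics.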
2.4. Variable Importance
Each of the 45 predictor variables was scored for all species cumulatively according to a variable importance index, which was the average of three normalized (0–100) scores. First, the variable importance, as calculated within the RF function (percent increase in MSE based on original and permuted predictors of the out-of-bag data; see the help for “importance” in the randomForest library in R), was summed across the 125 species. Second, the sum of the reciprocal of ranked predictor importance across all species was calculated; the reciprocal produced higher scores for top-ranked variables. Third, the frequency, or count, of the number of times a predictor ranked in the top 10 across all species was tabulated. These metrics allowed comparison among the 45 variables for their value in creating the tree species models. Importantly, these metrics are based on all species across the entire eastern U.S., so variables that matter only to species with specific requirements will not garner much support with these indicator metrics.
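A minimal sketch of the three-part index follows. It assumes each component is normalized by dividing by its maximum (the text says only that scores were normalized to 0–100), and the variable names and importance values are invented for illustration.

```python
def normalize_0_100(scores):
    """Scale so the largest value is 100 (normalization method is an assumption)."""
    top = max(scores.values())
    return {k: (100.0 * v / top if top > 0 else 0.0) for k, v in scores.items()}

def variable_importance_index(importance, top_n=10):
    """importance: {species: {variable: percent increase in MSE}}.
    Returns the mean of three normalized scores per variable."""
    variables = sorted(next(iter(importance.values())))
    summed = dict.fromkeys(variables, 0.0)     # 1) summed importance
    recip = dict.fromkeys(variables, 0.0)      # 2) summed reciprocal rank
    top_count = dict.fromkeys(variables, 0.0)  # 3) times ranked in the top N
    for scores in importance.values():
        ranked = sorted(variables, key=lambda v: scores[v], reverse=True)
        for rank, var in enumerate(ranked, start=1):
            summed[var] += scores[var]
            recip[var] += 1.0 / rank
            if rank <= top_n:
                top_count[var] += 1
    parts = [normalize_0_100(d) for d in (summed, recip, top_count)]
    return {v: sum(p[v] for p in parts) / 3.0 for v in variables}

# Hypothetical importance values for two species and three predictors.
importance = {
    "species_a": {"temp": 30.0, "precip": 20.0, "clay": 10.0},
    "species_b": {"temp": 10.0, "precip": 25.0, "clay": 5.0},
}
index = variable_importance_index(importance, top_n=2)
print(index)
```

With 45 predictors the default `top_n=10` matches the text; the example uses `top_n=2` only because it has three predictors.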
2.5. Area-Weighted Importance Values
To incorporate both the area and the relative abundance of each species, we calculated area-weighted importance values for each species. We use area-weighted importance values as a surrogate for the strength of suitable habitat across a species’ distribution. The higher the IV score, the higher the tendency for that species to occupy that cell, and the higher the possible basal area of that species within the cell. This measure of suitable habitat is not a probability of occurrence (though likely similar for many species) but rather an indication of the potential of the cell to host the species. Any value above 0 can be considered suitable habitat, though the strength of that habitat varies according to the area-weighted IV score. These values thus provide an estimate of each species’ importance based on the IV modeled for each cell (or partial cell), multiplied by the area the cell represents. Because of the variation in grid sizes (100 km² or 400 km²) due to the hybrid grid structure, and the partial cells, especially along coasts, the area-weighted values are truer to the actual and projected future suitable habitat. The ratio of future to present modeled condition represents the potential change of suitable habitat in the future, where values >1 indicate an increase in area-weighted importance and values <1 indicate a decrease.
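The calculation can be illustrated with a small sketch in which each cell is an (IV, area) pair. The function name and cell values are hypothetical; the third cell stands in for a partial coastal cell.

```python
def area_weighted_iv(cells):
    """Sum of modeled IV times the area (km2) each cell represents."""
    return sum(iv * area for iv, area in cells)

# Hypothetical species: (IV, area_km2) for a 10-km cell, a 20-km cell,
# and a partial coastal cell, under current and future conditions.
current = [(4.0, 100.0), (2.0, 400.0), (1.0, 37.5)]
future = [(3.0, 100.0), (3.5, 400.0), (0.0, 37.5)]

ratio = area_weighted_iv(future) / area_weighted_iv(current)
print(area_weighted_iv(current), area_weighted_iv(future), round(ratio, 2))
```

Here the ratio exceeds 1 even though two of the three cells lose suitability, because the gain occurs in the cell that represents the largest area.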
2.6. Changes in Mean Center of Spatial Data
Within ArcGIS 10.3 (ESRI, Redlands, CA, USA), the Mean Center and Directional Distribution functions were used to calculate the current and future ‘center of gravity’ and directional ellipse within 1 standard deviation, respectively, of species ranges generated by our models. No weighting was applied to the IV, but only cells modeled to have an IV > 0 were considered in the calculation of the mean centers and directional ellipses. The coordinates of the mean center were used to calculate distance and direction of potential movement of the suitable habitat for each species and were visualized using polar graphs to evaluate potential changes among all species for each scenario of climate change.
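For readers without access to ArcGIS, the mean-center comparison can be approximated with the sketch below. It assumes projected coordinates in kilometres and reports direction as a compass bearing; the function names and cell values are hypothetical, and the directional ellipse is not reproduced.

```python
import math

def mean_center(cells):
    """Unweighted mean (x, y) of cells with IV > 0; cells are (x, y, iv)."""
    pts = [(x, y) for x, y, iv in cells if iv > 0]
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

def shift(current_center, future_center):
    """Distance and compass bearing (0 = north, clockwise) between centers."""
    dx = future_center[0] - current_center[0]  # eastward component
    dy = future_center[1] - current_center[1]  # northward component
    distance = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    return distance, bearing

# Hypothetical cells in projected coordinates (km).
current_cells = [(0.0, 0.0, 2.0), (10.0, 0.0, 1.0), (0.0, 10.0, 0.0)]
future_cells = [(0.0, 0.0, 0.0), (30.0, 40.0, 1.0), (40.0, 40.0, 3.0)]

distance, bearing = shift(mean_center(current_cells), mean_center(future_cells))
print(round(distance, 1), round(bearing, 1))
```

Cells with IV of 0 are dropped before averaging and the remaining cells count equally, which mirrors the unweighted calculation described in the text.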
2.7. Analysis of Dominants, Gainers, and Losers
The area-weighted IV allowed comparison of species prominence and potential change according to the climate scenarios, by spatial domain. These comparisons provide valuable supplements to FIA data and state reports (www.fia.fs.fed.us) on the current situation for tree species, as the IVs are based on both density (number of stems) and dominance (basal area) simultaneously. We provide this information for each of 37 states and the District of Columbia, and for five regions within the eastern U.S. Notably, for the six states split by the 100th meridian (our boundary of the eastern U.S.), some forest patches will be missed, but the area in those states west of the 100th meridian is dominated by nonforest or western species (not modeled), with the exception of the Black Hills of South Dakota. We ranked each species according to the modeled current IV, selected the top three for each spatial unit, and then calculated the potential changes in area-weighted IV, as ratios of future to current IVs, among the various scenarios of climate change.
2.8. Species-Level Maps
Maps representing species (abundance and suitable habitat under various scenarios) were generated for each species. Specifically, the maps show outputs of the (1) FIA estimate of current abundance, (2) modeled current distribution, and the future distributions according to the (3) CCSM45, (4) CCSM85, (5) GFDL45, (6) GFDL85, (7) Had45, (8) Had85, (9) mean of all three RCP4.5, and (10) mean of all three RCP8.5 scenarios.
2.9. Comparison to Earlier DISTRIB Models
We have been modeling tree species suitable habitat within the eastern U.S. since 1998 [17,39,40,42,43], and there have been changes in many dimensions throughout this period. First, we modeled 80 species at the county level of resolution, then 134 species at 20 × 20 km resolution, and most recently 125 species at a hybrid of 10 × 10 and 20 × 20 km resolution, depending on the density of FIA plots (~forest cover). Throughout the period, there has also been a remarkable improvement in environmental data, especially climate and soils data. The shift from regression tree analysis to random forests [41] was particularly dramatic in enhancing model performance. As expected when using multiple models, updated data sets, or variations in modeling technique, model outcomes will differ between iterations; this is true in this case too.
2.10. Scope and Limitations
The models depicted here represent changes in potentially suitable habitat according to scenarios of climate change; they do not depict projections of actual future distributions by 2100. Earlier work has shown that natural migration proceeds at a much slower pace than change in habitat, especially for long-lived trees [17,73,74,75]. Therefore, our projections of an increase in the range are likely to overestimate the actual distributions by century’s end, unless humans get seriously involved in moving species.
Though Random Forest has been shown to be a robust modeling tool, highly resistant to overfitting, we are sometimes making predictions into novel parameter space through extrapolation; nonetheless, the resistance of Random Forest predictions to overfitting gives us confidence that the extrapolations are suitably constrained and are not exaggerated projections [41]. Obviously, not all 125 species models are created equal, and we calculate several metrics to assess model reliability for each species [42].
When we model potential changes in suitable habitat, one would normally expect the greatest impacts to be experienced by young plants at the point of regeneration, when seedlings or saplings are more susceptible to the increased extreme weather events and other ramifications of the changing climate. However, mature forests are certainly also susceptible, either directly via droughts, especially ‘hot’ droughts [76], or indirectly via pests and pathogens [77]. Because the models are based on FIA inventories of trees >2.5 cm dbh, the regeneration component is not well represented in the model formulations. We are modeling the potential niche space that may be available to species under future climates, which may not be the realized niche because disturbances and extreme events will be operating within the suitable habitats. Though the FIA data, in effect, integrate past disturbances by documenting those species that have survived past events, we cannot anticipate future disturbances (like an exotic pest invasion) that will influence actual future distribution and abundance. Further, we cannot assume that all species are in equilibrium with their current climate or other environmental variables.
4. Conclusions
The forests of the eastern United States are characterized by a diverse assemblage of tree species. Climate change has the potential to shift and influence species patterns, thereby creating novel communities, with the greatest disruption and change clearly linked to the emissions pathways that unfold over the course of this century. By quantifying potential habitat changes across a wide array of species over a broad geographic extent, we can consider several dimensions of potential habitat change that lend themselves to understanding individual species responses, as well as focused regional quantification that lends itself to informed on-the-ground adaptation planning. The trend of increasing general habitat conditions for a large portion of the species is a function of spreading at the range margins with, in many cases, a decline in core habitat suitability. Those species with projected losses, while fewer in number, generally show a contraction in the range of suitable habitat under either RCP, but especially so under RCP 8.5. In many cases, evaluation at the finer extent of regions or states reveals the potential impact of shifting habitats via ranked species importance and summaries of winners and losers by state or region. At these more discrete extents, the gradual fading in or out of species presents useful information for planners. Many of the current top species projected to decline in a region (even though their range-wide ratio may be increasing) epitomize the conditions where macro-level pressures of climate change can have local-level implications. In the end, these results show the high potential for a reshuffling of where suitable habitats for species will occur across the eastern United States, and it is clear that these changes will reshape competitive pressures and lead to final outcomes that are beyond any modeling approach.
Therefore, we intend these models to be one piece of a package of information that practitioners use for decisions related to adapting to the changing climate we now face. Working together, adaptations in silviculture and ecological management should improve the potential for eastern U.S. forests to continue to thrive in the coming decades.