Tropical Deforestation and Recolonization by Exotic and Native Trees: Spatial Patterns of Tropical Forest Biomass, Functional Groups, and Species Counts and Links to Stand Age, Geoclimate, and Sustainability Goals

We mapped native, endemic, and introduced (i.e., exotic) tree species counts, relative basal areas of functional groups, species basal areas, and forest biomass from forest inventory data, satellite imagery, and environmental data for Puerto Rico and the Virgin Islands. Imagery included time series of Landsat composites and Moderate Resolution Imaging Spectroradiometer (MODIS)-based phenology. Environmental data included climate, land-cover, geology, topography, and road distances. Large-scale deforestation and subsequent forest regrowth are clear in the resulting maps decades after large-scale transition back to forest. Stand age, climate, geology, topography, road/urban locations, and protection are clearly influential. Unprotected forests on more accessible or arable lands are younger and have more introduced species and deciduous and nitrogen-fixing basal areas, fewer endemic species, and less biomass. Exotic species are widespread—except in the oldest, most remote forests on the least arable lands, where shade-tolerant exotics may persist. Although the maps have large uncertainty, their patterns of biomass, tree species diversity, and functional traits suggest that for a given geoclimate, forest age is a core proxy for forest biomass, species counts, nitrogen-fixing status, and leaf longevity. Geoclimate indicates hard-leaved species commonness. Until global wall-to-wall remote sensing data from specialized sensors are available, maps from multispectral image time series and other predictor data should help with running ecosystem models and as sustainable development indicators. Forest attribute models trained with a tree species ordination and mapped with nearest neighbor substitution (Phenological Gradient Nearest Neighbor method, PGNN) yielded larger correlation coefficients for observed vs. mapped tree species basal areas than Cubist regression tree models trained separately on each species. In contrast, Cubist regression tree models of forest structural and functional attributes yielded larger such correlation coefficients than the ordination-trained PGNN models. Remote Sens. 2018, 10, 1724; doi:10.3390/rs10111724 www.mdpi.com/journal/remotesensing Remote Sens. 2018, 10, 1724 2 of 28


Introduction
Tropical forests are important for many reasons, but humanity is constantly changing them.They harbor a majority of Earth's terrestrial biodiversity, for example, but we have been clearing them at a rate of 0.5% per year since 1990, emitting carbon at a rate that amounts to 9% of the global human-caused carbon emissions (the rest is from fossil fuel combustion and cement manufacturing) [1][2][3][4].Studies suggest that old-growth tropical forests are irreplaceable for biodiversity conservation, but the degree to which secondary or plantation tropical forests conserve biodiversity or offset carbon emissions is still under debate [1,[5][6][7][8][9].Either way, managing tropical forest landscapes so that they sustain their biodiversity while providing ecosystem services, like regulating water supplies, stabilizing lands, storing carbon, and other contributions to people, is crucial.Doing so requires not only that we understand, quantify, and monitor how past and future changes affect their biodiversity and ecosystem services, but also that we illustrate these concepts to managers and the public.
Spatially explicit, i.e., mapped, characterizations of tropical forest attributes have distinct advantages for meeting these needs.For one thing, management actions are often applied to specific land areas, but forest inventory data from field plots, which identify species or estimate attributes like carbon stocks, are only points on the landscape.Predicting forest attributes measured in plot data with models that use a set of mapped variables, like climate and satellite imagery as explanatory variables, allows mapping of the likely characteristics of forest across a landscape.Such forest attribute mapping can secondly assist countries in reporting progress toward meeting conservation and forest-related targets in international agreements or conventions, like the Aichi biodiversity targets (which came from the Convention on Biological Diversity) or the United Nations (UN) Sustainable Development Goals (SDGs).Third, a picture is worth a thousand words.
Our purpose here is to paint a picture, for the Caribbean islands of Puerto Rico and the Virgin Islands (PRVI), of the outcome for tropical forests of large-scale clearing and regrowth given variable climate, topography, soil substrates, and disturbance history; of colonization by several exotic species; and of protected areas established at different times.Towards that goal, we first assemble a database of tree species functional traits and other characteristics.Secondly, we apply two approaches to modeling and mapping several attributes of forests, including: (a) counts and relative basal areas of introduced (also called exotic), native, and endemic tree species; (b) relative basal areas of selected functional groups; (c) forest structure and biomass; and (d) individual tree species distributions.We also briefly discuss the relative advantages of the two modeling techniques by comparing plot-level correlations between observed vs. predicted values from the mapping models.The two modeling methods that we compare are Cubist regression tree [10] and Phenological Gradient Nearest Neighbor (PGNN) [11][12][13].Finally, we evaluate spatial patterns in the resulting maps of forest attributes in light of inter-related patterns of disturbance history, protection, climate, geology, and topography, and we address the implications of these and related results for Earth system models (ESMs) and forest sustainability, specifically, forest-related aspects of UN SDG 15, particularly Target 15.2, Indicator 15.2.1:Progress towards sustainable forest management.We provide background on related prior forest attribute modeling in Section 2.3.3.In Section 4.2, we review results from related plot-level analyses of forest tree biomass, functional traits, species composition, and diversity and compare them with the mapped spatial patterns.
Our results suggest that that the machine learning regression method Cubist is best for mapping forest functional traits and characteristics like deciduousness, biomass, and introduced species counts compared with PGNN as applied here, but PGNN is better for mapping the basal areas of individual species.
Our results also suggest that time series of satellite image composites, image-derived age, and other disturbance history variables, together with geoclimate, should be useful as indicators of several aspects of progress toward sustainable development, of habitat types, or of functional traits when implementing dynamic ecosystem models.The maps from this study portray how humans interact with landscapes to affect tropical forests in many ways.

Study Area
Our study area is a collection of Caribbean islands encompassing Puerto Rico and the Virgin Islands.The climate is tropical, under the definition of tropical as having no frost at sea level.The tropical forest types range from young and old secondary forest to old forest, and from dry forests to humid forests.Forest leaf longevities range from deciduous and semideciduous forest to evergreen forests including cloud forests (with annual rainfall of about 800-1500, 1250-2000, 2000-4000, and 2000-4600 mm, respectively).Soils include alluvial soils, mostly in lowland areas, and soils formed over limestone, volcanic, and serpentine substrates.Most limestone areas have complex karst topography.For a given climate, forests formed on limestone and serpentine substrates have leaf types or growth forms leaning toward those of drier forests, given fast-draining soils.Serpentine substrates not only have fast-draining soils, but they are also nutrient poor and have toxic levels of some metals [14].

Field Data and Tree Species Characteristics Dataset
Response variables were calculated from the U.S. Department of Agriculture (USDA) Forest Service Forest Inventory and Analysis program (FIA), which is jointly implemented in Caribbean territories of the United States by the Southern Research Station and the International Institute of Tropical Forestry.On the main island of Puerto Rico, about one-third of the plots are surveyed each year during the first three years of a five-year cycle.On Puerto Rico's outlying islands of Culebra, Vieques, and Mona, as well as on the U.S. Virgin Islands, plots are surveyed during the fourth to fifth year of the cycle.We used data from 442 forested and partially forested plots from the years 2001 to 2004, plus data from the year 2008 for Mona Island, which was when it was first surveyed.Plots are spaced every 24 km 2 on mainland Puerto Rico; every 2 km 2 on Culebra, Mona Island, and St. John; every 2.7 km 2 on accessible portions of Vieques and on St. Thomas; and every 4 km 2 on St. Croix.Each plot has four 7.3-m-radius circular subplots where trees with a diameter at breast height (d.b.h.) of ≥12.7 cm are surveyed [15].Trees with d.b.h. of 2.5 through 12.6 cm (saplings) are surveyed on 2.07-m-radius circular microplots within each subplot.Among the data recorded for each plot are tree and sapling species, d.b.h., and height.The subplot layout includes a center subplot and three surrounding subplots with centers located at azimuths of 360, 120, and 240 degrees and 36.6 m from the plot center.Each plot fits within a 3 × 3 window of Landsat imagery with 30-m pixels.

Remotely Sensed Imagery
We used four types of predictor layers from satellite imagery: (1) previously published land-cover maps described elsewhere (Section 2.3.2);(2) a supervised maximum likelihood classification to land cover of a Landsat image from the year 2000 for the small island of Mona; (3) bands from a long time series of Landsat and Satellite Pour l'Observation de la Terre (SPOT) image composites; and (4) phenology metrics produced with image composites from the Moderate Resolution Imaging Spectroradiometer (MODIS) (Table 1).We briefly describe here the Landsat bands and MODIS phenology metrics and the rationales for including them.
We processed Enhanced Vegetation Index (EVI) data from all 16-day composites from the year 2001 of the MODIS MOD13Q1 Terra product to phenology metrics.The phenology metrics were the mean and coefficients of a Fourier transform of these EVI composites.Missing values in any composite were excluded from the harmonic regression.Bands and phenology metrics from multiseason imagery improve satellite-image-based maps of tropical forest types and structure [29][30][31][32].Here, eliminating missing-data pixels while estimating the phenology metrics with the Fourier transform simultaneously removes noise from the 16-day composites while characterizing vegetation phenology.We resampled the phenology metrics from their native 250-m pixel size to a 28.5-m spatial resolution with bilinear interpolation.
We also created a long time series of orthorectified Landsat and SPOT cloud-minimized image composites at five-year intervals spanning the years 1985-2000 (Table 1).The optical bands from these composites were included as predictor layers.We produced the Landsat composites with regression tree normalization and histogram matching methods described earlier [33,34].Brightness values from a mosaic of panchromatic SPOT imagery for the year 1995, which we produced by normalizing image sections with linear regression, were also added.We used the SPOT imagery because no Landsat imagery are available for the study area from the mid-1990s.
The rationale for including optical bands from quinquennial composites or multidecadal imagery is that they can be used directly in classification and other mapping models as the basis for, or to improve, maps of tropical forest structure, types, and successional stage [32,35,36].Current tropical forest height, foliage height profiles, and past disturbance types, for example, can be directly mapped with current plus past image bands in landscapes where forest regrowth rather than clearing dominates forest landscape dynamics [36].In such landscapes, which include the study area, stand age will explain less variability in forest canopy structure than the time series of spectral bands.Where cycles of tropical regrowth and reclearing are common over the timespan of the image series, a forest age layer is needed to map forest structure, but bands from a time series are still likely to improve such models.Unlike forest age alone, a long time series of spectral bands reflects past disturbance intensity, and its effects on forest recovery, both within a disturbed area and among areas with different disturbance types.For example, a single patch of forest that was partially disturbed by being burned from escaped fire will have different levels of burn severity, reflected as different pixel brightness values within that patch that have different forest structure as the stand regrows.Similarly, that partially burned forest will recover faster and have a different species composition than a nearby patch that was cleared for agriculture of all vegetation and upper organic soil materials and that has much brighter pixels after clearing than the pixels of partially burned forest [36].

Other Predictor Data
Besides the imagery, we assembled several other datasets for variables that are normally related to tropical forest vegetation types and structure to help predict forest characteristics from the FIA data.The variables included climate, geological soil substrate, soils, and topography, as well as indicators of human disturbance including distances to roads and the coast, maps of land cover in the years 2000 and 1978, and percent cover of tree canopies and impervious surfaces (Table 1).Except for Mona Island, the year-2000 land-cover maps [37,38] were from decision-tree classification of different Landsat image composites developed for the prior work.As previously described [37], the land-cover map from the year 1978 was based on air photo interpretation.
The climate layers included long-term (30-year) monthly and annual maps of temperature and precipitation [39] and the mean and coefficients of a Fourier transform of the monthly data [12] to characterize the seasonality of temperature and precipitation.A digital elevation model (DEM) with a 10-m spatial resolution was created from discrete lidar data.These data were processed according to methods described previously [38].All of these data types can improve discrimination of tropical forest biomass, structure, successional stage, physiognomic (i.e., formation) types, or tree species communities [30,31,36,[40][41][42][43].Predictor data zones-discrete variable identifying islands 28.5 TS 1 Topography-elevation, aspect, percent slope, moisture index, and topographic shadow for the base scene date of each Landsat composite ~90 [47] Lidar-derived topography-elevation and slope position (Puerto Rico, St. John, and St. Thomas; other areas filled in with above topographic data) 10 TS 1 Pixel-level percent tree canopy and impervious surface cover (PR only) 30 [48] Land cover in 1978 (mainland PR only) 28.5 [37] 1 TS = Developed for this study (e.g., satellite image composites, Fourier transforms of monthly MODIS and climate, Mona Island land cover, digitized VI geology). 2Original Landsat imagery from the U.S. Geological Survey. 3SPOT = Satellite Pour l'Observation de la Terre (for Puerto Rico (PR) and its outlying islands only). 4 We excluded stand age and protection from the predictor layers, assuming a priori that these variables would have strong enough predictive power that mapping models might overfit to them.Artifacts of such overfitting could include that their boundaries would define unrealistically abrupt forest attribute spatial patterns and, though in a minority of places, areas of large under-or overprediction.In addition, the forest age data we had [38,43] were limited to mainland Puerto Rico, and for the oldest age class, were spatially generalized compared with satellite imagery.Instead, and as discussed in Section 2.3.1, we relied mainly on the time series of Landsat and SPOT bands to help explain forest attribute variability related to stand age and disturbance history.

Background on Modeling Forest Attributes with Cubist and PGNN
The two modeling methods that we used were a machine learning regression approach and an approach that combines ordination with the machine learning method known as k-Nearest Neighbor.For the machine learning regression approach, we used the software Cubist.It is a data-mining tool that builds rule-based models with linear regression models at the terminal node of each ruleset.It is proprietary, but it enhances published work [49] that improves decision-tree classification models to handle continuous variables.It also builds "committee" models in which, like boosted regression trees, successive models are adjusted to increase accuracy.In Cubist, the final predicted value is the average of the successive models.Decision trees with regression, like Cubist, can predict continuous variables more accurately than multiple linear regression models [50].Many studies use Cubist to combine plot-based forest data with satellite imagery and other spatial data to map forest attributes.Example response variables include forest biomass [51], height [36,52,53], foliage height profiles [36], peat depth [54], percent canopy cover [48], and other forest variables.Another machine learning method, Random Forests [55], is also commonly used to map forest biomass from mapped predictors, with plot-level field or lidar data as the response variable [40].Advantages with Cubist include that the models are highly interpretable [52,56] and that when modeling percentage variables like those here, Cubist can be more accurate [57].Studies also use Cubist or other regression tree models to model forest attributes from lidar metrics [53,58].
For the approach combining machine learning with ordination, we used an offshoot of Gradient Nearest Neighbor (GNN) modeling [11], Phenological GNN (PGNN) [12,13].These methods first use canonical correspondence analysis (CCA), an ordination technique, to find linear combinations of predictor layers that relate to the relative basal areas of the tree species in plots.These linear combinations represent environmental gradients (or ordination axes) along which species are arranged, are uncorrelated with each other, and maximize the differentiation among species in the gradient space created by these axes [59].As many of these gradients are estimated as there are predictor layers, but we used the first eight axes [12].These eight gradients are mapped with the predictor variables and the CCA-estimated coefficients.Next, for each pixel, forest attribute values from one or more plots that were nearest to that pixel in the resulting eight-dimensional space were assigned to that pixel to map forest attributes.
The PGNN method differs from GNN in assigning to each pixel the average of a small number of plots (k) that are close neighbors in gradient space, instead of only the closest neighbor.It also adds as explanatory variables seasonality metrics estimated from a time series of a satellite-image-derived greenness index, the Enhanced Vegetation Index (EVI).GNN or PGNN have been applied for mapping forest structure and species basal areas [11,12], forest C stocks [13], indicators of old-growth forest structure [60], and variables used as input to a forest fuels model [61].They have two distinct advantages.First, many plot variables are mapped simultaneously with a single model.Each unknown pixel is scored only once according to its place in multivariate gradient space.Then, the forest attributes of the most similar observation in gradient space (GNN) or the average of the k similar observations (PGNN) are assigned to that pixel.Second, they maintain the mean, range, and covariance among response variables of the original plot data.As with Cubist, studies have also used GNN to model many forest attributes with lidar metrics plus environmental data [58].

Mapping with Forest Inventory Data, Cubist, and PGNN
As mentioned, the FIA plots extend over a 3 × 3 window of raster data with a cell size of ~30 m.Previous work [11] found that modeling plot-level estimates of forest attributes on the response side, and the average or majority of a 3 × 3 window of 30-m pixels on the explanatory side, rather than using subplot-level estimates, improved predictions from GNN modeling.Consequently, we applied a 3 × 3 window focal mean analysis to continuous predictors with ≤30 m cells and a focal majority analysis to discrete predictors.We divided the attribute values by the total area of the plot.We mapped all variables with both Cubist and PGNN, limiting species modeling to the 34 species occurring in at least 20 plots.
With the Cubist method, each variable is modeled separately.We used the default options in Cubist version 2.07 plus five-member committee models.Output from committee models is the average of the n-member model predictions and can improve the accuracy of outputs, similar to classification and regression tree boosting.Nonforest plots were excluded from training data because classification and regression trees can perform poorly with unbalanced reference data [62].To map each variable, we applied its corresponding Cubist model to the stack of spectral and other predictor data layers with software from Ruefenacht et al. [63].
With the PGNN method, a CCA was performed with the basal areas of each of the 34 species of trees with at least 20 observations in the inventory on the response side and the predictor layers on the explanatory side.Nonforest plots (no subplots forested) were included and assigned a value of zero basal area for each species.The CCA was estimated via eigenanalysis with the ade4 package for the R statistical language [64].Each forested pixel was labeled with the plot identification number of a seed plot, which was its nearest forested neighbor in the gradient space.Next, the eight plots that were the nearest neighbors in gradient space to that seed plot were found.Finally, the weighted average values of those eight plots were assigned to the pixel for each variable.The weights for each of the eight plots when averaging them were based on inverse Euclidean distance in gradient space from the seed plot [12].There were as many groups of eight neighbors as there were plots.We applied the same forest mask as above to the final maps.

Results
An important overall result of the mapping was that spatial patterns in the output maps of forest structure, functional groups, species counts, and of some individual species indicated a consistent network of drivers of these patterns.Briefly, this network (the drivers are not exclusive of one another) included the combination of geological substrate (Figure 1a) and climate (Figure 1b), which we refer to here as geoclimate (e.g., Figure 2); stand age (Figure 2), which is closely related to geoclimate, elevation (Figure 1c), and protection (Figure 3); and disturbance history other than stand age.Disturbance history variables other than stand age include those related to forest fragmentation (Figure 3), like past and present patch isolation and spatial contagion, and variables like past land-use or disturbance type or intensity.We discuss these variables further in Section 4.2.
Cubist model correlation coefficients between predicted and observed values of forest structure, species counts, and functional groups were mostly larger than those for PGNN, ranging from 0.24 to 0.65 (Table 2).For PGNN, these coefficients ranged from 0.06 to 0.37 with only forested test data and from 0.07 to 0.54 if nonforest plots were also included in the test data (Table 3).
The PGNN approach achieved correlation coefficients of observed vs. predicted values of 0.4 or greater for 9 of the 34 species mapped and were mostly larger than those from Cubist (Table 3, Appendix A).Including plots with no forested subplots in the test data did not generally change the PGNN correlation coefficients.For maps based on PGNN modeling, spatial patterns of individual species basal areas were as expected for species with observed vs. predicted correlation coefficients above about 0.40 based on field observations and general knowledge of various species distributions.
Correlation coefficients between observed and predicted values for both modeling approaches were in the range of prior work.A coefficient value of 0.52 for Cubist-modeled biomass was in the 0.31-0.92range for various U.S. regions [51].These coefficients for PGNN models of basal area and stand height, of 0.35 and 0.51, respectively, are near those of 0.4-0.77(basal area) and 0.26-0.83(height) from GNN modeling in other U.S. sites [11,61].For Cubist-modeled species counts, a GNN-modeled validation result of 0.53 for species richness is comparable to the same metric for Cubist models of 0.50-0.52 for endemic, introduced, and native species counts.forest cover in the year 2000, with both showing protected areas [70,71] and their date of establishment.Because the spatial pattern of stand age was apparent in the patterns of several of the mapped forest attributes, we briefly discuss here its drivers.In Puerto Rico and elsewhere, the biophysical, socioeconomic, and landscape variables that drive the spatial patterns of forest recovery are those that affect land accessibility, arability, market forces, seed dispersal, and microclimate, as well as spatial contagion related to these drivers [42,43].The oldest forests are usually more remote, which in the study area means further from roads, cities, and coastlines and at higher elevations where slopes are steeper and which in Puerto Rico may be cloud-shrouded and have water-logged soils.Older forests also occur over rugged geology with fast-draining, nutrient-poor, or toxic soils (karst, serpentine) or in places that are less accessible or arable in other ways, such as being protected or being wetlands.These are the places where agriculture was abandoned the soonest or that may never have been cleared.Being less in demand for human uses, except perhaps watershed stabilization, many of these areas were among the first to be protected (Figure 3).In Puerto Rico and elsewhere, protected areas may have been designated where protection is the least inconvenient for humans [72,73].Forests closer to cities and roads, on gentler slopes, and at lower elevations are increasingly younger.The earliest forests to be abandoned and recover are the last to have been cleared.Then, the next sites to be abandoned are closer to remaining or existing forest and receive more seed dispersal and have a cooler and more humid microclimate, enhancing forest recovery.This cycle of interacting socioeconomic and biophysical drivers, including landscape variables like spatial contagion, make distance to existing forest, or its usual complement, distance to pasture, the strongest predictors of the spatial patterns of forest regrowth and stand age [42,43,74].The nearby forest can be secondary or old-growth forest.
Studies have found exceptions to these trends in other tropical countries.Farmers may clear less accessible land for agriculture where they find new opportunities for profit from agriculture or perceive that their best option for income or subsistence is unsanctioned forest clearing for agriculture in more remote areas, including in protected areas or other state-owned land [75,76].As mentioned, protection also makes lands less accessible, and protected areas are not always in the least arable geoclimates.An example is the island of St. John, two-thirds of which was privately owned for some time and then became a National Park.Forest in the park is older compared with that on the other islands east of mainland Puerto Rico, even though it has geoclimates similar to other smaller islands in the study area.

Forest Structure and Biomass
The maps of forest biomass (Figure 4), height, and basal area reflect the strong relationships between forest structure and (1) stand age and (2) geoclimate.Plot-level biomass averages across Puerto Rico by age class, geology, and climate (as gauged by leaf physiognomy) support this observation [43].In plot-level studies, stand age has been the best predictor of young tropical forest biomass, height, or basal area [36,[77][78][79] for a given geoclimate.If other landscape characteristics are the same, older forests have more biomass.Also apparent are the abovementioned biophysical and socioeconomic controls on forest regrowth spatial patterns.Younger forests are those closest to coastlines or the largest cities or roads, and they have less biomass (light blue to white shades in the biomass map).Forests at higher elevations or further from roads have more biomass.
Stand age is a critical component of forest carbon cycling as well.Net ecosystem productivity, for example, strongly correlates with age.Intermediate-aged forests are most productive, but old-growth forests can be huge carbon sinks [80], accumulating carbon for centuries.The rate of C accumulation in large trees increases with age, and clearing old-growth forest releases more carbon than can be accumulated by young forests [81,82].
Geoclimate is also important.Forest biomass differs by geologic substrate when averaged across age classes for a given climate zone.The forests with the most biomass are on volcanic or alluvial geologic substrates, which are fertile.Forest in karst zones has less biomass, being edaphically dry.Ultramafic, serpentine soils support the least forest biomass for a given climate, being edaphically dry plus nutrient poor and toxic.Geology likely influences forest biomass across the tropics, including in the Amazon basin [41].As for climate, within the same geology and age class, humid forests (blue shades in the PET/P map in Figure 1) have more biomass than stands in drier places (white to brown shades in the PET/P map).Cloud forests on peaks, however, are stunted by water-logged soils, increased ultraviolet-B radiation, and high winds.
Disturbance history variables other than forest age affect tropical forest height and rates of biomass accumulation [36,79], but they are less obvious in these regional maps of forest structure.As well, the effects of finer-scaled topography on forest biomass were inconsistent in the maps.Topography likely influences forest biomass, however.In Panama moist forests, concave slopes have more biomass than convex ones [83].Trees are visibly larger in valleys and concave slopes in dry and edaphically dry parts of the study area as well.Stand age is a critical component of forest carbon cycling as well.Net ecosystem productivity, for example, strongly correlates with age.Intermediate-aged forests are most productive, but oldgrowth forests can be huge carbon sinks [80], accumulating carbon for centuries.The rate of C

Forest Functional Groups
The effects of stand age and climate are also evident in the maps of relative basal areas of N-fixing and deciduous N-fixing tree species (Figure 5).These attributes are low (yellow in color) in the oldest forests and high in young forests (deep blue color), particularly so for dry forests (Figure 5a-c).Deciduous N-fixing relative basal areas are largest in young dry forests, but they are also large in the youngest forests in humid areas.Pioneer tree species are often N-fixers [84], and N-fixing trees have long been planted as part of land reclamation plans, including in tropical regions [85,86].Even without planting, N-fixing tree species are common in tropical forest pioneer stands, and their prevalence declines with age [87][88][89].This tendency applies to Puerto Rico, where younger forests across climates have larger relative basal areas of N-fixing species, many of them introduced [87].Besides stand age, moisture availability affects the prevalence of N-fixing species.Tropical dry forests in general have more legumes compared with tropical humid forests [88,89].For many legumes, the capacity to fix N and small leaflet sizes probably help them survive seasonal drought [89].In addition, the larger N-fixing basal area appearing in Central Puerto Rico at higher elevations could also reflect the presence of N-fixing species in former shade coffee, including Inga laurina ((Sw.)Willd.) and Andira inermis ((W.Wright Kunth ex DC.), and N-fixing species planted for shade coffee, including Inga vera (Willd.)and Erythrina poeppigiana ((Walp.)O.F. Cook).In the Virgin Islands, the maps indicate that N-fixing species numbers and relative basal area are low in St. John, with its older forests.Other protected, older-forest areas in these islands show a similar pattern.
In the leaf phenology maps, relative basal areas of evergreen species are larger in more humid zones, while deciduous species have larger relative basal area in dry zones, as is typical of tropical forest landscapes (Figure 6a,b).Deciduous species, however, also have relatively more basal area in humid areas in younger forests, like those south and southeast of San Juan.The youngest tropical forests tend to be more deciduous in humid zones in Puerto Rico and elsewhere, probably because shorter leaf longevity protects against desiccation in the drier microclimate of young forest [70,90,91].
The maps of evergreen coriaceous species (Figure 6c) also show large relative basal areas on serpentine substrates (the westernmost dark green patches), in cloud forests (the remaining dark green patches at the highest elevations), and in mangrove forests (the small, dark green patches on coastlines).Hard leaves are typical in such places.Sclerophylly protects leaves in stressful environments [14,92,93].
Variables related to forest function, like nitrogen availability and leaf characteristics, are now critical to Earth system models.Until maps of such variables are available from more direct sources, maps like these can be used for running such models in a spatially explicit manner.In the leaf phenology maps, relative basal areas of evergreen species are larger in more humid zones, while deciduous species have larger relative basal area in dry zones, as is typical of tropical forest landscapes (Figure 6a,b).Deciduous species, however, also have relatively more basal area in humid areas in younger forests, like those south and southeast of San Juan.The youngest tropical forests tend to be more deciduous in humid zones in Puerto Rico and elsewhere, probably because shorter leaf longevity protects against desiccation in the drier microclimate of young forest [70,90,91].The maps of evergreen coriaceous species (Figure 6c) also show large relative basal areas on serpentine substrates (the westernmost dark green patches), in cloud forests (the remaining dark green patches at the highest elevations), and in mangrove forests (the small, dark green patches on coastlines).Hard leaves are typical in such places.Sclerophylly protects leaves in stressful environments [14,92,93].
Variables related to forest function, like nitrogen availability and leaf characteristics, are now critical to Earth system models.Until maps of such variables are available from more direct sources, maps like these can be used for running such models in a spatially explicit manner.Maps based on models that consider a majority of potentially important variables can help illustrate net impacts on forest species composition.Like forest structure and functional groups, spatial patterns of species counts, which are indicators of species richness and diversity, also follow the network of drivers that control forest age and include geoclimate and in some places land-use history.The maps of introduced species counts and relative basal areas (Figures 7 and 8) show that introduced species are widespread across the study area and can often dominate stands.At the same time, they show that introduced species are uncommon in the abovementioned stressful environments: cloud forests and forests on serpentine substrate.These latter areas are among the least arable or accessible lands and, probably not coincidentally, are among the oldest forests that were made less accessible through relatively early protection.Introduced species are also less common in other of the longest-protected or inaccessible areas.Introduced species relative basal areas are increasingly larger as variables change in directions that that are most associated with forest clearing for agriculture: with decreasing elevation or ruggedness and decreasing distances to cities, and on volcanic, alluvial, or flat limestone substrates.Plot-level studies encompassing the entire study region, all of Puerto Rico, in subregions of Puerto Rico, and elsewhere in the Caribbean [32,78,[94][95][96][97][98][99][100], confirm that stands dominated by introduced species are widespread in the Caribbean islands.Disturbance type, including past or present land use, can strongly affect the species composition and diversity in Caribbean forests, including whether introduced species dominate forest regrowth [36,43,97,99,101,102].Over time, however, a similar land use before abandonment [95] or seed sources in an area being limited to nearby secondary forest [103] can make climate [95] or time since abandonment [103] the drivers that most strongly impact species composition.In addition, native species seedlings can thrive in the understory of stands dominated by introduced species [104,105].Plot-level studies also show that shade-tolerant introduced species, like Sysygium jambos ((L.)Alston), are are widespread in Puerto Rico, can dominate stands there, and can occur in older forests.Although it can persist for decades [78,102,106], it now appears to be susceptible to another introduced species, guava rust (Puccinia psidii Winter), which is relatively new to Puerto Rico [107].Finally, introduced, fire-tolerant grasses in dry and dry-moist areas can dominate land for decades.The introduced Leucaena leucocephala ((Lam.)de Wit) can regenerate in these areas through its fast growth, partly suppressing these grasses while in some places providing protective shade for native seedlings [100, 108,109].
Native and endemic species counts are mapped as being largest where forests are older and there has been less or less recent disturbance from land use.Puerto-Rico-wide plot-level studies show that species richness increases with basal area, stand age, proximity to the largest forest patches (which often encompass the oldest forests), and being on limestone or serpentine substrate, and that abandoned pasture is less diverse than recent shade coffee [43,97].Smaller-scale studies in Puerto Rico and elsewhere in the tropics show that species richness increases with stand age or basal area [77,78,102] and proximity to old forest patches [110,111].
Endemic species counts and relative basal area are concentrated at the highest elevations, where cloud forests occur, and on serpentine and karst substrates.Larger endemic species counts and relative basal areas were mapped where the largest old forest patches occurred within these areas.Plots in Northeastern Puerto Rico also found more endemic species in the cloud forests there [112].Endemic species often evolve in these environments, including in Puerto Rico, as they can be stressful and isolated [14,[113][114][115][116].
For individual species with higher mapping model correlation coefficients between observed and predicted values, spatial patterns mapped with PGNN were as expected from field observation or published descriptions.Examples in Figure 9 include an introduced species found in young or frequently disturbed dry forests (Figure 9a); a common native species found in dry and edaphically dry places (Figure 9b); a native species typical of humid karst areas (Figure 9c); and the most common species in Puerto Rico, which is an introduced species (Figure 9d).

Implications for Sustainable Development Goal Indicators
The UN SDGs address terrestrial and freshwater ecosystems and their biodiversity and sustainable management in SDG 15 [117].Many of its quantitative indicators are measures of national

Implications for Sustainable Development Goal Indicators
The UN SDGs address terrestrial and freshwater ecosystems and their biodiversity and sustainable management in SDG 15 [117].Many of its quantitative indicators are measures of national legislation or regulation.Examples are indicators (I) quantifying protected areas by ecosystem type (I 15.1.2),areas under sustainable forest management (I 15.2.1), protected areas important for mountain biodiversity (I 15.4.1), and legislation on invasive alien species (I 15.8.1).The five subindicators of progress towards sustainable forest management (I 15.2.1) include: (1) net rate of change in forest area, (2) change in aboveground biomass stocks in forest and forest area or area proportion under (3) legal protection, (4) a long-term management plan, or (5) a certified management plan.
Several of the maps we present are relevant to these SDG indicators and could help communicate to decision makers and the public the concepts or effects of these SDGs.They indicate important sites for biodiversity conservation (SDG Indicator 15.1.2),showing forest with high concentrations of native and endemic tree species.These areas have unique climate (cloud forests, also addressing SDG Indicator 15.4.1, important sites for mountain biodiversity) or geological substrate (serpentine and karst).In the case of Puerto Rico, a large portion of these areas is protected.Despite this protection advantage, some ecological zones and the ranges of many species remain poorly represented by protected areas, implying that more protection is needed there [70,72].The maps also depict forest biomass (I 15.2.1) and introduced species spatial patterns (I 15.8.1).
We note that many of the introduced species are pioneer species that are shade-intolerant and consequently expected to decline with stand age as canopies close during forest succession.They are also mostly deciduous.A drawback of the maps is that the model of the common introduced tree species S. jambos was poor.It is an evergreen species, and because it is shade tolerant, it has more potential to be invasive than the other most common introduced species (though, as mentioned, it is being impacted by guava rust).It has one of the lowest correlation coefficients between observed and predicted basal areas.In other words, the predictor variables that we used may only be effective for mapping introduced species that are common in young or disturbed forests or that are deciduous.Other data are needed to map introduced species that are evergreen and can colonize older evergreen forest stands.
In addition, concern over the effects of overfertilization of terrestrial ecosystems with macronutrients like nitrogen [118] have led scientists to call for SDG indicators related to atmospheric nitrogen deposition [119].Although it is a macronutrient, adding nitrogen to secondary tropical forests is thought to enhance tropical forest growth, but it can also inhibit growth [120,121].In addition, increasing the basal area of N-fixing trees, including exotic species or species other than legumes, increases nitrogen levels in leaf litter, forest floor organic matter, or soils, and soils of secondary tropical forests release more NOx to the atmosphere [28,87,104,109,122], and NOx's are potent greenhouse gases.The maps here illustrate where N-fixing species are most abundant.

Modeling Forest Characteristics vs. Modeling Individual Tree Species
As mentioned, the Cubist models of forest structure, species counts, and relative basal areas of functional groups and species origin usually had larger correlations between observed and predicted values than did PGNN models, even when test data included nonforest.A previous comparison of plot-level biomass values [123] similarly found that Cubist model outputs more closely matched observed values than GNN.The likely source of this tendency is that, here, the PGNN model is based on relative basal areas of all tree species rather than the particular structural, functional, or species origin variable being modeled.Even though species composition is related to structural and functional forest attributes, our results suggest that training individual models directly with these latter variables could better optimizes models.This approach could be tested for PGNN and GNN.
Although basing models on the relationships between the basal areas of all species and environmental gradients may be a disadvantage for modeling structural, functional, and diversity summaries, it apparently gives GNN and PGNN an advantage when mapping individual species distributions, as might be expected.Most tree species are associated with one or more communities of other tree species.Mapping tree species distributions with ordination axes improves maps of species that are rarely sampled, as previously observed [12,69], compared with models based only on such species occurrences.The ordination approach probably has the effect of increasing the number of observations of a species by virtue of that species' association with other species in a dataset.A linear regression of the correlation coefficients of observed vs. predicted values for species-level models against the number of nonzero observations for each species explained no variability in the range of these coefficients for the PGNN approach but 20% of the range of correlation coefficients for the Cubist model results.This relationship suggests that Cubist results are more sensitive to the number of nonzero observations of species in the dataset than are PGNN results.
Species with highly predictable ranges, however, like the Sierra Palm, (Prestoea acuminata var.montana ((Willd.)H.E. Moore var.montana (Graham) A. Hend.& G. Galeano), were mapped well by both PGNN and Cubist (Figure 4), despite having few positive observations.It occurs only at higher elevations on mainland Puerto Rico in humid forest zones.Prior work in temperate forests also finds that species with "strong climate controls" or "limited range" were the most easily mapped ones [11].Bursera simaruba ((L.)Sarg.) occurs on all islands.It had the most observations and the largest correlation coefficients between observed and predicted values.It, too, has a highly predictable range of occurring mainly in drier forests or in humid forests on the tops and sides of limestone hills.These coefficients were much smaller for Spathodea campanulata (P.Beauv.), the second most commonly observed species in the dataset.The most likely reason is that it is a widespread pioneer species.Besides a humid climate, which covers most of mainland Puerto Rico, the only thing it seems to need is sunlight [94,96,97,103].Prior work also found widespread species more difficult to map [11].

Conclusions
From the mapping results, prior studies, and international sustainability goals, we conclude that: 1.
The legacy of past large-scale clearing and subsequent forest recovery are clear when mapping current forest attributes in this tropical region.Introduced species are widespread across most of Puerto Rico and the Virgin Islands, and they often dominate stands.Moreover, introduced species are most prominent in younger forests.They are less common or rare, however, in the least-arable geoclimatic zones or more remote places.

2.
Native species counts are usually larger than counts of introduced species, and their relative basal areas tend to increase with stand age.Endemic species are concentrated in the oldest forests, which are in the least arable and most remote locations, including isolated extreme environments where endemic species often evolve, that were also, probably not coincidentally, protected early and in some places never cleared.3.
Our mapping models explained a minority of the variance in forest attributes.Still, our results suggest that PGNN is better than regression tree models for mapping individual species and that, as in others' work, the most common species are generally not the most accurately mapped ones.Instead, species with ranges limited by climate or other environmental factors are most easily mapped.Further, ordination in particular likely helps map species by virtue of their association with other species, which in effect increases the sample size for model training.

4.
In contrast, regression tree models appear to be better than ordination-based models for mapping other forest attributes, like forest biomass, functional groups, and species origin, at least when species basal areas are used to fit the ordination-based models.
See Wilson et al. 2012. 5U.S. Department of Agriculture (USDA) Natural Resources and Conservation Service (NRCS).

28 Figure 1 .
Figure 1.Puerto Rico and the Virgin Islands: (a) geological substrate; (b) potential evapotranspiration to precipitation ratio, a climatic indicator of moisture; and (c) elevation.The highest elevation areas are cloud forest areas.

Figure 2 .
Figure 2. Puerto Rico dry vs. humid geoclimatic zones by forest age[43].The oldest humid forests on volcanic substrates are at high elevations.

Figure 1 . 28 Figure 1 .
Figure 1.Puerto Rico and the Virgin Islands: (a) geological substrate; (b) potential evapotranspiration to precipitation ratio, a climatic indicator of moisture; and (c) elevation.The highest elevation areas are cloud forest areas.

Figure 2 .
Figure 2. Puerto Rico dry vs. humid geoclimatic zones by forest age[43].The oldest humid forests on volcanic substrates are at high elevations.

Figure 2 .
Figure 2. Puerto Rico dry vs. humid geoclimatic zones by forest age[43].The oldest humid forests on volcanic substrates are at high elevations.

Figure 3 .
Figure 3. Puerto Rico and two of its outying islands, showing (a) forest cover in the year 1951 and (b)

Figure 3 .
Figure 3. Puerto Rico and two of its outying islands, showing (a) forest cover in the year 1951 and (b) forest cover in the year 2000, with both showing protected areas [70,71] and their date of establishment.

Figure 4 .
Figure 4. AboveGround Biomass per Hectare (AGBH) (live dry weight in Mg/ha) as mapped with (a) PGNN and (b) Cubist regression tree modeling, and basal area of Prestoea acuminata var.montana as mapped with (c) PGNN and (d) Cubist.

Figure 4 .
Figure 4. AboveGround Biomass per Hectare (AGBH) (live dry weight in Mg/ha) as mapped with (a) PGNN and (b) Cubist regression tree modeling, and basal area of Prestoea acuminata var.montana as mapped with (c) PGNN and (d) Cubist.

Figure 5 .
Figure 5. Stand age in Puerto Rico circa the year 2000 (a) and relative basal areas, as mapped with Cubist regression tree models, of (b) N-fixing species and (c) deciduous N-fixing species.

Figure 5 .
Figure 5. Stand age in Puerto Rico circa the year 2000 (a) and relative basal areas, as mapped with Cubist regression tree models, of (b) N-fixing species and (c) deciduous N-fixing species.

28 Figure 7 .
Figure 7. Plot-level counts of (a) native species, (b) introduced species, and (c) endemic species as mapped with Cubist regression tree models.

Figure 7 .
Figure 7. Plot-level counts of (a) native species, (b) introduced species, and (c) endemic species as mapped with Cubist regression tree models.

Figure 8 .
Figure 8. Relative basal areas of (a) native species, (b) introduced species, and (c) endemic species as mapped with Cubist regression tree models.

Figure 8 .
Figure 8. Relative basal areas of (a) native species, (b) introduced species, and (c) endemic species as mapped with Cubist regression tree models.

Figure 9 .
Figure 9. Basal areas in m 2 /ha of four species as mapped with PGNN modeling, including (a) an introduced species found in young or frequently disturbed dry forests; (b) a common native species found in dry and edaphically dry places; (c) a native species typical of humid karst areas; and (c) the most common species in Puerto Rico, which is an introduced species.

Figure 9 .
Figure 9. Basal areas in m 2 /ha of four species as mapped with PGNN modeling, including (a) an introduced species found in young or frequently disturbed dry forests; (b) a common native species found in dry and edaphically dry places; (c) a native species typical of humid karst areas; and (c) the most common species in Puerto Rico, which is an introduced species.

Table 1 .
Mapped predictor data.TS = Developed for this study.

Table 2 .
Correlation coefficients (Correl.Coef.)and mean absolute error (MAE), in the same units as the forest characteristics, for observed vs. predicted plot-level values for mapping models of forest structure, tree species characteristics, and functional traits, ordered by decreasing correlation coefficient for Cubist models.Forest Characteristic

Table 2 .
Correlation coefficients (Correl.Coef.) and mean absolute error (MAE), in the same units as the forest characteristics, for observed vs. predicted plot-level values for mapping models of forest structure, tree species characteristics, and functional traits, ordered by decreasing correlation coefficient for Cubist models.

Table 3 .
Count of nonzero observations, correlation coefficients (Correl.Coef.), and mean absolute error (MAE) in m 2 ha −1 for observed vs. predicted plot-level values for mapping models of species basal areas (BA), ordered by decreasing correlation coefficient for PGNN models.These results were nearly identical if nonforest test data were included for PGNN.
5. Several forest attributes addressed in UN SDGs have spatial patterns that reflect stand age, geoclimate, and aspects of disturbance history other than stand age since abandonment or last disturbance, like past disturbance type or intensity.Forest age, height, biomass, or disturbance-type history can now be mapped more accurately with long time series of globally