Mapping Species Composition of Forests and Tree Plantations in Northeastern Costa Rica with an Integration of Hyperspectral and Multitemporal Landsat Imagery

An efficient means to map tree plantations is needed to detect tropical land use change and evaluate reforestation projects. To analyze recent tree plantation expansion in northeastern Costa Rica, we examined the potential of combining moderate-resolution hyperspectral imagery (2005 HyMap mosaic) with multitemporal, multispectral data (Landsat) to accurately classify (1) general forest types and (2) tree plantations by species composition. Following a linear discriminant analysis to reduce data dimensionality, we compared four Random Forest classification models: hyperspectral data (HD) alone; HD plus interannual spectral metrics; HD plus a multitemporal forest regrowth classification; and all three models combined. The fourth, combined model achieved overall accuracy of 88.5%. Adding multitemporal data significantly improved classification accuracy (p < 0.0001) OPEN ACCESS https://ntrs.nasa.gov/search.jsp?R=20150019878 2020-02-17T15:36:55+00:00Z


Introduction
Tree plantations and tree crops are expanding rapidly across the globe in response to rising demand for wood pulp, timber, fruit, and vegetable oil [1,2]. However, the current distribution of tree plantations and tree crops is often estimated from government documents reporting planted areas that are of variable accuracy and lack consistent monitoring [3][4][5][6]. Global and regional maps of tree plantations and tree crops using readily available satellite imagery are greatly needed to improve estimates of carbon storage and monitor land-cover change, particularly in tropical regions where extensive field inventories are rare [3,5].
In this study, we assess the potential of an integration of hyperspectral and multitemporal multispectral imagery to aid regional reforestation monitoring by distinguishing tropical tree plantations from other forest types and identifying tree plantations at the species level. Most region-scale maps of forest cover based on multispectral or SAR satellite imagery do not distinguish tree plantations and tree crops from other forest types [7,8] and often combine them with secondary forests [6,[9][10][11][12][13]. This consolidation of forest types can oversimplify and mask key ecological differences that are important for monitoring forest change, biodiversity, and carbon stocks. Tree plantations and tree crops support much lower levels of biodiversity than mature and secondary forests, and potentially have higher rates of clearing for timber management and soil carbon emissions [14][15][16].
Different tree plantation species can also have distinct impacts on local ecosystem services and biodiversity. For example, Eucalyptus spp. and Pinus spp. are prone to burning in dry regions, which increases carbon emissions [17] and species within plantations differ in carbon sequestration rates [18]. Plantation species with open or structurally complex canopies can promote dense native understory regrowth and forest connectivity [19][20][21]. Conversely, fast-growing tree species with dense shade can function as "biological deserts" with very low biological diversity [22]. Similarly, crop species like oil palm, rubber, and fruit trees are often intensively managed in monoculture plantations [2,23].
In Costa Rica, despite an intense interest in REDD+ and substantial government investment [24], the degree of success of reforestation programs in increasing the cover of tree plantations is not well known. Government payments to establish tree plantations during the 1980s led to a large pulse of reforestation [9,[25][26][27][28][29][30], and the recent 1996 Forestry Law (no. 7575) further encouraged tree plantation establishment through payments for environmental services (PES) [31][32][33][34][35]. Northeastern Costa Rica in particular has received extensive reforestation payments, in part because of efforts to increase habitat connectivity in the San Juan-La Selva Biological Corridor (SJLSBC; Figure 1) [32]. Previous satellite image-based land-cover maps of the SJLSBC have not successfully distinguished tree plantations or individual tree plantation species from other forest types at regional spatial extents [9,[36][37][38]. However, high loss rates of reforestation (a land cover category including both secondary forests and tree plantations) have been observed in this region [9,25,29]. As a result, the long-term persistence and composition of tree plantations in this landscape remains unclear [9].
Hyperspectral imagery may improve our ability to accurately distinguish tropical tree plantations from other forest types and to identify species within plantations [39]. Hyperspectral imagery has been used to identify tree species occurring within temperate and tropical forests [40][41][42]. Most subtropical and tropical studies to date have relied on high-resolution hyperspectral imagery to identify rainforest and savanna tree species at the individual tree canopy scale [39,[43][44][45][46][47]. Moderate-resolution (10-30 m pixels) hyperspectral imagery has been used to successfully distinguish emergent forest canopy trees in Peru [48], invasive tree species in Hawaii [49], Indian tree species [50], and mangrove forest species [51]. A small number of studies have used high-resolution data to determine tree species within already-delineated timber plantations as part of timber inventories [52,53].
Despite the potential for identifying forest species with hyperspectral imagery, the degree to which tropical forest types and tree plantation species composition can be mapped remains unclear. The relatively few studies that have attempted to distinguish tropical secondary forests from mature tropical forests using hyperspectral data have had variable accuracy [54][55][56]. Distinguishing tree plantations from other forest types may present an additional challenge. Spectral reflectance of tree plantations can be quite variable even within the same tree species, changing with plantation nutrition and disease status [57], degree of canopy disturbance [58], underlying soil type [59], and plantation age [52]. Tree plantation classification accuracy across landscapes is highly variable in the multispectral sensor literature, with frequent confusion between plantations and both secondary and mature forests, respectively [9,[60][61][62][63][64][65][66][67].
Some non-native species, such as pine, acacia, and eucalyptus, have been classified with high overall accuracy (>80%) using multispectral data [60][61][62], implying that accurate plantation discrimination may be less challenging when tree plantations and tree crops are not comprised of or mixed with native species or when spectral contrast with local native vegetation or agriculture is high. In northeastern Costa Rica, Fagan et al. [9] were able to classify non-native tree plantations with high accuracy using multispectral data, but native tree plantations were typically confused with secondary forests (<65% accuracy). Sesnie et al. [37], working in the same region, was able to classify a general tree plantation class with ~90% accuracy using Landsat and ancillary data and a labor-intensive image-subsetting technique; however, class accuracy decreased to 55% when implemented at larger image extents. Spectral confusion with native vegetation is a well-known challenge in agroforestry and tree crop systems, particularly in mapping shade coffee [68,69], oil palm [70][71][72][73], and evergreen rubber tree plantations [74,75].
In this study we developed a cross-sensor approach to map forest types that combines infrequent airborne hyperspectral imagery with multitemporal satellite imagery. Hyperspectral data has frequently been fused with LiDAR, and to a lesser extent SAR, [41,47,76,77] to conduct single-date mapping of forest types. However, this approach is often constrained by data availability and cost to relatively small areas. Incorporating multitemporal, inter-annual data on forest disturbance and spectral change into classifications may help to distinguish tree plantations and secondary forests from less disturbed mature forests across large areas. Currently, the data source with the longest temporal record and global coverage is the freely-available imagery collected by the moderate-resolution multispectral Landsat sensors [77]. Landsat data has been effectively used to map and age tropical reforestation [12,78], and single-band spectral chronologies have been used to track disturbance to forests and forest recovery over time [79,80]. Although frequent cloud cover can lower the utility of multitemporal approaches by decreasing image frequency [81], less frequent image intervals may still contain valuable information for distinguishing undisturbed primary forest from reforestation in recently disturbed sites [78,81,82].
The principle objectives of this study were to: (1) test whether integrating hyperspectral imagery with multitemporal Landsat information increases discrimination of tree plantations, secondary, and mature forest types, compared to hyperspectral imagery alone; and (2) examine whether this integration also improved mapping of tree plantation species composition. We hypothesized that accurate plantation discrimination would be more difficult when tree plantations are comprised of native species. We further hypothesized that integrating Landsat multitemporal data with hyperspectral data would improve the overall single-date classification accuracy of tree plantations in northeastern Costa Rica.

Study Area
The study area encompasses principal reforestation areas in the lowlands of northeastern Costa Rica, covering a total of 4509 km 2 ( Figure 1). We selected this region for study because of the extensive availability of satellite and airborne remote sensing data, a history of large-scale reforestation efforts, and the presence of an established PES program (1996) that subsidizes tree planting and maintenance among landowners. Ongoing partnerships between American and Canadian space agencies and the Costa Rican government have led to numerous aerial missions in the last decade to collect hyperspectral, LiDAR, SAR, and high-resolution multispectral data across much of the country [83,84]. The long-term rainforest monitoring plots, experimental tree plantations, and secondary forests at La Selva Biological Station have been the particular target of remotely sensed data collection.
The northeastern lowlands are an agricultural frontier whose settlement expanded rapidly in the late 1960s leading to widespread deforestation [29,30,85,86] that slowed after 2001 [9,25,34]. The study area has moderate variability in annual rainfall (3.2 ± 0.8 m/year) and elevation (108 ± 98 m), with central ranges of low hills cut by broad river valleys giving way to coastal plains to the east. Despite loss of over half of the area's forest cover, the Atlantic lowlands retain large patches of old-growth forest outside protected areas [9,87]. The SJLSBC was established in 1997 to maintain and re-establish forest connectivity between Costa Rica's montane parks and Nicaragua's lowland Indio Maiz Biological Reserve [88].
To evaluate the extent of reforestation in this landscape, we mapped the narrow central waist of the San Juan-La Selva Biological Corridor and adjacent areas using aerial hyperspectral imagery ( Figure 1). Adjacent areas outside the SJLSBC were mapped to the northwest and southeast (along flight lines) within Costa Rica (hereafter, the "study region"; Figure 1).

Remote Sensing Data
HyMap II hyperspectral data were acquired by the NASA CARTA-II flight campaign conducted over northern Costa Rica from 01 March to 25 March, 2005 [83]. The HyMap II whiskbroom sensor (manufactured by HyVista Corporation, Castle Hill, Australia) included 125 spectral channels covering the 458 to 2491 nm range at a 15 nm sampling interval. Water absorption regions centered at 1350 and 1850 nm were excluded. The instantaneous field of view covered a 61° swath with a 14.2 to 16.7 m pixel size [83]. Seven HyMap images with low cloud cover (<20%) covering the central SJLSBC and adjacent areas were selected for analysis (Table A1).
Image pre-processing and processing steps to normalize reflectance values and improve image quality for enhancing discrimination of forest types and plantation species are summarized in Figure 2. We used the HyMap surface reflectance data products released by HyVista, which included georectification from onboard DGPS, ATREM model processing to correct for atmospheric effects, and EFFORT polishing of reflectance values, described in Arroyo-Mora et al. [97]. We resampled HyMap images to the mean pixel size (15 m) using the nearest neighbor method and co-registered image tiles to a 2005 Landsat 7 pan image (<7.5 m root mean squared error (RSME); Table A1) using an automated tie point program [98]. Clouds, cloud shadows and terrain shadows were masked from each image using manual decision trees based on NDVI and hyperspectral bands centered on 460, 500, 1170, and 1460 nm. We corrected the masked images for cross-track and along-track illumination differences using a bilinear cross-track correction algorithm (A. Singh, in prep). Because of differences in illumination among image dates and residual errors from georectification (<7.5 m RSME), we conducted cross-image illumination correction using the haze correction relative normalization method [99], selecting the reference normalization image that had the lowest blue-band reflectance in shadowed areas (Table A1).

Figure 2.
Flow diagram of pre-processing and processing steps. Blue diamonds denote images, green squares mark image processing, white boxes and circles denote data derived from images or field work, grey circles mark randomly selected, independent testing and training data, and red circles mark analysis products.
We mosaicked the resulting final images, replacing cloudy and shadowed pixels with clear pixels in overlap areas where possible ( Figure 1). Some minor radiometric differences between images persisted after correction; we used spectrally stable old-growth forest targets [81] to assess among-image differences and compare spectral variation between the HyMap mosaic and Landsat (see below) imagery. We estimated that among-image radiometric differences led to an additional 9% variance in spectral reflectance, on average, at stable forest targets. We further addressed radiometric variability across image tiles by collecting training and validation data across the entire study area (see Section 2.3).
Multispectral data were acquired by the Landsat 5 and 7 sensors (scene path 15, row 53) in four years (1986, 1996, 2001, and 2005) with low annual cloud cover, between the months of October and March (Table A1). Each year selected was further composited to remove existing clouds using the next closest image dates to the degree possible (Table A1) Landsat 7 images to less than 0.5 pixel RMSE (<15 m) using an automated tie-point program [98]. We corrected images for atmospheric effects using LEDAPS [81] and radiometrically normalized all images to the most haze-free image in the time series (Table A1) using the MAD algorithm [100]. To remove clouds and Landsat 7 missing scan line errors, all images except 2001 were composites of two or three image dates separated by less than thirteen months. Prior to image mosaicking, clouds and line errors were masked using custom decision trees employing Band 1, 3, 4 and 6 thresholds. Further details on image processing can be found in Fagan et al. [9] supplemental. To highlight recently disturbed forest and tree plantations in a single measure for multitemporal analyses, we used a single Landsat band (band 5; 1550-1750 nm) for each pixel across the image time series (Figure 3). Landsat band 5 by itself has been used to detect forest disturbance in previous  [101,102], and in our preliminary analysis (M.E. Fagan, unpublished data), it performed as well or better than other single-band metrics like Normalized Burn Ratio and Tasseled Cap bands [79,80]. For each Landsat image pair we calculated the Euclidean spectral distance in band 5, across all possible image pairs; masked cloud and scan-line error pixels were omitted from the calculation. We used the mean and variance of spectral distance to summarize the magnitude and variability of spectral change for each pixel. For example, pixels with a larger mean distance and high variability represent forest areas that likely experienced disturbance, such as tree plantations with rapid harvest rotations.
To further characterize forest disturbance and reforestation, we classified the Landsat image time series. Prior to classification, we selected six spectral bands (Landsat bands 1-5 and 7) and calculated four vegetation indices from the Landsat data. The vegetation indices were NDVI [103], the Brightness and Green bands from the Tasseled Cap transformation [104], and a normalized difference of Band 2 and Band 5 (ND25, calculated like NDVI). These ten bands, along with elevation derived from a hole-filled SRTM DEM (v. 2.1) at 90-meter resolution [105,106], were the inputs into the classification. We used a Random Forest supervised classifier (described below) and field-collected training and validation data (for full details, see Fagan et al. [9]; Figure A1). Landsat images were first classified to forest and nonforest; these forest masks were integrated over time to distinguish stable, mature forest (forest persisting from 1986 to 2005) from reforestation and open land cover. Independent testing data indicated high within-year classification accuracy of the forest masks (e.g., 96% overall accuracy for 2005) and amongyear forest-change accuracy for the three land cover types (88% overall; see Fagan et al. [9] for details). Because distinguishing reforestation from other forest types was our primary goal, we converted the Landsatderived 2005 land cover map into a mask that indicated the presence or absence of reforestation.
To permit the inclusion of the 30 m Landsat data in the hyperspectral classification, final summary metrics and land cover variables were resampled to a 15 m pixel size and then matched to the geo-registered HyMap image using a nearest neighbor resampling method.

Field Data Collection
Georeferenced land use data came from three sources: field observations, surveyed boundaries of managed forests (polygons), and visual interpretation of georeferenced high-resolution aerial photos. We collected 1859 land use data points in all for classification model training and field validation. Field observations consisted of a total of 1575 land use points collected using a handheld global positioning system (GPS) device during separate field seasons in 2004 and 2005 (Trimble GeoXT; [25]) and 2009 to 2012 (Garmin 60CSx; [9]). The 2009 to 2012 field data was backdated to 2005 by comparing it with high-resolution aerial photos collected during the CARTA-II mission (0.5 m pixels). An effort was made to locate field points at intervals greater than 250 m along paved and unpaved roads across the region. Each land use point was located a minimum of 60 m from a road. Information recorded at each point included the dominant land use within a 60-m radius area, tree species, and additional information on land use history such as forest age. Where the exact age of secondary forests was unknown, visual assessment of forest cover in two types of imagery was used to determine the approximate age of secondary forest points: the 1986 and 1996 Landsat imagery, and black and white aerial photos taken in 1986 and 1992 (1:60,000 scale; [30]).
Tree plantation boundaries were georeferenced by a local forest extension organization, FUNDECOR. Boundary polygons were collected using Garmin 60CSx GPS units and in many cases were refined through repeated visits. Field data from these polygons were generated by randomly locating target points at a distance from the polygon edge (>30 m inside the polygon) and then checking the resulting points (n = 64 points) in the high-resolution CARTA-II imagery. CARTA-II aerial photos were also used to directly create 220 land use points. Image interpreted points were selected a minimum distance of 500 m from existing land use points to increase the number of training and validation samples in under-surveyed image tiles within the hyperspectral image mosaic. Each point selected from aerial photos was surrounded by at least a 60 m radius homogenous area.
To develop training data for image classification, we randomly selected 70% of the land use data points in each class. Each training point was buffered by a 60 m radius polygon known to be a single land use type from field and aerial photo inspection, and all pixels that fell within the 60 m buffer were assigned to that land use class as training pixels. Overlapping land use polygons were winnowed by random selection; all resulting polygons were separated by >150 m. This resulted in a total of 1287 points and 33,182 pixels for training, representing 0.2% of the hyperspectral image. No single land use class was trained with less than 250 pixels and the median was 1028 pixels ( Table 2).

Land Use Classes
We classified three main forest types, tree plantations, mature forests, and secondary forests, to meet our first objective of testing the ability of hyperspectral and multitemporal data to distinguish tree plantations from other forest types. Mature forests were defined as lowland rainforests or swamp forests older than available Landsat time series data (i.e., >24 years in age; (Table 2)). This category potentially includes a wide range of forest compositions, as the majority of mature forest constituted selectively-logged old-growth lowland rainforest (S.E. Sesnie, pers. comm.). Secondary forests were defined as completely-cleared area of forest regrowth <24 years in age (the limit of our historical records). Tree plantations consisted of closed canopy monocultures made up of one of six tree plantation species (Table 2). To improve discrimination of these key forest types and improve monitoring of land use change, we also distinguished 11 common non-forest land uses, for a total of 20 land use classes initially (Table 2). However, for the first set of analyses distinguishing forest types, land use classes were grouped post-classification into the three forest categories defined above and a non-forest class, categorized as "other" ( Table 2).
To meet our second objective and examine how well hyperspectral and Landsat data distinguish tree plantation species composition, subsequent analyses focused on classification accuracy for individual tree species in areas classified as tree plantations. Thus, we reported the class accuracy of tree plantations both as a single land cover class and by species composition. Table 2. Class information and descriptions for the land-use classes in this study. The classification class was used for the initial model classification and map prediction, and the summary class was used for comparisons of the Random Forest models. Tree plantations were assessed both as a grouped class and as separated class (i.e., each species) in the final summary accuracy assessment, and so are marked in italics. Classes referred to as reforestation in the text are in bold, and include secondary forests and tree plantations.

Classification Model Comparison
We compared four separate models integrating multitemporal Landsat data into a single-date hyperspectral classification (Table 3). The first classification model ("Hyper") consisted of hyperspectral data only. A second model ("HyperLs") used both the hyperspectral data and two metrics derived from multitemporal Landsat spectral data (i.e., the mean and variance of band 5 over the image composite time series) as predictor variables in a supervised classification. The third model ("HyperLC") modified the Hyper results post-classification using the time series of Landsat land-cover data to reclassify mature forests as secondary forest where they intersected Landsat time series-derived secondary forest. The fourth model ("HyperLsLC") was exactly the same as the third model, except that it modified the results of the HyperLs classification model. Prior to implementing classification models, we reduced data dimensionality for the training data using linear discriminant analysis (LDA). LDA has been extensively used to discriminate tree species with remotely sensed data and performs well with hyperspectral data despite violations of normality assumptions [39,107]. In this analysis, the 105 hyperspectral bands ( Figure 4) were reduced to 19 LDA bands with low correlation. The LDA helped to reduce radiometric differences across image tiles, improving the consistency of training and testing samples across the study area. The LDA model was developed using all 20 possible land cover classes because initial analyses revealed that combining land use classes prior to classification markedly decreased forest type discrimination.
The Random Forest (RF) machine learning algorithm was used to develop a classifier for the LDA-processed hyperspectral and Landsat image data [108]. Random Forest is an ensemble decision tree classifier that is non-parametric and flexible with regards to data distributions and assumptions (see [108] for a full description). RF has been used for hyperspectral classification [45,109,110] and the analysis of datasets with hundreds to thousands of potentially correlated predictors (e.g., [111]). We implemented RF using the RandomForest package in the R statistics package v. 2.14.1 [112], with default settings and n = 1000 decision trees per classification.  Table 2 for class name acronyms. The gray polygon represents one standard deviation above and below the yellow secondary forest class (abbreviated here as "Secfor"). Water-absorbing spectral regions (centered on 1.3 and 1.9 µm) were excluded from the analysis and are not shown.

Post-Classification Processing
Following hyperspectral image classification, post-processing was conducted to remove some misclassification errors and improve discrimination of individual land cover classes. The Random Forest classifier outputs from the Hyper and HyperLs models were used to predict land use maps, and each map was filtered to remove classification speckle anomalies in three steps. First, a duplicate image was processed with a 3 × 3 moving window majority filter. Second, adjacent pixel clusters less than 4 pixels in size were sieved in the original classified map. Finally, sieved pixels were replaced with pixels from the filtered duplicate image. This process removed speckle, preserved narrow linear features from replacement by majority classes, set the minimum mapping unit at 0.09 hectares (four pixels), and increased map accuracy by 1.7%-2% across classification models. Post-processed map accuracy was assessed using independent validation data, as described below.
The post-processed classification results from Hyper and HyperLs were the starting points for the HyperLC and HyperLsLC models, respectively ( Figure 2, Table 3). We used the Landsat-derived 2005 reforestation mask to reclassify mature forest in the Hyper and HyperLs land use maps to secondary forest. We then assessed the accuracy of the reclassified land use maps for the HyperLC and HyperLsLC models.

Independent Validation Data
To assess classification accuracy with independent validation data, we evaluated 1086 testing points using a stratified random approach. For each land use class, 64 randomly-located data points were generated using the Raster package in R statistics software (v. 2.14). The land use class of each validation point was identified using aerial and Landsat imagery. Points were discarded when they intersected a cloud, cloud shadow, or were located within a land use type that could not be identified.
Validation samples from aerial and satellite imagery for tree plantation species and secondary forests could only be confirmed for a limited number of sites, contributing to a low number of initial samples for these forest types. In order to obtain a sufficient number of independent samples to evaluate each classification model, we combined the stratified random sample with field validation data (Section 2.3) randomly withheld from the classification model. We randomly withheld 30% of the field data for testing purposes, or 558 land use points. All retained testing points were separated by >100 m from each other and training data polygons to reduce spatial autocorrelation between samples. The median number of testing samples across all classes was 74 ( Table 2).

Subsampling the Validation Data
To correct for an uneven number of testing samples [113], we equalized sampling effort across classes by creating multiple balanced sets of test data using a subsampling bootstrap of the original test data [114]. "Subsampling" bootstrap samples without replacement were developed relatively recently [114] and are mathematically straightforward and considered more robust to lack of independence than traditional bootstraps [115]. Subsampled bootstraps allowed the estimation of confidence intervals through an asymptotic sampling function based on the Central Limit Theorem (by default and in our study, a square root function), a selected subsample size (n = 20, sampled without replacement) from each land use class, and the original population size of validation points (Table 2) [114,115]. Subsample bootstraps were run 1000 times to develop confidence intervals around accuracy estimates and approximate a stratified random sampling design [116][117][118].
We compared RF model performance using four summary land use categories: tree plantations, secondary forests, mature forests, and all other classes ( Table 2). Each of the four summary land use categories (e.g., mature forest) was subsampled as one class in the accuracy assessment, with proportional sampling from each sub-class (e.g., lowland mature forest and swamp forest). For each model, we evaluated RF model accuracy using subsampled test data, calculating the mean and 95% confidence interval of several statistics from error matrices: overall percent accuracy, Kappa statistic, and omission and commission accuracies for each class. For the class accuracy statistics, we used the size of the original, unbootstrapped sample ( Table 2, "Testing points") to calculate standard error in confidence intervals. For the overall accuracy and Kappa statistics, we used the mean of the test sample size across land use classes (82 pixels) to calculate standard error in confidence interval estimates. To statistically compare overall and class accuracies among classification models, we used generalized linear models (GLM).

Forest Type Classification
The four RF models showed increased overall bootstrapped accuracy as additional multitemporal data were added to each classification approach (Hyper model, 83.5%; HyperLsLC: 88.5%). The addition of land cover information on secondary forests (LC models) significantly improved (p < 0.0001) accuracy by +1.7 to +2.7 ( Figure 5); Kappa values also significantly increased (p < 0.0001; Figure A2). The addition of multitemporal spectral data led to a slightly smaller increase in accuracy than the Landsat-derived reforestation data (model HyperLC vs. HyperLs, p < 0.0001), but combining both land cover and spectral data led to the highest overall model accuracy (HyperLsLC, p < 0.0001). For the Hyper and HyperLs models, the Random Forest "out-of-bag" (OOB) internal accuracy measure was much higher and less variable (~96%) than the overall accuracy derived from independent test data because spatial autocorrelation was reduced for independent test data ( Figure 5).

Figure 5.
Overall accuracy comparison of the four Random Forest models. For each model, the bootstrapped overall accuracy for the four summary land-use classes is shown in black (mean and 95% confidence intervals), and the original unbootstrapped accuracy is shown in red. Letters denote that bootstrapped accuracy is significantly different for all of the models (p < 0.0001). For the two models on the left with distinct RF classification models (see text), we show the overall mean accuracy of the full twenty-class RF model on the original unbootstrapped testing data and the corresponding OOB accuracy for the same twenty-class model.
Tree plantations were well discriminated from secondary and mature forests across all models ( Figure 6, Table S2). Although secondary and mature forests were confused with each other, the Hyper model classified tree plantations with 85.6% producer's accuracy. Including multitemporal spectral data (models HyperLs and HyperLsLC) caused a significant (p < 0.0001), but small (+1.5), increase in the user's accuracy of tree plantations. The inclusion of multitemporal land cover information (e.g., the HyperLC model) did not increase the accuracy of tree plantation classification, because the information was only used to reclassify mature forests to secondary forests.

Figure 6.
Producer's and user's accuracy for the Random Forest models, by summary land-use class (mean ± 95% CI). Producer's accuracy is defined as 100 minus the error of omission for a given class, and user's accuracy is defined as 100 minus the error of commission for a given class. The Hyper model had relatively low producer's accuracy for secondary forests (54.5%) and user's accuracy for mature forests (77.5%), primarily because of confusion between these forest types. The addition of multitemporal Landsat spectral data (models HyperLs and HyperLsLC) showed a significant increase (+5.8 to +7.5 between models, p < 0.0001; Figure 6) in the producer's accuracy of secondary forests. Models with multitemporal spectral data had markedly improved discrimination of mature forests from tree plantations and secondary forests, and only slightly lower discrimination between tree plantations and secondary forests (Table S2). As a result, small but significant increases in user's accuracy of mature forests were observed in these models (p < 0.0001), along with small but significant declines in producer's accuracy for mature forests and in user's accuracy for secondary forests.
Incorporating information on secondary forests from Landsat-derived land cover maps (models HyperLC and HyperLsLC) significantly improved discrimination of secondary forests overall (12.7 to 15.0 between models, p < 0.0001; Figure 6). There was a concomitant small decline in accuracy for mature forests (~−2 across models in user's and producer's accuracy, respectively; Figure 6, Table S2). Because initial analysis indicated heavy confusion between mature and secondary forests in the Hyper model ( Figure 6), we assumed that the spectral similarity and species overlap between these two forest types justified reassignment based on multitemporal information. Despite the highest overall accuracy for the HyperLsLC classification models, secondary forests only achieved acceptable producer's accuracy (75.8%) because of persistent confusion with tree plantations and non-forest land uses (Table 4).

Tree Plantation Species Discrimination
Across all of the models, all six tree species within plantations were classified with acceptable (≥67%) to high (>80%) producer's accuracy and, with the exception of Vochysia spp., high user's accuracy (Figure 7, Table 5).

Figure 7.
Producer's and user's accuracy across the four Random Forest models for tree plantations as a general class (left column; mean ±95% CI) and for each tree plantation species. Producer's and user's accuracy are defined in Figure 6. Please note that, for clarity, the order of the species and RF models has changed slightly from other figures. The three non-native species are grouped on the left, while the three native species are grouped on the right. The main source of omission error for tree plantation species was misclassification as secondary forests ( Table 5). The three non-native tree species had higher classification accuracy than the native tree species, and Terminalia spp. had the lowest producer's accuracy and greatest confusion with secondary forests, followed by commission errors for Vochysia spp. (Table 5).
As noted above, the reclassification of secondary forest using land cover maps (the-LC models) had little effect on tree plantation classification accuracy. Differences between the Hyper and HyperLC models, for example, were likely a result of minor differences between RF decision tree classifiers that were developed separately. Including multitemporal spectral data, however, positively and negatively affected classification accuracy for plantation tree species composition. Tree species with more open canopies, including Citrus, Tectona, and Terminalia, had small declines (1%-7%) in producer's and/or user's accuracy in the Ls models (Figure 7). Vochysia and Gmelina, by contrast, had large increases (4%-8%) in both producer's and user's accuracy, while Hieronyma was largely unchanged between models (Figure 7).
Landscape analysis of the most accurate overall map (HymapLsLC, Figure 8; Table A3 has the full map accuracy statistics) revealed that 10.5% of all forests in the study region were tree plantations in 2005, but only 1.3% of all forests were native tree plantations (Figure 9).  [9]); nonforest is shown in light gray and forest in light green.

Classification of Tree Plantations
We found that single-date hyperspectral data was able to accurately distinguish tree plantations from mature forests and secondary forest, although secondary forests had poor overall accuracy. Furthermore, hyperspectral data alone (Hyper model) accurately classified all six tree plantations to species with acceptable to high (75%-92%) producer's accuracy and intermediate to very high (57%-99%) user's accuracy (Figure 7). There was marked variation among tree plantation species in classification accuracy ( Table 5). A common non-native plantation species, Tectona spp., had the highest user's and producer's accuracy across models, most likely because of its distinctive foliage and spectral values in the mid infra-red (1490-1760 nm and 2000-2400 nm; Figure A3). Terminalia spp. tree plantations had the lowest producer's accuracy across models and were often misclassified as secondary forest. This tree genus has a thin, high canopy with low leaf area index (LAI) [89] and its spectral signature is likely to have been influenced by abundant native understory trees.
Because monospecific tree plantations were of limited spatial extent in our landscape relative to secondary and mature forests (Figure 9), we chose to first assess accuracy for tree plantations as a single map category, and then assess accuracy for individual reforestation species classes. Using a stratified random and bootstrapped sample of our validation data, producer's and user's accuracies for the tree plantation species were acceptable to high across models (Figure 7). In support of our first hypothesis, non-native tree species had higher producer's accuracy than native species across models (Figure 7). Native genera, like Vochysia spp., were present in multiple forest types (mature and secondary forest, and tree plantations), which may have lowered accuracy. Vochysia ferruginea, in particular, is quite common in secondary forests in the region (R.C. Chazdon, pers. comm.). It is also likely that the higher number of training pixels for the non-native species may be partly responsible for differences in spectral separability between native and non-native tree species. Tree species classification accuracy from this study was comparable to other studies that have attempted to separate tropical tree species using aerial hyperspectral data. For example, Clark et al. [43] achieved 92% overall accuracy in separating seven Costa Rican tree species with high-resolution HYDICE imagery, but individual species' producer's accuracies varied from 70%-100%. More recently, Feret and Asner [39] had ~85% overall classification accuracy with six Hawaiian tree species and 73.2% overall accuracy with seventeen tree species using high-spectral resolution CAO imagery. In our study region, reflectance values for tree species within plantations may have overlapped with natural forest because of dense understory regrowth, local canopy dieback from disease or seasonal leaf loss, the occurrence of similar species in both plantations and secondary forests, and spectral averaging in moderate-resolution imagery [48,119]. However, there was little confusion between tree plantation species in our study. As a result, assessing the accuracy of tree plantations as one singular forest type rather than as individual tree plantation species only slightly improved the producer and user's accuracy (85% and 84%, respectively; Table 5) over the mean accuracy across tree species (83% and 83%, respectively).

Classification of Other Forest Types
Despite high accuracy with tree plantations, we were not able to accurately classify secondary forests using hyperspectral data alone (<55% producer's accuracy), in large part due to their confusion with mature forests and, to a lesser extent, tree plantations ( Table 3). The poor secondary forest accuracy we observed is similar to the results of Galvao et al. [55] with CHRIS-PROBA imagery in Brazilian landscapes, rather than the higher accuracy reported by Thenkabail et al. [54] using Hyperion imagery in African forests. Because spectral NIR reflectance saturates as leaf area index and canopy closure increases during succession, we might expect mature forests to be spectrally similar to older reforestation in the absence of consistent species differences in reflectance [56]. If this is true, moderate-resolution hyperspectral images may be similar to moderate-resolution multispectral images in their inability to distinguish mature from older secondary forests, particularly in wet tropical forest that can quickly achieve canopy closure in less than a decade [55,77].
Future efforts to map secondary forests using single-date hyperspectral imagery could be more successful using spectral signatures of pioneer species (e.g., Cecropia spp.) and canopy N content as indicators of secondary forest [54,[122][123][124]. In our image mosaic, the continuous spectral transitions between forest successional stages observed in other tropical regions [125] may have been masked by spatial variation in reflectance due to selective logging, image differences, and atmospheric haze. Forest undisturbed by selective logging was relatively rare in our landscape. Forests with different logging intensities have been shown to exhibit distinct hyperspectral spectral reflectance [97].

Hyperspectral and Multitemporal Data Classification Accuracy
The inclusion of multitemporal Landsat information in the hyperspectral analysis significantly improved the classification accuracy of tree plantations, secondary forests, and mature forests ( Figure 6). We were able to distinguish between all forest types with good producer's and user's accuracy by combining hyperspectral data, Landsat multitemporal spectral data, and Landsat-derived information on forest history (the HyperLsLC model; Table 5). We interpret these results as support for our hypothesis that including multitemporal information would improve map accuracy for tree plantations. However, increased accuracy was largely a result of better discrimination of mature and secondary forests. Relatively little gain in accuracy was achieved for tree plantations as a single land use category.
We were able to classify the six tree species within plantations with reasonably high accuracy using hyperspectral imagery alone. Including multitemporal spectral information slightly lowered the classification accuracy of species with open and/or temporally variable canopies (the tree crop Citrus, the deciduous Tectona, and the thin-canopied Terminalia). However, multitemporal information markedly increased the classification accuracy of the tree species with more dense canopies (Gmelina and Vochysia). Temporal variability in reflectance within plantations dominated by open species may not be uniform across space, likely because of forest management history and site factors that can influence canopy and understory structure (e.g., [52,57,58]). Overall, there was a small average gain in user's accuracy across species. There was mixed support for our second hypothesis, suggesting that adding multitemporal information to hyperspectral classification may be most useful in regions more extensively dominated by secondary forests and dense-canopied tree plantations. The additional processing needed to include multitemporal satellite imagery may be most efficiently applied in cases where distinguishing secondary and mature forest types from one another is needed to assess landscape-scale forest conditions. The accuracy improvement from including multitemporal spectral data was smaller than anticipated. In the Landsat imagery, mature forests frequently showed minor spectral changes over time (Figure 3) or were contaminated by cloud artifacts despite cloud masking, leading to false positives for spectral changes. Spectral changes from forest clearing and secondary forest regrowth were large in many locations ( Figure 3), but in some locations changes were smaller, with incomplete disturbance and rapid forest recovery. The mismatch in spatial resolution between Landsat and HyMap may also have contributed to error in the spectral change metrics, as secondary forest and tree plantations typically occurred in small patches adjacent to mature forests.
Greater gains in accuracy from integrating multitemporal information with hyperspectral data would likely be realized by tracking reforestation across a greater number of Landsat image dates with enhanced cloud filtering and replacement techniques, utilizing extensive and freely available image archives [78,82]. We used only four Landsat image dates for this analysis because of extensive cloudiness in this region. Better indices of spectral change could be derived from a subdecadal cloud-free composite methodology [79], or a monthly Landsat time series with a cloud pixel filter based on temporal outliers [126]. The inclusion of more frequent imagery can potentially reveal spectral or land use trajectories unique to different species within plantations or to specific management techniques applied, like tree thinning (e.g., [75]). The use of a variety of spectral indices might also improve the sensitivity of the data to spectral shifts with regrowth and disturbance [79]. The utility of a multitemporal approach will likely increase as new multitemporal change indices are tested [79,126] and the Landsat archives are more specifically processed to facilitate multitemporal comparisons [12]. There is an increasing number of pre-processed image products for making consistent land change measurements over time [81,127,128].

Model Accuracy Assessment
In our study, random sampling of tree plantations for independent accuracy assessment was difficult due to the low reliability of distinguishing tree plantations from secondary forests on aerial photos. As a result, we included opportunistically-sampled ground validation data to increase sample size for secondary forests and tree plantation species. Because the inclusion of ground validation data imbalanced class sizes across our class strata [113,118], we employed a subsampling bootstrap to equalize class sizes in a stratified random sampling validation. The subsampling bootstrap permitted a more robust estimate of the accuracy confidence intervals for each class, which has two principal limitations. First, equal class sample sizes overestimated errors of commission for common classes like secondary forest by assuming that all classes were equally abundant in the study area. A stratified random sampling scheme with equal class sizes emphasizes user's accuracy (errors of omission) over overall accuracy [129]. Secondly, the subsampling bootstrap depends on sufficient sample sizes and well-spaced sample locations to estimate class accuracy, and may over-or underestimate error for classes with low sample sizes, such as areas reforested with native species in our study region.
Band reduction and transformation using LDA decreased but did not eliminate inter-image differences (Figures 1 and 3), even though training data for all classes were collected across multiple image tiles. Radiometric correction of hyperspectral imagery, especially imagery mosaics with limited overlap among image tiles, is an area where additional research effort is needed on how to more effectively normalize reflectance data across flight lines [100,130,131]. Accurate surface reflectance estimates across hyperspectral image flight lines will be required for other approaches utilizing field spectroscopy rather than extensive field data collection efforts [132]. Improved radiometric, terrain, and atmospheric correction between our HyMap image tiles could have potentially increased classification accuracy. However, because our training and validation data for each class came from multiple image tiles and encompassed a variety of site, terrain shadowing, atmospheric, and image-illumination differences (Figures 4 and A3), our final classified land use maps were free of image artifacts (Figure 8).

Status of Tree Plantations in Northeastern Costa Rica
Primary forests are being converted to tree plantations and tree crops across the tropics [15,70,133]. However, in previously deforested landscapes like northeastern Costa Rica, tree plantations can dramatically enhance carbon storage and tree cover [134,135]. Because of the rarity of native tree plantations in our region, maps that do not distinguish non-native tree plantations from secondary forests (e.g., [12]) are much more likely to have biased estimates of natural regeneration and deforestation rates than maps that attempt to distinguish plantations from secondary forest (e.g., [9,37]). We estimate that only ~3% of the reforestation classes mapped by Fagan et al. [9] in this region were actually native tree plantation species, with the remainder secondary forests (~78%) and non-native tree plantation species (~19%).
The dominance of non-native tree plantations in this region, coupled with the shorter rotation times of non-native species (Table 1), indicates that tree plantation cover will be highly dynamic in the coming decade. While plantation management will benefit timber supplies in this region, rapid harvest rates may limit the potential for long-term increases in forest connectivity from tree planting in this habitat corridor. Further, non-native tree plantations were often planted outside the SJLS corridor ( Figure 8). The re-establishment of long-lasting and diverse forest cover in this landscape would likely benefit from adjusting reforestation payments to favor reforestation with higher-value native tree species. Native tree plantation species were largely planted within the SJLS Corridor and in the forested foothills to the southeast, adjacent to existing forest cover (Figure 8). Native species can have longer rotation times, but often require less intensive management (Table 1), and wood products and other ecosystem services provided by native trees may eventually be preferred over non-native species, as new data on growth rates and rates of return on investments become available [136].

Conclusions
Given the increasing importance of tropical tree plantations and tree crops, there is an urgent need to improve our ability to inventory and monitor tree plantations and distinguish them from natural forest types.
In this initial test of a new methodology, we found that hyperspectral and multitemporal data can be effectively integrated to discriminate tree plantations from secondary forests and mature forests with 88.5% overall accuracy. The addition of multitemporal spectral metrics and land use history derived from a Landsat image time series improved the classification accuracy of secondary forests and closed canopy tree plantation species, but slightly decreased discrimination of plantation species with more open canopies. Adding multitemporal information may only be worth the added image processing effort when image quality is high and/or project objectives include mapping of secondary forest or dense tree plantations.
As increasing amounts of high-quality hyperspectral imagery become available from aerial (CAO, NEON, GLihT, AVIRIS, HyMap, CASI) and satellite (HyspIRI, EnMap) sensors over the next decade [42,77,130], our results suggest that it may be possible to use hyperspectral data alone to make accurate regional or even global maps of tree crop and plantation species. However, to inventory multiple tropical forest types in support of reforestation and forest management initiatives like REDD+, we show that it may be necessary to combine hyperspectral imagery with other remotely sensed data that has greater temporal frequency and a longer historical record. Both the acquisition of rigorous validation data on tree plantations and the integration of hyperspectral and multitemporal imagery can be logistically demanding, but we found that together they can reveal insights into reforestation patterns important to refining national reforestation programs.

Acknowledgments
This work was supported by National Aeronautics and Space Administration Earth System Science Fellowship NNX10AP49H, the ASPRS Ta Liang Memorial Award, The Earth Institute, the Columbia Institute of Latin American Studies, and by an appointment to the NASA Postdoctoral Program at the Goddard Space Flight Center, administered by Oak Ridge Associated Universities through a contract with NASA. The authors would like to thank Margaret Kalacska for an insightful and helpful early review of this manuscript, and to thank Chris Small for countless hours of great remote sensing advice in small restaurants. Field research was made possible by logistical support provided by FUNDECOR and the staff at the Organization for Tropical Studies La Selva Biological Station, and we would like to thank Andres Sanchûn, Jose Miranda, Marvin Paniagua, and Mauricio Gaitan for assistance in the field. We thank CENAT and Carlos Andres Campos for providing geospatial data on Costa Rica and would like to express our appreciation to Bonnie Tice and Sue Pirkle. Finally, the authors wish to thank the three anonymous reviewers for their insightful comments, which led to marked improvements in the original manuscript.

Author Contributions
Matthew Fagan contributed the main idea, designed the methodology, processed all the data, conducted all the experiments, drafted the preliminary version of this paper, and finished the final version of this paper. Ruth DeFries contributed to the main idea and methodology, and revised all paper versions. Steven Sesnie and J. Pablo Arroyo-Mora contributed to the methodology, shared data for analysis, and revised all paper versions. Carlomagno Soto assisted with data discovery and shared data for analysis. Aditya Singh and Philip Townsend contributed a new algorithm for hyperspectral data processing, prior to its publication. Robin Chazdon shared data for analysis and revised all paper versions. Figure A1. The 2005 Landsat-based land cover map used as part of the multitemporal forest regrowth classification in this study. The red outline denotes the San Juan-La Selva Biological Corridor, and the white outline denotes the outer boundary of the HyMap land-use map (see Figure 8). Non-native tree plantations are shown in purple, and native reforestation (including secondary forest and native tree plantations) is shown in green; all other land cover classes are defined in Table 2. Figure A2. Kappa values of the four different RF models (mean ±95% CI) with four summary classes. All Kappa values differ significantly between models (p < 0.0001). Figure A3. Pairs of hyperspectral reflectance spectra, taken from the training data for all nine forest types. See Table 2 for sample sizes and class name acronyms. This figure is similar to Figure 4, but displays variability in each class with pairwise comparisons to secondary forest. Lines show mean reflectance, with buffer polygons showing ± one standard deviation. Secondary forests (pink lines, yellow polygons) are compared against the other forest classes (black lines, gray polygons). Note the large amount of variation in spectral reflectance for each forest type.     Table A3. Bootstrapped accuracy confusion matrix for the HyperLsLC map in Figure 8. We show here the four summary classes disambiguated into their component classes, with producer's and user's accuracies estimated directly from the table.