Estimating Forest Characteristics for Longleaf Pine Restoration Using Normalized Remotely Sensed Imagery in Florida USA

: E ﬀ ective forest management is predicated on accurate information pertaining to the characteristics and condition of forests. Unfortunately, ground-based information that accurately describes the complex spatial and contextual nature of forests across broad landscapes is cost prohibitive to collect. In this case study we address technical challenges associated with estimating forest characteristics from remotely sensed data by incorporating ﬁeld plot layouts speciﬁcally designed for calibrating models from such data, applying new image normalization procedures to bring images of varying spatial resolutions to a common radiometric scale, and implementing an ensemble generalized additive modeling technique. Image normalization and ensemble models provided accurate estimates of forest types, presence / absence of longleaf pine ( Pinus palustris ), and tree basal areas and tree densities over a large segment of the panhandle of Florida, USA. This study overcomes several of the major barriers associated with linking remotely sensed imagery with plot data to estimate key forest characteristics over large areas. D.L.R.A.; software, J.H.; supervision, D.L.R.A.; validation, J.H.; visualization, J.H.; writing–original J.H.; writing–review J.H., D.L.R.A., N.A., C.S., S.D., J.G., and R.S. All


Introduction
Longleaf ecosystems are some of the most endangered forest ecosystems in the world [1]. Historically, these ecosystems covered approximately 37 million ha in the southeast United States of America (USA) and provided habitat for a wide range of plants and animals owing to their unique structure and adaptations to fire. The species composition and structure of these ecosystems is maintained by frequent fire, which preserves a diverse early successional understory and relatively open longleaf pine (Pinus palustris) dominated overstory [2]. Due to many anthropogenic factors such as land use change, the suppression of fire on the landscape, over-harvesting of timber, and the replacement of longleaf pine with faster growing loblolly (Pinus taeda) and slash (Pinus elliottii) pines, less than 1.5 million ha of longleaf pine remain [2]. Because of this dramatic loss of habitat there has been a recent resurgence in longleaf ecosystem restoration and conservation that calls for more than doubling the existing area covered by longleaf ecosystems by 2024 [3].  Table A2.

Overview
Primary datasets used in this study consist of field inventory plots, aerial based digital imagery [13], multi-temporal Sentinel 2 satellite imagery with level 1 top of atmosphere correction [14], and multi-temporal Landsat 8 satellite imagery with level 1 terrain precision correction [15,16] for the extent of ASGA. Field plot measurements were collected by trained technicians [17] between the months of September 2017 and February 2018. Aerial imagery was flown by Quantum Spatial [13] at a nominal spatial resolution of 0.6 m for blue, green, red, and near infrared portions of the electromagnetic spectrum between October 26 th and November 18 th , 2017 ( Figure A1). In total, 29 aerial images representing a given flight path were acquired for the study area (Table A2). Multitemporal Sentinel 2 images were manually selected for cloud free tiles and were downloaded from the European Space Agency (ESA) website Copernicus Open Access Hub [18]. Images were acquired for a leaf on growing season (LGS; March-May), leaf on dormant season (LDS; October-November) and a leaf off winter season (LWS; December-February) between the years of 2017 and 2018 to capture variations in tree species phenology ( Figure A1). Sentinel 2 bands 2, 3, 4, and 8 at a nominal spatial resolution of 10 m were used for study comparisons. In total, 15 Sentinel 2 tiles were acquired for the study area (Table A2). Landsat 8 images were also manually selected for cloud-free scenes and were downloaded from United States Geological Survey (USGS) website Earth Explorer [19]. They were acquired for similar LGS, LDS, and LWS periods between the years 2015 and 2018 ( Figure A1). Only Landsat 8 bands 2-7 were used for study comparisons. In total, 6 Landsat 8 scenes were acquired for the study area (Table A2). Band selection for Sentinel 2 and Landsat 8 images were based on the desire to select portions of the electromagnetic spectrum that were comparable with the portion of the spectrum sensed within the aerial imagery while also including the short wave infrared (SWIR) component of the spectrum. For SWIR portions of the spectrum, only Landsat 8 bands were included due to their similarities to Sentinel 2 band spectral and spatial resolutions. Sentinel 2 bands 5-7 were not included within analyses because they were not comparable with the spectral resolutions of either the aerial or Landsat 8 imagery.
To reduce the negative impacts of co-registration errors [20], we used a field plot layout consisting of four adjacent circular subplots, each with a radius of 9 m (Section 2.4). In total, 244 field plots were installed (Section 2.3) and used to measure tree species counts and diameters at breast  Table A2.

Overview
Primary datasets used in this study consist of field inventory plots, aerial based digital imagery [13], multi-temporal Sentinel 2 satellite imagery with level 1 top of atmosphere correction [14], and multi-temporal Landsat 8 satellite imagery with level 1 terrain precision correction [15,16] for the extent of ASGA. Field plot measurements were collected by trained technicians [17] between the months of September 2017 and February 2018. Aerial imagery was flown by Quantum Spatial [13] at a nominal spatial resolution of 0.6 m for blue, green, red, and near infrared portions of the electromagnetic spectrum between October 26th and November 18th, 2017 ( Figure A1). In total, 29 aerial images representing a given flight path were acquired for the study area (Table A2). Multi-temporal Sentinel 2 images were manually selected for cloud free tiles and were downloaded from the European Space Agency (ESA) website Copernicus Open Access Hub [18]. Images were acquired for a leaf on growing season (LGS; March-May), leaf on dormant season (LDS; October-November) and a leaf off winter season (LWS; December-February) between the years of 2017 and 2018 to capture variations in tree species phenology ( Figure A1). Sentinel 2 bands 2, 3, 4, and 8 at a nominal spatial resolution of 10 m were used for study comparisons. In total, 15 Sentinel 2 tiles were acquired for the study area (Table A2). Landsat 8 images were also manually selected for cloud-free scenes and were downloaded from United States Geological Survey (USGS) website Earth Explorer [19]. They were acquired for similar LGS, LDS, and LWS periods between the years 2015 and 2018 ( Figure A1). Only Landsat 8 bands 2-7 were used for study comparisons. In total, 6 Landsat 8 scenes were acquired for the study area (Table A2). Band selection for Sentinel 2 and Landsat 8 images were based on the desire to select portions of the electromagnetic spectrum that were comparable with the portion of the spectrum sensed within the aerial imagery while also including the short wave infrared (SWIR) component of the spectrum. For SWIR portions of the spectrum, only Landsat 8 bands were included due to their similarities to Sentinel 2 band spectral and spatial resolutions. Sentinel 2 bands 5-7 were not included within analyses because they were not comparable with the spectral resolutions of either the aerial or Landsat 8 imagery.
To reduce the negative impacts of co-registration errors [20], we used a field plot layout consisting of four adjacent circular subplots, each with a radius of 9 m (Section 2.4). In total, 244 field plots were installed (Section 2.3) and used to measure tree species counts and diameters at breast height (dbh; 1.37 m). For the extent of each field plot, spectral and texture metrics (Section 2.6) were derived from source and normalized Landsat 8, Sentinel 2, and aerial imagery (Section 2.5). These metrics were used to calibrate classification models of the dominant forest type (DFT) and longleaf pine presence (LPP), as well as continuous models of basal area per ha (BAH) and trees per ha (TPH) of pine and other tree species groups (Section 2.7).

Sample Design
A significant portion of the ASGA is made up of private land holdings. Owing to access constraints, forests in these holdings could not be sampled on the ground. The remaining area, consisting primarily of public lands, constituted our target population ( Figure 2). Plot locations within the target population were allocated spatially at random within 804 m of a road. In total, a sample of 244 locations was drawn from the target population. The restriction on sampling necessitated the following assumptions: (1) the target population encompassed the spectral domain of the full ASGA and (2) the relationships between variables observed in the field and those derived from the imagery were the same across the target population and the full ASGA. height (dbh; 1.37 m). For the extent of each field plot, spectral and texture metrics (Section 2.6) were derived from source and normalized Landsat 8, Sentinel 2, and aerial imagery (Section 2.5). These metrics were used to calibrate classification models of the dominant forest type (DFT) and longleaf pine presence (LPP), as well as continuous models of basal area per ha (BAH) and trees per ha (TPH) of pine and other tree species groups (Section 2.7).

Sample Design
A significant portion of the ASGA is made up of private land holdings. Owing to access constraints, forests in these holdings could not be sampled on the ground. The remaining area, consisting primarily of public lands, constituted our target population ( Figure 2). Plot locations within the target population were allocated spatially at random within 804 m of a road. In total, a sample of 244 locations was drawn from the target population. The restriction on sampling necessitated the following assumptions: 1) the target population encompassed the spectral domain of the full ASGA and 2) the relationships between variables observed in the field and those derived from the imagery were the same across the target population and the full ASGA. Only the first of those assumptions could be empirically evaluated. To do so, we partitioned the multivariate distribution (feature space) of the normalized spectral and texture metrics (Section 2.5) over the ASGA into 100 classes using an unsupervised k-means classification [21,22]. The number of k-mean classes (k = 100) was based on the desire to split feature space into relatively small subsets that could be mapped back to geographic space and used to geographically identify sampled portions of feature space. While the number of classes chosen for our study was 100, increasing or decreasing the number of classes would have the impact of decreasing or increasing the amount of feature space attributed to each class, respectively. Similarly, in a geographic sense, increasing or decreasing the number of k-mean classes would translate to increasing or decreasing the geographic precision of sampled areas within feature space, respectively. To mitigate the computation load, a simple random sample of 10,000 locations drawn from across the ASGA was used to build the k-means classifier and estimate the proportion of area attributed to each class across the study area. We then compared the class proportions for this full ASGA sample to the proportions across the 244 sample locations drawn from the target population.

Field Data
Plot data consisted of global position system (GPS) locations and tree measurements within the boundaries of subplots (36 m by 36 m, containing 4 subplots each with a 9 m radius; Figure 2) at those Only the first of those assumptions could be empirically evaluated. To do so, we partitioned the multivariate distribution (feature space) of the normalized spectral and texture metrics (Section 2.5) over the ASGA into 100 classes using an unsupervised k-means classification [21,22]. The number of k-mean classes (k = 100) was based on the desire to split feature space into relatively small subsets that could be mapped back to geographic space and used to geographically identify sampled portions of feature space. While the number of classes chosen for our study was 100, increasing or decreasing the number of classes would have the impact of decreasing or increasing the amount of feature space attributed to each class, respectively. Similarly, in a geographic sense, increasing or decreasing the number of k-mean classes would translate to increasing or decreasing the geographic precision of sampled areas within feature space, respectively. To mitigate the computation load, a simple random sample of 10,000 locations drawn from across the ASGA was used to build the k-means classifier and estimate the proportion of area attributed to each class across the study area. We then compared the

Field Data
Plot data consisted of global position system (GPS) locations and tree measurements within the boundaries of subplots (36 m by 36 m, containing 4 subplots each with a 9 m radius; Figure 2) at those locations. Basal area (BAH; m 2 ha −1 ) and tree density (TPH; trees ha −1 ) were summarized for trees greater than 5 cm in dbh by Pine and Other species classes (Table A1). Additionally, each plot was assigned a dominant forest type (DFT) label of Nonforest, Pine, or Other using the rules defined in Table 1. Finally, longleaf pine presence or absence (trees greater than 5 cm dbh) was described at the plot level (Table 1). Table 1. Definitions and queries used to label Pine, Other, and Nonforest dominant forest cover types (DFT) and the presence of longleaf pine (LPP) along with proportions of plots meeting those criteria. Species composition for Pine and Other class labels can be found in Table A1. Field plots were located using a Wide Area Augmentation System (WAAS) enabled global position system (GPS) embedded within a Trimble Recon© data recorder (Trimble, Sunnyvale, CA, USA). Plot GPS and tree data were collected using Environmental Systems Research Institute (ESRI)'s ArcPad© version 10.0 (ESRI, Redlands, CA, USA) and a suite of mobile data collection applets developed specifically for this project (Supplemental Materials 1). While every attempt was made to navigate to the exact field plot location, real time navigation can introduce geographic error. To minimize this error when relating field plot summaries to remotely sensed data, 20 GPS positions were collected and averaged to estimate the exact center of the southeastern subplot within each 36 m by 36 m field plot. From the southeastern plot location, remaining subplots centers were located based on ground distance and compass bearings. On average the standard deviation of GPS positions across all plots was less than 1 m with a maximum standard deviation in northing of 2.8 m and in easting of 5.1 m occurring at one plot location (average horizontal dilution of precision of 0.93).

Image Normalization
Remotely sensed data were re-scaled to a common radiometric scale using an enhanced aggregate no-change regression (EANR) methodology. Similar to aggregate no-change regression (ANR) [6,23], EANR seeks to leverage strong linear relationships among Landsat, Sentinel 2, and aerial image bands to bring finer spatial resolution imagery to the same relative radiometric scale as coarser imagery. The main differences between EANR and ANR are: (1) an added aggregation step to bring finer resolution imagery to the same spatial scale as the reference imagery, (2) a normalization of raw digital number (DN) values, (3) a trimming procedure to mitigate confounding effects of land use/cover changes for images acquired at different dates, and (4) a sampling scheme to extract spectral aggregates within overlapping image boundaries.
Like ANR, EANR extends the concepts of automatic scattergram-controlled regression [24] by applying an area slicing algorithm to identify no-change pixels. It then spatially aggregates pixel values around selected locations to minimize the effect of co-registration errors [20,23]. Aggregated values at selected locations of the subject image are then regressed against the corresponding values for a reference image using ordinary least squares regression (OLS) on a band-by-band basis. Finally, Forests 2020, 11, 426 6 of 21 regression coefficients are applied to the subject image to bring it to the same radiometric scale as the reference image and to other fine-scaled images that overlap the same reference image.
Critical to this process is the elimination of areas and pixels within the region of image overlap that have been affected by land use or land cover change. This was accomplished in two steps. The first step required visually estimating the proportion (p ∆ ) of area within the overlapping regions of the images that were impacted by changes due to phenomena such as clouds, land use change, or land cover change. For our study, we used an estimate of p ∆ = 20 percent to mask Landsat 8 and Sentinel 2 pixels. For aerial imagery we used a p ∆ ranging between 20 and 30 percent. Applying these change proportions, raw DN values were separately normalized DN to the unit scale for each of the subject (S) and reference (R) images and bands (k). Differences in the normalized DN values between subject and reference images were then obtained for each band: For step two, the ∆ DN k values were ordered from smallest to largest. Pixels in the lower 10% (i.e., 0.5p ∆ ) and upper 10% were identified as "change" pixels and masked from subsequent analyses.
The spatial aggregation process minimizes the effect of co-registration errors [20] by extracting and calculating the mean values of unchanged pixels within the overlap of each image at a coarser grain size. In this procedure an aggregation block window size of 270 m by 270 m was chosen based on simulated results found for Landsat 8 co-registration errors reported in Hogland and Affleck [20]. Aggregated blocks with more than 30 percent of the pixels identified as no-change were used for regression analyses of subject image blocks on reference image blocks for individual bands. The resulting regression coefficients for each band were then applied back to the non-aggregated pixel values of the subject image. The described EANR procedure was developed and integrated into the RMRS Raster Utility project [22] (Supplemental Materials 2).
For Landsat 8 and Sentinel 2 images, we used EANR with 1000 randomly chosen locations within regions of overlap to bring images within a given season to a common relative radiometric scale. For normalization of Landsat 8 imagery, path/row 18/39 data were used as the reference images for each season. Reference images for normalization of Sentinel 2 imagery are specified in Table A2. For aerial imagery, normalized Landsat 8 imagery was used as the reference image. In instances when substantial land use or land cover change had occurred due to differences in image acquisition dates (e.g., changes in agricultural fields), manually defined spatial masks were used to remove additional pixels prior to implementation of the EANR procedure ( Figure 3).
within regions of overlap to bring images within a given season to a common relative radiometric scale. For normalization of Landsat 8 imagery, path/row 18/39 data were used as the reference images for each season. Reference images for normalization of Sentinel 2 imagery are specified in Table A2. For aerial imagery, normalized Landsat 8 imagery was used as the reference image. In instances when substantial land use or land cover change had occurred due to differences in image acquisition dates (e.g., changes in agricultural fields), manually defined spatial masks were used to remove additional pixels prior to implementation of the EANR procedure ( Figure 3).

Spectral and Texture Metrics
Once normalized, focal mean and standard deviation analyses [5,6,22] were performed to quantify texture and mimic field plot extents for each band, season, and source of remotely sensed imagery ( Table 2). Due to differing spatial resolutions of the imagery, various window sizes were used to capture spectral and texture patterns at approximately the same spatial resolution as the field plot extent. Window sizes for each image source were chosen based on the smallest window extent with an odd number of rows and columns that could be used to calculate focal mean and standard deviation statistics without resampling the imagery. For Landsat 8 imagery, mean band DN values were extracted based on the location of a given field plot and the nearest pixel (30 m by 30 m spatial resolution). Additionally, at those locations, Landsat 8 band standard deviations for a three by three moving window (90 m by 90 m) were calculated for each pixel and extracted. For Sentinel 2 imagery, a three by three moving window (30 m by 30 m) was used to calculate mean values and a five by five window (50 m by 50 m) was used to calculate standard deviation. For aerial imagery, a 61 by 61 window (grain size of 0.6 m,~37 m by 37 m) was used to calculate image mean and standard deviation at each location. In total, 68 different metrics were created and extracted from the normalized remotely sensed imagery using the averaged GPS location of each field plot and the nearest image pixel. As shown in Table 2, these metrics include 36 band values extracted from Landsat 8 based imagery (2 metrics for 6 bands and 3 seasons), 24 band values extracted from Sentinel 2 based imagery (2 metrics for 4 bands and 3 seasons), and 8 band values extracted from the aerial imagery (2 metrics for 4 bands and 1 season). To evaluate the impact of EANR on estimating DFT, LLP, BAH and TPH, this metric creation and extraction process was repeated for non-normalized imagery and the values from both sets were used to build and compare models.

Model Development, Comparisons, and Raster Surface Creation
For each response variable (DFT and LLP classes, plus Pine and Other tree species BAH and TPH) we used a variable selection routine described in [5,6] to select EANR normalized and non-normalized remotely sensed metrics that significantly (α = 0.05) improved the model fit of generalized additive models (GAMs), as defined by increased percent deviance explained. This variable selection routine uses GAMs, deviance, and pseudo p-values to iterate through potential predictor variables and exhaustively find the combination of additive variables that are both significant and improve the model fit by a user specified amount (Supplemental Materials 3). GAMs are a flexible modeling technique that can accommodate nonlinear relationships between response and predictor variables using penalized regression splines [25] and can be applied to non-Gaussian response data such as the DFT and LLP. While useful in identifying nonlinear, nonparametric relationships, GAMs can overfit sample data making estimates less generalizable to a given population [26]. To address the issue of overfitting, we employed a Monte Carlo re-sampling scheme to build a suite of 50 GAMs constructed from random subsets of our data. For each of the 50 GAMs, 75 percent of the observations within our sample were used to develop relationships between a given response and our previously selected predictor variables. The remaining 25 percent of the observations that were not used to build the GAM constituted an out of bag (OOB) subset of the data and were used to assess the accuracy of GAM estimates P . Statistics calculated for assessment were root mean squared error (RMSE) for continuous response variables (BAH and TPH) and classification accuracy (from a most likely class rule) for the binomial (LPP) and multinomial (DFT) categorical response variables. Once calibrated, we applied the ensemble of 50 GAMs (EGAM) to estimate each response variable Ê and corresponding standard error SE at the pixel level as follows:Ê EGAMs based on EANR normalized imagery were compared against EGAMs based on non-normalized imagery using averaged RMSE, classification accuracy statistics, and Akaike's information criterion (AIC) [27,28]. Our best fitting EGAMs (normalized versus raw image based predictors) were used with the corresponding image metrics to produce raster surfaces depicting the condition of forests in the ASGA. Additionally, estimation errors at field plot locations were evaluated for spatial correlation using a global Moran's I test [29]. The described analyses and procedures were developed and performed in R statistical software [30] and are available in Supplemental Materials 3.

Field Data
Plot summaries from field data are displayed in Figure 4 and Table 1. Almost half the plots were in the Pine class (48%), but only 16.8% contained longleaf pine trees above 5 cm dbh and BAH above 2 m 2 ha −1 . Mean BAH for the DFT Other class was only slightly higher than that for the DFT Pine class, but BAH was more variable in the former. Mean TPH and variability in TPH was substantially lower in the Pine class relative to the Other forested class. Taken together, this indicates that tree diameter (dbh) tended to be larger in the Pine class.
normalized imagery using averaged RMSE, classification accuracy statistics, and Akaike's information criterion (AIC) [27,28]. Our best fitting EGAMs (normalized versus raw image based predictors) were used with the corresponding image metrics to produce raster surfaces depicting the condition of forests in the ASGA. Additionally, estimation errors at field plot locations were evaluated for spatial correlation using a global Moran's I test [29]. The described analyses and procedures were developed and performed in R statistical software [30] and are available in Supplemental Materials 3.

Field Data
Plot summaries from field data are displayed in Figure 4 and Table 1. Almost half the plots were in the Pine class (48%), but only 16.8% contained longleaf pine trees above 5 cm dbh and BAH above 2 m 2 ha -1 . Mean BAH for the DFT Other class was only slightly higher than that for the DFT Pine class, but BAH was more variable in the former. Mean TPH and variability in TPH was substantially lower in the Pine class relative to the Other forested class. Taken together, this indicates that tree diameter (dbh) tended to be larger in the Pine class.

Image Normalization
The EANR-adjusted Landsat 8, Sentinel 2, and aerial imagery is shown in Figure 5. The R 2 statistics for LDS, LGS, and LWS Landsat 8 normalization regressions were generally large (greater than 0.9; Figure 6) across all bands, with the lowest R 2 coinciding with LWS analyses. Similarly, for Sentinel 2 imagery, the R 2 statistics of the image-to-image overlap regressions were large (greater than 0.9) with the lowest occurring in LWS. The strength of these associations resulted in the improved radiometric consistency across Landsat 8 and Sentinel 2 images that is visually evident in Figure 5 (cf. Figure A1). In contrast, regressions of overlapping aerial and Landsat 8 Imagery were weak for several flight paths ( Figure 6). The acquisition dates for these paths tended to be later ( Figure 3) and potentially coincided with substantial changes in plant phenology. As a result, original scene boundaries were still apparent in normalized aerial image mosaics ( Figure 5). Sentinel 2 imagery, the R statistics of the image-to-image overlap regressions were large (greater than 0.9) with the lowest occurring in LWS. The strength of these associations resulted in the improved radiometric consistency across Landsat 8 and Sentinel 2 images that is visually evident in Figure 5 (cf. Figure A1). In contrast, regressions of overlapping aerial and Landsat 8 Imagery were weak for several flight paths ( Figure 6). The acquisition dates for these paths tended to be later ( Figure  3) and potentially coincided with substantial changes in plant phenology. As a result, original scene boundaries were still apparent in normalized aerial image mosaics ( Figure 5).

Sample Distribution
Clusters derived from spectral and texture metrics using the k-means unsupervised clustering algorithm divided the feature space into 100 classes of varying size (Figure 7). The distribution of the than 0.9) with the lowest occurring in LWS. The strength of these associations resulted in the improved radiometric consistency across Landsat 8 and Sentinel 2 images that is visually evident in Figure 5 (cf. Figure A1). In contrast, regressions of overlapping aerial and Landsat 8 Imagery were weak for several flight paths ( Figure 6). The acquisition dates for these paths tended to be later ( Figure  3) and potentially coincided with substantial changes in plant phenology. As a result, original scene boundaries were still apparent in normalized aerial image mosaics ( Figure 5).

Sample Distribution
Clusters derived from spectral and texture metrics using the k-means unsupervised clustering algorithm divided the feature space into 100 classes of varying size (Figure 7). The distribution of the Figure 6. Box plots of enhanced aggregate no-change normalization (EANR) coefficients of determination (R 2 ) results for each source of imagery and season across all image bands. For Landsat 8 and Sentinel 2, the reference and subject images were of the same source; for Aerial analyses, the reference images were normalized Landsat 8 imagery.

Sample Distribution
Clusters derived from spectral and texture metrics using the k-means unsupervised clustering algorithm divided the feature space into 100 classes of varying size (Figure 7). The distribution of the field plot locations across those same 100 classes shows that the field plots spanned a wide range of the feature space, but did not capture all the spectral classes. Many of the classes not represented at field plot locations corresponded to urban and agricultural cover types. field plot locations across those same 100 classes shows that the field plots spanned a wide range of the feature space, but did not capture all the spectral classes. Many of the classes not represented at field plot locations corresponded to urban and agricultural cover types.

Model Development, Comparisons, and Raster Surface Creation
Spectral and texture metrics selected for modeling varied by response variable and whether images were normalized to a common radiometric scale (Table 3). Using the variable selection procedure described by Hogland et al. [6], EGAMs selected between four and twelve metrics out of the potential 68, at a significance threshold of α = 0.05. In all instances, metrics were selected from each image source and from multiple seasons (Tables 2 and 3). Compared with EGAMs built using non-normalized texture metrics, EANR based EGAMs generally improved model fit as measured by average classification error, RMSE, and AIC (Table 3). EGAMs based on non-normalized imagery generally selected fewer metrics and tended to utilize metrics derived from Landsat 8 and Sentinel 2 data sources. In all cases, EGAMs that used normalized predictors were selected as the top fitting models and used to assess accuracy and create raster surfaces. Table 3. Selected predictor variables and mean error statistics for normalized (EANR) and nonnormalized (RAW) based ensemble generalized additive models (EGAMs). Error statistics for dominant forest type (DFT) and longleaf pine presence (LPP) models are in terms of 1-accuracy of the classification, while BAH and TPH errors are measured in terms of root mean squared error (RMSE) on the square root scale. Train and out of bag (OOB) errors denote whether error statistics were calculated from observations used to train a given model or withheld from training, respectively. Mean Akaike information criterion (AIC ̅̅̅̅̅ ) was calculated using training data and the AIC statistics for each generalized additive model within the corresponding ensemble.

Model Development, Comparisons, and Raster Surface Creation
Spectral and texture metrics selected for modeling varied by response variable and whether images were normalized to a common radiometric scale (Table 3). Using the variable selection procedure described by Hogland et al. [6], EGAMs selected between four and twelve metrics out of the potential 68, at a significance threshold of α = 0.05. In all instances, metrics were selected from each image source and from multiple seasons (Tables 2 and 3). Compared with EGAMs built using non-normalized texture metrics, EANR based EGAMs generally improved model fit as measured by average classification error, RMSE, and AIC (Table 3). EGAMs based on non-normalized imagery generally selected fewer metrics and tended to utilize metrics derived from Landsat 8 and Sentinel 2 data sources. In all cases, EGAMs that used normalized predictors were selected as the top fitting models and used to assess accuracy and create raster surfaces. Table 3. Selected predictor variables and mean error statistics for normalized (EANR) and non-normalized (RAW) based ensemble generalized additive models (EGAMs). Error statistics for dominant forest type (DFT) and longleaf pine presence (LPP) models are in terms of 1-accuracy of the classification, while BAH and TPH errors are measured in terms of root mean squared error (RMSE) on the square root scale. Train and out of bag (OOB) errors denote whether error statistics were calculated from observations used to train a given model or withheld from training, respectively. Mean Akaike information criterion (AIC) was calculated using training data and the AIC statistics for each generalized additive model within the corresponding ensemble. Overall classification accuracies for top fitting DFT and LPP EGAMs were greater than 88% with the most common classification errors being the mapping of the Other forested class as pine, and longleaf presence being mapped as absence (Figure 8). Training and OOB errors for DFT, LPP, BAH and TPH EGAMs are reported in Table 3 and suggest strong relationships among response and normalized predictor variables, particularly in the Pine class. Estimated BAH and TPH values from top fitting models were also strongly correlated with observed values, though in densely stocked conditions (TPH above 1000 stems ha −1 , BAH above 20-30 m 2 ha −1 ), EGAMs tended to underestimate BAH and TPH (Figure 9). Estimation errors for some EGAMs were positively spatially correlated, suggesting spatial patterns or trends in model errors (Table 4). Though tempting to address the spatial variability in estimation error using kriging [31], our sample design does not lend itself to accounting for spatial trends in the residuals across the largely privately-owned tracts of the AGSA that were outside the target population. Overall classification accuracies for top fitting DFT and LPP EGAMs were greater than 88% with the most common classification errors being the mapping of the Other forested class as pine, and longleaf presence being mapped as absence (Figure 8). Training and OOB errors for DFT, LPP, BAH and TPH EGAMs are reported in Table 3 and suggest strong relationships among response and normalized predictor variables, particularly in the Pine class. Estimated BAH and TPH values from top fitting models were also strongly correlated with observed values, though in densely stocked conditions (TPH above 1000 stems ha -1 , BAH above 20-30 m 2 ha -1 ), EGAMs tended to underestimate BAH and TPH (Figure 9). Estimation errors for some EGAMs were positively spatially correlated, suggesting spatial patterns or trends in model errors (Table 4). Though tempting to address the spatial variability in estimation error using kriging [31], our sample design does not lend itself to accounting for spatial trends in the residuals across the largely privately-owned tracts of the AGSA that were outside the target population.    EGAM raster surfaces were created at a spatial resolution of 30 m across the ASGA for all response variables. For DFT and LPP, EGAM estimates include class probabilities and model-based estimated standard errors for each cell within the study area. For pine and other √ and √ , estimates and standard errors were calculated using squared transformations of EGAM estimates.  EGAM raster surfaces were created at a spatial resolution of 30 m across the ASGA for all response variables. For DFT and LPP, EGAM estimates include class probabilities and model-based estimated standard errors for each cell within the study area. For pine and other √ BAH and √ TPH, estimates and standard errors were calculated using squared transformations of EGAM estimates.

Response
DFT and LPP raster surfaces are presented in Figure 10 while BAH and TPH surfaces are presented in Figure 11. Additionally, Figures 10 and 11 display the spatial distribution of EGAM-specific feature space k-means classes that were not represented within the sample. Some banding of estimates and standard errors is evident in Figures 10 and 11 for the models using the A2 and A5 metrics (e.g., for LLP, Pine TPH, Other BAH) in the eastern portion of the ASGA. Those bands tend to appear in the corresponding sample representation maps, suggesting that certain flight paths in the aerial imagery could not be fully normalized and were unable to be adequately sampled in the field. DFT and LPP raster surfaces are presented in Figure 10 while BAH and TPH surfaces are presented in Figure 11. Additionally, Figures 10 and 11 display the spatial distribution of EGAMspecific feature space k-means classes that were not represented within the sample. Some banding of estimates and standard errors is evident in Figures 10 and 11 for the models using the A2 and A5 metrics (e.g., for LLP, Pine TPH, Other BAH) in the eastern portion of the ASGA. Those bands tend to appear in the corresponding sample representation maps, suggesting that certain flight paths in the aerial imagery could not be fully normalized and were unable to be adequately sampled in the field. Figure 10. Spatial distribution of the dominant forest type most likely class (DFT), class probabilities, and class probability standard errors for Nonforested (red), Other (green), and Pine (blue) cover types (Table 1). Additionally, presence and absence of longleaf pine within a plot based on a most likely class rule (LPP), presence probabilities, and presence probability standard errors (purple to green color gradient). Finally, the spatial location of k-mean classes not represented in the sample used to train EGAMs are displayed as orange areas in the bottom row of graphics. Figure 10. Spatial distribution of the dominant forest type most likely class (DFT), class probabilities, and class probability standard errors for Nonforested (red), Other (green), and Pine (blue) cover types (Table 1). Additionally, presence and absence of longleaf pine within a plot based on a most likely class rule (LPP), presence probabilities, and presence probability standard errors (purple to green color gradient). Finally, the spatial location of k-mean classes not represented in the sample used to train EGAMs are displayed as orange areas in the bottom row of graphics. DFT and LPP raster surfaces are presented in Figure 10 while BAH and TPH surfaces are presented in Figure 11. Additionally, Figures 10 and 11 display the spatial distribution of EGAMspecific feature space k-means classes that were not represented within the sample. Some banding of estimates and standard errors is evident in Figures 10 and 11 for the models using the A2 and A5 metrics (e.g., for LLP, Pine TPH, Other BAH) in the eastern portion of the ASGA. Those bands tend to appear in the corresponding sample representation maps, suggesting that certain flight paths in the aerial imagery could not be fully normalized and were unable to be adequately sampled in the field. Figure 10. Spatial distribution of the dominant forest type most likely class (DFT), class probabilities, and class probability standard errors for Nonforested (red), Other (green), and Pine (blue) cover types (Table 1). Additionally, presence and absence of longleaf pine within a plot based on a most likely class rule (LPP), presence probabilities, and presence probability standard errors (purple to green color gradient). Finally, the spatial location of k-mean classes not represented in the sample used to train EGAMs are displayed as orange areas in the bottom row of graphics.

Discussion
We mapped key forest characteristics across ASGA that can be used to inform spatially explicit longleaf restoration decision making. These surfaces transform data that are relatively easy to collect into pertinent information for longleaf pine restoration, providing the fine level of spatial detail and accuracy needed to make well-informed restoration decisions. We have also described a procedure for, and demonstrated the importance of, bringing images to a common radiometric scale across sources, years, and spatial resolutions ( Figure 5). Models built with EANR-normalized spectral and texture metrics utilized distinct sources, phenological seasons, and bands, and had less error than models built using the raw imagery (Table 3). Additionally, our study highlights how multiple image resolutions (temporal, spectral, and spatial) can improve model fit and estimation, especially after images have been brought to a common radiometric scale using a technique like EANR. Finally, we presented and implemented an EGAM approach to estimate DFT, LPP, and BAH and TPH for pine and other tree species groups that accommodates nonlinear relationships while mitigating the potential for overfitting. Combined, EANR and EGAM produce better estimates of key forest characteristics than previous longleaf pine mapping efforts [5,6] (reductions in classification error and RMSE). Moreover, the EGAM procedure provided spatial depictions of estimation error (standard error surfaces) that can be used by resource managers when making management decisions.
Sample design played an important role in our study. Because some areas were inaccessible, field sampling was limited to a subset of the ASGA (Figure 2). This limitation means that inferences must rely on the assumptions that the sample adequately characterizes the feature space and that the predictor-response relationships do not vary among different ownership types across the ASGA. For our study, the sampled locations captured an appreciable extent of the feature space (Figures 7, 10 and 11) and there were relatively strong relationships among response and predictor variables. Because of these reasons we feel relatively confident that the model estimates provide an accurate depiction of forested conditions across the ASGA. However, our assessment did identify portions of the feature space that were unsampled, and others that were over-or under-represented. From a model calibration perspective, overrepresentation can be thought of as wasted effort when variability in the predictor-response relationship is relatively minor. In those instances, observations from overrepresented regions of feature space could potentially be reallocated to unsampled or underrepresented areas. Additionally, our assessment detected some geographic areas where the imagery could not be adequately normalized and thus may have related differently to the vegetation conditions. Future designs should attempt to spread sample locations across predictor variable space, while balancing the sample to mimic the distributional characteristics of the population [32]. Toward this end, the raster surfaces created from this study could be used to spread and balance sampling locations for future studies [33], thereby providing additional means to minimize error and reduce sample cost.
Quantitatively, DFT and LPP EGAM fit statistics suggest that our models accurately depicted forest cover types and the presence of longleaf pine. Similarly BAH and TPH EGAM estimates were strongly correlated with observed plot BAH and TPH over the more commonly observed lower ranges of these variables. Moreover, EANR based EGAMs outperformed EGAMs based on spectral and texture metrics from raw imagery. Interestingly, the correlation between observed and predicted values for Other TPH was substantially less than BAH and Pine TPH models, especially when tree densities were high. This may be due to the variability in tree crown shapes associated with the various tree species of the Other class (Table A1). However, further investigation is needed to substantiate this claim.
Qualitatively, our raster surfaces display more detail and are less impacted by variation in image acquisition dates when compared to previous mapping efforts [5]. However, there were instances when our EGAMs appeared to underestimate TPH ( Figure 9) and to omit instances when longleaf pine was present (Figure 8). For example, when TPH was greater than 1000 ha −1 (e.g., as would be the case in young or overstocked stands), EGAMs underestimated TPH. Additionally, some areas dominated by cities, agricultural fields, water, and aquatic vegetation were underrepresented or not represented ( Figures 10 and 11) in our sample and led to imprecise estimates ( Figure 12). However, DFT, LPP, BAH and TPH models generally produced reliable, spatially explicit estimates. represented (Figures 10 and 11) in our sample and led to imprecise estimates ( Figure 12). However, DFT, LPP, BAH and TPH models generally produced reliable, spatially explicit estimates. While estimation error could be reduced for almost every EGAM by including additional EANR normalized spectral and texture predictive variables (i.e., setting an α = 0.1), the added complexity and processing associated with including more variables outweighed the marginal gains in accuracy. Similarly, many of the predictor variables used in our EGAMs were highly correlated, suggesting marginal utility of additional metrics. Additionally, areas within feature space that were underrepresented or not represented generally produced extremely variable estimates, which should be viewed with skepticism. Moreover, EGAMs that were more complex (i.e., more predictor variables) tended to have larger proportions of the expanded feature space underrepresented. This suggests that for limited sample sizes, simpler models (i.e., fewer predictors) may be preferred for wall to wall mapping endeavors along with masking land cover types that are not forested. Finally, for some of our EGAMs, estimation errors exhibited minor positive spatial correlation. While tempting to use kriging methods [31] to remove global spatial trends in estimation error, our design did not adequately sample geographic space across ASGA, limiting the utility of kriging methods to contiguous accessible areas sampled within our study.
Though additional improvements can be made to our estimates of forest characteristics, tradeoffs between processing, model development, and sampling costs related to improvements in accuracy should be weighed to evaluate the level of precision needed to inform decision making [34]. Given the amount of model error potentially introduced by co-registration errors [20], our EGAMs explained the majority of the variation that can be accounted for within the field data and provide a substantial improvement over previous efforts [5,6]. Additionally, by implementing an ensemble approach to generalized additive models, we were able to capture nonparametric trends in the data while mitigating overfitting and providing a technique to empirically estimate model-based standard errors. The resultant estimates and standard errors provide the type of information needed to make both fine and coarse grained restoration decisions across the ASGA. While estimation error could be reduced for almost every EGAM by including additional EANR normalized spectral and texture predictive variables (i.e., setting an α = 0.1), the added complexity and processing associated with including more variables outweighed the marginal gains in accuracy. Similarly, many of the predictor variables used in our EGAMs were highly correlated, suggesting marginal utility of additional metrics. Additionally, areas within feature space that were underrepresented or not represented generally produced extremely variable estimates, which should be viewed with skepticism. Moreover, EGAMs that were more complex (i.e., more predictor variables) tended to have larger proportions of the expanded feature space underrepresented. This suggests that for limited sample sizes, simpler models (i.e., fewer predictors) may be preferred for wall to wall mapping endeavors along with masking land cover types that are not forested. Finally, for some of our EGAMs, estimation errors exhibited minor positive spatial correlation. While tempting to use kriging methods [31] to remove global spatial trends in estimation error, our design did not adequately sample geographic space across ASGA, limiting the utility of kriging methods to contiguous accessible areas sampled within our study.
Though additional improvements can be made to our estimates of forest characteristics, tradeoffs between processing, model development, and sampling costs related to improvements in accuracy should be weighed to evaluate the level of precision needed to inform decision making [34]. Given the amount of model error potentially introduced by co-registration errors [20], our EGAMs explained the majority of the variation that can be accounted for within the field data and provide a substantial improvement over previous efforts [5,6]. Additionally, by implementing an ensemble approach to generalized additive models, we were able to capture nonparametric trends in the data while mitigating overfitting and providing a technique to empirically estimate model-based standard errors.
The resultant estimates and standard errors provide the type of information needed to make both fine and coarse grained restoration decisions across the ASGA.
For example, DFT, LLP, Pine and Other BAH and TPH, and standard error raster surfaces, can be combined using a geographic information system and map algebra to locate, prioritize, and quantify overstocked pine areas where longleaf pine trees are present. The resulting surfaces from those spatial analyses can then be used to plan thinning and prescribed fire regimes to restore those areas to healthy longleaf ecosystem stand densities and compositions [35]. Similarly, these same surfaces could be used to locate, prioritize, and quantify areas where tree species other than pine make up the dominant component of BAH (DFT Other class) but still have a remnant pine and longleaf element (hardwood encroachment). Once identified, silviculture prescriptions can be planned and implemented to convert those locations into a healthy longleaf ecosystem. Moreover, those silviculture prescriptions can be designed in such a manner that they can be spatially depicted and coupled with BAH and TPH surfaces and logistic spatial modeling techniques as described in [36] to quantify implementation costs and further prioritize these types of restoration efforts.

Conclusions
Adept forest management requires accurate information concerning the status and distribution of forest resources. To create accurate information pertinent to longleaf pine restoration, we developed procedures to bring multi-temporal images to a common radiometric scale, model nonparametric relationships between remotely sensed and field measured data, and produce spatial depictions of forest cover types, longleaf pine presence, and BAH and TPH by forest cover class. These procedures and sources of data were combined to produce the types of information needed to inform longleaf restoration planning and implementation at planning and tactical spatial scales across a relatively large area in northwestern Florida.