A Forest Attribute Mapping Framework : A Pilot Study in a Northern Boreal Forest , Northwest Territories , Canada

A methods framework is presented that utilizes field plots, airborne light detection and ranging (LiDAR), and spaceborne Geoscience Laser Altimeter System (GLAS) data to estimate forest attributes over a 20 Mha area in Northern Canada. The framework was implemented to scale up forest attribute models from field data to intersecting airborne LiDAR data, and then to GLAS footprints. GLAS data were sequentially filtered and submitted to the k-nearest neighbour (k-NN) imputation algorithm to yield regional estimates of stand height and crown closure at a 30 m resolution. Resulting outputs were assessed against independent airborne LiDAR data to evaluate regional estimates of stand height (mean difference = −1 m, RMSE = 5 m) and crown closure (mean difference = −5%, RMSE = 9%). Additional assessments were performed as a function of dominant vegetation type and ecoregion to further evaluate regional products. These attributes form the primary descriptive structure attributes that are typical of forest inventory mapping programs, and provide insight into how they can be derived in northern boreal regions where field information and physical access is often limited.


Introduction
The health and sustainability of boreal forests are of increasing concern, with evidence that the circumboreal regions of the planet are warming and sensitive to changes in climate [1][2][3].A large portion of Canada and approximately 28% of the world's forests are categorized as boreal [4,5].Evidence of permafrost thaw and changes in forest fire disturbance are being reported in northern boreal regions such as the Taiga Plains [6].Changes to forest structure raises concerns about future delivery of forest ecosystem services, and require a framework to track forest area and extent.A forest inventory is the primary means of acquiring information regarding the structure and condition of a forest [7][8][9].This information has traditionally been based on the interpretation of analogue and digital aerial photographs combined with field and aerial surveys [10][11][12].Such methods are impractical to apply over vast, inaccessible forested areas in northern boreal regions such as the Northwest Territories (NWT) where forests are distributed across 40 Mha of land.Existing inventory data are sparse and inconsistent due to varying vintage, and to changes in data definitions and measurement standards that necessitate an investigation of new data sources and methods [13].This reality is consistent with previous reports that inventory data are often out of date or not available for large regions of the boreal [14].A relatively current, spatially contiguous forest inventory does not exist across the forested extent of the NWT.The most spatially complete vegetation information of the NWT serving as a reconnaissance inventory was completed as part of the Earth Observation for Sustainable Development of Forests (EOSD) project.The goal of the EOSD project was to map land cover over the forested areas of Canada for circa 2000, which included the distribution of four main forest types: treed wetland, coniferous, deciduous, and mixedwood stands [15].In the NWT, 20 Mha of EOSD land cover was updated to circa 2007.Using this updated EOSD land cover as a base, could an approach based on scaling field, multi-sensor remote sensing data and models be used to estimate forest structure as value-added products [16]?Developing such an approach would help meet increasing needs for inventory data that could support forest management in remote areas where such data is sparse.Being able to generate periodic inventories across large areas would also help monitor the state of forested areas across the boreal for multiple objectives such as wood production, climate change mitigation and biodiversity conservation [17].
The estimation of forest structure from airborne light detection and ranging (LiDAR) and spaceborne Geoscience Laser Altimeter System (GLAS) data has been the subject of considerable research [18][19][20][21][22][23][24].When combined, spatially extensive spaceborne observations, airborne LiDAR data samples, and in situ field plot data become frameworks that scale parameters from one data source to the next that result in regionally mapped products.Similar approaches have been adopted in various forms by many for regional and national mapping purposes [25][26][27][28][29][30][31]; moreover, such approaches have recently been favoured by government agencies, such as Australia's Terrestrial Ecosystem Research Network [32].
The objectives of this study were: (a) to develop a methods framework that uses field, airborne, and spaceborne remote sensing data to estimate stand height and crown closure; and (b) to apply the methods framework to generate spatially-contiguous maps of these attributes using the k-Nearest Neighbour (k-NN) statistical method of imputation.Stand height and crown closure are fundamental attributes of forest structure that are used to describe forest stands in inventory mapping [33,34], but are not available across the forested areas of the NWT [13].This study focused on estimating and mapping these two structural elements of a forest stand over a northern boreal environment in the NWT.

Study Area
The region of interest was an eight Landsat Thematic Mapper scene area, occupying approximately 20 Mha, covering the southern portion of the Canadian Taiga Plains Ecozone in the NWT, Yukon Territories (YT), Alberta (AB), and British Columbia (BC) (Figure 1).The Taiga Plains contains conifer, deciduous and mixedwood forest stands consisting of black spruce (Picea mariana), white spruce (Picea glauca), jack pine (Pinus banksiana), larch (Larix laricina), trembling aspen (Populus tremuloides), balsam poplar (Populus balsamifera) and white birch (Betula papyrifera) tree species [35].The Taiga Plains Ecozone is divided into ecoregions which describe vegetation and topographic patterns in a regional climatic framework.The various ecoregions further divide the landscape relative to physiography (i.e., discontinuous permafrost, plains, plateaus, and hill systems) and geographic features (i.e., major rivers or lakes) [35,36].
Mean summer temperatures in the region range from 7 • C in the north to 14 • C in the south, whereas mean winter temperatures of −15 • C are common in the south, compared to −26 • C in northern areas.Snow cover and freshwater ice typically last for six to eight months, and discontinuous permafrost is widespread throughout the area.Although wetlands and bogs are common throughout, the area receives little rainfall, between 200 mm and 500 mm annually [35].Multiple sources of data were acquired across the region, including in situ field plots, airborne LiDAR point cloud data, and GLAS waveform data from which to develop an integrative scaling approach.

Field Data
A total of 47 field plots established in the summer of 2006 near Fort Simpson were employed in this study, where diameter at breast height (DBH), tree height, species composition, and crown closure attributes were measured within each plot.The field plots were 400 m 2 in size (20 m × 20 m squares) and randomly established across a range of coniferous, deciduous, and mixedwood forest types.Conditions for plot location included accessibility (i.e., within 1 km of a road), and spacing at least 3 Landsat pixels apart (i.e., plots established were from 95 m to 1.4 km apart).Forty-one of the 47 plots fell within the airborne LiDAR survey extent, of which three were removed because of inadequate data, resulting in 38 field plots for airborne LiDAR model development (Figure 1b).
Stand height was defined as the mean height of all trees greater than or equal to Lorey's mean height within the plot.Lorey's mean height is a function of the mean basal area, whereby individual trees were weighted in proportion to their cross-sectional area at a height of 1.3 m [37].Only trees with a diameter ≥ 5 cm were used to compute Lorey's mean height.The crown closure of each plot was based on the average of 5 densiometer counts recorded at plot centre and the plot corners using the method described by Lemmon [38].

Airborne LiDAR Data
Two airborne LiDAR surveys were considered under the concept of LiDAR plots [39].First, on 15 August 2007, a 310 km 2 area airborne LiDAR survey was flown near Fort Simpson to cover the geographic extent of the field plot data (Figure 1b) (hereafter referred to as areal LiDAR).These data were acquired by an Optech ALTM-3100 sensor used onboard a de Havilland DHC-6 Twin Otter at an altitude range between 550 and 650 m above ground level.Data were acquired with a pulse repetition frequency (PRF) of 33 kHz, scan rate of 38 Hz, and a scan angle of ±20 • which resulted in a sampling density of at least 1 return per square meter.During the survey, the laser power supply was degraded due to a hardware fault resulting in a low density of ground returns over the study area.The areal LiDAR and accompanying field data were utilized in this study to develop LiDAR-based models of stand height and crown closure.
A second airborne LiDAR survey was flown in transects across Canada (hereafter referred to as transect LiDAR) between 14 June and 20 August 2010 using the same Optech ALTM-3100 onboard a twin engine PA-31 Piper Navajo [39,40].The transect data were acquired as part of a national mission where (full or portions of) 9 transects fell within the region of interest (Figure 1), all with similar survey configurations: above ground altitude from 600 to 1900 m, PRF from 50 to 70 kHz, and scan angle from 15 • to 20 • [40].While some variation in the LiDAR point cloud distributions is expected from different acquisition configurations, their effects were assumed to be minimal in the determination of canopy height and cover metrics for the noted flight parameters [41][42][43].Transect data have been similarly utilized in studies to estimate, calibrate and validate several forest inventory attributes such as canopy height, basal area, gross stem volume, canopy density indices, and aboveground biomass [39,44,45].A selected subset of this dataset was used as an independent validation source.LiDAR point clouds from both surveys were processed to 25 m × 25 m raster cells to generate a suite of metrics using FUSION [46].

GLAS Data
Throughout this study, release 33 of GLAS global altimetry raw waveform data (GLA01), waveform-based range corrections (GLA05), and global land surface altimetry data (GLA14) [47,48] were employed to model stand height and crown closure.Both GLA01 and GLA05 data were used to calculate metrics directly from the waveform energy profile, whereas GLA14 data were used for quality control and extracting laser footprint geometries.GLAS data were acquired under laser campaigns, defined as temporally discrete, approximately 1 month long periods of data acquisition.Each laser campaign is denoted as LXY, where X indicates the laser number fired (1, 2, or 3), and Y is a letter denoting the times that laser has acquired data (e.g., L2C represents the third data acquisition from laser 2).Of all the available laser campaigns, data were restricted to L2A and L3A only.These campaign footprints were selected as they were the only data to coincide within the areal LiDAR geographic extent, were acquired at similar phenological states to each other (fall), and contained footprints that were not acquired during winter (thus minimizing the influence of snow cover) [49].L2A data were acquired between 25 September and 19 November 2003 with footprints exhibiting a semi-major axis of 99.9 m and eccentricity of 0.88.L3A data were acquired almost a year later between 3 October and 8 November 2004, with a semi-major axis of 55.8 m, and eccentricity of 0.57.A total of 15,967 L2A, and 13,301 L3A footprints were available before quality control measures were executed.A more in depth description of the GLAS data acquisition strategy, GLAS mission properties and their suitability for forest mapping is available in the literature [50][51][52][53].

Methods
The developed framework (Figure 2) illustrates how forest attribute models were generated between field and airborne LiDAR data, which in turn were used to support development of GLAS modelled attributes.GLAS attributes were filtered and each level of quality controlled data was utilized as inputs for the k-NN imputation method.The accuracy of regional outputs from each filter was assessed against independent airborne LiDAR data, where available.The standard deviation of each output was calculated and used in conjunction with empirical distribution statistics (mean and standard deviation) from field plot, airborne LiDAR, and GLAS modelled attributes to propagate uncertainty to regional estimates via Monte Carlo uncertainty propagation techniques.

Airborne LiDAR Models
Models of stand height and crown closure were developed between the areal LiDAR and coincident field plot measurements.The models were based on airborne LiDAR height and point return ratio metrics derived from FUSION [46] as predictors in multiple linear regression and nonlinear multiplicative power models to find the best fit throughout the range of data.The model fit statistics computed for each model included the coefficient of multiple determination, adjusted R 2 (Adj.R 2 ), root mean square error (RMSE), and cross-validation mean prediction error (MPE) between each of the fitted statistical models.If there were no numerical differences between model fit statistics, the model exhibiting the greatest predictive range was favoured.For each FUSION-computed LiDAR metric, a constant variance Breusch-Pagan test for heteroscedasticity was employed to determine if the statistical model residual variance was dependent on the independent variables used in the model.Rejection of the hypothesis of homogeneity of variance would result in the use of a single independent variable in a functional form such as a power model for estimating the forest structure attribute.
LiDAR point clouds >2 m above ground were used to derive percentile metrics.For example, p95 represented the 95th percentile accounting for 95% of LiDAR return points >2 m above ground.High percentile metrics p80, p90, p95, and p100 were each investigated as linear regression model predictors of stand height, similar to other reported studies [44,54].Stand height estimates from the resulting function were submitted to a 5-fold cross-validation to estimate the mean prediction error.The 5-fold cross validation consisted of random sub-setting of the field-areal LiDAR data points into five mutually exclusive 80% training to 20% validation data splits.The resulting mean prediction error was calculated as the average of the results from the 5 folds to minimize random bias that may arise from a single computation of a single random selection.The resulting model was applied to the appropriate areal LiDAR metric to produce a 25 m resolution raster of stand height across the areal LiDAR area (Figure 1b).
Crown closure models were based on an apparent foliage profile approach [55-57], which requires the estimation of the probability of a gap from the top of the canopy to a given height class z (P gap ), 2 m above the ground.P gap is estimated by computing one minus the quotient of the sum of the total number of hits down to z (z j>z ), and the total number of independent airborne LiDAR shots (N All ) which includes both ground and canopy returns (Equation ( 1)).The cumulative projected foliage area index (L z ) was computed over a range from 2 m above ground to the top of the canopy (Equation ( 2)).The parameter L z was used in a power function as a predictor of crown closure.Areal LiDAR models were developed based on all coincident (38) field plots, and similarly submitted to a 5-fold cross-validation from which the MPE was computed.
There were no field measurements across the transect LiDAR that could support stand height and crown closure model development.Within the overlap region between the areal LiDAR and the transect LiDAR surveys, however, there was a linear association and a statistically significant correlation between the transect LiDAR p95 metric and estimated stand height from the areal LiDAR survey (i.e., r = 0.95, p = 0.00).Based on this result and that LiDAR provides accurate and consistent estimates of height [42], the stand height model developed from the areal LiDAR was applied to the same metric across the transect LiDAR data.Using a similar logic, the model form for estimating crown closure from the areal LiDAR was applied to the transect LiDAR data.This approach was similarly employed by Wulder et al. [39] in the context of a LiDAR plots approach to meet the needs for forest plot information in support of large-scale programs over ecologically similar regions.Descriptive statistics were used to compare stand height and crown closure from the transect LiDAR survey to those from overlapping GLAS footprints.

Determining Optimal GLAS Quality Control
The accuracy of canopy attributes is influenced by GLAS waveform data quality that in turn, is influenced by cloud contamination, atmospheric saturation, terrain slope [58][59][60][61], and the presence of snow [49].To control for these factors, the effect of applying successive filters was evaluated to determine the optimal set of filters to apply to the GLAS waveforms acquired over the NWT study area.
GLAS height percentiles were selected as response variables to determine the optimum filter set as they are strong predictors of stand height [21, 53,62].A set of GLAS filters was assessed against coincident transect LiDAR data, where within footprint transect LiDAR estimates of stand height were calculated as the area weighted mean of intersecting transect LiDAR cells within GLAS footprint perimeters.The initial GLAS model of stand height, derived from areal LiDAR data, was based on snow free footprints only, as snow is the greatest contributor of uncertainty in such estimates [49].Five filter levels (labelled Level 1 to Level 5; Table 1) were investigated, each applied in succession.Level 1 GLAS footprints were unfiltered, whereas globally applicable instrument and local height anomaly criteria were applied at Level 2. Mean footprint slope > 5 • and any associated vertical bias (the product of the mean footprint diameter and the tangent of the estimated slope) that exceeded 25% of initially estimated stand height [62] were removed to represent Level 3 filtering.Level 4 excluded footprints potentially affected by snow, based on Canadian historical climate data [63], as a function of laser campaign appropriate cut-off dates.As a result, GLAS footprints acquired after 22 October 2003 for L2A, and 15 October 2004 for L3A were removed.Level 5 data were filtered by GLAS laser pulse emission energy, rationalized by the pursuit of maximizing signal-to-noise ratios.That is, larger emission energies typically yield a greater signal-to-noise ratios, where the converse is true for lower emission energies making the latter more susceptible to waveform misinterpretation [53].To minimize the presence of such waveforms, those deemed to exhibit low emission energies (<28 mJ) were removed from analysis [53].Table 1 notes the details of each filter, the number of GLAS footprints, and associated percent changes for each level of quality control.
Table 1.Brief description of sequentially applied filters to GLAS footprints within the study region.The number of footprints removed (N − ) and remaining (N) after filter application, and the percent difference in the number of available footprints from filter Level 1 (∆% L1 ) and between each filter (∆% Diff ) are also noted.The number of GLAS footprints that were spatially coincident with the transect LiDAR extent decreases with each incrementing filtering level.To avoid possible bias, the largest number of coincident footprints common across all filter levels was sampled from the population of coincident footprints from each filter level.For example, after other filter applications, the Level 5 filter (lowest) population exhibited only 20 coincident footprints resulting in 20 footprints also being sampled from the populations of filter levels 1-4.The agreement between the GLAS and transect LiDAR stand height estimates was determined by the RMSE and mean absolute difference values following each level of filtering.The optimal filter set was determined when the RMSE and mean absolute difference values remained constant before and after the application of an individual filter.

GLAS Model Development
The areal LiDAR data were the only available data source to develop GLAS models of stand height and crown closure.This overlap provided data for the initial model of stand height to base assessment of GLAS filter optimization criteria (described above).As this initial model was based on data that were subject to the removal of footprints that may have snow, its use for estimating stand height from GLAS with the highest achievable accuracy was unknown.The GLAS footprints utilized in initial GLAS model development were revisited after the optimization criteria for GLAS data was defined from across the study area to determine if they required revision.After optimization, an additional footprint was removed from analysis due to its proximity (< GLAS footprint diameter) to a relatively large waterbody ( than GLAS footprint diameter), resulting in a total of 43 airborne LiDAR coincident footprints remaining for model development of GLAS stand height and crown closure.
The GLAS metrics evaluated as predictors of stand height included p85, p90, p95 and p100, and these were calculated based on ground elevations identified by a standard parametrization of the Gaussian distribution fit to the last peak which effectively isolates the ground return [47,64].Previous studies report lower height percentiles than p85 are less correlated with stand height, and were therefore not considered in model development [53].The optimal metric for modelling stand height was selected based on the same criteria used for areal LiDAR model development (i.e., maximizes adjusted R 2 , minimizes RMSE and MPE, but favours the broadest predictive range in the event that no differences are noted between model fit statistics for any two or more of the tested metrics).Intersecting transect LiDAR served as independent data to validate GLAS models.For crown closure, the GLAS equivalent (in reference to airborne LiDAR) of Lz was calculated as the ratio of the cumulative sum of waveform energy from 2 m above the identified ground elevation to the top of the canopy.While theoretically similar to airborne LiDAR return ratios, GLAS energy ratios require adjustment to account for the sensitivity of waveforms to apparent canopy and/or ground reflectance differences [58,64].To negate these energy profile imbalances, waveform ground returns were scaled as a function of land cover [65].The portion of the waveform to be scaled was identified using a standard parametrization of the Gaussian distribution fit to the last peak.The most suitable scaling factors were applied based on the dominant vegetation type within each GLAS footprint, where the scaling factors 1, 0.25, and 2 were applied to coniferous, deciduous, and mixedwood dominated footprints, respectively [65].

Regional Mapping
k-NN is an imputation method that predicts a numerical target based on predictor data as a function of similarity distance measures.The method stores known cases of target and predictor data and utilises these cases to best match externally sourced predictor data to predict a numerical response.k-NN was selected in this study as it is widely used in Europe, the United States and Canada as a tool for forest inventory mapping as a function of multi-source geospatial data [27,[66][67][68].The k-NN method was used to translate the point-based GLAS-modelled estimates of stand height and crown closure to a 30 m grid [67,69].GLAS estimated stand height and crown closure were model response variables estimated as a function of an optimal set of a possible 11 coincident 30 m raster products (bilinearly resampled from native resolution where necessary) assigned as predictor variables (Table 2).The categorical variable, EOSD, was coded sequentially from 1 to 3, representing coniferous, mixedwood, and deciduous land cover classes, respectively.The value of each predictor variable at the individual GLAS points was calculated as the area weighted mean of the predictor cell values within the GLAS footprint perimeter [28,31].Distance weighting and number of neighbours are user input parameters in the k-NN method that can significantly change model outputs.An optimization process adapted from McRoberts et al. [67] was used to define the optimal parameters for running the k-NN algorithm.Combinations of the neighbour weighting kernel, a Minkowski distance parameter, number of neighbours (k), and predictor variable selection were varied to derive the model containing the largest overall model accuracy.The "kknn" package from the R project was used to investigate 8 neighbour weighting kernels [86] in combination with values of k ranging from 1 to 10, and different predictor combinations (from all combinations of 2, and 3 predictors etc. up to all predictors).The distance metric (D), measured within the n-dimensional predictor (feature) space, was evaluated using three unique Minkowski distance (Equation (3)) parameter (λ) values, where λ = 1 corresponds to Manhattan distance, λ = 2 represents common Euclidean distance, and λ = 3 approximates Chebyshev distance.The number of dimensions in the space that X and Y occupy was represented by n, with x i and y i indicating their respective dimensional coordinates [87].
The optimal combination of both input parameters and predictor variables used in the k-NN process was selected based on which unique combination produced the smallest mean square error as calculated from a leave-one-out cross validation procedure [86,88].Preliminary results indicated that the optimal value of k exhibiting the lowest mean square error was large given the large amount of data available for model training (Figures S1 and S2).Determining the optimal model is also affected by the strength of the relationship between the response and predictor variables.This expectedly increased algorithm complexity and computational intensity as noted by McRoberts et al. [67].As a mitigation strategy, we followed the suggestion of McRoberts et al. [66] and selected a smaller value of k that would yield a mean square error within 5% of the lowest mean square error obtained (at larger values of k).The optimization techniques for k-NN imputation were applied separately for stand height and crown closure for the optimal set of quality controlled GLAS footprint data.We elected not to utilize k-NN's multivariate predictive capabilities in order to optimize individual attribute accuracies.

Uncertainties
Uncertainties inherent in regional forest attribute estimates may propagate from errors in field measurements and acquisition of airborne LiDAR and GLAS footprint data.The propagation of uncertainty is undertaken at each modelling step, and is a function of the data quality used in model development at each step [89].It is challenging to empirically quantify propagated uncertainty at each modelling step due to variations in sample sizes at each modelling step, and varying levels of spatial coincidence, however, Monte Carlo methods can be utilised to address this problem [90,91].The Monte Carlo method of error propagation consists of repeated calculations of a quantity based on random inputs within the limits of data precision.The distribution of the calculated quantity then illustrates the effects of the imprecision of the data [92].Monte Carlo methods rely on a "black box" approach to model equation systems, therefore the need to solve complex first and second order derivatives, common in other methods, is negated [93].Furthermore, Monte Carlo simulations avoid errors associated with the linearization of otherwise non-linear models [94], a concern when model forms are highly non-linear, as is the case for crown closure.It is important to note Monte Carlo simulations did not quantify unknown uncertainties such as the temporal mismatch between datasets.
Monte Carlo simulations require the knowledge of the data distribution and its associated mean and standard deviation.Given a known data distribution, it is possible to relate the model response variable (Y) to some given model dependencies (x and p) via a simple function (Equation ( 4)).
To simulate uncertainties in Y, 1000 random values of x and p were generated from their respective empirical data distributions, and Y calculated for each.This yields a distribution of 1000 Y values from which the mean and standard deviation could be calculated.In this study f(x, p) took the form of a single or nested (i.e., to propagate from field to airborne LiDAR to GLAS data) linear (stand height) or power (crown closure) function.Given uncertainties were propagated from field plot scale to GLAS scale (and to regional scale, as explained below), an equality of variance test was applied to both attributes that verified the GLAS data and field data distributions were statistically similar (p = 0.56 for stand height and p = 0.10 for crown closure).
Uncertainty propagation through the k-NN algorithm was conducted on a unique pixel basis, where all k nearest neighbors selected from the reference set, used to estimate a response for a given unique cell, were used to calculate the pixel-level standard deviation.Estimates of pixel-level statistics allowed the propagation of uncertainty from GLAS data to each cell via MC simulations, where each cell's uncertainty represented an accumulated estimate of the standard deviation propagated from field plot to regional k-NN mapped products via airborne LiDAR, and GLAS data.

Regional Attribute Assessment
Regionally mapped products were compared to the transect LiDAR.A 1% sample of all cells from the transect LiDAR were randomly selected and filtered for burn scars, resulting in 11,401 useable cells to assess k-NN outputs.To minimize the probability of spatial covariance, the minimum distance between sampled cells was defined >75 m (i.e., three times the size of a single cell).Regional attribute maps were compared against the transect LiDAR as a function of within cell dominant forest vegetation type (coniferous, deciduous, or mixedwood) and ecoregion.We chose not to conduct statistical similarity tests (such as a paired t-test) because the large number of samples would result in high statistical power making them sensitive to very small changes which increases the probability of falsely rejecting the null hypothesis [95].We utilized the mean difference between the regionally mapped k-NN products and transect LiDAR data (calculated as k-NN minus transect LiDAR) and summary statistics such as the mean absolute difference, and the RMSE to assess the regionally mapped k-NN products.

Airborne LiDAR Models
A simple linear function based on the areal LiDAR p95 metric was selected as the best predictor of field measured stand height (Table 3).A power function based on Lz was used as a predictor of crown closure (Table 3).Despite the inherent challenge associated with measuring and modelling crown closure from above-and below-crown perspectives, and the limited data range attained from the field plots (28% based on plots sampled), the model Adj.R 2 of 0.63 and RMSE of 5% were comparable to what has been reported in the literature [96,97].Model fit statistics for both stand height and crown closure were indicative of model performance as noted by the similarity between the magnitudes of RMSE and the resulting cross-validation mean prediction error.

GLAS Models
With increased filtering, initial GLAS models of stand height better represented transect LiDAR equivalents with no appreciable gains past Level 4 (Figure 3).No data were removed from Level 4 to Level 5 filtering, and any difference in the presented statistics between filter Levels 4 and 5 were attributed to the selection of different data samples within the same population.As the difference between filter Levels 4 and 5 was negligible, Level 4 quality control yielded an optimal set of GLAS data for regional mapping of stand height and crown closure by k-NN.The GLAS p85 metric demonstrated the strongest association with areal LiDAR stand heights (Table 4) by linear regression.Similar to areal LiDAR modelled crown closure, the GLAS equivalent was described by a simple power function utilizing Lz values as the dependent variable (Table 4).The similarity between RMSE and cross-validation mean prediction error values demonstrate that model fit statistics were indicative of model performance.
There were 55 independent GLAS footprints that coincided with the transect LiDAR used for independent validation.From these, stand height was linearly related and statistically correlated with the transect LiDAR (Figure 4a; r = 0.95, p = 0.00).A similar linear relationship and statistical correlation was determined for crown closure (Figure 4b; r = 0.89, p = 0.00).The descriptive statistics between GLAS and transect LiDAR were relatively similar for both stand height and crown closure (Table 5), although the minimum and mean values for GLAS crown closure were 13% and 11% smaller than transect LiDAR equivalents, respectively.On average, GLAS crown closure was lower and within 5% of the transect LiDAR, a value that is within the precision of densiometer measurement in the field and the areal LiDAR energy degradation that may have affected model transfer.The crown closure estimates were equally varied with estimates occurring over a very similar range (Table 5).GLAS data provided a reasonable estimate of both stand height and crown closure based on transect LiDAR as a comparison dataset.Based on these results, GLAS models of stand height and crown closure were scaled spatially across the study area using the k-NN algorithm.

Regional Mapping and Uncertainties
The optimal number of neighbours (k), Minkowski distance parameter, weighting kernel, and predictor set were determined by iterative k-NN imputation and mean square error analysis using Level 4 quality controlled GLAS data (Table 6; for a visual analysis of k-NN optimization for stand height and crown closure, see Figures S1 and S2).Overall, optimal k-NN parameters and predictor sets were the same for stand height and crown closure excluding the Minkowski distance parameter and two Landsat bands (Table 6).The inverse distance weighting kernel and a relatively low number of neighbours (k = 5) are typical of many k-NN studies [68].Of the input predictor variables (Table 2), those selected by the model were considered optimal because they minimized error in the estimates (Table 6).
The output ranges of regionally mapped stand height (Figure 5a) and crown closure (Figure 5b) were similar to output ranges for GLAS k-NN inputs, the areal LiDAR, and the field plot data.The regional crown closure ranges were between 20% and 70%, which resembled the range of the GLAS footprints used to drive model imputation, and the independent intersecting transect LiDAR data.The largest values depicted in the maps of Monte Carlo-simulated uncertainties (Figure 5c,d) corresponded with the largest values of imputed stand height (Figure 5a) and crown closure (Figure 5b), respectively.The distribution of k-NN mapped stand height tended to overestimate at lower values (between 6 m and 17 m) and underestimate at higher values (>18 m) with respect to intersecting transect LiDAR (Figure 6a).A similar (but amplified) trend was noted for crown closure (Figure 6b).The distribution of k-NN crown closure appears to differ in shape with respect to transect LiDAR.This suggests that mapped crown closure was frequently underestimated between 50% and 65%, and overestimated from 30% to 50%.Similar trends were observed for attribute mapping within Canada using the k-NN algorithm [27].Monte Carlo-simulated uncertainties associated with mapped stand height (Figure 6c) followed a more linear trend compared to those associated with crown closure (Figure 6d), which appears more sporadic.Table 6.Finalized k-NN model parameters for stand height and crown closure, derived from initial optimal parameters.Bracketed values of k are those deemed optimal, prior to 5% mean squared error adjustment.Note, λ refers to the Minkowski distance parameter (Equation ( 3)).

Regional Attribute Assessment
Based on forest land cover type and average mean difference values, coniferous was most accurate for stand height and deciduous was most accurate for crown closure (Table 7).Stand height was equally underestimated for deciduous (i.e., −2.4 m) and mixedwood (i.e., −2.2 m) stands with the average difference in crown closure being largest for mixedwood stands (i.e., −7.5%).This latter value was not unexpected given that mixedwood stands tend to be the most variable ranging from conifer-dominant to deciduous-dominant species compositions.Overall, stand height differences were within 3 m and crown closure within 8% of independent, transect LiDAR derived values.Notwithstanding, stand height (percent difference −7.1%) was observed to be better characterized than crown closure (percent difference −11.1%), an expected result given the challenges and inherent uncertainties in measuring crown closure in the field [98].
Average differences in height were underestimated in 73% (16 of 22 ) of the ecoregions with 27% (6 of 22 ecoregions) exhibiting a difference larger than ±2 m.Crown closure differences were, on average, underestimated with values from 4% to 7% lower than transect LiDAR estimated values across the three species land cover types (Table 7).The largest mean differences (for both stand height and crown closure) occurred across more westerly ecoregions such as the South Mackenzie Plain and the Liard Upland and Liard Plain, suspected to occur as a result of increased topographic complexity (similarly observed by Beaudoin et al. [27]) and greater heterogeneity in forest type.This is further supported by the mean absolute difference and RMSE values, where larger magnitudes tend to occur for ecoregions with less homogeneity (Table 7).Some of the differences were attributed to relatively low sample sizes within some ecoregions (i.e., Horn Plateau, Nahanni Range, and Sibbeston Lake Plain).

Discussion
This study developed and applied a methods framework (Figure 2) that combined multi-source data from field plots, airborne and spaceborne LiDAR, optical remote sensing, climate and physiographic data to map two forest attributes, stand height and crown closure over a northern boreal region.These attributes are often used in characterizing forest stands [33], and used to derive other attributes such as volume and above ground biomass [14,71,99].While the concept of combining multi-source data in spatial mapping is not new, limited data availability across a remote, northern boreal region posed unique challenges over a landscape with considerable diversity in stand structure and composition.A key element of our study was to use airborne and spaceborne LiDAR to spatially extend forest attribute estimates from a limited sample of field plots.Four questions relevant to the methods framework and results achieved that merit discussion are: (i) How did the methods framework compare to published studies?(ii) How did the accuracy and uncertainty estimates achieved compare to other studies?(iii) What are the limitations and opportunities for improvement?(iv) What are the potential applications of the products generated?

How Did the Methods Framework Compare to Published Studies?
Remote sensing-based forest inventories that couple field with Landsat Thematic Mapper and other multi-source spatial data with or without nearest neighbour techniques have become increasingly popular to predict and map attributes such as stand height, timber volume and aboveground biomass [68, [100][101][102][103]. Four examples spanning an 11 year period from 2007 to 2018 were selected from the literature [27,28,31,104].First, Gjertsen [104] generated a multi-source forest inventory in Finland based on the k-NN method using data from field plots, land cover maps and Landsat satellite image data to map selected forest attributes.In Finland, the national forest inventory is based on a regular 3 km × 3 km network of permanent sample plots of which 20% is re-measured every five years.The benefit of such a dense network is that all forest conditions to be mapped would be represented in the dataset [104].National forest inventory field plot networks in European forests such as Finland are at much higher densities than those in Canada, particularly in northern boreal regions.
Second, Beaudoin et al. [27] made use of national forest inventory plots with the k-NN method to nationally map 127 stand-level attributes across Canada.The National Forest Inventory in Canada consists of a 4 km 2 photo plot on a regular 20 km x 20 km network, of which 10% of photo plots are sampled in the field [9].Photo plots with detailed forest attributes were not available above 60 • North latitude, resulting in product reliability being less certain at higher latitudes.Contiguous maps were produced using MODIS (Moderate Resolution Imaging Spectroradiometer) data, limiting product resolution to 250 m.There was an acknowledged need to improve maps generated by Beaudoin et al. [27] in northern taiga boreal ecozones, which was in part the rationale for our study.
Third, Margolis et al. [28] employed a three-phase sampling approach to estimate total above ground biomass from field plot, airborne profiling LiDAR and GLAS data as samples within broad strata defined by land cover and ecoregion across the North American boreal forest.Their approach was not probability-based, but rather based on ground plots that were as representative as possible across the area of interest.The GLAS orbits from which the footprint samples were derived were assumed to be randomly acquired resulting in multiple samples within each defined stratum.Aside from the nuances between airborne profiling and scanning LiDAR, differences with the current study were that a spatial imputation algorithm to map forest attributes was not employed.
Fourth, a national transect of airborne LiDAR [39] was used as calibration and validation data in combination with Landsat satellite reflectance data, geographic position and topographic data to map 10 forest structure attributes using a Random Forest nearest neighbour imputation approach [31].More than 80,000 airborne LiDAR plots were used to generate and validate attribute models that attained coefficient of determination (R 2 ) values ranging from 0.49 to 0.61 with mapping being done across the Canadian boreal forest.Large area map accuracies in terms of average RMSE normalized by the mean of observations were found comparable to other reported studies [27,68].The study was novel considering its national geographic scope across Canada.Of particular interest in our study was to undertake a more regional scale inventory by anchoring local field plot measurements and scaling of LiDAR data that would be specific to the ecological distribution of forests in the southern NWT.The spatial representativeness of the LiDAR data has been acknowledged as an important element for generating locally accurate maps of forest attributes [39].
The similarity across these studies was the use of multi-source data in generating maps of forest attributes.How such an approach was implemented was in part attributed to data availability, forest conditions and spatial extent over which the mapping was to be undertaken.In this study, the limited number of field plots necessitated a need to extend the sample distribution through modelling.The airborne LiDAR spatially extended the field plots that in turn, were helpful in further extending the sample coverage through modelled relationships with GLAS data.The GLAS data then became the spatially distributed dependent variable data as input to the imputation functions using k-NN.Land cover, physiographic, climatic and Landsat data served as predictor variables in a k-NN mapping framework (Figure 2) to yield 30 m resolution regional products.

How Did the Accuracy and Uncertainty Estimates Achieved Compare to Other Studies?
An accuracy assessment and uncertainty analysis provided insights into the quality of the derived estimates and maps of stand height and crown closure (Tables 3 and 5 and Figures 4-6).Broad assessment over the study area indicates results were broadly similar to other large-area mapping studies for stand height.For example, Bolton et al. [105] compared two global maps of variants of GLAS-based canopy height [62,106] with 95th percentile heights from the Canadian national airborne LiDAR transect as a function of Canadian ecozone.The results of Bolton et al. [105] indicate that relative to the products of Lefsky [106], and Simard et al. [62] (RMSE = 6.68 m, and 5.85 m, respectively), the current study demonstrates improved accuracies of stand height over the Taiga Plains.Caveats are required for more in-depth comparisons as Lefsky [106] mapped Lorey's height at a 500 m resolution, whereas Simard et al. [62] mapped RH100 (GLAS 100th percentile) at an approximate 1 km resolution.The relative accuracies of the current study's 30 m stand height product are augmented upon acknowledging that the accuracy of forest variables at the pixel level may be low with imputing methods such as k-NN, but are higher when aggregated to the stand level [104].A variant of the presented stand height product at a similar 30 m resolution across the Canadian boreal region was generated by Matasci et al. [31].Between k-NN predictions and transect LiDAR stand height estimates the coefficient of determination (R 2 = 0.34) was calculated to facilitate a comparison with the 95th percentile height produced by Matasci et al. [31] over the Taiga Plains (approximately R 2 = 0.48).It is challenging, however, to compare any single statistical metric at the scale of the Taiga Plains Ecozone that, in itself, spans a range of climatic and physiographic conditions that encompasses multiple ecoregions (34, Table 7).For crown closure, there were no directly comparable studies.At the national level, Beaudoin et al. [27] reported an R 2 = 0.64 for 6 well-inventoried ecozones and two partly-inventoried ecozones (Mixedwood Plains and Taiga Cordillera) using National Forest Inventory ground plots and 250 m MODIS data.Our result was an R 2 = 0.34 representing a portion of the Taiga Plains, a much more poorly-inventoried ecozone using data that was affected by the propagation of the degraded laser energies experienced during data acquisition.Future field and airborne data acquisition are needed to support more definitive assessments.
In this study, the distribution of stand height and crown closure was within 3 m and 8%, respectively, between k-NN and independent transect LiDAR data when assessed by broad species and Taiga Plains Ecoregion, demonstrating consistency across different spatial extents regardless of sample representation [28,107].Land cover analysis indicated mean differences (between k-NN and transect LiDAR assessment data) were smallest for coniferous vegetation compared to deciduous, and mixedwood.This was expected as the proportion of data available for model development was dominated by coniferous data, whereas (relatively) few data were available for deciduous and mixedwood vegetation.This distribution was typical of the study area, resulting in more coniferous data being available for model training.This predominantly coniferous dataset provided insight into possible modifications to sample each vegetation land cover type more equally in future modelling with the need to ensure representivity.In particular, future modelling could benefit from increased sampling of a wider range of deciduous and mixedwood stands to increase their characterisation accuracies.Ecoregion analysis indicates broad similarity in mean differences for both stand height and crown closure across all ecoregions within the study area.However, greater differences tended to occur for Ecoregions situated in the western reaches of the study area which are increasingly mountainous with respect to the majority of the study area.Terrain complexity can create challenges concerning the analysis and interpretation of GLAS waveforms as return signals can be reflected from similar elevations causing signal mixing [24].
The spatial estimation of uncertainties as a function of each attribute (Figure 5c,d) aids in the identification of model limitations.Reported uncertainties represent contributions from model transfer between data sources (field plots, areal LiDAR, and GLAS), and the range of predictions from the k-NN model.However, where a primary contributor of uncertainty was expected through product assessment, multiple sources of unquantifiable uncertainty exist.The stand structure models developed for the areal LiDAR were compromised when applied to the transect LiDAR (used for assessment purposes) due to degraded laser energies experienced during the acquisition of the former.This resulted in inherent differences in the calculated stand attributes across the two LiDAR datasets because of differences in their relative point cloud distributions.Uncertainty from temporal disparity between model development and assessment data sources was not a key contributor.Although these data sources represent slightly different forest conditions relative to each other, the influence of a seven-year disparity between datasets is considered minimal as their "shelf-life" is expected to exceed ten years [108], considering the general rates of net primary productivity of forests in northern boreal regions [109].
It was not possible to quantify uncertainties from only having coincident field and areal LiDAR data from which to model and map stand attributes across the entire study area.An assumption was necessary whereby the spatially limited areal LiDAR data used to develop forest attribute models, was considered representative over the entire study area.Other possible sources of uncertainty in the current approach can be attributed to misrepresentations in the utilised ancillary data (e.g., EOSD land cover misclassification, elevation artifacts) and airborne LiDAR and GLAS geolocation errors [110].Furthermore, the differences between airborne multiple point discrete return and spaceborne continuous waveform profiling data formats have been noted to produce different variances in attribute estimates for boreal forests [111].Typically, profiling formats reduce variability in regression models due to the additional information afforded by such data, however, this effect is uncertain for the larger footprint of GLAS due to possible aggregation effects.
Despite the inherent uncertainties in scaling from field and airborne to satellite LiDAR, the results attained demonstrate the viability of an inventory mapping approach over a northern boreal region where existing inventory data is sparse and of varying vintage.Moreover, the uncertainty exercise adds value through its provision of confidence limits to forest inventory data that acts as a guide for future mission planning to acquire additional data that could improve model development and forest inventory mapping accuracy.On reflection, the reported values are expected to represent a near upper limit of uncertainties.This expectation is driven by the large inherent differences between the development and assessment data sources which are known to artificially inflate discrepancies.

What Are the Limitations and Opportunities for Improvement?
The k-NN estimates of both stand height and crown closure were constrained by the input range of the GLAS data relative to the full extent of landscape conditions represented in the study area.While the inclusion of additional GLAS laser campaigns may have mitigated this issue, most GLAS campaigns were acquired under winter conditions where vegetation phenology and snow cover precluded their use.Utilizing a fundamentally more accurate data source (e.g., airborne LiDAR data) in k-NN models may have improved regional estimates, as the quality of imputed products is proportional to the quality of inputs.However, it is uncertain if the spatially reduced representation of conditions can overcome the accuracies achieved compared to a more spatially explicit dataset such as GLAS, notwithstanding the quality control challenges associated with the latter.The challenges associated with the use of GLAS data in this study prompted the development of a quality control ruleset of filtering that resulted in determining the optimal level for attribute modelling purposes.This approach warrants its utility in regional mapping of stand attributes across high latitude boreal regions [112].
The intrinsic limitations associated with optical data (utilised as predictor data) may have contributed to the discrepancy between stand height and crown closure observations.Other than in limited cases, passive optical data are generally poor predictors of stand height and stand structure [113].Measurements from an active sensor such as airborne and spaceborne LiDAR should have mitigated such a problem; but therein lies the inherent challenge of measuring crown closure and other canopy cover metrics from both ground and remote sensing data [114,115].
The GLAS input data influenced the range of regional estimates of crown closure more than stand height.This was suspected to be a function of laser pulse energy degradations in the areal LiDAR data, from which GLAS models were developed.There was a disparity between the GLAS and transect LiDAR distributions (Figure S3) that was not initially detected during validation (Figure 4) due to the narrow range of data represented in the limited overlap between the two data sources.This was further observed by examining the ranges of the transect LiDAR and GLAS input data with respect to both attributes (Figure S3), whereby distributions associated with stand height exhibited a greater similarity than crown closure distributions.This suggests that effects of the areal LiDAR energy degradations was not an issue for high percentile LiDAR metrics used as an estimator of stand height.When areal LiDAR models were applied to the transect LiDAR data, a saturation was observed in crown closure at approximately 60% (Figure 6b).This produced as a lack of sensitivity in the saturation range of the crown closure assessment data that likely contributed to the inflated assessment statistics relative to stand height (Table 7).
Possible avenues for improving spatial estimations of stand structure include the exploration of k-NN's multivariate capabilities as well as the use of both mean square error and mean deviation to potentially better optimize the k-NN algorithm and improve map accuracy.The inclusion of the mean deviation at attribute distribution end-members as an optimization agent has proved valuable in reducing bias at extreme values [116], especially for crown closure.Furthermore, the addition of finer (native) resolution predictor variables, including vegetation spectral indices less sensitive to topographic effects may warrant further investigation.Studies using satellite radar backscatter from L-band sensors such as ALOS PALSAR (Advanced Land Observing Satellite Phased Array type L-band Synthetic Aperture Radar) and other radar sensors have shown promising results in the estimation of various forest inventory attributes [100,[117][118][119].While saturation of radar backscatter has been observed in high biomass regions [117], the lower stand structure and aboveground biomass of the northern boreal compared to tropical regions suggests the exploration of L-band multi-polarization satellite radar data as a spatial predictor of stand structure merits investigation to mitigate the effect of Landsat spectral saturation.A similar approach was employed by Cartus et al. [118] where ALOS PALSAR and Landsat were used in regression tree models to map canopy height and growing stock volumes with R 2 values that ranged from 0.70 to 0.85.In their study, the combined use of multi-temporal PALSAR intensity, coherence and Landsat resulted in higher retrieval accuracies than from any of the datasets alone.
Nearest neighbour methods are non-parametric approaches to multivariate prediction that have proven useful for predicting continuous forest attributes [120].With these methods, no prediction may be smaller or larger than the input reference dataset, resulting in the implicit assumption that the distribution of the input dataset must be fully representative of the region over which estimates are to be obtained [120].Other methods for imputing forest inventory variables include variants of nearest neighbour such as Most Similar Neighbour and Random Forest that have generated favourable results when compared to conventional k-NN [121,122].Data dimensionality and complexity increases as the number of predictors increases (although relatively few predictors were tested here), raising challenges in determining the optimal predictor dataset for imputation.However, the use of Random Forest can mitigate the need to execute such comprehensive optimization procedures as the algorithm includes a feature selection process [123].That is, Random Forest can determine which predictors contribute most to overall model accuracy through internal error assessment.This serves to reduce computation complexity, and can also reduce the degradation of overall accuracies by eliminating noisy or unsuitable predictors [124,125].Moreover, its robust non-parametric nature, computational efficiency, internal assessment of error and additional parameters, such as variable importance, has been deemed beneficial for the spatial mapping of forest variables [31, 126,127] and is thus worthy of consideration in future work.
A follow-up study will investigate the application of the presented methods framework (Figure 2) to more northern latitudes to assess latitudinal inventory gradients.However, this poses a unique challenge associated with data availability in an already data limited environment.The northern limit of the national airborne LiDAR transect within the Taiga Plains is approximately 65 • North latitude, therefore the acquisition of additional airborne LiDAR data will be necessary to aid in the spatial extension of the current study.Moreover, the use of upcoming spaceborne missions such as ICESat-2 [128] may supplement other available remote sensing data for the purposes of forest attribute mapping in a high latitude boreal region.

What Are the Potential Applications of the Products Generated?
Maps of stand height and crown closure along with broad species information from EOSD land cover are fundamental attributes in describing forest composition and structure.In consultation with the Government of NWT, these forest attributes are useful for a multitude of resource management objectives.Three examples of applications include enhancing knowledge of the NWT ecological land classification, informing both boreal woodland caribou (Rangifer tarandus caribou) and Barren-ground caribou (Rangifer tarandus groenlandicus) habitat, and identifying areas for post-fire salvage logging.
The landscapes of the NWT include a wide array of interacting terrain and climatic conditions that result in a range of ecosystems varying in composition, productivity and biodiversity [35].There is an increasing need for indicators that describe how the spatial distribution and patterns in vegetation cover is developing and evolving, particularly with changing climate conditions [129].The Taiga Plains ecosystem classification provides a framework for describing the biophysical patterns within a geographic context, and serves as a framework for sustainable forest management [35].Building thematic maps of forest composition and structure provides knowledge into the amount and distribution of how forest vegetation is distributed among different ecoregions, and provides the means to assess trends over time through periodic reassessments following succession, natural disturbances caused by fire, and forest recovery.
The boreal woodland caribou and barren-ground caribou are considered threatened on the NWT list of species at risk [130].The Government of NWT has developed a recovery strategy for the boreal woodland caribou [131] and a barren-ground caribou management strategy [132] to support the recovery and sustainability of NWT's caribou herds.Land cover maps and forest structure information such as forest height and crown closure can and are being used to develop indicators and maps of wildlife habitat suitability [133,134].Information is also needed to characterize the status and changes to caribou habitat suitability, particularly in response to changing disturbance patterns (e.g., forest fires) resulting from climate change and industrial activities [135].The interplay of Global Positioning System collars on caribou with knowledge of forest composition and structure can provide further insights into caribou use of the landscape [136].For these reasons, there is particular interest in the integration of multi-source information that includes products developed from this study to advance studies into implementing the policies defined in the NWT caribou strategy documents.
The Government of NWT has implemented a Biomass Energy Strategy to establish and expand a woody biomass energy industry throughout the NWT [137].Its purpose is to support multi-user efforts to reduce energy costs, decrease greenhouse gas emissions and develop a local, renewable industry based on NWT forests.This creates a need for a sustainable supply of woody biomass.Knowledge of land cover and forest structure generated by this study provides information about the distribution of forests across the NWT.The projected increase in fire occurrence and area burned in the NWT [138] raises concerns regarding the sustainability of its timber supply when there is a desire to increase its forest industrial base.Following forest fires, assessing salvage logging potential is necessary to minimize losses in timber value.As a result, the products from this study can be used for information of pre-burn vegetation to help identify areas that merit further assessment for salvage logging following recent fire disturbance.
These three applications provide a subset of examples of the many applications and information needs that could be served by more current assessments of NWT forests beyond the spatial footprint of its current operational inventories.Current work to expand the suite of mapped forest attributes to total wood volume and aboveground biomass would further support the applications of this study.

Conclusions
This study developed a methods framework (Figure 2) to address the challenge of large-scale forest attribute mapping in remote regions where in situ field data are sparse or unavailable.Over a study area in the southern NWT within the Taiga Plains Ecozone, estimates of stand height (mean difference = −1 m) and crown closure (mean difference = −5%) were produced along with an analysis of spatial uncertainty that could be used to target future work.Opportunities for improving the inventory maps include using in situ and airborne LiDAR data that would be more representative of the full ecological range of the area to be mapped, and by using as temporally similar datasets as possible.Imputation results may be improved by inclusion of other remote sensing data that may be more sensitive to changes in stand structure such as PALSAR, and investigating imputation algorithms such as Random Forest.The inventory data products generated have potential to provide valuable information to stakeholders (the Government of Northwest Territories) to fulfill reporting requirements and aid in the decision making process regarding the status, condition, and distribution of forest resources in northern boreal regions.
Given the utility of stand height in the estimation of tree volume and above ground biomass, the developed products could be used to infer these additional attributes across the study area.A problem in inferring these derived attributes over large areas is often where there is a lack of coincident field and airborne or satellite LiDAR.An approach to this problem was employed by Saatchi et al. [99] who developed power-law functional relationships between AGB and tree height from field data alone.Estimates of tree height from airborne and satellite LiDAR could then be used with field-based functions to estimate both tree volume and AGB that could subsequently be spatially imputed.Previous studies have demonstrated the utility of stand height as a predictor of both tree volume and AGB [54,71,139], and thus this work can serve as the foundation for estimating and mapping these additional forest inventory variables.The upcoming Global Ecosystem Dynamics Investigation and ICESat-2 missions will provide data that may be utilised in conjunction with current products to initiate change detection and monitoring investigations.
The presented framework's applicability over large geographies, and to other modelling scenarios including (but not limited to) wetlands, permafrost, and biomass, indicate further value to a host of critical stakeholders.Given the adaptability of the framework, future work could focus on developing more concise models of stand height and crown closure based on multi-source remotely sensed datasets and mapping other attributes such as stand volume and aboveground biomass over a larger, more ecologically diverse landscape in the Canadian north.

Supplementary Materials:
The following are available online at www.mdpi.com/link.Figure S1: Visualization of the k-NN optimization for stand height.Component (a) illustrates which Minkowski distance parameter minimizes the mean squared error (MSE); (b) illustrates which number of predictors is best (three in this case); (c) illustrates which combination of 3 predictors minimizes MSE (combination 104); and (d) optimizes predictor combination 104 for k (found to be six in this case, indicated by black point) and weighting kernel (inverse distance in this case).Figure S2: Visualization of the k-NN optimization for crown closure.Component (a) illustrates which Minkowski distance parameter minimizes the mean squared error (MSE); (b) illustrates which number of predictors is best (seven in this case); (c) illustrates which combination of 7 predictors minimizes MSE (combination 40); and (d) optimizes predictor combination 40 for k (found to be 10 in this case, indicated by black point) and weighting kernel (inverse distance in this case).Figure S3: Distributions of all transect LiDAR and optimal Geoscience Laser Altimeter System (GLAS) data utilized to drive k-NN models for: (a) stand height; and (b) crown closure.Descriptive statistics (minimum, maximum, and mean) associated with each data source demonstrate the similarities or differences.

Figure 1 .
Figure 1.Study site location and data distribution: (a) the study site and Taiga Plains ecozone within Canada; (b) field plot locations with respect to the areal LiDAR extent, and overlapping transect LiDAR and Geoscience Laser aLtimeter System (GLAS) footprints; (c) the distribution of areal and transect LiDAR, and GLAS footprints throughout the study site; and (d) along orbit GLAS footprint separation (note points represent footprint centres, not extent).

Figure 2 .
Figure 2. General workflow of the developed methods framework.Note, merged data refers to each set of filtered data (response and predictor variables) being submitted to the k-NN algorithm.

Figure 3 .
Figure 3. Summary of the change of root mean squared error (RMSE) and mean absolute difference (MAD) noted between intersecting Geoscience Laser Altimeter System (GLAS) and transect LiDAR estimates of stand height as a function of five different levels of quality control.

Figure 4 .
Figure 4. Comparison of Geoscience Laser Altimeter System (GLAS) modelled (a) stand height and (b) crown closure with 55 coincident transect LiDAR data.

Figure 5 .
Figure 5. k-NN mapped (a) stand height and (b) crown closure, and associated uncertainties mapped for (c) stand height, and (d) crown closure.Non-forested features appear as white.
for Sustainable Development of Forests; b Canadian Digital Elevation Model.

Figure 6 .
Figure 6.Comparisons of transect LiDAR and regionally mapped k-NN distributions of (a) stand height, and (b) crown closure, and the relationship between k-NN attributes and their associated uncertainties for (c) stand height, and (d) crown closure (yellow shades indicate higher point density, as noted by inset legends).Comparisons are based on 1% of intersecting transect LiDAR data, resulting in 11,401 points (after removal of fire affected data).

Table 2 .
Summary of all predictors available for forest attribute predictions via the k-nearest neighbour algorithm.
a Earth Observation for Sustainable Development of Forests; b Canadian Digital Elevation Model; c Normalized Difference Vegetation Index; d Normalized Difference Moisture Index; e Soil Adjusted Vegetation Index.

Table 3 .
Resulting model coefficients and fit statistics for testing areal LiDAR height metrics and Lz metric against field plot stand height and crown closure equivalents.P95 was determined as optimal as it exhibits most favourable Adj.R 2 , RMSE, and MPE.Implemented models are bold.Note, min and max are the minimum and maximum model predictive ranges, respectively.

Table 4 .
Resulting model coefficients and fit statistics for testing Geoscience Laser Altimeter System (GLAS) height metrics and scaled Lz metric against areal LiDAR modelled stand height and crown closure equivalents.P85 was determined as optimal as it exhibits most favourable Adj.R 2 , RMSE, and greatest predictive range.Implemented models are bold.Note, min and max are the minimum and maximum model predictive ranges, respectively.

Table 5 .
Summary statistics indicating the similarity between Geoscience Laser Altimeter System (GLAS) and the independent, coincident transect LiDAR data.

Table 7 .
Assessment of regionally mapped stand height and crown closure via mean difference.Mean difference was determined by use of all available boreal transect LiDAR plots data across the region of interest, as a function of dominant vegetation type and ecoregion.