Increasing Precision for French Forest Inventory Estimates using the k-NN Technique with Optical and Photogrammetric Data and Model-Assisted Estimators

Multisource forest inventory methods were developed to improve the precision of national forest inventory estimates. These methods rely on the combination of inventory data and auxiliary information correlated with forest attributes of interest. As these methods have been predominantly tested over coniferous forests, the present study used this approach for heterogeneous and complex deciduous forests in the center of France. The auxiliary data considered included a forest type map, Landsat 8 spectral bands and derived vegetation indexes, and 3D variables derived from photogrammetric canopy height models. On a subset area, changes in canopy height estimated from two successive photogrammetric models were also used. A model-assisted inference framework, using a k nearest-neighbors approach, was used to predict 11 field inventory variables simultaneously. The results showed that among the auxiliary variables tested, 3D metrics improved the precision of dendrometric estimates more than other auxiliary variables. Relative efficiencies (RE) varying from 2.15 for volume to 1.04 for stand density were obtained using all auxiliary variables. Canopy height changes also increased RE from 3% to 26%. Our results confirmed the importance of 3D metrics as auxiliary variables and demonstrated the value of canopy change variables for increasing the precision of estimates of forest structural attributes such as density and quadratic mean diameter.


Introduction
French forests, as with their European counterparts, undergo rapid changes, mainly due to the abandonment of agricultural lands [1].In France, forests are defined as land areas of at least 0.5 ha with a minimum average width of 20 m, and a canopy cover of trees capable of reaching a height of 5 m in situ of more than 10% [2].French forest area expanded by around 100,000 ha per year during the last century, and the phenomenon is showing no signs of diminishing.Similarly, forest-growing stock experienced the same trend with a time lag, and even doubled during the last 50 years [3].
As the French National Forest Inventory (NFI) is continuous, with a new sample generated each year, such an expansion is handled with a sample size adapted yearly to these new conditions [4].Nevertheless, a large proportion of these new resources are located in areas where the intensity of forest management practices has been historically limited.These new resources have different properties compared to ancient forests [3], and remain largely unknown by local stakeholders.Support for action by territorial decision-makers, forest industry, and local administrations, requires precise estimation of forest resources at larger spatial scales for establishing forest management strategies under sustainable development.As the number of NFI plots at those scales is usually limited, alternative solutions must be developed.
Multisource national forest inventory (MSNFI) methods have been applied (e.g., [5][6][7]) to improve the precision of forest resource estimates without increasing inventory costs or sampling intensities for smaller areas than normally possible using NFI field data alone [8].MSNFI relies on the combination of field inventory data with wall-to-wall auxiliary variables derived from thematic maps and remotely sensed data.The objective of MSNFI methods is to infer population parameters [6,9].Design-based estimators [10] are well adapted to NFI data because they rely on probability sampling designs as the basis for inference and are nearly design-unbiased [11][12][13][14].A model-based approach is an alternative which assumes that the population under study is a random realization of a "super-population", but is subject to bias [15] and requires the use of a correctly specified model form [16,17].Each method has its advantages and drawbacks [11], and the appropriate inference approach should be selected according to the objectives of interest because the respective estimators are based on different underlying assumptions [18][19][20].
MSNFIs have been implemented using either parametric or non-parametric models [21-24], including co-kriging, universal kriging, Bayesian regression model, geographically weighted regression, and k nearest neighbors (k-NN) [8,[25][26][27][28].Among these methods k-NN approaches have gained popularity in MSNFIs because of their multivariate properties [24,29], allowing simultaneous prediction of multiple forest attributes with a single model, and the absence of assumptions regarding the distributions of both response and auxiliary variables.A strong argument for k-NN relies on its ease of use and the fact that predictions better preserve the covariances among the observed forest response variable observations [24].Overall, the performance of k-NN relies on the number of neighbors, k, the distance metric, the weights used for individual neighbors, and the auxiliary variables to build the model [30,31].The optimal value of k is a trade-off between accuracy and variance.Single nearest neighbor (k = 1) uses only a single sample plot value in the imputations [15,32,33], and avoids extrapolation beyond bounds of reality [34], but at the cost of a reduced prediction accuracy [35].A greater k value (>3 to 10 or much greater) selected based on an optimization criterion (e.g., prediction error) using the leave-one-out technique generally improves predictive accuracy [20,36].
In terms of variables, the majority of MSNFIs rely on freely available medium (i.e., Landsat TM) or coarse (i.e., MODIS) resolution optical images [8,37] due to their availability over large territories and their frequent renewal [38].However, the correlation between optical variables and major forest attributes such as volume or biomass may be weak due to signal saturation problems [39].Haapanen and Tuominen [40] combined Landsat TM data with textural features extracted from aerial photographs and reported that the combination of variables from both data sources provided the greatest predictive accuracies.The development of three-dimensional remote sensing from both spaceborne and airborne platforms provides information related to canopy height and structure that is well-correlated with some of the major forest attributes measured in the field [19,31,41], and is expected to provide greater accuracies for forest attribute estimates [42].Studies have been conducted using Airborne Laser Scanning (ALS) data [43,44], Digital Aerial Photography (DAP) [41,45,46], and even Interferometric Synthetic Aperture Radar (InSAR) [47].Estimates obtained using auxiliary variables derived from image matching techniques were found to be in good agreement with estimates obtained using ALS-derived variables [48,49].The availability and affordability of aerial photographs at frequent intervals make them a viable alternative for forest inventory applications for updating or retrospective analysis compared to ALS or InSAR data [45,50,51].Furthermore, multi-temporal aerial photographs also facilitate estimation of growth increment or canopy change detection that may improve the precision of MSNFI estimates.Typically, studies have employed simple height-related metrics (e.g., mean height, percentiles, maximum height) for forest attribute estimation (e.g., [52]).Other studies focus on more complex descriptors of canopy structure and heterogeneity at the plot-level [53][54][55].The proliferation of auxiliary information poses the issue of the curse of dimensionality [56] and the related noise introduced by variables unrelated to response attributes of interest.For k-NN, the value of k can then be large when the relationship between field attributes and auxiliary variables is weak, and the selection of nearest neighbors may become difficult [20].
According to [57], more than 35% of studies involving k-NN approaches focused on boreal coniferous forests, and most of the studies on temperate continental and temperate mountain forests were for the USA.Despite the multivariate capabilities of k-NN, many researchers focused on a single response variable [20,58,59].In this context, the objective of this paper was to test the performance of MSNFI based on multivariate k-NN imputations for improving the precision of estimates beyond what could be obtained by NFI field data alone in a broadleaved-dominated forest, which is complex in terms of composition, structure, and fragmentation.The secondary objectives were: 1) to assess the influence of auxiliary data (Landsat images, 3D models from aerial images and forest type map) on the precision of the MSNFI estimates, and 2) to further evaluate the potential of diachronic 3D data from two successive aerial data coverages on the precision of MSNFI estimates.The performance of MSNFI was evaluated in a multivariate context with 11 forest attributes as response variables: growing stock volume, production volume, basal area, density, quadratic mean diameter, and the volume respectively of broadleaved, conifers, oak, Scots pine, other broadleaved, and other conifers.

Study Site
The study site is located in Central France (48 • 6 N to 48 • 5 N and 1 • 50 E to 1 • 30 E) and covers the Sologne and Orléans forests, an area of 7335 km 2 , of which 48% consists of forests (3600 km 2 ) (Figure 1).The area is representative of western oak-dominated broadleaved French forests with respect to landscape heterogeneity, forest habitats, and diversity in forest management practices [60].The area is influenced by a degraded oceanic climate and is marked by a greater temperature range compared with the littoral area.Mean annual temperature and precipitation are 10.9 • C and 731 mm, respectively.The slowly undulating relief ranges from 70 to 180 m elevations, and bears hydromorphic (58%), brown (23%), and podzolized (17%) soils made of sand and clay, and resulting from the erosion of the Massif Central.The forests are dominated by broadleaved (75.3%) mainly oaks (i.e., Quercus robur L. and Quercus petraea Mill.).Coniferous stands represent 15.5% of the forest area and are dominated by maritime (Pinus pinaster Ait.) and Scots pines (Pinus sylvestris L.).The remaining area of the forests is covered by mixed stands consisting mostly of oak and Scots pine.The forests are characterized by different management strategies related to forest ownership.While the southern part of the area is dominated by private forests (90%) with various intensities of management, the northern part is more intensely managed and includes the largest state forest of France, which covers around 34,600 ha.

National Forest Inventory Data
The French NFI is a continuous countrywide inventory [1,61].The two-phase sampling plan is based on a permanent systematic 1-kilometer-square unit grid.The first-phase annual sample is drawn randomly for one-tenth of the grid (10 km 2 ).For each 1-kilometer square grid cell one to four points may be drawn, thus constructing an annual sample of ~80,000 points.The first phase points are photo interpreted on 0.5 m resolution infrared aerial photographs for land cover and land use, and produce results for land cover.The second phase is a subsample of the first taken in different proportions.For forests, the standard second phase sample is obtained by taking one point out of two (~6500 points) that are visited in the field.Field observations are obtained from concentric plots of 6, 9, 15, and 25 m radius.The stand description in terms of land cover and use, as well as composition, structure, age, etc. is assessed for the 25 m plot.Trees are measured on the three smallest concentric plots, according to their circumference at a height of 1.3 m: trees in the 23.5-70.5 cm circumference range are measured on the 6 m radius plot, trees in the 70.5-117.5cm circumference range are measured on the 9 m radius plots, and the trees with a circumference of at least 117.5 m are measured on the 15 m radius plot.Dendrometric measurements include species, diameter at 0.1 and 1.3 m, total height, as well as diameter increment during the last five years.Additional measurements including timber height, mid-diameter and mid-timber diameter are taken on a subsample of plots to fit volume models [1].
For the purpose of this research, data from 775 plots measured during the 5-year period 2010-2014 were used.This period has been selected to match the acquisition date of the remotely sensed data whose reference year was 2014.A total of 11 forest attributes were considered as response variables: growing stock volume, basal area, density, and quadratic mean diameter, production volume (estimated based on measurements of the last 5 growth rings), volumes of broadleaved, conifers, oak, Scots pine, other broadleaved and other conifers.Summary statistics for the plot variables are provided in Table 1.Two photogrammetric elevation models (2008 and 2014) are available for a subset (~44%) of the study area (N = 346 plots, Table 1), and were used to assess the effect of canopy height changes on the precision of estimates.3 m, total height, as well as diameter increment during the last five years.Additional measurements including timber height, mid-diameter and mid-timber diameter are taken on a subsample of plots to fit volume models [1].
For the purpose of this research, data from 775 plots measured during the 5-year period 2010-2014 were used.This period has been selected to match the acquisition date of the remotely sensed data whose reference year was 2014.A total of 11 forest attributes were considered as response variables: growing stock volume, basal area, density, and quadratic mean diameter, production volume (estimated based on measurements of the last 5 growth rings), volumes of broadleaved, conifers, oak, Scots pine, other broadleaved and other conifers.Summary statistics for the plot variables are provided in Table 1.Two photogrammetric elevation models (2008 and 2014) are available for a subset (~44%) of the study area (N = 346 plots, Table 1), and were used to assess the effect of canopy height changes on the precision of estimates.

Auxiliary Data
Three sources of auxiliary data available at no cost were considered: a forest type (FT) map (BD Forêt ® v2) constructed by the French National Institute of Geographic and Forest Information (IGN), DAP models of the forest canopy surface, and Landsat images.All auxiliary variables were processed at a 30 m spatial resolution corresponding to the Landsat image pixel size and to the field plot area used for dendrometric measurements.
The FT map product BD Forêt ® is a vector layer constructed by photo-interpretation of near-infrared aerial photographs.The map includes a total of 32 classes and a minimum polygon size of 0.5 ha.The nomenclature includes a description of the vegetation type with the dominant species, according to tree cover classes.The map corresponding to the study region has been constructed using aerial images acquired in 2008.The forest contours were used to construct a forest mask.The mask was further intersected with the 30 m grid, and only the grid cells completely in the forest mask were considered for the analysis [10].The FT map attributes were summarized for each grid cell and aggregated into three classes corresponding to pure broadleaved, pure conifers and mixed stands.
The DAP Digital Surface Models (DSM) were normalized using a co-occurring ALS Digital Terrain Model (DTM) to generate Canopy Height Models (CHM).ALS data were not considered as auxiliary data for describing the forest structure as their availability and cost prevent regular updates.The ALS data were acquired by IGN between January 17 and March 30, 2014, as part of the land survey with a point density of 2 pts.m −2 and a flight line overlap of 50%.The point cloud was processed using the iterative TIN algorithm of the Terrascan software (Terrasolid, https://www.terrasolid.com,last consulted October 25, 2018), followed by visual inspection and manual correction whenever necessary.The algorithm was run using a terrain angle of 88 • , iteration angle of 6 • , and an iteration distance of 1.4 m.The point data were delivered by 1 km x 1 km tiles, and were buffered using a 30 m threshold.Ground classified points were used to construct a digital terrain model with a 1 m resolution using inverse distance weighted interpolation.The DAP DSMs were constructed using aerial images acquired in 2013 (eastern part of the area, from July 06 to 12), 2014 (western part of the area, from May 18 to July 16), and 2008 (western part of the area only).The 2013-2014 images were acquired using an IGN camera (sensor V28T, 14,650 x 10,700 pixels of 6.8 µm each) mounted with a Zeiss lens having a 125 mm focal length.The images were acquired at 6,400 m above ground level, leading to a 35 cm resolution at the ground level.The 2008 images were acquired on August 30 using a DMC camera (sensor of 13,824 x 7680 pixels of 9 µm each) and a Zeiss 90 mm focal length lens.The acquisition was done at an altitude of 5,500 m, leading to a 54 cm resolution at the ground level.For all flights, the image overlap was 60% along and 25% across tracks.The image orientation was done by IGN as part of the production process.The dense image matching was done using MicMac open source photogrammetric software [62].MicMac is based on a multi-resolution-optimization approach and used a coarse to fine matching strategy based on image pyramids to improve the processing time and control the similarity of matches between levels.The dense matching was performed using a Per Images Matching approach (PIMs mode) in image geometry and rely on tie points generated using the 'Scale-Invariant-Feature Transform' (SIFT) detector [63].The matching approach produces a depth map for each image pair.The depth maps are then merged together and converted into a dense point cloud.The resulting point clouds were converted into a 1 m DSM using the maximum height per pixels.Empty pixels were not interpolated.The DSM was further converted into a canopy height model (CHM) by subtracting the ALS DTM.Two categories of metrics were derived from the CHM to serve as auxiliary variables for each cell of the 30 m grid: distribution metrics above a height threshold fixed at 5 m to match with the definition of forest and structure metrics [53].The distributional metrics included the percentiles 0 to 100 by steps of 10 (p0 to p100), as well as upper percentiles 95 and 99 (p95, p99), the mean (h mean ), standard deviation (h std ), variance (h var ), and the mean absolute deviation (h mad ) of heights.The structure metrics (Table A1), included the gap area ratio (Ga), the mean inner canopy volume (Vi), the mean outer canopy volume (Vo), the mean inner canopy volume above a given threshold value (Th) fixed here at 5 m (Vci), the mean outer canopy volume above Th (Vco), defined as the complement of the canopy volume to the maximum height, the mean inner canopy volume within gaps (Vgi) and its outer complement (Vgo), the standard rumple area (Ra) [55], or the ones defined with respect to Vi (Ra1) and Vci (Ra2), as well as the number of empty pixels (NA).Over the subset area, various difference metrics were computed, including changes in gap area (dGa), in volume differences (e.g., dVi, dVo), mean, minimum and maximum heights (dh mean , dp0, dp100), as well as in the upper percentiles (dp95, dp99).
Four Landsat 8 images acquired on September 8, 2014 were downloaded from the Theia platform (https://theia-landsat.cnes.fr)with the processing level 2A, including orthorectification, atmospheric correction and cloud detection [64].The reflectance images were used to compute various indices (Table A2): the simple ratio (SR), the normalized difference vegetation index (NDVI), the Specific Leaf Area Vegetation Index (SLAVI), the Soil Adjusted Vegetation Index (SAVI), the Modified Soil Adjusted Vegetation Index (MSAVI), the enhance vegetation index (EVI) the green NDVI (GNDVI), the Normalized Difference Moisture Index (NDMI), Normalized Difference Water Index (NDWI) [65,66], as well as brightness (Br), greenness (Gr), and wetness (We) derived from Tasseled Cap transformation [67].In addition to computed spectral reflectance indices, we also included the seven reflectance bands of Landsat 8: Ultra Blue (UB), Blue (B), Green (G), Red (R), Near Infrared (NIR), Shortwave Infrared (SWIR) 1, and Shortwave Infrared (SWIR) 2. The Landsat metrics were assigned to the grid cells by intersecting the grid with the coordinate of the center of the Landsat pixels.

Optimization of the k-NN Model
MSNFI performance depends on: (1) the modeling framework, and (2) the correlation between the surveyed forest attributes and the auxiliary information.
As indicated above, a non-parametric multivariate k-NN approach was adopted to construct wall-to-wall predictions of forest attributes.The k-NN parameters include the number of neighbors (k), the distance metric used to search the variable space, and a weighting function to compute predictions [57].In k-NN, the population units for which both response and auxiliary variables are available are termed the reference set and the units for which predictions of response variables are sought are termed the target set.The auxiliary variables are also termed feature variables and the space defined by them is termed the feature space.
Based on a literature survey [57,68] and preliminary results, two distance metrics were compared: Euclidean and Canonical Correlation Analysis (CCA) [69].The Euclidean distance is computed in a normalized space of auxiliary variables.The CCA distance is defined by Equation (1): where Γ is the matrix of canonical vectors corresponding to the X's found by canonical correlation analysis between X and Y, Γ T the transposed matrix, and ∧ is the canonical correlation matrix.
Values of k in the range of 1 to 20 were tested to optimize model performance.The weight of the neighbors used to compute the imputed values for the continuous variables according to Equation (2): where d ij is the distance between target pixel i and reference pixel j in the feature space.k-NN models were generated using six different combinations of auxiliary variables: (1) Landsat variables alone, (2) Landsat variables and FT from the forest map, 3() 3D metrics from DAP CHMs, (4) 3D metrics from DAP CHMs with FT from the forest map, (5) Landsat variables and 3D metrics from DAP CHMs, and (6) all auxiliary variables.On the subset area, the introduction of change detection variables was also tested.Only two combinations of auxiliary variables were considered: all former auxiliary variables used alone, or with the change detection variables.
Note that preliminary analyses revealed estimation accuracy issues due to auxiliary variables having weak correlations with the response variables, which has also been reported in [30,31].The problem was solved by filtering out auxiliary variables that have maximum correlation values less than 0.2 with the response variable considered.A total of four auxiliary variables were discarded from analysis, including three Landsat bands (UB, B, R), and one volume metric (Vco).
The performance of the k-NN models was evaluated using a leave-one-out cross-validation approach (LOOCV).Model accuracy was evaluated using the Root Mean Squared Difference (RMSD) between observed and imputed values [70].For comparing imputation models across multiple response variables with different measurement units, RMSD value was standardized by dividing RMSD by the standard deviation of the observations of the response variable (i.e., Scaled RMSD = RMSD/Standard deviation of the response variable) [71].Optimal k values were selected as those for which mean RMSD across multiple response variables did not differ more than 5% from the minimal mean RMSD value [20].
k-NN optimization was conducted using R open source software with the yaImpute package [71].

Statistical Inference
Using field data only, the design-based, simple expansion (Exp) estimators of the population mean and variance were defined by Equations ( 3) and (4).
where n is the number of sample plots, and y i is the observed value of plot i.
The inference of population parameters from the k-NN predictions across the study area was assessed using model-assisted generalized regression (GREG) estimators of means and variances [72].Despite using the term regression in the label characterizing GREG estimators, multiple prediction techniques other than regression models, and particularly other than linear regression models, have been used with the GREG estimators [73][74][75].The GREG estimator of the mean is the difference between the mean of the k-NN predictions across population units and the estimated bias, as shown in Equation ( 5): where ε i = ( ŷi − y i ) is the difference between k-NN prediction and the observed response variable (estimate of bias), N is the number of population units or pixels, and n is the number of field sample plots.The associated variance estimator is computed using Equation (6): where ε = 1 n • n i=1 ε i .The performance of k-NN was evaluated through a measure of relative efficiency (RE = Vâr μExp /Vâr μGREG .RE provides an estimate of the gain in precision resulting from the use of the auxiliary information that is incorporated into the model-assisted estimators in comparison with pure field-based estimates.RE values greater than 1 indicate increased precision.The primary advantage of the GREG estimators is that they take advantage of the relationship between the probability based sample plot data and their corresponding predictions to reduce the variance of the estimated population mean [20] and thereby increase RE.For each combination of auxiliary variables: μGREG, Vâr μGREG , and RE were calculated for all 11 response variables.

External Validation
Internal models are recognized to underestimate variance [12,13].To account for this issue, a complementary analysis was conducted.Both the main and subset areas were divided into three sub-areas having approximately the same number of field plots (i.e., 258 (~775/3) and 115 (~346/3) plots for the main and the subset areas, respectively).Each sub-area was used to build a k-NN model following the methods presented in Sections 2.4 and 2.5.The models were then applied to the two other sub-areas as external models.The mean REs obtained with both approaches were compared to assess the underestimation associated with the internal model.

k-NN Optimization
Table 2 shows the mean RMSD values for the various combinations of auxiliary variables and distance metrics.The smaller the RMSD, the greater the similarity between the reference and the imputed observations.Using Euclidean distance, mean RMSD values ranged from 1.00 for the Landsat metrics alone to 0.84 for the overall metrics.The k values selected were relatively stable, ranging from 4 to 8. Interestingly, the combination of 3D metrics with the FT map performed well with a mean RMSD of 0.85, which was only 0.01 greater than for the combination having smaller RMSD with a 37.5% decrease in the number of auxiliary variables.By comparison, the combination of 3D metrics with Landsat metrics (39 variables) performed slightly less well with mean RMSD of 0.86, and a greater range of individual RMSD values.Slightly smaller RMSD values were obtained using CCA distance, with mean RMSD values ranging from 0.99 for the Landsat metrics alone to 0.84 with the full set of metrics.The optimal k value was also rather stable, ranging from 5 to 6.With both distance metrics, the introduction of canopy change auxiliary variables contributed to greater mean RMSD values.The improvement was more pronounced using the Euclidean distance metric, with a gain in mean RMSD of 0.02.

Statistical Inference
Tables 3 and 4 show the estimated means and corresponding REs for the 11 field response attributes for the whole area computed using Euclidean and CCA distances, respectively.Using Euclidean distance (Table 3), the greatest RE was obtained for total volume, with a value of 2.18.The smallest RE was obtained for the volume of other broadleaved, with a value of 1.03.For nine out of the 11 forest attributes, the greatest RE was achieved with all auxiliary variables combined.The stand density was most accurately estimated using the 3D metrics alone (RE = 1.21).For the volume of other broadleaved, the most accurate auxiliary variable combination included the 3D metrics and the forest types, and produced RE of 1.03.With CCA distance (Table 4), the greatest RE was obtained for the volume of other conifers with a value of 2.37.As for Euclidean distance, the least accurate results were obtained for the volume of other broadleaved, reaching a RE of 1.06.The greatest REs were achieved using all auxiliary variables combined for only six of the 11 forest attributes.The greatest REs for total volume (2.04), stand basal area (1.44), broadleaved volume (2.15), and oak volume (1.65) were obtained using as the 3D metrics and the FT.For stand density, the greatest RE was obtained with the 3D metrics only (1.14).On average, Euclidean distance was more accurate for six of the 11 forest attributes.But CCA distance tended to be more accurate for those field attributes showing the smallest REs, namely the production volume, the Scots pine volume and the volume of other conifers.Table 5 shows the RE achieved on the subset area after including 3D change detection metrics.Using Euclidean distance, REs increased for nine of the 11 field attributes, with an average RE difference of 0.05.The greatest increase was obtained for the production volume with a RE increase of 0.17.RE was most degraded for the volume of other conifers, with a difference of −0.06.Using CCA distance, the mean RE difference was 0.04.However, the introduction of 3D change detection metrics contributed to increased RE for six forest attributes only.The greatest increases were achieved for conifer volume and production volume, with RE differences of 0.17 and 0.15, respectively.On the contrary, RE values were degraded for the volume of other conifers (−0.06), the broadleaved volume (−0.03), and the stand basal area (−0.02).Overall the greatest RE values were obtained using a CCA distance for seven out of the 11 field attributes.

External Validation
The results of the external validation are presented in Figure 2.Over the main area, the mean RE obtained with the Euclidean distance was 1.45 for the internal model and 1.38 for the external model.REs obtained with the CCA distance were of the same magnitude, with an internal RE of 1.41 and an external one slightly greater with a value of 1.42.The results obtained over the subset area showed the limitation of CCA distance associated with respect to the number of field sample units.Using the Euclidean distance, the greatest REs were obtained using the change detection metrics, with values of 1.41 and 1.35 for the internal and external models, respectively.This is 0.04 and 0.05 greater than the RE achieved without considering change metrics.By comparison, the greatest RE output with the CCA distance were almost similar to the two sets of metrics, reaching 1.27 with the internal models and 1.16 and 1.15 with the external models with and without change metrics, respectively.
Figure 2 also highlights the differences between forest attributes.Volume attributes were estimated with the greatest RE, provided these attributes were well represented within the field sampling plots, and were not rare events.Indeed, the total volume, total volume of conifers and of broadleaved were estimated with RE above 1.5.Those volumes with smaller frequencies like the volume of other broadleaved than oaks and the volume of other conifers than scot pines were estimated with RE around 1. The introduction of change detection metrics with the Euclidean distance mostly benefits estimation of production volume, a flux variable, and stand density to a lesser extent.RE for the former increased by 0.13 from 1.08 to 1.21 for the internal model, and by 0.2 (i.e., from 1.01 to 1.21) for the external model.The latter showed a gain in RE of 0.09 and 0.08 for the internal and external models, respectively.3.

k-NN Optimization
Our results demonstrated that 3D metrics extracted from DAP CHMSs were the most efficient auxiliary variables for estimating the 11 forests attributes considered (Table 2).Similar results were obtained in boreal countries [76].However, the limited efficiency of Landsat metrics was expected in the structurally complex broadleaved-dominated forests, with only small correlations between Landsat indices and the forest attributes considered.Such small correlations could be explained by the well documented saturation effect of Landsat spectral indices in average biomass levels [77], thus providing limited capabilities to differentiate subtle variations in forest attributes.That said, the combination of 3D auxiliary variables with information about vegetation composition and related structural characteristics summarized by vegetation indices, significantly improved the RMSD, illustrating the complementarity of information sources to estimate field measured forest attributes [78].Interestingly, the combination of 3D metrics with either Landsat metrics or the FT map provided similar results (Table 2).Despite a limited spatial resolution, the FT information serves as a proxy for the dominant species, which is an important variable associated with numerous forest attributes.Such information is rare and expensive to produce, as it remains largely based on semi-automated approaches and requires substantial manual photo-interpretation. Furthermore, the FT map used in this study was constructed using images acquired in 2008 and could thus be considered as outdated in some locations where changes due to management or natural events such as storms have occurred, contributing to errors in population mean estimates.A study described in Reference [79] introduced a stratified k-NN method by auxiliary data in multisource forest inventories for reducing the effect of inaccurate map data on the forest-resource estimates in a model-based framework.Those methods for design-based, model-assisted approaches should be investigated.However, considering the  3.

k-NN Optimization
Our results demonstrated that 3D metrics extracted from DAP CHMSs were the most efficient auxiliary variables for estimating the 11 forests attributes considered (Table 2).Similar results were obtained in boreal countries [76].However, the limited efficiency of Landsat metrics was expected in the structurally complex broadleaved-dominated forests, with only small correlations between Landsat indices and the forest attributes considered.Such small correlations could be explained by the well documented saturation effect of Landsat spectral indices in average biomass levels [77], thus providing limited capabilities to differentiate subtle variations in forest attributes.That said, the combination of 3D auxiliary variables with information about vegetation composition and related structural characteristics summarized by vegetation indices, significantly improved the RMSD, illustrating the complementarity of information sources to estimate field measured forest attributes [78].Interestingly, the combination of 3D metrics with either Landsat metrics or the FT map provided similar results (Table 2).Despite a limited spatial resolution, the FT information serves as a proxy for the dominant species, which is an important variable associated with numerous forest attributes.Such information is rare and expensive to produce, as it remains largely based on semi-automated approaches and requires substantial manual photo-interpretation. Furthermore, the FT map used in this study was constructed using images acquired in 2008 and could thus be considered as outdated in some locations where changes due to management or natural events such as storms have occurred, contributing to errors in population mean estimates.A study described in Reference [79] introduced a stratified k-NN method by auxiliary data in multisource forest inventories for reducing the effect of inaccurate map data on the forest-resource estimates in a model-based framework.Those methods for design-based, model-assisted approaches should be investigated.However, considering the importance of the auxiliary variable, there is an interest for NFIs to automate construction/updating of those maps to account for variations in forest surfaces and types in a k-NN framework.
The introduction of variables based on 3D changes contributed to the reduction of RMSD, demonstrating the importance of this kind of metric for increasing the precision of estimates and for developing monitoring systems.This result further points out the potential of time-series of optical imageries to improve neighbor selection within the k-NN method and to contribute to improved estimations [80].Long-term times-series including decades of observations might provide insights into forest management [78], especially clear and partial cuts, and could be used to update forest maps [7].Short-term time-series, built using images collected over a year, might also provide information about phenology and further improve neighbor selection [33].Apart from time-series data, texture metrics extracted from aerial images or very high-resolution satellites data need to be further considered, as they were found to provide information about forest structure and composition [81,82] and did not appear to saturate with increases in biomass levels [82].
In terms of k-NN setup, the optimal number of neighbors, defined according to [20] was relatively constant, despite a large variation in the number of auxiliary variables.Our results are in agreement with the literature, with optimal k values ranging between 1 and 10, with five as a most often selected value [57].CCA distance, despite using both auxiliary and field data in the k-NN construction only marginally improved the mean RMSD with respect to Euclidean Distance.This makes Euclidean distance appealing since additional field attributes could be estimated without a requirement to re-compute a new k-NN model.This is particularly attractive for NFIs that collect hundreds of variables and could also compute additional information from those field measured attributes.

Statistical Inference
The estimated means and REs confirmed the trends observed with the RMSD values.The combination of field and Landsat data provided limited precision gain, and even generated greater variance than the one obtained using field data alone for a majority of field attributes.All but one of the REs were greater than 1.0 when the auxiliary variable set included the 3D metrics and either Landsat metrics or FT map.The greatest REs were obtained for volume-related field attributes, to the degree that the attributes were not rare events and were well-represented by the field data.Such a result was expected, since volume is a function of the canopy height and is thus well correlated with the auxiliary variables describing canopy surface height and structure [53].The degraded precision of volume attributes having small frequencies in our field sample such as the volume of other broadleaved trees, or the volume of other conifers showing a limited precision gain, which has also been reported by others [68].Indeed, forest attributes with few observations from rare populations are expected to contribute to large standard errors [83].Other structural metrics were estimated with RE ranging from 1.14 (stand density) to 1.58 (basal area).Those results are of the same magnitude as reported by [29] for a forest area in north central Minnesota in the USA, with values ranging from 1.23 to 1.35 for six forest attributes including volume.The moderate precision gain achieved here for stand density and to some extent to production volume and the QMD could be explained by the smaller correlations of those field attributes with both 3D and 2D metrics [84].Even though some studies did not report the same trends [29,85] the smaller precision gains reported here for density and QMD could be attributed to the complexity of the forest, characterized by two dominant forest structures made of regular stands (~55%) and of coppice-with-standards (~40%).
An important result was the performance gains achieved with the introduction of variables associated with 3D changes (Table 5).RE increased on average by 0.06 and 0.04 using the Euclidean and CCA distance, respectively.Using both distance metrics, the most substantial precision gain was obtained for the production volume (0.17 and 0.15, respectively), which is a flux variable and thus benefited from the inclusion of 3D variables related to canopy dynamics.Other field attributes benefiting from the inclusion of change detection variables with both distance metrics were the conifer and oak volumes.While the QMD also showed a substantial precision gain with CCA distance (0.09), the stand density (0.09), the stand basal area (0.07), and the total volume (0.06) exhibit among the largest gain in RE based on Euclidean distance.This indicated that changes in canopy height and structure provide additional and pertinent information about traditional field measurements of forest structure that are difficult to estimate from remote sensing means [86].The differences achieved in the k-NN setup have been reported by others [68,86].The study described in Reference [86] reported that MSN was sensitive the choice of the response variables.

External Validation
The external validation showed limited RE differences between internal and external models, indicating that the results achieved over the whole area with internal models are not as overly optimistic as reported in other studies [13].That said, the study highlights a limitation of the CCA distance metric when an internal model is used.As CCA uses both X and Y variables to set-up the k-NN, it tends to generate a reduced variance as compared to Euclidean distance.However, in an external validation context, CCA-based predictions appear to have more limited capabilities to predict attributes outside the geographical domain used to train the k-NN model [87].Furthermore, the study conducted in Reference [68] reported that CCA distance tended to have decreased accuracy compared to other distance metrics when no variable selection is performed, probably highlighting a curse of dimensionality issue while both X and Y variables are considered in the feature space [56].Furthermore, the study described in Reference [29] strongly advised using variable selection approach for the CCA distance metric despite its capability to weight auxiliary variables according to their explanatory power, because multicollinearity degrades the predictive power and the estimation of canonical weights rapidly becomes more complex leading to instability figure [88].This result therefore indicates that model-assisted inference without variable selection should avoid using CCA distance metric, at the benefit of a distance metric independent of the response variables, such as the Euclidean one.

Conclusions
Five principal conclusions could be drawn from this research: (1) provided a sufficient number of sample plots, design-based model-assisted inference performs well with large dimension data, as far as the data sources are related to the forest attributes considered.In such situation, the diversity and complementary of the auxiliary data are expected to improve precision (produce larger REs) and reduce RMSD; (2) a substantial increase in precision of the forest attribute estimates was brought by 3D metrics and the addition of canopy change metrics.The latter contributed to improve markedly the estimations in production volumes; (3) the optimal k value was stable with respect to the k-NN configuration tested; (4) the CCA distance metric involving both feature and response variables could be affected by dimensionality problems.Euclidean distance should be preferred when no variable selection is performed; and (5) the k-NN technique in conjunction with model-assisted estimators produced a significant improvement in precision of inventory parameter estimates.
This work demonstrated the potential of MSNFI approaches in complex broadleaved dominated forests such as those found in France.This opens up the possibilities of more forest attribute estimation for smaller spatial domains.Future work will focus on a downscaling approach involving both model-assisted and model-based approaches.
No 633464, as well as CHM-ERA project from the French National Research Agency (ANR) as part of the "Investissements d'Avenir" program (ANR-11-LABX-0002-01, Lab of Excellence ARBRE).
Remote Sens. 2018, 10, x FOR PEER REVIEW 4 of 20 structure, age, etc. is assessed for the 25 m plot.Trees are measured on the three smallest concentric plots, according to their circumference at a height of 1.3 m: trees in the 23.5-70.5 cm circumference range are measured on the 6 m radius plot, trees in the 70.5-117.5cm circumference range are measured on the 9 m radius plots, and the trees with a circumference of at least 117.5 m are measured on the 15 m radius plot.Dendrometric measurements include species, diameter at 0.1 and 1.

Figure 1 .
Figure 1.Localization of the study area (a), showing (b) the forest mask and the 775 national forest inventory plots in the main and the subset (southwestern part below dotted line) areas; 1 km 2 tiles of (c) the forest type map; (d) a 30 m resolution Landsat color composition (bands 4,3,2); and (e) a 1 m resolution photogrammetric canopy height model.The position of the tile used to illustrate is shown as a black rectangle in (b).

Figure 1 .
Figure 1.Localization of the study area (a), showing (b) the forest mask and the 775 national forest inventory plots in the main and the subset (southwestern part below dotted line) areas; 1 km 2 tiles of (c) the forest type map; (d) a 30 m resolution Landsat color composition (bands 4,3,2); and (e) a 1 m resolution photogrammetric canopy height model.The position of the tile used to illustrate is shown as a black rectangle in (b).

Figure 2 .
Figure 2. Mean relative efficiencies (RE) achieved with the internal and external models for 11 forest attributes over the main and subset areas.The legend of the number in parenthesis corresponding to forest attributes are given in Table3.

Figure 2 .
Figure 2. Mean relative efficiencies (RE) achieved with the internal and external models for 11 forest attributes over the main and subset areas.The legend of the number in parenthesis corresponding to forest attributes are given in Table3.

Table 1 .
Mean and variance of field plot attributes for the whole area and the subset area.

Table 1 .
Mean and variance of field plot attributes for the whole area and the subset area.

Table 2 .
Cross-validated Root Mean Squared Differences (RMSD) of the imputation, for the optimal k-NN models.Numbers in parenthesis are minimum and maximum values of the 11 forest attributes.

Table 3 .
Mean, standard error (SE), and relative efficiencies (RE) computed on the main area for the different combination of auxiliary variables using Euclidean distance.The greatest RE for each forest attributes appears in bold.

Table 4 .
Mean, standard error (SE) and relative efficiencies (RE) computed on the main area for the different combinations of auxiliary variables using Canonical Correlation Analysis distance.The greatest RE for each forest attribute appears in bold.

Table 5 .
Effect of the inclusion of diachronic variables on the mean, standard error (SE), and relative efficiencies (RE) over the subset area, for both distance metrics.The greatest RE for each forest attribute and distance metric appears in bold.