Tree-Lists Estimation for Chinese Boreal Forests by Integrating Weibull Diameter Distributions with MODIS-Based Forest Attributes from kNN Imputation

Wall-to-wall tree-lists information (lists of species and diameter for every tree) at a regional scale is required for managers to assess forest sustainability and design effective forest management strategies. Currently, the k-nearest neighbors (kNN) method and the Weibull diameter distribution function have been widely used for estimating tree lists. However, the kNN method usually relies on a large number of field inventory plots to impute tree lists, whereas the Weibull function relies on strong correlations between stand attributes and diameter distribution across large regions. In this study, we developed a framework to estimate wall-to-wall tree lists over large areas based on a limited number of forest inventory plots. This framework integrates the ability of extrapolating diameter distribution from Weibull and kNN imputation of wall-to-wall forest stand attributes from Moderate Resolution Imaging Spectroradiometer (MODIS). We estimated tree lists using this framework in Chinese boreal forests (Great Xing’an Mountains) and evaluated the accuracy of this framework. The results showed that the passing rate of the Kolmogorov–Smirnov (KS) test for Weibull diameter distribution by species was from 52% to 88.16%, which means that Weibull distribution could describe the diameter distribution by species well. The imputed stand attributes (diameter at breast height (DBH), height, and age) from the kNN method showed comparable accuracy with the previous studies for all species. There was no significant difference in the tree density between the estimated and observed tree-lists. Results suggest that this framework is well-suited to estimating the tree-lists in a large area. Our results were also ecologically realistic, capturing dominant ecological patterns and processes.


Introduction
Tree-lists (i.e., lists of species and diameter for every tree) [1] are of primary importance for deriving many stand attributes of interest (e.g., species composition, basal area, biomass).They are also valuable information for parameterizing forest landscape models (FLMs) [2], which are useful tools to examine how climate change, forest management, and other environmental stressors interact with species distributions and ecosystem conditions over large areas and long time frames [3,4].As such, tree-lists across large regions are required to assess forest sustainability and design effective Forests 2018, 9, 758 3 of 18

Study Area
Our study area was located in the Great Xing'an Mountains in China, which is a typical southern boreal forest area with elevations varying from 139 to 1511 m (121 • 12 -127 • 00 E, 50 • 10 -53 • 33 N) (Figure 1).This area has a cold temperate continental monsoon climate with average annual temperature varying from 1 • C at its southern extremes to −6 • C at its northern extremes, and precipitation varying from 442 mm in the south to 240 mm in the north.More than 60% of the annual precipitation falls in the summer season from June to August [26].The slope of our study area is relatively gentle, with 80% of the area less than 15 • .Most of the landscape is mainly dominated by cool temperate coniferous forests.The main boreal conifer tree species is Dahurian larch (Larix gmelinii (Rupr.)Kuzen, hereafter "larch"), which are usually found in cool moist sites.The broadleaf tree species mainly include white birch (Betula platyphylla Suk.) and aspen (Populus davidiana Dode), and occupy drier, well-drained sites of the area [27].Other tree species, such as Korean spruce (Picea koraiensis Nakai, hereafter "spruce"), Asian black birch (Betula davurica Pall., hereafter "black birch"), Mongolian Scots pine (Pinus sylvestris var.mongolica Litv., hereafter "pine"), willow (Chosenia arbutifolia (Pall.) A. Skv.), Mongolian oak (Quercus mongolica Fisch.ex Ledeb.), and a shrub species (Pinus pumila (Pall.)Regel), are interspersed with larch forests and have a small area of distribution [28].Black birch and Mongolian oak were mainly distributed in the southern part of the study area, whereas pine was distributed in the northern part.Young and middle-aged forests comprised the main part of the forest landscapes, with the exception of the conservation area due to long-term logging and fire disturbance.The forest structures varied due to frequent disturbance and large spatial variation in climate and terrain.

Study Area
Our study area was located in the Great Xing'an Mountains in China, which is a typical southern boreal forest area with elevations varying from 139 to 1511 m (121°12'-127°00' E, 50°10'-53°33' N) (Figure 1).This area has a cold temperate continental monsoon climate with average annual temperature varying from 1 °C at its southern extremes to −6 °C at its northern extremes, and precipitation varying from 442 mm in the south to 240 mm in the north.More than 60% of the annual precipitation falls in the summer season from June to August [26].The slope of our study area is relatively gentle, with 80% of the area less than 15°.Most of the landscape is mainly dominated by cool temperate coniferous forests.The main boreal conifer tree species is Dahurian larch (Larix gmelinii (Rupr.)Kuzen, hereafter "larch"), which are usually found in cool moist sites.The broadleaf tree species mainly include white birch (Betula platyphylla Suk.) and aspen (Populus davidiana Dode), and occupy drier, well-drained sites of the area [27].Other tree species, such as Korean spruce (Picea koraiensis Nakai, hereafter "spruce"), Asian black birch (Betula davurica Pall., hereafter "black birch"), Mongolian Scots pine (Pinus sylvestris var.mongolica Litv., hereafter "pine"), willow (Chosenia arbutifolia (Pall.) A. Skv.), Mongolian oak (Quercus mongolica Fisch.ex Ledeb.), and a shrub species (Pinus pumila (Pall.)Regel), are interspersed with larch forests and have a small area of distribution [28].Black birch and Mongolian oak were mainly distributed in the southern part of the study area, whereas pine was distributed in the northern part.Young and middle-aged forests comprised the main part of the forest landscapes, with the exception of the conservation area due to long-term logging and fire disturbance.The forest structures varied due to frequent disturbance and large spatial variation in climate and terrain.

Plot and Stand Inventory Data
Two types of forest inventory data were collected in this study.One was the limited plot inventory data with tree-lists, which were used to build the parameter estimation models of

Plot and Stand Inventory Data
Two types of forest inventory data were collected in this study.One was the limited plot inventory data with tree-lists, which were used to build the parameter estimation models of Weibull diameter distributions by species.The other were the stand inventory data without tree-lists, which were used to estimate the wall-to-wall stand attributes by the kNN imputation method.
The plot inventory data included a total of 212 plots, which were measured in Huzhong and Jiagedaqi forestry bureaus in 2011 (Figure 1).Among these plots, 191 plots were in the middle part of the study area and contained most tree species of the study area except Mongolian oak and black birch.Thus, we surveyed another 21 plots that included Mongolian oak and black birch located in the southern part of the study area.All plots were 20 m × 30 m.The diameter at breast height (DBH) for all trees larger than 5 cm was measured in each plot.We also recorded the GPS coordinates, slope, elevation, mean stand age, and mean height in each plot.We calculated the arithmetic mean DBH by species by dividing the sum of tree DBHs by the tree number of corresponding tree species.
Stand inventory data were collected from the China Forestry Science Data Center (CFSDC, http://www.cfsdc.org),which included 7635 stand polygons with relatively complete attributes from 1997 to 2001.These stand polygons ranged from a few hectares to dozens of hectares.Within each stand polygon, stand characteristics such as stand DBH, height, age, tree species composition (volume proportion), and the stand volume density were recorded in accordance with Chinese inventory requirements for forest management planning and design [29].Stand DBH is the DBH of the trees with the mean basal area of the stand.It is different from mean arithmetic mean DBH.The detailed information about polygon stand inventory data can be found in Zhang et al. [30].

Predictor Variables
Forest stand characteristics can be characterized by the reflectance values of satellite images and influenced by environment factors.In this study, Moderate Resolution Imaging Spectroradiometer (MODIS) satellite image variables and environment variables were selected as predictor variables to estimate wall-to-wall stand attributes (e.g., stand mean DBH, height, and age).The detailed lists of predictor variables can be found in Zhang et al. [30].MODIS variables included seven MODIS monthly composite surface reflectance bands and several vegetation indices correlated with vegetation characteristics in June of 2000.Environment variables in this study included climate variables, topographic variables, and location variables.Climate variables included mean annual precipitation and temperature between 1982 and 2009, (raster layers with a 1 km spatial resolution), which were generated by interpolating data from the National Meteorological Center of China by Mao et al. [31].Topographic variables including elevation, slope, and cosine of aspect (COSASP) were derived from the Shuttle Radar Topography Mission digital elevation model (SRTMDEM, 90 m spatial resolution).SRTMDEM was downloaded from the International Scientific and Technical Data Mirror Site, Computer Network Information Center, Chinese Academy of Sciences (http://www.gscloud.cn/).All the predictor variables except for topographic variables were resampled to 250 m pixel resolution using nearest neighbor method.

Overall Framework for Tree-Lists Estimation
The approach in this study included the following steps (Figure 2): (1) We built the Weibull parameter prediction models (WPPMs) using plot inventory data for estimating diameter distributions in each pixel; (2) We used kNN models to map forest stand attributes (e.g., species-level biomass, stand age, height, DBH) and provided the independent variables for WPPMs in each pixel; (3) We estimated tree-lists by species in each pixel by combining Weibull diameter distributions estimated by WPPMs and forest stand attributes imputed by the kNN method.

Building WPPMs
The two-parameter Weibull function was applied to modeling the diameter distribution by species in the 212 plots.The equation of the Weibull probability density function is where f(x) is the tree density probability function with diameter, x represents diameter, and b and c are the shape parameter and scale parameter of the function, respectively.Maximum likelihood estimation (MLE, Equation ( 2)) was used to estimate the parameters of the Weibull functions of each species for each plot.MLE is a common method used to estimate the parameters of distributions [20,21].The MLE equation is as follows:

Building WPPMs
The two-parameter Weibull function was applied to modeling the diameter distribution by species in the 212 plots.The equation of the Weibull probability density function is where f (x) is the tree density probability function with diameter, x represents diameter, and b and c are the shape parameter and scale parameter of the function, respectively.Maximum likelihood estimation (MLE, Equation ( 2)) was used to estimate the parameters of the Weibull functions of each species for each plot.MLE is a common method used to estimate the parameters of distributions [20,21].The MLE equation is as follows: where x 1 , x 2 , . . ., x n are the observed values of the samples from the population, L(x 1 , x 2 , . . ., x n , θ) is the MLE function, and f (x, θ) is the probability density function with unknown parameters (θ).The maximum probability for the samples (x 1 , x 2 , . . ., x n ) is required when using MLE method to estimate the value of θ.If the MLE function (L) is a differentiable function, the value of θ satisfies the requirement of the function: According to Equation ( 3), the unknown parameters can be estimated.The MLE method was executed using the MASS package [32] in R [33].When parameters b and c were estimated in each plot, the relationship between the parameters and stand variables could be built.In this study, arithmetic mean DBH by species was selected as the independent variable to build WPPMs using the ordinary least squares (OLS) method.

Mapping Forest Stand Attributes
The stand attributes including DBH, age, and height were mapped by integrating the forest attributes from stand polygons with predictor variables as described in Section 2.3 using the kNN method.The inverse distance weighted method and k = 6 that Zhang et al. used were selected to weight the nearest neighbor reference elements.The mean arithmetic DBH for larch and white birch in each pixel were separately estimated using mean stand DBH, age, and height as independent variables by Equations ( 4) and ( 5), which were built by Qi [34]: BD = e 0.393344+0.831571lnD+0.00305t+0.005589H . ( LD is the mean arithmetic DBH of larch, BD is the mean arithmetic DBH of white birch, D is mean stand DBH, t is the stand age, and H is the mean stand height.For other species, we used mean stand DBH instead of mean arithmetic DBH as the independent variable to estimate the diameter distribution in each pixel.When mean arithmetic DBH by species was estimated, parameters b and c of the Weibull function by species in each pixel could be calculated using the WPPMs.

Generating Tree-Lists by Species
Species-level biomass in step two for the study area in 2000 were from a previous study, from which the accuracy for the species-level biomass assessment was provided [30].In step three, the species-level biomass was used to estimate the tree-lists in each pixel with the diameter distribution by species.Once diameter distribution by species was estimated by WPPMs in each pixel, the proportion of the tree density at each diameter class by species (PTDS) could be calculated in each pixel.Then, the proportion of the species-level biomass at each diameter class (PSB) was calculated according to PTDS and single-tree biomass by species at each diameter class [35].Finally, the species-level biomass could be divided into each diameter class according to the PSB and converted to the tree density at each diameter class by dividing them by the single tree biomass at each diameter class.

Accuracy Assessment
In order for our framework to have a reliable accuracy, each step of the method was assessed using a variety of accuracy measures.
In the first step, the goodness of fit of the estimated Weibull diameter distributions in each plot was evaluated by the Kolmogorov-Smirnov (KS) test [36,37] and an error index proposed by Packalén and Maltamo [12]: where f i is the observed stem number of class i by species, fi is the estimated stem number of class i by species based on WPPMs, N is the total observed stem number by species, and Ň is the total Forests 2018, 9, 758 7 of 18 estimated stem number by species.The error index ranges from 0 to 1, with 0 meaning a perfect fit and 1 meaning that the distributions do not overlap at all [38].WPPMs were evaluated by using leave-one-out cross-validation (LOOCV).The R 2 , root mean square error (RMSE), and bias were calculated based on the parameters of Weibull diameter distribution functions: where n is the number of the samples, y i is the ith observed value, yi is the ith estimated value, and y is the mean value of all the observed values.For assessment of WPPMs' accuracy, n is the number of plots used to fit WPPMs for each species, y i is the parameter value estimated for plot i based on MLE, and yi is the parameter value estimated for plot i based on WPPMs.Because R 2 , root mean square error (RMSE), and bias have also been widely used to evaluate the accuracy of many maps of forest stand attributes, the maps of stand attributes (e.g., stand DBH, height, and age) in the second step were also evaluated by calculating these three measures using stand inventory polygons from the CFSDC.Although all stand inventory polygons were used in random forests-based kNN (RF-kNN) model development, only the 2nd through 7th nearest-neighboring polygons were used to impute stand attributes for each target pixel.This means that each polygon was not used to impute the pixels that had the same place as it and only contributed minimally to the estimate for the corresponding pixel.Thus, the two datasets can be deemed independent for comparisons between the field stand values and the modeled pixel values with the same spatial place.
Assessments for estimated tree-lists by species in the year 2000 across the study area were difficult because we only had the inventory tree-lists data in 2011.In order to compare with the estimated tree-lists of 2000, we transformed the inventory tree-lists data of 2011 by subtracting the DBH growth of the six species in the past eleven years based on equations (Table 1) which were built using the ordinary least squares (OLS) method and evaluated using the leave-one-out cross validation (LOOCV) method against the published data [39].We evaluated the accuracy of the estimated tree-lists in 2000 by calculating the error index based on these transformed inventory tree-lists data.The error indices by species were calculated by each plot and by all the plots as a whole.
We also divided the forest inventory data in 2011 into three age classes (Young, Middle, and Mature) by technical regulations for inventory for forest management planning and design (GB/T26424-2010).Additionally, we compared the tree density from observed tree-lists data in 2011 with that derived from estimated tree-lists data in 2000 for the three main forest types (i.e., larch forests, white birch forests, and mixed forests) with the same age class and similar environment using analysis of variance (ANOVA).When identifying the similar environment, we mainly considered three factors: slope, aspect, and elevation.

Weibull Parameter Prediction Models
The KS test showed that diameter distributions of all species could be well fitted using the Weibull functions estimated by the MLE method (agreement > 50%), especially for larch, white birch, and pine (Table 2, agreement > 80%).The leave-one-out cross validation (LOOCV) of WPPMs showed that the scale parameter (c) had a good relationship with the arithmetic mean DBH (d) for all the species (Table 3, small RMSE and high R 2 ).However, the arithmetic mean DBH (d) explained less variation in shape parameter (b) among different plots (R 2 : 0.13-0.29).Overall, diameter distributions by species in each plot were well-fitted, with mean values of error indices from 0.27 to 0.43, except for Mongolian oak.When separately summing the estimated and observed trees by diameter class of species to calculate the error indices, all the species diameter distributions were well-represented with low error indices (0.04-0.14) (Table 4).

Maps of Forest Stand Attributes
The maps of forest stand DBH, height, and age were estimated at a 250 m spatial resolution using RF-kNN method and assessed using inventory stand polygon data (Figure 3).Results showed that mean estimated values of stand DBH and height were close to their mean observed values with lowest biases (Figure 3a,b), while mean estimated stand age was slightly greater than the observed (Figure 3c).R 2 values for the three stand attributes ranged from 0.47 to 0.60, with the highest values for stand height, and the lowest values for stand DBH.RMSE values of the three stand attributes (Figure 3) were relatively accepted compared to their mean observed values (mean stand DBH: 12.55 cm, mean stand height: 12.22 m, mean stand age: 54.56 years).
The spatial distributions of three estimated stand attributes were in relation to the area of fire in 1987 and elevations (Figures 1 and 3).Fire areas and low elevation areas had consistently low predictions of the three stand attributes.In order to obtain the diameter distributions of larch and white birch in each pixel, arithmetic mean DBH of larch and white birch were also mapped using Equations ( 4) and (5) based on the three estimated attributes (Figure 4).Because other species often occupied very little proportions of the species composition, we used the mean stand DBH instead of the arithmetic mean DBH of species to calculate their diameter distributions based on the WPPMs in each pixel.

Maps of Tree Density from Tree-Lists
Tree density (trees with DBH >5 cm) of three main forest types derived from estimated tree-lists in 2000s based on WPPMs did not show noticeable difference compared to the observed tree density    4) and (5).

Maps of Tree Density from Tree-Lists
Tree density (trees with DBH >5 cm) of three main forest types derived from estimated tree-lists in 2000s based on WPPMs did not show noticeable difference compared to the observed tree density  4) and (5).

Maps of Tree Density from Tree-Lists
Tree density (trees with DBH >5 cm) of three main forest types derived from estimated tree-lists in 2000s based on WPPMs did not show noticeable difference compared to the observed tree density of the same forest types with a similar environment (Figure 5).The estimated wall-to-wall tree-lists only showed slightly poorer accuracy compared to the estimated tree-lists using stand attributes of plot inventory data in 2011 (Table 4, Figures 6 and 7).When combining all the plots into one data set, the accuracy of tree-lists for most species improved greatly, except for Mongolian oak, and the smallest stems of all species were underestimated (Figure 7).The estimated DBH of most tree species was concentrated in the range of 6 to 22 cm, except for aspen (Figure 8).The total tree density mainly ranged from 500 to 2500 trees/ha (Figure 9).The larch forests with high tree density mainly distributed in places with high elevation.White birch often showed an opposite trend to larch for the spatial tree density distribution, especially in the burned areas in 1987 (Figure 9).For other species, estimated tree density showed less than 450 trees/ha.smallest stems of all species were underestimated (Figure 7).The estimated DBH of most tree species was concentrated in the range of 6 to 22 cm, except for aspen (Figure 8).The total tree density mainly ranged from 500 to 2500 trees/ha (Figure 9).The larch forests with high tree density mainly distributed in places with high elevation.White birch often showed an opposite trend to larch for the spatial tree density distribution, especially in the burned areas in 1987 (Figure 9).For other species, estimated tree density showed less than 450 trees/ha.the accuracy of tree-lists for most species improved greatly, except for Mongolian oak, and the smallest stems of all species were underestimated (Figure 7).The estimated DBH of most tree species was concentrated in the range of 6 to 22 cm, except for aspen (Figure 8).The total tree density mainly ranged from 500 to 2500 trees/ha (Figure 9).The larch forests with high tree density mainly distributed in places with high elevation.White birch often showed an opposite trend to larch for the spatial tree density distribution, especially in the burned areas in 1987 (Figure 9).For other species, estimated tree density showed less than 450 trees/ha.

Discussion
With continuous need of detailed forest attributes (e.g., tree-lists) for answering regional questions, developing methods to obtain such detailed forest attributes attracts a great deal of research interest.Although both the kNN method and Weibull diameter distribution have been widely used for estimating tree-lists [5,9,12,13], the integration of these two methods is novel in generating tree-lists that have seen wide applications.A key novelty of this research is the generation of wall-to-wall estimates of tree-lists over large areas using a limited number of forest ground inventory data.
For reginal forest attribute estimation like this, one important issue is accuracy.Because wall-to-wall tree-lists estimated by our framework are transformed from species-level biomass by kNN imputation and tree density percent from WPPMs, the accuracy of our estimated tree-lists were mainly determined by WPPMs and kNN imputation models.For a more general assessment of our method framework, we compared the accuracies of WPPMs and kNN models with other similar studies.We found that the parameter prediction models of Weibull diameter distributions

Discussion
With continuous need of detailed forest attributes (e.g., tree-lists) for answering regional questions, developing methods to obtain such detailed forest attributes attracts a great deal of research interest.Although both the kNN method and Weibull diameter distribution have been widely used for estimating tree-lists [5,9,12,13], the integration of these two methods is novel in generating tree-lists that have seen wide applications.A key novelty of this research is the generation of wall-to-wall estimates of tree-lists over large areas using a limited number of forest ground inventory data.
For reginal forest attribute estimation like this, one important issue is accuracy.Because wall-to-wall tree-lists estimated by our framework are transformed from species-level biomass by kNN imputation and tree density percent from WPPMs, the accuracy of our estimated tree-lists were mainly determined by WPPMs and kNN imputation models.For a more general assessment of our method framework, we compared the accuracies of WPPMs and kNN models with other similar studies.We found that the parameter prediction models of Weibull diameter distributions (WPPMs) in our study produced comparable accuracy with the previous studies.For example, Liu [40] reported that the determination coefficient (R 2 ) of the shape parameter (b) estimated by the previous parameter prediction models (PPMs) was generally close to 0.1-slightly poorer than our results.Additionally, our estimates of scale parameter (c) for all the species were slightly better than the results that Diamantopoulou et al., reported in the south and southwestern (Mediterranean) regions of Turkey (c: R 2 = 0.96-0.99).The mean error index (0.32-0.34) of the estimated diameter distributions in a typical Finnish managed boreal forest area that Packalén and Maltamo reported was similar to our results about the dominant tree species.These comparisons indicate that our WPPMs can be used at least as accurately as previous studies to estimate tree diameter distribution.
Our imputed wall-to-wall stand attributes (e.g., stand DBH, height, and age) also showed comparable accuracy to the previous similar studies.For example, the accuracy of our study was very close to the results that Beaudoin et al. (2014) [41] reported about the stand age and height in a Canadian forest, with R 2 of 0.57 and 0.58, RMSE 44.29 years and 5.66 m, and bias of 1.54 years and 0.07 m.In addition, the area with low estimated values of stand DBH, height, and age was consistent with the area disturbed by fires in 1987 and low elevations linked to intensive human activities, indicating that our spatial predictions of forest attributes are realistic, capturing dominant ecological patterns and processes.
Although we were unable to directly assess the accuracy of the estimated tree-lists in the 2000s due to the lack of tree-lists inventory data in this period, the assessments of WPPMs and kNN imputation models in our study demonstrated that our estimated tree-lists had comparable accuracy to previous studies.The assessment of the estimated tree-lists in 2000 based on transformed inventory tree-lists data provided the accuracy about our method framework, which were comparable with a previous study that was in much smaller area [12].The comparisons between estimated and observed total density also indicated that the estimated tree-lists in 2000s had no obvious difference with tree-lists from the forest inventory in 2011 with the same forest types and similar environmental conditions.In addition, the estimated tree density ranging from 400 to 2500 trees/ha (Figure 9) (DBH ≥ 5 cm) is consistent with the previous results reported by Zhai,et al. [42] and Zhao et al. [43].These comparisons also indicated that the estimated tree-lists had a relatively reliable accuracy.
There are many methods to describe the diameter distributions of forests stands, among which the Weibull function is one of the most simple and accurate functions for modeling diameter distribution.Liu et al. [44] compared three methods of Weibull function in modeling diameter distributions in Daxing'an Mountain, China.They found that a finite mixture model (FMM) of Weibull functions was more flexible to describe highly skewed and irregular diameter distributions than a single Weibull function to fit the whole stand only, and a single Weibull function to fit each species component separately was also able to fit each species component well in mixed forest stands.However, FMM is often difficult to use for large regions due to many unknown parameters requiring estimation [45].A single Weibull function to fit each species component separately is easy to use but first requires the information of species composition, which is also difficult to obtain only through the forest inventory across large regions.Our method framework could use a Weibull function to fit each species component separately and provide the information about species composition across large regions using a kNN model based on MODIS data.MODIS data have wide coverage and high temporal resolution, and have been widely used to generate regional maps of forest attributes [3,5,24,30,41,[46][47][48][49][50].The advantage of our method framework is its use of MODIS data as predictor variables to estimate stand attributes.Compared to other high-resolution optical images (e.g., Landsat), MODIS data provide abundant and near-real-time information for the forest stand attribute imputation across large regions [14,30].Therefore, our framework may be more feasible for use across large regions to describe diameter distributions.
Many approaches have been used to estimate Weibull parameters for describing diameter distributions such as ordinary least squares (OLS), seemingly unrelated regression (SUR), cumulative distribution function regression (CDFR), or artificial neural network (ANN) methods [21].Although Diamantopoulou et al. [20] proved that complex methods (e.g., ANN) were superior to simple methods (e.g., OLS) in the estimation of Weibull parameters, the application of complex methods is often limited across large regions because they are often based on a training method and their results cannot go not beyond the range of training samples.The OLS method is simple and has been widely used to fit the parameter models.Additionally, many previous studies have also proved that it was possible to estimate the scale or shape parameters of a two-parameter Weibull function using the parametric models fitted by OLS [51,52].Therefore, in this study parametric WPPMs were built using only OLS.
In this study, only the arithmetic DBHs of larch and white birch were calculated and used when estimating the wall-to-wall species' diameter distribution.For other species, we used stand DBH instead of tree species' arithmetic DBH to estimate the parameters of Weibull diameter distribution because of their small percentage of forest species composition for most forest stands in our study area.Although it is reasonable for the Chinese boreal forests, which have simple stand structures, great uncertainties might be produced for forests with complicated structures using stand DBH instead of tree species' arithmetic DBH.Therefore, our method framework is generally suited for forests with simple stand structure, such as boreal forests.

Conclusions
Wall-to-wall tree-lists across large regions are required to support science, policy, and reporting information requirements.However, it is difficult to obtain such information across large regions due to the high cost of the forest inventory.In this study, we presented a framework to estimate wall-to-wall tree-lists using limited forest inventory plots by integrating a Weibull diameter distribution with the imputed stand attributes based on the kNN method in Chinese boreal forests.The Weibull function showed strong ability to describe the diameter distributions of tree species.The WPPMs of six species showed acceptable accuracy for estimating the parameters of Weibull functions.The imputed stand age, mean height, and DBH across the study area by the kNN method also showed comparable accuracy with previous study.The estimated tree density derived from tree-lists did not show obvious difference between the observed and estimated data for similar forest types.Our estimated tree-lists also captured some effects of ecological processes, such as fire disturbance.Our method provides an alternative for estimating the tree lists over large area such as the boreal forest biome in China.
Author Contributions: The authors provided equal contribution towards decisions regarding methodology and study design.Early drafts were written by Q.Z., and revised and edited by Y.L., and H.S.H.

Figure 1 .
Figure 1.The location and elevation of the study area with the distribution of the forest inventory plots (red plots).

Figure 1 .
Figure 1.The location and elevation of the study area with the distribution of the forest inventory plots (red plots).

Figure 2 .
Figure 2. The overall framework of generating tree-lists.DBH: Diameter at breast height; kNN, k-nearest neighbor; WPPMs, Weibull parameter prediction models by species.

Figure 2 .
Figure 2. The overall framework of generating tree-lists.DBH: Diameter at breast height; kNN, k-nearest neighbor; WPPMs, Weibull parameter prediction models by species.

Figure 3 .
Figure 3. Goodness-of-fittings (density scatterplots) of the random forest (RF)-based kNN models of stand DBH, height, and age (a,b,c) and maps of estimated diameter, height, and age (d,e,f).The points superimposed on the density image are the points from those areas of lowest regional densities, which are the identification of outliers.The dotted line is the 1:1 line and the dashed line is the geometric mean functional regression line.

Figure 3 .
Figure 3. Goodness-of-fittings (density scatterplots) of the random forest (RF)-based kNN models of stand DBH, height, and age (a-c) and maps of estimated diameter, height, and age (d-f).The points superimposed on the density image are the points from those areas of lowest regional densities, which are the identification of outliers.The dotted line is the 1:1 line and the dashed line is the geometric mean functional regression line.

Figure 3 .
Figure 3. Goodness-of-fittings (density scatterplots) of the random forest (RF)-based kNN models of stand DBH, height, and age (a,b,c) and maps of estimated diameter, height, and age (d,e,f).The points superimposed on the density image are the points from those areas of lowest regional densities, which are the identification of outliers.The dotted line is the 1:1 line and the dashed line is the geometric mean functional regression line.

Figure 5 .
Figure 5. Boxplots of observed vs. predicted tree density of three main forest types (Larch forests (a), White birch forests (b) and Mixed forests (c)) with similar environment (p > 0.05).Observed tree density was calculated from the inventory plot data in 2011.Estimated tree density was calculated from the estimated tree-lists in 2000s.The same letter "a" indicated that there was no obvious difference between observed and estimated tree density.

Figure 6 .
Figure 6.Error indices of estimated wall-to-wall tree-lists by species in 2000 based on the transformed inventory tree-lists data.The error indices were calculated by each transformed plot.

Figure 5 .
Figure 5. Boxplots of observed vs. predicted tree density of three main forest types (Larch forests (a), White birch forests (b) and Mixed forests (c)) with similar environment (p > 0.05).Observed tree density was calculated from the inventory plot data in 2011.Estimated tree density was calculated from the estimated tree-lists in 2000s.The same letter "a" indicated that there was no obvious difference between observed and estimated tree density.

Figure 5 .
Figure 5. Boxplots of observed vs. predicted tree density of three main forest types (Larch forests (a), White birch forests (b) and Mixed forests (c)) with similar environment (p > 0.05).Observed tree density was calculated from the inventory plot data in 2011.Estimated tree density was calculated from the estimated tree-lists in 2000s.The same letter "a" indicated that there was no obvious difference between observed and estimated tree density.

Figure 6 .
Figure 6.Error indices of estimated wall-to-wall tree-lists by species in 2000 based on the transformed inventory tree-lists data.The error indices were calculated by each transformed plot.Figure 6. Error indices of estimated wall-to-wall tree-lists by species in 2000 based on the transformed inventory tree-lists data.The error indices were calculated by each transformed plot.

Figure 6 . 18 Figure 7 .
Figure 6.Error indices of estimated wall-to-wall tree-lists by species in 2000 based on the transformed inventory tree-lists data.The error indices were calculated by each transformed plot.Figure 6. Error indices of estimated wall-to-wall tree-lists by species in 2000 based on the transformed inventory tree-lists data.The error indices were calculated by each transformed plot.Forests 2018, 9, x FOR PEER REVIEW 12 of 18

Figure 7 .
Figure 7. DBH distributions by species (larch (a), white birch (b), pine (c), aspen (d), spruce (e), Mongolian oak (f)) of the observed vs. estimated wall-to-wall tree-lists based on the transformed inventory tree-lists data.The error indices were calculated by regarding all transformed plots as a whole.

Figure 8 .
Figure 8.Estimated tree-lists of the study area in 2000 by species (larch and white birch (a); aspen, pine and willow (b); black birch and Mongolian oak (c); spruce (d)).

Figure 7 .
Figure 7. DBH distributions by species (larch (a), white birch (b), pine (c), aspen (d), spruce (e), Mongolian oak (f)) of the observed vs. estimated wall-to-wall tree-lists based on the transformed inventory tree-lists data.The error indices were calculated by regarding all transformed plots as a whole.

Figure 7 .
Figure 7. DBH distributions by species (larch (a), white birch (b), pine (c), aspen (d), spruce (e), Mongolian oak (f)) of the observed vs. estimated wall-to-wall tree-lists based on the transformed inventory tree-lists data.The error indices were calculated by regarding all transformed plots as a whole.

Figure 8 .
Figure 8.Estimated tree-lists of the study area in 2000 by species (larch and white birch (a); aspen, pine and willow (b); black birch and Mongolian oak (c); spruce (d)).

Figure 8 .
Figure 8.Estimated tree-lists of the study area in 2000 by species (larch and white birch (a); aspen, pine and willow (b); black birch and Mongolian oak (c); spruce (d)).

Funding:
This research was funded by National Key Research and Development Program of China (grant numbers: 2017YFA0604402 and 2016YFA0600804), and National Natural Science Foundation of China (grant number: 31570461).
where x1, x2, …, xn are the observed values of the samples from the population, ( 1 ,  2 , … ,   , ) is the MLE function, and (x, ) is the probability density function with unknown parameters ().The maximum probability for the samples (x1, x2, …, xn) is required when using MLE method to estimate the value of .If the MLE function (L) is a differentiable function, the value of  satisfies the requirement of the function:

Table 1 .
Equations of DBH (d, unit: cm) and age (a, unit: year) for six species and accuracy assessment using leave-one-out cross validation (LOOCV).RMSE: Root mean square error.

Table 2 .
The agreement of each species for the two-parameter Weibull diameter distribution using Kolmogorov-Smirnov (KS) test (p > 0.05).
* Agreement is the percentage of the samples that the passed the KS test.

Table 3 .
Equations of shape parameter (b) and scale parameter (c) of Weibull diameter distribution depend on arithmetic mean DBH (d, unit: cm) for each species and accuracy assessment using leave-one-out cross validation (LOOCV).

Table 4 .
Error indices of DBH distributions for each species in all plots based on the WPPMs using LOOCV method.Error indices of summing all the plots are the error indices calculated by regarding all plots as a whole. *