Estimation Using the k-NN Technique Combined with Forest Inventory Data, Satellite Image Data and Additional Feature Variables

The main objective of this study was to evaluate the effect of adding feature variables, such as forest type information and topographic- and climatic-environmental factors, to satellite image data on the accuracy of stand volume estimates made with the k-nearest neighbor (k-NN) technique in southwestern Japan. Data from the Forest Resources Monitoring Survey, a national plot sampling survey in Japan, were used as in situ data in this study. The estimates obtained from three Landsat Enhanced Thematic Mapper Plus (ETM+) datasets acquired in different seasons with various combinations of additional feature variables were compared. The results showed that although the addition of environmental factors to satellite image data did not always improve estimation accuracy, the use of summer rainfall (SRF) data had a consistent positive effect on accuracy. Therefore, SRF may be a useful feature variable to consider in stand volume estimation in this study area. Moreover, the use of forest type information is very effective at reducing k-NN estimation errors when using an optimum combination of satellite image data and environmental factors. All of the results indicated that the k-NN technique combined with appropriate feature variables is applicable to nationwide stand volume estimation in Japan.

Open access. Remote Sens. 2015, 7, 379.


Introduction
National forest inventories (NFIs) have been implemented to obtain forest information in many countries [1]. An NFI is generally based on a plot sampling method (hereafter, a sample-based approach), meaning that only spatially discontinuous information can be obtained within a country. Remote sensing is a unique technique for expanding plot-level information to continuous information over large areas (hereafter, we use the term "multisource inventory" [2] to describe the approach of combining field plot and satellite image data). The multisource inventory is the most cost-effective and realistic way to obtain nationwide, wall-to-wall estimates of forest resources [3]. Meanwhile, the integration of in situ surveys and satellite image data has the potential to become the most appropriate forest observation method for the Global Earth Observation System of Systems (GEOSS) [4]. McRoberts and Tomppo (2007) [5] summarized four primary ways to enhance an NFI using satellite image data: (1) providing faster and less expensive observations or measurements of forest attributes; (2) increasing the precision of large area inventory estimates; (3) providing inventory estimates with acceptable bias and precision for small areas where sufficient field data are not available; and (4) producing forest thematic maps usable for timber production, procurement and ecological studies.
In Finland and Sweden, nationwide forest databases, including various kinds of information, e.g., stand volume, stand basal area and stand age, have been created by combining NFI field plot data, medium resolution satellite image data and other digital maps using the non-parametric k-nearest neighbor (k-NN) technique [3,6]. In China [7,8], Ireland [9] and Norway [10], the technique was tested in parts of the country to estimate forest information. Some previous studies revealed that the k-NN technique has the potential to increase the precision of NFI estimates by the post-stratification technique [5,11-13]. Due to its ready availability, the k-NN technique has received considerable attention and merited special discussion in recent years [5].
Since the first implementation of the multisource inventory based on the k-NN technique by the Finnish Forest Research Institute in 1989 [14], many studies have proposed various methods based on the k-NN technique to improve the accuracy of stand volume estimates. For example, soil type (peat land and mineral soil) [14], forest site quality [15], the age of each stand and the year of the last thinning operation [16], geographical distance [17] and large-area variation of forest variables [18] were used as a priori information to stratify forests and/or as explanatory variables of the k-NN technique, in combination with satellite image data, to estimate stand volume. All of these studies revealed that using some spatial information alongside satellite image data has the potential to improve the accuracy of the estimates in the k-NN technique. These additional feature variables have also been used in combination with Moderate Resolution Imaging Spectroradiometer (MODIS) data to improve the accuracy of estimates in a k-NN framework (the k-NN, the most similar neighbors (MSN or k-MSN [19,20]) and the phenological gradient nearest neighbor (PGNN)) [21,22].
In Japan, a national, sample-based forest inventory, named the "Forest Resources Monitoring Survey" (FRMS), was initiated in 1999 to contribute to and promote sustainable forest management [23,24]. The FRMS plots were geo-referenced, i.e., the plot data could also be used for estimating forest attributes via the k-NN technique, combined with satellite image data. For example, Kajisa et al. [25] estimated the stand volume in the Kyushu region of Japan using FRMS plots and satellite image data, and concluded that the k-NN technique was suitable for this purpose in their study area. Kajisa et al. [25] proposed a simple method using only satellite image data as explanatory variables. They estimated the stand volume of coniferous and broad-leaved forests simultaneously without stratifying forests according to a priori information. Therefore, there is still the potential to improve their method by using additional feature variables in the same region. The topographic and climatic characteristics within the study area varied; hence, we assumed that adding environmental factors, derived from a digital elevation model (DEM) and climatic data, to satellite image data could improve estimation accuracy, because these environmental factors affect stand structure and tree growth [26-32]. We also considered the utility of forest type information, because the reflective properties of light and the ranges of stand volume differ among forest types. As the effectiveness of additional feature variables would differ depending on the scene or the season, we also needed to compare the estimates obtained from satellite image data of different seasons.
The main objective of this study was to evaluate the effectiveness of adding feature variables, such as forest type information and topographic- and climatic-environmental factors, to satellite image data on the accuracy of stand volume estimates using the k-NN technique in southwestern Japan. FRMS data, which are a systematically sampled set of forest inventory data in Japan, were used as in situ data in the analysis, as in Kajisa et al. [25]. We compared the stand volume estimates obtained using three Landsat 7 Enhanced Thematic Mapper Plus (ETM+) datasets acquired in different seasons.

Study Area
A large area located in southwestern Japan, which covered all or part of nine prefectures (Ehime, Shimane, Hiroshima, Yamaguchi, Fukuoka, Oita, Kumamoto, Miyazaki and Kagoshima), was selected for the study (Figure 1). The study area included several tree species, such as Japanese cedar (Cryptomeria japonica), Japanese cypress (Chamaecyparis obtusa), Japanese red pine (Pinus densiflora), evergreen broad-leaved species (e.g., Quercus glauca, Castanopsis cuspidata and Machilus thunbergii) and deciduous broad-leaved species (e.g., Quercus serrata and Quercus acutissima). The climatic conditions differed significantly within the study area, because it extended more than 600 km in a latitudinal direction and encompassed a range of elevations exceeding 1700 m. Due to the climatic differences, the phenology of the vegetation within the study area varied considerably.

Satellite Image Data
Three sets of Landsat 7 ETM+ data acquired in three different seasons were used to estimate the stand volume (Table 1). The ETM+ data on three data tiles (three rows) between Row 36 and Row 38 on Path 112 of the World Reference System 2 (Figure 1) were downloaded via the Internet from the Earth Resources Observation and Science Center, United States Geological Survey. These data were acquired before the failure of the scan line corrector built into the Landsat 7 sensor. Because the thermal-infrared Band 6 has a coarse spatial resolution (60 m), only the visible, near-infrared and mid-infrared spectral bands (1-5 and 7) of the ETM+ data, with a 30-m spatial resolution, were analyzed. The product type of all ETM+ data was Level 1T (standard terrain correction; L1T). The L1T products have been orthorectified using a DEM derived from the Shuttle Radar Topography Mission and resampled by the cubic convolution method. We mosaicked the ETM+ data tiles separately for each observation date. The mosaicked ETM+ data were re-registered to a mosaicked and orthorectified aerial photograph (acquired in about 1990; black-and-white; 1-m spatial resolution), because the ETM+ data had minor geometric errors. We employed the nearest-neighbor resampling method with a first-order polynomial equation, with a total RMSE of less than 0.5 pixels.

Following the geometric correction, we calculated the at-sensor spectral radiance from the digital numbers (DN) of the ETM+ data by the following equation:

$$L = \frac{LMAX - LMIN}{Q_{calmax} - Q_{calmin}} \left(Q_{cal} - Q_{calmin}\right) + LMIN \qquad (1)$$

where L is the spectral radiance at the sensor's aperture (W·m⁻²·sr⁻¹·μm⁻¹), Qcal is the quantized calibrated pixel value (DN), Qcalmin is the minimum quantized calibrated pixel value corresponding to LMIN, Qcalmax is the maximum quantized calibrated pixel value corresponding to LMAX, LMIN is the spectral at-sensor radiance that is scaled to Qcalmin and LMAX is the spectral at-sensor radiance that is scaled to Qcalmax.

We did not perform any atmospheric correction via radiative transfer software due to the difficulty of ensuring a constant atmospheric condition under a single parameter setting for large areas. A dark object subtraction method [33,34] was used to make radiometric corrections to exclude path radiance, and then, a corrected radiance (Lcor) was obtained. Dark objects were selected from a single pixel in a clear water body via trial-and-error. Cloudy and hazy areas were visually identified and masked out manually. For the remaining areas, a constant atmospheric condition was assumed. In addition, we assumed that there was no atmospheric transmittance loss and no diffuse downward radiation at the surface and that the dark object in the remaining study area would remain absolutely dark, whereupon the reflectance (R) value was obtained from the Lcor with the following formula:

$$R = \frac{\pi \, L_{cor} \, d^{2}}{ESUN \, \cos\theta_{s}} \qquad (2)$$

where π is a mathematical constant equal to ~3.14159 (unitless), d is the Earth-Sun distance (astronomical units), ESUN is the mean exoatmospheric solar irradiance (W·m⁻²·μm⁻¹) and θs is the solar zenith angle (degrees). In this study, the Earth-Sun distance values obtained from Chander et al. [35] were employed. This preprocessing is important, because most remote sensing applications are based on spectral reflectance; however, it should be noted that standardized DN would give the same results as standardized spectral reflectance, because the processing of Equations (1) and (2) is a linear transformation (see Section 2.5). Because some previous studies reported that topographic normalization did not give clear results in k-NN estimation for satellite images with a high Sun elevation angle [36,37], we assumed there would be only minor topographic effects on the ETM+ data and did not perform topographic normalization in this study.
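The conversion chain described above (DN to radiance by Equation (1), dark object subtraction, then reflectance by Equation (2)) can be illustrated with a minimal Python sketch. The function names, the default Qcalmin/Qcalmax of 1 and 255, and all numeric inputs are hypothetical choices for illustration; they are not the calibration values used in this study.

```python
import math

def dn_to_radiance(qcal, lmin, lmax, qcalmin=1.0, qcalmax=255.0):
    """At-sensor spectral radiance from a digital number (Equation (1))."""
    gain = (lmax - lmin) / (qcalmax - qcalmin)
    return gain * (qcal - qcalmin) + lmin

def subtract_dark_object(radiance, dark_radiance):
    """Dark object subtraction: remove path radiance, clipped at zero."""
    return max(radiance - dark_radiance, 0.0)

def radiance_to_reflectance(l_cor, d_au, esun, solar_zenith_deg):
    """Reflectance from corrected radiance (Equation (2))."""
    cos_zenith = math.cos(math.radians(solar_zenith_deg))
    return math.pi * l_cor * d_au ** 2 / (esun * cos_zenith)

# Hypothetical inputs for a single band and pixel (illustration only).
LMIN, LMAX = -5.0, 235.0      # assumed radiance scaling limits
ESUN = 1500.0                 # assumed exoatmospheric solar irradiance
dark_dn, pixel_dn = 20, 60    # assumed DN of the dark object and of a forest pixel

l_dark = dn_to_radiance(dark_dn, LMIN, LMAX)
l_pixel = dn_to_radiance(pixel_dn, LMIN, LMAX)
l_cor = subtract_dark_object(l_pixel, l_dark)
reflectance = radiance_to_reflectance(l_cor, d_au=1.0, esun=ESUN,
                                      solar_zenith_deg=30.0)
print(round(reflectance, 4))
```

Apart from the clipping at zero, every step is linear in the DN, which is why standardized DN and standardized reflectance lead to the same feature space.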

In Situ Data
We used plot-level stand volume data (m³/ha) derived from the first term of the FRMS as in situ data. The FRMS plots were set up at systematic 4-km grid intervals covering all forests in Japan. The first term of the FRMS was completed between April 1999 and March 2004. All plots are scheduled to be measured every five years, and about one-fifth of the plots are measured each year. Each plot is composed of three concentric circles with radii of 5.64 m (0.01 ha; S), 11.28 m (0.04 ha; M) and 17.84 m (0.10 ha; L), within which a total of 40 variables (nine for trees and understory vegetation and 31 for topography and other information) were measured by field crews. The size of trees to be measured was different for each circle; trees >1 cm, >5 cm and >18 cm in diameter at breast height (DBH) were measured in the S, M and L circles, respectively. In each plot, the tree height was measured for at least 20 trees, which were expected to be selected from a wide range of tree sizes to construct a DBH-height curve for the plot [23,25,31,38,39]. We used the species, DBH and tree height information from the 40 measurement variables to calculate the plot-level stand volume. The individual stem volume was calculated from general two-way volume equations for each tree species; then, the summation of volume per 0.10 ha was defined as the observed stand volume in the field in this study.
During the first survey term, all FRMS plots were established by field crews with the help of Global Positioning System (GPS) navigation. According to a previous study [38], the locations of FRMS plots had some positional errors. For example, some of the plots were subject to considerable positional errors (exceeding 20 m in RMSE). To avoid modeling errors due to positional inaccuracy between ETM+ data and field plot data, efforts to eliminate or to replace potentially erroneous data (e.g., sample plots falling near the forest/non-forest boundary) were employed in some previous studies [6,15,40-42]. In this study, we eliminated potentially erroneous data by the following steps: (1) we generated circular areas (radius of 50 m, corresponding to approximately 3 × 3 pixels of ETM+ data) based on the position coordinates of FRMS plots in geographic information system (GIS) software; (2) the forest cover of each circular area was visually checked using the mosaicked aerial photograph and ETM+ data; (3) plots whose forest cover ratio for the circular area was less than 50% were eliminated; and (4) plots that were seemingly subject to dramatic changes by clear-cutting or other disturbance were excluded from the analysis [3,40]. In order to evaluate the accuracy of stand volume estimates appropriately for the three sets of Landsat 7 ETM+ data, we only selected the FRMS plots within cloudless areas on all ETM+ data. Consequently, data from a total of 891 FRMS plots were used in this study.

Additional Feature Variables
As previously mentioned, we considered the use of forest type information for stand volume estimation. Because we did not have any accurate wall-to-wall digital maps of the forest types for this study area, we used forest type information (FTYPE; categorical variable), which was determined by the FRMS plot data itself. The FRMS plots were grouped into two classes (evergreen coniferous forest, ECF; broad-leaved forest, BF) by the dominant analysis [31,43] and the following procedures: (1) To obtain the dominant tree species group in all plots, first, all species of trees within each FRMS plot were classified into five tree species groups (evergreen coniferous, deciduous coniferous, evergreen broad-leaved, deciduous broad-leaved and bamboo); then, the dominant tree species group was determined based on the maximum basal area (m²) within each plot. (2) Plots that were dominated by bamboo were excluded from the analysis. Deciduous coniferous trees were not found in this study area, that is to say, the ECF class was the only class found for coniferous trees. (3) Two broad-leaved forest types (i.e., evergreen broad-leaved forest and deciduous broad-leaved forest) were aggregated and defined as BF, because of insufficient FRMS plots in these classes (for more detailed information on this procedure, see [31]). The sample sizes for the ECF and BF classes were 501 and 390, respectively. A summary of plot data concerning the stand volume is shown in Table 2.

A 10-m grid DEM derived from topographic maps at a 1:25,000 scale developed by photogrammetric interpretation, published by the Geospatial Information Authority of Japan, was used for the calculation of topographic-environmental factors. First, the grid size of the DEM was converted to 30 m by a bilinear interpolation resampling method; then, we computed the elevation (Elev), terrain slope angle (Slope) and solar radiation index (SRI [44]). We employed a 3 × 3 (i.e., 90 m × 90 m) moving window to calculate Slope in this study. Slope and SRI are calculated by the following formulae:

$$Slope = \arctan\left(\sqrt{\Delta x^{2} + \Delta y^{2}}\right) \qquad (3)$$

where Δx and Δy are the average elevation changes per unit of distance in the x and y directions.

$$SRI = \cos(Lat)\cos(Slope) + \sin(Lat)\sin(Slope)\cos(Aspect^{*}) \qquad (4)$$

where Lat is the latitude (degrees; north positive, south negative) and Aspect* is 180° − Aspect. Aspect was also computed by a 3 × 3 moving window.

The 1-km Mesh Climatic Data of Japan published by the Japan Meteorological Agency [45] was used for the calculation of climatic-environmental factors. We calculated Kira's warmth index (WI) and coldness index (CI) [46], summer rainfall (SRF) (from June to September) and winter rainfall (WRF) (from November to February). The WI was calculated from the equation WI = ∑(MT − 5), where the summation is made for the months in which the monthly mean temperature (MT) is higher than 5 °C, and the CI was calculated from the equation CI = −∑(5 − MT), where the summation is made for the winter months in which the MT is lower than 5 °C [47].
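As an illustration of the topographic factors, the following Python sketch computes Slope as the arctangent of the gradient magnitude and SRI as cos(Lat)·cos(Slope) + sin(Lat)·sin(Slope)·cos(Aspect*), with Aspect* = 180° − Aspect. The gradient estimator is an assumption: the text only states that a 3 × 3 moving window was used, so the Horn-type weights in `window_gradient` are one common choice rather than the method confirmed for this study, and all function names are hypothetical.

```python
import math

def window_gradient(window, cell_size):
    """Elevation change per unit distance in x and y from a 3 x 3 window.

    Assumption: Horn-type weighting of the eight neighbours; the study
    only specifies that a 3 x 3 moving window was used.
    """
    (a, b, c), (d, _, f), (g, h, i) = window
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8.0 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8.0 * cell_size)
    return dz_dx, dz_dy

def slope_degrees(dz_dx, dz_dy):
    """Terrain slope angle in degrees from the two gradient components."""
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))

def solar_radiation_index(lat_deg, slope_deg, aspect_deg):
    """SRI with Aspect* = 180 degrees - Aspect; all inputs in degrees."""
    lat = math.radians(lat_deg)
    slope = math.radians(slope_deg)
    aspect_star = math.radians(180.0 - aspect_deg)
    return (math.cos(lat) * math.cos(slope)
            + math.sin(lat) * math.sin(slope) * math.cos(aspect_star))

# Hypothetical 30-m DEM window: a plane rising 30 m per cell towards the east.
window = [[0.0, 30.0, 60.0],
          [0.0, 30.0, 60.0],
          [0.0, 30.0, 60.0]]
dz_dx, dz_dy = window_gradient(window, cell_size=30.0)
slope = slope_degrees(dz_dx, dz_dy)
print(slope, solar_radiation_index(33.0, slope, 90.0))
```

With this form, a south-facing slope whose angle equals the latitude gives the maximum SRI of 1, and a flat cell gives cos(Lat).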
The mesh climatic data were calculated from meteorological observation data between 1971 and 2000 and interpolated to a 1-km mesh. Following computation, these four climatic-environmental factors were rasterized and converted from a grid size of 1 km to 30 m.
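The four climatic-environmental factors can be derived from twelve monthly values per mesh cell. The Python sketch below follows the definitions given in the text (WI and CI with a 5 °C threshold, SRF for June to September, WRF for November to February); the function names and the monthly values are hypothetical.

```python
def warmth_index(monthly_mean_temps):
    """WI: sum of (MT - 5) over months with MT above 5 degrees C."""
    return sum(t - 5.0 for t in monthly_mean_temps if t > 5.0)

def coldness_index(monthly_mean_temps):
    """CI: minus the sum of (5 - MT) over months with MT below 5 degrees C."""
    return -sum(5.0 - t for t in monthly_mean_temps if t < 5.0)

def seasonal_rainfall(monthly_rainfall, months):
    """Total rainfall over the given calendar months (1 = January)."""
    return sum(monthly_rainfall[m - 1] for m in months)

# Hypothetical monthly normals for one mesh cell, January to December.
temps = [4.0, 5.0, 8.0, 13.0, 18.0, 21.0, 25.0, 26.0, 22.0, 17.0, 12.0, 7.0]
rain = [60.0, 80.0, 120.0, 140.0, 160.0, 300.0,
        320.0, 200.0, 220.0, 100.0, 80.0, 50.0]

wi = warmth_index(temps)
ci = coldness_index(temps)
srf = seasonal_rainfall(rain, months=(6, 7, 8, 9))     # June to September
wrf = seasonal_rainfall(rain, months=(11, 12, 1, 2))   # November to February
print(wi, ci, srf, wrf)
```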
Although the use of explanatory variables that directly relate to the stand volume is very effective, it is extremely difficult to obtain such spatial data over wide areas. Therefore, we used the above-mentioned seven environmental factors in the stand volume estimation.

Stand Volume Estimation and Accuracy Assessment
The reflectance data of six spectral bands (referred to simply as spectral data) of each seasonal ETM+ dataset, FTYPE and the seven environmental factors were all stacked; the three stacked datasets are referred to as D0404, D0525 and D0712, respectively. Each dataset includes 14 variables (i.e., spectral data, FTYPE, Elev, Slope, SRI, WI, CI, SRF and WRF), and the value ranges differ among the explanatory variables; so, all of the variables, except FTYPE, were standardized to avoid scale effects [21,48].
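The standardization step, and the earlier remark that standardized DN and standardized reflectance are equivalent under a linear transformation, can be checked with a short Python sketch. The use of the population standard deviation and the linear coefficients below are assumptions made for illustration.

```python
import statistics

def standardize(values):
    """Z-scores: subtract the mean and divide by the standard deviation.

    Assumption: population standard deviation; the text does not say
    whether the sample or population form was used.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical DN values of one band at four plots.
dn = [30.0, 45.0, 60.0, 90.0]
# Any positive linear transformation of DN; coefficients are made up.
reflectance = [0.002 * v - 0.01 for v in dn]

z_dn = standardize(dn)
z_reflectance = standardize(reflectance)
print(z_dn)
print(z_reflectance)
```

Both printed lists agree up to floating-point rounding, so distances in the standardized feature space do not depend on which of the two is used.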
Using the k-NN technique, the stand volume can be estimated based on the simple assumption that the pixel to be estimated has a stand volume similar to the reference pixels. The stand volume for each pixel was calculated as the weighted mean stand volume of the reference pixels of the k-nearest samples in the feature space (explanatory variable space). The pixel weights, w_{i,p}, and estimates, ŷp, were calculated using the following equations:

$$w_{i,p} = \frac{1 / d_{p_i,p}^{\,t}}{\sum_{j=1}^{k} 1 / d_{p_j,p}^{\,t}} \qquad (5)$$

$$\hat{y}_{p} = \sum_{i=1}^{k} w_{i,p} \, y_{p_i} \qquad (6)$$

where i is an arbitrary field plot, p is an arbitrary pixel, pj is the pixel corresponding to the field plot j, d is the distance metric defined in the feature space, k is the number of reference plots, y_{p_i} is the observed stand volume of the reference pixel and t ≥ 0 [14,49]. As the importance of each variable is different, some algorithms, such as k-MSN, PGNN and others, including the weighting of the distance metric [3,18], may be adequate when using different types of information. In addition, although fixed values of t = 0, 1 and 2 are usually used in the k-NN technique, the optimum value of t increases as k increases [49]. However, given its interpretive advantages [21] and the ease of comparing its results with those of previous studies [7,10,25,40,42,50-52], we used a simple k-NN technique with a fixed value of t = 2 in order to investigate the effectiveness of additional feature variables in this study. First, we created prediction models without FTYPE and then created prediction models using FTYPE as a dummy variable. Hereafter, we define the former estimation as Analysis 1 and the latter as Analysis 2.
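The weighted k-NN estimate described above can be sketched in Python as follows. Two points are assumptions rather than details taken from the text: the distance metric is taken to be Euclidean, and a target that coincides exactly with one or more reference plots (zero distance) returns the mean volume of those plots to avoid division by zero. All names and data are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance in the (standardized) feature space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_estimate(target, ref_features, ref_volumes, k, t=2.0):
    """Inverse-distance-weighted mean volume of the k nearest reference plots."""
    nearest = sorted(
        (euclidean(target, f), v) for f, v in zip(ref_features, ref_volumes)
    )[:k]
    # Assumed handling of exact matches: average the zero-distance plots.
    exact = [v for d, v in nearest if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / d ** t for d, _ in nearest]
    total = sum(weights)
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / total

# Hypothetical reference plots: two standardized features, volume in m3/ha.
features = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0], [10.0, 10.0]]
volumes = [100.0, 200.0, 400.0, 900.0]

print(knn_estimate([1.0, 0.0], features, volumes, k=3, t=2.0))
print(knn_estimate([1.0, 0.0], features, volumes, k=3, t=0.0))
```

With t = 0 all k neighbors receive equal weight, so the estimate reduces to their plain mean; larger t concentrates the weight on the closest plots.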
The selection of explanatory variables is a key procedure to remove uninformative or noisy variables for accurate stand volume estimation. We tested all possible variable combinations for the estimation of stand volume to find the best k-NN estimator (i.e., the optimum combination model). In this study, we examined the effectiveness of adding environmental factors to spectral data; therefore, the basic models (spectral data alone for Analysis 1, spectral data and FTYPE for Analysis 2) were fixed, and the variable selection was conducted only for the seven environmental factors. For each variable combination of each dataset and analysis, we selected as the optimum value of k the smallest value of k for which the RMSE is not more than 1% greater than the smallest value of RMSE [49,53].
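The exhaustive search over the seven environmental factors and the 1% rule for choosing k can be sketched in Python as follows. The function names are hypothetical, and the RMSE values are made-up numbers used only to show how the rule picks k.

```python
from itertools import combinations

ENV_FACTORS = ["Elev", "Slope", "SRI", "WI", "CI", "SRF", "WRF"]

def candidate_models(base_variables, optional_variables):
    """Yield the fixed basic model plus every subset of optional variables."""
    for size in range(len(optional_variables) + 1):
        for subset in combinations(optional_variables, size):
            yield list(base_variables) + list(subset)

def optimum_k(rmse_by_k, tolerance=0.01):
    """Smallest k whose RMSE is within `tolerance` of the minimum RMSE."""
    best = min(rmse_by_k.values())
    return min(k for k, rmse in rmse_by_k.items()
               if rmse <= best * (1.0 + tolerance))

# Analysis 1 keeps the spectral data fixed; Analysis 2 also fixes FTYPE.
models = list(candidate_models(["spectral"], ENV_FACTORS))
print(len(models))  # 2**7 subsets, including the basic model itself

# Hypothetical cross-validated RMSEs (m3/ha) for one model.
rmse_by_k = {5: 110.0, 10: 100.9, 15: 100.5, 20: 100.0, 25: 100.2}
print(optimum_k(rmse_by_k))
```

Preferring the smallest k within 1% of the minimum keeps the estimator as local as possible when the RMSE curve is nearly flat.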
We used the RMSE for accuracy assessment. To facilitate a comparison with previous studies (e.g., [7,10,25,40,42,50-52]), we also used relative RMSE (rRMSE, %). The RMSE and rRMSE were calculated using a leave-one-out cross-validation procedure with the following equations:

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} \left(y_{i} - \hat{y}_{i}\right)^{2}}{n}} \qquad (7)$$

$$rRMSE = \frac{RMSE}{\bar{y}} \times 100 \qquad (8)$$

where yi is the observed volume of the validation data i, ŷi is the estimated volume of i, n is the number of validation data and ȳ is the mean of the observed stand volume. In accordance with McRoberts [49,54], another accuracy assessment was also performed graphically: the estimated values were arranged in ascending order and binned into groups of n = 30 samples, and the relationship between the binned observed and binned estimated stand volumes was examined.
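The accuracy assessment can be sketched in Python as follows: leave-one-out cross-validation, RMSE as the square root of the mean squared difference between observed and estimated volumes, rRMSE as RMSE divided by the mean observed volume times 100, and binning of pairs ordered by the estimate. The predictor passed to `leave_one_out` is a placeholder (the mean of the remaining plots) standing in for the k-NN estimator; names and data are hypothetical.

```python
import math

def leave_one_out(features, volumes, predict):
    """Predict each plot from all other plots using the given predictor."""
    estimates = []
    for i in range(len(features)):
        rest_f = features[:i] + features[i + 1:]
        rest_v = volumes[:i] + volumes[i + 1:]
        estimates.append(predict(features[i], rest_f, rest_v))
    return estimates

def rmse(observed, estimated):
    """Root mean square error of the estimates."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)

def rrmse(observed, estimated):
    """Relative RMSE in percent of the mean observed value."""
    return 100.0 * rmse(observed, estimated) / (sum(observed) / len(observed))

def binned_means(observed, estimated, group_size=30):
    """Mean (estimate, observation) per bin, with pairs ordered by estimate."""
    pairs = sorted(zip(estimated, observed))
    bins = []
    for start in range(0, len(pairs), group_size):
        chunk = pairs[start:start + group_size]
        bins.append((sum(e for e, _ in chunk) / len(chunk),
                     sum(o for _, o in chunk) / len(chunk)))
    return bins

def mean_of_rest(_target, _rest_features, rest_volumes):
    """Placeholder predictor for the demo; a k-NN estimator would go here."""
    return sum(rest_volumes) / len(rest_volumes)

# Hypothetical plots: one feature each, volume in m3/ha.
feats = [[0.1], [0.5], [0.9]]
vols = [100.0, 200.0, 300.0]
est = leave_one_out(feats, vols, mean_of_rest)
print(est, rmse(vols, est), rrmse(vols, est))
print(binned_means(vols, est, group_size=2))
```

If the binned points lie close to the 1:1 line, the estimator shows no systematic bias even when individual pixel-level errors are large.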

Results
Figure 2 shows the relationships between the RMSEs of the stand volume estimates and the number of explanatory variables used in Analysis 1. The results indicated that the addition of explanatory variables did not always contribute to error reduction. The numbers of additional feature variables used in the optimum combination models were one or three, suggesting that relatively simple models performed well for stand volume estimation in our study area. The patterns of the relationships between the RMSEs of the stand volume estimates and the number of explanatory variables in Analysis 2 were the same as those in Analysis 1 (results not shown).

When FTYPE was added to the spectral data, the RMSE and the rRMSE of all datasets decreased (Table 3). The rRMSE of BF decreased from 73.9%-75.0% to 60.6%-62.7%, and the rRMSE of ECF decreased from 55.2%-56.7% to 53.2%-55.0%. The addition of FTYPE had a greater positive effect on accuracy improvement for BF.

The RMSEs of the estimates decreased with the addition of some variables, but increased with the addition of others (Table 4). For example, the addition of most environmental factors to the spectral data improved the accuracy for D0404 and D0712, whereas the addition of most environmental factors reduced the accuracy for D0525. Adding the environmental factors one at a time to the basic models revealed that SRF had a consistent positive effect on accuracy improvement. In Analysis 1, the addition of SRF reduced the RMSE by 2.5, 2.0 and 1.5 m³/ha for D0404, D0525 and D0712, respectively. Similarly, in Analysis 2, the addition of SRF reduced the RMSE by 2.3, 0.7 and 1.6 m³/ha for D0404, D0525 and D0712, respectively. Although there were some exceptions (especially for D0525), WI and CI had positive effects on accuracy improvement. In contrast, Slope and SRI negatively affected the stand volume estimation in most cases.

Table 4 notes: Values with an underline indicate cases where the RMSE was reduced compared with the basic models. Explanatory variables of the optimum combination models are: (a) spectral data and SRF; (b) spectral data and SRF; (c) spectral data, Elev, WI and SRF; (d) spectral data, FTYPE (forest type information), CI and SRF; (e) spectral data, FTYPE and SRF; (f) spectral data, FTYPE, CI and SRF.
Figure 3 shows the estimated stand volume from the optimum combination model of D0525 in Analysis 2. The stand volume was overestimated for smaller volumes and underestimated for larger volumes (Figure 3a). However, the graph of the binned observations versus the binned estimates revealed that there was no bias in the results (Figure 3b). Similarly, all of the other models, including the basic models of Analysis 1, showed no biases in the results (results not shown).

Finally, we created a spatial distribution map of stand volume in the study area. Figure 4 shows a map of stand volume estimated by the optimum combination model for D0525 from Analysis 1. This map clearly shows the differences in stand volume among municipal units, as well as the differences in stand volume within a municipal unit. The map shows that there are many "hot spots" of high stand volume values (e.g., stand volume ≥ 400 m³/ha; displayed in orange and red) on Kyushu Island, and the northern region of the study area shows relatively low stand volume.

Discussion
When we used the basic models, the rRMSEs of the estimates ranged from 62.0% to 63.2% in Analysis 1 (spectral data alone) and from 58.2% to 59.9% in Analysis 2 (spectral data and FTYPE) (Table 3). Although the rRMSEs of our results at the pixel level were larger than those reported by some previous studies, they were moderate within the range reported previously, e.g., 44.2% (China [7]), 47.6% (Finland [50]), 58%-80% (Sweden [42]), 59.0% (Sweden [52]), 66.2% (Japan [25]), 66.6% (Sweden [40]), 79.3% (Finland [51]) and 91% (Norway [10]). Accordingly, the stand volume in Japan could be estimated by the k-NN technique with accuracies similar to earlier studies in terms of the rRMSE. There are many reasons for the large RMSEs of pixel-level results, including outliers due to the spatial mismatch between the satellite image pixels and the field plots, within-scene variation of the atmospheric effect, and so on [3,49]. In particular, the high value and wide range of stand volume of the ECF class of Japanese forests is one of the main reasons for large RMSEs [25], because the pattern of small-value overestimation and high-value underestimation (Figure 3a) is typical for the k-NN technique [21,25,40,42]. Moreover, if we had not removed the FRMS plot data falling near the forest/non-forest boundary in this study, the RMSEs of the estimates might have been much larger.
When we used the combination models in addition to the basic models, the RMSEs of the estimates decreased for some variable combinations and increased for others (Figure 2 and Table 4). In other words, the positive effects of the environmental factors were not always as expected. As the inclusion of unrelated variables in the distance calculation would be detrimental to accuracy [49,53], careful variable selection should be performed to avoid negative effects on estimation accuracy when using environmental factors as explanatory variables in the k-NN technique.
Compared with the six other environmental factors, SRF had a consistent positive effect on error reduction (Table 4). This consistency is important for applying this approach to operational tasks, such as the creation of nationwide stand volume maps. The combination of spectral data with SRF may be useful to estimate stand volume more accurately than in the case of spectral data alone. Compared with topographic-environmental factors, such as Slope and SRI, climatic-environmental factors, such as SRF, have a stronger spatial autocorrelation. Therefore, when adding SRF to the basic model, the nearest neighbors are selected from geographically closer plots. A previous study in Finland reported that the consideration of the geographic distance of nearest neighbor plots was useful for stand volume estimation, because of the gradual changes in vegetation structure in satellite images [17]. Therefore, we assumed that the plots geographically close to the target pixel tend to be selected as nearest neighbors using SRF, and this may lead to improvements in the accuracy of stand volume estimations; however, this approach is different from that used by Katila and Tomppo [17].
Meanwhile, the positive effects of the addition of FTYPE were obvious in cases of both the spectral data alone (RMSE reduced by 8.7-10.2 m³/ha) and the optimum combination model (RMSE reduced by 8.9-10.7 m³/ha) (Table 4). The accuracy improvement was large for the BF class, because the predictive ability of the models derived from the spectral data alone was insufficient (rRMSE ranging from 73.9% to 75.0%, Table 3). When adopting the k-NN technique in Japan and other countries with both coniferous and broad-leaved forests similar to our study area, the use of FTYPE would be useful for mapping stand volume more accurately.
As shown in Figure 4, there are clear differences in stand volume estimates among municipal units. Although the stand volume estimates made in this study showed overestimations for smaller volumes and underestimations for larger volumes, the binned assessment showed that there was no bias in the estimations made with the k-NN technique (Figure 3), as there was in some previous studies [49,54]. The accuracy of estimates is improved when the aggregated pixels are evaluated [3,21]. Therefore, although the pixel-level estimates made using the k-NN technique presented here have insufficient accuracy for forestry operations, they are useful for assessing the total or average stand volume at regional and national scales for strategic forest planning.
In this study, we used FTYPE, which was determined by the field plot data itself; however, in order to estimate stand volume accurately for unknown pixels, it is necessary to use additional digital maps concerning the forest type for the whole area being analyzed.Therefore, to efficiently use the method presented in this study, an accurate forest type map must be prepared, either manually or via an automatic image classification of remotely-sensed data.
The mapping of forest type is an ongoing project [55]. Additional investigations should be conducted on mapping stand volumes and their uncertainties when using wall-to-wall digital forest type maps in accordance with some previous studies [21,22]. These analyses and map products are expected to improve forest statistics at the regional and national scale and to support sustainable forest management in Japan.

Conclusions
To evaluate the effectiveness of adding feature variables to satellite image data on the accuracy of stand volume estimates using the k-NN technique, the estimates from three Landsat ETM+ datasets acquired in different seasons with various combinations of additional feature variables were compared.
The results showed that the addition of environmental factors to satellite image data did not always help to improve estimation accuracy. To avoid negative effects on the accuracy of stand volume estimates, careful variable selection should be performed. Among the environmental factors tested in this study, summer rainfall data had a consistent positive effect on accuracy improvement. Therefore, summer rainfall data may be a useful feature variable in stand volume estimations in this study area. The use of forest type information improved the estimation accuracy, particularly for broad-leaved forests. When adopting the k-NN technique in Japan and other countries with both coniferous and broad-leaved forests similar to our study area, the use of forest type information would be useful for mapping stand volume accurately. The binned assessment of stand volume estimates showed that there were no biases in the estimations made with the k-NN technique. Thus, such a map of estimated stand volume would be useful for assessing the total or average stand volume at regional and national scales for strategic forest planning.
All of these results indicated that the k-NN technique combined with appropriate feature variables is applicable to nationwide stand volume estimation in Japan. To confirm our results and to accurately estimate the stand volume for unknown pixels, additional investigations should be conducted on mapping stand volumes and their uncertainties when using wall-to-wall digital forest type maps. In the same manner, the effectiveness of summer rainfall data should be tested in other regions where the climatic characteristics differ from those of our study area.

Figure 1. Location of the study area.

Figure 2. Relationships between RMSEs of stand volume estimates and the number of explanatory variables for Analysis 1: (a) D0404; (b) D0525; (c) D0712. The RMSEs for six variables indicate the results from the basic models (spectral data alone), and the RMSEs for 13 variables indicate the results from all spectral data and environmental factors; the horizontal lines show the RMSEs of the estimates using basic models.

Figure 3. Scatter plot of the observed stand volume versus the estimated stand volume using the optimum combination model of D0525 in Analysis 2. (a) Pixel-level observed stand volume versus estimated stand volume. (b) Binned pixel-level observed stand volume versus binned pixel-level estimated stand volume (group size = 30). The explanatory variables of the optimum combination model of D0525 were spectral data, FTYPE and SRF. The optimum value of k = 20 (the smallest value of k for which the RMSE was not more than 1% greater than the smallest RMSE value) was used.

Figure 4. Map of the estimated stand volume. Estimates made by the optimum combination model of D0525 from Analysis 1 are shown as an example. The explanatory variables of the model were spectral data and SRF. The optimum value of k = 21 was used. The non-forest mask was generated from GIS data ("National Land numerical information (Forest Region Version 3.1), Ministry of Land, Infrastructure, Transport and Tourism").

Table 1. Information of the original ETM+ data used in this study.

Table 2. Summary of the plot stand volume data used in the present study following the removal of potentially erroneous data. ECF, evergreen coniferous forest; BF, broad-leaved forest.

Table 3. Summary of RMSE and rRMSE for the stand volume estimates of the basic models for each dataset.