Maximizing the Diversity of Ensemble Random Forests for Tree Genera Classification Using High Density LiDAR Data
Abstract
Recent research into improving the effectiveness of forest inventory management using airborne LiDAR data has focused on developing advanced theories in data analytics. Furthermore, supervised learning as a predictive model for classifying tree genera (and species, where possible) has been gaining popularity as a way to reduce the effort of this labor-intensive task. However, bottlenecks remain that hinder the immediate adoption of supervised learning methods. With supervised classification, training samples are required for learning the parameters that govern the performance of a classifier, yet the selection of training data is often subjective, and the quality of such samples is critically important. For LiDAR scanning in forest environments, the quantification of data quality is somewhat abstract, normally referring to some metric related to the completeness of individual tree crowns; however, this issue has received little attention in the literature. Intuitively, the choice of training samples of varying quality will affect classification accuracy. In this paper, a Diversity Index (DI) is proposed that characterizes the diversity of data quality (Qi) among the training samples selected for constructing a classification model of tree genera. The training sample is diversified in terms of data quality rather than the number of samples per class. The diversified training sample allows the classifier to better learn the positive and negative instances and therefore achieves higher accuracy in discriminating “unknown” class samples from “known” samples. Our algorithm is implemented within Random Forests base classifiers using six geometric features derived from LiDAR data. The training sample contains three tree genera (pine, poplar, and maple), and the validation samples contain four labels (pine, poplar, maple, and “unknown”).
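The abstract does not give the formula for the Diversity Index, so the following is only a hedged illustration of the idea of diversifying a training sample by data quality rather than by class counts. It assumes each candidate tree carries a quality score Qi in [0, 1] (for example, some crown-completeness measure) and uses the normalized Shannon entropy of binned Qi values as a stand-in diversity measure. The names `diversity_index` and `select_diverse`, the entropy-based definition, the bin count, and the greedy selection are all hypothetical and are not the authors' method.

```python
import math
from collections import Counter

def diversity_index(qualities, n_bins=5):
    """Hypothetical DI: normalized Shannon entropy of quality scores binned on [0, 1].

    Returns 0.0 when all scores fall into one bin and 1.0 when they are spread
    evenly across all bins. This is an assumed stand-in, not the paper's DI.
    """
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    if not qualities:
        return 0.0
    # Map each score to a bin index, clamping so that 1.0 lands in the top bin.
    bins = [min(max(int(q * n_bins), 0), n_bins - 1) for q in qualities]
    total = len(bins)
    entropy = -sum((c / total) * math.log(c / total) for c in Counter(bins).values())
    return entropy / math.log(n_bins)

def select_diverse(candidates, k, n_bins=5):
    """Greedily pick k (sample_id, quality) pairs that maximize the hypothetical DI."""
    pool = list(candidates)
    chosen = []
    while pool and len(chosen) < k:
        # Add whichever remaining candidate raises the DI of the chosen set the most.
        best = max(pool, key=lambda c: diversity_index([q for _, q in chosen] + [c[1]], n_bins))
        chosen.append(best)
        pool.remove(best)
    return chosen
```

For example, `select_diverse([("a", 0.91), ("b", 0.93), ("c", 0.15), ("d", 0.55)], k=3)` skips "b", whose quality is nearly identical to "a", in favor of trees spanning low, medium, and high quality. In practice such a selection would presumably be run per genus so that each class is represented across the quality range.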
Classification accuracy improved from 72.8%, when training samples were selected randomly (with stratified sample size), to 93.8%, when samples were selected with additional criteria, and from 88.4% to 93.8% when an ensemble method was used.
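The abstract does not say how the ensemble combines its Random Forests or how it assigns the “unknown” label to trees of genera absent from training. One common mechanism, assumed here purely for illustration, is majority voting with a rejection threshold: each base classifier votes for a genus, and if the winning genus does not receive enough of the votes, the tree is labeled “unknown”. The function name `ensemble_predict` and the `min_agreement` threshold are hypothetical.

```python
from collections import Counter

def ensemble_predict(base_predictions, min_agreement=0.6, unknown_label="unknown"):
    """Combine per-classifier labels for one tree by majority vote with rejection.

    base_predictions: labels from each base classifier (for instance, Random
    Forests trained on differently diversified samples). If the winning label's
    vote share is below min_agreement (an assumed threshold), return unknown_label.
    """
    if not base_predictions:
        return unknown_label
    label, votes = Counter(base_predictions).most_common(1)[0]
    if votes / len(base_predictions) < min_agreement:
        return unknown_label
    return label
```

With five base classifiers voting `["pine", "pine", "pine", "maple", "poplar"]`, pine wins with a 0.6 share and is accepted; a split such as two pine, two maple, and one poplar falls below the threshold and yields “unknown”. Raising `min_agreement` rejects more trees, trading accuracy on known genera for better detection of unknown ones.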
Cite This Article
Ko, C.; Sohn, G.; Remmel, T.K.; Miller, J.R. Maximizing the Diversity of Ensemble Random Forests for Tree Genera Classification Using High Density LiDAR Data. Remote Sens. 2016, 8, 646.
Ko C, Sohn G, Remmel TK, Miller JR. Maximizing the Diversity of Ensemble Random Forests for Tree Genera Classification Using High Density LiDAR Data. Remote Sensing. 2016; 8(8):646.
Ko, Connie; Sohn, Gunho; Remmel, Tarmo K.; Miller, John R. 2016. "Maximizing the Diversity of Ensemble Random Forests for Tree Genera Classification Using High Density LiDAR Data." Remote Sens. 8, no. 8: 646.
Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.