Open Access Article
Remote Sens. 2015, 7(7), 8489-8515; doi:10.3390/rs70708489

On the Importance of Training Data Sample Selection in Random Forest Image Classification: A Case Study in Peatland Ecosystem Mapping

Department of Geography and Environmental Studies, Carleton University, Ottawa, ON K1S 5B6, Canada
* Author to whom correspondence should be addressed.
Academic Editors: Alisa L. Gallant and Prasad S. Thenkabail
Received: 31 March 2015 / Revised: 15 June 2015 / Accepted: 23 June 2015 / Published: 6 July 2015
(This article belongs to the Special Issue Towards Remote Long-Term Monitoring of Wetland Landscapes)

Abstract

Random Forest (RF) is a widely used algorithm for the classification of remotely sensed data. Through a case study in peatland classification using LiDAR derivatives, we analyze the effects of input data characteristics on RF classifications, including RF out-of-bag (OOB) error, independent classification accuracy, and class proportion error. Training data selection and the specific input variables (i.e., image channels) used have a large impact on the overall accuracy of the image classification. High-dimensional datasets should be reduced so that only important, uncorrelated variables are used in classifications. Although RF is an ensemble approach, independent error assessments should be used to evaluate RF results, and iterative classifications are recommended to assess the stability of predicted classes. Results are also shown to be highly sensitive to the size of the training data set. In addition to being as large as possible, the training data sets used in RF classification should (a) be randomly distributed, or created in a manner that makes the class proportions of the training data representative of actual class proportions in the landscape; and (b) have minimal spatial autocorrelation, to improve classification results and to mitigate inflated estimates of RF out-of-bag classification accuracy.
Keywords: Random Forest; classification; training data sample selection; peatland; wetland; LiDAR
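The training data recommendations above can be illustrated with a minimal Python sketch. It is not the authors' method: the function names (`sample_training_points`, `class_proportion_error`, `prediction_stability`), the greedy minimum-distance rule used here to limit spatial autocorrelation, and the definition of class proportion error as mapped minus reference proportion are all assumptions made for illustration. Only the standard library is used, and the Random Forest itself is left out; the sketch covers proportional sample allocation, sample spacing, and two simple diagnostics.

```python
import math
import random
from collections import Counter


def class_proportions(labels):
    """Return the fraction of items (pixels or samples) in each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def sample_training_points(cells, n_total, min_dist, seed=0):
    """Draw training points whose class mix mirrors the landscape (sketch).

    cells    -- (x, y, label) for every labelled candidate pixel (assumed input)
    n_total  -- desired number of training points; per-class quotas are
                rounded, so the total can differ slightly from n_total
    min_dist -- hypothetical minimum spacing in pixel units, used here as a
                simple way to limit spatial autocorrelation between samples
    A class may receive fewer points than its quota if min_dist is too large.
    """
    rng = random.Random(seed)
    props = class_proportions(label for _, _, label in cells)
    by_class = {}
    for x, y, label in cells:
        by_class.setdefault(label, []).append((x, y))

    chosen = []
    for label in sorted(by_class):
        quota = round(n_total * props[label])  # proportional allocation
        pool = by_class[label][:]
        rng.shuffle(pool)  # random order within the class
        taken = 0
        for x, y in pool:
            if taken >= quota:
                break
            # Greedy rejection: skip any pixel closer than min_dist to an
            # already accepted point of any class.
            if all(math.hypot(x - cx, y - cy) >= min_dist for cx, cy, _ in chosen):
                chosen.append((x, y, label))
                taken += 1
    return chosen


def class_proportion_error(predicted, reference):
    """Mapped minus reference proportion per class (assumed definition)."""
    p = class_proportions(predicted)
    r = class_proportions(reference)
    return {c: p.get(c, 0.0) - r.get(c, 0.0) for c in set(p) | set(r)}


def prediction_stability(runs):
    """Per-pixel share of repeated classifications agreeing with the modal class.

    runs -- equal-length label sequences, one per classification run
    """
    n_runs = len(runs)
    return [Counter(pixel).most_common(1)[0][1] / n_runs for pixel in zip(*runs)]


if __name__ == "__main__":
    # Synthetic 50 x 50 landscape: 70% bog (x < 35), 30% fen.
    cells = [(x, y, "bog" if x < 35 else "fen") for x in range(50) for y in range(50)]
    training = sample_training_points(cells, n_total=100, min_dist=2, seed=1)
    print(Counter(label for _, _, label in training))
```

With a real classifier, the OOB estimate could be read from, for example, scikit-learn's `RandomForestClassifier(oob_score=True)` through its `oob_score_` attribute and compared against accuracy on an independent test set, while `prediction_stability` would take the label maps produced by several runs with different random seeds.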
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. (CC BY 4.0).

Cite This Article

MDPI and ACS Style

Millard, K.; Richardson, M. On the Importance of Training Data Sample Selection in Random Forest Image Classification: A Case Study in Peatland Ecosystem Mapping. Remote Sens. 2015, 7, 8489-8515.

Remote Sens. EISSN 2072-4292. Published by MDPI AG, Basel, Switzerland.