Open Access Article

Effects of Training Set Size on Supervised Machine-Learning Land-Cover Classification of Large-Area High-Resolution Remotely Sensed Data

1 Department of Management Information Systems, West Virginia University, Morgantown, WV 26506, USA
2 Department of Geology and Geography, West Virginia University, Morgantown, WV 26506, USA
* Author to whom correspondence should be addressed.
Academic Editor: Kacem Chehdi
Remote Sens. 2021, 13(3), 368; https://doi.org/10.3390/rs13030368
Received: 28 December 2020 / Revised: 16 January 2021 / Accepted: 18 January 2021 / Published: 21 January 2021
(This article belongs to the Special Issue Remote Sensing Data and Classification Algorithms)
The size of the training data set is a major determinant of classification accuracy. Nevertheless, the collection of a large training data set for supervised classifiers can be a challenge, especially for studies covering a large area, which may be typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample size (n = 10,000) to a very small sample size (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area high-spatial-resolution (HR) (1–5 m) remotely sensed data within the context of a geographic object-based image analysis (GEOBIA) approach. GEOBIA, in which adjacent similar pixels are grouped into image-objects that form the unit of the classification, offers the potential benefit of incorporating multiple additional variables, such as measures of object geometry and texture, thereby increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy, only 1.0%, when the training sample size decreased from 10,000 to 315 samples. GBM provided overall accuracy similar to RF; however, the algorithm was very expensive in terms of training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU and SVM were particularly sensitive to decreasing sample size, with NEU classifications generally producing overall accuracies that were on average slightly higher than SVM classifications for larger sample sizes, but lower than SVM for the smallest sample sizes. NEU, however, required a longer processing time. The k-NN classifier saw less of a drop in overall accuracy than NEU and SVM as training set size decreased; however, the overall accuracies of k-NN were typically lower than those of the RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of all six methods, but was relatively insensitive to sample size, down to the smallest sample sizes. Overall, due to its relatively high accuracy with small training sample sets, minimal variation in overall accuracy between very large and small sample sets, and relatively short processing time, RF was a good classifier for large-area land-cover classifications of HR remotely sensed data, especially when training data are scarce. However, as the performance of different supervised classifiers varies in response to training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a project.
Keywords: training sample size; supervised machine learning; high-resolution imagery; large area; GEOBIA
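The comparison described in the abstract, in which progressively smaller training samples are drawn and each classifier is scored on a common test set, can be sketched roughly as follows. This is a minimal illustration rather than the authors' pipeline: synthetic features generated with scikit-learn stand in for the GEOBIA image-object variables (spectral, geometric, and textural measures), only two of the six algorithms are shown, and the classifier settings are illustrative assumptions, not the parameters used in the study.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for image-object features and land-cover labels
# (the study itself used spectral, geometric, and textural object measures).
X, y = make_classification(n_samples=15000, n_features=30, n_informative=15,
                           n_classes=5, random_state=0)

# Hold out a fixed test set so every training size is scored identically.
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=4000, stratify=y, random_state=0)

classifiers = {
    "RF": RandomForestClassifier(n_estimators=500, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale")),
}

# Training sizes spanning roughly the range examined in the paper.
for n_train in (10000, 1000, 315, 40):
    X_tr, _, y_tr, _ = train_test_split(
        X_pool, y_pool, train_size=n_train, stratify=y_pool, random_state=0)
    for name, clf in classifiers.items():
        clf.fit(X_tr, y_tr)
        oa = accuracy_score(y_test, clf.predict(X_test))
        print(f"n = {n_train:>5}  {name:<3}  overall accuracy = {oa:.3f}")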
MDPI and ACS Style

Ramezan, C.A.; Warner, T.A.; Maxwell, A.E.; Price, B.S. Effects of Training Set Size on Supervised Machine-Learning Land-Cover Classification of Large-Area High-Resolution Remotely Sensed Data. Remote Sens. 2021, 13, 368. https://doi.org/10.3390/rs13030368

AMA Style

Ramezan CA, Warner TA, Maxwell AE, Price BS. Effects of Training Set Size on Supervised Machine-Learning Land-Cover Classification of Large-Area High-Resolution Remotely Sensed Data. Remote Sensing. 2021; 13(3):368. https://doi.org/10.3390/rs13030368

Chicago/Turabian Style

Ramezan, Christopher A., Timothy A. Warner, Aaron E. Maxwell, and Bradley S. Price. 2021. "Effects of Training Set Size on Supervised Machine-Learning Land-Cover Classification of Large-Area High-Resolution Remotely Sensed Data" Remote Sensing 13, no. 3: 368. https://doi.org/10.3390/rs13030368

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
