Open Access Article
Remote Sens. 2016, 8(11), 943; doi:10.3390/rs8110943

An Optimal Sample Data Usage Strategy to Minimize Overfitting and Underfitting Effects in Regression Tree Models Based on Remotely-Sensed Data

1
ASRC InuTeq, Contractor to US Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center, 47914 252nd Street, Sioux Falls, SD 57198, USA
2
US Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center, 47914 252nd Street, Sioux Falls, SD 57198, USA
3
Stinger Ghaffarian Technologies (SGT), Contractor to US Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center, 47914 252nd Street, Sioux Falls, SD 57198, USA
*
Author to whom correspondence should be addressed.
Academic Editors: Dongdong Wang and Prasad S. Thenkabail
Received: 11 August 2016 / Revised: 13 October 2016 / Accepted: 7 November 2016 / Published: 11 November 2016

Abstract

Regression tree models have been widely used for remote sensing-based ecosystem mapping. Improper use of the sample data (model training and testing data) may cause overfitting or underfitting in the model. The goal of this study is to develop an optimal sample data usage strategy that can be applied to any dataset and to identify an appropriate number of rules for the regression tree model, improving its accuracy and robustness. Landsat 8 data and Moderate-Resolution Imaging Spectroradiometer (MODIS)-scaled Normalized Difference Vegetation Index (NDVI) were used to develop regression tree models. A Python procedure was designed to generate random replications of model parameter options across a range of model development data sizes and rule number constraints. The mean absolute difference (MAD) between the predicted and actual NDVI (scaled NDVI, values from 0 to 200) and its variability across the randomized replications were calculated to assess the accuracy and stability of the models. In our case study, a six-rule regression tree model developed from 80% of the sample data had the lowest MAD (MADtraining = 2.5 and MADtesting = 2.4) and was therefore suggested as the optimal model. This study demonstrates how the choice of training data size and rule number affects model accuracy and provides guidance for future remote sensing-based ecosystem modeling.
Keywords: remote sensing; data mining; regression tree mapping model; Cubist optimization; Python scripts; overfitting; underfitting; MODIS NDVI; Landsat
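
The replication procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' script and it does not use Cubist: the "model" here is a hypothetical stand-in that splits a single predictor into equal-count bins (one bin per "rule") and predicts the mean response in each bin. The function names, the synthetic data, and the single-predictor setup are all assumptions made for illustration. The sketch only shows the general pattern: randomly split the samples, fit with a given training fraction and rule count, and summarize the training and testing MAD and their variability across replications.

```python
import random
import statistics


def mean_absolute_difference(predicted, actual):
    """Mean absolute difference (MAD) between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def fit_binned_model(xs, ys, n_rules):
    """Hypothetical stand-in for a rule-based model (NOT Cubist).

    Sorts samples by x, cuts them into up to n_rules equal-count bins, and
    stores (upper x bound, mean y) for each non-empty bin.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    rules = []
    for i in range(n_rules):
        chunk = pairs[i * n // n_rules:(i + 1) * n // n_rules]
        if chunk:
            mean_y = statistics.fmean([y for _, y in chunk])
            rules.append((chunk[-1][0], mean_y))
    return rules


def predict(rules, x):
    """Return the mean of the first bin whose upper bound covers x."""
    for upper, mean_y in rules:
        if x <= upper:
            return mean_y
    return rules[-1][1]  # x lies above every bin: use the last bin


def evaluate(xs, ys, train_fraction, n_rules, n_replications, seed=0):
    """Repeat random train/test splits and summarize MAD.

    Requires n_replications >= 2 so a standard deviation can be computed.
    """
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    train_mads, test_mads = [], []
    for _ in range(n_replications):
        rng.shuffle(indices)
        cut = int(train_fraction * len(indices))
        train, test = indices[:cut], indices[cut:]
        rules = fit_binned_model(
            [xs[i] for i in train], [ys[i] for i in train], n_rules
        )
        train_mads.append(mean_absolute_difference(
            [predict(rules, xs[i]) for i in train], [ys[i] for i in train]))
        test_mads.append(mean_absolute_difference(
            [predict(rules, xs[i]) for i in test], [ys[i] for i in test]))
    return {
        "train_mad": statistics.fmean(train_mads),
        "test_mad": statistics.fmean(test_mads),
        "test_mad_sd": statistics.stdev(test_mads),
    }


def make_synthetic_samples(n, seed=0):
    """Synthetic data for illustration only (not the study's NDVI samples)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [200.0 * x + rng.gauss(0.0, 5.0) for x in xs]
    return xs, ys
```

Calling `evaluate` over a grid of `train_fraction` and `n_rules` values and comparing the resulting training and testing MAD is one way to look for the pattern the abstract describes: too few rules leave both errors high (underfitting), while too many rules tend to push the training MAD well below the testing MAD (overfitting).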
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

MDPI and ACS Style

Gu, Y.; Wylie, B.K.; Boyte, S.P.; Picotte, J.; Howard, D.M.; Smith, K.; Nelson, K.J. An Optimal Sample Data Usage Strategy to Minimize Overfitting and Underfitting Effects in Regression Tree Models Based on Remotely-Sensed Data. Remote Sens. 2016, 8, 943.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Remote Sens. EISSN 2072-4292. Published by MDPI AG, Basel, Switzerland.