Open Access Article
Appl. Sci. 2017, 7(9), 888; doi:10.3390/app7090888

Impacts of Sample Design for Validation Data on the Accuracy of Feedforward Neural Network Classification

School of Geography, University of Nottingham, University Park, Nottingham NG7 2RD, UK
Received: 20 July 2017 / Revised: 11 August 2017 / Accepted: 21 August 2017 / Published: 30 August 2017
(This article belongs to the Special Issue Application of Artificial Neural Networks in Geoinformatics)

Abstract

Validation data are often used to evaluate the performance of a trained neural network and in the selection of a network deemed optimal for the task at hand. Optimality is commonly assessed with a measure such as overall classification accuracy, often calculated directly from a confusion matrix showing the counts of cases in the validation set with particular labelling properties. The sample design used to form the validation set can, however, influence the estimated magnitude of the accuracy. Commonly, the validation set is formed with a stratified sample to give balanced classes, but it may also be formed via random sampling, which reflects class abundance. It is suggested that if the ultimate aim is to accurately classify a dataset in which the classes vary in abundance, a validation set formed via random, rather than stratified, sampling is preferred. This is illustrated with the classification of simulated and remotely sensed datasets. With both datasets, statistically significant differences in the accuracy with which the data could be classified arose from the use of validation sets formed via random and stratified sampling (z = 2.7 and 1.9 for the simulated and real datasets, respectively; both p < 0.05). The accuracy of the classifications that used a stratified sample in validation was smaller, a result of cases of an abundant class being commissioned into a rarer class. Simple means to address the issue are suggested.
Keywords: cross-validation; multi-layer perceptron; remote sensing; classification error; sample design; machine learning
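The abstract's central point, that the sample design of the validation set influences the estimated magnitude of the accuracy, can be illustrated with a minimal sketch. This is not the paper's code: the class names, the 10% commission error rate, and the sample sizes below are illustrative assumptions. A fixed hypothetical classifier is evaluated on two validation sets drawn from the same imbalanced population, one by simple random sampling (reflecting class abundance) and one by stratified sampling (balanced classes).

```python
# Minimal sketch (illustrative assumptions, not the paper's code): how the
# validation-set sample design can shift the estimated overall accuracy of
# one fixed classifier evaluated on an imbalanced population.
import random

random.seed(42)

# Imbalanced population: class A is abundant, class B is rare.
population = [("A", i) for i in range(900)] + [("B", i) for i in range(100)]

def classify(true_label):
    # Hypothetical classifier: commissions 10% of abundant-class A cases
    # into the rarer class B, and labels class B cases correctly.
    if true_label == "A" and random.random() < 0.10:
        return "B"
    return true_label

def overall_accuracy(sample):
    # Overall accuracy = proportion of the confusion-matrix diagonal.
    correct = sum(1 for true_label, _ in sample if classify(true_label) == true_label)
    return correct / len(sample)

# (a) Simple random sample: class proportions follow abundance (~90% A).
random_sample = random.sample(population, 200)

# (b) Stratified sample: equal counts per class, over-representing rare B.
by_class = {"A": [], "B": []}
for case in population:
    by_class[case[0]].append(case)
stratified_sample = random.sample(by_class["A"], 100) + random.sample(by_class["B"], 100)

acc_random = overall_accuracy(random_sample)
acc_stratified = overall_accuracy(stratified_sample)
print(f"random-sample accuracy estimate:     {acc_random:.3f}")
print(f"stratified-sample accuracy estimate: {acc_stratified:.3f}")
```

Because the stratified set over-represents the rare class (which this hypothetical classifier labels perfectly) relative to its abundance, the two designs weight the per-class error rates differently and so generally yield different accuracy estimates for the same classifier.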

This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Share & Cite This Article

MDPI and ACS Style

Foody, G.M. Impacts of Sample Design for Validation Data on the Accuracy of Feedforward Neural Network Classification. Appl. Sci. 2017, 7, 888.



Appl. Sci. EISSN 2076-3417, published by MDPI AG, Basel, Switzerland