Open Access Article
Data 2017, 2(1), 8; doi:10.3390/data2010008

An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data

1 Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15260, USA
2 Medical Scientist Training Program, University of Pittsburgh, Pittsburgh, PA 15260, USA
3 Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260, USA
4 Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15260, USA
* Author to whom correspondence should be addressed.
Academic Editor: Henning Müller
Received: 24 September 2016 / Revised: 15 December 2016 / Accepted: 18 January 2017 / Published: 25 January 2017
(This article belongs to the Special Issue Biomedical Informatics)

Abstract

Many clinical research datasets have a large percentage of missing values, which directly limits their usefulness for training high-accuracy classifiers in supervised machine learning. While missing value imputation methods have been shown to work well when the percentage of missing values is small, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine whether increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. We then apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compare the performance of imputation-augmented models with that of unaugmented models. We found that all four imputation-augmented models performed similarly to the unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.
Keywords: missing value imputation; machine learning; decision tree imputation; k-nearest neighbors imputation; self-organizing map imputation
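
As a rough illustration of two of the four methods named in the abstract, the sketch below implements mean imputation and a simple k-nearest neighbors imputation with NumPy. It is not the authors' implementation: the function names `mean_impute` and `knn_impute`, the distance definition (root mean squared difference over features observed in both rows), the choice of `k`, and the toy matrix `X` are all assumptions made for this example, and the values in `X` are invented rather than taken from the cardiac imaging dataset.

```python
import numpy as np


def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    filled = X.copy()
    rows, cols = np.where(np.isnan(X))
    filled[rows, cols] = col_means[cols]
    return filled


def knn_impute(X, k=2):
    """Fill each NaN with the mean of that feature over the k nearest rows.

    Assumption for this sketch: the distance between two rows is the root
    mean squared difference over the features observed in both rows. A
    value stays NaN if no other row can serve as a neighbor.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    for i in np.where(missing.any(axis=1))[0]:
        for j in np.where(missing[i])[0]:
            candidates = []
            for r in range(X.shape[0]):
                # A neighbor must be a different row that observed feature j.
                if r == i or missing[r, j]:
                    continue
                shared = ~missing[i] & ~missing[r]
                if not shared.any():
                    continue
                dist = np.sqrt(np.mean((X[i, shared] - X[r, shared]) ** 2))
                candidates.append((dist, X[r, j]))
            if candidates:
                candidates.sort(key=lambda c: c[0])
                filled[i, j] = np.mean([value for _, value in candidates[:k]])
    return filled


# Invented toy data: four samples, three features, two missing entries.
X = np.array([
    [1.0, 2.0, 3.0],
    [1.1, np.nan, 3.1],
    [5.0, 6.0, 7.0],
    [1.2, 2.2, np.nan],
])

X_mean = mean_impute(X)
X_knn = knn_impute(X, k=2)
```

On this toy matrix, mean imputation pulls each missing entry toward the column average, which the outlying third row inflates, whereas the nearest-neighbor version draws only on the rows most similar to the incomplete one. Differences of this kind are one reason imputation quality can be problem specific.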
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


MDPI and ACS Style

Liu, Y.; Gopalakrishnan, V. An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data. Data 2017, 2, 8.

Data, EISSN 2306-5729. Published by MDPI AG, Basel, Switzerland.