Open Access Article
Data 2017, 2(1), 5; doi:10.3390/data2010005

Learning Parsimonious Classification Rules from Gene Expression Data Using Bayesian Networks with Local Structure

1 Red Bank Veterinary Hospital, 2051 Briggs Road, Mount Laurel, NJ 08054, USA
2 Intelligent Systems Program, University of Pittsburgh, 5113 Sennott Square, 210 South Bouquet Street, Pittsburgh, PA 15260, USA
3 Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206, USA
These authors contributed equally to this work.
* Author to whom correspondence should be addressed.
Academic Editor: Pufeng Du
Received: 30 September 2016 / Revised: 14 December 2016 / Accepted: 9 January 2017 / Published: 18 January 2017
(This article belongs to the Special Issue Biomedical Informatics)

Abstract

Comprehensible predictive models learned from high-dimensional gene expression data are attractive because they can lead to biomarker discovery. Several good classifiers provide comparable predictive performance but differ in their ability to summarize the observed data. We extend a Bayesian Rule Learning algorithm (BRL-GSS), previously shown to be a significantly better predictor than other classical approaches in this domain. It searches a space of Bayesian networks using a decision tree representation of its parameters with global constraints, and infers a set of IF-THEN rules. The number of parameters, and therefore the number of rules, is combinatorial in the number of predictor variables in the model. We relax these global constraints to learn a more expressive local structure with BRL-LSS. BRL-LSS yields a more parsimonious set of rules because it does not have to generate all combinatorial rules. The search space of local structures is much richer than the space of global structures. We design BRL-LSS to have the same worst-case time complexity as BRL-GSS while exploring a richer and more complex model space. We measure predictive performance using the area under the ROC curve (AUC) and accuracy. We measure model parsimony by the average number of rules and variables needed to describe the observed data. We evaluate the predictive and parsimony performance of BRL-GSS, BRL-LSS, and the state-of-the-art C4.5 decision tree algorithm using 10-fold cross-validation on ten microarray gene-expression diagnostic datasets. In these experiments, BRL-LSS is similar to BRL-GSS in predictive performance while generating a much more parsimonious set of rules to explain the same observed data. BRL-LSS also needs fewer variables than C4.5 to explain the data with similar predictive performance. We also conduct a feasibility study to demonstrate the general applicability of our BRL methods to the newer RNA sequencing gene-expression data.
Keywords: rule based models; gene expression data; Bayesian networks; parsimony
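
The contrast between global and local structure can be illustrated with a small sketch. This is not the BRL-GSS or BRL-LSS implementation; it only counts rules under two assumptions: a global structure needs one rule per joint value assignment of the predictors, while a local structure is a decision tree whose branches may stop early, giving one rule per leaf. The gene names, class labels, and the nested-dictionary tree format are hypothetical and chosen for illustration.

```python
from math import prod


def global_rule_count(cardinalities):
    """Rules under a global structure: one per joint value assignment."""
    return prod(cardinalities)


def tree_rules(node, path=()):
    """Collect one (conditions, label) rule per leaf of a decision tree.

    A leaf is a class label (str); an internal node is a dict with the
    hypothetical keys "var" and "children" (value -> subtree or leaf).
    """
    if not isinstance(node, dict):
        return [(path, node)]
    rules = []
    for value, child in node["children"].items():
        rules.extend(tree_rules(child, path + ((node["var"], value),)))
    return rules


def format_rule(conditions, label):
    """Render a rule in IF-THEN form."""
    antecedent = " AND ".join(f"{var}={val}" for var, val in conditions)
    return f"IF {antecedent} THEN Class={label}"


# Hypothetical local structure over three binary predictors.
# The GeneA=low branch stops early, so GeneB and GeneC are not tested there.
tree = {
    "var": "GeneA",
    "children": {
        "low": "control",
        "high": {
            "var": "GeneB",
            "children": {
                "low": "case",
                "high": {
                    "var": "GeneC",
                    "children": {"low": "control", "high": "case"},
                },
            },
        },
    },
}

local_rules = tree_rules(tree)
print("global rules:", global_rule_count([2, 2, 2]))
print("local rules:", len(local_rules))
for conditions, label in local_rules:
    print(format_rule(conditions, label))
```

With three binary predictors, the global count is 2 × 2 × 2 = 8 rules, while this particular tree has only 4 leaves and therefore 4 rules. The gap widens as predictors are added, which is the parsimony argument the abstract makes for local structure.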
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


MDPI and ACS Style

Lustgarten, J.L.; Balasubramanian, J.B.; Visweswaran, S.; Gopalakrishnan, V. Learning Parsimonious Classification Rules from Gene Expression Data Using Bayesian Networks with Local Structure. Data 2017, 2, 5.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Data, EISSN 2306-5729, published by MDPI AG, Basel, Switzerland