Biomedical Informatics

A special issue of Data (ISSN 2306-5729). This special issue belongs to the section "Computational Biology, Bioinformatics, and Biomedical Data Science".

Deadline for manuscript submissions: closed (30 September 2016) | Viewed by 31925

Special Issue Editor

Dr. Vanathi Gopalakrishnan
Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206-3701, USA
Interests: hybrid machine learning; multimodal intelligence; biomedical data science; biomarker discovery; rule learning; educational technologies; early detection; precision medicine; pattern recognition from multimodal signals; risk assessment for management

Special Issue Information

Dear Colleagues,

The volume of biomedical data, both structured and unstructured, has grown at an unprecedented rate over the past several years. There is an enormous need for methods to store, process, and interpret these data in order to yield new knowledge and insights. The lack of well-defined standards calls for novel ways of sorting out useful information, and thereby for a shift in conventional thinking from data acquisition to data analysis and interpretation. In this Special Issue, we aim to present work on the documentation and reuse of biomedical data, and to provide scientists and clinicians with approaches suited to specific applications.

We are looking to put together a comprehensive set of papers that highlight the availability of public datasets in bioinformatics, public health informatics, and clinical and translational research, including applications that use electronic health records, imaging data, biomedical literature, or ontologies. We also welcome new research papers that involve currently unavailable biomedical data that the authors are willing to make publicly available along with their article. Through this Special Issue, we hope to raise awareness in the scientific and informatics communities of the challenges in the acquisition, processing, storage, and analysis of biomedical datasets. Novel methods for data cleaning, privacy, structuring, semantics extraction, and analysis will also be considered and highlighted in this Special Issue.

Dr. Vanathi Gopalakrishnan
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Data is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Biomedical Data
  • Genomic, Proteomic, Metabolomic, Lipidomic
  • Clinical Research Informatics
  • Electronic health record processing
  • Quantitative and Qualitative Biomarkers
  • Imaging Data
  • Biosurveillance
  • Knowledge Base and Ontologies
  • Semantic Tagging
  • Machine Learning
  • Data Standardization
  • Biomedical Database
  • Drug Discovery
  • Health Informatics

Published Papers (5 papers)

Research

Article
Demonstration Study: A Protocol to Combine Online Tools and Databases for Identifying Potentially Repurposable Drugs
by Aditi Chattopadhyay and Madhavi K. Ganapathiraju
Data 2017, 2(2), 15; https://doi.org/10.3390/data2020015 - 04 May 2017
Cited by 3 | Viewed by 6316
Abstract
Traditional methods for the discovery and development of new drugs can be very time-consuming and expensive because they include several stages, such as compound identification and pre-clinical and clinical trials, before a drug is approved by the U.S. Food and Drug Administration (FDA). Therefore, drug repurposing, namely using currently FDA-approved drugs as therapeutics for diseases other than those for which they were originally prescribed, is emerging as a faster and more cost-effective alternative to current drug discovery methods. In this paper, we describe a three-step in silico protocol for analyzing transcriptomics data using online databases and bioinformatics tools to identify potentially repurposable drugs. The efficacy of this protocol was evaluated by comparing its predictions with the findings of two case studies of recently reported repurposed drugs: the HIV drug zidovudine for the treatment of dry age-related macular degeneration, and the antidepressant imipramine for small-cell lung carcinoma. The proposed protocol successfully identified the published findings, thus demonstrating the efficacy of this method. In addition, it yielded several novel predictions that have not yet been published, including the finding that imipramine could potentially treat Severe Acute Respiratory Syndrome (SARS), a disease that currently does not have any treatment or vaccine. Since this in silico protocol is simple to use and does not require advanced computer skills, we believe any motivated participant with access to these databases and tools would be able to apply it to large datasets to identify other potentially repurposable drugs in the future.
(This article belongs to the Special Issue Biomedical Informatics)

Article
An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data
by Yuzhe Liu and Vanathi Gopalakrishnan
Data 2017, 2(1), 8; https://doi.org/10.3390/data2010008 - 25 Jan 2017
Cited by 60 | Viewed by 8486
Abstract
Many clinical research datasets have a large percentage of missing values, which directly impacts their usefulness in yielding high-accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem-specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine whether increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.
(This article belongs to the Special Issue Biomedical Informatics)
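
Two of the four imputation methods named in the abstract above, mean imputation and k-nearest neighbors, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mean_impute` and `knn_impute`, the distance definition, and the toy data are assumptions made for illustration, and the sketch assumes every column has at least one observed value.

```python
import math

def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))  # assumes >= 1 observed value
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def knn_impute(rows, k=2):
    """Replace each None with the column mean over the k nearest donor rows.

    Distance (an illustrative choice) is the root mean squared difference
    over the columns that are observed in both rows.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    result = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, v in enumerate(row):
            if v is None:
                # Donors are other rows that have an observed value in column j.
                donors = [(distance(row, other), other[j])
                          for m, other in enumerate(rows)
                          if m != i and other[j] is not None]
                donors.sort(key=lambda d: d[0])
                nearest = [val for _, val in donors[:k]]
                new_row[j] = sum(nearest) / len(nearest)
        result.append(new_row)
    return result

# Hypothetical toy data: the second row is missing its second measurement.
data = [
    [1.0, 2.0],
    [2.0, None],
    [3.0, 6.0],
    [10.0, 20.0],
]
mean_filled = mean_impute(data)
knn_filled = knn_impute(data, k=2)
```

On this toy data the outlying fourth row pulls the mean-imputed value upward, while the k-nearest-neighbors estimate draws only on the two closest rows, which is one reason the two methods can behave differently on sparse data.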

Article
Learning Parsimonious Classification Rules from Gene Expression Data Using Bayesian Networks with Local Structure
by Jonathan Lyle Lustgarten, Jeya Balaji Balasubramanian, Shyam Visweswaran and Vanathi Gopalakrishnan
Data 2017, 2(1), 5; https://doi.org/10.3390/data2010005 - 18 Jan 2017
Cited by 5 | Viewed by 6443
Abstract
The comprehensibility of good predictive models learned from high-dimensional gene expression data is attractive because it can lead to biomarker discovery. Several good classifiers provide comparable predictive performance but differ in their abilities to summarize the observed data. We extend a Bayesian Rule Learning (BRL-GSS) algorithm, previously shown to be a significantly better predictor than other classical approaches in this domain. It searches a space of Bayesian networks using a decision tree representation of its parameters with global constraints, and infers a set of IF-THEN rules. The number of parameters, and therefore the number of rules, is combinatorial in the number of predictor variables in the model. We relax these global constraints to learn a more expressive local structure with BRL-LSS. BRL-LSS entails a more parsimonious set of rules because it does not have to generate all combinatorial rules. The search space of local structures is much richer than the space of global structures. We design BRL-LSS with the same worst-case time complexity as BRL-GSS while exploring a richer and more complex model space. We measure predictive performance using Area Under the ROC Curve (AUC) and accuracy. We measure model parsimony by noting the average number of rules and variables needed to describe the observed data. We evaluate the predictive and parsimony performance of BRL-GSS, BRL-LSS, and the state-of-the-art C4.5 decision tree algorithm, across 10-fold cross-validation using ten microarray gene-expression diagnostic datasets. In these experiments, we observe that BRL-LSS is similar to BRL-GSS in terms of predictive performance, while generating a much more parsimonious set of rules to explain the same observed data. BRL-LSS also needs fewer variables than C4.5 to explain the data with similar predictive performance. We also conduct a feasibility study to demonstrate the general applicability of our BRL methods on the newer RNA sequencing gene-expression data.
(This article belongs to the Special Issue Biomedical Informatics)

Article
Application of Taxonomic Modeling to Microbiota Data Mining for Detection of Helminth Infection in Global Populations
by Mahbaneh Eshaghzadeh Torbati, Makedonka Mitreva and Vanathi Gopalakrishnan
Data 2016, 1(3), 19; https://doi.org/10.3390/data1030019 - 13 Dec 2016
Cited by 21 | Viewed by 5315
Abstract
Human microbiome data from genomic sequencing technologies are accumulating rapidly, giving us insights into bacterial taxa that contribute to health and disease. The predictive modeling of such microbiota count data for the classification of human infection from parasitic worms, such as helminths, can help in detection and management across global populations. Real-world datasets of microbiome experiments are typically sparse, containing hundreds of measurements for bacterial species, of which only a few are detected in the bio-specimens that are analyzed. This feature of microbiome data creates the challenge of needing more observations for accurate predictive modeling, and has been dealt with previously using different methods of feature reduction. To our knowledge, integrative methods, such as transfer learning, have not yet been explored in the microbiome domain as a way to deal with data sparsity by incorporating knowledge of different but related datasets. One way of incorporating this knowledge is by using a meaningful mapping among features of these datasets. In this paper, we claim that this mapping would exist among members of each individual cluster, grouped based on phylogenetic dependency among taxa and their association to the phenotype. We validate our claim by showing that models incorporating associations in such a grouped feature space result in no performance deterioration for the given classification task. We test our hypothesis by using classification models that detect helminth infection in microbiota of human fecal samples obtained from Indonesia and Liberia. In our experiments, we first learn binary classifiers for helminth infection detection by using Naive Bayes, Support Vector Machines, Multilayer Perceptrons, and Random Forest methods. In the next step, we add taxonomic modeling by using the SMART-scan module to group the data, and learn classifiers using the same four methods, to test the validity of the achieved groupings. We observed a 6% to 23% and 7% to 26% performance improvement based on the Area Under the receiver operating characteristic (ROC) Curve (AUC) and Balanced Accuracy (Bacc) measures, respectively, over 10 runs of 10-fold cross-validation. These results show that using phylogenetic dependency for grouping our microbiota data results in a noticeable improvement in classification performance for helminth infection detection. These promising results from this feasibility study demonstrate that methods such as SMART-scan can be utilized in the future for knowledge transfer from different but related microbiome datasets by phylogenetically related functional mapping, to enable novel integrative biomarker discovery.
(This article belongs to the Special Issue Biomedical Informatics)
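
The two evaluation measures named in the abstract above, AUC and Balanced Accuracy, can be computed from their standard definitions as in the following sketch. It is not taken from the paper: the function names, labels, and scores are hypothetical, and the sketch assumes binary labels coded as 1 (infected) and 0 (not infected), with both classes present.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels, hard predictions, and classifier scores for six samples.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.3, 0.2, 0.1, 0.8]

bacc_value = balanced_accuracy(y_true, y_pred)
auc_value = auc(y_true, scores)
```

Balanced Accuracy weights both classes equally, which matters when infected and uninfected samples are unevenly represented; the pairwise AUC above is quadratic in the number of samples and is meant only to make the definition concrete.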

Other

Data Descriptor
SNiPhunter: A SNP-Based Search Engine
by Werner P. Veldsman and Alan Christoffels
Data 2016, 1(3), 17; https://doi.org/10.3390/data1030017 - 29 Sep 2016
Cited by 1 | Viewed by 4700
Abstract
Procuring biomedical literature is a time-consuming process. The genomic sciences software solution described here indexes literature from PubMed Central’s open access initiative, and makes it available as a web application and through an application programming interface (API). The purpose of this tertiary data artifact, called SNiPhunter, is to assist researchers in finding articles relevant to a reference single nucleotide polymorphism (SNP) identifier of interest. A novel feature of this NoSQL (not only structured query language) database search engine is that it returns results to the user ordered according to the number of times a refSNP has appeared in an article, thereby allowing the user to make a quantitative estimate as to the relevance of an article. Queries can also be launched using author-defined keywords. Additional features include a variant call format (VCF) file parser and a multiple query file upload service. Software implementation in this project relied on Python and the NodeJS interpreter, as well as third-party libraries retrieved from GitHub.
(This article belongs to the Special Issue Biomedical Informatics)
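
The ranking idea described in the abstract above, ordering articles by how often a refSNP identifier appears in them, can be sketched in a few lines. This is only an illustration and not SNiPhunter's actual code or API: the function names, the in-memory `articles` dictionary, and the article identifiers are hypothetical, and a real system would query an indexed database rather than scan raw text.

```python
import re

def count_refsnp(text, rsid):
    """Count whole-token occurrences of a refSNP identifier such as 'rs123'."""
    # Word boundaries keep 'rs123' from matching inside 'rs1234'.
    pattern = re.compile(r"\b" + re.escape(rsid) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))

def rank_articles(articles, rsid):
    """Return (article_id, count) pairs, most frequent first, omitting zero counts."""
    counts = {aid: count_refsnp(text, rsid) for aid, text in articles.items()}
    hits = [(aid, n) for aid, n in counts.items() if n > 0]
    # Sort by descending count, then by identifier for a stable order on ties.
    return sorted(hits, key=lambda item: (-item[1], item[0]))

# Hypothetical article identifiers and text snippets.
articles = {
    "ART-A": "Variant rs123 was genotyped; rs123 and rs1234 were compared.",
    "ART-B": "We report rs123 in cases, rs123 in controls, and rs123 in trios.",
    "ART-C": "No variants were discussed in this review.",
}
ranking = rank_articles(articles, "rs123")
```

The occurrence count serves as a rough relevance score, so an article that mentions the identifier three times ranks above one that mentions it twice.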