Open Access Article

A Multi-Label Supervised Topic Model Conditioned on Arbitrary Features for Gene Function Prediction

L. Liu 1, L. Tang 2,*, X. Jin 3 and W. Zhou 3,*
1
School of Information, Yunnan Normal University, Kunming 650500, China
2
Key Laboratory of Educational Informatization for Nationalities Ministry of Education, Yunnan Normal University, Kunming 650500, China
3
School of Software, Yunnan University, Kunming 650091, China
*
Authors to whom correspondence should be addressed.
Genes 2019, 10(1), 57; https://doi.org/10.3390/genes10010057
Received: 30 November 2018 / Revised: 1 January 2019 / Accepted: 10 January 2019 / Published: 17 January 2019

Abstract

As biological data continue to accumulate, a growing number of machine learning algorithms have been applied to gene function prediction, a task of great significance for decoding the secrets of life. Recently, a multi-label supervised topic model, labeled latent Dirichlet allocation (LLDA), was applied to gene function prediction and produced more accurate and more explainable predictions than conventional methods. However, LLDA can only use a bag of amino acid words as its classification feature; it does not support other features, such as hydrophobicity, which has a profound impact on gene function. To model gene function probabilistically with greater accuracy, we propose a multi-label supervised topic model conditioned on arbitrary features, named Dirichlet-multinomial regression LLDA (DMR-LLDA), which introduces multiple types of features into the topic modeling process. Based on the DMR framework, DMR-LLDA places an exponential prior, constructed from weighted features, on the hyper-parameters of the gene-topic distribution, so that the extra features influence the probability distribution over functions. In five-fold cross-validation experiments on a yeast dataset, DMR-LLDA significantly outperforms the compared model. These experiments demonstrate the effectiveness and potential value of DMR-LLDA for predicting gene function.
Keywords: multi-label classification; topic model; gene function; probability distribution; Dirichlet-multinomial regression
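
The exponential prior described in the abstract can be illustrated with a minimal sketch. It assumes the standard Dirichlet-multinomial regression form, in which the hyper-parameter for gene d and topic k is exp(x_d · λ_k), and the usual LLDA constraint that a gene may only use topics matching its annotated function labels. The function names, the feature layout (a bias term plus one hydrophobicity score), and the weight values are hypothetical and are not taken from the article; the authors' exact parameterization may differ.

```python
import math


def dmr_llda_alpha(features, weights, label_mask):
    """Per-gene Dirichlet hyper-parameters (illustrative sketch only).

    features   : list of floats, the gene's feature vector x_d
                 (assumed layout here: [bias, hydrophobicity score])
    weights    : list of per-topic weight vectors lambda_k (hypothetical values)
    label_mask : list of bools, True where topic k is one of the gene's labels
    """
    alpha = []
    for lam_k, allowed in zip(weights, label_mask):
        if not allowed:
            # LLDA-style restriction: topics outside the label set get no prior mass.
            alpha.append(0.0)
            continue
        score = sum(x * w for x, w in zip(features, lam_k))
        # DMR-style exponential prior: alpha_{d,k} = exp(x_d . lambda_k).
        alpha.append(math.exp(score))
    return alpha


def prior_mean(alpha):
    """Mean of the Dirichlet prior, i.e. the expected gene-topic distribution."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Hypothetical gene: bias = 1.0, hydrophobicity score = 0.5, labeled with topics 0 and 1.
x_d = [1.0, 0.5]
lambdas = [[0.0, 0.0], [0.0, 2.0], [0.0, -1.0]]
mask = [True, True, False]

alpha_d = dmr_llda_alpha(x_d, lambdas, mask)
print(alpha_d)
print(prior_mean(alpha_d))
```

In this sketch, a positive weight on the hydrophobicity feature for topic 1 raises that topic's prior mass relative to topic 0, which is how extra features can shift the function probability distribution before any amino acid words are observed.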
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
MDPI and ACS Style

Liu, L.; Tang, L.; Jin, X.; Zhou, W. A Multi-Label Supervised Topic Model Conditioned on Arbitrary Features for Gene Function Prediction. Genes 2019, 10, 57.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Genes, EISSN 2073-4425, published by MDPI AG, Basel, Switzerland.