Background: Long non-coding Ribonucleic Acids (lncRNAs) can be localized to different cellular compartments, such as the nuclear and the cytoplasmic regions. Their biological functions are influenced by the region of the cell where they are located. Compared to the vast number of lncRNAs,
[...] Read more.
Background: Long non-coding Ribonucleic Acids (lncRNAs) can be localized to different cellular compartments, such as the nuclear and the cytoplasmic regions. Their biological functions are influenced by the region of the cell where they are located. Compared to the vast number of lncRNAs, only a relatively small proportion have annotations regarding their subcellular localization. It would be helpful if those few annotated lncRNAs could be leveraged to develop predictive models for localization of other lncRNAs.
Methods: Conventional computational methods use
q-mer profiles from lncRNA sequences and train machine learning models such as support vector machines and logistic regression with the profiles. These methods focus on the exact
q-mer. Given possible sequence mutations and other uncertainties in genomic sequences and their role in biological function, a consideration of these variabilities might improve our ability to model lncRNAs and their localization. Thus, we build on inexact
q-mers and use machine learning/deep learning techniques to study three specific problems in lncRNA subcellular localization, namely, prediction of lncRNA localization using inexact
q-mers, the issue of whether lncRNA localization is cell-type-specific, and the notion of switching (lncRNA) genes.
Results: We performed our analysis using data on lncRNA localization across 15 cell lines. Our results showed that using inexact
q-mers (with
q = 6) can improve the lncRNA localization prediction performance compared to using exact
q-mers. Further, we showed that lncRNA localization, in general, is not cell-line-specific. We also identified a category of LncRNAs which switch cellular compartments between different cell lines (we call them switching lncRNAs). These switching lncRNAs complicate the problem of predicting lncRNA localization using machine learning models, showing that lncRNA localization is still a major challenge.
Full article