DeepmRNALoc: A Novel Predictor of Eukaryotic mRNA Subcellular Localization Based on Deep Learning
Abstract
1. Introduction
2. Results
2.1. Model Training Based on Five-Fold Cross-Validation
2.2. Trained Model Evaluation Based on an Independent Dataset
2.3. Performance Comparisons of the DeepmRNALoc with Other Existing State-of-the-Art Predictors
2.4. Trained Model Evaluation Based on Independent Human mRNA Data (Dataset 2)
2.5. Performance Comparison of Various k-Value Combinations
2.6. Model Deployment, Web Server Construction, and Usage
3. Materials and Methods
3.1. Benchmark Datasets
3.1.1. Dataset 1 (Training Set and Independent Validation Set)
3.1.2. Dataset 2 (Independent Validation Dataset for Human mRNA Subcellular Localization)
3.2. Numerical Coding of mRNA Sequence Data
- A square was drawn;
- Each of the four nucleotides was assigned to one corner of the square;
- The center of the square was taken as the starting (first) point;
- A straight line was drawn from the center to the corner corresponding to the first nucleotide of the sample sequence, and its midpoint was taken as the second point. Another straight line was then drawn from the second point to the corner corresponding to the second nucleotide, and its midpoint was taken as the third point. This procedure was repeated until every nucleotide of the sample sequence had been used [34,39,40,41].
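The construction steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `cgr_points`, the unit-square coordinates, the corner assignment (A, C, G, U), and the handling of `T` and ambiguous symbols are all assumptions, since the text does not specify them.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Assumed corner layout on the unit square; the text only says that each
# nucleotide is placed on one corner, not which one.
CORNERS: Dict[str, Point] = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "U": (1.0, 0.0),
}


def cgr_points(sequence: str) -> List[Point]:
    """Return the chaos game trajectory for a nucleotide sequence.

    The first point is the center of the square; every later point is the
    midpoint between the previous point and the corner of the current
    nucleotide.
    """
    x, y = 0.5, 0.5
    points: List[Point] = [(x, y)]
    # Assumption: DNA-style input is accepted by mapping T to U.
    for base in sequence.upper().replace("T", "U"):
        if base not in CORNERS:
            continue  # assumption: skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points
```

Calling `cgr_points("AUGC")` yields five points: the center followed by one point per nucleotide. Because each step halves the distance to a corner, the final position encodes the order of the most recent nucleotides, which is why such coordinates are commonly binned into a grid to obtain k-mer frequencies.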
3.3. Network Architecture of DeepmRNALoc
3.4. Performance Evaluation Criteria
3.5. Five-Fold Cross-Validation for Model Validation and Over-Fitting Issue
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Kloc, M.; Zearfoss, N.R.; Etkin, L.D. Mechanisms of subcellular mRNA localization. Cell 2002, 108, 533–544.
- Holt, C.E.; Bullock, S.L. Subcellular mRNA Localization in Animal Cells and Why It Matters. Science 2009, 326, 1212–1216.
- Mili, S.; Macara, I.G. RNA localization and polarity: From A(PC) to Z(BP). Trends Cell Biol. 2009, 19, 156–164.
- Bouvrette, L.P.B.; Cody, N.A.L.; Bergalet, J.; Lefebvre, F.A.; Diot, C.; Wang, X.; Blanchette, M.; Lecuyer, E. CeFra-seq reveals broad asymmetric mRNA and noncoding RNA distribution profiles in Drosophila and human cells. RNA 2018, 24, 98–113.
- Martin, K.C.; Ephrussi, A. mRNA Localization: Gene Expression in the Spatial Dimension. Cell 2009, 136, 719–730.
- Cooper, T.A.; Wan, L.; Dreyfuss, G. RNA and Disease. Cell 2009, 136, 777–793.
- Fagerberg, L.; Hallstrom, B.M.; Oksvold, P.; Kampf, C.; Djureinovic, D.; Odeberg, J.; Habuka, M.; Tahmasebpoor, S.; Danielsson, A.; Edlund, K.; et al. Analysis of the Human Tissue-specific Expression by Genome-wide Integration of Transcriptomics and Antibody-based Proteomics. Mol. Cell. Proteom. 2014, 13, 397–406.
- Fazal, F.M.; Han, S.; Parker, K.R.; Kaewsapsak, P.; Xu, J.; Boettiger, A.N.; Chang, H.Y.; Ting, A.Y. Atlas of Subcellular RNA Localization Revealed by APEX-Seq. Cell 2019, 178, 473–490.
- Poon, M.M.; Choi, S.H.; Jamieson, C.A.M.; Geschwind, D.H.; Martin, K.C. Identification of process-localized mRNAs from cultured rodent hippocampal neurons. J. Neurosci. 2006, 26, 13390–13399.
- Meyer, C.; Garzia, A.; Tuschl, T. Simultaneous detection of the subcellular localization of RNAs and proteins in cultured cells by combined multicolor RNA-FISH and IF. Methods 2017, 118, 101–110.
- Kwon, S. Single-molecule fluorescence in situ hybridization: Quantitative imaging of single RNA molecules. BMB Rep. 2013, 46, 65–72.
- Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Zidek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
- Whalen, S.; Schreiber, J.; Noble, W.S.; Pollard, K.S. Navigating the pitfalls of applying machine learning in genomics. Nat. Rev. Genet. 2022, 23, 169–181.
- Vamathevan, J.; Clark, D.; Czodrowski, P.; Dunham, I.; Ferran, E.; Lee, G.; Li, B.; Madabhushi, A.; Shah, P.; Spitzer, M.; et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 2019, 18, 463–477.
- Ranjbarvaziri, S.; Kooiker, K.B.; Ellenberger, M.; Fajardo, G.; Zhao, M.M.; Vander Roest, A.S.; Woldeyes, R.A.; Koyano, T.T.; Fong, R.; Ma, N.; et al. Altered Cardiac Energetics and Mitochondrial Dysfunction in Hypertrophic Cardiomyopathy. Circulation 2021, 144, 1714–1731.
- Kermany, D.S.; Goldbaum, M.; Cai, W.J.; Valentim, C.C.S.; Liang, H.Y.; Baxter, S.L.; McKeown, A.; Yang, G.; Wu, X.K.; Yan, F.B.; et al. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell 2018, 172, 1122–1131.
- Alves, V.M.; Korn, D.; Pervitsky, V.; Thieme, A.; Capuzzi, S.J.; Baker, N.; Chirkova, R.; Ekins, S.; Muratov, E.N.; Hickey, A.; et al. Knowledge-based approaches to drug discovery for rare diseases. Drug Discov. Today 2022, 27, 490–502.
- Wekesa, J.S.; Meng, J.; Luan, Y.S. A deep learning model for plant lncRNA-protein interaction prediction with graph attention. Mol. Genet. Genom. 2020, 295, 1091–1102.
- Wei, L.Y.; Ding, Y.J.; Su, R.; Tang, J.J.; Zou, Q. Prediction of human protein subcellular localization using deep learning. J. Parallel Distrib. Comput. 2018, 117, 212–217.
- Ahmad, A.; Lin, H.; Shatabda, S. Locate-R: Subcellular localization of long non-coding RNAs using nucleotide compositions. Genomics 2020, 112, 2583–2589.
- Cao, Z.; Pan, X.Y.; Yang, Y.; Huang, Y.; Shen, H.B. The lncLocator: A subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier. Bioinformatics 2018, 34, 2185–2194.
- Lin, Y.; Pan, X.Y.; Shen, H.B. lncLocator 2.0: A cell-line-specific subcellular localization predictor for long non-coding RNAs with interpretable deep learning. Bioinformatics 2021, 37, 2308–2316.
- Armenteros, J.J.A.; Sonderby, C.K.; Sonderby, S.K.; Nielsen, H.; Winther, O. DeepLoc: Prediction of protein subcellular localization using deep learning. Bioinformatics 2017, 33, 3387–3395.
- Zhang, T.; Tan, P.; Wang, L.; Jin, N.; Li, Y.; Zhang, L.; Yang, H.; Hu, Z.; Zhang, L.; Hu, C.; et al. RNALocate: A resource for RNA subcellular localizations. Nucleic Acids Res. 2017, 45, D135–D138.
- Cui, T.Y.; Dou, Y.Y.; Tan, P.W.; Ni, Z.; Liu, T.Y.; Wang, D.L.; Huang, Y.; Cai, K.C.; Zhao, X.Y.; Xu, D.; et al. RNALocate v2.0: An updated resource for RNA subcellular localization with increased coverage and annotation. Nucleic Acids Res. 2022, 50, D333–D339.
- Yan, Z.C.; Lecuyer, E.; Blanchette, M. Prediction of mRNA subcellular localization using deep recurrent neural networks. Bioinformatics 2019, 35, I333–I342.
- Zhang, Z.Y.; Yang, Y.H.; Ding, H.; Wang, D.; Chen, W.; Lin, H. Design powerful predictor for mRNA subcellular location prediction in Homo sapiens. Brief. Bioinform. 2021, 22, 526–535.
- Garg, A.; Singhal, N.; Kumar, R.; Kumar, M. mRNALoc: A novel machine-learning based in-silico tool to predict mRNA subcellular localization. Nucleic Acids Res. 2020, 48, W239–W243.
- Chen, W.; Lei, T.Y.; Jin, D.C.; Lin, H.; Chou, K.C. PseKNC: A flexible web server for generating pseudo K-tuple nucleotide composition. Anal. Biochem. 2014, 456, 53–60.
- Ke, G.L.; Meng, Q.; Finley, T.; Wang, T.F.; Chen, W.; Ma, W.D.; Ye, Q.W.; Liu, T.Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 1–9.
- Li, J.; Zhang, L.C.; He, S.D.; Guo, F.; Zou, Q. SubLocEP: A novel ensemble predictor of subcellular localization of eukaryotic mRNA based on machine learning. Brief. Bioinform. 2021, 22, bbaa401.
- Bengio, Y.; Ducharme, R.; Vincent, P.; Jauvin, C. A neural probabilistic language model. J. Mach. Learn. Res. 2003, 3, 1137–1155.
- Karlin, S.; Burge, C. Dinucleotide relative abundance extremes: A genomic signature. Trends Genet. 1995, 11, 283–290.
- Jeffrey, H.J. Chaos game representation of gene structure. Nucleic Acids Res. 1990, 18, 2163–2170.
- Ghandi, M.; Mohammad-Noori, M.; Beer, M.A. Robust k-mer frequency estimation using gapped k-mers. J. Math. Biol. 2014, 69, 469–500.
- Zhu, P.P.; Li, W.C.; Zhong, Z.J.; Deng, E.Z.; Ding, H.; Chen, W.; Lin, H. Predicting the subcellular localization of mycobacterial proteins by incorporating the optimal tripeptides into the general form of pseudo amino acid composition. Mol. Biosyst. 2015, 11, 558–563.
- Liu, B.; Long, R.; Chou, K.C. iDHS-EL: Identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework. Bioinformatics 2016, 32, 2411–2418.
- Lee, D.; Karchin, R.; Beer, M.A. Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011, 21, 2167–2180.
- Almeida, J.S.; Carrico, J.A.; Maretzek, A.; Noble, P.A.; Fletcher, M. Analysis of genomic sequences by Chaos Game Representation. Bioinformatics 2001, 17, 429–437.
- Deschavanne, P.J.; Giron, A.; Vilain, J.; Fagot, G.; Fertil, B. Genomic signature: Characterization and classification of species assessed by chaos game representation of sequences. Mol. Biol. Evol. 1999, 16, 1391–1399.
- Wang, Y.W.; Hill, K.; Singh, S.; Kari, L. The spectrum of genomic signatures: From dinucleotides to chaos game representation. Gene 2005, 346, 173–185.
- Sutton, G.G.; White, O.; Adams, M.; Kerlavage, A.R. TIGR Assembler: A new tool for assembling large shotgun sequencing projects. Genome Sci. Technol. 1995, 1, 9–19.
- Chu, Y.; Kaushik, A.C.; Wang, X.; Wang, W.; Zhang, Y.; Shan, X.; Salahub, D.R.; Xiong, Y.; Wei, D.-Q. DTI-CDF: A cascade deep forest model towards the prediction of drug-target interactions based on hybrid features. Brief. Bioinform. 2021, 22, 451–462.
- Li, X.; Liu, T.; Tao, P.; Wang, C.; Chen, L. A highly accurate protein structural class prediction approach using auto cross covariance transformation and recursive feature elimination. Comput. Biol. Chem. 2015, 59, 95–100.
- Shan, X.; Wang, X.; Li, C.-D.; Chu, Y.; Zhang, Y.; Xiong, Y.; Wei, D.-Q. Prediction of CYP450 Enzyme-Substrate Selectivity Based on the Network-Based Label Space Division Method. J. Chem. Inf. Model. 2019, 59, 4577–4586.
- Xiong, Z.; Cui, Y.X.; Liu, Z.H.; Zhao, Y.; Hu, M.; Hu, J.J. Evaluating explorative prediction power of machine learning algorithms for materials discovery using k-fold forward cross-validation. Comput. Mater. Sci. 2020, 171, 109203.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015; pp. 1–14.
- Wang, L.; You, Z.H.; Huang, Y.A.; Huang, D.S.; Chan, K.C.C. An efficient approach based on multi-sources information to predict circRNA-disease associations using deep convolutional neural network. Bioinformatics 2020, 36, 4038–4046.
- Quang, D.; Chen, Y.; Xie, X. DANN: A deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics 2015, 31, 761–763.

Location | Precision | Recall | ACC | F-Score |
---|---|---|---|---|
Cytoplasm | 0.777 | 0.873 | 0.873 | 0.822 |
Endoplasmic reticulum | 0.813 | 0.516 | 0.516 | 0.631 |
Extracellular region | 0.573 | 0.326 | 0.326 | 0.415 |
Mitochondria | 0.867 | 0.897 | 0.897 | 0.882 |
Nucleus | 0.843 | 0.856 | 0.856 | 0.850 |

Location | Precision | Recall | ACC | F-Score |
---|---|---|---|---|
Cytoplasm | 0.802 | 0.895 | 0.895 | 0.846 |
Endoplasmic reticulum | 0.816 | 0.594 | 0.594 | 0.688 |
Extracellular region | 0.603 | 0.308 | 0.308 | 0.407 |
Mitochondria | 0.931 | 0.944 | 0.944 | 0.937 |
Nucleus | 0.857 | 0.865 | 0.865 | 0.861 |

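As a sanity check on the per-location table directly above, the F-Score column can be recomputed from the precision and recall columns. The sketch below assumes the usual F1 definition (the harmonic mean of precision and recall); the section that defines the metrics is not reproduced here, so that definition and the names `f_score`, `ROWS`, and `max_deviation` are illustrative assumptions. The numbers are copied from the table.

```python
from typing import Dict, Tuple


def f_score(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-Score) for each location, from the table above.
ROWS: Dict[str, Tuple[float, float, float]] = {
    "Cytoplasm": (0.802, 0.895, 0.846),
    "Endoplasmic reticulum": (0.816, 0.594, 0.688),
    "Extracellular region": (0.603, 0.308, 0.407),
    "Mitochondria": (0.931, 0.944, 0.937),
    "Nucleus": (0.857, 0.865, 0.861),
}


def max_deviation(rows: Dict[str, Tuple[float, float, float]]) -> float:
    """Largest absolute gap between recomputed and reported F-Scores."""
    return max(abs(f_score(p, r) - f) for p, r, f in rows.values())
```

Under this assumption, the recomputed values agree with the reported ones to within the rounding of the three-decimal inputs. The same check is not expected to hold for rows that report averages over locations, because an average of per-class F-Scores generally differs from the F-Score of the averaged precision and recall.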
Model | Precision | Recall | ACC | F-Score |
---|---|---|---|---|
DeepmRNALoc | 0.817 | 0.822 | 0.822 | 0.814 |
SubLocEP | 0.671 | 0.601 | 0.601 | 0.578 |
RNATracker | 0.595 | 0.519 | 0.519 | 0.540 |
mRNALoc | 0.549 | 0.491 | 0.491 | 0.509 |
iLoc-mRNA | 0.183 | 0.425 | 0.425 | 0.256 |

Location | DeepmRNALoc | SubLocEP |
---|---|---|
Cytoplasm | 0.923 | 0.883 |
Endoplasmic reticulum | 0.811 | 0.630 |
Nucleus | 0.807 | 0.426 |

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, S.; Shen, Z.; Liu, T.; Long, W.; Jiang, L.; Peng, S. DeepmRNALoc: A Novel Predictor of Eukaryotic mRNA Subcellular Localization Based on Deep Learning. Molecules 2023, 28, 2284. https://doi.org/10.3390/molecules28052284