Comprehensive Analysis of Large-Scale Transcriptomes from Multiple Cancer Types
Abstract
:1. Introduction
2. Materials and Methods
2.1. Data Sources
2.2. Customized Reference lncRNAs Construction
2.3. Sequencing Data Processing and Feature Matrix Construction for Machine Learning
2.4. Disease-Related Feature Selection by Machine Learning
2.5. Cancer Subtyping Analysis
2.6. Other Bioinformatic Analysis
3. Results
3.1. PipeOne Workflow Overview
3.2. Cancer Genes and Pathways Contributing to Multiple Types of Cancer
3.3. Clinically-Relevant Cancer Subtypes Characterized by Distinct Somatic Mutations
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Emrich, S.J.; Barbazuk, W.B.; Li, L.; Schnable, P.S. Gene discovery and annotation using LCM-454 transcriptome sequencing. Genome Res. 2007, 17, 69–73. [Google Scholar] [CrossRef] [Green Version]
- Wang, Z.; Gerstein, M.; Snyder, M. RNA-Seq: A revolutionary tool for transcriptomics. Nat. Rev. Genet. 2009, 10, 57–63. [Google Scholar] [CrossRef]
- Stark, R.; Grzelak, M.; Hadfield, J. RNA sequencing: The teenage years. Nat. Rev. Genet. 2019, 20, 631–656. [Google Scholar] [CrossRef] [PubMed]
- Pan, Q.; Shai, O.; Lee, L.J.; Frey, B.J.; Blencowe, B.J. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 2008, 40, 1413–1415. [Google Scholar] [CrossRef] [PubMed]
- Wang, E.T.; Sandberg, R.; Luo, S.; Khrebtukova, I.; Zhang, L.; Mayr, C.; Kingsmore, S.F.; Schroth, G.P.; Burge, C.B. Alternative isoform regulation in human tissue transcriptomes. Nature 2008, 456, 470–476. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Nilsen, T.W.; Graveley, B.R. Expansion of the eukaryotic proteome by alternative splicing. Nature 2010, 463, 457–463. [Google Scholar] [CrossRef] [Green Version]
- Scotti, M.M.; Swanson, M.S. RNA mis-splicing in disease. Nat. Rev. Genet. 2016, 17, 19–32. [Google Scholar] [CrossRef]
- Singh, B.; Eyras, E. The role of alternative splicing in cancer. Transcription 2017, 8, 91–98. [Google Scholar] [CrossRef] [Green Version]
- Shi, Y.; Di Giammartino, D.C.; Taylor, D.; Sarkeshik, A.; Rice, W.J.; Yates, J.R., 3rd; Frank, J.; Manley, J.L. Molecular architecture of the human pre-mRNA 3’ processing complex. Mol. Cell 2009, 33, 365–376. [Google Scholar] [CrossRef] [Green Version]
- Tian, B.; Manley, J.L. Alternative polyadenylation of mRNA precursors. Nat. Rev. Mol. Cell Biol. 2017, 18, 18–30. [Google Scholar] [CrossRef]
- Derti, A.; Garrett-Engele, P.; Macisaac, K.D.; Stevens, R.C.; Sriram, S.; Chen, R.; Rohl, C.A.; Johnson, J.M.; Babak, T. A quantitative atlas of polyadenylation in five mammals. Genome Res. 2012, 22, 1173–1183. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Latysheva, N.S.; Babu, M.M. Discovering and understanding oncogenic gene fusions through data intensive computational approaches. Nucleic Acids Res. 2016, 44, 4487–4503. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Yoshihara, K.; Wang, Q.; Torres-Garcia, W.; Zheng, S.; Vegesna, R.; Kim, H.; Verhaak, R.G. The landscape and therapeutic relevance of cancer-associated transcript fusions. Oncogene 2015, 34, 4845–4854. [Google Scholar] [CrossRef] [Green Version]
- Lonsdale, J.; Thomas, J.; Salvatore, M.; Phillips, R.; Lo, E.; Shad, S.; Hasz, R.; Walters, G.; Garcia, F.; Young, N.; et al. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 2013, 45, 580–585. [Google Scholar] [CrossRef]
- Weinstein, J.N.; Collisson, E.A.; Mills, G.B.; Shaw, K.R.; Ozenberger, B.A.; Ellrott, K.; Shmulevich, I.; Sander, C.; Stuart, J.M. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 2013, 45, 1113–1120. [Google Scholar] [CrossRef] [PubMed]
- Tan, M.H.; Li, Q.; Shanmugam, R.; Piskol, R.; Kohler, J.; Young, A.N.; Liu, K.I.; Zhang, R.; Ramaswami, G.; Ariyoshi, K.; et al. Dynamic landscape and regulation of RNA editing in mammals. Nature 2017, 550, 249–254. [Google Scholar] [CrossRef]
- Han, L.; Diao, L.; Yu, S.; Xu, X.; Li, J.; Zhang, R.; Yang, Y.; Werner, H.M.J.; Eterovic, A.K.; Yuan, Y.; et al. The Genomic Landscape and Clinical Relevance of A-to-I RNA Editing in Human Cancers. Cancer Cell 2015, 28, 515–528. [Google Scholar] [CrossRef] [Green Version]
- Gao, Y.; Wang, J.; Zhao, F. CIRI: An efficient and unbiased algorithm for de novo circular RNA identification. Genome Biol 2015, 16, 4. [Google Scholar] [CrossRef] [Green Version]
- Iyer, M.K.; Niknafs, Y.S.; Malik, R.; Singhal, U.; Sahu, A.; Hosono, Y.; Barrette, T.R.; Prensner, J.R.; Evans, J.R.; Zhao, S.; et al. The landscape of long noncoding RNAs in the human transcriptome. Nat. Genet. 2015, 47, 199–208. [Google Scholar] [CrossRef]
- Ashwal-Fluss, R.; Meyer, M.; Pamudurti, N.R.; Ivanov, A.; Bartok, O.; Hanan, M.; Evantal, N.; Memczak, S.; Rajewsky, N.; Kadener, S. circRNA biogenesis competes with pre-mRNA splicing. Mol. Cell 2014, 56, 55–66. [Google Scholar] [CrossRef] [Green Version]
- Piwecka, M.; Glazar, P.; Hernandez-Miranda, L.R.; Memczak, S.; Wolf, S.A.; Rybak-Wolf, A.; Filipchyk, A.; Klironomos, F.; Cerda Jara, C.A.; Fenske, P.; et al. Loss of a mammalian circular RNA locus causes miRNA deregulation and affects brain function. Science 2017, 357, aam8526. [Google Scholar] [CrossRef] [Green Version]
- Slack, F.J.; Chinnaiyan, A.M. The role of non-coding rnas in oncology. Cell 2019, 179, 1033–1055. [Google Scholar] [CrossRef]
- Yan, X.; Hu, Z.; Feng, Y.; Hu, X.; Yuan, J.; Zhao, S.D.; Zhang, Y.; Yang, L.; Shan, W.; He, Q.; et al. Comprehensive Genomic Characterization of Long Non-coding RNAs across Human Cancers. Cancer Cell 2015, 28, 529–540. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Tseng, Y.Y.; Moriarity, B.S.; Gong, W.; Akiyama, R.; Tiwari, A.; Kawakami, H.; Ronning, P.; Reuland, B.; Guenther, K.; Beadnell, T.C.; et al. PVT1 dependence in cancer with MYC copy-number increase. Nature 2014, 512, 82–86. [Google Scholar] [CrossRef]
- De Koning, A.P.J.; Gu, W.; Castoe, T.A.; Batzer, M.A.; Pollock, D.D. Repetitive Elements May Comprise Over Two-Thirds of the Human Genome. PLoS Genet. 2011, 7, e1002384. [Google Scholar] [CrossRef] [Green Version]
- Bendall, M.L.; de Mulder, M.; Iñiguez, L.P.; Lecanda-Sánchez, A.; Pérez-Losada, M.; Ostrowski, M.A.; Jones, R.B.; Mulder, L.C.F.; Reyes-Terán, G.; Crandall, K.A.; et al. Telescope: Characterization of the retrotranscriptome by accurate estimation of transposable element expression. PLoS Comput. Biol. 2019, 15, e1006453. [Google Scholar] [CrossRef] [Green Version]
- Goodier, J.L. Restricting retrotransposons: A review. Mobile DNA 2016, 7, 16. [Google Scholar] [CrossRef] [Green Version]
- Payer, L.M.; Burns, K.H. Transposable elements in human genetic disease. Nat. Rev. Genet. 2019, 20, 760–772. [Google Scholar] [CrossRef]
- Göke, J.; Lu, X.; Chan, Y.S.; Ng, H.H.; Ly, L.H.; Sachs, F.; Szczerbinska, I. Dynamic transcription of distinct classes of endogenous retroviral elements marks specific populations of early human embryonic cells. Cell Stem Cell 2015, 16, 135–141. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Eisenberg, E.; Levanon, E.Y. A-to-I RNA editing-immune protector and transcriptome diversifier. Nat. Rev. Genet. 2018, 19, 473–490. [Google Scholar] [CrossRef] [PubMed]
- Yee, B.A.; Pratt, G.A.; Graveley, B.R.; Van Nostrand, E.L.; Yeo, G.W. RBP-Maps enables robust generation of splicing regulatory maps. RNA 2019, 25, 193–204. [Google Scholar] [CrossRef] [Green Version]
- Modi, H.; McDonald, T.; Chu, S.; Yee, J.K.; Forman, S.J.; Bhatia, R. Role of BCR/ABL gene-expression levels in determining the phenotype and imatinib sensitivity of transformed human hematopoietic cells. Blood 2007, 109, 5411–5421. [Google Scholar] [CrossRef]
- Conesa, A.; Madrigal, P.; Tarazona, S.; Gomez-Cabrero, D.; Cervera, A.; McPherson, A.; Szczesniak, M.W.; Gaffney, D.J.; Elo, L.L.; Zhang, X.; et al. A survey of best practices for RNA-seq data analysis. Genome Biol. 2016, 17, 13. [Google Scholar] [CrossRef] [Green Version]
- Frankish, A.; Diekhans, M.; Ferreira, A.M.; Johnson, R.; Jungreis, I.; Loveland, J.; Mudge, J.M.; Sisu, C.; Wright, J.; Armstrong, J.; et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 2019, 47, D766–D773. [Google Scholar] [CrossRef] [Green Version]
- Volders, P.J.; Anckaert, J.; Verheggen, K.; Nuytens, J.; Martens, L.; Mestdagh, P.; Vandesompele, J. LNCipedia 5: Towards a reference set of human long non-coding RNAs. Nucleic Acids Res. 2019, 47, D135–D139. [Google Scholar] [CrossRef] [Green Version]
- Glazar, P.; Papavasileiou, P.; Rajewsky, N. circBase: A database for circular RNAs. RNA 2014, 20, 1666–1670. [Google Scholar] [CrossRef] [Green Version]
- Tang, Z.; Li, C.; Kang, B.; Gao, G.; Li, C.; Zhang, Z. GEPIA: A web server for cancer and normal gene expression profiling and interactive analyses. Nucleic Acids Res. 2017, 45, W98–W102. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Kim, D.; Langmead, B.; Salzberg, S.L. HISAT: A fast spliced aligner with low memory requirements. Nat. Methods 2015, 12, 357–360. [Google Scholar] [CrossRef] [Green Version]
- Pertea, M.; Pertea, G.M.; Antonescu, C.M.; Chang, T.C.; Mendell, J.T.; Salzberg, S.L. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 2015, 33, 290–295. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Niknafs, Y.S.; Pandian, B.; Iyer, H.K.; Chinnaiyan, A.M.; Iyer, M.K. TACO produces robust multisample transcriptome assemblies from RNA-seq. Nat. Methods 2017, 14, 68–70. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Wang, L.; Park, H.J.; Dasari, S.; Wang, S.; Kocher, J.P.; Li, W. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res. 2013, 41, e74. [Google Scholar] [CrossRef] [PubMed]
- Tong, X.; Liu, S. CPPred: Coding potential prediction based on the global description of RNA sequence. Nucleic Acids Res. 2019, 47, e43. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Li, A.; Zhang, J.; Zhou, Z. PLEK: A tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme. BMC Bioinform. 2014, 15, 311. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Chen, S.; Zhou, Y.; Chen, Y.; Gu, J. fastp: An ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 2018, 34, i884–i890. [Google Scholar] [CrossRef]
- Patro, R.; Duggal, G.; Love, M.I.; Irizarry, R.A.; Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 2017, 14, 417–419. [Google Scholar] [CrossRef] [Green Version]
- Zhang, J.; Chen, S.; Yang, J.; Zhao, F. Accurate quantification of circular RNAs identifies extensive circular isoform switching events. Nat. Commun. 2020, 11, 90. [Google Scholar] [CrossRef] [Green Version]
- Kahles, A.; Ong, C.S.; Zhong, Y.; Rätsch, G. SplAdder: Identification, quantification and testing of alternative splicing events from RNA-Seq data. Bioinformatics 2016, 32, 1840–1847. [Google Scholar] [CrossRef] [PubMed]
- Ha, K.C.H.; Blencowe, B.J.; Morris, Q. QAPA: A new method for the systematic analysis of alternative polyadenylation from RNA-seq data. Genome Biol. 2018, 19, 45. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Zhang, F.; Lu, Y.; Yan, S.; Xing, Q.; Tian, W. SPRINT: An SNP-free toolkit for identifying RNA editing sites. Bioinformatics 2017, 33, 3538–3548. [Google Scholar] [CrossRef]
- McKenna, A.; Hanna, M.; Banks, E.; Sivachenko, A.; Cibulskis, K.; Kernytsky, A.; Garimella, K.; Altshuler, D.; Gabriel, S.; Daly, M.; et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010, 20, 1297–1303. [Google Scholar] [CrossRef] [Green Version]
- Wang, K.; Li, M.; Hakonarson, H. ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010, 38, e164. [Google Scholar] [CrossRef]
- Haas, B.J.; Dobin, A.; Li, B.; Stransky, N.; Pochet, N.; Regev, A. Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Genome Biol. 2019, 20, 213. [Google Scholar] [CrossRef] [Green Version]
- Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65. [Google Scholar] [CrossRef] [Green Version]
- Mayakonda, A.; Lin, D.C.; Assenov, Y.; Plass, C.; Koeffler, H.P. Maftools: Efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 2018, 28, 1747–1756. [Google Scholar] [CrossRef] [Green Version]
- Di Tommaso, P.; Chatzou, M.; Floden, E.W.; Barja, P.P.; Palumbo, E.; Notredame, C. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 2017, 35, 316–319. [Google Scholar] [CrossRef]
- Sahraeian, S.M.E.; Mohiyuddin, M.; Sebra, R.; Tilgner, H.; Afshar, P.T.; Au, K.F.; Bani Asadi, N.; Gerstein, M.B.; Wong, W.H.; Snyder, M.P.; et al. Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis. Nat. Commun. 2017, 8, 59. [Google Scholar] [CrossRef]
- Cornwell, M.; Vangala, M.; Taing, L.; Herbert, Z.; Koster, J.; Li, B.; Sun, H.; Li, T.; Zhang, J.; Qiu, X.; et al. VIPER: Visualization Pipeline for RNA-seq, a Snakemake workflow for efficient and complete RNA-seq analysis. BMC Bioinform. 2018, 19, 135. [Google Scholar] [CrossRef]
- Kanduc, D. Translational regulation of human papillomavirus type 16 E7 mRNA by the peptide SEQIKA, shared by rabbit alpha(1)-globin and human cytokeratin 7. J. Virol. 2002, 76, 7040–7048. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Bodaghi, S.; Yamanegi, K.; Xiao, S.Y.; Da Costa, M.; Palefsky, J.M.; Zheng, Z.M. Colorectal papillomavirus infection in patients with colorectal cancer. Clin. Cancer Res. 2005, 11, 2862–2867. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Zeng, Z.M.; Luo, F.F.; Zou, L.X.; He, R.Q.; Pan, D.H.; Chen, X.; Xie, T.T.; Li, Y.Q.; Peng, Z.G.; Chen, G. Human papillomavirus as a potential risk factor for gastric cancer: A meta-analysis of 1,917 cases. Onco. Targets Ther. 2016, 9, 7105–7114. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Latini, F.R.; Hemerly, J.P.; Oler, G.; Riggins, G.J.; Cerutti, J.M. Re-expression of ABI3-binding protein suppresses thyroid tumor growth by promoting senescence and inhibiting invasion. Endocr. Relat. Cancer 2008, 15, 787–799. [Google Scholar] [CrossRef] [Green Version]
- Uekawa, N.; Terauchi, K.; Nishikimi, A.; Shimada, J.; Maruyama, M. Expression of TARSH gene in MEFs senescence and its potential implication in human lung cancer. Biochem. Biophys. Res. Commun. 2005, 329, 1031–1038. [Google Scholar] [CrossRef] [PubMed]
- Wakoh, T.; Uekawa, N.; Terauchi, K.; Sugimoto, M.; Ishigami, A.; Shimada, J.; Maruyama, M. Implication of p53-dependent cellular senescence related gene, TARSH in tumor suppression. Biochem. Biophys. Res. Commun. 2009, 380, 807–812. [Google Scholar] [CrossRef] [PubMed]
- Warburg, O. The Metabolism of Carcinoma Cells. Cancer Res. 1925, 9, 148–163. [Google Scholar] [CrossRef] [Green Version]
- Robinson, M.D.; McCarthy, D.J.; Smyth, G.K. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010, 26, 139–140. [Google Scholar] [CrossRef] [Green Version]
- Love, M.I.; Huber, W.; Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014, 15, 550. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Grüning, B.; Dale, R.; Sjödin, A.; Chapman, B.A.; Rowe, J.; Tomkins-Tinch, C.H.; Valieris, R.; Köster, J.; The Bioconda, T. Bioconda: Sustainable and comprehensive software distribution for the life sciences. Nat. Methods 2018, 15, 475–476. [Google Scholar] [CrossRef] [PubMed]
- Wang, W.; Zhang, X.; Dai, D.-Q. DeFusion: A denoised network regularization framework for multi-omics integration. Brief. Bioinform. 2021, 22, bbab057. [Google Scholar] [CrossRef]
PipeOne | RNACocktail | VIPER | |
---|---|---|---|
Raw data processing | |||
Quality control | √ | x | √ |
Alignment | √ | √ | √ |
Transcriptome reconstruction | √ | √ | x |
Gene quantification | √ | √ | √ |
Novel lncRNA prediction | √ | x | x |
CircRNA prediction | √ | x | x |
Gene quantification | √ | √ | √ |
Fusion prediction | √ | √ | √ |
Variant calling | √ | √ | √ |
RNA editing prediction | √ | √ | x |
Retrotranscriptome | √ | x | x |
Alternative splicing | √ | x | x |
viral DNA detection | x | x | √ |
Long-read | x | √ | x |
Downstream analysis | |||
Result visualization | x | x | √ |
Differential expression analysis | x | √ | √ |
Pathway analysis | x | x | √ |
Batch correction | x | x | √ |
immunological analysis | x | x | √ |
Virus analysis | x | x | √ |
Feature prioritization | √ | x | x |
subtyping/clustering | √ | x | √ |
Multi-modal integration | √ | x | x |
Runtime | |||
Management systems | Nextflow | Python | Snakemake |
Resume | √ | x | √ |
Parrallel | √ | x | √ |
Docker | √ | √ | x |
Conda | √ | √ | √ |
Singularity | √ | x | √ |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Nong, B.; Guo, M.; Wang, W.; Songyang, Z.; Xiong, Y. Comprehensive Analysis of Large-Scale Transcriptomes from Multiple Cancer Types. Genes 2021, 12, 1865. https://doi.org/10.3390/genes12121865
Nong B, Guo M, Wang W, Songyang Z, Xiong Y. Comprehensive Analysis of Large-Scale Transcriptomes from Multiple Cancer Types. Genes. 2021; 12(12):1865. https://doi.org/10.3390/genes12121865
Chicago/Turabian StyleNong, Baoting, Mengbiao Guo, Weiwen Wang, Zhou Songyang, and Yuanyan Xiong. 2021. "Comprehensive Analysis of Large-Scale Transcriptomes from Multiple Cancer Types" Genes 12, no. 12: 1865. https://doi.org/10.3390/genes12121865
APA StyleNong, B., Guo, M., Wang, W., Songyang, Z., & Xiong, Y. (2021). Comprehensive Analysis of Large-Scale Transcriptomes from Multiple Cancer Types. Genes, 12(12), 1865. https://doi.org/10.3390/genes12121865