FexSplice: A LightGBM-Based Model for Predicting the Splicing Effect of a Single Nucleotide Variant Affecting the First Nucleotide G of an Exon
Abstract
1. Introduction
2. Materials and Methods
2.1. Fex-SNV Dataset
2.2. Extraction of Features
2.3. Machine-Learning Models
3. Results
3.1. Generation of Models with LinearSVC, Random Forest, and LightGBM
| Metric | LinearSVC (115) | Random Forest (115) | LightGBM (115) | LightGBM (15) |
|---|---|---|---|---|
| Accuracy ¹ | 0.64 ± 0.10 | 0.71 ± 0.07 | 0.75 ± 0.09 | 0.77 ± 0.07 |
| Precision ² | 0.64 ± 0.09 | 0.71 ± 0.07 | 0.77 ± 0.11 | 0.80 ± 0.12 |
| Recall ³ | 0.65 ± 0.15 | 0.73 ± 0.12 | 0.74 ± 0.14 | 0.78 ± 0.13 |
| Specificity ⁴ | 0.63 ± 0.11 | 0.70 ± 0.08 | 0.78 ± 0.13 | 0.77 ± 0.15 |
| F1 score ⁵ | 0.64 ± 0.11 | 0.71 ± 0.08 | 0.75 ± 0.10 | 0.77 ± 0.07 |
| NPV ⁶ | 0.65 ± 0.11 | 0.73 ± 0.11 | 0.76 ± 0.11 | 0.79 ± 0.11 |
| MCC ⁷ | 0.29 ± 0.19 | 0.43 ± 0.15 | 0.52 ± 0.18 | 0.57 ± 0.15 |
| AUROC | 0.69 ± 0.08 | 0.79 ± 0.08 | 0.84 ± 0.08 | 0.86 ± 0.08 |
| AUPRC | 0.71 ± 0.08 | 0.82 ± 0.07 | 0.85 ± 0.08 | 0.87 ± 0.09 |
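For readers unfamiliar with the threshold-based metrics in the table, the following is a minimal sketch of how accuracy, precision, recall, specificity, F1 score, NPV, and MCC are conventionally derived from a binary confusion matrix. It uses the standard textbook definitions, which are assumed (not confirmed here) to match those used in the paper. The function name `binary_metrics` and the counts in the example are hypothetical and are not taken from the Fex-SNV dataset.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives, and
    false negatives. Assumes every denominator below is non-zero, except
    for MCC, which falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp)      # positive predictive value
    recall = tp / (tp + fn)         # sensitivity
    specificity = tn / (tn + fp)    # true negative rate
    npv = tn / (tn + fn)            # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "npv": npv,
        "mcc": mcc,
    }


# Hypothetical counts chosen only for illustration (not the paper's data).
metrics = binary_metrics(tp=40, fp=10, tn=35, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

AUROC and AUPRC are not included in this sketch because they are computed from the ranking of predicted scores rather than from a single confusion matrix.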
3.2. Comparison of FexSplice with SpliceAI and CI-SpliceAI
3.3. Web Service of FexSplice
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Joudaki, A.; Takeda, J.-i.; Masuda, A.; Ode, R.; Fujiwara, K.; Ohno, K. FexSplice: A LightGBM-Based Model for Predicting the Splicing Effect of a Single Nucleotide Variant Affecting the First Nucleotide G of an Exon. Genes 2023, 14, 1765. https://doi.org/10.3390/genes14091765