Classifying Consensus Sequences Using Point-Set Representations
Abstract
1. Motivation and Introduction
2. Point-Set Representation
2.1. Exon–Intron Junctions
2.2. Branch Points
2.3. Normal Exon–Exon Junctions vs. Cancer Fusions
2.4. Fractal Structure of Point-Sets in Sites Within Exons and Within Introns
3. Discussion
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Shulman, J.; Cisneros, C.M.; Gunaratne, P.H.; Gunaratne, G.H. Classifying Consensus Sequences Using Point-Set Representations. Mathematics 2026, 14, 1826. https://doi.org/10.3390/math14111826

