Global Repeat Map (GRM) Application: Finding All DNA Tandem Repeat Units
Abstract
1. Introduction
2. Materials and Methods
2.1. Algorithm Outline
2.2. Application Usage and Output
3. Results
3.1. Human Chromosome 19
3.2. Comparison of Human, Chimp and Mouse Chromosome 19 GRM Results
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Organism | Type | Name | RefSeq | INSDC | Size (Mb) | GC% | Protein | rRNA | tRNA | Other RNA | Gene | Pseudogene |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Homo sapiens | Chr | 19 | NC_000019.10 | CM000681.2 | 58.62 | 47.9 | 6944 | - | 6 | 2001 | 2494 | 527 |
Pan troglodytes | Chr | 19 | NC_036898.1 | CM009257.2 | 56.73 | 48.4 | 4949 | - | 7 | 834 | 1997 | 213 |
Mus musculus | Chr | 19 | NC_000085.7 | CM001012.3 | 61.42 | 43.1 | 2718 | - | 9 | 1041 | 1380 | 229 |
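The GC% column gives the share of guanine plus cytosine among a chromosome's bases. The Python function below is a minimal sketch of that arithmetic. The name `gc_percent` is hypothetical and does not come from the paper or from NCBI. It counts G and C over the unambiguous bases A, C, G and T. Skipping ambiguous symbols such as N is an assumption, because the table does not say how assembly gaps are treated.

```python
def gc_percent(seq: str) -> float:
    """Return 100 * (G + C) / (A + C + G + T).

    Assumption: symbols other than A/C/G/T (e.g. N in assembly gaps)
    are skipped rather than counted in the denominator.
    """
    counts = {base: 0 for base in "ACGT"}
    for base in seq.upper():
        if base in counts:
            counts[base] += 1
    acgt = sum(counts.values())
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (counts["G"] + counts["C"]) / acgt


# Two Ns are ignored; 6 of the remaining 8 bases are G or C.
print(gc_percent("ACGTNNGGCC"))
```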
TR unit (bp) | No. of copies | Start (bp) | End (bp) | GC (%) |
---|---|---|---|---|
172 | 163 | 24,203,386 | 24,231,348 | 36.1 |
172 | 30 | 24,237,553 | 24,242,705 | 36.1 |
172 | 66 | 24,252,398 | 24,263,766 | 36.7 |
172 | 23 | 24,273,657 | 24,279,543 | 37.5 |
172 | 197 | 24,286,112 | 24,320,483 | 36.1 |
172 | 362 | 24,329,796 | 24,394,003 | 36.2 |
171 | 32 | 24,400,265 | 24,405,823 | 36.0 |
2896 | 12 | 24,412,442 | 24,447,210 | 38.1 |
171 | 2302 | 24,499,052 | 24,891,328 | 39.3 |
340 | 6721 | 24,905,001 | 27,190,218 | 39.2 |
340 | 41 | 27,241,096 | 27,255,422 | 37.5 |
171 | 132 | 27,258,016 | 27,280,642 | 36.3 |
171 | 163 | 27,286,672 | 27,318,118 | 36.1 |
171 | 19 | 27,321,147 | 27,324,392 | 36.8 |
171 | 383 | 27,330,489 | 27,421,643 | 36.1 |
172 | 187 | 27,428,565 | 27,463,476 | 36.0 |
172 | 80 | 27,469,901 | 27,500,339 | 36.2 |
172 | 57 | 27,505,176 | 27,514,962 | 37.0 |
172 | 335 | 27,520,248 | 27,577,951 | 36.0 |
172 | 80 | 27,583,082 | 27,596,854 | 36.6 |
172 | 174 | 27,603,075 | 27,632,964 | 35.9 |
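For each row, the array span (End minus Start) should be roughly the TR unit length multiplied by the number of copies. The sketch below checks that relationship for four rows copied from the table. It assumes that Start and End are 1-based inclusive coordinates. The helper name `implied_copies` is hypothetical.

```python
# Rows copied from the table above: (TR unit in bp, listed copies, start, end).
ROWS = [
    (172, 163, 24_203_386, 24_231_348),
    (2896, 12, 24_412_442, 24_447_210),
    (171, 2302, 24_499_052, 24_891_328),
    (340, 6721, 24_905_001, 27_190_218),
]


def implied_copies(unit: int, start: int, end: int) -> float:
    """Copy number implied by the array span.

    Assumption: start and end are 1-based inclusive coordinates,
    so the span is end - start + 1 bases.
    """
    span = end - start + 1
    return span / unit


for unit, copies, start, end in ROWS:
    est = implied_copies(unit, start, end)
    print(f"{unit:>5} bp  listed {copies:>5}  span/unit {est:8.1f}  ratio {est / copies:.3f}")
```

Take the 340 bp row as an example: 2,285,218 bp divided by 340 bp gives about 6,721.2 copies, which agrees with the listed 6,721. A ratio near 1 shows only that the listed copy number fits the span. The check does not reproduce how the copies were actually counted.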
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Glunčić, M.; Vlahović, I.; Mršić, L.; Paar, V. Global Repeat Map (GRM) Application: Finding All DNA Tandem Repeat Units. Algorithms 2022, 15, 458. https://doi.org/10.3390/a15120458