Novel Concept of Alpha Satellite Cascading Higher-Order Repeats (HORs) and Precise Identification of 15mer and 20mer Cascading HORs in Complete T2T-CHM13 Assembly of Human Chromosome 15
Abstract
1. Introduction
2. Results and Discussion
2.1. GRM (Global Repeat Map) Diagram and MD (Monomer Distance) Diagram for T2T-CHM13 Assembly of Human Chromosome 15
2.2. Aligned Scheme for Cascading 15mer HOR Array with 4mer, 7mer and 11mer Subfragments
2.3. Aligned Scheme for Cascading 20mer HOR Array with 5mer and 15mer Subfragments
2.4. Aligned Scheme for Willard’s Type 18mer HOR Array
2.5. Aligned Scheme for Interspersed Willard’s Type 25/26mer HOR Array and 34-Monomer Tertiary Subfragment
3. Materials and Methods
- Using GRMapp version 1.0 (the GRM graphical user interface application is freely available at http://genom.hazu.hr/tools.html, accessed on 14 April 2024), alpha satellite monomers were identified within the entire T2T-CHM13 assembly of human chromosome 15. GRMapp outputs all tandem repeats (TRs) present in the analyzed assembly. From this list, TRs with repeat unit lengths of ~171 bp were selected and subjected to GRM diagram analysis within GRMapp. For these TRs to be classified as alpha satellite monomers, their GRM diagram must exhibit a peak at ~171 bp and further peaks at its multiples (~342 bp, ~513 bp, and so on).
- The extracted alpha satellite monomers were compared pairwise, and a divergence matrix was constructed. From this matrix, monomer families were identified, each encompassing monomers that differ from one another by less than 5%.
- For each monomer family, a consensus sequence was generated with pyabPOA (pyabpoa 1.0.0a0), the Python interface to the stand-alone multiple-sequence alignment tool abPOA, available at https://github.com/yangao07/abpoa (accessed on 14 April 2024). The consensus sequences for all alpha satellite monomer families are provided in Supplementary Tables S5–S8.
- The T2T-CHM13 assembly of chromosome 15 was searched with all consensus sequences using Edlib, an open-source C/C++ library for fast, exact pairwise sequence alignment based on edit distance [49]. The search was performed base by base along the entire chromosome, using both the direct and the reverse-complement form of each consensus sequence.
- The results of the Edlib search are presented graphically (Figures 1–7). All monomers of the same family are placed in the same column and shown in the same color. Rows in a cascading HOR scheme are constructed as follows for canonical $n$mer HOR copies with monomer types $(m_1, m_2, \ldots, m_n)$. A row starts with a monomer of type $m_1$ and continues until it reaches the monomer of type $m_n$. Alternatively, if a monomer of type $m_i$ is followed by a monomer of type $m_j$ with $j \le i$ (that is, the type index does not increase), the subsequent monomer is placed in the next row. For a representative example, refer to Figure 3.
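The periodicity criterion in the first step (a peak at ~171 bp plus peaks at ~342 bp, ~513 bp, ...) can be illustrated with a simplified repeat-distance spectrum. The sketch below is not GRMapp's algorithm. For illustration only, it assumes the spectrum is a histogram of distances between identical k-mers. The parameters `k=14`, `max_dist=600` and `tol=3` are arbitrary, hypothetical choices. The check itself asks whether the tallest peak lies near 171 bp and whether peaks also occur near its multiples.

```python
from collections import Counter, defaultdict


def kmer_distance_histogram(seq: str, k: int = 14, max_dist: int = 600) -> Counter:
    """Count distances between all pairs of identical k-mers at most max_dist apart.

    Hypothetical stand-in for a repeat-distance spectrum; not GRMapp's implementation.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)  # positions are appended in increasing order
    hist = Counter()
    for occ in positions.values():
        for a in range(len(occ)):
            for b in range(a + 1, len(occ)):
                d = occ[b] - occ[a]
                if d > max_dist:
                    break  # later occurrences are even farther away
                hist[d] += 1
    return hist


def has_alpha_satellite_periodicity(hist: Counter, unit: int = 171,
                                    tol: int = 3, multiples: int = 3) -> bool:
    """True if the tallest peak is within tol of `unit` and peaks also occur
    near 2*unit, ..., multiples*unit (e.g. ~342 bp and ~513 bp)."""
    if not hist:
        return False
    top = max(hist, key=hist.get)
    if abs(top - unit) > tol:
        return False
    return all(
        any(hist[d] > 0 for d in range(m * unit - tol, m * unit + tol + 1))
        for m in range(2, multiples + 1)
    )


if __name__ == "__main__":
    import random

    rng = random.Random(1)
    unit = "".join(rng.choice("ACGT") for _ in range(171))
    array = unit * 10  # ten identical 171-bp copies (a toy tandem array)
    print(has_alpha_satellite_periodicity(kmer_distance_histogram(array)))  # prints True
```

In a real assembly the copies are divergent, so peaks are broader and lower. This is why the check uses a tolerance window rather than exact distances.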
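The divergence-matrix and monomer-family step above can be sketched as follows. The source does not state how divergence is measured or how families are formed, so two assumptions apply. First, divergence is taken as the Levenshtein distance divided by the length of the longer monomer. Second, families are built with a greedy rule: a monomer joins the first family whose members all differ from it by less than 5%. Under this rule the result can depend on input order.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion from a
                           cur[j - 1] + 1,             # insertion into a
                           prev[j - 1] + (ca != cb)))  # match / substitution
        prev = cur
    return prev[-1]


def divergence_matrix(monomers: list[str]) -> list[list[float]]:
    """Symmetric matrix of pairwise divergences (assumed: distance / longer length)."""
    n = len(monomers)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            longer = max(len(monomers[i]), len(monomers[j])) or 1
            m[i][j] = m[j][i] = edit_distance(monomers[i], monomers[j]) / longer
    return m


def monomer_families(monomers: list[str], threshold: float = 0.05) -> list[list[int]]:
    """Greedy grouping (an assumption): a monomer joins the first family in which
    it differs from every member by less than `threshold`, else founds a new one."""
    div = divergence_matrix(monomers)
    families: list[list[int]] = []
    for i in range(len(monomers)):
        for fam in families:
            if all(div[i][j] < threshold for j in fam):
                fam.append(i)
                break
        else:
            families.append([i])
    return families


if __name__ == "__main__":
    base = "ACGTTGCA" * 5                   # 40-bp toy "monomer"
    variant = base[:10] + "A" + base[11:]   # one substitution -> 2.5% divergence
    print(monomer_families([base, variant, "T" * 40]))  # prints [[0, 1], [2]]
```

A greedy, all-members rule was chosen because it matches the wording "differ from each other by less than 5%" more closely than single-linkage clustering. Single linkage can chain monomers that differ by more than 5% into one family.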
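To show what a family consensus sequence is, the sketch below computes a column-wise majority vote over an already gapped multiple-sequence alignment. This is a simpler, swapped-in technique. abPOA derives its consensus from a partial-order alignment graph, and this function does not reproduce that. Ties go to the symbol encountered first in a column, and columns whose most frequent symbol is a gap are dropped.

```python
from collections import Counter


def majority_consensus(msa_rows: list[str], gap: str = "-") -> str:
    """Column-wise majority consensus of a gapped multiple-sequence alignment.

    Ties are resolved in favour of the symbol encountered first in the column
    (Counter.most_common keeps first-encountered order for equal counts);
    columns whose most frequent symbol is the gap character are skipped.
    """
    if not msa_rows:
        raise ValueError("alignment is empty")
    width = len(msa_rows[0])
    if any(len(row) != width for row in msa_rows):
        raise ValueError("all alignment rows must have the same length")
    consensus = []
    for column in zip(*msa_rows):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


if __name__ == "__main__":
    rows = ["ACG-T", "ACGAT", "AC--T"]
    print(majority_consensus(rows))  # prints ACGT
```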
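The Edlib search step finds approximate occurrences of each consensus anywhere along the chromosome, which Edlib provides as its infix ("HW") alignment mode. The pure-Python function below is a slow stand-in written only to make that logic explicit; it is not Edlib and is far too slow for a whole chromosome. It uses the classic dynamic-programming formulation in which an occurrence may start at any chromosome position at no cost. It reports every end position whose edit distance is at most a hypothetical threshold `max_dist`, and it repeats the search with the reverse complement of the consensus.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def infix_hits(query: str, target: str, max_dist: int) -> list[tuple[int, int]]:
    """Return (end, distance) pairs, where `end` is the exclusive end index in
    `target` of a substring aligning to `query` with edit distance <= max_dist."""
    m = len(query)
    col = list(range(m + 1))  # column for the empty target prefix: D[i][0] = i
    hits = []
    for j, t in enumerate(target, 1):
        new = [0]  # D[0][j] = 0: an occurrence may start anywhere in the target
        for i in range(1, m + 1):
            cost = 0 if query[i - 1] == t else 1
            new.append(min(col[i - 1] + cost,  # match / substitution
                           col[i] + 1,         # extra target base
                           new[i - 1] + 1))    # query base skipped
        col = new
        if col[m] <= max_dist:
            hits.append((j, col[m]))
    return hits


def search_both_strands(consensus: str, chromosome: str, max_dist: int) -> dict:
    """Search with the direct consensus and with its reverse complement."""
    return {
        "+": infix_hits(consensus, chromosome, max_dist),
        "-": infix_hits(reverse_complement(consensus), chromosome, max_dist),
    }


if __name__ == "__main__":
    chrom = "TTTTACGTACTTTT" + "GTACGT" + "AAAA"
    print(search_both_strands("ACGTAC", chrom, 0))  # {'+': [(10, 0)], '-': [(20, 0)]}
```

With `max_dist > 0`, neighbouring end positions of the same occurrence are usually reported together. A real pipeline would therefore need an extra step, not shown here, to collapse each cluster of hits into a single monomer call.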
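The row-construction guideline for cascading HOR schemes can be expressed as a short function. As a simplifying assumption, each monomer is represented by the index (1 to $n$) of its type within the canonical $n$mer HOR copy. A new row begins whenever that index does not increase. The helper `render` is hypothetical: it places type $m_k$ in column $k$, mirroring the column-per-family layout of the aligned schemes.

```python
def arrange_rows(type_indices: list[int]) -> list[list[int]]:
    """Split a run of monomer-type indices into rows of an aligned scheme.

    A row continues while indices increase (m1 -> m2 -> ... -> mn); when a
    monomer of type m_i is followed by m_j with j <= i, m_j starts a new row.
    """
    rows: list[list[int]] = []
    current: list[int] = []
    for t in type_indices:
        if current and t <= current[-1]:
            rows.append(current)
            current = []
        current.append(t)
    if current:
        rows.append(current)
    return rows


def render(rows: list[list[int]], n: int) -> str:
    """Text layout with type m_k always in column k ('.' marks an empty column)."""
    lines = []
    for row in rows:
        cells = ["."] * n
        for t in row:
            cells[t - 1] = f"m{t}"
        lines.append(" ".join(f"{c:>3}" for c in cells))
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy 5mer array: a full copy, a partial copy, then another full copy.
    toy = [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4, 5]
    print(render(arrange_rows(toy), n=5))
```

Because a row is broken whenever the type index fails to increase, each row holds strictly increasing indices. No two monomers in a row can therefore compete for the same column.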
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Miga, K.H. Centromere studies in the era of ‘telomere-to-telomere’ genomics. Exp. Cell Res. 2020, 394, 112127.
- Nurk, S.; Koren, S.; Rhie, A.; Rautiainen, M.; Bzikadze, A.V.; Mikheenko, A.; Vollger, M.R.; Altemose, N.; Uralsky, L.; Gershman, A.; et al. The complete sequence of a human genome. Science 2022, 376, 44–53.
- Cechova, M.; Miga, K.H. Comprehensive variant discovery in the era of complete human reference genomes. Nat. Methods 2023, 20, 17–19.
- Altemose, N.; Logsdon, G.A.; Bzikadze, A.V.; Sidhwani, P.; Langley, S.A.; Caldas, G.V.; Hoyt, S.J.; Uralsky, L.; Ryabov, F.D.; Shew, C.J.; et al. Complete genomic and epigenetic maps of human centromeres. Science 2022, 376, eabl4178.
- Miga, K.H. The Promises and Challenges of Genomic Studies of Human Centromeres. Prog. Mol. Subcell. Biol. 2017, 56, 285–304.
- Gershman, A.; Sauria, M.E.G.; Guitart, X.; Vollger, M.R.; Hook, P.W.; Hoyt, S.J.; Jain, M.; Shumate, A.; Razaghi, R.; Koren, S.; et al. Epigenetic patterns in a complete human genome. Science 2022, 376, eabj5089.
- Altemose, N. A classical revival: Human satellite DNAs enter the genomics era. Semin. Cell Dev. Biol. 2022, 128, 2–14.
- Wlodzimierz, P.; Hong, M.; Henderson, I.R. TRASH: Tandem Repeat Annotation and Structural Hierarchy. Bioinformatics 2023, 39, btad308.
- Glunčić, M.; Vlahović, I.; Rosandić, M.; Paar, V. Tandemly repeated NBPF HOR copies (Olduvai triplets): Possible impact on human brain evolution. Life Sci. Alliance 2023, 6, e202101306.
- Manuelidis, L. Chromosomal localization of complex and simple repeated human DNAs. Chromosoma 1978, 66, 23–32.
- Wu, J.C.; Manuelidis, L. Sequence definition and organization of a human repeated DNA. J. Mol. Biol. 1980, 142, 363–386.
- Willard, H.F. Chromosome-specific organization of human alpha satellite DNA. Am. J. Hum. Genet. 1985, 37, 524–532.
- Willard, H.F.; Waye, J.S. Chromosome-specific subsets of human alpha satellite DNA: Analysis of sequence divergence within and between chromosomal subsets and evidence for an ancestral pentameric repeat. J. Mol. Evol. 1987, 25, 207–214.
- Waye, J.S.; Willard, H.F. Nucleotide sequence heterogeneity of alpha satellite repetitive DNA: A survey of alphoid sequences from different human chromosomes. Nucleic Acids Res. 1987, 15, 7549–7569.
- Jørgensen, A.; Bostock, C.; Bak, A. Chromosome-specific subfamilies within human alphoid repetitive DNA. J. Mol. Biol. 1986, 187, 185–196.
- Willard, H.F. Evolution of alpha satellite. Curr. Opin. Genet. Dev. 1991, 1, 509–514.
- Choo, K.H.; Vissel, B.; Nagy, A.; Earle, E.; Kalitsis, P. A survey of the genomic distribution of alpha satellite DNA on all the human chromosomes, and derivation of a new consensus sequence. Nucleic Acids Res. 1991, 19, 1179–1182.
- Glunčić, M.; Paar, V. Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm. Nucleic Acids Res. 2013, 41, e17.
- Romanova, L.Y.; Deriagin, G.V.; Mashkova, T.D.; Tumeneva, I.G.; Mushegian, A.R.; Kisselev, L.L.; Alexandrov, I.A. Evidence for selection in evolution of alpha satellite DNA: The central role of CENP-B/pJ alpha binding region. J. Mol. Biol. 1996, 261, 334–340.
- Warburton, P.E.; Willard, H.F. Human Genome Evolution; BIOS Scientific Publishers: Oxford, UK, 1996; pp. 121–145.
- O’Keefe, C.L.; Matera, A.G. Alpha Satellite DNA Variant-Specific Oligoprobes Differing by a Single Base Can Distinguish Chromosome 15 Homologs. Genome Res. 2000, 10, 1342–1350.
- Alexandrov, A.; Kazakov, A.; Tumeneva, I.; Shepelev, V.; Yurov, Y. Alpha-satellite DNA of primates: Old and new families. Chromosoma 2001, 110, 253–266.
- Schueler, M.G.; Higgins, A.W.; Rudd, M.K.; Gustashaw, K.; Willard, H.F. Genomic and Genetic Definition of a Functional Human Centromere. Science 2001, 294, 109–115.
- Alkan, C.; Eichler, E.E.; Bailey, J.A.; Sahinalp, S.C.; Tuzun, E. The role of unequal crossover in alpha-satellite DNA evolution: A computational analysis. J. Comput. Biol. 2004, 11, 933–944.
- Jurka, J.; Kapitonov, V.V.; Pavlicek, A.; Klonowski, P.; Kohany, O.; Walichiewicz, J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 2005, 110, 462–467.
- Rudd, M.K.; Wray, G.A.; Willard, H.F. The evolutionary dynamics of α-satellite. Genome Res. 2006, 16, 88–96.
- Alkan, C.; Ventura, M.; Archidiacono, N.; Rocchi, M.; Sahinalp, S.C.; Eichler, E.E. Organization and Evolution of Primate Centromeric DNA from Whole-Genome Shotgun Sequence Data. PLoS Comput. Biol. 2007, 3, 1807–1818.
- Paar, V.; Glunčić, M.; Rosandić, M.; Basar, I.; Vlahović, I. Intragene Higher Order Repeats in Neuroblastoma BreakPoint Family Genes Distinguish Humans from Chimpanzees. Mol. Biol. Evol. 2011, 28, 1877–1892.
- Hayden, K.E.; Strome, E.D.; Merrett, S.L.; Lee, H.-R.; Rudd, M.K.; Willard, H.F. Sequences Associated with Centromere Competency in the Human Genome. Mol. Cell. Biol. 2013, 33, 763–772.
- Terada, S.; Hirai, Y.; Hirai, H.; Koga, A. Higher-order repeat structure in alpha satellite DNA is an attribute of hominoids rather than hominids. J. Hum. Genet. 2013, 58, 752–754.
- Aldrup-MacDonald, M.E.; Sullivan, B.A. The Past, Present, and Future of Human Centromere Genomics. Genes 2014, 5, 33–50.
- Miga, K.H.; Newton, Y.; Jain, M.; Altemose, N.; Willard, H.F.; Kent, W.J. Centromere reference models for human chromosomes X and Y satellite arrays. Genome Res. 2014, 24, 697–707.
- Shepelev, V.; Uralsky, L.; Alexandrov, A.; Yurov, Y.; Rogaev, E.; Alexandrov, I. Annotation of suprachromosomal families reveals uncommon types of alpha satellite organization in pericentromeric regions of hg38 human genome assembly. Genom. Data 2015, 5, 139–146.
- Sullivan, L.L.; Chew, K.; Sullivan, B.A. α satellite DNA variation and function of the human centromere. Nucleus 2017, 8, 331–339.
- Uralsky, L.; Shepelev, V.; Alexandrov, A.; Yurov, Y.; Rogaev, E.; Alexandrov, I. Classification and monomer-by-monomer annotation dataset of suprachromosomal family 1 alpha satellite higher-order repeats in hg38 human genome assembly. Data Brief 2019, 24, 103708.
- Smit, A.F.A.; Hubley, R.; Green, P. RepeatMasker Open-3.0. 1996–2010. Available online: http://www.repeatmasker.org (accessed on 10 April 2024).
- Novák, P.; Neumann, P.; Macas, J. Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data. BMC Bioinform. 2010, 11, 378.
- Benson, G. Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Res. 1999, 27, 573–580.
- Kunyavskaya, O.; Dvorkina, T.; Bzikadze, A.V.; Alexandrov, I.A.; Pevzner, P.A. Automated annotation of human centromeres with HORmon. Genome Res. 2022, 32, 1137–1151.
- Bzikadze, A.V.; Pevzner, P.A. Automated assembly of centromeres from ultra-long error-prone reads. Nat. Biotechnol. 2020, 38, 1309–1316.
- Sevim, V.; Bashir, A.; Chin, C.-S.; Miga, K.H. Alpha-CENTAURI: Assessing novel centromeric repeat sequence variation with long read sequencing. Bioinformatics 2016, 32, 1921–1924.
- Gao, S.; Yang, X.; Guo, H.; Zhao, X.; Wang, B.; Ye, K. HiCAT: A tool for automatic annotation of centromere structure. Genome Biol. 2023, 24, 58.
- Dvorkina, T.; Kunyavskaya, O.; Bzikadze, A.V.; Alexandrov, I.; Pevzner, P.A. CentromereArchitect: Inference and analysis of the architecture of centromeres. Bioinformatics 2021, 37, i196–i204.
- Paar, V.; Basar, I.; Rosandic, M.; Gluncic, M. Consensus Higher Order Repeats and Frequency of String Distributions in Human Genome. Curr. Genom. 2007, 8, 93–111.
- Choo, K.; Earle, E.; Vissel, B.; Filby, R. Identification of two distinct subfamilies of alpha satellite DNA that are highly specific for human chromosome 15. Genomics 1990, 7, 143–151.
- Glunčić, M.; Vlahović, I.; Mršić, L.; Paar, V. Global Repeat Map (GRM) Application: Finding All DNA Tandem Repeat Units. Algorithms 2022, 15, 458.
- Glunčić, M.; Vlahović, I.; Paar, V. Discovery of 33mer in chromosome 21—The largest alpha satellite higher order repeat unit among all human somatic chromosomes. Sci. Rep. 2019, 9, 12629.
- Vlahović, I.; Glunčić, M.; Dekanić, K.; Mršić, L.; Jerković, H.; Martinjak, I.; Paar, V. Global repeat map algorithm (GRM) reveals differences in alpha satellite number of tandem and higher order repeats (HORs) in human, Neanderthal and chimpanzee genomes-novel tandem repeat database. In Proceedings of the 43rd International Convention on Information, Communication and Electronic Technology (MIPRO), Opatija, Croatia, 28 September–2 October 2020; pp. 237–242.
- Šošić, M.; Šikić, M. Edlib: A C/C++ library for fast, exact sequence alignment using edit distance. Bioinformatics 2017, 33, 1394–1395.
HOR | Notation | N (Monomers per HOR Copy) | τ | No. of HOR Copies | Type of HOR | Multimonomer Tertiary Repeat Fragments |
---|---|---|---|---|---|---|
15mer | hor3 | 15 | 9 | 429 | Cascading | 4, 7, 11 |
20mer | hor4 | 20 | 19 | 164 | Cascading | 5, 15 |
18mer | hor1 | 18 | - | 12 | Willard’s type | - |
25/26mer | hor2 | 25/26 | - | 51/14 | Intermixed Willard’s type | 34 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Glunčić, M.; Vlahović, I.; Rosandić, M.; Paar, V. Novel Concept of Alpha Satellite Cascading Higher-Order Repeats (HORs) and Precise Identification of 15mer and 20mer Cascading HORs in Complete T2T-CHM13 Assembly of Human Chromosome 15. Int. J. Mol. Sci. 2024, 25, 4395. https://doi.org/10.3390/ijms25084395