Nucleotide Composition of Ultra-Conserved Elements Shows Excess of GpC and Depletion of GG and CC Dinucleotides
Abstract
1. Introduction
2. Materials and Methods
2.1. Databases
2.2. Programs for SNP Computational Processing
2.3. Statistics
3. Results
3.1. Database
3.2. Density of SNPs inside UCNE vs. Whole Genome
3.3. Number of Mutations inside UCNE among 2504 Individuals
3.4. Non-Coding RNAs inside UCNEs
3.5. Meiotic Recombination Rates inside UCNEs
3.6. Search for UCNEs Sequence Markers
4. Discussion
4.1. Strong Nucleotide Stacking Interactions within UCNEs
4.2. Paradox for Purifying Selection of Numerous Mutations in UCNEs
5. Conclusions
- UCNE sequences are AT-rich and enriched in GpC dinucleotides;
- Every human carries over 300 mutations within the 4273 UCNEs;
- We hypothesize that, owing to their unique dinucleotide composition, UCNE sequences may form a DNA duplex with distinctive properties. This hypothesis awaits experimental testing.
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Bins for Alternative Allele Frequency | Number of SNPs inside Whole Genome | Relative Frequency (%) of SNPs inside Whole Genome | Number of SNPs inside UCEs | Relative Frequency (%) of SNPs inside UCEs |
---|---|---|---|---|
0–1% | 68,430,653 | 84.438 | 28,787 | 92.724 |
1–2% | 2,709,034 | 3.343 | 632 | 2.036 |
2–3% | 1,249,017 | 1.541 | 280 | 0.902 |
3–4% | 761,505 | 0.940 | 154 | 0.496 |
4–5% | 536,314 | 0.662 | 93 | 0.300 |
5–6% | 411,115 | 0.507 | 69 | 0.222 |
6–7% | 334,473 | 0.413 | 78 | 0.251 |
7–8% | 287,678 | 0.355 | 59 | 0.190 |
8–9% | 258,931 | 0.320 | 52 | 0.167 |
9–10% | 231,334 | 0.285 | 39 | 0.126 |
10–11% | 213,325 | 0.263 | 31 | 0.100 |
11–12% | 193,665 | 0.239 | 40 | 0.129 |
… | ||||
0–100% | 81,042,272 total | 100% | 31,046 total | 100% |
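
The relative-frequency columns of the table above follow directly from the SNP counts and the two totals. The short Python sketch below recomputes them for the first three bins and prints the UCE-to-genome ratio per bin; the dictionary layout and the `percent` helper are illustrative names, not part of the original analysis pipeline.

```python
# Counts copied from the allele-frequency table above (first three bins only).
TOTAL = {"genome": 81_042_272, "uce": 31_046}
BINS = {
    "0-1%": {"genome": 68_430_653, "uce": 28_787},
    "1-2%": {"genome": 2_709_034, "uce": 632},
    "2-3%": {"genome": 1_249_017, "uce": 280},
}


def percent(count, total):
    """Relative frequency of a bin, in percent of all SNPs in that set."""
    return 100.0 * count / total


for label, counts in BINS.items():
    genome_pct = percent(counts["genome"], TOTAL["genome"])
    uce_pct = percent(counts["uce"], TOTAL["uce"])
    # A ratio above 1 means the bin is over-represented inside UCEs.
    print(f"{label}: genome {genome_pct:.3f}%  UCE {uce_pct:.3f}%  "
          f"ratio {uce_pct / genome_pct:.2f}")
```

A ratio above 1 appears only for the rarest bin, which is the pattern the table shows: SNPs inside UCEs are shifted toward very low alternative allele frequencies.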
Region | Average Number of Alternative Alleles per Person (Cutoff 50%) | Average Number of Alternative Alleles per Person (Cutoff 2%) |
---|---|---|
Africa | 472 | 117 |
America | 370 | 47 |
Europe | 352 | 42 |
East Asia | 373 | 40 |
South Asia | 357 | 46 |
Oligonucleotide | UCNE Sequences: Relative Freq (fraction) | UCNE Sequences: Number of Occurrences | Chromosome 1: Relative Freq (fraction) | Chromosome 1: Number of Occurrences | Oligonucleotide | UCNE Sequences: Relative Freq (fraction) | UCNE Sequences: Number of Occurrences | Chromosome 1: Relative Freq (fraction) | Chromosome 1: Number of Occurrences |
---|---|---|---|---|---|---|---|---|---|
1-mer | | | | | 3-mer | | | | |
A | 0.314 | 445,884 | 0.291 | 67,070,277 | TTT | 0.041 | 58,385 | 0.037 | 8,583,142 |
T | 0.317 | 449,114 | 0.292 | 67,244,164 | TTC | 0.020 | 27,875 | 0.020 | 4,548,877 |
C | 0.183 | 260,157 | 0.208 | 48,055,043 | TTG | 0.021 | 29,834 | 0.019 | 4,344,678 |
G | 0.185 | 262,642 | 0.209 | 48,111,528 | TCA | 0.022 | 30,484 | 0.020 | 4,522,569 |
2-mer | | | | | TCT | 0.020 | 27,643 | 0.022 | 5,129,424 |
AA | 0.110 | 156,199 | 0.095 | 21,901,540 | TCC | 0.011 | 15,703 | 0.015 | 3,657,040 |
AT | 0.092 | 129,314 | 0.074 | 17,121,783 | TCG | 0.002 | 3,345 | 0.002 | 535,651 |
AC | 0.048 | 68,447 | 0.050 | 11,598,278 | TGA | 0.022 | 30,724 | 0.019 | 4,486,632 |
AG | 0.064 | 90,715 | 0.071 | 16,448,644 | TGT | 0.023 | 32,190 | 0.020 | 4,584,113 |
TA | 0.075 | 106,391 | 0.063 | 14,554,789 | TGC | 0.017 | 24,240 | 0.015 | 3,357,313 |
TT | 0.112 | 157,604 | 0.096 | 22,048,241 | TGG | 0.013 | 18,970 | 0.019 | 4,368,306 |
TC | 0.055 | 77,483 | 0.060 | 13,844,699 | CAA | 0.021 | 29,151 | 0.019 | 4,288,540 |
TG | 0.075 | 106,452 | 0.073 | 16,796,378 | CAT | 0.021 | 29,829 | 0.018 | 4,120,946 |
CA | 0.074 | 104,633 | 0.073 | 16,768,284 | CAC | 0.012 | 17,466 | 0.015 | 3,506,405 |
CT | 0.064 | 90,607 | 0.071 | 16,444,797 | CAG | 0.020 | 27,845 | 0.021 | 4,852,390 |
CC | 0.036 | 51,183 | 0.054 | 12,466,763 | CTA | 0.012 | 17,425 | 0.013 | 2,941,433 |
CG | 0.009 | 12,748 | 0.010 | 2,375,159 | CTT | 0.020 | 28,116 | 0.020 | 4,634,644 |
GA | 0.055 | 77,358 | 0.060 | 13,845,615 | CTC | 0.012 | 16,714 | 0.018 | 4,057,534 |
GT | 0.050 | 70,436 | 0.050 | 11,629,291 | CTG | 0.020 | 28,042 | 0.021 | 4,811,169 |
GC | 0.044 | 62,159 | 0.044 | 10,145,272 | CCA | 0.013 | 18,634 | 0.019 | 4,330,820 |
GG | 0.037 | 51,737 | 0.054 | 12,491,312 | CCT | 0.013 | 18,288 | 0.019 | 4,273,302 |
3-mer | | | | | CCC | 0.008 | 11,006 | 0.014 | 3,193,020 |
AAA | 0.041 | 57,532 | 0.037 | 8,516,543 | CCG | 0.002 | 3,016 | 0.003 | 669,612 |
AAT | 0.034 | 48,424 | 0.024 | 5,470,905 | CGA | 0.002 | 3,173 | 0.002 | 523,798 |
AAC | 0.015 | 21,576 | 0.014 | 3,332,435 | CGT | 0.002 | 3,453 | 0.003 | 597,422 |
AAG | 0.020 | 28,302 | 0.020 | 4,581,648 | CGC | 0.002 | 2,967 | 0.003 | 579,316 |
ATA | 0.022 | 30,622 | 0.019 | 4,475,100 | CGG | 0.002 | 3,096 | 0.003 | 674,618 |
ATT | 0.034 | 48,554 | 0.024 | 5,500,468 | GAA | 0.020 | 27,818 | 0.020 | 4,518,460 |
ATC | 0.014 | 19,839 | 0.013 | 3,035,996 | GAT | 0.014 | 20,080 | 0.013 | 3,056,974 |
ATG | 0.021 | 30,019 | 0.018 | 4,110,209 | GAC | 0.009 | 12,409 | 0.010 | 2,216,474 |
ACA | 0.022 | 30,869 | 0.020 | 4,553,751 | GAG | 0.012 | 16,823 | 0.018 | 4,053,693 |
ACT | 0.016 | 22,142 | 0.016 | 3,732,934 | GTA | 0.012 | 16,988 | 0.011 | 2,566,721 |
ACC | 0.008 | 11,883 | 0.012 | 2,725,309 | GTT | 0.016 | 22,177 | 0.014 | 3,329,970 |
ACG | 0.002 | 3,335 | 0.003 | 586,276 | GTC | 0.009 | 12,839 | 0.010 | 2,202,280 |
AGA | 0.019 | 27,353 | 0.022 | 5,150,760 | GTG | 0.013 | 18,242 | 0.015 | 3,530,308 |
AGT | 0.016 | 22,472 | 0.016 | 3,719,675 | GCA | 0.017 | 24,365 | 0.015 | 3,361,131 |
AGC | 0.016 | 22,161 | 0.014 | 3,317,232 | GCT | 0.016 | 22,187 | 0.014 | 3,309,131 |
AGG | 0.013 | 18,369 | 0.018 | 4,260,968 | GCC | 0.009 | 12,391 | 0.013 | 2,891,387 |
TAA | 0.029 | 41,225 | 0.020 | 4,577,976 | GCG | 0.002 | 2,995 | 0.003 | 583,618 |
TAT | 0.022 | 30,705 | 0.019 | 4,472,951 | GGA | 0.011 | 15,843 | 0.016 | 3,684,403 |
TAC | 0.012 | 16,780 | 0.011 | 2,542,958 | GGT | 0.009 | 12,082 | 0.012 | 2,728,078 |
TAG | 0.012 | 17,407 | 0.013 | 2,960,898 | GGC | 0.009 | 12,562 | 0.013 | 2,891,408 |
TTA | 0.029 | 41,106 | 0.020 | 4,571,528 | GGG | 0.008 | 11,045 | 0.014 | 3,187,415 |
Dinucleotide | ρ (Genome) | ρ (UCNEs) |
---|---|---|
CG | 0.24 | 0.27 ± 0.002 |
GC | 1.02 | 1.30 ± 0.004 |
TA | 0.74 | 0.76 ± 0.002 |
AT | 0.88 | 0.92 ± 0.002 |
CC/GG | 1.24 | 1.08 ± 0.004 |
TT/AA | 1.12 | 1.12 ± 0.002 |
TG/CA | 1.20 | 1.28 ± 0.003 |
AG/CT | 1.16 | 1.10 ± 0.003 |
AC/GT | 0.83 | 0.84 ± 0.003 |
GA/TC | 0.99 | 0.94 ± 0.003 |
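
The ρ values above are dinucleotide relative abundances, commonly defined as ρ(XY) = f(XY) / (f(X) · f(Y)), where f denotes a relative frequency. Because complementary pairs such as CC/GG share one row, the table appears to use the strand-symmetrized form, in which each frequency is averaged with that of its reverse complement. The Python sketch below applies this symmetrized formula to the pooled UCNE counts from the oligonucleotide table above. It is an assumption that this mirrors the authors' procedure: the ± terms suggest they estimated ρ differently (for example, across elements or resamples), so small deviations are expected. All function and variable names are illustrative.

```python
from itertools import product

# Pooled UCNE counts copied from the oligonucleotide table above.
MONO = {"A": 445_884, "T": 449_114, "C": 260_157, "G": 262_642}
DI = {
    "AA": 156_199, "AT": 129_314, "AC": 68_447, "AG": 90_715,
    "TA": 106_391, "TT": 157_604, "TC": 77_483, "TG": 106_452,
    "CA": 104_633, "CT": 90_607, "CC": 51_183, "CG": 12_748,
    "GA": 77_358, "GT": 70_436, "GC": 62_159, "GG": 51_737,
}
COMP = {"A": "T", "T": "A", "C": "G", "G": "C"}


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMP[base] for base in reversed(seq))


def symmetrized_rho(mono, di):
    """Strand-symmetrized relative abundance for all 16 dinucleotides."""
    n1 = sum(mono.values())
    n2 = sum(di.values())
    # Average each frequency with that of its (reverse) complement.
    f1 = {b: (mono[b] + mono[COMP[b]]) / (2 * n1) for b in mono}
    rho = {}
    for x, y in product("ACGT", repeat=2):
        xy = x + y
        f2 = (di[xy] + di[revcomp(xy)]) / (2 * n2)
        rho[xy] = f2 / (f1[x] * f1[y])
    return rho


rho_ucne = symmetrized_rho(MONO, DI)
for pair in ("CG", "GC", "TA", "AT", "CC", "TT", "TG", "AG", "AC", "GA"):
    print(f"{pair}/{revcomp(pair)}: {rho_ucne[pair]:.2f}")
```

With the pooled counts, the computed values fall within about 0.01 of the ρ (UCNEs) column, including the GpC excess (ρ close to 1.3) and the CpG depletion (ρ below 0.3).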
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Fedorova, L.; Mulyar, O.A.; Lim, J.; Fedorov, A. Nucleotide Composition of Ultra-Conserved Elements Shows Excess of GpC and Depletion of GG and CC Dinucleotides. Genes 2022, 13, 2053. https://doi.org/10.3390/genes13112053