Article

Analysis of Nucleotide Variations in Human G-Quadruplex Forming Regions Associated with Disease States

1
School of Graduate and Interdisciplinary Studies, University of Louisville, Louisville, KY 40292, USA
2
Department of Neuroscience Training, University of Louisville, Louisville, KY 40292, USA
3
Kentucky IDeA Network of Biomedical Research Excellence (KY INBRE) Bioinformatics Core, University of Louisville, Louisville, KY 40292, USA
4
Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY 40292, USA
*
Author to whom correspondence should be addressed.
Genes 2023, 14(12), 2125; https://doi.org/10.3390/genes14122125
Submission received: 31 October 2023 / Revised: 20 November 2023 / Accepted: 23 November 2023 / Published: 25 November 2023
(This article belongs to the Section Bioinformatics)

Abstract

While the role of G-quadruplex (G4) structures has been identified in cancers and metabolic disorders, single nucleotide variations (SNVs) and their effect on G4s in disease contexts have not been extensively studied. The COSMIC and CLINVAR databases were used to detect SNVs present in G4s to identify sequence-level changes and their effect on the G4 secondary structure. A total of 37,515 G4 SNVs in the COSMIC database and 2378 in CLINVAR were identified. Of those, 7236 (19.3%) of the COSMIC variants and 457 (19%) of the CLINVAR variants result in G4 loss, while 2728 (COSMIC) and 129 (CLINVAR) SNVs gain a G4 structure. The remaining variants potentially affect the folding energy without affecting the presence of a G4. Analysis of mutational patterns in the G4 structure shows a higher selective pressure (3-fold) in the coding region on the template strand compared to the reverse strand. At the same time, an equal proportion of SNVs were observed among intronic, promoter, and enhancer regions across strands.

1. Introduction

G-quadruplexes (G4s) are four-stranded secondary structures of nucleic acids rich in guanine. These nucleic acid sequences are characterized by four runs of at least three guanines separated by short loops, which can potentially fold into an intramolecular or intermolecular G4 structure [1]. The guanine tetrads are stacked on each other and held together by mixed loops of DNA, giving a four-stranded structure with nucleobases on the inside forming Hoogsteen base pairs and the sugar-phosphate backbone on the outside (Figure 1). They are found in G-rich sequences of both DNA and RNA and are stabilized by metal cations such as potassium (K+) or sodium (Na+) [2]. The structure is held together by Hoogsteen hydrogen bonding between the guanines and is stabilized by π–π stacking and by charge interactions between the oxygen at the sixth position (O6) and the cations (K+, Na+) located between the stacks. The structural architecture of a G4 is quite diverse and can form different topologies based on factors including the chemical environment, loop length [3,4], and localization in the sequence or structure molecularity [5]. The stacked guanine tetrads are connected by loops of nucleotide bases of variable sizes, which determine the folding of the secondary structure.

1.1. Functional Role of G4 Regions

Guanine-rich sequences do not always form G4 structures; their formation can depend on physiological conditions and on methylation patterns guided by the chromatin structure [6]. However, when they do form, they affect molecular function. One affected process is transcription, which is perturbed by stalling of the replication fork [7,8,9]. In cells that do not have the normal DNA repair machinery, this causes the downregulation of several genes and cell cycle arrest [10].
Additionally, G4 structures, G4 stabilizing agents, and double-stranded breaks (DSBs) facilitate the homologous recombination repair pathway affecting genome instability. Depending on the size of the G4, thermodynamically stable short-loop structures within the G4 have been extensively studied as a cause of instability in replication-dependent processes [11]. The alteration of the DNA polymerase function and helicases in sites of G4 formation has been well established and is used to identify G4s in vivo [12,13].
While some ligands have shown a binding affinity towards G4 structures for the treatment of cancer-specific cells and transcriptional alteration [14], the binding of other ligands that stabilize G4 leads to multiple DNA damage [14], micronuclei formation, delayed replication fork progression [15], and telomeric defects [16,17,18].

1.2. Mutations within G4 Regions

DNA lesions can be mutagenic or lethal, and when they are found in G4 regions, they can alter the secondary structure by changing the guanine tract base pairing or altering the composition of the loop region. A single nucleotide polymorphism (SNP) in the G4 present in the promoter region of c-MYC has been shown to change the transcription in vivo [19]. Mass spectroscopy studies using single-nucleotide substitution in the central block of parallel G4 forming sequences found a deleterious effect on G4 stability and association rate [20]. A trinucleotide CGG repeat expansion in the untranslated region of the FMR1 gene has been linked with ataxias and fragile X syndrome [21]. A T→C SNP at the GC-rich apolipoprotein E (APOE) region is known to vary G4 structure and has been linked to the onset of Alzheimer’s disease [22].
It has been proposed that specific helicases promote genomic stability by actively resolving G4 structures, which can be altered by adding G4 stabilization ligands in the presence of specific DSBs [10,12,23]. Baral et al. identified several eQTL variants in potential G4 regions [24]. Changes in G4 loops led to a significant alteration in gene expression among individuals, further fueling the structural role of G4s in regulating and binding transcription factors [23].
A selective mutation of the G-rich region to disrupt the G4 structure has been found to alter transcription [25]. Mutations can hinder the recruitment of transcription factors that overlap the G-rich region and function as recognition motifs or bind to the G4 structure. Siddiqui-Jain et al. demonstrated that a single G→A mutation destabilizes the folding of G4 in the Pu27 region of MYC, which is otherwise repressed, resulting in a threefold increase in transcriptional activity of the gene in tumor cell lines [19]. Studies related to 8-oxoguanine (8-oxoG) in G4 established the presence of G-A and guanine abasic lesions, which can destabilize the secondary structure, leaving the unfolded sequence prone to cleavage [26].

1.3. Study Motivation

To date, the majority of G4 studies have focused on the identification of G4s [27,28], their association with different features of interest [6,16,29,30,31], and the determination of which putative genomic regions form structural G4s [32,33]. Few studies have looked at how single nucleotide polymorphisms affect G4 formation, although SNVs within G-quadruplexes have been shown to be less frequent than random, suggesting they are under selective pressure [34]. In most cases, these are focused on regions of interest, such as telomeres [35] or specific genes [36,37]. Previous efforts have determined that G4s, in general, are enriched around breakpoints associated with structural variants in cancer [38,39]. The most comprehensive study to date associated SNVs in general with G4 variations, resulting in the identification of more than 5 million gains or losses of G4s genome wide, with the majority occurring within genic regions and a specific enrichment in oncogenes [25].
Given the roles that G4 regions and mutations within them play in transcriptional and translational control and the lack of information concerning how SNVs in G4 regions affect specific disease states, we set out to identify the impacts of mutations in G4 regions and patterns associated with the variants in germline and somatic cells. We compared annotated G4 regions with overlapping variants annotated in the COSMIC [40] and CLINVAR [41] databases. These represent somatic mutations associated with cancers (COSMIC) or germline mutations with clinical relevance (CLINVAR). Because of their high stability and increased cellular uptake, G4 sequences have interesting diagnostic and therapeutic functions. Understanding how known variants in the genome confer stability or disrupt the G4 sequences will allow a better understanding of G4 structure and function.

2. Materials and Methods

2.1. Putative and Validated G4 Identification

Quadparser version 2 [28] with the default parameters was used to identify 175,778 putative G4 regions (pG4) in the human genome hg38.p13 assembly across both strands. Experimentally validated G4 regions were obtained from an experiment utilizing a method called G4 Seq (GEO accession GSE63874) previously performed by Chambers et al. [33]. The intersection between the putative and experimental G4s was found using BEDTOOLS v2.27.1 [42].
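The sequence definition of a putative G4 given in the Introduction (four runs of at least three guanines separated by short loops) can be illustrated with a regular expression. The sketch below is not Quadparser and does not claim to reproduce its default parameters; the loop bounds of 1 to 7 nucleotides and the function name `find_pg4` are assumptions made for illustration. The C-rich pattern stands in for a G4 on the opposite strand.

```python
import re

# Assumed loop length bounds (illustrative; not taken from the article).
LOOP_MIN, LOOP_MAX = 1, 7

# Four runs of >=3 G separated by three loops, and the C-rich mirror image
# that corresponds to a G4 motif on the opposite strand.
G_PATTERN = re.compile(r"(?:G{3,}[ACGT]{%d,%d}){3}G{3,}" % (LOOP_MIN, LOOP_MAX))
C_PATTERN = re.compile(r"(?:C{3,}[ACGT]{%d,%d}){3}C{3,}" % (LOOP_MIN, LOOP_MAX))


def find_pg4(seq):
    """Return sorted (start, end, strand) tuples for non-overlapping motif hits.

    Coordinates are 0-based and half-open, as in BED files.
    """
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G_PATTERN.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in C_PATTERN.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    print(find_pg4("TTGGGAGGGTGGGAGGGTT"))
```

Because the hits use BED-style coordinates, they could be written to a file and intersected with experimentally derived regions by a separate tool.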

2.2. Somatic and Germline Variants in G4 Regions

Cancer-specific curated somatic mutations from the COSMIC database v96 [40] were used for the analysis. COSMIC contains 22,996,215 distinct single nucleotide variants (SNVs) (19,721,019 non-coding variants (NCV) and 5,977,977 coding) from 1.4 million tumor samples. An additional 550,239 germline SNVs from other clinically relevant diseases and disorders were obtained from CLINVAR [41] version 20200203.
For both sets of data, a two-pass analysis was performed. In the first pass, overlaps between the SNVs and putative G4 regions were found to determine the potential loss of a G4 structure due to mutations. In the second pass, mutations leading to a G in regions with flanking guanines that result in the gain of a G4 were detected. In each case, a variant call format (VCF) file describing the coding and non-coding mutations was obtained from COSMIC [40] and CLINVAR [41]. Using the VCF, SNVs were filtered using bcftools v1.8 [43], with insertion and deletion events (INDELs) removed (bcftools -I NCV.vcf) even though INDELs may introduce or remove G4 structures. Initially, we chose to focus on non-INDEL events since INDELs can have a higher false positive rate, particularly in homopolymeric regions [44,45].
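The restriction to SNVs can be illustrated independently of bcftools. The following is a minimal sketch, assuming a standard tab-delimited VCF in which REF is the fourth column and ALT is the fifth; the function names are hypothetical, and dropping a record when any ALT allele is not a single base is an assumption rather than the behavior of the command quoted above.

```python
BASES = set("ACGT")


def is_snv(ref, alt):
    """True when REF and ALT are different single nucleotides."""
    ref, alt = ref.upper(), alt.upper()
    return len(ref) == 1 and len(alt) == 1 and ref in BASES and alt in BASES and ref != alt


def filter_snvs(vcf_lines):
    """Yield header lines unchanged and data lines whose alleles are all SNVs.

    Assumption: a multi-allelic record is kept only if every ALT allele is a
    single base, so records mixing an SNV with an INDEL are dropped.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # malformed record, skipped in this sketch
        ref, alts = fields[3], fields[4].split(",")
        if all(is_snv(ref, alt) for alt in alts):
            yield line


if __name__ == "__main__":
    demo = [
        "#CHROM\tPOS\tID\tREF\tALT\n",
        "1\t100\t.\tG\tA\n",   # SNV, kept
        "1\t200\t.\tGA\tG\n",  # deletion, dropped
    ]
    print("".join(filter_snvs(demo)), end="")
```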

2.3. Identification of SNPs Affecting G4 Formation

A window of 30 bases upstream and 30 bases downstream of each variant was used to search for putative G4 sequences. Prospective G4 regions were compared with the Vienna Package RNAfold v2.4.8 to predict changes in stability as a result of the variant [46]. The ∆MFE (minimum free energy) and the ∆ED (ensemble diversity) values were used as the determining metrics. The ∆MFE calculates the stability of the sequence structure based on the binding propensities, while the ∆ED provides the diversity of the sequence structure and alternate structures that can form. G4hunter v20150928 was used to compare the G4 scores and the formation of pG4 [27].
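The windowing step can be sketched as follows. This example extracts 30 bases on either side of a variant, applies the substitution, and compares a score before and after. It does not call RNAfold or the G4hunter v20150928 program; the scoring function is a simplified rendition of the published G4Hunter scheme as commonly described (each G in a run of n guanines contributes min(n, 4), each C contributes the negative equivalent, and the score is the mean over the sequence), and all function names are hypothetical.

```python
from itertools import groupby

FLANK = 30  # bases upstream and downstream of the variant


def extract_window(seq, pos, flank=FLANK):
    """Return (window, offset); pos is 0-based and offset indexes the variant in window."""
    start = max(0, pos - flank)
    end = min(len(seq), pos + flank + 1)
    return seq[start:end], pos - start


def apply_snv(seq, pos, ref, alt):
    """Return seq with the base at pos replaced by alt, checking the reference base."""
    if seq[pos].upper() != ref.upper():
        raise ValueError("reference base does not match sequence")
    return seq[:pos] + alt + seq[pos + 1:]


def g4hunter_like_score(seq):
    """Mean per-base score: +min(n, 4) per G in a G-run of length n, negative for C-runs."""
    total = 0
    for base, run in groupby(seq.upper()):
        n = len(list(run))
        if base == "G":
            total += n * min(n, 4)
        elif base == "C":
            total -= n * min(n, 4)
    return total / len(seq)


def delta_score(seq, pos, ref, alt, flank=FLANK):
    """Score of the mutated window minus score of the reference window."""
    window, offset = extract_window(seq, pos, flank)
    mutated = apply_snv(window, offset, ref, alt)
    return g4hunter_like_score(mutated) - g4hunter_like_score(window)


if __name__ == "__main__":
    # A T>G change that joins two GGG runs raises the score.
    print(delta_score("GGGTGGG", 3, "T", "G"))
```

The same reference and mutated windows could be passed to a folding program to obtain an energy difference; that step is omitted here because it depends on an external tool.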
Based on the location of a specific SNV inside a G4 region, the relative location of the mutation was calculated as the position of the SNV in the G4 divided by the total length of the sequence. In terms of multiple potential G4 regions, the whole region was used as a single sequence, and the relative location of the mutation was calculated. Each SNV was converted into a 3-mer based on its location, and changes in the 3-mer resulting in a broken GGG quadruplex structure were calculated. For each 3-mer, the number of changes was calculated using one base before and after the variant’s location, respectively. In addition, SNVs occurring within loops were analyzed. The R package annotatr v1.1.6 was used for randomized background counts for each annotation [47].
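The relative location and the 3-mer context described above can be sketched in a few lines. The coordinate convention (0-based variant position, half-open G4 interval, and a 1-based position within the G4 in the numerator) and the function names are assumptions for illustration; the article does not state which convention was used.

```python
def relative_position(snv_pos, g4_start, g4_end):
    """Position of the SNV within the G4 divided by the G4 length.

    snv_pos is 0-based; [g4_start, g4_end) is half-open. The in-G4 position is
    counted from 1, so the result lies in (0, 1].
    """
    if not g4_start <= snv_pos < g4_end:
        raise ValueError("SNV lies outside the G4 region")
    return (snv_pos - g4_start + 1) / (g4_end - g4_start)


def trimer_change(seq, pos, alt):
    """Return the reference and mutated 3-mers centered on the variant."""
    if pos < 1 or pos > len(seq) - 2:
        raise ValueError("variant needs one flanking base on each side")
    ref3 = seq[pos - 1:pos + 2].upper()
    alt3 = ref3[0] + alt.upper() + ref3[2]
    return ref3, alt3


def classify_trimer(ref3, alt3):
    """Label whether the substitution breaks or forms a GGG triplet."""
    if ref3 == "GGG" and alt3 != "GGG":
        return "breaks GGG"
    if ref3 != "GGG" and alt3 == "GGG":
        return "forms GGG"
    return "other"


if __name__ == "__main__":
    ref3, alt3 = trimer_change("AGTGA", 2, "G")
    print(ref3, alt3, classify_trimer(ref3, alt3))
```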

2.4. Functional and Transcription Factor Enrichment Analysis

Based on the G4 identified, the hg38.p13 coordinates of the G4 were used to find the enrichment of transcriptional factors using the R package RemapEnrich v0.99.0 [48] for Hep-G2 (hepatocellular carcinoma), K562 (myelogenous leukemia), HEK293 (embryonic kidney), and HEK293T (T antigen–transformed embryonic kidney) cell lines. These cell lines were chosen as a comparative analysis of tumor vs. normal cells that could better elicit differences in the somatic mutations. All enrichment tests were calculated using hypergeometric testing. Further, an enrichment analysis of the genes with individual mutations was selected based on the number of SNVs per gene, the effect of SNVs on the G4, G4 per gene, and samples as specified in the result. The functional annotation enrichment of genes was carried out using g:Profiler functional annotation v0.2.1 [49], while the enrichment analysis of the transcription factors (TFs) involved was carried out using the STRINGdb database v11 [50]. In order to analyze the disruption of motifs by each SNV, the R package motifbreakR v2.10.0 [51] was used.
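For readers unfamiliar with hypergeometric enrichment testing, a minimal sketch of the upper-tail probability is given below. It does not reproduce the internals of RemapEnrich or g:Profiler; the function name and the interpretation of the arguments (a background of M regions, n of which carry a given annotation, and a query set of N regions with k annotated) are assumptions for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, successes n, draws N).

    Computed exactly with integer binomial coefficients before the final division.
    """
    if not (0 <= n <= M and 0 <= N <= M):
        raise ValueError("require 0 <= n <= M and 0 <= N <= M")
    lo = max(k, 0)
    hi = min(n, N)
    numerator = sum(comb(n, i) * comb(M - n, N - i) for i in range(lo, hi + 1))
    return numerator / comb(M, N)


if __name__ == "__main__":
    # Toy numbers: 10 background regions, 5 annotated, 5 drawn, all 5 annotated.
    print(hypergeom_upper_tail(5, 10, 5, 5))
```

A small p-value indicates that the query set contains more annotated regions than expected by chance; in practice the p-values across many annotations would then be corrected for multiple testing.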

3. Results

3.1. COSMIC Somatic Mutations

Using the COSMIC database, 37,515 (0.16% of all COSMIC mutations) distinct single nucleotide somatic mutations overlapping experimentally validated G4s were identified within 26,504 pG4 regions from 9693 genes, 8998 of which were determined to be protein coding according to ENSEMBL hg38 annotations. The remaining genes were identified as lncRNA (n = 557), miRNA (n = 111), or other (snoRNA, snRNA, pseudogenes, etc., n = 27). The most frequently observed mutation in the COSMIC-filtered dataset was the transition event G→A (28%), followed by the transversion event T→G (18%) (Figure 2a–c). We expected to see a high number of G→A and G→T (15%) mutations. However, we additionally observed T→G transversion events occurring at higher rates than A→G transitions. Comparatively, higher G/C→A/T variants in intragenic CpG islands have been observed due to the spontaneous deamination of the cytosine hypermethylated CpGs within these regions [52,53]. However, the effect of these mutations is less studied across G4 regions.
We performed a Kruskal–Wallis rank sum test based on the ∆MFE for the G4 stabilizing and destabilizing variants based on the annotation of the G4 present in 3′ UTR, 5′ UTR, CDS, and promoter regions. The results show a significant difference among the groups (H = 13,498, df = 3, p < 0.001). A pairwise Dunn’s test with a Benjamini–Hochberg FDR correction showed that SNVs destabilizing the G4 in CDS regions have a lower ∆MFE compared to the 5′ UTR (FDR < 0.001) and promoters (FDR < 1 × 10⁻²⁵) across both strands. We found a lower transition:transversion ratio (χ² test, p = 0.00001) occurring in the G4 region (1.02), compared to all mutations in the COSMIC database (1.146) (Supplemental Tables S1 and S2).
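The transition:transversion ratio reported above follows from a simple classification of each substitution. A minimal sketch, with hypothetical function names and toy input rather than the study data:

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T);
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def is_transition(ref, alt):
    return (ref.upper(), alt.upper()) in TRANSITIONS


def ts_tv_ratio(snvs):
    """Return transitions / transversions for an iterable of (ref, alt) pairs."""
    pairs = list(snvs)
    ts = sum(1 for ref, alt in pairs if is_transition(ref, alt))
    tv = len(pairs) - ts
    if tv == 0:
        raise ValueError("no transversions; ratio is undefined")
    return ts / tv


if __name__ == "__main__":
    toy = [("G", "A"), ("T", "G"), ("G", "T"), ("A", "G")]
    print(ts_tv_ratio(toy))
```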
Based on the G4Hunter [27] and RNAfold [46] results, we compared the number of SNV events that break the G4 structure and change the thermodynamic stability based on the MFE of each sequence. We found that 7236 (19.3% of variants in G4) of the SNVs within the G4Hunter-identified G4s resulted in the loss of a G4, while 2728 SNVs led to the gain (Figure 3a,b and Figure 4, Supplemental Table S3).

3.2. CLINVAR Germline Mutations

Using the CLINVAR database, 4999 SNVs were identified in pG4 regions, out of which 2378 intersect with experimental G4. Most of these G4 mutations occur in exons (65%, n = 1552). The remaining variants are found in introns (34%, n = 804) and promoters (0.8%, n = 20). In total, 97% (n = 2306) of the detected SNVs occur in protein-coding genes (Figure 3c, Supplemental Table S4).

3.3. Predicted Change to G4 Stability

RNAfold was used to differentiate the impact of the variant on the stacking. Variants were classified based on the change in stability and formation of available guanines for stacking by combining the sequence pattern analysis of G4Hunter with thermodynamic parameters from RNAfold (Figure 4a–d). The majority of the SNVs identified in COSMIC and CLINVAR (81%) did not affect the GGG stacking, so the formation of guanine tetrads remained possible. Although complete structural breakage does not occur for these variants, 40% of the variants were predicted to have decreased stability in the G4 structure. This is due to the presence of additional guanines in the loop that aid the conformational diversity of the G4, which can act as an extra base for stacking. Using the combined COSMIC and CLINVAR mutations, 10,435 SNVs were predicted to increase the stability (lower the MFE relative to the reference sequence), while 12,061 SNVs had no predicted MFE change. An additional 15,019 variants were predicted to destabilize the G4. Transversions were more likely to change the structure of the G4 region without disrupting the G stacks while increasing the predicted thermodynamic stability of the structure (17%) compared to transitions (10%). Transition mutations were predicted to destabilize the G4 structure at a higher rate (22.9%) compared to transversions (17.6%) (Table 1).
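The classification logic described in this section can be summarized in a short sketch. It assumes that the presence of a G4 motif in the reference and mutated sequences and the ∆MFE (mutated minus reference) have already been computed elsewhere; the function name, category labels, sign convention, and tolerance are assumptions for illustration. Consistent with the text, a lower MFE relative to the reference is treated as increased stability.

```python
from collections import Counter


def classify_variant(ref_has_g4, alt_has_g4, delta_mfe, tol=1e-9):
    """Assign a variant to one of five categories.

    delta_mfe is assumed to be MFE(mutated) - MFE(reference), so negative
    values mean the mutated sequence is predicted to be more stable.
    """
    if ref_has_g4 and not alt_has_g4:
        return "G4 loss"
    if alt_has_g4 and not ref_has_g4:
        return "G4 gain"
    if delta_mfe < -tol:
        return "stabilized"
    if delta_mfe > tol:
        return "destabilized"
    return "no predicted change"


def tally(variants):
    """Count categories for an iterable of (ref_has_g4, alt_has_g4, delta_mfe)."""
    return Counter(classify_variant(*v) for v in variants)


if __name__ == "__main__":
    toy = [(True, False, 2.1), (True, True, -0.8), (True, True, 0.0), (False, True, -3.0)]
    for label, count in sorted(tally(toy).items()):
        print(label, count)
```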

3.4. G4 Variants in Transcript Regions

We find more G4 mutations in exonic regions, the 5′ untranslated region (UTR), the 3′ UTR, and coding sequence (CDS) regions of protein-coding genes when the G4 is formed in the strand opposite the transcribed gene (Figure 3c), consistent with previous studies on the role that G4s play in transcription [54,55,56,57,58,59,60,61,62,63,64,65]. The number of SNVs around G4 forming regions in introns and promoters was proportionate with the transcript in both the same and opposite strands. This suggests a selective pressure of variants around exon regions compared to non-coding regions. Previously, it was hypothesized that the formation of G4 in either strand within the transcribed region along with nascent RNA would lead to the formation of DNA:RNA hybrid R loops in the G4, which results in physically halting the polymerase movement inhibiting further rounds of transcription [66]. Additionally, G4 formation on the non-template strand could interfere with the reannealing of the DNA strands, increasing the stability of the R loop hybrid [66].

3.5. G4 Variants in Gene Features

G→A mutations are elevated in exons (35.21%) and show a decrease within promoters (22.2%). We find a lower percentage of T→G mutations in G4 regions occurring in exons (11.71%) compared to intron (22.23%), promoter (21.79%), enhancer (32.26%), and intergenic regions (18.29%). This pattern of low T→G variants coincides with counts in the CDS region, while the 5′ UTR shows increased T→G variants (17.94%). G→A SNVs are found less in enhancers (21.94%) distant from the transcription site. Deamination occurring upstream of the transcription site does not appear to affect the G4 region. These regions show the highest proportion of T→G (32.26%) mutations (Table 2).
Previously, higher counts of C→T over G→A variants were identified in the non-template strand, which was hypothesized to result from cytosine deamination within the 2 kb downstream of the 5′ end of genes due to a higher exposure of single-stranded DNA [67]. However, we hypothesize that these variants cause a conformational shift in the G4 structure, leading to an alteration in expression and binding patterns across these regions. Additionally, 8-oxoG, which is formed via the oxidation of guanine and frequently mispairs with adenine, leading to an eventual G→T transversion, binds Sp1 proteins in G4s and is an important regulator of adipose tissue development. The GC-rich promoter region with Sp1 transcription factor sites is activated in proportion to increasing 8-oxoG abundance [68].
G quadruplexes in the 3′ UTR occurring on the same strand of the coding region are prone to variants cataloged in COSMIC that are predicted to either increase or decrease stabilization based on MFE measures. We investigated which SNVs in each annotation had the highest change in terms of specific nucleotide substitutions. The 3′ UTR has a higher incidence of T→G versus A→G SNVs. This suggests that T→G mutations are more likely to stabilize 3′UTR G4s. Putative G4s in CDS and CpG regions are least prone to variants, while enhancers and the intergenic G4 show higher changes in stabilization (both stabilizing and destabilizing) due to the SNVs (Figure 5a,b, Supplemental Figure S1).

3.6. Enrichment Analysis

3.6.1. Gene Ontology

A Gene Ontology (GO) enrichment analysis was performed for biological processes (GO:BP) and cellular components (GO:CC). A total of 424 GO:BP categories were determined to be significant (FDR ≤ 0.05) using both datasets (Supplemental Figure S2; Supplemental Table S5), while 425 significant GO:BP enrichments were found for COSMIC alone (Supplemental Figure S3; Supplemental Table S6) and 48 were found for CLINVAR (Supplemental Figure S4; Supplemental Table S7). When this was further broken down into mutations resulting in a loss of a G4, we found 205 significant GO:BP for both (Supplemental Table S8), 75 for COSMIC (Supplemental Table S9), and 25 for CLINVAR (Supplemental Table S10). Among the COSMIC enrichments were synapse organization, axonogenesis, neuron projection guidance, axon guidance, cell–substrate adhesion, neuromuscular process, regulation of neuron projection development, and xenobiotic glucuronidation. One example gene is the App transcript, which is involved in synapse formation and function in the developing brain. This transcript is transported to neuronal dendrites, where the transmembrane APP protein plays an integral role in synapse formation and function. However, the translation of App is repressed by the binding of the fragile X mental retardation protein (FMRP) to G4s in the App coding region. This repression is thought to occur through a direct interaction with the ribosomes, resulting in stalled ribosomal progression on the mRNA [69]. Past studies have also shown that this repression can be relieved via the synaptic activation of metabotropic glutamate receptors, specifically mGluR5 receptors. This results in the release of FMRP and an increase in APP translation [70].
The complete set of CLINVAR enrichments included muscular-related processes, such as striated muscle contraction, a neuromuscular process, cardiac muscle cell contraction, muscle tissue morphogenesis, regulation of action potential, cell communication involved in cardiac conduction, regulation of heart rate by cardiac conduction, musculoskeletal movement, cardiac muscle tissue morphogenesis, skeletal muscle contraction, and ventricular cardiac muscle cell action potential. Variants leading to a gain of a G4 result in 115 GO:BP enrichments for both (Supplemental Table S11), 22 for COSMIC (Supplemental Table S12), and 2 for CLINVAR (Supplemental Table S13). Among the COSMIC enrichments from the G4 gain are the positive regulation of transcription by the RNA polymerase II and actin cytoskeleton organization, while the CLINVAR enrichments based on a G4 loss include system development, action potential, and cardiac muscle cell action potential. The loss of the G4 using COSMIC resulted in similar enriched GO terms as did G4 loss in CLINVAR, including muscle contraction, the muscle system process, the heart process, the cardiac muscle cell action potential involved in contraction, actin-mediated cell contraction, the regulation of heart contraction, and action potential.
Among the enriched categories detected were PDZ domain proteins (GIPC2, GRIDZIP, LIMK2, PDLIM7, PDZD7, WHRN, SIPA1L3, PRX, MYO1BA, MAGI2, and MAST) with the G4 in coding regions and variants that negatively affect the RGG (arginine–glycine–glycine) domain or the G4 stability. Proteins with RGG repeats have been known to bind to G4 structures [71]. Variants in these regions affecting the G4 stability could further affect downstream binding.
GO:CC enrichments yield 128 significant categories for both datasets combined (129 for COSMIC only and 14 for CLINVAR only) (Supplemental Figures S5–S7; Supplemental Tables S14–S16). Among the enriched GO:CC categories detected in COSMIC are the collagen-containing extracellular matrix and the cell–cell contact zone, indicating that mutations in these genes affect the adhesion of cells to the extracellular matrix. Other enriched GO:CC terms in CLINVAR include I band, sarcolemma, and the myofilament Z disc. Enriched GO:CC terms from a loss of the G4 using the CLINVAR database include a collagen trimer and a PCSK9-LDLR complex.

3.6.2. KEGG Metabolic Pathways

KEGG enrichment yielded 96 significant pathways for the combined datasets, 91 for COSMIC, and 11 for CLINVAR (Supplemental Figures S8–S10; Supplemental Tables S17–S19). Variants leading to a loss of the G4 yielded 33, 12, and 5 significant categories for combined, COSMIC only, and CLINVAR only, respectively (Supplemental Tables S20–S22). Among the enriched categories for genes with a loss of the G4 within CLINVAR are hypertrophic cardiomyopathy, dilated cardiomyopathy, arrhythmogenic right ventricular cardiomyopathy, adrenergic signaling in cardiomyocytes, and acute myeloid leukemia. KEGG enrichments for a gain of the G4 resulted in 31, 3, and 0 for combined, COSMIC only, and CLINVAR only, respectively (Supplemental Tables S23 and S24). The enriched terms from the gain of the G4 in COSMIC variants are melanoma, the phospholipase D signaling pathway, and cocaine addiction.

3.6.3. INTERPRO Protein Domains

INTERPRO enrichment yielded 23, 23, and 2 enrichments for combined CLINVAR and COSMIC variants, COSMIC only, and CLINVAR only, respectively (Supplemental Tables S25–S27). Included were the Src homology-3 domain (n = 69, FDR = 3.73 × 10⁻³) and the Pleckstrin homology-like domain (PH) (n = 147, FDR = 1.30 × 10⁻¹⁰). The binding affinity of PH domains has unique recognition sites and is known for functional plasticity [72,73].

3.6.4. Transcription Factors

We identified the enrichment of TFs including NFKB1, ZFX, MBD3, ASX1, SUZ12, NCOR1, HMGN3, USF2, EGR1, GTF2F1, KDM4B, HNRNPH1, HNRNPL, NONO, TARDBP, NFATC3, KDM3A, and HOXA3, among others (Figure 6; Supplemental Figures S11–S13; Supplemental Table S28). The majority (92%) of these had at least one G4 in their gene structure, working in a feed-forward regulation of genes. A number of the SNPs found in these regions disrupt the recognition motifs of transcription factor binding sites.

3.7. Trinucleotide Context Mutation in the G4 Sequence

Based on the nucleotide context of one base pair before and after the mutation, we found that 79% of the variants affect the loop region and that, for 23% of the SNVs, the change creates a GGG from a G(A|C|T)G context (Figure 4 and Figure 7, Supplemental Figure S14). We find that 36% (n = 6810) of the transversion mutations are T→G, while 21% (n = 4070) of the transitions are A→G (Figure 7a–d). T→G mutations in the context of GTG→GGG account for 14% of the SNVs, leading to the formation of a stable G tetrad, while GAG→GGG occurs in 6% of the variants. Interestingly, the destabilization of the GGG region occurs via the GGG→GAG transition in 11% of SNVs (Figure 8). Previously, it has been reported that the GGG exhibits context-dependent specific mutational patterns that preserve the potential for G4 formation [74]. We find G→A mutations to be approximately 29% of the total SNVs in the selected G4 regions, with 26% (n = 4144; 11% of the total) of those variants occurring in the context of GGG→GAG (Figure 8). We observe these patterns in non-coding annotations with the exception of exonic and CDS regions. We identify an increased propensity to form stable multiple conformations with de-stabilized structures for 25% of the sequences with variants, while 14% of the variants have no predicted change in stability (Supplemental Table S29). This approach of analyzing the probable base pairing alternatives for additional guanine Hoogsteen base pairing can help identify the effects of variants within the G4 structure and hence predict the change in structure and functionality of G4s in various molecular processes.
Based on the position of the mutation in the G4, the normalized position for each variant was calculated. The relative location of a variant in a G4 is defined as the position of the variant divided by the length of the G4. For single nucleotide variants mutating to G either from A or T, we find similar elevated patterns in the middle of the G4. T|A→G mutations show conservation of guanine in the center position except for CDS and exons in both the template and non-template strand across both COSMIC and CLINVAR databases (Figure 8, Supplemental Figures S15–S18). These changes are stricter for SNVs within the 5′UTR across the template and non-template strand in the CLINVAR database, where we observe mutations in the center of the G4 for T→G variants. The 5′ UTR COSMIC mutations show mutations across the two extreme loops compared to the center. A→G mutations are observed in a higher proportion at the beginning of G4s in the CDS region, which provides evidence for selective pressure in the coding region preferentially protecting the coding sequence. G4s in UTRs have been reported to be under selection pressure, and variants in the G4 can account for the instability in the G4 and diseases [30], indicating an important functional role leading to conservation both within [63,75] and across species [34,56,76,77].

3.8. Role of the Location of SNVs in G4s

The relative position of G→T substitutions along G4 sequences is shown in Figure 7c. The location of this mutation at the beginning of the G4 can disrupt the structural formation; however, further elevated peaks at varying locations leading to additional guanines across the G4 may introduce additional tetrads in introns and exons (Figure 5c,d).
The observation of increased guanine stacks resulting from G→T|A substitutions that break up longer guanine runs is consistent with studies where the oxidation of multiple G’s occurs at the start of the guanine tetrads [29,78]. Our results help establish that the location of mutations and the type of mutation in G-rich regions likely alter the shape and stability of the G4 structure. Previously, it has been established that the most sensitive sites are located at the center tetrad [79]. For mutations in CLINVAR, we observe a higher mutation rate at the start of the G4 (Supplemental Figure S19). The A→G mutations associated with COSMIC variants show a considerable difference in their location relative to the G4 position (Supplemental Figure S20).

4. Discussion

4.1. Molecular Mechanisms for Promoting Mutations in G4 Regions

High occurrences of oxidized guanines in G4 structures have been previously established [29]. This type of mutation is thought to occur around the external tetrads due to radical-trapping antioxidants that slow mutation efficiency [78]. The oxidative stress caused by reactive oxygen species (ROS) affects genome stability and promotes mutagenesis, senescence, and other age-related diseases [80]. Mutations in GGG regions can destabilize the stacking of guanines, altering the ionization potential and affecting the ability of the G region to be further oxidized. G→A, T, or C mutations can disrupt the stacking, while mutations to G can further stabilize the G4 or allow additional conformations for the stacking. We investigated which type of SNV produced the largest change in each annotation. Based on the absolute ∆MFE, we find that G4s in the CDS and CpG regions are the least likely regions to be affected by mutations, while enhancers and intergenic G4s are prone to higher variant-induced stabilization and destabilization (Figure 5a,b).
The escape of 8-oxoG from DNA repair during DNA replication can cause the misincorporation of adenine opposite 8-oxoG, leading to the addition of T in place of G (OG mutations) [81,82]. For instance, in a GTTAGGG sequence with 8-oxoG at its fifth position, a misincorporation of A occurs opposite the G. Due to the presence of consecutive Gs in the region, the true proportions of change in these regions can be hard to monitor over a range of replications. The methylation of cytosine leads to the formation of 5-methylcytosine, a residue prone to spontaneous transitions [83]. Cytosine deamination might be the primary cause of the C→T transition. Further, based on the context, a high proportion of T→G mutations lead to a GTG→GGG structure, supporting the stability of the G4. This raises the question of whether T→G mutations confer additional stability on the G4 in cancer cells. Past studies have highlighted the conditional impact of OG mutations in a base pairing with A in mutagenic MutY homolog harboring increased G→T transversions in MUTYH, leading to a higher incidence rate of colorectal cancer [84,85,86]. Thymine glycol is a lesion that is highly mutagenic and cytotoxic in regions of DSBs. In vitro studies have shown it to block replicative and repair DNA polymerases [87]. The OG, thymine glycol, and abasic sites formed are repaired by the excision repair pathway. A difference in the repair of 8-oxoG sites has been observed for the NEIL glycosylases, which are known to remove guanidinohydantoin (Gh) and spiroiminodihydantoin (Sp) from G4 structures in the promoter region over parallel conformation [31]. However, the glycosylases were not able to remove the 8-oxoG structure from the telomeric G4 or the same G4 structure in antiparallel structures.
We identified an increase in A|T→G mutations in the middle stacking of the G4, suggesting a functional impact for specific variants in G4 formation. G4s with spare tires (i.e., additional guanine tracts [88]) allow alternate G4 structures to form, as does the exclusion of certain guanosines due to lesions or substitutions in one tetrad region that might break apart longer guanine runs. Base excision repair involving APE1 and OGG1 in the promoter region of VEGF has been implicated in G4 formation and may operate in other genes as well [89,90].

4.2. TERT G4 Mutations

A prior study highlighted that the entire 67 bp G4 associated with the TERT promoter was protected from DNase cleavage, while the version containing G→T variants was degraded into discrete segments [32,91]. Additionally, this region folds into a compact G4 structure without any hairpins between the guanine stacks. However, the formation of hairpins has been predicted based on DMS footprinting studies [92]. We identified 52 possible SNVs at 39 base pair locations in this 67 bp G4. The SNV chr5:1,295,113 (G→T), located in the TERT region, lies near a G4 on the non-template strand. This SNV was associated with more than twenty-two cancer types, including central nervous system, liver, bladder, ovarian, breast, kidney, lung, bone, and pancreatic cancers, among others. Many of these SNVs destabilize G4s. Further, with nine GGG repeats present, multiple G4s can potentially be formed. With an SNV (G→A), we find that the stability of the variant differs when alternate G tracts are used for the stacking.
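To give a sense of how many alternate foldings nine GGG repeats could allow, the sketch below enumerates every ordered choice of four G tracts from a sequence. Ignoring loop-length limits, nine tracts give C(9,4) = 126 combinations. The sequence used here is a hypothetical toy (nine GGG runs separated by single thymines), not the TERT promoter sequence, and the optional `max_loop` filter is an assumed constraint added for illustration.

```python
import re
from itertools import combinations


def g_tracts(seq, min_len=3):
    """Return (start, end) spans of every run of at least min_len guanines."""
    return [(m.start(), m.end()) for m in re.finditer(r"G{%d,}" % min_len, seq)]


def tract_combinations(seq, max_loop=None):
    """List all ordered choices of four G tracts.

    If max_loop is given, keep only choices in which every stretch between
    consecutive chosen tracts (including any skipped tracts) is <= max_loop nt.
    """
    kept = []
    for combo in combinations(g_tracts(seq), 4):
        loops = [combo[i + 1][0] - combo[i][1] for i in range(3)]
        if max_loop is None or all(length <= max_loop for length in loops):
            kept.append(combo)
    return kept


# Hypothetical toy sequence: nine GGG tracts separated by single T loops.
toy = "T".join(["GGG"] * 9)
print(len(g_tracts(toy)))                        # number of G tracts found
print(len(tract_combinations(toy)))              # all four-tract choices
print(len(tract_combinations(toy, max_loop=7)))  # choices with loops <= 7 nt
```

Each retained combination is a candidate set of stacking tracts; an SNV that removes one tract eliminates only the combinations that use it, which is one way to see why stability can differ when alternate tracts are used.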

4.3. G4 Mutations Disrupting the Transcription Factor Binding

Transcription factors (TFs) known to bind G-rich regions, including SP1 [93], NF-κB [94], CREB [95], and the methyl-CpG binding domain (MBD) of methyl-CpG binding protein 2 (MeCP2) [96], showed association constants decreased by up to 10-fold when a guanine in the binding site was changed to 8-oxoguanine in model duplex DNA, reflecting the altered donor–acceptor pattern on the imidazole ring of OG compared to guanine. The structure change for guanine in association with CREB was found to have a role in epigenetic repression [95]. This is supported by our results highlighting the reversal of these sequences to a stabilized G4 via T→G changes in cancer cells. For instance, the variant chr10:122,143,482 (G→A) significantly affects the binding sites of the TFs NHLH1, FOXO3, TAL1, TP53, HES5, HES7, USF2, EGR3, ZNF740, and SP1, among others (Supplementary Table S30). We make similar observations for an additional 424 SNVs, which occur in at least five cancer types in the COSMIC database and disrupt TF binding sites, with an average of 15.1 TFs per variant.
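The idea of a variant weakening a TF binding site can be sketched by scoring the reference and alternate alleles against a position probability matrix and comparing the best log-odds score. This is loosely in the spirit of matrix-based tools such as motifbreakR [51] but does not reproduce its scoring; the matrix values, the GGGCGG consensus used to build them, the uniform background, and the toy sequence are all hypothetical assumptions.

```python
import math

BASES = "ACGT"
BACKGROUND = 0.25  # assumed uniform base frequencies


def column(consensus_base, p=0.85):
    """Hypothetical matrix column: probability p for the consensus base,
    with the remainder split evenly over the other three bases."""
    rest = (1.0 - p) / 3.0
    return {b: (p if b == consensus_base else rest) for b in BASES}


# Hypothetical 6-bp G-rich site built from the consensus GGGCGG.
PPM = [column(b) for b in "GGGCGG"]


def log_odds(site):
    """Sum of per-position log2(probability / background) for one window."""
    return sum(math.log2(PPM[i][b] / BACKGROUND) for i, b in enumerate(site))


def best_score(seq):
    """Highest window score over the whole sequence."""
    width = len(PPM)
    return max(log_odds(seq[i:i + width]) for i in range(len(seq) - width + 1))


def delta_score(seq, pos, alt):
    """Change in the best score when the base at 0-based pos becomes alt."""
    mutated = seq[:pos] + alt + seq[pos + 1:]
    return best_score(mutated) - best_score(seq)


toy = "TTGGGCGGTT"                 # hypothetical sequence containing the site
print(round(best_score(toy), 2))
print(round(delta_score(toy, 3, "A"), 2))  # G->A inside the site
```

A negative change indicates a weaker predicted site for the alternate allele; repeating the comparison over many matrices would yield a per-variant count of affected TFs analogous to the averages reported above.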
A local network cluster (STRING) analysis of the enriched TFs yielded terms related to the Polycomb repressive complex 1 (PRC1) (4/12, FDR 0.00049), the PcG protein complex (6/25, FDR 6.22 × 10⁻⁶), and the positive regulation of histone H3-K27 methylation (11/59, FDR 1.82 × 10⁻¹⁰). PRC1 engages in transcriptional control through chromatin modification, ubiquitylating histone H2A via a protein ligase [97,98]. Although the mechanism of PRC1 is under active investigation, recent evidence suggests that G tracts selectively remove the PRC2 complex from genes during gene activation [99]. Polycomb complexes have been associated with repression to maintain cell identity but are also found at actively transcribed loci, and this evidence suggests a direct role of G4s across cell types in regulating expression through structural variation [99,100].
Repair mechanisms, including BER and mismatch repair, are required to protect against non-canonical or mismatched base pairs arising from polymerase errors [101,102,103,104]. Neurodegenerative disorders occurring through the expansion of CAG/CTG repeats have been associated with MutSβ, a heterodimer involved in mismatch repair [105,106]. Though the involvement of G4s in gene transcription [58,61,62,64,65,89,107] and telomere regulation [16,17,29,35,108] is well studied, the mechanism of base excision repair by DNA glycosylases in G4s and other non-canonical structures is poorly understood [109,110]. We identified G4s with SNVs in the genes CHRNG, GRIN2C, CHAT, ADCY1, GABRG3, CACNG3, PPFIA3, LRTOMT, VAMP2, TSPOAP1, MAPK3, GABRR2, KCNJ6, PICK1, and STX1A, among others. These genes have been associated with several psychiatric disorders, including schizophrenia, bipolar disorder, tobacco use disorder, Parkinson’s disease, and autism [111,112,113,114,115,116].
Previous research has shown the presence of G4 sequences in various untranslated regions of dendritic mRNAs, suggesting a role for G4s as a neurite localization signal [117,118]. The deletion of different putative G4 sequences led to a severe loss of signal in neurites [119]. It has been hypothesized that the G4 structure, being sensitive to cations, can respond to neuronal activity, with activity-dependent changes affecting localization and transport [120,121]. Cation sensitivity could influence G4 stability and structure and regulate the binding of trans-acting factors [118].

5. Conclusions

G4 formation depends on an intricate balance among the folding energy affected by nicks in the DNA [122], methylated guanines [123], and the guanines available for stacking [124]. The balance between the hypomethylated and hypermethylated G-rich regions near promoters (despite cytosine deamination and cytosine methylation) results in preserved CpG islands across mammalian genomes [125], which are maintained by G4 structures that act to sequester DNMT1, contributing to CpG hypomethylation [107]. These methylation and oxidation patterns result in G4 sequence preservation. In addition, cytosine deamination has been shown to have important roles in the G4 structure, including destabilization in telomeres [126] and class switch recombination in mature B cells [127].
Despite the introduction of next-generation techniques for the identification of G4s, the effect of variants in these complex regions and the mechanism of G4 formation in different cell types remain uncertain. Our study points out genes and G4 sequences that are affected by either somatic or germline mutations. Of those variants identified, 37,515 are observed as somatic mutations associated with cancer, while 4999 are germline mutations of clinical significance. We identify the possible effects of these single nucleotide variants occurring within coding and non-coding regions on the stability of G4s.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes14122125/s1, Figure S1: Effect of each SNV on ∆MFE of G4 on different annotations with percentage of the counts shown in the secondary y axis; Figure S2: Top 25 enriched GO:BP terms for COSMIC and CLINVAR G4 mutations; Figure S3: Top 25 enriched GO:BP terms for COSMIC G4 mutations; Figure S4: Top 25 enriched GO:BP terms for CLINVAR G4 mutations; Figure S5: Top 25 enriched GO:CC terms for COSMIC and CLINVAR G4 mutations; Figure S6: Top 25 enriched GO:CC terms for COSMIC G4 mutations; Figure S7: Top 25 enriched GO:CC terms for CLINVAR G4 mutations; Figure S8: Top 25 enriched KEGG terms for COSMIC and CLINVAR G4 mutations; Figure S9: Top 25 enriched KEGG terms for COSMIC G4 mutations; Figure S10: Top 25 enriched KEGG terms for CLINVAR G4 mutations; Figure S11: Top 20 enriched transcription factors with overlapping ChIP-seq peaks for COSMIC G4 SNVs in the HEK293 cell line; Figure S12: Top 20 enriched transcription factors with overlapping ChIP-seq peaks for COSMIC G4 SNVs in the K562 cell line; Figure S13: Top 20 enriched transcription factors with overlapping ChIP-seq peaks for COSMIC G4 SNVs in the Hep-G2 cell line; Figure S14: Frequency of SNVs across G-quadruplex regions within trinucleotide contexts for the CLINVAR database; Figure S15: Distribution of A→G SNVs across the G4 region for different features on (A) the non-template and (B) template strand for CLINVAR variants; Figure S16: Distribution of G→T SNVs across the G4 region for different features on (A) the non-template and (B) template strand for CLINVAR variants; Figure S17: Distribution of G→A SNVs across the G4 region for different features on (A) the non-template and (B) template strand for CLINVAR variants; Figure S18: Distribution of T→G SNVs across the G4 region for different features on (A) the non-template and (B) template strand for CLINVAR variants; Figure S19: Distribution of SNVs across G-quadruplex regions for the (A) forward and (B) reverse strands for SNVs detected in the CLINVAR database; Figure S20: Distribution of SNVs across G-quadruplex regions for the (A) forward and (B) reverse strands for SNVs detected in the CLINVAR database by specific SNV substitution; Table S1: Count of all observed SNVs in the COSMIC database; Table S2: Counts of SNVs in G4 regions from the COSMIC database; Table S3: Changes in putative G4 from the COSMIC database across both strands before and after mutation; Table S4: Count and proportion of variants in experimentally validated G4 regions for different functional regions; Table S5: Significant GO:BP enrichments for all COSMIC and CLINVAR G4 mutations; Table S6: Significant GO:BP enrichments for all COSMIC G4 mutations; Table S7: Significant GO:BP enrichments for all CLINVAR G4 mutations; Table S8: Significant GO:BP enrichments for COSMIC and CLINVAR G4 mutations leading to the loss of a G4; Table S9: Significant GO:BP enrichments for COSMIC G4 mutations leading to the loss of a G4; Table S10: Significant GO:BP enrichments for CLINVAR G4 mutations leading to the loss of a G4; Table S11: Significant GO:BP enrichments for COSMIC and CLINVAR G4 mutations leading to the gain of a G4; Table S12: Significant GO:BP enrichments for COSMIC G4 mutations leading to the gain of a G4; Table S13: Significant GO:BP enrichments for CLINVAR G4 mutations leading to the gain of a G4; Table S14: Significant GO:CC enrichments for COSMIC and CLINVAR G4 mutations; Table S15: Significant GO:CC enrichments for COSMIC G4 mutations; Table S16: Significant GO:CC enrichments for CLINVAR G4 mutations; Table S17: Significant KEGG enrichments for COSMIC and CLINVAR G4 mutations; Table S18: Significant KEGG enrichments for COSMIC G4 mutations; Table S19: Significant KEGG enrichments for CLINVAR G4 mutations; Table S20: Significant KEGG enrichments for COSMIC and CLINVAR G4 mutations leading to a G4 loss; Table S21: Significant KEGG enrichments for COSMIC G4 mutations leading to a G4 loss; Table S22: Significant KEGG enrichments for CLINVAR G4 mutations leading to a G4 loss; Table S23: Significant GO:CC enrichments for COSMIC and CLINVAR G4 mutations leading to a G4 gain; Table S24: Significant KEGG enrichments for COSMIC G4 mutations leading to a G4 gain; Table S25: Significant INTERPRO enrichments for COSMIC and CLINVAR G4 mutations; Table S26: Significant INTERPRO enrichments for COSMIC G4 mutations; Table S27: Significant INTERPRO enrichments for CLINVAR G4 mutations; Table S28: Top 50 significant transcription factor enrichments for COSMIC and CLINVAR G4; Table S29: Count and percentage of effect of SNV calculated by thermodynamic MFE and ED changes in the G quadruplex sequence; Table S30: Effect of transition mutation G→A in chr10:122,143,482 on potential binding for multiple transcription factors.

Author Contributions

Conceptualization, E.C.R.; methodology, A.N., J.H.C. and E.C.R.; software, A.N.; validation, A.N. and E.C.R.; formal analysis, A.N. and E.C.R.; investigation, A.N., J.H.C. and E.C.R.; resources, E.C.R.; data curation, A.N. and E.C.R.; writing—original draft preparation, A.N. and E.C.R.; writing—review and editing, A.N., J.H.C. and E.C.R.; visualization, A.N.; supervision, J.H.C. and E.C.R.; project administration, E.C.R.; funding acquisition, E.C.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Institutes of Health, grant number P20GM103436. The contents of this work are solely the responsibility of the authors and do not reflect the official views of the National Institutes of Health.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All code and resulting data are available in the GitHub repository (https://github.com/UofLBioinformatics/G4_SNV (accessed on 24 November 2023)). UCSC Genome Browser tracks are available at https://bit.ly/G4_SNV (accessed on 24 November 2023).

Acknowledgments

We wish to thank members of the Kentucky IDeA Networks of Biomedical Research Excellence (KY INBRE) Bioinformatics Core and the Rouchka and Park labs for their valuable feedback.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sen, D.; Gilbert, W. Formation of parallel four-stranded complexes by guanine-rich motifs in DNA and its implications for meiosis. Nature 1988, 334, 364–366. [Google Scholar] [CrossRef]
  2. Sen, D.; Gilbert, W. A sodium-potassium switch in the formation of four-stranded G4-DNA. Nature 1990, 344, 410–414. [Google Scholar] [CrossRef] [PubMed]
  3. Bugaut, A.; Balasubramanian, S. A sequence-independent study of the influence of short loop lengths on the stability and topology of intramolecular DNA G-quadruplexes. Biochemistry 2008, 47, 689–697. [Google Scholar] [CrossRef]
  4. Zhang, A.Y.; Bugaut, A.; Balasubramanian, S. A sequence-independent analysis of the loop length dependence of intramolecular RNA G-quadruplex stability and topology. Biochemistry 2011, 50, 7251–7258. [Google Scholar] [CrossRef] [PubMed]
  5. Sutyak, K.B.; Zavalij, P.Y.; Robinson, M.L.; Davis, J.T. Controlling molecularity and stability of hydrogen bonded G-quadruplexes by modulating the structure’s periphery. Chem. Commun. 2016, 52, 11112–11115. [Google Scholar] [CrossRef] [PubMed]
  6. Hänsel-Hertsch, R.; Beraldi, D.; Lensing, S.V.; Marsico, G.; Zyner, K.; Parry, A.; Di Antonio, M.; Pike, J.; Kimura, H.; Narita, M. G-quadruplex structures mark human regulatory chromatin. Nat. Genet. 2016, 48, 1267–1272. [Google Scholar] [CrossRef]
  7. Cheung, I.; Schertzer, M.; Rose, A.; Lansdorp, P.M. Disruption of dog-1 in Caenorhabditis elegans triggers deletions upstream of guanine-rich DNA. Nat. Genet. 2002, 31, 405–409. [Google Scholar] [CrossRef]
  8. Dahan, D.; Tsirkas, I.; Dovrat, D.; Sparks, M.A.; Singh, S.P.; Galletto, R.; Aharoni, A. Pif1 is essential for efficient replisome progression through lagging strand G-quadruplex DNA secondary structures. Nucleic Acids Res. 2018, 46, 11847–11857. [Google Scholar] [CrossRef]
  9. Paeschke, K.; Capra, J.A.; Zakian, V.A. DNA replication through G-quadruplex motifs is promoted by the Saccharomyces cerevisiae Pif1 DNA helicase. Cell 2011, 145, 678–691. [Google Scholar] [CrossRef]
  10. Rodriguez, R.; Miller, K.M.; Forment, J.V.; Bradshaw, C.R.; Nikan, M.; Britton, S.; Oelschlaegel, T.; Xhemalce, B.; Balasubramanian, S.; Jackson, S.P. Small-molecule–induced DNA damage identifies alternative DNA structures in human genes. Nat. Chem. Biol. 2012, 8, 301–310. [Google Scholar] [CrossRef]
  11. Piazza, A.; Adrian, M.; Samazan, F.; Heddi, B.; Hamon, F.; Serero, A.; Lopes, J.; Teulade-Fichou, M.P.; Phan, A.T.; Nicolas, A. Short loop length and high thermal stability determine genomic instability induced by G-quadruplex-forming minisatellites. EMBO J. 2015, 34, 1718–1734. [Google Scholar] [CrossRef]
  12. London, T.B.; Barber, L.J.; Mosedale, G.; Kelly, G.P.; Balasubramanian, S.; Hickson, I.D.; Boulton, S.J.; Hiom, K. FANCJ is a structure-specific DNA helicase associated with the maintenance of genomic G/C tracts. J. Biol. Chem. 2008, 283, 36132–36139. [Google Scholar] [CrossRef] [PubMed]
  13. Ribeyre, C.; Lopes, J.; Boulé, J.-B.; Piazza, A.; Guédin, A.; Zakian, V.A.; Mergny, J.-L.; Nicolas, A. The yeast Pif1 helicase prevents genomic instability caused by G-quadruplex-forming CEB1 sequences in vivo. PLoS Genet. 2009, 5, e1000475. [Google Scholar] [CrossRef] [PubMed]
  14. De Magis, A.; Manzo, S.G.; Russo, M.; Marinello, J.; Morigi, R.; Sordet, O.; Capranico, G. DNA damage and genome instability by G-quadruplex ligands are mediated by R loops in human cancer cells. Proc. Natl. Acad. Sci. USA 2019, 116, 816–825. [Google Scholar] [CrossRef] [PubMed]
  15. Madireddy, A.; Purushothaman, P.; Loosbroock, C.P.; Robertson, E.S.; Schildkraut, C.L.; Verma, S.C. G-quadruplex-interacting compounds alter latent DNA replication and episomal persistence of KSHV. Nucleic Acids Res. 2016, 44, 3675–3694. [Google Scholar] [CrossRef] [PubMed]
  16. Lee, J.; Sung, K.; Joo, S.Y.; Jeong, J.-H.; Kim, S.K.; Lee, H. Dynamic interaction of BRCA2 with telomeric G-quadruplexes underlies telomere replication homeostasis. Nat. Commun. 2022, 13, 3396. [Google Scholar] [CrossRef]
  17. Mei, Y.; Deng, Z.; Vladimirova, O.; Gulve, N.; Johnson, F.B.; Drosopoulos, W.C.; Schildkraut, C.L.; Lieberman, P.M. TERRA G-quadruplex RNA interaction with TRF2 GAR domain is required for telomere integrity. Sci. Rep. 2021, 11, 3509. [Google Scholar] [CrossRef]
  18. Zimmer, J.; Tacconi, E.M.; Folio, C.; Badie, S.; Porru, M.; Klare, K.; Tumiati, M.; Markkanen, E.; Halder, S.; Ryan, A. Targeting BRCA1 and BRCA2 deficiencies with G-quadruplex-interacting compounds. Mol. Cell 2016, 61, 449–460. [Google Scholar] [CrossRef]
  19. Siddiqui-Jain, A.; Grand, C.L.; Bearss, D.J.; Hurley, L.H. Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription. Proc. Natl. Acad. Sci. USA 2002, 99, 11593–11598. [Google Scholar] [CrossRef]
  20. Gros, J.; Rosu, F.; Amrane, S.; De Cian, A.; Gabelica, V.; Lacroix, L.; Mergny, J.-L. Guanines are a quartet’s best friend: Impact of base substitutions on the kinetics and stability of tetramolecular quadruplexes. Nucleic Acids Res. 2007, 35, 3064–3075. [Google Scholar] [CrossRef]
  21. Didiot, M.-C.; Tian, Z.; Schaeffer, C.; Subramanian, M.; Mandel, J.-L.; Moine, H. The G-quartet containing FMRP binding site in FMR1 mRNA is a potent exonic splicing enhancer. Nucleic Acids Res. 2008, 36, 4902–4912. [Google Scholar] [CrossRef]
  22. Chaudhary, S.; Kaushik, M.; Ahmed, S.; Kukreti, R.; Kukreti, S. Structural switch from hairpin to duplex/antiparallel G-quadruplex at single-nucleotide polymorphism (SNP) site of human apolipoprotein E (APOE) gene coding region. ACS Omega 2018, 3, 3173–3182. [Google Scholar] [CrossRef]
  23. Bharti, S.K.; Sommers, J.A.; George, F.; Kuper, J.; Hamon, F.; Shin-ya, K.; Teulade-Fichou, M.-P.; Kisker, C.; Brosh, R.M. Specialization among iron-sulfur cluster helicases to resolve G-quadruplex DNA structures that threaten genomic stability. J. Biol. Chem. 2013, 288, 28217–28229. [Google Scholar] [CrossRef]
  24. Baral, A.; Kumar, P.; Halder, R.; Mani, P.; Yadav, V.K.; Singh, A.; Das, S.K.; Chowdhury, S. Quadruplex-single nucleotide polymorphisms (Quad-SNP) influence gene expression difference among individuals. Nucleic Acids Res. 2012, 40, 3800–3811. [Google Scholar] [CrossRef]
  25. Gong, J.-Y.; Wen, C.-J.; Tang, M.-L.; Duan, R.-F.; Chen, J.-N.; Zhang, J.-Y.; Zheng, K.-W.; He, Y.-D.; Hao, Y.-H.; Yu, Q. G-quadruplex structural variations in human genome associated with single-nucleotide variations and their impact on gene activity. Proc. Natl. Acad. Sci. USA 2021, 118, e2013230118. [Google Scholar] [CrossRef]
  26. Kuznetsova, A.A.; Fedorova, O.S.; Kuznetsov, N.A. Lesion recognition and cleavage of damage-containing quadruplexes and bulged structures by DNA glycosylases. Front. Cell Dev. Biol. 2020, 8, 595687. [Google Scholar] [CrossRef]
  27. Bedrat, A.; Lacroix, L.; Mergny, J.-L. Re-evaluation of G-quadruplex propensity with G4Hunter. Nucleic Acids Res. 2016, 44, 1746–1759. [Google Scholar] [CrossRef]
  28. Huppert, J.L.; Balasubramanian, S. Prevalence of quadruplexes in the human genome. Nucleic Acids Res. 2005, 33, 2908–2916. [Google Scholar] [CrossRef]
  29. Fleming, A.M.; Burrows, C.J. G-quadruplex folds of the human telomere sequence alter the site reactivity and reaction pathway of guanine oxidation compared to duplex DNA. Chem. Res. Toxicol. 2013, 26, 593–607. [Google Scholar] [CrossRef]
  30. Lee, D.S.; Ghanem, L.R.; Barash, Y. Integrative analysis reveals RNA G-quadruplexes in UTRs are selectively constrained and enriched for functional associations. Nat. Commun. 2020, 11, 527. [Google Scholar] [CrossRef]
  31. Zhou, J.; Fleming, A.M.; Averill, A.M.; Burrows, C.J.; Wallace, S.S. The NEIL glycosylases remove oxidized guanine lesions from telomeric and promoter quadruplex DNA structures. Nucleic Acids Res. 2015, 43, 4039–4054. [Google Scholar] [CrossRef] [PubMed]
  32. Adrian, M.; Heddi, B.; Phan, A.T. NMR spectroscopy of G-quadruplexes. Methods 2012, 57, 11–24. [Google Scholar] [CrossRef] [PubMed]
  33. Chambers, V.S.; Marsico, G.; Boutell, J.M.; Di Antonio, M.; Smith, G.P.; Balasubramanian, S. High-throughput sequencing of DNA G-quadruplex structures in the human genome. Nat. Biotechnol. 2015, 33, 877–881. [Google Scholar] [CrossRef]
  34. Nakken, S.; Rognes, T.; Hovig, E. The disruptive positions in human G-quadruplex motifs are less polymorphic and more conserved than their neutral counterparts. Nucleic Acids Res. 2009, 37, 5749–5756. [Google Scholar] [CrossRef]
  35. Lim, K.W.; Alberti, P.; Guedin, A.; Lacroix, L.; Riou, J.-F.; Royle, N.J.; Mergny, J.-L.; Phan, A.T. Sequence variant (CTAGGG)n in the human telomere favors a G-quadruplex structure containing a G·C·G·C tetrad. Nucleic Acids Res. 2009, 37, 6239–6248. [Google Scholar] [CrossRef]
  36. Sagne, C.; Marcel, V.; Bota, M.; Martel-Planche, G.; Nobrega, A.; Palmero, E.I.; Perriaud, L.; Boniol, M.; Vagner, S.; Cox, D.G. Age at cancer onset in germline TP53 mutation carriers: Association with polymorphisms in predicted G-quadruplex structures. Carcinogenesis 2014, 35, 807–815. [Google Scholar] [CrossRef]
  37. Yeom, M.; Kim, I.-H.; Kim, J.-K.; Kang, K.; Eoff, R.L.; Guengerich, F.P.; Choi, J.-Y. Effects of twelve germline missense variations on DNA lesion and G-quadruplex bypass activities of human DNA polymerase REV1. Chem. Res. Toxicol. 2016, 29, 367–379. [Google Scholar] [CrossRef]
  38. Hänsel-Hertsch, R.; Simeone, A.; Shea, A.; Hui, W.W.; Zyner, K.G.; Marsico, G.; Rueda, O.M.; Bruna, A.; Martin, A.; Zhang, X. Landscape of G-quadruplex DNA structural regions in breast cancer. Nat. Genet. 2020, 52, 878–883. [Google Scholar] [CrossRef]
  39. Zhang, R.; Shu, H.; Wang, Y.; Tao, T.; Tu, J.; Wang, C.; Mergny, J.-L.; Sun, X. G-quadruplex structures are key modulators of somatic structural variants in cancers. Cancer Res. 2023, 83, 1234–1248. [Google Scholar] [CrossRef]
  40. Bamford, S.; Dawson, E.; Forbes, S.; Clements, J.; Pettett, R.; Dogan, A.; Flanagan, A.; Teague, J.; Futreal, P.A.; Stratton, M.R. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br. J. Cancer 2004, 91, 355–358. [Google Scholar] [CrossRef] [PubMed]
  41. Landrum, M.J.; Lee, J.M.; Benson, M.; Brown, G.R.; Chao, C.; Chitipiralla, S.; Gu, B.; Hart, J.; Hoffman, D.; Jang, W. ClinVar: Improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018, 46, D1062–D1067. [Google Scholar] [CrossRef]
  42. Quinlan, A.R.; Hall, I.M. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 2010, 26, 841–842. [Google Scholar] [CrossRef] [PubMed]
  43. Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 2011, 27, 2987–2993. [Google Scholar] [CrossRef] [PubMed]
  44. Shigemizu, D.; Fujimoto, A.; Akiyama, S.; Abe, T.; Nakano, K.; Boroevich, K.A.; Yamamoto, Y.; Furuta, M.; Kubo, M.; Nakagawa, H. A practical method to detect SNVs and indels from whole genome and exome sequencing data. Sci. Rep. 2013, 3, 2161. [Google Scholar] [CrossRef] [PubMed]
  45. Zook, J.M.; Chapman, B.; Wang, J.; Mittelman, D.; Hofmann, O.; Hide, W.; Salit, M. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat. Biotechnol. 2014, 32, 246–251. [Google Scholar] [CrossRef] [PubMed]
  46. Gruber, A.R.; Lorenz, R.; Bernhart, S.H.; Neuböck, R.; Hofacker, I.L. The vienna RNA websuite. Nucleic Acids Res. 2008, 36, W70–W74. [Google Scholar] [CrossRef] [PubMed]
  47. Cavalcante, R.G.; Sartor, M.A. Annotatr: Genomic regions in context. Bioinformatics 2017, 33, 2381–2383. [Google Scholar] [CrossRef]
  48. Hammal, F.; de Langen, P.; Bergon, A.; Lopez, F.; Ballester, B. ReMap 2022: A database of Human, Mouse, Drosophila and Arabidopsis regulatory regions from an integrative analysis of DNA-binding sequencing experiments. Nucleic Acids Res. 2022, 50, D316–D325. [Google Scholar] [CrossRef]
  49. Raudvere, U.; Kolberg, L.; Kuzmin, I.; Arak, T.; Adler, P.; Peterson, H.; Vilo, J. g:Profiler: A web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 2019, 47, W191–W198. [Google Scholar] [CrossRef]
  50. Szklarczyk, D.; Gable, A.L.; Nastou, K.C.; Lyon, D.; Kirsch, R.; Pyysalo, S.; Doncheva, N.T.; Legeay, M.; Fang, T.; Bork, P. The STRING database in 2021: Customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021, 49, D605–D612. [Google Scholar] [CrossRef]
  51. Coetzee, S.G.; Coetzee, G.A.; Hazelett, D.J. motifbreakR: An R/Bioconductor package for predicting variant effects at transcription factor binding sites. Bioinformatics 2015, 31, 3847–3849. [Google Scholar] [CrossRef]
  52. Sved, J.; Bird, A. The expected equilibrium of the CpG dinucleotide in vertebrate genomes under a mutation model. Proc. Natl. Acad. Sci. USA 1990, 87, 4692–4696. [Google Scholar] [CrossRef] [PubMed]
  53. Youk, J.; An, Y.; Park, S.; Lee, J.-K.; Ju, Y.S. The genome-wide landscape of C:G > T:A polymorphism at the CpG contexts in the human population. BMC Genom. 2020, 21, 270. [Google Scholar] [CrossRef]
  54. Beaudoin, J.-D.; Perreault, J.-P. 5′-UTR G-quadruplex structures acting as translational repressors. Nucleic Acids Res. 2010, 38, 7022–7036. [Google Scholar] [CrossRef] [PubMed]
  55. Bolduc, F.; Garant, J.-M.; Allard, F.; Perreault, J.-P. Irregular G-quadruplexes found in the untranslated regions of human mRNAs influence translation. J. Biol. Chem. 2016, 291, 21751–21760. [Google Scholar] [CrossRef]
  56. Capra, J.A.; Paeschke, K.; Singh, M.; Zakian, V.A. G-quadruplex DNA sequences are evolutionarily conserved and associated with distinct genomic features in Saccharomyces cerevisiae. PLoS Comput. Biol. 2010, 6, e1000861. [Google Scholar] [CrossRef]
  57. Cogoi, S.; Xodo, L.E. G-quadruplex formation within the promoter of the KRAS proto-oncogene and its effect on transcription. Nucleic Acids Res. 2006, 34, 2536–2549. [Google Scholar] [CrossRef] [PubMed]
  58. David, A.P.; Margarit, E.; Domizi, P.; Banchio, C.; Armas, P.; Calcaterra, N.B. G-quadruplexes as novel cis-elements controlling transcription during embryonic development. Nucleic Acids Res. 2016, 44, 4163–4173. [Google Scholar] [CrossRef]
  59. Endoh, T.; Kawasaki, Y.; Sugimoto, N. Suppression of gene expression by G-quadruplexes in open reading frames depends on G-quadruplex stability. Angew. Chem. Int. Ed. 2013, 52, 5522–5526. [Google Scholar] [CrossRef]
  60. Farhath, M.M.; Thompson, M.; Ray, S.; Sewell, A.; Balci, H.; Basu, S. G-Quadruplex-enabling sequence within the human tyrosine hydroxylase promoter differentially regulates transcription. Biochemistry 2015, 54, 5533–5545. [Google Scholar] [CrossRef]
  61. Fernando, H.; Sewitz, S.; Darot, J.; Tavare, S.; Huppert, J.L.; Balasubramanian, S. Genome-wide analysis of a G-quadruplex-specific single-chain antibody that regulates gene expression. Nucleic Acids Res. 2009, 37, 6716–6722. [Google Scholar] [CrossRef] [PubMed]
  62. Lago, S.; Nadai, M.; Cernilogar, F.M.; Kazerani, M.; Domíniguez Moreno, H.; Schotta, G.; Richter, S.N. Promoter G-quadruplexes and transcription factors cooperate to shape the cell type-specific transcriptome. Nat. Commun. 2021, 12, 3885. [Google Scholar] [CrossRef] [PubMed]
  63. Rezzoug, F.; Thomas, S.D.; Rouchka, E.C.; Miller, D.M. Discovery of a family of genomic sequences which interact specifically with the c-MYC promoter to regulate c-MYC expression. PLoS ONE 2016, 11, e0161588. [Google Scholar] [CrossRef]
  64. Shao, X.; Zhang, W.; Umar, M.I.; Wong, H.Y.; Seng, Z.; Xie, Y.; Zhang, Y.; Yang, L.; Kwok, C.K.; Deng, X. RNA G-quadruplex structures mediate gene regulation in bacteria. MBio 2020, 11, e02926-19. [Google Scholar] [CrossRef]
  65. Spiegel, J.; Cuesta, S.M.; Adhikari, S.; Hänsel-Hertsch, R.; Tannahill, D.; Balasubramanian, S. G-quadruplexes are transcription factor binding hubs in human chromatin. Genome Biol. 2021, 22, 117. [Google Scholar] [CrossRef]
  66. Belotserkovskii, B.P.; Soo Shin, J.H.; Hanawalt, P.C. Strong transcription blockage mediated by R-loop formation within a G-rich homopurine–homopyrimidine sequence localized in the vicinity of the promoter. Nucleic Acids Res. 2017, 45, 6589–6599. [Google Scholar] [CrossRef]
  67. Polak, P.; Arndt, P.F. Transcription induces strand-specific mutations at the 5′ end of human genes. Genome Res. 2008, 18, 1216–1223. [Google Scholar] [CrossRef]
  68. Park, J.W.; Han, Y.I.; Kim, S.W.; Kim, T.M.; Yeom, S.C.; Kang, J.; Park, J. 8-OxoG in GC-rich Sp1 binding sites enhances gene transcription in adipose tissue of juvenile mice. Sci. Rep. 2019, 9, 15618. [Google Scholar] [CrossRef]
  69. Cave, J.W.; Willis, D.E. G-quadruplex regulation of neural gene expression. FEBS J. 2022, 289, 3284–3303. [Google Scholar] [CrossRef]
  70. Westmark, C.J.; Malter, J.S. FMRP mediates mGluR5-dependent translation of amyloid precursor protein. PLoS Biol. 2007, 5, e52. [Google Scholar] [CrossRef]
  71. Huang, Z.-L.; Dai, J.; Luo, W.-H.; Wang, X.-G.; Tan, J.-H.; Chen, S.-B.; Huang, Z.-S. Identification of G-quadruplex-binding protein from the exploration of RGG motif/G-quadruplex interactions. J. Am. Chem. Soc. 2018, 140, 17945–17955. [Google Scholar] [CrossRef]
  72. Fürst, J.; Schedlbauer, A.; Gandini, R.; Garavaglia, M.L.; Saino, S.; Gschwentner, M.; Sarg, B.; Lindner, H.; Jakab, M.; Ritter, M. ICln159 Folds into a Pleckstrin Homology Domain-like Structure: Interaction with kinases and the splicing factor LSm4. J. Biol. Chem. 2005, 280, 31276–31282. [Google Scholar] [CrossRef]
  73. Gervais, V.; Lamour, V.; Jawhari, A.; Frindel, F.; Wasielewski, E.; Dubaele, S.; Egly, J.-M.; Thierry, J.-C.; Kieffer, B.; Poterszman, A. TFIIH contains a PH domain involved in DNA nucleotide excision repair. Nat. Struct. Mol. Biol. 2004, 11, 616–622. [Google Scholar] [CrossRef] [PubMed]
  74. Das, K.; Srivastava, M.; Raghavan, S.C. GNG motifs can replace a GGG stretch during G-quadruplex formation in a context dependent manner. PLoS ONE 2016, 11, e0158794. [Google Scholar] [CrossRef] [PubMed]
  75. Neupane, A.; Chariker, J.H.; Rouchka, E.C. Structural and functional classification of G-quadruplex families within the human genome. Genes 2023, 14, 645. [Google Scholar] [CrossRef] [PubMed]
  76. Verma, A.; Halder, K.; Halder, R.; Yadav, V.K.; Rawal, P.; Thakur, R.K.; Mohd, F.; Sharma, A.; Chowdhury, S. Genome-wide computational and expression analyses reveal G-quadruplex DNA motifs as conserved cis-regulatory elements in human and related species. J. Med. Chem. 2008, 51, 5641–5649. [Google Scholar] [CrossRef]
  77. Yadav, V.K.; Abraham, J.K.; Mani, P.; Kulshrestha, R.; Chowdhury, S. QuadBase: Genome-wide database of G4 DNA—Occurrence and conservation in human, chimpanzee, mouse and rat promoters and 146 microbes. Nucleic Acids Res. 2007, 36, D381–D385. [Google Scholar] [CrossRef] [PubMed]
  78. Pitié, M.; Boldron, C.; Pratviel, G. DNA oxidation by copper and manganese complexes. In Advances in Inorganic Chemistry; Elsevier: Amsterdam, The Netherlands, 2006; Volume 58, pp. 77–130. [Google Scholar]
  79. Bielskutė, S.; Plavec, J.; Podbevšek, P. Impact of oxidative lesions on the human telomeric G-quadruplex. J. Am. Chem. Soc. 2019, 141, 2594–2603. [Google Scholar] [CrossRef] [PubMed]
  80. Liguori, I.; Russo, G.; Curcio, F.; Bulli, G.; Aran, L.; Della-Morte, D.; Gargiulo, G.; Testa, G.; Cacciatore, F.; Bonaduce, D. Oxidative stress, aging, and diseases. Clin. Interv. Aging 2018, 13, 757. [Google Scholar] [CrossRef]
  81. Ishii, T.; Sekiguchi, M. Two ways of escaping from oxidative RNA damage: Selective degradation and cell death. DNA Repair 2019, 81, 102666. [Google Scholar] [CrossRef]
  82. Russo, M.T.; De Luca, G.; Degan, P.; Bignami, M. Different DNA repair strategies to combat the threat from 8-oxoguanine. Mutat. Res./Fundam. Mol. Mech. Mutagen. 2007, 614, 69–76. [Google Scholar] [CrossRef] [PubMed]
  83. Holliday, R.; Grigg, G. DNA methylation and mutation. Mutat. Res./Fundam. Mol. Mech. Mutagen. 1993, 285, 61–67. [Google Scholar] [CrossRef]
  84. Banda, D.M.; Nuñez, N.N.; Burnside, M.A.; Bradshaw, K.M.; David, S.S. Repair of 8-oxoG:A mismatches by the MUTYH glycosylase: Mechanism, metals and medicine. Free Radic. Biol. Med. 2017, 107, 202–215. [Google Scholar] [CrossRef] [PubMed]
  85. van Loon, B.; Hübscher, U. An 8-oxo-guanine repair pathway coordinated by MUTYH glycosylase and DNA polymerase λ. Proc. Natl. Acad. Sci. USA 2009, 106, 18201–18206. [Google Scholar] [CrossRef]
  86. Viel, A.; Bruselles, A.; Meccia, E.; Fornasarig, M.; Quaia, M.; Canzonieri, V.; Policicchio, E.; Urso, E.D.; Agostini, M.; Genuardi, M. A specific mutational signature associated with DNA 8-oxoguanine persistence in MUTYH-defective colorectal cancer. EBioMedicine 2017, 20, 39–49. [Google Scholar] [CrossRef]
  87. Bellon, S.; Shikazono, N.; Cunniffe, S.; Lomax, M.; O’Neill, P. Processing of thymine glycol in a clustered DNA damage site: Mutagenic or cytotoxic. Nucleic Acids Res. 2009, 37, 4430–4440. [Google Scholar] [CrossRef] [PubMed]
  88. Fleming, A.M.; Zhou, J.; Wallace, S.S.; Burrows, C.J. A role for the fifth G-track in G-quadruplex forming oncogene promoter sequences during oxidative stress: Do these “spare tires” have an evolved function? ACS Cent. Sci. 2015, 1, 226–233. [Google Scholar] [CrossRef]
  89. Fleming, A.M.; Zhu, J.; Ding, Y.; Burrows, C.J. 8-Oxo-7,8-dihydroguanine in the context of a gene promoter G-quadruplex is an on–off switch for transcription. ACS Chem. Biol. 2017, 12, 2417–2426. [Google Scholar] [CrossRef]
  90. Sun, D.; Liu, W.-J.; Guo, K.; Rusche, J.J.; Ebbinghaus, S.; Gokhale, V.; Hurley, L.H. The proximal promoter region of the human vascular endothelial growth factor gene has a G-quadruplex structure that can be targeted by G-quadruplex–interactive agents. Mol. Cancer Ther. 2008, 7, 880–889. [Google Scholar] [CrossRef]
  91. Monsen, R.C.; DeLeeuw, L.; Dean, W.L.; Gray, R.D.; Sabo, T.M.; Chakravarthy, S.; Chaires, J.B.; Trent, J.O. The hTERT core promoter forms three parallel G-quadruplexes. Nucleic Acids Res. 2020, 48, 5720–5734. [Google Scholar] [CrossRef]
  92. Palumbo, S.L.; Ebbinghaus, S.W.; Hurley, L.H. Formation of a unique end-to-end stacked pair of G-quadruplexes in the hTERT core promoter with implications for inhibition of telomerase by G-quadruplex-interactive ligands. J. Am. Chem. Soc. 2009, 131, 10878–10891. [Google Scholar] [CrossRef]
  93. Ramon, O.; Sauvaigo, S.; Gasparutto, D.; Faure, P.; Favier, A.; Cadet, J. Effects of 8-oxo-7,8-dihydro-2′-deoxyguanosine on the binding of the transcription factor Sp1 to its cognate target DNA sequence (GC box). Free Radic. Res. 1999, 31, 217–229. [Google Scholar] [CrossRef] [PubMed]
  94. Hailer-Morrison, M.K.; Kotler, J.M.; Martin, B.D.; Sugden, K.D. Oxidized guanine lesions as modulators of gene transcription. Altered p50 binding affinity and repair shielding by 7,8-dihydro-8-oxo-2′-deoxyguanosine lesions in the NF-κB promoter element. Biochemistry 2003, 42, 9761–9770. [Google Scholar] [CrossRef] [PubMed]
  95. Moore, S.P.; Toomire, K.J.; Strauss, P.R. DNA modifications repaired by base excision repair are epigenetic. DNA Repair 2013, 12, 1152–1158. [Google Scholar] [CrossRef] [PubMed]
  96. Valinluck, V.; Tsai, H.-H.; Rogstad, D.K.; Burdzy, A.; Bird, A.; Sowers, L.C. Oxidative damage to methyl-CpG sequences inhibits the binding of the methyl-CpG binding domain (MBD) of methyl-CpG binding protein 2 (MeCP2). Nucleic Acids Res. 2004, 32, 4100–4108. [Google Scholar] [CrossRef] [PubMed]
  97. Reynolds, N.; O’Shaughnessy, A.; Hendrich, B. Transcriptional repressors: Multifaceted regulators of gene expression. Development 2013, 140, 505–512. [Google Scholar] [CrossRef]
  98. Vidal, M.; Starowicz, K. Polycomb complexes PRC1 and their function in hematopoiesis. Exp. Hematol. 2017, 48, 12–31. [Google Scholar] [CrossRef] [PubMed]
  99. Beltran, M.; Tavares, M.; Justin, N.; Khandelwal, G.; Ambrose, J.; Foster, B.M.; Worlock, K.B.; Tvardovskiy, A.; Kunzelmann, S.; Herrero, J. G-tract RNA removes Polycomb repressive complex 2 from genes. Nat. Struct. Mol. Biol. 2019, 26, 899–909. [Google Scholar] [CrossRef]
  100. Wang, X.; Goodrich, K.J.; Gooding, A.R.; Naeem, H.; Archer, S.; Paucek, R.D.; Youmans, D.T.; Cech, T.R.; Davidovich, C. Targeting of polycomb repressive complex 2 to RNA by short repeats of consecutive guanines. Mol. Cell 2017, 65, 1056–1067.e1055. [Google Scholar] [CrossRef]
  101. Bak, S.T.; Sakellariou, D.; Pena-Diaz, J. The dual nature of mismatch repair as antimutator and mutator: For better or for worse. Front. Genet. 2014, 5, 287. [Google Scholar] [CrossRef]
  102. Carell, T.; Kurz, M.Q.; Müller, M.; Rossa, M.; Spada, F. Non-canonical bases in the genome: The regulatory information layer in DNA. Angew. Chem. Int. Ed. 2018, 57, 4296–4312. [Google Scholar] [CrossRef]
  103. Malfatti, M.C.; Antoniali, G.; Codrich, M.; Burra, S.; Mangiapane, G.; Dalla, E.; Tell, G. New perspectives in cancer biology from a study of canonical and non-canonical functions of base excision repair proteins with a focus on early steps. Mutagenesis 2020, 35, 129–149. [Google Scholar] [CrossRef]
  104. Saini, N.; Zhang, Y.; Usdin, K.; Lobachev, K.S. When secondary comes first–the importance of non-canonical DNA structures. Biochimie 2013, 95, 117–123. [Google Scholar] [CrossRef] [PubMed]
  105. Bignami, M.; Mazzei, F. Base Excision Repair in Trinucleotide Repeat Expansion Disorders. In The Base Excision Repair Pathway: Molecular Mechanisms and Role in Disease Development and Therapeutic Design; World Scientific: Singapore, 2017; pp. 501–522. [Google Scholar]
  106. Dexheimer, T.S. DNA repair pathways and mechanisms. In DNA Repair of Cancer Stem Cells; Springer: Berlin/Heidelberg, Germany, 2013; pp. 19–32. [Google Scholar]
  107. Varshney, D.; Spiegel, J.; Zyner, K.; Tannahill, D.; Balasubramanian, S. The regulation and functions of DNA and RNA G-quadruplexes. Nat. Rev. Mol. Cell Biol. 2020, 21, 459–474. [Google Scholar] [CrossRef] [PubMed]
  108. Wang, Q.; Liu, J.-Q.; Chen, Z.; Zheng, K.-W.; Chen, C.-Y.; Hao, Y.-H.; Tan, Z. G-quadruplex formation at the 3′ end of telomere DNA inhibits its extension by telomerase, polymerase and unwinding by helicase. Nucleic Acids Res. 2011, 39, 6229–6237. [Google Scholar] [CrossRef]
  109. Lee, C.Y.; Park, K.S.; Park, H.G. A fluorescent G-quadruplex probe for the assay of base excision repair enzyme activity. Chem. Commun. 2015, 51, 13744–13747. [Google Scholar] [CrossRef] [PubMed]
  110. Leung, K.-H.; He, H.-Z.; Ma, V.P.-Y.; Zhong, H.-J.; Chan, D.S.-H.; Zhou, J.; Mergny, J.-L.; Leung, C.-H.; Ma, D.-L. Detection of base excision repair enzyme activity using a luminescent G-quadruplex selective switch-on probe. Chem. Commun. 2013, 49, 5630–5632. [Google Scholar] [CrossRef]
  111. Craddock, N.; Jones, L.; Jones, I.R.; Kirov, G.; Green, E.K.; Grozeva, D.; Moskvina, V.; Nikolov, I.; Hamshere, M.L.; Vukcevic, D. Strong genetic evidence for a selective influence of GABAA receptors on a component of the bipolar disorder phenotype. Mol. Psychiatry 2010, 15, 146–153. [Google Scholar] [CrossRef]
  112. Jin, Z.; Gao, F.; Flagg, T.; Deng, X. Tobacco-specific nitrosamine 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone promotes functional cooperation of Bcl2 and c-Myc through phosphorylation in regulating cell survival and proliferation. J. Biol. Chem. 2004, 279, 40209–40219. [Google Scholar] [CrossRef]
  113. Mattila, P.M.; Röyttä, M.; Lönnberg, P.; Marjamäki, P.; Helenius, H.; Rinne, J.O. Choline acetyltransferase activity and striatal dopamine receptors in Parkinson’s disease in relation to cognitive impairment. Acta Neuropathol. 2001, 102, 160–166. [Google Scholar] [CrossRef]
  114. Tarabeux, J.; Kebir, O.; Gauthier, J.; Hamdan, F.; Xiong, L.; Piton, A.; Spiegelman, D.; Henrion, É.; Millet, B.; Fathalli, F. Rare mutations in N-methyl-D-aspartate glutamate receptors in autism spectrum disorders and schizophrenia. Transl. Psychiatry 2011, 1, e55. [Google Scholar] [CrossRef]
  115. Williams, N.M.; Bowen, T.; Spurlock, G.; Norton, N.; Williams, H.J.; Hoogendoorn, B.; Owen, M.J.; O’Donovan, M.C. Determination of the genomic structure and mutation screening in schizophrenic individuals for five subunits of the N-methyl-D-aspartate glutamate receptor. Mol. Psychiatry 2002, 7, 508–514. [Google Scholar] [CrossRef]
  116. Yu, Y.; Lin, Y.; Takasaki, Y.; Wang, C.; Kimura, H.; Xing, J.; Ishizuka, K.; Toyama, M.; Kushima, I.; Mori, D. Rare loss of function mutations in N-methyl-D-aspartate glutamate receptors and their contributions to schizophrenia susceptibility. Transl. Psychiatry 2018, 8, 12. [Google Scholar] [CrossRef] [PubMed]
  117. Schofield, J.P.; Cowan, J.L.; Coldwell, M.J. G-quadruplexes mediate local translation in neurons. Biochem. Soc. Trans. 2015, 43, 338–342. [Google Scholar] [CrossRef] [PubMed]
  118. Subramanian, M.; Rage, F.; Tabet, R.; Flatter, E.; Mandel, J.L.; Moine, H. G-quadruplex RNA structure as a signal for neurite mRNA targeting. EMBO Rep. 2011, 12, 697–704. [Google Scholar] [CrossRef] [PubMed]
  119. Kharel, P.; Balaratnam, S.; Beals, N.; Basu, S. The role of RNA G-quadruplexes in human diseases and therapeutic strategies. Wiley Interdiscip. Rev. RNA 2020, 11, e1568. [Google Scholar] [CrossRef] [PubMed]
  120. Lazniewska, J.; Milowska, K.; Zablocka, M.; Mignani, S.; Caminade, A.-M.; Majoral, J.-P.; Bryszewska, M.; Gabryelak, T. Mechanism of cationic phosphorus dendrimer toxicity against murine neural cell lines. Mol. Pharm. 2013, 10, 3484–3496. [Google Scholar] [CrossRef]
  121. Zeng, Y.; Kurokawa, Y.; Win-Shwe, T.-T.; Zeng, Q.; Hirano, S.; Zhang, Z.; Sone, H. Effects of PAMAM dendrimers with various surface functional groups and multiple generations on cytotoxicity and neuronal differentiation using human neural progenitor cells. J. Toxicol. Sci. 2016, 41, 351–370. [Google Scholar] [CrossRef]
  122. Grün, J.T.; Schwalbe, H. Folding dynamics of polymorphic G-quadruplex structures. Biopolymers 2022, 113, e23477. [Google Scholar] [CrossRef]
  123. Stevens, A.J.; de Jong, L.; Kennedy, M.A. The Dynamic Regulation of G-Quadruplex DNA Structures by Cytosine Methylation. Int. J. Mol. Sci. 2022, 23, 2407. [Google Scholar] [CrossRef]
  124. Lech, C.J.; Heddi, B.; Phan, A.T. Guanine base stacking in G-quadruplex nucleic acids. Nucleic Acids Res. 2013, 41, 2034–2046. [Google Scholar] [CrossRef] [PubMed]
  125. Illingworth, R.S.; Gruenewald-Schneider, U.; Webb, S.; Kerr, A.R.; James, K.D.; Turner, D.J.; Smith, C.; Harrison, D.J.; Andrews, R.; Bird, A.P. Orphan CpG islands identify numerous conserved promoters in the mammalian genome. PLoS Genet. 2010, 6, e1001134. [Google Scholar] [CrossRef] [PubMed]
  126. Szeltner, Z.; Ferenc, G.; Juhász, T.; Kupihár, Z.; Váradi, Z.; Szüts, D.; Kovács, L. Probing telomeric-like G4 structures with full or partial 2′-deoxy-5-hydroxyuridine substitutions. Biochimie 2023, 214, 33–44. [Google Scholar] [CrossRef] [PubMed]
  127. Dézé, O.; Laffleur, B.; Cogné, M. Roles of G4-DNA and G4-RNA in Class Switch Recombination and Additional Regulations in B-Lymphocytes. Molecules 2023, 28, 1159. [Google Scholar] [CrossRef]
Figure 1. G4 structure of a guanine tetrad formed via Hoogsteen bond formation. dR: sugar–phosphate groups; cyan: guanine nucleotides; red: Hoogsteen bonds.
Figure 2. SNV mutation types found in quartets of different lengths for (a) intergenic regions, (b) non-template strand genic regions, and (c) template strand genic regions from the COSMIC database.
Figure 3. G4 variants detected within annotated functional regions (CDS, exons, 5′UTRs, 3′UTRs, CpG islands, lncRNA, introns, intergenic promoters, and enhancers). Shown are (a) the count of change in pG4 with a G4Hunter score across both strands before and after mutation (0: the absence of pG4; +: the presence of G4 in the forward strand; -: the presence of G4 in the reverse strand); (b) the percentage of the type of mutation across annotations from the COSMIC database; and (c) the percentage of SNVs that occur in a G4 region across the template and non-template strand for functional annotation groups within the CLINVAR database.
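The 0/+/- notation in panel (a) can be illustrated with a minimal G4Hunter-style scoring sketch. It follows the published G4Hunter scheme as generally described (each G in a run of n guanines scores min(n, 4), each C scores the negative equivalent, A and T score 0, and the score is the mean over the sequence). The threshold of 1.5, the whole-sequence mean in place of a sliding window, and the helper names `pg4_state` and `apply_snv` are assumptions made for illustration and may not match the settings used in this study.

```python
from itertools import groupby


def g4hunter_scores(seq):
    """Per-base G4Hunter-style scores: G runs positive, C runs negative."""
    scores = []
    for base, run in groupby(seq.upper()):
        n = len(list(run))
        if base == "G":
            value = min(n, 4)
        elif base == "C":
            value = -min(n, 4)
        else:
            value = 0
        scores.extend([value] * n)
    return scores


def mean_score(seq):
    """Mean per-base score over the whole sequence (no sliding window)."""
    scores = g4hunter_scores(seq)
    return sum(scores) / len(scores) if scores else 0.0


def pg4_state(seq, threshold=1.5):
    """Return '+', '-' or '0'; the threshold is an assumed value."""
    score = mean_score(seq)
    if score >= threshold:
        return "+"   # G-rich: pG4 on the given (forward) strand
    if score <= -threshold:
        return "-"   # C-rich: pG4 on the reverse strand
    return "0"       # no pG4 called


def apply_snv(seq, pos, alt):
    """Substitute the base at 0-based position pos with alt."""
    return seq[:pos] + alt + seq[pos + 1:]


ref = "GGGTTAGGGTTAGGGTTAGGG"        # telomere-like example sequence
alt = apply_snv(ref, 1, "A")        # G>A inside the first G-run
print(pg4_state(ref), "->", pg4_state(alt))
```

Under these assumed settings the reference sequence is called as a forward-strand pG4 and the variant is not, which corresponds to a "+" to "0" transition in the figure's notation.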
Figure 4. Thermodynamic changes associated with variants in various genomic features. (a) Non-zero DNA-based ∆MFE associated with variants in genomic features (introns, promoters, exons, enhancers, intergenic regions, and CpG islands) for the alternate allele (top) and the reference allele (bottom). (b) RNA-based ∆MFE associated with variants in genomic features (lncRNA, 3′UTR, promoters, 5′UTR, and intergenic regions) for the alternate allele (top) and the reference allele (bottom). (c) Stabilization effects for DNA-based annotations in non-gene regions (i.e., intergenic) or on the template or non-template strand for variants within gene regions. (d) Stabilization effects of gene-based annotations in intergenic, template, and non-template regions. **: Significant difference (p < 0.01 using Dunn’s test). ***: p < 0.001; ****: p < 0.0001.
Figure 5. Effect of variants causing a gain or loss of a G4. (a) Effects of transition versus transversion mutations indicate that transitions are more likely to lead to a G4 loss, while transversions are more likely to lead to a G4 gain. (b) Effect of SNVs in a 3-mer context leading to G4 gain (left), loss (center), or no change (right). (c) Effect of SNVs in a 3-mer context shows that variants are more likely to be found in loop regions (green). (d) Breakdown of mutations causing different conformations in enhancers, CpG islands, intergenic regions, exons, introns, and promoters on both the template and non-template DNA strands. (e) Breakdown of mutations causing different conformations in intergenic regions, CDS, 5′UTR, 3′UTR, promoters, and lncRNA on both the template and non-template DNA strands.
Figure 6. Significance of the top 20 transcription factors and their genome-wide binding sites.
Figure 7. Distribution of SNVs across the G4 regions on the non-template and template strands. Shown are the results for (a) T→G variants, (b) A→G variants, (c) G→T variants, and (d) G→A variants.
Figure 8. Distribution of SNVs in trinucleotide contexts relative to the opposite or same strand as the corresponding gene.
Table 1. Count and proportion of the effect of the type of mutation on the stability of G4 (COSMIC database).
Type of SNV | Effect of SNV on MFE | Freq | Percentage
Transition | Destabilized | 8600 | 22.93
Transversion | Further stabilized | 6603 | 17.60
Transition | No change | 6552 | 17.46
Transversion | Destabilized | 6419 | 17.11
Transversion | No change | 5509 | 14.68
Transition | Further stabilized | 3832 | 10.21
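As a rough cross-check of Table 1, the sketch below classifies a substitution as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion, and recomputes the percentage column from the reported counts. The function and variable names are hypothetical, and the counts are copied from the table rather than recomputed from COSMIC.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def snv_class(ref, alt):
    """Classify a single-nucleotide change as Transition or Transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    same_family = (ref in PURINES) == (alt in PURINES)
    return "Transition" if same_family else "Transversion"


# Counts copied from Table 1 (COSMIC database).
TABLE1_COUNTS = {
    ("Transition", "Destabilized"): 8600,
    ("Transversion", "Further stabilized"): 6603,
    ("Transition", "No change"): 6552,
    ("Transversion", "Destabilized"): 6419,
    ("Transversion", "No change"): 5509,
    ("Transition", "Further stabilized"): 3832,
}


def percentages(counts):
    """Percentage of the grand total for each category, to two decimals."""
    total = sum(counts.values())
    return {key: round(100 * n / total, 2) for key, n in counts.items()}


total = sum(TABLE1_COUNTS.values())
print("total SNVs:", total)
for key, pct in percentages(TABLE1_COUNTS).items():
    print(key, pct)
```

The six counts sum to 37,515, the number of COSMIC G4 SNVs given in the abstract, and the recomputed percentages agree with the table to within about 0.01 percentage points.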
Table 2. Proportion of SNV via annotation.
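SNV | 3′ UTR | 5′ UTR | CDS | CpG Islands | Enhancers | Exon | Intergenic | Intron | lncRNA GENCODE | Promoter
G→A | 34.82 | 31.23 | 39.34 | 27.58 | 19.01 | 35.18 | 27.84 | 26.74 | 28.27 | 26.87
G→T | 18.30 | 14.82 | 15.84 | 12.01 | 12.08 | 16.60 | 17.15 | 14.56 | 14.89 | 13.64
C→T | 11.13 | 8.24 | 12.19 | 12.23 | 5.68 | 11.35 | 7.94 | 9.06 | 10.46 | 8.89
T→G | 12.67 | 16.48 | 8.59 | 16.78 | 29.84 | 11.74 | 18.38 | 19.91 | 18.15 | 18.95
A→G | 7.62 | 10.83 | 6.91 | 9.56 | 12.61 | 8.00 | 11.38 | 11.49 | 9.89 | 10.98
G→C | 5.16 | 6.05 | 4.69 | 6.27 | 7.28 | 5.09 | 6.89 | 6.30 | 6.23 | 6.51
C→G | 3.66 | 4.85 | 4.47 | 7.69 | 6.75 | 4.33 | 3.47 | 4.59 | 4.56 | 6.34
C→A | 2.76 | 2.99 | 4.11 | 3.85 | 1.42 | 3.65 | 1.95 | 2.60 | 2.73 | 3.30
T→C | 2.15 | 1.66 | 1.79 | 2.00 | 2.13 | 1.96 | 1.67 | 2.10 | 2.13 | 2.07
T→A | 0.70 | 0.86 | 0.84 | 0.67 | 0.89 | 0.81 | 1.27 | 1.00 | 1.17 | 0.82
A→T | 0.59 | 1.20 | 0.72 | 0.75 | 1.60 | 0.76 | 1.12 | 0.94 | 1.00 | 0.85
A→C | 0.45 | 0.80 | 0.51 | 0.62 | 0.71 | 0.52 | 0.94 | 0.72 | 0.53 | 0.77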
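The values in Table 2 read as percentages of SNVs within each annotation, so each column should total roughly 100. The sketch below checks this for two columns and derives the share of transitions in each. Only the CDS and Enhancers columns are copied here; the dictionary layout, the `G>A` key notation, and the function names are assumptions for illustration.

```python
TRANSITIONS = {"G>A", "A>G", "C>T", "T>C"}

# Two columns copied from Table 2 (percent of SNVs per annotation).
TABLE2 = {
    "CDS": {
        "G>A": 39.34, "G>T": 15.84, "C>T": 12.19, "T>G": 8.59,
        "A>G": 6.91, "G>C": 4.69, "C>G": 4.47, "C>A": 4.11,
        "T>C": 1.79, "T>A": 0.84, "A>T": 0.72, "A>C": 0.51,
    },
    "Enhancers": {
        "G>A": 19.01, "G>T": 12.08, "C>T": 5.68, "T>G": 29.84,
        "A>G": 12.61, "G>C": 7.28, "C>G": 6.75, "C>A": 1.42,
        "T>C": 2.13, "T>A": 0.89, "A>T": 1.60, "A>C": 0.71,
    },
}


def column_total(column):
    """Sum of all substitution percentages in one annotation column."""
    return round(sum(column.values()), 2)


def transition_share(column):
    """Combined percentage of the four transition substitutions."""
    return round(sum(v for k, v in column.items() if k in TRANSITIONS), 2)


for name, column in TABLE2.items():
    print(name, column_total(column), transition_share(column))
```

With these values, transitions make up about 60% of G4 SNVs in CDS but about 39% in enhancers, where T→G alone accounts for 29.84%.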
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Neupane, A.; Chariker, J.H.; Rouchka, E.C. Analysis of Nucleotide Variations in Human G-Quadruplex Forming Regions Associated with Disease States. Genes 2023, 14, 2125. https://doi.org/10.3390/genes14122125
